Edit large mbox files

From MozillaZine Knowledge Base

This article was written for Thunderbird but also applies to Mozilla Suite / SeaMonkey (though some menu sequences may differ).

Undeleting a message or trying to repair a corrupt folder sometimes requires an editor that can open a file of 1GB or more. Most text editors can't open a file that large. You normally need enough memory available for the original file plus a copy. The problem is that most 32-bit Windows applications can't access more than 2GB even if you have enough memory plus swap file available. It's possible to increase that to about 3GB by editing the boot.ini file settings and using a utility to set a flag in the .EXE's header, but that's risky.

A safer solution is to use a text or hex editor that knows how to edit files larger than 1GB without having to load the entire file in memory.

  • Notepad++, EditPad Lite and JujuEdit are text editors for Windows that can edit 2GB files. EditPad Pro is a commercial version of EditPad Lite that can partially read files into memory using the "huge files threshold" setting in Options|Preferences|Open Files.
  • EmEditor is a commercial text editor that can edit a 248GB file. It works under Windows.
  • HiEditor (a text editor) and HxD (a hex editor) can edit files of any size. Both work under Windows.
  • ifhex is a hex editor that requires less than 2MB of memory to load a 2GB file. It works under Linux (requires Qt).
  • HexFiend is a hex editor that can edit files up to at least 118GB. It works under OS X.
  • UltraEdit is a commercial text editor for Windows, Linux and OS X that has no file size limit if you disable its use of temporary files. It's also a good idea to disable line numbers, line terminator conversion, code folding, the function list, syntax highlighting, the XML Manager and the Line Change Indicator (LCI) to increase performance. See this article for instructions on how to do that.
  • VEDIT is a commercial text editor for Windows that can edit files up to 2GB. VEDIT Pro64 can edit a 100GB file.
  • Vim can edit very large files on 64-bit systems. On 32-bit systems it can edit files of up to 2GB, especially if you disable the swap and backup file support (it's also a good idea to disable syntax highlighting). It works on most platforms including Windows, Unix, Linux, Mac and VAX/VMS, but unless you're used to Vi you may find it awkward to use at first. There is a LargeFile plug-in that disables certain features to increase speed.
  • It's not clear what the limits are, but Sublime Text 3 will edit at least a 3.1GB text file. [1]
  • PilotEdit Lite can edit 10GB files. The commercial version (PilotEdit) can edit 400GB files.

If you're running a 64-bit operating system with 4GB of memory, text editors such as Notepad++ (a Notepad replacement) that can edit any file that fits in virtual memory will work.

Emacs used to be a good solution but recent versions have a low filesize limit due to the elisp pointer representation. Textpad is another popular choice but it's limited to "file sizes up to the largest contiguous chunk of virtual memory". It can be hard to get a 1GB chunk if you don't have a lot of memory.

Atom currently can't open large files. Supposedly that will be fixed in version 1.0. [2] [3] TextEdit (the standard text editor under OS X) supports editing large files, though it's not clear what the limit is. It's also slow. [4]

You could use a file splitter such as Gsplit to split the file into several pieces, edit one or more of the pieces, and then have it recombine them. Gsplit can split 4GB files and creates a small program to join the pieces back together.

TextWrangler is an OS X text editor that can edit any file up to 384MB. This limit exists because it stores the entire file in RAM, Unicode encoded. However, somebody said they used `split -b 300m foo.xml` to let them edit just part of an XML file that was too large. [5] You might be able to find a text editor that supports command line arguments for loading just part of the file into memory, and edit it in multiple steps.
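If neither Gsplit nor the `split` command is available, the same split-edit-rejoin approach can be sketched in a few lines of Python. This is only an illustration: the function names `split_file` and `join_files` and the `.partNNN` naming are made up for this example, and, like any byte-based splitter, it may cut a message in half at a piece boundary.

```python
import os
import shutil


def split_file(path, piece_size, buffer_size=1024 * 1024):
    """Split path into numbered pieces of at most piece_size bytes.

    Returns the piece paths in order. Only buffer_size bytes are held
    in memory at any time, so the size of the source file does not matter.
    The ".partNNN" suffix is an arbitrary choice for this sketch.
    """
    pieces = []
    with open(path, "rb") as src:
        index = 0
        while True:
            piece_path = f"{path}.part{index:03d}"
            written = 0
            with open(piece_path, "wb") as dst:
                while written < piece_size:
                    block = src.read(min(buffer_size, piece_size - written))
                    if not block:
                        break  # end of the source file
                    dst.write(block)
                    written += len(block)
            if written == 0:
                os.remove(piece_path)  # nothing was left to copy
                break
            pieces.append(piece_path)
            index += 1
    return pieces


def join_files(pieces, out_path):
    """Concatenate the pieces, in the given order, into out_path."""
    with open(out_path, "wb") as dst:
        for piece_path in pieces:
            with open(piece_path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

For example, `split_file("Inbox", 300 * 1024 * 1024)` would write `Inbox.part000`, `Inbox.part001` and so on. After editing one piece, `join_files` writes the recombined file under a new name so the original stays untouched.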

Split a mbox file into two parts

There are utilities to split a file in two, but there is no guarantee they won't split one of the messages into two parts. There are some utilities that will try to parse an mbox file and split each message into a separate file, but they don't work well if the mbox file is badly corrupted.

  1. Make two copies of the mbox file.
  2. Open one of the copies with the editor.
  3. Pick a message about midway in the file. Find its Message-ID: header. That should have a unique value. Copy and paste the header into Notepad or its equivalent.
  4. Scroll to the end of the message and then delete the remainder of the file. Save it.
  5. Open the other copy. Search for the unique Message-ID (copy and paste its value from Notepad into the edit field of whatever search command you use).
  6. Scroll to the end of the message. Delete everything in the file above that point. Save it.
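The steps above can also be approximated with a short script that reads the file one line at a time, so no large-file editor is needed. The following is a minimal sketch, not a robust mbox parser: the function name `split_mbox_after_message` is hypothetical, it assumes the chosen Message-ID appears in a header line beginning with `Message-ID:`, and it assumes that the next line starting with `From ` marks the start of the following message. A badly corrupted file may break either assumption.

```python
def split_mbox_after_message(src_path, message_id, first_path, second_path):
    """Copy src_path into two files, splitting after one message.

    first_path receives everything up to the end of the message whose
    Message-ID header contains message_id; second_path receives the rest.
    The source file is not modified. Returns True if the ID was found.
    """
    needle = message_id.encode("utf-8")
    found = False
    with open(src_path, "rb") as src, \
         open(first_path, "wb") as first, \
         open(second_path, "wb") as second:
        out = first
        for line in src:  # one line in memory at a time
            if found and out is first and line.startswith(b"From "):
                # First separator line after the chosen message:
                # everything from here on goes to the second file.
                out = second
            elif (not found
                  and line.lower().startswith(b"message-id:")
                  and needle in line):
                found = True
            out.write(line)
    return found
```

If the function returns False, the Message-ID was not found and the first output file is simply a copy of the original. If the chosen message is the last one in the folder, the second file will be empty.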

There is a From_ line before each message and a blank line after each message. A From_ line is a line that begins with the characters F, r, o, m, space and contains the time and date. For example, From - Sun Jun 14 22:15:03 2009. See Recover messages from a corrupt folder for more information about the structure of an mbox file.
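To get a feel for where messages begin in a very large file, the From_ lines can be located without opening the file in an editor. The sketch below is an illustration under stated assumptions: the names `FROM_LINE` and `message_offsets` are invented here, and the pattern is modelled only on the example line above (sender, weekday, month, day, time, four-digit year). Other mail programs write variations that it will not match.

```python
import re

# Modelled on "From - Sun Jun 14 22:15:03 2009"; this is an assumption
# about the date layout, not a complete description of all mbox variants.
FROM_LINE = re.compile(
    rb"From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]?\d "
    rb"\d\d:\d\d:\d\d \d{4}\s*\Z"
)


def message_offsets(path):
    """Return the byte offset of every line that looks like a From_ line."""
    offsets = []
    position = 0
    with open(path, "rb") as mbox:
        for line in mbox:  # streams the file; memory use stays small
            if FROM_LINE.match(line):
                offsets.append(position)
            position += len(line)
    return offsets
```

The length of the returned list is a rough message count, and the middle entry gives a byte offset near which to look for a message "about midway in the file". Body lines that merely start with the word From, such as "From the desk of Bob", are not matched because they lack the date.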
