MozillaZine

Edit large mbox files

From MozillaZine Knowledge Base

This article was written for Thunderbird but also applies to Mozilla Suite / SeaMonkey (though some menu sequences may differ).

Undeleting a message or trying to repair a corrupt folder sometimes requires an editor that can open a file of 1GB or more. Most text editors can't open a file that large. You normally need enough memory available for the original file plus a copy. The problem is that most 32-bit Windows applications can't access more than 2GB even if you have enough memory plus swap file available. It's possible to increase that to about 3GB by editing the boot.ini file settings and using a utility to set a flag in the .EXE's header, but that's risky.

A safer solution is to use a text or hex editor that knows how to edit files larger than 1GB without having to load the entire file in memory.

  • Notepad++, EditPad Lite and JujuEdit are text editors for Windows that can edit 2GB files. EditPad Pro is a commercial version of EditPad Lite that can partially read files into memory using the "huge files threshold" setting in Options|Preferences|Open Files.
  • EmEditor is a commercial text editor that can edit a 248GB file. It works under Windows.
  • HiEditor (a text editor) and HxD (a hex editor) can edit files of any size. Both work under Windows.
  • ifhex is a hex editor that requires less than 2MB of memory to load a 2GB file. It works under Linux (requires Qt).
  • HexFiend is a hex editor that can edit files up to at least 118GB. It works under Mac OS X.
  • VEDIT is a commercial text editor for Windows that can edit files up to 2GB. VEDIT Pro64 can edit a 100GB file.
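
The idea these editors rely on, working through a file without loading all of it into memory, can be illustrated with a short Python script. This is only a minimal sketch, not how any of the editors above are implemented. The function name find_line_offsets and its limit parameter are made up for this example. It reads the file one line at a time and reports the byte offset of each line that contains a search string, such as a Message-ID value.

```python
def find_line_offsets(path, needle, limit=10):
    """Return byte offsets of the first `limit` lines that contain `needle`.

    `needle` must be bytes. The file is opened in binary mode and read one
    line at a time, so memory use depends on the longest line, not on the
    size of the file.
    """
    offsets = []
    pos = 0  # byte offset of the start of the current line
    with open(path, "rb") as f:
        for line in f:
            if needle in line:
                offsets.append(pos)
                if len(offsets) >= limit:
                    break
            pos += len(line)
    return offsets
```

For example, find_line_offsets("Inbox", b"Message-ID: <") lists where the first ten Message-ID headers start. A badly corrupted file may contain very long runs without a line break, in which case a single "line" can still be large.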

Vim can edit files of up to 2GB on 32-bit machines, especially if you disable swap and backup file support. It's also a good idea to disable syntax highlighting. Vim works on most platforms, including Windows, Unix, Linux, Mac and VAX/VMS, but unless you're used to vi you may find it awkward to use at first. There is a LargeFile plug-in that disables certain features to increase speed.

Emacs used to be a good solution, but recent versions have a low file size limit due to the elisp pointer representation. TextPad is another popular choice, but it's limited to "file sizes up to the largest contiguous chunk of virtual memory". It can be hard to get a 1GB chunk if you don't have a lot of memory.

You could also use a file splitter such as GSplit to split the file into several pieces, edit one or more of the pieces, and then recombine them. GSplit can split 4GB files and creates a small program to join the pieces back together.
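
If no splitter is available, the split and rejoin steps can be approximated with a few lines of Python. The following is a rough sketch of the general idea, not a description of how GSplit works. The function names, the numbered piece suffix (.000, .001, ...) and the buffer size are assumptions chosen for this example. Pieces are cut at fixed byte counts, so a message will usually straddle two pieces.

```python
import os
import shutil


def split_file(path, piece_size, buf_size=1024 * 1024):
    """Split `path` into path.000, path.001, ... of at most `piece_size` bytes.

    Data is copied in buffers of at most `buf_size` bytes, so the whole file
    is never held in memory. Returns the list of piece names in order.
    """
    pieces = []
    index = 0
    with open(path, "rb") as src:
        while True:
            piece_name = "%s.%03d" % (path, index)
            written = 0
            with open(piece_name, "wb") as dst:
                while written < piece_size:
                    chunk = src.read(min(buf_size, piece_size - written))
                    if not chunk:
                        break  # end of the source file
                    dst.write(chunk)
                    written += len(chunk)
            if written == 0:
                os.remove(piece_name)  # nothing left; drop the empty piece
                break
            pieces.append(piece_name)
            index += 1
    return pieces


def join_files(pieces, out_path):
    """Concatenate `pieces`, in the order given, into `out_path`."""
    with open(out_path, "wb") as dst:
        for name in pieces:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, dst)
```

A call such as split_file("Inbox", 500 * 1024 * 1024) would produce pieces of about 500MB each. Work on a copy of the mbox file, and join the pieces in the same order that split_file returned them.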

If you're running a 64-bit operating system with 4GB of memory, text editors such as Notepad++ (a Notepad replacement) that can edit any file that fits in virtual memory will work. Vim (again, if you're willing to learn it) can also edit very large files on 64-bit systems.

Split an mbox file into two parts

There are utilities that split a file in two, but there is no guarantee they won't split one of the messages into two parts. There are also utilities that try to parse an mbox file and split each message into a separate file, but they don't work well if the mbox file is badly corrupted.

  1. Make two copies of the mbox file.
  2. Open one of the copies with the editor.
  3. Pick a message about midway in the file. Find its Message-ID: header. That should have a unique value. Copy and paste the header into Notepad or its equivalent.
  4. Scroll to the end of the message and then delete the remainder of the file. Save it.
  5. Open the other copy. Search for the unique Message-ID (copy and paste its value from Notepad into the edit field of whatever search command you use).
  6. Scroll to the end of the message. Delete everything in the file above it. Save it.
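
The same split can be scripted when no suitable editor is at hand. The sketch below is an assumption-laden alternative to the manual steps, not a replacement for them. Instead of searching for a Message-ID, it cuts at the first line past the middle of the file that begins with "From " and follows a blank line, which is where a new message normally starts. The function name and its arguments are hypothetical. A badly corrupted file may not have clean boundaries, so check both output files afterwards.

```python
import os
import shutil


def split_mbox_in_two(path, first_out, second_out):
    """Split the mbox at the first message boundary at or past its midpoint.

    Returns the byte offset of the split, or None if no boundary was found
    (in that case everything is copied to first_out and second_out is empty).
    """
    midpoint = os.path.getsize(path) // 2
    with open(path, "rb") as src, \
            open(first_out, "wb") as first, \
            open(second_out, "wb") as second:
        pos = 0            # byte offset of the line about to be read
        prev_blank = True  # the start of the file counts as a boundary
        while True:
            line = src.readline()
            if not line:
                return None  # reached the end without finding a boundary
            if pos >= midpoint and prev_blank and line.startswith(b"From "):
                # This line starts the second half; copy it and the rest.
                second.write(line)
                shutil.copyfileobj(src, second)
                return pos
            first.write(line)
            prev_blank = line.strip() == b""
            pos += len(line)
```

The original file is only read, never modified, so it still serves as the backup that step 1 asks for.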

There is a From_ line before each message and a blank line after each message. A From_ line is a line that begins with the characters F, r, o, m, space and contains the time and date. For example, From - Sun Jun 14 22:15:03 2009. See Recover messages from a corrupt folder for more information about the structure of an mbox file.
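
A From_ line can also be recognized programmatically, which gives a quick estimate of how many messages a large file holds. The pattern below is an assumption modelled only on the example above (a sender token, then weekday, month, day, time and four-digit year). From_ lines written by other programs may use a different layout and would not match. The names FROM_LINE, is_from_line and count_from_lines are made up for this sketch.

```python
import re

# "From ", a sender token such as "-", then a date like "Sun Jun 14 22:15:03 2009".
# The day of the month may be padded with a space instead of a leading digit.
FROM_LINE = re.compile(
    rb"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \d{4}"
)


def is_from_line(line):
    """Return True if `line` (bytes) looks like an mbox From_ line."""
    return FROM_LINE.match(line) is not None


def count_from_lines(path):
    """Count From_ lines in `path`, reading the file one line at a time."""
    count = 0
    with open(path, "rb") as f:
        for line in f:
            if is_from_line(line):
                count += 1
    return count
```

Requiring the date after "From " keeps ordinary body text that happens to start with the word From (for example "From now on ...") from being counted as a message boundary.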
