Recover messages from a corrupt folder

From MozillaZine Knowledge Base
This article was written for Thunderbird but also applies to Mozilla Suite / SeaMonkey (though some menu sequences may differ).

You can undelete a message by just editing the X-Mozilla-Status header for a message in an mbox file and setting it to zero. You don't need to understand much to do that - mainly that folders are stored as mbox files and how to recognize an X-Mozilla-Status header. Unfortunately, knowing that is not enough when recovering messages from a badly corrupted folder. You need to understand the basic structure of an Internet message, be able to identify the beginning and end of messages, decide how to fix damaged messages, and when to scrap message fragments. It is straightforward once you know how to do that, but there is no silver bullet.
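As a rough illustration of that simple case, the sketch below zeroes every X-Mozilla-Status header in the text of an mbox file. The function name reset_status is hypothetical, and it assumes the header starts a line and carries a four-digit hexadecimal value as in the examples later in this article. It resets all flags (so messages also become unread) and would also touch a matching line inside a message body, so only run something like this on a backup copy.

```python
import re

# "X-Mozilla-Status: hhhh" at the start of a line.
# X-Mozilla-Status2 does not match because ": " must follow "Status".
STATUS_RE = re.compile(r"^X-Mozilla-Status: [0-9A-Fa-f]{4}", re.MULTILINE)

def reset_status(mbox_text: str) -> str:
    """Return mbox_text with every X-Mozilla-Status value set to 0000."""
    return STATUS_RE.sub("X-Mozilla-Status: 0000", mbox_text)
```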

If a folder is not badly corrupted you can work around the problem by deleting the index file (folder.msf), compacting the folder, or moving all of the messages to an empty folder and replacing the corrupt folder. This is described in Compacting folders. This article deals with what to do when that doesn't work and you can't recover many of the messages by copying or moving them.

Both POP and IMAP accounts can have corrupt folders. This article only deals with corrupt folders stored on your hard disk, file share or USB drive. You can usually recover messages from a corrupt remote folder by moving them and then compacting the folder, and an IMAP server can use many different formats to store the messages.

Preparation

  1. Do not compact the folder again. It can make things worse. If you're still running Thunderbird, exit it.
  2. Back up your entire profile. MozBackup is a popular way to do this if you are using Windows.
  3. Find your profile. If you are using a recent version of Thunderbird, Help -> Troubleshooting Information -> Show Folder will display the profile using your system's file manager (Windows Explorer, Finder, etc.). If you can't see it using Windows Explorer (or whatever file browser your operating system supports) read show hidden files and folders.

Mbox files

Thunderbird uses mbox files to store the messages for a folder. Each mbox file is an ordinary 7-bit ASCII text file with the folder's name and no file extension. You can read the messages using a text editor, though it will be very user unfriendly: all of the headers that are normally hidden are visible, HTML tags are not interpreted, and any binary attachments appear as big blocks of characters all run together.

Thunderbird uses an "index" file with the folder's name and a .msf file extension to cache the information needed for the folder listing. There will also frequently be a subdirectory with the folder's name and a .sbd extension. Only the mbox file has any messages. Don't bother with the other files. The mbox file for your inbox folder is "inbox."

Neither the "index" file nor the mbox file has any sort of index indicating where each message begins or ends. The mbox file stores each message separately in the order they were downloaded or saved. There is a From_ line before each message and a blank line after each message. The From_ line is not a From: header; in fact you can't even see it using View -> Message Source. It's a line that begins with the characters F, r, o, m, space and contains the time and date. Thunderbird always stores an X-Mozilla-Status: and an X-Mozilla-Status2: header right after it. For example:

From - Sun Jun 14 22:15:03 2009
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
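The layout above suggests a simple way to locate message starts in a script: look for a line beginning with "From " that is immediately followed by an X-Mozilla-Status: header. The following is a minimal sketch under that assumption; the function name message_starts is made up for illustration, and it expects the file to have been read as text with LF line endings.

```python
import re

# Any line that begins with the five characters "From ".
FROM_RE = re.compile(r"^From .*$", re.MULTILINE)

def message_starts(mbox_text: str) -> list:
    """Return offsets of From_ lines directly followed by X-Mozilla-Status:."""
    starts = []
    for m in FROM_RE.finditer(mbox_text):
        # m.end() is the newline that ends the From_ line; the next line starts after it.
        if mbox_text.startswith("X-Mozilla-Status:", m.end() + 1):
            starts.append(m.start())
    return starts
```

A body line such as "From the start it was fine" is ignored because no X-Mozilla-Status: header follows it.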

Once Thunderbird starts to lose track of where each message begins and ends, it's easy for it to accidentally delete several other messages when it physically deletes a message. It's unusual, but it is possible to lose every message in a folder by compacting it if it is badly corrupted. A badly corrupted folder will typically have a mixture of good messages, some that are mangled but still recognizable as messages, some messages that are scrunched together as one message, and message fragments. The message fragments are usually identifiable in a folder listing as they have a date of 1969. This is the default date when there is no Date: header.
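Since fragments show up as messages without a Date: header, a script can flag likely fragments the same way. This is only a sketch: chunks_without_date is a hypothetical helper, it splits on every line that starts with "From " (so an unquoted "From " line in a message body produces a false positive), and it uses Python's standard email parser to read the headers of each chunk.

```python
import re
from email import message_from_string

def chunks_without_date(mbox_text: str) -> list:
    """Split before each line starting with 'From ' and return chunks lacking Date:."""
    # Zero-width split: the From_ line stays at the top of its chunk.
    chunks = re.split(r"(?m)^(?=From )", mbox_text)
    suspects = []
    for chunk in chunks:
        if not chunk.strip():
            continue  # skip empty text before the first From_ line
        if message_from_string(chunk)["Date"] is None:
            suspects.append(chunk)
    return suspects
```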

If you see a line in the message body beginning with either >From_ or >>From_, that is "From_ line quoting". It prevents the email client from confusing the word "From" at the start of a line in the message body with the From_ line used to separate messages. Supposedly Thunderbird still uses the mboxrd variation of the mbox format, which has more complex From_ line quoting rules, but recent versions don't seem to use From_ line quoting.
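For reference, the mboxrd quoting rule is usually described as: add one ">" in front of any body line that consists of zero or more ">" characters followed by "From ", and remove one ">" when reading. A minimal sketch of that rule (the function names are illustrative, and this is not taken from Thunderbird's own code):

```python
import re

def quote_from_lines(body: str) -> str:
    """Prefix '>' to every line that matches ^>*From (mboxrd-style quoting)."""
    return re.sub(r"(?m)^(>*From )", r">\1", body)

def unquote_from_lines(body: str) -> str:
    """Strip one '>' from every line that matches ^>+From (reverse of quoting)."""
    return re.sub(r"(?m)^>(>*From )", r"\1", body)
```

Because each level of quoting adds exactly one ">", quoting and then unquoting returns the original body unchanged.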

Recommended solution

Use the Cut MboxD program (a Linux version of CutMbox is available here) to break the mbox file into multiple mbox files that each have 500 messages. It knows how to fix some types of minor corruption, and if you have just one or two messages that Thunderbird can't identify where they end you may recover big blocks of messages by breaking the file into several files. See importing folders for how to import each of those mbox files into a child folder in "Local Folders". You can then merge any recovered messages.
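To show what such a splitter does in principle, here is a small sketch that breaks mbox text into groups of a fixed number of messages. It is not how the program above is implemented; split_mbox and its per_file parameter are assumptions for illustration, and it only recognizes a message start as a From_ line followed by an X-Mozilla-Status: header. It does no repair of its own.

```python
import re

def split_mbox(mbox_text: str, per_file: int = 500) -> list:
    """Return a list of mbox texts, each holding at most per_file messages."""
    # Zero-width split before each From_ line that is followed by X-Mozilla-Status:.
    parts = re.split(r"(?m)^(?=From .*\nX-Mozilla-Status:)", mbox_text)
    messages = [p for p in parts if p]
    return ["".join(messages[i:i + per_file])
            for i in range(0, len(messages), per_file)]
```

Each returned string could then be written to its own file (with no file extension) and imported as a separate folder. Any text before the first recognizable From_ line ends up at the start of the first group.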

If you don't recover everything you can try to edit some of those mbox files as described in the next section. That requires you to understand the layout of an mbox file and is something most users are not comfortable doing. Since you broke the large mbox file into smaller mbox files, you can probably get by with a normal text editor unless the files have large attachments.

Edit mbox file - Advanced

Install a text editor that can edit large mbox files, such as JujuEdit. Then edit the mbox file to fix the layout of each message so that Thunderbird can find it. If you have a lot of messages, you might want to remove any message fragments and paste in fake From_ lines and blank lines as needed so that Thunderbird recognizes the start of each message, rather than try to completely recover everything.

Structure of a message

Dan's Mail Format Site has an excellent description of all of the parts of a message. However, you only need to recognize how the different MIME sections are stored and what a Content-Type header is used for. The original Internet message format only supported 7-bit ASCII messages. The MIME standard adds a few headers to identify what type of message it is and how it's encoded. Messages can have multiple copies of a message body using different formats, can be broken into multiple parts of varying types, and so on. There are many possible combinations.

The key is to find the Content-Type: header near the Subject. If it's set to a string that doesn't contain the word multipart then there is only one MIME section. For example:

Content-Type: text/plain; charset=UTF-8

means it's a plain text message using Unicode (UTF-8). All you need to verify in this case is that you can identify the beginning and end of the message (look for the From_ line and the blank line after the message). If it contains the word multipart you need to make certain that Thunderbird can parse the different MIME sections. Each MIME section will have its own Content-Type: header, and the sections will be separated by a boundary string. For example:

Content-Type: multipart/mixed; boundary="------------040302080909030309080805"

means that the message has multiple MIME parts, each of which has a Content-Type header, separated by "------------040302080909030309080805" boundary strings. You'll typically see this with a plain text or HTML message followed by one or more attachments. There are many different types of MIME sections, but you don't need to recognize each of them. What's necessary is to make certain that Thunderbird can parse the message. This means:

  • Adding a boundary string at the end of a multipart message if it doesn't have one.
  • Getting rid of fragments of MIME sections.
  • Deleting any message fragments that you can't fix.
  • If two messages are scrunched together you can separate them with two blank lines and then copy and paste one of the From_ lines from another message at the top of the second message. The contents of the From_ line don't really matter as long as it's recognized as a From_ line.
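The first item in the list can be checked mechanically. In a multipart message each part starts with a line made of "--" plus the boundary string, and the message ends with the same line followed by another "--". The sketch below looks for that closing line and appends one if it is missing. Both function names are hypothetical, it only looks at the first multipart Content-Type: header it finds (so nested multipart sections are not handled), and it works on the text of a single message rather than a whole mbox file.

```python
import re
from typing import Optional

# Content-Type: multipart/...; boundary="..." (the boundary may be on a folded line).
MULTIPART_RE = re.compile(
    r'(?im)^Content-Type:\s*multipart/[^;]+;\s*boundary="?([^"\n;]+)"?')

def missing_closing_boundary(message: str) -> Optional[str]:
    """Return the boundary string if the closing '--boundary--' line is absent."""
    m = MULTIPART_RE.search(message)
    if m is None:
        return None  # not multipart, or the header itself is damaged
    boundary = m.group(1)
    closing = "--" + boundary + "--"
    return None if closing in message.splitlines() else boundary

def add_closing_boundary(message: str) -> str:
    """Append a closing boundary line (and a blank line) if one is missing."""
    boundary = missing_closing_boundary(message)
    if boundary is None:
        return message
    return message.rstrip("\n") + "\n\n--" + boundary + "--\n\n"
```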

Example of what to look for

This is a copy of an uncorrupted "inbox." file in Mail\Local Folders that contains two messages. The first message is a plain text message notifying somebody about a post to MozillaZine. It has only one Content-Type header. The second message has two parts, one of them a binary attachment. The Agalychnis_saltator_leaf_frog.jpg file is base64 encoded, which stores every 3 bytes of binary data as four 7-bit ASCII characters. It looks like a big block of characters all run together.
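Base64 turns every 3 bytes of binary data into 4 ASCII characters, which is why an attachment takes up about a third more space in the mbox file than on disk. A quick check with Python's standard base64 module, using the three bytes that JPEG files begin with:

```python
import base64

raw = bytes([0xFF, 0xD8, 0xFF])        # first three bytes of a JPEG file
encoded = base64.b64encode(raw)        # four ASCII characters
print(encoded)                         # b'/9j/'

# 300 bytes of binary data become 400 characters of base64 text.
print(len(base64.b64encode(b"\x00" * 300)))   # 400
```

This matches the start of the attachment block in the example message, which begins with /9j/.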

The critical parts are the From_ lines, the Content-Type: headers and the boundary strings. Notice how little you have to pay attention to. If the message has lost part of the headers at the start of the message it might not look right in the folder listing, but you could still read the message. There are a couple of other MIME headers such as Content-Transfer-Encoding: and Content-Disposition: that are important, but it's rare for them to get lost or mangled without losing most of the MIME section they're in. It's not worth paying attention to them unless you want to invest the time in learning more about the MIME standards. You are not going to be able to recover everything. Normally you should just try to recover most of the messages without spending a lot of time or getting frustrated.


From - Sun Jun 14 22:15:03 2009
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Return-Path: <apache@mozillazine.org>
Received: from compute1.internal (compute1.internal [10.202.2.41])
	 by store50m.internal (Cyrus v2.3.14-fmsvn18904-c7f26adc) with LMTPA;
	 Mon, 15 Jun 2009 00:39:43 -0400
Received: from mx5.messagingengine.com ([10.202.2.204])
  by compute1.internal (LMTPProxy); Mon, 15 Jun 2009 00:39:43 -0400
Received: from fraxinus.osuosl.org (fraxinus.osuosl.org [140.211.166.137])
	by mx5.messagingengine.com (Postfix) with ESMTP id 7ACB236AD6
	for <somebody@example.com>; Mon, 15 Jun 2009 00:39:42 -0400 (EDT)
Received: from localhost (localhost [127.0.0.1])
	by fraxinus.osuosl.org (Postfix) with ESMTP id EA5BB1C40BD
	for <somebody@example.com>; Mon, 15 Jun 2009 04:39:40 +0000 (UTC)
X-Virus-Scanned: amavisd-new at osuosl.org
Received: from fraxinus.osuosl.org ([127.0.0.1])
	by localhost (.osuosl.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id B3L7DQq5vKSY for <somebody@example.com>;
	Mon, 15 Jun 2009 04:39:40 +0000 (UTC)
Received: from www2.mozillazine.org (www2.mozillazine.org [140.211.166.87])
	by fraxinus.osuosl.org (Postfix) with ESMTP id CB8161C4011
	for <somebody@example.com>; Mon, 15 Jun 2009 04:39:40 +0000 (UTC)
Received: by www2.mozillazine.org (Postfix, from userid 81)
	id C752184003E; Mon, 15 Jun 2009 04:39:40 +0000 (UTC)
To: "=?UTF-8?B?dGFuc3RhYWZs?=" <somebody@example.com>
Subject: =?UTF-8?B?VG9waWMgcmVwbHkgbm90aWZpY2F0aW9uIC0gIlNldHRpbmcgVXAgQUlNIE1h?=
 =?UTF-8?B?aWwgSW4gVGh1bmRlcmJpcmQi?=
From: <forums@mozillazine.org>
Reply-To: <forums@mozillazine.org>
Sender: <forums@mozillazine.org>
MIME-Version: 1.0
Message-ID: <56c2446a10fdd3e287456dbf9f18461d@forums.mozillazine.org>
Date: Mon, 15 Jun 2009 04:39:32 +0000
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: PhpBB3
X-MimeOLE: phpBB3
X-phpBB-Origin: phpbb://forums.mozillazine.org

Hello tanstaafl,

You are receiving this notification because you are watching the topic,
"Setting Up AIM Mail In Thunderbird" at "mozillaZine Forums". This topic
has received a reply since your last visit. You can use the following link
to view the replies made, no more notifications will be sent until you
visit the topic.

If you want to view the newest post made since your last visit, click the
following link:

http://forums.mozillazine.org/viewtopic.php?f=28&t=1298175&p=6706055&e=6706055

If you want to view the topic, click the following link:
http://forums.mozillazine.org/viewtopic.php?f=28&t=1298175

If you want to view the forum, click the following link:
http://forums.mozillazine.org/viewforum.php?f=28

If you no longer wish to watch this topic you can either click the
"Unsubscribe topic" link found at the bottom of the topic above, or by
clicking the following link:


http://forums.mozillazine.org/viewtopic.php?uid=10639&f=28&t=1298175&unwatch=topic

-- 
Thanks, mozillaZine

From - Sun Jun 14 22:15:15 2009
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Return-Path: <somebody@example.com>
Received: from compute1.internal (compute1.internal [10.202.2.41])
	 by store50m.internal (Cyrus v2.3.13-fmsvn16638) with LMTPA;
	 Thu, 30 Oct 2008 20:55:09 -0400
X-Sieve: CMU Sieve 2.3
X-Spam-score: 0.0
X-Spam-hits: BAYES_50 0.001, SPF_PASS -0.001, BAYES_USED global
X-Spam-source: IP='74.208.5.67', Host='mail.gmx.com', Country='US', FromHeader='com',
  MailFrom='com'
X-Spam-charsets: plain='ISO-8859-1'
X-Attached: Agalychnis_saltator_leaf_frog.jpg
X-Resolved-to: somebody@example.com
X-Delivered-to: somebody@example.com
X-Mail-from: somebody@example.com
Received: from mx1.messagingengine.com ([10.202.2.200])
  by compute1.internal (LMTPProxy); Thu, 30 Oct 2008 20:55:09 -0400
Received: from mail.gmx.com (mail.gmx.com [74.208.5.67])
	by mx1.messagingengine.com (Postfix) with SMTP id 76CE9286FA1
	for <somebody@example.com>; Thu, 30 Oct 2008 20:55:07 -0400 (EDT)
Received: (qmail invoked by alias); 31 Oct 2008 00:55:05 -0000
Received: from netblock-68-183-70-71.dslextreme.com (EHLO [192.168.0.101]) [68.183.70.71]
  by mail.gmx.com (mp-us002) with SMTP; 30 Oct 2008 20:55:05 -0400
X-Authenticated: #48372005
X-Provags-ID: V01U2FsdGVkX1/KhfBDHBHKEXzJxhelC+sSkqu4Zr2S7EHfv17ZXM
	nssAoTN3VbkYyR
Message-ID: <490A5768.1020204@gmx.com>
Date: Thu, 30 Oct 2008 17:55:04 -0700
From: Somebody <somebody@example.com>
User-Agent: Thunderbird 2.0.0.17 (Windows/20080914)
MIME-Version: 1.0
To: somebody@example.com
Subject: plain text message with an attachment
Content-Type: multipart/mixed; boundary="------------040302080909030309080805"
X-Y-GMX-Trusted: 0
X-FuHaFi: 0.00

This is a multi-part message in MIME format.

--------------040302080909030309080805
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

blah blah

--------------040302080909030309080805
Content-Type: image/jpeg; name="Agalychnis_saltator_leaf_frog.jpg"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filename="Agalychnis_saltator_leaf_frog.jpg"

/9j/4AAQSkZJRgABAQEAZABkAAD/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgkJChQODwwQFxQY
GBcUFhYaHSUfGhsjHBYWICwgIyYnKSopGR8tMC0oMCUoKSj/2wBDAQcHBwoIChMKChMoGhYa
KCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCj/wAAR
CACFAMgDAREAAhEBAxEB/8QAHAABAAEFAQEAAAAAAAAAAAAAAAEDBAUGBwII/8QAOxAAAQMD
AgQEBAUBBgcAAAAAAQACAwQFEQYhEhMxQQciUWEyQnGBCBQVscEjFmJykaHRJENSgpLh8P/E
ABsBAQACAwEBAAAAAAAAAAAAAAABBAIDBQYH/8QAMhEAAgEDAgQEBQQCAwEAAAAAAAECAwQR
EiEFEzFBUWFx8CIyobHRFIGRwSPhBhXxQv/aAAwDAQACEQMRAD8A+VUBKAhASgCAIAgIQEoA
gCAIAgIQBASgCAICqwYYXHqVg3l4IKXdZAIApABwoBcQvWEkEXGxHutZJjlYAQEoAgCAIAgC
AICEBKAIAgIQEoAgCAmNvG8NChvCyCrOcHhHQLGK7kFFZgKGApAQEtOCoaDLiOT3WtxGS2W0
kIAgIQEoAgCAIQEAQkIAhAQkIQEJCAIC5p2cMRkPfotU3l4BSxxuJ7BZ9CCmeqyA7KAFJIQB
CD00qGiGeVJkEAQBAEICEhCAgCAIAgCAIAgCAICvT05lwTkNOwx3Wuc9Jml3ZXrCGNDG7ADC
VH2NUyuoaRlAMoBlMAZTACkBCQgCAIAgP//Z

--------------040302080909030309080805--


Fix it

  1. Delete the index file (folder.msf). Thunderbird will create a new one when it starts based on whatever messages it can find in the mbox file.
  2. Make a backup copy of the mbox file. Your first attempt at recovering messages will probably fail and you'll want to start over without having to restore the entire profile.
  3. Edit the mbox file using the text editor. You can get away with using Notepad if it's less than a 64KB file, but it's safer to use a text editor that can handle larger files. Do not use a word processor (such as Microsoft Word) or you will mangle the messages.
  4. If you have hundreds of messages in the folder, split the mbox file into several mbox files, splitting it just before a good message. This will make it easier to work with. You can merge the messages into one folder later on using Thunderbird.
  5. If you're unsure what to do, it's sometimes useful to save the edited mbox file, make a change, save it as an mbox file with a different name in the Local Folders section of your profile, start Thunderbird, open that folder, and see what it looks like. If you use an editor that has a very large undo buffer that works after a save (such as JujuEdit) you could make a change, view the mbox file as a folder using the MboxViewer, and then undo the change (while the file is still open) if it didn't do what you want.
  6. If you have problems understanding a fragmented multipart message look at an example of a similar message in a good folder.
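Steps 1 and 2 are easy to get wrong by hand, so here is a minimal sketch of them using Python's standard library. The function name prepare_folder and both of its parameters are assumptions for illustration. The backup goes into a directory you choose, preferably outside the profile, so that Thunderbird does not pick the copy up as another folder. Exit Thunderbird before running anything like this.

```python
import shutil
from pathlib import Path

def prepare_folder(mbox_path: str, backup_dir: str) -> Path:
    """Copy the mbox file into backup_dir, then delete its .msf index file."""
    mbox = Path(mbox_path)
    backup = Path(backup_dir) / mbox.name
    shutil.copy2(mbox, backup)                    # step 2: keep a pristine copy
    index = mbox.with_name(mbox.name + ".msf")    # e.g. Inbox -> Inbox.msf
    if index.exists():
        index.unlink()                            # step 1: force a rebuilt index
    return backup
```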

The best way to learn how to recover messages is to experiment. For example, make a copy of a small good mbox file (with a different name), use an editor to remove part of a message and see what effect that has in Thunderbird.

Keep in mind that you don't have to fix everything if manually scrolling through the file with a text editor, recognizing patterns and trying to fix errors seems too hard. You could just add From_ lines and blank lines as needed and delete message fragments in order to let Thunderbird parse the mbox file, writing off some messages as garbage or deleting them. You could copy and paste any plain text from the message body, the subject and who sent the message into a draft message before deleting a garbage message.
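Adding a From_ line where two messages have run together can also be sketched in code. The helper names below are hypothetical, and offset is assumed to be the position in the text where you have decided the second message starts. The date in the From_ line is simply the current time, since its contents don't matter as long as the line is recognized as a From_ line.

```python
import time

def from_line(when=None) -> str:
    """Build a From_ line in the same layout Thunderbird writes."""
    if when is None:
        when = time.localtime()
    return "From - " + time.asctime(when)

def split_scrunched(text: str, offset: int) -> str:
    """Insert a blank line and a From_ line at offset (start of the second message)."""
    head = text[:offset].rstrip("\n")   # end the first message cleanly
    return head + "\n\n" + from_line() + "\n" + text[offset:]
```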

Alternative solutions

1. You could use the ImportExportTools add-on to:

  • Split the mbox into multiple .EML files
  • Import the .EML files in small bunches to isolate which ones are broken.
  • Split the .EML files into multiple .EML files if they contain parts of multiple messages
  • Edit the broken .EML files as needed and import them.

The main drawback is that the add-on is trying to parse the mbox file just like Thunderbird. It will usually fail, and if it succeeds all you really did was run a file splitter. You still have to correct the damage, and it's usually easier to do that by editing a large mbox file where you have an overall view of where corruption might have occurred.

2. You could copy and paste pieces of a message into a draft message. However, that would only let you recover data in its most basic form. You'd lose information such as who sent the message, you'd have to strip HTML tags, and you couldn't recover any attachments.

3. There are a lot of Perl scripts to manipulate mbox files available on the Internet because the mbox format was originally developed for UNIX mail systems. Some can add a missing blank line before a From_ line or try to fix a problem with From_ line quoting, but none seem suitable for fixing folders that became corrupt from not being compacted often enough.

