Junk Mail Controls

From MozillaZine Knowledge Base

This article applies to both Thunderbird and the Mozilla Suite. The Junk Mail Controls interface in the Mozilla Suite is slightly different from the Thunderbird interface, described below.


Both Thunderbird and the Mozilla Suite can detect and filter unwanted e-mail messages. In order for the junk-mail filtering to be effective, however, you first need to "train" it because it uses Bayesian filtering.

Activating the Junk Mail Controls

To start using the Junk Mail Controls in Thunderbird:

  1. In versions 1.5.0.x and earlier, go to "Tools -> Junk Mail Controls...". In 2.0.0.x, the controls are divided between two locations: (a) Tools -> Options -> Privacy -> Junk and (b) Tools -> Account Settings -> *each account* -> Junk Settings.
  2. From the dropdown list, choose the email account for which you want to activate the Junk Mail Controls. "Email account" means the folder structure under which the emails are saved, so if you use the Global Inbox in Thunderbird, you need to select "Local Folders" as the account. This is different from the message filters, where you select the account that retrieves the email.
  3. Click the "Adaptive Filter" tab and make sure that the checkbox for "Enable adaptive junkmail detection" is checked. Even though the phrase "Configure Junk Settings for: <account name>" shows when you are on this tab, the option to reset training data applies to ALL of your accounts. There is only one set of training data, so don't reset it here thinking you will only affect the account shown.
  4. Click the "Settings" tab and choose your settings for "White lists" and "Handling" as desired.
    • You can choose to have junk messages sent to the Junk folder, but the Junk folder normally will not appear alongside your other mail folders until it is first used.
    • If you choose the option for "sanitizing" junk e-mails, messages marked as junk will be displayed without images or other HTML formatting.
  5. If you have more than one email account, repeat the above steps for each account with which you want to use the Junk Mail Controls.
  6. When finished, click the "OK" button to exit the Junk Mail Controls dialog.

Training the Junk Mail Controls

To train the junk-mail filtering, you need to mark messages that you've received as either "junk" or "not junk", and it's important that you mark both types of messages rather than simply the ones that are junk. There are various ways that you can mark messages:

  • Right-click on a message and choose "Mark -> As Junk" (or "As Not Junk").
  • Select a message and from the "Message" menu, choose "Mark -> As Junk" (or "As Not Junk").
  • Select a message and click on the "Junk" icon on the toolbar.
  • Select a message and click on the "Junk Status" column in the message-list pane (which will show a small "Junk" icon if the message is marked as junk).
  • Select a message and type "J" (for Junk) or "Shift+J" (for not Junk).

Initially, the automatic junk mail detection for incoming messages might not be very accurate because it hasn't yet been trained very much, and you should thus be careful to check your Junk folder to see if any non-junk messages have been mistakenly detected as junk. After an initial training period, however, you should find that the Junk Mail Controls are very effectively detecting unwanted junk emails and keeping them from your Inbox.

Any mention of Spam

Thunderbird doesn't use or key on the word Spam in its identification of junk mail. If you see the word Spam in a Subject or have a Spam folder, it is due either to your email provider or to an add-on (such as SpamPal). If it's due to your email provider, log into webmail using a browser and browse its help to get more information. They may support webmail commands that let you manage or disable whatever they're doing.

Use custom headers added by your email provider

Your email provider may run a spam filtering program on their mail server, such as SpamAssassin, MailScanner, CRM114, SpamProbe, QSF or Bogofilter, that analyzes each message and adds custom headers with information about its content. If they use SpamAssassin, see the next section for how to integrate it with Thunderbird's junk mail controls. Otherwise, use "View -> Message Source" and look for headers whose names begin with 'X' and contain words such as Spam. They typically provide a spam score and/or keywords that you can test using message filters.

For example:

X-Spam-score: 1.5
X-Spam-hits: BAYES_60 1, HTML_IMAGE_ONLY_16 1.526, HTML_IMAGE_RATIO_08 0.001,
  HTML_MESSAGE 0.001, RCVD_IN_DNSWL_LOW -1, SPF_HELO_PASS -0.001,
  SPF_PASS -0.001, BAYES_USED global
X-Spam-source: IP='88.131.62.198', Host='mail.anp.se', Country='SE', FromHeader='com',
  MailFrom='se'

You could test whether X-Spam-score "is greater than" a certain value, whether X-Spam-source "doesn't contain" Country='US' (the example was from Sweden), or whether X-Spam-hits "contains" certain keywords. Each keyword is the name of a test that increased the spam score, so this is useful when you notice that the junk mail controls have trouble recognizing spam with those attributes. Not every email provider adds as much detail as the example, but most should at least add some sort of spam score you can test.
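As a rough illustration of those three tests, the following Python sketch parses the example headers with the standard `email` module and reports which tests would match. The function name `matching_tests`, the default limits, and the choice of `HTML_IMAGE_ONLY_16` as the keyword are assumptions made for this example. Thunderbird's message filters are configured in the filter dialog, not in code.

```python
from email import message_from_string

# The example headers from above, followed by a minimal subject and body.
RAW = """\
X-Spam-score: 1.5
X-Spam-hits: BAYES_60 1, HTML_IMAGE_ONLY_16 1.526, HTML_IMAGE_RATIO_08 0.001,
  HTML_MESSAGE 0.001, RCVD_IN_DNSWL_LOW -1, SPF_HELO_PASS -0.001,
  SPF_PASS -0.001, BAYES_USED global
X-Spam-source: IP='88.131.62.198', Host='mail.anp.se', Country='SE', FromHeader='com',
  MailFrom='se'
Subject: example

body
"""


def matching_tests(raw, max_score=1.0, home_country="US",
                   keywords=("HTML_IMAGE_ONLY_16",)):
    """Return the names of the (hypothetical) filter tests that match."""
    msg = message_from_string(raw)
    score = float(msg.get("X-Spam-score", "0"))
    source = msg.get("X-Spam-source", "")
    hits = msg.get("X-Spam-hits", "")

    matched = []
    if score > max_score:                            # "is greater than"
        matched.append("score")
    if f"Country='{home_country}'" not in source:    # "doesn't contain"
        matched.append("country")
    if any(word in hits for word in keywords):       # "contains"
        matched.append("hits")
    return matched


print(matching_tests(RAW))
```

In a real filter you would decide whether the tests are combined with "match all" or "match any"; the sketch only lists which ones fire.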

If your email provider supports Sender Policy Framework (SPF) and/or DomainKeys (DK), you could use the Sender Verification Extension. SPF and DK are frameworks that help determine whether a From: address was spoofed. The add-on uses that information, plus DNS black and white lists such as SURBL, Spamhaus, DNSWL, and Sender Score Certified, to check the sender's reputation. When you read a message, the add-on adds a line at the top of the message with the verification status of the sender. For example, "Reputable Sender", "This sender is a known malicious spammer or phisher. Discard this email.", or "Sending domain does not support verification (address could be forged)." Since you have to actually read the message to get the warning, it's probably most useful as an alternative to Thunderbird's phishing protection (which most users disable due to its inability to learn and its many false positives).

Trusting SpamAssassin and SpamPal

Spammers sometimes use Bayesian poisoning to degrade spam filters that use Bayesian filtering. SpamPal uses DNS blacklists, and SpamAssassin uses several methods (it's best known for its extensive testing of message headers), so that type of attack has little or no effect on them. Both add special headers to a message to indicate whether it's spam.

Tools -> Junk Mail Controls has a setting to tell Thunderbird to trust junk mail headers set by either SpamPal or SpamAssassin. The order of processing is:

  1. Message filters
  2. "Trust header"
  3. "Adaptive junk" (junk mail controls)

Some email providers customize the headers added by SpamAssassin, or modify the subject prefix. This can cause the junk mail controls to ignore the information. "Trust header" is actually a standard message filter, stored in an isp subdirectory of the Thunderbird program directory. Thunderbird checks whether either X-Spam-Status: or X-Spam-Flag: begins with Yes, or the subject begins with ***SPAM***. If you run into this problem, back up the SpamAssassin.sfd file and then change what it tests for using a text editor (not a word processor). There is also a SpamPal.sfd file.
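The check described above can be sketched in Python as follows. This only mirrors the three tests named in the text. The function name is made up for the example, and the case-insensitive comparison of "Yes" is an assumption, not something confirmed from the SpamAssassin.sfd file.

```python
from email import message_from_string


def trusted_header_says_junk(raw):
    """Mimic the three 'Trust header' tests described in the text."""
    msg = message_from_string(raw)
    status = msg.get("X-Spam-Status", "").strip().lower()
    flag = msg.get("X-Spam-Flag", "").strip().lower()
    subject = msg.get("Subject", "").strip()
    return (
        status.startswith("yes")            # X-Spam-Status: begins with Yes
        or flag.startswith("yes")           # X-Spam-Flag: begins with Yes
        or subject.startswith("***SPAM***")  # subject prefix
    )
```

If a provider renames the header (for example to a custom X- header) or changes the subject prefix, none of these tests match, which is why the message is then left to the adaptive filter.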

Some users have reported that the trust SpamAssassin option sometimes ignores the junk mail headers in Thunderbird 2.x. It's not clear whether you can work around this bug by disabling the option and adding the appropriate message filter.

Tweaking

The mail.adaptivefilters.junk_threshold preference is a threshold used to determine when messages are classified as junk. It defaults to 90 in version 1.5.0.4. Lowering this value will make it easier to recognize messages as spam, though it increases the risk that it will classify a legitimate message as spam. This might be useful if you get spam messages that it seems to have a tough time learning about, for example messages that look like text but are actually clickable images.

You can change the preference using Tools -> Options -> Advanced -> General -> Config Editor. Enter junk in the Filter field to show only the preferences that contain junk in their name, and then double-click on mail.adaptivefilters.junk_threshold, enter a value lower than the default 90 in the edit field and press the OK button. Many users report good results with values of 30 or lower.
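To see why lowering the threshold catches more spam but also risks more false positives, here is a minimal Python sketch. It assumes that each message gets a junk score between 0 and 100 and that a message is junk when its score is at or above the threshold. The scale, the comparison, and the sample scores are all assumptions for illustration, not taken from Thunderbird's source.

```python
def classify(scores, threshold=90):
    """Split scores into (junk, not_junk), assuming junk means score >= threshold."""
    junk = [s for s in scores if s >= threshold]
    not_junk = [s for s in scores if s < threshold]
    return junk, not_junk


scores = [12, 45, 88, 97]                    # made-up scores for four messages
default_junk, _ = classify(scores)           # default threshold of 90
lowered_junk, _ = classify(scores, 30)       # lowered threshold of 30
print(default_junk, lowered_junk)
```

With the lower value, the borderline messages scoring 45 and 88 are also treated as junk, whether or not they really are.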

The Bayesian filter typically requires several hundred spam and several hundred legitimate messages in order to train itself to recognize spam. It needs both: if you have a thousand spam messages but only a dozen legitimate messages, it won't learn much. This doesn't mean it's initially useless, just that how well it works will depend a lot more upon what spam messages you get. You don't need to keep the messages afterwards, because it stores all of the information it needs as tokens in the training.dat file.
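The idea of learning from tokens in both kinds of mail can be sketched with a toy scorer in Python. This is a simplified naive Bayes illustration with made-up training text. It is not Thunderbird's actual algorithm and does not read or write training.dat. The class name `TinyBayes` and the 0 to 100 score scale are assumptions for the example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-based Bayesian scorer (illustration only)."""

    def __init__(self):
        # Per class: in how many messages each token appeared, and message totals.
        self.counts = {True: Counter(), False: Counter()}
        self.messages = {True: 0, False: 0}

    def train(self, text, is_junk):
        self.counts[is_junk].update(set(text.lower().split()))
        self.messages[is_junk] += 1

    def token_probability(self, token):
        # Smoothed estimate so unseen tokens never give a probability of 0 or 1.
        junk = (self.counts[True][token] + 1) / (self.messages[True] + 2)
        good = (self.counts[False][token] + 1) / (self.messages[False] + 2)
        return junk / (junk + good)

    def score(self, text):
        # Combine per-token evidence as log-odds, then map to a 0-100 scale.
        log_odds = 0.0
        for token in set(text.lower().split()):
            p = self.token_probability(token)
            log_odds += math.log(p / (1 - p))
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
        return 100 / (1 + math.exp(-log_odds))


f = TinyBayes()
for text in ("cheap pills buy now", "buy cheap watches now"):
    f.train(text, is_junk=True)
for text in ("meeting notes attached", "lunch meeting tomorrow"):
    f.train(text, is_junk=False)
```

A token that was never seen in either kind of mail contributes no evidence, so a message made only of unknown words scores exactly in the middle. That is the situation a freshly installed, untrained filter is in.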

The Bayes Junk Tool can be used to examine and modify the training data. Sometimes it helps to get rid of tokens that are just as likely to occur in spam and legitimate messages, especially if the training data file gets very large. The web site also has several sets of training data that you can import or merge with your existing training data.

Bayesian filters are useful, but they're not always the best tool. Sometimes checking whether the message was sent by somebody listed on a DNSBL is more effective. See this article for how to integrate SpamPal and the junk mail controls, and control which messages are downloaded.

The FolderFlags extension can set various internal flags that Thunderbird uses to classify folders. If you set the "Junk" flag on a folder (other than the one spam is moved to) it won't scan that folder for spam.

The Delete Junk Context Menu extension adds "Delete Mail Marked as Junk" to a folder's context menu. It can be configured to delete mail without moving it to the Trash folder.

Image Spam

One way to weed out image-based spam is to create a message filter and set it to match all of the following:

  • Content-Type contains multipart/related
  • From isn't in my address book (repeat this for each address book)

In "Perform these actions" add

  • "Set Junk Status" to "Junk"
  • "Move Message to" your junk mail folder.

That will mark the message as junk and move it to the junk mail folder if the Content-Type header contains multipart/related and the sender isn't in your address book. The message filters don't know about the Content-Type header by default, so you will need to add it using the "customize..." option at the bottom of the leftmost list box. This method is rather heavy-handed. If your email provider runs a spam filter program such as SpamAssassin, it will typically do a much better job recognizing image spam.
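The logic of that filter can be expressed as a short Python sketch. The function name and the idea of passing the address book as a set of lowercase addresses are assumptions for the example. In Thunderbird the same conditions are set up in the message filter dialog.

```python
from email import message_from_string
from email.utils import parseaddr


def is_probable_image_spam(raw, address_book):
    """True when both filter conditions from the text match.

    address_book is a set of known sender addresses in lowercase.
    """
    msg = message_from_string(raw)
    content_type = msg.get("Content-Type", "").lower()
    sender = parseaddr(msg.get("From", ""))[1].lower()
    return "multipart/related" in content_type and sender not in address_book
```

The sketch also shows why the method is heavy-handed: any legitimate message with embedded images from a sender you have not yet added to an address book would match as well.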

Problems with Junk Processing

Other information

  • The file that stores your custom training data for the Junk Mail Controls is called "training.dat". It is stored in your profile folder.
  • To view your Junk Mail Log, go to "Tools -> Junk Mail Controls -> Logging" and select "Junk Mail Log". If the area that shows the log data is not visible, drag any corner of the window with the mouse to enlarge it until the log data appears.

Regular Expressions - Advanced

Neither the junk mail controls nor the message filters support wildcards or regular expressions. There don't appear to be any extensions that add support for that. However, SpamPal (a mail classification program normally used for filtering spam) supports a RegExFilter Plugin that adds regular expression support based on Perl Regular Expressions. If you configure the junk mail controls to trust SpamPal, you could use regular expressions to filter spam. [1]
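To illustrate what a regular expression can catch that a plain "contains" test cannot, here is a small sketch using Python's `re` module. It is not the RegExFilter Plugin's configuration syntax, and the pattern and function name are invented for this example.

```python
import re

# One pattern covers several obfuscated spellings of the same word.
OBFUSCATED = re.compile(r"\bv[i1!|]agra\b", re.IGNORECASE)


def subject_matches(subject):
    """True if the subject contains any spelling covered by the pattern."""
    return OBFUSCATED.search(subject) is not None
```

A "contains" filter would need a separate rule for each spelling, while the single pattern matches "Viagra", "V1AGRA" and "v!agra" alike.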

There are many other SpamPal plugins available here. For example, you could extend white lists and black lists to apply to email addresses from any header, white list any message that contains words from a list of good words, filter on what web sites are mentioned, launch other programs (passing them information about the message as command line arguments) or run Ruby scripts. The main drawback is that none of this is integrated into the junk mail controls, which only know when SpamPal marks a message as spam.
