Foreign language spam

From MozillaZine Knowledge Base
Revision as of 05:04, 8 December 2008

This article discusses several ways to identify foreign language spam using message filters. It focuses on Russian and Chinese, since they're the most common cases, but the same techniques can be used with any foreign language. Several of the techniques rely on custom headers. You can add a custom header by selecting "Customize" at the bottom of the leftmost list box (the one that starts with "Subject") when creating a message filter. If you add one, you have to use it in that message filter, but it can then be used in any other message filter (in any account).

Content-Type: header

The Content-Type header may identify the character set. For example, Content-Type: text/plain; charset=koi8-r indicates a plain text message using the KOI8-R character set encoding, which is designed to cover Russian and Bulgarian using the Cyrillic alphabet. If the header does identify the character set, you can add a custom header for Content-Type and test whether "Content-Type" "contains" "koi8-r". Use "View -> Message Source" (or <Control>U) to see the message source.

Russian uses the Cyrillic alphabet. Some character sets commonly used for Russian spam are KOI8-R, KOI8-U, ISO 8859-5, and Windows-1251.
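The charset test above can be tried outside Thunderbird, for example on saved messages. The Python sketch below only illustrates the logic and is not how Thunderbird implements filters: it reads the charset parameter of a raw message's Content-Type header with the standard `email` package and checks it against the character sets listed above. The names `RUSSIAN_CHARSETS`, `declared_charset` and `looks_russian_by_charset` are hypothetical.

```python
from email import message_from_string
from typing import Optional

# Character sets listed above as common in Russian spam, written as
# lowercase MIME names because get_content_charset() lowercases its result.
RUSSIAN_CHARSETS = {"koi8-r", "koi8-u", "iso-8859-5", "windows-1251"}


def declared_charset(raw_message: str) -> Optional[str]:
    """Return the charset parameter of the Content-Type header, or None if absent."""
    msg = message_from_string(raw_message)
    return msg.get_content_charset()


def looks_russian_by_charset(raw_message: str) -> bool:
    """Rough stand-in for the rule "Content-Type" "contains" a Cyrillic charset."""
    charset = declared_charset(raw_message)
    # A missing charset (None) is simply not in the set, so this returns False.
    return charset in RUSSIAN_CHARSETS


sample = (
    "From: someone@example.com\n"
    "Content-Type: text/plain; charset=KOI8-R\n"
    "\n"
    "...\n"
)
print(declared_charset(sample))          # koi8-r
print(looks_russian_by_charset(sample))  # True
```

A `False` result proves little on its own, since a message is not required to declare a character set at all.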

There are a couple of problems with this approach:

  • The sender doesn't have to identify the character set.
  • The message might use Unicode (UTF-8) or Windows-1251 (which adds Cyrillic characters to the 7-bit ASCII character set). These character sets are sometimes used for messages that contain only 7-bit ASCII characters, so testing for them can also match English mail.
  • If the Content-Type line wraps, Thunderbird won't read all of it. This is a bug: it does read multiple lines for "Received", "To" and "CC". You used to be able to work around this by testing whether the "Body" contains that string, but "Body" was recently redefined to mean just the message body rather than the entire message including all headers. Unfortunately, no alias such as "Headers" (to test all of the headers) or "All" (to test the entire message) was added when that change was made.
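The wrapped-header problem in the last point can be illustrated with a short Python sketch using a made-up sample message; it does not reproduce Thunderbird's code. A reader that looks only at the first physical line of Content-Type misses a charset that was folded onto a continuation line, while the standard `email` parser keeps the continuation line as part of the header value. The helper names `first_line_of_header` and `unfolded_header` are hypothetical.

```python
from email import message_from_string

# Made-up message whose Content-Type header is folded onto a second line.
# Header folding (a line break followed by whitespace) is allowed by RFC 5322.
RAW = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "Content-Type: text/plain;\n"
    "\tcharset=koi8-r\n"
    "\n"
    "body text\n"
)


def first_line_of_header(raw: str, name: str) -> str:
    """Naive lookup that reads only the first physical line of a header."""
    prefix = name.lower() + ":"
    for line in raw.splitlines():
        if line == "":  # a blank line ends the header block
            break
        if line.lower().startswith(prefix):
            return line[len(prefix):].strip()
    return ""


def unfolded_header(raw: str, name: str) -> str:
    """Full header value with continuation lines joined into a single line."""
    value = message_from_string(raw).get(name, "")
    # Collapse the line break and tab left over from folding into one space.
    return " ".join(str(value).split())


print(first_line_of_header(RAW, "Content-Type"))  # text/plain;
print(unfolded_header(RAW, "Content-Type"))       # text/plain; charset=koi8-r
```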

Wikipedia has lists of popular character sets, Cyrillic character sets, and the Big5 (Chinese) and GB (Chinese) character sets.

Foreign letters

Look in Wikipedia to find the most common vowels for a foreign language, then test whether a message contains any of them that are not used in English. Russian uses а, э, ы, у, о, я, е, ё, ю, и. You could create a message filter set to "Matches any of the following" that tests whether "Body" "contains" "и", "Body" "contains" "ё", and so forth until you have covered all of the vowels. However, since English also uses "a", you wouldn't want to test for that letter. The reason for "Matches any of the following" is to logically OR the conditions: you want the action to take place if any of those letters is found.
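The "Matches any of the following" logic above amounts to an OR over several "contains" tests. The Python sketch below shows that logic on a plain string; it is an illustration only and does not model details of Thunderbird's matching, such as any case-insensitivity. The names `RUSSIAN_VOWELS` and `body_has_foreign_vowel` are hypothetical.

```python
from typing import Sequence

# Russian vowels from the list above, minus "а", which the article
# suggests leaving out because English also uses "a".
RUSSIAN_VOWELS = ("э", "ы", "у", "о", "я", "е", "ё", "ю", "и")


def body_has_foreign_vowel(body: str, letters: Sequence[str] = RUSSIAN_VOWELS) -> bool:
    """Mimic "Matches any of the following" with one "Body" "contains" rule
    per letter: the rules are ORed, so a single hit is enough."""
    return any(letter in body for letter in letters)


print(body_has_foreign_vowel("Привет, как дела?"))    # True
print(body_has_foreign_vowel("Hello, how are you?"))  # False
```

The Cyrillic letters are distinct characters from look-alike Latin letters, so an English message written in plain ASCII does not match.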

Foreign words

Choose several very common words such as "and" and "to", and use Babel Fish to translate them from English into the other language. For example, "and" is "и" and "to" is "до" in Russian. You could create a message filter set to "Matches any of the following" that tests whether "Body" "contains" "и" and "Body" "contains" "до".
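A "contains" test on a short word such as "до" also matches it inside longer words. The Python sketch below shows the OR-of-contains rule described above, plus an optional whole-word mode for comparison. The `whole_words` option is a hypothetical addition for illustration, not a Thunderbird filter setting, and the function and constant names are likewise made up.

```python
import re
from typing import Sequence

# Russian translations of "and" and "to", from the example above.
RUSSIAN_WORDS = ("и", "до")


def body_has_foreign_word(body: str, words: Sequence[str] = RUSSIAN_WORDS,
                          whole_words: bool = False) -> bool:
    """OR of "Body" "contains" <word> tests.

    With whole_words=False this is a plain substring test, so "до" also
    matches inside longer words such as "дом". With whole_words=True the
    word must stand on its own (Python's word boundaries are Unicode-aware
    for str patterns, so this works for Cyrillic text).
    """
    if whole_words:
        return any(re.search(r"\b" + re.escape(w) + r"\b", body) for w in words)
    return any(w in body for w in words)


print(body_has_foreign_word("дом"))                    # True
print(body_has_foreign_word("дом", whole_words=True))  # False
```

For spam filtering the looser substring match is usually acceptable, since a Russian word containing "до" still indicates Russian text.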

What country the sender's SMTP server is in

Thunderbird doesn't provide any information on what country the sender's SMTP server is in. However, some SpamAssassin installations are configured to identify what country the sender's domain is in. For example:

X-Spam-source: IP='202.108.255.197', Host='smtpr2.tom.com', Country='CN', FromHeader='com', MailFrom='com'

You could add a custom header for X-Spam-source and test whether "X-Spam-source" "doesn't contain" "US". Alternatively, create a message filter set to "Matches any of the following" that tests whether "X-Spam-source" "contains" "CN" and "X-Spam-source" "contains" "RU". See this web page (http://www.wap.org/info/techstuff/domains.html) for a list of internet domain abbreviations.
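A "contains" test looks at the whole header value, so "CN" or "RU" appearing elsewhere in it (for example in a host name) could also trigger the filter. The Python sketch below pulls out just the Country field from a header shaped like the example above. It assumes that exact `Country='XX'` format, which comes from one SpamAssassin setup and may differ elsewhere; the regular expression and function names are hypothetical.

```python
import re
from typing import Optional, Sequence

# Assumes the Country='XX' format of the example header; other
# SpamAssassin setups may format this header differently.
COUNTRY_RE = re.compile(r"Country='([A-Za-z]{2})'")


def spam_source_country(header_value: str) -> Optional[str]:
    """Return the country code from the header, or None if there isn't one."""
    match = COUNTRY_RE.search(header_value)
    return match.group(1).upper() if match else None


def from_blocked_country(header_value: str,
                         blocked: Sequence[str] = ("CN", "RU")) -> bool:
    """Counterpart of "contains" "CN" OR "contains" "RU", but checks only
    the Country field rather than the whole header value."""
    return spam_source_country(header_value) in blocked


def not_from_us(header_value: str) -> bool:
    """Counterpart of "doesn't contain" "US". Assumption: a missing
    Country field is treated as unknown and returns False."""
    country = spam_source_country(header_value)
    return country is not None and country != "US"


HEADER = ("IP='202.108.255.197', Host='smtpr2.tom.com', Country='CN', "
          "FromHeader='com', MailFrom='com'")
print(spam_source_country(HEADER))   # CN
print(from_blocked_country(HEADER))  # True
```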

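There is a Country lookup extension (https://addons.mozilla.org/en-US/thunderbird/addon/3595), but it appears designed just to show you a colored flag icon; it doesn't set a custom header that you can test. The same is true of the Display Mail Route extension (https://addons.mozilla.org/en-US/thunderbird/addon/1244).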

See also

External links