MozillaZine

network.IDN.blacklist_chars

From MozillaZine Knowledge Base

Background

IDN addresses have recently come under close scrutiny, mostly due to domain registrars failing to follow certain guidelines that help prevent a type of website spoofing attack.

Mozilla’s first response to the threat of this type of spoofing was to disable IDN support and display IDN URLs in their more verbose punycode form instead. (Punycode bears little resemblance to the intended appearance of an IDN, which removes the risk of spoofing.)

Later, Mozilla decided to show some IDN addresses as intended, but only if the domain’s registrar had a public anti-spoofing policy. (Another preference keeps track of which top-level domains are displayed as intended.)

At about the same time, developers realized that certain Unicode characters were too dangerous ever to be shown inside an IDN domain name. Initially, these included only characters that look similar to a forward slash (U+2044 and U+2215). The list eventually grew to include spaces (U+2006, U+2007), dots (U+06D4), fractions (U+2154), and various other characters. The result was a blacklist of characters: if an IDN contains any of the listed characters, it is shown in its punycode form instead.
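As a rough illustration of the rule described above, the following Python sketch falls back to the punycode form of a host name whenever the name contains a blacklisted character. The names BLACKLIST_SAMPLE, to_ace, and display_host are hypothetical and are not taken from the Mozilla source. The sample blacklist holds only the six code points mentioned above, and the sketch skips both Nameprep normalization and the top-level-domain check.

```python
# Hypothetical sample: only the six code points named in the text,
# not the full default blacklist.
BLACKLIST_SAMPLE = "\u2044\u2215\u2006\u2007\u06d4\u2154"


def to_ace(label: str) -> str:
    """Return the ASCII-compatible (punycode) form of a single label."""
    if label.isascii():
        return label  # plain ASCII labels are left as they are
    return "xn--" + label.encode("punycode").decode("ascii")


def display_host(host: str, blacklist: str = BLACKLIST_SAMPLE) -> str:
    """Return the host as it would be displayed under the blacklist rule."""
    banned = set(blacklist)  # every character in the string is one entry
    if any(ch in banned for ch in host):
        # At least one blacklisted character: show punycode instead.
        return ".".join(to_ace(label) for label in host.split("."))
    return host
```

Under these assumptions, display_host("a\u2044b.example") returns an all-ASCII name whose first label starts with "xn--", while a host with no blacklisted characters is returned unchanged.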

As of 2009-02-24, the complete list of (107) blacklisted characters is as follows. (Depending on your browser, platform, and installed fonts, the example characters may not display as intended. Some of them aren’t intended for display in the normal sense of the word.)

Character Hex Code Character Name
U+0020 SPACE
  U+00A0 NO-BREAK SPACE
¼ U+00BC VULGAR FRACTION ONE QUARTER
½ U+00BD VULGAR FRACTION ONE HALF
¾ U+00BE VULGAR FRACTION THREE QUARTERS
ǃ U+01C3 LATIN LETTER RETROFLEX CLICK
ː U+02D0 MODIFIER LETTER TRIANGULAR COLON
̷ U+0337 COMBINING SHORT SOLIDUS OVERLAY
̸ U+0338 COMBINING LONG SOLIDUS OVERLAY
։ U+0589 ARMENIAN FULL STOP
׃ U+05C3 HEBREW PUNCTUATION SOF PASUQ
״ U+05F4 HEBREW PUNCTUATION GERSHAYIM
؉ U+0609 ARABIC-INDIC PER MILLE SIGN
؊ U+060A ARABIC-INDIC PER TEN THOUSAND SIGN
٪ U+066A ARABIC PERCENT SIGN
۔ U+06D4 ARABIC FULL STOP
܁ U+0701 SYRIAC SUPRALINEAR FULL STOP
܂ U+0702 SYRIAC SUBLINEAR FULL STOP
܃ U+0703 SYRIAC SUPRALINEAR COLON
܄ U+0704 SYRIAC SUBLINEAR COLON
U+115F HANGUL CHOSEONG FILLER
U+1160 HANGUL JUNGSEONG FILLER
U+1735 PHILIPPINE SINGLE PUNCTUATION
  U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+200B ZERO WIDTH SPACE
U+2024 ONE DOT LEADER
U+2027 HYPHENATION POINT
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
U+202F NARROW NO-BREAK SPACE
U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
U+2041 CARET INSERTION POINT
U+2044 FRACTION SLASH
U+2052 COMMERCIAL MINUS SIGN
U+205F MEDIUM MATHEMATICAL SPACE
U+2153 VULGAR FRACTION ONE THIRD
U+2154 VULGAR FRACTION TWO THIRDS
U+2155 VULGAR FRACTION ONE FIFTH
U+2156 VULGAR FRACTION TWO FIFTHS
U+2157 VULGAR FRACTION THREE FIFTHS
U+2158 VULGAR FRACTION FOUR FIFTHS
U+2159 VULGAR FRACTION ONE SIXTH
U+215A VULGAR FRACTION FIVE SIXTHS
U+215B VULGAR FRACTION ONE EIGHTH
U+215C VULGAR FRACTION THREE EIGHTHS
U+215D VULGAR FRACTION FIVE EIGHTHS
U+215E VULGAR FRACTION SEVEN EIGHTHS
U+215F FRACTION NUMERATOR ONE
U+2215 DIVISION SLASH
U+2236 RATIO
U+23AE INTEGRAL EXTENSION
U+2571 BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT
U+29F6 SOLIDUS WITH OVERBAR
U+29F8 BIG SOLIDUS
U+2AFB TRIPLE SOLIDUS BINARY RELATION
U+2AFD DOUBLE SOLIDUS OPERATOR
U+2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT
U+2FF1 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW
U+2FF2 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT
U+2FF3 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW
U+2FF4 IDEOGRAPHIC DESCRIPTION CHARACTER FULL SURROUND
U+2FF5 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE
U+2FF6 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW
U+2FF7 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT
U+2FF8 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT
U+2FF9 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT
U+2FFA IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT
U+2FFB IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID
  U+3000 IDEOGRAPHIC SPACE
U+3002 IDEOGRAPHIC FULL STOP
U+3014 LEFT TORTOISE SHELL BRACKET
U+3015 RIGHT TORTOISE SHELL BRACKET
U+3033 VERTICAL KANA REPEAT MARK UPPER HALF
U+3164 HANGUL FILLER
U+321D PARENTHESIZED KOREAN CHARACTER OJEON
U+321E PARENTHESIZED KOREAN CHARACTER O HU
U+33AE SQUARE RAD OVER S
U+33AF SQUARE RAD OVER S SQUARED
U+33C6 SQUARE C OVER KG
U+33DF SQUARE A OVER M
U+A789 MODIFIER LETTER COLON
U+FE14 PRESENTATION FORM FOR VERTICAL SEMICOLON
U+FE15 PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK
︿ U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
U+FE5E SMALL RIGHT TORTOISE SHELL BRACKET
U+FEFF ZERO WIDTH NO-BREAK SPACE
U+FF0E FULLWIDTH FULL STOP
U+FF0F FULLWIDTH SOLIDUS
U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
U+FFA0 HALFWIDTH HANGUL FILLER
U+FFF9 INTERLINEAR ANNOTATION ANCHOR
U+FFFA INTERLINEAR ANNOTATION SEPARATOR
U+FFFB INTERLINEAR ANNOTATION TERMINATOR
U+FFFC OBJECT REPLACEMENT CHARACTER
U+FFFD REPLACEMENT CHARACTER

Possible values and their effects

This is a string preference. Every character in its value is treated as one entry in the blacklist. The default value is a string containing the characters in the table above.
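To make the "one character, one entry" reading concrete, here is a minimal Python sketch that lists the entries contained in a preference value. The function name blacklist_entries and the short sample value are assumptions made for illustration; the real default contains the characters in the table above.

```python
import unicodedata


def blacklist_entries(value: str) -> list:
    """List each distinct character in the value as (code point, Unicode name)."""
    entries = []
    for ch in sorted(set(value)):
        code = f"U+{ord(ch):04X}"
        name = unicodedata.name(ch, "<no name>")  # fallback for unnamed characters
        entries.append((code, name))
    return entries


# Hypothetical value; the repeated character does not add a second entry.
sample_value = "\u2215\u2044\u2044"
for code, name in blacklist_entries(sample_value):
    print(code, name)
```

For this sample value the sketch prints two entries, U+2044 FRACTION SLASH and U+2215 DIVISION SLASH, because order and repetition within the string do not matter.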

First checked in

2005-07-22 by Masayuki Nakano

Has an effect in

  • Deer Park Alpha 2
  • Mozilla Firefox 1.5 (all versions since Beta 1)
  • SeaMonkey (all versions)
