
写着玩 (Writing for Fun)

Bob

IDN in Google Chrome  

2010-01-10 22:47:53 | Category: Chrome


Background

Back in the day, hostnames could only consist of the letters A to Z, digits, and a few other characters. Internationalized Domain Names (IDNs) were devised to support arbitrary Unicode characters in hostnames in a backward-compatible way. This works by having user agents transform a hostname containing Unicode characters to one fitting the traditional mold, which can then be sent on to DNS servers. For example, http://öbb.at is transformed to http://xn--bb-eka.at. The transformed form is called punycode.
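Python's standard library ships an idna codec (it implements the older IDNA 2003 rules from RFC 3490) that performs this transformation. A minimal round-trip sketch for the example hostname:

```python
# Round trip between a Unicode hostname and its ASCII-compatible form,
# using the IDNA 2003 codec in Python's standard library.
unicode_host = "\u00f6bb.at"  # "öbb.at"

ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # xn--bb-eka.at

# Decoding restores the Unicode form for display.
print(ascii_host.encode("ascii").decode("idna"))  # öbb.at
```

Browsers that follow the newer IDNA 2008 rules may map some characters differently, so this codec is only an approximation of what a given browser does.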

Ideally, user agents could always display the Unicode version of a hostname. However, different characters from different languages can look very similar, and this can make phishing attacks possible. For example, the Latin "a" looks a lot like the Cyrillic "а", so someone could register http://ebаy.com (http://xn--eby-7cd.com/), which would easily be mistaken for http://ebay.com. This is called a homograph attack.
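To see why such a hostname is dangerous, a small Python sketch can compare the lookalike label with the real one and name each code point using the standard unicodedata module:

```python
import unicodedata

real = "ebay"
spoof = "eb\u0430y"  # third letter is U+0430 CYRILLIC SMALL LETTER A

# The two labels look alike in most fonts but are different strings.
print(real == spoof)  # False

# Listing code point names exposes the lookalike character.
for ch in spoof:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The spoofed hostname has its own punycode form.
print((spoof + ".com").encode("idna"))  # b'xn--eby-7cd.com'
```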

In a perfect world, domain registries would not allow such nefarious domain names to be registered. Some TLD registries do exactly that, mostly by restricting the set of allowed characters, but many do not. For TLDs that are meant to be international (e.g., .com), doing so would be nontrivial.

As a result, all major browsers try to protect against homograph attacks by displaying punycode instead of the original IDN when the hostname does not satisfy certain criteria. They try to do this in a way that still shows IDN for legitimate hostnames while protecting against phishing.

Google Chrome's IDN policy

Google Chrome decides whether to show IDN or punycode separately for each component of a hostname. To decide whether a component should be shown in IDN form, Google Chrome uses an algorithm that depends on the languages the user claims to understand. On Windows and Linux, these languages can be configured in Google Chrome's Fonts and Languages dialog. On Mac OS X, they are currently derived from the system language. The algorithm is:
  • Build the set of characters in the component.
  • If that set contains at least one blacklisted character, punycode is displayed.
  • If the language list is empty, the component is displayed in IDN form if and only if it contains only characters used in a single language, and the remaining steps are skipped.
  • The characters 0-9, +, -, [, ], _, and the space character are removed from the set.
  • If all the remaining characters belong to a single language in the language list (as described below), the IDN form is displayed. Otherwise, punycode is shown.
(This is implemented by IsIDNComponentSafe() in net/base/net_util.cc.)
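The steps above can be modeled in a short Python sketch. It is not Chrome's IsIDNComponentSafe(): the exemplar sets, the blacklist, and the function name show_as_idn are hypothetical stand-ins (real exemplar sets come from CLDR and are much larger), and characters are compared as plain code points.

```python
# Minimal model of the per-component IDN display decision.
# All data below is hypothetical and heavily truncated for illustration.

# Sample exemplar sets; Chrome takes the real ones from CLDR via ICU.
EXEMPLARS = {
    "en": set("abcdefghijklmnopqrstuvwxyz"),
    "de": set("abcdefghijklmnopqrstuvwxyz\u00e4\u00f6\u00fc\u00df"),  # + ä ö ü ß
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
}

# Sample blacklist (slash lookalikes); Chrome's actual list differs.
BLACKLIST = {"\u2044", "\u2215", "\u0338"}

# Characters removed from the set before the language check.
NEUTRAL = set("0123456789+-[]_ ")


def show_as_idn(component: str, languages: list[str]) -> bool:
    """Return True if the component would be shown in Unicode (IDN) form."""
    chars = set(component)  # build the set of characters
    if chars & BLACKLIST:  # any blacklisted character forces punycode
        return False
    if not languages:
        # Empty language list: IDN only if one language covers every character.
        return any(chars <= exemplar for exemplar in EXEMPLARS.values())
    remaining = chars - NEUTRAL  # ignore digits, +, -, [, ], _ and space
    # IDN only if a single configured language covers all remaining characters.
    return any(
        remaining <= EXEMPLARS[lang] for lang in languages if lang in EXEMPLARS
    )


print(show_as_idn("россия", ["ru"]))  # True
print(show_as_idn("россия", ["en"]))  # False
print(show_as_idn("eb\u0430y", ["en", "ru"]))  # False: Latin mixed with Cyrillic
print(show_as_idn("\u00f6bb", ["de"]))  # True
```

Checking each configured language separately, rather than the union of all their characters, is what rejects a mixed Latin/Cyrillic label even when both English and Russian are selected.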

The characters belonging to a given language are specified by the Unicode Consortium's CLDR dataset (its schema is documented, and the exemplar characters of every language, for example the set for Japanese, can also be viewed online). In Google Chrome, this is implemented by the ICU function ulocdata_getExemplarSet(), with the characters a-z added for whitelisted languages whose glyphs can't be confused with a-z. The whitelisted languages currently are Chinese (zh), Japanese (ja), and Korean (ko).
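The a-z extension for Chinese, Japanese, and Korean can be illustrated with a small Python sketch. The sample exemplar sets and the helper allowed_chars are hypothetical; in Chrome the base sets come from ulocdata_getExemplarSet(). The effect is that a Japanese component mixing kana with a Latin brand name stays eligible for IDN display, while mixing Latin letters into a Russian component is still rejected:

```python
import string

# Hypothetical sample exemplar sets; real ones are much larger and come
# from CLDR (via ICU's ulocdata_getExemplarSet() in Chrome).
KATAKANA_SONY = "\u30bd\u30cb\u30fc"  # "ソニー"
SAMPLE_EXEMPLARS = {
    "ja": set("私の団体も" + KATAKANA_SONY),
    "ru": set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя"),
}

# Languages whose scripts cannot be confused with Latin a-z.
LATIN_SAFE_LANGUAGES = {"zh", "ja", "ko"}


def allowed_chars(lang: str) -> set[str]:
    """Exemplar set for lang, plus a-z for the CJK whitelist."""
    chars = set(SAMPLE_EXEMPLARS.get(lang, set()))
    if lang in LATIN_SAFE_LANGUAGES:
        chars |= set(string.ascii_lowercase)
    return chars


print(set("sony" + KATAKANA_SONY) <= allowed_chars("ja"))  # True
print(set("sony" + "сони") <= allowed_chars("ru"))  # False: Latin + Cyrillic
```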

Consequences / Examples

Google Chrome will display IDN for components of a hostname consisting solely of characters that belong to one of the languages selected in the language settings, even on .com and .net domains and not only in domains native to that language. For example, http://россия.net will be displayed in IDN form if you claim to speak Russian or another language written in Cyrillic, and as punycode otherwise. Likewise, http://私の団体も.jp/ will be shown in IDN form only if you claim to speak Japanese in Google Chrome's options.

Google Chrome will always display punycode for components of a hostname that contain characters not in the main exemplar character set of any language. For example, http://?.net/ will always be displayed as punycode in Google Chrome.

Google Chrome will always display punycode for components that mix letters from multiple languages. For example, no single language contains all the characters found in http://???????ē???????m????????t??.de, so this will be shown as punycode. Likewise, http://ebаy.com (with a Cyrillic "а") will always be shown as punycode, even if both English and Russian are in the accepted languages. This is true even if the domain is below a TLD whose registry takes care to protect against homograph attacks.

Behavior of other browsers

IE

IE displays URLs in IDN form if every component contains only characters of one of the languages configured in "Languages" on the "General" tab of "Internet Options", similar to what Google Chrome does.

http://msdn.microsoft.com/en-us/library/bb250505(VS.85).aspx

Firefox

Firefox has a whitelist of TLDs whose registries ensure that no homographically confusable domains can be registered. URLs under such top-level domains are shown as Unicode unless they contain one of several blacklisted characters. For TLDs that are not whitelisted (e.g., .com), Firefox always displays punycode.

http://kb.mozillazine.org/Network.enableIDN
http://www.mozilla.org/projects/security/tld-idn-policy-list.html
http://kb.mozillazine.org/Network.IDN.blacklist_chars
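Firefox's TLD-based policy, as described above, can be sketched in Python. The whitelisted TLDs and blacklisted characters are a hypothetical sample, not Mozilla's actual lists, and the function names are made up for illustration:

```python
# Hypothetical samples; Mozilla's actual whitelist and blacklist differ.
WHITELISTED_TLDS = {"at", "de", "jp"}
BLACKLISTED_CHARS = {"\u2044", "\u2215", "\u0338"}


def firefox_shows_unicode(hostname: str) -> bool:
    """True if a TLD-whitelist policy would display the Unicode form."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld not in WHITELISTED_TLDS:
        return False  # e.g. .com and .net: always punycode
    return not (set(hostname) & BLACKLISTED_CHARS)


def displayed_form(hostname: str) -> str:
    """Return the Unicode hostname or its punycode (IDNA 2003) form."""
    if firefox_shows_unicode(hostname):
        return hostname
    return hostname.encode("idna").decode("ascii")


print(displayed_form("\u00f6bb.at"))  # öbb.at
print(displayed_form("eb\u0430y.com"))  # xn--eby-7cd.com
```

Note that this decision depends only on the TLD and the blacklist, not on the languages the user understands.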

Opera

Like Firefox, Opera has a whitelist of TLDs and shows IDN only for domains under these whitelisted TLDs.

Safari

Safari has a whitelist of scripts that do not contain confusable characters, and only shows the IDN form for whitelisted scripts. The whitelist does not include Cyrillic or Greek (they are confusable with Latin characters), so Safari will always show punycode for Russian and Greek URLs.
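Safari's script whitelist can be approximated in Python as well. The standard library has no Unicode Script property lookup, so this sketch approximates a character's script by the first word of its Unicode name (e.g., "CYRILLIC SMALL LETTER A"), which is only a rough heuristic; the whitelist itself is hypothetical:

```python
import unicodedata

# Hypothetical whitelist of script names; Apple's actual list differs.
SAFE_SCRIPTS = {"LATIN", "HIRAGANA", "KATAKANA", "CJK", "HANGUL"}


def rough_script(ch: str) -> str:
    """Approximate a character's script by the first word of its name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def safari_shows_unicode(label: str) -> bool:
    """True if every non-neutral character is in a whitelisted script."""
    for ch in label:
        if ch in "0123456789-":
            continue  # treat ASCII digits and hyphen as script-neutral
        if rough_script(ch) not in SAFE_SCRIPTS:
            return False
    return True


print(safari_shows_unicode("私の団体も"))  # True
print(safari_shows_unicode("россия"))  # False: Cyrillic is not whitelisted
print(safari_shows_unicode("αβγ"))  # False: Greek is not whitelisted
```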
