URL Spoofing with IDNA 2008

26 June 2023 - Jacob Anderson

“Floss.com”, “floss.com”, “Floß.com”, or “floß.com”; à%2Ecom or à⒈com or a1.com?

IDNA Details at unicode.org

The original domain name specification relies upon using ASCII characters to define absolute name matches for locations on a network.

Name servers and resolvers must compare labels in a case-insensitive manner, i.e. A=a, and hence all character strings must be ASCII with zero parity. Non-alphabetic codes must match exactly.

The use of ASCII characters in DNS records has not changed since the 1980s. According to RFC 2181, though, any octet can be part of a name, including non-printable characters. This relaxed requirement suggests that a name consisting of ten zero-width Unicode characters followed by a known domain name is a legitimate DNS name.

For instance, a name like my%200D%200D%200D%200Dpayments.com (where %200D stands for U+200D, the zero-width joiner) is an allowed domain that prints as my‍‍‍‍payments.com, with four invisible characters sitting between "my" and "payments". If you saw a URL like https://mypayments.com you might actually be looking at https://my%200D%200D%200D%200Dpayments.com, because your "browser" would show you the rendered URL rather than the decoded URL.
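To make the difference visible, here is a minimal Python sketch that builds both strings and lists the hidden characters. The helper name describe_hidden is made up for this illustration, and it treats every Unicode "format" (Cf) character as hidden, which is an assumption rather than a rule taken from any RFC.

```python
import unicodedata

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER

plain = "mypayments.com"
spoofed = "my" + ZWJ * 4 + "payments.com"


def describe_hidden(name: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every format-category (Cf) character."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(name)
        if unicodedata.category(ch) == "Cf"
    ]


print(plain == spoofed)            # False, even though both render alike
print(len(plain), len(spoofed))    # 14 18
print(describe_hidden(spoofed))    # four ZERO WIDTH JOINER entries at 2..5
```

The two names compare as different strings, which is exactly why a resolver treats them as different domains while a person reading the screen does not.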

IDNA 2008 says that zero-width multibyte (MB) characters should not be allowed in name registrations. That doesn't stop a malicious DNS service from registering one of those names under a TLD that looks trustworthy, e.g. https://my%200D%200D%200D%200Dpayments.apple. Anyone can put up a TLD resolver and broadcast their domains, hoping that some resolver will pick them up.

Defending against a Unicode obfuscation phishing attack is very difficult unless you have an intercept that knows which MB characters to strip from the name before comparing it against known names. In the case of a TLD like apple, the defense is much harder, because the attacker is hosting the unknown TLD under a name that is trusted in commerce. This requires the "browser" to have a dictionary of "trusted" TLDs and to allow only those domains. That could cause some frustration when new TLDs are introduced outside of the update cycle of the "browser."
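The strip-then-compare intercept described above might look something like the following Python sketch. KNOWN_NAMES and both function names are hypothetical, and stripping every Cf-category character is a simplifying assumption; a real filter would need a more careful list.

```python
import unicodedata

# Hypothetical list of names worth protecting.
KNOWN_NAMES = {"mypayments.com"}


def strip_invisible(name: str) -> str:
    """Drop Unicode format (Cf) characters such as U+200D and U+200B."""
    return "".join(ch for ch in name if unicodedata.category(ch) != "Cf")


def impersonates_known_name(name: str) -> bool:
    """True when removing hidden characters turns the name into a known one."""
    original = name.lower()
    cleaned = strip_invisible(original)
    return cleaned != original and cleaned in KNOWN_NAMES


print(impersonates_known_name("my\u200d\u200dpayments.com"))  # True
print(impersonates_known_name("mypayments.com"))              # False
```

The check only flags names that change when cleaned, so the genuine domain passes through untouched.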

Another attack is to obfuscate the TLD: mypayments.co‍m, that is, https://mypayments.co%200Dm. Here the attacker needs to have a TLD name resolution infrastructure set up so that the "browser" can resolve the .co%200Dm domain. The only defense here is to use an allow-list of TLDs that the local DNS knows, or to disallow IDNA extensions in domain names (not recommended).
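A TLD allow-list check can be sketched in a few lines of Python. The ALLOWED_TLDS set and the function name are placeholders for illustration; a real deployment would load whatever list the local DNS actually trusts.

```python
# Hypothetical allow-list; a real one would be far longer.
ALLOWED_TLDS = {"com", "org", "net"}


def tld_is_allowed(hostname: str) -> bool:
    """Compare the last label, code point for code point, against the list."""
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in ALLOWED_TLDS


print(tld_is_allowed("mypayments.com"))          # True
print(tld_is_allowed("mypayments.co\u200dm"))    # False: hidden U+200D in the TLD
print(tld_is_allowed("my\u200dpayments.apple"))  # False: TLD not on the list
```

Because the comparison is exact, the obfuscated TLD fails without any special Unicode handling. The cost is the one already mentioned: new TLDs are rejected until the list is updated.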

What about using a DNS proxy? You find these in all of the enterprise firewall UTM solutions. These services can filter DNS queries by entropy (to mitigate exfiltration using noise names) and by obfuscation (phishing name confusion). If you have this service but did not enable it on your perimeter, it's time to look into the configuration options. A DNS query proxy can help mitigate phishing attacks at work, where you will find these enterprise-level services. At home, though, and on your mobile phone, these services are rarer.
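As a rough idea of what "filtering by entropy" means, the Python sketch below computes the Shannon entropy of a single DNS label. This is only an illustration of the concept; commercial products use their own heuristics and thresholds, none of which are reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy, in bits per character, of one DNS label."""
    if not label:
        return 0.0
    total = len(label)
    counts = Counter(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A short dictionary-like label versus a random-looking one.
print(shannon_entropy("mail"))      # 2.0
print(shannon_entropy("x7f3k9q2"))  # 3.0
```

Longer random-looking labels score higher, which is the signal a proxy can use to flag names that look like encoded data rather than words.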

How about something more malicious? Here's a well-known domain: аpple.com, but is it? That first 'a' isn't really an ASCII letter 'a'. In fact, it's a Cyrillic а, and its Punycode form is actually xn--pple-43d.com. Doubt this works? Let's put it through bind to find out. You can set up the zone file so that it refers to the "xn--pple-43d.com" domain like any other domain name. Then you just fill in the subdomains, e.g. "www.xn--pple-43d.com." Once you have the zone file set up, go to your web server and set up the bindings for the test site. In IIS you have to use the actual Cyrillic version of the domain name, and it will map the IDNA Punycode properly. If you use the Punycode, IIS will complain about an invalid value.
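You can reproduce the Punycode conversion with Python's standard library. The sketch below also includes a crude mixed-script check; scripts_in_label is a made-up helper that guesses the script from the first word of each character's Unicode name, which is a rough heuristic and not a proper script lookup.

```python
import unicodedata

# First letter is U+0430 CYRILLIC SMALL LETTER A, the rest is ASCII.
lookalike = "\u0430pple.com"

ace = lookalike.encode("idna").decode("ascii")
print(ace)  # xn--pple-43d.com


def scripts_in_label(label: str) -> set[str]:
    """Rough script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


print(scripts_in_label("\u0430pple"))  # contains both CYRILLIC and LATIN
print(scripts_in_label("apple"))       # {'LATIN'}
```

A label that mixes Cyrillic and Latin letters is a strong hint that something is off, and it is the kind of signal a filter could act on before a user ever sees the name.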

Our friends at Google are always looking out for you. If you open Chrome and visit that domain using the Cyrillic version, Chrome will warn you that you are visiting a confused domain and will ask whether you want to go to the bona fide apple.com domain or bypass the warning and visit the confused domain. Once you "skip," though, Chrome will default to skip until you tell it not to. Note that Chrome will show you the Punycode domain name in the address bar, rather than the Cyrillic.

What about that ZWJ (zero-width joiner) character? First you need to Punycode the domain with the ZWJ character in it. The Punycode looks like "xn--apple-xt3b.com" with the ZWJ injected after the 'a', and the name would look like "apple.com" when printed. When I tried using nslookup to query this domain, it returned an error:

nslookup: 'a‍pple.com' is not a legal IDN name (string contains a forbidden context-j character), use +noidnin
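The Punycode string quoted above can be checked with Python's raw punycode codec. Python's built-in idna codec follows the older IDNA 2003 rules, so this sketch skips it and adds the "xn--" prefix by hand; the variable names are only for illustration.

```python
# 'a', then U+200D ZERO WIDTH JOINER, then 'pple'.
label = "a\u200dpple"

# Raw Punycode of the label, with the ACE prefix added manually.
raw = "xn--" + label.encode("punycode").decode("ascii")
print(raw)  # xn--apple-xt3b

# Decoding shows the joiner is still inside the label.
decoded = raw[len("xn--"):].encode("ascii").decode("punycode")
print(decoded == label)     # True
print("\u200d" in decoded)  # True
```

The encoding itself is mechanical and will accept the joiner; it is the IDNA 2008 context rule, as enforced by nslookup here, that rejects the name.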

Sites I used during this research, and the IETF RFC pages for DNS (so many of them):