1. 43

  2. 7

    Homographs are something of a problem in information security, though less so now than they were.

    Take for example this: bаnkofamerica.com

    That’s not “bankofamerica.com”. The first ‘a’ is the Cyrillic letter Azu, which just happens to be written identically to the Latin letter a.

    (I say “just happens”; they share a common ancestor in the Greek alphabet as adapted by the followers of St. Cyril.)

    So, even though they look identical, they would be treated differently by applications and DNS.

    Newer versions of browsers avoid that sort of thing by not allowing mixed character sets in domain names (they display such names as raw Punycode instead), but that only applies to browsers. Protection against that sort of attack in other applications is spottier.

    1. 2

      Talking about homographs, I made this quick utility: https://onlineunicodetools.com/spoof-unicode-text. You can enter regular text and it will output spoofed text with homographs replacing regular characters.

      Similarly, I also made another utility https://onlineunicodetools.com/generate-unicode-text that rewrites the input text in many different Unicode fonts.

    2. 4

      Wait, nixers is still alive? I thought it was closed a few weeks ago.

      1. 1

        After discussing some alternative platforms, it was decided we would give it another go for now.

      2. 3
        1. 4

          DNS doesn’t actually convert it, as mentioned in the post. It’s the normalization process that does; it uses a map found on unicode.org. I made a small script that only outputs valid ones. There’s also the unorm JavaScript library, which is the equivalent of libidn in JS.

          1. 4

            Quite a few programming languages also work similarly, using a specific (but not IDNA) method of normalizing identifiers. For example, here’s some valid Python.

            If you know your way around the different Unicode equivalence types and normalization forms, you can have a lot of fun (for varying definitions of “fun”) with this.

            1. 2

              That’s awesome, thanks for the info.
              Actually, that reminds me that in domain names upper and lower case are equivalent, and I had forgotten upper case in my script. That adds another whole range of equivalent glyphs.

        2. 2

          You might like this converter I wrote ages ago. https://mozfreddyb.github.io/unicode-text-convert/

          1. 3

            I also wrote one, and there are many more online, but the novelty in the thread is that the domain will be resolved the same way.
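
For anyone who wants to poke at the two effects discussed in this thread (look-alike letters that are different code points, and fancy letters that normalize to the same domain), here is a minimal Python sketch. It is an illustration under stated assumptions, not what any browser or resolver actually runs: `scripts_in` and `to_ascii` are hypothetical helper names, the script check is a rough heuristic based on Unicode character names, and Python's built-in `idna` codec implements the older IDNA 2003 rules, which may differ from what current browsers apply.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Rough script guess: the first word of each letter's Unicode name.

    Heuristic for illustration only; it is not the Unicode Script property.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def to_ascii(domain: str) -> str:
    """Encode a domain with Python's built-in IDNA (2003) codec."""
    return domain.encode("idna").decode("ascii")


spoof = "b\u0430nkofamerica.com"  # U+0430 CYRILLIC SMALL LETTER A
real = "bankofamerica.com"

# The two strings render alike but are different code point sequences.
print(spoof == real)

# A label mixing Latin and Cyrillic letters is the suspicious case.
print(scripts_in(spoof.split(".")[0]))

# On the wire the spoofed name becomes a Punycode ("xn--") label.
print(to_ascii(spoof))

# Compatibility normalization (NFKC) folds many fancy letters to ASCII,
# so this fullwidth spelling ends up as the plain ASCII name.
fancy = "\uff45\uff58\uff41\uff4d\uff50\uff4c\uff45.com"  # fullwidth "example"
print(unicodedata.normalize("NFKC", fancy))
print(to_ascii(fancy))
```

The first half is the homograph case (same look, different name), the second half is the opposite case from the later comments (different look, same name after normalization).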