    Any standard that contains the words:

    WTF-8 is a hack intended to be used internally in self-contained systems with components that need to support potentially ill-formed UTF-16 for legacy reasons.

    should immediately follow those words with:

    To those of you reading this document because you inherited a system whose builders did not heed the words above: we are sorry.

    Because I guarantee you someone will violate the intent on day one.

      FYI: the ill-formed text in question is web content (JS strings being UTF-16).

        Sure, but I guarantee it won’t be long before someone has to support a service in C#/Java/whatever that leaks this encoding out into the wild.

      As a side note, Simon works on Servo, and this spec is used inside Rust to handle various relevant things (for example, OsString on Windows).

        For me, "WTF-8" has always meant applying the UTF-8 transform twice, e.g. toUTF8(toUTF8(latin1src)), i.e. double-encoded UTF-8.
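For anyone who hasn't read the spec quoted above: WTF-8 takes potentially ill-formed UTF-16 (such as a JS string), combines properly paired surrogates into one code point, lets lone surrogates through unchanged, and then encodes every code point with UTF-8's bit layout. Below is a minimal Python sketch of that idea. The function names are hypothetical (not from the spec or any library), and it ignores the spec's extra rules for concatenating WTF-8 strings.

```python
def utf16_units_to_code_points(units):
    """Decode potentially ill-formed UTF-16 code units into code points.

    Valid high+low surrogate pairs are combined; lone surrogates pass through.
    """
    cps = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cps.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            cps.append(u)  # ordinary BMP character or a lone surrogate
            i += 1
    return cps


def generalized_utf8(cp):
    """Encode one code point using UTF-8's bit layout, surrogates included."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def wtf8_encode(units):
    """Hypothetical helper: potentially ill-formed UTF-16 units -> WTF-8 bytes."""
    return b"".join(generalized_utf8(cp) for cp in utf16_units_to_code_points(units))


well_formed = wtf8_encode([0x61, 0xD83D, 0xDE00])  # "a" + U+1F600 as a surrogate pair
lone = wtf8_encode([0x61, 0xD83D])                 # "a" + a lone high surrogate

print(well_formed)  # b'a\xf0\x9f\x98\x80'
print(lone)         # b'a\xed\xa0\xbd'

try:
    lone.decode("utf-8")
except UnicodeDecodeError:
    print("strict UTF-8 decoder rejects it")
```

On well-formed input the output is byte-for-byte UTF-8; the lone surrogate becomes ED A0 BD, which a strict UTF-8 decoder refuses. That is exactly the failure mode once such bytes leak out of the self-contained system.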
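The double-UTF-8 meaning from the last comment is easy to reproduce. The commenter doesn't define toUTF8, so the sketch below assumes it means "treat the input bytes as Latin-1 and re-encode them as UTF-8"; the helper is a hypothetical stand-in, not any particular library's function.

```python
def to_utf8(latin1_bytes: bytes) -> bytes:
    """Hypothetical toUTF8: interpret bytes as Latin-1, re-encode as UTF-8."""
    return latin1_bytes.decode("latin-1").encode("utf-8")


latin1src = "caf\u00e9".encode("latin-1")  # "café" in Latin-1: b'caf\xe9'
once = to_utf8(latin1src)                  # correct UTF-8
twice = to_utf8(once)                      # UTF-8 bytes mistaken for Latin-1, encoded again

print(once)   # b'caf\xc3\xa9'
print(twice)  # b'caf\xc3\x83\xc2\xa9'

mojibake = twice.decode("utf-8")           # the classic "cafÃ©"

# Peeling off one layer recovers the single-encoded bytes.
repaired = mojibake.encode("latin-1")
print(repaired == once)  # True
```

Unlike the spec's WTF-8, every intermediate here is valid UTF-8, which is why double encoding tends to survive unnoticed until a human reads the text.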