1. 24

I got frustrated that ascii(1) couldn’t even display information about umlauts, and then I got frustrated because unicode(1) didn’t display useful information about newlines. So I wrote my own tool in Rust!

This is fairly silly, but I hope it will be useful to somebody (it’s definitely been useful to me so far)!


  2. 10

    See also my tool: https://github.com/philpennock/character

    Golang; use the -v option to subcommands to see pretty tables. Can map names to chars, chars to names, give vim digraph information, browse ranges, and copy the resulting characters to the clipboard.

    The fixed-width table doesn’t align cleanly in lobste.rs; that’s a lobste.rs problem. ;)

    % character named -v/ 'small letter sharp'
    │ C │ Name                       │ Hex │ Dec │ UTF-8  │ Block              │ Vim │ HTML    │ XML     │
    │ ß │ LATIN SMALL LETTER SHARP S │  df │ 223 │ %C3%9F │ Latin-1 Supplement │ ss  │ &szlig; │ &#xDF;  │
    1. 2

      Looks splendid here. https://p.fuwafuwa.moe/woonhh.png

    2. 8

      This is fairly silly, but I hope it will be useful to somebody (it’s definitely been useful to me so far)!

      You would be surprised by the kind of use cases that may arise in software you write for little one-off situations. One of my colleagues at work nudged me one day and said he stumbled upon text-memorize on my GitHub page. I originally wrote this program to help me memorize dialogues in my Russian language courses. Anyways, his son has problems remembering things, so he uses the program to help him with his memorization. What’s even more fascinating is that apparently his son has taken this program to school, showed it to other kids, and now they are trying to re-write the whole thing in other languages like Java and Go. You should have seen the look on my face after my co-worker had told me all this. I just went back to my desk and realized that I had inadvertently introduced those kids to programming. Who knows what they will write in the coming years knowing that they’re not even in high school yet.

      1. 1

        Looks nice and useful! Though I don’t understand why you treat ISO-8859-1 separately. Since modern UNIX pretty much uses UTF-8 for storage everywhere, I’d expect to see the ISO-8859-1 characters in their UTF-8 representation as well (since the non-ASCII subset of ISO-8859-1 is encoded as two bytes in UTF-8).
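
To make the two-byte point concrete, here is a minimal Rust sketch. It is not taken from either tool in this thread; `latin1_to_utf8` and `percent_hex` are names made up for illustration. It relies on the fact that ISO-8859-1 byte values map directly onto U+0000 through U+00FF, which is what `char::from(u8)` does.

```rust
/// Decode one ISO-8859-1 byte and return its UTF-8 encoding.
/// (Hypothetical helper, for illustration only.)
fn latin1_to_utf8(byte: u8) -> Vec<u8> {
    // ISO-8859-1 byte N is code point U+00NN, so this conversion is exact.
    let c = char::from(byte);
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Render bytes in the %XX style used in the UTF-8 column of the table above.
/// (Hypothetical helper, for illustration only.)
fn percent_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("%{:02X}", b)).collect()
}
```

With these, `percent_hex(&latin1_to_utf8(0xDF))` gives `%C3%9F` for ß, while any byte below 0x80 comes back unchanged as a single byte.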

        1. 2

          That’s a good question - most of the presentation functions so far are a bit experimental, anyway (:

          My reasoning was that I often want to know what’s 7-bit ASCII clean and what’s not. That the code points with their 8th bit set are in the Latin-1 Supplement block seemed like interesting information; I might extend that to list more supplements by name.
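
The distinction described here can be sketched in a few lines of Rust. This is only an illustration under assumed names (`Range`, `classify`, `is_7bit_clean`), not the tool's actual code: it sorts a character into 7-bit ASCII, the Latin-1 Supplement block (U+0080 through U+00FF), or everything else.

```rust
/// Coarse classification of a code point. (Hypothetical type, for illustration.)
#[derive(Debug, PartialEq)]
enum Range {
    Ascii,            // U+0000..=U+007F, 7-bit clean
    Latin1Supplement, // U+0080..=U+00FF, 8th bit set
    Other,            // anything above U+00FF
}

/// Classify a single character by its code point value.
fn classify(c: char) -> Range {
    match c as u32 {
        0x00..=0x7F => Range::Ascii,
        0x80..=0xFF => Range::Latin1Supplement,
        _ => Range::Other,
    }
}

/// True if every character in the string is 7-bit ASCII.
fn is_7bit_clean(s: &str) -> bool {
    s.is_ascii()
}
```

For example, `classify('ß')` is `Range::Latin1Supplement`, and `is_7bit_clean("Straße")` is false. Listing more blocks by name would mean adding more arms to the `match`.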