1. 14

    Worth noting that up until the conclusion, the article uses “Unicode” as a synonym for UCS-2, which was not unreasonable at the time, and the arguments mostly hold up; UCS-2 was replaced by UTF-16 and then (mostly) by UTF-8, because UCS-2 was fatally flawed.

    He argues that simultaneously supporting UTF-8, UTF-16, and UTF-32 is infeasible, and that has been borne out. He just missed the fact that we ended up picking just one of these, and that’s worked out great.

    Verisign recently opened a Pandora’s Box when the company stated that it was taking orders for URLs in the language particular to those countries which either desire or demand to work in a written set other than Latin1.

    This statement betrays a fundamental confusion about domain names that makes me a little suspicious of the author’s technical grasp of the subject, though his historical and cultural insights seem sound.

    1. 3

      I remember the big hubbub over multilingual domain names in 2000–02; it caused quite a bit of discomposure.

      1. 2

        Yeah, I was confused while reading this, because the crux of his argument was that “Unicode” only supported about 65k distinct characters, which isn’t enough for every single possible hanzi in some dictionary. But I know that today Unicode supports something like one million possible code points, which is actually enough for every single hanzi, even uncommon ones. That change must have happened after he wrote this article (maybe precisely in order to solve the problem of supporting enough rare hanzi?).

        1. 5

          Unicode’s original plan was to unify the world’s digital character sets; while the Chinese language might have fifty thousand characters, the Chinese government’s GB2312 character encoding contains only 6,763, a much more practical number.

          After Unicode 1.x was released, ISO showed up and said “hey, that’s a nice universal character set, we were planning on making one too, why don’t we join forces? Of course, encoding digital character sets isn’t enough, we want to encode EVERY CHARACTER EVER!” And so Unicode 2.0 came out, introducing UTF-16, surrogate pairs, the Basic Multilingual Plane and all the astral planes so as to have enough room for everything.
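
          (For the curious, a quick sketch of what that looks like in practice. Nothing here is from the article, just standard Unicode behaviour as Python exposes it: a code point outside the Basic Multilingual Plane becomes two 16-bit surrogate code units.)

          # U+1F600 lies beyond the 65k BMP, so UTF-16 encodes it as a surrogate pair.
          ch = "\U0001F600"
          print(ch.encode("utf-16-be").hex(" ", 2))  # 'd83d de00': high + low surrogate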

      1. 1

        One thing I’d like to ask the author: do you need “to lead instead of follow” in order to want to learn something? Or was that just a rhetorical device?

        1. 1

          I linked this thread to the author, and he replied:

          It was contextual. Same for “to create instead of build.”

          Nothing against builders or followers. But I was aiming against the usual, “don’t do that!” response I see to everyone trying new things. Specifically the recent article doing the rounds about running a website in C++ (spoiler alert: aside from this forum, my site runs on C++ code.)

          I was saying, do you want to follow others’ advice, or try something new and learn for yourself? Maybe C++ ends up going badly for you, in which case you learn why, and can apply those lessons going forward, instead of just parroting others’ advice not to even try. Maybe it goes well for you, in which case you have something exciting and potentially innovative on your hands.

          I was saying, do you want to just incrementally build websites based on what everyone else has done? Eg maintain yet another Wordpress blog, maybe monkey patch in a client feature request or two? To copy everyone else and try to deploy whatever JS framework is handy this week? Or would you rather work on a framework and understand your craft on a more fundamental level?

          My own experiences are that the deeper I go into computer science, the more sheer fun I have. The most exciting time I can remember programming was working on a language of my own. It didn’t bear fruit, but what I learned transformed my GUI toolkit (among other things) into something far more robust. And the more I work on these things, the more I recognize how unbelievably bloated every last thing in computer science has become. It’s refreshing to shed away decades of cruft and just start fresh. To see how much you can do with 1/20th the code. Eg I just wrote a fancy cross-platform CRUD application with lots of input validation in 400 lines … of C++ code! I can parse PNG images in 16KiB of code from two headers … no need for 400KiB of zlib and 300KiB of libpng. I have a functional IPv4+6 web server in 200 lines of code. A high-quality audio IIR filter in 3.8KiB of code. A delta-patching format you can describe in full on a postcard. And every time I hear, “but you could have just used $library!” And yeah, I could have. And I would have added lots of dependencies making the code harder to build, and I would have learned nothing meaningful in the process. No thank you.

          I understand there are coders that just wanna get paid. Surely, the least amount of work is the easiest way to do that. But I was appealing to people who enjoy programming as an art: don’t let other people convince you to be a worker bee when you could have so much more fun experimenting. One thing is for certain: nobody’s ever going to remember the 9-5 office worker who maintained blog.generic-company.com. But they sure will remember Daniel Bernstein, or John Carmack, or Bjarne Stroustrup.

          I also understand it’s not a good idea to deploy your own homegrown Caesar cipher at your job at Wells Fargo. I’m talking about applications where safety and job security aren’t on the line. Someone has to learn how to write crypto libraries. If we all leave it to the OpenSSL guys then we get a software monoculture that screws us all when, surprise!, they’re not perfect either and now everyone has to deal with Heartbleed.

          But, look how fast my site loads. You click that article link and the page is there in 20 milliseconds. Now go try a web 3.0 darling like Slack, and you’re sitting there twiddling your thumbs for 20 seconds while the chat loads. Is my C++ on the web approach really worse than that?

        1. 20

          A bit unrelated to the linked content, but “Low Hanging Fruit of Programming Language Design” really makes me wish for a place/collection/book that documents the lessons (both good and bad) learned in the last few decades of language design.

          Language design currently feels like it has plateaued and isn’t really moving forward anymore except in a few specific attributes the language author really cares about.

          Having some exhaustive documentation that collects potential design approaches for individual features, possibly with a verdict based on past experience, would help prevent language designers from repeating the same mistakes over and over.

          Entries could include:

          • The various, bad approaches to operators and operator “overloading”.
          • How not to mess up equality and identity.
          • Things that don’t belong on a language’s top type.
          • Implicit numeric conversions will never work in a satisfying fashion.
          • Picking <> for Generics is wrong and broken, as evidenced by all languages that tried it.
          • Why ident: Type is way better than Type ident.
          • Typeclass coherency is fundamentally anti-modular and doesn’t scale.
          • Upper and lower bounds are a – rather uninteresting – subset of context bounds/typeclasses.
          • There is no reason to require “builtin/primitive” types to be written differently from user-defined types.
          • How to name things in a predictable way.
          1. 13

            Another good source of lessons would be what Gilad Bracha calls shadow languages. Essentially, any time you find a simple mechanism getting extended with more and more features, until it essentially becomes a (crappy) programming language in its own right. The obvious thing to try in these situations is to throw out that standalone system and instead provide some mechanism in the ‘real’ language for programs to calculate these things in a “first class” way.

            Bracha gives examples of ML modules, where e.g. functors are just (crappy) functions, Polymer (which I don’t know anything about) and imports (e.g. conditional imports, renaming, etc.).

            Some more examples I can think of:

            • Haskell’s typeclass resolution mechanism is basically a crappy logic language, which can be replaced by normal function arguments (see the sketch after this list). Maybe a better generalisation would be implicit arguments (as found in Agda, for example)? One way to programmatically search for appropriate arguments is to use “proof tactics”.
            • Haskell types have gained datakinds, type families, etc. which are basically just a (crappy) functional programming language. When types are first-class values (like in Coq, Agda, Idris, etc.) we can use normal datatypes, functions, etc. instead.
            • Build/packaging systems. Things like Cabal, NPM, PyPI, RPM, Deb, etc. These are usually declarative, with elaborate dependency solvers. As well as being rather limited in what we can express, some systems require all participants to maintain a certain level of vigilance, to prevent things like ‘malicious solutions’ (e.g. malicious packages claiming to implement veryCommonDependency version 99999999999). I’ve seen a couple of nice approaches to this problem: one is that of Nix/Guix, which provide full functional programming languages for calculating packages and their dependencies (I’ve even written a Nix function which calls out to Tinc and Cabal for solving Haskell package dependencies!); the other is Racket’s old PLaneT packaging system, where programs write their dependencies in-line, and they’re fetched as needed. Unfortunately this latter system is now deprecated in favour of raco, which is just another NPM-alike :(
            • Multi-stage programming seems like a language-level concept that could subsume a bunch of existing pain points, like optimisation, configuration, or even packaging and deployment. Why bother with a mountain of flaky bash scripts to orchestrate compilers, build tools, test suites, etc. when we can selectively compile or interpret sub-expressions from the same language? The recent talk “programming should eat itself” looks like the start of some really exciting possibilities in this area!
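
            Here’s the sketch promised in the first bullet: a rough Python rendition of “typeclass resolution as normal function arguments” (the Show record and everything else here is illustrative, not any real library). The “instance” is just a value you pass in, so there’s no global coherence requirement.

            from dataclasses import dataclass
            from typing import Callable, Generic, TypeVar

            A = TypeVar("A")

            @dataclass
            class Show(Generic[A]):        # a "typeclass" as a plain record of functions
                show: Callable[[A], str]

            show_int: Show[int] = Show(show=str)

            def show_list(elem: Show[A]) -> Show[list]:  # an instance built from another
                return Show(show=lambda xs: "[" + ", ".join(elem.show(x) for x in xs) + "]")

            def describe(s: Show[A], x: A) -> str:       # the "dictionary" is explicit
                return "value: " + s.show(x)

            print(describe(show_list(show_int), [1, 2, 3]))  # value: [1, 2, 3]
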
            1. 4

              I tried to write something like that, but this is so subjective.

              This is probably not a big problem in practice. If you design and publish a language, many helpful trolls will come and tell you all the mistakes you made. 😉

              1. 6

                FWIW, I compiled a list of articles on some of the topics I mentioned above:

                Maybe it is interesting for you.

                The articles’ conclusions are based on an exhaustive survey of more than a dozen popular languages as well as many minor, but influential ones.

                1. 2

                  Thanks, a few of my writings:

                  Is there a way to get a feed of your articles? https://soc.github.io/feed/ is empty.

                  1. 1

                    Looks like we agree on pretty much everything in the first two articles. :-)

                    I think there are only bad options for dealing with operators (while operator overloading is pretty much broken altogether), but some approaches are less bad than others.

                    My preference is to use pretty much the simplest thing to document, specify, and implement, and to emphasize to users how unimportant operators are, to stop them from going overboard.

                    I believe they get way too much attention given how unimportant they are in the grand scheme of things.

                    Is there a way to get a feed of your articles? https://soc.github.io/feed/ is empty.

                    I’ll look into that, wasn’t even aware that I had a feed. :-)

                  2. 2

                    For equality, I’m not sure if there should be an equals method in Object (or Any or AnyRef).

                    Equality is not a member of a type. (Unfortunately, in Java everything must be inside a class.) Equality often depends on context. For example, when are two URLs equal? Sometimes you want to compare the strings. Sometimes the string without the fragment identifier. Sometimes you want to make a request to see what gets returned for the URLs.

                    Sometimes, we might prefer not to provide equals at all. For example, does it even make sense for two locks to be “equal”?

                    The argument for Object.equals is convenience. For many types there is a reasonable default. Manually specifying equality for every hash map instantiation is tedious.
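
                    To illustrate “equality depends on context” with the URL example (a rough Python sketch, not a proposal for any particular language): the comparison is just a function you choose, rather than a method baked into the object.

                    from urllib.parse import urldefrag

                    def eq_exact(a: str, b: str) -> bool:            # compare the full strings
                        return a == b

                    def eq_ignore_fragment(a: str, b: str) -> bool:  # drop the #fragment first
                        return urldefrag(a).url == urldefrag(b).url

                    print(eq_exact("http://x/#a", "http://x/#b"))            # False
                    print(eq_ignore_fragment("http://x/#a", "http://x/#b"))  # True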

                    1. 1

                      For equality, I’m not sure if there should be an equals method in Object (or Any or AnyRef).

                      I agree. The considerations don’t rely on it, except as a tool for demonstration. If a language isn’t forced (e.g. by the runtime/host language/library ecosystem) to have it on its top type, it makes sense to leave it out.

                    2. 2

                      Why are [] better than <> for generics

                      I feel like this should be a two-part argument:

                      • why re-using <> for generics is bad
                      • why [] are better used for generics than for indexing

                      Your article is pretty convincing on the first part, but curiously silent on the second. What does Scala use for indexing into ordered collections? Or does it avoid them altogether?

                      1. 4

                        Scala uses () for indexing into ordered collections, à la VBScript.

                        As a language developer, I’ve not implemented generics, so I’ve yet to develop strong feelings about <> in that sense.
                        As a language user, <> for generics has never tripped me up. That has mostly been in C# and Java, however, and I think both languages keep the places where <> vs. < or > show up mostly distinct. I’d hardly call it a disastrous choice on this side of things, even if it took some extra work on the part of the language teams.

                        1. 2

                          I do know that <> ends up looking painfully ugly, at least in Rust. It’s also making it harder to find a nice syntax for const generics, and it’s responsible for the ugly turbofish operator.

                          1. 1

                            I would suppose that is a bit more of a matter of taste, but I’m unsure that [] would be any better on that front, unless ::<> would be replaced by something other than ::[]. Which might be possible if Rust didn’t use [] for indexing. Given the tiny bit of Rust I’ve written, giving up [] for indexing would almost make sense, since you’re often using methods for collection access as it is. I’d have to sit down with the Rust grammar to be sure.

                            1. 2

                              The important thing to keep in mind is that <> weren’t chosen for any kind of reason except “these are literally the only symbols on the keyboard which kind of look like braces that we can retrofit into an existing language.”

                              If you start from a blank slate and ask “what is the best we can do, making the rules of the language simple and easy to understand” the result will be very different.

                              Consider two approaches:

                              Approach A

                              • () brackets: used for method calls, except where they are not: Array construction, array indexing, etc.
                              • [] brackets: used for array construction and array indexing
                              • {} brackets: used for array construction, method bodies, initializers, …
                              • <> “brackets”: used as an operator for comparisons, used for generics

                              Approach B

                              • () brackets: used for terms, grouping, marks a single expression
                              • [] brackets: used for types
                              • {} brackets: used for refining a term/type, marks a sequence of statements
                              • <> “brackets”: not used, because they are not brackets

                              I think no one would say “let’s mix things up and assign brackets to various use-cases randomly” and pick approach A over approach B.

                              And yes, Rust would be simpler and easier to read if they kept the use of [] for generics, instead of migrating to <>.
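
                              One way to make “they are not brackets” concrete (a toy Python example, since Python has comparison operators but no <> generics): the same token sequence already means something else, which is exactly why languages that bolt generics onto <> need disambiguation hacks.

                              a, b, c, d = 1, 2, 3, 4
                              result = (a<b, c>d)   # not a generic application: a tuple of two comparisons
                              print(result)         # (True, False)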

                              1. 2

                                unless ::<> would be replaced by something other than ::[].

                                That’s exactly what I’m thinking. It’s subjective, but I also find that <> makes polymorphic types look claustrophobic, whereas [] feels more ‘airy’ and open, due to their shapes.

                                Here’s an example from today:

                                fn struct_parser<N>(fields: &[Field<N, Rc<binary::Type<N>>>]) -> Rc<ParseExpr<N>> {
                                

                                vs.

                                fn struct_parser[N](fields: &Slice[Field[N, Rc[binary::Type[N]]]]) -> Rc[ParseExpr[N]] {
                                

                                Ideally I would prefer that the same syntax be used for both value level and type level abstraction and application, but I’ll save that debate for another time…

                      2. 3

                        Even a list of how different language designs approach the same problem, with respect and without comparing them, would be a huge improvement over what we have now. Should be easier to compile than a “here are the lessons” document since it’s less subjective.

                        1. 2

                          To compare languages, I’ve used:

                          And although I haven’t used it very much, Rosetta code does what you want:

                          Of course these are superficial, but surprisingly useful. I draw mostly on the languages I really know, but it’s nice to have an awareness of others. I know about 5 languages really well (written thousands of lines of code in each), 5 more languages a little bit (Lua, Go, etc.), and then there are maybe 10 languages that I don’t know which are “interesting” (Swift, Clojure, etc.)

                          I think that posting to lobste.rs or /r/ProgrammingLanguages can help with getting feedback on those. Here is one thread I posted, summarizing a thread from HN:

                          https://www.reddit.com/r/ProgrammingLanguages/comments/7e32j8/span_slices_string_view_stringpiece_etc/

                          I don’t think there is much hope of getting all the information you want in one place, because there is so much information out there, and some languages like Swift are new and rapidly developing.

                          Personally I maintain a wiki that is my personal “delicious” (bookmark manager), although I have started to move some of it to the Oil wiki [1].

                          [1] https://github.com/oilshell/oil/wiki

                          1. 2

                            FWIW, I compiled a list of articles on some of the topics I mentioned above:

                            Maybe it is interesting for you.

                            The articles’ conclusions are based on an exhaustive survey of more than a dozen popular languages as well as many minor, but influential ones.

                            (Sorry for the double post.)

                            1. 1

                              I can also recommend /r/Compilers. At least, I had a nice discussion there recently.

                        2. 2

                          The best things I’ve found on this are interviews with language designers. But it is scattered.

                          1. 2

                            That would be nice, but I see several problems:

                            • Language design depends on the domain. There’s no right answer for every domain. For any language that someone claims is “general purpose”, I will name a domain where virtually no programs in it are written (for good reasons).
                            • Almost all language features interact, so what is right for one language is wrong for another.
                            • Some things are subjective, like the two syntax rules you propose. They’re also dependent on the language.
                            1. 3

                              Kind of agree with your points, but I believe there is a reasonable subset of topics where one can provide a conclusive verdict based on decades of languages trying various approaches: for instance, abusing <> for generics, or ident: Type being better than Type ident.

                          1. 4

                            Your docs mention that on POSIX systems, paths might not be valid UTF-8 (or any single encoding), but it’s not clear to me what Pathie does in such a situation: are paths containing invalid UTF-8 inaccessible? Can you read them from the OS but not construct them yourself?

                            Your docs also say that Windows uses UTF-16LE, which is not strictly true: in the same way that POSIX paths are a bucket of bytes and not necessarily valid UTF-8, Windows paths are a bucket of uint16_ts and not necessarily valid UTF-16 (in particular, they can have lone surrogates that do not form a surrogate pair, or values that are not assigned in the Unicode database). How does Pathie interact with such malformed paths?

                            Lastly, macOS: as your documentation points out macOS does have an enforced filename encoding, but it also has an enforced normalisation (at least for HFS+ volumes). That means your application can create a string, create a file with that name, then readdir() the directory containing that file and none of the returned directory entries will byte-for-byte match the string you started with. Does that affect Pathie’s operation?
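
                            (For readers unfamiliar with the normalisation issue, a quick Python sketch. HFS+ actually enforces its own variant of NFD, so this is only an approximation of what the filesystem does.)

                            import unicodedata

                            name = "caf\u00e9"                           # 'café', precomposed (NFC)
                            stored = unicodedata.normalize("NFD", name)  # roughly what HFS+ hands back

                            print(name == stored)                                # False: different code points
                            print(unicodedata.normalize("NFC", stored) == name)  # True once re-normalised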

                            1. 2

                              Your docs mention that on POSIX systems, paths might not be valid UTF-8 (or any single encoding), but it’s not clear to me what Pathie does in such a situation: are paths containing invalid UTF-8 inaccessible?

                              First off, Pathie does not assume POSIX paths are UTF-8, because that isn’t specified. Unless you compile Pathie with ASSUME_UTF8_ON_UNIX, it takes the encoding information from the environment via the nl_langinfo(3) function called with CODESET as the parameter (which is why you need to initialise your locale on Linux systems).
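
                              (If you want to see what that lookup returns on your POSIX system, Python happens to expose the same call; this just illustrates the mechanism, it is not Pathie’s actual code.)

                              import locale

                              locale.setlocale(locale.LC_ALL, "")        # initialise the locale first
                              print(locale.nl_langinfo(locale.CODESET))  # e.g. 'UTF-8' on most modern Linuxes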

                              are paths containing invalid UTF-8 inaccessible? Can you read them from the OS but not construct them yourself?

                              In the case of a path with invalid characters in the locale’s encoding (e.g., invalid UTF-8 on most modern Linuxes), you’ll get an exception when trying to read such a path from the filesystem, because iconv(3) fails with EILSEQ (which Pathie transforms into a proper C++ exception). Nor can you construct paths containing invalid characters yourself; you will receive the same exception. I’ll make this clearer in the docs.

                              Your docs also say that Windows uses UTF-16LE, which is not strictly true:

                              Paths in a valid encoding are UTF-16LE. Broken path encodings may be anything, and that’s not something one can make assumptions about. Again, you’ll receive an exception when you encounter them (because the underlying WideCharToMultiByte() function from the Win32 API fails).

                              (in particular, they can have lone surrogates that do not form a surrogate pair, or values that are not assigned in the Unicode database)

                              I was not aware of that. Do you have a link with explanations, ideally on MSDN?

                              Lastly, macOS: as your documentation points out macOS does have an enforced filename encoding, but it also has an enforced normalisation (at least for HFS+ volumes)

                              macOS is not officially supported by Pathie (which is stated at the top of the README), simply because I don’t have a Mac to test on.

                              That means your application can create a string, create a file with that name, then readdir() the directory containing that file and none of the returned directory entries will byte-for-byte match the string you started with. Does that affect Pathie’s operation?

                              It shouldn’t affect Pathie’s operation itself. Pathie will simply pass through what the filesystem gives it; since on macOS no conversion of path encodings happens, these normalised sequences are handed through to the application that uses Pathie.

                              Thanks for the feedback!

                              1. 3

                                Do you have a link with explanations, ideally on MSDN?

                                Unfortunately, I can’t find a smoking-gun writeup on MSDN. However, in my searching, I did find:

                                • Scheme48 has a special OS String type, and motivates it saying “On Windows, unpaired UTF-16 surrogates are admissible in encodings, and no lossless text decoding for them exists.”
                                • Racket’s encoding conversion functions include special “platform-UTF-8” and “platform-UTF-16” encodings: “On Windows, the input can include UTF-16 code units that are unpaired surrogates…”
                                • Rust also includes a special OSString type: “On Windows, strings are often arbitrary sequences of non-zero 16-bit values, interpreted as UTF-16 when it is valid to do so.”
                                • I found the Rust ticket that introduced the OSString type, which includes a (Rust) test case. One of the Rust devs dug up an MSDN page that says “…the file system treats path and file names as an opaque sequence of WCHARs.”
                                • That issue also linked to a report of UTF-16-invalid filenames being found in the wild in somebody’s Recycle Bin.
                                1. 3

                                  It’s a problem of enforcement. Rust uses WTF-8 internally to fix that.

                                  https://simonsapin.github.io/wtf-8/
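
                                  (Python can demonstrate the underlying problem: a lone surrogate has no strict UTF-8 encoding, and WTF-8 simply admits the generalised byte sequence. The ‘surrogatepass’ handler below is Python’s escape hatch, used here purely for illustration.)

                                  lone = "\ud800"                               # an unpaired high surrogate
                                  # lone.encode("utf-8")                        # would raise UnicodeEncodeError
                                  print(lone.encode("utf-8", "surrogatepass"))  # b'\xed\xa0\x80': what WTF-8 stores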

                                  1. 1

                                    An interesting read, thank you for the pointer. I’ll see whether I can adapt Pathie accordingly, but until now it has done the job for me (and it’s mostly a library I use for my own projects).

                                  2. 1

                                    Thanks!

                              1. 2

                                It turns out the URL in the description is not for video linked from the heading. I was quite confused for a while there.

                                1. 3

                                  Wondered when somebody would trip over the easter egg, sorry :)

                                  1. 2

                                    Once I figured out what had happened, I was amused, so no worries. :)

                                    1. 1

                                      I did and came back to the Lobster link thinking that there must be more to listen to. Great one indeed ;)

                                  1. [Comment removed by author]

                                    1. 10

                                      You’re saying that ST was great 4-5 years ago, but apart from the langserver, which one of your points didn’t apply back then as much as it does now? You say that “today there are better editors”, but surely vim is much older than 4-5 years and basically didn’t change.

                                      1. [Comment removed by author]

                                        1. 8

                                          The primary reason I stick with Sublime Text is that Atom and VSCode have unacceptably worse performance for very mundane editing tasks.

                                          I’ve tried to switch to both vim and Spacemacs (I’d love to use an open source editor), but it’s non-trivial to configure them to replicate functionality that I’ve become attached to in Sublime.

                                          1. 1

                                            I thought VSCode was supposed to be very quick. Haven’t experimented with it much myself, what mundane editing tasks make it grind to a halt? I am well aware Atom has performance issues.

                                            1. 1

                                              Neither Atom nor VSCode grinds to a halt for me, but I can just tell the difference in how quickly text renders and how quickly input is handled.

                                              I’m not usually one of those people who obsesses about app performance, but editors are an exception because I spend large chunks of my life using them.

                                            2. 1

                                              I’ve tried to switch to both vim and Spacemacs (I’d love to use an open source editor), but it’s non-trivial to configure them to replicate functionality that I’ve become attached to in Sublime

                                              This is the reason why I stay with vim: I’ve been unable to replicate vim’s functionality in other editors.

                                              1. 1

                                                Yeah, fortunately NeoVintageous for Sublime does everything I need for vim-style movement and editing.

                                        2. 3

                                          I think the really ground-breaking feature that ST introduced was multi-cursor editing. Now most editors have some version of that. Once you get used to it, it’s very convenient, and the cognitive overhead is low.

                                          As for the mini-map, I suppose it’s a matter of taste, but I found it very helpful for scanning quickly through big files looking for structure. Visual pattern recognition is something human brains are ‘effortlessly’ good at, so why not put it to use? Of course, I was using bright syntax highlighting, which makes code patterns much more visible in miniature. Less benefit for the highlight-averse.

                                          I’ve been using ST3 beta for a few years as my primary editor. I tried using Atom and (more recently) VS Code, but didn’t like them as much: the performance gap was quite noticeable at start-up and for oversized data files. The plug-in ecosystems might make the difference for some folks, but all I really used was git-gutter and some pretty standard linters. For spare-time fun projects I still enjoy Light Table, but it’s more of a novelty. I’m gradually moving away from the Mac and want a light-weight open-source editor that will run on any OS.

                                          So now, as part of my effort to simplify and get better at unix tools, I’m using vis. I’m enjoying the climb up the learning curve, but I think that if I stick with it long enough, I’ll probably end up writing a mouse-mode plugin. And maybe git-gutter. Interactive structural regexps and multi-cursor editing seem like a winning combination, though.

                                          1. 3

                                            You might enjoy exploring kakoune as well. http://kakoune.org | https://github.com/mawww/kakoune

                                            1. 2

                                              I’m an Emacs guy myself and I honestly think that multi-cursor editing is just eye-candy for good ol’ editor macros, which both vim and Emacs have included since… forever?

                                              1. 3

                                                I’ve never used Sublime Text, but I’ve used multiple-cursors in vis and Kakoune, and it beats the heck out of Vim’s macro feature, just because of the interactivity.

                                                With Vim, I’d record a macro and bang on the “replay” button a bunch of times only to find that in three of seventeen cases it did the wrong thing and made a mess, so I’d have to undo and (blindly) try again, or go back and fix those three cases manually.

                                                With multiple cursors, I can do the first few setup steps, then bang on the “cycle through cursors” button to check everything’s in sync. If there are any outliers, I can find them before I make changes and keep them in mind as I edit, instead of having my compiler (or whatever) spit out syntax errors afterward.

                                                Also, multiple cursors are the most natural user interface for [url=http://doc.cat-v.org/bell_labs/structural_regexps/]structural regular expressions[/url], and being able to slice-and-dice a CSV (or any non-recursive syntax) by defining regexes for fields and delimiters is incredibly powerful.
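
                                                (A very loose illustration of the field/delimiter idea, in Python rather than the sam notation the paper uses: describe the delimiter once, and every field becomes a selection you can edit at the same time.)

                                                import re

                                                line = "alpha,beta,gamma"
                                                fields = re.split(r",", line)               # the regex names the delimiter
                                                print(",".join(f.upper() for f in fields))  # ALPHA,BETA,GAMMA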

                                                1. 0

                                                  [url=http://doc.cat-v.org/bell_labs/structural_regexps/]structural regular expressions[/url]

                                                  This might be the first attempt at BBCode I’ve seen on Lobsters. Thanks for reminding me how much I hate it.

                                                  1. 1

                                                    Dangit, you can tell I wrote that reply at like 11PM, can’t you. :(

                                                2. 1

                                                  I agree with you. I use Vim, and was thinking about switching until I realized that a search and repeat (or a macro when it’s more complex) works just as well. Multiple cursors is a cute trick, but never seemed as useful as it first appeared.

                                                3. 2

                                                  I thought multiple cursors was awesome. Then I switched to using Emacs, thanks to Spacemacs, which introduced me to iedit [0]. I think this is superior to multiple cursors. I am slowly learning Emacs through Spacemacs; I’m still far away from being any type of guru.

                                                  [0] https://github.com/syl20bnr/spacemacs/blob/master/doc/DOCUMENTATION.org#replacing-text-with-iedit

                                                4. 2

                                                  I’ve started using vim for work, and although I’ve become quite fast, I find myself missing ST’s multiple cursors.

                                                  I might try switching to a hackable editor like Yi. I’ve really enjoyed using xmonad recently for that reason.

                                                1. 7

                                                  Wow, that was a great presentation. I was expecting something along the lines of BSD/GPL values dissonance, but it was actually broader than that.

                                                  If nothing else, now I have a name for why some technologies’ popularity is so mystifying to me: it’s just that their communities are built around values I value less highly.

                                                  1. 6

                                                    I haven’t read the book myself, but I’ve watched the YouTube lectures the author produced, and they were great. Quite accessible (for people who identify as programmers that have heard enough about category theory to want to learn more), and very mind-expanding. Although this submission is tagged “haskell”, no prior Haskell knowledge is required for the videos to make sense, and any Haskell notation is explained at first use (sometimes with C++ notation, which is likely to be a good deal more familiar).

                                                    1. 13

                                                      “Your browser, Your way”

                                                      If making a request on the web causes “direct damage”, then perhaps whoever is responding to that request should think of something, or take their broken service offline if they can’t fix it.

                                                      Why is it “invalid traffic”? Because the people running that extension don’t purchase the advertised products? Nah, don’t serve those links if it’s not alright to click or crawl them.

                                                      1. 3

                                                        It’s not the target of the request that is being damaged, so they have no incentive to take their service offline.

                                                        If I stand back and squint a bit, this extension vaguely resembles terrorism: somebody wants to disrupt a large, powerful organisation (an advertising network, a government), but they can’t target it directly, so they start causing harm to smaller, weaker organisations (customers, citizens) in the hope that the larger organisation will tear itself apart, or at least become too preoccupied to do its job well.

                                                        I see your point that there will always be Bad Guys on the internet, and people have to secure their stuff. However, there will also always be unsecured stuff on the internet (through malpractice or ignorance or innocent mistake) and making it easy for Bad Guys to exploit that stuff doesn’t help matters.

                                                        1. 5

                                                          However, there will also always be unsecured stuff on the internet (through malpractice or ignorance or innocent mistake) and making it easy for Bad Guys to exploit that stuff doesn’t help matters.

                                                          Yes, that’s wildly tangential.

                                                          The ads are not served by accident. They’re all intentional, and extensively monitored. They’re not hooked up to a bomb or some database that leaks individuals’ credentials.

                                                          It’s not the target of the request that is being damaged, so they have no incentive to take their service offline.

                                                          If all the users of an ad network get “punished” for click fraud or clicks get devalued so much that people don’t want to buy these ads, the ad network will run out of customers. The ad network should have plenty of incentive to fix their service.

                                                          But who said the damage is only supposed to go against ad networks? People might very well want to take down all the “small sites” that offer little value and lots of ads. And if there’s really no ad-supported site worth saving, there’s no collateral damage either.

                                                      1. 6

                                                            “The point of decentralized publishing is not censorship resistance – decentralization provides a little resilience to intermediary censorship, but not a lot. Instead, decentralization is important because it allows a community to run under its own rules.” <- vehemently disagree. The most important reason for decentralization is censorship resistance. A community run under its own rules is exactly a community that wants to be resistant to at least some kinds of censorship, even if they are more than happy to censor other kinds of things themselves.

                                                        Whatever thing you want your community to be about, and whatever rules you want to enforce in that community, there’s someone out there who thinks that that is as bad as the author of this article thinks lolicon is. Decentralized platforms are what make it possible for you to influence the censoring that goes on immediately around you, instead of that entity.

                                                        1. 7

                                                          I think the idea behind “decentralization provides a little resilience to intermediary censorship, but not a lot” is that even if you run your own Mastodon instance with your own rules, your instance is online at the pleasure of your DNS provider, your ISP, your CDN, Digital Ocean/Vultr/AWS, the government…

                                                          If you really want censorship resistance, you want to think about publishing to a DHT like trackerless BitTorrent, or putting things into the Bitcoin blockchain, that kind of thing. But actually getting published information from there into people’s browsers is a lot more difficult.

                                                          1. 3

                                                            “your instance is online at the pleasure of your DNS provider, your ISP, your CDN, Digital Ocean/Vultr/AWS, the government…”

                                                                Well said. That’s probably what the author meant. It’s certainly what I mean when I say using these platforms to avoid ISP or government censorship is a joke. One can just block the protocols or identified users. The other can pass laws against it, or the executive can use existing powers to cause trouble for users, intermediaries, or suppliers. They’re already doing that in Five Eyes per the Snowden leaks, at least for surveillance.

                                                            I still like these tools for giving the communities using them more control over both the platform and what’s tolerated. Definitely a step up from the lock-in-loving platforms such as Facebook. What most miss is the marketing ability to achieve wide uptake. Personally, I’d start with targeting SME’s or enterprise customers just to make money to improve the software and deploy more instances.

                                                          2. 4

                                                            The issues are one and the same.

                                                            In a centralised structure, the central authority has power over others. In a decentralised structure, there’s no central authority and there is a smaller power-asymmetry in the system. A central authority can use said power to control what people are allowed to say or hear i.e. censorship, or they can use it to ‘run things their way’. Decentralisation allows people to be free to choose for themselves the rules of their systems. That includes “I can say whatever I want” or “Let’s use ABC ruleset and XYZ mod for our game servers”.

                                                            1. 3

                                                                I don’t want to ascribe intent to mastodon users/developers, and ultimately open source lets you kinda do what you want with software (bring your own philosophy).

                                                                I remember back in the PHPBB days, when people would start a bunch of forums. It’s somewhat similar to the “decentralized” stuff in practice (modulo the single sign-on-y aspects, most forums worked the same way). The main reason to start a different forum was simply the topic.

                                                                I feel like that aspect is a bit under-rated in these discussions. Just like LinkedIn and Facebook might have similar feature sets, so do different Mastodon instances. But the primary differentiator isn’t feature set, but the community. I think about this a lot when people talk about all networking apps being “the same”, and I think it applies here as well.

                                                              1. 2

                                                                This discussion reminds me of covenant communities in Hoppean philosophy.

                                                                If people are free to do what they want in a framework that allows for “programmable” law (whether “real world” contracts or just digital rules), people can set up communities that follow whatever rules they think are socially beneficial, without forcing those rules on anybody outside their social community.

                                                                So freedom from forcibly imposed large-scale policies (like censorship rules on internet infrastructure) lets people set up small-scale rules in a way that’s more effective whilst minimizing the number of people who get screwed over.

                                                              1. 8

                                                                Is there a non-HTTPS version of this site for those of us too weak-willed to join the Glorious Anti-Certificate-Authority Rebellion?

                                                                  1. 3

                                                                    For those who missed it, here is Tedu’s explanation of why he is using a self-signed HTTPS certificate: https://www.tedunangst.com/flak/post/moving-to-https. (You might have to add a temporary exception to view that page.)

                                                                    1. 1

                                                                      Just click to ignore the warning? :)

                                                                      1. 2

                                                                        Not possible in all browsers. E.g. Firefox Focus can’t read tedu’s blog.

                                                                        1. 1

                                                                          Terrible habit to get into. Better to read the google cache.

                                                                          1. 2

                                                                            Wait, does that mean that Google’s cache does not validate the certificate?

                                                                      1. 1

                                                                        For added context: post author byuu is the primary author of the higan multi-system emulator, and the ARM7TDMI core is used in higan’s Game Boy Advance emulation… and also the ST018 coprocessor for the Super NES.

                                                                        1. 4

                                                                          It should be easy to find models by looking for long lists of triplets of numbers — vertex coordinates. Well, not quite. Pokémon models are stored as compiled shaders.

                                                                          As the post goes on to explain, it does make a lot of sense… but I was very surprised when I read that.

                                                                          1. 4

                                                                          Some time ago, I wrote a C library to interact with the X and MS Windows clipboards in a cross-platform way without large dependencies like Gtk+. It only supports text, but at least Unicode text. I haven’t used it since then, though. From that experience, I can definitely say that the X clipboard system is ridiculously complex. Does anyone have a pointer on whether Wayland has improved in that area?

                                                                            1. 3

                                                                              At one point I decided I wanted to write the Wayland equivalent of xclip, and I started looking into the documentation. Unfortunately for me, it turns out that Wayland will only let you read or write the clipboard in response to an input event, like a mouse-click or key-press. This is so that unscrupulous programs can’t lurk in the background and send all your clipboard-copies to the Russian mafia and cause your pastes to produce Viagra adverts. A noble goal, but it makes a hypothetical wayclip program much less usable.

                                                                              Of course, in practice every Wayland desktop will have Xwayland installed for compatibility with existing X11 apps… which means that under Wayland, you keep using xclip and it works just fine.

                                                                            1. 0

                                                                              Shellcheck is an impressive project – it does a pretty complete parse using Haskell’s Parsec parser combinator library.

                                                                            Although in some sense it was a negative inspiration for Oil [1]. Someone integrated it with the code review system at Google, and mostly what it would tell me is to quote every variable substitution [2]. I would get an error on every line.

                                                                              This is technically correct, but I also thought it was dumb to put so much effort into working around the problems of a crappy language. It would be a lot easier if the language didn’t have this bad behavior in the first place.

                                                                              So that is one design principle for the Oil language: no confusion between strings and arrays of strings [3].
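
                                                                            To make that principle concrete, here’s a rough Python analogy (the filename is made up). The bug Shellcheck’s quoting advice guards against only exists in the “one big string” world:

                                                                            import subprocess

                                                                            path = "My Documents/notes.txt"

                                                                            # String world: the space re-splits, and cat sees two bogus arguments.
                                                                            subprocess.run("cat " + path, shell=True)

                                                                            # Array-of-strings world: path stays one argument, no quoting needed.
                                                                            subprocess.run(["cat", path])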

                                                                              [1] http://www.oilshell.org/ [2] Try !qefs in #bash on freenode [3] http://www.oilshell.org/blog/2016/11/06.html

                                                                              1. 2

                                                                                The POSIX shell is a programming language with a single type: the string. It would have been better if this single type were an array of strings.

                                                                              Have you looked into rc, the Plan 9 shell? It’s basically exactly that (and it also has nicer, more regular syntax than the Bourne shell).

                                                                                1. 1

                                                                                  Yes I encountered rc several years ago, and re-read the paper as I started work on Oil. It’s on my list of shells [1], along with the similar es shell.

                                                                                  I’ve also looked at the code, but I don’t think I’ve run any scripts with it. I remember it didn’t have word splitting, which is good. It seems like a smaller, cleaned-up POSIX sh, which is good. But it’s also incompatible, which is probably why it wasn’t adopted widely.

                                                                                  If there’s anything in particular about the rc shell you think would be good in a modern shell like Oil, I’m interested.

                                                                                  [1] https://github.com/oilshell/oil/wiki/ExternalResources

                                                                                  1. 1

                                                                                    I haven’t used rc heavily myself, but the thing I find most appealing is the idea of not having to worry about quoting-induced catastrophes in shell-scripts.

                                                                                    I’m sure you’ve already got a plan for this, but if “array of strings” is your basic data-type, I hope Oil will make it easy to plumb between the various uses of that structure in shell programming, including arguments given to a command, arguments received from a caller, the results of a find(1) (both \n-delimited and \0-delimited), the inputs to an xargs(1), etc.
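
                                                                                  (Something like this round-trip, which today needs careful glue between tools; a rough Python sketch of the \0-delimited variant:)

                                                                                  import subprocess

                                                                                  out = subprocess.run(["find", ".", "-type", "f", "-print0"],
                                                                                                       capture_output=True).stdout
                                                                                  files = [f.decode() for f in out.split(b"\0") if f]  # any filename survives \0

                                                                                  subprocess.run(["ls", "-l", "--", *files])           # pass the array on intact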

                                                                                    1. 1

                                                                                      Yes definitely! You will like Oil if I manage to finish it. Help is appreciated. :)

                                                                                    Though I should mention the functionality of find will likely be builtin. As I mentioned here [1], it’s really an expression language in the style of the [ and [[ languages, and the ! -a -o \( syntax is an ugly hack.

                                                                                      But of course you could still call out to the external GNU find if you want.

                                                                                      There’s also an argument for xargs and xargs -P in particular being part of the shell. In fact the GNU bash manual refers to GNU parallel [2].

                                                                                      A shell is a thing that starts processes, and some things get lost if you delegate that task to an external xargs (tracing, stats, time, exit code, etc.)

                                                                                    But again, Oil will work with an external xargs as well if you want.

                                                                                      [1] https://lobste.rs/s/jfarwh/find_is_beautiful_tool

                                                                                      [2] https://www.gnu.org/software/bash/manual/bash.html#GNU-Parallel

                                                                              1. 4

                                                                                I downloaded, compiled, and ran oil on Debian Testing, and it worked fine, but that’s no big surprise.

                                                                                In terms of test-scripts, the shell-based build system redo includes a test suite it uses to find the most appropriate shell to run build-scripts with; in its own words:

                                                                                Note that this file isn’t really a test for POSIX compliance. It’s a test for usefulness-compliance, that is, interpreting POSIX in a particular way for consistency, so that users of redo can depend on all the following functionality.

                                                                                Currently osh fails or produces warnings for 18 of the 116 tests, so that might help you prioritise which functionality to add next.

                                                                                Also, in other contexts I notice you express a preference for shell-based build scripts over shell-in-Makefile build scripts; perhaps you could consider using redo? If the end-user has redo installed, they get all the usual benefits of Makefiles (parallel building, incremental building). If the end-user doesn’t have redo installed, there’s a small shell-script you can ship with your software that just runs the build without all the incremental smarts.

                                                                                1. 1

                                                                                  I’ve seen redo, but I hadn’t seen the test suite. I will check it out – thanks for the pointer!

                                                                                  I’m a fan of most things DJB, and redo is nice. But I think it’s too minimal – the lack of any real projects using it is probably a signal. I probably won’t use it directly, but I may take inspiration from it.

                                                                                  Off the top of my head, redo causes you to create lots of tiny little files. I think it’s best for small things. And I also don’t know how I would cleanly support build variants in redo (debug, release, asan, etc.).

                                                                                  I wrote about it a bit here:

                                                                                  https://www.reddit.com/r/oilshell/comments/5q8du5/shell_awk_and_make_should_be_combined/

                                                                                  As well as another shell-based build system I found, blur.

                                                                                  I think my main beef is the syntactic conflict between Make and Shell. It’s just ugly. So many essential characters collide, like \ and $ and more.

                                                                                1. 7

                                                                                  The Twisted Python networking framework uses a “Protocol” object to serialise and deserialise messages from a byte-stream, so protocols are not tied to any particular transport.

                                                                                  Some protocols (SOCKS being a good example) start with a request/authorization phase, but if the authorization succeeds, the protocol switches to a transparent byte-pipe. One implementation strategy might be something like:

def dataReceived(self, data):
    # Until authorization succeeds, incoming bytes are parsed as
    # requests; afterwards they are passed straight through.
    if self.auth_succeeded:
        self.pass_through(data)
    else:
        self.parse_request(data)
                                                                                  

                                                                                  Alternatively, somebody who’s read Design Patterns might create a RequestState and a PipeState, and have the Protocol object replace one with the other when authentication succeeds.

                                                                                  But in Python… each class instance has a magic __class__ property that tells you what class the instance is an instance of… and that property is writable. I have seen (not written, but seen) production code for a SOCKS-like protocol that, upon successful authorisation did something like:

                                                                                  self.__class__ = PassthroughProtocol
                                                                                  

                                                                                  …to suddenly and instantly switch which class the instance belongs to. The original author had the grace to leave a big scary “here be dragons” comment above the line in question, and for the application in question it probably was the right implementation choice, but I’ve always been impressed at the audacity of that trick.
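For the curious, here is a minimal self-contained sketch of the trick (the class names and the toy auth check are made up; the production code was Twisted-specific):

    class PassthroughProtocol:
        def dataReceived(self, data):
            print("passing through:", data)

    class RequestProtocol:
        def dataReceived(self, data):
            if data == b"LETMEIN\n":  # toy authorization check
                # Here be dragons: rebind this instance to another class.
                # Every later method lookup goes to PassthroughProtocol.
                self.__class__ = PassthroughProtocol
            else:
                print("rejected:", data)

    p = RequestProtocol()
    p.dataReceived(b"LETMEIN\n")  # auth succeeds; the class is swapped
    p.dataReceived(b"payload")    # now handled by PassthroughProtocol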

                                                                                  1. 17

                                                                                    This doesn’t entirely solve the problem for two reasons.

First, you set it to xterm:yterm:zterm. How does the terminal know which option was selected? This only works if the list is a strict superset, with no conflicts. The potential for user error and mismatches is huge.

Second, it doesn’t really solve the problem of stale servers missing entries. Things work better, perhaps, but still not great. What happens when I write uterm, which has more features than xterm but not as many as yterm?

                                                                                    Feature detection is the right way to do things. Then I can invent my own terminal, and all the remote servers will use all the features available. (Feature query would probably be a better name; I don’t mean literally try everything and see what sticks.)

Bonus thought: any solution should also consider the situation of nested terminals. I’m running an xterm, and I log in to a server and run tmux. What is my TERM? A simple priority list doesn’t really answer questions like how many colors can be displayed.

                                                                                    1. 10

                                                                                      Agreed. I like the article because it raises the issue of pain associated with TERM, but feature querying is the better way to solve this morass.

                                                                                      1. 3

                                                                                        Feature detection is the right way to do things.

I believe that the author did discuss both feature querying and simply having a list of enabled features as possibilities.

                                                                                        1. 6

                                                                                          I agree that a giant string, putting the termcap itself in the environment, is a mess.

Online querying may have been dismissed too soon. It doesn’t have to slow down every terminal app, only those that use the requested extension. If TERM says xterm and you don’t need 24-bit color, then you just go. It’s only kaleidoscope ls that needs to perform the query.

                                                                                          1. 3

                                                                                            One problem with feature-querying is that things are often more complex than “yes-or-no”. For example, gnome-terminal supports arbitrary (24-bit sRGB) colour. xterm supports the same escape sequence, but maps the requested colour to the nearest one in its 256-colour palette. Do they both “support” the feature? tmux, as a terminal emulator, supports arbitrary 24-bit colour, but if the terminal it’s running inside doesn’t support 24-bit colour, it obviously can only display 256 colours or less… but if you later reconnect to tmux from gnome-terminal, you’ll see them all. Does tmux “support” the feature? What if tmux is running inside xterm, so tmux can send the 24-bit colour escape sequences to the outer terminal, but the outer terminal then (unknown to tmux) does the wrong thing with them?

                                                                                            A more practical problem with feature-querying is that there’s not really a good way to squeeze it into the existing de facto wire-protocol. There’s no way to guarantee that terminal responses will get back to the application that asked for them (the app could exit back to the shell, or exec something else), and so over time, just about every “answerback” feature in terminals has been neutered for security reasons: an app can’t read the title-bar, or the icon text, or the clipboard, or just about anything else.

                                                                                            $TERMINALS is not a perfect solution to the problem, but at this late stage in the game, it’s probably the most practical one left.

                                                                                            1. 2

                                                                                              One problem with feature-querying is that things are often more complex than “yes-or-no”. For example, gnome-terminal supports arbitrary (24-bit sRGB) colour. xterm supports the same escape sequence, but maps the requested colour to the nearest one in its 256-colour palette. Do they both “support” the feature?

                                                                                              I think that’s a non-issue. If you can spit out an escape sequence and the terminal accepts it instead of doing something weird and horrible, there is no problem using that sequence.

So what if your terminal supports 24-bit sRGB but you’re actually forwarding it over a link to a display using 16-bit YUV with 4:2:0 chroma subsampling and another layer of lossy quantization? You don’t get all the colours, but it works.

The real grief is (1) whether a feature is supported at all, and (2) whether different terminals use the same encoding for that feature. Problem (1) is unavoidable, but it should be possible to come up with a standard encoding for purely stylistic codes that may be ignored if not implemented. Apart from that, querying might be an acceptable choice. The gratuitous differences (problem 2) really need to go, as far as I’m concerned. If a feature is supported, then all terminals claiming to support that feature should damn well use the same exact encoding. Otherwise it’s not the same feature. Then people need to work together a little so that we don’t get a dozen different “features” that all do the same thing, differently.
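For reference, the de facto truecolor encoding under discussion is the SGR sequence ESC[38;2;R;G;Bm (48;2 for the background). A tiny Python illustration; a terminal without support will approximate it, as xterm does, or mis-parse it:

    # Print text in 24-bit colour using the de facto SGR encoding.
    # ESC[38;2;R;G;Bm sets the foreground; ESC[0m resets attributes.
    def fg_rgb(r, g, b, text):
        return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"

    print(fg_rgb(255, 128, 0, "orange-ish text"))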

                                                                                              1. 1

                                                                                                If you can spit out an escape sequence and the terminal accepts it instead of doing something weird and horrible, there is no problem using that sequence.

                                                                                                For some applications, asking for a 24-bit colour and getting a 256-colour approximation is perfectly reasonable. For other applications, it can be a deal-breaker (for example, if your foreground and background colours are distinguishably different in 24-bit colour, but happen to map to the same colour in the 256-colour palette).

                                                                                                There’s also the problem that implementations have bugs. Imagine a terminal-emulator developer who wants to add 24-bit colour support. They design, implement and test the feature, it works beautifully, so they update the “feature query” code to report “yes, I support 24-bit mode”. The new version ships, it winds up in CentOS 9 so it’ll be around for a decade, and then somebody discovers that it segfaults if you try to use 24-bit colour mode in a locale that uses comma as the decimal separator. Now, that may seem like an unlikely sequence of events, but the web-browser community briefly experimented with having browsers self-report what features they supported, and this “client reports it supports a feature, but it turns out to be broken for some use-cases” sort of thing wound up happening all the time.

                                                                                                The browser community eventually settled on “graceful degradation” as the philosophy of choice. That has occasionally worked for terminal emulation (consider the X11 mouse reporting mode versus the SGR mouse reporting mode), but there’s not a lot of potential control sequences guaranteed to be ignored by legacy terminals that we can use to signal new features.
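To illustrate that degradation path concretely (a sketch; modes 1000 and 1006 are the standard xterm private modes): an application can turn on classic X11 mouse reporting and then additionally request the SGR encoding, and a terminal that has never heard of mode 1006 simply ignores that request and keeps sending the legacy encoding.

    import sys

    # DECSET 1000: classic X11 mouse reporting (press/release events).
    # DECSET 1006: SGR extended mouse encoding; a legacy terminal
    # ignores this unknown mode and keeps the old byte encoding.
    sys.stdout.write("\x1b[?1000h")  # enable mouse reporting
    sys.stdout.write("\x1b[?1006h")  # upgrade to SGR encoding if supported
    sys.stdout.flush()

    # ... read and parse mouse reports from stdin here ...

    sys.stdout.write("\x1b[?1006l\x1b[?1000l")  # restore modes on exit
    sys.stdout.flush()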

                                                                                                1. 2

That kind of “worse is better” mentality might have been inevitable in the world of web browsers, where there is a sick amount of money involved, coming from people stubbornly (or forcibly) stuck with IE6 or whatever. And that same approach of catering to the lowest common denominator and adding complexity to work around bugs in legacy software that will never be patched gives us OpenSSL… honestly, I think we can do much better. Especially so in the world of terminal emulators, which are multiple orders of magnitude simpler.

Yes, I advocate (see my other comments here) that most software flat-out drop support for legacy terminals. People who want the complexity of them are free to bite the bullet and run a compatibility layer. So if you really end up in the unlikely situation of being forced to use an old, broken terminal emulator that you’re not allowed to patch, then you should use that compatibility layer first, and then escalate the issue with whoever is disallowing software updates in 2020. A problem internal to that organization should not leak out and affect the rest of the world. If that organization is just you, your problem is self-inflicted and I have no sympathy.

In a similar sense, worrying about the number of colours that can be displayed is imho a silly corner-case problem that the whole system doesn’t need to cater for. First, it makes me question the sanity of the application: why would it deliberately choose colours so close that they are distinguishable in true colour but blend together in a smaller colormap? Even if the terminal supported true colour, such poorly chosen colours may yet be indistinguishable to the viewer. What now, do we also feed data about my PC monitor and eyeballs into the system? Alternatively, a terminal that purports to support true colour but then does a poor remapping of it is buggy. Fix it, use something else, disable colours in it, or strip the colour capability flag. The solution doesn’t need to be so complicated.

I don’t love terminals, but one of their redeeming qualities is that they are, in the end, relatively simple. I think we can keep it so, and even make them simpler than they are now, but the effort would be hindered if people worried too much about absurd corner-case scenarios and broken, unpatchable software that they absolutely must use in conjunction with newer software that rightly assumes the program on the other end isn’t broken.

                                                                                              2. 1

                                                                                                Unfortunately a list of terminal types doesn’t solve the nesting problem. Are you going to remember to reset the environment every time you connect to tmux from a different machine? On the other hand, querying allows a program to determine the current state of the actual display terminal. This seems easier than demanding tmux support a superset of all features and downmix on its own.

                                                                                                1. 1

                                                                                                  I don’t think the nesting problem is soluble, even if feature-querying were practical. An application would have to re-query every time it drew to the screen, and… I’m pretty sure a lot of applications wouldn’t bother. (also, how should tmux respond to feature requests when running detached, or when attached to multiple terminals with different capabilities?)

                                                                                                  It’s no doubt a lot of work for the tmux people to support a superset of all features and downmix them at runtime… but that’s how tmux works today, so it wouldn’t be extra work.

                                                                                        1. 6

                                                                                          It is a fossilized world, and one that I would like to see go.

                                                                                          Curses and its database of terminal incompatibility made some sense back when terminals were hardware, there were no standards for them, and rewiring & reprogramming them would’ve been harder than just slapping a layer on top of it, host-side.

                                                                                          Now it’s just legacy and bloat that at best does nothing for you, and at worst causes trouble because your database doesn’t have the right entries or your curses implementation is cranky. Bloat – on OpenBSD, about 36k lines of C, plus the database – in a library that just about every application using terminal escapes will load.

                                                                                          1. 2

How would you propose an alternative should work? (And yes, it’s a leading question, but I’m fishing for ideas…)

                                                                                            1. 8

Start with a standard. Yes, that means dropping support for ancient crap. If someone really wants to play with old glass on new systems, they can take the hit and run that bloated compatibility layer.

                                                                                              Although I’m not too fond of anything that is in use now, the deprecated ANSI standard or the de-facto xterm standard do work. I wrote a code editor that uses a tiny subset of the ANSI escapes and it ought to work on practically any terminal one might run into these days. If there are a few weird ones that just don’t support these escapes, I say fix them or nuke them. There are plenty of terminal emulators to choose from, and it’s not terribly difficult to make a new one.

Long term, I’d like a modernized, forward-looking standard. For instance, I think it is really silly that there are so many keys and inputs we cannot reliably use because people couldn’t agree on a way to encode arbitrary inputs. And then there are hacks like waiting some arbitrary time to determine whether you pressed the ESC key or sent some escape sequence starting with that same character. This is a thing that can be solved.
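That timing hack looks roughly like this (a sketch for POSIX systems; real programs tune the timeout and parse complete sequences):

    import os, select, sys, termios, tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    tty.setcbreak(fd)  # read keys unbuffered, without echo
    try:
        ch = os.read(fd, 1)
        if ch == b"\x1b":
            # A lone ESC keypress, or the start of an escape sequence?
            # The wire format is ambiguous; only a timeout can tell.
            ready, _, _ = select.select([fd], [], [], 0.05)
            if ready:
                print("escape sequence:", ch + os.read(fd, 16))
            else:
                print("bare ESC key")
        else:
            print("key:", ch)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)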

                                                                                              I’m not terribly concerned with extensions – after all terminals aren’t changing that much these days. And frankly most of the change seems to be in purely stylistic stuff like colours. So it is possible to write the standard such that these purely stylistic codes must be parsed but may be ignored. Thus, colour, italics, bold and such become entirely optional, and there may be space left to encode nonstandard stylistic instructions. Then applications may be written under the assumption that whatever style it wants to use, it can send, and the receiving terminal supports it – if they don’t, no harm done, they just won’t see the colour. No need to query about these.

                                                                                              Then if there’s still a pressing need for it, I wouldn’t object to having some way to query the terminal about feature support. Actually, I don’t particularly care how the query is done – these flags could be passed in the environment, a config file, whatever. Or they may be given by the terminal in response to a code that asks for them. Just like you can query for the cursor location already on existing terminals. Implementation details. See cpuid, opengl extensions, etc.
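The cursor-location query mentioned above works like this, and a feature query could ride the same in-band mechanism (a sketch; ESC[6n is the standard Device Status Report, answered as ESC[row;colR):

    import os, re, sys, termios, tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    tty.setcbreak(fd)
    try:
        sys.stdout.write("\x1b[6n")  # DSR: "report cursor position"
        sys.stdout.flush()
        reply = b""
        while not reply.endswith(b"R"):  # reply is ESC [ row ; col R
            reply += os.read(fd, 1)
        row, col = map(int, re.findall(rb"\d+", reply))
        print(f"cursor at row {row}, column {col}")
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)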

The important thing is that pointless differences just need to go away. If you support feature X, there should be exactly one standard encoding of the control codes that make up X. No silly database to work out how 300 different terminals produced 700 encodings for the same basic feature that has been around for decades. If there’s some non-standard extension Y, other people willing to implement that same feature must follow the same rules. Eventually, non-standard extensions can be incorporated into the standard. Again, see OpenGL.

                                                                                              1. 2

                                                                                                In terms of supporting arbitrary key inputs, have you seen the proposal from libtermkey’s author?

                                                                                                1. 2

                                                                                                  I have seen it, yes. And it is a fair proposal that does a fair job of maintaining compatibility with historic behavior while requiring relatively small changes to support the new scheme in current terminal emulators.

                                                                                                  However, I might be a little too idealistic to just embrace that idea as great. A pet peeve of mine is programs that confuse letters and keys, and then hold some keys special because they’re not letters. Applications really have two modes of taking input – one is text, which might have been typed out, or pasted, or produced by an IME. The other mode is just taking abstract inputs and binding them to actions. Most nontrivial applications kinda need both.

When the artificial confusion of letter inputs, modified letter inputs and non-letter inputs is maintained, you’ll find that your application is unable to bind an action directly to control or shift, or to use some letter key as a modifier. Or, indeed, any other key that wasn’t explicitly given the special role of control and alt.

And what about non-keyboard inputs? Terminals may support mice too. Again, here I just see unnecessary artificial boundaries, in that these things are flat-out unable to process anything that is not a keyboard, or perhaps one of the two big mouse buttons. In fact it is possible to encode any input uniquely, whether from a fancy keyboard, the seventeenth thumb button of my mouse, or my belly button. It is not hard, if people would only quit looking at the concrete and think in the abstract for a minute.

                                                                                                  So it should be possible to encode any combination of inputs (thus allowing any of these inputs to act as a modifier). That same encoding may include the text (or single letter) that is associated with that combination. Then it is up to the application to treat it as text input or as something abstract.
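Purely to make that abstract idea concrete (a hypothetical model, not a proposed wire format; all the names here are invented): give every physical input a unique identifier, treat a chord as a set of identifiers, and let the event optionally carry the text it produced.

    from dataclasses import dataclass
    from typing import FrozenSet, Optional

    @dataclass(frozen=True)
    class InputEvent:
        # Any pressable thing gets a unique name: "key:a", "key:lshift",
        # "mouse:button17"... no input is hard-wired as "special".
        chord: FrozenSet[str]
        # Text produced by the chord, if any; the application decides
        # whether to treat the event as text or as an abstract binding.
        text: Optional[str] = None

    events = [
        InputEvent(frozenset({"key:a"}), text="a"),
        InputEvent(frozenset({"key:lshift", "key:a"}), text="A"),
        InputEvent(frozenset({"key:a", "key:s"})),   # a letter key as modifier
        InputEvent(frozenset({"mouse:button17"})),   # a non-keyboard input
    ]
    for ev in events:
        print(f"text {ev.text!r}" if ev.text else f"bind {sorted(ev.chord)}")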

                                                                                                  Of course this would be a much bigger and more intrusive change – not so great in terms of backwards compatibility.

                                                                                          1. 5

                                                                                            I first got access to a computer as a child, so I never learned to type “properly”. By the time somebody did try to teach me, suppressing my bad typing habits was nearly impossible: if I forgot to do things “correctly”, it would Just Work and I might not even notice I’d slipped up.

                                                                                            So, I switched my keyboard layout to Dvorak. I didn’t move keycaps around, although for the first week or two I kept a window open with a picture of the Dvorak layout in it, for reference. This time around, I made sure to practice hitting keys with the “correct” fingers in the “correct” positions. Each time I slipped up and went back to my old habits, I started typing gibberish, so I had immediate feedback I’d done something wrong. With regular IRC activity to motivate typing speed, I felt comfortable in Dvorak after maybe a month or so?

I have no idea how fast I type in Dvorak, or how it compares to how fast I typed in QWERTY; I think my QWERTY speed was pretty slow, so as long as I type faster than that, I’m fine.