1. 75
  1.  

  2. 27

    It’s important to distinguish between regex-based syntax highlighting, which is what most editors do, and semantics-based highlighting, which is what IntelliJ does. IntelliJ has been stuffing important info into highlighting for a long time.

    • the baseline is that names are colored differently depending on what they refer to (locals vs params vs classes vs interfaces)
    • there are additional decorations when something happens (static things are in italics, places of implicit coercions/casts are marked, (in Rust) mutable things are underlined, unsafe operations are colored red)
    • there are some extra contextual highlights, like highlighting all refs of the identifier at the caret, all exit points, recursive calls, suspend points.
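
    To make the contextual highlights concrete, here is a minimal sketch in Python using the standard ast module (it has nothing to do with how IntelliJ implements this). It collects the exit points and recursive calls of one function. The highlights helper and the sample function are hypothetical, and matching recursive calls by name is a simplification: a real IDE resolves what the name actually refers to.

    ```python
    import ast

    SOURCE = '''
    def factorial(n):
        if n < 0:
            raise ValueError("negative")
        if n == 0:
            return 1
        return n * factorial(n - 1)
    '''

    def highlights(source, func_name):
        """Collect (kind, line) pairs an editor could decorate inside one function."""
        tree = ast.parse(source)
        func = next(n for n in ast.walk(tree)
                    if isinstance(n, ast.FunctionDef) and n.name == func_name)
        marks = []
        for node in ast.walk(func):
            if isinstance(node, (ast.Return, ast.Raise)):
                # Every return or raise is a place where control leaves the function.
                marks.append(("exit", node.lineno))
            elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                  and node.func.id == func_name):
                # Name-based match only: shadowing or aliasing would fool it.
                marks.append(("recursive-call", node.lineno))
        return sorted(marks, key=lambda mark: mark[1])
    ```

    Calling highlights(SOURCE, "factorial") gives three exit marks (lines 4, 6 and 7 of the sample) and one recursive-call mark on line 7. A regex-based highlighter cannot produce these reliably, because it has no notion of which function a line belongs to.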

    It’s not like we need to radically rethink anything, the tooling has been there for a long time for a certain selection of popular languages.

    1. 7

      the tooling has been there for a long time for a certain selection of popular languages

      Every one of the suggestions in the article except rainbow braces and nesting is already in daily use in C# for both Visual Studio and Rider, the two biggest C# IDEs. In both cases the color schemes used are completely configurable and can include text color, background color, font and font style. The metadata highlighting is done with colored underlines and in some cases icons in the margin.

      I assume the reason they don’t have the nesting highlighting is that indentation style guidelines are well thought out in C#, and that is usually enough to see how things are nested. Rainbow braces would be nice to have, but configurable highlighting of the selected brace means it is at least possible to actively read that information if needed.

      1. 9

        More and more I’m realizing I’m missing out on the cutting edge of developer tooling simply because I don’t work in JVM or .NET. Time to relearn F#, I guess…

        1. 2

          I use IntelliJ for a primarily Perl project. There is a great plug-in for semantic Perl support that IntelliJ offered to install the first time I opened a Perl file. Requires a fast machine or lots of patience. I end up in powersave mode most of the time.

          1. 1

            I work with (enterprise-y) JavaScript and I benefit from IntelliJ/WebStorm, so much so that I’ve bought a personal copy. I can recommend getting one of the trial licences (there probably are some) and trying it for a month.

            It’s not for everyone. You may dislike it, and certain things will annoy you to no end. But you can’t tell if you don’t try it.

        2. 6

          Tree-sitter for neovim looks promising as well.

          1. 5

            It’s true that there’s been some improvement in the IDEs, and it’s not just IntelliJ. VS Code does a lot of it as well. I can also get other types of decorations. For example, the Elm extension in VS Code gives me an “exposed” marker on functions, it shows the number of references to each function (which I can expand into a view), highlights unused imports etc.

            Perhaps things are ok in terms of highlighting, and what I’d really like is the ability to have dynamic, queryable views of the source code, which is what the post alludes to as well, such as highlighting functions which are over N lines long, or get a view of all functions which use a particular type, or a view of all functions with N or more levels of nesting. Some of these things can be done with search, but only simple ones, and not very conveniently.
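
            As a rough sketch of what such a query could look like (not something these IDEs expose in this form), here is Python’s standard ast module answering “which functions are over N lines long?” and “which have N or more levels of nesting?”. The sample source, the function names and the choice of what counts as a nesting level are all assumptions made up for illustration.

            ```python
            import ast

            SOURCE = '''
            def short(x):
                return x + 1

            def nested(items):
                for item in items:
                    if item:
                        while item > 0:
                            item -= 1
                return items
            '''

            # Statement types that open a new nesting level (an assumption of this sketch).
            NESTING_NODES = (ast.If, ast.For, ast.While, ast.With, ast.Try)

            def max_nesting(node, depth=0):
                """Return the deepest level of nested control-flow blocks under `node`."""
                deepest = depth
                for child in ast.iter_child_nodes(node):
                    child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
                    deepest = max(deepest, max_nesting(child, child_depth))
                return deepest

            def query_functions(source, min_lines=0, min_nesting=0):
                """Return names of functions at least `min_lines` long and `min_nesting` deep."""
                hits = []
                for node in ast.walk(ast.parse(source)):
                    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                        length = node.end_lineno - node.lineno + 1
                        if length >= min_lines and max_nesting(node) >= min_nesting:
                            hits.append(node.name)
                return hits
            ```

            With this, query_functions(SOURCE, min_lines=5) and query_functions(SOURCE, min_nesting=3) both return only nested. The point is less the twenty-odd lines than the fact that each new question currently needs its own little script, which is exactly the inconvenience.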

            1. 9

              I think most of these things (except the dynamic part) are handled by SSR (Structural Search and Replace, available since 2006):

              https://www.jetbrains.com/help/idea/search-templates.html#script_constraints

              For the dynamic part, one needs to write a custom plugin.

              1. 1

                Interesting! I think this might be specific to Java or some subset of languages. I haven’t seen this feature in RubyMine or WebStorm.

          2. 16

            Most text editors (and even IDEs) have a surprising lack of semantic knowledge. Editing programs as flat text is brittle and prone to error. If we had better, language-aware transforms and querying systems built into our text editors, we’d be able to more easily build interactive tools/macros/linters rather than relying on the batch processing we use these days.

            Some cool, language-aware tools that exist today (ish) are:

            1. 17

              The problem is that plain text is proven to carry information across thousands of years, whereas custom formats rot. I can read a paper from 1965 and understand the Fortran source in it, but it’s next to impossible to read many binary formats from the 90s without custom code.

              I think that we need to focus on simplifying the analysis of languages: effect systems and limiting global state should make it easier to analyze the semantics of syntactic structures, and thus make structure and semantics easier to highlight. (I certainly don’t need syntax highlighting when working in Haskell, but it’s hard not to miss it in C-likes.)

              1. 2

                Well, ASCII has only been around for a few decades, so I’m not sure it’s been shown to last thousands of years yet ;)

                Granted, there haven’t been any graph (for program ASTs) or table-like (for other program data) data structures that are as pervasive as ASCII or UTF-8 plaintext, and if you want to argue that it makes sense to keep the serialization format plaintext so it’s human readable (like JSON or graphviz or CSV), that’s fine. It still doesn’t prevent us from storing more rich and semantic information beyond just flat symbolic source code.

                The problem with source code is that it’s difficult to build a parser for it, and there’s only one representation for code. For instance, if all source code were stored as an AST in JSON, think of how easy it would be for you to build custom tools to analyze and transform your code. (This is a terrible idea for other reasons, but it illustrates the idea.)
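
                To illustrate (and only to illustrate), here is a sketch using Python’s standard ast and json modules. The to_jsonable and names helpers are hypothetical, and the conversion only handles constants that JSON can represent, such as strings and numbers.

                ```python
                import ast
                import json

                def to_jsonable(node):
                    """Convert an ast node into plain dicts and lists that json can serialize."""
                    if isinstance(node, ast.AST):
                        out = {"type": type(node).__name__}
                        for field, value in ast.iter_fields(node):
                            out[field] = to_jsonable(value)
                        return out
                    if isinstance(node, list):
                        return [to_jsonable(item) for item in node]
                    return node  # str, int, None, ... (bytes or complex constants would not serialize)

                def names(doc):
                    """A 'custom tool' over the JSON form: yield every identifier in the tree."""
                    if isinstance(doc, dict):
                        if doc.get("type") == "Name":
                            yield doc["id"]
                        for value in doc.values():
                            yield from names(value)
                    elif isinstance(doc, list):
                        for item in doc:
                            yield from names(item)

                tree = ast.parse("total = price * 2")
                # Round-trip through a JSON string to show nothing Python-specific survives.
                doc = json.loads(json.dumps(to_jsonable(tree)))
                ```

                Here list(names(doc)) yields total and price. Once the tree is plain data, the analysis needs no parser at all, only a walk over dicts and lists.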

                1. 2

                  True, I’m using a wider definition of “plain text” than just ASCII.

                  You’re right about being able to deserialize plain text into more semantically interesting structures, of course. Then, though, you’re tying visualization (or, at least, editing) to a probably-limited set of tools. I think about the XML ecosystem, which fifteen years ago probably seemed unstoppable, a sure bet for further tool development… but these days the only really powerful one of which I’m aware is Oxygen, which is dated and costs $lots for business licenses.

                  Other problems are possible as well, such as vulnerability to deserialization attacks, like CVE-2017-2779.

                  Ultimately I think that many things could be helped by plain text structures that allow more sophisticated namespacing and structuring than the usual function/class/const options we get: first class modules, as in OCaml, for example. I think these sorts of things are coming, but it’s a slow process.

              2. 10

                Editing programs as flat text is brittle and prone to error.

                I strongly disagree. I work on a rather large code base and nearly everyone on my team prefers to use vim or emacs. There’s something to be said for walking through a neighbourhood rather than driving through one when you want to buy a house. The vast majority of our time (99%?) is spent reading code or debugging rather than writing code. Every line of code should be thoughtful and we should ALWAYS optimize for readability. Not just the semantics of variables and objects but the design of the whole system.

                Languages like Java are impossible to write without tool assistance. They’re aggregating large and miserable frameworks where code refers to variables in other files through inheritance and all that other stuff. Just trying to figure out which implementation of foo() an object will call can be difficult or impossible without assistance. That sort of complexity now needs to be internalized in your limited human memory banks as you try to make sense of it all.

                1. 2

                  Oh, I use Vim too – I dislike IDEs for their bloat, and I also prefer languages that are more oriented towards small, compact solutions (and even have an interest in taking it to an extreme with, e.g., APL). If the entire program can be kept in a single file (or even better, a single page of text), all the better. Spatial compactness is useful for understanding and debugging, and less code means fewer bugs.

                  My original point still stands though. Having better tools doesn’t mean code quality has to suffer. The fact of the matter is that we end up having larger codebases that require more complicated code transforms or linting checks. At minimum, having a syntax-aware way of doing variable or function renaming in a text editor is superior to blindly running sed over an arbitrarily[1] line-oriented character array. Even from a programmer’s perspective, I’m not convinced a purely symbolic representation of code is always superior. It’s certainly a compact and information-dense way of viewing small pieces of code, but it quickly becomes overwhelming when coming to grips with larger systems. Plus, there’s only so much info you can cram into one screenful of code.

                  I think, ideally, we’d have multiple ways of viewing the same code depending on the context we’re working in. For instance, when trying to jump into a new codebase to add a feature, data flow is more important than directly understanding the specific implementations of any function. It would be useful to be able to take a function, and view it in the context of a block diagram to see how it fits into the rest of the system and all code paths that lead to it. In another situation, you may want to view it from a documentation perspective that allows you to semantically tie documentation, proofs, formulas, or diagrams directly into the code, even to specific expressions (kind of like docstrings, but more structured and format rich). Or in a situation where you’re working with a protocol, rather than having an implicit finite state machine that’s only viewable from a code point of view (with a switch statement or through functions that are tail called), you could flip into a graphical view of the FSM or a tabular view of the state transitions.

                  Some of the things I’ve mentioned above are somewhat possible today with external tools, but the problem is they each construct their own AST and semantic knowledge of the source (sometimes incorrectly). There’s no communication between the tools, no referential integrity (if you update the source, do you have to rebuild an index for each tool from scratch?). A standardized, semantic storage format for code would help to address some of these issues.

                  [1]: I say arbitrarily here, because sometimes the line-oriented nature of sed or grep conflicts with the true expression-oriented structure of the code. For instance, if a function signature is split across two lines, trying to search for all return types with grep -e '\w+ .*(.*).*{' wouldn’t work. Besides, most syntax structures are recursive, which regexes are inherently limited at parsing.
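
                  A small illustration of that footnote, as a sketch in Python rather than with grep itself. The sample source and both helper functions are made up for the example; the per-line regex stands in for any line-oriented tool, and the second helper uses the standard ast module (ast.unparse needs Python 3.9 or later).

                  ```python
                  import ast
                  import re

                  SOURCE = '''
                  def area(width: int, height: int) -> int:
                      return width * height

                  def describe(
                      name: str,
                      verbose: bool = False,
                  ) -> str:
                      return name
                  '''

                  # Line-oriented search, in the spirit of grep: one regex applied per line.
                  LINE_PATTERN = re.compile(r"def (\w+)\(.*\) -> (\w+):")

                  def return_types_by_line(source):
                      """Map function name to return type, looking at one line at a time."""
                      found = {}
                      for line in source.splitlines():
                          match = LINE_PATTERN.search(line)
                          if match:
                              found[match.group(1)] = match.group(2)
                      return found

                  def return_types_by_syntax(source):
                      """Same query, but asked of the parsed syntax tree instead of the text."""
                      found = {}
                      for node in ast.walk(ast.parse(source)):
                          if isinstance(node, ast.FunctionDef) and node.returns is not None:
                              found[node.name] = ast.unparse(node.returns)
                      return found
                  ```

                  The line-based version finds only area, because the signature of describe spans four lines; the syntax-based version finds both, regardless of how the code is wrapped.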

                  1. 4

                    It would be useful to be able to take a function, and view it in the context of a block diagram to see how it fits into the rest of the system and all code paths that lead to it.

                    I think it would be more useful if a function had to consider less and less the rest of the system. Otherwise you have a poor contract and high coupling. I think code and architecture need to blend together and if you need a tool to make sense of it all then you’ve failed.

                    This is a perfect example of nightmarish code for me. There’s about 200 methods and maybe 6 or 7 deep on the inheritance chain. It’s barely possible to manage even with an IDE and a WPF textbook sitting on your desk. https://docs.microsoft.com/en-us/dotnet/api/system.windows.shapes.rectangle?view=netcore-3.1

                2. 8

                  I strongly agree with you. We’ve been hamstrung by the primitive editors for decades. This fixation on text cripples other tools like version control as well - semantic diffs would be an obvious improvement but it’s rarely available. (The usual counterarguments about the universality and accessibility of text don’t stack up to me.)

                  1. 2

                    The insistence on using plain text for canonical storage, API interface, and user interface is IMO the thing most holding us back (some other top contenders being the pursuit of “performance” and compilation-as-DRM).

                    1. 10

                      Looking at the current web, I would have to disagree with the idea that the pursuit of performance is holding anything or anyone back…

                      1. 1

                        If you’d seen all the node-gyp build failures I had, you might think differently. But I’m thinking more about stack busting and buffer overruns at runtime and hobbled tooling at devtime in this case.

                        1. 2

                          Native modules and the whole node-gyp system are horrible, but I don’t think that’s due to pursuing performance? Most of the time, packages with native code seem to just have taken the easiest path by creating node bindings for an existing library, and I don’t think node-gyp itself is bad due to a pursuit of performance…

                          AFAIK, though this could be wrong, the main reason for node’s horrible native code support is that people just use the V8 C++ API directly, and Google is institutionally incapable of writing stable interfaces which other people can depend on. They constantly rename methods, rename or remove classes, move header files around, even deprecate functionality before replacements exist. Even that isn’t just due to a pursuit of performance though, but due to a fear of tech debt and a lack of care for anyone outside of Google.

                  2. 2

                    Comby is definitely a huge upgrade from writing regexps. There’s also Retrie for Haskell and Coccinelle + coccigrep for C. I’d really love to see a semantic search/replace/patch tool for Rust…

                  3. 6

                    Something else people may be interested in: semgrep.

                    1. 4

                      I found the circle just as fast with and without color…

                      1. 2

                        The blurry anti-aliasing highlighted it as much as color did.

                        1. 2

                          I didn’t notice because of that. I noticed because it was the only circle in a sea of identical objects.

                      2. 4

                        To the question “Why aren’t things this way?”, I think the answer is that simple syntax highlighting is just good enough. Maybe it can be improved a bit, maybe it will make it worse. But overall, any change one way or the other is not going to make developers massively more productive.

                        1. 3

                          Pinging @hwayne, since he’s the author of this.

                          So, a lot of these tend to follow the sorts of ad-hoc structures I tend to build using ripgrep, reference following, or debugging.

                          I think this would be best combined with something like that datalog-based IDE, so that one could build these analysis passes on the fly.

                          Granted, it would be a lot of work to pull off.

                          1. 5

                            IMO the ability to make hyperspecialized one-off analysis is 1) incredibly powerful and 2) something we don’t really know how to make accessible to generalist programmers.

                            1. 1

                              What tools exist in that space as things are right now, short of writing/modifying a compiler?

                              For me, code analysis has either been something baked into a compiler, or something I’ve done by hand, more or less, sometimes with mechanical aids, and sometimes not.

                              1. 1

                                I suppose the exceptions to that would be things like writing interpreter hooks, or using method_missing type approaches.

                                1. 1

                                  I wish I knew more about it… almost everybody I know who seems to do this stuff is doing it with regexes and grep.

                                  1. 1

                                    I could see something akin to XPath working out, if you had the syntax tree in some sort of “regular” form. Of course, that still makes it hard to figure some things out, and what you can pull off depends largely on what has been exposed on the tree in question.
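
                                    As a rough sketch of that idea, assuming Python as the language being queried: the standard ast module gives the tree, and xml.etree.ElementTree (which supports only a limited subset of XPath) gives the query language. The to_xml helper and the sample source are hypothetical, and only simple string and integer fields are exposed as attributes, which is exactly the “what has been exposed on the tree” limitation.

                                    ```python
                                    import ast
                                    import xml.etree.ElementTree as ET

                                    def to_xml(node):
                                        """Mirror an ast node as an XML element: tag = node type, attributes = simple fields."""
                                        element = ET.Element(type(node).__name__)
                                        for field, value in ast.iter_fields(node):
                                            if isinstance(value, (str, int)):
                                                element.set(field, str(value))
                                            children = value if isinstance(value, list) else [value]
                                            for child in children:
                                                if isinstance(child, ast.AST):
                                                    element.append(to_xml(child))
                                        return element

                                    SOURCE = '''
                                    def load(path):
                                        return open(path).read()

                                    def save(path, data):
                                        open(path, "w").write(data)
                                    '''

                                    root = to_xml(ast.parse(SOURCE))

                                    # "All function definitions, anywhere in the tree."
                                    function_names = [f.get("name") for f in root.findall(".//FunctionDef")]

                                    # "Functions that contain a .write attribute access somewhere inside them."
                                    writers = [f.get("name") for f in root.findall(".//FunctionDef")
                                               if f.find(".//Attribute[@attr='write']") is not None]
                                    ```

                                    This yields load and save for the first query and only save for the second. The second query already needs a Python loop around the path expression, because the XPath subset in ElementTree cannot nest a descendant search inside a predicate.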

                            2. 3

                              These are all just variations of syntax highlighting, they all still “waste” the information channel.

                              If you really want to make the most of the color channel, make color syntactically defining, like in colorForth.

                              1. 3

                                I think that could possibly be one of the modes, but there are many different, potentially conflicting ways to colour a program depending on the task at hand, and I’d rather have the ability to toggle modes. I think that would be more useful than a rigid colour scheme imposed by semantics.

                                1. 4

                                  All of which require a projectional editor that works in both directions. On a related note, the idea of writing code where colour is an important input strikes me as a nightmare to use, and I’ve got full-colour vision.

                                  1. 1

                                    Depends on the language and the UX, I guess.

                                    1. 1

                                      On a related note, the idea of writing code where colour is an important input strikes me as a nightmare to use, and I’ve got full-colour vision.

                                      I think it’s safe to say that as cool as colorForth is/was, it’s never ever going to happen in a non-niche way. Though, Piet is thriving…

                                2. 2

                                  I like the idea of different syntax coloring that you can switch on and off for different contexts, but these context switches increase the cognitive load of using the color information channel. A different syntax highlighting mode for debugging, greenfield coding, legacy coding and reviewing would lead to a weaker feel for the highlight meaning of each mode.

                                  If we are talking next-generation tools, then types, options and exceptions should be handled much more comprehensively than by highlighting in the editor. The syntax checker, linter, type checker or compiler will let you know what is going on, e.g. that you have a non-exhaustive pattern match, or that T was expected but option(T) was found. Since these are not just “this code region is a member of this class” but “this code region has this additional warning/error information”, they would be better handled by the editor’s syntax error features.

                                  For code review and legacy editing we are going to end up with ML tools with ratings and suggestions tied to code regions. I don’t see several dozen syntax highlighting regexes as worthwhile because each is specific and not generalized.

                                  1. 2

                                    Another idea that I’ve seen is highlighting the current context differently, for example coloring the hovered function and graying out the others.

                                    I guess you could write plugins to have all these options available, but it would become messy to accommodate everyone’s needs.


                                    PS: Yet another static blog requiring JS to display static pages. 🙁 Rendered HTML is included in a <noscript> element but the CSS makes it broken.

                                    1. 1

                                      PS: Yet another static blog requiring JS to display static pages. 🙁 Rendered HTML is included in a <noscript> element but the CSS makes it broken.

                                      Works for me, with uMatrix and all JS blocked by default. The page doesn’t have margins, which makes it look a bit ugly, but the text is readable.

                                    2. 2

                                      I like his suggestion to make syntax highlighting modal with different dimensions, e.g. ctrl-alt-1, 2, etc. could be assigned to block-level, brace-level, symbol-level highlighting.

                                      1. 2

                                        The more specific you make your syntax highlighting the less often it will occur and the less likely you are to remember what it actually means. There’s a balance to find there somewhere.

                                        1. 1

                                          Color carries a huge amount of information. Color draws our attention.

                                          I think this is the key point, and the reason why I dislike syntax highlighting: Color draws my attention – and I don’t need my attention drawn to ‘if’: it’s distracting. Most coloring that happens automatically is trying to answer questions that I didn’t ask.

                                          Highlighting of questions I ask about the code is where I suspect color would be most effective. I don’t want it to be automatic – it should happen when I ask my editor a question. For example, if I ask the editor to do dataflow analysis for me, I’d like it to highlight which assignments and function calls a value can reach and flow through.