I really want matrix to succeed, but the issues are plentiful.
The fact that self-hosting synapse in a performant manner is no trivial feat (this is slowly improving), compounded by the fact that no mobile client yet supports sliding sync (ElementX when?), makes my user experience in general very miserable. Even the element-desktop client has horrible performance, unable to make use of GPU acceleration on nearly all of my devices.
It’s nothing particularly novel to matrix: rendering UIs on the CPU tends to use more battery than the hardware component whose entire goal is rendering, and it’s hard to hit the increasingly-high refresh rates expected solely via CPU rendering.
A chat application ought to do very infrequent redraws, basically when a new message comes in or whenever the user is composing, worst case when a 10fps gif is being displayed. I find it concerning that we now need GPU acceleration for something as simple as a chat to render itself without feeling sluggish.
Rendering text is one of the most processor-intensive things that a modern GUI does. If you can, grab an early Mac OS X machine some time. Almost all of the fancy visual effects that you get today were already there and were mostly smooth, but rendering a window full of text would have noticeable lag. You can’t easily offload the glyph placement to the GPU, but you can render the individual glyphs and you definitely can composite the rendered glyphs and cache pre-composited text blocks in textures. Unless you’re doing some very fancy crypto, that will probably drop the power consumption of a client for a plain text chat protocol by 50%. If you’re doing rich text and rendering images, the saving will be more.
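One way to picture the caching being described, as a toy sketch (nothing to do with any real toolkit; the rasterise step here stands in for FreeType or the platform text stack, and a real client would upload the atlas to a GPU texture and composite from it):

#include <stdint.h>
#include <string.h>

enum { CELL = 32, NGLYPHS = 128 };          /* toy: ASCII only, fixed cell size */

static uint8_t atlas[NGLYPHS][CELL * CELL]; /* 8-bit coverage per glyph */
static int     cached[NGLYPHS];

/* Stand-in for the expensive CPU rasterisation step. */
static void rasterise(uint32_t cp, uint8_t *dst)
{
    memset(dst, (uint8_t)cp, CELL * CELL);
}

/* Pay the rasterisation cost once per glyph; every later draw is a cheap blit
 * (or, on a GPU, a textured quad sampling the cached cell). */
const uint8_t *glyph(uint32_t cp)
{
    if (cp >= NGLYPHS)
        return NULL;
    if (!cached[cp]) {
        rasterise(cp, atlas[cp]);
        cached[cp] = 1;
    }
    return atlas[cp];
}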
The downside with the rugged texture-atlas approach is that the distribution of glyphs in the various cached atlases in every process tends to be substantially re-invented across multiple graphics sources, and those atlases make up quite a bit of your local and GPU RAM use. The number of different sizes, styles and so on isn’t that varied unless you dip into some kind of opinionated networked document, and even then the default is the default.
My point is that there is quite some gain to be had by somehow segmenting off the subsurfaces and somewhat splitting the load – a line-packing format in lieu of the pixel-buffer one, with the LTR/RTL toggles, codepoint or glyph-index lookup (so the client needs to know at least the GSUB of the specific font-set) and attributes (bold, italic, colour, …) going one way, and kerning feedback for picking/selection going the other.
That’s actually the setup (albeit there’s work to be done specifically in the feedback / shaping / substitution area) used in arcan-tui. The initial connection populates font slots and preferred size with a rough ‘how does this fit a monospaced grid w/h’ hint. Clients using the same drawing properties share a glyph cache. We’re not even at the atlas (or worse, SDF) stage yet, and the savings are substantial.
The downside with the rugged texture-atlas approach is that the distribution of glyphs in the various cached atlases in every process tends to be substantially re-invented across multiple graphics sources, and those atlases make up quite a bit of your local and GPU RAM use
I’m quite surprised by this. I’d assume you wouldn’t render an entire font, but maybe blocks of 128 glyphs at a time. If you’re not doing sub-pixel AA (which seems to have gone out of fashion these days), it’s 8 bits per pixel. I’d guess a typical character size is no more than 50x50 pixels, so that’s around 300 KiB per block. You’d need quite a lot of blocks to make a noticeable dent in the > 1GiB of GPU memory on a modern system. Possibly less if you render individual glyphs as needed into larger blocks (maybe the ff ligature is the only one that you need in that 128-character range, for example). I’d be really surprised if this used up more than a few tens of MiBs, but you’ve probably done the actual experiments so I’d be very curious what the numbers are.
That’s actually the setup (albeit there’s work to be done specifically in the feedback / shaping / substitution area) used in arcan-tui. The initial connection populates font slots and preferred size with a rough ‘how does this fit a monospaced grid w/h’ hint. Clients using the same drawing properties share a glyph cache. We’re not even at the atlas (or worse, SDF) stage yet, and the savings are substantial.
That sounds like an interesting set of optimisations. Can you quantify ‘substantial’ at all? Do you know if Quartz does anything similar? I suspect it’s a bit tricky if you’ve got multiple rounds of compositing, since you need to render text to some texture that the app then renders into a window (possibly via multiple rounds of render-to-texture) that the compositor composes onto the final display. How does Arcan handle this? And how does it play with the network transparency?
I recall seeing a paper from MSR at SIGGRAPH around 2005ish that rendered fonts entirely on the GPU by turning each bezier curve into two triangles (formed from the four control points) and then using a pixel shader to fill them with transparent or coloured pixels on rendering. That always seemed like a better approach since you just stored a fairly small vertex list per glyph, rather than a bitmap per glyph per size, but I’m not aware of any rendering system actually using this approach. Do you know why not? I presume things like font hinting made it a bit more complex than the cases that the paper handled, but they showed some very impressive performance numbers back then.
I’m quite surprised by this. I’d assume you wouldn’t render an entire font, but maybe blocks of 128 glyphs at a time. If you’re not doing sub-pixel AA (which seems to have gone out of fashion these days), it’s 8 bits per pixel.
You could’ve gotten away with an alpha-coverage-only 8-bit texture had it not been for those little emoji fellows; someone gave acid to the LOGO turtles and now it’s all technicolour rainbow – so full RGBA it is. While it is formally not a requirement anymore, there are old GPUs around and you can still get a noticeable difference when textures are a nice power-of-two (POT), so you align to that as well. Then come the quality nuances when rendering scaled: since accessibility tools zoom in and out, you want those to look pretty and not alias or shimmer too badly. The better way for that is still mip-mapping, so there is a point to rasterising at a higher resolution, flipping the mipmap toggle and letting the GPU sort out which sampling level to use.
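To put rough numbers on the RGBA + power-of-two + mipmap combination (a throwaway sketch with arbitrarily chosen dimensions, not the actual accounting used anywhere):

#include <stdio.h>
#include <stdint.h>

/* Round an atlas dimension up to the next power of two. */
static uint32_t next_pot(uint32_t v)
{
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

int main(void)
{
    uint32_t w = next_pot(1000), h = next_pot(600);   /* both round up to 1024 */
    size_t base = (size_t)w * h * 4;                  /* RGBA: 4 bytes per pixel */
    /* A full mip chain adds roughly a third on top (1 + 1/4 + 1/16 + ...). */
    printf("atlas %ux%u: %zu bytes, ~%zu with mipmaps\n",
           (unsigned)w, (unsigned)h, base, base + base / 3);
    return 0;
}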
That sounds like an interesting set of optimisations. Can you quantify ‘substantial’ at all? Do you know if Quartz does anything similar?
There was already a big leap for the TUI cases in not having W*H*BPP*2 or so pixels to juggle around, render-to-texture or buffer-to-texture and pass onwards (that could be another *4 because of GPU pipelines and locking semantics, where you easily get drawing-to, in-flight, queued and presenting copies).
The rest was that the font rendering code we have is mediocre (it was 2003 and all that ..) and makes some choices that don’t fit here. We cache on fonts, then the rasterizer caches on resolved glyphs, and the outliner/shaper caches on glyph lookup. I don’t have the numbers available, but at napkin level I got it to around 50-75% overhead versus the uncompressed size of the font. Multiply that by the number of windows open (I drift towards the upper two digits of active CLI shells).
The size of a TPACK cell is somewhere around 8 bytes or so, using UCS4 even (you already need the 32 bits due to font-index addressing for literal substitution), then add some per-line headers. It also does I and P frames, so certain changes (albeit not scrolling yet) are more compact. I opted against trying to be overly tightly packed, as that has punished people in the past, and for the network case ZSTD just chews it up into nothing. It’s also nice having an annotation-compact, text-only intermediate representation to juggle around. We have some subprojects about to leverage that.
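As a rough mental model only (the field names and widths below are guesses at how an ~8-byte budget could be spent, not the actual TPACK wire format):

#include <stdint.h>

/* Hypothetical illustration of an ~8-byte text cell. */
struct cell {
    uint32_t ucs4_or_glyph;  /* UCS4 codepoint, or glyph index for substitution */
    uint8_t  fg, bg;         /* colour/palette indices */
    uint8_t  attr;           /* bold, italic, underline, RTL, ... */
    uint8_t  reserved;
};                           /* sizeof(struct cell) == 8 on common ABIs */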
Do you know if Quartz does anything similar? I suspect it’s a bit tricky if you’ve got multiple rounds of compositing, since you need to render text to some texture that the app then renders into a window (possibly via multiple rounds of render-to-texture) that the compositor composes onto the final display. How does Arcan handle this? And how does it play with the network transparency?
I don’t remember what Quartz did, or how their current *Kits behave, sorry.
For Arcan itself it gets much more complicated and is a larger story, as we are also our own intermediate representation for UI components and nest recursively. The venerable format-string-based ‘render_text’ call at the Lua layer forces text to be rasterised locally, as some genius thought it a good idea to allow arbitrary embedding of images and other video objects. There’s a long checklist of things to clean up, but that’s after I close down the network track. Thankfully a much more plastic youngling is poking around in those parts.
Speaking of networking – depending on the network conditions we outperform SSH when it starts to sting. The backpressure from things like ‘find /’ or ‘cat /dev/random’ resolves and renders locally, and with actual synch in the protocol you have control over tearing.
I recall seeing a paper from MSR at SIGGRAPH around 2005ish that rendered fonts entirely on the GPU by turning each bezier curve into two triangles (formed from the four control points) and then using a pixel shader to fill them with transparent or coloured pixels on rendering.
AFAIR @moonchild has researched this more than I have as to the current standards. Back in ‘05 there was still a struggle getting the text part to behave, especially in 3D. Weighted channel-based hinting was much more useful for tolerable quality as well, and that was easier as a raster preprocess. Eventually Valve set the standard with SDFs, which is still(?) the dominant solution today (it recently made its way natively into FreeType), along with quality optimisations like multi-channel SDFs.
Thanks. I’m more curious about the absolute sizes than the relative savings. Even with emoji, I wouldn’t expect it to be a huge proportion of video memory on a modern system (even my 10-year-old laptop has 2 GiB of video memory). I guess it’s more relevant on mobile devices, which may have only this much total memory.
I will try to remember to actually measure those bits myself; I can’t find the thread where C-pharius posted it on Discord because, well, Discord.
The savings are even more relevant if you hope to either a. at least drive some machines from an FPGA’d DIY graphics adapter instead of the modern monstrosities, b. accept a 10-15 year rollback in terms of available compute should certain conflicts escalate, and c. try to consolidate GPU processing to a few victim machines or even VMs (though the latter are problematic, see below) – both of which I eventually hope for.
I layered things such that the Lua API looks like a balance between ‘animated display postscript’ and ‘basic for graphics’ so that packing the calls in a wire format is doable and asynchronous enough for de-coupling. The internal graphics pipeline also goes through an intermediate-representation layer intended for a wire format before that gets translated to GL calls for the same reason – at any time, these two critical junctions (+ the clients themselves) cannot be assumed/relied upon to be running on the same device / security domain.
Public security researchers (CVE/bounty hunters) have in my experience been pack animals as far as targeting goes. Mobile GPUs barely did their normal job correctly, and absolutely not securely, for a long time, and little to nothing could be heard. From DRM (as in corporate malware) unfriendly friends I’ve heard of continuous success bindiffing Nvidia blobs. Fast > Features > Correct > Secure seems generally to be the priority.
With DRM (as in direct rendering manager) the same codebase hits BSDs and Linux alike, and for any VM compartmentation, VirGL cuts through it. The whole setup is massive. It evolves at a glacial pace and its main job is different forms of memcpy where the rules for src, dst, size and what happens to the data in transit are murky at best. “Wayland” (as it is apparently now the common intersection for several bad IPC systems) alone would’ve had CVEs coming out the wazoo had there been an actual culture around it; we are still waiting around for conformance tests, much less anything requiring more hygiene. Fuzzing is non-existent. I am plenty sure there are people harvesting and filling their barns.
Account creation and SSO will come with OIDC. OIDC will come in September.
the code’s there and works; it just needs to be released and documented. NB that shifting to native OIDC will be a slightly painful migration though; some of the old auth features may disappear until reimplemented in native OIDC, which may or may not be a problem for you.
So the TL;DR is that emacs-lsp sends invalid UTF-16 character offsets on text changes. It sends UTF-8 offsets when the LSP spec says UTF-16. rust-analyzer tries to use these invalid offsets to manipulate internal data and panics, because it ends up trying to change stuff inside a multibyte character. That only happens on certain multibyte stuff for obvious reasons (bottom emoji). The fix would be UTF-8 opt-in (a new LSP option), or sending correct UTF-16 offsets. Fixes exist, but are in upstreaming limbo. Hope I got this right.
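For a sense of what “correct UTF-16 offsets” costs every implementation, the conversion looks roughly like this (a sketch assuming valid UTF-8 and an offset that lands on a character boundary; this is not the actual emacs-lsp or rust-analyzer code):

#include <stdio.h>
#include <stddef.h>

/* Convert a UTF-8 byte offset into a UTF-16 code-unit offset: one unit per
 * scalar below U+10000, two units (a surrogate pair) for every 4-byte
 * UTF-8 sequence. */
size_t utf8_to_utf16_offset(const char *s, size_t byte_off)
{
    size_t units = 0;
    for (size_t i = 0; i < byte_off && s[i]; ) {
        unsigned char b = (unsigned char)s[i];
        if      (b < 0x80) { units += 1; i += 1; }   /* ASCII                 */
        else if (b < 0xE0) { units += 1; i += 2; }   /* 2-byte sequence       */
        else if (b < 0xF0) { units += 1; i += 3; }   /* 3-byte sequence       */
        else               { units += 2; i += 4; }   /* 4-byte: surrogate pair */
    }
    return units;
}

int main(void)
{
    /* "h" + U+00E9 + "llo": byte offset 3 (just after the é) is UTF-16 offset 2. */
    printf("%zu\n", utf8_to_utf16_offset("h\xC3\xA9llo", 3));
    return 0;
}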
Also fun from the article: unicode normalization/denormalization (“é” can be one or two codepoints, if you like combining diacritics). Big emojis are small emojis zero-width-joined. ECMAScript can’t decide between UTF-16 code units and Unicode codepoints (indexing vs iteration).
lsp-mode’s auto-download doesn’t seem to work if you use rustic, at least, and it falls back to RLS, which is completely deprecated. Emacs considers that “characters” are 22 bits wide, just like Unicode (21 bits) + 1 bit to make room for “raw bytes”.
There’s also a lightning review of Zig hidden in plain sight.
I really liked the unicode-utf8-utf16 visualizations. I wish that existed as some kind of website where you can look up stuff and see how it is represented in the different Unicode encodings.
I definitely had to notice the Zig intermezzo - the code required is very verbose. It reminded me of your “Twitch” article mentioning burnout, and made me think how you’re probably also “locked in” to Rust content, as the thing that made your articles self-sustaining.
I haven’t really used LSP but the gist I get is that everyone is vaguely embarrassed by the fact that the spec says everything has to be UTF-16, and instead of fixing it, most people just kind of pretend it’s already been fixed because the alternative of actually having to think about non-UTF8 text gives people traumatic flashbacks to the 1990s and no one has time for that?
Like… the specified behavior is also hilariously the wrong thing, so no one wants to actually defend it and insist on following the spec?
[edit: hadn’t made it to the end of the article where it says the spec now allows negotiation on this as of a few months ago]
Not sure about “instead of fixing it” bit. rust-analyzer (following clangd) supported sane utf-8 offsets since forever. I’d personally be perfectly happy if clients supported only utf8 (which would be a de-jure violation of the spec), as long as they properly advertise utf8 support.
That’s what I meant by pretending it’s already been fixed; ignoring the spec and doing the right thing anyway by counting codepoints instead of using UTF-16 offsets. IMO the right thing to do, but at the same time “please ignore the spec because it’s very stupid” is a hard argument to make.
I actually made that argument a while ago, when every LSP implementation started to realize that the spec mandated UTF-16 offsets (while mandating UTF-8 document encoding…). At that point in time most implementations were either using UTF-8 byte offsets or codepoint offsets and we could have unified on something sensible while pressuring Microsoft to fix the spec to what implementations were actually doing instead of what happened to be convenient to them. Unfortunately that did not happen and every LSP implementation now has to contain the same unnecessary complexity to be compliant. The recent change to LSP added support for alternate offset encoding but still mandates that UTF-16 must be supported.
It’s a bit hard to fix the spec, as it isn’t collectively maintained, and the upstream isn’t particularly responsive. Historically, just doing the thing ahead of the spec, and using the “this is already how things work in the wild” argument, was the most effective way to move forward. Which is sort of what happened with position-encoding as well!
UTF-16 code units are still how quite a lot of things in the web platform are specified, largely because that’s JavaScript’s closest thing to a “character” type, and JS is the native language of that platform. So things like setting a max length of 10 on an HTML form input means “10 UTF-16 code units”, not “10 bytes” or “10 code points” or “10 graphemes”.
Though curiously some parsing algorithms are still specified in terms of code points and not code units, which means implementing them in a “just use UTF-8” language can be trickier than expected. For example, the HTML5 legacy color parsing algorithm (the one that can turn any string, even “chucknorris”, into an RGB color value) requires identifying code points at specific positions as well as being able to perform replacements based on code point values.
UTF-16 code units are still how quite a lot of things in the web platform are specified
And that would be relevant, if the context had anything at all to do with the web rather than a supposedly language-agnostic protocol! But they made a big mistake and let implementation details leak into the spec, and it’s taken them years to admit it.
As the post explains, LSP is based on JSON-RPC, which ties it firmly back to the web domain (JSON-RPC being built around JSON, which in turn is built on top of JavaScript). Plus LSP itself was originally developed for VS Code, an Electron app, which likely had some influence on the selection of things like JSON, JS, etc.
I’m not all sure it is an “implementation detail”, unless JSON itself is considered a detail that shouldn’t leak into the spec. Which would be weird, since usually the data format is supposed to be part of a spec.
(where I’m going with this, ultimately, is JSON being more complex, still, than people generally realize – even though the 2017 JSON RFC waved its hands in the direction of saying JSON ought to be UTF-8, it did so in a way that left loopholes for protocols like LSP to fall back to “JSON is JS is UTF-16”, plus the RFC itself still has UTF-16-isms in it, notably in the escape syntax which requires use of surrogate pairs for non-BMP code points)
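Concretely, the escape-syntax point: to write a non-BMP code point such as U+1F600 in \u escapes, you have to split it into a UTF-16 surrogate pair. A small self-contained sketch of that arithmetic:

#include <stdio.h>
#include <stdint.h>

/* JSON's \uXXXX escape only covers the BMP, so anything above U+FFFF has to
 * be written as a surrogate pair, e.g. U+1F600 -> \uD83D\uDE00. */
int main(void)
{
    uint32_t cp = 0x1F600;            /* a non-BMP code point (an emoji) */
    uint32_t v  = cp - 0x10000;
    printf("\\u%04X\\u%04X\n",
           (unsigned)(0xD800 + (v >> 10)),    /* high surrogate */
           (unsigned)(0xDC00 + (v & 0x3FF))); /* low surrogate  */
    return 0;                                 /* prints \uD83D\uDE00 */
}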
I phrased that badly – the idea was just to point out that JSON explicitly allows for a protocol to do UTF-16 (or, really, any other encoding), and that JSON’s inherent origin in the web domain means “web domain is irrelevant” is wrong.
LSP is caring about UTF-16-y things in a different way.
But this is turning into a super-deep tangent just for what I meant as a throwaway comment about how UTF-16 isn’t as dead as people seem to want to think it is.
The second case is interesting, because it depends on how std::vector is implemented.
If std::vector was {T* data, size_t size, size_t capacity}, then pop_back would likely have no undefined behaviour provided T was a trivially destructible type (a type for which no code runs on destruction). Decrementing size from 0 is (unfortunately, if you ask me) well defined to wrap around, so the sanitizers would not be allowed to complain about it; for all they know we might actually have expected that wrap-around.
In practice, std::vector is usually implemented as {T* begin, T* end, T* end_of_capacity}, in which case forming a pointer to 1-before-begin (which would happen on an empty vector where begin == end) is by itself UB (just forming the pointer, no need to try to dereference it). I wonder why the sanitizers do not detect this.
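To make the layout concrete, here is a C-flavoured sketch of that three-pointer representation (not any particular standard library’s actual code):

/* Common three-pointer layout used by many std::vector implementations. */
struct vec {
    int *begin;           /* first element                 */
    int *end;             /* one past the last element     */
    int *end_of_capacity; /* one past the allocated block  */
};

void pop_back(struct vec *v)
{
    /* On an empty vector begin == end, so this computes begin - 1; merely
     * forming that pointer is undefined behaviour, no dereference needed. */
    --v->end;
}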
in which case forming a pointer to 1-before-begin (which would happen on an empty vector where begin == end) is by itself UB (just forming the pointer, no need to try to dereference it). I wonder why the sanitizers do not detect this.
I think because it would be extremely expensive to check every pointer-arithmetic operation - the sanitizers (address sanitizer, at least) instead check for validity on dereference. This means they can have false negatives, of course.
The difficulty lies around knowing the bounds of the array into which a pointer points. As far as I understand ASan currently just (more-or-less) tracks which memory is allocated, not what is allocated there; for example if the memory contained a structure containing an array, the structure size (and thus the allocation size) are potentially larger than the array size, so you can’t use the allocation bounds to decide whether a pointer has moved outside the array.
Keeping track of whether memory is allocated is as simple as having a shadow-space where a bit or set of bits keeps track of the state of each byte (or slightly larger unit) of memory; tracking type as well would require a much more complicated structure, and a lot of additional instrumentation.
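For reference, AddressSanitizer’s published shadow mapping on 64-bit Linux is just a shift and an offset; the sketch below restates that formula (the constant differs per platform), it is not the sanitizer’s real code:

#include <stdint.h>

/* Each shadow byte describes 8 bytes of application memory: 0 means fully
 * addressable, small positive values mean partially addressable, and other
 * values mark the different kinds of redzones. */
uint8_t *asan_shadow_for(uintptr_t addr)
{
    return (uint8_t *)((addr >> 3) + 0x7fff8000ull);
}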
Yeah, that makes sense. Initially I thought you could just instrument after each pointer arithmetic operation but did not realise that the costly part is to resolve what range of memory is valid for this specific pointer.
I noticed a weird trend where the check for whether something is undefined happens after the undefined behavior. Intuitively it makes sense to me that this is insufficient, so I’m wondering why this failure is so common? For example, here the overflow check happens after the overflow already happened.
I don’t usually do that, but in my case there were 2 reasons:
the initial intention when writing it wasn’t protecting against overflow/UB but simply protecting against the computation going out of bounds for the dereference
the assignment needed to be done early because I was targeting a codebase with some ancient C convention requiring variable declarations to come before any code; and since I wanted the form where I bail out early instead of opening a scope, I had to declare it early:
int x = ...;
if (...)
    return -1;
// x can not be declared here
return f(x);
Not trying to be critical, but it shouldn’t be news that you can’t check for a UB condition after it’s happened. I’ve seen so many cases where people run into similar problems, or problems related to the type-punning rules. Usually the thought process is something along the lines of:
ok, so I know this is supposed to have undefined behavior. But the variable will still have some value, so I can check if the value is in the correct range and so avoid issues that way…
No, you can’t. This is what undefined behaviour means. All bets are off; if it’s hit, it is a bug in the code, full-stop, and no checking afterwards can fix it because the behaviour from that point (*note1) on is totally undefined. Maybe it seems to work in some cases. It doesn’t matter; use a different compiler, or a later version of the same compiler, and all of a sudden it could stop “working”.
Don’t think of the C compiler as some sort of glorified high-level assembler. It’s way more sophisticated than that. There are rules you have to follow, if you are using any of the modern compilers. You don’t have to like it (and there are even switches available that will give behaviour that you want, but that’s not standard C any more) but it is the case. You must not ever invoke undefined behaviour.
Note 1: Actually, I believe the whole program behaviour is undefined if the program exhibits undefined behaviour at any point. So technically, even things that were supposed to happen before the UB might not happen or might happen differently.
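A tiny example of the pattern being warned about, and of why the after-the-fact test is worthless (deliberately broken code):

int bad_increment(int a)
{
    int b = a + 1;   /* if a == INT_MAX, the UB has already happened here    */
    if (b < a)       /* so the compiler may assume this is always false...   */
        return -1;   /* ...and delete this "overflow check" entirely         */
    return b;
}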
You are technically correct, but I’m sure you understand that the consequence of such a policy is that, pushed to the extreme, we could have the situation where a 100k LoC codebase has a single little bug deep down somewhere, and then crashing or executing random code straight at startup is an acceptable behavior.
The cost of a single mistake is very high, that’s the main point I’m trying to make.
What’s the alternative? If the compiler can’t optimise around code that hasn’t been executed yet having UB, then the opportunities for useful optimisation become near non-existent.
The compiler is not doing anything unreasonable here: every step, in isolation, is desirable and valid. If the end result feels unreasonable, then that’s either (a) a problem with the language spec being insufficiently relaxed about what it considers to be UB or (b) a problem with insufficient tooling (or, in the case of a language like Rust, built-in compiler checks) to catch issues like this.
To point the finger at the compiler is a very hard sell indeed because there’s no specific thing to point at that it’s doing wrong.
It might be reasonable not to do the optimization. The alternative, as in Rust, is to actually define the behavior as wrapping, which would be equivalent to using -fwrapv in C. Sure, we lose some optimisations, but is it worth it? I’m starting to believe so.
Yes, I agree: but that’s a problem with the language spec, not the compiler. The language spec should just say ‘overflow has wrapping semantics’. You’ll lose some opportunities for optimisation and compatibility with a lot of older or obscure platforms (some platforms have arithmetic instructions that don’t wrap on overflow, and this is one of the big reasons that the C spec leaves overflow undefined!), but this is enough of a footgun that I think it’s a worthwhile tradeoff in the year of our lord 2022.
But again, this isn’t GCC’s fault: this is the fault of the language spec and the compromises that went into its creation. Don’t like it? Time to get a new language (this isn’t me trying to be gatekeepy: horrid footgun shit like this is a big reason I moved to Rust and never looked back).
Not saying it’s GCC’s fault, but just because a spec made a mistake doesn’t mean GCC should be braindead about it: it holds a responsibility for all the software in C out there. Nothing forces GCC to do dangerous optimizations; it can still follow the spec by not honoring this part. GCC serves the user, not the spec; the question becomes: do users want this kind of optimization and assume its consequences by default?
Where’s the mistake? Integer overflow being undefined is a feature, not a bug. There are platforms where the behaviour of overflow is implementation defined, entirely unpredictable, or just straight up UB at a hardware level, leaving the machine in a totally invalid state. C is designed to target bizarre and unusual architectures like these, and so having integer overflow be undefined is a huge boon to the language’s portability without sacrificing (and even improving, in many cases) performance.
If you’re just going to do language spec revisionism and claim that ‘the spec is wrong’ or something, then I think it’s clear that C’s not the language for you. Heck, it’s definitely not the language for me: I aggressively avoid touching the thing nowadays.
Many GPUs have saturating semantics on overflow. Other architectures emulate small integers with large integers, meaning that overflow results in unobservable ‘extra bits’. Changing the standard to make integer overflow always wrap would make writing C for these architectures extremely difficult without significant performance ramifications.
If reducing portability is fine with you, then so be it: but that’s not what C is for: it’s supposed to be the lowest common denominator of a vast array of architectures, and it does this quite effectively in no small part because it leaves things like integer overflow undefined.
There are platforms where the behaviour of overflow is implementation defined, entirely unpredictable, or just straight up UB at a hardware level, leaving the machine in a totally invalid state.
Can you name one such platform? That is still used after Y2K?
Also note that the spirit of UB in 1989 was almost certainly a compatibility thing. I doubt the standard committee anticipated anything other than -fwrapv on regular 2’s complement processors. And it’s only later that compiler writers realised that they could interpret “UB” in a way that in this particular case was most probably against the spirit of it.
I don’t think defining the behaviour of overflow is desirable: programmers want overflow to happen in very rare cases, and defining its behaviour now means tools cannot distinguish between overflow the programmer wanted/expected and accidental overflow (the vast majority of cases, in my experience).
We currently can write sanitizers around overflow because it’s undefined; if we had defined it as wrapping, the sanitizers could only say “well, it’s wrapping, but I guess you wanted that, right?”
AFAIU Rust traps on overflow in debug, and defines it as wrapping in release. I believe this is mostly because they decided undefined behaviour in safe code was unacceptable, so they went with defined-but-very-likely-wrong in release.
You lose far fewer optimisations in a language that is not C. Unfortunately, in C, it is a very common idiom to use int as the type for a loop induction variable. Having to reason about wrapping breaks a lot of the analyses that feed vectorisation. In C++ or Rust, you typically use iterators, rather than raw indexes, and these iterators will use an unsigned type by default. Operating over the domain of positive integers with wrap to zero is much simpler than operating over the domain of signed integers with overflow wrapping to a large negative number, and so the C++ and Rust versions of these loops are easier to vectorise. In C, using something like size_t as the type of the induction variable will often generate better code.
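To illustrate the two idioms being compared (a minimal sketch; the comments only restate the argument above):

#include <stddef.h>

/* The common C idiom: a signed int induction variable. If overflow were
 * defined to wrap to a large negative value, range analysis would have to
 * account for that before vectorising. */
float sum_int(const float *a, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* size_t: unsigned, pointer-width, and overflow is plain wrap-to-zero, a
 * much simpler domain to reason about. */
float sum_sizet(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}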
Then… how about renouncing these optimisations, and telling everyone to update their code to use size_t so it is fast again? Because I sure resent compiler writers for having introduced critical vulnerabilities and then telling everyone to fix their programs so they are safe again…
I mean, sometimes the hoops I have to jump through… libsodium and Monocypher for instance can’t use arithmetic left shifts on signed integers at all. Instead of x << 26 we need to write x * (1<<26), and hope the compiler will be smart enough to generate a simple shift (thankfully it is). Reason being, left shifting negative integers is straight up undefined. No ifs, no buts, it’s undefined even when the result would stay within range.
That’s what PCC does. It renounces all of these optimisations and, as a result, generates much slower code. OpenBSD tried to use it for a while, but even they couldn’t put up with the poor performance (and OpenBSD generally favours security over performance). The market has shown time and time again that a C compiler that generates fast code will always be chosen over one that generates more predictable code for undefined behaviour.
It’s not like there aren’t alternative compilers that do what you claim to want, it’s just that no one (including you) actually wants to pay the performance cost of using them.
The market has shown time and time again that a C compiler that generates fast code will always be chosen over one that generates more predictable code for undefined behaviour.
Gosh, I think our mutual employer provides a strong counter-example. The incentives of a standalone compiler vendor are very different to those of a vendor that uses the compiler to compile billions of lines of its own production code. Our compiler continually adds new security features at the expense of performance, and internal code is required to use them. IMHO these end up at the opposite, absurd end of the spectrum, like default-initializing stack variables to ensure the undefined behavior on access becomes implementation-defined, stack buffer overflow checks, etc. In addition to performance vs. security, there’s also a stronger emphasis on compatibility vs. performance; updating the compiler in a way that would defeat large numbers of existing security checks would come under a lot of scrutiny.
I thought about MSVC’s interpretation of volatile as a counter-example here (it treats it as the equivalent of a sequentially consistent atomic, because that’s what a lot of internal legacy code assumed). But then I thought of all of the third-party projects switching to using clang on Windows, including things like Chrome and, by extension, all Electron apps, and realised that it wasn’t such a counter-example after all. For a long time, MSVC was the only compiler that could fully parse the Windows headers, which gave it a big advantage in the market; now that clang can do the same, that’s eroding (I believe clang can now parse all of the Windows source code, but it can’t correctly codegen some large chunks and doesn’t generate code that is the shape expected by some auditing tools).
Alias analysis is another place where MSVC avoids taking advantage of undefined behaviour. Apple pushed for making -fstrict-aliasing the default and fixed (or encouraged others to fix) a huge amount of open source and third-party code, giving around 5% better performance across the entire system. MSVC does not take advantage of type-based alias analysis because doing so would break a lot of existing code that relies on UB. This is also pushing people who have code that does not depend on illegal type punning to use clang and get more performance.
Note that I am talking specifically about interpreting the standard with respect to UB to enable optimisations here. I see security flags such as /GUARD, stack canaries, InitAll, and so on as a different category, for three reasons:
They are opt in, so you can ignore them for places where you know you’re sandboxing your program or where it isn’t operating on untrusted data.
They make certain behaviour well defined, which makes it easier to reason about your source code. Not taking advantage of UB does not have this property: your program still depends on UB and may still behave in unexpected ways and your performance is now harder to reason about because it will vary hugely depending on whether, post inlining, you have enough hints in your source for the compiler to prove that the condition will not trigger.
They, in general, don’t impede other optimisations. For example, InitAll combines with dead store elimination and typically can be elided by this optimisation (and Shayne did some great work to improve this). /GUARD is applied very late in the pipeline (I worked on the LLVM implementation of this so that we could have CFG for Rust and Objective-C), and so inlining and devirtualisation can significantly reduce the number of places where you need the check (MSVC has some very impressive profile-guided devirtualisation support, which helps a lot here). In contrast, things like not assuming that integer addition results in a larger number have a big knock-on effect on other optimisations.
Well, there is renouncing a class of optimisations, and there is defining a class of behaviours. I don’t think those are the same. Which one was PCC trying? Did it define integer overflow and pointer aliasing etc., or did it disable dangerous-looking optimisations altogether?
it’s just that no one (including you) actually wants to pay the performance cost of using them.
I put myself in a situation where I can actually cop out of that one: I tend to write libraries, not applications, and I ship source code. This means I have no control over the compilation flags, and I’m forced to assume the worst case and stick to strictly conforming code. Otherwise I would try some of them (most notably -fwrapv) and measure the impact on performance. I believe I would accept any overhead below 5%. But you’re right, there is a threshold beyond which I’d just try to be more careful. I don’t know for sure which threshold this is though.
Libraries would still come shipped with a build system to produce (shared) objects, right?
Not when this library is literally one C source file with its header, with zero dependencies, and used on obscure embedded targets that don’t even have a standard library and that I don’t even know of anyway.
I do ship with a Makefile, but many people don’t even use it. And even if they did, they control $CFLAGS.
Well, then again, I did it on purpose: sticking to standard C99 with zero dependencies is how people ended up using it in those contexts. My work is used in a previously underserved niche; that’s a win.
And in practice, I made one error of any consequence, and it was a logic bug, not anything to do with C’s UB. I did have a couple of UBs, but none ended up amounting to anything. (Then again, it helps that my library does zero heap allocation.)
Yes, that is exactly the by-design consequence of C UB. A single UB anywhere deep in your code could convert your computer into a giant whale or a potted plant.
the assignment needed to be done early because I was targeting a codebase with some ancient C convention requiring variable declarations to come before any code
If this is referring to C89 style, then you can declare the variable without assigning it and only assign later, something like this sketch (same shape as the example above):
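int x;          /* C89: declaration at the top of the block, no initialiser */
if (...)
    return -1;
x = ...;        /* the assignment can happen after the early-out */
return f(x);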
Good observation. In C/C++ you are intended to check for valid preconditions before you perform an operation that relies on them. In Python and many others, there is a pervasive “look before you leap” idiom because there is no undefined behavior, either it behaves correctly or throws an exception, i.e. every operation is checked beforehand. Could be from an influx of folks into C/C++ from those languages.
For those who don’t understand, C/C++ does it this way because specifying “undefined behavior” allows you to assume that preconditions are valid without having to recheck them on every call, allowing the programmer to be more efficient with the CPU.
In Python and many others, there is a pervasive “look before you leap” idiom because there is no undefined behavior, either it behaves correctly or throws an exception, i.e. every operation is checked beforehand.
I think “look before you leap” (LBYL) is the opposite of what you’re trying to describe. I’ve usually heard that described as “easier to ask forgiveness than permission” (EAFP).
Note that the order of operations doesn’t matter for UB. UB is not an event that happens. Instead, “UB can’t happen” is an assumption that the compiler is free to make, and then move or delete code under that assumption. Mere existence of any UB anywhere in your program, even in dead code that is never executed, is a license to kill for a C compiler.
Whole-function analysis can have an effect that seems like UB going back in time. For example, the compiler may analyze range of possible values of a variable by checking its every use and spotting 2 / x somewhere. Division by 0 is UB, so it can assume x != 0 and change or delete code earlier in the function based on this assumption, even if the code doesn’t have a chance to reach the 2 / x expression.
For example, the compiler may analyze range of possible values of a variable by checking its every use and spotting 1 / x somewhere, and then assume x != 0 and change or delete code based on that earlier in the function, even before execution has a chance to reach the 1 / x.
Yep, but if that 1 / x is in dead code it can’t affect assumptions that the compiler will make for live code. And if the 1 / x is in a particular execution path then the compiler can’t use it to make assumptions about a different path.
As an example, for:
if (x == 0) {
printf("x is zero!\n");
}
if (x == 1) {
printf("1/x = %d\n", 1 / x);
}
… the compiler will not remove the x == 0 check based on division that occurs in the x == 1 branch. Similarly, if such a division appears in dead code, it can’t possibly affect a live execution path.
So:
even in dead code that is never executed, is a license to kill for a C compiler.
No.
(Edit, based on your edits): In addition:
Division by 0 is UB, so it can assume x != 0 and change or delete code earlier in the function based on this assumption,
Yes, if it is guaranteed that from the earlier code the 2 / x division must be subsequently reached, otherwise no.
even if the code doesn’t have a chance to reach the 2 / x expression.
No. As per above example, the compiler cannot assume that because something is true on some particular execution path it is true on all paths.
If what you were claiming was true, it would be impossible/useless to perform null checks in code. Consider:
if (p != NULL) {
*p = 0;
}
If the compiler can assume that p is not NULL based on the fact that a store to *p exists, it can remove the NULL check, converting the above to just:
*p = 0;
This is clearly different and will (for example) crash if p happens to be NULL. But a compiler can’t and won’t make that change: https://godbolt.org/z/hzbhqdW1h
On the other hand if there is a store that appears unconditionally on the same execution path it can and will remove the check, eg.
*p = 0;
if (p != NULL) {
printf("p is not null!");
}
… for which both gcc and clang will remove the check (making the call to printf unconditional): https://godbolt.org/z/zr9hc7315
As it happens, neither compiler will remove the check in the case where the store (*p = 0) is moved after the if block, but it would be valid for them to do so.
I think this is the core of the issue and why people are getting so fired up.
If you assume that integer operations are sent to the CPU intact, and the CPU was made in the last 30 years, then checking for overflow after the fact is a single compare.
If you have to check for the potential for overflow beforehand, the comparison is much more involved. I was curious what it actually looks like and stumbled onto this, which implements it in four compares (and three boolean evaluations).
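Presumably something along the lines of the textbook pre-check (a reconstruction, not the code being linked to):

#include <limits.h>

/* Pre-check for signed addition overflow: four comparisons and three
 * boolean evaluations before the add is allowed to happen. */
int add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}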
At some level, this whole conversation becomes a disagreement about the nature of bounds checking. If you assume bounds checking does not exist (or can be compiled away!) then you can exploit UB to optimize signed arithmetic to improve performance. If you assume bounds checking needs to exist, that UB exploit is a huge negative because it forces much more pessimism to put the bounds check back, making performance worse.
Then we end up with compiler builtins to perform signed arithmetic deterministically. This is odd because the UB optimization assumes that if the language spec doesn’t require something it’s not needed in an implementation, but the existence of the builtin suggests otherwise. The UB optimization assumes that there’s no value in having a predictable implementation defined behavior, but the builtin is a predictable implementation defined behavior.
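The builtins being alluded to are presumably the __builtin_*_overflow family in GCC and Clang; a minimal sketch of how they read:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int sum;
    /* Returns true if the mathematically correct result does not fit in
     * `sum`; the wrapped result is stored either way. */
    if (__builtin_add_overflow(INT_MAX, 1, &sum))
        puts("overflowed");
    else
        printf("%d\n", sum);
    return 0;
}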
It’s the observation that most of the time overflow is a bug. If you wanted overflow semantics, you should have asked for it specifically. This is how e.g. Zig works.
Avoiding the overflow requires a bounds check. I interpreted your earlier question as being about why these bounds checks often create an overflow in order to perform the check (which is not a bug, it’s integral to the check). There’s no standards-compliant way to request overflow semantics specifically, so that option doesn’t really exist, and doing the check without an overflow is gross.
If the standard had provided a mechanism to request overflow semantics via different syntax, we probably wouldn’t have such intense discussion.
LSP is worrying to me as well but I can’t quite explain why yet. It is scary that many community efforts have been abandoned in the rush to embrace LSP, an abstraction controlled and potentially limited by Microsoft. This puts them in a strategically advantageous position to control what sorts of dev features the OSS community can get. Meanwhile it seems full Visual Studio doesn’t use LSP for its most powerful features.
LSP is a protocol, not a technology. It’s actually one of the best things that could have happened for future editors and IDEs: you just need to support LSP and you get compatibility with all the languages for free. Getting proper language support into editors was a pain before LSP.
It doesn’t prevent IDEs from supporting more than LSP, but LSP is now the bare minimum that we can expect to have anywhere. This is a great thing.
LSP is a good thing, but the fact that it’s effectively under Microsoft control is not. My main gripe about it is that, to make the life of some VSCode developer easier, they decided all locations in LSP would be encoded as UTF-16 offsets from the start of the buffer (buffers themselves must be transferred in UTF-8 encoding, go figure). Every single LSP implementation now needs to have UTF-8 to UTF-16 offset computation logic to be compliant.
I believe this could have been fixed if LSP had been a community project, because at the time this was widely noticed most LSP implementations were just passing along byte offsets or codepoint offsets, so we could have just fixed the few implementations that used UTF-16 offsets and changed the spec. Unfortunately Microsoft has full control over the spec, so nothing was changed and all LSP implementations had to deal with that unnecessary complexity.
That particular problem was fixed in the latest version; from the change log:
Add support to negotiate the position encoding.
But yeah, I also have gripes with what LSP technically is, and with its stewardship.
For this reason (but more so due to general architectural sanity), I strongly recommend not hard-coding LSP as a core domain model, but rather treating it as a serialization layer:
On the servers, produce a library for semantic analysis of code with language-specific APIs. You don’t have to make the library’s API public/stable, just make sure it exists internally.
On the client side, provide hooks to populate completions, outline, symbol search, etc., but leave it up to an extension to wire those up with LSP. These days we have LSP built into vim & emacs, but there’s no LSP support in VS Code. They just have an npm LSP library, which extension authors can choose to use.
I love LSP, the fact that I can use a fringe editor and get all the same language integration features of VSCode is amazing. Sure, it doesn’t get me the whole ecosystem of vim or emacs plugins, or every feature of intellisense, but it’s so much more than nothing.
If you had told me 15 years ago that such a thing would get traction, and that Microsoft of all people would be driving it, I would have guffawed my coffee all over the table.
Context: the ones I use are rust-analyzer/kakoune. I don’t think MS itself is involved in the stack at all, even in a legal sense.
I generally think LSP is great for allowing other editors to exist (while it’s not as good as stuff like PyCharm/VS classic, it’s basically better than everything else that was offered before).
However, one nasty thing that happens is that each language gets, roughly, the one blessed LSP server. In Python’s case, that has ended up being MSFT’s non-OSS server. It totally sucks the oxygen out of the room for alternatives, and it’s such a hard problem to begin with. Honestly not great.
And I will steal this paragraph, closest to my heart:
If that doesn’t sound anything special, it means that it makes sense. Unfortunately, the field of text editors on UNIX systems has over the years turned into an archipelago, in which every editor aims at being an island. Job management, shell, terminal emulation, window multiplexing… Text editors have turned into closed ecosystems (or Integrated Development Environments) that provide many (sometimes cardboard-looking) features unrelated to editing, which new comers have to buy into, or be left out in the cold.
Then why does Kakoune have a diff algorithm and a json parser among other things?
In terms of code, it also uses a few data types/algorithms that standard c++ already provides.
Then why does Kakoune have a diff algorithm and a json parser among other things? In terms of code, it also uses a few data types/algorithms that standard c++ already provides.
There’s nothing unusual about the Kakoune project having their own data structures; it’s common for C++ projects to contain functionality that’s also in the standard library.
I would say that this reflects more on the language than on the Kakoune project.
There are various places where the internal diff algorithm is used, such as in the terminal output code (on the builtin-terminal-ui branch that replaces ncurses with a custom backend); when we reload a buffer or pipe part of a buffer through an external filter, the diff algorithm allows Kakoune to know what actually changed.
Kakoune replaces some of the standard library utilities with its own for various reasons, HashMap is used instead of std::unordered_map to provide slightly different semantics (iteration order matches insertion order for example) and better performance (HashMap uses open addressing), RefPtr is used instead of shared_ptr because we use an intrusive ref count and we share the implementation with SafePtr (a pointer type that makes an object assert at destruction if any pointers to it remain alive).
Are you aware of the article (written by the creator of the project himself) titled “Why Kakoune?”, though? I didn’t expect that my article could be seen as an argument for the editor, it’s an interesting angle.
I was not aware of it, thanks for sharing. Even if it is much more “why”, it is also much longer :-). And programmers love to steal in my experience :-).
As an IRC user, do I want to know why an instant messaging client would need GPU acceleration? :x
It’s nothing particularly novel to matrix: rendering UIs on the CPU tends to use more battery than the hardware component whose entire goal is rendering, and it’s hard to hit the increasingly-high refresh rates expected solely via CPU rendering.
A chat application ought to do very infrequent redraws, basically when a new message comes in or whenever the user is composing, worst case when a 10fps gif is being displayed. I find it concerning we now need GPU acceleration for something as simple as a chat to render itself without feeling slugish.
Rendering text is one of the most processor-intensive things that a modern GUI does. If you can, grab an early Mac OS X machine some time. Almost all of the fancy visual effects that you get today were already there and were mostly smooth, but rendering a window full of text would have noticeable lag. You can’t easily offload the glyph placement to the GPU, but you can render the individual glyphs and you definitely can composite the rendered glyphs and cache pre-composited text blocks in textures. Unless you’re doing some very fancy crypto, that will probably drop the power consumption of a client for a plain text chat protocol by 50%. If you’re doing rich text and rendering images, the saving will be more.
The downside with the texture atlas rugged approach is that the distribution of glyphs in the various cached atlases in every process tend to become substantially re-invented across multiple graphics sources and make out quite a bit of your local and GPU RAM use. The number of different sizes, styles and so on aren’t that varied unless you dip into some kind of opinionated networked document, and even then the default is default.
My point is that there is quite some gain to be had by somehow segmenting off the subsurfaces and somewhat split the load – a line packing format in lieu of the pixel buffer one with the LTR/RTL toggles, codepoint or glyph-index lookup, (so the client need to know at least GSUB of the specific font-set) and attributes (bold, italic, colour, …) one way and kerning feedback for picking/selection the other.
That’s actually the setup (albeit there’s work to be done specifically in the feedback / shaping / substitution area) done in arcan-tui. Initial connection populates font slots and preferred size with a rough ‘how does this fit a monospaced grid w/h” hint. Clients using the same drawing properties shares glyph cache. We’re not even at the atlas stage (or worse, SDFs) stage yet the savings are substantial.
I’m quite surprised by this. I’d assume you wouldn’t render an entire font, but maybe blocks of 128 glyphs at a time. If you’re not doing sub-pixel AA (which seems to have gone out of fashion these days), it’s 8 bits per pixel. I’d guess a typical character size is no more than 50x50 pixels, so that’s around 300 KiB per block. You’d need quite a lot of blocks to make a noticeable dent in the > 1GiB of GPU memory on a modern system. Possibly less if you render individual glyphs as needed into larger blocks (maybe the ff ligature is the only one that you need in that 128-character range, for example). I’d be really surprised if this used up more than a few tens of MiBs, but you’ve probably done the actual experiments so I’d be very curious what the numbers are.
That sounds like an interesting set of optimisations. Can you quantify ‘substantial’ at all? Do you know if Quartz does anything similar? I suspect it’s a bit tricky if you’ve got multiple rounds of compositing, since you need to render text to some texture that the app then renders into a window (possibly via multiple rounds of render-to-texture) that the compositor composes onto the final display. How does Arcan handle this? And how does it play with the network transparency?
I recall seeing a paper from MSR at SIGGRAPH around 2005ish that rendered fonts entirely on the GPU by turning each bezier curve into two triangles (formed from the four control points) and then using a pixel shader to fill them with transparent or coloured pixels on rendering. That always seemed like a better approach since you just stored a fairly small vertex list per glyph, rather than a bitmap per glyph per size, but I’m not aware of any rendering system actually using this approach. Do you know why not? I presume things like font hinting made it a bit more complex than the cases that the paper handled, but they showed some very impressive performance numbers back then.
You could’ve gotten away with an alpha-coverage only 8-bit texture had it not been for those little emoji fellows, someone gave acid to the LOGO turtles and now it’s all technicolour rainbow – so full RGBA it is. While it is formally not a requirement anymore, there’s old GPUs around and you still can get noticeable difference when textures are a nice power-of-two (POT) so you align to that as well. Then comes the quality nuances when rendering scaled, since accessibility tools like there zooms in and out you want those to look pretty and not alias or shimmer too bad. The better way for that still is mip-mapping, so there is a point to raster at a higher resolution, switch that mipmap toggle and have the GPU sort out which sampling level to use.
There was already a big leap for the TUI cases not having WHBPP*2 or so pixels to juggle around, render to texture or buffer to texture and pass onwards (that could be another *4 because GPU pipelines and locking semantics you easily get drawing-to, in-flight, queued, presenting).
The rest was that the font rendering code we have is mediocre (it was 2003 and all that ..) and some choices that doesn’t fit here. We cache on fonts, then the rasterizer caches on resolved glyphs, and the outliner/shaper caches on glyph lookup. I don’t have the numbers available, but napkin level I got it to around 50-75% overhead versus the uncompressed size of the font. Multiply that by the number of windows open (I drift towards the upper two digit of active CLI shells).
The size of a TPACK cell is somewhere around 8 bytes or so, using UCS4 even (you already needed the 32-bit due to having font-index addressing for literal substitution), then add some per-line headers. It also does I and P frames so certain changes (albeit not scrolling yet) are more compact. I opted against trying to be overly tightly packed as that has punished people in the past and for the network case, ZSTD just chews that up into nothing. It’s also nice having annotation-compact text-only intermediate representation to juggle around. We have some subprojects about to leverage that.
I don’t remember what Quartz did or how their current *Kits, sorry.
For Arcan itself it gets much more complicated and a larger story, as we are also our own intermediate representation for UI components and nest recursively. The venerable format string based ‘render_text’ call at the Lua layer force rasterisation into text locally as some genius thought it a good idea to allow arbitrary embedding of images and other video objects. There’s a long checklist of things to clean up, but that’s after I close down the network track. Thankfully a much more plastic youngling is poking around in those parts.
Speaking of networking – depending on the network conditions we outperform SSH when it starts to sting. The backpressure from things like ‘find /’ or ‘cat /dev/random’ resolves and renders locally and with actual synch in the protocol you have control over tearing.
AFAIR @moonchild has researched this more than me as to the current glowing standards. Back in ’05 there was still a struggle getting the text part to behave, especially in 3D. Weighted channel-based hinting was much more useful for tolerable quality as well, and that was easier as a raster preprocess. Eventually Valve set the standard with SDFs, which is still(?) the dominant solution today (it recently made its way natively into FreeType), along with quality optimisations like multi-channel SDFs.
Thanks. I’m more curious about the absolute sizes than the relative savings. Even with emoji, I wouldn’t expect it to be a huge proportion of video memory on a modern system (even my 10-year-old laptop has 2 GiB of video memory). I guess it’s more relevant on mobile devices, which may have only this much total memory.
I will try and remember to actually measure those bits myself, can’t find the thread where C-pharius posted it on Discord because well, Discord.
The savings are even more relevant if you hope to either a. at least drive some machines from an FPGA’d DIY graphics adapter instead of the modern monstrosities, b. accept a 10-15 year rollback in terms of available compute should certain conflicts escalate, or c. try to consolidate GPU processing to a few victim machines or even VMs (though the latter are problematic, see below) – all of which I eventually hope for.
I layered things such that the Lua API looks like a balance between ‘animated display postscript’ and ‘basic for graphics’ so that packing the calls in a wire format is doable and asynchronous enough for de-coupling. The internal graphics pipeline also goes through an intermediate-representation layer intended for a wire format before that gets translated to GL calls for the same reason – at any time, these two critical junctions (+ the clients themselves) cannot be assumed/relied upon to be running on the same device / security domain.
Public security researchers (CVE/bounty hunters) have in my experience been pack animals as far as targeting goes. Mobile GPUs barely did their normal job correctly, and absolutely not securely, for a long time, and little to nothing could be heard. From DRM (as in corporate malware) unfriendly friends I’ve heard of continuous success bindiffing Nvidia blobs. Fast > Features > Correct > Secure seems generally to be the priority.
With DRM (as in direct rendering manager) the same codebase hits BSDs and Linux alike, and for any VM compartmentation, VirGL cuts through it. The whole setup is massive. It evolves at a glacial pace and its main job is different forms of memcpy where the rules for src, dst, size and what happens to the data in transit are murky at best. “Wayland” (as it is apparently now the common intersection for several bad IPC systems) alone would’ve had CVEs coming out the wazoo had there been an actual culture around it; we are still waiting around for conformance tests, much less anything requiring more hygiene. Fuzzing is non-existent. I am plenty sure there are people harvesting and filling their barns.
An amusing curiosity I ran across while revisiting a few notes on a related topic - https://cgit.freedesktop.org/wayland/wayland/tree/NOTES?id=33a52bd07d28853dbdc19a1426be45f17e573c6b
“How do apps share the glyph cache?”
That’s the notes from the first Wayland commit covering their design axioms. Seems like they never figured that one out :-)
Ah, that makes sense, thanks. I’m definitely sympathetic to the first problem.
With irssi I’m using GPU acceleration, because my terminal emulator is OpenGL based.
Element X when == a few days ago on Android: https://element.io/blog/element-x-android-preview/ and a few months ago on iOS: https://element.io/blog/element-x-experience-the-future-of-element/
Sadly I’m blocked by no support for SSO…
as your link says:
the code’s there and works; it just needs to be released and documented. NB that shifting to native OIDC will be a slightly painful migration though; some of the old auth features may disappear until reimplemented in native OIDC, which may or may not be a problem for you.
If you’re on Android, note that an early release of Element X just hit the Play Store yesterday: https://element.io/blog/element-x-android-preview/.
So the TL;DR is that emacs-LSP sends invalid UTF-16 character offsets on text changes: it sends UTF-8 offsets, when the LSP spec says UTF-16. rust-analyzer tries to use these invalid offsets to manipulate internal data and panics, because it ends up trying to change stuff inside a multi-byte character. That only happens on certain multi-byte stuff for obvious reasons (bottom emoji). The fix would be UTF-8 opt-in (a new LSP option), or sending correct UTF-16 offsets. Fixes exist, but are in upstreaming limbo. Hope I got this right.
That is correct.
Also fun from the article: unicode normalization/denormalization (“é” can be one or two codepoints, if you like combining diacritics). Big emojis are small emojis zero-width-joined. ECMAScript can’t decide between UTF-16 code units and Unicode codepoints (indexing vs iteration).
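To make the “one or two codepoints” and ZWJ points concrete, these are just standard UTF-8 byte counts (nothing specific to the article; sizeof includes the trailing NUL, hence the minus one):

    // "é" precomposed vs. decomposed, and a family emoji built from ZWJ-joined parts.
    static_assert(sizeof(u8"\u00E9") - 1 == 2);      // U+00E9: one codepoint, two UTF-8 bytes
    static_assert(sizeof(u8"e\u0301") - 1 == 3);     // U+0065 U+0301: two codepoints, three bytes
    static_assert(sizeof(u8"\U0001F468\u200D\U0001F469\u200D\U0001F466") - 1 == 18);
                                                     // man + ZWJ + woman + ZWJ + boy: 18 bytes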
lsp-mode’s auto-download doesn’t seem to work if you use rustic, at least, and it falls back to RLS, which is completely deprecated. Emacs considers that “characters” are 22-bit-wide, just like Unicode (21 bits) + 1 bit to make room for “raw bytes”.
There’s also a lightning review of Zig hidden in plain sight.
I really liked the Unicode/UTF-8/UTF-16 visualizations. I wish that existed as some kind of website where you can look up stuff and see how it is represented in the different Unicode encodings.
I definitely noticed the Zig intermezzo - the code required is very verbose. It reminded me of your “Twitch” article mentioning burnout, and gave me the thought that you’re probably also “locked in” to Rust content, as the thing that made your articles self-sustaining.
I haven’t really used LSP but the gist I get is that everyone is vaguely embarrassed by the fact that the spec says everything has to be UTF-16, and instead of fixing it, most people just kind of pretend it’s already been fixed because the alternative of actually having to think about non-UTF8 text gives people traumatic flashbacks to the 1990s and no one has time for that?
Like… the specified behavior is also hilariously the wrong thing, so no one wants to actually defend it and insist on following the spec?
[edit: hadn’t made it to the end of the article where it says the spec now allows negotiation on this as of a few months ago]
Not sure about the “instead of fixing it” bit. rust-analyzer (following clangd) has supported sane UTF-8 offsets since forever. I’d personally be perfectly happy if clients supported only UTF-8 (which would be a de jure violation of the spec), as long as they properly advertise UTF-8 support.
That’s what I meant by pretending it’s already been fixed; ignoring the spec and doing the right thing anyway by counting codepoints instead of using UTF-16 offsets. IMO the right thing to do, but at the same time “please ignore the spec because it’s very stupid” is a hard argument to make.
I actually made that argument a while ago, when every LSP implementation started to realize that the spec mandated UTF-16 offsets (while mandating UTF-8 document encoding…). At that point in time most implementations were either using UTF-8 byte offsets or codepoint offsets and we could have unified on something sensible while pressuring Microsoft to fix the spec to what implementations were actually doing instead of what happened to be convenient to them. Unfortunately that did not happen and every LSP implementation now has to contain the same unnecessary complexity to be compliant. The recent change to LSP added support for alternate offset encoding but still mandates that UTF-16 must be supported.
It’s a bit hard to fix the spec, as it isn’t collectively maintained and the upstream isn’t particularly responsive. Historically, just doing the thing ahead of the spec and using the “this is already how things work in the wild” argument was the most effective way to move forward. Which is sort of what happened with position-encoding as well!
Sounds like the LSP protocol itself is another horror show of outdated definitions and reality vs spec fights.
Well it was invented by Microsoft.
UTF-16 code units are still how quite a lot of things in the web platform are specified, largely because that’s JavaScript’s closest thing to a “character” type, and JS is the native language of that platform. So things like setting a max length of 10 on an HTML form input means “10 UTF-16 code units”, not “10 bytes” or “10 code points” or “10 graphemes”.
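A tiny check of that, for concreteness (assuming a C++17 compiler; the string is five U+1F600 emoji):

    #include <string_view>

    // Five astral-plane code points are ten UTF-16 code units, so a user who
    // types five of these emoji has already hit an HTML maxlength of 10.
    static_assert(
        std::u16string_view(u"\U0001F600\U0001F600\U0001F600\U0001F600\U0001F600").size() == 10);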
Though curiously some parsing algorithms are still specified in terms of code points and not code units, which means implementing them in a “just use UTF-8” language can be trickier than expected. For example, the HTML5 legacy color parsing algorithm (the one that can turn any string, even “chucknorris”, into an RGB color value) requires identifying code points at specific positions as well as being able to perform replacements based on code point values.
And that would be relevant, if the context had anything at all to do with the web rather than a supposedly language-agnostic protocol! But they made a big mistake and let implementation details leak into the spec, and it’s taken them years to admit it.
As the post explains, LSP is based on JSON-RPC, which ties it firmly back to the web domain (JSON-RPC being built around JSON, which in turn is built on top of JavaScript). Plus LSP itself was originally developed for VS Code, an Electron app, which likely had some influence on the selection of things like JSON, JS, etc.
That’s what “let implementation details leak into the spec” means.
I’m not all sure it is an “implementation detail”, unless JSON itself is considered a detail that shouldn’t leak into the spec. Which would be weird, since usually the data format is supposed to be part of a spec.
(where I’m going with this, ultimately, is JSON being more complex, still, than people generally realize – even though the 2017 JSON RFC waved its hands in the direction of saying JSON ought to be UTF-8, it did so in a way that left loopholes for protocols like LSP to fall back to “JSON is JS is UTF-16”, plus the RFC itself still has UTF-16-isms in it, notably in the escape syntax which requires use of surrogate pairs for non-BMP code points)
LSP doesn’t encode JSON in UTF-16, though. It uses UTF-8 on the wire.
I phrased that badly – the idea was just to point out that JSON explicitly allows for a protocol to do UTF-16 (or, really, any other encoding), and that JSON’s inherent origin in the web domain means “web domain is irrelevant” is wrong.
LSP is caring about UTF-16-y things in a different way.
But this is turning into a super-deep tangent just for what I meant as a throwaway comment about how UTF-16 isn’t as dead as people seem to want to think it is.
The spec is supposed to let you switch between UTF8 & UTF16, but it’s an LSP extension & both ends have to support it.
The second case is interesting, because it depends on how std::vector is implemented.
If std::vector was {T* data, size_t size, size_t capacity}, then pop_back would likely have no undefined behaviour provided T was a trivially destructible type (a type for which no code runs on destruction). Decrementing size from 0 is (unfortunately, if you ask me) well defined to wrap around, so the sanitizers would not be allowed to complain about it; for all they know, we might actually have expected that wrap-around.
In practice, std::vector is usually implemented as {T* begin, T* end, T* end_of_capacity}, in which case forming a pointer to 1-before-begin (which would happen on an empty vector where begin == end) is by itself UB (just forming the pointer, no need to try to dereference it). I wonder why the sanitizers do not detect this.
I think because it would be extremely expensive to check every pointer-arithmetic operation - the sanitizers (address sanitizer, at least) instead check for validity on dereference. This means they can have false negatives, of course.
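A minimal sketch of the two layouts being contrasted (illustrative only, not any real standard-library implementation; T assumed trivially destructible as above):

    #include <cstddef>

    template <typename T>
    struct size_based_vector {
        T* data;
        std::size_t size;
        std::size_t capacity;
        void pop_back() {
            // On an empty vector this wraps size around to SIZE_MAX: wrong,
            // but well-defined unsigned arithmetic, so sanitizers stay quiet.
            --size;
        }
    };

    template <typename T>
    struct pointer_based_vector {
        T* begin_;
        T* end_;
        T* end_of_capacity_;
        void pop_back() {
            // On an empty vector (begin_ == end_) this forms a pointer one
            // element before the allocation, which is UB even if it is never
            // dereferenced.
            --end_;
        }
    };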
The difficulty lies around knowing the bounds of the array into which a pointer points. As far as I understand ASan currently just (more-or-less) tracks which memory is allocated, not what is allocated there; for example if the memory contained a structure containing an array, the structure size (and thus the allocation size) are potentially larger than the array size, so you can’t use the allocation bounds to decide whether a pointer has moved outside the array.
Keeping track of whether memory is allocated is as simple as having a shadow-space where a bit or set of bits keeps track of the state of each byte (or slightly larger unit) of memory; tracking type as well would require a much more complicated structure, and a lot of additional instrumentation.
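For reference, the mapping AddressSanitizer documents is roughly the following (a sketch, not ASan’s actual source; the offset is the x86-64 Linux value and differs per platform):

    #include <cstdint>

    // Every 8 bytes of application memory map to one shadow byte, which records
    // how many of those 8 bytes are currently addressable (0 = all of them,
    // 1..7 = only the first k, other values = various kinds of redzone).
    inline std::uint8_t* shadow_for(const void* addr) {
        const std::uintptr_t kShadowOffset = 0x7fff8000;  // platform-specific constant
        return reinterpret_cast<std::uint8_t*>(
            (reinterpret_cast<std::uintptr_t>(addr) >> 3) + kShadowOffset);
    }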
Yeah, that makes sense. Initially I thought you could just instrument after each pointer arithmetic operation but did not realise that the costly part is to resolve what range of memory is valid for this specific pointer.
It hadn’t crossed my mind that sanitisers wouldn’t have a special case for vector. I guess textual inclusion makes it a struct like any other…
I noticed a weird trend where the check for whether something is undefined happens after the undefined behaviour. Intuitively it makes sense to me that this is insufficient, so I’m wondering why this failure is so common. For example, here the overflow check happens after the overflow already happened.
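The pattern being described, in miniature (hypothetical helper functions, not code from the article):

    #include <limits>

    // Checking after the fact: the signed addition has already overflowed (UB)
    // before the test runs, so the compiler may assume the test can never be
    // true and drop it entirely.
    int add_then_check(int a, int b) {
        int sum = a + b;                 // UB here if it overflows
        if (b > 0 && sum < a) {          // too late
            return 0;
        }
        return sum;
    }

    // Checking before the operation never executes the overflowing add.
    bool checked_add(int a, int b, int* out) {
        if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;
        if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;
        *out = a + b;
        return true;
    }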
I don’t usually do that, but in my case there were 2 reasons:
Not trying to be critical, but it shouldn’t be news that you can’t check for a UB condition after it’s happened. I’ve seen so many cases where people run into similar problems, or problems related to the type-punning rules. Usually the thought process is something along the lines of:
ok, so I know this is supposed to have undefined behavior. But the variable will still have some value, so I can check if the value is in the correct range and so avoid issues that way…
No, you can’t. This is what undefined behaviour means. All bets are off; if it’s hit, it is a bug in the code, full-stop, and no checking afterwards can fix it because the behaviour from that point (*note1) on is totally undefined. Maybe it seems to work in some cases. It doesn’t matter; use a different compiler, or a later version of the same compiler, and all of a sudden it could stop “working”.
Don’t think of the C compiler as some sort of glorified high-level assembler. It’s way more sophisticated than that. There are rules you have to follow, if you are using any of the modern compilers. You don’t have to like it (and there are even switches available that will give behaviour that you want, but that’s not standard C any more) but it is the case. You must not ever invoke undefined behaviour.
Note 1: Actually, I believe the whole program behaviour is undefined if the program exhibits undefined behaviour at any point. So technically, even things that were supposed to happen before the UB might not happen or might happen differently.
You are technically correct, but I’m sure you understand the consequences of such a policy: pushed to the extreme, we could have a situation where a 100k LoC codebase has a single little bug deep down somewhere, and then crashing or executing random code straight at startup is acceptable behaviour.
The cost of a single mistake is very high, that’s the main point I’m trying to make.
What’s the alternative? If the compiler can’t optimise around code that hasn’t been executed yet having UB, then the opportunities for useful optimisation become near non-existent.
The compiler is not doing anything unreasonable here: every step, in isolation, is desirable and valid. If the end result feels unreasonable, then that’s either (a) a problem with the language spec being insufficiently relaxed about what it considers to be UB or (b) a problem with insufficient tooling (or, in the case of a language like Rust, built-in compiler checks) to catch issues like this.
To point the finger at the compiler is a very hard sell indeed because there’s no specific thing to point at that it’s doing wrong.
It might be reasonable not to do the optimization. The alternative, as in Rust, is to actually define the behavior as wrapping, which would be equivalent to using -fwrapv in C. Sure, we lose some optimisations, but is it worth it? I’m starting to believe so.
Yes, I agree: but that’s a problem with the language spec, not the compiler. The language spec should just say ‘overflow has wrapping semantics’. You’ll lose some opportunities for optimisation and compatibility with a lot of older or obscure platforms (some platforms have arithmetic instructions that don’t wrap on overflow, and this is one of the big reasons that the C spec leaves overflow undefined!), but this is enough of a footgun that I think it’s a worthwhile tradeoff in the year of our lord 2022.
But again, this isn’t GCC’s fault: this is the fault of the language spec and the compromises that went into its creation. Don’t like it? Time to get a new language (this isn’t me trying to be gatekeepy: horrid footgun shit like this is a big reason I moved to Rust and never looked back).
Not saying it’s GCC’s fault, but just because a spec made a mistake doesn’t mean GCC should be braindead about it: it holds a responsibility for all the C software out there. Nothing forces GCC to do dangerous optimizations; it can still follow the spec by not honouring this part. GCC serves the user, not the spec; the question becomes: do users want this kind of optimization, and do they assume its consequences, by default?
Where’s the mistake? Integer overflow being undefined is a feature, not a bug. There are platforms where the behaviour of overflow is implementation defined, entirely unpredictable, or just straight up UB at a hardware level, leaving the machine in a totally invalid state. C is designed to target bizarre and unusual architectures like these, and so having integer overflow be undefined is a huge boon to the language’s portability without sacrificing (and even improving, in many cases) performance.
If you’re just going to do language spec revisionism and claim that ‘the spec is wrong’ or something, then I think it’s clear that C’s not the language for you. Heck, it’s definitely not the language for me: I aggressively avoid touching the thing nowadays.
I am sure there is, so please name one.
Many GPUs have saturating semantics on overflow. Other architectures emulate small integers with large integers, meaning that overflow results in unobservable ‘extra bits’. Changing the standard to make integer overflow always wrap would make writing C for these architectures extremely difficult without significant performance ramifications.
If reducing portability is fine with you, then so be it: but that’s not what C is for: it’s supposed to be the lowest common denominator of a vast array of architectures, and it does this quite effectively in no small part because it leaves things like integer overflow undefined.
Can you name one such platform? That is still used after Y2K?
Also note that the spirit of UB in 1989 was almost certainly a compatibility thing. I doubt the standard committee anticipated anything other than -fwrapv behaviour on regular 2’s complement processors. And it’s only later that compiler writers realised that they could interpret “UB” in a way that, in this particular case, was most probably against the spirit of it.
Compiler writers are on the standard committee…
I don’t think defining the behaviour of overflow is desirable: programmers want overflow to happen only in very rare cases, and defining its behaviour now means tools cannot distinguish between overflow the programmer wanted/expected and accidental overflow (the vast majority of cases in my experience).
We can currently write sanitizers around overflow because it’s undefined; if we had defined it as wrapping, the sanitizers could only say “well, it’s wrapping, but I guess you wanted that, right?”
AFAIU Rust traps on overflow in debug and defines it as wrapping in release. I believe this is mostly because they decided undefined behaviour in safe code was unacceptable, so they went with defined-but-very-likely-wrong in release.
You lose far fewer optimisations in a language that is not C. Unfortunately, in C, it is a very common idiom to use int as the type for a loop induction variable. Having to reason about wrapping breaks a lot of the analyses that feed vectorisation. In C++ or Rust, you typically use iterators, rather than raw indexes, and these will use an unsigned type by default. Operating over the domain of positive integers with wrap to zero is much simpler than operating over the domain of signed integers with overflow wrapping to a large negative number, and so the C++ and Rust versions of these loops are easier to vectorise. In C, using something like size_t as the type of the induction variable will often generate better code.
Then… how about renouncing these optimisations, and telling everyone to update their code to use size_t so it is fast again? Because I sure resent compiler writers for having introduced critical vulnerabilities and then telling everyone to fix their programs so they are safe again… I mean, sometimes the hoops I have to jump through… libsodium and Monocypher, for instance, can’t use arithmetic left shifts on signed integers at all. Instead of x << 26 we need to write x * (1 << 26), and hope the compiler will be smart enough to generate a simple shift (thankfully it is). The reason being, left shifting negative integers is straight up undefined. No ifs, no buts, it’s undefined even when the result would stay within range.
That’s what PCC does. It renounces all of these optimisations and, as a result, generates much slower code. OpenBSD tried to use it for a while, but even they couldn’t put up with the poor performance (and OpenBSD generally favours security over performance). The market has shown time and time again that a C compiler that generates fast code will always be chosen over one that generates more predictable code for undefined behaviour.
It’s not like there aren’t alternative compilers that do what you claim to want, it’s just that no one (including you) actually wants to pay the performance cost of using them.
Gosh, I think our mutual employer provides a strong counter-example. The incentives of a standalone compiler vendor are very different to a vendor that uses the compiler to compile billions of lines of its own production code. Our compiler adds new security features at the expense of performance continually, and internal code is required to use them. IMHO these end up at the opposite absurd end of the spectrum, like default-initializing stack variables to ensure the undefined behavior on access becomes implementation defined, stack overflow buffer checks, etc. In addition to performance vs. security, there’s also a stronger emphasis on compatibility vs. performance; updating the compiler in a way that would defeat large numbers of existing security checks would come under a lot of scrutiny.
I thought about MSVC’s interpretation of volatile as a counterexample here (it treats it as the equivalent of a sequentially consistent atomic, because that’s what a lot of internal legacy code assumed). But then I thought of all of the third-party projects switching to using clang on Windows, including things like Chrome and, by extension, all Electron apps, and realised that it wasn’t such a counterexample after all. For a long time, MSVC was the only compiler that could fully parse the Windows headers, which gave it a big advantage in the market; now that clang can do the same, that’s eroding (I believe clang can now parse all of the Windows source code, but it can’t correctly codegen some large chunks and doesn’t generate code that is the shape expected by some auditing tools).
Alias analysis is another place where MSVC avoids taking advantage of undefined behaviour. Apple pushed for making -fstrict-aliasing the default and fixed (or encouraged others to fix) a huge amount of open source and third-party code, giving around 5% better performance across the entire system. MSVC does not take advantage of type-based alias analysis because doing so would break a lot of existing code that relies on UB. This is also pushing people who have code that does not depend on illegal type punning to use clang and get more performance.
Note that I am talking specifically about interpreting the standard with respect to UB to enable optimisations here. I see security flags such as /GUARD, stack canaries, InitAll, and so on as a different category, for three reasons:
Well, there is renouncing a class of optimisations, and there is defining a class of behaviours. I don’t think those are the same. Which one was PCC trying? Did it define integer overflow and pointer aliasing etc., or did it disable dangerous-looking optimisations altogether?
I put myself in a situation where I can actually cop out of that one: I tend to write libraries, not applications, and I ship source code. This means I have no control over the compilation flags, and I’m forced to assume the worst case and stick to strictly conforming code. Otherwise I would try some of them (most notably -fwrapv) and measure the impact on performance. I believe I would accept any overhead below 5%. But you’re right, there is a threshold beyond which I’d just try to be more careful. I don’t know for sure what that threshold is, though.
How’s that? Libraries would still come shipped with a build system to produce (shared) objects, right?
Not when this library is literally one C source file with its header, with zero dependencies, and used on obscure embedded targets that don’t even have a standard library and that I don’t even know of anyway.
I do ship with a Makefile, but many people don’t even use it. And even if they did, they control $CFLAGS.
Ouch, that’s not an enviable situation to be in :S
Too bad you can’t enforce some of those semantics using #pragma or something.
Well, then again, I did it on purpose: sticking to standard C99 with zero dependencies is how people ended up using it in those contexts. My work is used in a previously underserved niche; that’s a win.
And in practice, I made one error of any consequence, and it was a logic bug, not anything to do with C’s UB. I did have a couple of UBs, but none ended up amounting to anything. (Then again, it helps that my library does zero heap allocation.)
Yes, that is exactly the by-design consequence of C UB. A single UB anywhere deep in your code could convert your computer into a giant whale or a potted plant.
Yes. Writing code in C is a minefield, and I think people who write code in this language need to be aware of that.
If this is referring to C89 style, then you can declare a variable without assigning it:
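The snippet is missing above; presumably it showed nothing more exciting than this (my reconstruction):

    #include <stdio.h>

    int main(void) {
        int x;            /* C89 style: declared up front, no initialiser */
        /* ... other work ... */
        x = 42;           /* assigned only once a meaningful value exists */
        printf("%d\n", x);
        return 0;
    }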
Yeah but I don’t like that for 2 reasons:
Good observation. In C/C++ you are intended to check for valid preconditions before you perform an operation that relies on them. In Python and many others, there is a pervasive “look before you leap” idiom because there is no undefined behavior, either it behaves correctly or throws an exception, i.e. every operation is checked beforehand. Could be from an influx of folks into C/C++ from those languages.
For those who don’t understand, C/C++ does it this way because specifying “undefined behavior” allows you to assume that preconditions are valid without having to recheck them on every call, allowing the programmer to be more efficient with the CPU.
I think “look before you leap” (LBYL) is the opposite of what you’re trying to describe. I’ve usually heard that described as “easier to ask forgiveness than permission” (EAFP).
My mistake, I meant “leap before you look”
Note that the order of operations doesn’t matter for UB. UB is not an event that happens. Instead, “UB can’t happen” is an assumption that the compiler is free to make, and then move or delete code under that assumption. Mere existence of any UB anywhere in your program, even in dead code that is never executed, is a license to kill for a C compiler.
No, unless you mean that it’s a license to remove the dead code (which the compiler can do anyway).
If code that would have undefined behaviour when executed is never executed, then it does not trigger the undefined behaviour (by definition).
Whole-function analysis can have an effect that seems like UB going back in time. For example, the compiler may analyse the range of possible values of a variable by checking its every use and spotting 2 / x somewhere. Division by 0 is UB, so it can assume x != 0 and change or delete code earlier in the function based on this assumption, even if the code doesn’t have a chance to reach the 2 / x expression.
Yep, but if that 2 / x is in dead code it can’t affect assumptions that the compiler will make for live code. And if the 2 / x is in a particular execution path then the compiler can’t use it to make assumptions about a different path.
As an example, for:
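A guess at the shape of the example (the original snippet didn’t survive formatting here; the function and values are made up for illustration):

    int f(int x, int y) {
        if (x == 0) {
            return 0;          // this check is not removed...
        }
        if (x == 1) {
            return y + 2 / x;  // ...because this division only runs on a path
        }                      // where x is already known to be non-zero
        return y;
    }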
… the compiler will not remove the x == 0 check based on division that occurs in the x == 1 branch. Similarly, if such a division appears in dead code, it can’t possibly affect a live execution path.
So:
No.
(Edit, based on your edits): In addition:
Yes, if it is guaranteed that from the earlier code the 2 / x division must be subsequently reached, otherwise no.
No. As per above example, the compiler cannot assume that because something is true on some particular execution path it is true on all paths.
If what you were claiming was true, it would be impossible/useless to perform null checks in code. Consider:
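The snippet itself got lost in the copy here; going by the description that follows, it presumably looked something like this (my reconstruction, not necessarily the code behind the godbolt link):

    #include <cstddef>

    void f(int* p) {
        if (p != NULL) {   // the NULL check guards the store
            *p = 0;
        }
    }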
If the compiler can assume that p is not NULL based on the fact that a store to *p exists, it can remove the NULL check, converting the above to just the unguarded store. This is clearly different and will (for example) crash if p happens to be NULL. But a compiler can’t and won’t make that change: https://godbolt.org/z/hzbhqdW1h
On the other hand, if there is a store that appears unconditionally on the same execution path, it can and will remove the check, e.g.:
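Again reconstructing from the description (the godbolt link below has the real snippet):

    #include <cstdio>

    void g(int* p) {
        *p = 0;                               // unconditional store: p must be non-null to get here
        if (p != NULL) {                      // ...so the compiler may fold this check to "true"
            std::printf("p is not null\n");   // ...and the call becomes unconditional
        }
    }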
… for which both gcc and clang will remove the check (making the call to printf unconditional): https://godbolt.org/z/zr9hc7315
As it happens, neither compiler will remove the check in the case where the store (*p = 0) is moved after the if block, but it would be valid for them to do so.
I think this is the core of the issue and why people are getting so fired up.
If you assume that integer operations are sent to the CPU intact, and the CPU was made in the last 30 years, then checking for overflow after the fact is a single compare.
If you have to check for the potential for overflow beforehand, the comparison is much more involved. I was curious what it actually looks like and stumbled onto this, which implements it in four compares (and three boolean evaluations).
At some level, this whole conversation becomes a disagreement about the nature of bounds checking. If you assume bounds checking does not exist (or can be compiled away!) then you can exploit UB to optimize signed arithmetic to improve performance. If you assume bounds checking needs to exist, that UB exploit is a huge negative because it forces much more pessimism to put the bounds check back, making performance worse.
Then we end up with compiler builtins to perform signed arithmetic deterministically. This is odd because the UB optimization assumes that if the language spec doesn’t require something it’s not needed in an implementation, but the existence of the builtin suggests otherwise. The UB optimization assumes that there’s no value in having a predictable implementation defined behavior, but the builtin is a predictable implementation defined behavior.
It’s the observation that most of the time overflow is a bug. If you wanted overflow semantics, you should have asked for it specifically. This is how e.g. Zig works.
Avoiding the overflow requires a bounds check. I interpreted your earlier question as being about why these bounds checks often create an overflow in order to perform the check (which is not a bug; it’s integral to the check). There’s no standards-compliant way to request overflow semantics specifically, so that option doesn’t really exist, and doing the check without an overflow is gross.
If the standard had provided a mechanism to request overflow semantics via different syntax, we probably wouldn’t have such intense discussion.
I agree, not having a checked add or a two’s complement add is definitely a hole in the standard and should be fixed.
LSP is worrying to me as well but I can’t quite explain why yet. It is scary that many community efforts have been abandoned in the rush to embrace LSP, an abstraction controlled and potentially limited by Microsoft. This puts them in a strategically advantageous position to control what sorts of dev features the OSS community can get. Meanwhile it seems full Visual Studio doesn’t use LSP for its most powerful features.
LSP is a protocol, not a technology. It’s actually one of the best things that could have happened for future editors and IDEs: you just need to support LSP and you get compatibility with all the languages for free. Getting proper language support in editors was a pain before LSP.
It doesn’t prevent IDEs from supporting more than LSP, but LSP is now the bare minimum that we can expect to have anywhere. This is a great thing.
LSP is a good thing, but the fact that it’s effectively under Microsoft control is not. My main gripe about it is that, to make the life of some VSCode developer easier, they decided all locations in LSP would be encoded as UTF-16 offsets (buffers themselves must be transferred in UTF-8 encoding, go figure). Every single LSP implementation now needs to have UTF-8 to UTF-16 offset computation logic to be compliant.
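That computation is small but annoying to get right everywhere; a sketch of the kind of helper every server ends up carrying (hypothetical function, assuming the line is valid UTF-8 and the byte offset falls on a character boundary):

    #include <cstddef>
    #include <string_view>

    // Convert a byte offset into a UTF-8 line into the LSP "character" value,
    // i.e. the offset counted in UTF-16 code units.
    std::size_t utf8_to_utf16_offset(std::string_view line, std::size_t byte_offset) {
        std::size_t units = 0;
        for (std::size_t i = 0; i < byte_offset && i < line.size(); ) {
            unsigned char lead = static_cast<unsigned char>(line[i]);
            if      (lead < 0x80) { i += 1; units += 1; }  // ASCII
            else if (lead < 0xE0) { i += 2; units += 1; }  // 2-byte sequence, still BMP
            else if (lead < 0xF0) { i += 3; units += 1; }  // 3-byte sequence, still BMP
            else                  { i += 4; units += 2; }  // 4-byte sequence: surrogate pair
        }
        return units;
    }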
I believe this could have been fixed if LSP had been a community project, because at the time this was widely noticed most LSP implementations were just passing along byte offsets or codepoint offsets, so we could have just fixed the few implementations that used UTF-16 offsets and changed the spec. Unfortunately Microsoft has full control over the spec, so nothing was changed and every LSP implementation has had to deal with that unnecessary complexity.
That particular problem was fixed in the latest version, from the change log
But yeah, I also have gripes with what LSP technically is, and its stewardship.
For this reason (but moreso due to general architectural sanity), I strongly recommend to not hard-code LSP as a core domain model, but rather treat it as a serialization layer:
On the servers, produce a library for semantic analysis of code with language-specific APIs. You don’t have to make library’s API public/stable, just make sure it exists internally.
On the client side, provide hooks to populate completions, outline, symbol search, etc., but leave it up to an extension to wire those up with LSP. These days we have LSP built into vim & emacs, but there’s no LSP support in VS Code itself. They just have an npm LSP library, which extension authors can choose to use.
I love LSP, the fact that I can use a fringe editor and get all the same language integration features of VSCode is amazing. Sure, it doesn’t get me the whole ecosystem of vim or emacs plugins, or every feature of intellisense, but it’s so much more than nothing. If you had told me 15 years ago that such a thing would get traction, and that Microsoft of all people would be driving it, I would have guffawed my coffee all over the table.
Context: the ones I use are rust-analyzer/kakoune. I don’t think MS itself is involved in the stack at all, even in a legal sense.
Edit: spelling
I generally think LSP is great for allowing other editors to exist (while it’s not as good as stuff like PyCharm/VS classic, it’s basically better than everything else that was offered before).
However, one nasty thing that happens is that each language gets, roughly, the one blessed LSP client. In Python’s case, that has ended up being MSFT’s non-OSS client. It totally sucks the oxygen out of the room for alternatives, and it’s such a hard problem to begin with. Honestly not great.
This article is a very nice read. I will be using it as an answer to the many people asking me why kak?
I switched almost a year ago and I cannot imagine myself going back to any of the other editors I have used before.
And I will steal this paragraph, closest to my heart:
Then why does Kakoune have a diff algorithm and a JSON parser, among other things? In terms of code, it also uses a few data types/algorithms that standard C++ already provides.
There’s nothing unusual about the Kakoune project having their own data structures; it’s common for C++ projects to contain functionality that’s also in the standard library.
I would say that this reflects more on the language than on the Kakoune project.
There are various places where the internal diff algorithm is used, such as in the terminal output code (on the builtin-terminal-ui branch that replaces ncurses with a custom backend), or when we reload a buffer or pipe part of a buffer through an external filter; the diff algorithm allows Kakoune to know what actually changed.
Kakoune replaces some of the standard library utilities with its own for various reasons: HashMap is used instead of std::unordered_map to provide slightly different semantics (iteration order matches insertion order, for example) and better performance (HashMap uses open addressing); RefPtr is used instead of shared_ptr because we use an intrusive ref count and share the implementation with SafePtr (a pointer type that makes an object assert at destruction if any pointers to it remain alive).
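Roughly the idea behind the intrusive approach, as a toy sketch (not Kakoune’s actual RefPtr/SafePtr): the counts live in the object itself, which is what lets an owning pointer and a non-owning “safe” pointer share the same bookkeeping.

    #include <cassert>

    struct RefCountable {
        int refcount  = 0;   // owning references (RefPtr)
        int safecount = 0;   // non-owning "safe" references (a SafePtr would bump this)
        ~RefCountable() { assert(safecount == 0 && "object destroyed with SafePtrs still alive"); }
    };

    template <typename T>   // T must derive from RefCountable
    class RefPtr {
    public:
        explicit RefPtr(T* p = nullptr) : m_ptr(p) { if (m_ptr) ++m_ptr->refcount; }
        RefPtr(const RefPtr& o) : m_ptr(o.m_ptr)   { if (m_ptr) ++m_ptr->refcount; }
        RefPtr& operator=(const RefPtr&) = delete;  // assignment omitted for brevity
        ~RefPtr() { if (m_ptr && --m_ptr->refcount == 0) delete m_ptr; }
        T* operator->() const { return m_ptr; }
    private:
        T* m_ptr;
    };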
I think you are mixing two concepts, but I may have understood you wrong.
In Dutch, kak is slang for poop / shit. So I find your statement funny, why kak.
Well, it is the usual abbreviation of the program. My bad, as it is very similar in Czech (and some other Slavic languages). So, better: why Kakoune?
Thanks!
Are you aware of the article (written by the creator of the project himself) titled “Why Kakoune?”, though? I didn’t expect that my article could be seen as an argument for the editor, it’s an interesting angle.
I was not aware of it, thanks for sharing. Even if it is much more “why”, it is also much longer :-). And programmers love to steal in my experience :-).