Running an LLM query through a GPU is very high latency
How much better would this be on a SoC with unified memory, like Apple Silicon? Or would the latency advantage, if any, on such systems come from the limited size of LLMs that such systems can run?
While I’m generally on the perf-is-good side of this argument, I don’t really find these examples compelling. The smallest examples cited have over a hundred million users. At that kind of scale, a 1% performance improvement translates to huge real-world savings (especially for server-side things where the provider is paying directly), let alone the 50%+ things that it’s discussing. Generalising from these examples is not so obvious. If I am running a desktop app that consumes 30% of my CPU under load and a later version optimises it to consume 15%, that’s a similar saving to the first big win at Facebook that the article mentions. Whereas the Facebook optimisation saves millions of dollars in hardware purchases for Facebook, this saving makes no difference to me that I will notice (I might be spending very slightly more money on power, but my CPU is probably still in a low-power state either way).
This ignores the opportunity cost. Every month a developer spends optimising is a month spent not doing something else. If you write your app in a GC’d language and pay a 30% perf hit (totally made-up number) but are able to ship two compelling features in the time that a competitor ships one, then guess who will win in the market.
This ignores the opportunity cost. Every month a developer spends optimising is a month spent not doing something else. If you write your app in a GC’d language and pay a 30% perf hit (totally made-up number) but are able to ship two compelling features in the time that a competitor ships one, then guess who will win in the market.
I really wish we could do something about this messed-up incentive to screw over our users and potential users. When I lived in the Seattle area, I became friends with a poor person who had (maybe still has?) a PowerPC iMac and a mid-2000s PC in storage, and no current desktop or laptop (only a phone). Ideally, she should still be able to do the same things with those computers now that she did when they were new. But our industry’s upgrade treadmill left her behind. (Admittedly, the company I was working for at the time was and still is part of the problem.)
I guess I need to put my money where my mouth is and translate my current Electron app to Rust, on my own time.
When I lived in the Seattle area, I became friends with a poor person who had (maybe still has?) a PowerPC iMac and a mid-2000s PC in storage, and no current desktop or laptop (only a phone). Ideally, she should still be able to do the same things with those computers now that she did when they were new.
Those computers can do the same things they did when they were new. They can run the same office suites and the same web browsers and render the same web pages and play the same games. And since they likely have actual spinning-platter hard drives rather than SSDs, their storage probably hasn’t degraded over time either.
What they can’t do is run a bunch of things that were developed in the intervening decades. And that should be expected – we don’t stand still, and as faster hardware is developed we find ways to take advantage of it.
To see the point, consider a feature of the phone I carry in my pocket. I shop sometimes at a local Japanese market, and a lot of the products on their shelves do not have English labels. And I don’t speak Japanese. But I can point my phone’s camera at some packaging and it will do an OCR of the Japanese text, translate it to English, and overlay the translation onto the image in, if not real time, then close enough as makes no difference. There is simply no amount of caring about performance that could backport that experience to the very first smartphone I ever owned – its hardware would never be capable of doing this. Even trying to take a still photo and upload it to a server for the processing would be problematic due to both the much worse camera (and lack of onboard hardware-and-software-assisted image stabilization and sharpening and so on) and the low-speed cellular connection it had.
And there are tons and tons of examples of things like this in daily life. Modern tech isn’t just bloated bloatware bloating its bloaty bloat for the sake of bloat; there are tons of things we not only do but take for granted today on devices whose size/form factor would not have been possible 15-20 years ago. And those things are useful. But there’s no way to backport them to sufficiently-old hardware, and it would not be good to try to stand athwart that progress and yell “Stop!” in order to ensure that hardware never is perceived as obsolete.
It’s easy to see that kind of progress on our smartphones. But on my PC, at first glance, it feels like I’m doing mostly the same tasks that I did on the laptop I bought for college in 1999 – browse the web, do email, listen to music, write some documents, and program. Back then, I did all those things on a laptop with a 366 MHz single-core processor and 64 MB of RAM. To avoid burying the lead, I’ve concluded that that feeling is misguided, though I don’t know how much processing power and RAM we’d actually need given ideally efficient software.
The web certainly demands more of our computers than it did back then. For content websites, this is IMO a definite step backwards. One definite advancement has been the addition of streaming video; I’m sure my 1999 laptop wouldn’t have been able to handle that. Another definite advancement is the multi-process model of modern browsers, which enables stronger sandboxing. (For that matter, my 1999 laptop was never upgraded beyond Windows Me, though it also ran various Linux distros over its lifetime.) The strong security of modern browsers also benefits email clients, assuming we count the ability to render HTML email as a necessity, which I do.
Playing a local music library has probably changed the least since 1999. In fact, I do so using foobar2000, the most lean-and-mean Win32 application I routinely use. Such a lightweight player could also support streaming music services, if any of them were open to it. Instead, AFAIK all the streaming services use web apps, or web apps in native wrappers (e.g. Electron or CEF), for their desktop clients. But then, while that might be a bad thing for resource usage, it can be a good thing for accessibility; Spotify’s original highly custom UI, for example, was inaccessible with screen readers. But of course, there were accessible native desktop apps before Electron, so we should be able to have accessibility without the extreme bloat.
Microsoft Office was already getting bloated in 1999. I remember one of my classmates in 1997 complaining about how sluggish Office 97 was on his computer. But again, to some extent, one person’s bloat might have been another person’s accessibility. By 1999, at least one Windows screen reader was using the Office COM object models, which were originally implemented for VBA, to provide access to those applications. I don’t see how the additional bloat of the intervening years has made things better, though.
I certainly appreciate the power of the Rust compiler, and other modern optimizing compilers that we didn’t have in 1999. I’m tempted to give that a pass because programming is a niche activity, but part of me still believes, as Guido van Rossum did (does?), that computer programming should be for everybody. And even on modern computers, Rust is sometimes criticized for its compilation time and resource requirements. Personally, I’m willing to accept that tradeoff to get both developer productivity and efficient generated code. Luckily there are other language/compiler/runtime designs that have different tradeoffs, and some of them, e.g. a CPython or Lua-style bytecode interpreter, could even still be as usable on that 1999 laptop as they were back then.
One task that’s definitely new since then is remote meetings, e.g. Zoom. The useful life of my old laptop coincided with the bad old days of VoIP (and by the end, I had upgraded the RAM, primarily because of a resource-hungry Java app I was working on). By the time Skype landed in late 2003, I had moved onto a more powerful computer, so I don’t know if it would have run on the previous one. I suspect not. And video calling? Forget it.
One thing that has definitely changed for me since then is that I rely much more on a screen reader now (I have low vision). One of the Windows screen readers I use, NVDA, is written primarily in Python, and this enables a very flexible add-on system. The NVDA project didn’t start until 2006, and it never ran on anything older than Windows XP, so it’s safe to say that NVDA wouldn’t have run well, if at all, on my old laptop, possibly even after the RAM upgrade (to the odd quantity of 192 MB). The other major third-party Windows screen reader (JAWS) started in the 90s, and it’s bifurcated between a C++ core and a scripting layer on top. Perhaps as a result of that separation, its scripting layer isn’t as powerful as NVDA’s add-on system.
So, where does that leave us? A single core at 366 MHz and 64 MB of RAM is clearly not enough for a modern PC. But do we really need a minimum of 8 GB of RAM and whatever amount of minimum processing power is now practically required?
About 10 years ago, a novel called Off to Be the Wizard had this exchange between two time-travelers:
Phillip (from 1984): What on earth can a person do with four gigabytes of RAM?
Martin (from 2012): Upgrade it immediately.
I’m not sure if the author meant that to be funny or sad, though the former would be more in keeping with the tone of the book. But I’m still inclined to interpret it as a sad commentary.
There are also plenty of things my laptop computer can do today that a laptop of 15-20 years ago couldn’t, and plenty of software taking advantage of those capabilities.
But you seem to have decided that it’s all “bloat” and don’t particularly seem open to being persuaded otherwise, so I won’t bother trying.
I think my response was more complicated than that, though it certainly ended negatively. I realize that to some extent, increased hardware requirements are an inevitable result of real progress. But I still wonder how much better we could do if we didn’t so heavily prioritize developer convenience, and racing to implement more features, above runtime efficiency. I’m sure there’s no going back to 64 MB of RAM for a general-purpose computer, but maybe we don’t need 4 GB or higher as a minimum. I don’t know though; I’m open to being persuaded that I’m wrong.
Every month a developer spends optimising is a month spent not doing something else.
Casey’s whole point in the “clean code, horrible performance” thing is that this is not how it works. You don’t write garbage code and then spend a month optimizing it. You just keep performance in mind and don’t write stupid code in the first place. There are some common practices (OOP abuse, for example) which don’t improve code quality but cost a lot in terms of performance. Instead of choosing those approaches, just write the code in a way that’s as good (maintainable, readable, etc) but without the performance pitfalls.
Maybe you want to spend a 30% performance hit for the productivity improvements of a GC, but maybe you don’t want to spend a 1000% performance hit for the dubious productivity improvements of writing your compute-heavy application in Python, for example, or a 500% performance hit for the “benefit” of modelling your data as a class hierarchy with virtual methods Clean Code style rather than plain structs or sum types.
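To make that concrete, here is a rough sketch in Rust of the plain-structs/sum-types shape of code; the shape/area example loosely follows the one Casey uses, and the names are just for illustration:

// A sketch (not from the article): model the data as a plain sum type
// and dispatch with `match` instead of a class hierarchy with virtual
// methods. Same readability, no per-object vtable indirection.
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
}

fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rectangle { width, height } => width * height,
    }
}

fn total_area(shapes: &[Shape]) -> f64 {
    // A flat slice of enum values: contiguous in memory, easy for the
    // compiler (and the CPU) to chew through.
    shapes.iter().map(area).sum()
}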
Well, I understand it’s frustrating to have to use a language you don’t like at work. I mainly program in OCaml so this might come out as defensive in places, but I do think some of the criticism is valid!
For a start, yeah, the syntax has known flaws (especially wrt nesting; it’s also not great for formatting imho). The lack of line comments is annoying. However, lexing inside comments is an actual feature that allows for nested comments (so you can comment out something with comments in it, no problem).
The interface issue is kind of a thing, but you can either use objects, first-class modules, or go C-style with a record of functions. This is a lot easier than C because OCaml has closures, so it’s just a bit manual, not difficult. I have my gripes with some standard types not being interfaces (in particular, IO channels…). But overall this is a bit exaggerated as it’s not hard to implement your own. The thing about Int.abs puzzles me: what else do you expect Int.abs min_int to return anyway?
And of course, the ecosystem. Yeah, there are multiple standard libraries (although I’d consider batteries to have lost most momentum). It’s painful. It’s made a lot of progress in the past years (e.g. ocaml-lsp-server is a pretty solid LSP server now!) but it’s true that OCaml remains a small language. This part is pretty accurate.
However, the last paragraph annoys me more. Yes, OCaml is a lot more functional than Java or Rust. It’s not Haskell, but: in OCaml, immutable structures are both efficient and idiomatic. In Java, they’re not idiomatic; in Rust, I know of one library (im) that explicitly relies on that, everything in the stdlib is mutable, and most of the ecosystem embraces mutability. In addition, Java does not actually have functions; and Rust has a strong distinction between functions and closures. In OCaml all functions are closures and it’s seamless to create closures thanks to the GC. Try writing CPS code in Rust and see if it’s as trivial as in OCaml! Tailcalls are not easy to emulate…
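For a rough idea of the friction, here is an invented example (not from any real codebase) of what even simple CPS tends to look like in Rust:

// A continuation that has to be stored or passed along needs a concrete
// representation, typically a boxed trait object; there is no plain
// "function type" as in OCaml, and deep CPS recursion still grows the
// stack because Rust does not guarantee tail calls.
type Cont<T> = Box<dyn FnOnce(T)>;

fn add_cps(a: i32, b: i32, k: Cont<i32>) {
    k(a + b)
}

fn main() {
    let print_it: Cont<i32> = Box::new(|n| println!("{n}"));
    // Chain two continuations: each one has to be boxed and moved.
    add_cps(1, 2, Box::new(move |sum| add_cps(sum, 10, print_it)));
}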
So there are reasons to use OCaml in 2023. If you work on some algorithm-heavy, symbolic domain like compilers or logic tools (historically a strength of OCaml) it’s still one of the best languages; it’s not tied to a big corporation, and it compiles to native code with reasonable resource requirements.
However, lexing inside comments is an actual feature that allows for nested comments (so you can comment out something with comments in it, no problem).
You don’t need to actually lex the contents of comments for that, just skim through the character stream looking for opening and closing delimiters and counting how many you find.
Agree with your second-to-last paragraph though; a lot of me learning Rust was learning how not to try to write it as if it were OCaml. Inconvenient closures and lack of partial evaluation were particularly painful.
You don’t need to actually lex the contents of comments for that, just skim through the character stream looking for opening and closing delimiters and counting how many you find.
Yes, that’s what OCaml’s lexer does, but it has to account for string literals as well, to handle these:
(* "this is what a comment opening in OCaml looks like: (*" *)
If you don’t look at the " you would think that a “*)” is missing. That’s why (* " *) is not valid.
Nope. Why would your comments care whether or not there’s an unclosed string delimiter in them? Rust works this way for example: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=973d237c97542b8e34623fdad0423f9c Various langs do various things with various flavors of “raw” strings, but those are a nice-to-have, not an essential feature.
You can do it by having your lexer state machine have entirely separate sub-lexers for strings and block comments, like this pseudocode:
fn lex_token(input: &mut Lexer) -> Token {
    loop {
        let next_char = input.advance_one_char();
        if next_char == '(' && input.peek() == '*' {
            input.advance_one_char(); // consume the '*' so skip_comment starts inside the comment
            skip_comment(input, 1);
            continue;
        } else if next_char == '"' {
            return lex_string(input);
        } else {
            // the rest of the lexer
        }
    }
}

fn skip_comment(input: &mut Lexer, mut depth: usize) {
    // Track nesting: "(*" opens another level, "*)" closes one.
    while depth > 0 {
        let next_two_chars = input.peek_two();
        if next_two_chars == "(*" {
            depth += 1;
            input.advance_one_char();
        } else if next_two_chars == "*)" {
            depth -= 1;
            input.advance_one_char();
        }
        input.advance_one_char();
    }
}

fn lex_string(input: &mut Lexer) -> String {
    let mut accumulator = String::new();
    loop {
        let next_char = input.peek();
        if next_char == '\\' {
            input.advance_one_char();
            // ...handle escapes...
        } else if next_char == '"' {
            input.advance_one_char(); // consume the closing quote
            break;
        } else {
            accumulator.push(next_char);
            input.advance_one_char();
        }
    }
    accumulator
}
The Rust code you linked is a great example of why OCaml does this. In OCaml, you can surround any valid code with (* *) to comment out that code (and only that code). That is not the case in Rust – as you demonstrated, surrounding valid code with /* */ commented out the rest of the file, because Rust treats the /* inside the string literal as the start of a nested comment.
As opposed to not allowing a comment to have an un-matched " in it? You can have one or the other but not both, it seems; comments can invisibly contain string delimiters or strings can invisibly contain comment delimiters. I guess I personally find the Rust method less surprising.
in Rust, I know of one library (im) that explicitly relies on that,
This is a bit more subtle. In Rust, there’s basically zero semantic difference between immutable and mutable collections, so there’s simply no need to have them. This is qualitatively different from languages with Java-like semantics.
But the bit about closures is spot on. Rust punishes you hard if you try to write high order code, mostly because there’s no “closure type”.
I know what you mean, but the stdlib collections, for example, are mutable, because the only way to add or remove elements is literally to mutate them in place. The basic feature of functional collections is that you can cheaply “modify” them by creating a new version without touching the old one. In Rust, im does that, but Vec or HashMap certainly don’t, and to keep both the new and old versions you need a full copy.
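A quick sketch of the difference, assuming the im crate’s HashMap (the exact API details matter less here than the general shape):

use im::HashMap as ImHashMap; // persistent map with structural sharing
use std::collections::HashMap;

fn main() {
    // std HashMap: keeping the old version means copying every entry.
    let mut std_map: HashMap<u32, &str> = HashMap::new();
    std_map.insert(1, "one");
    let snapshot = std_map.clone(); // full copy
    std_map.insert(2, "two");
    assert!(!snapshot.contains_key(&2));

    // im::HashMap: a clone is cheap and shares structure with the
    // original; an insert only copies the path that changed.
    let mut im_map: ImHashMap<u32, &str> = ImHashMap::new();
    im_map.insert(1, "one");
    let snapshot = im_map.clone(); // cheap, shares nodes with im_map
    im_map.insert(2, "two");
    assert!(!snapshot.contains_key(&2));
}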
OCaml has pretty decent support for immutable collections, as do Clojure, Scala, Erlang/Elixir, Haskell, and probably a few others. It’s not that common otherwise and it’s imho a big part of what makes a language “functional”.
The basic feature of functional collections is that you can cheaply “modify” them by creating a new version without touching the old one
I am not sure. My gut feeling is that in the vast majority of cases, functional programs do not retain historical, old versions of collections. Cheap modification is a spandrel, the actually important bit is value semantics: collections cannot be modified by someone else from under your feet.
You’d be surprised! A use case that I think the compiler has, for example, is to keep the scoping environment stored in each node of the post-typechecking AST. As a pure functional structure, each environment is only slightly different from the previous one, and you can store thousands of variants in the AST for one file. If you do incremental computations, the sharing is also pretty natural between incrementally modified versions.
This updates me a bit, as, indeed, compilers are fairly frequent use-case for FP, and indeed in compilers you want scopes to be immutable. But I think scopes usually are a specialized data structure, rather than an off-the-shelf map? Looking at OCaml’s own compiler:
It seems that they have a “scope which points at the parent scope” plus an immutable map, and the map is hand-coded. So this looks closer to “we have a specialised use-case and implement a specialized data structure for it”, rather than “we re-use facilities provided by the language because they are a great match”.
Sure, this is also for bootstrapping reasons I think. A lot of things in the compiler don’t directly depend on the stdlib because they come too early, or that’s my understanding anyway. Parts are also very old :-).
You can also find some instances of functional structures being used in backtracking algorithms (pretty sure alt-ergo uses them for that, maybe even CVC5 with custom immutable structures in C++). You just keep the old version and return to it when you backtrack.
I personally have some transaction-like code that modifies state (well… an environment) and rolls back to the previous env if some validation failure occurs. It’s also much simpler when everything is immutable.
Granted these are relatively niche but it shows how convenient the ability to keep old versions can be. And the language does facilitate that because the GC is optimized for allocating a lot of small structures (like tree nodes), and there’s easy functional updates of records, etc.
collections cannot be modified by someone else from under your feet.
The other edge case is if you have errors and need to back out of the current operation. Immutable data means you will always have the original copy to fall back to if you want to retry the operation. If you are using single owner mutability, have partially modified the collection and need to back out of the operation then it’s a bit trickier to retry because you need to undo all of the operations to get back to the original state.
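Roughly like this sketch, with a persistent map standing in for the state and invented names, just to illustrate the rollback pattern:

use im::HashMap;

struct ValidationError;

// Invented example: apply a batch of updates "transactionally" by
// working on a cheap clone and only replacing the caller's state if
// validation succeeds; on failure there is nothing to undo.
fn apply_transaction(
    state: &mut HashMap<String, i64>,
    updates: &[(String, i64)],
) -> Result<(), ValidationError> {
    let mut working = state.clone(); // cheap, shares structure with `state`
    for (key, value) in updates {
        working.insert(key.clone(), *value);
    }
    if working.values().any(|v| *v < 0) {
        return Err(ValidationError); // `state` was never touched
    }
    *state = working;
    Ok(())
}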
I think that they are nearly equivalent, but still different enough to have some tradeoffs. For example, Erlang-style “let it crash” languages might be better off with full immutability so that they don’t corrupt state on crashes (or things will get complicated).
It’s interesting because I don’t see this distinction being drawn anywhere.
FWIW, in AccessKit, I previously used im, so that when updating the accessibility tree (which is implemented as a hash map of nodes keyed by node ID), I could have a complete snapshot of the tree before applying the update, without paying the cost of copying all nodes (as opposed to just the ones that were changed or removed). I stopped using im because its license (MPL) is incompatible with my goals for AccessKit, and to avoid compromising performance by copying the whole map on every update, I had to compromise my design; I no longer have a complete snapshot of the tree as it was before the update.
Yup, sometimes you do want to have historical snapshots, and immutable collections are essential for those cases as well. Another good example here is Cargo: it uses im to implement dependency solving using trial&backtracking method.
My point is that’s not the primary reason why immutable collections are prevalent in functional programs. It’s rather that FP really wants to work with values, and immutability is one way to achieve that.
More generally, I’ve observed that “Rust doesn’t have/default to immutable collections, so FP style in rust is hard” is a very common line of reasoning, and I believe it to be wrong and confused (but for a good reason, as that’s a rather subtle part about Rust). In Rust, ordinary vectors are values (http://smallcultfollowing.com/babysteps/blog/2018/02/01/in-rust-ordinary-vectors-are-values/), you really don’t need immutable data structures unless you specifically want to maintain the whole history.
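A minimal illustration of what “vectors are values” buys you in practice (hypothetical snippet):

fn main() {
    let mut v = vec![1, 2, 3];
    let first = &v[0]; // a shared borrow: a "view" into the collection

    // v.push(4);      // error[E0502]: cannot borrow `v` as mutable
                       // because it is also borrowed as immutable

    println!("{first}"); // whatever we observed cannot have changed under us
}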
I stopped using im because its license (MPL) is incompatible with my goals for AccessKit
Would https://lib.rs/crates/hamt-rs work?
Ooh, I like this idea. Uh, sure, here goes:
All hardware sucks, all software sucks
While true, I think this is a dangerous rule because some hardware and software sucks a lot more than it needs to given the constraints and this should not be excused.
A program is a machine; there is nothing a program can do that you couldn’t do with enough gears, levers and cogs.
Send data to the other side of the planet in under a second?
You get nothing for nothing; design is the art of choosing tradeoffs.
Again, true but misleading unless you think very carefully about your baseline. Often you are already assuming costs that a good solution wouldn’t have to pay. There are a lot of order-of-magnitude improvements available to projects that are willing to change everything and sometimes that’s less engineering effort than working within the constraints that have accreted over a few decades.
The “lone hacker” mythos is a lie; the greatest superpower of humanity is teamwork.
This should probably be rule 0 in bold. If you’re really as good as you think you are, you should be able to train five people to be almost as good as you and that team of five should be able to massively outperform you working alone.
This should probably be rule 0 in bold. If you’re really as good as you think you are, you should be able to train five people to be almost as good as you and that team of five should be able to massively outperform you working alone.
Doesn’t this ignore communication overhead? Or are you saying that at 5 programmers, you break even? In any case, 1 programmer is still way cheaper than 5.
While true, I think this is a dangerous rule because some hardware and software sucks a lot more than it needs to given the constraints and this should not be excused.
Oh, certainly! It’s there not to excuse imperfection but to remind you that perfection is unattainable and that hardware and software are interlocked.
Send data to the other side of the planet in under a second?
Just need a long enough pushrod… :-P In seriousness, you don’t need a computer to do that.
Just need a long enough pushrod… :-P In seriousness, you don’t need a computer to do that.
You can’t do it with purely mechanical devices, because they are limited to the speed of sound. You could do it with electromechanical devices, but you’re then limited in the rate of transmission (which I didn’t specify, only the latency). If you bring in throughput requirements, you need electronics (or possibly some hypothetical nano mechanical device, but since they don’t actually exist yet it’s hard to say).
Now now, no moving the bar; I said “you can do it”, not “you can do it feasibly or as quickly”. :-P Good point about the speed of sound though; in steel it’s about 6 km/s, so wiggling a pushrod around the world would have at least a few minutes of latency.
But you can do it with two humans and a telegraph key, which is what I was really thinking of. I hesitated about involving electricity in the question, which I now realize was a bit of an implicit assumption, but I think it’s inevitable. With a minimal amount of electricity you can have some music-box style cogs that push the key for you from some kind of pre-recorded storage. If you get the fancy electromechanical devices involved though, it seems fairly straightforward to get to the point where you can feed something a punchcard and have it duplicated at the other end.
Now, that’s assuming a circuit-switched network, of course, with a single telegraph wire strung across the world. If you want to do a packet-switched one like the internet, then it gets tricky.
This is fun though, how could we define that statement better? I wrote it as “there is nothing a program can do…”, but you know, in the case of sending a signal across the world the program isn’t really doing most of the work. All the hard stuff is the infrastructure between point A and point B…
Good posture pays off. It can be different for everyone, but those fingers and wrists and vertebrae gotta last a lifetime.
Amen! And if you’re experiencing pain, go to a specialist (eg a physiotherapist), and do the exercises.
I had suffered from achilles tendinopathy (nothing to do with using a computer) on and off for years, and was cured by some simple exercises I just didn’t know to do. Then I recently impinged my shoulder. Turns out my lower traps are really weak on that side. Exercises are helping, and they’re only taking a few minutes out of my day.
Yeah, something that people don’t know enough is that having strong muscles means you are more resistant to injury, of all sorts. Muscles are designed to get beat up and heal all the time, so when they take the majority of the force of stuff they protect the more slow-healing joints and connective tissue.
What are the exercises you do? My achilles tendons and hamstrings just kinda suck, they’re never as flexible as they should be.
A program is a machine; there is nothing a program can do that you couldn’t do with enough gears, levers and cogs.
Could you elaborate on this? I understand the literal meaning, and it’s a thought worth contemplating, if only for the sense of wonder. But the other commandments are much more applicable to day-to-day coding, and I’m struggling to extract a practical lesson from this one.
(Even “All hardware/software sucks” has a practical interpretation. Your tools will never be perfect, so if you’re distracted by a rough edge that refuses to be polished, accept that it sucks and spend your energy on what you were originally trying to do.)
Basically, don’t get too hung up on what Should or Should Not be in terms of programs, tools, methods, etc. They’re useful when they serve your goals and not useful when they don’t. Lisp and Cobol are both machines designed to do certain things for certain people, you have to view them by their real goals and merits. Treating emacs or Rust or Linux or anything else like a religion is frankly absurd. But people do that with their physical tools as well, so that’s really only natural.
Not to suck all the fun out of it, of course. Machines can and should be beautiful, fun, fascinating constructs. But lots of software engineering made a lot more sense to me the more I learned from EE’s, MechE’s, etc.
I have 4 APU2’s running next to me (1 as a router, the rest in various infra roles, like PXE boot server, etc). I can’t remember buying another piece of hardware where everything worked as described and I didn’t have any complaints. Well, maybe validated ECC support would be nice.
Definitely sad to see this end. Better to stock up for another decade.
Definitely sad to see this end. Better to stock up for another decade.
Depending on your needs, letting go and moving with the mainstream might be another option. I used routers running open-source firmware (mostly OpenWrt) for several years, but a couple of years ago, when I moved and switched ISPs, I decided to just use the ISP-provided modem/router. I’ve concluded that what I create for others is more important than what I use myself, and I don’t work in low-level networking, so a stock consumer Internet gateway suits me just fine.
That, and the quality of them has gone up a lot. The plastic router your ISP used to give you sucked, now it’s more than adequate. I don’t really relish running more infrastructure, so having to massage pf.conf or an enterprise UBNT installation isn’t too appealing.
As this completely depends on your ISP I am not buying that argument. There will always be people who can’t work with the default ones and there will always be highly technical users who can.
Remember that KHTML started as an independent hobby project to make a new web browser. In part because it was very cleanly architected, it became the basis of Safari and Chrome, even though Gecko was already mature and open source at that time.
Also as far as I know Apple and Google never hired any of the KHTML folks. So even if your hobby project is wildly successful, don’t expect it to lead to a job!
Yea, Creative Selection by Ken Kocienda goes into detail as to how KHTML turned into WebKit. Basically they hired a guy to make Safari, he went on the internet, found Konqueror, and decided to steal it
The narrative that companies project around their use of open source is so much different from the reality. Marketing is truly incredible.
Safari/WebKit was pivotal to the value of the iPhone early on. Imagine dedicating thousands of hours of your life into a project for the purposes of lifting up the community just to have some private corporation rip off your work and literally sell hundreds of billions of dollars of product in large part based on the value of your work and then not giving you nor the community anything in return. Well actually, lots of open source developers would have no issue with that. It’s strange to me. Even monkeys, not even apes, demonstrate an instinct towards reciprocity.
What’s funny to me is that if I were that guy working for Apple, I would have done exactly the same thing. The pressure to quickly put out useful products incentivizes economical engineering. It’s a systemic outcome. I value reciprocity, even if I don’t desire to gain, so I have to agree with @caleb, if I were the original authors, I would have chosen GPL. I’m happy to get cucumbers but if the other guy is getting grapes, that’s not fair.
You’re right, in the sense that the job it led to was neither at Apple nor at Google, and that getting a job like that is far from automatic.
FWIW he took this picture on the weekend after the job interviews. I took him and another new hire for a stroll in the mountains.
by adding all the features that make web apps actually possible?
by making the web actually meaningfully usable on mobile phones?
by actually having the resources to make the various web specs actually usable for the purpose of implementing a browser?
I don’t understand the last one, but yes to the first two. And more than that, doing it in a way that locks out independent developers due to the tremendous and unnecessary complexity.
I don’t understand the last one
Writing a fully compatible browser engine in the era of khtml was not feasible either, and in many respects was harder.
As this article actually notes, the web specs of the era were incomplete and frequently outright inaccurate.
One of the big parts of working on webkit for many years was actually investing the time and effort into fixing those shortcomings. Without that work, developing a new engine that actually worked was not really feasible. To put this in perspective:
* Actually fixing page and site rendering required mammoth amounts of work: the specifications were close to useless for everything other than SVG, and that was largely due to lack of support for SVG meaning that there wasn’t a huge amount of existing content to hit the ambiguities in the spec. In practice, a rendering issue on a site generally meant working out what exact piece of content was being rendered differently, then spending weeks trying to reverse engineer what the actual semantics of that were in the existing engines.
The whole point of the old “html5”/whatwg (and the parallel es3.1+es5) process that a lot of people then complained about was the recognition that the “standards” were not remotely good enough to actually build a browser. Prior to the huge amount of work done by paid engineers from Apple, Mozilla, Google, Opera, and Microsoft you simply could not implement a browser just by following the specs.
doing it in a way that locks out independent developers due to the tremendous and unnecessary complexity.
I do not understand what you’re talking about here. Are you saying that browsers/“the open web” should have stopped gaining any features a decade or two ago?
That background on specifications is interesting, but I doubt that the tradeoff for increased complexity has made it easier to implement a browser. According to @ddevault, W3C specifications now total over 100 million words [0].
doing it in a way that locks out independent developers due to the tremendous and unnecessary complexity.
I do not understand what you’re talking about here. Are you saying that browsers/“the open web” should have stopped gaining any features a decade or two ago?
Safari and Chrome added features for mobile browsing and web apps in an unnecessarily complex way. KHTML enabled them to do it more quickly than they otherwise would have. Hopefully that clarifies my previous comment?
[0] https://drewdevault.com/2020/03/18/Reckless-limitless-scope
That background on specifications is interesting, but I doubt that the tradeoff for increased complexity has made it easier to implement a browser.
Um, no, prior to the “let’s make the browser specs correct and complete” work building a browser was vastly more difficult. Fixing the specs was a significant amount of work, and why would we do that if it was not beneficial?
There are two reasons for the increased “complexity” of modern specs:
Firstly, word count is being counted as complexity. This is plainly and obviously false. Modern web specs are complete and are precise, and have no room for ambiguity. That necessarily makes them significantly more verbose. The old ecmascript (pre-3.1) said something along the lines of the following for for(in) enumeration in ES3:
‘Get the name of the next property of Result(3) that doesn’t have the DontEnum attribute. If there is no such property, go to step 14.’
In the current ES specification, which actually contains the information required to implement a JS engine that actually works, this is:
a. If exprValue is either undefined or null, then
i. Return Completion Record { [[Type]]: break, [[Value]]: empty, [[Target]]: empty }.
b. Let obj be ! ToObject(exprValue).
c. Let iterator be EnumerateObjectProperties(obj).
d. Let nextMethod be ! GetV(iterator, "next").
e. Return the Iterator Record { [[Iterator]]: iterator, [[NextMethod]]: nextMethod, [[Done]]: false }.
Which references EnumerateObjectProperties (https://tc39.es/ecma262/#sec-enumerate-object-properties) which is multiple paragraphs of text.
Note that this is just the head in the modern specification, the body logic for a for-in includes the logic required to correctly handle subject mutation and what not, which again the old “simple” specification did not mention. I had to spend huge amounts of time working out/reverse engineering what SpiderMonkey and JScript did for for-in as the specification failed to actually describe what was necessary. Because that wasn’t in the specification KJS didn’t do anything remotely correctly - not because the KJS folk weren’t talented but because the spec was incomplete and they didn’t have the time or resources required to work out what actually needed to happen.
This is all for ecmascript which was honestly probably the best and most complete and “accurate” of the specs for the web. I want to be extremely clear here:
You could not use the old specs to implement a browser that was compatible with the web as it existed even at that time
If you matched the specs exactly your browser would simply be wrong in many cases, but even that word “exactly” breaks down: the specs did not provide enough information to create a compatible implementation.
The modern specifications are much much more verbose, because any ambiguity in the specification, or any gaps in possible behaviour are now understood to be spec errors, and so the specifications are necessarily “verbose”.
Safari and Chrome added features for mobile browsing and web apps in an unnecessarily complex way.
I am curious what you think would make them less complex? You keep making variations of this claim but you haven’t provided any examples, nor shown how such an example would be made less complex.
KHTML enabled them to do it more quickly than they otherwise would have.
I would put the time savings at not more than about a year, if that.[*]
I think the long term win for everyone came about from khtml being open source, and then the blog post complaining about apple’s initial “here’s a tarball” approach to development that led to the more reasonable webkit development model we have today. Even before the modern “let’s bundle a 400Mb browser with our 500kb app”/electron development model, having an actually usable embeddable web component was useful - that’s a significant part of why khtml had the architecture it had. The only other browser engine that had that was MSHTML - Gecko was integrated deeply into the UI of the browser (in both directions), as were the other engines like Presto and omniweb, but they were also closed source and the latter was only marginally better at web compatibility than khtml.
WebKit being based on KHTML forced it to be open source, doing that meant that QtWebKit, WebKitGTK, etc were able to exist giving KDE and Gnome robust embeddable web views that were production quality and could actually be used on the real web, which seems like quite a big win vs KHTML based components. One of the big architectural features of webkit is the way it integrates with the target platform, and by big I mean at the time of the blink fork someone published something about how much code blink was able to remove by forking from webkit, except more or less all the code they removed was support for Gtk, Qt, Wx, etc (and JSC of course) and the architecture that made supporting those platforms possible.
Anyway, I get that you apparently hate Apple, and presumably google, and that’s fair, they’re corporations and generally corporations aren’t your friend - but you are making a bunch of claims that aren’t really true. First off the specs aren’t more complex for no reason. There’s more text now because the specs are actually usable, and therefore have to actually specify everything rather than leaving out details. There are more specs, because people want browsers to do more. If the browser is going to do something new, there needs to be a specification for that thing, and so “more words”. The old specs that weren’t as large and complex weren’t meaningfully complete, so while there might be fewer features to implement than a modern browser, it was much harder to implement any of them. Finally, the general claim that open source or other developers didn’t get anything out of the webkit fork seems to me to be fairly clearly false: a large number of modern OSS projects use webkit, blink, and v8 (some use JSC, but JSC’s much unloved C API is super unwieldy and out of date :-/)
[*] KSVG2 was a much bigger saving simply due to SVG’s spec actually being usable, so implementing SVG did not necessitate weeks or months of work determining the actual behaviour of every line in the specification. That said if webkit hadn’t picked up SVG, then I don’t really see it being as common as it is now - webkit/safari was the first implementation of SVG in a mobile browser at all, and for an extended period I was the only browser representative on that committee, a lot of which consisted of me trying to stop terrible decisions or correct false assumptions of what was/was not possible. As with Hixie I eventually left as a result of BS behavior (though not as bad as Hixie’s experience), but if not for Hixie and me, and WK shipping SVG, I suspect SVG would have been supplanted by another format.
Actually, out of curiosity I did some bugzilla spelunking and found one of the first incredibly complex (:D) changes I ever made in webkit (you can even see my old student id in the changelog): https://bugs.webkit.org/show_bug.cgi?id=3539
What I think is particularly great is that this is a trivial bug, brought about entirely by the low quality of the specifications of the time. It even has Brendan Eich pop up and explicitly call it out as an error in the specification. If you follow the conversation you can see I was having to manually test the behaviour of other browsers, and come up with related test cases to see what was happening in those cases.
This is a trivial API (Array.join()) used in a core behaviour (Array.toString is defined in terms of join()), yet the spec didn’t specify exception propagation, and completely failed to specify how recursion should be handled.
Another fun thing: in looking for my ancient commits I found a bunch of the webkit svg work back ported into ksvg and khtml, directly benefiting khtml.
Um, no, prior to the “let’s make the browser specs correct and complete” work building a browser was vastly more difficult.
I don’t know the last time a browser having parity with established alternatives was developed from scratch, do you? It would certainly be illuminating if there were a case in the past 5 years and we could compare the man hours of development with that of an earlier example. You think it would be less for a modern browser?
Fixing the specs was a significant amount of work, and why would we do that if it was not beneficial.
I hope you didn’t think I was claiming that more rigorous specs for the same standard would not be beneficial.
I am curious what you think would make them less complex? You keep making variations of this claim but you haven’t provided any examples, nor shown how such an example would be made less complex.
I thought it was common knowledge that the web was unnecessarily complex. A priori it makes sense that rapid incremental development combined with the need to support every previous way of doing things would create a lot of complexity that could be avoided if development was slower and more carefully planned out.*
You seem to know a lot more about browser development than me, so perhaps you can be the one to offer specifics. What’s an example of user-facing functionality which is possible today but was not possible with the web standards of 10-15 years ago? Or what complex additions to web standards in the past 10 years were necessary to enable some useful functionality?
Your reply to your own comment mentions bugs due to under-specified exception propagation in Array.join(). Perhaps one way the web could be simpler would be if JavaScript used return values for error handling, similar to C or go, rather than exceptions.
Anyway, I get that you apparently hate Apple, and presumably google, and that’s fair, they’re corporations and generally corporations aren’t your friend - but you are making a bunch of claims that aren’t really true. First off the specs aren’t more complex for no reason. There’s more text now because the specs are actually usable, and therefore have to actually specify everything rather than leaving out details. There are more specs, because people want browsers to do more.
Just flagging that “people” can mean a lot of things, and if you want to argue that a majority of web users wanted the features to be added, that seems quite bold and requires some justification. Did web users want publishers to regain the ability to show them popups, which were effectively eliminated in the early 2000s by popup blockers?
Finally, the general claim that open source or other developers didn’t get anything out of the webkit fork seems to me to be fairly clearly false: a large number of modern OSS projects use webkit, blink, and v8 (some use JSC, but JSC’s much unloved C API is super unwieldy and out of date :-/)
I don’t think I made that claim, did I?
* - To clarify, I’m not saying KHTML being GPL would have made browser development more careful, just that it would have delayed the reckless development that has been happening and making things worse.
Ok, at this point I’m not sure what you’re trying to say, so I’m going to try to make what I am saying very clear:
It is exponentially easier to implement a browser using the modern specs than the old specs.
There is more work to implement a browser now, because they have more features. Those features are all individually something people can reasonably implement.
What Kling is doing with ladybird would simply not be possible with the specs from 15 years ago. Again, at the time apple started working on webkit/khtml, it could not render yahoo.com correctly despite it being one of the most popular sites at the time. The first 5-8 years of webkit and safari’s existence was essentially trying to reach consistent web compatibility with gecko, and that was a funded team. This is because pretty much any site that was not rendering correctly meant having to spend weeks or months working out what was actually happening - because the specs were incomplete or wrong - and then making changes while trying to ensure that the change in behavior didn’t result in things going wrong somewhere else.
I cannot emphasize how hard it was to fix any rendering issue.
Meanwhile, in a few years with the modern specs, Kling has almost on his own got vastly more complex websites than anything khtml ever had to deal with rendering much more correctly than khtml ever did. That was only possible because of the vastly more precise wording in the modern specs, even though that makes the specs more complex by the addition of words.
I thought it was common knowledge that the web was unnecessarily complex.
No. It’s common knowledge that the web is complex. Unnecessary is a subjective value judgement. There are plenty of specs in the web I don’t care about, but that others do, and vice versa. Just because you don’t use something doesn’t mean someone else doesn’t.
A priori it makes sense that rapid incremental development combined with the need to support every previous way of doing things would create a lot of complexity that could be avoided if development was slower and more carefully planned out.
Modern features are carefully thought out, and often take months if not years to specify. Specifically to avoid unnecessary complexity. The problem is many people will say “I want to do X, and that only requires Y”, and then spec author and engine developers have to ask the ignored follow up “what if you want to do related Z?” or “how does that interact with these other features?”. Because the specifications need to solve the problem of “how do we make X possible” not “how do we do X only”. A lot of the pretty terrible misfeatures of the web come down to engines saying “we want to do X and that only requires Y” and then just doing that rather than considering any other factors.
The modern web is not “rapid incremental development”. It’s obviously incremental because we aren’t starting from scratch every few years. There aren’t versioned releases of the spec because what we learned from the past is that that doesn’t actually work, for many reasons, but at a most basic level: what does it mean if a single feature in the spec is complete and fully specified and a browser ships said feature? should that require a versioned release of the spec? what if a different engine has a different set of fully implemented features? The result is the realization that version numbering a spec is meaningless.
What’s an example of user-facing functionality which is possible today but was not possible with the web standards of 10-15 years ago? Or what complex additions to web standards in the past 10 years were necessary to enable some useful functionality?
This is a subjective complaint - specifically preferring the “the web was good enough then why did we not just freeze it in time” model of development. The answer is a lot of things on the web aren’t necessary. But that applies to everything: did the web really need more than ncsa mosaic?
For example:
CSS has many more features that make more complex design possible without being unmanageably complex (or just outright impossible). But all you need to do is be able to show text and images, so by your position none of that is necessary
CSS animations are totally unnecessary, and were achievable using JS before then, but that was incredibly slow and laggy on all but the highest end PCs, and basically unworkable on phones due to the touch screen interface tightening the time constraints for human-perceptible latency.
XHR was clearly not needed for a long time; I think it was only introduced in IE5, so that’s 99/2000, and it wasn’t copied by Mozilla for some time after that. The web was big enough at the time and no one needed it before then.
Canvas could easily be handled server side - it was only invented because a number of core widgets in Tiger’s dashboard couldn’t leverage servers for rendering, but accidentally made available to the web at large and aggressively adopted by people even though the API was kind of bad because it was meant to be an internal feature and so essentially just modeled the CoreGraphics API.
And I could do this for almost every feature introduced over the last 30 years. Are there some features I detest? Absolutely. Notifications are basically just for spam, but web apps want to be able to act like real apps, and apparently people want that. Personally I turn off notifications for non web apps as well. I hate them with a passion, but clearly people do want those. Google constantly wants to give web access to arbitrary hardware because they literally want chrome to be the only thing running on people’s computers lest they do something that google can’t monetize (WebBluetooth, MIDI, ….)
There are a few specs that have been added that I don’t think you could reasonably pretend could be done on older specs:
The media specifications, and it’s reasonable that you would want JS to start/stop audio at least, and get playback position for the UI. That means you need additional specification. Alas DRM happened. OTOH if DRM did not happen then you either require plugins or you have an unspecified feature that everyone implements. Still DRM remains BS.
The text event specs. JS, and so key events (and events in general) entered the browsers as developed in the English world. As such they utterly failed to handle any language that isn’t one key press equals one character. They added “input” events at some point, which were still bad. The new text composition event spec is not something you could emulate. That said in my opinion people should not be trying to handle text entry themselves, and instead should use the builtin text area or content editable (itself a new spec, but also awful to use if you were trying to make a word processor)
Your reply to your own comment mentions bugs due to under-specified exception propagation in Array.join(). Perhaps one way the web could be simpler would be if JavaScript used return values for error handling, similar to C or go, rather than exceptions.
No, the error is in the spec, not the language. In a language specification you have to say exactly how every step occurs, and that includes error/exception propagation. In fact the spec is entirely in terms of returned “Completions” IIRC the completion consists of a value and a completion kind, which could be normal, return, break, continue, or exception. Any step that gets a completion should be testing the kind of the completion, in this case it should have been checking for an exception. If JS was instead error value based (which it essentially was originally and is part of why JS is so lenient) the only change would be checking for an error return instead of an exception.
Just flagging that “people” can mean a lot of things, and if you want to argue that a majority of web users wanted the features to be added, that seems quite bold and requires some justification.
No, web developers want features, because they believe those features will let them make things that users want, or they’ll make more complex things easy and/or possible. To be clear engine developers don’t just say yes to every proposal because sometimes webdevs want things that have tremendous privacy or security implications. But then if a feature does seem like it will be generally useful, engine devs and web devs work together to create an actual specification. Engine developers also propose features, generally based on making it easier, more efficient, or safer for web developers to do the things that engine developers see happening at scale.
Did web users want publishers to regain the ability to show them popups, which were effectively eliminated in the early 2000s by popup blockers?
I’m unsure what you’re talking about here? I’m guessing notifications or similar in which case: yeah, actually people do want them. The problem you hit is the general webapp problem: a browser can’t distinguish a notification request from a site you want to be at from one you’re at through misclick or ad. But this applies to many of the more terrible features web developers propose which boil down to “well a native app can do X safely, why can’t my website?” followed by “then just have a permission dialog” which demonstrably does not work.
I don’t think I made that claim, did I?
I’m unsure what your underlying complaint is then? you said it was bad that apple (and eventually by proxy google) forked khtml, even though apple followed all the rules of open source, and even though apple’s work ended up back in khtml directly. The existence of safari and webkit was a significant driver of the “let’s make the specifications of the web actually usable” effort, so if your argument is “apple should not have made a browser engine” then you’re saying “we should have stuck with the specifications of the 90s and earlier 2000s”, the ones that could not be used to implement a browser.
I just want to say that I’ve really enjoyed your comments in this thread. They’ve added some context to the development of web standards I have not thought about before. This is this site at its best.
Ok, at this point I’m not sure what you’re trying to say, so I’m going to try to make what I am saying very clear:
It is exponentially easier to implement a browser using the modern specs than the old specs.
There is more work to implement a browser now, because they have more features. Those features are all individually something people can reasonably implement.
This still lacks clarity to be honest. Are you conceding that implementing a modern browser would take more man hours, while taking “difficulty” to refer to the character of the work (reading specs vs. reverse engineering) rather than the sheer amount?
I’m unsure what your underlying complaint is then? you said it was bad that apple (and eventually by proxy google) forked khtml, even though apple followed all the rules of open source, and even though apple’s work ended up back in khtml directly.
I said that WebKit has been used to further the degradation of the open web, in part by adding features for web apps and mobile usage in a way that locks out independent developers due to unnecessary complexity. This is far from saying “open source or other developers didn’t get anything out of the webkit fork.” Can you see the difference?
The existence of safari and webkit was a significant driver of the “let’s make the specifications of the web actually usable”, so if your argument is “apple should not have made a browser engine” then you’re saying “we should have stuck with the specifications of the 90s and early 2000s”, the ones that could not be used to implement a browser.
That’s a complete non-sequitur. Apple not forking KHTML would not have prevented them from making a browser engine, and it would not have prevented them or anyone else from improving the specifications. I have no idea why you would think that.
I can address some of the other things in your comment but I want to be sure I’m being understood.
This still lacks clarity to be honest. Are you conceding that implementing a modern browser would take more man hours, while taking “difficulty” to refer to the character of the work (reading specs vs. reverse engineering) rather than the sheer amount?
There are more features to implement, but it is much much easier to implement those features. Hence it is easier to implement a browser. For more or less any feature you want to implement from 15 or 20 years ago, if you say “I’m going to open the spec and implement the feature” you will be unable to do so. For more or less every feature today, you can open the spec, essentially copy line for line, and you will have a correct implementation of that feature. This is exactly what Kling has been doing, and has been a demonstrably successful approach.
I said that WebKit has been used to further the degradation of the open web, in part by adding features for web apps and mobile usage in a way that locks out independent developers due to unnecessary complexity.
Again this reads like “the internet was fine and didn’t need any more features than it had 20 years ago” and “the only reason the internet has more features is because of webkit”. The first is more subjective but I’d say is not true, the latter is simply false.
That’s a complete non-sequitur. Apple not forking KHTML would not have prevented them from making a browser engine, and it would not have prevented them or anyone else from improving the specifications.
In that case the former means apple forking/not-forking khtml has no impact on the growth of the web as a platform, and to the second: one of the biggest drivers for improving the specs that make up the web platform was apple and Mozilla trying to ensure compatibility between webkit and gecko. If you remove an apple created browser engine a huge amount of that effort disappears, and while sure nothing is stopping anyone from doing that work, there’s also nothing stopping anyone from starting it.
Honestly this entire argument seems to be predicated on webkit being solely responsible for the amount of stuff in the modern web platform, which isn’t true, and that only happened because webkit forked khtml, which isn’t true, and that the web platform was fine 20 years ago and none of the features added in the last 20 years were of any value.
The motivating factor for the vast majority of new specifications is not “mobile”, it is web developers trying to make more and more powerful apps. Historically driven by MS (that’s how we got XHR), and then by Google. Chrome was developed largely because it is in Google’s interest to have people in a browser as close to 100% of the time as possible, and so they needed browsers to be able to basically be an app platform. That is why Chrome exists. When they were starting they did spend time deciding between gecko and webkit, and the only reason they chose webkit was because the architecture was not so heavily tied to the embedding app. If webkit was not there, they would have just based chromium on top of gecko, only without the preceding 5-8 years of specification cleanup. They’ve also historically been pretty bad about trying to add random half-thought out features to the web without considering anything beyond their own immediate use case, something that only apple’s webkit team really successfully pushed back on. So again removing apple and webkit means worse specs.
You keep simply claiming “large spec == bad”, or even somehow “anticompetitive”, but if you’re going to make claims like that you really need to have done more research than just appealing to things being “obvious” while dismissing the actual history and technical details involved.
This still lacks clarity to be honest. Are you conceding that implementing a modern browser would take more man hours, while taking “difficulty” to refer to the character of the work (reading specs vs. reverse engineering) rather than the sheer amount?
There are more features to implement, but it is much much easier to implement those features. Hence it is easier to implement a browser. For more or less any feature you want to implement from 15 or 20 years ago, if you say “I’m going to open the spec and implement the feature” you will be unable to do so. For more or less every feature today, you can open the spec, essentially copy line for line, and you will have a correct implementation of that feature. This is exactly what Kling has been doing, and has been a demonstrably successful approach.
Should I infer an answer to my question from that?
I said that WebKit has been used to further the degradation of the open web, in part by adding features for web apps and mobile usage in a way that locks out independent developers due to unnecessary complexity.
Again this reads like “the internet was fine and didn’t need any more features than it had 20 years ago” and “the only reason the internet has more features is because of webkit”. The first is more subjective but I’d say is not true, the latter is simply false.
If that’s how you read it, where did you get “open source or other developers didn’t get anything out of the webkit fork”?
I will leave it as an exercise to the reader to identify the difference between what it “reads like” and what it actually says.
That’s a complete non-sequitur. Apple not forking KHTML would not have prevented them from making a browser engine, and it would not have prevented them or anyone else from improving the specifications.
In that case the former means apple forking/not-forking khtml has no impact on the growth of the web as a platform
It does not. Usually when it gets to the point of deconstructing sentences word by word to identify the invention of a new strawman position, the opportunity for fruitful discussion has passed.
At any rate it seems your disagreement is with someone you’ve invented in your head, so I invite you to carry on the argument there.
At any rate it seems your disagreement is with someone you’ve invented in your head, so I invite you to carry on the argument there.
Ok. I tried to answer repeatedly, and you just kept coming back round to claim that specs are bigger and harder to implement Because Apple. You showed zero interest in actually learning anything about what you were claiming. My apparent “straw men” were me trying to understand what you were actually trying to say, as I answered your original claim of “apple forking khtml resulted in much anticompetitively complex specifications”, so I assumed I was misunderstanding.
But yeah, let’s end this, as far as I can tell there is nothing I can say, no history, no information, or anything that will sway you from that position, as you just dismiss anything I do say without any evidence to support your position.
And you showed zero interest in understanding my position. Nice use of quotes without an actual quote by the way. Perhaps there is some solace in knowing that I don’t actually hold any of the positions you have ascribed to me in your last few comments.
All those things can be done with a GPLed code base.
Companies do invest in truly free software. Linux, for example.
I’m unsure what you’re trying to say here? WebKit is BSD and LGPL licensed, has had significant investment by numerous companies, and is used by a variety of non-commercial products.
As a direct result of the permissive license on KHTML, KDE was able to adopt a WebKit-based web view (built around QtWebKit) a few years after Apple picked up the code. The new web view, unlike KHTML, was able to correctly render most pages on the web at the time.
But, sure, it would have been better if KHTML had remained a partial implementation used only by KDE. I’m sure Apple developing a proprietary web engine rather than working on an open source codebase would have been much better for the open web.
Your comments and reasoning seem to imply that we should be grateful to Apple for taking the KHTML code and turning it into something much better. I don’t necessarily take that premise for granted. I think there is an argument to be made that a non-insignificant part of the intention behind Apple’s continuation of WebKit as an open source project was so that they could rally free (as in beer) open source labor and expertise towards their purposes. Or put another way, I don’t think it’s a given that Apple alone would have been able to create or sustain a high-quality proprietary web rendering engine.
We have a pretty representative example of this, Microsoft was unable to maintain Internet Explorer alone. So much so that they abandoned their proprietary engine in favor of an open source engine. I don’t think it’s unreasonable to assume the same fate would have befallen Safari. Safari’s death is still often predicted.
In many cases it may be that large corporations need open source ecosystems more than open source ecosystems need large corporations. I would guess that that is the situation in most cases when I consider the events of the past twenty years. If so, open source developers and communities shouldn’t be so quick to sell themselves short or sacrifice the terms on which they allow others to use their IP.
Your comments and reasoning seem to imply that we should be grateful to Apple for taking the KHTML code and turning it into something much better.
If the goal of releasing software under an open license is not to allow people to improve it so that the amount of useful code under F/OSS licenses increases, then what is the goal?
I think there is an argument to be made that a non-insignificant part of the intention behind Apple’s continuation of WebKit as an open source project was so that they could rally free (as in beer) open source labor and expertise towards their purposes
The goal for Apple was to share the cost of maintaining the engine with other people. They made no secret of this. A lot of the early contributions were from Nokia, who had a similar need for their Series 60 mobile platform. Apple maintained abstraction layers that allowed Nokia and others to plug in their own rendering back ends and widget sets. KDE benefitted from this because it made it easy to maintain the Qt support, Nokia benefited because they got a web engine for a fraction of the cost of writing one and cheaper than licensing one from Opera, Apple benefitted because other people were improving WebKit.
The vast majority of contributors to WebKit were not ‘free (as in beer) open source labor’, they were paid employees of Apple, Nokia, Google, and so on. If you looked at the WebKit repo around 2010, you’d have seen around a dozen different platform integrations, maintained by different groups.
If you are opposed to corporations contributing to F/OSS codebases for reasons of self interest, what economic model do you propose instead to fund F/OSS development?
We have a pretty representative example of this, Microsoft was unable to maintain Internet Explorer alone. So much so that they abandoned their proprietary engine in favor of an open source engine.
This is not what I’ve heard from folks on that team (disclaimer: I am a researcher at Microsoft, I don’t work on anything related to Windows or Edge). The reason that Microsoft invested in Blink was largely driven by Electron. It is used in Office and a bunch of third-party applications. The choices were:
1. Keep maintaining their own proprietary engine for Edge and, separately, ship and support a Chromium-based runtime for Electron apps.
2. Base everything, Edge included, on Chromium/Blink.
There is no possible world in which the second is not cheaper than the first because it is a subset of the work.
Again, Microsoft invests in an open source project because it allows sharing costs with Google and other contributors. Some of these people may be volunteers doing it for fun, most are not.
I started working on LLVM because I wanted a modern Objective-C compiler for non-Apple platforms. The GCC codebase was an unmaintainable mess. Clang had Objective-C parsing support but not code generation support. I was able to add enough support for Objective-C code generation for GNU Objective-C runtimes that we could build all of GNUstep in a few weeks. I gained far more from Apple’s contributions to LLVM than Apple gained from mine. Should I be mad because Apple shipped a load of system frameworks compiled with code that I worked on and all I got back in exchange was the free use of a few tens of millions of dollars worth of code written by their engineers?
I am primarily a pragmatist. I think both the restrictive and permissive F/OSS licenses are appropriate in different circumstances. In general, however, for individual hobbyist programmers who may not yet understand why they are doing what they are doing, I would recommend they use GPL, or even a more restrictive license, until they have a well-understood reason to use a more permissive license. Often you don’t know how your code will end up being used or by whom so it’s more prudent to reserve IP rights early on and release them later once you have the benefit of hindsight.
If the goal of releasing software under an open license is not to allow people to improve it so that the amount of useful code under F/OSS licenses increases, then what is the goal?
I think this is a classic misunderstanding between the “free software” and the “open source” camps. The first camp is more oriented towards social good and the second camp is oriented towards utilitarianism.
If you are opposed to corporations contributing to F/OSS codebases for reasons of self interest, what economic model do you propose instead to fund F/OSS development?
As above, I am not opposed to corporations contributing to F/OSS codebases for reasons of self-interest in principle. As far as economic models to fund F/OSS development go, that’s a separate question but there are many answers. Linux is a GPL project, that has never stopped corporations from using it and contributing to it. That development is funded in two primary ways as far as I know: through fundraising and paying developers to work on it.
The choices were:
There is a third choice. Re-base Electron on top of the IE browser engine for their own products. That option was probably out of the question because it was probably cheaper for Microsoft to make use of an open source engine than to continue maintaining their own proprietary engine. This was part of my thesis above. It’s not a given that even large corporations themselves are able to efficiently maintain their own proprietary web browsers / rendering engines; they are reliant upon F/OSS communities.
Should I be mad because Apple shipped a load of system frameworks compiled with code that I worked on and all I got back in exchange was the free use of a few tens of millions of dollars worth of code written by their engineers?
I think this comment was based on the premise that I was opposed to corporations contributing to F/OSS codebases for reasons of self-interest in principle. Since I’m not, I won’t address it.
My primary contribution to this discussion is only to counter the premise that F/OSS developers necessarily gain more than large corporations do when large corporations take their code and use it in proprietary products. Or that F/OSS software would fade into obscurity without usage by large corporations in proprietary products. This premise is often what drives a F/OSS developer who is on the fence between choosing a permissive BSD-style license and a restrictive GPL-style license. The license does have some effect but ultimately I think large corporations will find a way to use GPL software if they need to, even Apple used GCC for many years before LLVM came around. Under those circumstances, F/OSS developers who value the terms of the GPL should not choose against it just because they fear the license will drive contributors away or inhibit their opportunities. It may be the case that your work is more valuable to them than them using your work is valuable to you.
I think this is a classic misunderstanding between the “free software” and the “open source” camps. The first camp is more oriented towards social good and the second camp is oriented towards utilitarianism.
Note that the concept of “social good” with regard to the FSF is very narrow – it is: “contribute back the source code if modified”. Thanks to Freedom Zero, if a repressive regime were to use GPL’d software to regulate its gas chambers, and made sure to contribute back the source for other gas chamber operators, the FSF would be A-OK with it.
Thanks to Freedom Zero, if a repressive regime were to use GPL’d software to regulate its gas chambers, and made sure to contribute back the source for other gas chamber operators, the FSF would be A-OK with it.
I get the point you’re trying to make but this example is excessive and most likely untrue to the point of making your point difficult to take seriously. I highly doubt the FSF would be A-OK with their software being used to commit murder. I think you mean to say that the GPL would in theory permit that usage.
In any case, the intention and spirit that motivated the creation of the GPL was the promotion of social good in general (just read RMS’s blog to get a sense of his general priorities). The specific terms were just their best tactic for accomplishing that in the context in which they were operating.
This is not what I’ve heard from folks on that team (disclaimer: I am a researcher at Microsoft, I don’t work on anything related to Windows or Edge). The reason that Microsoft invested in Blink was largely driven by Electron. It is used in Office and a bunch of third-party applications.
That matches my understanding from the engine dev grapevine. MS was trying to deal with electron apps being gigantic and wanted to be able to essentially have it built into the OS (a la WebKit.framework on Darwin) because otherwise the resource costs are huge as every app is essentially a full copy of chrome and so has a huge dirty memory footprint. My understanding is that adopting blink didn’t actually change anything in that regard because blink doesn’t actually have any real API stability guarantees, and people just include whichever version of blink they built on anyway rather than trying to rely on a system framework.
To clarify, I would call LGPL a weak copyleft license. If KHTML had a truly permissive license, WebKit could have been proprietary.
My heuristic is that anything that would have slowed down the process by which the web became the intractable Skinner box it is today would have been for the good, and any change that would have given less help to the companies who did that probably would have helped.
Being more rigorous, you can’t really predict what would have happened if KHTML were GPL rather than LGPL, and I grant it’s possible that it would have hurt. But my sense is that it probably would have helped slow things down.
As a direct result of the permissive license on KHTML, KDE was able to adopt a WebKit-based web view (built around QtWebKit) a few years after Apple picked up the code. The new web view, unlike KHTML, was able to correctly render most pages on the web at the time.
This overlooks that if Apple developed their own engine, it would have taken longer for them to implement the same features that ended up in WebKit, so it would have taken longer for the web to reach the same level of complexity, and maybe KDE could have kept pace with web standards via in-house development.
On the other hand, maybe Apple would have adopted Gecko instead of KHTML, and who knows how that would affect things. Maybe web code bases would have converged even faster, leading to a faster transformation of the web into something like we have today, ultimately worsening our present situation.
But, sure, it would have been better if KHTML had remained a partial implementation used only by KDE. I’m sure Apple developing a proprietary web engine rather than working on an open source codebase would have been much better for the open web.
I honestly don’t know. I guess if WebKit were proprietary, Google may have adopted Gecko, which could have hurt in the long run as above. But if Gecko were also GPL, then Google would not have been able to develop Chrome as quickly, and making it proprietary would have made it harder for them to promote it to people who care about web openness.
Anyway, the point is that it’s generally better to avoid helping companies whose interests are opposed to effective web or software openness. I will defend that heuristic against the one you appear to be using, that it’s better to have more free software even if it’s used by tech giants to impose costs and harms on the rest of us.
Have the original KHTML developers said so? If not, we have no right to be indignant on their behalf at what Apple did. For all I know, maybe the KHTML developers are happy at the outcome, not least because they got to benefit from Apple’s work, even if not monetarily.
it led to an acceleration of a problem that affects all of us, so yes we have a right to be indignant.
Plenty of GPLed projects are quite popular, and the great thing is that their users’ rights are respected, too.
The patches Apple contributed back were a bit shitty. Giant diffs with useless commit messages, and often quite late.
If Lars had used the GPL3, what would Apple have done? My guess is that its patches would have looked like the patches various hardware vendors contribute back to the linux kernel: Giant diffs with useless commit messages, late.
I worked at Trolltech at the time; we discussed this with them. They didn’t sound like fans of Gecko.
A colleague of mine spoke with them in person, then came back and told us that they were nice and that they thought Konqueror was nice but with performance problems, compared to Gecko’s million lines of unfixable sloth. They also offered to contribute fixes back, but it was clear that it would happen a little late (it was very far down on their management’s priority list).
That makes sense. But as the web gets more complex, the cost of starting anew also increases, so by the time Apple was deciding on a browser engine to adopt it may have been impractical to start their own. If KHTML was GPLv3, I feel like that would have been a deal breaker, so they might have been stuck with Gecko.
Well, there was also another browser engine Apple might have bought, like how Apple bought EasySW to use CUPS. Presto was small, fast, maintainable and all the rights were owned by Opera.
I’m planning to go to RustConf this year. Rust has become one of my favorite languages over the past couple of years, and I’m developing an open-source project in it, so I look forward to meeting up with some of the Rust community.
emotional appeals to slippery-slope arguments
I think that in the current year people who still claim that slippery slopes are an invalid concern are either bad actors or useful idiots.
People have such a grumpy view on telemetry because once it becomes normalized there is no incentive in place to roll it back or limit it. Given how gnarly the economy is going to be for tech for the next several years, I fully expect maximum resource extraction to be the dominant strategy–and companies will turn to that telemetry to make it happen.
If I have a problem, I’ll send in a bug report or manually upload a crash dump. I do not need my machine snitching on me, however beneficial people will claim it to be.
I think that in the current year people who still claim that slippery slopes are an invalid concern are either bad actors or useful idiots.
Why is the current year relevant? A logical argument is valid or not, regardless of the prevailing zeitgeist. Maybe one has to willfully be somewhat ignorant to keep believing we can have nice things, especially in our current pessimism-saturated online discourse. So be it; I’ll be a useful idiot too. If the most pessimistic opinions about the impact of microelectronics had prevailed in the 1970s and early 1980s, I, for one, would have been much worse off. Telemetry is certainly not an invention on the same scale, but I’m willing to have faith that it can be a valid tool for producing more usable software, despite previous abuses and the current prevailing opinion.
I think one place it makes sense to have telemetry is on websites where your primary question is “are our users finding the buttons we want them to find”.
The average website user will not file a bug report.
I’m not sure I 100% agree with all of the analogies completely, but they’re all quite plausible. I think the more useful comparison is to think of a CPU’s ISA the same way that you think of a compiler IR. It is not the thing that’s executed, it’s the thing that you lower to the thing that is executed.
In particular, architectural registers are just a dense encoding of data flow over small blocks. Most CPUs now do register renaming (low-end microcontrollers don’t, big ones do, all of Arm’s A-profile cores now do), so their internal representation is SSA form: every rename register is assigned to once and then read multiple times, the identifier is reused once you reach a point where there are no outstanding references to an SSA value. This happens when a write to the same architectural register leaves speculation and is why xor-ing a register with itself before a jump can improve performance on x86 (you mark the SSA value as dead). I find it slightly depressing that we use a fixed register set as an IR in between two SSA IRs, but none of the projects that I’m aware of to fix this (including two that I worked on) have managed to come up with anything more efficient.
Some CPUs are a lot more compiler-like. NetBurst and newer Intel chips have loop caches that let them compile a hot loop to micro-ops (not sure if they optimise this) and then execute it directly. Transmeta CPUs and NVIDIA’s Project Denver cores expose an AArch64 ISA but internally JIT it to a VLIW architecture (the Denver architecture is really fun, it has multiple pipelines offset by one cycle so each instruction in a VLIW bundle can feed the result to the next one without allocating a rename register). The big change from Crusoe to Denver was adding a simple map that translated one Arm instruction to one VLIW instruction so that the JIT didn’t have to warm up to execute, only to execute fast.
In particular, architectural registers are just a dense encoding of data flow over small blocks.
This really blew my mind when I first understood this. I’ll unpack this statement a bit more for those interested (might not be the best explanation as I just came up with a random example, but hopefully it helps someone):
Modern CPUs internally have many more registers than the ISA specifies. The ISA registers are just used to encode the dependencies; internally different registers may be used if the CPU detects that operations on the same register are independent.
For example, if you do something like (Intel syntax):
add rax, 15
mov [edi], rax
mov rax, rdx
mov [esi], rax
My understanding is that modern CPUs interpret this as two “flows”.
First one:
- Add 15 to the internal register that rax refers to (this is a dynamic mapping)
- Write the value of the internal register that rax refers to at the address of the internal register that edi refers to
Second one:
- Point rax to the internal register that has the label rdx
- Write the value of the internal register that rax refers to at the address of the internal register that esi refers to
So, the nice thing is that once the CPU detects that these are independent (since the contents of the rax in the second flow do not depend on the first flow – it is reset by the mov), the second flow can actually be executed before the first flow.
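If it helps, here is a toy sketch of the renaming idea (my own illustrative types and names, nothing like how real hardware is built): every computed write to an architectural register picks a fresh internal register, and a register-to-register mov can simply re-point the mapping, so the two flows above end up touching disjoint internal registers and can be reordered.

use std::collections::HashMap;

// Architectural registers used in the example above (rdi/rsi standing in for edi/esi).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Arch { Rax, Rdx, Rdi, Rsi }

struct Renamer {
    next: u32,               // next unused internal register
    map: HashMap<Arch, u32>, // architectural register -> internal register
}

impl Renamer {
    fn new() -> Self {
        // Assume every architectural register starts out mapped to some internal one.
        let mut map = HashMap::new();
        for (i, r) in [Arch::Rax, Arch::Rdx, Arch::Rdi, Arch::Rsi].iter().copied().enumerate() {
            map.insert(r, i as u32);
        }
        Renamer { next: 4, map }
    }
    // Reading creates a true data dependency on whatever internal register is current.
    fn read(&self, r: Arch) -> u32 { self.map[&r] }
    // Writing a computed result allocates a fresh internal register, so later
    // writes to the same architectural register create no false dependency.
    fn write(&mut self, r: Arch) -> u32 {
        let p = self.next;
        self.next += 1;
        self.map.insert(r, p);
        p
    }
    // A register-to-register mov can just re-point the mapping ("move elimination").
    fn alias(&mut self, dst: Arch, src: Arch) {
        let p = self.map[&src];
        self.map.insert(dst, p);
    }
}

fn main() {
    let mut rn = Renamer::new();

    // add rax, 15    -> reads the old rax mapping, writes a fresh internal register
    let src = rn.read(Arch::Rax);
    let dst = rn.write(Arch::Rax);
    println!("add:    r{} = r{} + 15", dst, src);

    // mov [edi], rax -> the first store depends on that fresh register
    println!("store1: [r{}] = r{}", rn.read(Arch::Rdi), rn.read(Arch::Rax));

    // mov rax, rdx   -> rax now just points at rdx's internal register
    rn.alias(Arch::Rax, Arch::Rdx);

    // mov [esi], rax -> the second store depends only on rdx's register,
    //                   so it shares nothing with the first flow
    println!("store2: [r{}] = r{}", rn.read(Arch::Rsi), rn.read(Arch::Rax));
}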
the second flow can actually be executed before the first flow
Sort of—the notion of ‘before’ is a bit tenuous when a CPU is massively concurrent on many levels. To begin with, the writes might in fact be executed concurrently, neither strictly before nor after the other (they almost certainly will be—latency of memory accesses is pretty high; but even ignoring that, on a modern uarch, they could both retire on the same cycle). Another issue is that edi and esi might be equal, denoting the same location (or they might partially overlap); in this case, you would have to make sure that the last write the program performs really does come last. (In particular, if you already know what esi is, but you don’t yet know what edi is, then you want to start the write to esi right away, but you need to be able to back out later if edi turns out to have been equal to it.) Finally, there are what we more usually think of as concurrency issues: multiprocessing/cache coherency. What if we write to esi ‘first’, and then another processor reads from esi, and from edi, and it sees the write to esi but not to edi? Under some architectures, that is legal, but not x86.
Right, good points! So indeed it’s probably not a good example. I hope it still illustrates the point somewhat.
newer Intel chips have loop caches that let them compile a hot loop to micro-ops (not sure if they optimise this) and then execute it directly
What do you mean by ‘optimise’?
the Denver architecture is really fun, it has multiple pipelines offset by one cycle so each instruction in a VLIW bundle can feed the result to the next one without allocating a rename register
Cuute—no bypass network?
Aside: now I wonder if there’s any parallel to be drawn between tricolour and MESI.
What do you mean by ‘optimise’?
At the very least, I wouldn’t be surprised if they did some micro-op fusion before caching the loops.
Intel does both “macro” and “micro” uop fusion. Macro fusion is merging multiple instructions into a single uop. This is done for cmp+jcc combos. Micro uop fusion stores multiple uops together as one until the allocation stage; this is usually done for instructions with memory sources, where the read-from-memory uop and the operation uop are kept together throughout the frontend. This saves space and bandwidth in the cache.
But there’s yet another shortcut for short loops (up to ~50 uops), the loop stream detector, where the uops are directly recycled from the uop queue, bypassing even the cache. This has mostly the same throughput as the uop cache, though; the main purpose here is to save power by turning the decoders off when in a short hot loop.
Some CPUs are a lot more compiler-like.
This intuitively seems like it would be a bad thing for power efficiency, particularly to the extent that the CPU spends energy on speculation that turns out to be unneeded. To what extent is that the case?
Power and throughput are always a tradeoff, made more complicated by the fact that the best thing for power is often to run quickly and then enter a low-power state. In-order cores with no speculation don’t spend power doing the wrong thing, but they do spend power doing nothing. Whenever a pipeline is stalled doing nothing, you’re still powering the registers and you often don’t have long enough to power gate it, so it’s still consuming power. Architectures like the UltraSPARC Tx series avoid some of this by having multiple threads running, so each one can issue an instruction and when it’s stalled (waiting for memory or results) another runs, but this increases the power that you need for register files.
If you wanted to design a CPU purely for efficiency (with no constraints on programming model) then you would probably build a multithreaded stack machine, so you have a small amount of per-thread state and can always issue one instruction from one of your threads. Such a machine would be a good fit for Erlang-like languages but would be really bad at running C.
I wrote a bit about this almost a decade ago: There’s no such thing as a general-purpose processor
If you wanted to design a CPU purely for efficiency (with no constraints on programming model) then you would probably build a multithreaded stack machine
This reminded me of Forth and GreenArrays, which then reminded me of this post by Yossi Kreinin about Forth and stack machines, and the contortions needed to implement MD5 on a GreenArrays chip.
Not being very close to that level of programming, the way I understood this was:
This seems crazily inefficient, but not too surprising when you’re talking about the software/hardware boundary
I had read about the 1980’s dataflow architectures many years ago … which use data dependencies rather than a linear ordering.
It seems like an obviously good idea (?), but narrow waist inertia means that it’s more economical to create this “compatibility layer” than to try to rewrite the interface, like Mill was trying to do
Dynamic scheduling is not a worse-is-better concession to backwards compatibility; it is a way of dealing with the fact that the behaviour of software is dynamic. Particularly, along two axes: memory access, and control. A priori optimal scheduling is not possible given the unpredictable latency of a memory access. And whereas, at the source level some term may join the two arms of a branch, at runtime, only one arm will be taken at any given time, and the CPU may simply attach the dataflow edge to its final destination. Not that it mightn’t be worthwhile to devise an architecture which is easier to decode and schedule, but it simply wouldn’t be a very big deal.
In other words: you’re not simply recovering the parallelism as it was at compile time; you’re finding parallelism where it exists only at runtime.
Languages like C are far more actively harmful, performance-wise, because at the level of applications (rather than the concern of a single cpu), you want to parallelise along other axes—particularly multiprocessing, but also vectorisation—at which scale it becomes untenable to do things speculatively or dynamically, but the implementation in your source language is (of necessity) missing all the parallelism inherent to the algorithm it implements.
Yes, some things you only know at runtime in the CPU. But I’d still be surprised if the CPU doesn’t have to recover info that the compiler already has computed.
And ditto for C being mostly unable to let you encode data dependencies / partial order, and instead forcing a total order.
I’m sure someone has tried to measure this at some point – how much parallelism in a x86 or C program is static vs. how much is dynamic. I’d be very interested in that analysis.
I have worked on more than one EDGE architecture in the last few years. The concept is beautiful, but there were sufficient problems that neither ended up shipping (which means I probably can’t say much about them in public, unfortunately: Doug Berger gave an ISCA keynote about one of them a few years back, so at least its existence isn’t secret). The problems come down to modelling conditional control flow. In C code, there’s an average of one branch every seven instructions. An efficient encoding needs to handle dataflow between basic blocks well. If you handle dataflow within and between basic blocks with different mechanisms then you end up needing more hardware than just doing register rename.
This seems crazily inefficient
It is kinda, but one lesson of the past couple decades has been that compact code often wins over orthogonal / “makes sense”, because bigger code blows out cache and runs into fetch bottlenecks. Pretending you have fewer registers than you do saves bits over making the true number of registers addressable, even counting the movs that would otherwise be unnecessary. And pretending everything is linear and recovering the parallelism in hardware saves bits over actually encoding the parallelism.
Better yet, just set color-scheme: dark light on html and get a dark theme for free if you only use system colors.
Thank you! Are there any other things I can add to tell the browser, “just do what’s best for the user, and don’t worry about breaking my other styles, because I don’t have any, because I don’t know what’s best for the user”?
I’ve gotta be honest… I sorta stopped reading when I got to the part about the library being 1MB. Why would it be important to reduce the size? Maybe on some embedded device with a super-small amount of memory? Otherwise, when your watch has GBs of memory, why optimize for library size?
That aside… cargo bloat seems like a pretty cool tool!
A megabyte here, a megabyte there, soon you have Electron-level bloat, which everyone loves to hate. I don’t know about Federico, but as a library developer, I’m keenly aware that, at least judging by message board discussions, my potential users are harsh critics of bloat. And in my particular case, I don’t want anyone to decide that they’re not going to make their GUI accessible (e.g. to screen reader users) because my library was too bloated.
Your app isn’t alone in your phone. When it grows in size, it will cause other apps to be flushed out of memory, and when backgrounded, it will be flushed out sooner because it occupies memory.
Are you aware of specific parts of std that are bloated due to generics?
I’m not an authority on this, but here’s what I know. The base overhead that std adds to each binary is primarily due to the backtrace functionality. Statically linking the code to walk the stack, read debug info, etc. adds considerable overhead. This code can be removed, but only using the unstable build-std feature of cargo and the unstable panic_immediate_abort feature of std, as documented in min-sized-rust.
I think most people complaining about the size would like to see some way that you can include only the part of the std which your program actually uses. Which IIRC is something we won’t see happening that fast.
It’s already there… With LTO, only the part of std used is included. Two parts of std are problematic for this: one is backtrace, as already mentioned. Another is formatting: it uses dynamic dispatch so unused code may be included due to analysis conservatism. Otherwise, all unused parts of std should be removed by LTO. If not, please file a bug.
Oh interesting to know, didn’t realize LTO does that already. What exactly do I have to set for that? Because I’ve seen various values for LTO, including true and “fat”, with multiple answers on whether it makes a difference.
I know you’re worried about blocking the main thread. But consider this: it’s way easier to fix a main-thread-blocker than it is to fix a weird, intermittent bug or crash due to threading.
I think it’s better still to use a language that prevents those classes of intermittent bugs and crashes, such as Rust. While language-level concurrency support, through features like controlled mutability and Rust’s Send and Sync traits, is certainly no panacea, particularly when interoperating with code in other languages, I think it does help. I was disappointed to realize the other day that Apple apparently didn’t tackle this problem in Swift. Yes, they more recently added things like async/await and actors, but AFAIK, nothing like Rust’s features for actually using multiple threads safely.
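For what it’s worth, here is a minimal sketch of the kind of thing I mean (my own toy example, nothing Apple- or Swift-specific): the shared counter below compiles because Arc and Mutex are Send/Sync, and swapping them for Rc and RefCell is rejected at compile time rather than failing intermittently at runtime.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared counter: Arc is Send + Sync, and Mutex guards the mutation.
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // Replacing Arc<Mutex<u32>> with Rc<RefCell<u32>> here would not compile:
    // Rc is not Send, so the closure passed to thread::spawn is rejected.
    assert_eq!(*counter.lock().unwrap(), 4);
}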
If the discourse on sites like this one is any indication, I think there’s a growing backlash against the trend of ever-increasing abstraction. Not just here in infrastructure, but elsewhere as well; for example, some of us want to go back to writing GUI applications in systems programming languages (albeit with better safety, e.g. Rust). We’ve had enough of teetering atop the tower of abstractions, and we want to find other ways forward. So, I think the future should be finding better ways to deploy and manage infrastructure at level 1 or 2, while keeping the good parts of higher-abstraction deployments (e.g. reproducible infrastructure as code). I’m watching the Nix scene with interest, but haven’t yet jumped in with a real NixOS deployment.
If you want to dip a toe in, there’s https://devenv.sh for development purposes. You can move from there to deployment later if you like it.
I look forward to finding out what they’ll be running on the bare metal this time. A conventional distro like Ubuntu or Debian? A VM hypervisor like vSphere or Proxmox VE? An immutable container-focused OS like Flatcar? And will they be doing OS installs on boot drives, or doing some kind of PXE boot setup? They’re now going a level below the abstraction of ephemeral, easily replaced VMs that they currently have with something like EC2, so these choices matter now.
I also wonder what they’ll be using for storage, e.g. ZFS, LVM, or something else. In typical cloud deployments, the durable storage is all managed by the cloud provider (through things like RDS, S3, and maybe EFS), and the ephemeral VMs can just have smallish ext4 root filesystems.
And don’t forget problems like “my VM/container/whathaveyou crashes every couple of days on this new hardware we’ve got, nobody has a clue why, and there is nobody else to pass the buck to”.
We do something like this (custom Debian-based PXE-booted in-memory OS that runs VMs for CI) and while we would never have been able to afford to run this on the cloud and we are able to run it on hardware that’s not available on the cloud (like M1 Mac Studio for aarch64 builds), it’s not all roses. For example, the Mac Studio has a USB controller that requires firmware loaded during boot and there is some race that causes it to get wedged once in a while after reboot and which requires cold boot to recover.
And don’t forget problems like “my VM/container/whathaveyou crashes every couple of days on this new hardware we’ve got, nobody has a clue why, and there is nobody else to pass the buck to”.
Eh, my experience with running Linux on commodity x86-64 server hardware, particularly rented from a provider like OVH, is that it just keeps running, and if there’s a reliability problem, it’s something I did wrong at a higher layer of the stack.
Whether their practitioners will admit it or not, there’s a certain amount of cool machismo affiliated with low-level and systems programming, especially when that programming is in unsafe languages.
I’d argue that this machismo is also the reason programmers are so dead-set on finding/creating/using memory-safe languages without garbage collectors when gc already exists and works great for 90% of use-cases. It makes me think of the “men would rather do X than go to therapy” memes. I recognize that this is probably an unpopular opinion.
I will happily use GC where the situation makes sense. For application programming, these days I would consider using a language without a GC to be an exercise in extreme masochism, not machismo. Half the time, application code isn’t even CPU-intensive enough to require a compiled language and I’ll happily use something like Lua for the majority of the logic.
C and C++ are systems programming languages. C++ is a significantly better one, because it at least pretends to have types, but both of them are intended for use cases where you sometimes need to peek below the level of the abstract machine. A memory allocator can’t be written in a GC’d language[1], because it’s providing one of the things that the GC depends on. The same is true for a lot of language-runtime features.
Beyond that, some larger systems code is very sensitive to tail latency, yet most GCs are throughput-optimised. Introducing 10ms latency spikes on a system in a datacenter can make tail latency spike into the seconds, which is noticeable to humans. Twitter does some astonishingly horrible things to avoid this in their software.
Global GC[2] has no place in a systems language. You’re right that GC works great for 90% of use cases. I’d actually put that number higher: GC works great for anything that isn’t systems programming, and that is well over 99% of programming. Safe systems languages are still rare / nonexistent (Rust is safer, but since it needs unsafe for any data structure that isn’t a tree and has no way of guaranteeing safety properties in FFI, it isn’t safe), safe applications programming languages are common and widely used. People using systems programming languages for things that are not systems-programming problems are both inviting and causing suffering.
[1] Well, okay, MMTk exists, but it’s not noticeably easier to write a memory allocator in it than it is in C/C++.
[2] Local GC can be incredibly useful, but that depends on having an abstraction for ‘this related set of objects’, which most systems languages lack currently.
For application programming, these days I would consider using a language without a GC to be an exercise in extreme masochism, not machismo.
I’m not so sure, particularly for local (i.e. desktop and mobile) applications. This post suggests that Apple’s rejection of tracing GC in favor of reference counting has been a competitive advantage for a while. Now, if you want to say reference counting (particularly without cycle collection, as on Apple platforms) is a type of GC, fine. But then, Lua doesn’t use reference counting.
To me, using a type of memory management that frees up memory as soon as we can, rather than using more of it and then cleaning up sometime later when we think we’ve used too much, is a way of putting our users first, not a form of machismo or masochism. Then again, given economic realities, maybe we have to be masochistic to truly put our users first. Anyway, I’d settle for top-to-bottom reference counting, as in Apple-native apps, Windows C++ apps, or even classic VB, as an alternative to tracing GC; I think the adoption of tracing GC in most modern managed runtimes is a mistake for apps running on our users’ machines.
I’d probably include automatic reference counting as GC, though sometimes the requirement to explicitly break cycles can be painful. That post suggests that the memory consumption of ARC is lower than GC. That’s not what I heard from folks on the Apple compiler team, the key was determinism. GC gives less predictable worst-case memory consumption and this was critical for the iPhone because it didn’t swap and so running out of memory would kill a process. This was a lot worse on the early iOS devices, where there was only enough memory for a single graphical application and so a GC peak would cause the foreground app to die, whereas on more modern devices it would instead cause a background one to exit earlier than normal.
It also didn’t help that Apple massively screwed up adding GC to Objective-C. The type system didn’t differentiate between scanned and unscanned memory, so it was trivial to accidentally introduce dangling pointers. You had three allocation mechanisms (two via the same function) that all returned void*. One returned a pointer that was not GC’d. One returned a pointer that was GC’d, but wasn’t scanned by the GC. The third returned a pointer that was GC’d and was scanned. The last case stored untyped data and so had to be scanned conservatively: integers that happened to contain the address of a valid allocation would introduce leaks. Remember that the original iPhone was 32-bit: the probability of a random integer pointing to a valid allocation and keeping an object live was high in a 32-bit world. Oh, and the stack was also conservatively scanned, with the same problems. This meant that you could easily write a program that would use 100 MiB of RAM on 99 runs of the same test and 200 MiB on the 100th.
For most desktop or mobile apps, I simply don’t care about that anymore. The system has enough ability to handle the occasional memory spike (via swapping, paging out cached files, or killing a background app) that a user is unlikely to notice. Computers are fast enough that GC pause times are well below anything a human can notice.
On the other hand, for datacenter applications, a GC pause can really hurt. If your average response time is 1ms but a GC pause turns it into 10ms, that doesn’t matter much locally. If it’s one of 100 queries that you need to respond to a remote request, then the probability of one of those nodes hitting a GC pause is high enough that it impacts average latency and therefore throughput (if each node pauses on 1% of responses, roughly 1 − 0.99^100 ≈ 63% of fan-out requests will see at least one pause). Twitter did some awful things to try to mitigate this problem.
I think it’s still fair to say that, over most of the life of a long-running application, ARC leads to lower total memory usage than GC. I think about it this way: a GC-based application blithely allocates memory until it hits some threshold, then it cleans up (perhaps only partially), then it goes back to allocating memory like it’s unlimited for a while, and the cycle continues. An ARC-based application frugally frees memory for each object as soon as that object can no longer be reached. (Of course, reference cycles lead to leaks, but hopefully we can avoid those.) To me, the latter is much more considerate of our users. We developers and power-users, who typically buy capacious machines and manage them well, may have enough extra memory to handle the excesses of GC. But we really ought to assume that our users are running our software on machines that are already a bit overtaxed. Particularly in the desktop world, where the OS doesn’t just kill background apps (and, for some kinds of background apps, we wouldn’t want it to).
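As a toy illustration of that difference (a contrived Rust sketch of my own, not a claim about any particular runtime): with reference counting the memory comes back at a deterministic point, the moment the last owner goes away, rather than at some later collection.

use std::rc::Rc;

struct Buffer(Vec<u8>);

impl Drop for Buffer {
    fn drop(&mut self) {
        // With reference counting this runs deterministically, the moment the
        // last Rc handle is dropped, not at some later collection cycle.
        println!("freeing {} bytes", self.0.len());
    }
}

fn main() {
    let a = Rc::new(Buffer(vec![0u8; 1024 * 1024]));
    let b = Rc::clone(&a);            // two owners now
    assert_eq!(Rc::strong_count(&a), 2);

    drop(b);                          // count drops to 1, nothing freed yet
    assert_eq!(Rc::strong_count(&a), 1);

    drop(a);                          // count hits 0: Drop runs right here
    println!("memory already returned at this point");
}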
And yes, I’m unhappy that my current desktop application project, which is an application that runs continuously in the background on users’ machines, is Electron-based. I want to do something about that; the business case just isn’t there yet. Maybe at some point I’ll be able to do the masochistic thing and put my own personal time into it anyway.
It’s not that clear cut. Reference counting requires more per-object state than tracing, though you may be able to pack that state in another word in the common case (which adds some overhead to atomic paths), but you generally can’t for weak references. These end up requiring an extra allocation to track the state, so a cyclic object graph will often need 3-4 words more memory per object with RC than tracing.
In Apple’s case, there’s also the complexity of the autorelease pool. This is used to elide reference-count manipulation for an object that has a lot of short-lived assignments (e.g. things passed up the stack and manipulated on the way). Once an object is autoreleased, it won’t be freed until (at the earliest) the end of the lexical scope of the autorelease pool. Around 15 years ago, I encountered some Objective-C code that allocated a load of short-lived temporaries in a loop and ended up with 500 MiB of garbage in an autorelease pool (my laptop at the time had 1 GiB of RAM). This was easy to fix by adding an autorelease pool into a hot loop, but that’s basically the same as adding an explicit GC call in a hot loop.
Electron has a lot of overhead from things beyond the GC (not least the fact that it runs at least two completely separate v8 instances). On my Mac, the Signal app (Electron, but well written) does not use noticeably more RAM than other comparable apps.
I’m actually 100% in agreement with you — except that it’s fashionable these days to develop applications in a certain systems programming language with “zero cost abstractions” and to deride garbage collection as heavy and slow.
In some cases these confidently incorrect beliefs can be automatically caught. But the culture of false confidence cannot itself be caught: C is a motorcycle, and turning it into a tricycle removes the “cool” factor that attracts so many people to it.
For the record, if I could develop in C with the safety of a tricycle, but have the resulting code be acceptable to the motorcycle crowd (at least enough for them to use my library, whether or not they contribute to it), I’d accept that. Probably not for my current major library project though, since I’m way too far down the road of writing that one in Rust.
The only way things ever change is if someone makes the first move, instead of blaming each other or waiting on the world to change. I can only control my own behavior, so I have to make the first move. I’m a developer, so I guess that means I have to voluntarily saddle myself with underpowered hardware, so I can feel the users’ pain, perhaps magnified. I’ve already put off upgrading my main PC; I’m currently using a Skylake laptop from 2016. But that’s still a quad-core i7-6700HQ with 16 GB of RAM, so maybe I need to go lower.
I worked at Sun Microsystems in the early 1990s and I remember hearing that this was a policy on some teams that were building UI code. They were given low-end workstations so they would experience their UIs the same way end users would. Can’t say firsthand if it was actually true (I was working on low-level stuff) but it seemed like an interesting concept to me at the time.
That is a smart idea and one heck of a smart manager! Sun was innovative in so many ways, this may very well be true.
Facebook is reported to have had a similar concept, but with internet speed. On Tuesdays, they gave employees the possibility of experiencing their website as if they had a 2G connection. (source: https://www.businessinsider.com/facebook-2g-tuesdays-to-slow-employee-internet-speeds-down-2015-10)
“Why does Thunderbird look so old, and why does it take so long to change?”
If that actually does bother anyone, can’t they just use a different email client? Wouldn’t it be better to keep Thunderbird working with its existing UI, for those of us that have gotten used to it over the past 20 years?
I think Thunderbird has lost sight of its priorities.
No one actually cares how modern an email client looks. No one is wearing their email client on their head as a fashion statement. No one is reading their email in a closet because they can’t stand the idea that someone would peek over their shoulder and see them using UI from 2004.
Thunderbird needs some reorganization and usability updates but modernizing the design is not one of them.
Having worked at plenty of software companies I often find rewrites are often just large feature requests cobbled together under the guise of rewriting the base of the application as the only way to achieve the cluster of new features. Is the old software really that bad or has it grown in a disorganized way and/or do you just not understand it?
I dunno, I won’t use a mail app that looks weird and old. I’d consider using Thunderbird if it looked good and worked better than Mail.app.
real talk: Thunderbird looks better than it did a couple years back! I booted into it for the first time in 4 or so years and was like “oh this is pretty good”
Granted I’m in KDE and before I was using it in Mac. But I feel like it’s pretty good for “email power usage”.
There’s a legit complaint about how the panes are laid out by default, but I think that you can get pretty far by just moving around existing components.
UI and UI conventions for email have been pretty continuously evolving since we first got GUI mail clients. And that’s without considering that UI conventions in general evolve over time; an application that doesn’t match the UI conventions of the surrounding system is likely to see itself replaced by one that does.
Which is why keeping the same UI for decades on end is not something that usually works in mass-market software.
I can confidently say I’ve never stopped using a useful piece of software because they hadn’t updated their UI to keep up with whatever fashion trend is currently hip. On the other hand, I have (repeatedly) stopped using software after a UI overhaul screwed up my existing comfort with the software, opening the door for me to look around and see what other software, while equally alien, might have better functionality.
Messing with the look and feel of your application is an exercise in wagering your existing users in pursuit of new ones. In commercial projects, that can often make sense. In FLOSS, it rarely does, as your existing users, particularly those that will be the most annoyed by changes to their workflows, are also the most likely to contribute to the project, either through thorough bug reports or pull requests.
I think it is important to consider the possibility that you are not representative of the mass-market software user. Or, more bluntly: if you were representative of the mass-market software user, the market would behave very differently than what we observe.
I dunno, I don’t think Thunderbird users in general are representative of the mass-market software user. Most of those are just using Gmail or Outlook.com. Desktop MUA users are few and far between these days and use them specifically because they have specialized mail management requirements.
I don’t see where APhilisophicalCat claims to be “representative of the mass-market software user”. I rather would interpret their “In commercial projects, that can often make sense. In FLOSS, it rarely does” as disagreeing with you on whether Thunderbird is “mass-market software”. (I don’t use Thunderbird and I claim no stake in this.)
I think the people building Thunderbird think of it as, or at least aspire to it being, a mass-market application.
(standard disclaimer: I am a former employee of the Mozilla Corporation; I have not worked there since the mid-2010s; when I did work there the stuff I worked on wasn’t related to Thunderbird; I am expressing my own personal opinion based on reading the public communications of the Thunderbird team rather than any sort of lingering insider knowledge or perspective)
I don’t think things evolved, I think we just have a bunch of ways to stylize a list and where buttons to other lists might go. There are trends but at the end of the day you’re reading a list of things.
The point I was trying to make is that sometimes rewrites are just shifting complexity and that you can satisfy both crowds by working on an old tech stack. Not that there isn’t a market for whatever-UI-trend email apps.
I don’t think things evolved, I think we just have a bunch of ways to stylize a list and where buttons to other lists might go.
I remember when Gmail first came out, and introduced a broad audience to the switch from email clients that present things in terms of individual messages, to clients that present things in terms of conversations.
From an underlying technical perspective this may feel like nothing – just two ways of styling lists, why does it matter? – but from a user interface and experience perspective it was a gigantic shift. It’s rare these days to see an email client or interface that still clings to the pre-Gmail message-based UI conventions.
The same is true for other UI conventions like labeling/tagging, “snooze” functions, etc. Sure, in some sense your inbox is still a list, or at most a list of trees, but there are many different ways to present such a thing, and which way you choose to present it does in fact matter to end users.
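To make that data-level difference concrete, here is a minimal sketch in Python of grouping messages into conversations by walking the standard In-Reply-To header back to a root message, roughly in the spirit of RFC 5256 / classic “JWZ” threading. This is not how Gmail or Thunderbird actually implement threading, and the Message type and the sample messages are made up for illustration.

    # A minimal sketch of what "presenting conversations instead of messages"
    # means at the data level: group messages by walking the standard
    # In-Reply-To header back to a root message, roughly in the spirit of
    # RFC 5256 / classic "JWZ" threading. NOT how Gmail or Thunderbird do it;
    # the Message type and sample data below are hypothetical.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Message:
        msg_id: str                        # value of the Message-ID header
        subject: str
        in_reply_to: Optional[str] = None  # Message-ID of the direct parent, if any

    def group_into_conversations(messages):
        by_id = {m.msg_id: m for m in messages}

        def root_of(m):
            # Follow parent links until we reach a message with no known
            # parent, guarding against cycles in malformed header chains.
            seen = set()
            while m.in_reply_to in by_id and m.msg_id not in seen:
                seen.add(m.msg_id)
                m = by_id[m.in_reply_to]
            return m.msg_id

        conversations = {}
        for m in messages:
            conversations.setdefault(root_of(m), []).append(m)
        return conversations

    # Hypothetical example: three messages, two of which form one thread.
    msgs = [
        Message("<a@example>", "Quarterly report"),
        Message("<b@example>", "Re: Quarterly report", in_reply_to="<a@example>"),
        Message("<c@example>", "Lunch on Friday?"),
    ]
    for root, thread in group_into_conversations(msgs).items():
        print(root, "->", [m.subject for m in thread])
    # <a@example> -> ['Quarterly report', 'Re: Quarterly report']
    # <c@example> -> ['Lunch on Friday?']

Whether the client then shows one row per message or one row per conversation is entirely a presentation choice layered on top of the same underlying data, which is exactly why the choice reads as “just styling a list” to developers and as a gigantic shift to users.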
Exactly, and there isn’t just one crowd; you should aim to appease both, even if Gmail started a new trend.
I think a lot of people don’t like using “ugly” software. Definitely matters more to nontechies than it does to techies, I think.
Even techies will look elsewhere if the app gives off the vibe of something from the Windows 2000 era, and thus probably has other problems too (say, scaling).
But current Thunderbird on the desktop looks fine?!
There’s a certain group (small?) that likes these old interfaces though. Enough that things like SerenityOS are quite popular. Or maybe it’s just me.
I can appreciate a very technical but functional UI where you can find everything you need, even if it doesn’t look that fresh. And then there is also the “jankiness” factor: Handbrake, for example, looks very janky but exposes all the ffmpeg configuration flags as a UI. In my experience there is a big divide between applications that merely look old (even janky at first) but provide a lot of functionality, and apps that were thrown together in a short time and never updated to keep up with modern requirements. One example of the latter: on F-Droid, a very old-looking app can be a good indicator that it has never been updated for modern Android permission requirements. Or a desktop application that doesn’t support scaling and is either tiny on high-DPI displays or blurry, god forbid you move it between displays with different DPIs.
Yup, that’s why Sylpheed/Claws were considered examples of a great UI for desktop email: https://www.claws-mail.org/screenshots.php
Techies are just as, if not more, aesthetically conscious when it comes to software than non-techies. They just have different aesthetic preferences.
I agree with the above. I’d much prefer if this announcement was about fundamental things related to the internals of thunderbird, not the chrome.
I say this as a long-time thunderbird user, who loves the app and hopes to continue using it (my thunderbird profile is 11 years old). Don’t fix what’s not broken, but do fix what is.
If that actually does bother anyone, can’t they just use a different email client?
Name one other usable, open-source, mail client that runs on Windows.
Looking old covers a lot of things. There are a bunch of things in Thunderbird that are annoying and probably hard to fix without some significant redesign. Off the top of my head:
It’s also not clear to me how well the HTML email rendering is sandboxed. Apple’s Mail.app opens emails in separate sandboxed renderer processes; I’m not sure to what extent Thunderbird can take advantage of the equivalent functionality in Firefox, because that machinery is designed to isolate tabs and all of Thunderbird’s data lives in one tab.
Sylpheed? But now we’re talking about super-esoteric software. I can’t imagine the user who uses Sylpheed and thinks “Thunderbird isn’t usable enough”.
Familiarity is a usability feature. And I’m sure most UX people are aware of this on some level, but it’s rare to see it articulated.
A screen reader that uses machine learning, computer vision, etc. to literally read the screen, i.e. interpreting the pixels on the screen to reconstruct the text and the UI structure. I’ve concluded that this needs to exist on all desktop and mobile platforms, as soon as it’s technologically feasible. Getting semantic information from applications through accessibility APIs, semantic markup, ARIA, etc. is a useful stopgap, and will continue to be for a while longer, which is why I’m not giving up on my AccessKit project. But getting the whole world to accommodate disabled users by implementing this kind of accessibility is a never-ending battle, and fighting against most people’s natural preference to pretend that we don’t exist gets discouraging. I want to solve the problem once and for all.
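To make the “interpret the pixels” idea concrete, here is a minimal sketch in Python: just a screenshot plus OCR, which recovers text and rough on-screen positions but none of the UI semantics (roles, focus, states) a real screen reader needs. It is not AccessKit and not Apple’s Screen Recognition, and it assumes the third-party packages mss and pytesseract plus a local Tesseract install.

    # A minimal sketch of the "read the pixels" idea, not AccessKit and not
    # Apple's Screen Recognition: capture the screen and run OCR to recover
    # text plus rough layout. A real pixel-based screen reader would also
    # need to infer UI structure (buttons, lists, focus), which plain OCR
    # does not provide. Assumes the third-party packages mss and pytesseract
    # (plus a Tesseract install) are available.
    import mss
    import pytesseract
    from PIL import Image

    with mss.mss() as screen:
        shot = screen.grab(screen.monitors[1])            # primary monitor
        img = Image.frombytes("RGB", shot.size, shot.rgb)

    # image_to_data returns one row per recognized word, with bounding boxes,
    # so we can group words back into lines while keeping their positions.
    data = pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT)
    lines = {}
    for i, word in enumerate(data["text"]):
        if word.strip():
            key = (data["block_num"][i], data["line_num"][i])
            lines.setdefault(key, []).append((data["left"][i], word))

    for key in sorted(lines):
        words = [w for _, w in sorted(lines[key])]
        print(" ".join(words))  # a real screen reader would send this to TTS

Even this toy version makes the gap obvious: the hard part is not getting the text out of the pixels, it is reconstructing the semantics and interactivity that accessibility APIs hand you for free today.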
Apple has already implemented a feature called Screen Recognition in their VoiceOver screen reader for iOS. Here’s their paper on Screen Recognition. They had to train it on ~77,000 screens from ~4,000 apps, using a total of 50 workers to gather and annotate the data. And as they admit, the results are far from perfect. I wonder how much better could be done with $1 billion, or how soon we’d reach diminishing returns.
Edit to add: No, that’s not ambitious enough for the posed question. Let’s go for a pair of smart glasses with stereoscopic cameras, built-in bone conduction headphones, and an on-device AI, that can not only read a computer or phone screen, but describe whatever it’s seeing in real time (something akin to the audio description tracks available for many movies and TV shows), translate text to Braille and send it wirelessly to a Braille display (and maybe we should build a larger, higher-resolution tactile display while we’re at it), and use spatial audio to help with independent navigation.