The interesting thing to me about the argument made in the article is that one could achieve pretty much the same result by cutting and pasting Javascript-style exception handling in the same way that one cuts and pastes if err != nil { return err } in Go.
And I’m not defending Javascript-style exception handling here.
Nor am I suggesting that Go error handling is as bad as Javascript-style exception handling.
I’m just pointing out that it’s a weak argument that boils down to “I like the look of cut&paste if better than I like the look of cut&paste try..catch”.
I personally like languages that support handling most errors as values, and yet still have support for exceptions for dealing with things that most callers are not expecting to deal with (i.e. where an actual longjmp makes sense). To that end, Go got it half right by supporting multiple return values, which is a critical ingredient for supporting errors as values.
Go also has panic() for things that most callers are not expecting to deal with.
Yes, I’ve seen panic/recover. The recover side seems a bit awkward, but exception handling has always seemed at least a little awkward. I’m pretty sure that the use of panic/recover is generally frowned on in Go, and I’ve read Rob Pike’s comments on the topic, and I agree with his fundamental position (you shouldn’t need more than a handful of “recovers” in an entire system – at least for most systems).
It’s frowned upon for error handling, but not for cases like you described—events that are truly exceptional and indicate a program error. The reflect package in the standard library panics all over the place.
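To make that split concrete, here is a minimal sketch of the pattern Pike describes, with a single recover at a system boundary and errors as values everywhere else (doWork and handle are hypothetical names):

package main

import (
    "fmt"
    "log"
)

// doWork is a hypothetical operation: ordinary failures come back as error
// values, while a violated precondition (a programmer error) panics.
func doWork(n int) (int, error) {
    if n < 0 {
        return 0, fmt.Errorf("negative input %d", n)
    }
    if n > 1000 {
        panic("caller violated documented precondition")
    }
    return n * 2, nil
}

// handle is the one boundary where panics are converted into log entries,
// so a single bug doesn't take the whole system down.
func handle(n int) {
    defer func() {
        if r := recover(); r != nil {
            log.Printf("recovered from panic: %v", r)
        }
    }()
    v, err := doWork(n)
    if err != nil {
        log.Printf("work failed: %v", err) // expected failure, handled as a value
        return
    }
    fmt.Println("result:", v)
}

func main() {
    handle(21)
    handle(-1)
    handle(5000)
}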
Nice! I don’t know what it is but there is something really satisfying about hosting your website at home.
You can have some fun as well, like getting an LED to blink on every hit to the site.
I do want to do something hardware-related because right now I’m under-utilising the Pi’s hardware abilities, but I feel like I’d have trouble distinguishing real traffic from bot traffic.
I have an interactive pixel grid on my website that syncs to an ePaper display in my home: https://www.svenknebel.de/posts/2023/12/2/ (picture; the grid itself is at the top of the homepage feed)
Very intentionally very low-res so I don’t have to worry about people writing/drawing bad stuff, and it’s an entirely separate small program, so if someone ever manages to crash it, only that part is gone.
I moved my blog off of EC2 to my Raspberry Pi Kubernetes cluster at home just today. The whole idea behind running it on EC2 was that I figured I would have fewer reliability issues than on my homelab Kubernetes cluster, but the Kubernetes cluster has been remarkably stable (especially for stateless apps) and my EC2 setup was remarkably flaky[^1]. It’s definitely rewarding to run my own services, and it saves me a bunch of time/money to boot.
[^1]: not because of EC2, but because I would misconfigure Linux things, or not properly put my certificates in an EBS volume, or not set the spot instance termination policy properly, or any of a dozen other things–my k8s cluster runs behind cloudflare which takes care of the https stuff for me
I don’t think I would write “Go’s error handling is awesome”, but it’s probably the least bad thing I’ve used. The main alternatives to exceptions that I’ve used have been C’s return values which are typically pretty terrible and Rust’s which are better in principle (less boilerplate at call sites, no unhandled-error bugs, and similar non-problems), but somewhat worse than Go’s in practice:
My biggest grievance with Rust error handling is the choice between (1) the sheer burden of managing your own error trait implementations (2) the unidiomatic-ness of using anyhow in libraries and (3) the absurd amount of time I spend debugging macro expansion errors from crates like thiserror. Maybe there’s some (4) option where you return some dynamic boxed error and pray callers never need to introspect it?
And I don’t think either Rust or Go have a particularly compelling solution for attaching stack traces automatically (which is occasionally helpful) or for displaying additional error context beyond the stack trace (i.e., all of the parameters that were passed down in a structured format)? Maybe Rust has something here–I’m somewhat less familiar.
I’m also vaguely aware that Zig and Swift do errors-as-values, but I haven’t used them either. Maybe the people who have found something better than Go could enlighten us?
We use a custom error type in Go that lets you add fields, like loggers do, and just stores the values in a map[string]interface{}. This doesn’t solve the problem of capturing all parameter values, or stack traces, but it’s pretty good in practice. This is particularly true since errors tend to eventually end up in log messages so you can do something like log.WithFields(err.Fields).Error("something terrible happened"). If we could nest those fields based on the call stack I would probably never complain again.
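Roughly, a minimal sketch of what such a type could look like (the names here are made up for illustration, not our actual implementation):

package main

import (
    "errors"
    "fmt"
)

// fieldError is a hypothetical error type that carries structured context
// alongside the message, in the same spirit as a logger's fields.
type fieldError struct {
    msg    string
    Fields map[string]interface{}
    err    error
}

func (e *fieldError) Error() string { return fmt.Sprintf("%s: %v", e.msg, e.err) }
func (e *fieldError) Unwrap() error { return e.err }

// wrapWithFields annotates err with a message and some key/value context.
func wrapWithFields(err error, msg string, fields map[string]interface{}) error {
    return &fieldError{msg: msg, Fields: fields, err: err}
}

func main() {
    base := errors.New("connection refused")
    err := wrapWithFields(base, "fetching user", map[string]interface{}{"userID": 42})

    var fe *fieldError
    if errors.As(err, &fe) {
        // At the logging site the fields are available, e.g. to hand to log.WithFields.
        fmt.Println(fe.Error(), fe.Fields)
    }
}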
The best solution I’ve used is polymorphic variants from OCaml. They force you to at least acknowledge the error exists and you can easily compose different errors together.
Exceptions are better on all counts though, so even that is a faulty conclusion in my opinion.
Exceptions do the correct thing by default, auto-unwrap in the success case, auto-bubble up on the error case with as fine-grained scoping as necessary (try-catch blocks). Go’s error handling is just a tad bit higher than the terrible mess that is C’s errno.
Edit: Re cryptic stack traces: most exception-based languages can trivially chain exceptions as well, so you get both a clear chain of cause and a stack trace. I swear if Java would simply default to printing out the message of each exception in the chain at the top in an easier-to-read manner and only print the stack trace after, people would like it better.
I was hoping this article would compare if err != nil to more modern approaches (Rust’s ?) and not just Java-style exceptions, but unfortunately it doesn’t.
I’d be more interested to read an article that weighs the approaches against each other.
One point the article misses is how value-based error handling works really nicely when you don’t have constructors (either in your language, or at least your codebase, in case you’re using C++.)
I’ve been pretty disappointed by Rust’s approach to error handling. It improves upon two “problems” in Go which IMHO are not actually problems in practice: if err != nil boilerplate and unhandled return values, but making good error types is fairly hard–either you manually maintain a bunch of implementations of the Error trait (which is a truly crushing amount of boilerplate) or you use something like anyhow to punt on errors (which is generally considered to be poor practice for library code) or you use some crate that generates the boilerplate for you via macros. The latter seems idyllic, but in practice I spend about as much time debugging macro errors as I would spend just maintaining the implementations manually.
In Go, the error implementation is just a single method called Error() that returns a string. Annotating that error, whether in a library or a main package, is just fmt.Errorf("calling the endpoint: %w", err). I don’t think either of them do a particularly good job of automating stack trace stuff, and I’m not sure about Rust, but at least Go does not have a particularly good solution for getting more context out of the error beyond the error message–specifically parameter values (if I’m passing a bunch of identifiers, filepaths, etc down the call stack that would be relevant for debugging, you have to pack them into the error message and they often show up several times in the error message or not at all because few people have a good system for attaching that metadata exactly once).
A lot of people have a smug sense of superiority about their language’s approach to error handling, which (beyond the silliness of basing one’s self-esteem on some programming language feature) always strikes me as silly because even the best programming languages are not particularly good at it, or at least not as good as I imagine it ought to be.
a bunch of implementations of the Error trait (which is a truly crushing amount of boilerplate)
you usually just need to impl Display, which I wouldn’t call a “crushing” amount of boilerplate
or you use some crate that generates the boilerplate for you via macros.
thiserror is pretty good, although tbqh just having an enum Error and implementing display for it is good enough. I’ve done some heavy lifting with error handling before but that’s usually to deal with larger issues, like making sure errors are Clone + Serialize + Deserialize and can keep stacktraces across FFI boundaries.
It’s pretty rarely “just” impl Display though, right? If you want automatic conversions from some upstream types you need to implement From, for example. You could not do it, but then you’re shifting the boilerplate to every call site. Depending on other factors, you likely also need Debug and Error. There are likely others as well that I’m not thinking about.
#[derive(Debug)] and impl Display makes the impl of Error trivial (impl Error for E {}). If you’re wrapping errors then you probably want to implement source(). thiserror is a nice crate for doing everything with macros, and it’s not too heavy so the debugging potential is pretty low.
One advantage of map_err(...) everywhere instead of implementing From is that it gives you access to file!() and line!() macros so you can get stack traces out of your normal error handling.
thiserror is a nice crate for doing everything with macros, and it’s not too heavy so the debugging potential is pretty low.
I’ve used thiserror and a few other crates, and I still spend a lot more time than I’d like debugging macro expansions. To the point where I waffle between using it and maintaining the trait implementations by hand. I’m not sure which of the two is less work on balance, but I know that I spend wayyy more time trying to make good error types in Rust than I do with Go (and I’d like to reiterate that I think there’s plenty of room for improvement on Go’s side).
One advantage of map_err(…) everywhere instead of implementing From is that it gives you access to file!() and line!() macros so you can get stack traces out of your normal error handling.
Maybe I should try this more. I guess I wish there was clear, agreed-upon guidance for how to do error handling in Rust. It seems like lots of people have subtly different ideas about how to do it–you mentioned just implementing Display while others encourage thiserror and someone else in this thread suggested Box<dyn Error> while others suggest anyhow.
The rule of thumb I’ve seen is anyhow for applications and thiserror for libraries, or your own custom error type if thiserror doesn’t fit your needs (for example, needing clone-able or serializable errors, stack traces, etc.). Most libraries I’ve seen either use thiserror if they’re wrapping a bunch of other errors, or just have their own error type, which is usually not too complex.
That’s too bad, I genuinely enjoy learning about new (to me) ways of solving these problems, I just dislike the derisive fervor with which these conversations take place.
You discount anyhow as punting on errors, but Go’s Error() with a string is the same strategy.
If you want that, you don’t even need anyhow. Rust’s stdlib has Box<dyn Error>. It supports From<String>, so you can use .map_err(|err| format!("calling the endpoint: {err}")). There’s downcast() and .source() for chaining errors and getting errors with data, if there’s more than a string (but anyhow does that better with .context()).
One source of differences in different languages’ error handling complexity is whether you think errors are just generic failures with some human-readable context for logging/debugging (Go makes this easy), or you think errors have meaning that should be distinguishable in code and handled by code (Rust assumes this). The latter is inherently more complicated, because it’s doing more. You can do it either way in either language, of course, it’s just a question of what seems more idiomatic.
I don’t think I agree. It’s perfectly idiomatic in Go to define your own error types and then to handle them distinctly in code up the stack. The main difference is that Rust typically uses enums (closed set) rather than Go’s canonical error interface (open set). I kind of think an open set is more appropriate because it gives upstream functions more flexibility to add error cases in the future without breaking the API, and of course Rust users can elect into open set semantics–they just have to do it a little more thoughtfully. The default in Go seems a little more safe in this regard, and Go users can opt into closed set semantics when appropriate (although I’m genuinely not sure off the top of my head when you need closed set semantics for errors?). I’m sure there are other considerations I’m not thinking of as well–it’s interesting stuff to think about!
Maybe “idiomatic” isn’t quite the right word and I just mean “more common”. As I say, you can do both ways in both languages. But I see a lot of Go code that propagates errors by just adding a string to the trace, rather than translating them into a locally meaningful error type. (E.g.,
return fmt.Errorf("Couldn't do that: %w", err)
so the caller can’t distinguish the errors without reading the strings, as opposed to
return &ErrCouldntDoThat{err} // or equivalent
AFAIK the %w feature was specifically designed to let you add strings to a human-readable trace without having to distinguish errors.
Whereas I see a lot of Rust code defining a local error type and an impl From to wrap errors in local types. (Whether that’s done manually or via a macro.)
Maybe it’s just what code I’m looking at. And of course, one could claim people would prefer the first way in Rust, if it had a stdlib way to make a tree of untyped error strings.
But I see a lot of Go code that propagates errors by just adding a string to the trace, rather than translating them into a locally meaningful error type
Right, we usually add a string when we’re just passing it up the call stack, so we can attach contextual information to the error message as necessary (I don’t know why you would benefit from a distinct error type in this case?). We create a dedicated error type when there’s something interesting that a caller might want to switch on (e.g., resource not found versus resource exists).
AFAIK the %w feature was specifically designed to let you add strings to a human-readable trace without having to distinguish errors.
It returns a type that wraps some other error, but you can still check the underlying error type with errors.Is() and errors.As(). So I might have an API that returns *FooNotFoundErr and its caller might wrap it in fmt.Errorf("fetching foo: %w", err), and the toplevel caller might do if errors.As(err, &fooNotFoundErr) { return http.StatusNotFound }.
Whereas I see a lot of Rust code defining a local error type and an impl From to wrap errors in local types. (Whether that’s done manually or via a macro.)
I think this is just the open-vs-closed set thing? I’m curious where we disagree: In Go, fallible functions return an error which is an open set of error types, sort of like Box<dyn Error>, and so we don’t need a distinct type for each function that represents the unique set of errors it could return. And since we’re not creating a distinct error type for each fallible function, we may still want to annotate it as we pass it up the call stack, so we have fmt.Errorf() much like Rust has anyhow! (but we can use fmt.Errorf() inside libraries as well as applications precisely because concrete error types aren’t part of the API). If you have to make an error type for each function’s return, then you don’t need fmt.Errorf() because you just add the annotation on your custom type, but when you don’t need to create custom types, you realize that you still want to annotate your errors.
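To spell out the Go side of that, a minimal sketch (FooNotFoundErr, fetchFoo, and the status mapping are all hypothetical):

package main

import (
    "errors"
    "fmt"
    "net/http"
)

// FooNotFoundErr gets a dedicated type only because callers may want to switch on it.
type FooNotFoundErr struct{ ID string }

func (e *FooNotFoundErr) Error() string { return fmt.Sprintf("foo %q not found", e.ID) }

// fetchFoo returns the plain error interface, so new failure modes can be
// added later without changing the signature.
func fetchFoo(id string) error {
    return &FooNotFoundErr{ID: id}
}

// loadPage annotates the error on the way up without defining a new type.
func loadPage(id string) error {
    if err := fetchFoo(id); err != nil {
        return fmt.Errorf("fetching foo: %w", err)
    }
    return nil
}

func statusFor(err error) int {
    var nf *FooNotFoundErr
    if errors.As(err, &nf) { // still matches through the fmt.Errorf wrapper
        return http.StatusNotFound
    }
    return http.StatusInternalServerError
}

func main() {
    fmt.Println(statusFor(loadPage("abc"))) // 404
}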
I tend to agree that Rust’s error handling is both better and worse. In day-to-day use I can typically get away with anyhow or dyn Error, but it’s honestly a mess, and one that I really dread when it starts barking at me.
On the other hand… I think being able to chain ‘?’ blocks is a godsend for legibility, and I think Result is far superior to err.
I certainly bias towards Rust’s overall, but it’s got real issues.
There is one thing to be said against ?: it does not encourage the addition of contextual information, which can make diagnosing an error more difficult when e.g. it gets expect-ed (or logged out) half a dozen frames above with no indication of the path it took.
However, that is hardly unsolvable. You could have e.g. ?("text") which wraps with text and returns, and ?(unwrapped) which returns directly (the keyword being there to encourage wrapping; one could even imagine extending this to more keywords, e.g. ?(panic) would be your unwrap).
Oh yeah I’m not saying it’s not possible to decorate things (it very much is), just pointing out that the incentives are not necessarily in that direction.
If I was a big applications writer / user of type-erased errors, I’d probably add a wrapping method or two to Result (if I was to use “raw” boxed error, as IIRC anyhow has something like that already).
I’ve often wondered if people would like Java exceptions more if it only supported checked exceptions. You still have the issue of exceptions being a parallel execution flow / go-to, but you lose the issue of random exceptions crashing programs. In my opinion it would make the language easier to write, because the compiler would force you to think about all the ways your program could fail at each level of abstraction. Programs would be more verbose, but maybe it would force us to think more about exception classes.
Tl;Dr Java would be fine if we removed RuntimeException?
No, Go has unchecked exceptions. They’re called “panics”.
What makes Go better than Java is that you return the error interface instead of a concrete error type, which means you can add a new error to an existing method without breaking all your callers and forcing them to update their own throws declarations.
Rust, for example, has a good compromise of using option types and pattern matching to find error conditions, leveraging some nice syntactic sugar to achieve similar results.
I’m also personally quite fond of error handling in Swift.
Rust, Zig, and Swift all have interesting value-oriented results. Swift more so since it added, well, Result and the ability to convert errors to that.
No matter how many times Go people try to gaslight me, I will not accept this approach to error-handling as anything approaching good. Here’s why:
Go’s philosophy regarding error handling forces developers to incorporate errors as first class citizens of most functions they write.
[…]
Most linters or IDEs will catch that you’re ignoring an error, and it will certainly be visible to your teammates during code review.
Why must you rely on a linter or IDE to catch this mistake? Because the compiler doesn’t care if you do this.
If you care about correctness, you should want a compiler that considers handling errors part of its purview. This approach is no better than a dynamic language.
The fact that the compiler doesn’t catch it when you ignore an error return has definitely bitten me before. doTheThing() on its own looks like a perfectly innocent line of code, and the compiler won’t even warn on it, but it might be swallowing an error.
I learned that the compiler doesn’t treat unused function results as errors while debugging a bug in production; an operation which failed was treated as if it succeeded and therefore wasn’t re-tried as it should. I had been programming in Go for many years at that point, but it had never occurred to me that silently swallowing an error in Go could possibly be so easy as just calling a function in the normal way. I had always done _ = doTheThing() if I needed to ignore an error, out of the assumption that of course unused error returns is a compile error.
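For anyone who hasn't been bitten yet, a tiny repro (doTheThing is made up): the naked call compiles without complaint from go build or go vet, and the error just disappears.

package main

import (
    "errors"
    "fmt"
)

// doTheThing is a hypothetical operation that always fails.
func doTheThing() error { return errors.New("the thing broke") }

func main() {
    doTheThing() // compiles fine; the returned error is silently dropped

    _ = doTheThing() // at least the discard is visible to a reviewer

    if err := doTheThing(); err != nil { // what the caller almost always wants
        fmt.Println("handling:", err)
    }
}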
Because errors aren’t special to the Go compiler, and Go doesn’t yell at you if you ignore any return value. It’s probably not the most ideal design decision, but in practice it’s not really a problem. Most functions return something that you have to handle, so when you see a naked function call it stands out like a sore thumb. I obviously don’t have empirical evidence, but in my decade and a half of using Go collaboratively, this has never been a real pain point whether with junior developers or otherwise. It seems like it mostly chafes people who already had strong negative feelings toward Go.
Is there a serious argument behind the sarcasm as to how this is comparable to array bounds checks? Do you have any data about the vulnerabilities that have arisen in Go due to unhandled errors?
Because the programmer made an intentional decision to ignore the error. It won’t let you call a function that returns an error without assigning it to something, that would be a compile time error. If the programmer decides to ignore it, that’s on the programmer (and so beware 3rd party code).
Now perhaps it might be a good idea for the compiler to insert code when assigned to _ that panics if the result is non-nil. Doesn’t really help at runtime, but at least it would fail loudly so they could be found.
I’ve spent my own share of time tracking down bugs because something appeared to be working but the error/exception was swallowed somewhere without a trace.
huh… til. I always assumed you needed to use the result, probably because with multiple return values you have to assign all of them, and not doing so is a compile-time error. Thanks.
Because the programmer made an intentional decision to ignore the error.
f.Write(s)
is not an intentional decision to ignore the error. Neither is
_, err := f.Write(s)
Yet the go compiler will never flag the first one, and may not flag the second one depending on err being used elsewhere in the scope (e.g. in the unheard of case where you have two different possibly error-ing calls in the same scope and you check the other one).
Yet the go compiler will never flag the first one, and may not flag the second one depending on err being used elsewhere in the scope (e.g. in the unheard of case where you have two different possibly error-ing calls in the same scope and you check the other one).
_, err := f.Write(s) is a compiler error if err already exists (no new variables on left side of :=), and if err doesn’t already exist and you aren’t handling it, you get a different error (declared and not used: err). I think you would have to assign a new variable t, err := f.Write(s) and then take care to handle t in order to silently ignore the err, but yeah, with some work you can get Go to silently swallow it in the variable declaration case.
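A small illustration of those cases (the file and the bytes are just stand-ins):

package main

import "os"

func main() {
    f, err := os.CreateTemp("", "demo") // err declared here
    if err != nil {
        panic(err)
    }
    defer os.Remove(f.Name())
    s := []byte("hello")

    // _, err := f.Write(s)  // compile error: no new variables on left side of :=
    // _, err2 := f.Write(s) // compile error if err2 is never read: declared and not used

    t, err := f.Write(s) // legal: t is new, err is reused...
    _ = t                // ...and once t is "handled", the unchecked err slips through

    f.Write(s) // also legal: naked call, error dropped entirely
}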
Because they couldn’t be arsed to add this in v0, and they can’t be arsed to work on it for cmd/vet, and there are third-party linters which do it, so it’s all good. Hopefully you don’t suffer from unknown unknowns and you know you should use one of these linters before you get bit, and they don’t get abandoned.
(TBF you need both that and errcheck, because the unused store one can’t catch ignoring return values entirely).
I don’t really care. Generally speaking, I would expect compilers to either warn or error on an implicitly swallowed error. The Go team could fix this issue by either adding warnings for this case specifically (going back on their decision to avoid warnings), or by making it a compile error, I don’t care which.
This is slightly more nuanced. The Go project ships both go build and go vet. go vet is roughly isomorphic to how Rust handles warnings (warnings apply to you, not your dependencies).
So there would be nothing wrong per se if this was caught by go vet and not go build.
The issue, though, is that this isn’t caught by first-party go vet, and requires third-party errcheck.
Meh plenty of code bases don’t regularly run go vet. This is a critical enough issue that it should be made apparent as part of any normal build, either as a warning or an error.
If you care about correctness, you should want a compiler that considers handling errors part of its purview. This approach is no better than a dynamic language.
I agree with you that it’s better for this to be a compiler error, but (1) I’ll never understand why this is such a big deal–I’m sure it’s caused bugs, but I don’t think I’ve ever seen one in the dozen or so years of using Go and (2) I don’t think many dynamic languages have tooling that could catch unhandled errors so I don’t really understand the “no better than a dynamic language” claim. I also suspect that the people who say good things about Go’s error handling are making a comparison to exceptions in other languages rather than to Rust’s approach to errors-as-values (which has its own flaws–no one has devised a satisfactory error handling system as far as I’m aware).
The fact that these bugs seem so rare and that the mitigation seems so trivial makes me feel like this is (yet another) big nothingburger.
The most common response to my critique of Go’s error-handling is always some variation on “this never happens”, which I also do not accept because I have seen this happen. In production. So good for you, if you have not; but I know from practice this is an issue of concern.
I don’t think many dynamic languages have tooling that could catch unhandled errors so I don’t really understand the “no better than a dynamic language” claim.
Relying on the programmer to comprehensively test inputs imperatively in a million little checks at runtime is how dynamic languages handle errors. This is how Go approached error-handling, with the added indignity of unnecessary verbosity. At least in Ruby you can write single-line guard clauses.
I don’t really follow your dismissal of Rust since you didn’t actually make an argument, but personally I consider Rust’s Option type the gold standard of error-handling so far. The type system forces you to deal with the possibility of failure in order to access the inner value. This is objectively better at preventing “trivial” errors than what Go provides.
The most common response to my critique of Go’s error-handling is always some variation on “this never happens”, which I also do not accept because I have seen this happen. In production. So good for you, if you have not; but I know from practice this is an issue of concern.
I’m sure it has happened before, even in production. I think most places run linters in CI which default to checking errors, and I suspect if someone wasn’t doing this and experienced a bug in production, they would just turn on the linter and move on with life. Something so exceedingly rare and so easily mitigated does not meet my threshold for “issue of concern”.
Relying on the programmer to comprehensively test inputs imperatively in a million little checks at runtime is how dynamic languages handle errors
That’s how all languages handle runtime errors. You can’t handle them at compile time. But your original criticism was that Go is no better than a dynamic language with respect to detecting unhandled errors, which seems untrue to me because I’m not aware of any dynamic languages with these kinds of linters. Even if such a linter exists for some dynamic language, I’m skeptical that they’re so widely used that it merits elevating the entire category of dynamic languages.
I don’t really follow your dismissal of Rust since you didn’t actually make an argument, but personally I consider Rust’s Option type the gold standard of error-handling so far. The type system forces you to deal with the possibility of failure in order to access the inner value. This is objectively better at preventing “trivial” errors than what Go provides.
I didn’t dismiss Rust, I was suggesting that you may have mistaken the article as some sort of criticism of Rust’s error handling. But I will happily register complaints with Rust’s error handling as well–while it does force you to check errors and is strictly better than Go in that regard, this is mostly a theoretical victory insofar as these sorts of bugs are exceedingly rare in Go even without strict enforcement, and Rust makes you choose between the verbosity of managing your own error types, debugging macro expansion errors from crates like thiserror, or punting altogether and doing the bare minimum to provide recoverable error information. I have plenty of criticism for Go’s approach to error handling, but pushing everything into an error interface and switching on the dynamic type gets the job done.
For my money, Rust has the better theoretical approach and Go has the better practical approach, and I think both of them could be significantly improved. They’re both the best I’m aware of, and yet it’s so easy for me to imagine something better (automatic stack trace annotations, capturing and formatting relevant context variables, etc). Neither of them seems so much better in relative or absolute terms that their proponents should express superiority or derision.
Fair enough. It’s a pity things like this are so difficult to answer empirically, and we must rely on our experiences. I am very curious how many orgs are bitten by this and how frequently.
Enabling a linter is different from doing “a million little checks at runtime”. This behaviour is not standard because you can use Go for many reasons other than writing production-grade services, and you don’t want to clutter your terminal with unchecked error warnings.
I admit that it would be better if this behaviour were part of go vet rather than an external linter.
The strange behaviour here is not “Go people are trying to gaslight me”, but people like you coming and complaining about Go’s error handling when you have no interest in the language at all.
Enabling a linter is different from doing “a million little checks at runtime”.
You can’t lint your way out of this problem. The Go type system is simply not good enough to encapsulate your program’s invariants, so even if your inputs pass a type check you still must write lots of imperative checks to ensure correctness.
Needing to do this ad-hoc is strictly less safe than relying on the type system to check this for you. err checks are simply one example of this much larger weakness in the language.
The strange behaviour here is not “Go people are trying to gaslight me”, but people like you coming and complaining about Go’s error handling when you have no interest in the language at all.
I have to work with it professionally, so I absolutely do have an interest in this. And I wouldn’t feel the need to develop this critique of it publicly if there weren’t a constant drip feed of stories telling me how awesome this obviously poor feature is.
Your views about how bad Go’s type system is are obviously not supported by the facts, otherwise Go programs would be full of bugs (or full of minuscule imperative checks) with respect to your_favourite_language.
I understand your point about being forced to use a tool in your $job that you don’t like, that happened to me with Java, my best advice to you is to just change $job instead of complaining under unrelated discussions.
Your views about how bad Go’s type system is are obviously not supported by the facts, otherwise Go programs would be full of bugs (or full of minuscule imperative checks)
They are full of bugs, and they are full of minuscule imperative checks. The verbosity of all the if err != nil checks is one of the first things people notice. Invoking “the facts” without bringing any isn’t meaningfully different from subjective opinion.
Your comments amount to “shut up and go away” and I refuse. To publish a blog post celebrating a language feature, and to surface it on a site of professionals, is to invite comment and critique. I am doing this, and I am being constructive by articulating specific downsides to this language decision and its impacts. This is relevant information that people use to evaluate languages and should be part of the conversation.
If if err != nil checks are the “minuscule imperative checks” you complain about, I have no problem with that.
That you have “facts” about Go programs having worse technical quality (and bug count) than any other language I seriously doubt, at most you have anecdotes.
And the only anecdote you’ve been able to come up with so far is that you’ve found “production bugs” caused by unchecked errors that can be fixed by a linter. Being constructive would mean indicating how the language should change to address your perceived problem, not implying that the entire language should be thrown out the window. If that’s how you feel, just avoid commenting on random Go posts.
Yeah, I have seen it happen maybe twice in eight years of using Go professionally, but I have seen it complained about in online comment sections countless times. :-)
If I were making a new language today, I wouldn’t copy Go’s error handling. It would probably look more like Zig. But I also don’t find it to be a source of bugs in practice.
Everyone who has mastered a language builds up muscle memory of how to avoid the Bad Parts. Every language has them. This is not dispositive to the question of whether a particular design is good or not.
Not seeing a problem as a bug in production doesn’t tell you much. It usually just means that the developers spent more time writing tests or doing manual testing - and this is just not visible to you. The better the compiler and type-system, the fewer tests you need for the same quality.
Not seeing a problem as a bug in production doesn’t tell you much
Agreed, but I wasn’t talking about just production–I don’t recall seeing a bug like this in any environment, at any stage.
It usually just means that the developers spent more time writing tests or doing manual testing - and this is just not visible to you.
In a lot of cases I am the developer, or I’m working closely with junior developers, so it is visible to me.
The better the compiler and type-system, the fewer tests you need for the same quality.
Of course with Go we don’t need to write tests for unhandled errors any more than with Rust, we just use a linter. And even when static analysis isn’t an option, I disagree with the logic that writing tests is always slower. Not all static analysis is equal, and in many cases it’s not cheap from a developer velocity perspective. Checking for errors is very cheap from a developer velocity perspective, but pacifying the borrow checker is not. In many cases, you can write a test or two in the time it would take to satisfy rustc and in some cases I’ve even introduced bugs precisely because my attention was so focused on the borrow checker and not on the domain problem (these were bugs in a rewrite from an existing Go application which didn’t have the bugs to begin with despite not having the hindsight benefit that the Rust rewrite enjoyed). I’m not saying Rust is worse or static analysis is bad, but that the logic that more static analysis necessarily improves quality or velocity is overly simplistic, IMHO.
Of course with Go we don’t need to write tests for unhandled errors any more than with Rust, we just use a linter.
I just want to emphasize that it’s not the same thing - as you also hint to in the next sentence.
I disagree with the logic that writing tests is always slower.
I didn’t say that writing tests is always slower or that using the compiler to catch these things is necessarily always better. I’m not a Rust developer btw, and Rust’s error handling is absolutely not the current gold standard by my own judgement.
I just want to emphasize that it’s not the same thing - as you also hint to in the next sentence.
It kind of is the same thing: static analysis. The only difference is that the static analysis is broken out into two tools instead of one, so slightly more care needs to be taken to ensure the linter is run in CI or locally or wherever appropriate. To be clear, I think Rust is strictly better for having it in the compiler–I mostly just disagree with the implications in this thread that if the compiler isn’t doing the static analysis then the situation is no better than a dynamic language.
I didn’t say that writing tests is always slower or that using the compiler to catch these things is necessarily always better.
What did you mean when you said “It usually just means that the developers spent more time writing tests or doing manual testing … The better the compiler and type-system, the fewer tests you need for the same quality.” if not an argument about more rigorous static analysis saving development time? Are we just disagreeing about “always”?
I mostly just disagree with the implications in this thread that if the compiler isn’t doing the static analysis then the situation is no better than a dynamic language.
Ah I see - that is indeed an exaggeration that I don’t share.
Are we just disagreeing about “always”?
First that, but it also in general has other disadvantages. For instance, writing tests or doing manual tests is often easy to do. Learning how to deal with a complex type system is not. Go was specifically created to get people to contribute fast.
Just one example that shows that it’s not so easy to decide which way is more productive.
Swallowing errors is the very worst option there is. Even segfaulting is better, you know at least something is up in that case.
Dynamic languages usually just throw an exception, and those have way better behavior (you can’t forget; an empty catch is a deliberate sign to ignore an error, not an implicit one like with Go). At least some handler further up will log something, and more importantly the local block that experienced the error case won’t just continue executing as if nothing happened.
I disagree with this, only because it’s imperialism. I’m British; in British English I write marshalling (with two of the letter l), sanitising (-sing instead of -zing except for words ending in a z), and -ise instead of -ize, among other things. You wouldn’t demand that an Arabic developer write all his comments in English for the sake of being idiomatic, would you?
I’ve worked for a few companies in Germany now, about half of them with their operating language being in German. All of them had comments being written exclusively in English. I don’t know how that is in other countries, but I get the impression from Europeans that this is pretty standard.
That said, my own preference is for American English for code (i.e. variable names, class names, etc), but British English for comments, commit messages, pull requests, etc. That’s because the names are part of the shared codebase and therefore standardised, but the comments and commit messages are specifically from me. As long as everyone can understand my British English, then I don’t think there’s much of a problem.
EDIT: That said, most of these suggestions feel more on the pedantic end of the spectrum as far as advice goes, and I would take some of this with a pinch of salt. In particular, when style suggestions like “I tend to write xyz” become “do this”, then I start to raise eyebrows at the usefulness of a particular style guide.
All of them had comments being written exclusively in English. I don’t know how that is in other countries, but I get the impression from Europeans that this is pretty standard.
Developers in China seem to prefer Chinese to English. When ECharts was first open-sourced by Baidu most of the inline comments (and the entire README) were in Chinese.
In Japan I feel like the tech industry is associated with English, and corporate codebases seem to use mostly English in documentation. However, many people’s personal projects have all the comments/docs in Japanese.
If someone wants to force everyone to spell something the same within a language they should make sure it’s spelled wrong in all varieties, like with HTTP’s ‘referer’.
The Go core developers feel so strongly about their speling that they’re willing to change the names of constants from other APIs.
The gRPC protocol contains a status code enum (https://grpc.io/docs/guides/status-codes/), one of which is CANCELLED. Every gRPC library uses that spelling except for go-grpc, which spells it Canceled.
Idiosyncratic positions and an absolute refusal to concede to common practice is part and parcel of working with certain kinds of people.
We’re drifting off-topic, but I have to ask: gRPC is a Google product; Go is a Google product; and Google is a US company. How did gRPC end up with CANCELLED in the first place?!
You wouldn’t demand that an Arabic developer write all his comments in English for the sake of being idiomatic, would you?
If this is something other than a private pet project of a person who has no ambition of ever working with people outside of his country? Yes, yes I would.
I believe the advice is still applicable to non-native speakers. In all companies I worked for in France, developers write code in English, including comments, sometimes even internal docs. There are a lot of inconsistencies (typically mixing US English and GB English, sometimes in the same sentence.)
In my experience (LatAm) the problem with that is people tend to have pretty poor English writing skills. You end up with badly written comments and commit messages, full of grammatical errors. People were aware of this so they avoided writing long texts in order to limit their mistakes, so we had one-line PR descriptions, very sparse commenting, no docs to speak of, etc.
Once I had the policy changed to allow the native language (Portuguese) in PRs and docs, they were more comfortable with it and documentation quality improved.
In Europe people are much more likely to have a strong English proficiency even as a second or third language. You have to know your audience, basically.
While I like to write paragraphs of explanation in-between code, my actual comments are rather ungrammatical, with a bit of git style verb-first, removing all articles and other things. Proper English feels wrong in these contexts. Some examples from my currently opened file:
; Hide map’s slider when page opens first time
;; Giv textbox data now
;;Norm longitude within -180-180
; No add marker when click controls
;; Try redundant desperate ideas to not bleed new markers through button
;; Scroll across date line #ToDo Make no tear in marker view (scroll West from Hawaii)
Those comments would most likely look weird to a person unfamiliar with your particular dialect.
In a small comment it’s fine to cut some corners, similar to titles in newspapers, but we can’t go overboard: the point of these things is to communicate, we don’t want to make it even more difficult for whoever is reading them. Proper grammar helps.
For clarification, this is not my dialect/way of speaking. But I see so many short interline comments like this, that I started thinking they feel more appropriate and make them too, now. Strange!
Is “hat” a standard term regularly used in the golang ecosystem for a specific thing and on the list given in the article? If not, it is not relevant to the point in the article.
(And even generalized: if it happens to be an important term for your code base or ecosystem, it probably makes sense to standardize on how to spell it. in whatever language and spelling you prefer. I’ve worked on mixed-language codebases, and it’d been helpful if people consistently used the German domain-specific terms instead of mixing them with various translation attempts. Especially if some participants don’t speak the language (well) and have to treat terms as partially opaque)
I had to solve this once. I maintain a library that converts between HTML/CSS color formats, and one of the formats is a name (and optional spec to say which set of names to draw from). HTML4, CSS2, and CSS2.1 only had “gray”, but CSS3 added “grey” as another spelling for the same color value, and also added a bunch of other new color names which each have a “gray” and a “grey” variant.
Which raises the question: if I give the library a hex code for one of these and ask it to convert to name, which name should it convert to?
The solution I went with was to always return the “gray” variant since that was the “original” spelling in earlier HTML and CSS specs.
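A toy sketch (in Go, purely illustrative, not the library's actual code or data) of that policy, accepting both spellings on input and always emitting "gray" on output:

package main

import "fmt"

// Illustrative subset only; the real specs define many more names.
var nameToHex = map[string]string{
    "gray": "#808080", "grey": "#808080",
    "darkgray": "#a9a9a9", "darkgrey": "#a9a9a9",
}

// hexToName always maps back to the "gray" variant, the spelling used by the
// earlier HTML and CSS specs.
var hexToName = map[string]string{
    "#808080": "gray",
    "#a9a9a9": "darkgray",
}

func main() {
    fmt.Println(nameToHex["grey"])    // both spellings are accepted as input
    fmt.Println(hexToName["#808080"]) // output is always the "gray" variant
}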
I don’t think it’s really “imperialism”—firstly, “marshaling” isn’t even the preferred spelling in the US. Secondly in countries all over the world job listings stipulate English language skills all the time (even Arabic candidates) and the practice is widely accepted because facilitating communication is generally considered to be important. Lastly, while empires certainly have pushed language standardization as a means to stamp out identities, I don’t think it follows that all language standards exist to stamp out identities (particularly when they are optional, as in the case of this post).
“marshaling” isn’t even the preferred spelling in the US
What makes you say that? (Cards on the table, my immediate thought was “Yes, it is.” I had no data for that, but the ngram below suggests that the single l spelling is the (currently) preferred US spelling.)
I used Tailscale the other day to solve a problem where I wanted to scrape a government website from inside GitHub Actions but I was being IP blocked by Cloudflare. My scraper in GitHub Actions now connects to my Tailscale network and uses my Apple TV as an exit node - works great! https://til.simonwillison.net/tailscale/tailscale-github-actions
I’ve done similar to bypass geographic restrictions on streaming services when traveling abroad. The streaming services block IP addresses from certain regions and they also typically block popular VPNs, but with Tailscale I set the exit node to my Raspberry Pi at home and I was good to go. The only issue is that if you want to watch content on a TV, you need something like an Apple TV that can connect the TV to Tailscale.
Ooh now here’s a killer app idea: a simple, usable terminal command that you can use to sandbox a particular directory, a la chroot but actually usable for security. So you can run sandbox in a terminal and everything outside the local dir is unreachable forever to any process started within it, or you run sandbox cargo build like you’d run sudo except it has the opposite effect. Always starts from the existing local state, so you don’t have to do any setup a la Docker.
Not an ideal solution, given that many cargo commands want to touch the network or the rest of the filesystem for things like downloading cached packages, but it’s a thought. Maybe you can have a TOFU type setup where you run it and it goes through and asks “can this talk over the network to crates.io?” and “can this read ~/.cargo/registry/cache? Can this write ~/.cargo/registry/cache?”. Then, idk, it remembers the results for that directory and command?
I know all the tools are there to make something like this, no idea if it’s feasible in terms of UI though, or even whether it’d actually be useful for security. But it seems like something we should have.
Always starts from the existing local state, so you don’t have to do any setup a la Docker.
If you give this one requirement up you can do all of this today pretty easily. It’s a lot harder to do this otherwise as unprivileged sandboxing is already annoying + x-plat sandboxing is extremely painful.
I actually started work on an idea for that years ago but a mixture of “I wasn’t in a good mindstate and then 2019 made it worse” and fears of becoming more vulnerable, not less, if I was the face of a tool others were relying on for security (by going from being at risk of passive/undirected attacks to being at risk of active/directed attacks) caused it to go on de facto hiatus before I finished it.
You can see what got written here: https://github.com/ssokolow/nodo/ (defaults.toml illustrates the direction I was thinking in terms of making something like nodo cargo build Just Work™ with a useful amount of sandboxing.)
This is possible through apparmor and selinux. It’s not trivial, but doable. Unfortunately macOS is on its own here, with sandbox-exec being basically unsupported and its behaviour hard-wired.
I think it would be a good idea even for things like default-allow, but preventing writes to home/SSH configuration. But ui? Nah, this is going to be a per-project mess.
I think that would be tricky because the sandbox program would need to know what files and other resources are required by the program it is supposed to execute in order to run them in a subdirectory—there’s not a great programmatic way to do this, and even if there was, it wouldn’t improve security (the command could just say “I need the contents of the user’s private keys” for instance). The alternative is to somehow tell the sandbox program what resources are required by the script which can be really difficult to do in the general case and probably isn’t a lot better than Docker or similar.
On a developer workstation, probably most critical are your home directory (could contain SSH keys, secrets to various applications, etc.), /etc, /var, and /run/user/<UID>. You could use something like bubblewrap to only make the project’s directory visible in $HOME, use a tmpfs for $HOME/.cargo, and use stubs or tmpfses for the other directories.
I did this once and it works pretty well and across projects. However, the question is: if you don’t trust the build, why would you trust the application itself? So at that point you don’t want to run it at all, or only in an isolated VM anyway. So it probably makes more sense to build the project in a low-privileged environment like that as well.
IMO sandboxing is primarily interesting for applications that you trust in principle, but process untrusted data (chat clients, web browsers, etc.). So you sandbox them for when there is a zero-day vulnerability. E.g. running something like Signal, Discord, or a Mastodon client without sandboxing is pretty crazy (e.g. something like iMessage needs application sandboxing + blastdoor all the time to ensure that zero-days cannot elevate to wider access).
No, there’s a lot of policy discretion. The US government has access to any data stored in the US belonging to non-US persons without basic due process like search warrants. The data they choose to access is a policy question. The people being installed in US security agencies have strong connections to global far right movements.
In 2004 servers operated by Rackspace in the UK on behalf of Indymedia were handed over to the American authorities with no consideration of the legal situation in the jurisdiction where they were physically located.
/Any/ organisation- governmental or otherwise- that exposes themselves to that kind of risk needs to be put out of business.
I seem to remember an incident where Instapaper went offline. The FBI raided a data centre and took offline a blade machine containing servers they had warrants for, as well as Instapaper’s, which they didn’t. So accidents happen.
Yes, but in that case the server was in an American-owned datacenter physically located in America (Virginia), where it was within the jurisdiction of the FBI.
That is hardly the same as a server in an American-owned datacenter physically located in the UK, where it was not within the jurisdiction of the FBI.
Having worked for an American “multinational” I can see how that sort of thing can happen: a chain of managers unversed in the law assumes it is doing “the right thing”. Which makes it even more important that customers consider both the actual legal situation and the cost of that sort of foulup when choosing a datacenter.
The US government has access to any data stored in the US belonging to non-US persons without basic due process like search warrants.
Serious question, who’s putting data in us-west etc when there are EU data centres? And does that free rein over data extend to data in European data centres? I was under the impression that safe harbour regs protected it? But it’s been years since I had to know about this kind of stuff and it’s now foggy.
It does not matter where the data is stored. Using EU datacenters will help latency if that is where your users are, but it will not protect you from warrants. The author digs into this in this post, but unfortunately, it is in Dutch: https://berthub.eu/articles/posts/servers-in-de-eu-eigen-sleutels-helpt-het/
Serious question, who’s putting data in us-west etc when there are EU data centres?
A lot of non-EU companies. Seems like a weird question, not everyone is either US or EU. Almost every Latin American company I’ve worked for uses us-east/west, even if it has no US customers. It’s just way cheaper than LATAM data centers and has better latency than EU.
Obviously the world isn’t just US/EU, I appreciate that. This article is dealing with the trade agreements concerning EU/US data protection though so take my comment in that perspective.
I haven’t personally made up my mind on this, but one piece of evidence in the “it’s dramatically different (in a bad way)” side of things would be the usage of unvetted DOGE staffers with IRS data. That to me seems to indicate that the situation is worse than before.
Not sure what you mean—Operation Desert Storm and the Cold War weren’t initiated by the US nor were Iraq and the USSR allies in the sense that the US is allied with Western Europe, Canada, etc (yes, the US supported the USSR against Nazi Germany and Iraq against Islamist Iran, but everyone understood those alliances were temporary—the US didn’t enter into a mutual defense pact with Iraq or USSR, for example).
they absolutely 100% were initiated by the US. yes the existence of a mutual defense pact is notable, as is its continued existence despite the US “seeking to harm” its treaty partners. it sounds like our differing perceptions of whether the present moment is “dramatically different” come down to differences in historical understanding, the discussion of which would undoubtedly be pruned by pushcx.
This isn’t true, as the US has been the steward of the Internet and its administration has turned hostile towards US’s allies.
In truth, Europe already had a wake-up call with Snowden’s revelations, the US government spying on non-US citizens with impunity, by coercing private US companies to do it. And I remember the Obama administration claiming that “non-US citizens have no rights”.
But that was about privacy, whereas this time we’re talking about a far right administration that seems to be on a war path with US’s allies. The world today is not the same as it was 10 years ago.
hm, you have a good point. I was wondering why now it would be different but “privacy” has always been too vague a concept for most people to grasp/care about. But an unpredictable foreign government which is actively cutting ties with everyone and reneging on many of its promises with (former?) allies might be a bigger warning sign to companies and governments world wide.
I mean, nobody in their right mind would host stuff pertaining to EU citizens in, say, Russia or China.
Ah yes, one of the super toxic throw away comments anyone can make which alienates one participant and makes the other feel super smug.
These things are mostly going away, phone keyboards killed the “grammar nazi”, no one can spell anymore and no one knows who is at fault. I’m looking forward to a world where our default isn’t trying to win the conversation, but to move it forward more productively.
I’m looking forward to a world where our default isn’t trying to win the conversation, but to move it forward more productively.
Probably a bit optimistic
On UUOC: I suspect it was originally in the spirit of fun. It becoming a cultural gatekeeping tool (as many, many in-group signifiers become) is a separate phenomenon and I think it’s probably best to view it as such
IMO the issue is that pointing at other people’s code and saying “lol that’s dumb” is really only something you can do with friends, with strangers over the internet it’s a very different thing
Often, people in these forums are friends to some extent and I’d assume that part of this that became toxic is that since you’re not ALL friends, it’s hard to tell who knows who and what’s a joke and what’s a dig.
I thought the main benefit of the article was towards the end, and what @carlana summarized well here:
Nil channels are useful because essentially any practical use of channels should have a select with multiple branches and nil lets you turn off one branch of the select while keeping the others working. Even the most basic thing you can do, send work to some worker goroutines and receive answered data back, will have two branches for send data and receive data and at a certain point you will have sent everything and want to turn the sending branch off while still receiving.
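In case it helps to see that pattern spelled out, here’s a minimal, self-contained sketch in Go (the worker and the doubling work are made up for illustration); the key point is that a send or receive on a nil channel blocks forever, so assigning nil to the channel variable turns that select case off:

```go
package main

import "fmt"

func main() {
	jobs := make(chan int)    // work going to the worker
	results := make(chan int) // answers coming back

	// A trivial worker: doubles every job it receives.
	go func() {
		for j := range jobs {
			results <- j * 2
		}
	}()

	pending := []int{1, 2, 3}
	remaining := len(pending)

	for remaining > 0 {
		// Once everything has been sent, use a nil channel so the
		// send case can never fire while the receive case keeps working.
		var out chan int
		var next int
		if len(pending) > 0 {
			out = jobs
			next = pending[0]
		}

		select {
		case out <- next:
			pending = pending[1:]
			if len(pending) == 0 {
				close(jobs) // lets the worker's range loop finish
			}
		case r := <-results:
			fmt.Println("got", r)
			remaining--
		}
	}
}
```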
I’ve been writing Go since 2012 and I didn’t know this. 🤷♂️
I think the language would be a lot nicer without the make() init.
In the last 8 years using Go, every time I took a break away from Go and came back, the nil map, nil channel, and zero-value map initialization tripped me up quite consistently. With generics available now, I think there are plenty of ways to clean up these older APIs and make the language friendlier to new users.
I was going to write a comment about why you really want nil channels, but carlana already did so I’ll just bring it to your attention in case you’re only watching replies.
Nil maps aren’t so directly useful, but the difference between an empty map and no map at all is 48 bytes, which is non-negligible if you have some kind of data structure with millions of maps that might or might not exist.
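For anyone who hasn’t been bitten by it yet, a quick sketch of how a nil map behaves (standard Go semantics; the names are just for illustration). Reads are fine, which is why a nil map can stand in for “no map at all”, but writes panic, which is the gotcha mentioned above:

```go
package main

import "fmt"

func main() {
	var m map[string]int // nil: no allocation behind it

	// Reads and len work on a nil map (so do range and delete).
	fmt.Println(len(m), m["missing"]) // prints: 0 0

	// Writing to a nil map panics at runtime with
	// "assignment to entry in nil map", so it has to be made
	// (or literal-initialized) before any write.
	m = map[string]int{}
	m["now"] = 1
	fmt.Println(m) // prints: map[now:1]
}
```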
I was going to write a comment about why you really want nil channels, but carlana already did so I’ll just bring it to your attention in case you’re only watching replies.
The issue is not the functionality, it’s the implicitness. It’s that if you forget to make() your channel you get a behaviour which is very likely to screw you over when you don’t expect it.
If it gave you a channel by default it would be implicitly open or closed, buffered (with some capacity) or unbuffered. But it declines to do that, and has you construct what you want, which makes the code more explicit.
At the same time, there’s a need for one more thing, a (non)channel that is never available for communication. Go types must always have a zero value, and “a channel that’s never ready” is a lot more zero-ish than anything else you could come up with.
Yes, there’s a guideline that zero values should be useful when possible (a nil slice is ready for append, a zero bytes.Buffer is empty, a zero sync.Mutex is unlocked), but that takes a backseat to the requirement for a zero value to be uniquely zero. Complaining about how
Would that it did. It does give me a channel by default, one that is pretty much never what I want.
which makes the code more explicit.
Ah yes, the explicitness of implicitly doing something stupid.
Go types must always have a zero value
Have you considered that that’s a mistake?
Complaining about how […] fails.
It does, does it not? In both cases the language does something which is at best useless and at worst harmful, and which, with a little more effort put into its design, it could just not do.
Like I said elsewhere, for pretty much every non-trivial use of channels, you will want a nil channel at some point, so you can deactivate one branch of a select. It’s pretty much always one of the things I want.
I think I disagree? Or at least I’ve never made good use of a nil channel. Maybe now that I’ve learned about its uses for select{...} I’ll have a different opinion, but there have been plenty of times I don’t want a nil channel. And this problem isn’t limited to channels either–Go also gives nil pointers and nil maps by default, even though a nil pointer or map is frequently a bug. Defaulting to a zero value is certainly an improvement on C’s default (“whatever was in that particular memory region”), but I think it would be a lot better if it just forced us to initialize the memory.
I do wish that map was a value type by default and you would need to write *m to actually use it. That would be much more convenient. The Go team said they did that in early versions of Go, but they got sick of the pointer, so they made it internal to the map, but I think that was a mistake.
Defaulting to a zero value is certainly an improvement on C’s default (“whatever was in that particular memory region”)
Technically it’s UB, which is even worse. You may get whatever was at that location, or you might get the compiler deleting the entire thing and / or going off the rails completely.
Yeah, it’s not going to happen, but I’m convinced that would have been the choice to make in 2012 (or earlier). I can live with it, and Go is still the most productive tool in my bucket, but that particular decision is pretty disappointing, especially because we can’t back out of it the way we could have done if we had mandatory initialization (you can relax that requirement without breaking compatibility).
To add to that point, there’s another issue which is that many channels oughtn’t be nil, but Go doesn’t give us a very good way to express that. In fact, it goes even further and makes nil a default even when a nil channel would be a bug. I really, really wish Go had (reasonably implemented) sum types.
I haven’t done Go seriously in a while, but when I did, I was continually annoyed at this sort of thing because there’s no way to encapsulate these patterns to make it easy to get them right. I remember reading a Go team post about how to use goroutines properly and ranting about how Go’s only solution for reusing high-level code patterns is blog posts.
But now that it has generics, is it possible to solve this? Has someone made a package of things like worker pools and goroutine combinators (e.g., split/merge) that get this stuff right so you don’t have to rediscover the mistakes?
As an example of what annoyed me, it was things like this blog post on pipelines and cancellation, which should have just been a library, not a blog post.
Conc has a truly awful API. It really shows the power of writing a good blog post to make your package popular. I made my own concurrency package, and there were no ideas in conc worth copying. Honestly though, my recommendation for most people is to just use https://pkg.go.dev/golang.org/x/sync/errgroup since it’s semi-standard.
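For anyone who hasn’t tried it, a minimal errgroup sketch might look roughly like this (the URLs and the fetch work are made up; the point is just the Go/Wait shape and the shared cancellation context):

```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"golang.org/x/sync/errgroup"
)

func main() {
	// The derived context is cancelled as soon as any goroutine
	// returns a non-nil error; Wait returns that first error.
	g, ctx := errgroup.WithContext(context.Background())
	g.SetLimit(4) // optional: cap the number of concurrent goroutines

	urls := []string{"https://example.com/", "https://example.org/"}
	for _, url := range urls {
		url := url // copy for the closure (not needed on Go 1.22+)
		g.Go(func() error {
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
			if err != nil {
				return err
			}
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err
			}
			defer resp.Body.Close()
			fmt.Println(url, resp.Status)
			return nil
		})
	}

	if err := g.Wait(); err != nil {
		fmt.Println("fetch failed:", err)
	}
}
```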
This crate should really be better known in the Rust ecosystem! The following quote on OpenAPI support resonates a lot with me (source):
[An] important goal for us was to build something with strong OpenAPI support, and particularly where the code could be the source of truth and a spec could be generated from the code that thus could not diverge from the implementation. We weren’t sure that would be possible, but it seemed plausible and worthwhile if we could do it. None of the crates we found had anything like this.
I use something like this in the Go ecosystem, and generating an OpenAPI spec from source code is much better than generating source code from an OpenAPI spec.
One under-appreciated reason why it’s better: you don’t have to handle the entirety of the OpenAPI spec, just the features your server framework wants to produce. We took advantage of this by writing our own code gen for clients too; that’s also easier to make good because you only need to handle that same subset.
Kernel development is pretty unusual in the sense that many things you’d expect to be part of a project are carried out downstream of it.
For one, there’s no CI and not much in the way of testing there. Instead, developers and users do all kinds of testing on their own terms, such as the linux test project or syzbot (which was the thing that found this filesystem bug).
I was even more surprised when I found out that documentation is also largely left to downstream, and so there are a bunch of syscalls that the Linux manpages project simply hasn’t gotten around to adding manpages for.
I was pretty surprised to find that the only documentation for the ext filesystems was just one guy trying to read the (really dodgy) code and infer what happened under different cases. A lot of “I think it does X when Y happens, but I’m not sure” in the documentation. And reading through the code, I understand it. That filesystem is its own spec because no one actually understands it. It’s wild how much this kind of stuff exists near the bottom of our tech stacks.
Yup. I tried to learn about the concurrency semantics of ext4 - stuff like “if I use it from multiple processes, do they see sequentially consistent operations?”, so I asked around. Nobody was able to give me a good answer.
Also, one kernel dev in response smugly told me to go read memory-barriers.txt. Which I’d already read and which is useless for this purpose! Because it documents kernel-internal programming practices, not the semantics presented to userland.
I’ve never really worked on an application that ran out of connections or experienced degraded performance as a consequence of too many connections—what sort of system/scale benefits from PgBouncer? Presumably read replicas allow you to scale read connections linearly (at some cost), so it mostly comes into play when you have hundreds of concurrent writes? Or maybe you reach for PgBouncer before you scale out read replicas because the “peril” of the former outweighs the cost of the latter?
Lots of folks, for better or worse, run N replicas of their app, or N microservices, each with M connections in a pool. Seems to be very common with Kubernetes shops. Pretty easy to end up with a hundred open connections this way, and that’s not cheap.
Read replicas are too complex to be retrofitted on to most applications - they’re usually eventually consistent and the app needs to be taught which queries to route to the replica. PgBouncer can’t really do that sort of read query offloading, because of SELECT launch_missiles()-shaped problems. Easier to throw money at the problem scaling vertically than doing open heart surgery on your persistence layer, I guess.
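On the connection-count side, a rough sketch of where those N × M connections come from in a typical Go service (the pool sizes, DSN, and driver here are illustrative, not recommendations): each process holds its own database/sql pool, so the totals multiply across app replicas.

```go
package main

import (
	"database/sql"
	"time"

	_ "github.com/lib/pq" // any Postgres driver works the same way
)

// openPool is a hypothetical helper showing the per-process knobs.
func openPool(dsn string) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	// With, say, 20 app replicas, these settings alone allow up to
	// 20 * 10 = 200 server connections, before counting cron jobs,
	// migrations, ad-hoc psql sessions, and so on.
	db.SetMaxOpenConns(10)
	db.SetMaxIdleConns(5)
	db.SetConnMaxLifetime(30 * time.Minute)
	return db, nil
}
```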
I understand the N replicas with M connections per replica thing, but I usually only see ~3 replicas. Unless each replica has hundreds of connections in its pool, I don’t think this would explain it. Are lots of people running with dozens of replicas? What kinds of apps / at what scale is this relevant?
Read replicas are somewhat complex, but it seems more straightforward to tell whether a query will work for a read replica (and if you aren’t sure just leave it on the master) than it is to navigate the PgBouncer concerns raised in this article. And I think eventual consistency is probably fine for a lot of CRUD apps, no?
We have a python web app that’s read/write to postgres. Each instance of the service will start with 5 connections to the DB, and we have many instances running around.
Certainly we could build out read replicas of PG, retool the web app to read from 1 DB and write to another, but it was infinitely easier to just make pgbouncer do all the hard work and keep a single DB.
We could optimize our queries too. Again, pgbouncer and scaling the 1 prod DB instance is just easier. DB slow? Throw some more RAM or CPU at the problem and move on with life.
We haven’t run out of hardware growth room on x86 yet, and they keep expanding our growth ability hardware-wise. Perhaps someday we will start to reach the limits of what a single box can do, and then we can start to optimize. Until that day though, it’s not worth the developer time. Hardware running costs are way cheaper than developer time.
We have a python web app that’s read/write to postgres. Each instance of the service will start with 5 connections to the DB, and we have many instances running around.
How many replicas do you have? According to the article, the sweet spot is 200-300 connections, which means you could have 40-60 replicas before performance becomes an issue. Do you really have this many replicas, and if so how much traffic does your app get?
Additionally, how did you avoid the pitfalls described in the article? It sounds like PgBouncer requires changes to the query versus vanilla Postgres, at least if you have it configured in any useful way. You mention that hardware is cheap compared to developer time, and I agree, but I’m trying to understand why PgBouncer minimizes developer time.
Certainly we could build out read replicas of PG, retool the web app to read from 1 DB and write to another, but it was infinitely easier to just make pgbouncer do all the hard work and keep a single DB.
I keep hearing that it’s much easier to use PgBouncer, but the article makes it sound like using it in any useful capacity while maintaining transaction and query correctness is not very easy. Similarly you make it sound like it would be really hard to change your code to use read replicas, but I would think it would be really easy (instead of one database handle you have two and your reads use the read handle).
I’m not advocating for one approach, I’m just trying to understand why people keep saying PgBouncer is easy when the article makes it seem pretty complicated to implement correctly.
DB replicas? Zero real-time ones. Do you mean something else? We have an hourly read-replica (for testing) and a daily replica that comes from our backup (so we can exercise our backup/restore), that’s used for development.
We run a few hundred PG connections through pgbouncer without trouble. We also have around 150 normal long-lived connections for our client/server and other longer-lived sessions that don’t go through pgbouncer. Only the web traffic goes through pgbouncer.
if so how much traffic does your app get?
We have about 1k users (it’s an internal application). It runs the backoffice stuff (payroll, HR, purchasing, accounting, etc), our biggest load on the web side is timesheets, which is what most of our users do, and they of course all want to do timesheets at the exact same time.
I’m just trying to understand why people keep saying PgBouncer is easy when the article makes it seem pretty complicated to implement correctly.
I dunno, it was easy for us :) We did have a few growing pains when we deployed it, but they were easily solved. We haven’t had issues in years now. I don’t remember what the issues were when we were deploying, and I can’t remember what mode we run it in. I’m not in a position to easily check at the moment. I think we went from testing to in production within a week, so whatever issues we had, they were not difficult or hard to fix for us.
If you want me to, I can go look at our config later and re-familiarize myself with it, and share some more details.
I meant application replicas. How many instances of your application are connecting to the database with ~5 connections each?
I’m surprised you need PgBouncer at all for an internal system with 1k users?
It was easy for us
The way the article makes it sound, if you just throw PgBouncer in front of something without configuration, it doesn’t actually do much. How do you know if PgBouncer is actually improving performance?
This is all from memory, I could be off somewhere.
I meant application replicas. How many instances of your application are connecting to the database with ~5 connections each?
I’d have to go count, more than a dozen, less than 100.
I’m surprised you need PgBouncer at all for an internal system with 1k users?
Because we have 1k active users at one time.
How do you know if PgBouncer is actually improving performance?
For us it wasn’t really about performance, it was about reducing the number of connections to PG. PG connections are expensive. We had 500 or so PG connections from the web side when we moved to pgbouncer a few years ago. Now I think we have 50 or 100 active web connections.
Adding pgbouncer is typically a lot easier than adding a read replica IME (at least for rails, though the recent multidb support changes that calculus a lot)
Many of the features are operational and so depending on what you want to do there’s no real minimum scale.
For example, you can use it to “pause” new connections to the database - effectively draining the connections. I’ve used this in the past to restart a production database to pick up OS updates without having to go through all the faff of a failover.
But it’s not uncommon for applications to behave badly (e.g. open way too many conns for too long) and pgbouncer gives you tools to mitigate that. It’s not just a “big scale” thing.
The majority of bugs (quantity, not quality/severity) we have are due to the stupid little corner cases in C that are totally gone in Rust. Things like simple overwrites of memory (not that rust can catch all of these by far), error path cleanups, forgetting to check error values, and use-after-free mistakes. That’s why I’m wanting to see Rust get into the kernel, these types of issues just go away, allowing developers and maintainers more time to focus on the REAL bugs that happen (i.e. logic issues, race conditions, etc.)
This is an extremely strong statement.
I think a few things are also interesting:
I think people are realizing how low quality the Linux kernel code is, how haphazard development is, how much burnout and misery is involved, etc.
I think people are realizing how insanely not in the open kernel dev is, how much is private conversations that a few are privy to, how much is politics, etc.
I think people are realizing how insanely not in the open kernel dev is, how much is private conversations that a few are privy to, how much is politics, etc.
The Hellwig/Ojeda part of the thread is just frustrating to read because it almost feels like pleading. “We went over this in private” “we discussed this already, why are you bringing it up again?” “Linus said (in private so there’s no record)”, etc., etc.
Dragging discussions out in front of an audience is a pretty decent tactic for dealing with obstinate maintainers. They don’t like to explain their shoddy reasoning in front of people, and would prefer it remain hidden. It isn’t the first tool in the toolbelt but at a certain point there is no convincing people directly.
Dragging discussions out in front of an audience is a pretty decent tactic for dealing with
With quite a few things actually. A friend of mine is contributing to a non-profit, which until recently had a very toxic member (they even attempted a felony). They were driven out of the non-profit very soon after members talked in a thread that was accessible to all members. Obscurity is often one key component of abuse, be it mere stubbornness or criminal behaviour. Shine light, and it often goes away.
IIRC Hintjens noted this quite explicitly as a tactic of bad actors in his works.
It’s amazing how quick people are to recognize folks trying to subvert an org piecemeal via one-off private conversations once everybody can compare notes. It’s equally amazing to see how much the same people beforehand will swear up and down oh no, that’s a conspiracy theory, such things can’t happen here, until they’ve been burned at least once.
This is an active, unpatched attack vector in most communities.
I’ve found the lowest-level example of this is even meeting minutes at work. I’ve observed that people tend to act more collaboratively and seek the common good if there are public minutes, as opposed to trying to “privately” win people over to their desires.
I think people are realizing how low quality the Linux kernel code is, how haphazard development is, how much burnout and misery is involved, etc.
Something I’ve noticed is true in virtually everything I’ve looked deeply at is the majority of work is poor to mediocre and most people are not especially great at their jobs. So it wouldn’t surprise me if Linux is the same. (…and also wouldn’t surprise me if the wonderful Rust rewrite also ends up poor to mediocre.)
Yet at the same time, another thing that astonishes me is how much stuff actually does get done and how well things manage to work anyway. And Linux also does a lot and works pretty well. Mediocre over the years can end up pretty good.
After tangentially following the kernel news, I think a lot of churning and death spiraling is happening. I would much rather have a rust-first kernel that isn’t crippled by the old guard of C developers reluctant to adopt new tech.
Take all of this energy into RedoxOS and let Linux stay in antiquity.
I’ve seen some of the R4L people talk on Mastodon, and they all seem to hate this argument.
They want to contribute to Linux because they use it, want to use it, and want to improve the lives of everyone who uses it. The fact that it’s out there and deployed and not a toy is a huge part of the reason why they want to improve it.
Hopping off into their own little projects which may or may not be useful to someone in 5-10 years’ time is not interesting to them. If it was, they’d already be working on Redox.
The most effective thing that could happen is for the Linux foundation, and Linus himself, to formally endorse and run a Rust-based kernel. They can adopt an existing one or make a concerted effort to replace large chunks of Linux’s C with Rust.
IMO the Linux project needs to figure out something pretty quickly because it seems to be bleeding maintainers and Linus isn’t getting any younger.
They (the Mastodon posters) may be misunderstanding that others are not necessarily incentivized to do things just because those things are interesting to them.
Redox does have the chains of trying to do new OS things. An ABI-compatible Rust rewrite of the Linux kernel might get further along than expected (even if it only runs in virtual contexts, without hardware support; that would come later).
Linux developers want to work on Linux, they don’t want to make a new OS. Linux is incredibly important, and companies already have Rust-only drivers for their hardware.
Basically, sure, a new OS project would be neat, but it’s really just completely off topic in the sense that it’s not a solution for Rust for Linux. Because the “Linux” part in that matters.
I read a 25+ year old article [1] about the Netscape rewrite that I think applies in part:
The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that’s kind of gross if it’s not made out of all new material?
Adopting a “rust-first” kernel is throwing the baby out with the bathwater. Linux has been beaten into submission for over 30 years for a reason. It’s the largest collaborative project in human history and over 30 million lines of code. Throwing it out and starting new would be an absolutely herculean effort that would likely take years, if it ever got off the ground.
The idea that old code is better than new code is patently absurd. Old code has stagnated. It was built using substandard, out of date methodologies. No one remembers what’s a bug and what’s a feature, and everyone is too scared to fix anything because of it. It doesn’t acquire new bugs because no one is willing to work on that weird ass bespoke shit you did with your C preprocessor. Au contraire, baby! Is software supposed to never learn? Are we never to adopt new tools? Can we never look at something we’ve built in an old way and wonder if new methodologies would produce something better?
This is what it looks like to say nothing, to beg the question. Numerous empirical claims; where is the justification?
It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?
Like most things in life, the truth is somewhere in the middle. There is a reason there is the concept of a “mature node” in the semiconductor industry. They accept that new is needed for each node, but also that the new thing takes time to iron out the kinks and bugs. This is the primary reason why you see Apple take on new nodes before Nvidia, for example, as Nvidia requires much larger die sizes, and so fewer defects per square mm.
You can see this sometimes in software, for example X11 vs Wayland, where adoption is slow but most definitely progressing, and nowadays most people can see that Wayland is now, or is going to become, the dominant tech in the space.
I don’t think this would qualify as dialectic; it lacks any internal debate and it leans heavily on appeals by analogy and intuition/emotion. The post itself makes a ton of empirical claims without justification, even beyond the quoted bit.
That means we can probably keep a lot of the old trusty Linux code around while making more of the new code safe by writing it in Rust in the first place.
I don’t think that’s a fair assessment of Spolsky’s argument or of CursedSilicon’s application of it to the Linux kernel.
Firstly, someone has already pointed out the research that suggests that existing code has fewer bugs in it than new code (and that the older the code is, the less likely it is to be buggy).
Secondly, this discussion is mainly around entire codebases, not just existing code. Codebases usually have an entire infrastructure around them for verifying that the behaviour of the codebase has not changed. This is often made up of tests, but it’s also made up of the users who try out a release of a codebase and determine whether it’s working for them. The difference between making a change to an existing codebase and releasing a new project largely comes down to whether this verification (both in terms of automated tests and in terms of users’ ability to use the new release) works for the new code.
Given this difference, if I want to (say) write a new OS completely in Rust, I need to choose: Do I want to make it completely compatible with Linux, and therefore take on the significant challenge of making sure everything behaves truly the same? Or do I make significant breaking changes, write my own OS, and therefore force potential adopters to rebuild their entire Linux workflows in my new OS?
The point is not that either of these options are bad, it is that they represent significant risks to a project. Added to the general risk that is writing new code, this produces a total level of risk that might be considered the baseline risk of doing a rewrite. Now risk is not bad per se! If the benefits of being able to write an OS in a language like Rust outweigh the potential risks, then it still makes sense to perform the rewrite. Or maybe the existing Linux kernel is so difficult to maintain that a new codebase really would be the better option. But the point that CursedSilicon was making by linking the Spolsky piece was, I believe, that the risks for a project like the Linux kernel are very high. There is a lot of existing, old code. And there is a very large ecosystem where either breaking or maintaining compatibility would each come with significant challenges.
Unfortunately, it’s very difficult to measure the risks and benefits here in a quantitative, comparable way, so I think where you fall on the “rewrite vs continuity” spectrum will depend mostly on what sort of examples you’ve seen, and how close you think this case is to those examples. I don’t think there’s any objective way to say whether it makes more sense to have something like R4L, or something like RedoxOS.
Firstly, someone has already pointed out the research that suggests that existing code has fewer bugs in it than new code (and that the older the code is, the less likely it is to be buggy).
I haven’t read it yet, but I haven’t made an argument about that, I just created a parody of the argument as presented. I’ll be candid, I doubt that the research is going to compel me to believe that newer code is inherently buggier; it may compel me to confirm my existing belief that testing software in the field is one good method to find some classes of bugs.
Secondly, this discussion is mainly around entire codebases, not just existing code.
I guess so, it’s a bit dependent on where we say the discussion starts - three things are relevant: RFL, which is not a wholesale rewrite; a wholesale rewrite of the Linux kernel; and Netscape. RFL is not about replacing the entire Linux kernel, although perhaps “codebase” here refers to some sort of unit, like a driver. Netscape wanted a wholesale rewrite, based on the linked post, so perhaps that’s what’s really “the single worst strategic mistake that any software company can make”, but I wonder what the boundary here is? Also, the article immediately mentions that Microsoft tried to do this with Word but it failed, yet Word didn’t suffer from this because it was still actively developed - I wonder if it really “failed” just because Pyramid didn’t become the new Word? Did Microsoft have some lessons learned, or incorporate some of that code? Dunno.
I think I’m really entirely justified when I say that the post is entirely emotional/intuitive appeals and rhetoric, and that it makes empirical claims without justification.
There’s a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
This is rhetoric. These are unsubstantiated empirical claims. The article is all of this. It’s fine as an interesting, thought provoking read that gets to the root of our intuitions, but I think anyone can dismiss it pretty easily since it doesn’t really provide much in the form of an argument.
It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time.
Again, totally unsubstantiated. I have MANY reasons to believe that, it is simply question begging to say otherwise.
That’s all this post is. Over and over again making empirical claims with no evidence and question begging.
We can discuss the risks and benefits, I’d advocate for that. This article posted doesn’t advocate for that. It’s rhetoric.
existing code has fewer bugs in it than new code (and that the older the code is, the less likely it is to be buggy).
This is a truism. It is survivorship bias. If the code was buggy, it would eventually be found and fixed. So, all things being equal, newer code is riskier than old code. But it’s also been empirically shown that using Rust for new code is not “all things being equal”. Google showed that new code in Rust is as reliable as old code in C. Which is good news: you can use old C code from new Rust projects without the risk that comes from new C code.
But it’s also been empirically shown that using Rust for new code is not “all things being equal”.
Yeah, this is what I’ve been saying (not sure if you’d meant to respond to me or the parent, since we agree) - the issue isn’t “new” vs “old” it’s things like “reviewed vs unreviewed” or “released vs unreleased” or “tested well vs not tested well” or “class of bugs is trivial to express vs class of bugs is difficult to express” etc.
I don’t disagree that the rewards can outweigh the risks, and in this case I think there’s a lot of evidence that suggests that memory safety as a default is really important for all sorts of reasons. Let alone the many other PL developments that make Rust a much more suitable language to develop in than C.
It’s a Ship of Theseus—at no point can you call it a “new” codebase, but after a period of time, it could be completely different code. I have a C program I’ve been using and modifying for 25 years. At any given point, it would have been hard to say “this is now a new codebase,” yet not one line of code in the project is the same as when I started (even though it does the same thing as it always has).
I don’t see the point in your question. It’s going to depend on the codebase, and on the nature of the changes; it’s going to be nuanced, and subjective at least to some degree. But the fact that it’s prone to subjectivity doesn’t mean that you get to call an old codebase with a single fixed bug a new codebase, without some heavy qualification which was lacking.
What’s old and new is poorly defined and yet there’s an argument being made that “old” and “new” are good indicators of something. If they’re so poorly defined that we have to bring in all sorts of additional context like the nature of the changes, not just when they happened or the number of lines changed, etc, then it seems to me that we would be just as well served to throw away the “old” and “new” and focus on that context.
I feel like enough people would agree more-or-less on what was an “old” or “new” codebase (i.e. they would agree given particular context) that they remain useful terms in a discussion. The general context used here is apparent (at least to me) given by the discussion so far: an older codebase has been around for a while, has been maintained, has had kinks ironed out.
There’s a really important distinction here though. The point is to argue that new projects will be less stable than old ones, but you’re intuitively (and correctly) bringing in far more important context - maintenance, testing, battle testing, etc. If a new implementation has a higher degree of those properties then it being “new” stops being relevant.
It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?
My point was that this statement requires a definition of “new codebase” that nobody would agree with, at least in the context of the discussion we’re in. Maybe you are attacking the base proposition without applying the surrounding context, which might be valid if this were a formal argument and not a free-for-all discussion.
If a new implementation has a higher degree of those properties
I think that it would be considered no longer new if it had had significant battle-testing, for example.
FWIW the important thing in my view is that every new codebase is a potential old codebase (given time and care), and a rewrite necessarily involves a step backwards. The question should probably not be “which is immediately better?” but “which is better in the longer term (and by how much)?” However, your point that a “new codebase” is not automatically worse is certainly valid. There are other factors than age and “time in the field” that determine quality.
Methodologies don’t matter for quality of code. They could be useful for estimates, cost control, figuring out whom you shall fire, etc. But not for the quality of code.
I’ve never observed a programmer become better or worse by switching methodology. Dijkstra would not have become better if you made him do daily standups or go through code reviews.
There are ways to improve your programming by choosing different approach but these are very individual. Methodology is mostly a beancounting tool.
When I say “methodology” I’m speaking very broadly - simply “the approach one takes”. This isn’t necessarily saying that any methodology is better than any other. The way I approach a task today is better, I think, than the way that I would have approached that task a decade ago - my methodology has changed, the way I think has changed. Perhaps that might mean I write more tests, or I test earlier, but it may mean exactly the opposite, and my methods may only work best for me.
I’m not advocating for “process” or ubiquity, only that the approach one takes may improve over time, which I suspect we would agree on.
It’s the largest collaborative project in human history and over 30 million lines of code.
How many of those lines are part of the core? My understanding was that the overwhelming majority was driver code. There may not be that much core subsystem code to rewrite.
For a previous project, we included a minimal Linux build. It was around 300 KLoC, which included networking and the storage stack, along with virtio drivers.
That’s around the size a single person could manage and quite easy with a motivated team.
If you started with DPDK and SPDK then you’d already have filesystems and a copy of the FreeBSD network stack to run in isolated environments.
Once many drivers share common Rust wrappers over core subsystems, you could flip it and write the subsystem in Rust. Then expose a C interface for the rest.
I see that Drew proposes a new OS in that linked article, but I think a better proposal in the same vein is a fork. You get to keep Linux, but you can start porting logic to Rust unimpeded, and it’s a manageable amount of work to keep porting upstream changes.
Remember when libav forked from ffmpeg? Michael Niedermayer single-handedly ported every single libav commit back into ffmpeg, and eventually, ffmpeg won.
At first there will be extremely high C percentage, low Rust percentage, so porting is trivial, just git merge and there will be no conflicts. As the fork ports more and more C code to Rust, however, you start to have to do porting work by inspecting the C code and determining whether the fixes apply to the corresponding Rust code. However, at that point, it means you should start seeing productivity gains, community gains, and feature gains from using a better language than C. At this point the community growth should be able to keep up with the extra porting work required. And this is when distros will start sniffing around, at first offering variants of the distro that uses the forked kernel, and if they like what they taste, they might even drop the original.
I genuinely think it’s a strong idea, given the momentum and potential amount of labor Rust community has at its disposal.
I think the competition would be great, especially in the domain of making it more contributor friendly to improve the kernel(s) that we use daily.
I certainly don’t think this is impossible, for sure. But the point ultimately still stands: Linux kernel devs don’t want a fork. They want Linux. These folks aren’t interested in competing, they’re interested in making the project they work on better. We’ll see if some others choose the fork route, but it’s still ultimately not the point of this project.
Linux developers want to work on Linux, they don’t want to make a new OS.
While I don’t personally want to make a new OS, I’m not sure I actually want to work on Linux. Most of the time I strive for portability, and so abstract myself from the OS whenever I can get away with it. And when I can’t, I have to say Linux’s API isn’t always that great, compared to what the BSDs have to offer (epoll vs kqueue comes to mind). Most annoying though is the lack of documentation for the less used APIs: I’ve recently worked with Netlink sockets, and for the proc stuff so far the best documentation I found was the freaking source code of a third party monitoring program.
I was shocked. Complete documentation of the public API is the minimum bar for a project as serious as the Linux kernel. I can live with an API I don’t like, but lack of documentation is a deal breaker.
While I don’t personally want to make a new OS, I’m not sure I actually want to work on Linux.
I think they mean that Linux kernel devs want to work on the Linux kernel. Most (all?) R4L devs are long time Linux kernel devs. Though, maybe some of the people resigning over LKML toxicity will go work on Redox or something…
Re-Implementing the kernel ABI would be a ton of work for little gain if all they wanted was to upstream all the work on new hardware drivers that is already done - and then eventually start re-implementing bits that need to be revised anyway.
If the singular required Rust toolchain didn’t feel like such a ridiculous-to-bootstrap, 500-ton LLVM clown car, I would agree with this statement without reservation.
Zig is easier to implement (and I personally like it as a language) but doesn’t have the same safety guarantees and strong type system that Rust does. It’s a give and take. I actually really like Rust and would like to see a proliferation of toolchain options, such as what’s in progress in GCC land. Overall, it would just be really nice to have an easily bootstrapped toolchain that a normal person can compile from scratch locally, although I don’t think it necessarily needs to be the default, or that using LLVM generally is an issue. However, it might be possible that no matter how you architect it, Rust might just be complicated enough that any sufficiently useful toolchain for the language could just end up being a 500 ton clown car of some kind anyways.
Depends on which parts of GP’s statement you care about: LLVM or bootstrap. Zig is still depending on LLVM (for now), but it is no longer bootstrappable in a limited number of steps (because they switched from a bootstrap C++ implementation of the compiler to keeping a compressed WASM build of the compiler as a blob).
Yep, although I would also add it’s unfair to judge Zig in any case on this matter now given it’s such a young project that clearly is going to evolve a lot before the dust begins to settle (Rust is also young, but not nearly as young as Zig). In ten to twenty years, so long as we’re all still typing away on our keyboards, we might have a dozen Zig 1.0 and a half dozen Zig 2.0 implementations!
Yeah, the absurdly low code quality and toxic environment make me think that Linux is ripe for disruption. Not like anyone can produce a production kernel overnight, but maybe a few years of sustained work might see a functional, production-ready Rust kernel for some niche applications and from there it could be expanded gradually. While it would have a lot of catching up to do with respect to Linux, I would expect it to mature much faster because of Rust, because of a lack of cruft/backwards-compatibility promises, and most importantly because it could avoid the pointless drama and toxicity that burn people out and prevent people from contributing in the first place.
From the thread in OP, if you expand the messages, there is wide agreement among the maintainers that all sorts of really badly designed and almost impossible to use (safely) APIs ended up in the kernel over the years because the developers were inexperienced and kind of learning kernel development as they went. In retrospect they would have designed many of the APIs very differently.
It’s based on my forays into the Linux kernel source code. I don’t doubt there’s some quality code lurking around somewhere, but the stuff I’ve come across (largely filesystem and filesystem adjacent) is baffling.
Seeing how many people are confidently incorrect about Linux maintainers only caring about their job security and keeping code bad to make it a barrier to entry, if nothing else taught me how online discussions are a huge game of Chinese whispers where most participants don’t have a clue of what they are talking about.
I doubt that maintainers are “only caring about their job security and keeping back code” but with all due respect: You’re also just taking arguments out of thin air right now. What I do believe is what we have seen: Pretty toxic responses from some people and a whole lot of issues trying to move forward.
Seeing how many people are confidently incorrect about Linux maintainers only caring about their job security and keeping code bad to make it a barrier to entry
Huh, I’m not seeing any claim to this end from the GP, or did I not look hard enough? At face value, saying that something has an “absurdly low code quality” does not imply anything about nefarious motives.
Still, in GP’s case the Chinese whispers have reduced “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” to “absurdly low quality”. To which I ask, what is more likely? 1) That 30 million lines of code contain various levels of technical debt of which maintainers are aware, and that said maintainers are worried even about code where the technical debt is real but not causing substantial issues in practice? Or 2) that a piece of software gets to run on literally billions of devices of all sizes and prices just because it’s free and in spite of its “absurdly low quality”?
Linux is not perfect, neither technically nor socially. But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face.
GP here: I probably should have said “shockingly” rather than “absurdly”. I didn’t really expect to get lawyered over that one word, but yeah, the idea was that for a software that runs on billions of devices, the code quality is shockingly low.
Of course, this is plainly subjective. If your code quality standards are a lot lower than mine then you might disagree with my assessment.
That said, I suspect adoption is a poor proxy for code quality. Internet Explorer was widely adopted and yet it’s broadly understood to have been poorly written.
But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face
I’m sure self-righteousness could get you to the same place, but in my case I arrived by way of experience. You can relax, I wasn’t attacking Linux—I like Linux—it just has a lot of opportunity for improvement.
I guess I’ve seen the internals of too much proprietary software now to be shocked by anything about Linux per se. I might even argue that the quality of Linux is surprisingly good, considering its origins and development model.
I think I’d lawyer you a tiny bit differently: some of the bugs in the kernel shock me when I consider how many devices run that code and fulfill their purposes despite those bugs.
FWIW, I was not making a dig at open source software, and yes plenty of corporate software is worse. I guess my expectations for Linux are higher because of how often it is touted as exemplary in some form or another. I don’t even dislike Linux, I think it’s the best thing out there for a huge swath of use cases—I just see some pretty big opportunities for improvement.
But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face.
Or actual benchmarks: the performance the Linux kernel leaves on the table in some cases is absurd. And sure it’s just one example, but I wouldn’t be surprised if it was representative of a good portion of the kernel.
Well not quite but still “considered broken beyond repair by many people related to life time management” - which is definitely worse than “hard to formalize” when “the way ever[y]body does it” seems to vary between each user.
I love Rust but still, we’re talking of a language which (for good reasons!) considers doubly linked lists unsafe. Take an API that gets a 4 on Rusty Russell’s API design scale (“Follow common convention and you’ll get it right”), but which was designed for a completely different programming language if not paradigm, and it’s not surprising that it can’t easily be transformed into a 9 (“The compiler/linker won’t let you get it wrong”). But at the same time there are a dozen ways in which, according to the same scale, things could actually be worse!
What I dislike is that people are seeing “awareness of complexity” and the message they spread is “absurdly low quality”.
Note that doubly linked lists are not a special case at all in Rust. All the other common data structures like Vec, HashMap etc. also need unsafe code in their implementation.
Implementing these datastructures in Rust, and writing unsafe code in general, is indeed roughly a 4. But these are all already implemented in the standard library, with an API that actually is at a 9. And std::collections::LinkedList is constructive proof that you can have a safe Rust abstraction for doubly linked lists.
Yes, the implementation could have bugs, thus making the abstraction leaky. But that’s the case for literally everything, down to the hardware that your code runs on.
You’re absolutely right that you can build abstractions with enough effort.
My point is that if a doubly linked list is (again, for good reasons) hard to make into a 9, a 20-year-old API may very well be even harder. In fact, std::collections::LinkedList is safe but still not great (for example the cursor API is still unstable); and being in std, it was designed/reviewed by some of the most knowledgeable Rust developers, sort of by definition. That’s the conundrum that maintainers face and, if they realize that, it’s a good thing. I would be scared if maintainers handwaved that away.
Yes, the implementation could have bugs, thus making the abstraction leaky.
Bugs happen, but if the abstraction is downright wrong then that’s something I wouldn’t underestimate. A lot of the appeal of Rust in Linux lies exactly in documenting/formalizing these unwritten rules, and wrong documentation can be worse than no documentation (cue the negative parts of the API design scale!); even more so if your documentation is a formal model like a set of Rust types and functions.
That said, the same thing can happen in a Rust-first kernel, which will also have a lot of unsafe code. And it would be much harder to fix it in a Rust-first kernel than in Linux at a time when it’s just testing the waters.
In fact, std::collections::LinkedList is safe but still not great (for example the cursor API is still unstable); and being in std, it was designed/reviewed by some of the most knowledgeable Rust developers, sort of by definition.
At the same time, it was included almost as like, half a joke, and nobody uses it, so there’s not a lot of pressure to actually finish off the cursor API.
It’s also not the kind of linked list the kernel would use, as they’d want an intrusive one.
And yet, safe to use doubly linked lists written in Rust exist. That the implementation needs unsafe is not a real problem. That’s how we should look at wrapping C code in safe Rust abstractions.
The whole comment you replied to, after the one sentence about linked lists, is about abstractions. And abstractions are rarely going to be easy, and sometimes could be hardly possible.
That’s just a fact. Confusing this fact for something as hyperbolic as “absurdly low quality” is a stunning example of the Dunning-Kruger effect, and frankly insulting as well.
I personally would call Linux low quality because many parts of it are buggy as sin. My GPU stops working properly literally every other time I upgrade Linux.
No one is saying that Linux is low quality because it’s hard or impossible to abstract some subsystems in Rust, they’re saying it’s low quality because a lot of it barely works! I would say that your “Chinese whispers” misrepresents the situation and what people here are actually saying. “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” doesn’t apply if no one can tell you how to use an API, and everyone does it differently.
Actually, the NT kernel of all things seems to have a pretty good reputation, and I wouldn’t dismiss the BSD kernels out of hand. I don’t know which kernel is better, but it seems you do. If you could explain how you came to this conclusion that would be most helpful.
*nod* I haven’t been a Windows person since shortly after the release of Windows XP (i.e. the first online activation DRM’d Windows) but, whenever I see glimpses of what’s going on inside the NT kernel in places like Project Zero: The Definitive Guide on Win32 to NT Path Conversion, it really makes me want to know more.
Random sidenote: I wish there were standard shortcuts or aliases for frequently typed commands. It’s annoying to type systemctl daemon-reload after editing a unit, e.g. why not systemctl dr? Or debugging a failed unit, journalctl -xeu myunit seems unnecessarily arcane, why not --debug or friendlier?
Well, I guess fu rolls off the tongue better than uf. But I remember literally looking up if there isn’t anything like -f and having issues with that. Oh well.
I’m not sure it would be “clever”. At best it would make transactional changes (i.e. changes that span several files) hard, at worst impossible. It would also be a weird editing experience when just saving activates the changes.
I wonder why changes should need to be transactional? In Kubernetes we edit resource specs—which are very similar to systemd units—individually. Eventual consistency obviates transactions. I think the same could have held for systemd, right?
I wonder why changes should need to be transactional
Because the services sd manages are more stateful. If sd restarted every service the moment its on-disk base unit file changed [1], desktop users, database admins, etc. would have a terrible experience.
I thoroughly enjoyed “Recoding America” which IIRC was written by someone very involved with Code For America (a founder, perhaps?–it’s been a while). Even still, while I sort of understand what Civic Tech is inside of government (for example, healthcare.gov or some other government form), I don’t really have a good idea about what Civic Tech looks like outside of government? What are some good examples of things non-government agencies have built? Do they require partnering with a government entity, or are there things that people can do for their fellow citizens independent of a local government? How can I find out if there is anything happening in my state or municipality?
Some of the classic “first wave” examples would be sites like everyblock (now defunct), govtrack.us or my own project openstates.org. These projects aim to help citizens better understand what’s happening at different levels of government (local, federal, and state respectively) and, in theory, give them more of a voice. These projects do not require partnership with governments (in fact, at the federal and state level it is very hard to “partner” in any meaningful way if you’re a small/independent team) but benefit from open data if it is available. (But they do not require it; web scraping is a big part of the job in a lot of cases, since improving on a government interface often means dealing with a crufty HTML site a vendor built in 2004.)
Alaveteli is a codebase that enables you to run your own Freedom of Information website in any country.
FixMyStreet is a codebase which in its basic form will let you run a site that helps citizens report street issues to the authorities responsible for fixing them. It has also been successfully adapted many times for other projects which a) collect reports or posts with defined geographical points, b) optionally also collect a category, description and photos, and c) forward these to an email address which is defined by the user inputs.
EveryPolitician is a repository of open, structured data which aims to cover every politician in every country in the world. It’s free to use and should be useful to anyone running a website or app which helps citizens monitor, understand or contact their elected representatives.
These represent three common kinds of civic applications: those that provide a better interface to a government service (make it easier to file FOIAs), those that crowdsource information within communities, and those that bring together information from different governments or levels of government into a single place to make it easier for the citizen to digest. (Different units of governance often themselves have little incentive or ability to collaborate directly.)
I wish I could point you towards an easy way to find stuff in your state or municipality; that is a part of what we lost when we lost Sunlight & the brigades.
The interesting thing to me about the argument made in the article, is that one could achieve pretty much the same result by cutting and pasting Javascript-style exception handling in the same way that one cuts and pastes if err != nil { return err } in Go.
And I’m not defending Javascript-style exception handling here.
Nor am I suggesting that Go error handling is as bad as Javascript-style exception handling.
I’m just pointing out that it’s a weak argument that boils down to “I like the look of cut&paste if better than I like the look of cut&paste try..catch”.
I personally like languages that support handling most errors as values, and yet still have support for exceptions for dealing with things that most callers are not expecting to deal with (i.e. where an actual longjmp makes sense). To that end, Go got it half right by supporting multiple return values, which is a critical ingredient for supporting errors as values.
Go also has panic() for things that most callers are not expecting to deal with.
Yes, I’ve seen panic/recover. The recover side seems a bit awkward, but exception handling has always seemed at least a little awkward. I’m pretty sure that the use of panic/recover is generally frowned on in Go, and I’ve read Rob Pike’s comments on the topic, and I agree with his fundamental position (you shouldn’t need more than a handful of “recovers” in an entire system – at least for most systems).
It’s frowned upon for error handling, but not for cases like you described—events that are truly exceptional and indicate a program error. The reflect standard library panics all over the place.
Nice! I don’t know what it is but there is something really satisfying about hosting your website at home. You can have some fun as well, like getting an LED to blink on every hit to the site.
I do want to do something hardware related because right now I’m under utilising the pi’s hardware abilities, but I feel like I’d have trouble distinguishing real traffic from bot traffic.
I have an interactive pixel grid that syncs to an ePaper in my home on my website: https://www.svenknebel.de/posts/2023/12/2/ (picture; the grid itself is at the top of the homepage feed)
Very intentionally very low-res so I don’t have to worry about people writing/drawing bad stuff, and it’s an entirely separate small program, so if someone ever manages to crash it only that part is gone.
there is a neat little project at https://lights.climagic.com/ where you can switch the lights on and off remotely…
I just moved my blog off of EC2 to my Raspberry Pi Kubernetes cluster at home just today. The whole idea behind running it on EC2 was that I figured I would have fewer reliability issues than on my homelab Kubernetes cluster, but the Kubernetes cluster has been remarkably stable (especially for stateless apps) and my EC2 setup was remarkably flaky[^1]. It’s definitely rewarding to run my own services, and it saves me a bunch of time/money to boot.
[^1]: not because of EC2, but because I would misconfigure Linux things, or not properly put my certificates in an EBS volume, or not set the spot instance termination policy properly, or any of a dozen other things–my k8s cluster runs behind cloudflare which takes care of the https stuff for me
TL;DR: Because the only alternative I’m aware of is exceptions.
I don’t think I would write “Go’s error handling is awesome”, but it’s probably the least bad thing I’ve used. The main alternatives to exceptions that I’ve used have been C’s return values which are typically pretty terrible and Rust’s which are better in principle (less boilerplate at call sites, no unhandled-error bugs, and similar non-problems), but somewhat worse than Go’s in practice:
My biggest grievance with Rust error handling is the choice between (1) the sheer burden of managing your own error trait implementations, (2) the unidiomatic-ness of using anyhow in libraries, and (3) the absurd amount of time I spend debugging macro expansion errors from crates like thiserror. Maybe there’s some (4) option where you return some dynamic boxed error and pray callers never need to introspect it?
And I don’t think either Rust or Go has a particularly compelling solution for attaching stack traces automatically (which is occasionally helpful) or for displaying additional error context beyond the stack trace (i.e., all of the parameters that were passed down, in a structured format). Maybe Rust has something here; I’m somewhat less familiar.
I’m also vaguely aware that Zig and Swift do errors-as-values, but I haven’t used them either. Maybe the people who have found something better than Go could enlighten us?
We use a custom error type in Go that lets you add fields, like loggers do, and just stores the values in a map[string]interface{}. This doesn’t solve the problem of capturing all parameter values, or stack traces, but it’s pretty good in practice. This is particularly true since errors tend to eventually end up in log messages, so you can do something like log.WithFields(err.Fields).Error("something terrible happened"). If we could nest those fields based on the call stack I would probably never complain again.
That’s pretty awesome. I’ll play around with this. Thanks for sharing.
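A rough sketch of the kind of field-carrying error type described above; the type and helper names here are made up, and the log.WithFields call in the comment assumes a logrus-style logger, so the stub below just uses the standard log package:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// FieldError carries a message, an optional wrapped cause, and structured fields.
type FieldError struct {
	Msg    string
	Fields map[string]interface{}
	Err    error
}

func (e *FieldError) Error() string {
	if e.Err != nil {
		return fmt.Sprintf("%s: %v", e.Msg, e.Err)
	}
	return e.Msg
}

func (e *FieldError) Unwrap() error { return e.Err }

// wrap annotates err with a message and fields for later logging.
func wrap(err error, msg string, fields map[string]interface{}) error {
	return &FieldError{Msg: msg, Fields: fields, Err: err}
}

func fetchUser(id string) error {
	err := errors.New("connection refused")
	return wrap(err, "fetching user", map[string]interface{}{"user_id": id})
}

func main() {
	err := fetchUser("42")
	var fe *FieldError
	if errors.As(err, &fe) {
		// A structured logger would take fe.Fields directly, e.g.
		// log.WithFields(fe.Fields).Error("something terrible happened").
		log.Printf("something terrible happened: %v (fields: %v)", fe, fe.Fields)
	}
}
```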
Best solution I’ve used is polymorphic variants from OCaml. They force you to at least acknowledge the error exists, and you can easily compose different errors together.
Exceptions are better on all counts though, so even that is a faulty conclusion in my opinion.
Exceptions do the correct thing by default, auto-unwrap in the success case, auto-bubble up on the error case with as fine-grained scoping as necessary (try-catch blocks). Go’s error handling is just a tad bit higher than the terrible mess that is C’s errno.
Edit: Re cryptic stack traces: most exception-based languages can trivially chain exceptions as well, so you get both a clear chain of cause and a stack trace. I swear, if Java would simply default to printing out the message of each exception in the chain at the top in an easier-to-read manner and only print the stack trace after, people would like it better.
I was hoping this article would compare if err != nil to more modern approaches (Rust’s?) and not just Java-style exceptions, but unfortunately it doesn’t. I’d be more interested to read an article that weighs the approaches against each other.
One point the article misses is how value-based error handling works really nicely when you don’t have constructors (either in your language, or at least your codebase, in case you’re using C++.)
I’ve been pretty disappointed by Rust’s approach to error handling. It improves upon two “problems” in Go which IMHO are not actually problems in practice: if err != nil boilerplate and unhandled return values. But making good error types is fairly hard: either you manually maintain a bunch of implementations of the Error trait (which is a truly crushing amount of boilerplate), or you use something like anyhow to punt on errors (which is generally considered to be poor practice for library code), or you use some crate that generates the boilerplate for you via macros. The latter seems idyllic, but in practice I spend about as much time debugging macro errors as I would spend just maintaining the implementations manually.
In Go, the error implementation is just a single method called Error() that returns a string. Annotating that error, whether in a library or a main package, is just fmt.Errorf("calling the endpoint: %w", err). I don’t think either of them does a particularly good job of automating stack trace stuff, and I’m not sure about Rust, but at least Go does not have a particularly good solution for getting more context out of the error beyond the error message, specifically parameter values (if I’m passing a bunch of identifiers, filepaths, etc. down the call stack that would be relevant for debugging, you have to pack them into the error message, and they often show up several times in the error message or not at all, because few people have a good system for attaching that metadata exactly once).
A lot of people have a smug sense of superiority about their language’s approach to error handling, which (beyond the silliness of basing one’s self-esteem on some programming language feature) always strikes me as silly because even the best programming languages are not particularly good at it, or at least not as good as I imagine it ought to be.
you usually just need to impl Display, which I wouldn’t call a “crushing” amount of boilerplate
thiserror is pretty good, although tbqh just having an enum Error and implementing Display for it is good enough. I’ve done some heavy lifting with error handling before, but that’s usually to deal with larger issues, like making sure errors are Clone + Serialize + Deserialize and can keep stack traces across FFI boundaries.
It’s pretty rarely “just” impl Display though, right? If you want automatic conversions from some upstream types you need to implement From, for example. You could not do it, but then you’re shifting the boilerplate to every call site. Depending on other factors, you likely also need Debug and Error. There are likely others as well that I’m not thinking about.
#[derive(Debug)] and impl Display make the impl of Error trivial (impl Error for E {}). If you’re wrapping errors then you probably want to implement source(). thiserror is a nice crate for doing everything with macros, and it’s not too heavy, so the debugging potential is pretty low.
One advantage of map_err(...) everywhere instead of implementing From is that it gives you access to the file!() and line!() macros, so you can get stack traces out of your normal error handling.
I’ve used thiserror and a few other crates, and I still spend a lot more time than I’d like debugging macro expansions, to the point where I waffle between using it and maintaining the trait implementations by hand. I’m not sure which of the two is less work on balance, but I know that I spend wayyy more time trying to make good error types in Rust than I do with Go (and I’d like to reiterate that I think there’s plenty of room for improvement on Go’s side).
Maybe I should try this more. I guess I wish there was clear, agreed-upon guidance for how to do error handling in Rust. It seems like lots of people have subtly different ideas about how to do it: you mentioned just implementing Display, while others encourage thiserror, and someone else in this thread suggested Box<dyn Error> while others suggest anyhow.
The rule of thumb I’ve seen is anyhow for applications, and thiserror or your own custom error type for libraries (your own type if thiserror doesn’t fit your needs, for example needing clone-able or serializable errors, stack traces, etc.). Most libraries I’ve seen either use thiserror if they’re wrapping a bunch of other errors, or just have their own error type, which is usually not too complex.
Surprisingly, you don’t see people mention Common Lisp’s condition system in these debates.
That’s too bad, I genuinely enjoy learning about new (to me) ways of solving these problems, I just dislike the derisive fervor with which these conversations take place.
You discount anyhow as punting on errors, but Go’s Error() with a string is the same strategy.
If you want that, you don’t even need anyhow. Rust’s stdlib has Box<dyn Error>. It supports From<String>, so you can use .map_err(|err| format!("calling the endpoint: {err}")). There’s downcast() and .source() for chaining errors and getting errors with data, if there’s more than a string (but anyhow does that better with .context()).
Ah, I didn’t know about downcast(). Thanks for the correction.
One source of differences in different languages’ error handling complexity is whether you think errors are just generic failures with some human-readable context for logging/debugging (Go makes this easy), or you think errors have meaning that should be distinguishable in code and handled by code (Rust assumes this). The latter is inherently more complicated, because it’s doing more. You can do it either way in either language, of course, it’s just a question of what seems more idiomatic.
I don’t think I agree. It’s perfectly idiomatic in Go to define your own error types and then to handle them distinctly in code up the stack. The main difference is that Rust typically uses enums (closed set) rather than Go’s canonical error interface (open set). I kind of think an open set is more appropriate because it gives upstream functions more flexibility to add error cases in the future without breaking the API, and of course Rust users can elect into open set semantics–they just have to do it a little more thoughtfully. The default in Go seems a little more safe in this regard, and Go users can opt into closed set semantics when appropriate (although I’m genuinely not sure off the top of my head when you need closed set semantics for errors?). I’m sure there are other considerations I’m not thinking of as well–it’s interesting stuff to think about!
Maybe “idiomatic” isn’t quite the right word and I just mean “more common”. As I say, you can do it both ways in both languages. But I see a lot of Go code that propagates errors by just adding a string to the trace, rather than translating them into a locally meaningful error type (e.g., wrapping with an annotated message string, so the caller can’t distinguish the errors without reading the strings, as opposed to wrapping them in a locally defined type).
AFAIK the %w feature was specifically designed to let you add strings to a human-readable trace without having to distinguish errors.
Whereas I see a lot of Rust code defining a local error type and an impl From to wrap errors in local types. (Whether that’s done manually or via a macro.) Maybe it’s just what code I’m looking at. And of course, one could claim people would prefer the first way in Rust, if it had a stdlib way to make a tree of untyped error strings.
Right, we usually add a string when we’re just passing it up the call stack, so we can attach contextual information to the error message as necessary (I don’t know why you would benefit from a distinct error type in this case?). We create a dedicated error type when there’s something interesting that a caller might want to switch on (e.g., resource not found versus resource exists).
It returns a type that wraps some other error, but you can still check the underlying error type with errors.Is() and errors.As(). So I might have an API that returns *FooNotFoundErr, and its caller might wrap it in fmt.Errorf("fetching foo: %w", err), and the toplevel caller might do if errors.As(err, &fooNotFoundErr) { return http.StatusNotFound }.
I think this is just the open-vs-closed set thing? I’m curious where we disagree: in Go, fallible functions return an error, which is an open set of error types, sort of like Box<dyn Error>, and so we don’t need a distinct type for each function that represents the unique set of errors it could return. And since we’re not creating a distinct error type for each fallible function, we may still want to annotate it as we pass it up the call stack, so we have fmt.Errorf(), much like Rust has anyhow! (but we can use fmt.Errorf() inside libraries as well as applications, precisely because concrete error types aren’t part of the API). If you have to make an error type for each function’s return, then you don’t need fmt.Errorf() because you just add the annotation on your custom type, but when you don’t need to create custom types, you realize that you still want to annotate your errors.
This is true; usually you create a specific error type on the fly when you understand that the caller needs to distinguish it.
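For concreteness, here is a compilable sketch of the wrap-and-inspect flow described a couple of comments up; FooNotFoundErr and the function names are that comment’s own hypothetical examples, not an actual API:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// FooNotFoundErr is the hypothetical typed error from the comment above.
type FooNotFoundErr struct{ ID string }

func (e *FooNotFoundErr) Error() string { return fmt.Sprintf("foo %q not found", e.ID) }

// getFoo stands in for the API that returns *FooNotFoundErr.
func getFoo(id string) error {
	return &FooNotFoundErr{ID: id} // pretend the lookup failed
}

// fetchFoo annotates the error on the way up without hiding its type.
func fetchFoo(id string) error {
	if err := getFoo(id); err != nil {
		return fmt.Errorf("fetching foo: %w", err)
	}
	return nil
}

// statusFor is the "toplevel caller" switching on the dynamic error type.
func statusFor(err error) int {
	var nf *FooNotFoundErr
	if errors.As(err, &nf) { // matches even through the fmt.Errorf wrapper
		return http.StatusNotFound
	}
	return http.StatusInternalServerError
}

func main() {
	fmt.Println(statusFor(fetchFoo("42"))) // 404
}
```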
I tend to agree that Rust’s error handling is both better and worse. In day-to-day use I can typically get away with anyhow or dyn Error, but it’s honestly a mess, and one that I really dread when it starts barking at me.
On the other hand… I think being able to chain ‘?’ blocks is a godsend for legibility, and I think Result is far superior to err.
I certainly bias towards Rust’s overall, but it’s got real issues.
There is one thing to be said against ?: it does not encourage the addition of contextual information, which can make diagnosing an error more difficult when e.g. it gets expect-ed (or logged out) half a dozen frames above with no indication of the path it took. However, that is hardly unsolvable. You could have e.g. ?("text") which wraps with text and returns, and ?(unwrapped) which returns directly (the keyword being there to encourage wrapping; one could even imagine extending this to more keywords, e.g. ?(panic) would be your unwrap).
In a chain I’ll just map_err, which as soon as the chain is multiline looks and works well. Inline it’s not excellent, ha.
Oh yeah I’m not saying it’s not possible to decorate things (it very much is), just pointing out that the incentives are not necessarily in that direction.
If I was a big applications writer / user of type-erased errors, I’d probably add a wrapping method or two to Result (if I was to use a “raw” boxed error, as IIRC anyhow has something like that already).
I’ve often wondered if people would like Java exceptions more if the language only supported checked exceptions. You still have the issue of exceptions being a parallel execution flow / go-to, but you lose the issue of random exceptions crashing programs. In my opinion it would make the language easier to write, because the compiler would force you to think about all the ways your program could fail at each level of abstraction. Programs would be more verbose, but maybe it would force us to think more about exception classes.
TL;DR: Java would be fine if we removed RuntimeException?
You’d need to make checked exceptions not horrendous to use to start with, e.g. genericity, exception transparency, etc…
It would also necessarily yield a completely different language, consider what would happen if NPEs were checked.
Basically, Kotlin, so yeah, totally agree with you.
No, Go has unchecked exceptions. They’re called “panics”.
What makes Go better than Java is that you return the error interface instead of a concrete error type, which means you can add a new error to an existing method without breaking all your callers and forcing them to update their own throws declarations.
The creator of C# explains the issue well here: https://www.artima.com/articles/the-trouble-with-checked-exceptions#part2
You can just throw Exception (or even a generic) in Java just fine, though if all you want is an “error interface”.
Java’s problem with checked exceptions is simply that checked exceptions would probably require effect types to be ergonomic.
Looks like it’s been updated:
I’m also personally quite fond of error handling in Swift.
Rust, Zig, and Swift all have interesting value-oriented results. Swift more so since it added, well, Result and the ability to convert errors to that.
Zig’s not really value oriented. It’s more like statically typed error codes.
No matter how many times Go people try to gaslight me, I will not accept this approach to error-handling as anything approaching good. Here’s why:
Why must you rely on a linter or IDE to catch this mistake? Because the compiler doesn’t care if you do this.
If you care about correctness, you should want a compiler that considers handling errors part of its purview. This approach is no better than a dynamic language.
The fact that the compiler doesn’t catch it when you ignore an error return has definitely bitten me before.
doTheThing() on its own looks like a perfectly innocent line of code, and the compiler won’t even warn on it, but it might be swallowing an error.
I learned that the compiler doesn’t treat unused function results as errors while debugging a bug in production; an operation which failed was treated as if it had succeeded and therefore wasn’t retried as it should have been. I had been programming in Go for many years at that point, but it had never occurred to me that silently swallowing an error in Go could possibly be as easy as just calling a function in the normal way. I had always done _ = doTheThing() if I needed to ignore an error, out of the assumption that of course an unused error return is a compile error.
Does anyone know the reason why the Go compiler allows ignored errors?
Because errors aren’t special to the Go compiler, and Go doesn’t yell at you if you ignore any return value. It’s probably not the most ideal design decision, but in practice it’s not really a problem. Most functions return something that you have to handle, so when you see a naked function call it stands out like a sore thumb. I obviously don’t have empirical evidence, but in my decade and a half of using Go collaboratively, this has never been a real pain point whether with junior developers or otherwise. It seems like it mostly chafes people who already had strong negative feelings toward Go.
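To make the pitfall concrete, here is a small compilable sketch (doTheThing is a made-up function): the naked call compiles without any complaint and silently drops the error, while the other two forms are explicit.

```go
package main

import (
	"errors"
	"log"
)

// doTheThing is a stand-in for any operation that can fail.
func doTheThing() error {
	return errors.New("the thing failed")
}

func main() {
	doTheThing() // compiles fine; the returned error is silently swallowed

	_ = doTheThing() // explicit, intentional discard

	if err := doTheThing(); err != nil { // the handled version
		log.Println("retrying:", err)
	}
}
```

Linters like errcheck flag the first form, which is how CI linting typically catches it even though the compiler does not.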
It’s similar to array bounds checks in c - not really a problem.
I hope this is sarcasm.
Yes
Is there a serious argument behind the sarcasm as to how this is comparable to array bounds checks? Do you have any data about the vulnerabilities that have arisen in Go due to unhandled errors?
Because the programmer made an intentional decision to ignore the error. It won’t let you call a function that returns an error without assigning it to something; that would be a compile time error. If the programmer decides to ignore it, that’s on the programmer (and so beware 3rd party code).
Now perhaps it might be a good idea for the compiler to insert code when assigned to _ that panics if the result is non-nil. Doesn’t really help at runtime, but at least it would fail loudly so they could be found.
I’ve spent my own share of time tracking down bugs because something appeared to be working but the error/exception was swallowed somewhere without a trace.
This is incorrect: https://go.dev/play/p/k7ErZU5QYCu
huh… til. I always assumed you needed to use the result, probably because of single vs multiple returns needing both being a compile time error. Thanks.
To be fair I was just as certain as you that of course Go requires using the return values, until I had to debug this production bug. No worries.
is not an intentional decision to ignore the error. Neither is
Yet the Go compiler will never flag the first one, and may not flag the second one, depending on err being used elsewhere in the scope (e.g. in the unheard-of case where you have two different possibly erroring calls in the same scope and you check the other one).
Yeah, I always thought the _ was required, I learned something today!
I do have a few places with err and err2, and it does kind of suck; I should probably break up those functions.
_, err := f.Write(s) is a compiler error if err already exists (no new variables on left side of :=), and if err doesn’t already exist and you aren’t handling it, you get a different error (declared and not used: err). I think you would have to assign a new variable t, err := f.Write(s) and then take care to handle t in order to silently ignore the err, but yeah, with some work you can get Go to silently swallow it in the variable declaration case.
Because they couldn’t be arsed to add this in v0, and they can’t be arsed to work on it for cmd/vet, and there are third-party linters which do it, so it’s all good. Hopefully you don’t suffer from unknown unknowns and you know you should use one of these linters before you get bit, and they don’t get abandoned.
(TBF you need both that and errcheck, because the unused store one can’t catch ignoring return values entirely).
Considering how much effort the Go team puts into basically everything, that kind of language makes it very hard to take you seriously.
Yes, except for things that they decide not to be arsed about. I can confirm this as a very real experience of dealing with Go.
Which is fair enough.
Sure, but then it is equally fair to criticize them for it.
The go compiler doesn’t do warnings, only errors. Linters do warnings, and do warn about unchecked errors.
I don’t really care. Generally speaking, I would expect compilers to either warn or error on an implicitly swallowed error. The Go team could fix this issue by either adding warnings for this case specifically (going back on their decision to avoid warnings), or by making it a compile error, I don’t care which.
This is slightly more nuanced. The Go project ships both go build and go vet. go vet is isomorphic to how Rust handles warnings (warnings apply to you, not your dependencies). So there would be nothing wrong per se if this was caught by go vet and not go build. What is the issue, though, is that this isn’t caught by first-party go vet, and requires third-party errcheck.
Meh, plenty of code bases don’t regularly run go vet. This is a critical enough issue that it should be made apparent as part of any normal build, either as a warning or an error.
And that’s perfectly fine given that Go is pleasurable even for quick and dirty prototypes, fun side projects, and so on.
I agree with you that it’s better for this to be a compiler error, but (1) I’ll never understand why this is such a big deal–I’m sure it’s caused bugs, but I don’t think I’ve ever seen one in the dozen or so years of using Go and (2) I don’t think many dynamic languages have tooling that could catch unhandled errors so I don’t really understand the “no better than a dynamic language” claim. I also suspect that the people who say good things about Go’s error handling are making a comparison to exceptions in other languages rather than to Rust’s approach to errors-as-values (which has its own flaws–no one has devised a satisfactory error handling system as far as I’m aware).
The fact that these bugs seem so rare and that the mitigation seems so trivial makes me feel like this is (yet another) big nothingburger.
The most common response to my critique of Go’s error-handling is always some variation on “this never happens”, which I also do not accept because I have seen this happen. In production. So good for you, if you have not; but I know from practice this is an issue of concern.
Relying on the programmer to comprehensively test inputs imperatively in a million little checks at runtime is how dynamic languages handle errors. This is how Go approached error-handling, with the added indignity of unnecessary verbosity. At least in Ruby you can write single-line guard clauses.
I don’t really follow your dismissal of Rust since you didn’t actually make an argument, but personally I consider Rust’s Option type the gold standard of error-handling so far. The type system forces you to deal with the possibility of failure in order to access the inner value. This is objectively better at preventing “trivial” errors than what Go provides.
I’m sure it has happened before, even in production. I think most places run linters in CI which default to checking errors, and I suspect if someone wasn’t doing this and experienced a bug in production, they would just turn on the linter and move on with life. Something so exceedingly rare and so easily mitigated does not meet my threshold for “issue of concern”.
That’s how all languages handle runtime errors. You can’t handle them at compile time. But your original criticism was that Go is no better than a dynamic language with respect to detecting unhandled errors, which seems untrue to me because I’m not aware of any dynamic languages with these kinds of linters. Even if such a linter exists for some dynamic language, I’m skeptical that they’re so widely used that it merits elevating the entire category of dynamic languages.
I didn’t dismiss Rust; I was suggesting that you may have mistaken the article as some sort of criticism of Rust’s error handling. But I will happily register complaints with Rust’s error handling as well: while it does force you to check errors and is strictly better than Go in that regard, this is mostly a theoretical victory insofar as these sorts of bugs are exceedingly rare in Go even without strict enforcement, and Rust makes you choose between the verbosity of managing your own error types, debugging macro expansion errors from crates like thiserror, or punting altogether and doing the bare minimum to provide recoverable error information. I have plenty of criticism for Go’s approach to error handling, but pushing everything into an error interface and switching on the dynamic type gets the job done.
For my money, Rust has the better theoretical approach and Go has the better practical approach, and I think both of them could be significantly improved. They’re both the best I’m aware of, and yet it’s so easy for me to imagine something better (automatic stack trace annotations, capturing and formatting relevant context variables, etc.). Neither of them seems so much better in relative or absolute terms that their proponents should express superiority or derision.
I don’t accept your unsubstantiated assertion that this is rare, so it seems we are at an impasse.
Fair enough. It’s a pity things like this are so difficult to answer empirically, and we must rely on our experiences. I am very curious how many orgs are bitten by this and how frequently.
Couldn’t agree more (honourable mention to Zig, though).
Enabling a linter is different from doing “a million little checks at runtime”. This behaviour is not standard because you can use Go for many reasons other than writing production-grade services, and you don’t want to clutter your terminal with unchecked error warnings.
I admit that it would be better if this behaviour were part of go vet rather than an external linter.
The strange behaviour here is not “Go people are trying to gaslight me”, but people like you coming and complaining about Go’s error handling when you have no interest in the language at all.
You can’t lint your way out of this problem. The Go type system is simply not good enough to encapsulate your program’s invariants, so even if your inputs pass a type check you still must write lots of imperative checks to ensure correctness.
Needing to do this ad-hoc is strictly less safe than relying on the type system to check this for you.
err checks are simply one example of this much larger weakness in the language.
I have to work with it professionally, so I absolutely do have an interest in this. And I wouldn’t feel the need to develop this critique of it publicly if there weren’t a constant drip feed of stories telling me how awesome this obviously poor feature is.
Your views about how bad Go’s type system is are obviously not supported by the facts, otherwise Go programs would be full of bugs (or full of minuscule imperative checks) with respect to your_favourite_language.
I understand your point about being forced to use a tool in your $job that you don’t like, that happened to me with Java, my best advice to you is to just change $job instead of complaining under unrelated discussions.
They are full of bugs, and they are full of minuscule imperative checks. The verbosity of all the if err != nil checks is one of the first things people notice. Invoking “the facts” without bringing any isn’t meaningfully different from subjective opinion.
Your comments amount to “shut up and go away” and I refuse. To publish a blog post celebrating a language feature, and to surface it on a site of professionals, is to invite comment and critique. I am doing this, and I am being constructive by articulating specific downsides to this language decision and its impacts. This is relevant information that people use to evaluate languages and should be part of the conversation.
If if err != nil checks are the “minuscule imperative checks” you complain about, I have no problem with that. That you have “facts” about Go programs having worse technical quality (and bug count) than any other language I seriously doubt; at most you have anecdotes.
And the only anecdote you’ve been able to come up with so far is that you’ve found “production bugs” caused by unchecked errors that can be fixed by a linter. Being constructive would mean indicating how the language should change to address your perceived problem, not implying that the entire language should be thrown out the window. If that’s how you feel, just avoid commenting on random Go posts.
Yeah, I have seen it happen maybe twice in eight years of using Go professionally, but I have seen it complained about in online comment sections countless times. :-)
If I were making a new language today, I wouldn’t copy Go’s error handling. It would probably look more like Zig. But I also don’t find it to be a source of bugs in practice.
Everyone who has mastered a language builds up muscle memory of how to avoid the Bad Parts. Every language has them. This is not dispositive to the question of whether a particular design is good or not.
The happy people are just happily working on solving their real problems, not wasting time complaining.
Not seeing a problem as a bug in production doesn’t tell you much. It usually just means that the developers spent more time writing tests or doing manual testing, and this is just not visible to you. The better the compiler and type system, the fewer tests you need for the same quality.
Agreed, but I wasn’t talking about just production–I don’t recall seeing a bug like this in any environment, at any stage.
In a lot of cases I am the developer, or I’m working closely with junior developers, so it is visible to me.
Of course with Go we don’t need to write tests for unhandled errors any more than with Rust, we just use a linter. And even when static analysis isn’t an option, I disagree with the logic that writing tests is always slower. Not all static analysis is equal, and in many cases it’s not cheap from a developer velocity perspective. Checking for errors is very cheap from a developer velocity perspective, but pacifying the borrow checker is not. In many cases, you can write a test or two in the time it would take to satisfy rustc and in some cases I’ve even introduced bugs precisely because my attention was so focused on the borrow checker and not on the domain problem (these were bugs in a rewrite from an existing Go application which didn’t have the bugs to begin with despite not having the hindsight benefit that the Rust rewrite enjoyed). I’m not saying Rust is worse or static analysis is bad, but that the logic that more static analysis necessarily improves quality or velocity is overly simplistic, IMHO.
I just want to emphasize that it’s not the same thing, as you also hint at in the next sentence.
I didn’t say that writing tests is always slower or that using the compiler to catch these things is necessarily always better. I’m not a Rust developer, btw, and Rust’s error handling is absolutely not the current gold standard by my own judgement.
It kind of is the same thing: static analysis. The only difference is that the static analysis is broken out into two tools instead of one, so slightly more care needs to be taken to ensure the linter is run in CI or locally or wherever appropriate. To be clear, I think Rust is strictly better for having it in the compiler–I mostly just disagree with the implications in this thread that if the compiler isn’t doing the static analysis then the situation is no better than a dynamic language.
What did you mean when you said “It usually just means that the developers spent more writing tests or doing manual testing … The better the compiler and type-system, the fewer tests you need for the same quality.” if not an argument about more rigorous static analysis saving development time? Are we just disagreeing about “always”?
Ah I see - that is indeed an exaggeration that I don’t share.
First that, but in general it also has other disadvantages. For instance, writing tests or doing manual tests is often easy to do. Learning how to deal with a complex type system is not. Go was specifically created to get people to contribute fast.
Just one example that shows that it’s not so easy to decide which way is more productive.
Ah, I think we’re agreed then. “always” in particular was probably a poor choice of words on my part.
Swallowing errors is the very worst option there is. Even segfaulting is better, you know at least something is up in that case.
Dynamic languages usually just throw an exception, and those have way better behavior (you can’t forget; an empty catch is a deliberate sign to ignore an error, not an implicit one like with Go). At least some handler further up will log something, and more importantly the local block that experienced the error case won’t just continue executing as if nothing happened.
I disagree with this, only because it’s imperialism. I’m British: in British English I write marshalling (with two of the letter l), sanitising (-sing instead of -zing, except for words ending in a z), and -ise instead of -ize, among other things. You wouldn’t demand that an Arabic developer write all his comments in English for the sake of being idiomatic, would you?
I’ve worked for a few companies in Germany now, about half of them with their operating language being in German. All of them had comments being written exclusively in English. I don’t know how that is in other countries, but I get the impression from Europeans that this is pretty standard.
That said, my own preference is for American English for code (i.e. variable names, class names, etc), but British English for comments, commit messages, pull requests, etc. That’s because the names are part of the shared codebase and therefore standardised, but the comments and commit messages are specifically from me. As long as everyone can understand my British English, then I don’t think there’s much of a problem.
EDIT: That said, most of these suggestions feel more on the pedantic end of the spectrum as far as advice goes, and I would take some of this with a pinch of salt. In particular, when style suggestions like “I tend to write xyz” become “do this”, then I start to raise eyebrows at the usefulness of a particular style guide.
Developers in China seem to prefer Chinese to English. When ECharts was first open-sourced by Baidu most of the inline comments (and the entire README) were in Chinese:
In Japan I feel like the tech industry is associated with English, and corporate codebases seem to use mostly English in documentation. However, many people’s personal projects have all the comments/docs in Japanese.
If someone wants to force everyone to spell something the same within a language they should make sure it’s spelled wrong in all varieties, like with HTTP’s ‘referer’.
The Go core developers feel so strongly about their spelling that they’re willing to change the names of constants from other APIs.
The gRPC protocol contains a status code enum (https://grpc.io/docs/guides/status-codes/), one of which is CANCELLED. Every gRPC library uses that spelling except for go-grpc, which spells it Canceled.
Idiosyncratic positions and an absolute refusal to concede to common practice are part and parcel of working with certain kinds of people.
We’re drifting off-topic, but I have to ask: gRPC is a Google product; Go is a Google product; and Google is a US company. How did gRPC end up with CANCELLED in the first place?!
When you use a lot of staff on H-1B and E-3 visas, you get a lot of people who write in English rather than American!
Wait until you hear about the HTTP ‘Referer’ header. The HTTP folks have been refusing to conform to common practice for more than 30 years!
If this is something other than a private pet project of a person who has no ambition of ever working with people outside of his country? Yes, yes I would.
I believe the advice is still applicable to non-native speakers. In all companies I worked for in France, developers write code in English, including comments, sometimes even internal docs. There are a lot of inconsistencies (typically mixing US English and GB English, sometimes in the same sentence.)
In my experience (LatAm) the problem with that is people tend to have pretty poor English writing skills. You end up with badly written comments and commit messages, full of grammatical errors. People were aware of this so they avoided writing long texts in order to limit their mistakes, so we had one-line PR descriptions, very sparse commenting, no docs to speak of, etc.
Once I had the policy changed for the native language (Portuguese) in PRs and docs they were more comfortable with it and documentation quality improved.
In Europe people are much more likely to have a strong English proficiency even as a second or third language. You have to know your audience, basically.
While I like to write paragraphs of explanation in-between code, my actual comments are rather ungrammatical, with a bit of git style verb-first, removing all articles and other things. Proper English feels wrong in these contexts. Some examples from my currently opened file:
Those comments would most likely look weird to a person unfamiliar with your particular dialect.
In a small comment it’s fine to cut some corners, similar to titles in newspapers, but we can’t go overboard: the point of these things is to communicate, we don’t want to make it even more difficult for whoever is reading them. Proper grammar helps.
For clarification, this is not my dialect/way of speaking. But I see so many short interline comments like this, that I started thinking they feel more appropriate and make them too, now. Strange!
“If you use standard terms, spell them in a standard way” is not the same as “use only one language ever”.
Is “chapéu” or “hat” the standard way of spelling hat in Golang? If it’s “hat”, your standard is “only use American English ever”.
Is “hat” a standard term regularly used in the golang ecosystem for a specific thing and on the list given in the article? If not, it is not relevant to the point in the article.
(And even generalized: if it happens to be an important term for your code base or ecosystem, it probably makes sense to standardize on how to spell it. in whatever language and spelling you prefer. I’ve worked on mixed-language codebases, and it’d been helpful if people consistently used the German domain-specific terms instead of mixing them with various translation attempts. Especially if some participants don’t speak the language (well) and have to treat terms as partially opaque)
What? England had the word “hat” long before the USA existed.
I had to solve this once. I maintain a library that converts between HTML/CSS color formats, and one of the formats is a name (and optional spec to say which set of names to draw from). HTML4, CSS2, and CSS2.1 only had “gray”, but CSS3 added “grey” as another spelling for the same color value, and also added a bunch of other new color names which each have a “gray” and a “grey” variant.
Which raises the question: if I give the library a hex code for one of these and ask it to convert to name, which name should it convert to?
The solution I went with was to always return the “gray” variant since that was the “original” spelling in earlier HTML and CSS specs:
https://webcolors.readthedocs.io/en/latest/faq.html#why-does-webcolors-prefer-american-spellings
I thought you guys loved imperialism?
Imperialism is like kids, you like your own brand.
I don’t think it’s really “imperialism”—firstly, “marshaling” isn’t even the preferred spelling in the US. Secondly in countries all over the world job listings stipulate English language skills all the time (even Arabic candidates) and the practice is widely accepted because facilitating communication is generally considered to be important. Lastly, while empires certainly have pushed language standardization as a means to stamp out identities, I don’t think it follows that all language standards exist to stamp out identities (particularly when they are optional, as in the case of this post).
What makes you say that? (Cards on the table, my immediate thought was “Yes, it is.” I had no data for that, but the ngram below suggests that the single l spelling is the (currently) preferred US spelling.)
https://books.google.com/ngrams/graph?content=marshaling%2Cmarshalling&year_start=1800&year_end=2022&corpus=en-US&smoothing=3&case_insensitive=true
It’s imperialist to use social and technical pressure to “encourage” people to use American English so their own codebases are “idiomatic”.
I disagree. I don’t see how it is imperialism in any meaningful sense. Also “pressure” is fairly absurd here.
I used Tailscale the other day to solve a problem where I wanted to scrape a government website from inside GitHub Actions but I was being IP blocked by Cloudflare. My scraper in GitHub Actions now connects to my Tailscale network and uses my Apple TV as an exit node - works great! https://til.simonwillison.net/tailscale/tailscale-github-actions
I’ve done similar to bypass geographic restrictions on streaming services when traveling abroad. The streaming services block IP addresses from certain regions and they also typically block popular VPNs, but with Tailscale I set the exit node for my Raspberry Pi at home and I was good to go. The only issue is that if you want to watch content on a TV, you need something like an Apple TV that can connect the TV to Tailscale.
I read your article after I created my setup and found it similar to mine. Another cool use case!
Ooh now here’s a killer app idea: a simple, usable terminal command that you can use to sandbox a particular directory, a la chroot but actually usable for security. So you can run sandbox in a terminal and everything outside the local dir is unreachable forever to any process started within it, or you run sandbox cargo build like you’d run sudo, except it has the opposite effect. Always starts from the existing local state, so you don’t have to do any setup a la Docker.
Not an ideal solution, given that many cargo commands want to touch the network or the rest of the filesystem for things like downloading cached packages, but it’s a thought. Maybe you can have a TOFU-type setup where you run it and it goes through and asks “can this talk over the network to crates.io?” and “can this read ~/.cargo/registry/cache? Can this write ~/.cargo/registry/cache?”. Then, idk, it remembers the results for that directory and command?
I know all the tools are there to make something like this, no idea if it’s feasible in terms of UI though, or even whether it’d actually be useful for security. But it seems like something we should have.
Years ago, I did this with AppArmor to prevent applications from accessing the network. You can use it to restrict file access too.
If you give this one requirement up you can do all of this today pretty easily. It’s a lot harder to do this otherwise as unprivileged sandboxing is already annoying + x-plat sandboxing is extremely painful.
I started exploring/ experimenting with this via https://github.com/insanitybit/cargo-sandbox
But for a first pass you can just do docker run and use a mount for caching etc. if you want.
I actually started work on an idea for that years ago, but a mixture of “I wasn’t in a good mindstate and then 2019 made it worse” and fears of becoming more vulnerable, not less, if I was the face of a tool others were relying on for security (by going from being at risk of passive/undirected attacks to being at risk of active/directed attacks) caused it to go on de facto hiatus before I finished it.
You can see what got written here: https://github.com/ssokolow/nodo/ (defaults.toml illustrates the direction I was thinking in terms of making something like nodo cargo build Just Work™ with a useful amount of sandboxing.)
This is possible through AppArmor and SELinux. It’s not trivial, but doable. Unfortunately macOS is on its own here, with sandbox-exec being basically unsupported and wired in behaviour.
I think it would be a good idea even for things like default-allow, but preventing writes to home/SSH configuration. But UI? Nah, this is going to be a per-project mess.
I think that would be tricky because the sandbox program would need to know what files and other resources are required by the program it is supposed to execute in order to run them in a subdirectory; there’s not a great programmatic way to do this, and even if there was, it wouldn’t improve security (the command could just say “I need the contents of the user’s private keys”, for instance). The alternative is to somehow tell the sandbox program what resources are required by the script, which can be really difficult to do in the general case and probably isn’t a lot better than Docker or similar.
On a developer workstation, probably most critical are your home directory (could contain SSH keys, secrets to various applications, etc.), /etc, /var, and /run/user/<UID>. You could use something like bubblewrap to only make the project’s directory visible in $HOME, use a tmpfs for $HOME/.cargo, and use stubs or tmpfses for the other directories. I did this once and it works pretty well across projects. However, the question is: if you don’t trust the build, why would you trust the application itself? So at that point you don’t want to run it at all, or only in an isolated VM anyway. So it probably makes more sense to build the project in a low-privileged environment like that as well.
IMO sandboxing is primarily interesting for applications that you trust in principle, but process untrusted data (chat clients, web browsers, etc.). So you sandbox them for when there is a zero-day vulnerability. E.g. running something like Signal, Discord, or a Mastodon client without sandboxing is pretty crazy (e.g. something like iMessage needs application sandboxing + blastdoor all the time to ensure that zero-days cannot elevate to wider access).
It’s just as safe as it’s always been.
No, there’s a lot of policy discretion. The US government has access to any data stored in the US belonging to non-US persons without basic due process like search warrants. The data they choose to access is a policy question. The people being installed in US security agencies have strong connections to global far-right movements.
In 2004 servers operated by Rackspace in the UK on behalf of Indymedia were handed over to the American authorities with no consideration of the legal situation in the jurisdiction where they were physically located.
/Any/ organisation, governmental or otherwise, that exposes itself to that kind of risk needs to be put out of business.
I seem to remember an incident where Instapaper went offline. The FBI raided a data centre and took a blade machine containing blade servers they had warrants for, and Instapaper’s, which they didn’t. So accidents happen.
Link: https://blog.instapaper.com/post/6830514157
Yes, but in that case the server was in an American-owned datacenter physically located in America (Virginia), where it was within the jurisdiction of the FBI.
That is hardly the same as a server in an American-owned datacenter physically located in the UK, where it was not within the jurisdiction of the FBI.
Having worked for an American “multinational” I can see how that sort of thing can happen: a chain of managers unversed in the law assumes it is doing “the right thing”. Which makes it even more important that customers consider both the actual legal situation and the cost of that sort of foulup when choosing a datacenter.
The FBI has offices around the world.
https://www.fbi.gov/contact-us/international-offices
Serious question: who’s putting data in us-west etc. when there are EU data centres? And does that free rein over data extend to data in European data centres? I was under the impression that safe harbour regs protected it? But it’s been years since I had to know about this kind of stuff and it’s now foggy.
It does not matter where the data is stored. Using EU datacenters will help latency if that is where your users are, but it will not protect you from warrants. The author digs into this in this post, but unfortunately it is in Dutch: https://berthub.eu/articles/posts/servers-in-de-eu-eigen-sleutels-helpt-het/
I re-read the English article a bit better and see he addresses it with sources and linked articles. Saturday morning, what can I say.
A lot of non-EU companies. Seems like a weird question, not everyone is either US or EU. Almost every Latin American company I’ve worked for uses us-east/west, even if it has no US customers. It’s just way cheaper than LATAM data centers and has better latency than EU.
Obviously the world isn’t just US/EU, I appreciate that. This article is dealing with the trade agreements concerning EU/US data protection though so take my comment in that perspective.
I don’t see how this is at odds with the parent comment?
That is the one good thing. It has always been unsafe, but now people are finally starting to understand that.
Because it’s dramatically less safe. Everyone saying “it’s the same as before” has no clue what is happening in the US government right now.
And everyone saying it’s dramatically different has no clue what has happened in the US government in the past.
I haven’t personally made up my mind on this, but one piece of evidence in the “it’s dramatically different (in a bad way)” side of things would be the usage of unvetted DOGE staffers with IRS data. That to me seems to indicate that the situation is worse than before.
yeah could be
You’re incorrect. The US has never had a government that openly seeks to harm its own allies.
What do you mean? Take Operation Desert Storm. Or the early Cold War.
Not sure what you mean—Operational Desert Storm and the Cold War weren’t initiated by the US nor were Iraq and the USSR allies in the sense that the US is allied with Western Europe, Canada, etc (yes, the US supported the USSR against Nazi Germany and Iraq against Islamist Iran, but everyone understood those alliances were temporary—the US didn’t enter into a mutual defense pact with Iraq or USSR, for example).
they absolutely 100% were initiated by the US. yes the existence of a mutual defense pact is notable, as is its continued existence despite the US “seeking to harm” its treaty partners. it sounds like our differing perceptions of whether the present moment is “dramatically different” come down to differences in historical understanding, the discussion of which would undoubtedly be pruned by pushcx.
My gut feeling says that you’re right, but actually I think practically nobody knows whether you are or not. To take one example, it’s not clear whether the US government is going to crash its own banking system: https://www.crisesnotes.com/how-can-we-know-if-government-payments-stop-an-exploratory-analysis-of-banking-system-warning-signs/ . The US government has done plenty of things that are BAD before, but it doesn’t often do anything that STRANGE. I think.
the reply was to me
Oh, yeah. Clearly I’m bad at parsing indentation on mobile.
Just because it was not safe before, doesn’t mean it cannot be (alarmingly) less safe now.
And just because it logically can be less safe now doesn’t mean it is.
It is not. Not anymore. But I don’t want to get into political debate here.
I suspect parent meant it has never been safe
This isn’t true, as the US has been the steward of the Internet and its administration has turned hostile towards US’s allies.
In truth, Europe already had a wake-up call with Snowden’s revelations, the US government spying on non-US citizens with impunity, by coercing private US companies to do it. And I remember the Obama administration claiming that “non-US citizens have no rights”.
But that was about privacy, whereas this time we’re talking about a far right administration that seems to be on a war path with US’s allies. The world today is not the same as it was 10 years ago.
hm, you have a good point. I was wondering why now it would be different but “privacy” has always been too vague a concept for most people to grasp/care about. But an unpredictable foreign government which is actively cutting ties with everyone and reneging on many of its promises with (former?) allies might be a bigger warning sign to companies and governments world wide.
I mean, nobody in their right mind would host stuff pertaining to EU citizens in, say, Russia or China.
Which is to say: it’s not safe at all and never has been a good idea.
Ah yes, one of the super toxic throw away comments anyone can make which alienates one participant and makes the other feel super smug.
These things are mostly going away, phone keyboards killed the “grammar nazi”, no one can spell anymore and no one knows who is at fault. I’m looking forward to a world where our default isn’t trying to win the conversation, but to move it forward more productively.
Ah yes, “the good old days”
The 1990s were so good that some people’s biggest problem was other people wasting a process in their shell scripts. 🙃
Probably a bit optimistic
On UUOC: I suspect it was originally in the spirit of fun. It becoming a cultural gatekeeping tool (as many, many in-group signifiers become) is a separate phenomenon and I think it’s probably best to view it as such
IMO the issue is that pointing at other people’s code and saying “lol that’s dumb” is really only something you can do with friends, with strangers over the internet it’s a very different thing
Often, people in these forums are friends to some extent and I’d assume that part of this that became toxic is that since you’re not ALL friends, it’s hard to tell who knows who and what’s a joke and what’s a dig.
That’s a lot of testing and wording when the spec is quite clear:
With clarifications in the spec for sending
and receiving
in case the original statement was not clear enough.
I thought the main benefit of the article was towards the end, and what @carlana summarized well here:
I’ve been writing Go since 2012 and I didn’t know this. 🤷♂️
I think the language would be a lot nicer without the make() init. In the last 8 years using Go, every time I took a break away from Go and came back, the nil map, nil channel, and zero-value map init tripped me up quite consistently. With generics available now, I think there are plenty of ways to clean up these older APIs and make the language friendlier to new users.
I was going to write a comment about why you really want nil channels, but carlana already did, so I’ll just bring it to your attention in case you’re only watching replies.
Nil maps aren’t so directly useful, but the difference between an empty map and no map at all is 48 bytes, which is non-negligible if you have some kind of data structure with millions of maps that might or might not exist.
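To make the nil-map part of this concrete, here is a tiny sketch (the names are made up): a nil map supports lookups and len, but writing to it panics until it is initialised with make.

```go
package main

import "fmt"

func main() {
	var m map[string]int // nil map: zero value, no allocation

	fmt.Println(m["missing"], len(m)) // 0 0; reads on a nil map just return the zero value

	// m["k"] = 1 // would panic: assignment to entry in nil map

	m = make(map[string]int) // explicit initialisation
	m["k"] = 1
	fmt.Println(m["k"]) // 1
}
```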
The issue is not the functionality, it’s the implicitness. It’s that if you forget to make() your channel you get a behaviour which is very likely to screw you over when you don’t expect it.

If it gave you a channel by default it would be implicitly open or closed, buffered (with some capacity) or unbuffered. But it declines to do that, and has you construct what you want, which makes the code more explicit.
At the same time, there’s a need for one more thing, a (non)channel that is never available for communication. Go types must always have a zero value, and “a channel that’s never ready” is a lot more zero-ish than anything else you could come up with.
Yes, there’s a guideline that zero values should be useful when possible (a nil slice is ready for append, a zero bytes.Buffer is empty, a zero sync.Mutex is unlocked), but that takes a backseat to the requirement for a zero value to be uniquely zero. Complaining about how one fails feels the same as complaining about how the other fails.
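To make “a channel that’s never ready” concrete, here is a minimal, hypothetical Go sketch: both sends and receives on a nil channel block forever, which is also exactly what bites you if you forget the make().

```go
package main

import "time"

func main() {
	var ch chan int // zero value: nil, no make()

	// A receive on a nil channel blocks forever, so run it in a
	// goroutine to keep the demo from deadlocking outright.
	go func() {
		<-ch // never proceeds
		panic("unreachable: nil channel receive returned")
	}()

	// Same for sends: this goroutine also parks forever.
	go func() {
		ch <- 1 // never proceeds
	}()

	time.Sleep(100 * time.Millisecond) // both goroutines are still blocked here
}
```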
How about it doesn’t do that either.
Would that it did. It does give me a channel by default, one that is pretty much never what I want.
Ah yes, the explicitness of implicitly doing something stupid.
Have you considered that that’s a mistake?
It does, does it not? In both cases the language does something which is at best useless and at worst harmful, and which, with a little more effort put into its design, it could simply not do.
Like I said elsewhere, for pretty much every non-trivial use of channels, you will want a nil channel at some point, so you can deactivate one branch of a select. It’s pretty much always one of the things I want.
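For readers who haven’t seen the pattern, a small hypothetical sketch of deactivating one branch of a select by setting the drained channel to nil (the merge function here is made up purely for illustration):

```go
package main

import "fmt"

// merge reads from two channels until both are closed, using the
// "set the channel to nil" trick to deactivate a select branch once
// its source is exhausted.
func merge(a, b <-chan int) []int {
	var out []int
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // this case can never fire again
				continue
			}
			out = append(out, v)
		case v, ok := <-b:
			if !ok {
				b = nil
				continue
			}
			out = append(out, v)
		}
	}
	return out
}

func main() {
	a := make(chan int, 2)
	b := make(chan int, 2)
	a <- 1
	a <- 2
	close(a)
	b <- 3
	close(b)
	fmt.Println(merge(a, b))
}
```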
I think I disagree? Or at least I’ve never made good use of a nil channel. Maybe now that I’ve learned about its uses for select { ... } I’ll have a different opinion, but there have been plenty of times I don’t want a nil channel. And this problem isn’t limited to channels either–Go also gives nil pointers and nil maps by default, even though a nil pointer or map is frequently a bug. Defaulting to a zero value is certainly an improvement on C’s default (“whatever was in that particular memory region”), but I think it would be a lot better if it just forced us to initialize the memory.

I do wish that map was a value type by default and you would need to write *m to actually use it. That would be much more convenient. The Go team said they did that in early versions of Go, but they got sick of the pointer, so they made it internal to the map, but I think that was a mistake.

Technically it’s UB, which is even worse. You may get whatever was at that location, or you might get the compiler deleting the entire thing and/or going off the rails completely.
Good point. Even keeping track of what is/isn’t UB is a big headache.
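Side note for anyone following the Go tangent above: a minimal, hypothetical sketch of the nil-map default mentioned a couple of comments up. Reads on a nil map quietly behave like an empty map, while writes panic at runtime.

```go
package main

import "fmt"

func main() {
	var m map[string]int // zero value: nil, no make()

	// Reading from a nil map is fine: it behaves like an empty map.
	fmt.Println(m["missing"], len(m)) // 0 0

	// Writing to a nil map panics at runtime:
	//   panic: assignment to entry in nil map
	m["boom"] = 1
}
```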
That would have implications across the whole language design that wouldn’t, in my opinion, be overall good. Zero-initialization is quite fundamental.
This is also my answer to masklinn’s “how about it doesn’t do that either” in a comment I couldn’t bring myself to respond directly to.
Yeah, it’s not going to happen, but I’m convinced that would have been the choice to make in 2012 (or earlier). I can live with it, and Go is still the most productive tool in my bucket, but that particular decision is pretty disappointing, especially because we can’t back out of it the way we could have done if we had mandatory initialization (you can relax that requirement without breaking compatibility).
To add to that point, there’s another issue, which is that many channels oughtn’t be nil, but Go doesn’t give us a very good way to express that. In fact, it goes even further and makes nil a default even when a nil channel would be a bug. I really, really wish Go had (reasonably implemented) sum types.

I haven’t done Go seriously in a while, but when I did, I was continually annoyed at this sort of thing because there’s no way to encapsulate these patterns to make it easy to get them right. I remember reading a Go team post about how to use goroutines properly and ranting about how Go’s only solution for reusing high-level code patterns is blog posts.
But now that it has generics, is it possible to solve this? Has someone made a package of things like worker pools and goroutine combinators (e.g., split/merge) that get this stuff right so you don’t have to rediscover the mistakes?
As an example of what annoyed me, it was things like this blog post on pipelines and cancellation. Which should have just been a library, not a blog post.
https://sourcegraph.com/blog/building-conc-better-structured-concurrency-for-go
Yes! Thank you, I will check this out, as it looks like I may have a Go project coming up in the near future.
Conc has a truly awful API. It really shows the power of writing a good blog post to make your package popular. I made my own concurrency package, and there were no ideas in conc worth copying. Honestly though, my recommendation for most people is to just use https://pkg.go.dev/golang.org/x/sync/errgroup since it’s semi-standard.
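For anyone who hasn’t used it, here is a minimal sketch of golang.org/x/sync/errgroup. The URLs are hypothetical placeholders, and this is just one idiomatic way to use it, not a recommendation from the commenters above.

```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"golang.org/x/sync/errgroup"
)

func main() {
	// Hypothetical list of URLs to fetch concurrently.
	urls := []string{
		"https://example.com/a",
		"https://example.com/b",
		"https://example.com/c",
	}

	g, ctx := errgroup.WithContext(context.Background())
	for _, url := range urls {
		url := url // capture loop variable (needed before Go 1.22)
		g.Go(func() error {
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
			if err != nil {
				return err
			}
			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return err // the first error cancels ctx for the others
			}
			return resp.Body.Close()
		})
	}

	// Wait blocks until all goroutines finish and returns the first error.
	if err := g.Wait(); err != nil {
		fmt.Println("fetch failed:", err)
	}
}
```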
This crate should really be more well-known in the Rust ecosystem! The following quote on OpenAPI support resonates a lot with me (source):
I use something like this in the Go ecosystem, and generating an OpenAPI spec from source code is much better than generating source code from an OpenAPI spec.
One under-appreciated reason why it’s better: you don’t have to handle the entirety of the OpenAPI spec, just the features your server framework wants to produce. We took advantage of this by also writing our own code gen for clients, which is also easier to make good because you only need to handle that same subset.
I wonder how much code in the kernel also has been broken without anyone noticing.
The whole thing also suggests there’s not much testing in the kernel in general, be it automated or manual.
Kernel development is pretty unusual in the sense that many things you’d expect to be part of a project are carried out downstream of it.
For one, there’s no CI and not much in the way of testing there. Instead, developers and users do all kinds of testing on their own terms, such as the linux test project or syzbot (which was the thing that found this filesystem bug).
I was even more surprised when I found out that documentation is also largely left to downstream, and so there are a bunch of syscalls that the Linux manpages project simply hasn’t gotten around to adding manpages for.
I was pretty surprised to find that the only documentation for the ext filesystems was just one guy trying to read the (really dodgy) code and infer what happened under different cases. A lot of “I think it does X when Y happens, but I’m not sure” in the documentation. And reading through the code, I understand it. That filesystem is its own spec because no one actually understands it. It’s wild how much this kind of stuff exists near the bottom of our tech stacks.
Yup. I tried to learn about the concurrency semantics of ext4 - stuff like “if I use it from multiple processes, do they see sequentially consistent operations?”, so I asked around. Nobody was able to give me a good answer.
Also, one kernel dev in response smugly told me to go read memory-barriers.txt. Which I’d already read and which is useless for this purpose! Because it documents kernel-internal programming practices, not the semantics presented to userland.
This is also true of stuff like gcc. Just a very different style than more modern projects.
GCC has quite a bit of upstream documentation.
I’m talking about the testing; you’re right that every time I’ve looked at its documentation, it feels very thorough.
Want to know how to make the perfect crab dip? You came to the right place! But first, my life story…
(recipe sites are the worst offenders)
I’ve never really worked on an application that ran out of connections or experienced degraded performance as a consequence of too many connections—what sort of system/scale benefits from PgBouncer? Presumably read replicas allow you to scale read connections linearly (at some cost), so it mostly comes into play when you have hundreds of concurrent writes? Or maybe you reach for PgBouncer before you scale out read replicas because the cost of the latter outweighs the “peril” of the former?
Lots of folks, for better or worse, run N replicas of their app, or N microservices, each with M connections in a pool. Seems to be very common with Kubernetes shops. Pretty easy to end up with a hundred open connections this way, and that’s not cheap.
Read replicas are too complex to be retrofitted onto most applications - they’re usually eventually consistent and the app needs to be taught which queries to route to the replica. PgBouncer can’t really do that sort of read query offloading, because of SELECT launch_missiles()-shaped problems. Easier to throw money at the problem scaling vertically than doing open heart surgery on your persistence layer, I guess.

I understand the N replicas with M connections per replica thing, but I usually only see ~3 replicas. Unless each replica has hundreds of connections in its pool, I don’t think this would explain it. Are lots of people running with dozens of replicas? What kinds of apps / at what scale is this relevant?
Read replicas are somewhat complex, but it seems more straightforward to tell whether a query will work for a read replica (and if you aren’t sure just leave it on the master) than it is to navigate the PgBouncer concerns raised in this article. And I think eventual consistency is probably fine for a lot of CRUD apps, no?
We have a Python web app that reads from and writes to Postgres. Each instance of the service will start with 5 connections to the DB, and we have many instances running around.
Certainly we could build out read replicas of PG and retool the web app to read from one DB and write to another, but it was infinitely easier to just make pgbouncer do all the hard work and keep a single DB.
We could also optimize our queries too. Again, pgbouncer and scaling the 1 prod DB instance is just easier. DB slow, throw some more ram or CPU at the problem and move on with life.
We haven’t run out of hardware growth room on x86 yet, and they keep expanding our growth ability hardware wise. Perhaps someday we will start to reach the limits of what a single box can do, and then we can start to optimize. Until that day though, it’s not worth the developer time. Hardware running costs are way cheaper than developer time.
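To put rough numbers on the “many instances, a handful of connections each” pattern: a minimal Go sketch (the app in this thread is Python, but the arithmetic is the same in any language) of capping a per-instance pool with database/sql. The DSN and the instance count are hypothetical; 20 instances at 5 connections each is already 100 server-side connections unless something like pgbouncer sits in between.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres driver (registers itself with database/sql)
)

func main() {
	// Hypothetical DSN; in the setup described above this would point at
	// pgbouncer rather than directly at Postgres.
	db, err := sql.Open("postgres", "postgres://app:secret@127.0.0.1:6432/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	// Per-instance cap: 5 connections, like the service described above.
	// With 20 app instances, that's up to 20 * 5 = 100 connections
	// hitting whatever is on the other end of the DSN.
	db.SetMaxOpenConns(5)
	db.SetMaxIdleConns(5)

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}
```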
How many replicas do you have? According to the article, the sweet spot is 200-300 connections, which means you could have 40-60 replicas before performance becomes an issue. Do you really have this many replicas, and if so how much traffic does your app get?
Additionally, how did you avoid the pitfalls described in the article? It sounds like PgBouncer requires changes to your queries versus vanilla Postgres, at least if you have it configured in any useful way. You mention that hardware is cheap compared to developer time, and I agree, but I’m trying to understand why PgBouncer minimizes developer time.
I keep hearing that it’s much easier to use PgBouncer, but the article makes it sound like using it in any useful capacity while maintaining transaction and query correctness is not very easy. Similarly, you make it sound like it would be really hard to change your code to use read replicas, but I would think it would be really easy (instead of one database handle you have two, and your reads use the read handle).
I’m not advocating for one approach, I’m just trying to understand why people keep saying PgBouncer is easy when the article makes it seem pretty complicated to implement correctly.
DB replicas? Zero real-time ones. Do you mean something else? We have an hourly read replica (for testing) and a daily replica that comes from our backup (so we can exercise our backup/restore), that’s used for development.
We run a few hundred PG connections through pgbouncer without trouble. We also have around 150 normal long-lived connections for our client/server and other longer lived sessions that don’t go through PG bouncer. Only the web traffic goes through pgbouncer.
We have about 1k users (it’s an internal application). It runs the back-office stuff (payroll, HR, purchasing, accounting, etc.). Our biggest load on the web side is timesheets, which is what most of our users do, and they of course all want to do timesheets at the exact same time.
I dunno, it was easy for us :) We did have a few growing pains when we deployed it, but they were easily solved. We haven’t had issues in years now. I don’t remember what the issues were when we were deploying, and I can’t remember what mode we run it in. I’m not in a position to easily check at the moment. I think we went from testing to in production within a week, so whatever issues we had, they were not difficult or hard to fix for us.
If you want me to, I can go look at our config later and re-familiarize myself with it, and share some more details.
I meant application replicas. How many instances of your application are connecting to the database with ~5 connections each?
I’m surprised you need PgBouncer at all for an internal system with 1k users?
The way the article makes it sound, if you just throw PgBouncer in front of something without configuration, it doesn’t actually do much. How do you know if PgBouncer is actually improving performance?
This is all from memory, I could be off somewhere.
I’d have to go count, more than a dozen, less than 100.
Because we have 1k active users at one time.
For us it wasn’t really about performance, it was about reducing the number of connections to PG. PG connections are expensive. We had 500 or so PG connections from the web side when we moved to pgbouncer a few years ago. Now I think we have 50 or 100 active for web connections.
Adding pgbouncer is typically a lot easier than adding a read replica IME (at least for rails, though the recent multidb support changes that calculus a lot)
Many of the features are operational and so depending on what you want to do there’s no real minimum scale.
For example, you can use it to “pause” new connections to the database - effectively draining the connections. I’ve used this in the past to restart a production database to pick up OS updates without having to go through all the faff of a failover.
But it’s not uncommon for applications to behave badly (eg open way too many conns, for too long) and pgbouncer gives you tools to mitigate that. It’s not just a “big scale” thing.
This is an extremely strong statement.
I think a few things are also interesting:
I think people are realizing how low quality the Linux kernel code is, how haphazard development is, how much burnout and misery is involved, etc.
I think people are realizing how insanely not in the open kernel dev is, how much is private conversations that a few are privy to, how much is politics, etc.
The Hellwig/Ojeda part of the thread is just frustrating to read because it almost feels like pleading. “We went over this in private” “we discussed this already, why are you bringing it up again?” “Linus said (in private so there’s no record)”, etc., etc.
Dragging discussions out in front of an audience is a pretty decent tactic for dealing with obstinate maintainers. They don’t like to explain their shoddy reasoning in front of people, and would prefer it remain hidden. It isn’t the first tool in the toolbelt but at a certain point there is no convincing people directly.
With quite a few things, actually. A friend of mine contributes to a non-profit, which until recently had a very toxic member (they even attempted a felony). That person was driven out of the non-profit very soon after members talked in a thread that was accessible to all members. Obscurity is often one key component of abuse, be it mere stubbornness or criminal behaviour. Shine light on it, and it often goes away.
IIRC Hintjens noted this quite explicitly as a tactic of bad actors in his works.
It’s amazing how quick people are to recognize folks trying to subvert an org piecemeal via one-off private conversations once everybody can compare notes. It’s equally amazing to see how much the same people beforehand will swear up and down oh no, that’s a conspiracy theory, such things can’t happen here, until they’ve been burned at least once.
This is an active, unpatched attack vector in most communities.
I’ve found the lowest example of this is even meetings minutes at work. I’ve observed that people tend to act more collaboratively and seek the common good if there are public minutes, as opposed to trying to “privately” win people over to their desires.
There is something to be said for keeping things between people with skin in the game.
It’s flipped over here, though, because more people want to contribute. The question is whether it’ll be stable long-term.
Something I’ve noticed is true in virtually everything I’ve looked deeply at is the majority of work is poor to mediocre and most people are not especially great at their jobs. So it wouldn’t surprise me if Linux is the same. (…and also wouldn’t surprise me if the wonderful Rust rewrite also ends up poor to mediocre.)
yet at the same time, another thing that astonishes me is how much stuff actually does get done and how well things manage to work anyway. And Linux also does a lot and works pretty well. Mediocre over the years can end up pretty good.
After tangentially following the kernel news, I think a lot of churning and death spiraling is happening. I would much rather have a rust-first kernel that isn’t crippled by the old guard of C developers reluctant to adopt new tech.
Take all of this energy into RedoxOS and let Linux stay in antiquity.
I’ve seen some of the R4L people talk on Mastodon, and they all seem to hate this argument.
They want to contribute to Linux because they use it, want to use it, and want to improve the lives of everyone who uses it. The fact that it’s out there and deployed and not a toy is a huge part of the reason why they want to improve it.
Hopping off into their own little projects which may or may not be useful to someone in 5-10 years’ time is not interesting to them. If it was, they’d already be working on Redox.
The most effective thing that could happen is for the Linux foundation, and Linus himself, to formally endorse and run a Rust-based kernel. They can adopt an existing one or make a concerted effort to replace large chunks of Linux’s C with Rust.
IMO the Linux project needs to figure out something pretty quickly because it seems to be bleeding maintainers and Linus isn’t getting any younger.
They (the Mastodon posters) may be misunderstanding the idea that others are not necessarily incentivized to do things just because those things are interesting to them.
Yep, I made a similar remark upthread. A Rust-first kernel would have a lot of benefits over Linux, assuming a competent group of maintainers.
along similar lines: https://drewdevault.com/2024/08/30/2024-08-30-Rust-in-Linux-revisited.html
Redox does have the chains of trying to do new OS things. An ABI-compatible Rust rewrite of the Linux kernel might get further along than expected (even if it only runs in virtual contexts, without hardware support (that would come later.))
Linux developers want to work on Linux, they don’t want to make a new OS. Linux is incredibly important, and companies already have Rust-only drivers for their hardware.
Basically, sure, a new OS project would be neat, but it’s really just completely off topic in the sense that it’s not a solution for Rust for Linux. Because the “Linux” part in that matters.
I read a 25+ year old article [1] from a former Netscape developer that I think applies in part
> The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed. There’s nothing wrong with it. It doesn’t acquire bugs just by sitting around on your hard drive. Au contraire, baby! Is software supposed to be like an old Dodge Dart, that rusts just sitting in the garage? Is software like a teddy bear that’s kind of gross if it’s not made out of all new material?

Adopting a “rust-first” kernel is throwing the baby out with the bathwater. Linux has been beaten into submission for over 30 years for a reason. It’s the largest collaborative project in human history and over 30 million lines of code. Throwing it out and starting new would be an absolutely herculean effort that would likely take years, if it ever got off the ground.
[1] https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
The idea that old code is better than new code is patently absurd. Old code has stagnated. It was built using substandard, out of date methodologies. No one remembers what’s a bug and what’s a feature, and everyone is too scared to fix anything because of it. It doesn’t acquire new bugs because no one is willing to work on that weird ass bespoke shit you did with your C preprocessor. Au contraire, baby! Is software supposed to never learn? Are we never to adopt new tools? Can we never look at something we’ve built in an old way and wonder if new methodologies would produce something better?
This is what it looks like to say nothing, to beg the question. Numerous empirical claims, where is the justification?
It’s also self defeating on its face. I take an old codebase, I fix a bug, the codebase is now new. Which one is better?
Like most things in life the truth is somewhere in the middle. There is a reason there is the concept of a “mature node” in the semiconductor industry. They accept that something new is needed for each node, but also that the new thing takes time to iron out the kinks and bugs. This is the primary reason why you see Apple take new nodes on first, before Nvidia for example, as Nvidia requires much larger die sizes and so needs fewer defects per square mm.
You can see this sometimes in software, for example X11 vs Wayland, where adoption is slow but most definitely progressing, and nowadays most people can see that Wayland is now, or is going to become, the dominant tech in the space.
The truth lies where it lies. Maybe the middle, maybe elsewhere. I just don’t think we’ll get to the truth with rhetoric.
Aren’t the arguments above more dialectic than rhetoric?
I don’t think this would qualify as dialectic, it lacks any internal debate and it leans heavily on appeals by analogy and intuition/ emotion. The post itself makes a ton of empirical claims without justification even beyond the quoted bit.
fair enough, I can see how one would make that argument.
“Good” is subjective, but there is real evidence that older code does contain fewer vulnerabilities: https://www.usenix.org/conference/usenixsecurity22/presentation/alexopoulos
That means we can probably keep a lot of the old trusty Linux code around while making more of the new code safe by writing it in Rust in the first place.
I don’t think that’s a fair assessment of Spolsky’s argument or of CursedSilicon’s application of it to the Linux kernel.
Firstly, someone has already pointed out the research that suggests that existing code has fewer bugs in it than new code (and that the older code is, the less likely it is to be buggy).
Secondly, this discussion is mainly around entire codebases, not just existing code. Codebases usually have an entire infrastructure around them for verifying that the behaviour of the codebase has not changed. This is often made up of tests, but it’s also made up of the users who try out a release of a codebase and determine whether it’s working for them. The difference between making a change to an existing codebase and releasing a new project largely comes down to whether this verification (both in terms of automated tests and in terms of users’ ability to use the new release) works for the new code.
Given this difference, if I want to (say) write a new OS completely in Rust, I need to choose: Do I want to make it completely compatible with Linux, and therefore take on the significant challenge of making sure everything behaves truly the same? Or do I make significant breaking changes, write my own OS, and therefore force potential adopters to rebuild their entire Linux workflows in my new OS?
The point is not that either of these options are bad, it is that they represent significant risks to a project. Added to the general risk that is writing new code, this produces a total level of risk that might be considered the baseline risk of doing a rewrite. Now risk is not bad per se! If the benefits of being able to write an OS in a language like Rust outweigh the potential risks, then it still makes sense to perform the rewrite. Or maybe the existing Linux kernel is so difficult to maintain that a new codebase really would be the better option. But the point that CursedSilicon was making by linking the Spolsky piece was, I believe, that the risks for a project like the Linux kernel are very high. There is a lot of existing, old code. And there is a very large ecosystem where either breaking or maintaining compatibility would each come with significant challenges.
Unfortunately, it’s very difficult to measure the risks and benefits here in a quantitative, comparable way, so I think where you fall on the “rewrite vs continuity” spectrum will depend mostly on what sort of examples you’ve seen, and how close you think this case is to those examples. I don’t think there’s any objective way to say whether it makes more sense to have something like R4L, or something like RedoxOS.
I haven’t read it yet, but I haven’t made an argument about that; I just created a parody of the argument as presented. I’ll be candid: I doubt that the research is going to compel me to believe that newer code is inherently buggier. It may compel me to confirm my existing belief that testing software in the field is one good method to find some classes of bugs.
I guess so, it’s a bit dependent on where we say the discussion starts - three things are relevant; RFL, which is not a wholesale rewrite, a wholesale rewrite of the Linux kernel, and Netscape. RFL is not about replacing the entire Linux kernel, although perhaps “codebase” here refers to some sort of unit, like a driver. Netscape wanted a wholesale rewrite, based on the linked post, so perhaps that’s what’s really “the single worst strategic mistake that any software company can make”, but I wonder what the boundary here is? Also, the article immediately mentions that Microsoft tried to do this with Word but it failed, but that Word didn’t suffer from this because it was still actively developed - I wonder if it really “failed” just because pyramid didn’t become the new Word? Did Microsoft have some lessons learned, or incorporate some of that code? Dunno.
I think I’m really entirely justified when I say that the post is entirely emotional/ intuitive appeals, rhetoric, and that it makes empirical claims without justification.
This is rhetoric. These are unsubstantiated empirical claims. The article is all of this. It’s fine as an interesting, thought provoking read that gets to the root of our intuitions, but I think anyone can dismiss it pretty easily since it doesn’t really provide much in the form of an argument.
Again, totally unsubstantiated. I have MANY reasons to believe that, it is simply question begging to say otherwise.
That’s all this post is. Over and over again making empirical claims with no evidence and question begging.
We can discuss the risks and benefits, I’d advocate for that. This article posted doesn’t advocate for that. It’s rhetoric.
This is a truism. It is survival bias. If the code was buggy, it would eventually be found and fixed. So all things being equal, newer code is riskier than old code. But it’s also been empirically shown that using Rust for new code is not “all things being equal”. Google showed that new code in Rust is as reliable as old code in C. Which is good news: you can use old C code from new Rust projects without the risk that comes from new C code.
Yeah, this is what I’ve been saying (not sure if you’d meant to respond to me or the parent, since we agree) - the issue isn’t “new” vs “old” it’s things like “reviewed vs unreviewed” or “released vs unreleased” or “tested well vs not tested well” or “class of bugs is trivial to express vs class of bugs is difficult to express” etc.
Was restating your thesis in the hopes of making it clearer.
I don’t disagree that the rewards can outweigh the risks, and in this case I think there’s a lot of evidence that suggests that memory safety as a default is really important for all sorts of reasons. Let alone the many other PL developments that make Rust a much more suitable language to develop in than C.
That doesn’t mean the risks don’t exist, though.
Nobody would call an old codebase with a handful of fixes a new codebase, at least not in the contexts in which those terms have been used here.
How many lines then?
It’s a Ship of Theseus—at no point can you call it a “new” codebase, but after a period of time, it could be completely different code. I have a C program I’ve been using and modifying for 25 years. At any given point, it would have been hard to say “this is now a new codebase,” yet not one line of code in the project is the same as when I started (even though it does the same thing as it always has).
I don’t see the point in your question. It’s going to depend on the codebase, and on the nature of the changes; it’s going to be nuanced, and subjective at least to some degree. But the fact that it’s prone to subjectivity doesn’t mean that you get to call an old codebase with a single fixed bug a new codebase, without some heavy qualification which was lacking.
If it requires all of that nuance and context maybe the issue isn’t what’s “old” and what’s “new”.
I don’t follow, to me that seems like a non-sequitur.
What’s old and new is poorly defined and yet there’s an argument being made that “old” and “new” are good indicators of something. If they’re so poorly defined that we have to bring in all sorts of additional context like the nature of the changes, not just when they happened or the number of lines changed, etc, then it seems to me that we would be just as well served to throw away the “old” and “new” and focus on that context.
I feel like enough people would agree more-or-less on what was an “old” or “new” codebase (i.e. they would agree given particular context) that they remain useful terms in a discussion. The general context used here is apparent (at least to me) given by the discussion so far: an older codebase has been around for a while, has been maintained, has had kinks ironed out.
There’s a really important distinction here though. The point is to argue that new projects will be less stable than old ones, but you’re intuitively (and correctly) bringing in far more important context - maintenance, testing, battle testing, etc. If a new implementation has a higher degree of those properties then it being “new” stops being relevant.
Ok, but:
My point was that this statement requires a definition of “new codebase” that nobody would agree with, at least in the context of the discussion we’re in. Maybe you are attacking the base proposition without applying the surrounding context, which might be valid if this were a formal argument and not a free-for-all discussion.
I think that it would be considered no longer new if it had had significant battle-testing, for example.
FWIW the important thing in my view is that every new codebase is a potential old codebase (given time and care), and a rewrite necessarily involves a step backwards. The question should probably not be, which is immediately better?, but, which is better in the longer term (and by how much)? However your point that “new codebase” is not automatically worse is certainly valid. There are other factors than age and “time in the field” that determine quality.
Methodologies don’t matter for quality of code. They could be useful for estimates, cost control, figuring out whom you shall fire etc. But not for the quality of code.
You’re suggesting that the way you approach programming has no bearing on the quality of the produced program?
I’ve never observed a programmer become better or worse by switching methodology. Dijkstra would not have become better if you made him do daily standups or go through code reviews.
There are ways to improve your programming by choosing different approach but these are very individual. Methodology is mostly a beancounting tool.
When I say “methodology” I’m speaking very broadly - simply “the approach one takes”. This isn’t necessarily saying that any methodology is better than any other. The way I approach a task today is better, I think, than the way I would have approached that task a decade ago - my methodology has changed, the way I think has changed. Perhaps that might mean I write more tests, or I test earlier, but it may mean exactly the opposite, and my methods may only work best for me.
I’m not advocating for “process” or ubiquity, only that the approach one takes may improve over time, which I suspect we would agree on.
If you take this logic to its end, you should never create new things.
At one point in time, Linux was also the new kid on the block.
The best time to plant a tree is 30 years ago. The second best time is now.
I don’t think Joel Spolsky was ever a Netscape developer. He was a Microsoft developer who worked on Excel.
My mistake! The article contained a bit about Netscape and I misremembered it
How many of those lines are part of the core? My understanding was that the overwhelming majority was driver code. There may not be that much core subsystem code to rewrite.
For a previous project, we included a minimal Linux build. It was around 300 KLoC, which included networking and the storage stack, along with virtio drivers.
That’s around the size a single person could manage and quite easy with a motivated team.
If you started with DPDK and SPDK then you’d already have filesystems and a copy of the FreeBSD network stack to run in isolated environments.
Once many drivers share common rust wrappers over core subsystems, you could flip it and write the subsystem in Rust. Then expose C interface for the rest.
Oh sure, that would be my plan as well. And I bet some subsystem maintainers see this coming, and resist it for reasons that aren’t entirely selfless.
That’s pretty far into the future, both from a maintainer acceptance PoV and from a rustc_codegen_gcc and/or gccrs maturity PoV.
Sure. But I doubt I’ll be running a different kernel 10y from now.
And like us, those maintainers are not getting any younger and if they need a hand, I am confident I’ll get faster into it with a strict type checker.
I am also confident nobody in our office would be able to help out with C at all.
This cannot possibly be true.
It’s the largest collaborative open source os kernel project in human history
It’s been described as such based purely on the number of unique human contributions to it
I would expect Wikipedia to be bigger 🤔
I see that Drew proposes a new OS in that linked article, but I think a better proposal in the same vein is a fork. You get to keep Linux, but you can start porting logic to Rust unimpeded, and it’s a manageable amount of work to keep porting upstream changes.
Remember when libav forked from ffmpeg? Michael Niedermayer single-handedly ported every single libav commit back into ffmpeg, and eventually, ffmpeg won.
At first there will be extremely high C percentage, low Rust percentage, so porting is trivial, just git merge and there will be no conflicts. As the fork ports more and more C code to Rust, however, you start to have to do porting work by inspecting the C code and determining whether the fixes apply to the corresponding Rust code. However, at that point, it means you should start seeing productivity gains, community gains, and feature gains from using a better language than C. At this point the community growth should be able to keep up with the extra porting work required. And this is when distros will start sniffing around, at first offering variants of the distro that uses the forked kernel, and if they like what they taste, they might even drop the original.
I genuinely think it’s a strong idea, given the momentum and potential amount of labor Rust community has at its disposal.
I think the competition would be great, especially in the domain of making it more contributor friendly to improve the kernel(s) that we use daily.
I certainly don’t think this is impossible, for sure. But the point ultimately still stands: Linux kernel devs don’t want a fork. They want Linux. These folks aren’t interested in competing, they’re interested in making the project they work on better. We’ll see if some others choose the fork route, but it’s still ultimately not the point of this project.
While I don’t personally want to make a new OS, I’m not sure I actually want to work on Linux. Most of the time I strive for portability, and so abstract myself from the OS whenever I can get away with it. And when I can’t, I have to say Linux’s API isn’t always that great, compared to what the BSDs have to offer (epoll vs kqueue comes to mind). Most annoying though is the lack of documentation for the less used APIs: I’ve recently worked with Netlink sockets, and for the proc stuff so far the best documentation I found was the freaking source code of a third party monitoring program.
I was shocked. Complete documentation of the public API is the minimum bar for a project as serious as the Linux kernel. I can live with an API I don’t like, but lack of documentation is a deal breaker.
I think they mean that Linux kernel devs want to work on the Linux kernel. Most (all?) R4L devs are long time Linux kernel devs. Though, maybe some of the people resigning over LKML toxicity will go work on Redox or something…
That’s is what I was saying, yes.
I’m talking about the people who develop the Linux kernel, not people who write userland programs for Linux.
Re-Implementing the kernel ABI would be a ton of work for little gain if all they wanted was to upstream all the work on new hardware drivers that is already done - and then eventually start re-implementing bits that need to be revised anyway.
If the singular required Rust toolchain didn’t feel like such a ridiculous to bootstrap 500 ton LLVM clown car I would agree with this statement without reservation.
Would zig be a better starting place?
Zig is easier to implement (and I personally like it as a language) but doesn’t have the same safety guarantees and strong type system that Rust does. It’s a give and take. I actually really like Rust and would like to see a proliferation of toolchain options, such as what’s in progress in GCC land. Overall, it would just be really nice to have an easily bootstrapped toolchain that a normal person can compile from scratch locally, although I don’t think it necessarily needs to be the default, or that using LLVM generally is an issue. However, it might be possible that no matter how you architect it, Rust might just be complicated enough that any sufficiently useful toolchain for the language could just end up being a 500 ton clown car of some kind anyways.
Depends on which parts of GP’s statement you care about: LLVM or bootstrap. Zig is still depending on LLVM (for now), but it is no longer bootstrappable in a limited number of steps (because they switched from a bootstrap C++ implementation of the compiler to keeping a compressed WASM build of the compiler as a blob).
Yep, although I would also add it’s unfair to judge Zig in any case on this matter now given it’s such a young project that clearly is going to evolve a lot before the dust begins to settle (Rust is also young, but not nearly as young as Zig). In ten to twenty years, so long as we’re all still typing away on our keyboards, we might have a dozen Zig 1.0 and a half dozen Zig 2.0 implementations!
Yeah, the absurdly low code quality and toxic environment make me think that Linux is ripe for disruption. Not like anyone can produce a production kernel overnight, but maybe a few years of sustained work might see a functional, production-ready Rust kernel for some niche applications and from there it could be expanded gradually. While it would have a lot of catching up to do with respect to Linux, I would expect it to mature much faster because of Rust, because of a lack of cruft/backwards-compatibility promises, and most importantly because it could avoid the pointless drama and toxicity that burn people out and prevent people from contributing in the first place.
What is this, some kind of new meme? Where did you hear it first?
From the thread in OP, if you expand the messages, there is wide agreement among the maintainers that all sorts of really badly designed and almost impossible to use (safely) APIs ended up in the kernel over the years because the developers were inexperienced and kind of learning kernel development as they went. In retrospect they would have designed many of the APIs very differently.
Someone should compile everything to help future OS developers avoid those traps! There are a lot of existing non-POSIX experiments though.
It’s based on my forays into the Linux kernel source code. I don’t doubt there’s some quality code lurking around somewhere, but the stuff I’ve come across (largely filesystem and filesystem adjacent) is baffling.
Seeing how many people are confidently incorrect about Linux maintainers only caring about their job security and keeping code bad to make it a barrier to entry, if nothing else taught me how online discussions are a huge game of Chinese whispers where most participants don’t have a clue of what they are talking about.
I doubt that maintainers are “only caring about their job security and keeping back code” but with all due respect: You’re also just taking arguments out of thin air right now. What I do believe is what we have seen: Pretty toxic responses from some people and a whole lot of issues trying to move forward.
Huh, I’m not seeing any claim to this end from the GP, or did I not look hard enough? At face value, saying that something has an “absurdly low code quality” does not imply anything about nefarious motives.
I can personally attest to having never made that specific claim.
Indeed that remark wasn’t directly referring to GP’s comment, but rather to the range of confidently incorrect comments that I read in the previous episodes, and to the “gatekeeping greybeards” theme that can be seen elsewhere on this page. First occurrence, found just by searching for “old”: Linux is apparently “crippled by the old guard of C developers reluctant to adopt new tech”, to which GP replied in agreement in fact. Another one, maintainers don’t want to “do the hard work”.
Still, in GP’s case the Chinese whispers have reduced “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” to “absurdly low quality”. To which I ask, which is more likely: 1) that 30 million lines of code contain various levels of technical debt of which maintainers are aware, and that said maintainers are worried even about code where the technical debt is real but not causing substantial issues in practice? Or 2) that a piece of software gets to run on literally billions of devices of all sizes and prices just because it’s free and in spite of its “absurdly low quality”?
Linux is not perfect, neither technically nor socially. But it sure takes a lot of entitlement and self-righteousness to declare it “of absurdly low quality” with a straight face.
GP here: I probably should have said “shockingly” rather than “absurdly”. I didn’t really expect to get lawyered over that one word, but yeah, the idea was that for a software that runs on billions of devices, the code quality is shockingly low.
Of course, this is plainly subjective. If your code quality standards are a lot lower than mine then you might disagree with my assessment.
That said, I suspect adoption is a poor proxy for code quality. Internet Explorer was widely adopted and yet it’s broadly understood to have been poorly written.
I’m sure self-righteousness could get you to the same place, but in my case I arrived by way of experience. You can relax, I wasn’t attacking Linux—I like Linux—it just has a lot of opportunity for improvement.
I guess I’ve seen the internals of too much proprietary software now to be shocked by anything about Linux per se. I might even argue that the quality of Linux is surprisingly good, considering its origins and development model.
I think I’d lawyer you a tiny bit differently: some of the bugs in the kernel shock me when I consider how many devices run that code and fulfill their purposes despite those bugs.
FWIW, I was not making a dig at open source software, and yes plenty of corporate software is worse. I guess my expectations for Linux are higher because of how often it is touted as exemplary in some form or another. I don’t even dislike Linux, I think it’s the best thing out there for a huge swath of use cases—I just see some pretty big opportunities for improvement.
Or actual benchmarks: the performance the Linux kernel leaves on the table in some cases is absurd. And sure it’s just one example, but I wouldn’t be surprised if it was representative of a good portion of the kernel.
Well not quite but still “considered broken beyond repair by many people related to life time management” - which is definitely worse than “hard to formalize” when “the way ever[y]body does it” seems to vary between each user.
I love Rust but still, we’re talking of a language which (for good reasons!) considers doubly linked lists unsafe. Take an API that gets a 4 on Rusty Russell’s API design scale (“Follow common convention and you’ll get it right”), but which was designed for a completely different programming language if not paradigm, and it’s not surprising that it can’t easily be transformed into a 9 (“The compiler/linker won’t let you get it wrong”). But at the same time there are a dozen ways in which, according to the same scale, things could actually be worse!
What I dislike is that people are seeing “awareness of complexity” and the message they spread is “absurdly low quality”.
Note that doubly linked lists are not a special case at all in Rust. All the other common data structures like Vec, HashMap etc. also need unsafe code in their implementation.

Implementing these data structures in Rust, and writing unsafe code in general, is indeed roughly a 4. But these are all already implemented in the standard library, with an API that actually is at a 9. And std::collections::LinkedList is constructive proof that you can have a safe Rust abstraction for doubly linked lists.

Yes, the implementation could have bugs, thus making the abstraction leaky. But that’s the case for literally everything, down to the hardware that your code runs on.
You’re absolutely right that you can build abstractions with enough effort.
My point is that if a doubly linked list is (again, for good reasons) hard to make into a 9, a 20-year-old API may very well be even harder. In fact, std::collections::LinkedList is safe but still not great (for example the cursor API is still unstable); and being in std, it was designed/reviewed by some of the most knowledgeable Rust developers, sort of by definition. That’s the conundrum that maintainers face and, if they realize that, it’s a good thing. I would be scared if maintainers handwaved that away.

Bugs happen, but if the abstraction is downright wrong then that’s something I wouldn’t underestimate. A lot of the appeal of Rust in Linux lies exactly in documenting/formalizing these unwritten rules, and wrong documentation can be worse than no documentation (cue the negative parts of the API design scale!); even more so if your documentation is a formal model like a set of Rust types and functions.
That said, the same thing can happen in a Rust-first kernel, which will also have a lot of unsafe code. And it would be much harder to fix in a Rust-first kernel than in Linux at a time when it’s just testing the waters.
At the same time, it was included almost as like, half a joke, and nobody uses it, so there’s not a lot of pressure to actually finish off the cursor API.
It’s also not the kind of linked list the kernel would use, as they’d want an intrusive one.
And yet, safe to use doubly linked lists written in Rust exist. That the implementation needs unsafe is not a real problem. That’s how we should look at wrapping C code in safe Rust abstractions.
The whole comment you replied to, after the one sentence about linked lists, is about abstractions. And abstractions are rarely going to be easy, and sometimes could be hardly possible.
That’s just a fact. Confusing this fact with something as hyperbolic as “absurdly low quality” is a stunning example of the Dunning-Kruger effect, and frankly insulting as well.
I personally would call Linux low quality because many parts of it are buggy as sin. My GPU stops working properly literally every other time I upgrade Linux.
No one is saying that Linux is low quality because it’s hard or impossible to abstract some subsystems in Rust, they’re saying it’s low quality because a lot of it barely works! I would say that your “Chinese whispers” misrepresents the situation and what people here are actually saying. “the safety of this API is hard to formalize and you pretty much have to use it the way everybody does it” doesn’t apply if no one can tell you how to use an API, and everyone does it differently.
I agree, Linux is the worst of all kernels.
Except for all the others.
Actually, the NT kernel of all things seems to have a pretty good reputation, and I wouldn’t dismiss the BSD kernels out of hand. I don’t know which kernel is better, but it seems you do. If you could explain how you came to this conclusion that would be most helpful.
NT gets a bad rap because of the OS on top of it, not because it’s actually bad. NT itself is a very well-designed kernel.
*nod* I haven’t been a Windows person since shortly after the release of Windows XP (i.e. the first online activation DRM’d Windows) but, whenever I see glimpses of what’s going on inside the NT kernel in places like Project Zero: The Definitive Guide on Win32 to NT Path Conversion, it really makes me want to know more.
More likely a fork that gets rusted from the inside out
Somewhere else it was mentioned that most developers in the kernel could just not be bothered with checking for basic things.
Nobody is forcing any of these people to do this.
Random sidenote: I wish there were standard shortcuts or aliases for frequently typed commands. It’s annoying to type systemctl daemon-reload after editing a unit, e.g. why not systemctl dr? Or debugging a failed unit: journalctl -xue myunit seems unnecessarily arcane, why not --debug or friendlier?

I’m using these:

this is shorter to type, completion still works and I get my less options
Typing this for me looks like sy<tab><tab> d<tab> - doesn’t your shell have systemd completions?
It does but what you describe doesn’t work for me.
What doesn’t work? In any modern shell, when you are here and type tab twice you will get to daemon-reload. Ex: https://streamable.com/jdedh6
Doesn’t your shell show a tab-movable highlight when such a prompt appears? If it does, try that out. It’s a very nice feature.
journalctl -u <service> --follow is equally annoying

journalctl -fu
My favorite command in all of Linux. Some daemon is not working? F U, Mr. Daemon!
so this does exist - I could swear I tried that before and it didn’t work
I wasn’t sure whether to read it as short args or a message directed at journalctl.
Thankfully it can be both! :)
You gotta use -fu not -uf, nothing makes you madder than having to follow some service logs :rage:
That’s standard getopt behaviour.
Well, I guess fu rolls off the tongue better than uf. But I remember literally looking up whether there isn’t anything like -f and having issues with that. Oh well.
Would it be “too clever” for systemd to wait for unit files to change and reload the affected system automagically when it changed?

I’m not sure it would be “clever”. At best it would make transactional changes (i.e. changes that span several files) hard, at worst impossible. It would also be a weird editing experience when just saving activates the changes.
I wonder why changes should need to be transactional? In Kubernetes we edit resource specs—which are very similar to systemd units—individually. Eventual consistency obviates transactions. I think the same could have held for systemd, right?
Because the services sd manages are more stateful. If sd restarted every service the moment its on-disk base unit file changed [1], desktop users, database admins, etc. would have a terrible experience.
[1] say during a routine distro upgrade.
Shorter commands would be easier to type accidentally. I approve of something as powerful as systemctl not being that way.
Does tab completion not work for you, though?
I thoroughly enjoyed “Recoding America” which IIRC was written by someone very involved with Code For America (a founder, perhaps?–it’s been a while). Even still, while I sort of understand what Civic Tech is inside of government (for example, healthcare.gov or some other government form), I don’t really have a good idea about what Civic Tech looks like outside of government? What are some good examples of things non-government agencies have built? Do they require partnering with a government entity, or are there things that people can do for their fellow citizens independent of a local government? How can I find out if there is anything happening in my state or municipality?
Some of the classic “first wave” examples would be sites like everyblock (now defunct), govtrack.us or my own project openstates.org. These projects aim to help citizens better understand what’s happening at different levels of government (local, federal, and state respectively) and, in theory, give them more of a voice. These projects do not require partnership with governments (in fact, at the federal and state level it is very hard to “partner” in any meaningful way if you’re a small/independent team) but benefit from open data if it is available. (But they do not require it; web scraping is a big part of the job in a lot of cases, since improving a government interface often means dealing with a crufty HTML site a vendor built in 2004.)
If you look at MySociety, the pre-eminent UK civic tech organization, (https://www.mysociety.org/advice-and-support/) their flagship products are
These represent three common kinds of civic applications: those that provide a better interface to a government service (make it easier to file FOIAs), those that crowdsource information within communities, and those that bring together information from different governments or levels of government into a single place to make it easier for the citizen to digest. (Different units of governance often themselves have little incentive or ability to collaborate directly.)
I wish I could point you towards an easy way to find stuff in your state or municipality, that is a part of what we lost when we lost Sunlight & the brigades.