1.

Here’s another, way more hypothetical scenario where bubblesort is good:

You have a line of machines, with each machine n directly connected through a slow network link with machines n-1 and n+1. Each machine holds a large value(movement/exchange takes a long time), and you need to sort all of the values across the machines via some small proxy (comparison doesn’t take a long time).

Since a bubble sort will do the minimum number of transfers across the machines, it’s not worse than any other sorting algorithm. You could even parallelize it, because the order of operations doesn’t really matter - if each machine comparing it’s value to it’s neighbors doesn’t need to swap, all of the data has been sorted.

But this is very much a hypothetical scenario.

1.

I was going to try to make this example more realistic by saying that you want to sort a linked list while using a very small memory overhead (so that you can’t just stick it all in a vector, sort the vector, and turn it back into a linked list). But actually even in that situation, you could use comb sort for O(n logn) time. For each comb size, you’d need to walk the list a bit to create the first comb, but advancing the comb by 1 is constant time. The comb sizes decrease exponentially, so the sum of their sizes is O(n logn).

2. 14

Type assertions make up a large class of unit tests endemic to codebases written in dynamically typed languages. They, too, become redundant.

1. 7

Citation needed.

Unit tests test values. As values have types, this implicitly tests the type without having to explicitly do any work.

Conversely, static types do not check values, thus you still need unit tests.

1. 9

Where the type ends and a value starts is not that clear. Think for example about a builder pattern where a specific kind of builder restricts the parameters you can set at compile time. Or basic non-nullable types that eliminate the checks and tests for them. Or ADTs where you replace integers enumerating something by alternatives with extra attributes attached. Or integer types of limited range like only positive values. Or a list of http headers with type (ContentType, string)|(ContentLength,int)|… Or an object which gets changed from FooInitializer->Foo like a stricter builder pattern so you don’t need to check this.isInitialised(). Or …

I can go for a long time, but the point is - with good enough type system you make better types which lets you reduce the number of value, which lets you write fewer tests.

2. 9

In my experience type annotations are useful at a smaller scale than tests. Types catch a lot of mistakes that would be really boring to test at the same level. Tests at that level would be too tightly coupled to the implementation, and would hinder refactoring. It’s harder to fix a type error revealed by a higher-level test because the runtime error often does not occur at the point where the mistake was made. Type annotations also reduce the amount that needs to be covered in documentation.

3.

Indeed, I am yet to be convinced that types are more useful than tests.

1. 11

I’d frame it more as: types and tests work very well together. Types can make a whole bunch of tests redundant, but it’s still critical to have a comprehensive set of example based and property based tests.

1.

There is actually research into better testing through using a strong type system (formal verification) as a counterpart. For example, fuzzy testing can be much more effective that way, e.g. you can specifically generate inputs that will execute a different code path, speeding up testing considerably.

This was mentioned in an episode of Signals & Threads, titled Compiler optimization.

2.

From a small study we conducted, if you have a coverage adequate test suites, type correctness checks is redundant. However, not the other way around. Note that this is a very small study, and in Python with mypy (with admittedly limited type system), and I did not get a lot of bandwidth to expand this study further.

On examining the actual source code of projects, it was very difficult to find areas where adding types could have detected more errors, and it was also difficult to find places where I could be sure of type annotations sufficiently to avoid writing test cases.

1. 8

This study shows that a test suite with good coverage catches more bugs introduced by mutating the existing code than just type annotations. Which makes sense: type annotations won’t check behaviour, and errors caught by the type checker will very likely cause incorrect behaviour.

This is very limited perspective that doesn’t reflect how code is written, which is an interaction between human and computer. “Will the bug get caught?” is easily measurable, but it’s just a part of the picture.

“How long will it take for bug to be flagged after it’s introduced?” Mistyped code, depending on the language/toolchain, either won’t even compile, or will be flagged very early in the process (it’s likely to be underlined right in the developer’s editor, in the worst case it will fail linting phase which usually comes before functional tests and runs faster (okay, the study used mypy, it won’t run fast, but for the sake of argument let’s pretend it was pyright)).

“How hard will it be to fix the bug after it’s flagged?” Type errors are usually flagged right where the problem is, or one level up/down the call chain. Functional errors introduced by using wrong type can show far downstream and need to be traced up. Error was correctly flagged, but finding what caused it is work that would have been prevented by the typing system.

On top of that, in my experience, an expressive type system can provide a blanket protection against entire categories of errors and enforce best practices, many of which would be hard to enforce in a test suite, like:

• In Rust (unlike Go or C), I won’t forget to check for errors, because `Result` type has to be used. I can ignore the error explicitly, on purpose. Test suite won’t flag that you haven’t checked that `malloc()` haven’t returned 0, and you’ll have 100% test coverage (because problem is “not enough lines in the first place”, not “lines not covered by tests”, and if you thought to include the “`malloc()` fails” scenario in the test suite you’d have already checked for it in your code) and green tests until your software runs on a memory-starved machine under production load
• Similarly, in Typescript I won’t just access an attribute of a returned value that might be `null`. The test suite won’t help you spot that, like in previous scenario.
• In concurrent Rust, the typing system will let me know when I have a data race and the code won’t even compile. Good luck testing for that (`go build`/`go test` has a `-race` flag, not sure how useful it is in practice)
• Embedded Rust uses types (lifetimes and borrow checker) to ensure that a hardware register is not used simultaneously from multiple places, and many higher level APIs use typing to enforce safe usage of hardware features in compile time. While it’s relatively easy to test your logic using unit tests on a full computer with an OS, it’s not trivial to reflect embedded conditions in a way that could expose unsafe register access – I have no idea how would I do that, TBH, maybe hope qemu is good enough simulation (I’d have to somehow emulate the specific registers used by my code, though). That alone saved me a lot of debugging.

I also don’t think that mutating existing, already debugged code would introduce any of these issues.

I’m not saying types are more useful than tests, though. They complement tests, can replace the most boilerplate kind of tests (oh, the fact that you write just type declaration instead of manual assertions or boilerplate test cases also saves developer’s time and effort), will catch some kinds of errors faster, and you can focus on testing your code’s behaviour.

2. 7

You can emulate a type system in an “adequate” test suite, and not vice-versa. That’s pretty obvious. The adequate test suite becomes much smaller when you can lean onto a strong type system (Python’s type hints are good but not sufficient in my opinion) to sort out the basics, and you don’t have to ensure it is comprehensive in that regard because your compiler/type validator will do it for you.

1.

You can emulate a type system in an “adequate” test suite

Not actually true. Test cases test a single point of execution; type systems check a whole swath of possible executions, often infinitely many.

Consider a type signature on a function that takes a String and returns an int. The type checker will verify that it returns an int for every possible input string, of which there are infinitely many. If the function actually returns a bool instead of an int, but only if its input is “boolean-please”, that’s hard to catch with a test case but easy with types. Or more realistically, imagine it’s an integer parsing function which accidentally returns `None`(Python)/`Undefined`(TS) if the string has a sign and also a leading zero. Yes, you should have a test case for that, but notice that that test case is testing this one particular example whereas the type system is checking all possible execution paths at once.

Or consider the type signature `function sort<T: Comparable>(list: Vec<T>)`. This says “for every type `T` such that `T` implements `Comparable`, this function takes a vector of `T`s and returns nothing”. It guarantees that the function does not invoke any method on `T` except for those available on `Comparable`. Pretty sure I’ve seen bugs of this nature, where a function accidentally invokes operations it shouldn’t and thus works for only a subset of the inputs it should.

(Notice that I’m not saying “types are better than tests”. A lot of this thread reads as a “types vs. tests” debate, which I find odd since they complement each other so well.)

1.

rough draft:

``````test_foo_method_takes_only_int:
for a in all types:
if a is int: continue
assert_fail(foo(a.first))
``````

(and yes, this isn’t exhaustive. It could be extended to be that way, by integrating the equivalent of what is pytype/mypy/… for the language in question and doing it)

It’s not very sensible to do, but once a language is sufficiently late bound that it needs shenanigans like that, it probably supports them, too.

3.

From a small study we conducted, if you have a coverage adequate test suites, type correctness checks is redundant.

And how do you get to that adequate test suite? Surely you found many bugs while writing it? How fast could you fix those bugs? My personal experience with types is that they speed up that initial phase, because so many of my errors are caught right there by the compiler, and the error message is very close to the actual cause of my error. If I skipped that step and went straight to the test suite, many runtime errors would not be so obviously linked to the root cause, my bugs would be harder to fix, and my initial development would slow down significantly.

I’ve experienced it first hand trying to write an Earley parser in Lua. I wasn’t just slowed down, I was incapable of understanding why I was adding 2 functions, or why I was calling a number. This was a 50 lines algorithm.

2.

Types do help me figure out where the logic is not watertight to almost an arbitrary degree I choose. It’s an acquired skill to structure your data in a way that makes illegal states impossible to represent and it’s obviously always limited. Other cool technique involves never branching on values, but parsing and branching on parse results instead. The results tend to be pretty good in practice. Well, at least in my experience.

I would say that strict typing especially shines for format conversions, transformations and similar tasks where I want to make sure that for every possible input there is a non-exceptional output. It’s just easier for me to convince myself of the correctness by construction than by trying to devise good enough tests.

This is probably something one has to explore for themselves in practice, ideally with a language that actually has an expressive type system and not just a way to specify memory layout.

1.

I’m absolutely a proponent of strong static typing, but non-dependent type systems are very limited in the grand scheme of things in their capabilities, fundamentally. Depending on which niche you are working on, they may or may not be that useful. Stuff like parallelism (not the embarrassingly parallel kind), emulating some other system’s behavior (where you effectively recreate the “object system” of something else), etc are areas where they don’t shine too much.

4.

I don’t agree that unit tests have a bunch of type checks (unless you have APIs that are returning varying types based on input, like with the second parameter to `open` in Python, but even then, that’s getting into dependent type territory).

Having said that, there is a category of unit test that is way less needed: the structural transformation test. Like if you have a function that takes and ID and returns a User, you could have the basic unit test, but it’s very likely that it does the “obvious thing”, if only because it is implemented.

I think most people write the test anyways just to check or something, but if you are able to conjure up a more complicated type from a less complicated one, it’s hard to write an implementation that is wrong unless you did it on purpose IMO.

5.

All types constrain values. It’s fundamentally what they are.

2.

We don’t yet have evidence that they do, at least not so far. Furthermore, from our admittedly small study with a limited number of small python programs tests can make types redundant. Not the other way round.

1.

This study does not test that though. It tests the number of bugs in programs. But it’s not “typesystem vs #bugs” it is “typesystem vs #bugs vs speed”. In other words, development will just move faster until there are “too many bugs”. So without also measuring speed of development, the study is kind of pointless.

1.

What the study does is to use Mutation Analysis, a principled fault injection technique for evaluating the effectiveness. Mutation analysis exhaustively generates all possible first order faults, and finds which of these faults would be detected by the given test suite/static analysis technique.

As far as I understand, mutation analysis/mutation testing is the best available tool we have for evaluations of this kind.

1. 5

As far as I understand, mutation analysis/mutation testing is the best available tool we have for evaluations of this kind.

Regardless, it’s nowhere near enough. A quick look at your study tells me they didn’t test how much time it took for a human dev to reverse the mutation that was detected. And I’m pretty darn sure that mutations caught by the type system are fixed faster when you get a type error, and slower when all you get is a bunch of failed tests.

2.

Sometimes it’s sufficient to use enough anecdotal evidence. If I had to have scientific evidence for every decision I have to make, I wouldn’t get anywhere.

No matter what the study does/did, it doesn’t help with the original question. Because as I said, you can just reduce the number of bugs by (manual) testing. But you have to spend a lot of time to do so. This time hadn’t and can’t be measured by any study easily, so the original question (which ultimately is about productivity) can’t be answered.

It’s also not a black/white question, as it might depend on many things, including the size of the project. Saying “a chainsaw is more productive than an axe” is also wrong without mentioning the circumstances.

2.

Python type annotations are not enforced. If they were, then they would indeed make tests for type validation redundant.

1.

Please see the paper. If you have unit test coverage, you can be reasonably sure that the covered parts are type safe. Not the other way around even if types were enforced (our results were based on running type checks).

3. 5

If you are able to delete the runtime checks from the Ruby code and replace them with static types, then the runtime checks were never necessary in the first place. The semantic equivalent to a type-check would be an assert statement, not a check and raise. So either the Ruby code is wrong or your translation is wrong. You should only need to validate data at the program’s boundaries with untrusted sources. At the interior of the program, it should expect data to already be valid. The superfluous runtime checks in the Ruby code are indicative of a lack of discipline around what data is trusted or not trusted in the program and that means it’s likely there are security vulnerabilities present.

EDIT: A classic mailing list entry on how to properly sanity check arguments in Ruby: https://blade.ruby-lang.org/ruby-talk/100511

1. 19

The discipline argument has always been odd to me. Have you worked on large projects with multiple different people? How do you expect everyone to keep all the assumptions about what data has been validated (and how) in their heads, how do you synchronize it even?

Offloading a hugely amount of tedious work that a computer can do quickly and automatically to humans is, in my opinion, a ridiculous proposition.

1.

How do you expect everyone to keep all the assumptions about what data has been validated (and how) in their heads, how do you synchronize it even?

I referenced this in my comment. Use an assert(). Redundant checking of invariants is wasteful and sloppy.

1.

You’re ranting sufficiently hard that it’s difficult to tell what you’re trying to say.

It sounds like you’re contrasting `assert(condition)` with runtime checks, but `assert(condition)` is a runtime check. It’s executed at runtime, not at compile-time, and it checks whether `condition` is true. The difference between the two is whether the error is expected to be caught or not, I think?

My best guess at what you’re actually trying to say is:

• For all internal facing interfaces, assume that no mistakes are made and all data is well formed. If you’re going to check at all, check only using `assert(condition)` statements in unit tests. (That’s what the link you shared says, though you didn’t mention unit tests.)
• For all external facing interfaces, don’t trust your users and put runtime checks on everything.

Except I’m not sure why you’re attacking this article (“superfluous”, “lack of discipline”, etc), because it seems perfectly likely that the interface it discusses is external facing, in which case it is following your advice.

Responding to your actual point, that’s a very good approach and I try to follow it too, with one very large exception. If you’re working in a medium to large codebase (say 10,000+ LOC), it’s unrealistic to expect all data to be well formed and all function arguments to be correct. Thus I aim to organize code into modules (which means different things in different languages) that are impossible to misuse. If a “module” needs to maintain an invariant, it must either (i) have an interface that makes it impossible to violate that invariant, no matter what the calling code does, or (ii) check that the invariant is obeyed in its boundary, and error if it’s violated. This way every “module” guarantees that its invariants are obeyed within it, and if they’re not it’s a bug in the module (and you don’t have to look elsewhere).

1.

check only using assert(condition) statements in unit tests.

assert statements are not only for unit tests. They are also enabled in debug builds.

Except I’m not sure why you’re attacking this article (“superfluous”, “lack of discipline”, etc),

Not sure what you mean by “attacking.” If you mean that my intent was to personally demean or disrespect the author, then that’s incorrect. My intent was to accurately describe what I observed in the article. Those are neutral words in my vocabulary when discussing engineering.

it’s unrealistic to expect all data to be well formed and all function arguments to be correct.

This statement is equivalent to saying that it’s unrealistic to produce working code.

If a “module” needs to maintain an invariant, it must either (i) have an interface that makes it impossible to violate that invariant, no matter what the calling code does, or (ii) check that the invariant is obeyed in its boundary, and error if it’s violated.

This is false. You’re expressing a personal API design philosophy or engineering discipline but not a hard reality. For a topical example, the standard multiplication operator does neither (i) or (ii) and freely allows incorrect usage that results in overflow. Often it’s not feasible to always check at runtime that the operands of the multiplication operator don’t result in an overflow nor have some special paired number type that guarantees that its members don’t overflow when multiplied. The reasonable approach is to structure the program such that it’s an invariant that the operands at the multiplication site don’t overflow, with a debug assertion. Not only is this the reasonable approach, this is the approach Zig and Rust take.

1.

Not sure if you’re deliberately trolling… if you’re not, know that (i) you have misunderstood what I said, and (ii) I still don’t know what you’re advocating for, and it would be helpful if you clarified that instead of EDIT: arguing against your repeated misunderstandings of other people’s positions.

1.

Can you explain what it is you believe I have misunderstood? I thought my previous post was sufficiently comprehensive.

I don’t believe I was “attacking” you per my previous comment.

1.

Can you explain what it is you believe I have misunderstood?

I don’t believe that would be productive. What would be productive is if you simply wrote down what you’re advocating for. I made a guess above, is that accurate, modulo that you would put `assert`s (which execute at runtime in debug mode but not in production) in the code as well as in unit tests?

I don’t believe I was “attacking” you per my previous comment.

Apologies, I should have said “arguing against” instead of “attacking”. A lot of your phrasing makes it sound like you’re angry, though I realize now you’re not. Edited.

1.

I made a guess above, is that accurate, modulo that you would put asserts (which execute at runtime in debug mode but not in production) in the code as well as in unit tests?

Your initial guess was correct, sans the implication that asserts should only be run in unit tests.

1.

Ok. First of all, please realize that what you were suggesting wasn’t clear until now. Actually I’m still not entirely clear on how many `assert()`s you suggest having. Should most internal invariants be checked with `assert()`s in well chosen locations? Your first comment seems to me to say “yes” in some places and “no” in others, to that question.

Let me try to state the crux of the matter precisely. Say you’re working in a company with a hundred other employees and 100,000+ lines of code, and everyone follows your advice to the letter. What will happen is that you’ll end up with bugs that are extremely difficult to debug, because the place they manifest is far away from the actual bug.

For example, there’s a data type `Account` that requires a complicated invariant. If this invariant is violated, it doesn’t cause an error in the `Account` code. Instead it causes another invariant in a different class `History` to be violated (because the code that interlinks the two relies on the `Account` invariant), and then some code that uses `History` ends up accessing an array out of bounds. So now you see an array out of bounds error (hopefully you’re not working in C++, so it is at least guaranteed to stop there), and you need to work backwards to see that it came from a bad `History`, and then work backwards from there and see that it came from a bad `Account`, instead of one of the dozen other ways a `History` could have been corrupted if there was a bug elsewhere in the codebase. Oh, and this error wasn’t caught in your unit tests and only happened in production, once.

In contrast, if you consider the boundary of `Account` and of `History` to be “external facing” and check their invariants on their boundaries, even in production, then you’d get a very clear error message at the actual cause of the error: an invalid `Account` construction.

I’ve worked at a company where all the code assumed you wouldn’t make a mistake, and at a company where the code assumed you might make a mistake and would error if you used it wrong. Issues at the former company were so much harder to debug, it was a nightmare.

1.

Actually I’m still not entirely clear on how many assert()s you suggest having.

You can have zero asserts or an unbounded amount of asserts. Assertions on input data are not significantly wordier than type declarations.

What will happen is that you’ll end up with bugs that are extremely difficult to debug, because the place they manifest is far away from the actual bug.

Then assert at the earliest location possible. There is usually little downside.

1.

Then assert at the earliest location possible. There is usually little downside.

The downsides of runtime checks, according to you:

the runtime checks were never necessary in the first place

The superfluous runtime checks in the Ruby code are indicative of a lack of discipline around what data is trusted or not trusted in the program and that means it’s likely there are security vulnerabilities present.

Redundant checking of invariants is wasteful and sloppy.

However, you’re saying that you can assert with little downside. So this whole time you’ve been quibbling about the difference between:

``````  unless min_length.is_a?(Integer) && min_length >= 0 && min_length <= 255
raise "Minimum length has to be between 0 and 255"
end
``````

and

``````assert(min_length.is_a?(Integer) && min_length >= 0 && min_length <= 255)
``````

???

1.

Assertions are semantically different than explicit checks that return errors or raise exceptions. An assertion means that the caller is required to make sure arguments satisfy certain properties, otherwise unspecified behavior may occur. An explicit check is a specification that in the event that arguments are invalid, the function will return an error or raise an exception.

Given the meaning of an assertion, it’s valid to disable them in production. That’s why this distinction was created.

1.

You say this like it’s an unassailable fact, but it actually varies between languages and between developer conventions.

My Google searching suggests that Ruby doesn’t even have an `assert` statement outside of a unit testing module, so it’s a strange place to start this discussion.

Rust has both `assert`, which runs in both debug and production, and `debug_assert` which runs only in debug mode. You often need to use `assert` instead of `debug_assert` to uphold Rust’s safety guarantees. E.g. if a public non-unsafe function could cause memory corruption if passed invalid arguments, it is incorrect for that function to not check those arguments in production, according to Rust’s safety guarantee. Rust’s safety guarantee is taken pretty seriously: e.g. I think most Rust library authors would remove any dependency they knew violated it.

As far as developer conventions: it is entirely possible for developers to implement an assertion using an exception. An assertion (as you’ve defined it) is a semantic concept, which has many possible implementations. There are all sorts of reasons that a developer might choose to implement an assertion with an exception.

For example, here’s the code from the article, written using an assertion (remember, as you pointed out, assertion is a semantic concept, so it’s perfectly fine to implement it using `raise`):

``````// Initialize with a length between 0 and 255, inclusive.
// The behavior is unspecified if the length is not in this range.
def initialize(min_length)
unless min_length.is_a?(Integer) && min_length >= 0 && min_length <= 255
raise "Minimum length has to be between 0 and 255"
end
....
end
``````

The difference is that I added a comment saying that the behavior is unspecified if the argument is invalid. (It should be in the function documentation; I don’t know the syntax for that in Ruby but pretend the comments are official documentation.) To make debugging easier, the “unspecified behavior” is implemented as an exception that will be raised, even in production.

otherwise unspecified behavior may occur

The central point that I and others have been trying to convey, is that unspecified behavior in large systems is really bad. It makes debugging a nightmare. You need some, but should keep it local.

1.

At a high level I think your main point of contention is that redundant validity checks (so-called “defensive programming”) result in a more resilient system whereas in my opinion it’s a symptom of a system that was designed without proper discipline around trusted / validated values. These are conflicting principles.

The “defensive programming” principle, if taken to its logical conclusion, results in a system where every argument is validated upon every function call and its not clear which part of the program is responsible for the primary validation of input data. Long-term maintenance and refactoring becomes difficult because it’s not clear who does what. Whereas with the “unspecified behavior” principle, if taken to its logical conclusion, results in a system that may start failing silently if there is a single bug in the program in a difficult-to-debug way, though it’s more clear what part of the system is responsible for validation and makes long-term maintenance and refactoring easier.

I think you want to bias towards building the latter system for hopefully obvious reason (e.g. explicitly checking a value you know to be valid over and over again serves no purpose and is confusing to people who read the program). To mitigate the issue of incorrect code / logical errors / “bugs”, asserts provide a way to materialize into code the abstract specifications about what functions expect. They serve as living executable documentation that can be selectively enforced depending on the paranoia of the programmer. If when engineering your system, you expect the invalid values or states to happen only when there are bugs in the program, then it’s reasonable that you enable assertions during testing and debugging and disable them once you are ready for release and all functional specifications of the program have been QA tested.

One issue is that lots of web programmers never see a proper “release” so the debug-enable / release-disable convention with assertions isn’t practical to them. The code is never done, there is no comprehensive QA, it’s never golden master. Therefore in those situations I’d recommend just always leaving assertions enabled, regardless of language used. I’d still recommend keeping the assertion concept distinct from normal logic in the program, if only to provide option value and serve as documentation for other programmers and perhaps static analysis tools.

1.

I’m strongly against redundant validation. Validation should happen in exactly one place. But let’s end this here.

1.

Thanks for the chat, I enjoyed it and I hope the exposure to a different point of view was helpful. Perhaps you can suggest using asserts more liberally to your team the next time you find yourself working in a large codebase that has no mitigations for programmer error, as a sort of compromise to reduce the pain of debugging. Best wishes.

1.

Sorry I can’t say the same, I found the conversation extremely frustrating. I can’t say I’ve learned anything, because even at this point I’m not sure if we have different views. Please try to learn from the fact that this was a huge thread with a lot of people being very confused at what you were saying. You need to give way more context and explanation up front.

2.

You seem to miss the point that some bugs only surface in production.

1.

If you don’t trust your code to operate correctly in production then leave assertions on. Use a process monitor to monitor when assertions fail and provide reasonable feedback to the user. Or use a more restrictive language.

1.

So your suggestion is to write the Ruby example exactly as written in the original article? What are we even debating at this point?

1.

That’s not my suggestion. There is a semantic difference between assertions and explicit checks. In simple terms, assertions are the runtime version of type-checks. Explicit checks are new behavior. Assertions afford you the ability to turn them off and indeed I run my production code with them off. I recommend that everyone build their systems such that they can feel confident turning off assertions in production.

2.

So what is wrong with the next part of @justinpombrio’s reply?

[…] it seems perfectly likely that the interface it discusses is external facing, in which case it is following your advice.

1.

My original comment pointed to the semantic inconsistency between the original Ruby version and the translated Crystal version. The article stated that the validation check was deleted in the translated version implying that it was redundant validation code. Redundant validation code signals a lack of discipline in terms of how trusted and untrusted values are handled throughout the code. TFA’s author clarified that the semantics of the translated code were in fact different from the original and that the validation was not simply deleted as originally stated in the article but moved.

2.

How do you expect everyone to keep all the assumptions about what data has been validated (and how) in their heads, how do you synchronize it even?

I recognize this is an OO idealist perspective, but like sandi metz says, if statements are type checks.. don’t type check.

Parse, don’t validate applies to dynamic languages just as much as static, note that these ideas becomes easier when you work with immutable data so you only have to check invariants in constructors.

After you define your types (classes, records), you write your code that depends on those assumptions as methods, IMO this is better in languages like clojure where you can define these away from type definitions via a protocol.

This to me is the core idea that people miss who either do OO wrong, or knock it.

I’m not saying its easy, I think in fact it still does require a lot of discipline, but its a well defined discipline.

1.

I don’t think that has anything to do with OO? The same argument can be used for statically typed functional languages like Haskell.

1.

I don’t think that has anything to do with OO?

Yes, that’s actually my point.

What is interesting to me is this form of classification + polymorphism, and importantly the kind of dynamic code that the author was trying to replace with static typed code wouldn’t exist in this paradigm.

To be clear, my whole point is that this debate is actually bigger than static v dynamic, or FP v OO, its about generic code & there are really good examples of that in all these paradigms.

2.

I generally agree with this. In this library’s case, it is at a program boundary, so I think the validations are sensible.

1.

If that’s the case, then the translation is not equivalent. The initialize method should take a wide integer type and then do the sanity checking for the correct range inside that method just as the Ruby code does. Since you replaced the argument type with a narrow integer type, that implies that you have moved the sanity checking for the value’s range to before the initialize method is called. That changes the contract of the translated method.

1.

While you are technically correct in the sense that the domain of the two function is not identical, I think you’re being overly pedantic or disingenuous. The whole point is that static type checking allows you to statically restrict the domain in a way that is simply not possible in dynamic languages. The only reason why the domains are not identical, is because Ruby lacks the ability to restrict domains at all. As a result, the Ruby function is not total.

In other words, of course the Crystal version should not take a wider integer range as input. That way, library users cannot pass in arguments that result in an exception, a thing that is not possible in Ruby.

1.

disingenuous

An accusation of being disingenuous requires an assumption of bad faith, which I think would be unreasonable and/or unwarranted in this context.

The only reason why the domains are not identical, is because Ruby lacks the ability to restrict domains at all.

Where input validation happens is incidental to whether the program is statically or dynamically typed. It’s not a given that the reason the input validation occurred in the `initialize` method originally is because Ruby lacks static typing. An alternative Ruby implementation could have been as follows:

``````class Foo
# min_length is expected to be integer-like in the range of 0 <= 255
def initialize(min_length)
@min_length = min_length
end
end

obj = Foo.new

# make sure min_length is according to spec
unless min_length.is_a?(Integer) && min_length >= 0 && min_length <= 255
raise "Minimum length has to be between 0 and 255"
end

obj.initialize min_length
``````

In other words, of course the Crystal version should not take a wider integer range as input. That way, library users will helpfully know how to actually use it correctly.

I don’t think that’s a given either. The input validation has to happen somewhere. In the original code the author says that it was specified to occur in the `initialize` method. When translating a program from one language to another I think it’s less error-prone to not change input validation sites and method contracts as much as possible. Every call site now needs to be aware of the new semantics of the `initialize` method.

1.

Where input validation happens is incidental to whether the program is statically or dynamically typed. It’s not a given that the reason the input validation occurred in the initialize method originally is because Ruby lacks static typing.

As @EvanHahn already pointed out, the `initialize` function is the library boundary. I therefore think it is reasonable to assume that input validation has to happen in the initialize method.

When translating a program from one language to another I think it’s less error-prone to not change input validate sites and method contracts as much as possible.

I don’t think that is unreasonable, but I also think it depends on the context. You could start out with a exact replica, make sure everything type checks, and then gradually add stricter constraints, letting the compiler help you maintain functionality along the way. If you aren’t taking advantage of the features of the new programming language, why are you translating your program at all?

Every call site now needs to be aware of the new semantics of the initialize method.

True, but the compiler will tell you. So there’s no chance you will accidentally misuse the initialize method. It is simply not possible.

1.

As @EvanHahn already pointed out, the initialize function is the library boundary. I therefore think it is reasonable to assume that input validation has to happen in the initialize method.

Library boundaries are not necessarily trust boundaries. In fact, they often should not be since it’s usually only a small percentage of your program where you are accepting untrusted user input.

1.

I… am not sure what you are saying here? Aren’t library boundaries literally the exact points in a library where you are accepting “untrusted user input”? Perhaps I am misunderstanding what you mean by trust boundary, or what kind of untrusted user input you mean?

To clarify, in the case of a library I use the word “user” to denote the developer using the library. The “input” is the arguments that the user passes to the library boundary functions. That user input is (and should be) untrusted by the library author.

1.

I… am not sure what you are saying here? Aren’t library boundaries literally the exact points in a library where you are accepting “untrusted user input”? Perhaps I am misunderstanding what you mean by trust boundary, or what kind of untrusted user input you mean?

Untrusted user input means input coming from outside of the process, in particular the actual input to your program from its users. Any agent that your program can not trust to produce valid input.

There is no functional trust difference between library code and your own code, it’s simply code that is packaged separately. Regardless of how the library code is packaged, you must trust it does what it is specified to do and it must trust that you use it the way it is specified to be used. The same trust applies to your own code. If you cannot trust a library or the library cannot trust you, you cannot link to it liberally and expect your program to function correctly.

1.

Aha, I’m glad we cleared up that confusion! I disagree with what your opinion regarding library code, but I recognize that it is a valid stance.

2.

Yeah, the translation is not equivalent and the contract is different. A one-to-one port should do what you describe.

4. 2

Previously, on Lobsters, nobody had anything to say about these concepts. I am disappointed that we are now engaging with a nearly-content-free summary provided by a racist neoreactionary. (Receipts are here, particularly these emails.) If we want to discuss the machine-learning papers, then we should prefer commentary from people who understand the underlying maths and who are not constantly trying to nudge the reader towards their hateful beliefs.

1. 6

we should prefer commentary from people who understand the underlying maths

I strongly agree with this part. I also have a few problems with some of Scott’s other views, but I think that’s unimportant in this context. The main thing is that we would be reading stuff by people who know what they’re talking about.

2. 8

Yes, he follows the classic racist neo-reactionary hate-spreading strategy of writing a long and detailed argument against neo-reactionary beliefs a decade ago and not writing much of anything about race that I recall. Are you worried that people who read this article about machine learning are going to become neo-reactionaries by contamination?

This post is being upvoted because it’s a well explained summary of a complicated topic. If you know of another introduction that’s as accessible from someone who understands the underlying math better, please share it!

1. 4

Yes, he follows the classic racist neo-reactionary hate-spreading strategy of writing a long and detailed argument against neo-reactionary beliefs a decade ago and not writing much of anything about race that I recall.

Well, the leaked email in which he defends reactionaries and “human bio diversity” dates to several months after that. And I think that to this day he’s a fan of Charles Murray, which on its own I consider enough of a red flag to avoid anything he writes.

3. 4

Previously, on Lobsters, nobody had anything to say about these concepts. I am disappointed that we are now engaging with a nearly-content-free summary

Shows the power of a good summary?

1. 3

I’d say it shows the power of a site that it popular on HN. The submission https://news.ycombinator.com/item?id=38438261 reached #2 on the frontpage on 2023-11-27.

1. 5

I’m just saying it’s not surprising that a well written pop article gets more engagement than a paper that presupposes expert knowledge of the state of LLM research, HN or here.

4. 5

The best you can come up with is ten-year-old email in which the author explains why he chose not to plug his ears when neoreactionaries were talking? I’m not saying he’s racist or not – it’s always hard to tell with these bright young dudes who believe they can think their way past human problems with pure reason – but these “receipts” are weak as hell.

Between the presupposition that he’s a racist and the description of this summary as “nearly-content-free”, if there’s an agenda to discuss here, it might be yours.

1. 6

This is not just undeniably racist, he also attempts to “hide his power level” like many racists do:

1. HBD is probably partially correct or at least very non-provably not-correct. https://occidentalascent.wordpress.com/2012/06/10/the-facts-that-need-to-be-explained/

[…]

(I will appreciate if you NEVER TELL ANYONE I SAID THIS, not even in confidence. And by “appreciate”, I mean that if you ever do, I’ll probably either leave the Internet forever or seek some sort of horrible revenge.)

(To this day, I believe Scott has not followed through on his threat.)

1. 5

This is probably because I’m not running in the circles that use the acronym much but I still haven’t figured out what HBD expands to (if anyone wants to tell me, I would be extremely appreciative of you spending your time educating me, but please do it by DMs because I’m about to argue this whole thread is off-topic).

I have an extremely broad view of what’s technical enough to be on-topic for lobste.rs as my posting history hopefully suggests, but I think this criticism is off-topic. It doesn’t even try to tie the criticism of the author to the content they posted - I’d have a very different opinion here if there were a cogent argument of how the article is misleading or wrong, explained by the author’s underlying biases (which, for all I know, are pretty heinous, I just am not qualified to interpret these particular receipts). That would make those biases clearly relevant. As is, we’re being asked to discredit the post solely based on who wrote it, and the only evidence that the person is inherently untrustworthy is a non-technical argument that I think only makes sense to someone who already agrees with it.

With regard to the posts - the new paper appears to be going one step further by demonstrating that you can disentangle the representations discussed in the old paper by using autoencoders, which is pretty neat and doesn’t seem inevitable, so it seems like a new development worth a second look. I agree that this particular writeup isn’t all that helpful - the whole business specifically hinges on one sentence “On the far left of the graph, the data is dense; you need to think about every feature at the same time.” and I genuinely cannot figure out what in technical terms is meant by this. I think they need to unpack those ideas in order to provide something helpful, but I’m not sure if they understand either. The writeup seems fairly fond of handwavey explanations, and I’ll want to get back to the original Anthropic paper to have any hope of untangling it.

1. 4

DMed.

5. 3

I’ve been reading Astral Codex for several years and have seen no evidence he’s a ”racist neoreactionary”. He’s got a very comprehensive essay (linked in another comment here) tearing down the alt-right ideology, and he describes himself as basically progressive. He’s not fond of the more leftist “woke”/“sjw” ideologies, but I’ve got problems with those too, and I think it’s fair to debate these things.

If he wrote something reprehensible ten years ago in private email, and has since recanted and abandoned ties with that belief system, I’m willing to forgive. People do change their ideologies over time.

I found the article informative. I’m not at all an expert on neural networks, and I’m sure if you are you would find it simplistic, but the original “Toy Models” paper you linked isn’t IMO approachable by non-experts. I read the first few pages and there was way too much jargon and assumed shared knowledge for me to get anything from it.

I see a lot of articles posted to lobsters written by people who are not experts on the topic and who sort of oversimplify to some degree — stuff like “hey, I wanted to learn how malloc works, so here’s my toy implementation”. Sometimes these annoy me and I’ll point out mistakes, but this kind of intro can be really useful for people learning about the topic.

5. 15

In case you don’t understand (like initially myself):

Go https://zig-play.dev/ and paste in:

``````const std = @import("std");

const AAAA = struct {
foo: [100]u32,
};

fn aaaaa(a: AAAA, b: *AAAA) void {
b.*.foo[0] = 5;

std.debug.print("wtf: {}", .{ a.foo[0] });
}

pub fn main() !void {
var f: AAAA = undefined;

f.foo[0] = 0;

aaaaa(f, &f);
}
``````

Result: `wtf: 5`, while the `a` was assumed to be a copy.

1. 6

Ah so “anything” rather than everything.

1. 2

good catch! I’ve updated the article title (can’t change the lobster one though)

2. 1

And is that expected behavior, or a bug that will be fixed? If it will be fixed, it doesn’t seem like a big deal, unless the point is that the bug reveals the underlying behavior of things being passed by reference under the hood?

1. 1

it is UB (I think), not a bug (by design), just how Zig works for now

1. 1

The zig language has a spec IIRC. So if this behaviour isn’t in the spec then it would be a bug, no?

1. 7

Oh, it’s explained in the link at the top of the post. Which I followed earlier but (impressively) failed to notice what it said.

https://ziglang.org/documentation/0.11.0/#Pass-by-value-Parameters

Structs, unions, and arrays can sometimes be more efficiently passed as a reference, since a copy could be arbitrarily expensive depending on the size. When these types are passed as parameters, Zig may choose to copy and pass by value, or pass by reference, whichever way Zig decides will be faster. This is made possible, in part, by the fact that parameters are immutable.

So it’s not undefined behavior in the C sense, it’s merely implementation defined behavior. But this is one hell of a footgun. I’m very surprised this simple program has two possible behaviors. I would expect that the compiler would sometimes pass by reference, but conservatively ensure that the fact it was doing so was never visible, thus making a copy in this case.

1. 5

The optimisation in itself is reasonable. The fact that it’s not transparent, i.e., it doesn’t preserve value semantics, seems to be the bug.

6. 5

Hm no mention of integer overflow. A new language seems like an opportunity to do something about that, and report on the performance cost

As far as I know, the philosophy of the language isn’t to be fast at all costs, so it seems worthwhile

1. 4

I did a silly μbenchmark checking integer years ago and found that it’s not that much overhead, at least on the x86 architecture, but it may cause larger code to be generated.

1. 2

Microbenchmarks are somewhat misleading for this kind of thing because you are most likely to see overhead from additional branch predictor state. If the core is doing the right thing, branch-on-overflow instructions are statically predicted not taken and consume branch predictor state only if they are ever taken. I think x86 and 64-bit Arm chips all do this. RISC-V can’t because they can’t differentiate this kind of branch (they don’t have flags and so the branch looks like any other branch and so they can’t tell that this is a really unlikely one without additional hinting).

2. 2

I have yet to play around with Hare, but what I can find in the spec under Additive arithmetic is that signed integer overflow is at least not undefined behavior:

6.6.37.3 If an operation would cause the result to overflow or underflow the result type, it is truncated towards the least significant bits in the case of integer types, and towards zero in the case of float types. In the case of signed types, this truncation will cause the sign bit to change

1. 3

Well, kind of. The semantics of arithmetic operations in the presence of overflow are well defined. The semantics of arbitrary code that does arithmetic may not be. If some code does `a+b` on two positive values and the result is negative, then it will almost certainly do the wrong thing unless it’s explicitly checking for overflow.

2. 2

Ah thanks for the correction! Missed that

3. 2

It is mentioned in passing, in the “undefined behavior” section:

For a start, much of the behavior that C leaves undefined is defined by Hare. For instance, signed overflows are defined by the specification.

7. 2

If you’re increasing the bit depth of the single greyscale value `1234 5678`, wouldn’t it be more accurate to turn it into `1234 5678 7878 7878` (repeating the last byte) rather than `1234 5678 1234 5678`? That’s what my intuition says, but I don’t have a formal argument for it.

The three-value and four-value CSS hex color syntaxes are defined as repeating the first hex digit for each channel (RGBA). Color `#fa0` is equivalent to `#ffaa00`, and `#fa08` is equivalent to `#ffaa0088`. There is no CSS syntax that turns two digits for a channel into more digits, though, so we can’t compare CSS’s design to my idea above.

1. 8

Why just the last byte, rather than, say, the last bit, or the last 4 bits? That seems quite arbitrary.

Consider what happens when we use your proposed mechanism to extend 3->6 digits (no meaningful difference between decimal and any other base here):

126 -> 126 666
127 -> 127 777; delta = 1111
128 -> 128 888; delta = 1111. so far so good
129 -> 129 999; delta = 1111
130 -> 130 000; delta = 1. uh oh
131 -> 131 111; delta = 1111

Now use the linked article’s mechanism:

126 -> 126 126
127 -> 127 127; delta = 1001
128 -> 128 128; delta = 1001
129 -> 129 129; delta = 1001
130 -> 130 130; delta = 1001
131 -> 131 131; delta = 1001

(Not so coincidentally, this mapping can be implemented as—or, rather, is rightfully defined as—a multiplication by 1001.)

2. 4

1234 * 99999999 / 9999 = 12341234

1. 3

In more detail: first of all, this is in decimal instead of binary/hex, for clarity. 1234 is a 4-digit decimal color, and we want to convert it to 8-digit decimal. Dividing by 9999 (the largest 4-digit decimal value) converts 1234 to a fraction between 0 and 1 inclusive. Multiplying by 99999999 (the latest 8-digit decimal value) converts that fraction to 8 digits. Though you need to do the multiplication before the division because integer math.

3. 1

In CSS syntax, for RGBA, each digit in “#ffaa0088” is a hexadecimal digit (4 bits). The last byte is 2 of those digits.

In the article, for greyscale, each digit in “12345678” is a binary digit (1 bit). The last byte is all 8 digits. Repeating the last (and only) byte would be “12345678 12345678”.

4. 1

Moonchild’s sibling comment is a good answer to the accuracy question. I wouldn’t have known myself! CSS colors are actually a good example of the proposed technique in action since each 4-bit hex digit gets expanded to an 8-bit channel intensity by copying. That could’ve been a nice lead to my article.

8. 7

With tests, I can refactor my code. Without tests, your code will get worse and worse because you don’t feel confident enough to refactor

I wonder to what extent this is true. Poor quality tests that test the implementation is what it is also inhibit refactoring.

How do we know that codebases without tests are less refactorable? Not being refactored is evidence, but weak evidence - it might be a sign that refactoring is not needed.

How do you get out of this? Tests need to be non-negotiable. Tests are part of being a professional engineer.

How very Taylorist. The high paid people who don’t do the work will tell the plebs exactly how they should do the work.

1. 10

How do we know that codebases without tests are less refactorable?

Experience. Every large codebase I’ve seen has had a quagmire somewhere or other with the following properties:

• It’s bad. In the sense of much more complex than it needs to be, and filled with bugs, and being poorly tested, and probably also being extremely difficult to test even if you wanted to (e.g. from mixing state and IO with logic).
• Everyone is afraid to touch it, because if you touch it stuff will break. The normal way you feel comfortable modifying code is either from tests, or from understanding it, but the quagmire supports neither.
• Many people know some ways to make it less bad, but doing so would (i) require a ton of time, and (ii) likely break stuff in the process.

Thus no one makes changes to the quagmire except for really local patches, which over time makes it even worse.

1. 1

Compare Postgres vs. SQLite’s respective willingness to undertake invasive refactorings, and then compare each project’s test coverage. Just one data point but I think it’s telling.

2. 1

Ok but that doesn’t tell us that codebases without tests have this property of being a quagmire. That tells us that many quagmires have no tests.

In my experience useless tests can make this even worse.

1. 3

Yeah, there are two opposite failure modes: (i) not testing tricky things, and (ii) over-testing simple things, and tying tests to the implementation. Both are bad.

EDIT: I’m having trouble telling if you’re arguing against having any tests at all. If you are, have you worked on a codebase with >10,000 LOC? At a certain point, tests become the only reliable way to make sure complex parts of the codebase work correctly. (It remains important not to test trivial things, and not to tie the tests too closely to the implementation.)

1. 1

I’ve got files with several thousand lines of code. Believe me when I say you have to test things, but automated tests are not necessarily valuable for all codebases.

3. 1

The problem is that you’re expecting the same developers who wrote the awful codebase to write the testing suite. Whatever reason for the terrible code (time pressure, misaligned incentives, good ol’ ineptitude) still apply to the development of the tests.

Meaning if the code is really bad, the tests are going to be really bad, and bad tests are worse than no tests.

2. 3

How do we know that codebases without tests are less refactorable?

In my case, 20 years experience, 11 of which have involved decent-to-good test coverage (I don’t mean just line coverage, but also different types of tests to verify different aspects, such as load tests). In a well-tested code base, some tests will simply break, ideally within milliseconds, if you do anything wrong during a refactor. In an untested code base an unknown number of interactions will have to be manually tested to make sure every impacted piece of functionality still works as intended. And I do mean “unknown”, because most realistically sized code bases involve at least some form of dynamic dispatch, configuration, and other logic far removed from the refactored code, making even the job of working out which parts are impacted intractable.

Having automated tests is such a no-brainer by now, I very much agree with the author. But the author isn’t telling developers they need to write tests, they are telling managers they need to allow developers time to write tests.

1. 5

Perhaps for a better A:B example:

I have some things in LLVM that have good test coverage, and some that don’t, and some that do but only in a downstream fork. People routinely refactor the things with good test coverage without my involvement. They routinely break things in the second and third category, and the third is usually fairly easy for me to fix when I merge from upstream.

3. 3

How do we know that codebases without tests are less refactorable? Not being refactored is evidence, but weak evidence - it might be a sign that refactoring is not needed.

I once had to upgrade an existing site (built by someone else) to newer versions of CakePHP because of PHP support (their hosting provider stopped supporting old PHP versions, and that version of CakePHP would break). In which they’d completely overhauled the ORM. The code contained very tricky business logic for calculating prices (it was an automated system for pricing offers based on selected features). None of it was tested. In the end, I had to throw in the towel - too much shitty code and not enough familiarity with what was supposed to be the correct behaviour of that code.

The company had to rewrite their site from scratch. This would not have been necessary if there had been enough tests to at least verify correct behaviour.

In another example, I had to improve performance of a shitty piece of code that a ex-colleague of mine had written. Again, no tests and lots of complex calculations. It made things unnecessarily difficult and the customer had picked out a few bugs that crept in because of missing tests. I think I managed to “test” it by checking the output against a verified correct output that was produced earlier. But there were many cases where code just looked obviously wrong, with no good idea on what the correct behaviour would’ve been.

1. 1

In the first case it sounds like the core problem was business logic intertwined with ORM code, and secondarily a lack of indication of intent. Tests would actually have helped with both.

And in fairness the second one also sounds like tests would be the solution.

1. 3

Indeed. Now to be fair, both were shitty codebases to begin with, and the same lack of discipline that produced such code resulted in the code not having tests in the first place. And if they had written tests, they’d likely be so tied to the implementation as to be kind of useless.

But tests are what make outside contributions more possible by providing some guardrails. If someone doesn’t fully grok your code, at least they’d break the test suite when making well-intentioned changes. Asserts would also help with this, but at a different level.

4. 3

With tests, I can refactor my code. Without tests, your code will get worse and worse because you don’t feel confident enough to refactor

I wonder to what extent this is true.

True enough that my job the past 2 months was about porting a legacy component that meshes very poorly with the current system (written in C instead of C++, uses a different messaging system…). Having re-implemented a tiny portion of it I can attest that we could re-implement the functionality we actually use in 1/5th to 1/10th of the original code. It was decided however that we could not do that. Reason being, it is reputed to work in production (with the old system of course), and it basically has no test. To avoid the cost of re-testing, they chose to port the old thing into the new system instead of doing it right.

I’ve heard some important design decisions behind the Boeing 737 MAX had a similar rationale.

5. 1

Poor quality tests that test the implementation is what it is also inhibit refactoring.

Honestly it’s not that hard to make tests that test behaviors and contracts. I’d even say it’s harder to write tests that are so tied to the implementation that it makes refactoring hard

9. 9

I want to see if Zig can match Rust on most type safety

Comparing type system expressiveness is not a good proxy for type safety. C++ would score similarly to Zig and Rust in terms of type system expressiveness, but in terms of type safety all the implicit conversions between numeric types, or arrays and pointers, or the difficulty in constraining types to specific behaviors (up until C++20 at least) mean it’s much less type safe than Rust.

1. 1

C++ compilers can be configured to report implicit conversions as errors?

I think you’re marginally right that type system expressiveness isn’t the whole story, but it’s the structural typing / promotions per system that you have to be cautious about too. But it seems any type system worth anything will today error on implicit or dangerous conversions?…

1. 10

There are more warnings these days in C++ compilers, but it’s still qualitatively different than newer languages (Rust, Go, or Zig, I imagine). There’s significant C legacy, and no plans to change the language in these respects – most of the compatible improvements have been done already.

For example, `bool*` implicitly converts to `bool` (by design), so if you return `b` instead of `*b`, you will be in for a silent surprise.

I recently mentioned “hidden static expressiveness in C++” here [1], so it’s true it’s very powerful.

But it also has basic holes. You can’t lean on the type system like you can in other languages – you absolutely need tests. I have a Zulip thread going back three years about semi-automatically translating Oils to C++, and some other issues that came up

• #1 thing that bit me like 3 times was `signed char` bugs, with implicit casting to integer. Using `char*` strings and then doing individual character comparisons is extremely error prone.
• Also it’s not specified whether `char` is `unsigned` or `signed`. It’s signed on most but not all compilers.
• Another surprising one is that a base class a subclass can have fields of the same name. These are two different fields!!
• Shockingly, you don’t actually have to return anything for a function that declares a return value. You can get a nice seg fault at runtime. I believe this should always be a compiler warning but it did actually trip us up. (We’re generating most code, not writing it by hand)
• function overloads rules – the compiler thinks that `f(double)` is similar to `f(int)` and will sometimes choose it
• The `bool*` bug above
• uninitializated vars with garbage
• shadowed variables, e.g. in case block
• template instantiation bugs are hard to debug
• throwing an exception in a destructor is a fatal, unrecoverable error (may be undefined behavior)

This is with `-std=c++11`, but I’m basically sure none of that changes in newer C++.

Actually that’s pretty much all of them related to C++ itself – the others were related to our translation. But it’s still not what you want. It feels modern in many ways (since it’s still changing), exceeding even newer languages, and ancient in others.

I believe C++ `constexpr` exceeds Rust at the moment, and as mentioned multiple inheritance appears to be more powerful than Rust, though arguably we’re getting that mainly through ASDL.

Also looking at the original post, sorry I don’t think your methodology is valid at all. It’s not really saying or demonstrating much.

This document serves as evidence that the Rust and Zig type systems are similar in expressiveness and is not an aspect of argument when chosing either language

This absolutely doesn’t follow from what you wrote. The semantics of the type checker are what matter, not necessarily what types there are. This affects how libraries are designed and what’s idiomatic.

1. 1

Considering they’re both strongly typed languages the semantics are going to be pretty much the same and what’s left is can one express things that the other can’t?

And fair enough, please explain how the comparisons are invalid and say nothing so I can gladly update the text! It was explorative so anything that guides it to being more correct is more than welcome.

1. 7

The Rust and Zig type systems aren’t remotely the same – they have different features, and even the “same” feature may work completely differently!

A good litmus test is how you write containers, like vectors and dicts. That is, writing library level code in both languages will reveal many differences.

And also writing application-level code. The stdlib matters a lot for how the language feels, and what idioms you use to express different solutions.

I’m not an expert in either language, but I’d stand by my statement that Zig favors metaprogramming, while Rust favors type checking.

https://news.ycombinator.com/item?id=26378696

Just like Lisp favors metaprogramming, while ML favors type checking.

So these are fundamentally different languages that are used in different ways. The table not only demonstrates nothing, but I’d say it’s actively misleading!

The top level story is one I often point to if you want to see an experience of the same program in both languages - https://news.ycombinator.com/item?id=26374268

1. 1

I feel this is a great prompt for me to start a very long rant.

Instead, I would rather just add to the top “andyc@lobste.rs says this table is actively misleading / crap - follow with caution or please completely disregard “ than argue further, because I respect your time and I think people generally don’t want to hear the barking.

Maybe in the end it’s just the wording at the top, miscommunicating my intent. I’ll go revise it so it’s more ambiguous…

Edit: The deed is done

1. 2

OK well it doesn’t hurt to have a strong opinion … but strong claims deserve strong evidence :-)

FWIW as something even more concrete, Zig and Go don’t have iterators, while Rust and C++ do, both due to the type system. That issue appears all over your code.

Also Rust has an enormous borrow checker, entirely based on types! Which no other language has (although D and Swift and Mojo seem to have made moves in that direction)

2. 2

Tangential, but I’m legitimately curious what you mean by “strongly typed”. E.g., what subset of these four properties do you mean to point to?

https://lobste.rs/s/z6lpqg/strong_static_typing_hill_i_m_willing_die#c_exvsdk

1. 1

The second definition

1. 2

Notice that that’s a different definition than `kingmob` was using in that thread, and the difference is important. So you and `kingmob` both would communicate better by saying “type sound” and “lacking undefined behavior” respectively.

2. 3

Expressiveness is typically in conflict with type safety. For example:

• Zig without pointers would be safer. It removes whole classes of errors, like null pointer dereferences and use after free.
• Zig with Rust style borrow checking would be safer. Call this language Zig-Borrow (Zig has some memory safety checks, but it’s not entirely memory safe. Rust’s borrow checking is 100% memory safe.) However, Zig-Borrow would be strictly less expressive than Zig. To see this, see that you can trivially convert any program in Zig-Borrow to Zig: simply remove all lifetime annotations and `&` vs. `&mut` vs `*` distinctions. The latter isn’t the case: some Zig programs (even safe ones!) could not be expressed in Zig-Borrow.

The full story is more complicated than this, because when a programmer loses expressiveness, they have to turn to other parts of the language to express what they mean to, and it’s possible that the part of the language they turn to is itself unsafe. Still, to a first approximation, expressiveness is the enemy of type safety.

(There’s also a formal sense in which this is true: every proof of type safety I’ve ever seen continues to hold if you remove a syntactic construct from the language.)

1. 1

That’s a really nice way to explain how expressiveness decreases type safety.

I think it may even help me explain my comparison: if we can make a mapping of Rust -> Zig types, we can at least say “Zig can equally Rust express what Rust can”. Of course you need to look at the underlying type systems that verify the types, and of course the borrow checker is part of this, but obviously Zig doesn’t have a borrow checker, so it only makes sense to look at everything else… For which they are pretty similar in my opinion - I’ve used both, Rust for 3 years, Zig for only 6 months, and the type systems feel very similar at least in usage and end results…

I 100% agree with everyone the story is way more complicated. :)

1. 5

Of course you need to look at the underlying type systems that verify the types, and of course the borrow checker is part of this, but obviously Zig doesn’t have a borrow checker, so it only makes sense to look at everything else…

That’s kind of like saying “obviously sum types with exhaustive checking are a part of this, but C++ doesn’t have them, so it only makes sense to look at everything else” or even, in the extreme, “Obviously types are a part of them but JavaScript doesn’t have them, so it only makes sense to look at everything else.” Excluding the part of the type system that one language has and the other doesn’t and then saying they have the same capabilities in the type system is actually begging the question in the “actual definition of that logical fallacy” sense. I don’t think that was your intent, but that’s the actual thing that happened!

1. 4

Yeah no, I f’d up, which I have no problem ever admitting x) I even knew before hand comparing type systems is a non trivial endeavor, basically not possible. I was just asking to get slapped around :D.

Maybe it’s more about problem modeling with types what Im trying to express. Whatever it is, it has felt nice to get this comparison out of my head x)

10. 11

First of all, SQL, from the get-go, doesn’t have a notion of the execution order of things.

A vanishingly small amount of SQL is written by humans.

Like Postscript!

1. 1

Why not? Apart from IO, there are guards, that depend on ordering. The compiler can’t in general check if guards overlap.

``````ghci> :{
ghci| f | True == True = 42
ghci|   | False == False = 43
ghci| :}
ghci>
ghci> f
42
``````

If this is not execution order, what would you call it?

1. 2

Well, the point of my comment is that SQL isn’t as special a language as the author seemed to think. In that sense, a couple exceptions don’t invalidate my point. I bet there are a couple places in SQL where subexpressions are guaranteed to be executed in some order too.

There’s a way of viewing Haskell where even `do` and guards don’t have “a notation of execution order”, though. The guard above is equivalent to:

``````if True == True then f 42 else (if False == False then f 43 else undefined)
``````

And

``````do
x
y
z
``````

is defined to be equivalent to (not sure if I’m getting this exactly right, but you get the idea):

``````x >>= \x -> (y ==> \y. z y)
``````

In both cases, the required “execution order” can actually be viewed as purely functional code with nesting.

1. 2

Oh ok, I understand. I suppose that the assertion is that if expressions don’t have side effects, and most things are expressions, then the language mostly doesn’t have side effects. I guess I misunderstood it because I think of expression evaluation as embedding an order of operations. Which is apparently where I went wrong.

1. 1

Another way to think of it is in terms of equivalences. The blog post gave some equivalences between SQL expressions. In Haskell, these expressions are always equivalent, for any subexpressions X and Y:

``````X + Y === Y + X
f(X, Y) === f(Y, X)
``````

This is unusual! Very few languages have these equivalences.

1. 1

On the other hand, haskell loses a lot of equivalences because of laziness and in particular the use of syntactic dependencies. (Fast and loose reasoning isn’t actually correct.)

1. 1

I don’t know if this. Could you give an example of an equivalence lost to laziness? And what’s a “syntactic dependency”?

1. 1

Suppose we define:

``````times Z _ = Z
times (S x) y = plus y (times x y)
``````

In a strict language, this commutes (assuming acyclicality), but in haskell, times Z bot is Z, but times bot Z is bot.

1. 1

Ooh, interesting! Thanks for the example.

11. 27

Without trying to re-incite any drama: The compile-time reflection work by thephd that was recently abandoned is exactly what’s needed to remove the need for such dire hacks.

1. 4

Is there even such a need? I worked professionally in Rust for 2 years, have done all my hobby and research projects in it for a decade, and I’ve never desired such a thing, and would have considered it a code smell if any one else desired it. (The code smell being wanting runtime behavior to vary depending on whether a type implements a trait.)

1. 12

There is such a need. It might not come up in the day-to-day but it’s definitely needed in some cases.

Case in point: the standard library does this in a bunch of places, mostly as a performance optimization. However, because they have standard library privileges, they don’t need this autoderef hack and can use the unstable bona-fide specialization instead.

1. 2

Oh, I think I get it, you would do this if (i) you have a `dyn T` (otherwise you’d just use a compile-time `where` block), and (ii) you want to specialize based on the type for optimization purposes (if it’s not for optimization that’s a code smell, you’re making the type `dyn T` be a lie, if you take a `dyn T` then a user-implemented copy of an existing type should behave identically), and (iii) for some reason you can’t have the trait `T` help with this specialization, and (iv) you have profiled and determined that the compiler isn’t going to optimize this for you anyways?

This status quo around this seems fine to me. Niche optimization requires hack to implement. C++ users are confused why it’s even considered a hack. I’m worried about every niche use case like this getting a language extension until Rust becomes more like C++.

Note that I’m not arguing against thephd’s work (which I don’t know much about), or against comptime programming in general (which I think is going to be one of the next big things in programming languages). Just saying that this is a very poor motivation for either.

1. 4

Well yes, but not only then.

otherwise you’d just use a compile-time where block

This is not true in Rust. Even at compile-time, you can not have “overlapping impls” - that’s exactly what the unstable specialization feature enables. So this is a problem for all generic interfaces too, not just dyn stuff.

2. 6

(The code smell being wanting runtime behavior to vary depending on whether a type implements a trait.)

Two cases this comes up a lot in are `Copy` and `Debug`.

If a type implements `Copy` then the compiler guarantees it doesn’t implement `Drop`, which means code that shuffles those types around can be a lot simpler (no worries about panic safety). In many cases what might be a big complex function collapses into a `memcpy()` – for example `MaybeUninit::write_slice()` vs `MaybeUninit::write_slice_cloned()`.

So if you have a function like `fn fill<T: Clone>(&mut self, value: T)` then it’s really tempting to have a way to tell the compiler “hey, I can make this work with a `Clone`, but if the type also happens to be `Copy` then use this fast path”. And indeed that’s what `slice::fill()` does, via `slice/specialize.rs`.

For `Debug`, imagine you have a function like `do_math<T: Add>(x: T, y: T) -> T`. It’s returning the wrong value and you want to drop a few `println!()` calls in there to see what’s going on, but it’s at the very bottom of a big call stack and changing the trait bounds to `T: Add + Debug` will cause type errors everywhere. Wouldn’t it be nice to be able to implement something like this?

``````trait TryDebug {
fn try_format(&self) -> Option<String> { None }
}
impl<T: Debug> TryDebug {
fn try_format(&self) -> Option<String> { Some(format!("{:?}", self)) }
}

fn do_math<T: Add>(x: T, y: T) -> T {
if let Some(debug_x) = x.try_debug() { println!("x: {}", debug_x); }
if let Some(debug_y) = y.try_debug() { println!("y: {}", debug_y); }
...
``````

The problem is that this sort of compile-time trait inspection violates an important rule of Rust, which the ability to reason about the capabilities of a function based on its type signature. It’s weird and spooky.

The OP’s complaint is that macros look kind of like function calls, and macros can do a sort of limited compile-time trait inspection via method resolution rules, so they feel it’s weird and spooky. But it’s not! Macros are just a sort of built-in lightweight codegen, it’s not spooky at all.

3. 6

I think it’s more of a code generation issue: we’d like to be able to branch in code generation on whether a type implements a trait.

Which, to me, feels like a clear code smell when your code generation method is “macros”. They should be purely syntactic. But it’s still desirable behavior for (some) code generators.

I would also be totally happy to call the whole thing over-optimization for developer experience and kind of a misfire on behalf of Tracing. Honestly, to my eye, that whole system there around reporting values is kind of confusing, perhaps all in the name of making those macros convenient to use.

1. 2

You can use traits to solve the specific problem from OP’s tweet and don’t need compile-time reflection or type-aware macros.

https://github.com/tokio-rs/tracing/pull/2781#issuecomment-1785931826

1. 2

I’m aware. This is, to my eye, a workaround to get type-aware macros. We’d like the macro to write code that’s behaviorally different depending on its trait admittance. This method does so by leaning on syntactic ambiguity resolution happening to embed a kind of specialization. I suspect it’s disfavored because it relies on a subtle interaction between language features that may not lead to a robust implementation of, effectively, macro behavior branching on trait admittance.

I agree with the OP that it’s ugly and would prefer an uglier UX to avoid it.

12. 6

Function capturing looks really nice!

1. 3

Isn’t it? I think it may have been invented by Pyret! https://pyret.org/docs/latest/Expressions.html#%28part._s~3acurried-apply-expr%29

(“Function capturing” being the “_ + 1” shorthand for “lambda x: x + 1”.)

1. 8

Scala also has this. I thought Idris did too, but Google isn’t quickly showing me anything that substantiates that.

1. 3

Scala’s much older than Pyret, so Pyret’s definitely not the original source then. I wonder if any language used it before Scala.

2. 5

it’s been in Mathematica since summer 1988

2. 1

Isn’t it the same as partial application, which a lot of functional languages have?

1. 2

It is IMO a nicer shorthand for dealing with partially applying an argument that comes out of order in the function.

Compare

``````divide(_, y)
``````

``````\x -> divide x y
``````

or

``````flip divide y
``````

the latter of which always confuses me because of the function application order.

``````(`divide` y)
``````

and the infix version

``````(/ y)
``````

for this specific case, but if you add more arguments you’ll eventually have to resort to just writing the lambda out like I did first.

Maybe there are other functional languages with similar sugar? I seem to recall Scala does the same thing.

1. 3

Yeah, Scala has pretty much the same syntax. It’s very nice, but actually, three versions are needed imho:

1. The one above with _
2. The one with named parameters such as `\x -> divide x y`
3. One with a default-name parameter

The third one is necessary when working with structures that you don’t want to decompose but still have to access more than once. E.g.: you want to call `sendEmail(sender, text)` and you have a class that has multiple fields, including sender and text. You cannot do `foreach ... sendEmail(_.sender, _.text)` because there’s only one argument and using `_` twice means you want to access the second argument (which doesn’t exist in this example).

Groovy solves that with a special keyword “it” which is similar to `_` but it always refers to the same argument. E.g. `foreach(data -> sendEmail(data.sender, data.text))` can be shortened to `foreach(sendEmail(it.sender, it.text))`. Unfortunately groovy does not have the `_`, so when you want to do `foo( (a, b) -> a + b)` you cannot shorten it to `foo(it + it)` whereas in gleam or Scala you can just do `foo(_ + _)`.

1. 1

I downloaded the same PDF a couple weeks ago, and I’ve been meaning to read through it :) Someone who knows this stuff should really write an article like “here are some programs you can write in Rust but not Hylo and vice versa”.

1. 3

The literal rule for what you can’t express in Hylo is: any Rust function that, if you made its lifetimes explicit, has at most one lifetime parameter. If that lifetime parameter is used in the return type, you need a “subscript”, otherwise a regular function will do. Oh, and references can only appear at the top level, that is very important. Vice-versa is simpler: Rust is strictly more expressive.

But that doesn’t quite answer “what programs you can write” b.c. there’s the question of writing “the same thing” a different way.

I’ll consider writing this blog post.

1. 1

I want to make sure I understand the literal rule: Is “at most” a typo?

Yes I’d love to read that post. Please ping me on Twitter/GMail/whatever (same handle everywhere) if you write it!

1. 3

Gah, sorry, yes, I wrote that backwards.

A Rust function without references can obviously be written in Hylo. If it has references only on the inputs, that can be expressed in Hylo:

``````fn foo(arg1: &T, arg2: &mut T) -> T // works in Hylo
``````

Every Rust function with references only in its inputs can be written with a single lifetime parameter. Put differently, lifetime parameters are only ever needed when there’s an output reference. Thus the above function can be written with only one lifetime parameter:

``````fn foo<'a>(arg1: &'a T, arg2: &'a mut T) -> T
``````

If a Rust function has references on both its inputs and output, then it’s expressible in Hylo only if there’s at most one lifetime parameter when you write out the lifetimes. So:

``````fn foo<'a>(&'a mut self, &'a T) -> &'a T // works in Hylo, needs to be implemented with a "subscript"
fn foo<'a, 'b>(&'a mut self, &'b T) -> &'a T // does not work in Hylo
``````

The other, bigger, restriction is that references have to be top-level. You can’t nest them. So no `Option<&str>`, or `HashMap<i32, &User>`. And no iterators over references! No such thing as `split_words() -> impl Iterator<Item = &str>`!

Instead of a function returning an `Option<&str>`, the function would take a callback and invoke it only if the option is populated. And iterators need to be internal iterators: to iterate over a collection, you pass a closure to the collection and the collection invokes that closure once per element.

I’ve added a file to my blog post ideas folder, and it says to ping you, so I’ll definitely ping you if I write it. Though no promises, as that ideas folder grows more than it shrinks…

13. 5

I think I may have another silvery hued bullet of my own. It’s disappointing really, but to my dismay I find myself able to apply it time and again: have a taste for simplicity. I know it sounds like “just fucking git gud”, but it’s more specific than that. Time and again, I see code that is so needlessly complex that I easily can (and sometimes even do) rewrite it and divide its size by 3 to 10.

I get that people are sometimes under pressure and need to do things quick. That’s not it. The code they wrote clearly took longer to write and test than the simplified version I can think of just looking at the API they implement (and that’s before I even question the API itself, there are some jarring mistakes there too).

Yet I’m pretty sure I am not some hot shot that’s so much better than everyone else. I’ve discussed with enough colleagues and interviewed enough juniors to disabuse me of that notion. Even the ones who committed the horrible code I’m complaining about are fairly smart. It’s something else.

Now “simplicity” is very vague, but there’s a more specific low-hanging fruit: modularity.

John Ousterhout speculates that problem decomposition is the single most important notion in all of computer programming. I basically agree, though I tend to think of it in terms of source code locality instead. Our modules should be deep, with small interfaces and significant functionality.

I have a sneaking suspicion that most of Brook’s colleagues actually had a taste for simplicity, and perhaps even most professionals of the time did. Casey Muratori once suggested that one reason is they didn’t have a choice, the machines they were working on were just too small and too slow to tolerate unnecessary complexity. Now that our machines have gigabytes of memory and even more effective operations per seconds we lost that feedback, and many of us failed to acquire the taste.

Hardware getting better allowed software to get worse in some ways. If there’s no silver bullet, here’s the next best thing: stop feeding the werewolf.

1. 8

I have a sneaking suspicion that most of Brook’s colleagues actually had a taste for simplicity, and perhaps even most professionals of the time did. Casey Muratori once suggested that one reason is they didn’t have a choice, the machines they were working on were just too small and too slow to tolerate unnecessary complexity.

This claim is at odds with, well, basically the entire thesis of The Mythical Man-Month. Here’s Brooks in the original preface:

The effort cannot be called wholly successful, however. Any OS/360 user is quickly aware of how much better it should be. The flaws in design and execution pervade especially the control program, as distinguished from the language compilers. Most of these flaws date from the 1964-65 period and hence must be laid to my charge. Furthermore, the product was late, it took more memory than planned, the costs were several times the estimate, and it did not perform very well until several releases after the first.

Stories like this are common in the literature and folklore of early computing. I’ve lost track of how many times I’ve heard stories of a wise senior programmer (who it is, and what the project was, tends to change with the telling) who allocated a buffer of memory at the start of a project, hid it, and then did nothing with it, because the team would then unknowingly be bound by a smaller memory budget, but one that could, importantly, be increased once they inevitably failed to stay within it. If everyone is scrupulously cherishing every byte and every cycle as a precious resource never to be wasted, the wise senior programmer would not need to resort to such tricks!

So in general and on the strength of available evidence, I think we must reject the nostalgic view that our predecessors were titans of frugal resource usage imposed on them by their machines; they were more or less the same as we are now, and many of the things they built were slow, resource-hungry, over-complex, and buggy just as many of the things we build today are. The main source of this nostalgic myth, I believe, is a form of survivorship bias: the things that have come down to us today to that we are encouraged to study and learn from are part of the small percentage of all software of that era which actually achieved goals of simplicity, frugality, etc., because nobody bothers to keep alive the study and enjoyment of the ones that didn’t.

1. 1

Err… it feels like your two examples are actually making my case for me: even though Brooks was working on mainframes, those machines were much smaller than they are now, and the poor performance and too big a size were only noticed (and eventually corrected) because of this. Complexity on the other hand they may have had more room that if they were programming for a personal computer. Same thing for the “allocate a buffer then miraculously find memory before we ship the game” story, it’s about artificially constraining memory so even if the projects overflow the arbitrary limits it doesn’t overflow the real one.

As for being late, I don’t think the size of the machine matters at all. I can see projects slipping even on a tiny microcontroller with KiB of memory.

And of course, I don’t think for a minute the people from that era were intrinsically better than we are. But they did live in a different environment, where selected into their position from different criteria, and had a different education. They’ve got to be different. In what ways I don’t know, and here I was focusing on a single aspect of their environment, and how it might have influenced them.

1. 4

Every generation of programmers looks back at previous generations and thinks “Wow, they must have really cared about performance to come up with those amazing techniques for working in such limited hardware!”

But the current generation of programmers also always thinks “Wow, I sure am glad we have the near limitless resources of this generation of hardware!” And every generation has people who insist that the bounties yielded by Moore’s Law, which were carefully and respectfully made use of by previous generations (who had to care because the hardware forced it on them) are now being wasted by the current generation of profligate developers who have lost the ethos of caring about performance.

In other words, every generation sees their iteration of hardware as a liberation from the straitjacket the previous generation had to program in. But the previous generation always saw their hardware as a liberation from the straitjacket the generation before them had to program in. And on and on, probably all the way back to ENIAC.

Every generation of software expands to fill the corresponding generation of hardware. Thus it has always been. Thus it always shall be. Every generation of programmers is, on average, average. They tend not to feel the limits of their hardware very much, because to them the current generation is always a big improvement over the last. The software that survives to be studied and recommended for study across multiple generations, however, is anything but average, and using it (or the programmers whose work it was) as representative of its era or of the ethos or job-selection requirements of its era, will lead you astray.

1. 1

I understand selection bias, but it’s not just that. I have used computers since I was 10, which gave me roughly 30 years worth of memory. I’ve played quite a bit of time with my Dad’s Atari ST, then saw the rise of the IBM PC clones since Windows 95 came out… you get the idea. I remember compiling Gentoo packets on my Core2 Duo laptop, and it took ages, I remember boot times being quite a bit longer than they are since the advent of solid state drives, and of course I’ve seen the ungodly progress we’ve made in real time computer generated graphics.

At the same time though, many programs that gained comparatively little functionality, are just as slow now as they were 20 years ago. My phone slows down just because I update it. I’ve installed pretty much nothing, I regularly wipe old data, but no, memory gets tighter and tighter, applications laggier and laggier… even going back to the home screen, that used to be instantaneous, now often takes more than 10 seconds.

I would somewhat understand if programs went slower and more wasteful as hardware gets faster and bigger. It’s a reasonable trade-off to make, to a point. My problem is when that slowdown outpaces the unreal speed at which hardware improved over the last 30 years I could personally observe. I believe that in many cases, we are way past the point where wasting more computational resources helps us deliver a product faster.

1. 2

We started out with this:

Casey Muratori once suggested that one reason is they didn’t have a choice, the machines they were working on were just too small and too slow to tolerate unnecessary complexity.

But there never was a time when this was the case. In every era, programmers are limited by the hardware they work with, but in every era they approach it not as “I must be frugal and responsible, since I have only a few kilobytes of memory”, but as “Wow, I have kilobytes of memory? I can do so much more than before!”

At the same time though, many programs that gained comparatively little functionality, are just as slow now as they were 20 years ago.

I’ve mentioned this example before, I think to you specifically, but: I like to shop in a local Japanese market. Yet I don’t speak or read Japanese. 20 years ago I’d have been out of luck. Today, I can pull my phone out of my pocket, point it at a label or a package, and it does live text recognition and live translation of that text. I literally am blown away every single time. The devices I have today can do so much more than the ones I had twenty years ago that I simply cannot for the life of me fathom the stance that they’ve gained “comparatively little functionality”.

My phone slows down just because I update it.

Back in the day “updates” came on a stack of floppy disks and most people didn’t bother with them. Which is one way to avoid problems from updates!

And then when we started getting updates downloadable over the internet, people started yelling at companies to extend support back further and further, claiming that anything else is “planned obsolescence”. Meanwhile, companies that make software don’t want to be frozen to the capabilities of the oldest supported hardware. So they kludge it, a time-honored tradition reflected in just how old that bit of jargon is, and the result is stuff that runs slower on old hardware.

But the idea that we are in some sort of era where people are uniquely wasteful of hardware resources, or uniquely uncaring about performance, is just completely factually wrong. The average programmer and the average software of 10, 20, 30, 40 years ago were not noticeably better or more frugal or more careful of performance relative to the hardware they had than programmers and software today.

1. 1

To sum up my thesis:

• Programming was different because the machines were different.
• Some skills (not all) that were required then are useful now, and will be for the foreseeable future.
• Since those skills are no longer mandatory, they occur less frequently.
• Software is more wasteful than it would have been if those skills had been perfectly retained.

in every era they approach it not as “I must be frugal and responsible, since I have only a few kilobytes of memory”, but as “Wow, I have kilobytes of memory? I can do so much more than before!”

The actual mindset they had matters much less than what the computers allowed them to do.

But the idea that we are in some sort of era where people are uniquely wasteful of hardware resources, or uniquely uncaring about performance, is just completely factually wrong.

I remain unconvinced. The only indisputable fact I see right now is how much more powerful our computers are. We’ve gone from KHz to MHz in a couple decades, that’s 6 orders of magnitude. Such a difference in degree, even gradual, is bound to introduce differences in kind.

I also never pretended older programmers cared more about performance (and, for the KiB computers and less, simplicity). Like everyone else they likely cared first and foremost about making it work. But when your computer is small enough or slow enough, the more wasteful approaches we can afford today simply did not work. The minimum performance bar was just higher, and they met it because they simply had no choice.

1. 2

Where I disagree with you is that I don’t think there ever was any sort of conscious/deliberate “skill” of carefully and frugally making use of limited resources. There was just software that didn’t do as much as today.

Like, there’s a reason why all those classic games have super-simple low-res graphics and tiny color palettes and had to have extremely simple movement options, etc. And it wasn’t because the programmers who made them had lost some skill of getting more out of the hardware.

But when your computer is small enough or slow enough, the more wasteful approaches we can afford today simply did not work. The minimum performance bar was just higher, and they met it because they simply had no choice.

The “minimum performance bar” was not higher relative to the hardware of the era. Every era has had some software that was fast for the time and some that was slow for the time and a lot that was just average.

And relative to what the hardware of the time was capable of (which is why I emphasized that in my last comment) lots of software of previous eras really was slow and bloated. Really. Yes, really. There was no Apollo-13-style “poor performance is not an option” stuff going on. There was lots of really awful crappy terrible slow software being written. Tons of memory and tons of CPU cycles wasted.

Please, stop promoting the myth that there ever was anything more to it than that.

1. 2

I think the point here is that there are relatively “simple” programs which were feature complete (in the eye of the beholder). Programs with similar functionality are nowadays as slow or slower than those programs in the early days. That makes no intuitive sense to the user - if the same old program would be run today, it would be super fast.

It would make logical sense that a newly built program which does the exact same thing nowadays would be much faster, except that’s not typically the case. Newer programming environments offer conveniences to the programmer which are invisible to the user, but do have an additional performance impact.

For example, if the “fast” program back in the day was hand-written in assembly or C, the same program nowadays might be written in Java or Python or what have you. Nobody in their right mind would hand-write large programs in assembly if they have the choice. C is also quickly falling out of fashion.

As another example, a DOS program had the CPU all to itself. A similar program running under Windows or even Linux would have the OS and background processes to contend with, so it would necessarily feel slower.

Does it make sense? On the whole, I am not sure. We (think we) need more and more software, so it does makes sense that we’re able to produce it faster and more efficiently. What user really wants to go back to running everything single-tasking in DOS? What programmer really wants to go back to writing everything by hand in ASM/C (including device drivers)?

2. 1

Where I disagree with you is that I don’t think there ever was any sort of conscious/deliberate “skill” of carefully and frugally making use of limited resources.

I did not mean to say it was conscious or deliberate.

There was just software that didn’t do as much as today.

Mostly, yes. Computer games especially, with how they compete for triangles and pixels. But we also have software that doesn’t do much more today than it did 20 years ago or so, and somehow manages to not be any faster. It starts being seriously dated, but Jonathan Blow’s Photoshop example was striking.

A likely important factor is how competitive a given sector is. Games are highly competitive, and an excellent game that lags will be played less than an excellent game that does not. At the other end of the spectrum I suspect Photoshop is almost monopolistic, with a huge captive audience they’d need to seriously piss of before they all move to Gimp or Krita.

1. 4

Games are highly competitive, and an excellent game that lags will be played less than an excellent game that does not.

The best-selling video game of all time, notorious for the sheer amount of time its players sink into it, is also infamous for its terrible performance, to such a degree that many guides recommend, as one of the first things you do, installing a mod pack whose sole purpose is to try to make the performance into something reasonable.

That game is Minecraft.

And it is not alone, nor anywhere near; the games industry is well known for shipping things that are broken, buggy, slow, resource-hogging and/or all of the above. So much so that it’s spawned endless memes.

And I’m fairly certain this has all been pointed out to you in prior iterations of this debate.

1. 1

I’m aware that gameplay, time of publishing, and marketing, affect game sales. I reckon they do significantly reduce the selection pressure on criteria such as graphics quality, loading times, input lag, and frame rate.

They do not eliminate that pressure though. Not as effectively as a near-monopoly or vendor locking would.

1. 2

Your argument would require that we somehow see a higher standard of performance coming from game developers.

The empirical reality is we don’t. Game dev is not some special realm of Performance-Carers. Nor were programmers of previous eras particularly concerned – relative to the capabilities of their hardware, they wrote plenty of software that was as slow as the things you complain about today.

You really, really need to learn to accept this, ditch the mythologizing, and move on.

1. 2

You keep painting my comments as if I was saying some groups of people were more virtuous than others. I keep insisting that different groups of people are subject to different external constraints.

Let’s try an analogy with cars. If fuel prices shoot through the roof people will quickly start to pay real close attention to fuel efficiency before buying a new car, creating a market pressure that will force manufacturers to either produce more efficient cars, go out of business… or form a cartel.

1. 1

You keep painting my comments as if I was saying some groups of people were more virtuous than others. I keep insisting that different groups of people are subject to different external constraints.

Because the underlying message always is that certain groups “care” about “performance”. This isn’t the first time we’ve gone round and round on this.

And again, the simple empirical fact is that in every era there are people like you who complain about a lost art of performance and simplicity. What looks to us, now, like constraints that developers of the past must have had to come up with explicit strategies for, were to them at the time rarely perceived as constraints at all, because they felt liberated from how constrained the hardware of their recent past was.

If fuel prices shoot through the roof people will quickly start to pay real close attention to fuel efficiency before buying a new car, creating a market pressure that will force manufacturers to either produce more efficient cars, go out of business… or form a cartel.

For like the fifth time now across these threads: the empirical reality of the game dev industry is that they do not seem to feel performance is a constraint upon them. Games with horrid performance and resource usage are put out all the time and succeed in the market. Minecraft, which did so long before it was bought out, is a great example of this and also of the fact that no “cartel” is necessary to artificially protect games that have poor performance.

1. 1

Because the underlying message always is that certain groups “care” about “performance”. This isn’t the first time we’ve gone round and round on this.

Not my message. Perhaps initially, but I got your point. Please don’t put words in my mouth.

no “cartel” is necessary to artificially protect games that have poor performance.

Of course not. I wouldn’t dare even suggest such a thing.

For like the fifth time now across these threads: the empirical reality of the game dev industry is that they do not seem to feel performance is a constraint upon them.

That really depends on the game.

• Factorio devs reported explicitly spending a ton of time on explicit performance optimisations so the factory could grow to ungodly proportions.
• Jonathan Blow reported that since he wanted unlimited rewind for his puny 2D platform game, he had to spend time making sure the inevitable actual limit was high enough not to be perceived as such by the player (he settled for 45 minutes).
• I can look at Star Citizen and see players complaining about assets failing to load in time, and they end up falling through the entire map (as an example as poor performance affecting correctness).
• In competitive multiplayer FPS, even a single dropped frame could make the difference between a head shot and a miss, not to mention network problems. Could they keep the player count high if the game routinely lagged?
• Almost all 3D game I have ever played on a PC the last 20 years had graphics settings, and in most cases going to the max tanked the frame rate to a perceivable level. Not only I as a player felt the difference, but most importantly the game devs gave me the option… as if performance were an actual constraint they had to address.

Of course, you could come up with 10 times as many examples where performance matters so little you’d go to have out of your way to make it lag (typical of most skinner boxes for palmtops). There’s no unified “game industry”, just like I’m increasingly realising there’s no “embedded industry”: it’s a collection of sub-niches, each with their own constraints.

Still, a significant number of those niches absolutely feel performance constraints.

1. 2

The fact that you’re not even able to talk about this without needing to resort to loaded/judgmental language is a pretty big deal. Oh, see, the True Game Devs™ really do feel the constraint and really do care about performance – it’s those developers of “palmtop” “skinnerboxes” who are being inappropriately held up as counterexamples.

Meanwhile the catastrophic-failure-of-the-week in game dev is Cities: Skylines II, which was apparently close to functionally unplayable on launch day due to abysmal performance. That’s not some rando mobile gacha “skinnerbox”, that’s a big-time game from a big-time publisher.

It really is time to just let it go and admit that in every era and in every field of programming the average programmer is average, and there was no special era and is no special field in which the Performance-Carers were or are uniquely clustered.

2. 1

Photoshop is dealing with constraints that a game (even an AAA game) doesn’t have - like backwards compatibility, and safeguarding the user’s input against data loss and corruption. It’s not a complete explanation for PS’ perceived performance issues, but stuffing everything into the framework of “how fast can you sling stuff onto the screen” is one-dimensional at best.

1. 2

Photoshop is dealing with constraints that a game (even an AAA game) doesn’t have - like backwards compatibility, and safeguarding the user’s input against data loss and corruption.

Still, I don’t see the link between that, and the nearly 100-fold increase in boot times.

• Why, during 20 years of hardware improvement, did Photoshop kept requiring 7 seconds to boot?
• Why does the modern (2017) Photoshop version’s main menu takes 1 full second to drop down, even the second time around?

Backward compatibility doesn’t explain anything here: we’re loading a modern image format, not a Photoshop work file. And sure this image has to be turned into a Photoshop specific internal representation, but that internal representation doesn’t need to follow any kind of backward compatibility. That’s a problem when (auto) saving the works, not for loading it from a JPEG. Likewise, what data loss and corruption can happen during boot?

Having a lot of features is not a reason to be slow. I can perhaps forgive video games for their even slower boot times (The Witness, Factorio), but those load much more than code to memory, they also have a ton of assets they chose to load right away instead of streaming them during the game. What’s the Photoshop equivalent?

That’s a question for many programs by the way: how much data does a program need to load to memory before it can reach an operational state? My guess is, if it takes over 10 times the equivalent `fread(3)` call, there’s a huge margin for improvement.

1. 1

I was recently playing an AAA game which had horrendous load times on my platform (Xbox One), so much so that I usually started the load, then did stuff like emptying the dishwasher while it reached the state where I could start playing. Once it was loaded though, the experience was fast and bug-free.

I’m not a heavy user of Adobe’s products, but it’s possible that their users are fine with a long start-up time as long as the flow within the program is smooth and non-laggy. You start up the application in the morning while fixing coffee, then work through the day.

1. 1

I was recently playing an AAA game which had horrendous load times on my platform (Xbox One), […]

Strange, I’ve heard game consoles usually have certification processes that are supposed to limit load times. Seems I’m not up to date on that.

I’m not a heavy user of Adobe’s products, but it’s possible that their users are fine with a long start-up time as long as the flow within the program is smooth and non-laggy.

Except in Blows demonstration in 2017, it was not. The main drop-down menu took 1 full second to appear, and not just the first time, so whatever it did wasn’t even cached. It’s but one example, I expect other parts of Photoshop were snappy and reactive, but if something as basic as the main menu lagged so much we’re not off to a good start.

2. 3

Now “simplicity” is very vague, but there’s a more specific low-hanging fruit: modularity.

I think this is one of the places where I lean towards a stronger form of the weak linguistic relativity hypothesis for programming languages. I spent a lot of time writing Objective-C and I found that, in that language, 90% of what I wrote ended up being things that could be pulled into different projects or repackaged in libraries. This is not true of any of the code that I wrote before learning Objective-C and, especially, the OpenStep libraries, in spite of the fact that Objective-C was something like the 10th language that I learned. Having started written code in that style, I tend to bring the some of same concepts (modularity, loose design-time coupling) to other languages.

It’s worth noting that, at around the same time as inventing Objective-C, Brad Cox wrote an essay (that I seem unable to find today) with the title ‘What if there is a silver bullet and the competition gets it first?’ where he strongly advocated for modularity as a key design idea. He proposed designing two categories of languages:

• Those for building components
• Those for integrating components.

His idea for Objective-C was as a language for packaging C libraries up into a way that Smalltalk-like languages could use them easily. These days, a lot of interesting projects are using Rust as the language for writing components and Python as the language for integrating them.

1. 1

I think there is a case for figuring out the right amount of breaking-things-down, but I also think we are nowhere close to figuring out what that right amount is or how to go about it. And systems and languages that prize “modularity” seem to turn into intractable messes of abstraction more often than they turn into paragons of easy-to-understand simplicity.

3. 2

The complexity is usually imposed by the customer. I can’t count the number of times I had a great idea for speeding up and simplifying some code, only to discover some pesky requirement was cutting off that particular approach.

1. 1

That’s why code design should be a two-way communication between the business and the implementor. There are cases where the business side over-specify something for no good reason, making efficient implementation impossible. But of course, there are legitimate cases where it’s just a hard requirement.

1. 2

There’ve been many cases of a “must have” hard requirement which we implemented and then it turned out nobody used it. Customers often don’t know what they need either. Or sometimes a feature is about prestige, like the boss’s spouse would love this so there’s no way it’s going to be axed regardless of how little sense it makes.

4. 2

Yes! A thousand times this!

The primary barrier to writing large scale software is complexity. I see developers focus on everything else, like efficiency and local readability and local robustness, which, yes, are important too. But not focusing on the interfaces between things, or even seeming not to really grok what an interface between components is, so that the user of any component needs to understand the details of that component to use it. And not realizing that there’s a dozen ways of writing any chunk of functionality, and some will be less than half the size and complexity of others. And then there’s 100,000 lines of code, and, lo and behold, it doesn’t work quite right in the edge cases, and it takes days to debug things.

14. 11

Stringref is an extremely thoughtful proposal for strings in WebAssembly. It’s suprising, in a way, how thoughtful one need be about strings.

Here is an aside, I promise it’ll be relevant. I once visited Gerry Sussman in his office, he was very busy preparing for a class and I was surprised to see that he was preparing his slides on oldschool overhead projector transparencies. “It’s because I hate computers” he said, and complained about how he could design a computer from top to bottom and all its operating system components but found any program that wasn’t emacs or a terminal frustrating and difficult and unintuitive to use (picking up and dropping his mouse to dramatic effect).

And he said another thing, with a sigh, which has stuck with me: “Strings aren’t strings anymore.”

If you lived through the Python 2 to Python 3 transition, and especially if you lived through the world of using Python 2 where most of the applications you worked with were (with an anglophone-centric bias) probably just using ascii to suddenly having unicode errors all the time as you built internationally-viable applications, you’ll also recognize the motivation to redesign strings as a very thoughtful and separate thing from “bytestrings”, as Python 3 did. Python 2 to Python 3 may have been a painful transition, but dealing with text in Python 3 is mountains better than beforehand.

The WebAssembly world has not, as a whole, learned this lesson yet. This will probably start to change soon as more and more higher level languages start to enter the world thanks to WASM GC landing, but for right now the thinking about strings for most of the world is very C-brained, very Python 2. Stringref recognizes that if WASM is going to be the universal VM it hopes to be, strings are one of the things that need to be designed very thoughtfully, both for the future we want and for the present we have to live in (ugh, all that UTF-16 surrogate pair pain!). Perhaps it is too early or too beautiful for this world. I hope it gets a good chance.

1. 4

But python 3 also got strings wrong, and this article explicitly agrees!! Perhaps counterintuitively, a UTF-8 internal representation is more efficient and universal than the array of code points.

Basically modern languages like Go and Rust got it right, Java and JS are still infected by windows, and Python 3 and Guile have a large amount of complexity they don’t need, which lowers performance. The article mentions moving Guile toward UTF-8.

This is what I’ve been saying across these comment threads, and my own blog post, but there’s a surprising amount of pushback:

https://lobste.rs/s/ql8goe/how_create_utf_16_surrogate_pair_by_hand

https://lobste.rs/s/gqh9tt/why_does_farmer_emoji_have_length_7

I didn’t see PyPy mentioned in this post - unlike cpython, it has a UTF-8 representation that implements the array of code points API in a compatible way.

So I think the obvious implementation of stringref is to do that, but with 2 optional side tables - utf16 and utf32. I will write a bit more about this later!

And btw I agree this is an excellent post. A bit dense but all the facts and conclusions are on point

Also related to the recent discussion of why “universal VM” is a difficult/flawed concept - https://lobste.rs/s/v5ferx/scheme_browser_hoot_tale#c_n87bzw

I do think some of universal string type is inevitable for performance, if WASM wants to support the common languages that people seem to want it to

1. 5

Yeah, I think if I were designing a language from scratch strings would be immutable byte sequences with methods that return different views of the byte slice as a sequence of bytes, codepoints, or grapheme clusters. I think Swift either does this or is close. 99% of the time, a view into bytes is all you need for parsing and whatnot, and you can just treat natural language text as opaque blobs.

2. 2

You’re right that Python 3 also got strings wrong. But it’s an experience people are very familiar with, and still carries the point of my post, even if it’s true that they didn’t get it “right” :)

2. 2

My biggest complaint about Unicode is that it doesn’t allow full round-trip conversion of binary data between the various transformation formats, and instead requires implementations to corrupt the data they are given with replacement characters, or to reject the data and refuse to operate. As a result, Python and Rust (and others I am less familiar with) have had to come up with various nonstandard contortions to cope with situations where it’s reasonable to expect UTF-8 but not guaranteed.

In practise there is full round-trip conversion between WTF-16 (16 bit binary data) and WTF-8, but there is no 8-bit equivalent of bare surrogates that would allow a program to accept non-UTF-8 bytes and losslessly process them within the Unicode framework. If such a thing did exist then it would be easier to do the kind of shape-shifting string representation that Andy Wingo is talking about - the change of representation can happen quietly under the hood, with no need to report failure or risk of data loss. But note that Python’s surrogateescape is not the right way to do this, because it conflicts with WTF-8 and WTF-16; the right solution needs 128 newly allocated codepoints.

1. 7

Why do people keep insisting that you have to be able to pass arbitrary binary data through the channels explicitly designed for correctly-encoded text?

1. 3

Because they want to be able to treat data as text when it comes from channels that do not guarantee the encoding.

Even in channels with explicit charset labels, mojibake exists. Even in systems that were supposed to be UTF-16, WTF exists. It’s wishful thinking to design systems that can’t cope with the way data is used in practice.

1. 5

But the reality of systems is exactly why the separation exists! You want to encode facts like “this is a valid Unicode string” into the type system.

I maintain a library, camino which provides path types which have the type-level assertion that they’re valid Unicode (as most but not all paths in practice are). The Path/Utf8Path separation is very valuable when you’re thinking about which kinds of paths your program should support.

1. 5

Consider this shell program:

``````ls *.py
``````

The shell has to make these calls:

• `glob('*.py')`
• `execve()` on the results of `glob()`.

Note that the program is correct, no matter what the encoding of the files is. Shell do not do any decoding or encoding of filenames – they are opaque bytes.

In other words, it doesn’t matter what the encoding is. And you can’t know – there’s literally no way to tell.

You have a filename encoded in UTF-8 and UTF-16 on the same file system, or in the same directory.

So if your programming language is forcing encoding/decoding at all boundaries, then it’s creating errors in programs where there were none.

It’s also bad for performance to do all the round-tripping. This isn’t theoretical – the JVM string support is a limiting factor in the performance of Bazel.

Also I think some of the Rust build systems (Buck2?) bypass Rust’s string type for this reason.

ripgrep apparently has its own string type which may or may not be valid UTF-8, also bypassing the default string types.

It’s definitely good for programming languages and APIs to help with data encoding problems. The problem is when they force you go through their APIs.

Then they’re taking responsibility for a problem that they cannot solve correctly. It’s a problem that depends on the program being written.

1. 6

But `Path`, the type representing a filepath in the std, is not utf8 in Rust. It is OS specific, e.g. WTF16 on Windows and “raw bytes” on Linux.

Applications that handle paths that may or may not be valid utf8 are expected to use the `Path` type so as to respect the native OS representation.

Camino is for paths that are controlled by the application, and so known to be utf8, and that want the added convenience of transparent conversion to native Rust strings.

2. 3

Sure. If it expected from the source that some data isn’t text, then don’t treat it as text. Bytes are bytes. Some bytes may look like text to a user, but that is the concern for a presentation layer.

3. 2

Rust does not force encoding and decoding at boundaries—if you stay in OsString land, you won’t go through any further validation.

You have to write non-UTF8-path supporting code carefully, but that’s always been the case (see Python 2 to 3 conversion). Rust just brings these considerations to the forefront.

Some applications need their own path and string types. That’s fine too.

2. 3

I don’t think I would call what Rust does contorting. The solution in Rust is to present the data as CString or OsString;

A CString is simply a Vec without NUL, and then with a NUL at the end, there is no guarantee about it being Unicode or UTF-8 or anything in specifiy. It COULD be a UTF8 string but there is no way to know.

The OsString covers more close to the complaint of the article too; it is a textual string in the native platform representation. If you were running on a JavaScript platform, this would be UTF-16 with Surrogates (WTF-16), on Windows it’s Unicode-16. But without first going through validating their unicodeness, Rust won’t let you play with them.

OsString being it’s own type means that if the data is just passing through, there is no need to check if it’s valid UTF-8. Ie, you asked for a directory listing on Windows and then open the file; there is no requirement to verify anything, Windows gave you a filename it considered valid, so you can open it.

You can even do some basic operations (joining OsStrings and indexing into them) which basically eliminates quite a few cases where in other languages you’d need to convert to the internal string format first.

This essentially solves the roundtrip problem.

1. 3

You missed out Path and friends.

The contortion is that Rust has five times the number of string types that it needs (or maybe three times), with fallible conversions that would be infallible if Unicode did not require lossage.

There are places where Rust fails to round-trip, most obviously if you need to report an error involving an OsString or Path.

So I think if Unicode were fixed, Rust could be much simpler and would have fewer nasty little corners where there are cracks in its string handling design that are “essentially” papered over.

1. 3

I think it is very valuable to have a type-level assertion that a string is valid Unicode (not arbitrary binary data, but actual real Unicode). I’m not sure how that can be done without a String/OsString/Vec like separation.

1. 1

The type-level assertion in Rust is about UTF-8 rather than Unicode as a whole. What problems would that type-level assertion help you with if the definition of UTF-8 were extended so there was no such thing as invalid UTF-8?

1. 1

Being able to distinguish invalid/corrupted data from valid data (even if it’s not something you can/care to interpret) is pretty useful. Lots of “can’t happen” scenarios in my life get turned up when something fails to parse, whether it’s a UTF-8 string or a network protobuf message or whatever. Otherwise you’d just get corrupted data and not be able to tell where in the chain it happens or what it might be intended to be.

UTF-8 is just a nice way to squeeze 21-bit numbers down into fewer bits when most of the numbers are small. If you have no such thing as invalid UTF-8, then you need some other nice compression scheme.

1. 1

I think a standard like UTF-8 should be designed to work well at the lowest layers where data transparency is often needed. Of course higher-level code will need to parse the data so it can do more specific unicode processing, but the cost shouldn’t be imposed where it isn’t needed.

What I want does not change the interpretation of anything that is currently valid UTF-8. It only changes the interpretation of invalid bytes, and 128 newly-special codepoints. https://dotat.at/@/2011-04-12-unicode-and-binary-data.html

1. 1

I think there’s space for something like that, but honestly that’s too much of a niche case and shouldn’t be the default encoding for strings.

2. 2

Sorry to interrupt the rant, but I’m trying to figure out precisely what the rant is about. Could someone explain using small words what it would mean to “fix” Unicode?

To keep things concrete: as things are an OsString, like a Windows filename, considers more byte sequences to be valid than UTF-8 does. For example:

``````bytes fe = "�" are invalid UTF-8
bytes with a null byte in the middle are invalid UTF-8 and invalid OsString
``````

The complaint seems to be that you can’t always convert an OsString into (say) UTF-8 and then back. Which makes sense to me, because in that case the OsString doesn’t have UTF-8 in it; in fact it likely doesn’t contain text at all. So some questions:

1. Most importantly, what would it mean to “fix” Unicode? Is the idea that UTF-8 should accept all byte sequences as valid? Or some new format? Are we supposed to tell the Unicode Consortium that not only do they have to create a format for all textual communication between humans in all languages no matter how stupid those languages’ writing rules are, but that it must also, to satisfy some programmers, accept all byte strings as valid?
2. What’s the use case for converting OsString -> UTF-8 -> OsString? Does this use case assume that the OsString contains text? If not, why is it being converted to a specific textual format in the middle, why not keep it as an OsString?
3. How about null bytes in the middle of a bye string? That’s invalid OsString as well as invalid UTF-8. Should that also be accepted, to allow even more byte strings to “round trip”?
1. 1

If a string isn’t UTF-8 it might just be really old.

Most importantly, what would it mean to “fix” Unicode? Is the idea that UTF-8 should accept all byte sequences as valid?

Right, like WTF-16 does for u16 sequences. WTF-16 is much more common in practice than UTF-16, because none of the systems that adopted UCS2 (Java, EcmaScript, Windows) enforced the requirement that surrogates must be paired when they moved to UTF-16. So if that’s OK, something similar for UTF-8 should be OK too. As I said, it only needs 128 codepoints and a tweak to UTF-8.

What’s the use case for converting OsString -> UTF-8 -> OsString?

The usual problem is something like printing an OsString as an error message. Full round-tripping between u8 bytes, u16 words, and unicode codepoints might not be something many programs do, but it is something that will happen in the ecosystem as a whole, and without it there will still be problems of loss of functionality and data.

How about null bytes in the middle of a bye string?

Unicode doesn’t require them to be corrupted or rejected, so they are outside the scope of my beef. I am happy if converting a String to an OsString is fallible, so long as the reverse is infallible.

1. 2

Or… we could try not putting non-textual garbage in our strings? Just for once? Please?

1. 3

I promise I’ll give up the idea of fixing UTF-8 when WTF-16 and WTF-8 have been eliminated.

3. 1

If I am not mistaken, Emacs solves the round-tripping problem by using 32 bit per character and stuffing invalid pints into the high bits somehow (see https://github.com/emacs-mirror/emacs/blob/master/src/character.h). I think that’s the right solution for that specific application since a text editor has to cope with arbitrary files, potentially containing text in multiple encodings at once, or even binary files.

15. 9

One thought I’ve been having lately is about the assumption that we need resizable stacks.

In a strongly-typed language we should be able to predict stack usage statically in most cases. I’d suggest that in the cases we can’t (like non-tail recursion, calling an “opaque” function via function pointer, etc) which can result in stack overflows that these are also effects that the programmer needs to be aware of and mark explicitly.

For example in non-tail recursion the runtime or programmer would need to allocate (malloc) some space for the callee’s stack. This allocation would be fallible and error recovery is up to the programmer, but there would be no possibility of stack overflow.

Similary for async functions they could have an (immovable) stack allocated for them prior to beginning execution/polling. (They might themselves call other functions recursively or asynchronously).

I am then wondering if the async syntax could go the other way around. It is up to the caller to mark any asynchronous function calls the same as they would have to mark non-tail recursion. Ideally the language syntax or standard libraries would allow for structured concurrency only, and together with this it might be possible to implement Rust-style burrowing rules/linear types.

Does anyone know why the whole idea of growable stacks is important / taken as given?

(There is something about C ABI compatibility here too - calling other languages might require some spare stack space. I also seem to remember the Go developers complaining that there are no documented requirements for any given libc function for any given platform. Until that is fixed this seems fundamentally “unsafe”, where “safe” implies impossibility of stack overflow).

1. 5

I wonder if Rust could have just embraced stacklessness, and by default compiled every function down to a state machine. Make arbitrary unlimited stack usage (as required by calling callbacks, etc) the unusual case which requires some extra work. Then there would be no special syntax for async.

That would be great for lots of use cases, like embedding Rust into other systems.

2. 5

I’d suggest that in the cases we can’t (like non-tail recursion, calling an “opaque” function via function pointer, etc) which can result in stack overflows that these are also effects that the programmer needs to be aware of and mark explicitly.

I suspect you might be imagining that non-tail recursion is fairly rare and can typically be avoided? It’s actually quite common, though. It shows up basically whenever you do anything whatsoever with tree-shaped data. For example:

• JSON parsers. There’s a wonderful blog post somewhere that looks at the correctness of common JSON parsers, and finds a ton of bugs and ambiguities in the edge cases, for almost every JSON parser in the wild. One of these is that most JSON parsers will stack overflow if you give it a list `[[[[[[...]]]]]]` nested a thousand deep, because (we so infer) they’re implementing using recursion.
• Code that works with a config file, if the configuration allows for arbitrarily deep nesting of any kind. (Remember that it doesn’t have to be realistic that a config file actually uses nesting so deep that it causes a stack overflow. It’s merely the possibility of nesting, that naturally leads to non-tail recursion in the code that handles config files.)
• Doing stuff while recursively walking through a file system directory. (In the simplest case, it’s easy to code this up with a queue and avoid recursion, but as what you’re doing gets more complicated that becomes harder and harder, while recursion stays easy.)
• Doing anything whatsoever with a tree.

You can always translate recursion into loops, but it can get messy. The general technique is called defunctionalization. You’ve seen it: it’s the difference between an iterator written `yield()` in Python, and an iterator written out explicitly in C++ or Rust.

I’ve also dreamt of a programming language that disallows recursion, or at least explicitly marks it. But it does effect a lot of very ordinary code.

1. 6

For what it’s worth, async recursion requires boxing in Rust today, too.

2. 4

Usually, a static limit is a good idea. Unbounded recursion exposed to the Internet is off a denial of service attack, where you can make the program run out of stack and segfault.

3. 3

Just noting that rewriting non-tail recursion to loops doesn’t solve the fundamental problem. You still have an unbounded job queue/stack but it has just been moved to the heap.

1. 1

I mean, it depends on where your data is coming from and what your expectations are. If the data is from the untrusted Internet, and you want to prevent people from doing DoS through your service, then yes, you’re right. I think that’s pretty rare, though?

First, lots of data comes from you, internally (e.g. your config files), and is thus trusted. It might be big enough to overflow the stack, but it won’t be big enough to cause problems if you’re using the heap.

Second, lots of data comes from semi-trusted sources. Think business partners, who are incompetent but not malicious.

Third, even for data that comes from untrusted sources, DoS might not be a big concern. If your user DoSes you, you ban them after. Being DoSed is bad, certainly, but much much less bad than a security vulnerability where they steal credit card numbers from you or something. And it’s less likely to happen because there’s less profit incentive. Also, if you think a malicious user can’t crash your service in a thousand different ways already, that’s probably wishful thinking…

Don’t let the perfect be the enemy of the good. There are tons of situations where data could cause a stack overflow, but it would be better for it to run slowly on the heap.

1. 3

First, lots of data comes from you, internally (e.g. your config files), and is thus trusted. It might be big enough to overflow the stack, but it won’t be big enough to cause problems if you’re using the heap.

If your code is in a library that can be pulled in as a dependency, such as a JSON parser, then you don’t get to make that assumption; you don’t know who is going to be using it.

Second, lots of data comes from semi-trusted sources. Think business partners, who are incompetent but not malicious.

This kind of thinking took down facebook hard when I first joined. A new employee pushed a change to a config file, which started crashing the parser for a critical system – which, as I recall, was used in the process of deploying systems. Making an immediate rollback hard. But that’s fine; it’s a cost of doing business.

Even so; it’s still a better experience to return a “recursed too deep” error to the user, with information on where in the config the error was found, and information about what can be done to fix it. `Segmentation fault` is not a lovely error to try to unwind.

2. 1

All of which could also be addressed to the same degree by just increasing the minimum size of the stack. Every major OS will let you do it and some compilers can even do it ahead of time. I’m not saying that rewriting non-tail recursion to loops is pointless, it has advantages (e.g., portability, easier error handling of running out of space, possibly more efficient use of memory) and disadvantages (e.g., speed, complicated to write/maintain in many cases). My sole point was that rewriting non-tail recursion to loops doesn’t magically solve the unbounded problem, its just a slightly different approach/tradeoff.

1. 2

Sorry, my last reply was a bit ranty.

In most situations I agree, but not for, say, a JSON parser. A JSON parser written using recursion will crash from stack overflow on a 10kb JSON file that’s deeply nested. I would categorize that as a bug because it’s so easily avoidable, and because a general purpose JSON parser should be able to handle a wide variety of inputs. (E.g., try typing 1200 “[“s followed by 1200 “]“s in this website: https://onlinejson.com/json-parser/)

Increasing stack size is a reasonable thing to do for your own code, but it would be a bit presumptuous for the author of a JSON parser to tell their users to increase their stack size if they want to parse something deeply nested.

Using the heap instead of the stack changes the behavior of the parser: instead of successfully parsing some 10gb JSON files while stack overflowing on some 10kb JSON files, it now can parse any JSON file that’s 10gb or less. True, it still will run out of memory on a truly enormous JSON file, but that’s more of a fundamental hardware limitation.

Every major OS will let you do it and some compilers can even do it ahead of time.

You’re forgetting about the most popular OS, the browser :-). I don’t think a web site can control the stack size used by the client’s browser?

4. 1

Yes, to me the most important part of this is to make it trivially easy. In JS we write `y = await f(x)` for async code, maybe something that easy like `y = call f(x)` would be required? Your IDE could identify where these are necessary (using LSP or whatever) so it’s still developer friendly.

(Now I wonder if you could get the compiler to insert the `call` for you, but often explicit is better than implicit IMO).

3. 3

The idea of statically knowing max recursion depth reminds me of stuff related to https://en.wikipedia.org/wiki/Total_functional_programming

4. 3

Ooh. Bold! The assumption that we have a resizable stack is implicit in almost every programming language based off ALGOL-60 (ie all of them now), but in the absence of recursion and function pointers you can indeed compute function stack usage at compile-time. Simplest case, just have a single copy of each function’s environment, each with a return pointer to the parent environment that gets updated whenever the function is entered. IIUC this is exactly how oldschool Fortran and COBOL worked, and part of why the more compsci-y crowd of the 60’s and 70’s hated them so; you require a stack for recursion, and it also makes nested functions and function pointers easier, so these advanced-for-the-time features were things that those languages could never do.

Having a different set of design tradeoffs with different assumptions is a very interesting idea that deserves investigation; you could have portions of your program (different function colors, perhaps?) that are stackful and stackless, with different constraints. No idea how it could look; go crazy!

1. 3

The history of ALGOL 60 needing the stack is pretty interesting in its own right. I covered it recently (https://madhadron-a-programmers-miscellany.simplecast.com/episodes/the-trail-of-algol). A bunch of the folks who didn’t want it spent the next few years trying to figure out how to make ALGOL 60 resistant to it. It turns out that really nailing it down so you couldn’t slip in recursion was pretty hard.

5. 2

This sounds to me a lot like what Fbip2 is trying to do. Am i wrong?

1. 3

I don’t know that acronym. Do you mean this? https://www.microsoft.com/en-us/research/uploads/prod/2023/05/fbip.pdf

(That took some searching!)

1. 3

Yep! Koka and Lean and Effekt are all doing pretty fascinating work in this direction due to their needs for effects to be efficient.

1. 2

Thanks, I’ll take a closer look.

6. 2

Hm one consequence of that is that the Rust compiler would become a “special” Rust program – since it uses a recursive descent parser AFAIK.

That implies data-dependent stack usage, and so do most compiler passes on any recursive/tree-shaped IR.

I think Zig was trying to go in this direction, and remove stack overflows from unbounded recursion, but I think they either abandoned it or haven’t gotten there (?) I’m interested in any updates.

I think there is a simple algorithm for predicting stack usage – walk the call tree and always take the function with the biggest stack – but:

• You can’t have any casting of function pointers anywhere, which is fairly common in C programs – though with Rust’s type system, I imagine you shouldn’t need this in most cases (?)
• You can’t have a function of one type call a function of the same type, anywhere. Otherwise it could lead to unbounded recursion.
• I wonder if it will still give a big overestimate for many programs. You might still want to pre-allocate an 8MB stack and use a guard page anyway, i.e. if the analysis tells you need a 100 MB stack, but 300 KB is the common case.

I’m sure someone has done that analysis on some real programs – I would be pretty interested in that, including something like Nginx (if it’s even possible)

I think you may have to rewrite your whole library ecosystem to be stack space aware … similar to how you would have to rewrite a lot of code to reduce allocs, or to eliminate GC. I think the D language had a split between a GC stdlib and a no GC stdlib, which became a big issue.

1. 2

Yes, in fact it was from Andrew Kelley’s first Zig talk (recurse centre) when I first started to think about this, where he mentioned thinking of ways of making “perfect software” (no crashes) and speculated whether Zig could avoid stack overflow or not (the answer at the time was “not yet”).

It seems they have thought about this in detail since; that accepted proposal #1006 is very much in line with what I had in mind (except in a higher-level language I’d like less boilerplate - it’s good to be able to use your own allocator and handle errors, recursion limits, etc - but some syntax sugar to make it trivial would be a major boon).

2. 1

Hm one consequence of that is that the Rust compiler would become a “special” Rust program – since it uses a recursive descent parser AFAIK.

I find it curious to highlight this — rewriting the parser to use the heap as necessary, which sounds like it should be a bounded and fairly well understood task, sounds to me like it would be a relatively minor task compared to the open-ended — and potentially intractable? — language-design work entailed by making a language as complex as Rust have statically sized stacks as the default.

https://github.com/rust-lang/unsafe-code-guidelines/issues/427 (“Can we require bounded stack consumption?”) and part of its parent thread (“Explicit Tail Calls”) seem relevant here.

1. 1

Well I don’t think you would rewrite it all – you would just have to mark it as using unbounded stack.

Because there are approximately zero programming language implementations that have hand-written parsers that are non-recursive !! Recursive descent is a really good way of writing a production quality parser (arguably one of only two ways)

As far as generating code, there’s no grammar for Rust, and it would make the error messages worse

https://github.com/rust-lang/wg-grammar

It would be a REALLY bad engineering choice to try to do something like that, because the Rust parser already works, and unbounded stack usage is a non-issue in practice.

3. 1

Hm thinking about this more, I guess there is no reason the analysis is going to tell you that you need 100 MB stack… the only reason that would happen in this regime is:

``````char big[ 100 * MiB ];
``````

So maybe most code will use small stacks, and you don’t have to rewrite a lot.

It is interesting to think about, and now I’m more curious if Zig still has this plan … It does seem to fit Zig’s philosophy, not sure if it really fits Rust’s philosophy.

Personally I like recursion because I’m using lots of data-dependent trees, but there’s probably a place for a more restricted language.

Wuffs has no memory allocation at all – https://github.com/google/wuffs

It generates C that’s meant to be embedded in C programs, but they are hefty-sized parsing programs.

Hm I googled and looks like it’s still accepted/planned?

https://github.com/ziglang/zig/issues/1006

https://github.com/ziglang/zig/issues/6025

I believe that in Zig it’s more idiomatic to not use recursion. People will write JSON parsers with their own stacks, not the call stack

I’d be interested in seeing what actual Rust JSON parsers do – do they use recursion or not?

1. 2

My own thoughts on stack analysis have come to three possible outcomes for stack analysis: “the stack has a known max bound”, “this stack has a known max bound but the typical max bound is very different from it” (maybe there’s one uncommon code path that does lots of recursion), and “the stack has unknown max bound”. But the compiler may alter this result, for example tail recursion optimization turns “unknown max bound” to “known max bound”, and I could see where inlining or other optimizations could do the same by, example, being able to prove that a recursive function terminates on the input that it’s given.

I’d be interested in seeing what actual Rust JSON parsers do – do they use recursion or not?

Looks like `serde_json` uses a recursive parser with an optional depth limit?. The `json` crate also uses a recursive-ish parser but might split its recursion stack onto the heap?

1. 1

Hm interesting, looking at the Rust evidence, I think the O(1) stack usage is nice in theory, but it seems to be way down there on the list of things people run into / complain about. I assume people are running JSON parsers in production by now :)

Like in theory any data-dependent recursion in a web server is a DoS, but in reality there are probably 5 other causes of DoS ahead of that. And you can always bail at a fixed limit, and you can always put some of the stack data structures on the heap.

In other words, I wouldn’t think of it as an all-or-nothing issue – it’s more of a quantitative thing – how often do these problems happen?

And can you mitigate them WITHOUT language support?

The FFI issue mentioned is also a big deal. Because any function that does I/O is calling into libc, or the system DLL on Windows, or the equivalent on OS X. And the compiler has no idea how much stack they’re using.

So you basically can make a restricted subset for pure computation, which I think is not a bad idea … but thinking about it a bit more, it’s not surprising that it’s kinda far down on the priority list!

7. 1

In a strongly-typed language we should be able to predict stack usage statically in most cases.

How does having strong types establish runtime control flow bounds adequate to predict runtime stack size?

1. 2

“strongly-typed” has too many different conflicting meanings, so any sentence using the phrase is just handwaving. https://lobste.rs/s/z6lpqg/strong_static_typing_hill_i_m_willing_die#c_exvsdk

I think it depends more on how static the call graph is, rather than how static or sound the type system is. A statically typed language is less likely to have pathologically dynamic function calls, but even dynamically typed languages are not obliged to indirect every function call through a mutable lookup table.

16. 12

The author noted at the top of the article that they added the word “static” to the title after initially publishing. I think they probably should also remove the word “strong” throughout the article, since the article focuses on the virtues of statically typed languages and seems to provide no discussion on “strongly typed languages”.

1. 7

I have yet to see the day where nobody argues vehemently for static types without confusing them with the strong types. Regardless of which side one falls on, it really undermines their “expertise” when they do it.

1. 9

On the contrary, anyone who uses the phrase “strong type system” without further clarification isn’t an expert, because an expert would know there’s no agreed upon definition of that term. See `dst`‘s link by Norman Ramsey (who’s an expert), and the Wikipedia page.

Here’s a variety of things you might mean when you say “strongly typed”; which do you mean?

• Lacks undefined behavior, so roughly “not C/C++”.
• Is type sound. That is, for every expression E, if the type system says it has type T, and when you run it you get a value V, the type system would also say that V has type T. E.g. Typescript is purposefully not sound.
• Lacks implicit conversions between different types (e.g. automatically .toString()ing an int passed to a function that expects a string).
• Doesn’t have `null`.
1. 2

Good thing I’m not claiming to be an expert :D

Personally, I would have gone with definition #1 on your list, which is also what Norman’s SO answer suggests.

FWIW, I’ve almost never heard anyone claim implicit conversions are relevant to strong typing, and I’ve definitely never heard anyone claim null is, but I don’t hang out much in these debates.

All that being said, even if strong vs weak lacks rigor, it should still be distinguishable from static/dynamic, no?

1. 4

In that case, you probably want to just say “undefined behavior”, because everyone will know what you mean. If you say “strong type system”, various members of your audience will substitute that for:

• 15% –> “lacks undefined behavior”
• 20% –> “type sound”, though most of these people won’t know that term
• 20% –> “static type system” (lots of new programmers in this bucket)
• 30% –> “exactly the programming languages that I like and no others”
• 10% (annoying PL people) –> “why aren’t people from industry ever precise, what the hell does he mean”
• 5% –> something neither of us would guess in a thousand years; people are very creative

The best part is, no one will realize that they’ve interpreted the phrase “strong type system” differently, so if they talk to you or to each other after, everyone will talk past each other and possibly start flamewars.

(Obviously I pulled those numbers out of someplace inappropriate. I’d actually be really curious to see a survey of programmers, asking them what they think “strong type system” means.)

Re: null. If you ask someone what a String is in Java, they’ll say something about a sequence of characters. You ask if you can get the length of one. They say sure, and point you to the docs. Docs say “Returns: An int value, representing the length of the string”. You write a function that takes a String. You call `arg.length()`. Add your code to a bigger project. Run it. It crashes! “Null pointer exception”. In your code! What went wrong? The type system promised that `arg` has type `String`. `String` has a `.length()` method. But `arg` wasn’t a `String`, it was null! The type system lied to you!

The “solution” is that the type `String` doesn’t actually mean an instance of `String`, it means “either a `String` object, or null”. And you can’t safely call any method until you’re sure via reasoning outside of the type system that the receiver object isn’t `null`. And no one bothers to write this down everywhere because it would be tedious and repetitive. But honestly, I’m sympathetic with the simpleton “you” from the previous paragraph: type system lied to me, that weren’t no `String`. And from that perspective, it’s a type soundness violation.

1. 2

Although it’s not a term I would use,¹ I for one had thought of “strongly typed” as a term poorly defined but mostly “Lacks implicit conversions”.

2. 4

My experience is that “strong types” is not a term used much in PLT-research. At least not where I’ve been — maybe I’ve been with the wrong people. :) I think I’ve mostly seen the term used in discussion forums and on blogs. Not that I doubt we could find some books or papers by PLT-folks or other experts using the term.

I was curious and searched some texts from what I’d consider experts, and they don’t use the term, or use it differently:

From the R5RS scheme standard:

Scheme has latent as opposed to manifest types. Types are associated with values (also called objects) rather than with variables. (Some authors refer to languages with latent types as weakly typed or dynamically typed languages.) Other languages with latent types are APL, Snobol, and other dialects of Lisp. Languages with manifest types (sometimes referred to as strongly typed or statically typed languages) include Algol 60, Pascal, and C.

Andrew Appel’s Types and Programming Languages use the term safety:

Language safety is not the same thing as static type safety. Language safety can be achieved by static checking, but also by run-time checks that trap nonsensical operations just at the moment when they are attempted and stop the program or raise an exception. For example, Scheme is a safe language, even though it has no static type system.

From SICP:

This is a small reflection, in Lisp, of the difficulties that conventional strongly typed languages such as Pascal have in coping with higher-order procedures. In such languages, the programmer must specify the data types of the arguments and the result of each procedure: number, logical value, sequence, and so on

In a draft of Bob Harper’s Practical Foundations for Programming Languages he writes (though I’m a bit unsure about how to interpret the “or”):

Most programming languages are safe (or, type safe, or strongly typed). Informally, this means that certain kinds of mismatches cannot arise during execution. For example, type safety for E states that it will never arise that a number is added to a string, or that two numbers are concatenated, neither of which is meaningful.

17. 10

The slowdown has nothing to do with iterators. And it has nothing to do with Rust. At the core, it’s a very basic and fundamental concurrency bug. As he mentioned in the post, if the programmer refactored the two `for` loops into a single one instead of an iterator, the exact same thing would have happened. Conversely, if he used two separate iterations, the bug wouldn’t have happened. And of course the situation would be exactly the same in C++ or any other language like that. This is bordering on clickbait, really nothing to see here. I expected a post addressing real performance costs of iterators.

1. 4

The author goes by she/her pronouns btw

2. 3

You’re assuming bad faith for what I read as an example how someone would naively (intuitively?) convert a to b, and it’s in Rust and it uses iterators - that’s what happened. Of course this can happen elsewhere, still nothing in this post is wrong.

But yeah, one could probably frame it a little bit different,

3. 3

The slowdown has nothing to do with iterators.

Yes, it does. The post clearly states that this is a problem for beginners, and I think you’re just to knowledgeable to even understand the issue. Of course this needs two loops, but the problem is that a beginner might not understand that .map and .for_each get turned into a single loop. In a language like js or ruby, stuff like this would run as two loops. That’s the entire misunderstanding.

This isn’t clickbait, but you aren’t the target audience.

4. 1

Actually, I hear that iterators actually tend to be slightly faster than `for` loops in Rust.

18. 6

I enjoyed the post! Genuine question: is keeping all copies of all ingredients around the ‘best strategy’ for playing factorio, as factories get orders of magnitude bigger? What are the ‘best strategies’, and how do these concepts map to the process of programming?

The largest challenge when first playing Factorio is building a factory without knowing how new technologies will change the game later. Conflicting constraints eventually lead to a required ‘refactoring’ of the factory. In both programming and Factorio, how can one balance exploitation and extensibility such that large systems can expand without becoming ossified due to a lack of foresight? What changes once you’ve already played through the game once?

1. 16

Oh man this is a great post, yes. I’m an avid player of Factorio and games like it, when I have the brain-bandwidth and free time, so I can try to answer your question. Long story short, there is no single “best strategy”, which is part of what makes Factorio and similar games fun. You have to experiment and figure out the strategy that fits the bill!

The two constants of Factorio are that you will never be producing enough of a particular piece of stuff, and you will never have enough raw material. You constantly build new things to expand your capabilities, and then find new bottlenecks and expand your operations to free them up. Which will eventually result in a different bottleneck, but then you expand your operations to solve that, and then you get a new tech that lets you produce something new that either keeps you alive or solves a logistical problem you have, etc etc.

So with that in mind:

• No, keeping all ingredients around will not scale well. You can see that in the screenshots of the factories in the article: they end up with like 12 belts of ingredients, as wide as the rest of the factory put together. Belts cannot easily cross: you have to build underground belts, which can only go in straight lines for a limited distance and which take up space at their endpoints. Splitting or merging belts is easier but still takes up space. So if you keep all your intermediate ingredients around, you will end up with lots and lots of time and space devoted to logistics spaghetti, most of which is unnecessary and all of which will be very difficult to change when you suddenly need to, because…
• Belts also have limited throughput. As the game goes on you will get faster belts (which take more advanced materials to build), but when your automation starts really ramping up around midgame it is pretty easy to produce and consume an entire belt’s throughput of, say, green circuits, which then requires multiple belts worth of input materials to produce. So you can scale in two ways, “vertically” or “horizontally”. To scale “vertically” you have a single belt of inputs “A” with multiple factories taking stuff off of it and putting their products “B” onto a single output belt, which is very easy and efficient and can only go so far until you’re literally consuming everything a single belt can carry. So once you reach that point you have to scale “horizontally” and make multiple parallel production lines, and figure out how to get all their outputs to where they need to be. Which then becomes harder, because…
• Factories take up physical space. You need to leave yourself room for expansion, often in unexpected directions. You also have to go find your physical inputs which may be far away or in dangerous places, and you have to actually be able to build your factory when there are inconvenient (or useful) lakes or cliffs in the way. You can get rid of lakes and cliffs, or you can put the pieces of your factories further away from each other, but you have to use up stuff to do either of those things, which means building more factories to produce that stuff.
• Usually you end up with broad two types of products: stuff that you need steady supplies of of but which are difficult to use up faster than you can produce them (at least until late game), such as railroad locomotives, belts, power lines, solar panels, etc, and stuff that you can never have enough of, such as research items, iron plates, electronic circuits, concrete, etc.

This leads to a lot of lessons that will sound fairly familiar to lots of programmers:

• You have tradeoffs between efficiency and flexibility. If everything was designed perfectly up front and you had unlimited basic resources to work with, it’s pretty easy to design things to be scaled up to infinity. This is often what the lategame looks like, giant grids of factories that take in fairly fixed inputs and produce fairly fixed outputs. (Often measured in “spaceships per minute” by the people who get really into it.) But trying to do that right from the start will leave you hamstrung, as the game progresses what you need will change so you have to be efficient, but also adapt to circumstances.
• As /u/kornel says, having a main bus is not the perfect strategy, but is pretty useful for most cases… because it is adaptable. It is quite easy to add more factories branching off of a bus just by making the bus longer, and it is quite easy to add more products to a bus by making it wider – though as I said making it wider has costs too, so you want to make sure you use it for the most useful and general products, perhaps making other things on-site when they are needed. If you know that you will need it, you can also leave yourself some extra space on a bus for improving throughput, such as running multiple belts of the same material or having a “feeder” line coming in from the side that replenishes a depleted resource. So you end up with many loosely coupled components that are joined by this shared interface layer.
• As your factory grows larger, you will end up with multiple sub-factories in various places, often linked by long belts or more commonly railroads, which do the same basic job but have higher throughputs over long distance (at the cost of higher latency, being more work to set up and manage, and needing more space). These usually end up specialized in certain things, whether it’s as simple as mining iron and producing plates and gears that are shipped back home, or as complex as an entire petroleum plant that results in half a dozen different products in varying amounts, some of which can be partially substituted for others. But because of all the interlocking products and changing supplies and demands, making a truly distributed point-to-point system is very difficult. It’s much easier to keep all your general-purpose continually growing production stuff centralized in one place, and make the offshoot factories specialize, because then you don’t need to get much stuff to your offshoot factories. (Hmm, that reminds me a bit of colonial mercantilism; something to ponder someday.) So you end up with a hierarchical system with a large complicated core interacting with many smaller, simple systems through more special-purpose interfaces.
• Relatedly, if the transport costs become too inefficient for certain places (usually expressed as them turning into belt-spaghetti), it can be worth it to modularize and specialize. For example if you have a process that consumes lots of steel, you can make an entire sub-factory that takes raw materials (coal, iron ore, power) and turns it directly into steel to feed this one process, and give it its own supply lines of those inputs instead of having it connect to your main bus. Knowing when and where to do this takes some judgement and experience, since for more complicated things it is difficult to wrap things up into a single neat “package” design that produces exactly what you need with no weird hairy edge cases, but if done well you can get a nice reusable, efficient module that is easily slotted into larger designs via a more abstract interface (put in four belts of iron and two belt of coal, get out one belt of steel).
• Eventually you get flying logistics robots that can move arbitrary stuff from point A to point B, automatically emptying “provider” containers and filling “requester” ones… but until you start reaching the endgame they are slow, expensive and can’t carry very much, so are best used for low-volume, high-value things. Just like the real world, I suppose. True point-to-point mesh networks are great, but have high overhead, and are still outperformed in throughput by more specialized systems.

You will always have to refactor(y) your setup multiple times, and that’s fine. The state of the game world is too big to hold in your head all at once unless you’re some kind of savant, and there’s plenty of unknowns to trip you up, and many interlocking design decisions that can work out equally well in the end. Experience results in you knowing more about how the different parts of these systems interact, so you have a better feel of what to expect and how things slot together, but no plan is ever perfect unless you’re one of the really white-knuckled players who literally pave over half the game world and do the math to optimize everything perfectly. But I never have the patience for that sort of thing.

So… Um. Yeah. Factorio is pretty cool.

1. 1

Thanks for the comprehensive reply! I should really play some Factorio haha

I think your proposed tradeoff between flexibility and efficiency is spot on. Ideally, one should start off with maximal flexibility, and ‘gradually cast systems in stone’, optimizing the worn parts of the system for efficiency. See also Functional Core, Imperative Shell type architectures.

1. 8

I should really play some Factorio haha

Sorry, “some” is not an amount of Factorio one can play. The options are “none” and “way too much”.

1. 1

Instructions unclear, sat down on a rainy day to play “some” factorio. Now my girl is gone, my kid is grown, and I’m an old man. What year is it?

2. 5

What changes once you’ve already played through the game once?

The 2 biggest changes in my gameplay came when I watched Nefrums do a speed run, and when I got the Lazy Bastard achievement.

Seeing how someone truly skilled built their factory gave me lots of ideas about what kinds of things were possible or optimal. Using inserters to put coal on a forge belt and using red inserters to take ingredients from 2 parallel belts 2 tiles away from the assembler both seem obvious in retrospect, but I never did those things before. Those are small examples of design minutia, the real value was seeing how Nefrums approached the game. How he built his factory and expanded his resource collection in waves. His priorities and milestones that he targeted deliberately one by one. Studying prior art is not unique to Factorio or programming, but I will say that watching Nefrums, an expert, was vastly more valuable to me than any of the tips or designs from amateurs I found on Reddit or the forums.

In doing Lazy Bastard (hand craft no more than 111 items) I learned how incredibly ridiculously convenient it is to have a mall (area of your factory where building materials are pre-crafted and stored). I thought a Lazy Bastard run would be a pain, but it really wasn’t at all. Now I never play the game without an extensive mall based on the one I made for Lazy Bastard. This mirrors my feelings about automated tests and inline asserts. The first time I used asserts in a simple storage engine for a DB class in college, over half of them fired at some point or another during development, and I was enlightened.

In both programming and Factorio, how can one balance exploitation and extensibility such that large systems can expand without becoming ossified due to a lack of foresight?

My attempt at the “There is no spoon” achievement actually does mirror some of my professional experience, as I had a constraint for myself that I would launch the rocket in 8 hours with a sustainable factory. That is, a factory that I could continue the game with after launching the rocket, rather than throw away after getting the achievement. And I have continued that factory, using it to get all the other achievements. I also required myself to use 100% nuclear power by the time my starting coal ran out. My experiences:

• Proof of concept: my first attempt was well over the time limit (20+ hours) for either speed run achievement, but I focused on exploring the kind of factory layout I wanted to use for my next attempts once I realized I was not going to make it.
• Design review: after the first attempt I spent some time in creative mode reviewing my design and tweaking parts of it for my second attempt. I also consulted my brother for his thoughts on some of my designs.
• Project planning, task breakdown: after my second attempt (~14.5 hours) I spent more time in creative mode making my factory better suited to incremental improvement throughout my next run.
• Future proofing: my original factories were extremely tight and optimized, until I needed to snake an ingredient through a place I didn’t expect, and there was literally not a single tile of space to do it. From then on I left 4 tiles of empty space between components, even if I spent lots of time designing those components and their neighbors ahead of time and theoretically didn’t need the space.
• Avoid large rewrites: my first attempt factory was loosely based on the factory I built for my Lazy Bastard run. For my later attempts I avoided redesigning my factory from scratch even though I had plenty of new ideas. My goal was to make the necessary improvements to succeed on my next attempt, not start over completely.
• Scaling pains: I succeeded on my third attempt (~7.5 hours), but there were still complications in expanding the factory after the rocket launch. I had planned my factory ratios in a way that could be upgraded with modules post-launch, but I had to spend a lot of time revisiting areas of the factory to address unforeseen throughput issues.
• Monolith vs microservices: I quickly realized that trains were impractical for an 8 hour run due to the up-front cost of building a truly scalable train network. My factory had isolated sections for different components, but was still basically a single integrated entity. Only after the launch and upgrades / maintenance to my original “monolith” factory did I invest in a train network. I kept the original factory running the entire time that I developed external “microservices” for oil refining, smelting ore, mass chip manufacturing, and so on. If I had committed to building a train network for the 8 hour rocket launch, I don’t think it would have scaled well in the long term, and I would have needed to redesign it anyway.

is keeping all copies of all ingredients around the ‘best strategy’ for playing factorio, as factories get orders of magnitude bigger?

For my core factory to launch a rocket, I absolutely did not do this. I recreated several intermediates in different places as they were needed. For example, snaking ~1 assembler of gears from my red science all the way to engine units for blue science seemed unnecessary. It was far simpler to just create more gears alongside the pipes next to the engine unit assemblers. I only snaked intermediates that were expensive or complex to make, or highly unrelated to anything else nearby: red chips, blue chips, sulfur, batteries, plastic, and low-density structures. But I did make an effort to place consumers of these things close together where possible. I despise the “main bus” approach to factory building.

My post-launch “microservices” never completely did this either. I connected complex or high-volume intermediates to the network, but constructed simple, low-volume, or single use things inline. I made trains for the same things I modularized before, as well as including fluids that require water to produce. I never had trains for copper wire, gears, pipes, rods, engine units, robot frames, explosives, and so on. I shipped raw uranium ore to nuclear power plants, and handled all processing on site.

how do these concepts map to the process of programming?

I don’t think my approach is necessarily the best way to play Factorio, but it definitely reflects my values in software engineering. I believe that components should be modular with well defined boundaries, but microservices are a heavy-handed tool that should be used sparingly. That hyper-generalization is overrated, and specialization of mundane things can actually make systems simpler and reduce maintenance costs. That systems should be designed up front as much as is practical, while leaving room for incremental improvement or changing requirements down the line.

1. 2

ahahaha you posted this right as I finished my own novel-length reply, and I love hearing about how different your design philosophy is! “I despise the main bus approach” made me really stop and think. XD Do you have any good sources besides Nefrums?

1. 2

I think the main bus is reasonable-ish if you have no idea what’s coming next. But everything doesn’t need everything else, so it’s not that hard to put similar things together, especially once you’ve launched a rocket before. Chips are the best example, my 8 hour factory is divided by a large vertical mall, chips on the left, and everything else on the right. They don’t have any codependencies, they don’t need to be together for any reason. I don’t even belt green chips to the right half, only red and blue. The entire right half is served by a few green chip factories on that side.

I don’t actually like videos or streaming as a medium for learning. At first I learned by playing with my brother and a friend, who both had 500+ hours before I even started. Only after launching a rocket multiplayer did I really look into improving my game.

My brother is into the speed run community so when I asked about Factorio pros he pointed me to Nefrums and AntiElitz, the 2 top Factorio speed runners. I like Nefrums a lot more because he spends more time explaining his decisions and teaching on his stream. But once I got inspiration on common design elements and an understanding of how an expert views the progression of the game, I stopped watching. From there I made and tested my own designs in creative mode. I didn’t copy any of his designs directly, except for forges because there really is just a one true best way to make forge lines and nothing to improve about it.

For certain things that are more complicated like train loading/unloading, intersections, beaconed factories, and balancers, I looked around online for people who had done exceptional work in those areas.

Train loading/unloading should be done evenly for each car in the train and there is no one “best” solution. I sought out all the existing approaches so I could synthesize a design I am satisfied with. I don’t have any good resources for this, I mostly looked at a lot mediocre designs to get an understanding of what approaches people have done before. Though for what it’s worth you can get an optimally balanced design with the `/ -CHESTS` circuit approach without much effort. I just found that unsatisfying.

There is an insanely thorough benchmark of train intersections on the forums that I used for inspiration before designing my own intersections. I made something similar to the super compact Celtic knot, that’s a little more compact and has 4 tiles between the rails instead of 6.

For beaconed factories, flame_Sla on Reddit has posted saves of 10k SPM designs and beyond. I found those valuable to study, I learned a lot about beacon layout and direct insertion. You can make much more compact designs if you cut belts out of the equation entirely.

Balancers in particular are an uninteresting problem (to me) solved by The Balancer Book, compiled from a combination of prior art and a SAT solver.

In summary, I spent a lot of time in creative mode, studying prior art for anything that seemed nuanced enough to warrant it. Every component of my factory has gone through several iterations in creative mode, been put to the test in a real run, and refined from there.

1. 4

Thanks a lot! I think this conversation honestly is a pretty good demonstration of the mindset of “mid-tier player” vs “advanced player”. My answer is “design for flexibility and adaptability because you can’t keep everything in your head, you’re going to have to change things no matter what” and yours is “I understand I’m going to have these specific design rules and patterns, so I will build around them and it will be more efficient without needing as much refactoring.”

It also reflects technical mindset: I love figuring out how to put things together from nothing and exploring new problems, but doubling-down on it and putting lots of work into understanding the details and how to apply them is less interesting to me. It sounds like you’re more dedicated to optimization and refinement than I usually am, and that’s resulted in you having a deeper understanding of how pieces interlock. I’m usually done with a factory after it’s launching rockets semi-consistently, going back to expand and optimize it has limited appeal.

Didn’t know I had that much to learn, but I have a few things I’m interested in checking out now!

2. 2

Now I really need to play Factorio sometime!

I like how you emphasized the social attribute of systems design. There’s a lot you can learn on your own from your first factory. Building a megafactory, on the other hand, requires goals, community, planning, and coordination. The same is true of programming: no system is designed in a vacuum, and tradeoffs must be discovered, considered, and decided upon. This happens a lot faster if you can learn from others.

I enjoyed hearing how the design of your factories changed as you the constraints and resulting tradeoffs were determined for a particular run.

3. 5

All is probably an overkill, but this strategy in general sounds like https://wiki.factorio.com/Tutorial:Main_bus

4. 3

I enjoyed the post! Genuine question: is keeping all copies of all ingredients around the ‘best strategy’ for playing factorio, as factories get orders of magnitude bigger? What are the ‘best strategies’, and how do these concepts map to the process of programming?

No, because plenty of ingredients aren’t dense enough to ship - for instance, it’s basically never worth putting copper cables on your main belt because 1 copper plate = 2 cables, so you’re better off just making your ‘copper cable belt’ carry plates instead and then crafting cables locally (and it crafts in a second or less, unlike e.g. engines, which are basically just iron+steel but take (20?) seconds each to craft)

5. 2

is keeping all copies of all ingredients around the ‘best strategy’ for playing factorio, as factories get orders of magnitude bigger?

“Megabases” that scale up factories tend to be built on top of train networks. They often use a mod like LTN or Cybersyn to route deliveries, though it can be done with the game’s base mechanics with somewhat more work.

I’ve often seen “city block” designs described as being analogus to serivce-oriented architecture, though I’m not sure it’s a perfect analogy. Each block tends to be responsible for a single thing, requesting a small number of inputs and providing a single or small number of outputs.

I’m not sure there’s a single “best” design or set of principles for such designs. How strictly do you enforce blocks doing a single thing — are they restricted to a single recipe, or can they make some precursors? Do you optimise for the lowest number of inputs and outputs, or for speed and scale? Do you scale by improving a city block, or just copying it (like improving performance compared to running more instances of a service)?

Something that’s kept me coming back to Factorio is that there isn’t a single optimal way to play that makes up a meta that all players follow. While there are some common patterns, there’s a lot of potential tradeoffs players will find they have different priorities for.

1. 1

My own factorio experience is to start with spaghetti, then main bus (which are both covered by the article). Then, because the factory must grow, switch to city blocks, which also start introducing interesting concepts of queuing theory.

6. 1