Threads for kprotty

  1. 13

    The reason that Tim Harris’s (and/or Keir Fraser’s?) lockless ring buffer uses separate words for the producer and consumer counters is twofold:

    • You don’t need to worry about overflow. In the packed single-word version, when you increment the counter in the low bits, it can overflow into the high bits. In their version, as long as the size of the queue is less than half the range of the integer, you can always just mask off the low bits. The wrapping behaviour of C unsigned integers means that the arithmetic for calculating the amount of free space is a simple subtraction even if either counter overflows (a minimal sketch of this arithmetic follows the list).
    • The producer and consumer counters can be concurrently written from different cores. Putting them in the same word increases false sharing. Typically, you’d want to put them in separate cache lines (you can save some space by putting one in front of the ring and one after, which gives you some false sharing when you use the first / last entries in the ring but not for the common case).
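
    To make the arithmetic concrete, something like this (a minimal single-threaded sketch, not the actual Xen code; atomics, memory ordering, and the cache-line placement mentioned above are left out):

      /* Two free-running uint32_t counters; only the low bits are used as
       * ring indices. Unsigned wrap-around keeps the subtraction correct as
       * long as SIZE is a power of two no larger than half the counter
       * range. A real implementation would also put prod and cons on
       * different cache lines and use atomics. */
      #include <stdint.h>

      #define SIZE 256u                       /* power of two */

      struct ring {
          uint32_t prod;                      /* written by the producer */
          uint32_t cons;                      /* written by the consumer */
          int      slots[SIZE];
      };

      static uint32_t ring_used(const struct ring *r) {
          return r->prod - r->cons;           /* correct even after wrap */
      }

      static uint32_t ring_free(const struct ring *r) {
          return SIZE - ring_used(r);
      }

      static int ring_push(struct ring *r, int v) {
          if (ring_free(r) == 0)
              return 0;                       /* full */
          r->slots[r->prod & (SIZE - 1)] = v; /* mask off the low bits */
          r->prod++;                          /* publish */
          return 1;
      }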
    1. 5

      Another point for using separate words is that the single-producer and single-consumer cases can use an atomic store to commit instead of an atomic read-modify-write; the former (with release semantics) reduces to a normal store on x86.
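
      In C11 terms, the single-producer commit is roughly this (illustrative sketch, names made up):

        /* Only one thread ever writes `prod`, so publishing a new element
         * needs just a release store rather than a CAS/RMW. On x86 (TSO)
         * this compiles down to a plain MOV. */
        #include <stdatomic.h>
        #include <stdint.h>

        void spsc_commit(_Atomic uint32_t *prod, uint32_t new_prod) {
            atomic_store_explicit(prod, new_prod, memory_order_release);
        }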

      1. 4

        I was curious so I looked for this…

        I don’t know if any of these are precisely what you’re referring to…

        1. 3

          The one that’s used in the Xen ring buffer implementation. I think this comes from Keir’s dissertation but I’m not sure whether Keir, Tim, or some combination of the two invented it. A lot of the stuff in that project came out of pub conversations between the two of them and with Ian Pratt, so attribution is nontrivial.

          1. 2

            Bingo, found an article explaining it. Thanks!

            And I’m digging the Fraser/Harris paper. The transactional object system is clever enough that I’m having to fight off an urge to implement it just for fun.

            1. 1

              Bingo, found an article explaining it. Thanks!

              Hmm, that looks very familiar…

              And I’m digging the Fraser/Harris paper. The transactional object system is clever enough that I’m having to fight off an urge to implement it just for fun.

              I have a similar reaction. A bunch of the stuff that they did is so elegant that I find it very difficult to not apply it in horribly inappropriate settings.

      1. 40

        I tried out this language while it was in early development, writing some of the standard library (hash::crc* and unix::tty::*) to test the language. I wrote about this experience, in a somewhat haphazard way. (Note, that blog post is outdated and not all my opinions are the same. I’ll be trying to take a second look at Hare in the coming days.)

        In general, I feel like Hare just ends up being a Zig without comptime, or a Go without interfaces, generics, GC, or runtime. I really hate to say this about a project where the authors have put in such a huge amount of effort over the past year or so, but I just don’t see its niche – the lack of generics means I’d always use Zig or Rust instead of Hare or C. It really looks like Drew looked at Zig, said “too bloated”, and set out to create his own version.

        Another thing I find strange: why are you choosing to not support Windows and macOS? Especially since, you know, one of C’s good points is that there’s a compiler for every platform and architecture combination on earth?

        That said, this language is still in its infancy, so maybe as time goes on and the language finds more users we’ll see more use-cases for Hare.

        In any case: good luck, Drew! Cheers!

        1. 10

          why are you choosing to not support Windows and macOS?

          DdV’s answer on HN:

          We don’t want to help non-FOSS OSes.

          (Paraphrasing a lot, obvs.)

          My personal 2c:

          Some of the nastier weirdnesses in Go are because Go supports Windows and Windows is profoundly un-xNix-like. Supporting Windows distorted Go severely.

          1. 13

            Some of the nastier weirdnesses in Go are because Go supports Windows and Windows is profoundly un-xNix-like. Supporting Windows distorted Go severely.

            I think that’s the consequence of not planning for Windows support in the first place. Rust’s standard library was built without the assumption of an underlying Unix-like system, and it provides good abstractions as a result.

            1. 5

              Amos talks about that here: Go’s file APIs assume a Unix filesystem. Windows support was kludged in later.

            2. 5

              Windows and Mac/iOS don’t need help from new languages; it’s rather the other way around. Getting people to try a new language is pretty hard, let alone getting them to build real software in it. If the language deliberately won’t let them target three of the most widely used operating systems, I’d say it’s shooting itself in the foot, if not in the head.

              (There are other seemingly perverse decisions too. 80-character lines and 8-character indentation? Manual allocation with no cleanup beyond a general-purpose “defer” statement? I must not be the target audience for this language, is the nicest response I have.)

              1. 2

                Just for clarity, it’s not my argument. I was just trying to précis DdV’s.

                I am not sure I agree, but then again…

                I am not sure that I see the need for yet another C-replacement. Weren’t Limbo, D, Go, & Rust all attempts at this?

                But that aside: there are a lot of OSes out there that are profoundly un-Unix-like. Windows is actually quite close, compared to, say, Oberon or classic MacOS or Z/OS or OpenVMS or Netware or OS/2 or iTron or OpenGenera or [cont’d p94].

                There is a lot of diversity out there that gets ignored if it doesn’t have millions of users.

                Confining oneself to just OSes in the same immediate family seems reasonable and prudent to me.

            3. 10

              My understanding is that the lack of generics and comptime is exactly the differentiating factor here – the project aims at simplicity, and generics/compile time evaluations are enormous cost centers in terms of complexity.

              1. 20

                You could say that generics and macros are complex, relative to the functionality they offer.

                But I would put comptime in a different category – it’s reducing complexity by providing a single, more powerful mechanism. Without something like comptime, IMO static languages lose significant productivity / power compared to a dynamic language.

                You might be thinking about things from the tooling perspective, in which case both features are complex (and probably comptime even more because it’s creating impossible/undecidable problems). But in terms of the language I’d say that there is a big difference between the two.

                I think a language like Hare will end up pushing that complexity out to the tooling. I guess it’s like Go where they have go generate and relatively verbose code.

                1. 3

                  Yup, agree that zig-style seamless comptime might be a great user-facing complexity reducer.

                  1. 16

                    I’m not being Zig-specific when I say that, by definition, comptime cannot introduce user-facing complexity. Unlike other attributes, comptime only exists during a specific phase of compiler execution; it’s not present during runtime. Like a static type declaration, comptime creates a second program executed by the compiler, and this second program does inform the first program’s runtime, but it is handled entirely by the compiler. Unlike a static type declaration, the user uses exactly the same expression language for comptime and runtime.

                    If we think of metaprogramming as inherent complexity, rather than incidental complexity, then an optimizing compiler already performs compile-time execution of input programs. What comptime offers is not additional complexity, but additional control over complexity which is already present.

                    To put all of this in a non-Zig context, languages like Python allow for arbitrary code execution during module loading, including compile-time metaprogramming. Some folks argue that this introduces complexity. But the complexity of the Python runtime is there regardless of whether modules get an extra code-execution phase; the extra phase provides expressive power for users, not new complexity.

                    1. 8

                      Yeah, but I feel like this isn’t what people usually mean when they say some feature “increases complexity.”

                      I think they mean something like: Now I must know more to navigate this world. There will be, on average, a wider array of common usage patterns that I will have to understand. You can say that the complexity was already there anyway, but if, in practice, it was usually hidden, and now it’s not, doesn’t that matter?

                      then an optimizing compiler already performs compile-time execution of input programs.

                      As a concrete example, I don’t have to know about a new keyword or what it means when the optimizing compiler does its thing.

                      1. 2

                        A case can be made that complexity, under this definition, is a “good thing” for code quality – that it does “matter”:

                        Similar arguments can be used for undefined behavior (UB), as it changes how you navigate a language’s world. But for many programmers it can usually stay hidden, because code seems to work in practice (i.e. not hitting race conditions, not hitting unreachable paths for common input, not updating compilers, etc.). I’d argue that this still matters (enough to introduce tooling like UBSan, ASan, and TSan at least).

                        The UB is already there, both for correct and incorrect programs. Providing tools to interact with it (e.g. __builtin_unreachable -> comptime), as well as explicit ways to do what you want correctly (e.g. __builtin_add_overflow -> comptime-specific language constructs interacted with using normal code, such as for vs inline for), would still be described as “increases complexity” under this model, which is unfortunate.
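
                        For anyone unfamiliar with those builtins, roughly what they look like in use (an illustrative sketch of the GCC/Clang extensions; the function names are made up):

                          /* Explicit handles on behaviour that would otherwise be UB. */
                          #include <stdint.h>

                          int32_t add_or_saturate(int32_t a, int32_t b) {
                              int32_t sum;
                              /* Signed overflow is UB with plain `+`; the builtin reports it. */
                              if (__builtin_add_overflow(a, b, &sum))
                                  return a > 0 ? INT32_MAX : INT32_MIN;
                              return sum;
                          }

                          int32_t table_lookup(const int32_t table[4], unsigned idx) {
                              switch (idx) {
                              case 0: case 1: case 2: case 3:
                                  return table[idx];
                              default:
                                  /* Caller promises idx < 4; this states the assumption
                                   * explicitly (and UBSan can trap it) instead of leaving
                                   * it implicit. */
                                  __builtin_unreachable();
                              }
                          }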

                        1. 1

                          The UB is already there, both for correct and incorrect programs.

                          Unless one is purposefully using a specific compiler (or set thereof) that actually defines the behaviour the standard didn’t, the program is incorrect. That it just happens to generate correct object code with this particular version of that particular compiler on those particular platforms is just dumb luck.

                          Thus, I’d argue that tools like MSan, ASan, and UBSan don’t introduce any complexity at all. They just reveal the complexity of UB that was already there, and they do so reliably enough that they actually relieve me of some of the mental burden I previously had to shoulder.

                      2. 5

                        languages like Python allow for arbitrary code execution during module loading, including compile-time metaprogramming.

                        Python doesn’t allow compile-time metaprogramming for any reasonable definition of the word. Everything happens and is introspectable at runtime, which allows you to do similar things, but it’s not compile-time metaprogramming.

                        One way to see this is that sys.argv is always available when executing Python code. (Python “compiles” byte code, but that’s an implementation detail unrelated to the semantics of the language.)

                        On the other hand, Zig and RPython are staged. There is one stage that does not have access to argv (compile time), and another one that does (runtime).

                        Related to the comment about RPython I linked here:

                        http://www.oilshell.org/blog/2021/04/build-ci-comments.html

                        https://old.reddit.com/r/ProgrammingLanguages/comments/mlflqb/is_this_already_a_thing_interpreter_and_compiler/gtmbno8/

                        1. 4

                          Yours is a rather unconventional definition of complexity.

                          1. 5

                            I am following the classic paper, “Out of the Tar Pit”, which in turn follows Brooks. In “Abstractive Power”, Shutt distinguishes complexity from expressiveness and abstractedness while relating all three.

                            We could always simply go back to computational complexity, but that doesn’t capture the usage in this thread. Edit for clarity: Computational complexity is a property of problems and algorithms, not a property of languages nor programming systems.

                            1. 3

                              Good faith question: I just skimmed the first ~10 pages of “Out of the Tar Pit” again, but was unable to find the definition that you allude to, which would exclude things like the comptime keyword from the meaning of “complexity”. Can you point me to it or otherwise clarify?

                              1. 4

                                Sure. I’m being explicit for posterity, but I’m not trying to be rude in my reply. First, the relevant parts of the paper; then, the relevance to comptime.

                                On p1, complexity is defined as the tendency of “large systems [to be] hard to understand”. Unpacking their em-dash and subjecting “large” to the heap paradox, we might imagine that complexity is the amount of information (bits) required to describe a system in full detail, with larger systems requiring more information. (I don’t really know what “understanding” is, so I’m not quite happy with “hard to understand” as a concrete definition.) Maybe we should call this “Brooks complexity”.

                                On p6, state is a cause of complexity. But comptime does not introduce state compared to an equivalent non-staged approach. On p8, control-flow is a cause of complexity. But comptime does not introduce new control-flow constructs. One could argue that comptime requires extra attention to order of evaluation, but again, an equivalent program would have the same order of evaluation at runtime.

                                On p10, “sheer code volume” is a cause of complexity, and on this point, I fully admit that I was in error; comptime is a long symbol, adding size to source code. In this particular sense, comptime adds Brooks complexity.

                                Finally, on a tangent to the top-level article, p12 explains that “power corrupts”:

                                [I]n the absence of language-enforced guarantees (…) mistakes (and abuses) will happen. This is the reason that garbage collection is good — the power of manual memory management is removed. … The bottom line is that the more powerful a language (i.e. the more that is possible within the language), the harder it is to understand systems constructed in it.

                                comptime and similar metaprogramming tools don’t make anything newly possible. It’s an annotation to the compiler to emit specialized code for the same computational result. As such, they arguably don’t add Brooks complexity. I think that this argument also works for inline, but not for @compileError.

                    2. 18

                      My understanding is that the lack of generics and comptime is exactly the differentiating factor here – the project aims at simplicity, and generics/compile time evaluations are enormous cost centers in terms of complexity.

                      Yeah, I can see that. But under what conditions would I care how small, big, or icecream-covered the compiler is? Building/bootstrapping for a new platform is a one-time thing, but writing code in the language isn’t. I want the language to make it as easy as possible on me when I’m using it, and omitting features that have been around since the 1990s isn’t helping.

                      1. 8

                        Depends on your values! I personally see how, eg, generics entice users to write overly complicated code which I then have to deal with as a consumer of libraries. I am not sure that not having generics solves this problem, but I am fairly certain that the problem exists, and that some kind of solution would be helpful!

                        1. 3

                          In some situations, emitted code size matters a lot (and with generics, that can quickly grow out of hand without you realizing it).

                          1. 13

                            In some situations

                            I see what you mean, but I think in those situations it’s not too hard to, you know, refrain from using generics. I see no reason to force all language users to not use that feature. Unless Hare is specifically aiming for that niche, which I don’t think it is.

                            1. 4

                              There are very few languages that let you switch between monomorphisation and dynamic dispatch as a compile-time flag, right? So if you have dependencies, you’ve already had the choice forced on you.

                              1. 6

                                If you don’t like how a library is implemented, then don’t use it.

                                1. 2

                                  Ah, the illusion of choice.

                        2. 10

                          Where is the dividing line? What makes functions “not complex” but generics, which are literally functions evaluated at compile time, “complex”?

                          1. 14

                            I don’t know where the line is, but I am pretty sure that this is past that :D

                            https://github.com/diesel-rs/diesel/blob/master/diesel_cli/src/infer_schema_internals/information_schema.rs#L146-L210

                            1. 17

                              Sure, that’s complicated. However:

                              1. that’s the inside of the inside of a library modeling a very complex domain. Complexity needs to live somewhere, and I am not convinced that complexity that is abstracted away and provides value is a bad thing, as much of the “let’s go back to simpler times” discourse seems to imply. I’d rather someone take the time to solve something once than have me solve it every time, even if with simpler code.

                              2. Is this just complex, or is it actually doing more than the equivalent in other languages? Rust allows for expressing constraints that are not easily (or at all) expressible in other languages, and static types allow for expressing more constraints than dynamic types in general.

                              In sum, I’d reject a pull request with this type of code in an application, but don’t mind it at all in a library.

                              1. 4

                                that’s the inside of the inside of a library modeling a very complex domain. Complexity needs to live somewhere,

                                I find that’s rarely the case. It’s often possible to tweak the approach to a problem a little bit, in a way that allows you to simply omit huge swaths of complexity.

                                1. 3

                                  Possible, yes. Often? Not convinced. Practical? I am willing to bet some money that no.

                                  1. 7

                                    I’ve done it repeatedly, as well as seeing others do it. Occasionally, though admittedly rarely, reducing the size of the codebase by an order of magnitude while increasing the number of features.

                                    There’s a huge amount of code in most systems that’s dedicated to solving optional problems. Usually the unnecessary problems are imposed at the system design level, and changing the way the parts interface internally allows simple reuse of appropriate large-scale building blocks and subsystems, reduces the number of building blocks needed, and drops entire sections of translation and negotiation glue between layers.

                                    Complexity rarely needs to be somewhere – and where it does need to be, it’s often in the ad-hoc, problem-specific data structures that simplify the domain. A good data structure can act as a Laplace transform for the entire problem space of a program, even if it takes a few thousand lines to implement. It lets you take the problem, transform it to a space where the problem is easy to solve, and put it back directly.

                              2. 7

                                You can write complex code in any language, with any language feature. The fact that someone has written complex code in Rust with its macros has no bearing on the feature itself.

                                1. 2

                                  It’s the Rust culture that encourages things like this, not the fact that Rust has parametric polymorphism.

                                  1. 14

                                    I am not entirely convinced – to me, it seems there’s a high correlation between languages with parametric polymorphism and languages with a culture of hard-to-understand abstractions (Rust, C++, Scala, Haskell). Even in Java, parts that touch generics tend to require some mind-bending (producer extends, consumer super).

                                    I am curious how Go’s generic would turn out to be in practice!

                                    1. 8

                                      Obligatory reference for this: F# Designer Don Syme on the downsides of type-level programming

                                      I don’t want F# to be the kind of language where the most empowered person in the discord chat is the category theorist.

                                      It’s a good example of the culture and the language design being related.

                                      https://lobste.rs/s/pkmzlu/fsharp_designer_on_downsides_type_level

                                      https://old.reddit.com/r/ProgrammingLanguages/comments/placo6/don_syme_explains_the_downsides_of_type_classes/

                                      which I linked here: http://www.oilshell.org/blog/2022/03/backlog-arch.html

                            2. 3

                              In general, I feel like Hare just ends up being a Zig without comptime, or a Go without interfaces, generics, GC, or runtime. … I’d always use Zig or Rust instead of Hare or C.

                              What if you were on a platform unsupported by LLVM?

                              When I was trying out Plan 9, lack of LLVM support really hurt; a lot of good CLI tools these days are being written in Rust.

                              1. 15

                                Zig has rudimentary plan9 support, including a linker and native codegen (without LLVM). We’ll need more plan9 maintainers to step up if this is to become a robust target, but the groundwork has been laid.

                                Additionally, Zig has a C backend for those targets that only ship a proprietary C compiler fork and do not publish ISA details.

                                Finally, Zig has the ambitions to become the project that is forked and used as the proprietary compiler for esoteric systems. Although of course we would prefer for businesses to make their ISAs open source and publicly documented instead. Nevertheless, Zig’s MIT license does allow this use case.

                                1. 2

                                  I’ll be damned! That’s super impressive. I’ll look into Zig some more next time I’m on Plan 9.

                                2. 5

                                  I think that implies that your platform is essentially dead (I would like to program my Amiga in Rust or Swift or Zig, too) or so off-mainstream (MVS comes to mind) that those tools wouldn’t serve any purpose anyway because they’re too alien.

                                  1. 5

                                    Amiga in Rust or Swift or Zig, too)

                                    Good news: LLVM does support 68k, thanks in part to communities like the Amiga community. LLVM doesn’t like to include stuff unless there’s a sufficient maintainer base, so…

                                    MVS comes to mind

                                    Bad news: LLVM does support S/390. No idea if it’s just Linux or includes MVS.

                                    1. 1

                                      Good news: LLVM does support 68k

                                      Unfortunately, that doesn’t by itself mean that compilers (apart from clang) get ported, or that the platform gets added as part of a target triple. For instance, Plan 9 runs on platforms with LLVM support, yet isn’t supported by LLVM.

                                      Bad news: LLVM does support S/390.

                                      I should have written VMS instead.

                                      1. 1
                                    2. 2

                                      I won’t disagree with describing Plan 9 as off-mainstream ;) But I’d still like a console-based Signal client for that OS, and the best (only?) one I’ve found is written in Rust.

                                1. 4

                                  Zig will automatically generate both an async and sync function body from the same source-level definition? That’s incredibly awesome, obviously the correct language design decision, and something that I’ve been upset at other languages (Python, Rust) for getting incorrect for a long time.

                                  (why “obviously correct”? because forcing you to rewrite the same function twice, with the only difference being the presence of the “async” keyword, in order to use it in both an async and sync context, is literally forcing the programmer to do repetitive, automatable work that can and should be done by a computer, which is an automation machine, for no benefit to the human)

                                  Maybe Zig is worth trying to learn to fit into my slot for “non-managed-memory systems-programming language” - currently occupied by C, because Rust was too difficult for me to fit my mind around in the small amount of time that I previously had to try to learn it.

                                  1. 1

                                    It generates only one version of the function (async or sync), depending on whether the function does anything “async” or not (i.e. uses @frame(), does suspend, does await (which implicitly suspends), or calls an async function normally (which implicitly awaits)).

                                    When you’re switching between an async and synchronous version, changing only the keywords may mean you’re probably not taking advantage of either correctly. For example, spawning a bunch of async tasks and joining their results is fine and efficient; Doing the same with threads naively isn’t such a good idea even if it’s a one-to-one mapping. There’s actual, situational benefit in being explicit about what tools you’re using under what async/sync context.

                                  1. 14

                                    This seems to hit two (granted, unexpected) cases of zig async:

                                    • The async-ness of a function which implicitly becomes async (i.e. without callconv(.Async) or similar) is not exposed at comptime.
                                    • Multiple functions that have the same signature can coerce to the same function pointer, but not the same async function, as the compiler-generated @Frame() state machines are different. Think &dyn Future<Output=T> vs impl Future<Output=T> in Rust, but without the vtable in the dyn version.

                                    Misunderstanding the cause of these two conditions seemed to lead them down the path that the functions themselves (not pointers to them) are colorful w.r.t invocation, which isn’t always the case. It can be the case at the root levels where concurrency is done explicitly (e.g. _start to async or clone() to async), but those aren’t the conditions pointed out by the post, nor do they need to be exposed to most zig async users as mentioned in Loris’ post.

                                    Instead of inferring the properties of zig async functions, the misunderstanding (and conclusions that came out of it) could have been avoided by getting clarification from the zig community or through the project’s github issues. Not the best end-goal answer, but it’s the one that’s the most effective anecdotally at the moment.

                                    1. 6

                                      libgit2 is on c89, but that’s mostly because it has to support so many platforms and c89 is the easiest common language for all these compilers.

                                      See what linux can do by not supporting the Microsoft C compiler?

                                      1. 14

                                        Writing a kernel in any C standard below C11 is a terrible idea. C11 was the first C standard to introduce atomics. Prior to that, atomics were fudged with a mixture of inline assembly and volatile. The guarantee that you get from volatile is not sufficient for atomics (only that loads and stores will not be elided and that accesses to the same object will not be reordered with respect to each other). Since atomics have existed for over a decade and anyone who cares about parallel execution has had a long time to migrate their code, compilers are becoming increasingly willing to optimise volatile.
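
                                        To illustrate the difference (a sketch, not kernel code): volatile keeps the flag accesses from being elided but says nothing about ordering between threads, whereas C11 atomics let you state it.

                                          /* Sketch: why volatile is not an atomics substitute. */
                                          #include <stdatomic.h>
                                          #include <stdbool.h>

                                          static int payload;              /* written before the flag is set */
                                          static volatile bool ready_vol;  /* shown only for contrast */
                                          static _Atomic bool ready;       /* C11 atomic flag */

                                          void publish_broken(int value) {
                                              payload = value;
                                              ready_vol = true;  /* compiler/CPU may reorder this before payload */
                                          }

                                          void publish(int value) {
                                              payload = value;
                                              /* Release ordering forbids moving the payload store past this. */
                                              atomic_store_explicit(&ready, true, memory_order_release);
                                          }

                                          int consume(void) {
                                              while (!atomic_load_explicit(&ready, memory_order_acquire))
                                                  ;                        /* spin until published */
                                              return payload;              /* guaranteed to see the value */
                                          }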

                                        FreeBSD is in a much better position to make this switch because the kernel’s memory model is an acquire-release model that maps directly to C11 (and to atomic instructions in modern ISAs), whereas the Linux kernel memory model is a set of barriers based on the ones in the Alpha. I was able to compile the FreeBSD kernel in C11 mode with the kernel’s atomics.h reimplemented using C11 stdatomic.h functions a few years ago (it compiles in gnu99 mode by default today).

                                        The motivating example in this is another reminder to me of why I would not write a kernel in C today. You’re trying to write C++ iterators in C. If you did it in C++, you’d have type-safe code and the compiler already includes a load of lifetime warnings. If you did it in Rust, then the lifetime warnings are part of the type system. C++11 and later are good fits for a modern kernel: they have well-defined atomics, a memory model that works well with modern hardware, smart pointers for tracking ownership, type-safe generic data structures, and so on, all of which C lacks. Rust needs you to use unsafe for a lot of the code and the static analysis tooling for unsafe Rust is in a worse state than for unsafe C++, but outside of a core set of kernel routines you can probably stick to safe Rust and be in a better position.

                                        1. 8

                                          I’m no expert, but if I understand correctly the reason why Linux does not use C11 atomics is not just “legacy code”, but also the fact that C11 atomics are not expressive enough to cover all of Linux’ use-cases: keeping existing Linux ordering guarantees would require “too strong” C11 ordering that would noticeably hurt performance. (For most other projects I would say that this is crazy, it’s already hard enough to avoid UBs when reasoning with C11 atomics, going to something more fine-grained brings the certainty of getting things wrong. But then Linux has enough leverage to get hardware engineers looking at its synchronization code, so maybe this is reasonable?) Do other kernels have better luck mapping their own memory model to the C11 memory model after the fact?

                                          (In OCaml we have the interesting problem that the OCaml Multicore memory model does not coincide with any of the C11 memory orderings – it is more relaxed in some aspects, less in others, and it has less undefined behaviors. As a consequence, it’s not clear which C11 memory ordering to use when implementing part of the OCaml runtime system in C. The problem is not completely dissimilar to the Linux situation, as noticed by Guillaume Munch-Maccagnoni.)

                                          1. 1

                                            I’m no expert, but if I understand correctly the reason why Linux does not use C11 atomics is not just “legacy code”, but also the fact that C11 atomics are not expressive enough to cover all of Linux’ use-cases:

                                            This may be true, but I’d be very surprised if it were. A number of core idioms from the Linux kernel were part of the requirements for the atomics working group.

                                            1. 2

                                              They could be referencing “Data/Control dependency” barriers from https://www.kernel.org/doc/Documentation/memory-barriers.txt

                                              The C11 memory orderings have no such thing at the moment. It should have been memory_order_consume, but it can’t be upheld under certain optimizations which break control-flow and data-flow chains at codegen. Instead, memory_order_acquire is required, which has unnecessary happens-before barrier costs on weakly ordered architectures.

                                              Linux also uses things like seqlocks, in which accessing the protected data soundly isn’t obvious under C11. This can be a bit restrictive, as it often breaks the ability to do things like racily memcpy. LLVM introduces the Unordered memory ordering, which is still atomic but doesn’t require a total order on the atomic variable the way relaxed does, making it easier to optimize.

                                              It’s unfortunate, but I don’t think that there’s alternatives to these in the C11 memory model for achieving similar codegen without compiler-specific extensions (i.e. atomic intrinsics or inline assembly).

                                              1. 1

                                                Thanks!

                                                • Indeed the compiler-side issues with memory_order_consume were among the things I vaguely remember from Linux-atomics discussions, so your more informed explanation is certainly in line with what I had in mind.

                                                • I didn’t know about Unordered, and it looks useful to describe the behavior of non-atomic variables in the OCaml memory model. (The model is more defined than C11 non-atomics, as it gives a semantics to races, but it is more relaxed than “relaxed” C11 atomics.) This is probably no coincidence as I read that Unordered was introduced to represent parts of the Java memory model. Sounds useful. Thanks!

                                                (But: I’m not familiar with LLVM, and I don’t see an easy way to access those intrisics from a C program using Clang. Searching the web suggests that to use LLVM intrisics in a language runtime written in C, we would need a separate file with LLVM IR)

                                          2. 4

                                            wonder why it took 40 years to add atomics to a language designed for writing operating systems. did atomics become more important at some point?

                                            1. 10

                                              Before widespread use of multiple cores and multithreading I guess they were a lot less important.

                                              1. 1

                                                it’s not confined to multiple cores and threading - it is reordering (out of order execution). On the one hand you have primitives (like memory-mapped IO) with implicit or explicit rules on ordering where some load-store operations MUST happen in a certain sequence, and the language lacked the ability to model that.

                                                *a = 1
                                                *b = 2
                                                c = *a + *b
                                                

                                                Am I allowed to actually set b before a? Can I reduce c to a constant 3? Within the language, “it depends” (absent the side-effects verbiage in the standard). Older C versions permitted volatile, but that only solves part of the equation: not ordering or atomicity, just that the operation has to happen. Before C11 you had to resort to barriers and other compiler extensions.

                                                1. 4

                                                  it’s not confined to multiple cores and threading - it is reordering (out of order execution).

                                                  There are two kinds of reordering that matter, reordering by the CPU and reordering by the compiler. Reordering by the CPU matters only in multicore systems. On a single core, out-of-order execution is not observable by the running process. Interrupts result in things in store buffers being either committed or discarded and so if you context switch to another thread then it will see the same sequence of operations as appeared in the instruction stream.

                                                  Prior to C11, threads were outside the C abstract machine entirely and so the compiler was free to do any reordering that was not observable to the (one) thread. In C11, the language requires valid source programs to exhibit ‘data-race freedom’, which means that it is able to reorder anything that does not involve atomics or explicit fences, and it is undefined behaviour if the program is able to observe this. Both regimes permit defining c in your example as 3 (assuming that they can prove that a and b don’t alias; if they do alias it has to be 4, and if they might alias then the compiler is still free to assume that c must be either 3 or 4).

                                                  The thing that makes the C++11 memory model interesting is the set of things that it permits, rather than the set of things that it prevents. In GNU C89 (which the Linux kernel uses), you need to use a big hammer to prevent compiler reorderings: an inline assembly block with the "memory" clobber, which tells the compilers ‘memory changes in some unspecified way here, ensure that nothing moves across this point’. With the C++11 memory model, the compiler is allowed to move some operations (either loads or stores) in one direction (either forwards or backwards) across certain kinds of atomic. This exposes more optimisation opportunities to the implementation.

                                                  C++11 defines two kinds of fence: signal fences and thread fences. A signal fence defines synchronisation with respect to signals, which practically means that it restricts what a compiler can reorder. A thread fence defines synchronisation with respect to threads and so restricts what the combination of the compiler and CPU can reorder (and therefore requires the compiler to insert explicit barrier instructions on weakly ordered architectures). The Linux memory model has only the latter kind.
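
                                                  In C11 spellings, the two fence flavours look roughly like this (illustrative sketch):

                                                    /* A signal fence only constrains the compiler (useful against a
                                                     * signal handler on the same thread); a thread fence also
                                                     * constrains the CPU, so it may emit a barrier instruction on
                                                     * weakly ordered architectures. */
                                                    #include <stdatomic.h>

                                                    void compiler_only_barrier(void) {
                                                        atomic_signal_fence(memory_order_seq_cst);  /* typically no instruction */
                                                    }

                                                    void full_thread_barrier(void) {
                                                        atomic_thread_fence(memory_order_seq_cst);  /* e.g. a dmb on Arm */
                                                    }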

                                                  1. 1

                                                    did “reordering” become more important at some point due to advances in technology?

                                            2. 1

                                              I mean, I’m also part of the reason why libgit2 is strict with C89 😬

                                              1. 3

                                                  So what platforms hold it at that level? How much non-retro-enthusiast stuff would break if it became C11 now?

                                                1. 1

                                                  Just a little thing called windows I guess, could be wrong though.

                                                  1. 2
                                            1. 2

                                              The opening sort of intertwines simple concurrent dispatch into the definition of “structured concurrency”. The latter isn’t about how easy it is to make things parallel, but instead having constructs which automatically wait for concurrently dispatched work to complete in a structured manner.

                                              1. 1

                                                Great point, I focused on the benefit but didn’t really include that kind of definition. Adding now, thanks!

                                              1. 1

                                                If you’re looking for the answer to the headline question, scroll about halfway down until you see the term in a section heading. Yeesh.

                                                1. 8

                                                  It’s not a very good explanation. For example, it starts with:

                                                  Before getting into futex, we need to know which problem it solves: race condition

                                                  Futexes do nothing to prevent race conditions. They are a performance optimisation. You could remove every futex call in a program and replace it with a spin loop. It would then get a lot slower because threads would wait until their quantum is exhausted before yielding and so the thread that could make progress would not get CPU time until later.

                                                  You can build a lot of synchronisation primitives on top of futex. Anything that uses a compare-and-swap in its spin path can be converted into a sleeping lock with futex, so it can implement semaphores, mutexes, condition variables, and barriers quite easily. For example, you can use a word to represent the value of a counting semaphore. On the fast path, you read the value, if it’s greater than zero, compare-and-swap to decrement it and a simple atomic increment to increment it. You need to sleep if the value is already 0, so on that path you do a futex call with the wait operation. The kernel will internally acquire a lock, check that the value is still 0, and sleep the calling thread if it is, until there’s a corresponding wake call on the same address. When you increment, if you increment from 0 then there may be sleepers and so you do the futex wake operation to wake one or more of them up. You can optimise this by tracking waiters.
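
                                                    A rough sketch of that semaphore (Linux-specific and illustrative only; it takes the simple route of waking every sleeper when posting from zero, which the replies below discuss):

                                                      #define _GNU_SOURCE
                                                      #include <limits.h>
                                                      #include <linux/futex.h>
                                                      #include <stdatomic.h>
                                                      #include <stdint.h>
                                                      #include <sys/syscall.h>
                                                      #include <unistd.h>

                                                      void futex_wait(_Atomic uint32_t *addr, uint32_t expected) {
                                                          /* Sleeps only if *addr still equals expected; may return spuriously. */
                                                          syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
                                                      }

                                                      void futex_wake(_Atomic uint32_t *addr, int how_many) {
                                                          syscall(SYS_futex, addr, FUTEX_WAKE, how_many, NULL, NULL, 0);
                                                      }

                                                      void sem_wait_sketch(_Atomic uint32_t *sem) {
                                                          for (;;) {
                                                              uint32_t v = atomic_load(sem);
                                                              if (v > 0) {
                                                                  /* Fast path: grab a token without entering the kernel. */
                                                                  if (atomic_compare_exchange_weak(sem, &v, v - 1))
                                                                      return;
                                                              } else {
                                                                  /* Slow path: sleep until a post bumps the value from zero. */
                                                                  futex_wait(sem, 0);
                                                              }
                                                          }
                                                      }

                                                      void sem_post_sketch(_Atomic uint32_t *sem) {
                                                          if (atomic_fetch_add(sem, 1) == 0) {
                                                              /* There may be sleepers; wake them all and let them race. */
                                                              futex_wake(sem, INT_MAX);
                                                          }
                                                      }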

                                                    Similarly, you can implement a barrier by using two words. One that is a counter of threads that haven’t yet arrived at the barrier, the other that is the number of times the barrier has been triggered. When you reach the barrier, you first read the second word and then atomically decrement the first word. If the first word isn’t zero, then you do a futex-wait on the second word and wake up when it changes. If the first word has reached zero then you are the last thread to reach the barrier, so you increment the second word and do a futex wake op on it to wake everyone else up. It doesn’t matter if the second word overflows, as long as the value changes every time all threads rendezvous at the barrier.
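
                                                    And a similar sketch of that two-word barrier (reusing the futex helpers from the semaphore sketch above; resetting the arrival counter for the next round is an added detail not spelled out above):

                                                      #include <limits.h>
                                                      #include <stdatomic.h>
                                                      #include <stdint.h>

                                                      #define NTHREADS 4u

                                                      struct barrier {
                                                          _Atomic uint32_t remaining;   /* threads not yet arrived */
                                                          _Atomic uint32_t generation;  /* bumped each time the barrier trips */
                                                      };

                                                      /* Defined in the semaphore sketch above. */
                                                      void futex_wait(_Atomic uint32_t *addr, uint32_t expected);
                                                      void futex_wake(_Atomic uint32_t *addr, int how_many);

                                                      void barrier_wait(struct barrier *b) {
                                                          uint32_t gen = atomic_load(&b->generation);
                                                          if (atomic_fetch_sub(&b->remaining, 1) == 1) {
                                                              /* Last to arrive: rearm the counter, bump the generation
                                                               * (overflow is fine, only the change matters), wake everyone. */
                                                              atomic_store(&b->remaining, NTHREADS);
                                                              atomic_fetch_add(&b->generation, 1);
                                                              futex_wake(&b->generation, INT_MAX);
                                                          } else {
                                                              /* Wait for the generation to change; re-check after spurious wakes. */
                                                              while (atomic_load(&b->generation) == gen)
                                                                  futex_wait(&b->generation, gen);
                                                          }
                                                      }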

                                                    The futex call supports a whole bunch of exciting things, for example bitfield operations allowing you to treat a 32-bit word as 32 separate mutexes and acquire or release any subset in a single atomic op on the fast path and block waiting for up to 32 of them.

                                                  I prefer the FreeBSD version, _umtx_op in general because it has a richer set of data structures to operate on, each optimised for a common concurrency primitive. In particular, it has a much better way of handling timeouts (at least for the _sem2 variant). Both futex and _umtx_op are allowed to spuriously wake and if you need to retry then you need to know how much of your timeout has elapsed. _umtx_op helpfully provides the remaining time as an output so that you can just use that on your next loop iteration.

                                                  1. 1

                                                      Slight nitpick, but for the semaphore example, it would need to wake all the waiters on a post observing zero, otherwise it risks deadlocking:

                                                    • 3 threads see 0 and wait
                                                    • a post happens, observed 0, wakes one thread
                                                    • two more post happen and don’t observe 0
                                                    • 1 of the 3 waiting threads wakes up and decrements the value to 2 since it’s now 3
                                                    • the value is now 2 (non zero) but the other 2 threads are still waiting with nothing to wake them up.
                                                    1. 1

                                                      Yes, sorry, I was lax in my language to the point of incomprehensibility. This is what I meant by ‘wake one or more up’ (FUTEX_OP_WAKE with INT_MAX) but reading it again I would have interpreted it as FUTEX_OP_WAKE with 1 or some other arbitrary number. Note that this is not intended to be a good implementation of a semaphore (all of the waiting threads will wake up and then all except one will immediately sleep), it’s just the simplest.

                                                1. 26

                                                  I find the complaints about Go sort of tedious. What is the difference between using go vet to statically catch errors and using the compiler to statically catch errors? For some reason, the author finds the first unacceptable but the second laudable, but practically speaking, why would I care? I write Go programs by stubbing stuff to the point that I can write a test, and then the tests automatically invoke both the compiler and go vet. Whether an error is caught by the one or the other is of theoretical interest only.

                                                  Also, the premise of the article is that the compiler rejecting programs is good, but then the author complains that the compiler rejects programs that confuse uint64 with int.

                                                  In general, the article is good and informative, but the anti-Go commentary is pretty tedious. The author is actually fairly kind to JavaScript (which is good!), but doesn’t have the same sense of “these design decisions make sense for a particular niche” when it comes to Go.

                                                  1. 35

                                                    What is the difference between using go vet to statically catch errors and using the compiler to statically catch errors?

                                                      A big part of our recommendation of Rust over modern C++ for security boiled down to one simple thing: it is incredibly easy to persuade developers to not commit (or, failing that, to quickly revert) code that does not compile. It is much harder to persuade them to not commit code that static analysis tooling tells them is wrong. It’s easy for a programmer to say ‘this is a false positive, I’m just going to silence the warning’; it’s very difficult to patch the compiler to accept code that doesn’t type check.

                                                    1. 34

                                                      What is the difference between using go vet to statically catch errors and using the compiler to statically catch errors?

                                                        One is optional, the other one is in your face. It’s similar to the C situation. You have asan, ubsan, valgrind, fuzzers, libcheck, pvs and many other things which raise the quality of C code significantly when used on every compilation or even commit. Yet, if I choose a C project at random, I’d bet none of those are used. We’ll be lucky if there are any tests as well.

                                                      Being an optional addition that you need to spend time to engage with makes a huge difference in how often the tool is used. Even if it’s just one command.

                                                      (According to the docs only a subset of the vet suite is used when running “go test”, not all of them - “high-confidence subset”)

                                                      1. 15

                                                          When go vet automatically runs on go test, it’s hard to call it optional. I don’t even know how to turn it off unless I dig into the documentation, and I’ve been doing Go for 12+ years now. Technically gofmt is optional too, yet it’s as pervasive as it can be in the Go ecosystem. Tooling ergonomics and conventions matter, as well as first party (go vet) vs 3rd party tooling (valgrind).

                                                        1. 21

                                                          That means people who don’t have tests need to run it explicitly. I know we should have tests - but many projects don’t and that means they have to run vet explicitly and in practice they just miss out on the warnings.

                                                          1. 2

                                                            Even in projects where I don’t have tests, I still run go test ./... when I want to check if the code compiles. If I used go build I would have an executable that I would need to throw away. Being lazy, I do go test instead.

                                                        2. 13

                                                          Separating the vet checks from the compilation procedure exempts those checks from Go’s compatibility promise, so they could evolve over time without breaking compilation of existing code. New vet checks have been introduced in almost every Go release.

                                                          Compiler warnings are handy when you’re compiling a program on your own computer. But when you’re developing a more complex project, the compilation is more likely to happen in a remote CI environment and making sure that all the warnings are bubbled up is tedious and in practice usually overlooked. It is thus much simpler to just have separate workflows for compilation and (optional) checks. With compiler warnings you can certainly have a workflow that does -Werror; but once you treat CI to be as important as local development, the separate-workflow design is the simpler one - especially considering that most checks don’t need to perform a full compilation and is much faster that way.

                                                          Being an optional addition that you need to spend time to engage with makes a huge difference in how often the tool is used. Even if it’s just one command.

                                                          I feel that the Go team cares more about enabling organizational processes, rather than encouraging individual habits. The norm for well-run Go projects is definitely to have vet checks (and likely more optional linting, like staticcheck) as part of CI, so that’s perhaps good enough (for the Go team).

                                                          All of this is quite consistent with Go’s design goal of facilitating maintenance of large codebases.

                                                          1. 5

                                                            Subjecting warnings to compatibility guarantees is something that C is coming to regret (prior discussion).

                                                            And for a language with as… let’s politely call it opinionated a stance as Go, it feels a bit odd to take the approach of “oh yeah, tons of unsafe things you shouldn’t do, oh well, up to you to figure out how to catch them and if you don’t we’ll just say it was your fault for running your project badly”.

                                                          2. 4

                                                            The difference is one language brings the auditing into the tooling. In C, it’s all strapped on from outside.

                                                            1. 19

                                                              Yeah, “similar” is doing some heavy lifting there. The scale is more like: default - included - separate - missing. But I stand by my position - Rust is more to the left the than Go and that’s a better place to be. The less friction, the more likely people will notice/fix issues.

                                                            2. 2

                                                              I’ll be honest, I get this complaint about it being an extra command to run, but I haven’t ever run go vet explicitly because I use gopls. Maybe I’m in a small subset going the LSP route, but as far as I can tell gopls by default has good overlap with go vet.

                                                              But I tend to use LSPs whenever they’re available for the language I’m using. I’ve been pretty impressed with rust-analyzer too.

                                                            3. 12

                                                              On the thing about maps not being goroutine safe, it would be weird for the spec to specify that maps are unsafe. Everything is unsafe except for channels, mutxes, and atomics. It’s the TL;DR at the top of the memory model: https://go.dev/ref/mem

                                                              1. 6

                                                                Agreed. Whenever people complain about the Rust community being toxic, this author is who I think they’re referring to. These posts are flame bait and do a disservice to the Rust community. They’re like the tabloid news of programming, focusing on the titillating bits that inflame division.

                                                                1. 5

                                                                  I don’t know if I would use the word “toxic” which is very loaded, but just to complain a little more :-) this passage:

                                                                    go log.Println(http.ListenAndServe("localhost:6060", nil))
                                                                  

                                                                  Jeeze, I keep making so many mistakes with such a simple language, I must really be dense or something.

                                                                  Let’s see… ah! We have to wrap it all in a closure, otherwise it waits for http.ListenAndServe to return, so it can then spawn log.Println on its own goroutine.

                                                                   go func() {
                                                                       log.Println(http.ListenAndServe("localhost:6060", nil))
                                                                   }()
                                                                  

                                                                  There are approximately 10,000 things in Rust that are subtler than this. Yes, it’s an easy mistake to make as a newcomer to Go. No, it doesn’t reflect even the slightest shortcoming in the language. It’s a very simple design: the go statement takes a function and its arguments. The arguments are evaluated in the current goroutine. Once evaluated, a new goroutine is created with the evaluated parameters passed into the function. Yes, that is slightly subtler than just evaluating the whole line in a new goroutine, but if you think about it for one second, you realize that evaluating the whole line in a new goroutine would be a race condition nightmare and no one would actually want it to work like that.

                                                                  Like, I get it, it sucks that you made this mistake when you were working in a language you don’t normally use, but there’s no need for sarcasm or negativity. This is in fact a very “simple” design, and you just made a mistake because even simple things actually need to be learned before you can do them correctly.

                                                                  1. 3

                                                                    In practice, about 99% of uses of the go keyword are in the form go func() {}(). Maybe we should optimize for the more common case?

                                                                    1. 1

                                                                      I did a search of my code repo, and it was ⅔ go func() {}(), so you’re right that it’s the common case, but it’s not the 99% case.

                                                                    2. 2

                                                                      I agree that the article’s tone isn’t helpful. (Also, many of the things that the author finds questionable in Go can also be found in many other languages, so why pick on Go specifically?)

                                                                      But could you elaborate on this?

                                                                      evaluating the whole line in a new goroutine would be a race condition nightmare and no one would actually want it to work like that.

                                                                      IMO this is less surprising than what Go does. The beautiful thing about “the evaluation of the whole expression is deferred” is precisely that you don’t need to remember a more complicated arbitrary rule for deciding which subexpressions are deferred (all of them are!), and you don’t need ugly tricks like wrapping the whole expression in a closure which is then applied to the empty argument list.

                                                                      Go’s design makes sense in context, though. Go’s authors are culturally C programmers. In idiomatic C code, you don’t nest function calls within a single expression. Instead, you store the results of function calls into temporary variables and only then pass those variables to the next function call. Go’s design doesn’t cause problems if you don’t nest function calls.
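
                                                                      For what it’s worth, here is the earlier example rewritten in that unnested style (my rewrite, not the parent’s): the blocking call is visibly evaluated first, so there is nothing subtle left for the go statement to hide.

                                                                        package main

                                                                        import (
                                                                            "log"
                                                                            "net/http"
                                                                        )

                                                                        func main() {
                                                                            // Written without nesting, the earlier mistake is obvious: ListenAndServe
                                                                            // blocks right here, in the current goroutine, long before the go
                                                                            // statement is ever reached.
                                                                            err := http.ListenAndServe("localhost:6060", nil)
                                                                            go log.Println(err)
                                                                        }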

                                                                  2. 3

                                                                    At least they mention go vet, so even people like me who didn’t know about it can arrive at similar conclusions. And they also mention that they themselves are somewhat biased.

                                                                    But I think they should just calmly state, without ceremony like “And yet there are no compiler warnings”, that this is the compiler output and this is the output of go vet.

                                                                    This also seems unnecessary:

                                                                    Why we need to move it into a separate package to make that happen, or why the visibility of symbols is tied to the casing of their identifiers… your guess is as good as mine.

                                                                    Subjectively, this reads as unnecessarily dismissive. There are more instances similar to this, so I get why you are annoyed. It makes their often valid criticism weaker.

                                                                    I think it comes as a reaction to people claiming that golang is so simple, while in their (biased but true) experience it is full of little traps.

                                                                    Somewhat related: what I also dislike is that they use loops for creating the tasks in golang, discuss a resulting problem, and then don’t use loops in Rust - probably to keep the code simple.

                                                                    All in all, it is a good article though and mostly not ranty. I think we are setting the bar for fairness pretty high. I mean we are talking about a language fan…

                                                                    1. 5

                                                                      This also seems unnecessary: […]

                                                                      Agree. The frustrating thing here is that there are cases where Rust does something not obvious, the response is “If we look at the docs, we find the rationale: …” but when Go does something that is not obvious, “your guess is as good as mine.” Doesn’t feel like a very generous take.

                                                                      1. 6

                                                                        The author has years of Go experience. He doesn’t want to be generous; he has an axe to grind.

                                                                        1. 3

                                                                          So where are the relevant docs for why

                                                                          we need to move it into a separate package to make that happen

                                                                          or

                                                                          the visibility of symbols is tied to the casing of their identifiers

                                                                          1. 3

                                                                            we need to move it into a separate package to make that happen

                                                                            This is simply not true. I’m not sure why the author claims it is.

                                                                            the visibility of symbols is tied to the casing of their identifiers

                                                                            This is Go fundamental knowledge.

                                                                            1. 3

                                                                              This is Go fundamental knowledge.

                                                                              Yes, I’m talking about the rationale.

                                                                              1. 3

                                                                                https://go.dev/tour/basics/3

                                                                                In Go, a name is exported if it begins with a capital letter.

                                                                                1. 1

                                                                                  rationale, n.
                                                                                  a set of reasons or a logical basis for a course of action or belief

                                                                                2. 2

                                                                                  Why func and not fn? Why are declarations var identifier type and not var type identifier? It’s just a design decision, I think.

                                                                          2. 1

                                                                            The information is useful but the tone is unhelpful. The difference in what’s checked/checkable and what’s not is an important difference between these platforms – as is how tightly the correctness guarantees are integrated with the language definition. Although a static analysis tool for JavaScript could, theoretically, find all the bugs that rustc does, this is not really how things play out. The article demonstrates bugs which go vet cannot find but which are precluded by Rust’s language definition – that is real and substantive information.

                                                                            There is more to Go than just some design decisions that make sense for a particular niche. It has a peculiar, iconoclastic design. There are Go evangelists who, much more strenuously than this author and with much less foundation, criticize JavaScript, Python, Rust, &c, as not really good for anything. The author is too keen to poke fun at the Go design and philosophy; but the examples stand on their own.

                                                                          1. 2

                                                                            In the push(task) in the “Going lock-free…” section, there seems to be a reference to a t that is not defined anywhere?

                                                                            Also, I’m not sure I understand how the push is expected to work without overwriting another concurrent push; shouldn’t the tail be incremented first, and only then the task stored? or is there something else I’m missing? But maybe things will clear up once the issue of the missing t is fixed? edit: Ok, I get this part now, I forgot that this is a thread-local queue, where only Single Producer can add things. So no races between pushes.

                                                                            edit 2: As to the missing t, based on a fragment of the linked source repository, I assume t is intended to be basically equivalent to tail. Given that, again, only the Single Producer will ever write to it.

                                                                            1. 2

                                                                              Nice catch. Looks like the post is full of grammatical errors. Also, yes, the t is tail and can be loaded & stored without RMW synchronization given it’s single-producer (SPMC).
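
                                                                              For anyone following along, a rough sketch of that single-producer push, written in Go here for brevity (the names and the fixed size are mine, not the post’s): the producer writes the slot first and only then publishes the new tail, and since nothing else ever writes tail, no compare-and-swap is needed on it.

                                                                                package main

                                                                                import "sync/atomic"

                                                                                type task struct{ /* e.g. a function pointer and state */ }

                                                                                // queue is a bounded single-producer ring: only the owning thread calls push,
                                                                                // so tail is written from exactly one goroutine; stealers only advance head.
                                                                                type queue struct {
                                                                                    head   atomic.Uint64             // advanced by consumers (via CAS in a full SPMC design)
                                                                                    tail   atomic.Uint64             // advanced only by the single producer
                                                                                    buffer [256]atomic.Pointer[task] // slots published to consumers
                                                                                }

                                                                                func (q *queue) push(t *task) bool {
                                                                                    tail := q.tail.Load() // only we ever store tail, so no read-modify-write is needed
                                                                                    head := q.head.Load()
                                                                                    if tail-head == uint64(len(q.buffer)) {
                                                                                        return false // full; a real scheduler would spill into a shared overflow queue here
                                                                                    }
                                                                                    q.buffer[tail%uint64(len(q.buffer))].Store(t) // write the slot first...
                                                                                    q.tail.Store(tail + 1)                        // ...then publish it by bumping tail
                                                                                    return true
                                                                                }

                                                                                func main() {
                                                                                    q := &queue{}
                                                                                    _ = q.push(&task{})
                                                                                }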

                                                                            1. 1

                                                                              I really enjoyed this post @kprotty. Hopefully more like this is coming (as alluded to in the conclusion?)!

                                                                              1. 2

                                                                                There’s now a new one if you’re still interested

                                                                                1. 1

                                                                                  I was, and I read it… and it was good!

                                                                                  1. 1

                                                                                    Thanks!

                                                                              1. 53

                                                                                Here are two things I noticed about this benchmark:

                                                                                1. A minor issue, but the Go program has a data race. The i loop variable is captured in the spawned goroutines while it is mutated by the outer loop. Fixing it (a minimal fix is sketched at the end of this comment) doesn’t change the results of the program. It’s also immediately caught when built with the -race flag. Rust being data race free by construction is a super power.

                                                                                2. Changing the amount of spawned goroutines/threads to 50,000 has a noticeable difference between the two on my machine. The Go program still completes in a timely fashion with ~5x the rss and no negative impacts on the rest of my system:

                                                                                real 11.25s
                                                                                user 41.49s
                                                                                sys  0.70s
                                                                                rss  135136k
                                                                                

                                                                                but the Rust version immediately causes my system to come to a crawl. UIs stop responding, audio starts stuttering, and eventually it crashes:

                                                                                thread 'main' panicked at 'failed to spawn thread: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', src/libcore/result.rs:1189:5
                                                                                note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
                                                                                Command exited with non-zero status 101
                                                                                real 16.39s
                                                                                user 23.76s
                                                                                sys  98.50s
                                                                                rss  288132k
                                                                                

                                                                                So while it may not be significantly lighter than threads as far as this one measure is concerned, there is definitely some resource that threads are taking up that goroutines are not. I checked on some production monitoring systems to see how many active goroutines they had, and they were sitting comfortably in the 20-30k area. I wonder if they were using threads how they would fare.
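
                                                                                In case it’s useful, a minimal sketch of that race and the usual fix (my code, not the benchmark’s): pass i to the goroutine as an argument so each goroutine gets its own copy. (As of Go 1.22 the loop variable is per-iteration, which also makes the original form safe.)

                                                                                  package main

                                                                                  import (
                                                                                      "fmt"
                                                                                      "sync"
                                                                                  )

                                                                                  func main() {
                                                                                      var wg sync.WaitGroup
                                                                                      for i := 0; i < 10; i++ {
                                                                                          wg.Add(1)
                                                                                          go func(i int) { // i is passed by value, so no shared mutable loop variable
                                                                                              defer wg.Done()
                                                                                              fmt.Println(i)
                                                                                          }(i)
                                                                                      }
                                                                                      wg.Wait()
                                                                                  }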

                                                                                1. 8

                                                                                  Rust being data race free by construction is a super power.

                                                                                  Well, it also necessarily prohibits entire classes of useful and productive patterns… it was a design decision, with pros and cons, but certainly not strictly better.

                                                                                  1. 4

                                                                                    Strong +1. I wish, for application development, it were possible to remove lifetimes, the borrow checker, and manual memory management (which are not that useful for this domain) and to keep only fearless concurrency (which is useful even (more so?) in high-level languages). Alas, it seems that Rust’s thread-safety banana is tightly attached to the rest of the jungle.

                                                                                    1. 7

                                                                                      FWIW, Ponylang has exactly this:

                                                                                      • no lifetimes, data is either known statically via simple escape analysis or traced at runtime
                                                                                      • no borrow checker, although it does have a stricter reference capability system
                                                                                      • no manual memory management: uses a deterministically invoked (excluding shared, send data) Mark-and-no-sweep GC
                                                                                      • has the equivalent Send and Sync traits in its reference capability system which provide the same static guarantees

                                                                                      As with everything, though, it has its own trade-offs, whether in its ref-cap system, the lack of explicit control in the system, etc.

                                                                                  2. 4

                                                                                    To prevent that Rust error, I believe you need to change vm.max_map_count: https://github.com/rust-lang/rust/issues/78497#issuecomment-730055721

                                                                                    1. 4

                                                                                      That seems to me like the kind of thing Rust should do to some extent on its own!

                                                                                    2. 3

                                                                                      I’d be interested to see how many threads ended up scheduled with the Go example. My guess is not many at all: IIRC the runtime recognizes those sleeps as a safe place to yield control from the goroutines. So I suspect you end up packing them densely onto threads.