1. 28
    1. 10

      It’s great that the post touched on how the claim of threads being “heavy weight” may not be about memory usage but instead scheduling cost. It takes 20-50us to spawn a thread, whereas spawning a goroutine takes 1-20us. Scheduling a thread takes around 5-10us, while scheduling a goroutine takes anywhere from a few nanoseconds to 5(ish)us. These are rough numbers, but the cost of thread-related actions tends to just be less, with userspace scheduling making the kernel equivalents look heavier.
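
      For anyone who wants to reproduce the goroutine half of those numbers, here’s a minimal benchmark sketch (mine, not from the post); it measures spawn plus one synchronized completion rather than pure spawn, so treat it as an upper bound:

        package spawn

        import (
            "sync"
            "testing"
        )

        // Measures goroutine creation plus one synchronized completion;
        // run with `go test -bench=.` and read the ns/op column.
        func BenchmarkSpawn(b *testing.B) {
            var wg sync.WaitGroup
            for i := 0; i < b.N; i++ {
                wg.Add(1)
                go func() { wg.Done() }()
            }
            wg.Wait()
        }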

      Another thing that is only partially mentioned with non-blocking IO is that userspace can make more specialized scheduling decisions. When concurrency is high, the OS scheduler is likely to schedule the same task frequently for optimization and only sometimes let starved ones run, while Go has a focus on latency and can be more fair among ready tasks. Practically, this (along with context switching costs) means the p90 response times for a server handling 10k connections can be in the seconds (and even hit timeouts) with a thread per connection, while only milliseconds with a goroutine per connection.

      1. 6

        Javascript avoids this by essentially making everything async and non-blocking. Python & Rust have to juggle async/non-async separately.

        So when you combine goroutines and fully integrated non-blocking I/O that’s when you get strong multicore performance and a platform that can “cheaply” handle a large number of network connections while still avoiding “callback hell” or the “function coloring” problem. It’s not everyone’s desired tradeoff. They’ve made C interop more expensive, and calls that can’t be made non-blocking have to be done in a threadpool (just like node.js and DNS resolution).

        I don’t think it’s fair to say Go “avoided” the function colouring problem. They just put different syntactic sugar on “making everything async and non-blocking”.

        Gor Nishanov’s Fibers under the magnifying glass is basically showing how Go’s decision to use stackful coroutines was tried in the 90s and abandoned by everyone else because they couldn’t figure out how to make it perform “better enough” to justify the negative effects on in-process FFI.

        Rust and Python made the trade-off they did because their raison d’etre is to interop well with C, either to enable partial rewrites of C codebases or to act as a glue language.

        Heck, fasterthanlime’s “Lies we tell ourselves to keep using Golang” has a whole section named Go is an island talking about how much Go’s developers have doubled down on “you should rewrite the universe in Go for easy cross-compilation and deployment” and “the only good boundary with Go is a network boundary”, yet Rewrite it in Rust is the one everyone memes because Rust is the language that excites those with little project management experience.

        1. 12

          My co-workers make faces of horror and disgust when I tell them I want to import a package that uses CGo to wrap a C library. It’s like I’ve confessed to heresy.

          Did you know someone went to the insane effort of writing a C-to-Go transpiler, so they could have a SQLite binding that doesn’t use CGo?

          1. 5

            Ouch. Given how much of SQLite’s value proposition is the literal avionics-grade testing they apply to the C version to verify how it interacts with all sorts of OS edge-cases and hardware faults, I’d be wary of how the differing semantics of the Go execution model might undermine that.

            1. 2

              Execution is faithfully preserved; it has to be if the transpiler is to be coherent. The issue, if there is one, is performance.

              1. 5

                This is exactly backwards – anybody who has written a transpiler/compiler will tell you that :)

                It depends on the source language, target language, and the runtime interface

                If someone says, “I wrote a transpiler that’s correct, and therefore the translated sqlite library behaves exactly the same”, then I have a bridge to sell you

                The only way to have confidence in correctness is to actually run the sqlite tests against the Go version (and maybe they did that)

                In some cases that may be difficult as the tests run in-process, not in a separate process

                1. 2

                  […] to actually run the sqlite tests against the Go version (and maybe they did that)

                  In some cases that may be difficult as the tests run in-process, not in a separate process

                  Also, most of SQLite’s tests are proprietary, including the “avionics-grade” ones.

                2. 2

                  Execution is faithfully preserved under normal circumstances. The test suite for the original C version of SQLite simulates all sorts of abnormal cases where the abstractions presented by the programming language and its runtime are leaky.

                  (eg. The “Executive Summary” section of the How SQLite Is Tested page includes bullet points such as “Out-of-memory tests”, “I/O error tests”, and “Crash and power loss tests”… all of which could have semantics that vary subtly but significantly between the C and Go runtimes.)

                  I’m reminded of how… Bryan Cantrill, I believe… talked about how achieving compatibility with the subtle nuances of the intersection of Linux’s… vfork and SIGCHLD semantics, if I remember correctly, for the purposes of being able to run Linux binaries was a terrible experience.

                  1. 4

                    You do remember correctly! Details are described in: lx vfork and signal handling still racey. While this was a bug in our emulation, it was surprising the degree that Go relied upon the murky semantics at the intersection of these two leaky abstractions…

                    1. 3

                      Oh, hey! While you’re here, I really wanted to thank you for this:

                      And it’d be easy to be like “Well, that’s just B-Trees! …not AVL trees.” Yeah, but… the reason I could use a B-Tree and not an AVL tree, is because of that composability of Rust.

                      A B-Tree in C is gnarly. A B-Tree is just… I mean talk about intrusive. It’s VERY intrusive. It’s very kind of up in everything’s underwear. And someone, of course, on the Internet’s like “Well, you should go to use this B-Tree implementation.” You look at the B-Tree implementation and are like “This is rocky! And if there’s a memory corruption issue in here, I do not want to have to go find it.”

                      I would still use an AVL tree in C even though I know I’m giving up some small amount of performance, but in Rust, I get to use a B-Tree.

                      – Bryan Cantrill @ https://youtu.be/HgtRAbE1nBM?t=2450

                      It’s been so useful to have on hand to help support my arguments about the important difference between what performance a language is capable of and what a programmer is willing to do with it.

                      1. 2

                        Oh, that’s awesome to hear – and thank you for the kind words! If you haven’t already seen it, I expanded on the relative performance of C and Rust, which explains how I came to the conclusion that B-trees were providing much of the win of using Rust over C.

                        1. 1

                          Thanks. :)

                          I have an approach to blog-reading that’s very haphazard (stochastic, if you want to put it euphemistically), so I missed “The relative performance of C and Rust” despite having quoted other things like the “But the nothing that C provides reflects history more than minimalism; it is not an elegant nothing, but rather an ill-considered nothing” paragraph from “Rust after the honeymoon”.

                          Also, it looks like one of your WordPress plugins has gone awry, because every comment is topped with its own “Notice: get_the_author_email is deprecated since version 2.8! Use get_the_author_meta(‘email’) instead.” message.

              2. 2

                It’s similar to Java – JNI is frowned upon from what I remember

                And yeah it’s a big reason I’ve avoided Go – I’m supposed to throw out all my C / C++ knowledge and decades of libraries? The sqlite3 thing raised my eyebrows too

                It is a big shame, because I would like to use a higher level and faster language like Go or Java/Kotlin, but I stick with Python because there is always a “way out”

                Python has different problems, but I can always make it work somehow. With Go or Java it seems like I’ll find myself in a corner with a Catch-22 eventually


                I have a rant about the “monoglot anti-pattern” – language designers assuming that every system using the language will be written ONLY in that language. Everybody wants to “own the world”

                As a software system becomes more successful and gets to a certain size, the probability of that approaches zero rapidly

                I mean Go barely has GUI libraries from what I remember, so if you even add a GUI to your system, now you have this awkward boundary, and a code reuse problem

                1. 6

                  It’s similar to Java – JNI is frowned upon from what I remember

                  It made more sense for Java because pure Java code was binary portable to any platform with a JVM and was part of the same sandboxing as the rest of the Java code. In the case of Go, neither of those reasons hold.

                  1. 4

                    I switched from Python to Go precisely because Python rarely if ever offers a “way out”. People think “just rewrite the hot path in a fast language” will save their bacon, but they always forget that calling into C requires marshaling their data into a C-friendly structure, which often outweighs any performance gains. People also fail to think about the complexity it adds to your build system or how you debug into the C code (it can be done, but it’s tedious).

                    In every Python system I’ve worked in, we found ourselves boxed in because there are no good optimization options. Whereas with Go you can usually do a little refactoring of the naive implementation to consolidate allocations and you will often see something like a 10X improvement. Plus Go has shared memory parallelism, so you can go even further.
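
                    To illustrate the kind of refactor I mean, here’s a hypothetical sketch (function names are mine): count first, allocate once, then fill, instead of letting append regrow the buffer repeatedly:

                      package main

                      import "fmt"

                      // naive: append reallocates and copies as the buffer grows
                      func concatNaive(words []string) []byte {
                          var out []byte
                          for _, w := range words {
                              out = append(out, w...)
                          }
                          return out
                      }

                      // refactored: one up-front allocation sized from the inputs
                      func concatConsolidated(words []string) []byte {
                          n := 0
                          for _, w := range words {
                              n += len(w)
                          }
                          out := make([]byte, 0, n) // single allocation
                          for _, w := range words {
                              out = append(out, w...)
                          }
                          return out
                      }

                      func main() {
                          words := []string{"go", "routines", "are", "cheap"}
                          fmt.Println(string(concatNaive(words)) == string(concatConsolidated(words))) // true
                      }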

                    Go has bindings to various GUI toolkits, but yes, it’s a cross-language boundary, which is awkward in Go like it is in most languages. Even writing GTK in C or Qt in C++ is awkward because they make heavy use of conventions, metaprogramming, etc. that are unusual in their host languages. GUI toolkits are just hard in any language, unfortunately.

                    1. 1

                      Yeah I don’t doubt that experience

                      • fixing memory leaks in Python/C bindings isn’t fun
                      • finding where to release the GIL in Python/C bindings isn’t fun
                      • debugging SWIG-type crap in bindings isn’t fun

                      I have done all these things, and I’ve also been surprised when I managed to rewrite the hot loop of a Python program in C++, and it reduced memory usage by 10x and increased speed by 10x. That’s the theory, but it doesn’t work in many types of programs, for the reasons you mention

                        But I really meant it when I said “way out” … It’s more work in those situations, but you can always apply some more knowledge/effort to get it done

                      I AM very tempted by Go – it does have many appealing qualities. But for the problems I’ve worked on and am working on, the “closed world” property makes me very hesitant

                      Actually right now I have a choice between a (small) existing Go codebase, Python, and also looking at Elixir … genuinely not sure what direction I’ll go in

                    2. -2

                      I’m supposed to throw out all my C / C++ knowledge and decades of libraries?

                        Yep, and replace them with Go libraries, which are almost entirely of pretty low quality. It’s common to have very clumsy, unsafe interfaces, incomprehensibly bad error messages (usually just many levels of complex error strings glued together), no type safety, no escape hatches, and so on.

                        Whenever I touch Go, everything that would take 10 minutes in basically any other modern language is a 1 hour adventure and often ends in “and now you have to copy&paste half of this library into your project because they didn’t think of that”. And it never gets better. The purest and stickiest tarpit our industry has at the moment, imo.

                      1. 5

                        This seems misinformed. Go doesn’t have Rust’s type safety, but it certainly has more type safety than C and certainly not “none”. I also haven’t perceived any lower quality in Go code bases than other languages (and a fair bit higher than C or C++), of course this is all subjective.

                        Regarding error messages, Go’s approach isn’t my favorite, but the typical C approach is returning a single integer devoid of any context. It can map to a string like “file not found”, but you’re not going to get the salient context.
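
                          To make that concrete, here’s a minimal sketch (mine, standard library only): the “glued together” string is what you print, but with %w wrapping (Go 1.13+) the underlying cause stays programmatically visible, unlike a bare C error code:

                            package main

                            import (
                                "errors"
                                "fmt"
                                "io/fs"
                                "os"
                            )

                            func loadConfig(path string) error {
                                if _, err := os.ReadFile(path); err != nil {
                                    return fmt.Errorf("loading config %q: %w", path, err) // wrap, don't flatten
                                }
                                return nil
                            }

                            func main() {
                                err := loadConfig("/no/such/file.toml")
                                fmt.Println(err)                            // the many-levels-deep string form
                                fmt.Println(errors.Is(err, fs.ErrNotExist)) // true: the cause survives the gluing
                            }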

                        Regarding “no escape hatches”, Go lets you escape the type system either by using (essentially) dynamic typing or unsafe. I’m not sure what kind of “escape hatches” you’re expecting (particularly those from C/C++).
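
                          Concretely, a small sketch of the two hatches I mean, using nothing beyond the standard library:

                            package main

                            import (
                                "fmt"
                                "unsafe"
                            )

                            func main() {
                                // Escape hatch 1: dynamic typing via interface{} and type switches.
                                var v interface{} = 42
                                switch x := v.(type) {
                                case int:
                                    fmt.Println("an int:", x)
                                case string:
                                    fmt.Println("a string:", x)
                                }

                                // Escape hatch 2: unsafe, e.g. reinterpreting the bits of a float.
                                f := 1.0
                                bits := *(*uint64)(unsafe.Pointer(&f))
                                fmt.Printf("%#x\n", bits) // 0x3ff0000000000000
                            }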

                        Whenever I touch Go, everything that would take 10 minutes in basically any other modern language is a 1 hour adventure

                        This is the exact opposite of my experience. Go is my go-to language for “I need to get stuff done quickly in a way that others can understand”. It’s boring and it gets the job done. I use other languages for hobby stuff where I want to play with new ideas and I don’t necessarily care if it takes 10x longer to get the job done (I actually started using Go because I got tired of spending soooo much time scripting CMake or Gradle and I wanted something that just let me hit the ground running).

                        Anyway, this doesn’t invalidate your experience; just adding my datapoint.

                  2. 12

                    Go absolutely avoided the function coloring problem. Go functions are all the same color, which is synchronous. This fact motivates a few best practices, summarized by the idiom of “leave concurrency to the caller”.
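
                    For example, a minimal sketch of the idiom (names are illustrative): the library function stays synchronous, and the caller adds concurrency only if it wants it:

                      package main

                      import (
                          "fmt"
                          "sync"
                      )

                      // Fetch is deliberately synchronous: no goroutines, channels, or
                      // callbacks in its signature, so it has no "color".
                      func Fetch(id int) string {
                          return fmt.Sprintf("result-%d", id)
                      }

                      func main() {
                          var wg sync.WaitGroup
                          results := make([]string, 5)
                          for i := range results {
                              i := i // capture per iteration (pre-Go 1.22 loop semantics)
                              wg.Add(1)
                              go func() { // the caller chooses the concurrency
                                  defer wg.Done()
                                  results[i] = Fetch(i)
                              }()
                          }
                          wg.Wait()
                          fmt.Println(results)
                      }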

                    Go treats FFI as a second-class citizen, for sure. But very few programs make use of FFI.

                    1. 0

                      *nod* And a big part of the reason Pascal lost out to C is that it took so long to step out of the “That functionality would impede its suitability to be a safe teaching language” sandbox.

                      The function colouring problem exists in no small part because there was a time not too long ago when we hadn’t yet recognized how desirable asynchronous programming would become as a feature. Go had the benefit of hindsight and chose to willingly walk toward being “Pascal, but for microservices instead of teaching”.

                      (Thus, “the only good boundary with Go is a network boundary”.)

                      …and yes, retrofitting Pascal with things like arrays whose length isn’t part of the type signature, and strings that aren’t size-limited by a length field with a range below what available memory allows, is a function colouring-esque change that needs to be plumbed throughout a codebase if you chose incorrectly.

                    2. 11

                      I don’t think it’s fair to say Go “avoided” the function colouring problem. They just put different syntactic sugar on “making everything async and non-blocking”.

                      Surely it is fair? The problem is that the developer needs to know about function colours. In go, they don’t?

                      The fact that the go runtime may be able to reap the benefits of async I/O without bothering the developer is - imho - kind of the whole point of go.
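
                      As a minimal sketch of what that looks like (my example, not from the article): every call below reads as blocking, but the runtime parks the goroutine on its netpoller (epoll/kqueue) instead of pinning an OS thread:

                        package main

                        import (
                            "bufio"
                            "log"
                            "net"
                        )

                        func handle(conn net.Conn) {
                            defer conn.Close()
                            r := bufio.NewReader(conn)
                            for {
                                line, err := r.ReadString('\n') // looks blocking; parks on the netpoller
                                if err != nil {
                                    return
                                }
                                if _, err := conn.Write([]byte(line)); err != nil {
                                    return
                                }
                            }
                        }

                        func main() {
                            ln, err := net.Listen("tcp", "localhost:9000")
                            if err != nil {
                                log.Fatal(err)
                            }
                            for {
                                conn, err := ln.Accept() // also "blocking", also poller-backed
                                if err != nil {
                                    log.Fatal(err)
                                }
                                go handle(conn) // one cheap goroutine per connection
                            }
                        }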

                      1. 1

                        The reason I quoted what I did is because I think it’s unfair to say “Javascript avoids this by essentially making everything async and non-blocking.” and then not say the same thing about Go.

                        Sure, the syntactic sugar is different, but Go and Node.js take the same approach.

                        1. 8

                          Node does not, right? You have to color async functions with the “async” prefix sigil, and sync functions can’t call async functions without await (which is a bit safer than what python does, or does not do in this case), but you do color them.

                          1. 2

                            You don’t need async to implement “everything is async and non-blocking”. That’s why so many Node.js APIs take callbacks. They were doing “everything is async and non-blocking” before JavaScript gained the async syntax construct.

                            1. 6

                              If you’re saying “straight line golang code” is merely “syntactic sugar” compared to “callback hell” then I think you’re fairly far off into “all languages are the same” territory.

                              (If you’re not saying that, I’m misunderstanding you).

                              1. 1

                                I’m saying that the execution models embodied in what the standard library APIs allow you to do are the same, and that it weakens the argument to not acknowledge that one language has simply existed longer and had more time to accumulate multiple syntaxes that serve the same purpose.

                                1. 6

                                  Your initial point was that it wasn’t fair to say golang avoided the function-colouring problem.

                                  I understand that problem to be “you have two different - mostly non-interoperable - kinds of function, and the developer has the burden of choosing which to use each time”.

                                  Golang does not have this problem (due to fairly novel runtime support for calling blocking APIs and shuffling userspace threads across OS threads), so in my view it is pretty clear it did avoid this problem.

                                  1. 0

                                    And, when it was introduced, Node.js didn’t have that problem because the standard library went all-in on callback-based async APIs.

                                    To argue that Go doesn’t have that problem is like saying “Go doesn’t have the problem of a bolted-on generics system because Go doesn’t have generics”… oops. It’s a weak argument to say “This language is better because, thus far, upstream fiat has managed to keep it from growing bodged-on support for use-cases X and Y while the other language had many more years to get hammered into doing so.”

                                    1. 5

                                      The “function coloring problem” doesn’t refer to the different concurrency syntaxes in JS, it refers to one specific syntax (async/await) which has multiple colors within itself—sync functions and async functions.

                                      The criticism isn’t “JS has too much stuff bolted on”, it’s “JS requires you to choose between a function coloring problem (async/await) or callback hell” (while Go has neither).

                                      1. 2

                                        And, in a decade, I wouldn’t be surprised if Go were being criticized for having a “requires you to choose between” relationship with some feature Rust’s in the middle of popularizing.

                                        (eg. A garbage collector gets you memory-safety, and generics extends Go’s ability to give you type-safety, but I’ve already seen posts like Fixing the Next 10,000 Aliasing Bugs which ends with “Null checking was once unusual as well, but now, in this age of Kotlin and strictNullChecks TypeScript, nobody will take a language without null checking seriously. I think that someday, the same will be true of alias checking as well.”. If Go has to retrofit support for “alias-safety”, it’s going to make the language uglier.)

                                        It’s a weak argument to compare two languages based on something one of them predates and had to add in a backwards-compatible way without acknowledging that fact.

                                        For example, Rust vs. C++ arguments are stronger for readily acknowledging that C++ is hamstrung by the need to remain compatible with a long history of legacy code.

                                        1. 1

                                          And, in a decade, I wouldn’t be surprised if Go were being criticized for having a “requires you to choose between” relationship with some feature Rust’s in the middle of popularizing.

                                          Maybe. 🤷‍♂️ Hopefully newer languages advance over older ones, right? That’s how progress works.

                                          1. 1

                                            *nod* …and that’s a good thing.

                                            I’m just saying that the argument being made was weakened by not acknowledging the role that played in Go having a more pleasant, less function-coloured async story than JavaScript.

                                      2. 1

                                        Go’s generics certainly aren’t perfect, but I don’t think they can be accurately described as “bolted on”.

                                        1. 3

                                          There is a whole ecosystem full of APIs that were designed before Go had generics. Thus, I consider it bolted-on.

                                          It’s like comparing Rust’s Option<T> to C++’s std::optional<T> based on their use in their respective ecosystems.

                                          1. 2

                                            I’m not sure that it’s reasonable to describe generics as “bolted on” simply because they came in a later version of the language.

                                            1. 2

                                              It depends on whether you’re looking at it from the point of view of the language’s semantics or how they fit into the ecosystem.

                                              If you’re looking at the language in isolation, you’re correct… but if you’re looking at how they fit into the standard library APIs and the APIs provided by the ecosystem then, yes, they’re very much bolted on later and that’s an unavoidable side-effect of needing to remain backwards compatible.

                                              Personally, as a programmer, what I care about is what I have to interact with in real-world scenarios, which means the latter case.

                                              1. 1

                                                if you’re looking at how they fit into the standard library APIs and the APIs provided by the ecosystem then, yes, they’re very much bolted on later and that’s an unavoidable side-effect of needing to remain backwards compatible.

                                                I don’t really agree. Pre-generics “generic” code necessarily used runtime reflection to do its dark work, but the introduction of generics didn’t totally obviate that stuff, it just introduced an alternative way to express type-invariant code, which makes sense to use in a subset of use cases.
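
                                                A small sketch of that coexistence (illustrative names mine): the pre-generics dynamic version still compiles and still has users, alongside the Go 1.18+ type-parameter version:

                                                  package main

                                                  import "fmt"

                                                  // Pre-generics style: dynamic typing, checked only at runtime.
                                                  func maxDynamic(a, b interface{}) interface{} {
                                                      x, xok := a.(int)
                                                      y, yok := b.(int)
                                                      if !xok || !yok {
                                                          panic("maxDynamic: only ints supported")
                                                      }
                                                      if x > y {
                                                          return x
                                                      }
                                                      return y
                                                  }

                                                  // Go 1.18+ style: the same type-invariant code, checked at compile time.
                                                  func maxGeneric[T int | int64 | float64 | string](a, b T) T {
                                                      if a > b {
                                                          return a
                                                      }
                                                      return b
                                                  }

                                                  func main() {
                                                      fmt.Println(maxDynamic(3, 5), maxGeneric("a", "b"))
                                                  }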

                                                1. 1
                                                  1. Swap out a few words and you’ve just described function colouring vs. callbacks as a way to handle async in JavaScript.

                                                  2. Please understand that I’m one of those people who gravitates to Rust, not for its performance, but for the maintainability advantage of having such a strong type system combined with an ecosystem that embraces “fearless upgrades” more than Haskell’s ecosystem. Having stuff that can’t be shifted to being checked at compile-time, despite the language now supporting it, is a big downside.

                                2. 2

                                  “You don’t need” is not the same as “you cannot”. Javascript supports async/await and has function coloring, whether you use it or not.

                                  1. 2

                                    And a big part of my point is that Node.js was designed before JavaScript had async/await, so saying Go is better without acknowledging that is like a fit 20-something mocking an old man for not having the same physical strength.

                                    It hurts your argument to not acknowledge that one of the languages didn’t have that benefit of hindsight when being designed, and had an existing ecosystem to remain compatible with when extending its capabilities.

                                    1. 3

                                      I don’t get your argument. Other languages have big ecosystems to maintain too. Go is just historically conservative when it comes to expanding the language. And the choice of expanding the language and maintaining legacy is one the javascript designers were already familiar with (promises promises), and they still took it, so yeah, they had the benefit of hindsight.

                                      1. 2

                                        My point is that, of course Go doesn’t have as much baggage. It’s younger. JavaScript has baggage because it’s grown far beyond its original designed use case… so it hurts how people will perceive the argument if you don’t acknowledge that.

                                        Compare, for example, how Rust readily acknowledges that one of the big reasons it’s more pleasant to work with than C++ in many situations is that it learned from C++’s decades of evolution without needing to maintain backwards compatibility.

                                        1. 2

                                          That’s not the argument for function coloring, which is a relatively recent feature of JS, one that Go predates by several years and could have adopted if worth it. Rust adopted it, and I don’t think it avoids function coloring either (although it’s not been promoted to a language feature AFAIK). It has nothing to do with JS’s perceived baggage, and I don’t know how being younger would have fixed it, but perhaps you can explain.

                                          1. 2

                                            What I’m saying is that:

                                            1. JavaScript originated as a quick-and-dirty web content scripting language in 1995, so it’s not surprising that it began with only sync APIs.
                                            2. That and the need for backwards compatibility meant that the only way it could reasonably add async was through function colouring.
                                            3. Go originated in 2009, when the need for good async abstractions was much more visible than in 1995, and it didn’t have a sync ecosystem to remain compatible with, so it could be async-only.

                                            That’s how being younger would have fixed it.

                                            Not at least making a nod to this in a comparison is like implicitly faulting C for lacking Rust design decisions which must be present from the beginning and could never have been implemented on a PDP-11.

                                            Rust adopted it, and I don’t think it avoids function coloring either

                                            Because Rust’s primary goal is good “full-duplex integration” with C and suitability for replacing C and/or C++ in all scenarios, which means:

                                            1. Sync-coloured APIs must be available.
                                            2. Async cannot be baked into the language too deeply or it prevents things like the Linux kernel devs’ interest in using async/await on top of their own kernel-mode task APIs.
                                            3. Rust’s view on explicitness means Zig’s approach is a non-starter.

                                            …though the async WG has announced their intent to explore the idea of “keyword generics”, where you’d be able to declare a function as being generic over async-ness.

                          2. 4

                            Maybe it’s Stockholm syndrome, but I kind of like that stuff gets a native implementation in Go rather than pulling in a bunch of C stuff. FFI sounds great, but cross-language building, cross-compiling, debugging, etc. are a pain (e.g., Python). I’m glad that Go makes these things a little costlier so something has to be “better enough” to justify these non-obvious costs. Moreover, even with Rust’s great FFI story, most things are still rewritten, so the practical effects are similar.

                            Go might be a walled garden, but it’s a pretty nice one IMHO.

                            1. 3

                              The problem is that better is different. Forcing everything to be reimplemented in Go to coexist with the Go runtime means things can be better than C-world by being different, but it also means things are different, and different is bad in itself and often not actually better.

                              Anyway, Go is a language based on trade offs and some people don’t like the trade offs it makes and some people don’t like the idea of having trade offs at all.

                          3. 5

                            This post says event-driven IO like epoll is faster, but doesn’t say why that’s true. The interface you program against in an OS thread is exactly the same as the one you use in a Goroutine: call read and if there’s no data available, your thread gets switched out. This is a blocking API conducted on top of nonblocking primitives, in both cases. So why would one be faster than the other? This article doesn’t say, and it doesn’t acknowledge that it isn’t saying. The article should be amended to either acknowledge this as a mystery, or to answer it. Don’t just ignore it.

                            1. 1

                              *nod*

                              My understanding is that it’s because OS threads have to save and restore more state on each context switch, to account for the most performance-antagonistic possibility in what the code may be doing, but it’d be nice to have some solid data and citations on it right in the post.

                              I haven’t researched it in depth, so the main source I’m aware of is the “2.2 Context Switching overhead” section of Fibers under the magnifying glass, which consists of this paragraph and a table:

                              While Fibers do not offer significant savings in terms of the memory footprint compared to threads, they do have a capability to switch from fiber to fiber without involving kernel transition and the cost of the fiber switch is cheaper than the cost of a thread switch. However, the fiber switch has still significant cost compared to a normal function call and return or (stackless) coroutine suspend and resume [Wandbox].

                              The following table samples fiber switching costs on several popular platforms:

                              (For the record, in that paper, “fibers” refers to Go’s stackful coroutines, while the state machines generated by Rust’s async/await are examples of “(stackless) coroutine suspend and resume”, because those plus threads exist as three points on a continuum of how much the programmer is telling the runtime about what it needs to spend time preserving across context switches.)

                              Actually, now that I think about it, the idea that research would develop smarter compilers/runtimes which would solve efficiency problems was a common thread in a lot of stuff in the 90s that either faded or has been held back by waiting on it: the JVM, fibers in pretty much everything except Go, Itanium’s explicitly parallel design, etc.

                              1. 3

                                You’re talking about something different than what my comment was about. My comment is basically a criticism of the writing in the article.

                                You’re talking about the context switching overhead, which this article says (rightly or wrongly) is not significant:

                                So far I’ve made goroutines sound pretty boring. They are like threads but use same order of magnitude of RAM. They occasionally can be scheduled smartly but I haven’t presented any evidence they can be scheduled/context switched more efficiently than regular threads. The biggest real benefit I see is that I can use lots of goroutines without worrying as much about configuring system resources.

                                Then the article says:

                                So why do [goroutines] exist and why are they awesome? […] non-blocking I/O where possible plus integration of the event loop into the go scheduler is, to answer our earlier question, the manner in which goroutines can be more efficient than threads and it’s how go manages to be pretty good at fast networking.

                                The article distinguishes context switching overhead from “non-blocking I/O”, and says the latter is the significant source of efficiency for goroutines over threads. But it does not say why “non-blocking I/O” is faster.

                                1. 1

                                  Ahh. Point.

                                  That is an interesting question, given how many different platform APIs “non-blocking I/O” can cover and how variations between them can affect what performance gains may be seen.

                              2. 1

                                The interface you program against in an OS thread is exactly the same as the one you use in a Goroutine: call read and if there’s no data available, your thread gets switched out . . . So why would one be faster than the other?

                                The “surface area” of a goroutine is smaller than an OS thread, and the Go scheduler has access to deeper and more specific knowledge about goroutines than the OS does about threads. Net result — which makes perfect sense! — is that the Go scheduler can context-switch goroutines over OS threads much faster than the OS can context-switch threads over CPU cores.

                                1. 1

                                  See your sibling comment and my reply to it. You’re talking about something different than my comment is about :)

                                  1. 1

                                    I understand that your comment was about the material presented in the post, I’m just trying to provide some of the rationale that you felt was missing :)

                              3. 4

                                defer wg.Done()

                                I think this bit wastes a non-trivial amount of resources in go, so it’d be better, for the purpose of comparison, to just call Done directly after Sleep.

                                EDIT: yeah, doing that improves Go CPU time 3x. So, “That’s way more than rust because … go is performing scheduling work in userspace” is incorrect. It’s rather because defer (an orthogonal feature) is slow.

                                ignoring kernel structure tracking overhead

                                I think that’ll be at least an extra page per thread, which also won’t be visible to /usr/bin/time.

                                1. 2

                                  I think this bit wastes a non-trivial amount of resources in go, so it’d be better, for the purpose of comparison, to just call Done directly after Sleep.

                                  Defer overhead was basically reduced to zero in some point release many years ago.

                                  1. 1

                                    Does not seem to be the case in this particular example:

                                    $ go version
                                    go version go1.20.5 linux/amd64
                                    
                                    $ git diff a.go b.go
                                    diff --git a/a.go b/b.go
                                    index 3c50a8a..92ea0c9 100644
                                    --- a/a.go
                                    +++ b/b.go
                                    @@ -9,8 +9,8 @@ func main() {
                                         wg.Add(count)
                                         for i:=0;i<count;i++ {
                                               go func() {
                                    +                       defer wg.Done()
                                                            time.Sleep(time.Second)
                                    -                       wg.Done()
                                               }()
                                         }
                                            wg.Wait()
                                    
                                    $ t go run a.go 
                                    real 2.26s
                                    cpu  6.03s (4.95s user + 1.08s sys)
                                    rss  2550.02mb
                                    
                                    $ t go run b.go
                                    real 2.58s
                                    cpu  18.02s (17.28s user + 746.47ms sys)
                                    rss  1904.80mb
                                    

                                    CPU time is changed from 18s to 6s, which is significant.

                                    1. 2

                                      I can’t reproduce your results. I suspect it’s related to your use of go run, which includes compilation time. Do you see the same thing if you compile each program beforehand, and just run the compiled binary?

                                      1. 1

                                        Huh, that’s curious! It definitely reproes on my machine even with build/run. Here are results using check.sh:

                                        https://gist.github.com/peterbourgon/eaabb887c85ae235999ab22d38e0c4be?permalink_comment_id=4637652#gistcomment-4637652

                                        1. 2

                                          Summarizing my response in that gist: this makes sense, defer definitely adds some overhead, that overhead compounds for larger values of N, but for any typical value of N it’s going to be negligible. If you have a function that produces 100k or 1M defers then sure, maybe that’s worth an optimization pass, but that’s a special case.

                                          edit: “As of Go 1.13 most defer operations take about 35ns … In contrast, a direct call takes about 6ns.” Go 1.14 “eliminates about 80% of defer’s direct CPU cost.” Still more costly than not using a defer, but not by so much that it makes sense to avoid, unless you have concrete profiling data indicating it’s a problem.
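
                                          For what it’s worth, a minimal benchmark sketch in the spirit of those figures (machine-dependent; numbers will vary):

                                            package deferbench

                                            import (
                                                "sync"
                                                "testing"
                                            )

                                            func BenchmarkDeferredUnlock(b *testing.B) {
                                                var mu sync.Mutex
                                                for i := 0; i < b.N; i++ {
                                                    func() {
                                                        mu.Lock()
                                                        defer mu.Unlock()
                                                    }()
                                                }
                                            }

                                            func BenchmarkDirectUnlock(b *testing.B) {
                                                var mu sync.Mutex
                                                for i := 0; i < b.N; i++ {
                                                    func() {
                                                        mu.Lock()
                                                        mu.Unlock()
                                                    }()
                                                }
                                            }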

                                          edit 2: a better demonstration of the relevant overhead is probably something like this, which I believe illustrates just how small it is in practice.

                                          1. 3

                                            I don’t think that’s what’s happening here. We don’t have one function with N defers. Rather, we have N functions with one defer each. Looking at the profile a bit, it looks like perhaps the role of defer here is to somehow poke gc into working and scanning stacks, which then results in threads running gc contending on this singleton atomic:

                                            https://github.com/golang/go/blob/2eca0b1e1663d826893b6b1fd8bd89da98e65d1e/src/runtime/mgc.go#L311

                                            pretty low confidence here though.

                                            To clarify, this does mean that I was wrong — it has almost nothing to do with defer, and everything to do with gc.

                                            1. 5

                                              Ok, confirmed that disabling GC removes the difference, so I am now moderately confident that this just hits a pathological case with concurrent GC where scaling is negative (due to all threads competing for the same atomic).

                                              $ COUNT=1000000 GOGC=off t ./b
                                              
                                              real 1.85s
                                              cpu  2.96s (2.11s user + 844.43ms sys)
                                              rss  2600.94mb
                                              
                                              $ COUNT=1000000 GOGC=off t ./a
                                              
                                              real 1.99s
                                              cpu  2.98s (2.08s user + 900.51ms sys)
                                              rss  2603.96mb
                                              
                                              1. 2

                                                Haha, neat! I replied in the gist with a theory that was at least adjacent to this finding 😇

                                2. 3

                                  a platform that can “cheaply” handle a large number of network connections while still avoiding “callback hell” or the “function coloring” problem

                                  Until you cannot. I’m working on a Go code stack that launches I/O operations in goroutines because it wants to send off multiple requests and then wait. And because goroutines are “cheap” it seemed fine, until the program scheduled millions of them and the scheduler latency jumped into tens of seconds. So we are back to channels or callbacks 🤷🏼‍♂️
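
                                  The usual fix is roughly this shape (a sketch; the limit is illustrative): bound the in-flight goroutines with a buffered-channel semaphore instead of spawning unconditionally:

                                    package main

                                    import (
                                        "fmt"
                                        "sync"
                                    )

                                    func main() {
                                        const maxInFlight = 100
                                        sem := make(chan struct{}, maxInFlight)
                                        var wg sync.WaitGroup
                                        for i := 0; i < 1_000_000; i++ {
                                            sem <- struct{}{} // blocks once maxInFlight are running
                                            wg.Add(1)
                                            go func(i int) {
                                                defer wg.Done()
                                                defer func() { <-sem }()
                                                _ = i // the I/O request would go here
                                            }(i)
                                        }
                                        wg.Wait()
                                        fmt.Println("done")
                                    }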

                                  1. 1

                                    You’re working on a program that makes millions of concurrent network requests? That dog ain’t gon’ hunt, friend! 😉

                                    Goroutines are cheap for sure, and you can have a million of them going at the same time, but the OS and the network and many other layers besides assert limits on iops, so you definitely don’t get to do a million network requests at the same time. At least, not without serious impact to performance and resource utilization.

                                    1. 2

                                      I didn’t phrase it well. Of course we wouldn’t make a million network requests, but we did create a million goroutines in a short time span (<30s) and the scheduler had a hard time with them. The core issue was that the interface wasn’t well defined and the goroutines were used for concurrency control.

                                      Overall I’d argue that goroutines do not come for free, though I have the impression that many write as if they do.

                                      1. 1

                                        I see. Yeah, you’re right: goroutines definitely aren’t free, and you need to bound them the same as any other resource. But they’re significantly cheaper than OS threads.

                                  2. 3

                                    One thing not mentioned here is the fact that a goroutine can reduce its stack size. So if you have a long-running goroutine (a chat server with ‘one’ per client) that occasionally needs to stack-allocate a lot, but the allocation is transient, its stack size will not grow permanently.

                                    This is not the case for native threads, whose stacks can only grow.
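
                                    A rough way to watch this from the outside (a sketch of mine; StackInuse is a whole-process figure and shrinking is a GC implementation detail, so treat the output as suggestive rather than exact):

                                      package main

                                      import (
                                          "fmt"
                                          "runtime"
                                      )

                                      // deep forces transient stack growth via a large frame per call.
                                      func deep(n int) int {
                                          var buf [1024]byte
                                          if n == 0 {
                                              return int(buf[0])
                                          }
                                          return deep(n - 1)
                                      }

                                      func stackInuse() uint64 {
                                          var m runtime.MemStats
                                          runtime.ReadMemStats(&m)
                                          return m.StackInuse
                                      }

                                      func main() {
                                          fmt.Println("before:", stackInuse())
                                          deep(10000)
                                          fmt.Println("after growth:", stackInuse())
                                          runtime.GC() // stacks may be shrunk while the GC scans them
                                          fmt.Println("after GC:", stackInuse())
                                      }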

                                    1. 3

                                      This description of “go-routines” makes them sound exactly like 90s-era “user-mode” (aka “green”, aka “M:N”) threads. Is there some reason for adding new terminology here? Some extra feature not in green threads?

                                      Also, I think a lot of the nervousness over “a million threads” dates from the same era, when you could easily exhaust 2GB of virtual memory with only a few thousand threads. That caused so many scars that it can be hard to remember that 64-bit machines can’t hurt us (that way) anymore, so the legends continue to be whispered from generation to generation. :)

                                      1. 2

                                        Your intuition is correct: goroutines are a form of green threads. The terminology just follows from the Go spec.

                                      2. [Comment removed by author]