1. 12

  2. 12

    I think there are several questionable statements there and the problem domain is not described in sufficient detail even for introductory material.

    The reader is left with the impression that, to fill I/O wait times with work, you either add more processes or more threads, and that the latter is the faster (but harder to implement) choice. If the goal of the article was simply to describe what threads can be used for, this is the stuff which should not have been touched imo.

    The CPU vs I/O bound description is fine at this level. Indeed adding more processes or threads may be a perfectly valid way of dealing with it. Way which was completely omitted was programming with event loops - you set your fds to non-blocking and just add events and react to completions. This is how e.g. nginx works.
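
    To make the event loop idea concrete, here is a minimal sketch using Python's standard `selectors` module. It is not how nginx is implemented; `run_event_loop` and the use of `socket.socketpair()` as a stand-in for real client connections are assumptions made for illustration.

```python
import selectors
import socket


def run_event_loop(messages):
    """Serve several connections from one thread by reacting to readiness events."""
    sel = selectors.DefaultSelector()
    received = {}

    for conn_id, payload in enumerate(messages):
        # socketpair() stands in for a real client connection (illustrative only).
        reader, writer = socket.socketpair()
        reader.setblocking(False)  # never block in recv(); wait in select() instead
        sel.register(reader, selectors.EVENT_READ, data=conn_id)
        received[conn_id] = b""
        writer.sendall(payload)
        writer.close()  # the reader sees EOF once the payload is drained

    # The loop: sleep until some fd is ready, then handle only the ready fds.
    while sel.get_map():
        for key, _events in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                received[key.data] += chunk
            else:
                # Empty read means the peer closed; stop watching this fd.
                sel.unregister(key.fileobj)
                key.fileobj.close()

    sel.close()
    return received
```

    One thread handles every connection here; nothing ever waits on a single slow peer because the only blocking call is `select()`.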

    There is a repeated statement about the benefit of sharing code. But this is precisely what normally happens for processes - virtual -> physical mappings for all files (the binary itself, libc, whatever) lead to the same memory pages. 2 processes will have a slightly higher memory footprint than 2 threads, but the difference should be negligible.

    The article itself notes that if you want you can share data between processes, e.g. you can mmap.
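
    A rough sketch of sharing data between processes through mmap, assuming a Unix system with `fork()`. The function name `share_value_via_mmap` is made up for this example.

```python
import mmap
import os
import struct


def share_value_via_mmap(value):
    """Child process writes an integer into shared memory; parent reads it back."""
    # Anonymous mapping. On Unix, Python maps it MAP_SHARED by default, so after
    # fork() both processes see the same physical pages (not copy-on-write).
    shared = mmap.mmap(-1, mmap.PAGESIZE)

    pid = os.fork()
    if pid == 0:
        # Child: write into the shared page, then exit without parent cleanup.
        shared[:8] = struct.pack("<q", value)
        os._exit(0)

    os.waitpid(pid, 0)  # wait for the child so its write is complete
    (result,) = struct.unpack("<q", shared[:8])
    shared.close()
    return result
```

    The two processes have separate address spaces otherwise; only the mapped region is shared, which is the opposite default from threads.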

    There is a benefit of a cheaper thread<->thread switch, but it seems overplayed and out of place for this piece - it’s too low level. In the same spirit I can point out how threads get slower:

    • plenty of syscalls which are called all the time accept file descriptors as an argument, so the fd -> actual file translation happens all the time. Linux has a hack - if the process is single-threaded it avoids taking a reference on the found file (and dropping it later), which saves 2 atomic ops.
    • several kernel structures also get shared - in particular you start getting lock contention on mm (address space) and the file table

    On the other hand, a non-low-level and very real cost which was not even mentioned comes from providing safe access to shared memory. It was mentioned that it is harder, but not that, naively applied, it can result in a prohibitive performance impact. “Fun” fact is that even if you have threads which don’t talk to each other, if no care was taken to properly place global data there may be cacheline bounces. Most real-world programs not written with concurrency in mind from day 1 suffer from false sharing and bad locking to this very day.
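
    As a rough illustration of the locking cost (not of false sharing, which is hard to show from Python), the sketch below contrasts a naive design where every update takes one shared lock with one where each thread accumulates privately and publishes once. Both function names are hypothetical.

```python
import threading


def count_with_shared_lock(n_threads, n_increments):
    """Every increment takes the same lock: correct, but all threads serialize on it."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(n_increments):
            with lock:  # contended on every single update
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_local_totals(n_threads, n_increments):
    """Each thread accumulates privately and writes to shared state exactly once."""
    partials = [0] * n_threads

    def worker(index):
        local = 0
        for _ in range(n_increments):
            local += 1
        partials[index] = local  # the only write to shared state, to a private slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```

    Both return the same answer; the difference is how often threads have to coordinate. In a language without a global lock, the second shape is also the one that needs care with data placement to avoid false sharing.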

    So, I think the article misrepresents what threads are good for and does not provide a fair/balanced cost:benefit ratio (neither does this post, but I think it does enough to point out problems with the article).

    Today, especially with the advent of NUMA, you want to avoid as much writeable shared state as possible. And if your shared state is read-only, you very likely can just use processes (which may or may not be a better choice than threads, point is - memory usage will be virtually the same).

    1. 5

      This is good feedback. I didn’t intend to “sell” threads as much as I was trying to explain their existence. In Ruby and other languages that have a GVL there is a bit of thread-phobia, and my IO comments are primarily aimed at those groups. Basically the GVL is released during IO, which also happens to be an ideal candidate for using threads (even in languages without a GVL). This short video was extracted from a talk I gave to Ruby programmers.
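
      A small sketch of that effect, using Python's GIL as a stand-in for Ruby's GVL (an assumption; the two are analogous here but not identical). `timed_waits` is a made-up helper and `time.sleep()` plays the role of a blocking read.

```python
import threading
import time


def timed_waits(n_threads, wait_seconds):
    """Run n blocking waits in parallel threads and return the wall-clock time."""

    def blocking_io():
        # time.sleep() stands in for blocking IO; the interpreter lock is
        # released while the thread waits, so other threads can run.
        time.sleep(wait_seconds)

    threads = [threading.Thread(target=blocking_io) for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start
```

      Four threads each waiting 0.2 seconds finish in roughly 0.2 seconds of wall-clock time rather than 0.8, because the waits overlap even though only one thread can run interpreter code at a time.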

      I find many programmers are literally afraid of threads. Either they’ve been burned badly by them, or they’ve just heard so many horror stories. Many I’ve talked to wish for some kind of magical drop-in construct that will have all the benefits of threads and processes with none of the downsides. I think when you understand what exactly a thread is, it’s a bit more clear such a mythical thing won’t come (or at least not anytime soon, or not without its own caveats). There are concurrency alternatives, but there are no magic concurrency tools.

      The course I recommend at the bottom of the page goes into quite a bit more detail: the problems with different cache invalidation strategies and how they can be messed up by different parallel access patterns, as well as different kinds of access control, such as when it’s a good idea to use a spin lock versus a mutex. We also went over the “flash” paper, which compares many strategies including evented programming. It’s some pretty interesting stuff.

      While the cost of context switching a thread is likely not substantial to most people or programs, the benefits from sharing a hot cache can, I think, be very substantial. However, they’re harder to explain. I mentioned that, but didn’t dig into it.

      Way which was completely omitted was programming with event loops

      I have a note at the bottom of the article that mentions evented programming. It didn’t make it into the video.

    2. 2

      It blew my mind when I learned that Linux models threads as separate processes (i.e. each thread has its own PID), they just all share the same virtual memory mapping (until CoW). A really neat implementation detail, at least from 10,000 ft up.

      1. 2

        Wait, what? I think you’re thinking of forks, which are full heavyweight processes. They do copy-on-write.

        POSIX threads are not forks. They operate within the same process and within the same PID.
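
        For anyone who wants to check this on their own machine, here is a small sketch that prints both kinds of ID. On Linux, `getpid()` is shared by all threads of a process, while each thread also has its own kernel thread ID. `thread_ids` is a made-up helper; `threading.get_native_id()` needs Python 3.8 or later.

```python
import os
import threading


def thread_ids():
    """Return (pid_main, pid_worker, tid_main, tid_worker) for one worker thread."""
    seen = {}

    def worker():
        seen["pid"] = os.getpid()                # process ID as seen from the thread
        seen["tid"] = threading.get_native_id()  # kernel-assigned thread ID

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return os.getpid(), seen["pid"], threading.get_native_id(), seen["tid"]
```

        The two PIDs come back equal and the two thread IDs differ, so the threads live in one process from the POSIX point of view while the kernel still schedules each one under its own ID.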

      2. 0

        cool - thanks!