1. 51

  2. 8

    Why do they keep putting “green threads” in quotes and calling them ‘fake userspace threads’? Userspace threads are still threads. They’re not lightweight processes a.k.a. OS threads, but they’re still threads. OS threads are no more ‘real threads’ than userspace threads are. Userspace threads are not processes, lightweight or otherwise, but they’re still real threads of execution that can execute concurrently (if not in parallel). Thread is a very broad term.

    RAM issue

    How much memory does your application take up in static resources (i.e. things that wouldn’t get written to, and thus wouldn’t get copied anyway, in a copy-on-write scheme) such that you cannot run multiple instances of it on a single machine? How do you have gigabytes of these resources?

    Even if you do have gigabytes of static memory consumption in your application, why can’t you just map it into the address space of all your workers? You’re doing something pretty unusual, so having to do mmap magic doesn’t seem like much of a stretch, to be honest. Python has an mmap module that should support doing this.
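    A minimal sketch of that mmap idea, assuming a fork-based worker model and a made-up resource file name:

    import mmap
    import os

    # Map one read-only resource file; the OS backs it with a single set
    # of physical pages instead of a per-process copy.
    with open("static_resources.bin", "rb") as f:
        shared = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    workers = []
    for _ in range(4):  # hypothetical worker count
        pid = os.fork()
        if pid == 0:
            # Worker: reads from `shared` hit the same physical pages
            # as every other worker, so nothing gets duplicated.
            header = shared[:16]
            os._exit(0)
        workers.append(pid)
    for pid in workers:
        os.waitpid(pid, 0)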

    Let’s carry on with this scenario. Process #1 is still running this request. It in turn has to fire something down the network to yet another “service”. When it does, something in the “green thread” libraries notices what’s going on and says “hey, you seem to be waiting on the network, so how about we go back for more work?”. It pushes that original request aside and goes back to the epoll situation.

    SIGPIPE.

    What does this have to do with userspace threading? Or Python? If you do something else while waiting for the data you need to respond to a request on a socket then doing that something else might result in your waiting longer than the client timeout on the original request (which the client doesn’t tell you - it might be 500ms for all you know, or it could be 600s).

    1. 7

      If you do something else while waiting for the data you need to respond to a request on a socket then doing that something else might result in your waiting longer than the client timeout on the original request (which the client doesn’t tell you - it might be 500ms for all you know, or it could be 600s).

      I think the point is that if another request doesn’t yield, because it’s CPU bound, a request waiting on network never has a chance to respond. Whereas with OS threads, CPU-bound tasks will be preempted, offering more fairness to tasks waiting to execute.
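      A rough illustration of that fairness (workload and timings made up): with OS threads, CPython preempts a CPU-bound thread at every switch interval (sys.getswitchinterval(), 5ms by default), so the other thread still gets scheduled.

      import threading
      import time

      def cpu_bound():
          # Never yields voluntarily; under a cooperative scheduler this
          # would starve everything else, but OS threads get preempted.
          x = 0
          for i in range(50_000_000):
              x += i

      def responsive():
          for _ in range(5):
              print("still responsive at", time.monotonic())
              time.sleep(0.1)

      t1 = threading.Thread(target=cpu_bound)
      t2 = threading.Thread(target=responsive)
      t1.start(); t2.start()
      t1.join(); t2.join()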

      1. 2

        When you use userspace threading, you multiplex M userspace threads over N OS threads, where M >> N. When you don’t use userspace threading, you multiplex N userspace threads over N OS threads. If you can migrate userspace threads between OS threads then I think this problem goes away unless I’m totally misunderstanding something.

        1. 3

          You can’t do this with CPython.

          1. 3

            If you can migrate userspace threads between OS threads, yes, this is not a problem. But gevent doesn’t allow for that. Gevent implements IO-yielding coroutines. These coroutines are also symmetric, so it is possible to build yielding into a CPU-bound task (see the sketch below). But I’d wager that isn’t happening generally.
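            A minimal sketch of building a yield into a CPU-bound task (the loop body and yield frequency are made up):

            import gevent

            def cpu_bound(n):
                total = 0
                for i in range(n):
                    total += i * i
                    if i % 100_000 == 0:
                        gevent.sleep(0)  # yield to the hub so other greenlets can run
                return total

            # Without the gevent.sleep(0) above, the first greenlet would
            # monopolize the loop until it finished.
            jobs = [gevent.spawn(cpu_bound, 10_000_000),
                    gevent.spawn(gevent.sleep, 0.1)]
            gevent.joinall(jobs)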

            1. 2

              Part of why I insist on using asyncio (and related explicit coroutine solutions) is that it makes it very clear where you can have a yield point. The problem is, your entire stack must be disciplined enough to use coroutines, or to shove blocking work into a ProcessPoolExecutor if it’s CPU bound and a ThreadPoolExecutor if it’s IO bound.

              It’s so easy to just do data = requests.get(...).json() instead of

              async with aiohttp.ClientSession() as session:
                  async with session.get(...) as response:
                      data = await response.json()
              

              However, the latter gives up the “yield” token as often as possible, meaning the effective blocking is nil. You can still accomplish a “good enough” approximation of the first with:

              import asyncio
              import concurrent.futures
              import requests

              loop = asyncio.get_running_loop()  # assumes we're inside a running coroutine

              def get_data_blocking():
                  return requests.get(...).json()

              # ARJ: Note we can use ProcessPoolExecutor, but said executor must be
              # defined after the function to call is defined so it can fork(...) appropriately
              with concurrent.futures.ThreadPoolExecutor() as pool:
                  data = await loop.run_in_executor(pool, get_data_blocking)
              
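              For the genuinely CPU-bound case, the same pattern works with a ProcessPoolExecutor; a sketch, where crunch is a made-up stand-in for real work:

              import asyncio
              import concurrent.futures

              def crunch(n):
                  # Hypothetical CPU-bound work; runs in a separate process,
                  # so it neither blocks the event loop nor fights over the GIL.
                  return sum(i * i for i in range(n))

              async def main():
                  loop = asyncio.get_running_loop()
                  with concurrent.futures.ProcessPoolExecutor() as pool:
                      result = await loop.run_in_executor(pool, crunch, 10_000_000)
                  print(result)

              asyncio.run(main())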

              The nice thing about explicit async is that I can easily see when it’s going to clog the event loop.

              The bad thing about grafting async into a sync-first world is that it is sooooo easy to do the sync version and then wonder “why my api server not responding????”

          2. 1

            I think the point is that if another request doesn’t yield, because it’s CPU bound, a request waiting on network never has a chance to respond.

            That is a programming mistake.

            1. 2

              The author argues that using this model is a programming mistake. Still, lots of people make it.

        2. 7

          This post is just confusing to me. They seem to have already concluded it was a bad idea going in, then did everything possible to make it go wrong. I’m personally not a fan of green threads and gevent, but this could happen to async code too. Also, fork-after-load in Python would still run into the same RAM issue, because reference counting doesn’t mix too well with CoW (see the sketch below).
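          A minimal sketch of that refcount/CoW interaction (the data size is arbitrary): merely reading an object from the forked child updates its refcount, which lives in the object header, so the “shared” pages get dirtied and copied anyway.

          import os

          big = [str(i) for i in range(5_000_000)]  # loaded before the fork

          pid = os.fork()
          if pid == 0:
              # Child: this only *reads* the strings, but CPython bumps each
              # object's refcount, writing to the page it lives on and
              # forcing the kernel to copy that page.
              total = sum(len(s) for s in big)
              os._exit(0)
          os.waitpid(pid, 0)
          # gc.freeze() (Python 3.7+) mitigates the GC's own writes,
          # but it cannot stop refcount updates like these.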

          1. 4

            I think the article goes in the right direction, but the style of the post does not really help to make a point. Several matters are touched on: userspace/green threads, gevent, gunicorn, Python, static typing vs. dynamic typing, serialization, RPC vs. web requests, poor coding practice. Even so, there are some points I agree with.

            I am eager to read the follow-up post on RPC, mentioned in:

            I’m not even going to get into the whole insanity of using web requests when you really should be using a real RPC mechanism, with, oh, you know, strongly defined data types and methods which use them, timeouts, load balancing, health checks, queue depth determination, sharding, pick-two, selective LIFO, and everything else that tends to show up given enough time in a battle-tested system. That’s a rant for another post entirely.

            In particular, I was under the impression that web requests are a form of RPC. It seems to me that an HTTP server framework could handle all of the above “given enough time in a battle-tested system”.

            I might be completely missing the point, but it seems to me the situation is what it is because people who know all this stuff roll their own proprietary solutions instead of contributing to the pool of common software.

            Thanks for the COST paper link.

            1. 1

              So what’s the alternative to python for services? Go? What are you supposed to use for RPC? Capnproto?