1. 11
  1.  

  2. 6

    Instead of

    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
    

    You might want to do

    find_package(Threads REQUIRED)
    target_link_libraries(YourExecutable PRIVATE Threads::Threads)
    

    It’s more portable (it also supports Win32 threads), and it sets the flag only for your target instead of for every target in your project.

    1. 4

      Oh man, this is great. Thank you! You wouldn’t believe how many issues I have in another project with a Windows build and threads; this will be very helpful! I’ll have to edit the site later, can’t do that on mobile. Edit: updated the site with a backlink here and the site linked in your profile.

      1. 2

        You’re welcome! CMake is pretty nice if you use it the “right” way, but there are so many small things you can do to upset it and make your experience miserable. I should probably write them all down instead of forgetting them and re-discovering them…

    2. 4

      One of the many reasons Rust async tends to significantly reduce actually-achieved performance compared to threads is a similar issue: there is no limit on how many tasks can be accepted into the system, leading to eager acceptors that take on new work whenever they are able to, rather than when the system has the capacity to process that work with reasonable latency. This, in turn, makes it much harder for upstream load balancers to do their jobs, because there is no explicit backpressure signal telling them to send traffic somewhere less saturated. People seem to do a better job of bounding work with threadpools, as this article mentions.
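      To make the backpressure point concrete, here’s a minimal sketch in plain Rust (no async runtime; `process_bounded` and its parameters are hypothetical names chosen for illustration). A bounded `sync_channel` makes the acceptor block as soon as the queue is full, which is exactly the explicit signal an eager acceptor lacks:

      ```rust
      use std::sync::mpsc::sync_channel;
      use std::thread;

      // Accepts at most `cap` requests in flight; the bounded channel
      // is the explicit backpressure signal for the accepting side.
      fn process_bounded(cap: usize, total: u32) -> Vec<u32> {
          let (tx, rx) = sync_channel::<u32>(cap);
          let worker = thread::spawn(move || {
              let mut done = Vec::new();
              while let Ok(req) = rx.recv() {
                  done.push(req); // simulate servicing the request
              }
              done
          });
          for req in 0..total {
              // send() blocks once `cap` requests are queued: the
              // acceptor stops accepting instead of eagerly taking on
              // more work than the system can service.
              tx.send(req).unwrap();
          }
          drop(tx); // close the channel so the worker exits
          worker.join().unwrap()
      }

      fn main() {
          let done = process_bounded(2, 5);
          assert_eq!(done, vec![0, 1, 2, 3, 4]);
          println!("processed {} requests", done.len());
      }
      ```

      The same shape works for an upstream load balancer: a full queue translates into a visible "try another server" signal rather than unbounded queueing.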

      There are a number of nice heuristics you can use for bounding or prioritizing work beyond a fixed-size semaphore. A great one for many request->response workloads that are latency-bound (rather than throughput-bound, where you tend to benefit from over-subscribing a bit more) is to prioritize writing IO over reading IO over the acceptance of new work. Writing usually happens when a request is being responded to, and once the response is written you can free the memory involved in servicing the request; you want to do this first for latency-bound tasks because it drives the shedding of resources. Reads come next, as they are more often associated with work that is not yet complete.

      Acceptance of new work comes last for latency-bound workloads, along with short TCP accept backlogs that encourage the kernel to reject connections the application won’t see quickly, so the requester can choose a different server that may be able to serve them better. This must be paired with a reasonably short timeout on the client side.

      Stacks are often better than queues in these workloads, as they are more likely to give you things that are hot in cache. FIFO queues often destroy cache utilization, but they let you avoid starvation when you become over-subscribed.
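      The write > read > accept ordering can be sketched with a plain max-heap (the `Event` enum and `dispatch_order` are hypothetical names, not from any real executor): deriving `Ord` on an enum follows declaration order, so listing `Accept` before `Read` before `Write` makes a `BinaryHeap` drain writes first and accept new work only when nothing else is ready.

      ```rust
      use std::collections::BinaryHeap;

      // Event kinds for a latency-bound event loop; derived Ord follows
      // declaration order, so Write > Read > Accept.
      #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
      enum Event {
          Accept,
          Read,
          Write,
      }

      // Drain ready events in priority order from a max-heap.
      fn dispatch_order(ready: &[Event]) -> Vec<Event> {
          let mut heap: BinaryHeap<Event> = ready.iter().copied().collect();
          let mut order = Vec::new();
          while let Some(e) = heap.pop() {
              order.push(e);
          }
          order
      }

      fn main() {
          let ready = [Event::Accept, Event::Write, Event::Read, Event::Accept, Event::Write];
          // Writes drain first so finished requests shed their resources;
          // new work is only accepted once nothing else is ready.
          let order = dispatch_order(&ready);
          assert_eq!(
              order,
              vec![Event::Write, Event::Write, Event::Read, Event::Accept, Event::Accept]
          );
      }
      ```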

      Everything in a request dependency graph would ideally agree on whether the system is prioritizing minimum latency, maximum throughput, or something specific in between: if you put one high-throughput component in the middle of a low-latency dependency graph, it obliterates the latency of the entire graph.

      Prioritizing older tasks over newer ones (with the accepting task receiving the lowest priority) is another way to get nice latency when almost all of the tasks are homogeneous.
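      Oldest-first dispatch is a one-liner on top of the same heap idea (again a hypothetical sketch, with `oldest_first` as an invented name): tag each task with an arrival sequence number and wrap it in `std::cmp::Reverse`, which turns the max-heap into a min-heap so the oldest runnable task always pops first.

      ```rust
      use std::cmp::Reverse;
      use std::collections::BinaryHeap;

      // Tasks carry an arrival sequence number; Reverse flips the
      // max-heap into a min-heap, so the oldest task runs first.
      fn oldest_first(tasks: &[(u64, &'static str)]) -> Vec<&'static str> {
          let mut heap: BinaryHeap<Reverse<(u64, &'static str)>> =
              tasks.iter().copied().map(Reverse).collect();
          let mut order = Vec::new();
          while let Some(Reverse((_seq, name))) = heap.pop() {
              order.push(name);
          }
          order
      }

      fn main() {
          // Tasks arrive out of order but are dispatched oldest-first.
          let order = oldest_first(&[(3, "c"), (1, "a"), (2, "b")]);
          assert_eq!(order, vec!["a", "b", "c"]);
      }
      ```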

      These considerations tend to be quite challenging to address using the hyper-minimal Rust executor interface. How does C++ stand up? I’ve heard that C++ gives you a lot more low-level access to things, but am curious how easy it is to build things like an executor where the priority of an async task can change over time.