Some factual errors:
node.js already benefits from an extremely performant JIT-enabled engine, so it’s likely that despite this repurposing of non-blocking IO for a case in which it was not intended, scheduling among database connections using non-blocking IO works acceptably well (and perhaps additionally because libuv still uses a thread pool to help with polling).
Actually, libuv uses a thread pool only for asynchronous file I/O. As the documentation the author linked states:
Unlike network I/O, there are no platform-specific file I/O primitives libuv could rely on, so the current approach is to run blocking file I/O operations in a thread pool.
Which has nothing to do with database connections.
Actually, the linked paper is from 2006; Node.js was first released in 2009. On top of that, the tone I inferred from the quoted sentence is that asynchronous I/O is purely some academic flight of fancy, which does not match the reality I have experienced. Inside the Python community alone, the Twisted team has been promoting asynchronous I/O for probably 20 years now.
Response to the article:
It’s a lot of text that shifts between arguing about asynchronous code in general and asynchronous code in Python specifically. In short, the claim is: Python is so slow that any value you would get from asynchronous I/O is overwhelmed by how much time you’ll spend marshalling/unmarshalling values. On top of that, the author is talking only about RoR-style applications. So, sure, sounds about right.
The author, however, does not distinguish between throughput and latency. All of the performance numbers given measure latency, which goes up, but they do not indicate whether throughput increases at all. That is not to say it does, but rather that the author fails to argue it either way. It could be, for example, that switching to asyncio causes a 10x increase in latency but a 100x increase in throughput, which would mean your application can handle 100x more requests.
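To make the latency/throughput distinction concrete, here is a toy simulation (not from the article; the request count and wait times are made up) where each "request" is a pure I/O wait. One blocking-style worker handles requests one at a time; an async worker keeps them all in flight at once. Per-request latency is the same or worse in the concurrent case, but aggregate throughput is far higher.

```python
import asyncio
import time

REQUESTS = 50
IO_WAIT = 0.01  # simulated per-request I/O wait (e.g. a database round trip)

async def handle_request():
    # Stand-in for a query: no CPU work, just waiting on I/O.
    await asyncio.sleep(IO_WAIT)

async def serial():
    # One request at a time, like a single blocking worker.
    for _ in range(REQUESTS):
        await handle_request()

async def concurrent():
    # All requests in flight at once, like an async worker.
    await asyncio.gather(*(handle_request() for _ in range(REQUESTS)))

def throughput(coro_fn):
    start = time.perf_counter()
    asyncio.run(coro_fn())
    elapsed = time.perf_counter() - start
    return REQUESTS / elapsed  # requests per second

if __name__ == "__main__":
    print(f"serial throughput:     {throughput(serial):8.0f} req/s")
    print(f"concurrent throughput: {throughput(concurrent):8.0f} req/s")
```

The simulation deliberately ignores Python's marshalling overhead, which is exactly the cost the article claims dominates; the point is only that latency numbers alone can't settle the throughput question.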
The repeated claim that asynchronous I/O is only good for long-lived, slow network connections is quite misleading. Consider HAProxy, which pushes around a lot of data: while no individual connection is maxing out the network, HAProxy itself benefits heavily from the throughput it is capable of achieving.
What’s also missing is any consideration that async I/O allows each stage of a pipeline to be “right-sized”. I imagine there are a lot of load balancer -> worker -> database stacks with differing levels of concurrency: for example, a database server with 20 workers but a web server with only 10. Assuming the web workers are blocking, they can accept only 10 connections, perform 10 queries, and leave at least 10 database workers idle. An async worker could conceivably accept 100 connections, moving the backlog to the database.
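The right-sizing idea above can be sketched with an asyncio semaphore standing in for the database pool (all the numbers and names here are hypothetical, chosen to mirror the example): one async worker accepts 100 connections, and the semaphore caps in-flight queries at the pool size, so the database stays saturated instead of idling behind a 10-worker web tier.

```python
import asyncio
import time

CONNECTIONS = 100   # connections a single async worker accepts
DB_POOL_SIZE = 20   # hypothetical database worker/pool size
QUERY_TIME = 0.01   # simulated query duration, seconds

async def handle(db_pool: asyncio.Semaphore) -> None:
    # Each accepted connection queues up for a database "worker";
    # at most DB_POOL_SIZE queries run at once.
    async with db_pool:
        await asyncio.sleep(QUERY_TIME)  # stand-in for the query

async def main() -> None:
    db_pool = asyncio.Semaphore(DB_POOL_SIZE)
    await asyncio.gather(*(handle(db_pool) for _ in range(CONNECTIONS)))

start = time.perf_counter()
asyncio.run(main())
elapsed = time.perf_counter() - start
# 100 queries through a 20-wide pool take ~5 batches of QUERY_TIME,
# versus 10 blocking web workers leaving half the pool idle.
print(f"drained {CONNECTIONS} connections in {elapsed:.2f}s")
```

The backlog lives in the async worker as cheap suspended coroutines rather than as OS processes, which is what lets the concurrency limit sit at the stage that actually has it.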
There’s going to be a blocking operation somewhere, but you may not know in advance what it is. Async means you don’t preemptively introduce a blocking bottleneck.
We will use psycopg2 which is currently the only production DBAPI that even supports async, in conjunction with aiopg which adapts psycopg2’s async support to asyncio and psycogreen which adapts it to gevent.
Rapidly approaching the point at which there’s more adapter code than code…