I fail to see how Common Lisp compiling to machine code contributes to the speedup when everything is really I/O bound at the DB level. It sounds to me like the real improvement is better thread support in Common Lisp. The author is of course free to use whatever they like, but I have a strong feeling that the same numbers could be achieved from Python with something like multiprocessing or joblib or any of the other GIL circumvention techniques out there.
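To make the I/O-bound argument concrete, here is a minimal sketch, assuming each batch spends nearly all its time waiting on the database. The `load_batch` function and its `time.sleep` call are hypothetical stand-ins for a real round trip, not anything from pgloader. Blocking calls like this release the GIL, so plain threads are enough to overlap the waits; CPU-heavy parsing would be the case for multiprocessing or joblib instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_batch(batch_id, delay=0.05):
    """Hypothetical stand-in for one database round trip."""
    time.sleep(delay)  # blocking wait; the GIL is released while sleeping
    return batch_id

def load_serial(batch_ids):
    """Load batches one after another; the waits add up."""
    return [load_batch(b) for b in batch_ids]

def load_threaded(batch_ids, workers=8):
    """Load batches concurrently so the waits overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input
        return list(pool.map(load_batch, batch_ids))
```

With eight batches and eight workers, the threaded version should take roughly as long as one batch rather than eight, though the exact numbers depend on the machine and on how much of the real work is actually waiting.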
I also find this post very disappointing. The author hand-waves and isn’t sure themselves about what causes it, then chalks it up to “magic alien pixie dust”. Articles like this do Lisp a disservice IMO.
Also, the Common Lisp implementation uses the wire protocol directly. Not sure if the Python DB-API driver does that. Going through libpq very likely has copying overhead.
Good call on the copying overhead. This is something I struggled with a bit for the CHICKEN libpq wrapper, for strings at least. There’s a way to bypass it if you know what you’re doing, and in CHICKEN 6 we’ll finally have a faster noncopying implementation for strings.
Psycopg2 used libpq, psycopg3 was reimplemented from scratch and speaks the wire protocol directly. The project page explicitly mentions fast COPY support, so one has to wonder if the “old” pgloader implementation would benefit from a switch to psycopg3, and if it would make things as fast as the “new” implementation in CL…
I am probably missing something, but https://www.postgresql.org/docs/current/sql-copy.html looks super straightforward to me in any language. The only tricky bit is that the file must be accessible to the server, which again is an I/O problem, not much of a CPU problem.
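For what it's worth, the same documentation page also describes `COPY ... FROM STDIN`, which reads the data over the client connection, so the file does not have to be on the server. A rough sketch of that from Python, assuming `conn` is a psycopg2 connection and using its `copy_expert` cursor method; the `items` table and its columns are made up for illustration:

```python
import csv
import io

# Hypothetical table and columns, used only for illustration.
COPY_SQL = "COPY items (id, name) FROM STDIN WITH (FORMAT csv)"

def rows_to_csv(rows):
    """Serialize rows into an in-memory CSV buffer, rewound for reading."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return buf

def copy_rows(conn, rows):
    """Stream rows to the server over the client connection.

    `conn` is assumed to be a psycopg2 connection; copy_expert() sends the
    buffer's contents as the COPY data stream.
    """
    with conn.cursor() as cur:
        cur.copy_expert(COPY_SQL, rows_to_csv(rows))
    conn.commit()
```

Building the whole buffer in memory keeps the sketch short; a real loader would more likely stream from a file object in chunks.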