Another surprise: Rust’s std::fs::copy does an in-kernel copy when possible, using the copy_file_range syscall on Linux.
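For anyone who wants to poke at the same syscall without writing Rust, here is a rough Python sketch. It is not what std::fs::copy does internally, just a direct call to copy_file_range via os.copy_file_range (Python 3.8+, Linux only). The function name kernel_copy is made up for illustration, and there is no fallback for filesystems or kernels where the syscall is refused.

```python
import os


def kernel_copy(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path with copy_file_range; return bytes copied.

    Hypothetical helper for illustration only. Assumes Linux and Python 3.8+.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        total = 0
        while remaining > 0:
            # The kernel moves the bytes itself; nothing passes through
            # a user-space buffer here.
            copied = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
            if copied == 0:  # source ended earlier than fstat reported
                break
            remaining -= copied
            total += copied
        return total
```

A real implementation would want to catch OSError and fall back to a read/write loop.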
Huh. If it’s fine to create your own “mediation pipe” to satisfy the API, and then splice(in, pipe); splice(pipe, out) … why doesn’t the kernel support that itself?
Maybe it’s because errors need to be reported on the pipe instead of on in or out?
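To make the "mediation pipe" trick concrete, here is a minimal sketch in Python using os.splice (Python 3.10+, Linux only). The name splice_copy and the 64 KiB chunk size are my own choices, not anything from the article, and error handling is left out.

```python
import os


def splice_copy(fd_in: int, fd_out: int, chunk: int = 1 << 16) -> int:
    """Copy fd_in to fd_out through an intermediate pipe; return bytes copied.

    Hypothetical helper for illustration only. Assumes Linux and Python 3.10+.
    """
    pipe_r, pipe_w = os.pipe()
    total = 0
    try:
        while True:
            # splice(in, pipe): move up to `chunk` bytes into the pipe.
            moved = os.splice(fd_in, pipe_w, chunk)
            if moved == 0:  # end of input
                return total
            # splice(pipe, out): drain the pipe; a single call may move
            # fewer bytes than requested, so loop until it is empty.
            while moved > 0:
                written = os.splice(pipe_r, fd_out, moved)
                moved -= written
                total += written
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
```

Calling splice_copy(0, 1) would give a bare-bones cat from stdin to stdout, as long as stdout is not opened in append mode, which splice rejects.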
Does it work properly with stdin and EOFs sent with ^D?
I didn’t know this was an issue. :-)
With tail -f -, hitting Ctrl+D does not quit the program,
but with cat - it does.
I assume using splice from Ruby would result in a similar increase in speed.
I wrote a fast cat once out of actual necessity.
I was working with an embedded system where, for installation/testing purposes (details forgotten), we had a service running in a VM that would help bootstrap a system. Part of this setup was a script that would run on the target, read a binary blob from a file descriptor and write it to the device. That reading was done with the target system’s implementation of cat.
As it happened, the sending of the binary blob to the target turned out to be a significant bottleneck. After I profiled it, it turned out that cat was taking a long time to read and write the bytes. I looked at the source and found it was using fread and fwrite. I changed it to use read/write and the transfer time went down significantly. It was great because I’m fairly certain this was part of our automated build system to create images for the devs and the result was that build times went down a lot.
So sometimes you really do need a faster cat.
Why were fread/fwrite slow? Aren’t they thin veneers over read/write that do a bit of buffering?
For “buffering”, read “copying”.
With read/write, it’s just copying into a buffer (read), and then copying out of that buffer (write).
With fread/fwrite, it’s copying into stdio’s internal buffer (read), then copying from that buffer to the caller’s buffer (fread), then potentially copying that back into a stdio buffer (fwrite), before finally copying it out again (write). That can really add up. Even the read/write loop does more copying than we’d ideally want, hence stuff like splice.
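For comparison, the plain read/write loop looks roughly like this. It is a Python sketch rather than the C from that embedded cat, with a made-up name (rw_copy) and an arbitrary 128 KiB buffer size. Each chunk crosses into user space once on read and goes back out once on write, with no extra stdio layer in between.

```python
import os


def rw_copy(fd_in: int, fd_out: int, bufsize: int = 128 * 1024) -> int:
    """Copy fd_in to fd_out with raw read/write calls; return bytes copied.

    Hypothetical helper for illustration only.
    """
    total = 0
    while True:
        buf = os.read(fd_in, bufsize)  # one copy: kernel -> our buffer
        if not buf:  # a zero-length read means EOF
            return total
        view = memoryview(buf)
        while view:
            # write() may accept only part of the buffer, so loop.
            written = os.write(fd_out, view)  # one copy: our buffer -> kernel
            view = view[written:]
            total += written
```

The zero-length read is also the signal a terminal delivers for ^D on an empty line, which is why a loop like this exits at that point.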
I’d imagine that the reason GNU cat doesn’t use splice is portability (POSIX). Remember, GNU cat wants to support different OSes. Splice, being a Linux-specific feature, would break this portability.
Not really. GNU cp takes advantage of the Linux-only FICLONE (copy-on-write file cloning) feature when available. GNU cat could do the same with splice.