It didn’t mention the language until near the end, so it took a while to realise that this was about Rust. It sounds as if the crate in question is a very thin veneer over the Berkeley socket API which is… not the most user-friendly API in the world. It also inherits the original UNIX design that allowed read and write system calls to partially fail, reading or writing only a subset of the data that you’ve asked them to even when the underlying stream had data / space.
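Rust's standard library already wraps the retry loop for you (`read_exact`, `write_all`), but the short-read behaviour described above is easy to demonstrate. Below is a minimal sketch of the loop those helpers perform; the function name `read_all_n` is made up here, and it is written against any `Read` impl so an in-memory cursor can stand in for a `TcpStream`:

```rust
use std::io::{Cursor, Read};

// A plain `read` may return fewer bytes than the buffer holds, even when
// more data is on the way, so callers have to loop (or use `read_exact`).
fn read_all_n<R: Read>(src: &mut R, n: usize) -> std::io::Result<Vec<u8>> {
    let mut buf = vec![0u8; n];
    let mut filled = 0;
    while filled < n {
        match src.read(&mut buf[filled..]) {
            // Ok(0) means the peer closed the stream before we got enough.
            Ok(0) => {
                return Err(std::io::Error::new(
                    std::io::ErrorKind::UnexpectedEof,
                    "stream closed before enough bytes arrived",
                ))
            }
            Ok(k) => filled += k,
            // EINTR-style interruptions are retried, matching read_exact.
            Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(buf)
}

fn main() -> std::io::Result<()> {
    let mut src = Cursor::new(b"hello world".to_vec());
    let first = read_all_n(&mut src, 5)?;
    assert_eq!(&first, b"hello");
    println!("got {:?}", String::from_utf8_lossy(&first));
    Ok(())
}
```

The same looping obligation applies on the write side, which is why `write_all` exists alongside `write`.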
I think it’s less about Rust specifically and more about doing low-level socket programming. Which… certainly can be a learning experience. You definitely will gain a new appreciation for just how much a higher-level library is doing for you.
Right, this was just using the standard library TcpStream, and I think, like a lot of these APIs, it’s just a thin layer over that Berkeley API.
There’s that “TCP/IP Illustrated” book that I think I’ll need to just buy and deal with. So many times I want to believe that it’s all abstracted away, but my lack of understanding means I have so many questions about the actual state machine and which failure modes to worry/not worry about.
TCP/IP Illustrated is more about the protocol itself, not the API. I also recommend “Unix Network Programming”, since it sounds like the Rust API is leaking a lot of POSIX details.
If you do a search on DuckDuckGo with !posix, it will search the POSIX API pages. It might be enough to just do !posix socket and read the results. Also, Beej’s pages on networking are generally very good.
Right, the API interfacing with the TCP/IP protocol could have looked very different from sockets even though they’re tightly coupled for historical reasons. What might a clean-slate TCP/IP API look like with no compatibility requirements with sockets (just the protocol)?
I liked the way Plan 9 did networking: http://doc.cat-v.org/plan_9/4th_edition/papers/net/

Funny that AFAICT the only real-world impact of Plan 9 was Go.
Container namespaces were influenced by Plan 9 as well. It had a lot of ideas that percolated out into industry. I wouldn’t say that Go was its only real-world impact.
I remember finding the Java (JDK 1) TCP API very clear and simple to use, back in the 1990s; it was my first exposure to networking. Stream classes are a great abstraction!
I know this API didn’t expose all the subtleties of TCP/IP, though. I haven’t kept up with Java so I don’t know what the API looks like now.
Main Lesson: when trying to figure something out, try to use the simplest tool to remove as many variables as possible. Even that won’t be enough, but it will help. I also have a newfound appreciation for “just use HTTP”.
That’s an interesting plot twist. Instead of using a pure data protocol with no overhead, just use a protocol that has a 114-page RFC for a basic version of it, and it’s too risky to write your own parser for it, so an external library should be used to handle it.
Spec length has little bearing on how easy a protocol is for the programmer to use, as opposed to implement. How many pages are the Ethernet, TCP, IP, and DHCP specs? How easy is it to plug together a small network with minimal configuration from existing parts?
The pure data protocol with no overhead has N pages of POSIX and M pages of system/language-specific docs which isn’t much better.
I think the quote makes sense as long as you’re using a ready-made library. You can ignore 99% of the HTTP spec if you only need RPC-like calls and no streaming.
You can ignore 99% of the HTTP spec if you only need RPC-like calls and no streaming.

The point I’m trying to make is that if you want to ignore 99% of something, then maybe you should spend some time reassessing whether you need it at all.
The options are: write raw TCP stream processing from scratch, or use a ready-made high-level protocol. If you think HTTP is too complex, what alternative would you propose? (HTTP libraries abstract stream processing, connection handling, endpoint authentication, and message encapsulation.)
The only reasonably common alternative I can think of is gRPC.
If you think HTTP is too complex, what alternative would you propose?
It depends on the actual use case.
IMAP, POP3, and NNTP work fine just by exchanging some lines across a socket. So if the user controls both the server and the client, and both will run on the same LAN, then maybe sending simple strings would be one option.
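The line-based style those protocols use is simple to sketch in Rust with `BufReader::read_line`. The one-command “protocol” below is invented for illustration (in the POP3 spirit: the client sends a line, the server answers with a `+OK` line); the point is just “one line in, one line out” over a socket, with client and server run in one process for brevity:

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:0")?; // OS picks a free port
    let addr = listener.local_addr()?;

    // Toy server: read one line, echo it back prefixed with "+OK ".
    let server = thread::spawn(move || -> std::io::Result<()> {
        let (mut stream, _) = listener.accept()?;
        let mut reader = BufReader::new(stream.try_clone()?);
        let mut line = String::new();
        reader.read_line(&mut line)?; // blocks until '\n' arrives
        write!(stream, "+OK {}", line)?;
        Ok(())
    });

    // Client side: send a command line, read the one-line reply.
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"PING\r\n")?;
    let mut reply = String::new();
    BufReader::new(&client).read_line(&mut reply)?;
    assert_eq!(reply, "+OK PING\r\n");
    println!("server said: {}", reply.trim_end());
    server.join().unwrap()?;
    Ok(())
}
```

Note that the buffering lives in `BufReader` on each end, not in TCP itself; the wire still carries an undifferentiated byte stream.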
I’ve written some crawlers recently, and it’s not funny to see how fat some network messages are when they shouldn’t be. Metadata sent but never read, identifiers duplicated hundreds of times in the same response, a few megabytes of stuff sent when the client needs just a fraction of it. It doesn’t really matter if just one person is doing it, but when everyone is doing it, suddenly we have websites that require downloading 5 megabytes of useless data only to display a red box and a string on the screen, because someone thought “let’s just use SOAP”.
I think a big value-add of HTTP is that it has atomicity beyond “receive some bytes”. You can easily layer single actions onto HTTP, then use libraries that handle the transmission questions, and in the end your application code is operating on something akin to simple files.
When working with HTTP I’m not sitting around wondering how to handle partial responses or the like, because it’s handled for me (ignoring streaming requests, etc.). And, of course, HTTP more or less successfully abstracts over a lot of potential network issues.
I agree about the overhead concerns in general, though.
This was a confusing read since you never explicitly stated what language you’re using. (That’s a pet peeve of mine.) I did figure out Rust based on the clue “manual memory management” and the use of namespaces that don’t match C++.
Your experience could have been even worse if you’d used C. But honestly not that much worse. The POSIX I/O + sockets API is pretty terrible, not so much on the surface, but when you get to the details and edge cases and platform idiosyncrasies. I’ve found both “Unix Network Programming” and “Advanced Programming In The Unix Environment” to be essential for working with it.
Does Rust have a higher-level stream-based API?

The worst API in the world is Unix domain sockets ancillary data. I don’t even think it’s possible to use without undefined behavior.
hello darknes^H file descriptor passing my old friend, and now everything and his mother is using it. DuplicateHandle please. Yesterday - no, 10 years ago.
This sounds like a combination of a bad library (and/or bad docs) and mismatched expectations. For example, the line buffering that netcat does has nothing to do with TCP whatsoever, but I can sure sympathize with the confusion these extra layers of convenience/complexity can cause when trying to debug something that’s supposed to be quite simple.
And saying HTTP is much simpler doesn’t make a lot of sense, given that the exact thing you’re trying to do (read blocks at a time) isn’t even possible with HTTP. (Well, maybe with long polling and chunked transfers, but that’s getting into pretty niche territory, and many libraries abstract over HTTP at such a high level that you can’t easily do it anyway.)
Also, as I understand it, you have little experience with manual memory allocation, which means you’re trying to learn two (or three) things at the same time: TCP sockets, that particular library over TCP sockets, and the manual memory management required to make something you don’t yet understand well work. That’s like trying to do something completely new in a language you’ve never used - a recipe for frustration. Maybe do a small prototype first in a managed language you’re very familiar with, and only then switch to doing it in Rust?
Sounds exactly like what you should run into and learn from. Makes you understand what TCP does and what application protocols do and why. Someone like me can’t learn this by reading about it and has to run exactly this gauntlet.
Slight tangent: as someone who has done some gamedev, and specifically multiplayer games using UDP, I’m unsure what the use case of a new project using raw TCP is these days. I have done custom UDP stuff, and used http, but I can’t think of why I would use raw TCP without http now.
Because you want streams, reliable delivery, and flow control, but you don’t need HTTP’s headers, request/response cycle, and so on?
The main thing that comes to mind is that HTTP isn’t a great wire format from the perspective of fast parsing and serialisation. You can get higher throughput with a binary protocol where e.g. fields are at fixed offsets and there isn’t much parsing to do.
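As a contrast to textual HTTP parsing, a fixed-layout binary frame can be decoded with nothing but offset arithmetic. The layout below is invented for illustration (2-byte message type, 4-byte big-endian payload length, then the payload); real protocols in this style include things like game netcode and market-data feeds:

```rust
// Encode a frame: [type: u16 BE][len: u32 BE][payload bytes].
fn encode(msg_type: u16, payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(6 + payload.len());
    frame.extend_from_slice(&msg_type.to_be_bytes());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    frame
}

// Decode by reading fixed offsets; returns None on a truncated frame.
fn decode(frame: &[u8]) -> Option<(u16, &[u8])> {
    if frame.len() < 6 {
        return None;
    }
    let msg_type = u16::from_be_bytes([frame[0], frame[1]]);
    let len = u32::from_be_bytes([frame[2], frame[3], frame[4], frame[5]]) as usize;
    frame.get(6..6 + len).map(|payload| (msg_type, payload))
}

fn main() {
    let frame = encode(7, b"tick");
    let (msg_type, payload) = decode(&frame).expect("well-formed frame");
    assert_eq!(msg_type, 7);
    assert_eq!(payload, b"tick");
    println!("type={} payload={:?}", msg_type, payload);
}
```

The length prefix also solves the framing problem from earlier in the thread: the reader knows exactly how many bytes to wait for before acting.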
Most of those concerns are handled by HTTP 2, no?

Not really, I think. HTTP 2 burns a lot of CPU cycles on both sides in order to make better use of a long, slow network. There’s a whole complicated header-compression system, and… just no.
In the spirit of this, C might’ve made things a bit more straightforward for this venture.
Also, if you haven’t already, Beej’s guide is an invaluable intro to socket stuff.