1. 47
  1. 16

    An additional benefit of using UNIX sockets in a case like that is easy version updates. Add a suffix to the socket name (version, PID, random, whatever), point a symlink at it, and reload nginx, which is configured with the symlink name. This way you can start a new version, swap the symlink, wait for the old version to finish handling all its connections, then kill the old version.

    If you need to do in-place updates, it’s seamless and works great. You can use systemd unit templates to manage the “unit@version” service in that case.

    Also given /tmp has auto-cleaners enabled on many systems these days, I wouldn’t put sockets there. You may find that after 48h they disappear :-( /var/run could be a better option.
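
    The symlink swap described above could look roughly like this in Rust. It is only a sketch using the standard library: the function name `swap_symlink` and the `.tmp` suffix are made up for illustration. It relies on rename(2) replacing the destination atomically, so there is no moment at which the symlink name is missing.

    ```rust
    use std::fs;
    use std::io;
    use std::os::unix::fs::symlink;
    use std::path::{Path, PathBuf};

    /// Repoint `link` at `target` without a window where `link` does not exist.
    /// (Hypothetical helper, not taken from the article.)
    pub fn swap_symlink(target: &Path, link: &Path) -> io::Result<()> {
        // Build the new symlink under a temporary name next to the final one.
        let mut tmp_name = link.as_os_str().to_owned();
        tmp_name.push(".tmp");
        let tmp = PathBuf::from(tmp_name);

        // Clear a stale temporary link left behind by an earlier, interrupted swap.
        match fs::remove_file(&tmp) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }

        symlink(target, &tmp)?;
        // Renaming over the old symlink replaces it in a single step.
        fs::rename(&tmp, link)
    }
    ```

    You would call it with something like `swap_symlink(Path::new("/run/foo/app-v2.sock"), Path::new("/run/foo/app.sock"))` once the new version is listening; both paths are placeholders.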

    1. 3

      I’ll look more into /var/run, however in my limited testing I have found that on my Linux distro /var/run is a symlink to /run and only root can create files in there. I’ll do some more hacking though.

      EDIT: I managed to move the socket to /srv/within/run/printerfacts.sock and everything still works great. Thanks for the suggestion to not use /tmp! I’ll append this information to the footer of the article.

      1. 8

        /srv, /opt and others will work great :) You can also create a directory in /run with the right permissions before the service starts:

        ExecStartPre=/bin/mkdir -p /run/foo
        ExecStartPre=/bin/chmod g+wx /run/foo
        

        Or using systemd parameters:

        Group=...
        RuntimeDirectoryPreserve=yes
        RuntimeDirectory=foo
        RuntimeDirectoryMode=770
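
        On the application side, as far as I know systemd exports the resulting path(s) in the `RUNTIME_DIRECTORY` environment variable (colon-separated if there are several). A small sketch of picking the socket path from it; the function names and the `foo.sock` file name are only illustrative:

        ```rust
        use std::env;
        use std::path::{Path, PathBuf};

        /// Build a socket path inside the first directory listed in `runtime_dir`.
        /// Returns None when the value is absent or empty.
        pub fn socket_path(runtime_dir: Option<&str>, name: &str) -> Option<PathBuf> {
            // Several runtime directories are separated by ':'; use the first one.
            let first = runtime_dir?.split(':').next()?;
            if first.is_empty() {
                return None;
            }
            Some(Path::new(first).join(name))
        }

        /// Same as above, reading the value from the process environment.
        pub fn socket_path_from_env(name: &str) -> Option<PathBuf> {
            socket_path(env::var("RUNTIME_DIRECTORY").ok().as_deref(), name)
        }
        ```

        With `RuntimeDirectory=foo`, `socket_path_from_env("foo.sock")` should then give a path under /run/foo.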
        
    2. 7

      Even though it looks like a file to us humans, it’s still a socket under the hood.

      Aside, but:

      I think it’s a shame the term ‘file’ has lost its meaning so. Traditionally, it only meant a stream of bytes; not a specific, persistent record.* When a socket, record, pipe, and shared memory buffer are all conceived of as distinct objects, some of the magic of unix is lost.

      Unix has its problems, and incorrect conflation is one of them. ioctl is a testament to that. But there is a tonne of value in the universal, generic structure, and I worry that the computing platform of tomorrow will be built without cognisance of that fact. Gary Bernhardt suggests that the web will be that platform; where is the web equivalent to a file?

      * Arguably it’s even more general: a capability. I contest that, however.

      1. 4

        Maybe the real universal pervasive generic data structure was the hierarchical namespace we made along the way

        1. 4

          I guess I’d reckon on the HTTP message being the equivalent to a file. It’s a common denominator for just about all web communication, its basics are easily understood but it has a lot of weird options to account for shoehorning non-document use cases into the protocol, and from the perspective of software it’s yet another way of shuffling bytes around.

          1. 1

            “where is the web equivalent to a file?”

            The Streams API?

            1. 5

              The question is not if the web has a way to express byte streams; the question is if the web has a universal, pervasive data structure. Considering that the page you link says streams are experimental, I don’t think they can possibly fill that role.

              1. 1

                I’d agree with dulaku then; the universal, pervasive data structure is the HTTP request/response pair.

          2. 7

            I use UNIX sockets whenever I can. Using TCP/IP for connections to local programs seems so excessive and involves a lot more machinery and overhead. I’ve even seen people use SSL by mistake for local connections when using templating config managers.

            Most common server programs support UNIX sockets: SQL databases, memcache, Python’s WSGI servers, nginx, Apache, etc. Maybe the most notable exception is rabbitmq? I just think people don’t know they exist, or find them mysterious.

            1. 2

              You can, of course, use socat as a TCP proxy to a Unix domain socket. You’ll lose the performance benefits of UDS but can interact with TCP-only services without needing to pull a network stack into your application.
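
              For the curious, the core of such a proxy is small. This is a rough std-only sketch of the idea, not a socat replacement: it handles a single connection, and the names `relay` and `serve_one` are invented for the example.

              ```rust
              use std::io;
              use std::net::{Shutdown, TcpListener, TcpStream};
              use std::os::unix::net::UnixStream;
              use std::path::Path;
              use std::thread;

              /// Copy bytes in both directions until both sides are done.
              fn relay(tcp: TcpStream, uds: UnixStream) -> io::Result<()> {
                  let mut tcp_read = tcp.try_clone()?;
                  let mut uds_write = uds.try_clone()?;
                  // TCP client -> Unix socket runs on its own thread.
                  let upstream = thread::spawn(move || {
                      let _ = io::copy(&mut tcp_read, &mut uds_write);
                      // Pass the client's end-of-stream on to the backend.
                      let _ = uds_write.shutdown(Shutdown::Write);
                  });
                  // Unix socket -> TCP client runs on the calling thread.
                  let (mut uds_read, mut tcp_write) = (uds, tcp);
                  let _ = io::copy(&mut uds_read, &mut tcp_write);
                  let _ = tcp_write.shutdown(Shutdown::Write);
                  let _ = upstream.join();
                  Ok(())
              }

              /// Accept one TCP connection and forward it to the socket at `sock`.
              pub fn serve_one(listener: &TcpListener, sock: &Path) -> io::Result<()> {
                  let (tcp, _peer) = listener.accept()?;
                  let uds = UnixStream::connect(sock)?;
                  relay(tcp, uds)
              }
              ```

              A real proxy would loop over `accept()` and spawn a relay per connection.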

              1. 2

                “Using TCP/IP for connections to local programs seems so excessive and is involving a lot more machinery and overhead.”

                Both TCP/IP and UNIX sockets are abstracted away by the kernel in my mental model. Where can I read more about their overheads?

                1. 2

                  The main thing that comes to mind is this (postgres) comparison:

                  https://momjian.us/main/blogs/pgblog/2012.html#June_6_2012

                  I have done a number of private benchmarks over the years which find about the same. It is sometimes very obvious: on Debian at least, the default postgres connection is a unix socket, and when you start using TCP/IP instead (e.g. when using docker for integration tests) some applications can slow down a bit. This is particularly noticeable on test suites that output timing numbers.

                  1. 1

                    Thanks for the pointer! That’s a good read, but I want to understand the overheads from a theoretical perspective, like, which steps are handled under the hood by the kernel when I use a UNIX/TCP socket?

                    1. 2

                      My own (perhaps naive) mental model is that the TCP/IP socket is approximately all the work described in my undergrad TCP/IP textbook:

                      • copy into a TCP segment
                      • copy into an IP packet (though I know these two steps are clubbed together in practice)
                      • figure out where to send the IP packet - easy as it’s localhost
                      • pulling apart the IP packet
                      • pulling apart the copied TCP segment (again, clubbed together normally)

                      as opposed to a unix socket which is basically two files, one in each “direction”. And on unix a file is “just” a seekable stream of bytes.

                      I suppose if I wanted to know exactly which steps are in userland vs which are in the kernel I would review the kernel syscalls that my fave language’s implementation uses.
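
                      If you would rather measure than theorise, something like the following could be a starting point. It is a quick sketch, not a rigorous benchmark: the function names are mine, it only bounces single bytes back and forth, and it makes no claim about what numbers you will see on your machine.

                      ```rust
                      use std::io::{self, Read, Write};
                      use std::net::{TcpListener, TcpStream};
                      use std::os::unix::net::UnixStream;
                      use std::thread;
                      use std::time::{Duration, Instant};

                      /// Send `rounds` one-byte messages to an echo thread and time the round trips.
                      fn ping_pong<S>(mut client: S, mut server: S, rounds: u32) -> io::Result<Duration>
                      where
                          S: Read + Write + Send + 'static,
                      {
                          // The echo side writes back every byte it reads until the client goes away.
                          let echo = thread::spawn(move || {
                              let mut b = [0u8; 1];
                              while server.read_exact(&mut b).is_ok() {
                                  if server.write_all(&b).is_err() {
                                      break;
                                  }
                              }
                          });
                          let start = Instant::now();
                          let mut b = [0u8; 1];
                          for _ in 0..rounds {
                              client.write_all(&[42])?;
                              client.read_exact(&mut b)?;
                          }
                          let elapsed = start.elapsed();
                          drop(client); // closing our end lets the echo thread see EOF and exit
                          let _ = echo.join();
                          Ok(elapsed)
                      }

                      /// Round trips over a connected pair of Unix domain sockets.
                      pub fn time_unix(rounds: u32) -> io::Result<Duration> {
                          let (a, b) = UnixStream::pair()?;
                          ping_pong(a, b, rounds)
                      }

                      /// Round trips over a TCP connection on the loopback interface.
                      pub fn time_tcp_loopback(rounds: u32) -> io::Result<Duration> {
                          let listener = TcpListener::bind("127.0.0.1:0")?;
                          let client = TcpStream::connect(listener.local_addr()?)?;
                          let (server, _peer) = listener.accept()?;
                          client.set_nodelay(true)?; // avoid Nagle delays on tiny writes
                          ping_pong(client, server, rounds)
                      }
                      ```

                      From userland both variants are just read and write calls on a file descriptor; whatever difference shows up comes from what the kernel does in between.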

                      My ideas for intro reading (ie the books I liked):

                      • For networks, a) Tanenbaum’s Computer Networks or b) TCP/IP Illustrated vol 1
                      • For files, the relevant bits of a) Tanenbaum’s Operating Systems or b) the Operating Systems dinosaur book

                      Two books I want to read are Robert Love’s Linux Kernel Development and his Linux System Programming. I think they would clear some mist out of my head in this area.

              2. 4

                You might want to read the papers on OKWS. This was the OkCupid web server architecture. It was designed by a few people from MIT and makes clever use of Unix domain sockets.

                1. 4

                  MIT 6.858 Computer Systems Security covers OKWS as a case study. Well worth a watch IMO (or at least a skim of the lecture notes).

                  1. 2

                    Is this the one that you had in mind? Could you please let me know? Thank you kindly!

                    1. 2

                      Not GP, but it seems correct. You can check the GitHub repo too; they have linked the paper there.

                      1. 2

                        Yes, that is one of the papers.

                    2. 3

                      This also does away with the overhead incurred from loopback TCP. Just be careful to understand how timeouts affect the upstream load balancer (in the article’s case nginx), but otherwise this is a technique I’ve used both at $WORK and for personally run services.

                      1. 2

                        Isn’t this:

                        if let Ok(sockpath) = std::env::var("SOCKPATH") {
                            // SOCKPATH is set: serve on a Unix domain socket.
                            use tokio::net::UnixListener;
                            use tokio_stream::wrappers::UnixListenerStream;
                            let listener = UnixListener::bind(sockpath).unwrap();
                            let incoming = UnixListenerStream::new(listener);
                            server.run_incoming(incoming).await;
                        } else {
                            // SOCKPATH is unset: fall back to TCP on all interfaces.
                            server.run(([0, 0, 0, 0], port)).await;
                        }
                        

                        running against the whole premise of the approach? Specifically, the second paragraph says:

                        “Mostly to prevent you from messing up and accidentally exposing your backend port to the internet.”

                        Yet, with the code above, by forgetting the SOCKPATH we’re going to expose the port on the internet. I really like the approach of preventing/limiting the risks of configuration and deployment errors. Maybe running the server on 127.0.0.1 by default would be a better approach?
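
                        To make the suggestion concrete, here is a sketch of the decision on its own, separate from the server setup. The `Listen` enum and `choose_listen` are hypothetical names, not part of the article’s code; the only point is that the fallback is the loopback address rather than 0.0.0.0.

                        ```rust
                        use std::net::{Ipv4Addr, SocketAddr};
                        use std::path::PathBuf;

                        /// Where the server should listen.
                        #[derive(Debug, PartialEq)]
                        pub enum Listen {
                            Unix(PathBuf),
                            Tcp(SocketAddr),
                        }

                        /// Prefer the Unix socket; without one, stay on loopback only.
                        pub fn choose_listen(sockpath: Option<String>, port: u16) -> Listen {
                            match sockpath {
                                Some(path) => Listen::Unix(PathBuf::from(path)),
                                // 127.0.0.1 keeps a forgotten SOCKPATH from exposing the port.
                                None => Listen::Tcp(SocketAddr::from((Ipv4Addr::LOCALHOST, port))),
                            }
                        }
                        ```

                        The caller would pass `std::env::var("SOCKPATH").ok()` and then bind according to the variant it gets back.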

                        1. 1

                          Can someone recommend good books that can give a deep dive on this?