Threads for danderson

  1. 2

    If I’m understanding correctly, this requires you to be using Tailscale already, right? I.e. the client-proxy connection is secured only when they are both peers in a Tailscale network.

    1. 4

      Yes, at least on the client-end. The server can be on the open internet and not part of a Tailscale network:

      As the name suggests, this proxy sits between your Postgres client and your cloud-hosted database. It only accepts connections from clients over Tailscale, by using our tsnet library.

      This is, of course, pure marketing. There’s no vulnerability in making pgproxy also listen (TLS or not) on localhost and on a Unix domain socket, so that you can deploy it on the same machine your app server runs on and use it to enforce proper TLS to the remote Postgres instance. That would also reduce latency compared to connecting from your app server to pgproxy over Tailscale and then from there to your Postgres instance, but then they wouldn’t make any money off of it.
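
      For what it’s worth, that deployment mode isn’t much code. Here’s a minimal sketch in Go, with placeholder names for the remote address and certificate hostname; this is not pgproxy’s actual code, just the shape of the idea (note that Postgres negotiates TLS in-band via an SSLRequest message, rather than from the first byte like HTTPS):

      ```go
      package main

      import (
          "crypto/tls"
          "encoding/binary"
          "io"
          "log"
          "net"
      )

      const (
          remoteAddr = "db.example.com:5432" // placeholder: your hosted Postgres
          serverName = "db.example.com"      // name the server cert must match
      )

      // SSLRequest message: length 8, magic code 80877103 (0x04d2162f).
      var sslRequest = [8]byte{0, 0, 0, 8, 0x04, 0xd2, 0x16, 0x2f}

      func main() {
          ln, err := net.Listen("tcp", "127.0.0.1:6432") // loopback only
          if err != nil {
              log.Fatal(err)
          }
          for {
              c, err := ln.Accept()
              if err != nil {
                  log.Fatal(err)
              }
              go proxy(c)
          }
      }

      func proxy(client net.Conn) {
          defer client.Close()

          // If the local client asks for TLS, decline: the loopback leg
          // stays plaintext, the internet leg gets strictly verified TLS.
          first := make([]byte, 8)
          if _, err := io.ReadFull(client, first); err != nil {
              return
          }
          if binary.BigEndian.Uint32(first[4:]) == 80877103 {
              client.Write([]byte{'N'})
              first = nil // consumed; the next read is the startup message
          }

          raw, err := net.Dial("tcp", remoteAddr)
          if err != nil {
              log.Print(err)
              return
          }
          defer raw.Close()
          if _, err := raw.Write(sslRequest[:]); err != nil {
              return
          }
          var resp [1]byte
          if _, err := io.ReadFull(raw, resp[:]); err != nil || resp[0] != 'S' {
              log.Print("server refused TLS")
              return
          }

          // Full verification (hostname + trusted roots): the part most
          // Postgres clients skip by default.
          server := tls.Client(raw, &tls.Config{ServerName: serverName})
          if err := server.Handshake(); err != nil {
              log.Print(err)
              return
          }
          if first != nil {
              server.Write(first) // forward the startup message we already read
          }
          go io.Copy(server, client)
          io.Copy(client, server)
      }
      ```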

      1. 10

        The more charitable, and true outlook: pgproxy is something we wrote for our own internal use, and then subsequently open-sourced. It uses Tailscale because that’s what fit best into our production infrastructure.

        I’ll also note that simply making this listen on localhost doesn’t solve the problem in a satisfying way, because in doing that you can no longer have an allowlist on the server side that is restricted to a single entity that you know is doing TLS correctly. Instead, you’re relying on every client (including desktop and CLI clients run by humans) to remember to connect to the localhost proxy, and not to the server directly. Unfortunately the latter is the most obvious thing that people would do if they were trying to connect to the database.

        To fix that, you need to move the proxy away from the clients somehow, so that you can wield the single tool most hosted DBs offer (IP allowlists) in a way that prevents this security regression. And once you’ve moved the proxy off the client’s machine, you’re right back to the problem of securing the client<>proxy datapath, with the constraint that you cannot rely on the postgres client to get it right. The simplest way to achieve that is a VPN. And yes, we built this tool to work over Tailscale, since that’s naturally the VPN we’re using.

        With all that said, the code’s open source, so you’re more than welcome to take the couple hundred lines that implement the proxying logic and stick whatever TCP listener and method of securing the client connection you like on it. Should be less than an hour of work once you’ve figured out the transport security bit.
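
        For reference, the listener swap being described is roughly the following sketch. The tsnet usage follows Tailscale’s published tsnet examples; whether pgproxy structures it exactly this way is an assumption, and the plain-TCP branch (and everything about securing it) becomes your problem:

        ```go
        import (
            "net"

            "tailscale.com/tsnet"
        )

        // listener returns where the proxy accepts client connections. The
        // tsnet branch is reachable only over a tailnet; the plain branch
        // needs its own transport security story.
        func listener(useTailscale bool) (net.Listener, error) {
            if useTailscale {
                srv := &tsnet.Server{Hostname: "pgproxy"}
                return srv.Listen("tcp", ":5432")
            }
            return net.Listen("tcp", "127.0.0.1:5432")
        }
        ```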

        1. 2

          To address the problem from another angle: the assumption is you’re using a hosted database, so you can’t just put the DB on Tailscale too, right? Seems like in cases where you control the DB server that would be the simplest choice.

          An issue I ran into recently was that I wanted to run Postgres on Fly.io and connect to it from AWS Lambda, but I couldn’t figure out how to get Lambda to connect to the Fly.io WireGuard network, so I had to give up and use RDS instead. It seems like it should have been a trivial problem if there was just “Let’s Encrypt Postgres” for servers and clients.

          1. 1

            It uses Tailscale because that’s what fit best into our production infrastructure.

            That’s fair, but the marketing spin on this being the only secure way to use postgres isn’t.

            [..] because in doing that you can no longer have an allowlist on the server side that is restricted to a single entity that you know is doing TLS correctly.

            There’s no risk to a server that only accepts TLS connections from clients that don’t validate the TLS certificates (this would be akin to claiming the existence of curl -k makes my HTTPS server insecure). Configuring your server so it only accepts connections from a single address associated with the pgproxy instance that only accepts connections over Tailscale is, at best, an attempt at encouraging best practices for the safety of the clients; it doesn’t affect the server.

            The simplest way to achieve that, is a VPN.

            I would argue that this means people should really just look into putting their postgres nodes on a VPN and only listening for connections on the VPN subnet (whether that’s Tailscale or otherwise), leaving pgproxy aside altogether. (Which would also be to your benefit, anyway.)

            1. 9

              the marketing spin on this being the only secure way to use postgres isn’t.

              That’s certainly not what I intended when I wrote the article. I intended it as “here’s a problem you can encounter in some system architectures, and here’s a thing we made to help with that.” As you say, this is definitely not the only way of doing it, and it has a bunch of tradeoffs that may or may not match what someone else needs. In my defense, the time from blank page to publishing this article was maybe 1h, so I may have gotten the tone and message wrong in places.

              There’s no risk to a server that only accepts TLS connections from clients

              This is technically true, but also misses the more important point: what you’re trying to keep safe is the data stored within the database server, not the clients or server per se. Allowing clients to connect insecurely compromises that security, even if both endpoints are completely happy doing so.

              Yes, you can achieve a secure transport in a number of ways, but the downside of most strategies is that it relies on all clients being correctly configured all the time, and if you forget one or someone spins something up in a default configuration, you lose. That’s a very brittle setup that’s ripe for regressions, in my experience.

              I would argue that this means people should really just look into putting their postgres nodes on a VPN

              The article opens with exactly this, yes :). There’s a realpolitik thing going on here: yes, in the abstract ideologically pure system architecture, the database would be hosted somewhere that can only be accessed securely in the first place. However, as the world moves towards more hosted as-a-service type things, that line is becoming increasingly hard to hold, and in a lot of realistic deployments the question becomes “okay, how do we secure this architecture”, rather than preventing it from happening.

              That’s what happened to us, incidentally: we had a database running directly on Tailscale for some business analytics stuff, but then the need for fancier analytical capability, combined with lack of time to do a good job of self-hosting, took us to a TimescaleDB hosted database - and so we ended up having to secure a DB connection with an internet leg.

      1. 2

        This is neat, but why is there no certbot for Postgres? I just want it to verify my server’s DNS name and give me a signed cert based on that, which my client can verify with the OS’s CA store.

        1. 8

          acmetool, to name one example, maintains an up-to-date TLS cert in /var/lib/acme/ and will (temporarily) start up a minimal HTTP server to answer the ACME challenge protocol if it needs to. You can then configure any other services you want to use that TLS cert, like IMAP or SMTP or Postgres even if the server isn’t a web server.
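
          For instance (hostname hypothetical), pointing Postgres at acmetool’s live directory is a few lines of postgresql.conf. Postgres is picky about key-file ownership and permissions, so in practice a renewal hook usually copies the key somewhere the postgres user owns and then reloads the server:

          ```
          # postgresql.conf, assuming acmetool manages db.example.com
          ssl = on
          ssl_cert_file = '/var/lib/acme/live/db.example.com/fullchain'
          ssl_key_file  = '/var/lib/acme/live/db.example.com/privkey'
          ```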

          1. 5

            That would be nice, but note that it doesn’t fix the core problem that the bit.io folks explained: almost all postgres clients default to unsafe TLS settings. So even if you present a completely valid cert, those clients don’t care, and will proceed as long as any TLS handshake took place, regardless of what the cert said. IOW, I can still trivially MitM the connection and serve literally any TLS cert, and the client will be none the wiser.

            Of course you can reconfigure the client for full validation, but that leaves the brittleness problem: you have to never screw up, with any of your clients, because if you forget or regress a client, it’ll silently fail open. The only fix for that is in pushing the postgres clients to change their default, though making it trivial for the servers to serve valid zero-config TLS is indeed a likely prerequisite.
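
            To make “reconfigure the client for full validation” concrete, here’s what it looks like with Go’s lib/pq driver (DSN values are placeholders); everything weaker than verify-full skips the hostname check, the chain check, or both:

            ```go
            import (
                "database/sql"

                _ "github.com/lib/pq" // Postgres driver
            )

            // verify-full checks the certificate chain and that the cert
            // matches the host we asked for. "require", by contrast,
            // encrypts but validates nothing, so a MitM with any cert
            // gets through.
            func openStrict() (*sql.DB, error) {
                return sql.Open("postgres",
                    "host=db.example.com user=app dbname=app sslmode=verify-full")
            }
            ```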

            1. 2

              Right, the clients need to use the OS certificate authority store to verify that the certificate is for the domain you think it’s for. Seems like a solvable problem, but there’s a chicken-and-egg component, because clients don’t ship with the ability to do normal domain validation and certbot doesn’t support Postgres out of the box.

              1. 2

                Yup, in the longer term, making it trivial to use TLS with postgres, and changing all the clients to strict validation, would be the way to go.

            2. 1

              How about having the server prove knowledge of a secret that the client knows on connection? The secret could be in the PG URL (could even use the user + password because these are large random strings in most cloud deployments).

              If no new info is added to the URL then no action is required beyond adding this functionality to the code of clients and servers and then eventually enabling it by default.
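
              A sketch of how such a proof could avoid being MitM-able: bind it to the TLS channel by MACing over the certificate the client actually saw. This is entirely hypothetical; no such Postgres wire-protocol extension exists:

              ```go
              import (
                  "crypto/hmac"
                  "crypto/sha256"
              )

              // Server side: prove knowledge of the shared secret (e.g. derived
              // from the user+password in the URL), bound to this session via a
              // hash of the server's own certificate.
              func proveServer(secret, clientNonce, ownCertHash []byte) []byte {
                  mac := hmac.New(sha256.New, secret)
                  mac.Write(clientNonce)
                  mac.Write(ownCertHash)
                  return mac.Sum(nil)
              }

              // Client side: recompute over the certificate it actually received.
              // A MitM presenting its own cert can't forge this without the secret.
              func verifyServer(secret, clientNonce, seenCertHash, proof []byte) bool {
                  mac := hmac.New(sha256.New, secret)
                  mac.Write(clientNonce)
                  mac.Write(seenCertHash)
                  return hmac.Equal(proof, mac.Sum(nil))
              }
              ```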

              1. 1

                The usual way to do this is to have a private Certificate Authority, and in “verify-full” mode the client expects the server to present a certificate signed by that authority. That works, but it’s an unnecessary hassle. Just have the server prove to Let’s Encrypt that it controls the domain and LE can sign a cert, like they do for millions of websites.
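
                In libpq-style clients, that private-CA setup is one extra knob (lib/pq shown; names and paths are placeholders):

                ```go
                // verify-full plus a pinned private CA: the client trusts only
                // certs signed by the CA in sslrootcert.
                const dsn = "host=db.internal user=app dbname=app" +
                    " sslmode=verify-full sslrootcert=/etc/ssl/private-ca.pem"
                ```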

              2. 1

                I don’t know of any specialized tool supporting Postgres, but configuring TLS using the certbot-created certificates and having certbot reconfigure Postgres shouldn’t be that much work, even if you have to do it by hand?

                1. 1

                  Sure. The problem is that the clients don’t appropriately verify that properly configured certificate before sending credentials. So an attacker who gets in the middle of a client/server connection can just present any old invalid certificate they care to, get the client’s credentials, and relay them to the server.

                  Getting postgres clients to appropriately validate the TLS handshake “shouldn’t be that much work” either, I suppose. But it’s harder than it should be, unfortunately.

                  1. 1

                    Yeah, it’s not so hard, you just copy the certs in a post-renewal hook, but it should work out of the box like it does for Apache and Nginx.

                1. 6

                  Tailscale’s free tier is not very feature-rich. You can’t do user ACLs with it, for example. And yes, you’re locked into a Google/Microsoft identity. If you don’t like that, pay for more. If you can’t, go use another service or roll your own.

                  Articles like this are why I’ll probably never make a service with a free tier. Yikes.

                  1. 21

                    Articles like this are why I’ll probably never make a service with a free tier. Yikes.

                    Because someone might write a polite and even-handed criticism? I have certainly read and indeed written substantially less charitable things about companies that are not giving their stuff away for free.

                    1. 12

                      Tailscale employee here. +1, my reading of iliana’s post is absolutely not “wow, those ingrate free users”. It’s a very well constructed critique of one of our product choices, and my reaction is a web of different things (mostly “I agree, and have complex thoughts on why we haven’t done the thing yet”), but I’m very grateful for this feedback.

                      1. 10

                        Another Tailscale employee here. +1, iliana means very well by this, and xer feedback is already in the awareness of the product side of things at Tailscale. I was thinking out ideas on how to work around this with xer and presented them alongside that article. Overall I’m fairly sure that this will end up with positive change.

                        1. 6

                          Talking about ICE and union busting in this context doesn’t feel even-handed to me. 🤷🏽‍♀️

                          I have certainly read and indeed written substantially less charitable things about companies that are not giving their stuff away for free.

                          I’m confused, isn’t that exactly the point? If Tailscale only offered Google/MS login on their paid corporate plan this article would make much more sense?

                          1. 8

                            Talking about ICE and union busting in this context doesn’t feel even-handed to me

                            A footnote explaining why the OP doesn’t want to work with Microsoft and Google seems entirely relevant, no? Google and Microsoft do work with immigration authorities, and they do engage in union-busting; these are facts, not imputation or speculation.

                            1. 1

                              But the OP isn’t forced to work with them. They can use the free service Google/MS provide or pay Tailscale to bring a different identity provider.

                              That makes Google/MS’s business with immigration authorities largely irrelevant to an article about Tailscale.

                              1. 9

                                I’m really not sure what you mean. The complaint here is that the free tier of Tailscale requires people to use Google or Microsoft’s services. Explaining why Google and Microsoft are parties the OP doesn’t want to work with is at least somewhat relevant.

                      1. 2

                        it employed a strictly synchronous task model

                        Not entirely certain how to interpret that – does that mean it’s cooperatively (as opposed to preemptively) scheduled?

                        Edit: no, I see this page explicitly mentions preemptive multitasking. (So, still unsure what “strictly synchronous task model” means.)

                        1. 8

                          This seems to be a reference to Hubris’s IPC mechanism and the general execution model for tasks, which is discussed in more detail in the linked docs.

                          Tasks have a single thread of execution, and cannot do anything asynchronous: if they send an RPC to another task, they’re suspended by the kernel until that other task responds (or the kernel synthesizes a response if that task crashes). They only receive asynchronous notifications from other tasks or hardware interrupts when they explicitly perform a receive (which suspends the task until something noteworthy happens).

                          You can still do preemptive execution in this model - arguably more easily, because there are very few surprises for the kernel to deal with: a task is either runnable, or it took one of a small number of actions that are explicitly documented to suspend it, until one of an equally small number of actions resumes it.

                          This makes for a very nice programming model: a task is single-threaded, runs an explicit event loop if it exposes an API to other tasks, and everything it does is synchronous and executes exactly like it says in the code. Even interaction with the rest of the OS looks like normal function calls that just execute in a roundabout way.
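
                          Hubris itself is written in Rust, but the shape of the model is easy to sketch; here’s a toy Go version (nothing like Hubris’s real API) where send suspends the caller until the server task replies:

                          ```go
                          // A "task" is a single-threaded loop; the only cross-task
                          // interaction is a blocking RPC that parks the caller until
                          // the reply arrives.
                          type call struct {
                              msg   string
                              reply chan string
                          }

                          // send suspends the calling task until target responds, like
                          // the kernel-mediated RPC described above.
                          func send(target chan<- call, msg string) string {
                              c := call{msg: msg, reply: make(chan string)}
                              target <- c
                              return <-c.reply
                          }

                          // serverTask is an explicit event loop: receive one message,
                          // handle it, respond, repeat. Nothing happens "to" it
                          // asynchronously.
                          func serverTask(inbox <-chan call) {
                              for c := range inbox {
                                  c.reply <- "handled: " + c.msg
                              }
                          }
                          ```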

                          1. 8

                            Just to add to that, this is not just a very useful model, but it can also be tuned for surprisingly good performance. Just a few days ago, in another comment here, I mentioned QNX as one of the microkernels that figured out very early that building a fast message-passing system involves hooking the message-passing part into the scheduling part. One of the tricks it employed was that (roughly – the terminology isn’t exact and there were exceptions) if a process sent a message to another process and expected an answer, then that other process would be immediately scheduled to run. In a strictly synchronous task model, even a simple scheduler can get surprisingly good performance by leveraging the fact that tasks inherently “know” what they need and when.

                            It’s also worth pointing out that this makes the whole system a lot easier to debug. I haven’t used Hubris so I don’t know how far they’ve taken it, but one of my pet peeves with most async message-based systems is that literally 90% of the debugging is “why is this task in this state”, as in “who sent that stupid message and why?” If the execution model is strictly synchronous, that’s very easy to figure out: you just look at the tasks that are in a suspended state and see which one’s trying to talk to yours, and if you look at their stack, you also figure out why.

                            It’s probably also worth pointing out that all these things – synchronous execution, tasks defined at compile-time, and (not used by Hubris, but alluded to in another comment) cooperative multitasking – are very common in embedded systems. I’ve worked on systems that were similar in this regard (strictly synchronous, all tasks defined at compile-time) twice so far. It doesn’t map so well to general-purpose execution, but this isn’t a general-purpose system :-D.

                          2. 2

                            I’m not sure why they wrote “preemptive multitasking” there. I’ve read the documentation and briefly looked at the code — tasks are only switched on syscalls and interrupts.

                            1. 8

                              Isn’t an interrupt-triggered task switch (e.g. on a timer interrupt, say) kind of the definition of preemptive multitasking? If the interrupted task gets stopped and another task starts running on the CPU, the first task has been preempted, no?

                              1. 1

                                By “interrupts” I meant hardware interrupts that the tasks subscribe to. I guess it still is preemptive but I don’t think they use a timer specifically for preemption.

                              2. 4

                                “Preemptive” is a term that hasn’t been used in a long time because the thing it replaced, “cooperative multitasking”, is no longer in use. In cooperative multitasking, each program needed to explicitly call some OS-provided function to yield time to other programs. For example, on pre-OS X Macs it was WaitNextEvent().

                                1. 10

                                  Strangely, cooperative multitasking is in use again - just at the next level up in the stack. We’ve renamed it to things like “green threads” or “async” and so on, and it’s multitasking at the “task inside a process” level instead of the “process inside the OS” level.

                                  1. 2

                                    I was just thinking that. As I understand it, JS sagas implemented using generator function*s and yield are basically doing cooperative multitasking within a JS single-threaded execution context, right? And isn’t this similar with generators in other languages?

                                    1. 1

                                      I’m not familiar with exactly what you mean when you say “saga”, but probably. Async in JavaScript is cooperative multitasking, so assuming they make use of that, yes.

                                      Generators, in a way, but they’re mostly an even simpler form of control flow than cooperative multitasking, in that there is no scheduler, just “multiple stacks” (though usually they emulate the extra stacks with compiler magic) which you explicitly switch between. Generator support at the language level is enough to make some pretty ergonomic cooperative multitasking libraries, though. Rust, for example, does all its async stuff as syntactic sugar on top of generators, using normal libraries with no special support from the language for the scheduling part.
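
                                      As a rough illustration of the “multiple stacks you explicitly switch between” view, here’s a Go sketch (an analogy only: Go’s runtime still schedules goroutines under the hood, so this isn’t a true compiler-magic generator):

                                      ```go
                                      // counter behaves like a generator: its loop lives on its
                                      // own stack, and control transfers at each unbuffered send,
                                      // resuming only when the consumer asks for the next value.
                                      func counter() <-chan int {
                                          ch := make(chan int)
                                          go func() {
                                              for i := 0; ; i++ {
                                                  ch <- i // "yield": parks until someone receives
                                              }
                                          }()
                                          return ch
                                      }
                                      ```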

                                    2. 1

                                      Huh. I guess that’s true. Fascinating!

                              1. 2

                                I feel like we just need something that’s like Caddy v1 [1] but for VPNs that just works: it should have very little setup overhead and just do everything for you (e.g. generate public/private keys, certs, etc) but still be able to be more flexible with larger configurations.

                                This isn’t the first environment-assuming auto-install script I’ve seen for <insert generic complicated VPN software here> and I don’t want more of those; I know I can’t just ask for free software and have it be made [2], but I don’t know much crypto and rolling your own is dangerous.

                                [1] Caddy v2 is bloated and doesn’t really respect v1’s simplicity IMO.

                                [2] There’s dsvpn but it seems the author has stopped maintaining it and it was quite unreliable when I tried it.

                                Edit: Another concern is cross-platform: only the big and bulky VPNs have mobile clients right now.

                                1. 2

                                  there’s dsvpn

                                  Runs on TCP (first bullet point under features)

                                  Eh, no thanks. At that point I’d much rather just use openssh as a socks proxy.

                                  TCP over TCP is unpleasant, and UDP and similar protocols over TCP is even worse.

                                  It seems likely the future of VPNs will be built on WireGuard. But it needs something like zerotier.com for some “virtual secure LAN” use cases.

                                  Tailscale.com does a bit of the ZeroTier stuff for WireGuard - but ZeroTier has (AFAIK) smarter routing: local LAN traffic stays local and encrypted. (If you have two laptops at home and a VPS in the cloud, all on the same ZeroTier VPN, all traffic is encrypted, but traffic between the two laptops is routed locally. And things like Bonjour/mDNS work across all three machines.)

                                  1. 4

                                    FWIW, Tailscale also routes traffic intelligently, so LAN traffic will remain local (assuming the devices are able to talk to each other, of course). Tailscale does have public relay nodes as a last resort fallback, but on well-behaved networks, all traffic is p2p on the most direct path possible.

                                  2. 2

                                    Check out dsnet, which was posted here a few weeks ago: https://github.com/naggie/dsnet. It is basically a simpler UI for WireGuard, which I like so far.

                                    1. 2

                                      There’s dsvpn but it seems the author has stopped maintaining it […]

                                      The GitHub repo currently has 0 open issues, so I’d rather call it mature instead of unmaintained.

                                      […] and it was quite unreliable when I tried it.

                                      Maybe give it another chance now? It works perfectly for me.

                                      1. 2

                                        Seems like Streisand fills the gap of easy-but-still-configurable setup. Not entirely one-click, but it’s aimed toward a less technical crowd and holds the user’s hand decently well.

                                        1. 1

                                          This looks fantastic, thanks for putting this together. I’m particularly interested in the prospect of Wireguard support, is that waiting until that’s merged into OpenBSD proper? (If I can avoid needing any Go on my machines I’m happy).

                                        1. 17

                                          I just want to make a meta point here…this article is very much content marketing for Tailscale. Normally I flag this on principle.

                                          But. But. But.

                                          The amount of effort and detail put into this, the explicit “this is how you would implement a competitor or an alternative to what we provide”, and the educational tone of “here is an amazingly detailed list of stuff we had to overcome in the course of shipping our product” instead of “oh this is all Very Scary and you should just buy our product and not worry about it” is what makes it different for me.

                                          This is the new gold standard in my opinion–if somebody kvetches about their content spam getting flagged, I will point to this article and ask “were you this useful?”.

                                          1. 7

                                            Thank you :). This is exactly the balance I was shooting for when writing the article. Yes it’s indirectly marketing for Tailscale, in a “btw we make a thing so you don’t have to think about any of this” way. But like you, I hate content marketing that runs on a platform of “you’re not equipped to understand this problem, just trust us.” I’m glad that I apparently managed to thread that needle :)

                                            My litmus test when writing was: if I remove all mention of Tailscale in the article, is it still the thing I would personally read to remember the nasty details of how to do NAT traversal? At some point I’ll crosspost this to my personal blog for safekeeping, and I don’t want it to feel out of place there.

                                          1. 3

                                            [cgNAT] is a run of the mill double-NAT, and so as we covered above it’s mostly okay

                                            Huh? I was under the impression that cgNAT (NAT444) is completely impenetrable… nothing could traverse the cgNAT on my previous ISP.

                                            1. 4

                                              CGNAT isn’t by itself harder than regular NATs, but CGNATs can have characteristics that make them harder to traverse. They might do endpoint-dependent mapping (which any NAT can do), they probably don’t support hairpinning (which breaks connectivity between two different subscribers of the same ISP), and sometimes break the port mapping protocols (in fairness, to avoid confusing software that’s a bit too naive about the realities of NAT traversal).

                                              Overall, CGNATs are still bad news for p2p connectivity, but they’re not automatically the death of all p2p connectivity. Tailscale successfully traverses a fair number of CGNATs that are “well enough” behaved.

                                            1. 5

                                              Hey, this article is pretty cool, however would it be possible for you to use the well-defined documentation IP addresses (192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24, set aside by RFC 5737) next time? Those can freely be used without running the danger of misdirecting bots to innocent websites.

                                              1. 9

                                                An early draft of this article used the documentation ranges. Unfortunately, unless you breathe BGP configurations for a living, the documentation ranges are really hard to read, especially when you’re trying to convey to/from and packet transformations. It made the whole article much harder to follow.

                                                In the end, I decided to go with easier to read addresses, on the reasoning that this isn’t software documentation, so the primary risk of crosstalk (people copy/pasting config snippets) doesn’t apply. That said, it’s obviously a compromise that I’m not super happy with.

                                                The article evolved a bunch from the early draft, let me see how readable it is now if I swap in the documentation ranges…

                                              1. 21

                                                This is hands down the best article I have ever read on NAT traversal. I’ve spent years of my life dealing with all of the fun issues that come up (for example, some UPnP implementations care about the Case-Sensitivity of the headers sent in the faux-HTTP request, and they don’t all agree!), and I still learned things reading it.

                                                1. 10

                                                  Thank you! In return, I’ve just learned that some UPnP implementations care about header case. Thanks, I hate it!

                                                  But seriously, that’s great intel and TIL :). If you have references for what empirical behaviors you discovered, I’m interested!

                                                  1. 5

                                                    I don’t have any references, but I can say that I’ve also seen:

                                                    1. Routers may or may not like the SOAPAction header value being double-quoted (see the sketch below).
                                                    2. Routers may reject multiple mappings to the same internal port, even with distinct destination IPs. For example, if you have a mapping for external port 8080 to internal address 192.168.1.100:7070, they will reject adding a mapping for external port 8181 to internal address 192.168.1.200:7070 because the internal port “collides”.

                                                    I think that’s all I can remember, and I don’t remember any data about how often these things occurred. Consumer routers are amazing creatures.
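
                                                    To make quirk 1 concrete, here’s roughly what humoring such routers looks like in Go (the control URL and SOAP body are placeholders). Note that you have to write into the header map directly: Header.Set would canonicalize the key to “Soapaction”, which is exactly the kind of thing case-sensitive devices reject:

                                                    ```go
                                                    import (
                                                        "bytes"
                                                        "net/http"
                                                    )

                                                    func addPortMapping(controlURL string, soapBody []byte) (*http.Response, error) {
                                                        req, err := http.NewRequest("POST", controlURL, bytes.NewReader(soapBody))
                                                        if err != nil {
                                                            return nil, err
                                                        }
                                                        req.Header.Set("Content-Type", `text/xml; charset="utf-8"`)
                                                        // Exact key casing preserved, double quotes kept in the value.
                                                        req.Header["SOAPAction"] = []string{
                                                            `"urn:schemas-upnp-org:service:WANIPConnection:1#AddPortMapping"`,
                                                        }
                                                        return http.DefaultClient.Do(req)
                                                    }
                                                    ```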

                                                    1. 5

                                                      Amazing creatures indeed. Thanks for the tips! So far, it seems that, thankfully, most routers these days that offer UPnP IGD also offer NAT-PMP or PCP. UPnP does grab a teeny bit more of the long tail, but you can sort of get away with ignoring UPnP in a lot of cases.