1. 57
  1. 18

    Tremer writes:

    Is IPsec really hard to use? No, it clearly is not if the vendor has done their homework right and provides an interface that is easy to use.

    To put this into perspective: I ran an OpenVPN server for >50 people but failed to configure IPsec. (It might be just me, though).

    1. 9

      It’s not just you. I’ve run OpenVPN in several configurations at several companies. It wasn’t perfect, but it usually worked fine, or at least well enough.

      Whereas IPsec… nearly every admin I know hates it, because implementations vary so much, the reasons it fails are usually vendor-specific, and a lot of it is a black box.

      I’ve so far managed to stay away from it and have never been personally responsible for an IPsec setup, and I’ll try to avoid it for the next 20 years.

      1. 2

        Even getting started is a pain: which of these *Swan projects should I use? Which ones implement the features this vendor provides/requires? Which ones have quirks (bugs) in the features the vendor requires? And the only way to find out is by trying to get those non-logging black boxes to connect and stay connected… horrible.

    2. 18

      Not the most important points, but I spotted even more mistakes in the criticised “Why not WireGuard” post. This paragraph in particular is full of approximations and minor errors:

      ChaCha20 is a stream cipher which are easier to implement in software. They encrypt one bit at a time. Block ciphers like AES encrypt a block of 128 bits at a time. That would need many more transistors when implemented in hardware, so larger processors come with AES-NI - an instruction set extension that performs some tasks of the encryption process to speed it up.

      Let’s break it down piece by piece:

      • ChaCha20 is a stream cipher. It is. No error there.
      • which are easier to implement in software. This strongly suggests that stream ciphers in general are easier to implement in software. This is false. What makes ChaCha20 easy to implement in software is its ARX (Add, Rotate, Xor) design, which (i) relies exclusively on widely available CPU instructions, and (ii) is naturally constant time on those CPUs.
      • They encrypt one bit at a time. An easy approximation to make, but no. Stream ciphers are deterministic random number generators that generate an (almost) arbitrary amount of random bytes, which you can then simply XOR with the plaintext message. You can indeed cut the stream at any one bit. But the stream is not, in general, generated one bit at a time (or even one byte at a time). A ChaCha20 block is 512 bits (64 bytes), and ChaCha20 will generate the stream 512 bits at a time. Moreover, actual implementations often use vector units, and thus process 4, 8, or even 16 blocks at a time. Such meta-blocks can go up to a kilobyte.
      • Block ciphers like AES encrypt a block of 128 bits at a time. Okay.
      • That would need many more transistors when implemented in hardware, No. If I recall correctly, AES was selected partly for its hardware performance. Even if it required more hardware than ChaCha20, this has little to do with the block size: ChaCha20’s blocks are 4 times bigger, remember? And if I were to implement ChaCha20 in hardware, I would likely use 16 adders instead of just one so I could parallelise the core loop (and that’s before I even consider processing several blocks in parallel). I can’t say for sure, but my current guess is that efficient AES hardware may require fewer transistors.
      • so larger processors come with AES-NI - an instruction set extension that performs some tasks of the encryption process to speed it up. Well, first, thank you for providing this email from ChaCha20’s designer, which notes that ChaCha20 also benefits from hardware accelerators: vector units. These put it well within a factor of 2 of AES in most cases. The reason we need AES-NI is not just speed, it’s timing attacks. Naively optimised implementations are prone to timing attacks that are easily exploitable even across the network. Implementing AES in a constant-time manner without losing too much speed is hard: bit-sliced implementations are no picnic. (And of course ChaCha20 blows bit-sliced AES out of the water.)
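      To make the block-at-a-time point concrete, here is a minimal, unoptimised Python sketch of ChaCha20 used as a stream cipher, following the structure of RFC 8439 (the function names are mine). The keystream comes out 64 bytes per block and is simply XORed with the message:

```python
import struct

def rotl32(x, n):
    return ((x << n) & 0xffffffff) | (x >> (32 - n))

def quarter_round(s, a, b, c, d):
    # The ARX core: only additions, rotations, and XORs.
    s[a] = (s[a] + s[b]) & 0xffffffff; s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & 0xffffffff; s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & 0xffffffff; s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & 0xffffffff; s[b] = rotl32(s[b] ^ s[c], 7)

def chacha20_block(key, counter, nonce):
    # State: 4 constants, 8 key words, 1 counter word, 3 nonce words (little-endian).
    state = list(struct.unpack('<4I', b'expand 32-byte k')) \
          + list(struct.unpack('<8I', key)) \
          + [counter] + list(struct.unpack('<3I', nonce))
    w = state[:]
    for _ in range(10):  # 20 rounds = 10 double rounds (columns, then diagonals)
        quarter_round(w, 0, 4, 8, 12); quarter_round(w, 1, 5, 9, 13)
        quarter_round(w, 2, 6, 10, 14); quarter_round(w, 3, 7, 11, 15)
        quarter_round(w, 0, 5, 10, 15); quarter_round(w, 1, 6, 11, 12)
        quarter_round(w, 2, 7, 8, 13); quarter_round(w, 3, 4, 9, 14)
    return struct.pack('<16I', *((a + b) & 0xffffffff for a, b in zip(w, state)))

def chacha20_xor(key, nonce, data, counter=1):
    # The keystream is generated 64 bytes (one full block) at a time, then XORed in.
    out = bytearray()
    for i in range(0, len(data), 64):
        keystream = chacha20_block(key, counter + i // 64, nonce)
        out += bytes(p ^ k for p, k in zip(data[i:i + 64], keystream))
    return bytes(out)

key, nonce = bytes(range(32)), bytes(12)
msg = b"attack at dawn" * 10          # 140 bytes -> 3 keystream blocks
ct = chacha20_xor(key, nonce, msg)
assert chacha20_xor(key, nonce, ct) == msg  # XOR with the same keystream decrypts
```

      A real implementation would additionally be vectorised (several blocks in flight at once), but the XOR-with-keystream structure stays exactly the same.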
      1. 2

        Block ciphers like AES encrypt a block of 128 bits at a time. Okay.

        If we’re nitpicking here, then that’s not okay: a block cipher just has a fixed block size, which nowadays should be at least 128 bits, but the size varies between ciphers. For example, DES and Skipjack had 64-bit blocks, Speck has a 64-bit block size variant, and then there’s Threefish, which can go up to 1024-bit blocks (though I think it’s only ever used as an internal part of the Skein hash).

        AES rant inbound:

        That would need many more transistors when implemented in hardware, No. If I recall correctly, AES was selected for its hardware performance.

        According to Report on the Development of the Advanced Encryption Standard (AES) on p. 53:

        Rijndael appears to be consistently a very good performer in both hardware and software across a wide range of computing environments regardless of its use in feedback or non-feedback modes. Its key setup time is excellent, and its key agility is good. Rijndael’s very low memory requirements make it very well suited for restricted-space environments, in which it also demonstrates excellent performance. Rijndael’s operations are among the easiest to defend against power and timing attacks. Additionally, it appears that some defense can be provided against such attacks without significantly impacting Rijndael’s performance. Rijndael is designed with some flexibility in terms of block and key sizes, and the algorithm can accommodate alterations in the number of rounds, although these features would require further study and are not being considered at this time. Finally, Rijndael’s internal round structure appears to have good potential to benefit from instruction-level parallelism.

        Interestingly, p. 50 gives Twofish shit because it uses addition (though it also uses an S-box), which is “somewhat difficult to defend against timing attacks and power analysis attacks”. I’m not sold on the timing-attack part outside dedicated hardware implementations, but they definitely have a point on power analysis.

        Going back to p. 39, table lookups were declared “not vulnerable to timing attacks”, which suggests that timing attacks on software were not on NIST’s radar at the time. But they seemed to at least eyeball software implementations as well, given that they looked into “instruction-level parallelism”. You’re probably right when you say they focused on hardware implementations, though.

        1. 2

          Thank you for the corrections.

          This report puzzles me, especially the idea that there would be no software implementations to worry about. Even if you could expect hardware support soon after the adoption of the standard, we could imagine partial support that doesn’t touch the cache (and in particular doesn’t force the lookup table to stay in the L1 cache).

          Addition is constant time in most CPUs, but it is harder to have it resist power analysis. In general, carry propagation means addition is not very energy efficient. Overall, addition is excellent in software, crappy in hardware. Chacha20 was optimised for software, so…

          My own opinion of AES: the S-box was a bad idea, and AES is obsolete because of it. It’s just too easy, too tempting, to write software that’s vulnerable to cache-timing attacks. More generally, new general-purpose ciphers today should probably not ignore software implementations. Though I confess hardware-only designs like Keccak do have their appeal in some settings, and hybrid designs like Gimli aren’t quite there yet.

          1. 1

            My understanding is as follows:

            1. Dedicated hardware implementations will always beat out cipher implementations in software in terms of speed. If a primitive becomes ubiquitous, Intel, AMD and ARM will add dedicated instructions anyway, making software performance on non-embedded irrelevant. (Ubiquity is only evaluated for the West, no matter how much SM and GOST standards may be used in China and Russia, respectively, because the relevant chipmakers are all U.S. companies and thus appear to care little for Chinese and Russian standards.)
            2. There will always be demand for fast software-only ciphers in embedded because (1) is difficult and/or expensive. And there’s a niche of people who are portability snobs who refuse the idea that source code should be platform-dependent even in the face of significantly better performance numbers.
            3. The two camps cannot unite unless either (a) everyone agrees on a magical threshold, accepting a certain penalty and somehow getting the required speed on ARM and x86, or (b) someone comes up with an ingenious cipher design that is easily expressible in both hardware and software, yet arrives at the same result at approximately equal speeds.

            Therefore, AES cannot become obsolete unless it is badly broken: new competitions would focus at least partially on software as well, so a design that only targets hardware performance would never be adopted as a standard, which it would have to become in order to be ubiquitous, which it would have to be for x86 and ARM to add it to their instruction sets.

      2. 5

        WireGuard was easy to set up. But I have also done a lot of OpenVPN, so without that experience I would still have stumbled over routes and firewalls.

        WireGuard does not cope well with DynDNS endpoints. There is an official script for forcing WireGuard to re-resolve DNS, but I still find that clunky: the OS doesn’t store the destination hostname, which is discarded once resolved.
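        For context, the workaround (the reresolve-dns.sh script shipped alongside wireguard-tools, if I remember correctly) boils down to periodically re-resolving the hostname and re-setting the peer’s endpoint via `wg set`. A rough Python sketch of the same idea; the interface name, peer key, and hostname below are placeholders:

```python
import socket
import subprocess

def resolve(hostname, port):
    # Take the first A/AAAA record for the peer's (DynDNS) hostname.
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_UDP)
    return infos[0][4][0]

def reresolve_endpoint(iface, peer_key, hostname, port, dry_run=True):
    # Re-point the peer at the hostname's current address via `wg set`.
    endpoint = "%s:%d" % (resolve(hostname, port), port)
    cmd = ["wg", "set", iface, "peer", peer_key, "endpoint", endpoint]
    if not dry_run:  # needs root and a real WireGuard interface
        subprocess.run(cmd, check=True)
    return cmd

# Dry run with placeholder names; in practice run this from cron or a timer.
cmd = reresolve_endpoint("wg0", "PEER_PUBLIC_KEY_HERE", "127.0.0.1", 51820)
```

        The real endpoint hostname would be the peer’s DynDNS name; `127.0.0.1` here just keeps the sketch self-contained.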

        WireGuard cannot bind to a specific adapter or IP address. This might not seem like a big deal (there’s an issue for it), because it doesn’t respond on a port without the correct key, but it can lead to outbound packets going out on a different IP, leading to asymmetric routing.

        I should really do my own post on this.

        1. 4

          I just read both of those articles and I’m a little disappointed in them both. They both came across as fairly shallow dismissals of each other, which is too bad because I’d like to see the deeper conversation and learn more.

          I have no dog in this fight. I’ve set up OpenVPN and it was easy. I’ve tried to set up WireGuard and failed. I haven’t used Tailscale, although I very much respect their team.

          One thing I didn’t see in the rebuttal was a response to the performance questions raised by IPFire. Will Tailscale share their benchmarking setup and/or confirm whether they used jumbo frames?

          Both groups seem to talk past each other. IPFire seems to say “sure, WireGuard is simple, but that’s because it doesn’t do a bunch of important things”. Tailscale responds with “there are scripts for that”.

          Tailscale says about choosing a cipher: “This is an unanswerable question for anyone who is not a cryptography expert.” That seems to miss the point that only one cryptographer needs to publish a recommended config; it’s not as if every company needs to go hire a cryptographer in order to use IPsec.
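          For illustration, such a published recommendation can be as small as a proposal line an admin copies verbatim. A hedged sketch in strongSwan’s ipsec.conf syntax (the exact algorithm names must match what the installed build actually supports):

```ini
conn %default
    keyexchange=ikev2
    # One expert-recommended proposal; the trailing "!" makes it strict,
    # so weaker algorithms are never negotiated.
    ike=aes256gcm16-prfsha384-ecp384!
    esp=aes256gcm16-ecp384!
```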

          I’m also confused by the question around hardcoding the WireGuard cipher suite. IPFire states that this will make upgrades impossible. If I’m reading Tailscale’s response correctly, they are saying multiple cipher suites might be added to WireGuard later and that upgrades will work. I guess I don’t know enough about the WireGuard v1 protocol to determine whether it’s trivial to add another suite in a backwards-compatible way or not.

          So here’s my shallow take: from a business-success perspective, most customers don’t care that WireGuard is simple. Most businesses I work with are already using TCP- or HTTP-based solutions, so they’ve already decided that the UDP/VoIP-over-VPN stuff isn’t that high a priority. So I’d guess Tailscale’s early customers will be folks who need a VPN with better UDP performance and/or people who get bitten by TCP-in-TCP issues. If Tailscale succeeds on a larger scale, it will likely be on the strength of its user interface (CLI or not) and enough feature parity with existing solutions to check the boxes the CTO cares about.

          1. 3

            Mobile processors are somewhat slower than desktop and server processors when doing encryption, of course, but they are also usually on much slower networks. On mobile, you should expect the symmetric crypto to take maybe 1% of the time, and slow networks to take 99% of the time.

            Speed wouldn’t be affected, but what about energy consumption? Using specialized CPU instructions (if available) might reduce power consumption compared to regular instructions. In that case, we may still want to benefit from the encryption instruction set.

            Besides, the network is not always slow: a phone can use a fast WiFi connection. But even that (with the latest 802.11ac technology) may still be slow enough that you wouldn’t notice any performance degradation.
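            The 1%/99% split is easy to sanity-check with rough numbers. The throughputs below are assumptions for illustration, not benchmarks:

```python
# Rough throughputs in MB/s (assumed for illustration, not measured).
cipher_mbps = 300.0  # software ChaCha20-Poly1305 on a phone core
links = {"LTE": 5.0, "fast WiFi (802.11ac)": 60.0}

for name, link_mbps in links.items():
    crypto_time = 1.0 / cipher_mbps   # seconds per MB spent on crypto
    network_time = 1.0 / link_mbps    # seconds per MB spent on the wire
    share = crypto_time / (crypto_time + network_time)
    print("%s: crypto is %.1f%% of total time" % (name, 100 * share))
```

            With these made-up figures, crypto is under 2% of the time on LTE but roughly 17% on fast WiFi, so a fast link can indeed make the cipher noticeable.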

            1. 2

              I recently had the joy of setting up a (mostly) OpenBSD OpenIKED <-> FreeBSD strongSwan tunnel. I provided my readable OpenIKED config and the FreeBSD admin tried to match my config. Some trouble points:

              • Strictly setting the algorithms did cause trouble
              • PF debugging on both sides (normal?)
              • Rekeying and timeouts occurred

              We got it to work over a weekend, and it definitely took longer than I would have wished for. I was glad my friend is an experienced BSD admin with a deep understanding of all the protocols and systems involved.

              Some inaccuracies maybe:

              The author here seems to suggest that configuring IPsec on OpenBSD is complicated. Although our team is not personally familiar with IPsec on OpenBSD, we do know that configuring WireGuard on OpenBSD is easy, just like on other platforms.

              On the other hand, the Go implementation is only available for amd64 and arm64. https://openports.se/net/wireguard-go

              1. 2

                I’m curious when (and if) we’ll see:

                1. Enterprises offering/enforcing WG for their office workers
                2. VPN “privacy” services offering WG
                1. 4

                  Regarding #2, Mullvad already offers WG access.

                2. 1

                  What is missing is tooling for enterprise use cases, such as LDAP integration and a simple way to define different user classes based on it. 2FA is missing as well, as far as I know.

                  1. 3

                    That’s Tailscale’s product.

                    1. 4

                      First of all, I don’t like Tailscale very much, because it is yet another company trying to ride the Free Software wave using the freemium business model. As soon as you want to use any of the relevant features, it is not Free any more and says “contact us” (unless I have misunderstood how it works). I think I was not clear enough: I don’t see it as an option unless it is fully Free Software, or at least open source. Furthermore, they use the Go implementation of WireGuard and are very opinionated about how the networking should look.

                      1. 5

                        While we shouldn’t confuse the motives of people with the motivations of companies, it’s worth looking at the track record of the folks involved in Tailscale as far as involvement in Free Software goes. My impression of them (as well as of people like Brad Fitzpatrick) is that while they will make proprietary software, they want open source by default and aren’t in favor of using proprietary stuff for vendor lock-in. Your impression may be different, but I’m inclined to give them the benefit of the doubt.

                  2. 1

                    I really like the WireGuard technology! However, some issues that are only mentioned briefly are important if you want to deploy WireGuard at some scale and compare it to OpenVPN:

                    • no dynamic configuration of clients: if you have a limited pool of (public) IP addresses, it is not feasible to statically assign one IP to each client (public key), let alone tell clients what to route over the VPN or which DNS servers to use;
                    • no support for TCP (fallback) if your client is running on a network that blocks or mangles UDP packets. A degraded VPN is better than no VPN in many cases…

                    Of course you could simply refuse to work on networks with broken UDP, use RFC 1918 addresses for the clients together with (CG)NAT, or create your own WireGuard apps that handle IP/route/DNS assignment out of band, but that is some next-level stuff that becomes clear once you try to publish a VPN app for iOS or macOS in the stores… I’m sure Tailscale can take care of that for themselves.
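                    To illustrate the static-assignment point: in stock WireGuard, every client is pinned in the server config with one [Peer] block per public key, and there is no built-in way to hand out addresses, routes, or DNS dynamically. A sketch (keys and addresses are placeholders):

```ini
# Server side: every client needs its own statically assigned address.
[Interface]
PrivateKey = <server-private-key>
Address = 10.0.0.1/24
ListenPort = 51820

[Peer]
# Client A: forever bound to 10.0.0.2, communicated out of band.
PublicKey = <client-a-public-key>
AllowedIPs = 10.0.0.2/32

[Peer]
# Client B: another manual assignment; no DHCP-style pool exists.
PublicKey = <client-b-public-key>
AllowedIPs = 10.0.0.3/32
```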

                    1. 1

                      WireGuard is simply amazing: I have a hub-and-spoke setup, meaning I have 1 master node and a ton of worker nodes (all of them connect to the master and can see one another via the master). Not only can I be part of my cluster’s network and use it as a VPN, I can also set up Docker Swarm across a bunch of on-prem and cloud instances. There’s an LB on the master node to orchestrate all routing.
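                      For anyone wanting to reproduce a hub-and-spoke topology like this: the trick is in the spokes’ AllowedIPs, which route the whole overlay subnet through the hub, while the hub forwards between peers. A sketch with placeholder keys and addresses:

```ini
# Spoke (worker) config: send the whole 10.0.0.0/24 overlay via the hub.
[Interface]
PrivateKey = <worker-private-key>
Address = 10.0.0.12/24

[Peer]
PublicKey = <hub-public-key>
Endpoint = hub.example.com:51820
AllowedIPs = 10.0.0.0/24      # the hub relays traffic between workers
PersistentKeepalive = 25
# On the hub itself: one [Peer] per worker with AllowedIPs = <worker-ip>/32,
# plus IP forwarding enabled so workers can reach each other through it.
```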