Threads for zx2c4

    1. 1

      I like how the CSS gives this writeup a “papery” feel. Which makes me want to focus on rigor.

      I applaud any efforts to enhance the inscrutability of our RNGs. To realize those efforts, I think it’s important to rigorously define how we intend to make improvements.

      The random number generator has undergone a few important changes for Linux 5.17 and 5.18, in an attempt to modernize both the code and the cryptography used.

      “Modernize” here doesn’t make the goal clear. Your “modern” is not my modern, and vice versa. The word is simply too loaded and nonspecific to be useful in this setting.

      I’m obviously nitpicking, and not intending to disparage the author or their work.

      1. 6

        I guess, but the usage here has a pretty straightforward interpretation. For example:

        • Code: moving from a style that was common in the kernel tree in 1995 to a style that’s common now in 2022.
        • Crypto: moving from SHA-1 (introduced 1995, broken now) to BLAKE2s (introduced 2012, not broken).

        And if you’re reading that thinking, “huh, ‘modern’ is a pretty loaded term. What does he mean? Didn’t the late modern era of history already end? Which usage does he have in mind?” then you simply need to read onward to find out exactly what I mean, since it’s written out.

        1. 1

          I think that’s a very reasonable way to think about it, and like you said, you do clearly lay out what you mean by modern. I suppose I have a bit of a knee-jerk reaction to the word. It’s often used by others to hand-wave in place of actually having deep knowledge about a topic; clearly not something you suffer from. But for me, those people have tarnished the word when it’s used in relation to software/infrastructure/technology in general, and when I hear or read it, I instantly want to stop listening to what they have to say.

          FYI I did finish reading your writeup before commenting, and found it very interesting.

    2. 4

      Does that mean I can retire this script I mockingly wrote eons ago?

      % urandom | pv >/dev/null          
      ^C.3GiB 0:00:04 [3.63GiB/s] [  <=>                             ]
      
      % pv /dev/urandom >/dev/null
      ^C58MiB 0:00:07 [ 151MiB/s] [        <=>                        ]
      
      #!/bin/sh
      # urandom script
      head \
          --bytes=128 \
          /dev/urandom \
      | base64 \
      | openssl enc -aes-256-ctr \
          -in /dev/zero \
          -nosalt \
          -pass stdin \
          2>/dev/null
      
      1. 2

        Just curious, but what do you need 3.5 GiB/s for where 150 MiB/s isn’t sufficient?

        1. 1

          When I wrote it, it was much slower (~10MiB/s I think?) and the use-case was basically filling a hard drive from start to end with random junk. Around that time I noticed that writing random patterns was slower than zeros or ones, dug into it, and found that using openssl’s AES in CTR mode as a CSPRNG was much faster than the plain dumb case of just read() from /dev/urandom => write() to /dev/my-disk.

      2. 2

        No, doing aes-ni into a buffered pipe is still going to be a lot faster:

        zx2c4@thinkpad ~ $ pv /dev/urandom >/dev/null
        ^C52GiB 0:00:06 [ 601MiB/s] [           <=>                                                                                    ]
        zx2c4@thinkpad ~ $ urandom | pv >/dev/null   
        ^C.9GiB 0:00:07 [2.98GiB/s] [             <=>                                                                                  ]
        
        1. 1

          Curious. With 5.16 on my laptop (Debian Sid), I easily surpass 200 MiB/s, but with 5.16 on my workstation (also Debian Sid), I can’t get up above 40 MiB/s. The laptop is an Intel Core i7 8650 while my workstation is an AMD Ryzen 5 3400G. Could there be a bug with AMD Ryzen processors and the CSPRNG performance? I don’t mean to use you as technical support, so feel free to kick me in the right direction.

          (laptop)% pv -S -s 1G /dev/urandom > /dev/null
          1.00GiB 0:00:04 [ 233MiB/s] [================================>] 100%
          (laptop)% uname -r
          5.16.0-1-amd64
          
          (workstation)% pv -S -s 1G /dev/urandom > /dev/null
          1.00GiB 0:00:28 [36.3MiB/s] [================================>] 100%
          (workstation)% uname -r
          5.16.0-1-amd64
          
          1. 1

            On 5.16, the RNG mixed the output of RDRAND into ChaCha’s nonce parameter, instead of more sanely putting it into the input pool with a hash function. That means every 64 bytes, there’s a call to RDRAND. RDRAND is extremely slow, especially on AMD, where a 64-bit read actually decomposes into two 32-bit ones.
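
            For anyone curious how much that costs on their own machine, here’s a tiny userspace probe of raw RDRAND throughput. This is my own illustration, not kernel code; it assumes an x86 CPU with RDRAND and a compiler invocation like gcc -O2 -mrdrnd. Per the above, 5.16 paid roughly this price once per 64 bytes of /dev/urandom output.

            #include <immintrin.h>
            #include <stdio.h>
            #include <time.h>

            int main(void)
            {
                unsigned long long r, sink = 0;
                const long iters = 1L << 22; /* 4M reads = 32 MiB of RDRAND output */
                struct timespec a, b;

                clock_gettime(CLOCK_MONOTONIC, &a);
                for (long i = 0; i < iters; ++i) {
                    while (!_rdrand64_step(&r))
                        ; /* RDRAND can transiently fail; retry */
                    sink ^= r;
                }
                clock_gettime(CLOCK_MONOTONIC, &b);

                double secs = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
                printf("RDRAND: %.1f MiB/s (sink=%llx)\n",
                       iters * 8.0 / (1 << 20) / secs, sink);
                return 0;
            }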

            1. 1

              Thanks. I already have random.trust_cpu=0 passed as a kernel boot argument with that performance. Not sure if there’s anything else I can do to not use RDRAND.

              1. 2

                That only means that RDRAND isn’t used at boot to credit its input as entropy. It is still used, however. And until 5.18, it’s just xor’d in, in some places, which isn’t great. You can see a bunch of cleanup commits for this in my tree.

                1. 1

                  Ah, it’s nordrand I need to pass. Sure enough, a 10x performance boost:

                  % pv -S -s 1G /dev/urandom > /dev/null
                  1.00GiB 0:00:02 [ 357MiB/s] [================================>] 100%
                  

                  AMD needs to up their game. Heh.

                  1. 1

                    In 5.18, all RDRAND/RDSEED input goes through the pool mixer (BLAKE2s), so there’ll be very little tinfoil-hat reasoning left for disabling it altogether with nordrand.

    3. 68

      The details regarding this change can be found here: https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/commit/?id=186873c549df11b63e17062f863654e1501e1524

      Basically this changes what was before a global structure (actually, a per-numa node structure) into a per-cpu structure, which meant a lot of locks in the fast path could disappear. This unsurprisingly results in performance boosts when you have many different CPUs trying to getrandom() at the same time. I didn’t expect some test to result in gains this big, but I suppose it makes sense, because in addition to less locking, this is also, I assume, a lot more cache friendly.

      The main reasons I made this change weren’t actually directly about performance, but a few other nitpicky things:

      • It seems a lot safer to reason about the fast key erasure RNG when you just have a key, rather than what we had before, which was some shared block counter that many things accessed before overwriting the key.
      • I wanted the key overwriting to happen immediately, to just eliminate all doubt about the backtrack lifetime.
      • Trying to do all this on a per-numa node basis was a huge PITA, because it meant kmalloc, which in turn meant it needed to be deferred until workqueues were available at boot time, which made the state machine really complicated and annoying and required fallback code.

      The solution – which was just to have a 32-byte per-cpu key – led to a whole lot of simplification. The first 32 bytes of RNG output now require just disabling preemption. After that, subsequent bytes can be generated with no locks at all, totally “detached”. Every five minutes, the whole thing is refreshed with new entropy, which requires taking locks to keep everything tidy. But that’s only a single call once every five minutes, so it’s basically not noticeable.

      I should note that the basic “fast key erasure RNG” scheme used here isn’t novel; rather, it stems from djb’s post here https://blog.cr.yp.to/20170723-random.html. My addition is making it parallelize to a multi-cpu environment.
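
      To make that construction concrete, here’s a compact userspace sketch of fast key erasure over ChaCha20. It’s illustrative only: the function and variable names are mine, the kernel’s equivalent differs in details, and real code would wipe secrets with memzero_explicit() instead of plain memset(), which a compiler may elide.

      /*
       * Userspace sketch of djb's "fast key erasure" RNG: expand the key into
       * a ChaCha20 stream, use the stream's first 32 bytes as the *next* key,
       * and hand the rest to the caller. The old key is wiped before any
       * output escapes, so a later state compromise can't backtrack.
       * Little-endian serialization via memcpy, for brevity.
       */
      #include <stdint.h>
      #include <string.h>

      #define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
      #define QR(a, b, c, d) (               \
          a += b, d ^= a, d = ROTL32(d, 16), \
          c += d, b ^= c, b = ROTL32(b, 12), \
          a += b, d ^= a, d = ROTL32(d, 8),  \
          c += d, b ^= c, b = ROTL32(b, 7))

      static void chacha20_block(const uint32_t key[8], uint64_t ctr, uint8_t out[64])
      {
          uint32_t in[16] = {
              0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, /* "expand 32-byte k" */
              key[0], key[1], key[2], key[3],
              key[4], key[5], key[6], key[7],
              (uint32_t)ctr, (uint32_t)(ctr >> 32), 0, 0, /* 64-bit counter, zero nonce */
          };
          uint32_t x[16];
          memcpy(x, in, sizeof(x));
          for (int i = 0; i < 10; ++i) { /* 10 double rounds = 20 rounds */
              QR(x[0], x[4], x[8],  x[12]); QR(x[1], x[5], x[9],  x[13]);
              QR(x[2], x[6], x[10], x[14]); QR(x[3], x[7], x[11], x[15]);
              QR(x[0], x[5], x[10], x[15]); QR(x[1], x[6], x[11], x[12]);
              QR(x[2], x[7], x[8],  x[13]); QR(x[3], x[4], x[9],  x[14]);
          }
          for (int i = 0; i < 16; ++i) {
              uint32_t v = x[i] + in[i];
              memcpy(&out[i * 4], &v, 4);
          }
      }

      static void fke_random_bytes(uint32_t key[8], uint8_t *out, size_t len)
      {
          uint32_t old_key[8];
          uint8_t block[64];
          uint64_t ctr = 0;
          size_t n;

          memcpy(old_key, key, sizeof(old_key));

          /* First block: 32 bytes become the next key, erasing the old one
           * immediately; the other 32 bytes are already usable output. */
          chacha20_block(old_key, ctr++, block);
          memcpy(key, block, 32);
          n = len < 32 ? len : 32;
          memcpy(out, block + 32, n);
          out += n;
          len -= n;

          /* Any further output still comes from the spent key's stream. */
          while (len) {
              chacha20_block(old_key, ctr++, block);
              n = len < 64 ? len : 64;
              memcpy(out, block, n);
              out += n;
              len -= n;
          }

          memset(old_key, 0, sizeof(old_key)); /* memzero_explicit() in real code */
          memset(block, 0, sizeof(block));
      }

      The per-cpu arrangement described above then amounts to giving each CPU its own 32-byte key like this one, all refreshed from the base key when new entropy comes in.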

      1. 7

        Every five minutes, the whole thing is refreshed with new entropy

        I have a couple questions about that, actually.

        1. Why 5 minutes and not, say, 1 minute? I know low-end hardware may not be producing enough interrupts for sufficient entropy, but was there some research into that time?
        2. If I flip a coin 256 times, record the results, and write them to /dev/{,u}random, is that also subject to the same 5 minute interval, or is the RNG immediately rekeyed?
        1. 33

          1a) Because that’s how it was before, and I’m trying to change as few things as possible at once as part of this RNG rejuvenation effort. My general feeling is that the work I’m doing is tolerated so long as it’s incremental. This also makes bisection and such easier. So if that does change at some point, it’ll need to be accompanied by an explanation that fully understands why it was that way before and the history around that decision, and why those decisions should be revisited.

          1b) In order to avoid the “premature next” problem, in which the RNG is reseeded after only, say, 9 new bits of entropy have been added, enabling those 9 bits to be bruteforced; if this happens over and over, that “slow trickle” means there’s never any state recovery. Linux currently deploys two flawed mechanisms to prevent this. The first is that we “count” entropy bits added, and don’t reseed until there are 256 bits added since the last extraction. This is problematic because (i) entropy estimation is impossible, but moreover (ii) a single malicious entropy source could be the source of all those credits. Usually a “scheduler” like Fortuna is added on top to mitigate this, but Linux doesn’t have that right now. The second mechanism is that we wait 5 minutes before reseeding automatically, and maybe this sort of papers over the former shortcoming. Maybe! Fingers-crossed, hands-waved, …

          2a) If you write into /dev/random, it actually won’t even credit your coinflip as having entropy. It’ll hash it into the pool, but won’t increment any counter. What you actually want there is ioctl(devrandom_fd, RNDADDENTROPY, &(struct{int a,b;char c[32];}){256, 32, coin_flip...}). That’ll dump those 32 bytes in and credit them for 256 bits. Alternatively you can write to /dev/random and then subsequently increment the entropy count (and hope that in between nothing extracts from the pool… there are a lot of related races like that everywhere) via ioctl(devrandom_fd, RNDADDTOENTCNT, 256).

          2b) None of this will reseed the crng immediately, though. You’ll have to wait until that 5-minute interval is up. Alternatively, you can force it to happen now via ioctl(devrandom_fd, RNDRESEEDCRNG), which only does something if the pool has 256 bits of entropy counted since the last reseeding.
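
          Putting 2a and 2b together, a minimal C sketch of the whole coin-flip dance (needs CAP_SYS_ADMIN; error handling omitted, and the flips[] contents are obviously placeholders):

          #include <fcntl.h>
          #include <linux/random.h>
          #include <string.h>
          #include <sys/ioctl.h>
          #include <unistd.h>

          int main(void)
          {
              struct {
                  struct rand_pool_info info;
                  unsigned char buf[32];
              } e;
              unsigned char flips[32] = { /* your 256 coin flips, packed into bytes */ };
              int fd = open("/dev/random", O_WRONLY);

              e.info.entropy_count = 256;   /* bits to credit */
              e.info.buf_size = sizeof(e.buf);
              memcpy(e.buf, flips, sizeof(flips));
              ioctl(fd, RNDADDENTROPY, &e); /* mix in + credit, in one step */
              ioctl(fd, RNDRESEEDCRNG);     /* reseed now rather than within 5 min */
              close(fd);
              return 0;
          }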

          1. 5

            Because that’s how it was before, and I’m trying to change as few things as possible at once as part of this RNG rejuvenation effort. My general feeling is that the work I’m doing is tolerated so long as it’s incremental.

            Fair enough.

            If you write into /dev/random, it actually won’t even credit your coinflip as having entropy. It’ll hash it into the pool, but won’t increment any counter.

            Yup. Already aware of that. Regardless, even though I don’t get the accounting credit for the reseed, the kernel should still be rekeyed, yes?

            1. 9

              Yup. Already aware of that. Regardless, even though I don’t get the accounting credit for the reseed, the kernel should still be rekeyed, yes?

              Doesn’t (2b) answer that? Or are you saying that you understood (2b), but think that behavior is bad and you want to change it?

              1. 4

                Just making sure I understand 2b. Thanks.

                1. 9

                  Simplifying the above to answer your direct question: 2b says that the crng is not rekeyed immediately in response to you writing bytes into /dev/random. Rather, you need to wait for the entropy count to reach 256 bits (or set it yourself with one of those ioctls), and then wait 5 minutes or hit the reseed ioctl.

      2. 5

        I think the performance improvement might actually be a great security feature in itself. It significantly lowers the chance that people feel the need to write their own user-space RNG and then inevitably get it very wrong.

      3. 2

        Hey Jason, I know it’s off topic, but I couldn’t stop myself from asking; feel free to ignore. How did you draw the flow diagram below (from the thread)? Do you use any tool?

        ──extract()──► base_crng.key ◄──memcpy()───┐
                            │                      │
                            └──chacha()──────┬─► new_base_key
                                             └─► crngs[n].key ◄──memcpy()───┐
                                                        │                   │
                                                        └──chacha()───┬─► new_key
                                                                      └─► random_bytes
                                                                               │
                                                                               └────►

        1. 3

          It looks a lot like the output of asciiflow to me.

          ┌─────────┐
          │It looks │
          │  like   │
          └─────┬───┘
                │
                │
                │             ┌────────────┐
                │             │            │
                └────────────►│asciiflow   │
                              │            │
                              └────────────┘
          
    4. 2

      I’m surprised almost all the comments were removed. There was some interesting context there which hasn’t changed with the patch.

      1. 2

        Most of those comments were wrong or outdated. But I do like documentation. Subsequent commits cleaned up and redocumented a lot of that file, with section headers and lots of explanations. Have a scroll through:

        https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/tree/drivers/char/random.c

    5. 11

      So much content posted to twitter where it will doubtless be lost forever, or behind a login gate at some point as they chase the last profits from their fleeing audience when the next hip thing takes over :(

      Edit for usefulness so I’m not part of the problem: The post linked from the tweet.

      1. 5

        The actual content is in the Linux kernel’s git commit logs, which will certainly not be lost forever (unless, I guess, something really extreme happens).

        https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/drivers/char/random.c

        1. 1

          Great work.

          I agree that it would make more sense to link directly to the commits. Mailing list posts are also better, but in this case the Lobste.rs headline already provides sufficient editorial context. Linking to tweets (which appears to be more and more popular) seems to compromise the visibility of the work in favor of self-promotion, a point humorously reflected by how Lobste.rs’ extract of the post is simply “Trending now”: https://imgur.com/a/Pduk7iq

          I hope I’m not misunderstood — this is the latest in an array of excellent contributions, and I’ve retweeted the OP myself.

    6. 3

      See also: Using OCF in WireGuard from the same report.

      1. 1

        That made me curious why WireGuard uses the smaller nonce. From libsodium’s description of the two algorithm variants, the version with the longer nonce has a much tighter restriction on the number of bytes you can encrypt safely with the same (key, nonce) pair (which presumably means that WireGuard doesn’t have to be as careful as IPsec about nonce roll-over), but it doesn’t explain why that should be the case.

      1. 8

        It’s a really well-written response to a rant, full of technical details, and it sums up nicely to: the NetworkExtension framework is a total mess, the App Store model doesn’t help, and low-level development on the Mac is not fun (that last one is my interpretation).

      2. 2

        Apple doesn’t give us a lot of control over anything

        This is the point where every self-respecting hacker would uninstall this operating system and install something that gives them control, like Linux. It is a mystery to me how so many brilliant people let their machines be dominated by these bullies day in day out.

        1. 12

          This is the point where every self-respecting hacker would uninstall this operating system and install something that gives them control, like Linux.

          Are you suggesting that I uninstall Linux in order to install Linux?

          In order for WireGuard to be useful, people need to be able to use it. And that means porting it to other operating systems, even the ones that you find icky and make me tear my hair out. Perhaps you’re a man of principle and figure, “don’t stoop to Apple! Boycott the OS! Don’t spend your time on it! Don’t help that world!” But this still misses the point that I want to make things that are useful to people, even to people who haven’t made the same choices as you and me to use Linux. And on a more personal level, I’d like to be able to use WireGuard with friends and family who run other operating systems like macOS or Windows. “Convince them instead to drop their computers in the sink and get a Thinkpad W701ds to run Linux instead!” Please…

          1. 1

            Are you suggesting that I uninstall Linux in order to install Linux?

            Everyone knows that True Hackers™ install a new distro at least once every 127 days. They also have a cron job to recompile the kernel every night.

            At any rate, for me, personally, life is just too short to have to deal with this kind of stuff. I’m all for pragmatism and I can deal with less-than-ideal (“broken”) APIs and all of that, but the whole “we approved your app yesterday but we’re rejecting today’s update to fix a critical error because of some obscure small donation link to a non-profit” is just … yeah nah, I can’t deal with that kind of dystopian insanity completely devoid of any reason. So kudos for putting up with that.

        2. 2

          Are you considering every laptop owner a self-respecting hacker?!

          You certainly don’t need to be a “hacker” to use a VPN. If you mean specifically the authors of the posts, then they probably have their reasons, and it would be interesting to know about them.

          1. 2

            Are you considering every laptop owner a self-respecting hacker?!

            Of course not.

            If you mean specifically the authors of the posts, then they probably have their reasons, and it would be interesting to know about them.

            Yes, I’m specifically talking about them. In addition, there are a lot of programmers who use macOS, some of which I know personally and hold in high esteem. The most common reasons I hear are the quality and software integration of the touchpad, music & media applications and the form factor & hardware of the laptops. I can respect the music & media argument (just like you would use Windows for games), the other points are a rather low price for your hacker soul.

            1. 2

              I rarely “hack” my Linux system, though; most of the time I want things to Just Work™.

              My definition of “just works” is rather different from that of the average macOS user – to quote an ex-girlfriend: “why don’t you use your computer like a normal person?” – but that’s just a matter of taste.

    7. 35

      Here are functions that do the same as the musl versions in this blog post, but are always constant time: https://git.zx2c4.com/wireguard-tools/tree/src/ctype.h This avoids hitting those large glibc lookup tables (cache timing) as well as musl’s branches. It also happens to be faster than both in some brief benchmarking.

      1. 7

        always constant time

        Musl and glibc’s versions are both constant time. (Technically, since || is short-circuiting, musl’s version won’t always take the same amount of time, but it’s still O(1).)

        Do you mean branchless?

        1. 32

          zx2c4 means “constant time” in the side-channel avoidance sense, not the complexity theoretic sense. They’re not the same definition.
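
          For readers who don’t click through: the linked ctype.h avoids both the tables and the branches by turning each range test into sign-bit arithmetic. A rough reconstruction of the idea (my own code and helper names, not the file’s exact contents):

          #include <stdbool.h>

          /* (lo - 1 - c) and (c - (hi + 1)) are both negative exactly when
           * lo <= c <= hi, so ANDing them parks the answer in the sign bit:
           * no lookup table (cache timing) and no data-dependent branch. */
          static inline bool ct_between(int c, int lo, int hi)
          {
              return ((unsigned)((lo - 1 - c) & (c - (hi + 1))))
                     >> (sizeof(int) * 8 - 1);
          }

          static inline bool ct_isdigit(int c) { return ct_between(c, '0', '9'); }
          static inline bool ct_isupper(int c) { return ct_between(c, 'A', 'Z'); }
          static inline bool ct_isalpha(int c)
          {
              /* '|' rather than '||' keeps evaluation unconditional. */
              return ct_isupper(c) | ct_between(c, 'a', 'z');
          }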

    8. 9

      I use this for “random scripts and examples that might be helpful to some people but not something that should be distributed as part of the software”: https://git.zx2c4.com/wireguard-tools/tree/contrib

      For example, there’s a little syntax highlighter program that might form a useful basis of a thing for other software that wants to show pretty config files: https://git.zx2c4.com/wireguard-tools/tree/contrib/highlighter/highlighter.c And here’s a little javascript thing for generating config files in the browser: https://git.zx2c4.com/wireguard-tools/tree/contrib/keygen-html And here’s a very basic PoC to illustrate how NAT hole-punching works with wg: https://git.zx2c4.com/wireguard-tools/tree/contrib/nat-hole-punching There are plenty others too.

      These aren’t things I’d want to ship to users, but I know that they’ve been useful to at least a handful of people building other software or setting up various systems.

    9. 6

      This is great! Maybe this will spur kernel versions for other BSDs too. One can hope.

      1. 6

        Indeed. I hope the current FreeBSD efforts base things on top of this OpenBSD implementation.

        1. 2

          With the ISC license and such nice code separation, it certainly seems like a really great base from which to build.

    10. 23

      WireGuard is so much better than any other VPN solution I’ve tried, not only in regard to performance, which shines when I look at connection stability, latency and overhead (the main reason being that connections are stateless). The much more crucial point is that WireGuard is so easy to set up (literally <12 lines of config on server and client get you started). I would’ve never dared to do this with OpenVPN, but I’ve successfully set up a “real” VPN, meaning I linked multiple computers into one private network, allowing me to access my local machines from wherever I am, safely guarded from surveillance and other actors.

      WireGuard is a prime example of what we always promote at suckless.org: one doesn’t need an enterprise-ready solution to be productive or solve problems. Enterprise-ready often means bloated, full of legacy cruft and hard to set up (as it becomes people’s job to set it up). I’m not saying that WireGuard was trivial to reimplement, but just looking at the interfaces it provides, it is damn simple, and that’s how every piece of software should be.

      1. 5

        The much more crucial point is that WireGuard is so easy to set up (literally <12 lines of config on server and client get you started)

        NixOS users can also configure it declaratively: https://nixos.wiki/wiki/Wireguard

        I’ve been using WireGuard for more than a year now, in order to serve web apps that run on my home machine. I use a small DigitalOcean VM with nginx that proxies through WireGuard.

        1. 1

          Wow, that’s brilliant! How bad is the added latency?

          1. 2

            I did not measure it, but you can check it out for yourself by accessing one of my apps: https://slownews.srid.ca/ (just ignore that JS overhead, as that is compiled from Haskell using GHCJS).

      2. 5

        I love WireGuard and use it every day, but I really do wish it had a shitty TCP mode so that I could use it on public Wi-Fi networks that block UDP. I understand performance would be bad, but a slow VPN beats an unusable one every single time.

        1. 2

          Could you maybe rig up something with socat or similar as a TCP<->UDP proxy on each endpoint as a band-aid? I guess it might take a bit of extra work to delimit UDP message boundaries if the protocol depends on those…

          1. 2

            I mostly want this on my iPhone, so first-party support would be ideal. The WireGuard iOS app is great, btw.

      3. 3

        Thanks for the nice words. I’ve always thought highly of suckless.org, so that means a lot.

    11. 10

      Happy to answer questions about WireGuard from the lobste.rs crowd, by the way.

      1. 1

        Is there something like Algo for WireGuard?

        1. 3

          Algo VPN is a set of Ansible scripts that simplify the setup of a personal Wireguard and IPSEC VPN.

          From the first sentence of the readme for algo.

          1. 1

            Yup, my fault; I saw Algo a long time ago.

    12. 6

      Alternatively, this horrific one-liner might be used to install on OpenBSD:

      # ftp -o - https://xn--4db.cc/IKuBc62Z | sh
      

      Also:

      There is also a wg-quick script, however that seems more targeted at client use. I’m not sure what it does, so we’ll ignore it.

      It’s actually meant for both client- and server-side use, FWIW. wg-quick(8) has a man page, and the tl;dr is that it adds a few keys to the wg(8) configuration, like Address, DNS, {Post,Pre}{Up,Down}, and so forth. The idea is that you stick these config files in /etc/wireguard/whatever.conf, and then call wg-quick up whatever.conf, and wg-quick will then execute the various ifconfig and wireguard-go and wg commands required to get you up and running with the designated configuration. It is, in fact, the quick and dirty thing I made for my laptop and servers that’s wound up being generally useful for folks.
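
      For reference, a hypothetical whatever.conf in that format (the keys and addresses here are made up; Address and DNS are the wg-quick additions, while the rest is standard wg(8) configuration):

      [Interface]
      PrivateKey = <base64 private key from `wg genkey`>
      Address = 10.0.0.2/24
      DNS = 10.0.0.1

      [Peer]
      PublicKey = <base64 public key from `wg pubkey`>
      AllowedIPs = 0.0.0.0/0
      Endpoint = vpn.example.com:51820

      With that in place, wg-quick up whatever.conf does the rest, per the above.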

      1. 2

        I read the source for wg-quick, saw some route commands, and figured better safe than sorry. I didn’t want a typo routing my server through my phone. :) Even so, the setup was quick enough. Thanks.

    13. 4

      @tedu I’m stealing that warning box. ;)

      1. 7

        That’s my box! Hoping I can remove that pretty soon, as I’m not so sure how relevant it actually is anymore.

        1. 3

          I like the old-school, text stuff. So, great design!

    14. 7

      I really have no idea what problem this would fix.

      1. 15

        First, it’d unify what Edge even means: “Edge” on Android and iOS is Blink and WebKit, respectively, while it’s EdgeHTML on Windows. It’d now be Blink- or WebKit-based everywhere. (And it’d mean that they could do Edge for macOS or Linux with a straight face, too.)

        Second, as freddyb points out, it drastically cuts resource use. I disagree that Chrome is more secure, and definitely disagree that it’s less resource-heavy, but it almost certainly takes fewer engineers to improve Chrome than to build an entirely separate browser.

        Third, Microsoft is already using Chromium, via their Electron apps, especially dev tooling (Visual Studio Code and various Azure components). This would allow more devs to focus on just one engine, and perhaps pave the way for better Windows integration there.

        Fourth, it ironically gets Microsoft out of compatibility hell. Many sites are incompatible with Edge because they’re so tightly bound to Chrome. This sidesteps that.

        And finally, having Edge just isn’t a competitive advantage anymore. Even if, for sake of argument, Edge is lighter and more secure than Chrome, no one is buying Windows over it. That makes it a lousy thing to emphasize as much as they are, dev-resource-wise.

        1. 1

          “Edge” on Android and iOS is Blink and WebKit, respectively, while it’s EdgeHTML on Windows.

          All browsers on iOS are WebKit, even Firefox. Nobody has a choice there. But I don’t see Edge on Android switching away from Blink either, where they could if they wanted to. The three codebases for all three platforms have essentially nothing to do with each other either.

      2. 6

        Market share. Revenue. Resource allocation. Security. Lots, really.

        1. 2

          That certainly seems plausible. Thanks.

    15. 23

      I’d like a column in here that notes whether there are desktop apps that don’t involve Electron.


      I see Tox in this list. Last time I looked at their crypto, it was pretty horrible – see https://github.com/TokTok/c-toxcore/issues/426 for example – with a core developer eventually admitting there, “We haven’t got to the point where we can enumerate [Tox’s security guarantees] properly, given the general lack of understanding of the code and specification.” No clue how far they got since that thread – if they moved forward at all – but, well, it’s certainly not a messaging system designed by cryptographers. Careful!

      So maybe the list would also benefit from a column called “crypto is good” or something sufficiently vague that you can include Signal and exclude Tox, for example.

      1. 1

        Sorry for the delay. I’ve noted this under the E2E Audit column with a link to the github issue.

    16. 4

      There’s lots of stuff in the jd/curve-comparison branch, if you wind up trying this.

      I used this a lot when working on WireGuard’s crypto library – https://git.zx2c4.com/WireGuard/tree/src/crypto/

      1. 2

        I’d have probably tried to run it on dedicated hardware with a minimal OS or runtime to prevent OS interference or disruption. You just letting it hang your interactive kernel for a while was a… great idea. I don’t think I’d have ever thought of doing that.

    17. 2

      I use Algo by Trail of Bits. It’s super easy to set up (has a command-line wizard to walk you through setting up your server on a variety of providers), generates mobile profiles for iOS to connect on demand on unknown networks, and has super secure defaults.

      1. 1

        Algo supports WireGuard now too, which is nice.

    18. 5

      As exciting as this is, I’m wary of the dependency on GNU tools, even though I understand that providing an openbsd-culture-friendly implementation would require extra work and could be a maintenance nightmare, with two different codebases for the shell scripts. But perhaps gmake could be replaced with something portable.

      1. 12

        This version of WireGuard was written in Go, which means it can run on exactly 2 (amd64, i386) of the 13 platforms supported by OpenBSD.

        The original WireGuard implementation written in C is a Linux kernel module.

        A dependency on gmake is the least of all portability worries in this situation.

        1. 18

          While it’s unfortunate that Go on OpenBSD only supports 386 and amd64, Go does support more architectures that are also supported by OpenBSD, specifically arm64 (I wrote the port), arm, mips, and power. I have also implemented Go support for sparc64, but for various reasons this wasn’t integrated upstream.

          Go also supports power, and it used to run on the power machines supported by OpenBSD, but sadly now it only runs on more modern power machines, which I believe are not supported by OpenBSD. However, it would be easy to revert the changes that require more modern power machines. There’s nothing fundamental about them, just that the IBM maintainer refused to support such old machines.

          Since Go supports both OpenBSD and the architectures mentioned, adding support in Go for OpenBSD+$GOARCH is only a few hours of work, so if there is interest there would not be any problem implementing this.

          I can help and offer advice if anyone is willing to do the work.

          1. 3

            Thanks for your response! I didn’t know that Go supports so many platforms.

            Go support for sparc64, but for various reasons this wasn’t integrated

            Let me guess: Nobody wanted to pay the steep electricity bill required to keep a beefy sparc64 machine running?

            1. 25

              No, that wasn’t the problem. The problem was that my contract with Oracle (who paid me for the port) had simply run out of time before we had a chance to integrate.

              Development took longer than expected (because SPARC is like that). In fact it took about three times longer than developing the arm64 port. The lower-level bits of the Go implementation have been under constant churn, which prevented us from merging the port because we were never quite synced up with upstream. We were playing a game of whack-a-mole with upstream: as soon as we merged the latest changes, upstream had diverged again. In the end my contract with Oracle had finished before we were able to merge.

              This could all have been prevented if Google had let us have a dev.sparc64 branch, but because Google is Google, only Google is allowed to have upstream branches. All other development must happen at tip (impossible for big projects like this, also disallowed by internal Go rules), or in forks that then have to keep up.

              The Go team uses automated refactoring tools, or sometimes even basic scripts to do large scale refactoring. As we didn’t have access to any of these tools, we had to do the equivalent changes on our side manually, which took a lot of time and effort. If we had an upstream branch, whoever did these refactorings could have simply used the same tools on our code and we would have been good.

              I estimate we spent more effort trying to keep up with upstream than actually developing the sparc support.

              As for paying for electricity, Oracle donated one of the first production SPARC S7-2 machines (serial number less than 100) to the Go project. Google refused to pay for hosting this machine (that’s why it’s still sitting next to me as I type this).

              In my opinion, having been involved with Go since the day of the public release, the Go team at Google is unfortunately very unsympathetic to large-scale work done by non-Google people. Not actively hostile; they thanked me for the arm64 port, and I’m sure they are happy somebody did that work, but indirectly hostile in the sense that the way the Go team operates is not compatible with large-scale outside contributions.

              1. 1

                Having to manually follow automated tools has to suck. I’d be overwhelmed by the tedium or get side-tracked trying to develop my own or something. Has anyone attempted a Go-to-C compiler to side-step all these problems? I originally thought something like that would be useful just to accelerate all the networking stuff being done in Go.

                1. 2

                  There is gccgo, which is a frontend for gcc. Not quite a transpiler but it does support more architectures than the official compiler.

                  1. 1

                    Yeah, that sounds good. It might have a chance of performing better, too. The thing working against that is that the Go compiler is designed for optimizing that language, with gccgo just being co-opted. Might be interesting to see if any of the servers or whatever perform better with gccgo. I’d lean toward LLVM, though, given that it seems more optimization research goes into it.

                2. 2

                  The Go team wrote such a (limited) transpiler to convert the Go compiler itself from C to Go.

                  edit: sorry, I misread your comment - you asked for Go 2 C, not the other way around.

                  1. 1

                    Hey, that’s really cool, too! Things like that might be a solution to security of legacy code whose language isn’t that important.

            2. 1

              But these people are probably more than comfortable with cryptocurrency mining 🙃

          2. 3

            Go also supports power, and it used to run on the power machines supported by OpenBSD, but sadly now it only runs on more modern power machines, which I believe are not supported by OpenBSD. However, it would be easy to revert the changes that require more modern power machines. There’s nothing fundamental about them, just that the IBM maintainer refused to support such old machines.

            The really stupid part is that Go since 1.9 requires POWER8… even on big-endian systems, which is very pointless because most people running big-endian PPC are doing it on pre-POWER8 systems (there are still a lot!) or a big-endian-only OS (AIX and OS/400). You tell upstream, but they just shrug at you.

            1. 4

              I fought against that change, but lost.

          3. 2

            However, it would be easy to revert the changes that require more modern power machines.

            Do you have a link to a revision number or source tree which has the code to revert? I still use a macppc (32 bit) that I’d love to use Go on.

            1. 3

              See issue #19074. Apparently someone from Debian already maintains a POWER5 branch.

              Unfortunately that won’t help you though. Sorry for speaking too soon. We only ever supported 64 bit power. If macppc is a 32-bit port, this won’t work for you, sorry.

              1. 3

                OpenBSD/macppc is indeed 32-bit.

                I kinda wonder if, say, an OpenBSD/power port is feasible; fast-ish POWER6 hardware is getting cheap (like $200) used and not hard to find. (And again, all pre-P8 POWER HW in 64-bit mode is big-endian only.) It all depends on developer interest…

                1. 3

                  Not to mention that one Talos board was closer to two grand than eight or ten. Someone could even sponsor the OpenBSD port by buying some dev the base model.

                  1. 3

                    Yeah, thankfully you can still run ppc64be stuff on >=P8 :)

        2. 2

          This version of WireGuard was written in Go, which means it can run on exactly 2 (amd64, i386)

          That and syspatch make me regret buying an EdgeRouter Lite instead of saving up for an apu2.

      2. 2

        I’m a bit put off by the dependency on bash on all platforms. Couldn’t this be achieved with a more portable (POSIX sh) script instead?

        1. 3

          You don’t have to use wg-quick(8) – the thing that uses bash. You can instead set things up manually (which is really easy; WireGuard is very simple after all), and just use wg(8), which only depends on libc.

        2. 2

          I think the same as you; I’m sure it is possible to achieve the same results using portable scripts. I’m aware of the conveniences bash offers, but it is big, slow, and prone to bugs.