Threads for crawshaw

    1. 3

      Tailscalar here. I didn’t tick “I’m the author” because I’m not actually the author of this doc. But I am from Tailscale. I’m not certain what the correct etiquette for this is, but I do believe this is a genuinely interesting piece of technology so I thought I’d try sharing it.

      1. 1

        webdesign feedback: the ToC on the right is longer than a screen height for me, but not independently scrollable - I need to scroll the entire page towards the end for the ToC to scroll too and reveal the last few entries.

      2. 1

        I am not sure if this is the right place to bring this up, and it’s certainly not related to the posted link, but has there been any work on improving the energy efficiency of Tailscale? Tailscale significantly affects my macOS/iOS devices’ battery life. I know that part of the problem is Go itself, since the Go runtime is not optimized for mobile devices, but still…

    2. 19

      They do mention it in passing, but I really can’t help but feel that the approach outlined here is probably not the best option in most cases. If you are measuring your memory budget in megabytes, you should probably just not use a garbage collected language.

      1. 21

        All of the memory saved with this linker work had nothing to do with garbage collection.

        1. 7

          Sure, but that’s tangential to my point. In a GC’d language, doing almost anything will generate garbage. Calling standard library functions will generate garbage. This makes it difficult to have really tight control of your memory usage. If you were to use, for example, C++ (or Rust if you want to be trendy) you could carefully preallocate pretty much everything, and at runtime have no dynamic allocation (or very little, and carefully bounded, depending on your problem and constraints). This would be (for my skillset, at least) a much easier way to keep memory usage down. They do mention they have a lot of Go internals expertise, so maybe the tradeoff is different for them, but that seems like an uncommon scenario.
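
          To illustrate the preallocation idea above, here is a minimal sketch. It is written in Go only because Go is the thread's subject; the comment itself suggests C++ or Rust. The names encodeFresh, encodeInto, sink and measure are hypothetical. testing.AllocsPerRun is the standard library helper for counting heap allocations per call.

          ```go
          package main

          import (
          	"fmt"
          	"testing"
          )

          // sink is a package-level variable; storing into it forces the
          // freshly made slice in encodeFresh to live on the heap.
          var sink []byte

          // encodeFresh allocates a new buffer on every call.
          func encodeFresh(id uint32) []byte {
          	out := make([]byte, 0, 64)
          	return append(out, byte(id>>24), byte(id>>16), byte(id>>8), byte(id))
          }

          // encodeInto writes into caller-owned storage and only allocates
          // if dst is too small to hold the result.
          func encodeInto(dst []byte, id uint32) []byte {
          	dst = dst[:0]
          	return append(dst, byte(id>>24), byte(id>>16), byte(id>>8), byte(id))
          }

          // measure reports average heap allocations per call for both styles.
          func measure() (fresh, reused float64) {
          	scratch := make([]byte, 0, 64) // preallocated once, up front
          	fresh = testing.AllocsPerRun(100, func() { sink = encodeFresh(42) })
          	reused = testing.AllocsPerRun(100, func() { scratch = encodeInto(scratch, 42) })
          	return fresh, reused
          }

          func main() {
          	fresh, reused := measure()
          	fmt.Println("allocs per call, fresh buffer:", fresh)
          	fmt.Println("allocs per call, reused buffer:", reused)
          }
          ```

          The sketch only covers code you write yourself. It says nothing about allocations made inside library calls, which is the harder part of the comment's argument.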

        2. 1

          I wouldn’t say that, because it’s likely that they wouldn’t have been short on memory to begin with if they hadn’t used a GC language. (And yes, I’m familiar with the pros and cons of GC; I’m writing a concurrent compacting GC right now for work.)

      2. 2

        Only maybe. Without a GC, long-running processes can end up with really fragmented memory. With a GC you can compact and not waste address space on dead objects.

        1. 18

          If you’re really counting megs, perhaps the better option is to forgo dynamic heap allocations entirely, like an embedded system does.

          1. 4

            Technically yes. But they probably used this to deploy one code base for everything, instead of rewriting this only for the iOS part.

          2. 2

            Exactly this. You can try to do this in a gced language, and even make some progress, but you will be fighting the language.

          3. -2

            You should probably write it all in assembly language too.

            1. 7

              I feel like you’re being sarcastic, but making most of the app not do dynamic allocations is not a crazy or extreme idea. It’s not super common in phone apps and the system API itself may force some allocations. But doing 90+% of work in statically allocated memory and indexed arenas is a valid path here.

              Of course that would require a different language than Go, which they have good reasons not to do.
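
              For readers unfamiliar with the term, an "indexed arena" can be sketched as below. This is an illustration only, written in Go to match the thread's subject even though the comment argues for a different language. PeerArena, Peer, maxPeers and ErrArenaFull are hypothetical names, and the capacity of 4 is arbitrary.

              ```go
              package main

              import (
              	"errors"
              	"fmt"
              )

              // maxPeers is a hypothetical capacity fixed at compile time.
              const maxPeers = 4

              // Peer links to other peers by arena index rather than by pointer.
              type Peer struct {
              	ID   uint32
              	Next int32 // index of another Peer in the arena, or -1 for none
              }

              // ErrArenaFull is returned instead of growing, so the caller can
              // report the limit rather than run past a memory budget.
              var ErrArenaFull = errors.New("peer arena full")

              // PeerArena stores every Peer in one fixed-size array.
              type PeerArena struct {
              	slots [maxPeers]Peer
              	used  int
              }

              // Alloc claims the next unused slot and returns its index.
              func (a *PeerArena) Alloc(id uint32) (int32, error) {
              	if a.used == maxPeers {
              		return -1, ErrArenaFull
              	}
              	idx := int32(a.used)
              	a.slots[idx] = Peer{ID: id, Next: -1}
              	a.used++
              	return idx, nil
              }

              // Get returns a pointer into the arena's own storage.
              func (a *PeerArena) Get(idx int32) *Peer { return &a.slots[idx] }

              // Reset releases every slot at once; there is no per-object free.
              func (a *PeerArena) Reset() { a.used = 0 }

              func main() {
              	var arena PeerArena
              	for id := uint32(1); ; id++ {
              		if _, err := arena.Alloc(id); err != nil {
              			fmt.Printf("could not add peer %d: %v\n", id, err)
              			break
              		}
              	}
              }
              ```

              The explicit error on a full arena is the design point: the capacity limit becomes something the program can report.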

              1. 1

                I’m being sarcastic. But one of the issues identified in the article is that different tailnets have different sizes and topologies - they rejected the idea of limiting the size of networks that would work with iOS which is what they’d need to do if they wanted to do everything statically allocated.

                1. 3

                  “they rejected the idea of limiting the size of networks”

                  They’re already limited. They can’t use more than the allowed memory, so the difference is - does the app tell you that you reached the limit, or does it get silently killed.

                  I believe that fragment was about “how another team would solve it while keeping other things the same” (i.e. keeping Go). Preallocation/arenas would require moving away from Go, so it would give them more possible connections, not fewer.

        2. 10

          That is absolutely not my experience with garbage collectors.

          Few are compacting/moving, and even fewer are designed to operate well in low-memory environments[1]. golang’s collector is none of that.

          On the other hand, it is usually trivial to avoid wasting address space in languages without garbage collectors, and an application-specific memory management scheme typically gives a 2-20x performance boost in a busy application. I would think this absolutely worth the limitations in an application like this.

          [1]: not that I think 15mb is terribly low-memory. If you can syscall 500 times a second, that equates to about 2.5gb/sec transfer filling the whole thing - a speed which far exceeds the current (and likely next two) generations of iOS devices.

          1. 4

            To back up what you’re saying, this presentation on the future direction that the Golang team are aiming to take is worth reading. https://go.dev/blog/ismmkeynote

            At the end of that presentation there’s some tea-leaf reading about the likely direction that hardware development is likely to go in. Golang’s designers are betting on DRAM capacity improving in future faster than bandwidth improvements and MUCH faster than latency improvements.

            Based on their predictions about what hardware will look like in future, they’re deliberately trading off higher total RAM usage in order to get good throughput and very low pause times (and they expect to move further in that direction in future).

            One nitpick:

            “Few are compacting/moving,”

            Unless my memory is wildly wrong, Haskell’s generation 1 collector is copying, and I’m led to understand it’s pretty common for the youngest generation in a generational GC to be copying (which implies compaction) even if the later ones aren’t.

            I believe historically a lot of functional programming languages have tended to have copying GCs.
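
            On the RAM-for-throughput trade-off mentioned above: Go programs can move along that trade-off through two knobs in runtime/debug. The following is a minimal sketch, assuming Go 1.19 or newer (needed for SetMemoryLimit). The helper name configureForSmallHeap and the values 25 and 12 MiB are hypothetical, not taken from the article or the linked talk.

            ```go
            package main

            import (
            	"fmt"
            	"runtime/debug"
            )

            // configureForSmallHeap is a hypothetical helper that asks the runtime
            // to collect more eagerly and to respect a soft memory limit.
            // It returns the settings that were in effect before the call.
            func configureForSmallHeap(gcPercent int, limitBytes int64) (prevPercent int, prevLimit int64) {
            	// A lower percentage triggers collections after less new allocation
            	// relative to the live heap: less RAM, more GC work.
            	prevPercent = debug.SetGCPercent(gcPercent)
            	// The memory limit is soft: the runtime collects harder as it is
            	// approached, but it does not guarantee staying under it.
            	prevLimit = debug.SetMemoryLimit(limitBytes)
            	return prevPercent, prevLimit
            }

            func main() {
            	prevPercent, prevLimit := configureForSmallHeap(25, 12<<20)
            	fmt.Println("previous GC percent:", prevPercent)
            	fmt.Println("previous memory limit:", prevLimit)
            }
            ```

            Both settings can also be supplied through the GOGC and GOMEMLIMIT environment variables instead of code.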

            1. 2

              “At the end of that presentation there’s some tea-leaf reading about the likely direction that hardware development is likely to go in. Golang’s designers are betting on DRAM capacity improving in future faster than bandwidth improvements and MUCH faster than latency improvements.”

              Given the unprecedented semiconductor shortages, as well as crypto’s market influence slowly spreading out of the GPU space, that seems a risky bet to me.

              1. 1

                That’s the short term, but it’s not super relevant either way. They’re betting on the ratios between these quantities changing, not on the exact rate at which they change. If overall price goes down slower than desired, that doesn’t really have any bearing.

          2. 1

            Aren’t most GCs compacting and moving?

            The first multi-user system I used heavily was a SunOS 4.1.3 system with 16MB of RAM. It was responsive with a dozen users so long as they weren’t all running Emacs. Emacs, written in a garbage-collected, interpreted language, would have run well on a much smaller system if there was only one user.

            The first OS I worked on ran in 16MB of RAM and ran a Java VM and that worked well.

        3. 1

          Any non-moving allocator is vulnerable to fragmentation from adversarial workloads (see Robson bounds), but modern size-class slab allocators (“segregated storage” in the classical allocation literature) typically keep fragmentation quite minimal on real-world workloads. (But see a fascinating alternative to compaction for libc-compatible allocators: https://github.com/plasma-umass/Mesh.)
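
          As a rough illustration of the size-class ("segregated storage") idea, here is a toy sketch in Go. It assumes power-of-two classes between 16 and 4096 bytes. These bounds and the names sizeClass and Pool are hypothetical and do not describe any particular real allocator.

          ```go
          package main

          import "fmt"

          // Illustrative class bounds, not taken from any real allocator.
          const (
          	minClass = 16
          	maxClass = 4096
          )

          // sizeClass rounds a request up to the next power-of-two class.
          // It returns 0 for requests outside (0, maxClass].
          func sizeClass(n int) int {
          	if n <= 0 || n > maxClass {
          		return 0
          	}
          	c := minClass
          	for c < n {
          		c <<= 1
          	}
          	return c
          }

          // Pool keeps one free list per class. A freed block is only reused
          // for requests of the same class, so blocks never need splitting
          // or coalescing.
          type Pool struct {
          	free map[int][][]byte
          }

          func NewPool() *Pool { return &Pool{free: make(map[int][][]byte)} }

          // Get returns a slice of length n backed by a block of n's class.
          func (p *Pool) Get(n int) []byte {
          	if n <= 0 {
          		return nil
          	}
          	c := sizeClass(n)
          	if c == 0 {
          		return make([]byte, n) // too large for any class
          	}
          	if list := p.free[c]; len(list) > 0 {
          		b := list[len(list)-1]
          		p.free[c] = list[:len(list)-1]
          		return b[:n]
          	}
          	return make([]byte, n, c)
          }

          // Put returns a block to the free list of its class.
          func (p *Pool) Put(b []byte) {
          	c := cap(b)
          	if c == 0 || sizeClass(c) != c {
          		return // not a block this pool handed out
          	}
          	p.free[c] = append(p.free[c], b[:0])
          }

          func main() {
          	for _, n := range []int{1, 17, 100, 4096} {
          		fmt.Printf("request %4d bytes -> class %4d\n", n, sizeClass(n))
          	}
          }
          ```

          With power-of-two classes, the space lost inside a block is less than half the block for requests above half the smallest class. Real allocators typically use more finely spaced classes to reduce that waste.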

      3. 1

        This does strike me as a place where refcounting might be a better approach, if you’re going to have any dynamic memory at all.

        1. 1

          With ref-counting you have problems with cycles and memory fragmentation. The short-term memory consumption is typically lower with ref-counting than a compacting GC, but there are many more opportunities to have leaks and grow over time. For a long-running process I’m skeptical that ref-counting is a sound choice.

          1. 1

            Right. I was thinking that for this kind of problem with sharply limited space available you’d avoid the cycles problem by defining your structs so there’s no void* and the types form a DAG.
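
            The cycle problem discussed here can be shown with a toy manual reference count. This is only a sketch in Go (which has its own tracing GC, so the counting is purely illustrative). Node, AddChild, Release and freed are hypothetical names. The sketch does not enforce the stronger idea above of making cycles unrepresentable at the type level; it only contrasts an acyclic graph with a cyclic one.

            ```go
            package main

            import "fmt"

            // Node is a hypothetical manually reference-counted object.
            type Node struct {
            	name     string
            	refs     int
            	children []*Node
            }

            // freed records the order in which nodes reach a count of zero.
            var freed []string

            // NewNode returns a node holding one reference owned by its creator.
            func NewNode(name string) *Node { return &Node{name: name, refs: 1} }

            // AddChild stores a strong reference to c and bumps its count.
            func (n *Node) AddChild(c *Node) {
            	c.refs++
            	n.children = append(n.children, c)
            }

            // Release drops one reference. At zero the node is recorded as freed
            // and its references to its children are released in turn.
            func (n *Node) Release() {
            	n.refs--
            	if n.refs > 0 {
            		return
            	}
            	freed = append(freed, n.name)
            	for _, c := range n.children {
            		c.Release()
            	}
            	n.children = nil
            }

            func main() {
            	// Acyclic: root -> leaf. Dropping the creators' references frees both.
            	root, leaf := NewNode("root"), NewNode("leaf")
            	root.AddChild(leaf)
            	leaf.Release()
            	root.Release()

            	// Cyclic: a <-> b. Each keeps the other's count above zero forever.
            	a, b := NewNode("a"), NewNode("b")
            	a.AddChild(b)
            	b.AddChild(a)
            	a.Release()
            	b.Release()

            	fmt.Println("freed:", freed)
            	fmt.Println("counts left in the cycle:", a.refs, b.refs)
            }
            ```

            In the acyclic case both nodes are released. In the cyclic case neither count reaches zero, which is the leak the parent comment warns about.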

      4. 1

        Edit: reverting unfriendly comment of dubious value.

    3. 2

      This article, and the presentation at GopherCon, really gets Go: “The ultimate best practice is to embrace simplicity”