1. 59
  1.  

  2. 8

    This was an interesting article; it breaks down the issues with net.IP and describes the path to the current solution well.

    But.

    This isn’t a difficult problem. Don’t waste a ton of space, don’t allocate everywhere, make it possible to make the type a key in the language’s standard map implementation. In C++, this would’ve been easy. In Rust, this would’ve been easy. In C, this would’ve been easy (assuming you’re using some kind of halfway decent map abstraction). It doesn’t speak well of Go’s aspiration to be a systems programming language that doing this easy task in Go requires a bunch of ugly hacks and a separate package to make a string deduplicator which uses uintptrs to fool the garbage collector and relies on finalizers to clean up. I can’t help but think that this would’ve been a very straightforward problem to solve in Rust with traits or C++ with operator overloading or even Java with its Comparable generic interface.

    That’s not to say that the resulting netaddr.IP type is bad, it seems like basically the best possible implementation in Go. But there are clearly some severe limitations in the Go language to make it necessary.

    1. 11

      Almost all of the complexity that happened here is related to the ipv6 zone string combined with fitting the value in 24 bytes. Given that a pointer is 8 bytes and an ipv6 address is 16 bytes, you must use only a single pointer for the zone. Then, having amortized zero allocations with no space leaks for the zone portion, some form of interning with automatic cleanup is required.
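
      To make the size arithmetic concrete, here is a minimal Go sketch. The `zonedIP` type and its field names are hypothetical and are not the actual netaddr.IP definition; the sizes mentioned assume a typical 64-bit platform.

      ```go
      package main

      import (
      	"fmt"
      	"unsafe"
      )

      // zonedIP is a hypothetical layout, not the real netaddr.IP:
      // 16 bytes of address plus exactly one pointer for the zone.
      type zonedIP struct {
      	hi, lo uint64  // 128-bit address (an IPv4 address would be stored mapped into it)
      	zone   *string // nil when there is no zone
      }

      // sizes reports the size of the struct, of a pointer, and of a string header.
      func sizes() (ip, ptr, str uintptr) {
      	var p *string
      	var s string
      	return unsafe.Sizeof(zonedIP{}), unsafe.Sizeof(p), unsafe.Sizeof(s)
      }

      func main() {
      	ip, ptr, str := sizes()
      	// On a typical 64-bit platform this is expected to report 24, 8 and 16.
      	fmt.Println(ip, ptr, str)
      }
      ```

      A string header is a pointer plus a length, so storing the zone as a plain `string` would not fit the budget. With a `*string`, the struct is still comparable, but Go compares pointers by address, so two addresses with the same zone only compare equal if both point at the same string. That is where interning comes in.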

      If this is as easy as you claim in C/C++/Rust/whatever real systems language you want, can you provide a code snippet implementing it? I’d be happy to audit to see if it does meet the same (or better!) constraints.

      1. 6

        Here’s a C++ version: https://godbolt.org/z/E3WGPb - see the bottom for a usage example.

        Now, C++ is a terrible language in many ways. It makes everything look super complicated, and there’s a lot of seemingly unnecessary code there, but almost all of that stems from having to make my own RAII type, which includes writing the default constructor, the move constructor, the copy constructor, the destructor, the move operator= and the copy operator=. That complexity is just par for the course in C++.

        One advantage of the netaddr.IP type is that it doesn’t allocate for every zone, just for every new zone, thanks to the “intern” system. My code will allocate space for the zone for every IPv6 address with a zone. One could definitely implement a “zone cache” system for my IPZone type though, maybe using a shared_ptr instead of a raw pointer for refcounting. One would have to look at usage patterns to see whether the extra complexity and potential memory/CPU overhead would be worth it or if zones are so infrequently used that it doesn’t matter. At least you have the choice in C++ though (and it wouldn’t rely on finalizers and fooling the GC).

        1. 7

          They also had the choice to just make a copy of every string when parsing and avoid all of the “ugly hacks”. Additionally, a shared_ptr is 16 bytes, so you’d have to figure out some other way to pack that in to the IPAddress without allocations. So far, I don’t think you’ve created an equivalent type without any “ugly hacks”. Would you like to try again?

          1. 6

            I don’t think they had the choice to just copy the zone strings? My reading of the article was that the intern system was 100% a result of the constraint that A) IP addresses with no zone should be no bigger than 24 bytes and B) it should be possible to use IP addresses as keys. I didn’t see concern over the memory usage of an IP address’s zone string. Whether that’s important or not depends on whether zones are used frequently or almost never.

            It’s obviously hard to write a type when the requirements are hypothetical and there’s no data. But here’s a version with a zone string cache: https://godbolt.org/z/P9MWvf. Here, the zone is a uint64_t on the IP address, where 0 represents an IPv4 address, 1 represents an IPv6 address with no zone, and any other number refers to some refcounted zone kept in that IPZoneCache class. This is the “zone mapping table” solution mentioned in the article, but it works properly because the IPAddress class’s destructor decrements the reference count.
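
            For readers who don't want to open the link, the same encoding can be sketched in Go. This is not the code behind the godbolt link. All names are hypothetical, and it assumes explicit `acquire`/`release` calls standing in for the C++ constructor and destructor, since Go has no destructors.

            ```go
            package main

            import "fmt"

            // Reserved zone IDs, following the encoding described above.
            const (
            	zoneV4     uint64 = 0 // IPv4 address
            	zoneV6None uint64 = 1 // IPv6 address without a zone
            )

            type zoneEntry struct {
            	name string
            	refs int
            }

            // zoneCache is a hypothetical refcounted table from zone ID to zone name.
            // It is not safe for concurrent use.
            type zoneCache struct {
            	byName map[string]uint64
            	byID   map[uint64]*zoneEntry
            	nextID uint64
            }

            func newZoneCache() *zoneCache {
            	return &zoneCache{
            		byName: map[string]uint64{},
            		byID:   map[uint64]*zoneEntry{},
            		nextID: 2, // 0 and 1 are reserved
            	}
            }

            // acquire returns the ID for name, creating an entry or bumping its refcount.
            func (c *zoneCache) acquire(name string) uint64 {
            	if id, ok := c.byName[name]; ok {
            		c.byID[id].refs++
            		return id
            	}
            	id := c.nextID
            	c.nextID++
            	c.byName[name] = id
            	c.byID[id] = &zoneEntry{name: name, refs: 1}
            	return id
            }

            // release drops one reference and removes the entry when none remain.
            func (c *zoneCache) release(id uint64) {
            	e, ok := c.byID[id]
            	if !ok {
            		return // reserved or unknown ID
            	}
            	e.refs--
            	if e.refs == 0 {
            		delete(c.byName, e.name)
            		delete(c.byID, id)
            	}
            }

            func main() {
            	c := newZoneCache()
            	a := c.acquire("eth0")
            	b := c.acquire("eth0")
            	fmt.Println(a == b, len(c.byID)) // same ID, one entry
            	c.release(a)
            	c.release(b)
            	fmt.Println(len(c.byID)) // entry removed after the last release
            }
            ```

            In Go, nothing calls `release` automatically when an address value goes out of scope, so a table like this leaks unless every caller remembers to release.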

            1. 7

              I don’t think they had the choice to just copy the zone strings? My reading of the article was that the intern system was 100% a result of the constraint that A) IP addresses with no zone should be no bigger than 24 bytes and B) it should be possible to use IP addresses as keys.

              Indeed, interning is required by the 24 byte limit. That interning avoids copies seems to be a secondary benefit that meets the “allocation free” goal. It was a mistake to imply that copying would allow a 24 byte representation and that interning was only to reduce allocations.

              That said, your first solution gets away with avoiding interning because it uses C style (null terminated) strings so the reference only takes up a single pointer. Somehow, I don’t think that people would be happier if Go allowed or used C style strings, though, and some might consider using them an “ugly hack”.

              I didn’t see concern over the memory usage of an IP address’s zone string. Whether that’s important or not depends on whether zones are used frequently or almost never.

              One of the design criteria in the article was “allocation free”.

              It’s obviously hard to write a type when the requirements are hypothetical and there’s no data. But here’s a version with a zone string cache: https://godbolt.org/z/P9MWvf.

              Great! From what I can tell, this does indeed solve the problem. I appreciate you taking the time to write these samples up.


              I have a couple of points to make about your C++ version and some hypothetical C or Rust versions as compared to the Go version, though.

              1. It took your C++ code approximately 60 lines to create the ref-counted cache for interning. Similarly, stripping comments and reducing the intern package they wrote to a similar feature set also brings it to around 60 lines. Since it’s not more code, I assume the objection is to the kind of code that is written? If so, I can see that the C++ code you provided looks very much like straightforward C++ code whereas the Go intern package is very much not. That said, the authors of the intern package often work on the Go runtime where these sorts of tricks are more common.

              2. A hypothetical C solution that mirrors your C++ solution would need a hash-map library (as you stated). Would you not consider it an ugly hack to have to write one of those every time? Would that push the bar for implementing it in C from “easy” towards “difficult”? Why should the Go solution not be afforded the same courtesy under the (now valid) assumption that an intern library exists?

              3. I’ll note that when other languages gain a library that increases the capabilities, even if that library does unsafe hacks, it’s often viewed as a positive sign that the language is powerful enough to express the concept. Why not in this case?

              4. In a hypothetical Rust solution, the internal representation (I think. Please correct me if I’m wrong) can’t use the enum feature because the tag would push the size limits past 24 bytes. Assuming that’s true, would you consider it an ugly hack to hand-roll your own union type, perhaps using unsafe, to get the same data size layout?

              5. All of these languages would solve the problem easily and idiomatically if the size were allowed to be 32 bytes and allocations were allowed (this is take 2 in the blog post). Similarly, I think they all have to overcome significant and non-obvious challenges to hit 24 bytes with no allocations as the authors did.
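
              To make the interning idea in points 1 to 3 concrete, here is a minimal Go sketch. It is not the intern package discussed above. The `interner` type is hypothetical and assumes a plain map with no cleanup, so it keeps one entry per distinct string forever, which is exactly the leak the real package works to avoid.

              ```go
              package main

              import (
              	"fmt"
              	"sync"
              )

              // interner is a hypothetical, leak-prone string interner:
              // equal strings always map to the same pointer.
              type interner struct {
              	mu   sync.Mutex
              	vals map[string]*string
              }

              // get returns the canonical pointer for s, creating it on first use.
              func (in *interner) get(s string) *string {
              	in.mu.Lock()
              	defer in.mu.Unlock()
              	if p, ok := in.vals[s]; ok {
              		return p
              	}
              	if in.vals == nil {
              		in.vals = make(map[string]*string)
              	}
              	p := &s
              	in.vals[s] = p
              	return p
              }

              func main() {
              	var in interner
              	a := in.get("eth0")
              	b := in.get("eth" + "0")
              	c := in.get("eth1")
              	fmt.Println(a == b, a == c) // true false
              }
              ```

              Because equal zones share one pointer, comparing the 8-byte pointers is enough to compare zones, which is what lets a struct holding such a pointer work as a map key.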


              Anyway, I want to thank you for engaging and writing some code to demonstrate the type in C++. That’s effort you don’t usually get on the internet. This conversation has caused me to update my beliefs to agree more with adding interning or weak references to the language/standard library. Hopefully my arguments have been as useful to you.

      2. 4

        I agree—if Go is a systems language. But I don’t think it ever was supposed to be. Or if it was, it’s (in my opinion) really bad at it. Definitely worse than even something like C#, for exactly the reasons you’re highlighting.

        I think Go was originally designed more as a much faster alternative to Python (or perhaps Java), specifically for Google’s needs, and thus designed to compete with those for high-performance servers. And it’s fine at that. And I’ve thought about solving this kind of issue in those languages, too, using things like array in Python for example.

        So I agree Go isn’t a good systems language, but I think that was a bit of retcon. It’s a compiled high-level language that could replace Python usage at Google. It’s not competing with Rust, C, Zig, etc.

        1. 3

          Ok, I can buy that. IIRC, it was originally promoted as a systems language, but it seems like they’ve gone away from that branding as well. There’s a lot of value to something like “a really fast, natively compiled Python”.

          But even then, this article seems to demonstrate a pretty big limitation. Something as simple as using a custom IP address type as the key in a map, ignoring everything performance-related, seems extremely difficult. How would you write an IP address struct which stores an IPv4 address or an IPv6 address with an optional zone, which can be used as the key in a map, even ignoring memory usage and performance? Because that would be easy in Python too; just implement __hash__ and __eq__.
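
          As a point of reference for that question, here is a minimal Go sketch under the stated assumption that memory usage and performance are ignored. The `simpleIP` type and `v6` helper are hypothetical. Go's built-in map accepts any key type whose fields are all comparable, so arrays and strings work, while slices (such as net.IP) do not. Equality and hashing are fixed by the language and cannot be customised the way __hash__ and __eq__ can.

          ```go
          package main

          import "fmt"

          // simpleIP is a hypothetical type that ignores size and allocation goals.
          // Every field is comparable, so the struct can be used as a map key.
          type simpleIP struct {
          	addr [16]byte // an IPv4 address would be stored as an IPv4-mapped IPv6 address
          	is4  bool     // true when the value represents an IPv4 address
          	zone string   // empty when there is no zone
          }

          // v6 builds an IPv6 address with an optional zone.
          func v6(addr [16]byte, zone string) simpleIP {
          	return simpleIP{addr: addr, zone: zone}
          }

          func main() {
          	loopback := [16]byte{15: 1} // ::1
          	hosts := map[simpleIP]string{
          		v6(loopback, ""):     "localhost",
          		v6(loopback, "eth0"): "localhost on eth0",
          	}
          	fmt.Println(hosts[v6(loopback, "eth0")], len(hosts))
          }
          ```

          The zone here is a full string header rather than a single pointer, so this version is larger than the 24 byte target discussed elsewhere in the thread.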

          This is a problem which isn’t just related to Go’s positioning, be it a “systems language” or a “faster python”. Near the bottom we have C, where an IP address -> whatever map is about as difficult as any other kind of map. Slightly above, we have C++ and Rust, where the built-in types let you use your IP address class/struct as a key with no performance penalty, since you stamp out a purpose-built “IP address to whatever” map using templates. Above that again, we have Java and C#, which also make it easy, though at a performance cost due to virtual calls (because generics aren’t templates), though maybe the JIT optimises out the virtual call, who knows. Near the top, we have Python which makes it arguably even more straightforward than Java thanks to duck typing.

          Basically, unless you put Go at the very bottom of the stack alongside C, this should be an easy task regardless of where you consider Go to fit in.

          1. 3

            IIRC, it was originally promoted as a systems language, but it seems like they’ve gone away from that branding as well.

            I believe you’re correct about how Google promoted it. I just remember looking at it, thinking “this is absolutely not a systems language; it’s Limbo (https://en.wikipedia.org/wiki/Limbo_(programming_language)), but honestly kind of worse, and without the interesting runtime,” and continuing to not use it. So I’m not sure the team itself actually thought they were doing a systems language.

            But even then, this article seems to demonstrate a pretty big limitation. Something as simple as using a custom IP address type as the key in a map, ignoring everything performance-related, seems extremely difficult.

            I completely agree, but that’s changing the discussion to whether Go is a good language, period. And since I mostly see that devolving into a flame war, I’m just going to say that I think you have a lot of company, and also that clearly lots of people love the language despite any warts it has.

            1. 2

              I completely agree, but that’s changing the discussion to whether Go is a good language, period. And since I mostly see that devolving into a flame war, I’m just going to say that I think you have a lot of company, and also that clearly lots of people love the language despite any warts it has.

              My relationship with the language is… complicated. I often enjoy it, I use it for work, and when I just want to write a small tool (such as when I wrote a process tree viewer) it’s generally my go-to “scripting” language these days. But I hate how the module system puts URLs to random git hosting websites in my source code, there’s a lot of things I dislike about the tooling, and the inability to write a data structure which acts like the built-in data structures and the inability to write a type which works with the built-in data structures are both super annoying issues which none of the other languages I use have. I’m hoping Go 2 will fix some of the bigger problems, and I’m always worried about which directions the corporate management at Google will take the language or its tooling/infrastructure.

              But you’re right, this is tantamount to flamewar bait so I’ll stop now.