1. 39

  2. 16

    Bad headline. Should be “native byte order doesn’t matter, only network byte order”.

    1. 6

      I’m surprised that the hton* functions aren’t mentioned for C code. You can shift the bytes yourself, or you can just call htonl(*data), which saves you from even thinking about the layout (and potentially getting the shifts wrong). I wonder if he didn’t mention https://linux.die.net/man/3/byteorder for some specific reason.
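      For anyone who hasn’t used them, the receive direction looks roughly like this (ntohl is the network-to-host counterpart of htonl; the function name and buffer handling here are mine, just a sketch):

      #include <arpa/inet.h>  /* htonl/ntohl on POSIX systems */
      #include <stdint.h>
      #include <string.h>

      /* Sketch: convert four raw network-order bytes into a host-order value. */
      uint32_t parse_u32(const unsigned char *buf)
      {
          uint32_t wire;
          memcpy(&wire, buf, sizeof wire);  /* bytes exactly as they arrived */
          return ntohl(wire);               /* network (big-endian) -> host  */
      }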

      1. 6

        This is something Rust gets right with the from_{be,le}_bytes functions defined on the integer types. The author’s point is valid, but we can have the best of both worlds by writing the function well once, in the standard library, and having everyone else use that.

        No, there isn’t a huge speed difference between the author’s version and just casting an array of bytes into an integer, but with Rust’s way you get the ergonomics, the safety, and whatever extra speed there is to be had.
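        For reference, the Rust call is u32::from_be_bytes, which takes a [u8; 4]. The C analogue of “write it well once and have everyone use it” is just a small helper in a shared header; a rough sketch (names are mine):

        #include <stdint.h>

        /* Assemble four big-endian bytes into a host integer.  The shifts make
         * the result independent of host byte order, so there is no #ifdef. */
        static inline uint32_t load_u32_be(const uint8_t b[4])
        {
            return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16
                 | (uint32_t)b[2] << 8  | (uint32_t)b[3];
        }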

        1. 9

          This; the argument is basically one of API. Instead of saying

          #ifdef LITTLE_ENDIAN
          int x = swap_bytes(read_from_network());
          #else
          int x = read_from_network();
          #endif
          

          it’s nicer if you can always write:

          int x = bigendian_to_native(read_from_network());
          
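          A sketch of what that single function could look like, assuming read_from_network hands back the four wire bytes packed into the integer’s storage untouched; going back to bytes keeps even the implementation free of host-order checks:

          #include <stdint.h>
          #include <string.h>

          /* Hypothetical helper matching the call above: recover the raw wire
           * bytes and reassemble them, independently of host byte order. */
          static inline uint32_t bigendian_to_native(uint32_t wire)
          {
              uint8_t b[4];
              memcpy(b, &wire, sizeof b);
              return (uint32_t)b[0] << 24 | (uint32_t)b[1] << 16
                   | (uint32_t)b[2] << 8  | (uint32_t)b[3];
          }
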
        2. 3

          I’m a bit puzzled by this article, and I might be missing something. In the given example, depending on the type of the machine (big endian/little endian) one has to use different extraction methods for the uint32 in the network order. That’s exactly the use case for an ifdef if I were building a binary for different architectures.

          1. 7

            In the given example, depending on the type of the machine (big endian/little endian) one has to use different extraction methods for the uint32 in the network order.

            Not at all – if you read the example carefully, the author is making the point that, depending on the type of the peripheral (not the host machine!) you can extract the uint32 once, straight into native format, regardless of what the native format is.

            That is, if you need to read a uint32_t, you can either:

            a) Read it straight into a uint32_t on the host and swap the bytes as needed depending on host and peripheral byte order, or

            b) Read it into an array of 4 uint8_ts, at which point the only variable in this equation is the peripheral order (because the result of data[0] << 0 | data[1] << 8 | data[2] << 16 | data[3] << 24 doesn’t depend on host order)

            In terms of performance, things are a teeny tiny bit less black-and-white than the author makes it seem, depending on how smart the underlying compiler is and on how good the underlying architecture is at shifting bytes, unaligned access and the like.

            But in terms of code quality my experience matches the author’s – code that takes route a) tends to end up pretty messy. This is particularly problematic if you’re working on small systems with multiple data streams, from multiple peripherals, sometimes with multiple layers of byte swapping (e.g. peripherals have their own byte order, then the bus controller at the MCU end can swap the bytes for you as well, and the one little-endian peripheral on that bus gives you 12-bit signed integers).

            This is likely why the author hasn’t mentioned man byteorder – the thing @viraptor was wondering about above. There’s no shortage of permissively-licensed byteorder-family functions for these systems if you’re not writing against a Unix-y system, but in these cases – where you get data from different peripherals, with different native byte orders, over different buses – the concept of “network” ordering is a little elusive. If you’re on a little-endian core you do ntoh conversions for big-endian peripherals, but what do you do for little-endian peripherals? Presumably not “htoh” (note for confused onlookers: there’s no htoh ;-)) – you leave the result as is, but in that case your code isn’t portable to big-endian cores.

            The *to* functions implicitly rely on the relationship between network and host order, which works okay when the network byte order is clear and homogeneous, but – as the author of this post points out – it breaks down as soon as you deal with external byte streams of multiple endiannesses.
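            To make the multi-peripheral case concrete: with the shift-based approach, the odd little-endian peripheral is just one more extraction function, with no reference to host order anywhere. The 12-bit layout below is made up for illustration:

            #include <stdint.h>

            /* Hypothetical little-endian peripheral: a 12-bit signed sample packed
             * into two bytes, low 8 bits first, high 4 bits in the low nibble of
             * the second byte.  Nothing here depends on host byte order. */
            static inline int16_t read_sample12_le(const uint8_t data[2])
            {
                int32_t raw = (int32_t)data[0] | (int32_t)(data[1] & 0x0F) << 8;
                if (raw & 0x0800)      /* sign bit of the 12-bit field */
                    raw -= 0x1000;     /* sign-extend to the full range */
                return (int16_t)raw;
            }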

            (Edit: this is a point that Rob Pike, and others from the Plan 9 team, have made over the years. I thought this was someone echoing that point but lol, turns out this is Pike’s blog?)

            1. 1

              If you’re on a little-endian core you do ntoh conversions for big-endian peripherals, but what do you do for little-endian peripherals?

              In that case, use the more modern https://linux.die.net/man/3/endian functions.

              htobe32 / htole32 have you covered. htonl is just nicer in cases where you don’t give people a choice - network is network, you don’t have to think about which order it is specifically.
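              For the little-endian-peripheral case the usage is symmetrical; a quick sketch (glibc’s endian.h spelling, other libcs differ; the function name is mine):

              #include <endian.h>   /* le32toh/htole32 etc. on glibc */
              #include <stdint.h>
              #include <string.h>

              /* Four raw bytes from a little-endian peripheral, least significant first. */
              uint32_t read_le_register(const uint8_t raw[4])
              {
                  uint32_t wire;
                  memcpy(&wire, raw, sizeof wire);  /* bytes land in memory in wire order */
                  return le32toh(wire);             /* little-endian wire -> host order   */
              }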

              1. 6

                The author’s argument is that portable code should be endianness-independent, not that it should handle endianness with syntactic sugar of the right flavour. The “modern” (meh, they’re about 20 years old at this point?) alternatives work around the ambiguous (and insufficiently diverse) typing of the original API but don’t exhibit all the desirable properties of the version that Pike proposes.
