1. 28
  1. 8

    A good followup to the previous byte-order post. An interesting note is that, because of the typical behavior of bigints, this recipe is almost unchanged when ported to high-level memory-safe bigint-only languages like Python. To slice a byte from a bigint in Python, do just like in C: mask and then shift.
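
    For instance, slicing byte 2 out of a wider value might look like this in C (a throwaway sketch, the name is mine); drop the cast and the C-specific suffix and the same mask-and-shift expression works on a Python int:

    #include <stdint.h>

    /* Slice byte 2 (bits 16..23) out of a wider integer: mask, then shift. */
    static uint8_t byte2_of(uint64_t x)
    {
        return (uint8_t)((x & 0xFF0000u) >> 16);
    }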

    1. 5

      This is overcomplicating a simple problem. Using memcpy + bswap intrinsics needs zero thought and matches what the hardware actually ends up doing (unaligned load/store + bswap).
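
      For what it’s worth, a minimal sketch of that approach, assuming GCC or Clang (whose __builtin_bswap32 and __BYTE_ORDER__ macros it leans on):

      #include <stdint.h>
      #include <string.h>

      /* Read a big-endian 32-bit value from a possibly unaligned buffer. */
      static uint32_t load32_be(const void *p)
      {
          uint32_t v;
          memcpy(&v, p, sizeof v);            /* compilers turn this into a single load */
      #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
          v = __builtin_bswap32(v);           /* swap only on little-endian hosts */
      #endif
          return v;
      }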

      1. 6

        Except those intrinsics aren’t standard C. Maybe the C compiler for those IBM mainframes she alludes to doesn’t have them? It does kind of boggle my mind that C and C++ still don’t come with support for endian conversions, which are one of the rock-bottom requirements for portable code.

        This blog post covers most of the dark corners of C so if you’ve understood what you’ve read so far, you’re already practically a master at the language, which is otherwise remarkably simple and beautiful.

        Masterful use of sarcasm there! I would insert “integer arithmetic” after “…of C…”, since there are plenty of other dark corners involving floating-point, parameter passing, struct alignment, etc.

        1. 6

          Pretty much every modern ISA has either swapped load/store (e.g. PowerPC, I can speak from experience) or an instruction to do so (x86). The problem is the C abstract machine doesn’t expose it, so you effectively have to rely on compiler built-ins or inline assembly.

          I wish C let you specify endianness as a modifier on a type, kinda like how Erlang binary pattern matches work.

          1. 2

            When I write and use the following:

            static uint32_t load32_le(const uint8_t s[4])
            {
                return (uint32_t)s[0]
                    | ((uint32_t)s[1] <<  8)
                    | ((uint32_t)s[2] << 16)
                    | ((uint32_t)s[3] << 24);
            }
            

            GCC and Clang can recognise what I’m trying to do, and they replace that ugly piece of code with a single unaligned load. Same pattern for the store. I’ve also heard that they take advantage of bswap for the big-endian versions. MSVC is likely lagging behind. Even better, I’ve found in practice that functions are faster than macros in this case. I think the compiler is better able to simplify an isolated function and then inline it than it would be if that pattern sat in the middle of a bigger function.
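
            The matching store follows the same pattern; a sketch (the name is mine, mirroring the load above):

            static void store32_le(uint8_t out[4], uint32_t in)
            {
                out[0] = (uint8_t)( in        & 0xff);
                out[1] = (uint8_t)((in >>  8) & 0xff);
                out[2] = (uint8_t)((in >> 16) & 0xff);
                out[3] = (uint8_t)((in >> 24) & 0xff);
            }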

            Yes, it would be nice for C to let you specify loads, stores, and endianness more directly. In practice, though, there are ways around this limitation.

            1. 2

              That’s assuming your compiler is optimizing and can recognize such constructs. I think it would be better if the C abstract machine exposed such a thing so the semantics are obvious.

            2. 2

              I wish C let you specify endianness as a modifier on a type

              The IAR compiler has the __big_endian keyword for just this purpose.

              1. 1

                I implemented bigendian<T> as a C++ template a few years ago. It’s just a wrapper around a numeric value that byte-swaps it when the value is read or stored. Very convenient.

              2. 1

                Alas, htole and friends are also non-standard:

                  These functions are nonstandard. Similar functions are present on the BSDs, where the required header file is <sys/endian.h>
                  instead of <endian.h>. Unfortunately, NetBSD, FreeBSD, and glibc haven't followed the original OpenBSD naming convention for
                  these functions, whereby the nn component always appears at the end of the function name (thus, for example, in NetBSD,
                  FreeBSD, and glibc, the equivalent of OpenBSD's "betoh32" is "be32toh").
                
              3. 5

                Well, apart from being nonstandard C, bswap intrinsics need to be applied conditionally based on your machine’s endianness, which adds a rarely tested codepath. I think it’s easier to avoid relying on endianness entirely by slicing up the bytes manually.

              4. 4

                I love that the title says “fiasco” but the “solution” is to… do the obvious thing? Do people really find themselves tempted to write something as insane as that ifdef soup at the top? If your code knows the byte order of the local CPU you’re way out to sea in a storm…

                1. 2

                  Yes, it’s a constant struggle with people who seem to insist that the best way to handle this is a memcpy and some kind of in-line byte swap. People are so persistent with the idea that it’s good and encouraged to treat C as a high-level assembly that in this comments section you see people suggesting that bswap intrinsics should be part of the C standard, and that it “boggles [their] mind” that C doesn’t offer standard tools for handling endianness. Except it does: the tools outlined in the article accurately express what is happening, and if you wish you can shove them into a neat function and forget about the details in most of your code.

                  1. 1

                    I think using plain char for something that is not text was the first mistake. If you use unsigned char you can also cast the value to the size you need and shift.

                    static uint32_t read32be(const unsigned char *p)
                    {
                    	return ((uint32_t) p[0] << 24)
                    	     | ((uint32_t) p[1] << 16)
                    	     | ((uint32_t) p[2] << 8)
                    	     | ((uint32_t) p[3]);
                    }
                    

                    Interestingly (unlike Clang and GCC) MSVC does not appear to be able to recognize these patterns and generate the bswap.

                    1. 2

                      If you want bytes, use uint8_t, not unsigned char. See, sizeof(char) is not fully specified in C. Some actual architectures in current use (DSP) do not support byte level addressing, and on those machines the width of char can actually be 32 bits. (Of course, on those machines uint8_t would not even compile, but that’s kind of the point: if you can’t have bytes, you need to rethink your serialization code.)

                      1. 1

                        While I agree in theory, I believe the standard does not guarantee that uint8_t is a character type, which means you could get in trouble with strict aliasing if a compiler vendor goes crazy. For storing bytes uint8_t is great, but for accessing bytes (like in the function above), unsigned char is safer. You can always check if CHAR_BIT is 8.

                        1. 3

                          I believe the standard does not guarantee that uint8_t is a character type,

                          It indeed does not guarantee that, and in practice sanitisers do warn me about careless casting from uint8_t.

                          which means you could get in trouble with strict aliasing if a compiler vendor goes crazy.

                          It can indeed be a problem if we do something like this:

                          void transform(uint32_t *out, const uint8_t *in); // out and in assumed not to alias
                          
                          uint8_t data[32];
                          read(file, data, 32);
                          transform((uint32_t *)data, data); // strict aliasing violation!!
                          

                          To get to that however, we’d have to be a little dirty. And to be honest, as much as I hate having way too much undefined behaviour in C, I do like the performance improvements that come with strict aliasing. Besides, while we could turn off strict aliasing by using unsigned char here, there’s no way we could turn it off in a case like this:

                          void transform2(struct foo *out, const uint32_t *in);
                          

                          Now some C user might indeed be surprised by the fact that strict aliasing applies to uint8_t, even though it invariably has the same representation as unsigned char (at least on 2’s complement machines, which comprise every single machine in active use). That is indeed unfortunate. An API designer however may still set those expectations right:

                          void transform(uint32_t *out, const uint8_t * restrict in);
                          
                          1. 1

                            Where is that written? “A typedef declaration does not introduce a new type” and is “for syntactic convenience only” quoth ANSI X3.159-1988. The uint8_t type isn’t the uint_least8_t type, so if it’s available then it must be char, unless your environment defines char as fewer than 8 bits and defines either short int or long to be 8 bits, which is about as likely as your code being compiled on a Setun.

                            1. 2

                              You’d have to guarantee that uint8_t comes from a typedef in the first place, and the standard provides no such guarantee. Yes, in practice this will be a typedef, but that typedef is defined in a standard header, so I’m not sure that actually counts. As far as I know, compilers are allowed to special-case this type and pretend it does not come from a typedef, so they can enable strict aliasing.

                              1. 1

                                Where is that written?

                                1. 1

                                  It’s not, that I know of. And with the C standard, if it’s not written, it’s not guaranteed.

                                  1. 1

                                    How would you know? You’re only speaking for yourself.

                                    1. 1

                                      You go find that place in the standard that says uint8_t is a character type. I’m not going to copy & paste 700 pages to show you it’s not there. You wouldn’t read them even if I could. You on the other hand could easily disprove my claim with a couple short citations. Please take the effort to do so.

                                      Ninja Edit: what do you know, it looks like we can disprove my claim after all. From the C99 standard, §7.18.1:

                                      For each type described herein that the implementation provides, <stdint.h> shall declare that typedef name and define the associated macros. […]

                                      That seems to mean that we have to use a typedef to represent a uint8_t, and your reasoning that it has to be unsigned char is sound as far as I can tell. I’ve tested the following under all sanitizers I could find (including the TIS interpreter); they find nothing wrong with it:

                                      #include <stdio.h>
                                      #include <string.h>
                                      #include <inttypes.h>
                                      
                                      int main()
                                      {
                                          uint32_t x = 42;
                                          uint32_t y;
                                          uint8_t t8[4];
                                          memcpy(t8, &x, sizeof(uint32_t));
                                          memcpy(&y, t8, sizeof(uint32_t));
                                          printf("x = %" PRIu32 "\n", x);
                                          printf("y = %" PRIu32 "\n", y);
                                          return 0;
                                      }
                                      

                                      (Now my problem is that they find nothing wrong with it even if I replace the uint8_t buffer with a uint16_t buffer.)

                                      I stand corrected, my apologies.

                                      1. 2

                                        Thanks for researching that. I did a bit more research, and I now think uint8_t being non-char is unlikely for different reasons. The standard says char can be 7+ bits[1] and that short/int/long can be any size greater than or equal to char, but must be a multiple of the size of char. Therefore uint8_t and uint_least8_t can only be defined in an environment where char is eight bits, because if char were 7 bits then short couldn’t be 8 bits, since 8 is not a multiple of 7. The only legal way for uint8_t to be short would be if the environment defined both char and short as 8 bits and the C library author chose to use short when defining the typedef, just to torture us. Here is the relevant text from the standard:

                                         * Byte --- the unit of data storage in the execution environment
                                           large enough to hold any member of the basic character set of the
                                           execution environment.
                                        ...
                                           Both the basic source and basic execution character sets shall
                                        have at least the following members: the 26 upper-case letters of
                                        the English alphabet [...] the 26 lower-case letters of the English
                                        alphabet [...] the 10 decimal digits [...] the following 29 graphic
                                        characters [...] the space character, and control characters
                                        representing horizontal tab, vertical tab, and form feed. In both
                                        the source and execution basic character sets, the value of each
                                        character after 0 in the above list of decimal digits shall be one
                                        greater than the value of the previous. [...] In the execution
                                        character set, there shall be control characters representing alert,
                                        backspace, carriage return, and new line.
                                        ...
                                         * Character --- a single byte representing a member of the basic
                                           character set of either the source or the execution environment.
                                        ...
                                           There are four signed integer types, designated as signed char,
                                        short int, int, and long int.
                                        ...
                                        In the list of signed integer types above, the range of values of
                                        each type is a subrange of the values of the next type in the list.
                                        ...
                                           For each of the signed integer types, there is a corresponding (but
                                        different) unsigned integer type (designated with the keyword unsigned)
                                        that uses the same amount of storage (including sign information)
                                        ...
                                        2 The sizeof operator yields the size (in bytes) of its operand, which may
                                        be an expression or the parenthesized name of a type. The size is
                                        ...
                                        3 When [sizeof is applied] to an operand that has type char, unsigned char, or
                                        signed char, (or a qualified version thereof) the result is 1.
                                        ...
                                        requirement that objects of a particular type be located on storage
                                        boundaries with addresses that are particular multiples of a byte address
                                        

                                        [1] char must be 7+ bits because the standard specifies exactly 100 values which it says need to be representable in char. Fun fact: that set of legal characters per ANSI X3.159-1988 is basically everything you’d expect from ASCII except $@`, which the standard defines as undefined behavior, lol. Maybe C20 or whatever the next one is should use those for bsr, bsf, and popcnt.

                                        Edit: It makes sense that @` weren’t required since their positions in the ASCII table kept being redefined between the ASA X3.4-1963 and USAS X3.4-1967 standards. Not sure what the rationale is for dollar. The ANSI C89 standard also has text saying that dollar may be used in identifiers, along with anything else, but it isn’t mandatory. GNU lets us use dollar identifiers, which is cool, although I wish they let us use Unicode symbols too.

                                        1. 2

                                          Note that, although it has to be a typedef, it doesn’t have to be a typedef of a standard type. For example, in CHERI C, intptr_t is a typedef of the built-in type __intcap_t. This is permitted by the standard (as far as we could tell) in the same way that it’s permitted for intmax_t to be __int128_t or __int256_t or whatever on systems that expose these as non-standard types.

                                          1. 1

                                            Shit, so that means we could have a built-in __unsigned_octet_t type that’s not unsigned char, and alias uint8_t to that?

                                            That would invalidate the whole aliasing assumption.

                                            1. 1

                                              intmax_t being 64-bit in GNU System V environments always seemed to me like the biggest contradiction with the wording of the standard. Cosmopolitan libc defines intmax_t as __int128 for that reason, but I’ve often wondered if that’s wise. Do you know offhand if there are any other environments doing that?

                                              1. 2

                                                intmax_t is defined as int64_t because __int128 didn’t exist in the ’90s (when most things were 32-bit and a lot of platforms that GNU and BSD systems supported couldn’t even do 64-bit arithmetic without calling out to a software implementation) and it’s an ABI-breaking change to change it. It’s a shame that it exists at all, because it’s predicated on the assumption that your library will never be linked into a program compiled for a newer system that supports a wider integer type. On a modern x86 system with AVX-512, you could store a 512-bit integer in a register and write a fairly fast set of operations on it, so should intmax_t be 512 bits?

                                                1. 1

                                                  __int128 is a recycling of the 32-bit code for having 64-bit integers. Why throw away all that effort after the move to 64-bit systems? As for AVX-512, as far as I know SSE and AVX do not provide arithmetic types that are wider than 64 bits.

                                                  1. 2

                                                    Most 64-bit ABIs were defined before __int128 came along. AVX-512 doesn’t natively support 512-bit integers, but it does support 512-bit data in registers. You can implement addition by doing vector addition and then applying the carry bits. You can implement in-register multiply in a similar way. This makes a 512-bit integer a more realistic machine type than __int128, which is generally stored in a pair of registers (if you’re going to have an integer type that doesn’t fit in one register, why stop at two registers? Why not have a type split between four or more integer registers?).

                                                    1. 1

                                                      Could you teach me how to add the carry bits in SSE vectors? I know how to do it with VPTERNLOGD but it sounds like you know a more general approach than me.

                            2. 1

                              See, sizeof(char) is not fully specified in C.

                              I think what you mean is that CHAR_BIT (the number of bits in a byte) is not fully specified. sizeof(char)==1 by C11 6.5.3.4p4:

                              When sizeof is applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.

                              1. 1

                                Whoops, my bad. Makes sense. Oh my, I guess that means sizeof(uint32_t) might be like 1. Goodness.

                              2. 1

                                Changing jibsen’s function to use uint8_t* instead will simply make the code refuse to compile in those kinds of environments. That’s why the blog post recommends mask+shift. The linked macros would work on those DSPs provided you store one octet per word.

                                1. 2

                                  As I said in italics, refusing to compile was the point.

                                  1. 1

                                    That sort of attitude leads to about a third of all GitHub issues for C projects like STB, last time I checked. There’s always some finger-wagger who adds a compiler warning that breaks code over things like unused parameters because they feel it’s time for us to rethink things. If it’s possible and legal, it should be permitted.

                                    1. 1

                                      Monocypher uses uint8_t for every buffer, and many (possibly most) of its users are in the embedded space.

                                      I don’t recall having even a single complaint about it.

                                      1. 1

                                        Yeah if you’re writing crypto libraries I can see the reluctance to accommodate weird architectures. Is valiant wolf your legal name? Makes the sheep like me a bit afraid to trust a library like that.

                                        1. 1

                                          It is my legal name, believe it or not. And if you don’t trust my work, trust its audit.

                                          1. 1

                                            Carry on then. I looked through your code and it looked like you’re doing things the right way.

                                2. 1

                                  See, sizeof(char) is not fully specified in C.

                                  So wrong. N1570 §6.5.3.4p4. sizeof (char) and sizeof (unsigned char) are defined to be 1.

                                  Of course, on those machines uint8_t would not even compile, but that’s kind of the point: if you can’t have bytes, you need to rethink your serialization code.

                                  People generally don’t run their code on DSPs, but let’s say that a popular machine architecture came out with 9-bit bytes. It would be incredibly unusual if that architecture exposed data streams coming over the internet by spreading a sequence of nine 8-bit bytes across eight 9-bit bytes. It’s more likely that this architecture would put the nine 8-bit bytes in nine 9-bit bytes with the MSB unset. It’s entirely possible to write code which handles this correctly and portably.

                                  That being said, if you’re of the opinion that it’s not worth worrying about machines where uint8_t is not defined then you probably don’t care about this hypothetical scenario in which case your entire point about using uint8_t over unsigned char is moot since it won’t matter anyway.

                                  1. 1

                                    N1570 §6.5.3.4p4. sizeof (char) and sizeof (unsigned char) are defined to be 1.

                                    Yeah, I was confusing sizeof and the width of bytes. (An interesting consequence is that sizeof(uint32_t) is one on machines with 32-bit bytes.)

                                    People generally don’t run their code on DSPs

                                    The reason I’ve even heard of DSPs with 32-bit bytes was because a colleague of mine had to write a C program for it, and he ran into all sorts of interesting problems because of that unusual byte size. Sure, the purpose of the chip was probably to do some simple and very fast signal processing, but if you can get away with cramming in more general purpose code in there as well, you could lower the manufacturing costs.

                                    It would be incredibly unusual if that architecture exposed data streams coming over the internet by spreading a sequence of 9 8 bit bytes across 8 9 bit bytes.

                                    It would indeed. I was more thinking of the (real) machines that have 32-bit bytes. It makes more sense for them to pack 4 network octets into a single 32-bit byte.

                                    That being said, if you’re of the opinion that it’s not worth worrying about machines where uint8_t is not defined

                                    I’m of the opinion that we should worry about them. Which is why I advocate explicit exclusion by using uint8_t.

                                    1. 2

                                      An interesting consequence is that sizeof(uint32_t) is one on machines with 32-bit bytes.

                                      uint32_t doesn’t exist on machines where CHAR_BIT is not 8.

                                      The reason I’ve even heard of DSPs with 32-bit bytes was because a colleague of mine had to write a C program for it, and he ran into all sorts of interesting problems because of that unusual byte size. Sure, the purpose of the chip was probably to do some simple and very fast signal processing, but if you can get away with cramming in more general purpose code in there as well, you could lower the manufacturing costs.

                                      Oh for sure, I write my code with the idea that if someone wanted to run it on a DSP for some reason they would at least get a predictable result. The only problem is that when CHAR_BIT is not 8, it’s difficult to know whether an input data stream from some unknown source is going to be octets merged into a bitstream and spread across bytes, or whether each octet gets its own byte.

                                      It would indeed. I was more thinking of the (real) machines that have 32-bit bytes. It makes more sense for them to pack 4 network octets into a single 32-bit byte.

                                      So in this case a lot of the serialisation/deserialisation code I write deals with one octet per byte. You would need to write a frontend to translate from whatever packed representation appears inside a byte into separate octets per byte to use that code on a machine where octets are merged over bytes.
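
                                      Something like this hypothetical frontend, for instance (the name and the big-endian packing of four octets per 32-bit char are my own assumptions):

                                      /* Spread four packed network octets per 32-bit char into one octet
                                         per char, so the usual one-octet-per-byte code can run unchanged. */
                                      static void unpack_octets(unsigned char *dst, const unsigned char *src,
                                                                unsigned long n_words)
                                      {
                                          for (unsigned long i = 0; i < n_words; i++) {
                                              dst[4*i + 0] = (src[i] >> 24) & 0xff;
                                              dst[4*i + 1] = (src[i] >> 16) & 0xff;
                                              dst[4*i + 2] = (src[i] >>  8) & 0xff;
                                              dst[4*i + 3] =  src[i]        & 0xff;
                                          }
                                      }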

                                      The problem with writing general purpose code targeting this is that it’s incredibly difficult to make the code clean, auditable and simple while giving enough options to cover the wealth of possible ways you may take octets and fit them into non-8-bit-bytes (even if you just restrict yourself to power-of-two machines).

                                      At the point where you have a general purpose serialisation/deserialisation library which handles this in a way which is flexible enough to handle all possible cases the code will get complicated enough that it would probably be less error prone to modify the original code to specifically work for the intended architecture. Especially when such a modification will actually be quite minor in what would also be quite a tiny codebase.

                                      I’m of the opinion that we should worry about them. Which is why I advocate explicit exclusion by using uint8_t.

                                      I personally think that in that case it is easier, clearer and more explicit to just write:

                                      #include <limits.h>
                                      #if CHAR_BIT != 8
                                      #error "This codebase does not support non-octet bytes."
                                      #endif
                                      

                                      I would then agree that in cases where you’re explicitly operating on the assumption that chars hold 8 bits, you should use uint8_t or a typedef of octet. In general I think that, because of C’s lacking type system, people are reluctant to rely on typenames to add clarity to codebases, even where it could actually bring a lot of benefit.

                                      Personally I handle this by using unsigned char everywhere inside [de]serialisation functions and using masking and shifting to treat any-width char as only holding octets. Since this kind of problem only occurs at interfaces, I document this assumption in the API documentation.
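
                                      A minimal sketch of the masking part (the name is mine; unsigned long is used so the code does not depend on uint32_t existing):

                                      /* Masked big-endian read: each char contributes only its low octet,
                                         so the result is the same whether CHAR_BIT is 8 or larger,
                                         assuming one octet is stored per char. */
                                      static unsigned long read32be(const unsigned char *p)
                                      {
                                          return ((unsigned long)(p[0] & 0xff) << 24)
                                               | ((unsigned long)(p[1] & 0xff) << 16)
                                               | ((unsigned long)(p[2] & 0xff) <<  8)
                                               |  (unsigned long)(p[3] & 0xff);
                                      }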