1. 7
  1. 6

    The best explanation of the difference between pointers and arrays is still in The C Programming Language by Kernighan and Ritchie, 2nd ed., which covers this topic in Chapter 5, “Pointers and Arrays”.

    To anyone who’s interested in the topic, please read this book and ignore this blog post.

    1. 2

      I will second this recommendation, and in particular, I want to point out this nugget of wisdom from The C Programming Language, because thinking of arrays as “syntactic sugar for pointers” gets things backwards a little:

      There is one difference between an array name and a pointer that must be kept in mind. A pointer is a variable, so [assigning to a pointer] and [incrementing a pointer] are legal. But an array name is not a variable; constructions like [assigning to an array] and [incrementing an array] are illegal.

      As the book then points out, when an array is passed as an argument to a function, what’s actually being passed behind the scenes is a pointer to the first element of that array. In this particular case, a[i] is indeed syntactic sugar for *(a + i) – which isn’t some remarkable discovery that the commentators of the great classics have made after assiduously studying what the great Kernighan has left us, but literally the second or third thing that Kernighan mentions about it. It’s certainly true that, when you deconstruct the behaviour of a program, you cannot tell an array access from a pointer access – unsurprising, since they both translate to the same thing. But they are certainly different at the code-writing end.

      The author discusses this in the context of reverse engineering (I think) so I can see why they’d present arrays as “syntactic sugar for pointers”. It’s probably a good mental model, but it’s important to remember that it is “just” a mental model, and not how the language treats it.
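
      A minimal sketch of where the two differ at the code-writing end, and where they don't. The function names are made up for illustration; the only assumption is a hosted C99 compiler.

      ```c
      #include <stddef.h>

      /* A callee only ever receives a pointer to the first element,
         so sizeof here is the size of a pointer, not of the array. */
      size_t size_seen_by_callee(int *p)
      {
          return sizeof p;
      }

      /* Where the array is declared, it is still an array object. */
      size_t size_seen_by_owner(void)
      {
          int a[128];
          return sizeof a;            /* 128 * sizeof(int) */
      }

      /* a[i] and *(a + i) designate the same element. */
      int same_element(const int *a, size_t i)
      {
          return &a[i] == a + i;
      }
      ```

      Writing `a = p;` or `a++;` inside size_seen_by_owner would be rejected by the compiler, which is the point of the K&R passage quoted above.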

    2. 4

      Many C textbooks will tell you that a[i] is in fact syntactic sugar for *(a+i).

      But even more: array declarations can be seen as syntactic sugar as well.

      Instead of:

      int a[128];
      int b[128];
      

      … you can think of:

      int *a=<address of a chunk in global memory or local stack of size 128*sizeof(int)>;
      int *b=<address of a chunk in global memory or local stack of size 128*sizeof(int)>;
      

      Except not really. &a in the first case is going to point to the beginning of the array. In the second case it’s going to point to the location in memory where the “address of a chunk in global memory or local stack of size 128*sizeof(int)” is stored. Depending on how it’s compiled, the second case can also result in another variable being stored on the stack.

      The second case is more like writing:

      int a[128];
      int *b = a;
      

      We then have (with the casts to (void *) omitted for clarity):

      &a != &b
      &a == a
      &a == b
      &b != b
      

      Whereas a[i] == i[a] == *(a + i) == *(i + a) is literally just syntactic sugar. This is nitpicking, but when it comes to C/C++ semantics, I think it’s important to try to get things right.
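
      For anyone who wants to check this on their own compiler, here is a small sketch with the (void *) casts put back in. The function names are hypothetical; each one returns 1 when the relations listed above hold.

      ```c
      #include <stddef.h>

      /* The four address relations, with explicit casts to void *. */
      int check_relations(void)
      {
          int a[128] = {0};
          int *b = a;

          return (void *)&a != (void *)&b   /* a and b are distinct objects */
              && (void *)&a == (void *)a    /* the array starts at its own address */
              && (void *)&a == (void *)b    /* b holds that same address */
              && (void *)&b != (void *)b;   /* b lives somewhere else */
      }

      /* The four spellings of the same element access. */
      int check_index_forms(void)
      {
          int a[4] = {10, 20, 30, 40};
          size_t i = 2;

          return a[i] == i[a]
              && a[i] == *(a + i)
              && a[i] == *(i + a);
      }
      ```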

      1. 2

        I think that second case won’t result in anything no matter how you compile it, because it’s not legal C :-).

        I suspect the author “translated” some (simplified) compiler output back into C. That looks a bit like what a trivial implementation of a memory pool system would yield.

        I think the point which the author wants to make is as follows: if you have this:

        int a[N];
        int b[N];
        

        then this is true:

        &b[0] == &a[N]
        

        (I don’t recall if there are any specific ordering requirements about local variable declaration vs. stack growth direction, so maybe whether that is true or &a[0] == &b[N] is a local artefact?)

        I don’t recall seeing an architecture or a program where that wasn’t the case. However, I’m also not sure there’s any specific requirement that would make it so, and I’ve seen enough weird hardware that I don’t want to hazard a guess here :-D.

        The only thing in the C99 standard I can think of off the top of my head as normative in this case is the definition of the equality operator for pointers. The last case where p1 == p2 evaluates to true (p1 and p2 being pointers) is:

        one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

        (Emphasis mine.) The standard goes on to say that “two objects may be adjacent in memory because they are adjacent elements of a larger array or adjacent members of a structure with no padding between them, or because the implementation chose to place them so, even though they are unrelated.” (Emphasis also mine.)

        That seems to imply that this is an implementation artefact (in the case above, a and b are, I think, unrelated). And I can think of cases where doing the opposite might be a valid implementation choice. For example, I would not be at all surprised if some clever hardware engineer punk came up with the glorious idea of a CPU that allows unaligned indexed accesses in general, but requires an aligned base pointer. In that case, an array that follows after an odd-length char array wouldn’t be adjacent to it. But bear in mind that this is 100% hypothetical, I mean, I sure hope it is…
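
        A rough sketch of the two situations, with made-up function names. Rows of a two-dimensional array are adjacent elements of a larger array, so the one-past-the-end comparison is required to hold there. For two unrelated locals the function only reports what this particular implementation happened to do, and an optimizing compiler may well answer 0.

        ```c
        /* m[0] and m[1] are adjacent elements of the larger array m,
           so one past the end of m[0] is the start of m[1]. */
        int rows_are_adjacent(void)
        {
            int m[2][4];
            return (m[0] + 4) == m[1];
        }

        /* a and b are unrelated objects: either order, or neither,
           is a legitimate outcome. Nothing here is guaranteed. */
        int locals_are_adjacent(void)
        {
            int a[4];
            int b[4];
            return (a + 4) == b || (b + 4) == a;
        }
        ```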

        1. 1

          There is nothing in the C standard that states that &b[0] == &a[N] (or the opposite &a[0] == &b[N]) is true. That it might be is (as you call it) a local artifact. I can see an architecture [1] where two global arrays are in their own segments of memory and the concept of “one after the other” doesn’t make sense.

          [1] An easy example is the 80286 in protected mode.

          1. 1

            A more realistic example might be that the compiler realizes it can reuse the space used by a if the arrays are not used at the same time; in that case &a[0] might be equal to &b[0].

            1. 2

              True. Also, there’s nothing in the standard that says that local variables are allocated in the source code order (unlike struct/union members, which have to be in said order). They can be reordered to avoid padding, for instance.
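
              A quick way to see the struct half of that, as a sketch: struct pair and the function name are invented for the example. offsetof is the standard macro from <stddef.h>. Padding between members is still allowed, so only the ordering is checked.

              ```c
              #include <stddef.h>

              /* Hypothetical struct: members must be laid out in declaration order. */
              struct pair {
                  char tag;
                  int  a[4];
                  int  b[4];
              };

              /* Returns 1 when the offsets increase in declaration order. */
              int members_in_declaration_order(void)
              {
                  return offsetof(struct pair, tag) < offsetof(struct pair, a)
                      && offsetof(struct pair, a)   < offsetof(struct pair, b);
              }
              ```

              No equivalent check exists for locals, since the compiler is free to place them wherever it likes.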

            2. 1

              I can see an architecture [1] where two global arrays are in their own segments of memory and the concept of “one after the other” doesn’t make sense.

              Oh, yeah, I was thinking of the more restricted case of stack-allocated arrays. Bets are definitely off for globally-allocated arrays. Memory segmentation on x86, or banking on pretty much any architecture, will certainly screw up any address arrangement.