1. 26
  1.  

  2. 7

    This article itself is pretty problematic from the get-go.

    actual array, not just block of memory which can act like one

    WTF is the difference between an “actual array” as opposed to a “block of memory”? Apparently the author thinks “actual” means “allocated on the stack”, which makes no sense.

    Languages supporting VLAs in one form or another include: Ada, Algol 68, APL, C, C#, COBOL, Fortran, J and Object Pascal. Beside C and C#, those aren’t languages which one would call mainstream nowadays.

    That’s nonsense. Any modern language supports Variable Length Arrays — Java, JS, Python, Rust, etc. If we limit this to “actual” [stack-based] arrays, that list still includes Go. Of course in Go whether any non-scalar value is located on the stack is up to the compiler, but really that’s true of most languages. (AFAIK there’s nothing in the C language spec forbidding VLAs from being heap based.)

    I’m not denying the author’s main point that C VLAs can be dangerous, but for fuck’s sake this is C we’re talking about: everything is dangerous, even copying a string. And in pretty much any language, letting unvalidated, untrusted input tell you what size array to allocate is a very bad idea … the only difference with stack-based allocation is that it takes smaller numbers to cause mayhem. If you do proper validation and limit array sizes to a sane amount, VLAs are a respectable optimization.

    1. 3

      WTF is the difference between an “actual array” as opposed to a “block of memory”? Apparently the author thinks ”actual” means “allocated on the stack”, which makes no sense.

      An array is a type consisting of a contiguously allocated nonempty sequence of objects with a particular element type. The number of those objects (the array size) never changes during the array lifetime.

      A pointer is a type of an object that refers to a function or an object of another type, possibly adding qualifiers. Pointer may also refer to nothing, which is indicated by the special null pointer value.

      So in particular, arrays have a static size.

      That’s nonsense. Any modern language supports Variable Length Arrays — Java, JS, Python, Rust, etc. If we limit this to “actual” [stack-based] arrays, that list still includes Go. Of course in Go whether any non-scalar value is located on the stack is up to the compiler, but really that’s true of most languages. (AFAIK there’s nothing in the C language spec forbidding VLAs from being heap based.)

      So the author’s point stands.

      1. 1

        [Citation Needed]? In any case, the distinction between pointers and arrays in C is famously tenuous — an array is almost identical to a pointer to its first element, except that it’s allocated for you somehow. The distinction between an auto or static array and one allocated by malloc is pretty minor.

        I don’t think the author’s point that variable-length arrays are only found in obscure languages (other than C and C#) stands. I explained why; if you disagree you might explain too.

        1. 1

          [Citation Needed]?

          An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T, the array type is sometimes called ‘‘array of T’’. The construction of an array type from an element type is called ‘‘array type derivation’’.

          compare with

          A pointer type may be derived from a function type or an object type, called the referenced type. A pointer type describes an object whose value provides a reference to an entity of the referenced type. A pointer type derived from the referenced type T is sometimes called ‘‘pointer to T’’. The construction of a pointer type from a referenced type is called ‘‘pointer type derivation’’. A pointer type is a complete object type.

          §6.2.5¶20 ISO/IEC 9899:201x
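
          For what it’s worth, the difference between those two type categories is observable with sizeof. A minimal sketch (the function names are made up for illustration, and the third function assumes a compiler with VLA support): an array object’s size covers all of its elements, a pointer’s size is just the pointer, and a VLA’s element count is fixed once its declaration is reached.

          ```c
          #include <stddef.h>

          /* Hypothetical helper: size of an array object holding 8 ints. */
          size_t array_object_size(void)
          {
              int arr[8];
              return sizeof arr;          /* 8 * sizeof(int) */
          }

          /* Hypothetical helper: size of a pointer to that array's first element. */
          size_t pointer_object_size(void)
          {
              int arr[8];
              int *p = arr;               /* the array decays to int * here */
              return sizeof p;            /* sizeof(int *), whatever p points at */
          }

          /* The length of a VLA does not follow later changes to n.
             The caller is assumed to pass a small n > 0. */
          size_t vla_len_after_changing_n(size_t n)
          {
              int vla[n];
              n *= 2;                     /* has no effect on vla */
              return sizeof vla / sizeof vla[0];
          }
          ```

          So “variable length” only means the size is chosen at run time; it still never changes during the array’s lifetime.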

    2. 6

      There is one use case where VLA comes in handy

      NB. This is not actually a VLA, but a variably-modified type. Here’s a blog post by a C committee member on the subject.
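
      To make the distinction concrete, here is a sketch of the usual pattern. I’m assuming this is the use case meant, and the function names are hypothetical. The storage comes from malloc; only the pointer’s type is variably modified, so nothing of run-time size is placed on the stack.

      ```c
      #include <stdlib.h>

      /* The parameter m has the variably-modified type int (*)[cols]. */
      static void fill_matrix(size_t rows, size_t cols, int m[rows][cols])
      {
          for (size_t r = 0; r < rows; r++)
              for (size_t c = 0; c < cols; c++)
                  m[r][c] = (int)(r * cols + c);
      }

      /* Returns the bottom-right element, or -1 on bad input or allocation
         failure. Sizes are assumed small enough that rows * cols cannot overflow. */
      int last_element(size_t rows, size_t cols)
      {
          if (rows == 0 || cols == 0)
              return -1;
          /* m points to heap storage; it is a pointer to arrays of cols ints,
             not a VLA object. */
          int (*m)[cols] = malloc(rows * sizeof *m);
          if (m == NULL)
              return -1;
          fill_matrix(rows, cols, m);
          int last = m[rows - 1][cols - 1];
          free(m);
          return last;
      }
      ```

      The payoff is plain m[r][c] indexing on a single heap block, without manual index arithmetic.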

      1. 4

        I LOLed at the final picture of “chocoladevla”. Whenever I read about VLA, my mind conjures up exactly that!

        1. 5

          A VLA applies to a type, not an actual array. So you can create a typedef of a VLA type, which “freezes” the value of the expression used, even if elements of that expression change at the time the VLA type is applied

          C continuing to meet my expectations as the foot gun language.
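
          A small sketch of the “freezing” behaviour described in that quote (the function name is made up; it assumes a compiler with VLA support). No array object is created here, only a type.

          ```c
          #include <stddef.h>

          /* Returns the element count of a VLA typedef after n has been changed.
             The caller is assumed to pass n > 0. */
          size_t frozen_typedef_len(size_t n)
          {
              typedef int row_t[n];   /* n is evaluated once, when this line is reached */
              n += 100;               /* later changes do not affect row_t */
              return sizeof(row_t) / sizeof(int);
          }
          ```

          Calling frozen_typedef_len(4) yields 4, not 104, because the size expression was evaluated when the typedef was reached.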

          1. 3

            For stack usage, it seems to me that once you have a check against a max size (and input validation is a good idea), VLAs are strictly safer. Consider:

            int arr[MAX];
            

            vs:

            if (n > MAX) return;
            int arr[n];
            
            1. 3

              Only in COBOL

              Ada VLAs store their own bounds, are bounds checked and can be constrained in size by the index type size. If you overflow, you’ll get a Storage_Error. That seems safe to me.

              1. 1

                Yes, I also think this is reasonable. A well-defined failure mode is a way for a runtime to announce an unsolvable condition.

                I believe, but may be mistaken, that Pascal and its derivatives like Modula-2 and maybe Delphi did something similar.

                I wonder though if more could be done by a compiler or by a programmer.

                For example, an executable or a shared object could tell the OS at program load time that the maximum stack size it will need is XY, or declare other ‘capabilities’ (e.g. whether it uses less secure runtime calls).

                This way an OS could examine it, compare it to, say, ‘ulimits’ and other constraints, and declare right at load time that a given executable or DLL will not work.

                In general, moving potential run-time errors closer to ‘execution start/load time’ has, in my view, the benefit of a better user experience, and would allow OS subsystems that deal with security, scheduling, and memory management to be more ‘aware’ of the program’s intentions.

              2. 1

                VLAs are interesting in other ways too. One is that they are among the few, or even the only(?), features to go from being a required part of the language in one revision to being optional in another (C11), which hints at some of the tug of war going on between stakeholders within the committee itself.

                VLAs are allocated on stack - and this is the source of the most of the problems

                Does this have to be the case? I have not seen any implementation to the contrary, but what are the blockers to having an implementation-defined control in the compiler for mapping the allocation/deallocation to functions of my choosing, while retaining the syntactical convenience and the distinction from statically sized auto arrays and from malloc with its dynamic size and lifetime?

                I can see cleanup across longjmps being an awkward edge case to cover, are there any others?

                1. 2

                  Not quite the same, but related: _malloca allocates small sizes on the stack and large ones on the heap. Its memory needs to be manually freed, though, so it’s about performance, not correctness.

                  cleanup across longjmps being an awkward edge case to cover

                  RAII-style patterns are problematic in general in C because there’s no mechanism for stack unwinding. There’s a working (controversial) proposal to add stack unwinding; presumably that could be implemented to integrate with your heap VLAs, but there’s no rescuing longjmp.

                  Another fun wrinkle: tcc VLAs are incompatible with signal handlers.

                  1. 1

                    Using malloc for VLAs seems like an interesting idea. There are a lot of weird cases (such as goto and longjmp), but AFAIK, compilers already generally handle those cases because it’s required by C++’s destructors or GNU C’s __attribute__((cleanup)).

                    I wonder if it would be correct in all situations to rewrite:

                    int array[x];
                    ...
                    something(sizeof(array));
                    

                    into:

                    size_t __vla_len_1 = x;
                    int *array __attribute__((cleanup(free))) = malloc(sizeof(*array) * __vla_len_1);
                    ...
                    something(sizeof(*array) * __vla_len_1);
                    

                    You’d probably need a few more rewrite rules, but I don’t immediately see any huge show-stopping issues.

                    EDIT: I checked, and it seems like neither C++ destructors nor __attribute__((cleanup)) works with longjmp.

                    1. 6

                      I’ve written C macros in the past that use __attribute__((cleanup)) and a small fixed-size on-stack allocation and give me a pointer to either the stack allocation (if it’s big enough) or to a heap allocation. They expand to something like this:

                      static void clean_heap_buffer(void *p)
                      {
                        /* cleanup passes the address of the annotated variable (a T **) */
                        void **buf = p;
                        free(*buf);
                        *buf = NULL;
                      }
                      
                      ...
                      
                      T stack_buf[16] = {0};
                      __attribute__((cleanup(clean_heap_buffer))) T *heap_buf = NULL;
                      T *buf = stack_buf;
                      if (size > 16)
                      {
                        heap_buf = calloc(size, sizeof(T));
                        buf = heap_buf;
                      }
                      

                      You can now just use buf and everything is fine. If the function is inlined and the compiler can prove that size <= 16 is always true then the malloc paths are optimised away. Note that you can’t use free as the argument to the cleanup attribute because the cleanup function takes a pointer to the on-stack value, so you need a dereference. Your version will pass the address of a stack variable to free.
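
                      For completeness, a sketch of the VLA-to-malloc rewrite from the parent comment with that fixed. It assumes GCC or Clang, since __attribute__((cleanup)) is an extension, and free_int_ptr and sum_first_n are made-up names.

                      ```c
                      #include <stdlib.h>

                      /* Cleanup handlers receive a pointer to the annotated variable,
                         so dereference it before freeing. */
                      static void free_int_ptr(int **p)
                      {
                          free(*p);
                      }

                      /* Hypothetical function: sums 0..n-1 through a heap buffer that
                         stands in for "int array[n]". Returns -1 if allocation fails. */
                      long sum_first_n(size_t n)
                      {
                          __attribute__((cleanup(free_int_ptr))) int *array =
                              malloc(sizeof(*array) * n);
                          if (array == NULL)
                              return -1;      /* handler still runs; free(NULL) is a no-op */
                          long sum = 0;
                          for (size_t i = 0; i < n; i++) {
                              array[i] = (int)i;
                              sum += array[i];
                          }
                          return sum;         /* handler runs as array goes out of scope */
                      }
                      ```

                      Every return path frees the buffer without an explicit free call, which is the convenience the rewrite was after; it still does nothing for longjmp.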

                      Whether destructors / cleanup work with longjmp depends on the implementation. The simple implementation saves some registers in setjmp and just restores them in longjmp. The Itanium version records the stack and instruction pointer registers and then uses DWARF unwind info to unwind the stack. Both modes are supported by GCC and Clang on most architectures but because they’re ABI-incompatible the former is used by default.

                      At some point after writing this kind of macro, you realise that you should just use a language that you don’t have to fight to do this kind of thing. C++ makes this trivial. LLVM has a SmallVector class that does this in a much cleaner way (including allowing dynamic resizing). It looks to consumers like a std::vector but contains a small (specified by template parameter) in-object buffer. If the number of elements you store fits into that buffer, it never allocates on the heap. If you push another element, it does a heap allocation and copies everything.

                    2. 1

                      How useful is that when you don’t have the standard library linked?

                      1. 1

                        Not very, unless you write your own malloc/free I suppose. But in which situations do you have so much stack space available that you can just dynamically allocate it safely, but don’t have a libc?

                        1. 1

                          This is common in bare-metal applications and in OS implementation, a major use case for C where other languages aren’t making a dent. The systems are not necessarily tiny at all; you just don’t have the standard library.

                          Stack allocation is still a problem there, of course, due to runtime non-determinism. My point is that making a core language feature depend on the standard library is not a great idea in practice, and it is a bit of circular thinking.