1.  

    I am a big fan of coverage, but feel that a lot of the debate around the practice largely misses the point(s). So, while I agree that complete or high coverage does not automatically mean that a test suite or the software is good… of course it doesn’t? In the extreme, it’s pretty trivial to reach 100% coverage without testing the actual behaviour at all.

    Coverage is useful for other reasons, for example the one this article ends with:

    Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.

    Identifying under-tested parts of a program seems like a pretty important part of a testing strategy to me. Like many advantages of coverage, though, you have to have pretty high coverage for it to be useful. There are other “flavours” of this advantage that I find useful all the time, most obviously dead code elimination. High test coverage at the very least signals that the developers are putting effort into testing, and checking that their testing is actually hitting important pieces of the code. Maybe their test suite is in fact nearly useless, but that seems pretty unlikely, and it could be nearly useless without coverage, too. That said, like any metric, it can be gamed, and pursuing the metric itself can easily go wrong. Test coverage is a means to many useful ends, not an end unto itself.

    The quest for 100% may be a bit of a wank, but I’ve tried that in a few projects before and actually found it quite useful. In particular, it highlights issues with code changes that affect the coverage of the test suite in a very simple way. Day-to-day, this means that you don’t need to meticulously pore over the test suite every time any change is made to make sure that some dead code or dead/redundant branches weren’t added. If you don’t have total coverage, doing that is a chore. If you do, it’s trivial: “oh, the number is not 100% anymore, I should look into why”. I regularly end up significantly improving the code during this process. It’s undeniably a lot of work to get there (depending on the sort of project), but once you do, there are a lot of efficiency benefits to be had. If the project has platform-specific or -dependent aspects, then this is even more useful in conjunction with a decent CI system.

    As to the article itself, the methodology here seems rather… convenient to me:

    • Programs are mutated by replacing a conditional operator with a different one. This mutation does not affect coverage (except perhaps branch coverage, in exactly one case, if you’re replacing > with >= as they are here; see the sketch after this list). It also hardly seems like a common case.

    • The effectiveness of the test suite as a whole is determined by running random subsets of the tests and seeing if they catch the bug. This is absurd. Test suites are called test suites for a reason. The instant you remove arbitrary tests, you are no longer evaluating the effectiveness of the test suite, full stop. You are - obviously - evaluating the effectiveness of a random subset of the test suite. Who cares about that?
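
    To make the first point concrete, here’s a hypothetical sketch (mine, not the paper’s) of a > → >= mutant that coverage can’t see:

        #include <assert.h>

        int can_vote(int age) {
            return age > 18;            /* the mutant replaces > with >= */
        }

        void test_can_vote(void) {
            assert(can_vote(30) == 1);  /* passes against original and mutant */
            assert(can_vote(10) == 0);  /* passes against original and mutant */
        }

        /* both tests execute every line and take both branches, so line and
           branch coverage are 100% either way; only an input of exactly 18
           tells the original and the mutant apart */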

    Am I missing something? In short, given this methodology, the only thing these results seem to say to me is: “running a random subset of a test suite is not a reliable way to detect random mutations that change one conditional operator to another”. I don’t think this is at all an indicator of overall test suite effectiveness.

    That said, I have not read the actual paper (paywall), and am assuming that the summary in the article is accurate.

    1.  

      I have not read the actual paper (paywall)

      The PDF is on the linked ACM site: https://dl.acm.org/doi/pdf/10.1145/2568225.2568271 – I think you must have misinterpreted something or took a wrong turn somewhere(?)

      Otherwise there is always that certain site run by a certain Kazakhstani :-)

      1.  

        Paywalled in the typical ACM fashion as far as I can tell?

        That said, sure, there are… ways (and someone’s found an author copy on the open web now). I’m just lazy :)

        1.  

          Skimmed the paper. It seems the methodology summary in the article is accurate, and I stand by my critique of it. To be fair, doing studies like this is incredibly hard, but I don’t think the suggested conclusions follow from the data. The constructed “suites” are essentially synthetic, and so don’t really say anything about how useful a quality metric or target coverage is in a real-world project.

          1.  

            Huh, I can just access it. I don’t know, ACM is weird at times; for a while they blocked my IP because it was “infiltrated by Sci-Hub” 🤷 Don’t ask me what that means exactly; I’m quoting their support department.

        2.  

          Identifying under-tested parts of a program seems like a pretty important part of a testing strategy to me.

          My interpretation is that test coverage reports can be useful if you look at them in detail to identify specific areas in the code where you thought you were testing it but you were wrong.

          But test coverage reports are completely useless if you just look at a percentage number on its own and say “the tests for project X are better than project Y because their number is higher”. We have a codebase at work with the coverage number around 80%, and having looked at it in detail, I can tell you that we could raise that number to 90% and get absolutely no actual benefit from it.

        1.  

          State serialization and synchronization across a network

          Woah woah woah, I don’t think that’s something the voxel engine needs, just something the game needs that is unrelated to the graphics stack. What am I missing here?

          1.  

            Typical modern game style is to use the data model more or less directly everywhere. So, for example, if you have a simple ECS with Position and Speed components, all the code you write that does something with that information has to… well, work with that information, in the format it is stored in (usually quite directly).

            That’s not true for just rendering, it’s how the entire application is written. If the job is to synchronize state across the network, then that code also needs to understand how to find and manipulate that state. It wouldn’t make sense to try and make some kind of abstraction there: the data model is the abstraction that all the pieces of code use to cooperate. “It’s All About the Data”, as the headline in TFA says.

            1.  

              I don’t understand why you think “voxel engine” means “voxel rendering engine”. The entire point of the article seems to be that a “voxel engine” is (or should be) so much more than a cool voxel renderer.

            1. 31

              It’s odd to see C described as boring. How can it be boring if you’re constantly navigating a minefield and a single misstep could cause the whole thing to explode? Writing C should be exhilarating, like shoplifting or driving a motorcycle fast on a crowded freeway.

              1. 17

                Hush! We don’t need more reasons for impressionable youngsters to start experimenting with C.

                1. 10

                  Something can be boring while still trying to kill you. One example is described in Things I Won’t Work With.

                  1. 1

                    ‘Boring’ is I suspect the author’s wording for ‘I approve of this language based on my experiences’.

                    1. 10

                      I suspect “boring” is used to describe established languages whose strengths and weaknesses are well known. These are languages you don’t spend any “weirdness points” for picking.

                      1. 5

                        Normally I’d lean towards this interpretation, but I’ve read many other posts by this author and he strikes me as being more thoughtful than that. Perhaps a momentary lapse in judgement; happens to everyone I suppose.

                        1. 4

                          ‘Boring’ is I suspect the author’s wording for ‘I approve of this language based on my experiences’.

                          I’m curious if you read the post, and if so, how you got that impression when I said things like “it feels much nicer to use an interesting language (like F#)”, “I still love F#”, etc.

                          Thanks for the feedback.

                          1. 4

                            I found your article pretty full of non-sequiturs and contradictions, actually.

                            boring languages are widely panned. … One thing I find interesting is that, in personal conversations with people, the vast majority of experienced developers I know think that most mainstream languages are basically fine,

                            Are they widely panned or are they basically fine?

                            But when I’m doing interesting work, the boilerplate is a rounding error and I don’t mind using a boring language like Java, even if that means a huge fraction of the code I’m writing is boilerplate.

                            Is it a rounding error or is it a huge fraction? Once the code has been written down, it doesn’t matter how much effort it was to mentally wrestle with the problem. That was a one-time effort, you don’t optimize for that. The only thing that matters is clearly communicating the code to readers. And if it’s full of boilerplate, that is not great for communication. I want to optimize for clear, succinct communication.

                            Of course, neither people who are loud on the internet nor people I personally know are representative samples of programmers, but I still find it interesting.

                            I’m fairly sure, based on this, that you are just commenting based on your own experiences, and are not claiming to have an unbiased sample?

                            To me it basically seems that your argument is, ‘the languages which should be used are the ones which are already used’. The same argument was used against C, C++, Java, Python, and every other boring language you can think of.

                            1. 2

                              Are they widely panned or are they basically fine?

                              I think the point is that the people who spend a lot of time panning boring languages (and advocating their favourite “interesting” one) are not representative of “experienced developers”. They’re just very loud and have an axe to grind.

                              1. 1

                                I’m having a tough time reconciling this notion that only a narrow section of loudmouths criticizes ‘boring languages’ with ‘widely panned’, which to me means ‘panned by a wide or significant section’.

                                But it’s really quite interesting how the experienced programmers who like ‘boring languages’ are the ones being highlighted here. It raises the question: what about the experienced programmers who don’t? Are they just not experienced enough? Sounds like an unquestionable dogma to me. If you don’t like the boring languages in the list, you’re just not experienced enough to realize that languages ultimately don’t matter.

                                Another interesting thing: some essential languages of the past few decades are simply not in this list. E.g. SQL, JavaScript, shell. Want to use a relational database, make interactive web pages, or just bash out a quick script? Sorry, can’t, not boring enough 😉

                                Of course that’s a silly argument. The point is to use the right tool for the job. Sometimes that’s low-level real-time stuff that needs C, sometimes it’s safety-critical high-perf stuff that needs Ada or Rust, sometimes you need a performant language with good domain modelling and safety properties like OCaml or F#. Having approved lists of ‘boring languages’ is a silly situation to get into.

                                1. 1

                                  To be honest, I don’t really see why that’s hard to reconcile at all. Take an extreme example:

                                  Let’s say programming language X is used for the vast majority of real world software development. Through some strange mechanism (doesn’t matter), programmers who write language X never proselytize programming languages on the Internet. Meanwhile, among the set of people who do, they almost always have nasty things to say about X. So, all the articles you can find on the general topic are at least critical of X, and a lot of them are specifically about how X is the devil.

                                  Is saying that X is “widely panned” accurate? Yes.

                                  Of course that’s a silly argument.

                                  Yes it is.

                                  The point is to use the right tool for the job.

                                  Indeed.

                      1. 1

                        Since struct and class are so similar, I choose to consider class to be the keyword in excess, simply because struct exists in C and class does not, and because it is the introduction of the keyword class that brought them both so close.

                        This is an interesting perspective on the history. I would consider struct to be the keyword worth removing, since that would change the default access qualifiers to be safer.

                        1. 5

                          I may be misremembering but I am reasonably sure that backwards compatibility with C was one of the early design goals of C++. Removing struct would quickly break compatibility. That is, presumably, why the default access qualifier is different from class’s (and identical to C’s struct).

                          1. 1

                            It’s always irked me that this C compatibility was only one-way because of support for member functions (at least).

                          2. 3

                            Removing struct would create a lot more C code that is not C++, and making the default “safer” doesn’t improve things since, as noted, it’s standard practice to be explicit with access qualifiers.

                            1. 4

                              Yeah, I don’t think that can be overstated. This would destroy one of the biggest reasons C++ was successful, and one of its main advantages to this day. It would even make most C headers not C++ compatible, which would be an absolute catastrophe. Even if the committee did something so egregious, no compiler could or would ever implement it (beyond perhaps a performative warning).

                              I think the real mistake is that the keywords are redundant at all. We’ve ended up with this near-universal convention that struct is for bags of data (ideally POD or at least POD-ish) because that’s a genuinely useful and important distinction. Since C++ somehow ended up with the useless “class except public by default” definition, we all simply pretend that it has a useful (if slightly fuzzy) one.

                              1. 1

                                Because of its incremental design and the desire to make classes seem like builtin types, C++ has a Moiré pattern-like feel. A lot of constructs that are exceedingly close, yet different.

                          1. 11

                            These aren’t for comments, but rather for replies on a review thread. I think it is unwise to overload the term ‘comment’ in computing.

                            For code comments, I have been using https://www.python.org/dev/peps/pep-0350/ for this for a long time, and recommend it to others.

                            For review responses, I suppose this looks decent enough, although the use of bold assumes styled text; I would prefer all-caps, as has been conventional in unstyled text for quite a while. When styles are available, bold and all-caps is quite visually distinct.

                            1. 4

                              These aren’t for comments, but rather for replies on a review thread. I think it is unwise to overload the term ‘comment’ in computing.

                              These are comments; readers are supposed to understand from context that we’re talking about something different from code comments. This is absolutely not an unreasonable expectation. Both my kids have understood contextual words without being taught. Context really is intuitive to human nature, and it’s perfectly reasonable to use the same word in different contexts to mean something different.

                              1. 4

                                “Reviews” is the standard word for this.

                                1. 1

                                  The only real context here outside of TFA itself is computing, or maybe slightly more broadly, technology. All we see on this site, which is generally full of programming minutiae, is “comments”, in both the title and domain name. The use of the word “conventional” only makes it worse: conventions in code comments are an almost universally recognized and common thing; conventions in reviews, not so much. One might even argue that this is nearing territory considered off-topic on this site (being not particularly technical).

                                  I’d be low-key surprised if anyone here assumed differently. This is actually my second click through to this article, because although I read the whole thing the first time, it didn’t even occur to me that this link was to that article, and not something on code commenting practices or whatever that I missed before.

                                  Sure, anyone who actually reads the whole thing and comes away confused… well, has bigger problems… but it’s still a poor choice of words. Maybe this is a superficial bikeshed, but that sort of thing is pretty important when the whole point is to define a soft standard for things with a standard name. Even in the context this is specifically intended for (code review), I’d assume that “conventional comments” was something about the code (did I get the Doxygen tags wrong or something?), because of course I would. That’s what a code review is.

                              1. 1

                                I haven’t ever done anything with entity-component systems. I am curious about how broadly this could be applied.

                                So I understand you have an entity and you give it a position; presumably this component is like a property of the entity. So why not just give the entity a position directly?

                                1. 1

                                  There are a few reasons. On the software engineering side of things, it avoids a lot of issues where class hierarchies are too rigid or code becomes too tightly coupled, but there are also performance benefits, which are largely why the game development universe is so into the idea.

                                  If you have a system that is calculating collisions, for example, perhaps you only need that position to do the calculation. If you “just” give the entity a position “directly” (assuming Entity is a class and you just jam fields in there), then you will also “just” give it other things directly, and eventually it grows to have a huge number of fields. So, your collision algorithm is scanning huge chunks of fragmented memory only to read a single position variable, which is extremely cache-inefficient.

                                  In contrast, with an ECS you can implement that so that scanning all of the positions is just a linear scan of a contiguous array. Depending on the data type it may even be vectorized. The way you realize this is to not make the component a property of the entity in the sense that it is stored “in” the entity, but to instead store the components by type, completely separate from entities, and associate them with IDs. In the simplest ideal vision, there is no Entity class at all, an entity is simply an integer.
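
                                  A minimal sketch of that layout in C (the names and fixed-size arrays are mine; real ECS implementations are fancier about storage and entity-to-index mapping):

                                      #include <stddef.h>

                                      typedef unsigned Entity;                    /* an entity is just an ID */
                                      typedef struct { float x, y, z; } Position;

                                      /* components live in dense per-type arrays indexed by entity ID,
                                         not as fields inside some Entity object */
                                      enum { MAX_ENTITIES = 4096 };
                                      static Position positions[MAX_ENTITIES];

                                      /* a "system" touches only the array it cares about: one linear,
                                         cache-friendly (and potentially vectorizable) pass */
                                      void move_everything(float dx, size_t count) {
                                          for (size_t i = 0; i < count; ++i)
                                              positions[i].x += dx;
                                      }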

                                  1. 1

                                    In contrast, with an ECS you can implement that so that scanning all of the positions is just a linear scan of a contiguous array. Depending on the data type it may even be vectorized.

                                    I definitely get this along with the cache argument.

                                    The thing I am not really sure about is perhaps more generally, outside of games. I don’t really do game design, but I do things with web applications (angular).

                                    1. 1

                                      It’s a technique that’s mainly for performance; it’s also kinda specific to languages (read: C/C++/Java/C#) that don’t have an easy way of doing dynamic compositions/mixins. Like, in Ruby or JS I don’t think it’s as big a win.

                                  2. 1

                                    It’s a framework that helps reinforce a good separation of concerns. You can mix and match any type of components across your entities, and your systems only care about their specific types of components. It’s a lifesaver in instances where you need an oddball case later in development: “Gee, I really need this Sword Item class to be able to talk to the player, but only the Character class has Talk()!” Rather than trying to figure out how to shoehorn your Sword into a different class hierarchy, you would start by just adding a Talk component to the Sword entity.

                                    It flattens out the logic: any “thing” in your world has the capability to do any action.
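
                                    A minimal sketch (hypothetical names) of why that’s cheap: the new ability is just one more per-type component array.

                                        typedef unsigned Entity;

                                        typedef struct { const char *line; } Talk;

                                        static Talk talk[4096];
                                        static unsigned char has_talk[4096];

                                        /* any entity, sword or character, can gain the ability */
                                        void give_talk(Entity e, const char *line) {
                                            talk[e].line = line;
                                            has_talk[e] = 1;
                                        }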

                                  1. 4

                                    getting rid of footguns like parameter-less/takes-any-argument functions

                                    Wow, finally. A few times I’ve been told to “make my functions ANSI” by reviewers, which always got me like “WHAT? I don’t use the weird K&R style decls before the opening {, this is ANSI??” and a minute later “oh, the stupid (void) parameter, argh” >_<
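
                                    For anyone who hasn’t been bitten by it: in pre-C23 C (unlike C++), an empty parameter list means “unspecified arguments”, not “no arguments”, which is exactly the footgun being removed:

                                        int f();        /* declaration with unspecified parameters:
                                                           f(1, 2, 3) still compiles */
                                        int g(void);    /* prototype taking no arguments:
                                                           g(1) is a compile-time error */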

                                    1. 1

                                      There’s a Warning For That™

                                    1. 10

                                      Are they finally going to fix the abomination that is C11 atomics? As far as I can tell, WG14 copied atomics from WG21 without understanding them and ended up with a mess that causes problems for both C and C++.

                                      In C++11 atomics, std::atomic<T> is a new, distinct type. An implementation is required to provide a hardware-enforced (or, in the worst case, OS-enforced) atomic boolean. If the hardware supports a richer set of atomics, then it can be used directly, but a std::atomic<T> implementation can always fall back to using std::atomic_flag to implement a spinlock that guards access to larger types. This means that std::atomic<T> can be defined for all types and be reasonably efficient (if you have a futex-like primitive then, in the uncontended case, it’s almost as fast as T, and in the contended case it doesn’t consume much CPU time or power spinning).

                                      Then WG14 came along and wanted to define _Atomic(T) to be compatible with std::atomic<T>. That would require the C compiler and C++ standard library to agree on data layout and locking policy for things larger than the hardware-supported atomic size, but it’s still feasible. Then they completely screwed up by making all of the arguments to the functions declared in stdatomic.h take a volatile T* instead of an _Atomic(T)*. For historical reasons, the representation of volatile T and T have to be the same, which means that _Atomic(T) and T must have the same representation and there is nowhere that you can stash a lock. The desire to make _Atomic(T) and std::atomic<T> interchangeable means that C++ implementers are stuck with this.

                                      Large atomics are now implemented by calls to a library, but there is no way to implement this in a way that is both fast and correct, so everyone picks fast. The atomics library provides a pool of locks and acquires one keyed on the address. That’s fine, except that most modern operating systems allow virtual addresses to be aliased, and so there are situations (particularly in multi-process situations, but also when you have a GC or similar doing exciting virtual memory tricks) where simple operations on _Atomic(T) are not atomic. Fixing that would require asking the OS if a particular page is aliased before performing an operation (and preventing it from becoming aliased during the operation), at which point you may as well just move atomic operations into the kernel anyway, because you’re paying for a system call for each one.
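
                                      The scheme looks roughly like this (a sketch of the idea, not actual libatomic code):

                                          #include <stdatomic.h>
                                          #include <stdint.h>

                                          #define NLOCKS 64
                                          static atomic_flag locks[NLOCKS];   /* assume all start clear */

                                          /* pick a lock by hashing the *virtual* address of the object */
                                          static atomic_flag *lock_for(const volatile void *obj) {
                                              return &locks[((uintptr_t)obj >> 4) % NLOCKS];
                                          }

                                          /* two virtual mappings of the same physical memory hash to two
                                             different locks here, so "atomic" accesses made through the
                                             two aliases can interleave: fast, but not correct */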

                                      C++20 has worked around this by defining std::atomic_ref, which provides the option of storing the lock out-of-line with the object, at the expense of punting the determination of the sharing set for an object to the programmer.

                                      Oh, and let’s not forget the mtx_timedlock fiasco. Ignoring decades of experience in API design, WG14 decided to make the timeout for a mutex the wall-clock time, not the monotonic clock. As a result, it is impossible to write correct code using C11’s mutexes, because the wall-clock time may move arbitrarily. You can wait on a mutex with a 1ms timeout and discover that the clock was wrong and, after it was reset in the middle of your ‘get time, add 1ms, timedwait’ sequence, you’re now waiting a year (more likely, you’re waiting multiple seconds and now the tail latency of your distributed system has weird spikes). The C++ version of this API gets it right and allows you to specify the clock to use; pthread_mutex_timedlock got it wrong and ended up with platform-specific work-arounds. Even pthreads got it right for condition variables; C11 predictably got it wrong.
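
                                      Concretely, the ‘get time, add 1ms, timedwait’ sequence has to look something like this (a sketch):

                                          #include <threads.h>
                                          #include <time.h>

                                          /* the only way to express "wait at most 1ms" with C11's API */
                                          int lock_with_1ms_timeout(mtx_t *m) {
                                              struct timespec deadline;
                                              timespec_get(&deadline, TIME_UTC);  /* read the wall clock... */
                                              deadline.tv_nsec += 1000000;        /* ...and add 1ms */
                                              if (deadline.tv_nsec >= 1000000000L) {
                                                  deadline.tv_sec += 1;
                                                  deadline.tv_nsec -= 1000000000L;
                                              }
                                              /* if the wall clock is stepped between timespec_get() and
                                                 the wait below, this can block for seconds, or a year */
                                              return mtx_timedlock(m, &deadline);
                                          }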

                                      C is completely inappropriate as a systems programming language for modern hardware. All of these tweaks are nice cleanups but they’re missing the fundamental issues.

                                      1. 3

                                        Then they completely screwed up by making all of the arguments to the functions declared in stdatomic.h take a volatile T* instead of an _Atomic(T)*. For historical reasons, the representation of volatile T and T have to be the same, which means that _Atomic(T) and T must have the same representation and there is nowhere that you can stash a lock.

                                        I’m not too familiar with atomics and their implementation details, but my reading of the standard is that the functions in stdatomic.h take a volatile _Atomic(T) * (i.e. a pointer to volatile-qualified atomic type).

                                        They are described with the syntax volatile A *object, and earlier on in the stdatomic.h introduction it says “In the following synopses: An A refers to one of the atomic types”.

                                        Maybe I’m missing something?

                                        1. 2

                                          Huh, it looks as if you’re right. That’s how I read the standard in 2011 when I added the atomics builtins to clang, but I reread it later and thought that I’d initially misunderstood. It looks as if I get to blame GCC for the current mess then (their atomic builtins don’t require _Atomic-qualified types and their stdatomic.h doesn’t check it).

                                          Sorry WG14, you didn’t get atomics wrong, you just got mutexes and condition variables wrong.

                                          That said, I’ve no idea why they felt the need to make the arguments to these functions volatile and _Atomic. I am not sure what a volatile _Atomic(T)* actually means. Presumably the compiler is not allowed to elide the load or store even if it can prove that no other thread can see it?

                                          1. 1

                                            I’ve no idea why they felt the need to make the arguments to these functions volatile and _Atomic

                                            I’ve no idea; but a guess: they want to preserve the volatility of arguments to atomic_*. That is, it should be possible to perform operations on variables of volatile type without losing the ‘volatile’. I will note that the c++ atomics contain one overload with volatile and one without. But if that’s the case, why the committee felt they could get away with being polymorphic wrt type, but not with being polymorphic wrt volatility is beyond me.

                                            There is this stackoverflow answer from a committee member, but I did not find it at all illuminating.

                                            not allowed to elide the load or store even if it can prove that no other thread can see it?

                                            That would be silly; a big part of the impetus for atomics was to allow the compiler to optimize in ways that it couldn’t using just volatile + intrinsics. Dead loads should definitely be discarded, even if atomic!


                                            One thing that is clear from this exchange: there is a massive rift between specifiers, implementors, and users. Thankfully the current spec editor (JeanHeyd Meneide, also the author of the linked post) seems to be aware of this and to be acting to improve the situation; so we will see what (if anything) changes.

                                            1. 3

                                              One thing that is clear from this exchange: there is a massive rift between specifiers, implementors, and users. Thankfully the current spec editor (JeanHeyd Meneide, also the author of the linked post) seems to be aware of this and to be acting to improve the situation; so we will see what (if anything) changes.

                                              It’s not really clear to me how many implementers are left that care:

                                              • MSVC is a C++ compiler that has a C mode. The authors write in C++ and care a lot about C++.
                                              • Clang is a C++ compiler that has C and Objective-C[++] modes. The authors write in C++ and care a lot about C++.
                                              • GCC includes C and C++ compilers with separate front ends, it’s primarily C so historically the authors have cared a lot about C, but for new code it’s moving to C++ and so the authors increasingly care about C++.

                                              That leaves things like PCC, TCC, and so on, and a few surviving 16-bit microcontroller toolchains, as the only C implementations that are not C++ with C as an afterthought.

                                              I honestly have no idea why someone would choose to write C rather than C++ these days. You end up writing more code, you have a higher cognitive load just to get things like ownership right (even if you use nothing from C++ other than smart pointers, your life is significantly better than that of a C programmer), you don’t get generic data structures, and you don’t even get more efficient code because the compilers are all written in C++ and so care about C++ optimisation because it directly affects the compiler writers.

                                              C++ is not seeing its market eroded by C but by things like Rust and Zig (and, increasingly, Python and JavaScript, since computers are fast now). C fits in a niche that doesn’t really exist anymore.

                                              1. 2

                                                I honestly have no idea why someone would choose to write C rather than C++ these days.

                                                For applications, perhaps, but for libraries and support code, ABI stability and ease of integration with the outside world are big ones. It’s also a much less volatile language in ways that start to really matter if you are deploying code across a wide range of systems, especially if old and/or embedded ones are included.

                                                Avoiding C++ (and especially bleeding edge revisions of it) avoids a lot of real life problems, risks, and hassles. You lose out on a lot of power, of course, but for some projects the kind of power that C++ offers isn’t terribly important, but the ability to easily run on systems 20 years old or 20 years into the future might be. There’s definitely a sort of irony in C being the real “write once, run anywhere” victor, but… in many ways it is.

                                                C fits in a niche that doesn’t really exist anymore.

                                                It might not exist in the realm of trendy programming language debates on the Internet, but we’re having this conversation on systems largely implemented in it (UNIX won after all), so I think it’s safe to say that it very much exists, and will continue to for a long time. That niche is just mostly occupied by people who don’t tend to participate in programming language debates. One of the niche’s best features is being largely insulated from all of that noise, after all.

                                                It’s a very conservative niche in a way, but sometimes that’s appropriate. Hell, in the absolute worst case scenario, you could write your own compiler if you really needed to. That’s of course nuts, but it is possible, which is reassuring compared to languages like C++ and Rust where it isn’t. More realistically, diversity of implementation is just a good indicator of the “security” of a language “investment”. Those implementations you mention might be nichey, but they exist, and you could pretty easily use them (or adapt them) if you wanted to. This is a good thing. Frankly I don’t imagine any new language will ever manage to actually replace C unless it pulls the same thing off. Simplicity matters in the end, just in very indirect ways…

                                                1. 4

                                                  For applications, perhaps, but for libraries and support code, ABI stability and ease of integration with the outside world are big ones. It’s also a much less volatile language in ways that start to really matter if you are deploying code across a wide range of systems, especially if old and/or embedded ones are included.

                                                  I’d definitely have agreed with you 10 years ago, but the C++ ABI has been stable and backwards compatible on all *NIX systems, and fairly stable on Windows, for over 15 years. C++ provides you with some tools that allow you to make unstable ABIs for your libraries, but it also provides tools for avoiding these problems. The same problems exist in C: you can’t add a field to a C structure without breaking the ABI, just as you can’t add a field to a C++ class without breaking the ABI.
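
                                                  The C version of the problem, as a sketch (hypothetical libfoo):

                                                      /* the same struct in two releases of a hypothetical libfoo */
                                                      struct foo_v1 { int a; };          /* sizeof == 4 */
                                                      struct foo_v2 { int a; int b; };   /* sizeof == 8 */

                                                      /* a caller compiled against v1 allocates 4 bytes and knows
                                                         nothing about b, so running it against v2 corrupts memory;
                                                         hence the C idiom of opaque pointers plus accessor functions */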

                                                  I should point out that most of the things that I work on these days are low-level libraries and C++17 is the default tool for all of these.

                                                  You lose out on a lot of power, of course, but for some projects the kind of power that C++ offers isn’t terribly important, but the ability to easily run on systems 20 years old or 20 years into the future might be.

                                                  Neither C nor C++ guarantees this; in my experience old C code needs just as much updating as C++ code, and it’s often harder to do because C code does not encourage clean abstractions. This is particularly true when talking about running on new platforms. From my personal experience: we and another group have recently written memory allocators. Ours is written in C++, theirs in C. Our platform and architecture abstractions are clean, small, and self-contained. Theirs? Not so much. We’ve ported ours to CHERI, where the hardware enforces strict pointer and bounds checking, with quite a small set of changes, made possible (and maintainable, when most of our targets don’t have CHERI support) by the fact that C++ lets us define pointer wrapper types that describe the high-level semantics of the associated pointer and a state machine for which transitions are permitted; porting theirs would require invasive changes.

                                                  It might not exist in the realm of trendy programming language debates on the Internet, but we’re having this conversation on systems largely implemented in it (UNIX won after all), so I think it’s safe to say that it very much exists, and will continue to for a long time.

                                                  I’m writing this on a Windows system, where much of the kernel and most of the userland is C++. I also post from my Mac, where the kernel is a mix of C and C++, with more C++ being added over time, and the userland is C for the old bits, C++ for the low-level new bits, and Objective-C / Swift for the high-level new bits. The only places either of these systems chose C were parts that were written before C++11 was standardised.

                                                  Hell, in the absolute worst case scenario, you could write your own compiler if you really needed to.

                                                  This is true for ISO C. In my experience (based in part on building a new architecture designed to run C code in a memory-safe environment and working on defining a formal model of the de-facto C standard), there is almost no C code that is actually ISO C. The language is so limited that anything nontrivial ends up using vendor extensions. ‘Portable’ C code uses a load of #ifdefs so that it can use two or more different vendor extensions. There’s a lot of GNU C in the world, for example.

                                                  Reimplementing GNU C is definitely possible (clang, ICC, and XLC all did it, with varying levels of success) but it’s hard, to the extent that of these three none actually achieve 100% compatibility to the degree that they can compile, for example, all of the C code in the FreeBSD ports tree out of the box. They actually have better compatibility with C++ codebases, especially post-C++11 codebases (most of the C++ codebases that don’t work are ones that are doing things so far outside the standard that they have things like ‘works with G++ 4.3 but not 4.2 or 4.4’ in their build instructions).

                                                  More realistically, diversity of implementation is just a good indicator of the “security” of a language “investment”. Those implementations you mention might be nichey, but they exist, and you could pretty easily use them (or adapt them) if you wanted to.

                                                  There are a few niche C compilers (e.g. PCC / TCC), but almost all of the mainstream C compilers (MSVC, GCC, Clang, XLC, ICC) are C++ compilers that also have a C mode. Most of them are either written in C++ or are being gradually rewritten in C++. Most of the effort in ‘C’ compiler is focused on improving C++ support and performance.

                                                  By 2018, C++17 was pretty much universally supported by C++ compilers. We waited until 2019 to move to C++17 for a few stragglers, we’re now pretty confident being able to move to C++20. The days when a new standard took 5+ years to support are long gone for C++. Even a decade ago, C++11 got full support across the board before C11.

                                                  If you want to guarantee good long-term support, look at what the people who maintain your compiler are investing in. For C compilers, the folks that maintain them are investing heavily in C++ and in C as an afterthought.

                                                  1. 3

                                                    I’d definitely have agreed with you 10 years ago, but the C++ ABI has been stable and backwards compatible on all *NIX systems, and fairly stable on Windows, for over 15 years. C++ provides you with some tools that allow you to make unstable ABIs for your libraries, but it also provides tools for avoiding these problems. The same problems exist in C: you can’t add a field to a C structure without breaking the ABI, just as you can’t add a field to a C++ class without breaking the ABI.

                                                    The C++ ABI is stable now, but the problem is binding it from other languages (i.e. try binding a mangled symbol), because C is the lowest common denominator on Unix. Of course, with C++, you can just define a C-level ABI and just use C++ for everything.

                                                    edit

                                                    Reimplementing GNU C is definitely possible (clang, ICC, and XLC all did it, with varying levels of success) but it’s hard, to the extent that of these three none actually achieve 100% compatibility to the degree that they can compile, for example, all of the C code in the FreeBSD ports tree out of the box. They actually have better compatibility with C++ codebases, especially post-C++11 codebases (most of the C++ codebases that don’t work are ones that are doing things so far outside the standard that they have things like ‘works with G++ 4.3 but not 4.2 or 4.4’ in their build instructions).

                                                    It’s funny that no one ever complains about GNU’s extensions to C being so prevalent that they make implementing other C compilers hard, yet people lose their minds over, say, a Microsoft extension.

                                                    1. 2

                                                      The C++ ABI is stable now, but the problem is binding it from other languages (i.e. try binding a mangled symbol), because C is the lowest common denominator on Unix. Of course, with C++, you can just define a C-level ABI and just use C++ for everything.

                                                      That depends a lot on what you’re binding. If you’re using SWIG or similar, then having a C++ API can be better because it can wrap C++ types and get things like memory management for free if you’ve used smart pointers at the boundaries. The binding generator doesn’t care about name mangling because it’s just producing a C++ file.

                                                      If you’re binding to Lua, then you can use Sol2 and directly surface C++ types into Lua without any external support. With something like Sol2 in C++, you write C++ classes and then just expose them directly from within C++ code, using compile-time reflection. There are similar things for other languages.

                                                      If you’re trying to import C code into a vaguely object-oriented scripting language then you need to implement an object model in C and then write code that translates from your ad-hoc language into the scripting language’s one. You have to explicitly write all memory-management things in the bindings, because they’re API contracts in C but part of the type system in C++.

                                                      From my personal experience, binding modern C++ to a high-level language is fairly easy (though not quite free) if you have a well-designed API, binding Objective-C (which has rich run-time reflection) is trivial to the extent that you can write completely generic bridges, and binding C is possible but requires writing bridge code that is specific to the API for anything non-trivial.

                                                      1. 1

                                                        Right; I suspect it’s actually better with a binding generator or environments where you have to write native binding code (i.e. JNI/PHP). It’s just annoying for the ad-hoc cases (i.e. .NET P/Invoke).

                                                        1. 2

                                                          On the other hand, if you’re targeting .NET on Windows then you can expose COM objects directly to .NET code without any bridging code and you can generate COM objects directly from C++ classes with a little bit of template goo.

                                        2. 2

                                          Looks like Hans Boehm is working on it, as mentioned at the bottom of the article. They are apparently “bringing it back up to parity with C++”, which should fix the problems you mentioned.

                                          1. 4

                                            That link is just Hans adding a <cstdatomic> header to C++ that contains #define _Atomic(T) std::atomic<T>. This ‘fixes’ the problem by letting you build C code as C++; it doesn’t fix the fact that C is fundamentally broken and can’t be fixed without breaking backwards source and binary compatibility.

                                        1. 4

                                          And people wonder why RDF didn’t become popular. I can’t imagine why…

                                          1. 2

                                          I don’t think one can hold the fact that such discussion is possible with RDF against RDF. RDF is a tool; you may stick to representing your CSV in RDF or go further, and RDF has nothing to do with it. I actually don’t see the future of IoT/Industry 4.0/buzzword-here without RDF. Its core premise is a federation of vocabulary definitions, where classes and properties are identified by their URIs instead of bespoke literals (and anyone can reuse other definitions by linking to them). E.g. ssn, s_s_n, and all other variants become ‘https://ns.irs.gov/core/#ssn’. I can’t imagine how we are going to build a future of “connected everything” if we can’t get things even to use the same terms for the same things and just keep inventing new JSON structures for every single API.

                                            1. 4

                                            I remember the heyday of RDF in “Web 2.0”: Flickr, Friend-of-a-Friend, all sorts of API-driven sites with sharing. In principle I really appreciate the affordances these services offered, but they were a bit too hard for most developers to wrap their heads around.

                                            The vision for XML was similar: organizations would define careful schemas, files would conform to those schemas, and liberal use of XSLT would transform data from one form to another. The reality was that people shoved CSV data willy-nilly into XML, shipped it over the wire, and relied on copious amounts of hand-rolled ETL code at the receiving end to handle it correctly.

                                              RDF is the same - it’s a beautiful creation but too good for this fallen world. The future is an endless pile of JSON.

                                              1. 3

                                              That’s the fundamental tragedy of RDF, to me. The idea and most of the basic concepts are mostly great and sorely needed, but the technology stack and documentation are a 90’s W3C nightmare, and there is all this academic logic/theory stuff around it that almost nobody in the real world cares about and that only makes the entire thing seem hopelessly intimidating. There is some good technology hidden in there, but if you just search the web for “RDF” as a curious potential adopter, you’ll almost certainly end up saying “yeeeeeeeah, no thanks”.

                                                As the maintainer of a project (LV2) that “forces” RDF on developers who just want to get something done (write audio plugins), I’m painfully aware of these problems. While some aspects of the technology are concretely useful there, it /really/ hurts to have RDF be what it is. In recent years I’ve been trying to mitigate this on a soft level by distancing from “RDF” and the mess the W3C and the semantics people have made, including avoiding using the term “RDF” at all. The situation is so bad that I think doing so only does damage.

                                                In an ideal universe maybe there’d be a vaguely WHATWG-like splinter group of people trying to build up these ideas in a more practical way (we want to chuck quasi-schemaless data around but still have it make some sense, and we need nice tools with a low barrier of entry to do so, etc), but I imagine that ship has sailed.

                                              At least JSON-LD provides a pretty viable bridge between something most developers these days are comfortable with (JSON) and the Linked Data ideal, at the cost of having to write context definitions and so on. It gives us a way to offer something superficially uncontroversial (“it’s just JSON”) without throwing the meaningful baby out with the bathwater…

                                              While I have my gripes with JSON-LD, I think it’s the only hope at this point. The RDF project did such a bad job at the practical-developer-facing side of things that the only way out is to present a veneer that abstracts it away almost completely and makes it essentially an implementation detail: the only way to make RDF-based technology palatable to random developers in the trenches is to make it so that they can’t even tell they’re using RDF-based technology at all, unless they actively dig into it.
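
                                              As a sketch of that veneer, borrowing the hypothetical ssn URI from upthread: the @context maps plain JSON keys onto URIs, and a developer who doesn’t care about RDF can ignore it completely.

                                                  {
                                                    "@context": {
                                                      "name": "http://schema.org/name",
                                                      "ssn":  "https://ns.irs.gov/core/#ssn"
                                                    },
                                                    "name": "Jane Doe",
                                                    "ssn":  "123-45-6789"
                                                  }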

                                                JSON won because it’s simple. A developer who knows nothing at all about it can get the basic idea with a single web search and maybe 5 or 10 minutes, and probably achieve their goal (which is probably just chucking some data around) shortly thereafter. Meanwhile, it would probably take days if not weeks to initially figure out what RDF even /is/, to say nothing of actually achieving anything with it. There’s a lesson to be learned in there somewhere. I wrote a fully conformant (and very fast) JSON parser in C with no dependencies in one weekend. I’ve been writing an RDF implementation for over a decade with no end in sight. That kind of thing really matters.

                                                … I probably should have used this as my entry in the “what would you rewrite from scratch?” thread from a few weeks ago. Such a missed opportunity.

                                            1. 1

                                              I’ll go a different way with this and take more of a broad software design “rewrite”:

                                              JSON. Not really “from scratch”, just the little things, like trailing commas, that are annoying enough in real life to drive people to horrifying monstrosities like YAML that almost infinitely raise the barrier to entry for implementations.

                                              or, rewinding a bit further: the entire syntactic side of the web stack, informed by… well, Lisp. S-expressions (with some universal DOCTYPE-like header syntax and/or namespacing mechanism) everywhere, entirely avoiding this ridiculous zoo of [SG|X]ML + a vaguely C-like veneer on a scripting language lazily cobbled together in a week + a subset of that being the de-facto structured data syntax.

                                              1. 1

                                                Those who misunderstand semver are doomed to forever struggle through a maze of twisty little version numbers, all alike.

                                                1. 13

                                                  Personally I’m a fan of TSV for things that are just a bunch of numbers or other simple fields where whitespace delimiters make sense. It’s as easy as can be to work with out of the box with simple tools (including a bunch of standard UNIX utilities like cut) without having to mess around with delimiter options. Plus, if you need to read it in most programming languages you don’t really need a parser at all (since reading up to the next whitespace is usually a built-in thing).
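
                                                  For example (a sketch, with a made-up file name), reading a three-column numeric table in C needs no parser at all:

                                                      #include <stdio.h>

                                                      int main(void) {
                                                          FILE *f = fopen("data.tsv", "r");   /* hypothetical input */
                                                          if (!f) return 1;
                                                          double x, y, z;
                                                          /* %lf skips any whitespace, so tabs and newlines come free */
                                                          while (fscanf(f, "%lf%lf%lf", &x, &y, &z) == 3)
                                                              printf("%g\n", x + y + z);
                                                          fclose(f);
                                                          return 0;
                                                      }

                                                  And grabbing the second column on the command line is just cut -f2 data.tsv.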

                                                  Of course, this all falls apart as soon as the data is more complicated and stringy, but for data that fits, it’s lovely. JSON (or whatever) is nice when you need it, but I find it pretty silly when people put data this simple in something like JSON (or much worse, XML). Then you need to understand the schema and write a parser (probably using some library, which probably can’t even stream) to get at the data at all, and trivial one-liner command line tasks become an entire bespoke program you need to write. Ugh.

                                                  Simple is good. Sure, people will abuse simple things - people will abuse everything - but that doesn’t make them bad. As AndrewStevens pointed out above, people will find a way to cause all of these problems with more complicated formats anyway. Most of the web is built out of broken garbage written in syntaxes that really shouldn’t have these problems at all, and there is a vast amount of tooling available to prevent it, and yet…

                                                  1. 9

                                                    TSV is indeed better. If it were easier to use the ASCII control characters in text editors, I think it could be done even better. There are Record, Group, Unit and File Separators right there in the lowest common denominator of all encodings that is ASCII. If we used them more, things would be less error-prone to parse, even with “stringy” data.
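
                                                    A sketch of what that would look like: US (0x1F) between fields, RS (0x1E) between records, leaving tabs, commas, and newlines free to appear in the data:

                                                        #include <stdio.h>
                                                        #include <string.h>

                                                        int main(void) {
                                                            /* one record, three fields, delimited by the Unit Separator */
                                                            char record[] = "Doe, Jane\x1f" "ID:\t42\x1f" "note\nwith a newline";
                                                            for (char *field = strtok(record, "\x1f"); field != NULL;
                                                                 field = strtok(NULL, "\x1f"))
                                                                printf("[%s]\n", field);
                                                            return 0;
                                                        }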

                                                    1. 3

                                                      Interesting, I hadn’t thought of that. I guess Unit Separator is the replacement for tab or comma in this context. Shame that these aren’t considered whitespace, so some of the “magically works with roughly every dumb tool ever” advantages would be lost, but I wonder how many tools support these things in practice…

                                                      The combination of robustness (the chance of these characters appearing in fields seems incredibly low) and extreme simplicity is really appealing, but I guess if it’s not something you can edit manually in whatever editor, it could never be a real replacement for [CT]SV.

                                                      Funny how little things in technology history have such wide-reaching implications sometimes. If one of the more or less entirely unused keys on a standard keyboard (say, SysRq) was instead a suitable separator, we wouldn’t have any of these problems.

                                                      1. 1

                                                        The combination of robustness (the chance of these characters appearing in fields seems incredibly low) and extreme simplicity is really appealing, but I guess if it’s not something you can edit manually in whatever editor, it could never be a real replacement for [CT]SV.

                                                        Yeah, I tried using it on a little project once, but it is really annoying for everything but the parsing side, b/c the regular tools don’t handle it nicely.

                                                        1. 1

                                                          If one of the more or less entirely unused keys on a standard keyboard (say, SysRq) was instead a suitable separator, we wouldn’t have any of these problems.

                                                          The strange thing is that we all have a symbol on our keyboards that most of us probably never use and that rarely occurs in text: namely §. We never lean towards it as a separator, probably because it looks ugly to the eye. I literally cannot remember the last time I used § before writing this message.

                                                        2. 1

                                                          Exactly. There are characters already, just hard to type.

                                                        3. 2

                                                      Yup, I agree. I designed a graceful upgrade of TSV for Oil, called “quoted typed tables”. It can represent tabs and newlines in fields if desired, and you can also attach a type to each column.

                                                          https://github.com/oilshell/oil/wiki/TSV2-Proposal