1. 57
  1. 29

    I don’t see this as a case for a modern language, as much as a problem with working with C.

    This has worked in Ada since its initial version (Ada 83): Integer'Value raises Constraint_Error on out-of-range and malformed input. The same attribute works for floats with Float'Value (S), and even for custom enumerations by name with Enum_Type'Value (S).

    X : Integer := Integer'Value (S);
    

    Strings are also not null terminated, but there’s Interfaces.C.Strings if you need to convert to C-style to work with an imported function.

    1. 19

      I think here “modern” means “newer than 1970”.

      1. 14

        Lol in that case, C counts (created in 72).

        1. 4

          EDIT: I think I wildly misinterpreted your point. I was considering “modern” by going off of this quote:

          one of the several more modern languages that have sprung up in the systems programming space, at least for newly written code. I make no secret that I love Rust and Zig both, for many similar reasons. But I also find Nim to be impressive and have heard great things about Odin.

        2. 11

          The C++11 versions of these functions also throw exceptions on out-of-range or parse-error conditions. This makes me a bit sad, because I normally disable exceptions in C++ - I wish the standards group would change their stance on subsetting and properly standardise the no-exceptions dialect.

          1. 7

            I haven’t tried them yet, but C++17 added std::from_chars and std::to_chars. They don’t throw exceptions or support localization.

            1. 2

              Zounds! I did not know of these.

              1. 2

                Thanks, that looks like exactly the API that I was looking for! Apart from the not supporting localization bit - it would be nice if they could take an optional std::locale. That said, another of my pet peeves with the C++ committee’s aversion to subsetting is that higher-level functionality such as locales is mandatory in all environments. You can sort-of get around this by defining precisely one locale (“C” or “POSIX”, the only one that POSIX requires to exist) but then you’re relying on your compiler to do a lot of tricky dead-code elimination.

                1. 3

                  Not supporting a locale is a goal of those interfaces:

                  Unlike other formatting functions in C++ and C libraries, std::to_chars is locale-independent, non-allocating, and non-throwing. Only a small subset of formatting policies used by other libraries (such as std::sprintf) is provided. This is intended to allow the fastest possible implementation that is useful in common high-throughput contexts such as text-based interchange (JSON or XML).

            2. 7

              I’d consider Ada a modern language! At the very least, one of the earliest languages with modern sensibilities.

              1. 8

                I would consider newer versions of Ada (especially Ada 2012) to be modern as well. My point was that the author was emphasizing much newer languages, and I was pointing out that this has been solved in an older language for a long time.

            3. 13

              The first example isn’t valid Rust. It has a bunch of incorrect & signs (that must be removed), and a use of a type hint in the if let that I really wish was valid, but isn’t.

              Here’s a version that would actually compile

              use std::num::ParseIntError;

              fn main() -> Result<(), ParseIntError> {
                  // pretend that this was passed in on the command line
                  let my_number_string = String::from("42");
                  // If we just want to bubble up errors
                  let my_number: u8 = my_number_string.parse()?;
                  assert_eq!(my_number, 42);
                  // If we might like to panic!
                  let my_number: u8 = my_number_string.parse().unwrap();
                  assert_eq!(my_number, 42);
                  // If we're a good Rustacean and check for errors before trying to use the data
                  if let Ok(my_number) = my_number_string.parse::<u8>() {
                      assert_eq!(my_number, 42);
                  }
                  Ok(())
              }
              
              1. 10

                It’s not about modernity, but ability to fix things in a timeframe shorter than decades. C has fully embraced ossification as a symbol of stability and compatibility, and C++ has made the C community prejudiced against any new language features.

                In C, there’s no hope of getting even the smallest, most broken things fixed in a reasonable time. It takes 5 years or longer to get something into the standard (if it gets in at all), several years until it’s implemented, then a few more years before laggard Linux distros update their compilers, and then projects will wait a couple more years just in case someone still has an old compiler. If you support MSVC that’s an extra 10 years of waiting until Microsoft puts it on their roadmap, and writes a sub-par implementation. And then you will still run into projects that insist on staying on C89 until the end of time.

                In Rust, for small warts, the time between “this is stupid, we should fix it” and having it fixed in production is 3 months, maybe a year.

                1. 10

                  There came a time when I got tired of running as fast as I could just to stay in place. And it seems to me that with modern languages du jour, that’s all you are doing—running as fast as you can just to stay in place.

                  1. 3

                    Depends what you use — Node.js has a senseless policy of making semver-major ( = breaking) releases every 3 months, which ripples through the JS ecosystem. Swift went through a lot of churn too.

                    OTOH Rust stabilized over 6 years ago, and has stayed very stable. I have a project from 2016 that uses Rust + JS + C. Rust builds perfectly fine, unchanged, with the latest compiler. JS doesn’t build at all (to the absurd level that the oldest Node version compatible with my OS is too new to run the newest versions of packages compatible with my JS). I’ve also had to fix the C build a few times. C may be dead as a language, but clang/gcc do change and occasionally break stuff. C also isn’t well-isolated from the OS, so it keeps tripping up on changes to headers and system dependencies. In this regard Rust is more stable and maintenance-free than C.

                    1. 2

                      Not sure where you are getting your information about Node.js releases. New LTS (even-numbered) major versions are released every year, with minor non-breaking version updates more frequently.

                      1. 1

                        Sorry, every 6 months: v16 2021-04-20, v17 2021-10-19, v18 2022-04-19. This isn’t merely a numbering scheme, they really reflect semver-major breaking changes. I’d prefer Node not to be on v18.x.x, but on v1.18.x, maybe v2.9.x. LTS doesn’t fix Node’s churn problem. It’s merely a time bomb, because you have to upgrade to a newer LTS and face the breakage eventually.

                        I much prefer Rust’s approach of staying on 1.x.x forever, because I can upgrade the compiler every month, keep access to latest-greatest features, and add newly-released dependencies, but still be able to build a 6-year-old project with 6-year-old dependencies and upgrade it on my own schedule.

                      2. 1

                        C may be dead as a language

                        It’s not so much dead as usually not the primary language for applications. It has a bunch of niches in which it’s used as low-level glue, and there are quite a few embedded developers who use it.

                        on changes to headers and system dependencies

                        Programs which break due to header changes are often the result of people not properly including all of the things that they need. This can be a very hard problem to solve because of how header files can be implicitly included, though there are some tools like IWYU which can help mitigate the problem.

                        1. 3

                          I mean dead in terms of future evolution. TFA shows that even such a basic thing as parsing a number has a poor API that could have been fixed at any point in the last 40 years, but wasn’t. Users of C are so used to dealing with such papercuts that often there’s no will to improve anything, so these problems will never be fixed.

                          As for headers, I mean things like Linux distros changing headers in /usr/include. I don’t really have control over this in C: pkg-config or equivalent will give me whatever it has. In Rust/Cargo I have semver ranges and lockfiles that give me compatible dependencies.

                          Same goes for compilers. There are known footguns like -Werror, but also subtler ones due to implementation changes. For example, apple-clang started “validating” header include paths for whatever SDK policy they had. Using GCC instead isn’t easy or reliable either, because Apple keeps adding non-standard C to their headers, even stdio.h. OTOH Rust works as a reliable baseline, and isolates me from these issues.

                      3. 2

                        C++ for all its faults is a decent middle ground in this respect. It’s steadily adding good features, without breaking backward compatibility (except a few edge cases.)

                    2. 8

                      I’d say it’s not the C language per se which is at fault, but C the ecosystem is very bad and basically irreformable. Really what you would want is a real import system (not just invisibly dumping in names because of an include macro) and a decent standard library and some system for strings used throughout the stdlib which is sane. (Null terminated strings aren’t actually strings because they can’t include \0, which is a perfectly ordinary character value. They can be one part of a larger system, but shouldn’t be the default.) The signature for a good atoi isn’t a secret, but it’s easier to make a whole new language than to wait for C to ship with it.

                      1. 12

                        I’d respond that the lack of module systems, the awful strings, null everywhere, etc. are parts of the C language. If you need to change those, you’re changing the language.

                        In contrast, you can just have my_lib.c with a sane atoi() and copy-paste it into every project you make.

                      2. 4

                        So, this is talking about a very narrow niche of languages without a GC. If you accept that a GC is fine for your use case, then you have a wide range of modern (or even old!) languages at your disposal.

                        1. 4

                          The “42b” error checking isn’t even quite right. The article says

                          This will return 0, will not trip errno (remember errno?), and the pointer has moved forward by two bytes.

                          But actually, it will return 42, not 0, so the condition i == 0 && *end != '\0' will not trigger.

                          Also, the i == 0 isn’t necessary for the end == one check, since if end == one, we know strtol returned 0.

                          You could combine all into a single condition as

                          if (errno || end == str || *end)
                          	fprintf(stderr, "invalid input\n");
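
                          A minimal sketch of that combined check wrapped in a function, assuming base 10 and that trailing characters should be rejected. The name is_valid_long is hypothetical, not something from the article or the standard library:

                          ```c
                          #include <errno.h>
                          #include <stdbool.h>
                          #include <stdlib.h>

                          /* Hypothetical helper: true only if the whole of `str` is a base-10
                           * integer that fits in a long. */
                          bool is_valid_long(const char *str)
                          {
                              char *end;

                              errno = 0;                      /* strtol never resets errno itself */
                              (void)strtol(str, &end, 10);

                              /* errno      -> value out of range for long
                               * end == str -> no digits were consumed ("one", "")
                               * *end       -> something follows the number ("42b") */
                              return !(errno || end == str || *end);
                          }
                          ```

                          With this, "42" passes, while "42b", "one" and "" are all rejected. Note that strtol skips leading whitespace, so " 42" would pass too.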
                          
                          1. 2

                            Yup, atoi is indeed terrible, and strtol needs a wrapper! I think strtol is a performance-focused interface, but there should also be a convenient interface. So it would indeed be nice if a strtol wrapper was added to the standard library. I have written that and it is slightly annoying and easy to get wrong.

                            1. 1

                              So what would a convenient interface for strtol() look like? Meanwhile, I’ve used strtol() to parse numbers like ‘32k’ (where the k multiplies the number by 1024).
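
                              A sketch of that kind of suffix parsing, assuming only a lowercase k suffix and non-negative values. The name parse_size is made up for illustration:

                              ```c
                              #include <errno.h>
                              #include <limits.h>
                              #include <stdbool.h>
                              #include <stdlib.h>

                              /* Hypothetical helper: parse "32" or "32k" into *out.
                               * Only a lowercase 'k' suffix (x1024) is handled here. */
                              bool parse_size(const char *str, long *out)
                              {
                                  char *end;

                                  errno = 0;
                                  long n = strtol(str, &end, 10);
                                  if (errno || end == str || n < 0)
                                      return false;           /* overflow, no digits, or negative */

                                  if (*end == 'k') {          /* end points at the first unparsed byte */
                                      if (n > LONG_MAX / 1024)
                                          return false;       /* multiplying would overflow */
                                      n *= 1024;
                                      end++;
                                  }
                                  if (*end != '\0')
                                      return false;           /* anything else after the number */

                                  *out = n;
                                  return true;
                              }
                              ```

                              The end pointer is what makes this cheap: the suffix check picks up exactly where strtol stopped, with no second scan of the digits.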

                              1. 1

                                Yeah exactly, that is why the strtol() interface is defensible – so you don’t do duplicate work when you’re parsing something like 32k. A lot of C interfaces make more sense when you think about them as performance-oriented and state machine oriented, not oriented around composition by functions. (Since C doesn’t have “functions”; that word is too overloaded.)

                                But many programs need a high level wrapper. In C I would just do it as an out param:

                                int result = 0;
                                if (str_to_int(buf, len, &result)) {
                                  printf("%d", result);
                                } else {
                                  printf("conversion error");
                                }
                                

                                Which is very similar to the Zig example. (How does that Zig variable binding work?)

                                You might also want to do switch(str_to_int()) with different error codes for overflow, underflow, etc.
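
                                A rough sketch of what such a wrapper could look like on top of strtol, with the status codes as an enum. The names parse_status and parse_int are made up for illustration, and the 32-byte scratch buffer is an assumption: longer inputs are simply rejected as invalid.

                                ```c
                                #include <errno.h>
                                #include <limits.h>
                                #include <stddef.h>
                                #include <stdlib.h>
                                #include <string.h>

                                /* Hypothetical status codes; the names are illustrative only. */
                                enum parse_status { PARSE_OK, PARSE_INVALID, PARSE_OVERFLOW, PARSE_UNDERFLOW };

                                /* Parse exactly the first `len` bytes of `buf` as a base-10 int. */
                                enum parse_status parse_int(const char *buf, size_t len, int *out)
                                {
                                    char tmp[32];                   /* assumed large enough for any int */
                                    char *end;

                                    if (len == 0 || len >= sizeof tmp)
                                        return PARSE_INVALID;
                                    memcpy(tmp, buf, len);          /* strtol needs a NUL terminator */
                                    tmp[len] = '\0';

                                    errno = 0;
                                    long v = strtol(tmp, &end, 10);
                                    if (end == tmp || end != tmp + len)
                                        return PARSE_INVALID;       /* no digits, or unparsed bytes left */
                                    if ((errno == ERANGE && v == LONG_MAX) || v > INT_MAX)
                                        return PARSE_OVERFLOW;
                                    if ((errno == ERANGE && v == LONG_MIN) || v < INT_MIN)
                                        return PARSE_UNDERFLOW;

                                    *out = (int)v;
                                    return PARSE_OK;
                                }

                                /* Boolean form matching the call shape used above. */
                                int str_to_int(const char *buf, size_t len, int *out)
                                {
                                    return parse_int(buf, len, out) == PARSE_OK;
                                }
                                ```

                                Taking an explicit length means the input doesn’t have to be NUL-terminated: str_to_int("42b", 2, &result) succeeds with 42, while passing a length of 3 fails.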

                            2. 1

                              This article would be better if it expressed those issues in terms of affordance.

                              In C, the standard functions are not good, most of the time. That’s why most projects do not use them, and use wrappers around them. The question is not whether the standard is good enough, but whether it is possible –at all– to get a simple and safe function written in C to do this job.

                              For this particular problem, yes it is possible.

                              The next question, then, is whether the language shepherds you toward creating this safe and simple interface. I think for C the answer is ‘no’, and that’s the issue with this language. Someone with enough experience might think, most of the time, to create this wrapper. But there is nothing in the language pushing toward it, or even better, completely forbidding the use of the unsafe and impractical version. This ‘most of the time’ translates into many projects not using the safer and more practical wrapper.

                              Some people use non-standard preprocessor statements for this job (they will mark the unsafe / impractical standard function as deprecated and force the use of the wrapper in the current project).
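
                              As an illustration, GCC and Clang have #pragma GCC poison, which turns any later use of an identifier into a compile error. A sketch, where safe_atoi is a hypothetical wrapper name and the pragma is a compiler extension rather than standard C:

                              ```c
                              #include <errno.h>
                              #include <limits.h>
                              #include <stdbool.h>
                              #include <stdlib.h>

                              /* Hypothetical project-wide replacement for atoi(). */
                              bool safe_atoi(const char *str, int *out)
                              {
                                  char *end;

                                  errno = 0;
                                  long v = strtol(str, &end, 10);
                                  if (errno || end == str || *end || v < INT_MIN || v > INT_MAX)
                                      return false;
                                  *out = (int)v;
                                  return true;
                              }

                              /* GCC/Clang extension: from this point on, any use of atoi in this
                               * translation unit fails to compile. It has to come after the system
                               * headers that declare the function. */
                              #pragma GCC poison atoi
                              ```

                              In a real project this would probably live in a common header that every source file includes after the system headers.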

                              Any C project starting today should first try to switch to a new language. If that’s not possible, it should start with a very comprehensive boilerplate skeleton that consists of many, many wrappers replacing most of the standard lib.

                              1. 1

                                C++17 has a sane interface called from_chars.

                                It’s templated, so the function (and range check) you get depends on the template argument.

                                It’s still up to the caller to check if there is something after the number. As always, STL functions are optimized for versatility over foolproofing. You can always wrap a versatile interface in a more foolproof one, but trying to do the reverse is called an abstraction inversion.

                                1. 1

                                  Couldn’t the last example be reduced to:

                                  char *one = "one";
                                  char *end;
                                  errno = 0; // remember errno?
                                  long i = strtol(one, &end, 10);
                                  if (errno) {
                                      perror("Error parsing integer from string"); // perror appends ": <message>" itself
                                  } else if(*end) {
                                      fprintf(stderr, "Error: invalid input: %s\n", one);
                                  }
                                  

                                  For a similar expected behavior?

                                  The one == end is kind of redundant with *end != '\0'.

                                  Might not be perfect, but API-wise this is not as smelly as it is presented, in my opinion. Actually, given C’s limitations it kind of makes sense. Any parsing error due to a number that cannot be represented in a long is reported through errno, which is a common way to represent errors in the stdlib. Using the end pointer seems like a reasonable and very flexible way to let the caller decide what should be considered an invalid string. In some cases, someone might not care about trailing characters (so could use end == buf instead of *end).

                                  1. 2

                                    The one == end is kind of redundant with *end != '\0'.

                                    Not necessarily. If the string is "", then we would have one == end, but not *end != '\0'.

                                    1. 1

                                      Good point, I didn’t think about this case!

                                    2. 1

                                      Not quite; you have to check the return value before you can rely on errno.