1. 37

  2. 11

    In a more complete build, SQLite also uses library routines like malloc() and free()

    TIL SQLite can be used without a memory allocator!

    Zero-malloc option. The application can optionally provide SQLite with several buffers of bulk memory at startup and SQLite will then use those provided buffers for all of its memory allocation needs and never call system malloc() or free().
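    A minimal sketch of that option in application code. This is my own example, not from the article; it assumes a SQLite build compiled with -DSQLITE_ENABLE_MEMSYS5 (stock builds reject SQLITE_CONFIG_HEAP at runtime, so the sketch falls back to the default allocator in that case):

    ```c
    #include <stdio.h>
    #include <sqlite3.h>

    /* Bulk memory handed to SQLite up front; the size is arbitrary. */
    static char heap[2 * 1024 * 1024];

    int main(void) {
        /* Must run before any other SQLite API call. With MEMSYS5
           compiled in, SQLite satisfies every allocation from `heap`
           and never calls the system malloc()/free(). The final
           argument is the minimum allocation size (64 bytes here). */
        int rc = sqlite3_config(SQLITE_CONFIG_HEAP, heap,
                                (int)sizeof(heap), 64);
        if (rc != SQLITE_OK) {
            /* This build lacks MEMSYS5; SQLite keeps using malloc(). */
            printf("SQLITE_CONFIG_HEAP unavailable (rc=%d)\n", rc);
        }
        sqlite3 *db;
        rc = sqlite3_open(":memory:", &db);
        printf("open: %s\n", rc == SQLITE_OK ? "ok" : "failed");
        sqlite3_close(db);
        return rc == SQLITE_OK ? 0 : 1;
    }
    ```

    Build with `cc demo.c -lsqlite3`. SQLITE_CONFIG_PAGECACHE is the other bulk-buffer knob the quoted "several buffers" wording alludes to.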

    1. 2

      Interesting paper he cited:

      Bounds for some functions concerning dynamic storage allocation

      https://scholar.google.com/scholar?cluster=15336827953885873575&hl=en&as_sdt=0,5&sciodt=0,5

      Unfortunately behind a paywall :-(

      1. 4
    2. 7

      All of these advantages are really due to C’s popularity though, aren’t they?

      C was the best language around a long time ago, and the popularity it earned then made writing C compilers for new hardware the standard practice. The tremendous amount of work that went into C’s compilers over years and years of improvement, along with the language’s low level, made it quite fast.

      I’m not trying to argue that C wasn’t a good choice; clearly it was, for the reasons listed in the article. But I don’t think there’s much that is inherently “the best” in the C language itself; perhaps if the same colossal amount of work had been put into other languages, they’d be thriving as much as (or more than) C. Its popularity is a self-fulfilling prophecy.

      1. 9

        C was the best language

        It wasn’t. The early accounts show it was what they could get to compile on a PDP-11, based on a chopped-down ALGOL that was what Richards et al could compile on an EDSAC. The only goals were (a) it compiles on terrible hardware and (b) it runs efficiently on terrible hardware. Terrible by today’s standards, that is. The ALGOL-derived languages, including smaller ones like Wirth’s, tried to balance the requirements of safety, maintainability, compile speed, and runtime speed. Most projects doing systems programming today benefit from each of those as well. Hence, a design ignoring one or more of them is objectively worse for most projects.

        So, both BCPL and C were great languages for hacking on crap hardware when little to nothing else was available. From there, like you said, its popularity from being used in a killer app (UNIX) did much of the rest. The momentum behind it created more compilers, compiler optimizations, developer tools, third-party libraries, and people who know how to program in it. That ecosystem makes it a good choice in a lot of systems programming projects. That became self-reinforcing, too.

        The language itself is either poorly or barely designed for the current use-cases, despite doing the original one really well. Anything that does the current use-cases better is possibly a better choice. That’s exponentially more true if it’s C-compatible in data types, calling conventions, and/or ability to compile to C. I recommend those last points for all C competitors, given that the ecosystem is an army of freight trains that aren’t stopping or turning around any time soon. There’s also still a niche of people using other systems languages like Free Pascal, Ada/SPARK, and now Rust. Thanks to the Rust work, that niche is getting revitalized. I stand by my recommendation of being C-compatible, though, since backward compatibility and incremental change are both long proven to be the best ways to transition, whether we like it or not. A transition that may take decades for legacy systems, since the build-outs took decades.

        1. 5

          [C] was what they could get to compile on a PDP-11 based on a chopped down ALGOL that was what Richards et al could compile on an EDSAC. The only goals were (a) it compiles on terrible hardware and (b) it runs efficiently on terrible hardware.

          That’s not the impression I get from dmr’s The Development of the C Language. It’s not even an impression I’d get from the presentation you’re referring to.

          The impression I get from both of those sources is that C is first and foremost a programming language that is relatively easy to implement and then relatively straightforward to use.

          That’s exponentially more true if it’s C-compatible in data types, calling conventions, and/or ability to compile to C. I recommend last points for all C competitors given the ecosystem being an army of freight trains that aren’t stopping or turning around any time soon.

          I think that the ability to compile from one language to C is in many cases either impractical or pointless. It is only potentially useful when an operating system comes with a C compiler but no Rust toolchain and you’re a user who needs to run a Rust program but cannot install a Rust implementation - but then it can be difficult to debug the program unless you’re willing to invest your time in analyzing and improving the generated C source code.

          1. 3

            It was relatively easy to implement for that time period, if they didn’t know about LISP notation. It’s straightforward to use if, as in their use cases, safety and maintainability of large programs don’t matter much. If they do, it’s more straightforward to write in a language that knocks defects out by design. Wirth’s Modula-2 addressed both needs in a language that was even easier to implement. He iterated from there, baking in things like concurrency and OOP without C++’s mess.

            “I think that the ability to compile from one language to C is in many cases either impractical or pointless.”

            Here are a few examples of where it could be beneficial. In each, we’d do what I stated above to seamlessly use available C libraries or to compile with C’s many optimizing compilers.

            1. Code in Modula-2 or another simple, safe-by-default programming language (i.e. the Oberons) that extracts to C. These languages are straightforward to use, will have fewer defects, and handle more of programming in the large.

            2. The LISP machines and Smalltalk environments allowed coding a system incrementally against a running, live image. Every step can be immediately checked. Instead of crashes, mistakes take you to where the problem is. The system can be changed while it’s running: redefining a class will redefine all instances in the system. In LISPs, macros make it easy to express constructs that take more code in C. Studies of both Common LISP and Smalltalk vs C had the former running circles around the latter in productivity. Lower defects, too. A C embedded in a CL environment would be better than C itself. PreScheme was a systems programming language that kept the compile-time-only features of LISPs to get some language-level benefits. It compiled to C.

            3. Code in ML or a dialect to get its benefits. ML itself is strongly typed, with type inference that lends itself well to functional programming. Since the language is clean, CompSci folks regularly extend it with dialects that do things like concurrency models or covert-channel analysis. Major provers can all extract verified code directly to ML. CakeML will compile that to assembly for a verified, reference implementation. If the reference implementation is too slow, then an ML-to-C compiler like the one Tolmach et al built can possibly speed it up a lot. Exhaustive, equivalence checking on a per-function or per-module basis between the reference and C versions shows the latter is at least as correct.

            4. Similar to 1, code in Ada 2012 + SPARK 2014. Ada is a complex derivative of ALGOL that’s systematically designed so that about every feature is safe by default. You can turn safety off in selective ways when necessary. SPARK lets you prove the absence of common errors such as divide-by-zero or integer overflows in sample code if you put some work into it. Mostly automated. Unlike C with Frama-C, the SPARK language was specifically designed to make verification easy. It can compile to C for portability reasons. Throw Rust in with 4, since it would be used for similar reasons to Ada, with a better model for temporal safety and concurrency.

            5. Haskell is a pure, functional language that gets those benefits. It also has many features in its type system for knocking out whole classes of problems while maintaining relatively concise programs. It’s like a middle point between a safe language and something that would take lots of mathematical skill. It can do DSLs. Both C-like and hardware languages have been embedded in Haskell for its benefits. They extract to the other language once the program is finished.

            6. Any DSL-oriented language, since they boost productivity and/or safety at certain things. They might reduce to C form for execution. They will also have things they can’t handle, which necessitates an FFI.

            7. For distributed algorithms, tools such as Event-B or TLA+ can precisely specify them, followed by checks or proofs of key properties. Users of such tools often prefer a code generator that reduces the odds they’ll make a mistake on something tedious done by hand. The primitives themselves will usually be coded by hand, though. They can use something like 1-6, though, to reduce defects. Each tool does what it’s good at.

            So, we have a series of languages that are a mix of easier to compile, easier to transform, easier to make safe, easier to maintain, and so on. Everything about C has been beaten by better designs, except for the piles of work put into its compilers. Even those, given the kinds of bugs we see, show they’d have benefited from a gradual port to a better systems language at some point. The main reasons people stick with C, language reasons rather than social ones, are to use the tools in its ecosystem. If includes or deploys carry no penalties, then one can use a language that is better across the board within that ecosystem. There’s no reason to continue using C past familiarity if alternatives have all its benefits with few to none of its detriments. Even familiarity didn’t make it more productive in the fairest study comparing it to Ada.

        2. 4

          Yes, in other words, programming languages have strong network effects. To interact with software written in C, the path of least resistance is to also write your code in C.

          So if your goal is to reuse code, then every piece of software written in C makes the C ecosystem more valuable. And since sqlite is a library, it wants other people to reuse it, even if it doesn’t reuse anything besides libc itself.

          This effect is especially strong in the case of kernels and interpreters/VMs. C has pretty much a monopoly on kernels: Linux, BSD, and Windows. It also has a near monopoly on interpreters and VMs: Python, Ruby, Perl, PHP, R, bash, Lua, Tcl, the JVM, etc. (An exception would be V8, which has a somewhat weird C++ interface.)

        3. 5

          This reminded me of the curl is C blogpost, in which the author makes similar arguments around compatibility and low-dependency.

          1. 3

            Read the title of that “CURL IS C” like “THIS IS SPARTA”.

          2. 5

            I can’t remember who said it, but “C is the universal assembly language.” If you want your library usable in as many environments as possible, C is the way to go. Essentially every higher-level language has a built-in FFI with C.
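            A tiny illustration of why that works: a C function with default linkage and plain types is directly loadable by nearly any FFI. The function and file names here are hypothetical, not from the thread:

            ```c
            #include <assert.h>
            #include <stdio.h>

            /* Compiled as a shared library (cc -shared -fPIC add.c -o libadd.so),
               this symbol is exactly what Python's ctypes, Ruby's ffi gem,
               LuaJIT's FFI, etc. look up, because they all target the
               platform's C ABI. */
            int add(int a, int b) { return a + b; }

            int main(void) {
                assert(add(2, 3) == 5);
                printf("%d\n", add(2, 3)); /* prints 5 */
                return 0;
            }
            ```

            From Python, for instance, `ctypes.CDLL("./libadd.so").add(2, 3)` returns 5 with no wrapper code, because the calling convention and the layout of `int` are fixed by the platform’s C ABI.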

            1. 10

              That has more to do with Unix becoming universal than with C the language itself. It’s like the idea that “C has no runtime” – it’s more accurate to say C’s runtime is included in the operating system runtime for most/all modern operating systems.

            2. 10

              All good reasons, IMO. But it fails to mention any of the well-known problems with C, which are behind many vulnerabilities in SQLite. So it reads like they’re just trying to justify their choice rather than honestly assessing C. I don’t know what the intention or purpose of this page is, though. And to be fair, I would probably have made the same choice in 2000.

              1. 40

                I don’t know what the intention or purpose of this page is

                Probably to stop people asking why it’s not written in Rust.

                1. 14

                  Since it mentions Java but not Go or Rust, I suspect it’s an older page.

                  1. 25

                    That’s the beauty of C, it refutes all future languages without having to be recompiled.

                    1. 1

                      It mentions Swift, too.

                        1. 1

                          Yeah, looking at the parent page, it appears it showed up sometime in 2017. I was misled by the mention of Java as an alternative, because I think it’s rather obviously unsuited for this job.

                    2. 4

                      I tried finding a list of vulnerabilities in SQLite, and only this page gave current info. Now, I’m unfamiliar with CVE stats, so I don’t know if 15 CVEs in 8 years is more than average for a project with the codebase and use of SQLite.

                      1. 7

                        […] I don’t know if 15 CVEs in 8 years is more than average for a project with the codebase and use of SQLite.

                        I don’t know either! I looked at the same page before writing my comment, and found plenty of things that don’t happen in memory-safe languages. There were fewer entries than I expected, but also some of them have descriptions like “Multiple buffer overflows […],” so the number of severe bugs seems to be higher than the number of CVEs.

                        1. 7

                          The 4 in 2009 appear to have been in some web app that used SQLite, not SQLite itself.

                          1. 4

                            The security community generally considers CVE counts a bad mechanism to argue about the security of a project, for the following reasons:

                            Security research (and thus vulnerability discovery) is driven by incentives like popularity, impact, and monetary gain. This makes some software more attractive to attack, which increases the number of bugs discovered, regardless of the security properties of the codebase. It’s also hard to find another project to compare with.

                            (But if I were to join this game, I’d say 15 in 8 years is not a lot ;))

                          2. 1

                            15 vulnerabilities of various levels in the past 10 years.

                            https://www.cvedetails.com/vendor/9237/Sqlite.html

                            How does that compare to other products or even similar complicated libraries?

                          3. 7

                            It would be a fine page if titled “why we picked C to base SQLite on in 2000”.

                            1. 11

                              If you mean it would be written in Rust in 2018, nope. Most platforms of interest to SQLite are still tier-2/tier-3 support by Rust at best.

                              1. 3

                                I believe it. But what platforms are at issue? Edit: yikes, the tier 1 list is way more limited than I thought. Never mind this question.

                                Also, for projects starting in 2018, the question isn’t what Rust supports today, but what platforms you’re willing to bet Rust will support in 5-10 years. Hopefully that list is bigger.

                                1. 8

                                  We’ve been talking about revamping the tier system, because it doesn’t do a great job of recognizing actual support. For example, ARM is a Tier 1 platform for Firefox, so stuff gets checked out and handled quite a bit, but given the current rules of how we classify support, it appears to be a lot less than it actually is.

                                  in 5-10 years. Hopefully that list is bigger.

                                  We recently put together a working group to work on ease of porting Rust to other platforms, so yeah, we expect it to grow. The hardest part is CI, honestly. Getting it going is one thing, but having someone who’s willing to commit to fixing breakage in a timely manner without bringing all development to a halt is tough for smaller/older platforms.

                              2. 2

                                Depends on whether they had access to Pascal, Ada, Scheme, or Smalltalk. ;)