1. 1

    Here is a talk with a lot more motivation - https://www.youtube.com/watch?v=bSNda9EzNOI

    1. 20

      SQL is a narrow waist. Almost all databases support SQL. Almost all ORMs and data analysis tools emit SQL. If you make a new database that doesn’t support SQL it’s very hard to get adoption because users would have to abandon all their existing tools. If you make a new data analysis tool that doesn’t emit SQL it’s very hard to get adoption because users won’t be able to use their existing databases.

      an imperative DSL for writing queries would help improve the situation

      This is essentially what Spark and Flink are. Both of them have also recently added a SQL layer on top.

      The upsides of having a declarative layer are that the database does a lot of the optimization work for you, in many cases can do a better job than a human while still letting you write readable code, and when the data changes it can reoptimize without you having to rewrite all your code.

      This is pretty similar to the situation with assembly and structured languages. They will often generate much worse assembly than you would write by hand, but they allow much more productivity and make it easy to port software between different ISAs. Occasionally you might want to spot-check a hot loop or maybe even use a little inline assembly.

      SQL has a lot of deficiencies as a language, but whatever replaces it will probably have the same separation between query and plan. The only thing I can see changing is better hinting and debugging tools.

      (It would be nice to be able to submit plans directly, but that would also constrain the database to never changing the plan interface and that has been frequently necessary over the last few decades to take advantage of changes in hardware. There is some research work on the subject but it doesn’t feel like a solved problem yet.)

      1. 3

        Adding to this, most traditional RDBMSs had a more imperative API via cursors. I don’t have a link ready to back up my position, but every opinion I’ve heard throughout my 14-year career is to avoid cursors whenever possible. I’ve replaced a lot of cursor usage with set operations (e.g. INSERT...SELECT) for tremendous speed gains (and readability, imo).

        One reason for the performance of standard set operations is that a lot of thought has gone into them. The engine makes decisions with a lot of knowledge about data structures, storage hardware, and profile of current data (e.g. cardinality of a join match). OTOH in Spark, it isn’t uncommon for me to reach for RDD operations because the SQL query engine isn’t as developed (this is changing, of course).
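To make the cursor-vs-set-operation contrast concrete, here is a small sketch using Python’s sqlite3 (the table and column names are made up for illustration): the row-at-a-time loop mirrors what a cursor does, while the single INSERT...SELECT hands the whole transformation to the engine to plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, val INTEGER)")
conn.execute("CREATE TABLE dst (id INTEGER, doubled INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(i, i) for i in range(1000)])

# Cursor-style: fetch each row, transform it in the client, insert one at a time.
for row_id, val in conn.execute("SELECT id, val FROM src"):
    conn.execute("INSERT INTO dst VALUES (?, ?)", (row_id, val * 2))
conn.execute("DELETE FROM dst")  # reset so both approaches produce the same table

# Set-based: one INSERT...SELECT lets the engine plan the whole operation at once.
conn.execute("INSERT INTO dst SELECT id, val * 2 FROM src")
print(conn.execute("SELECT COUNT(*) FROM dst").fetchone()[0])  # 1000
```

The set-based form also gives the optimizer the cardinality and index information the comment mentions, which the client-side loop throws away.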

      1. 3

        The author’s blog is full of algorithms fun.

        1. 1

          Wow you’re right, it’s a goldmine!

          1. 1

            Yeah he does some really nice LISP work. :)

            1. 2

              Inspired by nelhage’s article, I did a writeup here: https://pythonspeed.com/articles/consistent-benchmarking-in-ci/

              I have started using my own manually created Cachegrind benchmarks in an actual project, and it’s pretty good. It took a bunch of work to get consistency (e.g. consistent hash seeds, using Conda Python so it’s the exact same code), and even then for some reason I’m getting a 0.3% difference between running in GitHub Actions and running on my computer (different Rust version? Minor Valgrind version differences? Different Linux or glibc versions?). But this is different hardware, so that small a difference is actually quite good.
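For anyone curious what consuming Cachegrind results looks like, here is a rough sketch of pulling the instruction count out of a cachegrind.out file. The helper name and the trimmed sample are mine, but the `events:` and `summary:` lines follow Cachegrind’s documented output format.

```python
def total_instructions(cachegrind_out: str) -> int:
    """Extract the Ir (instructions executed) total from cachegrind output.

    Assumes the standard cachegrind.out layout: an "events:" header naming
    the counter columns and a final "summary:" line with the totals.
    """
    events, summary = [], []
    for line in cachegrind_out.splitlines():
        if line.startswith("events:"):
            events = line.split()[1:]
        elif line.startswith("summary:"):
            summary = [int(n) for n in line.split()[1:]]
    return summary[events.index("Ir")]

# A trimmed-down, hypothetical cachegrind.out:
sample = """events: Ir Dr Dw
summary: 1500000 400000 200000"""
print(total_instructions(sample))  # 1500000
```

Comparing these instruction counts between runs is what makes the benchmark stable across machines, since wall-clock time is not.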

              1. 8

                All of the differential dataflow papers really suffer from the academic presentation which forces the author to focus on absolute novelty, rather than being allowed to talk about quality engineering. I think https://github.com/frankmcsherry/blog/blob/master/posts/2017-09-05.md motivates the problem much better.

                1. 1

                  FWIW I looked over some of the recent blog posts, and they were talking about the SQL interface.

                  That is a lot more interesting to me! It looks like I might be able to stream my web logs into a tabular format and then run ANSI SQL over them that is continuously updated? That would be pretty interesting.

                  The post made it sound like the main way to use it was via a Rust library. I’m not really interested in writing Rust for analytics. Go replaced Sawzall several years ago and I wasn’t really a fan of that either. Analytics code can be very tedious so you want a high level language.

                  Although I do have some “log spam” use cases and I wonder if SQL can handle that …

                  1. 2

                    It looks like I might be able to stream my web logs into a tabular format and then run ANSI SQL over them that is continuously updated?

                    Yep, that’s exactly how it works. The SQL interface is a BSL-licensed thing being built on top of DD by a company founded by Frank McSherry and Arjun Narayan about 2 years ago - https://materialize.com/. I think it has a bright future, but I’m also curious about other niches where DD itself seems like it should be useful but currently doesn’t have any uptake.

                    I updated the post with the feedback I got, fwiw. Notably people who wanted to just crunch data were more into the kind of all-in-one-product that materialize is offering, but a smaller group wanted to use DD as a backend for more specialized tools but bounced off the api/docs.

                2. 2

                  This sounds like incremental computing, also called self-adjusting computing. Which is also a bit more general than the incremental computation over streams idea presented here. And it has applications for UI programming as well.
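As a toy illustration of the incremental idea (this is not differential dataflow’s actual API): maintain a grouped count under insertions and retractions, doing work proportional to each update rather than recomputing from scratch.

```python
from collections import Counter

class IncrementalCount:
    """Maintain COUNT(*) GROUP BY key over a stream of (key, delta) updates.

    A minimal sketch of incremental/self-adjusting computation: each update
    touches only the affected group, instead of rescanning all the data.
    """

    def __init__(self):
        self.counts = Counter()

    def update(self, key, delta):
        self.counts[key] += delta
        if self.counts[key] == 0:
            del self.counts[key]  # drop groups that net out to zero

    def result(self):
        return dict(self.counts)

view = IncrementalCount()
view.update("/index.html", +1)
view.update("/index.html", +1)
view.update("/about.html", +1)
view.update("/about.html", -1)   # retraction: the row was deleted upstream
print(view.result())  # {'/index.html': 2}
```

The negative delta is the key trick: supporting retractions is what lets a view stay correct as upstream data changes, not just as it grows.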

                  1. 14

                    I fully intend to reread this, because it seems interesting, but I have no idea what Mitchell’s thesis is, or if he has a thesis. It mostly makes sense to me at the level of individual paragraphs, but I don’t know how they fit together.

                    In particular, the entire first half of the essay had me asking “why isn’t this just the difference between ‘free and open’?” Happy to see he anticipated the question, but I didn’t understand the response.

                    1. 11

                      Sorry if that didn’t come across clearly enough. I write these things first and foremost for myself. Occasionally other people find them worthwhile.

                      If I were going to try to mush the thesis into a comment box, it would be that the current go-to argument against new, strong copyleft licenses like SSPL (that open source licenses can’t discriminate against closed software development) flouts the history of free and open source software. The whole crux of that movement was learning to tell the difference between “open” and “closed” and coming down strongly on the side of open.

                      The reason the argument plays these days is that a bunch of closed software companies have taken over leadership of open source institutions. Twenty years ago, they’d’ve all counted as “evil” online, and the idea that open source “wins” if there’s open source in proprietary software would’ve been angrily and soundly dismissed.

                      I have a mechanism for summarizing my own blog posts, which I never remember to use. I’ll add it now. Thanks.

                      1. 3

                        I think the hardest part is that I react to that and see “free, not open”. Looking at the current state of open source, I think a FSF person has every right to say “I told you so!”

                        But clearly you think there’s more going on here. I see below that you think that was a personality driven fight, but is that the only way it misses what you think is going on?

                        Edit: P.S. Sorry for “or if he has a thesis”. I think that’s sloppy writing on my part. What I meant is that I didn’t know how much you wanted to argue for a very specific point, vs. framing the situation and making observations as a sort of first draft of how to think about it.

                        1. 6

                          I mention in my post, or tried to mention, that I think the archetype of the free-versus-open schism invites us to write all this off as “same old story”, instead of seeing all that’s completely new and different.

                          Superficially, the AWS-Elastic fight today seems a lot like the fight that kept AGPLv3 from becoming GPLv3. But we’re talking about a fight between companies, not, say, FSF versus Google. Even when foundations are weighing in, as OSI did against Elastic, we’re talking about foundations that have been propped up, staffed, and influenced by commercial firms and their people for decades. Among those companies, the ones with the most influence are first and foremost makers of proprietary products and services, plus huge integrator-consultancies, not movement stalwarts, as in the early 2000s.

                          Personalities were a big part of the FSF-OSI schism. But there were real policy issues there, too. I don’t mean to write it off as a big Tim-ESR-RMS-Linus pissing match, though it was that, too.

                      2. 4

                        I think the thesis is that the phrase “open source” has gone from being about freedom of end-users to read/copy/modify software to freedom of corporations to include software in their proprietary stacks which end-users are not permitted to read/copy/modify. And that the complaints that Elastic’s new license is not “open” are just a new move in this long battle over whose freedom open source is about.

                        1. 2

                          I know what you mean by “I don’t know how they fit together”. He has a fairly unstructured writing style. You kind of have to go along for the ride of all his thoughts, but I sometimes enjoy that more than the “5 things you need to know about open source licensing” sort of blogging.

                        1. 24

                          Data tech is a massive and intertwined ecosystem with a lot of money riding on it. It’s not just about compute or APIs, that’s a fairly small part.

                          • What file formats does it support?
                          • Does it run against S3/Azure/etc.?
                          • How do I onboard my existing data lake?
                          • How does it handle real-time vs batch?
                          • Does it have some form of transactions?
                          • Do I have to operate it myself or is there a Databricks-like option?
                          • How do I integrate with data visualization systems like Tableau? (SQL via ODBC is the normal answer to this, which is why it’s so critical)
                          • What statistical tools are at my disposal? (Give me an R or Python interface)
                          • Can I do image processing? Video? Audio? Tensors?
                          • What about machine learning? Does the compute system aid me in distributed model training?

                          I could keep going. Giving it a JavaScript interface isn’t even leaning into the right community. It’s a neat idea, for sure, but there are mountains of other things a data tech needs to provide just to be even remotely viable.

                          1. 6

                            Yeah this is kinda what I was going to write… I worked with “big data” from ~2009 to 2016. The storage systems, storage formats, computation frameworks, and the cluster manager / cloud itself are all tightly coupled.

                            You can’t buy into a new computation technology without it affecting a whole lot of things elsewhere in the stack.

                            It is probably important to mention my experience was at Google, which is a somewhat unique environment, but I think the “lock in” / ecosystem / framework problems are similar elsewhere. Also, I would bet that even at medium or small companies, an individual engineer can’t just “start using” something like differential dataflow. It’s a decision that would seem to involve an entire team.

                            Ironically that is part of the reason I am working on https://www.oilshell.org/ – often the least common denominator between incompatible job schedulers or data formats is a shell script!

                            Similarly, I suspect Rust would be a barrier in some places. Google uses C++ and the JVM for big data, and it seems like most companies use the JVM ecosystem (Spark and Hadoop).

                            Data tech also can’t be done without operators / SREs, and they (rightly) tend to be more conservative about new tech than engineers. It’s not like downloading something and trying it out on your laptop.

                            Another problem is probably a lack of understanding of how inefficient big data systems can be. I frequently refer to McSherry’s COST paper, but I don’t think most people/organizations care… Somehow they don’t get the difference between 4 hours and 4 minutes, or 100 machines and 10 machines. If people are imagining that real data systems are “optimized” in any sense, they’re in for a rude awakening :)

                            1. 3

                              I believe andy is referring to this paper, if anyone else is curious.

                              (And if you weren’t let me know and I’ll read that one instead. :] )

                              1. 3

                                Yup that’s it. The key phrases are “parallelizing your overhead”, and the quote “You can have a second computer once you’ve shown you know how to use the first one.” :)


                                The details of the paper are about graph processing frameworks, which most people probably won’t relate to. But it applies to big data in general. It’s similar to experiences like this:


                                I’ve had similar experiences… 32 or 64 cores is a lot, and one good way to use them all is with a shell script. You run into fewer “parallelizing your overhead” problems. The usual suspects are (1) copying code to many machines (containers or huge statically linked binaries), (2) scheduler delay, and (3) getting data to many machines. You can do A LOT of work on one machine in the time it takes a typical cluster to say “hello” on 1000 machines…
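A minimal sketch of that single-machine pattern, here in Python rather than shell (the log chunks are hypothetical): fan the work out over local workers with a plain map, no cluster scheduler involved.

```python
from concurrent.futures import ThreadPoolExecutor

def count_errors(chunk):
    # Stand-in for the real per-chunk work, e.g. scanning one log file.
    return sum(1 for line in chunk if "ERROR" in line)

# Hypothetical pre-split log chunks; a rough shell equivalent would be
# something like `ls logs/* | xargs -P "$(nproc)" grep -c ERROR`.
chunks = [["ERROR a", "ok"], ["ok", "ok"], ["ERROR b", "ERROR c"]]
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(count_errors, chunks))
print(total)  # 3
```

There is no code distribution, no scheduler delay, and no data shipping here, which is exactly the “parallelizing your overhead” cost the COST paper is pointing at.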

                              2. 1

                                That’s a compelling explanation. If differential dataflow is an improvement on only one component, perhaps that means that we’ll see those ideas in production once the next generation of big systems replaces the old?

                                1. 2

                                  I think if the ideas are good, we’ll see them in production at some point or another… But sometimes it takes a few decades, like algebraic data types or garbage collection… I do think this kind of big data framework (a computation model) is a little bit more like a programming language than it is a “product” like AWS S3 or Lambda.

                                  That is, it’s hard to sell programming languages, and it’s hard to teach people how to use them!

                                  I feel like the post is missing a bunch of information: like what kinds of companies or people would you expect to use differential dataflow but are not? I am interested in new computation models, and I’ve heard of it, but I filed it in the category of “things I don’t need because I don’t work on big data anymore” or “things I can’t use unless the company I work for uses it” …

                              3. 2

                                The above is a great response, so to elaborate on one bit:

                                What statistical tools are at my disposal? (Give me an R or Python interface)

                                It’s important for engineers to be aware of how many non-engineers produce important constituent parts of the data ecosystem. When a new paper comes out with code, that code is likely to be in Python or R (and occasionally Julia, or so I’m hearing).

                                One of the challenges behind using other great data science languages (e.g. Scala) is that there may be an ongoing and semi-permanent translation overhead for those things.

                                1. 1

                                  all of the above + does it support tight security and data governance?

                                1. 3

                                  I still wonder why they don’t provide precompiled binaries. That would make it quite a bit easier to give it a try.

                                  1. 2

                                    This is just a matter of time and resources. Pijul is moving fast, and building it for various platforms would take a lot of time. We do have a CI system, but it’s fairly basic.

                                    1. 1

                                      True, but realistically, support for Linux, Windows, and macOS on x86-64 covers the vast majority of development setups.

                                      1. 1

                                        Sure, but:

                                        • Linux binaries require support for all common distributions (Ubuntu, Debian, Arch, Fedora, Suse), which means a different VM each time.
                                        • MacOS requires an Apple computer; there are no real options in the cloud that I’m aware of. Now that Apple is changing their architecture, this actually requires two expensive Apple machines.
                                        • Windows requires a heavy Windows VM, started fresh every time to test static linking.
                                        • Docker also has to be added to the list of common “distributions”.

                                        Additionally, I use none of these systems (I’m on NixOS on all my machines).

                                        1. 1

                                          Linux binaries can be linked statically with musl, that’s completely foolproof. You can also build for the oldest glibc you want to support, which works well enough in practice.

                                          A number of hosted CI services provide MacOS build VMs. I use Travis to build MacOS binaries of some of my projects, since I haven’t owned an Apple machine for a while. Apple on ARM supports x86-64 emulation, so special binaries for it are “nice to have”, not “must have”.

                                          Same with Windows, you can get it from GitHub Actions, Circle CI, and some other cloud CI services.

                                          1. 1

                                            Yes, integration with other CI systems is planned, but not there yet.

                                          2. 1

                                            RE Linux binaries: https://guix.gnu.org/manual/en/guix.html#Invoking-guix-pack

                                            Doesn’t have to be too painful?

                                            1. 1

                                              Interesting. I wonder if NixOS has that too.

                                              1. 1

                                                Nix has dockerTools, which I’ve had some success with before, managing to build smaller images than our existing docker-compose setup.


                                                1. 1

                                                  This is very cool, I wasn’t aware of this. I wonder how that could work to build a better CI system.

                                            2. 1

                                              I thought Rust supported cross-compilation quite well?

                                              1. 1

                                                I never got it to work.

                                                1. 1

                                                  This was my experience too. In particular, xargo had some blocking issues on nixos last time I tried.

                                      1. 4

                                        I love people documenting their own personal infrastructure/custom tooling (for any purpose). Please share other examples if you know of any :)

                                          1. 2

                                            Awesome, thanks.

                                            I love that there’s description of real-world objects as part of the docs:

                                            • Blue is for the main internet connection
                                            • White is for internal connections
                                            1. 2

                                              Is that an AvE sticker?

                                              1. 1


                                            2. 3

                                              I wrote about what I was hosting last year: https://chown.me/blog/infrastructure-2019 I plan to publish a new article with the changes from 2020 in a few weeks. :)

                                              1. 1

                                                Sweet! This is exactly what I’m after.

                                                Give me a shout when the new one gets posted.

                                              2. 3

                                                Here is mine - all in one file, single command deploy/rollback:


                                                1. 1

                                                  Oh cool! A few things to dig into :)

                                                  1. 1

                                                    Oh this makes me miss using NixOS on a server. Nice configuration.

                                                1. 11

                                                  I love seeing positive stories like this (especially in tech, here on Lobsters). Any similar small/simple/curated websites in this regard?

                                                  1. 13

                                                    I have this list sitting in some old notes:

                                                    bingo card creator

                                                    sidekiq - https://www.indiehackers.com/interview/how-charging-money-for-pro-features-allowed-me-quit-my-job-6e71309457

                                                    image compression for games - https://twitter.com/sehurlburt/

                                                    complice - https://www.indiehackers.com/product/complice

                                                    dwarf fortress
                                                    1. 1

                                                      Thanks, appreciate it. Can you reformat your comment to be inline? It’s taking up the whole FF window on my MacBook Retina.

                                                      sublime, tarsnap, bingo card creator, jepsen, sidekiq, image compression for games, complice, insomnia, browserless, pinboard, instapaper, newsblur, duckduckgo, minecraft, dwarf fortress, metafilter, backblaze, prgmr, lavabit, growstuff, tabnine, fathom, ravelry, sqlite

                                                      1. 1

                                                        I can’t edit it, sorry :S

                                                  1. 3

                                                    Putting on my Zig hat here, the thing that jumps out to me is this:

                                                    assert(meta.deepEqual(expected.items, actual.items));

                                                    If we had a better facility for testing deep equality, there would be no need for a debugger at all! We have this for string comparison. With a line like this:

                                                    testing.assertStringsEqual(a, b);

                                                    The output for example values of a and b looks like:

                                                    Test [1/1] test "example"... 
                                                    ====== expected this output: =========
                                                    ======== instead found this: =========
                                                    First difference occurs on line 2:
                                                    test failure

                                                    I know the point of the blog post is trying to use this as an example to examine the state of linux debuggers, but this was my personal takeaway :)
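For comparison, here is roughly what such a diffing assert could look like in Python (the helper is hypothetical, loosely modeled on the Zig output above): report the first differing line instead of dumping both whole strings.

```python
def assert_strings_equal(expected: str, actual: str) -> None:
    """Toy diffing assert: on mismatch, point at the first line where the
    two strings diverge rather than printing both values in full."""
    if expected == actual:
        return
    exp_lines, act_lines = expected.splitlines(), actual.splitlines()
    for i, (e, a) in enumerate(zip(exp_lines, act_lines), start=1):
        if e != a:
            raise AssertionError(
                f"first difference on line {i}:\n"
                f"  expected: {e!r}\n"
                f"  found:    {a!r}"
            )
    raise AssertionError("strings are equal up to the length of the shorter one")

assert_strings_equal("a\nb", "a\nb")          # passes silently
try:
    assert_strings_equal("a\nb\nc", "a\nX\nc")
except AssertionError as err:
    print(err)  # first difference on line 2: ...
```

For very large inputs you would also want to truncate the reported lines, which is the long-string problem discussed below in the thread about ~1GB comparisons.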

                                                    1. 2

                                                      This is great and I would love to have more of it. I feel like I’ve ended up implementing test diffing in every language I’ve used.

                                                      Does it always print the whole string? A few tests up I’m comparing strings, but they’re ~1GB each :)

                                                      I know the point of the blog post is trying to use this as an example to examine the state of linux debuggers,

                                                      The point was actually just to get people to recommend me a debugger that works. I just made a very poor choice of title.

                                                      1. 2

                                                        Just checked - it has no detection of long strings! That would be a nice contributor-friendly improvement to make :-)

                                                        1. 1

                                                          Touche :)

                                                      2. 1

                                                        I prefer the approach of using the same infrastructure you have for debug-printing values, then just using your normal string-diffing equality comparison.

                                                        The nice thing about this is you can build an expect-test workflow on top of it where failing tests produce a diff which you can use to update your tests to pass.

                                                      1. 3

                                                        Hm how was that binary built? Is it a Zig binary?

                                                         I haven’t had problems building a C++ binary myself with shell scripts, and then debugging it with the gdb console, Eclipse, or CLion – on Ubuntu 16.04. I can inspect the stack, set breakpoints, and it jumps to the right source locations, etc.

                                                        I think you basically just have to build it with -g, and if you strip it, you have to set the ELF header metadata to point to the debug symbols.

                                                        I’m impressed with how many things you tried, but I think there are a whole bunch of different things going on here, and they could probably be teased out a little more.

                                                        I think if you create a simple C or C++ test binary you might get different results (though it’s not clear exactly what you’re trying to do from the blog post).

                                                        1. 2

                                                          Hm how was that binary built? Is it a Zig binary?

                                                          It’s a zig binary built in debug mode.

                                                          You can see from the transcript at the beginning that debugging zig stuff in gdb itself works fine, so I was hoping that the various gdb frontends would also just work.

                                                          Notably, each thing I tried in that session worked in at least one of the frontends. So if the bugs are zig-related, they’re also different zig-related bugs per frontend.

                                                          I think if you create a simple C or C++ test binary you might get different results

                                                          It’s possible. It would be a useful thing to do for science to track down where exactly the problems are in each frontend. But I don’t much feel like doing that at the moment, I just want to find a debugger that works on my project.

                                                          I got a ton of comments and emails with suggestions, so I’ll likely have a followup post in a few days.

                                                          1. 4

                                                            Fair enough, but I don’t think the title is really accurate. I would guess a lot of the suggestions you got glossed over the fact that you’re running Zig and Nix. They are technically mentioned in the post, but it’s not clear.

                                                            I would say debugging C/C++ on Ubuntu or Debian is more representative of the state of debuggers on Linux, and I’ve never had a problem getting source locations to work with ANY debugger. I tried at least 5.

                                                            When I hear “Zig and Nix”, I think “probably does not work”. Nix kinda scares me because it completely changes the file system layout (with checksums, etc.). So I would strongly suspect anything to do with finding stuff via paths to be broken unless the package author fixed it, and debug symbols rely on file system paths.

                                                            The package author of GDB may have fixed it, but not the other ones. I like the idea of Nix, but I’m not that interested in using it due to what I’ve seen of its internals [1]

                                                            Also, DWARF is an extremely complicated file format (e.g. it has a Turing complete VM, so you may need to run arbitrary code to compute line numbers [2]). As far as I understand, there’s not just “one way” to encode line numbers. So it wouldn’t surprise me either if Zig’s data tickled a bug here or there.

                                                             That is, I think removing one of the two variables (Zig or Nix) would go a long way toward narrowing down the problem. I would guess that the variable “which GDB frontend” is less relevant …

                                                             Although to be fair, I found the Eclipse GUI to work only about 70% of the time on C++/Ubuntu. The symbols worked, but lots of other stuff was broken inexplicably. So I switched to CLion (via a free open source license) or the gdb console.

                                                            If I wanted the best chance of debugging a Zig binary with a good UI, I’d probably try CLion on Ubuntu/Debian (e.g. copy it to a VM as an initial test).

                                                            [1] e.g. it uses bash in more abusive ways than any Linux distro I’ve seen, and that’s saying a lot: https://github.com/oilshell/oil/issues/26 . My experience is that package authors on all distros often “hack it until it works”, and when there is this kind of complexity, they give up.

                                                            [2] https://kristerw.blogspot.com/2016/01/more-turing-completeness-in-surprising.html

                                                            1. 2

                                                              I’ve never had a problem getting source locations to work with ANY debugger. I tried at least 5.

                                                              I tried 13 and only 1 had trouble with source locations, so I guess we have compatible success rates :)

                                                              The package author of GDB may have fixed it, but not the other ones. debug symbols rely on file system paths.

                                                              Most of the frontends launch gdb from a bash command that you give them, so they’re all using the same binary. The exceptions that I know of are code-lldb, which didn’t work when patched, and intellij, which found debug symbols but still refused to show me variables at the failed assert.

                                                              DWARF is an extremely complicated file format

                                                              To what extent do gdb frontends actually have to deal with it, though? It looks like they mostly just send print commands to gdb and get strings back. Maybe for understanding the structure of types?

                                                              due to what I’ve seen of its internals

                                                              I’m in more or less the same boat. I think there are a lot of design decisions in nix that make things much buggier than they needed to be. But stuff that works tends to stay working, whereas my most-of-a-decade experience with ubuntu was that stuff would just mysteriously break at random times and be impossible to rollback.

                                                              I am tempted to try guix, since it seems to be much better designed, but that’s even more niche.

                                                              My main interest is just finding something that works for this project and I did get a lot of interesting suggestions to test out.

                                                              The title was off the cuff and wasn’t intended to be a big statement about anything. I might just change it to “Looking for debugger” or similar.

                                                              1. 3

                                                                Well it looks like you had problems getting backtraces in multiple debuggers. I consider that the most basic functionality: after a crash, get a backtrace with function names, then click to go to source location.

                                                                What I read seems significantly more broken than I’ve experienced, although I won’t argue with the general point that debugger UIs can be flaky. I tried a Python web UI as well and didn’t have good luck with it. Eclipse had some weird problems respecting my breakpoints, but was generally usable and I did fix many bugs with it.

                                                                I think another source of difficulty could be the GDB MI version (protocol version). I’m not versed in the details, but some debuggers use a newer protocol and some use an older one. That seems heavily distro and version dependent.

                                                                Generally when I look at a distro, I look at the size of the package definitions. For every line I assume a pretty large probability of having a bug. Stuff like this is why “minimalist distros” are appealing (although I use Ubuntu for my desktop, since it’s tested by brute force in the happy paths at least).

                                                                1. 3

                                                                  A few months ago I went through a similar list of debuggers on Debian and encountered very similar errors. Many debuggers refuse to work properly when they haven’t built the binary themselves, or when they can’t find the C/C++ source. While I won’t call Debian a “minimalist distro”, the fact that I hit the same problems there makes them seem like they aren’t the fault of Nix.

                                                        1. 11

                                                          By way of contrast, I’ve been really happy with nixops. My server config is a single file https://git.sr.ht/~jamii/tower/tree/master/tower.nix, changes get pushed from my laptop with nixops deploy and I pay vultr an extra $1 to deal with backups for me.

                                                          1. 4

                                                            I can’t find any mention of input or interactivity in Gemini. Is it possible to build something like lobste.rs or wikipedia?

                                                            1. 3

                                                              A page can request user input in a single text field, which has the semantics that the same page is re-fetched with the contents of the input field added as a query parameter (see 3.2.1 in the spec). So far I’ve seen this used for: 1) search boxes, and 2) text input to interactive fiction games.
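Concretely, an input exchange looks something like the sketch below (status 10 means “input”, with the prompt text as the meta field; the host and query are made up for the example):

```
C: gemini://example.org/search                  <- initial request
S: 10 Enter search terms                        <- status 10: ask for input
C: gemini://example.org/search?old+computers    <- same URL, query appended
S: 20 text/gemini                               <- status 20: success
   ...page body generated using the query...
```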

                                                              I don’t think it’s designed for more wiki-style editing where the entire document itself is edited in the same viewer.

                                                              1. 2

                                                                Just like the WWW in 1994. If Gemini catches on, it will quickly (d)evolve into the web of 1996.

                                                            1. 3

                                                              One point that is conspicuously missing is a comparison of resource management (RAII vs defer). It seems to be an area without a clear answer (see this issue’s history: https://github.com/ziglang/zig/issues/782). Was this a non-question in practice?

                                                              1. 4

                                                                So far I haven’t had any difficulty using defer, but on the other hand most of the code I’ve written leans heavily on arena allocation and I also haven’t put much effort into testing error paths yet. I don’t expect to have much of an opinion either way until I’ve written a lot more code and properly stress tested some of it.

                                                                I suspect that defer will be the easy part, and the hard part will be making sure that every try has matching errdefers. There’s a bit of an asymmetry between how easy it is to propagate errors and how hard it is to handle and test resource cleanup in all those propagation paths.

                                                                1. 2

                                                                  For me, it is a non-problem. You usually see when a return value needs a deferred cleanup, and it’s just a matter of typing

                                                                  var x = try …(…, allocator, …);
                                                                  defer x.deinit();

                                                                  It’s usually pretty obvious when a clean-up is required, and if not, looking at the return value or doc comments is sufficient.

                                                                  1. 4

                                                                    Does Zig suffer from the same problems with defer that Go does? e.g., It’s often quite tempting to run a defer inside a loop, but since defer is scoped to functions and not blocks, it doesn’t do what you might think it will do. The syntax betrays it.

                                                                    Answering my own question (nice docs! only took one click from a Google search result): it looks like Zig does not suffer from this problem and runs defer at the end of the enclosing block.

                                                                    1. 2

                                                                      No, Zig scopes defer to blocks, not functions. When I learnt what Go does I was stunned at how unintuitive it is.
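To make the difference concrete, here is a rough Rust analogue (Rust drops values at the end of the enclosing block, which behaves like Zig’s block-scoped defer; the Guard type and the event log are invented for the example). Cleanup runs at the end of every loop iteration instead of accumulating until the function returns, as Go’s function-scoped defer would:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Guard logs a message when it is cleaned up (dropped).
struct Guard {
    id: u32,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Guard {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("cleaned up {}", self.id));
    }
}

fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    for i in 0..2 {
        let _g = Guard { id: i, log: log.clone() };
        log.borrow_mut().push(format!("using {}", i));
    } // _g is dropped here, at the end of each iteration's block
    let events = log.borrow().clone();
    events
}

fn main() {
    // Cleanup interleaves with the loop body rather than all running at the end:
    assert_eq!(run(), ["using 0", "cleaned up 0", "using 1", "cleaned up 1"]);
}
```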

                                                                      1. 1

                                                                        Yeah, it’s definitely a bug I see appear now and then. I suspect there’s some design interaction here between defer and unwinding. Go of course does unwind, and AFAIK uses that to guarantee that defer statements are executed even when a “panic” occurs. I would guess that Zig does not do unwinding like that, but I don’t actually know. Looking at the docs, Zig does have a notion of “panic” but I’m not sure what that actually implies.

                                                                        1. 1

                                                                          Panic calls a configurable panic handler. The default handler on most platforms prints a stacktrace and exits. It can’t be caught at thread boundaries like the rust panic can, so I guess it makes sense that it doesn’t try to unwind.
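For comparison, a minimal Rust sketch of what “caught at a boundary” looks like: catch_unwind stops an unwinding panic and hands back an Err, which is exactly the recovery Zig’s default handler doesn’t attempt:

```rust
use std::panic;

fn main() {
    // The panic unwinds up to the catch_unwind boundary and is converted
    // into an Err carrying the panic payload.
    let result = panic::catch_unwind(|| {
        panic!("boom");
    });
    assert!(result.is_err());
    println!("recovered from panic");
}
```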

                                                                          1. 1

                                                                            Ah yeah, that might explain why Zig is able to offer scope based defer, where as defer in Go is tied to function scope.

                                                                          2. 1

                                                                            A “panic” in Zig is an unrecoverable error condition. If your program panics, it will not unwind anything but will usually just print a panic message and exit or kill the process. Unwinding is only done for error bubbling.

                                                                  1. 5

                                                                    Both languages require explicit annotations for nulls (Option in rust, ?T in zig) and require code to either handle the null case or safely crash on null (x.unwrap() in rust, x.? in zig).

                                                                    Describing Option<T> as “explicit annotation for nulls” has always struck me as missing the point a little (this is not the only essay to use that kind of verbiage to talk about what an option type is).

                                                                    At one level of abstraction, Rust just doesn’t have nulls: an i32 is always a signed 32-bit integer, a String is always an allocated UTF-8 string, with no possibility that, when you start calling methods on a variable of that type, some special null value hidden in the type will crash the program or cause undefined behavior. This is a good improvement over the many languages that do make null implicitly a member of every type, which the programmer then needs to check for.

                                                                    At a different level of abstraction, null semantics are still something a programmer frequently wants to represent using a language - that is, the idea of a variable either being nothing or else being some value of a specific type. The Rust standard library provides the Option<T> type to represent these semantics, and has some special syntactic support for dealing with it with things like the ? operator. But at the end of the day, it’s just an enum type that the standard library defines in the same way as any other Rust type, enum Option<T> { Some(T), None }. If you are writing a program that needs two different notions of nullity for some reason, you can define your own custom type enum MyEnum { None1, None2, Some(T) } using the same common syntax for defining new types.
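As a sketch of that last point (MyOption and describe are made-up names, not std items), such a type is defined and matched on like any other enum:

```rust
// An option-like enum with two distinct "empty" states, defined with the
// same enum syntax the standard library uses for Option<T>.
enum MyOption<T> {
    None1,
    None2,
    Some(T),
}

fn describe<T>(x: &MyOption<T>) -> &'static str {
    match x {
        MyOption::None1 => "first kind of nothing",
        MyOption::None2 => "second kind of nothing",
        MyOption::Some(_) => "something",
    }
}

fn main() {
    assert_eq!(describe(&MyOption::Some(1)), "something");
    assert_eq!(describe::<i32>(&MyOption::None2), "second kind of nothing");
}
```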

                                                                    1. 5

                                                                      Since you mention it, it’s interesting that ?T could be a tagged union in zig:

                                                                      fn Option(comptime T: type) type {
                                                                          return union(enum) {
                                                                              Some: T,
                                                                              None: void,
                                                                          };
                                                                      }
                                                                      Instead it’s … weird. null is a value with type @TypeOf(null). There are implicit casts from T to ?T and from null to ?T which is the only way to construct ?T. There is a special case in == for ?T == null.

                                                                      I had a quick dig through the issues and I can’t find any discussion about this.

                                                                      And, out of curiosity:

                                                                          const a: ?usize = null;
                                                                          const b: ??usize = a;
                                                                          std.debug.print("{} {}", .{b == null, b.? == null});

                                                                      prints “false true”.

                                                                      1. 4

                                                                        I read that like this in Rust:

                                                                        fn main() {
                                                                            let a: Option<u8> = None;
                                                                            let b: Option<Option<u8>> = Some(a);
                                                                            println!("{:?}, {:?}", b.is_none(), b.unwrap().is_none());
                                                                        }

                                                                        Which has the same output. This makes sense to me, as b and a have different types. Does zig normally pass through the nulls? The great thing to me in Rust is that although a and b are different types, they take up the same space in memory (zig may do the same, I’ve never tested).

                                                                        1. 1

                                                                          Does zig normally pass through the nulls?

                                                                          No, your translation is correct and this is the behavior I would want. But this is something I tested early on because the way ?T is constructed by casting made me suspicious that it wouldn’t work.

                                                                          zig may do the same, I’ve never tested

                                                                          Oh, me neither…

                                                                          std.debug.print("{}", .{.{ @sizeOf(usize), @sizeOf(?usize), @sizeOf(??usize), @sizeOf(*usize), @sizeOf(?*usize), @sizeOf(??*usize) }});

                                                                          [nix-shell:~]$ zig run test.zig
                                                                          struct:79:30{ .0 = 8, .1 = 16, .2 = 24, .3 = 8, .4 = 8, .5 = 16 }
                                                                          [nix-shell:~]$ zig run test.zig -O ReleaseFast
                                                                          struct:79:30{ .0 = 8, .1 = 16, .2 = 24, .3 = 8, .4 = 8, .5 = 16 }

                                                                          Looks like it does collapse ?* but not ??.

                                                                          1. 2

                                                                            Looks like it does collapse ?* but not ??.

                                                                            It’s not possible to collapse ?? as it would lose information. Imagine ?void as a boolean which is either null (“false”) or void (“true”). When you now do ??void, you have three states, the same number as ?bool.

                                                                            ??void still requires about 1.5 bits of information (three states) to represent, whereas ?void only needs 1 bit.

                                                                            Collapsing an optional pointer though is possible, as Zig pointers don’t allow 0x00… as a valid address, thus this can be used as sentinel for null in an optional pointer. This allows a really good integration into existing C projects, as ?*Foo is kinda equivalent to a C pointer Foo * which can always be NULL. This translates well to Zig semantics of ?*Foo.

                                                                            Note that there are pointers that do allow 0x00… as a valid value: *allowzero T. An optional of such a pointer doesn’t collapse: @sizeOf(?*allowzero T) != @sizeOf(*allowzero T)

                                                                            1. 1

                                                                              It’s not possible to collapse ?? as it would have a semantic loss of information.

                                                                              ??void still requires 1.5 bit to represent, whereas ?void only needs 1 bit.

                                                                              I don’t think you read the sizes carefully in my previous comment. ??void actually uses 16 bits in practice.

                                                                              std.debug.print("{}", .{.{@sizeOf(void), @sizeOf(?void), @sizeOf(??void)}});
                                                                              struct:4:30{ .0 = 0, .1 = 1, .2 = 2 }

                                                                              Whereas if we hand-packed it we can collapse the two tags into one byte (actually 2 bits plus padding):

                                                                              fn Option(comptime T: type) type {
                                                                                  // packed union(enum) is not supported directly :'(
                                                                                  return packed struct {
                                                                                      tag: packed enum(u1) {
                                                                                          None,
                                                                                          Some,
                                                                                      },
                                                                                      payload: packed union {
                                                                                          Some: T,
                                                                                          None: void,
                                                                                      },
                                                                                  };
                                                                              }

                                                                              pub fn main() void {
                                                                                  std.debug.print("{}", .{.{@sizeOf(void), @sizeOf(Option(void)), @sizeOf(Option(Option(void)))}});
                                                                              }

                                                                              struct:17:30{ .0 = 0, .1 = 1, .2 = 1 }

                                                                              The downside is that &x.? would have a non-byte alignment, which I imagine is why this is not the default.

                                                                              But that’s what we were testing above. Not “can we magically fit two enums in one bit”.

                                                                              1. 1

                                                                                Okay, I misread that then, sorry. Zig is still able to do some collapsing of multi-optionals since there is no ABI definition for them. It might be enabled in ReleaseSmall, but not in the other modes. But this is just a vision of the future; it’s not implemented at the moment.

                                                                            2. 1

                                                                              That makes sense, and is the same as Rust. 0 is a valid bit pattern for usize and thus Option<usize> cannot use the null pointer optimization. In Rust you’d have to use Option<&Option<&usize>> to collapse everything, since Option<T> is not known to be non-null but references (&) are. It would be neat if both Rust and Zig were able to say that Option<T> is non-null if T is non-null, so you could get this benefit without the need for references (or other [unstable] methods of marking a type non-null).

                                                                          2. 1

                                                                            Could those decisions have something to do with C interop? Not sure how much that would affect it, but my inexperienced assumption is that using actual nulls over a tagged union would help with that.

                                                                            1. 2

                                                                              Worth noting here that Rust guarantees that Option<T> is represented without a discriminant (tag) when T is a non-nullable pointer type or otherwise has a “niche” in which the discriminant can be encoded. This even applies to fat pointers like slices or Vec (which have an internal pointer to the allocation, which can never be null).

                                                                              Or, more visually:

                                                                              fn main() {
                                                                                  use std::ptr::NonNull;
                                                                                  use std::mem::size_of;
                                                                                  assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());
                                                                                  assert_eq!(size_of::<Option<NonNull<u32>>>(), size_of::<&u32>());
                                                                                  assert_eq!(size_of::<Option<&[u8]>>(), size_of::<&[u8]>());
                                                                                  assert_eq!(size_of::<Option<Vec<u32>>>(), size_of::<Vec<u32>>());
                                                                              }

                                                                              (NonNull is the non-nullable raw pointer: https://doc.rust-lang.org/std/ptr/struct.NonNull.html)

                                                                              For that reason, Option can be used in FFI situations.

                                                                              This is actually a general compiler feature; those composite types are not special-cased. (Declaring a type as non-nullable is still a nightly feature, though.)

                                                                              1. 1

                                                                                That’s possible. There is a separate [*c]T for C pointers, and the casts could do the conversions. But maybe that would be expensive.

                                                                          1. 30

                                                                            Definitely worth a read.

                                                                            Unlike most language comparisons, this one is really well-written and informative.

                                                                            1. 11

                                                                              Agreed. It was making me rethink my prior swipe-left on Zig … until I got to “any function that could allocate memory has to take an allocator parameter”, and “no lambdas”, which were my deal-breakers earlier.

                                                                              I get that custom allocators can be useful. But I’ve rarely needed them in C++, and when I think of how many functions/classes in my code allocate memory, that’s a fsckton of extra allocator parameters to have to plumb through every call path.

                                                                              1. 16

                                                                                If you want a global allocator in your zig program you can put one in a global variable and use it everywhere.

                                                                                const allocator = std.heap.c_allocator;

                                                                                The issue is more when writing libraries - it’s nice for the user of the library to be able to choose.

                                                                                 It’s not a crazy amount of threading either: e.g. std.ArrayList takes an allocator on init and then the rest of the methods just take self. Similar patterns work elsewhere, e.g. my compiler has a context struct that stores a bunch of global state, and I stash the allocator in there.

                                                                                The lack of closures is a pain though. You can write anonymous functions, they just can’t close over state automatically. There is some design discussion on https://github.com/ziglang/zig/issues/229 that may lead to closures with explicit captures.

                                                                                1. 5

                                                                                  Closures that capture all variables have been a minor regret of the Julia developers, so it might be a good idea to go slow and have explicit captures.

                                                                                  Capturing all variables has caused some tricky bugs with threaded tasks (which are often defined with a closure) and make some compiler optimisations trickier than they would otherwise be.

                                                                                  1. 2

                                                                                    That’s really interesting. Do you have links to any of the discussions about that?

                                                                                    1. 5

                                                                                      I don’t know of any deep discussion of the problems with closures in general. Let me know if you find some! Jeff and Stephan talked about it briefly near the end of the State of Julia talk last year, I think.

                                                                                      The issue with tasks created with @spawn closures is that it’s easy to use a reference from the outer scope (often accidentally because you reused var names or because you forgot that you should take a copy and pass that in rather than sharing), and now all of your tasks are editing the same variable in parallel. This bug has turned up in real code in lots of tkf’s work and in Pkg.jl

                                                                                      Edit: there’s also this long thread about structured concurrency, which touches on some other perceived issues with @spawn. Not super on topic, but may interest you https://github.com/JuliaLang/julia/issues/33248

                                                                                      1. 2

                                                                                        Thanks, I’ll check it out.

                                                                                        1. 3

                                                                                          The talk I was thinking of is https://youtu.be/vfxS6_Sx1Pk

                                                                                          I have no idea where they talk about it, sorry!

                                                                                          1. 3

                                                                                            According to the transcript, it might be at this timestamp: https://youtu.be/vfxS6_Sx1Pk?t=3035

                                                                                            1. 3

                                                                                              This is it, @jamii, it’s slightly different to my memory, they say they would rather have pass by value semantics for closures rather than passing in bindings.

                                                                                              I think it’s morally similar to what I said originally, but maybe not so much. Apologies if I’ve sent you on a wild goose chase!

                                                                                  2. 4

                                                                                    Would it be possible to specify an allocator as a comptime parameter, like the Allocator template parameter that STL collections use? Then a global allocator wouldn’t add overhead since its context is zero-size, but a local allocator could be used transparently.

                                                                                    1. 1
                                                                                      const std = @import("std");

                                                                                      fn BadArrayList(comptime allocator: *std.mem.Allocator, comptime T: type) type {
                                                                                          return struct {
                                                                                              elems: []T,

                                                                                              const Self = @This();

                                                                                              fn init() Self {
                                                                                                  return .{ .elems = &[_]T{} };
                                                                                              }

                                                                                              fn push(self: *Self, elem: T) void {
                                                                                                  var new_elems = allocator.alloc(T, self.elems.len + 1) catch @panic("oh no!");
                                                                                                  std.mem.copy(T, new_elems, self.elems);
                                                                                                  new_elems[self.elems.len] = elem;
                                                                                                  self.elems = new_elems;
                                                                                              }
                                                                                          };
                                                                                      }

                                                                                      pub fn main() void {
                                                                                          var list = BadArrayList(std.heap.c_allocator, u8).init();
                                                                                          std.debug.print("{s}", .{list.elems});
                                                                                      }

                                                                                      but a local allocator could be used transparently.

                                                                                      I don’t think this works out. The allocator value in this example has to be known at compile-time so it can’t be something that is constructed at runtime. It has to be global.

                                                                                      1. 1

                                                                                        Oh, wait, I think you’re asking for something slightly different. If the type of the allocator is known at compile time then the size is known, but the actual value can be passed at runtime, and the value of c_allocator should be zero-sized. That should work, but it would require changing the allocator idiom that is currently used, so that you call the methods directly rather than going through the fn pointers in Allocator.
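
                                                                                        A minimal sketch of that idiom, with a hypothetical stateless `CAlloc` type standing in for the real allocator interface (this is not the std.mem.Allocator API, just an illustration of the zero-sized-value idea):

                                                                                        ```zig
                                                                                        const std = @import("std");

                                                                                        // Hypothetical stateless allocator: the type carries all the
                                                                                        // information, so a field of this type occupies zero bytes.
                                                                                        const CAlloc = struct {
                                                                                            fn alloc(self: CAlloc, comptime T: type, n: usize) ![]T {
                                                                                                _ = self;
                                                                                                return std.heap.c_allocator.alloc(T, n);
                                                                                            }
                                                                                        };

                                                                                        // The allocator *value* is stored in the container and passed at
                                                                                        // runtime, but costs no space when its type is stateless.
                                                                                        fn List(comptime A: type, comptime T: type) type {
                                                                                            return struct {
                                                                                                allocator: A,
                                                                                                elems: []T,
                                                                                            };
                                                                                        }

                                                                                        pub fn main() void {
                                                                                            // Same size as a bare slice: the CAlloc field is zero-sized.
                                                                                            std.debug.print("{}\n", .{@sizeOf(List(CAlloc, u8))});
                                                                                        }
                                                                                        ```

                                                                                        The methods can then be called directly on the field, with no fn-pointer indirection.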

                                                                                  3. 14

                                                                                    any function that could allocate memory has to take an allocator parameter

                                                                                    I used to feel as strongly about this, but having written a small but functionally complete piece of software in Zig that does a lot of (de)allocation (a CommonMark/GFM implementation), the Allocator type gets explicitly referenced on only 50 lines out of 4500. It turned out to be surprisingly painless.

                                                                                    1. 3

                                                                                      Good to hear! Is the code online? I’m curious to see it.

                                                                                      1. 2

                                                                                        Have at it! https://github.com/kivikakk/koino/

                                                                                        I think it’s currently in use by one other project – only updating it to keep in line with Zig master at the moment.

                                                                                1. 8

                                                                                  Nice writeup! I’m glad to see that Zig’s compile time metaprogramming is carrying its weight. It seems like a great thing to base a language around, and something I’ve been interested in for a long time.

                                                                                  It’s interesting to compare that with: https://nim-lang.org/araq/v1.html

                                                                                  … Nim’s meta programming capabilities are top of the class. While the language is not nearly as small as I would like it to be, it turned out that meta programming cannot replace all the building blocks that a modern language needs to have.

                                                                                  I don’t know why that is (since I don’t know Nim), but of course it’s a hard problem, and it looks like Zig has done some great things here.

                                                                                  If this is true in general, a plausible reason for this difference is that many of the ‘zero-cost’ abstractions that are heavily used in rust (eg iterators) are actually quite expensive without heavy optimization.

                                                                                  I’m also finding this with “modern” C++ … A related annoying thing is that those supposedly invisible, inlined functions still have to be visible in the debugger, because they may need to be debugged!

                                                                                  1. 7

                                                                                    Related wish: I kinda want an application language with Zig-like metaprogramming, not a systems language. In other words, it has GC so it’s a safe language, and no pointers (or pointers are heavily de-emphasized).

                                                                                    Basically something with the abstraction level of Kotlin or OCaml, except OCaml’s metaprogramming is kinda messy and unstable.

                                                                                    (I’m sort of working on this, but it’s not likely to be finished any time soon.)

                                                                                    1. 6

                                                                                      Julia has similar ideas. There is a bit more built in to the type-system eg multimethods have a fixed notion of type specificity, but experience with julia is what makes me think that zig’s model will work out well. Eg: https://scattered-thoughts.net/writing/zero-copy-deserialization-in-julia/ , https://scattered-thoughts.net/writing/julia-as-a-platform-for-language-development/

                                                                                      1. 4

                                                                                        Yeah Julia is very cool. I hacked on femtolisp almost 5 years ago as a potential basis for Oil, because I was intrigued how they bootstrapped it and used it for the macro system. (But I decided against writing a huge parser in femtolisp).

                                                                                        And recently I looked at the copying GC in femtolisp when writing my own GC, which is one of the shortest “production” usages of the Cheney algorithm I could find.

                                                                                        And I borrowed Julia’s function signature syntax – the ; style – for Oil.

                                                                                        But unfortunately I haven’t gotten to use Julia very much, since I haven’t done that type of programming in a long time.

                                                                                        That said, I’d be very interested in a “Zig for language development” post to complement these … :) Specifically I wonder if algebraic data types are ergonomic, and if Zig offers anything nice for those bloated pointer-rich ASTs …

                                                                                        i.e. I have found it nice to have a level of indirection between the logical structure and the physical layout (i.e. bit packing), and it seems like Zig’s metaprogramming could have something to offer there. In contrast, Clang/LLVM do tons of bit packing for their ASTs and it seems very laborious.

                                                                                        1. 3

                                                                                          wonder if algebraic data types are ergonomic

                                                                                          Aside from the lack of pattern matching, they’re pretty good. There are a couple of examples in the post of nice quality of life features like expr == .Constant for checking the tag and expr.Constant for unwrap-or-panic. Comptime reflection makes it easy to generate things like tree traversals.
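
                                                                                          A small sketch of those quality-of-life features, using a hypothetical `Expr` type (the name and variants are made up for illustration):

                                                                                          ```zig
                                                                                          const std = @import("std");

                                                                                          // A tiny tagged union in the style of an algebraic data type.
                                                                                          const Expr = union(enum) {
                                                                                              Constant: i64,
                                                                                              Negate: *const Expr,
                                                                                          };

                                                                                          pub fn main() void {
                                                                                              const expr = Expr{ .Constant = 42 };
                                                                                              // Tag check: compares the active tag against an enum literal.
                                                                                              if (expr == .Constant) {
                                                                                                  // Field access unwraps, panicking if the tag doesn't match.
                                                                                                  std.debug.print("{}\n", .{expr.Constant});
                                                                                              }
                                                                                          }
                                                                                          ```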

                                                                                          Zig offers anything nice for those bloated pointer-rich ASTs

                                                                                          I mostly work in database languages where the ast is typically tiny, but if you have some examples to point to I could try to translate them.

                                                                                          a “Zig for language development” post

                                                                                          I definitely have plans to bring over some of the query compiler work I did in julia but that likely won’t be until next year.

                                                                                      2. 6

                                                                                        Take a look at Nim. It has GC (now ref-counted in 1.4, with a cycle collector) and an excellent macro facility.

                                                                                        1. 4

                                                                                          Nim is impressive, and someone is actually translating Oil to Nim as a side project …


                                                                                          I tried Nim very briefly, but the main thing that turned me off is that the generated code isn’t readable. Not just the variable names, but I think the control flow isn’t preserved. Like Nim does some non-trivial stuff with a control flow graph, and then outputs C.

                                                                                          Like Nim, I’m also generating source code from a statically typed language, but the output is “pidgin C++” that I can step through in the debugger, and use with a profiler, and that’s been enormously helpful. I think it’s also pretty crucial for distro maintainers.

                                                                                        2. 5

                                                                                          I find D’s approach to metaprogramming really interesting, might be worth checking out if you’re not familiar with it.

                                                                                          1. 5

                                                                                    D’s compile-time function execution is quite similar. Most of the zig examples would work as-is if translated to d. The main difference is that in d, a function cannot return a type; but you can make a function be a type constructor for a voldemort type and produce very similar constructions.

                                                                                            1. 3

                                                                                              Yeah I have come to appreciate D’s combination of features while writing Oil… and mentioned it here on the blog:


                                                                                      Though algebraic data types are a crucial thing for Oil, which was the “application” I had in mind for this application language … So I’m not sure D would have been good, but I really like its builtin maps / arrays, with GC. That’s like 60% of what Oil is.

                                                                                              1. 2

                                                                                        D does have basic support for ADTs (though there’s another, better package outside the standard library). Support is not great compared with a proper ml, but it’s certainly no worse than the python/c++ that oil currently uses.

                                                                                            2. 3

                                                                                              Julia sort of fits, depends on your applications. Metaprogramming is great and used moderately often throughout the language and ecosystem. And the language is fantastically expressive.

                                                                                              1. 2

                                                                                                I want this too, got anything public like blog posts on your thoughts / direction?

                                                                                                1. 4

                                                                                                  Actually yes, against my better judgement I did bring it up a few days ago:


                                                                                                  tl;dr Someone asked for statically typed Python with sum types, and that’s what https://oilshell.org is written in :) The comment contains the short story of how I got there.

                                                                                          I used Python because extensive metaprogramming made the code 5-7x shorter than bash, and importantly (and surprisingly) it retains enough semantic information to be faster than bash.

                                                                                                  So basically I used an application language for a systems level task (writing an interpreter), and it’s turned out well so far. (I still have yet to integrate the GC, but I wrote it and it seems doable.)

                                                                                          So the hypothetical “Tea language” is like statically typed Python with sum types and curly braces (which I’ve heard Kotlin described as!), plus metaprogramming. Metaprogramming requires a compiler and an interpreter for the same language, and if you squint we sorta have that already. (For example, the Zig compiler contains a Zig interpreter to support metaprogramming.)

                                                                                                  It’s a very concrete project since it’s simply the language that Oil is written in. That is, it already has 30K+ lines of code written for it, so the feature set is exactly mapped out.

                                                                                                  However, as I’ve learned, a “concrete” project doesn’t always mean it can be completed in a reasonable amount of time :) I’m looking for help! As usual my contact info is on the home page, or use Github, etc.

                                                                                          Another way to think of this project is as “self hosting” Oil, because while the current set of metalanguages is very effective, it’s also kind of ugly syntactically and consists of many different tools and languages. (Note that users are not exposed to this; only developers. Tea may never happen and that’s OK.)

                                                                                            1. 3

                                                                                      23 minutes of incremental compilation? How does a simple change in an actual application cause upstream dependencies to recompile? Genuinely curious.

                                                                                              1. 2

                                                                                                The application is split into multiple crates (mostly to speed up compilation). The change was in a crate that most of the others depend on. It’s probably close to the worst case for that project, but I spent a lot of time debugging and benchmarking the code in that crate so I hit it often.

                                                                                                1. 3

                                                                                                  Oh I see, so these are downstream. Makes sense. These compilation times are insane.

                                                                                                  1. 4

                                                                                                    Yep. I’ve since heard that buying everyone threadrippers has brought the times down somewhat, but it’s still a big productivity drain.