1. 3

    You may be interested in this library I just published:

    https://hexdocs.pm/exonerate/0.1.1/Exonerate.html

    Watch this space, I’ll be open sourcing through work a full-on OpenAPI validation generator in about a month and a half or so.

    1. 1

      Nice. Just added your project to my watch list :)

      Perhaps you are already acquainted with Aeson from Haskell, but if you’re not, I’d recommend checking it out. It features very powerful datatype generic encoding/decoding to/from JSON.

      1. 1

        Ok! I’ll be sure to link back to the OpenAPI project in the exonerate readme.

    1. 7

      This is typical for “actor” libraries in languages with async/await. It gives you a very different programming model than that of Erlang: a model where it’s easier to express concurrent processes, but much harder to manage concurrent access to the state correctly (https://matklad.github.io/2021/04/26/concurrent-expression-problem.html).

      The prescriptivist in me screams that calling this “actor model” is wrong, but the realist feels that it’s more productive to use something like “strict actor model” for things like Erlang or https://uazu.github.io/stakker

      1. 4

        I feel like in Erlang/Elixir you would just not put the authorization in an async block. The whole point of “actors” is that each one is a “unit of concurrency”. If you hadn’t put authorization in an async block, then it would be obvious that nothing could reenter. (In Elixir, even if you do a stdlib Task.async call that spawns a separate process, still nothing can reenter, so long as your await is in the same function block.)

        I guess that’s not the case in Swift? If you perform an async call, it instantiates suspend-back-to-the-actor points that you didn’t ask for? Man, what a footgun.
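        For anyone who hasn’t used the Erlang-style model, a rough Go-flavored sketch of what “unit of concurrency” means here (names invented; this is an analogy, not Elixir semantics): one goroutine owns the state and drains a mailbox, so every message is handled start-to-finish before the next one is even looked at.

        // A "strict" actor sketch: only this goroutine touches balance, and a
        // message handler has no suspend points, so nothing can re-enter mid-message.
        type deposit struct{ amount int }

        func account(mailbox <-chan deposit) {
        	balance := 0
        	for msg := range mailbox {
        		balance += msg.amount // runs to completion before the next message
        	}
        }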

        1. 2

          There are two different approaches to re-entrancy in actors, and IMHO each comes with its own footguns.

          In the Erlang style (which I admittedly have not used), actors can easily deadlock by calling each other in a cycle. That kind of nastiness is one of the reasons I switched to using (async-based) actors.

          1. 6

            I’ve been doing elixir for four years now and I can count on my hands the number of times I’ve deadlocked, and 0 times in prod.

            Part of it is that genserver comes with a sensible timeout default, and part of it is that you just usually aren’t writing concurrent code; your concurrency is much, much higher level and tied to the idea of failure domains.

            If you’re running into deadlocks that often in BEAM languages (especially Elixir, which gives you Task, which shouldn’t deadlock unless you do something way too clever), you’re probably overusing concurrency.

            1. 3

              Actors can’t deadlock unless you allow selective receive (which is not part of Hewitt’s original model), but it’s difficult to imagine a pragmatic actor implementation that doesn’t allow it.

          2. 4

            The prescriptivist in me screams that calling this “actor model” is wrong, but the realist feels that it’s more productive to use something like “strict actor model” for things like Erlang or https://uazu.github.io/stakker

            It’s funny because apparently the creator of the actor model says that Erlang doesn’t implement it and the creators of Erlang agree.

            Edit: Eh, that might have come off as snide. I mean to say it’s funny because it feels like a no true Scotsman fallacy.

            “Akka implemented the actor model in Scala.”

            “Well, that’s a library! Swift made it a part of the language!”

            “Swift allows suspend points within its actors. Erlang is the strict actor model.”

            “No, Erlang doesn’t even implement the actor model! But Pony does!”

          1. 4

            Side effects are super useful for debugging, telemetry, etc; artificially coloring your functions in this fashion is likely going to make it terrible for maintenance and operability. But good luck anyways!

            1. 35

              return err is almost always the wrong thing to do. Instead of:

              if err := foo(); err != nil {
              	return err
              }
              

              Write:

              if err := foo(); err != nil {
              	return fmt.Errorf("fooing: %w", err)
              }
              

              Yes, this is even more verbose, but doing this is what makes error messages actually useful. Deciding what to put in the error message requires meaningful thought and cannot be adequately automated. Furthermore, stack traces are not adequate context for user-facing, non-programming errors. They are verbose, leak implementation details, are disrupted by any form of indirection or concurrency, etc.
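              To make the payoff concrete, here is what a couple of layers of this style (function names hypothetical) produce by the time the error reaches the top:

              func run() error {
              	if err := startServer(); err != nil {
              		return fmt.Errorf("starting server: %w", err)
              	}
              	return nil
              }

              func startServer() error {
              	if err := loadConfig(); err != nil {
              		return fmt.Errorf("loading config: %w", err)
              	}
              	return nil
              }

              // A failed open inside loadConfig surfaces as:
              //   starting server: loading config: open config.json: no such file or directory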

              Even with proper context, lots of error paths like this is potentially a code smell. It means you probably have broader error-strategy problems. I’d try to give some advice on how to improve the code the author provided, but it is too abstract to provide any useful insights.

              1. 18

                I disagree on a higher level. What we really want is a stacktrace so we know where the error originated, not manually dispensed breadcrumbs…

                1. 32

                  Maybe you do, but I prefer an error chain that was designed. A Go program rarely has just one stack, because every goroutine is its own stack. Having the trace of just that one stack isn’t really a statement about the program as a whole, since there are many stacks, not one.

                  Additionally, stack traces omit the parameters to the functions at each frame, which means that understanding the error means starting with your stack trace, then bouncing all over your code, reading it and running it in your head, in order to understand your stack trace. This is even more annoying if you’re looking at an error several days later in a heterogeneous environment, where you may have the additional complication of having to figure out which version of the code was running when that trace originated. Or you could just have an error like “failed to create a room: unable to reserve room in database ‘database-name’: request timed out” or something similar.

                  Finally, hand-crafted error chains have the effect that they are often much easier to understand for people who operate but don’t author something; they may have never seen the code before, so understanding exactly what a stack trace means may be difficult for them, especially if they’re not familiar with the language.

                  1. 6

                    I dunno. Erlang and related languages give you back a stack trace (with parameters) in concurrently running processes, no problem.

                    1. 5

                      It’s been ages since I wrote Erlang, but I remember that back then I rarely wanted a stack trace. My stacks were typically 1-2 levels deep: each process had a single function that dispatched messages and did a small amount of work in each one. The thing that I wanted was the state of the process that had sent the unexpected message. I ended up with some debugging modes that attached the PID of the sending process and some other information so that I could reconstruct the state at the point where the problem occurred. This is almost the same situation as Go, where you don’t want the stack trace of the goroutine, you want to capture a stack trace of the program at the point where a goroutine was created and inspect that at the point where the goroutine failed.

                      This isn’t specific to concurrent programs, though it is more common there, it’s similar for anything written in a dataflow / pipeline style. For example, when I’m debugging something in clang’s IR generation I often wish I could go back and see what had caused that particular AST node to be constructed during parsing or semantic analysis. I can’t because all of the state associated with that stack is long gone.
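                      A minimal sketch of that idea in Go (debug.Stack is standard library; the helper itself is hypothetical): capture the stack where the goroutine is created and attach it to whatever the goroutine later reports.

                      // import "runtime/debug"
                      // goWithOrigin records the spawner's stack so a later failure can be
                      // reported with the creation site, not the goroutine's own shallow stack.
                      func goWithOrigin(f func() error, report func(error, []byte)) {
                      	origin := debug.Stack() // stack of the spawning goroutine
                      	go func() {
                      		if err := f(); err != nil {
                      			report(err, origin)
                      		}
                      	}()
                      }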

                  2. 10

                    FWIW, I wrote a helper that adds tracing information.

                    I sort of have two minds about this. On the one hand, yeah, computers are good at tracking stack traces, why are we adding them manually and sporadically? OTOH, it’s nice that you can decide if you want the traces or not and it gives you the ability to do higher level things like using errors as response codes and whatnot.

                    The thing that I have read about in Zig that I wish Go had is an error trace, which is different from the stack trace: it shows how the error was created, not how the error propagates back to the execution error boundary, which is not very interesting in most scenarios.
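                    A bare-bones sketch of such a helper (illustrative, not the actual code; runtime.Caller is standard):

                    // trace annotates err with its caller's file:line, building a cheap
                    // error trace one hop at a time as the error bubbles up.
                    func trace(err error) error {
                    	if err == nil {
                    		return nil
                    	}
                    	_, file, line, _ := runtime.Caller(1)
                    	return fmt.Errorf("%s:%d: %w", file, line, err)
                    }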

                    1. 7

                      The nice thing about those error traces is that they end where the stack trace begins, so it’s seamless to the point that you don’t even need to know that they are a thing, you just get exactly the information that otherwise you would be manually looking for.

                    2. 8

                      In a multiprocess system that’s exchanging messages: which stack?

                      1. 2

                        see: erlang

                      2. 5

                        You don’t want stack traces; you want to know what went wrong.

                        A stack trace can suggest what may have gone wrong, but an error message that declares exactly what went wrong is far more valuable, no?

                        1. 8

                          An error message is easy, we already have that: “i/o timeout”. A stack trace tells me the exact code path that led to that error. Building up a string of breadcrumbs that led to that timeout is just a poorly implemented, ad-hoc stack trace.

                          1. 5

                            Indeed and I wouldn’t argue with that. I love a good stack trace, but I find they’re often relied upon in lieu of useful error messages and I think that’s a problem.

                            1. 2

                              Building up a string of breadcrumbs that led to that timeout is just a poorly implemented, ad-hoc stack trace.

                              That’s a bit of an over-generalization. A stack trace is inherently a story about the construction of the program that originated the error, while an error chain is a story about the events that led to an error. A stack trace can’t tell you what went wrong if you don’t have access to the program’s source code in the way that a hand crafted error chain can. A stack trace is more about where an error occurred, while an error chain is more about why an error occurred. I think they’re much more distinct than you are suggesting.

                              And of course, if people are just bubbling up errors without wrapping them, yeah, you’re going to have a bad time. But I think attacking that case is like suggesting that every language that has exceptions encourages Pokémon exception handling: that’s a bad exception-handling pattern, but the possibility of the pattern is not a fair indictment of exceptions generally. Meanwhile you’re using examples of bad error-handling practices that are not usually employed by Go programmers with more than a few weeks’ experience to indict the entire paradigm.

                          2. 4

                            Stack traces are expensive to compute and inappropriate to display to most users. Also, errors aren’t exceptions.

                            1. 1

                              That’s why Swift throws errors instead. Exceptions immediately abort the program.

                            2. 3

                                What really is the “origin” of an error? Isn’t that somewhat arbitrary? If the error comes from a system call, isn’t the origin deeper in the kernel somewhere? What if you call into a remote, third-party service? Do you want the client to get a stack trace with references to the service’s private code? If you’re using an interface, presumably the purpose is to abstract over the specific implementation. Maybe the stack trace should be truncated at the boundary, like a kernel call or API call?

                              Stack traces are inherently an encapsulation violation. They can be useful for debugging your internals, but they are an anti-feature for your users debugging their own system. If your user sees a stack trace, that means your program is bugged, not theirs.

                              1. 5

                                I get a line of logging output: error: i/o timeout. What do I do with that? With Ruby, I get a stack trace which tells me exactly where the timeout came from, giving me a huge lead on debugging the issue.

                                1. 5

                                  I get a line of logging output: error: i/o timeout. What do I do with that?

                                  Well, that’s a problem you fix by annotating your errors properly. You don’t need stack traces.

                                  1. 3

                                    When your Ruby service returns an HTTP 500, do you send me the stack trace in the response body? What do I do with that?

                                    Go will produce stack traces on panics as well, but that’s precisely the point here: these are two different things. Panics capture stack traces as a “better than nothing” breadcrumb trail for when the programmer has failed to account for a possibility. They are for producers of code, not consumers of it.

                                  2. 2

                                    There’s definitely competing needs between different audiences and environments here.

                                    A non-technical end user doesn’t want to see anything past “something went wrong on our end, but we’re aware of it”. Well, they don’t even want to see that.

                                    A developer wants to see the entire stack trace, or at least have it available. They probably only care about frames in their own code at first, and maybe will want to delve into library code if the error truly doesn’t seem to come from their code or is hard to understand in the first place.

                                    A technical end user might want to see something in-between: they don’t want to see “something was wrong”. They might not even want to see solely the outer error of “something went wrong while persisting data” if the root cause was “I couldn’t reach this host”, because the latter is something they could actually debug within their environment.

                                2. 9

                                  This is one reason I haven’t gone back to Go since university - There’s no right way to do anything. I think I’ve seen a thousand different right ways to return errors.

                                  1. 9

                                      Lots of pundits say lots of stuff. One good way to learn good patterns (I won’t call them “right”), is to look at real code by experienced Go developers. For instance, if you look at https://github.com/tailscale/tailscale you’ll find pervasive use of fmt.Errorf. One thing you might not see – at least not without careful study – is how to handle code with lots of error paths. That is by its very nature harder to see, because you have to read and understand what the code is trying to do and what has to happen when something goes wrong in that specific situation.

                                    1. 6

                                      there is a right way to do most things; but it takes some context and understanding for why.

                                      the mistake is thinking go is approachable for beginners; it’s not.

                                      go is an ergonomic joy for people that spend a lot of time investing in it, or bring a ton of context from other languages.

                                      for beginners with little context, it is definitely a mess.

                                      1. 9

                                        I thought Go was for beginners, because Rob Pike doesn’t trust programmers to be good.

                                        1. 18

                                            I’d assume that Rob Pike, an industry veteran, probably has excellent insight into precisely how good the average programmer at Google is, and what kind of language will enable them to be productive at the stuff Google makes. If this makes programming-language connoisseurs sad, that’s not his problem.

                                          1. 9

                                            Here’s the actual quote:

                                            The key point here is our programmers are Googlers, they’re not researchers. They’re typically, fairly young, fresh out of school, probably learned Java, maybe learned C or C++, probably learned Python. They’re not capable of understanding a brilliant language but we want to use them to build good software. So, the language that we give them has to be easy for them to understand and easy to adopt.

                                            So I have to wonder who is capable of understanding a “brilliant language” …

                                            1. 8

                                              So I have to wonder who is capable of understanding a “brilliant language” …

                                              Many people. They don’t work at Google at an entry-level capacity, that’s all.

                                                There’s a subtle fallacy at work here - Google makes a lot of money, so Google can afford to employ smart people (like Rob Pike!). It does not follow that everyone who works at Google is, on average, smarter than anyone else.

                                              (edited to include quote)

                                              1. 8

                                                Let’s say concretely we are talking about OCaml. Surely entry-level Googlers are capable of understanding OCaml. Jane Street teaches it to all new hires (devs or not) in a two-week bootcamp. I’ve heard stories of people quickly becoming productive in Elm too.

                                                The real meaning of that quote is not ‘entry-level Googlers are not capable of it’, it’s ‘We don’t trust them with it’ and ‘We’re not willing to invest in training them in it’. They want people to start banging out code almost instantly, not take some time to ramp up.

                                                1. 8

                                                  Let’s say concretely we are talking about OCaml. Surely entry-level Googlers are capable of understanding OCaml. Jane Street teaches it to all new hires (devs or not) in a two-week bootcamp.

                                                  I suspect that Jane Street’s hiring selects for people who are capable of understanding OCaml; I guarantee that the inverse happens and applicants interested in OCaml self select for careers at Jane Street, just like Erlang-ers used to flock towards Ericsson.

                                                  Google has two orders of magnitude more employees than Jane Street. It needs a much bigger funnel and is likely far less selective in hiring. Go is “the law of large numbers” manifest as a programming language. That’s not necessarily bad, just something that is important for a massive software company and far less important for small boutiques.

                                                  1. 2

                                                    applicants interested in OCaml self select for careers at Jane Street,

                                                    As I said, they teach it to all hires, including non-devs.

                                                    Google has two orders of magnitude more employees than Jane Street. It needs a much bigger funnel and is likely far less selective in hiring

                                                     Surely, though, they are not so loose that they hire Tom, Dick, and Harry off the street. Why don’t we look at an actual listing and check? E.g. https://careers.google.com/jobs/results/115367821606560454-software-developer-intern-bachelors-summer-2022/

                                                    Job title: Software Developer Intern, Bachelors, Summer 2022 (not exactly senior level)

                                                    Minimum qualifications:

                                                     • Pursuing a Bachelor’s degree program or post secondary or training experience with a focus on subjects in software development or other technical related field.
                                                     • Experience in Software Development and coding in a general purpose programming language.
                                                     • Experience coding in two of C, C++, Java, JavaScript, Python or similar.

                                                    I’m sorry but there’s no way I’m believing that these candidates would be capable of learning Go but not OCaml (e.g.). It’s not about their capability, it’s about what Google wants to invest in them. Another reply even openly admits this! https://lobste.rs/s/yjvmlh/go_ing_insane_part_one_endless_error#c_s3peh9

                                                    1. 2

                                                      And I remember when Google would require at minimum a Masters Degree before hiring.

                                                      1. 1

                                                        I had a master’s degree in engineering (though not in CS) and I couldn’t code my way out of a paper bag when I graduated. Thankfully no-one cared in Dot Com Bubble 1.0!

                                                    2. 3

                                                      They want people to start banging out code almost instantly, not take some time to ramp up.

                                                      Yes, and? The commodification of software developers is a well-known trend (and goal) of most companies. When your assets are basically servers, intangible assets like software and patents, and the people required to keep the stuff running, you naturally try to lower the costs of hiring and paying salary, just like you try to have faster servers and more efficient code.

                                                      People are mad at Rob Pike, but he just made a language for Google. It’s not his fault the rest of the industry thought “OMG this is the bee’s knees, let’s GO!” and adopted it widely.

                                                      1. 1

                                                         Yes, I agree that the commodification of software developers is prevalent today. And we can all see the result: the profession is in dire straits, hard to hire because of bonkers interview practices, hard to keep people because management refuses to compensate them properly, and cranking out bugs like there’s no tomorrow.

                                                      2. 2

                                                        on the contrary, google provides a ton of ramp up time for new hires because getting to grips with all the internal infrastructure takes a while (the language is the least part of it). indeed, when I joined a standard part of the orientation lecture was that whatever our experience level was, we should not expect to be productive any time soon.

                                                        what go (which I do not use very much) might be optimising for is a certain straightforwardness and uniformity in the code base, so that engineers can move between projects without having to learn essentially a new DSL every time they do.

                                                        1. 1

                                                          You may have a misconception that good programming languages force people to ‘essentially learn a new DSL’ in every project. In any case, as you yourself said, the language is the least part of the ramp-up of a new project, so even if that bit were true, it’s still optimizing for the wrong thing.

                                                          1. 1

                                                             no, you misunderstood what i was getting at. i was saying that go was optimising for straightforwardness and uniformity so that there would be less chance of complex projects evolving their own way of doing things, not that better languages would force people to invent their own DSLs per project.

                                                            also the ramp-up time i was referring to was for new hires; a lot of google’s internal libraries and services are pretty consistently used across projects (and even languages via bindings and RPC) so changing teams requires a lot less ramp up than joining google in the first place.

                                                            1. 1

                                                               i was saying that go was optimising for straightforwardness and uniformity so that there would be less chance of complex projects evolving their own way of doing things,

                                                              Again, the chances of that happening are not really as great as the Go people seem to be afraid it is, provided we are talking about a reasonable, good language. So let’s say we leave out Haskell or Clojure. The fear of language-enabled complexity seems pretty overblown to me. Especially considering the effort put into the response, creating an entirely new language and surrounding ecosystem.

                                                2. 9

                                                  No, Rob observed, correctly, that in an organization of 10,000 programmers, the skill level trends towards the mean. And so if you’re designing a language for this environment, you have to keep that in mind.

                                                  1. 4

                                                    it’s not just that. It’s a language that has to reconcile the reality that skill level trends toward the mean, with the fact that the way that google interviews incurs a selection/survival bias towards very junior programmers who think they are the shit, and thus are very dangerous with the wrong type of power.

                                                    1. 4

                                                      As I get older and become, presumably, a better programmer, it really does occur to me just how bad I was for how long. I think because I learned how to program as a second grader, I didn’t get how much of a factor “it’s neat he can do it all” was in my self-assessment. I was pretty bad, but since I was being compared to the other kids who did zero programming, it didn’t matter that objectively I was quite awful, and I thought I was hot shit.

                                                    2. 4

                                                      Right! But the cargo-cult mentality of the industry meant that a language designed to facilitate the commodification of software development for a huge, singular organization escaped and was inflicted on the rest of us.

                                                      1. 4

                                                        But let’s be real for a moment:

                                                        a language designed to facilitate the commodification of software development

                                                        This is what matters.

                                                        It doesn’t matter if you work for a company of 12 or 120,000: if you are paid to program – that is, you are not a founder – the people who sign your paychecks are absolutely doing everything within their power to make you and your coworkers just cogs in the machine.

                                                        So I don’t think this is a case of “the little fish copying what big bad Google does” as much as it is an essential quality of being a software developer.

                                                        1. 1

                                                          Thank you, yes. But also, the cargo cult mentality is real.

                                                    3. 2

                                                      Go is for compilers, because Google builds a billion lines a day.

                                                3. 2

                                                  return errors.Wrapf(err, "fooing %s", bar) is a bit nicer.

                                                  1. 13

                                                    That uses the non-standard errors package and has been obsolete since 1.13: https://stackoverflow.com/questions/61933650/whats-the-difference-between-errors-wrapf-errors-errorf-and-fmt-errorf

                                                    1. 1

                                                      Thanks, that’s good to know.

                                                    2. 8

                                                      return fmt.Errorf("fooing %s %w", bar, err) is idiomatic.

                                                      1. 8

                                                        Very small tweak: normally you’d include a colon between the current message and the %w, to separate error messages in the chain, like so:

                                                        return fmt.Errorf("fooing %s: %w", bar, err)
                                                        
                                                    3. 1

                                                      It makes error messages useful but if it returns a modified err then I can’t catch it further up with if err == someErr, correct?

                                                      1. 2

                                                        You can use errors.Is to check wrapped errors - https://pkg.go.dev/errors#Is

                                                        Is unwraps its first argument sequentially looking for an error that matches the second. It reports whether it finds a match. It should be used in preference to simple equality checks
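                                                          For example (the sentinel and the wrapping are illustrative):

                                                          var ErrNotFound = errors.New("not found")

                                                          func lookup() error {
                                                          	return fmt.Errorf("loading user: %w", ErrNotFound)
                                                          }

                                                          // errors.Is(lookup(), ErrNotFound) == true despite the wrapping;
                                                          // a plain == comparison would report false.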

                                                        1. 2

                                                          Thanks! I actually didn’t know about that.

                                                        2. 2

                                                          Yes, but you can use errors.Is and errors.As to solve that problem. These use errors.Unwrap under the hood. This error chaining mechanism was introduced in Go 1.13 after being incubated in the “errors” package for a long while before that. See https://go.dev/blog/go1.13-errors for details.
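                                                            And errors.As does the same walk down the chain but extracts a concrete error type when it finds one; a small illustrative sketch:

                                                            // import "errors", "fmt", "io/fs"
                                                            var pathErr *fs.PathError
                                                            if errors.As(err, &pathErr) {
                                                            	fmt.Println("failed path:", pathErr.Path)
                                                            }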

                                                      1. 5

                                                        Admit it. If you browse around you will realize that the best-documented projects you find never provide docs generated directly from code.

                                                        Is this saying that you shouldn’t use Javadoc or pydoc or “cargo doc”, where the documentation is located in the source files? So, from the previous point, it’s essential that docs live in the same repo as the code, but not the same files as the code? Seems like a pretty extreme position relative to the justification.

                                                        1. 18

                                                          As a concrete example, Python’s official documentation is built using the Sphinx tool, and Sphinx supports extracting documentation from Python source files, but Python’s standard library documentation does not use it - the standard library does include docstrings, but they’re not the documentation displayed in the Standard Library Reference. Partially that’s because Python had standard library documentation before such automatic-documentation tools existed, but it’s also because the best way to organise a codebase is not necessarily the best way to explain it to a human.

                                                          As another example in the other direction: Rust libraries sometimes include dummy modules containing no code, just to have a place to put documentation that’s not strictly bound to the organisation of the code, since cargo doc can only generate documentation from code.

                                                          There’s definitely a place for documentation extracted from code, in manpage-style terse reference material, but good documentation is not just the concatenation of small documentation chunks.

                                                          1. 1

                                                            Ah, I was thinking of smaller libraries, where you can reasonably fit everything but the reference part of the documentation on one (possibly big) page. Agreed that docs-from-code tools aren’t appropriate for big projects, where you need many separate pages of non-reference docs.

                                                          2. 10

                                                            There’s definitely a place for documentation extracted from code, in manpage-style terse reference material, but good documentation is not just the concatenation of small documentation chunks.

                                                            Can’t agree enough with this. Just to attempt to paint the picture a bit more for people reading this and disagreeing. Make sure you are thinking about the complete and exhaustive definition of ‘docs’. Surely you can get the basic API or stdlib with method arity and expected types and such, but for howtos and walkthroughs and the whole gamut it’s going to take some effort. And that effort is going to take good old fashioned work by technical folks who also write well.

                                                            It’s taken me a long time to properly understand Go given that ‘the docs’ were for a long time just this and lacked any sort of tutorials or other guides. There’s been so much amazing improvement here and bravo to everyone who has contributed.

                                                            On a personal note, the Stripe docs are also a great example of this. I cannot possibly explain the amount of effort or care that goes into them. Having written a handful of them myself, it’s very much “a lot of effort went into making this effortless” sort of work.

                                                            1. 8

                                                              Yeah I hard disagree with that. The elixir ecosystem has amazing docs and docs are colocated with source by default for all projects, and use the same documentation system as the language.

                                                              1. 2

                                                                Relevant links:

                                                              2. 5

                                                                The entire D standard library documentation is generated from source code. Unittests are automatically included as examples. It’s searchable, cross-linked and generally nice to use. So yeah, I think this is just an instance of having seen too many bad examples of code-based docs and not enough good ones.

                                                                When documentation is extracted from code in a language where that is supported well, it doesn’t look like “documentation extracted from code”, it just looks like documentation.

                                                                1. 4

                                                                  Check out Four Kinds of Documentation. Generated documentation from code comments is great for reference docs, but usually isn’t a great way to put together tutorials or explain broader concepts.

                                                                  It’s not that documentation generation is bad, just that it’s insufficient.

                                                                  1. 2

                                                                    Maybe the author is thinking about documentation which has no real input from the developer, like an automated list of functions and arguments with no other contextual text.

                                                                  1. 3

                                                                      Maybe I’m a bit crazy but I feel like if you’re programming Elixir with message passing in mind, then you’re doing something wrong. 99% of your code should be purely functional and you should not be thinking about message passing (which is fundamentally stateful/effectful). Sure, it’s great to keep in mind that it’s happening behind the scenes, and that grokking the mechanism can give you confidence about the robustness of your code, but I do not architect the bulk of my code considering message passing, except if I’m digging really deep and writing e.g. a low-level TCP protocol implementation (which I almost never am).

                                                                    Is it different in the erlang community?

                                                                    1. 8

                                                                      Elixir and Erlang do not have a “pure FP” philosophy. Evaluation is eager, typing is dynamic, and side effects are basically never pushed into a monad in practice. Some of the core libraries even proudly expose global/cluster-wide state.

                                                                      The parts of FP that made it in (first-class functions, immutable data structures, certainly some other aspects I am missing) are there because they are useful for building systems that are resilient, debuggable, repairable, upgradable, and evolvable with ~zero downtime.

                                                                      That is the primary goal of Erlang, and Elixir inherits this legacy. It’s a goal, not a philosophy, so you may find competing ideas next to each other, because they’ve been shown to work well together in this context.

                                                                      1. 7

                                                                          total nitpick, but pure FP requires neither static typing nor laziness.

                                                                        1. 2

                                                                            Also, effects in FP languages can be, and usually are, modeled using process calculi, which are exactly what Erlang offers!

                                                                          That being said, Erlang also has side effects apart from message passing.

                                                                        2. 2

                                                                            I never claimed it is pure FP. The VM is nice in that it gives you places to breach FP purity in exactly the right spots, and makes it very hard or ugly to breach purity where doing so is dangerous.

                                                                          1. 4

                                                                            My mistake, I thought you were surprised at message passing from a pure-FP point of view.

                                                                            Another reason to think of message passing, and more broadly genservers/processes, in particular is that they can become bottlenecks if used carelessly. People talk about genservers as a building block of concurrency, which isn’t false, but from another point of view they are Erlang’s units of single-threaded computation. They only process one message at a time, and this is a feature if you know how/when to use it, but a drawback at other times. Effective Elixir or Erlang development must keep in mind the overall pattern of message passing largely to avoid this issue (and, in highly performance-sensitive cases, to avoid the overhead of message passing itself).

                                                                            1. 1

                                                                                Love reminding people that genservers are a unit of single-threadedness.

                                                                              1. 1

                                                                                I still can’t find a good word for it! https://twitter.com/gamache/status/1390326847662137355

                                                                        3. 6

                                                                          I can only offer my own experience (5 years of Erlang programming professionally).

                                                                          Message passing, like with FP, is a tool that can be (mis)used. Some of the best uses of multiple processes or message passing that I see in code are:

                                                                          • Enforcing a sequence on unordered events, or enforcing the single writer principle.
                                                                          • Bounded concurrency. Worker pools, queues, and more.
                                                                          • Implementing protocols. Protocols for passing messages between processes serve to standardize and abstract. Suppose you have a service on TCP that you want to extend to work over WebSockets. The well-architected solution for this has 3 kinds of processes: 1 process that receives Erlang terms, and 2 processes that receive data along some transport (TCP, WebSockets, etc.) and send Erlang terms. Structuring Erlang code in this way is an amazing aid in keeping code simple and organized.

                                                                          I’ll generally come across problems that are solved by processes/message passing when writing libraries. When writing application code that uses those libraries, it’s usually far less common.
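                                                                          A loose Go-flavored analog of the protocol point above, for readers who don’t write Erlang (types and names are invented): the core process only ever sees decoded terms, and each transport goroutine turns bytes into those terms.

                                                                          // One transport-agnostic core loop; transports only decode and forward.
                                                                          type Msg struct{ Kind, Body string }

                                                                          func core(inbox <-chan Msg) {
                                                                          	for m := range inbox {
                                                                          		handle(m) // protocol logic, independent of TCP/WebSockets
                                                                          	}
                                                                          }

                                                                          // import "net"; a WebSocket transport would have the same shape.
                                                                          func tcpTransport(conn net.Conn, inbox chan<- Msg) {
                                                                          	// read a frame from conn, decode it into a Msg, then: inbox <- msg
                                                                          }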

                                                                          1. 4

                                                                            my advice is typically:

                                                                            • are you writing a library? You probably don’t need a genserver (except for odd things like “I need a ‘fire and forget’ genserver to wrap an ETS table”, well, yeah).
                                                                            • ok so you still think you need a genserver? did you try Task? (this is Elixir-specific)
                                                                            • are you wrapping a stateful communication protocol? then go ahead, use genserver.
                                                                            • are you creating a ‘smart cache’ for something IRL or external to the VM? then go ahead and use genserver.
                                                                            • are you creating temporary shared state between users (like a chat room)? then go ahead and use genserver.

                                                                            I like the bounded concurrency one. Should probably add it to my list. Are you creating a rate limiter or resource pool? then use genserver.
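                                                                            Outside BEAM, the classic counterpart to that last one is a semaphore; in Go terms, just a buffered channel (a sketch; Job and process are placeholders):

                                                                            // A resource pool of size 4: acquire by sending, release by receiving.
                                                                            sem := make(chan struct{}, 4)

                                                                            for _, job := range jobs {
                                                                            	sem <- struct{}{} // blocks while 4 jobs are in flight
                                                                            	go func(j Job) {
                                                                            		defer func() { <-sem }()
                                                                            		process(j)
                                                                            	}(job)
                                                                            }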

                                                                            1. 3

                                                                              There is nothing wrong with using gen_server in a library. The thing is that in most cases it is not you who should start and manage that process - leave it up to the user. The “problem” is that there are 3 “kinds” of projects in the Erlang world:

                                                                              • “libraries” which should not start their own supervisor in general and should leave all process management up to the end user
                                                                              • “library applications” which should be mostly self contained, and are mostly independent from the main application, for example OpenTelemetry, systemd integration, Telemetry Poller, etc.
                                                                              • “end-user applications” your application, where you handle all the data and processes on your own

                                                                              In each of these parts there are different needs and message passing will be used more or less, depending on the needs.

                                                                              1. 1

                                                                                150% this. Dave Thomas got this trichotomy right in his EMPEX talk; it’s just infuriating to me that his choice of “names” for these things is unnecessarily confusing.

                                                                                1. 1

                                                                                  Sorry, I mistyped… It should be “are you writing a library? If not, you should probably not be writing a genserver.”

                                                                            2. 2

                                                                              In my experience (having done Elixir for 8 years), folks that don’t understand/think about messages and genservers in Elixir are at a severe disadvantage when debugging and grokking why their code is behaving some way.

                                                                              It’s a lot like people who in the previous generation of webdevs learned Rails but not Ruby…which is fitting, since those are the folks driving adoption of Phoenix.

                                                                              (There are also certainly people who reach for a genserver for everything, and that’s a whole additional annoying thing. Also the people who use Tasks when they really, really shouldn’t.)

                                                                              1. 1

                                                                                  I’ve not used Elixir, but it sounds from your description as if it has some abstractions that give you Erlang (technically BEAM, I guess) processes but abstract away the message-passing details? The last time I wrote a non-trivial amount of Erlang code was about 15 years ago, but at least back then Erlang didn’t have anything like this. If you wanted concurrency, you used processes and message passing. If you didn’t want concurrency, there were several functional programming languages with much faster sequential execution than Erlang, so you’d probably chosen the wrong tool.

                                                                                1. 1

                                                                                  Processes are the core feature that provides both error isolation and fault tolerance. As you build more robust systems in Elixir, the number of times you use processes increases.

                                                                                  Often it is correct to abstract this away behind a module or API, but you’re still passing messages.

                                                                                1. 3

                                                                                  All the time. Frequently when I’m backpacking outdoors the code I’m working on invades my dreams. I don’t mind it, it motivates me to get to it when I get off the trail.

                                                                                  1. 1

                                                                                      I frequently experience insights when backpacking, or even day hiking. It’s really nice to let my mind wander when I am going along a trail.

                                                                                  1. 3

                                                                                    For Rust, why does this say that you need no_std to get portable binaries? I don’t feel like this is true. If you just want a portable binary within the same OS, you can use the MUSL libc target and statically compile everything. It’s crazy easy.

                                                                                    1. 1

                                                                                      Suppose you need to write something on a bare metal architecture, no operating system even, just an assembly bootstrap.

                                                                                      1. 1

                                                                                        Ok, sure, then you’d need no_std. But if we’re just talking about a binary being portable between different environments with the same kernel, then Rust + musl libc gets you there.

                                                                                        The biggest hit to portability is dynamic linking. Once you start dynamically linking, you have to worry about what version of your dependencies are on your users’ machines. Or your users have to worry about it, at least.

                                                                                        Portability between different operating systems is a less important problem, though seemingly still interesting given how much attention Cosmopolitan libc gets.

                                                                                        Portability to the extent of “I don’t even need an OS” is probably not that valuable.

                                                                                    1. 4

                                                                                       The problem is that almost no one will use IEEE halves, for the reasons in the blog post. F16 is probably only ever going to be used in machine learning, and even there people have in some cases moved on to u2 (xnor.ai, inference-only) or u8-ish (Google, TPU, inference-only). I did some research showing that training models can probably be done effectively using an 8-bit float if you expand to a higher precision during the summation phase of Kronecker products in backpropagation.

                                                                                       For machine learning, as the article says, bfloat16 is the most popular (Google, TF/TPU/GPUs), but NVIDIA also has the confusingly named TensorFloat32, which despite the name is a 19-bit format.
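                                                                                       For reference, the layouts being compared here (sign / exponent / mantissa bits):

                                                                                       • IEEE binary16 (“half”): 1 / 5 / 10
                                                                                       • bfloat16: 1 / 8 / 7
                                                                                       • NVIDIA TF32: 1 / 8 / 10 (19 bits used, stored in 32 bits)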

                                                                                      1. 8

                                                                                         Keep going, guys. Every time I read about Zig I get a bit closer to deciding to try it. (Srsly.) It helps that my current project uses very little memory allocation, so Zig’s lack of a global allocator won’t be as big a deal. (Previously I have been turned off by the need to pass or store allocator references all over the place.)

                                                                                        1. 5

                                                                                          For end-product projects there’s really nothing wrong with setting a global allocator and using it everywhere! You can even use that single point of reference to swap in test allocator in tests so you can check for memory leaks in all of your unit and integration tests. You might want to be more flexible, composable, and unopinionated (or maybe strategically opinionated) with allocator strategies if you’re writing a library.

                                                                                          1. 5

                                                                                            Good point! I tend to write libraries, though.

                                                                                            The scenario that worries me is where I’ve got a subsystem that doesn’t allocate memory. Then I modify something down inside it so that it does need to allocate. Now I have to plumb an allocator reference through umpteen layers of call stack (or structs) in all control flow paths that reach the affected function.

                                                                                            Maybe far fetched, but it gets uglier the bigger the codebase gets. I’ve had to do this before (not with allocators, but other types of state) in big layer-cake codebases like Chromium. It’s not rocket science but it’s a pain.

                                                                                            I guess I’m not used to thinking of “performs memory allocation” as a color of function.

                                                                                            1. 4

                                                                                              Then I modify something down inside it so that it does need to allocate.

                                                                                              To me this would be a smell, a hint that the design may want to be rethought so as not to have the possibility of allocation failure. Many code paths do have the possibility of allocation failure, but if you have an abstraction that eliminates that possibility, you’ve opened up more possible users of that abstraction. Adding the first possibility of allocation failure is in fact a big design decision in my opinion - one that warrants the friction of having to refactor a bunch of function prototypes.

                                                                                              As I’ve programmed more Zig, I’ve found that things that need to allocate tend to be grouped together, and likewise with things that don’t need to allocate. Plus there are some really nice things you can do to reserve memory and then use a code path that cannot fail. As an example, this is an extremely common pattern:

                                                                                              https://github.com/ziglang/zig/blob/f81b2531cb4904064446f84a06f6e09e4120e28a/src/AstGen.zig#L9745-L9784

                                                                                              Here we use ensureUnusedCapacity to make it so that the following code can use the “assumeCapacity” forms of appending to various data structures. This makes error handling simpler since most of the logic does not need to handle failure (note the lack of the word try after those first 3 lines). This pattern can be especially helpful when there is a resource that (unfortunately) lacks a cheap or simple way to deallocate it. If you reserve the other resources such as memory up front, and then leave that weird resource at the end, you don’t have to handle failure.

                                                                                              A note on safety: “assumeCapacity” functions are runtime safety-protected with the usual note that you can opt out of runtime safety checks on a per-scope basis.
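                                                                                              The same reserve-up-front shape shows up in other languages too; a rough Go rendering (Item and convert are placeholders), where the payoff is merely that the loop can’t reallocate, since Go appends have no way to report failure:

                                                                                              out := make([]Item, 0, len(src)) // reserve all capacity up front
                                                                                              for _, s := range src {
                                                                                              	out = append(out, convert(s)) // cannot grow, no mid-loop surprises
                                                                                              }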

                                                                                              1. 1

                                                                                                the design may want to be rethought so as not to have the possibility of allocation failure.

                                                                                                True, I’ve been forgetting about allocation failures because I code for big iron, like phones and Raspberry Pis 😉 … but I do want my current project to run on li’l embedded devices.

                                                                                                The ensureUnusedCapacity trick is neat. But doesn’t it assume that the allocator has no per-block overhead? Otherwise the heap may have A+B+C bytes free, but it’s not possible to allocate 3 blocks of sizes A, B and C because each heap block has an n-byte header. (Or there may not be a free block big enough to hold C bytes.)

                                                                                                1. 2

                                                                                                  Speaking of allocators: the Zig docs say that implementations need to satisfy “the Allocator interface”, but don’t say what an “interface” is. That is in fact the only mention of the word “interface” in the docs.

                                                                                                  I’m guessing that Zig supports Go-like interfaces, but this seems to be undocumented, which is weird for such a significant language feature…?

                                                                                                  1. 2

                                                                                                    The Allocator interface is actually not part of the language at all! It’s a standard library concept. Zig does not support Go-like interfaces. It in fact does not have any OOP features. There are a few patterns you can use to get OOP-like abstractions, and the Allocator interface is one of them.
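
                                                                                                    For the curious, here is a minimal sketch of one such pattern - a made-up Greeter interface rather than the real Allocator, which uses the same trick with more machinery (a function-pointer field plus @fieldParentPtr, in stage1-era Zig):

                                                                                                      const std = @import("std");

                                                                                                      // The "interface" is just a struct holding a function pointer.
                                                                                                      const Greeter = struct {
                                                                                                          greetFn: fn (self: *Greeter) []const u8,

                                                                                                          fn greet(self: *Greeter) []const u8 {
                                                                                                              return self.greetFn(self);
                                                                                                          }
                                                                                                      };

                                                                                                      // An implementation embeds the interface struct and recovers
                                                                                                      // itself from the interface pointer with @fieldParentPtr.
                                                                                                      const English = struct {
                                                                                                          greeter: Greeter = .{ .greetFn = greet },

                                                                                                          fn greet(iface: *Greeter) []const u8 {
                                                                                                              const self = @fieldParentPtr(English, "greeter", iface);
                                                                                                              _ = self; // a real implementation would use its own fields
                                                                                                              return "hello";
                                                                                                          }
                                                                                                      };

                                                                                                      test "dispatch through the interface" {
                                                                                                          var impl = English{};
                                                                                                          std.debug.assert(std.mem.eql(u8, impl.greeter.greet(), "hello"));
                                                                                                      }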

                                                                                                    The ensureUnusedCapacity trick is neat. But doesn’t it assume that the allocator has no per-block overhead?

                                                                                                    The ensureUnusedCapacity usage I showed above is on array lists and hash maps, so the reservation lives inside each container’s own backing allocation - per-block heap overhead doesn’t enter into it. :)

                                                                                                    1. 3

                                                                                                      Will I get in trouble if i start calling it the allocator factory?

                                                                                                      1. 1

                                                                                                        Lol

                                                                                              2. 2

                                                                                                It’s not function coloring (or if it is, it doesn’t feel like it), because in your interfacing function you can fairly trivially catch the OutOfMemory error and return a null value corresponding to allocation failure (or panic on allocation failure, if you prefer); in either case you don’t have to change anything further up the call stack. And since allocators are factories, it’s pretty easy to set things up so that there is an overridable default allocator.

                                                                                                As an example, here is some sample code that mocks libc’s calloc, malloc, and free with Zig’s stdlib testing.allocator, which gives you memory-leak analysis in code that was designed to use the libc functions. Note that testing.allocator has Zig’s standard allocator signature, but the adapter with the libc-style interface doesn’t have to travel up the call stack and muck up the rest of the code, which expects something that looks like libc’s calloc/malloc (and which, of course, doesn’t have error return values, but communicates failure with null):

                                                                                                https://gist.github.com/ityonemo/fb1f9aca32feb56ad46dd5caab76a765

                                                                                                1. 1

                                                                                                  I guess I’m not used to thinking of “performs memory allocation” as a color of function.

                                                                                                  We should, though - although it should be something handled like generics.

                                                                                                  1. 2

                                                                                                    It’s not, though (see sibling reply with code example).

                                                                                            1. 1

                                                                                              I actually feel that we should compare Zig to Go, because it is closer in concept to what I think Go wanted to be

                                                                                              Eh.. kinda, yes, as in Go “wanted to be a modern C”, but it seems like fully automatic memory management was a really big part of their vision of “modern”.

                                                                                              1. 11

                                                                                                Yeah, I never understood comparing Go to C; Go competes more with Java at the “mid-level, garbage-collected, cares-about-performance-a-bit” tier.

                                                                                                1. 5

                                                                                                  I think it’s because it’s small and imperative, and also doesn’t hide pointers from you.

                                                                                                  1. 8

                                                                                                    Go compiles to native code. Besides the performance benefit, it also means that, like C/C++, it produces a regular binary file anyone can simply run without having to install anything. This also makes binding to C APIs a lot easier.

                                                                                                    At the language level, Go and Java disagree violently about inheritance and error handling.

                                                                                                    1. 2

                                                                                                      Yes

                                                                                                      1. 3

                                                                                                        There is no reason Java can’t compile to native as well. And with GraalVM it is becoming more and more prevalent.

                                                                                                    2. 4

                                                                                                      I assume the connection to C is primarily because of Ken Thompson’s involvement.

                                                                                                      1. 2

                                                                                                        I always feel like Go is a lovechild of C and Scheme: it compiles to native code, and the compiler can detect stack-allocatable activation records as in C, with a huge standard library; but the compiler can also generate functions whose activation records are heap-allocated, for situations needing construction of a closure for runtime realization of lexical scoping, as in Scheme (and unlike Java’s half-hearted version).

                                                                                                      2. 6

                                                                                                        There’s also a “X for me but not for thee” division between how go’s stdlib is implemented under the hood versus user code. That philosophy is non-existent in zig, and often I find myself cribbing code snippets or concepts from stdlib in zig.

                                                                                                      1. 14

                                                                                                        The (almost) same idea is also known under the name of “Configuration complexity clock” (by Mike Hadlow): http://mikehadlow.blogspot.com/2012/05/configuration-complexity-clock.html

                                                                                                        1. 4

                                                                                                          I thought the same thing! Thanks for reposting this article.

                                                                                                          Both these articles have a flavor of satire/this-is-insanity. One thing I’ve mused on recently, across my last two jobs, is that maybe we should lean into this pattern, because a lot of the choices are sensible, and set out to establish formal or informal best practices for when to move on from one stage to the next, and how best to manage, document, and plan the transitions.

                                                                                                          1. 2

                                                                                                            I don’t think this is insanity, although it’s comical that even principal engineers get stuck in this cycle without realizing it.

                                                                                                            To your point, I do think that we should absolutely lean into this and carve out some best practices. I might write a follow-up soon!

                                                                                                            1. 1

                                                                                                              Thanks! I wasn’t sure how to read your tongue-in-cheekness!

                                                                                                          2. 1

                                                                                                            Yep! Mike’s post was top of mind when I wrote this. I thought I’d update it a decade later with how it’s changed (albeit only slightly) today.

                                                                                                          1. 9

                                                                                                            But the problem is fundamentally unsolvable, because it’s built into Julia on a basic design level.

                                                                                                            Would love to read more about this! My naive understanding is that this is mostly just a quality-of-implementation issue: Julia uses LLVM to compile code during execution, and lacks a tiered JIT. Are there reasons why we can’t just add an interpreter tier to Julia, beyond “someone has to do that work”?

                                                                                                            1. 10

                                                                                                              There is some debate about it here, including from a core Julia dev.

                                                                                                              https://news.ycombinator.com/item?id=27961251

                                                                                                              My (mostly uninformed) feeling is that the job of the compiler is to get rid of abstractions (e.g. monomorphizing code), and perfect optimization is a global process. So when any piece of code can be redefined at any time, that makes recompilation slow.

                                                                                                              Of course that doesn’t mean you can’t do some tricks, but it’s working against the grain of the problem.

                                                                                                              I’d also say from reading about v8 over the years, the tiers seem to become a huge tax. It’s not just “someone has to do that work”, but “every language change from now on requires more work” (from a limited group of people; it’s not orthogonal).

                                                                                                              I don’t think the Julia language is done evolving, so I can see that duplicating the semantics of the language in multiple places is something they would be reluctant to do. (again this is pure speculation) Hopefully there is some kind of IR that makes this less burdensome, but compilers are always messier than you’d like to think :)


                                                                                                              edit: I think the quote is a shorter way to explain it.

                                                                                                              Just as separate compilation isn’t possible for C++ template code, it’s a challenge for Julia as well.

                                                                                                              e.g. if all your C++ code is in templates – and there are some styles that lean that way for zero runtime cost – then C++ doesn’t have incremental compilation at all. It has plenty of duplicate compilation if you like :)

                                                                                                              1. 3

                                                                                                                It also sounds like they can improve the caching / precompilation:

                                                                                                                While currently precompile can only save the time spent on type-inference, in the long run it may be hoped that Julia will also save the results from later stages of compilation. If that happens, precompile will have even greater effect, and the savings will be less dependent on the balance between type-inference and other forms of code generation.

                                                                                                                https://julialang.org/blog/2021/01/precompile_tutorial/

                                                                                                                1. 1

                                                                                                                  Yeah, caching is the first thing I thought of when I saw the “unsolvable” problem. I wonder if caching the whole heap could be an option here as it is in some Standard ML implementations, SBCL, …

                                                                                                                2. 3

                                                                                                                  Given the existence of things like ghci/runhaskell that also have to compile a complex language before they start to run, I feel like it can’t be unsolvable

                                                                                                                  1. 2

                                                                                                                    Are there reasons why we can’t just add interpreter tier to Julia

                                                                                                                    I have had this exact thought; I don’t think there is a fundamental reason why not, though it would take a tremendous amount of refactoring.

                                                                                                                    1. 3

                                                                                                                      You can already set the compilation level of Julia per function, and the lowest level is sort of an interpreter.

                                                                                                                      There has been some work on a fully ahead of time compiler for Julia and the core team have mentioned using more conventional JIT techniques with an interpreter level, too.

                                                                                                                      1. 2

                                                                                                                        Yeah I mean it’s not hard to see why “just go and create a second implementation of your language that retains perfect compatibility with the first” isn’t really something a lot of people want to hear.

                                                                                                                        1. 1

                                                                                                                          I imagine there’s a way you could do it incrementally with careful planning, but I don’t know enough about Julia internals to make any real statements about the level of difficulty that entails. Could be really easy for all I know.

                                                                                                                    1. 5

                                                                                                                      Ah man, I’m a bit disappointed there wasn’t much discussion of the new data model. I would have been very interested in seeing what made the new version more efficient and why zig is a better language for implementing it than C :\

                                                                                                                      1. 7

                                                                                                                        As mentioned in the article, it seems that Zig was not chosen because it allowed more things, but because it was easier and more fun to prototype in… and the prototype ended up becoming a full V2.

                                                                                                                        1. 3

                                                                                                                          Ask for a follow-up!!

                                                                                                                          1. 12

                                                                                                                            Asked the author, and he told me on the fediverse that he does plan to do a follow-up addressing the data model comparisons and how doing it in C would have been more painful.

                                                                                                                          2. 2

                                                                                                                            The Zig source is very readable. I paged through the code for the new data model. https://code.blicky.net/yorhel/ncdu/src/branch/zig is the Zig branch, which you can check out or browse if you’re interested.

                                                                                                                          1. 7

                                                                                                                            We already have various tools for enabling growth: the freedom to use the software for any purpose being one of the most powerful.

                                                                                                                            This has been tried for the last 30 years; with very limited success, and the success it did have is probably not primarily attributable to “Software Freedom” either.

                                                                                                                            Essentially this article is “keep doing what we’ve been doing for the last 30 years”, which doesn’t strike me as a very good strategy.

                                                                                                                            1. 6

                                                                                                                              with very limited success

                                                                                                                              Hobbyist software hasn’t exactly displaced end-user software, but the major browsers are (corporate) open-source and pretty much all developer tooling and infrastructure is. I’d say open source has been very successful!

                                                                                                                              1. 7

                                                                                                                                Hobbyist software hasn’t exactly displaced end-user software, but the major browsers are (corporate) open-source and pretty much all developer tooling and infrastructure is. I’d say open source has been very successful!

                                                                                                                                Ok but how much browser hacking have you done? How many times have you opened up Blender’s source code – or Chromium’s – and gone “ok I can make a change to this within a reasonable timeframe to make it do what I want”.

                                                                                                                                Free as in freedom doesn’t mean shit when codebases are so large as to be inscrutable and impossible to duplicate.

                                                                                                                                1. 10

                                                                                                                                  Something as complex as a modern web browser is going to have complex source code. There’s no way around that, and it has nothing to do with being open source or not.

                                                                                                                                  1. 4

                                                                                                                                    Sometimes those codebases are large because they actually do something useful.

                                                                                                                                    I will say I’ve gone in on complex projects before (usually language runtimes) when needed/desired and been able to figure out changes. Usually, I don’t because I have no need.

                                                                                                                                    1. 2

                                                                                                                                      On one hand, there are two hobbyist forks of Chromium that I use: Ungoogled Chromium (for desktop) and Bromite (for Android). These are both Chromium with no Google services. That these projects are possible is a success of open source.

                                                                                                                                      On the other hand, in order to have an ecosystem that fully embodies the spirit and values of free software, we need to embrace the OP’s recommendation, and go well beyond that. Software needs to be much simpler. Languages need to be much simpler. Hypercomplex mega-apps need to be replaced by an ecology of small tools that interoperate on a shared data model. (We had that in the Unix CLI, but then we threw it away when we permitted Apple and Microsoft to define how a GUI works.) Operating systems need to allow you to trivially examine and modify any code that you happen to be running, much more like Smalltalk or a Lisp machine, and much less like the C + Unix model that the community adopted back in the 1980s.

                                                                                                                                    2. 3

                                                                                                                                      Hobbyist software hasn’t exactly displaced end-user software

                                                                                                                                        Linux was originally a hobby OS, so at least “software that started as hobbyist” did displace end-user software everywhere on internet servers.

                                                                                                                                      1. 2

                                                                                                                                        It’s been very successful at developer tooling. Which makes sense, since the whole ‘you can modify and change it as you please’ ethic lends itself naturally towards developers who can make those changes. It’s failed pretty hard at end-user software, with browsers being one of the only exceptions I can think of.

                                                                                                                                        1. 2

                                                                                                                                          I guess one thing is that end-users often are unaware things might be changeable. Many still view computing as a black box that only the “gifted” can bend to their will. Everyone has a “handy cousin” who can fix their printer, but not everyone has a handy cousin who can change the program for them to do as they will.

                                                                                                                                          And besides, if someone would ask me to create what is in essence a private fork of a program, I would be very hesitant at doing so: I would have to maintain that until the end of days, and somehow incorporate security updates into it. No, thanks!

                                                                                                                                          And paying a company to do the same would get prohibitively expensive for most people, especially if it’s just a small seemingly trivial change like, I dunno, “could you change it so that the menu bar is at the bottom rather than at the top?”

                                                                                                                                          Of course the more savvy users would end up making implementation tickets in the ticket trackers of the projects they care about, but I doubt the small free software projects would be able to deal with the large inflow of such “trivial” requests without patches, or even the certainty people would be able to verify the new version does what they want. Dev: “can you compile this most recent version and test it?” User: “what does ‘compile’ mean?”

                                                                                                                                        2. 1

                                                                                                                                          I’d say open source has been very successful!

                                                                                                                                          The point of this article is that it’s been successful in a way that hasn’t meaningfully accomplished much in terms of user empowerment. Open source gave us the browser ecosystem which has been a very powerful force for allowing tech companies to deliver new products (which is exactly why open source exists) but it’s succeeded only by redirecting goodwill away from the user-centric free software movement towards the corporate-friendly goals of open source. So we have these incredibly powerful browsers that are borderline impossible for anyone who doesn’t work at Google/Apple/Mozilla to contribute to. We have mobile phones which have impressive computing power and are largely technically open source, but they spy on us and don’t allow us to do much to protect against hostile corporate behavior even on our own devices.

                                                                                                                                          By focusing on technologies which are intentionally anti-scale, we may be able to redirect some of that lost momentum from the last couple decades back to technologies which can put the user in control and respect consent and autonomy.

                                                                                                                                          1. 1

                                                                                                                                            But is this attributable to “software freedom” or something else?

                                                                                                                                            How many people use Chrome because it’s Free Software? I’d say very very few.

                                                                                                                                        1. 19

                                                                                                                                          This was much better than I thought, and worth reading for sql fans.

                                                                                                                                          My main disagreement is that this conflates two things:

                                                                                                                                          1. Sql being seriously suboptimal at the thing it’s designed for; and distinctly
                                                                                                                                          2. Sql being bad at things general purpose programming languages are good at.

                                                                                                                                          There’s value in a restricted language with a clearly defined conceptual model that meets well defined design goals. Despite serious flaws sql is quite good at its core mission of declarative relational querying.

                                                                                                                                          In many ways the porosity story is not bad - for example Postgres lets you embed lots of languages. I think a lot of the criticisms here really mean that more than one language is needed, and integrating them smoothly is the issue.

                                                                                                                                          For me, better explicit declaration of what extensions are required for a query to run would make things more maintainable. I think the criticisms in the article around compositionality are in the right area at least - much more clarity would be better here.

                                                                                                                                          In terms of an upgrade path - if we accept that basically sql is pretty sound but too ad hoc - then this is a very similar problem to that of shell programming. I find the “smooth upgrade path” theory of oil shell plausible (and I’d add that Perl in many ways was a smooth upgrade from shell) although many more people have attempted smooth upgrade paths than have succeeded.

                                                                                                                                          My best guess as to how to do it would be to implement an alternative but similar and principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL - that accommodates the different engines being different and allows their differences to be exposed in a convenient way. If you can get the better query language into at least two of those, you’ll be reaching a large audience who are actually trying to do real work. All easier said than done, of course.

                                                                                                                                          1. 15

                                                                                                                                            Sql being bad at things general purpose programming languages are good at.

                                                                                                                                            I think this (and what follows) is a misinterpretation.

                                                                                                                                            The core idea is not to change things such that SQL is suddenly good a GP tasks, but to adopt the things from GP languages that worked well there, and will also work well in the SQL context; for instance:

                                                                                                                                            • Sane scoping rules.
                                                                                                                                            • Namespaces.
                                                                                                                                            • Imports.
                                                                                                                                            • Some kind of generic programming.

                                                                                                                                            These things alone would enable people to write “cross-database SQL standard libraries” that would make it easier to write portable SQL (which the database vendors are obviously not interested in).

                                                                                                                                            Which would then free up resources from people who want to improve communication with databases in other ways¹ – because having to write different translation code for 20 different databases and their individual accumulation of 20 years of “quirks” is a grueling task.

                                                                                                                                            principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL - that accommodates the different engines being different and allows their differences to be exposed in a convenient way

                                                                                                                                            I think most of the ecosystems weakness comes from any non-trivial SQL code being non-portable. I would neither want “differences exposed in a convenient way”, nor would I call a language that did that “principled”.


                                                                                                                                            ¹ E. g. why does shepherding some data from a database into a language’s runtime require half a dozen copies and conversions?

                                                                                                                                            1. 2

                                                                                                                                              I guess maybe I just disagree on the problem. I don’t think portability is a very important goal, and I would give it up before pretty much anything else.

                                                                                                                                              1. 5

                                                                                                                                                Portability is not the important goal, it’s simply the requisite to get anything done, including things you may consider an important goal.

                                                                                                                                                Because without it, everyone trying to improve things is isolated into their own database-specific silo, and you have seen the effect of this for the last decades: Little to no fundamental improvements in how we use or interact with databases.

                                                                                                                                                1. -2

                                                                                                                                                  No I don’t think so.

                                                                                                                                            2. 7

                                                                                                                                              My best guess as to how to do it would be to implement an alternative but similar and principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL

                                                                                                                                              That’s exactly what I did with Preql, by making it compile to SQL (https://github.com/erezsh/Preql).

                                                                                                                                              Still waiting for the audience :P

                                                                                                                                              1. 3

                                                                                                                                                Yeah, but (a) it’s not available out of the box, and (b) it’s not obvious there’s a smooth upgrade path here, or even that this is the language people want. Which is only somewhat of a criticism - lots of new things are going to have to be tried before one sticks.

                                                                                                                                              2. 2

                                                                                                                                                Sql being seriously suboptimal at the thing it’s designed for; and distinctly

                                                                                                                                                Sql being bad at things general purpose programming languages are good at.

                                                                                                                                                Excellent point. Bucketing those concerns would make this “rant” even better! I do think that stuff falls into both buckets (though what falls into the first bucket is trivially solvable, especially with frameworks like LINQ or Ecto, or GQL overlays). The second category, though, reflects that people do want optimization for some of those things, and it’s worth thinking about how a “replacement for SQL” might approach them.

                                                                                                                                              1. 5

                                                                                                                                                I’m glad to see this on the front page of lobste.rs! It’s one of my favorite programming blog posts of all time, along with its pseudo-companion, 1.

                                                                                                                                                Structured concurrency will be, I think, amazing, but I have had an idea recently that I think would make it even better.

                                                                                                                                                Structured concurrency, as currently defined, turns execution into a tree of threads. But what if we could turn it into a DAG?

                                                                                                                                                It would have these restrictions:

                                                                                                                                                1. To prevent potential memory cycles (and thus, needing a garbage collector or reference counting), forbid all types from being able to, directly or indirectly, have pointers to themselves.
                                                                                                                                                2. Some form of RAII with automatic calling of destructors, like C++ or Rust’s Drop trait.
                                                                                                                                                3. Have structured concurrency as it currently stands.
                                                                                                                                                4. But add the ability for threads to ask another thread to run code for them.

                                                                                                                                                Number 4 is the magic. When a thread creates data that other threads might want to use (call it the “create thread”), it spawns a child thread to run the code it would otherwise have run on the data itself, and waits for requests from other threads. Then, instead of other threads getting a pointer to the data and continuing on their merry way (which would require reference counting), each requesting thread sends a function to the create thread, and the create thread spawns another child thread that runs that function with access to the data. This gives every thread indirect access to the data it wanted.

                                                                                                                                                Every thread that requested the data then pauses, waiting, while the child thread running its request executes.

                                                                                                                                                The reason the create thread spawns a child thread instead of just running code on the data itself is two-fold:

                                                                                                                                                1. It allows it to respond to requests, and
                                                                                                                                                2. It prevents the situation where an already existing child of the create thread requests the data and then the child is dependent on its parent to exit first. If the child of the create thread is dependent on a sibling to exit first, all is well.

                                                                                                                                                In other words, creating the child thread first is what keeps the execution a DAG.

                                                                                                                                                Number 1 (at top) is important to get rid of all need of any kind of GC, including borrow checking!

                                                                                                                                                Some may claim that it’s impossible to eliminate cycles, but you can always create a “parent” type that points to all of the child types that could be in a cycle.

                                                                                                                                                For example, to create a linked list, create a parent type that stores an array with data, and alongside the data, it stores handles that it knows how to interpret as the prev and next pointers. The children don’t know how to interpret those handles, but the parent does, and it can do everything you need it to do in order to emulate a linked list.
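
                                                                                                                                                A minimal sketch of that parent type (hypothetical code; array indices play the role of the handles):

                                                                                                                                                  const std = @import("std");

                                                                                                                                                  // The parent owns every node in one array; prev/next are indices
                                                                                                                                                  // ("handles") into that array, so no node ever holds a pointer to
                                                                                                                                                  // another node and no pointer cycles can form.
                                                                                                                                                  const List = struct {
                                                                                                                                                      const Node = struct {
                                                                                                                                                          data: u32,
                                                                                                                                                          prev: ?usize = null,
                                                                                                                                                          next: ?usize = null,
                                                                                                                                                      };

                                                                                                                                                      nodes: std.ArrayList(Node),
                                                                                                                                                      head: ?usize = null,
                                                                                                                                                      tail: ?usize = null,

                                                                                                                                                      fn init(gpa: *std.mem.Allocator) List {
                                                                                                                                                          return .{ .nodes = std.ArrayList(Node).init(gpa) };
                                                                                                                                                      }

                                                                                                                                                      fn deinit(self: *List) void {
                                                                                                                                                          self.nodes.deinit();
                                                                                                                                                      }

                                                                                                                                                      fn pushBack(self: *List, data: u32) !void {
                                                                                                                                                          const idx = self.nodes.items.len;
                                                                                                                                                          try self.nodes.append(.{ .data = data, .prev = self.tail });
                                                                                                                                                          if (self.tail) |t| self.nodes.items[t].next = idx;
                                                                                                                                                          if (self.head == null) self.head = idx;
                                                                                                                                                          self.tail = idx;
                                                                                                                                                      }
                                                                                                                                                  };

                                                                                                                                                pushBack can still fail (the backing array may grow), but the prev/next links themselves can never dangle or form a pointer cycle.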

                                                                                                                                                A programming language that provides these facilities should also have reference counting, like Rust does despite its borrow checker, because sometimes, more complicated things are needed. However, if a programmer decides to program with these restrictions, they would have these benefits:

                                                                                                                                                1. No memory leaks.
                                                                                                                                                2. No use-after-free.
                                                                                                                                                3. No memory safety issues.
                                                                                                                                                4. No borrow checker to explode compilation times.
                                                                                                                                                5. Memory allocation that could be entirely done with bump allocators, except for data that may need to grow, such as resizable arrays or maps. Even dynamically-allocated stuff that knows what size it needs, but doesn’t need to change its size after allocation, could use the bump allocator.

                                                                                                                                                While there are some details (a compiler would need to figure out when an item escapes a function), I think these ideas could be pretty useful.

                                                                                                                                                As for escape analysis, we could make it easy and say that the compiler should just error if a value escapes. I personally would prefer something more sophisticated, where we ensure that a value does not escape to global variables. If a value escapes by being returned, the compiler should just treat that as though the caller has responsibility for it, and it already knows what to do in those cases (nothing in the callee, and call the destructor in the caller, unless it escapes, etc.).

                                                                                                                                                However, I could be naive; this could be a fantasy. So I guess I am asking all of you: could it work?

                                                                                                                                                1. 4

                                                                                                                                                  In E and Monte, vats form a tree, while promises form a DAG under resolution. This achieves the balance which you’re imagining. The tradeoff is that many low-level details are lost, which is not what everybody wants.

                                                                                                                                                  A vat is an actor which manages a queue of incoming actions, called “messages”, which are delivered to objects; each delivery is a “turn” of a vat. An object is only referred to by exactly one vat at a time. This is similar to the discipline of nurseries, except that sometimes a vat might be “immortal” and continue to exist even when there are no pending messages to deliver; for example, the Monte REPL’s vat is immortal:

                                                                                                                                                  ⛰  currentVat
                                                                                                                                                  Result: <vat(pa, immortal, 0 turns pending)>
                                                                                                                                                  

                                                                                                                                                  A vat can create or “sprout” many children vats, forming a tree. This is analogous to a tree of nurseries and has similar behavior.

                                                                                                                                                  When we enqueue pending actions upon a vat, we receive a promise for the return value. Promises are references to objects, but they are also first-class objects which can be stored in closures. This means that promises can refer to object graphs. And there is a builtin trick which uses promises to construct object cycles. However, the resolution of promises (looking up the referent object of a fulfilled promise) is acyclic, because a promise can only be resolved once per graph traversal, forcing the traversal to only explore a DAG.

                                                                                                                                                  I should point out that datalock is still possible with this model, and type systems generally cannot prevent it either.

                                                                                                                                                  1. 1

                                                                                                                                                    This is great! Now I can go dig deeper.

                                                                                                                                                  2. 3

                                                                                                                                                    If you can tolerate a higher-level language, everything you ask for is basically baked into Erlang, plus things you haven’t thought of yet, like combining failure domains (if task x dies, also take down task y… and that should trigger closing y’s open database connection, if it’s open, with 0 extra lines of code, kthxbai). The nursery concept in the OP is similar to Elixir’s Task.async_stream.

                                                                                                                                                    1. 1

                                                                                                                                                      Yeah, I knew that. What I want is some language that can effectively replace C and Rust. C because bugs, Rust because complexity. That means that the language needs to have a minimal runtime and be able to have threads and processes. As far as I know, Erlang only provides the latter.

                                                                                                                                                      1. 1

                                                                                                                                                        Try Zig? It doesn’t quite catch as many things as Rust, but it’s still quite nice, and way, way safer than C (you get overflow, underflow, buffer overflow, double-free, memory-leak, and many use-after-free errors checked in tests; you’re writing tests, right?). I think the long play is that Zig is very easy to parse (easier than C, tbh), so I think we will see formal verification tools for the cases where you need thread safety. These will also catch race conditions, which one can’t do generally at compile time if you care about a sane dev experience.

                                                                                                                                                        1. 2

                                                                                                                                                          I admit I am tired of the Zig advocacy.

                                                                                                                                                          I know about Zig. I’ve talked with Zig’s designer. Zig does not have anything like this here, and in fact, with its async design, has one of the worst (in my opinion) concurrency stories among programming languages. Re-ifying stack frames is just not the greatest idea.

                                                                                                                                                          1. 1

                                                                                                                                                            I dunno, it worked great for me! (I seamlessly unrolled Erlang’s awkward C-ABI “tail call” to build in a yield statement that you can drop into an imperative Zig loop.) You can thus interchange your Zig code among the different cooperation styles (cooperative, threaded, preemptive) to tune overhead, latency, and isolation, without having to write your code over and over. That level of flexibility speaks to the correct choice of reifying stack frames.

                                                                                                                                                            Anyways, to each their own

                                                                                                                                                            1. 1

                                                                                                                                                              Flexibility is the problem. Please do not push Zig on me.

                                                                                                                                                              1. 2

                                                                                                                                                                Please do not push Zig on me.

                                                                                                                                                                Anyways, to each their own

                                                                                                                                                  1. 8

                                                                                                                                            See also Y.js, another CRDT library in JS with wide usage. I’ve read about both Automerge and Y.js; my personal assessment is that Y.js has more users (e.g. Input, a commercial note-taking app that competes with Notion, my employer) and more support (offers consulting, advertises bindings for the major JS editor frameworks) than Automerge. For a while the Y.js CRDT also had clear performance advantages over Automerge, but then Martin landed a big rework, and I expect they’re now on roughly the same footing. I’d still benchmark them myself before picking one for my next project.

                                                                                                                                            Another aspect to keep in mind for these libraries: can you write your database logic against the CRDT in Rust (or another non-JS language)? Because CRDTs are CPU-expensive at scale, I would expect that using Rust or another threaded language that doesn’t die when CPU-bound work happens on the server would eventually be required for a serious-scale backend for a CRDT service. Both Y.js and Automerge also have Rust port projects in the early stages.

                                                                                                                                                    1. 5

                                                                                                                                              I’m a maintainer of the Rust port of Automerge. One of the big reasons I’m interested in it is less about performance and more about interoperability. Because Rust has no runtime and a C-compatible FFI, it’s possible to build wrappers in other languages that use the Rust codebase for the complicated CRDT parts, e.g. this experimental library for Python: https://github.com/automerge/automerge-py
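
                                                                                                                                              To make that concrete, here is a minimal sketch against the automerge Rust crate (API names from memory, so treat them as approximate rather than authoritative). The point is that the whole surface is “bytes in, bytes out”, which is exactly what makes thin FFI wrappers like automerge-py feasible:

                                                                                                                                                use automerge::{transaction::Transactable, AutoCommit, ReadDoc, ROOT};

                                                                                                                                                fn main() -> Result<(), automerge::AutomergeError> {
                                                                                                                                                    // Two peers start from the same document and edit independently.
                                                                                                                                                    let mut alice = AutoCommit::new();
                                                                                                                                                    alice.put(ROOT, "title", "shopping list")?;

                                                                                                                                                    let mut bob = AutoCommit::load(&alice.save())?;
                                                                                                                                                    bob.put(ROOT, "owner", "bob")?;
                                                                                                                                                    alice.put(ROOT, "done", false)?;

                                                                                                                                                    // Merging is deterministic and needs no coordination.
                                                                                                                                                    alice.merge(&mut bob)?;
                                                                                                                                                    assert!(alice.get(ROOT, "owner")?.is_some());

                                                                                                                                                    // save() yields a plain byte buffer; that's the natural FFI boundary.
                                                                                                                                                    let bytes: Vec<u8> = alice.save();
                                                                                                                                                    println!("serialized document: {} bytes", bytes.len());
                                                                                                                                                    Ok(())
                                                                                                                                                }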

                                                                                                                                                      1. 2

                                                                                                                                                Thank you for your work! I didn’t mean it just in the “Rust is a fast language” sense. You wrote:

                                                                                                                                                        I’m interested in it is less about performance and more about interoperability.

                                                                                                                                                To me, interoperability is also a performance advantage: because Rust/C is embeddable, I can run the CRDT algorithm in any process in my cluster, which gives me more freedom to choose an ideal solution.

                                                                                                                                                An example idea in the scaling solution space that uses Rust would be to add an Automerge column type and operators to Postgres as an extension built on your library, so that my front-end web server doesn’t need to read the entire document from Postgres and can just pass the diff on to the database directly by writing UPDATE doc SET crdt = automerge(crdt, $update) WHERE id = $id.
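
                                                                                                                                                A sketch of what that hypothetical extension function could look like, using the pgrx extension framework (the automerge() function and column scheme are invented for illustration, not an existing extension; a real one would likely apply incremental changes rather than merging two full documents):

                                                                                                                                                  use pgrx::prelude::*;

                                                                                                                                                  pg_module_magic!();

                                                                                                                                                  // Hypothetical: merge an incoming update into a stored Automerge
                                                                                                                                                  // document; both sides are bytea in Postgres, byte slices here.
                                                                                                                                                  #[pg_extern]
                                                                                                                                                  fn automerge(stored: &[u8], update: &[u8]) -> Vec<u8> {
                                                                                                                                                      let mut doc = automerge::AutoCommit::load(stored).expect("corrupt stored doc");
                                                                                                                                                      let mut incoming = automerge::AutoCommit::load(update).expect("corrupt update");
                                                                                                                                                      doc.merge(&mut incoming).expect("merge failed");
                                                                                                                                                      doc.save()
                                                                                                                                                  }

                                                                                                                                                With something like that installed, the UPDATE above works as written and the document never has to round-trip through the web server.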

                                                                                                                                                      2. 2

                                                                                                                                                Thanks for sharing! I’m trying to figure out which storage/conflict-resolution library to use as the basis of an app that works in three modes: 1) browser-only, 2) desktop, 3) SaaS. Y.js and Automerge are now at the top of my list as generic solutions to my problem.

                                                                                                                                                        1. 2

                                                                                                                                                  Keep in mind that CRDTs come with space trade-offs: a CRDT storing text will always use more space than a plain string, and a CRDT’s space usage may grow without bound as it stores history (see the sketch below). For local-only or offline-first use this is usually not a problem, but if you make a popular-enough SaaS, you’ll need to think about compaction and dropping old history, as well as transmission costs when syncing data between nodes.
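
                                                                                                                                                  As a rough illustration (again the automerge Rust crate, API names from memory; exact numbers will vary), even a document whose live state is a single integer keeps growing on disk as edits accumulate:

                                                                                                                                                    use automerge::{transaction::Transactable, AutoCommit, ROOT};

                                                                                                                                                    fn main() -> Result<(), automerge::AutomergeError> {
                                                                                                                                                        let mut doc = AutoCommit::new();
                                                                                                                                                        for i in 0..1_000i64 {
                                                                                                                                                            // Every overwrite is retained in the history, so the
                                                                                                                                                            // serialized size grows even though the current value
                                                                                                                                                            // is always one small integer.
                                                                                                                                                            doc.put(ROOT, "counter", i)?;
                                                                                                                                                        }
                                                                                                                                                        println!("live data: one i64; saved size: {} bytes", doc.save().len());
                                                                                                                                                        Ok(())
                                                                                                                                                    }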

                                                                                                                                                  I think the simplicity of CRDTs makes them a better choice than the Operational Transform model, especially at 0->1 scale, but I haven’t come up with a backend architecture for scaling big CRDT documents that doesn’t involve a million shards and hand-waving at the storage layer.

                                                                                                                                                          1. 1

                                                                                                                                                    Yeah, for sure. I think history outside the current “session” is not terribly important.

                                                                                                                                                        2. 1

                                                                                                                                                  Non-JS implementations are also important for native mobile and desktop apps, as well as embedded systems. Not all client-side code is built for web browsers (or for web browsers packaged into a facsimile of an app).

                                                                                                                                                          1. 1

                                                                                                                                                            … I would expect that using Rust or another threaded language

                                                                                                                                                    You don’t need a server in the first place; that’s part of the point of a CRDT. A side effect is that if you do want one (say, for passively making backups), the server can be rather lazy.

                                                                                                                                                            1. 3

                                                                                                                                                              You don’t need a server in the first place

                                                                                                                                                              I think this is a reductive attitude. CRDT enables offline-first authoring and system design, but many use-cases still work best with an online and timely server component.

                                                                                                                                                              Even for a simple use-case of writing a shopping list on my desktop at home and then viewing that list on my phone at the market, an always-on server node reachable somehow on the public internet is very beneficial to facilitate syncing. Otherwise, I need to make sure my two devices are active on the same network at the same time, and I need to verify they show the same state before I break their connection.

                                                                                                                                                              1. 1

                                                                                                                                                                Did you seriously not read everything after the semicolon?

                                                                                                                                                                1. 2

                                                                                                                                                                  I did, sorry for not including this in my initial response.

                                                                                                                                                          Laziness is a fine solution at the family-and-friends scale, and it certainly relaxes some constraints, but at scale you don’t have “down time” to defer work to, and you always need enough CPU cores and storage bandwidth to burn through the sync/backup queue so it doesn’t grow without bound. All I was trying to say is that having a Rust or C implementation of the CRDT makes the scaling solution space broader and much less expensive.

                                                                                                                                                          1. 1

                                                                                                                                                    Am I mistaken in having heard that ECMP is better for large-scale data centers? The post makes no mention of alternatives, or of the considerations that led to BGP as the chosen standard beyond “it’s popular”.

                                                                                                                                                            1. 5

                                                                                                                                                      BGP distributes routes and sets the next hop. ECMP allows multiple routes to a single destination to coexist, so the router can forward packets across all of the outgoing interfaces, usually load-balancing flows with an L4 hash.

                                                                                                                                                      BGP is just one protocol that distributes routes; Equal-Cost Multi-Pathing does not distribute routes at all.
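
                                                                                                                                                      A toy sketch of that division of labor (hypothetical addresses, Rust for illustration): the route table stands in for what BGP would distribute, and ECMP is just the per-flow hash that picks among the equal-cost next hops, so packets of one flow stay on one path:

                                                                                                                                                        use std::collections::hash_map::DefaultHasher;
                                                                                                                                                        use std::hash::{Hash, Hasher};

                                                                                                                                                        // The L4 5-tuple that identifies a flow.
                                                                                                                                                        #[derive(Hash)]
                                                                                                                                                        struct FlowKey<'a> {
                                                                                                                                                            src: &'a str,
                                                                                                                                                            dst: &'a str,
                                                                                                                                                            proto: u8,
                                                                                                                                                            sport: u16,
                                                                                                                                                            dport: u16,
                                                                                                                                                        }

                                                                                                                                                        // Pick one of the equal-cost next hops by hashing the flow key.
                                                                                                                                                        fn pick_next_hop<'a>(next_hops: &[&'a str], flow: &FlowKey) -> &'a str {
                                                                                                                                                            let mut h = DefaultHasher::new();
                                                                                                                                                            flow.hash(&mut h);
                                                                                                                                                            next_hops[(h.finish() % next_hops.len() as u64) as usize]
                                                                                                                                                        }

                                                                                                                                                        fn main() {
                                                                                                                                                            // One destination, four equal-cost paths "learned from BGP".
                                                                                                                                                            let next_hops = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"];
                                                                                                                                                            let flow = FlowKey { src: "192.0.2.7", dst: "198.51.100.9", proto: 6, sport: 49152, dport: 443 };
                                                                                                                                                            println!("forward via {}", pick_next_hop(&next_hops, &flow));
                                                                                                                                                        }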

                                                                                                                                                              1. 2

                                                                                                                                                        The longer paper linked in the post does mention ECMP, along with the positives and negatives of BGP. Link to the paper: https://research.fb.com/wp-content/uploads/2021/03/Running-BGP-in-Data-Centers-at-Scale_final.pdf

                                                                                                                                                              1. 1

                                                                                                                                                          In Canada? No. “Engineer” is a specially accredited title that carries a lot of responsibility. Very, very few software developers here are actual engineers.

                                                                                                                                                                1. 14

                                                                                                                                                                  The article has a whole section on the question of licensing.

                                                                                                                                                                  1. 20

                                                                                                                                                                    What’s the point of reading the article if you can comment on the title? 🙃

                                                                                                                                                                  2. 3

                                                                                                                                                                    I call myself a developer. If I could write an SLA and stand by the agreement legally, I would call myself an engineer. Or if I operated a ship, train, or starship engine.

                                                                                                                                                                    1. 7

                                                                                                                                                                      The post series makes a pretty compelling point that there are traditional engineering disciplines that don’t have anything like an SLA and are still universally accepted as Real Engineering, e.g., chemical engineering.

                                                                                                                                                                      1. 4

                                                                                                                                                                  As a former working chemist: chemical engineering is not anything like chemistry, and you can bet your ass there is stuff like “use this flow regulator with these pipe fittings or else THE REFINERY WILL BLOW UP, because the chemical reaction will go exothermic”, signed, licensed chem eng.