1. 3

    “Health checks are like bloom filters. A failing health check means a service isn’t up, but a health check passing means the service is probably ‘healthy’.”

    Interesting comparison.

    1. 5

      From $work, failing health checks can also mean that your service is “fine”, but some dependency or assumption of the health checks is not OK (from recent memory, the service can detect when it needs to serve responses in degraded mode, but the health checks see that as irredeemable brokenness).

      So, they’re kind of like a two-way bloom filter. Either “yes” or “no” is suspect.
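      A minimal sketch of that failure mode in Python (the `Health` statuses and `naive_check` are hypothetical names, just to illustrate the point):

```python
from enum import Enum

class Health(Enum):
    OK = "ok"
    DEGRADED = "degraded"  # still serving, with reduced functionality
    DOWN = "down"

def naive_check(status):
    # A binary health check collapses DEGRADED into "broken",
    # so a deliberately degraded-but-serving service reads as down.
    return status == Health.OK

assert naive_check(Health.OK)
assert not naive_check(Health.DEGRADED)  # service is fine, check says no
```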

      1. 2

        Good point! I was also thinking it could be failure of management interface vs data/Internet interface if health checks go over one and usage over the other.

    1. 4

      At $job:

      1. Announce the rollout in the common chat channel. We have about 50-ish things that can be rolled out, and concurrent rollouts are the norm.
      2. Roll out like 5 minutes later tops.
      3. Monitor graphs, the common error tracing, and the chat while in vanguard.
      4. Declare success 15 minutes later and deploy to the rest of the hosts.

      Abort and revert if anyone signals a problem during that process.

      The first action taken when a rollout is suspect is to revert, if it’s been live for less than a day. That’s not a hostile action, and as the rollout system is uniform across the company, anyone knows how to revert or roll out any system.

      It sounds janky, but is remarkably efficient in practice.
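      The process could be sketched roughly like this (all of the function names here – announce, deploy, healthy, revert – are invented stand-ins for whatever the real tooling does):

```python
def rollout(change, vanguard, fleet, announce, deploy, healthy, revert):
    # Hypothetical sketch of the vanguard flow described above.
    announce(f"rolling out {change}")
    deploy(change, vanguard)        # the ~5-minutes-later canary deploy
    if not healthy(vanguard):       # graphs, error tracing, the chat
        revert(change, vanguard)    # revert first, investigate later
        return False
    deploy(change, fleet)           # declare success, go wide
    return True
```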

      1. 4

        Theory. I think the confusion stems from using the same symbol to denote two different functions. If we denote addition by a, multiplication by m and inverse by i then the problem can be summarised as follows:

        1. a and m are functions from K × K to K.
        2. i is a function from K \ { 0 } to K.
        3. When we see x / y we usually think of a function d(x, y) = m(x, i(y)) and this is a function from K × K \ { 0 } to K.
        4. However, we can define a different function d’ from K × K to K such that the restriction of d’ to K × K \ { 0 } is d.

        Using / to denote both d and d’ is what is causing the confusion. Some people see 1 / 0 = 0 and assume it means 1 · 0⁻¹ = 0 (and I find this reading natural), but what is really meant is f(1, 0) = 0, where f is d’ or some other total function entirely.

        Here’s an unsurprising summary: you can define an arbitrary function f on a field and it won’t “break” the field in any way. You can introduce an inconsistency if, in addition to the definition, you add claims about its properties that relate to the field structure, but a definition alone cannot wreak havoc.

        Practice. There are practical implications, though. This whole discussion proves that there’s a lot of confusion over what the / symbol actually means. People assume it’s the function d written in infix notation. I think that making / denote d’ can be surprising to lots of people and cause them to introduce bugs.

        If I see code like

        if a / b == 0:
          ...
        

        then it is natural for me to conclude that: a is zero and b is non-zero. In other words, it’s natural for me to think of function d and not of d’ or any other function. However, if / means d’ then my reasoning does not hold.
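        A small Python sketch of the two readings (d raises, while the hypothetical d_prime is the Pony-style total function):

```python
def d(x, y):
    # Ordinary field division: undefined (here, an exception) when y == 0.
    if y == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return x / y

def d_prime(x, y):
    # Total division, arbitrarily extended with d_prime(x, 0) = 0.
    return 0 if y == 0 else x / y

# Under d, (a / b == 0) implies a == 0 and b != 0.
# Under d_prime, that inference breaks:
assert d_prime(1, 0) == 0  # a is nonzero, yet the "quotient" is 0
assert d_prime(0, 5) == 0  # the case the reader actually expects
```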

        We can define / to mean anything we want, swap + and -, swap the meaning of digits, and do all sorts of syntactic changes without changing the meaning of these symbols. Computers don’t care but programmers do and this whole discussion is the best evidence that this change will cause lots of confusion.

        1. 2

          Mathematicians generally avoid defining this d’ in any way because there is no canonical choice of what the extra value should be. In category theoretic terms, there is no universal property that such a function satisfies that forces a choice unique up to isomorphism.

          This means that everyone is free to pick their own value that makes sense for them, which leads to inconsistencies and, for lack of a better word, portability problems of theorems between theories that picked different values. That’s again why we generally don’t see people picking a value to define a d’, and when they do anything of the sort, they usually have their sights set on things that don’t depend heavily on that value.

        1. 36

          I am a maths researcher at the University of Cologne and addressed this in a thesis I wrote in 2016. See chapter 3, especially the first part of section 3.1.

          Dividing by zero is totally well defined for the projectively extended real numbers (only one unsigned infinity, inf), but the argument against it working for the usual extended real numbers (+-inf) is not based on field theory; it is of an infinitesimal nature, given that you can approach a zero division both from below and from above and get either +inf or -inf equally.

          Defining 1/0=0 not only breaks this infinitesimal form, it’s also radically counterintuitive given how the values behave when you approach the division from small denominators, e.g. 1/10, 1/1, 1/0.1, 1/0.001, …

          lim x->0 1/x = 0 makes no sense and is wrong in terms of limits.

          See the thesis where I proved a/0=inf to be well-defined for a!=0.

          tl;dr: There’s more to this than satisfying the field conditions. If you redefine division, this has consequences on higher levels, in this case most prominently in infinitesimal analysis.

          1. 8

            I used to be a maths researcher, and would just like to point out that some of the people who define division by zero to mean infinity do it because they’re more interested in the geometric properties of the spaces that functions are defined on than the functions themselves. This is the reason for the Riemann sphere in complex analysis, where geometers really like compact spaces more than noncompact ones, so they’re fine with throwing away the field property of the complex numbers. The moment any of them need to compute things, however, they pick local coordinates where division by zero doesn’t happen and use the normal tools of analysis.

            1. 2

              Thanks for laying this out and pointing out the issue with +/- Inf.

              Could you summarize here why +Inf is a good choice? As a practical man, I approach this from the limit standpoint – usually when I end up in a situation like this, it’s because the correct answer is +/- Inf, and it depends on the context which one it should be. Here, context means on which side of zero the history of my denominator was.

              The issue is that the function 1/x has a discontinuity at 0. I was taught that this means 1/0 is “undefined”. IMO in code this means throw an exception.

              In practical terms I end up adding a tiny number to the denominator (e.g. 1e-10) and continuing, but that implicitly means I’m biased toward the positive side of the line.
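              That bias is easy to demonstrate; a tiny sketch (EPS and safe_div are hypothetical names for the hack described):

```python
EPS = 1e-10  # the tiny number added to the denominator

def safe_div(a, b):
    # Avoid the exception by nudging the denominator; this silently
    # biases every near-zero denominator toward the positive side.
    return a / (b + EPS)

assert safe_div(1.0, 0.0) > 0     # 1/0 becomes a huge positive number
assert safe_div(1.0, -1e-12) > 0  # the limit from below would be -inf,
                                  # but the epsilon flips the sign
```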

              I think Pony’s approach is flat out wrong.

              1. 6

                It is not +inf, but inf. For the projectively extended real numbers, we only extend the set with one infinite element which has no sign. Take a look at page 18 of the thesis which includes an illustration of this. Rather than having a number line we have a number circle.

                Dividing by zero, the direction from which we approach the denominator does not matter, even if we oscillate around zero, given that it all ends up in one single point of infinity. We really don’t limit ourselves here with that, as we can express a limit to +inf or -inf in the traditional real number extension by the direction from which we approach inf in the projectively extended real numbers (see remark 3.5 on page 19).

                1/x is discontinuous at 0, this is true, but we can always look at limits. :) I am also a practical man and hope the relatively formal way I described it did not distract from the relatively simple idea behind it.

                Pony’s approach is reasonable within field theory, but it’s not really useful when almost the entire analytical edifice on top of it collapses on your head. NaN was invented for a reason, and given that IEEE floating-point numbers use the traditional +-inf extension, Pony should just return the indeterminate form on division by zero.

                1. 6

                  NaN only exists for floating point, not integers. If you want to use NaN or something like it for integers, you will need to box all integer numbers and take a large performance hit.
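                  For illustration, a quick Python sketch of the asymmetry (the Optional boxing here stands in for whatever tagged representation an integer NaN would need):

```python
import math
from typing import Optional

nan = float("nan")  # IEEE 754 floats carry NaN in-band
assert nan != nan   # NaN even compares unequal to itself
assert math.isnan(float("inf") * 0.0)

# Every one of the 2**64 bit patterns of a 64-bit integer is a valid
# integer, so "no value" needs an out-of-band box, e.g. Optional[int],
# which costs a tag/check (or allocation) on every arithmetic op.
boxed: Optional[int] = None
assert boxed is None
```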

              2. 1

                Just curious, but why isn’t 1/0=1? Would 1/0=Inf not require that infinity exists between 0 and 1?

                1. 2

                  I’m not sure I understand your question. Does 1/2=x require x to be between 1 and 2?

              1. 3

                The problem turns out to be some obscure FUSE mounts that the author had lying around in a broken state, which subsequently broke the kernel namespace system. Meanwhile, I have been running systemd on every computer I’ve owned for many years and have never had a problem with it.

                Does this not seem a bit melodramatic?

                1. 9

                  From the twitter thread:

                  “Systemd does not of course log any sort of failure message when it gives up on setting up the DynamicUser private namespace; it just goes ahead and silently runs the service in the regular filesystem, even though it knows that is guaranteed to fail.”

                  It sounds like the system had an opportunity to point out an anomaly that would guide the operator in the right direction, but instead decided to power through anyway.

                  1. 8

                    It’s a lot like how continuing to run in a degraded state is a plague that affects distributed systems. Everybody thinks “some service is surely better than no service” is a good idea, until it happens to them.

                    1. 3

                      At $work we prefer degraded mode for critical systems. If they go down we make no money, while if they kind of sludge on we make less but still some money while we firefight whatever went wrong this time.

                      1. 8

                        My belief is that, inevitably, you could be making $100 per day: you would notice if you made $0, but instead you are making $10 and won’t notice it for six months. So be careful.

                        1. 4

                          We have monitoring and alerting around how much money is coming in, that we compare with historical data and predictions. It’s actually a very reliable canary for when things go wrong, and for when they are right again, on the scale of seconds to a few days. But you are right that things getting a little suckier slowly over a long time would only show up as real growth not being in line with predictions.
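                          A toy version of such a canary (the threshold and baseline logic are invented for illustration):

```python
def revenue_alert(current, history, tolerance=0.5):
    # Alert when the current window's revenue drops below a fraction
    # of the historical baseline for comparable windows.
    baseline = sum(history) / len(history)
    return current < tolerance * baseline

assert revenue_alert(10, [100, 95, 105])      # the $10-instead-of-$100 case
assert not revenue_alert(90, [100, 95, 105])  # ordinary fluctuation
```

                          As noted above, a threshold like this catches sharp drops quickly but not a slow slide that stays inside the tolerance.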

                      2. 2

                        I tend to agree that hard failures are nicer in general (especially to make sure things work), but I’ve also been in scenarios where buggy logging code has caused an entire service to go down, which… well that sucked.

                        There is a justification for partial service functionality in some cases (especially when uptime is important), but like with many things, I think the judgement calls involved are so often wrong that I prefer hard failures in almost all cases.

                        1. 1

                          Running distributed software on snowflake servers is the plague to point out.

                          1. 1

                            Everybody thinks it’s a good idea “some service is surely better than no service” until it happens to them.

                            So if the server is over capacity, kill it and don’t serve anyone?

                            Router can’t open and forward a port, so cut all traffic?

                            I guess that sounds a little too hyperbolic.

                            But there’s a continuum there. At $work, I’ve got a project that tries to keep going even if something is wrong. Honestly, I’m not sure I like how all the errors are handled. But then again, the software is supposed to operate rather autonomously after initial configuration. Remote configuration is part of the service; if something breaks, it’d be really nice if remote access and logs and all were still reachable. And you certainly don’t want to give up over a problem that may turn out to be temporary or something that could be routed around… reliability is paramount.

                            1. 2

                              “And you certainly don’t want to give up over a problem that may turn out to be temporary”

                              I think that’s close to the core of the problem. Temporary problems recur, worsen, etc. I’m not saying it’s always wrong to retry, but I think one should have some idea of why the root problem will disappear before retrying. Computers are pretty deterministic. Transient errors indicate incomplete understanding. But people think a try-catch in a loop is “defensive”. :(
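                              The “defensive” pattern in question, sketched in Python – a bounded retry at least surfaces the root cause instead of hiding a deterministic bug forever (retry and flaky are made-up names):

```python
def retry(op, attempts=3):
    # try/except in a loop; without a theory of why the failure is
    # transient, this merely re-runs a deterministic bug N times.
    last = None
    for _ in range(attempts):
        try:
            return op()
        except Exception as exc:
            last = exc
    raise last  # bounded: eventually surface the real error

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient?")
    return "ok"

assert retry(flaky) == "ok"  # succeeds on the third attempt
```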

                        2. 4

                          So you never had legacy systems (or configurations) to support? I read Chris’ blog regularly, and he works at a university on a heterogeneous network (some Linux, some other Unix systems) that has been running Unix for a long time. I think he started working there before systemd was even created.

                          1. 3

                            Why do you say that the FUSE mounts were broken? As far as we can see, they were just set up in an uncommon way: https://twitter.com/thatcks/status/1027259924835954689

                            1. 3

                              It does look brittle that broken FUSE mounts prevent ntpd from running. IMO the most annoying part is the debuggability of the issue.

                              1. 2

                                Yes, it seems melodramatic, even to my anti-systemd ears. It’s a documentation and error reporting problem, not a technical problem, IMO. Olivier Lacan gave a great talk last year about good errors and bad errors (https://olivierlacan.com/talks/human-errors/). I think it’s high time we start thinking about how to improve error reporting in software everywhere – and maybe one day human-centric error reporting will be as ubiquitous as unit testing is today.

                                1. 2

                                  In my view (as the original post’s author) there are two problems in view. That systemd doesn’t report useful errors (or even notice errors) when it encounters internal failures is the lesser issue; the greater issue is that it’s guaranteed to fail to restart some services under certain circumstances due to internal implementation decisions. Fixing systemd to log good errors would not cause timesyncd to be restartable, which is the real goal. It would at least make the overall system more debuggable, though, especially if it provided enough detail.

                                  The optimistic take on ‘add a focus on error reporting’ is that considering how to report errors would also lead to a greater consideration of what errors can actually happen, how likely they are, and perhaps what can be done about them by the program itself. Thinking about errors makes you actively confront them, in much the same way that writing documentation about your program or system can confront you with its awkward bits and get you to do something about them.

                              1. 3

                                I think this also works: pick longitude in (0, pi). Pick another angle in (0, 2pi) for latitude and scale by sqrt(u) to avoid bunching at the poles. Pick radius in (0, 1) and scale by cube(u).

                                1. 2

                                  What does the symbol u represent? And why doesn’t longitude run from 0 to 2π and latitude from 0 to π, instead of the other way around (as you have it)? The span of longitude should be bigger because it goes all the way around the sphere, while latitude just goes 1/2 of the way (from top to bottom).

                                  1. 2

                                    u is a randomly generated variable between zero and one. I think tedu mixed up longitude and latitude (he wouldn’t be the first one).

                                  2. 1

                                    Is this different from picking a random point in the spherical coordinate box and converting it to Cartesian ones?

                                    1. 3

                                      You can’t naively pick uniform random spherical coordinates, because the resulting points will clump together near the origin and around the poles.
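                                      For reference, one standard construction for uniform points in the unit ball: sample cos(latitude) uniformly and take the cube root of a uniform variate for the radius (a sketch, not a test of the exact scheme proposed above):

```python
import math
import random

def random_point_in_ball(rng=random.random):
    # Uniform over the unit ball: the cube root of a uniform variate
    # fixes radial clumping near the origin, and sampling cos(phi)
    # uniformly in [-1, 1] avoids bunching at the poles.
    u, v, w = rng(), rng(), rng()
    r = u ** (1.0 / 3.0)        # radius
    theta = 2.0 * math.pi * v   # longitude in [0, 2*pi)
    cos_phi = 2.0 * w - 1.0     # uniform cos(latitude)
    sin_phi = math.sqrt(1.0 - cos_phi * cos_phi)
    return (r * sin_phi * math.cos(theta),
            r * sin_phi * math.sin(theta),
            r * cos_phi)

point = random_point_in_ball()
assert sum(c * c for c in point) <= 1.0  # always inside the unit ball
```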

                                  1. 1

                                    This is super fun, but the result isn’t surprising from a geometric point of view. The configuration space of n points in the plane (that is, all the possible pictures we can make with n points) is an algebraic manifold of dimension n!. Fixing the mean, median, correlation and some other parameters of those points only removes one degree of freedom for every property we fix. As n! is very large, and we only fix a handful of properties, we still have an enormous amount of space to move in and create images that are obviously completely different.

                                    1. 2

                                      “but the result isn’t surprising from a geometric point of view.”

                                      In this case I myself would go for the ‘did I see it coming’ definition of surprise, rather than ‘is it unlikely to occur’. Otherwise, no maths could ever be surprising, could it? I’m personally fond of the phrase ‘surprising yet inevitable’ – I got it from Howard Tayler talking about fiction writing, and apparently it was first said by Aristotle, but it’s neat for this sort of thing, too.

                                      Also, thanks for the n!-based explanation, that was new (and surprising!) to me.

                                      1. 2

                                        Yeah, I should have said “can be explained”. :)

                                        1. 2

                                          For me, it was more of a layperson’s view on it. I see news articles post contradictory statistics all the time. They seem easy enough to massage to prove just about any BS you want. That they can get different datasets to produce the same means or whatever is unsurprising from the perspective of “statistical results are easy to fake or mislead with if the audience isn’t a statistics expert reviewing the methodology and data.”

                                          The visualizations were still cool, though.

                                      1. 6

                                        Would anybody be interested in using this for anything? I probably won’t ever work on it, but it might be a nice learning exercise to get a minimal UNIX-like kernel going and a sliver of a userspace.

                                        I also have some ideas that other people could work on in their free time instead of me. I don’t have the timestamp that points to the quote, but “if all you have are ideas, you’re an ideas guy” from Programming is terrible comes to mind.

                                        1. 3

                                          Fixed the wording there; I wasn’t saying someone else should go out and make it, just that it’s an idea I might go ahead with in my spare time.

                                          Also, this post came out of me working on this idea. I spent about a day working on a similar system, Linux-based, and failed to get it close to my size goal. So I decided to throw this out there to see what people think, if I should spend any time working on it, and if there are any glaring issues (which there seem to be).

                                        1. 2

                                          Looks like a decent alternative to mcabber, but I really hope this trend of “git clone and run a mystery shell script” ends soon. We have trusted build systems and package distribution systems; we should use them.

                                          1. 3

                                            That’s for running the developer branch only. Linux distributions and the *BSDs have it packaged.

                                            1. 1

                                              Yeah. I was going to build it because I didn’t know if it was in the repos of my distribution (Fedora), however I found that it was.

                                            2. 1

                                              Most package distribution systems assume root (except for a few such as Nix variants and pkgsrc AFAIK). I don’t see why I should prefer such a distribution especially when the intent is to try it out first.

                                              1. 1

                                                mcabber? Never heard of it. I could check it out, but since I’m becoming accustomed to poezio, I don’t know.

                                                They look similar, so I’ll probably stay with poezio.

                                                1. 1

                                                  You mean like these package distribution systems?

                                                1. 2

                                                  I think I was on board until pattern 4, when I realized that if you start with deliberately terrible code, rewriting in any alternative style will be an improvement. But is that really the best effort an if/else programmer could make? Or is it a straw man?

                                                  1. 1

                                                    Pattern 4 looks like code golfing or writing for the computer instead of human readers. The “before” code looks shitty, but I don’t need to recall operator precedence rules or model a truth table in my head to understand what it does.

                                                  1. 27

                                                    That you know tools and processes that work for you is great. But it doesn’t sound like you have the best ideas about when to use them.

                                                    You say you got decent work done while your wife was talking with her friends at dinner. What they saw was someone tuning them out to do what he’d really rather be doing.

                                                    If you’re gonna be there, be there.

                                                    1. 12

                                                      There are a few details to my story that I had left out because I didn’t think they were relevant. You just made them relevant.

                                                      1. My wife and her friends know I’m a writer.
                                                      2. They know that I sometimes come up with ideas at socially inconvenient times.
                                                      3. I had told them that talking with them had sparked an idea, and asked them if they would mind if I wrote it down right away.
                                                      4. Only one person objected.
                                                      5. The one person who objected wasn’t my wife, so her opinion didn’t really matter.
                                                      6. I didn’t have headphones plugged in, so I was able to put aside my work and engage when the conversation turned back toward me.
                                                      7. When you’re the only man in a party of twelve, the conversation doesn’t turn to you that often.

                                                      Do you usually give out unsolicited etiquette advice online, or should I be flattered?

                                                      1. 5

                                                        It’s advice, we don’t know the whole story.

                                                        I’m thankful when someone online is forward and seems genuinely interested in giving helpful etiquette advice. Unsolicited is the only solution if I don’t know I’m being rude.

                                                      2. 1

                                                        Hah, I thought the same. It’s funny how most people will not put up with someone opening up a laptop in a restaurant, but we don’t bat an eye when someone starts using a smartphone.

                                                        1. 7

                                                          I’m not going to defend smart phone usage and will call my friends out for using them at the table (in a friendly manner, of course), but the difference of degree between pulling out your phone and pulling out a laptop is so large it becomes a difference of kind.

                                                          1. 2

                                                            I guess it all depends on the intent. If it’s just to quickly write down your thoughts so you don’t forget something, then a notebook is as intrusive at a dinner as a smartphone or a laptop.

                                                      1. 25

                                                        I did a PhD in maths where I had to do a lot of algebraic geometry, so I’m comfortable with category theory and its concepts and applications. I’ve never seen those ideas being used in nontrivial or useful ways in programming, and by now think that either me or a lot of other people are missing some point. I’m not sure which.

                                                        Category theory became popular in mathematics, and especially algebraic geometry, because it provided a “one higher level” from which to look upon the fields and see that a lot of the ideas we were working with were actually a shadow of a single more abstract idea. For example, the direct products of groups, rings, fields, vector spaces and so on were understood as different incarnations of the category-theoretic product. This helped to standardize a lot of arguments, and give names to some concepts differing groups had been grappling with in isolation before. Grothendieck was able to wield these abstract notions so deftly that he could use them to “take compass bearings” and figure out in what directions he should go. That is unbelievably powerful.
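                                                        As a tiny illustration of the product idea in code (pairs with their projections; fst, snd, and pair are ad-hoc names): for any pair of maps f : Z -> A and g : Z -> B, there is exactly one map into A x B that the projections undo.

```python
def fst(p):
    return p[0]  # projection A x B -> A

def snd(p):
    return p[1]  # projection A x B -> B

def pair(f, g):
    # The unique mediating morphism <f, g> : Z -> A x B,
    # characterized by fst(pair(f, g)(z)) == f(z) and
    # snd(pair(f, g)(z)) == g(z).
    return lambda z: (f(z), g(z))

h = pair(lambda n: n + 1, lambda n: n * 2)
assert fst(h(3)) == 4 and snd(h(3)) == 6
```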

                                                        In programming, I can see how one would model the state of a program as a monad. A monad is basically defined around the idea that we can’t always undo whatever we just did, so that makes sense. I’ve also read a fair number of Haskell programmers and their explanations of how category theory fits into programming. None of it seems to even have the promise of the same levels of usefulness as we’ve seen in mathematics, and a lot of it seems to be actively harmful by raising jargon barriers around trivial ideas.

                                                        1. 8

                                                          That is a great story, I would definitely read more of your writing about math if you shared it somewhere.

                                                          1. 4

                                                            I too have encountered category theory during my maths degree (never managed to get the PhD, though), and I also agree that category theory in programming seems very out of place. The most interesting application I’ve seen for it is in homological algebra, but I’m pretty sure no programmer has any interest in abelian categories. The most prototypical functor for me is the Galois functor, which programmers have no need for.

                                                            The result is that when I see computer people talk about category theory, it’s all utterly foreign to me. They tell me I should like it because I like mathematics, but I do not. I’ve made some effort to understand why they like it and have never been very convinced by it, as unconvinced as you seem yourself.

                                                            1. 4

                                                              “I’ve also read a fair number of Haskell programmers and their explanations of how category theory fits into programming.”

                                                              I’d be interested in your take on this discussion, in particular the first comment by Gershom Bazerman, as well as his post here. It seems like he has a good perspective on it, but I don’t have the mathematical knowledge to really confirm that one way or the other. Or maybe you’ve already read this particular stuff and dismissed it; in either case it’d be handy to get a sense of what you think to place it in context, if you’re willing.

                                                              Here is another post which your comment reminded me of, which I wish I had the mathematical ability to fully understand; I’d also love to hear what you think about that as well.

                                                              I’m really not trying to challenge anything you said about the misapplication of CT in Haskell/programming in general (if I haven’t emphasized this enough at this point, I don’t think I’m qualified to do so), I’m just always interested in adding more data to my collection, hoping that at some point I’ll have built up the mathematical maturity to understand the different positions better.

                                                              “None of it seems to even have the promise of the same levels of usefulness as we’ve seen in mathematics, and a lot of it seems to be actively harmful by raising jargon barriers around trivial ideas.”

                                                              As a programmer who barely understands category theory, all I can say is that I’ve personally found the small number of concepts I’ve encountered useful, and, most importantly, more useful than anything else out there (which I’ll generalize as “design patterns and other vague, poorly specified stuff”) for providing a basis for designing modular structures to base programs on.

                                                              I find that the most basic concepts presented in category theory map well to the kind of abstraction present in programming, and I’d love to get a better sense of where you find the jargon barriers to be and how we could eliminate those (and fwiw I think this is a general problem in programming, not limited to Haskell nerds dropping category theory terms into their discussions).

                                                              In particular I’ve found concepts like Monoid, Monad, and Functor to be useful – especially in understanding how they interrelate and can be used together. They’ve enhanced my ability to think conceptually and logically about the kinds of structures I deal with in programming all the time, even where I may not be applying these structures directly in whatever program I’m considering. I may be doing it wrong, but insofar as I’ve developed the correct intuition around these things, they seem useful to me.

                                                              So I can readily accept that we have not been able (and maybe never will be able!) to harness category theory at the level Grothendieck did, but it seems like right now it’s yielding results, and part of the value is simply in the exploration and application of a different rigor to programming as a practice. Maybe in ten or twenty years we’ll look back at the folly of applying category theory to programming, but I rather think it’s more likely that we’ll see it as a step on the path toward discovering something deeper, more rigorous and powerful, and more beautiful than what we can imagine for designing programs right now.

                                                              Or maybe we’ll go back to being obsessed with design patterns and UML. If that’s the case I hope I’ll have quit and gone into being an organic farmer in Vermont or something.

                                                              1. 1

                                                                I’m interested in hearing more about this as well. It’s been a long-standing question for me whether continuing to investigate category theory would help me write better programs. I have no background in higher math, but my understanding/assumption has been that category theory is relevant to programming insofar as it facilitates composition.

                                                                As I see it, the fundamental problem of software design is economically accommodating change. We try to facilitate this by selectively introducing boundaries into our systems in the hopes that the resulting structures can be understood, modified, and rearranged as atomic units. I don’t think it’s controversial to say that our overall success at this is mixed at best.

                                                                The promise of category theory seems to be that, rather than relying on hearsay (design patterns) or our own limited experience, we can inform our choices of where to introduce boundaries from a more fundamental, abstract space where the compositional properties of various structures are rigorously understood.

                                                                But like I said, this is very much an open question for me. I would love to be convinced that, although there is clearly some overlap, the fields of category theory and software design are generally independent and irrelevant to each other.

                                                              1. 1

                                                                I don’t even have to read this to up-vote it.

                                                                1. 9

                                                                  You might still want to read it and maybe reconsider that upvote. The article goes all over the place: it states that blockchain hype is the new NoSQL hype, that the NoSQL haters were right, and that both terms are vague enough to be meaningless, then concludes that blockchain is a very exciting technology and that we should all join the author’s company VMware to work on it.

                                                                  ¯\_(ツ)_/¯

                                                                  1. 1

                                                                    It was just a joke, expressing my views on blockchain technologies: great in theory, but not the answer to every problem. Similar to my previous (and current) views on NoSQL.

                                                                1. 2

                                                                  At $work, I’m going to plug our Graphite setup into the in-house instrumenting pipeline so we can detect new bugs more quickly than by people pinging us asking “Is the thing broken now?”, and then get on a rollout train. In the medium term, I also want to hook us into the in-house deployment setup so we can get faster rollouts (but mostly faster reverts) than we currently do via puppet, but that’s going to take some sysadmining and sysadmin-convincing.

                                                                  1. 12

                                                                    Output should be simple to parse and compose

                                                                    No JSON, please.

                                                                    Yes, every tool should have a custom format that needs a badly cobbled-together parser (in awk or whatever) that will break once the format is changed slightly or the output accidentally contains a space. No, jq doesn’t exist, can’t be fitted into Unix pipelines, and we will be stuck with sed and awk until the end of time, occasionally trying to solve the worst failures with find -print0 and xargs -0.

                                                                    1. 11

                                                                      JSON replaces these problems with different ones. Different tools will use different constructs inside JSON (named lists, unnamed ones, different layouts and nesting strategies).

                                                                      In a JSON shell tool world you will have to spend time parsing and re-arranging JSON data between tools; as well as constructing it manually as inputs. I think that would end up being just as hacky as the horrid stuff we do today (let’s not mention IFS and quoting abuse :D).


                                                                      Sidestory: several months back I had a co-worker who wanted me to make some code that parsed his data stream and did something with it (I think it was plotting related IIRC).

                                                                      Me: “Could I have these numbers in one-record-per-row plaintext format please?”

                                                                      Co: “Can I send them to you in JSON instead?”

                                                                      Me: “Sure. What will be the format inside the JSON?”

                                                                      Co: “…. it’ll just be JSON.”

                                                                      Me: “But in what form? Will there be a list? Names of the elements inside it?”

                                                                      Co: “…”

                                                                      Me: “Can you write me an example JSON message and send it to me, that might be easier.”

                                                                      Co: “Why do you need that, it’ll be in JSON?”

                                                                      Grrr :P


                                                                      Anyway, JSON is a format, but you still need a format inside this format: element names, overall structures. Using JSON does not make every tool use the same format; that’s strictly impossible. One tool’s stage1.input-file is different from another tool’s output-file.[5].filename, especially if those tools are for different tasks.
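                                                                      To illustrate (a made-up example, with hypothetical tool outputs): two tools can both emit perfectly valid JSON for the same data and still be incompatible, so a consumer needs shape-specific glue either way.

```python
import json

# Hypothetical outputs of two tools reporting the "same" file list:
tool_a = json.loads('{"files": [{"name": "a.txt"}, {"name": "b.txt"}]}')  # named list
tool_b = json.loads('["a.txt", "b.txt"]')                                 # bare array

# Both are valid JSON, yet extracting the names requires knowing each shape:
names_a = [f["name"] for f in tool_a["files"]]
names_b = tool_b
assert names_a == names_b  # same data, two incompatible layouts
```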

                                                                      1. 3

                                                                        I think that would end up being just as hacky as the horrid stuff we do today (let’s not mention IFS and quoting abuse :D).

                                                                        Except that standardized, popular formats like JSON bring with them ecosystems of tooling that solve most of the problems they introduce: autogenerators, transformers, and so on come for free with a common data format. We usually don’t get this when random people create formats for their own use; we have to fully customize the part handling the format rather than adapt an existing tool.

                                                                        1. 2

                                                                          Still, even XML, which had the best tooling I have used so far for a general-purpose format (XSLT and XSD first and foremost), was unable to handle partial results.

                                                                          The issue is probably due to their history, as a representation of a complete document / data structure.

                                                                          Even s-expressions (the simplest format of the family) have the same issue.

                                                                          We should also note that pipelines can be created on the fly, even over binary data manipulations. So a single dictated format would probably impose too many restrictions, if you want the system to actually enforce and validate it.

                                                                          1. 2

                                                                            “Still, even XML”

                                                                            XML and its ecosystem were extremely complex. I used s-expressions with partial results in the past; you just have to structure the data so that it is easy to consume a piece at a time. I can’t recall the details right now. Another format I used, trying to balance efficiency, flexibility, and complexity, was XDR. Too bad it didn’t get more attention.

                                                                            “So a single dictated format would probably pose too restrictions, if you want the system to actually enforce and validate it.”

                                                                            The L4 family usually handles that by standardizing on an interface, description language with all of it auto-generated. Works well enough for them. Camkes is an example.

                                                                            1. 3

                                                                              XML and its ecosystem were extremely complex.

                                                                              It is coherent, powerful and flexible.

                                                                              One might argue that it’s too flexible or too powerful, so that you can solve any of the problems it solves with simpler custom languages. And I would agree to a large extent.

                                                                              But, for example, XHTML was a perfect use case. Indeed, to do what I did back then with XSLT, people now use JavaScript, which is less coherent, way more powerful, and in no way simpler.

                                                                              The L4 family usually handles that by standardizing on an interface, description language with all of it auto-generated.

                                                                              Yes but they generate OS modules that are composed at build time.

                                                                              Pipelines are integrated on the fly.

                                                                              I really like strongly typed and standard formats but the tradeoff here is about composability.

                                                                              UNIX turned every communication into byte streams.

                                                                              Bytes byte at times, but they are standard, after all! Their interpretation is not, but that’s what provides the flexibility.

                                                                              1. 4

                                                                                Indeed to do what I did back then with XSLT now people use Javascript, which is less coherent and way more powerful, and in no way simpler.

                                                                                While I am definitely not a proponent of JavaScript, computations in XSLT are incredibly verbose and convoluted, mainly because XSLT for some reason needs to be XML and XML is just a poor syntax for actual programming.

                                                                                That, and the fact that my transformations worked fine with xsltproc but did nothing at all in browsers, without any decent way to debug the problem, made me put XSLT away as an esolang: lots of fun for an afternoon, but not what I would use to actually get things done.

                                                                                That said, I’d take XML output from Unix tools and some kind of jq-like processor any day over manually parsing text out of byte streams.

                                                                                1. 2

                                                                                  I loved it when I did HTML wanting something more flexible that machines could handle. XHTML was my use case as well. Once I was a better programmer, I realized it was probably an overkill standard that could’ve been something simpler with a series of tools each doing their little job. Maybe even different formats for different kinds of things. W3C ended up creating a bunch of those anyway.

                                                                                  “Pipelines are integrated on the fly.”

                                                                                  Maybe put it in the OS, like a JIT. As far as byte streams go, that’s mostly what XDR did: minimally structured byte streams. Just tie the data types, layouts, and so on to whatever language the OS or platform uses the most.

                                                                          2. 3

                                                                            JSON replaces these problems with different ones. Different tools will use different constructs inside JSON (named lists, unnamed ones, different layouts and nesting strategies).

                                                                            This is true, but it does not mean that having some kind of common interchange format would not improve things. Yes, JSON does not tell you what the data will contain (but “custom text format, possibly tab-separated” is, again, no better). I know the problem well, since I often work with JSON that contains or misses things. But the answer is not to avoid JSON; it is to have specifications. JSON has a number of possible schema formats, which puts it at a big advantage over most custom formats.

                                                                            The other alternative is of course something like ProtoBuf, because it forces the use of proto files, which is at least some kind of specification. That throws away human readability, though, which I didn’t want to suggest to a Unix crowd.

                                                                            Thinking about it, an established binary interchange format with schemas and a transport is in some ways reminiscent of COM & CORBA in the nineties.

                                                                          3. 7

                                                                            will break once the format is changed slighly

                                                                            Doesn’t this happen with JSON too?
                                                                            A slight change in the key names, or turning a string into a list of strings, and the recipient won’t be able to handle the input anyway.

                                                                            the output accidentally contains a space.

                                                                            Or the output accidentally contains a comma: depending on the parser, the behaviour will change.

                                                                            No, jq doesn’t exis…

                                                                            jq is great, but I would not say JSON should be the default output format for composable programs.

                                                                            For example, a JSON document is always one complete value, which won’t work for streams that are produced slowly.
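                                                                            For what it’s worth, a common workaround is line-delimited JSON (one complete value per line), which restores incremental consumption; a minimal sketch:

```python
import io
import json

def iter_ndjson(stream):
    """Yield one parsed record per line. Each line is a complete JSON
    value, so records are usable before the producer has finished."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

# Simulate a slowly produced stream of line-delimited records:
stream = io.StringIO('{"n": 1}\n{"n": 2}\n')
print([rec["n"] for rec in iter_ndjson(stream)])  # [1, 2]
```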

                                                                            1. 5

                                                                              will break once the format is changed slighly

                                                                              Doesn’t this happens with json too?

                                                                              Using a whitespace-separated table, as suggested in the article, is somewhat vulnerable to continuing to appear to work after the format has changed while actually misinterpreting the data (e.g. if a new column is inserted at the beginning, your pipeline could happily continue, since all it needs is at least two columns with numbers in them). JSON is more likely to either continue working correctly and ignore the new column, or fail with an error. Arguably it is the key-value aspect that helps here, not specifically JSON. As you point out, there are other issues with using JSON in a pipeline.
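                                                                              A toy illustration of that failure mode (the data here is made up):

```python
import json

# Original two-column output: <lines> <filename>
old = "238 dmesg.log"
# Format change: a new column is inserted in front: <bytes> <lines> <filename>
new = "12345 238 dmesg.log"

# Positional parsing keeps "working" but silently reads the wrong field:
assert old.split()[0] == "238"
assert new.split()[0] == "12345"  # no error, wrong data

# Key-based lookup is unaffected by the extra field:
rec = json.loads('{"bytes": 12345, "lines": 238, "filename": "dmesg.log"}')
assert rec["lines"] == 238
```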

                                                                            2. 3

                                                                              On the other hand, most Unix tools use tabular format or key value format. I do agree though that the lack of guidelines makes it annoying to compose.

                                                                              1. 2

                                                                                Hands up everybody that has to write parsers for zpool status and its load-bearing whitespaces to do ZFS health monitoring.

                                                                                1. 2

                                                                                  In my day-to-day work, there are times when I wish some tools would produce JSON and other times when I wish a JSON output was just textual (as recommended in the article). Ideally, tools should be able to produce different kinds of outputs, and I find libxo (mentioned by @apy) very interesting.

                                                                                  1. 2

                                                                                    I spent very little time thinking about this after reading your comment and wonder how, for example, the core utils would look like if they accepted/returned JSON as well as plain text.

                                                                                    A priori we have this awful problem of making everyone understand everyone else’s input and output schemas, but that might not be necessary. For any tool that expects a file as input, we make it accept any JSON object that contains the key-value pair "file": "something". For tools that expect multiple files, have them take an array of such objects. Tools that return files, like ls for example, can then return whatever they want in their JSON objects, as long as those objects contain "file": "something". Then we should get to keep chaining pipes of stuff together without having to write ungodly amounts of jq between them.
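                                                                                    A rough sketch of that convention (everything here is hypothetical, including the "file" key and the function name):

```python
import json

def accept_file_arg(arg):
    """Return the list of file names denoted by `arg`, which may be a bare
    path, a JSON object carrying a "file" key (extra keys are ignored),
    or a JSON array of such objects."""
    if isinstance(arg, str):
        try:
            arg = json.loads(arg)
        except ValueError:
            return [arg]  # plain path, not JSON
    if isinstance(arg, str):
        return [arg]      # a JSON string is still just a path
    if isinstance(arg, dict):
        return [arg["file"]]
    return [obj["file"] for obj in arg]

print(accept_file_arg("notes.txt"))                               # ['notes.txt']
print(accept_file_arg('{"file": "notes.txt", "size": 120}'))      # ['notes.txt']
print(accept_file_arg('[{"file": "a.txt"}, {"file": "b.txt"}]'))  # ['a.txt', 'b.txt']
```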

                                                                                    I have no idea how much people have tried doing this or anything similar. Is there prior art?

                                                                                    1. 9

                                                                                      In FreeBSD we have libxo which a lot of the CLI programs are getting support for. This lets the program print its output and it can be translated to JSON, HTML, or other output forms automatically. So that would allow people to experiment with various formats (although it doesn’t handle reading in the output).

                                                                                      But as @Shamar points out, one problem with JSON is that you need to parse the whole thing before you can do much with it. One can hack around that, but then you are kind of abusing JSON.

                                                                                      1. 2

                                                                                        That looks like a fantastic tool, thanks for writing about it. Is there a concerted effort in FreeBSD (or other communities) to use libxo more?

                                                                                        1. 1

                                                                                          FreeBSD definitely has a concerted effort to use it, I’m not sure about elsewhere. For a simple example, you can check out wc:

                                                                                          apy@bsdell ~> wc -l --libxo=dtrt dmesg.log
                                                                                               238 dmesg.log
                                                                                          apy@bsdell ~> wc -l --libxo=json dmesg.log
                                                                                          {"wc": {"file": [{"lines":238,"filename":"dmesg.log"}]}
                                                                                          }
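                                                                                          That JSON form can then be consumed with ordinary JSON tooling; for instance, a sketch that parses the output shown above:

```python
import json

# The --libxo=json output shown above, captured as a string:
out = '{"wc": {"file": [{"lines":238,"filename":"dmesg.log"}]}\n}\n'

doc = json.loads(out)
for entry in doc["wc"]["file"]:
    print(entry["filename"], entry["lines"])  # dmesg.log 238
```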
                                                                                          
                                                                                    2. 1

                                                                                      powershell uses objects for its pipelines, i think it even runs on linux nowadays.

                                                                                      i like json, but for shell pipelining it’s not ideal:

                                                                                      • the unstructured nature of the classic output is a core feature. you can easily mangle it in ways the program’s author never assumed, and that makes it powerful.

                                                                                      • with line based records you can parse incomplete (as in the process is not finished) data more easily. you just have to split after a newline. with json, technically you can’t begin using the data until a (sub)object is completely parsed. using half-parsed objects seems not so wise.

                                                                                      • if you output json, you probably have to keep the structure of the object tree you generated in memory, like “currently i’m in a list in an object in a list”. that’s not ideal sometimes (one doesn’t have to use real serialization all the time, but it’s nicer than just printing the correct tokens at the right places).

                                                                                      • json is “javascript object notation”. not everything is ideally represented as an object. that’s why relational databases are still in use.

                                                                                      edit: be nicer ;)

                                                                                    1. 8

                                                                                      I’ve noticed that Joyent has remote positions and their ads say: “Qualified applicants with criminal histories will be considered for the position in a manner consistent with the Fair Chance Ordinance.” Maybe this person could see if they have use for his talents? (Ping @bcantrill, which I hope wasn’t in unbelievably bad taste?)

                                                                                      1. 2

                                                                                        Perhaps you should contact the job seeker directly.