Threads for c-cube

    1. 2

      That’s a great video! Crystal clear from beginning to end 😍

      1. 10

        Thanks for the writeup, I’ll bookmark it for sure.

        I recently started looking into WireGuard to access a few of my devices behind NATs and also ended up researching Tailscale in tandem. I will gladly accept that Tailscale is amazing and easy for lots of folks, but it turns out I’m a control freak when it comes to system administration and I don’t fully understand everything that Tailscale does to the host. (Particularly MagicDNS.) To put it another way, it has lots of small parts. I was also a bit turned off by the fact that it uses a userspace implementation of WireGuard even though I have a WireGuard-capable kernel on all of my boxes. The final straw was that the documentation for Headscale (the open source control plane I planned to use) seemed rather lacking, and if you have any questions, you basically have no option but to log into their Discord.

        So my simple poor-man’s alternative is to just deploy WireGuard to the nodes as “spokes” along with a “hub” on a VPS and route all traffic through that. Yeah, it’s going to be inefficient compared to node-to-node traffic, but I’m not asking much out of this. I’m using IPv6 ULAs for the WireGuard IPs and each of these gets an AAAA record on my DNS server. At some point, I may look into whether I can deploy my own DERP server to somehow make it more of a mesh and less of a wheel.

        1. 4

          Tailscaled runs its own nameserver and edits resolv.conf to use it on my linux laptop. They had a blog post where they talked about tackling all the DNS configurations they handle.

          1. 5

            Also, if you run systemd-resolved, it simply registers its “magic” nameserver with resolved API, and then everything “just works” because resolved has first-class support for multi-homed resolution.

            IIRC, that blog post also concluded with “just use resolved”.

          2. 2

            FWIW I switched from plain WireGuard to a Headscale setup about a year ago, and it’s been working perfectly fine.

            You’re right that the documentation can be a bit obtuse, and once or twice I’ve had breaking changes when upgrading the Headscale server, but that’s to be expected, and I now skim the release notes before upgrading (which I should have done all along, of course).

              1. 1

                Perhaps! I have it bookmarked but have yet to dig into it.

              2. 2

                My poor-man’s alternative is to run autossh tunnels that connect to my home server via DDNS. Then I can open a reverse tunnel from home to the remote device.

                Pros:

                • it works
                • my home server already runs an SSH server anyway, so no additional service needed
                • no magic involved
                • no external company involved

                Cons:

                • requires working DDNS (if my home provider didn’t offer a public IP, I’d be screwed)
                • somewhat hacky?
                • tunneling over a TCP tunnel can be a bad experience. One day I should replace autossh with WireGuard.
              3. 19

                I’ve got some history in this domain, having created the Fleece format for Couchbase Lite. My priorities with Fleece were (1) fast parsing, since it’s used to store database records that are read much more often than being written, and (2) compactness, especially when storing many documents with similar schemas.

                I don’t think I had seen MsgPack or CBOR at the time I designed Fleece in 2015. I’d looked at various other formats like BSON and realized that most of the expense of deserialization wasn’t from parsing the data but from allocating the DOM objects. So I made Fleece to not require any memory allocation: the DOM objects are simply pointers into the encoded data. (Cap’n Proto and FlexBuffers share this property.) Additionally, there are no O(n) algorithms for accessing data: strings have a length prefix, array items are fixed-size, and object keys are stored as sorted arrays that can be binary-searched.
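
                A rough sketch of that idea (a toy layout, not the actual Fleece encoding): a “value” is just a buffer plus an offset, so accessing a field allocates nothing per element.

                # Toy zero-allocation access in Python (illustration only).
                # Assumed layout: string = [0x01][len:1 byte][bytes], int = [0x02][4 bytes LE].
                import struct

                def encode_string(s: bytes) -> bytes:
                    return b"\x01" + bytes([len(s)]) + s

                def encode_int(n: int) -> bytes:
                    return b"\x02" + struct.pack("<i", n)

                class Value:
                    """A lazy view into the encoded buffer; no DOM node is built."""
                    def __init__(self, buf: memoryview, off: int):
                        self.buf, self.off = buf, off

                    def as_str(self) -> memoryview:
                        length = self.buf[self.off + 1]
                        return self.buf[self.off + 2 : self.off + 2 + length]  # zero-copy slice

                    def as_int(self) -> int:
                        return struct.unpack_from("<i", self.buf, self.off + 1)[0]

                buf = memoryview(encode_string(b"hello") + encode_int(42))
                assert bytes(Value(buf, 0).as_str()) == b"hello"
                assert Value(buf, 7).as_int() == 42  # offset 7 = 1 tag + 1 length + 5 bytes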

                (The entire Fleece library has gotten pretty big, but most of that is for auxiliary features, as well as some heavy optimization. A basic codec is pretty easy to implement.)

                1. 3

                  That’s cool! Fleece was a big inspiration for twine. In fact I’d describe it as halfway between CBOR and Fleece :-)

                  1. 1

                    it’s hard to beat a zero-allocation format + gzip for size

                    delta encoding looks useful

                  2. 1

                    Would Gzip + json perform similarly at scale?

                    1. 2

                      It depends a lot on how much you rely on the sharing. If you start from JSON, turn it into Twine, and also use gzip on it, both might be comparable. But you can encode things into Twine in, say, 50KiB, that will correspond to an extremely large JSON value, because encoding a DAG into a tree can result in exponential blowup. For my use case, producing the JSON (or CBOR) and then compressing it would be very wasteful, and a lot slower.
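
                      For illustration, here is the kind of blowup meant above (plain Python, nothing Twine-specific): a value that reuses the previous level twice doubles in size at every level once it is flattened into a tree.

                      import json

                      node = "x"
                      for _ in range(20):
                          node = [node, node]        # the same object, referenced twice

                      print(len(json.dumps(node)))   # ~6 MiB of text for only 20 levels
                      # A format with sharing needs only ~20 small records,
                      # each pointing twice at the one before it.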

                    2. 3

                      Ooh, binary YAML!

                      1. 2

                        - NO

                        Well I see where you’re coming from 😁. There are significant differences though: it’s not human readable (although you can kind of read the hexdumps with a bit of habit), definitely not human writable, and much better typed. No confusion between strings and booleans and numbers for a start.

                      2. 1

                        Is it correct to think of your deduplication of values as dictionary compression?

                        Did you consider also creating a dictionary for the key fields, e.g. "a"? If the same message is sent multiple times then you’ll end up sending all the field names every time. If you had a schema, then you could assign “pointers” into the schema for each field and thus save bytes. Or is that not relevant for your use case?

                        1. 1

                          Using pointers (implicit sharing), yes, it can be compared to compression where words are reused. The first example shows that: a string is encoded only once.

                          There is no separate dictionary that could be shared between messages (although you could probably follow dCBOR42 and use a tag to represent a key into the shared dictionary). It’s not really relevant to my use case because the messages are fairly big and, using a schema (OCaml with a deriving library), we can encode directly into arrays instead of dictionaries. If you want compactness (and have a schema) you can also imitate protobuf and use (small) integers as dictionary keys.
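
                          To illustrate the arrays-instead-of-dictionaries point (plain Python, made-up field names): once both sides agree on a schema, the key names never need to go on the wire.

                          import json

                          record = {"name": "ada", "age": 36, "admin": True}
                          schema = ["name", "age", "admin"]            # agreed on out of band

                          as_dict  = json.dumps(record)                # keys repeated in every record
                          as_array = json.dumps([record[k] for k in schema])

                          print(len(as_dict), len(as_array))           # 41 vs 17 bytes here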

                        2. 5

                          Let it goooooooooo, let it goooooooooo, it’s not needed anymoooooooore, let it goooooooooo, let it gooooooo, ratatui now holds the fooooooort

                          Congrats, I suppose. Lifting the curse must feel good :-)

                          1. 2

                            If a bad version of a library is released, someone needs to explicitly opt into this new version.

                            How so? We have dependabot now. It can make PRs to always make the minimum required version the latest (some authors might only want to support the latest release of dependencies). The PR can get merged and released automatically if the CI suite passes.

                            Even if there is no automatic merging, I doubt that people thoroughly review the changes if they can see their CI suite already passes with the new version. Does that count as an explicit opt-in? It doesn’t matter, the attack still worked.

                            1. 2

                              At least CI would be green with this specific version before a person presses “merge”. There’s no silver bullet but it seems nicer than your CD picking up a never seen before version…

                            2. 4

                              It’s always “compared to what” and “for what purpose”.

                              And to all those people who complain about JSON verbosity compared to MessagePack. Go on msgpack.org, click on “Try!” and type in 1.2 or [1.1,2.2,3.4,4.4,5.5,6.6].

                              I am a huge fan of Protobuf and some of its competitors, but comparing them with JSON is like saying wget or curl are better than Firefox or Chrome or vice versa. They are both HTTP[1] clients, yet their goals, intentions and applicable use cases vary a lot.

                              The same is true for compression formats and algorithms, even things like encryption and so on. Different use cases require different properties. The idea of there just being one Turing-complete programming language is just as silly.

                              Why would this be different for data serialization?

                              Look at your use case and choose the appropriate tool.

                              Take a deep dive into XML, including CDATA, handling truthiness in a million different ways, binary data, XML being UTF-8 or UTF-16, or neither, parameters vs children vs tags, the fact that fast XML parsers still have security issues on a regular basis, verbose tag closing or tags that close themselves, and you’ll find JSON might not be a much better fit - or not.

                              This of course isn’t meant to be pro-JSON or a defense of JSON; rather, the idea of being angry at a tool - especially given its history and context - is a bit silly.

                              [1] and other protocols

                              1. 3

                                Like u/Relax says, this might be more compact if your use case is storing floats whose decimal notation is short. Try 152351.152152151211292 and suddenly, tada, msgpack is shorter. In the real world, floats will not nicely fit in a short decimal notation; integers will be smaller in msgpack/cbor; and blobs will be roughly 3/4 of what they’d be in JSON+base64.
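
                                A quick back-of-the-envelope check (plain Python; msgpack stores a float64 as a 0xcb tag byte plus 8 payload bytes, i.e. 9 bytes either way):

                                import json, struct

                                def msgpack_float_size(x: float) -> int:
                                    return 1 + len(struct.pack(">d", x))  # tag byte + big-endian double

                                for x in (1.2, 152351.152152151211292):
                                    print(len(json.dumps(x)), "bytes as JSON vs",
                                          msgpack_float_size(x), "bytes as msgpack")
                                # 1.2        ->  3 bytes JSON, 9 bytes msgpack
                                # 152351.15… -> ~18 bytes JSON, 9 bytes msgpack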

                                1. 1

                                  JSON+base64

                                  If you need it. Base64 is an odd choice in many situations. The fact that Protocol Buffers uses it in certain situations is strange indeed, basically double-serializing stuff. Base64 is for when you have to use text, but this rarely is the case, and I’ve seen more than one company and many projects just skipping or re-inventing parts of gRPC to get around that weirdness when wanting to basically have gRPC in the web browser, without the silliness (er, interesting design choices) that their proxies bring just to get around browsers not having trailer support.

                                  Don’t get me wrong though, please. I don’t hate on msgpack. The whole point of my post is to consider what you use, but don’t assume one size fits all. Somehow we live in a time where serialization formats have haters and fans who think only their favorite one should ever be used.

                                  1. 3

                                    If you need it. Base64 is an odd choice in many situations.

                                    Well, with JSON, you need it as soon as you want to embed any form of binary data, since strings have to be UTF-8. Want to embed a thumbnail, a hash, another file format? There you go, pay the ~4/3 tax from base64. This is just not an issue with (binary) protobuf, msgpack, CBOR, etc.
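
                                    A quick sanity check of that tax (plain Python): every 3 raw bytes become 4 base64 characters, before counting the JSON quotes and escaping.

                                    import base64, json, os

                                    blob = os.urandom(30_000)  # e.g. a thumbnail or a hash
                                    doc = json.dumps({"thumb": base64.b64encode(blob).decode("ascii")})
                                    print(len(blob), "raw bytes ->", len(doc), "bytes of JSON")  # ~40 KB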

                                    I’m not even getting into the performance overhead of having to base64-encode, or even escape regular strings, of course.

                                    The whole point of my post is consider what you use, but don’t assume one size fits all.

                                    And the OP is about how JSON is not a one size fits all either, and about how it’s way overused given the tradeoffs! There are situations where JSON is fine (esp. with JSONL imho). There are also lots of situations where it’s used and shouldn’t be (for example, HTTP+JSON APIs could use CBOR instead, gain on performance, correctness, interoperability, etc. and lose basically nothing).

                                2. 3

                                  And to all those people who complain about JSON verbosity compared to MessagePack. Go on msgpack.org, click on “Try!” and type in 1.2 or [1.1,2.2,3.4,4.4,5.5,6.6].

                                  What’s your point? That msgpack stores doubles rather than strings? That seems…way, way better. Given that msgpack is otherwise more compact, I have a hard time believing there are meaningful pathological cases where this matters. The upside of having a tighter spec just seems like such a massive win.

                                  1. 1

                                    I have a hard time believing there are meaningful pathological cases where this matters

                                    Huh? Whenever you have anything like speeds, distances, duration, and so on. Stuff that float is basically made for. And usually changing the unit from something common (duration in seconds, speed in m/s or km/h, distances in meters, etc.) because of how it is being serialized is rarely reasonable.

                                    I don’t think it matters at all, because I could just use JSON + gzip, have something easily readable and copyable (for debugging), and move on.

                                    1. 2

                                      In those cases msgpack is more efficient because it’s storing the 8-byte double rather than a long decimal string. The pathological cases I’m talking about are where msgpack is surprisingly less efficient than json.

                                  2. 8

                                    This means that your system probably can’t safely handle JSON documents with unknown fields.

                                    Like Protobuf handles unknown fields any better?

                                    If you’re sending me unknown fields, as in, they’re not in my published schema, I’m either ignoring them if I’m honouring Postel, or you’re getting a 400 Bad Request.

                                    I honestly can’t think of a reason why I would accept unknown fields you’re POSTing to my API.

                                    And if JSON is so bad, you need to ask yourself why it’s so ubiquitous.

                                    Also, streaming parsers for JSON exist. I can’t speak to their implementation, but I’ve seen them.

                                    1. 7

                                      And if JSON is so bad, you need to ask yourself why it’s so ubiquitous.

                                      That’s really never been a good argument. It’s popular, like many other things, because it’s the easiest format for the first few hours/days (especially when javascript is involved). By the time the (numerous) downsides become more apparent, it’s too late to change your service/data/protocol.

                                      1. 6

                                        Like Protobuf handles unknown fields any better?

                                        What’s the problem with protobuf unknown fields? I checked the official Dart, Java, Python, C++ implementations, they all handle unknown fields.

                                        I honestly can’t think of a reason why I would accept unknown fields you’re POSTing to my API.

                                        You probably shouldn’t. The protobuf guide says:

                                        In general, public APIs should drop unknown fields on server-side to prevent security attack via unknown fields. For example, garbage unknown fields may cause a server to fail when it starts to use them as new fields in the future.

                                          1. 1

                                            Yeah, that’s why I mentioned ignoring unknown fields. But generally in a distributed system I’d use a schema registry like Apicurio so that when a publisher uses a new version of a schema, consumers can pull it down as needed.

                                            1. 1

                                              Schema registries are nice— I’ve used buf— but it doesn’t solve the fundamental problem that old consumers will run at the same time as updated consumers.

                                              Ingress services can do what they want with unknown fields, but middleboxes need to pass them unmolested.

                                        1. 21

                                          I’m here for all the hate on json. It’s very expensive to pass around and serialize and deserialize. It’s the modern CSV, and like CSV it appeals to people because it seems so neutral and safe.

                                          1. 5

                                            Call me when there’s a viable alternative. I like the idea of protobuf, but I haven’t seen an implementation that isn’t a huge pain to work with, and precisely zero of my applications are bottlenecking anywhere near JSON de/serialization. 🤷‍♂️

                                            1. 4

                                              If you use JSON today, swapping with CBOR shouldn’t even be that hard, on paper. It’s the same kind of schemaless format. However, unlike JSON, you get actual binary blobs, a good spec, and integer/floats with specified precision. It’s also faster and more compact.
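
                                              A minimal sketch of that swap, assuming the third-party cbor2 package (the field names are just for illustration):

                                              import cbor2  # pip install cbor2

                                              doc = {"id": 42, "payload": b"\x00\x01\x02", "ratio": 0.25}

                                              encoded = cbor2.dumps(doc)  # real bytes, typed numbers, compact
                                              assert cbor2.loads(encoded) == doc
                                              # In JSON the payload would first have to be base64/hex encoded,
                                              # and it comes back as a string, not bytes.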

                                              1. 1

                                                There are a lot of alternatives. But then you have to persuade people to install a library.

                                                Avro is in many ways the most json-like (it even has a json based format). It’s just not that hard to do better.

                                            2. 7

                                              I went looking for error handling in the standard library based on os.read_entire_file returning a boolean, and it’s via another function: read_entire_file_from_filename_or_err.
                                              This seems bad given that the boolean is just obtained with err != nil, so the API is pushing towards hiding errors, which is bad UX.
                                              Programs like this represent most of my strace use…

                                              Also I noticed os.read_at takes an offset: i64 and returns an error if it’s less than zero. I hope this is an isolated case and it doesn’t have the same issue as Go of using signed integers for everything.

                                              Basically a quick look at Odin’s os package docs has put me off the stdlib already :(
                                              The language seems to have some interesting ideas though, like explicit overloading (which could greatly benefit os).

                                              1. 4

                                                My understanding is that the Odin dev and contributors are working on a new os package because the current one is crufty (one of the first packages to be developed). So maybe do not use it as a proxy for the overall quality of the language :)

                                                1. 3

                                                  I tried to make this explicit in my first comment, but I was using it as a proxy for the stdlib and not the whole language. Glad to hear it’s not just me :)

                                                  1. 1

                                                    Using unsigned integers in most places is an anti-pattern. You are inviting unexpected wrapping overflow.

                                                    Only a handful of languages provide support for catching this: Rust in debug, Zig in debug (I think). C/C++ if you turn on a ton of extensions.

                                                    Go will not tell you but will do the wrong thing, that’s why people sensibly use signed integers in most APIs there.
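
                                                    A tiny illustration of the wrap (Python ints don’t wrap on their own, so the mask below mimics a 32-bit unsigned subtraction):

                                                    MASK32 = 0xFFFF_FFFF
                                                    length = 0
                                                    print((length - 1) & MASK32)  # 4294967295: the classic "len() - 1" footgun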

                                                    1. 1

                                                      I don’t know if that’s “common wisdom,” I’ve never heard it before, and don’t think it makes sense. What would be the use case for unsigned types then?

                                                      Signed integer overflows are not checked either according to the integer overflow section in the Go Spec. And since unsigned integers can represent larger numbers, overflow is less likely when you’re only using positive numbers.

                                                      The kind of APIs I’m thinking of is len, file sizes, port numbers, etc.

                                                  2. 10

                                                    People are asking what formal methods Amazon is using. My understanding (as a non-Amazon person) is that the answer is: “a bit of everything”. They have been hiring people with expertise in automated theorem provers (the main Z3 author moved to Amazon), but also verified-programming systems like Dafny (the main author moved to Amazon), proof assistants, several of them (HOL Light, Coq, Lean), modelling tools like TLA+, etc. A bit of everything. From the outside it feels like they have many different (sub)teams interested in formalization problems or many different sub-problems that justify them, and people have a lot of freedom to pick the approach they think is best as long as it delivers results. They are also funding collaboration with academia to try out other approaches, new tools, etc.

                                                    There are more details for example in this PDF document, which comes from one of those teams and mentions again many different tools for formal verification.

                                                      1. 2

                                                        the main Z3 author moved to Amazon

                                                        Assuming you’re referring to Leonardo de Moura, he was involved in the early days from 2012-2014 (self-describes as the main architect) but since then he moved onto developing Lean and Nikolaj Bjørner has been the principal developer along with Lev Nachmanson and Christoph Wintersteiger.

                                                        1. 2

                                                          A historical note: Z3 is older than that. The classic paper on it places its first release in 2007, and it’s been leading the SMT competition for a very long time. I expect that development must have started somewhere between 2003 and 2005, but don’t quote me on that. Work on Lean started in 2013 so that must be roughly when Leo de Moura started working on it.

                                                        2. 2

                                                          Yes, that’s pretty accurate. Bit of everything, whatever we need to solve the business problem.

                                                          In addition, we’ve got some internal tools that aren’t available externally (yet), and an increasing investment in combining AR techniques with neural/AI/LLM/whatever you wanna call it techniques. Missing from your list on the modelling tools side is P (https://p-org.github.io/P/whatisP/), which is also primarily developed at Amazon right now, and is widely used internally for distributed systems stuff.

                                                          From the outside it feels like they have many different (sub)teams interested in formalization problems or many different sub-problems that justify them, and people have a lot of freedom to pick the approach they think is best as long as it delivers result.

                                                          This has, historically, been AWS’s approach to this kind of “researchy” work. We’ve really tried to optimize for technology transfer, making sure the benefits of the work end up in production and benefiting customers, over some other concerns. I think that’s been very successful on the whole, although isn’t without its trade-offs compared to other models.

                                                          1. 2

                                                            Yeah, I agree… it seems like they might be using HOL Light, from an article linked in the post?

                                                            1. 1

                                                              Agreed! I am really interested in this stuff. The double whammy promise of faster and Correct with a capital C is attractive. Are they using anything besides TLA+?

                                                              1. 3

                                                                It is weird that they don’t mention specific technologies, but the S3 work they talked about used TLA+, the IAM work used Z3 (or at least a modified version called Zelkova), and the cryptography work used an in-house interactive prover called HOL Light. I know they also use P and have heard rumors of something similar to P but for model-checking rust code. Don’t know what either of those are used for.

                                                                1. 2

                                                                  Some corrections about this:

                                                                  • I’m told that zelkova is mostly backed by cvc5 (Amazon employs/funds a bunch of cvc5 researchers, among other things for string/regex reasoning). Dafny, however, is still backed by Z3 by default.
                                                                  • HOL Light is not “in house” but was developed for a long time before AWS hired John Harrison. Previously I think he was at Intel. I think it’s been used to verify cryptographic primitives at AWS, but previously it was also used for hardware verification.
                                                            2. 26

                                                              The release mentions some refactoring work in preparation for HTML output. I think that HTML output is really a key feature for Typst right now: it is not reasonable for a document-writing system this decade to only target PDF. Presumably implementing an HTML backend should be possible (especially if outputting in paginated form as a first step), but there may be limitations in the design that make it harder than it should be. I will seriously consider adopting Typst if it can get solid multi-backend support, and in particular find a way to produce non-paginated documents.

                                                              1. 35

                                                                Wanted to give my perspective as one of the devs: The focus on PDF was there from the very beginning (the very first Typst PDFs in 2019 used the PDF base 14 fonts ^^). I’m happy that we started this way because it’s what forced us to really build our own stack. It’s been (and still is) a lot of work, but it also gives us a lot of control and it is crucial for incremental compilation (cf. the whole problem with incremental compilers and linking).

                                                                Had we started with HTML, it would have been way too easy to just give up on custom PDFs and instead render them with a browser or transpile to LaTeX. Still, I acknowledge that we are a bit late to the party with HTML (or rather, still on our way). Targeting both from the beginning or at least earlier might have been the way, but even more ambitious…

                                                                But for what it’s worth: In thinking about how to best add HTML output over the past year or so, I’m quite optimistic that it can work out really nicely. I guess only time will tell!

                                                                I plan to write a tracking issue soon, laying out our rough plans. We want to ship HTML export step by step: Initially, the core focus will be on getting the Typst -> HTML semantics right without worrying too much about styling. I think that already gives quite a lot of value since you can always write CSS later. The crucial part is to have a single source of truth for the content, that can be turned into PDF and HTML.

                                                                1. 7

                                                                  I’m not trying to blame the choices, but I think that it would have been nice to design things with an intermediate representation in the middle that is reasonably backend-independent. One can design a rendering tool with such an intermediate representation, and yet focus on just one backend, but the existence of this intermediate layer can (to some extent) help avoid making design choices that endanger the ability to do other backends well.

                                                                  1. 6

                                                                    I think that this intermediate representation exists: In the form of Typst elements. Semantic elements like headings, figures, etc. are already as low as you can go if you want to support HTML. Having a second fixed-world IR below Typst elements doesn’t give you much over using them directly.

                                                                    Translation from this “high-level IR” to HTML will happen primarily via show rules. Making the IR open-ended (through the planned custom types/elements) gives a lot of flexibility: Users will be able to define their own custom HTML definitions for built-in or custom elements.

                                                                    There is also a “low-level IR” below the semantic one, which consists of blocks, boxes, spacing, placement, etc. which is important for layout. Note that these are still Typst elements, just more low-level ones. For HTML, we want to go a similar route. There will be a Typst element that just results in a raw HTML element. High-level Typst elements can then, depending on the export target, show themselves as different things: either layout primitives or HTML primitives. Some might be so high-level that they just show themselves as other semantic elements (which are shown recursively), so they needn’t be export-target aware.

                                                                    One can try to lossily translate the layout IR to HTML as well, but ideally, when the HTML target is selected, the semantic elements will not yield layout primitives in the first place.

                                                                    1. 1

                                                                      But typst prose isn’t trivially machine readable. I’d rather have an IR that converts everything to a basic syntax that consists of e.g. function calls and source spans or something, for everything (including the built-in elements). It’d then be up to me to supply those functions as the “backend” developer, allowing me to render the document however I please.

                                                                      That’d be an instant sell for me to use Typst.

                                                                      1. 1

                                                                        Eh, I just had a bit of fun making a treewalking interpreter in typst for typst math expressions (e.g. interpreting what typst produces from “$2^3$” to “8”). The prose isn’t trivially machine readable, but the element datastructure it evaluates to and exposes to the user is, and is basically function calls.

                                                                        Exposing spans would be neat, right now typst is clearly tracking the source of values but it isn’t exposing it to the user.
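
                                                                        As a toy analogue in Python (the element names are made up), the evaluated expression really is just nested calls that a tree walk can reduce:

                                                                        def eval_expr(node):
                                                                            if isinstance(node, (int, float)):
                                                                                return node
                                                                            op, *args = node  # e.g. ("pow", 2, 3)
                                                                            vals = [eval_expr(a) for a in args]
                                                                            return {"pow": lambda a, b: a ** b,
                                                                                    "add": lambda a, b: a + b}[op](*vals)

                                                                        assert eval_expr(("pow", 2, 3)) == 8  # the "$2^3$" -> "8" example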

                                                                  2. 5

                                                                    Thanks for your work on typst.

                                                                    I agree getting high-quality HTML (in terms of structure) is more important than trying to make it look ‘pretty’, in particular trying to make it look as much like the PDF as possible.

                                                                    I really hope that the HTML ends up being ‘accessible as a base’. The current PDF output seems very inaccessible for blind readers, compared to (for example) the PDFs that word produces, which are excellent.

                                                                    To make accessible HTML, the main thing is just to make sure you follow HTML standards (use tables for tables, and h1/h2/etc for headers, and don’t make ‘div soup’ for things which have well-defined tags). Hopefully this will let typst be accessible to those with vision difficulties.

                                                                    1. 5

                                                                      Yes, accessibility is currently a weak point. We’re very aware of that. We will definitely generate proper HTML and not a div soup. And we’re slowly working towards generating more accessible PDFs.

                                                                  3. 21

                                                                    I use pandoc --from typst for now.

                                                                    1. 10

                                                                      [..] it is not reasonable for a document-writing system this decade to only target PDF.

                                                                      HTML is less important than PDF when it comes to typesetting documents, ime.

                                                                      it will be good to have HTML support, but to say that it’s a requirement of any document writing system in the modern era is a little… idk, ignorant of the fact that rendering documents to PDF is a significant enough focus that, even if they never supported HTML output at all, this project would be very worthwhile.

                                                                      1. 8

                                                                        I’m no publishing expert. The two domains that I vaguely know about where people may send PDFs as part of their production pipeline are scientific publishing and general book publishing (for books I don’t think that people actually use PDF, but let’s assume). In both domains, publishers have expended significant resources in trying to get adequate HTML output, because the HTML format is significantly easier to deploy and gets more views than PDFs:

                                                                        • scientific publishers are doing their best to semi-automatically translate PDF papers into HTML for online viewing, see for example this paper published by Springer, with a bad HTML rendering produced from a PDF source of truth itself produced by LaTeX
                                                                        • general book publishers are doing their best to produce epub formats for the e-reader market, and to my knowledge those are basically HTML documents.

                                                                        Dear non-ignorant poster, do you know of another community / use-case that is happily producing PDF documents, and is not in the process of trying to also produce alternative HTML outputs for some reason?

                                                                        1. 14

                                                                          Ahhhhh maybe kind of a niche but at the small aerospace company I work at PDFs are pretty common for design documents, drawings, ICDs, and things like that. One of the main reasons for that is that they’re self-contained and durable. For a released version of a document or drawing the document is digitally signed by the two reviewers/approvers. Once signed, we have assurance that that single self-contained file serves as an immutable and perpetually self-contained artifact.

                                                                          Plus… since we do operate in field conditions without guaranteed robust internet we can just load them onto a USB drive and be quite confident that they’ll work. Or, crazy thing, print them out on paper and have them look exactly the same as they do on-screen.

                                                                          1. 5

                                                                            it’s not a matter of also trying to produce HTML, but of whether you’re producing something whose main intention is to be consumed as a paginated document.

                                                                            anyway the following:

                                                                            • academia
                                                                            • architecture
                                                                            • government (civil)
                                                                            • government (aerospace & defense)
                                                                            • MEP engineering
                                                                            • law
                                                                            • publication

                                                                            in many of these places people work primarily with .docx and print to physical paper or PDF for exchange with counterparties.

                                                                            I’m sure that HTML would be appreciated in a lot of these domains, but it wouldn’t register as a higher priority than good, consistent PDF output based on what I’ve seen.

                                                                            1. 4

                                                                              I work in academia, and I’m pretty sure that the vast majority of academics in my field do not really care about the fact that their documents are paginated. They want to be able to print it to paper sometimes (rarely, these days; people only print for reviews, and that’s because reading PDF documents on e-readers/tablets is a pain, otherwise it would be the dominant workflow by now, currently only a fraction of people do it). Browsers can paginate HTML documents on the fly for printing, and the quality of their rendering has improved over the years.

                                                                              The fact that we work with PDF is an artifact of previous printed-first publishing practices, and inertia, but it is not a requirement. (We do need a reproducible measure of “document size” to set size limits on academic paper submissions.) Formats that are paginated at rendering time are significantly more flexible in terms of viewing on mobile, e-readers, etc., and that would be a significant improvement for many people.

                                                                              I don’t know the other fields you mention, but except for dead-paper book editing and publishing I don’t really understand why it would be a strong requirement that something “can be consumed as a paginated document”, by which I suppose that you mean that the producer of the document decides the pagination, rather than the consumer at rendering time. (Is it that people are used to referencing document parts by page numbers, instead of using numbered (sub)sections?)

                                                                              1. 6

                                                                                Formats that are paginated at rendering time are significantly more flexible in terms of viewing on mobile, e-readers, etc., and that would be a significant improvement for many people.

                                                                                this isn’t generically true, though, and it’s part of what I was trying to get at in my original and follow-up comments. there are many situations where it is either unacceptable or simply not desired for layout decisions (such as text reflow, pagination, etc.) to be deferred to some other rendering engine.

                                                                                I’m happy that typst will, at some point in the future, render to HTML but there are already a huge number of tools that can handle rendering some markup to HTML and not very many that can handle typesetting a document and rendering to PDF in a manner that’s comparable to LaTeX.

                                                                                anyway, all I’m saying really is that there is a strong desire for a markup language that produces consistently aesthetically pleasing output in the style of printed documents and I would much rather have a tool whose primary goal was to improve upon this workflow before focusing elsewhere.

                                                                                1. 3

                                                                                  there are many situations where it is either unacceptable or simply not desired for layout decisions (such as text reflow, pagination, etc.) to be deferred to some other rendering engine.

                                                                                  Again, excuse my ignorance, but can you give details on what those scenarios are where reflow is unacceptable? (The only one I can think of is the edition of documents whose main distribution medium is to be printed on physical paper, and as far as I know fewer and fewer documents are intended for this usage as time passes.) There was an excellent point made earlier on the archival-friendly quality of (some) PDFs which makes them desirable in some environments, but this is orthogonal/independent of when the pagination happens.

                                                                                  1. 1

                                                                                    As it stands, I imagine that papers that you and I would write have some parts of them that are non reflowable (e.g. a typing rule), even when the text around it can be reflowed? Browser support for complex math objects is not great and will look uglier than proper LaTeX (or typst). So even when rendering to HTML/epub, I’d hope typst can work in a sort of hybrid mode, because being a proper typesetting engine means it has an edge over browser-style technologies for some use cases.

                                                                            2. 4

                                                                              Invoices, contracts, forms etc. tend to be PDF — everyone is assured they’re looking at the same thing, easily signed or stuffable into a signed container, etc.

                                                                              1. 3

                                                                                For anglophone genre fiction production, Word reigns supreme. Charles Stross has blogged extensively about his travails with this.

                                                                                1. 3

                                                                                  My dad works for a publishing house and they send PDFs to the printing house (usually with some special formatting to know where to cut the paper for example).

                                                                                  I’m actually trying to help him to migrate some books they published 30 years ago and a crucial requirement is PDF output.

                                                                              2. 7

                                                                                If you want HTML output today simply use the SVG output and wrap the pages with a bit of CSS :D

                                                                                In seriousness, the project is very much approaching being able to do HTML output. [What follows are my impressions after studying the code for approximately one evening, anyone who knows better please correct me.] The difficulty lies in the fact that evaluation (typst is implemented like an interpreted programming language) and layout aren’t serial steps but are somewhat intertwined. After the document is first evaluated, the result still includes style directives and rewrite rules which may be arbitrary functions that have to be called during the layout. Handling this is called “realization” and it’s what was mentioned in these release notes as having been rewritten to handle the upcoming requirements for HTML.

                                                                              3. 19

                                                                                I really appreciate this attitude. As programmers, we love to complain and grumble to each other about how the state of things suck, or that things are over complicated, but then too often the response is the software engineering equivalent of “I paid my student loans, so you should have to, too”. A new person joins the project, and WTFs at something, and the traumatized veterans say, “haha oh boy welcome, yeah everything sucks! You’ll get used to it soon.”

                                                                                I hate that attitude.

                                                                                We are at the very, very beginning of software protocols that could potentially last for millennia. From that perspective, you would look back at this situation and think of Richard’s blog post as super obvious, the clear voice of reason, and the reaction of everyone here as myopic.

                                                                                Even if our software protocols for whatever reason don’t last that long, we need to be working on reducing global system complexity. Beauty and elegance aside, there is such a thing as complexity budget which is limited by the laws of information theory, the computer science equivalent of the laws of physics. People like Richard understand this intuitively, and actively work towards reconstructing our world to regain complexity currency so that it can be spent on more productive things.

                                                                                1. 13

                                                                                  Making software behave in even more ways that contradict the standards supposed to describe its behavior, and expecting other implementations to change to accept this, is not reducing complexity. If applied naively, it makes it way worse.

                                                                                  New protocols and developments, totally, get rid of CRLF.

                                                                                  1. 12

                                                                                    A lot of the commentary on this blog post seems to be based on the assumption that the standard is written first, and then everybody makes an implementation, while in reality it is precisely the other way around.

                                                                                    1. 14

                                                                                      For all practical matters, HTTP 1.1 is now written in stone and will not change much. Thousands of libraries and hardware implementations have been built against its standard. If we want to change things, it’s a couple decades too late for this one particular protocol; and if we want to improve efficiency and on-the-wire byte size, it’s already doable with HTTP 2 and HTTP 3. This change is just not worth the breakages.

                                                                                      1. 6

                                                                                        The post explicitly talks about doing this for standards that are both already documented standard-as-written and already have many deployed implementations that have figured out an interoperable standard-as-implemented. If it were about standards that are still figuring these out, because the first implementations are still being written and the details refined, it would not be controversial.

                                                                                    2. 7

                                                                                      A new person joins the project, and WTFs at something, and the traumatized veterans say, “haha oh boy welcome, yeah everything sucks! You’ll get used to it soon.”

                                                                                      It’s often called “responsibility.”

                                                                                      Being a software author for the massively distributed internet means behaving and doing what others expect, so that you may all prosper. Otherwise you end up doing what’s sufficient, like hardcoding the TLS version number, and you end up with TLS 1.3 needing to work around the immaturity of implementers who figured they could just do the bare minimum and that number probably shouldn’t be changing anyway so let’s hardcode it.

                                                                                      I am not trying to cast shade or cast aspersions, but there is a reason we hate existing cruft and want to build new things. We can’t just “wish” the wrinkles out of the old things, not while being responsible members of society.

                                                                                      1. 3

                                                                                        People use this same “responsibility” argument to ignore the climate crisis. “Responsibility” looks different depending on whose interests you are protecting.

                                                                                        1. 4

                                                                                          Off topic climate sentence? I don’t see the relevance.

                                                                                          I’m talking about protecting society’s interest by acting responsibly and being a good citizen. When implementing a protocol, people expect you to act in a certain way. So you should act in that way.

                                                                                          When making a new thing, of course, the field is wide open. Go have fun.

                                                                                      2. 3

                                                                                        Better is different.

                                                                                        Different is worse.

                                                                                        QED Better is worse.

                                                                                      3. 4

                                                                                        This is quite a long article, but it did manage to convince me I should try typst seriously soon :-). In particular, the support for templating looks excellent; and the possibility of writing slides is definitely a case where I’d look into ditching beamer.

                                                                                        1. 3

                                                                                          <input type="hidden" name="method" value="put">. What is the point of using different HTTP verbs? The point, to me, is that middleware boxes treat GET as idempotent and cacheable if it has a cache header, but not POST. We don’t need further differentiation. The point of HTTP verbs and status codes is not to communicate with your clients (you can make them do whatever you want based on any part of the request or response) but with clients you don’t control. Clients you don’t control won’t respond to PUT/PATCH/DELETE any differently than a POST, so we don’t need it.

                                                                                          Compare that to the QUERY proposal, which, to me, is a good proposal. The problem is you want to send a request body but also opt into GET-like semantics from middleware boxes. That’s why QUERY is a good idea, not because we lack ways to communicate with HTTP servers.

                                                                                          1. 5

                                                                                            Sections 6-8 of the forms proposal answer this question, and they include specific examples of how adding PUT and DELETE to HTML forms improves both server- and client-side code.

                                                                                            1. 9

                                                                                              In 2000, Roy Fielding published a PhD dissertation in which he introduced the Representational State Transfer (REST) architectural style for distributed hypermedia systems. While these principles were used to guide the early development of the World Wide Web, they are often badly misunderstood.

                                                                                              Big sigh from me. If the dissertation is not understood by anyone, why are we talking about it? Who cares? Is the dissertation supposed to be descriptive or normative? Did the web succeed because it was REST or in spite of not being REST or what?


                                                                                              The discussion of logging out is wrong. It is not idempotent. Logging out means invalidating a session row in a table. You only have permission to log out your own session, so once you are logged out, you do not have permission to log anyone out anymore. Sure, in terms of UX, you don’t have to show an error message, but it is an error to try to log out someone else’s session, so if I send a log out message with an old session cookie, that is an error because you are attempting an operation you don’t have permission to do. If I send it with no session cookie, that is a special case and you can choose to treat it as an error or not. In either case, it’s not really idempotent, it’s just a UX nicety to not show people an error message when they perform an illegal operation by mistake. In an API, I would expect to receive an error message in response to sending a bad log out request.


                                                                                              The ideal situation is obvious, from the server’s standpoint:

                                                                                              router.post(   '/reservation',                        requireLogin,             createReservation)
                                                                                              router.get(    '/reservation/:reservationId',         requireGroupMembership,   getReservation)
                                                                                              router.put(    '/reservation/:reservationId',         requireGroupMembership,   updateReservation)
                                                                                              router.delete( '/reservation/:reservationId',         requireOwnership,         deleteReservation)
                                                                                              

                                                                                              The ideal situation is not to do any of that. It should be:

                                                                                              router.post(   '/create-reservation-v2',              requireLogin,             createReservation)
                                                                                              router.get(    '/reservation',                         requireLogin,             getReservation)
                                                                                              router.post(   '/reservation',                         requireLogin,             updateReservation)
                                                                                              
                                                                                              • Delete is not a real operation! It’s just a spicy update! No one really deletes anything. Even post-GDPR, you probably want to soft delete first and do hard deletes in a batch update hours or days later to protect users from themselves. (See the sketch after this list.)
                                                                                              • There are a million ways to do endpoint versioning, but the simplest and best is just to put versions into the path of individual endpoints as needed. It will be trivial to see whether anyone is still using the old endpoints when you want to turn them off.
                                                                                              • Use query parameters for GET, not path manipulation (or use QUERY and a JSON body, if it ever exists)
                                                                                              • Do the authentication in middleware and leave fine-grained authorization up to the controller. Today you’re giving owners permission to delete but not group members; that can change as your authorization model changes. Just do the broad permissions at the middleware layer. Middleware in general is a pain to debug because it is invisible control flow, so use it sparingly.
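
                                                                                              A sketch of the soft-delete point (Express-style; the db helper, table, and column names are invented for illustration):

                                                                                              router.post('/delete-reservation', requireLogin, async (req, res) => {
                                                                                                // "Delete" is just a spicy update: mark the row instead of removing it.
                                                                                                await db.run(
                                                                                                  'UPDATE reservations SET deleted_at = ? WHERE id = ?',
                                                                                                  [Date.now(), req.body.reservationId]
                                                                                                );
                                                                                                return res.status(204).end();
                                                                                              });

                                                                                              // A batch job can do the real deletes hours or days later.
                                                                                              async function purgeSoftDeleted(olderThanMs) {
                                                                                                await db.run(
                                                                                                  'DELETE FROM reservations WHERE deleted_at IS NOT NULL AND deleted_at < ?',
                                                                                                  [Date.now() - olderThanMs]
                                                                                                );
                                                                                              }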

                                                                                              These are my opinions. It’s fine to have other opinions too. But none of this is about solving the problem of getting other servers to understand requests/responses which is what HTTP verbs and statuses are for.

                                                                                              1. 4

                                                                                                There’s too much to respond to line-by-line, but hopefully this is enough to get the gist of where I’m coming from.

                                                                                                Big sigh from me. If the dissertation is not understood by anyone, why are we talking about it?

                                                                                                Many good ideas were not properly understood in their time. Decades later, I understood it, and it taught me something about how I’d like to improve the web.

                                                                                                it’s just a UX nicety to not show people an error message when they perform an illegal operation by mistake.

                                                                                                I disagree that logouts aren’t idempotent (I certainly model them that way), but more importantly, a lot of browser behavior is driven by UX niceties. It’s good and important to develop those.

                                                                                                There are a million ways to endpoint versioning, but the simplest and best is just to put versions into the path of individual endpoints as needed.

                                                                                                One of the things that adopting RESTful UI patterns allows you to do is stop versioning APIs, because the APIs are self-describing. Carson Gross talks about this in a bunch of his essays.

                                                                                                But none of this is about solving the problem of getting other servers to understand requests/responses which is what HTTP verbs and statuses are for

                                                                                                This is clearly not true. When I include a link on my website to another page on my website, that link uses the GET HTTP verb; if it didn’t matter what verb it was, all same-origin requests would be verb-less. The browser uses information about the HTTP method, same-origin or not, to make behavioral decisions.

                                                                                                1. 3

                                                                                                  One of the things that adopting RESTful UI patterns allows you to do is stop versioning APIs, because the APIs are self-describing. Carson Gross talks about this in a bunch of his essays.

                                                                                                  What does that even mean? Every client needs to re-analyze your APIs every time they want to do something because they might’ve changed?

                                                                                                  1. 4

                                                                                                    What that means is that browsers already use a standardized, universal API, called HTML. Each page in a normal website is self-describing, because it adheres to this standard that browsers already know; as opposed to SPAs where you have to ship your custom client for your custom API in order to then modify the DOM.

                                                                                                    1. 3

                                                                                                      Every client needs to re-analyze your APIs every time they want to do something because they might’ve changed?

                                                                                                      It helps a lot to remember that the point of REST was not to enable the kind of automatic-client-program-generated-from-a-machine-readable-spec stuff that, in the early 2000s, was largely being attempted via stuff like SOAP and the WS-* stack, and nowadays is attempted via things like OpenAPI.

                                                                                                      So when you read that something is “self-describing” in REST, you should interpret that more as “a human interacting with it would understand what is going on”, not as “an unattended program would be able to automatically derive the correct action from machine-readable metadata”. The unattended program is, as I understand it, explicitly not catered to by REST, or at best it is assumed that the program would be created by a human who had first interacted with the system and encoded their own understanding of it into the program.

                                                                                                      1. 1

                                                                                                        or at best it is assumed that the program would be created by a human who had first interacted with the system and encoded their own understanding of it into the program.

                                                                                                        But if there is no versioning, then you can’t have programs interact with the system at all! Since it might change at any time and thus require another round of human intervention.

                                                                                                        It really seems that REST has absolutely nothing at all to do with APIs of any kind and is apparently just re-stating obvious things about things designed for humans?

                                                                                                        1. 4

                                                                                                          It really seems that REST has absolutely nothing at all to do with APIs of any kind and is apparently just re-stating obvious things about things designed for humans?

                                                                                                          Fielding uses the term API in a way that includes “Hypermedia API,” I guess on the basis that the browser is the application that interprets the interface. So a lot of people get tripped up on that, understandably.

                                                                                                          But otherwise, yes, your comment is correct.

                                                                                                          1. 4

                                                                                                            But if there is no versioning, then you can’t have programs interact with the system at all!

                                                                                                            “How will automated programs be able to interact with it” is not a problem REST-in-the-Roy’s-thesis-sense-of-the-term attempts to solve. The sooner that point gets fully communicated and understood, the less trouble there will be from expectations that REST was somehow about making the web useful for machines.

                                                                                                            It really seems that REST has absolutely nothing at all to do with APIs of any kind and is apparently just re-stating obvious things about things designed for humans?

                                                                                                            A lot of really shitty websites allegedly are meant for use by humans. REST would make some number of them better.

                                                                                                        2. 1

                                                                                                          There’s a link in the quoted text (“his essays”) that explains what I (and Fielding, and Gross) mean by “self-describing.” I’m happy to explain further too.

                                                                                                          1. 3

                                                                                                            I followed that link, and then followed another link to the REST article on Wikipedia, which says that RESTful systems must have

                                                                                                            Self-descriptive messages: Each message includes enough information to describe how to process the message. For example, which parser to invoke can be specified by a media type.

                                                                                                            This is maybe fine at a syntactic level. I understand how clients and servers can use content negotiation to say “I’d rather get JSON” or “I’d rather get EDN” or whatever. But on a semantic level, I have to echo @lonjil’s confusion—what does this actually mean? If a client program receives a completely new kind of data from the server, how can it possibly know what to do with it? If the “client” is a human reading a web page then sure, I get it. But if it’s a program?

                                                                                                            1. 1

                                                                                                              If a client program receives a completely new kind of data from the server, how can it possibly know what to do with it? If the “client” is a human reading a web page then sure, I get it. But if it’s a program?

                                                                                              When building on the web, the client is almost always a human reading a web page. And even in the situations where it’s not—like a search engine bot—you don’t need to tell the search engine bot “hey, I updated my homepage, it’s homepage v1.1 now.” People and bots just load your homepage again, see something new, and it’s fine. That’s what a self-describing API means: its intended consumers will be able to interpret changes on their own.

                                                                                              If you need an API for programs to read (what most people think of when they hear “API”), then by all means, make one, but don’t couple it with the human-facing representation of the app (which is HTML), precisely because you can’t change what the program-facing API returns without breaking the things that rely on it.

                                                                                              Carson actually wrote his own version of the Wikipedia page on HATEOAS because he felt the existing one was a bad explanation (which I agree with). You don’t need to read that link, though—it says basically the same thing.

                                                                                                              1. 1

                                                                                                If you need an API for programs to read (what most people think of when they hear “API”), then by all means, make one, but don’t couple it with the human-facing representation of the app (which is HTML),

                                                                                                                But no one does that already.

                                                                                                                1. 3

                                                                                                  Anyone who does default Rails or Django or Spring or Dotnet templating is doing this already. I believe that, even though it’s currently seen as old-school, that model is more flexible and more future-proof than the JSON-to-React model of building websites.

                                                                                                        3. 2

                                                                                                          The browser uses information about the HTTP method, same-origin or not, to make behavioral decisions.

                                                                                          Yes, that’s my point. The browser is a client you don’t control. It treats POSTing a form differently than GETing a form: it asks you whether you’re sure when you press the back button. How should browsers treat PUT/PATCH/DELETE differently from POST if they were valid form methods?

                                                                                                          1. 3

                                                                                                            Details on this are in section 4! Let me know if there’s something specific I didn’t cover in there.

                                                                                                            1. 1

                                                                                                              Showing an automatic warning before sending a DELETE request would be really cool!

                                                                                                              1. 3

                                                                                                                I guarantee you that if sending a DELETE request would trigger a browser dialog that you cannot customize, your UX team would ban DELETE requests.

                                                                                                                1. 1

                                                                                                                  Then make it customizable… A deleteRequest event could do the job. Maybe it could also be a styleable <dialog>

                                                                                                                  1. 2

                                                                                                    If it’s customizable via a JS event, then it’s no different from doing onsubmit="confirm('Delete file?') || event.preventDefault()", which has existed for decades.

                                                                                                                    1. 1

                                                                                                                      Yes, but that doesn’t have a confirm dialog by default. The styleable <dialog> would be perfect.
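
                                                                                                      A rough sketch of how that could be approximated today with the existing <dialog> element (no new browser feature; the element ids and markup are assumed for illustration):

                                                                                                      // Assumes markup like:
                                                                                                      //   <form id="delete-form" method="post" action="/delete-thing">…</form>
                                                                                                      //   <dialog id="confirm-delete">
                                                                                                      //     <form method="dialog"><button value="cancel">Cancel</button><button value="ok">Delete</button></form>
                                                                                                      //   </dialog>
                                                                                                      const dialog = document.getElementById('confirm-delete');
                                                                                                      const form = document.getElementById('delete-form');

                                                                                                      form.addEventListener('submit', (event) => {
                                                                                                        event.preventDefault();               // hold the submission until the user confirms
                                                                                                        dialog.showModal();
                                                                                                      });

                                                                                                      dialog.addEventListener('close', () => {
                                                                                                        // form.submit() does not re-fire the 'submit' event, so this won't loop
                                                                                                        if (dialog.returnValue === 'ok') form.submit();
                                                                                                      });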

                                                                                                    2. 17

                                                                                                      I’m really curious what the reaction would have been if systemd had introduced SQLite as the log format, with tables for log producers, log levels, and so on, plus log events, all in a normal form. It would then have been easy to query logs with SQL and trivial to write scripts that ran queries like ‘give me all of the logs whose level is warning or higher from yesterday’ or ‘tell me everything that event source $X did in the last week’. There are already many libraries for interacting with SQLite, so writing graphical log explorers would also have been easy. How much of the objection was from binary logs and how much from the wrong binary log format?
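
                                                                                                      A sketch of what that might have looked like (better-sqlite3 from Node as one possible binding; the schema and names are invented for illustration):

                                                                                                      const Database = require('better-sqlite3');
                                                                                                      const db = new Database('journal.db');

                                                                                                      db.exec(`
                                                                                                        CREATE TABLE IF NOT EXISTS producers (
                                                                                                          id   INTEGER PRIMARY KEY,
                                                                                                          name TEXT NOT NULL UNIQUE
                                                                                                        );
                                                                                                        CREATE TABLE IF NOT EXISTS events (
                                                                                                          id          INTEGER PRIMARY KEY,
                                                                                                          producer_id INTEGER NOT NULL REFERENCES producers(id),
                                                                                                          level       INTEGER NOT NULL,   -- syslog-style severity: 0=emerg … 7=debug
                                                                                                          ts          INTEGER NOT NULL,   -- unix epoch seconds
                                                                                                          message     TEXT NOT NULL
                                                                                                        );
                                                                                                        CREATE INDEX IF NOT EXISTS events_ts_level ON events (ts, level);
                                                                                                      `);

                                                                                                      // "Give me all of the logs whose level is warning or higher from yesterday"
                                                                                                      const yesterday = Math.floor(Date.now() / 1000) - 86400;
                                                                                                      const rows = db
                                                                                                        .prepare('SELECT * FROM events WHERE ts >= ? AND level <= 4 ORDER BY ts')
                                                                                                        .all(yesterday);   // level <= 4 means warning (4) or more severe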

                                                                                                      This is, unfortunately, a consequence of UNIX throwing away record-structured files. On VMS, for example, logs can be written in a structured form that all of the built-in tools can consume. The closest UNIX equivalent is bdb, which people have good reasons to dislike. SQLite is often a good alternative.

                                                                                                      My reaction to systemd is very often that it identifies real problems and then solves them in the wrong way.

                                                                                                      1. 4

                                                                                                        Crucially, data stored in sqlite can be indexed, which would probably have helped systemd with this (still open) 8 year old bug: https://github.com/systemd/systemd/issues/2460

                                                                                                        Somehow, they ended up with a binary format that’s too slow for what I would consider the primary use case of logs: reading them.

                                                                                                        1. 3

                                                                                                          I think it’d have been nice for the user… to the detriment of logging performance. I’m not sure WAL was a thing when the systemd project first introduced its binary logs, but even then, SQLite doesn’t shine in very write-heavy workloads, and logging definitely fits that description. Paying for B-trees and transactions when inserting a new log entry ends up being just too expensive.

                                                                                                          1. 2

                                                                                                            How much performance do you need from syslog? You typically log messages at a rate of seconds per log item, with tens of log items per second at worst.

                                                                                                            1. 3

                                                                                                              syslog can be networked; with a large set of servers plus some noisy services, I could imagine easily hitting a sustained 10+ items per second

                                                                                                              I’ve had bursts of probably 100+ per second from Steam and Discord complaining about absolutely everything in existence.

                                                                                                              I think I’m low-balling both numbers. I don’t know what conclusion I’d draw from any of this.

                                                                                                              1. 1

                                                                                                                syslog can be networked; with a large set of servers plus some noisy services, I could imagine easily hitting a sustained 10+ items per second

                                                                                                                That’s fair. Though if the default backend were something like SQLite, replacing it with a scalable RDBMS would also be possible, so a server collecting logs for hundreds or thousands of machines could easily use something other than the default.

                                                                                                                I’ve had bursts of probably 100+ per second from Steam and Discord complaining about absolutely everything in existence.

                                                                                                                100 transactions per second is well within the capabilities of SQLite, even on comparatively slow hardware. It’s close to the line on slow machines with spinning rust if you’re just doing the naïve thing (one transaction per INSERT), but it would be fairly trivial to batch up transactions for either 100 log messages or one second (whichever is longer) and then write them in a single transaction. Or do this only when you have a high rate of log messages and default to one per transaction until you start to see more than one in the queue.
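
                                                                                                                A sketch of that batching idea, reusing the events table from the earlier sketch (better-sqlite3; the names are invented for illustration):

                                                                                                                const insert = db.prepare(
                                                                                                                  'INSERT INTO events (producer_id, level, ts, message) VALUES (?, ?, ?, ?)'
                                                                                                                );
                                                                                                                const insertBatch = db.transaction((batch) => {
                                                                                                                  for (const e of batch) insert.run(e.producerId, e.level, e.ts, e.message);
                                                                                                                });

                                                                                                                let queue = [];
                                                                                                                function log(event) {
                                                                                                                  queue.push(event);
                                                                                                                }

                                                                                                                // Flush at most once per second; one fsync for the whole batch
                                                                                                                // instead of one per message.
                                                                                                                setInterval(() => {
                                                                                                                  if (queue.length === 0) return;
                                                                                                                  const batch = queue;
                                                                                                                  queue = [];
                                                                                                                  insertBatch(batch);
                                                                                                                }, 1000);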

                                                                                                              2. 1

                                                                                                                I’ve seen¹ my desktop Linux kernel spew more than ten thousand messages per second into the systemd journal when under high I/O load, typically from updating all installed packages. (It may be what I get for using ZFS on Linux — I have nowhere near enough knowledge of ZFS internals to know why ZFS would incur this, but at least one other ZFS user seemed to have encountered it, although I now don’t see their ticket in the OpenZFS issue tracker.)

                                                                                                                ¹ (looking at the logs after the fact, not in real time)

                                                                                                                1. 1

                                                                                                                  I’m curious what would lead you to believe that is a description of “typical” syslog flow volumes.

                                                                                                                  1. 1

                                                                                                                    Because I’ve spent a reasonable amount of time reading (time stamped) syslog entries over the last 20+ years and rarely seen more than a handful of log entries per second.

                                                                                                                    1. 4

                                                                                                                      OTOH my own experience has had me working in situations like your own description and also situations with log volumes orders of magnitudes larger.

                                                                                                                      I think it is perhaps good to keep in mind that one’s own anecdata should not be presumed to be broadly typical of an entire population.

                                                                                                                      I’ve written about it elsewhere on lobst.er before, but I’ve easily been able to have actual real world syslog event volumes that exceed what journald can handle even when configured purely as a forwarder. The reality of the situation is that there are many solutions which could handle low and occasionally bursty volumes which are absolutely unsuitable as general solutions because their performance simply doesn’t scale like that.

                                                                                                            2. 10

                                                                                                              I agree that unstructured plaintext is not ideal for logs – it’s painful to do automated log analysis when there’s no schema at all – but I don’t see why structured logging is so quickly dismissed. JSON-per-line is not that hard to read, and even if it is more painful than one would like, it’s trivial to translate to something more readable.

                                                                                                              1. 4

                                                                                                                JSON is brittle, poorly understood, and hostile to the human reader.

                                                                                                                1. 8

                                                                                                                  JSON per line is less brittle than text+regex if you want a modicum of structure, imho. With plain text, all the edge cases are terrible: can you emit a log message that spans multiple lines? Can you easily retrieve entries’ timestamps?
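
                                                                                                                  For example, a minimal Node sketch of reading JSON-per-line logs, where multi-line messages and timestamps are just fields (the file path and field names are whatever the producer chose, invented here):

                                                                                                                  const fs = require('node:fs');
                                                                                                                  const readline = require('node:readline');

                                                                                                                  const rl = readline.createInterface({ input: fs.createReadStream('app.log') });
                                                                                                                  rl.on('line', (line) => {
                                                                                                                    const entry = JSON.parse(line);   // embedded newlines are escaped as \n, so one entry == one line
                                                                                                                    if (entry.level === 'warn' || entry.level === 'error') {
                                                                                                                      console.log(new Date(entry.ts).toISOString(), entry.level, entry.message);
                                                                                                                    }
                                                                                                                  });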

                                                                                                                  1. 1

                                                                                                                    As usual, it’s all about expectations, which are often unspoken. Do I want reporting? Then log-file analysis may be inferior to a dedicated data outlet. Do I want bug or incident forensics? Then flexibility and readability may be more important than structure. I expect logging to log the unforeseen, the surprises. The fewer assumptions made upfront, the more likely it is to succeed.

                                                                                                                    I have yet to see a case where complexity in logging is beneficial (indirectly, naturally) to the end user.

                                                                                                                  2. 5

                                                                                                                    grep "something" | cut -d' ' -f3 "some_field_that_might_exist" | grep "something else" | cut -d "not everybody writes good terminal output" | echo "whoops I forgot some filenames have special characters"

                                                                                                                    vs --output json | jq '.thing[] | .that_i_want | {CPU, RAM}'

                                                                                                                    1. 2

                                                                                                                      doesn’t convince me. Why not transform the plaintext logs to the data format of choice?

                                                                                                                      What about upward and downward compatibility over decades and multiple updates? IMO brittle.

                                                                                                                      1. 2

                                                                                                                        How often do you need to investigate decade-old logs?

                                                                                                                        1. 1

                                                                                                                          oh, you’re right. One usually deletes them after a few weeks. Still, I stand by the rest of the argument.