1. 9

    Nice article.

    I know that this is an area that could use improvement; it’s quite manual right now. I’d be interested to hear about some kind of light-weight solution for this that people have come up with.

    You can look at a simple docker-compose setup, as managing 67 containers with just docker run can get messy. If you want to get a little more sophisticated, you can write a simple Ansible playbook that templates out different docker-compose.yml files on the server and use that to manage things like image updates, config changes, etc.
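    A minimal sketch of such a compose file might look like this (service names, images, ports, and volumes are placeholder assumptions, not from the article):

```yaml
# Hypothetical docker-compose.yml; every name, image, and port here is illustrative.
version: "3.8"
services:
  web:
    image: example/webapp:1.4.2     # pinned tag, so image updates are explicit edits
    restart: unless-stopped
    ports:
      - "127.0.0.1:8080:8080"
  db:
    image: postgres:15
    restart: unless-stopped
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
```

    Updating an image then amounts to editing the tag and running `docker compose up -d`.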

    1. 10

      To most people, Docker and Ansible are anything but simple.

      1. 6

        Of course it’s relative… to most people, running even a single web app/site is anything but simple.

        But obviously if you can read the article and understand what’s going on, then OP’s suggestion of using docker compose and/or ansible makes sense.

        1. 6

          I say this completely seriously and not intending to throw any shade at the OP. Over the past year I have tried to recognize how and where I use words like “simply” or “just”, particularly when it applies to technical instructions, and kill them off for exactly that reason.

          What I find simple someone else might not. I might find it simple because I have lived it for 10 years and not realize my bias. The person I’m speaking to or writing for might read those words (as you’ve noted) and think “That’s not simple, I can’t do this.” To some, words like those come off as arrogant (even if not intended that way).

          Instead, I now try to explain things in a simple way, if I think it is simple, or give clear examples that step the reader through it. If it really is simple, they can skip the step-by-step and if it isn’t for that reader, they have what they need to start accumulating the experience needed for it to seem simple the next time.

          1. 1

            Sure, but what are the alternatives, genuinely?

            1. 4

              Pyinfra is a great replacement for Ansible for these types of tasks. Nomad would also be a great way to orchestrate some containers with little fuss. So would be Docker Swarm, in my opinion.

          2. 4

            I tried using just docker-compose with a similar project last year but I found it required a fair amount of maintenance to keep up during deploys.

            I’m considering redoing it using Harbormaster (compose file watcher) so that updating the source replicates to the servers.

            https://gitlab.com/stavros/harbormaster

            1. 3

              I’m using systemd-docker at the moment to run docker containers in systemd (at work, in my personal life I don’t like docker). I find this very nice, because then I have all services in a standard systemd environment and can fully use dependencies between systemd units and so on. If I remember correctly, systemd also has some built-in container runtime (nspawn?).
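              For anyone curious, a plain systemd service wrapping `docker run` can be sketched like this (unit, image, and container names are made up for illustration; the systemd-docker wrapper refines this basic pattern):

```ini
# Hypothetical /etc/systemd/system/myapp.service
[Unit]
Description=myapp container
After=docker.service
Requires=docker.service

[Service]
# Remove any stale container, then run in the foreground so systemd tracks it.
ExecStartPre=-/usr/bin/docker rm -f myapp
ExecStart=/usr/bin/docker run --rm --name myapp example/myapp:latest
ExecStop=/usr/bin/docker stop myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

              With this, the usual `systemctl start/stop/status` and unit dependencies apply to the container like any other service.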

            1. 17

              I’m trying to find a charitable interpretation for the fact that “avoid installing security updates because this distribution tool can’t handle updating in a secure manner” has ever even been considered as a form of best practice. Charitable as in not leaning towards “web developers gonna web develop”, which I would’ve been happy with 15 years ago but I realise perfectly well that’s not the right explanation. I just can’t, for the life of me, figure out the right one.

              Can someone who knows more about Docker and DevOps explain to this old Unix fart why “packages inside parent images can’t upgrade inside an unprivileged container” is an argument for not installing updates, as opposed to throwing Docker into the trash bin, sealing the lid, and setting it on fire?

              1. 13

                This is not a problem with Docker the software. Docker can install system updates and run applications as a non-privileged user. The article demonstrates how; it’s not some secret technique, it’s just the normal, documented way.

                This is a problem with whoever wrote this document just… making nonsensical statements, and Docker the organization leaving the bad documentation up for years.

                So again, Docker the software has many problems, but inability to install security updates is not one of them.
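                For reference, both practices fit in a handful of Dockerfile lines. This is only a sketch (base image, package manager, and user name are illustrative, not taken from the article):

```dockerfile
# Illustrative Dockerfile: base image and names are placeholders.
FROM debian:bookworm-slim

# Install pending (security) updates at image build time.
RUN apt-get update && apt-get -y upgrade && rm -rf /var/lib/apt/lists/*

# Create and switch to an unprivileged user so the app doesn't run as root.
RUN useradd --create-home appuser
USER appuser

CMD ["sh", "-c", "echo running as $(id -un)"]
```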

                1. 1

                  Has that method always worked? Or is it a recent addition for unprivileged containers? I’m just curious to understand how this ended up being the Docker project’s official recommendation for so many years that it ended up in linters and OWASP lists and whatnot. I mean none of these cite some random Internet dude saying maybe don’t do that, they all cite the program’s documentation…

                  1. 5

                    When I (and the documentation in question) say “unprivileged” in this context, it means “the process uid is not root”.

                    There’s also “unprivileged containers” in the sense that Docker itself isn’t running as root, which is indeed a newer thing but is completely orthogonal to this issue.

                    1. 1

                      Now it sounds even weirder, because the documentation literally says “unprivileged container”, but I think I got your point. Thanks!

                2. 5

                  Well, the article did point out that you can upgrade from within Docker. The problem is that the OS running inside Docker can’t assume it has access to certain things. I only skimmed the article, but I think it mentioned an example where updating a Linux distro might cause it to try to (re)start something like systemd or some other system service that probably doesn’t work inside a Docker container.

                  However, that really doesn’t address your main point/question. Why was this ever advice? Even back in the day, when some OSes would misbehave inside Docker, the advice should have been “Don’t use that OS inside Docker”, not “Don’t install updates”.

                  I think the most charitable explanation is that developers today are expected to do everything and know about everything. I love my current role at my company, but I wear a lot of hats. I work on our mobile app, several backend services in several languages/frameworks, our web site (ecommerce style site PHP + JS), and even a hardware interfacing tool that I wrote from scratch because it only came with a Windows .exe to communicate with it. I have also had to craft several Dockerfiles and become familiar with actually using/deploying Docker containers, and our CI tool/service.

                  It’s just a lot. While I always do my best to make sure everything I do is secure and robust, etc, it does mean that sometimes I end up just leaning on “best practices” because I don’t have the mental bandwidth to be an expert on everything.

                  1. 2

                    it mentioned an example where updating a Linux distro might cause it to try to (re)start something like systemd or some other system service that probably doesn’t work inside a Docker container.

                    That’s not been true for years, for most packages. That quote was from an obsolete article from 2014, and only quoted in order to point out it’s wrong.

                    1. 2

                      I didn’t mean to imply that it was! If you read my next paragraph, it might be a little clearer that this isn’t an issue today. But I still wonder aloud why the resulting advice was ever good advice, even when this particular issue was common-ish.

                      1. 1

                        AFAICT the current version of best practices page in Docker docs was written in 2018 (per Wayback Machine), by which point that wouldn’t have been an issue. But maybe that’s left over from an older page at a different URL.

                  2. 5

                    I am not a Docker expert (or even user), but as I understand the OCI model you shouldn’t upgrade things from the base image because it’s a violation of separation of concerns between layers (in the sense of overlay filesystem layers). If there are security concerns in the base packages then you should update to a newer version of the image that provides those packages, not add more deltas in the layer that sits on top of it.

                    1. 2

                      That makes a lot more sense – I thought it might be something like this, by analogy with e.g. OpenEmbedded/Yocto layers. Thanks!

                      1. 1

                        This doesn’t hold water and is addressed in the article.

                          The way Docker containers work is that they’re built out of multiple, composable layers. Each layer is independent, and the standard separation of concerns is layer-based.

                          So after pulling a base image, the next layer that makes sense is one that installs security updates. Any subsequent change to the base image invalidates that layer, so the updates are re-installed on rebuild.

                          Base images are often updated infrequently, so relying on their security updates just allows security flaws to persist in your application.

                        1. 1

                          To me, an outsider who uses Docker for development once in a while but nothing else, a separate layer for security updates doesn’t make much sense. Why would that be treated as a separate concern? It’s not something that is conceptually or operationally independent of the previous layer, something that you could in principle run on top of any base image if you configure it right – it’s a set of changes to packages in the parent layer. Why not have “the right” packages in the parent layer in the first place, then? The fact that base images aren’t updated as often as they ought to be doesn’t make security updates any more independent of the base images that they ought to be applied to. If that’s done strictly as a “real-world optimisation”, i.e. to avoid rebuilding more images than necessary or to deal with slow-moving third parties, that’s fine, but I don’t think we should retrofit a “serious” reason for it.

                    2. 3

                      Charitable as in not leaning towards “web developers gonna web develop”

                      I kind of want to push back on this, because while it’s easy to find examples of “bad” developers in any field of programming, I think it’s actually interesting to point out that many other fields of programming solve this problem by… not solving it. Even for products which are internet-connected by design and thus potentially exploitable remotely if/when the right vulnerability shows up. So while web folks may not be up to your standards, I’d argue that by even being expected to try to solve this problem in the first place, we’re probably ahead of a lot of other groups.

                      1. 1

                        Yeah, that’s exactly why I was looking for the right explanation :-). There’s a lot of smugness going around that ascribes any bad practice in a given field to “yeah, that’s just how people in that field are”, when the actual explanation is simply a problem that’s not obvious to people outside that field. Best practices guides are particularly susceptible to this because they’re often taken for granted. I apologise if I gave the wrong impression here; web folks are very much up to my standards.

                    1. 15

                      I find it really odd that UUIDs didn’t come up here.

                      They solve the same problems without the drawback that they can be accidentally used as temporary values when iterating or doing maths.

                      1. 17

                        That’s a start, but using a common ID type besides integer or string only prevents some of the errors. You can still inadvertently pass a Person ID to an operation expecting a Product ID, unless they are distinct in your type system. I like to pair each table with a one-field struct type holding its ID value. Some languages make this zero cost at runtime for a nice compile time benefit.

                        Types are very good when they give you guarantees about data, but even better when they give you guarantees about the problem domain.
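                        As a sketch of the one-field-wrapper idea, Python’s `typing.NewType` gives distinct ID types that are erased at runtime (all names below are hypothetical):

```python
from typing import NewType

# Distinct ID types: zero cost at runtime, but a static type
# checker rejects passing one where the other is expected.
PersonId = NewType("PersonId", int)
ProductId = NewType("ProductId", int)

def fetch_product(product_id: ProductId) -> str:
    # Hypothetical lookup; a real version would hit the database.
    return f"product #{product_id}"

person = PersonId(42)
product = ProductId(42)

print(fetch_product(product))   # fine
# fetch_product(person)         # runs, but mypy/pyright flags it
```

                        At runtime `PersonId(42)` is just the int 42, so there is no overhead; the guarantee lives entirely in the type checker.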

                        1. 14

                          You can still inadvertently pass a Person ID to an operation expecting a Product ID, unless they are distinct in your type system

                          you can, but if there actually is a collision, then you have much worse problems on your hands.

                          1. 1

                            Or an extremely unlikely, but not impossible, event happened once… either way, you’d have to be even unluckier (or have a bigger problem) to get a collision within the same namespace.

                            1. 1

                              It’s not either-or. You can make your types mutually exclusive at compile time and your values collision-resistant at runtime too.

                            2. 3

                              That’s a good idea 💡! I’ve also seen the issue you mentioned solved by prepending the table name to the id “person:12345” or using opaque type aliases (if the language has a type system that supports this)

                              1. 5

                                Embedding the type along with the identifier is such a good idea that I’ve surprised myself with how much I used to scoff at it when looking at IDs on paper communications, e.g. ‘CST12345’ having a ‘CST’ prefix so that it’s unambiguously (for the organisation) a customer identifier rather than a widget or a date or … whatever.

                                Within a relational database, in X normal form tables, the ‘type’ of a row is effectively identified with the table name. With other forms of data storage, there are other approaches. With CSV, for example, column name serves this purpose. In a structured document, e.g. JSON, we might have:

                                "customer-id": 12345
                                

                                It’s still far too easy for confusion to happen, though.

                                In data storage, we might do this:

                                "customers": [
                                    { "id": 42, "name": "Alice", "ref": 99 },
                                    { "id": 34, "name": "Bob", "ref": 123 }
                                ]
                                

                                ref … to what? It’s something that has an ID, and that ID is stored as an integer, but:

                                1. We don’t know what it is, so it’s easy to make a wrong assumption.
                                2. It’s an integer, so if we get it wrong, we might not even notice until catastrophe occurs.

                                In code, too, we can do (pseudocode here):

                                function update_customer(id, ref, from_date):
                                    save(id = id, address_ref = ref, from_date = from_date)
                                

                                Again here we’ve made an assumption, and if ref wasn’t supposed to be an address_ref, we have a catastrophe. Instead of updating when the customer started the job with ref ‘ref’, we’ve set when they moved into the address given by ‘ref’, which was not the intention.

                                Does this mean that we should lean on the type system? In some languages, this works really well. Apologies because I haven’t used C++ for a long time, but I seem to remember you can do this:

                                void update_customer(Customer customer, Job job, Date from_date) {
                                    save_job_details(customer.id, job.id, from_date);
                                }
                                

                                Here we’ve fixed the issue of understanding what we’ve been passed, but there’s still a boundary here between the code and the underlying data, where we can easily get things confused, because the data is stored without strict types - customer ID is stored as a number, job ID is stored as a number - even date is (let’s assume here) stored as a number - and they could be switched around without the code or the data storage caring.

                                We could have a storage system that enforced the types in the same way as the type system we’ve built in the code, but I don’t think I’ve ever used a system like this. Once I’m working with data storage or transmission, I’m mapping to/from a structure where the highest level enforced types are only as sophisticated as integer vs bigger integer vs date vs text, for example.

                                We’ve gone some way to helping ourselves avoid a particular kind of catastrophe by using GUID/UUIDs as unique identifiers, so even if we get confused, at least the chance of mistaken identity is effectively zero.

                                Why doesn’t an XML/JSON schema fix the problem? Because it only works as well as the mapping code. I’ve noticed there’s been a trend away from schemas as JSON’s become the lingua franca. I used to really like them in the XML world because they provided a little safety ‘for free’ and then building some checks on top was much less work.

                                Why don’t ORMs solve the problem? They do help, but I think we’ve all seen plenty of positive and negative aspects of using ORMs, so they aren’t ubiquitous, especially in the case where they are hooked into the types through application code, and of course the world isn’t using relational databases for everything any more.

                                Back to labelling identifiers with their types: it’s a safety belt, it adds boilerplate, and we often don’t like ‘unnecessary’ fluff, but with a little tooling support, perhaps it’s one of the better options we have for avoiding type confusion when data leaves the confines of code that uses high-level types for safety.

                            3. 13

                              They don’t only solve that problem. They also solve the problem of generating random numbers in a distributed manner without collisions. And because of this they are rather huge numbers, which makes them a very bad choice for a database index.

                              I don’t quite agree with this article, in the sense that the examples provided by the author are pretty well-known anti-patterns. The reason you probably shouldn’t do it is that you shouldn’t do it at all: an incremental integer primary key is not meant to be used as a public reference for external systems. If you need another unique identifier, add one as needed; you can generate a large random integer, for example. I don’t think anyone would write a for loop and use the loop counter for database lookups. The point of having a database is to NOT do such things.
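                              The “large random integer” suggestion is a one-liner with the standard library; a sketch (function name and bit width are arbitrary choices):

```python
import secrets

def new_public_ref(bits: int = 64) -> int:
    """Random public identifier, kept separate from the
    internal auto-increment primary key."""
    return secrets.randbits(bits)

print(new_public_ref())
```

                              `secrets` draws from the OS CSPRNG, so the reference is unguessable, unlike a sequential key.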

                              1. 7

                                I’ve mapped “UUIDs are a bad idea for database keys” to a “needs more research” thunk in my head. I have no idea if it’s true or not. One place it was discussed: https://news.ycombinator.com/item?id=14523523.

                                1. 4

                                  They don’t only solve that problem. They also solve the problem of generating random numbers in a distributed manner without collisions. And because of this they are rather huge numbers, which makes them a very bad choice for a database index.

                                  Why? Indexes aren’t using the integers directly. Binary search – and b-trees – don’t care how big a value is. Neither do hash indexes. UUIDs aren’t ordered, but I doubt that user ids would have a huge locality effect – or that you’d have enough users that the index pages would fall out of cache.

                                  What’s the concern? Storage size?

                                  1. 2

                                    So, from personal experience: I worked at a place using MSSQL, and they had GUIDs as the Clustered Index (CI). Aside from the fact that these were stored as the wrong type (MSSQL has a UUID type which could have handled them more space-efficiently), the other problem was table bloat.

                                    See, when you have an ordered index and you insert a user into the table, the row just gets appended to the last page at the end of the table. The indexes then get updated, and those entries might have to be inserted into the middle of a page somewhere in the middle of the index, but because index entries are usually smaller than table records this isn’t a massive deal.

                                    Now, if your clustered index isn’t ordered, you eventually end up with an incredibly sparse table: starting from a completely defragmented and compacted table, any insert is all but guaranteed to land in the middle of a page somewhere in the middle of the table. That causes a page split (so you’ve now got two pages), and it keeps happening until your tables all take up twice as much space.

                                    And when your key type is a 36-byte fixed string, everything refers to everything, and your keys end up included as index elements for various reasons, your database grows at an enormous rate all the time.

                                    It wasn’t a pretty picture, especially since we were trying to use the free tier of MSSQL, which has a 10 GiB per-database limit.

                                    When the solution of splitting the database up into multiple databases (which brought in a whole slew of problems) was eventually implemented after many of my protests, we tried to change out all the CIs in the database. GUIDs would still be used for references, but an integer RowID was used as the CI.

                                    Databases grew slower and things actually performed better.

                                    Although the above is an anecdote.

                                2. 8

                                  When running my code in ‘test mode’, I adjust the autoincrement sequences in the schema so that none of them generate overlapping values (so that if I use the wrong table’s ID I can’t accidentally find a different record). Not a perfect system, but a very cheap one to implement that catches 90% of my mistakes.
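                                  In SQLite, for example, the same trick can be set up by seeding each table’s sequence into a disjoint range (the table names and offsets below are arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);

    -- Seed each table's sequence into a disjoint range (offsets are arbitrary).
    INSERT INTO sqlite_sequence (name, seq) VALUES ('person',  1000000);
    INSERT INTO sqlite_sequence (name, seq) VALUES ('product', 2000000);

    INSERT INTO person  (name) VALUES ('Alice');
    INSERT INTO product (name) VALUES ('Widget');
""")
person_id = conn.execute("SELECT id FROM person").fetchone()[0]
product_id = conn.execute("SELECT id FROM product").fetchone()[0]
print(person_id, product_id)  # 1000001 2000001
```

                                  Now a person ID mistakenly used against the product table simply finds nothing, instead of silently finding the wrong row.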

                                  1. 1

                                    Indeed, this is a great idea to catch more bugs in an existing system that is using integer IDs!

                                  2. 4

                                    Or you simply use a UserId for the user id and be done with it, because it completely solves the issue described in the article (hint: it’s not a collision/uniqueness problem).

                                    1. 1

                                      hint: it’s not a collision/uniqueness problem

                                      Exactly. UUIDs aren’t a bad solution, but they aren’t targeting the actual problem being discussed, even if they happen to work well for it.

                                  1. 6

                                    I don’t know much about X11 or graphics programming but was able to follow this at a conceptual level.

                                    Is this the direction Wayland ended up going or were different decisions taken with its design?

                                    1. 10

                                      Some the same, some different. They’re generally based on the same principles, but where Wayland draws the lines isn’t always where this (excellent) article does. My somewhat limited understanding and comparison is as follows:

                                      Each graphical application gets direct access to the hardware, and a window is nothing more than a clipping list and an (x,y) translation

                                      Wayland working through OpenGL is even a little lower level than that: a window is just a buffer of pixels to output. Clipping lists are less important because a compositor… well, does the compositing for you, blitting multiple windows together into a final framebuffer that gets shown on the screen. GPUs can help with this process a lot more than they could in 2002, when doing off-screen rendering and layering together multiple render passes was expensive and uncommon.

                                      It doesn’t just maintain clipping lists. It maintains the “true shape” of each window, and a stacking order. The window’s clip is derived from these by subtracting from the clip for a window the shapes of all of those above it…

                                      Again, Wayland itself is a little lower level. The actual Wayland protocol/API says nothing about any of this, it’s entirely up to the compositor implementing it how it wants to arrange and communicate between windows.

                                      [The windowing system] has to handle resource allocation within the accelerator, including texture space and rendering ports.

                                      Yep, Wayland does this.

                                      The mouse driver would need to have an association between clipping lists and processes to deliver events to. It is the natural place for one small piece of UI policy: that once a mouse button goes down, all following events get delivered to the same process until all buttons are up.

                                      This is left entirely up to the compositor again. It just gets a stream of input events and handles the multiplexing itself, delivering them to whichever window it thinks should get them.

                                      It is quite likely that the keyboard and mouse driver should be merged at the higher levels so that applications have one file to read events from.

                                      IIRC (which I’m not sure I do), Wayland effectively does this. It’s more complicated than that because there are lots of other events as well, particularly things like monitors or mice being attached/disconnected, which nothing really handled well in 2002, and which was less common because laptops weren’t quite as pervasive.

                                      Almost everything else would be done at user level, in the user process: rendering, window borders, and the hard parts of cut/paste/drag/drop. … The kernel knows nothing of the details within a window.

                                      Yep, exactly what Wayland does.

                                      So all in all, a lot of things are similar but not exactly the same. A lot of the difference is Wayland refusing to dictate policy. A little bit of it is due to hardware evolving in different ways than anticipated, but for being written in 2002 I find this incredibly prescient. Advantages to long and deep experience, I suppose: you can look at what the trends are and how they change.

                                      1. 2

                                        Thank you so much for the detailed answer. I read it and immediately afterwards saw https://tudorr.ro/blog/technical/2021/01/26/the-wayland-experience/ show up.

                                        Having this background made me enjoy that article even more 🙌

                                      1. 2

                                        Would you mind summarizing for those of us with short attention spans?

                                        1. 9

                                          Browsers will never be done, because they’ve become operating systems, and operating systems themselves are never complete.

                                          Now look at our current popular options: Windows, Mac, Linux. Each have their own ecosystems.

                                          The web is currently “ChromeOS web”. This means it is pointless to build something like Firefox, which ends up being just an alternative “ChromeOS”. This is very similar to the Windows vs ReactOS scenario. It’s impossible for ReactOS to catch up.

                                          So what we need is a lot of people to get fed up and create the equivalent “Linux web”. We need a “POSIX web” of sorts. There are many Linux distributions but they are somewhat the same.

                                          1. 7

                                            POSIX web

                                            Just like POSIX is the lowest common denominator of (old) Unix systems, HTTP/1.1 and HTML 4 with Ecmascript 3 are the lowest common denominator of (old) web browsers. These standards are absolutely fine. The problem is developers’ endless hunger for new features and more new capabilities. POSIX tends to not be enough either if you want modern software that can do things like use cgroups (Linux only; for example, Docker just runs inside a Linux VM on Mac) or eBPF (linux only). Most of the other *nixes are just playing catch up to Linux, except perhaps for OpenBSD which provides novel and useful APIs of their own like pledge and unveil, and of course Mac which is like a universe unto its own.

                                            So all in all, I don’t think this is going to happen. Innovation can’t be stopped and almost all the development power is behind Chrome (Linux) and anything new that’s implemented by Firefox (BSD) will eventually end up in some shape or form in Chrome (Linux).

                                            Edit: Ironically enough, I just noticed that you consider Linux to be the “alternative”, where I was comparing Linux to Chrome. Note that Linux != POSIX. In fact, POSIX is that which you rely on when you’re trying to write portable non-Linux-only software.

                                            1. 4

                                              The problem is developers’ endless hunger for new features and more new capabilities. POSIX tends to not be enough either if you want modern software that can do things like use cgroups (Linux only; for example, Docker just runs inside a Linux VM on Mac) or eBPF (linux only).

                                              I think a lot of that “hunger” makes sense, though; The Web™ isn’t really all that much more complex than Qt (which already includes much more than just a GUI toolkit), or GTK3 + some libs (gstreamer and whatnot), or other desktop libraries/frameworks used to build similar applications.

                                              The difference is mainly in the development model: I can ship a POSIX system and that will be useful because many things are built on top of that (not everything, as you pointed out), but I can’t ship a “HTML 4/ES 3” system and expect it to really be useful, since everything is driven by standards that are expected to be implemented rather than libraries built on top of the POSIX foundation.

                                              So I don’t think that HTML 4 + HTTP 1.1 + ES3 is really a “POSIX web”; as it won’t really allow me to build many useful applications, whereas POSIX does.


                                              I wouldn’t say this standards-driven model is “flawed” per se, but it does come with some serious drawbacks. It’s also not an easy problem to solve well without basically chucking away everything we have now. Look at the discussions surrounding Python 2’s and Flash’s EOL this week to see how hard that is, and everyone is using IPv6 by now, right?

                                              You see this “drive towards standards-driven complexity” even with much simpler systems like email. Reading and implementing RFC 5321 (SMTP) and RFC 5322 (message format) is not enough to implement a functional email client today: you need 10+ other specifications too if you want it to really be useful, even for plain-text email, and 25+ specifications if you want it to be fully featured. And even then you won’t be done, because there are a number of unstandardised common behaviours too. It’s not as complex as the web, since the scope is much smaller, but it’s the same issue really. This was also a big problem with XMPP.

                                              1. 2

                                                Thanks for your thoughtful and long reply!

                                                I can’t ship a “HTML 4/ES 3” system and expect it to really be useful.

                                                I dunno, people have been bending over backwards to keep stuff compatible with old versions of Internet Explorer for so long that I think it has proved that even buggy implementations of these standards are “useful”. Maybe not if you want to build videoconferencing or such, but for the majority of web apps, it was (and probably still is) enough.

                                                I wouldn’t say this standards-driven model is “flawed” per se, but it does come with some serious drawbacks.

                                                The biggest drawback being that those who decide what goes into the standard tend to be big stakeholders interested in the status quo.

                                                You see this “drive towards standards-driven complexity” even with much simpler systems like email.

                                                It makes sense of course. Popular system A adds a new feature which pulls people away from popular system B, so B decides to add that feature too to retain its users. Then B adds another feature which system A then needs to match, and so on and so forth. Incremental evolution of standards like this, without an overarching vision, is exactly what leads to complexity: more often than not, the features interact in weird ways or are not completely complementary, so the need for yet another feature arises to paper over the seams.

                                                Unfortunately, I don’t really have a solution. I like the careful way the Scheme standard has evolved (even though it’s design by committee, it’s a committee of passionate people who come together with mostly the same vision), but its pace is so glacial it’s like watching paint dry, so it’s probably not an option for developing standards in this fast-paced business ;)

                                                1. 1

                                                  I dunno, people have been bending over backwards to keep stuff compatible with old versions of Internet Explorer for so long that I think it has proved that even buggy implementations of these standards are “useful”. Maybe not if you want to build videoconferencing or such, but for the majority of web apps, it was (and probably still is) enough.

                                                  You wouldn’t be able to easily express Lobster’s JS in ES3; it uses XMLHttpRequest for example, which is its own specification and not something you can write yourself in just JS. There are a few other things as well, such as localStorage, which is pretty useful, is its own standard, and is not something you can implement yourself in just JS.

                                                  Then there’s the whole layout issue; floated divs to align things suck, as do tables, but those are the only options you have with HTML 4/CSS2. Flexboxes and Grid are pretty nice. And things like border-radius, box-shadow, opacity, etc. to replace all these hacks we previously did with images is pretty nice too.

                                                  There are also things that don’t really need a standard IMO; for example the “drag and drop” thing, as you can easily just write a JS library to do this (and many exist already), but a lot of things that were added solve real problems that were much harder or impossible to solve before.

                                                  I think a lot of problems are because of legacy and backwards compatibility; the whole HTML/CSS thing is so needlessly complex now that writing a “bug compatible” HTML renderer is quite the task, although writing a basic one is not that hard (I did it in a weekend last year).

                                                  1. 1

                                                    You wouldn’t be able to easily express Lobster’s JS in ES3; it uses XMLHttpRequest for example

                                                    Sure - but you could absolutely do it with ES5.

                                                    Did we need to add, say, fetch to the standard - given 99% of what it implements can be provided via a thin wrapper around XHR?

                                              2. 1

                                                Yep.

                                              3. 2

                                                Linux web

                                                The Gemini protocol? Or do you mean like Display PostScript?

                                                1. 1

                                                  Probably more like nix-shell for “web applications”, and then just the various internet protocols like ipfs, hypercore, https, etc.

                                                  1. 1

                                                    the language for web applications would be up to the author right? or are you envisioning some system where web application code is part of the standard?

                                                    1. 1

                                                      I mean the platform is linux and the application runs on the system and uses internet protocols. You’d use a package manager like nix or guix to install ephemeral apps and simulate the web experience.

                                                      Sorry the reply wasn’t highlighted or I’d have answered sooner.

                                                      1. 1

                                                        so this wouldn’t really be the “web,” it would be more like the pre-web world where each use case had its own protocol and application. if the web apps are running locally, they could just as well be native programs.

                                                        1. 1

                                                          Exactly. Native programs are the web apps of the next generation because microkernels are coming. The whole stack is shifting down one layer.

                                          1. 7

                                            As someone who works in Rust and JavaScript, these criticisms don’t hold much weight.

                                            The value I get from Rust boils down to a well designed language with the ability to write both low and high level abstractions, a modern toolchain and an ecosystem that contains libraries borrowed from best-in-class approaches in other languages.

                                            I’ve personally found the borrow checker to be a great (if demanding) tutor, but I could take it or leave it as a language feature.

                                            1. 8

                                              I first ran across this concept around two years ago in Hasura (in its ACL* docs), which uses table- and row-based ACLs. At the time I was working on a Rails app and I was surprised by how much we were doing in application code that could actually be pushed down to the database level.

                                              I recently interviewed at quite a few companies and only one of them had any interest in discussing DB ACL* features. There’s no bias against them, just not a lot of knowledge about how to set them up and where they become difficult.

                                              After spending some time reading about setting up database level ACLs and thinking about it, I think the issue that holds back a lot of these use cases is the opacity of seeing which options are set. Unlike with source-code where you can read the code without running it, you need to access an up-to-date instance of the database in order to find out how it’s configured.

                                              This is just a bit too different from how we’re used to working, and that small change probably makes people building startups reluctant to embrace these features, because they’re perceived as an eventual bottleneck to scaling.

                                              For FAANG and Fortune500 companies, I think the main reason they don’t implement controls at the DB level is that they’ve already solved them in the application code and there’s a natural level of inertia to re-writing a solved problem.

                                              * ACL stands for Access control list

                                              1. 4

                                                After spending some time reading about setting up database level ACLs and thinking about it, I think the issue that holds back a lot of these use cases is the opacity of seeing which options are set. Unlike with source-code where you can read the code without running it, you need to access an up-to-date instance of the database in order to find out how it’s configured.

                                                Do you think it would help if the DB forced changes to these options to go through the config file? It would be like nginx: to change a setting at runtime, you edit the config and reload.

                                                I wonder how far you could take this idea. Would it make sense for the DB to require all DDL to go through the config file?

                                                1. 1

                                                  That’s a great idea! I think it fits well with the broader movement toward “infrastructure as code”, and for me at least, the database is much more infrastructure than code. I think this is a bit like how Prisma’s data modeling works.

                                                  One of the problems I’ve had reasoning about databases in the past is that I often find myself reading through migration logs, which is the equivalent of reading through git .patch files to find the current state of your codebase.

                                                  One of the biggest benefits I found from using Hasura for a couple projects was that it gave me a UI to view my current database schema at a glance.

                                                  1. 1

                                                    “Developers need to know the DB schema of production” is not a super hard problem to solve in a secure way, but most solutions in use are either insecure or subject to drift.

                                                    A nightly job that dumps the schema somewhere accessible might fail a particularly tight security audit, but I suspect most would let it pass after giving it a close look.
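                                                    For example, a minimal sketch of such a job (assuming PostgreSQL with pg_dump on PATH; the database name and output path are placeholders) could just shell out to pg_dump and commit the result to a repo:

```python
# Sketch of a nightly "schema snapshot" job. Assumes PostgreSQL and
# pg_dump on PATH; the database name and output path are placeholders.
import subprocess

def schema_dump_cmd(dbname: str, outfile: str) -> list:
    # --schema-only skips table data; --no-owner keeps the dump portable.
    return ["pg_dump", "--schema-only", "--no-owner",
            "--file", outfile, dbname]

def snapshot_schema(dbname: str, outfile: str) -> None:
    subprocess.run(schema_dump_cmd(dbname, outfile), check=True)

# e.g. run from cron and commit schema.sql, so developers can read the
# current schema (and its history) without production DB access.
print(schema_dump_cmd("appdb", "schema.sql"))
```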

                                                1. 4

                                                  I always wonder why MsgPack took off over similar tools like Cap’n Proto?

                                                  1. 21

                                                    they’re different things, msgpack is self-encoding and is basically binary json, whereas capnproto comes with an IDL, a RPC system, etc.

                                                    1. 4

                                                      That is an excellent question! Could be some kind of worse-is-better thing, due to how compatible MessagePack is with JSON. But then, why not BSON? I would love to see some analysis of usage trends for these newer binary serialization formats, including Protobufs, CBOR, BSON, and Cap’n Proto.

                                                      1. 9

                                                        I want to like CBOR; it has a Standards Track RFC (7049), after all. But the standard does define some pretty odd behavior, last I looked. There’s a way to do streaming which, effectively, could require you to allocate forever.

                                                        I’ve not used Cap’n Proto, but have definitely used Protobuf. I greatly prefer the workflow of using MsgPack, but I do also appreciate having the schema enforcement generated for you with Protobuf; it does get in the way early on in dev, though. :/

                                                        1. 4

                                                          Funny, it’s the other way for me; I prefer to nail down the schema as early as possible. I thought Cap’n Proto was pretty great compared to JSON, but I might not be as enthusiastic if I were trying to port over some sprawling legacy thing that never had a well-defined schema in the first place.

                                                          1. 4

                                                            I do adhoc, investigative stuff far more often than I do work that goes into production. That being said, the last few projects I’ve had involvement in have started out with defining a schema in protobuf and moving forward that way. Prematurely, in both cases, probably. :)

                                                          2. 4

                                                            Yeah, but the streaming is optional and can be used as a ready-made framing format for your whole TCP session instead of inventing an ad-hoc one.

                                                            Seriously, stop inventing new protocols. Just use newline-separated JSONs or CBOR. Please? Pretty please?

                                                            1. 5

                                                              You can stream MsgPack objects back-to-back without any problem at all - it’s just concatenating objects one after the other on the wire, in a file, in a Kafka queue, etc. You can do the same thing in CBOR, but CBOR makes it ambiguous whether you should wrap the objects inside an indefinite-length array or something, and then says that some clients might not like that.

                                                              And this encapsulates the problem with CBOR - it defines a bunch of optional features of dubious value (tags, ‘streaming mode’, optional failure modes) that complicate interoperability and bloat the specification. The MsgPack spec is tiny and unambiguous.

                                                              It’s really a shame that CBOR was forked from MsgPack and submitted to the IETF against the will of the original authors. Now we have two definitions of essentially the same thing, but one of them is concise, and the other one is an IETF standard.

                                                              1. 1

                                                                It might have been better, yes. But you can use strict mode, the standard can be revised and IANA runs a tag registry. In this case, as much as I hate the saying, good is better than better.

                                                                Without a clear signal “use this” and proper hype some people might consider alternatives. And that will inevitably lead to custom formats. We need an extensible TLV format with approximately JSON semantics to move forward. Not more “key value\n” without escaping or dumping packed data structures to the wire.

                                                              2. 1

                                                                ready-made framing format for your whole TCP session instead of inventing an ad-hoc one.

                                                                Is this actually true? I assumed the streaming allowed for an arbitrary nesting, say:

                                                                [ {...},
                                                                  {...},
                                                                ....
                                                                ]
                                                                

                                                                Where you really can’t finish reading until that last ], forcing continued growth in allocations. I suppose your implementation could say “an outer array will emit the inner elements via a callback”… then you don’t have to allocate the world. But what if the inner object also uses streaming? Is that a thing that can happen?

                                                                Seriously, stop inventing new protocols. Just use newline-separated JSONs or CBOR. Please? Pretty please?

                                                                Probably don’t want newline-delimited CBOR, or MsgPack. Might I suggest you stick a MsgPack integer before your object, decode that, and then read that many more bytes, avoiding delimiters altogether? Sure was nice back when I was using “framed-msgpack-rpc”…

                                                                1. 2

                                                                  I believe you can have nested streaming, though I can’t think of a practical use-case outside of continuously streaming “frames” of data. The “endless” allocation isn’t really endless; it goes until whatever is producing the data marks the end of it. If you’re pipelining your data handling, streaming like this is extremely useful because it allows you to stream large datasets while keeping things stateless/within the same request context. Most implementations don’t support streaming, though.

                                                                  1. 1

                                                                    Just use newline-separated JSONs or CBOR. Please? Pretty please?

                                                                    Probably don’t want newline delimited CBOR, or MsgPack.

                                                                    I am not a native English speaker and I thought the comma made it ((newline-separated JSONs) || (CBOR)).

                                                                    1. 2

                                                                      I think you did everything right English syntax wise. I messed up reading your intention. Even with your clarification, though, I still think framing MsgPack with the number of bytes in the message is a great idea. :)

                                                                      1. 1

                                                                        And do you frame it using its compact integer notation, or fully wrap it in a blob, or do you settle for a uint32be?
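                                                                        Either works; here is a rough sketch of both options (stdlib only, all names made up): a fixed-width uint32be prefix is simpler, while a MsgPack-style compact uint prefix costs only one byte for small payloads:

```python
import struct

# Two framing options for length-prefixed messages, sketched side by side.

# Option 1: fixed-width prefix -- always 4 bytes, big-endian uint32.
def frame_u32(payload: bytes) -> bytes:
    return struct.pack(">I", len(payload)) + payload

def read_u32_frames(buf: bytes):
    i = 0
    while i < len(buf):
        (n,) = struct.unpack_from(">I", buf, i)
        yield buf[i + 4:i + 4 + n]
        i += 4 + n

# Option 2: MsgPack-style compact uint prefix -- 1 byte for small
# payloads, growing as needed (fixint / uint16 / uint32).
def frame_compact(payload: bytes) -> bytes:
    n = len(payload)
    if n <= 0x7f:
        prefix = bytes([n])                      # positive fixint
    elif n <= 0xffff:
        prefix = b"\xcd" + struct.pack(">H", n)  # uint16
    else:
        prefix = b"\xce" + struct.pack(">I", n)  # uint32
    return prefix + payload

msgs = [b"hello", b"x" * 300]
wire = b"".join(frame_u32(m) for m in msgs)
print([m[:5] for m in read_u32_frames(wire)])  # [b'hello', b'xxxxx']
print(len(frame_compact(b"hi")))  # 3 -- one prefix byte + two payload bytes
```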

                                                              3. 8

                                                                I don’t have usage trends, but I wrote up a compare-and-contrast to all these various things a little while ago: https://wiki.alopex.li/BetterThanJson

                                                                Long story short, MsgPack is intended to be schema-less, or at least schema-optional, like JSON (and also CBOR). Cap’n Proto, like Protobufs, assumes a schema to begin with, which makes it much more complicated, with more tooling attached, and potentially much faster.

                                                                Also the more I look at CBOR and MsgPack in terms of encoding details the more similar they look to me; they both seem to very obviously share the same lineage. Take a look at the encoding of the example on the MsgPack website and the CBOR version.

                                                              4. 3

                                                                Slightly less juvenile name?

                                                                More seriously though, has MsgPack “taken off”? From what I can see most stuff offered as an API format is JSON. I guess for internal messaging something less fat on the wire is valuable.

                                                                1. 3

                                                                  It definitely has some level of adoption. I’ve been using Rocket (Rust web framework) and noticed it’s mentioned in the minor version release notes. Which leads me to assume that MsgPack has at least enough interest for issues with it to get fixed in a non-mainstream framework. Not a super strong basis for this conclusion, but I think we’ll continue to see interest in MsgPack growing.

                                                                  1. 6

                                                                    For how old it is, I think it has absolutely not taken off. Also, Rocket is not even Rust-mainstream.

                                                                    1. 7

                                                                      Is any web framework Rust mainstream? 😛

                                                              1. 1

                                                                I ran into this problem last weekend. This article provides a great walkthrough of the problem space.

                                                                These days you can get the non-Cow solution by just setting your input as token: impl Into&lt;String&gt;, which has nice ergonomics.

                                                                I decided I’d prefer to have pretty code with the extra allocation rather than making API consumers worry about Cow’ing their inputs.

                                                                1. 2

                                                                  This is an interesting idea, but there’s a reason C-style syntax is so dominant in popular programming languages today. The difficulty of learning something new is proportional to the distance it lies from concepts you already know.

                                                                  So while this might be perfect for dense algorithms papers or folks who already have strong math backgrounds, it doesn’t seem worth the effort involved to learn for most industry engineers.

                                                                  Almost all my attempts to convey code fall into a few well defined categories:

                                                                  1. Code review. In which case I’ll either suggest edits to the code or in cases where I want to showcase a different design approach, provide a patch demonstrating the suggestion.
                                                                  2. Data Modeling. This is mostly the art of drawing boxes on a whiteboard.
                                                                  3. Whiteboarding actual code. I just use Python in these cases as it’s almost universally understandable (even if you don’t know Python!). I interview in Python for the same reason, despite rarely using it day-to-day.
                                                                  4. Writing design documents. This is where I’d think this would really shine, but I have never actually written a design document that’s more focused on the algorithmic solution than on explaining the problem and the tradeoffs that led to the solution that’s going to be used.

                                                                  Has anyone used this or similar shorthand systems? What are your experiences?

                                                                  1. 4

                                                                    I use a hodgepodge of mathematical symbols, Z notation, and J when writing stuff by hand. I’m writing for just myself, to get my thoughts in order, and the terseness helps a lot.

                                                                    1. 1

                                                                      Interesting, I’ve never heard of J before. Do you use it as a primary language?

                                                                      1. 2

                                                                        More as a hobby language, it’s just that being able to write \: list instead of sort_descending(list) is pretty nice!

                                                                      2. 1

                                                                        Funny you say J; as a Forther, a lot of my scratch notes are Forth style. Just goes to show, one likes to program in ways one likes to think.

                                                                        1. 1

                                                                          one likes to program in ways one likes to think.

                                                                          Hence my tendency to write much of my code as if I long for it to be Lisp. Because I do.

                                                                      3. 1

                                                                        If I’m following the examples correctly, then there’s a bug in the FizzBuzz algorithm.
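                                                                        For reference, here is a conventional version (my own sketch, not the article’s code) that the shorthand can be checked against; the usual bug is testing the 3 and 5 cases before the 15 case:

```python
# A conventional FizzBuzz, handy as a reference when checking a
# shorthand or pseudocode version for bugs. Note the 15 case must be
# tested before the 3 and 5 cases.
def fizzbuzz(n: int) -> str:
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)

print([fizzbuzz(i) for i in range(1, 16)][-3:])  # ['13', '14', 'FizzBuzz']
```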

                                                                        When I do this type of stuff, I will generally use a pseudo-C type language—braces for blocks, and will sometimes skip variable declarations if it’s obvious. I might even flow chart a bit.