1. 3

    It seems to be a common theme these days: people rediscovering, time and time again, why properly normalised data, ACID, and a well-thought-through data model are important.

    I walked into a discussion recently where they were bemoaning the fragility, brittleness, and complexity of a large JSON data structure…

    …my only comment was that I felt I had fallen into a time warp and was back in the late 1980s, when people were bemoaning the problems of hierarchical databases and why an RDBMS was needed.

    Sigh.

    Sort of sad really.

    I’m still waiting for the pro-SQL types to wake up to what C.J. Date has been saying for decades and to up their game beyond nulls and auto-increment keys… but we can’t get there, because we keep having to rehash the basics of normalisation and ACID.

    1. 7

      The problem is the lack of SQL databases where robust replication takes less than days to set up. Schemas are not the problem; arcane, hard-to-administer software is the problem. PostgreSQL replication requires a dedicated DBA. I’m keeping a close eye on CockroachDB.
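
      For a sense of what “requires a dedicated DBA” means in practice, here is a minimal streaming-replication sketch (PostgreSQL 9.x era; the hosts and the replicator user are made up), spread across several files on two machines, and that’s before any failover tooling:

        # primary: postgresql.conf
        wal_level = hot_standby
        max_wal_senders = 3

        # primary: pg_hba.conf -- let the standby connect as a replication client
        host  replication  replicator  10.0.0.2/32  md5

        # standby: seed the data directory, then point recovery.conf at the primary
        #   pg_basebackup -h 10.0.0.1 -U replicator -D /var/lib/postgresql/data
        standby_mode = 'on'
        primary_conninfo = 'host=10.0.0.1 user=replicator'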

      1. 4

        I use Amazon RDS at the day job. Unless you have enough data to justify a DBA for other reasons, RDS is inexpensive enough and solves PostgreSQL replication.

    1. 5

      If you like the Slack interface but don’t like the idea of a closed silo housing all communications, check out: https://about.mattermost.com

      I have no affiliation with this company, but we do use it.

      The biggest things the Slack-style interface brings to the table are searchability, cross-device sync, and persistence. I like those, and that I can hop in, scroll back, and catch up. I still prefer aspects of IRC, though, and all these Slack-style apps are a lot fatter than an IRC client.

      I’ve never had a problem with it disrupting work. I just close the damn thing if I don’t want it right now. If your org whines at you when you do this, you have a culture/management problem, not a tech problem.

      1. 17

        People seem to be missing the forest for the trees in this thread. The whole point of multi-user OSes was to compartmentalize processes into unique namespaces, a problem we’ve now solved again thanks to containers. The issue is that containers are a wrecking-ball solution to the problem, when maybe a sledgehammer (one that removes some of our assumptions about resource allocation) would have sufficed.

        For example, running a web server. If you’re in a multi-tenant environment and you want to run multiple web servers on port 80, why not… compartmentalize that, instead of building this whole container framework?

        Honestly, I think this article raises a point it didn’t mean to: the current shitshow that is the modern micro-service/cloud architecture landscape resulted from an overly conservative OS community. I understand the motivations for conservatism in OS communities, but we can see a clear result: process developers solving, badly, problems OSes should solve. Because developers working in userspace aren’t working from the same perspective as developers working in the OS space, they come up with the “elegant” solution of bundling subsets of the OS into their processes. The parts they need. The parts they care about. When the real problem was that the OS should have been providing them the services they needed, and then the whole problem would have been solved with, like, 10% of the total RAM consumption.

        1. 3

          This is reasonable… except when you mentioned RAM consumption:

          Although containers themselves have almost no overhead, Docker is not without performance gotchas. Docker volumes have noticeably better performance than files stored in AUFS. Docker’s NAT also introduces overhead for workloads with high packet rates. These features represent a tradeoff between ease of management and performance and should be considered on a case-by-case basis.

          Run containers with host networking on the base filesystem and there is no difference. Our wrecking balls weigh the same as our sledgehammers.

          http://domino.research.ibm.com/library/cyberdig.nsf/papers/0929052195DD819C85257D2300681E7B/$File/rc25482.pdf
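
          Concretely, that amounts to something like this (image name and paths hypothetical): --net=host skips Docker’s NAT, and a bind-mounted volume bypasses AUFS.

            # hypothetical invocation: host networking plus a bind mount
            docker run --net=host -v /srv/app/data:/data my-app-image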

          1. 6

            The problem isn’t really RAM or CPU weight, though the article uses that aspect to get its catchy title. The problem is unnecessary complexity.

            Complexity is brittle, bug-prone, security-hole-prone, and imposes a high cognitive load.

            The container/VM shitshow is massively complex compared to just doing multi-tenancy right.

            Simpler solutions are nearly always better. Next best is to find a way to hide a lot of complexity and make it irrelevant (sweep it under the rug, then staple the rug to the floor).

            1. 1

              “Complexity is brittle, bug-prone, security-hole-prone, and imposes a high cognitive load.”

              “The container/VM shitshow is massively complex compared to just doing multi-tenancy right.”

              Citation needed. This article is proof that doing “multi-tenancy right” requires additional complexity too: designing a system where unprivileged users can open ports <1024. Doing “multi-tenancy right” also requires disk, memory, and CPU quotas and accounting (cgroups), process ID namespaces (otherwise you can spy on your colleagues), user ID namespaces so you can mount an arbitrary filesystem, etc, etc, etc.

              BSD jails have had all of these “complexities” for 20 years, and yet no one gripes about them. I suspect it’s just that Linux containers are new and people don’t like learning new things.

        1. 6

          Erm, so I disable priv ports. I start a web server on port 80. Little Timmy comes along and starts a web server on port 80. What happens now?

          1. 3

            Timmy’s call to bind() fails, because the port is already in use by you.
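
            A minimal illustration (Python; the port number is arbitrary):

              import socket

              you = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
              you.bind(("0.0.0.0", 8080))      # first bind wins the port

              timmy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
              timmy.bind(("0.0.0.0", 8080))    # raises OSError: [Errno 98] Address already in use (EADDRINUSE on Linux)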

            1. 4

              Then how is this actually useful for running multiple web servers on the same box? Wouldn’t it end up in a free-for-all, with the first user who starts up their WordPress install getting port 80, while the rest have to contend with another, non-standard port?

              1. 12

                What *nix really needs is the ability to assign users ownership of IP addresses. With IPv6 you could assign every machine a /96 and then map all UIDs onto IP space.

                This is probably a better idea than even getting rid of privileged ports. You can bind to a privileged port if you have rw access to the IP.

                The real issue here is that Unix has no permissions scheme for IPs the way it does for files, etc.
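
                A sketch of that UID-to-address mapping (Python; the /96 prefix is a made-up documentation address). The low 32 bits of a /96 are exactly a UID’s worth of space, so each user gets one stable address:

                  import ipaddress
                  import pwd  # standard library, Unix-only

                  MACHINE_PREFIX = ipaddress.IPv6Network("2001:db8:0:0:aaaa:bbbb::/96")  # example /96

                  def user_address(username: str) -> ipaddress.IPv6Address:
                      # prefix | uid: one address per user, stable for the life of the account
                      uid = pwd.getpwnam(username).pw_uid
                      return MACHINE_PREFIX.network_address + uid

                  # a user with uid 1000 -> IPv6Address('2001:db8::aaaa:bbbb:0:3e8')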

                1. 5

                  It’s not very much code to write a simple daemon that watches a directory of UNIX sockets, then binds to the port of the same name, forwarding all traffic. Like UNIX-programming-101-homework easy. One can certainly argue it’s a hack, but it’s possible, and it’s been possible for 20 years if that’s what people wanted. No kernel changes required.
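
                  A rough sketch of such a daemon (Python; the socket-directory layout is made up, and a real version would watch for new sockets with inotify instead of scanning once at startup). It runs with privilege, so users just drop a socket named after the port they want and never need privileged ports themselves:

                    import os, socket, threading

                    SOCK_DIR = "/var/run/user-ports"  # hypothetical: one UNIX socket per port, named "80", "443", ...

                    def pipe(src, dst):
                        # copy bytes until EOF, then close the other side
                        while True:
                            data = src.recv(4096)
                            if not data:
                                break
                            dst.sendall(data)
                        dst.close()

                    def serve(port, sock_path):
                        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
                        listener.bind(("0.0.0.0", port))  # the only step that needs privilege
                        listener.listen(128)
                        while True:
                            client, _ = listener.accept()
                            backend = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
                            backend.connect(sock_path)    # hand the traffic to the user's server
                            threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
                            threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

                    for name in os.listdir(SOCK_DIR):
                        if name.isdigit():
                            threading.Thread(target=serve,
                                             args=(int(name), os.path.join(SOCK_DIR, name)),
                                             daemon=True).start()

                    threading.Event().wait()  # keep the forwarder alive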

                  I think there’s a corollary to “necessity is the mother of invention”: if it hasn’t been invented, it’s not necessary. To oversimplify a bit.

              2. 2

                Sounds like Timmy needs a VM, so now I’m unclear on exactly how we’ve solved the energy crisis.

                1. [Comment removed by author]

                  1. 2

                    Well, what happens when I grab 10.0.0.2 too? And .3 and .4?

                    There needs to be an address broker at some level, and I’m not convinced it’s impossible for that broker to be nginx.conf proxying a dozen different IPs to a dozen different unix sockets. There’s a fairly obvious solution to the problem that doesn’t involve redesigning everything.
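
                    For instance, a hypothetical nginx.conf fragment along those lines, one server block per tenant address (addresses and socket paths made up):

                      upstream alice { server unix:/home/alice/run/www.sock; }
                      upstream bob   { server unix:/home/bob/run/www.sock; }

                      server {
                          listen 10.0.0.2:80;
                          location / { proxy_pass http://alice; }
                      }
                      server {
                          listen 10.0.0.3:80;
                          location / { proxy_pass http://bob; }
                      }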

                    So why then does AWS offer VMs instead of jamming a hundred users onto a single Linux image? Well, what if I want to run FreeBSD? A VM offers a nice abstraction that allows me to run a different operating system entirely. Now maybe this is an argument for exokernels and rump kernels and so forth, but I didn’t really see that being proposed.

                    1. [Comment removed by author]

                      1. 6

                        OK, sorry, didn’t mean to be argumentative. But it’s a really long article, so I could only keep some of it in my head, and it got a lot of upvotes, so I’m trying to mine out what the insights are. But don’t feel personally obligated to explain. :)

                        There seemed to be a metapoint that things are inefficient because we’re using some old design from another era and it’s obsolete. But I didn’t see much discussion of why we can’t keep the design we have and use the tools we have in a slightly better way. Like nginx.conf to multiplex. Shared web hosting used to be a thing, right?

                        1. 4

                          I feel the metapoint was the opposite. The author wanted to go back to the old way things were done, but simply allow users to have their own IP address in the same way they have their own home directory.

                          You can already add many IP addresses to a single machine in BSD and Linux. In Linux (I don’t know about BSD), you can even create virtual sub-interfaces that have their own info but reside on the same physical interface. The author wanted unix permissions on interfaces too: rwx = read, write, bind. So your hypothetical user Timmy would have /home/timmy and eth0:timmy, with rwx on /home/timmy and r-x on eth0:timmy. They would be able to read their IP, MAC, etc., and bind to it, but not change it.
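
                          The existing half of that is a one-liner on Linux today (address made up); the rwx-on-the-interface part is the hypothetical bit:

                            # give eth0 a second address on a labelled sub-interface
                            ip addr add 10.0.0.42/24 dev eth0 label eth0:timmy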

                          1. 2

                            Shared web hosting used to be a thing. I think people have realised that hosting a website means running code, one way or another, and traditional unix was never really suited to the idea of multiple people organizing code on the same machine: multiple users, yes, but unix is very much single-administrator.

                            More concretely, library packaging/versioning sucks: it’s astonishingly difficult to simply have multiple versions of a shared library installed and have different executables use the versions they specify. Very few (OS-native) packaging systems support installing a package per-user at all. Even something like running your website under a specific version of python is hard on shared hosting.

                            And OS-level packaging really hasn’t caught up with the Cambrian explosion of ways to do data storage: people have realised that traditional square-tables-and-SQL has a lot of deficiencies, but right now that translates into everyone and their dog writing their own storage engine. No doubt it will shake out and consolidate eventually, but for now an account on the system MySQL doesn’t cut it, and the system has no mechanism in place for offering the user persistence-service-of-the-week.

                            Personal view: traditional unix shared too much - when resources were very tight and security not very important it made sense to optimize for efficiency over isolation, but now the opposite is true. I see unikernels on a hypervisor as, in many ways, processes-on-a-shared-OS done right, and something like Qubes - isolation by default, sharing and communication only when explicitly asked for, and legacy compatibility via VMs - as the way forward.

                            1. 1

                              Isn’t this exactly the problem solved by virtualenv and such? I’ve never found it especially difficult to install my own software. There was a big nullprogram post about doing exactly this recently.

                              There are some challenges for sure, but I get the sense that people just threw their hands in the air, decided to docker everything, and allowed the situation to decay.

                              1. 1

                                virtualenv has never worked great: a lot of Python libraries are bindings to system C libraries and depend on those being installed at the correct version. And there’s a bunch of minor package-specific fiddling because running in virtualenv is slightly different from running on native python.

                                People reached for the sledgehammer of docker because it solved their problem, because fundamentally its UX is a lot nicer than virtualenv’s. Inefficient but reliable beats hand-tuning.

                            2. [Comment removed by author]

                              1. 1

                                You can’t quite use namespaces that way. Net namespaces are attached to a process group, not a user. But doing something like I described would truly assign one IP address to a user. That user would have that IP address always. They would ssh to it, everything they started would bind to it by default, and so on. It would be their home IP in the same way their home directory is theirs.

                                1. 1

                                    Docker is also mentioned as bloat because of the per-container image.

                                  Container and layer sprawl can be real. I can’t deny that :)

                                  But you have two options to mitigate that:

                                    1. Build your Dockerfile FROM scratch and copy in static binaries. If you’re doing C or Go, this works very well (a sketch follows this list).

                                    2. Pick a common root: Alpine Linux (FROM alpine) is popular since it is fairly small. Once that is fetched, any container that references it will reuse it, so your twenty containers will not all download the same Linux system.
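
                                    A sketch of option 1 (binary name hypothetical): the whole image is your executable, with no OS layers at all:

                                      # minimal image: one statically linked binary, nothing else
                                      FROM scratch
                                      COPY myserver /myserver
                                      ENTRYPOINT ["/myserver"]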

                        2. 1

                          They have different IP addresses. There must be some way to use multiple addresses on the same Linux install, and if there isn’t, it would be easy to add.

                      2. 2

                        From the article: “network service multi-tenancy.” What does that even mean? Good question. I think that in his ideal world we’d be using network namespaces and assigning more IPs per machine.

                        Honestly it sounds like despite his concerns about container overhead, his proposal is basically to use containers/namespaces. Not sure why he thinks they are “bloated”.

                        1. 3

                          A few numbers would certainly make the overhead argument more concrete. Every VM has its own kernel and init and libc. So that’s a few dozen megabytes? But a drop in the bucket compared to the hundreds of megabytes used by my clojure web app. So if I’m provisioning my super server with lots of user accounts, I can get away with giving each user 960MB instead of 1024MB like I would for a VM? Is that roughly the kind of savings we’re talking about?