1. 81
  1. 39

    This is the most sane article on modern tools I’ve read in ages. Neglecting complexity is the industry-wide disease, and the article explicitly talks about it.

    1. 25

      Our industry would be a lot better if there were more stories of people running into real issues and having to scale–not self-inflicted ones. I don’t see enough of “Here’s where our response times and query times blew up, here’s why we couldn’t just buy a bigger machine, here’s the service architecture we had and why that was unchangeable, here’s where our hand was forced.”

      There’s not enough clean data and anecdata to properly educate the next generation, and instead we end up with weird stories like Hadoop being beaten by basic shell knowledge or the famous McIlroy/Knuth example.

      Cynically, one might note that a common theme is that fewer engineers with more knowledge and better analysis outperform large teams of less experienced engineers with shinier-but-less-understood tooling and problems, but then we start running into the deep soul-searching of how our industry career paths work and how we’re compensated and how “good” we all really are at our jobs…and that just leads to madness.

      1. 9

        Knuth example:

        “What people remember about his review is that McIlroy wrote a six-command shell pipeline that was a complete (and bug-free) replacement for Knuth’s 10+ pages of Pascal. Here’s the script, with each command given its own line:”

        Although I get and agree with the article’s gist, the second program is a super apples-to-oranges comparison. Knuth custom-made his functions, if I’m understanding the article. He did it in 10 pages of neat Pascal to ensure it’s all done correctly. The alternative was 6 lines of shell that invoked other programs. Their source is used in the solution but not counted. Problems in his source were counted, but problems in those dependencies’ source over time weren’t mentioned at all. Either the UNIX utilities were flawless, neat code or their flaws were ignored. An apples-to-apples comparison would be the total source, style, correctness, and effort of the full program (script plus source of dependencies) vs Knuth’s full program. Knuth might look better then.
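
        For context, the task both programs solve is printing the k most frequent words in a text. A minimal Python sketch of that task follows; it is neither McIlroy’s pipeline nor Knuth’s program, and the function name `top_words` and the "runs of ASCII letters" definition of a word are assumptions made for illustration.

```python
import re
from collections import Counter


def top_words(text: str, k: int) -> list[tuple[str, int]]:
    """Return the k most frequent words in text, most frequent first."""
    # Treat maximal runs of ASCII letters as words, ignoring case.
    words = re.findall(r"[a-z]+", text.lower())
    # Counter.most_common orders entries by descending count.
    return Counter(words).most_common(k)


if __name__ == "__main__":
    sample = "the cat and the hat and the bat"
    for word, count in top_words(sample, 2):
        print(count, word)
```

        The sketch itself leans on library code (`re`, `Counter`) that is not counted in its length, which is exactly the accounting question raised here.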

        1. 15

          I’ll disagree with you here–the shell was linking together programs (subroutines) the same as you would in a language–say, Pascal–with standard library routines and components. Saying that the shell script doesn’t count because it used other programs feels a little bit like saying the Pascal program doesn’t count because the developer isn’t manually pushing around stack frames and setting link registers.

          Anyways, your concern and my reply above is exactly what I’m talking about: we only have a handful of stories like the above, and they don’t even provide clear guidance for practices. Some questions:

          • Are we to conclude that McIlroy was correct for using smaller programs that functioned about as quickly and could be maintained by others?
          • Are we to conclude that Knuth’s rigor is better, even if his program took much longer to develop?
          • Does the fact that Knuth is clearly the superior computer scientist also give him superior engineering status?
          • Should all of us have the familiarity with basic tools that McIlroy has so we don’t have to reinvent bespoke wheels as Knuth did?

          All of those are valid questions and interpretations of the source story–which is why I classified it as “weird”.

          The story underscores our own profession’s lack of understanding.

          1. 3

            “Saying that the shell script doesn’t count because it used other programs feels a little bit like saying the Pascal program doesn’t count because the developer isn’t manually pushing around stack frames and setting link registers.”

            I think that takes it too far. Most developers don’t expect each other to write compilers or assembly. Experts usually do that in a way the average developer can use. If the code is custom and aiming for efficiency, they use the standard, low-level language of the platform. That’s C for UNIX. Pascal is a C alternative. So, I’d have compared the C implementation of the shell commands plus the shell script to the Pascal implementation. That’s pretty fair.

            “All of those are valid questions and interpretations of the source story–which is why I classified it as “weird”. The story underscores our own profession’s lack of understanding.”

            I agree with that and your questions being good examples of it.

            1. 2

              I think these are very interesting questions. I’d note that indeed you would be comparing apples to oranges, given that McIlroy’s shell commands are one abstraction level higher; indeed the shell is linking programs together like subroutines, but those subroutines/programs are themselves written using the standard library as well. If Knuth had used a Pascal helper library which included subroutines equivalent to each of the existing shell programs, that program would not have been (much) longer.

              So yes, most often it is the right approach to re-use existing libraries and components (otherwise, we’d all still be hand-crafting machine code, basically building skyscrapers out of toothpicks). The best engineers understand the tower of abstraction as far down as possible, which means they know where the fault lines are, so they realize when the standard tools suffice, and when it is necessary to write your own or dive into the source to make existing tools suitable. It’s a trade off that has to be made, depending on the project and how important the component is to it.

              I think you end up with even deeper and harder to understand abstractions that try to paper over the flaws in lower level tools if you avoid rewrites at all costs, which is (unfortunately) something of a broader problem in our profession. It doesn’t help that all the “good practice” guides tell you to re-use, re-use, re-use existing code and never write your own if something already exists if it’s even remotely similar to what you need. Of course, market pressure to perform and whip up stuff as quickly as you can in as low a budget and as little time as possible doesn’t help. I’m definitely guilty myself of using square pegs to fit round holes just to save time and costs, and I’m sure the vast majority of us are. On the other hand, without that pressure there would be so many examples of modern technology that we take for granted which would not exist.

              I guess this is why programming in practice is more of an art or craft than a science. It’s also what makes it challenging, and we’ll probably be arguing about all of this decades from now :)

            2. 2

              The counterexample to the oversimplified “what people remember” is also in Programming Pearls.

              Here’s the story in a nutshell. With a good idea, some powerful tools, and a free afternoon, Steve Johnson built a useful spelling checker in six lines of code. Assured that the project was worthy of substantial effort, a few years later Doug McIlroy spent several months engineering a great program.


              1. 2

                That’s really only relevant from a security/correctness perspective.

                The important part in most environments is that the shell pipeline takes five minutes to write and another ten to debug, and most future modifications are going to take a similar amount of effort.

                Doing it Knuth’s way would take me all day, and I have plenty of other work to do.

                1. 1

                  That’s true. It’s better to use the quick-and-dirty route if incorrect results are OK.

                  1. 1

                    Well, in this case it was Knuth’s program that had bugs. So, the slow-and-dirty route.

                    1. 1

                      The UNIX utilities have had plenty of bugs. The ones used in the pipeline may or may not have had them, too. The fair comparison would be the bugs in them, and their severity, up to that point vs Knuth’s clean-slate version in Pascal.

              2. 4

                and that just leads to madness.

                Or disenchantment. That’s where I am.

                1. 2

                  I knew about the Knuth/McIlroy discussion but I never read any of McIlroy’s actual comments from it. They’re really good. Kudos for linking to that article.

                  Makes me want to find a full copy of the journal in my university’s library.

              3. 37

                What about dependencies? If you use python or ruby you’re going to have to install them on the server.

                How much of the appeal of containerization can be boiled directly down to Python/Ruby being catastrophically bad at handling deploying an application and all its dependencies together?

                1. 6

                  I feel like this is an underrated point: compiling something down to a static binary and just plopping it on a server seems pretty straightforward. The arguments about upgrades and security and whatnot fail for source-based packages anyway (looking at you, npm).

                  1. 10

                    It doesn’t really need to be a static binary; if you have a self-contained tarball the extra step of tar xzf really isn’t so bad. It just needs to not be the mess of bundler/virtualenv/whatever.
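
                    A minimal sketch of that tarball round trip, using Python’s standard `shutil` in place of the `tar` command line. The function name `pack_and_unpack` and the archive name `release` are hypothetical, and a real deployment would copy the archive to the server between the two steps.

```python
import shutil
import tempfile
from pathlib import Path


def pack_and_unpack(app_dir: Path, deploy_dir: Path) -> Path:
    """Archive app_dir as a .tar.gz, then extract it under deploy_dir."""
    deploy_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as staging:
        # make_archive appends ".tar.gz" itself and returns the archive path.
        archive = shutil.make_archive(
            str(Path(staging) / "release"), "gztar", root_dir=str(app_dir)
        )
        # This is the step that `tar xzf` performs on the server.
        shutil.unpack_archive(archive, str(deploy_dir))
    return deploy_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app = Path(tmp) / "app"
        app.mkdir()
        (app / "main.txt").write_text("hello\n")
        target = pack_and_unpack(app, Path(tmp) / "deployed")
        print((target / "main.txt").read_text(), end="")
```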

                    1. 1

                      mess of bundler/virtualenv/whatever

                      virtualenv though is all about producing a self-contained directory that you can make a tarball of??

                      1. 4

                        Kind of. It has to be untarred to a directory with precisely the same name or it won’t work. And hilariously enough, the --relocatable flag just plain doesn’t work.

                        1. 2

                          The thing that trips me up is that it requires a shell to work. I end up fighting with systemd to “activate” the VirtualEnv because I can’t make source bin/activate work inside a bash -c invocation, or I can’t figure out if it’s in the right working directory, or something seemingly mundane like that.

                          And god forbid I should ever forget to activate it and Pip spews stuff all over my system. Then I have no idea what I can clean up and what’s depended on by something else/managed by dpkg/etc.

                          1. 4

                            No, you don’t need to activate the environment, this is a misconception I also had before. Instead, you can simply call venv/bin/python script.py or venv/bin/pip install foo which is what I’m doing now.
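
                            A small sketch of that point, using the standard-library `venv` module as a stand-in for virtualenv (an assumption; the two behave alike here). It creates a throwaway environment and runs its interpreter by path, with no `activate` step; the helper names are hypothetical.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the environment's own interpreter (layout differs on Windows)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def prefix_seen_by(env_dir: Path) -> str:
    """Run the environment's interpreter directly and report its sys.prefix."""
    result = subprocess.run(
        [str(venv_python(env_dir)), "-c", "import sys; print(sys.prefix)"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        # with_pip=False keeps the demo fast and offline.
        venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
        # Prints the environment directory, not the system prefix.
        print(prefix_seen_by(env_dir))
```

                            The same idea applies to a systemd unit: point the command at the environment’s interpreter by absolute path instead of sourcing `bin/activate` in a shell.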

                          2. 1

                            This is only half of the story because you still need a recent/compatible python interpreter on the target server.

                        2. 8

                          This is 90% of what I like about working with golang.

                          1. 1

                            Sorry, I’m a little lost on what you’re saying about source-based packages. Can you expand?

                            1. 2

                              The arguments I’ve seen against static linking are things like you’ll get security updates etc through shared dynamic libs, or that the size will be gigantic because you’re including all your dependencies in the binary, but with node_packages or bundler etc you’ll end up with the exact same thing anyway.

                              Not digging on that mode, just that it has the same downsides of static linking, without the ease of deployment upsides.

                              EDIT: full disclosure I’m a devops newb, and would much prefer software never left my development machine :D

                              1. 3

                                and would much prefer software never left my development machine

                                Oh god that would be great.

                          2. 2

                            It was most of the reason we started using containers at work a couple of years back.

                            1. 2

                              Working with large C++ services (for example in image processing with OpenCV/FFmpeg/…) is also a pain in the ass because of dynamic library dependencies. You start fighting with package versions, and each time you want to upgrade anything you’re in a constant struggle.

                              1. 1


                                And if you’re unlucky and your distro is affected by the libav fiasco, good luck.

                              2. 2

                                Yeah, dependency locking hasn’t been a (popular) thing in the Python world until pipenv, but honestly I never had any problems with… any language package manager.

                                I guess some of the appeal can be boiled down to depending on system-level libraries like imagemagick and whatnot.

                                1. 3

                                  Dependency locking really isn’t a sufficient solution. Firstly, you almost certainly don’t want your production machines all going out and grabbing their dependencies from the internet. And second, as soon as you use e.g. a python module with a C extension you need to pull in all sorts of development tooling that can’t even be expressed in the pipfile or whatever it is.

                                2. 1

                                  you can add node.js to that list

                                  1. 1

                                    A Node.js app, including node_modules, can be tarred up locally, transferred to a server, and untarred, and it will generally work fine no matter where you put it (assuming the Node version on the server is close enough to what you’re using locally). Node/npm does what VirtualEnv does, but by default. (Note if you have native modules you’ll need to npm rebuild but that’s pretty easy too… usually.)

                                    I will freely admit that npm has other problems, but I think this aspect is actually a strength. Personally I just npm install -g my deployments which is also pretty nice, everything is self-contained except for a symlink in /usr/bin. I can certainly understand not wanting to do that in a more formal production environment but for just my personal server it usually works great.

                                  2. 1

                                    Absolutely but it’s not just Ruby/Python. Custom RPM/DEB packages are ridiculously obtuse and difficult to build and distribute. fpm is the only tool that makes it possible. Dockerfiles and images are a breeze by comparison.

                                  3. 20

                                    I like that the author disagrees politely, and works from the assumption that he’s coming at the problem from a different place. It’s a well-taken point that issues of scale and configuration management are important, but for many side projects and even full-fledged startups, those considerations might be premature. Sometimes your energy might be better spent building product instead of infrastructure.

                                    1. 13

                                      The original article also took some very basic questions like “How do you deploy your application? Just rsync it to the server?” and directly presented Kubernetes as the answer. That’s literally using a sword when you could use a knife.

                                      I like how this post handles those questions instead, presenting simpler answers.

                                      I somehow think the original article was a very good introduction to Kubernetes in general, but it couldn’t make the case that Kubernetes is the best choice to deploy a side-project.

                                      1. 8

                                        Glad I wasn’t the only person who read that article and thought that!

                                        Generally if deployment involves more steps than Heroku, I don’t want to do it. Keeping packages up to date on my production server? Ugh. Given that my personal time is 10% of what it used to be, the Heroku premium still has me coming out way ahead. (Lambda would be a good possibility as well!)

                                        1. 1

                                          That’s why I like dokku. It takes a small amount more setup initially but deployment is exactly as easy as Heroku. There’s a continuum between “dev” and “ops” in devops, with Heroku being at the far “dev” end, and dokku is just one step down from that.

                                          1. 1

                                            Given that my personal time is 10% of what it used to be

                                            Children? ;-)

                                            1. 3

                                              Depending on the ages, that can be too generous by an order or two of magnitude. Ask me how I know!

                                              1. 1

                                                Looks like I’m lucky eventually :-)

                                          2. 4

                                            I agree with the conclusion at the end: the best part about docker and kubernetes is that the configuration is all written out. You don’t need to remember anything. Everything can be version controlled. When you come back to it in 6 months you don’t have to struggle to remember how you installed x. Otherwise, it’s a pain in the ass.

                                            1. 3

                                              An alternative: just write a nixos config for your service if you’re concerned about your configuration producing an artifact for later.

                                              1. 2

                                                I like just using ansible for deploying everything, myself.

                                                And then there’s this mysterious thing called “documentation” that I hear helps too, when returning to an old project after six months or so.

                                                1. 3

                                                  That sounds like writing. I think you’re trying to trick me into writing.

                                                2. 1

                                                  Terrible question (because the answer varies), but how long does it take to get up and running using nixos when you don’t know nixos?

                                                  1. 4

                                                    It took me a few days of struggling, but that is because I

                                                    • was too stupid to understand that you can’t compile Haskell on MacOS and have it run on Linux
                                                    • didn’t understand how/why I should pin the nixpkgs version
                                                    • am generally quite stupid

                                                    Overall, I’m super happy with NixOps which in my case provisions EC2 machines with NixOS and deploys my Haskell programs to them, after compiling said programs inside a NixOS Docker container (because the architectures must match).

                                                    I wrote about this a few years ago so whatever I wrote is likely to be somewhat outdated. HTH.

                                                    1. 3

                                                      My problem is, it’s a giant PITA to get anything to talk to stuff inside the nix ecosystem. You end up fighting dependency hell all over again if you try to talk to stuff that’s packaged up inside /nix. So if you go nix you have to go 100% nix; there is no “Oh, I’ll use nix for these 10 base dependencies” sort of thing. It’s an all-or-nothing proposition in my experience.

                                                      Also I’ve never been able to get my own stuff to live inside of nix, there is basically zero useful documentation I’ve found about how to keep my private code within nix (but use nixpkgs for dependencies) without it living inside of the nixpkgs ecosystem. Some stuff is either not ready or will never get sourced publicly for whatever reason.

                                                      The ideas behind nix are pretty great, but getting anything to interact with nix has been nothing but hell.

                                                3. 3

                                                  I think that most projects (personal or corporate) are seduced by one big feature of Kubernetes which is … (re)scheduling. Which helps with:

                                                  • If one node of the cluster is going down, the services are launched on some other node.
                                                  • Just specify “declaratively” the resources you need (CPU/RAM but also disk amounts and speed or GPU).
                                                  • If your project grows (scaling), add more nodes, change the scales of pods and you can go back to architectural problems.
                                                  • “Cloud agnostic”, you don’t care on which cloud you are (aws/google/azure/…), you are on Kubernetes.

                                                  Although, I do agree that most projects don’t need that at all, and the point about learning is addressed, which is a great thing. Companies could probably go pretty far with systemd + Ansible + tar archives releases, but sadly, most of us prefer to try the latest things all the time and finish with graphql on kubernetes without really knowing why…
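
                                                  To make the (re)scheduling idea concrete, here is a toy first-fit sketch. It is not Kubernetes’ actual scheduler; the `Node`, `schedule`, and `fail_node` names and the single CPU dimension are simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu: int  # total CPU units the node offers
    pods: dict[str, int] = field(default_factory=dict)  # pod name -> CPU request

    def free(self) -> int:
        """CPU units not yet claimed by pods on this node."""
        return self.cpu - sum(self.pods.values())


def schedule(nodes: list[Node], pod: str, request: int) -> str:
    """Place pod on the first node with enough free CPU (first-fit)."""
    for node in nodes:
        if node.free() >= request:
            node.pods[pod] = request
            return node.name
    raise RuntimeError(f"no node can fit {pod!r}")


def fail_node(nodes: list[Node], name: str) -> dict[str, str]:
    """Drop a node from the cluster and re-place its pods on the survivors."""
    dead = next(n for n in nodes if n.name == name)
    nodes.remove(dead)
    return {pod: schedule(nodes, pod, req) for pod, req in dead.pods.items()}


if __name__ == "__main__":
    cluster = [Node("a", cpu=4), Node("b", cpu=6)]
    schedule(cluster, "web", 2)   # fits on node "a"
    schedule(cluster, "db", 3)    # "a" has only 2 free, so it lands on "b"
    print(fail_node(cluster, "a"))  # "web" is moved to "b"
```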

                                                  1. 4

                                                    For most personal projects, there’s already an implicit declarative specification of needed resources: the cheapest VPS available / whatever’s in the free tier if that exists :)

                                                    you don’t care on which cloud you are (aws/google/azure/…), you are on Kubernetes

                                                    That’s still a hell of a dependency compared to “you are on Unix”.

                                                    1. 1

                                                      That’s not completely true. The differences between Ubuntu and Debian are subtle, and those with Fedora are pretty big, not to mention RHEL. The concepts are similar, but I wouldn’t trust a Debian expert to correctly set up a RHEL platform.

                                                  2. 2

                                                    As I always say—the best code is no code. Applies to infra in this case.

                                                    But if you want to run a k8s cluster on the side for the purpose of learning about k8s, then that makes total sense to me.