1. 11

    1. 28

      Staging being a fast feedback loop? Where in heaven? TDD and/or good local tools can simulate whole flows on a local machine with high confidence and even faster response times.

      We should be investing in removing staging altogether instead.

      1. 5

        I’ve seen it both ways, and can explain where in heaven.

        Running interactively against a server is perfect when you can’t really formulate the goal. When you know it when you see it, but can’t write a hard test for it. In that case, the best way to work is playing around with the server, changing things, trying again and again, until the server feels pleasant, feels right, is fun, whatever. Games have this a lot, because fun is so important in games.

        Running tests is perfect when you can formulate “right” rigidly before you see it.

        Both can occur in the same project. My side project at the moment had a few phases where “right” was about what progression of rendering felt calmest/fastest during rendering, but most of the code has hard tests.

    2. 20

      We’ve become mainframe developers: we all have powerful computers now, but somehow we still rely on a shared set of limited resources by default.

    3. 20

      you’ll have to break your monolith into microservices so it scales more efficiently or so that multiple teams can deploy software without stepping on each other’s toes

      Are we sure of this? Or do we just really, really want it to be true, because the technical challenges of microservices+cloud+k8s+etc. are more fun for us as developers? When decision-makers don’t understand the technology, we can convince them of anything, and that leads to many startups failing because of the cash burn we introduced.

      A monolith can be scalable: a web application built with a classic framework like Rails/Django/Laravel/Spring Boot is perfectly scalable, and if the database behind it is PostgreSQL, your traffic must be very, very high (and/or your SQL queries very poorly written) for it to become a bottleneck.

      A monolith can be modular, so that features are added almost independently, with minor rebasing issues at the end of a development iteration.

      And of course, a monolith is way simpler to reason about, way more robust natively, and… it runs on localhost.

      1. 2

        Yeah, the author went from one extreme to another in a few cases. As you mention, there are lots of other options to try before microservices. There are also lots of ways to do a “booked big machine” style staging without actually moving your whole dev environment there. You can simulate lambdas, either through the Serverless Framework or a few others. You can also emulate quite a few AWS services as long as you’re not trying to test the IAM policies.

        It’s like the author saw one bad approach and then decided that absolutely every aspect needs to be fixed. In reality, you choose what you want to compromise on.

      2. 1

        And of course, a monolith is way simpler to reason about, way more robust natively, and… it runs on localhost.

        To quote, “Are we sure of this?” :P Because I am very much not…

        Something I see in monoliths a lot is that developers simply cannot resist the temptation to build more and more abstractions across their codebase. They couple so much together. This is far worse in languages like Java that strongly encourage inheritance and coupling interfaces to implementations. It creates a big mess of code that in theory could all be nice and separated but in practice just is not. I have seen this far too often. Perhaps there is some theoretically beautiful monolith of nice, isolated modules, but I haven’t run into it.

        As for robust, that’s perhaps a bit vague. If you mean reliable, I’ll disagree. Isolating the state of your services is a significant win for reliability. You can reason about your state more easily because two microservices cannot share in-memory state: they have physically different address spaces, so they have to communicate all state via messaging. The exception is a shared db, but that’s certainly an antipattern. Microservices also let you incrementally update pieces of your system, for example by doing a canary rollout of one component without modifying any other.

        Microservices can run on localhost just fine.

    4. 9

      as they can’t simulate Lambdas, S3 buckets, or SQS queues.

      I’ve got to say, writing good-enough replicas for those seems way more valuable. Is it not possible to mock these things?

      I guess I like my Big Nearby VM. I’m even interested in moving my IDE to a VM, with something like JetBrains Gateway.

      1. 4

        I’ve not used the AWS version but the Azure equivalents of all of these have local simulators. I wrote an Azure Function for the first time a few months back for the demo that I have at CyberUK. I wanted users to be able to write some JavaScript code in the web browser that would be compiled to bytecode and then shipped to the device (FPGA simulation of a CHERIoT core). The FPGA connected to the Azure IoT Hub (basically an MQTT broker). The browser submitted the code to an Azure Function that compiled it and, if it compiled without errors, pushed it to the MQTT hub. I developed it entirely locally using the functions simulator and we also did a version of the demo at a CHERI conference with the function running on the demo machine. The compile-test cycle for local was a few seconds, for the deployed version it was a few minutes. Azure should definitely fix the latter because it’s just embarrassingly bad, but it’s hard to beat a flow that is ‘restart a local process’ for immediate debugging.

      2. 2

        There’s localstack but covering the entire API surface of these projects is really hard, especially since they integrate with each other. localstack is also really slow in my experience and definitely has bugs - it is not ideal at all :\

        I have done some local mocking for things like Kafka where I just have a:

        struct Producer<E: Event> {
            partitions: [Partition; 200],

        etc. Then I can pretend I have Kafka when I don’t. It won’t test ACLs, but tbh ACLs break immediately on a canary deployment, so as long as you have optimized that flow I think it’s OK. Or just set up Kafka in Docker, which is also not that big of a deal.
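
        A minimal sketch of what such a stand-in might look like once filled out. Everything here is an assumption for illustration: the Event trait, the key-hash partitioning, the send/read methods, and the OrderPlaced type are hypothetical, and a Vec replaces the fixed 200-element array so the partition count can be chosen per test. It is not a Kafka client and does not model consumer groups, ACLs, or retention.

        ```rust
        use std::collections::hash_map::DefaultHasher;
        use std::hash::{Hash, Hasher};

        /// Hypothetical event trait: anything that carries a partitioning key.
        pub trait Event: Clone {
            fn key(&self) -> &str;
        }

        /// One in-memory partition: an append-only log of events.
        pub struct Partition<E> {
            log: Vec<E>,
        }

        /// In-memory stand-in for a producer. Not a real Kafka client.
        pub struct Producer<E: Event> {
            partitions: Vec<Partition<E>>,
        }

        impl<E: Event> Producer<E> {
            pub fn new(partition_count: usize) -> Self {
                assert!(partition_count > 0, "need at least one partition");
                let partitions = (0..partition_count)
                    .map(|_| Partition { log: Vec::new() })
                    .collect();
                Producer { partitions }
            }

            /// Appends the event to the partition chosen by hashing its key.
            /// Returns (partition index, offset within that partition).
            pub fn send(&mut self, event: E) -> (usize, usize) {
                let mut hasher = DefaultHasher::new();
                event.key().hash(&mut hasher);
                let index = (hasher.finish() % self.partitions.len() as u64) as usize;
                let log = &mut self.partitions[index].log;
                log.push(event);
                (index, log.len() - 1)
            }

            /// Reads back everything written to one partition, in write order.
            pub fn read(&self, index: usize) -> &[E] {
                &self.partitions[index].log
            }
        }

        /// Hypothetical event type used only to exercise the mock.
        #[derive(Clone, Debug, PartialEq)]
        pub struct OrderPlaced {
            pub order_id: String,
        }

        impl Event for OrderPlaced {
            fn key(&self) -> &str {
                &self.order_id
            }
        }
        ```

        Events with the same key land in the same partition with increasing offsets, which is usually the only ordering property a unit test needs from the real thing.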

    5. 5

      Yes, and this is a terrible loss. I’m fighting against the need to develop on remote machines as much as I can.

      To me the key is that having a 100% exact environment is not necessary for local development. It’s necessary for integration testing and release process, but that can happen in CI.

      Unit testing can (and often should) use mocks for external dependencies. Once you’re able to mock dependencies, running locally with mocked dependencies becomes possible too.
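
      A rough sketch of that idea, assuming a hypothetical ObjectStore trait standing in for something like S3. The trait, the InMemoryStore mock, and save_report are all made up for illustration. The point is only that code written against the trait runs the same way in a unit test, in a local run, and against a real backend.

      ```rust
      use std::collections::HashMap;

      /// Hypothetical abstraction over an external object store.
      pub trait ObjectStore {
          fn put(&mut self, key: &str, bytes: Vec<u8>);
          fn get(&self, key: &str) -> Option<Vec<u8>>;
      }

      /// In-memory mock, usable both in unit tests and in local runs.
      #[derive(Default)]
      pub struct InMemoryStore {
          objects: HashMap<String, Vec<u8>>,
      }

      impl ObjectStore for InMemoryStore {
          fn put(&mut self, key: &str, bytes: Vec<u8>) {
              self.objects.insert(key.to_string(), bytes);
          }

          fn get(&self, key: &str) -> Option<Vec<u8>> {
              self.objects.get(key).cloned()
          }
      }

      /// Business logic depends only on the trait, so it cannot tell
      /// whether it is talking to the mock or to a real backend.
      pub fn save_report<S: ObjectStore>(store: &mut S, id: u32, body: &str) -> String {
          let key = format!("reports/{}.txt", id);
          store.put(&key, body.as_bytes().to_vec());
          key
      }
      ```

      A production implementation of the same trait would wrap the real client. The exact-environment checks then stay in CI, where they belong.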

    6. 4

      As that happens, there will be a point of no return in which engineers can’t run the whole system on their machines anymore — at least not in a reliable way.

      It really shouldn’t be the case that I need to run the whole system, or even a subset of the whole system, on my laptop, in order to deliver business value.

    7. 4

      I’m guessing docker compose isn’t an option here… you have to have some pretty serious infrastructure to not run on a high spec laptop. Or infrastructure you don’t have containers for.

      1. 3

        I think nomad really gives us an option here of running the same thing in prod as on our laptop. Docker compose feels like the wrong hammer for a prod environment.

        1. 2

          Okay, so install nomad on your laptop instead of using docker compose. It’s more about the concept than the specific tool. :)

      2. 2

        An interesting direction we’ve taken for our use case is to generate a single container with a bunch of services in it managed by process-compose. We use devenv to define the set of services needed for development as a collection of Nix modules and it sets up a process-compose configuration that will run them. We then create a docker image with this process-compose setup. The container even has an nginx in it reverse-proxying various routes on port 8080 to the internal services, as well as a main page (pieced together from the modules) for the whole container documenting the contents, their versions, etc.

        1. 2

          process-compose definitely looks interesting, but I can’t help but wonder if running systemd inside the container would have been sufficient. :)

          1. 1

            interesting, but I can’t help but wonder if running systemd inside the container would have been sufficient. :)

            I’ve never tried running systemd inside a container or seen anyone else do that before, but that’s definitely an interesting idea, particularly since that would allow you to reuse a lot of the NixOS module definitions. That said, process-compose is a really nice fit for this application. When you start it, it shows a simple text UI with the status of all the services and you can scroll through them and see the logs. It also has an HTTP API that exposes the same functionality as that text UI, making it very convenient to manage the environment in tests etc.

    8. 3

      you’ll have to break your monolith into microservices so it scales more efficiently or so that multiple teams can deploy software without stepping on each other’s toes.

      As that happens, there will be a point of no return in which engineers can’t run the whole system on their machines anymore — at least not in a reliable way.

      That’s not my experience. Running microservices locally is pretty straightforward. What’s hard is emulating proprietary infrastructure like SQS. S3, on the other hand, is trivial with minio for 99% of workloads.

      First, engineers create huge Google Docs or Markdown files with complex instructions for running the software reliably.

      Other than a one-time setup phase, we boiled it down to “make up” at my company, which brought up nomad, etc. There were problems, but problems that were avoidable in retrospect.

      1. We were using things like SQS early. Localstack just wasn’t a great replacement for a number of reasons. We also used lambdas and eventually moved away. The less AWS-specific our stack was, the better it worked locally.

      2. We used Chromebooks and the container environment had some limitations that, for our niche use case (of running Firecracker VMs), caused some “if local do this, if not do that”.

      Knowing what I know now I could easily avoid those issues.

      As more people share staging, they will start stepping on each other’s toes.

      o god yes. It is awful. Death to staging environments perhaps? At this point I’m more fond of:

      a) Writing idempotent services

      b) Making rollbacks trivial and fast

      c) Canaries, fast fails, etc.
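
      For (a), a minimal sketch of one common approach, assuming the caller sends an idempotency key with every request. PaymentService and its fields are hypothetical, and a real service would keep the key table in durable storage rather than in memory.

      ```rust
      use std::collections::HashMap;

      /// Hypothetical handler that remembers the result of each request by
      /// its idempotency key, so a retry returns the stored result.
      #[derive(Default)]
      pub struct PaymentService {
          processed: HashMap<String, u64>, // idempotency key -> charge id
          next_charge_id: u64,
          pub charges_made: u32,
      }

      impl PaymentService {
          pub fn charge(&mut self, idempotency_key: &str, _amount_cents: u64) -> u64 {
              if let Some(&charge_id) = self.processed.get(idempotency_key) {
                  return charge_id; // replayed request: no second side effect
              }
              self.next_charge_id += 1;
              self.charges_made += 1;
              self.processed
                  .insert(idempotency_key.to_string(), self.next_charge_id);
              self.next_charge_id
          }
      }
      ```

      With this shape, a retried deploy or a replayed message cannot double-charge, which is what makes rolling forward and back in production less scary.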

      That’s when the CTO will step in to solve the problem and give each engineer their environment in the cloud. That’s what companies like Shopify, Uber, and Stripe already do.

      Us too. It wasn’t worth it ultimately. You always make a choice between “fresh resources that you can trust, but very slow setup” and “old resources that you want to trust, but very fast setup”. Not good!

      My solution at this point is to enforce a local development environment at almost all costs. Buying a 5000 dollar laptop per dev so that they have the 64GB of RAM or whatever is not that big of a deal for a company. At the end of the day, when space is tight, consider optimizing your system. 64GB of RAM is a fucking lot for a dev environment and I wouldn’t be surprised to see laptops supporting 128GB in the next few years.

      This means avoiding proprietary “cloud only” services unless they support some local version, like DynamoDB does, or like MinIO for S3. It means that every service needs to be written efficiently. It means that on day 1 you want a way to look at your traces on a local machine.
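
      In practice that usually comes down to making every cloud endpoint overridable. A small sketch, assuming a hypothetical s3_endpoint helper whose override value would come from config or an environment variable. The localhost URL in the note below assumes MinIO on its default API port, 9000.

      ```rust
      /// Hypothetical helper: use the local override when one is set,
      /// otherwise fall back to the regional AWS endpoint.
      pub fn s3_endpoint(override_url: Option<&str>, region: &str) -> String {
          match override_url {
              Some(url) if !url.is_empty() => url.to_string(),
              _ => format!("https://s3.{}.amazonaws.com", region),
          }
      }
      ```

      Locally you would pass something like Some("http://localhost:9000"), and in production None.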

    9. 1

      Luckily the world isn’t quite so broken yet that software has to be a centralized cloud service to be successful.

    10. 1

      We use Nix devShells combined with arion for fully reproducible developer environments that are as close to production as possible. With the great performance of Cachix, devs can instantly spin up an entire prod env on their laptop. The one thing this setup does not support is AWS Lambdas, so we’ve decided to just not use those.