1. 66

  2. 53

    This article is a cautionary tale about dogma and religious belief in computing, and the role of “serverless” is only ancillary. Teams making hundreds of repos, copy-pasting code, chasing shiny new thingies configured with YAML/JSON/EDN/… - this can happen with almost any underlying deployment technology in the name of “Microservices”. The underlying issue is not the addictive nature of the cloud provider’s overhyped abstractions - it is this:

    When it’s not okay to talk about the advantages and disadvantages of [The Way] with other engineers without fear of reprisal, it might be a cult. Many of these engineers say [The Way] is the only way to [Be] anymore.

    In this article, The Way is “AWS Lambda”, but it could easily be “Kubernetes”, “Event sourcing”, “Microkernels”, “Microservices”, “Cluster framework (akka/erlang/…)” or any other technology. Anything sold as a panacea becomes a poison when the dogma around it grows too strong.

    I enjoy cautionary tales, but I’d be more interested to hear how to solve the underlying cultural issue of dogma within an org, since changing minds and engineering practice is so much harder than changing code.

    1. 17

      That’s one takeaway, but an equally important one is that serverless has inherent limitations when compared with “non-scalable” solutions … like hosting your app on a Unix box. That includes the first one (testing), “microservice hell”, and “inventing new problems”.

      The last one is particularly important. In more time-tested stacks there are “idioms” for each kind of problem you will encounter. Serverless only provides a very small part of the stack (stateless, fine grained), and if your application doesn’t fit (which is very likely), then you will need to invent some custom solution.

      And what he’s saying is that, after experience, those custom solutions (“hacks”) ended up worse than the time-tested ones.

      I agree the abuse of HTTP falls in the other category though. That is not inherent to serverless – if you got it wrong there, you would get it wrong everywhere.

      1. 9

        You could make a similar cautionary tale about Enterprise software set in the early 00s with the rise of Java and XML.

        1. 3

          Yep, if you squint, AWS serverless is similar in outcome to IT-managed WebSphere. Although, at least WebSphere tried to be J2EE-compliant. Lambda and other services are purely proprietary, with no attempt to provide open standards.

          1. 13

            Lambda is just Greek for “bean”.

        2. 4

          I think this is symptomatic of the problematic relationship we have with senior engineers. Software engineering has a tendency toward neophilia, where someone who criticizes a new idea is often dismissed as an “old timer” who’s afraid of change and whose opinions are irrelevant. My impression, at least, is that in other fields the opinions of seniors are taken very seriously because they have experience that juniors lack. Certainly seniors are less likely to be wooed by new technology and have an abundance of past experience against which the technology can be evaluated. It’s really hard to ask critical questions like “How would new technology X handle scenario Y?” if you’ve never had to deal with “scenario Y” before.

          One idea would be to have some kind of “senior engineer advisory board” or “council of elders” that could weigh in on technical decisions. At Pixar they famously had a “brain trust” of experienced employees that would vet important decisions at various stages of the production. The point would be to institutionalize some kind of sanity checking with more experienced developers so that we don’t have to make the same mistakes twice, both as an organization and as a field.

          I’m not advocating for letting senior engineers rule by fiat, just that we should look more to seniors for guidance and counsel, just like pretty much every other industry seems to be doing.

          1. 5

            Microkernels

            That item does not belong in that list.

            1. 3

              Why not? The point is that in a healthy engineering culture, no technology choice should be canon law, unable to be consciously weighed against alternatives, whether it’s your favourite pet or not.

              1. 7

                It lacks the cargo cult; everybody seems to hate ’em, for no good reason.

                1. 6

                  I think that’s uncharitable, and also untrue. Uncharitable because clearly lots of people are interested in microkernels one way or another, and because like all technical choices it represents a trade-off between competing concerns; your expectations of system behaviour may well not match the expectations of somebody less enthusiastic about microkernels.

                  It’s untrue because I hear loud positive noises about microkernels, even just on this site, all the time; e.g., your own comment over here!

                  1. 1

                    Yes, it is pretty much just me, and that is sad.

                  2. 4

                    I think that’s untrue now, but it has alternated. In the early ’90s, there was a lot of dogma around microkernels. OSF/1 was going to change the UNIX landscape completely. Minix was a demonstration of how operating systems should be written. Linux was a terrible design, *BSD was a legacy thing that would eventually just become a server on a microkernel (or, ideally, be ripped apart and bits of it used to build multiple servers).

                    Then people got performance data on Mach. System call overheads were insanely high because Mach did port-rights checking on every message and a system call required at least two messages to handle. The dogma shifted from “microkernels are unconditionally good” to “microkernels are unconditionally bad”.

                    In the background, systems like L4 and QNX showed that microkernels can outperform monolithic kernels but that didn’t really take off until multicore systems became common and shared-everything concurrency in a single kernel proved to be a bottleneck. Most of the debate moved on because microkernels quietly won by rebranding themselves as hypervisors and running Linux / *BSD / Windows as a single-server isolation layer.

                    These days, the L4 family is deployed on more devices than any other OS family, most monolithic kernels are run on top of a microkernel^Whypervisor, and anyone writing a new OS for production use is building a microkernel. Even monolithic kernels are adding userspace driver frameworks and gradually evolving towards microkernels. There’s a lot less dogma.

            2. 14

              I work for AWS, my views are my own and do not reflect my employer’s views.

              Thanks for posting your frustrations with using AWS Lambda, AWS API Gateway, and AWS EventBridge. I agree, using new technologies and handing more responsibility over to a managed service comes with the risk that your organization is unable to adopt and enforce best standards.

              I also agree that working in a cult-like atmosphere is deeply frustrating. This can happen in any organization, even AWS. I suggest focusing on solving problems and your business needs, not on technologies or frameworks. There are always multiple ways to solve a problem. Enumerate at least three, write down the pros and cons, then prototype two that are non-trivially different. With this approach you will start breaking down your organization’s cult-like atmosphere.

              Specifically addressing a few points in the article:

              Since engineers typically don’t have a high confidence in their code locally they depend on testing their functions by deploying. This means possibly breaking their own code. As you can imagine, this breaks everyone else deploying and testing any code which relies on the now broken function. While there are a few solutions to this scenario, all are usually quite complex (i.e. using an AWS account per developer) and still cannot be tested locally with much confidence.

              This is a difficult problem. I have worked in organizations that have solved it using individual developer AWS accounts, deploying a full working version of the “entire service” (e.g. the whole of AWS Lambda), with all its little microservices as, e.g., different CloudFormation stacks that take ~hours to set up. It works. I have also worked in organizations that have not solved it, and resort to maintaining brittle shared test clusters that break once a week and need 1-2 days of a developer’s time to set up. Be the organization that invests in its developers’ productivity and can set up the “entire service” accurately and quickly in a distinct AWS account.
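
              As a rough sketch of the per-developer account flow (the profile naming, template path, and parameter below are made up, not a prescription), with boto3:

              ```python
              # Sketch: deploy the "entire service" template into a developer's own AWS
              # account, identified here by a hypothetical per-developer CLI profile.
              import boto3

              def deploy_dev_stack(developer: str, template_path: str = "entire-service.yaml") -> None:
                  session = boto3.Session(profile_name=f"dev-{developer}")  # assumed profile naming
                  cfn = session.client("cloudformation")
                  with open(template_path) as f:
                      template_body = f.read()
                  stack_name = f"entire-service-{developer}"
                  cfn.create_stack(
                      StackName=stack_name,
                      TemplateBody=template_body,
                      Parameters=[{"ParameterKey": "Stage", "ParameterValue": developer}],  # hypothetical parameter
                      Capabilities=["CAPABILITY_NAMED_IAM"],
                  )
                  # Block until stack creation completes (tune the waiter config for long setups).
                  cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

              deploy_dev_stack("alice")
              ```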

              Many engineers simply put a dynamodb:* for all resources in the account for a lambda function. (BTW this is not good). It becomes hard to manage all of these because developers can usually quite easily deploy and manage their own IAM roles and policies.

              If you trust and train your developers, use AWS Config [2] and your own custom-written scanners to automatically enforce best practices. If you do not trust and do not train your developers, do not give them authorization to create IAM roles and policies, and instead bottleneck this authorization to a dedicated security team.
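
              Concretely, for the dynamodb:* example, the scoped-down policy is mechanical to write; a sketch with made-up names and ARNs:

              ```python
              # Sketch: replace "dynamodb:* on *" with a policy scoped to the one table
              # the function actually uses. Role name and table ARN are hypothetical.
              import json
              import boto3

              TABLE_ARN = "arn:aws:dynamodb:eu-west-1:111111111111:table/orders"  # made-up ARN

              policy = {
                  "Version": "2012-10-17",
                  "Statement": [{
                      "Effect": "Allow",
                      "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
                      "Resource": [TABLE_ARN, f"{TABLE_ARN}/index/*"],
                  }],
              }

              boto3.client("iam").put_role_policy(
                  RoleName="orders-function-role",            # hypothetical role name
                  PolicyName="orders-table-least-privilege",
                  PolicyDocument=json.dumps(policy),
              )
              ```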

              Without help from frameworks, DRY (Don’t Repeat Yourself), KISS (Keep It Simple Stupid) and other essential programming paradigms are simply ignored

              I don’t see how frameworks are connected with DRY and KISS. Inexperienced junior devs using e.g. Django or Ruby on Rails will still write bad, duplicated code. Experienced trained devs without a framework naturally gravitate towards helping their teams and other teams re-use libraries and create best practices. I think expecting frameworks to solve your problem is an equally cult-like thought pattern.

              Developers take the generic API Gateway generated DNS name (abcd1234.amazonaws.com) and litter their code with it.

              Don’t do this; attach a Route 53 custom domain name to your API Gateway endpoints instead.
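
              And on the application side, keep the hostname out of the code entirely; a tiny sketch with a hypothetical API_BASE_URL setting and endpoint path:

              ```python
              # Sketch: read the API base URL from configuration instead of hard-coding
              # the generated API Gateway hostname. The env var and path are made up.
              import os
              import urllib.request

              # e.g. API_BASE_URL=https://api.example.com, a Route 53 name pointing at the
              # API Gateway custom domain, never the generated abcd1234.* hostname.
              API_BASE_URL = os.environ["API_BASE_URL"]

              def get_order(order_id: str) -> bytes:
                  with urllib.request.urlopen(f"{API_BASE_URL}/orders/{order_id}") as resp:
                      return resp.read()
              ```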

              The serverless cult has been active long enough now that many newer engineers entering the field don’t seem to even know about the basics of HTTP responses.

              Teach them.
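
              To make “the basics” concrete, here is a minimal sketch of a Lambda handler behind API Gateway’s proxy integration that returns real status codes rather than 200-with-an-error-string (the route and data are made up):

              ```python
              # Sketch: an API Gateway (proxy integration) handler that uses HTTP status
              # codes instead of returning 200 for every outcome.
              import json

              ORDERS = {"42": {"id": "42", "status": "shipped"}}  # stand-in data store

              def handler(event, context):
                  order_id = (event.get("pathParameters") or {}).get("id")
                  if order_id is None:
                      return _response(400, {"error": "missing order id"})
                  order = ORDERS.get(order_id)
                  if order is None:
                      return _response(404, {"error": f"order {order_id} not found"})
                  return _response(200, order)

              def _response(status, body):
                  return {
                      "statusCode": status,
                      "headers": {"Content-Type": "application/json"},
                      "body": json.dumps(body),
                  }
              ```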

              Cold starts - many engineers don’t care too much about this.

              I care about this deeply. Use Go or Rust first and see how much of a problem cold starts still are; in my experience p99.99 latency is < 20 ms for trivial (empty) functions (still an outrageously high number for some applications). If cold starts on Go or Rust are still a problem, then yes, you need to investigate provisioned concurrency. But this is a known limitation of AWS Lambda.
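
              If you do reach for provisioned concurrency, it is a single API call against an alias or published version; the names below are made up:

              ```python
              # Sketch: pin warm capacity for a latency-sensitive function.
              import boto3

              boto3.client("lambda").put_provisioned_concurrency_config(
                  FunctionName="checkout-api",        # hypothetical function name
                  Qualifier="live",                   # an alias or published version, not $LATEST
                  ProvisionedConcurrentExecutions=5,  # size from measured concurrency, not guesswork
              )
              ```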

              As teams chase the latest features released by AWS (or your cloud provider of choice)

              Don’t do this; give new features / libraries a hype-cool-down period that is calibrated to your risk profile. My risk profile is ~6 months, and I avoid all libraries that tell me they are not production-ready.

              When it’s not okay to talk about the advantages and disadvantages of serverless with other engineers without fear of reprisal, it might be a cult. Many of these engineers say Lambda is the only way to deploy anymore.

              These engineers have stopped solving problems; they are now just Lego constructors (I have nothing against Lego). Find people who want to solve problems. Train existing people to want to solve problems.

              I am keeping track of people’s AWS frustrations, e.g. [1]. I am working on the outline of a book I’d like to write on designing, deploying, and operating cloud-based services focused on AWS. Please send me your stories. I want to share and teach ideas for solving problems.

              [1] https://blog.verygoodsoftwarenotvirus.ru/posts/babys-first-aws/

              [2] https://docs.aws.amazon.com/config/latest/developerguide/managed-rules-by-aws-config.html

              1. 4

                The serverless cult has been active long enough now that many newer engineers entering the field don’t seem to even know about the basics of HTTP responses.

                Teach them.

                I’m happy to teach anyone who wants to learn. Unfortunately, this usually comes up in the form of their manager arguing that it’s too much overhead to spend time getting their employee(s) up to speed on web tech and insisting on using serverless as a way to paper over what is happening throughout the stack. This goes to the heart of why people characterize it as a cult. The issues it brings into orgs aren’t about the tech as much as they are about the sales pitches coming from serverless vendors.

                1. 9

                  Interesting. At $WORK, we’re required to create documents containing alternatives that were considered and rejected, often in the form of a matrix with multiple dimensions like cost, time to learn, time to implement, etc. Of course there’s a bit of a push-pull going on with the managers, but we usually timebox it (one person for one week if it’s a smaller decision, longer if it’s a bigger one). Sometimes when launching a new service we’ll get feedback from other senior engineers asking why we rejected an alternative, maybe even urging us to reconsider it.

                  Emotional aspects of the cult aside (which suck; I’m not saying they don’t, just bringing up a different point), I don’t think I’d ever let a new system be built at work unless at least a token attempt were made at evaluating different technologies. I fundamentally think comparing alternatives makes for better implementations, especially when you have different engineers with different amounts of experience with different technologies.

                  1. 1

                    So you write an RFP with metrics/criteria chosen to perfectly meet the solution already settled on?

                    1. 2

                      I mean if that’s what you want to do, sure. Humans will be human after all. But having this kind of a process offers an escape hatch from dogma around a single idea. Our managers also try to apply pressure to just get started and ignore comparative analyses, but with a dictum from the top, you can always push back, citing the need for a true comparative analysis. When big outages happen, questions are asked in the postmortem whether an alternate architecture would have prevented any issues. In practice we often get vocal, occasionally bikeshed-level comments on different approaches.

                      I’m thankful for our approach. Reading about other company cultures reminds me of why I stay at $WORK.

                  2. 2

                    Try giving them alternatives. “Do you want to train your developers, or sign off on the technical debt and your responsibility to fix it?”, when presented well, can point out the issue. This happens with all tech vendors, and all managers can suck at this. But that’s not the fault of serverless.

                    Note that I’m not arguing that serverless is actually good. As with any tech, the answer is usually “it depends”. But just like serverless, you need experience with other things as well to be able to see this pattern.

                    In fact, I agree with several commenters saying that the majority of issues in the article can be applied to any tech. The only really insurmountable technical issue is the testing / local stack. The rest is mostly about the processes of the company, or maybe of a team within the company.

                  3. 4

                    Specifically addressing a few points in the article

                    … while carefully avoiding the biggest one:

                    “All these solutions are proprietary to AWS”

                    That right there is the real problem. An entirely new generation of devs is learning, the hard way, why it sucks to build on proprietary systems.

                    Or to put it in economic terms, ensure that your infrastructure is a commodity. As we learned in the 90s, the winning strategy is x86 boxen running Linux, not Sun boxen running Solaris ;) And you build for the Internet, not AOL …

                    1. 2

                      I think there are three problems with a lot of the serverless systems, which are closely related:

                      • They are proprietary, single-vendor solutions. If you use an abstraction layer over the top then you lose performance and you will still end up optimising to do things that are cheap with one vendor but expensive for others.
                      • They are very immature. We’ve been building minicomputer operating systems (and running them on microcomputers) for 40+ years and know what abstractions make sense. We don’t really know what abstractions make sense for a cloud datacenter (which looks a bit like a mainframe, a bit like a supercomputer, and a bit like a pile of servers).
                      • They have a lot of vertical integration and close dependencies between them, so it’s hard to use some bits without fully buying into the entire stack.

                      If you think back to the late ’70s / early ‘80s, a lot of things that we take for granted now were still very much in flux. For example, we now have a shared understanding that a file is a variable-sized contiguous blob of bytes. A load of operating systems provided record-oriented filesystems, where each file was an array of strongly typed records. If you do networking, then you now use the Berkeley Sockets API (or a minor tweak like WinSock), but that wasn’t really standardised until 1989.

                      Existing FaaS offerings are quite thin shims over these abstractions. They’re basically ‘upload a Linux program and we’ll run it with access to some cloud things that look a bit like the abstractions you’re used to, if you use a managed language then we’ll give you some extra frameworks that build some domain-specific abstractions over the top’. The domain-specific abstractions are often overly specialised and so evolve quite quickly. The minicomputer abstractions are not helpful (for example, every Azure Function must be associated with an Azure Files Store to provide a filesystem, but you really don’t want to use that filesystem for communication).

                      Figuring out what the right abstractions are for things like persistent storage, communication, fault tolerance, and so on is a very active research area. This means that each cloud vendor gains a competitive advantage by deploying the latest research, which means that proprietary systems remain the norm and the offerings remain immature. I expect that it will settle down over the next decade, but there are so many changes coming on the hardware roadmap (think about the things that CXL enables, for one) that anything built today will look horribly dated in a few years.

                      1. 1

                        Many serverless frameworks are built upon Kubernetes, which is explicitly vendor-neutral. However, this does make your third point stronger: full buy-in to Kubernetes is required.

                        1. 2

                          Anything building on Kubernetes is also implicitly buying into the idea that the thing that you’ll be running is a Linux binary (well, or Windows, but that’s far less common) with all of the minicomputer abstractions that this entails. I understand why this is being done (expediency) but it’s also almost certainly not what serverless computing will end up looking like. In Azure, the paid FaaS things use separate VMs for each customer (not sure about the free ones), so using something like Kubernetes (it’s actually ACS for Azure Functions, but the core ideas are similar) means a full Linux VM per function instance. That’s an insane amount of overhead for running a few thousand lines of code.

                          A lot of the focus at the moment is on how these things scale up (you can write your function and deploy a million instances of it in parallel!) but I think the critical thing for the vast majority of users is how well they scale down. If you’re deploying a service that gets an average of 100 requests per day, how cheap can it be? Typically, FaaS things spin up a VM, run the function, leave the VM running for a while, and then shut it down if it’s not in use. If your function is triggered, on average, at an interval slightly longer than the window after which the provider shuts down the VM, then the amount that you’re paying (FaaS typically charges only for CPU / memory while the function is running) is far less than the cost of the infrastructure that’s running it.
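
                          As a toy illustration of that asymmetry (every price and interval below is made up):

                          ```python
                          # Toy numbers only: a rarely-invoked function is cheap for the customer
                          # and expensive for the provider if each call lands on a fresh VM.
                          requests_per_day = 100
                          billed_ms_per_request = 50       # milliseconds the customer is billed for
                          memory_gb = 0.125
                          gb_second_price = 0.0000167      # hypothetical per GB-second price

                          customer_cost = requests_per_day * (billed_ms_per_request / 1000) * memory_gb * gb_second_price

                          vm_price_per_hour = 0.01         # hypothetical cost of the VM kept warm per function
                          keep_alive_minutes = 10          # hypothetical idle window before the VM is reclaimed
                          provider_cost = requests_per_day * (keep_alive_minutes / 60) * vm_price_per_hour

                          # ~$0.00001/day billed vs ~$0.17/day of VM time burned by the provider.
                          print(f"customer pays ~${customer_cost:.6f}/day; provider burns ~${provider_cost:.2f}/day")
                          ```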

                      2. 2

                        S3 was a proprietary protocol that has become a de facto industry standard. I don’t see why the same couldn’t happen for Lambda.

                    2. 9

                      I think this is a rare article where I agree with everything. I wouldn’t describe myself as having been a serverless zealot, but I observed a lot of serverless zealotry that stood atop a disregard for the issues mentioned in this article. Serverless tech has its uses, but it trades away established development practices to gain scalability, exalting scalability to the first tier of concerns. If your use case isn’t going to succeed or fail based on its ability to scale, then serverless isn’t likely to be a great fit. I usually find it works best as a load sink at the edges rather than as a front door or other load generator. The basic promise is that you can tune your capacity to your load, but what that comes down to in practice is needing much more accurate indicators of load. IME that is as hard as caching or threading.

                      1. 7

                        When I implemented a serverless plugin, one of the problems that I noticed was the required boilerplate and shared code. One of the motivations for serverless designs is to share not just the infrastructure for computation, but also the runtime and libraries; the only thing that the programmer should control is a single isolated function. But if that’s the case, then why does each serverless function need to be boilerplated? I came to the conclusion that the serverless abstraction is not a good division of responsibilities.

                        I would imagine that a proper serverless design for a system like Kubernetes would involve a single custom resource definition for serverless functions, one custom resource per function, and a single autoscaling controller which executes functions. I don’t think any such designs are being actively developed, but I’m open to hearing about them.
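
                        Purely as a thought experiment, and not an existing project: the “one custom resource per function” idea might look something like this through the official Kubernetes Python client, with an invented API group and spec:

                        ```python
                        # Hypothetical sketch: one custom resource per function; the shared runtime,
                        # scaling, and plumbing live in a single controller, not in per-function
                        # boilerplate. The API group, kind, and spec fields are invented, and a
                        # matching CRD would have to be registered first.
                        from kubernetes import client, config

                        config.load_kube_config()

                        resize_image_fn = {
                            "apiVersion": "faas.example.dev/v1alpha1",
                            "kind": "Function",
                            "metadata": {"name": "resize-image"},
                            "spec": {
                                "image": "registry.example.dev/resize-image:1.0.0",
                                "handler": "app.handle",
                                "triggers": [{"http": {"path": "/resize"}}],
                                "scaling": {"minReplicas": 0, "maxReplicas": 20},
                            },
                        }

                        client.CustomObjectsApi().create_namespaced_custom_object(
                            group="faas.example.dev",
                            version="v1alpha1",
                            namespace="default",
                            plural="functions",
                            body=resize_image_fn,
                        )
                        ```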

                        1. 4

                          Regarding deployment and scaling: yes, “serverless” does this well, but really, depending on what you use instead and what the circumstances are, that might not be the big issue elsewhere.

                          I think many of these perceived improvements come from having to think about them up front and about how you design things. A lot of that would also happen if you did the same with different approaches.

                          People like to try out new things, and new, clean things tend to work well, especially while they are still small. However, starting fresh even with the same approach often gives you that, plus the benefit of already knowing the pitfalls.

                          1. 2

                            In a cult, brainwashing is done so gradually, people have no idea it is going on

                            Oh, they turn even the most skeptical person into their toy out of frustration and desperation. Cults aren’t subtle at all. /off-topic

                            1. 2

                              I currently work at a company whose major existing codebase is largely built as a serverless app using many of the same AWS technologies this article mentions. I’m not super-thrilled about this, but this architecture was decided upon long before I started working there and I’m not in a position to change it (and even if I was, a ground-up rewrite of a working codebase is not something to be taken lightly).

                              The biggest problem with serverless architecture is that it makes a lot of our code behavior very dependent on understanding and correctly configuring minute details of Amazon’s AWS stack using Amazon’s configuration language. I spend more of my time debugging CDK than I would like, and I’m not thrilled about the inherent lock-in to AWS as a platform that this imposes on the system (although again, it’s very unlikely that we’ll ever move off of it).
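
                              For a sense of what that looks like (a generic CDK sketch with hypothetical names, not our actual stack), even a trivial endpoint is a pile of AWS-specific knobs whose values quietly change runtime behavior:

                              ```python
                              # Generic CDK sketch (names and asset paths are made up): one function plus
                              # one API, where memory_size, timeout, and runtime all change behavior.
                              from aws_cdk import App, Duration, Stack, aws_apigateway as apigw, aws_lambda as _lambda
                              from constructs import Construct

                              class OrdersStack(Stack):
                                  def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
                                      super().__init__(scope, construct_id, **kwargs)
                                      fn = _lambda.Function(
                                          self, "OrdersFn",
                                          runtime=_lambda.Runtime.PYTHON_3_11,
                                          handler="app.handler",                   # hypothetical module.function
                                          code=_lambda.Code.from_asset("lambda"),  # hypothetical source directory
                                          memory_size=256,                         # also scales the CPU share
                                          timeout=Duration.seconds(10),
                                      )
                                      apigw.LambdaRestApi(self, "OrdersApi", handler=fn)

                              app = App()
                              OrdersStack(app, "OrdersStack")
                              app.synth()
                              ```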

                              I’ve run into some of the problems this blog post mentions, and I think serverless architecture does exacerbate some of them. For instance, writing chunks of code as lambdas really does encourage “microservice hell” problems. On the other hand, some of these are just general problems in software engineering, that developers can handle well or poorly in any system implemented on any substrate (like hardcoding a DNS name in 200 places that needs to be changed in the future).

                              1. 1

                                Five years?! What took you so long, pal?

                                1. 5

                                  As an engineer, it baffles me to no end that it was years rather than minutes. The article is written a little bit as if serverless were a group of people. It’s technology we are talking about, and they had literally the same technical information about this technology since day one. A technology doesn’t become a cult; its users do. But what do you care, if the tool solves your problem? Whether it’s a cult is ultimately irrelevant; what matters is the value it adds when applied as a solution. That should be perceived by engineers when they evaluate said tool, not five years later.

                                  1. 3

                                    I suppose it might also be a situation of someone finally thinking for themselves after having followed some more experienced people in a position of authority. When you trust someone based on their (implied) experience, you tend to believe they know what they’re doing. This is especially true if you are a junior, know you’re in over your head, feel it’s not your place to question architectural choices, and are surrounded by an entire organization where this is “just the way it’s done”.

                                    Cults have a way of thinking the problems inherent to the cult’s tech are somehow just temporary setbacks, with a proper solution right around the corner if you stick with it a while longer. Band-aids become solutions, the whole mess grows more complex, and you don’t see the forest for the trees.