1. 44

    1. 18

      Tldr: serverless functions are hard to use for web apps and actually don’t save any money.

      Worth reading in full.

    2. 13

      Lambda is really nice for cronjobs. I have one that notifies me about Cubs games and used to have one that rotated my Twitter avatar. At my last job I set one up to glue a Slack command to a deploy job.

      All of these are short, self-contained scripts with little load, so the exact opposite of how AWS was pitching serverless.
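
      A minimal sketch of what such a scheduled job can look like, assuming it is triggered by an EventBridge schedule (whose events carry an ISO 8601 `time` field). The `GAMES` table, `find_game` helper, and the date in it are made up for illustration; a real job would fetch the schedule from an API and send a notification instead of printing.

```python
from datetime import datetime
from typing import Optional

# Hypothetical schedule data, invented for this sketch.
GAMES = {"2024-04-01": "home game, 1:20 PM"}


def find_game(day: str) -> Optional[str]:
    """Return the game description for a YYYY-MM-DD day, if any."""
    return GAMES.get(day)


def handler(event, context):
    # Scheduled events include the trigger time, e.g. "2024-04-01T12:00:00Z".
    fired_at = datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
    day = fired_at.date().isoformat()

    game = find_game(day)
    if game is None:
        return {"notified": False, "day": day}

    # A real job would send this to email, SMS, or chat.
    print(f"Game today: {game}")
    return {"notified": True, "day": day, "message": game}
```

      The whole thing is one short function with no state and no inbound traffic, which is what makes it cheap to run this way.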

      1. 6

        This is why I find the serverless conversation so confusing. It seems like a great fit for stuff like this. Why were people pitching anything else?

        1. 11

          Hammer salesmen convinced us the world is all nails.

        2. 5

          Because it should be good for other things. FaaS is the first step towards people actually writing a decent OS for the mainframes that they’re building. A cloud OS should have:

          • A distributed transactional data store.
          • IPC primitives for timers, message queues, and shared locks.
          • Network interfaces where I can publish endpoints and have QUIC / TLS / HTTP / other high-level protocol abstractions routed to a component of my program.
          • A lightweight process-like abstraction that lets me deploy portions of my program as isolated components that have access to some of the above resources, cost nothing when not running, and scale up to an unbounded number of instances (or bounded by limits I set). This needs monitoring support to release locks or roll back transactions if it crashes.

          With these primitives, you could build pretty much any server app (not just web apps - mail, file, IM servers and so on would be easy as well). Unfortunately, most current offerings are nothing like this. Instead, they’re a way of running a Linux container on demand, with all of the other features implemented as HTTP-based APIs and most of the essential features for building reliable systems left for you to implement on top.

      2. 4

        It is also great for events, like “file arrived in S3 bucket, kick off something that processes it.”
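
        A rough sketch of that pattern, assuming the function is subscribed to S3 object-created notifications. The event shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) follows the S3 notification format; `process_object` is a hypothetical placeholder for the real work.

```python
from urllib.parse import unquote_plus


def process_object(bucket: str, key: str) -> str:
    # Hypothetical placeholder: a real function would download and
    # transform the object here.
    return f"processed s3://{bucket}/{key}"


def handler(event, context):
    results = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+").
        key = unquote_plus(s3["object"]["key"])
        results.append(process_object(bucket, key))
    return results
```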

    3. 8

      I’ve built and deployed a handful of production web apps on the AWS “serverless stack” (APIg, Lambda, Dynamo, Cognito) and some of this definitely rings true.

      • Yes, API Gateway is terrible from a developer perspective.
      • Yes, serverless SQL DBs were pretty terrible, as cold starts could be closer to minutes than milliseconds.

      The niches I found to be a happy place were:

      • Internal web applications with relatively low traffic where running a VM or container would be quite inefficient.
      • Queued or scheduled work.

      The whole idea of writing a separate micro codebase for each route was a non-starter for me. I universally used a framework like express for the whole site (giving easy local testing), then packaged it into a single Lambda with API Gateway being a dumb proxy. It went against most of what the docs recommended but worked well for me. As long as your packaged code could fit within 100 megs, you were all fine.
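
      To illustrate the “single Lambda behind a dumb proxy” shape, here is a minimal sketch in Python rather than express. It assumes an API Gateway REST API with Lambda proxy integration, where the event carries `httpMethod` and `path` and the response needs `statusCode`, `headers`, and `body`. The `route` decorator and the two routes are hypothetical.

```python
import json

# Route table for the whole site, keyed by (method, path).
ROUTES = {}


def route(method, path):
    """Register a view function for a method and path (hypothetical helper)."""
    def register(func):
        ROUTES[(method, path)] = func
        return func
    return register


@route("GET", "/health")
def health(event):
    return 200, {"ok": True}


@route("GET", "/users")
def list_users(event):
    return 200, {"users": []}


def handler(event, context):
    # One function serves every route; API Gateway only forwards the request.
    view = ROUTES.get((event["httpMethod"], event["path"]))
    status, body = view(event) if view else (404, {"error": "not found"})
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

      Because the views are plain functions, they can be called directly in local tests without deploying anything.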

      End result: internal sites with just a few thousand requests a day cost one or two dollars per month to run. Sure, we could have hosted them alongside other apps on shared infrastructure, but then you’re introducing whole new noisy neighbour and shared global state problems.

      1. 1

        One codebase per function and one function per route is so dumb it sounds like a strawman made up to argue against serverless

      2. 1

        The people I talked to definitely skirt that megabyte limit and try to pack related functions together.

        Queued or scheduled work

        Anything that’s running off SQS or another queue is a really good fit for serverless and also is non-interactive so it isn’t affected by cold starts or retries at all.
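
        A sketch of a queue-driven handler, assuming an SQS event source with partial batch responses (`ReportBatchItemFailures`) enabled, so that only the failed messages are retried. `process_message` and the `job` field are hypothetical.

```python
import json


def process_message(payload: dict) -> str:
    # Hypothetical work item: fail loudly on malformed payloads.
    if "job" not in payload:
        raise ValueError("missing job")
    return payload["job"]


def handler(event, context):
    failures = []
    for record in event.get("Records", []):
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; the rest of the batch
            # is treated as successfully processed.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

        Nobody is waiting on the response, so a cold start or a retry only delays the work a little.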

    4. 6

      Made me have flashbacks to a project I was on a while ago. We definitely started running into a lot of these problems and we weren’t even doing anything complicated. And don’t even get me started on Aurora Serverless…

      1. 1

        We’ve been talking about moving to Aurora Serverless v2 at work. Would you mind getting started…? ;)

        1. 3

          I used v1 (it was before v2 was released). Nominally, v2 fixes a lot of the glaring issues with v1, however:

          • It’s only been available for a bit over a year. Considering how long it took AWS to get from v1 to v2 and considering how undercooked their new products can be - sometimes for years - I’m not at all confident that v2 is ready for production use.
          • v1 was misrepresented by AWS in my view. For example, the Data API that was used to access Aurora Serverless v1 was in reality a cruel joke with ridiculous limitations. Hence I’d be wondering whether v2 in reality is the same as v2 on paper.
          • v1 was stuck on an older version of Postgres; I don’t know if that’s also going to be the case with v2.

          So after using v1, I’d personally wait another year or two and read a number of really convincing experience reports before I even touched v2 with a barge pole. For now, reading things like this, I’m wary: https://www.lastweekinaws.com/blog/the-aurora-serverless-road-not-taken/

    5. 4

      Let’s be honest: a big reason the whole “lambda per route” thing never became popular or well-liked is that Amazon’s API Gateway is incomprehensible, clunky, annoying and alien-feeling. It doesn’t feel “at home” in a webdev toolbox at all. So instead, a lot of people just shoved entire traditional-but-stateless apps into Lambda images, i.e. used it as a “Cloud CGI” (which I sure did with some things, to host them for free and “forever” with no more maintenance).

      1. 4

        Definitely, I had a bad experience with API Gateway. IIRC, the fact that the only way to deal with error handling in Node lambdas was to do regex matches on exception messages was 🤯 And what is that weird undocumented DSL with roots in Java that is used for request transformations?

    6. 2

      Good article! I was never part of that shift (was doing frontend back then), but I’m working in micro service land right now, and there is quite a bit of overlap.

      One thing…

      NodeJS and Python are the dominant languages, which is a little eyebrow raising. This suggests that picking the right language for the job didn’t end up happening, instead picking the language easiest for the developer. But that’s fine, that is also an optimization.

      I know what I’d pick, but I’m really not sure what the author is hinting at here.

      1. 6

        My interpretation of that comment is that in theory you’d expect serverless to drive an increase in the number of languages used (because it’s supposedly easier) but in practice it didn’t happen, so it didn’t turn out to be a real benefit.

        1. 3

          Ah of course. Makes sense. I guess I just don’t agree at all with that expectation. Thank you :)

    7. 1

      Local development

      This is a real pain. cdk watch alleviates it a bit depending on how you are then able to invoke your lambda.

      Hard to set resources correctly

      We just set every function to 1G RAM so that they also get the processing power to go with it.

      Observability is harder with a distributed system

      Sure, but Datadog works. It’s expensive as hell with Lambdas but it works.

      teams needing to keep “functions warm.”

      This doesn’t actually work unless you have constant traffic. The best solution I’ve seen for this is to switch your functions to Rust (serious serverless practitioners I talked to are doing this) which has the fastest cold start times and also a really good developer experience in general.

      But it is very possible for one function to eat all of the capacity for every other function.

      Global concurrency limits are a thing and you need to have circuit breakers and monitoring in place for them. You learn this the hard way.
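
      A minimal sketch of the circuit breaker idea, not tied to any AWS API: after a number of consecutive failures (for example, throttling errors from a downstream function), calls are skipped for a cool-down period instead of piling more load onto the shared limit. The class name, thresholds, and injectable `clock` are assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call skipped")
            # Cool-down elapsed: allow a trial call, but re-open the
            # circuit immediately if that call fails.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count.
        self.failures = 0
        return result
```

      Wrapping each downstream invocation in `breaker.call(...)` and alerting when the circuit opens gives you both the protection and the monitoring signal.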

      God knows how long people running massive stacks were waiting

      We used to wait more than an hour, but then we split all the stacks into separate pipelines. That was a net benefit, but it left holes in the mental model that developers have about system state.

      I agree in general with the thrust of the article. It is possible to run serverless at scale well with some proper decision making but running containers is not very complex (even though EKS does its best to complicate it) and most outfits should not go further than 50-100 functions (maybe a handful of fat lambdas) before exiting this platform.