I work for AWS, my opinions are my own.

Thank you for this article. It’s rare to see an article detailing the struggles of going from scratch to a working application on AWS. The author is clearly someone used to operating services and has good opinions about what they need (logging, metrics, tracing).
That being said, the target price of $200/month and the achieved cost of $130/month seem high. I run a very small hobby project called Simple Calendar Tracker on AWS. It’s a very basic CRUD application using the typical API Gateway to Lambda to DynamoDB setup, with CloudWatch and X-Ray. It costs around $20/month total for a test environment and a prod environment, and it doesn’t take much traffic: maybe 100 daily active users and < 5 TPS at p100.
I’ve been meaning to write up this simple application and how others can follow. Some things that stood out for me in the article:
The author explicitly wants to go all in on AWS, but then uses Terraform? That’s strange; why not use the AWS CDK? AWS supports the CDK and smooths over rough edges for you, e.g. private subnets and access to services. (There’s a minimal CDK sketch after this list.)
Aurora Serverless is great if you require scalable relational database access, but it is expensive. DynamoDB is widely used inside Amazon, and in pay-per-request mode it can be very cheap.
Why use Fargate and ECS and not Lambda? Granted, Lambda can have larger tail latencies, but it is cheaper for small workloads.
The author was really obsessed with avoiding managed NAT gateways. I get it; they’re annoying and expensive. You can run your own NAT instance on a small EC2 box (see the sketch below), or just use a public subnet and secure your instances, or use a managed compute environment like Lambda.
Why complain about the cost of CloudWatch, but then also complain about how expensive it is to run Elasticsearch / Grafana / Prometheus? This is kind of what CloudWatch is: expensive but all-encompassing. But I 100% agree that CloudWatch metrics are expensive.
People often compare using managed AWS services to running a single VPS instance, but this is an apples-to-oranges comparison. How does a single VPS address scalability, availability, durability, deployments, and security? Lambda is capable of bursting from zero to hundreds of thousands of concurrent instances almost instantly if you request limit increases.
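Since a few of those points are easier to show than tell, here is a minimal CDK sketch (TypeScript) of the cheap serverless shape I keep describing: API Gateway to Lambda to DynamoDB in pay-per-request mode, with a NAT instance standing in for the managed NAT gateway. Treat it as a sketch under my assumptions, not the author’s setup; the stack name, resource names, and the lambda/ asset directory are placeholders of mine.

```typescript
import * as cdk from 'aws-cdk-lib';
import {
  aws_apigateway as apigw,
  aws_dynamodb as dynamodb,
  aws_ec2 as ec2,
  aws_lambda as lambda,
} from 'aws-cdk-lib';
import { Construct } from 'constructs';

export class HobbyAppStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Pay-per-request DynamoDB: no provisioned capacity to pay for while idle.
    const table = new dynamodb.Table(this, 'Items', {
      partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
    });

    // Lambda instead of Fargate/ECS: billed per request, scales to zero.
    const handler = new lambda.Function(this, 'ApiHandler', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda'), // hypothetical asset directory
      environment: { TABLE_NAME: table.tableName },
      tracing: lambda.Tracing.ACTIVE, // X-Ray, as in my setup
    });
    table.grantReadWriteData(handler);

    // Front it with API Gateway.
    new apigw.LambdaRestApi(this, 'Api', { handler });

    // If you really do need a VPC with outbound access, a NAT *instance*
    // on a tiny EC2 box avoids the managed NAT gateway's hourly charge.
    new ec2.Vpc(this, 'Vpc', {
      maxAzs: 2,
      natGatewayProvider: ec2.NatProvider.instance({
        instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.NANO),
      }),
      natGateways: 1,
    });
  }
}
```

At idle, the serverless pieces cost almost nothing beyond storage; the NAT instance is the only always-on charge here, and you can drop the VPC block entirely if nothing needs private networking.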
I’m looking forward to writing my blog post on creating and operating services on AWS; I think it could easily become a book. Blog posts like this help me see what the customer pain points are.
I’ll be on the lookout for your article. I used to host my personal stuff on AWS because that is what I work with, but I have mostly moved off due to pricing. Terraform is my choice because it is cloud-neutral and I use more than one cloud provider. I run apps I didn’t write, and they expect Postgres, so I think DynamoDB is not an option? I think the author walked away with CloudWatch, Prometheus/Grafana, and Elasticsearch all being really expensive. I personally find it hard to get the visuals and query efficiency out of CloudWatch that I get from Grafana, though I may just not know enough there. I agree that he was never going to get a low price out of his deployment method, but it is a small application, and even EC2 would have been a huge cost savings for a non-critical app.
> I agree that he was never going to get a low price out of his deployment method, but it is a small application, and even EC2 would have been a huge cost savings for a non-critical app.
I agree: if the business problem you are solving can be satisfied by a single EC2 spot instance and you have the domain knowledge to operate it, go for it. The t4g.nano spot price works out to about $1 a month (roughly $0.0013/hour over ~730 hours, though spot prices fluctuate). I use spot instances for a separate hobby project, and it works great.
> People often compare using managed AWS services to running a single VPS instance, but this is an apples-to-oranges comparison. How does a single VPS address scalability, availability, durability, deployments, and security? Lambda is capable of bursting from zero to hundreds of thousands of concurrent instances almost instantly if you request limit increases.
As a member of the VPS instance gang, I’ll play devil’s advocate a little if you don’t mind. ;-)
For small applications you can easily get a year of uptime and low hundreds of concurrent users on a single small VPS. It’s really not a problem, especially if the service you’re running is free and can tolerate an hour of downtime once every six months or so. For what I’d consider mid-sized applications with Real Users, you’d probably want two redundant web servers, a load balancer with failover, a database server or two, maybe some logging system, and an Ansible setup or something to orchestrate it all. Linode puts that at $60/month, so this is about the level at which AWS starts being competitive. It takes a week or two to set up and maybe one day a month to manage, tops, if you know what you’re doing, and as this post demonstrates, knowing what you’re doing with AWS is a pretty large investment of work as well.
If your code is a little careful, this ought to be plenty to scale to thousands of concurrent users, and it can easily expand to tens of thousands with beefier and more numerous servers; you don’t hit the limits of this model until you get into the dozens of machines. You probably won’t run Netflix off of it, but you can run 2015’s StackOverflow off of it just fine.
The other place where AWS works really well is at the very low end, such as when you just need a single Lambda function or the like to glue other things together. I dealt with it for that sort of thing recently and it was reasonably pleasant, though it does tend to start with one service that Does Something, and then you need to drag in bunches of other pieces to manage it.
> As a member of the VPS instance gang, I’ll play devil’s advocate a little if you don’t mind. ;-)
Don’t exclude me! My Battlesnake runs happily on a DigitalOcean instance. I don’t care about downtime.
This all boils down to different strokes for different folks. I’m absolutely not saying AWS is the one glorious true way to deploy software systems. With appropriate domain expertise you can be StackOverflow and run on a handful of powerful 1U rack servers. In my first job I worked for a company that sold telecommunication softswitches to small towns and cities; that was an awesome experience.
But if you don’t have domain expertise, managed services are pretty awesome. I wasn’t lying before: I worked with a high-profile internal customer whose Lambda traffic scaled from 0 TPS to 100k TPS within 10 ms while throttling the excess load. Their complaint was that they needed this limit to be larger, and we handled it. I’m not saying Lambda is magical and solves all problems, just that this particular customer needed no domain expertise or capital expenditure at all.
AWS is a sliding scale with different components you can mix and match, or you can opt out and build everything by hand; that’s how I see it.
Okay, I can’t lie, that is pretty awesome.
…and at the same time you can get a Linode VPS that can do most of this stuff for $5/month. I’m still confused as to why people want “serverless” stuff.
I thought the $200 number was a joke at first, since I have a number of hosts on DigitalOcean VPSes and am paying about $15 a month. But that’s really what it costs to be web scale, I guess.
This is such a great metaphor!
Perhaps I’m an old fuddy-duddy, but I’d much rather learn to build my app with OSS tools and maintain my own Linux VPS for $20/mo, which can run all the parts of my app and support thousands of users, than break my app up into a dozen proprietary AWS pieces.
Single VPSs can vertically scale pretty far these days if you suddenly need to support a lot of users.
And if you outgrow your $20 VPS you get a server for $100 and keep on trucking.
There’s a lot of truth to what you say. I’m surprised at the high costs too, but a few things about the single-server model:
Even with IaaS, servers and their storage can fail. You really need to provision at least two servers and associated disks to be able to handle failure. You can skip this for a hobby system, but for anything in production you risk a single failure taking out everything. At the very least, you need regular backups so that your users lose only a few hours of data in the event of catastrophic failure (a sketch of such a backup job follows this list).
If you want higher reliability, then you need to build all of the fun two-phase commit logic yourself, or deploy and manage some platform that provides it.
You are responsible for deploying security fixes for the entire platform (kernel, system libraries, and so on). If your time is not free, then this is an additional cost that is probably much higher than the $20/month.
If your system needs to scale beyond a single machine then you may need to completely re-architect it. This probably isn’t an important concern for a lot of things: I think WhatsApp was able to support around 10M users on a single FreeBSD box with a server written in Erlang. I might be off by an order of magnitude, but I remember them claiming a record for the number of open sockets on a single box a long time ago, and a single machine is a lot more powerful now than it was then.
The smallest VM that can handle your peak load sets the floor on how far you can scale down.
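On the backup point above, here is a minimal sketch of a nightly dump-and-ship job, assuming Postgres and an S3 bucket (any object store would do); the database name, bucket, and paths are hypothetical.

```typescript
import { exec } from 'node:child_process';
import { promisify } from 'node:util';
import { readFile } from 'node:fs/promises';
import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3';

const sh = promisify(exec);
const s3 = new S3Client({});

// Nightly job (run from cron): dump Postgres and ship it off-box,
// so a dead server costs at most ~24 hours of data.
async function backup(): Promise<void> {
  const stamp = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const file = `/tmp/app-${stamp}.sql.gz`;
  await sh(`pg_dump myapp | gzip > ${file}`); // hypothetical database name
  await s3.send(new PutObjectCommand({
    Bucket: 'myapp-backups', // hypothetical bucket
    Key: `postgres/app-${stamp}.sql.gz`,
    Body: await readFile(file),
  }));
}

backup().catch((err) => { console.error(err); process.exit(1); });
```

Wire it to cron and test the restore path occasionally; a backup you’ve never restored is not really a backup.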
By using PaaS things, you get a few benefits. When you have no users you’re paying only for storage, so your scale-down costs can be basically nothing. We’re using some Azure Functions stuff for managing CI, for example: a mostly stateless system that handles one HTTP request when we want to trigger a job. It’s free until we hit some quota for requests and costs peanuts after that, less than the cheapest VM on Azure or even somewhere cheaper like Vultr. You don’t have to worry about deploying security fixes for the platform; it’s all managed for you. And you don’t have to worry about node failure; it’s handled by the platform.
The downsides are lock-in to a single cloud platform (usually; there are some abstraction layers, but they’re not great) and the fact that PaaS offerings are fairly immature. POSIX 2018 is the result of almost 50 years of UNIX evolution and provides a pretty good set of abstractions for a glorified minicomputer. AWS Lambda and Azure Functions are very young in comparison, and the design patterns for easily writing applications that scale down to nothing and up to running on the mainframe / supercomputer hybrid that we’ve branded as a cloud datacenter are still quite experimental.
This is pretty accurate. I think Lightsail might be better suited for hobby projects. CloudWatch is always expensive though.
I have had good luck running free SPAs on Lambda by using Netlify as my gateway. The downside of that approach is that there is no persistence. You can get cheap persistence for low-write-volume apps by syncing files to S3, but every solution ends up more complicated than is ideal.
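A rough sketch of what I mean, using the AWS SDK for JavaScript v3; the bucket name and key are hypothetical, and the app just reads and rewrites one small JSON state blob.

```typescript
import { GetObjectCommand, PutObjectCommand, S3Client } from '@aws-sdk/client-s3';

const s3 = new S3Client({});
const BUCKET = 'my-app-state'; // hypothetical bucket name
const KEY = 'state.json';

// Load the whole state blob (fine when writes are rare and state is small).
export async function loadState(): Promise<unknown> {
  try {
    const res = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: KEY }));
    return JSON.parse(await res.Body!.transformToString());
  } catch {
    return {}; // first run: no state stored yet
  }
}

// Overwrite the blob on every write. Last-writer-wins; no concurrency control.
export async function saveState(state: unknown): Promise<void> {
  await s3.send(
    new PutObjectCommand({ Bucket: BUCKET, Key: KEY, Body: JSON.stringify(state) }),
  );
}
```

It’s last-writer-wins with no locking, which is exactly the kind of corner that makes these setups “more complicated than is ideal” once write volume grows.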
I think an AWS setup, with Terraform, etc., is expensive because it solves problems the author does not have. Mainly, I pay extra for scalability and reliability, assuming the architecture has already been done for me. More accurately, I’m replacing very expensive engineers who may or may not get the architecture and engineering correct with a tried set of services on AWS.

It’d be great if AWS could somehow bridge the gap and make inexpensive hobby projects mirror our production work projects; then I’d be improving my job skills. The reality is that DigitalOcean, Heroku, and the myriad other low-cost providers are fine. Since I am the only developer on my hobby projects and there’s no way I’d get more than a handful of users, a private GitHub repo and manually uploading deploys would be fine. I definitely don’t need Terraform or even CI (though setting up deploy-on-commit is so easy, I probably wouldn’t deploy manually, to be honest).

$200/mo, knowing your infrastructure could scale to 200k-300k users an hour without having to buy capital equipment upfront or architect a highly scalable solution yourself, is incredibly cheap when you think of it that way.