1. 10
    1. 7

      One note on reserved instances: EC2 RIs can be resold. I actually work for a company that buys and sells RIs on behalf of customers; we buy 3-year no-upfront instances and then sell them as soon as a month later. You have to have some idea of the liquidity of different instance types to do this well, however. But c5, m5, r5 (common instance types) and us-east-1, eu-west-1, us-west-1 (large regions) are pretty safe.
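The economics behind holding and reselling an RI can be sketched in a few lines. Every number below is a made-up placeholder, not a real AWS rate; the point is just that the gap between on-demand and 3-year-RI pricing is the spread a reseller can work with:

```python
# Hypothetical illustration of the RI-resale spread described above.
# Both rates are invented placeholders, not quoted AWS prices.
ON_DEMAND_HOURLY = 0.10  # assumed on-demand $/hr for some instance type
RI_3YR_HOURLY = 0.06     # assumed effective $/hr on a 3-year no-upfront RI

def monthly_spread(hours_per_month: float = 730.0) -> float:
    """Rough monthly savings per instance while the RI is held --
    the margin a reseller can split with the customer."""
    return (ON_DEMAND_HOURLY - RI_3YR_HOURLY) * hours_per_month

print(round(monthly_spread(), 2))  # 29.2
```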

    2. 4

      This is actually a surprisingly good article. Probably the most important part of taming AWS costs is cattle, not pets.

      1. 1

        Yeah, the best thing that would improve this article would be to add a second one going into more aspects. :)

        I would love to see advice about minimising egress costs, for example.

    3. 1

      Scattered thoughts:

      • I really like that this starts with Cost Management. It’s kind of like profiling a program: when you actually look, the easy room for improvement often isn’t where you’d guess. You can also export cost data to query in Athena.
      • I’m a big fan of the flexible Savings Plans. Knowing only that you’ll need a certain level of base load for 1 or 3 years, you can save a lot, and you keep the flexibility to change type, family (including new families that aren’t yet an option when you sign up for the savings plan), or region (for lower cost, DR, data-location requirements, etc.). Note RDS, etc. don’t fall under these plans and have their own, less flexible, reserved instances.
      • Some places you can reduce costs outside of compute:
        • Useless storage: over enough time it’s easy to accumulate junk in S3 or EBS snapshots you don’t need. An occasional manual cleaning can be worth your time (S3 Storage Lens is nice), and lifecycle rules to delete stuff like logs, Athena results, or temporary files can help. (Also, gzip and zstd work wonders on things like logs or data dumps and also make them faster to read!) Archive-tier EBS snaps are kind of interesting if you need to keep them at least 90 days, though we haven’t found a use case.
        • S3 storage classes: for anything like backups, Intelligent Tiering looks awesome. Glacier Instant lets you store backups you’re keeping for at least 90 days for cheap, without the slow Glacier retrieval process if you need them. (And Infrequent Access’ minimum is just 30d.) None of these have super punitive costs when you do use your backups.
        • Bandwidth out: AWS’s bandwidth out is relatively expensive. If you serve enough TB/mo of static or cacheable content to care, multiple CDNs and Cloudflare R2 can do it for you cheaper. (If you’re relatively small but your bandwidth costs for static content are noticeable, CloudFront’s free 1TB/mo may be nice.)
        • Inter-AZ bandwidth: I’m not sure how common this is, but we run cross-AZ, and keeping heavy data flows within one AZ can reduce costs, though it’s tricky and app-dependent.
      • Picking the right instance family (e.g. c6 vs r6 vs m6) can help.
        • A couple times we’ve considered increasing size and been able to move “laterally” instead (e.g. ‘m’ family to ‘r’ when RAM was the limiter), or to a newer variation that had what we wanted at the time (i3 to i3en for space).
        • Data/benchmarks may not line up with intuition: our Web tier was more RAM-bound than you’d expect a Web tier to be, and an OLAP database ended up more CPU-sensitive than expected.
        • With flexible RIs/savings plans, it’s worth checking now and then whether moving to the latest variation of your instance types is an efficiency gain.
        • The m6a/r6a/c6a instances can be cheaper and faster than Intel options in the same gen and are worth checking out.
        • The ‘t’ instances are awesome for small utility servers.
        • Graviton’s sneaky advantage is that each “vCPU” is a physical core, whereas on Intel/AMD a vCPU is a hardware thread that may share a core with another thread. It won’t win a single-threaded benchmark against AMD/Intel, but because you get twice the physical cores at a similar price, many-thread throughput can be better. Certainly interesting for greenfield or easy-to-port stuff.
      • It’s worth comparing alternative/related services for a use case, and looking for people’s opinions of them. Like, Athena does wonders for us, and I hear iffy things about DynamoDB and Managed NAT Gateway. (A data lake queryable by Athena taking advantage of compressed formats and partitioning can be pretty great.)
      • Considering the app and deployment together can help. If you had to grow an instance because of OOMs around spikes of activity, maybe you can smooth the spikes or reduce the RAM usage or both. I’ve gotten useful ideas from trying to follow costs back to the code that incurred them.
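On the exporting-cost-data-to-Athena point: once the Cost and Usage Report is queryable, “what did each service cost last month” becomes SQL. The table and column names below follow the standard CUR/Athena schema, but treat them as assumptions and check your own report’s columns:

```python
# Sketch of a per-service cost query against a CUR table in Athena.
# "cur_table" is a hypothetical table name; column/partition names are
# the standard CUR ones but may differ in your setup.
QUERY = """
SELECT line_item_product_code AS service,
       ROUND(SUM(line_item_unblended_cost), 2) AS cost
FROM cur_table
WHERE year = '2022' AND month = '10'
GROUP BY line_item_product_code
ORDER BY cost DESC
LIMIT 20
"""
print(QUERY)
```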
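The lifecycle-rule suggestion for logs, Athena results, and temporary files looks roughly like this. Bucket prefixes and retention periods here are hypothetical; the dict follows the S3 `PutBucketLifecycleConfiguration` rule shape:

```python
# Minimal sketch of lifecycle rules that expire junk automatically.
# Prefixes and day counts are made-up examples.
lifecycle = {
    "Rules": [
        {
            "ID": "expire-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Expiration": {"Days": 30},  # delete log objects after 30 days
        },
        {
            "ID": "expire-athena-results",
            "Filter": {"Prefix": "athena-results/"},
            "Status": "Enabled",
            "Expiration": {"Days": 7},   # query results are disposable
        },
    ]
}

# Applying it would look like (needs boto3 + credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket", LifecycleConfiguration=lifecycle
# )
```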
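And the “gzip works wonders on logs” claim is easy to see for yourself: repetitive text like access logs routinely shrinks by 90%+ (the fake log below is more repetitive than real traffic, but real logs aren’t far off):

```python
# Demonstration that log-like text compresses extremely well.
import gzip

# A fake, highly repetitive "access log".
log = b"GET /api/v1/items 200 12ms\n" * 10_000

compressed = gzip.compress(log)
ratio = len(compressed) / len(log)
print(f"{len(log)} -> {len(compressed)} bytes ({ratio:.1%})")
```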
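For the bandwidth-out bullet, a back-of-envelope comparison makes it obvious when a CDN or R2 starts to matter. The per-GB rates here are placeholders, not quoted AWS or CDN prices; plug in your own:

```python
# Hypothetical egress cost comparison; both rates are made-up placeholders.
AWS_EGRESS_PER_GB = 0.09  # assumed $/GB out of AWS
CDN_EGRESS_PER_GB = 0.02  # assumed $/GB on a cheaper CDN

def monthly_egress_cost(tb_per_month: float, rate_per_gb: float) -> float:
    """Monthly egress bill for a given volume and per-GB rate."""
    return tb_per_month * 1024 * rate_per_gb

for tb in (1, 10, 100):
    aws = monthly_egress_cost(tb, AWS_EGRESS_PER_GB)
    cdn = monthly_egress_cost(tb, CDN_EGRESS_PER_GB)
    print(f"{tb:>4} TB/mo: AWS ${aws:,.0f} vs CDN ${cdn:,.0f}")
```

At small volumes the difference is noise; at tens of TB/mo it pays for the migration effort.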