1. 47
    1. 24

      A cloud exit makes sense if you have an established long-term viable product with predictable & stable traffic patterns.

      Cloud gives you the flexibility to establish those parameters for your product without gambling on expensive one-off purchases of resources you may not need, while you experiment with which hardware is best suited to your load.

      1. 34

        I’m starting to feel that “cloud allows you to scale up” has become a meme by now…

        You can get 32GB of RAM with an Intel Xeon E5-1620 on kimsufi dot com (I don’t want people to think that I’m advertising) for $40/m plus $40 in installation fees. The equivalent in AWS is the t3.2xlarge at $240/m. Even in the first month, with the installation fees included, the dedicated server is cheaper than AWS. And AWS will kill you on the bandwidth, which is 100 Mbps unmetered with Kimsufi.
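
        Quick back-of-the-envelope with those numbers (just the figures quoted above, not current list prices, and ignoring bandwidth):

        ```python
        # Cumulative cost comparison using the prices quoted in this comment.
        def total_cost(monthly, months, setup=0):
            return setup + monthly * months

        KIMSUFI_MONTHLY, KIMSUFI_SETUP = 40, 40   # dedicated box, as quoted
        AWS_MONTHLY = 240                         # t3.2xlarge, as quoted

        for months in (1, 12, 36):
            dedicated = total_cost(KIMSUFI_MONTHLY, months, setup=KIMSUFI_SETUP)
            cloud = total_cost(AWS_MONTHLY, months)
            print(f"month {months}: dedicated ${dedicated} vs AWS ${cloud}")

        # month 1: dedicated $80 vs AWS $240
        # month 12: dedicated $520 vs AWS $2880
        # month 36: dedicated $1480 vs AWS $8640
        ```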

        Also, with one of these, a well-written application and good caching, you can easily handle ~1k req/s. That’s a lot of users!

        I sometimes think that cloud companies managed to sell outrageously over-priced products to gullible users who never needed it in the first place. :(

        1. 9

          There’s a lot more to the word “flexibility” besides “scale up”. Clouds allow you to experiment with (managed) blob store, queues, DBs, caches, search indexes, CDNs, networking, etc; to figure out what best suits your product. And then easily move the product to another continent if it turns out it is more popular in EU than in the US—or wherever else your hardware is.

          1. 17

            easily move the product to another continent if it turns out it is more popular in EU than in the US

            How often does this happen? And is it worth paying 10x the price per month, just to optimize for this use case?

            blob store, queues, DBs, caches, search indexes, CDNs, networking, etc

            Most of this can be experimented with very cheaply, one command away with apt install varnish glusterfs-server postgresql rabbitmq-server haproxy (those are the Debian/Ubuntu package names). The default package configuration will be more than good enough for experimenting.

            Also, this puts you much more in control of things, instead of debugging C++ stack traces (I had a ton of these when trying to use AWS Redshift) or weird HTTP error messages returned by proprietary APIs. (Try debugging authorization errors in AWS; good luck with that.)

        2. 5

          1K requests / second could be underselling it …

          It’s hard to compare directly, but back in 1999 people talked about the C10K problem – 10,000 concurrent connections on a single machine.

          https://en.wikipedia.org/wiki/C10k_problem

          As the Wikipedia page mentions, production services achieved over 1 million concurrent connections on a single machine more than 10 years ago – e.g. WhatsApp, using Erlang, not even a native language like C.

          Granted those are probably tiny requests, and I guess keeping the connections open probably allows many more requests per second, since setting up the connections is expensive.

          But it’s still orders of magnitude more than 1K / second (even though the units aren’t the same; I’d be interested in any pointers that elaborate on the relation)
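
          One rough way to relate the two is Little’s law (concurrent connections ≈ requests/second × how long each request or connection is held open). The latencies below are made-up illustrations, not measurements:

          ```python
          # Little's law: concurrency ~= throughput * time-in-system, so
          # requests/second ~= concurrent connections / seconds each one stays open.
          def requests_per_second(concurrent_connections, seconds_open_each):
              return concurrent_connections / seconds_open_each

          # 10,000 connections each serviced in ~100 ms (classic short C10K-style requests):
          print(requests_per_second(10_000, 0.1))      # 100000.0 req/s
          # 1,000,000 mostly-idle long-lived connections touched once a minute (chat-style):
          print(requests_per_second(1_000_000, 60))    # ~16667 req/s
          ```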

          Maybe 1K page loads, since modern web apps seem to make like 500 requests per page load :-P Or they download 10-50 MB of Javascript routinely.


          Maybe a more comparable site is Stack Overflow: they are a self-hosted monolith and seem to have done a large amount of optimization at the .NET layer.

          Stack Overflow is a cacheless, 9-server on-prem monolith

          The “cacheless” point is important. I’ve seen a lot of bad architectures papered over by caches, which make things better in some cases and worse in others. They also introduce a lot of operational expense.

          https://twitter.com/sahnlam/status/1629713954225405952 – actually the Twitter thread says it’s 6000 requests/second per machine, consuming 5-10% of capacity. Interesting

          6000 requests/second per machine across 9 machines is about 140 B requests/month, which is at least in the ballpark of the “2B page views/month” claimed

          So yeah, Stack Overflow was acquired for $1.8 billion in 2021, and you can run it on 9 machines, each doing ~6000 requests/second.
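
          Quick sanity check of that arithmetic, taking the figures quoted above (6,000 req/s per machine, 9 machines, “2B page views/month”) at face value:

          ```python
          REQ_PER_SEC_PER_MACHINE = 6_000
          MACHINES = 9
          SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.59M

          requests_per_month = REQ_PER_SEC_PER_MACHINE * MACHINES * SECONDS_PER_MONTH
          print(f"{requests_per_month / 1e9:.0f}B requests/month")   # 140B

          PAGE_VIEWS_PER_MONTH = 2e9
          print(requests_per_month / PAGE_VIEWS_PER_MONTH)           # ~70 requests per page view
          ```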

        3. 4

          If $40 vs $240 is the debate, you should definitely go with kimsufi. For a lot of other cases you should use the cloud. And then after a certain scale it might indeed pay off to be off the cloud. It really depends on the services you would use.

        4. 2

          kimsufi

          Seems to just resell OVH? As I understand it, the servers all have ECC RAM? Still, 100 Mbps is abysmal - I would much prefer a minimum of 1 Gbps. You get that unmetered from Hetzner (or 10 Gbps with 30 TB/month) - but at closer to 100 USD/month if you disregard pre-owned boxes.

          Still, try and price 30 (or just 2!) TB egress from AWS…

          That aside, you would probably want at least two boxes and a floating IP to get your meaningful risks into a similar ballpark to AWS (worse risk, but possibly similar in actual business terms, even if you trade 30 minutes of downtime/year for a day of downtime).

          1. 6

            Kimsufi is part of the OVHcloud group. They offer old servers (Intel Atom N2800 anyone?) through this brand, with simpler services.

            1. 1

              Well, if we’re talking non-ECC RAM, Hetzner has a couple of cheap options with an unmetered Gbps uplink.

          2. 5

            Amazon overcharges for egress to keep you there as much as possible 🤣

          3. 2

            The 100 Mbit isn’t much, that’s true. But on the other hand: if you actually have that many requests, you probably want something like a CDN or a separate host in front for all the static assets, so for a typical CRUD app it is probably enough bandwidth. Then again, there are definitely other vendors that give you a fixed amount of compute, storage, memory, and bandwidth with > 1 Gbit and do not charge you for the bandwidth. They are bookable monthly too.
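
            Rough numbers on what 100 Mbps buys you for the dynamic responses, assuming static assets live on a CDN and an average response around 50 KB (both assumptions of mine):

            ```python
            LINK_MBPS = 100
            AVG_RESPONSE_KB = 50                      # assumed typical CRUD/JSON response

            bytes_per_second = LINK_MBPS * 1_000_000 / 8          # 12.5 MB/s
            responses_per_second = bytes_per_second / (AVG_RESPONSE_KB * 1_000)
            print(round(responses_per_second))                    # 250 responses/s
            ```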

            Obviously you will need someone with Linux knowledge, at least to install the base system and, for example, Docker. But you equally need someone for AWS (and then probably Linux on top of that).

            I think this is again a part of “no one has ever been fired for buying IBM”, but now we’re doing the same with AWS. For most people it probably isn’t actually reasonable to use AWS, apart from using the trodden path and not having to search for different vendors. Which is totally fine if you want to spend that money.

        5. 2

          I personally wouldn’t run anything business-related off Kimsufi; OVH doesn’t really care all that much about it and will leave you with dead hardware. It’s not that much more expensive (for a business at least) to go up to an actual OVH offering, or to use something like Hetzner if you’re only concerned about Europe. Even if you want to go for “cloud” machines, OVH or Hetzner might be a better bet anyway.

      2. 2

        Right, and furthermore, with hardware being as powerful as it is, much more powerful than needed for most use-cases, you may not need more than a small fraction of that expensive resource for a very long time, or essentially forever.

    2. 13

      As others have mentioned, it makes sense that running on a VM is slower than running on the actual hardware.

      Ruby also has a sort of VM, which hurts performance too. One could imagine a similar approach where the Basecamp/Hey hotspots are replaced with a faster language via FFI, or by routing to a whole separate app/codebase. I am almost certain they will never do this, though, because they value the convenience and flexibility of Ruby and are willing to pay the cost.

      Other people value the convenience and flexibility of the cloud and are willing to pay the cost.

    3. 10

      people underestimate how difficult it is to manage hardware. things like

      • capacity planning
      • datacenter footprint
      • storage device failure/replacement
      • data loss due to mismanagement of said storage devices
      • backups
      • virtualization
      • layer 2 networking - storage topology design
      • layer 3 routing - mpls, bgp, etc
      • storage networking - fibrechannel? 100gbps ethernet?

      you can ignore all of the above by using a managed service / cloud. people forget what a nightmare hardware management is - or they never managed hardware in the first place. it’s hard and really sucks tbqh

      1. 4

        people underestimate how difficult it is to manage hardware. things like

        This list is an understatement of what a DC provides.

        you can ignore all of the above by using a managed service / cloud. people forget what a nightmare hardware management is - or they never managed hardware in the first place. it’s hard and really sucks tbqh

        You can ignore all of the above by using a reputable DC. This entire list is bog standard support that the DC provides, often in a way that you cannot opt out of. The only wildcard is backups: that’s not necessarily always provided, but it’s nearly always a cheap addon.

        Going from AWS to running your own DC adds these hurdles, but going from AWS to leased or owned hardware in an existing DC doesn’t imply responsibility for anything here. I don’t think anyone is suggesting “move from AWS to a server rack in your closet.”

      2. 3

        It’s 20 servers total, at two sites. Most of those tasks are easy or irrelevant at that scale.

        One of my previous jobs had multiple entire data centers just for TESTING our software with thousands of configurations of weird hardware. Now THAT was a planning nightmare.

    4. 9

      This compares EC2 instance types from over 5 years ago to current hardware.

      1. 2

        I guess you’re basing this on

        ran on EKS in AWS (that’s their managed Kubernetes setup) using a mix of c5.xlarge and c5.2xlarge instances

        Which Amazon describes as Skylake 8124M (2017) or Cascade Lake 8223CL (2019-2020). Either way, they’re paying good money to Amazon.

        Trying to figure out what exactly a newer version would cost took me 5 minutes just to find where they tell you that on the website. So if I look this up for “reserved instances” and open the calculator (which defaults to #/addService/DynamoDB), I get “2,522.88 USD”/12 months for the c5.xlarge with 4 cores and 8 GiB of RAM, reserved for 3 years. I’ll select Ireland for the location; Frankfurt, which would be near me, is probably way too costly.

        Now their new AMD CPUs are from 2022 and have a ton more cores. The newest compute generation from AWS in the same category (data stuff?) is c7g, which would be 1,931.58 USD for 4 cores and 8 GiB (still c7g.xlarge). But that’s AWS Graviton, which is AFAIK AArch64.

        What does it cost for an AMD system? A c6a.xlarge is at 2,045.37 USD for 4 cores and 8 GiB of RAM. That’s not the latest EPYC server generation, because you can’t get those on AWS?!

        Now all of this took me 20 minutes to search through the horrific website that is the AWS product site. Calculating the cost per core per time is up to someone else now.

        Anyway, I’ve had enough of the price jungle called AWS; it just isn’t my thing. This may be outdated hardware, but the point about pricing still stands.
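
        Since someone has to do the per-core math, here’s a rough sketch with the calculator figures quoted above. I’m not sure whether each figure covers 12 months or the full 3-year term, so both readings are shown:

        ```python
        quotes_usd = {            # figures as quoted above (4 vCPUs each)
            "c5.xlarge":  2522.88,
            "c7g.xlarge": 1931.58,
            "c6a.xlarge": 2045.37,
        }
        VCPUS = 4

        for name, price in quotes_usd.items():
            if_per_year = price / 12 / VCPUS      # reading: price covers 12 months
            if_per_term = price / 36 / VCPUS      # reading: price covers all 36 months
            print(f"{name}: ${if_per_year:.2f} or ${if_per_term:.2f} per vCPU per month")

        # c5.xlarge:  $52.56 or $17.52
        # c7g.xlarge: $40.24 or $13.41
        # c6a.xlarge: $42.61 or $14.20
        ```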

        1. 2

          I agree that the pricing on the AWS website is annoying to find. I only looked this up after I talked to someone I know who works there.

          However, good luck finding pricing for enterprise hardware online. Usually the only way to get prices at all is to have a call with their sales.

          1. 1

            I have had to buy enterprise server gear, and I’ve had multiple vendors where you could get the prices for a configuration on their website. You will end up asking them anyway for more details or SLAs, but that’s already a good start.

            I think my biggest gripe is that you’re just buying a stock product from AWS and they still won’t tell you what it costs.

    5. 6

      What kind of expertise would a company need to pull off a cloud exit strategy? It seems to me that one of the “advantages” the cloud tries to sell is “managed”, i.e. no in-house sysadmin expertise needed (although AWS/Kubernetes/et al. have become so complex that DevOps is now a thing even for not-so-complex applications!)

      1. 8

        It depends on precise circumstances, but as a rough guide: let’s say you’re moving to a datacenter which will provide rackspace in cabinets or cages, reliable power, HVAC, cross-connects to a meet-me room and a hands-and-eyes service. You will need to:

        • spec, buy or lease hardware (servers, switching, router/firewall at a minimum)
        • install and cable the hardware
        • configure switching, routing, firewalling, come up with an IP plan and a naming plan
        • install operating systems
        • figure out monitoring, logging, and alerting (for hardware and OS, as well as your application)
        • maintain infrastructure services - DNS, NTP, local email, possibly DHCP.
        • pick a deployment system

        There are at least two people at my company who can do all of this (I’m one of them) and at least two more people who can, together, handle the whole thing. You need a minimum of two people. If you want 24/7 operations, you need a minimum of 5. Lots of things can be delayed for specialists, lots of bits can be addressed by contractors, and almost everybody really wants a competent DBA as well.

        The thing is, those 2-5 people plus DBA(s) can easily handle a hundred to perhaps a thousand machines, depending on how fast you need them in place and how complex an environment you need.

      2. 3

        It seems to me that one of the “advantages” that cloud tries to sell is “managed”

        This really should be the advantage of the cloud, but no one has yet built the thing that customers want: a computer that they can run their workload on. I’m honestly shocked at how badly IBM is doing in this market. A modern cloud system should be a hybrid of mainframe and supercomputer ideas, and IBM has been building both for longer than any of the major cloud players have existed. Instead of a system where I can write my program, deploy it, and have it scale up and down transparently, I get to manage a fleet of containers on top of a set of VMs. If anything, the management overhead of current cloud offerings is higher than the overhead of managing leased hardware.

    6. 5

      Each of these machines were less than $20,000. Amortize that over five years. That’s $333/month for all the hardware (minus routers etc) needed to run Basecamp Classic today. And this is still a large SaaS app that’s generating literally millions of dollars in revenue per year! The vast majority of SaaS businesses out there would require far less firepower to service their customers.

      holy misleading comparison batman

      is owning your infrastructure probably cheaper? sure

      is this an apples to apples comparison? nah

      the expensive part of any business is the people (not to mention the fact that they are handwaving away all their other costs)

      1. 6

        They talk a lot about pricing in the other posts linked from the body. dhh may be many things, but he’s not stupid. I’m pretty sure he knows how budgeting works.

        This specific article is about performance, and the one paragraph where it mentions pricing is not supposed to be a rigorous comparison—it’s not a comparison at all. It’s just the author saying “wowee, you can get a lot of computer for not much money”.

        1. 1

          It’s the marginal cost (of adding more capacity) being compared, not the total cost. Both are important, but once the infrastructure is in, the marginal cost is usually more interesting.

      2. 3

        I may not have all the aspects in my head, so please add any I missed. What I thought of:

        1. More people needed for owned infra vs. cloud: in this case, to some extent, I’m paying the cloud instead of my people. I’d probably vote for my people.

        2. If I manage my own infra, I’ll need people with ops skills. In the cloud, I’ll need people with ops skills AND cloud-specific skills as well (vendor lock-in?).

      3. 2

        Large and complex cloud deployments also need a lot of effort and constant oversight. Guess what Amazon is doing when your Kubernetes cluster on AWS is not working properly? Yep, nothing, that’s on you. The effort to maintain the hardware on which everything runs is a tiny fraction of the time spent maintaining the entire system. And debugging a cloud deployment may even be substantially harder than when you can call your in-house pals and see for yourself if the server room is on fire. The value proposition of the cloud only looks good to those with myopia or very specific needs.

    7. 4

      I’m curious what the cause for the performance increase is - is it just that the on-prem hardware is that much better than the cloud hardware?

      1. 8

        Like caius said, the hardware is probably better. But also, overhead and jitter are introduced by the various virtualization layers.

        Here is a good read on the subject: https://www.brendangregg.com/blog/2017-11-29/aws-ec2-virtualization-2017.html

        1. [Comment removed by author]

          1. 1

            From my understanding, Nitro is for the base layer. That kind of overhead is what you get if you order bare-metal instances on AWS. Then on top of that, you get the VM overhead for most instance types. A fairer benchmark would compare with one of the metal instances, but those are also 5-10x more expensive: https://instances.vantage.sh/?cost_duration=monthly&selected=c6a.8xlarge

      2. 2

        Probably the hardware is better specced, but also there’ll be less running on it. AWS has multiple customers on the same hardware, so they have to ensure nothing leaks cross-tenant. And there’s also the noisy-neighbour problem: when you’re the only customer on the box, you can tune it much more easily, knowing there isn’t someone sat there rinsing CPUs next to you. Not sure what they’re doing for storage either, but that is likely local rather than network-attached too.

        Turns out having dedicated hardware running one tenant is superior in performance. Reminds me of the (many) times we (re)discovered that during the days of VMs vs bare servers.

        1. 2

          This + fewer abstraction layers with on-prem hardware. The closer to the metal you are, the more performance you’d get - always.

      3. 2

        IIRC Ruby benefits from faster single-core speed, so moving to on-prem is going to give you some benefit. Jeff Atwood’s Why Ruby? is old, but covers a lot of points. I haven’t kept up with how Discourse are doing their hosting, but Jeff has mentioned core performance over the years on Twitter.

        I see other comments about Ruby having a VM, but that’s often only a problem when you have limited awareness of your platform and of managing performance on it. In Bing’s move to .NET 7 with Principal Engineer Ben Watson, you can hear commentary on how awareness of generations in the .NET GC can help you optimize, along with the implications when the GC itself is modified. You can make similar comments about Python GIL conversations that never address the nature of the performance problem.

        1. 2

          I’m not sure if they still sell them, but for a while Intel sold chips with 128 MiB of eDRAM as last-level cache. We got some for FPGA builds (single-threaded place and route, CPU-bound with fairly random memory access patterns) and they were a lot faster than anything else Intel sold for this workload (lower core counts and lower clock speed, but when you have a single-threaded workload that spends a big chunk of its time waiting on cache misses, this doesn’t matter so much). I don’t think any cloud providers offer them, but they’d probably be a huge win for a lot of these workloads.

          1. 1

            AMD’s 3D V-Cache chips have large amounts of proper SRAM L3 cache. My Ryzen 7 5800X3D has 96MB, and the new Ryzen 9 7950X3D has 128MB. Most of the benchmarking on them has been for gaming. I’d be curious to see web backend benchmarks though.

    8. 1

      They’re going to ride this horse until it doesn’t have legs anymore, aren’t they?