Cloud is the ideal place to build software while looking for product-market fit because the cost of failure is very low. Once you hit product-market fit and scale up past a certain point, you can save money by migrating to on-prem bare-metal.
I agree. A few years ago another company in town was suffering similar performance issues with AWS EBS (this was before EBS offered any performance options). They set up a VPN link to high-performance storage in their data center and left the app running on AWS. It seemed to work out for them, although I’m not sure how their infrastructure operates now.
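If you’re hitting the same sort of EBS problem, one quick sanity check before re-architecting is to time synchronous small writes on the volume in question. A rough sketch in Python, assuming Linux and a throwaway file path that you’d point at the volume under test:

```python
# Rough sketch: measure synchronous 4 KiB write latency on a storage volume.
# TEST_FILE and SAMPLES are placeholder assumptions; point TEST_FILE at the volume under test.
import os
import statistics
import time

TEST_FILE = "/mnt/volume-under-test/latency_probe.bin"  # hypothetical mount point
BLOCK = b"\0" * 4096
SAMPLES = 1000

latencies = []
fd = os.open(TEST_FILE, os.O_WRONLY | os.O_CREAT, 0o600)
try:
    for _ in range(SAMPLES):
        start = time.perf_counter()
        os.write(fd, BLOCK)
        os.fsync(fd)  # force the write through to the device, not just the page cache
        latencies.append(time.perf_counter() - start)
finally:
    os.close(fd)
    os.unlink(TEST_FILE)

lat_ms = sorted(l * 1000 for l in latencies)
print(f"median write+fsync: {statistics.median(lat_ms):.2f} ms")
print(f"p99 write+fsync:    {lat_ms[int(0.99 * SAMPLES)]:.2f} ms")
```

A wide gap between the median and the p99 is the sort of variability that tends to drive people off shared network storage.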
I’ll be more interested to read their post after they’ve been on bare metal for a while. Going on-prem is one of those things that as a nerd I’d love to believe is better, but I haven’t heard a lot of great success stories.
It turns out running a data center is a very difficult thing to do if you’re trying to optimize for cost, redundant utilities, manageability, and green-ness (high power efficiency per space).
Is renting space in a DC the same as running one? I was always under the impression one rented out some space and got some power outlets and network connections and everything else was managed by the DC owners.
No, it’s not the same. It’s a compromise between the two extremes.
You get to skip the headaches of having to manage redundant power, fibre connectivity, most of the green-ness concerns, though you still need to do due diligence to assure yourself that the people running the DC are doing all those things to a standard which is acceptable for your use case.
You get to keep the cost savings from buying machines up front (and amortising the capex) instead of renting them; AFAIK this is where most of the savings come from when switching from AWS to anything else. (Do price this up against reserved cloud VMs rather than on-demand ones, though, because you’re committing quite hard when you buy servers up front; there’s a rough back-of-envelope sketch at the end of this comment.)
You will pay more for electricity, space, physical management and connectivity than the raw cost of running a DC, because of course the company running the DC wants to make a profit. Note, though, that a big DC selling colocation to a whole bunch of customers gets far better economies of scale on some things that are really expensive (such as the person-time required to keep multiple actually-redundant internet connections working in the face of telcos merging lines without telling you), which can offset their profit margin until your need for machines gets gargantuan.
You will have to manage physical machines yourself, including the risk that an actual physical machine develops an actual physical fault and dies. This can be “fun”; there are good and bad ways to find out what the lead time is for Dell/HP/etc. to assemble and ship a new box to a colo (IME, most of a month), so try to do it one of the good ways.
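To make the capex-vs-reserved-VM comparison above concrete, here is a rough back-of-envelope model. Every figure in it is a placeholder assumption rather than a quote; plug in real numbers for your hardware, colo contract, admin time and reserved-instance pricing:

```python
# Back-of-envelope: owned servers in a colo vs. reserved cloud VMs.
# Every number below is a placeholder assumption for illustration only.

SERVER_CAPEX = 8_000          # purchase price per server (USD, assumed)
SERVER_LIFETIME_MONTHS = 36   # amortisation period (assumed)
COLO_PER_SERVER_MONTH = 250   # rack space, power, remote hands, bandwidth (assumed)
ADMIN_PER_SERVER_MONTH = 150  # share of sysadmin time per server (assumed)

RESERVED_VM_MONTH = 600       # comparable reserved-instance cost (assumed)

def colo_monthly_cost(n_servers: int) -> float:
    """Amortised monthly cost of owning n servers and hosting them in a colo."""
    capex_per_month = SERVER_CAPEX / SERVER_LIFETIME_MONTHS
    return n_servers * (capex_per_month + COLO_PER_SERVER_MONTH + ADMIN_PER_SERVER_MONTH)

def cloud_monthly_cost(n_servers: int) -> float:
    """Monthly cost of renting the same capacity as reserved cloud VMs."""
    return n_servers * RESERVED_VM_MONTH

for n in (5, 20, 100):
    colo, cloud = colo_monthly_cost(n), cloud_monthly_cost(n)
    print(f"{n:>4} servers: colo ~ ${colo:,.0f}/mo, reserved cloud ~ ${cloud:,.0f}/mo")
```

The point of laying it out like this is that the crossover depends heavily on the assumed server lifetime and how much admin time you really spend per box, which is exactly where these arguments usually go wrong.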
I was always under the impression one rented out some space and got some power outlets and network connections and everything else was managed by the DC owners.
Yes, that’s what colocation facilities offer. You’ll have to manage what the machines actually do yourself, usually with a KVM-over-IP switch for maintenance tasks. The colo will also provide basic services like “stick an Ubuntu 16.04 DVD in the drive and push the ‘on’ button for you so you can run the installer via the KVM”.
Oh, and one more thing: you can satisfy “this shared hypervisor isn’t providing enough storage IO, we need bare metal” without having to make the whole capex/opex trade-off and purchase and manage physical machines yourself. Some companies will quite happily rent you bare-metal machines by the hour on roughly the same basis as they’d rent out VMs (e.g. RackSpace sell this as “OnMetal”).
Edit: Amazon AWS sell something similar as “dedicated hosts”, where you still get VMs rather than bare metal, but the physical server is dedicated to your account, so no other customer’s VMs run on it and you aren’t subject to noisy-neighbour problems.
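For what it’s worth, allocating one of those dedicated hosts and pinning an instance to it is only a couple of API calls. A minimal boto3 sketch, where the region, availability zone, AMI ID and instance type are all placeholders:

```python
# Minimal sketch: allocate an EC2 Dedicated Host and launch an instance onto it.
# The region, availability zone, AMI ID and instance type below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

# Allocate one physical host dedicated to this account.
host = ec2.allocate_hosts(
    AvailabilityZone="us-east-1a",  # assumed AZ
    InstanceType="m5.large",        # hosts are allocated per instance type/family
    Quantity=1,
)
host_id = host["HostIds"][0]

# Launch an instance pinned to that host (tenancy "host").
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    Placement={"Tenancy": "host", "HostId": host_id},
)
```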
Interesting read, and the comments are worth a look too; they suggest that their choice of Ceph and their architecture may have been less than ideal. I don’t know much about Ceph beyond the basic high-level concepts, so I can’t really comment on that.