Hopefully the next step is some degree of ops-conservatism going mainstream. The current situation is only bearable for very large organizations, and IMO we have just gotten better at “throw it over the wall”.
We are now at a point where we are really bad at anything but kilo-core/TB-RAM clusters, because below that scale the cost of ops eclipses the hardware. Continuity from development and small deployments up to “web-scale” has been largely sacrificed.
The other main branch in the tree is the “own-nothing” world of specialized cloud offerings, where you have to use mock implementations to even do testing.
Containerization was a significant improvement for the “works on my machine” problem, but it has been terrible for deployment performance and artifact management. It’s proof that we failed to package and distribute our services properly.
Similarly, the overlay-network-based mesh is an indication that the network layers and OS integration don’t match how we think about connectivity between services.
My hope is that we do more to resolve these basic issues at the OS level and then rethink k8s etc.: how would they work if the OS solved networking and process isolation?
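To make that concrete: the isolation primitives containers lean on are already plain kernel features (namespaces, cgroups). A minimal sketch, assuming Linux, Python 3.12+ (for os.unshare), iproute2 installed, and enough privileges (root, or wrapping this in a user namespace), of getting a fresh hostname and network namespace with no container runtime involved:

```python
import os
import socket
import subprocess

# Ask the kernel for a new UTS (hostname) and network namespace for this
# process only. No image, no daemon, no overlay network involved.
os.unshare(os.CLONE_NEWUTS | os.CLONE_NEWNET)

# The new UTS namespace lets us set a hostname only this process sees.
socket.sethostname("sandboxed-service")
print("hostname inside namespace:", socket.gethostname())

# The new network namespace starts out with nothing but a downed loopback
# device; "ip link" (from iproute2) should list only 'lo'.
subprocess.run(["ip", "link"], check=True)
```

The point isn’t that anyone should wire this up by hand; it’s that the OS already “solves” isolation, and a rethought orchestrator could build on these primitives directly rather than on a parallel stack of runtimes and overlay networks.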
The container abstraction is so shallow that once you start trying to debug multiple containers working in concert, it only works if everyone is using the same operating system. It essentially only “solved” disk snapshotting and distribution of snapshots. Engineers working on macOS vs Linux vs Windows (even WSL) are working as differently as a cloud engineer deploying on ECS vs EKS vs AKS. We’ve just shifted the character of the differences between machines.
But wouldn’t the containers be talking via some RPC mechanism? Shouldn’t that help with OS heterogeneity? Perhaps I misunderstand what you mean.
The RPC nature is a big part of the issue. The networking options around containers are very different across operating systems. The ability of containers to interact with the host is a big area of difference. The approach of running containers inside a VM (as Docker Desktop does on macOS and Windows) limits those containers in ways they would not be limited when deployed (in most cases). This is particularly true on macOS, where the interaction between containers and the host depended (at least at one point, ~2017) on whether you were using a wired or wireless connection to your network.
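As an illustration of how platform-dependent “talk to the host” is, here is a small probe one might run inside a container. `host.docker.internal` typically resolves out of the box on Docker Desktop (macOS/Windows), while on a Linux Docker Engine you usually have to add it yourself (e.g. `--add-host=host.docker.internal:host-gateway`); the bridge IP and the port below are assumptions for the sketch, not a guaranteed list:

```python
import socket

# Candidate addresses a containerized process might use to reach its host.
# Which ones work depends on the runtime and host OS, which is the point.
CANDIDATES = [
    "host.docker.internal",  # resolves on Docker Desktop; opt-in on a Linux Engine
    "172.17.0.1",            # default docker0 bridge gateway on a Linux Engine
]
PORT = 8080  # hypothetical service listening on the host

for host in CANDIDATES:
    try:
        with socket.create_connection((host, PORT), timeout=2):
            print(f"{host:24} reachable on port {PORT}")
    except OSError as exc:
        print(f"{host:24} not reachable ({exc})")
```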
So the move to SOA and RPC, partially spurred on by containerization, pushes the differences between machines into exactly the area where operating systems differ the most. The reaction most people have is to try to push everything into a network managed by Docker Engine. That can make debugging tests completely unrepresentative of the relationships between applications when they are deployed. The ability to mock external dependencies has been very important in my debugging work.
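On mocking external dependencies: nothing fancy is needed; a throwaway in-process fake of the third-party API is often enough to keep debugging representative and usable offline. A minimal sketch, with the endpoint, payload, and port made up for illustration:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeExternalAPI(BaseHTTPRequestHandler):
    """Stands in for the real third-party API during local debugging."""

    def do_GET(self):
        # Hypothetical endpoint; swap in whatever the real API exposes.
        if self.path == "/v1/status":
            body = json.dumps({"status": "ok", "source": "fake"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep debug output quiet


def start_fake_api(port=8080):
    server = HTTPServer(("127.0.0.1", port), FakeExternalAPI)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server  # point the app at http://127.0.0.1:8080 instead of the real API
```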
The macOS wired-vs-wifi case is extreme, but I’ve been bitten by this in my work. I was debugging a call to an external API. I had the application running locally, working against a containerized version of the external API (the connection may have gone in the other direction; this was a long time ago). I was making progress debugging it. I had to leave the office to catch a flight. While on the plane I could no longer reproduce the issue because the host application could no longer connect to the container. Worse than “works on my machine”, because it was the same machine!
Ah! So container runtimes aren’t as uniform as the snapshot/distribution formats.
Some of us are still advocating for more usage of Level 1. Modern servers are unbelievably powerful and reliable.
Could you elaborate on this a bit more?
For example, the rise of SQLite as an alternative to more complex DBs which run on separate machines. One way to simplify a system is to use parts which require less maintenance and administration.
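As a rough sketch of what “less maintenance and administration” means in practice: the database is just a file opened in-process, with no server, users, or network configuration to manage (the WAL pragma below is optional, but it lets readers and the writer coexist):

```python
import sqlite3

# The whole "database server" is a single file next to the application.
conn = sqlite3.connect("app.db")
conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer

conn.execute(
    "CREATE TABLE IF NOT EXISTS posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
)
conn.execute(
    "INSERT INTO posts (title, body) VALUES (?, ?)",
    ("Hello", "No daemon, no credentials, no connection pool to babysit."),
)
conn.commit()

for row in conn.execute("SELECT id, title FROM posts"):
    print(row)

conn.close()
# At small scale, backup is as simple as copying app.db somewhere safe.
```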
Right up until you need any kind of high availability, then you start adding replication layers to SQLite and grow something as complex as a traditional RDBMS.
There’s a huge class of apps which don’t need high availability but would benefit hugely from simplified deployment and low hosting costs. I think Mastodon or WordPress are perfect examples.
I agree on blog hosting (if it’s down, I doubt many people care), I’m less sure with Mastodon, at least long term. If my email is down, that’s increasingly problematic because people use it for time-sensitive communication. I don’t think Mastodon is in the same space yet, but it aspires to be.
Please god someone drive a stake through the heart of Kubernetes.
If the discourse on sites like this one is any indication, I think there’s a growing backlash against the trend of ever-increasing abstraction. Not just here in infrastructure, but elsewhere as well; for example, some of us want to go back to writing GUI applications in systems programming languages (albeit with better safety, e.g. Rust). We’ve had enough of teetering atop the tower of abstractions, and we want to find other ways forward. So, I think the future should be finding better ways to deploy and manage infrastructure at level 1 or 2, while keeping the good parts of higher-abstraction deployments (e.g. reproducible infrastructure as code). I’m watching the Nix scene with interest, but haven’t yet jumped in with a real NixOS deployment.
The water is warm enough to jump in :)
If you want to dip a toe in, there’s https://devenv.sh for development purposes. You can move from there to deployment later if you like it.
Maybe we should. Specifically, maybe Levels 3 and 4 are the highest we should ever go, and even then only at massive FAANG-order scales.
Abstraction is useful, but its inevitability is not a natural law. It’s a design principle. Like all design principles, it stops being useful the moment it becomes dogma.
Chances are that future infrastructure will be affected by laws and regulations more than by technical limitations. We may end up in a future where countries have sovereign segments of the network with strict rules regarding data storage and transport.
Maybe this will drive enthusiasts to find refuge in an alternative to the internet: small private mesh networks that are integrated with each other.
I find that possibility quite compelling, but your wording confuses me. The internet is already a “network of networks” - how is what you’re describing different?
Data residency requirements are already here. Every government/health usage is being constrained this way. The only thing you get is physical presence; it’s just as easy to exfiltrate data if the storage is connected to the network. And it prevents you from taking advantage of the cheapest prices, wherever they might be worldwide. And yet, governments forge ahead.
The future is impossible to predict, but perhaps the next level is something like computing as in the novel Permutation City - many providers vying for business on a generic MIPS exchange, with computations paused, their state captured, then seamlessly restarted from the same state on a computer somewhere else. It’s a neat thought although this level of interoperability is anathema to cloud provider profits. Competition is expensive; monopoly/oligopoly with vendor lock-in is where the real money is. Beyond those political/economic concerns I’d say data transfer bandwidth is the main technical hurdle to this vision becoming reality.
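The building block that vision needs, checkpoint and restore, is mundane when the application cooperates (real systems do it transparently at the process or VM level, e.g. CRIU or live migration). A toy sketch of a computation that periodically serializes its own state so it could, in principle, be paused and resumed from the same point on another machine:

```python
import json
import os

CHECKPOINT = "state.json"  # in the imagined exchange, this file moves between providers


def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"i": 0, "total": 0}


def save_state(state):
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)


state = load_state()
while state["i"] < 1_000_000:
    state["total"] += state["i"]
    state["i"] += 1
    if state["i"] % 100_000 == 0:
        save_state(state)  # safe to stop here; restart anywhere that has the file

save_state(state)
print("sum:", state["total"])
```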
That’s the most succinct criticism of capitalism I’ve ever read.
Interestingly I got it from a very pro-capitalist man, Peter Thiel. This is a prominent thesis in his book Zero to One: the purpose of a firm is to achieve monopoly.
This is a very interesting subject in economics. Are you familiar with the paper by Ronald Coase on why firms exist (“The Nature of the Firm”)? It is pretty short, readable, and it changed the whole field. And the whole monopoly thing appears as empire building in governance and information economics. If you like the subject, you will like the texts :)
The problem is data gravity. With a bit of legwork, moving container workloads around multiple clouds is already possible; Kubernetes allows multi-cloud deployments with some effort, for example. But if you rely on DBs or data warehouses in a particular region/cloud, the compute cost savings need to be massive to justify hopping over the egress cost wall.
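Back-of-the-envelope, with assumed (not quoted) numbers: roughly $0.09/GB for internet/cross-cloud egress and a warehouse in the hundreds-of-terabytes range. Real prices vary by provider, region, tier, and discounts, but the shape of the wall is clear:

```python
# Rough, assumed numbers purely for illustration.
EGRESS_PER_GB = 0.09      # ~typical list price for cross-cloud/internet egress
DATASET_TB = 500          # warehouse you'd have to move (one-time)
DAILY_READ_TB = 5         # or keep querying across the boundary (ongoing)

one_time_move = DATASET_TB * 1024 * EGRESS_PER_GB
monthly_reach_back = DAILY_READ_TB * 1024 * EGRESS_PER_GB * 30

print(f"One-time cost to move the warehouse:    ${one_time_move:,.0f}")
print(f"Monthly cost to query it across clouds: ${monthly_reach_back:,.0f}")
# Either way, the compute savings have to clear these numbers before
# shuffling workloads between clouds pays for itself.
```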
There are also latency requirements for online services. If you need to be in US-central, you’re constrained by the ambient costs of that area: land prices, wages, electricity costs. You can’t just pack up and move to Taiwan.
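Physics alone rules that out for latency-sensitive traffic. Assuming roughly 12,000 km great-circle distance between the central US and Taiwan (an approximation) and light travelling at about two thirds of c in fiber, the floor on round-trip time is already above 100 ms before any routing or queuing:

```python
# Back-of-the-envelope propagation delay; the distance is an assumption.
DISTANCE_KM = 12_000           # ~central US to Taiwan, great circle
SPEED_IN_FIBER_KM_S = 200_000  # light in fiber is ~2/3 of c

one_way_ms = DISTANCE_KM / SPEED_IN_FIBER_KM_S * 1000
print(f"one-way: {one_way_ms:.0f} ms, round trip: {2 * one_way_ms:.0f} ms")
# ~60 ms one-way / ~120 ms RTT as a hard floor, before routing, queuing,
# or TLS handshakes add their share.
```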
My guess is that the cloud will become an interchangeable commodity. There will be enough intermediaries between your code and the cloud APIs that you don’t have to care what’s underneath, and can easily use any cloud provider at will. But people have been predicting that for a while, and it hasn’t happened…
I mean, I might have predicted I could run AAA games on macOS by now, or that Microsoft would distribute Office for Linux by now, or…
I suspect the big cloud providers think it’s in their interest to remain incompatible and “sticky”, and though there are forces driving things toward interchangeability, they are probably being actively hindered rather than helped (by decisions of the same flavour as Apple releasing Metal instead of jumping on the Vulkan bandwagon).