It’s going to be cheaper. Especially when you consider employee costs. The most expensive AWS RDS instance is approximately $100k/year (full price). It’s not an apples to apples comparison, but in many countries you can’t get a database architect for that salary.
“not an apples to apples comparison” is putting it mildly. You’re comparing the cost of some infrastructure with the cost of an application-level specialist you’d need either way. The equipment to run a comparable database instance on-premises should cost less than half that. Ops costs are harder to compare, but $work certainly doesn’t retain an operations engineer per big RDBMS instance.
More generally, though, I think this is the way things are going without encouragement, and I hate it. It feels like the list of places where ops involves working with computers is shortening. I could offer all sorts of reasons why I think it’s a bad trend—it’s concentrating people, information and resources in a few huge companies, network effects stifle innovation, bespoke architectures are more efficient, your NOC is more likely to answer your calls when they work for you, blah blah. Mostly, though, it just makes me sad to see the things I enjoy becoming increasingly irrelevant except to a small group of companies I don’t want to work for.
TITANESQUE DISCLAIMER: I FIX USER OUTLOOK MAILBOXES FOR A LOCAL IT COMPANY
Yes, always cheaper at first. And then your tiny thing grows, and then you cannot live without your managed services, and your managed service knows it, and then it is not cheaper anymore.
This. Half of the jobs I had could be summed up as: oh shit, managed service X got out of control and its cost became unbearable.
In one instance, it ended with the company firing 70% of its employees, because one morning some GCP service costs had skyrocketed, causing the company’s main project to fail.
To be honest, had they done a simple extrapolation when they signed up for the service, the situation would have been 100% predictable.
Of course it depends. S3’s value proposition is difficult to replicate as a self-managed service, but things like hosted RDBMSs or even EC2 instances are really only an advantage when starting up and toying with tiny loads. Once your services start scaling, they quickly result in 1-3 orders of magnitude of extra costs. Which is unlikely to be irrelevant.
You’re comparing the cost of some infrastructure with the cost of an application-level specialist you’d need either way.
I beg to differ. Lots of places can get by without the application-level specialist. Until they can’t. Making that decision is of course an art, but using a managed service will extend how big you can get without an FTE (maybe you will need some consulting for tuning a query, for example, but that’s cheaper).
More generally, though, I think this is the way things are going without encouragement, and I hate it.
I understand this perspective. I still think there are ops tasks and jobs, but yes, they are certainly changing. I do think that managed services let people build a lot more software (the same way that blogs let a lot more content be created than hand-written HTML did), but I appreciate that the effects aren’t entirely positive.
One thing that I think gets lost in this is that a company is theoretically valued, in part, on its in-house talent. If your company is basically just a glorified rebundler and reseller of service providers, you lose the ability to leverage that in-house talent. You also by definition reduce your ability to distinguish yourself from other companies that are doing the same thing.
Second, there is a very real problem where people buy a service provider and then just patch over the issues with it using…another service provider! There’s a weird inflection point where suddenly any scaling is a heart-stopping event, as you graduate from the “developer wants to pad their resume” tier to the “somebody in the C-suite needs to sign a contract” tier.
I think it’s kinda relevant that you are a developer evangelist for one of these hosted services. Neither good nor bad, just an important datapoint.
Thanks for pointing that out! I have actually been espousing this viewpoint for years. I also taught AWS certification courses and built a start-up in the last 5 years and both of those cemented my view about leveraging managed services.
But yes, where people sit definitely affects where they stand, and I am no different. (I will also submit that, in my opinion, my employer’s offering beats the pants off the hand rolled authentication systems I have seen over the years.)
One thing that I think gets lost in this is that a company is theoretically valued, in part, on its in-house talent.
That’s a good point. I think every company needs to decide what it is good at. That may be infrastructure for some companies, for others it might be enterprise sales, etc, etc. I have left companies where I knew my department was never going to be the focus of the company, because that limited my growth. My favorite old post about this topic: https://www.joelonsoftware.com/2001/10/14/in-defense-of-not-invented-here-syndrome/
From that post:
If it’s a core business function — do it yourself, no matter what.
As you mention, service provider sprawl is a real problem. Being disciplined about managed services, including when to leave them, is just as hard as being disciplined about maintaining your own code. Maybe even harder, because the effort is lower for the managed service (which would argue against my thesis, as you say).
But yes, where people sit definitely affects where they stand, and I am no different.
Same here! And I didn’t mean that as a gotcha, more of as a “this poster has probably seen the spectrum of this philosophy in practice and that is relevant.”
You bring up a lot of good points, especially about bespoke architecture. I think my only pushback would be that the landslide majority of the time I set up a database (or any other service), it’s with the same configuration and features as everyone else. It’s more economically efficient to cater to the general case, both for the vendor and the customer (or for the DBA and the company). Inevitably there will be divergence, especially as a company or product grows in complexity.
I enjoy working on these architectures as well, but I find setting up the same old database again and again tiresome. I’d rather defer that work to a vendor until there’s a real problem worth solving and dive deep there.
It’s not fading into irrelevance when more companies are given the ability to use databases (or whatever) at scale. On the contrary, more companies using databases means more relevance for the people who know how to use them well. The statistical majority can stay within the middle 50% of use cases that the vendor covers well, and leave me to cover the outer 50%, where the tails on either side have the interesting use cases.
These complaints seem like general grievances with any kind of automation. More work is accomplished by fewer people, and those people are increasingly specialized. Bespoke widgets are replaced with cold, stamped-out widgets. Customer service decreases.
This isn’t to say your frustrations are invalid, only that they are normal and increasing as we automate away more and more skillsets. It’s a bummer to see your hard-won skills falling to a dramatically cheaper service.
Managed services frequently determine the shape of your application and the velocity of feature delivery, and, once you’re in deep, they remove the ability to use sophisticated open source software.
This kind of smear-it-on-everything advice is not good. You need to analyze what your software does, where it will be, what the organization needs, and what it plans to need.
It’s going to be operated well. The expertise that the cloud providers can provide and the automation they can afford to implement will likely surpass what you can do, especially across multiple services.
That, in the most blunt sense: depends. Wildly. EC2 is a good service. DynamoDB is deeply limited. GKE is not deeply configurable. Azure has some strange APIs.
It’s going to be cheaper. Especially when you consider employee costs. The most expensive AWS RDS instance is approximately $100k/year (full price). It’s not an apples to apples comparison, but in many countries you can’t get a database architect for that salary.
Dollar for dollar, perhaps, but that’s not even the appropriate level of analysis. You need to analyze what it does to the system under delivery: limitations, expansions, and integration work.
It’s going to be faster for development. Developers can focus on connecting these pieces of infrastructure rather than learning how to set them up and run them.
Same as above.
The advent of managed services and writing integrations/overlays has given a lot of people extremely well paid jobs. Thought should be put into this decision: second and third order effects will start to outweigh simplistic considerations like the above.
“My CloudFormation changes are taking a week to be applied” -> iterate that problem enough and you’ve delayed releases by months, costing far more in reputation, income, morale, etc. than that Staff Engineer you didn’t want to hire.
MASSIVE DISCLAIMER: I WORK ON GOOGLE CLOUD
But, it has APIs, right? If you self-manage, do you want to spend engineering hours making your own API? I don’t think I would. I’d rather build my product.
It is your product! Every part of your product is your product. Every dependency of your product is your product.
This depends, heavily, on what that service adds and what that service API looks like. The worse the API is, and the fewer cases it covers, the worse it is for me. The more commodity the tool is, the more comfortable I am using it, because it implies (1) I can migrate and (2) there’s competition between providers. My “effectiveness” moat can’t just be a thin wrapper over an API; it needs to be much better than that.
Every third party thing I integrate demands hours of work, design review, etc. The rough complexity of integrations is n!, n being the number of integrations (technically each third party thing has its own complexity count).
Our own code might be something like polynomially expensive to maintain. If you can keep your complexity count under the complexity count that I take on - both initially and over time (dev time, ops time, maintenance time) - then you will win.
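To make the growth claim concrete, here is a toy sketch; the factorial model for integration overhead and the cubic exponent for in-house code are illustrative assumptions taken from the framing above, not measurements:

```python
from math import factorial

# Toy numbers only: factorial growth for integration overhead and cubic growth
# for in-house maintenance are illustrative assumptions, not measured data.
for n in range(1, 9):
    integration_cost = factorial(n)  # n! units of cross-cutting design/review work
    in_house_cost = n ** 3           # polynomial cost for code you own end to end
    print(f"n={n}: integrations ~ {integration_cost}, in-house ~ {in_house_cost}")
```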
Note that I’ve personally been in an org which was so bought into a certain cloud provider that it had no ability to innovate via, e.g., using Apache projects, and all improvements came from the cloud provider… we’d essentially stalled out. New divisions actually wound up spinning up their own devops orgs because we’d become so incapable of delivering new tools that were “not supplied by cloud provider”.
What we’re actually talking about is called vertical integration in the general case. It can be enormously effective if you can swing it.
Let me ask you this question: how much does Google use from AWS? (Mind, I don’t think you can answer. ;) But Google is notorious for writing its own software).
edit: Note that what I’m saying is not “Don’t use cloud services/SaaS services”. It is: understand deeply what you’re doing and the ramifications of your choices, both technically and organizationally.
On one side, I strongly agree with this. I use GCP and DigitalOcean often to outsource what I do.
On the other hand, I’m watching an entire community of people put out fires because they built their IT on a managed service which Apple bought and effectively terminated yesterday, causing people to wake up to entire fleets of devices with broken policies.
Like everything else in tech, there’s no right answer, rather it’s a set of tradeoffs someone has to make.
MASSIVE DISCLAIMER: I WORK ON GOOGLE CLOUD
I think there is definitely a difference between using AWS/Azure/GCP/AliCloud and a startup like Fleetsmith. I feel super sad for the people that got impacted, as that sunset is really bad (I know that GCP has a 1 year sunset for GA products). If you’re using, say, GKE for your k8s clusters, you can be confident that’s not going away.
Yesterday I was trialing EKS (k8s) on AWS. I did not like the experience; I ended up abandoning the AWS-native method for a third-party tool called eksctl, and it still took ~30m to provision a 2-node cluster. I cannot begin to imagine how one would self-host a k8s cluster.
So yes, there are trade-offs, but I think there are definitely ways to mitigate them.
P.S. Given the Fleetsmith turn-off, one great service going away that would keep me up at night is PagerDuty; there really is no product that I know of that is anywhere near as good.
a difference between using AWS/Azure/GCP/AliCloud and a startup like Fleetsmith
Is there though? So only use a big provider (AWS/GCP/Azure) for your startup project? No DigitalOcean/Vultr? Those are both fairly large shops with a lot of startups on them. But they’re also not too big to fail. DigitalOcean is offering more managed services (databases and k8s), but if they ever declared bankruptcy, your startup would be scrambling for another service (and you could find yourself with a much higher bill on the big three).
I’d rather see more open source management solutions for things like simple full-redundancy management for Postgres or MySQL. What I’ve found is that most shops that have this kind of tooling keep it under lock and key, and it’s proprietary/specific to their setup.
I think managed services are bad due to cost and lock-in, and they also have the side effect of slowing innovation on better tooling that would let people self-host those same solutions.
Yes, the loss of DigitalOcean in particular would be a huge blow to the ecosystem. Their documentation in particular is fabulous.
I’m not sure I’d agree about lock-in, as long as you are judicious where it’s a concern. E.g. Google Cloud SQL is just Postgres/MySQL with Google juju underneath to make it run on our infra. There’s nothing stopping you dumping your database at any time. The same goes for a service like Cloud Run, where you’re just deploying a Docker container; you can take that anywhere too. But if you go all in on GCP BigQuery, then yeah, you’re going to have a harder time finding somewhere to take a data dump of that to.
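As a concrete sketch of that portability point: because Cloud SQL speaks plain Postgres, the stock tooling is all that’s involved. The host, user, and database name below are placeholders, and in practice you would likely connect through the Cloud SQL Auth Proxy and supply a password via PGPASSWORD or ~/.pgpass.

```python
import subprocess

# Placeholders for illustration only.
HOST = "203.0.113.10"
USER = "postgres"
DBNAME = "appdb"

# Cloud SQL speaks plain Postgres, so the stock client tools work unchanged;
# the resulting dump restores onto any other Postgres server with pg_restore.
subprocess.run(
    [
        "pg_dump",
        "--host", HOST,
        "--username", USER,
        "--format", "custom",
        "--file", f"{DBNAME}.dump",
        DBNAME,
    ],
    check=True,
)
```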
I would say that the difference isn’t big provider vs startup, but infrastructure-as-a-service vs software-as-a-service. Sure, the major cloud providers have some software they offer as services, but they all also have VMs that you can spin up and install whatever you want on. It’s not like you can install Fleetsmith on your own machines.
Disclaimer: I work on containers for AWS, but not directly on EKS.
Just a note here that eksctl is the official and recommended tool for interacting with EKS. You can find it in the EKS user guide.
Managed services == cloud which leads to vendor lockin which leads to centralization of the internet.
Hard pass.
Own your infrastructure, own your data. If you care about your business you’ll do this. If you are cash strapped and can’t find or afford the talent, that’s a whole different cup of otters.
If you use something like managed PostgreSQL then I assume you’ll always have the option to dump your data and import it somewhere else, right?
Yes, with downtime, unless they also offer replication access (RDS does not).
Context: I contract to (mostly) small companies providing ops/tooling and dev services.
TLDR: If you don’t want to have to hire in-house ops people to manage your DB layer, I’d very much recommend hiring a company (e.g. Percona or similar) to manage it for you, on infra you control.
Long rant-y version:
This is (one of) the key points that surprise clients when I talk to them about a multi-vendor solution.
They mostly understand fairly well the risk of relying on a single company, especially with how rube-goldbergy AWS is.
“Ok so we’ll just use <cloud 1> and <cloud 2>, right?”
I’ve literally never seen any first party managed DB service (i.e. where the management isn’t provided by some third party, e.g. Percona) that will even acknowledge the existence of a managed instance provided by another company. And with the direction that AWS in particular goes, you’d be crazy to try it: replicating to/from their “patched” / “reimplemented” versions and a regular instance elsewhere? Sounds like you’re just begging for incompatibilities there.
At the very most, I’d vaguely agree (with the overall discussion, WRT databases) that some businesses might benefit from having a third party manage their DB setup across multiple vendors (i.e. just using the vendors to provide virtual machine instances of some form) that the business itself controls (i.e. you give DBs-R-Us access to your DB infra, not pay them to provide you with a hosted DB service). In this scenario the big “cloud” operators are likely worse options, as they’re very much keyed around using as much of their “as a service” stack as possible, and just renting dumb VMs from them 24x7 is ridiculously expensive compared to the competition.
Many of these managed services like S3 are black (or at least grey) boxes. The edge case SLAs can be unpredictable and when you experience issues like multi-second latencies, it’s impossible to debug.
Running your own service means you can strace, gdb, top or even add printf() statements to debug. The power is in your hands.
Running your own service means you can strace, gdb, top or even add printf() statements to debug. The power is in your hands.
I’d argue that running strace or gdb on your database instance should be seen as an antipattern, unless you’re breathing some VERY rarefied air and doing some incredibly specific work where you’re using the database in an incredibly specialized way.
And, if you are in fact in that 10% who have a valid need to do this, you probably wouldn’t even consider managed services anyway.
They’re for people who want to treat infrastructure like LEGO. There are some valid reasons to hate this conceptual model, but I claim that the industry has produced enough counter-examples to at least put up a good fight.
I think this downplays significantly how broken computers are on a fundamental level.
It’s also not a nice sentiment to have when you’re the person who has the control and I, the user, do not.
There’s a simple fact that systems will fail in loud, quiet and interesting ways. When you rent a managed service it’s essentially the same thing as buying a support license from a vendor for a black box unit that sits in your datacenter.
Sometimes the support staff have little incentive to help, sometimes your unit will fail intermittently and it’s difficult to have support staff on hand during the issue, and, often, the support price will go up as the product gets older, meaning you have to rebuild your applications to use version 2 of whatever product it is.
My situation is very simple: I’m responsible.
I cannot outsource that responsibility, I can mitigate the risks, but if everything fails my company will come to me first; it was my choice to use a managed service and I have the responsibility if it fails.
Managed services are great though! In theory there are many operations staff working on maintaining them so that I don’t have to. But the flip side is that I get a cookie cutter variant of something with no control if it goes down, and developers are constantly pushing changes to the service without my knowledge which can impact things (for the better or worse).
You’re quite lucky to be working for the dominant cloud provider too by the way, because when you have issues your customers are not going to be beating down your door too much. “If Amazon goes down, half the internet goes down too, so we’re not too worried” is a sentiment I’ve heard developers mention often.
The Google Cloud and Azure guys don’t have this luxury; and you’ll note that it’s never the CTO of a >500 person org who is making this claim either.
You’re quite lucky to be working for the dominant cloud provider too by the way, because when you have issues your customers are not going to be beating down your door too much. “If Amazon goes down, half the internet goes down too, so we’re not too worried” is a sentiment I’ve heard developers mention often.
I am incredibly lucky across a number of axes. I love working here. For me it’s an unbeatable combination - great challenges, great people, great culture. I am also incredibly lucky to be on a great team with a phenomenal manager.
For one, your DB daemon is just a process. It has a stack and a heap. There’s no reason to be intimidated by doing traditional debugging (gdb, strace) to identify latency issues.
Your other example about EFS is an even more accessible candidate for debugging. Let’s say I was experiencing 500ms write latencies on EFS. I would have to go…RTFM and hope there’s some 99th-percentile case I’m missing. On my own NFS server I would jump over there and strace & gdb the daemon to see where the time was spent.
I’m guessing both cases would waste my time just as much, but the 2nd case would empower me with both the understanding of the problem and the tools to get to the solution (patching the daemon and sharing the patch upstream).
The only antipattern I see here is telling devs that they are not sophisticated enough to gdb a daemon. There’s no magic in software. They are deterministic artifacts with stacks and heaps – differing from “hello world” only in size but not in nature.
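A minimal sketch of what that looks like in practice; the PID is a placeholder (real backend PIDs come from pg_stat_activity), and attaching generally requires root or ptrace permission:

```python
import subprocess

# Illustrative only: attach strace to a running Postgres backend and watch where
# wall-clock time goes at the syscall level. Find real backend PIDs with:
#   SELECT pid, query FROM pg_stat_activity;
BACKEND_PID = "12345"

subprocess.run(
    [
        "strace",
        "-p", BACKEND_PID,               # attach to the already-running process
        "-T",                            # print time spent in each syscall
        "-e", "trace=read,write,fsync",  # focus on I/O syscalls relevant to latency
    ],
    check=True,
)
```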
And just in case this sounds hypothetical, it’s not. I’ve had real world experience on multi-million dollar products where the cloud DB just stopped, or disappeared, or dropped a table unexpectedly. Our only recourse was a phone call, which usually involved a support upgrade, and a long queue waiting for the solution to be found. In one case this was a DB with a >$50k/mo license fee that just disappeared.
So the black-box issue on cloud is real. And cloud marketing is working to cover it up.
To be clear, I did not cite EFS as an example, explicitly because what we offer is a black box, and has to be given the proprietary nature of the service.
That’s a part I’d like to see improved. I’m also a big cloud fan and heavy user. But the move toward more closed services is a bad trend for developers.
With some investment in tracing & debugging, improvements can be made.
My only hard-line stance is against “managed is always better” – there are enormous costs to using a managed service.
My only hard-line stance is against “managed is always better” – there are enormous costs to using a managed service.
Anyone who takes this stance is straight up ignorant.
There are all kinds of reasons why managed services might not make sense for a particular use case. Total control and debuggability is but one of them.
There are regulatory reasons, performance reasons, customizability reasons, and that’s just off the top of my head.
So is this advice for the run-of-the-mill cloud engineer who can’t be bothered to learn from the experience shared by those who created those managed services in the first place? Maybe my approach can be generalized, but I think the information contained in blog posts and whitepapers and conference talks about the tech behind the managed services is enough to build in-house services in their image. Maybe it’s a bad idea? Probably. Obviously you pay for it in terms of time and effort, which can be equated to money and fewer features elsewhere.
What you gain though is control over your own destiny and lower infrastructure costs over time. Honestly it’s probably not the best trade-off, but I would rather have control than send emails to engineers or “support specialists” who I can’t fully trust to care about my application or my customers, instead of handling it myself. SLAs and reimbursements don’t matter much in that case. At some point, the amount of redundancy that I implement to make sure I don’t get screwed by a service outage or shutdown one day (regardless of any promises they have made to me) is going to eclipse the amount of work I would need to implement the service itself. At that point it becomes a bad deal.
None of them are actually that hard. All of the building blocks exist.
What you gain though is control over your own destiny and lower infrastructure costs over time.
Great way to explain the tradeoff. And different orgs can make different choices. My experience has primarily been at smaller companies where it didn’t make sense to build our own.
I mean, I guess you make the same choice using a commercial service like PagerDuty vs building up your own scripts based on a solution like Nagios (or something else, I haven’t been in that space for a while). You need to make strategic decisions about what is worth keeping in house and what isn’t.
None of them are actually that hard. All of the building blocks exist.
Even if true (which I don’t grant, especially when you consider edge cases and ongoing maintenance) it becomes an opportunity cost choice. Do I build my own database backup automation system (which is not, for a typical business, a value add) or do I use a managed service (which introduces additional dependencies)?
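To make that opportunity-cost framing concrete, here is roughly what the “easy” version of an in-house backup job looks like. The host, database, and bucket names are hypothetical, and it assumes pg_dump, boto3, and AWS credentials are already available; everything it leaves out (retries, encryption, retention, restore testing, alerting) is the hidden scope being weighed.

```python
import datetime
import subprocess

import boto3  # AWS SDK for Python; assumes credentials are already configured

# Placeholders for illustration only.
HOST = "db.internal.example"
DBNAME = "appdb"
BUCKET = "example-db-backups"

# Take a logical dump with the stock Postgres tooling...
stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
dump_path = f"/tmp/{DBNAME}-{stamp}.dump"
subprocess.run(
    ["pg_dump", "--host", HOST, "--format", "custom", "--file", dump_path, DBNAME],
    check=True,
)

# ...and ship it to object storage. Retries, encryption, retention policy,
# restore testing, and alerting are all conspicuously absent here.
boto3.client("s3").upload_file(dump_path, BUCKET, f"{DBNAME}/{stamp}.dump")
```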
Whenever companies grow substantially, the comparison of the cost for managed vs running their own services always ends up with the managed service being more expensive. So, managed services aren’t a good idea for anything other than taking a shortcut when you have a small team, and it’s going to bite you in the ass when you grow later.
If your organization plans to stay at its current level of growth forever then maybe your point makes sense, but companies like that tend to either have plenty of time and effort to focus on improving these things (case in point: Craigslist) or they lay off people because their ship is sailing with less maintenance now (the usual case).
With that said, you didn’t fix anything. I considered typing “if” when I wrote it, but it’s really a moot argument in practice. If you plan to grow then you should plan to grow. Otherwise, you plan to fail.
Just because you feel differently doesn’t mean you are fixing things. Saying things like “fixed that for you” instead of having an actual conversation is just rude behavior all around.
I already accounted for that edge case at the start, but I also explained why I believe it doesn’t really apply here. It’s possible that you replied while I was still editing since I made a few edits, so not sure. I apologize if that confused the situation.
Either way, it’s completely not applicable whether or not they plan to grow in my personal opinion and experience.
Ah, sorry, I was joking about the fact that often people plan to grow but don’t actually do so. I have certainly been part of such organizations.
And that if you plan for when you are a 1000-person company while you are a 200-person company, especially when it comes to engineering-intensive efforts without a ton of value add, like commodity databases, you may never get to 1000.
In some cases, but I am saying that they are financial debt at a successful company. They are a shortcut to profitability, not a solution to longevity.
They do tend to become technical debt at scale, though, because they are meant to cover broad scopes and support every use case, and aren’t usually tuned to perform well for the specific use case of the product.
Some managed services are compatible with alternatives (kubernetes services are a good example)
You can’t just move k8s deploys from one provider to another. K8s is incredibly complex. Especially when you factor in the different ways kops, EKS, Rancher, and others set up masters/nodes in your hosting infrastructure. Running k8s on bare metal also presents problems where you have to set up an ingress/egress system, not to mention that the networking landscape changes how tools like Istio would need to be deployed.
In the best case, deployment YAML/JSON has minimal changes (and really the generation of those should be automated so they can all be updated with minimal changes at once, although tools like Helm are the major tools used for that task and they’re pretty terrible). Realistically, migrating people from one k8s cluster to another is incredibly difficult. k8s was built for providers like GCP (and now AWS and others), and using it pretty much marries you to hosting providers.
Fair points. Perhaps I should have mentioned moving a mysql database from one provider to another, or switching out a standardized queueing system. I don’t have a ton of k8s experience, but from what I’ve read, there’s at least some promise of portability.
I feel like the OP is missing the nuance between “managed service that provides low-cost access to commodity cloud utility, like compute, network, or storage”, vs “managed service that locks you in to a proprietary API and guarantees your margin is eaten by the cloud vendor.”
In AWS, using S3 is the former (commodity storage at low cost), whereas using DynamoDB is the latter (non-commodity proprietary API that is expensive in all respects, and, at the end of the day, is just storage). That being just a pair of examples.
AWS “the good parts” involves EC2, S3, ELB, and perhaps also CloudFront, EMR, and Athena. It’s interesting that you can now tell “the good parts” by what has been copied or matched in Google Cloud, so they have GCE, GCS, Cloud LB, Cloud CDN, Dataproc, and BigQuery. These are truly providing thin wrappers over the commodity compute, storage, and network infrastructure. And that’s also why the cloud vendors drop prices on these over time: because they are tracking the cost of the commodities in terms of hardware and data center advances – savings that accrue to you, rather than to the cloud vendor’s margin as with their “serverless” infra.
I also do agree that using RDS (or Google Cloud SQL) for a Postgres instance is usually the right move, but that’s more a convenience thing: it’s “just hosted Postgres” at the end of the day.
But once you start heading in the direction of Kinesis, DynamoDB, Cloudsearch, etc. you are coding directly against AWS, essentially treating AWS not as your web/app host, but as your standard library and development environment itself.
And that’s fine, but it also shouldn’t be a crime to consider Kafka instead of Kinesis, Cassandra instead of DynamoDB, Elasticsearch instead of Cloudsearch, etc. Not to mention using those supports open source rather than proprietary stacks. And the good news is, you can reuse your skills with Linux, EC2, and S3 in standing those systems up easily, even more so with the help of tools like Terraform and Ansible.
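To make the commodity-vs-proprietary distinction concrete, compare what the two look like from application code; the bucket and table names here are hypothetical:

```python
import boto3  # AWS SDK for Python; bucket and table names below are hypothetical

# Writing to S3: the surface is essentially "bucket / key / bytes", which any
# object store (GCS, MinIO, ...) can stand in for with minor changes.
s3 = boto3.client("s3")
s3.put_object(Bucket="example-bucket", Key="reports/2020-06.json", Body=b"{}")

# Writing to DynamoDB: the typed item shape and key design are DynamoDB-specific,
# so both this code and the data model behind it are married to AWS.
dynamodb = boto3.client("dynamodb")
dynamodb.put_item(
    TableName="example-table",
    Item={"pk": {"S": "report#2020-06"}, "payload": {"S": "{}"}},
)
```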
That’s a great nuance and comment. In my defense, I was limited in the number of words I had :), due to where I was submitting it (never got published, so I just posted to my blog). But you’re correct, there’s a big difference, both from the implementor and cloud vendor perspective, between what is proprietary and commodity. And you as a managed service user should consider that as part of your decision tree.
I think that BigQuery was released ahead of Athena, though (nit).
Yea, BigQuery came first and Athena was a fast-follow. I think there might be an argument, also, that PubSub and PubSub Lite are Google’s “S3/GCS equivalent for basic real-time messaging”. You can view PubSub as a managed service over “raw event/message-level networking” as a commodity, which is really only a small step up from raw inter-machine sockets as might be provided by ZeroMQ or Kafka.
I think AWS will likely imitate that product, or roll the “pull” messaging model into SQS, since right now the best options for this style of messaging on AWS are Kinesis (which isn’t 100% managed, due to the pay-per-shard model) and MSK (which is really just Kafka hosting).
It’s going to be operated well. The expertise that the cloud providers can provide and the automation they can afford to implement will likely surpass what you can do, especially across multiple services.
Not… really. They’re operated by people just like us, they make mistakes, they make stupid decisions, and they’re driven by goals that are never in exact alignment with your own.
It’s going to be cheaper.
No it’s not. If it’s something core to your business, managed is going to be way more expensive.
The most expensive AWS RDS instance is approximately $100k/year
Not really fair; one huge DB instance isn’t the right answer for anyone, and the biggest instance RDS has is nowhere near the upper limit on growth. My medium-size employer has 40x that many cores running relational DBs. I’m sure we can find a couple people to do care-and-feeding if it means reducing a $4M bill.
It’s going to be faster for development. Developers can focus on connecting these pieces of infrastructure rather than learning how to set them up and run them.
Or rather, they can focus on creating a layer that makes the third-party thing they were given resemble the thing they actually need, instead of building the actual thing. Then they hope and pray that the cloud thing keeps operating the same way in the future, because all of their work can be made worthless at any time.
This is the “build vs. buy” argument and the fundamentals haven’t changed. If you have a requirement that isn’t actually your core business that you don’t want to hire for, buy. If you’re a startup with no budget trying to prove out a concept, buy for everything except that tiny nugget. If it’s something that actually matters, build.
Rather than focusing purely on price, a more useful perspective is on opportunity cost. The cost of a choice isn’t $X, rather it is what benefits you could have derived from the next best alternative for that $X. Moreover we need to better define what level of abstraction we’re “managing”.
For example, I made a natural-language-processing-based method for choosing baby names. I did not use AWS SageMaker. Heresy! But SageMaker is incredibly limited and does not give me the flexibility I need in terms of tuning the deep learning models that extract features and the gradient-boosted trees that use those features. At the same time, I run the whole lot on AWS Lambda and ECS. I chose not to worry about scaling, patching servers, and ensuring availability zone fault tolerance, amongst other concerns.
So sometimes asking others to manage something lets you use your time better. Sometimes there aren’t powerful enough managed services at given levels of abstraction to serve your needs, and it’s worth the time and sweat and tears to educate yourself and maintain something custom.
AWS also uses its own managed services, including services not available to the public. It would be insane to re-invent everything from scratch. Imagine the wasted opportunities.
A significant portion of my career has included what I think of as the “cycle of managed services.”
A new project is proposed. They want to use a new, shiny database provided by a specialized SaaS vendor.
I tell them not to do that.
They do anyway. There are immediate problems with performance, and the startup promises they will be fixed as soon as possible.
The startup folds.
There is a frantic rush to move to whatever the Amazon/Microsoft/Google equivalent of the managed service is.
There are more problems because of the subtle differences, despite “API compatibility.”
There is a time-crunched effort to re-write the project to use an established tool instead.
This has seriously happened at least three times that I’ve seen on a large scale, and several other times on a smaller scale (where they hit the first problems and back up and rework it early in the project).
Ultimately, I think a managed service is okay, if that managed service is essentially a copy of some decent open source (or even commercial) tool that someone is managing for you. Postgres? Sure, RDS is fine. Redis? Yeah, go ahead and use Elasticache or Azure Cache. More “niche” things, though, with proprietary APIs? They’re a real risk. You go down with the ship (boutique database startups) or you find yourself mired so deep in the particulars of a bad API that you can never escape (DynamoDB).
This is super refreshing to read. I think the trend towards self hosting, Linux from Scratch, and designing your own processors, motherboards, and firmware are all fascinating and laudable pursuits, but there is value for many people in being able to treat infrastructure as composable building blocks.
In other words people don’t self-host because they want something “noncomposable”. And open source stuff is usually more composable anyway, so I’m not sure what you’re getting at.
They self-host because they want control (and sometimes cost, which may or may not be accurate etc.)
In other words people don’t self-host because they want something “noncomposable”. And open source stuff is usually more composable anyway, so I’m not sure what you’re getting at.
I’m getting at the value of helpful abstractions that allow people who would not otherwise be able to build complex and interesting systems to do so.
Have you ever run an ElasticSearch cluster? It’s not easy and it’s REALLY easy to think you can and lose all your data.
A managed service not only encapsulates ease of installation but also best practice around operational safety, backups, etc.
Sure, that is a reasonable opinion… but I have trouble getting there from the original comment. Whether ElasticSearch is hosted or not doesn’t affect “composability”
I think it may come down to containerization or not. For better or worse, containers are more composable because the state is more isolated.
Hosted services are also isolated in that you can’t peek inside (again, for better or worse). But it seems like most people are self-hosting with containers these days. There’s a spectrum.
I agree composability is a desirable property, but I aim to achieve it without relying on too many cloud services. I guess one way to do that is to only use cloud services with commodity interfaces, e.g. web hosting, SQL, etc., rather than services of which there is only one, like BigQuery.
Containerization is one useful abstraction that can definitely help people achieve solutions quickly, but depending on the nature and operational characteristics of the system being containerized, that abstraction only gets you one step of the way towards solving your problem.
I’m sure there’s an ElasticSearch container, but will that help you understand how to operate it with 100% confidence when you can lose your entire data set if you unintentionally perform certain operations when you don’t have sufficient quorum?
I think that’s a pretty ridiculous take. Nobody is proposing you should build your own operating system, just that you can’t build a business by just gluing other people’s products together.
I think your response is also a straw man. For most software businesses, the value is not in having their own high-availability DB setup or load balancer or any of the other things that the big IaaS providers provide, but in the application that they develop on top of those services.
I think the trend towards self hosting, Linux from Scratch, and designing your own processors, motherboards, and firmware
This is the straw man. :)
For most software businesses, the value is not in having their own high-availability DB setup or load balancer or any of the other things that the big IaaS providers provide, but in the application that they develop on top of those services.
Nobody is saying make your own load balancer, but RUN your own load balancer.
The major providers are / are bound up in entities that exercise power on the scale of nation-states, and an unfathomable number of lives are affected by how their eating of the entire economic possibility space plays out. Meanwhile, it slowly begins to dawn on much of the heretofore quite privileged technical class that their lives are just as contingent and their livelihoods will not be spared once the machinery they’ve helped build finishes solving for the problem of their necessity.
You’ve invented quite the end game for yourself there!
Will there be job reductions and resultant economic pain as a result of the move away from the data center and towards more cloud/managed services? Hell yes there will be!
But those who want to work in this business will find ways to adapt and continue to add value.
I’ve been working in technology for ~32 years. This doesn’t make me magical or omniscient or even smart but what it does make me is a witness to several waves of seismic change.
I came into the job market in the early 90s when DEC was imploding, PCs were ascendant, and the large, expensive workstation vendors were dead companies walking but didn’t realize it yet.
Everyone thought the world would end and we’d all be out of work too.
Will it be hard? Yes. Will there be a lot of people who can’t or don’t want to adapt and will be left behind? Undoubtedly. Is that bad? Yes. However it’s the way our industry works and has since its inception.
We are all turning the crank that drives the flywheel of increased automation, whether we administer our own database clusters or not. The move to managed and cloud definitely represents a notable pinch point, and we’ll see how painful the transition will be, but it’s one paradigm shift in an industry that creates paradigm shifts for a living.
I’ve actually thought for a while that in the longer term, as compute at scale becomes quite a bit smaller and even cheaper, we could see a move back away from cloud because when you can host your company’s compute cluster in a cube the size of a printer and we have algorithms that can encode and enact services at scale in an operationally safe way, the value cloud adds will dwindle.
I love my job, and I love this industry, and plan to continue playing in this pool until I die, whether someone’s willing to pay me for it or not.
I assert that the constant negativity many of us exhibit is neither necessary nor desirable and represents a kind of immaturity that we will eventually grow out of as our industry continues to mainstream.
However it’s the way our industry works and has since its inception.
I’d gently suggest this is sort of an admission that it’s not an endgame I’ve invented.
I’ve actually thought for a while that in the longer term, as compute at scale becomes quite a bit smaller and even cheaper, we could see a move back away from cloud because when you can host your company’s compute cluster in a cube the size of a printer and we have algorithms that can encode and enact services at scale in an operationally safe way, the value cloud adds will dwindle.
I’ve had similar thoughts, at various points along the winding path to where we are now, but I think in practice that it’s not just where the hardware lives but who owns it. And in practice the amount that the hardware’s owners are the hardware’s users instead of the companies that control the ecosystem is perpetually decreasing. Hobby-level undertakings of various sorts are vastly easier than they were a few decades ago, and an amazing range of tech is available to small-scale operators, but somehow none of this seems to be decreasing the net concentration of power.
I assert that the constant negativity many of us exhibit is neither necessary nor desirable and represents a kind of immaturity that we will eventually grow out of as our industry continues to mainstream.
I would in many ways prefer to see our industry destroyed and the ashes thoroughly salted, so I suppose it is safe to say that we fall on very different sides of this question. :)
Forgive me but I don’t understand! I’m saying this ISN’T an endgame! Merely part of a larger cycle that we are in the throes of and have been forever.
Sorry, I think I should have been more clear. I’ve encountered the view that things are cyclical among a lot of the long-timer technical people I know, and I recognize its validity. I share it to a degree. I haven’t been in the game as long as you have (or most probably with anything like the depth of experience), but anyone who’s lived through a handful of major technical paradigm shifts along whatever axis should probably be able to recognize cycles in the system. All the stuff that was considered The Future when I was getting my start has been The Dead Past for long enough now that some of it’s coming back around in different costume for the 2nd or 3rd time.
Two things, though:
For an individual, the endgame is that the cycle will eventually dispose of you, your hard-won understandings, and the things that you have built, whether it takes one iteration or a dozen. To a first approximation, it will do this with no regard for the value of any person or thing. On one level that’s just the nature of life, but the technical economy has by now overdriven this process to an absurdly pathological extent. It burns people out and churns their work into dust at a rate and on a scale that is frankly anti-human, and it’s a reality that means ruthless mercenary calculation and a deep mistrust of the systems and cultures and communities we work within are more or less requirements for paying the rent.
Once upon a time, I made a just-barely-workable and fairly satisfying living building some desktops, fixing some printers, running a mailserver, and hacking some e-commerce code. That job, that class of job, hasn’t utterly vanished, but it’s an endangered species. I probably wouldn’t know where to find one now if I needed it. I’ve been lucky and connected enough to migrate to other tiers of abstraction in other corners of things. But every thousand of that job that went away wasn’t replaced by a thousand jobs like I have now, and it definitely wasn’t replaced by a thousand jobs working on the stuff that made most of it obsolete from the point of view of the people who used to pay for it. In fact, a lot of the people who used to pay for it also don’t have those jobs now because instead they’re sharecropping for Amazon in an enclosed market that will eventually find them unnecessary.
I guess what I’m saying just boils down to: The cycles operate within something larger. The something larger has a direction. The direction is one I have come to generally fear and loathe.
The way people use “incorrect” here is really frustrating. What precisely about my statement does someone deem “incorrect” - the bit about Linux from Scratch being a laudable pursuit or the bit about being able to treat managed services as building blocks?
“not an apples to apples comparison” is putting it mildly. You’re comparing the cost of some infrastructure with the cost of an application-level specialist you’d need either way. The equipment to run a comparable database instance on-premises should cost less than half that. Ops costs are harder to compare, but $work certainly doesn’t retain an operations engineer per big RDBMS instance.
More generally, though, I think this is the way things are going without encouragement, and I hate it. It feels like the list of places where ops involves working with computers is shortening. I could offer all sorts of reasons why I think it’s a bad trend—it’s concentrating people, information and resources in a few huge companies, network effects stifle innovation, bespoke architectures are more efficient, your NOC is more likely to answer your calls when they work for you, blah blah. Mostly, though, it just makes me sad to see the things I enjoy becoming increasingly irrelevant except to a small group of companies I don’t want to work for.
TITANESQUE DISCLAIMER: I FIX USER OUTLOOK MAILBOXES FOR A LOCAL IT COMPANY
Yes, always cheaper at first. And then your tiny thing grows, and then you cannot live without your managed services, and your managed service knows it, and then it is not cheaper anymore.
This. Half of the jobs I had could be summed up as: oh shit, managed service X got out of control and its cost became unbearable.
In one instance, it ended up in the company firing 70% of its employees, because one morning, some GCP service costs had sky rocketed causing the main project of the company to fail. To be honest, had they made a simple extrapolation when they signed up for the service, the situation would be 100% expectable.
Of course it depends. S3 value offer is difficult to replicate as a self managed service, but things like hosted RDMS or even ec2 instances are really only an advantage when starting up and toying with tiny loads. Once your serviece start scaling, they quickly result in 1-3 orders of magnitude of extra costs. Which is unlikely irrelevant.
I beg to differ. Lots of places can get by without the application-level specialist. Until they can’t. Making that decision is of course is an art, but using a managed service will extend how big you can get without a FTE (maybe you will need some consulting for tuning a query, for example, but that’s cheaper).
I understand this perspective. I still think there are ops tasks and jobs, but yes, they are certainly changing. I do think that managed services lets people build a lot more software (the same way that blogs let a lot more content be created than hand written html) but appreciate that the effects aren’t entirely positive.
(I think it’s kinda relevant that you are a developer evangelist for one of these hosted services. Neither good nor bad, just an important datapoint.)
One thing that I think gets lost in this is that a company is theoretically valued also at the in-house talent. If your company is basically just a glorified rebundler and reseller of service providers, you lost the ability to leverage that in-house talent. You also by definition reduce the ability to distinguish yourself from other companies that are doing the same thing.
Second, there is a very real problem where people buy a service provider and then just patch over the issues with it using…another service provider! This has this weird inflection point where suddenly any scaling is a heart-stopping event as you graduate from the “developer wants to pad resume tier” to the “somebody in the C-suite needs to sign a contract tier”.
Thanks for pointing that out! I have actually been espousing this viewpoint for years. I also taught AWS certification courses and built a start-up in the last 5 years and both of those cemented my view about leveraging managed services.
But yes, where people sit definitely affects where they stand, and I am no different. (I will also submit that, in my opinion, my employer’s offering beats the pants off the hand rolled authentication systems I have seen over the years.)
That’s a good point. I think every company needs to decide what it is good at. That may be infrastructure for some companies, for others it might be enterprise sales, etc, etc. I have left companies where I knew my department was never going to be the focus of the company, because that limited my growth. My favorite old post about this topic: https://www.joelonsoftware.com/2001/10/14/in-defense-of-not-invented-here-syndrome/
From that post:
As you mention service provider sprawl is a real problem. Being disciplined about managed services, including when to leave them, is just as hard as being disciplined about maintaining your own code. Maybe even harder because the effort is lower for the managed service (which would argue against my thesis, as you say).
Same here! And I didn’t mean that as a gotcha, more of as a “this poster has probably seen the spectrum of this philosophy in practice and that is relevant.”
:)
You bring up a lot of good points, especially about bespoke architecture. I think my only push back would be that the landslide majority of the time I set up a database (or any other service), it’s with the same configuration and features as everyone else. It makes more economic efficiency to cater to the general case, both for the vendor and the customer (or for the DBA and the company). Inevitably there will be divergence especially as a company or product grows in complexity.
I enjoy working on these architecture as well, but I find setting up the same old database again and again tiresome. I’d rather defer that action to a vendor until there’s a real problem worth solving and dive deep there.
It’s not fading into irrelevance when more companies are given the ability to use database (or whatever) at scale. On the contrary, more companies using databases mean more relevance for the people that know how to use them well. The statistical majority can stay under the middle 50% of the use case covered well by the vendor and leave me to cover the outer 50% where the inside and outside tails have interesting use cases.
These complaints seem like general grievances with any kind of automation. More work is accomplished by fewer people, and those people are increasingly specialized. Bespoke widgets are replaced with cold, stamped-out widgets. Customer service decreases.
This isn’t to say your frustrations are invalid, only that they are normal and increasing as we automate away more and more skillsets. It’s a bummer to see your hard-won skills falling to a dramatically cheaper service.
Managed services frequently determine the shape of your application, the velocity of feature delivery, and, when gone deep in, it removes the capability to use sophisticated open source software.
This kind of smear-it-on-everything advice is not good. You need to analyze what your software does, where it will be, what the organization needs, and what it plans to need.
That, in the most blunt sense: depends. Wildly. EC2 is a good service. DynamoDB is deeply limited. GKE is not deeply configurable. Azure has some strange APIs.
Dollar for dollar, but that’s not even appropriate level analysis. You need to analyze what it does to the system under delivery: limitations, expansions, and integration work.
Same as above.
The advent of managed services and writing integrations/overlays has given a lot of people extremely well paid jobs. Thought should be put into this decision: second and third order effects will start to outweigh simplistic considerations like the above.
“My cloudformation changes are taking a week to be applied” -> iterate that problem enough and you’ve delayed releases by months, costing far more in reputation, income, morale, etc than that Staff Engineer you didn’t want to hire.
MASSIVE DISCLAIMER I WORK ON GOOGLE CLOUD
But, it has APIs, right? If you self-manage, do you want to spend engineering hours making your own API? I don’t think I would. I’d rather build my product.
It is your product! Every part of your product is your product. Every dependency of your product is your product.
This depends, heavily, on what that service adds and what that service API looks like. The worse the API is, the fewer cases it covers, the worse it is for me. The more commodity the tool is, the more comfortable I am in using it, because it implies I can (1) migrate (2) there’s competition vs providers. My “effectiveness” moat needs not to be a thin wrapper over an API, it needs to be much better.
Every third party thing I integrate demands hours of work, design review, etc. The rough complexity of integrations is
n!
, n being the number of integrations(technically each third party thing has its own complexity count).Our own code might be something like polynomially expensive to maintain. If you can keep your complexity count under the complexity count that I take in - both initially and over time (dev time, ops time, maintenance time), then you will win.
Note that I’ve personally been in an org which was so bought into a certain cloud provider that it had no ability to innovate via, e.g., using Apache projects, and all improvements came from the cloud provider… we’d essentially stalled out. New divisions actually wound up spinning their own devops orgs because we’d become so incapable at delivering new tools because “not supplied by cloud provider”.
What we’re actually talking about is called vertical integration in the general case. It can be enormously effective if you can swing it.
Let me ask you this question: how much does Google use from AWS? (Mind, I don’t think you can answer. ;) But Google is notorious for writing its own software).
edit: Note that what I’m saying is not “Don’t use cloud services/SaaS services”. It is: understand deeply what you’re doing and the ramifications of your choices, both technically and organizationally.
On one side, I strongly agree with this. I use GCP and DigitalOcean often to outsource what I do.
On the other hand, I’m watching an entire community of people put out fires because they built their IT on a managed service which Apple bought and effectively terminated yesterday, causing people to wake up to entire fleets of devices with broken policies.
Like everything else in tech, there’s no right answer, rather it’s a set of tradeoffs someone has to make.
MASSIVE DISCLAIMER I WORK ON GOOGLE CLOUD
I think there is definitely a difference between using AWS/Azure/GCP/AliCloud and a startup like Fleetsmith. I feel super sad for the people that got impacted, as that sunset is really bad (I know that GCP has a 1 year sunset for GA products). If you’re using say, GKE for your k8s clusters, you can be confident that’s not going away.
Yesterday I was trialing EKS (k8s) on AWS. I did not like the experience, I ended up abandoning the AWS native method for a third-party tool called
eksctl
and it still took ~30m to provision a 2 node cluster. I cannot begin to imagine how one would self host a k8s cluster.So yes, there are trade-offs, but I think there are definitely ways to mitigate them.
P.S. Given the Fleetsmith turn-off, one great service going away that would keep me up at night is PagerDuty; there really is no product that I know of that is anywhere near as good.
Is there, though? Should you only use a big provider (AWS/GCP/Azure) for your startup project? No DigitalOcean/Vultr? Those are both fairly large shops with a lot of startups on them, but they’re also not too big to fail. DigitalOcean is offering more managed services (databases and k8s), but if they ever declared bankruptcy, your startup would be scrambling for another service (and could find itself with a much higher bill on the big three).
I’d rather see more open-source management solutions for things like full redundancy management for Postgres or MySQL. What I’ve found is that most shops that have this kind of tooling keep it under lock and key, and it’s proprietary/specific to their setup.
I think managed services are bad due to cost and lock-in, and they’re also having the side effect of slowing innovation on better tooling that would let people self-host those same solutions.
Yes, the loss of DigitalOcean in particular would be a huge blow to the ecosystem. Their documentation in particular is fabulous.
I’m not sure I’d agree about lock-in, as long as you’re judicious where it’s a concern. Google Cloud SQL, for example, is just Postgres/MySQL with Google juju underneath to make it run on our infra; there’s nothing stopping you dumping your database at any time. The same goes for a service like Cloud Run, where you’re just deploying a Docker container; you can take that anywhere too. But if you go all in on GCP BigQuery, then yeah, you’re going to have a harder time finding somewhere to take a data dump of that.
I would say that the difference isn’t big provider vs startup but infrastructure-as-a-service vs software-as-a-service. Sure the major cloud providers have some software they offer as services but they all also have VMs that you can spin up and install whatever you want on. It’s not like you can install Fleetsmith on your own machines.
Disclaimer: I work on containers for AWS, but not directly on EKS
Just a note here that eksctl is the official and recommended tool for interacting with EKS. You can find it in the EKS user guide here.
Managed services == cloud, which leads to vendor lock-in, which leads to centralization of the internet.
Hard pass.
Own your infrastructure, own your data. If you care about your business you’ll do this. If you are cash strapped and can’t find or afford the talent, that’s a whole different cup of otters.
If you use something like managed PostgreSQL then I assume you’ll always have the option to dump your data and import it somewhere else, right?
Yes, with downtime, unless they also offer replication access (RDS does not).
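A minimal sketch of the dump-and-restore path, assuming a managed Postgres source and a self-hosted target (hostnames, database name, and credentials are placeholders):

```python
# Sketch: logical dump from a managed Postgres instance, restore into a self-hosted one.
# Runs with downtime: writes to the source should be stopped before the dump starts.
import os
import subprocess

src_env = dict(os.environ, PGPASSWORD="source-password")  # placeholder credential
subprocess.run(
    ["pg_dump", "--host=managed-db.example.com", "--username=app",
     "--format=custom", "--file=app.dump", "appdb"],
    check=True, env=src_env,
)

dst_env = dict(os.environ, PGPASSWORD="target-password")  # placeholder credential
subprocess.run(
    ["pg_restore", "--host=db1.internal.example.com", "--username=app",
     "--dbname=appdb", "--no-owner", "app.dump"],
    check=True, env=dst_env,
)
```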
Context: I contract to (mostly) small companies providing ops/tooling and dev services.
TLDR: If you don’t want to have to hire in-house ops people to manage your DB layer, I’d very much recommend a company (e.g. Percona or similar) to manage it for you, on infra you control.
Long rant-y version:
This is one of the key points that surprises clients when I talk to them about a multi-vendor solution.
They mostly understand fairly well the risk of relying on a single company, especially with how rube-goldbergy AWS is.
“Ok so we’ll just use <cloud 1> and <cloud 2>, right?”
I’ve literally never seen any first-party managed DB service (i.e. where the management isn’t provided by some third party, e.g. Percona) that will even acknowledge the existence of a managed instance provided by another company. And with the direction that AWS in particular is going, you’d be crazy to try it: replicating to/from their “patched” / “reimplemented” versions and a regular instance elsewhere? Sounds like you’re just begging for incompatibilities there.
At the very most, I’d vaguely agree (with the overall discussion, WRT databases) that some businesses might benefit from having a third party manage their DB setup across multiple vendors (i.e. just using the vendors to provide virtual machine instances of some form) that the business itself controls (i.e. you give DBs-R-Us access to your DB infra, rather than paying them to provide you with a hosted DB service). In this scenario the big “cloud” operators are likely worse options, as they’re very much keyed around using as much of their “as a service” stack as possible, and just renting dumb VMs from them 24x7 is ridiculously expensive compared to the competition.
Many of these managed services like S3 are black (or at least grey) boxes. The edge case SLAs can be unpredictable and when you experience issues like multi-second latencies, it’s impossible to debug.
Running your own service means you can strace, gdb, top or even add printf() statements to debug. The power is in your hands.
DISCLAIMER: I work for AWS Elastic Filesystem.
I’d argue that if you’re running strace or gdb on your database instance, then unless you’re breathing some VERY rarefied air and doing incredibly specific work where you’re using the database in a highly specialized way, this should be seen as an antipattern.
And, if you are in fact in that 10% who have a valid need to do this, you probably wouldn’t even consider managed services anyway.
They’re for people who want to treat infrastructure like LEGO. There are some valid reasons to hate this conceptual model, but I claim that the industry has produced enough counter-examples to at least put up a good fight.
I think this downplays significantly how broken computers are on a fundamental level.
It’s also not a nice sentiment to have when you’re the person who has the control and I, the user, do not.
There’s a simple fact that systems will fail in loud, quiet and interesting ways. When you rent a managed service it’s essentially the same thing as buying a support license from a vendor for a black box unit that sits in your datacenter.
Sometimes the support staff have little incentive to help, sometimes your unit will fail intermittently and it’s difficult to have support staff on hand during the issue, and, often, the support price will go up as the product gets older, meaning you have to rebuild your applications to use version 2 of whatever product it is.
My situation is very simple: I’m responsible.
I cannot outsource that responsibility, I can mitigate the risks, but if everything fails my company will come to me first; it was my choice to use a managed service and I have the responsibility if it fails.
Managed services are great though! In theory there are many operations staff working on maintaining them so that I don’t have to. But the flip side is that I get a cookie cutter variant of something with no control if it goes down, and developers are constantly pushing changes to the service without my knowledge which can impact things (for the better or worse).
You’re quite lucky to be working for the dominant cloud provider too by the way, because when you have issues your customers are not going to be beating down your door too much. “If Amazon goes down, half the internet goes down too, so we’re not too worried” is a sentiment I’ve heard developers mention often.
The Google Cloud and Azure guys don’t have this luxury; and you’ll note that it’s never the CTO of a >500 person org who is making this claim either.
I am incredibly lucky across a number of axes. I love working here. For me it’s an unbeatable combination - great challenges, great people, great culture. I am also incredibly lucky to be on a great team with a phenomenal manager.
I think there are two good points here.
For one, your DB daemon is just a process. It has a stack and a heap. There’s no reason to be intimidated about doing traditional debugging (gdb, strace) to identify latency issues.
Your other example about EFS is an even more accessible candidate for debugging. Let’s say I was experiencing 500ms write latencies on EFS. I would have to go… RTFM and hope there’s some 99th-percentile case I’m missing. On my own NFS server I would jump over there and strace and gdb the daemon to see where the time was spent.
I’m guessing both cases would waste my time just as much, but the second case would empower me with both the understanding of the problem and the tools to get to the solution (patching the daemon and sharing the patch upstream).
The only antipattern I see here is telling devs that they are not sophisticated enough to gdb a daemon. There’s no magic in software. These are deterministic artifacts with stacks and heaps, differing from “hello world” only in size, not in nature.
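As a rough sketch of what “see where the time was spent” looks like in practice, this attaches strace to a running daemon for 30 seconds and prints a per-syscall time summary; the process name “postgres” is only a placeholder, and attaching requires root or equivalent ptrace permissions:

```python
# Sketch: summarize where a running daemon spends its time in syscalls.
import subprocess

# Oldest process matching the name (e.g. the postmaster); "postgres" is a placeholder.
pid = subprocess.run(
    ["pgrep", "-o", "postgres"],
    capture_output=True, text=True, check=True,
).stdout.strip()

# -c tallies time, calls and errors per syscall; -f follows child threads.
# coreutils `timeout` sends SIGINT after 30 seconds so strace detaches cleanly
# and prints its summary table before exiting.
subprocess.run(["timeout", "--signal=INT", "30", "strace", "-c", "-f", "-p", pid])
```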
And just in case this sounds hypothetical, it’s not. I’ve had real-world experience on multi-million dollar products where the cloud DB just stopped, or disappeared, or dropped a table unexpectedly. Our only recourse was a phone call, which usually involved a support upgrade and a long queue waiting for the solution to be found. In one case this was a DB with a >$50k/mo license fee that just disappeared.
So the black-box issue on cloud is real. And cloud marketing is working to cover it up.
To be clear, I did not cite EFS as an example, explicitly because what we offer is a black box, and has to be given the proprietary nature of the service.
That’s a part I’d like to see improved. I’m also a big cloud fan and heavy user, but the move toward more closed services is a bad trend for developers.
With some investment in tracing & debugging, improvements can be made.
My only hard-line stance is against “managed is always better”: there are enormous costs to using a managed service.
Anyone who takes this stance is straight up ignorant.
There are all kinds of reasons why managed services might not make sense for a particular use case. Total control and debuggability is but one of them.
There are regulatory reasons, performance reasons, customizability reasons, and that’s just off the top of my head.
I think you two are in agreement though. Neither of you wants people to assume managed services are a silver bullet to every problem.
So is this advice for the run-of-the-mill cloud engineer who can’t be bothered to learn from the experience shared by those who created those managed services in the first place? Maybe my approach can be generalized, but I think the information contained in blog posts, whitepapers, and conference talks about the tech behind the managed services is enough to build in-house services in their image. Maybe it’s a bad idea? Probably. Obviously you pay for it in terms of time and effort, which can be equated to money and fewer features elsewhere.
What you gain, though, is control over your own destiny and lower infrastructure costs over time. Honestly it’s probably not the best trade-off, but I would rather handle things myself than send emails to engineers or “support specialists” who I can’t fully trust to care about my application or my customers. SLAs and reimbursements don’t matter much in that case. At some point the amount of redundancy that I implement to make sure I don’t get screwed by a service outage or shutdown one day (regardless of any promises they have made to me) is going to eclipse the amount of work I would need to implement the service itself. At that point it becomes a bad deal.
None of them are actually that hard. All of the building blocks exist.
Great way to explain the tradeoff. And different orgs can make different choices. My experience has primarily been at smaller companies where it didn’t make sense to build our own.
I mean, I guess you make the same choice using a commercial service like pagerduty vs building up your own scripts based on a solution like nagios (or something else, I haven’t been in that space for a while). You need to make strategic decisions about what is worth keeping in house and what isn’t.
Even if true (which I don’t grant, especially when you consider edge cases and ongoing maintenance) it becomes an opportunity cost choice. Do I build my own database backup automation system (which is not, for a typical business, a value add) or do I use a managed service (which introduces additional dependencies)?
Building services is hard… Let’s go shopping.
Whenever companies grow substantially, the comparison of the cost for managed vs running their own services always ends up with the managed service being more expensive. So managed services aren’t good for anything other than taking a shortcut when you have a small team, and it’s going to bite you in the ass when you grow later.
if you grow later.
Fixed that for you :)
If your organization plans to stay at its current level of growth forever, then maybe your point makes sense, but companies like that tend to either have plenty of time and effort to focus on improving these things (case in point: Craigslist) or they lay off people because their ship can sail with less maintenance now (the usual case).
With that said, you didn’t fix anything. I considered typing “if” when I wrote it, but it’s really a moot argument in practice. If you plan to grow then you should plan to grow. Otherwise, you plan to fail.
Just because you feel differently doesn’t mean you are fixing things. Saying things like “fixed that for you” instead of having an actual conversation is just rude behavior all around.
Not all companies are VC-backed startups. There’s nothing wrong with serving a small number of loyal customers indefinitely.
I already accounted for that edge case at the start, but I also explained why I believe it doesn’t really apply here. It’s possible that you replied while I was still editing since I made a few edits, so not sure. I apologize if that confused the situation.
Either way, in my personal opinion and experience, it’s not really applicable whether or not they plan to grow.
Ah, sorry, I was joking about the fact that often people plan to grow but don’t actually do so. I have certainly been part of such organizations.
And that if you plan for when you’re a 1000-person company while you’re still a 200-person company, especially when it comes to engineering-intensive efforts without a ton of value add, like commodity databases, you may never get to 1000.
In other words, managed services are technical debt?
In some cases, but I am saying that they are financial debt at a successful company. They are a shortcut to profitability, not a solution to longevity.
They do tend to become technical debt at scale, though, because they’re built to cover broad scopes and support every use case, and aren’t usually tuned to perform well for the specific use case of your product.
You can’t just move k8s deploys from one provider to another. K8s is incredibly complex, especially when you factor in the different ways kops, EKS, Rancher, and others set up masters/nodes in your hosting infrastructure. Running k8s on bare metal also presents problems where you have to set up an ingress/egress system, not to mention that the networking landscape changes how tools like Istio would need to be deployed.
In the best case, deployment YAML/JSON needs minimal changes (and really the generation of those should be automated so they can all be updated with minimal changes at once, although tools like Helm are the major tools used for that task and they’re pretty terrible). Realistically, migrating people from one k8s cluster to another is incredibly difficult. k8s was built for providers like GCP (and now AWS and others), and using it pretty much marries you to hosting providers.
Fair points. Perhaps I should have mentioned moving a mysql database from one provider to another, or switching out a standardized queueing system. I don’t have a ton of k8s experience, but from what I’ve read, there’s at least some promise of portability.
I feel like the OP is missing the nuance between “managed service that provides low-cost access to commodity cloud utility, like compute, network, or storage”, vs “managed service that locks you in to a proprietary API and guarantees your margin is eaten by the cloud vendor.”
In AWS, using S3 is the former (commodity storage at low cost), whereas using DynamoDB is the latter (non-commodity proprietary API that is expensive in all respects, and, at the end of the day, is just storage). That being just a pair of examples.
AWS “the good parts” involves EC2, S3, ELB, and perhaps also CloudFront, EMR, and Athena. It’s interesting that you can now tell “the good parts” by what has been copied or matched in Google Cloud, so they have GCE, GCS, Cloud LB, Cloud CDN, Dataproc, and BigQuery. These are truly thin wrappers over commodity compute, storage, and network infrastructure. And that’s also why the cloud vendors drop prices on these over time: they track the cost of the commodities as hardware and data centers advance, savings that accrue to you rather than to the cloud vendor’s margin, as happens with their “serverless” infra.
I also do agree that using RDS (or Google Cloud SQL) for a Postgres instance is usually the right move, but that’s more a convenience thing: it’s “just hosted Postgres” at the end of the day.
But once you start heading in the direction of Kinesis, DynamoDB, Cloudsearch, etc. you are coding directly against AWS, essentially treating AWS not as your web/app host, but as your standard library and development environment itself.
And that’s fine, but it also shouldn’t be a crime to consider Kafka instead of Kinesis, Cassandra instead of DynamoDB, Elasticsearch instead of Cloudsearch, etc. Not to mention using those supports open source rather than proprietary stacks. And the good news is, you can reuse your skills with Linux, EC2, and S3 in standing those systems up easily, even more so with the help of tools like Terraform and Ansible.
That’s a great nuance and comment. In my defense, I was limited in the number of words I had :), due to where I was submitting it (never got published, so I just posted to my blog). But you’re correct, there’s a big difference, both from the implementor and cloud vendor perspective, between what is proprietary and commodity. And you as a managed service user should consider that as part of your decision tree.
I think that BigQuery was released ahead of Athena, though (nit).
Yea, BigQuery came first and Athena was a fast-follow. I think there might be an argument, also, that PubSub and PubSub Lite are Google’s “S3/GCS equivalent for basic real-time messaging”. You can view PubSub as a managed service over “raw event/message-level networking” as a commodity, which is really only a small step up from raw inter-machine sockets as might be provided by ZeroMQ or Kafka.
I think AWS will likely imitate that product, or roll the “pull” messaging model into SQS, since right now the best options for this style of messaging on AWS are Kinesis (which isn’t 100% managed, due to the pay-per-shard model) and MSK (which is really just Kafka hosting).
Not… really. They’re operated by people just like us, they make mistakes, they make stupid decisions, and they’re driven by goals that are never in exact alignment with your own.
No it’s not. If it’s something core to your business, managed is going to be way more expensive.
Not really fair; one huge DB instance isn’t the right answer for anyone, and the biggest instance RDS offers is nowhere near the upper limit on growth. My medium-size employer has 40x that many cores running relational DBs. I’m sure we can find a couple of people to do care-and-feeding if it means reducing a $4M bill.
Or rather, they can focus on creating a layer that makes the third-party thing they were given resemble the thing they actually need, instead of building the actual thing. Then they hope and pray that the cloud thing keeps operating the same way in the future, because all of their work can be made worthless at any time.
This is the “build vs. buy” argument and the fundamentals haven’t changed. If you have a requirement that isn’t actually your core business that you don’t want to hire for, buy. If you’re a startup with no budget trying to prove out a concept, buy for everything except that tiny nugget. If it’s something that actually matters, build.
I work for AWS.
Rather than focusing purely on price, a more useful perspective is on opportunity cost. The cost of a choice isn’t $X, rather it is what benefits you could have derived from the next best alternative for that $X. Moreover we need to better define what level of abstraction we’re “managing”.
For example, I made a natural-language-processing-based method for choosing baby names. I did not use AWS SageMaker. Heresy! But SageMaker is incredibly limited and does not give me the flexibility I need in tuning the deep learning models that extract features and the gradient-boosted trees that use those features. At the same time, I run the whole lot on AWS Lambda and ECS. I chose not to worry about scaling, patching servers, and ensuring availability-zone fault tolerance, amongst other concerns.
So sometimes asking others to manage something lets you use your time better. Sometimes there aren’t powerful enough managed services at given levels of abstraction to serve your needs, and it’s worth the time and sweat and tears to educate yourself and maintain something custom.
AWS also uses its own managed services, including services not available to the public. It would be insane to re-invent everything from scratch. Imagine the wasted opportunities.
Just two cents.
A significant portion of my career has included what I think of as the “cycle of managed services.”
This has seriously happened at least three times that I’ve seen on a large scale, and several other times on a smaller scale (where they hit the first problems and back up and rework it early in the project).
Ultimately, I think a managed service is okay, if that managed service is essentially a copy of some decent open source (or even commercial) tool that someone is managing for you. Postgres? Sure, RDS is fine. Redis? Yeah, go ahead and use Elasticache or Azure Cache. More “niche” things, though, with proprietary APIs? They’re a real risk. You go down with the ship (boutique database startups) or you find yourself mired so deep in the particulars of a bad API that you can never escape (DynamoDB).
This is super refreshing to read. I think the trends toward self-hosting, Linux from Scratch, and designing your own processors, motherboards, and firmware are all fascinating and laudable pursuits, but there is value for many people in being able to treat infrastructure as composable building blocks.
I would say those are orthogonal issues.
In other words, people don’t self-host because they want something “noncomposable”. And open source stuff is usually more composable anyway, so I’m not sure what you’re getting at.
They self-host because they want control (and sometimes because of cost, which may or may not be an accurate assessment).
I’m getting at the value of helpful abstractions that allow people to build complex and interesting systems they would not otherwise be able to build.
Have you ever run an ElasticSearch cluster? It’s not easy and it’s REALLY easy to think you can and lose all your data.
A managed service not only encapsulates ease of installation but also best practice around operational safety, backups, etc.
Sure, that is a reasonable opinion… but I have trouble getting there from the original comment. Whether ElasticSearch is hosted or not doesn’t affect “composability”
You’re right that was a poor choice of words. I’ll try to think up the perfect turn of phrase.
I think it may come down to containerization or not. For better or worse, containers are more composable because the state is more isolated.
Hosted services are also isolated in that you can’t peek inside (again, for better or worse). But it seems like most people are self-hosting with containers these days. There’s a spectrum.
I agree composability is a desirable property, but I aim to achieve it without relying on too many cloud services. I guess one way to do that is to only use cloud services with commodity interfaces, e.g. web hosting, SQL, etc., rather than services of which there is only one, like BigQuery.
Containerization is one useful abstraction that can definitely help people achieve solutions quickly, but depending on the nature and operational characteristics of the system being containerized, that abstraction only gets you one step of the way towards solving your problem.
I’m sure there’s an ElasticSearch container, but will that help you understand how to operate it with 100% confidence when you can lose your entire data set if you unintentionally perform certain operations when you don’t have sufficient quorum?
I think that’s a pretty ridiculous take. Nobody is proposing you should build your own operating system, just that you can’t build a business by gluing other people’s products together.
I think your response is also a straw man. For most software businesses, the value is not in having their own high-availability DB setup or load balancer or any of the other things that the big IaaS providers provide, but in the application that they develop on top of those services.
This is the straw man. :)
Nobody is saying make your own load balancer, but RUN your own load balancer.
But even just running your own load balancer, replicated DB, etc. has a cost that can often be better spent focusing on the application itself.
I was using an extreme case to illustrate a point.
Wow the negativity bias around this topic is stunning :)
The major providers are / are bound up in entities that exercise power on the scale of nation-states, and an unfathomable number of lives are affected by how their eating of the entire economic possibility space plays out. Meanwhile, it slowly begins to dawn on much of the heretofore quite privileged technical class that their lives are just as contingent and their livelihoods will not be spared once the machinery they’ve helped build finishes solving for the problem of their necessity.
So, like, of course there are negative feelings.
You’ve invented quite the end game for yourself there!
Will there be job reductions and resultant economic pain as a result of the move away from the data center and towards more cloud/managed services? Hell yes there will be!
But those who want to work in this business will find ways to adapt and continue to add value.
I’ve been working in technology for ~32 years. This doesn’t make me magical or omniscient or even smart but what it does make me is a witness to several waves of seismic change.
I came into the job market in the early 90s when DEC was imploding, PCs were ascendant, and the large expensive workstations vendors were dead companies walking but didn’t realize it yet.
Everyone thought the world would end and we’d all be out of work too.
Will it be hard? Yes. Will there be a lot of people who can’t or don’t want to adapt and will be left behind? Undoubtedly. Is that bad? Yes. However it’s the way our industry works and has since its inception.
We are all turning the crank that drives the flywheel of increased automation, whether we administer our own database clusters or not. The move to managed and cloud definitely represents a notable pinch point, and we’ll see how painful the transition will be, but it’s one paradigm shift in an industry that creates paradigm shifts for a living.
I’ve actually thought for a while that in the longer term, as compute at scale becomes quite a bit smaller and even cheaper, we could see a move back away from cloud because when you can host your company’s compute cluster in a cube the size of a printer and we have algorithms that can encode and enact services at scale in an operationally safe way, the value cloud adds will dwindle.
I love my job, and I love this industry, and plan to continue playing in this pool until I die, whether someone’s willing to pay me for it or not.
I assert that the constant negativity many of us exhibit is neither necessary nor desirable and represents a kind of immaturity that we will eventually grow out of as our industry continues to mainstream.
We’ll see.
I’d gently suggest this is sort of an admission that it’s not an endgame I’ve invented.
I’ve had similar thoughts, at various points along the winding path to where we are now, but I think in practice that it’s not just where the hardware lives but who owns it. And in practice the amount that the hardware’s owners are the hardware’s users instead of the companies that control the ecosystem is perpetually decreasing. Hobby-level undertakings of various sorts are vastly easier than they were a few decades ago, and an amazing range of tech is available to small-scale operators, but somehow none of this seems to be decreasing the net concentration of power.
I would in many ways prefer to see our industry destroyed and the ashes thoroughly salted, so I suppose it is safe to say that we fall on very different sides of this question. :)
Forgive me but I don’t understand! I’m saying this ISN’T an endgame! Merely part of a larger cycle that we are in the throes of and have been forever.
Sorry if I’m missing your point.
Sorry, I think I should have been more clear. I’ve encountered the view that things are cyclical among a lot of the long-timer technical people I know, and I recognize its validity. I share it to a degree. I haven’t been in the game as long as you have (or most probably with anything like the depth of experience), but anyone who’s lived through a handful of major technical paradigm shifts along whatever axis should probably be able to recognize cycles in the system. All the stuff that was considered The Future when I was getting my start has been The Dead Past for long enough now that some of it’s coming back around in different costume for the 2nd or 3rd time.
Two things, though:
For an individual, the endgame is that the cycle will eventually dispose of you, your hard-won understandings, and the things that you have built, whether it takes one iteration or a dozen. To a first approximation, it will do this with no regard for the value of any person or thing. On one level that’s just the nature of life, but the technical economy has by now overdriven this process to an absurdly pathological extent. It burns people out and churns their work into dust at a rate and on a scale that is frankly anti-human, and it’s a reality that means ruthless mercenary calculation and a deep mistrust of the systems and cultures and communities we work within are more or less requirements for paying the rent.
Once upon a time, I made a just-barely-workable and fairly satisfying living building some desktops, fixing some printers, running a mailserver, and hacking some e-commerce code. That job, that class of job, hasn’t utterly vanished, but it’s an endangered species. I probably wouldn’t know where to find one now if I needed it. I’ve been lucky and connected enough to migrate to other tiers of abstraction in other corners of things. But every thousand of that job that went away wasn’t replaced by a thousand jobs like I have now, and it definitely wasn’t replaced by a thousand jobs working on the stuff that made most of it obsolete from the point of view of the people who used to pay for it. In fact, a lot of the people who used to pay for it also don’t have those jobs now because instead they’re sharecropping for Amazon in an enclosed market that will eventually find them unnecessary.
I guess what I’m saying just boils down to: The cycles operate within something larger. The something larger has a direction. The direction is one I have come to generally fear and loathe.
There are still loads of generalist sysadmin/IT department jobs for places that aren’t tech-focused.
The way people use “incorrect” here is really frustrating. What precisely about my statement does someone deem “incorrect”: the bit about Linux from Scratch being a laudable pursuit, or the bit about being able to treat managed services as building blocks?
It doesn’t say “I disagree”, it says “incorrect”.