Just for the sake of argument: YAGNI boils down to “prefer simplicity to complexity”. RDBMS and SQL are very complex. If I’m following YAGNI, why should I use an RDBMS instead of something much simpler, like Mongo?
(I’m strongly in favor of SQL over NoSQL in 99% of cases; I’m just curious how other lobsters answer the puzzle.)
Two acronyms that trigger me immensely, after seeing a lot of devs abuse them, are YAGNI and DRY, usually because they are parroted back blindly by people who aren’t thinking holistically about their systems and the people building and maintaining those systems. For DRY, as an example, a bunch of copy-pasted config scripts or boilerplate can actually be a lot easier to troubleshoot and maintain than a byzantine architecture designed to abstract things away so people can skip writing `var window = new Window(0,0,200,200); var window2 = new Window(100,100,200,200);`.
More to your point, with YAGNI, the answer for me is that yeah, starting out it’s honestly faster to use a memory store (say, `var sessions = Object.create(null)`) instead of even Mongo! If you need persistence quickly, use property files or JSON blobs flushed to disk periodically.
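A minimal sketch of the “memory store first, JSON blob on disk later” approach, assuming Node.js. The helper names (setSession, getSession, flushSessions, loadSessions), the file location, and the five-second interval are all hypothetical, chosen only for illustration.

```javascript
const fs = require("fs");
const path = require("path");
const os = require("os");

// In-memory session store: a prototype-less object, as in the comment above.
const sessions = Object.create(null);

function setSession(id, data) {
  sessions[id] = data;
}

function getSession(id) {
  return sessions[id];
}

// Write the whole store to disk as one JSON blob. Writing a temp file and
// renaming it over the target means a crash mid-write leaves the previous
// snapshot in place instead of a truncated file.
function flushSessions(file) {
  const tmp = file + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(sessions));
  fs.renameSync(tmp, file);
}

// Reload the blob at startup, if one exists.
function loadSessions(file) {
  if (!fs.existsSync(file)) return;
  const saved = JSON.parse(fs.readFileSync(file, "utf8"));
  Object.assign(sessions, saved);
}

// Hypothetical location and interval; tune to taste.
const SESSIONS_FILE = path.join(os.tmpdir(), "sessions.json");
const flushTimer = setInterval(() => flushSessions(SESSIONS_FILE), 5000);
// unref() so the timer alone does not keep the process alive.
flushTimer.unref();
```

Anything written since the last flush is still lost on a crash; that is the trade-off you accept for not running a database yet.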
But, and this is where people usually screw up, you use your experience to inform what you’re going to need. Things that every business needs within the first few months of development:
Monitoring, even just a simple heartbeat route that returns 200
Sending emails
Collecting user emails
Authenticating (not authorizing!) users
Metrics on pageviews to show traffic and conversion
Querying relationships between business domain entities
Logging for when things blow up
Persisting user data to disk
Decades of work have shown that there are no special snowflakes in this regard!
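The “heartbeat 200 route” from the list can be as small as the following sketch, assuming Node’s built-in http module. The /health path and the start helper are hypothetical names.

```javascript
const http = require("http");

// Minimal heartbeat: GET /health answers 200 with a tiny body so an external
// uptime checker has something to poll. Everything else gets a 404.
function handler(req, res) {
  if (req.method === "GET" && req.url === "/health") {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
    return;
  }
  res.writeHead(404, { "Content-Type": "text/plain" });
  res.end("not found");
}

// Hypothetical entry point; listen() returns the server so it can be closed.
function start(port) {
  return http.createServer(handler).listen(port);
}
```

Calling start(8080) would serve it; keeping the handler separate from listen() makes it easy to exercise without opening a port.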
And yet, claiming YAGNI, a lot of places pretend that those things are not a concern right now and never will be, and end up doing really heinous shit that even a moment of reflection would’ve prevented. An example would be building an e-commerce site (one of the literal academic exercises for SQL) on a store like Mongo.
Like, yes, right now there is no need to do a rollup of quarterly sales by product line and vendor, but that is something we know you’re going to want as soon as you figure out that such a thing exists. But if people have been strictly lean-startup YAGNI the whole time, you’re probably going to find that the way forward is to retroactively bolt some hideous schema and relational model onto the application layer and hope that gives you real numbers.
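To make the “bolt it onto the application layer” cost concrete, here is a sketch of the quarterly rollup done by hand over order documents, assuming a hypothetical document shape (placedAt, productLine, vendor, amountCents). In an RDBMS this is roughly one GROUP BY query.

```javascript
// Calendar quarter label for a date, e.g. "2017-Q3" (computed in UTC).
function quarterOf(date) {
  const q = Math.floor(date.getUTCMonth() / 3) + 1;
  return `${date.getUTCFullYear()}-Q${q}`;
}

// Hand-rolled rollup; roughly what
//   SELECT quarter, product_line, vendor, SUM(amount_cents)
//   FROM orders GROUP BY quarter, product_line, vendor
// gives you in a single statement.
function quarterlyRollup(orders) {
  const totals = new Map();
  for (const o of orders) {
    // Documents written by older code may lack fields; skip rather than guess.
    if (!o.placedAt || !o.productLine || !o.vendor || typeof o.amountCents !== "number") {
      continue;
    }
    // JSON-encoded tuple as the map key, so field values cannot collide.
    const key = JSON.stringify([quarterOf(new Date(o.placedAt)), o.productLine, o.vendor]);
    totals.set(key, (totals.get(key) || 0) + o.amountCents);
  }
  return totals;
}
```

Note the skipped documents: every record an older code path wrote without one of those fields silently drops out of the totals, which is exactly the “hope that gives you real numbers” problem.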
Similar things that people cry YAGNI on:
“We don’t need transactions for our database yet.”
“We don’t need more than one prod server yet.”
“We aren’t going to need site analytics.”
“We aren’t going to need an HTTP API.”
“We don’t need linters and code climate stuff yet.”
One of the signs of seniority, in my opinion, is an engineer who recognizes when the business is, in fact, going to need it, and in all other cases aggressively fakes it in a way that doesn’t hamper later fixes.
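One way to “fake it” without hampering later fixes, sketched for the email item from the list above: keep the call signature a real mailer would need, and make the fake transport the only thing that changes later. sendEmail and outbox are hypothetical names, and no real mail provider API is implied.

```javascript
// Fake transport: messages are kept in memory and logged, never delivered.
const outbox = [];

// Callers only ever see sendEmail({ to, subject, body }), so swapping in a
// real provider later touches this one function, not every call site.
function sendEmail({ to, subject, body }) {
  // Validate the same way a real transport would, so bad calls fail now.
  if (!to || !subject) {
    throw new Error("sendEmail: 'to' and 'subject' are required");
  }
  const message = { to, subject, body: body || "", queuedAt: new Date().toISOString() };
  outbox.push(message);
  console.log(`[fake-mail] to=${to} subject=${JSON.stringify(subject)}`);
  return message;
}
```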
There’s also a tendency to mistake “I don’t know it” for “it’s too complex,” when other people who you can hire are more likely to know it than the “simpler alternative.” Relational databases are the best example: I’ve never seen anyone argue against them who was comfortable with them. If you base your system on the Next New Thing, how likely is it that it will still be around in forty years, like SQL? Or that you’ll be able to hire someone to help you with it?
The complexity of an RDBMS is isolated in a single unit that was thoroughly tested and presents a simple API to the programmer. The complexity of a key-value store is mostly in the programmer having to maintain a schema outside the database and having to deal with new and exciting bugs that usually end in the kind of data loss that would make MySQL look sane.
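A sketch of what “maintaining a schema outside the database” looks like in practice, assuming a hypothetical user document with id, email and createdAt fields. Every code path that writes users has to remember to call this; a database-enforced schema (column types, NOT NULL, CHECK constraints) applies to every writer automatically.

```javascript
// Application-side "schema": returns a list of problems, empty if valid.
function validateUser(doc) {
  const errors = [];
  if (typeof doc.id !== "string" || doc.id.length === 0) {
    errors.push("id must be a non-empty string");
  }
  if (typeof doc.email !== "string" || !doc.email.includes("@")) {
    errors.push("email must contain '@'");
  }
  // Optional field, but if present it must parse as a date.
  if (doc.createdAt !== undefined && Number.isNaN(Date.parse(doc.createdAt))) {
    errors.push("createdAt must be a date string");
  }
  return errors;
}
```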
“Simpler internally” is not the same as “simple to deal with.” The latter is more relevant.
I’d ask two questions: 1) what needs are we most likely to have in the future? and 2) how much pain will we have if we’re wrong?
For instance, you may need high scalability. You also may need relational integrity.
Which one are you more likely to need? I’d guess “relational integrity”, as every system I’ve ever seen has had at least some relational data. (Even loosely-structured document data needs to belong to a specific user.)
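The “data needs to belong to a specific user” point, sketched: without a foreign key, the usual substitute is a periodic sweep for orphaned documents. findOrphans and the userId field are hypothetical.

```javascript
// Return every document whose userId does not match an existing user,
// including documents that have no userId at all.
function findOrphans(users, docs) {
  const userIds = new Set(users.map((u) => u.id));
  return docs.filter((d) => !userIds.has(d.userId));
}
```

The sweep only finds the damage after the fact; a FOREIGN KEY constraint would have rejected the write in the first place.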
Which one is harder to bolt on later? If you pick an RDBMS and need to scale it, indexes, caching, sharding and clustering are all things that can help. If you pick a NoSQL database and need to add relational integrity and transactions… you’re basically sunk.
Which problem hurts more to have? If you have scaling problems (and your business model is sane) you have proportionally large revenue and can afford to work on scaling. If you have data integrity problems, they may be costing you the only customers you have.
It depends, doesn’t it? If you’ve already got an RDBMS humming along, adding a second type of database is definitely more complex. If you don’t, I can see a case to be made for grabbing MongoDB, but throwing random schemaless JSON documents around can cause headaches unless you add complexity in the form of discipline, coordination of changes, monitoring, etc. Some of which you may already have, further complicating the analysis. Either way, it’s easiest to work with the grain of your architecture, which fits the spirit of YAGNI.
Looking at YAGNI in particular, does the term itself ever get invoked in discussion as something more than a tool to shut down conversation?
I think people misunderstand YAGNI. As an engineering principle the idea is that, when you find yourself asking “Hmm, should I do X or build Y now, because someone might want it”, then the answer should be No. It arose in opposition to the Java Factory Factory Factory Overapplication pattern of adding extra injection points “just in case” someone wanted to introduce a different kind of FooBean down the road, which usually never happened, leaving you with a lot of extra complexity to read through for zero real-world gain.
It doesn’t really apply to questions like “Do I want X or Y?”, in my opinion. It’s purely a heuristic for rejecting “just in case” work you don’t actually have a concrete use case for.
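A toy contrast for the “just in case” indirection described above, written in JavaScript rather than Java; the class names are illustrative, not taken from any real codebase.

```javascript
class FooBean {
  constructor(name) {
    this.name = name;
  }
}

// The speculative version: an injection point for alternative FooBeans
// that nobody has asked for yet.
class FooBeanFactory {
  create(name) {
    return new FooBean(name);
  }
}

class FooBeanFactoryProvider {
  constructor(factory = new FooBeanFactory()) {
    this.factory = factory;
  }
  getFactory() {
    return this.factory;
  }
}

const viaFactory = new FooBeanFactoryProvider().getFactory().create("a");

// The YAGNI version: one kind of FooBean exists today, so construct it.
// If a second kind ever shows up, introduce the seam then.
const direct = new FooBean("a");
```

Both produce the same object; the first just gives every future reader two extra classes to trace through.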
(In the case of RDBMS vs Mongo, I propose a different heuristic: NENM. Nobody Ever Needs Mongo).
I’m starting to notice a similar trend with software deployments these days: small companies with, like, 5 customers and a CRUD app using clusters running schedulers orchestrating containers and whatnot.
I feel a lot of such software is a response to problems that you face when you’re running at Google scale. So taking the same technology and applying it to a much smaller application is just going to add more complexity.
Schedulers also offer features that everyone needs, regardless of scale: deployment and lifecycle of your application.
It’s true that a Capistrano/Fabric/Ansible script could work for some of the use cases, but a battle-tested deployment tool is valuable for small-scale projects as well.
Sorry, I don’t understand what “lifecycle of your application” means. Could you elaborate on that?
And Capistrano/Fabric/Ansible are battle-tested as well, no?
By lifecycle I meant: canary deploys, rollbacks, blue-green deployments and similar tasks.
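A toy model of the blue-green part of “lifecycle”, assuming two identical pools behind a single router pointer. This is not how Kubernetes or any other scheduler implements it; it only shows why rollback is cheap when the previous version is still running. All names are hypothetical.

```javascript
// Two pools, one live pointer. Deploying loads the idle pool and flips the
// pointer; rolling back flips it back, since the old version never stopped.
class BlueGreenRouter {
  constructor(initialVersion) {
    this.pools = { blue: initialVersion, green: null };
    this.live = "blue";
  }

  idle() {
    return this.live === "blue" ? "green" : "blue";
  }

  // Returns true if traffic was switched to the new version.
  deploy(version, healthCheck) {
    const target = this.idle();
    this.pools[target] = version;
    // Only switch traffic if the freshly loaded pool passes its check.
    if (!healthCheck(version)) return false;
    this.live = target;
    return true;
  }

  rollback() {
    const previous = this.idle();
    if (this.pools[previous] === null) {
      throw new Error("nothing to roll back to");
    }
    this.live = previous;
  }

  liveVersion() {
    return this.pools[this.live];
  }
}
```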
In my admittedly limited experience with Fabric/Capistrano, they don’t seem to be able to reason about the entire state of your server pool. What happens if Fabric fails to run the task on 10% of servers? How about 50%? How do you handle auto-scaling? Do servers self-provision with Puppet/Chef? Do you bake AMIs?
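The “what if it fails on 10% of servers?” question, sketched as a rolling deploy with a failure budget. deployTo stands in for whatever per-host step a script would run, and batchSize/maxFailureRate are hypothetical knobs, only loosely in the spirit of what cluster managers expose for rolling updates.

```javascript
// Deploy in batches; stop as soon as the failure rate among hosts attempted
// so far exceeds the budget, and report which hosts need rolling back.
function rollingDeploy(hosts, deployTo, { batchSize = 2, maxFailureRate = 0.1 } = {}) {
  const succeeded = [];
  const failed = [];
  for (let i = 0; i < hosts.length; i += batchSize) {
    for (const host of hosts.slice(i, i + batchSize)) {
      if (deployTo(host)) succeeded.push(host);
      else failed.push(host);
    }
    const attempted = succeeded.length + failed.length;
    if (failed.length / attempted > maxFailureRate) {
      // Abort: hosts already on the new version are candidates for rollback.
      return { ok: false, succeeded, failed, toRollBack: succeeded };
    }
  }
  // Finished within budget; any failed hosts still need attention.
  return { ok: true, succeeded, failed, toRollBack: [] };
}
```

Because the rate is computed over hosts attempted so far, a single failure in an early batch aborts the rollout; that is deliberately conservative, closer to a canary than a best-effort push.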
Most of these potential issues seem to be handled quite well by cluster management tools like k8s, Docker Swarm, Nomad and others.
There’s a real upfront complexity cost to get started with them, but they also simplify many other tasks.
Agree about Fabric/Capistrano. They don’t scale that well.
Ansible is in a different bucket though. I have more experience with SaltStack, and I like that I can reason about all my servers just by looking at the code. And all the problems you mentioned are easily solvable with SaltStack.
Provisioning is again a different domain. You could provision servers with Salt, but I prefer using something like Terraform for that purpose.
I get your point though. I’ll have a look at how k8s is handling these problems.
I also use Terraform to provision the k8s cluster nodes, but after that I haven’t found the need for traditional provisioning tools like Chef/Puppet. It’s all k8s pods and services.