If you read the article linked at the bottom, you'll find that this number only holds up for Postgres. It's a cool hack, but this is all just working around deficiencies in the query planners of various databases, so its effectiveness will depend on the planner in question.
Do programmers really not care about time complexity anymore? Obviously there are lots of instances where it doesn’t matter, but it still seems like a useful concept to understand.
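A toy illustration of the kind of thing I mean (the numbers and data structures are my own example, not from the article): the same membership test is O(n) against a list but O(1) on average against a set, and the gap is very visible even at modest sizes.

```python
import timeit

# Build a list and a set with the same 100,000 elements.
n = 100_000
as_list = list(range(n))
as_set = set(as_list)

# Looking up the worst-case element scans the whole list...
list_time = timeit.timeit(lambda: (n - 1) in as_list, number=100)
# ...but hashes straight to it in the set.
set_time = timeit.timeit(lambda: (n - 1) in as_set, number=100)

assert set_time < list_time  # typically by orders of magnitude
```

Nothing about the code looks slow at a glance; only the complexity analysis tells you one of these lookups won't scale.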
That’s usually the point where I see the become-a-developer-in-two-weeks folks fall down. And why I worry very little about becoming unemployable. :)
I think it depends greatly on your domain.
In process engineering, for example, we no longer have to care about it. The embedded processors we have now are so powerful they can take even the most poorly coded set of calculations and whip through them without lagging, whereas back in the 80s and 90s this was a very real concern. And the overall complexity of our applications is decreasing due to “smart” instrumentation that handles a lot of the annoying calculations for us.
I have no idea what sort of emphasis this gets in modern CS courses, as I’ve been out of college for several decades now.
As I’ve mentioned before, my last (hobby) project was on a 6502 processor, so time was not just on my mind, but to partially quote a Star Trek movie “the fire in which I burned.”
Is the concern that programmers today don’t understand COBOL or that they don’t understand the business logic that these COBOL programs encode? My understanding of the COBOL gap problem isn’t so much that COBOL programmers are retiring without replacement COBOL programmers, but that these programmers are the only ones who understand the business logic in these programs and this knowledge isn’t being transferred to the younger generation.
Preventing stale reads requires coupling the read process to the oplog replication state machine in some way, which will probably be subtle–but Consul and etcd managed to do it in a few months, so the problem’s not insurmountable!
Consul and etcd implement Raft which is a proven consensus algorithm. Part of the issue with MongoDB, based on my following it from a distance, is they seem to be building their own distributed systems algorithms and they don’t appear to have the talent to accomplish it.
And remember: always use the Majority write concern, unless your data is structured as a CRDT! If you don’t, you’re looking at lost updates, and that’s way worse than any of the anomalies we’ve discussed here!
Does this actually help you in MongoDB? I am under the impression MongoDB does not support CRDTs so it will simply drop writes, as shown in the analysis.
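For what majority acknowledgement is supposed to buy you, here's a toy model (deliberately not MongoDB's actual API or implementation): any two majorities of a replica set intersect, so a majority-acked write is held by at least one member of whichever majority elects the next primary, while a w=1 write has no such guarantee.

```python
from itertools import combinations

# A hypothetical five-member replica set, identified by node number.
REPLICA_SET = {0, 1, 2, 3, 4}

def is_majority(members, total=len(REPLICA_SET)):
    return len(members) > total // 2

def survives_failover(acked_by, electing_majority):
    # The write survives iff some member of the electing majority has it.
    return bool(acked_by & electing_majority)

# A w=1 write acked only by the old primary (node 0) can vanish when
# nodes {1, 2, 3} elect a new primary among themselves.
assert not survives_failover({0}, {1, 2, 3})

# A majority-acked write intersects every possible electing majority.
majority_acked = {0, 1, 2}
assert is_majority(majority_acked)
for m in combinations(REPLICA_SET, 3):
    assert survives_failover(majority_acked, set(m))
```

Of course, this only models the acknowledgement arithmetic; whether a given database's failover actually preserves such writes is exactly what the analysis tests.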
On an emotional note, it’s so distressing reading about MongoDB. It would be one thing if they were just reimplementing the last 30 years of database technology because of NIH. But they are reimplementing it wrong. Yet it is massively popular. These are the things that make me depressed about the software industry and want to move to a farm.
+1 for wanting to move to a farm. And not just because some people are doing stupid things, but because well-seasoned developers are being ignored. It’s no longer that people just want to do the right thing, it’s that people want to be doing something so long as there is motion! More lines of code, more bug trackers, more issues fixed, more complexity, more features… fewer problems solved.
Consul and etcd implement Raft which is a proven consensus algorithm.
There really aren’t any proven consensus algorithms running in the wild – and absent a system which mechanically and correctly translates proofs into code, there probably won’t be. Etcd and Zookeeper both have had consistency bugs, despite their theoretical backgrounds, owing to implementation errors. The often forgotten part of every distributed system is the ability of each and every human implementor of code in the critical path to have fully understood all of the possible failure cases at the time they wrote the code.
CRDTs can be application-resolved so every database ‘supports’ CRDTs, with smart enough applications. The implementation is often profoundly unpretty, though.
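As a sketch of what application-resolved means, here's a minimal grow-only counter (G-Counter) CRDT entirely in application code; the class and names are illustrative, not any database's API.

```python
class GCounter:
    """Grow-only counter: each node tracks its own increment count."""

    def __init__(self):
        self.counts = {}  # node id -> increments seen at that node

    def increment(self, node, n=1):
        self.counts[node] = self.counts.get(node, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Pointwise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order.
        merged = GCounter()
        for node in set(self.counts) | set(other.counts):
            merged.counts[node] = max(self.counts.get(node, 0),
                                      other.counts.get(node, 0))
        return merged

# Two replicas diverge, then merge to the same value either way.
a, b = GCounter(), GCounter()
a.increment("a"); a.increment("a")
b.increment("b")
assert a.merge(b).value() == b.merge(a).value() == 3
```

The "profoundly unpretty" part is everything around this: the application has to be handed both divergent values in the first place, which is where the thread below picks up.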
There really aren’t any proven consensus algorithms running in the wild – and absent a system which mechanically and correctly translates proofs into code, there probably won’t be.
This is exactly why I said the algorithm is proven, not the implementations. MongoDB is not even running a theoretically proven algorithm; it appears to be a patchwork of attempts to get something working.
CRDTs can be application-resolved so every database ‘supports’ CRDTs, with smart enough applications
How is this possible if the database drops your writes?
Sorry, I thought you were using the argument from authority, and wanted to highlight the difference.
CRDTs don’t have anything to do with consistency in the face of failed writes; they’re merely a technique for resolving differences between two apparently correct values with data structures.
CRDTs don’t have anything to do with consistency in the face of failed writes; they’re merely a technique for resolving differences between two apparently correct values with data structures.
If I’m dropping writes then I’ve lost the other value, which is the problem.
That’s an orthogonal problem. Every database can drop writes given a severe enough partition; that doesn’t stop some from having CRDT implementations.
Your statement was:
CRDTs can be application-resolved so every database ‘supports’ CRDTs, with smart enough applications.
The application cannot resolve anything if it does not have all of the writes because they have been dropped by the database.
So no, it is not an orthogonal problem.
Every database can drop writes given sufficient partition
Dropping writes doesn’t have to do with partitions; it’s about accepting a write and then throwing it away.
I feel like you’re wilfully ignoring the causal arrow in my statements, so I’m ending the conversation. Good luck!
I am sorry you feel that way, you could simply explain how a CRDT helps when the database is discarding writes.
CRDTs don’t have anything to do with consistency in the face of failed writes; they’re merely a technique for resolving differences between two apparently correct values with data structures.
They are data structures that consistently resolve causally parallel modifications into a single successor. If your DB doesn’t natively support them or expose the conflicts in any way, you cannot apply that technique.
For example, if you have a value A in the DB and it then accepts two writes with that single ancestor, let’s call them A' and A'', and internally resolves the conflict to either of them, where do you apply the CRDT merging logic?
CRDTs can be application-resolved so every database ‘supports’ CRDTs, with smart enough applications. The implementation is often profoundly unpretty, though.
You can only use them if the DB offers some sort of control for conflict merging, right? If it drops the conflicts on the floor or just resolves to an arbitrary write CRDTs would not help you in any way.
That’s what CRDTs are: a conflict resolution mechanism. You can implement CRDTs with dumb pencil and paper, if you like.
But you need the conflicts in order to resolve them, which you don’t have if the database throws them out.
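The point of contention can be made concrete with two toy stores (both hypothetical models, not real database APIs): one exposes conflicting siblings so an application-level merge is possible, the other silently keeps an arbitrary winner, and no merge function can recover what it dropped.

```python
class SiblingStore:
    """Keeps every concurrent write; the application resolves conflicts."""

    def __init__(self):
        self.siblings = []

    def write(self, value):
        self.siblings.append(value)  # both A' and A'' survive

    def read(self, merge):
        return merge(self.siblings)


class ArbitraryWinnerStore:
    """Resolves conflicts internally by keeping only the last write."""

    def __init__(self):
        self.value = None

    def write(self, value):
        self.value = value  # the earlier concurrent write is gone

    def read(self, merge):
        return merge([self.value])


def union(values):
    # Set union as a trivial merge function (a G-Set style CRDT merge).
    return set().union(*values)


s, w = SiblingStore(), ArbitraryWinnerStore()
for store in (s, w):
    store.write({"alice"})  # concurrent write A'
    store.write({"bob"})    # concurrent write A''

assert s.read(union) == {"alice", "bob"}  # merge saw both writes
assert w.read(union) == {"bob"}           # "alice" was silently dropped
```

With the second store, the merge function is correct but useless: the conflict it would have resolved never reaches the application.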
For some reason I keep NOT using the MAS; it feels like it’s easier to just download a dmg and drag the app to ~/Applications… The same applies to all my non-developer friends who use a Mac, AFAIK.
I prefer the MAS versions because they update my apps when I’m not using them rather than popping up an annoying dialog when I start the app.
I always disable auto-updates so I can read about what is in each one… but I am surely not a “typical” user in that case. :)
I guess I prefer updates to be noisy. I don’t like things to silently change (which might mean “break”).
I’m right there with you for anything remotely production – my desktop just doesn’t qualify in my case.
While I really like this on my iPhone… I agree with jryans here… I prefer my desktop software to nag me to update so I can make a conscious decision about it. Same reason applies: I don’t want it to break!
For the few apps I do use via the MAS: auto-update like there’s no tomorrow ;-)
I think one lesson is that Glacier is only for backups. It sounds like he was storing the originals? Using a cheap USB drive for local storage and Glacier for emergency recovery would have been a good compromise.
This is definitely true. I also would’ve assumed that the Amazon Glacier backup was /dev/null until I had successfully restored from it. I was completely blown away that he not only hadn’t tested the restore, he didn’t even know what tool he was going to use.
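The minimum viable restore test is tiny: back a file up, restore it to a different path, and compare checksums. The copy steps below stand in for whatever tool actually ships bytes to and from Glacier; the point is that the comparison happens before you ever need the backup.

```python
import hashlib
import os
import shutil
import tempfile

def sha256(path):
    """Checksum a file so original and restored copies can be compared."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "photo.raw")
backup = os.path.join(workdir, "backup.raw")
restored = os.path.join(workdir, "restored.raw")

# Create some data worth keeping.
with open(original, "wb") as f:
    f.write(os.urandom(1024))

shutil.copy(original, backup)    # stands in for the upload
shutil.copy(backup, restored)    # the restore you actually exercise

# Until this passes, the backup is /dev/null as far as you know.
assert sha256(original) == sha256(restored)
```

Run the equivalent against your real backup pipeline, with the real restore tool, and you find out which tool that is before the disk dies rather than after.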