This is the second time in the last week I’ve come across a thought-provoking remark on eventual consistency in a system. From this article:
Instead of thinking of mutable data as the default, I prefer to think of it as data that destroys its own paper trail. It shreds any evidence of the change and adjusts the scene of the crime so the past never happened. All edits are applied atomically, with zero allowances for delay, consideration, error or ambiguity. This transactional view of interacting with data is certainly appealing to systems administrators and high-performance fetishists, but it is a poor match for how people work with data in real life. We enter and update it incrementally, make adjustments and mistakes, and need to keep the drafts safe too. We need to sync between devices and across a night of sleep.
The previous time was when I stumbled across a three-article series[1] by Vaughn Vernon on object design, where the following indicates a similar way of thinking:
Thus, if executing a command on one aggregate instance requires that additional business rules execute on one or more other aggregates, use eventual consistency. Accepting that all aggregate instances in a large-scale, high-traffic enterprise are never completely consistent helps us accept that eventual consistency also makes sense in the smaller scale where just a few instances are involved.
Ask the domain experts if they could tolerate some time delay between the modification of one instance and the others involved. Domain experts are sometimes far more comfortable with the idea of delayed consistency than are developers. They are aware of realistic delays that occur all the time in their business, whereas developers are usually indoctrinated with an atomic change mentality. Domain experts often remember the days prior to computer automation of their business operations, when various kinds of delays occurred all the time and consistency was never immediate. Thus, domain experts are often willing to allow for reasonable delays—a generous number of seconds, minutes, hours, or even days—before consistency occurs.
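To make that concrete for myself: I read ‘use eventual consistency’ here as letting the command on the first aggregate commit on its own, and running the rule that touches the other aggregate later, from a queued domain event. A minimal sketch of that reading, with all names invented:

    # Minimal sketch (invented names): the command on one aggregate commits on
    # its own; the rule that touches the other aggregate runs later, driven by
    # a queued domain event instead of the same transaction.
    import queue

    events = queue.Queue()  # stand-in for a message bus or outbox table

    class Order:
        def __init__(self, order_id, customer_id):
            self.order_id, self.customer_id = order_id, customer_id

        def place(self, amount):
            # Transaction boundary: only this aggregate changes here.
            self.amount = amount
            events.put(("OrderPlaced", self.customer_id, amount))

    class Customer:
        def __init__(self, customer_id):
            self.customer_id, self.loyalty_points = customer_id, 0

    customers = {42: Customer(42)}
    Order(order_id=1, customer_id=42).place(amount=100)

    # Some time later (seconds, minutes, even days), a separate worker catches up:
    while not events.empty():
        kind, customer_id, amount = events.get()
        if kind == "OrderPlaced":
            customers[customer_id].loyalty_points += amount // 10

    print(customers[42].loyalty_points)  # 10, eventually

Between the two steps the customer’s points are simply stale, which is exactly the delay the quote asks the domain experts about.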
It still pains me to imagine a system where you cannot ever know for sure whether the data is consistent, but this made me sort of warm up to the idea that maybe you don’t need to know that your data is consistent. Maybe gradual deterioration of data consistency is part of the natural state of things, as long as you also allow for some degeneracy[2], i.e. allow some things to be represented in multiple different, complementary ways, so that the picture stays roughly correct even in the face of some inconsistency.
[1]: Effective Aggregate Design: https://dddcommunity.org/library/vernon_2011/
[2]: Degeneracy, Code and Innovation: https://adl.io/essays/degeneracy-code-and-innovation/
When you say that you are pained by the idea of a system that allows inconsistency, that hits close to home. In many of my (often small) data cleaning projects I based the design on a “Make it consistent as fast as possible, and then don’t do anything to break that consistency” principle.
It’s a nice way to program as long as things go right, but the resulting programs have felt brittle. I would write a candidate program, discover some inconsistencies in the output, and have to debug the program and rerun its cleaning pipeline. Rerunning the pipeline would recompute not just the problem cases but also the already-good ones, and because those were updated in place, I was left a bit anxious about whether the update had really been a no-op.
I did store each step’s intermediate results for easier pipeline debugging by direct artefact inspection, and I think I was on the right track there; but emphasising the results kind of hid the changes, because there were so many files. Next time I’m going to try to emphasise the changes a bit more, perhaps by having each intermediate folder keep only the intermediate results that changed relative to the previous step. See if I can make it obvious that ‘this item was already good; that item was fixed in one step; a third item needed three fixes applied to it before it was consistent’. At that point a big ‘produce consistency’ task becomes a lot of small two-part tasks shaped like ‘(a) detect this kind of inconsistency; (b) fix it in this way’. Such a setup would be tolerant of those forms of inconsistency because it knows how to fix them.
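Something like the sketch below is what I have in mind for those two-part tasks; it is only a sketch, and all the names are made up:

    # Each rule is a (detect, fix) pair: 'detect' says whether this kind of
    # inconsistency is present, 'fix' knows how to repair exactly that kind.
    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class Record:
        name: str
        country: str

    RULES = [
        (lambda r: r.name != r.name.strip(),
         lambda r: replace(r, name=r.name.strip())),
        (lambda r: r.country == "Holland",
         lambda r: replace(r, country="Netherlands")),
    ]

    def clean(record, max_passes=10):
        """Apply fixes until nothing is detected; report how many fixes ran."""
        fixes_applied = 0
        for _ in range(max_passes):
            needed = [fix for detect, fix in RULES if detect(record)]
            if not needed:
                break
            for fix in needed:
                record = fix(record)
                fixes_applied += 1
        return record, fixes_applied

    cleaned, n = clean(Record(name=" Alice ", country="Holland"))
    print(cleaned, f"needed {n} fixes")  # only records with n > 0 would be written out

An item that was already good comes back with zero fixes applied, so it never shows up in that step’s ‘changed’ folder.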
Come to think of it, DVCSs like Mercurial are also about ‘work towards consistency; you won’t get there in one go, but you can store all the intermediate steps as explicitly inspectable items’.
So, thanks for the thought-provoking links. I’d write even more, but it’s midnight here. Good night!
Some choice quotes to pull you in.
On in-place modification:
The perfectly mutable medium of computer memory is a blip, geologically speaking. It’s easy to think it only has upsides, because it lets us recover freely from mistakes. Or so we think. But the same needs that gave us real life bureaucracy re-appear in digital form. Only it’s much harder to re-introduce what came naturally offline.
On pointerless data:
Data structures in a systems language like C will usually refer to each other using memory pointers: […] This has a curious consequence: the most common form of working with data on a computer is one of the least useful encodings of that data imaginable. It cannot be used as-is on any other machine, or even the same machine later, unless loaded at exactly the same memory offset in the exact same environment. Almost anything else, even in an obscure format, would have more general utility.
On reactive architecture:
React is not actually a thing to make web apps. It is an incremental job scheduler, for recursively expanding a tree in an asynchronous and rewindable fashion.
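The pointerless-data point above holds even in a high-level language; a rough sketch in Python rather than C, with invented names:

    # The same point in Python rather than C: a structure held together by
    # in-memory references can't be saved or sent as-is, but the same graph
    # flattened into indices ("pointerless") can be.
    import json

    class Node:
        def __init__(self, label):
            self.label = label
            self.next = None  # an object reference, i.e. effectively a memory address

    a, b = Node("a"), Node("b")
    a.next = b
    print(hex(id(b)))  # only meaningful inside this one process, this one run

    # Pointerless encoding: refer to other nodes by index into a flat list.
    nodes = [a, b]
    flat = [{"label": n.label,
             "next": nodes.index(n.next) if n.next else None}
            for n in nodes]
    print(json.dumps(flat))  # usable on any other machine, or the same one later

The flat form is useless for in-place pointer chasing, but it survives being written to disk or sent to another machine, which the original never could.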
The idea of resource use as a large tree, akin to React, and/or of being able to move processes from one machine to another makes me think of two technologies: Erlang, because of the process-tree metaphor that runs through OTP, and Urbit, crazy as it is, because it actually re-thinks software to make it portable between systems in that sort of way.
Perhaps part of the problem is that most programs are grossly underspecified? But how would one move a program that depended on certain files and sockets from one machine to another, without locking all of that state into a VM of sorts?
Grey text set on a black background does not make for a very readable webpage.
I am jumping to the conclusion that this is a vanity site and will judge the content accordingly.
Regarding the legibility: grey on black can’t be pleasant to read, indeed. I don’t think that style is the author’s intention, though: the web page renders as near-black text on a mostly-white background for me.
Regarding the contents: I can tell you that the essay is perhaps the best I have read in months, a rich vein of insight. It’s worth your time even if you have to copy-paste it into a word processor first.
Install the Tranquility Firefox extension.
Many sites become vastly better when some poor sod’s CSS, slaved away at for weeks, just gets junked and replaced by something simple and soothing.
Thanks. Giving it a try.
It certainly solved my grey on black problem.
The page was generally readable for me in Firefox on Windows. I recommend trying reader mode with your preferred color scheme; the page seems to display properly that way, and I did not spot any missing content.
The background is white for me.