1. [Comment removed by author]

1. 5

I don’t know of any good quantitative research on that, but there’s some good qualitative stuff in writing from Jens Rasmussen and Richard I. Cook. I’d start with the classic “How Complex Systems Fail” (http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf), and follow on with some of Cook’s papers where he looks at some great examples of the problem.

1. 1

Cook also talked about this at the Velocity 2013 conference: https://www.youtube.com/watch?v=PGLYEDpNu60

1. 8

Possibly dumb question, but how do they know the proof is correct/states the problem correctly?

1. 12

You never really do. There are a lot of algorithms that are proven correct using one interpretation of the paper or text description, but are broken using another reasonable interpretation of the text. One great value of the statement of Raft in Coq or TLA+ is that it provides an very precise definition of what the protocol really is, which can be very useful to implementers who care about correctness and don’t want to misinterpret the paper.

Proving things like this has a few challenges (1) does the thing I have implemented match what other humans call Raft? (2) do the invariants I have specified (or constraints, or proof goal, etc) match what I am trying to prove (linearizability)? (3) is my proof correct?

Of these, (2) is the easiest, (3) is tractable with peer review, and (1) is pretty hard. The advantage of writing down Raft in TLA+ is that it’s much easier - Coq maps onto TLA+ much more deterministically than it does onto English. The TLA+ specification then is Raft, and the description in the paper is approximately, hand wavingly Raft.

1. 2

Isn’t one point of Coq to replace peer review with computation to ensure (3)?

1. 1

Not wholly replace, but generalize to peer review of Coq itself. The authors answered on HN explaining this better than I could.

1. 20

People seem not to realize (or not to want to realize) that CAP is a mathematical theorem which is rigidly true in every circumstance and cannot be talked around or ignored. It’s simply a fact. In the presence of network partitions, your system will sacrifice consistency or availability (or, in a no-doubt worryingly large number of cases, both); the most you can do is pick. (It is a safe bet, by the way, that availability is the wrong choice [edit: to keep; that was entirely unclear, sorry].)

(As an amusing-to-me aside, CA is what every system should be in the absence of partitions. If your system cannot deliver both consistency and availability even when everything is going right, it is worse than useless.)

1. 14

As an amusing-to-me aside, CA is what every system should be in the absence of partitions.

Sometimes, for latency reasons, you might want lower consistency even when the network is fully connected. Figuring out when you want that, though… maybe an open research problem, haha.

1. 2

As an amusing-to-me aside, CA is what every system should be in the absence of partitions.

As aphyr said, you may want to do this for latency reasons. For example, theres PA/EL in Abadi’s PACELC taxonomy. Many traditional architectures with replications trees of relational databases offer these tradeoffs, as do many “quorum”-based database systems.

Along with latency, there’s also scale. Again, with relational databases it’s fairly common to have async replicated read replicas take read-only traffic, or to have multi-master configurations with async replication. These systems choose A over C entirely for performance reasons, and may not actually intend to be available under partitions. In fact, many choose a “bounded staleness” model, where they stop returning data after some known staleness, which is not possible to achieve with full availability under partitions. These kinds of systems - very sensible systems - are neither C (linearizable) or A (fully available) under partitions.

1. 2

This is true. Actually, the extreme strength of the notion of consistency Brewer used (that is, linearizability) is a point that can be used to argue against the conclusions of the CAP theorem, because depending on the data model, many systems can be meaningfully consistent without full linearizability.

I’m not aware of any work to prove (or disprove) the CAP theorem for different notions of consistency, though I would conjecture that the lower bound on consistency possible while maintaining availability is uselessly low.

1. 4

I’m not aware of any work to prove (or disprove) the CAP theorem for different notions of consistency

I suggest http://www.vldb.org/pvldb/vol7/p181-bailis.pdf, which includes a handy summary of impossibility results for various consistency models.

2. 1

I can not remember the name good enough to find it in google, but MSR had an interesting paper trying to figure this out to some degree. Pileaus or something.

1. 1

(looks like your memory is totally CA.)

1. 1

Yep, that’s it! That only does these adaptive things on reads.

1. 5

A common theme in aphyr’s excellent jepsen posts+tweets is him gently asking “why don’t you use a proven consensus algorithm”?

Does anyone have any perspective on why this advice isn’t taken more seriously? Not-invented-here? Implementation complexity/too much effort? Lack of confidence/belief in the benefits and/or academia?

I’ve been guilty of similar behaviour in the past I guess (not on a distributed system, but using ordered msync() on a custom index file to emulate transactions to avoid the use of a db).

I think I thought at the time my solution was “mostly good enough” with the failure modes being unlikely, but it’s a while ago now :-)

Are there actually any good reasons to not use Paxos or Raft (and are there alternatives to these two?)

1. 7

I’m not an expert, so get some corroboration here but.

• Are there alternatives to Paxos and Raft? There’s at least Zab as well used in Zookeeper.
• Why isn’t this taken more seriously?
• Generally Paxos (and even Raft) are known to be resistant to implementation; it’s easy to break them
• The above point is well-known and can scare people off
• Consensus failure tends to be thought of as an edge-case failure of low consequence
• Actual knowledge about the various theoretical models of consensus is harder to come by than a team of people willing to just try it

But finally, the most important point is just that trade-offs are everywhere! Paxos/Raft aren’t a silver bullets—they’re slow, unstable, and only work under some assumptions that are honestly a little difficult to pull off in practice. When analyzing them it may be easy to feel that you can do better for some specific circumstance by relaxing some of their assumptions and just building your own system.

I think a lot of what Kyle’s posts have emphasized is that Paxos/Raft aren’t just a set of strange algorithms solving a theoretically interesting problem—they’re battle-hardened weapons-of-war against a remarkably difficult problem. Databases who just don’t include them think that they can go into battle with a new strategy and Jepsen does a marvelous job of throwing just the right kind of response to cower them.

So maybe what people should be realizing is that they didn’t think hard enough about their strategy to realize just exactly how difficult claims in distributed systems genuinely are. The CAP theorem talks about hard limits which are much lower than human experience (which is decidedly distributed, but we essentially spend our whole lives ignoring that) would have us believe. Paxos and Raft are slower and more fragile than you might think—but what makes you really, really think that you have a trick to getting the speed and robustness that you want?

But again, it’s all tradeoffs. Usually the big wins that Jepsen takes home are when a product’s marketing team likes to throw around technical terms it has no right to use. Actual use is probably a lot stickier—and it’s often the case that actual needs of a product may not be ACID or linearizability. But to make that tradeoff is hard and to market that tradeoff seems to be universally considered not worth the effort.

1. 5

Are there actually any good reasons to not use Paxos or Raft (and are there alternatives to these two?)

There are several good reasons. Three that come to mind from my perspective as somebody who has built several systems on these primitives:

• Your use-case doesn’t need the guarantees that they provide (this one is obvious, but deserves re-stating).
• The distributed state machine model is beautifully simple in theory, but has significant practical complexities. The biggest of these is that writing truly deterministic code is hard. You can’t use floating point, you can’t allocate memory (unless you are VERY careful), you can’t do any IO, etc. Not impossible, but significantly hard.
• These algorithms are resistant to making into clean libraries. Consensus systems tend to need to be designed from the ground up to as consensus systems to work well in the real world.

Now, that doesn’t mean that you shouldn’t use them, just that they have high engineering costs.

are there alternatives to these two?

ZAB, Viewstamped Replication, and the many variants of Paxos (vertical, cheap, etc). There are also other design approaches, like virtual synchrony. One way to look at all of these is as different algorithms. Another way is that they are all different ways of leveraging the same core algorithm (Paxos’s synod algorithm, VR’s viewchange, etc) into real systems.

Not-invented-here?

This is a huge part of it. These things are extremely difficult to reason about clearly, and it’s very easy to invent a consensus protocol that you can’t find a counter-example for. Reality, however, is much more creative.

1. 4

[…] and are there alternatives to these two?

There is also Viewstamped Replication (original paper and updated) and Zab (e.g. Zookeeper, paper). And here is a comparison of Paxos / VR / Zab.

Are there actually any good reasons to not use Paxos or Raft

To your first question, I don’t think most people lack confidence in the academic side, or think these algorithms are formally incorrect. I feel they are well proven at this point.

I think the advice is taken seriously (at least I hope so) … but I think the slow progress boils down to real-world constraints, tradeoffs and implementation challenges. For example:

• External systems such as Zookeeper are tricky to configure correctly, and add additional complexity to a system. Instead of deploying DatastoreX, you have to deploy DatastoreX + Zookeeper…and then keep both clusters healthy. This applies to any “external” consensus server, like Zookeeper, etcd, Logcabin, etc

• These consensus algos aren’t really “drop in place”; they need to be fully integrated into the system, otherwise all kinds of weird stuff can happen. Which can make it tricky to integrate an existing consensus library into your own code (because of bog-standard, mundane reasons).

• Even very clear, explicit algos like Raft have some room for interpretation, which can cause strange or undefined behavior. And parameters can be fiddly, leading to poor performance. And of course, legit bugs are always possible too, and many of these libraries are either relatively new, or implementing hairy algos like Paxos

• Most of the simpler variants of consensus algos do not tolerate dynamic changes to the node list (e.g if you want to resize your cluster, you have to bounce the entire cluster). You typically need a more complex implementation to change node counts at runtime (e.g. Section 2.3, “Dynamic Paxos”, Zookeeper reconfiguration, Section 6 of Raft)

• Similarly, more complicated implementations are usually needed for better performance (Section 9.3 in Raft, Fast Paxos)

• It isn’t always clear what you should put through the consensus state machine, and what should be excluded. Do you send all data through the state machine? That will crush performance of the server….which is a serious problem, because many of these distributed datastores exist to address scalability and performance problems found in monolithic systems. If distributed DatastoreX is just as slow as a monolithic DB because it has to get consensus on all writes…I might as well just use the monolithic system.

• Alternatively, you could maintain consensus only on metadata, or perhaps certain types of data. Or allow the user to select consistency levels per-data-type or per-operation You can build any number of scenarios, and it just places your system on a continuum…none of which are strictly “good” or “bad”.

• Simple history is sometimes the blame. Project starts as something small for one task, grows, starts dealing with these types of issues, needs to retrofit a lot of older code. Hopefully the next “generation” of distributed datastores will take these considerations into account from the start, since it is a much more visible topic now.

Now, I’m not saying there isn’t a good dose of NIH syndrome, or “our algo will do just fine”, or “good enough”. That definitely happens too :)

But there are real challenges to implementing these things. Challenges which can certainly be overcome, but it isn’t easy as “just” implementing an algorithm and calling it a day.

Which is why its a good thing people like Aphyr are documenting the shortcomings of these systems, to either prompt more rigorous development, or proper documentation of the risks/tradeoffs of a system. There’s nothing wrong with being a fast, lossy datastore…as long as your users know exactly what they are buying into. :)

These are just my personal opinions about the whole ecosystem, but I think it’s a fair coverage of the realities.

1. 6

I admire his decision, but at the same time I hope I never get to that point. I want to like coding, and I don’t want to burn out. Have any good articles been written on this? Is it a matter of personality/body type?

1. 16

Burnout is on my long list of things to write about.

But a short summary is: take care of yourself. Don’t be a doormat. Take risks. Avoid places where you aren’t respected. Exercise. Be in good relationships with people. Ignore tech for seasons at a time. And kill off perfectionism, because it can consume you. Find ways to enjoy yourself in tech, even after many years.

1. 2

You should write about it, because I think that’s a good summary.

Find ways to enjoy yourself in tech

This has been critical for me. We’re blessed (for now) in the field that we’re in that there’s a huge amount of choice available. That’s true everywhere, although more true in the world’s big tech cities. Taking advantage of it is key.

2. 6

Everyone is different. For myself, I prevent burnout by taking breaks. When I get home I do housework: yardwork, cooking. I also do a bit of writing - a blog (or two), watch TV, read blogs and admire folks who build computers out of wood in their basement. I think burnout is the effects of binging on work. We all do that sometimes, and we should take a break to recharge.

If you like to keep programming but feel really burned out, you could try changing application field. Computer science is required EVERYWHERE. It’s often very refreshing to change where you apply it.

For example, I’m “taking a break” now by working at a great company that does bio-informatics. My previous background had been in Neuroscience. Learning a new field of biology and branching out into new directions of computer science has been very exciting for me.

1. 3

If you like to keep programming but feel really burned out, you could try changing application field. Computer science is required EVERYWHERE. It’s often very refreshing to change where you apply it.

This is a really good point. I have been in a dozen industries as a software developer, and each time it was exciting because even if I knew the tech, I didn’t yet know the domain.

2. 5

I think it is profoundly impacted by personality and environment. You have to understand what works for you and seek it out.

Personality wise: the idea of constantly learning and problem solving has to be something that appeals to you. For me, I couldn’t stand being a bartender… I would be extraordinarily unfulfilled, and insanely bored. But, the poster enjoyed being a bartender/chef: “Over the next couple years I began working as a chef, and as a bartender to make my living and I was fairly happy with that lifestyle.”.

Environment wise: where (and how) you work matters a lot. The author apparently was mainly a “line developer” closing tickets and working on others peoples code, 40 hour work weeks, etc. I personally can’t imagine working in that type of environment for multiple years. Prior to becoming a founder I did a mix of start-up work and contracting work. I always ensured I had flexible scheduling and work from home options – not because everyone needs that, but because I needed it.

Burnout generally doesn’t sneak up on you – it is a slow building thing, and if you find yourself building towards burnout, try to arrange a sabbatical and recharge your batteries. I am honesty curious if this “new life” for the poster will end up being just an awkwardly done sabbatical after he learns the farm life isn’t really for him, or that he has material needs far better met by being a programmer versus a farmhand.

1. 2

Create the opposite of what you would at work, if you create boring business apps at work then create games at home, if you write in Java at work then writing in Java at home could feel too much like work so you could change that too. If you program for the man at work then program for yourself at home. At work I maintain a library where I fix bugs/implement features for other people, at home I also maintain a library but I implement exactly the features I want to implement in the way I want to implement them, sure I waste time reinventing the wheel or writing code I won’t use, but I have fun doing it. If all I did was chores for people at work and didn’t do any programming at home then I would probably feel burnt out.

1. 2

Thank you for the thoughtful comments. My personal strategy for this summer is to go overnight backpacking or bike riding every other weekend in the summer. I also do open source or blogging when I get home from work because I enjoy it, but if it feels like too much in the future I can drop it.

1. 3

“Sunshine is said to be the best of disinfectantants” wrote supreme court justice Brandeis, SSL Everywhere puts all traffic in the shade.

I’m not sure I follow this argument.

It seems to characterize traffic into three segments (a) traffic that needs to be secure, (b) traffic that doesn’t deserve to be secure, and © evil traffic. Then, we’re lead to believe that if we don’t use SSL for (b), then we’ll easily be able to find ©. The problem of telling (a) from © is left unsolved. That’s really the hard problem - if you have any legitimately secure traffic, isn’t finding the evil traffic just as hard as if you encrypt all traffic?

1. 2

The first row means: “One web-server connected to one SQL database” is a CA system, not available during partitions.

The author is very confusing. CA systems do not exist. I think they are trying to make some meaningful distinction but CAP is really not expressive enough for that, instead PACELC might be a better set of letters to work with.

1. 3

I agree about PACELC, and I think that’s close to the author’s point. There’s a lot of confusion coming from CAP’s particular version of A, which is very useful for some system uses, but is frequently misunderstood.

I wrote about it here: http://brooker.co.za/blog/2014/07/16/pacelc.html

1. 1

Both PACELC and yield/harvest are “beyond CAP”: You want to say that you chose eventual consistency for another reason than cap-availability? That’s PACELC You want to define a system with reduced availability? That’s yield/harvest.

I mention them in the conclusion of the post, but not inside the post because the post is just about CAP.

1. 2

This post is of low-quality. The author doesn’t seem to have really grasped the Harvest and Yield paper.

1. 3

Could you elaborate? I don’t think you’re right. Your main contention seems to be that you can’t give up partition tolerance, but his claim is that if you don’t have any replicas, you have both consistency and availability, but not partition tolerance, which seems reasonable. It’s just not distributed.

In particular, Jay Kreps mentioned this post as being high quality on twitter, and I trust him to have a pretty good distributed systems horse sense.

1. 1

CAP applies to systems not a single piece of data, which is what harvest and yield are about. In the case of failure not all of your data is available but the system might be. And depending on the semantics of the system, that might be just fine.

When the author says there is no such thing as an AP big data store, that is simply false. I have worked on systems that are AP and big data. It worked because missing some data was ok it just meant the quality of a decision during a failure was degraded.

1. 1

there is no such thing as an AP big data store I have worked on systems that are AP and big data It’s an interesting point. At the data store level, saying ‘I don’t know if I have or had this data’ is considered as an error (excepted for some crazy definitions of datastore: but an eventually consistent store does not allow this). Then an application can perfectly encapsulate this. It depends on the application, but this application by itself is not itself a data store (but yes, it is a big data application).

1. 1

As far as I can tell, both Harvest and Yield are in terms of both data and entire systems. They phrase themselves in success rate, which can be considered an “entire system” idea, but clearly the success rate is bounded by availability of underlying data.

My understanding is that strict AP would mean that even in a partition, you have access to all of the data in a system, unless all of the replicas are down. Instead, the system you’re describing sounds closer to neither strictly available nor strictly consistent.

1. 1

I know of nobody who believes availability is defined as that and I have not had the idea of “strict availability” come up in discussions. And it clearly is not useful definition after a few seconds thought. People care about systems as a whole.

1. 2

In particular, Harvest and Yield and the CAP theorem paper both define availability that way.

Harvest and Yield:

High availability is assumed to be provided through redundancy, e.g. data replication; data is considered highly available if a given consumer of the data can always reach some replica.

And

AP without C: HTTP Web caching provides clientserver partition resilience by replicating documents, but a client-server partition prevents verification of the freshness of an expired replica. In general, any distributed database problem can be solved with either expiration-based caching to get AP, or replicas and majority voting to get PC (the minority is unavailable)

Cap Theorem:

For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. That is, any algorithm used by the service must eventually terminate. In some ways this is a weak definition of availability: it puts no bound on how long the algorithm may run before terminating, and therefore allows unbounded computation. On the other hand, when qualified by the need for partition tolerance, this can be seen as a strong definition of availability: even when severe network failures occur, every request must terminate.

And

It is possible to provide high availability and partition tolerance, if atomic consistency is not required. If there are no consistency requirements, the service can trivially return v0, the initial value, in response to every request. However it is possible to provide weakened consistency in an available, partition tolerant setting. Web caches are one example of a weakly consistent network.

It’s fine if you don’t think that reasoning about distributed systems in this way is useful–Brewer says much the same in Harvest and Yield, which is why presented what he considered a more useful metric than strict availability.

1. 1

Great quotes. The theme I see through them is that availability is a spectrum and you can pick spots on the spectrum that make sense to you, which is what I got out of harvest and yield. The CAP Theorem quote you have just says you need an answer, not what the quality of that answer has. I do not believe the article tells that story.

2. 3

To be fair, the harvest and yield paper’s treatment of CAP isn’t any better. For example, it says:

CA without P: Databases that provide distributed transactional semantics can only do so in the absence of a network partition separating server peers.

Even in context of the paper, that’s misleading. The point that the post author is making actually turns out to be a very similar one to the harvest and yield paper. It’s also expressed in a way that’s tricky to follow, but it’s a subtle topic and hard to write about well.

1. 1

The biggest issue I have with the article is that it conflates the availability of a whole system with that of parts of the system. You are right that it is a subtle topic, but I believe this article adds nothing positive to the discussion and even adds some misinformation.

2. 2

I’m doing a lot of head scratching. Maybe there’s an idea here that I’m not getting but its really not laid out well.

1. 1

Sorry about that. The main point of this post is to look at the definitions of availability to show that there are different definitions: A system can be highly-available but not available by the definition of CAP. And then showing how it propagates to common systems. This said, the post is difficult to read if you have not seen previously the CAP theorem proof. If you want to try again :-), you can have a look at the first post of the series. Especially, the post http://blog.thislongrun.com/2015/03/comparing-eventually-consistent-and-cp_11.html introduces some of the definitions.

1. 10
1. 9

Thanks you, that’s a great illustration.

SQLite was originally designed with a policy of avoiding arbitrary limits. Of course, every program that runs on a machine with finite memory and disk space has limits of some kind. But in SQLite, those limits were not well defined. The policy was that if it would fit in memory and you could count it with a 32-bit integer, then it should work.

Unfortunately, the no-limits policy has been shown to create problems. Because the upper bounds were not well defined, they were not tested, and bugs (including possible security exploits) were often found when pushing SQLite to extremes.

1. 6

Have you seen any other good illustrations? I’ve suffered through a lot of code like this, which is a good illustration of why “Zero, One, Infinity” is often better than the alternative:

``````char buf[100];
strcpy(buf, inp1);
strcat(buf, "-");
strcpy(buf, inp2);
strcpy(buf, ".csv");
parse_csv(buf);
``````

And of course we all remember how pre-GNU Unix utilities almost all had some (usually undocumented) maximum line length, and pre-Perl scripting languages almost all had some maximum string length (often 255 bytes), and so anything that might contain unusually long lines was prone to break shit, often in surprising and exploitable ways. Or maybe we don’t all remember this. But it sure happened.

A lot of this depends on how you handle errors, including resource-exhaustion errors. If you just log a stack trace and restart the system without the error-inducing input, that’s probably hard to exploit, although maybe an attacker who can detect the restart can infer something about the length of some data that’s supposed to be secret from them, which can eventually reveal the secret data to them if it’s, say, getting gzipped along with data they themselves supply. But if you handle errors by truncating strings, or by truncating result sets, or by overwriting your stack frame (as in my above example C code), it’s often a major vulnerability!

Maybe one example of saying no to infinity would be in the C standard; §5.2.4.1 of ISO/IEC 9899:TC2 (at least as of the 2005 draft) says:

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:

• 127 nesting levels of blocks
• 63 nesting levels of conditional inclusion
• 12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or incomplete type in a declaration …

Plauger wrote about the reasoning behind this kind of thing at some point. He talked about his experience with some previous language standard that required that some such thing, maybe identifier lengths, would have no limit. (I read the book in the 1990s, so I can’t remember exactly.) But since that’s untestable, people who wrote standards testing suites had to choose some maximum length to test, and then people wrote their compilers to pass the tests, and then rejected future standards testing suites that tested larger maximum lengths. So, effectively, the standards committee had merely delegated the determination of the minimum supported identifier length to the test-suite authors.

So the C standard, where Plauger was on the committee (maybe still is?) was careful to specify some minimum maximums. So if your C compiler only allocates space for 16 levels of nested blocks, it’s definitely not compliant (at least with the draft I quoted above), even though that’s far more levels than it’s reasonable to write by hand.

1. 5

That seems like a perfectly reasonable reason to use zero-one-infinity. It’s genuinely is an arbitrary limit. Rather, it’s one based on certain factors which are nigh guaranteed to change over the lifetime of the deployed artifact (e.g., size of memory).

Some such constraints are valuably arbitrary. 32-bit architectures were an “arbitrary” choice but also one so significant that (a) not setting that limit would destroy performance and (b) then change from 25 to 26 was going to be a big one in any circumstance. It doesn’t make 5 less arbitrary than 6, though.

A place where zero-one-infinity fails badly might be backpressure in a distributed system. Zero and one are clearly out when talking about rates of delivery—there’s not even any good sense in which they could be the answer. However, if you jump next to infinity, as many people do, you design systems which are highly susceptible to failure under spikes in demand. It turns out that you have a fuzzily-bounded resource limit and by treating the upper bound arbitrarily you’ll suffer very bad side effects.

The right answer is to, of course, pick a “moderate load” set point and build a control system around that set point which you tune so as to set the distribution on state space to have the right expected load and the right variance.

It’s a completely different kind of resource limiting when you avoid zero-one-infinity. You cannot even use the terminology of hard limits any longer in the same way that taking “infinity” seriously means that you have to start thinking of streaming/lazy algorithms instead of batch ones.

1. 2

a good illustration of why “Zero, One, Infinity” is often better than the alternative:

The version where you malloc() and realloc() the buf as you go doesn’t fix the whole problem. You still need to handle the case that you run out of memory gracefully, and you still need to apply a limit to input length. It’s better in that it scales to machines with more memory more gracefully (modulo vm_overcommit issues and the like), but it’s worse in that it hides the problem from the programmer.

There’s no single clear solution, of course. SQLite’s “buttons and knobs” approach is better in some circumstances, but adds a bunch of complexity. A fixed-size buffer is better in other circumstances, because it’s simple. The realloc approach is better in some circumstances, because it’s more flexible and future proof. Any solution without bounds checking is broken.

A lot of this depends on how you handle errors, including resource-exhaustion errors.

vm_overcommit behavior in Linux (and similar behavior in other modern OSs) makes resource exhaustion of memory hard to do right. Accepting input until malloc or realloc return NULL is going to lead to some very interesting impacts on the overall system.

Plauger wrote about the reasoning behind this kind of thing at some point.

That’s a great example. Any idea where I could dig up Plauger’s writing?

1. 2

The version where you malloc() and realloc() the buf as you go doesn’t fix the whole problem. You still need to handle the case that you run out of memory gracefully, and you still need to apply a limit to input length.

Yeah, I agree. But it seems like you could do a pretty good job of that with a ulimit, as long as the software handles running out of memory in a reasonable way, and using `stralloc` rather than C strings usually results in simpler and more robust software.

Any idea where I could dig up Plauger’s writing?

He wrote some popular columns, but probably his books, which republished most of the columns, are the easiest way to read what he wrote. I read him for the first time when the university library in my small town was throwing out Elements of Programming Style. I probably read this reasoning in one of the Programming on Purpose series. Sadly, he doesn’t seem to be blogging.

1. 4

Stellar is very interesting stuff.

Even the decision to trust the person-with-the-most-SHA256-colliding-hardware (bitcoin’s current, absurd rule) is a trust judgment.

I have to agree with a lot of Graydon’s concerns about the Bitcoin community, but I don’t think this characterization is fair. The core argument in Bitcoin’s proof-of-work is that willingness to do work aligns with the rewards for doing that work, and Bitcoin tries to align the incentives, so “behaving well” means “getting paid”. There are some minor issues with this that have been found (like selfish mining), but so far it’s proven to be fairly robust. Unfortunately, it’s a very hard model to formally reason about, which is a big weakness in Bitcoin for the long term.

1. 4

I know very little about Bitcoin, so this may be a silly question: Doesn’t the incentive of Bitcoin fall apart if/when someone with significant resources (e.g. hashing capability) joins the network? Significant in the sense of being able to control enough of the global hashrate to start attempting double-spends, or just generally rewriting history because they can out-hash enough of the network to get revisions in?

I realize the global hash rate is pretty large…but I also don’t underestimate the resources a government or large organization could throw at the problem if they really desired.

1. 5

Doesn’t the incentive of Bitcoin fall apart if/when someone with significant resources (e.g. hashing capability) joins the network?

Yes. If any single “interest” controls more than half of the hash rate, they can start to rewrite the chain from any arbitrary point (over time). The meta-argument is that progress in hash rate changes is slow enough (some kind of weak ‘efficient hashrate market’ hypothesis) than technology changes won’t allow this to happen. It has also happened in the past, so it’s not clear how valid these assumptions really are.

There’s a bunch of interesting detail on this in Section III of http://www.jbonneau.com/doc/BMCNKF15-IEEESP-bitcoin.pdf

1. 2

Neat, thanks for the explanation and link. Will make some nice weekend reading. :)

1. 2

There is no technical reason why they could not do it. But there is no incentive for them to do it.

Plenty of reasons. Availability: my home has only one net connection and it’s availability is terrible. Durability: Any data stored in my home has very limited durability. Physical Security: My home’s not the most physically secure place. There are more.

It is possible to get great performance on these properties from a decentralized system, but it’s going to be far from “install gmail on your home machine”.

1. 2

Gmail is written as a decentralized system. They just own all the machines…

1. 1

I want to add I agree with you that a 3rd party is useful for certain things.

1. 4

Leslie Lamport’s “Who Builds a House without Drawing Blueprints?” http://cacm.acm.org/magazines/2015/4/184705-who-builds-a-house-without-drawing-blueprints/fulltext is something of a companion to this. I don’t like Lamport’s chosen analogy, but agree with his overall point.

One interesting thing about Lamport’s approach (and the TLA+ approach) is the simple mathematical basis. By choosing set theory and predicate logic, I think he’s made a tool that’s easier to approach theoretically than those based on other formalisms (like Pi calculus and type theory). That’s both a strength and weakness, so it will be interesting to see how this evolves in the future.

1. 4

There are some really great historical details here. I particularly enjoyed:

The IBM 1401 does not use bytes. Instead, it uses 6-bit BCD storage.

It would be super interesting to understand why they built it that way. BCD seems like an interestingly inefficient way to do arithmetic in hardware. Had the “standard” binary encoding approach not been invented yet? Was there another good reason?

Even the comparison instruction cost extra.

That whole paragraph is very interesting in comparison to the normal way of doing things today. It’s not unusual for modern hardware products to have features physically supported but soft-disabled. Sometime between the late 60s and now (probably driven by VLSI), the economics of hardware changed dramatically.

The 1401 I used is the Sterling model which it supports arithmetic on pounds/shillings/pence, which is a surprising thing to see implemented in hardware.

I consciously knew that the pounds/shillings/pence age ended in the early 70s, but my unconscious mental model of history doesn’t have it overlapping with the computer age.

1. 5

Was there another good reason?

Since it was designed for business they probably wanted to avoid the errors that come with representing base-10 fractions in binary. Billing systems, for example, often use BCD arithmetic so they can perform accurate arithmetic on large columns of numbers with fractions.

1. 4

I don’t know why BCD was chosen, but the fact, noted later in the story, that raw machine code was human readable is a really interesting side effect, especially when working at such a low level. Wonder if that was a factor?

1. 3

I’m the author of that post. I would be happy to answer questions, and would appreciate any feedback people have to offer.

1. 1

Assuming you can’t control what times the clients start at? What triggers many clients to try writes at the same time?

1. 1

That depends on the application. The “everybody starts at the same time” case is interesting, because it’s the worst case. Typical internet-facing service traffic is very bursty, so while this is a degenerate case, it’s not one that’s really too atypical.

1. 2

I’ve seen a thundering herd effect from smoke tests (to verify that live environment was working correctly) that resulted in several caches expiring at the same time and causing a (small) regular latency spike every 5 minutes in production. We solved it by adding jitter, though we referred to it as “fuzz”.

1. 2

You can also do soft expiry using an exponential function so early expiry is rare but you still don’t get contention with very large numbers of clients. Something like 10 ** (t - timeout - rand()).

1. [Comment removed by author]

1. 5

Hm? I’d take a 6 with a 9 in soft skill over the reverse anytime. The one is a guided missile with a bit less of punch, the other a heavy-hitter that often misses and needs a lot of aiming.

Figuring out who is what is the other difficulty, though :).

1. [Comment removed by author]

1. 4

Thanks for bringing that up. Lack of skills to evaluate the quality of the received work is a huge issue in the industry, in my opinion.

1. 3

Nearly all lawyers I have been in contact with are terrible. 5% are adequate. 2% or less are great.

2. 3

If you want to be surrounded by talent, stay small. Never work at a large organization.

That hasn’t been my experience. I’ve worked with teams large and small, and both had a similar mix of talent. The other counter is that larger organizations can spend more resources on specialization, so you get to work with people who are extremely deep in specific areas.

1. 2

This is the most interesting thing I’ve seen in a while. I wonder if my time is better invested in learning Coq than Haskell..

1. 7

Well, I found learning coq a really big investment (time and mental-wise)… I tried twice and left a bit frustrated, the simpler proofs are really simple, the harder ones are really hard. I wouldn’t dismiss the fact that it may be just too hard for me to grasp though.

If you want to learn programming-language-theory oriented coq there’s a de facto starting point: http://www.cis.upenn.edu/~bcpierce/sf/current/index.html and you may continue with http://adam.chlipala.net/cpdt/

However, if you like software verification, but find the proving process boring, there’s always the subject of SMTs (Z3, CVC4), automated theorem provers (ACL2) and such (coq is rather a proof assistant, you need to do the proof manually). You don’t need to learn coq to have fun with software verification!

[EDIT]

I forgot to mention Idris, which aims to be a sort of “usable coq”.

1. 5

In my limited use of it (mostly limited to a grad course taken some years ago), I found ACL2 more towards the proof-assistant side of things also. It does have some automated theorem proving, but it only succeeds fully automatically for very simple things. For anything non-trivial (not even meaning big Real Software, even simple algorithms-textbook algorithms) you have to hand-hold the prover quite a bit, which requires digging into the details of the proving process. A lot of getting a proof to “go through” consists of strategically giving it rewrite rules, induction schemes, etc. The people who are really good with ACL2 seem to have a lot of black-art type knowledge of it.

1. 2

Fabrice, thanks for the reply - this is the sort of overview I was looking for.

I have some experience with Haskell - read a lot of papers & code but have only written toys. I’ve seen Idris around and algebraic effects are definitely intriguing.

From the sounds of it, Coq isn’t quite what I’m really looking for. I might try out some of those really simple proofs you’re talking about to get a feel for it and then move onto learning more about SMT and Idris.

Software verification is definitely the goal, for me :)

1. 4

Depending on what you mean by “software verification”, you should also look at TLA+ and Promela/Spin. They aim at a different space from Coq, but in my experience tend to be better at the set of things that many working coders find useful.

1. 1

Thanks, I’ve toyed around with TLA+ and have started digging into the book. Haven’t heard of Promela/Spin - will take a look.

2. [Comment removed by author]

1. 1

Thanks. I’ve got some Haskell & ML experience. I don’t mind cliffs :)

1. [Comment removed by author]

1. 5

Formal methods have a great way of forcing people to think about the properties they expect of their systems. While there’s obviously value in proofs and model checking, in my experience the biggest benefit to me has been clarity around requirements.

1. 11

I wrote this on Saturday morning, primarily to highlight some common threading pitfalls and cool stuff that can be done with ‘perf’.

I’ve written a bunch of stuff over the last few years at http://brooker.co.za/blog/, and have received great feedback and criticism from others. For this one, though, my email has been a fairly constant stream of vitriol since yesterday. I even got accused of being a “shill for Node.js”. It’s all easy enough to ignore, but I found it interesting how touchy this subject is.

1. 5

It’s remarkable how terrible people can be at times.

Thank you for writing this. I didn’t know about `perf`!

1. 6

I think extrapolating from “my program got slower with threads (and there was an easy fix)” to “most programs get slower with threads” is quite the leap.

1. 10

I think the point is more: “it is easy to get multi-threading wrong and hurt performance in ways you may not expect if you’re unfamiliar with how multi-threading works.”

1. 10

Multi-threaded programs can, and very often do, run much more slowly than the equivalent single-threaded program.

The point that I was trying to make is that Amdahl’s law gives us the wrong intuition about the performance of multi-threaded programs. The worst case of Amdahl’s law is a wash: the multi-threaded code runs in the same time as the equivalent single-threaded code. Unfortunately, that doesn’t match reality. In the real world, poorly written (or poorly optimized) multi-threaded code runs slower than the equivalent single-threaded program.

That doesn’t mean that threads are bad, just that they aren’t a magic ointment that makes all programs faster. If there is contention, especially if it’s contention that requires a context switch, they can make code go slower. Sometimes shockingly so.

The second think I was trying to talk about is how modern Linux has some very cool tools for tracking down these kinds of performance problems. perf (or perf-events) is extremely powerful, and combines a lot of what you get from profilers and strace into one package with much less runtime overhead. In addition, its ability to do system-wide profiling is very handy for cross-process interactions. Many other operating systems have equivalent, and some better, tools.

1. 6

In the real world, poorly written (or poorly optimized) multi-threaded code runs slower than the equivalent single-threaded program.

I’ve done a lot of concurrent programming over the last four years, and this has almost never been my experience working with Erlang, Clojure, and java.util.concurrent, but YMMV I guess. I tend to see sublinear scaling owing to unexpected locks in the stdlib (hi Integer.parseInt), or known synchronization points like interning and queue contention, but I don’t think I’ve ever hit a real-world computational problem where I haven’t been able to get a ~4-10x speedup out of a 16-core box by slapping a threadpool on it. Usually takes a profiler to get to that near-linear domain though.

1. 2

It was harder to get useful behavior out of multithreading in the bad old C/C++ days where there was heavy reliance on out-of-process locks. People know how to do things better than lock-spaghetti now.

1. 2

This is cool. You could tell the same story starting with the knuckle thing too:

• Months have 30 days, or you have flabby hands with no knuckles (y = 30)
• Alternating months have 31, or you have a knuckle to start, then a gap then a knuckle and so on (y = 30 + x % 2)
• Except they don’t, something weird happens after the x = 7, or the fourth knuckle, and the process gets offset by one (y = 30 + (x + floor(x/8)) % 2)
• The first gap (February) is extra weird. The article’s trick of using 2 % x + 2 * floor(1/x) is clever.