1. 9

    Because it forces you to use a phone number; the author works for Facebook and recommends against GPG; they did not want people to use the F-Droid free app store; and they are against federating with people hosting their own servers.

    1. 8

      Who works for Facebook? Not Moxie, as far as I know.

      1. 5

        Moxie used to work for Twitter, but no longer.

      1. 2

        this is awesome!

        1. 1

          Thanks for the kind words, glad you like it!

        1. -1

          it looks like the link is broken =(

          1. 1

            I just got to the page fine.

            1. 2

              Weird, it failed for me initially and I just assumed the link was bad. Sorry for the confusion.

            2. 1

              Nope, seems fine?

            1. -1

              [Title] /proc/<pid>/stat is broken

              This sounds serious! Is the content of the pseudo-file associating incorrect PIDs or parent PIDs to processes?

              Let’s continue…

              Documentation (as in, man proc) tells us to parse this file using the scanf family, even providing the proper escape codes - which are subtly wrong.

              So it’s a documentation issue…

              When including a space character in the executable name, the %s escape will not read all of the executable name, breaking all subsequent reads

              I have literally never encountered an executable with a space in the name, although it’s perfectly legal from a file name perspective. (I’ve been a Linux user since 1998).

              The only reasonable way to do this with the current layout of the stats file would be to read all of the file and scan it from the end […]

              So… let’s do this instead?

              The proper fix (aside from introducing the above function) however should probably be to either sanitize the executable name before exposing it to /proc/<pid>/stat […]

              Sounds reasonable to me.

              […], or move it to be the last parameter in the file.

              Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?

              This problem could potentially be used to feed process-controlled data to all tools relying on reading /proc/<pid>/stat

              I can’t really parse this. Do you mean “affect” instead of “used”?

              In conclusion: I can’t see any evidence of the functionality of this proc pseudo-file being “broken”. You have encountered an edge case (an executable name with a whitespace character in it). You’ve even suggested a workaround (scan from the end). If you had formulated this post as “here’s a workaround for this edge case” I believe you would have made a stronger case.

              1. 5

                I have literally never encountered an executable with a space in the name

                Well, tmux does this, for example. But my primary concern is not “has it ever happened to me” but “if it happens, what will my code do?”. As this is a silent failure (as in, the recommended method fails in a non-obvious way without indicating failure), most implementations take no action to guard against it. That, in my mind, counts as broken, and the least one can do is fix the documentation. Or expose single parameters in files instead of a huge conglomeration with parsing issues. Or… see above.

                So… let’s do this instead?

                I do, but only after I became sceptical while reading the documentation, ran some tests, and had my hunch confirmed. Then I checked and saw others making that very mistake.

                Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?

                No, I don’t think so - except for introducing single-value files (and leaving /proc/<pid>/stat as it is).

                This problem could potentially be used to feed process-controlled data to all tools relying on reading /proc/<pid>/stat

                I can’t really parse this. Do you mean “affect” instead of “used”?

                Admittedly, English is not my first language; I do, however, think that sentence parses just fine. The discussed problem (which is present in several implementations based on the documentation) can potentially be used to inject data (controlled by the process, instead of the kernel) into third-party software.

                In conclusion: I can’t see any evidence of the functionality of this proc pseudo-file being “broken”.

                That depends on your view of “broken”: if erroneous documentation that affects close to all software relying on it, and fails silently, does not sound broken to you, I guess it is not.

                You have encountered an edge case (an executable name with a whitespace character in it).

                I actually did not encounter it per se; I just noticed the possibility for it. But it is an undocumented edge case.

                You’ve even suggested a workaround (scan from the end).

                I believe that is good form.

                If you had formulated this post as “here’s a workaround for this edge case” I believe you would have made a stronger case.

                Maybe, but as we can see by the examples of recent vulnerabilities, you’ll need a catchy name and a logo to really get attention, so in my book I’m OK.

                1. 1

                  Thanks for taking the time to answer the questions I have raised.

                  The discussed problem (which is present in several implementations based on the documentation), can potentially be used to inject data (controlled by the process, instead of the kernel) into third-party software.

                  Much clearer, thanks.

                  On the use of “broken”

                  I’m maybe extra sensitive to this, as I work in support for a commercial software application. For both legal and SLA[1] reasons, we require our customers to be precise in their communication about the issues they face.

                  [1] Service level agreement

                  1. 1

                    Followup: can you give a specific example of how tmux does this? I checked the running instances of that application on my machine and only found the single word tmux in the output of stat files of the PIDs returned by pgrep.

                    1. 2

                      On my Debian 9 machine, when starting a tmux host session, the corresponding /proc/<pid>/stat file contains:

                      2972 (tmux: client) S 2964 2972 2964 […]
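
                      A minimal sketch in C of the “scan from the end” approach (my own illustration, not the article’s code): read the whole line, find the last ‘)’ to delimit comm, and only then parse the fields after it.

                      #include <stdio.h>
                      #include <string.h>

                      /* comm may contain spaces and parentheses; the LAST ')' is the
                         only reliable delimiter, so locate it before parsing anything
                         that follows. */
                      static int parse_stat_line(const char *line) {
                          int pid, ppid;
                          char state;
                          const char *open = strchr(line, '(');
                          const char *close = strrchr(line, ')');
                          if (!open || !close || close < open) return -1;
                          if (sscanf(line, "%d", &pid) != 1) return -1;
                          if (sscanf(close + 1, " %c %d", &state, &ppid) != 2) return -1;
                          printf("pid=%d comm=%.*s state=%c ppid=%d\n",
                                 pid, (int)(close - open - 1), open + 1, state, ppid);
                          return 0;
                      }

                      int main(void) {
                          return parse_stat_line("2972 (tmux: client) S 2964 2972 2964") ? 1 : 0;
                      }

                      On the line above this prints pid=2972 comm=tmux: client state=S ppid=2964, where a naive %s-based format would have stopped at “tmux:”.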

                2. 3

                  “Thus breaking all existing implementations that rely on the documentation in man proc. But I guess it can be done in some backwardly compatible way?”

                  I will never get the 100ms it took to read this sentence back….

                  1. 1

                    I dunno, maybe just duplicate the information at the end of the current format, in the author’s preferred format, and delimited by some character not otherwise part of the spec.

                    It’s not trivial, though.

                    That was my point.

                  2. 1

                    This was clearly overlooked when the API was designed; nobody is parsing that file from the end, and nobody is supposed to.

                    1. -1

                      What was overlooked? That executables can have whitespace in their names?

                      I can agree that this section of the manpage can be wrong (http://man7.org/linux/man-pages/man5/proc.5.html, search for stat):

                      (2) comm  %s
                          The filename of the executable, in parentheses.
                          This is visible whether or not the executable is
                          swapped out.
                      

                      From the manpage of scanf:

                      s: Matches a sequence of non-white-space characters; the next
                          pointer must be a pointer to the initial element of a
                          character array that is long enough to hold the input sequence
                          and the terminating null byte ('\0'), which is added
                          automatically.  The input string stops at white space or at
                          the maximum field width, whichever occurs first.
                      

                      So it’s clear no provision was made for executables having whitespace in them.
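
                      To make the failure mode concrete, here is a hypothetical snippet of my own, with a format string assembled from those per-field codes (the manpage lists the codes per field; the combined string is my assumption):

                      #include <stdio.h>

                      int main(void) {
                          /* stat lines for a comm without and with a space */
                          const char *ok  = "2972 (tmux) S 2964 2972 2964";
                          const char *bad = "2972 (tmux: client) S 2964 2972 2964";
                          int pid, ppid;
                          char comm[64], state;

                          pid = ppid = -1; state = '?'; comm[0] = '\0';
                          int n = sscanf(ok, "%d %63s %c %d", &pid, comm, &state, &ppid);
                          printf("ok : n=%d comm=%s state=%c ppid=%d\n", n, comm, state, ppid);

                          pid = ppid = -1; state = '?'; comm[0] = '\0';
                          n = sscanf(bad, "%d %63s %c %d", &pid, comm, &state, &ppid);
                          /* %s stops at the space: comm="(tmux:", state='c', ppid unset */
                          printf("bad: n=%d comm=%s state=%c ppid=%d\n", n, comm, state, ppid);
                          return 0;
                      }

                      The single-word case returns 4 and looks fine (modulo the parentheses, which such code usually strips); the spaced case returns 3, with comm cut at the space and state picking up a letter from the name. Code that never checks the return value never notices - that is the silent failure.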

                      This issue can be simply avoided by not allowing whitespace in executable names, and by reporting such occurrences as a bug.

                      1. 8

                        This issue can be simply avoided by not allowing whitespace in executable names, and by reporting such occurrences as a bug

                        Ahhh, the Systemd approach to input validation!

                        Seriously, if the system allows running executables with whitespace in their names, and your program is meant to work with such a system, then it needs to work with executables with whitespace in their names.

                        I agree somewhat with the OP - the interface is badly thought out. But it’s a general problem: trying to pass structured data between kernel and userspace in plain-text format is, IMO, a bad idea. (I’d rather have a binary format. You have the length of the string encoded in 4 bytes, then the string itself. Simple, easy to deal with. No weird corner cases; see the sketch below.)
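
                        A sketch of that length-prefixed framing (purely illustrative; this is not any existing kernel interface):

                        #include <stdint.h>
                        #include <stdio.h>
                        #include <string.h>

                        /* Encode one field as a 4-byte little-endian length followed by
                           the raw bytes. A name containing spaces, ')' or '\n' needs no
                           escaping at all. */
                        static size_t put_field(unsigned char *out, const char *data, uint32_t len) {
                            out[0] = len; out[1] = len >> 8; out[2] = len >> 16; out[3] = len >> 24;
                            memcpy(out + 4, data, len);
                            return 4 + len;
                        }

                        int main(void) {
                            unsigned char buf[64];
                            const char comm[] = "tmux: client";
                            size_t total = put_field(buf, comm, sizeof comm - 1);

                            /* Decoding: read 4 length bytes, then exactly that many bytes. */
                            uint32_t len = buf[0] | buf[1] << 8 | (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
                            printf("len=%u field=%.*s total=%zu\n", len, (int)len, (const char *)buf + 4, total);
                            return 0;
                        }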

                        1. 1

                          I agree it’s a bug.

                          However, there’s a strong convention that executables do not have whitespace in them, at least in Linux/Unix.[1]

                          If you don’t adhere to this convention, and you stumble across a consequence of this, does that mean a format that’s been around as long as the proc system is literally broken? That’s where I reacted.

                          As far as I know, nothing crashes when you start an executable with whitespace in it. The proc filesystem isn’t corrupted.

                          One part of it is slightly harder to parse using C.

                          That’s my take, I’m happy to be enlightened further.

                          I also agree that exposing these kinds of structures as plain text is arguably … optimistic, and prone to edge cases. (By the way, isn’t one of the criticisms of systemd that it has an internal binary format?).

                          [1] note I’m just going from personal observation here, it’s possible there’s a subset of Linux applications that are perfectly fine with whitespace in the executable name.

                          1. 3

                            I agree with most of what you just said, but I myself didn’t take “broken” to mean anything beyond “has a problem due to lack of forethought”. Maybe I’m just getting used to people exaggerating complaints (heck I’m surely guilty of it myself from time to time).

                            It’s true that we basically never see executables with a space (or various other characters) in their names, but it can be pretty frustrating when tools stop working or don’t work properly when something slightly unusual happens. I could easily see a new-to-Linux person creating just such an executable because they “didn’t know better” and suffering as a result because other programs on their system don’t correctly handle it. In the worst case, this sort of problem (though not necessarily this exact problem) can lead to security issues.

                            Yes, it’s possible to correctly handle /proc/xxx/stat in the presence of executables with spaces in the name, but it’s almost certain that some programs are going to come into existence which don’t do so correctly. The format actually lends itself to this mistake - and that’s what’s “broken” about it. That’s my take, anyway.

                            1. 2

                              Thanks for this thoughtful response. I believe you and I are in agreement.

                              Looking at this from a slightly more usual perspective, how does the Linux system handle executables with (non-whitespace) Unicode characters?

                              1. 3

                                Well, I’m no expert on unicode, but I believe for the most part Linux (the kernel) treats filenames as strings of bytes, not strings of characters. The difference is subtle - unless you happen to be writing text in a language that uses characters not found in the ASCII range. However, UTF-8 encoding will (I think) never cause any bytes in the ASCII range (0-127) to appear as part of a multi-byte encoded character, so you can’t get spurious spaces or newlines or other control characters even if you treat UTF-8 encoded text as ASCII. For that reason, it poses less of a problem for things like /proc/xxx/stat and the like.

                                Of course filenames being byte sequences comes with its own set of problems, including that it’s hard to know which encoding should be used to display filenames (I believe many command line tools use the locale’s default encoding, and that’s nearly always UTF-8 these days) and that a filename may contain an invalid encoding. Then of course there’s the fact that Unicode has multiple ways of encoding the exact same text, so in theory you could get two “identical” filenames in one directory (different byte sequences, same character sequence, or at least same visible representation). Unicode seems like a big mess to me, but I guess the problem it’s trying to solve is not an easy one.

                                (Minor edit: UTF-8 doesn’t allow 0-127 as part of a multi-byte encoded character. Of course those values can appear as regular characters, equivalent to ASCII.)

                                1. 1
                                  ~ ❯ cd .local/bin
                                  ~/.l/bin ❯ cat > ą << EOF
                                  > #!/usr/bin/env sh
                                  > echo ą
                                  > EOF
                                  ~/.l/bin ❯ chmod +x ą 
                                  ~/.l/bin ❯ ./ą
                                  ą
                                  
                              2. 2

                                If you don’t adhere to this convention, and you stumble across a consequence to this, does this mean that a format that’s been around as long as the proc system is literally broken?

                                Yes; the proc system’s format has been broken (well, misleadingly-documented) the whole time.

                                As you note, using pure text to represent this is a problem. I don’t recommend an internal, poorly-documented binary format either: canonical S-expressions have a textual representation but can still contain binary data:

                                (this is a canonical s-expression)
                                (so "is this")
                                (and so |aXMgdGhpcw==|)
                                

                                An example stat might be:

                                (stat
                                  (pid 123456)
                                  (command "evil\nls")
                                  (state running)
                                  (ppid 123455)
                                  (pgrp 6)
                                  (session 1)
                                  (tty 2 3)
                                  (flags 4567)
                                  (min-fault 16)
                                  …)
                                

                                Or, if you really cared about concision:

                                (12345 "evil\nls" R 123455 6 1 16361 4567 16 …)
                                
                            2. 3

                              nobody is parsing that file from the end

                              As an example, the Python Prometheus client library uses this file and allows for this.

                        1. 8

                          The core statement (in my view):

                          When it comes to strategy, planning and so forth, they’ll go to the suit. Even though the engineer might be much better suited to discuss these topics.

                          So the problem is the estimation

                          P(Strategy | Suit) > P(Strategy | Tech)
                          

                          Humans are pretty good at picking up on regularities, so assuming that this is indeed a consistent pattern, there must be a reason for this judgement.

                          • Assuming that most strategy people do wear suits, I’d guess that P(Suit | Strategy) is pretty large.
                          • Assuming that most strategy people are not tech guys, I’d guess that P(Tech | Strategy) is pretty small.

                          i.e.

                          P(Suit | Strategy) >>  P(Tech | Strategy)
                          

                          then via Bayes:

                          • P(Strategy | Suit) = P(Suit | Strategy) P(Strategy) / P(Suit)
                          • P(Strategy | Tech) = P(Tech | Strategy) P(Strategy) / P(Tech)

                          Now, assuming that there are fewer managers than engineers, i.e.

                          P(Suit) <= P(Tech)
                          

                          we immediately get the “problematic” estimation above:

                              P(Strategy | Suit) 
                          =   P(Suit | Strategy) P(Strategy) / P(Suit) 
                          >   P(Tech | Strategy) P(Strategy) / P(Suit) 
                          >=  P(Tech | Strategy) P(Strategy) / P(Tech) 
                          =   P(Strategy | Tech)
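
                          A quick sanity check with made-up numbers (purely illustrative):

                          P(Suit | Strategy) = 0.8,  P(Tech | Strategy) = 0.2
                          P(Strategy) = 0.1,  P(Suit) = 0.2,  P(Tech) = 0.8

                          P(Strategy | Suit) = 0.8 * 0.1 / 0.2 = 0.4
                          P(Strategy | Tech) = 0.2 * 0.1 / 0.8 = 0.025

                          With these numbers the suit is judged sixteen times more likely to be the strategy person, purely from the base rates.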
                          

                          In this case, we could say that:

                          • There’s not enough engineers && suits
                          • There’s too many suits doing strategy
                          • There’s not enough engineers doing strategy
                          • There should be more suits in tech companies (???)

                          Modulo sloppy reasoning and faulty assumptions :)

                          1. -4

                            This is your first comment?

                          1. 3

                            Go measure the cost of real threads on a modern Linux kernel (see the sketch below), compare to Golang, and rethink some of the beliefs you have been parroting from Golang dogma. It’s an antipattern to spin up 10k goroutines anyway.

                            Erlang has a much better situation, but still, once you’re getting into the realm of high-throughput you shouldn’t be erlanging.

                            Real threads are cheap enough for 99.9% of workloads, and incur a lot less CPU overhead for steady state execution under high concurrency. Don’t drink the kool-aid.
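
                            A minimal sketch of that measurement, assuming Linux with pthreads (results vary with kernel, libc and hardware; compile with -pthread):

                            #include <pthread.h>
                            #include <stdio.h>
                            #include <time.h>

                            #define N 10000   /* matches the "10k goroutines" figure above */

                            static void *worker(void *arg) { return arg; }  /* no-op thread body */

                            int main(void) {
                                static pthread_t tids[N];
                                struct timespec t0, t1;

                                clock_gettime(CLOCK_MONOTONIC, &t0);
                                for (int i = 0; i < N; i++)
                                    if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
                                        return 1;
                                for (int i = 0; i < N; i++)
                                    pthread_join(tids[i], NULL);
                                clock_gettime(CLOCK_MONOTONIC, &t1);

                                double ms = (t1.tv_sec - t0.tv_sec) * 1e3
                                          + (t1.tv_nsec - t0.tv_nsec) / 1e6;
                                printf("%d threads created+joined in %.1f ms (%.2f us/thread)\n",
                                       N, ms, ms * 1000.0 / N);
                                return 0;
                            }

                            The point is not any specific number; it’s that the cost is easy to measure directly before repeating claims in either direction.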

                            1. 3

                              This is something I often see people ranting about in absolute terms, which drives me crazy, although I have observed that high-frequency thread creation/churn can be an issue for truly CPU-bound workloads. I would say most people’s workloads are largely idle or IO-bound; having thousands of largely unused threads is no big deal.

                              I wish I had a nickel for every compute instance with a daily max CPU usage of less than 60%.

                            1. 1

                              This is a very exciting project (even if it is in Erlang), and what a wonderful README!

                              1. 5

                                (even if it is in erlang)

                                Why such a hater?!

                                1. 2

                                  I’m not! But I do think that Erlang has a high barrier to entry for potential contributors. It’s a great, quirky language, and I’m excited to test Minuteman and learn more. =)

                                  1. 1

                                    You might be right, but I can’t think of a better language to have done Lashup in. In retrospect, I might rather have done Minuteman in C, but it’s not like C is much friendlier to contributors. :(

                              1. 6

                                Well done!

                                1. 6

                                  This article is making me really nervous but I know I don’t have the distributed chops to prove it wrong.

                                  I’ll say this: when an author starts talking about probabilistic interpretations of the theorem and going on about cost-benefit analysis (seriously, why are we worked up about poor “administration interfaces” here?!), my BS needle starts twitching. And when they do that while an impossibility proof exists showing that element availability and atomic consistency are not both possible, it starts swinging around madly.

                                  The article reads like an awful lot of language lawyering around fairly well understood concepts, but I’m not sure what the motivations of the author are.

                                  1. 6

                                    Heh… Sigh. It reads like an attempt to illuminate, but a bad one. It would seem worthwhile if it were shorter and clearer; I don’t think the concepts are actually all that well understood, unfortunately. At a previous job, after two months of arguing that Riak was the wrong choice for the company, I finally got through:

                                    Me: “What exactly is the benefit you’re hoping for from using a distributed technology? Uptime? Preventing data loss?”

                                    Them: “Yes, both of those.”

                                    Me: “Those are mutually-exclusive in our situation.”

                                    Them: “Oh… Maybe something else would be okay.”

                                    (And no, they aren’t inherently mutually exclusive, but the data was peculiar and merging later, after resolving a partition, wasn’t an option. I can’t go into it.)

                                    I definitely don’t want that to be read as an insult to the intelligence of the person involved; they were quite competent. It’s just that databases are a subject not all engineers actually know very much about, and distributed ones are a rather new technology in the scheme of things.

                                    It’s worth noting that not all distributed systems are databases, too, of course!

                                    1. 5

                                      That’s not what the impossibility proof says–he references that paper.

                                      “In 2002, Seth Gilbert and Nancy Lynch publish the CAP proof. CA exists, and is described as acceptable for systems running on LAN.”

                                      “If there are no partitions, it is clearly possible to provide atomic, available data. In fact, the centralized algorithm described in Section 3.2.1 meets these requirements. Systems that run on intranets and LANs are an example of these types of algorithms” [0]

                                      I don’t think CAP is very well understood. I think folks end up very confused about what consistent means, and what partition-tolerant means.

                                      I think this is pretty well researched. I’m not sure why cost-benefit analysis makes you nervous.

                                      1. 4

                                        James Hamilton of AWS says it best, I think:

                                        Mike also notes that network partitions are fairly rare. I could quibble a bit on this one. Network partitions should be rare but net gear continues to cause more issues than it should. Networking configuration errors, black holes, dropped packets, and brownouts, remain a popular discussion point in post mortems industry-wide.

                                        Gilbert & Lynch’s implicit assertion is that LANs are reliable and partition free; I can buy this in theory but does this happen in practice? When Microsoft performed a large analysis of failures in their data centers, they found frequent loss occurring that was only partially mitigated by network redundancy.

                                        But either way you make a fair point: CA models aren’t strictly precluded by that proof. I’m just not certain I’ve seen a network that is trustworthy enough to preclude partitions.

                                        1. 5

                                          Network partitions are not even remotely rare, honestly. LANs are actually worse culprits than the Internet, but both do happen.

                                          You already cited one of the better sources for it, but mostly I believe this because I’ve been told it by network engineers who I respect a lot.

                                          1. 6

                                            Even if network partitions were rare, I’ll tell you what aren’t (for most people): garbage collections. What I did not like about this post is that, over and over again, it just talks about network partitions and the actual networking hardware. But weird application-specific things happen as well that make a node appear unresponsive for longer than some timeout value, and these are part of the ‘P’ as well.

                                            In reality, I think CAP is too cute to go away but not actually adequate for talking about these things in detail. PACELC makes the trade-offs much clearer.

                                            1. 4

                                              LANs are actually worse culprits than the Internet

                                              Funny you mention that: over the past few days I’ve been fighting an issue with our internal network that has resulted in massive packet loss internally (>50% loss in some spikes), and ~0.5% to the Internet. That’s probably why this article raised my eyebrows - it’s my personal bugbear for the week.

                                              The culprit seems to have been a software update to a Palo Alto device that stopped playing nice with certain Cisco switches… plug the two of them together and mumble mumble spanning tree mumble loops mumble. The network guys start talking and my eyes glaze over. But all I know is that I’ve learned the hard way to not trust the network - and when a proof exists that the network must be reliable in order to have CA systems, well…

                                              1. 1

                                                Heh - my sympathies.

                                          2. 3

                                            I think some of the confusion comes from describing all node failures as network partitions. In reality, “true” network partitions are rare enough (at least ones lasting long enough to matter to humans), but nodes failing due to hardware failure, operational mistakes, non-uniform utilization across the system, and faulty software deploys are sometimes overlooked in this context.

                                            I like the comment above (“It’s worth noting that not all distributed systems are databases, too, of course!”), but I think this is also a matter of perspective. Most useful systems contain state; isn’t twitter.com as a network service a distributed database? Kind of neat to think about.

                                          3. 3

                                          It’s not clear to me that the distinction the author makes between a CA and a CP system exists. He uses ZooKeeper as an example of a CP system, but the minority side of a network partition in ZooKeeper cannot make progress, just like his CA example. In reality, CP seems to me to be a matter of degree, not a boolean. Why does a CP system that handles 0 failures have to be different from one that handles 2f-1?

                                            1. 1

                                              When the system availability is zero (not available at all) after a partition, you can claim both CP and CA (that’s the overlap between CP/CA).

                                              There are two corner cases when the system is not available at all:

                                              • the system does not even restart after the partition. You can claim CP theoretically. The proof’s definitions don’t prevent this formally. But it makes little sense in practice.

                                              • the system restarts after the partition and remains consistent. Both CP and CA are ok.

                                            But these corner cases do not concern ZooKeeper, because it is partly available during the partition.

                                              1. 9

                                                No, you can’t: a system which is not available during a partition does not satisfy A, and cannot be called CA. If you could claim both CA and CP you would have disproved CAP.

                                                1. 2

                                                CA means: I have a magical network without partitions. If my network turns out not to be that magical, I will be CP/AP, and more likely in a very bad state: not fully available and not fully consistent.

                                                  1. 7

                                                    I’m responding to “When the system availability is zero (not available at all) after a partition, you can claim both CP and CA”. Please re-read Gilbert & Lynch’s definition of A: you cannot claim CA if you refuse to satisfy requests during a partition.

                                                    1. 3

                                                      But those magic networks do not exist, so how can a CA system exist?

                                                      1. 1

                                                      :-) It exists until there is a partition. Then the most probable way out is to manually restore the system state, 2PC with heuristic resolution being an example.

                                                      Or, if you build a system for machine learning: 20 nodes with GPUs, 2 days of calculation per run. If there is a network partition during these two days, you throw away the work in progress, fix the partition, and start the calculation process again. I don’t see myself waiting for the implementation/testing of partition tolerance for such a system. I will put it in production even if I know that a network partition will break it apart.

                                                        1. 2

                                                          That system is still CP. You are tolerating the notion of partitions, and in the case of a partition you sacrifice A (fail to fulfill a request–a job in this case) and restart the entire system for the sake of C.

                                                          1. 1

                                                            It exists until there is a partition.

                                                            If a system reacts to a partition by sacrificing availability - as it must, and you haven’t demonstrated differently - how can you claim it is CA?

                                                            If there is a network partition during these two days you throw away the work in progress, fix the partition and start the calculation process again. I don’t see myself waiting for the implementation/testing of partition tolerance for such a system. I will put it in production even if I know that a network partition will break it apart.

                                                            I feel like I’m in bizarro world.

                                                            1. 1

                                                              If a system reacts to a partition by sacrificing availability - as it must, and you haven’t demonstrated differently - how can you claim it is CA?

                                                          If the system sacrifices availability (it could also be consistency, or both), then there is an overlap between CA and CP. That’s what Daniel Abadi said 5 years ago: “What does “not tolerant” mean? In practice, it means that they lose availability if there is a partition. Hence CP and CA are essentially identical.”

                                                          The key point is that forfeiting partition tolerance does not mean partitions won’t happen. To quote Brewer (in 2012): “CA should mean that the probability of a partition is far less than that of other systemic failures”

                                                          That’s why there is an overlap. I can choose CA if the probability of a partition is far less than that of other systemic failures, but I could still have a partition. And if I have a partition, I will be either not consistent, or not available, or both, and I may also have broken some of my system invariants.

                                                              I’m sure it does not help you as I’m just repeating my post, and this part is only a repetition of something that was said previously by others :-(

                                                          Let me try differently: maybe the issue in understanding this is that you have:

                                                              • CAP as a theorem: you have to choose between consistency and availability during a partition. There are 3 options here:

                                                                • full consistency (the CP category)

                                                                • full availability (the AP category)

                                                            • not consistent and only partially available (not one of the CAP categories, but possible in practice; typically 2PC with heuristic resolutions: all cross-partition operations will fail).

                                                          • CAP as a classification tool with 3 options: AP/CP/CA. These are descriptions of the system. CA means you forfeited partition tolerance, i.e. a partition is a major issue for the system you build.

                                                          And, in case there is any doubt: most systems should not forfeit partition tolerance. I always mention 2PC/heuristic because it is a production-proven exception.

                                                    2. 1

                                                      Could you rephrase your statement? I am having trouble parsing what you have said.

                                                      1. 1

                                                        The CR went away. Let me edit.

                                                      2. 1

                                                        If we take your second case - as it’s the only real case worth discussing, as you note :-) - how can you claim the system is available?

                                                        The system is CA under a clean network until time n, when the network partitions. The partition clears up at time m. So from [1, n) and (m, inf) the system is CA, but from [n, m] it is unavailable. Can we really say the system maintains availability? That feels odd to me.

                                                        Maybe it makes more sense to discuss this in terms of PACELC - a system in your second case has PC behavior; in the presence of a partition it’s better to die hard than give a potentially inconsistent answer.

                                                        Having said all of this, my distributed systems skills are far below those of the commentators here, so please point out any obvious missteps.

                                                        1. 1

                                                          CA is forfeiting partition tolerance (that’s how it was described by Eric Brewer in 2000). So if a partition occurs, it’s out of the operating range: you may forfeit consistency and/or availability. It’s an easy way out of the partition tolerance debate ;-). But an honest one: it clearly says that the network is critical for the system.

                                                          Maybe it makes more sense to discuss this in terms of PACELC - a system in your second case has PC behavior;

                                                          Yep, it works. Daniel Abadi resolved the overlap by merging CA and CP (“What does “not tolerant” mean? In practice, it means that they lose availability if there is a partition. Hence CP and CA are essentially identical.”). It’s not totally true (a CA system can lose its consistency if there is a partition, like 2PC with heuristic resolutions), but it’s a totally valid choice. If you make the same choice as Daniel in CAP, you choose CP for system 2 above. CA says “take care of your network and read the documentation before it is too late”.

                                                    3. 3

                                                      seriously, why are we worked up about poor “administration interfaces” here

                                              :-) Because I’ve seen a lot of systems where the downtime/data corruption was caused mainly by: 1) software bugs, 2) human errors.

                                              I also think that a lot of people take partition tolerance for granted (i.e. “this system is widely deployed in production, so it is partition tolerant, as I’m sure everybody has network issues all the time, so I can deploy it safely myself without thinking too much about the network”). Many systems are not partition tolerant (whatever they claim). That’s why Aphyr’s tests crash them (data loss, lost counters, …), even though they are deployed in production.

                                              It does not mean they have no value. It’s a matter of priority. See Aphyr’s post on ES: IMHO they should plan for partition tolerance and immediately implement crash tolerance, for example, instead of trying to do both at the same time.

                                              I prefer a true “secure your network” to a false “of course we’re partition tolerant, CAP says anything else is impossible” (with extra points for “we’re not consistent so we’re available”).

                                                      1. 3

                                                        CAP tells you that you can’t have both C and A when a partition happens. Most people take that to mean you must choose one or the other and have a CP or AP system. But it’s worth remembering that you do have the option of making sure that partitions never[1] happen - either by making the system non-distributed or by making the communications reliable enough. And for some use cases that might be the correct approach.

                                                        [1] In a probabilistic sense - you can’t ensure that a network partition never happens, but nor can you ensure that you won’t lose all the nodes of your distributed system simultaneously. Any system will have an acceptable level of risk of total failure; it’s possible to lower the probability of a network partition to the point where “any network partition is a total system failure” is an acceptable risk.

                                                        1. 2

                                                  I think it’s important to modify your statement a bit. What you have to do is ensure that in the face of a partition you remain consistent, then try your darnedest to reduce the frequency of partitions. The distinction being: you have control over what happens during a partition, but not over a partition happening.

                                                          1. 4

                                                            you have control over what happens during a partition but not control over a partition happening.

                                                    I don’t think that this sharp distinction exists. You don’t have absolute control over what happens during a partition - to take an extreme example, the CPU you’re running on might have a microcode bug that means it executes different instructions from the ones you intended. And you do have control - to the extent that you have control of anything - over the things that cause network partitions; you can construct your network (or pay people to construct your network) so as to mitigate the risks. It is absolutely possible to construct a network which won’t suffer partitions (or rather, in which partitions are less likely than simultaneous hardware failures on your nodes) if you’re willing to spend enough money to do so (this is rarely a smart choice, but it could be).

                                                            1. 2

                                                      I do not think Byzantine faults really matter for this discussion; they are a whole other issue from partitions. But I do not think your response invalidates my point at all. Partitions are something that happens to you; how your program handles them is something you do.

                                                      1. 3

                                                        Building an etcd Mesos framework so that it’s easier to run HA Kubernetes clusters on top of Mesos (or anything else that has decided to use etcd over ZK or the Mesos replicated log). I’ve got it handling failover of the scheduler or the Mesos master, and seamless recovery when up to (N-1)/2 etcd instances fail. etcd mutates its membership by invoking Raft itself, which is why it’s a little more interesting to recover from more than (N-1)/2 failures. Next I’ll be adding periodic backups to HDFS/S3/etc… so that the cluster can be restored in catastrophic scenarios. If up to N-1 failures occur, it will be able to just dump the current snapshot and spawn a fresh cluster using that as the seed.

                                                        1. 1

                                                          Is it possible to run Mesos under etcd instead of ZK yet? The JIRA looks like it’s still open, but I would be willing to try a branch if it’s something that’s semi-functional.

                                                          1. 1

                                                            It’s in progress currently, but I’m not sure how far along it is. There is also a desire to make Mesos self-sufficient by handling its own leader election, but ZK is going to be the most reliable option for the time being.

                                                        1. 4

                                                          Saying that anything needs to die in a fire is a great way to not have your argument taken seriously.

                                                          1. 4

                                                            Our Storm 0.9.2 + Kafka 0.7 + Cassandra 2.1.0-rc5 + Elasticsearch 1.3 cluster is now up and running in production, handling around 3,000 web traffic requests per second. Time to test it in more detail and make it fast!

                                                            1. 1

                                                              Are these 3,000 real requests per second, or is that just in benchmarks?

                                                              1. 1

                                                                Real requests per second.

                                                              2. 1

                                                                Why deploy Kafka 0.7 instead of 0.8.1?

                                                                1. 1

                                                                  We want to upgrade to 0.8.1, but we currently use a Python driver we wrote for 0.7 and are in the midst of merging its functionality into an open-source driver for 0.8.1.

                                                              1. 2

                                                                What are you creating another Lobsters clone for?

                                                                1. 11

                                                                  I’d assume they want to use it for something other than tech news.

                                                                  1. 4

                                                                    I, for example, have launched a Lobsters clone for Russian developers: https://develop.re/

                                                                    1. 2

                                                                      That’s a great domain name.

                                                                    2. 3

                                                                      We have a small dev community in Hawaii, and we deployed our own Lobsters clone because we wanted a place to share and discuss links/events. We also wanted something we could customize a bit (which ruled out Reddit; Reddit is also generally too public for the type of discussion we wanted), so rather than NIH-ing another link-sharing site we started with the Lobsters codebase.