1. 4

    Congrats on the job! One thing I’ve found helpful is profiling the codebase under normal load. Specifically I’ve used flamegraphs, which give you a quick visualization of the call stacks of the most frequently called and longest-running methods. I’ve used this information to prune through most of the codebase (focusing on what’s running 90% of the time, so to speak), and to understand the overarching structure of things. I’ve also tried starting from func main and reading source files top to bottom (ineffectively, I should add), motivated by the same reasons, but I’ve since settled on just profiling.

    Additionally, I’ve found tests to be a good place to start probing: understanding the test setup phase (if any) has helped me understand the structure/dependencies, and it’s a quick way to get your feet wet (by muddling around, breaking the tests).
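
    For what it’s worth, here is a minimal sketch of collecting a CPU profile, assuming a Go codebase. `busyWork` and `profileCPU` are hypothetical names made up for illustration; only the `runtime/pprof` calls are real standard-library API.

    ```go
    package main

    import (
    	"bytes"
    	"fmt"
    	"runtime/pprof"
    )

    // busyWork is a hypothetical stand-in for the code path you want to profile.
    func busyWork(n int) int {
    	sum := 0
    	for i := 0; i < n; i++ {
    		sum += i % 7
    	}
    	return sum
    }

    // profileCPU runs fn while the CPU profiler is active and returns the
    // raw profile bytes (the format that `go tool pprof` reads).
    func profileCPU(fn func()) ([]byte, error) {
    	var buf bytes.Buffer
    	if err := pprof.StartCPUProfile(&buf); err != nil {
    		return nil, err
    	}
    	fn()
    	pprof.StopCPUProfile() // flushes the profile into buf
    	return buf.Bytes(), nil
    }

    func main() {
    	prof, err := profileCPU(func() { _ = busyWork(50_000_000) })
    	if err != nil {
    		fmt.Println("could not start CPU profile:", err)
    		return
    	}
    	fmt.Printf("collected %d bytes of CPU profile\n", len(prof))
    }
    ```

    In practice you would write the bytes to a file and open it with `go tool pprof -http=:8080 <file>`; the web UI it serves includes a flame graph view.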

    1. 10

      I was recently working with coreos/etcd’s Raft implementation, specifically its PreVote feature, which is alluded to in the Raft dissertation. I convinced myself that specifying PreVote in TLA+ would help me better understand it (etcd’s PreVote implementation has had some issues as of late), which I did here: irfansharif/raft.tla.

      Some other resources that came in handy learning TLA+ then were Leslie Lamport’s video series on the subject and Microsoft Research’s Dr. TLA+ series.

      1. 3

        I convinced myself that specifying PreVote in TLA+ would help me better understand it

        How much did it help?

        1. 5

          Quite a bit. I was sure to familiarize myself with the original Raft TLA+ spec (Appendix B.2 from the thesis paper), which itself was a very succinct representation of the possible state changes in vanilla Raft. Also note that what I mentioned above is about understanding raw TLA+; the post linked covers PlusCal, which adds a C-like pseudo-code interface that is essentially transpiled to TLA+ – not something I covered.

          EDIT: welp, didn’t realize I was responding to the author of the post.

          1. 6

            EDIT: welp, didn’t realize I was responding to the author of the post.

            Hah, no worries! I’m just glad to get all of the info I can about how people use TLA+. I mostly pitch it as a way to find bugs in a spec, but just as beneficial is using it to understand a spec. It’s a much more subtle bonus, but writing a spec means not being able to handwave or gloss over what you actually want to happen. The guide is something I really care about, and part of caring means constantly revising it to make it more useful to readers.

            the post linked covers PlusCal, which adds a C-like pseudo-code interface that is essentially transpiled to TLA+ – not something I covered.

            PlusCal is sort of an interesting case. I chose to focus on it because I think it’s easier to learn and is “good enough” for a lot of common use cases. But it’s definitely less powerful than raw TLA+. For example, in PlusCal you can’t simulate a system reset by forcing all processes to simultaneously go back to their starting state, or have one process create a second at runtime. I think that if TLA+ ever becomes popular, most people will be working in DSLs like PlusCal – C to TLA+’s assembly language.

      1. 7

        One can imagine a scenario where one of the nodes has a latency higher than its election timeout, causing it to continuously start elections

        I have used Raft in production and can confirm this is a real thing that happens. Here’s an issue on the etcd repo discussing this problem: https://github.com/coreos/etcd/issues/7970. Basically: “a 5-node cluster can function correctly if two nodes are down. But it won’t work if one node has slow disk.” Which is a weird failure mode!

        also I found this post valuable, I’d never read the raft website/paper but this appeared in my RSS reader and it caused me to actually learn more about how Raft works!

        1. 3

          Using Raft’s PreVote extension should alleviate this issue, though etcd’s current PreVote implementation could be more stable. See https://github.com/coreos/etcd/issues/8501, https://github.com/coreos/etcd/pull/8517, https://github.com/coreos/etcd/pull/8288 and https://github.com/coreos/etcd/pull/8334.
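
          To make the idea concrete, here is a rough sketch of the PreVote check as the Raft dissertation describes it: a would-be candidate only bumps its term if a majority says it would grant its vote, and peers refuse while they are still hearing from a leader. This is not etcd’s code. The `node` type and every function name are hypothetical, and the log comparison is simplified to the last index only.

          ```go
          package main

          import "fmt"

          // quorum returns the votes needed for a majority in a cluster of n nodes.
          func quorum(n int) int { return n/2 + 1 }

          // node is a hypothetical, heavily simplified view of a Raft peer.
          type node struct {
          	term            int
          	lastLogIndex    int
          	heardFromLeader bool // heartbeat seen within the election timeout
          }

          // wouldGrant reports whether n would vote for the candidate.
          // It never mutates n: a pre-vote must not disturb anyone's term.
          func (n node) wouldGrant(candTerm, candLastLogIndex int) bool {
          	return !n.heardFromLeader &&
          		candTerm >= n.term &&
          		candLastLogIndex >= n.lastLogIndex // simplification: index only
          }

          // preVotePasses reports whether cand may start a real election,
          // i.e. whether a majority (counting itself) would grant a vote
          // for the term it would move to.
          func preVotePasses(cand node, peers []node) bool {
          	votes := 1 // the candidate votes for itself
          	for _, p := range peers {
          		if p.wouldGrant(cand.term+1, cand.lastLogIndex) {
          			votes++
          		}
          	}
          	return votes >= quorum(len(peers)+1)
          }

          func main() {
          	// One slow node in a 5-node cluster whose other members are healthy.
          	slow := node{term: 4, lastLogIndex: 10}
          	healthy := []node{
          		{term: 4, lastLogIndex: 12, heardFromLeader: true},
          		{term: 4, lastLogIndex: 12, heardFromLeader: true},
          		{term: 4, lastLogIndex: 12, heardFromLeader: true},
          		{term: 4, lastLogIndex: 12, heardFromLeader: true},
          	}
          	fmt.Println("quorum for 5 nodes:", quorum(5))
          	fmt.Println("slow node may start an election:", preVotePasses(slow, healthy))
          }
          ```

          Because the slow node never gets past the pre-vote, its term stays put and it cannot force the healthy leader to step down.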

        1. 0

          release for software announcements. :)

          we are happy to additionally announce our series B raise of $27M

          That seems like a lot to crawl out from under…

          Also, @irfansharif please avoid dropping marketing here when you haven’t participated in our community.

          It’s tacky.

          1. 13

            I don’t see a problem with this, I was very happy to see this posted, I don’t care who posted it. We already have the upvote system to filter content.

            1. 1

              That’s cool you don’t see a problem with this. I do.

              We’ve had problems with this in the past, and once you normalize the behavior it tends to get worse and worse.

              Imagine you were at a party and a person who never talked to any of the other guests came up and started telling you about a time share–hell, assume you were even in the market for a time share.

              Would you want to keep going to parties at a place where it became known that that was a good place to get pitched on things?

              Tech culture is rotten enough with the hustle and advertising of spurious (if sometimes useful) products as it is, we need not foster it here.

              Had the post been the intern’s experience, a reflection on things learned while working at CockroachDB, a review of what makes it good compared to other databases, or whatever else, and had not been the first thing submitted by the user who has no previous history of interacting with the community here (no comments, no stories, nada), I wouldn’t have complained.

              Hell, we’ve had a lot of previous submissions about CockroachDB.

              But we’ve also had multiple people decide to drop marketing spam here.

              1. 2

                Time share? Terrible example of something with no value to readers here. Let’s do one that actually compares. So, these two guys make this thing called UNIX that delivers a never-before-seen combo of features that wows people. Anyone that can use it is using it but it’s proprietary with a big company that’s not acting in users’ interests or might not later. Then, some group creates a FOSS derivative of that tool called BSD. Those capabilities in high demand and that are technically interesting are now available as FOSS. They drop it on Lobsters as their first submission. Except, in this case, it’s even harder to derive as they had to invent new, hard concepts rather than a port of existing ones.

                You’d tell them to take their marketing elsewhere whereas I’d say “Holy shit, there’s a free version of something that made me say ‘Holy Shit’ before! It might even benefit a lot of admins or db fans here after bugs get shaken out.” Looking at prior submissions, quite a few were technical pieces that got significant upvotes because readers enjoyed and learned from them. There were some talking betas, yet another article on scaling in Go, they had some problems, etc. where I agree we don’t need it here. The votes agreed, too. Hell, you normally comment “maybe add a release tag” on FOSS project submissions with unknown significance but “no marketing” on a Spanner competitor? A little inconsistent. So, if we’re talking about marketing, I think an exception should be made for a 1.0 release of FOSS solving a hard problem with interesting internals possibly worth copying in other apps in that domain. Updates on product development, feature comparisons, and other marketing fluff should stay on other forums as you’ve said before.

                Note: And this is coming from the guy that was Public Enemy No 1 on CockroachDB threads at HN for slamming the shit out of them on things like no stability focus early on. No bias here except in favor of rewarding hard, useful, and free tech.

                1. 1

                  Again, those other submissions were by established community members, weren’t just a product roadmap, and weren’t bragging about a Series B round in their opening paragraphs.

                  As for your BSD example, I wouldn’t have downvoted that since it came from an academic institution and not an intern of a tech startup shilling as their first (and heretofore only) community contribution. ;)

                  1. 3

                    The funding is actually extremely relevant to me because it means I have more reason to trust it won’t be abandoned. It is no secret that database companies are notorious for failure, and CockroachDB has had some problems with stability. The last thing I want to do is use something guaranteed to fail due to lack of funding.

            2. 11

              right, long time observer but first time posting. didn’t realize this would make for a bad first post but mentioned my affiliation up above.

              1. -4

                Regrettably, lurking and observing is no substitute for participation. :)

              2. 6

                I have mixed feelings about it. Spanner/F1 is possibly the best DB ever invented in terms of consistency, geo, speed, and cost. Only two competitors were on my radar even attempting to replace it. Apple acqui-killed one (FoundationDB). The remaining one hitting 1.0 with a FOSS release is newsworthy to a technical audience. It means it’s time to play and peer review.

                At least for DB people. I’m hands off on it for now except for assessing assurance methods from the technical posts.

                Note to all: I’m interested in any other projects you know with serious talent and funding trying to compete with F1 RDBMS.