1. 7

    I wrote void for similar reasons, and it has totally changed the way I learn things, approach problems, and track progress. I really recommend people learn how to build their own organizers. Everyone hates the organizers that someone else wrote to some extent, but if it’s your own baby it’s hard to hate :P Use tools you love.

    1. 2

      I totally agree. I love plain text files. And I only trust and like my own dogfood.

      http://www.sistemasoperativos.org/2012/12/17/mi-todo_list-y-sobre-todo-mi-done_list.html (Spanish)

    1. 28

      I guess the author hasn’t worked on large code bases. Tests avoid regressions without needing to analyse the impact on the whole codebase.

      One of the points of TDD is in fact that it encourages small interfaces. Even in cases where the interface is not stable, having a suite of tests that captures the expected scenarios and behaviors allows developers to make changes (including during initial development) and know the impact on existing functionality.

      There are cases where manual testing makes more sense, but I’ve found those to be the exception rather than the rule: generally scripts of no more than 1000 lines with a single well-defined purpose. Of course those can then be integration tested, and also manually tested as a single unit.

      1. 14

        The post isn’t “don’t test,” it’s “mostly avoid unit tests.”

        I’m kinda inclined to agree with the author here, though I think it really depends on what sort of software you’re working on. There are projects that inevitably have lots and lots of easily unit-testable interfaces. And there are projects that are inherently very stateful, making it difficult to unit test without lots of mocking. You can still do integration (or black box, or whatever) testing.

        I’m kinda in this boat at work, with a highly stateful system. There are a few things you could break out into a rather easily testable API that takes an input and outputs a result that you can check, but these are generally trivial, and if these parts don’t work, then the integration tests would reveal it anyway. So why maintain redundant unit tests? That’s not where the hard parts are.

        You could try highly stateful, highly mocked unit tests for the stateful parts but (without experience) I’ll say that it’s probably going to be high maintenance effort for poor yield. Keep changing the mocks as internals change. I’m concerned that they still wouldn’t catch the hard bugs.

        The hard bugs relate to transient behavior of the system. Threads being an obvious case. Flow control. Changes in system state that affect other components. IME unit tests are really bad at catching these types of bugs. And I’ve watched people struggle to write test cases for that type of bug.

        I wish the system I was working on were as easy to test as sqlite, but no..

        1. 6

          The biggest advantage of unit tests for stateful systems, in my experience, is that for the most part they point directly to the place in the code that is busted. Integration tests tend to cover a lot of ground, but it can be hard to pinpoint what went wrong.

          My policy on this is always evolving, but usually I will use integration tests as my main line of defense for a brand new feature. This will cover the most ground but let me get something out there. Then, once I find issues outside of the happy path, I tend to target them with more unit-y tests.

          In practice, often this means that a bug in production turns into “factor out this problematic code to make it more testable”, then writing a test against it.

          This leaves the integration tests out of most edge case testing, but means that when other people hit corner cases they have a well documented example to work off of.

          If I write a specific test on an edge case, it’s more likely to be seen as intentional than if it’s just a part of an integration test.

          1. 4

            As an SRE and distributed database engineer, I despise traditional unit tests for large stateful systems. I touched on this a lot in this talk on testing complex systems. Your biases while writing the implementation are the same as your biases when writing the tests, so don’t write tests at all. Use tools that generate tests for you, because that pushes the bias space into the realm of broad classes, which our puny minds can stumble into much more easily than enumerating interleaving spaces of multiple API calls or realistic hardware phenomena over time. You can apply this technique to anything, not just simple pure functions. This paper by John Hughes goes into plenty of specifics on how to do this, too.

            We can build our systems so that these randomized tests can shit out deterministic regression tests that facilitate rapid debugging without pinning ourselves to a single point in statespace as unit / example / whatever tests do.
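
            To make that concrete, here is a minimal sketch (Python, standard library only) of a randomized test whose failures replay deterministically from a seed. `BuggySet`, `generate_ops`, `check`, and `find_failing_seed` are made-up names for illustration, not anything from the talk or the paper mentioned above. Real property-testing tools also shrink the failing case, which this sketch skips.

            ```python
            import random

            class BuggySet:
                """Toy system under test: a set backed by a list, with a deliberate bug."""
                def __init__(self):
                    self.items = []

                def add(self, x):
                    self.items.append(x)        # bug: duplicates are not filtered out

                def remove(self, x):
                    if x in self.items:
                        self.items.remove(x)    # removes only the first copy

                def contains(self, x):
                    return x in self.items

            def generate_ops(seed, length=20):
                """The op sequence is a pure function of the seed, so it can be replayed."""
                rng = random.Random(seed)
                return [(rng.choice(["add", "remove"]), rng.randrange(3)) for _ in range(length)]

            def check(ops):
                """Run ops against the implementation and a trusted model; True if they agree."""
                impl, model = BuggySet(), set()
                for name, x in ops:
                    if name == "add":
                        impl.add(x)
                        model.add(x)
                    else:
                        impl.remove(x)
                        model.discard(x)
                    if any(impl.contains(v) != (v in model) for v in range(3)):
                        return False
                return True

            def find_failing_seed(max_seeds=1000):
                """Return the first seed whose generated ops expose a disagreement, if any."""
                for seed in range(max_seeds):
                    if not check(generate_ops(seed)):
                        return seed
                return None
            ```

            Once a seed fails, `check(generate_ops(seed))` is the regression test: it replays the same operations every time, without anyone having hand-picked the scenario.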

            Unit tests and integration tests that explore a single script just create code that is immune to the test, but not necessarily reliable.

            1. 3

              Let me start off by saying I like things like generative testing and am always looking for ways to integrate that kind of tooling into projects.

              I have found that for enterprise software, where you have a pretty heterogeneous system with a lot of edge cases around existing data in the system, it’s hard (in a holistic sense, not in a tooling sense) to really make cross-cutting declarations about the behaviour of the system. X will be true, but only when settings Y and Z are toggled in this way, and only during this time of day. Often that means probing the database for certain sorts of global conditions that affect things in a cross-cutting manner.

              You can decouple systems to make this flow better, but often there are intrinsic difficulties, where your best bet is to isolate X truth-ness. But when you have calculateXTruthiness: Bool -> Bool -> Bool -> Bool, the value of generative testing goes down a decent amount because it’s just a predicate! Meanwhile you do get at least a bit of value from some unit tests that document known correct behaviours (from a business rules perspective).
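
              For a predicate that small, a table of known-correct cases can simply cover every input. A minimal sketch in Python; the rule inside `calculate_x_truthiness` and the `EXPECTED` table are invented for illustration and stand in for whatever the real business rules are.

              ```python
              from itertools import product

              def calculate_x_truthiness(y_enabled, z_enabled, in_time_window):
                  """Hypothetical rule: X holds only when Y is on, Z is off, and we are in the window."""
                  return y_enabled and not z_enabled and in_time_window

              # Known-correct behaviours, keyed by (y_enabled, z_enabled, in_time_window).
              # Any combination not listed is expected to be False.
              EXPECTED = {
                  (True, False, True): True,
              }

              def check_all_cases():
                  """Three booleans give only 2**3 = 8 inputs, so enumerate all of them."""
                  for case in product([False, True], repeat=3):
                      assert calculate_x_truthiness(*case) == EXPECTED.get(case, False), case
                  return True
              ```

              The table doubles as documentation of the rule, which is about all a generator could tell you here anyway.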

              It’s all a spectrum, but it can be slim pickings in enterprise software for generative testing. Your best bet is to refactor to pull out “systemic” parts of your code to make it easier to test, even if your top layer remains messy as a consequence of reality being tricky.

              A lot of the time there simply aren’t many overall properties to eke out of your system beyond “whatever the system is doing already” (because backwards compatibility is so important nowadays in the kinds of systems we build).

              1. 2

                Whatever your expectations of a thing are, you will almost always have success in violating them through a sequence of generated interactions if you built the thing with scripted unit and integration tests. If you have no expectations, then your job is done and you can look busy in other ways :P

        2. 10

          Integration tests tend to be more useful for broad regression detection. Often, the failures in a system come from mistaken assumptions about the behavior of other modules’ interfaces, and not from within the module itself. If I had a choice, I would prefer a handful of end-to-end tests over the same amount of time invested in unit tests. Or even better, a mix of integration tests to cover end-to-end issues, with unit tests on subtle or hairy core algorithms.

          It’s not a choice between unit testing and manual testing – there are other types of automated test.

          1. 2

            The non-syncing of spec code with implementation code really feels like the big barrier to making this usable in general.

            One idea I had to tackle this issue in a language like Python would be to allow for executable doc-strings within the code that could let you write specs inline, and have those be parsed out (but by default it would use the actual in-code implementation)

            That way you could write simplifying specs for certain parts of the code (say, the result of input will be any string instead of waiting on stdin when checking), while still avoiding duplication because most code is straightforward

            Though to be honest this might be very hard to get right. I feel like it’s a bit like the ORM/Type System issue, where type systems are usually rigid and don’t give much “type-check-time” flexibility, but ORMs are usually defined dynamically (relative to the type system)

            1. 6

              This is why I ended up spending less time with TLA after learning it. However, learning it was an incredibly useful exercise that has dramatically informed the way I build systems. It made me start to ask why I can’t write TLA style invariants and check executions of concurrent and distributed algorithms I build in general purpose languages.

              I realized I actually can get similar results on real code if I build systems carefully: schedule multithreaded interleavings at cross-thread communication points, simulate distributed clusters with buggy networks in a single process at accelerated speed a la discrete event simulation, things that use files are communicating with future instances of themselves and you can record logs of file operations and arbitrarily truncate them and ensure invariants hold after restart.

              My main project right now is trying to make the above ideas into nice libraries that let people run their code in more realistic ways before opening pull requests, and integrating those tools into the construction process of the sled database.

              1. 3

                An idle thought which can go on my list of side projects to start “one day” (probably right after the bus accident): probably symbolic execution can be used to demonstrate, if not enforce, the synchronisation of TLA+-type models with code. A symbolic executor can show the different cases a program will execute based on its input and the outputs that result; those can be compared with the cases discovered by the model-checking tool.

                Hooray, I’m not the first person to have that idea! You can combine formal methods with symbolic execution and meet in the middle.

                1. 2

                  > One idea I had to tackle this issue in a language like Python would be to allow for executable doc-strings within the code that could let you write specs inline, and have those be parsed out (but by default it would use the actual in-code implementation)

                  While this example was pretty close to the code implementation, TLA+ (and most specification languages) are too flexible to allow easy embedding. Here the processes were actual threads, but they could just as easily be servers, or human agents, or an abstracted day/night cycle. In one spec I wrote, one process represented two separate interacting systems that, at that level of detail, were assumed to have perfect communication.

                1. 2

                  Human brains are really bad at reasoning about the correctness of concurrent systems. Even when we use files in simple ways we are usually skirting several race conditions that just happen not to pop up while running the code on a single laptop for a few days. After starting to write my networked code in ways that let me do randomized partition testing in-process, I’ve been so humbled by the dramatic number of bugs that pop out when you just start to impose possible-yet-unanticipated orders of execution on concurrent systems.

                  I’ve started to expand the approach to file interfaces that record writes since the last fsync, and can be “crashed” during tests at different points. Concurrent algorithms that can be deterministically replayed in any interleaving of cross-thread communication points. This has been fairly straightforward on systems that run on top of libpthread, but I’ve been struggling to apply these techniques to existing go codebases, where concurrency is not interactive from go programs themselves. External instrumentation of the process at runtime from ptrace gets gnarly really quickly.

                  I wish that as a language that encourages the usage of concurrency more than most others, go also embraced understanding its concurrent behavior more. Does anyone know of better options than using rr’s chaos mode on a go system after forcing all goroutines to LockOSThread?

                  1. 3

                    Do you already know about TSAN? https://golang.org/doc/articles/race_detector.html

                    1. 1

                      TSAN and the go race detector are similar (I think Dmitry Vyukov might have been involved in creating tsan, in addition to the go race detector) and they work by instrumenting points in the code in similar ways as what I mentioned, but I’m interested in further applying these techniques to actually cause different scheduled interleavings (and ideally having deterministic replay) to gain more confidence in the implementation. Just running TSAN or go -race will only catch synchronization issues (tsan has false positives because it can’t reason about bare or implicit memory fences, but go -race doesn’t have this issue because you can’t express these memory semantics in native go) that your execution happens to stumble on while running an instrumented binary. Sometimes the mere instrumentation of the binary causes timing changes that prevent race conditions from popping out as often as they would in production. I want to force diverse executions (ideally fully exhaust the interleaving space for small bounded execution combinations) in addition to just detecting issues.

                      1. 1

                        Are you thinking of something like Jepsen?

                        1. 2

                          Jepsen is thousands to millions of times slower than in-process network partition simulation, takes a month to stand up if you already have clojure skills, and it does not result in deterministically-replayable regression tests for issues it finds. It’s fairly expensive to run on every commit, and you can’t afford to block devs on jepsen runs in most cases. Here’s an example of what I’m talking about for network testing applied to a CASPaxos implementation. In 10 seconds it will find way more bugs than jepsen usually will in 10 hours, so you can actually have devs run the test locally before opening a pull request.

                          1. 1

                            What you’re describing reminds me of the FoundationDB simulation testing talk.

                            They instrumented their C++ codebase to remove all sources of non-determinism when running under test, so they could test their distributed database and replay failures, stressing it in exactly the way you’ve described.

                            Let me try to find a link.

                            Edit: I believe this is it: video

                            1. 2

                              Yeah! This talk was really inspiring to me, and I’ve been pushing to test more things in this way. There have been some advances since that talk that address some of the complexity downsides he mentions, particularly lineage-driven fault injection. LDFI is the perfect way to make a vast fault-space tractably testable, by looking at what goes right and then introducing failures from there.

                  1. 6

                    This is pretty far off-topic, and most likely to result in a bunch of yelling back and forth between True Believers.



                    OP didn’t even bother to link to the claimed “increasing evidence”. This is a bait thread. Please don’t.

                    1. 17

                      Shrug. I find the complete lack of political awareness at most of the tech companies I’ve worked at to be rather frustrating and I welcome an occasional thread on these topics in this venue.

                      1. 13

                        It’s possible that many of your coworkers are more politically aware than they let on, and deliberately avoid talking about it in the workplace in order to avoid conflict with people who they need to work with in order to continue earning money.

                        1. 1

                          All work is political. “Jesus take the wheel” for your impact on the world through your employment decisions is nihilistic.

                          1. 8

                            Not trumpeting all your political views in the workplace does not mean completely ignoring political incentives for employment or other decisions. I’m not sure what made you think GP is advocating that.

                      2. 3

                        Obviously “off-topic-ness” is subjective, but so far your prediction re: yelling back and forth hasn’t happened. Perhaps your mental model needs updating… maybe your colleagues are better equipped to discuss broad topics politely than you previously imagined?

                        1. 4

                          > Obviously “off-topic-ness” is subjective, but so far your prediction re: yelling back and forth hasn’t happened.

                          Probably because everyone on this site is good and right-thinking — or knows well enough to keep his head down and his mouth shut.

                          (Which has nothing to do with the truth of either side’s beliefs; regardless of truth, why cause trouble for no gain?)

                          1. 5

                            To me, the people on this site definitely handle these discussions better. Hard to say how much better given that’s subjective. Let’s try for objective criteria: there are fewer flame wars, more people sticking to the facts as they see them vs comments that are pure noise, and moderation techniques usually reduce the worst stuff without censoring or erasing civil dissenters. If those metrics are sound, then the Lobsters community is objectively better at political discussions than many sites.

                          1. 5

                            These all seem to say one thing: climate change is going to be worse faster than some other prediction said. But that does not even remotely address your claim that “organized human life might not be possible by the end of the century and possibly sooner”. What on earth makes you think you know anything about what conditions humans need to organize?

                            1. 1

                              This is a good point. I guess my “evidence” would be past civilization collapse as a result of environmental destruction like what happened on Easter Island.

                        1. 6

                          People don’t go nearly far enough with generative testing IMO. I have had really great success using it to test distributed systems: a generator seeds clusters with client requests at specific time steps, and I also generate a schedule of network weather. For the client requests that receive responses, it’s often useful when testing consensus algorithms etc. to check that they linearize.

                          For concurrent systems, I generate small sets (2-4) of operations against the full concurrent API, run them on multiple threads, and record their return values. If it’s impossible to then find a single sequential schedule that has the same return values for the same requests by running different permutations of the concurrent ops on a single thread, then I just found a violation of atomicity.
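
                          A rough sketch of that permutation check, in Python. It assumes the return values were already recorded from a threaded run (they are hard-coded in the usage below), and it uses a hypothetical `Counter` with a `fetch_add` method as the single-threaded reference. Real-time ordering between operations is ignored, as in the description above.

                          ```python
                          from itertools import permutations

                          class Counter:
                              """Sequential reference: fetch_add returns the value before adding."""
                              def __init__(self):
                                  self.value = 0

                              def fetch_add(self, n):
                                  old = self.value
                                  self.value += n
                                  return old

                          def has_sequential_explanation(make_model, ops, observed):
                              """ops: list of (method_name, args); observed: recorded return value per op.

                              True if some single-threaded ordering of ops reproduces every observed
                              return value. False means no such ordering exists.
                              """
                              for order in permutations(range(len(ops))):
                                  model = make_model()          # fresh state for each candidate schedule
                                  results = {}
                                  for i in order:
                                      name, args = ops[i]
                                      results[i] = getattr(model, name)(*args)
                                  if all(results[i] == observed[i] for i in range(len(ops))):
                                      return True
                              return False
                          ```

                          With three `fetch_add(1)` calls, observed values `[2, 0, 1]` have a sequential explanation, while `[0, 0, 1]` do not: two callers both saw 0, which is a lost update. The search is factorial in the number of operations, which is why the sets stay at 2-4 ops.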

                          For systems that write to disk, I have a file interface that when run in testing mode records a log of writes and fsync calls. I’ll generate operations against the API that persists data, and a crash time when writes are either partially applied or completely deleted after the last fsync call. Upon restart if the system is in an inconsistent state, then I just found incorrect usage of the filesystem (so many databases fail to do this correctly).
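
                          Something like the following sketch shows the shape of that, in Python with only the standard library. Everything here is an assumption for illustration: `RecordingFile` is an in-memory stand-in, the length-plus-CRC frame format is a toy, and the crash model only keeps a random prefix of the unsynced bytes (real filesystems can also reorder writes).

                          ```python
                          import random
                          import zlib

                          class RecordingFile:
                              """In-memory file that remembers what was appended since the last fsync."""
                              def __init__(self):
                                  self.durable = bytearray()   # survives any crash
                                  self.pending = []            # appended but not yet fsynced

                              def append(self, data):
                                  self.pending.append(bytes(data))

                              def fsync(self):
                                  for chunk in self.pending:
                                      self.durable += chunk
                                  self.pending.clear()

                              def crash(self, rng):
                                  """One possible post-crash image: durable bytes plus a random
                                  prefix of the pending bytes (maybe empty, maybe torn mid-frame)."""
                                  tail = b"".join(self.pending)
                                  keep = rng.randrange(len(tail) + 1)
                                  return bytes(self.durable) + tail[:keep]

                          def encode(payload):
                              """Frame = 4-byte length, 4-byte CRC32, payload."""
                              return (len(payload).to_bytes(4, "big")
                                      + zlib.crc32(payload).to_bytes(4, "big")
                                      + payload)

                          def recover(image, verify=True):
                              """Read frames from the start; with verify=True, stop at the first
                              truncated or corrupt frame. verify=False models a careless recovery."""
                              out, pos = [], 0
                              while pos + 8 <= len(image):
                                  size = int.from_bytes(image[pos:pos + 4], "big")
                                  crc = int.from_bytes(image[pos + 4:pos + 8], "big")
                                  payload = image[pos + 8:pos + 8 + size]
                                  if verify and (len(payload) < size or zlib.crc32(payload) != crc):
                                      break
                                  out.append(payload)
                                  pos += 8 + size
                              return out

                          def run_trial(seed, verify=True):
                              """Generate appends and fsyncs, crash, recover, and check the invariant."""
                              rng = random.Random(seed)
                              f, written, synced = RecordingFile(), [], 0
                              for _ in range(rng.randrange(1, 10)):
                                  payload = bytes(rng.randrange(256) for _ in range(rng.randrange(6)))
                                  f.append(encode(payload))
                                  written.append(payload)
                                  if rng.random() < 0.3:
                                      f.fsync()
                                      synced = len(written)
                              recovered = recover(f.crash(rng), verify)
                              # Invariant: recovery yields a prefix of what was written,
                              # and that prefix includes everything that was fsynced.
                              return recovered == written[:len(recovered)] and len(recovered) >= synced
                          ```

                          Looping `run_trial` over a range of seeds exercises many crash points; passing `verify=False` shows how a recovery path that trusts torn frames breaks the invariant.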

                          People tend to just test things that adhere to a function signature, and you get great results for very little effort by doing this, but there are mountains of gold if you get more creative with bigger and messier systems.

                          1. 2

                            Have you looked at secure scuttlebutt or dat? They are currently being used successfully for helping periodically-connected communities exchange information asynchronously, like remote villages and sailors etc…

                            1. 1

                              We’re working on an implementation similar to scuttlebutt but on top of the C zyre libs for our desktop app, looking at scuttlebutt for possible usage on our mobile app.

                              I haven’t seen dat before, doing a bit of reading on that to see if we can use it for some things.

                            1. 3

                              Hey icefall, one thing that would complement this presentation is a page listing every paper and tool in it with links to them.

                              1. 8

                                This is a good idea! Here’s a nice sketch that contains most of them and the two that aren’t on there are the ALICE paper and Simple Testing Can Prevent Most Critical Failures. I’ll cut a summary blog post that goes into these when I have a few hours!

                                1. 4
                              1. 7

                                It depends what you mean by resiliency. I tend to work on things with strong consistency requirements, and to be honest I think the way most people build and talk about distributed systems engineering is pretty gross and unprincipled.

                                Why is Jepsen so successful against most systems, despite it being so incredibly slow to actually exercise communication interleaving patterns? People are building systems from a fundamentally broken perspective where they are not actually considering the realistic conditions that their systems will be running in.

                                In my opinion, the proper response to this should be to ask how we can simulate realistic networks (and filesystems for that matter) on our laptops as quickly as possible, without requiring engineers to work with new tools.

                                My approach is to use quickcheck to generate partitions + client requests and implement participants as things that implement an interface that usually looks like:

                                • receive(at, from, msg) -> [(to, msg)]
                                • tick(at) -> [(to, msg)]

                                And this way a distributed algorithm can be single-stepped in accelerated time, and for each outbound message we use the current state of network weather to assign an arrival time / drop it. Stick it in a priority queue and iterate over this until no messages are in flight.

                                Then as the “property” in property testing, ensure that linearizability holds for all client requests.
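
                                A stripped-down sketch of such a simulator, in Python. The `Client` and `Server` participants, the `simulate` and `run` helpers, and the drop/delay parameters are all hypothetical. It uses a seeded random drop rate in place of generated partitions, and it does not check linearizability; it only shows the stepping loop and the priority queue.

                                ```python
                                import heapq
                                import random

                                class Client:
                                    """Resends a request on every tick until it sees an ack."""
                                    def __init__(self):
                                        self.acked = False

                                    def tick(self, at):
                                        return [] if self.acked else [("server", "req")]

                                    def receive(self, at, sender, msg):
                                        if msg == "ack":
                                            self.acked = True
                                        return []

                                class Server:
                                    """Acks every request it receives and counts them."""
                                    def __init__(self):
                                        self.seen = 0

                                    def tick(self, at):
                                        return []

                                    def receive(self, at, sender, msg):
                                        self.seen += 1
                                        return [(sender, "ack")]

                                def simulate(nodes, seed, drop_rate=0.3, max_delay=5, ticks=50):
                                    """Single-step all nodes in simulated time until nothing is in flight."""
                                    rng = random.Random(seed)
                                    in_flight = []   # heap of (arrival_time, seq, to, sender, msg)
                                    seq = 0          # tie-breaker so heap entries never compare payloads

                                    def send(now, sender, outbound):
                                        nonlocal seq
                                        for to, msg in outbound:
                                            if rng.random() < drop_rate:
                                                continue                      # network weather: dropped
                                            arrival = now + 1 + rng.randrange(max_delay)
                                            heapq.heappush(in_flight, (arrival, seq, to, sender, msg))
                                            seq += 1

                                    def deliver_next():
                                        at, _, to, sender, msg = heapq.heappop(in_flight)
                                        send(at, to, nodes[to].receive(at, sender, msg))

                                    for now in range(ticks):
                                        while in_flight and in_flight[0][0] <= now:
                                            deliver_next()
                                        for name in sorted(nodes):            # fixed order keeps runs replayable
                                            send(now, name, nodes[name].tick(now))
                                    while in_flight:                          # drain after the last tick
                                        deliver_next()

                                def run(seed, drop_rate):
                                    nodes = {"client": Client(), "server": Server()}
                                    simulate(nodes, seed, drop_rate=drop_rate)
                                    return nodes["client"].acked, nodes["server"].seen
                                ```

                                Because every random choice comes from the seed, `run(7, 0.3)` gives the same result on every call, so a failing seed becomes a replayable regression test.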

                                With something like this, every engineer can get a few thousand jepsen-like runs in a couple seconds before even opening a pull request. They don’t have to use any tools other than their language’s standard test support. You can write the simulator once and it has very high reuse value, since everything just implements the interface you chose. Way higher bug:cpu cycle ratio than jepsen.

                                This does not replace jepsen, as jepsen is still important for catching end-to-end issues in an “as-deployed” configuration. But I really do think jepsen is being totally misused.

                                We should build things in a way that allows us to quickly, cheaply measure whether they will be successful in the actual environments that we expect them to perform in.

                                Maximize introspectability. Everything is broken to some extent, so be sympathetic to future selves that have to debug it in production while failing spectacularly and causing everyone to freak out.

                                One kind of concurrency that few seem to consider until it’s time to upgrade: multiple versions of your code may be live in some situations. Did you build your system in a way that ensures this is safe? Did you build your system in a way that allows you to un-deploy a new version if it fails unexpectedly? Or did you build in points of no return?

                                One reason why I don’t do distributed systems fault injection consulting anymore is because of the egos of people who can’t accept their babies have problems. That got tiring really quickly. The #1 most important thing to building a reliable system is being humble. Everything we do is broken. That’s OK. So many engineers who learn how to open sockets begin to think of themselves as infallible rockstars. It’s really hard to build systems that work with these people.

                                1. 3

                                  At first I was rolling my eyes but then realized I was being an elitist. Infrastructure must be sympathetic to its users. Part of that IS forward-compatibility, but security issues like this trump those concerns IMO.

                                  1. 7

                                    This reminds me a bit of a technique Joe Armstrong mentioned, maybe in an in-person conversation at a conference or maybe more publicly, I forget. He said that he will throw away code that takes him more than one day to write, and either start over the next day or do something more important. His justification was that if it takes longer, the approach is probably shit. At the time, I thought it was a pretty extreme approach, and just made a mental note of it and moved on.

                                    After spending the last year+ making a reasonably well tested database, I’ve caught myself taking on this approach, arriving at it after an endless series of devastatingly complicated bugs that have deeply humbled me. It was quite a surprise when I realized it was exactly what Joe had mentioned, which I didn’t really believe was an effective strategy at the time.

                                    Why do this? For me, it keeps the complexity manageable. If I can’t keep the whole thing in my head when I’m creating it in one shot, there’s very little chance I’ll be able to debug an issue in it in under one or two days. I will create far more bugs when I can’t wield the model of it easily in my mind.

                                    How long do you want to spend debugging an issue in a system? Combine this approach with bwk’s “debugging is twice as hard as writing a program in the first place” rule of thumb and throw the whole thing away when you get to 1/2 of your desired debugging budget!

                                    1. 5

                                      Transactions for sled! I’ve been going through the literature and arriving at a concurrency control scheme loosely based on cicada, with a little less complexity around timestamp synchronization but a clear path toward implementing less centralized timestamp allocation in the future if 100+ core machines become targets. It has been super interesting as a distributed systems person working on a main-memory database to see techniques from the distributed world being applied recently to reduce coordination costs of transaction techniques. I’ve been holding off on releasing sled as alpha because of a few known rare dataloss bugs, but I think it might be better to just roll forward and be clear that it’s got bugs still, which will be the case whether I know about them or not, because it’s still so young.

                                      1. 8

                                        I’m working on two projects in my spare time right now:

                                        - going through Type-Driven Development With Idris, to learn Idris for the dependent-types goodness

                                        - trying to teach myself the nuts and bolts of type theory by implementing Hindley-Milner type checking in the toy programming language interpreter I’ve been working on for a while. I’ve found a few resources about exactly how to go about doing this (most notably, this Haskell conference talk and this section from Stephen Diehl’s incomplete tutorial on implementing a Haskell-like functional programming language). I’ve actually had a bit of trouble translating the code from those resources, which is in Haskell and assumes a particular design for the AST data structures, to my own interpreter, which is in Rust and has a different AST design. If anyone is knowledgeable about actually implementing Hindley-Milner in a real programming language, I’d love to get some advice.

                                        1. 4

                                          I’ve got that book sitting in front of me right now! It’s so cool how you can specify protocols and valid state transitions so nicely in idris! I’m fantasizing about using it as the basis for a smart contract language.

                                          1. 4

                                            I tried going through that book, too, but I was not convinced by the examples it used. I recall one example which added the runtime length of an array to the type system. Nice, but you still needed to do (runtime) checks for the length…

                                            1. 2

                                              You don’t always need to do runtime checks for the length. For example, if the length of a vector is n you can use a data type Fin n to represent an index. If you ask a user for an input, you’ll have to check it there - but once you have a Fin n you can pass it around your program without checking it. It’s always a valid index.

                                            2. 1

                                              > If anyone is knowledgeable about actually implementing Hindley-Milner in a real programming language…

                                              I have experience with doing it in Haskell, but it seems like you’re implying that Haskell is not a real programming language…

                                              1. 1

                                                Haha far from it, I’ve written ostensibly-useful code in Haskell myself. There are a number of impedance mismatches between Haskell-style code and Rust-style code that have made it hard for me to take the example Haskell code I’ve seen for Hindley-Milner and apply it to my own project.

                                            1. 9

                                              I forsake C as someone who works with it all the time. The author of this post makes the point that it’s worth knowing. I think that’s totally true if you interact with low-level systems.

                                              I definitely don’t buy the point about distributed systems usually requiring C because of performance reasons. As a distributed systems engineer, most of my peers work in memory safe languages, with a few things along data paths being in C, and every once in a while people may peer into the kernel to reason about an issue in the networking stack, but I’d imagine that most people who bill themselves as distributed systems engineers today are terrible at C and it probably doesn’t hurt them very much.

                                              When I forsake C I don’t advocate for its ignorance. I advocate for learning all you can about memory corruption, and being honest as a biased human who is building things for other biased humans and with other biased humans. C is a great language to use for learning about bugs and exploitation techniques. There is too much macho bullshit from prominent C engineers, and it catches on with incompetent engineers who make the world a more dangerous place.

                                              1. 4

                                                Many timeseries projects today seem to be borrowing techniques from Gorilla, particularly around compression, so it’s pretty relevant.

                                                1. 11

                                                  Thank you for the wonderful comments last week.

                                                  I wrote an Earley parser. And a Pratt parser. The Pratt parser is what I’ve been looking for all this time: a modular recursive descent parser. What it lacks in formalism it makes up for in brevity and power-to-weight.
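
                                                  For anyone who hasn’t met the technique, here is a minimal sketch of the Pratt idea in Rust. It is not the parser I wrote: the names (`tokenize`, `infix_power`, `eval`) and the binding-power numbers are made up for illustration, and it evaluates integer arithmetic directly instead of building a tree. The whole trick is the loop in `expr`, which keeps consuming infix operators as long as they bind at least as tightly as `min_bp`.

```rust
// Minimal Pratt-style evaluator for + - * / ^, unary minus and parentheses.
// Illustrative only: all names and binding powers are assumptions.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

fn tokenize(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_ascii_digit() {
            // Accumulate a run of digits into one number token.
            let mut n = 0i64;
            while i < chars.len() && chars[i].is_ascii_digit() {
                n = n * 10 + chars[i].to_digit(10).unwrap() as i64;
                i += 1;
            }
            tokens.push(Token::Num(n));
            continue;
        }
        match c {
            '(' => tokens.push(Token::LParen),
            ')' => tokens.push(Token::RParen),
            '+' | '-' | '*' | '/' | '^' => tokens.push(Token::Op(c)),
            c if c.is_whitespace() => {}
            other => panic!("unexpected character {:?}", other),
        }
        i += 1;
    }
    tokens
}

/// (left, right) binding powers. left < right means left-associative,
/// left > right means right-associative.
fn infix_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        '^' => Some((8, 7)),
        _ => None,
    }
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    fn next(&mut self) -> Option<Token> {
        let t = self.peek();
        self.pos += 1;
        t
    }

    fn expr(&mut self, min_bp: u8) -> i64 {
        // Prefix position: a number, a parenthesised expression, or unary minus.
        let mut lhs = match self.next() {
            Some(Token::Num(n)) => n,
            Some(Token::LParen) => {
                let v = self.expr(0);
                assert_eq!(self.next(), Some(Token::RParen), "expected ')'");
                v
            }
            // Unary minus binds tighter than * but looser than ^.
            Some(Token::Op('-')) => -self.expr(5),
            t => panic!("unexpected token {:?}", t),
        };
        // Infix position: fold in operators that bind at least as tightly as min_bp.
        while let Some(Token::Op(op)) = self.peek() {
            let (l_bp, r_bp) = match infix_power(op) {
                Some(p) => p,
                None => break,
            };
            if l_bp < min_bp {
                break;
            }
            self.next();
            let rhs = self.expr(r_bp);
            lhs = match op {
                '+' => lhs + rhs,
                '-' => lhs - rhs,
                '*' => lhs * rhs,
                '/' => lhs / rhs,
                // Assumes a non-negative exponent.
                '^' => lhs.pow(rhs as u32),
                _ => unreachable!(),
            };
        }
        lhs
    }
}

fn eval(src: &str) -> i64 {
    let mut p = Parser { tokens: tokenize(src), pos: 0 };
    let v = p.expr(0);
    assert!(p.peek().is_none(), "trailing input");
    v
}
```

                                                  The modularity comes from the fact that adding an operator is one more row in `infix_power` (or one more arm in the prefix match), with no grammar to refactor.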

                                                  Now, I need to choose a host language. I’d like to pick Rust, but I’m not sure it has a ready-made GC solution right now, and I don’t want to go down that rabbit hole. That leaves C++, JVM, or OTP. Any thoughts?

                                                  1. 3

                                                    What kind of language are you looking to interpret/execute? The three platforms you mention all have really different tradeoffs.

                                                    1. 3

                                                      A Lisp-esque language under the hood with a non-Lisp syntax on top. The idea is that the functional paradigm can subsume the other two big paradigms (imperative/logic). I can use the CEK machine for proper tail call handling, so that isn’t a requirement of the host. The big thing I’m looking for is a GC (whether lib or built-in) and a language I like that I can target it with.

                                                    2. 2

                                                      For Rust, you can wrap everything in an Rc, or an Arc if you have multiple threads. If you want tracing GC you can use this; if you just need epoch-style reclamation there’s crossbeam-epoch; and if you just need hazard pointers there’s conc. I’ve had a lot of success with crossbeam-epoch in lock-free systems I’ve built.

                                                      1. 1

                                                        Rc (and friends) would need cycle detection, no? Maybe the thing to do is just use Rc and do research on cycle-detection algorithms to see if they are hard or not.

                                                        I looked at Epoch and hazard pointers and wasn’t sure if they were ok as a general GC. I need to do more reading. Thanks!

                                                        1. 2

                                                          Yeah, you can create memory leaks with Rc cycles in rust. But this is rarely an issue in most use cases. Rust memory can feel a little confusing at first, but cycles tend not to come up once you learn some different idioms for structuring things in non-cyclical ways.

                                                          For example, if you want to build a DAG, you can quickly implement it with a HashMap from ID to Node, where ID is some monotonic counter that you maintain. Each Node can contain Vecs of incoming and outgoing edges. You can implement your own RC-like thing that tracks the sum of indegree and outdegree, and when it reaches 0, you just remove the Node from the containing HashMap. For the cases where performance or concurrency concerns rule out this approach (which are rare and should not be pursued until this is measured to be a bottleneck) you can always write Rust like C with unsafe pointers: Box::into_raw, dereferencing inside unsafe blocks, and freeing by calling Box::from_raw (you can call drop() on the result if you want to be explicit about what’s happening, but it will be dropped implicitly when it goes out of scope). Use mutexes on shared state until… basically always, but if you REALLY want to go lock-free, that’s when you can benefit from things like crossbeam-epoch to handle freeing of memory that has been detached from mutable shared state but may still be in use by another thread.
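
                                                          A rough sketch of the HashMap approach, to make it concrete. The type and method names (`Graph`, `add_edge`, `drop_if_isolated`, and so on) are invented for this example, and it does not check that the edges you add keep the graph acyclic. A node is reclaimed when removing an edge leaves it with no incoming and no outgoing edges.

```rust
use std::collections::HashMap;

type NodeId = u64;

struct Node<T> {
    value: T,
    incoming: Vec<NodeId>,
    outgoing: Vec<NodeId>,
}

struct Graph<T> {
    nodes: HashMap<NodeId, Node<T>>,
    next_id: NodeId, // monotonic counter, never reused
}

impl<T> Graph<T> {
    fn new() -> Self {
        Graph { nodes: HashMap::new(), next_id: 0 }
    }

    fn add_node(&mut self, value: T) -> NodeId {
        let id = self.next_id;
        self.next_id += 1;
        self.nodes.insert(id, Node { value, incoming: Vec::new(), outgoing: Vec::new() });
        id
    }

    /// Returns false if either endpoint is missing or the edge is a self-loop.
    fn add_edge(&mut self, from: NodeId, to: NodeId) -> bool {
        if from == to || !self.nodes.contains_key(&from) || !self.nodes.contains_key(&to) {
            return false;
        }
        self.nodes.get_mut(&from).unwrap().outgoing.push(to);
        self.nodes.get_mut(&to).unwrap().incoming.push(from);
        true
    }

    /// Removes one `from -> to` edge, then reclaims any endpoint whose
    /// indegree + outdegree has dropped to zero.
    fn remove_edge(&mut self, from: NodeId, to: NodeId) {
        let mut removed = false;
        if let Some(node) = self.nodes.get_mut(&from) {
            if let Some(i) = node.outgoing.iter().position(|&n| n == to) {
                node.outgoing.swap_remove(i);
                removed = true;
            }
        }
        if !removed {
            return;
        }
        if let Some(node) = self.nodes.get_mut(&to) {
            if let Some(i) = node.incoming.iter().position(|&n| n == from) {
                node.incoming.swap_remove(i);
            }
        }
        self.drop_if_isolated(from);
        self.drop_if_isolated(to);
    }

    fn drop_if_isolated(&mut self, id: NodeId) {
        let isolated = self
            .nodes
            .get(&id)
            .map_or(false, |n| n.incoming.is_empty() && n.outgoing.is_empty());
        if isolated {
            self.nodes.remove(&id);
        }
    }

    fn value(&self, id: NodeId) -> Option<&T> {
        self.nodes.get(&id).map(|n| &n.value)
    }

    fn contains(&self, id: NodeId) -> bool {
        self.nodes.contains_key(&id)
    }

    fn len(&self) -> usize {
        self.nodes.len()
    }
}
```

                                                          Because nodes refer to each other by ID rather than by pointer, there are no ownership cycles for the borrow checker or Rc to trip over; a stale ID just fails the lookup.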

                                                          Feel free to shoot me an email if you’re curious about how something can be done in Rust! I know it can be overwhelming when you’re starting to build things in it, and I’m happy to help newcomers get past the things I banged my head against the wall for days trying to learn :)

                                                      2. 2

                                                        FWIW, many languages written in C or C++ use arenas to hold the nodes that result from parsing. For example, CPython uses this strategy. I’m pretty sure v8 does too. So you don’t manage each node individually, which is a large load on the memory allocator/garbage collector. Instead, you put them all in a big arena and then free them at once.
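
                                                        Since Rust came up in this thread, here is roughly what the same idea looks like there, as a sketch. This is not how CPython or v8 lay things out; `Arena`, `NodeId` and `Expr` are names I made up. Nodes live in one Vec and point at each other by index, so dropping the arena frees the whole tree in one go.

```rust
// Index-based arena for AST nodes. All names here are illustrative.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId(usize);

#[derive(Debug)]
enum Expr {
    Num(i64),
    Add(NodeId, NodeId),
    Mul(NodeId, NodeId),
}

#[derive(Default)]
struct Arena {
    nodes: Vec<Expr>,
}

impl Arena {
    /// Stores a node and returns its index; nodes are never freed individually.
    fn alloc(&mut self, e: Expr) -> NodeId {
        let id = NodeId(self.nodes.len());
        self.nodes.push(e);
        id
    }

    fn len(&self) -> usize {
        self.nodes.len()
    }

    /// Walks the tree by following indices back into the arena.
    fn eval(&self, id: NodeId) -> i64 {
        match &self.nodes[id.0] {
            Expr::Num(n) => *n,
            Expr::Add(a, b) => self.eval(*a) + self.eval(*b),
            Expr::Mul(a, b) => self.eval(*a) * self.eval(*b),
        }
    }
}
```

                                                        Building `(2 + 3) * 4` takes five `alloc` calls and no per-node Box or Rc; when the `Arena` goes out of scope, the single Vec is released.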

                                                        1. 2

                                                          Save the earth, use C++ or OTP

                                                          1. 1

                                                            You also have Go and .NET Core as possible host runtimes.

                                                            1. 1

                                                              What about Nim? It seems to be a memory-safe language with low-latency GC and macros, and it compiles to C. That said, the Schemes are ideal if you’re building a language with a Lisp underneath, since they start that way.

                                                            1. 8

                                                              Rushing to get my lock-free rust bw-tree-backed embedded database to an alpha state before FOSDEM next weekend, where I hope to encourage a few people to give it a shot for low-impact workloads. Thinking about ways of applying real-time scheduling to threads in a concurrent property testing library I’m writing to tease out bugs in the bw tree to get similar results to PULSE when used with quviq’s quickcheck for erlang. I will name my first child “Determinism” after the past few months of intense debugging…

                                                              1. 34

                                                                The recruiters who perform the first layer of filtering will usually have a set of buzzwords or technologies that they have been told are associated with candidates worth talking to. This might include things like prometheus, kubernetes, golang, aws, puppet, etc… If you are applying for a specific job, try to figure out what stack they use, and familiarize yourself with it by working through a tutorial, so that you can mention at least a basic level of familiarity with the tech that the recruiters will often be filtering for. To the good companies that want SREs who are curious enough to dig into unfamiliar territory, being open about your previous unfamiliarity and willingness to dive in anyway can be a strong positive signal. But recruiters don’t always share the same cultural values around this as managers or future teammates, so use your discretion about how open you are about this with the recruiter. Some teams really value curiosity, but the recruiters often don’t get that message, and will hear that you’ve only done a quick tutorial on something and will mash the “reject” button in their candidate pipeline management interface.

                                                                When I hire for SRE teams, I care about one thing above all else: ability to drill into unfamiliar systems that are behaving strangely. General curiosity and demonstrations of learning new things are possible indicators of this mindset to me. I actually usually prefer to hire people who are more curious than people who are less curious and more experienced because the curious people will be getting more out of the job on a personal level, and I love teaching. A lot of teams don’t prioritize teaching though, due to mismanagement, incompetence, laziness, or fear-driven mindsets that are afraid a newcomer will outperform them. Avoid these places like the plague. They are the norm, unfortunately, but they will not help you grow at a rapid pace, and if you can afford to keep playing the interview lottery, you should really hold off until you get a place that gives you a strong signal about mentorship.

                                                                I test for drilling proficiency by asking questions about tools that are common, and can be assumed to have some basic familiarity around, like linux or the web, and when we get to a part that they are unfamiliar with, I let them know that they can use me as Google to ask questions about how they can drill into specific behaviors. I ask about how things can fail. What are different ways a file can fail to be created, etc… (they can still use me as Google to look into the specific implementation details of things, or we can strace stuff together, etc…). Basically, I try to show them windows they can use to peer into things I’m asking about the system if they are not already familiar, and then I try to get a sense of how creatively they can use these new tools I’m giving them. That’s how the job will look, so that’s how I try to make the interview.

                                                                Most people suck at interviewing. They will ask you about minutiae of tools that you may not be experienced with yet. It’s important to keep a calm head, let them know where your current level of experience is with these specific technologies, and then explain to them how, in the real world, you would dive into the problem to understand and solve it. You exist to solve problems, so deflect bad interview trivia questions with confidence in your ability to solve problems by drilling into them. If the team sucks at interviewing, that’s on them, and the team is less likely to be made up of experienced people who are also good to work with. People skills are the most important engineering skills.

                                                                If you want a laundry list of tech to get familiar with for modern SRE-like jobs:

                                                                • linux fundamentals, what’s an inode, what’s a process, what’s a socket
                                                                • AWS / GCE
                                                                • kubernetes
                                                                • prometheus


                                                                Different teams will prioritize coding skills differently. The most common thing I see recruiters filtering for is making sure you have one scripting and one compiled language under your belt. Golang is a good bet, as it’s taken over the pop infrastructure world.

                                                                Have fun :)

                                                                1. 2

                                                                  That was really helpful!

                                                                1. 3

                                                                  Raw sockets are the strcpy of the current age of terribly-written distributed systems. People stand no chance of finding bugs in the systems they create on their own laptops. The fact that slow, incredibly low bug:cpu cycle black-box partition testing is so successful in finding bugs should be a screaming red alarm that we need to write our systems in ways that are more amenable to testing on our laptops in the first place.

                                                                  That means having a pluggable transport layer that can be used with real sockets in production, a simulator that messes up traffic according to the asynchronous network model, or a hit-tracing fuzzer. If you’re using raw sockets in your core serving logic, your architecture is broken, and you will have far more bugs and higher costs to fix the few bugs you discover.
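
                                                                  To make the shape of this concrete, here is a small sketch in Rust. Everything in it is an assumption for illustration: the `Transport` trait, `Message`, `SimNet` and `ping_all` are invented names, the simulator only drops and reorders (no delays or duplication), and the production implementation over real sockets is left out. The point is that the core logic only ever sees the trait, and the simulator is driven by a seed so a failing run can be replayed.

```rust
// Pluggable transport sketch. All names are hypothetical.

type NodeId = u32;

#[derive(Debug, Clone, PartialEq)]
struct Message {
    from: NodeId,
    to: NodeId,
    payload: Vec<u8>,
}

/// The only networking interface the core logic is allowed to use.
trait Transport {
    fn send(&mut self, msg: Message);
    fn recv(&mut self, node: NodeId) -> Option<Message>;
}

/// Deterministic simulated network: drops and reorders messages using a
/// seeded xorshift PRNG, so the same seed gives the same schedule.
struct SimNet {
    in_flight: Vec<Message>,
    rng: u64,
    drop_percent: u64,
}

impl SimNet {
    fn new(seed: u64, drop_percent: u64) -> Self {
        // xorshift state must be non-zero.
        SimNet { in_flight: Vec::new(), rng: seed | 1, drop_percent }
    }

    fn next_rand(&mut self) -> u64 {
        let mut x = self.rng;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng = x;
        x
    }
}

impl Transport for SimNet {
    fn send(&mut self, msg: Message) {
        // Lose the message with probability drop_percent / 100.
        if self.next_rand() % 100 < self.drop_percent {
            return;
        }
        self.in_flight.push(msg);
    }

    fn recv(&mut self, node: NodeId) -> Option<Message> {
        // Deliver a pseudo-randomly chosen pending message for this node,
        // which models arbitrary reordering.
        let candidates: Vec<usize> = self
            .in_flight
            .iter()
            .enumerate()
            .filter(|(_, m)| m.to == node)
            .map(|(i, _)| i)
            .collect();
        if candidates.is_empty() {
            return None;
        }
        let pick = candidates[(self.next_rand() % candidates.len() as u64) as usize];
        Some(self.in_flight.swap_remove(pick))
    }
}

/// Example of core logic written against the trait rather than a socket.
fn ping_all<T: Transport>(net: &mut T, me: NodeId, peers: &[NodeId]) {
    for &peer in peers {
        net.send(Message { from: me, to: peer, payload: b"ping".to_vec() });
    }
}
```

                                                                  In production you would implement `Transport` over real sockets; in tests you run the same `ping_all`-style logic against `SimNet` with thousands of seeds on a laptop, and any seed that breaks an invariant is a reproducible bug report.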