1. 9

    Sourcegraph has got to be one of the best tools I’ve used… with giant monorepos at Uber. Uber has a monorepo for iOS, Android, Java and Go, each used by hundreds of teams. There’s custom tooling to work with it that I won’t get into, but I’m amazed at how pleasant Sourcegraph makes browsing it and how fast the code search is, with first-class regex support. It’s the search speed of the regex searches that really puzzles me, and I’d love to know how it’s done behind the scenes.

    It might sound like I’m selling something, but I’m not. I have no association with Sourcegraph and no idea how much this tool costs the company or what the licensing terms are (something this article touches on, regarding the unconventional open source approach). But tools I’ve learned to appreciate in this environment that I’d not used before are Kibana, Grafana and Sourcegraph. Obviously, your mileage might vary.

    Update: I literally just came across a Software Engineering Daily podcast episode from a month ago where one of the Sourcegraph founders talks about indexing large repos, using Uber as the example: https://softwareengineeringdaily.com/2020/07/22/sourcegraph-code-search-and-intelligence-with-beyang-liu/

    1. 4

      Searching for regular expressions reminded me of this post by Russ Cox, which the interview you linked to briefly mentions.

      1. 3

        I haven’t used Sourcegraph myself so I can’t speak to whether there are any similarities, but you might take a peek at the source code of livegrep, which also offers pretty speedy regex-capable search.

      1. 7

        I’ve noticed a growing trend of people assuming algorithms are pointless questions that are asked by tech companies purely as an arbitrary measure. I hear more people complain about how all of this is a purely academic exercise.

        No, people complain that asking candidates to remember a ton of algorithms in their heads and produce them on a whiteboard when asked in a heavily time-constrained setting is a poor estimate of one’s ability to use them in practice.

        1. 10

          We’re on the same page. In the article, I make it clear this is also my stance and close with:

          “To anyone reading whose company has a bar to hire people who know some of the advanced algorithms by heart: think again if this is what you need. I’ve hired fantastic teams at Skyscanner London and Uber Amsterdam without any tricky algorithm questions, covering no more than data structures and problem solving. You shouldn’t need to know algorithms by heart. What you do need is awareness of the most common data structures and the ability to come up with simple algorithms to solve the problem at hand, as a toolset.”

        1. 9

          This is a great idea for a post that I’ve wanted to write myself. Leaving aside trees, hash tables, and stacks, I probably used less than one “interesting” data structure/algorithm PER YEAR of programming. Some notable counterexamples are two kinds of reservoir sampling, regular languages/automata, random number generation, some crypto, and some info retrieval algorithms.

          One thing that sparked it is an obscure but long-running debate over whether dynamic programming interview questions are valid.

          I don’t think they’re valid. It’s mostly a proxy for “I got a CS degree at a certain school” (which I did, I learned dynamic programming in my algorithms class and never used it again in ~20 years.)

          I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

          I’ve tried this in the past and nobody has been able to point to a concrete instance. I think the best I’ve heard is someone heard about a professor who heard about some proprietary software once that used it.


          Related: algorithms used in real software (although this is certainly not representative, since compiler engineering is a subfield with its own body of knowledge):

          https://old.reddit.com/r/ProgrammingLanguages/comments/b22tw6/papers_and_algorithms_in_llvms_source_code/

          https://github.com/oilshell/blog-code/blob/master/grep-for-papers/llvm.txt

          Linux kernel algorithms:

          https://github.com/oilshell/blog-code/blob/master/grep-for-papers/linux.txt

          1. 10

            I challenge anyone to name ANY piece of open source software that uses dynamic programming.

            Git, or most reasonable implementations of “diff”, will contain an implementation of the Myers Algorithm for longest-common-subsequence, which is very dynamic-programmy.

            No concrete example for this one, but I know that bioinformatics code is full of dynamic programming algorithms for the task of sequence alignment, which is similar to diff — identifying a way to align two or more base sequences so that they coincide with the minimal number of changes/additions/deletions required to make them identical.

            1. 1

              Hm I’m familiar with that algorithm but I never thought of it as dynamic programming.

              Wikipedia does say it’s an instance of dynamic programming. Although when I click on the paper, it appears to contrast itself with “the basic O(MN) dynamic programming algorithm” (section 4).

            2. 8

              Since you mentioned dynamic programming, it’s worth pointing out that the name “dynamic programming” was chosen for political reasons, as pointed out in the history section of the Wikipedia article on dynamic programming. So I think it’s a really bad name.

              1. 1

                That’s funny, I remembered xkcd’s “dynamic entropy” comic, and it quotes the same passage:

                https://xkcd.com/2318/

                It also has a very interesting property as an adjective, and that is it’s impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible.

                LOL

                Agree it’s a terrible name… I would say it was chosen for “marketing” reasons

              2. 7

                I have thought about whether dynamic programming questions are fair to ask, and I ended up where you are: they are not.

                Dynamic programming was the technique I struggled with most in understanding and implementing correctly. And while there are semi-practical examples (like the knapsack problem), I have not found any practical, day-to-day use cases for it.

                I had an argument with my colleague who asked this kind of problem, saying it’s basic knowledge. Turns out he did competitive programming and there, it is table stakes. But in practice, it just filtered for anyone who has learned and practiced this approach.

                I stay away from asking this kind of question: problems that need dynamic programming to solve.

                1. 4

                  I’m familiar with dynamic programming mostly from high-school competitive programming as well. Otherwise I can’t say I’ve encountered real-life problems where it occurred to me to use the technique.

                2. 8

                  I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

                  I’m literally about to implement something that could be classified as dynamic programming at work, which can be summarized as “computing a few simple statistics, such as number of bytes changed, for each directory in a large filesystem”. Dynamic programming is such a general term that it applies regularly if you want to use it.

                  1. 4

                    I’d like to see it. I don’t think dynamic programming is a general term.

                    In fact one time I participated in this argument (which was years ago so my memory is fuzzy), a former CS professor I worked with explained the difference between memoization and dynamic programming. A bunch of 10-20+ year programmers like myself went “ah OK”. Unfortunately I don’t even remember the difference, but the point is that most programmers don’t, because dynamic programming is very uncommon.

                    What you’re describing sounds like an obvious algorithm that anyone could implement, which is not the same as dynamic programming interview questions, or dynamic programming in competitive programing.

                    As other posters mentioned, competitive programming is the main place you see it outside of a CS class.

                    1. 2

                      It’s absolutely an obvious algorithm; so is most dynamic programming. That was sort of my point.

                      Can’t share the code unfortunately, but it’s just iterating over a sorted list of file changes in reverse order and collecting statistics as we go. The dynamic part comes from the fact that we can just look at the subdirectories of a dir (that we already have numbers for) instead of recursing into it.
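
                      Roughly, the shape of it (a toy sketch with made-up paths and names, not the actual code) is something like:

                          import posixpath
                          from collections import defaultdict

                          def bytes_per_directory(file_changes):
                              # file_changes: {file path: bytes changed} - made-up input shape
                              totals = defaultdict(int)
                              for path, nbytes in file_changes.items():
                                  d = posixpath.dirname(path)
                                  totals[d] += nbytes              # credit the file's own directory
                                  while True:                      # make sure every ancestor has an entry
                                      parent = posixpath.dirname(d)
                                      if parent == d:
                                          break
                                      totals[parent] += 0
                                      d = parent
                              # Reverse-sorted order puts every child directory before its parent, so a
                              # parent's total is just a sum of numbers we already have - no re-walking.
                              for directory in sorted(totals, reverse=True):
                                  parent = posixpath.dirname(directory)
                                  if parent != directory:
                                      totals[parent] += totals[directory]
                              return dict(totals)

                          print(bytes_per_directory({"/a/b/c/x.log": 5, "/a/b/y.log": 10, "/a/z.log": 1}))
                          # {'/a/b/c': 5, '/a/b': 15, '/a': 16, '/': 16}

                      Each directory’s number is computed purely from numbers that are already finished, which is the “reuse answers to shared subproblems” part.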

                      1. 2

                        What you’re talking about could be called memoization, or it probably doesn’t even deserve that name. It’s just what a “normal” programmer would come up with.

                        That’s not the type of thing that’s asked in interview questions or competitive programming. The wikipedia page gives some examples.

                        Dynamic programming usually changes the computational complexity of an algorithm in some non-obvious way. There’s very little you can do recursing over a directory tree that doesn’t have a clear O(n) way to code it (e.g. computing statistics).

                        1. 7

                          I like Erik Demaine’s explanation, that problems where dynamic programming can be applied are ones where their subproblems and their dependencies can be modeled as a directed acyclic graph [1]. Up to you if you’d like to tackle that with a top down approach where you look at a node and calculate its solution based on the solutions of its ancestors, or a bottom up approach starting from the nodes in the DAG with no dependencies and propagate the solutions in topological order.

                          My colleague and I used it for a generalization of matrix chain multiplication (for tensors) [2].

                          [1] https://youtu.be/OQ5jsbhAv_M?t=2736

                          [2] https://github.com/TensorCon/ratcon/blob/release/opt/gencon/pathopt.py#L198

                          edit: by the definition above, even naive memoization can count as DP, if you’re caching solutions to subproblems of that structure. Doesn’t have to be at the difficulty level of competition to count as DP in that case.
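
                          A tiny, made-up illustration of that framing (nothing to do with our actual code): the same little DAG of subproblems solved top-down with a cache and bottom-up in topological order gives the same answer; the only real difference is which direction you walk the edges.

                              from functools import lru_cache

                              # Toy DAG of subproblems: edges[u] lists (v, cost), meaning u depends on v.
                              edges = {"start": [("a", 1), ("b", 4)],
                                       "a": [("b", 1), ("end", 5)],
                                       "b": [("end", 1)],
                                       "end": []}

                              @lru_cache(maxsize=None)                 # top-down: recursion order plus a cache
                              def cheapest_from(node):
                                  if node == "end":
                                      return 0
                                  return min(cost + cheapest_from(nxt) for nxt, cost in edges[node])

                              def cheapest_bottom_up():                # bottom-up: walk a topological order,
                                  best = {"end": 0}                    # so dependencies are solved first
                                  for node in ("b", "a", "start"):
                                      best[node] = min(cost + best[nxt] for nxt, cost in edges[node])
                                  return best["start"]

                              print(cheapest_from("start"), cheapest_bottom_up())   # 3 3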

                          1. 1

                            Hm interesting, searching through my personal wiki, I found a related definition which is different. I haven’t really thought about it enough to form an opinion.

                            Either way, it doesn’t change my overall point: that there are certain kinds of algorithm problems that appear in coding interviews and in competitive programming that do not show up in 99% of programming jobs. They are easy to pose and have cute solutions, but aren’t testing very much.

                            I think the “DP tables” part is key but again I haven’t thought about it enough …

                            https://blog.racket-lang.org/2012/08/dynamic-programming-versus-memoization.html

                            Memoization is fundamentally a top-down computation and DP is fundamentally bottom-up. In memoization, we observe that a computational tree can actually be represented as a computational DAG

                            In DP, we make the same observation, but construct the DAG from the bottom-up. That means we have to rewrite the computation to express the delta from each computational tree/DAG node to its parents. We also need a means for addressing/naming those parents (which we did not need in the top-down case, since this was implicit in the recursive call stack). This leads to inventions like DP tables, but people often fail to understand why they exist: it’s primarily as a naming mechanism (and while we’re at it, why not make it efficient to find a named element, ergo arrays and matrices).

                            This bottom-up / top-down distinction might have been the same as what the aforementioned professor said 5+ years ago, but I don’t remember exactly.

                            1. 1

                              So, is memoization of factorial top-down, or bottom-up?

                              1. 1

                                I would probably say neither… Either factorial or Fibonacci is so trivial that it doesn’t help to be thinking about it that way.

                                Though I think the quote hints at a clear test for whether it’s top-down or bottom-up: if you need extra state outside the call stack. I’m getting curious enough to try this out, but right now all I can do is quote what other people say.

                                In any case it’s clear to me that there’s some controversy over what dynamic programming really is. I think the issue is that a lot of algorithms could be framed that way but were not discovered that way, and not taught and learned that way.

                                1. 1

                                  I would probably say neither… Either factorial or Fibonacci is so trivial that it doesn’t help to be thinking about it that way.

                                  I think that the triviality is actually helpful here. If it’s actually true that memoization and dynamic programming are different (and there’s clearly debate on this), can 2 implementations of a trivial function, that everyone can understand, highlight the differences?
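
                                  For example, the two factorial versions might look something like this (my own toy sketch, and whether the trivial top-down one even “counts” as DP is exactly what’s being debated):

                                      from functools import lru_cache

                                      @lru_cache(maxsize=None)             # top-down: the recursion decides the
                                      def fact_top_down(n):                # order, the cache just remembers results
                                          return 1 if n <= 1 else n * fact_top_down(n - 1)

                                      def fact_bottom_up(n):               # bottom-up: we pick the order ourselves
                                          table = [1] * (n + 1)            # and fill an explicit table of subresults
                                          for i in range(2, n + 1):
                                              table[i] = i * table[i - 1]
                                          return table[n]

                                      print(fact_top_down(10), fact_bottom_up(10))   # 3628800 3628800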

                          2. 1

                            On the contrary, the stupidly naive way (recurse into every directory) is O(n^2).

                            Dynamic programming is just solving a series of problems while using the answers of shared subproblems multiple times. Memoization is a common way to implement this.

                            Yes, there are some very clever algorithms that use dynamic programming; that doesn’t mean obvious algorithms that use dynamic programming don’t also fit under the definition.

                            1. 3

                              Why would recursing into every directory be O(n^2)? You’re still only visiting every directory/file once. It seems like something is missing?

                              1. 1

                                Say you have a directory structure with a single file in it, /a/b/c/d/e

                                To get the number of bytes changed in e you need to visit e, then to get the number of bytes changed in d you need to visit d and then e, then for c you need to visit c, d, and e, and so on.

                                Like I said, it takes a really naive solution, but if you don’t remember the values you calculate anywhere for some reason it’s sum over inodes (depth of inode)… which is O(n^2) (for bad directory structures).

                                Note that I need these values for every directory, not just for one particular directory.

                                1. 2

                                  That’s still linear complexity space. Unless you’re hardlinking directories (which you then have to deal with potential recursion), it’s still O(n). If you add a file at /a/b/c/file you only visit 1 more file and no more dirs, not an exponential. O(n + n + n) or O(n + 3) still simplifies to O(n).

                                  1. 1

                                    If you add /a/b/c/file you add 4 more visits, not 1. Going from n = 3 (/a/b/file) to n = 4 (/a/b/c/file) adds 4 more visits. In other words this worst case example takes time O(sum from 1 to n of i) = O(n(n+1)/2) = O(n^2).

                                    N is the number of inodes in an arbitrary tree, not the number of files in a fixed tree.

                                    1. 1

                                      That’s still adding a linear number of operations for each file, the depth could technically be considered a different variable, say m. So for each file (n+1) you add, you also add the number of directory traversals (m) resulting in O(m+n), which simplifies again to O(n), but in reality folders are files too, so are part of n in the first place, so again O(n). Ultimately your n space is the total number of inodes, which both files and folders have.

                                      Abstractly, you’re just traversing a tree structure (or a directed graph if using links), which is well understood to be O(n) (maybe O(n^2) worst case if all folders are links, resulting in a fully connected graph), because you only visit each node once. If it were O(n^2), you would visit each node n times.

                                      Remember, Big O notation is about scaling, not the actual concrete number of operations, which is why you drop any constants or variables other than n.

                                      1. 1

                                        It’s O(mn) not O(m+n) (in the insanely naive algorithm that recalculates things every time).

                                        It’s not a single tree traversal but #internal nodes tree traversals.

                                        1. 1

                                          Even if it was O(mn) (it’s not), that still simplifies to O(n). An ‘internal nodes’ tree traversal is still O(n), n is just smaller, but again, your problem is not an internal nodes traversal, it’s a full traversal because you have to look at the blocks attached to the file (leaf) inodes, which means you need to read all inodes of all files and of all folders one time each. n = # of files + # of folders = O(n)

                                          1. 1

                                            I suppose an extremely naive solution could be to fully traverse each subtree for every folder visited, which would be… O(log n)? But even that isn’t O(n^2), as the total repeated space shrinks the deeper you get.

                                            1. 1

                                              You’re assuming a balanced tree, which is not guaranteed. Depth of tree is O(n) in pathological cases (and average case O(sqrt(n)) is typical for randomly generated trees)

                                              1. 1

                                                Ah yeah, I think it would be O(n log n), not O(log n), because you traverse the tree once for each node, and a subset of the tree for almost every n (except leaves), at least in the worst case. Still not O(n^2), and the solution for a O(n) is almost easier to conceptualize than the completely naive solution :)

                                                1. 1

                                                  and the solution for a O(n) is almost easier to conceptualize than the completely naive solution :)

                                                  No argument here…

                                                  I think it would be O(n log n)

                                                  We agree it’s O(n) * O(time tree search) now, right? And you’re trying to call the tree search time log(n)? Because trees are height log(n)? Then see the post you replied to: that’s true in a balanced tree, it’s not true in a random tree (where it is sqrt(n)), and it’s definitely not true in a pathological worst case (where a tree is just an n-length linked list).

                                                  1. 2

                                                    Yeah, the part I was hung up on before was that your naive solution traverses the entire subtree below a node for each node visit; I was stuck on the simple optimal solution. For the pathological case, basically just a bunch of folders in folders with a single file at the bottom, the depth of the tree is n, and the file inode at the bottom would be accessed n times, so O(n^2). For the common case it would be about O(n log n), where you can skip traversing larger and larger parts of the tree the deeper you get on each ‘path.’ Thanks for the discussion, I enjoyed it :)

                          3. 1

                            I think comparing memoization to dynamic programming is a category mistake: they are different kinds of things.

                            ‘Dynamic programming’ is a description of a style of algorithm. It’s divide-and-conquer, usually with overlapping subproblems, making it possible to reuse intermediate results.

                            Memoization is a programming technique to remember intermediate results, by remembering the results of function calls. You can e.g. also store the intermediate results somewhere explicitly, usually in a matrix, in which case you don’t memoize the result ‘transparently inside the function’, but use a lookup table ‘external to the function that computed the result’.

                            1. 1

                              I dunno I find that in addition to the Racket language resource I gave elsewhere in the thread, lots of people compare them:

                              https://medium.com/@afshanf/cracking-the-coding-interview-questions-part-7-a7c8f834d63d

                              A note on terminology: Some people call top-down dynamic programming “memoization” and only use “dynamic programming” to refer to bottom-up work. We do not make such a distinction here. We call both dynamic programming.

                              There does seem to be disagreement on what dynamic programming is. And many algorithms that were not derived with dynamic programming techniques could be described as dynamic programming.

                              But it seems that most people agree it’s related to memoization.

                        2. 4

                          GCC uses dynamic programming to split IA-64 instructions into bundles.

                          1. 2

                            Thanks, nice example and link! Still I would say it’s a niche skill, especially to come up with from scratch in an interview.

                          2. 4

                            I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

                            Ever do video encoding or transcoding with anything built on FFmpeg or x264? Encode images with MozJPEG? Encode an AV1 video or AVIF image with libaom? Trellis quantization in advanced lossy compression encoders is a dynamic programming algorithm.

                            1. 3

                              Hm very interesting! I was not aware of that algorithm. Paper I found:

                              https://www.mp3-tech.org/programmer/docs/e00_9.pdf

                              I would still say it’s a bad interview topic, but it’s cool to see real world usages of it.

                              1. 2

                                Oh, no disagreement there! Even after coding it up myself, I’d hate to have someone ask me to whiteboard a working implementation of trellis quantization in 40 minutes or whatever (though I’m pretty sure I could sketch out an explanation now).

                                In general I’m not a fan of whiteboard coding exercises at all. Whenever I’ve interviewed candidates I’ve always preferred the old-fashioned method of just reading their resume well ahead of time, looking up whatever piques my interest on it, and then having a friendly conversation about that. Usually that provides plenty of esoteric material for me to quiz them on and it also lets them show me their strengths and enthusiasm.

                                1. 1

                                  My current company doesn’t do a whiteboard exercise, but my previous one did… but the thing is, the usual task was to implement a basic “grep”. That is, read a file and print all of the lines that contain a user-specified string, in a language of your choice, with whatever libraries make you happy (it’s not a trick, you’re not supposed to implement Boyer-Moore on your own). Assuming you succeeded at that, we would ask you to implement a few more features, like a -v flag (only print lines that don’t match), and -A and -B flags (print context lines before and after the matching line), until you got stuck or the time for that segment was up. It wasn’t graded on minutiae like correct semicolon placement, it was just an evaluation of whether a candidate could do a trivial task, how they handled additional requirements, whether they asked sensible questions and got clarification when needed, etc. I found it pretty reasonable.
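
                                  For a sense of scale, a rough sketch of the kind of starting point we’d see (my own toy version, not any official answer key) might be:

                                      import sys
                                      from collections import deque

                                      def grep(path, needle, invert=False, before=0, after=0):
                                          # Toy version of the exercise: print lines containing `needle`, with
                                          # -v (invert), -B (before) and -A (after) style context lines.
                                          prev = deque(maxlen=before)   # ring buffer of unprinted context candidates
                                          trailing = 0                  # how many more lines to print as -A context
                                          with open(path) as f:
                                              for line in f:
                                                  line = line.rstrip("\n")
                                                  hit = (needle in line) != invert
                                                  if hit:
                                                      for ctx in prev:
                                                          print(ctx)
                                                      prev.clear()
                                                      print(line)
                                                      trailing = after
                                                  elif trailing > 0:
                                                      print(line)
                                                      trailing -= 1
                                                  else:
                                                      prev.append(line)

                                      if __name__ == "__main__":
                                          grep(sys.argv[1], sys.argv[2])   # e.g. python grep.py build.log error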

                            2. 4

                              I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

                              I used Warshall’s algorithm (which is dynamic programming) to compute the transitive closure of a graph for a typechecker. This is, in my experience, a very common algorithm.

                              In high school, I wrote a program for my professor that places students into groups of 4 such that their Myers-Briggs personalities are as different as possible. This used dynamic programming.

                              A professor of mine (who taught the subject to me) used dynamic programming for some kind of RNA sequencing problem in a paper he published. One of our homework assignments had us arrive at a watered down version of his (and his co-authors’) algorithm.

                              I’m fairly certain that at least some fuzzy string matching algorithms use string distance, which is also solved using dynamic programming.

                              These are all diverse applications of DP. In my personal, subjective experience, the idea that DP is in any way obscure or dated is absurd.

                              Edit:

                              To be more concrete, the “transitive closure of a graph” is for the graph of dependencies, computing the set of all functions that a particular function depends on. This is as described in the Haskell Report.

                              For fuzzy string matching, I have in mind something like fzf, though I cannot say with certainty that it uses string distance (I’m unfamiliar with its implementation).

                              Here’s the paper that I think I’m referencing: Statistical Mechanics of Helix Bundles using a Dynamic Programming Approach
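
                              For the transitive closure part, the sketch is short (toy names and input shape, not the typechecker’s real representation):

                                  def transitive_closure(deps):
                                      # deps: {name: set of names it references directly}
                                      reach = {u: set(vs) for u, vs in deps.items()}
                                      for k in deps:                 # Warshall: allow k as an intermediate step
                                          for u in deps:
                                              if k in reach[u]:
                                                  reach[u] |= reach[k]
                                      return reach

                                  print(transitive_closure({"f": {"g"}, "g": {"h"}, "h": set()}))
                                  # f reaches g and h (transitively), g reaches h, h reaches nothing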

                              1. 2

                                Thanks for the examples. The claim is definitely not that it’s outdated or obscure; the claim is that it’s not a good interview question because it doesn’t show up much at work. Although there were lots of people here who pointed out interesting uses of dynamic programming, that’s not incompatible with the idea that you could have a 10 or 20 year programming career and never use it.

                                Side note: I’m familiar with the Floyd Warshall algorithm but I never thought of it as dynamic programming. I think part of the issue is that I may have a more narrow definition of it than others. (I think people even say the linear time fibonacci is an example of dynamic programming, which I find silly. I just call that the obvious algorithm. I guess it can be used to illustrate a principle.)

                                Even so, I definitely think it’s more popular in universities, and certain domains like bioinformatics. In contrast to what people on this site typically do “for work”.

                              2. 3

                                I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

                                I do a lot of work with text processing – computing the edit distance between two strings is something I do very often. That’s a classic dynamic programming algorithm. There are probably hundreds of open source packages that do this or some variation thereof.
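
                                For anyone who hasn’t seen it, the textbook DP version is only a few lines (my own sketch, not any particular package’s implementation):

                                    def edit_distance(a, b):
                                        # dist[i][j] = edits needed to turn a[:i] into b[:j]
                                        dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
                                        for i in range(len(a) + 1):
                                            dist[i][0] = i                          # delete everything
                                        for j in range(len(b) + 1):
                                            dist[0][j] = j                          # insert everything
                                        for i in range(1, len(a) + 1):
                                            for j in range(1, len(b) + 1):
                                                same = a[i - 1] == b[j - 1]
                                                dist[i][j] = min(dist[i - 1][j] + 1,                       # delete
                                                                 dist[i][j - 1] + 1,                       # insert
                                                                 dist[i - 1][j - 1] + (0 if same else 1))  # substitute
                                        return dist[len(a)][len(b)]

                                    print(edit_distance("kitten", "sitting"))   # 3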

                                1. 3

                                  Just to add to the list of responses clearly demonstrating Cunningham’s Law:

                                  I believe the Knuth-Plass line-breaking algorithm used in LaTeX to lay out text “optimally” uses dynamic programming. This was done for efficiency, as opposed to using some kind of general global optimization routine. It’s also the reason why LaTeX doesn’t support “backtracking”.

                                  1. 2

                                    It’s also the reason why LaTeX doesn’t support “backtracking”.

                                    Sile uses a variant of the same dynamic programming algorithm to lay out paragraphs on a page. The original paper describing the algorithm says that TeX wanted to use it like that, but it would require more than one entire megabyte of state for a large document, which was infeasible.

                                    1. 1

                                      Definitely an instance of Cunningham’s law at work :) I should make another go for my pet problems:

                                      • it’s impossible to make a zsh-like interactive interface on top of GNU readline
                                      • you can’t make a constant-space linear-time model of computation that’s more powerful than regular languages, and that can parse shell/JS/Python/C++
                                      • you can’t make an extended glob to regex translator in less than a week (https://github.com/oilshell/oil/issues/192)

                                      Thanks for the example. If there were more specific links I would make a blog post out of this :)

                                      And although my comment was a tangent, it did motivate me to get out the “Algorithm Design Manual” and flip to the part on dynamic programming. Though I remember the applications in that book being interesting but seemingly far removed from what programmers do day-to-day. It seemed to be by a professor who consulted on algorithms for various companies, which is an interesting job!

                                    2. 1

                                      The Grasshopper routing library uses contraction hierarchies, which are implemented using Dijkstra’s shortest path algorithm and A* search, which are special cases of dynamic programming.

                                      I have to agree it’s not something most people will use every day, but it never hurts to have a general idea how it works.

                                      1. 1

                                        Here is a concrete example of Dynamic Programming that you use every day: Word Wrap. Knuth has an algorithm that is often used for maximizing the number of words per line.

                                        Also the field of bioinformatics often uses the Levenshtein distance when matching two DNA strands.

                                        Also I would like to mention the single most important thing I learned from Dynamic Programming: start at the end case, and figure out what constraints can work from there. For example, think about the last recursion call and what constraints it needs, and go backwards from there.
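
                                        A toy sketch of the word wrap idea (a simplification for illustration, not Knuth’s actual algorithm), written to start from the end case as described:

                                            from functools import lru_cache

                                            # best(i) = minimal total "badness" for laying out words[i:], where a
                                            # line's badness is the square of its leftover space.
                                            @lru_cache(maxsize=None)
                                            def best(i, words, width):
                                                if i == len(words):
                                                    return 0                      # the end case: nothing left
                                                cost, line_len = float("inf"), -1
                                                for j in range(i, len(words)):
                                                    line_len += len(words[j]) + 1 # +1 for the separating space
                                                    if line_len > width:
                                                        break
                                                    cost = min(cost, (width - line_len) ** 2 + best(j + 1, words, width))
                                                return cost

                                            words = tuple("the quick brown fox jumps over the lazy dog".split())
                                            print(best(0, words, 15))             # minimal total badness at width 15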

                                      1. 2

                                        at uber… we’ve been preaching microservices since 2018.

                                        1. 11

                                          It’s been since 2015 and I would hesitate to call it “preaching”: it’s been sharing what worked over the eng blog, and what did not work for Uber at the time.

                                            Uber went from monolith to SOA in 2015. This SOA followed a microservice-based architecture. And different engineers have been sharing what they’ve learned along the way: the steps it usually takes to build a microservice, addressing testing problems with a multi-tenancy approach, or how and why teams use distributed tracing. We also open sourced some of our tools, like Jaeger, which is part of the Cloud Native Computing Foundation’s graduated projects, alongside Kubernetes and Prometheus.

                                          I’ve not seen anyone preaching, meaning anyone wanting to convince or convert anyone else. I personally tell people “here’s what we do, but your mileage will very likely vary”. I’ve always found it interesting to understand how other companies address their challenges and what worked and why.

                                            Also, you don’t need to look far to hear all sorts of different things that work for other companies - some that might seem unconventional for companies of their size or traffic. Shopify shared how they are still a monolith, albeit a modularized one. Stack Overflow shared how in 2013 they ran on a lean hardware stack that scaled up by 2016, but still includes zero usage of the cloud. And the list could go on.

                                            All of these can serve as inspiration: but at the end of the day, you need to make decisions in your environment that you think will work best. Anyone copying the likes of Google, Uber, Shopify, Stack Overflow or others when they’re not even similar in setup will be disappointed.

                                          1. 3

                                            Anyone copying the likes of Google, Uber, Shopify, Stack Overflow or others when they’re not even similar in setup will be disappointed.

                                            That is exactly what everybody around me is doing. Nobody knows the way to success, so they are copying the behaviors of famous successful companies.

                                              Since I cannot edit my original comment, I want to back off on my tone. It’s not like I have anything against people at Uber. It’s just that I see too many conference talks where engineers describe how microservices solve problems, while forgetting to mention the new problems that appear - and never retroactively admitting that the microservices were a mistake.

                                            1. 2

                                              What problems did moving from a monolith solve for y’all? Were they more people problems or technical ones?

                                            2. 2

                                              Exactly. :-)

                                              For context, I was sweeping a set of around 150 processes running on a good dozen machines into a single JAR back in 2003, with the huge performance increases and reliability/deployment improvements you might expect.

                                            1. 64

                                                I wrote that tweet that is making the rounds. Not many things fit in 280 characters, and with Twitter being immutable, there’s not much you can go back and clarify. So let me give some more details on this forum.

                                              1. I speak for my experience, not for all of Uber. Heck, we have hundreds of teams, 95% of whom I don’t know. And teams are autonomous and decide how and what they do (including following guidelines or ignoring them partially or fully) - even if I wanted to, I couldn’t make sweeping statements.
                                              2. Uber has - and still has - thousands of microservices. Last I checked it was around 4,000. And, to be very clear: this number is (and will keep) growing.
                                              3. I’ve been working here for almost 4 years and see some trends in my org / area (payments). Back in the day, we’d spin up a microservice that did one, small thing just like that. We had a bunch of small services built and maintained by one person. This was great for autonomy, iteration speed, learning and making devops a no-brainer. You could spin up a service anytime: but you’d be oncall for it.
                                                4. Now, as my area is maturing and it’s easier to look ahead, as we create new platforms, we’re doing far more thoughtful planning on new services. These services don’t just do one thing: they serve one business function. They are built and maintained by a team (5-10 engineers). They are more resilient and get far more investment, development- and maintenance-wise, than some of those early microservices. Cindy called these macroservices and I said we’re doing something similar. The only difference in what we do is a service is owned by one team, not multiple teams.
                                              5. While many microservices are evolving like this, the majority, frankly, stays as is. Thousands of microservices bring a lot of problems that need to be solved. Monitoring. Testing. CI/CD, SLAs. Library versions across all of them (security, timezone issues). And so on. There are good initiatives we keep doing - and sharing what works and open sourcing some of the tools we build to deal with the problems, as they pop up. Like testing microservices with a multi-tenancy approach. Distributed tracing across the services. All of this is a lot of investment. Only do microservices at scale if you’re ready to make this investment.

                                              So no, Uber is not going no-microservices like I’m seeing many people interpret it. It’s not even going to less microservices. And when I said “we’re moving”, that was not exact phrasing. New microservices are more thoughtfully created in my team and in my org. These are “larger” services than some of the early, small, focused, microservices.

                                                Microservices worked well at Uber in many ways and keep helping in other areas. There are problems, of course, and you deal with the problems as you go. This would be the same with e.g. a monolith with thousands of developers, SOA with thousands of developers or {you name whatever you do} with thousands of developers. The number of services is still growing, as a whole, as the business grows - though in some orgs, like mine, they are at level, or even going down a bit (though this is not the goal itself). But not all microservices are equal any more. The critical ones look less like your classic microservice - or at least what I called microservices years back.

                                              On another note: everyone interprets the name “microservice” differently. I’ll write a post summarizing my experiences with the ups and downs of microservices at scale. For now, hopefully this gives some more color.

                                              Any other questions, just ask.

                                              1. 4

                                                Thank you for posting a clarification. It is of much more value than the Twitter thread and would love to see a blog post about it.

                                                This is a good example demonstrating why I think linking Twitter threads is poor form and I discourage people from doing it.

                                                1. 4

                                                  It’s basically the whole microkernel thing all over again, innit?

                                                  Monolithic kernels: “Yikes, one bug can bring the whole OS down, let’s split that out”

                                                  Microkernels: “Yikes, having loads of little services effectively work together is actually much harder than we thought!”

                                                  Current “hybrid” kernels: “Let’s take the best of both approaches and combine them where it gives us the best bang for our buck”

                                                1. 6

                                                    What size teams do you work on? I’m asking because the larger the team, the more quickly the benefits of unit testing become obvious.

                                                    I needed a few “aha” moments to realise that writing unit tests is not only not a waste of time, but a huge benefit for any project that is not throwaway.

                                                  Early in my career, I built a long-running project mostly by myself where I wrote unit tests, “by the book”. A year later, these tests saved me as I got back to the project I’d long forgotten about and could refactor it, while being confident that it worked.

                                                    Later, working on teams, I realized unit tests are the best way to “guard” the correctness of my code - better than any documentation or convention. I had my colleagues, who did not write tests, break my tests multiple times - and all I had to do was tell them to fix my tests. Later, a pattern emerged: those not writing tests had a visibly higher bug rate than those who did, and the “no unit test” gang slowly converted, after bug after bug where we pointed out how 90% of the time, a test would have prevented it.

                                                    And then there’s the implication on architecture. Unit testing forces you to design a testable architecture and become hands-on with things like dependency injection, abstractions via interfaces and so on.
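
                                                    A tiny, made-up illustration of what I mean (the names and numbers are invented for the example):

                                                        class PriceConverter:
                                                            def __init__(self, rates):
                                                                self.rates = rates   # anything with a .rate(currency)

                                                            def to_usd(self, amount, currency):
                                                                return round(amount * self.rates.rate(currency), 2)

                                                        class FakeRates:             # test double - no network needed
                                                            def rate(self, currency):
                                                                return {"EUR": 1.1}[currency]

                                                        def test_to_usd():
                                                            assert PriceConverter(FakeRates()).to_usd(10, "EUR") == 11.0

                                                        test_to_usd()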

                                                  Everyone has a different journey, but I’d love to hear what “aha” moments you’d had (or not yet had) with unit testing.

                                                  1. 4

                                                    Everyone has a different journey, but I’d love to hear what “aha” moments you’d had (or not yet had) with unit testing.

                                                    At first I found unit testing tedious: why bother writing a whole unit test when you can manually test a few relevant inputs and be done forever? More importantly, why bother spending a bunch of time setting up some unit testing framework?

                                                    Then I found manual testing tedious: sometimes iterating on some code requires testing many times. To avoid typing the same inputs over and over again, I saved my test input as text files and used shell redirection.

                                                    Then I found manually examining the output tedious: larger programs could have many execution paths requiring many inputs to exercise. To avoid missing differences, I wrote shell scripts to diff output with expected output.

                                                    At some point I realized I was just unit testing indirectly with text output instead of actual function inputs and outputs. Once I finally bothered spending 2 hours getting standalone JUnit working, I felt like a huge moron. For C/C++ I used this simple #ifdef TEST idiom I found in the FreeBSD rand(3) implementation. I still use it for simple stuff. Some time later I started using googletest once I got more comfortable building and linking external projects to my own.

                                                    I definitely thought of unit testing as 100% test coverage, even for trivial crap, rather than automating manual tests. Likewise I thought of mocks as enterprise OO hype rather than an easier way to set up and test obscure edge cases.

                                                    1. 2

                                                      I’ve nearly always worked on teams of 5 or less. The last two software jobs, those teams are part of a larger team as well, but code was mostly segregated. Ish.

                                                      As far as “aha” moments have gone, I had a nice experience with working on PISC, which only had me working on it, but getting tests in place helped me keep from changing things I didn’t mean to change. Reading about how testing worked in Moon Pig has felt like another aha moment, as was working through half of 99 Bottles of OOP, which showed me how quickly unit tests should be running. I’ve yet to have unit tests protect my code as such, but I’m convinced enough of the value of testing to give it a good run.

                                                      As an added factor, the NUnit Visual Studio extension is not a snappy experience. Comparing that to the testing framework used in 99 Bottles of OOP, I realized why the Red -> Green -> Refactor cycle is a thing.

                                                    1. 29

                                                      I worked at large companies with user-facing products similar to what the author referenced - not Apple, but Skype, Microsoft and Uber. I worked or observed the team closely on OSes like XBox One and Windows 8, similar to the MacOs example. I think the example is overly dramatic.

                                                        In software - especially with BigTechCo - the goal is not to ship bug-free software. It’s to ship software that supports the business goal. Typically this means gaining new customers and reducing user churn. Ultimately, the end goal of publicly traded companies is to provide shareholder value. And Apple is damn good at this, generating $50B in profit on $240B in revenues per year.

                                                      All the examples in this blog post are ones that won’t result in user churn. The Catalina music app having a buggy section? Big deal, it will be fixed in the next update. The Amazon checkbox issue? Same thing: it will be prioritised and fixed sometime. They are all side-features with low impact. The team might have known about it already. Or - more likely - this team did not spend budget on thorough testing, as what they were building isn’t as critical as some other features.

                                                        The Skype app was full of smaller bugs like this: yet it dominated the market for a long time. When it failed, it was not for this. Catalina likely spent resources on making sure booting was under a given threshold and updates worked flawlessly. Things that - if they go wrong - could lead to loss of customers, gaining fewer new ones. So things that would directly impact revenue.

                                                      Finally, a (very incorrect) fact:

                                                      Lack of resources? This is Apple, a company that could’ve hired anyone in the world. There are probably more people working on Music player than on the entire Spotify business. Didn’t help.

                                                      This is plain false and very naive thinking. Teams are small at Apple and the music player team for Catalina is likely 10-15 people or less, based on my experience. While Apple could hire an army for an app like this, then they would not be the $1T company they are today. They have that valuation because they are very good at consistently generating high profits: for every $10 of revenue, they generate $2 of profit. They hire the number of people needed to make a good enough product and don’t spend money just because they have it.

                                                        What did change is Apple used to have a huge budget for manual testers: it was insane, compared to other companies. Due to rationalising - publicly traded company and all - the testing department is likely smaller for non-critical apps. Which puts them in line with, or slightly above, the rest of their competitors.

                                                      I am not saying that bugs in software are great. But consumer-facing software development is more about iteration, speed and launching something good enough, than it is about perfection. It’s what makes economic sense. For other industries, like air travel or space, correctness is far more important, and it comes at the expense of speed and iteration.

                                                      It’s all trade-offs.

                                                      1. 12

                                                        It’s to ship software that supports the business goal.

                                                        This is really the fundamental miss of the author. Author doesn’t understand that (1) crap happens and (2) the level of quality required for a satisficed user is lower than he thinks.

                                                        Teams are small at Apple and the music player team for Catalina is likely 10-15 people or less, based on my experience.

                                                        Also, I can’t lay my head on a citation, but I think that it’s been studied that smaller teams produce better quality software (up to a point, ofc).

                                                        1. 3

                                                          All the examples in this blog post are ones that won’t result in user churn. The Catalina music app having a buggy section? Big deal, it will be fixed in the next update. The Amazon checkbox issue? Same thing: it will be prioritised and fixed sometime. They are all side-features with low impact. The team might have known about it already. Or - more likely - this team did not spend budget on thorough testing, as what they were building isn’t as critical as some other features.

                                                          The inability to open the iTunes Store might be bad for sales, so they’ll probably want to fix that one. But yes, as long as the basic features are working, these bugs are fine, on some level. This is how it is.

                                                          I think he is trying to highlight something on a more fundamental level: it should not be so easy to write these kinds of bugs. The developers should have to go out of their way to write them. But with the tools they have been given, it seems they have to work very hard to avoid writing bugs. It is like they have been given hammers that by their nature have a tendency towards hitting thumbs and sometimes manage to hit both your thumbs at the same time.

                                                          Let’s turn it around. Suppose software had a fundamentally good and auspicious nature. Suppose also that your product owner was a tricky fellow who wanted to add some bugs in your program. He comes up with a user story: as a user, sometimes I want to have an item be selected, but not highlighted, so as to further my confusion. I think the result of this would be a commit with some kind of conditional statement, table-driven code or perhaps an extra attribute on the list items that activates the bug path. The point being that you would need to add something to make it buggy. With the tools the Catalina music app team had, they very likely did not have to add anything at all to get those bugs.

                                                          The instances of bugs he brings up suggests to me that the tools involved were not used for their intended purpose. They were used to create simulacrum software. The Amazon checkboxes probably get their state from a distributed system where they “forgot” to handle multiple pending state changes. They could instead have used a design where this would never be an issue at all. If it had been designed properly, they would indeed have needed to add code to get it that buggy. And the buggy list items are probably not even in lists, but merely happen to sometimes visually resemble lists. And so on.

                                                          It is not good that this sort of thing happens regularly. One example from my own experience: the Swedish Civil Contingencies Agency (MSB) has an app that alerts you to important events. I cannot count how many times it has lost its settings and has defaulted to alerting about everything that happens everywhere. I have uninstalled that app. When the war arrives, I’ll be the last one to know.

                                                          1. 4

                                                            Teams are small at Apple and the music player team for Catalina is likely 10-15 people or less, based on my experience.

                                                            Yes, this accords with my experience. I would be surprised if it were that many people; the number of people who were working on the iTunes client was shockingly small, and they’d never grow the team just for Music.

                                                            1. 3

                                                              Based on my experience at Apple, I’d be surprised if the Music app was an actual team. Much more likely it was a few folks from another team that was tasked with creating it as a part-time project and wasn’t their primary project. Or, it could’ve been 2 folks who were fairly junior and tasked with writing it with occasional assistance.

                                                              In my experience teams and projects were sorely understaffed and underfunded (unless it was wasteful projects like the doomed self-driving car, in which case they were showered with far too much money and people). It was just shocking to work at a company that had so much excess cash and couldn’t “afford” to add people to projects that could really use them.

                                                            2. 2

                                                              All the examples in this blog post are ones that won’t result in user churn. The Catalina music app having a buggy section? Big deal, it will be fixed in the next update. The Amazon checkbox issue? Same thing: it will be prioritised and fixed sometime. They are all side-features with low impact. The team might have known about it already. Or - more likely - this team did not spend budget on thorough testing, as what they were building isn’t as critical as some other features.

                                                              One bug like this would not make me “churn”, but two or three would. I no longer use Chrome, nor iTunes, nor iOS, because of exactly this type of drop in quality. I no longer even bother with new Google products, because I know that they’re more likely than not to be discontinued and dropped without support. I no longer consider Windows because of all the dark patterns.

                                                              I am on the bleeding edge relative to less technical users, but I am also a lazy software dev, meaning I hate tinkering just to make something work. I’ll probably go with GNU for my next computer. And a year or two later, I bet so will my neighbor who just uses email and my friend who just needs to edit photos and my other friend who just writes papers.

                                                              1. 7

                                                                I no longer use Chrome, nor iTunes, nor iOS, because of exactly this type of drop in quality . . . I hate tinkering just to make something work.

                                                                I’ll probably go with GNU for my next computer.

                                                                🤨

                                                                1. 1

                                                                  Not sure if you intended for that to show up as “missing Unicode glyph”, but it works.

                                                                  You’ve got a point there.

                                                                  Until now, I have been using macOS for the hardware support, a few niche apps for stuff like playing music, and a GNU VM (Fedora LXDE) for dev which has proven to be a low-maintenance setup all around.

                                                                  1. 2

                                                                    The “missing Unicode glyph” is the Face with one eyebrow raised emoji and shows up for me using Chrome on iOS and Windows.

                                                                    1. 2

                                                                      And FF on Android

                                                            1. 4

                                                              Bootcamps as a means to enter the industry. In 2010, a tech bachelor’s degree was a de facto requirement for most jobs, with a few being fine with self-taught applicants.

                                                              I haven’t seen a change in this. Tech has always had ways in for people with no degree (like me) or with non-tech degrees. I don’t know how to measure this, as it’s always been anecdotal for me, but in both small and large orgs I’ve seen something like 5-20% of people without CS degrees.

                                                              1. 2

                                                                I think you have a point. What I tried to express is that I’m seeing a lot more people from self-taught backgrounds enter the industry via a bootcamp, compared to the time when there were no bootcamps. And some companies are more open to treating a coding bootcamp as a token of “this person knows something about tech”.

                                                                Where this change has been visible, though, is in more and more of the large tech companies - Google, Apple, IBM - dropping bachelor’s requirements from job applications, which were still standard back in 2010. Hopefully more companies will follow, so it won’t just be smaller companies/startups that don’t care about degrees.

                                                              1. 2

                                                                Second, the new system was built in an elegant way, without any workarounds. However, the legacy system it replaced has a lot of edge cases: edge cases we discovered during the shadowing/rollout phase. In some cases, we needed to make tough business decisions whether to support an edge case or drop it.

                                                                This sounds like a roundabout way of saying “The new system was made smaller than the old system, and smaller systems are easier to make.” Am I getting that right? Not criticising the choice of words; genuinely curious.

                                                                  Very salient learnings, either way. A lot of it echoes Deming’s teachings on quality and continuous improvement.

                                                                1. 3

                                                                  The new system we built was more complex than the old one, as it was more future-proof. What was not clear when we did the rewrite was:

                                                                  1. The amount of undocumented workarounds the old system had (“hidden business logic”)
                                                                  2. How some known bugs in the old system were, in fact, features: fixing them led to unexpected behaviours, and clients that had built on top of those bugs started failing.

                                                                  The crazy part was that we knew this might happen, so we did our research. But it was only during rollout and monitoring that we caught rogue clients using the old system in weird ways, or teams who had built on top of explicitly deprecated APIs that were supposed to be retired 12-18 months earlier but never were.

                                                                  Note that the system I’m talking about had a good 20-30 known consumers and a few new ones we discovered during migration.

                                                                1. 2

                                                                  @gregdoesit what has the holocaust memorial in Berlin got to do with distributed systems?

                                                                  1. 1

                                                                    @james - it doesn’t and thanks for flagging. I did not check what the photo was taken of. I’ve replaced the image with something that hopefully is more neutral.

                                                                  1. 2

                                                                      After blogging quite a bit in 2019, I’m planning to learn how to write a tech book on growing as a software engineer.

                                                                      In general, I find that setting a goal to teach something you don’t yet know is a great approach. In my case, I’m sure I’ll learn a lot more about software engineering and career growth next year, through the book project, than I have before.

                                                                    1. 9

                                                                        I have also found that writing is a forcing function for thinking and creating. Writing first, then doing the work (whatever the work is), forces you to slow down and structure things before jumping into large tasks.

                                                                        It’s why writing down plans for projects before starting them (RFCs) works. I’ve also observed that writing well - or writing at all - is an undervalued engineering skill.

                                                                      1. 5

                                                                          We have an internal RFC process for engineering that people use to write up and propose changes they want to implement (the judgement call being: if it’s something that’s worthy of input from one or more engineers, put it through the RFC process).

                                                                          It’s fantastic, and the benefits are immediately obvious. It forces you to really think through what you’re proposing, and people won’t pull you into meetings whenever they want – everything’s in one place for them to read, discuss and refine.

                                                                          Of course, the real problem is when people just won’t read, but getting them to look in there is more of a slow cultural change.

                                                                      1. 5

                                                                        On my team of software engineers, we rotate the project lead role. I thought it would be interesting to share the approach we came up with.

                                                                        I’d be interested to hear: how do you run projects on your team and what role do you play, as an engineer/developer?

                                                                        1. 1

                                                                            Thanks for posting this. As an engineering manager, it’s always useful to learn how other people are doing it (better than I do, I suspect).

                                                                        1. 4

                                                                          I also loved this book - and applied the core message (empowering people) on my engineering team. With pretty good success: https://blog.pragmaticengineer.com/a-team-where-everyone-is-a-leader/

                                                                          1. 5

                                                                              Really liked the article - clear and concise! However, I ended up disagreeing with the disagreements.

                                                                            • A strong stance against exceptions

                                                                              Exceptions are good only for exceptional cases, not for any unfavourable behaviour. An HTTP request failing is not an exceptional case, but most systems treat it as one, creating rarely exercised code paths that can lead to unforeseen failure conditions. (Further anecdotal proof/opinion.) There’s a rough sketch of what I mean below, after these points.

                                                                            • John is quite in favor of writing comments

                                                                              The post author’s stance is not congruent with their agreement on deep modules: if a class/module has enough depth, it might require a “why” comment.
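                                                                              On the exceptions point, to make it concrete: here’s a rough sketch - in Go purely because failures are plain return values there; the URL and names are made up - of handling a failed HTTP request as an ordinary outcome on the main code path rather than an exception.

                                                                              ```go
                                                                              // Rough sketch (names and URL hypothetical): a failed HTTP call handled as
                                                                              // an ordinary, expected outcome, not as an exception.
                                                                              package main

                                                                              import (
                                                                                  "errors"
                                                                                  "fmt"
                                                                                  "net/http"
                                                                              )

                                                                              var errProfileUnavailable = errors.New("profile service unavailable")

                                                                              // fetchProfile treats network failures and 5xx responses as normal return
                                                                              // values the caller is forced to look at.
                                                                              func fetchProfile(url string) (*http.Response, error) {
                                                                                  resp, err := http.Get(url)
                                                                                  if err != nil {
                                                                                      return nil, errProfileUnavailable
                                                                                  }
                                                                                  if resp.StatusCode >= 500 {
                                                                                      resp.Body.Close()
                                                                                      return nil, errProfileUnavailable
                                                                                  }
                                                                                  return resp, nil
                                                                              }

                                                                              func main() {
                                                                                  resp, err := fetchProfile("https://example.com/profile/42")
                                                                                  if err != nil {
                                                                                      // The unhappy path sits next to the happy path and is exercised in
                                                                                      // normal operation (retries, fallbacks), not in a rare catch block.
                                                                                      fmt.Println("falling back to cached profile:", err)
                                                                                      return
                                                                                  }
                                                                                  defer resp.Body.Close()
                                                                                  fmt.Println("got profile:", resp.Status)
                                                                              }
                                                                              ```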

                                                                            1. 2

                                                                              Thanks for the feedback and appreciate you voicing your disagreement.

                                                                              For exceptions, the book seemed to suggest going out of the way to throw them. I’m not that dogmatic and don’t mind them - as long as there’s good monitoring, and we follow a zero-exceptions policy. So I’m more on the mitigation side, than prevention, especially for backend software.

                                                                              For comments, I’m not a fan of inline comments. The “why” comments, especially at module/class level, I am definitely for. I really like how you put it: for deep classes, comments can probably explain a lot of their depth, helping maintenance greatly. Until I read your comment here, I did not think along this dimension, though. Thanks!

                                                                            1. 5

                                                                              I read this book a few months ago, and like Mr. Orosz I felt like it was pretty good but kind of a mixed bag.

                                                                                The distinction he makes between tactical and strategic programming is a fantastic point, and probably my #1 takeaway. Shallow vs deep modules is also a great metaphor that I apply in my career.

                                                                              I think my biggest disagreement, or shall we say eyebrow raise, was the juxtaposition of his opinion on comments and unit tests.

                                                                              Comments provide the only good way to fully capture abstractions, and good abstractions are fundamental to good system design. If you write comments describing the abstractions at the beginning, you can review and tune them before writing implementation code. To write a good comment, you must identify the essence of a variable or piece of code: what are the most important aspects of this thing?

                                                                              – Section 15.3 p. 131

                                                                              Now, compare that to:

                                                                              Although I am a strong advocate of unit testing, I am not a fan of test-driven development. The problem with test-driven development is that it focuses attention on getting specific features working, rather than finding the best design.

                                                                              – Section 19.4 pg. 155

                                                                              Wow do I disagree with that.

                                                                              Now, I’m not dogmatic about TDD. Sometimes you just have to prototype things without tests when you’re finding the design. But TDD helps me think about interfaces from the outside in exactly the same way he talks about the benefits of comments.

                                                                                IMO, comments are less helpful than unit tests for this because they tend to drift out of sync with the code: you usually don’t have a system to ensure their correctness. (Yes, Go is a notable exception.)
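                                                                                (For anyone unfamiliar, Go’s testable examples are one such system - doc examples that `go test` compiles and runs, so the documented output can’t silently drift. A minimal sketch, with a made-up package:)

                                                                                ```go
                                                                                // greet/greet.go - a made-up package, just to have something to document
                                                                                package greet

                                                                                import "fmt"

                                                                                // Hello returns a greeting for the given name.
                                                                                func Hello(name string) string {
                                                                                    return fmt.Sprintf("Hello, %s!", name)
                                                                                }
                                                                                ```

                                                                                ```go
                                                                                // greet/example_test.go
                                                                                package greet

                                                                                import "fmt"

                                                                                // ExampleHello is rendered by godoc as usage documentation, but `go test`
                                                                                // also runs it and fails when the real output stops matching the
                                                                                // "Output:" comment below - so this piece of documentation cannot rot.
                                                                                func ExampleHello() {
                                                                                    fmt.Println(Hello("gopher"))
                                                                                    // Output: Hello, gopher!
                                                                                }
                                                                                ```

                                                                                If Hello’s behaviour changes, that “comment” fails the build instead of quietly going stale.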

                                                                              He addresses this last concern this way:

                                                                              Comments do sometimes get out of date, but this need not be a major problem in practice. Keeping documentation up-to-date does not require an enormous effort.

                                                                                – Section 12.3 pg. 98

                                                                              LMAO. Okay dude. I don’t even know how to respond to this.

                                                                              1. 3

                                                                                  Thanks for finding the examples. Yes, it seems you and I have similar views on tests/comments. Most of my experience with (and pushback on) too many comments comes from having seen them become out of date and confusing. I do see how, e.g. in an intro software design course, this might not come up. But like you, I’d be very, very hesitant to choose comments over tests for maintenance.

                                                                                1. 4

                                                                                  I think both are valuable, and on balance I think a codebase with appropriate comments is better than one without.

                                                                                  But if I had to choose between unit tests and comments, I’d pick unit tests without hesitation.

                                                                              1. 7

                                                                                The author overlooks that, often times, the cognitive load of engaging with the owner of a project is greater than “scratching the itch”. My approach in that situation is to scratch whatever “itch” I have to unblock my task at hand. Then, when I have time and if I find my patch worth submitting, I follow the guidelines of the project and submit a PR. I attach no pride to that submission and expect it to be rejected for whatever reason. When that happens, I am usually happy to rework the patch.

                                                                                1. 5

                                                                                    In addition to this, I feel the “Why are you making this change? What problem are you trying to solve?” question should have been the first comment on the PR, instead of arguing that “it’s a bad change”. People usually don’t make these kinds of changes for the fun of it.

                                                                                  1. 3

                                                                                      I would say that, to me, this goes backwards. If someone offers a change, I expect them to explain why they’re doing it.

                                                                                      Communicating the intent beforehand is always better than having to ask why after the work is done.

                                                                                      The author even gives the very particular case of a distributed team, where a team might have lost hours of implementation work without even spending 10 minutes in discussion first.

                                                                                    1. 1

                                                                                      Well, ideally, yes. But people get caught up in their reasoning at times and assume it’s all just as obvious to other people as it is to themselves. Sometimes it is; often it’s not. It’s a simple human mistake that I think we’ve all made at least once or twice.

                                                                                        Either way, don’t start arguing over the “how” if the “why” isn’t clear, which seems to be what happened in the OP’s case.

                                                                                      1. 1

                                                                                        Agreed. I feel this often happens with strong ownership models.

                                                                                  2. 1

                                                                                    Good point. I’m the OP and perhaps I could have made it more clear that I’ve seen this issue happen more frequently with distributed teams in different timezones, especially when trying to make changes to less mature codebases owned by another, remote team.

                                                                                     As I’ve observed these instances from a step back, I found it wild how narrowly the person proposing the change was thinking, and how much time they poured into making a change that never made sense from the platform’s point of view. And these are smart engineers, who were under pressure to solve their pressing problem, on the spot.

                                                                                     I’ll give you an example: a customer of a platform was frustrated that localization changes took around a day for the system to pick up, and this was blocking their team. They noticed that whenever the service was restarted, localization changes got picked up immediately. So they put together a pull request implementing a way to restart service instances on demand whenever localization changes were made. They didn’t think through what restarts would mean in a distributed world, or how they could impact jobs, workflows and the many customers of this platform. And they added no commentary on the pull request.

                                                                                     Had they talked with the platform team and outlined the problem, they would have learned that this was a known issue, and that the way it was going to be solved was… well, the platform emptying its localization cache. It just had not been a priority until then, and the platform engineers were unaware of the issue this caused.
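                                                                                     To sketch the shape of that fix (this is not our actual code, and all names are made up, Go just for illustration): an internal endpoint that flushes the in-memory localization cache on demand, instead of restarting instances.

                                                                                     ```go
                                                                                     // Hypothetical sketch: new localization strings show up without a restart.
                                                                                     package main

                                                                                     import (
                                                                                         "net/http"
                                                                                         "sync"
                                                                                     )

                                                                                     type localizationCache struct {
                                                                                         mu      sync.Mutex
                                                                                         entries map[string]string
                                                                                     }

                                                                                     // Flush drops every cached string; the next lookup reloads from the
                                                                                     // source of truth.
                                                                                     func (c *localizationCache) Flush() {
                                                                                         c.mu.Lock()
                                                                                         defer c.mu.Unlock()
                                                                                         c.entries = make(map[string]string)
                                                                                     }

                                                                                     func main() {
                                                                                         cache := &localizationCache{entries: make(map[string]string)}

                                                                                         // Called by the localization pipeline right after publishing new
                                                                                         // strings, instead of restarting service instances.
                                                                                         http.HandleFunc("/internal/localization/flush", func(w http.ResponseWriter, _ *http.Request) {
                                                                                             cache.Flush()
                                                                                             w.WriteHeader(http.StatusNoContent)
                                                                                         })

                                                                                         _ = http.ListenAndServe(":8080", nil)
                                                                                     }
                                                                                     ```

                                                                                     The point being: invalidate the cache, don’t bounce the fleet.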

                                                                                  1. 2

                                                                                    We follow an internal open source model with strong code ownership, where any engineer can put up pull requests for codebases. The owning team of that system sets up blocking code reviews.

                                                                                    Worked at a place where this was practiced. A few core teams (10-20 people) blocked the rest of the company. The fan-in was too great. Blame the org chart or the process. Worked at other places where CRs/PRs are a courtesy, where automated testing keeps most of the bad stuff out and where you fix what you break. I felt that people in that environment were more open and willing to discuss things beforehand too. Less hierarchy, less gatekeeping. YMMV

                                                                                    1. 2

                                                                                       Thanks for sharing your experience. I guess it all depends on the environment. At Uber, there are hundreds of services, each owned by a team, all of them doing blocking code reviews for their own services. It evolved like this after the alternative - anyone pushing changes that made sense to them - quickly turned into chaos, building up large tech/architecture debt. My team is just untangling such a case, where a central library had no blocking reviewers and, over the course of a year, turned from a reusable library into a hot mess of dependency and maintenance hell, after engineers kept making changes that scratched their own itch.

                                                                                       Having an owner for a service, who makes sure it is changed in a strategic rather than a tactical way, is a pretty good thing.

                                                                                       There are cases when a team/service becomes a bottleneck. In those cases, I’ve seen two things happen. First, put an SLA for reviews in place: usually 24 hours. If that SLA is not met, teams are free to push changes. Second, start a negotiation on how to remove the blockage. This is usually resolved by carving off a part of a service/codebase into a separate service or ownership area, which a smaller team owns and decides whether or not to gate with blocking code reviews. The other approach is to formalize “core” code review requirements, then onboard members of other teams as core reviewers, helping keep the quality bar high.

                                                                                       If a small group is blocking a larger group, that is a bad thing and something needs to change. It sounds like there was little flexibility at that company to do so?

                                                                                    1. 6

                                                                                      So, projects (as defined by the rather pedantic PMI) have an end.

                                                                                        After this engineer launched stuff like Skype on Xbox One, what happened? Did all work stop because the project ended? Did the engineers never touch that code again? Nope! They posted 4-5 months later about new features and bug fixes.

                                                                                        What happened here is they estimated a product launch (which could be a project) and then kept working on the product, in perpetuity. If it’s supported in perpetuity, it doesn’t qualify as a project anymore.

                                                                                      The reason I make the distinction is that most of the teams I have seen struggle are struggling because once the “launch” is complete, the project management team gives them a new project for a completely different product. They then spend their time losing their minds trying to do maintenance for all of their old stuff while also building the new stuff.

                                                                                      1. 2

                                                                                          Funny you’re asking this question. Yes, work continued for a while. And about 6 months later - right after that second update - the team was dismissed and Skype for Xbox One was put into maintenance mode. It was good enough. An Xbox team specialising in maintaining “complete” apps took it over, doing monitoring and holding the code in case any changes were needed later. Devs either moved to a new team within Skype/Microsoft, or left. None of the original devs touched the codebase again (as far as I know).

                                                                                        So despite what you’d think from the outside, there really was an end here quite soon after the launch.

                                                                                        1. 1

                                                                                          Yes, work continued for a while. And about 6 months later - right after that second update - the team was dismissed

                                                                                          So…. you had a project timeline that you set with 100+ teams a year in advance. The product launched, but it had enough new features to add (and bugs to fix) that they kept the project running for an extra 6 months (+50% of the original estimate). This is your example of a good case for estimates?

                                                                                          Your other case was that you beat your estimate, leaving more time to experiment… but doesn’t that mean you had a feature complete app that you sat on for a period of time - experimenting? Why was this a good thing?

                                                                                          From your article:

                                                                                          Have you noticed how Apple ships most of their big bang projects at WWDC, a date they commit to far ahead?

                                                                                          From the Wall Street Journal (quoted in another place, admittedly)

                                                                                          “Of the 70-plus new and updated products launched during Mr. Cook’s tenure,” the Journal notes, “five had a delay between announcement and shipping of three months or more and nine had delays of between one and three months.”

                                                                                            Being generous and assuming “70+” means 80, Apple has missed a little over 17% of their product launch dates by more than a month ((5 + 9) / 80 = 17.5%), and 6% of the time they’re off by more than 3 months (5 / 80 = 6.25%). What are the odds that they could achieve similar results simply by saying

                                                                                          “Whatever is ready for launch the month before WWDC makes it into the presentation”

                                                                                          instead of

                                                                                          “Here’s a sneak peek at Apple AirPower, coming next year!” followed a year+ later with “whoops! It turns out we can’t figure out how to make it work. 😳”

                                                                                      1. 4

                                                                                        This one is a classic from Google. I also wrote up my experience monitoring large systems, working at Uber. There’s some overlap between the two parts.

                                                                                          Also, a book on SLOs being written by Alex Hidalgo is a project I’m following with interest: https://twitter.com/ahidalgosre/status/1174324046026215426?s=21.

                                                                                        1. 2

                                                                                            Thanks, really useful.