There are, of course, jobs where you would need to know how to (for example) invert a binary tree…but those are relatively rare and outside of those jobs on the rare occasion where you need to know, you can look it up.

Far more important, IMHO, is knowing the performance characteristics of different data structures and the operations possible on them. Should I use a hash table or a binary tree for this lookup? Or a B-tree? Why a B-tree over a plain binary tree?

The answers to these kinds of questions are not academic exercises; they have very real implications. If I need to extract a range of keys in sorted order, a hash table won’t help me at all, for example.
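A tiny illustration of that last point: a sorted array with binary search (a stand-in for a tree or B-tree index) answers a range query directly, while a hash table would need a full scan. A minimal Python sketch:

```python
import bisect

def range_query(keys, lo, hi):
    # keys must be sorted; returns all keys in [lo, hi) in sorted
    # order in O(log n + k) -- a hash table cannot do this
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_left(keys, hi)
    return keys[i:j]

keys = sorted([3, 4, 8, 17, 25, 31, 42])
```

Here `range_query(keys, 5, 30)` returns `[8, 17, 25]` without touching the other keys.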

This is a great idea for a post that I’ve wanted to write myself. Leaving aside trees, hash tables, and stacks, I probably used less than one “interesting” data structure/algorithm PER YEAR of programming. Some notable counterexamples are two kinds of reservoir sampling, regular languages/automata, random number generation, some crypto, and some info retrieval algorithms.

One thing that sparked it is an obscure but long-running debate over whether dynamic programming interview questions are valid.

I don’t think they’re valid. It’s mostly a proxy for “I got a CS degree at a certain school” (which I did, I learned dynamic programming in my algorithms class and never used it again in ~20 years.)

I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

I’ve tried this in the past and nobody has been able to point to a concrete instance. I think the best I’ve heard is someone heard about a professor who heard about some proprietary software once that used it.

Related: algorithms used in real software (although this is certainly not representative, since compiler engineering is a subfield with its own body of knowledge):

https://old.reddit.com/r/ProgrammingLanguages/comments/b22tw6/papers_and_algorithms_in_llvms_source_code/

https://github.com/oilshell/blog-code/blob/master/grep-for-papers/llvm.txt

Linux kernel algorithms:

https://github.com/oilshell/blog-code/blob/master/grep-for-papers/linux.txt

I challenge anyone to name ANY piece of open source software that uses dynamic programming.

Git, or most reasonable implementations of “diff”, will contain an implementation of the Myers Algorithm for longest-common-subsequence, which is very dynamic-programmy.
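For context, Myers' algorithm refines the textbook O(MN) dynamic program for longest common subsequence. That baseline (a sketch, not Myers' algorithm itself) looks like:

```python
def lcs_length(a, b):
    # table[i][j]: LCS length of a[:i] and b[:j]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                # extend the best subsequence of the two prefixes
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # drop the last char of one string or the other
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]
```

A diff is then recovered by walking the table backwards; Myers' contribution was avoiding the full MN table in the common case.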

No concrete example for this one, but I know that bioinformatics code is full of dynamic programming algorithms for the task of sequence alignment, which is similar to diff — identifying a way to align two or more base sequences so that they coincide with the minimal number of changes/additions/deletions required to make them identical.

Hm I’m familiar with that algorithm but I never thought of it as dynamic programming.

Wikipedia does say it’s an instance of dynamic programming. Although when I click on the paper, it appears to contrast itself with “the basic O(MN) dynamic programming algorithm” (section 4).

Since you mentioned dynamic programming, it’s worth pointing out that the name “dynamic programming” was chosen for political reasons, as pointed out in the history section of the Wikipedia article on dynamic programming. So I think it’s a really bad name.
That’s funny, I remembered xkcd’s “dynamic entropy” comic, and it quotes the same passage:

https://xkcd.com/2318/

It also has a very interesting property as an adjective: it’s impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that could possibly give it a pejorative meaning. It’s impossible.

LOL

Agree it’s a terrible name… I would say it was chosen for “marketing” reasons

I have thought about whether dynamic programming questions are fair to ask, and I ended up where you are: they are not.

Dynamic programming was the one I struggled most in understanding and implementing correctly. And while there are semi-practical examples (like the backpack problem), I have not found any practical, day-to-day use cases on this.

I had an argument with my colleague who asked this kind of problem, saying it’s basic knowledge. Turns out he did competitive programming and there, it is table stakes. But in practice, it just filtered for anyone who has learned and practiced this approach.

I stay away from asking these kinds of questions: problems that need dynamic programming to solve.

I’m familiar with dynamic programming mostly from high-school competitive programming as well. Otherwise I can’t say I’ve encountered real-life problems where it occurred to me to use the technique.

I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

I’m literally about to implement something that could be classified as dynamic programming at work, which can be summarized as “computing a few simple statistics, such as number of bytes changed, for each directory in a large filesystem”. Dynamic programming is such a general term that it applies regularly if you want to use it.

I’d like to see it. I don’t think dynamic programming is a general term.

In fact one time I participated in this argument (which was years ago so my memory is fuzzy), a former CS professor I worked with explained the difference between memoization and dynamic programming. A bunch of 10-20+ year programmers like myself went “ah OK”. Unfortunately I don’t even remember the difference, but the point is that most programmers don’t, because dynamic programming is very uncommon.

What you’re describing sounds like an obvious algorithm that anyone could implement, which is not the same as dynamic programming interview questions, or dynamic programming in competitive programming.

As other posters mentioned, competitive programming is the main place you see it outside of a CS class.

It’s absolutely an obvious algorithm, so is most dynamic programming. That was sort of my point.

Can’t share the code unfortunately, but it’s just: iterate over a sorted list of file changes in reverse order and collect statistics as we go. The dynamic part comes from the fact that we can just look at the subdirectories of a dir (that we already have numbers for) instead of recursing into it.
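A minimal sketch of that shape of computation, assuming (hypothetically, since the real code isn't shown) the input is a flat map from file path to bytes changed:

```python
from collections import defaultdict
from os.path import dirname

def dir_totals(file_changes):
    # file_changes: {"/a/b/x.txt": bytes_changed, ...} (assumed shape)
    totals = defaultdict(int)
    for path, n in file_changes.items():
        d = dirname(path)
        totals[d] += n
        while dirname(d) != d:      # ensure every ancestor has an entry
            d = dirname(d)
            totals[d] += 0
    # deepest directories first: each finished total is added once to
    # its parent, so no directory is ever recursed into twice
    for d in sorted(totals, key=len, reverse=True):
        parent = dirname(d)
        if parent != d:
            totals[parent] += totals[d]
    return dict(totals)
```

Each directory's number is computed from its already-finished children, which is the "look at the subdirectories we already have numbers for" step.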

What you’re talking about could be called memoization, or it probably doesn’t even deserve that name. It’s just what a “normal” programmer would come up with.

That’s not the type of thing that’s asked in interview questions or competitive programming. The Wikipedia page gives some examples.

Dynamic programming usually changes the computational complexity of an algorithm in some non-obvious way. There’s very little you can do recursing over a directory tree that doesn’t have a clear O(n) way to code it (e.g. computing statistics).

I like Erik Demaine’s explanation, that problems where dynamic programming can be applied are ones where their subproblems and their dependencies can be modeled as a directed acyclic graph [1]. Up to you if you’d like to tackle that with a top down approach where you look at a node and calculate its solution based on the solutions of its ancestors, or a bottom up approach starting from the nodes in the DAG with no dependencies and propagate the solutions in topological order.

My colleague and I used it for a generalization of matrix chain multiplication (for tensors) [2].

edit: by the definition above, even naive memoization can count as DP, if you’re caching solutions to subproblems of that structure. Doesn’t have to be at the difficulty level of competition to count as DP in that case.
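As a reference point, the ordinary (non-tensor) matrix chain problem fits that DAG framing directly. Here is a top-down sketch, where the memoized cache is what names each subproblem (i, j) — the tensor generalization in the cited paper is more involved:

```python
from functools import lru_cache

def matrix_chain_cost(dims):
    # dims[i], dims[i+1] are the dimensions of matrix i; subproblem
    # (i, j) = "cheapest way to multiply matrices i..j", and the
    # dependencies between subproblems form a DAG
    @lru_cache(maxsize=None)
    def best(i, j):
        if i == j:
            return 0            # a single matrix costs nothing
        return min(
            best(i, k) + best(k + 1, j)
            + dims[i] * dims[k + 1] * dims[j + 1]
            for k in range(i, j)
        )
    return best(0, len(dims) - 2)
```

For dims [10, 30, 5, 60], multiplying (A·B)·C costs 1500 + 3000 = 4500 scalar multiplications versus 27000 the other way.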

Hm interesting, searching through my personal wiki, I found a related definition which is different. I haven’t really thought about it enough to form an opinion.

Either way, it doesn’t change my overall point: that there are certain kinds of algorithms problems that appear on coding interviews, and in competitive programming, that do not show up in 99% of programming jobs. They are easy to pose and have cute solutions, but aren’t testing very much.

I think the “DP tables” part is key but again I haven’t thought about it enough …

Memoization is fundamentally a top-down computation and DP is fundamentally bottom-up. In memoization, we observe that a computational tree can actually be represented as a computational DAG.

In DP, we make the same observation, but construct the DAG from the bottom-up. That means we have to rewrite the computation to express the delta from each computational tree/DAG node to its parents. We also need a means for addressing/naming those parents (which we did not need in the top-down case, since this was implicit in the recursive call stack). This leads to inventions like DP tables, but people often fail to understand why they exist: it’s primarily as a naming mechanism (and while we’re at it, why not make it efficient to find a named element, ergo arrays and matrices).

This bottom-up / top-down distinction might have been the same as what the aforementioned professor said 5+ years ago, but I don’t remember exactly.

I would probably say neither… Either factorial or Fibonacci is so trivial that it doesn’t help to be thinking about it that way.

Though I think the quote hints at a clear test for whether it’s top-down or bottom-up: if you need extra state outside the call stack. I’m getting curious enough to try this out, but right now all I can do is quote what other people say.

In any case it’s clear to me that there’s some controversy over what dynamic programming really is. I think the issue is that a lot of algorithms could be framed that way but were not discovered that way, and not taught and learned that way.

I would probably say neither… Either factorial or Fibonacci is so trivial that it doesn’t help to be thinking about it that way.

I think that the triviality is actually helpful here. If it’s actually true that memoization and dynamic programming are different (and there’s clearly debate on this), can two implementations of a trivial function that everyone can understand highlight the differences?
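For what it's worth, here is the trivial pair in Python. The only visible difference is where subproblem results live: an implicit cache keyed by the recursive call, versus an explicit table filled in topological order.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # top-down: recursion discovers the subproblem DAG for us, and
    # the call stack plus cache do the "naming" of subproblems
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_dp(n):
    # bottom-up: we walk the DAG in topological order ourselves, so
    # we need explicit storage (the "DP table") to name the results
    table = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

(For Fibonacci the table can of course be collapsed to two variables, which is exactly the kind of simplification the bottom-up view makes visible.)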

On the contrary, the stupidly naive way (recurse on every directory) is O(n^2).

Dynamic programming is just solving a series of problems while using the answers of shared subproblems multiple times. Memoization is a common way to implement this.

Yes, there are some very clever algorithms that use dynamic programming; that doesn’t make obvious algorithms that use dynamic programming not also fit under the definition.

Say you have a directory structure with a single file in it, /a/b/c/d/e

To get the number of bytes changed in e you need to visit e, then to get the number of bytes changed in d you need to visit d and then e, then for c you need to visit c, d, and e, and so on.

Like I said, it takes a really naive solution, but if you don’t remember the values you calculate anywhere for some reason, it’s the sum over inodes of (depth of inode)… which is O(n^2) (for bad directory structures).

Note that I need these values for every directory, not just for one particular directory.

That’s still linear complexity. Unless you’re hardlinking directories (in which case you have to deal with potential recursion), it’s still O(n). If you add a file at /a/b/c/file you only visit 1 more file and no more dirs, not an exponential. O(n + n + n) or O(n + 3) still simplifies to O(n).

If you add /a/b/c/file you add 4 more visits, not 1. Going from n = 3 (/a/b/file) to n = 4 (/a/b/c/file) adds 4 more visits. In other words this worst case example takes time O(sum from 1 to n of i) = O(n(n+1)/2) = O(n^2).

N is the number of inodes in an arbitrary tree, not the number of files in a fixed tree.

That’s still adding a linear number of operations for each file, the depth could technically be considered a different variable, say m. So for each file (n+1) you add, you also add the number of directory traversals (m) resulting in O(m+n), which simplifies again to O(n), but in reality folders are files too, so are part of n in the first place, so again O(n). Ultimately your n space is the total number of inodes, which both files and folders have.

Abstractly, you’re just traversing a tree structure (or a directed graph if using links), which is well understood to be O(n) (maybe O(n^2) worst case if all folders are links, resulting in a fully connected graph), because you only visit each node once. If it were O(n^2), you would visit each node n times.

Remember, Big O notation is about scaling, not the actual concrete number of operations, which is why you drop any constants or variables other than n.

Even if it was O(mn) (it’s not), that still simplifies to O(n). An ‘internal nodes’ tree traversal is still O(n), n is just smaller, but again, your problem is not an internal nodes traversal, it’s a full traversal because you have to look at the blocks attached to the file (leaf) inodes, which means you need to read all inodes of all files and of all folders one time each. n = # of files + # of folders = O(n)

I suppose an extremely naive solution could be to fully traverse each subtree for every folder visited, which would be… O(log n)? But even that isn’t O(n^2), as the total repeated space shrinks the deeper you get.

You’re assuming a balanced tree, which is not guaranteed. Depth of tree is O(n) in pathological cases (and average case O(sqrt(n)) is typical for randomly generated trees)

Ah yeah, I think it would be O(n log n), not O(log n), because you traverse the tree once for each node, and a subset of the tree for almost every n (except leafs), at least in the worst case. Still not O(n^2), and the solution for O(n) is almost easier to conceptualize than the completely naive solution :)

and the solution for O(n) is almost easier to conceptualize than the completely naive solution :)

No argument here…

I think it would be O(n log n)

We agree it’s O(n) * O(time of tree search) now, right? And you’re trying to call the tree search time log(n)? Because trees are height log(n)? Then see the post you replied to: that’s true in a balanced tree, it’s not true in a random tree (where it is sqrt(n)), and it’s definitely not true in a pathological worst case (where the tree is just an n-length linked list).

Yeah, the part I was hung up on before was that your naive solution traverses the entire subtree below a node for each node visit; I was stuck on the simple optimal solution. For the pathological case, basically just a bunch of folders in folders with a single file at the bottom, the depth of the tree is n, and the file inode at the bottom would be accessed n times, so O(n^2). For the common case it would be about O(n log n), where you can skip traversing larger and larger parts of the tree the deeper you get on each ‘path.’ Thanks for the discussion, I enjoyed it :)
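The arithmetic for the pathological chain layout can be checked with a quick sketch that just counts node visits (this only models the fully naive re-walking scheme discussed above, nothing real):

```python
def make_chain(depth):
    # /dir/dir/.../dir nested `depth` deep, modeled as nested dicts
    tree = {}
    for _ in range(depth):
        tree = {"dir": tree}
    return tree

def size(t):
    # nodes in this subtree, counting the node itself
    return 1 + sum(size(c) for c in t.values())

def naive_visits(t):
    # fully naive: every directory re-walks its whole subtree from scratch
    return sum(size(c) + naive_visits(c) for c in t.values())
```

For a chain of depth n this gives n(n+1)/2 visits, i.e. O(n^2), while a single bottom-up pass visits each node once.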

I think comparing memoization to dynamic programming is a category mistake: they are different kinds of things.

‘Dynamic programming’ is a description of a style of algorithm. It’s divide-and-conquer, usually with overlapping subproblems, making it possible to reuse intermediate results.

Memoization is a programming technique to remember intermediate results, by remembering the results of function calls. You can e.g. also store the intermediate results somewhere explicitly, usually in a matrix, in which case you don’t memoize the result ‘transparently inside the function’, but use a lookup table ‘external to the function that computed the result’.

A note on terminology: Some people call top-down dynamic programming “memoization” and only use “dynamic programming” to refer to bottom-up work. We do not make such a distinction here. We call both dynamic programming.

There does seem to be disagreement on what dynamic programming is. And many algorithms that were not derived with dynamic programming techniques could be described as dynamic programming.

But it seems that most people agree it’s related to memoization.

I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

Ever do video encoding or transcoding with anything built on FFmpeg or x264? Encode images with MozJPEG? Encode an AV1 video or AVIF image with libaom? Trellis quantization in advanced lossy compression encoders is a dynamic programming algorithm.

Oh, no disagreement there! Even after coding it up myself, I’d hate to have someone ask me to whiteboard a working implementation of trellis quantization in 40 minutes or whatever (though I’m pretty sure I could sketch out an explanation now).

In general I’m not a fan of whiteboard coding exercises at all. Whenever I’ve interviewed candidates I’ve always preferred the old-fashioned method of just reading their resume well ahead of time, looking up what ever piques my interest on it, and then having a friendly conversation about that. Usually that provides plenty of esoteric material for me to quiz them on and it also lets them show me their strengths and enthusiasm.

My current company doesn’t do a whiteboard exercise, but my previous one did… but the thing is, the usual task was to implement a basic “grep”. That is, read a file and print all of the lines that contain a user-specified string, in a language of your choice, with whatever libraries make you happy (it’s not a trick, you’re not supposed to implement Boyer-Moore on your own). Assuming you succeeded at that, we would ask you to implement a few more features, like a -v flag (only print lines that don’t match), and -A and -B flags (print context lines before and after the matching line), until you got stuck or the time for that segment was up. It wasn’t graded on minutiae like correct semicolon placement, it was just an evaluation of whether a candidate could do a trivial task, how they handled additional requirements, whether they asked sensible questions and got clarification when needed, etc. I found it pretty reasonable.

I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

I used Warshall’s algorithm (which is dynamic programming) to compute the transitive closure of a graph for a typechecker. This is, in my experience, a very common algorithm.
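A sketch of Warshall's algorithm for transitive closure (not necessarily how that particular typechecker implemented it) — the DP subproblem is "can i reach j using only intermediate nodes 0..k":

```python
def transitive_closure(n, edges):
    # reach[i][j]: can node i reach node j?
    reach = [[i == j for j in range(n)] for i in range(n)]
    for i, j in edges:
        reach[i][j] = True
    # allow intermediate nodes one at a time; each pass extends the
    # previous pass's answers, which is the dynamic-programming step
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach
```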

In high school, I wrote a program for my professor that places students into groups of 4 such that their Myers-Briggs personalities are as different as possible. This used dynamic programming.

A professor of mine (who taught the subject to me) used dynamic programming for some kind of RNA sequencing problem in a paper he published. One of our homework assignments had us arrive at a watered down version of his (and his co-authors’) algorithm.

I’m fairly certain that at least some fuzzy string matching algorithms use string distance, which is also solved using dynamic programming.

These are all diverse applications of DP. In my personal, subjective experience, the idea that DP is in any way obscure or dated is absurd.

Edit:

To be more concrete, the “transitive closure of a graph” is for the graph of dependencies, computing the set of all functions that a particular function depends on. This is as described in the Haskell Report.

For fuzzy string matching, I have in mind something like fzf, though I cannot say with certainty that it uses string distance (I’m unfamiliar with its implementation).

Thanks for the examples. The claim is definitely not that it’s outdated or obscure; the claim is that it’s not a good interview question because it doesn’t show up much at work. Although there were lots of people here who pointed out interesting uses of dynamic programming, that’s not incompatible with the idea that you could have a 10 or 20 year programming career and never use it.

Side note: I’m familiar with the Floyd Warshall algorithm but I never thought of it as dynamic programming. I think part of the issue is that I may have a more narrow definition of it than others. (I think people even say the linear time fibonacci is an example of dynamic programming, which I find silly. I just call that the obvious algorithm. I guess it can be used to illustrate a principle.)

Even so, I definitely think it’s more popular in universities, and certain domains like bioinformatics. In contrast to what people on this site typically do “for work”.

I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

I do a lot of work with text processing – computing the edit distance between two strings is something I do very often. That’s a classic dynamic programming algorithm. There are probably hundreds of open source packages that do this or some variation thereof.
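A sketch of the classic Levenshtein DP, keeping only one row of the table at a time:

```python
def edit_distance(a, b):
    # prev[j]: edit distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free on a match)
            ))
        prev = cur
    return prev[len(b)]
```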

Just to add to the list of responses clearly demonstrating Cunningham’s Law:

I believe the Knuth-Plass line-breaking algorithm used in LaTeX to lay out text “optimally” uses dynamic programming. This was done for efficiency, as opposed to using some kind of general global optimization routine. It’s also the reason why LaTeX doesn’t support “backtracking”.

It’s also the reason why LaTeX doesn’t support “backtracking”.

Sile uses a variant of the same dynamic programming algorithm to lay out paragraphs on a page. The original paper describing the algorithm says that TeX wanted to use it like that, but it would require more than one entire megabyte of state for a large document, which was infeasible.

Thanks for the example. If there were more specific links I would make a blog post out of this :)

And although my comment was a tangent, it did motivate me to get out the “Algorithm Design Manual” and flip to the part on dynamic programming. Though I remember the applications in that book being interesting but seemingly far removed from what programmers do day-to-day. It seemed to be by a professor who consulted on algorithms for various companies, which is an interesting job!

Here is a concrete example of Dynamic Programming that you use every day: Word Wrap. Knuth has an algorithm that is often used for maximizing the number of words per line.
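A sketch of that style of word-wrap DP, minimizing squared trailing space on every line but the last (Knuth-Plass proper also handles hyphenation and penalties; this assumes no single word exceeds the width):

```python
def wrap(words, width):
    n = len(words)
    INF = float("inf")
    best = [INF] * (n + 1)      # best[i]: min cost to wrap words[i:]
    best[n] = 0
    choice = [0] * (n + 1)      # choice[i]: where the line starting at i ends
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1   # words i..j plus spaces
            if length > width:
                break
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                choice[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:choice[i]]))
        i = choice[i]
    return lines
```

On "aaa bb cc ddddd" with width 6, the greedy break ("aaa bb" / "cc" / "ddddd") costs 16, while the DP finds "aaa" / "bb cc" / "ddddd" at cost 10.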

Also the field of bioinformatics often uses the Levenshtein distance when matching two DNA strands.

Also I would like to mention the single most important thing I learned from Dynamic Programming: start at the end case, and figure out what constraints can work from there. For example, think about the last recursion call and what constraints it needs, and go backwards from there.

I think the vast majority of web and/or system developer jobs where the main task is “feature development” can rely on the built-in structures in most languages (maps, lists and sets using various implementations) as long as they use common developer sense, but I have come across quite a few bottlenecks where lack of knowledge of the underlying data store (whether SQL or not) was causing trouble.

I find SQL DB internals fascinating, like tuning a formula 1 race car, so I am likely biased, but it seems like a lot of developers have some simple-but-crude notions of how a database works, with very little understanding of what it takes when you start seeing bigger amounts of data and/or more activity (“I put an index on it, why didn’t it automatically improve the performance on my 100m rows, 300 column wide DB when I select all fields on 20% of the rows?”)

I’ve noticed a growing trend of people assuming algorithms are pointless questions that are asked by tech companies purely as an arbitrary measure. I hear more people complain about how all of this is a purely academic exercise.

No, people complain that asking candidates to remember a ton of algorithms in their heads and produce them on a whiteboard when asked in a heavily time-constrained setting is a poor estimate of one’s ability to use them in practice.

We’re on the same page. In the article, I make it clear this is also my stance and close with:

“To anyone reading whose company has a bar to hire people who know some of the advanced algorithms by heart: think again if this is what you need. I’ve hired fantastic teams at Skyscanner London and Uber Amsterdam without any tricky algorithm questions, covering no more than data structures and problem solving. You shouldn’t need to know algorithms by heart. What you do need is awareness of the most common data structures and the ability to come up with simple algorithms to solve the problem at hand, as a toolset.”

I’ve had to write a couple over the years. I enjoyed it immensely. I think the other thing to keep in mind is that even when you’re not designing or implementing an algorithm, good engineering occasionally requires just knowing tradeoffs for correct algorithm selection.

So over the years, here are a few data structures and algorithms I’ve needed to use while building software.

KMP search. I also had unique requirements to work on a stream rather than a string, so I had to re-implement. Why? I needed to search for something in an HTTP response without buffering, and this use case was not well supported by any libs I could find. I ended up borrowing a Java implementation from Twitter.

Good old binary search. Why? Needed to find a good compression ratio for an image that didn’t fall below a certain visual similarity score.
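The shape of that search, sketched with a stand-in `score` function (the real encoder and visual-similarity metric are not shown here and are hypothetical):

```python
def best_quality(score, target, lo=1, hi=100):
    """Smallest quality setting whose similarity score meets `target`,
    assuming score(q) is non-decreasing in q (higher quality = more
    similar to the original). O(log(hi - lo)) encode attempts."""
    while lo < hi:
        mid = (lo + hi) // 2
        if score(mid) >= target:
            hi = mid        # mid is good enough; try smaller
        else:
            lo = mid + 1    # mid falls below the threshold
    return lo
```

In practice `score(q)` would encode the image at quality q and compare it against the original with something like SSIM; the binary search keeps the number of trial encodes small.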

trie: Didn’t implement this one myself, but I needed to rapidly match IP blocks and a simple hash or list wouldn’t have worked. I just used a library called cidranger, but I selected that library because I knew I needed that data structure.

Tree traversal. It happens sometimes in surprising contexts (DOM manipulation, certain types of reporting, etc.)

To be clear my position is that software engineering interviews focus too much on CS. Just like Physics and mechanical engineering are separate but related disciplines, CS and Software engineering are not the same thing.

I also had unique requirements to work on a stream rather than a string, so I had to re-implement. Why? I needed to search for something in an HTTP response without buffering, and this use case was not well supported by any libs I could find. I ended up borrowing a Java implementation from Twitter.

Note that, unless the stream is somehow interactive (that is, “wait until we have N bytes to read (or there’s EOF)” is unacceptable and we need to find match in what we have right now), this has a simpler solution that is both faster (with a fast memmem() — e.g. from glibc) and lighter on memory than KMP (unless len(pattern) <= 256 and you use 1-byte values to encode the values of prefix function — but then the difference in memory is at most 256 bytes). Here’s the pseudocode:

def contains(stream, pattern):
    prefix = ''
    while True:
        chunk = stream.read(len(pattern))
        # assume 'memmem()' which is O(N+M) is used for 'in' operation
        if pattern in (prefix + chunk):
            return True
        if len(chunk) < len(pattern):
            # EOF
            return False
        prefix = chunk

Great post! I’d like to make a similar exercise some time :-) I’m also curious what (to paraphrase Greenspan’s tenth rule) ad hoc, informally-specified, bug-ridden, slow implementation of established algorithms and data structures I might have unwittingly produced throughout the years.

Here are some examples of data structures from personal experience (it would be good to see what others came across, as well).

Early in my career needed to build a database migration utility that could also be used for patching up to a new version. Our product had about 600 tables and over 2000 relationships (foreign key enforced).
Dropping indices and recreating them was not an option, because the product systems were so large that the drop/recreate index would take very long time (and if there is an error, the system would be left in an inconsistent state)
So long story, it was important to generate SQL update statements such that they would not violate referential integrity.
To do that I built a graph of FK relations across tables, and then traversed that graph by ‘levels’ such that SQL update statements were generated in the order of the relations.
I think years later, I realized that I was implementing a topological sort.
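That ordering can be sketched with Kahn's algorithm over a made-up FK graph (the table names below are hypothetical illustrations):

```python
from collections import deque

def update_order(tables, fk):
    # fk: (child, parent) pairs -- child has a foreign key into parent.
    # Emit parents before children so no update violates a constraint
    # (an illustrative direction; deletes would use the reverse order).
    children = {t: [] for t in tables}
    indegree = {t: 0 for t in tables}
    for child, parent in fk:
        children[parent].append(child)
        indegree[child] += 1
    ready = deque(t for t in tables if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order  # len(order) < len(tables) would indicate an FK cycle
```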

In telecommunication, when a person dials a long-distance number, a way to match that number against reference data indicating country/city codes is to match the dialed number, from left to right, incrementally against that reference data. When the greedy match fully consumed the first n digits against a full entry in the reference data, that was the match (and the remaining digits did not need to be matched).
To make that incremental matching fast, it was useful to convert the reference data into something akin to a suffix tree data structure.
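That incremental longest-prefix match can be sketched with a simple digit trie; the codes and labels below are made-up illustrations, not real reference data:

```python
def build_trie(codes):
    # codes: dial prefix -> label, e.g. {"44": "UK"} (hypothetical data)
    root = {}
    for prefix, label in codes.items():
        node = root
        for digit in prefix:
            node = node.setdefault(digit, {})
        node["$"] = label       # mark the end of a known prefix
    return root

def longest_match(root, number):
    # walk digits left to right, remembering the deepest full entry seen
    node, best = root, None
    for digit in number:
        if digit not in node:
            break               # remaining digits don't need matching
        node = node[digit]
        best = node.get("$", best)
    return best
```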

Parts of Allen interval algebras helped to reason about time period intervals in the domain of revenue management, billing and taxation.

While this was not a particular war story: during my education we were asked to implement a B-tree and a paper operating system (down to interrupt handlers, etc.). This, later on, helped me understand how databases organize indices. And combined with knowledge of how operating systems access disks and manage virtual memory, it made it much easier for me to understand database tuning.

Overall however, during hundreds (if not over a thousand) of interviews I have conducted, I do not remember ever asking a person to come up with a data structures algorithm in an interview.

I would ask questions that help to gauge what I call the ‘pragmatic CS vocabulary’ of a given candidate (essentially their existing toolbox/knowledge base of the fundamentals), rather than asking them to come up with new ‘verbs’ on the spot, so to speak.

Because I am interested:

how that ‘pragmatic CS vocabulary’ was constructed over time (including motivations, bumps on the road, etc)

how easy a person can manipulate & apply their existing knowledge

how up-to-date it is,

and how flexible they are in adding/modifying their vocabulary (their knowledge base).

In my view, it is those characteristics that would be a better (and one of many) indicator of a person’s fit in a particular technical or technical leadership role, than the ability to come up with new (to them) algorithmic constructs on the spot.

I guess it begs the question, if a majority of the algorithms & data structures are so rarely used, what are lowly, junior programmers at the gang of four actually coding all day? Not that this is limited to those companies. Anyway, thanks for the article and interesting to see the Max Howell tidbit is still relevant.

There are, of course, jobs where you would need to know how to (for example) invert a binary tree…but those are relatively rare and outside of those jobs on the rare occasion where you need to know, you can look it up.

Far more important, IMHO, is knowing the performance characteristics of different data structures and the operations possible on them. Should I use a hash table or a binary tree for this lookup? Or a B-tree? Why a B-tree over a plain binary tree?

The answers to these kinds of questions are not academic exercises; they have very real implications. If I need to extract a range of keys in sorted order, a hash table won’t help me at all, for example.

This is a great idea for a post that I’ve wanted to write myself. Leaving aside trees, hash tables, and stacks, I probably used less than one “interesting” data structure/algorithm PER YEAR of programming. Some notable counterexamples are two kinds of reservoir sampling, regular languages/automata, random number generation, some crypto, and some info retrieval algorithms.

One thing that sparked it is a obscure but long-running debate over whether dynamic programming interview questions are valid.

I don’t think they’re valid. It’s mostly a proxy for “I got a CS degree at a certain school” (which I did, I learned dynamic programming in my algorithms class and never used it again in ~20 years.)

I challenge anyone to name ANY piece of open source software that uses dynamic programming. Or to name an example in your own work – open source or not.

I’ve tried this in the past and nobody has been able to point to a concrete instance. I think the best I’ve heard is someone heard about a professor who heard about some proprietary software once that used it.

Related: algorithms used in real software (although this is certainly not representative, since compiler engineering is a subfield with its own body of knowledge):

https://old.reddit.com/r/ProgrammingLanguages/comments/b22tw6/papers_and_algorithms_in_llvms_source_code/

https://github.com/oilshell/blog-code/blob/master/grep-for-papers/llvm.txt

Linux kernel algorithms:

https://github.com/oilshell/blog-code/blob/master/grep-for-papers/linux.txt

Git, or most reasonable implementations of “diff”, will contain an implementation of the Myers Algorithm for longest-common-subsequence, which is very dynamic-programmy.

No concrete example for this one, but I know that bioinformatics code is full of dynamic programming algorithms for the task of sequence alignment, which is similar to diff — identifying a way to align two or more base sequences so that they coincide with the minimal number of changes/additions/deletions required to make them identical.
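Both diff and sequence alignment build on the same basic O(MN) table. Here is the textbook LCS-length version, the baseline that Myers' algorithm improves on:

```python
def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming table for the
    longest-common-subsequence length."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# The example pair from Myers' paper: LCS length is 4 (e.g. "CABA")
assert lcs_length("ABCABBA", "CBABAC") == 4
```

Alignment algorithms like Needleman-Wunsch are the same table with match/mismatch/gap scores instead of 0/1.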

Hm I’m familiar with that algorithm but I never thought of it as dynamic programming.

Wikipedia does say it’s an instance of dynamic programming. Although when I click on the paper, it appears to contrast itself with “the basic O(MN) dynamic programming algorithm” (section 4).

Since you mentioned dynamic programming, it’s worth pointing out that the name “dynamic programming” was chosen for political reasons, as pointed out in the history section of the Wikipedia article on dynamic programming. So I think it’s a really bad name.

That’s funny, I remembered xkcd’s “dynamic entropy” comic, and it quotes the same passage:

https://xkcd.com/2318/

LOL

Agree it’s a terrible name… I would say it was chosen for “marketing” reasons

I have thought about whether dynamic programming questions are fair to ask, and I ended up where you are: they are not.

Dynamic programming was the one I struggled with most to understand and implement correctly. And while there are semi-practical examples (like the knapsack problem), I have not found any practical, day-to-day use cases for it.

I had an argument with my colleague who asked this kind of problem, saying it’s basic knowledge. Turns out he did competitive programming and there, it is table stakes. But in practice, it just filtered for anyone who has learned and practiced this approach.

I stay away from asking this kind of question: problems that need dynamic programming to solve.

I’m familiar with dynamic programming mostly from high-school competitive programming as well. Otherwise I can’t say I’ve encountered real-life problems where it occurred to me to use the technique.

I’m literally about to implement something that could be classified as dynamic programming at work, which can be summarized as “computing a few simple statistics, such as number of bytes changed, for each directory in a large filesystem”. Dynamic programming is such a general term that it applies regularly if you want to use it.

I’d like to see it. I don’t think dynamic programming is a general term.

In fact one time I participated in this argument (which was years ago so my memory is fuzzy), a former CS professor I worked with explained the difference between memoization and dynamic programming. A bunch of 10-20+ year programmers like myself went “ah OK”. Unfortunately I don’t even remember the difference, but the point is that most programmers don’t, because dynamic programming is very uncommon.

What you’re describing sounds like an obvious algorithm that anyone could implement, which is not the same as dynamic programming interview questions, or dynamic programming in competitive programming.

As other posters mentioned, competitive programming is the main place you see it outside of a CS class.

It’s absolutely an obvious algorithm, so is most dynamic programming. That was sort of my point.

Can’t share the code unfortunately, but it’s just: iterate over a sorted list of file changes in reverse order and collect statistics as we go. The dynamic part comes from the fact that we can just look at the subdirectories of a dir (that we already have numbers for) instead of recursing into it.
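The commenter couldn’t share their code, but the shape of the approach they describe is roughly this (a sketch with invented details, assuming POSIX-style absolute paths):

```python
import posixpath

def bytes_per_directory(file_changes):
    """file_changes: {file_path: bytes_changed}. Returns a total for every
    directory by rolling child totals into parents, visiting each path once."""
    totals = {}
    # Seed every ancestor directory of every changed file with 0.
    for path in file_changes:
        d = posixpath.dirname(path)
        while d not in totals:
            totals[d] = 0
            parent = posixpath.dirname(d)
            if parent == d:
                break
            d = parent
    # Direct contributions from the files themselves.
    for path, nbytes in file_changes.items():
        totals[posixpath.dirname(path)] += nbytes
    # A child path always sorts after its parent, so reverse-sorted order
    # guarantees each directory's total is final before we add it upward.
    for d in sorted(totals, reverse=True):
        parent = posixpath.dirname(d)
        if parent != d:
            totals[parent] += totals[d]
    return totals
```

Each directory is visited once, which is the O(n) behavior discussed in the replies below.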

What you’re talking about could be called memoization, or it probably doesn’t even deserve that name. It’s just what a “normal” programmer would come up with.

That’s not the type of thing that’s asked in interview questions or competitive programming. The Wikipedia page gives some examples.

Dynamic programming usually changes the computational complexity of an algorithm in some non-obvious way. There’s very little you can do recursing over a directory tree that doesn’t have a clear O(n) way to code it (e.g. computing statistics).

I like Erik Demaine’s explanation, that problems where dynamic programming can be applied are ones where their subproblems and their dependencies can be modeled as a directed acyclic graph [1]. Up to you if you’d like to tackle that with a top down approach where you look at a node and calculate its solution based on the solutions of its ancestors, or a bottom up approach starting from the nodes in the DAG with no dependencies and propagate the solutions in topological order.
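That DAG framing makes the two directions easy to show side by side. A sketch on a toy coin-change problem (the smallest example that greedy gets wrong):

```python
from functools import lru_cache

COINS = (1, 4, 5)

# Top-down: recurse into a node's dependencies, caching so each
# subproblem in the DAG is solved exactly once.
@lru_cache(maxsize=None)
def min_coins_td(amount):
    if amount == 0:
        return 0
    return 1 + min(min_coins_td(amount - c) for c in COINS if c <= amount)

# Bottom-up: process subproblems in topological order (0, 1, ..., n),
# so every dependency is already solved when a node is reached.
def min_coins_bu(amount):
    best = [0] * (amount + 1)
    for a in range(1, amount + 1):
        best[a] = 1 + min(best[a - c] for c in COINS if c <= a)
    return best[amount]

assert min_coins_td(8) == min_coins_bu(8) == 2  # 4 + 4, not greedy's 5+1+1+1
```

Same subproblem DAG either way; only the traversal order differs.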

My colleague and I used it for a generalization of matrix chain multiplication (for tensors) [2].

[1] https://youtu.be/OQ5jsbhAv_M?t=2736

[2] https://github.com/TensorCon/ratcon/blob/release/opt/gencon/pathopt.py#L198

edit: by the definition above, even naive memoization can count as DP, if you’re caching solutions to subproblems of that structure. Doesn’t have to be at the difficulty level of competition to count as DP in that case.

Hm interesting, searching through my personal wiki, I found a related definition which is different. I haven’t really thought about it enough to form an opinion.

Either way, it doesn’t change my overall point: that there are certain kinds of algorithms problems that appear on coding interviews, and in competitive programming, that do not show up in 99% of programming jobs. They are easy to pose and have cute solutions, but aren’t testing very much. I think the “DP tables” part is key, but again I haven’t thought about it enough…

https://blog.racket-lang.org/2012/08/dynamic-programming-versus-memoization.html

This bottom-up / top-down distinction might have been the same as what the aforementioned professor said 5+ years ago, but I don’t remember exactly.

So, is memoization of factorial top-down, or bottom-up?

I would probably say neither… Either factorial or Fibonacci is so trivial that it doesn’t help to be thinking about it that way.

Though I think the quote hints at a clear test for whether it’s top-down or bottom-up: if you need extra state outside the call stack. I’m getting curious enough to try this out, but right now all I can do is quote what other people say.

In any case it’s clear to me that there’s some controversy over what dynamic programming really is. I think the issue is that a lot of algorithms could be framed that way but were not discovered that way, and not taught and learned that way.

I think that the triviality is actually helpful here. If it’s actually true that memoization and dynamic programming are different (and there’s clearly debate on this), can 2 implementations of a trivial function that everyone can understand highlight the differences?

On the contrary, the stupidly naive way (recurse into every directory) is O(n^2).

Dynamic programming is just solving a series of problems while using the answers of shared subproblems multiple times. Memoization is a common way to implement this.

Yes there are some very clever algorithms that use dynamic programming, this doesn’t make obvious algorithms that use dynamic programming not also fit under the definition.

Why would recursing into every directory be O(n^2)? You’re still only visiting every directory/file once? It seems like something’s missing?

Say you have a directory structure with a single file in it, /a/b/c/d/e

To get the number of bytes changed in e you need to visit e, then to get the number of bytes changed in d you need to visit d and then e, then for c you need to visit c, d, and e, and so on.

Like I said, it takes a really naive solution, but if you don’t remember the values you calculate anywhere for some reason it’s sum over inodes (depth of inode)… which is O(n^2) (for bad directory structures).

Note that I need these values for every directory, not just for one particular directory.

That’s still linear complexity. Unless you’re hardlinking directories (in which case you have to deal with potential recursion), it’s still O(n). If you add a file at `/a/b/c/file` you only visit 1 more file and no more dirs, not an exponential. O(n + n + n) or O(n + 3) still simplifies to O(n).

If you add /a/b/c/file you add 4 more visits, not 1. Going from n=3 /a/b/file to n=4 /a/b/c/file adds 4 more visits. In other words this worst-case example takes time O(sum from 1 to n of i) = O(n(n+1)/2) = O(n^2).

N is the number of inodes in an arbitrary tree, not the number of files in a fixed tree.

That’s still adding a linear number of operations for each file; the depth could technically be considered a different variable, say `m`. So for each file (n+1) you add, you also add the number of directory traversals (m), resulting in O(m+n), which simplifies again to O(n). But in reality folders are files too, so they are part of `n` in the first place, so again O(n). Ultimately your n space is the total number of inodes, which both files and folders have.

Abstractly, you’re just traversing a tree structure (or a directed graph if using links), which is well understood to be O(n) (maybe O(n^2) worst case if all folders are links, resulting in a fully connected graph), because you only visit each node once. If it were O(n^2), you would visit each node n times.

Remember, Big O notation is about scaling, not the actual concrete number of operations, which is why you drop any constants or variables other than `n`.

It’s O(mn), not O(m+n) (in the insanely naive algorithm that recalculates things every time).

It’s not a single tree traversal but #internal nodes tree traversals.

Even if it was O(mn) (it’s not), that still simplifies to O(n). An “internal nodes” tree traversal is still O(n); n is just smaller. But again, your problem is not an internal-nodes traversal, it’s a full traversal, because you have to look at the blocks attached to the file (leaf) inodes, which means you need to read all inodes of all files and of all folders one time each. n = # of files + # of folders = O(n)

I suppose an extremely naive solution could be to fully traverse each subtree for every folder visited, which would be… O(log n)? But even that isn’t O(n^2), as the total repeated space shrinks the deeper you get.

You’re assuming a balanced tree, which is not guaranteed. Depth of tree is O(n) in pathological cases (and average case O(sqrt(n)) is typical for randomly generated trees)

Ah yeah, I think it would be O(n log n), not O(log n), because you traverse the tree once for each node, and a subset of the tree for almost every n (except leaves), at least in the worst case. Still not O(n^2), and the O(n) solution is almost easier to conceptualize than the completely naive solution :)

No argument here…

We agree it’s O(n) * O(time tree search) now, right? And you’re trying to call the tree search time log(n)? Because trees are height log(n)? Then see the post you replied to: that’s true in a balanced tree, it’s not true in a random tree (where it is sqrt(n)), and it’s definitely not true in a pathological worst case (where the tree is just an n-length linked list).

Yeah, the part I was hung up on before was that your naive solution traverses the entire subtree below a node for each node visit; I was stuck in the simple optimal solution. For the pathological case, basically just a bunch of folders in folders with a single file at the bottom, the depth of the tree is n, and the file inode at the bottom would be accessed n times, so O(n^2). For the common case it would be about O(n log n), where you can skip traversing larger and larger parts of the tree the deeper you get on each “path.” Thanks for the discussion, I enjoyed it :)

I think comparing memoization to dynamic programming is a category mistake: they are different kinds of things.

‘Dynamic programming’ is a description of a style of algorithm. It’s divide-and-conquer, usually with overlapping subproblems, making it possible to reuse intermediate results.

Memoization is a programming technique to remember intermediate results, by remembering the results of function calls. You can e.g. also store the intermediate results somewhere explicitly, usually in a matrix, in which case you don’t memoize the result ‘transparently inside the function’, but use a lookup table ‘external to the function that computed the result’.

I dunno I find that in addition to the Racket language resource I gave elsewhere in the thread, lots of people compare them:

https://medium.com/@afshanf/cracking-the-coding-interview-questions-part-7-a7c8f834d63d

There does seem to be disagreement on what dynamic programming is. And many algorithms that were not derived with dynamic programming techniques could be described as dynamic programming.

But it seems that most people agree it’s related to memoization.

GCC uses dynamic programming to split IA-64 instructions into bundles.

Thanks, nice example and link! Still I would say it’s a niche skill, especially to come up with from scratch in an interview.

Ever do video encoding or transcoding with anything built on FFmpeg or x264? Encode images with MozJPEG? Encode an AV1 video or AVIF image with libaom? Trellis quantization in advanced lossy compression encoders is a dynamic programming algorithm.

Hm very interesting! I was not aware of that algorithm. Paper I found:

https://www.mp3-tech.org/programmer/docs/e00_9.pdf

I would still say it’s a bad interview topic, but it’s cool to see real world usages of it.

Oh, no disagreement there! Even after coding it up myself, I’d hate to have someone ask me to whiteboard a working implementation of trellis quantization in 40 minutes or whatever (though I’m pretty sure I could sketch out an explanation now).

In general I’m not a fan of whiteboard coding exercises at all. Whenever I’ve interviewed candidates I’ve always preferred the old-fashioned method of just reading their resume well ahead of time, looking up what ever piques my interest on it, and then having a friendly conversation about that. Usually that provides plenty of esoteric material for me to quiz them on and it also lets them show me their strengths and enthusiasm.

My current company doesn’t do a whiteboard exercise, but my previous one did… but the thing is, the usual task was to implement a basic “grep”. That is, read a file and print all of the lines that contain a user-specified string, in a language of your choice, with whatever libraries make you happy (it’s not a trick, you’re not supposed to implement Boyer-Moore on your own). Assuming you succeeded at that, we would ask you to implement a few more features, like a `-v` flag (only print lines that don’t match), and `-A` and `-B` flags (print context lines before and after the matching line), until you got stuck or the time for that segment was up. It wasn’t graded on minutiae like correct semicolon placement; it was just an evaluation of whether a candidate could do a trivial task, how they handled additional requirements, whether they asked sensible questions and got clarification when needed, etc. I found it pretty reasonable.

I used Warshall’s algorithm (which is dynamic programming) to compute the transitive closure of a graph for a typechecker. This is, in my experience, a very common algorithm.
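For readers who haven’t seen it, Warshall’s transitive closure is a compact example of the DP pattern: reachability through intermediate nodes {0..k} is built from reachability through {0..k-1}. A sketch with made-up dependency names, not the typechecker’s actual code:

```python
def transitive_closure(nodes, edges):
    """Warshall's algorithm: reach[u][v] becomes True iff v is reachable
    from u along the given directed edges."""
    reach = {u: {v: (u, v) in edges for v in nodes} for u in nodes}
    for k in nodes:          # allow paths through nodes seen so far
        for u in nodes:
            if reach[u][k]:
                for v in nodes:
                    if reach[k][v]:
                        reach[u][v] = True
    return reach

# f depends on g, g depends on h => f transitively depends on h
deps = {("f", "g"), ("g", "h")}
reach = transitive_closure(["f", "g", "h"], deps)
assert reach["f"]["h"] and not reach["h"]["f"]
```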

In high school, I wrote a program for my professor that places students into groups of 4 such that their Myers-Briggs personalities are as different as possible. This used dynamic programming.

A professor of mine (who taught the subject to me) used dynamic programming for some kind of RNA sequencing problem in a paper he published. One of our homework assignments had us arrive at a watered down version of his (and his co-authors’) algorithm.

I’m fairly certain that at least some fuzzy string matching algorithms use string distance, which is also solved using dynamic programming.

These are all diverse applications of DP. In my personal, subjective experience, the idea that DP is in any way obscure or dated is absurd.

Edit:

To be more concrete, the “transitive closure of a graph” is for the graph of dependencies, computing the set of all functions that a particular function depends on. This is as described in the Haskell Report.

For fuzzy string matching, I have in mind something like fzf, though I cannot say with certainty that it uses string distance (I’m unfamiliar with its implementation).

Here’s the paper that I think I’m referencing: Statistical Mechanics of Helix Bundles using a Dynamic Programming Approach

Thanks for the examples. The claim is definitely not that it’s outdated or obscure; the claim is that it’s not a good interview question because it doesn’t show up much at work. Although there were lots of people here who pointed out interesting uses of dynamic programming, that’s not incompatible with the idea that you could have a 10 or 20 year programming career and never use it.

Side note: I’m familiar with the Floyd Warshall algorithm but I never thought of it as dynamic programming. I think part of the issue is that I may have a more narrow definition of it than others. (I think people even say the linear time fibonacci is an example of dynamic programming, which I find silly. I just call that the obvious algorithm. I guess it can be used to illustrate a principle.)

Even so, I definitely think it’s more popular in universities, and certain domains like bioinformatics. In contrast to what people on this site typically do “for work”.

I do a lot of work with text processing – computing the edit distance between two strings is something I do very often. That’s a classic dynamic programming algorithm. There are probably hundreds of open source packages that do this or some variation thereof.
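For reference, the classic Levenshtein recurrence fills an (m+1) x (n+1) table where each cell depends on three earlier cells:

```python
def edit_distance(a, b):
    """Levenshtein DP: dp[i][j] = edits to turn a[:i] into b[:j]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                    # delete everything
    for j in range(n + 1):
        dp[0][j] = j                    # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a[i-1]
                           dp[i][j - 1] + 1,         # insert b[j-1]
                           dp[i - 1][j - 1] + cost)  # substitute
    return dp[m][n]

assert edit_distance("kitten", "sitting") == 3
```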

Just to add to the list of responses clearly demonstrating Cunningham’s Law:

I believe the Knuth-Plass line-breaking algorithm used in LaTeX to lay out text “optimally” uses dynamic programming. This was done for efficiency, as opposed to using some kind of general global optimization routine. It’s also the reason why LaTeX doesn’t support “backtracking”.

Sile uses a variant of the same dynamic programming algorithm to lay out paragraphs on a page. The original paper describing the algorithm says that TeX wanted to use it like that, but it would require more than one entire megabyte of state for a large document, which was infeasible.

Definitely an instance of Cunningham’s law at work :) I should make another go for my pet problems:

Thanks for the example. If there were more specific links I would make a blog post out of this :)

And although my comment was a tangent, it did motivate me to get out the “Algorithm Design Manual” and flip to the part on dynamic programming. Though I remember the applications in that book being interesting but seemingly far removed from what programmers do day-to-day. It seemed to be by a professor who consulted on algorithms for various companies, which is an interesting job!

The GraphHopper routing library uses contraction hierarchies, which are implemented using Dijkstra’s shortest path algorithm and A* search, which are special cases of dynamic programming.

I have to agree it’s not something most people will use every day, but it never hurts to have a general idea how it works.

Here is a concrete example of Dynamic Programming that you use every day: Word Wrap. Knuth has an algorithm that is often used for maximizing the number of words per line.

Also, the field of bioinformatics often uses the Levenshtein distance when matching two DNA strands.

Also, I would like to mention the single most important thing I learned from dynamic programming: start at the end case, and figure out what constraints can work from there. For example, think about the last recursion call, and what constraints it needs, and go backwards from there.
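A toy version of the word-wrap idea shows that “work backwards from the end” advice in action. This is a simplified squared-slack cost, not Knuth’s full badness model: `best[i]` is defined in terms of the already-solved suffixes `best[j+1]`:

```python
def wrap_cost(words, width):
    """Minimize the sum of squared trailing spaces over all lines
    (a simplified take on Knuth-style optimal line breaking).
    best[i] = optimal cost of wrapping words[i:]."""
    n = len(words)
    INF = float("inf")
    best = [0] * (n + 1)            # best[n] = 0: nothing left to wrap
    for i in range(n - 1, -1, -1):  # work backwards from the end
        best[i] = INF
        line_len = -1
        for j in range(i, n):       # try ending the line at word j
            line_len += len(words[j]) + 1
            if line_len > width:
                break
            slack = 0 if j == n - 1 else (width - line_len) ** 2
            best[i] = min(best[i], slack + best[j + 1])
    return best[0]

# Greedy packs "aaa bb" first and pays 16 for the ragged "cc" line;
# the DP prefers "aaa" / "bb cc" / "ddddd" for a total cost of 10.
assert wrap_cost(["aaa", "bb", "cc", "ddddd"], 6) == 10
```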

I think the vast majority of web and/or system developer jobs where the main task is “feature development” can rely on the built-in structures in most languages (maps, lists and sets using various implementations) as long as they use common developer sense, but I have come across quite a few bottlenecks where lack of knowledge of the underlying data store (whether SQL or not) was causing trouble.

I find SQL DB internals fascinating, like tuning a formula 1 race car, so I am likely biased, but it seems like a lot of developers have some simple-but-crude notions of how a database works, with very little understanding of what it takes when you start seeing bigger amounts of data and/or more activity (“I put an index on it, why didn’t it automatically improve the performance on my 100m rows, 300 column wide DB when I select all fields on 20% of the rows?”)

No, people complain that asking candidates to remember a ton of algorithms in their heads and produce them on a whiteboard when asked in a heavily time-constrained setting is a poor estimate of one’s ability to use them in practice.

We’re on the same page. In the article, I make it clear this is also my stance and close with:

“ To anyone reading whose company has a bar to hire people who know some of the advanced algorithms by heart: think again if this is what you need. I’ve hired fantastic teams at Skyscanner London and Uber Amsterdam without any tricky algorithm questions, covering no more than data structures and problem solving. You shouldn’t need to know algorithms by heart. What you do need is awareness of the most common data structures and the ability to come up with simple algorithms to solve the problem at hand, as a toolset.”

I’ve had to write a couple over the years. I enjoyed it immensely. I think the other thing to keep in mind is that even when you’re not designing or implementing an algorithm, good engineering occasionally requires just knowing tradeoffs for correct algorithm selection.

So over the years, here’s a few datastructures and algorithms I’ve needed to use while building software.

To be clear my position is that software engineering interviews focus too much on CS. Just like Physics and mechanical engineering are separate but related disciplines, CS and Software engineering are not the same thing.

And even that is pretty easy to get wrong.

Note that, unless the stream is somehow interactive (that is, “wait until we have N bytes to read (or there’s EOF)” is unacceptable and we need to find a match in what we have right now), this has a simpler solution that is both faster (with a fast `memmem()` — e.g. from glibc) and lighter on memory than KMP (unless `len(pattern) <= 256` and you use 1-byte values to encode the values of the prefix function — but then the difference in memory is at most 256 bytes). Here’s the pseudocode:

It might also just be that having written Homebrew isn’t enough to automatically get a job everywhere.
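The pseudocode in the comment above didn’t survive the thread formatting, but based on its description, the buffered search presumably looks something like this (a Python sketch, with `bytes.find` standing in for `memmem()`):

```python
def stream_find(chunks, pattern):
    """Buffered substring search over a chunked stream: keep only the last
    len(pattern) - 1 bytes between chunks and let a fast substring search
    do the work; no KMP automaton state needed."""
    carry = b""
    offset = 0            # absolute stream offset of buf[0]
    for chunk in chunks:
        buf = carry + chunk
        pos = buf.find(pattern)
        if pos != -1:
            return offset + pos        # absolute offset of the match
        keep = min(len(pattern) - 1, len(buf))
        carry = buf[-keep:] if keep else b""
        offset += len(buf) - keep
    return -1

# Matches across a chunk boundary: "world" spans the two chunks.
assert stream_find([b"hello wo", b"rld!"], b"world") == 6
```

The only state carried between chunks is the last `len(pattern) - 1` bytes.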

Great post! I’d like to make a similar exercise some time :-) I’m also curious what (to paraphrase Greenspan’s tenth rule) ad hoc, informally-specified, bug-ridden, slow implementation of established algorithms and data structures I might have unwittingly produced throughout the years.

Here are some of examples of data structures from personal experience (would be good to see what others came across, as well)

Early in my career I needed to build a database migration utility that could also be used for patching up to a new version. Our product had about 600 tables and over 2000 relationships (foreign-key enforced). Dropping indices and recreating them was not an option, because the product systems were so large that the drop/recreate would take a very long time (and if there was an error, the system would be left in an inconsistent state). So, long story short, it was important to generate SQL update statements such that they would not violate referential integrity. To do that I built a graph of FK relations across tables, and then traversed that graph by “levels” such that SQL update statements were generated in an order that respected the relations. Years later, I realized that I had been implementing a topological sort.
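That “traverse by levels” idea is essentially Kahn’s topological sort. A sketch with made-up table names, not the original utility’s code:

```python
from collections import defaultdict, deque

def toposort_tables(tables, fk_edges):
    """Kahn's algorithm: order tables so every table comes after the tables
    its foreign keys reference. fk_edges: (referenced, referencing) pairs."""
    indegree = {t: 0 for t in tables}
    children = defaultdict(list)
    for parent, child in fk_edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(t for t in tables if indegree[t] == 0)  # level 0
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:     # all referenced tables already emitted
                queue.append(c)
    if len(order) != len(tables):
        raise ValueError("cycle in foreign-key graph")
    return order
```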

In telecommunications, when a person dials a long-distance number, one way to match that number against reference data indicating country/city codes is to match the dialed number, left to right, incrementally against that reference data. When the greedy match fully consumed the first n digits against a full entry in the reference data, that was the match (and the remaining digits did not need to be examined). To make that incremental matching fast, it was useful to convert the reference data into something akin to a trie data structure.
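A minimal sketch of that incremental longest-prefix match, using a nested-dict trie (the dialing codes here are illustrative, not real reference data):

```python
def build_prefix_trie(codes):
    """codes: {dial_prefix: label}. Builds a nested-dict trie keyed by digit."""
    root = {}
    for prefix, label in codes.items():
        node = root
        for digit in prefix:
            node = node.setdefault(digit, {})
        node["label"] = label
    return root

def longest_prefix_match(trie, number):
    """Walk the dialed number left to right, remembering the deepest entry
    seen; the remaining digits never need to be examined."""
    node, best = trie, None
    for digit in number:
        if digit not in node:
            break
        node = node[digit]
        best = node.get("label", best)
    return best

trie = build_prefix_trie({"1": "NANP", "44": "UK", "4420": "London"})
assert longest_prefix_match(trie, "442079460000") == "London"
assert longest_prefix_match(trie, "4415550000") == "UK"
```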

Parts of Allen interval algebras helped to reason about time period intervals in the domain of revenue management, billing and taxation.

While this is not a particular war story: during my education we were asked to implement a B-tree and a paper operating system (down to interrupt handlers, etc.). This later helped me understand how databases organize indices. And combined with knowledge of how operating systems access disks and manage virtual memory, it made it much easier for me to understand database tuning.

Overall, however, during the hundreds (if not over a thousand) of interviews I have conducted, I do not remember ever asking a person to come up with a data-structures algorithm in an interview.

I would ask questions that help to gauge what I call ‘pragmatic CS vocabulary’ of a given candidate (essentially their existing toolbox/knowledge base of the fundamentals), rather than asking them to come with new ‘verbs’ on the spot, so to speak.

Because I am interested:

In my view, it is those characteristics that would be a better (and one of many) indicator of a person’s fitness for a particular technical or technical leadership role than the ability to come up with new (to them) algorithmic constructs on the spot.

I guess it begs the question: if a majority of the algorithms & data structures are so rarely used, what are lowly, junior programmers at the gang of four actually coding all day? Not that this is limited to those companies. Anyway, thanks for the article and interesting to see the Max Howell tidbit is still relevant.

Overly broad statement: Most people are making interfaces on top of a database, and doing some basic reporting.