1. 51

  2. 31

    “So what makes a data structure good to name-drop in an interview? I would say that it has to be mildly obscure, so that you sound like an erudite data structures hipster, but it can’t be too obscure, lest your interviewer ask you to really explain the implementation details or trade-offs to the point that you reveal your ignorance. It’s best when you mention a data structure that’s somewhat obscure, but that your interviewer has heard of, because then your name-dropping validates their knowledge. They want to think of themselves as smart, and they’ve heard of this data structure, so when you show your knowledge of it, they deem you smart by the transitive principle.”

    She’s playing interview chess, not checkers. She’s doing it book smart and people smart. I’m already sold on hiring her haha.

    1. 6

      Are ring buffers actually vaguely obscure? They get used all the time for simple queues in my experience, especially in low-level code where memory is scarce and latency matters. Receiving data from a device of some kind? Throw it into a ring buffer until you can handle it.
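
      A minimal sketch of that pattern (in Python for brevity; real device code would do this in C, usually with a power-of-two capacity so the modulo becomes a bitmask):

```python
class RingBuffer:
    """Fixed-capacity FIFO queue: writes wrap around the end of the array."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0    # index of the next item to read
        self.count = 0   # number of items currently stored

    def push(self, item):
        if self.count == len(self.buf):
            raise OverflowError("buffer full")
        self.buf[(self.head + self.count) % len(self.buf)] = item
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("buffer empty")
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return item

rb = RingBuffer(3)
for x in (1, 2, 3):
    rb.push(x)
print(rb.pop(), rb.pop())  # 1 2
rb.push(4)                 # wraps around into the slot freed by pop
print(rb.pop(), rb.pop())  # 3 4
```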

      1. 4

        Well, it’s super common in the same sense that quadtrees are super common in game dev for collision detection. It’s super common in certain kinds of problems, but you can go a whole career without running into them.

        Granted, I think circular buffers are more common.

        1. 3

          Might be obscure for web developers, and interview quizzes are especially wicked for webdev positions.

          Circular buffers get really interesting if they’re implemented by mapping multiple virtual memory pages to the same physical memory, and that trick is quite common.

        2. 3

          Tries aren’t as obscure as the article tries to suggest. As it turns out, the ircu IRC daemon actually uses tries to match commands (of which there are quite a lot) fairly quickly.
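
          A command-dispatch trie can be sketched like this (a generic illustration, not ircu’s actual implementation):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> child node
        self.handler = None  # set only on nodes that end a full command

def insert(root, command, handler):
    node = root
    for ch in command:
        node = node.children.setdefault(ch, TrieNode())
    node.handler = handler

def lookup(root, command):
    node = root
    for ch in command:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.handler

root = TrieNode()
insert(root, "PRIVMSG", "handle_privmsg")
insert(root, "PING", "handle_ping")
print(lookup(root, "PING"))  # handle_ping
print(lookup(root, "PART"))  # None -- unknown command
```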

          1. 7

            I think that’s the point: they’re “obscure” in that you don’t cover them in the introductory classes, but they’re common enough that most people have heard of them and know there are real-world use cases.

          2. 3

            My favorite use of Bloom filters is CRLite, a system for certificate revocation.

            1. 2

              Why are bloom filters so often described as using O(1) space? Given a desired false positive rate, you have to use increasing amounts of space to store more keys. The only sense in which O(1) space makes sense to me is that you don’t dynamically resize the bloom filter. Am I missing something?

              1. 12

                They’re O(1) space because they do not increase in size relative to the amount of data you’re trying to store. Yes, you can change the filter size based on the accuracy you desire, but it has an O(1) relationship to the amount of data stored.
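
                A minimal sketch of that property; the salted-SHA-256 positions here are just an arbitrary illustration, not how production filters choose hash functions:

```python
import hashlib

class BloomFilter:
    """m bits chosen once up front; inserting items never grows the structure."""

    def __init__(self, m_bits, k_hashes):
        self.bits = bytearray(m_bits)
        self.m = m_bits
        self.k = k_hashes

    def _positions(self, item):
        # Derive k bit positions by salting one hash function k ways.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m_bits=1024, k_hashes=3)
for word in ("alpha", "beta", "gamma"):
    bf.add(word)
assert len(bf.bits) == 1024            # size is unchanged by inserts
print(bf.might_contain("alpha"))       # True
# might_contain("delta") is usually False, but false positives are possible
```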

                1. 7

                  This is only true if you’re willing to accept an arbitrarily high false positive rate, which is never true. For any given level of false positives, they require more space for a given number of elements.

                  1. 5

                    And yet Big-O notation doesn’t take into account how you pick the constant size of your structure. The size remains independent of the number of elements you insert.

                    1. 2

                      It is true because of the definition of Big-O. If you insert a new element into a bloom filter, it does not increase in size. The insert operation of a bloom filter has no impact on its size, therefore it is O(1). If you want to build an implementation of a bloom filter that dynamically resizes as part of insertion, then that would have a different complexity.

                      1. 1

                        You would need to remember the keys (defeating the purpose of a bloom filter) if you were to dynamically resize, since you’d need to rehash the elements that had already been inserted.

                        1. 3

                          You can dynamically grow by creating a new one, inserting new records into it, and checking both.
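
                          That growth strategy can be sketched roughly like this (sizes and hashing scheme are arbitrary choices for illustration):

```python
import hashlib

def bit_positions(item, m, k):
    # k positions into an m-bit array (illustrative salted-hash scheme)
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % m

class ChainedBloom:
    """Grows by appending a fresh filter; lookups must consult every filter."""

    def __init__(self, m=256, k=3, max_per_filter=100):
        self.m, self.k, self.cap = m, k, max_per_filter
        self.filters = [bytearray(m)]
        self.in_newest = 0

    def add(self, item):
        if self.in_newest >= self.cap:
            self.filters.append(bytearray(self.m))  # "create a new one"
            self.in_newest = 0
        for pos in bit_positions(item, self.m, self.k):
            self.filters[-1][pos] = 1
        self.in_newest += 1

    def might_contain(self, item):
        # "check both" -- generalized to checking every filter in the chain
        return any(all(f[pos] for pos in bit_positions(item, self.m, self.k))
                   for f in self.filters)

cb = ChainedBloom()
for i in range(250):
    cb.add(f"item{i}")
assert len(cb.filters) == 3        # 250 inserts spilled into 3 filters
assert cb.might_contain("item0")   # early items are still found
```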

                          1. 1

                            Which is kind of my point, it wouldn’t really be a bloom filter, so it would have different complexity.

                    2. 3

                      It’s really O(k), where k is constant, right? We just don’t always write the k, probably because it might be confusing since some algorithms have parameters other than n in play. So even if k is fifty bajillion million, it’s still constant and n (the number of pieces of data you want to process) never comes into it. For a Bloom filter, n is the number of items you add to the set. No matter how many you add, k never changes.

                      1. 1

                        You can vary the number of hashes independently of the number of bits used. For a given size, raising the number of hashes will initially lower the false positive rate, then raise it (in the extreme case, with enough hashes, every element will have every bit set).
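
                        That trade-off drops out of the standard false-positive approximation; a quick sketch with arbitrary numbers:

```python
import math

def false_positive_rate(n, m, k):
    """Standard approximation (1 - e^(-k*n/m))^k for a Bloom filter
    holding n items in m bits with k hash functions."""
    return (1 - math.exp(-k * n / m)) ** k

n, m = 1000, 10_000   # say, 1000 items in a 10,000-bit filter
for k in (1, 4, 7, 10, 20):
    print(k, false_positive_rate(n, m, k))
# The rate falls as k approaches the optimum (m / n) * ln 2 (about 6.9 here),
# then rises again as the extra hashes fill the filter with set bits.
```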

                        1. 3

                          I think the point is that you don’t vary the number of hashes or bits. You set them up-front. It’s OK that the probability of a false positive goes up as you add items, that’s the nature of the algorithm.

                          Wikipedia says basically what I said above, though note that in this case k is actually the number of hash functions. But, again, if you set it and forget it, k is constant.

                          Bloom filters also have the unusual property that the time needed either to add items or to check whether an item is in the set is a fixed constant, O(k), completely independent of the number of items already in the set.

                      2. 2

                        You are correct, but oddly enough people also describe a hash table as O(1) when it suffers from the same problems.

                        It’s ironic that most data structure shibboleth statements are at best oversimplifications and at worst lies.

                        1. 2

                          Sorry, isn’t a resizable hash-table really amortized O(1) for insert and average O(1) for lookup (assuming the hashing operation is itself constant-time—and if it’s not then it’s still O(m), where m is something like the size of the keys, not the number of keys used)?
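
                          The amortized-O(1) claim for doubling growth can be checked with a quick toy count of the total copy work (ignoring hashing and collisions):

```python
def total_copies(n):
    """Elements copied by a hash table that doubles capacity when full,
    starting from capacity 1, over n insertions."""
    cap, size, copies = 1, 0, 0
    for _ in range(n):
        if size == cap:       # table full: allocate double, copy everything
            copies += size
            cap *= 2
        size += 1
    return copies

# Total copy work stays under 2n, so each insert is O(1) amortized
# even though individual inserts occasionally cost O(n).
for n in (100, 1000, 10000):
    print(n, total_copies(n))
assert all(total_copies(n) < 2 * n for n in (100, 1000, 10000))
```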

                          1. 2

                            Absolutely! I don’t know about you but I find that the effects of resizing and the differences between worst case complexity and amortized complexity are rarely, if ever, discussed in the standard algo interview.

                            1. 3

                              It does happen in practice too… A long time ago, I remember that a coworker submitted patches to GNU Make for “badly sized hash tables” that led to O(n^2) performance on huge Makefiles. And I was like “huh that happens for real” :)

                              I think “old-style C” programs are usually lazy about their hash tables, because it’s hard to write a reusable one, and writing one with the “proper” big-O takes significant effort. Every program has its own.

                              For example, I think I heard that the initial version of the C linker only considered the first 6 characters of identifiers. The only reason I can think of for this is implementation simplicity, i.e. laziness :)

                              https://stackoverflow.com/questions/38035628/c-why-did-ansi-only-specify-six-characters-for-the-minimum-number-of-significa

                              And PHP’s hash function for identifiers was just strlen!!! Which obviously causes a lot of collisions. And it apparently influenced the language design:

                              https://news.ycombinator.com/item?id=6919216

                              I’ve been reading a lot of old C code lately, and the difference between theory and practice leaves me amused and occasionally horrified :-/

                            2. 1

                              This is super pedantic, and I’m not entirely sure about it:

                              Since you can have an arbitrary amount of memory, and an arbitrary number of elements in your hash table, to store n elements you need to be able to uniquely identify each element, which means an identifier of length O(log n). Reading this identifier takes O(log n), so insertion and deletion are O(log n).

                        2. 2

                          Recently I had to implement the code snippet for finding connected components on paper. The interviewer was not impressed. Apparently I’m still shaky with recursive solutions. Gotta put in more work.

                          1. 0

                            Speculation: The fact somebody’s more interested in “name dropping” “to sound smart in an interview” than actually learning is probably why they need to name drop in the first place.

                            And then everybody wonders why it’s so hard to find good devs…

                            1. 33

                              I think developers have a long history of being playful with systems both social and physical, and the article just follows that tradition.

                              1. 7

                                Speculation: The fact somebody’s more interested in “name dropping” “to sound smart in an interview” than actually learning is probably why they need to name drop in the first place.

                                To be honest, it sounds like you didn’t read the article. The author does go into detail about the implementation details of the data structures if that is of interest to the reader.

                                And then everybody wonders why it’s so hard to find good devs…

                                Meanwhile I’m jumping up and down shouting, “Hire me! I have 5 years of industry experience!” and nobody is paying attention.

                                1. 1

                                  Anyone can go into detail on a subject like data structures when writing a blog post. Especially detail that reads like me lifting Wikipedia articles for a BTEC essay on maximum junction temperature. One has all the time in the world.

                                  1. 1

                                    Speculation: The fact somebody’s more interested in “name dropping” “to sound smart in an interview” than actually learning is probably why they need to name drop in the first place.

                                    To be honest, it sounds like you didn’t read the article. The author does go into detail about the implementation details of the data structures if that is of interest to the reader.

                                    I took issue with that choice of phrase as well, but IMO being guilty of using click bait-y titles is a misdemeanor offense :) Everybody wants to get their blog posts read.

                                    As I said in my response I do wish the article both presented more data structures and also said more about each than pointing at a Wikipedia explanation, but that’s neither here nor there.

                                  2. 6

                                    I would argue that the fact that this works points to a deficiency in the standard technical interview format, not a deficiency in the candidate.

                                    1. 6

                                      I’ve mentioned this before and it’s been a little controversial, but this is why I tend to weight soft skills and “potential” more highly than technical “stuff” when I interview. I don’t care if someone has memorized CLRS or read some book on hacking technical interviews. If they can code, at least reasonably well (which is pretty easy to test) then I’d rather try to discover something about their communication, collaboration, interpersonal, and self-teaching skills.

                                      Of course this mainly applies to entry-level or other “junior” hires. It’s a little different for more specific, or more “senior,” positions. Although in those cases, the person should be able to talk about past experience and projects. We still shouldn’t need to resort to regurgitating complexity characteristics.

                                      1. 2

                                        I think this is on the right track, but how do you deal with people like past me who put on a really great show, and could indeed perform, but talked like a superstar when in reality I was merely a solid base hitter?

                                        (See above. I feel as if I’ve pretty much addressed this for myself but I’m baring my own throat as a concrete example of one way this strategy could fail.)

                                        1. 2

                                          I think the answer is that if the candidate makes the interview into a “senior” interview by name-dropping a bunch of high-powered concepts, then the interview becomes a senior interview and I expect to see evidence of big-time contributions or highly-specialized knowledge. Of course even senior people need those soft skills, so that part of the interview remains.

                                          1. 1

                                            So, not to beat a dead horse but how do you gather said evidence? There are people (my past self included) who can describe things out to 5 9s but actually implementing them beyond a certain point is another thing altogether.

                                            We may need to agree to disagree on this :)

                                            1. 1

                                              I guess I’m not sure what you’re asking at this point. How do I gather evidence of big-time contributions? The usual ways: GitHub profile, descriptions of past projects, personal recommendations, and even whiteboard coding. I never said I don’t look at any whiteboard coding, just that I don’t emphasize it for “junior” hires.

                                              The key is to recognize, and become comfortable with, the idea that interviewing isn’t a mechanical process; it requires nuance and will never be 100% effective. The goal is best-effort (like a Bloom filter, hehe). I am 100% certain that I could be “tricked” by someone with just the right combination of knowledge and interpersonal skills. I’m OK with that.

                                              1. 1

                                                Sorry I thought I made this clear in a previous response: The question I’m asking is - using your methods, how do you validate that the candidate can actually do the things they’re claiming they’ve done or some reasonable approximation given the time and logistics constraints of the interview?

                                                1. 1

                                                  Well, like I said, the same way anyone else would. You can look at GitHub (or other public sources), recommendations from people you trust, ask them questions, or hit up a whiteboard or write some code.

                                        2. 1

                                          I mostly hire for potential and character. I’ll perform technical tests to determine both of those. How does this person work through a problem? Is the individual’s workflow compatible with the rest of my team mates?

                                          Technical tests can measure character, too. I would rather a person say “this problem is out of range of my skill set, but I would love to learn how to solve it” than a person try to lie or cheat their way through it. The training I’ve gone through allows me to detect whether a person is lying. And I’ve caught quite a few liars through the use of technical tests.

                                          I would not consider hiring anyone who felt the need to “name drop” a technical solution without being able to back it up with a basic sample implementation. If I ask a candidate for a technical solution and they name-drop one, I’ll immediately ask for a basic implementation. Feel free to draw on our walls–they’re coated with a special whiteboard paint, safe for dry erase markers. :)

                                          No implementation of name dropped technical solutions = no hire

                                          1. 7

                                            Note to self: never mention that I use red-black trees if I’m interviewing with lattera. Because in >20 years of programming, I have never needed to implement one, even though I use them plenty.

                                            Here we go, another entry in the secret rulebook for the fucking interview game.

                                            1. 1

                                              Here we go, another entry in the secret rulebook for the fucking interview game.

                                              Yes, I do think most standard interviews are a game. This is why I personally favor actually having candidates walk the walk. I know a lot of people are down on whiteboard coding, but it sure seems to me that when used correctly it can go a long way towards accomplishing this.

                                            2. 6

                                              I see value in name-dropping because just knowing the name can shave a LOT of time off implementation time because you can then pull the details off wikipedia or whatever. (Of course, you also need to have the foundational knowledge to read wikipedia and understand the vocabulary, etc.) So even if you don’t remember the details off the top of your head, knowing the name (and that it is relevant to the problem) is still going to put you streets ahead of the person who is going in search circles for want of an appropriate keyword, or who has to beg stack overflow for even the first pointer.

                                              1. 2

                                                I understand your reasoning, but based on past experiences I’ve had, I’ll have to disagree politely. Having different perspectives and opinions is totally cool, though. If everyone held the same opinions as I, the world would be a very boring, repetitive, and frustrating place. ;)

                                                1. 2

                                                  Sure, but you can tell the difference between someone name-dropping something they’ve heard of and someone suggesting something they’ve heard of but would need to look up the details of.

                                                  1. 2

                                                    Not at all sure that’s the case. I understand what a bloom filter is and does well enough that I could probably answer some questions about one cogently, but not well enough to implement one. See my responses above.

                                                  2. 2

                                                    I’m afraid I must respectfully disagree. I know what a bloom filter is. Even with Wikipedia I sincerely doubt I could implement one in any kind of reasonable time-frame. Admittedly I do system infrastructure work so the necessity of doing so in my work seems very remote, but there it is.

                                              2. 1

                                                Definitely agreed. Coming up with good talking points is one thing, being able to walk the walk is quite another. I’ll admit that at a certain point in my career I had become more adept at talking the talk and doing well in interviews than actually walking the walk.

                                                My current gig with a certain cloudy company that’s everyone’s favorite poo flinging target these days has narrowed that disparity measurably.

                                            3. 1

                                              The love. The hate. I feel the need to vent :)

                                              The Love: The idea. Shining a light on interesting data structures is always good, especially ones that have increased relevance in the modern business environment. Explaining them in a useful way that’s easily understood is even better.

                                              The Hate: This title is total clickbait and I hate it on both practical and philosophical levels.

                                              • Philosophical: Interviews are marketing, but ‘name dropping’ an algorithm without a REALLY solid understanding of what it is, what it does, and ideally some sample code to back it up are a recipe for disaster. You have to sell yourself, but it must also be your TRUE self. The self that can walk the walk every single day after talking the talk and making the sale.
                                              • Practical: Name dropping is never a good idea in an interview unless ‘dropping’ that name is in service of explaining how you made some particular contribution to a project that illuminates what you can bring to the table for a prospective employer. So, “I co-implemented the Plan-9 filesystem with Thompson and Pike at Bell Labs” is both helpful and illustrative, whereas “I worked with Eric Schmidt on the Google search algorithm,” when what you actually did was, say, release engineering, is not. This second case is what I think of as a “name drop” and should be considered an antipattern in interviewing.

                                              The Meh: I really wish the article had more depth. Only 2-3 data structures are showcased, and the only explanations offered are pointers to Wikipedia. On the one hand, it’s good that we can stand on the shoulders of giants and rely on the commons for a solid explanation. On the other, the article could add more value if the author had something original to say.

                                              I don’t mean to cast unwarranted aspersions - it’s great that the author blogged this, I just feel like there’s a lot of un-harvested potential here.