1. 20
  1.  

  2. 32

    Probably because the problem is ill-typed. Programmers with mediocre mental models of typing probably won’t notice that the problem is essentially malformed, start attacking it, and then get confused when they run up against the fact that this program requires switching on types, not just values.

    If I was asked this in an interview, I would consider it to be a bit of a yellow flag; it suggests to me that I might expect to run into this sort of unpleasantness on the job. And unfortunately, things like this do exist; I’ve seen a function in a python codebase at a previous job with documentation that went something like “This function returns a list, a tuple, or a nest of maps, lists, and sets. God help you.”

    In my mind, something called “flatten” would probably have type “[[a]] -> [a]”. A well-typed version of the question is non-trivial to express, as it requires type recursion, but I would probably write it as either

    data RList = RInt Int | Rec [RList]
    

    Or equivalently

    Free [] Int 
    

    But in neither case would I describe it as “a list”. A more accurate description might be “collect the leaves in a rose tree”.

    A short O(n) solution to the problem (assuming we’re using the latter representarion) would be

    collect = foldr (:) []
    
    1. 13

      The first time I read your post, my kneejerk reaction was that you were overly academic and that you were completely wrong. Further rereadings convinced me that you had a point, perhaps only one, and here it is:

      then get confused when they run up against the fact that this program requires switching on types, not just values.

      You hit it on the head here as the source of difficulty, but that same source of difficulty is exactly what makes it a good question. Anybody who doesn’t understand, in their bones, that they can do such things need not share a codebase with me unless the relationship is explicitly an educational one.

      I’d go further, and point out that the solution in, say, JS, isn’t some deep kraut space magic:

      function angerFlatten( array ) {
          return array.reduce( function(acc,el) {
              return acc.concat( 
                  (Array.isArray(el)) ? angerFlatten(el) : el
              );
          },[]);
      }
      

      Now, the above does exploit the slightly underhanded trick of Array.concat knowing how to handle both scalar and array arguments, but that’s not that evil.

      Dynamic languages make this easier, but you could do the same in anything that supports ignoring the inner types of an array, perhaps by using void* or Object in C or Java respectively–which is pretty much why you and other Haskell folks seem so flummoxed for what is really a perfectly reasonable problem.

      The real problem incidentally is that the nice solutions that leverage recursion here may not be safe for production, because a lot of industry languages just don’t handle that competently.

      ~

      The more troubling part is when you said this:

      it suggests to me that I might expect to run into this sort of unpleasantness on the job.

      In professional software development, especially with legacy systems, unpleasantness abounds. I don’t mean to sound like Uncle Bob here, but if we cannot stand getting our hands dirty with something as unseemly as a rose tree then we should all seek a different career.

      For an example of where we might encounter such a thing, consider a legacy system where you have a collection of customer companies, and each company has some number of shipping addresses. They also have some number of subsidiary companies, and those subsidiary companies have subsidiaries and mailing addresses, and so forth. Because legacy, all of these companies (direct customers or subsidiaries) use the same Company interface. Also, because legacy, there isn’t a way of getting all company entities in the system, and instead you only get access to the customer companies.

      You are tasked by angerboss–between fits of drunken rambling and shitposting–with getting a list of all addresses for all customer companies and their subsidiaries.

      The company doesn’t use Ruby, so smartass answers like ObjectSpace.each_object(Company) { |c| ...} aren’t permissible.

      The only alternative is going to be to basically flatten out this initial array of companies. It’s not an exotic use case, it’s not crazy, it’s not even particularly noteworthy.

      Similarly, if you’re crawling a scenegraph or certain types of graphic structure, you end up flattening everything into a pile of polygons. Or if you’re, purely random example, crawling HTML and just need the text of a website, you do this too.

      ~

      Your post is exactly the sort of thing I’d reference when explaining why some folks view Haskellers and other type-purists as being aloof and not always suited towards doing practical work.

      1. 8

        In professional software development, especially with legacy systems, unpleasantness abounds.

        This is true, but also irrelevant; my point was that I don’t want to work with such unpleasantness. That’s why this question would constitute a “yellow flag”.

        I’ve had only moderate difficulty finding employment that doesn’t force me to work with unsound and regrettable legacy code. As they say, dress for the job you want.

        but if we cannot stand getting our hands dirty with something as unseemly as a rose tree

        The problem isn’t rose trees. The problem is having data structures that are obviously malformed and incoherently typed. Here are some sensible rose tree definitions (free and cofree, respectively), from scratch: data Rose a = Leaf a | Forest [Rose a] and data Rose a = Rose a [Rose a]. The latter doesn’t even require sum types; you can easily do it in C(++) or Java.

        Because legacy

        You’re absolutely right that this code exists, which is orthogonal to the fact that I’d rather not work with it and that we should strongly discourage such code from being created in the future.

        Your post is exactly the sort of thing I’d reference when explaining why some folks view Haskellers and other type-purists as being aloof and not always suited towards doing practical work.

        It’s unfortunate that people occasionally conflate having practical standards with being aloof.

        Regardless of “some folks'” views on the matter, there is plenty of practical work being done that’s up to and above my standards of sensibility, and (perhaps not incidentally) it tends to be on the more interesting end of professional computer science.

        1. 3

          then get confused when they run up against the fact that this program requires switching on types, not just values.

          You hit it on the head here as the source of difficulty

          I’d say that the fact that this is a source of difficulty is an example of what’s wrong with programming today. Why should I have to decide a priori what is a type and what is a value? I should simply switch on an arbitrary predicate and then later I can decide how I want to encode the data and implement the predicate.

          1. 2

            Recursion is not required.

            • Knuth showed us another good way to do it (TAOCP I.2.3.1 Traversing binary trees, exercise 21), but this requires external nodes or observing that pointers are integral (at least in languages like C).
            • raze converge (spelled ,// in K) will also do it, but using iteration instead of recursion, and relying on the memory allocator amortising append for performance.

            Types are not required either, and actually might be a terrible way to think about problems because of these kinds of issues – and not just because of what you observed about JavaScript.

            1. 2

              Recursion isn’t required, certainly, as you can just keep an external accumulator during traversal. That said, it make the traversal code nice and terse looking–deceptively so, as I observed.

              Hey, haven’t seen you forever! Thanks for the comment, and for introducing a really nifty function name. :)

          2. 2

            I had the same thought “wow, that’s a stupid data structure, I wonder what its type would even be”. I think the blog author could answer his question pretty easily if he tried to write it in Java himself and would see how awkward and convoluted it is in Java to have to walk around the type system.

            Yes, as people wrote further down, it is rather doable in dynamically typed languages but I think a generic flatten which flattens recursively (unlike concat/flatmap) is in general not a good idea, because it means that you have some convoluted data structure and you try to simplify it using a brute force approach (“just bubble up the values somehow”) and the semantics of such a flatten function are unclear in either case: what to do with tuples? What to do with subclasses of list? What to do with other iterables, how would iterators flatten?

            The task is rather complex, therefore there can’t really be a goodgeneric solution as the semantics are rather unclear. I’d advise against using this question for hiring unless you want to see how the candidate approaches it without expecting an actual solution.

            1. 3

              The task is rather complex, therefore there can’t really be a goodgeneric solution as the semantics are rather unclear.

              When the semantics are unclear, you can parameterize by the questions you’re asking.

              what to do with tuples? What to do with subclasses of list? What to do with other iterables, how would iterators flatten?

              Parameterize with a predicate.

              For example, here’s an implementation in Clojure using tree-seq:

              (defn flatten [x]
                (filter (complement sequential?)
                        (rest (tree-seq sequential? seq x))))
              

              tree-seq’s signature is (tree-seq branch? children root).

              Only want to flatten a particular type? OK:

              (defn flatten-if [pred x]
                (filter (complement pred)
                        (rest (tree-seq pred seq x))))
              

              Note that this is extremely polymorphic. Want to find all of the <p> tags in an html document? No problem:

              (->> (flatten-if html-element? html-doc)
                   (filter #(= (html-tag %) "p")))
              
            2. 2

              That RList type completely kills your code reuse. See my reply to Leonidas below about tree-seq. Even if you wrote a well-typed tree-seq function, you’d need to write several type classes or pay O(N) conversion cost for your custom types in order to get it to be as polymorphic as the Clojure version, which is essentially typed with an existential.

              1. 2

                Yeah, that’s why I included the Free [] version. It already has all the typeclass definitions and combinators you could want. I just included RList so people unfamiliar with Free monads or F-algebras or other generic constructions could have a concrete and straightforward type definition to look at.

                1. 0

                  I assume you’re talking about this type specifically?

                  http://hackage.haskell.org/package/free-4.12.4/docs/Control-Monad-Free.html#t:Free

                  You still have to pay O(N) boxing cost to wrap/unwrap things with Pure and Free constructors. Never mind the syntactic cost: If you have a [[a]] and want to use a generic tree-seq operation on it, you’re going to need to sprinkle your code with liftF, etc.

                  At least I think liftF is the right function. Holy hell, Haskell documentation reads like math word soup.

                  1. 2

                    My assumption was that the data structure was already in the Free format.

                    If you’re worried about a constant per-element increase in complexity, I can pretty much guarantee you that it’s going to be faster in Haskell than in Clojure (since your given predicate is doing a type check anyway) so I’m not really sure what you’re complaining about. The overhead associated with switching on a sum type in Haskell is basically zero, since the compiler is very very good at either eliding checks in the first place or just doing pointer tagging.

                    Never mind the syntactic cost

                    Is it just me, or is my solution vastly syntactically simpler than yours?

                    In fact, it occurs to me that what I wrote is already defined in Data.Foldable as “toList”. So the whole solution is one obvious, standardized function. Where’s the “syntactic complexity” in that?

                    If you’re referring to constructing the value by hand in the first place, you probably wouldn’t write out Pure and Free manually. You could just make some helper functions that wrap it up for you. I’m not really sure what your hypothetical use case is; do you imagine writing thousands of trees out by hand?

                    How do you expect documentation for the Free monad to read? Any “friendly” description would probably be wrong, because it would require assuming things about the functor you’re using.

                    1. 1

                      Of course you can make any particular call site syntactically smaller by moving the complexity elsewhere. My tree-seq usage makes the parameterization explicit, which is going to be syntactically larger than the implicit dictionary passing of a type class. However, that implicit dictionary passing means you need to insert lifts and transforms in various places. Changing the type may incur non-local code changes; changing a predicate will not. Your assumption that the data structure is already in the one-true Free format would be simply wrong if you wanted two different flattening predicates.

                      1. 2

                        which is going to be syntactically larger than the implicit dictionary passing of a type class.

                        There is no dictionary passing in my code. Any use of the fact that Free [] is Foldable, if that’s what you’re concerned about, is going to get specialized away. There will be no runtime switching on types or looking up of typeclass vtables.

                        implicit dictionary passing means you need to insert lifts and transforms in various places

                        Can you give an example? I really can’t imagine what you’re referring to; we may be on completely separate pages here.

                        Changing the type may incur non-local code changes

                        Changing the type of what? The argument? That has no bearing on the behavior of the fold over the tree. If you mean changing the type of [] to something else (for example a Vector), the behavior will be the same so long as the other object also obeys the Foldable laws (which any common data structure does).

                        1. 1

                          Unless you’re using the prelude foldr that is specialized to lazy lists, you need to pass the Foldable dictionary. Ignore the performance concerns for a moment: That Foldable implementation depends on what predicate you want to flatten. Let’s try to be concrete, so my point about parameterization is clear.

                          Say you had a simplified HTML type:

                          data Html = Element Tag [Html] | Text String

                          How would you flatten only paragraph tags?

                          1. 4

                            you need to pass the Foldable dictionary

                            Ah, that’s a common misconception. In almost all instances (except for in the case of higher-rank types or when the compiler’s code-size heuristics kick in), typeclass dictionaries don’t actually show up in the compiled code. This is because the compiler knows, statically, exactly which instance of the typeclass will be used, and can just stick the proper functions in directly. Here’s an example where the compiler actually can’t specialize the code to a particular instance, and the dictionary has to exist at runtime.

                            data SomeShow = SomeShow (forall a . Show a => a)
                            

                            That’s pretty rare though. In any case where the type is concretized at compile time, the compiler will usually specialize away any dictionary passing.

                            How would you flatten only paragraph tags?

                            Ah, OK. I thought you were talking about the question in the OP. Well, since we were using Free monads earlier, let’s express this as a Free monad:

                            data Tag = P | A -- For example
                            data Element a = Element Tag [a] deriving Functor
                            type Html = Free Element String
                            

                            Then we can do

                            psOnly (Element P xs) = xs
                            psOnly (Element _ xs) = []
                            collectPs = collect . hoistFree psOnly
                            

                            There are probably nicer ways to do it, but this seemed straightforward enough. The problem with doing it exactly the way you did it (a boolean predicate) in a strongly typed language is that your boolean predicate has to work on any type that might show up in the tree, whether a leaf or a node. Doesn’t mean you can’t do it, just that it won’t look exactly the same. You could also just write functions that deconstruct and operate on Free values directly.

                            1. 2

                              typeclass dictionaries don’t actually show up in the compiled code

                              I’m still not talking about perf, I’m talking about the conceptual model.

                              hoistFree psOnly

                              This is (some) of the syntactic overhead of supplying the dictionary that I’m talking about. Moreover, it’s supplying it existentially! Just like a dynamically typed language would by default.

                              Now what happens when you want an open set of tags? You specified a closed set. Your type changed, so you’re going to have to change your psOnly function or define a pattern view. That’s more of the syntactic overhead and non-local impact I’m talking about.

                              Moreover, this is a very indirect way of programming. I specified the problem in terms of a logical predicate (is it a paragraph?) and you specified a solution in terms of a new type (“the tree of paragraphs”) and some intermediate data structures (the empty sequence, etc). Honestly, I don’t even know if you’ve solved the problem correctly, since it’s not quite clear to me if you’re splicing paragraph tags in to their parents, or stripping non-paragraph tags, or what. I’ll take the direct, concrete approach any day of the week, even if it’s more code (and it probably isn’t).

                              1. 3

                                I’m talking about the conceptual model.

                                The conceptual model has nothing whatsoever to do with dictionaries; that’s an implementation detail. Typeclass implementations are unique and global, so the user never “supplies a dictionary” at any point.

                                This is (some) of the syntactic overhead of supplying the dictionary that I’m talking about.

                                I’m not really sure what your delineation between “overhead” and “code” is.

                                In your code for flatten-if and p-flattening, I count 20 parentheses, which is what I would actually define as “overhead”. Writing out “hoistFree”, which is essentially a kind of map operation, is just code. If I had to do it a bunch of times, then I would consider it overhead, but I’m just doing it once.

                                Moreover, it’s supplying it existentially! Just like a dynamically typed language would by default.

                                Existential quantification is emphatically very different from what dynamically typed languages do. Forall-quantified types explicitly preclude any behavioral differences based on the type, whereas dynamically typed languages (generally) allow and expect you to do runtime type inspection. Forall-quantification inside a type actually restricts what you can do.

                                Now what happens when you want an open set of tags?

                                Very similar to the example I wrote earlier:

                                filter pred (Element tag xs) = if pred tag then xs else []
                                collectWith pred = collect . hoistFree (filter pred)
                                

                                For a “concrete approach”, you can just pattern match on the tree directly, like

                                f (Free (Element tag children)) = if something tag then ...
                                

                                or (with the structure you proposed)

                                f (Element tag children) = ...
                                

                                and you specified a solution in terms of a new type

                                Well yeah, this is sort of a hallmark of strongly-typed languages. We’re doing operations on a tree structure; how am I going to do anything without defining a type?

                                Honestly, I don’t even know if you’ve solved the problem correctly

                                I’m not actually sure what your code is doing, so I tried to guess based off your description (“How would you flatten only paragraph tags?”). If you’d like to specify the problem further, I’d be happy to refine my answer.

            3. 11

              Interesting thing about this task is that I think static typing in a language like Java gets in the way more than it helps.

              I haven’t used Java in a long time, but I would spend most of my time figuring out how to make a list that contained both integers and List objects. Maybe Oracle finally got around to making the primitive types real objects and it wouldn’t be so difficult now?

              In Common Lisp it’s almost trivial, although this version isn’t as fast as the one in Alexandria:

              (defun flatten (lst)
                  (if (listp lst)
                      (apply #'nconc (mapcar #'flatten lst))
                      (list lst)))
              
              1. 10

                I’m guessing the “correct” way of doing this in Java would be to make a List<Object> and then fill it with Integers and other lists. Basically completely sidestepping the type system. This is unintuitive, and so you’re right that this question is probably a better fit for a dynamically typed language.

                Alternately, in a language like ML, Haskell, or even Swift, you can use a sum type, where you can specify that the list contains objects that can either be other lists or integers, and have the type checker enforce that they can’t be anything else (unlike the Java example where you could theoretically stick any object in that list).

                1. [Comment removed by author]

                  1. 5

                    Since the article was complaining (okay, stating) that nobody used Java 8 I ended up implementing this solution: https://gist.github.com/calumleslie/8fe772225b8403e515286a56b965b985

                    Quite fun to do, but I’m not going to claim the support code is short. The solution is one (recursive) line though!

                    Edit: I was interested about how this actually executes, so here is the nightmare stack from the point where the system is evaluating the ‘5’ node.

                2. 6

                  Interesting thing about this task is that I think static typing in a language like Java gets in the way more than it helps.

                  That’s what makes it a strange interview question in this context imo, where it seems they’re hiring for a Java shop, interviewing candidates who responded to a job ad looking for Java programmers. Flattening a list is an introductory textbook example in dynamic languages where you can actually write literals like “[1,[2,3], [4, [5,6]]]”. But it’s not really a introductory Java textbook example.

                  It’s of course solvable in Java, and a broadly competent programmer should be able to come up with a solution in an hour. But I’m not sure it serves as a great fizzbuzz-type baseline filter when hiring for a Java shop. You probably end up filtering more on whether the candidate has previously encountered a flatten function in a different language and remembers the algorithm (maybe they went to a university that had a Scheme course), and/or whether they’re comfortable with doing ugly run-time type things in Java. Which are not useless skills, but there are probably lots of reasonably but narrowly competent Java programmers who would get hung up because it’s a bit of a “foreign” example to someone who works entirely in Java.

                  1. 3

                    Pretty easy in Python too:

                    def flatten(lis):
                        ret = []
                        for item in lis:
                            if isinstance(item, int):
                                ret.append(item)
                            elif isinstance(item, list):
                                ret.extend(flatten(item))
                        return ret
                    
                    1. 1

                      Using generator and yield from would be my first choice, but yeah… What’s so hard about that? I mean… It’s literally a five minute task.

                      1. 3

                        It’s only easy if you recognize the right case analysis. I use to TA for a programming languages course (mostly juniors and seniors), and one of the first assignments contains a question that asks you to write this function in a lisp-like language. Tons of students trip on it. A lot of them do get it right, and some of them write sub-optimally correct solutions (i.e., redundant case analysis). If you’re not use to thinking about recursive structure, then it can be tricky!

                        N.B. The assignment is given after a lecture that covers rose trees, so we weren’t just throwing them to the wolves. :-)

                        1. 3
                          def flatten(source):
                              for item in source:
                                  yield from flatten(item) if isinstance(item, list) else [item]
                          

                          yield from ftw indeed!

                          1. 1

                            I should use these Python-3 only features now that I’ve finally ported my setup :)

                      2. 1

                        I haven’t used Java in a long time, but I would spend most of my time figuring out how to make a list that contained both integers and List objects.

                        I think I would start by taking a step back to see if your data are being properly represented by objects. Java is, after all, OO. If you need a heterogeneous collection, it should be typed to store a parent class of the subclasses you want to store. Asking how to collect objects and lists is too abstract; rather ask how do I collect records about single family homes and also apartment buildings that are composed of multiple units.

                      3. 11

                        I’ll tell you why I failed, hard, at this exact question a few weeks ago. I simply don’t do well with my every keystroke being monitored in real time.

                        I like to think I’m a fairly competent developer and I’ve got quite a bit of open source work that, I think, shows that, I just don’t handle being watched too well, especially when it’s by someone who is the gatekeeper for a job that I really want. It was so bad that I basically forgot how to type

                        There were a few other problems, of course, like it was using javascript, which while I can be fairly proficient with it, I hadn’t done much with it in the last few months so I couldn’t remember how all of the little things worked (“Shit, is it len(obj), obl.len(), obj.len, obj.length(), or obj.length?”) and I wasn’t allowed to look at the docs for anything.

                        All put together, I’m sure I ended up looking like some complete moron who had done a single programming tutorial or something. They refused to reply when I asked why they were saying no, but since all these little exercises took up the bulk of the interview I can only assume it was this.

                        1. 4

                          This comes up a lot in Erlang-land. For some reason, I seem to only have to deal with lists that are nested once, and never more than that. Maybe I’m just lucky. They aren’t completely awesome, but generally, you keep your nested lists as is, and send them around as an IOLIST. They aren’t completely awesome, because iolists can only contain certain datatypes, like binaries, the numbers 0-255, and other iolists. Here’s more on this pattern: http://www.erlangpatterns.org/iolist.html

                          Just wanted to share something slightly unusual, from a Lang that is slightly unusual.

                          1. 3

                            The example is much harder in statically-typed languages than dynamically-typed languages. Beyond that, it is even more difficult in a language (like Java) that is statically-typed but which doesn’t have union types. (AFAIK the closest thing to a union type in Java is an interface.) The interviewer is presumably someone with a background in a dynamically-typed language interviewing predominantly Java programmers. I think this explains why the interviewer expects it to be easy but it’s difficult for his interviewees.

                            1. 0

                              There are two answers. Unfortunately, it’s hard as hell to tease the two apart.

                              1. Sometimes, very smart people interview and test poorly. I’ve always been a great test-taker and good at technical interviews. (I’m bad at “fit” interviews. Tech interviews require getting the right answer, the truth; and fit interviews often require selling a lie– that you’re more subordinate and pliant than you actually are. One or the other at a time, I can handle. Unexpected context switches are hard, though.) So, yeah, I’m sure there are plenty of people who are more than smart enough to blow through that problem under ideal conditions who freeze up when they’re being tested. We need some way of filtering out the non-programming programmers (see #2) but the way we do it sucks and we bunt a lot of good people.

                              2. There are a ton of mediocre or worse programmers. Business programming actually doesn’t involve writing much code. It’s mostly stapling together off-the-shelf assets and figuring out APIs. A lot of business programmers write less than 100 lines of code per year. Five years later, they’ve forgotten everything they knew in CS 1 and they can’t write FizzBuzz, much less speak intelligently about big-O notation. The business software environment, to put it bluntly, rots the brain. And add the lack of a barrier to entry, and you also get a lot of people who never learned CS who just slap stuff together. Moreover, there are plenty of jobs (sometimes well-paid jobs) where programmers use Agile Scrum and that level of unrigorous sloppiness is not only sufficient to get and keep the job, but preferred over correctness.

                              1. 0

                                This is why everyone should go through The Little Schemer. It gives you a mental model for recursion and tree navigation within 3 chapters (and implements a flatten method in one of those chapters if I remember correctly). This type of problem isn’t hard to think about or solve.