1. 11

  2. 16

    I work on a team that primarily writes Haskell. We’re part of a small-ish (~20 person) engineering org where most people come in knowing Go or JavaScript or Ruby. We’ve had good (surprising!) success onboarding new engineers to Haskell. Zero to production-ready code review takes around 6 weeks, which isn’t that different from most languages.

    Because of this, I’ve had quite a bit of experience teaching working programmers what monads are, and seen a ton of different approaches. Pedagogically, I would say that “monads as monoids of endofunctors” is probably one of the least effective approaches we’ve tried. There are just too many concepts here that you need to introduce at the same time. Monoids make sense to most people (programmers write a lot of monoids day-to-day, so it’s easy to motivate a monoid with concrete examples like strings, lists, numbers, or Builders), but endofunctors and categories usually do not. What is a category? What is an example of a category in code I’ve seen before? These questions tend to trip people up because it’s hard to understand something until you’ve played with it for a bit (alas, You Can’t Tell People Anything).

    The approach I’ve found to be most effective is to start from something a working programmer has seen before, and compare and contrast. Fortunately, most programmers usually do have a good starting point! Depending on their native language, this is usually a Promise or a Future or a continuation or a Stream - usually some kind of object that represents a computation, where these objects can be sequenced to construct larger computations. I’ve had great success working with JavaScript programmers and telling them “monads are a lot like then-ables (Promises), and async/await is a lot like do-syntax, and here’s where they’re similar, and here’s how they differ”.

    I think this approach succeeds because it’s much easier to teach a concept by starting from something that someone has actually used before (and by using their existing vocabulary) than by starting from scratch. If I tell someone “monads are values that represent computations that can be sequenced”, I get a lot of blank stares. If I follow that up with “just like Promises!”, I see a lot of lightbulbs go off.

    (This is why I think it’s actually particularly hard to teach applicative functors. Lots of Haskell pedagogy attempts to go in the order of the typeclass hierarchy, teaching functors then applicatives then monads. I think it’s actually more pedagogically effective to skip around, teaching functors then monads then applicatives, because it’s hard to motivate applicatives as “functors containing functions” and easy to motivate applicatives as “weaker monads when you don’t need the power and you want to constrain what the computation can do”.)

    1. 7

      “Monoid in the category of endofunctors” is a true statement, but it’s an old joke that escaped containment.

      Can you elaborate on your async/await comparison? I haven’t been in the JS world for a long time.

      Re: Applicative: I used to help teach the System-F (née Data61) FP Course, and we seemed to have reasonable success motivating Applicative by trying to map a binary function a -> b -> c over a functor f a and getting “stuck” with an f (b -> c). This motivates Applicative as the typeclass that provides lift0 and lift[2..∞] functions, if you think of Functor as the typeclass that provides lift1 (as an alternative perspective on fmap).
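
      To make that concrete, a quick GHCi-style sketch of the “getting stuck” moment, using Maybe as the functor:

      ghci> :t fmap (+) (Just 1)
      fmap (+) (Just 1) :: Num a => Maybe (a -> a)
      ghci> fmap (+) (Just 1) <*> Just 2   -- Applicative's <*> is the "unsticking" step
      Just 3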

      1. 10

        map a binary function a -> b -> c over a functor f a

        The hard part about this example is that the immediate response is “why would I ever do this?”. I don’t think I have a compelling answer for that. I’ve found that helping someone answer “why would I ever use X?” for any concept really helps them understand the concept better.

        Can you elaborate on your async/await comparison?

        Sure! TL;DR: async/await is do-syntax for the Promise monad.

        JavaScript has a concept of “Promises”, which are asynchronous computations that will eventually return a value. Promises are more convenient than using callbacks since you avoid the Pyramid of Doom problem (although they trade this off with some annoyances of their own), and most of the JS ecosystem has settled on Promises as a primitive for asynchronous computation. (Fun fact: Node actually had Promise support really early on, then switched to callbacks for most standard library functions, then switched back to Promises! There’s a really interesting aside about this in Platform as a Reflection of Values, a talk by Bryan Cantrill who was there at Joyent at the time.)

        Promises provide a couple primitives:

        • resolve(value: T): Promise<T>, which takes a regular value and “wraps” it into a Promise that immediately provides the value.
        • then(this: Promise<A>, f: A -> Promise<B>): Promise<B>, that takes a Promise<A> and a function A -> Promise<B> and provides you with a Promise<B>. Mechanically, then waits until this has a value ready, and then runs f and returns the result.

        (There are some other primitives for constructing promises from callbacks, synchronizing multiple promises, doing error handling, etc., but they’re not important for this comparison.)

        Why do you have to use then in order to call a function using the Promise’s value? Because the Promise’s value isn’t guaranteed to be there yet! What you’re really saying is “here’s a computation that I want to run once the value becomes available”. (If you know monads, this should start sounding familiar.)

        So programmers usually wind up with a bunch of code that looks like this:

        somethingThatReturnsAPromise()
          .then(a => {
            // Do a bunch of synchronous stuff,
            // then end up doing something
            // asynchronous, and return a Promise.
          }).then(b => {
            // etc.
          }).then(c => {
            // etc.
          })
          // This eventually becomes a long chain.
        

        Later, JavaScript adopted what’s called “async/await” syntax, which is syntax sugar that allows you to write code like this instead:

        // Some types:
        // somethingThatReturnsAPromise: () -> Promise<A>
        // someOtherAsyncThing: A -> Promise<B>
        // yetAnotherAsyncThing: B -> Promise<C>
        
        a = await somethingThatReturnsAPromise()
        // Do stuff...
        b = await someOtherAsyncThing(a)
        // etc.
        result = await yetAnotherAsyncThing(b) // result: C (the enclosing async function would return Promise<C>)
        

        This is a bit clearer and a bit more convenient to write, and basically desugars down to the same thing.

        Now, how is this all related to monads? Well, Promises are actually just an instance of Monad! (Not really, because of implementation details, but they’re monadic in spirit.)

        Let’s look at how they compare. Monad is a typeclass, which basically means it’s an interface (or really more like a Rust trait). (Again not really, but correct in spirit - most of what I’m saying here is going to be not-super-technically correct.) A type is-a monad when it implements that interface. The monad interface basically looks like:

        interface Monad for type T<A> {
          resolve(value: A): T<A>
          then(this: T<A>, f: A -> T<B>): T<B>
        }
        

        (Haskell’s actual Monad typeclass calls these functions return instead of resolve, and >>=, pronounced “bind”, instead of then.)
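
        For reference, the real Haskell typeclass looks roughly like this (simplified):

        class Applicative m => Monad m where
          return :: a -> m a                   -- "resolve"
          (>>=)  :: m a -> (a -> m b) -> m b   -- "then", pronounced "bind"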

        Hold on! Doesn’t that look familiar? Yes, this is almost exactly what Promise provides! Now let’s look at do-syntax:

        // Some types:
        // somethingThatReturnsT: () -> T<A>
        // someOtherThingThatReturnsT: A -> T<B>
        // yetAnotherThingThatReturnsT: B -> T<C>
        
        result: T<C> // Where T is a monad
        result = do
          a <- somethingThatReturnsT()
          // Do stuff in let-bindings...
          b <- someOtherThingThatReturnsT(a)
          // etc.
          yetAnotherThingThatReturnsT(b)
        

        Huh… isn’t that almost exactly async/await? Right again.

        Why is this useful? Two reasons:

        1. It turns out that there are lots of monads. This idea of “give me some computations, and I’ll sequence them together in a certain way” turns out to be very general. In Promise, the “way I sequence computations” is “wait until the asynchronous value is available”. In Maybe, it’s “don’t run the next computations if previous computations didn’t succeed”. In State, it’s “let future computations ask for a value that previous computations have put”. All of these implementations of Monad differ in how they implement then and resolve.
        2. It turns out that you can write functions that are general across every single kind of monad! For example, you can write when or unless or forever or sequence (see Control.Monad for other examples). This makes it easier to reuse code and concepts across different kinds of monads. If you can also write your business logic to work across different kinds of monads, it also makes testing easier: you can use a monad that provides logging by writing to disk in production, and swap that out for a monad that provides logging by saving log messages in memory in testing, all without changing a single line of code in your business logic (see the sketch below). Monads are like type-checked, effect-tracked dependency injection! (This idea is its own whole blog post - TL;DR: monads and DI are both about effects, so they wind up being used for similar purposes.)
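
        Here’s a minimal sketch of that “swap the logging monad out for testing” idea. (MonadLog, logMsg, and chargeCustomer are made-up names just for illustration; the in-memory instance uses State from the mtl package.)

        {-# LANGUAGE FlexibleInstances #-}
        import Control.Monad.State (State, modify, execState)

        -- A hypothetical logging interface.
        class Monad m => MonadLog m where
          logMsg :: String -> m ()

        -- "Production": log to stdout (a stand-in for writing to disk).
        instance MonadLog IO where
          logMsg = putStrLn

        -- "Testing": collect log messages in memory instead.
        instance MonadLog (State [String]) where
          logMsg msg = modify (++ [msg])

        -- Business logic written once, against any MonadLog.
        chargeCustomer :: MonadLog m => Int -> m Int
        chargeCustomer amount = do
          logMsg ("charging " ++ show amount)
          pure (amount * 2)

        main :: IO ()
        main = do
          _ <- chargeCustomer 10                                  -- logs to stdout
          print (execState (chargeCustomer 10) ([] :: [String]))  -- logs captured in a list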

        Sorry in advance for typos or incoherence - most of this is off of the top of my head.

        1. 2

          Thanks for the detailed post. Is your type signature for then(this: Promise<A>, f : A -> B) incorrect? It should be a Promise<B>, right?

          The hard part about this example is that the immediate response is “why would I ever do this?”

          We’d usually motivate this with the Maybe applicative: apply f to both arguments if both are Just; return Nothing otherwise. Then consider other applicatives and notice how an embarrassingly large amount of imperative programming is subsumed by things you can build out of applicative operations (traverse and sequence in particular).
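
          For a concrete taste, a quick GHCi-style sketch:

          ghci> import Control.Applicative (liftA2)
          ghci> liftA2 (+) (Just 1) (Just 2)   -- both present: apply the function
          Just 3
          ghci> liftA2 (+) (Just 1) Nothing    -- anything missing: Nothing
          Nothing
          ghci> traverse (\x -> if x > 0 then Just x else Nothing) [1, 2, 3]
          Just [1,2,3]
          ghci> sequence [Just 1, Nothing, Just 3]   -- "loop, collect, bail on failure"
          Nothing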

          1. 2

            Yes, good catch on the typo!

            Hmm, I haven’t thought about teaching applicatives in the context of traverse. That’s an interesting idea, I’ll have to think about that more.

            1. 1

              Traversable makes it more powerful, but more complicated. Try specialising to the [] Traversable first, if you end up giving it a go?

            2. 1

              Then consider other applicatives and notice how an embarrassingly large amount of imperative programming is subsumed by things you can build out of applicative operations (traverse and sequence in particular).

              Could you elaborate on this?

        2. 3

          I understand your approach, and while I think it’s effective, it does have some well-known weaknesses. The most important thing to emphasize is the algebraic laws, but your teams are working in Haskell, which is oblivious to those laws; it is easy for programmers to learn the entirety of the Monad typeclass in practice but never realize the laws. (I was such a Haskeller!)
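
          For reference, the laws in question, written as equations rather than runnable code:

          return a >>= f   =  f a                         -- left identity
          m >>= return     =  m                           -- right identity
          (m >>= f) >>= g  =  m >>= (\x -> f x >>= g)     -- associativity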

          What is a category?

          C’mon, really? I know that you’re just playing, but this question really needs to be given a standard answer. A category is a collection of structures plus a collection of transformations from one structure to another. We usually call the structures “objects” and the transformations “arrows” or “morphisms”, but this is merely tradition, noted so that readers can follow common category-theoretic documentation. The main point of a category is to consider composition of transformations.

          What is an example of a category in code I’ve seen before?

          The following systems are categories which I’ve used in serious code. Each system has a proper title, but this is again merely mathematical tradition. This should not be a surprising collection; computer scientists should recognize all of these components, even if they’ve never considered how they’re organized.

          • Set, whose objects are sets and arrows are functions
          • Rel, whose objects are sets and arrows are relations
          • P, whose objects are sets and arrows are permutations
          • Circ, whose objects are the natural numbers and arrows are Boolean circuits
          • Mat_R, whose objects are the natural numbers and arrows are matrices over some semiring R
          • Eff, whose objects are sets with computable equality of elements and arrows are computable functions
          • Pos, whose objects are partial orders and arrows are monotone maps

          In addition, every programming language gives a category via the native type theory construction. This is a bit misleading for Haskellers, since the category Hask is actually bogus, but there is a non-Hask category which accurately describes how Haskell transforms data.
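
          In Haskell terms, this “objects plus composable transformations” shape is roughly what base’s Control.Category class captures, with ordinary functions as the most familiar instance:

          class Category cat where
            id  :: cat a a                        -- the do-nothing transformation
            (.) :: cat b c -> cat a b -> cat a c  -- composition of transformations

          instance Category (->) where
            id = Prelude.id
            (.) = (Prelude..)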

          1. 5

            The most important thing to emphasize is the algebraic laws

            Yes, agreed. Something I’ve found useful for teaching is to explicitly map the maximally precise terminology that the Haskell community likes to use back to similar, vaguer terminology that working programmers are familiar with.

            For example, I’ve found more success teaching “algebraic laws” as “coding conventions but motivated because look at how useful these equalities are”. Most programmers are familiar with conventions (e.g. don’t do side effects in your toString implementation), but I’ve found a lot of people get hung up on the terminology of what “algebraic” means and what “laws” are.

            A category is a collection of structures plus a collection of transformations from one structure to another.

            The difficulty I’ve had with a definition like this is answering “why is looking at something as a category useful to me as a programmer?”. I’ve found that providing a satisfying answer to this question for any concept (“why is X useful?”) helps tremendously with making the concept stick.

            There are definitely categories in the list of examples you’ve provided that I’ve worked with (e.g. Set, Rel, P) in other languages, but where I’ve been struggling is providing a satisfying answer to “if I could look at this as a category in a programming language I’m familiar with, what would be easier?”.

            It’s been relatively easier for me to answer this question for ideas like effect tracking (“your logging library will never secretly call the network again”) or algebraic data types (“you’ll never have a struct whose fields are in an inconsistent state”), but I haven’t found a satisfying answer for categories yet. If you’ve heard of one, I’d love to use it!

            1. 4

              A disconnect between (nominally) mathematical and pragmatic perspectives I’ve noticed that seems important to address: people who aren’t conditioned to think as mathematicians do tend to, understandably, relate to mathematical invariants as inscrutable laws passed down from the universe, when in fact they are just as much determined by a criterion of usefulness as any programming construct is. Take the case of linear transformations as a tensor product between vectors and covectors: considered on their own, the only matrices you get from just that operation are singular. Such a purity of definition results in a universe of discourse (singular matrices) that is, frankly, not very useful, so we go on to consider sums of such products so that we can talk about linear transformations with full rank.

              It’s certainly the case that what counts as “useful” to a mathematician might very well diverge wildly from what counts as useful to a programmer, but it’s been my experience that conveying the understanding that mathematical constructs are chosen as much on the basis of what they can do as are programming constructs is an essential aspect of answering the question of “why is framing this practical concern in terms of a more abstract mathematical construct useful to me as a programmer?”

              1. 3

                All that said, though, I still harbor a suspicion that most invocations of category theory in the context of programming are fetishistic and convey little in the way of enhanced understanding of the shape of the problem at hand. The mere fact of dealing with a problem where one is operating on objects within a category and morphisms between them doesn’t at all imply that conceiving of the problem in category-theoretic terms is useful, even in the case of a lone creator who is free to organize their work according to whichever models they see fit, and much less in the case of someone who must work with others (which is to say, someone whose work could ever matter) and develop a shared mental model of their collective project with unquestionably competent people who nonetheless are in general unlikely to share the category-theoretic lens. It’s pretty silly to me to cite the mere fact of any one category’s objects and morphisms being a frequent domain of consideration as evidence of the notion that thinking of that domain in terms of its categorical representation is inherently useful.

                1. 1

                  If literally nothing else, category theory is tightly related to programming language theory and design via type theory. I’ve explained before how this works for the MarshallB programming language, but it also works for languages like Joy. Categories also show up in computability theory, due to the fact that Turing categories are restriction categories.

                  1. 1

                    I’m absolutely in agreement with you on that, in that category theory provides an extraordinarily useful set of tools for modeling constructive type theories. Again, though, among all the problems that might ever be solved by a Turing-complete symbol system, how many benefit notably from a category-theoretic framing? To be clear, I’m coming at this from the position that cases like Haskell’s Monad typeclass don’t qualify, since they seem to be constructs that are merely influenced by category theory rather than being faithful representations of some category-theoretical construct: consider that return as implemented isn’t actually a natural transformation η in the theoretical sense, because it doesn’t actually map between functors: the identity functor is merely implied or assumed, even though the input is merely an object of Hask that need not be a functor itself.

          2. 1

            Thanks, the description of explaining things in terms of what the programmer already knows helped me a lot. I have been working in C# for years and I knew that I have been using and understanding monads most of that time through LINQ. But I was having issues translating that to a more abstract definition and overall concept. After reading your comment I just googled ‘monads in c#’ and got this https://mikhail.io/2018/07/monads-explained-in-csharp-again/ which explained to me how what I already knew related to the more general concept.

            1. 3

              I’m still looking for an argument/example for why all this abstraction carries its own weight, in a software-development context.

              1. 3

                Like most software engineering patterns, it exists to facilitate code reuse in a principled way. An abstraction’s utility can be measured along two axes:

                1. Working only within the abstraction, what things can I say?
                2. How many things embody this abstraction?

                Monad is a useful abstraction because it applies to a surprisingly large range of types, while still allowing a broad vocabulary of functions that work over any Monad. It is also hard to understand for these reasons, which is why the most effective teaching method seems to involve finding some vaguely familiar concept (e.g., promises) that happens to be a Monad, using that to give the student a toe-hold, and then asking the student to write Monad instances for a zillion types, letting the instinctive, pattern-matching part of the student’s brain notice the underlying similarities.

                The Monad abstraction in Haskell enables (among other things) a large family of functions (when, unless, forever, etc) that would be control structures in other languages. This is handy because a) we don’t have to petition some language steward to add new ones (contrast: Python’s with statement), and b) we can use our normal programming tools to work with them.

                I can use the Monad abstraction when checking for missing values, to halt my program on the first error, to make an ergonomic database Transaction data type that prohibits non-database I/O, to write deserialisers, to set up eDSLs with great ergonomics, to treat lists as nondeterministic computations, to provide a good interface to software transactional memory, to build up an additional “summary” result in a computation, to pass around additional arguments, and other things I’ve surely forgotten. You could well say (as you said to /u/shapr in a sibling comment) that none of these need the theory of Monads. And they don’t. What the theory of Monads gives you is a useful toolbox to see how they’re all similar, and it’s one that’s usefully expressed only in a few programming languages. A tool like mapM works exactly as well in each of those apparently-diverse cases, and only needed writing once.
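
                A tiny GHCi-style illustration of that last point, with two of those monads:

                ghci> mapM print [1, 2, 3]      -- IO: run the actions in order
                1
                2
                3
                [(),(),()]
                ghci> mapM (\k -> lookup k [(1, "a"), (2, "b")]) [1, 2]   -- Maybe: all-or-nothing
                Just ["a","b"]
                ghci> mapM (\k -> lookup k [(1, "a"), (2, "b")]) [1, 3]
                Nothing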

                1. 2

                  mapM’s functionality is trivial, though, and I expect anything you can do with an arbitrary monad would be equally trivial. In my experience, code reuse is useful when the code carries significant domain knowledge or provides a single point of reference for application details which are subject to change. Abstracting away repeated patterns for the sake of it, or simply for the sake of concision, is often not worth the cognitive load it adds.

                2. 3

                  I’m not a haskeller, but I’ve spent a little time with the language. One benefit is that you can write your functions for the success case and the monadic machinery will short-circuit if there is a failure. This means that you don’t need to litter your code with checks for nulls or nothings.
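
                  A small sketch of what that looks like with Maybe; the lookups here are made up:

                  lookupUser :: Int -> Maybe String
                  lookupUser uid = lookup uid [(1, "alice"), (2, "bob")]

                  lookupEmail :: String -> Maybe String
                  lookupEmail name = lookup name [("alice", "alice@example.com")]

                  -- Written for the success case only; a Nothing anywhere aborts the whole block.
                  emailFor :: Int -> Maybe String
                  emailFor uid = do
                    name  <- lookupUser uid
                    email <- lookupEmail name
                    pure email

                  -- emailFor 1 == Just "alice@example.com"
                  -- emailFor 2 == Nothing  (bob has no email address)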

                  1. 2

                    One thing I like is that I can use the same code with a fake in memory database without changing the code itself, just feeding a different value into the monad.

                    1. 7

                      You don’t need the theory of monads to enable that.

                      1. 1

                        Using a monad to separate concerns and do easy dependency injection is one of many cases where the monad abstraction carries its weight.

                        I agree, you don’t need the theory to do those things with a monad, you just use it.

                        1. 3

                          I can do those things quite easily in languages which don’t even have the concept of “monad.” The abstractions I use might be representable as monads, but I see no benefit to calling them out as such.

                          1. 1

                            Consider Elm. Its designer has banned the word monad from all the documentation. Nevertheless, it has a very consistent and intuitive way of handling all monadic types. How can that be? Because the designer knew they were monads.

                            Most users won’t ever have to declare a monad. They don’t need monad tutorials, or even to know the word, but the world would be a better place if all language designers did.

                      2. 3

                        that’s just an interface, isn’t it?

                        1. 1

                          If you mean an interface as in an API being a pattern of functions and data, then yes. A good interface can make a problem easier to think about and solve.

                          1. 2

                            Or an interface as in literally like a Java interface, i.e. a simple everyday programming concept that doesn’t need any theoretical mathematics to understand.

                            1. 3

                              That’s what I was thinking. JDBC is the ultimate example here. You program only against interfaces, and swapping databases in tests is trivially easy. All without reading complex Monad tutorials.

                              1. 1

                                Interface theory is very complex and mathematical. You never see blog sized tutorials about all the theory because it doesn’t fit in a blog. Monads are stupid simple in comparison, which is why there are so many blogs about it. Get over the abstract nonsense words, implement it yourself in 10 lines, then write a blog about how you finally understood it. That’s all there is to it.

                                Designing interface rules for your language is hard to get right even if you know all the theory, cause there are many tradeoffs and you might make the wrong choice for what you’ll want later on. Getting monads wrong is only possible when you refuse to acknowledge you’re dealing with a monad, like JS’s Promises.

                                1. 2

                                  Interface theory is very complex and mathematical. You never see blog sized tutorials about all the theory because it doesn’t fit in a blog.

                                  I think you never see blogs about it because Java programmers, at least, learn about them at the very beginning and then just use them. They are trivial to understand and use. Java programmers write them every single day and most of them do not have deep type theory backgrounds. I think that is what this thread is about. Pragmatic programmers using something vs. pure functional programmers exchanging complex Monad tutorials.

                                  1. 2

                                    Exactly, you can use monad-like things without ever learning about monads. You’ll have a better time if the language designer did know monad theory though. Same goes for interfaces.

                                    I really don’t want to call monads complex though. That’s what leads to this “monad tutorial fallacy”: it’s always the first mythical hard thing new Haskellers run into, so when it clicks they must write a tutorial. Haskell has other stuff much more complex that never gets blog explanations either. I’d say GADTs are at the level of interfaces, and when a new Haskeller reaches those, suddenly they’re pragmatists again. (And then after a year you make it your master’s thesis and never touch Haskell again lmao.)

                      3. 1

                        Code reuse