Thanks to all the monad tutorials and hype, when I first learned Haskell a few years ago, it took me a few months to accept that I had understood monads at first glance.

All those tutorials were just confusing. I kept thinking “what am I missing? It seems so simple! Why does nobody explain what I’m missing?”. I was missing nothing.

A Monad is simply a wrapper. You have a function that wraps a value (return) and a function (bind) that applies, to the wrapped value, another function that returns a different wrapped value. That’s it.
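To make that concrete, here is a minimal sketch in JavaScript. The `wrap` helper is made up for illustration (it is not any particular library): `wrap` plays the role of return, and `bind` applies a function that must itself return a wrapped value.

```javascript
// A made-up "wrapped value": wrap() plays the role of return,
// and bind() applies a function that returns another wrapped value.
function wrap(value) {
  return {
    value,
    bind(fn) {
      return fn(this.value); // fn must itself return a wrapped value
    },
  };
}

// Chain two wrapped computations: 2 -> 3 -> 6
const result = wrap(2)
  .bind((x) => wrap(x + 1))
  .bind((x) => wrap(x * 2));
// result.value is 6
```

That’s the whole mechanical core; everything else is type discipline around it.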

Thanks for this list. Someone on my team recently read and shared a “Monads are like Burritos” blog post. The post made several dubious analogies. The effect was that people felt like they understood Monads. In fact, they did not. This is worse than not understanding them and feeling like you don’t understand them.

Seriously. People act like the concept is so complicated and it really isn’t. I definitely knew what monads were for a long time before all the elaborate metaphors muddled my understanding. So far I’ve gotten the most mileage by illustrating how to convert procedural code to “math style” single expression functions, and rolling from there.

People correct me if I’m wrong because I’m not a Haskeller. I do keep reading these monad things and getting confused about it. One person told me it was basically just imperative programming in a functional language to make things happen in a certain order. I haven’t had anyone attempt to corroborate or refute that. Maybe just one. Your statement sounds similar.

It also reminds me of SSA form a little bit in how you describe it.

I will corroborate that as a partial explanation. And it should remind you of SSA form, since it helps accomplish similar things. I think that’s a confusing place to start though, because people tend to ask why functional languages don’t just run things in a certain order anyway. And they do: take printNum( readNum() + 1 ); this will clearly read a number before trying to print it.

Instead, assume we have a VM that only understands simple “math style” functions like:

f(x) = expression

Such as:

square(x) = x * x

So yes we could write:

main() = printNum( readNum() + 1 )

But we could not write:

main() = print("multiple"); print("expressions"); print("in a sequence")

That is, the VM does not implement multiple statements in a function; it expects only a single expression. Why not? Ignore that for now. This code will work though:

main() = printStage1()
printStage1() = printStage2(print("multiple"))
printStage2(_) = printStage3(print("expressions"))
printStage3(_) = print("in a sequence")

See how that would work?

That’s tedious as hell to write though. Humor me, and now let’s write it with lambda expressions. Lambda expressions are function values, i.e. these two definitions of main are equivalent:

main() = print("hello world")
main = λ() -> print("hello world")

Notice I’m using main() = to define a named function, but just main = with a lambda. This is just like the following JavaScript:

function main() { print("hello world"); }
main = function() { print("hello world"); }

So we can rewrite our program like so:

main = printStage1
printStage1 = λ() -> printStage2(print("multiple"))
printStage2 = λ(_) -> printStage3(print("expressions"))
printStage3 = λ(_) -> print("in a sequence")

Now let’s inline some things, one step at a time:

main = λ() -> printStage2(print("multiple"))
printStage2 = λ(_) -> printStage3(print("expressions"))
printStage3 = λ(_) -> print("in a sequence")

main = λ() ->
  (
    λ(_) -> printStage3(print("expressions"))
  )( print("multiple") )
printStage3 = λ(_) -> print("in a sequence")

main = λ() ->
  (
    λ(_) ->
      (
        λ(_) -> print("in a sequence")
      )( print("expressions") )
  )( print("multiple") )

We now have a strategy for compiling a sequence of expressions into a single lambda expression that returns the result of the last expression. Notice that I used underscores for the lambda args, because their values are unused by the function. But what if our program uses variable names? These are equivalent:

main() = {
  n = readNum();
  printNum(n);
}

main = λ() ->
  (
    λ(n) -> printNum(n)
  )( readNum() )

Hopefully you can now see that we could write a compiler that converts normal imperative code with variable assignments and so on into an unreadable chain of simple lambdas.
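The same trick works in actual JavaScript, since lambdas are just anonymous functions. Here is a self-contained sketch of the sequencing example above; `print` is faked to record its output, so nothing here is a real I/O API:

```javascript
// Simulate print so the example is self-contained.
const output = [];
const print = (s) => { output.push(s); };

// print("multiple"); print("expressions"); print("in a sequence")
// compiled into a single chain of lambdas, each ignoring its argument:
const main = () =>
  ((_) =>
    ((_) =>
      print("in a sequence")
    )(print("expressions"))
  )(print("multiple"));

main();
// output is ["multiple", "expressions", "in a sequence"]
```

Because each argument is evaluated before the lambda it is passed to, the prints happen in source order even though the whole thing is one expression.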

So why would you want to do that? Well, if this were a language we were implementing, we could add imperative syntax to our language without actually extending the VM. Our language interpreter can pre-process the code into simple lambdas, and then the VM only has to know how to execute simple lambdas. That’s neat.

It also lets us make radical simplifying assumptions, just like SSA. Function calls can be evaluated in any order, as long as all their arguments have been evaluated first, just like SSA. But the way we’ve done it so far makes every line dependent on every previous line. So now let’s split = into = and <-, where = denotes a weakly ordered assignment, and <- denotes a strongly ordered assignment.

main() = {
  a = 7
  b = 2
  x <- readNum()
  y <- readNum()
  _ <- printNum(a / x + b / y)
}

We don’t actually need to create a new lambda until we hit a strong assignment. So this simple lambda code is equivalent:

main = λ() ->
  (
    λ(a, b, x) ->
      (
        λ(y) -> printNum(a / x + b / y)
      )( readNum() )
  )( 7, 2, readNum() )

Okay but what if we screwed up and wrote this:

main() = {
  a = 7
  b = 2
  x = readNum()
  y = readNum()
  printNum(a / x + b / y)
}

Which would compile to this:

main = λ() ->
  (
    λ(a, b, x, y) -> printNum(a / x + b / y)
  )( 7, 2, readNum(), readNum() )

If functions can be evaluated in any order, will x or y be read first? It’s undefined. Either readNum() call could run first. You shouldn’t be allowed to weakly order things like IO; that should be an error. One way to do that is to give readNum a type that insists its return value must be strongly assigned. But readNum should also be ordered with respect to printNum, and anything else that does IO.

So let’s define these functions with a special return type that says they are both IO:

readNum(): IO(Number)
printNum(n: Number): IO()

This IO wrapper type must be strongly assigned, and the result of its assignment will be its inner type. That is, these are basically equivalent:

x: Number = 1
x: Number <- IO(1)

That’s the IO monad.

The strong assignment operator <- is called bind. A monad’s bind function takes a function and returns a new monad that represents the result of applying the function to the monad’s inner value. Using our typed readNum and printNum, our program looks like this:

main() = {
  a = 7
  b = 2
  readNum().bind(
    λ(x) -> readNum().bind(
      λ(y) -> printNum(a / x + b / y)))()
}
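If it helps, that bind chain can be sketched in runnable JavaScript. The `IO` type here is a toy invented for illustration (with faked stdin/stdout), not a real library:

```javascript
// A toy IO monad (made up for illustration, not a real library).
// An IO value wraps a thunk; bind sequences two IO actions.
const IO = (run) => ({
  run,
  bind: (fn) => IO(() => fn(run()).run()),
});

// Fake stdin/stdout so the example is self-contained.
const stdin = [1, 2];
const stdout = [];
const readNum = () => IO(() => stdin.shift());
const printNum = (n) => IO(() => { stdout.push(n); });

// a = 7; b = 2; x <- readNum(); y <- readNum(); printNum(a / x + b / y)
const a = 7;
const b = 2;
const main = readNum().bind(
  (x) => readNum().bind(
    (y) => printNum(a / x + b / y)));

main.run(); // reads 1 then 2, prints 7/1 + 2/2 = 8
```

Nothing actually happens until `main.run()` is called; building the chain just describes the sequence of effects.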

If you’ve ever written Node.js before, you might be thinking: hey, wait a minute, that’s just an IO callback! Honestly, yeah, pretty much. But with strong typing and syntax sugar. We can write this:

x <- readNum()
y <- readNum()
printNum(x + y)

But if we write this, it’s a type error:

x = readNum()
y = readNum()
printNum(x + y)

Why? Because the + operator has the type signature Number + Number, not IO(Number) + IO(Number). Even if + were defined for IO(Number), printNum still takes a Number, not an IO(Number). The type system forces us to bind properly.

So now our VM (or compiler) can treat all assignments as weak, because we’ve implemented strong assignments on top of weak assignments and lambdas. Normally, assuming all assignments are weak by default would make the language hugely error-prone to use. But the IO functions all return an IO monad, so the type system protects us. Contrast with a conventional imperative language, which must assume all assignments are strong unless it can prove otherwise.

Naturally the weak assignment strategy opens up a lot more opportunities for certain kinds of optimization. Haskell still hasn’t taken over the world though, because there are a lot of other opportunities for optimization, like hand-writing vectorized assembly. That kind of thing is harder to do in Haskell than in e.g. C.

That’s one of the ways Haskell uses monads, but IIRC (and I probably don’t RC), that’s driven more by Haskell’s lazy evaluation than a fundamental property of functional languages.

A more general use case of monads is the Maybe monad, which lets you control for nulls. If a function could return an integer or a null, you instead say its return value is Maybe Int. Then the type system can enforce that anything that calls that function handles both the integer case and the “null” case.
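A minimal sketch of that idea in JavaScript, with hypothetical `Just`/`Nothing` values (made up for illustration, not a real library):

```javascript
// A toy Maybe monad: Just(x) wraps a present value,
// Nothing represents the "null" case.
const Just = (value) => ({
  isJust: true,
  value,
  bind: (fn) => fn(value),
});
const Nothing = {
  isJust: false,
  bind: (_fn) => Nothing, // short-circuits: nothing to apply fn to
};

// A function that may fail: parse an integer, or return Nothing.
const parseIntMaybe = (s) =>
  /^-?\d+$/.test(s) ? Just(parseInt(s, 10)) : Nothing;

// Callers must go through bind, so the "null" case can't be forgotten.
const doubled = (s) => parseIntMaybe(s).bind((n) => Just(n * 2));
// doubled("21") is Just with value 42; doubled("oops") is Nothing
```

Because `Nothing.bind` short-circuits, a whole chain of binds collapses to Nothing as soon as any step fails, with no null checks in sight.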

I do keep reading these monad things and getting confused about it.

Monads are mathematical structures. They have nothing to do with computer programming whatsoever. In fact, monads are so ridiculously general compared to most other mathematical objects that there is no way they could serve a concrete purpose by non-mathematician standards. So the first thing monads are not is “something you use”.

Theoretical computer scientists (read: mathematicians) discovered [0] that you can “give a semantics” to a higher-order [1] strict [2] language in which “types” are denoted by “objects in a category C”, and “a computation that produces a value of type B from a value of type A” is denoted by “an arrow A -> TB”, where T is “a strong monad over C”. The tl;dr of all this [3] is that monads are “something intrinsic to the very structure of the programming languages we use”, just like the Earth’s gravitational field is “something intrinsic to the very structure of the world we live in”. You don’t need a mathematical model for it to exist, although it is nice to have one for some purposes.

However, Haskell is a lazy language, so it is intrinsically comonadic rather than monadic. (Again, [3].) And, while Haskellers like their lazy language overall, they also want the sequential structure of their effects to be more like what strict languages have, because lazy effectful computations are insanely hard to reason about. So you could say Haskellers have monads in their standard library because they don’t have it in their core language.

Your distinction between “discovered” and “created” seems to presuppose Platonism, which is a rather contentious issue in the philosophy of mathematics. A Formalist would say that math is indeed created, and that while the implications of a mathematical system may not be known for a long time, math is still a product of a human mind.

I’m by no means a Platonist. I’m just saying monads weren’t invented for this specific purpose. Noticing connections between seemingly unrelated mathematical theories happens all the time.

Yep. This is why I shifted to trying to teach people to understand the concept of data that obeys laws. That’s the real issue: they see “monad laws” and they freeze.

Probably easier for statically typed OOP people to just hear “interface”. A monad is a kind of interface for some generic type, with a constructor and an aggregate/flatMap method where the returned generic is the same type as the input.
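In fact, JavaScript arrays already satisfy that interface: wrapping a value as `[x]` (the made-up `unit` helper below) is the constructor, and the built-in `Array.prototype.flatMap` is the bind/aggregate:

```javascript
// Arrays form a monad: wrapping a value is [x], and flatMap is bind.
const unit = (x) => [x];

// A function from a plain value to a wrapped (array) value:
const divisors = (n) =>
  Array.from({ length: n }, (_, i) => i + 1).filter((d) => n % d === 0);

// bind: apply an array-returning function to each inner value and flatten.
// unit(6).flatMap(divisors) is [1, 2, 3, 6]
// [4, 9].flatMap(divisors)  is [1, 2, 4, 1, 3, 9]
```

The key property the interface demands is that the function you pass to flatMap returns the same generic type (an array), so the result flattens back into one array instead of nesting.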

I think most of the struggle in learning that it’s just a type with a bind and a return is that people worry about teaching the math behind it. This is a bad idea. If you’re going to teach math, teach math, and if you’re going to teach programming, teach programming. You don’t need to understand the abstract backbone to understand that you can use this thing to manage side effects.

This, a million times over. When I learned about monads in university, the concept was simple and intuitive and I recall just “getting it.” A couple years ago I decided to learn Scala (after a decade of using imperative languages), and a lot of the tutorials made a big deal about monads and tried to explain them in the most obtuse manner, using all kinds of computer science terms and references. For a while I questioned whether I had actually ever understood monads.

I’m in the process of learning Haskell myself right now and I feel myself slowly coming to this same conclusion. I think part of the reason they seemed complicated on the surface, for me at least, is due to how heavily they are used in so much Haskell code out there.

Also, because of the abstraction-driven nature of Haskell, I’m always aware that there’s (potentially) a lot happening behind a small operator or function, so as a beginner I assume there’s a lot of magic happening that I just don’t understand yet. Couple that with terms for these concepts that are rooted in mathematics (and thus unfamiliar to your average dev, like myself) and you get a recipe for assuming there’s always more to understand.

I think it really took this video for that to get across to me. This whole time I thought I was still missing something. It turns out I use Monads all the time!

I wish that instead of talking about bind as an operator he had called it the bind function. A lot of people get hung up on the mysticism of operators, not realizing that an operator is just a binary function with a different syntax.

I’m not an audio-visual learner. I have aphantasia and cannot visualize, as well as auditory processing issues which make it impossible to follow most lecturers (unless their voice is exceptionally clear and they talk slowly). I need to read something to understand it.

The learning styles research is, indeed, persistently mischaracterized, and people who focus on it will explain this in detail.

But disabilities exist and create most of the same accessibility needs which are often justified by appeal to learning styles. Academia is, in general but with some noteworthy exceptions, hostile to any discussion of accessibility. So that’s a complicating factor in any discussion like this…

Okay, but disabilities are not what people mean when they are talking about learning styles. I’m sure you also understand the gist of a data set better via a good graph than via a list of numbers, and you can also get much more easily bored or distracted during a lecture, like you just said, particularly if there’s a computer or book in front of you competing for your attention.

I live with someone with disabilities and I understand the need for corresponding accommodation. But the learning styles myth furthers the idea that everyone needs their own special unique kind of accommodation, which isn’t the case. Odds are that if someone is having trouble learning something by a particular method then the majority of students are also struggling with that same method.

Completely agree with you!

See the eightfold path to monad satori from What I Wish I Knew When Learning Haskell:


I have made essentially the same point elsewhere.

Footnotes:

[0] As opposed to “created”, like operating systems, word processors or computer games are.

[1] Having procedures as first-class values, e.g., Lisp, Python, Java, ML, Haskell.

[2] Reducing arguments to a normal form before they are passed to a function, e.g., Lisp, Python, Java, ML, but not Haskell.

[3] Lots of details omitted. It doesn’t really matter what any of this means anyway.

Explaining monads to programmers is like explaining commutativity to accountants.

What exactly is a nontrivial data structure, if not “data that obeys laws”?

“What’s a monad?” is this century’s “What’s a spline?”

He made it the whole video without saying this!

I waited a whole day before posting it!

A monad is an APL primitive that takes a single argument, what’s the problem?

no no, as Leibniz said, monads are OOP

Everyone is. Everyone can get bored during a lecture. Everyone can understand a graph better than a list of numbers.

Learning styles are a pervasive myth.

https://www.wired.com/2015/01/need-know-learning-styles-myth-two-minutes/

I’m not sure what you mean. I do, in fact, support making material available in multiple formats. It’s a best practice for accessibility, in fact.