I don’t find Monad itself to be scary, but perhaps that’s because I’ve gone through the process of mucking about with various instances and getting a sense of the commonalities.

When I teach Haskell, I always teach IO first without reference to Monad, because I think that IO (and the explicit-state State s) gives us a sense of why we care about this otherwise abstruse type class. Another good example is Async, because we intuitively understand the process of “when it’s done, hand it off” that is captured by andThen :: Async a -> (a -> Async b) -> Async b which, of course, is (>>=) for that class.

My experience is that it takes about 3 fairly similar examples before people “get” Monad, but it’s important to let people know that they can use IO before they understand Monad.

I run the local Haskell user group in my city, and I’ve spent a lot of time there, at work, and in my personal life trying to help people come to grips with haskell and pure FP in general, and this presentation is something that people should really take to heart. The number of people who come to meetups and talk to me about how hard of a time they are having getting over monads is enough that some of us have even started referring to a persons general level of familiarity with the language as whether or not that they’ve crossed the monad gap (and I think that moving on and focusing on other things successfully counts as crossing that gap).

To take a more broad look at the problem though, I think that a lot of people have trouble with the mathematical language that tends to be used when talking about haskell. Much like educated people used to use Latin, the language of category theory and abstract algebra permeates discussions of haskell and at times feels like it’s being used entirely to keep people out of the loop. I know that a lot of people involved in haskell are academic- either actively in academia or else having been steeped in the culture of academia they still communicate as though they are writing papers. It just starts to feel like a club sometimes. I had no background in higher level math when I started learning haskell- and I eventually managed to learn and start to grok category theory and abstract algebra through sheer determination, force of will, and the generous helping of free time that comes from being early in your career, single, and introverted.

I think haskell is a pretty good language. It’s not perfect by any means, but it’s in the list of languages that exist right now that I’d prefer to work with for most things, and it’s a much harder sell when so few people are comfortable working in it. I think as a community we’d do well to try to make the language more accessible. The underlying math is interesting and important, but not for everyone, not all the time, and I think it needs to be less front-loaded into what people are told they need to learn.

If by underlying math you mean category theory, then I disagree. I never use category theory in my research, it is rarely used in the research results I feel are important, and I don’t think in terms of category theory when programming in Haskell. Even monads, the shining star of applying category theory to PL, I feel are better understood in terms of their operational analogue to delimited continuations.

I happen to prefer ML to Haskell, and agree that the importance of category theory in everyday programming has been oversold. However, to be fair, there are some virtues to thinking denotationally (monads) rather than operationally (continuations). For example, the virtues of call-by-push-value, were immediately clear to me when I saw its categorical semantics: An adjunction between the categories of value and computation types, where the former admits nice universal constructions (direct sums, tensor products) and the latter admits nice couniversal constructions (direct products). The adjunction itself gives you function types. Considering just the category of value types (and the monad on it) gives you (a denotational semantics for) call-by-value. Dually, considering just the category of computation types (and the comonad on it) gives you (a denotational semantics for) call-by-name. I have no idea how I would have even begun to understand this from a purely operational point of view.

I have no idea how I would have even begun to understand this from a purely operational point of view.

If you are unable to explain the significance of this result in operational terms then I have to doubt its relevance to PL. Ultimately we work in the domain of machines; a denotational semantics is interesting to us as computer scientists when we can apply results stated in terms of other mathematical domains to our domain. I see a denotational semantics that gives insight into category theory as primarily interesting to category theorists.

An example of denotational semantics I find interesting would be the recentworkonsemanticsubtyping. Normally subtyping is presented axiomatically as a series of inference rules. In what sense are these axioms “complete”? For example, this paper presents a subtyping relation, but you can’t prove ∀a.() → a ≤ () → ∀a.a in it. Are we justified in adding the distributivity axiom, and are there other axioms that we should add? By interpreting types as their obvious set-theoretic counterparts we can answer these questions definitively.

Of course, ultimately a programming language has to be implementable on a machine, so it must have an operational semantics. However, I prefer to regard the operational semantics as the final product of the language design process. The starting point is elsewhere.

A programming language that is useful to human beings can’t be defined by arbitrary rules for pushing symbols, otherwise we end up with monstrosities like null references, template specialization, superclass linearization, etc., whose sole justification is “well, it works that way”. In a sensibly designed language, the symbols mean something to us, and it’s this meaning that justifies the use of this language over that other one. So the language designer must have in mind some collection of intended denotations, and use that as a yardstick for evaluating his or her work.

However, I prefer to regard the operational semantics as the final product of the language design process. The starting point is elsewhere. […] In a sensibly designed language, the symbols mean something to us, and it’s this meaning that justifies the use of this language over that other one.

Correct me if I am misunderstanding you, but it sounds to me like if you take this argument to its logical conclusion, you are arguing for language design based on mathematical aesthetics. I disagree with this perspective. The value of a denotation is derived from its ability to model desirable operational phenomena, not the other way around. I advocate functional programming because its easier to reason about the operational behavior of programs when there are no “hidden variables”, to borrow a physics term. If mathematical functions didn’t capture the essence of this operational requirement, then I’d pick something else.

To pick one of your examples, null is bad because it is a member of every type whether you want it to be or not, which limits the ability of the programmer to constrain the operational behavior of their program through the type system, which leads to bugs. Or in other words, null is evidence that your type system is insufficiently expressive. Whether or not some categorical construction naturally accounts for null is irrelevant to the badness of null. On the contrary, if that construction leads one to advocate for null, then that is evidence that that construction would be a poor foundation for a programming language.

Let’s connect this with your CBPV example earlier.

An adjunction between the categories of value and computation types, where the former admits nice universal constructions (direct sums, tensor products) and the latter admits nice couniversal constructions (direct products). The adjunction itself gives you function types. Considering just the category of value types (and the monad on it) gives you (a denotational semantics for) call-by-value. Dually, considering just the category of computation types (and the comonad on it) gives you (a denotational semantics for) call-by-name.

Without establishing the operational consequences this statement has, it is meaningless. Maybe I could model a competitor to CBPV with semisimple rings and Hilbert spaces. How would you compare and contrast these evaluation strategies without an appeal to operational semantics?

whose sole justification is “well, it works that way”

One way to justify an operational semantics is by connecting it with an intuitive denotation. But that isn’t the only way. Another way is by directly stating the mathematical properties you want the operational semantics to have. You can justify your choice of call-by-value evaluation by proving that the cost semantics for your language is compositional. You can justify the type system you chose by proving that the operational semantics does not admit undefined behavior (the classic “progress + preservation” theorem). Etc.

Correct me if I am misunderstanding you, but it sounds to me like if you take this argument to its logical conclusion, you are arguing for language design based on mathematical aesthetics.

You’re not wrong.

The value of a denotation is derived from its ability to model desirable operational phenomena, not the other way around.

Sure. I didn’t mean to suggest that I begin with a fully formed denotational semantics and use that to come up with the operational semantics of a new language. Rather, I think denotational considerations provide useful guidelines for language design. More on this later.

Without establishing the operational consequences this statement has, it is meaningless.

The operational consequence is that call-by-push-value is very finicky about the distinction between a value and a computation that produces a value, and dually, between a computation and a thunk that, when forced, produces this computation. This gives the programmer maximal control over when computation happens, as opposed to either call-by-value or call-by-name, which come with implicit assumptions about when computation happens. CBV and CBN can be embedded into CBPV by explicitly using these assumptions in the appropriate places, and, furthermore, CBPV’s type system will tell you what these appropriate places are.

In retrospect, that probably wasn’t so hard to understand coming from an operational approach. But for some reason I found it easier to come at it from a denotational approach first.

Another way is by directly stating the mathematical properties you want the operational semantics to have.

In a previous conversation with you, I mentioned that I prefer languages that don’t gratuitously provide type formers that break isomorphism invariance. (In fact, lately I have entertained the idea that a good definition of “effect” is “anything that breaks some equational law”.) How would I state the property that a type former is isomorphism-invariant, if not first in categorical language?

In a previous conversation with you, I mentioned that I prefer languages that don’t gratuitously provide type formers that break isomorphism invariance. (In fact, lately I have entertained the idea that a good definition of “effect” is “anything that breaks some equational law”.) How would I state the property that a type former is isomorphism-invariant, if not first in categorical language?

If I’m understanding what you’re saying, then it’d look something like T : ⋆ -> ⋆ is an isomorphism-invariant type former if for all types A and B, A ~ B implies T A ~ T B. There are a couple notions of type isomorphism that aren’t quite equivalent, but for the sake of completeness you could define A ~ B as there exist two terms f : A -> B and g : B -> A such that f . g ≡ id and g . f ≡ id where ≡ is contextual equivalence at the appropriate type.

If that sounds overly categorical it is because the property you desire relies essentially on categorical concepts. The question then is why you chose this to be the property that you want to hold. It isn’t immediately obvious to me that this property implies that GADTs are well behaved, nor is it immediately obvious to me that references don’t satisfy this property (although I believe it because references are tricky). As in the previous discussion the property I would have stated is that the language should not include a mechanism for deciding type disjointness. One can justify this in a couple of ways: first, disjointness is not closed under abstraction, so for GADTs to work in full generality one would need to add disjointness predicates, similar to how Gaster-Jones style record systems allow abstraction over disjoint name predicates.

Second, unlike names, there is no obvious “base case” for disjointness. If you’re working in a lambda calculus, it may be that all types ultimately are built from -> constructors (or at least the semantics provides the illusion of this for clarity/ease of formal manipulation). For this concept to work in full generality mandates that you have a mechanism for “hard” type generativity a-la Haskell newtype, as opposed to what I would call “soft” type generativity that essentially relies on parametricity a-la ML. And I consider newtype to be a hack that forces programmers to endlessly litter any code they wish to hide implementation details of with operationally insignificant injections and projections, so I consider any features that suggest the addition of newtype suspect.

You can use this same reasoning to argue against Racket-style intersections and unions, although I would argue against these more directly by appealing to parametricity.

If you were able to immediately deduce all of the implications I stated above from that isomorphism invariance property, then maybe I should give category theory another shot.

There are a couple notions of type isomorphism that aren’t quite equivalent, but for the sake of completeness you could define A ~ B as there exist two terms f : A -> B and g : B -> A such that f . g ≡ id and g . f ≡ id where ≡ is contextual equivalence at the appropriate type.

In my mind, the right notion of equivalence between two abstract types is that no client program can possibly distinguish between them. That is, if a client program that uses one abstract type is consistently modified to use an equivalent one, then the behavior of the client program remains unchanged. I will call this property “weak equivalence”.

Unfortunately, weak equivalence is difficult to establish in the general case. To simplify the task, I pay attention to the special case when a weak equivalence of abstract types is witnessed by computable mappings (inside the programming language) back and forth between their internal representations, such that “transporting” a value from either abstract type to the other (using the appropriate mapping) doesn’t change the observations that can be made from it. It follows that composing the mappings in either order gives a contextual equivalence at the appropriate abstract type. I will call this property “strong equivalence”.

Since strong equivalence implies weak equivalence and is so much easier to establish, I want to milk strong equivalences for all they’re worth. 90% of the benefit for 10% of the cost is a truly awesome deal.

If that sounds overly categorical it is because the property you desire relies essentially on categorical concepts. The question then is why you chose this to be the property that you want to hold.

When I implement an abstract type, I often want to consider several possible representations, even if in the end I will only use one, because different operations of an abstract type might be easier to implement using different representations. Using strong equivalences, I can “patch together” several incomplete abstract type implementations into a single complete (if crude) one, which can then be optimized by rewriting it in meaning-preserving ways.

It isn’t immediately obvious to me that this property implies that GADTs are well behaved, nor is it immediately obvious to me that references don’t satisfy this property (although I believe it because references are tricky). As in the previous discussion the property I would have stated is that the language should not include a mechanism for deciding type disjointness.

Right, in our previous conversation, you convinced me that the real problem is the the ability to answer type disjointness questions, not GADTs themselves. Alas, the only two implementations of GADTs I’ve seen so far, namely, Haskell’s and OCaml’s, both rely on the ability to answer type disjointness questions (with something other than “dunno”).

One can justify this in a couple of ways: first, disjointness is not closed under abstraction, so for GADTs to work in full generality one would need to add disjointness predicates, similar to how Gaster-Jones style record systems allow abstraction over disjoint name predicates.

This objection isn’t strong enough. You could say “if you need disjointness predicates, well, add them!”

Second, unlike names, there is no obvious “base case” for disjointness. If you’re working in a lambda calculus, it may be that all types ultimately are built from -> constructors (or at least the semantics provides the illusion of this for clarity/ease of formal manipulation). For this concept to work in full generality mandates that you have a mechanism for “hard” type generativity a-la Haskell newtype, as opposed to what I would call “soft” type generativity that essentially relies on parametricity a-la ML.

Yes, this gets to the heart of my objection. If we take isomorphism-invariance seriously, it follows that, ignoring efficiency considerations, the language’s semantics shouldn’t distinguish between “hard” and “soft” generativity in the first place. Isomorphic means equal modulo injections and projections in the right places. If we can make different observations from types meant to be isomorphic, something went wrong.

If you were able to immediately deduce all of the implications I stated above from that isomorphism invariance property, then maybe I should give category theory another shot.

Unfortunately, being neither a computer scientist nor a mathematician, just a layman programmer, I lack the necessary sophistication to make a solid enough case to convince you.

OK, I better understand what you are getting at. I agree that my definition of isomorphism-invariance isn’t what you want. I think the stumbling block for me is that I understand isomorphism in terms of the abstract algebra definition of bijection + preserving structure and in this case didn’t understand what structure you were trying to preserve–the definition I gave above is more-or-less the direct transliteration of (my understanding of) the category theoretic definition of isomorphism.

My new understanding of this is that you want the type system to be “monotonic.” If you’re unfamiliar with this term, it means that if there is some subtyping/subkinding relation, Γ ⊢ e : t, and Γ' ≤ Γ implies Γ' ⊢ e : t' and t' ≤ t (where subtyping/subkinding is extended pointwise to contexts). If monotonicity doesn’t hold, then it makes the notion of backwards compatibility for APIs difficult to nail down. Haskell’s core language isn’t monotonic for a number of reasons, and there isn’t a formal subtyping relation defined on modules to begin with (I’m not even sure it’s possible to do at module granularity because of orphan type classes).

Generally we want a type definition to be in a subkinding relation with an abstract type. That, is if the language has singleton kinds, S(t : k) ≤ k. It is impossible to consider Haskell-style data and newtype in this way without breaking monotonicity–coverage checking for GADTs, type family pattern matching, and type class canonicity all crucially rely on the type system being non-monotonic. (One has to be careful of the statement of this fact because newtype also allows recursive types & wrapping impredicative types. These have structural equivalents, so we could say something like S(StructuralEquivalent(t) : k) ≤ k)

Unfortunately, being neither a computer scientist nor a mathematician, just a layman programmer, I lack the necessary sophistication to make a solid enough case to convince you.

FWIW I am enjoying our conversation, so I’m sorry if it comes across as otherwise.

My new understanding of this is that you want the type system to be “monotonic.” If you’re unfamiliar with this term, it means that if there is some subtyping/subkinding relation, Γ ⊢ e : t, and Γ' ≤ Γ implies Γ' ⊢ e : t' and t' ≤ t (where subtyping/subkinding is extended pointwise to contexts).

Monotonic as in “monotonic logic”? Aha, yes! I start with two types from which I can make equivalent observations. Then I define a bunch of GADTs, type families, class instances, whatever, and, voilà, I can no longer make equivalent observations from those types.

If monotonicity doesn’t hold, then it makes the notion of backwards compatibility for APIs difficult to nail down.

As a layman programmer, what I observe is that it hurts modularity. Adding a new definition “here” (without syntactically altering existing ones) destroys a property established and used “there”. It is difficult to understand for me that Haskellers of all people would be fine with this situation.

FWIW I am enjoying our conversation, so I’m sorry if it comes across as otherwise.

The value of a denotation is derived from its ability to model desirable operational phenomena, not the other way around.

Are you saying the requirements of programming are operational? That’s not my interpretation. People don’t run programs to make machines do things, they run programs to answer their questions (the original case was decrypting encrypted messages AIUI) - even in a case where a program is automating, like, a warehouse or something, I usually find it more helpful to think of this as the operator (or machine) asking “what should I do now?” I mean if we’re talking about mathematics, the original use cases were things like land surveying, which is operational in a sense (and in the same sense that most programming is, as I would see it). I think mathematicians have been solving similar problems to those that programmers solve, for longer, and so mathematics should inform programming language design.

Another way is by directly stating the mathematical properties you want the operational semantics to have. You can justify your choice of call-by-value evaluation by proving that the cost semantics for your language is compositional.

Isn’t that still a mathematical style of modelling - the kind of thing that category theory would be useful for?

I think there is some confusion wrt the terminology I am using. By “operational” I am referring to operational semantics, a way of formalizing a programming language by describing how programs in the language are executed. The alternative to an operational semantics is a denotational semantics, which formalizes a programming language by associating its syntax with some other mathematical construction, like a set, or a function, or a category. Both are important, we are just disagreeing about which “comes first” or “has primacy” (not really sure how to put it, I’m sleepy)

It’s been quite the opposite historically. I’m not sure where Lambda Calculus fits in denotational vs operational. The Turing Machine & Von Neumann Architecture appear more like operational semantics. Computers were designed to be equivalent to them. Programming languages were as well with them gradually becoming more high-level. Most of what we have today was built with experimental iteration that didn’t involve formal methods or even formal specs. Most programming is similarly “perform these steps to solve the goal.”

Operational semantics has ruled the day in both languages and computers since they began. Possibly why the early proof efforts on programs used operational semantics.

People have been at least attempting both approaches for most of programming history I think. The Church-Turing equivalence was surprising to some at the time partly because it showed that the more “mechanical” way of thinking of computation and the more “mathematical” way were equivalent, while it wasn’t entirely obvious beforehand that this would be true.

In design of “real” PLs, I think the 1950s projects of Algol vs. Lisp present some of the same flavor. By 21st century standards 1950s Lisp is not all that high-level and abstract (I mean, car and cdr are named after register-twiddling), but compared to what was going on at the time, the Lisp people at least saw themselves as trying to go the other direction. Rather than starting bottom-up from hardware and abstracting some structure out of that gradually, start top-down from something more like mathematical functions, and find a way to implement them on the computer so that the programmer is presented with the abstract computational model regardless of what happens at the machine-code level. Hence the assumption of a bunch of things that were nice in the abstract model, but didn’t actually exist concretely yet, like garbage collection, higher-order functions, etc. (most of which wouldn’t exist in efficient implementations until decades later).

Ironically, Dijkstra complained that the definition of LISP (when its name was still in all caps) was too operational and tied to its implementation details. And he invented predicate transformer semantics, which allows you to ask (and sometimes answer) whether two operationally different, imperative, nondeterministic programs denote the same thing (relate the same precondition to the same postcondition). So it isn’t completely fair to suggest that imperative languages necessarily beget mechanistic reasoning.

I’m not referring to category theory per-se, although the sort of category-ish language is part of it, along with the type theory. More than those specific things though I’m really referring to the general approach to thinking and communicating, and the sort of math-adjacent pseudo-formalism that the Haskell community is really fond of, and the mathematical leaning theoretical cs that people often run across in papers that still form most of the documentation for some libraries.

For me personally, the language used around Haskell motivated me to study the mathematics, and in turn to realize that a lot of the things in Haskell are named by analogy to those concepts than strict formal adherence to them. The useful bit I think came from the way of thinking I started to develop and how I got a better intuition about going between math and code. Still, I don’t think making that leap is necessary to use the language and is turning a lot of people away.

A big part of the problem is that Haskell, or rather, Haskellers make mutually inconsistent promises. For example:

The types can tell you everything that’s going on, especially in plumbing libraries like lens, conduit, you name it.

You don’t need to learn any math to understand Haskell.

You can only pick one, really. You need a certain degree of mathematical sophistication to understand Haskell’s type system, e.g. how parametricity constrains the possible inhabitants of a type, how type class laws make sense in a generic way, and aren’t just the result of someone’s whims. Contrapositively, if Haskell libraries are meant to be used by people with no mathematical inclination, they need to be documented without an implicit assumption that types will tell you everything that’s going on, and equational laws should play a less prominent rôle in defining commonly used abstractions.

I don’t think they’re supposed to apply simultaneously. It’s more like:

(To get started) You don’t need to learn any math to understand Haskell.

(Eventually) The types can tell you everything that’s going on, especially in plumbing libraries like lens, conduit, you name it.

Which means that there’s a (well identified) gap as someone learns where their ability to read types is not developed enough to handle “advanced” code. That doesn’t mean they can’t still write it, but it is a steep part of the learning curve and could use some better documentation.

Sadly, this won’t do. Would you use libraries in your own work that contain unlawful instances for standard type classes like Applicative and Monad? Probably not, right? (At least I know I wouldn’t.) Well, that’s the kind of code that someone without mathematical inclination would write. And, for them, this would be fine so long as their programs work.

I’m always confused seeing the submissions talking about how important category theory is to the working programmer. I collect cutting-edge work in all major sub-fields of IT plus quite a bit on type systems, formal verification, PL research, and so on. The latter being stuff I pass onto others with only most abstract understanding on my part. Almost all the major stuff I have in the latter categories is done in Coq, HOL’s, custom logics derived from things like ZF/FOL, and so on. I’ve never seen category theory in any groundbreaking work programmers could actually use. Even most ordinary advances came from just using creativity and reasoning within ordinary programming languages like C, Java, etc. Java being number one in quantity of clever stuff published in it. Many are using Python now for stuff based on ad hoc methods.

I had to go out of my way to find a significant work, a compiler, done with category theory. It wasn’t that good compared to really old stuff done with other methods. I say category theory isn’t important 99+% of the time plus hasn’t brought us much at all. I’d say set theory, propositional logic, FOL, automatic solvers (esp SAT/SMT), and LCF-style provers have collectively gotten us the biggest gains justifying continued investment that continues to pay off (rinse repeat). Haskell is best off understood without any of that stuff if people just want to get stuff done. Even those doing formal proofs should ignore category theory in favor of ASM’s, Isabelle/HOL, etc. As you suggested, it’s likely a social, clique effect keeping it so prominent in discussions despite being so unimportant in the big picture.

I think that there is a mode of thinking that’s more of an analogy to category theory than actually being category theory, and it can be helpful in providing a mental framework for gaining an intuition of certain types of abstractions. Beyond that I agree its usefulness has been overstated.

That said, I think there are a lot of benefits to thinking of api design algebraicly and Haskell has done a lot to facilitate that.

I think the 99% figure might be underestimating the importance of category.theory, though maybe not by much- but in my own experience it’s always served more as a framework for brainstorming and I move on to other techniques to actually get stuff done.

“ and it can be helpful in providing a mental framework for gaining an intuition of certain types of abstractions.”

I won’t argue that. I lack the experience for it. Haskellers sure like it.

“I think the 99% figure might be underestimating the importance of category.theory”

Over 10 years and 12,000 papers on CompSci in languages, engineering, formal verification, and so on before I saw category theory mentioned… on Hacker News and then on Lobste.rs. I might be underestimating its importance but it can’t be that high outside the Haskell community.

Edit to add: my papers focused on significant achievements, how they were built/specified, and so on. There could be low-value or incidental work going on in category theory all the time. Just never made enough headway for me to see it in places showing high-impact tech.

I think a lot of things have been discovered under different names and then Category Theory swoops in and takes credit for those accomplishments since it’s a functional generalization of it. That said, most of that generality is pretty hard to put to practice… but it does all hang together and thus form a really pleasing way of seeing these concepts together.

I mean, I’d probably argue that any time you solve a problem by emphasizing functions then you’re practicing Category Theory in that one of its most key insights is to organize how focusing on morphisms between similar structures has been a powerful force in abstract algebra.

Without Category Theory we’d be in the same place we are now with less pretty language for talking about it.

I find problematic this attitude where whoever comes up with the general method for solving a large class of problems gets all the credit, and those who solved particular cases before get ignored. Good abstractions don’t arise in a vacuum: they are the result of studying lots of good motivating examples.

Incidentally, this is also why type classes are problematic: they assume that you can come up with the right abstractions (classes) and only then define models (instances) of them.

To be clear, I’m not exactly fond of this tendency either.

But on the other hand, generalization—the method of cutting away the inessential—is hugely vital to the development of theories in mathematics and computer science. I can’t overstate that. It’s not something to shake a stick at.

Category Theory almost gets gypped here since what it eventually generalized was so heavily used in mathematics since Bourbaki that the act of (formal) generalization gave almost no power—those roads were well-trodden. It just became a lingua franca for those who want to learn it.

All that said, I don’t have a clue why you’re taking aim at typeclasses. As far as I’m concerned they assume no such thing. A user of typeclasses certainly could… but it’s no more inherent in the feature than any other abstraction method.

But on the other hand, generalization—the method of cutting away the inessential—is hugely vital to the development of theories in mathematics and computer science. I can’t overstate that. It’s not something to shake a stick at.

Yes, and category theory hasn’t invented generalization. (Though it probably has refined it.) There is ample historical evidence of both mathematicians and programmers without exposure to category theory, who have come up with nice generalizations.

All that said, I don’t have a clue why you’re taking aim at typeclasses.

Type classes make it unnecessarily difficult to redesign existing abstraction hierarchies in view of newly acquired information. For example, when Haskell made Applicative a superclass of Monad, library authors who had defined Monad but not Applicative instances had to go out of their way to implement those instances. And, even today, the fail method hasn’t been removed from Monad, even though there are no good reasons for it to be there.

ML modules are more flexible in this regard. They allow signatures and structures to be defined and evolved independently, and matched only when the need arises. IMO, this matches more closely the process by which algebraic structures and their models are discovered.

That’s pretty much my position. Category Theory is a neat-looking formalism that shows up after all kinds of others, even informal methods, do the real work.

Re Functions. We got from the Lambda Calculus far as I can guess. Reinvented informally as interchangeable subroutines on punch cards with argument passing done manually. Then Fortran and LISP showed up. I don’t know if LC came from category theory in any way. If not, then functions are another example of after-the-fact, pretty-it-up effect you described.

I’d say we got them from math itself well before LC or subroutines :)

LC didn’t come from CT—it was invented around 15 years later. Its fundamental goal was to formalize the intuitive idea mathematicians had long had of certain transformation/arrows/functions being “natural” and it succeeded in abstracting this principle into a very general framework for talking about algebraic theories.

It’s absolutely just a matter of cleaning up language, but those cleaning up stages often turn into big wins. In fact, it was a principle tool of the French school of Algebraic Geometry led by Grothendieck (though I know nothing about AG so I can’t say anything more).

Additionally, now it’s a driving tool in certain schools of type theory so there’s a chance it will play a part in new developments there as they arise.

All that said, I don’t fundamentally disagree with your analysis. Category Theory gets its name as “Abstract Nonsense” largely because its discovery didn’t say anything anyone didn’t already know—it just said it in weird new words that someone else was promising you were really coherent.

Much like educated people used to use Latin, the language of category theory and abstract algebra permeates discussions of haskell and at times feels like it’s being used entirely to keep people out of the loop.

Nice pun with loop. In all seriousness, though… I’ve taken abstract algebra up to a graduate level and a lot of the Haskell terminology was new to me as well. Most of abstract algebra is groups, rings and fields and around specific applications (e.g. number fields, linear algebra, functional analysis). You learn that loops and monoids exist, but you don’t do much with them. Category theory is on the fringe even by pure math standards (I didn’t even encounter it until my late 20s).

Yeah, Haskegorical theory is a thing of its own. I learned category theory too in math school, but the real interesting applications for me were completely different from Haskell’s, like the Galois functor or homological algebra. I seldom see familiar stuff like universal mapping properties or snake lemmas in Haskegorical theory.

That’s because Haskellers only work with the moral analogue of the category of sets, tops Kleisli categories of Set-monads, and, even then, only by wishing bottom (and sometimes laziness altogether) out of existence. I bet the categorical techniques used in homological algebra (which is an actual use case of category theory for things you couldn’t reasonably do without it) would look just as alien and confusing to the average Haskeller as they would to any other programmer. Probably even an Abelian category would look weird: “I didn’t know the initial and final objects could be the same!”

In a more mathematical setting, I think this comment mirrors similar concerns to this.

I really disagree with the “you don’t need to understand monads” theme found here and in other Haskell blog posts; they all seem to be written by people that already understand monads and don’t get confused when the examples in documentation for popular libraries are littered with calls to liftM and >=>.

Sidenote, the documentation for both of those functions is:

I don’t think the article/presentation’s thesis is “you don’t need to understand monads” so much as it is “the best way to understand monads is by example, and by seeing the patterns between types that implement the Monad typeclass”.

Using things you don’t know in order to teach you something you don’t know

That’s a perfect description! Many have taken stabs at fixing this in Haskell, but there’s zero doubt that the majority of resources out there today are pedagogically aimed at someone who is coming to Haskell from, say, SML. No, probably they’d also have to be engaged in the academic community surrounding SML, too.

Just call them “IO action” or “IO type” … But we call them IO monads and cause this chaos.

Not only is this pedagogical chaos, it’s linguistically just awkward. Using “monad” as a noun is at best a very specialized phrase and generally just invalid. Experts do it because it’s convenient shorthand—convenient shorthand needs to explicitly described to someone who isn’t already in the know!

“Use the IO type.”

“The IO type is the type of imperative actions done in sequence—just like every language you’ve ever worked in.”

This is nothing this slideshow doesn’t already say, but god does it ever need reinforcement. If you use the phrase “the _____ monad” to anyone not already super familiar with monads as a concept then you should feel the need for some good old self-flagellation style penance.

Never attempt to consciously learn what monad is.

Fantastic. Follow this rule until you’ve reached a point in your learning that you feel the incurable need to break it. Then, if you feel pain, start following the rule again. When the time is right for you to understand monads they’ll be obvious.

I hope one day Haskell community will realize that while better than the previous solution it is still a bad solution. This is the usual set of bad examples that make Haskell IO looks sensible until someone starts working on a real program (with user output, configuration files, http requests, database requests, logging) and discover now they are dealing with transforms and lifting and want to tear their hair out.

Hmm, do you go over that in your book? I’ll have to finally purchase and dig in :)

I continually end up on haskell for writing certain types of apps (like cli tools or simple websites) because of the available libraries and stack – Ocaml just doesn’t have the quality of libraries for stuff like AWS or as good a build tool as stack and Erlang isn’t a fit for a lot of that work. But become frustrated with transforms and start dreaming of a haskell like language that isn’t pure :)

Hmm, do you go over that in your book? I’ll have to finally purchase and dig in :)

The book isn’t modular, you’d need to do at least half the book to be able to digest the monad transformers chapter. We only cover part of the point I made here, but it shouldn’t be hard to infer how to avoid making this mistake.

Lose purity and you lose almost everything that made it distinctly worthwhile. Even laziness (which is important) is optional compared to that.

Lose purity and you lose almost everything that made it distinctly worthwhile. Even laziness (which is important) is optional compared to that.

For some, sure. I just want a staticly typed functional language that compiles to a native binary and has good development tooling :). Haskell and ocaml each fit in there but each have their own distinct issues that can frustrate me. And then Rust, would like a function language with GC that can interop with Rust as an option.

I could nearly do without optional GC in Rust if it was a touch more functional, as it is, it inhabits roughly the same space as C++ for me. I don’t mind though, at least gives me an out if something doesn’t require a tonne of C++ middleware.

Every programming language has something like monads. In Prolog, it’s difference lists or definite clause grammars or metainterpreters; for Java, it’s reflection and generic methods and fancier generics; for Lisp, it’s fancy macros. It’s part of the fun.

I think other languages do a better job of not telling the beginners about these things on day one, but otherwise the situation isn’t that different. You put a high value on understanding something esoteric and tell the beginners about it and how hard it is, you shouldn’t be too surprised by seeing a bunch of blog posts about how easy and great it is. “Monads” are not a real problem.

I don’t find

`Monad`

itself to be scary, but perhaps that’s because I’ve gone through the process of mucking about with various instances and getting a sense of the commonalities.When I teach Haskell, I always teach

`IO`

first without reference to`Monad`

, because I think that`IO`

(and the explicit-state`State s`

) gives us a sense of why we care about this otherwise abstruse type class. Another good example is`Async`

, because we intuitively understand the process of “when it’s done, hand it off” that is captured by`andThen :: Async a -> (a -> Async b) -> Async b`

which, of course, is`(>>=)`

for that class.My experience is that it takes about 3 fairly similar examples before people “get”

`Monad`

, but it’s important to let people know that they can use`IO`

before they understand`Monad`

.I run the local Haskell user group in my city, and I’ve spent a lot of time there, at work, and in my personal life trying to help people come to grips with haskell and pure FP in general, and this presentation is something that people should really take to heart. The number of people who come to meetups and talk to me about how hard of a time they are having getting over monads is enough that some of us have even started referring to a persons general level of familiarity with the language as whether or not that they’ve crossed the monad gap (and I think that moving on and focusing on other things successfully counts as crossing that gap).

To take a more broad look at the problem though, I think that a lot of people have trouble with the mathematical language that tends to be used when talking about haskell. Much like educated people used to use Latin, the language of category theory and abstract algebra permeates discussions of haskell and at times feels like it’s being used entirely to keep people out of the loop. I know that a lot of people involved in haskell are academic- either actively in academia or else having been steeped in the culture of academia they still communicate as though they are writing papers. It just starts to feel like a club sometimes. I had no background in higher level math when I started learning haskell- and I eventually managed to learn and start to grok category theory and abstract algebra through sheer determination, force of will, and the generous helping of free time that comes from being early in your career, single, and introverted.

I think haskell is a pretty good language. It’s not perfect by any means, but it’s in the list of languages that exist right now that I’d prefer to work with for most things, and it’s a much harder sell when so few people are comfortable working in it. I think as a community we’d do well to try to make the language more accessible. The underlying math is interesting and important, but not for everyone, not all the time, and I think it needs to be less front-loaded into what people are told they need to learn.

If by underlying math you mean category theory, then I disagree. I never use category theory in my research, it is rarely used in the research results I feel are important, and I don’t think in terms of category theory when programming in Haskell. Even monads, the shining star of applying category theory to PL, I feel are better understood in terms of their operational analogue to delimited continuations.

The OCaml community is much more sane about this.

I happen to prefer ML to Haskell, and agree that the importance of category theory in everyday programming has been oversold. However, to be fair, there are some virtues to thinking denotationally (monads) rather than operationally (continuations). For example, the virtues of call-by-push-value, were immediately clear to me when I saw its categorical semantics: An adjunction between the categories of value and computation types, where the former admits nice universal constructions (direct sums, tensor products) and the latter admits nice couniversal constructions (direct products). The adjunction itself gives you function types. Considering just the category of value types (and the monad on it) gives you (a denotational semantics for) call-by-value. Dually, considering just the category of computation types (and the comonad on it) gives you (a denotational semantics for) call-by-name. I have no idea how I would have even begun to understand this from a purely operational point of view.

If you are unable to explain the significance of this result in operational terms then I have to doubt its relevance to PL. Ultimately we work in the domain of machines; a denotational semantics is interesting to us as computer scientists when we can apply results stated in terms of other mathematical domains to our domain. I see a denotational semantics that gives insight into category theory as primarily interesting to category theorists.

An example of denotational semantics I find interesting would be the recent work on semantic subtyping. Normally subtyping is presented axiomatically as a series of inference rules. In what sense are these axioms “complete”? For example, this paper presents a subtyping relation, but you can’t prove

`∀a.() → a ≤ () → ∀a.a`

in it. Are we justified in adding the distributivity axiom, and are there other axioms that we should add? By interpreting types as their obvious set-theoretic counterparts we can answer these questions definitively.Of course, ultimately a programming language has to be implementable on a machine, so it must have an operational semantics. However, I prefer to regard the operational semantics as the final product of the language design process. The starting point is elsewhere.

A programming language that is useful to human beings can’t be defined by arbitrary rules for pushing symbols, otherwise we end up with monstrosities like null references, template specialization, superclass linearization, etc., whose sole justification is “well, it works that way”. In a sensibly designed language, the symbols

meansomething to us, and it’s this meaning that justifies the use of this language over that other one. So the language designer must have in mind some collection of intended denotations, and use that as a yardstick for evaluating his or her work.Correct me if I am misunderstanding you, but it sounds to me like if you take this argument to its logical conclusion, you are arguing for language design based on mathematical aesthetics. I disagree with this perspective. The value of a denotation is derived from its ability to model desirable operational phenomena, not the other way around. I advocate functional programming because its easier to reason about the operational behavior of programs when there are no “hidden variables”, to borrow a physics term. If mathematical functions didn’t capture the essence of this operational requirement, then I’d pick something else.

To pick one of your examples, null is bad because it is a member of every type whether you want it to be or not, which limits the ability of the programmer to constrain the operational behavior of their program through the type system, which leads to bugs. Or in other words, null is evidence that your type system is insufficiently expressive. Whether or not some categorical construction naturally accounts for null is irrelevant to the badness of null. On the contrary, if that construction leads one to advocate for null, then that is evidence that that construction would be a poor foundation for a programming language.

Let’s connect this with your CBPV example earlier.

Without establishing the operational consequences this statement has, it is meaningless. Maybe I could model a competitor to CBPV with semisimple rings and Hilbert spaces. How would you compare and contrast these evaluation strategies without an appeal to operational semantics?

One way to justify an operational semantics is by connecting it with an intuitive denotation. But that isn’t the only way. Another way is by directly stating the mathematical properties you want the operational semantics to have. You can justify your choice of call-by-value evaluation by proving that the cost semantics for your language is compositional. You can justify the type system you chose by proving that the operational semantics does not admit undefined behavior (the classic “progress + preservation” theorem). Etc.

You’re not wrong.

Sure. I didn’t mean to suggest that I begin with a fully formed denotational semantics and use that to come up with the operational semantics of a new language. Rather, I think denotational considerations provide useful guidelines for language design. More on this later.

The operational consequence is that call-by-push-value is very finicky about the distinction between a value and a computation that produces a value, and dually, between a computation and a thunk that, when forced, produces this computation. This gives the programmer maximal control over when computation happens, as opposed to either call-by-value or call-by-name, which come with implicit assumptions about when computation happens. CBV and CBN can be embedded into CBPV by explicitly using these assumptions in the appropriate places, and, furthermore, CBPV’s type system will tell you what these appropriate places are.

In retrospect, that probably wasn’t so hard to understand coming from an operational approach. But for some reason I found it easier to come at it from a denotational approach first.

In a previous conversation with you, I mentioned that I prefer languages that don’t gratuitously provide type formers that break isomorphism invariance. (In fact, lately I have entertained the idea that a good definition of “effect” is “anything that breaks some equational law”.) How would I state the property that a type former is isomorphism-invariant, if not first in categorical language?

If I’m understanding what you’re saying, then it’d look something like

`T : ⋆ -> ⋆`

is an isomorphism-invariant type former if for all types`A`

and`B`

,`A ~ B`

implies`T A ~ T B`

. There are a couple notions of type isomorphism that aren’t quite equivalent, but for the sake of completeness you could define`A ~ B`

as there exist two terms`f : A -> B`

and`g : B -> A`

such that`f . g ≡ id`

and`g . f ≡ id`

where`≡`

is contextual equivalence at the appropriate type.If that sounds overly categorical it is because the property you desire relies essentially on categorical concepts. The question then is why you chose this to be the property that you want to hold. It isn’t immediately obvious to me that this property implies that GADTs are well behaved, nor is it immediately obvious to me that references don’t satisfy this property (although I believe it because references are tricky). As in the previous discussion the property I would have stated is that the language should not include a mechanism for deciding type disjointness. One can justify this in a couple of ways: first, disjointness is not closed under abstraction, so for GADTs to work in full generality one would need to add disjointness predicates, similar to how Gaster-Jones style record systems allow abstraction over disjoint name predicates.

Second, unlike names, there is no obvious “base case” for disjointness. If you’re working in a lambda calculus, it may be that all types ultimately are built from

`->`

constructors (or at least the semantics provides the illusion of this for clarity/ease of formal manipulation). For this concept to work in full generality mandates that you have a mechanism for “hard” type generativity a-la Haskell`newtype`

, as opposed to what I would call “soft” type generativity that essentially relies on parametricity a-la ML. And I consider`newtype`

to be a hack that forces programmers to endlessly litter any code they wish to hide implementation details of with operationally insignificant injections and projections, so I consider any features that suggest the addition of`newtype`

suspect.You can use this same reasoning to argue against Racket-style intersections and unions, although I would argue against these more directly by appealing to parametricity.

If you were able to immediately deduce all of the implications I stated above from that isomorphism invariance property, then maybe I should give category theory another shot.

In my mind, the right notion of equivalence between two abstract types is that no client program can possibly distinguish between them. That is, if a client program that uses one abstract type is consistently modified to use an equivalent one, then the behavior of the client program remains unchanged. I will call this property “weak equivalence”.

Unfortunately, weak equivalence is difficult to establish in the general case. To simplify the task, I pay attention to the special case when a weak equivalence of abstract types is witnessed by computable mappings (inside the programming language) back and forth between their internal representations, such that “transporting” a value from either abstract type to the other (using the appropriate mapping) doesn’t change the observations that can be made from it. It follows that composing the mappings in either order gives a contextual equivalence at the appropriate abstract type. I will call this property “strong equivalence”.

Since strong equivalence implies weak equivalence and is so much easier to establish, I want to milk strong equivalences for all they’re worth. 90% of the benefit for 10% of the cost is a truly awesome deal.

When I implement an abstract type, I often want to consider several possible representations, even if in the end I will only use one, because different operations of an abstract type might be easier to implement using different representations. Using strong equivalences, I can “patch together” several incomplete abstract type implementations into a single complete (if crude) one, which can then be optimized by rewriting it in meaning-preserving ways.

Right, in our previous conversation, you convinced me that the real problem is the the ability to answer type disjointness questions, not GADTs themselves. Alas, the only two implementations of GADTs I’ve seen so far, namely, Haskell’s and OCaml’s, both rely on the ability to answer type disjointness questions (with something other than “dunno”).

This objection isn’t strong enough. You could say “if you need disjointness predicates, well, add them!”

Yes, this gets to the heart of my objection. If we take isomorphism-invariance seriously, it follows that, ignoring efficiency considerations, the language’s semantics shouldn’t distinguish between “hard” and “soft” generativity in the first place. Isomorphic means equal modulo injections and projections in the right places. If we can make different observations from types meant to be isomorphic, something went wrong.

Unfortunately, being neither a computer scientist nor a mathematician, just a layman programmer, I lack the necessary sophistication to make a solid enough case to convince you.

OK, I better understand what you are getting at. I agree that my definition of isomorphism-invariance isn’t what you want. I think the stumbling block for me is that I understand isomorphism in terms of the abstract algebra definition of bijection + preserving structure and in this case didn’t understand what structure you were trying to preserve–the definition I gave above is more-or-less the direct transliteration of (my understanding of) the category theoretic definition of isomorphism.

My new understanding of this is that you want the type system to be “monotonic.” If you’re unfamiliar with this term, it means that if there is some subtyping/subkinding relation,

`Γ ⊢ e : t`

, and`Γ' ≤ Γ`

implies`Γ' ⊢ e : t'`

and`t' ≤ t`

(where subtyping/subkinding is extended pointwise to contexts). If monotonicity doesn’t hold, then it makes the notion of backwards compatibility for APIs difficult to nail down. Haskell’s core language isn’t monotonic for a number of reasons, and there isn’t a formal subtyping relation defined on modules to begin with (I’m not even sure it’s possible to do at module granularity because of orphan type classes).Generally we want a type definition to be in a subkinding relation with an abstract type. That, is if the language has singleton kinds,

`S(t : k) ≤ k`

. It is impossible to consider Haskell-style`data`

and`newtype`

in this way without breaking monotonicity–coverage checking for GADTs, type family pattern matching, and type class canonicity all crucially rely on the type system being non-monotonic. (One has to be careful of the statement of this fact because`newtype`

also allows recursive types & wrapping impredicative types. These have structural equivalents, so we could say something like`S(StructuralEquivalent(t) : k) ≤ k`

)FWIW I am enjoying our conversation, so I’m sorry if it comes across as otherwise.

Monotonic as in “monotonic logic”? Aha, yes! I start with two types from which I can make equivalent observations. Then I define a bunch of GADTs, type families, class instances, whatever, and, voilà, I can no longer make equivalent observations from those types.

As a layman programmer, what I observe is that it hurts modularity. Adding a new definition “here” (without syntactically altering existing ones) destroys a property established and used “there”. It is difficult to understand for me that Haskellers of all people would be fine with this situation.

No problem, I am actually enjoying this too! :-)

Are you saying the requirements of programming are operational? That’s not my interpretation. People don’t run programs to make machines do things, they run programs to answer their questions (the original case was decrypting encrypted messages AIUI) - even in a case where a program is automating, like, a warehouse or something, I usually find it more helpful to think of this as the operator (or machine) asking “what should I do now?” I mean if we’re talking about mathematics, the original use cases were things like land surveying, which is operational in a sense (and in the same sense that most programming is, as I would see it). I think mathematicians have been solving similar problems to those that programmers solve, for longer, and so mathematics should inform programming language design.

Isn’t that still a mathematical style of modelling - the kind of thing that category theory would be useful for?

I think there is some confusion wrt the terminology I am using. By “operational” I am referring to operational semantics, a way of formalizing a programming language by describing how programs in the language are executed. The alternative to an operational semantics is a denotational semantics, which formalizes a programming language by associating its syntax with some other mathematical construction, like a set, or a function, or a category. Both are important, we are just disagreeing about which “comes first” or “has primacy” (not really sure how to put it, I’m sleepy)

It’s been quite the opposite historically. I’m not sure where Lambda Calculus fits in denotational vs operational. The Turing Machine & Von Neumann Architecture appear more like operational semantics. Computers were designed to be equivalent to them. Programming languages were as well with them gradually becoming more high-level. Most of what we have today was built with experimental iteration that didn’t involve formal methods or even formal specs. Most programming is similarly “perform these steps to solve the goal.”

Operational semantics has ruled the day in both languages and computers since they began. Possibly why the early proof efforts on programs used operational semantics.

Any programming language can be given an operational semantics, and lambda calculi are no exception.

People have been at least attempting both approaches for most of programming history I think. The Church-Turing equivalence was surprising to some at the time partly because it showed that the more “mechanical” way of thinking of computation and the more “mathematical” way were equivalent, while it wasn’t entirely obvious beforehand that this would be true.

In design of “real” PLs, I think the 1950s projects of Algol vs. Lisp present some of the same flavor. By 21st century standards 1950s Lisp is not all

thathigh-level and abstract (I mean,`car`

and`cdr`

are named after register-twiddling), but compared to what was going on at the time, the Lisp people at least saw themselves as trying to go the other direction. Rather than starting bottom-up from hardware and abstracting some structure out of that gradually, start top-down from something more like mathematical functions, and find a way to implement them on the computer so that the programmer is presented with the abstract computational model regardless of what happens at the machine-code level. Hence the assumption of a bunch of things that were nice in the abstract model, but didn’t actually exist concretely yet, like garbage collection, higher-order functions, etc. (most of which wouldn’t exist inefficientimplementations until decades later).Ironically, Dijkstra complained that the definition of LISP (when its name was still in all caps) was too operational and tied to its implementation details. And he invented predicate transformer semantics, which allows you to ask (and sometimes answer) whether two operationally different, imperative, nondeterministic programs denote the same thing (relate the same precondition to the same postcondition). So it isn’t completely fair to suggest that imperative languages necessarily beget mechanistic reasoning.

I’m not referring to category theory per-se, although the sort of category-ish language is part of it, along with the type theory. More than those specific things though I’m really referring to the general approach to thinking and communicating, and the sort of math-adjacent pseudo-formalism that the Haskell community is really fond of, and the mathematical leaning theoretical cs that people often run across in papers that still form most of the documentation for some libraries.

For me personally, the language used around Haskell motivated me to study the mathematics, and in turn to realize that a lot of the things in Haskell are named by analogy to those concepts than strict formal adherence to them. The useful bit I think came from the way of thinking I started to develop and how I got a better intuition about going between math and code. Still, I don’t think making that leap is necessary to use the language and is turning a lot of people away.

A big part of the problem is that Haskell, or rather, Haskellers make mutually inconsistent promises. For example:

The types can tell you everything that’s going on, especially in plumbing libraries like lens, conduit, you name it.

You don’t need to learn any math to understand Haskell.

You can only pick one, really. You need a certain degree of mathematical sophistication to understand Haskell’s type system, e.g. how parametricity constrains the possible inhabitants of a type, how type class laws make sense in a generic way, and aren’t just the result of someone’s whims. Contrapositively, if Haskell libraries are meant to be used by people with no mathematical inclination, they need to be documented without an implicit assumption that types will tell you everything that’s going on, and equational laws should play a less prominent rôle in defining commonly used abstractions.

I don’t think they’re supposed to apply simultaneously. It’s more like:

Which means that there’s a (well identified) gap as someone learns where their ability to read types is not developed enough to handle “advanced” code. That doesn’t mean they can’t still write it, but it is a steep part of the learning curve and could use some better documentation.

Sadly, this won’t do. Would you use libraries in your own work that contain unlawful instances for standard type classes like Applicative and Monad? Probably not, right? (At least I know I wouldn’t.) Well, that’s the kind of code that someone without mathematical inclination would write. And, for them,

this would be fineso long as their programs work.re category theory

I’m always confused seeing the submissions talking about how important category theory is to the working programmer. I collect cutting-edge work in all major sub-fields of IT plus quite a bit on type systems, formal verification, PL research, and so on. The latter being stuff I pass onto others with only most abstract understanding on my part. Almost all the major stuff I have in the latter categories is done in Coq, HOL’s, custom logics derived from things like ZF/FOL, and so on. I’ve never seen category theory in any groundbreaking work programmers could actually use. Even most ordinary advances came from just using creativity and reasoning within ordinary programming languages like C, Java, etc. Java being number one in quantity of clever stuff published in it. Many are using Python now for stuff based on ad hoc methods.

I had to go out of my way to find a significant work, a compiler, done with category theory. It wasn’t that good compared to really old stuff done with other methods. I say category theory isn’t important 99+% of the time plus hasn’t brought us much at all. I’d say set theory, propositional logic, FOL, automatic solvers (esp SAT/SMT), and LCF-style provers have collectively gotten us the biggest gains justifying continued investment that continues to pay off (rinse repeat). Haskell is best off understood without any of that stuff if people just want to get stuff done. Even those doing formal proofs should ignore category theory in favor of ASM’s, Isabelle/HOL, etc. As you suggested, it’s likely a social, clique effect keeping it so prominent in discussions despite being so unimportant in the big picture.

I think that there is a mode of thinking that’s more of an analogy to category theory than actually being category theory, and it can be helpful in providing a mental framework for gaining an intuition of certain types of abstractions. Beyond that I agree its usefulness has been overstated.

That said, I think there are a lot of benefits to thinking of api design algebraicly and Haskell has done a lot to facilitate that.

I think the 99% figure might be underestimating the importance of category.theory, though maybe not by much- but in my own experience it’s always served more as a framework for brainstorming and I move on to other techniques to actually get stuff done.

“ and it can be helpful in providing a mental framework for gaining an intuition of certain types of abstractions.”

I won’t argue that. I lack the experience for it. Haskellers sure like it.

“I think the 99% figure might be underestimating the importance of category.theory”

Over 10 years and 12,000 papers on CompSci in languages, engineering, formal verification, and so on before I saw category theory mentioned… on Hacker News and then on Lobste.rs. I might be underestimating its importance but it can’t be that high outside the Haskell community.

Edit to add: my papers focused on significant achievements, how they were built/specified, and so on. There could be low-value or incidental work going on in category theory all the time. Just never made enough headway for me to see it in places showing high-impact tech.

I think a lot of things have been discovered under different names and then Category Theory swoops in and takes credit for those accomplishments since it’s a functional generalization of it. That said, most of that generality is pretty hard to put to practice… but it does all hang together and thus form a really pleasing way of seeing these concepts together.

I mean, I’d probably argue that any time you solve a problem by emphasizing

functionsthen you’re practicing Category Theory in that one of its most key insights is to organize how focusing on morphisms between similar structures has been a powerful force in abstract algebra.Without Category Theory we’d be in the same place we are now with less pretty language for talking about it.

I find problematic this attitude where whoever comes up with the general method for solving a large class of problems gets all the credit, and those who solved particular cases before get ignored. Good abstractions don’t arise in a vacuum: they are the result of studying lots of good motivating examples.

Incidentally, this is also why type classes are problematic: they assume that you can come up with the right abstractions (classes) and only then define models (instances) of them.

To be clear, I’m not exactly fond of this tendency either.

But on the other hand, generalization—the method of cutting away the inessential—is

hugelyvital to the development of theories in mathematics and computer science. I can’t overstate that. It’s not something to shake a stick at.Category Theory almost gets gypped here since what it eventually generalized was so heavily used in mathematics since Bourbaki that the act of (formal) generalization gave almost no power—those roads were well-trodden. It just became a lingua franca for those who want to learn it.

All that said, I don’t have a clue why you’re taking aim at typeclasses. As far as I’m concerned they assume no such thing. A user of typeclasses certainly could… but it’s no more inherent in the feature than any other abstraction method.

Yes, and category theory hasn’t invented generalization. (Though it probably has refined it.) There is ample historical evidence of both mathematicians and programmers without exposure to category theory, who have come up with nice generalizations.

Type classes make it unnecessarily difficult to redesign existing abstraction hierarchies in view of newly acquired information. For example, when Haskell made

`Applicative`

a superclass of`Monad`

, library authors who had defined`Monad`

but not`Applicative`

instances had to go out of their way to implement those instances. And, even today, the`fail`

method hasn’t been removed from`Monad`

, even though there are no good reasons for it to be there.ML modules are more flexible in this regard. They allow signatures and structures to be defined and evolved independently, and matched only when the need arises. IMO, this matches more closely the process by which algebraic structures and their models are discovered.

Sorry, I don’t mean to say that CT invented generalization but instead that it’s just a nice generalization of some very well known things.

I see what you mean about typeclasses as well and agree.

That’s pretty much my position. Category Theory is a neat-looking formalism that shows up after all kinds of others, even informal methods, do the real work.

Re Functions. We got from the Lambda Calculus far as I can guess. Reinvented informally as interchangeable subroutines on punch cards with argument passing done manually. Then Fortran and LISP showed up. I don’t know if LC came from category theory in any way. If not, then functions are another example of after-the-fact, pretty-it-up effect you described.

I’d say we got them from math itself well before LC or subroutines :)

LC didn’t come from CT—it was invented around 15 years later. Its fundamental goal was to formalize the intuitive idea mathematicians had long had of certain transformation/arrows/functions being “natural” and it succeeded in abstracting this principle into a very general framework for talking about algebraic theories.

It’s absolutely just a matter of cleaning up language, but those cleaning up stages often turn into big wins. In fact, it was a principle tool of the French school of Algebraic Geometry led by Grothendieck (though I know nothing about AG so I can’t say anything more).

Additionally, now it’s a driving tool in certain schools of type theory so there’s a chance it will play a part in new developments there as they arise.

All that said, I don’t fundamentally disagree with your analysis. Category Theory gets its name as “Abstract Nonsense” largely because its discovery didn’t say anything anyone didn’t already know—it just said it in weird new words that someone else was promising you were really coherent.

Nice pun with loop. In all seriousness, though… I’ve taken abstract algebra up to a graduate level and a lot of the Haskell terminology was new to me as well. Most of abstract algebra is groups, rings and fields and around specific applications (e.g. number fields, linear algebra, functional analysis). You learn that loops and monoids exist, but you don’t do much with them. Category theory is on the fringe even by pure math standards (I didn’t even encounter it until my late 20s).

Yeah, Haskegorical theory is a thing of its own. I learned category theory too in math school, but the real interesting applications for me were completely different from Haskell’s, like the Galois functor or homological algebra. I seldom see familiar stuff like universal mapping properties or snake lemmas in Haskegorical theory.

That’s because Haskellers only work with the moral analogue of the category of sets, tops Kleisli categories of Set-monads, and, even then, only by wishing bottom (and sometimes laziness altogether) out of existence. I bet the categorical techniques used in homological algebra (which is an actual use case of category theory for things you couldn’t reasonably do without it) would look just as alien and confusing to the average Haskeller as they would to any other programmer. Probably even an Abelian category would look weird: “I didn’t know the initial and final objects could be the same!”

In a more mathematical setting, I think this comment mirrors similar concerns to this.

I really disagree with the “you don’t need to understand monads” theme found here and in other Haskell blog posts; they all seem to be written by people that already understand monads and don’t get confused when the examples in documentation for popular libraries are littered with calls to

`liftM`

and`>=>`

.Sidenote, the documentation for both of those functions is:

You’re right: they’re wrong.

I don’t think the article/presentation’s thesis is “you don’t need to understand monads” so much as it is “the best way to understand monads is by example, and by seeing the patterns between types that implement the Monad typeclass”.

That’s a perfect description! Many have taken stabs at fixing this in Haskell, but there’s zero doubt that the majority of resources out there today are pedagogically aimed at someone who is coming to Haskell from, say, SML. No, probably they’d also have to be engaged in the academic community surrounding SML, too.

Not only is this pedagogical chaos, it’s linguistically just awkward. Using “monad” as a noun is at best a very specialized phrase and generally just invalid. Experts do it because it’s convenient shorthand—

convenient shorthand needs to explicitly described to someone who isn’t already in the know!“The IO type is the type of imperative actions done in sequence—just like every language you’ve ever worked in.”

This is nothing this slideshow doesn’t already say, but god does it ever need reinforcement. If you use the phrase “the _____ monad” to anyone not already super familiar with monads as a concept then you should feel the need for some good old self-flagellation style penance.

Fantastic. Follow this rule until you’ve reached a point in your learning that you feel the incurable need to break it. Then, if you feel pain, start following the rule again. When the time is right for you to understand monads they’ll be obvious.

I hope one day Haskell community will realize that while better than the previous solution it is still a bad solution. This is the usual set of bad examples that make Haskell IO looks sensible until someone starts working on a real program (with user output, configuration files, http requests, database requests, logging) and discover now they are dealing with transforms and lifting and want to tear their hair out.

Nah fam. I know it’s rough go for your first project, but so is OTP.

If you know what you’re doing, you shouldn’t actually be lifting anything in your programs. That’s even a trope in the community.

Hmm, do you go over that in your book? I’ll have to finally purchase and dig in :)

I continually end up on haskell for writing certain types of apps (like cli tools or simple websites) because of the available libraries and stack – Ocaml just doesn’t have the quality of libraries for stuff like AWS or as good a build tool as stack and Erlang isn’t a fit for a lot of that work. But become frustrated with transforms and start dreaming of a haskell like language that isn’t pure :)

The book isn’t modular, you’d need to do at least half the book to be able to digest the monad transformers chapter. We only cover part of the point I made here, but it shouldn’t be hard to infer how to avoid making this mistake.

Lose purity and you lose almost everything that made it distinctly worthwhile. Even laziness (which is important) is optional compared to that.

For some, sure. I just want a staticly typed functional language that compiles to a native binary and has good development tooling :). Haskell and ocaml each fit in there but each have their own distinct issues that can frustrate me. And then Rust, would like a function language with GC that can interop with Rust as an option.

I could

nearlydo without optional GC in Rust if it was a touch more functional, as it is, it inhabits roughly the same space as C++ for me. I don’t mind though, at least gives me an out if something doesn’t require a tonne of C++ middleware.Every programming language has something like monads. In Prolog, it’s difference lists or definite clause grammars or metainterpreters; for Java, it’s reflection and generic methods and fancier generics; for Lisp, it’s fancy macros. It’s part of the fun.

I think other languages do a better job of not telling the beginners about these things on day one, but otherwise the situation isn’t that different. You put a high value on understanding something esoteric and tell the beginners about it and how hard it is, you shouldn’t be too surprised by seeing a bunch of blog posts about how easy and great it is. “Monads” are not a real problem.