Codensity Monads and Kan Extensions

In this post, we are going to look at another method of constructing monads. The details are possibly more mathematically complex than those we have discussed so far, hinging on the topic of Kan extensions. We will study this construction at a fairly high level of abstraction, aiming to highlight the wide applicability of the techniques involved.

Kan Extensions

Given a category \mathcal{D}, with a subcategory \mathcal{C}, and a functor G : \mathcal{C} \rightarrow \mathcal{E}, it is natural to ask whether there is some systematic way to extend G to the whole of \mathcal{D}, yielding a functor G' : \mathcal{D} \rightarrow \mathcal{E}. Of course, when we look for “systematic” or “canonical” ways of doing things in category theory, we are looking for a construction with a universal property.

Although a natural motivation, the restriction to subcategories is unnecessary. Instead, for any functor J : \mathcal{C} \rightarrow \mathcal{D}, and functor G : \mathcal{C} \rightarrow \mathcal{E}, we are looking for a way to extend G “along J” to construct a functor of type \mathcal{D} \rightarrow \mathcal{E}.

It turns out that there are not one, but two mathematically natural choices, with universal properties given by the following isomorphisms, natural in F:

  1. \mathsf{Lan}_J(G)\Rightarrow F \quad\cong\quad G \Rightarrow F \circ J.
  2. F \circ J \Rightarrow G \quad\cong\quad F \Rightarrow \mathsf{Ran}_J(G).

\mathsf{Lan}_J(G) and \mathsf{Ran}_J(G) are respectively known as the left and right Kan extensions of G along J. Note that this terminology can be reversed in some parts of the literature, so it is always wise to check which universal property is being assumed. Kan extensions arise all over category theory and mathematics, for example in connection to adjunctions, (co)limits and monads, leading to MacLane’s claim “All concepts are Kan extensions”.

If these extensions exist for all G, we have adjunctions involving precomposition with J:

\mathsf{Lan}_J(-) \dashv (-) \circ J \qquad\text{ and }\qquad (-) \circ J \dashv \mathsf{Ran}_J(-)

So left Kan extensions conveniently yield left adjoints, and dually right Kan extensions yield right adjoints, making the naming convention logical.

We can rephrase the universal properties for Kan extensions as follows:

  1. For \mathsf{Lan}_J(G) there exists a universal natural transformation \mathsf{run} : G \Rightarrow \mathsf{Lan}_J(G) \circ J such that for every \alpha : G \Rightarrow F \circ J there exists a unique \hat{\alpha} : \mathsf{Lan}_J(G) \Rightarrow F such that \alpha = (\hat{\alpha} \circ J) \cdot \mathsf{run}.
  2. For \mathsf{Ran}_J(G) there exists a universal natural transformation \mathsf{run} : \mathsf{Ran}_J(G) \circ J \Rightarrow G such that for every \alpha : F \circ J \Rightarrow G there exists a unique \hat{\alpha} : F \Rightarrow \mathsf{Ran}_J(G) such that \alpha = \mathsf{run} \cdot (\hat{\alpha} \circ J).

So far, we have no insight into whether we can form left and right Kan extensions. Fortunately, there is some good news on this front, if \mathcal{C} is a small category (has a set of objects):

  1. If \mathcal{E} is cocomplete, then \mathsf{Lan}_J(G) exists for all G.
  2. If \mathcal{E} is complete, then \mathsf{Ran}_J(G) exists for all G.

Under these conditions, there are explicit formulae for the Kan extensions in terms of the assumed (co)limits. These constructions are important, but we omit the details as we shall not need them in this post.
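
As an aside for functional programmers, these universal properties have a well-known rendering for endofunctors of the category of Haskell types, as in Edward Kmett’s kan-extensions package. The following is a sketch using that package’s naming, rather than a full development.

{-# LANGUAGE RankNTypes, ExistentialQuantification #-}

-- Right Kan extension of g along j: an end, encoded with a universally
-- quantified continuation. Ran j g a stands for Ran_J(G)(A).
newtype Ran j g a = Ran { runRan :: forall b. (a -> j b) -> g b }

-- Left Kan extension of g along j: a coend, encoded as an existential.
data Lan j g a = forall b. Lan (j b -> a) (g b)

The defining isomorphism F \circ J \Rightarrow G \cong F \Rightarrow \mathsf{Ran}_J(G) then becomes a bijection between polymorphic functions of type forall b. f (j b) -> g b and polymorphic functions of type forall a. f a -> Ran j g a, and dually for Lan.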

Example: It is common to consider functors between presheaf categories

[\mathcal{D}^{op},\mathsf{Set}] \rightarrow [\mathcal{C}^{op},\mathsf{Set}]

induced by precomposing with a functor J : \mathcal{C} \rightarrow \mathcal{D}. By the result above, as \mathsf{Set} is both complete and cocomplete, such functors always have both left and right adjoints, given by the two Kan extensions.

We note that in this way, every such J induces both a monad and a comonad on [\mathcal{D}^{op}, \mathsf{Set}]. These monads are not our main focus for today though.

There is another useful condition for the existence of certain Kan extensions in the presence of an adjunction. Assume L \dashv R with unit and counit \eta : \mathsf{Id} \Rightarrow R \circ L and \epsilon : L \circ R \Rightarrow \mathsf{Id}. Then:

  1. \mathsf{Lan}_R(\mathsf{Id}) = L.
  2. \mathsf{Ran}_L(\mathsf{Id}) = R.

We also have that adjoints preserve Kan extensions in the following sense:

  1. L \circ \mathsf{Lan}_J(G) = \mathsf{Lan}_J(L \circ G).
  2. R \circ \mathsf{Ran}_J(G) = \mathsf{Ran}_J(R \circ G).

Kan Lifts

As an aside, we briefly mention there is a related concept of Kan lifts. These answer the question, for functors J : \mathcal{C} \rightarrow \mathcal{D} and G : \mathcal{E} \rightarrow \mathcal{D}, of when we can “lift” G through J to a functor G' : \mathcal{E} \rightarrow \mathcal{C}. Notice the difference in direction of travel along J versus Kan extensions. Again there are two possible answers, with universal properties given by natural bijections:

  1. \mathsf{Lift}_J(G) \Rightarrow F \quad\cong\quad G \Rightarrow J \circ F.
  2. J \circ F \Rightarrow G \quad\cong\quad F \Rightarrow \mathsf{Rift}_J(G).

\mathsf{Lift}_J(G) and \mathsf{Rift}_J(G) are respectively the left and right Kan lifts of G along J. Again, if these exist for all G, we have adjunctions, now involving postcomposition with J:

\mathsf{Lift}_J(-) \dashv J \circ (-) \qquad\text{and}\qquad J \circ (-) \dashv \mathsf{Rift}_J(-)

One should be careful that not all the theory transfers as smoothly to lifts as it does for extensions. For example, I am not aware of an explicit construction in terms of (co)limits, or of a construction similar to the one we will see in the next section.

Codensity Monads

For any functor

J : \mathcal{C} \rightarrow \mathcal{D}

we can consider the right Kan extension:

\mathsf{Ran}_J(J) : \mathcal{D} \rightarrow \mathcal{D}

of J along itself. Assuming this exists, we have natural transformations:

\widehat{\mathsf{id}_{J}} : \mathsf{Id}_{\mathcal{D}} \Rightarrow \mathsf{Ran}_J(J).

and

\widehat{ \mathsf{run} \cdot (\mathsf{Ran}_J(J) \circ \mathsf{run}) }: \mathsf{Ran}_J(J) \circ \mathsf{Ran}_J(J) \Rightarrow \mathsf{Ran}_J(J)

(The hat should cover the whole term above, but we are reaching the limits of the latex formatting technology at our disposal).

So we have an endofunctor \mathsf{Ran}_J(J) and two natural transformations with the right types to serve as a unit and multiplication for a monad. It turns out that this construction does in fact yield a monad.
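
For Haskell readers, this construction is available as Control.Monad.Codensity in the kan-extensions package. Here is a minimal self-contained sketch, with unit and mult (our names) mirroring the two natural transformations just constructed:

{-# LANGUAGE RankNTypes #-}

-- Ran_J(J) with both functors equal to f: the codensity monad of f.
-- Note that f is not even required to be a Functor.
newtype Codensity f a = Codensity { runCodensity :: forall b. (a -> f b) -> f b }

-- the unit, the transpose of the identity on f
unit :: a -> Codensity f a
unit a = Codensity (\k -> k a)

-- the multiplication, collapsing two layers by feeding continuations through
mult :: Codensity f (Codensity f a) -> Codensity f a
mult (Codensity m) = Codensity (\k -> m (\c -> runCodensity c k))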

If \mathcal{D} is a complete category, and \mathcal{C} is small, the required Kan extension always exists, so we can use this as another method to generate monads. We may return to this point in a future post, and consider concrete examples. Our aim for today is to understand exactly which monads can be constructed in this way.

Consider a monad \mathbb{T}. Via the Eilenberg-Moore construction, this always arises from an adjunction L \dashv R. Now, the adjunction L \dashv R induces an adjunction (-) \circ R \dashv (-) \circ L between the relevant functor categories, and right Kan extension along R is by definition right adjoint to (-) \circ R. As right adjoints are unique up to isomorphism:

\mathsf{Ran}_R(G) = G \circ L

and in particular

\mathsf{Ran}_R(R) = R \circ L = \mathbb{T}

Therefore the endofunctor part of the monad always arises as the right Kan extension of the right adjoint along itself. In fact, if we’re a bit more careful about tracking the unit and multiplication, we find that up to isomorphism, every monad arises as a codensity monad via this construction. Everything about this argument dualises smoothly, so every comonad arises as a density comonad, giving us another handle on the apparently wilder world of comonads.
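
Concretely, in Haskell the isomorphism between a monad m and its codensity monad is witnessed by the following pair of maps, reusing the Codensity type sketched above. That they are mutually inverse (in the second case, up to naturality/parametricity) is the standard result just described.

toCodensity :: Monad m => m a -> Codensity m a
toCodensity m = Codensity (m >>=)

fromCodensity :: Monad m => Codensity m a -> m a
fromCodensity (Codensity k) = k return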

Remark: Sometimes you will see a claim that “this is a codensity monad”. Given that every monad is one, what is normally meant by such a statement? I would normally interpret this as being about the concrete description of the monad. The right Kan extension can be expressed as a limit, and often in a particular category a specific construction of limits is considered standard, pointing to a concrete representation of interest.

Conclusion

We have rather quickly run through some of the high-level theory of Kan extensions and codensity monads. There is a lot more to say here, particularly in terms of how specific monads arise from the codensity construction applied to mathematically natural choices of functors which aren’t merely resolutions of the chosen monad. We may return to concrete examples in later posts.

Further reading: For Kan extensions and related theory, and many other things, I would recommend “(Co)end Calculus” by Fosco Loregian. I would certainly suggest reading about the novel string diagrammatic approach to Kan extensions in “Kan Extensions for Program Optimisation Or: Art and Dan Explain an Old Trick” by my co-author Ralf Hinze. For intriguing examples of the codensity construction, “Codensity and the Ultrafilter Monad” by Leinster is a fascinating read.

Lawvere Theories

In many previous posts, we have discussed connections between monads and universal algebra. So far, we have seen that given a presentation (\Sigma,E) of an algebraic theory in terms of a set of operations \Sigma, and equations E, there is a corresponding monad \mathbb{T} such that:

  • The Eilenberg-Moore algebras of the monad are the algebras described by the chosen presentation.
  • The Kleisli category is the full subcategory of free algebras.

Many other aspects of monad theory can then be understood from this algebraic perspective.

Every monad arising from a presentation in terms of operations and equations is finitary. This is a technical condition saying that certain colimits are preserved by the monad endofunctor, but we will not dwell on the precise details. Intuitively, the idea is that the functor is completely defined by its action on finite objects. The key point is that there is a correspondence between algebraic presentations and finitary monads, for which the categories of algebras coincide. It is essential that the categories of models agree for this relationship to be algebraically meaningful. Note this is not a bijection, as a given finitary monad will have infinitely many potential presentations in terms of operations and equations.

In this post, we will see a third perspective on describing algebraic structures, Lawvere theories. These sit somewhere between conventional universal algebra and monads in their level of abstraction. As with any mathematical phenomenon, each new perspective deepens our understanding, and emphasises new features and intuitions.

The Idea of Lawvere Theories

To be concrete, let’s consider what we need to do to describe a monoid. The general situation for other algebraic structures will follow in the same way.

Typically, we first pick out an underlying set M, and two functions:

  1. A zero argument function, or constant, for the unit element. We can identify this with a function 1 \xrightarrow{z} M.
  2. A two argument function, or binary operation, for the monoid multiplication. We can identify this with a function M \times M \xrightarrow{m} M.

Of course, once we have these functions, we can build further ones by combining them together. For example

M \times 1 \xrightarrow{m \circ (\mathsf{id}_M \times z)} M

which informally we can think of as corresponding to a term m(x,z). Of course, by one of the monoid unit axioms, we require that this composite is equal to

M \times 1 \xrightarrow{\pi_1} M

which we can think of as corresponding to a variable x. So in this way of describing a monoid, we:

  1. Pick out some basic operations, which are maps of the form X^n \rightarrow X.
  2. Build lots of other operations by composing the basic operations together, using category composition, and the properties of products.
  3. Require that certain of the operations are equal.

We note some issues with this naive plan. Firstly, where did the basic operations come from, and why do we distinguish them from the other operations? This bias towards certain operations seems categorically unnatural, and raises the ugly question of whether any further mathematical results we derive might be dependent on our choice of presentation. Secondly, a small but annoying detail. It was a pain that M \times 1 was only isomorphic, and not equal to M when we built our composite terms. If we build up more complicated composites involving higher Cartesian powers, we will spend a lot of time bookkeeping how to bracket expressions like:

M \times ((M \times M) \times M)

for no practical benefit. So putting all this together, it would be convenient if we could describe the theory of monoids as a category with:

  1. Objects the natural numbers, corresponding to the number of arguments of our operations.
  2. (Strict) finite products are given by adding the required arities together m \times n = m + n.
  3. Intuitively, we consider an operation of arity m to be a morphism of type m \rightarrow 1. For example, a binary operation is a morphism 2 \rightarrow 1 and a constant is a morphism 0 \rightarrow 1. Notice that, unlike the common convention, in this case 0 is the terminal object.
  4. A family of n such operations is a morphism of type m \rightarrow n. This is automatic from the universal property of the assumed strict finite products.
  5. A diagram commutes in our category if and only if the corresponding operations are equal in the theory of monoids.

In fact, it is technically important to add a little bit more structure. If we consider the category of sets, and take the full subcategory with objects

\{ \},  \{ 0 \}, \{ 0, 1 \}, \ldots

this category clearly has finite coproducts, given by:

\{ 0,\ldots, m - 1 \} + \{ 0, \ldots, n - 1 \} = \{ 0, \ldots, m + n - 1 \}

If we identify the objects of this category with the natural numbers, via:

n = \{ m \in \mathbb{N} \mid m < n \}

this yields a category with finite coproducts \aleph_0, and therefore a category with finite products \aleph^{op}_0.

Moving away from the special case of monoids, we then define a Lawvere theory to be a category \mathcal{L} with strict finite products, together with an identity-on-objects, strict finite product preserving functor I : \aleph^{op}_0 \rightarrow \mathcal{L}. The category \mathcal{L} can be interpreted as encoding the operations and equations of the theory. We shall return to the interpretation of the inclusion I later. A morphism of Lawvere theories H : (\mathcal{L},I) \rightarrow (\mathcal{L}',I') is a finite product preserving functor H : \mathcal{L} \rightarrow \mathcal{L}' such that H \circ I = I'.

What we really wanted to do was describe a class of algebraic structures, using our theory. A model of a Lawvere theory in a category with finite products \mathcal{C} is a finite product preserving functor

A : \mathcal{L} \rightarrow \mathcal{C}

and a homomorphism of models is a natural transformation between them. For conventional algebraic structures, we take \mathcal{C} = \mathsf{Set}. Note we don’t require that the homomorphisms interact appropriately with the product structure, as in fact this is automatic. The category of models of a Lawvere theory and their homomorphisms is denoted

\mathsf{Mod}(\mathcal{L}, \mathcal{C})

This approach of identifying structures with functors from a suitably structured category is known as functorial semantics. Notice the difference with monads as a categorical tool for describing algebraic structures. The Eilenberg-Moore category of a monad captures algebraic structures on a fixed base category, whereas we can straightforwardly consider models of Lawvere theories in any category with finite products. There is a forgetful functor given by evaluation at the object 1:

\mathsf{Mod}(\mathcal{L},\mathcal{C}) \rightarrow \mathcal{C}

In many cases, for example if \mathcal{C} is locally presentable, this functor will have a left adjoint, and will therefore induce a monad on \mathcal{C}. So a single Lawvere theory induces monads on many different base categories.

If we have two Lawvere theories such that there is an equivalence:

\mathsf{Mod}(\mathcal{L}, \mathsf{Set}) \simeq \mathsf{Mod}(\mathcal{L}', \mathsf{Set})

then, provided the equivalence commutes with the forgetful functors to \mathsf{Set}, there is an isomorphism \mathcal{L} \cong \mathcal{L}' in the category of Lawvere theories. So there is a very tight correspondence between Lawvere theories and the categories of models that they describe.

How to build a Lawvere theory – take one

So how do we build a Lawvere theory? One approach is to start with a presentation (\Sigma, E). We then form a category with:

  1. Objects the natural numbers.
  2. Morphisms n \rightarrow m are m-tuples of terms in n variables, quotiented by provable equality in equational logic, with respect to the equations of our presentation.
  3. The product projection onto the ith component of a product corresponds to the term for the ith variable.
  4. Identity morphisms are (necessarily) the tuples of projection morphisms.
  5. Composition is given by substitution of terms.
  6. The inclusion functor I : \aleph^{op}_0 \rightarrow \mathcal{L} sends the morphisms of \aleph^{op}_0 to the corresponding tuples of projection morphisms in \mathcal{L}.

So we see the intuition for the inclusion I is that it picks out the trivial terms corresponding to bare variables within the broader collection of operations.
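
As a concrete sketch of this construction, here is the theory of monoids in Haskell, with a morphism n \rightarrow m represented as an m-tuple of terms in variables 0,\ldots,n-1 and composition given by substitution. The quotient by the monoid equations is left implicit, and all the names are ours.

-- monoid terms over numbered variables
data Term = Var Int | Unit | Mult Term Term

-- a morphism n -> m is a list of m terms in the variables 0 .. n-1
type Morphism = [Term]

-- identities are the tuples of bare variables (projections)
identity :: Int -> Morphism
identity n = [Var i | i <- [0 .. n - 1]]

-- composing f : m -> k after g : n -> m substitutes the terms of g
-- for the variables of f, yielding a morphism n -> k
compose :: Morphism -> Morphism -> Morphism
compose f g = map subst f
  where
    subst (Var i)    = g !! i
    subst Unit       = Unit
    subst (Mult s t) = Mult (subst s) (subst t)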

Example: We can build a Lawvere theory for monoids from the usual presentation in terms of a unit and multiplication operation, and unitality and associativity axioms. The category \mathsf{Mod}(\mathcal{L}, \mathsf{Set}) is then equivalent to the usual category of monoids.

Note it is important that we do not require that models of a Lawvere theory strictly preserve the product structure. For the usual products in \mathsf{Set}, if we did this, the category of models for the Lawvere theory of monoids would be empty!

How to build a Lawvere theory – take two

Instead of starting with a presentation in terms of operations and equations, what if we were given a finitary monad? Can we construct a Lawvere theory directly from this data?

The structure of a Lawvere theory in terms of equivalence classes of terms composed by substitution sounds suspiciously like the algebraic perspective on the Kleisli category of a monad. In fact, the two are very similar, up to some cosmetic details. For a finitary \mathsf{Set}-monad \mathbb{T}, we can construct a Lawvere theory (\mathcal{L},I) as follows:

  1. Form the Kleisli category \mathsf{Set}_{\mathbb{T}}.
  2. Restrict to the full subcategory of objects \{\}, \{ 0 \}, \{ 0, 1 \}, \ldots, and rename the objects to natural numbers in a similar way to the section above.
  3. Take the opposite category.

Tersely, we say the required Lawvere theory is the opposite of (the skeleton of) the full subcategory of finitely generated free algebras. As we would hope, we have an equivalence between \mathsf{Set}^{\mathbb{T}} and \mathsf{Mod}(\mathcal{L}, \mathsf{Set}), so the categories of models coincide.

Lawvere Theories to Monads

Given a Lawvere theory (\mathcal{L},I), we would like to go in the other direction, and construct a corresponding finitary monad. This can be done by taking a coend:

\mathbb{T}(X) = \int^{n \in \aleph_0} \mathcal{L}(n, 1) \times X^{n}

If this is unfamiliar, the idea is that we can construct the required monad as a certain categorical colimit. Intuitively, we form the collection of equivalence classes of terms, identifying product structure where appropriate.
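
In Haskell, the coend can be rendered as an existential type, with the quotienting by reindexing of variables left implicit. For the theory of monoids, an n-ary operation up to the equations is simply a word in n variables, and the formula visibly recovers the list monad. This is a sketch under those conventions; all names are ours.

{-# LANGUAGE DataKinds, GADTs, KindSignatures, ExistentialQuantification #-}

data Nat = Z | S Nat

-- Fin n: the variables available to an n-ary operation
data Fin (n :: Nat) where
  FZ :: Fin ('S n)
  FS :: Fin n -> Fin ('S n)

-- ops n plays the role of the hom-set L(n, 1); the existentially
-- quantified n is the coend variable, and Fin n -> x encodes X^n
data T ops x = forall (n :: Nat). MkT (ops n) (Fin n -> x)

-- for the theory of monoids, an n-ary operation is a word in n variables
newtype MonOp (n :: Nat) = MonOp [Fin n]

-- one direction of the expected isomorphism T(X) ≅ [X] with the list monad
toWord :: T MonOp x -> [x]
toWord (MkT (MonOp w) env) = map env w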

Again, the categories of Eilenberg-Moore algebras and models of the Lawvere theory in \mathsf{Set} agree up to equivalence.

In fact, the connection between Lawvere theories and finitary monads is about as strong as we might hope. The category of Lawvere theories is equivalent to the category of finitary monads on \mathsf{Set}. The presence of the inclusion morphism I in the definition of a Lawvere theory ensures that this equivalence goes through correctly.

Conclusion

Lawvere theories give a presentation independent formulation of algebra. They have several strengths:

  1. They are relatively close in form to concepts from conventional universal algebra. In particular, they can be seen as an abstraction of the notion of clone.
  2. Unlike monads, a single object can describe models taken in different categories. For example, we can take models of the Lawvere theory of monoids in sets, posets, topological spaces and so on.
  3. Certain operations for combining theories are more natural from the perspective of Lawvere theories, rather than that of finitary monads.
  4. They are algebraic by construction. By comparison, monads include structures such as the continuation monad, which don’t have a clear interpretation in terms of universal algebra.

There are many generalisations of Lawvere theories, for example to incorporate operations of larger arities, or enrichments of the base category. These typically come with a correspondence to a natural class of monads, providing a different perspective to work with. The functorial semantics of Lawvere theories also suggests other structures, such as PROs and PROPs, which move us further away from conventional algebra to structures potentially having multiple outputs as well as inputs.

One intriguing question is what is the equivalent of Lawvere theories for comonads. At this point there does not seem to be a satisfactory answer, as highlighted in the paper “Category Theoretic Understandings of Universal Algebra and its dual: monads and Lawvere theories, comonads and ?”. This example shows how comonad theory cannot always be seen as routinely “flipping monad theory upside down”, and sometimes a routine application of duality isn’t enough.

Further reading: An excellent source for background is the book “Algebraic Theories” by Adamek, Rosicky and Vitale. Also useful for background is Hyland and Power’s “The Category Theoretic Understanding of Universal Algebra: Lawvere Theories and Monads”. Power has developed much of the theory generalising Lawvere theories in various ways, and applying them in computer science.

The Continuation Monad

In this post, we introduce the continuation monad. This monad is slightly less well-behaved than some of the previous examples we have encountered, and a source of counterexamples to naive conjectures. Along the way, we will touch on some previous topics, such as monad morphisms, Kleisli and Eilenberg-Moore categories, and commutative monads.

The Monad Construction

We begin by establishing a relatively simple, but important form of adjunction. We do this at a greater level of generality than some of our earlier posts, to highlight the fairly minimal requirements. This adjunction then yields our monad in the usual way.

For any symmetric monoidal closed category \mathcal{V}, the closure means that for every object A, there is an adjunction:

(-) \otimes A \dashv A \multimap (-)

For a fixed object R, the functor (-) \multimap R is contravariant. That is, we can view it as having type \mathcal{V}^{op} \rightarrow \mathcal{V} or \mathcal{V} \rightarrow \mathcal{V}^{op} as is convenient to us.

Using the definitions, closure adjunction and symmetry, we have a series of natural bijections between the following homsets, involving the contravariant functor (-) \multimap R:

  • Morphisms A \multimap R \rightarrow B in \mathcal{V}^{op}
  • Morphisms B \rightarrow A \multimap R in \mathcal{V}
  • Morphisms B \otimes A \rightarrow R in \mathcal{V}
  • Morphisms A \otimes B \rightarrow R in \mathcal{V}
  • Morphisms A \rightarrow B \multimap R in \mathcal{V}

These natural bijections establish that there is an adjunction:

(-) \multimap R : \mathcal{V} \rightarrow \mathcal{V}^{op} \dashv (-) \multimap R : \mathcal{V}^{op} \rightarrow \mathcal{V}

and therefore a monad on \mathcal{V}:

(((-) \multimap R) \multimap R, \eta, ((-) \multimap R) \circ \epsilon \circ ((-) \multimap R))

where \eta and \epsilon are the unit and counit of the adjunction. This monad is known as the continuation monad.

The Continuation Monad on Sets

To get a handle on what is going on, let’s consider the special case where \mathcal{V} is the Cartesian closed category of sets and functions. The unit of the adjunction is then:

\eta(a)(f) = f(a)

and the counit is the same map, but in \mathsf{Set}^{op}. The multiplication of the monad at A is:

\mu(f)(g : A \Rightarrow R) = f(\lambda h : (A \Rightarrow R) \Rightarrow R.\ h(g))

In the above, we’ve used lambda notation for inline functions, and added selected type annotations to emphasise how the input data is being used. Notice no copying or reuse of arguments takes place. This specific example lives in a Cartesian closed category, but the construction is inherently symmetric monoidal in nature, and therefore linear in its use of its arguments.
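
These are exactly the formulas behind Haskell’s continuation monad (Control.Monad.Cont in the mtl package, where it is defined via a transformer). A minimal self-contained version:

newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f (Cont m) = Cont (\k -> m (k . f))

instance Applicative (Cont r) where
  pure a = Cont (\k -> k a)                        -- the unit eta
  Cont mf <*> Cont ma = Cont (\k -> mf (\f -> ma (k . f)))

instance Monad (Cont r) where
  Cont m >>= f = Cont (\k -> m (\a -> runCont (f a) k))

Unfolding join m = m >>= id recovers precisely the multiplication \mu given above.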

Kleisli Structure

A Kleisli morphism of type A \rightarrow B is a morphism in the underlying category of type A \rightarrow (B \multimap R) \multimap R. In the case of \mathsf{Set}, this is a function:

f : A \rightarrow (B \Rightarrow R) \Rightarrow R

From a computer science perspective, we can think of an element k : B \Rightarrow R as a continuation. This is how a computation should continue once it has computed a value of type B, eventually yielding a result of type R. An element g \in (B \Rightarrow R) \Rightarrow R is a computation yielding a B, that requires a continuation as a parameter, telling it how to continue to yield final result of type R.
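
A tiny illustration (ours), reusing the Cont type sketched above: a Kleisli morphism that doubles its input and passes the result to its continuation.

double :: Int -> Cont r Int
double x = Cont (\k -> k (2 * x))

-- running a composite computation with the trivial continuation:
-- runCont (double 3 >>= double) id == 12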

Eilenberg-Moore Algebras

The Eilenberg-Moore algebras can be difficult to describe in the general case. We focus attention on a particular result. Consider the two element set 2 = \{ 0, 1 \}. The functor (-) \Rightarrow 2 : \mathsf{Set}^{op} \rightarrow \mathsf{Set} is the contravariant powerset functor. An element \xi \in X \Rightarrow 2 can be seen as a characteristic function, mapping elements appearing in a subset to 1, and the others to 0. As we already know, this functor has a left adjoint, and in fact the resulting adjunction is monadic, so there is an equivalence between the category of Eilenberg-Moore algebras and \mathsf{Set}^{op}. Therefore, in this case:

\mathsf{Set}^{((-) \Rightarrow 2) \Rightarrow 2} \simeq \mathsf{Set}^{op}

It is well known (in the right circles!) that \mathsf{Set}^{op} is equivalent to the category \mathsf{CABA} of complete atomic Boolean algebras and morphisms preserving all joins and meets.

Remark: Notice there is a pronounced difference here to considering complete Boolean algebras and their usual morphisms preserving the algebraic structure. In that case it is well known that the forgetful functor to \mathsf{Set} does not have a left adjoint, as the free complete Boolean algebra on countably many generators does not exist. This is an instructive counterexample to the naive intuition that these algebraic constructions always “just work out”. Once you move beyond collections of operations of bounded arity, you need to be much more careful.

A topos is a category rather similar to the category of sets and functions, and \mathsf{Set} is probably the simplest example. Toposes are Cartesian closed, and have an object \Omega known as a subobject classifier, which can be thought of as a collection of truth values. The functor (-) \Rightarrow \Omega is the analog of the contravariant powerset functor on \mathsf{Set}, and again is monadic, so for any topos \mathcal{T}, we have a wide generalisation of the equivalence above:

\mathcal{T}^{((-) \Rightarrow \Omega) \Rightarrow \Omega} \simeq \mathcal{T}^{op}

This result is important for establishing the finite cocompleteness of toposes, via their finite completeness and the theory of Eilenberg-Moore categories.

Monad Morphisms and Algebras

We now turn to an interesting property of the continuation monad, that can be thought of as a very general representation result.

Consider a term in the algebraic theory of monoids, for example:

x_1 \times (x_2 \times x_3)

where x_1, x_2, x_3 come from some set X of variables. How can we evaluate this term in a particular monoid, say (R,\times,1)?

Firstly, we need to know what values to use for each variable, which we could specify via a function X \rightarrow R. Secondly, we really only need to specify a value for each equivalence class of terms, that is, for the elements of \mathbb{L}(X), where \mathbb{L} is the free monoid (list) monad. So a crude first step is that we need a function:

\mathbb{L}(X) \times (X \Rightarrow R) \rightarrow R

Given what we have seen in this post, it seems natural to rearrange this to emphasise the terms are the input, into the form:

\mathbb{L}(X) \rightarrow (X \Rightarrow R) \Rightarrow R

Obviously, the way we evaluate terms should not depend on which set we use to name the variables, so we expect this actually to be a family of maps, natural in X.
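
For the list monad in Haskell, this evaluation map is (up to argument order) exactly the standard foldMap function:

import Data.Monoid (Sum(..))

-- foldMap :: Monoid r => (x -> r) -> [x] -> r, so that
-- flip foldMap :: Monoid r => [x] -> (x -> r) -> r

example :: Int
example = getSum (foldMap Sum [1, 2, 3])  -- evaluate in (Int, +, 0): result 6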

The algebraic structure also suggests that:

  1. The value for a term that is just a variable should be the value assigned to that variable.
  2. Evaluation should be compatible with substitution of terms.

These final two conditions correspond to the monad morphism axioms. So we are interested in monad morphisms of type

\mathbb{L} \rightarrow ((-) \Rightarrow R) \Rightarrow R

Obviously, there’s nothing special about our example of the list monad, so we’re really interested in monad morphisms of the form:

\mathbb{T} \rightarrow ((-) \Rightarrow R) \Rightarrow R

It turns out, there is a bijection between:

  1. Eilenberg-Moore algebras \alpha : \mathbb{T}(R) \rightarrow R.
  2. Monad morphisms \sigma : \mathbb{T} \rightarrow ((-) \Rightarrow R) \Rightarrow R.

For \mathsf{Set} monads, the monad morphism resulting from an Eilenberg-Moore algebra has action:

\sigma(t)(k) = \alpha(\mathbb{T}(k)(t))

In the other direction, given a monad morphism, we get an Eilenberg-Moore algebra with action:

\alpha(t) = \sigma(t)(\mathsf{id}_{R})
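
For Haskell readers, both directions of this bijection are one-liners; only Functor is needed to state them (the names are ours).

{-# LANGUAGE RankNTypes #-}

algebraToMorphism :: Functor t => (t r -> r) -> (forall a. t a -> (a -> r) -> r)
algebraToMorphism alg t k = alg (fmap k t)

morphismToAlgebra :: (forall a. t a -> (a -> r) -> r) -> (t r -> r)
morphismToAlgebra sigma t = sigma t id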

In fact, this result goes through in much greater generality, but discussing the technical details would get distracting. Interested readers can refer to Kock’s “On Double Dualization Monads”.

This correspondence between algebras and monad morphisms opens up new possibilities. To see this, as a preliminary step, say that two monad morphisms:

  1. \sigma_1 : \mathbb{T}_1 \rightarrow \mathbb{S}
  2. \sigma_2 : \mathbb{T}_2 \rightarrow \mathbb{S}

commute if the following equality holds:

\mathsf{dst} \circ (\sigma_1 \otimes \sigma_2) = \mathsf{dst}' \circ (\sigma_1 \otimes \sigma_2)

where \mathsf{dst} and \mathsf{dst}' are the two double strength morphisms. With this in place, we can say that two Eilenberg-Moore algebras \alpha_1 : \mathbb{T}_1(R) \rightarrow R and \alpha_2 : \mathbb{T}_2(R) \rightarrow R commute if their corresponding monad morphisms \mathbb{T}_i \rightarrow ((-) \multimap R) \multimap R commute as defined above. We then say an algebra is commutative if it commutes with itself.

This extends the notion of commutativity we have encountered previously in two directions:

  1. We consider commutativity for individual algebras, rather than for a monad as a whole.
  2. We can consider mixed forms of commutativity, for algebras of potentially different monads on the same base object.

We can also relate this form of commutativity with the previous definition. The following are equivalent for a monad \mathbb{T}.

  1. Every Eilenberg-Moore algebra is commutative.
  2. Every monad morphism with domain \mathbb{T} commutes with itself.
  3. The monad \mathbb{T} is commutative.

This theorem deepens our understanding of commutativity, allowing us to relate algebra-wise and monad-wise notions.

Conclusion

This post is relatively long. Despite this, we have only very briefly mentioned continuations, which are a fiddly topic in themselves. The continuation monad is an intriguing example of a monad, which can be hard to grasp at first sight. It is what is known as a monad without rank, which limits the scope of the algebraic techniques we have encountered previously.

Further reading: Johnstone’s “Stone Spaces” discusses the complete atomic Boolean algebra duality with \mathsf{Set}, and that the free complete Boolean algebra doesn’t exist. His “Sketches of an Elephant” gives a comprehensive account of topos theory, with the important monadicity theorem which we discussed appearing in the first volume.

Distributive Laws are Monads

Early on we claimed, with no explanation, that distributive laws are monads. As the title of this post suggests, we are now in a place to explain the details of what this rather intriguing statement means. This will require our previous experience of monads in a 2-category, and distributive laws. In particular, the ability to define monads within arbitrary 2-categories gives us an awful lot of additional expressive freedom. Our first step will be to introduce a 2-category suitable for formalising relationships between monads.

A 2-category of Monads

The 2-category that we will be interested in has monads as 0-cells, Eilenberg-Moore laws as 1-cells, and compatible natural transformations as 2-cells. We now spell out the details.

For an arbitrary 2-category \mathbf{K}, there is a 2-category \mathbf{Mnd}(\mathbf{K}) with:

  1. 0-cells: Tuples (\mathcal{C}, \mathbb{T}, \eta, \mu) consisting of a 0-cell \mathcal{C}, such that (\mathbb{T},\eta,\mu) is a monad on \mathcal{C} in \mathbf{K}.
  2. 1-cells: A 1-cell (\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) \rightarrow (\mathcal{D}, \mathbb{T}, \eta^{\mathbb{T}}, \mu^{\mathbb{T}}) is a pair (U,\varphi) consisting of a 1-cell U : \mathcal{C} \rightarrow \mathcal{D}, and a 2-cell \varphi : \mathbb{T} \circ U \Rightarrow U \circ \mathbb{S}, such that \varphi is an Eilenberg-Moore law.
  3. 2-cells: A 2-cell (U,\varphi) \Rightarrow (U',\varphi') is a 2-cell \sigma : U \Rightarrow U' such that \varphi' \cdot (\mathbb{T} \circ \sigma) = (\sigma \circ \mathbb{S}) \cdot \varphi.

Horizontal composition is given by

(V,\psi : \mathbb{T} \circ V \Rightarrow V \circ \mathbb{S}) \circ (U,\varphi : \mathbb{S} \circ U \Rightarrow U \circ \mathbb{R}) = (V \circ U, (V \circ \varphi) \cdot (\psi \circ U))

with

\mathsf{Id}_{(\mathcal{C},\mathbb{T},\eta,\mu)} = (\mathsf{Id}_{\mathcal{C}}, \mathsf{id}_{\mathbb{T}})

The vertical composition and identities for 2-cells in \mathbf{Mnd}(\mathbf{K}) are inherited from \mathbf{K}.

Monads in the 2-category of Monads

We now consider what a monad in \mathbf{Mnd}(\mathbf{K}) boils down to. As we will be working in a 2-category constructed from another 2-category, and with 3 different monads, we will be fussy in spelling out the details carefully.

A monad in \mathbf{Mnd}(\mathbf{K}) consists of four pieces of data:

  1. A 0-cell in \mathbf{Mnd}(\mathbf{K}). That is, some monad (\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) in \mathbf{K}.
  2. A 1-cell in \mathbf{Mnd}(\mathbf{K}) of type (\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) \rightarrow (\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}). That is, a 1-cell \mathbb{T} : \mathcal{C} \rightarrow \mathcal{C} in \mathbf{K}, and an Eilenberg-Moore law \lambda : \mathbb{S} \circ \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S}.
  3. A unit 2-cell of type \mathsf{Id}_{(\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}})} \Rightarrow (\mathbb{T}, \lambda) in \mathbf{Mnd}(\mathbf{K}). That is, a 2-cell \eta^{\mathbb{T}} : \mathsf{Id}_{\mathcal{C}} \Rightarrow \mathbb{T} in \mathbf{K} such that \lambda \cdot (\mathbb{S} \circ \eta^{\mathbb{T}}) = \eta^{\mathbb{T}} \circ \mathbb{S}.
  4. A multiplication 2-cell of type (\mathbb{T}, \lambda) \circ (\mathbb{T}, \lambda) \Rightarrow (\mathbb{T}, \lambda) in \mathbf{Mnd}(\mathbf{K}). That is, a 2-cell \mu^{\mathbb{T}} : \mathbb{T} \circ \mathbb{T} \Rightarrow \mathbb{T} in \mathbf{K} such that \lambda \cdot (\mathbb{S} \circ \mu^{\mathbb{T}}) = (\mu^{\mathbb{T}} \circ \mathbb{S}) \cdot (\mathbb{T} \circ \lambda) \cdot (\lambda \circ \mathbb{T}).

The unit and associativity axioms for a monad in \mathbf{Mnd}(\mathbf{K}) are equivalent to requiring that (\mathbb{T}, \eta^{\mathbb{T}}, \mu^{\mathbb{T}}) satisfies those axioms for a monad on \mathcal{C} in \mathbf{K}. Finally, the two equations satisfied by the unit and multiplication as 2-cells in \mathbf{Mnd}(\mathbf{K}) are equivalent to requiring that \lambda is a Kleisli-law. As we noted previously, being a distributive law is equivalent to simultaneously being an Eilenberg-Moore law and a Kleisli law. Therefore, a monad in \mathbf{Mnd}(\mathbf{K}) is equivalent to the following data:

  1. A 0-cell \mathcal{C} in \mathbf{K}.
  2. A pair of monads (\mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) and (\mathbb{T}, \eta^{\mathbb{T}}, \mu^{\mathbb{T}}) on \mathcal{C} in \mathbf{K}.
  3. A distributive law \mathbb{S} \circ \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S} in \mathbf{K}.

Roughly speaking, up to being more careful to specify all the required data, a distributive law in \mathbf{K} is the same thing as a monad in \mathbf{Mnd}(\mathbf{K}).

Example: For the “ordinary” monads discussed in the previous post, a distributive law is a monad in \mathbf{Mnd}(\mathbf{Cat}).

If we iterate the monad 2-category construction, a distributive law in \mathbf{K} is the same thing as a 0-cell in \mathbf{Mnd}(\mathbf{Mnd}(\mathbf{K})).

Conclusion

This relatively short post has unravelled an observation of Street, to show how a distributive law can be seen as a monad in a suitable 2-category. This both served as an illustration of the power of the notion of monad in a 2-category, and provided an excuse to introduce the 2-category \mathbf{Mnd}(\mathbf{K}). This 2-category is important more broadly in the theory of monads and distributive laws. In particular, by using the different dualities that apply in a 2-category, which we encountered when discussing comonads, we can incorporate comonads and Kleisli laws in the same setting. By iterating these constructions, we can consider different flavours of distributive laws such as monad-comonad or comonad-monad distributive laws. Hopefully we will get the chance to return to these topics later.

Further reading: The constructions outlined in this post were introduced in Street’s “The Formal Theory of Monads”, along with vastly more important ideas than we have had chance to discuss. The 2-category \mathbf{Mnd}(\mathbf{K}) is also discussed in a very readable way, with more motivation for computer scientists, in Power and Watanabe’s “Combining a Monad and a Comonad”, along with many other interesting developments.

Composition and Distributive Laws

A common theme in many scientific disciplines is the desire to build up complicated objects out of simpler parts. This compositional perspective allows us to work in a modular fashion, rather than addressing each new problem as a monolith that must be investigated from scratch.

With this in mind, given two monads \mathbb{T}, \mathbb{S}, it would be very convenient if the composite endofunctor \mathbb{T} \circ \mathbb{S} was also a monad in some canonical way. Unfortunately, this is not always the case. Although this is initially disappointing, the study of when such composites are monads is a rich and interesting subject.

Composing Endofunctors

We begin with some simple concrete examples, to get a sense of the various possibilities when we try to compose two monads as endofunctors. We will say that an endofunctor F can be given the structure of a monad, if there exists any \eta : \mathsf{Id} \Rightarrow F and \mu : F \circ F \Rightarrow F satisfying the unit and associativity axioms. We then note the following instructive examples:

  1. Trivially, for any monad \mathbb{S}, both the composites with the identity monad \mathsf{Id} \circ \mathbb{S} and \mathbb{S} \circ \mathsf{Id} carry the structure of a monad. Equally trivially, as there are endofunctors that carry multiple monad structures, the same will be possible for composite endofunctors.
  2. For any monad \mathbb{S}, the composite \mathbb{S}(- + E) with the exception monad carries the structure of a monad.
  3. The composition of the powerset monad with itself \mathbb{P} \circ \mathbb{P} does not carry the structure of a monad. This was shown by Klin and Salamanca “Iterated Covariant Powerset is not a Monad” in 2018.

This is a pretty mixed picture. There are monads such that composition with their endofunctor on one or both sides results in a monad every time, and at the other extreme, some monads can’t even be composed with themselves.

Distributive Laws

Our aim now is to try and deduce some sufficient conditions under which monads compose. We will consider two monads (\mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) and (\mathbb{T},\eta^{\mathbb{T}}, \mu^{\mathbb{T}}) on the same base category. Our notation is a bit fussier than normal, using superscripts on the units and multiplications to make it clear which monad is intended.

We would like to find a monad structure on \mathbb{T} \circ \mathbb{S}. There is an obvious choice for the unit:

\eta^{\mathbb{T}} \circ \eta^{\mathbb{S}} : \mathsf{Id} \Rightarrow \mathbb{T} \circ \mathbb{S}

If we try composing the two multiplications together in the same way, we get:

\mu^{\mathbb{T}} \circ \mu^{\mathbb{S}} : \mathbb{T} \circ \mathbb{T} \circ \mathbb{S} \circ \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S}

We now see a problem: we have a natural transformation of type \mathbb{T} \circ \mathbb{T} \circ \mathbb{S} \circ \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S}, not of the type \mathbb{T} \circ \mathbb{S} \circ \mathbb{T} \circ \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S} which we require. A natural attempt to address this problem would be to introduce some natural transformation \lambda : \mathbb{S} \circ \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S}, and form the composite:

(\mu^{\mathbb{T}} \circ \mu^{\mathbb{S}}) \cdot (\mathbb{T} \circ \lambda \circ \mathbb{S}) : \mathbb{T} \circ \mathbb{S} \circ \mathbb{T} \circ \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S}

Of course, we can’t just use any old natural transformation \lambda, it will need to satisfy some axioms in order for the resulting construction to be a monad. It is an interesting exercise to try and deduce these by attempting a proof, which I would highly recommend.

Sufficient conditions are two unit axioms:

\lambda \cdot (\eta^{\mathbb{S}} \circ \mathbb{T}) = \mathbb{T} \circ \eta^{\mathbb{S}} \qquad \lambda \cdot (\mathbb{S} \circ \eta^{\mathbb{T}}) = \eta^{\mathbb{T}} \circ \mathbb{S}

and two multiplication axioms:

\lambda \cdot (\mu^{\mathbb{S}} \circ \mathbb{T}) = (\mathbb{T} \circ \mu^{\mathbb{S}}) \cdot (\lambda \circ \mathbb{S}) \cdot (\mathbb{S} \circ \lambda) \qquad \lambda \cdot (\mathbb{S} \circ \mu^{\mathbb{T}}) =  (\mu^{\mathbb{T}} \circ \mathbb{S}) \cdot (\mathbb{T} \circ \lambda) \cdot (\lambda \circ \mathbb{T})

A natural transformation \lambda : \mathbb{S} \circ \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S} satisfying these four equations is known as a distributive law. This construction was first discovered by Jon Mock Beck, and they are sometimes referred to as Beck distributive laws for emphasis. Interestingly, the required axioms are exactly that \lambda is both a Kleisli law and an Eilenberg-Moore law, which we encountered previously.
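
In Haskell terms, a distributive law is a polymorphic swap function, and it is exactly what is needed to write down the composite multiplication from the discussion above. A sketch with our own names; note the four axioms are not enforced by the types.

{-# LANGUAGE RankNTypes #-}

import Control.Monad (join)

-- a distributive law lambda : s . t => t . s
type Dist s t = forall a. s (t a) -> t (s a)

-- the unit of the composite monad t . s
etaTS :: (Monad s, Monad t) => a -> t (s a)
etaTS = return . return

-- the multiplication: swap the inner s past the inner t, then join twice
muTS :: (Monad s, Monad t) => Dist s t -> t (s (t (s a))) -> t (s a)
muTS dist = fmap join . join . fmap dist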

Of course, we haven’t actually demonstrated that such natural transformations can be found, so it’s time for some examples.

Example: For the free monoid (list) monad \mathbb{L}, and the Abelian group monad \mathbb{A}, there is a distributive law:

\lambda : \mathbb{L} \circ \mathbb{A} \Rightarrow \mathbb{A} \circ \mathbb{L}

We can think of the elements of (\mathbb{L} \circ \mathbb{A})(X) as equivalence classes of products-of-sums, for example:

(x_1 + x_2) \times (x_3 + x_4)

If we repeatedly apply the usual distributivity axiom for multiplication over addition:

x \times (y + z) = x \times y + x \times z

our example term becomes a sum-of-products:

x_1 \times x_3 + x_1 \times x_4 + x_2 \times x_3 + x_2 \times x_4

which is an element of (\mathbb{A} \circ \mathbb{L})(X). This operation on terms is the idea behind the action of the distributive law, and also inspires the name distributive law. The resulting monad \mathbb{A} \circ \mathbb{L} is the free ring monad.

Example: Again for the list and Abelian group monads, there is no distributive law composing in the opposite order, of the type:

\mathbb{A} \circ \mathbb{L} \Rightarrow \mathbb{L} \circ \mathbb{A}

This gives us an example of a pair of monads with a distributive law in one order, but not the other.

Example: Similarly to the ring monad example, there is a distributive law between the list and powerset monads:

\mathbb{L} \circ \mathbb{P} \Rightarrow \mathbb{P} \circ \mathbb{L}

The action of this distributive law is:

[X_1,\ldots,X_n] \mapsto \{ [x_1,\ldots,x_n] \mid x_i \in X_i \}

As the powerset monad is the free complete join semilattice monad, and list is the free monoid monad, we can view the action of this distributive law algebraically as:

\bigvee X_1 \otimes \ldots \otimes \bigvee X_n = \bigvee \{ x_1 \otimes \ldots \otimes x_n \mid x_i \in X_i \}

Essentially, this distributive law comes from axioms of the form:

x \otimes \bigvee Y = \bigvee \{ x \otimes y \mid y \in Y \} \qquad \bigvee X \otimes y = \bigvee \{ x \otimes y \mid x \in X \}

The algebras of the resulting monad are known as quantales.
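
Approximating finite sets by lists, the action of this distributive law is Haskell’s sequence at the list monad, which forms all the tuples:

cartesian :: [[a]] -> [[a]]
cartesian = sequence

-- cartesian [[1,2],[3,4]] == [[1,3],[1,4],[2,3],[2,4]]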

Example: As a more computational example, consider the exception monad (-) + E for any object E, and an arbitrary monad \mathbb{T}. There is a distributive law:

\mathbb{T}(-) + E \Rightarrow \mathbb{T}(- + E)

Abstractly, this law has components:

\mathbb{T}(X) + E \xrightarrow{[\mathbb{T}(\kappa_1), \eta^{\mathbb{T}} \circ \kappa_2]} \mathbb{T}(X + E)

where \kappa_1 and \kappa_2 are the coproduct injections. For example, if we take the monad \mathbb{T} to be the list monad, the distributive law acts on the left component of the coproduct as:

(0,[x_1,\ldots,x_n]) \mapsto [(0,x_1),\ldots,(0,x_n)]

and on the right component:

(1,e) \mapsto [(1,e)]
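
In Haskell, taking Either e as the exception monad (note that the Haskell convention puts the exception in the left component, flipping the presentation above), the whole law is two lines:

distExc :: Monad t => Either e (t a) -> t (Either e a)
distExc (Left e)   = return (Left e)   -- the eta . kappa_2 component
distExc (Right ta) = fmap Right ta     -- the T(kappa_1) component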

The next example illustrates that distributive laws are only one aspect of the composition question for monads.

Example: As another negative result, there is no distributive law for the list monad over itself \mathbb{L} \circ \mathbb{L} \Rightarrow \mathbb{L} \circ \mathbb{L}, but \mathbb{L} \circ \mathbb{L} does carry the structure of a monad (as was pointed out by Bartek Klin). So the lack of a distributive law does not spell doom for finding any monad structure at all on the composite endofunctor.

Given the previous example, one might ask what are the advantages of knowing the composite monad \mathbb{T} \circ \mathbb{S} arises via a distributive law? The key point is that we then get sensible relationships between the composite and its component parts. There are monad morphisms \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S} and \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S}. These in turn induce functors between the corresponding Kleisli and Eilenberg-Moore categories, as we have seen previously. Furthermore, \mathbb{T} lifts to a monad on the Eilenberg-Moore category of \mathbb{S}, and \mathbb{S} lifts to a monad on the Kleisli category of \mathbb{T}.

Conclusion

We have only touched on the topic of distributive laws in this post. Hopefully it’s fairly obvious that these ideas will generalise to the 2-categorical and bicategorical definitions of monads we recently discussed. There are results giving sufficient conditions, either for when a distributive law does exist, or precluding that possibility. There is also a lot of open ground for new results in this area, showing that in some ways our understanding of monads is still rather naive.

For those interested in distributive laws, I would thoroughly recommend Jon Beck’s original 1969 paper, which is a very enjoyable read.

One final note of caution. Distributive laws are a notorious source of difficulty and errors in the literature. If you are working with them, or depending on the results of others, it is worth putting in extra effort to make sure you are on firm ground.

Comonads – Monads Upside Down

Over many previous posts, we have investigated various aspects of monads. To discuss monad theory properly, it is often useful to consider their interaction with the dual notion, comonads. Laying the foundations for such discussions is the aim of the current post. Although comonads can be seen as a special case of monads, that doesn’t really capture how they feel in practice. Our objectives for the current post are:

  1. To introduce several examples of comonads in the familiar setting of sets and functions.
  2. To describe some different ways in which comonads are “monads, but upside down”.

Although comonads are formally dual to monads, and therefore “just the same thing”, they are often seen as rather mysterious. A quick experiment with your favourite search engine will find a lot less discussion of comonads than monads, for example. One might be tempted to speculate as to why that is: historical mathematical bias, something fundamental about the nature of mathematics..? We shall avoid such philosophical musings, and concentrate on trying to build up some familiarity.

Introducing Comonads

A comonad on a category \mathcal{C} consists of:

  1. An endofunctor \mathbb{D} : \mathcal{C} \rightarrow \mathcal{C}.
  2. A counit natural transformation \epsilon : \mathbb{D} \Rightarrow \mathsf{Id}_{\mathcal{C}}.
  3. A comultiplication natural transformation \delta : \mathbb{D} \Rightarrow \mathbb{D} \circ \mathbb{D}.

This data must satisfy the following counitality and coassociativity equations:

(\epsilon \circ \mathbb{D}) \cdot \delta = \mathsf{Id}_{\mathbb{D}} = (\mathbb{D} \circ \epsilon) \cdot \delta \qquad (\mathbb{D} \circ \delta) \cdot \delta = (\delta \circ \mathbb{D}) \cdot \delta
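
In Haskell, this structure is captured by the Comonad class, essentially as in the comonad package (which also derives a convenient extend operation from this data):

class Functor d => Comonad d where
  extract   :: d a -> a          -- the counit epsilon
  duplicate :: d a -> d (d a)    -- the comultiplication delta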

The terms counit and comultiplication drop heavy hints that comonads are a dual notion to monads. We will return to this point shortly. First we plough on, with lots of examples.

Example: Lists are a standard example for monads. For comonads, the non-empty lists provide a similarly canonical example. Let \mathbb{L}^+ be the endofunctor that sends a set to its collection of non-empty lists. This functor is a comonad, with counit the function that returns the last element of the list, and comultiplication mapping a list to its list of prefixes, for example:

\delta([1,2,3]) = [[1],[1,2],[1,2,3]]

Unlike the list monad, we can bound the length of our lists to form another comonad \mathbb{L}^+_k of non-empty lists of length at most k \in \mathbb{N}.
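
Here is the structure just described on Data.List.NonEmpty, as standalone functions (note the comonad package equips NonEmpty with a different comonad structure, built from heads and tails; the helper names are ours):

import Data.List (inits)
import Data.List.NonEmpty (NonEmpty(..))
import qualified Data.List.NonEmpty as NE

extractNE :: NonEmpty a -> a
extractNE = NE.last

-- the non-empty list of non-empty prefixes, for example
-- duplicateNE (1 :| [2,3]) == (1 :| []) :| [1 :| [2], 1 :| [2,3]]
duplicateNE :: NonEmpty a -> NonEmpty (NonEmpty a)
duplicateNE = NE.fromList . map NE.fromList . tail . inits . NE.toList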

As with monads, it’s worth considering some instructive degenerate examples. The first one hopefully won’t be a big shock.

Example: For any category \mathcal{C}, the identity functor \mathsf{Id}_{\mathcal{C}} is a comonad, with the counit and comultiplication both being identity morphisms.

The previous example gives a trivial answer to a natural question: it is possible for an endofunctor to carry the structure of both a monad and a comonad. In fact, the endofunctor \mathbb{L}^+ also carries both structures, which is perhaps a bit more interesting.

The next example is the comonadic analog of the inconsistent monads, in that it provides a healthy source of counterexamples.

Example: For any category with an initial object, the constant functor mapping to the initial object is a comonad. The counit and comultiplication are the unique morphisms:

0 \xrightarrow{\epsilon_X} X \qquad 0 \xrightarrow{\delta_X} 0

We might think of this as the overspecified comonad.

The next few examples are comonadic relatives of the writer, reader and state monads we’ve encountered previously.

Example: For a fixed set E, the endofunctor (-) \times E is a comonad, with counit:

\epsilon((x,e)) = x

and comultiplication:

\delta((x,e)) = ((x,e),e)

Together these form a comonad, which we shall refer to as the reader comonad. Despite the name, this comonad is dual to the exception monad. Computationally, we can think of a map X \times E \rightarrow Y as a function from X to Y that can depend on some additional data it reads from the environment E.

Recall that for an adjunction L \dashv R, the composition R \circ L is a monad. Dually, the composite L \circ R is a comonad. The following is a standard example of this phenomenon.

Example: For a fixed object S, in a Cartesian closed category we have adjunction:

(-) \times S \dashv S \Rightarrow (-)

Therefore, the functor:

X \mapsto (S \Rightarrow X) \times S

is a comonad. Computationally, if we think of S as a collection of states, morphisms:

(S \Rightarrow X) \times S \rightarrow Y

bijectively correspond to morphisms:

(S \Rightarrow X) \rightarrow (S \Rightarrow Y)

which transform state-dependent values of type X into state-dependent values of type Y. This comonad is often referred to as the costate comonad.
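
In Haskell, this is the Store comonad, as in Control.Comonad.Store from the comonad package; a minimal sketch:

data Store s a = Store (s -> a) s

extractStore :: Store s a -> a
extractStore (Store f s) = f s

-- keep the current state, and record how the store looks from every state
duplicateStore :: Store s a -> Store s (Store s a)
duplicateStore (Store f s) = Store (Store f) s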

Example: Given a monoid (M, \times,1), the functor M \Rightarrow (-) is a comonad, with counit:

\epsilon(f) = f(1)

and comultiplication:

\delta(f)(m)(n) = f(m \times n)

This is sometimes referred to as the exponent comonad.

A special case of this construction is an important example in itself. Consider the exponent comonad on the monoid of additive natural numbers (\mathbb{N},+,0). We can think of an element of \mathbb{N} \Rightarrow X as an infinite list, or stream, of values from X:

[f(0),f(1),f(2),\ldots]

The counit returns the first element of the stream, and the comultiplication gives the stream of tails, as follows:

\delta([1,2,3,\ldots]) = [[1,2,3,\ldots],[2,3,4,\ldots],[3,4,5,\ldots],\ldots]

This construction is referred to as the stream comonad.
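
A minimal Haskell rendering of the stream comonad (a similar type is provided by the streams package; the names here are ours):

data Stream a = Cons a (Stream a)

extractStream :: Stream a -> a
extractStream (Cons x _) = x

-- the stream of successive tails, matching the example above
duplicateStream :: Stream a -> Stream (Stream a)
duplicateStream s@(Cons _ xs) = Cons s (duplicateStream xs)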

We now move onto some strategies to systematically construct simple comonads.

Example: Given a non-empty list, we can mark a focus position in it, written as follows:

[1,2,\underline{3},4]

There is then a natural counit operation, which returns the focal element:

\epsilon([1,2,\underline{3},4]) = 3

There is also a comultiplication operation, which acts as follows on our example list:

\delta([1,2,\underline{3},4]) = [[\underline{1},2,3,4],[1,\underline{2},3,4],\underline{[1,2,\underline{3},4]},[1,2,3,\underline{4}]]

The resulting comonad is the zipper comonad for non-empty lists.

There was nothing special about non-empty lists in the previous example.

Example: Given a pair of values, we can mark a focus position in it, for example:

(\underline{1},2)

As in the previous example, there is an obvious counit returning the focus:

\epsilon((\underline{1},2)) = 1

There is also a comultiplication, analogously to the previous example, with action:

\delta((\underline{1},2)) = (\underline{(\underline{1},2)},(1,\underline{2}))

The resulting comonad is the zipper comonad for pairs.

This pattern of collections of data with a focus can be repeated in great generality to yield many further examples of comonads.

Another example of a similar kind is the following:

Example: Again, we consider the endofunctor \mathbb{L}^+. We define our counit to return the last element of the list. We do something slightly more interesting with the comultiplication, which cycles through the list, as in the following example:

\delta([1,2,3]) = [[2,3,1],[3,1,2],[1,2,3]]

This forms what is known as the circular non-empty list comonad.

The previous example answers another natural question. It is possible for a given endofunctor to carry two different comonad structures. In fact, even if we fix the counit, this remains so.

The zipper comonads, the non-empty list comonad, the circular non-empty list comonad, and several other examples we’ve seen, are all built up in a similar manner:

  1. There are certain shapes of collections of data types, such as pairs, lists of length 1,2,3, and so on.
  2. Each shape has a collection of positions in which data is stored. For example, the first and second elements of a pair.
  3. There is a focus position. In the zippers this was given explicitly; in the non-empty list examples, it was the last element.
  4. There is rather general notion of “sub-shape” at a given position.

The first two aspects allow us to describe an endofunctor, mapping a set X to the collection of values of each shape of a given datatype, with data at each position taken from X. The focus position allows us to build a counit, and the notion of sub-shape is key to describing a comultiplication. Of course, we are only giving imprecise intuitions here; to make all this mathematically precise, we need to introduce more technical machinery. Specifically, we need the framework of (directed) containers, which we may return to in a later post.

It’s all just monads upside-down!

We now return to the question of how monads are related to comonads, which we shall look at from a few different perspectives.

Given a fixed category \mathcal{C}, we can consider the monads on the opposite category \mathcal{C}^{op}. These consist of:

  1. An endofunctor \mathbb{M} on \mathcal{C}^{op}, which is the same thing as an endofunctor \mathbb{D} on \mathcal{C}. The presence of the “op” doesn’t really make a great deal of difference here, unlike for the natural transformations.
  2. A unit natural transformation \eta : \mathsf{Id}_{\mathcal{C}^{op}} \Rightarrow \mathbb{M}. Due to the reversal of the direction of morphisms in the opposite category, this is the same thing as a natural transformation \epsilon : \mathbb{D} \Rightarrow \mathsf{Id}_{\mathcal{C}}.
  3. A multiplication natural transformation \mu : \mathbb{M} \circ \mathbb{M} \Rightarrow \mathbb{M}. This is the same thing as a natural transformation \delta : \mathbb{D} \Rightarrow \mathbb{D} \circ \mathbb{D}.

The monad axioms satisfied by \eta and \mu are the same thing as requiring the comonad equations for the corresponding \epsilon and \delta. That is:

A comonad on a category \mathcal{C} is the same thing as a monad on the opposite category \mathcal{C}^{op}.
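
In functional programming terms, this arrow-reversal is visible directly in the types. Here is a minimal sketch of a Haskell comonad class, obtained by dualising return and join; the comonad package provides essentially this interface as Control.Comonad:

```haskell
-- A comonad is a functor equipped with the duals of return and join.
class Functor w => Comonad w where
  extract   :: w a -> a            -- dual to return :: a -> m a
  duplicate :: w a -> w (w a)      -- dual to join   :: m (m a) -> m a
```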

This is all very nice, but in previous posts, we generalised monads to 2-categories and bicategories. The previous explanation doesn’t really help us understand comonads in terms of monads in that setting, as it is very specifically tied to monads on categories. We need to look at things in a slightly different way.

Given a 2-category or bicategory \mathbf{K}, there are a few different ways we can produce new bicategories, by flipping the directions of different cells.

  1. We can reverse the direction of the 1-cells, yielding a bicategory \mathbf{K}^{op}.
  2. We can reverse the direction of the 2-cells, yielding a bicategory \mathbf{K}^{co}.
  3. We can flip the directions of both the 1 and 2 cells, yielding a bicategory \mathbf{K}^{coop}.

The operation we are interested in is flipping the direction of 2-cells. It is fairly routine to check that:

A comonad on a 0-cell \mathcal{A} in a bicategory \mathbf{K} is the same thing as a monad on \mathcal{A} in \mathbf{K}^{co}.

Finally, we note that for a fixed 0-cell \mathcal{A} in \mathbf{K}, the category \mathbf{K}(\mathcal{A}, \mathcal{A}) is a monoidal category, with the monoidal structure given by composition and identities. A monad on \mathcal{A} is the same thing as a monoid in \mathbf{K}(\mathcal{A}, \mathcal{A}). Dually:

A comonad on a 0-cell \mathcal{A} in a bicategory \mathbf{K} is the same thing as a comonoid in the monoidal category \mathbf{K}(\mathcal{A},\mathcal{A}).

If you’ve not encountered comonoids before, you could equally take this as their definition via comonads.

These more esoteric definitions are useful for performing formal calculations at a high level of abstraction, but concrete examples are often important to develop satisfactory intuitions for applications.

Conclusion

We introduced comonads at a fairly rapid pace, building on our previous experience with monads. There is a lot more we could say about comonads, much of which would dualise the situation with monads. For example:

  1. There are corresponding Eilenberg-Moore and Kleisli constructions, with a relation to adjunctions. All of this works out in an entirely dual manner to that for monads.
  2. There is a notion of cofree comonad – again formally dual to that of free monad.

In each case it is worthwhile to examine concrete examples and constructions in familiar categories. There are also interesting questions about the relationships between monads and comonads, and the interaction between them. We will return to such topics in future posts.

Further reading: A good source of information about comonads from a mathematical or computer science perspective is the work of Tarmo Uustalu and Varmo Vene. I would suggest the paper “Comonadic Notions of Computation” as a good starting point.

Graded Monads

Now we have seen monads on 2-categories and bicategories, we are in a position to introduce some new ideas.

We have encountered various notions of monads so far, including:

  1. Ordinary monads on categories.
  2. Monads in a (strict) 2-category.
  3. Monads in a bicategory.

These are all special cases of monads in a bicategory. This time, we are going to see a genuine generalisation. To do so, we follow a well-worn categorical trail, and adopt a functorial perspective on a familiar object, the monads themselves. This will give us yet another way of describing monads, and a suitable starting point for generalisation. The resulting graded monads are a powerful tool for modelling resource usage in a monadic setting.

A 2-Functorial Perspective

In ordinary category theory, we often consider the points in a category as morphisms from the terminal object. For example, in \mathsf{Set}, the elements of a set X bijectively correspond to functions of type \{ \star \} \rightarrow X with domain the terminal object.

To think about the points of a 2-category, we need to introduce 2-functors. A strict 2-functor \mathbf{F} : \mathbf{A} \rightarrow \mathbf{B} consists of:

  1. A map on 0-cells, \mathcal{C} \;\mapsto\; \mathbf{F}(\mathcal{C}).
  2. A map on 1-cells, F : \mathcal{C} \rightarrow \mathcal{D} \;\mapsto\; \mathbf{F}(F) : \mathbf{F}(\mathcal{C}) \rightarrow \mathbf{F}(\mathcal{D}).
  3. A map on 2-cells, \alpha : F \Rightarrow G \;\mapsto\; \mathbf{F}(\alpha) : \mathbf{F}(F) \Rightarrow \mathbf{F}(G).

These maps preserve identities

\mathbf{F}(\mathsf{Id}_{\mathcal{C}}) = \mathsf{Id}_{\mathbf{F}(\mathcal{C})} \qquad \mathbf{F}(\mathsf{id}_F) = \mathsf{id}_{\mathbf{F}(F)}

and composition

\mathbf{F}(G \circ F) = \mathbf{F}(G) \circ \mathbf{F}(F) \qquad \mathbf{F}(\beta \cdot \alpha) = \mathbf{F}(\beta) \cdot \mathbf{F}(\alpha)

in the obvious way.

Put another way, a strict 2-category is simply a \mathsf{Cat}-enriched category, and strict 2-functors are then \mathsf{Cat}-enriched functors.

The terminal 2-category 1 is the 2-category with exactly one 0-cell, one 1-cell and one 2-cell. Explicitly, we have:

  1. A 0-cell \star.
  2. An identity 1-cell \mathsf{Id}_\star : \star \rightarrow \star.
  3. An identity 2-cell \mathsf{id}_{\mathsf{Id}_\star} : \mathsf{Id}_\star \Rightarrow \mathsf{Id}_\star.

With these definitions in place, we can consider the (strict) points of a 2-category \mathbf{C}. These are 2-functors:

\mathbf{P} : 1 \rightarrow \mathbf{C}

Such a functor can map \star to any 0-cell it chooses. As both the identity cells must be preserved, this is the only freedom in choosing a particular \mathbf{P}. Therefore we can identify strict points of a 2-category with its 0-cells. So far, so boring.

As we learned last time, it is mathematically natural to relax the strict setting to one where structure is only preserved up to isomorphism. To do so, we consider strong 2-functors (often called pseudofunctors or simply 2-functors). These have the same mappings as before, but now we have isomorphism 2-cells:

  • \mathsf{Id} \overset{\cong}{\Rightarrow} \mathbf{F}(\mathsf{Id}).
  • \mathbf{F}(G) \circ \mathbf{F}(F) \overset{\cong}{\Rightarrow} \mathbf{F}(G \circ F).

As usual, these isomorphisms must satisfy some coherence equations, ensuring they are suitably compatible with each other. We will skip spelling these out explicitly.

Now consider strong 2-functors \mathbf{P} : 1 \rightarrow \mathbf{C}. As the identity 2-cell must still be preserved, these consist of:

  1. A 0-cell \mathbf{P}(\star).
  2. A 1-cell \mathbf{P}(\mathsf{Id}).
  3. An isomorphism 2-cell \mathsf{Id} \overset{\cong}{\Rightarrow} \mathbf{P}(\mathsf{Id}).
  4. An isomorphism 2-cell \mathbf{P}(\mathsf{Id}) \circ \mathbf{P}(\mathsf{Id}) \overset{\cong}{\Rightarrow} \mathbf{P}(\mathsf{Id}).

Possibly this is starting to look slightly familiar. A good choice of notation will hopefully clarify things further. Let’s write

\mathcal{C} = \mathbf{P}(\star) \qquad \mathbb{T} = \mathbf{P}(\mathsf{Id})

and name the two invertible 2-cells \eta : \mathsf{Id} \overset{\cong}{\Rightarrow} \mathbb{T} and \mu : \mathbb{T} \circ \mathbb{T} \overset{\cong}{\Rightarrow} \mathbb{T}. With this terminology, specifying a strong 2-functor 1 \rightarrow \mathbf{C} consists of:

  1. A 0-cell \mathcal{C}.
  2. A 1-cell \mathbb{T}.
  3. An isomorphism 2-cell \eta : \mathsf{Id} \overset{\cong}{\Rightarrow} \mathbb{T}.
  4. An isomorphism 2-cell \mu : \mathbb{T} \circ \mathbb{T} \overset{\cong}{\Rightarrow} \mathbb{T}.

This looks suspiciously like the definition of a monad on the 0-cell \mathcal{C} in the 2-category \mathbf{C}. In fact, the coherence conditions for this strong 2-functor exactly require that \eta and \mu satisfy the unit and multiplication axioms of a monad.

There is one remaining oddity to address, that \eta and \mu are isomorphisms, which is too strong to capture most monads. We need to weaken the conditions on our 2-functors still further. Fortunately, another standard categorical notion fits the bill. A lax 2-functor has the same data as a strong 2-functor, but we drop the isomorphism requirement for the 2-cells:

  • \mathsf{Id} \Rightarrow \mathbf{F}(\mathsf{Id}).
  • \mathbf{F}(G) \circ \mathbf{F}(F) \Rightarrow \mathbf{F}(G \circ F).

Notice that now these 2-cells are not required to be invertible, so their direction matters. We could equally consider oplax 2-functors, in which these 2-cells go in the other direction. The lax setting is the right one for our needs. We can make the standard observation:

A monad is the same thing as a lax 2-functor from the terminal 2-category.

We have concentrated on strict 2-categories to avoid bookkeeping yet more isomorphisms, but this observation remains true if we move to the bicategorical setting.

Time to Generalise

The point of moving to a functorial perspective is that we have a natural launching point for generalisation. We identified monads in the 2-category \mathbf{C} with lax 2-functors 1 \rightarrow \mathbf{C}. The obvious next step is to consider domains other than the terminal 2-category.

As a first step, we adjust our perspective on the terminal 2-category slightly. Any monoid can be seen as a 1-object category, with morphisms corresponding to elements of the monoid, and composition and identities given by the monoid multiplication and unit respectively. Given any category \mathcal{C}, we can view it as a 2-category, with the 0-cells and 1-cells given by the objects and morphisms of \mathcal{C}, and the only 2-cells being the necessary identities. We can then build the terminal 2-category as follows:

  1. Consider the terminal monoid. This is the monoid with just a unit element.
  2. View this as an ordinary category.
  3. View the resulting category as a 2-category.

Obviously, we could perform these simple changes of perspective with any monoid (M, \times, 1). This yields the notion of graded monad. A graded monad on a 0-cell \mathcal{C}, graded by M, consists of:

  1. A family of 1-cells \mathbb{T}_m : \mathcal{C} \rightarrow \mathcal{C}, indexed by the elements of M.
  2. A unit 2-cell \eta : \mathsf{Id} \Rightarrow \mathbb{T}_1.
  3. Multiplication 2-cells \mu_{m,n} : \mathbb{T}_m \circ \mathbb{T}_n \Rightarrow \mathbb{T}_{m \times n}.

These satisfy some generalisations of the unit and multiplication axioms, compatible with the monoid indexing.
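
Concretely, in the strict setting, and writing \mathsf{id}_m for the identity 2-cell on \mathbb{T}_m, the unit axioms read:

\mu_{1,m} \cdot (\eta \circ \mathsf{id}_m) = \mathsf{id}_m \qquad \mu_{m,1} \cdot (\mathsf{id}_m \circ \eta) = \mathsf{id}_m

and the multiplication axiom becomes:

\mu_{m \times n, p} \cdot (\mu_{m,n} \circ \mathsf{id}_p) = \mu_{m, n \times p} \cdot (\mathsf{id}_m \circ \mu_{n,p})

These reduce to the usual monad axioms when M is the trivial monoid.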

Example: We can consider functors \mathbb{L}_n sending a set X to the set of lists of elements of X of length exactly n. There is an obvious mapping:

\eta_X : X \rightarrow \mathbb{L}_1(X)

sending each element x to the singleton list [x]. There are also multiplication morphisms:

\mu_{m,n,X} : \mathbb{L}_m(\mathbb{L}_n(X)) \rightarrow \mathbb{L}_{m \times n}(X)

given by concatenation. For example:

[[1,2],[3,4],[5,6]] \mapsto [1,2,3,4,5,6]

Together, this data constitutes a graded monad, graded by the natural numbers under multiplication (\mathbb{N}, \times,1). We shall refer to this as the exact list graded monad.
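
This example can be sketched in Haskell by indexing the list functor with a type-level natural number. The following is a minimal illustration, assuming Peano-style type-level naturals; all names are our own choices:

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies #-}

-- Type-level Peano naturals, standing in for the grading monoid (N, x, 1).
data Nat = Z | S Nat

type family Add (m :: Nat) (n :: Nat) :: Nat where
  Add 'Z     n = n
  Add ('S m) n = 'S (Add m n)

type family Mul (m :: Nat) (n :: Nat) :: Nat where
  Mul 'Z     n = 'Z
  Mul ('S m) n = Add n (Mul m n)

-- Vec n a: lists of exactly n elements, the functor L_n of the example.
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- The unit eta_X : X -> L_1(X), forming singleton lists.
eta :: a -> Vec ('S 'Z) a
eta x = VCons x VNil

-- Appending tracks lengths additively.
vappend :: Vec m a -> Vec n a -> Vec (Add m n) a
vappend VNil         ys = ys
vappend (VCons x xs) ys = VCons x (vappend xs ys)

-- The multiplication mu_{m,n,X} : L_m(L_n(X)) -> L_{m x n}(X),
-- given by concatenation.
mu :: Vec m (Vec n a) -> Vec (Mul m n) a
mu VNil           = VNil
mu (VCons xs xss) = vappend xs (mu xss)
```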

This example gives a general sense of the nature of graded monads, particularly in computer science. The grading allows us to track some quantitative data, allowing us to model resources appropriate to the effects in our application. We mention another, perhaps slightly surprising, degenerate example:

Example: Every 1-cell F : \mathcal{C} \rightarrow \mathcal{C} induces a (\mathbb{N},+,0) graded monad \mathbb{F} on \mathcal{C}, with

\mathbb{F}_n = \overbrace{F \circ \ldots \circ F}^{n \text{ times}}

So in particular, \mathbb{F}_0 = \mathsf{Id}. The unit and multiplications are identity 2-cells.

The exact list example is a bit inflexible, as list lengths are fixed completely. To loosen this up, we introduce a new mathematical object. A preordered monoid is a monoid (M,\times,1) with a preorder on M such that the monoid multiplication is monotone.

Example: The natural numbers under multiplication form a partially ordered monoid under the usual ordering on natural numbers, as:

m_1 \leq m_2 \;\wedge\; n_1 \leq n_2 \;\Rightarrow\; m_1 \times n_1 \leq m_2 \times n_2

Similarly to ordinary monoids, preordered monoids can be viewed as (thin) 2-categories, with a 2-cell between 1-cells m,n whenever m \leq n. We can then generalise further, to monads graded by a preordered monoid. These are the same as monoid graded monads, with the addition that if m \leq n there is a 2-cell \mathbb{T}_m \Rightarrow \mathbb{T}_n. As usual, these 2-cells must satisfy some compatibility equations, which we will skip over.

Example: There is a second family of list functors \mathbb{L}^{\leq}_n, with the elements of \mathbb{L}^{\leq}_n(X) consisting of lists of elements of X of length at most n. The unit and multiplication maps are given by singletons and concatenation, as in the previous example. If m \leq n, there is a natural transformation:

\mathbb{L}^{\leq}_m \Rightarrow \mathbb{L}^{\leq}_n

promoting lists of shorter length to the longer context. These constitute a (\mathbb{N}, \times, 1, \leq)-graded monad, the bounded list graded monad.

We can go further, and in the extreme consider monads graded by any 2-category we like.

Conclusion

Graded monads can be seen as a resource aware generalisation of monads, with natural applications in areas such as computer science. For example, there are non-trivial graded state, writer and maybe monads. There is a lot more to explore here, for example:

  1. Developing the mathematics further, for example suitable Kleisli and Eilenberg-Moore constructions.
  2. Discussing applications of graded monads.

As ordinary monads are already a rich topic, unsurprisingly this is more than we can hope to cover properly in a single post. We may return to this subject again later.

This post owes a great deal to the exposition in papers and talk slides by Tarmo Uustalu. I would recommend the slides from “Grading monads, comonads, distributive laws” as a good starting point for further reading.

Acknowledgements: Thanks to Ralph Sarkis for picking up a confusing typo in one of the examples.

Bicategories, monoids, internal and enriched categories

Last time, we introduced a more general definition of monads, in the setting of (strict) 2-categories. Unfortunately this strictness is often a bit too rigid to incorporate natural mathematical objects, and this is certainly true for monad theory. This time, we will push our definition of monads beyond this strict setting, into the wider world of bicategories.

Generalising to Bicategories

A bicategory \mathbf{C} consists of:

  1. A collection of 0-cells A,B,\ldots
  2. Between every pair of 0-cells A,B, a category \mathbf{C}(A,B). The objects of these categories are termed 1-cells, and the morphisms 2-cells. Composition of 2-cells is called vertical composition.
  3. For every triple of 0-cells A,B,C, a bifunctor \circ : \mathbf{C}(B,C) \times \mathbf{C}(A,B) \rightarrow \mathbf{C}(A,C), referred to as horizontal composition. For every 0-cell A, there is an identity 1-cell \mathsf{Id}_A in \mathbf{C}(A,A).

Horizontal composition is associative and unital up to isomorphism. That is, there are natural isomorphisms:

  1. A left unitor \lambda : \mathsf{Id} \circ (-) \Rightarrow (-).
  2. A right unitor \rho : (-) \circ \mathsf{Id} \Rightarrow (-).
  3. An associator \alpha : (-) \circ ((-) \circ (-)) \Rightarrow ((-) \circ (-)) \circ (-), between the two different orders in which we can apply horizontal composition twice.

As you might expect, this structure is subject to some coherence equations, entirely analogous to those for a monoidal category.

Example: A (strict) 2-category is a bicategory in which the unitors and associators are identities, so horizontal composition is unital and associative on the nose.

Example: A monoidal category is “the same thing” as a bicategory with one 0-cell. This is often phrased as “A monoidal category is a one object bicategory”.

Considering one direction in more detail, a bicategory \mathbf{C} with one 0-cell A consists of a single category \mathbf{C}(A,A). We also have:

  1. A horizontal composition bifunctor \mathbf{C}(A,A) \times \mathbf{C}(A,A) \rightarrow \mathbf{C}(A,A). We consider this to be the tensor product of our monoidal category.
  2. The identity 1-cell \mathsf{Id}_A will serve as the monoidal unit.

The coherence axioms for a bicategory in this simple case are exactly those of a monoidal category. Going in the other direction is similar.

We also introduce a couple of more complex bicategories that will be important in subsequent examples.

Example: A span of type X \rightarrow Y in a category \mathcal{C} is a pair of \mathcal{C}-morphisms of the form:

X \xleftarrow{f} A \xrightarrow{g} Y.

If \mathcal{C} has pullbacks, we can form the composite of a span X \xleftarrow{f} A \xrightarrow{g} Y with a span Y \xleftarrow{h} B \xrightarrow{k} Z as:

X \xleftarrow{f \circ \pi_1} A \times_{Y} B \xrightarrow{k \circ \pi_2} Z

where \pi_1, \pi_2 are the two projection maps given by forming the pullback of g along h.

Given two spans X \xleftarrow{f_1} A_1 \xrightarrow{g_1} Y and X \xleftarrow{f_2} A_2 \xrightarrow{g_2} Y a morphism of spans between them is a \mathcal{C}-morphism h : A_1 \rightarrow A_2 such that:

f_2 \circ h = f_1 \quad\text{and}\quad g_2 \circ h = g_1

In fact, for a category \mathcal{C} with pullbacks, there is a bicategory \mathbf{Span}(\mathcal{C}) with:

  1. 0-cells the objects of \mathcal{C}.
  2. \mathbf{Span}(\mathcal{C})(X,Y) consists of all spans of type X \rightarrow Y, and span morphisms between them.
  3. The universal property of pullbacks means composition of spans extends to a bifunctor, which we take to be our horizontal composition.

We are skimming over a size issue here. Readers with the necessary background may want to consider how such questions might creep into the definition above.

Example: Let \mathcal{V} be a monoidal category with coproducts. For sets A,B, we can consider \mathcal{V}-valued matrices of type A \rightarrow B to be functions:

M : A \times B \rightarrow \mathsf{obj}(\mathcal{V})

Given a pair of matrices M : A \rightarrow B and N : B \rightarrow C, we can form their composite as follows:

(N \circ M)(a,c) = \coprod_{b \in B} M(a,b) \otimes N(b,c)

Given a pair of matrices M_1, M_2 : A \rightarrow B, a morphism of matrices is a family of \mathcal{V} morphisms:

f_{a,b} : M_1(a,b) \rightarrow M_2(a,b)

This data can be combined into a bicategory \mathbf{Mat}(\mathcal{V}) with:

  1. 0-cells sets.
  2. The category \mathbf{Mat}(\mathcal{V})(A,B) consists of matrices of type A \rightarrow B and the morphisms between them.
  3. Horizontal composition is given by composition of matrices, as described above.
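
To get a concrete feel for the composition formula, here is a small decategorified Haskell sketch: if we replace the monoidal category \mathcal{V} by a numeric semiring, so that the coproduct becomes a sum and the tensor a product, composition of matrices is ordinary matrix multiplication. The names are our own choices:

```haskell
-- A v-valued matrix of type a -> b, represented as a function.
type Matrix v a b = a -> b -> v

-- Composition: (N . M)(a,c) = sum over b of M(a,b) * N(b,c).
-- We pass an explicit finite list of the elements of b to sum over.
composeMat :: Num v => [b] -> Matrix v b c -> Matrix v a b -> Matrix v a c
composeMat bs n m = \a c -> sum [ m a b * n b c | b <- bs ]

-- The identity matrix: 1 on the diagonal, 0 elsewhere.
identityMat :: (Eq a, Num v) => Matrix v a a
identityMat a1 a2 = if a1 == a2 then 1 else 0
```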

The two previous examples are typical for bicategories, in that horizontal composition involves either limits or colimits. In such cases, if we form iterated horizontal composites in different orders, the universal properties involved will ensure they agree up to isomorphism. Insisting that composites formed in different orders agree on the nose would lead to unrealistic requirements on our choice of (co)limits, so the bicategorical rather than strict 2-categorical setting is more natural.

Now we’ve seen a few different bicategories, it’s time to get back to our main motivation, monads. A monad in a bicategory \mathbf{C} consists of:

  1. A 0-cell \mathcal{C}.
  2. A 1-cell \mathbb{T} : \mathcal{C} \rightarrow \mathcal{C}.
  3. A unit 2-cell \eta: \mathsf{Id}_{\mathcal{C}} \Rightarrow \mathbb{T}.
  4. A multiplication 2-cell \mu: \mathbb{T} \circ \mathbb{T} \Rightarrow \mathbb{T}.

As before, our notation is suggestive of that for “ordinary” monads on categories. Notice the data above is no different to that which we encountered in the strict 2-categorical setting. Things get more interesting when we consider the coherence equations that need to hold, as now we must account for the fact that horizontal composition is not unital or associative on the nose, but only up to isomorphism. The required equations are:

  1. \mu \cdot (\eta \circ \mathsf{id}_{\mathbb{T}}) = \lambda.
  2. \mu \cdot (\mathsf{id}_{\mathbb{T}} \circ \eta) = \rho.
  3. \mu \cdot (\mu \circ \mathsf{id}_{\mathbb{T}}) \cdot \alpha = \mu \cdot (\mathsf{id}_{\mathbb{T}} \circ \mu).

Firstly, we see that this really does generalise our previous definition.

Example: As we would expect, a monad in a 2-category is a special case of a monad in a bicategory. The unitors and associators are identities, and the equations above collapse to those of the 2-categorical case.

We also find a fun simple source of monads that would not have made sense before:

Example: We can view any monoidal category \mathcal{V} as a one object bicategory. A monoid in \mathcal{V} is the same thing as a monad in this one object bicategory. For example, every group or monoid can be seen as a monad. There is a certain symmetry here:

  1. In a bicategory \mathbf{C}, every monad on \mathcal{A} is a monoid in the monoidal category \mathbf{C}(\mathcal{A},\mathcal{A}).
  2. A monoid in a monoidal category is a monad in the corresponding one object bicategory.

We now move on to some more interesting examples, which really highlight the benefits afforded by the more liberal bicategorical definition.

Example: A (small) category consists of a set of objects O and a set of morphisms M. There are two functions giving the source and target of a given morphism, forming a span:

O \xleftarrow{s} M \xrightarrow{t} O

The identities induce a morphism from the identity span to this span. If we form the pullback of s along t, the object M \times_O M consists of all composable pairs of morphisms, and there is a span morphism from the resulting span to the one defined above, given by the function that maps composable pairs to their composite.

Given a category \mathcal{C} with pullbacks, we can define an internal category in \mathcal{C} to consists of the analogous structure. A (small) category is then an internal category in \mathsf{Set}.

The key observation for our purposes is that an internal category in \mathcal{C} is the same thing as a monad in \mathbf{Span}(\mathcal{C}). The monad unit and multiplication correspond to the identities and composition respectively. The rest is “just” an exercise in unravelling definitions.

Example: For a monoidal category \mathcal{V} with coproducts, a \mathcal{V}-enriched category \mathcal{A} consists of:

  1. A set of objects O.
  2. For each pair of objects o_1, o_2 a hom-object in \mathcal{V}. We can view this as a \mathcal{V}-valued matrix A : O \rightarrow O.
  3. Morphisms j_o : \mathcal{I} \rightarrow A(o,o) picking out the identities. These form a morphism of matrices from the identity matrix to A.
  4. Composition morphisms m_{o_1, o_2, o_3} : A(o_1,o_2) \otimes A(o_2,o_3) \rightarrow A(o_1, o_3), which via the universal property of coproducts induce a morphism \coprod_{o_2 \in O} A(o_1, o_2) \otimes A(o_2, o_3) \rightarrow A(o_1,o_3). Together, these form a morphism of matrices of type A \circ A \rightarrow A.

A more careful unpacking of the definitions following these intuitions shows that a \mathcal{V}-category is the same thing as a monad in \mathbf{Mat}(\mathcal{V}).

As a special case of this construction, the Booleans can be considered as a thin two-object monoidal category \mathcal{B}. The bicategory \mathbf{Mat}(\mathcal{B}) is equivalent to the bicategory of relations \mathbf{Rel} we encountered last time. The Boolean enriched categories are then exactly preorders.

Conclusion

By considering a further generalisation of the notion of monad to the weak setting of bicategories, many new interesting examples become possible. The examples of internal and enriched categories are well-known, and provide the intuitions for other, more elaborate constructions.

A categorically minded reader should be asking themselves the following question. We have identified internal and enriched categories as monads in certain bicategories, what about the morphisms between them? Do internal and enriched functors correspond to an obvious notion of morphism between these monads? The answer to this question is a bit more subtle than one might hope. We may return to this point in a later post.

More background on bicategories, and the monads corresponding to internal and enriched categories can be found in Lack’s “A 2-Categories Companion”, which is highly recommended reading.

First steps into a larger world of monads

Until now, we have taken the definition of monad to be:

  • A category \mathcal{C}.
  • An endofunctor \mathbb{T} : \mathcal{C} \rightarrow \mathcal{C}.
  • A unit natural transformation \eta : \mathsf{Id} \Rightarrow \mathbb{T}.
  • A multiplication natural transformation \mu : \mathbb{T}^2 \Rightarrow \mathbb{T}.

satisfying some compatibility equations. In this post, we will take a first step in generalising this definition to include a much broader range of mathematical objects. Our original definition arises as a special case, and interesting new examples come into view.

Generalising to 2-categories

A 2-category is an abstraction of the fundamental mathematical structure of categories, functors and natural transformations. Note that some more modern accounts refer to these as strict 2-categories, reserving the term 2-category for a more general notion, although this terminology is still not in consistent use. We shall occasionally use the adjective strict for emphasis.

A (strict) 2-category has three key building blocks:

  1. 0-cells, which we shall denote \mathcal{C}, \mathcal{D}, \ldots.
  2. Between any pair of 0-cells \mathcal{C}, \mathcal{D}, there is a collection of 1-cells of type \mathcal{C} \rightarrow \mathcal{D}. We shall denote 1-cells as F,G,\ldots. For any 0-cell \mathcal{C}, there is a distinguished identity 1-cell \mathsf{Id}_{\mathcal{C}} : \mathcal{C} \rightarrow \mathcal{C}.
  3. Between any pair of 1-cells F, G : \mathcal{C} \rightarrow \mathcal{D} of the same type, there is a collection of 2-cells of type F \Rightarrow G. We shall denote 2-cells as \alpha, \beta, \ldots. For any 1-cell F, there is a distinguished identity 2-cell \mathsf{id}_{F} : F \Rightarrow F.

These components can be composed in various ways:

  • For any pair of 1-cells F : \mathcal{A} \rightarrow \mathcal{B} and G : \mathcal{B} \rightarrow \mathcal{C}, there is a composite 1-cell G \circ F : \mathcal{A} \rightarrow \mathcal{C}.
  • For 1-cells F,G : \mathcal{A} \rightarrow \mathcal{B} and H, K : \mathcal{B} \rightarrow \mathcal{C}, and 2-cells \alpha : F \Rightarrow G and \beta : H \Rightarrow K, there is a horizontal composite 2-cell \beta \circ \alpha : H \circ F \Rightarrow K \circ G.
  • For any triple of 1-cells F,G,H : \mathcal{A} \rightarrow \mathcal{B}, and 2-cells \alpha : F \Rightarrow G and \beta : G \Rightarrow H, there is a vertical composite 2-cell \beta \cdot \alpha : F \Rightarrow H.

These composition operations satisfy various unitality, associativity and other compatibility equations, such that the collections of 1-cells, and the 2-cells between them form categories, and horizontal composition is a bifunctor. The details are readily available elsewhere, so I shall avoid spelling out the full formal definition.

Example: Categories, functors and natural transformations form a 2-category \mathbf{Cat}, and in fact this is the motivating example. Strictly speaking, we should say small categories, functors and natural transformations to avoid Russell’s paradox style problems. As usual, we shall not dwell on such size issues.

Example: There is a 2-category \mathbf{Rel} with:

  1. 0-cells sets.
  2. 1-cells of type A \rightarrow B relations R \subseteq A \times B.
  3. There is a 2-cell R \Rightarrow S exactly when R \subseteq S.

The identity 1-cells are the usual notion of identity relation:

\mathsf{Id}(a_1,a_2) \Leftrightarrow (a_1 = a_2)

This sort of 2-category is termed thin as there is at most one 2-cell between a pair of 1-cells. That is, the collections of 1-cells, and 2-cells between them form preorders.

With the definition of 2-category in place, the generalisation of the notion of monad is pretty routine. We simply replace category, functor and natural transformation with 0-cell, 1-cell and 2-cell in our original definition.

A monad in a 2-category consists of:

  1. A 0-cell \mathcal{C}
  2. A 1-cell \mathbb{T} : \mathcal{C} \rightarrow \mathcal{C}.
  3. A unit 2-cell \eta : \mathsf{Id} \Rightarrow \mathbb{T}.
  4. A multiplication 2-cell \mu : \mathbb{T}^2 \Rightarrow \mathbb{T}.

This data must satisfy the following three equations:

\mu \cdot (\eta \circ \mathsf{id}_{\mathbb{T}}) = \mathsf{id}_{\mathbb{T}} \qquad \mu \cdot (\mathsf{id}_{\mathbb{T}} \circ \eta) = \mathsf{id}_{\mathbb{T}} \qquad \mu \cdot (\mu \circ \mathsf{id}_{\mathbb{T}}) = \mu \cdot (\mathsf{id}_{\mathbb{T}} \circ \mu)

We recover our original notion of monad as a special case.

Example: Monads in the 2-category \mathbf{Cat} are exactly our original notion of monads.

We can also now talk about further examples that would not even have made sense before.

Example: A monad on a set A in the 2-category \mathbf{Rel} is a relation R : A \rightarrow A such that:

  1. \mathsf{Id} \subseteq R.
  2. R \circ R \subseteq R.

Unpacking this into a pointwise definition, this is a relation R such that:

  1. For all a \in A, R(a,a).
  2. If R(a_1,a_2) and R(a_2,a_3) then R(a_1,a_3).

That is, a monad on A in \mathbf{Rel} is a preorder on A.
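
Unpacking this in code may help. Below is a minimal Haskell sketch, representing a relation on a finite set as a Boolean-valued function and checking the two 2-cell conditions, which amount to reflexivity and transitivity. All names are our own choices:

```haskell
-- A relation on a, represented as a Boolean-valued function.
type Rel a = a -> a -> Bool

-- Relational composition: (S . R)(a1,a3) holds if some a2 links them.
-- We pass an explicit finite list of the elements of the carrier.
compose :: [a] -> Rel a -> Rel a -> Rel a
compose as s r = \a1 a3 -> or [ r a1 a2 && s a2 a3 | a2 <- as ]

-- A monad structure on r requires Id <= r and r . r <= r, i.e.
-- reflexivity and transitivity: r must be a preorder.
isMonadInRel :: [a] -> Rel a -> Bool
isMonadInRel as r =
  and [ r a a | a <- as ] &&
  and [ not (compose as r r a1 a3) || r a1 a3 | a1 <- as, a3 <- as ]
```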

A variation on the previous example yields a classical observation of Lawvere, in a monadic guise.

Example: Let \mathbb{R}^{\infty} denote the non-negative real numbers, extended with a maximal element \infty. We can extend addition and multiplication to \mathbb{R}^{\infty} by setting:

x + \infty = \infty = \infty + x \qquad\text{and}\qquad x \times \infty = \infty =\infty \times x

There is a 2-category \mathbf{Rel}_{\mathcal{L}} with:

  1. 0-cells sets.
  2. 1-cells of type A \rightarrow B are functions R : A \times B \rightarrow \mathbb{R}^{\infty}. We think of these as positive real valued relations, with R(a,b) giving a distance between a and b.
  3. There is a 2-cell R \Rightarrow S if for all a,b, R(a,b) \geq S(a,b). Note the direction of ordering here.

As this 2-category is thin, it only remains to specify the identity and composition of 1-cells. The identity 1-cells generalise the identity relations we saw in the earlier example:

\mathsf{Id}(a_1,a_2) = \begin{cases} 0, &\text{if } a_1 = a_2 \\ \infty, &\text{otherwise} \end{cases}

In terms of distances, an element a is zero distance from itself, and infinitely far away from everything else.

For the composition of R : A \rightarrow B and S : B \rightarrow C, again we generalise composition of relations from the previous example:

(S \circ R)(a,c) = \bigwedge_{b \in B} R(a,b) + S(b,c)

In terms of our distance based intuition, if R gives distances between A and B, and S gives distances between B and C, then (S \circ R)(a,c) gives the shortest distance between a and c travelling in two steps:

  1. Firstly moving from a to some element b \in B using the distances specified by R.
  2. Secondly, moving from the chosen element b \in B to c using the distances specified by S.

Now we have specified our setting, we can consider what the monads are. A monad on a set A in \mathbf{Rel}_{\mathcal{L}} consists of a function D : A \times A \rightarrow \mathbb{R}^{\infty} such that there are 2-cells:

  1. \mathsf{Id} \Rightarrow D
  2. D \circ D \Rightarrow D

It will help to expand these to pointwise definitions, and make the ordering between elements explicit. We find that:

  1. D(a,a) = 0
  2. D(a_1,a_2) \leq \bigwedge_{a \in A} D(a_1, a) + D(a, a_2). This is equivalent to requiring for all a_1, a_2, a that the triangle inequality D(a_1, a_2) \leq D(a_1,a) + D(a, a_2) holds.

So the monads in \mathbf{Rel}_{\mathcal{L}} are generalised metric spaces in which we permit the following extra flexibility:

  1. Non equal points may be at zero distance from each other.
  2. Distances are not necessarily symmetric, that is we may have D(a,b) \neq D(b,a).
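
The same unpacking works here, with Booleans replaced by distances. A minimal Haskell sketch, using Double with 1/0 as \infty, checking the two 2-cell conditions over a finite carrier; the names are our own choices:

```haskell
-- An extended-real-valued relation: a distance between pairs of points.
type Dist a b = a -> b -> Double

infinity :: Double
infinity = 1 / 0

-- Identity: zero distance to yourself, infinite distance to anything else.
identityDist :: Eq a => Dist a a
identityDist a1 a2 = if a1 == a2 then 0 else infinity

-- Composition: shortest two-step distance through an intermediate point.
composeDist :: [b] -> Dist b c -> Dist a b -> Dist a c
composeDist bs s r = \a c -> minimum [ r a b + s b c | b <- bs ]

-- The monad conditions Id => D and D . D => D, remembering that a
-- 2-cell is a pointwise >=, give D(a,a) = 0 (assuming distances are
-- non-negative) and the triangle inequality.
isGeneralisedMetric :: [a] -> Dist a a -> Bool
isGeneralisedMetric as d =
  and [ d a a == 0 | a <- as ] &&
  and [ composeDist as d d a1 a2 >= d a1 a2 | a1 <- as, a2 <- as ]
```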

The 2-category of the previous example was built by considering relations over generalised truth values. Varying this idea yields many other interesting examples.

Our final example is a natural generalisation of \mathbf{Cat}.

Example: We encountered enriched categories briefly when discussing strong monads. For a monoidal category \mathcal{V}, there are \mathcal{V}-categories, \mathcal{V}-functors and \mathcal{V}-natural transformations. These form a 2-category \mathcal{V}-\mathbf{Cat}, and the monads in this 2-category yield exactly the definition of \mathcal{V}-monad that is considered in enriched category theory. This sort of example is encouraging evidence for the usefulness of the abstract definition of monad, in that it recovers mathematical constructions considered in settings beyond ordinary category theory.

Conclusion

We have taken a first step in generalising the definition of monad to the setting of (strict) 2-categories. This allowed us to discover some interesting new examples, such as preorders and generalised metric spaces being monads. Unfortunately, some natural examples we would like to cover are still out of reach, as the strict 2-categorical setting is too rigid. To get our hands on many further examples, we shall take a second step, and generalise the definition of monad to bicategories in a later post.

Algebra, Order and Monads

In the previous post, we introduced preorders, posets and monotone maps between them. We then investigated a small number of monads on these categories of ordered structures. The additional order relations enable interesting new structure on \mathsf{Pos}-monads that we hadn’t encountered before with our main example of \mathsf{Set}-monads.

In this post, we will sketch how the algebraic perspective that is so instructive with \mathsf{Set} monads can be generalised to monads on \mathsf{Pos}. Our discussions will be relatively informal, aiming to highlight the additional aspects of algebra in the land of ordered structures that are of interest. The details involve some fairly technical ideas, which are beyond the scope of what we have covered so far. Interested readers will find some pointers at the end of the post.

Relatives of the Maybe Monad

In the previous post, we introduced the “obvious” generalisation of the maybe monad from sets to posets, with functor component:

X \mapsto X + 1

To do so, we mimicked the construction used in \mathsf{Set} in terms of binary coproducts and terminal objects. Our current aim is to try and understand such constructions from an algebraic perspective, whatever that means now we’ve moved from sets to posets.

To try and get an algebraic handle on \mathsf{Pos}-monads, we are going to assume we need a collection of operations \Sigma, and a collection of equations E, as was the case for ordinary algebra.

In fact, for the maybe monad, this is already sufficient, so we don’t need anything new. We assume \Sigma consists of a single new constant symbol \star, and we require no equations. As we only have a constant symbol, the only terms we can form over X are either:

  1. Variables from X.
  2. The constant symbol \star.

As there are no equations, we don’t need to worry about forming equivalence classes. If we assume variables inherit an ordering from X, and no other non-trivial order relations hold, the resulting poset is isomorphic to

X + 1

which is exactly what we hoped for. So far, so easy. A small additional step will take us into less familiar territory. Rather than generalising the maybe monad, instead we can consider the exception monad induced by some poset E. This will have functor action:

X \mapsto X + E

Recall that the coproduct of posets inherits the order from the two components, and no other non-trivial order relations hold.

Algebraically, the \mathsf{Set} exception monad corresponded to adding a set E of new constant symbols. If we do this in \mathsf{Pos}, and form terms over X as we did above for the maybe monad, we will get a poset with an underlying set consisting of:

  1. Variables from X.
  2. Constant symbols e \in E.

The only non-trivial order relations will be those between the variables inherited from X. This isn’t quite general enough for what we want, as we want to allow E to be a poset, not just a mere set. It is reasonably clear we cannot achieve the desired effect, even if we add in some equations. So we shall need the first new feature of algebra on posets:

Operation symbols of the same arity form a poset.

If we were to provide some formal syntax, in our example, we are allowing ourselves to introduce relations

e_1 \leq e_2

between the constant symbols representing exceptions. More generally, if we have two operation symbols o_1, o_2 of the same arity, and o_1 \leq o_2, then we have the following axiom between terms:

o_1(x_1,\ldots,x_n) \leq o_2(x_1,\ldots,x_n)

Last time, we introduced another relative of the maybe monad, the lift monad. Here, we map a poset X to a poset with a new element \star added below all of the elements of X. It doesn’t seem we can achieve the desired effect by ordering our operation symbols, as we need to add an inequation of the form:

\star \leq x

between an operation symbol and a bare variable. It is tempting to conclude that for algebra over posets, we need to permit inequations rather than mere equations between terms. In fact, this seems mathematically very natural, but let’s not be too hasty.

We can’t achieve the desired effect directly by ordering our operation symbols, as we currently only have the constant symbol \star. Nobody said we can’t add additional operation symbols, so let’s be a bit more creative, and add two unary symbols l (for left hand side) and r (for right hand side), and require that l \leq r. Finally, we require the equations:

l(x) = \star and r(x) = x

Then we can conclude that:

\star = l(x) \leq r(x) = x

If we form equivalence classes of terms, ordering them according to these axioms, the resulting construction yields the lift monad. The upshot of this procedure is that

We can use inequations between terms in poset algebra, but this is achieved using ordering between operation symbols, and equations between terms.

So from this perspective, poset algebra is still fundamentally about equations, not inequations, which is possibly a bit of a surprise.
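
To illustrate the construction we have been discussing, here is a minimal Haskell sketch of the lift monad on posets, representing a poset as a type together with an order predicate; the names are our own choices:

```haskell
-- A poset, represented as a type together with its order relation.
type Leq a = a -> a -> Bool

-- The lift construction: add a new element Bottom below everything.
data Lift a = Bottom | Lift a

-- The order on Lift a: Bottom <= everything, and otherwise the
-- order is inherited from a.
liftLeq :: Leq a -> Leq (Lift a)
liftLeq _   Bottom   _        = True
liftLeq _   _        Bottom   = False
liftLeq leq (Lift x) (Lift y) = leq x y

-- The unit of the monad embeds X into Lift X.
unit :: a -> Lift a
unit = Lift

-- The multiplication collapses two layers of lifting.
join :: Lift (Lift a) -> Lift a
join Bottom    = Bottom
join (Lift mx) = mx
```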

Relatives of the List Monad

Previously, in the case of \mathsf{Set}-monads, we have seen that the list monad is the free monoid monad. That is, we have a constant symbol 1, and a binary operation symbol \times, such that the following three equations hold:

1 \times x = x = x \times 1 and (x \times y) \times z = x \times (y \times z).

We don’t need to consider ordering between our operation symbols, as they are of different arities. We can form equivalence classes of terms, as we would for the set theoretic construction, which correspond to the lists of variables symbols that appear in the terms. The key question is what non-trivial order relations should we add between terms. As we’ve seen before, for terms over poset X, we should always inherit the order relations between variables. Now we have a non-constant operation symbol \times, we have two obvious choices about how this operation interacts with the order structure of its arguments:

  1. Consider the multiplication operation \times to be monotone.
  2. Consider the multiplication operation \times to be an arbitrary function.

If we take the first option, the resulting monad is the list monad, with lists of the same length ordered pointwise. This is the monad \mathbb{L}_2 of the previous post. If on the other hand we take the less restrictive route of the second option, we get a different list monad, in which only the singleton lists are ordered with respect to each other if their contained elements are. This is the monad we called \mathbb{L}_1 in the previous post. These observations apply in full generality, and lead us to our second observation about poset algebra:

We have the right, but not the obligation, to require that every operation is monotone.

So in fact, there are two different notions of poset algebra we can consider, depending on this choice. It is perhaps slightly surprising at first glance that allowing non-monotone operations is a viable option. One choice corresponds to \mathsf{Pos}-enriched monad theory, and the other to \mathsf{Set}-enriched monad theory, but pursuing that distinction would take us further into technically challenging material than is appropriate with what we know so far.

Obviously, we can impose other relations such as:

x \leq x \times y and x \times y \leq y

using the ability to encode inequations as axioms discussed above. These additional inequations yield two more of the list monad variants from the previous post. We also note that in general, our ability to encode inequations lets us require that some, but not all, of our operations are monotone.

Finally, we need to consider the ordered list monads that we discussed last time. Recall that these were monads in which we only allow rising lists of the form:

[x_1,\ldots,x_n]

where each consecutive pair of elements are ordered, that is x_i \leq x_{i + 1}.

These monads seem to involve some sort of phenomenon that we’ve not encountered so far. It doesn’t seem possible to encode such a monad using the features we’ve introduced previously. We need something new. This leads us to possibly the most exotic feature of poset algebra.

Intuitively, the problem we have is that if we consider a binary operation \times, there’s no restriction on the arguments we can apply it to, for example we can form

x \times y

where x is strictly greater than y. This is because for ordinary set algebra, an operation is a map:

\overbrace{X \times \ldots \times X}^{n\rm\ times} \rightarrow X

and we have so far continued with this approach to operation arities in the order theoretic context. To generalise things, we observe that if we write [n] for the set \{ 1, \ldots, n \}, this is equivalent to a function:

X^{[n]} \rightarrow X

where X^{[n]} is the exponential, the set of all functions from [n] to X. In fact, such an operation is equivalent to one of the form:

X^A \rightarrow X

where A is any n-element set. So operations in \mathsf{Set} can be seen as having finite sets as arities. If we generalise this point of view to posets, an operation should be a map:

X^A \rightarrow X

where A is a finite poset. This gives us more freedom than before, as finite posets can specify non-trivial ordering between arguments, as the exponential X^A contains only monotone maps. For our list based construction, we can require a \times operation with arity the two element poset \{ x \leq y \}. This is an operation that maps ordered pairs to a new element.

In terms of syntax, this means we should only be able to form the term x \times y if we have established that x \leq y.
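
In code, this constraint shows up as a partial constructor. A minimal Haskell sketch, using Maybe to refuse arguments that violate the arity poset \{ x \leq y \}; all names are our own choices:

```haskell
-- An operation whose arity is the two-element poset {x <= y}: it can
-- only be applied when its arguments are suitably ordered.
opTimes :: Ord a => a -> a -> Maybe (a, a)
opTimes x y
  | x <= y    = Just (x, y)
  | otherwise = Nothing

-- Extending a rising list: only allowed when the new element sits
-- below the current head, keeping the whole list rising.
consRising :: Ord a => a -> [a] -> Maybe [a]
consRising x []         = Just [x]
consRising x ys@(y : _)
  | x <= y    = Just (x : ys)
  | otherwise = Nothing
```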

Combining these general arities with some of the previous tricks allows us to specify monads that deal with ordered lists. This leads us to our third and final observation about poset algebra:

Operations have arities which are finite posets.

This is a feature we might not have anticipated in advance, but in fact it does fall out of some (rather technical) general categorical arguments, involving locally finitely presentable categories.

Conclusion

We have sketched some features we might encounter when dealing with monads and algebra in the context of posets. We encountered three new features beyond what we see in the setting of ordinary universal algebra.

  1. Operation symbols of the same arity can be ordered with respect to each other.
  2. We have the option to require that all our operations are monotone, but this is not compulsory.
  3. The arities of operations that we can consider are not mere numbers. Instead they are specified by finite posets imposing constraints on how the arguments are ordered.

We also discovered we didn’t need to impose inequations between terms as a new feature, as these can be simulated using the features above.

These generalisations follow a pattern that continues for appropriate notions of algebra in other categorical settings. A good starting point for further background is Robinson’s “Variations on Algebra”, which is written in an enjoyable expository style, whilst being much more precise on the details than this post. Important technical background can be found in Kelly and Power’s “Adjunctions whose counits are coequalizers, and presentations of finitary enriched monads”. A lot more detail about the right notion of algebra in these generalised settings can be found in the many papers on Lawvere theories by Power and co-authors. The papers “Presentations and algebraic colimits of enriched monads for a subcategory of arities” and “Diagrammatic presentations of enriched monads and varieties for a subcategory of arities” by Lucyshyn-Wright and Parker may also be of interest.

Although there is a great deal of literature about generalised algebra, particularly at the level of Lawvere theories, there is a bit of a gap when it comes to concrete syntax. Developing precise syntax and proof systems for these generalised equational logics in specific cases of interest is rarely worked out in detail, particularly when generalised arities are involved. This is a shame, as concrete equational presentations and equational logic are tremendously useful in the set theoretic setting. Detailed meta-theoretic results for such systems, at a high level of abstraction, can be found in the work of Fiore and co-authors, but applying these methods to a specific setting is still challenging.

Acknowledgements: Thanks to Jason Parker for pointing out the two Lucyshyn-Wright and Parker papers as interesting references.