Blog

The Continuation Monad

In this post, we introduce the continuation monad. This monad is slightly less well-behaved than some of the previous examples we have encountered, and is a source of counterexamples to naive conjectures. Along the way, we will touch on some previous topics, such as monad morphisms, Kleisli and Eilenberg-Moore categories, and commutative monads.

The Monad Construction

We begin by establishing a relatively simple, but important form of adjunction. We do this at a greater level of generality than some of our earlier posts, to highlight the fairly minimal requirements. This adjunction then yields our monad in the usual way.

For any symmetric monoidal closed category \mathcal{V}, closure means that for every object A, there is an adjunction:

(-) \otimes A \dashv A \multimap (-)

For a fixed object R, the functor (-) \multimap R is contravariant. That is, we can view it as having type \mathcal{V}^{op} \rightarrow \mathcal{V} or \mathcal{V} \rightarrow \mathcal{V}^{op} as is convenient to us.

Using the definitions, the closure adjunction, and symmetry, we have a series of natural bijections between the following homsets, involving the contravariant functor (-) \multimap R:

  • Morphisms A \multimap R \rightarrow B in \mathcal{V}^{op}
  • Morphisms B \rightarrow A \multimap R in \mathcal{V}
  • Morphisms B \otimes A \rightarrow R in \mathcal{V}
  • Morphisms A \otimes B \rightarrow R in \mathcal{V}
  • Morphisms A \rightarrow B \multimap R in \mathcal{V}

These natural bijections establish that there is an adjunction:

(-) \multimap R : \mathcal{V} \rightarrow \mathcal{V}^{op} \dashv (-) \multimap R : \mathcal{V}^{op} \rightarrow \mathcal{V}

and therefore a monad on \mathcal{V}:

(((-) \multimap R) \multimap R,\; \eta,\; ((-) \multimap R) \circ \epsilon \circ ((-) \multimap R))

where \eta and \epsilon are the unit and counit of the adjunction. This monad is known as the continuation monad.

The Continuation Monad on Sets

To get a handle on what is going on, let’s consider the special case where \mathcal{V} is the Cartesian closed category of sets and functions. The unit of the adjunction is then:

\eta(a)(f) = f(a)

and the counit is the same map, but in \mathsf{Set}^{op}. The multiplication of the monad at A is:

\mu(f)(g : A \Rightarrow R) = f(\lambda h : (A \Rightarrow R) \Rightarrow R.\; h(g))

In the above, we’ve used lambda notation for inline functions, and added selected type annotations to emphasise how the input data is being used. Notice that no copying or reuse of arguments takes place. This specific example lives in a Cartesian closed category, but the construction is inherently symmetric monoidal in nature, and therefore linear in its use of arguments.
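As a concrete sketch, here are the unit and multiplication in Python, with functions standing in for morphisms of \mathsf{Set} (the names `unit` and `mult` are our own choice for this illustration):

```python
# A minimal sketch of the continuation monad on Set, with Python
# functions standing in for morphisms. The result object R is left
# implicit: a continuation is just a function returning an R-value.

def unit(a):
    # eta(a)(f) = f(a): feed the value a to a continuation f.
    return lambda f: f(a)

def mult(f):
    # mu(f)(g) = f(lambda h. h(g)), following the formula above.
    return lambda g: f(lambda h: h(g))
```

As a sanity check, mult(unit(unit(a))) and unit(a) agree on every continuation, as the unit laws require.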

Kleisli Structure

A Kleisli morphism of type A \rightarrow B is a morphism in the underlying category of type A \rightarrow (B \multimap R) \multimap R. In the case of \mathsf{Set}, this is a function:

f : A \rightarrow (B \Rightarrow R) \Rightarrow R

From a computer science perspective, we can think of an element k : B \Rightarrow R as a continuation. This is how a computation should continue once it has computed a value of type B, eventually yielding a result of type R. An element g \in (B \Rightarrow R) \Rightarrow R is a computation yielding a B that requires a continuation as a parameter, telling it how to continue to yield a final result of type R.
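To make this concrete, here is a hedged Python sketch of Kleisli composition for the continuation monad, where a Kleisli map takes a value and then a continuation (the helpers `kleisli_compose`, `double` and `show` are illustrative names of our own):

```python
# Kleisli composition for the continuation monad on Set: given
# f : A -> (B => R) => R and g : B -> (C => R) => R, run f and
# continue with g, threading the final continuation k : C => R.

def kleisli_compose(f, g):
    return lambda a: lambda k: f(a)(lambda b: g(b)(k))

# Two sample Kleisli maps, written in continuation-passing style.
double = lambda n: lambda k: k(2 * n)   # int ~> int
show = lambda n: lambda k: k(str(n))    # int ~> str

pipeline = kleisli_compose(double, show)
```

Running `pipeline(21)` with the identity continuation yields `"42"`.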

Eilenberg-Moore Algebras

The Eilenberg-Moore algebras can be difficult to describe in the general case, so we focus attention on a particular result. Consider the two element set 2 = \{ 0, 1 \}. The functor (-) \Rightarrow 2 : \mathsf{Set}^{op} \rightarrow \mathsf{Set} is the contravariant powerset functor. An element \xi \in X \Rightarrow 2 can be seen as a characteristic function, mapping the elements appearing in a subset to 1, and the others to 0. This functor has a left adjoint, as we already know, and in fact it is monadic: there is an equivalence between the category of Eilenberg-Moore algebras and \mathsf{Set}^{op}. Therefore, in this case:

\mathsf{Set}^{((-) \Rightarrow 2) \Rightarrow 2} \simeq \mathsf{Set}^{op}

It is well known (in the right circles!) that \mathsf{Set}^{op} is equivalent to the category \mathsf{CABA} of complete atomic Boolean algebras and morphisms preserving their joins and meets.

Remark: Notice there is a pronounced difference here to considering complete Boolean algebras and their usual morphisms preserving the algebraic structure. In that case, it is well known that the forgetful functor to \mathsf{Set} does not have a left adjoint, as the free complete Boolean algebra on countably many generators does not exist. This is an instructive counterexample to the naive intuition that these algebraic constructions always “just work out”. Once you move beyond collections of operations of a fixed cardinality bound, you need to be much more careful.

A topos is a category rather similar to the category of sets and functions, and \mathsf{Set} is probably the simplest example. Toposes are Cartesian closed, and have an object \Omega known as a subobject classifier, which can be thought of as a collection of truth values. The functor (-) \Rightarrow \Omega is the analog of the contravariant powerset functor on \mathsf{Set}, and again it is monadic, so for any topos \mathcal{T}, we have a wide generalisation of the equivalence above:

\mathcal{T}^{((-) \Rightarrow \Omega) \Rightarrow \Omega} \simeq \mathcal{T}^{op}

This result is important for establishing finite cocompleteness of toposes, via their finite completeness and the theory of Eilenberg-Moore categories.

Monad Morphisms and Algebras

We now turn to an interesting property of the continuation monad, that can be thought of as a very general representation result.

Consider a term in the algebraic theory of monoids, for example:

x_1 \times (x_2 \times x_3)

where x_1, x_2, x_3 come from some set X of variables. How can we evaluate this term in a particular monoid, say (R,\times,1)?

Firstly, we need to know what values to use for each variable, which we could specify via a function X \rightarrow R. Secondly, we really only need to specify the value for each equivalence class of terms, so the elements of \mathbb{L}(X), where \mathbb{L} is the free monoid (list) monad. So as a crude first step, we need a function:

\mathbb{L}(X) \times (X \Rightarrow R) \rightarrow R

Given what we have seen in this post, it seems natural to rearrange this to emphasise the terms are the input, into the form:

\mathbb{L}(X) \rightarrow (X \Rightarrow R) \Rightarrow R

Obviously, the way we evaluate terms should not depend on which set we use to name the variables, so we expect this actually to be a family of maps, natural in X.

The algebraic structure also suggests that:

  1. The value for a term that is just a variable should be the value assigned to that variable.
  2. Evaluation should be compatible with substitution of terms.

These final two conditions correspond to the monad morphism axioms. So we are interested in monad morphisms of type:

\mathbb{L} \rightarrow ((-) \Rightarrow R) \Rightarrow R

Obviously, there’s nothing special about our example of the list monad, so we’re really interested in monad morphisms of the form:

\mathbb{T} \rightarrow ((-) \Rightarrow R) \Rightarrow R

It turns out that there is a bijection between:

  1. Eilenberg-Moore algebras \alpha : \mathbb{T}(R) \rightarrow R.
  2. Monad morphisms \sigma : \mathbb{T} \rightarrow ((-) \Rightarrow R) \Rightarrow R.

For \mathsf{Set} monads, the monad morphism resulting from an Eilenberg-Moore algebra has action:

\sigma(t)(k) = \alpha(\mathbb{T}(k)(t))

In the other direction, given a monad morphism, we get an Eilenberg-Moore algebra with action:

\alpha(t) = \sigma(t)(\mathsf{id}_{R})
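As a hedged illustration in Python, take \mathbb{T} to be the list monad on \mathsf{Set} and R the integers, with the Eilenberg-Moore algebra given by summation (the helper names below are our own):

```python
# The bijection above for the list monad, with R = int and alpha the
# "sum" algebra (the additive monoid structure on the integers).

def alpha(t):
    return sum(t)

def sigma(t):
    # sigma(t)(k) = alpha(T(k)(t)): map the continuation k over the
    # list t, then evaluate using the algebra.
    return lambda k: alpha([k(x) for x in t])

def alpha_back(t):
    # Recovering the algebra: alpha(t) = sigma(t)(id_R).
    return sigma(t)(lambda r: r)
```

Here `alpha_back` recovers exactly the algebra we started from, illustrating one round trip of the bijection.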

In fact, this result goes through in much greater generality, but discussing the technical details would get distracting. Interested readers can refer to Kock’s “On Double Dualization Monads”.

This correspondence between algebras and monad morphisms opens up new possibilities. To see this, as a preliminary step, say that two monad morphisms:

  1. \sigma_1 : \mathbb{T}_1 \rightarrow \mathbb{S}
  2. \sigma_2 : \mathbb{T}_2 \rightarrow \mathbb{S}

commute if the following equality holds:

\mathsf{dst} \circ (\sigma_1 \otimes \sigma_2) = \mathsf{dst}' \circ (\sigma_1 \otimes \sigma_2)

where \mathsf{dst} and \mathsf{dst}' are the two double strength morphisms. With this in place, we can say that two Eilenberg-Moore algebras \alpha_1 : \mathbb{T}_1(R) \rightarrow R and \alpha_2 : \mathbb{T}_2(R) \rightarrow R commute if their corresponding monad morphisms \mathbb{T}_i \rightarrow ((-) \multimap R) \multimap R commute as defined above. We then say an algebra is commutative if it commutes with itself.

This extends the notion of commutativity we have encountered previously in two directions:

  1. We consider commutativity for individual algebras, rather than for a monad as a whole.
  2. We can consider mixed forms of commutativity, for algebras of potentially different monads on the same base object.

We can also relate this form of commutativity to the previous definition. The following are equivalent for a monad \mathbb{T}:

  1. Every Eilenberg-Moore algebra is commutative.
  2. Every monad morphism with domain \mathbb{T} commutes with itself.
  3. The monad \mathbb{T} is commutative.

This theorem deepens our understanding of commutativity, allowing us to relate algebra-wise and monad-wise notions.

Conclusion

This post is relatively long. Despite this, we have only very briefly mentioned continuations, which are a fiddly topic in themselves. The continuation monad is an intriguing example of a monad, which can be hard to grasp at first sight. It is what is known as a monad without rank, which limits the scope of the algebraic techniques we have encountered previously.

Further reading: Johnstone’s “Stone Spaces” discusses the duality between \mathsf{Set} and complete atomic Boolean algebras, and the non-existence of free complete Boolean algebras. His “Sketches of an Elephant” gives a comprehensive account of topos theory, with the important monadicity theorem which we discussed appearing in the first volume.

Distributive Laws are Monads

Early on we claimed, with no explanation, that distributive laws are monads. As the title of this post suggests, we are now in a place to explain the details of what this rather intriguing statement means. This will require our previous experience of monads in a 2-category, and distributive laws. In particular, the ability to define monads within arbitrary 2-categories gives us an awful lot of additional expressive freedom. Our first step will be to introduce a 2-category suitable for formalising relationships between monads.

A 2-category of Monads

The 2-category that we will be interested in has monads as 0-cells, Eilenberg-Moore laws as 1-cells, and compatible natural transformations as 2-cells. We now spell out the details.

For an arbitrary 2-category \mathbf{K}, there is a 2-category \mathbf{Mnd}(\mathbf{K}) with:

  1. 0-cells: Tuples (\mathcal{C}, \mathbb{T}, \eta, \mu) consisting of a 0-cell \mathcal{C}, such that (\mathbb{T},\eta,\mu) is a monad on \mathcal{C} in \mathbf{K}.
  2. 1-cells: A 1-cell (\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) \rightarrow (\mathcal{D}, \mathbb{T}, \eta^{\mathbb{T}}, \mu^{\mathbb{T}}) is a pair (U,\varphi) consisting of a 1-cell U : \mathcal{C} \rightarrow \mathcal{D}, and a 2-cell \varphi : \mathbb{T} \circ U \Rightarrow U \circ \mathbb{S}, such that \varphi is an Eilenberg-Moore law.
  3. 2-cells: A 2-cell (U,\varphi) \Rightarrow (U',\varphi') is a 2-cell \sigma : U \Rightarrow U' such that \varphi' \cdot (\mathbb{T} \circ \sigma) = (\sigma \circ \mathbb{S}) \cdot \varphi.

Horizontal composition is given by

(V,\psi : \mathbb{T} \circ V \Rightarrow V \circ \mathbb{S}) \circ (U,\varphi : \mathbb{S} \circ U \Rightarrow U \circ \mathbb{R}) = (V \circ U, (V \circ \varphi) \cdot (\psi \circ U))

with

\mathsf{Id}_{(\mathcal{C},\mathbb{T},\eta,\mu)} = (\mathsf{Id}_{\mathcal{C}}, \mathsf{id}_{\mathbb{T}})

The vertical composition and identities for 2-cells in \mathbf{Mnd}(\mathbf{K}) are inherited from \mathbf{K}.

Monads in the 2-category of Monads

We now consider what a monad in \mathbf{Mnd}(\mathbf{K}) boils down to. As we will be working in a 2-category constructed from another 2-category, and with three different monads, we will be fussy in spelling out the details carefully.

A monad in \mathbf{Mnd}(\mathbf{K}) consists of four pieces of data:

  1. A 0-cell in \mathbf{Mnd}(\mathbf{K}). That is, some monad (\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) in \mathbf{K}.
  2. A 1-cell in \mathbf{Mnd}(\mathbf{K}) of type (\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) \rightarrow (\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}). That is, a 1-cell \mathbb{T} : \mathcal{C} \rightarrow \mathcal{C} in \mathbf{K}, and an Eilenberg-Moore law \lambda : \mathbb{S} \circ \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S}.
  3. A unit 2-cell of type \mathsf{Id}_{(\mathcal{C}, \mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}})} \Rightarrow (\mathbb{T}, \lambda) in \mathbf{Mnd}(\mathbf{K}). That is, a 2-cell \eta^{\mathbb{T}} : \mathsf{Id}_{\mathcal{C}} \Rightarrow \mathbb{T} in \mathbf{K} such that \lambda \cdot (\mathbb{S} \circ \eta^{\mathbb{T}}) = \eta^{\mathbb{T}} \circ \mathbb{S}.
  4. A multiplication 2-cell of type (\mathbb{T}, \lambda) \circ (\mathbb{T}, \lambda) \Rightarrow (\mathbb{T}, \lambda) in \mathbf{Mnd}(\mathbf{K}). That is, a 2-cell \mu^{\mathbb{T}} : \mathbb{T} \circ \mathbb{T} \Rightarrow \mathbb{T} in \mathbf{K} such that \lambda \cdot (\mathbb{S} \circ \mu^{\mathbb{T}}) = (\mu^{\mathbb{T}} \circ \mathbb{S}) \cdot (\mathbb{T} \circ \lambda) \cdot (\lambda \circ \mathbb{T}).

The unit and associativity axioms for a monad in \mathbf{Mnd}(\mathbf{K}) are equivalent to requiring that (\mathbb{T}, \eta^{\mathbb{T}}, \mu^{\mathbb{T}}) satisfies those axioms for a monad on \mathcal{C} in \mathbf{K}. Finally, the two equations satisfied by the unit and multiplication as 2-cells in \mathbf{Mnd}(\mathbf{K}) are equivalent to requiring that \lambda is a Kleisli law. As we noted previously, being a distributive law is equivalent to simultaneously being an Eilenberg-Moore law and a Kleisli law. Therefore, a monad in \mathbf{Mnd}(\mathbf{K}) is equivalent to the following data:

  1. A 0-cell \mathcal{C} in \mathbf{K}.
  2. A pair of monads (\mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) and (\mathbb{T}, \eta^{\mathbb{T}}, \mu^{\mathbb{T}}) on \mathcal{C} in \mathbf{K}.
  3. A distributive law \mathbb{S} \circ \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S} in \mathbf{K}.

Roughly speaking, up to being more careful to specify all the required data, a distributive law in \mathbf{K} is the same thing as a monad in \mathbf{Mnd}(\mathbf{K}).

Example: For the “ordinary” monads discussed in the previous post, a distributive law is a monad in \mathbf{Mnd}(\mathbf{Cat}).

If we iterate the monad 2-category construction, a distributive law in \mathbf{K} is the same thing as a 0-cell in \mathbf{Mnd}(\mathbf{Mnd}(\mathbf{K})).

Conclusion

This relatively short post has unravelled an observation of Street, to show how a distributive law can be seen as a monad in a suitable 2-category. This both served as an illustration of the power of the notion of monad in a 2-category, and provided an excuse to introduce the 2-category \mathbf{Mnd}(\mathbf{K}). This 2-category is important more broadly in the theory of monads and distributive laws. In particular, by using the different dualities that apply in a 2-category, which we encountered when discussing comonads, we can incorporate comonads and Kleisli laws in the same setting. By iterating these constructions, we can consider different flavours of distributive laws such as monad-comonad or comonad-monad distributive laws. Hopefully we will get the chance to return to these topics later.

Further reading: The constructions outlined in this post were introduced in Street’s “Formal Theory of Monads”, along with vastly more important ideas than we have had chance to discuss. The 2-category \mathbf{Mnd}(\mathbf{K}) is also discussed in a very readable way, with more motivation for computer scientists, in Power and Watanabe’s “Combining a Monad and a Comonad”, along with many other interesting developments.

Composition and Distributive Laws

A common theme in many scientific disciplines is the desire to build up complicated objects out of simpler parts. This compositional perspective allows us to work in a modular fashion, rather than addressing each new problem as a monolith that must be investigated from scratch.

With this in mind, given two monads \mathbb{T}, \mathbb{S}, it would be very convenient if the composite endofunctor \mathbb{T} \circ \mathbb{S} were also a monad in some canonical way. Unfortunately, this is not always the case. Although this is initially disappointing, the study of when such composites are monads is a rich and interesting subject.

Composing Endofunctors

We begin with some simple concrete examples, to get a sense of the various possibilities when we try to compose two monads as endofunctors. We will say that an endofunctor F can be given the structure of a monad if there exist any \eta : \mathsf{Id} \Rightarrow F and \mu : F \circ F \Rightarrow F satisfying the unit and associativity axioms. We then note the following instructive examples:

  1. Trivially, for any monad \mathbb{S}, both of the composites with the identity monad, \mathsf{Id} \circ \mathbb{S} and \mathbb{S} \circ \mathsf{Id}, carry the structure of a monad. As an equally trivial corollary, since there are endofunctors that carry multiple monad structures, this will also be a possibility for composites of monads.
  2. For any monad \mathbb{S}, its post composition with the exception monad \mathbb{S}(- + E) carries the structure of a monad.
  3. The composition of the powerset monad with itself \mathbb{P} \circ \mathbb{P} does not carry the structure of a monad. This was shown by Klin and Salamanca “Iterated Covariant Powerset is not a Monad” in 2018.

This is a pretty mixed picture. There are monads such that composition with their endofunctor on one or both sides results in a monad every time, and at the other extreme, some monads can’t even be composed with themselves.

Distributive Laws

Our aim now is to try and deduce some sufficient conditions under which monads compose. We will consider two monads (\mathbb{S}, \eta^{\mathbb{S}}, \mu^{\mathbb{S}}) and (\mathbb{T},\eta^{\mathbb{T}}, \mu^{\mathbb{T}}) on the same base category. Our notation is a bit fussier than normal, using superscripts on the units and multiplications to make it clear which monad is intended.

We would like to find a monad structure on \mathbb{T} \circ \mathbb{S}. There is an obvious choice for the unit:

\eta^{\mathbb{T}} \circ \eta^{\mathbb{S}} : \mathsf{Id} \Rightarrow \mathbb{T} \circ \mathbb{S}

If we try composing the two multiplications together in the same way, we get:

\mu^{\mathbb{T}} \circ \mu^{\mathbb{S}} : \mathbb{T} \circ \mathbb{T} \circ \mathbb{S} \circ \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S}

We now see a problem: we have a natural transformation of type \mathbb{T} \circ \mathbb{T} \circ \mathbb{S} \circ \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S}, not of the type \mathbb{T} \circ \mathbb{S} \circ \mathbb{T} \circ \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S} which we require. A natural attempt to address this problem would be to introduce some natural transformation \lambda : \mathbb{S} \circ \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S}, and form the composite:

(\mu^{\mathbb{T}} \circ \mu^{\mathbb{S}}) \cdot (\mathsf{Id}_{\mathbb{T}} \circ \lambda \circ \mathsf{Id}_{\mathbb{S}}) : \mathbb{T} \circ \mathbb{S} \circ \mathbb{T} \circ \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S}

Of course, we can’t just use any old natural transformation \lambda: it will need to satisfy some axioms in order for the resulting construction to be a monad. It is an interesting exercise to try and deduce these by attempting a proof, which I would highly recommend.

Sufficient conditions are two unit axioms:

\lambda \cdot (\eta^{\mathbb{S}} \circ \mathbb{T}) = \mathbb{T} \circ \eta^{\mathbb{S}} \qquad \lambda \cdot (\mathbb{S} \circ \eta^{\mathbb{T}}) = \eta^{\mathbb{T}} \circ \mathbb{S}

and two multiplication axioms:

\lambda \cdot (\mu^{\mathbb{S}} \circ \mathbb{T}) = (\mathbb{T} \circ \mu^{\mathbb{S}}) \cdot (\lambda \circ \mathbb{S}) \cdot (\mathbb{S} \circ \lambda) \qquad \lambda \cdot (\mathbb{S} \circ \mu^{\mathbb{T}}) = (\mu^{\mathbb{T}} \circ \mathbb{S}) \cdot (\mathbb{T} \circ \lambda) \cdot (\lambda \circ \mathbb{T})

A natural transformation \lambda : \mathbb{S} \circ \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S} satisfying these four equations is known as a distributive law. This construction was first discovered by Jon Mock Beck, and such laws are sometimes referred to as Beck distributive laws for emphasis. Interestingly, the required axioms are exactly that \lambda is both a Kleisli law and an Eilenberg-Moore law, which we encountered previously.

Of course, we haven’t actually demonstrated that such natural transformations can be found, so it’s time for some examples.

Example: For the free monoid (list) monad \mathbb{L}, and the Abelian group monad \mathbb{A}, there is a distributive law:

\lambda : \mathbb{L} \circ \mathbb{A} \Rightarrow \mathbb{A} \circ \mathbb{L}

We can think of the elements of (\mathbb{L} \circ \mathbb{A})(X) as equivalence classes of products-of-sums, for example:

(x_1 + x_2) \times (x_3 + x_4)

If we apply the usual distributivity axiom for multiplication over addition:

x \times (y + z) = x \times y + x \times z

our example term becomes a sum-of-products:

x_1 \times x_3 + x_1 \times x_4 + x_2 \times x_3 + x_2 \times x_4

which is an element of (\mathbb{A} \circ \mathbb{L})(X). This operation on terms is the idea behind the action of the distributive law, and also inspires the name distributive law. The resulting monad is the free ring monad.

Example: Again for the list and Abelian group monads, there is no distributive law composing in the opposite order, of the type:

\mathbb{A} \circ \mathbb{L} \Rightarrow \mathbb{L} \circ \mathbb{A}

This gives us an example of a pair of monads with a distributive law in one order, but not the other.

Example: Similarly to the ring monad example, there is a distributive law between the list and powerset monads:

\mathbb{L} \circ \mathbb{P} \Rightarrow \mathbb{P} \circ \mathbb{L}

The action of this distributive law is:

[X_1,\ldots,X_n] \mapsto \{ [x_1,\ldots,x_n] \mid x_i \in X_i \}
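This action can be sketched in Python, using tuples for the inner lists (since Python lists cannot be members of a set):

```python
from itertools import product

# The distributive law of lists over powersets:
# [X1,...,Xn] |-> { [x1,...,xn] | xi in Xi }.
def dist(sets):
    # product(*sets) enumerates one choice of element from each set.
    return {tuple(choice) for choice in product(*sets)}
```

For instance, `dist([{1, 2}, {3}])` yields the two possible choice lists `(1, 3)` and `(2, 3)`.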

As the powerset monad is the free complete join semilattice monad, and list is the free monoid monad, we can view the action of this distributive law algebraically as:

\bigvee X_1 \otimes \ldots \otimes \bigvee X_n = \bigvee \{ x_1 \otimes \ldots \otimes x_n \mid x_i \in X_i \}

Essentially, this distributive law comes from axioms of the form:

x \otimes \bigvee Y = \bigvee \{ x \otimes y \mid y \in Y \} \qquad \bigvee X \otimes y = \bigvee \{ x \otimes y \mid x \in X \}

The algebras of the resulting monad are known as quantales.

Example: As a more computational example, consider the exception monad for any object E, and an arbitrary monad \mathbb{T}. There is a distributive law:

\mathbb{T}(-) + E \Rightarrow \mathbb{T}((-) + E)

Abstractly, this law has components:

\mathbb{T}(X) + E \xrightarrow{[\mathbb{T}(\kappa_1), \eta^{\mathbb{T}} \circ \kappa_2]} \mathbb{T}(X + E)

where \kappa_1 and \kappa_2 are the coproduct injections. For example, if we take the monad \mathbb{T} to be the list monad, the distributive law acts on the left component of the coproduct as:

(0,[x_1,\ldots,x_n]) \mapsto [(0,x_1),\ldots,(0,x_n)]

and on the right component:

(1,e) \mapsto [(1,e)]
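Following the tagged-pair encoding of the coproduct just used, the whole component for the list case can be sketched in Python:

```python
# The distributive law component L(X) + E -> L(X + E) for the list
# monad, with left injections encoded as (0, -) and right as (1, -).
def dist(v):
    tag, val = v
    if tag == 0:
        # T(kappa_1): tag every element of the list as a left injection.
        return [(0, x) for x in val]
    # eta . kappa_2: an exception becomes a singleton list.
    return [(1, val)]
```

The two branches correspond exactly to the two cases displayed above.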

The next example illustrates that distributive laws are only one aspect of the composition question for monads.

Example: As another negative result, there is no distributive law for the list monad over itself \mathbb{L} \circ \mathbb{L} \Rightarrow \mathbb{L} \circ \mathbb{L}, but \mathbb{L} \circ \mathbb{L} does carry the structure of a monad (as was pointed out by Bartek Klin). So the lack of a distributive law does not spell doom for finding any monad structure at all on the composite endofunctor.

Given the previous example, one might ask what are the advantages of knowing that the composite monad \mathbb{T} \circ \mathbb{S} arises via a distributive law? The key point is that we then get sensible relationships between the composite and its component parts. There are monad morphisms \mathbb{S} \Rightarrow \mathbb{T} \circ \mathbb{S} and \mathbb{T} \Rightarrow \mathbb{T} \circ \mathbb{S}. These in turn induce functors between the corresponding Kleisli and Eilenberg-Moore categories, as we have seen previously. Furthermore, \mathbb{T} lifts to a monad on the Eilenberg-Moore category of \mathbb{S}, and \mathbb{S} lifts to a monad on the Kleisli category of \mathbb{T}.

Conclusion

We have only touched on the topic of distributive laws in this post. Hopefully it’s fairly obvious that these ideas will generalise to the 2-categorical and bicategorical definitions of monads we recently discussed. There are results giving sufficient conditions, either for when a distributive law does exist, or precluding that possibility. There is also a lot of open ground for new results in this area, showing that in some ways our understanding of monads is still rather naive.

For those interested in distributive laws, I would thoroughly recommend Jon Beck’s original 1969 paper, which is a very enjoyable read.

One final note of caution. Distributive laws are a notorious source of difficulty and errors in the literature. If you are working with them, or depending on the results of others, it is worth putting in extra effort to make sure you are on firm ground.

Comonads – Monads Upside Down

Over many previous posts, we have investigated various aspects of monads. To discuss monad theory properly, it is often useful to consider their interaction with the dual notion, comonads. Laying the foundations for such discussions is the aim of the current post. Although comonads can be seen as a special case of monads, that doesn’t really capture how they feel in practice. Our objectives for the current post are:

  1. To introduce several examples of comonads in the familiar setting of sets and functions.
  2. To describe some different ways in which comonads are “monads, but upside down”.

Although comonads are formally dual to monads, and therefore “just the same thing”, they are often seen as rather mysterious. A quick experiment with your favourite search engine will find a lot less discussion of comonads than monads, for example. One might be tempted to speculate as to why that is: historical mathematical bias, something fundamental about the nature of mathematics..? We shall avoid such philosophical musings, and concentrate on trying to build up some familiarity.

Introducing Comonads

A comonad on a category \mathcal{C} consists of:

  1. An endofunctor \mathbb{D} : \mathcal{C} \rightarrow \mathcal{C}.
  2. A counit natural transformation \epsilon : \mathbb{D} \Rightarrow \mathsf{Id}_{\mathcal{C}}.
  3. A comultiplication natural transformation \delta : \mathbb{D} \Rightarrow \mathbb{D} \circ \mathbb{D}.

This data must satisfy the following counitality and coassociativity equations:

(\epsilon \circ \mathbb{D}) \cdot \delta = \mathsf{Id}_{\mathbb{D}} = (\mathbb{D} \circ \epsilon) \cdot \delta \qquad (\mathbb{D} \circ \delta) \cdot \delta = (\delta \circ \mathbb{D}) \cdot \delta

The terms counit and comultiplication drop heavy hints that comonads are a dual notion to monads. We will return to this point shortly. First we plough on, with lots of examples.

Example: Lists are a standard example for monads. For comonads, the non-empty lists provide a similarly canonical example. Let \mathbb{L}^+ be the endofunctor that sends a set to its collection of non-empty lists. This functor is a comonad, with counit the function that returns the last element of the list, and comultiplication mapping a list to its list of prefixes, for example:

\delta([1,2,3]) = [[1],[1,2],[1,2,3]]

Unlike the list monad, we can bound the length of our lists to form another comonad \mathbb{L}^+_k of non-empty lists of length at most k \in \mathbb{N}.
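In Python, this comonad can be sketched as follows, with plain lists assumed to be non-empty:

```python
# The non-empty list comonad: the counit takes the last element, and
# the comultiplication takes the list of non-empty prefixes.

def counit(xs):
    return xs[-1]

def delta(xs):
    return [xs[: i + 1] for i in range(len(xs))]
```

One of the counit laws is then easy to check by hand: mapping `counit` over `delta(xs)` returns `xs` itself.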

As with monads, it’s worth considering some instructive degenerate examples. The first one hopefully won’t be a big shock.

Example: For any category \mathcal{C}, the identity functor \mathsf{Id}_{\mathcal{C}} is a comonad, with the counit and comultiplication both being identity morphisms.

The previous example gives a trivial answer to a natural question: it is possible for an endofunctor to carry the structure of both a monad and a comonad. In fact, the endofunctor \mathbb{L}^+ does too, which is perhaps a bit more interesting.

The next example is the comonadic analog of the inconsistent monads, in that it provides a healthy source of counterexamples.

Example: For any category with an initial object, the constant functor mapping to the initial object is a comonad. The counit and comultiplication are the unique morphisms:

0 \xrightarrow{\epsilon_X} X \qquad 0 \xrightarrow{\delta_X} 0

We might think of this as the overspecified comonad.

The next few examples are comonadic relatives of the writer, reader and state monads we’ve encountered previously.

Example: For a fixed set E, the endofunctor (-) \times E is a comonad, with counit:

\epsilon((x,e)) = x

and comultiplication:

\delta((x,e)) = ((x,e),e)

Together these form a comonad, which we shall refer to as the reader comonad. Despite the name, this comonad is dual to the exception monad. Computationally, we can think of a map X \times E \rightarrow Y as a function from X to Y that can depend on some additional data it reads from the environment E.

Recall that for an adjunction L \dashv R, the composition R \circ L is a monad. Dually, the composite L \circ R is a comonad. The following is a standard example of this phenomenon.

Example: For a fixed object S, in a Cartesian closed category we have an adjunction:

(-) \times S \dashv S \Rightarrow (-)

Therefore, the functor:

X \mapsto (S \Rightarrow X) \times S

is a comonad. Computationally, if we think of S as a collection of states, morphisms:

(S \Rightarrow X) \times S \rightarrow Y

bijectively correspond to morphisms:

(S \Rightarrow X) \rightarrow (S \Rightarrow Y)

which transform state dependent values of type X into state dependent values of type Y. This comonad is often referred to as the costate comonad.
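Concretely in \mathsf{Set}, the costate comonad can be sketched in Python with pairs (f, s) of a function and a current state (the representation is our own choice for illustration):

```python
# The costate comonad on Set: an element is a pair (f, s) with
# f : S => X a state-dependent value and s in S a current state.

def counit(c):
    f, s = c
    # Evaluate the state-dependent value at the current state.
    return f(s)

def delta(c):
    f, s = c
    # Refocus: a state-dependent costate value, at the current state.
    return (lambda s2: (f, s2), s)
```

Note that `counit(delta(c))` returns `c` itself, which is one of the comonad laws.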

Example: Given a monoid (M, \times,1), the functor M \Rightarrow (-) is a comonad, with counit:

\epsilon(f) = f(1)

and comultiplication:

\delta(f)(m)(n) = f(m \times n)

This is sometimes referred to as the exponent comonad.

A special case of this construction is an important example in itself. Consider the exponent comonad on the monoid of additive natural numbers (\mathbb{N},+,0). We can think of an element of \mathbb{N} \Rightarrow X as an infinite list, or stream, of values from X:

[f(0),f(1),f(2),\ldots]

The counit returns the first element of the stream, and the comultiplication gives the stream of tails, as follows:

\delta([1,2,3,\ldots]) = [[1,2,3,\ldots],[2,3,4,\ldots],[3,4,5,\ldots],\ldots]

This construction is referred to as the stream comonad.
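Representing a stream as a function from the naturals, the stream comonad can be sketched in Python as:

```python
# The stream comonad: a stream of X-values is a function from the
# natural numbers to X.

def counit(s):
    # The first element of the stream.
    return s(0)

def delta(s):
    # delta(f)(m)(n) = f(m + n): the m-th tail of the stream.
    return lambda m: lambda n: s(m + n)
```

For the stream of all naturals, `delta` at position m gives the stream starting at m, matching the picture above.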

We now move on to some strategies for systematically constructing simple comonads.

Example: Given a non-empty list, we can mark a focus position in it, written as follows:

[1,2,\underline{3},4]

There is then a natural counit operation, which returns the focal element:

\epsilon([1,2,\underline{3},4]) = 3

There is also a comultiplication operation, which acts as follows on our example list:

\delta([1,2,\underline{3},4]) = [[\underline{1},2,3,4],[1,\underline{2},3,4],\underline{[1,2,\underline{3},4]},[1,2,3,\underline{4}]]

The resulting comonad is the zipper comonad for non-empty lists.
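
The zipper comonad for non-empty lists can be sketched in Python, representing a zipper as a list together with a focus index. The representation and names are illustrative choices.

```python
def extract(xs, i):
    # Counit: return the focal element.
    return xs[i]

def duplicate(xs, i):
    # Comultiplication: a zipper of zippers, refocusing at each position,
    # with the outer focus at the original position.
    return [(xs, j) for j in range(len(xs))], i

zipper = ([1, 2, 3, 4], 2)                    # [1, 2, _3_, 4]
assert extract(*zipper) == 3

inner, focus = duplicate(*zipper)
assert extract(*inner[0]) == 1                # [_1_, 2, 3, 4]
assert inner[focus] == zipper                 # the outer focus is the original zipper
assert extract(*extract(inner, focus)) == 3   # extract . duplicate = id
```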

There was nothing special about non-empty lists in the previous example.

Example: Given a pair of values, we can mark a focus position in it, for example:

(\underline{1},2)

As in the previous example, there is an obvious counit returning the focus:

\epsilon((\underline{1},2)) = 1

There is also a comultiplication, analogously to the previous example, with action:

\delta((\underline{1},2)) = (\underline{(\underline{1},2)},(1,\underline{2}))

The resulting comonad is the zipper comonad for pairs.

This pattern of collections of data with a focus can be repeated in great generality to yield many further examples of comonads.

Another example of a similar kind is the following:

Example: Again, we consider the endofunctor \mathbb{L}^+. We define our counit to return the tail element of lists. We do something slightly more interesting with the comultiplication, which cycles through the list, as in the following example:

\delta([1,2,3]) = [[2,3,1],[3,1,2],[1,2,3]]

This forms what is known as the circular non-empty list comonad.
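
A Python sketch of this comonad, with illustrative names: the counit returns the last element, and the comultiplication produces successive rotations, ending with the original list.

```python
def extract(xs):
    # Counit: the tail (last) element of the list.
    return xs[-1]

def duplicate(xs):
    # Comultiplication: successive rotations, ending with the original
    # list, so that extract . duplicate is the identity.
    return [xs[i:] + xs[:i] for i in range(1, len(xs) + 1)]

assert duplicate([1, 2, 3]) == [[2, 3, 1], [3, 1, 2], [1, 2, 3]]
assert extract(duplicate([1, 2, 3])) == [1, 2, 3]               # one comonad law
assert [extract(r) for r in duplicate([1, 2, 3])] == [1, 2, 3]  # the other law
```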

The previous example answers another natural question: a given endofunctor can carry two different comonad structures. In fact, this remains true even if we fix the counit.

The zipper comonads, the non-empty list comonad, the circular non-empty list comonad, and several other examples we’ve seen, are all built up in a similar manner:

  1. There are certain shapes of collections of data, such as pairs, or lists of length 1, 2, 3, and so on.
  2. Each shape has a collection of positions in which data is stored. For example, the first and second elements of a pair.
  3. There is a focus position. In the zippers this was given explicitly; in the non-empty list examples, it was the tail element.
  4. There is a rather general notion of “sub-shape” at a given position.

The first two aspects allow us to describe an endofunctor, mapping a set X to the collection of all shapes of the given datatype, with data at each position taken from X. The focus position allows us to build a counit, and the notion of sub-shape is key to describing a comultiplication. Of course, we are only giving imprecise intuitions here; to make all this mathematically precise, we need to introduce more technical machinery. Specifically, we need the framework of (directed) containers, which we may return to in a later post.

It’s all just monads upside-down!

We now return to the question of how monads are related to comonads, which we shall look at from a few different perspectives.

Given a fixed category \mathcal{C}, we can consider the monads on the opposite category \mathcal{C}^{op}. These consist of:

  1. An endofunctor \mathbb{M} on \mathcal{C}^{op}, which is the same thing as an endofunctor \mathbb{D} on \mathcal{C}. The presence of the “op” doesn’t make a great deal of difference here, unlike for the natural transformations.
  2. A unit natural transformation \eta : \mathsf{Id}_{\mathcal{C}^{op}} \Rightarrow \mathbb{M}. Due to the reversal of the direction of morphisms in the opposite category, this is the same thing as a natural transformation \epsilon : \mathbb{D} \Rightarrow \mathsf{Id}_{\mathcal{C}}.
  3. A multiplication natural transformation \mu : \mathbb{M} \circ \mathbb{M} \Rightarrow \mathbb{M}. This is the same thing as a natural transformation \delta : \mathbb{D} \Rightarrow \mathbb{D} \circ \mathbb{D}.

The monad axioms satisfied by \eta and \mu are the same thing as requiring the comonad equations for the corresponding \epsilon and \delta. That is:

A comonad on a category \mathcal{C} is the same thing as a monad on the opposite category \mathcal{C}^{op}.

This is all very nice, but in previous posts we generalised monads to 2-categories and bicategories. The previous explanation doesn’t really help us understand comonads in terms of monads in that setting, as it is very specifically tied to monads on categories. We need to look at things in a slightly different way.

Given a 2-category or bicategory \mathbf{K}, there are a few different ways we can produce new bicategories, by flipping the directions of different cells.

  1. We can reverse the direction of the 1-cells, yielding a bicategory \mathbf{K}^{op}.
  2. We can reverse the direction of the 2-cells, yielding a bicategory \mathbf{K}^{co}.
  3. We can flip the directions of both the 1 and 2 cells, yielding a bicategory \mathbf{K}^{coop}.

The operation we are interested in is flipping the direction of 2-cells. It is fairly routine to check that:

A comonad on a 0-cell \mathcal{A} in a bicategory \mathbf{K} is the same thing as a monad on \mathcal{A} in \mathbf{K}^{co}.

Finally, we note that for a fixed 0-cell \mathcal{A} in \mathbf{K}, the category \mathbf{K}(\mathcal{A}, \mathcal{A}) is a monoidal category, with the monoidal structure given by composition and identities. A monad on \mathcal{A} is the same thing as a monoid in \mathbf{K}(\mathcal{A}, \mathcal{A}). Dually:

A comonad on a 0-cell \mathcal{A} in a bicategory \mathbf{K} is the same thing as a comonoid in the monoidal category \mathbf{K}(\mathcal{A},\mathcal{A}).

If you’ve not encountered comonoids before, you could equally take this as their definition via comonads.

These more esoteric definitions are useful for performing formal calculations at a high level of abstraction, but concrete examples are often important to develop satisfactory intuitions for applications.

Conclusion

We introduced comonads at a fairly rapid pace, building on our previous experience with monads. There is a lot more we could say about comonads, much of which would dualise the situation with monads. For example:

  1. There are corresponding Eilenberg-Moore and Kleisli constructions, with a relation to adjunctions. All of this works out in an entirely dual manner to that for monads.
  2. There is a notion of cofree comonad – again formally dual to that of free monad.

In each case it is worthwhile to examine concrete examples and constructions in familiar categories. There are also interesting questions about the relationships between monads and comonads, and the interaction between them. We will return to such topics in future posts.

Further reading: A good source of information about comonads from a mathematical or computer science perspective is the work of Tarmo Uustalu and Varmo Vene. I would suggest the paper “Comonadic Notions of Computation” as a good starting point.

Graded Monads

Now we have seen monads on 2-categories and bicategories, we are in a position to introduce some new ideas.

We have encountered various notions of monads so far, including ordinary monads on categories, monads in (strict) 2-categories, and monads in bicategories.

These are all special cases of monads in a bicategory. This time, we are going to see a genuine generalisation. To do so, we follow a well-worn categorical trail, and adopt a functorial perspective on a familiar object, the monads themselves. This will give us yet another way of describing monads, and a suitable starting point for generalisation. The resulting graded monads are a powerful tool for modelling resource usage in a monadic setting.

A 2-Functorial Perspective

In ordinary category theory, we often consider the points in a category as morphisms from the terminal object. For example, in \mathsf{Set}, the elements of a set X bijectively correspond to functions of type \{ \star \} \rightarrow X with domain the terminal object.

To think about the points of a 2-category, we need to introduce 2-functors. A strict 2-functor \mathbf{F} : \mathbf{A} \rightarrow \mathbf{B} consists of:

  1. A map on 0-cells, \mathcal{C} \;\mapsto\; \mathbf{F}(\mathcal{C}).
  2. A map on 1-cells, F : \mathcal{C} \rightarrow \mathcal{D} \;\mapsto\; \mathbf{F}(F) : \mathbf{F}(\mathcal{C}) \rightarrow \mathbf{F}(\mathcal{D}).
  3. A map on 2-cells, \alpha : F \Rightarrow G \;\mapsto\; \mathbf{F}(\alpha) : \mathbf{F}(F) \Rightarrow \mathbf{F}(G).

These maps preserve identities

\mathbf{F}(\mathsf{Id}_{\mathcal{C}}) = \mathsf{Id}_{\mathbf{F}(\mathcal{C})} \qquad \mathbf{F}(\mathsf{id}_F) = \mathsf{id}_{\mathbf{F}(F)}

and composition

\mathbf{F}(G \circ F) = \mathbf{F}(G) \circ \mathbf{F}(F) \qquad \mathbf{F}(\beta \cdot \alpha) = \mathbf{F}(\beta) \cdot \mathbf{F}(\alpha)

in the obvious way.

Put another way, a strict 2-category is simply a \mathsf{Cat}-enriched category, and strict 2-functors are then \mathsf{Cat}-enriched functors.

The terminal 2-category 1 is the 2-category with exactly one 0-cell, one 1-cell and one 2-cell. Explicitly, we have:

  1. A 0-cell \star.
  2. An identity 1-cell \mathsf{Id}_\star : \star \rightarrow \star.
  3. An identity 2-cell \mathsf{id}_{\mathsf{Id}_\star} : \mathsf{Id}_\star \Rightarrow \mathsf{Id}_\star.

With these definitions in place, we can consider the (strict) points of a 2-category \mathbf{C}. These are 2-functors:

\mathbf{P} : 1 \rightarrow \mathbf{C}

Such a functor can map \star to any 0-cell it chooses. As both identity cells must be preserved, this is the only freedom in choosing a particular \mathbf{P}. Therefore we can identify the strict points of a 2-category with its 0-cells. So far, so boring.

As we learned last time, it is mathematically natural to relax the strict setting to one where structure is only preserved up to isomorphism. To do so, we consider strong 2-functors (often called pseudofunctors or simply 2-functors). These have the same mappings as before, but now we have isomorphism 2-cells:

  • \mathsf{Id} \overset{\cong}{\Rightarrow} \mathbf{F}(\mathsf{Id}).
  • \mathbf{F}(G) \circ \mathbf{F}(F) \overset{\cong}{\Rightarrow} \mathbf{F}(G \circ F).

As usual, these isomorphisms must satisfy some coherence equations, ensuring they are suitably compatible with each other. We will skip spelling these out explicitly.

Now consider strong 2-functors \mathbf{P} : 1 \rightarrow \mathbf{C}. As the identity 2-cell must still be preserved, these consist of:

  1. A 0-cell \mathbf{P}(\star).
  2. A 1-cell \mathbf{P}(\mathsf{Id}).
  3. An isomorphism 2-cell \mathsf{Id} \overset{\cong}{\Rightarrow} \mathbf{P}(\mathsf{Id}).
  4. An isomorphism 2-cell \mathbf{P}(\mathsf{Id}) \circ \mathbf{P}(\mathsf{Id}) \overset{\cong}{\Rightarrow} \mathbf{P}(\mathsf{Id}).

Possibly this is starting to look slightly familiar. A good choice of notation will hopefully clarify things further. Let’s write

\mathcal{C} = \mathbf{P}(\star) \qquad \mathbb{T} = \mathbf{P}(\mathsf{Id})

and name the two invertible 2-cells \eta : \mathsf{Id} \overset{\cong}{\Rightarrow} \mathbb{T} and \mu : \mathbb{T} \circ \mathbb{T} \overset{\cong}{\Rightarrow} \mathbb{T}. With this terminology, specifying a strong 2-functor 1 \rightarrow \mathbf{C} consists of:

  1. A 0-cell \mathcal{C}.
  2. A 1-cell \mathbb{T}.
  3. An isomorphism 2-cell \eta : \mathsf{Id} \overset{\cong}{\Rightarrow} \mathbb{T}.
  4. An isomorphism 2-cell \mu : \mathbb{T} \circ \mathbb{T} \overset{\cong}{\Rightarrow} \mathbb{T}.

This looks suspiciously like the definition of a monad on the 0-cell \mathcal{C} in the 2-category \mathbf{C}. In fact, the coherence conditions for this strong 2-functor exactly require that \eta and \mu satisfy the unit and multiplication axioms of a monad.

There is one remaining oddity to address, that \eta and \mu are isomorphisms, which is too strong to capture most monads. We need to weaken the conditions on our 2-functors still further. Fortunately, another standard categorical notion fits the bill. A lax 2-functor has the same data as a strong 2-functor, but we drop the isomorphism requirement for the 2-cells:

  • \mathsf{Id} \Rightarrow \mathbf{F}(\mathsf{Id}).
  • \mathbf{F}(G) \circ \mathbf{F}(F) \Rightarrow \mathbf{F}(G \circ F).

Notice that now these 2-cells are not required to be invertible, their direction matters. We could equally consider oplax 2-functors, in which these 2-cells go in the other direction. The lax setting is the right one for our needs. We can make the standard observation:

A monad is the same thing as a lax 2-functor from the terminal 2-category.

We have concentrated on strict 2-categories to avoid bookkeeping yet more isomorphisms, but this observation remains true if we move to the bicategorical setting.

Time to Generalise

The point of moving to a functorial perspective is that we have a natural launching point for generalisation. We identified monads in the 2-category \mathbf{C} with lax 2-functors 1 \rightarrow \mathbf{C}. The obvious next step is to consider domains other than the terminal 2-category.

As a first step, we adjust our perspective on the terminal category slightly. Any monoid can be seen as a 1-object category, with morphisms corresponding to elements of the monoid, and composition and identities given by the monoid multiplication and unit respectively. Given any category \mathcal{C}, we can view it as a 2-category, with the 0 and 1 cells given by the objects and morphisms of \mathcal{C}, and the only 2-cells being the necessary identities. We can then build the terminal category as follows:

  1. Consider the terminal monoid. This is the monoid with just a unit element.
  2. View this as an ordinary category.
  3. View the resulting category as a 2-category.

Obviously, we could perform these simple changes of perspective with any monoid (M, \times, 1). This yields the notion of graded monad. A graded monad on a 0-cell \mathcal{C}, graded by M consists of:

  1. A family of 1-cells \mathbb{T}_m : \mathcal{C} \rightarrow \mathcal{C}, indexed by the elements of M.
  2. A unit 2-cell \eta : \mathsf{Id} \Rightarrow \mathbb{T}_1.
  3. Multiplication 2-cells \mu_{m,n} : \mathbb{T}_m \circ \mathbb{T}_n \Rightarrow \mathbb{T}_{m \times n}.

These satisfy some generalisations of the unit and multiplication axioms, compatible with the monoid indexing.

Example: We can consider functors \mathbb{L}_n sending a set X to the set of lists of elements of X of length exactly n. There is an obvious mapping:

\eta_X : X \rightarrow \mathbb{L}_1(X)

sending each element x to the singleton list [x]. There are also multiplication morphisms:

\mu_{m,n,X} : \mathbb{L}_m(\mathbb{L}_n(X)) \rightarrow \mathbb{L}_{m \times n}(X)

given by concatenation. For example:

[[1,2],[3,4],[5,6]] \mapsto [1,2,3,4,5,6]

Together, this data constitutes a graded monad, graded by the natural numbers under multiplication (\mathbb{N}, \times,1). We shall refer to this as the exact list graded monad.
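
The exact list graded monad can be sketched in Python, carrying the grade (the list length) alongside the data. The pair representation is an illustrative assumption.

```python
def unit(x):
    # eta : X -> L_1(X); a singleton list has grade 1.
    return (1, [x])

def mult(graded):
    # mu_{m,n} : L_m(L_n(X)) -> L_{m*n}(X), given by concatenation.
    m, inner = graded
    n = inner[0][0] if inner else 1   # all inner lists share the grade n
    flat = [x for (_, xs) in inner for x in xs]
    return (m * n, flat)

assert unit(7) == (1, [7])
nested = (3, [(2, [1, 2]), (2, [3, 4]), (2, [5, 6])])
assert mult(nested) == (6, [1, 2, 3, 4, 5, 6])
```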

This example gives a general sense of the nature of graded monads, particularly in computer science. The grading allows us to track some quantitative data, allowing us to model resources appropriate to the effects in our application. We mention another, perhaps slightly surprising, degenerate example:

Example: Every 1-cell F : \mathcal{C} \rightarrow \mathcal{C} induces a (\mathbb{N},+,0) graded monad \mathbb{F} on \mathcal{C}, with

\mathbb{F}_n = \overbrace{F \circ \ldots \circ F}^{n \text{ times}}

So in particular, \mathbb{F}_0 = \mathsf{Id}. The unit and multiplications are identity 2-cells.

The exact list example is a bit inflexible, as list lengths are fixed completely. To loosen this up, we introduce a new mathematical object. A preordered monoid is a monoid (M,\times,1) with a preorder on M such that the monoid multiplication is monotone.

Example: The natural numbers under multiplication form a partially ordered monoid under the usual ordering on natural numbers, as:

m_1 \leq m_2 \;\wedge\; n_1 \leq n_2 \;\Rightarrow\; m_1 \times n_1 \leq m_2 \times n_2

Similarly to ordinary monoids, preordered monoids can be viewed as (thin) 2-categories, with a 2-cell between 1-cells m,n whenever m \leq n. We can then generalise further, to monads graded by a preordered monoid. These are the same as monoid-graded monads, with the addition that if m \leq n there is a 2-cell \mathbb{T}_m \Rightarrow \mathbb{T}_n. As usual, these 2-cells must satisfy some compatibility equations, which we will skip over.

Example: There is a second family of list functors \mathbb{L}^{\leq}_n, with the elements of \mathbb{L}^{\leq}_n(X) consisting of lists of elements of X of length at most n. The unit and multiplication maps are given by singletons and concatenation, as in the previous example. If m \leq n, there is a natural transformation:

\mathbb{L}^{\leq}_m \Rightarrow \mathbb{L}^{\leq}_n

promoting lists of shorter length to the longer context. These constitute a (\mathbb{N}, \times, 1, \leq)-graded monad, the bounded list graded monad.
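
The bounded list graded monad can be sketched similarly: grades are upper bounds on length, and the 2-cells for m \leq n simply reinterpret a list bounded by m as one bounded by n. Names are illustrative.

```python
def unit(x):
    # eta : X -> L^<=_1(X).
    return (1, [x])

def promote(n, graded):
    # The 2-cell L^<=_m => L^<=_n for m <= n: the list is unchanged,
    # only the bound is loosened.
    m, xs = graded
    assert m <= n and len(xs) <= m
    return (n, xs)

def mult(m, n, inner):
    # mu_{m,n}: concatenating at most m lists, each of length at most n,
    # yields a list of length at most m * n.
    assert len(inner) <= m and all(len(xs) <= n for xs in inner)
    return (m * n, [x for xs in inner for x in xs])

assert promote(5, unit(7)) == (5, [7])
assert mult(2, 3, [[1, 2], [3, 4, 5]]) == (6, [1, 2, 3, 4, 5])
```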

We can go further, and in the extreme consider monads graded by any 2-category we like.

Conclusion

Graded monads can be seen as a resource aware generalisation of monads, with natural applications in areas such as computer science. For example, there are non-trivial graded state, writer and maybe monads. There is a lot more to explore here, for example:

  1. Developing the mathematics further, for example suitable Kleisli and Eilenberg-Moore constructions.
  2. Discussing applications of graded monads.

As ordinary monads are already a rich topic, unsurprisingly this is more than we can hope to cover properly in a single post. We may return to this subject again later.

This post owes a great deal to exposition in papers and talk slides by Tarmo Uustalu. I would recommend the slides from “Grading monads, comonads, distributive laws” as a good starting point for further reading.

Acknowledgements: Thanks to Ralph Sarkis for picking up a confusing typo in one of the examples.

Bicategories, monoids, internal and enriched categories

Last time, we introduced a more general definition of monads, in the setting of (strict) 2-categories. Unfortunately this strictness is often a bit too rigid to incorporate natural mathematical objects, and this is certainly true for monad theory. This time, we will push our definition of monads beyond this strict setting, into the wider world of bicategories.

Generalising to Bicategories

A bicategory \mathbf{C} consists of:

  1. A collection of 0-cells A,B,\ldots
  2. Between every pair of 0-cells A,B, a category \mathbf{C}(A,B). The objects of these categories are termed 1-cells, and the morphisms 2-cells. Composition of 2-cells is called vertical composition.
  3. For every triple of 0-cells A,B,C, a bifunctor \circ : \mathbf{C}(B,C) \times \mathbf{C}(A,B) \rightarrow \mathbf{C}(A,C), referred to as horizontal composition. For every 0-cell A, there is an identity 1-cell \mathsf{Id}_A in \mathbf{C}(A,A).

Horizontal composition is associative and unital up to isomorphism. That is, there are natural isomorphisms:

  1. A left unitor \lambda : \mathsf{Id} \circ (-) \Rightarrow (-).
  2. A right unitor \rho : (-) \circ \mathsf{Id} \Rightarrow (-).
  3. An associator \alpha : (-) \circ ((-) \circ (-)) \Rightarrow ((-) \circ (-)) \circ (-), between the two different orders in which we can apply horizontal composition twice.

As you might expect, this structure is subject to some coherence equations, entirely analogous to those for a monoidal category.

Example: A (strict) 2-category is a bicategory in which the unitors and associators are identities, so horizontal composition is unital and associative on the nose.

Example: A monoidal category is “the same thing” as a bicategory with one 0-cell. This is often phrased as “A monoidal category is a one object bicategory”.

Considering one direction in more detail, a bicategory \mathbf{C} with a single 0-cell A consists of a single category \mathbf{C}(A,A). We also have:

  1. A horizontal composition bifunctor \mathbf{C}(A,A) \times \mathbf{C}(A,A) \rightarrow \mathbf{C}(A,A). We consider this to be the tensor product of our monoidal category.
  2. The identity 1-cell \mathsf{Id}_A will serve as the monoidal unit.

The coherence axioms for a bicategory in this simple case are exactly those of a monoidal category. Going in the other direction is similar.

We also introduce a couple of more complex bicategories that will be important in subsequent examples.

Example: A span of type X \rightarrow Y in a category \mathcal{C} is a pair of \mathcal{C}-morphisms of the form:

X \xleftarrow{f} A \xrightarrow{g} Y.

If \mathcal{C} has pullbacks, we can form the composite of a span X \xleftarrow{f} A \xrightarrow{g} Y with a span Y \xleftarrow{h} B \xrightarrow{k} Z as:

X \xleftarrow{f \circ \pi_1} A \times_{Y} B \xrightarrow{g \circ \pi_2} Z

where \pi_1, \pi_2 are the two projection maps given by forming the pullback of g along h.

Given two spans X \xleftarrow{f_1} A_1 \xrightarrow{g_1} Y and X \xleftarrow{f_2} A_2 \xrightarrow{g_2} Y a morphism of spans between them is a \mathcal{C}-morphism h : A_1 \rightarrow A_2 such that:

f_2 \circ h = f_1 \quad\text{and}\quad g_2 \circ h = g_1

In fact, for a category \mathcal{C} with pullbacks, there is a bicategory \mathbf{Span}(\mathcal{C}) with:

  1. 0-cells the objects of \mathcal{C}.
  2. \mathbf{Span}(\mathcal{C})(X,Y) consists of all spans of type X \rightarrow Y, and span morphisms between them.
  3. The universal property of pullbacks means composition of spans extends to a bifunctor, which we take to be our horizontal composition.

We are skimming over a size issue here. Readers with the necessary background may want to consider how such questions might creep in to the definition above.
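
For finite sets, the pullback composition of spans can be computed entirely concretely. The following Python sketch (with illustrative names) encodes a span X \leftarrow A \rightarrow Y as an apex together with its two leg functions.

```python
def compose_spans(span1, span2):
    # span1 : X <- A -> Y given as (A, f, g); span2 : Y <- B -> Z as (B, h, k).
    # The composite apex is the pullback {(a, b) | g(a) = h(b)}.
    A, f, g = span1
    B, h, k = span2
    apex = [(a, b) for a in A for b in B if g(a) == h(b)]
    left = lambda ab: f(ab[0])     # f . pi_1
    right = lambda ab: k(ab[1])    # k . pi_2
    return apex, left, right

# Legs of two small spans, encoded as dictionaries.
f, g = {0: "x"}, {0: "y"}
h, k = {1: "y", 2: "z"}, {1: "u", 2: "v"}

apex, left, right = compose_spans(([0], f.get, g.get), ([1, 2], h.get, k.get))
assert apex == [(0, 1)]                       # only g(0) = h(1) = "y" matches
assert left((0, 1)) == "x" and right((0, 1)) == "u"
```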

Example: Let \mathcal{V} be a monoidal category with coproducts. For sets A,B, we can consider \mathcal{V}-valued matrices of type A \rightarrow B to be functions:

M : A \times B \rightarrow \mathsf{obj}(\mathcal{V})

Given a pair of matrices M : A \rightarrow B and N : B \rightarrow C, we can form their composite as follows:

(N \circ M)(a,c) = \coprod_{b \in B} M(a,b) \otimes N(b,c)

Given a pair of matrices M_1, M_2 : A \rightarrow B, a morphism of matrices is a family of \mathcal{V} morphisms:

f_{a,b} : M_1(a,b) \rightarrow M_2(a,b)

This data can be combined into a bicategory \mathbf{Mat}(\mathcal{V}) with:

  1. 0-cells sets.
  2. The category \mathbf{Mat}(\mathcal{V})(A,B) consists of matrices of type A \rightarrow B and morphisms between them.
  3. Horizontal composition is given by composition of matrices, as described above.
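
Taking \mathcal{V} to be the Booleans, with “or” as the coproduct and “and” as the tensor, matrix composition specialises to ordinary relation composition. A small Python sketch, with illustrative names:

```python
def compose(M, N, A, B, C):
    # (N . M)(a, c) = OR over b in B of (M(a, b) AND N(b, c)).
    return {(a, c): any(M.get((a, b), False) and N.get((b, c), False)
                        for b in B)
            for a in A for c in C}

A, B, C = [0, 1], ["x", "y"], ["p"]
M = {(0, "x"): True, (1, "y"): True}   # a Boolean matrix A -> B
N = {("x", "p"): True}                 # a Boolean matrix B -> C

NM = compose(M, N, A, B, C)
assert NM[(0, "p")] is True            # 0 relates to "p" via "x"
assert NM[(1, "p")] is False           # no path from 1 through "y" to "p"
```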

The two previous examples are typical for bicategories, in that horizontal composition involves either limits or colimits. In such cases, if we form iterated horizontal composites in different orders, the universal properties involved will ensure they agree up to isomorphism. Insisting that composites formed in different orders agree on the nose would lead to unrealistic requirements on our choice of (co)limits, so the bicategorical rather than strict 2-categorical setting is more natural.

Now we’ve seen a few different bicategories, it’s time to get back to our main motivation, monads. A monad in a bicategory \mathbf{C} consists of:

  1. A 0-cell \mathcal{C}.
  2. A 1-cell \mathbb{T} : \mathcal{C} \rightarrow \mathcal{C}.
  3. A unit 2-cell \eta: \mathsf{Id}_{\mathcal{C}} \Rightarrow \mathbb{T}.
  4. A multiplication 2-cell \mu: \mathbb{T} \circ \mathbb{T} \Rightarrow \mathbb{T}.

As before, our notation is suggestive of that for “ordinary” monads on categories. Notice the data above is no different to that which we encountered in the strict 2-categorical setting. Things get more interesting when we consider the coherence equations that need to hold, as now we must account for the fact that horizontal composition is not unital or associative on the nose, but only up to isomorphism. The required equations are:

  1. \mu \cdot (\eta \circ \mathsf{id}_{\mathbb{T}}) = \lambda.
  2. \mu \cdot (\mathsf{id}_{\mathbb{T}} \circ \eta) = \rho.
  3. \mu \cdot (\mu \circ \mathsf{id}_{\mathbb{T}}) \cdot \alpha = \mu \cdot (\mathsf{id}_{\mathbb{T}} \circ \mu).

Firstly, we see that this really does generalise our previous definition.

Example: As we would expect, a monad in a 2-category is a special case of a monad in a bicategory. The unitors and associators are identities, and the equations above collapse to those of the 2-categorical case.

We also find a fun simple source of monads that would not have made sense before:

Example: We can view any monoidal category \mathcal{V} as a one object bicategory. A monoid in \mathcal{V} is the same thing as a monad in this one object bicategory. For example, every group or monoid can be seen as a monad. There is a certain symmetry here:

  1. In a bicategory \mathbf{C}, every monad on \mathcal{A} is a monoid in the monoidal category \mathbf{C}(\mathcal{A},\mathcal{A}).
  2. A monoid in a monoidal category is a monad in the corresponding one object bicategory.

We now move on to some more interesting examples, which really highlight the benefits afforded by the more liberal bicategorical definition.

Example: A (small) category consists of a set of objects O and a set of morphisms M. There are two functions giving the source and target of a given morphism, forming a span:

O \xleftarrow{s} M \xrightarrow{t} O

The identities induce a morphism from the identity span to this span. If we form the pullback of s along t, the object M \times_O M consists of all composable pairs of morphisms, and there is a span morphism from the resulting span to the one defined above, given by the function mapping composable pairs to their composite.

Given a category \mathcal{C} with pullbacks, we can define an internal category in \mathcal{C} to consists of the analogous structure. A (small) category is then an internal category in \mathsf{Set}.

The key observation for our purposes is that an internal category in \mathcal{C} is the same thing as a monad in \mathbf{Span}(\mathcal{C}). The monad unit and multiplication correspond to the identities and composition respectively. The rest is “just” an exercise in unravelling definitions.

Example: For a monoidal category \mathcal{V} with coproducts, a \mathcal{V}-enriched category \mathcal{A} consists of:

  1. A set of objects O.
  2. For each pair of objects o_1, o_2 a hom-object in \mathcal{V}. We can view this as a \mathcal{V}-valued matrix A : O \rightarrow O.
  3. Morphisms j_o : \mathcal{I} \rightarrow A(o,o) picking out the identities. These form a morphism of matrices from the identity matrix to A.
  4. Composition morphisms m_{o_1, o_2, o_3} : A(o_2,o_3) \otimes A(o_1,o_2) \rightarrow A(o_1, o_3), which via the universal property of coproducts induce a morphism \coprod_{o_2 \in O} A(o_1, o_2) \otimes A(o_2, o_3) \rightarrow A(o_1,o_3). Together, these form a morphism of matrices of type A \circ A \rightarrow A.

A more careful unpacking of the definitions following these intuitions shows that a \mathcal{V}-category is the same thing as a monad in \mathbf{Mat}(\mathcal{V}).

As a special case of this construction, the Booleans can be considered as a thin, two-object monoidal category \mathcal{B}. The bicategory \mathbf{Mat}(\mathcal{B}) is equivalent to the bicategory of relations \mathbf{Rel} we encountered last time. The Boolean-enriched categories are then exactly preorders.

Conclusion

By considering a further generalisation of the notion of monad to the weak setting of bicategories, many new interesting examples become possible. The examples of internal and enriched categories are well-known, and provide the intuitions for other, more elaborate constructions.

A categorically minded reader should be asking themselves the following question. We have identified internal and enriched categories as monads in certain bicategories, what about the morphisms between them? Do internal and enriched functors correspond to an obvious notion of morphism between these monads? The answer to this question is a bit more subtle than one might hope. We may return to this point in a later post.

More background on bicategories, and the monads corresponding to internal and enriched categories can be found in Lack’s “A 2-Categories Companion”, which is highly recommended reading.

First steps into a larger world of monads

Until now, we have taken the definition of monad to be:

  • A category \mathcal{C}.
  • An endofunctor \mathbb{T} : \mathcal{C} \rightarrow \mathcal{C}.
  • A unit natural transformation \eta : \mathsf{Id} \Rightarrow \mathbb{T}.
  • A multiplication natural transformation \mu : \mathbb{T}^2 \Rightarrow \mathbb{T}.

satisfying some compatibility equations. In this post, we will take a first step in generalising this definition to include a much broader range of mathematical objects. Our original definition arises as a special case, and interesting new examples come into view.

Generalising to 2-categories

A 2-category is an abstraction of the fundamental mathematical structure of categories, functors and natural transformations. Note that some more modern accounts refer to these as strict 2-categories, reserving the term 2-category for a more general notion, although this terminology is still not in consistent use. We shall occasionally use the adjective strict for emphasis.

A (strict) 2-category has three key building blocks:

  1. 0-cells, which we shall denote \mathcal{C}, \mathcal{D}, \ldots.
  2. Between any pair of 0-cells \mathcal{C}, \mathcal{D}, there is a collection of 1-cells of type \mathcal{C} \rightarrow \mathcal{D}. We shall denote 1-cells as F,G,\ldots. For any 0-cell \mathcal{C}, there is a distinguished identity 1-cell \mathsf{Id}_{\mathcal{C}} : \mathcal{C} \rightarrow \mathcal{C}.
  3. Between any pair of 1-cells F, G : \mathcal{C} \rightarrow \mathcal{D} of the same type, there is a collection of 2-cells of type F \Rightarrow G. We shall denote 2-cells as \alpha, \beta, \ldots. For any 1-cell F, there is a distinguished identity 2-cell \mathsf{id}_{F} : F \Rightarrow F.

These components can be composed in various ways:

  • For any pair of 1-cells F : \mathcal{A} \rightarrow \mathcal{B} and G : \mathcal{B} \rightarrow \mathcal{C}, there is a composite 1-cell G \circ F : \mathcal{A} \rightarrow \mathcal{C}.
  • For 1-cells F,G : \mathcal{A} \rightarrow \mathcal{B} and H, K : \mathcal{B} \rightarrow \mathcal{C}, and 2-cells \alpha : F \Rightarrow G and \beta : H \Rightarrow K, there is a horizontal composite 2-cell \beta \circ \alpha : H \circ F \Rightarrow K \circ G.
  • For any triple of 1-cells F,G,H : \mathcal{A} \rightarrow \mathcal{B}, and 2-cells \alpha : F \Rightarrow G and \beta : G \Rightarrow H, there is a vertical composite 2-cell \beta \cdot \alpha : F \Rightarrow H.

These composition operations satisfy various unitality, associativity and other compatibility equations, such that the collections of 1-cells, and the 2-cells between them, form categories, and horizontal composition is a bifunctor. The details are readily available elsewhere, so I shall avoid spelling out the full formal definition.

Example: Categories, functors and natural transformations form a 2-category \mathbf{Cat}, and in fact this is the motivating example. Strictly speaking, we should say small categories, functors and natural transformations to avoid problems akin to Russell’s paradox. As usual, we shall not dwell on such size issues.

Example: There is a 2-category \mathbf{Rel} with:

  1. 0-cells sets.
  2. 1-cells of type A \rightarrow B relations R \subseteq A \times B.
  3. There is a 2-cell R \Rightarrow S precisely when R \subseteq S.

The identity 1-cells are the usual notion of identity relation:

\mathsf{Id}(a_1,a_2) \Leftrightarrow (a_1 = a_2)

This sort of 2-category is termed thin as there is at most one 2-cell between a pair of 1-cells. That is, the collections of 1-cells, and 2-cells between them form preorders.

With the definition of 2-category in place, the generalisation of the notion of monad is pretty routine. We simply replace category, functor and natural transformation with 0-cell, 1-cell and 2-cell in our original definition.

A monad in a 2-category consists of:

  1. A 0-cell \mathcal{C}
  2. A 1-cell \mathbb{T} : \mathcal{C} \rightarrow \mathcal{C}.
  3. A unit 2-cell \eta : \mathsf{Id} \Rightarrow \mathbb{T}.
  4. A multiplication 2-cell \mu : \mathbb{T}^2 \Rightarrow \mathbb{T}.

This data must satisfy the following three equations:

\mu \cdot (\eta \circ \mathsf{Id}_{\mathbb{T}}) = \mathsf{Id}_{\mathbb{T}} \qquad \mu \cdot (\mathsf{Id}_{\mathbb{T}} \circ \eta) = \mathsf{Id}_{\mathbb{T}} \qquad \mu \cdot (\mu \circ \mathsf{Id}_{\mathbb{T}}) = \mu \cdot (\mathsf{Id}_{\mathbb{T}} \circ \mu)

We recover our original notion of monad as a special case.

Example: Monads in the 2-category \mathbf{Cat} are exactly our original notion of monads.

We can also now talk about further examples, that would not even have made sense before.

Example: A monad on a set A in the 2-category \mathbf{Rel} is a relation R : A \rightarrow A such that:

  1. \mathsf{Id} \subseteq R.
  2. R \circ R \subseteq R.

Unpacking this into a pointwise definition, this is a relation R such that:

  1. For all a \in A, R(a,a).
  2. If R(a_1,a_2) and R(a_2,a_3) then R(a_1,a_3).

That is, a monad on A in \mathbf{Rel} is a preorder on A.
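As a quick sanity check of this correspondence, we can test the two monad conditions mechanically for concrete relations (a minimal sketch; the helper name is our own):

```python
def is_monad_in_rel(elements, R):
    """Check the two monad conditions for a relation R on a set:
    Id ⊆ R (reflexivity) and R ∘ R ⊆ R (transitivity)."""
    reflexive = all((a, a) in R for a in elements)
    transitive = all(
        (a, c) in R
        for (a, b) in R
        for (b2, c) in R
        if b == b2
    )
    return reflexive and transitive

A = {1, 2, 3}
leq = {(a, b) for a in A for b in A if a <= b}  # the usual ordering: a preorder
succ = {(1, 2), (2, 3)}                          # neither reflexive nor transitive

print(is_monad_in_rel(A, leq))   # True
print(is_monad_in_rel(A, succ))  # False
```

The `leq` relation is a monad in \mathbf{Rel} precisely because it is a preorder, while `succ` fails both conditions.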

A variation on the previous example yields a classical observation of Lawvere, in a monadic guise.

Example: Let \mathbb{R}^{\infty} denote the non-negative real numbers, extended with a maximal element \infty. We can extend addition and multiplication to \mathbb{R}^{\infty} by setting:

x + \infty = \infty = \infty + x \qquad\text{and}\qquad x \times \infty = \infty =\infty \times x

There is a 2-category \mathbf{Rel}_{\mathcal{L}} with:

  1. 0-cells sets.
  2. 1-cells of type A \rightarrow B are functions R : A \times B \rightarrow \mathbb{R}^{\infty}. We think of these as positive real valued relations, with R(a,b) giving a distance between a and b.
  3. There is a 2-cell R \Rightarrow S if for all a,b, R(a,b) \geq S(a,b). Note the direction of ordering here.

As this 2-category is thin, it only remains to specify the identity and composition of 1-cells. The identity 1-cells generalise the identity relations we saw in the earlier example:

\mathsf{Id}(a_1,a_2) = \begin{cases} 0, &\text{if } a_1 = a_2 \\ \infty, &\text{otherwise} \end{cases}

In terms of distances, an element a is zero distance from itself, and infinitely far away from everything else.

For the composition of R : A \rightarrow B and S : B \rightarrow C, again we generalise composition of relations from the previous example:

(S \circ R)(a,c) = \bigwedge_{b \in B} R(a,b) + S(b,c)

In terms of our distance based intuition, if R gives distances between A and B, and S gives distances between B and C, then (S \circ R)(a,c) gives the shortest distance between a and c travelling in two steps:

  1. Firstly moving from a to some element b \in B using the distances specified by R.
  2. Secondly, moving from the chosen element b \in B to c using the distances specified by S.
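This shortest-route reading of composition is easy to sketch in code, modelling \infty by `math.inf` and taking the infimum over an empty set to be \infty (the helper names are our own):

```python
import math

def compose(R, S, B):
    """Compose distance-valued relations:
    (S ∘ R)(a, c) = the infimum over b in B of R(a, b) + S(b, c)."""
    def SR(a, c):
        return min((R(a, b) + S(b, c) for b in B), default=math.inf)
    return SR

# R: distances from {'a'} to {'b1', 'b2'}; S: distances from {'b1', 'b2'} to {'c'}
R = lambda a, b: {'b1': 3.0, 'b2': 1.0}[b]
S = lambda b, c: {'b1': 2.0, 'b2': 10.0}[b]

SR = compose(R, S, ['b1', 'b2'])
print(SR('a', 'c'))  # shortest two-step route: min(3 + 2, 1 + 10) = 5.0
```

Note that the cheaper first step via `b2` is not part of the shortest overall route: the infimum is taken over complete two-step journeys.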

Now we have specified our setting, we can consider what the monads are. A monad on a set A in \mathbf{Rel}_{\mathcal{L}} consists of a function D : A \times A \rightarrow \mathbb{R}^{\infty} such that there are 2-cells:

  1. \mathsf{Id} \Rightarrow D
  2. D \circ D \Rightarrow D

It will help to expand these to pointwise definitions, and make the ordering between elements explicit. We find that:

  1. D(a,a) = 0
  2. D(a_1,a_2) \leq \bigwedge_{a \in A} D(a_1, a) + D(a, a_2). This is equivalent to requiring for all a_1, a_2, a that the triangle inequality D(a_1, a_2) \leq D(a_1,a) + D(a, a_2) holds.

So the monads in \mathbf{Rel}_{\mathcal{L}} are generalised metric spaces in which we permit the following extra flexibility:

  1. Distinct points may be at zero distance from each other.
  2. Distances are not necessarily symmetric, that is we may have D(a,b) \neq D(b,a).

The 2-category of the previous example was built by considering relations over generalised truth values. Varying this idea yields many other interesting examples.

Our final example is a natural generalisation of \mathbf{Cat}.

Example: We encountered enriched categories briefly when discussing strong monads. For a monoidal category \mathcal{V}, there are \mathcal{V}-categories, \mathcal{V}-functors and \mathcal{V}-natural transformations. These form a 2-category \mathcal{V}-\mathbf{Cat}, and the monads in this 2-category yield exactly the definition of \mathcal{V}-monad that is considered in enriched category theory. This sort of example is encouraging evidence for the usefulness of the abstract definition of monad, in that it recovers mathematical constructions considered in settings beyond ordinary category theory.

Conclusion

We have taken a first step in generalising the definition of monad to the setting of (strict) 2-categories. This allowed us to rediscover some interesting new examples, such as preorders and generalised metric spaces being monads. Unfortunately, some natural examples we would like to cover are still out of reach as the strict 2-categorical setting is too rigid. To get our hands on many further examples, we shall take a second step, and generalise the definition of monad to bicategories in a later post.

Algebra, Order and Monads

In the previous post, we introduced preorders, posets and monotone maps between them. We then investigated a small number of monads on these categories of ordered structures. The additional order relations enable interesting new structure on \mathsf{Pos}-monads that we hadn’t encountered before with our main example of \mathsf{Set}-monads.

In this post, we will sketch how the algebraic perspective that is so instructive with \mathsf{Set} monads can be generalised to monads on \mathsf{Pos}. Our discussions will be relatively informal, aiming to highlight the additional aspects of algebra in the land of ordered structures that are of interest. The details involve some fairly technical ideas, which are beyond the scope of what we have covered so far. Interested readers will find some pointers at the end of the post.

Relatives of the Maybe Monad

In the previous post, we introduced the “obvious” generalisation of the maybe monad from sets to posets, with functor component:

X \mapsto X + 1

To do so, we mimicked the construction used in \mathsf{Set} in terms of binary coproducts and terminal objects. Our current aim is to understand such constructions from an algebraic perspective, whatever that means now we’ve moved from sets to posets.

To try and get an algebraic handle on \mathsf{Pos}-monads, we are going to assume we need a collection of operations \Sigma, and a collection of equations E, as was the case for ordinary algebra.

In fact, for the maybe monad, this is already sufficient, so we don’t need anything new. We assume \Sigma consists of a single new constant symbol \star, and we require no equations. As we only have a constant symbol, the only terms we can form over X are either:

  1. Variables from X.
  2. The constant symbol \star.

As there are no equations, we don’t need to worry about forming equivalence classes. If we assume variables inherit an ordering from X, and no other non-trivial order relations hold, the resulting poset is isomorphic to

X + 1

which is exactly what we hoped for. So far, so easy. A small additional step will take us into less familiar territory. Rather than generalising the maybe monad, instead we can consider the exception monad induced by some poset E. This will have functor action:

X \mapsto X + E

Recall that the coproduct of posets inherits the order from the two components, and no other non-trivial order relations hold.

Algebraically, the \mathsf{Set} exception monad corresponded to adding a set E of new constant symbols. If we do this in \mathsf{Pos}, and form terms over X as we did above for the maybe monad, we will get a poset with an underlying set consisting of:

  1. Variables from X.
  2. Constant symbols e \in E.

The only non-trivial order relations will be those between the variables inherited from X. This isn’t quite general enough for what we want, as we want to allow E to be a poset, not just a mere set. It is reasonably clear we cannot achieve the desired effect, even if we add in some equations. So we shall need the first new feature of algebra on posets:

Operation symbols of the same arity form a poset.

If we were to provide some formal syntax, in our example, we are allowing ourselves to introduce relations

e_1 \leq e_2

between the constant symbols representing exceptions. More generally, if we have two operation symbols o_1, o_2 of the same arity, and o_1 \leq o_2, then we have the following axiom between terms:

o_1(x_1,\ldots,x_n) \leq o_2(x_1,\ldots,x_n)

Last time, we introduced another relative of the maybe monad, the lift monad. Here, we map a poset X to a poset with a new element \star added below all of the elements of X. It doesn’t seem we can achieve the desired effect by ordering our operation symbols, as we need to add an inequation of the form:

\star \leq x

between an operation symbol and a bare variable. It is tempting to conclude that for algebra between posets, we need to permit inequations rather than mere equations between terms. In fact, this seems mathematically very natural, but let’s not be too hasty.

We can’t achieve the desired effect directly by ordering our operation symbols, as we currently only have the constant symbol \star. Nobody said we can’t add additional operation symbols, so let’s be a bit more creative, and add two unary symbols l (for left hand side) and r (for right hand side), and require that l \leq r. Finally, we require the equations:

l(x) = \star and r(x) = x

Then we can conclude that:

\star = l(x) \leq r(x) = x

If we form equivalence classes of terms, ordering them according to these axioms, the resulting construction yields the lift monad. The upshot of this procedure is that

We can use inequations between terms in poset algebra, but this is achieved using ordering between operation symbols, and equations between terms.

So from this perspective, poset algebra is still fundamentally about equations, not inequations, which is possibly a bit of a surprise.

Relatives of the List Monad

Previously, in the case of \mathsf{Set}-monads, we have seen that the list monad is the free monoid monad. That is, we have a constant symbol 1, and a binary operation symbol \times, such that the following three equations hold:

1 \times x = x = x \times 1 and (x \times y) \times z = x \times (y \times z).

We don’t need to consider ordering between our operation symbols, as they are of different arities. We can form equivalence classes of terms, as we would for the set theoretic construction, which correspond to the lists of variable symbols that appear in the terms. The key question is what non-trivial order relations we should add between terms. As we’ve seen before, for terms over a poset X, we should always inherit the order relations between variables. Now we have a non-constant operation symbol \times, we have two obvious choices about how this operation interacts with the order structure of its arguments:

  1. Consider the multiplication operation \times to be monotone.
  2. Consider the multiplication operation \times to be an arbitrary function.

If we take the first option, the resulting monad is the list monad, with lists of the same length ordered pointwise. This is the monad \mathbb{L}_2 of the previous post. If on the other hand we take the less restrictive route of the second option, we get a different list monad, in which only the singleton lists are ordered with respect to each other if their contained elements are. This is the monad we called \mathbb{L}_1 in the previous post. These observations apply in full generality, and lead us to our second observation about poset algebra:

We have the right, but not the obligation, to require that every operation is monotone.

So in fact, there are two different notions of poset algebra we can consider, depending on this choice. It is perhaps slightly surprising at first glance that allowing non-monotone operations is a viable option. One choice corresponds to \mathsf{Pos}-enriched monad theory, and the other to \mathsf{Set}-enriched monad theory, but that will take us further into technically challenging material than is appropriate with what we know so far.
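To make the contrast between the two list monads concrete, here is a sketch of the two order relations on lists over the poset of natural numbers (the function names are our own):

```python
def leq_pointwise(xs, ys):
    """The ordering of L2: lists of the same length, compared pointwise."""
    return len(xs) == len(ys) and all(x <= y for x, y in zip(xs, ys))

def leq_singleton(xs, ys):
    """The ordering of L1: only reflexivity plus the singleton axiom,
    so only singleton lists are compared non-trivially."""
    if xs == ys:
        return True
    return len(xs) == 1 and len(ys) == 1 and xs[0] <= ys[0]

print(leq_pointwise([0, 3], [1, 4]))  # True: ordered pointwise
print(leq_singleton([0, 3], [1, 4]))  # False: only singletons are compared
print(leq_singleton([0], [1]))        # True: forced by the singleton axiom
```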

Obviously, we can impose other relations such as:

x \leq x \times y and x \times y \leq y

using the ability to encode inequations as axioms discussed above. These additional inequations yield two more of the list monad variants from the previous post. We also note that in general, our ability to encode inequations lets us require that some, but not all of our operations are monotone.

Finally, we need to consider the ordered list monads that we discussed last time. Recall, these were monads in which we only allow rising lists of the form:

[x_1,\ldots,x_n]

where each consecutive pair of elements are ordered, that is x_i \leq x_{i + 1}.

These monads seem to involve some sort of phenomenon that we’ve not encountered so far. It doesn’t seem possible to encode such a monad using the features we’ve introduced previously. We need something new. This leads us to possibly the most exotic feature of poset algebra.

Intuitively, the problem we have is that if we consider a binary operation \times, there’s no restriction on the arguments we can apply it to, for example we can form

x \times y

where x is strictly greater than y. This is because for ordinary set algebra, an operation is a map:

\overbrace{X \times \ldots \times X}^{n\rm\ times} \rightarrow X

and we have so far continued with this approach to operation arities in the order theoretic context. To generalise things, we observe that if we write [n] for the set \{ 1, \ldots, n \}, this is equivalent to a function:

X^{[n]} \rightarrow X

where X^{[n]} is the exponential, the set of all functions from [n] to X. In fact, such an operation is equivalent to one of the form:

X^A \rightarrow X

where A is any n-element set. So operations in set can be seen as having arities finite sets. If we generalise this point of view to posets, an operation should be a map:

X^A \rightarrow X

where A is a finite poset. This gives us more freedom than before, as finite posets can specify non-trivial ordering between arguments, as the exponential X^A contains only monotone maps. For our list based construction, we can require a \times operation with arity the two element poset \{ x \leq y \}. This is an operation that maps ordered pairs to a new element.

In terms of syntax, this means we should only be able to form the term x \times y if we have established that x \leq y.
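In code, an operation whose arity is the two-element poset \{ x \leq y \} behaves like a smart constructor that refuses unordered arguments. This is purely an illustrative sketch of the syntactic restriction, with our own names:

```python
def times(x, y):
    """An operation with arity the poset {x <= y}: the term x * y
    may only be formed when the arguments are suitably ordered."""
    if not x <= y:
        raise ValueError("x * y is only a well-formed term when x <= y")
    return ("times", x, y)

print(times(1, 4))  # a legal term: the arguments are ordered
try:
    times(4, 1)     # ill-formed: the arguments are not ordered
except ValueError as e:
    print("rejected:", e)
```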

Combining these general arities with some of the previous tricks allows us to specify monads that deal with ordered lists. This leads us to our third and final observation about poset algebra:

Operations have arities which are finite posets.

This is a feature we might not have anticipated in advance, but in fact it does fall out of some (rather technical) general categorical arguments, involving locally finitely presentable categories.

Conclusion

We have sketched some features we might encounter when dealing with monads and algebra in the context of posets. We encountered three new features beyond what we see in the setting of ordinary universal algebra.

  1. Operation symbols of the same arity can be ordered with respect to each other.
  2. We have the option to require that all our operations are monotone, but this is not compulsory.
  3. The arities of operations that we can consider are not mere numbers. Instead they are specified by finite posets imposing constraints on how the arguments are ordered.

We also discovered we didn’t need to impose inequations between terms as a new feature, as these can be simulated using the features above.

These generalisations follow a pattern that continues for appropriate notions of algebra in other categorical settings. A good starting point for further background is Robinson’s “Variations on Algebra”, which is written in an enjoyable expository style, whilst being much more precise on the details than this post. Important technical background can be found in Kelly and Power’s “Adjunctions whose counits are coequalizers, and presentations of finitary enriched monads”. A lot more detail about the right notion of algebra in these generalised settings can be found in the many papers on Lawvere theories of Power and co-authors. The papers “Presentations and algebraic colimits of enriched monads for a subcategory of arities” and “Diagrammatic presentations of enriched monads and varieties for a subcategory of arities” by Lucyshyn-Wright and Parker may also be of interest.

Although there is a great deal of literature about generalised algebra, particularly at the level of Lawvere theories, there is a bit of a gap in the literature. Developing precise syntax and proof systems for these generalised equational logics in specific cases of interest is rarely worked out in detail, particularly when generalised arities are involved. This is a shame, as the concrete tools of equational presentations and equational logic are tremendously useful tools in the set theoretic setting. Detailed meta-theoretic results for such systems, at a high level of abstraction, can be found in the work of Fiore and co-authors, but applying these methods to a specific setting is still challenging.

Acknowledgements: Thanks to Jason Parker for pointing out the two Lucyshyn-Wright and Parker papers as interesting references.

Getting Our Examples In Order

Most of the previous posts have focussed on \mathsf{Set}-monads and their algebraic presentations. This was done to avoid pulling in lots of abstract machinery, whilst allowing us to still build up good intuitions about what is going on.

Unfortunately, some phenomena only truly reveal themselves when we move away from the category of sets and functions. To develop our understanding, we need additional proving grounds for the ideas we discuss. In the current post, we shall begin discussing monads on ordered structures. This is a baby step away from \mathsf{Set}, where new possibilities emerge, but we still work with simple mathematical objects that are relatively easy to describe and visualise.

Basics of Posets and Preorders

Posets and preorders are two related mathematical structures that abstract the idea of an ordered collection of objects. A preorder P consists of a set, which we shall also denote P, and a reflexive, transitive binary relation \leq on P. That is, a relation satisfying both:

  • Reflexivity: p_1 \leq p_1.
  • Transitivity: p_1 \leq p_2 and p_2 \leq p_3 implies p_1 \leq p_3.

A poset is a preorder for which the relation \leq is also antisymmetric. That is:

p_1 \leq p_2 \;\wedge\; p_2 \leq p_1 \;\Rightarrow\; p_1 = p_2

Example: The set \mathbb{N} of natural numbers is a poset, with their usual ordering.

Example: For any set X, the powerset \mathcal{P}(X) is a poset, with the subset inclusion relation.

Example: For any set X, the finite powerset \mathcal{P}_\omega(X) can be ordered with U \leq V if the number of elements of U is less than or equal to the number of elements of V. This gives the finite powerset of X the structure of a preorder, but not a poset, as both \{x \} and \{ y \} are finite subsets of \{ x, y \} with the same number of elements, but they are not equal.

A morphism of preorders or posets h : (P, \leq_P) \rightarrow (Q, \leq_Q) is a function h : P \rightarrow Q such that:

p_1 \leq_P p_2 \Rightarrow h(p_1) \leq_Q h(p_2)

Such functions are referred to as monotone. The categories of preorders and monotone maps, and partial orders and monotone maps, will be denoted \mathsf{Pre} and \mathsf{Pos} respectively.

As well as considering the categories of preorders and posets, it is worth thinking about these objects as special sorts of categories themselves. From this perspective:

  • A preorder is a category in which there is at most one morphism between a pair of objects. Such a category is sometimes described as being thin.
  • A poset is a thin category in which the only isomorphisms are the identities.

(Strictly speaking we should say that preorders and posets are small categories, but we’ll steer clear of discussing size issues.)

Remark: The fact that preorders and posets are “baby” categories is a useful tool for studying category theory. Such examples are often introduced very early in introductory category theory courses, but are nevertheless powerful aides to understanding. If a categorical construction is proving hard to grasp in full generality, considering it on preorders will mean that any valid diagrams will automatically commute. This means that any equational axioms will be satisfied, once you’ve identified morphisms of the right type. Even simpler, if you consider the special case of posets, any structural isomorphisms, for example those that occur in the definition of a monoidal category, will collapse to being equalities, so again lots of things become trivial. These simplifications can remove a lot of distractions whilst playing around with unfamiliar constructions, or trying to tease out the correct abstraction of a phenomenon of interest.

Relatives of the Maybe Monad

Previously, we introduced the maybe monad on sets and functions. Recall the functor action maps a set X to the set X + 1, with an extra element added. Concretely, this operation is the coproduct with (a choice of) terminal object, so up to isomorphism the monad action is the disjoint union:

X + 1 = X \uplus \{ \star \} =  \{ (1, x) \mid x \in X \} \cup \{ (2,\star) \}

If we think of this as a construction involving binary coproducts and terminal objects, we can define a maybe monad in any category where this structure exists. In the category \mathsf{Pos} the coproduct (P, \leq_P) + (Q, \leq_Q) has underlying set P \uplus Q, and order relation x \leq y if and only if either:

  • x = (1, p_1) and y = (1, p_2) and p_1 \leq_P p_2.
  • x = (2, q_1) and y = (2, q_2) and q_1 \leq_Q q_2.

That is, we just glue together the two components, and inherit the order from the two parts.

The terminal object in \mathsf{Pos} is also similar to that in \mathsf{Set}. The underlying set is \{ \star \}, and the order relation is the inevitable \star \leq \star.

With these concrete constructions in place, on objects the maybe monad in \mathsf{Pos} simply adds an extra element to a poset, unrelated with respect to the elements of the original structure.

From a computational point of view, it is sometimes useful to interpret an order relation on a structure as an information or approximation ordering. In this way, p_1 \leq p_2 is interpreted as p_1 being less informative than p_2.

If we adopt this point of view, the maybe monad is not entirely satisfactory when interpreting \star as computation failure or divergence. This should yield no information, and therefore be a minimal element, rather than simply incomparable with everything else.

This suggests considering a construction on a poset (P, \leq_P) with universe P \uplus \{ \star \} as before. The order relation extends that of the maybe monad with the additional relations:

(2, \star) \leq (1,p)

for all p \in P. That is, \star is now placed at the bottom of the resulting structure. In fact, this yields a new monad on \mathsf{Pos}, often termed the lift monad. The name is derived from the idea that the original structure is lifted up above the new element. Notice how this additional construction doesn’t even make sense on mere sets, so the presence of order allows us to introduce new features – this will be a recurring theme.
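The difference between the maybe and lift monads on \mathsf{Pos} comes down entirely to the order relation on the new element. A minimal sketch, representing the added element \star by `None` (an encoding choice of ours):

```python
STAR = None  # the freshly added element

def leq_maybe(x, y, leq):
    """Maybe monad order: star is related only to itself."""
    if x is STAR or y is STAR:
        return x is STAR and y is STAR
    return leq(x, y)

def leq_lift(x, y, leq):
    """Lift monad order: star sits below every element."""
    if x is STAR:
        return True
    if y is STAR:
        return False
    return leq(x, y)

nat_leq = lambda a, b: a <= b
print(leq_maybe(STAR, 5, nat_leq))  # False: star is incomparable with 5
print(leq_lift(STAR, 5, nat_leq))   # True: star is a bottom element
```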

Relatives of the List Monad

The list monad is such a standard example that we should investigate list like constructions in our new setting. For a set X, we shall write L(X) for the set of lists of elements from X.

For a poset (P, \leq_P), we would like to build a monad \mathbb{L} such that \mathbb{L}(P, \leq) has underlying set L(P). We shall also require the unit morphism

\eta_{(P,\leq_P)} : (P, \leq_P) \rightarrow \mathbb{L}(P, \leq_P)

to map elements to singleton lists p \mapsto [p] as usual, and the multiplication to be list concatenation.

As the unit morphism must be a monotone map, we then must have for all p_1, p_2 such that p_1 \leq p_2:

[p_1] \leq [p_2]

in \mathbb{L}(P,\leq). We shall call this the singleton axiom. By reflexivity, we also need for every list l that l \leq l. In fact, these are all the relations we need, and the resulting construction yields a monad, which we shall denote \mathbb{L}_1 for convenience.

So we have found a list monad structure on \mathsf{Pos}, but it’s a bit ugly. If we think of these lists as data structures, why would we only order singleton lists? This seems a bit unnatural. Instead, it might be more natural for lists of the same length [p_1,\ldots,p_n] and [q_1,\ldots,q_n] to have

[p_1,\ldots,p_n] \leq [q_1,\ldots,q_n]

if

p_1 \leq_P q_1 \; \wedge\; \ldots \; \wedge \; p_n \leq_P q_n

We shall call this the pointwise ordering axiom. Adding pointwise ordering to the construction \mathbb{L}_1 yields a second monad, which we shall denote \mathbb{L}_2.

We could order lists based on their endpoints. So p_n \leq q_1 implies:

[p_1, \ldots, p_n] \leq [q_1,\ldots,q_m]

We shall call this the endpoints axiom. Adding this axiom to the constructions for \mathbb{L}_1 or \mathbb{L}_2 yields two more monads \mathbb{L}_3 and \mathbb{L}_4.

Varying things in another direction, we could restrict our attention to lists

[p_1, \ldots, p_n]

such that for all consecutive elements p_i, p_{i + 1}

p_i \leq_P p_{i + 1}

Clearly, we must still require the singleton axiom, and the relations implied by reflexivity. In fact, this construction yields another monad, which we shall denote \mathbb{L}_5. Adding the endpoint axiom to this construction yields a further monad \mathbb{L}_6.

Notice that we do have to be careful about these variations though. Adding in the pointwise ordering axiom to \mathbb{L}_5 or \mathbb{L}_6 does not result in a valid monad, as list concatenation won’t be monotone so the multiplication is not well-defined. For example, for natural numbers with their usual ordering, pointwise ordering yields

[0,3] \leq [1,4]

but the concatenation of the ordered list

[[0,3],[1,4]]

is

[0,3,1,4]

which is not correctly ordered.
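The failure described above is easy to check mechanically (a small sketch, with our own helper names):

```python
def is_rising(xs):
    """A list is rising if each consecutive pair of elements is ordered."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def concat(xss):
    """List concatenation: the multiplication of the list monad."""
    return [x for xs in xss for x in xs]

# [0,3] <= [1,4] pointwise, and both are rising lists, but:
print(is_rising([0, 3]), is_rising([1, 4]))  # True True
print(concat([[0, 3], [1, 4]]))              # [0, 3, 1, 4]
print(is_rising(concat([[0, 3], [1, 4]])))   # False: 3 > 1
```

So concatenation does not send rising lists of rising lists to rising lists, and the multiplication of the would-be monad is not well-defined.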

Conclusion

Moving to categories of ordered structures, even whilst restricting our attention to constructions originating with the maybe and list monads, we found a plethora of new possibilities. This newfound freedom all felt slightly wild, with many options for extending and varying constructions that were more straightforward in \mathsf{Set}. Not all the combinations of features yielded valid monads, so care was needed to (hopefully!) avoid mistakes. It is natural to wonder whether we can make things more systematic. For example, where did the six monads \mathbb{L}_1, \ldots, \mathbb{L}_6 come from? Are there more? What goes on with other constructions such as the multiset monad?

We shall look at further examples, and the question of whether there are more instructive methods we can bring to bear, in later posts.

Coproducts of Monads

Often the simplest way to form a combination of two objects in category theory is to take their coproduct. In this post, we are going to investigate coproduct of monads, beginning as usual from an algebraic perspective.

Sums of Theories

We begin by considering a pair of \mathsf{Set}-monads given by equational presentations. We shall investigate how a suitable method of combining those presentations yields a corresponding operation on monads.

Given two equational presentations, (\Sigma_1, E_1) and (\Sigma_2, E_2), we can define a new equational presentation with:

  • Operations the disjoint union \Sigma_1 \uplus \Sigma_2.
  • Equations given by the union E_1 \cup E_2.

We shall refer to this as the sum (\Sigma_1, E_1) + (\Sigma_2, E_2) of the two component theories.

This combines the two equational theories such that there is “no interaction between them”. This can be seen by considering the algebras of the resulting theory. An algebra for (\Sigma_1, E_1)  + (\Sigma_2, E_2) consists of

  1. A fixed choice of universe A.
  2. A (\Sigma_1, E_1)-algebra structure on A.
  3. A (\Sigma_2, E_2)-algebra structure on A.

The sum of theories induces an operation \mathbb{T}_1 \oplus \mathbb{T}_2 on the monads induced by these equational presentations. In fact, the resulting monad is the coproduct of \mathbb{T}_1 and \mathbb{T}_2.

Coproducts of Free Monads

Forming coproducts of monads via an operation on their algebraic presentations as we have just described has some limitations. In particular:

  1. To exploit it, we need to identify presentations for our monads of interest.
  2. We don’t get a description of the monad \mathbb{T}_1 \oplus \mathbb{T}_2 directly in terms of the component monads \mathbb{T}_1 and \mathbb{T}_2.
  3. Realistic applications will probably involve categories other than \mathsf{Set}. We would prefer constructions that generalise smoothly to other settings.

To address some of these problems, we shall begin by examining the simple case of coproducts of free monads. For a pair of signatures \Sigma_1 and \Sigma_2, the free monads for the corresponding signature functors must satisfy:

\Sigma_1^* \oplus \Sigma_2^* \cong (\Sigma_1 + \Sigma_2)^*

This follows immediately from the universal property, or via a simple direct calculation. We then recall that free monads can be constructed using least fixed points, so we have:

(\Sigma_1^* \oplus \Sigma_2^*)(X) \cong \mu Z. X + \Sigma_1(Z) + \Sigma_2(Z)

We can think of this as building the coproduct in stages. In the first stage, we begin with the set of variables X. In subsequent stages, we:

  1. Apply every operation symbol in \Sigma_1 to terms constructed in the previous stage.
  2. Apply every operation symbol in \Sigma_2 to terms constructed in the previous stage.
  3. Add back in the set of variables X again.

Example: Let \Sigma_1 and \Sigma_2 both contain a single unary operation symbol, \sigma_1 and \sigma_2 respectively. The terms over set of variables \{ x \} will be:

  1. x.
  2. x, \sigma_1(x), \sigma_2(x).
  3. x, \sigma_1(x), \sigma_2(x), \sigma_1(\sigma_1(x)), \sigma_2(\sigma_2(x)), \sigma_1(\sigma_2(x)), \sigma_2(\sigma_1(x))
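The staged construction in this example can be sketched directly, representing terms as strings (the helper names are our own):

```python
def next_stage(terms, variables, ops):
    """One stage of the construction: apply every unary operation symbol
    to the previous stage's terms, then add back the variables."""
    return set(variables) | {f"{op}({t})" for op in ops for t in terms}

stage = {"x"}           # stage 1: just the variables
for _ in range(2):       # stages 2 and 3
    stage = next_stage(stage, {"x"}, ["s1", "s2"])

print(sorted(stage))
# ['s1(s1(x))', 's1(s2(x))', 's1(x)', 's2(s1(x))', 's2(s2(x))', 's2(x)', 'x']
```

The third stage contains the seven terms listed above.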

This lets us build up the elements of the coproduct of free monads, using the underlying signature functor. This is a step in the right direction, but ideally we would use the monads themselves in the construction. To try and achieve this, we notice that each term consists of layers of operations from one signature, followed by layers of operations from the other signature, eventually applied to variable symbols.

Example: Continuing the previous example, we end up with terms like:

\sigma_2(\sigma_2(\sigma_1(x)))

We can see this as a layer of two \Sigma_2 operations, followed by a layer with one \Sigma_1 operation applied to a variable.

Can we build up these layers directly, using the free monads \Sigma^*_1 and \Sigma^*_2? A naive first attempt would be to try the fixed point construction:

\mu Z. X + \Sigma^*_1(Z) + \Sigma^*_2(Z)

If we consider the first few approximants to this fixed point, we get:

  1. X + \Sigma_1^*(\emptyset) + \Sigma_2^*(\emptyset).
  2. X + \Sigma_1^*(X + \Sigma_1^*(\emptyset) + \Sigma_2^*(\emptyset)) + \Sigma_2^*(X + \Sigma_1^*(\emptyset) + \Sigma_2^*(\emptyset)).

This is not quite what we want. For example, \Sigma_1^*(X + \Sigma_1^*(\emptyset) + \Sigma_2^*(\emptyset)) will contain a copy of the variables. So we are going to end up with too many copies of the variables in our fixed point. Intuitively, \Sigma_1^*(X) muddles together both bare variables and non-trivial terms.

Hopefully this rings a bell. We have previously seen that free monads are ideal monads. Recall that an ideal monad abstracts the idea of “keeping variables separate”. An ideal monad \mathbb{T} has a subfunctor \mathbb{T}' such that:

\mathbb{T}(X)  \cong X + \mathbb{T}'(X)

In the case of free monads, {\Sigma^{*}}'(X) contains all the guarded terms. These are exactly the terms that are not bare variables. As a second attempt at a fixed point construction, we consider:

\mu Z. X + {\Sigma^*}'_1(Z) + {\Sigma^*}'_2(Z)

Unfortunately, this is still not quite right. The first few approximants to this fixed point are:

  1. X + {\Sigma^*}'_1(\emptyset) + {\Sigma^*}'_2(\emptyset).
  2. X + {\Sigma^*}'_1(X + {\Sigma^*}'_1(\emptyset) + {\Sigma^*}'_2(\emptyset)) + {\Sigma^*}'_2(X + {\Sigma^*}'_1(\emptyset) + {\Sigma^*}'_2(\emptyset)).

The problem now is that this construction will build up some terms in more than one way, resulting in redundant duplicate copies.

Example: Continuing our previous example, the term \sigma_1(\sigma_1(x)) arises in two different ways: directly, as a guarded term in {\Sigma^*}'_1(X); or in two steps, as the guarded term \sigma_1(y) in {\Sigma^*}'_1({\Sigma^*}'_1(X)), where y is the guarded term \sigma_1(x) in {\Sigma^*}'_1(X).

To address this, we must be more careful to build up terms in alternating layers of operations from each signature, in a unique way. To achieve this explicit alternation, we instead solve a fixed point equation in two variables:

(\Sigma^\dagger_1(X), \Sigma^\dagger_2(X)) = \mu (Z_1, Z_2). ({\Sigma^*}'_1(X + Z_2),{\Sigma^*}'_2(X + Z_1))

This will produce terms with alternating layers of operations from the two signatures. \Sigma^\dagger_1(X) will contain terms with a \Sigma_1 operation on top, and dually for \Sigma^\dagger_2. We then recover:

(\Sigma_1^* \oplus \Sigma_2^*)(X) \cong X + \Sigma^\dagger_1(X) + \Sigma^\dagger_2(X)
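For the running example, where both signatures have a single unary operation, the alternating layers can be written as a pair of mutually recursive datatypes (a sketch with our own names; a layer of \sigma_1's is represented by its height, which is at least one):

```haskell
-- Sigma-dagger-1(X): a nonempty stack of sigma_1's on top of either a
-- bare variable or a term whose top layer is sigma_2; dually for
-- Sigma-dagger-2. The Int records the layer height (invariant: >= 1).
data T1 x = Layer1 Int (Either x (T2 x))
data T2 x = Layer2 Int (Either x (T1 x))

-- The coproduct:  X + Sigma-dagger-1(X) + Sigma-dagger-2(X).
data Coprod x = V x | In1 (T1 x) | In2 (T2 x)

-- sigma_2(sigma_2(sigma_1(x))): a sigma_2 layer of height 2 over a
-- sigma_1 layer of height 1 over the variable x.
example :: Coprod Char
example = In2 (Layer2 2 (Right (Layer1 1 (Left 'x'))))

-- Count the operation symbols in a term.
size :: Coprod x -> Int
size (V _)   = 0
size (In1 t) = size1 t
size (In2 t) = size2 t

size1 :: T1 x -> Int
size1 (Layer1 n rest) = n + either (const 0) size2 rest

size2 :: T2 x -> Int
size2 (Layer2 n rest) = n + either (const 0) size1 rest
```

Note that because the layers must strictly alternate, each term now has exactly one representation, fixing the duplication problem of the previous attempts.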

Coproducts of Ideal Monads

So far, we have rearranged the construction of a coproduct of free monads into language which involves fixed points. Has all this reshuffling got us anywhere?

Well, we could consider applying the same construction to arbitrary ideal monads. First, we calculate the fixed point:

(\mathbb{T}^\dagger_1(X), \mathbb{T}^\dagger_2(X)) = \mu (Z_1, Z_2). (\mathbb{T}'_1(X + Z_2), \mathbb{T}'_2(X + Z_1))

Assuming the required fixed points exist, we can then form the coproduct as:

(\mathbb{T}_1 \oplus \mathbb{T}_2)(X) \cong X + \mathbb{T}^\dagger_1(X) + \mathbb{T}^\dagger_2(X)
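The same shape can be sketched generically in Haskell, parameterised by the two ideals as functors; the recursive datatype declarations themselves take the required fixed point (this is a sketch under the assumption that the fixed points exist, and all names are our own):

```haskell
-- T-dagger-1 and T-dagger-2, for ideal monads with ideals t1' and t2':
-- mu (Z1, Z2). (t1'(X + Z2), t2'(X + Z1)).
newtype D1 t1' t2' x = D1 (t1' (Either x (D2 t1' t2' x)))
newtype D2 t1' t2' x = D2 (t2' (Either x (D1 t1' t2' x)))

-- The coproduct carrier:  X + T-dagger-1(X) + T-dagger-2(X).
data Sum t1' t2' x = SV x | SIn1 (D1 t1' t2' x) | SIn2 (D2 t1' t2' x)

-- Instantiating with the ideals of the two unary free monads, where a
-- guarded term is a positive stack of operations over its argument:
data Stack1 y = Stack1 Int y   -- invariant: Int >= 1
data Stack2 y = Stack2 Int y

-- sigma_2(sigma_2(sigma_1(x))) again, in the generic representation:
example :: Sum Stack1 Stack2 Char
example = SIn2 (D2 (Stack2 2 (Right (D1 (Stack1 1 (Left 'x'))))))
```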

It is worth considering what this says. Firstly, from the algebraic perspective we’re now building up equivalence classes of terms under provable equality, not just raw terms. The intuitive reason that this construction works is that the layers cannot interact. We can prove equalities within a layer, but once a layer of terms from the other theory is inserted, this layer is insulated from interacting with anything else. Hence, we can build up the equivalence classes of \mathbb{T}_1 \oplus \mathbb{T}_2, not just the underlying terms, inductively in layers, which is rather a pleasing outcome.

In fact, this is true for arbitrary categories and ideal monads, as long as the required fixed points exist. For example, we can also apply this result to the completely iterative monads that we saw previously.

Summing Up

The ideas in this post mainly follow Ghani and Uustalu’s “Coproducts of Ideal Monads”, which is strongly recommended for further reading. There are other explicit constructions for coproducts of monads, for example in the case of a free monad and an arbitrary monad, as described in Hyland, Plotkin and Power’s “Combining Effects: Sum and Tensor”. “Coproducts of Monads on Set” by Adámek, Milius, Bowler and Levy gives a detailed analysis in the case of \mathsf{Set}-monads. We should also point out that even in the case of \mathsf{Set}-monads, coproducts of pairs of monads do not always exist.