Term Rewriting with (2,1)-Lawvere Theories

Posted on by Owen Lynch

Introduction

The purpose of this blog post is to share with the world some thoughts I’ve been having about term rewriting systems in the context of a categorical computer algebra.

Evan Patterson and I are thinking about rewriting the core machinery of Catlab to take more seriously the categorical perspective on algebra. The core machinery of Catlab is data structures and functions for manipulating Generalized Algebraic Theories. As one might guess, these are are more general version of algebraic theories. Thus, as a warm up for doing this refactor, I have been thinking about regular old algebraic theories in a categorical/computer algebra context.

Specifically, there are three viewpoints that I am trying to “take seriously”, in the hope that they will lead to a more general and elegant view of computer algebra.

  1. The central object of study should be the category where the objects are “contexts”. We will get to exactly what the morphisms in this category are later.
  2. Term rewriting and equational reasoning is a 2-categorical groupoidal structure on the previous category.
  3. It is important to understand precisely what a presentation is, and how the terms in that presentation are built.

I will be illustrating my ideas with snippets of Julia code.

Note: this post will not be particularly friendly to those without much of a background in category theory. I apologize for this, I wrote this blog post in a short period of time and don’t really want to spend more time in writing more background.

Lawvere Theories

The framework that I am working in mostly derives from the idea of a Lawvere theory, so we are going to briefly review that first. To be clear, this is a review that will get you and me on the same page for terminology, not a review that will give you an intuition for Lawvere theories if you haven’t seen them before. If you haven’t seen Lawvere theories before, I suggest taking a slow perusal through Lawvere’s PhD thesis; it’s always worth reading Lawvere because you will learn more than you expected!

A Lawvere theory is a category T whose objects are canonically generated as finite products of a generating set \Lambda of sorts. That is, every single object is canonically a product \prod_{i=1}^{n} S_{i} for S_{i} \in \Lambda.

An “evil” way to put this is that there is a functor (\mathsf{FinSet}/\Lambda)^{\mathrm{op}} \to T that is the identity on objects.

Note that we have linked the n-Lab page for “single-sorted” Lawvere theories, whereas we in this blog post will always be talking about “multi-sorted” Lawvere theories.

Now, this seems like a fairly abstruse definition; it is unclear exactly what this has to do with algebra. The real meat of Lawvere theories comes from how to define them in terms of presentations. A presentation consists of generators and relations, and the generator part of the presentation is called a signature.

Signatures

A signature \Sigma of a Lawvere theory is inductively defined in the following way.

  1. A set \Lambda of sorts
  2. A set \Omega of operations. Each operation w \in \Omega has an arity i(w), which is an element of \Lambda^{\ast}, (the set of strings with characters drawn from \Lambda), and a return type o(w) \in \Lambda.

The signature for the theory of groups has one sort, X, and three operations:

  • e \colon[] \to X (identity)
  • i \colon[X] \to X (inverse)
  • m \colon[X,X] \to X (multiplication)

Given a signature \Sigma = (\Lambda, \Omega), a context is simply a labeled set (\Gamma, \mathrm{ty}) \in \mathsf{FinSet}/ \Lambda. In a context (\Gamma, \mathrm{ty}), we can define the set terms of sort S, \mathrm{Term}_{S}(\Gamma, \mathrm{ty}), in the following way, for S \in \Lambda

  • For i \in \Gamma with \mathrm{ty}(i) = S, \mathrm{var}(i) \in \mathrm{Term}_{S}(\Gamma, \mathrm{ty})
  • For w \in \Omega with o(w) = S, and t_{1},\ldots,t_{n} such that t_{j} \in \mathrm{Term}_{i(w)_{j}}(\Gamma, \mathrm{ty}), \mathrm{appl}(w, [t_{1},\ldots,t_{n}]) \in \mathrm{Term}_{S}(\Gamma, \mathrm{ty}).

For those in computer science, you should recognize this as an abstract syntax tree where the leaf nodes are either arity-0 operations or variables in the context.

We can implement this in Julia with the following data structures. Note that we use an “unlabeled” implementation.

using MLStyle

struct Operation
  arity::Vector{Int}
  ret::Int
end

struct Signature
  # representing the set {1,...,sorts}
  sorts::Int 
  operations::Vector{Operation}
end

@data Term begin
  Var(i::Int)
  Appl(op::Int, args::Vector{Term})
end

struct Context
  vartypes::Vector{Int}
end

function check_sort(s::Signature, c::Context, t::Term)
  @match t begin
    Var(i) => c.vartypes[i]
    Appl(op, args) => begin
      ts = map(arg -> check_sort(s,c,arg), args)
      @assert ts == s.operations[op].arity
      s.operations[op].ret
    end
  end
end

Having introduced this clean “unlabeled” format, we will immediately revert back to using a more conventional notation for actual examples, which can in theory be parsed back into this unlabeled format.

Now, the categorical perspective is always looking for morphisms, and it turns out there is a very natural definition for a morphism between two contexts.

Given two contexts (\Gamma_{1}, \mathrm{ty}_{1}) and (\Gamma_{2}, \mathrm{ty}_{2}), a morphism between them is an assignment of elements of \Gamma_{2} to terms in (\Gamma_{1}, \mathrm{ty}_{1}) with the same type.

That is, f(i) \in \mathrm{Term}_{\mathrm{ty}_{2}(i)}(\Gamma_{1}, \mathrm{ty}_{1}) for i \in \Gamma_{2}.

One can lift a morphism of contexts to apply to not just variables in the context but also terms, and the clearest way to show this is with code.

struct ContextMorphism
  dom::Context
  codom::Context

  # A term in the dom context for each variable in the codom context
  terms::Vector{Term}
end

# t must be a term in the codomain context
function (f::ContextMorphism)(t::Term)
  @match t begin
    Var(i) => f.terms[i]
    Appl(op, args) => Appl(op, f.(args))
  end
end

This allows us to compose morphisms, again the easiest way to show this is also with code.

function compose(f::ContextMorphism, g::ContextMorphism)
  @assert f.codom == g.dom
  ContextMorphism(f.dom, g.codom, f.(g.terms))
end

Given a signature \Sigma, let \mathrm{Ctx}(\Sigma) be the category of contexts and context morphisms. It is easy to show that \mathrm{Ctx}(\Sigma) is a Lawvere theory.

Note that by definition, \mathrm{Term}_{S}(\Gamma, \mathrm{ty}) = \mathrm{Ctx}(\Sigma)((\Gamma,\mathrm{ty}), S) (where by abuse of notation, S is the context with one variable of type S). That is, the set of terms of sort S in context (\Gamma, \mathrm{ty}) is precisely the morphisms from (\Gamma, \mathrm{ty}) to S in the category of contexts.

Presentations

The last ingredient to a presentation is the collection of equations. An equation consists of a context, and then two terms in that context of the same sort.

The equations for the theory of groups are

  • a:X, b:X, c:X \vdash m(a,m(b,c)) = m(m(a,b),c) (associativity)
  • a:X \vdash m(e, a) = a (left identity)
  • a:X \vdash m(a, e) = a (right identity)
  • a:X \vdash m(i(a), a) = e (inverse)

You can see that each equation has a context on the left, and then on the other side of the \vdash, two terms separated by an equals sign.

Note that another way of looking at an equation in context (\Gamma, \mathrm{ty}) and of type S is simply as a pair of morphisms from (\Gamma,\mathrm{ty}) to S.

A presentation for a Lawvere theory consists of a signature (\Lambda, \Omega) and a set \Xi of equations in that signature.

Given a presentation (\Sigma, \Xi), we construct a Lawvere theory T in the follow way. We start with the category \mathrm{Ctx}(\Sigma), and then we quotient out by the equations in \Xi by identifying the morphisms corresponding to the left and right hand side of each equation.

Everything but that very last step is very amenable to computation. However, “quotienting out” is a problematic and undecidable operation. It is this that lead me to seek out whether a higher-categorical structure that didn’t impose such strict equality could be more amenable to computation.

(2,1)-Lawvere Theories

The essential idea of (2,1)-Lawvere theories is to not quotient out by the equations in the presentation of a Lawvere theory, and instead keep track of equalities between terms as “proof objects”.

A (2,1)-category is a 2-category where all of the 2-morphisms are invertible.

A (2,1)-Lawvere theory is a (2,1)-category where every object is canonically a finite product of a fixed set of “sorts”.

A presentation for a (2,1)-Lawvere theory is just the same as a presentation for a regular Lawvere theory

To construct a (2,1)-Lawvere theory from a presentation (\Sigma, \Xi), we start with the 1-category \mathrm{Ctx}(\Sigma) and freely add in 2-morphisms corresponding to each of the equations.

There is still a slight problem with this, because “freely adding in 2-morphisms” actually does still involve quotienting out (for instance, taking the inverse of a morphism twice should end up with the same morphism again, and much more complex identities as well). But from a computational standpoint, we may not care about equality of 2-morphisms, only their existence and validity. Thus we could simply say that any two 2-morphisms with the same domain and codomain are equal, trivially making equality decidable.

What we are more concerned about is exactly how to represent a general 2-morphism in this category. However, it turns out that this is not too hard. There are exactly 5 ways of producing a new 2-morphism.

  1. Take a generating 2-morphism (i.e. one of the equations in the presentation)

  2. Take the identity morphism on a 1-morphism

  3. Invert an existing 2-morphism

  4. Vertical composition of two existing 2-morphisms

  5. Horizontal composition of two existing 2-morphisms

Of these 5 ways, the last one is by far the most interesting. It generalizes two classic rules from term rewriting.

The first classic rule is “congruence”, which can be phrased as “substituting equal expressions in to the same expression yields equal expressions” (see: Theorem Proving and Algebra). For instance, suppose we are in a theory with a single sort S and an operation + \colon[S,S] \to S. Then suppose that in context \Gamma, t_{1} = t_{1}\prime and t_{2} = t_{2}\prime. Congruence allows us to conclude that t_{1} + t_{2} = t_{1}\prime + t_{2}\prime.

Categorically, this is the horizontal composition of an equality \langle t_{1},t_{2} \rangle = \langle t_{1}\prime, t_{2}\prime \rangle with the identity on the map \langle x + y \rangle \colon[x:S,y:S] \to [t:S], which we can picture as follows. (We use \langle t_{1}, t_{2} \rangle to refer to the map \Gamma \to [x:S, y:S] that sends x to t_{1} and y to t_{2})

The second classic rule is “substitutivity”. This says that if \phi(x_{1},\ldots,x_{n}) = \phi\prime(x_1,\ldots,x_n), where x_{1},\ldots,x_{n} are variables, then we can substitute in any terms t_{1},\ldots,t_{n} to get \phi(t_{1},\ldots,t_{n}) = \phi\prime(t_1,\ldots,t_n).

This is horizontal composition the other way!

The upshot of all of this is that we can store a “witness” for an equality proof with a faily simple recursive data structure based on these 5 ways of contructing an equality.

@data Equation begin
  GeneratingEq(i::Int) # references an equation in the presentation
  Refl(f::ContextMorphism) # reflexivity of equality
  Sym(e::Equation) # symmetry of equality
  VCompose(e1::Equation, e2::Equation) # transitivity of equality
  HCompose(e1::Equation, e2::Equation) # congruence and substitutivity
end

Given an Equation, the procedure to check that it is valid is an straightforward recursive function, the implementation of which I leave to the reader.

Conclusion

My idea here was to show that taking the categorical approach to algebra seriously would lead to an elegant structure for witnesses of rewrites, and I think I have succeeeded at this! However, what is still left to do is to figure out how to make this sort of thing smooth for the end-user, not just the implementor. My hope is that the elegance in the categorical structure will allow for a slick, general, and powerful user interface for term rewriting, but I haven’t figured out exactly how to do that yet.

As always, thoughts comments and questions are welcome, and instructions for how to leave these are down below.


This website supports webmentions, a standard for collating reactions across many platforms. If you have a blog that supports sending webmentions and you put a link to this post, your blog post will show up as a response here. You can also respond via twitter or respond via mastodon (on your preferred mastodon server); through the magic of brid.gy all tweets or toots with links to this post will show up below (subject to moderation).
div

Site proudly generated by Hakyll with stylistic inspiration from Tufte CSS