Posted on February 1, 2020

Note: this post will require some familiarity with probability theory and analysis. Also, the main source for this post is the review article: (Master equations and the theory of stochastic path integrals)[https://arxiv.org/pdf/1609.02849v1.pdf]. It seems like it is an excellent summary of its subject matter, but it is very dense; my hope is that by rewriting some of its content I can get a better understanding.

One feature of Petri nets that I’ve been harping on for a while now is their ability to describe more than the dynamics of the expected value. Fundamentally, a Petri net describes a stochastic process, and there are interesting questions that we can ask about the random variables underlying that stochastic process that are not answered by the dynamics of the expected value. For instance, in a decay model, the graph that we get for the expected value never exactly hits 0. However, in the stochastic model, for any given “run”, there is some point at which there are exactly 0 atoms.

Another example of this is in the Lotka-Volterra predator prey model. The expected values repeat perfectly forever. However, there is a chance that all of the predators die off, and then the prey grows uncontrollably. There is also a chance that all the prey die off and then the predators die off too.

In general, in any Petri net in which the only reactions producing a given species s require s as input, when the population of s goes to 0, it will stay at 0 permanently. So if over any period of time there is a chance of the population falling to 0, then eventually the population is guaranteed to go to 0, as long as the chance doesn’t decrease too fast.

If you don’t know what a Markov process is, now would be an excellent time to look it up.

The defining data for a Markov process is the conditional probability distribution, p(t,n|t_0,n_0), which is the probability that the process will be in state n at time t \geq t_0 given that the process was in state n_0 at time t_0. The fundamental equation of a Markov process is the Chapman-Kolmogorov equation, which can be derived very easily with a little knowledge of probability. Let t \geq \tau \geq t_0 be times, and let n,m,n_0 be states. Then by elementary probability theory,

p(t,n|t_0,n_0) = \sum_m p(t,n; \tau,m | t_0, n_0)

where p(t,n; \tau,m | t_0, n_0) is the probability that the process is in state n at time t and in state m at time \tau given that it was in state n_0 at time t_0. Now, by definition of conditional probability,

p(t,n; \tau,m | t_0, n_0) = p(t,n| \tau,m ; t_0, n_0) p(\tau,m | t_0, n_0)

However, by the Markov condition, p(t,n | \tau,m ; t_0, n_0) = p(t,n | \tau,m). Therefore, we can transform the earlier equation into

p(t,n|t_0,n_0) = \sum_m p(t,n | \tau,m) p(\tau,m | t_0, n_0)

Now, if we view p(t,-|t_0,-) as a matrix with entries p(t,n|t_0,n_0), then this is precisely the formula for multiplying two matrices. By abuse of notation, we write p(t|t_0) for p(t,-|t_0,-). Then this formula simplifies to

p(t|t_0) = p(t|\tau) p(\tau|t_0)

We can write this suggestively as

where \Delta^A is the set of probability distributions on the (possibly infinite) set of states A. This suggests that a good way to think about a Markov process is as a functor from the poset \mathbb{R} to the category where the objects are convex sets (which we view as affine spaces over the rig \mathbb{R}_{\geq 0}) and the arrows are functions that preserve convex combinations of elements.

For traditional Markov processes, all of the objects in \mathbb{R} get mapped to the same space. However, we can imagine generalizations where this is not the case.

In a future post, I will develop this theory more for the case that not all elements of \mathbb{R} are mapped to the same probability simplex. However, for now we will assume that this is the case. We do this so that the following definition makes sense.

\partial_t p(t|\tau) = Q_t p(t|\tau)

This allows us to evolve the conditional distribution forwards in time. However, we can also work this the other way; we can take the derivative with respect to t_0 and evolve the conditional distribution *backwards* in time. This gives us the equation

\partial_{-\tau} p(t|\tau) = p(t|\tau) Q_t^T

It is not obvious that the same Q_t works for both equations, however this can be derived from the limit definition of partial derivative.

Now, it is often the case that the Markov process is presented by giving Q_t, and the challenge is to derive p(t|\tau) for all t and \tau. This can be acheived by noting that p(t|t) is the identity, and then evolving forwards or backwards. This is not always very easy, especially when there are an infinite number of states, so people have built a variety of tricks for solving this equation.

The central idea behind these tricks is that there is more than one way to express the probability simplex. For instance, if our set of states is the integers (as would be the case for a random walk), then a convenient way to express a probability distribution p_j is to represent it as a function

q \mapsto \sum_{n=-\infty}^\infty p_n e^{i n q}

This representation is a bijection because we can recreate the probability distribution from such a function by using Fourier analysis. Namely,

p_m = \int_0^{2\pi} e^{-i m q} \sum_{n=-\infty}^\infty p_n e^{i n q} d q

More generally, we have some linear space X along with collection of vectors which we write as \left| \Psi_n \right\rangle \in X for n a state. We also have a collection of linear functionals \left\langle \Psi_n \right| such that \left\langle \Psi_n \mid \Psi_m \right\rangle = \delta(n,m). If you have studied quantum theory, then this notation should be familiar to you. Otherwise, think of it just as a fancy typeface that differentiates between vectors and their associated linear functionals. \left\langle v \mid w \right\rangle is the linear functional \left\langle v \right| applied to the vector \left| w \right\rangle. Then to express a probability distribution as an element of X, we write

\sum_n p_n \left| \Psi_n \right\rangle

and then we retrieve the p_n from an arbitrary vector \left| v \right\rangle \in X we write

p_n = \left\langle \Psi_n \mid v \right\rangle

For instance, in the case of a random walk, we have

\left| \Psi_n \right\rangle = e^{i n q}

and

\left\langle \Psi_n \right| = \left| v \right\rangle \mapsto \int_0^{2\pi} e^{-i n q} \left| v \right\rangle dq

For a random walk, Q_t = Q is defined by:

Q(n,n+1) = \alpha_+ Q(n,n-1) = \alpha_- Q(n,n) = -(\alpha_- + \alpha_+)

That is, probability flows into adjacent cells at a rate of \alpha_+ for the left cell and \alpha_- for the right cell.

We can break this up using two operators c_+ and c_- which are defined by

c_+(n,n+1) = 1 c_-(n,n-1) = 1

Another way of putting this is that if e_n is the unit vector for state n, then c_+ e_n = e_{n+1} and c_- e_n = e_{n-1}. Using these operators, we write Q_t = \alpha_+(c_+ -1) + \alpha_-(c_- - 1). Now, in the Fourier basis, c_+ and c_- actually have very simple forms, namely

c_+ = e^{i q} c_- = e^{-i q}

You can see that these work because

c_+ \left| \Psi_n \right\rangle = e^{i q} e^{i n q} = e^{i (n+1) q} = \left| \Psi_{n+1} \right\rangle c_+ \left| \Psi_n \right\rangle = e^{-i q} e^{i n q} = e^{i (n-1) q} = \left| \Psi_{n-1} \right\rangle

Therefore,

Q = \alpha_+(e^{i q} - 1) + \alpha_-(e^{-i q} - 1)

Now, let’s use this to derive the conditional distribution. Let

\left| g(t|\tau,m) \right\rangle = \sum_{n} p(t,n|\tau,m) \left| \Psi_n \right\rangle

The differential equation that \left| g(t|\tau,m) \right\rangle must satisfy is

\partial_t \left| g(t|\tau,m) \right\rangle = \alpha_+(e^{i q} - 1) + \alpha_-(e^{-i q} - 1)\left| g(t|\tau,m) \right\rangle

For any fixed q, the coefficient of \left| g(t|\tau,m) \right\rangle is a constant, so the solution is

\left| g(t|\tau,m) \right\rangle = \mathrm{exp}(t(\alpha_+(e^{i q} - 1) + \alpha_-(e^{-i q} - 1))) \left| g(\tau|\tau,m) \right\rangle

To find \left| g(\tau|\tau,m) \right\rangle we plug in the initial value \left| g(\tau|\tau,m) \right\rangle = \left| \Psi_m \right\rangle, which we derive from p(\tau|\tau) = \mathrm{id}. And then finally,

p(t,n|\tau,m) = \left\langle \Psi_n \right| \mathrm{exp}(t(\alpha_+(e^{i q} - 1) + \alpha_-(e^{-i q} - 1))) \left| \Psi_m \right\rangle

Computing this is somewhat laborous, but it can be done, and the result is the well-known distribution for random walks. Moreover, even if it couldn’t be solved analytically, it could still be computed numerically reasonably efficiently.

Recall that a Petri net is defined by a set of species S along with a set transitions \{\pi_i\} with input k(\pi_i) \in \mathbb{N}^S, output l(\pi_i) \in \mathbb{N}^S and rate r(\pi_i) \in \mathbb{R}_{\geq 0}.

For now, let’s assume that there is one species and one transition, so that k, l, and r are fixed numbers. Then the rate at which the state n transitions into the state n - k + l is proportional to the falling factorial n(n-1)\ldots (n - k + 1) = n^{\underline{k}}, because this is the number of ways of selecting k elements with ordering from the n molecules available. Specifically, if \left| \Psi_n \right\rangle is a collection of basis vectors for a space, then the operator Q should be of the form

Q \left| \Psi_n \right\rangle = r n^{\underline{k}} \left(\left| \Psi_{n - k + l} \right\rangle - \left| \Psi_n \right\rangle\right)

We break this down as follows. Define two operators a and c (which stand for annihilation and creation respectively) by

a \left| \Psi_n \right\rangle = n \left| \Psi_{n-1} \right\rangle c \left| \Psi_n \right\rangle = \left| \Psi_{n+1} \right\rangle

The reason that a has a coefficient of n in front and c has no coefficient is that there are n ways of taking an element away from a set of n elements, but only one way of adding an element. Now, note that

a^k \left| \Psi_n \right\rangle = n^{\underline{k}} \left| \Psi_{n-k} \right\rangle

so therefore

c^k a^k \left| \Psi_n \right\rangle = n^{\underline{k}} \left| \Psi_n \right\rangle

and also

c^l a^k \left| \Psi_n \right\rangle = n^{\underline{k}} \left| \Psi_{n - k + l} \right\rangle

We can combine these two to get

Q = r(c^la^k - c^ka^k) = r(c^l - c^k)a^k

Now, the question becomes how to give concrete representations of c and a in such a way that Q becomes easy to deal with. To answer this, we let \left| \Psi_n \right\rangle = q^n, and c = q, a = \partial_q. One can verify that this definition agrees with the one above. This gives a power series representation of the probability distribution

\left| g(t|\tau,m) \right\rangle = \sum_{n=0}^\infty p_n \left| \Psi_n \right\rangle == \sum_{n=0}^\infty p_n q^n

However, there are also other choices for \left| \Psi_n \right\rangle that may make different problems more or less easy.

Remember way back when I was talking about the backwards master equation? So, it turns out that there is something very sneaky you can do with the backwards master equation. Let

\left| p(t,n|\tau) \right\rangle = \sum_{m} p(t,n|\tau,m) \left| \Psi_m \right\rangle

where \left| \Psi m \right\rangle = \frac{x^n e^{-x}}{n!}. Then we can write down a backwards master equation for \left| p(t,n|\tau) \right\rangle in a similar manner to the forwards master equation we saw before, and it is sometimes possible to solve this analytically. Then, we notice that in fact $ _x $ is the probability that the process is in state n at time t given that it was Poisson distributed with mean x at time \tau, and this value satisfies the forwards master equation for t! So we solved the forwards master equation starting at a Poisson distribution by solving the backwards master equation using Poisson distributions as a basis!

We classify two categories of states. In the first category of states, the species s has a non-zero population, and in the second category of states, the species s has population zero. Once we transition from the first category of states into the second category of states, then we will stay there for the rest of the Markov process. Therefore what matters is the rate at which probability flows from the first category of states into the second category of states.