# Mean Extinction Times

## Extinction Times

One feature of Petri nets that I’ve been harping on for a while now is their ability to describe more than the dynamics of the expected value. There are interesting questions that we can ask about the *random* variables underlying the stochastic process that are not answered by the dynamics of the expected value. For instance, in a decay model, the graph that we get for the expected value never exactly hits 0. However, in the stochastic model, for any given “run”, there is some point at which there are exactly 0 atoms.

In general, in any Petri net in which the only reactions producing a given species s also require s as input, once the population of s reaches 0, it will stay at 0 permanently. So if over any period of time there is a chance of the population falling to 0, then eventually the population is guaranteed to go to 0, as long as that chance doesn’t decrease too fast.

For instance, in a predator-prey Petri net, the expected values oscillate forever. However, in an actual stochastic model, after a finite time either all the wolves die off or all the sheep die off.

To investigate this mathematically, we let T be a random variable that represents the time at which the species s dies off. We want to compute the mean of T.

## An Important Lesson

When I first tried to write this post, I was reading an article that had a section on mean extinction times, and I was trying to write about how to solve some equations to calculate the mean extinction time. However, it turns out that we are only able to solve these equations in very special cases, namely in Petri nets with one or two species. Additionally, the amount of math needed to understand the equations is very high, and I ended up writing an absurd amount of background before getting anywhere near mean extinction times. I will write a high-level post on this at some point, but that time is not now.

Moreover, one of my purposes in learning about mean extinction times was to add a calculator for mean extinction times to EZ Petri. None of the methods in the article were anywhere close to general enough to support such a calculator.

Fortunately, one other part of the article described an efficient and accurate simulation algorithm for sampling paths from the Markov process of a Petri net (as opposed to my algorithm in “Petri Nets for Computer Scientists”, which is neither efficient nor totally accurate, though it is perhaps more intuitive). Armed with this algorithm, we don’t need to solve any equations. We can just run the algorithm a hundred, a thousand, or a million times, and this gives us the data to compute the mean and standard deviation of the extinction time. This is very accurate, and quite fast. I wrote the algorithm in about two hours, and I can run the Lotka-Volterra model to extinction 10,000 times in about two minutes.

The only objection is that this feels like I’m not really solving any equations; I’m giving up on understanding the solution. However, the hard fact of life is that even if I had ended up with an equation to solve, I probably would have solved it numerically, which is what I did for the master equation anyway. From an aesthetic perspective, if I’m already just going to solve something numerically, it’s not so different to solve it with a randomized numerical method.

So, today I learned an important lesson that you should take deeply to heart. The lesson is that if you are trying to solve a problem involving random variables, then instead of solving it as a math problem you can just simulate it on a computer. OK, back to Petri nets.

## A (Sketch of an Accurate and Fast) Simulation Algorithm

Recall my previous simulation algorithm for Petri nets in “Petri Nets for Computer Scientists”. Essentially the idea there was to take a continuous process and simulate it in fixed time steps.

The essential difference between this previous algorithm and the algorithm I am about to present is that rather than using a fixed time step, the new algorithm uses a random variable to govern the time between state changes.

You can think about this in the following way. In the previous algorithm, if you were in a given state, then each time step you roll a die (with any number of sides), and depending on the result of the die, you either move to a different state or stay in the same state. Whenever you are in a given state, you roll the same die.

Now, the die has two types of faces: faces which make you stay in the same state, and faces which make you move to a different state. Therefore, we can have the same effect as the die by having a coin where heads you move to *some* new state, and tails you go nowhere, and then a new die that you roll after the coin flip to see which new state you end up in.

Notice that you only need to roll the new die if you get heads on the coin. So essentially what happens is that you keep flipping the coin until you get heads, and then you roll a die to see where you go next. The only information that you get from the coin is the number of tails. Therefore, we could replace the coin with some random number generator that tells you how many tails you got.

Remember that the earlier algorithm got more and more accurate as the time step interval got shorter and shorter. In our new scheme, where at each state we wait for a given number of periods and then jump to a new state, the time step interval getting shorter and shorter *only changes the granularity* of the time that we wait; it doesn’t change the probabilities of moving to another state after we have waited that amount of time. Therefore, we can just take the time step interval to be infinitely small, and then the time that we wait between jumps is a real number.
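This limiting argument can be checked numerically. Below is a sketch (my own illustration, not from any post’s code): we flip a coin with heads-probability rate·dt each step and watch the distribution of the total waiting time as dt shrinks. The survival probability P(T > 0.5) approaches e^{-rate · 0.5}, the exponential law.

```python
import math
import random

def discrete_wait(rate, dt, rng):
    """Number of coin flips until the first heads, times the step length."""
    steps = 1
    while rng.random() >= rate * dt:  # tails: stay put, flip again
        steps += 1
    return steps * dt

rng = random.Random(0)
rate = 2.0
for dt in (0.1, 0.01, 0.001):
    waits = [discrete_wait(rate, dt, rng) for _ in range(20000)]
    survival = sum(w > 0.5 for w in waits) / len(waits)
    print(dt, round(survival, 3))

print(round(math.exp(-rate * 0.5), 3))  # the dt -> 0 limit
```

As dt shrinks, the empirical survival probability converges to the exponential value, even though the mean waiting time is 1/rate at every granularity.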

So our new algorithm is the following.

1. Initialize the variable S to our first state.
2. Wait an amount of time controlled by a random variable that depends on S.
3. Jump to a new state with probabilities again depending on S, and go to step 2.
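As a sketch, the loop these steps describe looks like the following, where `sample_wait` and `sample_jump` are hypothetical stand-ins for the step-2 and step-3 random variables worked out in the next section.

```python
def simulate(initial_state, sample_wait, sample_jump, t_max):
    """Run the jump process until time t_max and return the list of
    (jump time, state) pairs that were visited."""
    t, state = 0.0, initial_state
    history = [(t, state)]
    while True:
        t += sample_wait(state)     # step 2: wait a random amount of time
        if t > t_max:
            break
        state = sample_jump(state)  # step 3: jump to a new state
        history.append((t, state))
    return history

# Placeholder distributions, just to exercise the loop: wait exactly one
# unit of time and count down by one.
path = simulate(3, lambda s: 1.0, lambda s: s - 1, 5.0)
print(path)
```

The real algorithm differs only in what `sample_wait` and `sample_jump` return.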

All that remains is to figure out what the random variables in steps 2 and 3 are. To do this, we need to do some more difficult math, so if you want to get off the blog train, now is the time.

## The Nitty Gritty

Recall that in a Markov process, the probability vector over states evolves according to a differential equation of the form:

\frac{\partial}{\partial t} p(x,t|x_0,t_0) = \sum_{y} H_{yx} p(y,t|x_0,t_0)

where x ranges over the state space of the Markov process, and p(x,t|x_0,t_0) is the probability of being in state x at time t given that the system was in state x_0 at time t_0. We call this equation the *master equation*. We also fix the initial condition p(x_0,t_0|x_0,t_0) = 1.

The interpretation of the above equation is that for y \neq x, H_{yx} (which is nonnegative) is the rate at which probability flows from state y to state x, and H_{xx} (which is nonpositive) is the rate at which probability flows *out* of state x. In order for probability to be conserved, we require that \sum_{x} H_{yx} = 0 for each y.

Now, suppose that the process is in the state x_0 at time 0. Let T be a random variable representing the time at which the process leaves state x_0. Intuitively, T is exponentially distributed because it is memoryless. To see this in more detail, consider a second Markov process derived from the original. This new process has two states, s_0 and s_1. As long as the first process is in x_0, the second process is in state s_0. Once the first process moves anywhere else, the second process moves to state s_1 and stays there forever. Denote by p^{x_0}(s,t|s_0,t_0) the conditional probability for this new process, and denote by H^{x_0} its matrix of probability flows. Then by inspection we can see that

H^{x_0} = \begin{pmatrix} H_{x_0x_0} & -H_{x_0x_0} \\ 0 & 0 \end{pmatrix}

where the entry in row y, column x is H^{x_0}_{yx} and the states are ordered (s_0, s_1): probability flows out of s_0 at rate -H_{x_0x_0}, into s_1 at the same rate, and nothing flows out of s_1.
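Concretely, the s_0 component of the master equation for this process is a single linear ODE, since nothing flows from s_1 back to s_0:

\frac{\partial}{\partial t} p^{x_0}(s_0,t|s_0,0) = H_{x_0x_0} \, p^{x_0}(s_0,t|s_0,0), \qquad p^{x_0}(s_0,0|s_0,0) = 1

whose solution is

p^{x_0}(s_0,t|s_0,0) = e^{H_{x_0x_0} t}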

Solving the master equation for this new Markov process shows that the probability of being in state s_0 at time t given that we were in state s_0 at time 0 is e^{H_{x_0x_0} t}. Therefore, the probability of being in state s_1 at time t is 1 - e^{H_{x_0x_0} t}. Now, note that the probability of being in state s_1 at time t is exactly the value of the cumulative distribution function for T at t. We recognize this as the cumulative distribution for an exponentially distributed random variable with rate \lambda = -H_{x_0x_0}.
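In code, one can sample T by inverting this CDF, the standard inverse-transform trick. The function below is a sketch of my own, not taken from any particular implementation.

```python
import math
import random

def sample_exit_time(H_x0x0, rng):
    """Sample the exit time T ~ Exponential(rate = -H_x0x0) by inverting
    the CDF 1 - exp(H_x0x0 * t).  Requires H_x0x0 < 0."""
    u = rng.random()                     # uniform on [0, 1)
    return math.log(1.0 - u) / H_x0x0   # both factors <= 0, so T >= 0

rng = random.Random(1)
samples = [sample_exit_time(-2.0, rng) for _ in range(50000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # the mean of Exponential(rate=2) is 1/2
```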

To derive the probability mass function for the state that the Markov process jumps to after waiting an exponentially distributed amount of time, consider the conditional probabilities p(x_i,h|x_0,0) for small h > 0 and x_i \neq x_0. By the master equation and the definition of the derivative,

p(x_i,h|x_0,0) - p(x_i,0|x_0,0) = h p(x_0,0|x_0,0) H_{x_0x_i} + o(h)

Since p(x_i,0|x_0,0) = 0 and p(x_0,0|x_0,0) = 1, this simplifies to

p(x_i,h|x_0,0) = h H_{x_0x_i} + o(h)

If we condition on the fact that X(h) \neq x_0 (where X(t) is the random variable for the Markov process at time t), then we get

\mathbb{P}(X(h) = x_i|X(0) = x_0,X(h) \neq x_0) = \frac{H_{x_0x_i}}{\sum_{x_j \neq x_0} H_{x_0x_j}} + o(1)

We can then take h \to 0 to get the probability mass function

\frac{H_{x_0x_i}}{\sum_{x_j \neq x_0} H_{x_0x_j}}
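Sampling from this probability mass function is straightforward given one row of H. Here is a sketch (the row of rates is hypothetical example data):

```python
import random

def sample_jump(row, x0, rng):
    """Pick a destination x != x0 with probability row[x] divided by the
    total outflow rate, where row[x] = H_{x0 x}."""
    targets = [x for x in range(len(row)) if x != x0]
    total = sum(row[x] for x in targets)
    r = rng.random() * total
    for x in targets:
        r -= row[x]
        if r < 0.0:
            return x
    return targets[-1]  # guard against floating-point round-off

# Row for x0 = 0: diagonal entry -3.0, and flows of 1.0 and 2.0 out to
# states 1 and 2, so the jump probabilities should be 1/3 and 2/3.
rng = random.Random(2)
row = [-3.0, 1.0, 2.0]
counts = [0, 0, 0]
for _ in range(30000):
    counts[sample_jump(row, 0, rng)] += 1
print([c / 30000 for c in counts])
```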

Up until now we have been working with a generic Markov process, and in fact this algorithm does work with a generic Markov process. However, if we want to run this algorithm for a Petri net, we must define our H_{yx} using the data of a Petri net (i.e., rates and transitions). This is not something you can “prove”, because it is part of the definition for the semantics of a Petri net. Therefore, all I can do is try to convince you that it is a reasonable definition. If you read my original post on Petri nets, *Petri Nets for Computer Scientists*, then you should have the intuition for the following. Namely, if the transition t has rate r, goes from state x to state y, and has input vector n, then we have

H_{xy} = r x_1^{\underline{n_1}} \ldots x_s^{\underline{n_s}}

where a^{\underline{b}} is the “falling factorial” a(a-1)\ldots(a-b+1). This represents the number of input tuples for the transition, multiplied by the rate of the transition. If there are no transitions from x to y, then H_{xy} = 0 (and if several transitions go from x to y, their rates add). Finally, H_{xx} is, as always, the rate at which probability is leaving state x, so it is the negative of the sum of H_{xy} for y \neq x.
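Putting the pieces together, here is a minimal end-to-end sketch of my own (separate from the linked implementation): compute the jump rates of a Petri net from its transitions using falling factorials, then simulate to extinction. The decay model at the bottom is hypothetical example data: one species and one transition consuming one token at rate 1, for which the exact mean extinction time from 20 atoms is the harmonic number 1 + 1/2 + … + 1/20 ≈ 3.60.

```python
import random

def falling(a, b):
    """Falling factorial a(a-1)...(a-b+1)."""
    out = 1
    for i in range(b):
        out *= a - i
    return out

def propensity(rate, inputs, state):
    """H-entry for one transition: its rate constant times the number of
    ways to choose its input tuple from the current populations."""
    p = rate
    for x, n in zip(state, inputs):
        p *= falling(x, n)
    return p

def extinction_time(state, transitions, rng):
    """Simulate until no transition can fire; return the elapsed time.
    `transitions` is a list of (rate, inputs, change) triples, where
    `change` is added to the state when the transition fires."""
    state, t = list(state), 0.0
    while True:
        props = [propensity(r, ins, state) for r, ins, _ in transitions]
        total = sum(props)
        if total == 0.0:             # absorbing state reached
            return t
        t += rng.expovariate(total)  # exponential waiting time
        u = rng.random() * total     # pick a transition by its rate
        for p, (_, _, change) in zip(props, transitions):
            u -= p
            if u < 0.0:
                state = [x + c for x, c in zip(state, change)]
                break

decay = [(1.0, (1,), (-1,))]  # X -> nothing, rate 1, input vector (1)
rng = random.Random(3)
runs = [extinction_time((20,), decay, rng) for _ in range(4000)]
mean = sum(runs) / len(runs)
print(round(mean, 2))  # should land near the harmonic number, about 3.60
```

Swapping in the three Lotka-Volterra transitions (birth, predation, death) and their rates would turn this same loop into the predator-prey experiment described above.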

If you want to see the implementation, it is up on GitHub.

## Wrap-up

I have been experimenting with different parameters for the stochastic simulation algorithm, specifically for the Lotka-Volterra model. Some parameter choices give very short extinction times, especially for small numbers of wolves/sheep, and others remain stable for a long time before eventually dying out. It seems that changing the value of a parameter by a very small amount can drastically change the mean extinction time. In other words, the mean extinction time, even though it is just a number, has a very complex relationship to the parameters of a Petri net. It is little surprise, then, that there is no clear way of calculating it for general Petri nets. In my post from last year, *Illegible Science*, I talk about sometimes having to give up on analytic solutions and become more reliant on computer models. This post should be read as additional evidence for that viewpoint: even in a case with such a simple model, there is no clear way to solve for basic properties apart from simulation.