Posted on April 17, 2019

If you have ever taken a physics class, you have no doubt been subjected to the laughable notion that a derivative is the quotient of two infinitesimal numbers, \(dx\) and \(dy\), perhaps accompanied by some diagram like this: ^{1}

The slope of the red line is \(\frac{dy}{dx}\), and because \(dx\) and \(dy\) are infinitesimal, the red line is tangent to the curve, so the derivative is just \(\frac{dy}{dx}\)!

When confronted by a real mathematician, the physicist will typically admit that she knows that infinitesimals don’t really exist and that this is only a convenient way of thinking about derivatives. Of course, once the mathematician has left the room the physicist will immediately recklessly multiply both sides of an equation by an infinitesimal in order to solve a differential equation.

In the view of a physicist, math is what works. It’s actually not such a bad attitude: if something you are doing works in a consistent way, there’s probably some way of formalizing it. If logic and nature disagree, we cannot change the nature, so we change the logic. And in fact, there are several ways that we can make infinitesimals perfectly rigorous, and I will show you one of them today (another may come in a later post).

But before we get into that, a bit of philosophy. The question you may have on your mind is whether infinitesimals *actually* exist. Are there physical quantities that are infinitesimally small? The answer is that infinitesimal numbers are just as real as rational numbers, or negative numbers. The only sort of number that we can measure in the real world is a natural number (because we can only ever measure up to a certain precision). We add negative numbers, rational numbers, real numbers and complex numbers into the mix because they are really convenient abstractions, and they can be given mathematical meaning. If infinitesimals are convenient and can be given mathematical meaning, who is to say that they aren’t real?

The first approach to finding infinitesimals conjures them out of thin air. To do this approach in *full* detail would require a thorough introduction to formal logic and model theory, and if you want to really understand how this works, I encourage you to refer to Herbert Enderton’s An Introduction to Mathematical Logic. I have had moderate success explaining to people the basics of the way the argument goes without getting too far into the details, so I will attempt to recreate the argument in suitable informality so that it can be understood on first reading, while still sticking to the heart of the proof. The idea here is that this blog post should be a lazy Sunday’s reading that gets you interested in infinitesimals and logic, not a rigorous course.

I’m going to lay out where we are going briefly, so that you can see the destination as we slog through some definitions. If you get lost, skip this part; if you get bored later on come back to it.

The heart of this proof lies in formalizing what it means to prove something. Essentially, we will make a language, and define what a proposition in that language is. Then we will say that a “theory” is a set of propositions. A “model” for a theory is a mathematical universe where all of those propositions are true.

Then we are going to come up with an absolutely outrageous theory (set of propositions). This theory is going to contain every true statement we can make about the real numbers, and also for each \(r \in \mathbb{R}\) it will also have a proposition saying that \(\kappa\) is larger than \(c_r\), where \(\kappa\) and \(c_r\) are “constant symbols” in the language. A model must assign values to all the constant symbols. Having all of the true statements about the real numbers will force the values assigned to the constant symbols \(c_r\) to behave like the real numbers, but then saying that \(\kappa\) is greater than all of these things that behave like real numbers will force \(\kappa\) to be an “infinite number”, that is still constrained by the true statements of the real numbers to also behave like a real number. For instance, you can divide \(1\) by it. This creates an infinitely small number, and this will give us our infinitesimal.

The trick is proving that such a model exists. To do this, we will use a theorem (called “the compactness theorem for first order logic”) which says that if a model exists for every finite subset of a theory, a model must exist for the theory.

The proof of this theorem is very simple. It relies on another theorem which says that a theory has a model if and only if you cannot prove a contradiction from the theory.

Now, suppose we have some set \(\Gamma\) (possibly infinite) of propositions, such that all finite subsets \(\Gamma_0\) have models. Suppose that we could prove a contradiction from \(\Gamma\). Well, a proof is a finite object, so we could have only cited a finite number of propositions from \(\Gamma\). Take the propositions that we cited and put them in a set, \(\Gamma_0\). By our original hypothesis, \(\Gamma_0\) has a model. Therefore it cannot prove a contradiction, which contradicts our assumption that we had proved a contradiction. Therefore, \(\Gamma\) cannot prove any contradictions, and thus there exists a model for it.

If we go back to our outrageus theory, we can make a model for any finite subset by simply assigning \(\kappa\) to one more than the largest number mentioned anywhere in the finite subset, and assigning all of the rest of the real number symbols to just the normal real numbers. Therefore, by the compactness theorem, we have a model for the entire theory, which was what we wanted.

OK, let’s get into the details.

Model Theory is the study of how to interpret languages. There are two parts to it: a syntactic side, and a semantic side. The syntactic side is concerned with pushing symbols around on paper. The semantic side is concerned with giving those symbols meaning.

For most mathematicians, formal languages are a bit off the beaten path, and when Gödel first developed some of the theory that we will be talking about, he did it in kind of complicated and unintuitive way. However, for anyone who has ever programmed in the modern world, a formal language is very familiar. I am going to take advantage of the fact that you probably have some intuitions about how computer code works to simplify the presentation of logic. In order to facilitate this, I am going to use a different notation than the ones logicians typically use.

Sentential logic is a stripped down version of first order logic, so I will introduce it first so you get a feel for what we will be doing in first order logic in a simpler setting.

In sentential logic, we start with some basic propositions, like \(\texttt{P}\) and \(\texttt{Q}\), and we combine them together to create more fancy propositions, like \(\texttt{[∧ P Q]}\). There are a couple of things to notice about this more fancy proposition. The second symbol, “∧” is the sign in mathematics for “and”. It is what we call a “logical connective”, it connects two propositions together to form a new proposition. Also, the whole proposition is enclosed with square brackets. This is akin to LISP or Scheme, the first thing in the brackets is a logical connective, and then the rest are the arguments to the logical connective.

From this, you can guess how to build even fancier propositions, like \(\texttt{[∧ [∨ P Q] [¬ R]]}\), where “∨” is the sign for “or” and “¬” is the sign for “not”. Another important connective is “implies”, which we write as \(\texttt{[⇒ P Q]}\).

Now, if we assign truth values to all of the basic propositions (if we are given a function from the name of the basic proposition to a boolean), it’s pretty easy to write an algorithm for figuring out the truth value of a fancier proposition. Essentially, when we have a logical connective applied to several arguments, first recursively evaluate the arguments, and then apply the logical connective to the result of evaluating the arguments.

The thing that assigns true values to the basic propositions is called a “model”. This will mean something different in first order logic, and it will feel more “model-y”, but for consistency’s sake, I’m going to call the sentential variant a model too.

We can use the idea of models to formalize what it means for a set of propositions to logically entail another proposition. Now, there are going to be three types of implication in total that we will be talking about, so I’m going to clear them up now. The first type we’ve already seen; it is “in-language implication”, is denoted by \(\texttt{⇒}\), and only appears in propositions. The second type, which we will talk about now, is “semantic implication”, is denoted by \(\vDash\), and never appears in propositions. The last type, which we will get to later, is “syntactic implication”, is denoted by \(\vdash\), and also never appears in propositions.

First, say that a model “satisfies” a set of propositions if when you interpret the propositions in the model you end up with a true statement. Basically, if we take the proposiiton “the sky is blue”, on Earth it is true that the sky is blue, so the Earth-model satisfies that proposition. However, on Mars the sky is not blue, so the Mars-model does not satisfy the proposition.

Now, a set of propositions \(\Gamma\) semantically implies another proposition \(\tau\) (written \(\Gamma \vDash \tau\)) if all models that satisfy \(\Gamma\) also satisfy \(\tau\). In other words, to figure out if \(\tau\) is semantically implied by \(\Gamma\), look at all the possible models that are compatible with \(\Gamma\). If \(\tau\) is true in all of them, then something about \(\Gamma\) must be force \(\tau\) to be true whenever \(\Gamma\) is true.

Here’s an example. We want to show that \(\{\texttt{[⇒ A B]},\texttt{A}\} \vDash \texttt{B}\). Let’s look at all the possible values for \(\texttt{A}\) and \(\texttt{B}\).

\(\texttt{A}\) | \(\texttt{B}\) | \(\texttt{[⇒ A B]}\) |
---|---|---|

\(\texttt{TRUE}\) | \(\texttt{TRUE}\) | \(\texttt{TRUE}\) |

\(\texttt{TRUE}\) | \(\texttt{FALSE}\) | \(\texttt{FALSE}\) |

\(\texttt{FALSE}\) | \(\texttt{TRUE}\) | \(\texttt{TRUE}\) |

\(\texttt{FALSE}\) | \(\texttt{FALSE}\) | \(\texttt{TRUE}\) |

Now, look at all of the rows in which \(\texttt{A}\) and \(\texttt{[⇒ A B]}\) are true. Is it the case that \(\texttt{B}\) is also true? Well, in this case there is only one column where both are true, and in that case \(\texttt{B}\) is in fact true. So we have proved that \(\{\texttt{[⇒ A B]},\texttt{A}\} \vDash \texttt{B}\).

Some people believe that when we do math, we are talking about objects that actually exist in some sort of platonic realm. This is what semantic implication talks about. However, some people believe that we are just pushing symbols around on paper, and it doesn’t really “mean” anything. You can’t talk about all models satisfying a theory because models don’t really exist. Well, I have good news for these people, because right now we are going to throw away models.

What does it mean for a set of propositions to imply a proposition? When you walk into your first math course you learn that a set of propositions \(\Gamma\) implies another proposition \(\tau\) if you can *prove* \(\tau\) using the propositions in \(\Gamma\). But what is a proof? Let’s give the dumb answer: a proof is a bunch of sentences you write down on paper. Actually, this isn’t *that* dumb because we already have the machinery to formalize this. We know what a proposition is, so why not define a proof to be a list of propositions?

The one problem is that if we allowed *any* list of propositions to be a proof, math class would get too easy. So how about this: a proof is a list of propositions, each of which either follows logically from previous propositions or is obvious. A proposition \(\texttt{A}\) follows logically from earlier propositions if the earlier propositions contain \(\texttt{[⇒ B A]}\) and \(\texttt{B}\).

Obviousness is a little trickier. It’s highly non-obvious what counts as obvious. The good-for-nothing lazy bum way to do this is to say that a proposition is obvious if it holds in every model, with no restrictions on the model. It turns out that you can also make a good description of obvious propositions purely by pushing symbols around on paper, but that’s kind of fiddly and obnoxious. So, we’re just going to to the good-for-nothing lazy bum way, and say that if we *really* wanted to, we could do the fiddly rules.

For sentential logic, the obvious propositions are called “tautologies”: they are stuff like \(\texttt{[⇒ A A]}\) and \(\texttt{[∨ A [¬ A]]}\). Clearly, whatever proposition you put in the place of \(\texttt{A}\) these will always be true.

Anyways, the important take-away is that a proof is a sequence of statements with some conditions on them. Then we say that \(\Gamma \vdash \tau\) if there exists a proof of \(\tau\) where you are allowed to write down propositions from \(\Gamma\) in the proof.

The logical question to be asking is “does this whole syntax vs. semantics thing actually matter?” Like, are there some mathematicians who don’t believe in semantic proof and go around proving things purely syntactically and are there some mathematicians who don’t believe in syntactic proof and only make appeals to underlying truths?

The answer is that it turns out that \(\Gamma \vDash \tau\) iff \(\Gamma \vdash \tau\). We can’t prove this because we haven’t been rigorous enough with our definitions, but you can read Enderton if you want to see a proof.

Fortunately, there exists a discipline in which not having rigorous definitions and not knowing how to prove things is A-OK, and it’s called philosophy. What does it really *mean* for the syntactic approach and the semantic approach to coincide? A syntactist might say that all it really shows is that this attempt to appeal to “all possible models” is really couched in language and syntax all the way down, so it’s no surprise that it ends up the same. On the other hand, a semanticist might say that it shows that all the fiddling with symbols is actually just reflecting the true cosmic unity of yadayadayada. OK, enough philosophy.

First order logic is a whole different ball-park from sentential logic. In theory, for any finite set of sentential propositions you could just crunch all of the possible truth assignments, and it might take a while, but you’d eventually figure out what implies what. In first order logic, that’s not possible.

This is because first order logic allows your propositions to talk about *objects*. Now, if you had a finite set of objects, this wouldn’t be different from sentential logic, because you could just create different propositions to deal with each of those objects, and any time you wanted to say, for instance, that all of the people in the world like ice cream, you could create propositions “Larry likes icecream”, “Suzy likes icecream”, “Mohammed likes icecream”, “Erzsebet likes icecream”… and chain them all together into a sentence with 7 billion clauses. It wouldn’t be pretty, but it would work.

But if you want to talk about *infinitely* many objects with finite-length sentences, it is clear that this sort of tomfoolery is right out of the picture. We need some way of saying ALL of the objects have some property. Let’s take a crack at this and make up some syntax. Let’s say that \(\texttt{P}\) is the property of “primeness”, \(\texttt{O}\) is the property of oddness and let’s say that \(\texttt{[∀ [⇒ O P]]}\) means that the property of oddness implies the property of primeness. OK, seems good so far. But what happens if we want to talk about all combinations of *two* numbers? With this syntax there’s no way of telling which properties apply to which numbers.

It is for this reason that we introduce *variables*. The real syntax for “all odd numbers are prime” is \(\texttt{[∀ x [⇒ \{O x\} \{P x\}]]}\) (read: for all \(x\), \(O\) of \(x\) implies \(P\) of \(x\)). \(\texttt{P}\) and \(\texttt{O}\) are what are called “predicate symbols”; they take in a variable (or several variables) and return a proposition that can be combined with other propositions. This makes them unlike logical connectives, which take in propositions and return propositions, and this is the reason I have surrounded them in curly brackets. \(∀\) is a *quantifier*: it declares the context in which we can use the variable \(\texttt{x}\). One way I like to think of it is that when we quantify over a variable we are essentially declaring a function of \(\texttt{x}\) and saying that no matter what value of \(\texttt{x}\) gets passed in to the function, the proposition will still be true.

The carefully quantified statement that you might have come across before is the epsilon-delta definition of a limit. In English, the statement is "\(\lim_{x \to 0} f(x) = L\) if and only if for all \(\epsilon> 0\) there exists a \(\delta > 0\) such that \(\vert x \vert < \delta\) implies that \(\vert f(x) - L \vert < \epsilon\) . If we were to put this into our rigorous notation, it would look like

\(\texttt{[∀ ε [⇒ \{> ε 0\} [∃ δ [∧ \{> δ 0\} [∀ x [⇒ \{< (abs x) ε\} \{> (abs (- (f x) L)) 0\}]]]]]]}\)

In this statement there is another type of delimiter that I didn’t explain before: parentheses. These are for *function application*. This allows us to state properties not only of variables themselves, but also the output of functions when applied to those variables.

There is a strict hierarchy of delimeters: parentheses → curly bois → square brackets. You can never go “backwards”, so you can’t have a curly boi inside a pair of parens, and you can’t skip the curly boi stage, so you can’t have parens directly inside square brackets.

At the first level of the hierarchy, parentheses, we are building *terms*. Terms are either variables, or functions applied to other terms (it’s a recursive definition). There is also a special type of term which is a zero-argument function symbol, we call this a *constant symbol*. For instance, \(\texttt{0}\) is a constant symbol.

At the second level of the hierarchy, curly bois, we are transforming terms into propositions. This is where *relation symbols* come in. When you apply a relation symbol to terms, you are asserting that some relation is true, like \(\texttt{\{> 1 0\}}\).

Finally, at the third level of the hierarchy, we are combining propositions together and quantifying over variables. Combining propositions is essentially exactly the same as sentential logic, only instead of lame propositions like \(\texttt{P}\) we have exciting curly-boi surrounded relation symbols. Quantifiers, on the other hand are a whole new kettle of fish. Quantifiers are the meat of first order logic, the big cheese in town, a literal five-course meal with coffee afterwards. They are so special that they deserve their own heading.

The question that you might have been wondering is what do quantifiers quantify over? In the definition of a limit we didn’t even say what \(\texttt{ε}\) even *was* before comparing it zero. I said that when you for-all quantify over something it’s like having a function that takes in a value for the variable and shows that it satisfies the property inside of the for-all, but what if somewhat passes in a cat into the \(ε\) quantifier in the definition of a limit? Can a cat be greater than \(0\)? Is there a \(δ\) such that for all \(\vert x \vert < δ\), \(\vert f(x) - L \vert\) is less then Mr. Fluffy? The answer is, whenever we talk about interpreting a first order logic sentence, the first thing we do is we fix the universe of quantification, \(A\). And if you are a serious mathematician doing serious math, you do not allow cats into your universe of quantification. Instead, in this case the universe of quantification might be something nine-to-five, like \(\mathbb{R}\).

To go further into a discussion of how quantifiers work, we need to take a step back and figure out how we would even interpret a first-order proposition *without* quantifiers. The first thing we’d need is some way of assigning values from the universe to the variables in the proposition. Let’s say the set of all variables is \(V\), then we need a function \(\texttt{vlookup} : V \to A\). We also need some way of interpreting the function symbols. Let’s say that the set of function symbols that take in \(n\) terms is \(F_n\), then we need a function \(\texttt{flookup}_n : F_n \to (A^n \to A)\).

Now, these two functions give us a way to interpret terms (all the stuff in parentheses). A term is just a tree where the leaf nodes are variables or constant symbols and the internal nodes are function symbols, so you can recursively go through this tree, using \(\texttt{vlookup}\) to get the value of leaf nodes and applying the result of \(\texttt{flookup}\) to the child nodes of an internal node to get the value of that internal node.

Once we have interpreted the terms, we can interpret the relation symbols. To do this, we have… yet another lookup function (this is the last one, I promise!) \(\texttt{rlookup}_n : R_n \to (A^n \to \texttt{Boolean})\), which takes a relation symbol (from the set of \(n\)-place relations symbols \(R_n\)) and returns a function that decides whether a \(n\)-tuple of values from \(A\) have the property that the relation symbol codes for. This function allows us to interpret the curly boi surrounded statements. Once we’ve interpreted the curly boi statements, we are home free, because then it’s just sentential logic.

If you’ve been paying attention, you should have noticed that if each curly boi statement were a single symbol, this would essentially amount to a fancy way of constructing a sentential logic model.

You might have also been wondering where on earth the \(V\), \(F_n\) and \(R_n\) sets came from. How do you decide what is the canonical set of function symbols? Does every model have to interpret all of the function symbols? Even the weird ones?

The answer is that I was sloppy way way back when I started to define a first order proposition. I let you assume that you could just stick any old function symbol or relation symbol into the proposition, but actually before you put pen to ink you have to decide what *language* you are going to be writing in. A language is very simply a definition for the \(F_n\)’s and the \(R_n\)’s: that is the missing piece (the set \(V\) is not as important for reasons that we will find out later, so just pick any convenient infinite set and call it a day). Whenever you have a model in first order logic, it is a model *for a language*, and we only give definitions for the relation and function symbols *in that language*.

For instance, one common language that you talk about in first order logic is the “language of arithmatic”. In this language we have one constant symbol, \(\texttt{0}\), one unary function symbol, \(\texttt{S}\) (“successor”), and two binary functions, \(\texttt{+}\) and \(\cdot\). There is also the binary relation \(\texttt{=}\), though in general when a language includes \(\texttt{=}\) we require the model to interpret it as actual equality instead of a generic relation, so it’s kind of a special case. An example of a sentence in this language could be \(\texttt{[∀ n [∀ m \{= (+ (S n) m) (S (+ n m))\}]]}\).

The standard model for this language has \(\mathbb{N}\) as the universe of quantification, \(0\) as \(\texttt{0}\), \(k \mapsto k + 1\) as \(\texttt{S}\) and the standard multiplication and addition as \(\texttt{+}\) and \(\cdot\). Also, we arbitrarily say that all variables are \(0\). It turns out that this initial setting of variables doesn’t matter because we will only look at propositions where all variables are quantified over. In this model, the above sentence says that for all \(n,m \in \mathbb{N}\), \((n + 1) + m = (n + m) + 1\).

OK, now we are ready to formally define quantification. As always, we work recursively, so we are going to define how to interpret \(\texttt{[∀ x F]}\) assuming that we know how to interpret \(F\) in any model. Suppose that \(M\) is a model, \(x\) is a variable, and \(a\) is in the universe of quantification. Define \(M[x \leftarrow a]\) to be the model with everything assigned to the same value as in \(M\), except for \(x\), which is assigned to \(a\). Then \(\texttt{[∀ x F]}\) is true in \(M\) if and only if \(\texttt{F}\) is true in \(M[x \leftarrow a]\) for all \(a\) in the universe of quantification. Similarly, \(\texttt{[∃ x F]}\) is true in \(M\) if and only if there exists some \(a\) in the universe of quantification with \(\texttt{F}\) true in \(M[x \leftarrow a]\).

Now, think about that definition and why it implies that as long as you quantify over every variable before using it, it doesn’t matter what variable assignment you start with. We call a proposition that quantifies over a variable before using it a “sentence”.

This is almost exactly the same as sentential logic. A set of sentences \(\Gamma\) semantically implies another sentence \(\tau\) if and only if all models satisfying all sentences in \(\Gamma\) also satisfy \(\tau\). Moreover \(\Gamma \vdash \tau\) if and only if there exists a sequence of sentences, each of which either is an element of \(\Gamma\), an “obvious” sentence, or follows logically from earlier sentences. Here “obvious” is even more obnoxious to define than it is in sentential logic, so we are going to ignore it even harder.

Just as before, we have theorems implying that \(\Gamma \vdash \tau\) if and only if \(\Gamma \vDash \tau\), and just as before, we will not prove them.

However, we will talk about some of their consequences. Call a set of sentences *consistent* if no contradiction can be proved from it (like \(\texttt{[∧ P [¬ P]]}\)), and call a set of sentences *satisfiable* if there exists a model that satisfies all of the sentences. If you can prove a contradiction from a set of sentences, then all of the models that satisfy the set of sentences must also satisfy the contradiction (by what we said before), so there can be no models. Conversely, if there are no models of a set of sentences, then all models which satisfy the set of sentences also satisfy a contradiction, so the set of sentences proves a contradiction. This shows that consistency is equivalent to satisfiability.

We can use this idea to get something else interesting. If \(\Gamma \vdash \tau\), there is some proof of \(\tau\), which is a finite sequence of statements. If we throw away all of the statements of \(\Gamma\) that were not mentioned in that proof, we get a set \(\Gamma_0 \vdash \tau\), by the exact same proof. And moreover, \(\Gamma_0\) is finite, because we can only mention finitely many sentences in a finite sequence of statements. Therefore, if \(\Gamma\) is inconsistent (if \(\Gamma\) derives a contradiction), there is some finite subset \(\Gamma_0\) that is inconsistent. Flipping that around, if all finite subsets \(\Gamma_0 \subset \Gamma\) are consistent, then \(\Gamma\) itself must be consistent. And finally, we can apply our previous result: if all finite \(\Gamma_0 \subset \Gamma\) are satisfiable, \(\Gamma\) must be satisfiable. This last statement is called the *Compactness Theorem*.

OK, now we can make some infinitesimals.

This language is just going to be ludicrously huge. Like, you are going to look at this language and think that whoever designed it had some serious insecurities that they were compensating for. Call it \(L\).

First of all, we put a constant symbol \(\texttt{c}_r\) into \(L\) for every real number \(r \in \mathbb{R}\). Already, this language is too big. But we don’t stop there. No, we put in a \(n\)-place function symbol \(\texttt{f}_\phi\) for every function \(\phi : \mathbb{R}^n \to \mathbb{R}\). And I’m not just talking about the nice functions, like smooth functions or continuous functions or measurable functions. I’m talking about all of them. Then we put a \(n\)-place relation symbol \(\texttt{R}_ρ\) for every relation \(ρ : \mathbb{R}^n \to \texttt{Boolean}\).

Now, we are going to make a set of sentences in this language. If you thought the language was bad, this set of sentences is even worse. Call it \(\Gamma\).

This is how we see if a sentence is allowed in \(\Gamma\). There is a natural model for \(L\) which we will call \(\mathscr{R}\), namely the model which has as its universe of quantification \(\mathbb{R}\) itself, and interprets \(\texttt{c}_r\) as \(r\), \(\texttt{f}_\phi\) as \(\phi\) and \(\texttt{R}_\rho\) as \(\rho\). A sentence is allowed in \(\Gamma\) if and only if it is true in this model.

Add one more constant symbol, call it \(\texttt{c}_\infty\), and call the new language \(L'\). Every sentence of \(L\) is also a sentence of \(L'\).

For every \(r \in \mathbb{R}\), add the sentence \(\texttt{[}\texttt{R}_> \: \texttt{c}_\infty \: \texttt{c}_r\texttt{]}\). \(\texttt{R}_>\) is a relation in \(L'\) because \(>\) is a binary relation between real numbers. Call this new theory \(\Gamma'\).

If we have a finite subset \(\Gamma'_0 \subset \Gamma'\), then \(\Gamma'_0\) can only have a finite number of sentences of the form \(\texttt{[}\texttt{R}_> \: \texttt{c}_\infty \: \texttt{c}_r\texttt{]}\). Therefore, we can take the standard model for \(L\), \(\mathscr{R}\), and make it into a model for \(L'\) by setting \(\texttt{c}_\infty\) to some real number that’s bigger than all of the real numbers mentioned in \(\Gamma'_0\). All of the sentences in \(\Gamma'_0 \cap \Gamma\) are true in this model because that’s how we defined \(\Gamma\), and all of the other sentences are true because we set \(\texttt{c}_\infty\) *just* high enough, so this model satisfies \(\Gamma'_0\).

Apply the compactness theorem from the last section to get a model for \(\Gamma'\), let’s call this model \(\mathscr{R}^*\). Now, any statement that you can make about \(\mathbb{R}\) in a first order language is also true in the model \(\mathscr{R}^*\). However, there are more numbers in \(\mathscr{R}^*\) than there are in \(\mathbb{R}\). Specifically, there is the number assigned to \(\texttt{c}_\infty\), which behaves like a real number in any way you can write down in a first order proposition, but is also larger than any of the numbers assigned to \(\texttt{c}_r\) for \(r \in \mathbb{R}\). So we essentially have a copy of \(\mathbb{R}\) in \(\mathscr{R}^*\) that has some extra numbers added to it. Let’s call the universe of quantification ^*. To make things easier, we can just assume that in fact \(\mathbb{R} \subset \mathbb{R}^*\). Moreover, all of the operations on \(\mathbb{R}\) extend to \(\mathbb{R}^*\) (by definition), so I’m going to also do things like say \(a + b\) for \(a,b \in \mathbb{R}^*\), and it is understood that I mean the interpretation of \(\texttt{f}_+\) in \(\mathscr{R}^*\).

Well, specifically, \(\texttt{c}_\infty\). Because \(\mathscr{R}^*\) satisfies every true first order statement about \(\mathbb{R}\), you can divide by any non-zero number. And \(\texttt{c}_\infty\) is definitely non-zero, because it’s greater than \(0\)! So if \(c_\infty\) is assigned to \(\texttt{c}_\infty\), we get some number \(\frac{1}{c_\infty} = d > 0\), such that for all \(r \in \mathbb{R}\), \(d < r\). In fact, we get a whole bunch of numbers like this, because multiplying \(c_\infty\) by any regular real number gives you another “infinite” number.

Remember that whole bothersome definition of limit before? The one with the epsilons and the deltas that is fiddly and boring and everyone forgets? We can now totally do away with that. Say that \(d\) is *infinitesimal* if \(\vert \frac{1}{d} \vert\) is “infinite” (larger than any regular real number). Then say that \(a \approx b\) if \(a - b\) is infinitesimal. Now we can say that \(\lim_{x \to a} f(x) = L\) if and only if \(x \approx a\) implies that \(f(x) \approx L\). To me, at least, that is much more intuitive. It also makes the definition of continuous function more better: \(f\) is continuous iff for all \(a,b \in \mathbb{R}^*\), \(a \approx b\) implies that \(f(a) \approx f(b)\).

You can also define the derivative of a function \(f\) by letting

\(f'(a) = \frac{f(a + d) - f(a)}{d}\)

for \(d\) an infinitesimal. That sure sounds like a physicist’s definition to me!

Note that these definitions of limit and derivative don’t give unique answers. However, they do give unique answers “up to \(\approx\)-equivalence”, and it turns out this is good enough (for more information, as always look to Enderton).

Let’s take an example. We want to prove that if \(f(x) = x^2\), then \(f'(x) = 2x\). Let \(d\) be an infitesimal.

\(\frac{(x + d)^2 - x^2}{d} = \frac{x^2 + 2xd + d^2 - x^2}{d} = \frac{2xd + d^2}{d} = 2x + d \approx 2x\)

It feels a lot like what you implicitly do when you calculate a limit, but now it is explicit.

This also allows us to give pretty trivial proofs of stuff like L’Hopital’s rule. Suppose \(f(0) = 0\), \(g(0) = 0\), and we want to calculate \(\lim_{x \to 0} \frac{f(x)}{g(x)}\). By our definition before, this is \(\frac{f(d)}{g(d)}\) where \(d\) is an infinitesimal, which is the same as

\(\frac{f(d) - f(0)}{g(d) - g(0)} = \frac{\frac{f(d) - f(0)}{d}}{\frac{g(d) - g(0)}{d}} = \frac{f'(0)}{g'(0)}\)

It feels like cheating, but it actually works!

It’s not too difficult to prove that you end up with the same calculus in this model as you do in the regular reals with epsilons and deltas. But this model is pedagogically a lot easier, and just as sound. So why not use it?

This post has gotten way too long, so I’m going to cut it off now before I lose hope of anyone making it all the way through. Hopefully you’ve learned something about logic, and hopefully you will now defend poor physicists against over-zealous mathematicians, thouroughly confusing both in the process! If you are dedicated, read on to the appendix, otherwise see you next time, when I bring in a whole other weird logic to do infinitesimals in a completely different way!

The purpose of this is to give a more rigorous definition of sentential logic. If you don’t know any programming, this will probably be no help. If you know programming, but not Haskell, you may still find this useful, so take a gander and see if you can figure something out.

```
type Model = String -> Boolean
-- | Logical Connective
data UnaryLC = Not
data BinaryLC = And
| Not
| Implies
-- | Proposition
data Prop = Apply1 UnaryLC Prop
| Apply2 BinaryLC Prop Prop
| BasicProp String
interp :: Model -> Prop -> Boolean
interp model (BasicProp p) = model p
interp model (Apply1 lc prop) = case lc of
Not -> not value
where
value = interp model prop
interp model (Apply2 lc prop1 prop2) = case lc of
Or -> value1 || value2
And -> value1 && value2
Implies -> (not value1) || value2
where
value1 = interp model prop1
value2 = interp model prop2
```

The one part that you might not be familiar with is implies. For \(\texttt{[⇒ A B]}\) to be true, if \(\texttt{A}\) is true then \(\texttt{B}\) is true. So, if \(\texttt{A}\) is false, \(\texttt{B}\) can be anything, and if \(\texttt{A}\) is true, \(\texttt{B}\) must be true. Therefore, either \(\texttt{A}\) is false (i.e., \(\texttt{[¬ A]}\) is true) or \(\texttt{B}\) is true. This is why it translates to `(not prop1) || prop2`

.

In this data structure, \(\texttt{[∧ [∨ P Q] [¬ R]]}\) would correspond to

To demonstrate this, let’s first create a helper function for creating models.

```
makeModel :: [(String, Bool)] -> Model
makeModel dict = \x -> case lookup x dict of
Just val -> val
Nothing -> False
```

Now, let’s try evaluating some propositions.

```
> model = makeModel [("P", False), ("Q", True), ("R", True)]
> interp model (BasicProp "P")
False
> interp model (Apply2 And (BasicProp "Q") (BasicProp "R"))
True
> interp model (Apply2 Implies (BasicProp "R") (BasicProp "P"))
False
> interp model (Apply2 Or (BasicProp "P") (BasicProp "Q"))
True
```

You get the picture.

Yes, I did just spend 3 hours figuring out how to get TikZ embedded into this blog. And then Taeer spent two days making it more elegant.↩