This week we start to discuss proof or derivation systems for our logic. The goal is to generate some kind of finite linguistic structure that proves a conclusion from zero or more premises. We want these structures to be such that it is effectively decidable whether one does in fact count as a proof (in a given system) of a formula \(\phi\) from premises \(\Gamma\). (Essentially, this means that a Turing machine could in every case eventually tell us, yes or no, whether this is so.) Sometimes premises and conclusions are restricted to be only sentences (closed formulas), but we will allow open formulas too. In some advanced theoretical settings, proofs from infinite sets of premises are also discussed; but we will not do that. Later we will discuss a property of the classical logics we've given semantics for (a property called compactness) that tells us that with these logical systems, nothing is lost by restricting ourselves to finite sets of premises.
There are many different styles of proof systems; and then within each style, different choices can be made about exactly which axioms and/or rules are posited as primitive, and which instead need to be derived. Many of those choices yield equivalent systems. Some choices yield systems that are essentially weaker: that is, systems where fewer formulas turn out to be derivable.
Suppose we've specified a semantics for our language, including specifying which parts of the language get different interpretations on different models, and which don't (the "logical constants"). As I mentioned in some earlier notes, one way to generate a semantics is to start with a proof system, and let \([\![\, \phi \,]\!]\) be an equivalence class of formulas with which \(\phi\) is interderivable in that system. But more commonly, the semantics is specified in terms that don't refer to proofs. That was the strategy we followed.
Given a semantics for a language, and also a proof theory for the language, one thing you will definitely want is for the proof theory to be sound. That is, you will want it to be the case that when some formula \(\phi\) is provable (either from no premises, or from premises \(\Gamma\)), then \(\phi\) will be a logical consequence of the relevant premises:
Whenever \(\vdash \phi\), then \(\models \phi\).
Whenever \(\Gamma \vdash \phi\), then \(\Gamma \models \phi\).
If your proof system doesn't have this property, then it's not a good partner for your semantics, and you should set one of the two aside.
It's somewhat less mandatory, but still desirable, that your proof system have the converse property too: that every logical consequence of some premises will be provable in that system from those premises. This property is known as completeness, and we will discuss it more in a coming week. For first-order sentential and predicate logic, the proof systems you'll see are complete in this sense, as well as sound. For some logical systems, such as full second-order logic, no sound proof system can be complete. So this desirable property cannot always be attained.
It's also desirable, but even less often attainable, that notions like logical consequence and incompatibility ("unsatisfiability") be effectively decidable --- essentially, meaning: could a Turing machine in every case eventually tell us, yes or no, whether these properties are present? As we've said before, for first-order sentential logic, and some restricted fragments of predicate logic, these properties are decidable. But for full first-order predicate logic and stronger systems, they are not. We can devise a recipe that is guaranteed to generate a proof of \(\phi\) from premises \(\Gamma\) when there's a proof to be had, but when there is no proof, we can't in general guarantee that our recipe will be able to stop and tell us, sorry you're out of luck.
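To make the asymmetry vivid, here is a minimal sketch in Python of such a recipe. The helpers are hypothetical: `candidate_derivations` stands in for some effective enumeration of all finite candidate derivations (which exists, since derivations are finite strings over a countable vocabulary), and `is_proof_of` stands in for the effectively decidable proof-checking test described above.

```python
def find_proof(premises, conclusion, candidate_derivations, is_proof_of):
    """Semi-decision procedure for provability.

    Halts with a proof whenever one exists; loops forever when none does.
    """
    for d in candidate_derivations():
        # Proof-checking is effectively decidable, so this test always terminates.
        if is_proof_of(d, premises, conclusion):
            return d  # halts with "yes" (and a proof) whenever a proof exists
    # If no proof exists, the enumeration never runs out and we loop forever;
    # no algorithm can in general detect that situation and report failure.
```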
Sider presents two proof systems for sentential logic: one using sequent proofs and the other using logical axioms. With axiomatic systems it's common to say things like "This system has one rule and three axioms," but really the standard strategy is for there to be axiom schemas, each schema generating an infinity of specific instances when you uniformly substitute in formulas for the schematic variables in the axiom. (A different, older approach has the axioms be non-schematic, but then has an additional rule permitting you to uniformly substitute any formula for a sentence constant in your non-schematic axiom. In the end, this amounts to the same thing.) Axiom systems tend to minimize their rules of inference down to just a very few. Other proof systems, like sequent systems, go the other direction. They have lots of rules and no logical axioms.
For predicate logic, Sider presents only an axiomatic proof system. It can be thought of as extending his axiom system for sentential logic. (Only now we permit formulas in the language of predicate logic, which differs in some ways from the language of sentential logic.)
Many of you are familiar with proof systems other than these: sometimes Beth/Jeffrey "Tableaux" proofs, or sometimes "Natural Deduction." There are close connections between natural deduction and sequent systems; Bostock discusses these in one of the readings from his book for this week. Also, sequent systems can be generalized so as to permit more (or fewer) than one formula to appear on the right-hand side. Bostock discusses this in an optional reading for this week. Those generalized sequent systems turn out to be closely connected to the tableaux proof systems. However, we will confine our attention to axiomatic proof systems and the not-yet-generalized sequent systems, which always have exactly one rhs formula.
I said that the axiomatic systems are the only ones with logical axioms. This is a different issue from how many extra-logical axioms you have. Given any of these proof systems, we could go on to designate a number of additional axioms that we will always allow to be used as premises in some theoretical settings. This is the sense in which one talks about the "Peano axioms" for arithmetic, or "Euclid's axioms" for geometry, or the "ZFC axioms" for set theory, and so on. Those should all be thought of as theories built on top of some background logic. We'll discuss such things later. These weeks, our focus is just on the background logics themselves. So the only axioms we'll be talking about are logical axioms; one proof system has these, and the others have none.
Axiomatic proof systems are sometimes called "Hilbert-style" systems; and sequent systems --- especially the generalized versions, which permit more (or fewer) than one rhs formula --- are sometimes called "Gentzen-style" systems.
As I said, axiomatic proof systems aim to minimize rules of inference down to just a very few, usually including something like modus ponens. Other proof systems, like sequent systems, make rules do all the work, and so don't have any logical axioms. In these systems, all logical theorems are derived from rules. It's usual to organize the primitive rules into "introduction rules" and "elimination rules" for each logical connective. Bostock discusses the desiderata for these labels on pp. bot 240--top 242 of his reading on natural deduction systems. (In most proof systems, it's not possible to perfectly satisfy all the desiderata.)
In addition to substantive rules like modus ponens, or other introduction or elimination rules, proof systems will have a number of other rules known as structural rules. Sometimes these are explicitly posited as primitives in the system. Other times they may be implicit in the system's definition of what counts as a proof or derivation. Other times they may be derivable from other rules and/or axioms.
One example of a structural rule is the rule of Assumption, which says that from the premise \(\phi\) you can derive \(\phi\). Sider explicitly posits this as a rule for his sequent system (on p. 42), but if you examine his definition of a "sequent proof" further down the page, you'll see that the rule is already implicit in that definition, and so didn't really need to be explicitly posited.
More examples of structural rules are Bostock's rules of Interchange and Contraction, presented on pp. 275--6 of his reading on sequent systems. These tell us that we can permute premises and "contract" repeated premises. For Sider, these rules are implicit in his assumption that the left-hand argument of \(\Rightarrow\) is a set. Bostock on the other hand wants a sequent to be a finite string, and so for him the left-hand argument of \(\Rightarrow\) is a finite list of formulas, and these rules for permuting and contracting repeated premises have to be explicitly posited.
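In sequent notation, these two rules are standardly written roughly as follows: [Interchange] from \(\Gamma,\phi,\psi,\Delta\Rightarrow\chi\), derive \(\Gamma,\psi,\phi,\Delta\Rightarrow\chi\); [Contraction] from \(\Gamma,\phi,\phi\Rightarrow\psi\), derive \(\Gamma,\phi\Rightarrow\psi\).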
Another structural rule is Thinning (sometimes called Weakening or Monotonicity):
If \(\phi\) can be proved from premises \(\Gamma\), then it can also be proved from premises \(\Gamma\cup \{\, \psi \,\}\). That is, adding premises never makes fewer things derivable.
And another structural rule is Cut:
If \(\phi\) can be proved from premises \(\Gamma\), and \(\psi\) can be proved from premises \(\Delta\cup \{\, \phi \,\}\), then \(\psi\) can also be proved from premises \(\Gamma\cup \Delta\). (See Sider pp. 57--8, and the Bostock reading on "Structural Rules.")
Instead of positing Cut as a primitive rule, Sider proves that the axiomatic system he has given already makes Cut hold. Similarly, instead of positing Thinning as a primitive rule, Sider demonstrates a technique on pp. 45--6 for how to "thin" or expand a set of premises in his sequent proof system. Namely, if you've ever proved:
\(\Gamma \Rightarrow \phi\)
you can then prove:
\(\psi \Rightarrow \psi\)
and use the \(\wedge\)-introduction rule to derive from those two sequents:
\(\Gamma,\psi \Rightarrow \phi\wedge\psi\)
and then use \(\wedge\)-elimination to derive:
\(\Gamma,\psi \Rightarrow \phi\).
As Sider explains, this corresponds to steps 7 and 8 in his proof of the sequent \(P \vee Q, \neg P \Rightarrow Q\).
For our purposes, it won't matter which of these structural rules are explicitly posited and which are derived.
It's worth remarking that in some non-classical logics, not all of these structural rules are valid; such logics are therefore called sub-structural. But we won't be studying any such systems.
A "deduction theorem" holds for a proof system when:
If \(\Gamma,\phi \vdash \psi\) in that system, then also \(\Gamma \vdash \phi\supset \psi\).
This corresponds to the rule of Conditional Proof or \(\supset \)-introduction in Sider's sequent system; so in that system, it can be directly relied on. In Sider's axiomatic system, by contrast, there is no explicit rule permitting this step. However, Sider proves on pp. 58--9 that in his axiomatic system, whenever there is a proof of \(\psi\) from premises \(\Gamma\cup \{\, \phi \,\}\), there will also be some proof of \(\phi\supset \psi\) from premises \(\Gamma\). So although "Conditional Proof" can't be cited to justify any step in an official axiomatic proof, we can assume it in our metalanguage reasoning about what axiomatic proofs exist.
See also section 5.3 (pp. 200--207) of the Bostock reading on axiomatic systems.
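To illustrate how this metalanguage reasoning goes: since \(\phi, \phi\supset\psi \vdash \psi\) by a modus ponens rule, one application of the Deduction Theorem tells us that some proof of \((\phi\supset\psi)\supset\psi\) from premise \(\phi\) exists, and a second application tells us that some proof of \(\phi\supset((\phi\supset\psi)\supset\psi)\) from no premises exists, all without our having to exhibit those official proofs.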
As Bostock points out (p. 203), the two axioms he shares with Sider:

[K] \(\phi\supset (\psi\supset \phi)\)

[S] \((\phi\supset (\psi\supset \chi)) \supset ((\phi\supset \psi)\supset (\phi\supset \chi))\)

conveniently happen to be just what's needed (together with a modus ponens-like rule) to prove the Deduction Theorem. As Bostock also points out (pp. 206--7), if the Deduction Theorem were taken as a further rule, then from it and [modus ponens] we could derive all instances of the [K] and [S] axioms.
The axiom schema I above called [K] is what Sider calls [PL1] and Bostock calls [A1]. There's a reason it's often called [K], having to do with combinatory logic, which is related to the pure untyped lambda calculus. We don't have time to explore those connections, so just go along with me calling it [K].
The axiom schema I above called [S] is what Sider calls [PL2] and Bostock calls [A2]. There's a reason it's often called [S], coming from the same connections, so just go along with me calling this one [S].
You'll notice that those two axioms use only the single connective \(\supset \). As mentioned above, they suffice to prove the Deduction Theorem for this proof system, which plays a role analogous to the rule of Conditional Proof or \(\supset \)-introduction in other proof systems. Moreover, Sider and Bostock both endorse a modus ponens-like rule. Bostock initially (p. 194) posits a rule he calls [DET] (for "detachment"), which says:
If \(\vdash \phi\) and \(\vdash \phi\supset \psi\) then \(\vdash \psi\)
but he later (p. 202, see also pp. 222--3) generalizes that to [modus ponens] or [MP]:
If \(\Gamma\vdash \phi\) and \(\Gamma\vdash \phi\supset \psi\) then \(\Gamma\vdash \psi\)
which, in the presence of our structural rules, will be the same as:
\(\phi,\phi\supset \psi \vdash \psi\)
and that's how Sider formulates [MP] (p. 47). [MP] is what's often used as a \(\supset \)-elimination rule in other proof systems. So we might ask: do [K] and [S] (giving us the Deduction Theorem or Conditional Proof/\(\supset \)-introduction) together with [MP] (giving us \(\supset \)-elimination) suffice to derive all the valid formulas that contain only the connective \(\supset \)?
The answer is, no, that is not possible. At least, not in the classical logic whose semantics we've specified earlier. These resources do suffice to derive all the valid formulas in some weaker logics (namely, that fragment of intuitionistic logic that can be formulated using only \(\supset \)). But not in classical logic. Instances of Peirce's Law \(((\phi\supset \psi)\supset \phi)\supset \phi\) are not derivable from [S] and [K] and the rule [MP]. This law marks a respect in which classical logic is stronger than intuitionistic logic.
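To get a feel for what official derivations in this axiomatic style look like, here is the standard five-line proof of the theorem \(\phi\supset\phi\) from [K], [S], and [MP] alone:

1. \((\phi\supset ((\phi\supset\phi)\supset\phi)) \supset ((\phi\supset (\phi\supset\phi)) \supset (\phi\supset\phi))\) (instance of [S], substituting \(\phi\supset\phi\) for \(\psi\) and \(\phi\) for \(\chi\))
2. \(\phi\supset ((\phi\supset\phi)\supset\phi)\) (instance of [K], substituting \(\phi\supset\phi\) for \(\psi\))
3. \((\phi\supset (\phi\supset\phi)) \supset (\phi\supset\phi)\) (from 1 and 2 by [MP])
4. \(\phi\supset (\phi\supset\phi)\) (instance of [K], substituting \(\phi\) for \(\psi\))
5. \(\phi\supset\phi\) (from 3 and 4 by [MP])

Note how much scaffolding even this trivial theorem requires; that's the price axiomatic systems pay for positing so few rules.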
For sentential logic, Sider posits one additional axiom schema, the first one we've seen to include the connective \(\neg\):
[Sider's PL3] \((\neg\psi\supset \neg\phi) \supset ((\neg\psi\supset \phi) \supset \psi)\)
Bostock also posits one additional axiom (I've relabeled his variables):
[Bostock's A3] \((\neg\psi\supset \neg\phi) \supset (\phi\supset \psi)\)
You'll notice these are different axioms. In fact, as Bostock explains in his sections 5.4 and 5.8, there are a variety of choices one can make here, and still end up with equivalent systems. Bostock mentions something like Sider's axiom as his [A3\({}'\)] (see exercise 5.2.2 on p. 199; see also the rule [RAA*] in exercise 5.4.2 on p. 216).
Here are a variety of rules of inference with historically significant names. You should at least file these names away for later reference. Sometimes these names are applied to formulas (or formula schemas) as well as to rules.

[modus ponens] \(\phi, \phi\supset\psi \vdash \psi\)

[modus tollens] \(\phi\supset\psi, \neg\psi \vdash \neg\phi\)

[hypothetical syllogism] \(\phi\supset\psi, \psi\supset\chi \vdash \phi\supset\chi\)

[dilemma] \(\phi\vee\psi, \phi\supset\chi, \psi\supset\chi \vdash \chi\)

[ex falso] \(\phi, \neg\phi \vdash \psi\)

[reductio] If \(\Gamma,\phi \vdash \psi\) and \(\Gamma,\phi \vdash \neg\psi\), then \(\Gamma \vdash \neg\phi\).

[reductio*] If \(\Gamma,\neg\phi \vdash \psi\) and \(\Gamma,\neg\phi \vdash \neg\psi\), then \(\Gamma \vdash \phi\).

[consequentia mirabilis*] If \(\Gamma,\neg\phi \vdash \phi\), then \(\Gamma \vdash \phi\).

[tertium non datur*] If \(\Gamma,\phi \vdash \psi\) and \(\Delta,\neg\phi \vdash \psi\), then \(\Gamma,\Delta \vdash \psi\).
If we think of the literal falsehood \(\bot\) as validating the rule \(\bot \vdash \psi\), then we could restrict [ex falso] to: \(\phi, \neg\phi \vdash \bot\).
And another way to write [reductio] would be: if \(\Gamma,\phi \vdash \bot\), then \(\Gamma \vdash \neg\phi\).
As I mentioned, if [MP] is our only rule, then [K] and [S] aren't sufficient by themselves to prove Peirce's Law \(((\phi\supset \psi)\supset \phi)\supset \phi\). But if that is added as a third axiom schema, then every formula using only \(\supset \) which is valid in classical sentential logic will be provable. In the presence of [Peirce's Law], [S] can optionally be replaced by an axiom schema corresponding to the rule of [hypothetical syllogism], mentioned above:
\((\phi\supset \psi) \supset ((\psi\supset \chi) \supset (\phi\supset \chi))\)
See Bostock's exercise 5.8.3, on p. 238 --- though in my printed copy there is a typo in his statement of this new axiom.
In axiomatic systems, connectives like \(\vee\) and \(\wedge\) are often defined in terms of \(\supset \) and \(\neg\) (as Sider does), and sometimes \(\neg\) is itself defined in terms of \(\supset \) and \(\bot\). But we can also introduce \(\vee\) and \(\wedge\) as primitives, and give them their own axiom schemes:

i. \(\phi \supset \phi\vee\psi\)

ii. \(\psi \supset \phi\vee\psi\)

iii. \((\phi\supset\chi) \supset ((\psi\supset\chi) \supset (\phi\vee\psi \supset \chi))\)

iv. \(\phi\wedge\psi \supset \phi\)

v. \(\phi\wedge\psi \supset \psi\)

vi. \((\chi\supset\phi) \supset ((\chi\supset\psi) \supset (\chi \supset \phi\wedge\psi))\)
Note that iii is an axiom version of the rule [dilemma] formulated above. For axiom vi, see Sider's tautology "composition" in table 4.1 on p. 102. In the presence of our other rules and axioms, that can be simplified to \(\phi \supset (\psi \supset \phi\wedge\psi)\), but this gives up the nice parallel with axiom iii.
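Here is one way to see that the simplified schema can do vi's work, given our other resources. Assume \(\chi\supset\phi\), \(\chi\supset\psi\), and \(\chi\) as premises. Two applications of [MP] give \(\phi\) and then \(\psi\); two more applications of [MP], using the simplified axiom \(\phi \supset (\psi \supset \phi\wedge\psi)\), give \(\phi\wedge\psi\). Three applications of the Deduction Theorem then discharge the three premises, yielding the corresponding instance of axiom vi.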
To define \(\neg\) in terms of \(\supset \) and \(\bot\), we say that \(\neg\phi\) means \(\phi\supset \bot\). If we then add to [S] and [K] the single axiom schema:
[Church] \(((\phi\supset \bot)\supset \bot) \supset \phi\)
then those three and the rule [MP] suffice to prove all classically valid formulas. (So [Church] could take the place of Sider's [PL3] or Bostock's [A3].) Alternatively, we could add [Peirce's Law] to [S] and [K], and then we'd need only this simple axiom schema:
\(\bot \supset \phi\)
That together with [Peirce's Law], [S], [K], and the rule [MP] also suffices to prove all classically valid formulas.
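It may help to see why [Church] is doing so much work: given the definition of \(\neg\), the formula \(\neg\neg\phi\) abbreviates \((\phi\supset\bot)\supset\bot\), so [Church] just is the double negation schema \(\neg\neg\phi\supset\phi\) that we'll meet again below. Likewise, the simple schema \(\bot\supset\phi\) is just an axiom version of [ex falso].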
If one instead prefers to add to [S] and [K] further axiom(s) using \(\neg\), then two choices we've seen are Sider's [PL3] (corresponding to the rule [reductio*]) or Bostock's [A3]. Alternatively, one could add rules or axioms like [reductio] (without the *) and [ex falso]. These rules won't, by themselves or together, enable one to prove all the classically valid formulas. Together they will however enable one to derive all the formulas valid in intuitionistic logic, which are a proper subset of the classically valid formulas. One can then add some further axiom schemas which are valid only in classical logic, like \(\neg\neg\phi \supset \phi\). In fact this last axiom scheme, together with [reductio], suffices. (That is, they can be added to [S] and [K] and the rule [MP] to prove all classically valid formulas.) Alternatively, one could take [ex falso] together with [consequentia mirabilis*], formulated above. Or one could take either of [reductio] and [ex falso], and combine it with [tertium non datur*]. (The *s in the above list of rules indicate that a rule is only classically, and not intuitionistically, valid.)
For predicate logic, one can use any axiomatic system that is complete for sentential logic, and then just add some further axioms and rules specific for the quantifiers. Sider adds the following axiom schemas, where \(\alpha\) is schematic for any simple term (name or variable):
[Sider's PC1, also Bostock's A4] \(\forall x\phi \supset \phi[x\leftarrow \alpha]\)
[Sider's PC2, also Bostock's A5] \(\forall x(\phi\supset \psi) \supset (\phi \supset \forall x\psi)\), where \(x\) is not free in \(\phi\)
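To see why the proviso on the second of these axioms is needed, suppose we dropped it and let both \(\phi\) and \(\psi\) be \(Fx\). Then \(\forall x(Fx\supset Fx) \supset (Fx \supset \forall xFx)\) would be an axiom; and since \(\forall x(Fx\supset Fx)\) is a theorem, [MP] would make \(Fx\supset\forall xFx\) a theorem too; but that formula is not valid on our semantics.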
Sider also introduces a rule he calls [UG] (p. 99), but he points out (pp. 99--100) that the formulation he's presented gives undesirable results if we allow open formulas to appear as premises. Here is another version of this rule:
[Bostock's \(\forall\)-intro] If \(\Gamma\vdash \phi\) and simple term \(\alpha\) doesn't occur in \(\Gamma\), then \(\Gamma \vdash \forall x\phi[\alpha\leftarrow x]\)
(Bostock initially posits a restricted version of this, called [GEN], on p. 221; see pp. 222--223 for this generalization.) Bostock's rule is more complex than Sider's, but it's not complex enough for our purposes. Unlike Bostock (see his pp. 89--90), we'd like to allow open formulas as premises and conclusions. Yet as Sider points out, we shouldn't license the inference \(Fx\vdash \forall xFx\), because that would make life complicated in various ways. And Bostock's rule does license this (let \(\phi\) be \(Fx\) and \(\alpha\) be \(y\)). To block this, we need to impose all of these restrictions:
We do however want to allow that \(\alpha\) might be a variable, perhaps even the variable \(x\). (When \(\alpha\) is not a variable, we'll count all of its occurrences as "free.")
The most elegant way to incorporate all these restrictions involves stipulating the substitution \([x\leftarrow \alpha]\) on the \(\phi\)-premise, rather than the substitution \([\alpha\leftarrow x]\) in the conclusion, as Bostock does. Our rule will be:
[\(\forall\)-intro\({}'\)] If \(\Gamma\vdash \phi[x\leftarrow\alpha]\) and simple term \(\alpha\) doesn't occur free in \(\Gamma\) or \(\forall x\phi\), then \(\Gamma \vdash \forall x\phi\).
From here on, I will amend all further rules I cite from Bostock in this way, without comment.
Sider's rule is essentially the one just stated, with the additional restriction that \(\alpha\) always be \(x\) itself rather than some distinct term; and Sider doesn't explicitly state the constraint that \(\alpha\) not occur free in \(\Gamma\). The more liberal rule we're using permits us to generalize on arbitrary term constants as well as on free variables. This is convenient but it doesn't make any new theorems derivable.
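To see the rule in action: first, it licenses relabeling bound variables. From [PC1] and [MP] we have \(\forall xFx \vdash Fy\); taking the rule's \(x\) to be \(y\), its \(\phi\) to be \(Fy\), and its \(\alpha\) to be the variable \(y\) itself, [\(\forall\)-intro\({}'\)] then yields \(\forall xFx \vdash \forall yFy\), since \(y\) occurs free neither in the premise set nor in \(\forall yFy\). Second, the rule blocks \(Fx\vdash\forall xFx\): to reach that conclusion we'd need some \(\alpha\) with \(Fx\vdash Fx[x\leftarrow\alpha]\) where \(\alpha\) doesn't occur free in the premise set \(\{Fx\}\); \(\alpha\) can't be \(x\) itself (which occurs free in \(Fx\)), and for any other simple term \(\alpha\), the premise \(Fx\vdash F\alpha\) isn't derivable.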
As Bostock points out (pp. 236, 254), an alternative to the pair of axioms and single rule just cited is to take only the first axiom and the stronger rule:
[\(\forall\)-intro\({}''\)] If \(\Gamma \vdash \phi\supset \psi[x\leftarrow\alpha]\) and simple term \(\alpha\) doesn't occur free in \(\Gamma\) or \(\phi\) or \(\forall x\psi\), then \(\Gamma \vdash \phi\supset \forall x\psi\).
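Here is a sketch of how the discarded second axiom gets recovered on this alternative. Suppose \(x\) is not free in \(\phi\), and let \(\Gamma\) be \(\{\forall x(\phi\supset\psi)\}\). By the first axiom (with \(\alpha\) as \(x\)) and [MP], \(\Gamma\vdash\phi\supset\psi\), that is, \(\Gamma\vdash\phi\supset\psi[x\leftarrow x]\). Since \(x\) occurs free in none of \(\Gamma\), \(\phi\), or \(\forall x\psi\), [\(\forall\)-intro\({}''\)] gives \(\Gamma\vdash\phi\supset\forall x\psi\), and the Deduction Theorem then yields \(\vdash \forall x(\phi\supset\psi) \supset (\phi\supset\forall x\psi)\).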
Or one could dispense with any new rules, and let axioms carry all the weight. (The only rule for the logic would be [modus ponens].) Enderton uses the axioms (p. 122ff.): all generalizations of the following schemas (that is, the results of prefixing any string of universal quantifiers to them):

1. tautologies of sentential logic
2. \(\forall x\phi \supset \phi[x\leftarrow\tau]\), where the term \(\tau\) is substitutable for \(x\) in \(\phi\)
3. \(\forall x(\phi\supset\psi) \supset (\forall x\phi \supset \forall x\psi)\)
4. \(\phi \supset \forall x\phi\), where \(x\) does not occur free in \(\phi\)

together with axioms for \(=\) of the sort discussed below.
Bostock proves (p. 230) that if we define \(\exists x\phi\) to mean \(\neg\forall x\neg\phi\), then the axioms and rules he gives for \(\forall\) suffice to validate the rules:
[\(\exists\)-intro\({}'\)] \(\phi[x\leftarrow \alpha] \vdash \exists x\phi\).
[\(\exists\)-elim\({}'\)] If \(\Gamma,\phi[x\leftarrow\alpha] \vdash \psi\) and simple term \(\alpha\) doesn't occur free in \(\Gamma\) or \(\exists x\phi\) or \(\psi\), then \(\Gamma,\exists x\phi \vdash \psi\).
Or we could take \(\exists\) as a primitive, governed by the second rule and an axiom version of the first rule (pp. 236, 254).
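To illustrate these rules, here is a sketch of the classic derivation \(\forall x(Fx\supset Gx), \exists xFx \vdash \exists xGx\). Let \(\Gamma\) be \(\{\forall x(Fx\supset Gx)\}\), and let \(a\) be a name occurring nowhere in \(\Gamma\). By [PC1] and [MP], \(\Gamma, Fa \vdash Ga\); by [\(\exists\)-intro\({}'\)] and Cut, \(\Gamma, Fa \vdash \exists xGx\). Since \(a\) occurs free in none of \(\Gamma\), \(\exists xFx\), or \(\exists xGx\), [\(\exists\)-elim\({}'\)] then gives \(\Gamma, \exists xFx \vdash \exists xGx\).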
When \(=\) is in the language, Bostock posits the two axiom schemas (p. 324):
\(\alpha=\alpha\)
\(\alpha=\beta \supset (\phi[x\leftarrow\alpha] \subset\supset \phi[x\leftarrow\beta])\)
More restricted versions of the second axiom would also be adequate. For example, one could replace the \(\subset\supset \) with \(\supset \). One could also restrict the candidate \(\phi\)s to atomic formulas. Instances of the more general axiom would in those cases always still be derivable.
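For example, here is how the symmetry of \(=\) falls out, even from the restricted versions. Take \(\phi\) to be the atomic formula \(x=\alpha\), so that \(\phi[x\leftarrow\alpha]\) is \(\alpha=\alpha\) and \(\phi[x\leftarrow\beta]\) is \(\beta=\alpha\). The second axiom (in its \(\supset \) form) then gives \(\alpha=\beta \supset (\alpha=\alpha \supset \beta=\alpha)\); since \(\alpha=\alpha\) is an axiom, our other rules yield \(\alpha=\beta \vdash \beta=\alpha\).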
We stipulated rules for \(=\) in our semantics. Some authors instead let the semantics treat \(=\) as an ordinary binary relation, and just assume that logical axioms like the preceding must be satisfied in every model. This guarantees that \(=\) will be an equivalence relation, and that on every model, objects so related will be congruent with respect to every predicate or functor in the language. But it doesn't guarantee that \(=\) expresses numerical identity. On that approach, for example, you cannot assume that \(\exists y\forall x. x=y\) is true only on models whose domain has one object. Sometimes authors who take this approach introduce the notion of a normal model for a language with equality. This just means a model where \(=\) is interpreted to mean numerical identity --- as our semantics requires it always to be interpreted.
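For a concrete example of a model that's not normal: let the domain be \(\{1, 2\}\), interpret every predicate and functor so that it doesn't distinguish 1 from 2 (each one-place predicate is true of both objects or of neither, and so on), and interpret \(=\) as the universal relation on the domain. The identity axioms above are all satisfied on this model, and yet \(\exists y\forall x. x=y\) comes out true even though the domain has two objects.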
Sider's sequent system includes the following rules (see Sider pp. 40--2, and Bostock p. 277):
Many of these rules are sometimes stated in simpler terms, which in the presence of our structural rules like Cut turn out to be equivalent:
For negation, Sider posits three rules. The first is [reductio] (no *), which we stated above but he formulates as:
From \(\Gamma,\phi\Rightarrow\psi\wedge\neg\psi\), derive \(\Gamma\Rightarrow\neg\phi\).
The other two rules are:
From \(\Gamma\Rightarrow\neg\neg\phi\), derive \(\Gamma\Rightarrow\phi\).
From \(\Gamma\Rightarrow\phi\), derive \(\Gamma\Rightarrow\neg\neg\phi\).
The third of these is redundant and could be derived from the other rules.
Instead of these rules, we might go with a single rule for negation, corresponding to [reductio*] (a rule version of Sider's axiom [PL3]). Bostock makes a different choice, and uses the rules [tertium non datur*] and [ex falso], which he formulates as:
From \(\Gamma,\phi\Rightarrow\psi\) and \(\Delta,\neg\phi\Rightarrow\psi\), derive \(\Gamma,\Delta\Rightarrow\psi\).

From \(\Gamma\Rightarrow\phi\) and \(\Delta\Rightarrow\neg\phi\), derive \(\Gamma,\Delta\Rightarrow\psi\).

(See Bostock p. 278.)
Sider doesn't extend his sequent proof system to predicate logic; but Bostock does. For the quantifiers, Bostock posits the rules (p. 278):

[\(\forall\)-intro] From \(\Gamma\Rightarrow\phi[x\leftarrow\alpha]\), where simple term \(\alpha\) doesn't occur free in \(\Gamma\) or \(\forall x\phi\), derive \(\Gamma\Rightarrow\forall x\phi\).

[\(\forall\)-elim] From \(\Gamma\Rightarrow\forall x\phi\), derive \(\Gamma\Rightarrow\phi[x\leftarrow\alpha]\).

[\(\exists\)-intro] From \(\Gamma\Rightarrow\phi[x\leftarrow\alpha]\), derive \(\Gamma\Rightarrow\exists x\phi\).

[\(\exists\)-elim] From \(\Gamma,\phi[x\leftarrow\alpha]\Rightarrow\psi\), where simple term \(\alpha\) doesn't occur free in \(\Gamma\) or \(\exists x\phi\) or \(\psi\), derive \(\Gamma,\exists x\phi\Rightarrow\psi\).
In these rules, a "simple" term is a name or variable. In [\(\forall\)-elim] and [\(\exists\)-intro], \(\alpha\) is not required to be simple: if we have functors in the language then \(\alpha\) may also be a complex term made up of a functor and other terms.
Bostock also mentions the variant formulations (p. 282 in exercise 7.2.1):
When \(=\) is in the language, Bostock posits the rules:
Derive \(\alpha=\beta,\phi[x\leftarrow\alpha]\Rightarrow\phi[x\leftarrow\beta]\).
Derive \(\Rightarrow\alpha=\alpha\).
Alternatively, the second rule could be replaced with:
Derive \(\Rightarrow\exists x(x=\alpha)\).
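Given the quantifier rules above, one direction of the equivalence here is immediate: applying [\(\exists\)-intro] to \(\Rightarrow\alpha=\alpha\), with \(\phi\) taken to be \(x=\alpha\) (so that \(\phi[x\leftarrow\alpha]\) is \(\alpha=\alpha\)), yields \(\Rightarrow\exists x(x=\alpha)\).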
Or all those rules could be replaced with:

Derive each of \(\phi[x\leftarrow\alpha]\Rightarrow\exists x(x=\alpha \wedge \phi)\) and \(\exists x(x=\alpha \wedge \phi)\Rightarrow\phi[x\leftarrow\alpha]\).

(See Bostock p. 324 and exercise 8.1.1 on p. 331.)