Over the past two chapters, we’ve been building up a system of mathematical logic. We’ve looked at what propositions are, how to determine if they are true or false, how to determine if two propositions are equivalent, and how to use propositions in arguments to make valid deductions.
However, this is not how most of mathematics is communicated. Most of us understand math in terms of numbers and geometric shapes. We learn arithmetic, algebra, and trigonometry. We learn about lines and angles, polygons and circles, graphs and equations, and how they are all interconnected. What we’ve done in this book so far looks vastly different. What gives?
The ultimate reason we spent so much time getting a grip on mathematical logic is that mathematics is all about using known facts (propositions) to deduce new facts (valid arguments). For example, we’ve talked about the Pythagorean Theorem a lot in numerous introductions, and we’ve asked how we know the Pythagorean Theorem is true. It’s not as if the Pythagorean Theorem was inscribed on a stone tablet and sent down from the heavens for all to see. Someone had to figure it out (it may have been known before Pythagoras).
Moving forward, we’re going to be deducing new facts from old facts, all having some amount of mathematical interest. All that we do will be based on the mathematical logic we’ve learned thus far, but the structured use of logic will (usually) be implicit in what we do, not explicitly pointed out. By building up a system of logic, we’re giving ourselves a set of tools to fall back on if we’re ever unsure of the soundness or correctness of the logic we’re using.
Up to this point, a lot of examples we’ve talked about had seemingly nothing to do with mathematics. Most were concerned with people and the situations they found themselves in. From this point forward, all examples will be mathematical in nature.
An alternate title for this section could be “The Building Blocks of Mathematics”, or perhaps “The Foundations of Mathematics”. Axioms, definitions, theorems, and proofs are the main tools of mathematics, and are what we use, as well as what we produce.
Let’s begin!
In normal, everyday language, it is quite common to hear people speak with conditionals. The problem with everyday language is that it is not precise. Specifically, people tend to use implications when speaking, but sometimes what a person is implying is really a biconditional.
In Marina’s geometry class, her teacher made the following statement:
If a quadrilateral has two pairs of parallel sides, then it is a parallelogram.
After reading through the previous chapters of this book, Marina has translated the above statement into a more mathematical form.
𝒰: | All planar quadrilaterals |
t(x): | x has two pairs of parallel sides |
p(x): | x is a parallelogram |
∀x [t(x) → p(x)]
Later, however, as she’s studying with her friend Daisy, Daisy makes the following statement:
If a quadrilateral is a parallelogram, then it has two pairs of parallel sides.
Furthermore, Daisy has also been reading this book, and proposes the following mathematical statement:
∀x [p(x) → t(x)].
Marina notes that Daisy’s statement is not wrong: it is true that parallelograms have two pairs of parallel sides.
In fact, being a parallelogram goes hand-in-hand with having two pairs of parallel sides. Hence, even though both of the above statements make explicit use of implications, what’s being implied is a biconditional. As such, Marina comes up with the following statement:
A quadrilateral is a parallelogram if and only if it has two pairs of parallel sides.
∀x [p(x) ↔ t(x)].
Let’s re-examine Daisy’s statement: ∀x [p(x) → t(x)]. Daisy asserts that if a quadrilateral is a parallelogram, then it has two pairs of parallel sides. Would we ever be able to verify that p(q) → t(q) for some given quadrilateral q? In order to do so, we would have to know what a parallelogram is. The word “parallelogram” itself doesn’t tell us much because it is just a man-made word.
We could figure out what having two pairs of parallel sides means because we understand the ideas and concepts expressed by the statement “two pairs of parallel sides.” We can measure lines in a variety of ways, and there are numerous ways to determine whether two lines are parallel.
But what about determining if a quadrilateral is a parallelogram? The solution is to know what idea or concept the word “parallelogram” is being assigned to represent. We have to make this assignment because it is just a word. Of course, mathematicians have defined the word parallelogram to refer to a quadrilateral having two pairs of parallel sides.
This is why the statement
∀x [t(x) ↔ p(x)]
is most appropriate. The statements made by the teacher and by Daisy are essentially defining the word parallelogram.
Without knowing exactly what a parallelogram is, the statement ∀x [p(x) → t(x)] can’t be verified to be true or false, because we would never know if p(x) is 0 or 1 depending on what quadrilateral x represents.
When writing mathematics, it is imperative to be as precise as possible. As such, if what you want to state involves a biconditional, you should use the double-ended arrow ↔. It is almost never appropriate to use an implication arrow → when a biconditional is needed.
The exception to the above rule is definitions. The purpose of a definition is to assign a meaning, idea, or concept to a single word. Hence, it is usually OK to state definitions in terms of implications.
Suppose we want to define some new word, which we’ll refer to simply as word, and we want to assign a definition to word, which we’ll refer to simply as definition. The most accurate way to convey meaning is by saying
word ↔ definition
Remember that for any two propositions p and q, we have that
p ↔ q ⟺ (p → q) ∧ (q → p).
Hence, instead of saying word ↔ definition, we would be fine to instead say
word → definition
or
definition → word,
because even though an implication is used, what is really meant is the biconditional.
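The equivalence p ↔ q ⟺ (p → q) ∧ (q → p), which underlies this convention, can be confirmed by brute force over all four rows of its truth table. The following Python sketch does exactly that; the helper names `implies` and `iff` are our own, and 0 and 1 stand for false and true as elsewhere in this book.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    """Biconditional: p <-> q is true exactly when p and q have the same truth value."""
    return p == q


# Each row is (p, q, p <-> q, (p -> q) and (q -> p)), written with 0 and 1.
rows = []
for p, q in product([False, True], repeat=2):
    left = iff(p, q)
    right = implies(p, q) and implies(q, p)
    rows.append((int(p), int(q), int(left), int(right)))

for row in rows:
    print(row)

# The last two columns agree on every row, so the two propositions are equivalent.
equivalent = all(l == r for _, _, l, r in rows)
print("equivalent:", equivalent)
```

Because the last two columns match in every row, replacing word ↔ definition by the pair of implications changes nothing logically.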
Here, we present some mathematical words and definitions that you are likely already familiar with; they are included just for the sake of completeness.
The numbers
0, 1, 2, 3, 4, …, 100, 101, …, 10000, 10001, …
are collectively referred to as the whole numbers.
When we combine the whole numbers with their negative counterparts into a single collection, as in the list
0, 1, -1, 2, -2, 3, -3, …
the new collection of numbers is referred to as the integers.
When the universe of discourse we are using is the universe of all the integers, we commonly use the symbol ℤ instead of 𝒰.
An integer n is called even if (and only if) there exists some integer k such that
n = 2k.
Mathematically, we would write
∀n [ n is even ↔ ∃k [n = 2k] ]
An integer n is called odd if (and only if) there exists some integer k such that
n = 2k + 1.
Mathematically, we would write
∀n [ n is odd ↔ ∃k [n = 2k + 1] ]
Notice that in the above definitions, when writing out the quantified statements, we didn’t use single letters to represent the propositions. Instead, we just wrote out the propositions in English.
Also note that we wrote the “and only if” part in parentheses. This is because we want to stress that definitions use biconditionals. Again, it is usually fine to just say the “if” part in definitions, because what’s really meant is the biconditional. For the rest of this section, we will continue to write the “and only if” part in parentheses. In future sections, it may be omitted, fully written out, or included in parentheses.
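The two definitions can be turned into executable checks that mirror their wording. Python’s `n % 2` would be the idiomatic parity test, but the sketch below deliberately searches for a witness k instead, so that the “there exists an integer k” part of each definition is visible in the code. Because any witness satisfies |k| ≤ |n|, searching that finite range is enough to decide the existential statement. The function names are our own.

```python
def is_even(n: int) -> bool:
    """n is even iff there exists an integer k such that n = 2k."""
    # Any witness k = n / 2 satisfies |k| <= |n|, so this finite search decides "there exists".
    return any(n == 2 * k for k in range(-abs(n), abs(n) + 1))


def is_odd(n: int) -> bool:
    """n is odd iff there exists an integer k such that n = 2k + 1."""
    # For odd n, the witness k = (n - 1) / 2 also satisfies |k| <= |n|.
    return any(n == 2 * k + 1 for k in range(-abs(n), abs(n) + 1))


for n in [0, 9, -8, -13]:
    print(f"{n}: even={is_even(n)}, odd={is_odd(n)}")
```

This reports that 0 and -8 are even but not odd, while 9 and -13 are odd but not even, matching the hand calculations in this section.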
Since 0 is an integer, both definitions apply to it, so we can ask whether 0 is even or odd.
0 is even because there is an integer k such that
0 = 2k.
It just so happens that k is also equal to 0. Notice that in the definition of an even integer, we did not say that k had to be different from n. As long as some integer k (any integer at all) satisfies n = 2k, we say n is even.
However, there is no integer k such that
0 = 2k + 1
and so 0 is not an odd integer. The closest we can come is k = -1, which gives 2(-1) + 1 = -1, or k = 0, which gives 2(0) + 1 = 1.
9 is an odd integer because when we set k to be the integer 4, we get that
2(4) + 1 = 8 + 1 = 9.
However, there is no integer we can substitute for k such that
9 = 2k.
The closest integers for k in this case are 4 and 5, but when k is 4,
2(4) = 8 ≠ 9
and when k is 5, we get that
2(5) = 10 ≠ 9.
Continuing on the topic of an integer being even, or being odd, consider the integer 4.
Since there is an integer k such that 4 = 2k (that integer is k = 2) 4 is an even integer.
Now consider the integer 6. There is an integer we can substitute for k such that 6 = 2k (here, we can substitute in 3 for k.)
Notice that when we were deciding whether 4 is even, the integer we found to work for k is 2, which is itself even. However, when determining whether 6 is even, we found that k had to be 3, which is an odd integer.
Notice that nowhere in the definition of even did we say that k also had to be even, or that k had to be odd. The only condition we placed on k was that it had to be an integer, regardless of whether it is even or odd.
The same holds true for odd integers. If we’re trying to determine whether or not n is an odd integer, we don’t have to limit k to being an even integer or an odd integer.
For example, for 11, we see that there exists a value of k such that
11 = 2k + 1 (where k = 5).
For integer 13, we see that there is a value of k such that
13 = 2k + 1 (here, k = 6).
For the odd integer 11, our value for k is 5 (which is itself odd), but for the odd integer 13, the value used for k is 6 (an even integer).
Again, we want to stress that k does not have to be a different integer than n, nor does k have to be the same type of integer as n. All we need is for k to be an integer.
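To see that the witness k can have either parity, a short sketch can compute k for each example and report whether k itself is even or odd. Rather than searching, it computes k directly with Python’s integer division, relying on the identity n == 2 * (n // 2) + n % 2 with n % 2 equal to 0 or 1. The helper names `even_witness`, `odd_witness`, and `parity_word` are hypothetical.

```python
from typing import Optional


def even_witness(n: int) -> Optional[int]:
    """Return the integer k with n = 2k, or None if n is not even."""
    # Python's // and % satisfy n == 2 * (n // 2) + n % 2, with n % 2 in {0, 1}.
    return n // 2 if n % 2 == 0 else None


def odd_witness(n: int) -> Optional[int]:
    """Return the integer k with n = 2k + 1, or None if n is not odd."""
    return (n - 1) // 2 if n % 2 == 1 else None


def parity_word(k: int) -> str:
    return "even" if k % 2 == 0 else "odd"


for n in [4, 6]:
    k = even_witness(n)
    print(f"{n} = 2({k}), and k = {k} is {parity_word(k)}")
for n in [11, 13]:
    k = odd_witness(n)
    print(f"{n} = 2({k}) + 1, and k = {k} is {parity_word(k)}")
```

The output pairs 4 with an even witness and 6 with an odd one, and pairs 11 with an odd witness and 13 with an even one, exactly as in the examples above.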
There are numbers other than integers. Can we describe those as even or odd?
The whole numbers are already “contained” in the integers, so the definitions of even and odd apply to them directly.
For a negative integer such as -8, setting k to be the integer -4 tells us that -8 is an even integer:
-8 = 2k = 2(-4).
For integer -13, setting k = -7 tells us that -13 is odd:
-13 = 2k + 1 = 2(-7) + 1 = -14 + 1.
But what about a number like 3.2? Notice that our definition for even requires the number of interest to be an integer. Thus, since 3.2 is not an integer, the definition of even does not apply.
Furthermore, our definition of odd also requires n to be an integer, and since 3.2 is not an integer, we can’t describe 3.2 as odd.
Thus, 3.2 is a number that is neither even, nor odd.
The terms even and odd (as defined so far) only apply to integers.
It may seem a bit excessive to go over the details of how even and odd integers are defined in a college-level set of notes. The point of the examples is not to teach what even and odd integers are. The point of the above examples is to show how a mathematical definition can carry a lot of nuance, even for a definition as simple as even or odd. Additionally, by using commonly known terms, we get a chance to translate their definitions into a more mathematical form.
Notice that in our definition of even, there are three basic requirements that have to be satisfied in order for a number n to be classified as even:
n must be an integer
n = 2k for some number k
k must be an integer.
If any one of these three requirements fails, then we cannot describe n as even. In mathematics, we require utmost precision in order to know exactly what idea is being conveyed.
Three requirements must also be met for the number n to be called odd:
n must be an integer
n = 2k + 1 for some number k
k must be an integer.
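The three requirements can be checked one at a time for a proposed witness k. The sketch below uses Python’s `fractions.Fraction` so that numbers such as 3.2 and 4.5 are represented exactly; the function names are our own. It tests a single candidate k, so a failed check only rules out that particular k: n is even (or odd) exactly when some integer k makes all three requirements true.

```python
from fractions import Fraction


def is_integer(x: Fraction) -> bool:
    # A Fraction is stored in lowest terms, so it is an integer exactly when its denominator is 1.
    return x.denominator == 1


def even_requirements(n, k):
    """Check the three requirements for 'n is even', using k as the candidate witness."""
    n, k = Fraction(n), Fraction(k)
    return (
        is_integer(n),  # 1. n must be an integer
        n == 2 * k,     # 2. n = 2k
        is_integer(k),  # 3. k must be an integer
    )


def odd_requirements(n, k):
    """Check the three requirements for 'n is odd', using k as the candidate witness."""
    n, k = Fraction(n), Fraction(k)
    return (
        is_integer(n),   # 1. n must be an integer
        n == 2 * k + 1,  # 2. n = 2k + 1
        is_integer(k),   # 3. k must be an integer
    )


print(even_requirements(6, 3))                              # (True, True, True)
print(even_requirements(9, Fraction(9, 2)))                 # (True, True, False)
print(even_requirements(Fraction("3.2"), Fraction("1.6")))  # (False, True, False)
print(odd_requirements(9, 4))                               # (True, True, True)
```

The second line shows why requirement 3 matters: 9 = 2(4.5), but 4.5 is not an integer, so this does not make 9 even. The third line shows 3.2 failing requirement 1 outright, which is why 3.2 is neither even nor odd.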
We can write these definitions out using the mathematical notation we’ve developed so far, along with equivalent English sentences:
e(n): | n is even |
o(n): | n is odd |
a(n, k): | n = 2k |
b(n, k): | n = 2k + 1 |
∀n [e(n) ↔ ∃k a(n, k)]
For all integers n, n is even if and only if there exists some integer k such that
n = 2k.
∀n [o(n) ↔ ∃k b(n, k)]
For all integers n, n is odd if and only if there exists some integer k such that
n = 2k + 1.
Remember that the universe of discourse for symbols n and k in the two quantified statements above is ℤ, the universe of all integer numbers.
All definitions in mathematics work this way: they assert conditions that must be satisfied before the associated word can be applied.
Two integers are said to have the same parity if (and only if) they are both even, or if they are both odd.
∀n, m [n and m have the same parity ↔ m and n are both even or m and n are both odd]
Two integers are said to have different parity if (and only if) one of them is even, and the other one is odd.
Again, let’s take a careful look at the definition of parity. In order for the word parity to be applied, we first need two integers; let’s call them m and n for the time being. If m is not an integer, or n is not an integer, then we can’t use the word parity in reference to m and n.
If we wanted to use a more notational style in defining parity, we could do the following:
p(a, b): | a and b have the same parity |
e(a): | a is even |
o(a): | a is odd |
∀m, n [p(m, n) ↔ (e(m) ∧ e(n)) ∨ (o(m) ∧ o(n))]
However, if m is an integer but n is not, and some other number r is an integer, then we could use the word parity in reference to m and r, since both m and r are integers.
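The notational definition of same parity translates almost symbol for symbol into a Python predicate. In this sketch (all names hypothetical), `is_even` and `is_odd` use Python’s remainder operator, which for a divisor of 2 always returns 0 or 1, even for negative n. The type check stands in for the requirement that both numbers be integers: rather than answering True or False for a pair like 3.2 and 4, the functions refuse, mirroring the fact that the word parity does not apply.

```python
def is_even(n: int) -> bool:
    # For integers, n % 2 == 0 exactly when n = 2k with k = n // 2.
    return n % 2 == 0


def is_odd(n: int) -> bool:
    # Python's % returns 0 or 1 for divisor 2, even when n is negative.
    return n % 2 == 1


def require_integers(m, n) -> None:
    if not (isinstance(m, int) and isinstance(n, int)):
        raise TypeError("parity is only defined for pairs of integers")


def same_parity(m: int, n: int) -> bool:
    """p(m, n) <-> (e(m) and e(n)) or (o(m) and o(n))."""
    require_integers(m, n)
    return (is_even(m) and is_even(n)) or (is_odd(m) and is_odd(n))


def different_parity(m: int, n: int) -> bool:
    """One of m and n is even and the other is odd."""
    require_integers(m, n)
    return (is_even(m) and is_odd(n)) or (is_odd(m) and is_even(n))


print(same_parity(4, -8), same_parity(9, 13), same_parity(4, 9))  # True True False
print(different_parity(4, 9))                                     # True
```

Calling `same_parity(3.2, 4)` raises a `TypeError` instead of returning a truth value.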
Going forward, we will not be so pedantic in describing every aspect of a definition. It will be up to the reader to determine whether a definition can be applied. We will still show how to write the definition using mathematical notation.
The key is to carefully read every part of the definition, and understand what the requirements of satisfying the definition are. If even one condition of the definition fails to hold, then the definition does not apply.
Here is one more definition which will be used quite frequently.
An integer n is called a perfect square if (and only if) there exists some integer k such that
n = k².
∀n [ n is a perfect square ↔ ∃k [n = k²] ]
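As with even and odd, this definition can be checked by looking for a witness k. A sketch, assuming Python 3.8 or later for `math.isqrt`: since n = k² implies n = (-k)², it is enough to look for k ≥ 0, and `math.isqrt(n)` returns the only non-negative candidate. The function name is our own.

```python
import math


def is_perfect_square(n: int) -> bool:
    """n is a perfect square iff there exists an integer k with n = k**2."""
    if n < 0:
        # k**2 >= 0 for every integer k, so no witness exists.
        return False
    # If n = k**2, then also n = (-k)**2, so it suffices to look for k >= 0.
    # math.isqrt(n) is the largest integer k with k*k <= n, the only candidate.
    k = math.isqrt(n)
    return k * k == n


print([n for n in range(-3, 30) if is_perfect_square(n)])  # [0, 1, 4, 9, 16, 25]
```

Note that 0 counts as a perfect square, with witness k = 0, for the same reason 0 counts as even.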
In the previous two sections, we examined arguments that had universally quantified statements as premises; however, we were more concerned with what form the conclusions of those arguments took. We never discussed where such universally quantified premises come from. Appealing to definitions is one way to obtain them. We’ll see examples of this in the next section.
Whereas definitions are things that can be decreed, axioms represent something fundamentally different. In some sense, axioms are statements of mathematical interest that are intuitively correct, but require no proof of correctness.
An axiom is a statement of mathematical interest that is taken, or assumed, to be true without the need for proof. Axioms are used as premises in arguments, but never appear as the conclusion of an argument.
The word postulate is a synonym for axiom.
In arithmetic, we are familiar with the associative law:
a + (b + c) = (a + b) + c.
In essence, it does not matter whether we add b and c together first and then add in a, or start by adding a and b together and then add in c. For example,
1 + (2 + 3) = 1 + 5
            = 6
            = 3 + 3
            = (1 + 2) + 3
As such, we can simply write 1 + 2 + 3 without parentheses.
However, it should be noted that just one example does not prove anything. How do we know this always works? It’s impossible to simply check every combination of integers. Furthermore, this seems to work for non-integer numbers like 90.77, 3.14159265, and 2.718.
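A computer can check many more examples than we can by hand, but a finite check still proves nothing about all numbers. The sketch below tests every ordered triple drawn from a small sample, including the decimals mentioned above. It uses Python’s `fractions.Fraction` so that each decimal is represented exactly; the function name `associative_on` is our own.

```python
from fractions import Fraction
from itertools import product

# Sample values: a few integers plus the decimals mentioned in the text,
# parsed from strings so that Fraction stores them exactly.
samples = [Fraction(n) for n in (1, 2, 3, -7)] + [
    Fraction(s) for s in ("90.77", "3.14159265", "2.718")
]


def associative_on(values) -> bool:
    """Check a + (b + c) == (a + b) + c for every ordered triple drawn from values."""
    return all(a + (b + c) == (a + b) + c for a, b, c in product(values, repeat=3))


print(associative_on(samples))  # True: 343 triples checked, but still not a proof
```

We use `Fraction` rather than Python floats because floating-point addition rounds after every step and is not exactly associative: `(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)` evaluates to `False`. That is a property of the computer’s number format, not of the numbers the axiom is about.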
However, trying to come up with a counterexample seems fruitless as well. No matter how we group the numbers, this rule always seems to work. As such, we simply assert it as an axiom of basic arithmetic: when adding numbers, we can always change how they are grouped, and we will always get the same result.
Thus, associativity is an axiom of arithmetic.
Most of us have taken a geometry class where lots of definitions and theorems are given, mostly about triangle congruence, parallel and perpendicular lines, angle measure, area, and volume. However, how do we know that all of those theorems are true? The definitions can just be asserted because we are forcing ideas and concepts onto words.
Roughly 2300 years ago, a Greek mathematician and philosopher named Euclid wrote a book commonly known as “The Elements.” In it, he lays out a system of geometry based around five axioms, or as he called them, postulates.
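Those postulates are as follows:
1. A straight line segment can be drawn joining any two points.
2. Any straight line segment can be extended indefinitely in a straight line.
3. Given any straight line segment, a circle can be drawn having the segment as its radius and one endpoint as its center.
4. All right angles are equal to one another.
5. If a straight line falling on two straight lines makes the interior angles on the same side add up to less than two right angles, then the two straight lines, if extended indefinitely, meet on that side.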
From these five axioms, most of what we know about plane, or Euclidean, geometry can be deduced using the rules of logic discussed previously in this book. For example, we can prove that the angle measures in a triangle always add up to 180°, no matter what kind of triangle we’re working with.
Again, we are not trying to prove that the above five statements are correct; we’re asserting them to be true, and from those (assumed to be) true statements, we deduce more true statements.
Something worth noting is that when we assert some collection of axioms, we are essentially creating a branch of mathematics. Everything we do with those axioms develops that branch of mathematics into something more mature. If even one of those axioms is altered, then results deduced from the original axioms are no longer guaranteed to hold. Instead, you have an entirely new branch of mathematics, with new results and even more exciting discoveries to be made!
As stated above, if you alter even just one of the axioms that define a branch of mathematics, you get an entirely new branch of mathematics where the old results no longer necessarily apply.
For a long time, many mathematicians and philosophers actually tried to prove the fifth postulate using the first four postulates. It was thought that it didn’t need to be asserted as an axiom, but as time went on, it was eventually shown that the fifth postulate could not be deduced from the four previous postulates (using very sophisticated mathematical logic that goes well beyond the scope of this book).
As a result, variations on the fifth postulate were asserted by many people over a very long time. When the postulate is used as stated, the system of geometry that follows is called “Euclidean”, and it applies to geometry taking place on an infinite flat surface (commonly referred to as a plane).
There are types of geometry in which the fifth Euclidean postulate is eschewed; the two most commonly known are elliptic geometry and hyperbolic geometry. There is also spherical geometry, closely related to elliptic geometry, which is concerned with geometric figures that lie on the surface of a sphere.
Being able to work with such geometries requires sophisticated tools that will be explored in later books. In short, elliptic geometry essentially says that there are no parallel lines, whereas hyperbolic geometry says that, given a line and a point not on it, there are infinitely many distinct lines through that point that are parallel to the given line.
For more, see the Wikipedia article on non-Euclidean geometry.
Whereas axioms are statements of mathematical interest that are asserted and that define an entirely new branch of mathematics, theorems are always deducible from axioms and definitions.
A theorem is a proposition of mathematical interest that is derived, or deduced, from a set of axioms, definitions, or other theorems.
Axioms can only ever appear as premises in arguments; theorems can be premises or conclusions of arguments.
Typically what happens is that we start with a collection of axioms and definitions. From those initial axioms and definitions, we can deduce some initial collection of theorems. Next, we deduce a second round of theorems using the previous theorems, axioms and definitions. We can in theory repeat this process to yield ever more theorems.
During the course of all of this theorem proving, we may even come up with new definitions for a variety of reasons.
Notice from the above discussion that some theorems can be proven from other, previously deduced theorems in addition to the given axioms and definitions. In other cases, some theorems are simply special cases of other theorems. We have special names for these too.
A lemma is a type of theorem that is used to prove other theorems. In other words, a lemma is what we call a theorem that is used as a premise in an argument.
A corollary is a type of theorem that results from considering special cases of some given theorem.
Oftentimes, the word theorem is reserved for major results. We could be pedantic in describing various kinds of theorems as lemmas and corollaries, but we’ll mostly stick to the word theorem.
Most of the time, it’s not important to correctly classify certain theorems as lemmas or corollaries. The important thing about them is that they are all theorems that can be deduced from the axioms, definitions, and other theorems.
The final mathematical building block we’ll discuss is the proof.
A proof is a valid argument provided to show that an implication is a logical implication.
Up to this point in this chapter, we’ve discussed arguments at length, especially how to determine if a given argument is valid. We also touched lightly on the subject of using the rules of inference to chain together logical implications to yield new logical implications. A proof is nothing more than a valid argument.
Typically, we describe a proof as being given in reference to a theorem. By this, we mean that if someone proposes a statement of mathematical interest and a proof can be given for that statement, then that statement is henceforth called a theorem (because it can be deduced).
Remember that an argument is simply an implication, with a conjunction of multiple propositions as the hypothesis, and a single proposition as the conclusion. Any of the propositions in the hypothesis or conclusion can be primitive or compound.
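For arguments built from finitely many propositional variables, this description gives a mechanical test: the argument is valid exactly when the implication (premise₁ ∧ … ∧ premiseₙ) → conclusion is true in every row of its truth table. The sketch below (names are our own) encodes each premise and the conclusion as a function of a row of truth values. It does not handle quantified statements, and mathematical proofs are not usually checked this way, but it makes the definition of validity concrete.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def is_valid(premises, conclusion, num_vars: int) -> bool:
    """Valid iff (p1 and ... and pn) -> q is true in every row of the truth table.

    premises: functions mapping a tuple of truth values to bool
    conclusion: a function mapping a tuple of truth values to bool
    """
    for row in product([False, True], repeat=num_vars):
        hypothesis = all(premise(row) for premise in premises)
        if not implies(hypothesis, conclusion(row)):
            return False  # found a row with all premises true and the conclusion false
    return True


# Modus ponens: p, p -> q, therefore q.  (v[0] is p, v[1] is q.)
modus_ponens = is_valid(
    [lambda v: v[0], lambda v: implies(v[0], v[1])],
    lambda v: v[1],
    num_vars=2,
)

# Affirming the consequent: q, p -> q, therefore p (a fallacy).
affirming_consequent = is_valid(
    [lambda v: v[1], lambda v: implies(v[0], v[1])],
    lambda v: v[0],
    num_vars=2,
)

print(modus_ponens, affirming_consequent)  # True False
```

The fallacy fails on the row where p is false and q is true: both premises hold, but the conclusion does not.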
In the next section, we begin learning specific methods of proof and ways to devise proofs. This is the primary activity of mathematics.