Basic Theorems Derived From the Axioms

The axioms of probability are surprisingly simple, yet a staggering number of results can be derived from just those three statements.

In this section, we’ll build up a large arsenal of theorems we can use throughout this book, all derived from just those three axioms.

Probabilities involving ∅ and S

Figure 1.4.1: Consider a Venn diagram with events A and B within sample space S. Here, α represents the proportion of the sample space taken up by event A, and β represents the proportion of the sample space taken up by event B. Since A and B are mutually exclusive, the proportion of the sample space taken up by A ∪ B is α + β. We can interpret p(A) and p(B) in a similar way. This gives us that

p(A ∪ B) = α + β = p(A) + p(B).

Theorem 1.4.1

The probability of the null event (the empty set) is 0. In other words,

p(∅) = 0.

Proof

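Build an infinite collection of events E_i as follows: let E₁ = S, and for any i ≥ 2, let E_i = ∅. First, observe the fact that

S ∩ ∅ = ∅ and ∅ ∩ ∅ = ∅.

Therefore, E_i and E_j are mutually exclusive events whenever the integers i and j are not equal. Furthermore, adjoining copies of ∅ to S adds no outcomes, so we have that

E₁ ∪ E₂ ∪ E₃ ∪ … = S ∪ ∅ ∪ ∅ ∪ … = S.

Now, we can invoke Axiom 3:

p(S) = p(E₁ ∪ E₂ ∪ E₃ ∪ …) = p(E₁) + p(E₂) + p(E₃) + … = p(S) + p(∅) + p(∅) + …

Now, we can invoke Axiom 2, which says that p(S) = 1. Thus, we have that

1 = 1 + p(∅) + p(∅) + …, and so p(∅) + p(∅) + … = 0.

By Axiom 1, p(∅) is non-negative, and infinitely many copies of a non-negative number can sum to 0 only if that number is 0. This gives the result that p(∅) = 0 as desired.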

Theorem 1.4.2

For any finite collection {E₁, E₂, E₃, E₄, …, E_n} of mutually exclusive events taken from sample space S, we have that

p(E₁ ∪ E₂ ∪ … ∪ E_n) = p(E₁) + p(E₂) + … + p(E_n).

Proof

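For all i > n, let E_i = ∅. The extended collection E₁, E₂, E₃, … is still mutually exclusive, because ∅ shares no outcomes with any event, and adjoining copies of ∅ does not change the union. Then, we get the following:

E₁ ∪ E₂ ∪ … ∪ E_n = E₁ ∪ E₂ ∪ … ∪ E_n ∪ E_{n+1} ∪ E_{n+2} ∪ …

At this point, we can invoke Axiom 3:

p(E₁ ∪ E₂ ∪ … ∪ E_n) = p(E₁) + p(E₂) + … + p(E_n) + p(E_{n+1}) + p(E_{n+2}) + …

Now we can simplify this sum, making use of Theorem 1.4.1 along the way:

p(E₁) + … + p(E_n) + p(∅) + p(∅) + … = p(E₁) + … + p(E_n) + 0 + 0 + … = p(E₁) + p(E₂) + … + p(E_n).

This gives us the desired result.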

Theorem 1.4.2 means that we can apply the additivity in Axiom 3 directly to a finite number of mutually exclusive events. We no longer have to pad the collection with infinitely many null events.
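As a quick numerical illustration of Theorem 1.4.2, here is a minimal Python sketch. The experiment (one roll of a fair die), the equally-likely-outcomes rule used for p, and the three events are all assumptions chosen for illustration; the theorem itself holds for any probability function.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
S = frozenset(range(1, 7))

def p(event):
    """Assumes equally likely outcomes: p(E) = |E| / |S|."""
    return Fraction(len(event), len(S))

# A finite collection of mutually exclusive events (no shared outcomes).
events = [frozenset({1}), frozenset({2, 3}), frozenset({6})]

union = frozenset().union(*events)
lhs = p(union)                    # p(E1 ∪ E2 ∪ E3)
rhs = sum(p(E) for E in events)   # p(E1) + p(E2) + p(E3)

print(lhs, rhs)  # 2/3 2/3
```

Exact fractions are used instead of floats so that the two sides can be compared with == without rounding concerns.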

Theorem 1.4.3

For any event E from sample space S, we have that

p(E^C) = 1 – p(E).

Proof

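Consider an experiment with sample space S. For any event E, the events E and E^C are mutually exclusive and together make up the whole sample space. In other words, we have that

E ∪ E^C = S and E ∩ E^C = ∅.

Invoking Theorem 1.4.2 yields the following:

p(S) = p(E ∪ E^C) = p(E) + p(E^C).

Now we can invoke Axiom 2:

1 = p(E) + p(E^C).

And so, we have that p(E^C) = 1 – p(E) as desired.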

Theorem 1.4.3, when combined with Axiom 1, gives us a lower bound and an upper bound on the probability of an event. Axiom 1 gives p(E) ≥ 0 directly. Applying Axiom 1 to E^C gives p(E^C) ≥ 0, so p(E) = 1 – p(E^C) ≤ 1. Thus, for any event E from sample space S, we have that

0 ≤ p(E) ≤ 1.

This inequality gives us a new, more precise definition for probability functions.

Probability Function

Consider an experiment with sample space S.

The probability function of the experiment, denoted p, is the function that assigns to each event from the sample space (each element of the sample space’s power set) a real number from [0, 1]. Symbolically, we write

p: P(S) → [0, 1].
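To make the definition concrete, the following Python sketch enumerates the power set of a small sample space and checks the bounds 0 ≤ p(E) ≤ 1 and Theorem 1.4.3 on every event. The experiment (two flips of a fair coin), the helper name power_set, and the equally-likely-outcomes rule for p are assumptions made for this illustration only.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical experiment: two flips of a fair coin.
S = frozenset({"HH", "HT", "TH", "TT"})

def power_set(sample_space):
    """Return every event (subset) of the sample space."""
    outcomes = sorted(sample_space)
    return [frozenset(c)
            for r in range(len(outcomes) + 1)
            for c in combinations(outcomes, r)]

def p(event):
    """Assumes equally likely outcomes: p(E) = |E| / |S|."""
    return Fraction(len(event), len(S))

events = power_set(S)
print(len(events))            # 16, since |P(S)| = 2^4
print(p(frozenset()), p(S))   # 0 1

for E in events:
    assert 0 <= p(E) <= 1          # every value lies in [0, 1]
    assert p(S - E) == 1 - p(E)    # Theorem 1.4.3 (S - E is the complement of E)
```

Here p is defined on frozensets, which play the role of elements of P(S).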

Events Contained within Other Events

Figure 1.4.2: Here, event A is contained entirely within event B. In this scenario, the region marked by β does not include any part of the region marked by α. Event B corresponds to the sum α + β, whereas event A corresponds only to α. Thus, we essentially have that

p(B – A) = (α + β) – α = p(B) – p(A).

Theorem 1.4.4

For any events A, B from sample space S with A ⊆ B, we have that

p(B – A) = p(B) – p(A).

Proof

The first thing we can do is figure out what the complement of B – A is within S. Since B – A = B ∩ A^C, De Morgan’s laws give us the following:

(B – A)^C = (B ∩ A^C)^C = B^C ∪ A.

Remember that since A ⊆ B, we have that B^C ∩ A = ∅, meaning B^C and A are mutually exclusive events within S. This means we can invoke Theorem 1.4.2. Since we calculated the desired set’s complement, we can also invoke Theorem 1.4.3. We do this as follows:

p(B – A) = 1 – p((B – A)^C)
= 1 – p(B^C ∪ A)
= 1 – [p(B^C) + p(A)]
= 1 – [1 – p(B) + p(A)]
= p(B) – p(A).

This gives us that p(B – A) = p(B) – p(A) as desired.

Theorem 1.4.5

For any events A, B from sample space S with A ⊆ B, we have that

p(A) ≤ p(B).

Proof

We can utilize Axiom 1 and Theorem 1.4.4 as follows:

0 ≤ p(B – A) = p(B) – p(A).

Adding p(A) to both sides gives p(A) ≤ p(B), which is the desired result.
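Theorems 1.4.4 and 1.4.5 can be checked on a small example. The sketch below assumes one roll of a fair die with equally likely outcomes, and the events A and B are hypothetical choices satisfying A ⊆ B.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
S = frozenset(range(1, 7))

def p(event):
    """Assumes equally likely outcomes: p(E) = |E| / |S|."""
    return Fraction(len(event), len(S))

A = frozenset({2})          # "roll a 2"
B = frozenset({2, 4, 6})    # "roll an even number"
assert A <= B               # A ⊆ B, the hypothesis of both theorems

difference = p(B - A)       # p(B – A), using Python's set difference
print(difference, p(B) - p(A))  # 1/3 1/3   (Theorem 1.4.4)
print(p(A) <= p(B))             # True      (Theorem 1.4.5)
```

If A were not a subset of B, the identity p(B – A) = p(B) – p(A) would not be guaranteed, which is why the sketch asserts the hypothesis first.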

Events that Occur Simultaneously

Figure 1.4.3: Notice that event A includes regions α and β, whereas event B includes regions β and γ. This gives us that

p(A) + p(B) = α + 2β + γ.

However, since region β should be counted only once, we clearly have that

p(A ∪ B) = α + β + γ.

Since

p(A ∩ B) = β,

we essentially have that

p(A ∪ B) = p(A) + p(B) – p(A ∩ B).

Theorem 1.4.6

For any two events A and B from sample space S, we have that

p(A ∪ B) = p(A) + p(B) – p(A ∩ B).

Proof

Notice that A and B are not assumed to be mutually exclusive here. Indeed, there could be some overlap, so Theorem 1.4.2 cannot be applied to A and B directly.

Instead, we can recognize that the events

A – (A ∩ B),
A ∩ B,
B – (A ∩ B)

are all mutually exclusive. Furthermore, we have that

A ∪ B = [A – (A ∩ B)] ∪ (A ∩ B) ∪ [B – (A ∩ B)].

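We invoke Theorem 1.4.2 to get the following:

p(A ∪ B) = p(A – (A ∩ B)) + p(A ∩ B) + p(B – (A ∩ B)).

Now, we can use the fact that

A ∩ B ⊆ A,
A ∩ B ⊆ B

along with Theorem 1.4.4 to get the following:

p(A ∪ B) = [p(A) – p(A ∩ B)] + p(A ∩ B) + [p(B) – p(A ∩ B)]
= p(A) + p(B) – p(A ∩ B).

This completes the proof.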
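The following sketch checks Theorem 1.4.6 on two overlapping events. The fair-die experiment, the equally-likely-outcomes rule for p, and the particular events A and B are assumptions chosen for illustration.

```python
from fractions import Fraction

# Hypothetical experiment: one roll of a fair six-sided die.
S = frozenset(range(1, 7))

def p(event):
    """Assumes equally likely outcomes: p(E) = |E| / |S|."""
    return Fraction(len(event), len(S))

A = frozenset({2, 4, 6})    # "roll an even number"
B = frozenset({4, 5, 6})    # "roll a number greater than 3"

direct = p(A | B)                       # p(A ∪ B) computed from the union itself
via_theorem = p(A) + p(B) - p(A & B)    # p(A) + p(B) – p(A ∩ B)

print(direct, via_theorem)  # 2/3 2/3
print(p(A) + p(B))          # 1, which overcounts the overlap {4, 6}
```

The last line shows why the correction term is needed: simply adding p(A) and p(B) counts the outcomes in A ∩ B twice.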

Theorem 1.4.7

For any three events A, B, and C from sample space S, we have that

p(A ∪ B ∪ C) = p(A) + p(B) + p(C)
– p(A ∩ B) – p(A ∩ C) – p(B ∩ C)
+ p(A ∩ B ∩ C).

Proof

We can basically just invoke Theorem 1.4.6 repeatedly. First, notice that

A ∪ B ∪ C = [A ∪ B] ∪ C,

which is just one way to associate the events together. The ensuing algebra gets a bit hairy, but is very doable with a little bit of patience and attention to detail. The algebra makes heavy use of the laws of set theory.
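One way to carry out the algebra is sketched here. It applies Theorem 1.4.6 three times and relies on two set identities: the distributive law [A ∪ B] ∩ C = [A ∩ C] ∪ [B ∩ C], and [A ∩ C] ∩ [B ∩ C] = A ∩ B ∩ C.

```latex
\begin{align*}
p(A \cup B \cup C)
  &= p([A \cup B] \cup C) \\
  &= p(A \cup B) + p(C) - p([A \cup B] \cap C)
     && \text{(Theorem 1.4.6)} \\
  &= p(A) + p(B) - p(A \cap B) + p(C) - p([A \cap C] \cup [B \cap C])
     && \text{(Theorem 1.4.6, distributive law)} \\
  &= p(A) + p(B) + p(C) - p(A \cap B) \\
  &\qquad - \bigl[\, p(A \cap C) + p(B \cap C) - p([A \cap C] \cap [B \cap C]) \,\bigr]
     && \text{(Theorem 1.4.6)} \\
  &= p(A) + p(B) + p(C) - p(A \cap B) - p(A \cap C) - p(B \cap C) \\
  &\qquad + p(A \cap B \cap C).
\end{align*}
```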

And finally, we have the desired result.

We could keep extending Theorem 1.4.7 to ever larger numbers of events. The key is to notice the pattern that emerges. For any number of events, we add the probabilities of the intersections that take an odd number of events at a time, and we subtract the probabilities of the intersections that take an even number of events at a time.

Of course, if you want to be pedantic and prove the formula for some number of events, you could keep using these theorems back to back (and recursively build up more “theorems”), though that is usually unnecessary. Combinatorial arguments can be made to prove the most general form of Theorems 1.4.6 and 1.4.7, namely the Inclusion-Exclusion Principle.
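The add-for-odd, subtract-for-even pattern can be written as a short Python sketch that works for any finite list of events. The function name p_union, the fair 12-sided die, and the four events are hypothetical choices for illustration. The sketch checks the formula numerically on one example and does not prove it.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical experiment: one roll of a fair 12-sided die.
S = frozenset(range(1, 13))

def p(event):
    """Assumes equally likely outcomes: p(E) = |E| / |S|."""
    return Fraction(len(event), len(S))

def p_union(events):
    """Probability of the union of the events via inclusion-exclusion."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = 1 if k % 2 == 1 else -1    # odd-sized groups add, even-sized subtract
        for group in combinations(events, k):
            total += sign * p(frozenset.intersection(*group))
    return total

A = frozenset(x for x in S if x % 2 == 0)   # multiples of 2
B = frozenset(x for x in S if x % 3 == 0)   # multiples of 3
C = frozenset(x for x in S if x > 8)        # 9, 10, 11, 12
D = frozenset({1, 2, 3})
events = [A, B, C, D]

direct = p(frozenset().union(*events))      # computed from the union itself
print(direct == p_union(events))            # True
```

The number of intersections grows as 2^n – 1 for n events, so this direct evaluation is only practical for small collections.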