The Analysis of Data, volume 1

Basic Definitions: The Probability Functions

1.1. Sample Space and Events

This chapter covers the most basic definitions of probability theory and explores some fundamental properties of the probability function.

Our starting point is the concept of an abstract random experiment. This is an experiment whose outcome is not necessarily determined before it is conducted. Examples include flipping a coin, the outcome of a soccer match, and the weather. The set of all possible outcomes associated with the random experiment is called the sample space. Events are subsets of the sample space, or in other words sets of possible outcomes. The probability function assigns real values to events in a way that is consistent with our intuitive understanding of probability. Formal definitions appear below.

Definition 1.1.1. A sample space $\Omega$ associated with a random experiment is the set of all possible outcomes of the experiment.

A sample space can be finite, for example \[\Omega=\{1,\ldots,10\}\] in the experiment of observing a number from 1 to 10. Or $\Omega$ can be countably infinite, for example \[\Omega=\{0,1,2,3,\ldots\}\] in the experiment of counting the number of phone calls made on a specific day. A sample space may also be uncountably infinite, for example \[\Omega=\{x\,:\, x \in\R, \,x\geq 0\}\] in the experiment of measuring the height of a passer-by.

The notation $\mathbb{N}$ corresponds to the natural numbers $\{1,2,3,\ldots\}$, and the notation $\mathbb{N}\cup\{0\}$ corresponds to the set $\{0,1,2,3,\ldots\}$. The notation $\R$ corresponds to the real numbers and the notation $\{x: x \in\R, x\geq 0\}$ corresponds to the non-negative real numbers. See Chapter A in the appendix for an overview of set theory, including the notions of a power set and of countably infinite and uncountably infinite sets.

In the last two examples above, the sample spaces contain unachievable values (the number of phone calls and the height of a person are both bounded). A more careful definition could have been used, taking into account bounds on the number of potential phone calls or potential height values. For the sake of simplicity, we often use simpler sample spaces containing some unachievable outcomes. This is not a significant problem, since we can later assign zero probability to such values.

Definition 1.1.2. An event $E$ is a subset of the sample space $\Omega$, or in other words a set of possible outcomes.

In particular, the empty set $\emptyset$ and the sample space $\Omega$ are events. Figure 1.2.1 shows an example of a sample space $\Omega$ and two events $A,B\subset\Omega$ that are neither $\emptyset$ nor $\Omega$. The R code below shows all possible events of an experiment with $\Omega=\{a,b,c\}$. There are $2^{|\Omega|}$ such sets, assuming $\Omega$ is finite (see Chapter A on set theory for more information on the power set).

library(sets)
Omega = set("a", "b", "c")
# display a set containing all possible events of
# an experiment with a sample space Omega
2^Omega
## {{}, {"a"}, {"b"}, {"c"}, {"a", "b"}, {"a",
##  "c"}, {"b", "c"}, {"a", "b", "c"}}

Example 1.1.1. In the random experiment of tossing a coin three times and observing the results (heads or tails), with ordering, the sample space is the set \[\Omega=\{\text{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}\}.\] The event \[ E=\{\text{HHH,HHT,HTT,HTH}\}\subset \Omega\] describes "a head was obtained in the first coin toss." In this case both the sample space $\Omega$ and the event $E$ are finite sets.
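
The sample space and event of Example 1.1.1 can also be enumerated programmatically. The following is a minimal sketch in base R (it does not use the sets package); the variable names `tosses`, `Omega`, and `E` are chosen here for illustration only.

```r
# Enumerate all ordered results of three coin tosses.
tosses <- expand.grid(first  = c("H", "T"),
                      second = c("H", "T"),
                      third  = c("H", "T"),
                      stringsAsFactors = FALSE)

# Collapse each row into a string such as "HHT"; these are the outcomes.
Omega <- paste0(tosses$first, tosses$second, tosses$third)

# The event "a head was obtained in the first coin toss".
E <- Omega[substr(Omega, 1, 1) == "H"]

length(Omega)      # number of outcomes in the sample space
sort(E)            # outcomes belonging to the event E
all(E %in% Omega)  # checks that E is a subset of Omega
```

Representing outcomes as strings keeps the correspondence with the notation HHH, HHT, and so on; a vector of such strings then plays the role of a finite set.
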
Example 1.1.2. Consider a random experiment of throwing a dart at a round board without missing the board. Assuming the radius of the board is 1, the sample space is the set of all two-dimensional vectors inside the unit disk \[\Omega=\left\{(x,y)\,:\, x,y\in\mathbb{R}, \,\sqrt{x^2+y^2} < 1\right\}.\] An event describing a bullseye hit may be \[E=\left\{(x,y)\, : \,x,y\in\mathbb{R}, \,\sqrt{x^2+y^2} < 0.1\right\}\subset\Omega.\] In this case both the sample space $\Omega$ and the event $E$ are uncountably infinite.
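
An uncountably infinite set cannot be listed element by element, but it can be represented by a membership test. The sketch below, in base R, encodes the sets $\Omega$ and $E$ of Example 1.1.2 as indicator functions; the function names `in_Omega` and `in_E` and the four landing points are hypothetical values chosen for illustration.

```r
# Membership tests for the board (Omega) and the bullseye event (E).
in_Omega <- function(x, y) sqrt(x^2 + y^2) < 1
in_E     <- function(x, y) sqrt(x^2 + y^2) < 0.1

# A few hypothetical landing points (illustrative values, not from the text).
x <- c(0.00, 0.05, 0.30, -0.70)
y <- c(0.00, 0.05, 0.40,  0.70)

in_Omega(x, y)  # which points are outcomes in the sample space
in_E(x, y)      # which points are bullseye hits

# Every listed point that lies in E also lies in Omega,
# consistent with E being a subset of Omega.
all(in_Omega(x, y)[in_E(x, y)])
```
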

For an event $E$, the outcome of the random experiment $\omega\in\Omega$ is either in $E$ $(\omega\in E)$ or not in $E$ $(\omega\not\in E)$. In the first case, we say that the event $E$ occurred, and in the second case we say that the event $E$ did not occur. $A\cup B$ is the event that $A$ or $B$ (or both) occurred, and $A\cap B$ is the event that both $A$ and $B$ occurred. The complement $A^c$ represents the event that $A$ did not occur (the universal set for the complement is taken to be $\Omega$, so $A^c=\Omega\setminus A$). If the events $A,B$ are disjoint $(A\cap B=\emptyset)$, the two events cannot happen at the same time, since no outcome of the random experiment belongs to both $A$ and $B$. If $A\subset B$, then $A$ occurring implies that $B$ occurs as well, since every outcome in $A$ also belongs to $B$.
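
The set operations on events can be tried out on the three-toss sample space of Example 1.1.1. This is a minimal sketch using the base R functions `union`, `intersect`, and `setdiff` on character vectors; the events `B`, `tail_first`, and `two_heads` and the observed outcome `omega` are hypothetical choices made for illustration.

```r
Omega <- c("HHH", "HHT", "HTH", "HTT", "THH", "THT", "TTH", "TTT")
A <- c("HHH", "HHT", "HTH", "HTT")  # a head was obtained in the first toss
B <- c("HHH", "TTT")                # all three tosses agree (hypothetical event)

union(A, B)        # A or B (or both) occurred
intersect(A, B)    # both A and B occurred
setdiff(Omega, A)  # complement of A, taken relative to Omega

# Disjoint events share no outcome, so they cannot occur together.
tail_first <- c("THH", "THT", "TTH", "TTT")
length(intersect(A, tail_first)) == 0

# two_heads is a subset of A: whenever two_heads occurs, A occurs as well.
two_heads <- c("HHH", "HHT")  # heads in the first two tosses
omega <- "HHT"                # a hypothetical observed outcome
omega %in% two_heads
omega %in% A
```
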