Probability
The Analysis of Data, volume 1
Basic Definitions: Conditional Probability and Independence
$
\def\P{\mathsf{P}}
\def\R{\mathbb{R}}
\def\defeq{\stackrel{\tiny\text{def}}{=}}
\def\c{\,|\,}
$
1.5. Conditional Probability and Independence
Definition 1.5.1.
The conditional probability of an event $A$ given an event $B$ with $\P(B) > 0$ is
\[ \P(A \c B) = \frac{\P(A\cap B)}{\P(B)}.\]
If $\P(A)>0$ and $\P(B)>0$ we have \[\P(A\cap B)=\P(A \c B)\P(B)=\P(B \c A)\P(A).\]
Intuitively, $\P(A \c B)$ is the probability of $A$ occurring assuming that the event $B$ occurred. In accordance with that intuition, the conditional probability has the following properties.
- If $B\subset A$, then $\P(A \c B)=\P(B)/\P(B)=1$.
- If $A\cap B=\emptyset$, then $\P(A \c B)=0/\P(B)=0$.
- If $A\subset B$ then $\P(A \c B)=\P(A)/\P(B)$.
- The conditional probability may be viewed as a probability function
\[\P_A(E) \defeq \P(E \c A)\] satisfying Definition 1.2.1 (Exercise 7). In addition, all the properties and intuitions that apply to probability functions apply to $\P_A$ as well.
- If the event $A$ is known to have occurred, $\P_A$ generally provides better forecasts than $\P$.
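These properties can be checked mechanically on any finite sample space under a classical (equally likely) model. The following Python sketch is illustrative only; the helper names `prob` and `cond_prob` and the toy die-throwing space are ours and do not appear in the text.

```python
from fractions import Fraction

def prob(event, omega):
    """Classical model: every outcome in omega is equally likely."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(A, B, omega):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    assert len(B & omega) > 0, "conditioning event must have positive probability"
    return prob(A & B, omega) / prob(B, omega)

# Toy sample space: one throw of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}   # "the throw is at most 4"
B = {1, 2}         # B is a subset of A
D = {5, 6}         # D is disjoint from B

print(cond_prob(A, B, omega))      # 1, since B is a subset of A
print(cond_prob(D, B, omega))      # 0, since D and B are disjoint
print(cond_prob(B, A, omega))      # 1/2 = P(B)/P(A), since B is a subset of A
print(cond_prob(omega, B, omega))  # 1, so P_B is normalized like a probability function
```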
As mentioned above, conditional probabilities are usually intuitive. The following example from (Feller, 1968), however, shows a counter-intuitive situation involving conditional probabilities. This demonstrates that intuition should not be a substitute for rigorous computation.
Example 1.5.1.
Consider families with two children in which each child is equally likely to be a boy or a girl (probability 1/2). We select a family at random and consider the sample space describing the gender of the children $\Omega=\{MM,MF,FM,FF\}$. We assume a classical model, implying that the probabilities of all 4 elementary events are 1/4.
We define the event that both children in the family are boys as $A=\{MM\}$, the event that a family has a boy as $B=\{MF,FM,MM\}$, and the event that the first child is a boy as $C=\{MF,MM\}$.
Given that the first child is a boy, the probability that both children are boys is
\[\P(A \c C)=\P(A\cap C)/\P(C)=\P(A)/\P(C)=(1/4)/(1/2)=1/2.\]
This matches our intuition. Given that the family has a boy, the probability that
both children are boys is the counterintuitive
\[\P(A \c B)=\P(A\cap B)/\P(B)=\P(A)/\P(B)=(1/4)/(3/4)=1/3.\]
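Both conditional probabilities can be reproduced by enumerating the four equally likely outcomes directly; the short Python sketch below (ours) mirrors the events defined above.

```python
from fractions import Fraction

omega = {"MM", "MF", "FM", "FF"}  # gender of first and second child, equally likely

def prob(event):
    return Fraction(len(event), len(omega))

A = {"MM"}              # both children are boys
B = {"MM", "MF", "FM"}  # the family has at least one boy
C = {"MM", "MF"}        # the first child is a boy

print(prob(A & C) / prob(C))  # 1/2
print(prob(A & B) / prob(B))  # 1/3
```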
Definition 1.5.2.
Two events $A,B$ are independent if $\P(A\cap B)=\P(A)\P(B)$. A finite number of events $A_1,\ldots,A_n$ are independent if \[ \P(A_1\cap\cdots\cap A_n)=\P(A_1)\cdots \P(A_n)\] and are pairwise independent if every pair $A_i, A_j$ with $i\neq j$ is independent.
The following definition generalizes independence to an arbitrary collection of events, indexed by a (potentially infinite) set $\Theta$.
Definition 1.5.3.
Multiple events $A_{\theta}, \theta\in\Theta$ are pairwise independent if every pair of events is independent. Multiple events $A_{\theta}, \theta\in\Theta$ are independent if for every $k>0$ and every subset of $k$ distinct events $A_{\theta_1},\ldots,A_{\theta_k}$, we have
\[\P(A_{\theta_1}\cap\ldots\cap A_{\theta_k})=\P(A_{\theta_1})\cdots \P(A_{\theta_k}).\]
Note that pairwise independence is a strictly weaker condition than independence.
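To see that the condition is strictly weaker, consider a standard example (not from the text): toss two fair coins and let $A$ be the event that the first toss is heads, $B$ that the second toss is heads, and $C$ that the two tosses agree. Every pair of these events is independent, yet the three events together are not, as the following sketch verifies.

```python
from fractions import Fraction
from itertools import product

omega = set(product("HT", repeat=2))  # four equally likely outcomes of two coin tosses

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}   # first toss is heads
B = {w for w in omega if w[1] == "H"}   # second toss is heads
C = {w for w in omega if w[0] == w[1]}  # the two tosses agree

# Each pair is independent: both sides equal 1/4.
print(prob(A & B) == prob(A) * prob(B))  # True
print(prob(A & C) == prob(A) * prob(C))  # True
print(prob(B & C) == prob(B) * prob(C))  # True

# But the triple condition fails: 1/4 on the left, 1/8 on the right.
print(prob(A & B & C) == prob(A) * prob(B) * prob(C))  # False
```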
In agreement with our intuition, conditioning on an event that is independent of $A$ does not modify the probability of $A$:
\[\P(A \c B)=\frac{\P(A)\P(B)}{\P(B)}=\P(A).\]
On the other hand, two disjoint events cannot occur simultaneously and should therefore be dependent. Indeed, in this case $\P(A \c B)=0 \neq \P(A)$ (assuming that $\P(A)$ and $\P(B)$ are non-zero).
Example 1.5.2.
We consider a random experiment of throwing two dice independently and denote by $A$ the event that the first throw resulted in 1, by $B$ the event that the sum in both throws is 3, and by $C$ the event that the second throw was even. Assuming the classical model, the events $A,B$ are dependent
\[ \P(A\cap B)=\P(B \c A)\P(A)= (1/6)(1/6) \neq (1/6)(2/36) = \P(A)\P(B),\]
while $A$ and $C$ are independent
\[ \P(A\cap C)=\P(C \c A)\P(A)=(1/2)(1/6)=\P(A)\P(C).\]
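Both comparisons can be confirmed by enumerating the 36 equally likely outcomes; the sketch below (ours) uses exact fractions.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # 36 equally likely outcomes of two throws

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 1}         # first throw is 1
B = {w for w in omega if w[0] + w[1] == 3}  # sum of both throws is 3
C = {w for w in omega if w[1] % 2 == 0}     # second throw is even

print(prob(A & B), prob(A) * prob(B))  # 1/36 versus 1/108: A, B are dependent
print(prob(A & C), prob(A) * prob(C))  # 1/12 and 1/12: A, C are independent
```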
Proposition 1.5.1.
If $A,B$ are independent, then so are the events $A^c,B$, the events $A,B^c$, and the events $A^c,B^c$.
Proof.
For example,
\begin{align*}
\P(A^c\cap B) &= \P(B\setminus A)=\P(B)-\P(A\cap
B)=\P(B)-\P(A)\P(B)\\
&=(1-\P(A))\P(B)=\P(A^c)\P(B).
\end{align*}
The other parts of the proof are similar.
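As a concrete check of the proposition, the independent events $A$ and $C$ of Example 1.5.2 can be paired with their complements; the sketch below (ours) verifies all three claims by enumeration.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # two independent die throws

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 1}      # first throw is 1
C = {w for w in omega if w[1] % 2 == 0}  # second throw is even
Ac, Cc = omega - A, omega - C            # complements

print(prob(A & C) == prob(A) * prob(C))      # True: A and C are independent
print(prob(Ac & C) == prob(Ac) * prob(C))    # True: so are A^c and C
print(prob(A & Cc) == prob(A) * prob(Cc))    # True: A and C^c
print(prob(Ac & Cc) == prob(Ac) * prob(Cc))  # True: A^c and C^c
```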
Proposition 1.5.2 (Bayes Theorem).
If $\P(B)\neq 0$ and $\P(A)\neq 0$, then
\[ \P(A \c B)=\frac{\P(B \c A)\P(A)}{\P(B)}.\]
Proof.
\[ \P(A \c B)\P(B)=\P(A\cap B)=\P(B\cap A)=\P(B \c A)\P(A).\]
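As a quick numerical check (ours), applying the formula to the events $A$ and $B$ of Example 1.5.2 recovers the same value as the direct computation $\P(A\cap B)/\P(B)$.

```python
from fractions import Fraction

# Events from Example 1.5.2: A = first throw is 1, B = the sum of both throws is 3.
P_A = Fraction(1, 6)
P_B = Fraction(2, 36)
P_B_given_A = Fraction(1, 6)  # given a first throw of 1, the sum is 3 iff the second throw is 2

print(P_B_given_A * P_A / P_B)  # 1/2, via Bayes theorem
print(Fraction(1, 36) / P_B)    # 1/2, via the definition P(A and B) / P(B)
```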
Example 1.5.3.
We consider the following imaginary voting pattern of a group of 100 Americans, classified according to their party and whether they live in a city or a small town. The last row and last column capture the sum of the columns and the sum of the rows, respectively.
|             | City | Small Town | Total |
|-------------|------|------------|-------|
| Democrats   | 30   | 15         | 45    |
| Republicans | 20   | 35         | 55    |
| Total       | 50   | 50         | 100   |
We consider the experiment of drawing a person at random and observing the vote. The sample space contains 100 elementary events and we assume a classical model, implying that each person may be selected with equal $1/100$ probability.
Defining $A$ as the event that a person selected at random lives in the city, and $B$ as the event that a person selected at random is a Democrat, we have
\begin{align*}
\P(A\cap B) &=30/100\\
\P(A^c\cap B) &=15/100\\
\P(A\cap B^c) &=20/100\\
\P(A^c\cap B^c)&=35/100\\
\P(A) &=50/100\\
\P(B) &=45/100\\
\P(A \c B) &=0.3/0.45=2/3\\
\P(A \c B^c) &=0.2/0.55=4/11\\
\P(B \c A) &=0.3/0.5=0.6\\
\P(B \c A^c) &=0.15/0.5=0.3.
\end{align*}
Since $A,B$ are dependent, conditioning on city dwelling raises the probability that a randomly drawn person is a Democrat from $\P(B)=0.45$ to $\P(B \c A)=0.6$.
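The probabilities above follow directly from the counts in the table. The sketch below (variable names ours) recomputes them and also confirms Bayes theorem on these events.

```python
from fractions import Fraction

# Counts from the table in Example 1.5.3.
counts = {("city", "dem"): 30, ("town", "dem"): 15,
          ("city", "rep"): 20, ("town", "rep"): 35}
total = sum(counts.values())  # 100

def prob(pred):
    """Classical model: each of the 100 people is equally likely to be drawn."""
    return Fraction(sum(c for key, c in counts.items() if pred(key)), total)

P_A = prob(lambda k: k[0] == "city")         # lives in the city
P_B = prob(lambda k: k[1] == "dem")          # is a Democrat
P_AB = prob(lambda k: k == ("city", "dem"))  # both

P_B_given_A = P_AB / P_A
print(P_B, P_B_given_A)         # 9/20 = 0.45 and 3/5 = 0.6: conditioning on A raises P(B)
print(P_B_given_A * P_A / P_B)  # 2/3 = P(A | B) by Bayes theorem, matching 0.3/0.45
```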
Proposition 1.5.3 (General Multiplication Rule).
If $\P(A_1\cap\cdots\cap A_{n-1})>0$, then
\begin{align*}
\P(A_1\cap \cdots\cap A_n)=\P(A_1)\P(A_2 \c A_1)\P(A_3 \c A_2\cap A_1)\cdots
\P(A_n \c A_1\cap\cdots \cap A_{n-1}).
\end{align*}
Proof.
Using induction and $\P(A\cap B)=\P(A \c B)\P(B)$, we get
\begin{align*}
\P(A_1\cap \cdots\cap A_n)&=\P(A_n \c A_1\cap\cdots \cap A_{n-1})\P(A_1\cap\cdots \cap A_{n-1})\\
&=\cdots\\ &=\P(A_1)\P(A_2 \c A_1)\P(A_3 \c A_2\cap A_1)\cdots
\P(A_n \c A_1\cap\cdots \cap A_{n-1}).
\end{align*}
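As an illustration of the rule (a setting of our own, not from the text), the probability of drawing three aces in a row without replacement from a standard 52-card deck can be computed with the chain rule and confirmed by direct enumeration of all ordered triples.

```python
from fractions import Fraction
from itertools import permutations

# Chain rule: P(A1) P(A2 | A1) P(A3 | A1 and A2), where Ai = "ace on the i-th draw".
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct enumeration over all 52 * 51 * 50 ordered triples of distinct cards.
deck = ["ace"] * 4 + ["other"] * 48
favorable = sum(all(card == "ace" for card in triple) for triple in permutations(deck, 3))
print(chain)                              # 1/5525
print(Fraction(favorable, 52 * 51 * 50))  # 1/5525 as well
```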
Proposition 1.5.4 (The Law of Total Probability).
If $A_i, i\in S$, form a finite or countably infinite partition of $\Omega$ (see Section A.1 for a definition of a partition) with $\P(A_i)>0$ for all $i\in S$, then
\[ \P(B)=\sum_{i\in S} \P(A_i)\P(B \c A_i).\]
Proof.
The partition $A_i, i\in S$, of $\Omega$ induces a partition $B\cap A_i$, $i\in S$, of $B$. The result follows from countable additivity (third probability axiom) or finite additivity applied to that partition
\[\P(B)=\P\left( \bigcup_{i\in S} (B\cap A_i) \right)=
\sum_{i\in S} \P(A_i\cap B)=\sum_{i\in S} \P(A_i)\P(B \c A_i).\]
The figure below illustrates the above proposition and its proof.
Figure 1.5.1:
The partition $A_1,\ldots,A_4$ of $\Omega$ induces a partition $B\cap A_i$, $i=1,\ldots,4$ of $B$ (see Proposition 1.5.4).
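For instance, taking the partition $A, A^c$ (city versus small town) and the event $B$ (Democrat) of Example 1.5.3, the proposition gives
\[ \P(B)=\P(A)\P(B \c A)+\P(A^c)\P(B \c A^c)=0.5\cdot 0.6+0.5\cdot 0.3=0.45,\]
in agreement with the value computed directly from the table.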
The definition below extends the notion of independence to multiple experiments.
Definition 1.5.4.
Consider $n$ random experiments with sample spaces $\Omega_1,\ldots, \Omega_n$. The set $\Omega=\Omega_1\times\cdots\times \Omega_n$ (see Chapter A for a definition of the Cartesian product $\times$) is the sample space expressing all possible results of the experiments. The experiments are independent if for all sets $A_1\times\cdots\times A_n$ with $A_i\subset\Omega_i$,
\[ \P(A_1\times\cdots\times A_n) = \P(A_1)\cdots \P(A_n).\]
In the equation above, the probability function on the left hand side is defined on $\Omega_1\times\cdots\times \Omega_n$ and the probability functions on the right hand side are defined on $\Omega_i$, $i=1,\ldots,n$.
Example 1.5.4.
In two independent die-throwing experiments, $\Omega=\{1,\ldots,6\}\times\{1,\ldots,6\}$ and
\begin{align*}
\P(\text{first die is 3, second die is 4})&=\P(\text{first die is 3})\P(\text{second die is 4})\\ &=\frac{1}{6}\cdot\frac{1}{6}=\frac{1}{36}.\end{align*}
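The defining product condition can be checked explicitly for this example; the Python sketch below (ours) builds the product sample space with `itertools.product` and compares the two sides of the equation in Definition 1.5.4.

```python
from fractions import Fraction
from itertools import product

omega1 = set(range(1, 7))             # sample space of the first die
omega2 = set(range(1, 7))             # sample space of the second die
omega = set(product(omega1, omega2))  # product sample space, 36 equally likely outcomes

def prob(event, space):
    return Fraction(len(event), len(space))

A1, A2 = {3}, {4}             # first die is 3, second die is 4
A1xA2 = set(product(A1, A2))  # the corresponding event in the product space

print(prob(A1xA2, omega))                   # 1/36
print(prob(A1, omega1) * prob(A2, omega2))  # 1/36 as well
```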
Chapter 4 contains an extended discussion of probabilities associated with multiple experiments.