$
\def\P{\mathsf{P}}
\def\R{\mathbb{R}}
\def\defeq{\stackrel{\tiny\text{def}}{=}}
\def\c{\,|\,}
$

If $\P(A)>0$ and $\P(B)>0$, we have \[\P(A\cap B)=\P(A \c B)\P(B)=\P(B \c A)\P(A).\]

Intuitively, $\P(A \c B)$ is the probability of $A$ occurring assuming that the event $B$ occurred. In accordance with that intuition, the conditional probability has the following properties.

- If $B\subset A$, then $A\cap B=B$ and $\P(A \c B)=\P(B)/\P(B)=1$.
- If $A\cap B=\emptyset$, then $\P(A \c B)=0/\P(B)=0$.
- If $A\subset B$, then $A\cap B=A$ and $\P(A \c B)=\P(A)/\P(B)$.
- The conditional probability may be viewed as a probability function \[\P_A(E) \defeq \P(E \c A)\] satisfying Definition 1.2.1 (Exercise 7). In addition, all the properties and intuitions that apply to probability functions apply to $\P_A$ as well.
- Assuming the event $A$ occurred, $\P_A$ generally provides more accurate forecasts than $\P$.
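The properties above can be checked numerically on a small discrete sample space. The following sketch uses a fair six-sided die; the specific events are assumptions chosen for illustration.

```python
from fractions import Fraction

# A fair six-sided die as a small discrete sample space.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

def prob(event):
    return sum(P[w] for w in event)

def cond(event, given):
    # P(E | A) = P(E ∩ A) / P(A), defined when P(A) > 0
    return prob(event & given) / prob(given)

A = {2, 4, 6}   # even outcome
B = {2}         # B ⊂ A
C = {1, 3, 5}   # disjoint from A

assert cond(A, B) == 1                   # B ⊂ A  ⟹  P(A|B) = 1
assert cond(A, C) == 0                   # A ∩ C = ∅  ⟹  P(A|C) = 0
assert cond(B, A) == prob(B) / prob(A)   # B ⊂ A  ⟹  P(B|A) = P(B)/P(A)
```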

As mentioned above, conditional probabilities are usually intuitive. The following example from (Feller, 1968), however, shows a counter-intuitive situation involving conditional probabilities. This demonstrates that intuition should not be a substitute for rigorous computation.

We consider families with two children, in which each of the four birth orders $MM$, $MF$, $FM$, $FF$ is equally likely ($M$ denotes a boy and $F$ a girl, listed in birth order). We define the event that both children in the family are boys as $A=\{MM\}$, the event that a family has a boy as $B=\{MF,FM,MM\}$, and the event that the first child is a boy as $C=\{MF,MM\}$.

Given that the first child is a boy, the probability that both children are boys is \[\P(A \c C)=\P(A\cap C)/\P(C)=\P(A)/\P(C)=(1/4)/(1/2)=1/2.\] This matches our intuition. Given that the family has a boy, the probability that both children are boys is the counterintuitive \[\P(A \c B)=\P(A\cap B)/\P(B)=\P(A)/\P(B)=(1/4)/(3/4)=1/3.\]
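Both conditional probabilities can be reproduced by enumerating the four equally likely families:

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families: MM, MF, FM, FF.
families = [''.join(p) for p in product('MF', repeat=2)]
p = Fraction(1, 4)   # probability of each elementary event

A = {'MM'}                 # both children are boys
B = {'MF', 'FM', 'MM'}     # the family has a boy
C = {'MF', 'MM'}           # the first child is a boy

def prob(event):
    return len(event) * p

print(prob(A & C) / prob(C))   # 1/2, conditioning on "first child is a boy"
print(prob(A & B) / prob(B))   # 1/3, conditioning on "the family has a boy"
```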

The following definition generalizes independence to an arbitrary collection of events, indexed by a (potentially infinite) set $\Theta$.

Note that pairwise independence is a strictly weaker condition than independence.
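A classic demonstration of this gap (often attributed to Bernstein; the choice of example is ours, not taken from the text) uses two fair coin tosses:

```python
from fractions import Fraction

# Two fair coin tosses: events that are pairwise independent
# but not independent as a collection.
omega = ['HH', 'HT', 'TH', 'TT']
P = {w: Fraction(1, 4) for w in omega}

def prob(event):
    return sum(P[w] for w in event)

A = {'HH', 'HT'}   # first toss is heads
B = {'HH', 'TH'}   # second toss is heads
C = {'HH', 'TT'}   # both tosses agree

# Every pair factorizes ...
assert prob(A & B) == prob(A) * prob(B)
assert prob(A & C) == prob(A) * prob(C)
assert prob(B & C) == prob(B) * prob(C)
# ... but the triple intersection does not: P(A∩B∩C) = 1/4 ≠ 1/8.
assert prob(A & B & C) != prob(A) * prob(B) * prob(C)
```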

In agreement with our intuition, conditioning on an event that is independent of $A$ does not modify the probability of $A$: \[\P(A \c B)=\frac{\P(A)\P(B)}{\P(B)}=\P(A).\] On the other hand, two disjoint events cannot occur simultaneously and should therefore be dependent. Indeed, in this case $\P(A \c B)=0 \neq \P(A)$ (assuming that $\P(A)$ and $\P(B)$ are non-zero).
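Both observations can be verified on a fair die (an illustrative assumption, not an example from the text):

```python
from fractions import Fraction

# A fair six-sided die: conditioning on an independent event leaves P(A)
# unchanged, while disjoint events of positive probability are dependent.
P = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event):
    return sum(P[w] for w in event)

A = {2, 4, 6}   # even outcome
B = {1, 2}      # outcome at most 2; independent of A
D = {1, 3, 5}   # odd outcome; disjoint from A

assert prob(A & B) == prob(A) * prob(B)       # A and B are independent,
assert prob(A & B) / prob(B) == prob(A)       # so P(A|B) = P(A)
assert prob(A & D) / prob(D) == 0             # disjoint: P(A|D) = 0,
assert prob(A) != 0                           # yet P(A) > 0, so A, D are dependent
```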

|             | City | Small Town | Total |
|-------------|------|------------|-------|
| Democrats   | 30   | 15         | 45    |
| Republicans | 20   | 35         | 55    |
| Total       | 50   | 50         | 100   |

We consider the experiment of drawing a person at random and observing the vote. The sample space contains 100 elementary events and we assume a classical model, implying that each person is selected with equal probability $1/100$.

Defining $A$ as the event that a person selected at random lives in the city, and $B$ as the event that a person selected at random is a Democrat, we have \begin{align*} \P(A\cap B) &=30/100\\ \P(A^c\cap B) &=15/100\\ \P(A\cap B^c) &=20/100\\ \P(A^c\cap B^c)&=35/100\\ \P(A) &=50/100\\ \P(B) &=45/100\\ \P(A \c B) &=0.3/0.45\\ \P(A \c B^c) &=0.2/0.55\\ \P(B \c A) &=0.3/0.5\\ \P(B \c A^c) &=0.15/0.5. \end{align*} Since $A,B$ are dependent, conditioning on city dwelling raises the probability that a randomly drawn person is a Democrat from $\P(B)=0.45$ to $\P(B \c A)=0.6$.
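The computations above follow directly from the table counts, as the following sketch shows (the dictionary keys are illustrative labels):

```python
from fractions import Fraction

# Cell counts from the voting table above.
counts = {('city', 'dem'): 30, ('town', 'dem'): 15,
          ('city', 'rep'): 20, ('town', 'rep'): 35}
total = sum(counts.values())   # 100 people, each drawn with probability 1/100

def prob(pred):
    return Fraction(sum(n for key, n in counts.items() if pred(key)), total)

P_A = prob(lambda k: k[0] == 'city')                      # lives in the city
P_B = prob(lambda k: k[1] == 'dem')                       # is a Democrat
P_B_given_A = prob(lambda k: k == ('city', 'dem')) / P_A  # P(B|A) = P(A∩B)/P(A)

print(P_B)          # 9/20 = 0.45
print(P_B_given_A)  # 3/5  = 0.6
```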

The figure below illustrates the above proposition and its proof.

Figure 1.5.1:
The partition $A_1,\ldots,A_4$ of $\Omega$ induces a partition $B\cap A_i$, $i=1,\ldots,4$ of $B$ (see Proposition 1.5.4).
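The partition identity illustrated in the figure, $\P(B)=\sum_i \P(B\cap A_i)=\sum_i \P(B \c A_i)\P(A_i)$, can be checked numerically. The sample space, partition, and event below are illustrative assumptions.

```python
from fractions import Fraction

# A uniform 12-point sample space partitioned into 4 blocks of 3.
omega = range(12)
P = {w: Fraction(1, 12) for w in omega}

def prob(event):
    return sum(P[w] for w in event)

A = [set(range(i, i + 3)) for i in range(0, 12, 3)]   # partition A_1,...,A_4
B = {1, 4, 5, 9, 11}

# The partition of Omega induces the partition B ∩ A_i of B:
assert prob(B) == sum(prob(B & Ai) for Ai in A)
# Equivalently, in total-probability form P(B) = sum_i P(B|A_i) P(A_i):
assert prob(B) == sum(prob(B & Ai) / prob(Ai) * prob(Ai) for Ai in A)
```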

The definition below extends the notion of independence to multiple experiments.

In the equation above, the probability function on the left-hand side is defined on $\Omega_1\times\cdots\times \Omega_n$, while the probability functions on the right-hand side are defined on $\Omega_i$, $i=1,\ldots,n$.
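For instance, with two independent experiments the product probability of a rectangle $E_1\times E_2$ factorizes into the marginal probabilities. The coin-and-die choice below is an assumption for the sketch.

```python
from fractions import Fraction
from itertools import product

# Omega_1: a fair coin; Omega_2: a fair die.
P1 = {w: Fraction(1, 2) for w in 'HT'}
P2 = {w: Fraction(1, 6) for w in range(1, 7)}

# Probability function on Omega_1 x Omega_2 built from the marginals.
P = {(a, b): P1[a] * P2[b] for a, b in product(P1, P2)}

E1, E2 = {'H'}, {2, 4, 6}
rect = {(a, b) for (a, b) in P if a in E1 and b in E2}

lhs = sum(P[w] for w in rect)                            # P(E1 x E2)
rhs = sum(P1[a] for a in E1) * sum(P2[b] for b in E2)    # P_1(E1) P_2(E2)
assert lhs == rhs == Fraction(1, 4)
```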

Chapter 4 contains an extended discussion of probabilities associated with multiple experiments.