$
\def\P{\mathsf{\sf P}}
\def\E{\mathsf{\sf E}}
\def\Var{\mathsf{\sf Var}}
\def\Cov{\mathsf{\sf Cov}}
\def\std{\mathsf{\sf std}}
\def\Cor{\mathsf{\sf Cor}}
\def\R{\mathbb{R}}
\def\c{\,|\,}
\def\bb{\boldsymbol}
\def\diag{\mathsf{\sf diag}}
\def\defeq{\stackrel{\tiny\text{def}}{=}}
$
4.7. Conditional Expectations
Definition 4.7.1.
The conditional expectation of the RV $Y$ conditioned on $X=x$ is
\begin{align*}
\E(Y \c X=x)=\begin{cases} \int_{-\infty}^{\infty} y f_{Y \c X=x}(y)\, dy &
Y \c X=x \text{ is a continuous RV}\\
\sum_{y} y p_{Y \c X=x}(y)& Y \c X=x \text{ is a discrete RV}
\end{cases}
\end{align*}
Intuitively, $\E(Y \c X=x)$ represents the average or expected value of $Y$ if we know that $X=x$. Definition 4.7.1 extends naturally to conditioning on multiple random variables, for example $\E(X_i \c \{X_j=x_j:j\neq i\})$; simply replace the conditional pdf or pmf in Definition 4.7.1 with the appropriate conditional pdf or pmf given the conditioning variables. A concrete discrete illustration appears in the sketch below.
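To make the discrete case concrete, here is a minimal Python sketch that computes $\E(Y \c X=x)$ from a joint pmf table; the pmf values and the supports of $X$ and $Y$ are hypothetical, chosen only for illustration.

import numpy as np

# Hypothetical joint pmf of (X, Y) with X in {0, 1} and Y in {0, 1, 2};
# rows index x, columns index y. The values are invented for illustration.
p_XY = np.array([[0.10, 0.20, 0.10],
                 [0.15, 0.15, 0.30]])
y_vals = np.array([0, 1, 2])

def cond_expectation(x):
    p_X_x = p_XY[x].sum()              # marginal pmf value p_X(x)
    p_Y_given_x = p_XY[x] / p_X_x      # conditional pmf p_{Y|X=x}
    return float(np.sum(y_vals * p_Y_given_x))

print(cond_expectation(0))  # E(Y|X=0) = (0.20 + 2*0.10)/0.40 = 1.0
print(cond_expectation(1))  # E(Y|X=1) = (0.15 + 2*0.30)/0.60 = 1.25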
For a given $x$, the conditional expectation $\E(Y \c X=x)$ is a real number. We can also consider the conditional expectation $\E(Y \c X=x)$ as a function of $x$: $g(x)=\E(Y \c X=x)$. The mapping $x\mapsto g(x)=\E(Y \c X=x)$ corresponds to the random variable $\E(Y \c X)$.
Definition 4.7.2.
The conditional expectation $\E(Y \c X)$ is a random variable $\E(Y \c X):\Omega\to\R$
defined as follows:
\[\E(Y \c X)(\omega) = \E(Y \c X=X(\omega)).\]
In other words, every outcome $\omega\in\Omega$ yields a value
$X(\omega)\in\R$, which we may denote by $x$, and this in turn
determines $\E(Y \c X=x)$. Note that $\E(Y \c X)$ is a RV that is a
function of the random variable $X$; in other words, $\E(Y \c X)=g(X)$ for some function $g$.
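The following sketch illustrates this viewpoint under an assumed model, chosen purely for illustration: if $X\sim N(0,1)$ and $Y \c X=x \sim \text{Pois}(x^2+1)$, then $g(x)=\E(Y \c X=x)=x^2+1$, so $\E(Y \c X)=g(X)=X^2+1$ is itself a random variable.

import numpy as np

rng = np.random.default_rng(0)

# Assumed model (for illustration only): X ~ N(0,1), Y | X=x ~ Pois(x^2 + 1),
# so g(x) = E(Y|X=x) = x^2 + 1 and E(Y|X) = g(X) = X^2 + 1.
x = rng.standard_normal(5)   # X(omega) for five outcomes omega
g_of_x = x**2 + 1            # the corresponding values E(Y|X)(omega)
print(np.c_[x, g_of_x])      # each outcome determines X and hence E(Y|X)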
Since $\E(Y \c X)$ is a random variable, we can compute its expectation. The following proposition establishes an interesting relationship between the expectation of $\E(Y \c X)$ and the expectation of $Y$: for any $X$, we have $\E(\E(Y \c X))=\E(Y)$. This result is sometimes called the law of total expectation.
Proposition 4.7.1.
\[\E(\E(Y \c X))=\E(Y).\]
Proof.
\begin{align*}
\E(\E(Y \c X))
&= \int_{-\infty}^{\infty} \E(Y \c X=x)\, f_{X}(x)\, dx \\
&= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} y f_{Y \c X=x}(y)\, dy\right) f_X(x)\, dx\\
&= \int_{-\infty}^{\infty} y \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx\, dy \\
&= \int_{-\infty}^{\infty} y f_Y(y)\, dy \\
&= \E(Y),
\end{align*}
where the first equality holds since $\E(Y \c X)$ is a function of $X$ and $\E(g(X))=\int g(x)f_X(x)\,dx$, the third equality uses $f_{Y \c X=x}(y)f_X(x)=f_{X,Y}(x,y)$ together with a change in the order of integration, and the fourth equality follows from the definition of the marginal pdf $f_Y$. The proof in the discrete case is similar, with sums replacing integrals.
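The proposition can also be checked numerically. Under the same assumed model as in the sketch above ($X\sim N(0,1)$, $Y \c X=x \sim \text{Pois}(x^2+1)$), we have $\E(\E(Y \c X))=\E(X^2+1)=2$, so a Monte Carlo estimate of $\E(Y)$ should also be close to 2.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed model (for illustration only): X ~ N(0,1), Y | X=x ~ Pois(x^2 + 1).
x = rng.standard_normal(n)
y = rng.poisson(x**2 + 1)          # one draw of Y for each draw of X

print(y.mean())                    # Monte Carlo estimate of E(Y); close to 2
print((x**2 + 1).mean())           # Monte Carlo estimate of E(E(Y|X)); close to 2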
The proposition above is sometimes useful for simplifying the calculation of $\E(Y)$. An example appears below.
Example 4.7.1.
Recalling Example 4.5.3, where
\begin{align*} X &\sim U([0,1])\\ \{Y \c X=x\} &\sim U([x,1]) \quad x\in(0,1),\end{align*}
we have $\E(Y \c X=x)=(x+1)/2$ (as shown in Section 3.7, this is the midpoint of the interval $[x,1]$). It follows that $\E(Y \c X)=(X+1)/2$, and by the linearity of expectation,
\[\E(Y)=\E(\E(Y \c X))=(\E(X)+1)/2=\left(\frac{1}{2}+1\right)/2=3/4.\]
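A short Monte Carlo sketch confirms this calculation: both the direct estimate of $\E(Y)$ and the estimate obtained through Proposition 4.7.1 are close to $3/4$.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Example 4.7.1: X ~ U([0,1]) and Y | X=x ~ U([x,1]).
x = rng.uniform(0.0, 1.0, size=n)
y = rng.uniform(x, 1.0)            # elementwise uniform draws on [x, 1]

print(y.mean())                    # direct estimate of E(Y); close to 0.75
print(((x + 1) / 2).mean())        # estimate of E(E(Y|X)) = E((X+1)/2); close to 0.75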