$
\def\P{\mathsf{\sf P}}
\def\E{\mathsf{\sf E}}
\def\Var{\mathsf{\sf Var}}
\def\Cov{\mathsf{\sf Cov}}
\def\std{\mathsf{\sf std}}
\def\Cor{\mathsf{\sf Cor}}
\def\R{\mathbb{R}}
\def\c{\,|\,}
\def\bb{\boldsymbol}
\def\diag{\mathsf{\sf diag}}
$
2.3. Expectation and Variance
Given a random variable, we often compute the expectation and variance, two important summary statistics. The expectation describes the average value and the variance describes the spread (amount of variability) around the expectation.
Definition 2.3.1.
The expectation of an RV $X$ is the real number
\begin{align*}
\E(X) = \begin{cases} \sum_{x} x \, p_X(x) & X \text{ is
a discrete RV}\\
\int_{-\infty}^{\infty} x \, f_X(x)\, dx & X \text{ is a continuous RV}
\end{cases},
\end{align*}
provided the sum or integral exists.
Example 2.3.1.
The expectation of an RV expressing the result of throwing a fair die
is
\[ \E(X)=1\cdot p_X(1)+\cdots+6\cdot p_X(6)=\frac{1}{6}(1+\cdots+6)=\frac{21}{6}=3.5.\]
Intuitively, the expectation gives the long-term average of many throws of the die: after $n$ throws,
\begin{align*}
\text{average}&=\frac{1\cdot(\text{number of times we get } 1)+\cdots+6\cdot(\text{number of times we get } 6)}{n}\\
&=1\cdot(\text{frequency of getting } 1)+\cdots+6\cdot(\text{frequency of getting } 6)\\
&\to \E(X),
\end{align*}
as $n\to\infty$, assuming that the frequency of getting $i$ in the long term converges to $\P(X=i)$. We elaborate on this topic in Section 8.6.
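This convergence of the empirical average to the expectation can be illustrated numerically. The following Python sketch (an illustration added here, assuming NumPy is available; the sample size is arbitrary) simulates fair die throws and compares the sample average to $\E(X)=3.5$.

import numpy as np

rng = np.random.default_rng(0)              # reproducible random generator
throws = rng.integers(1, 7, size=100_000)   # 100,000 fair die throws (values 1..6)
print(throws.mean())                        # sample average; close to E(X) = 3.5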
Proposition 2.3.1.
For a function $g:\R\to\R$, the expectation of the RV $g(X)$ satisfies
\begin{align}
\E(g(X))
&= \begin{cases} \sum_y y p_{g(X)}(y) & g(X) \text{ is discrete} \\ \int y f_{g(X)}(y)dy & g(X) \text{ is continuous} \end{cases} \\
&= \begin{cases} \sum_{x} g(x) p_X(x) & X \text{ is a discrete RV}\\ \int_{-\infty}^{\infty} g(x) f_X(x)dx & X \text{ is a continuous RV}\end{cases}. \label{eq:RV:expFun}
\end{align}
Proof.
The first equality follows from the definition of expectation, and the second equality follows since the value $g(x)$ occurs whenever $x$ occurs, which happens with probability $p_X(x)$ in the discrete case or with density $f_X(x)$ in the continuous case.
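As an illustrative check of Proposition 2.3.1 (not part of the original text), the sketch below computes $\E(g(X))$ for a fair die and the example function $g(x)=x \bmod 2$ in both ways: from the pmf of $X$ and from the pmf of $g(X)$. The two computations agree.

import numpy as np

x = np.arange(1, 7)                  # support of X (fair die)
p = np.full(6, 1/6)                  # pmf of X
g = lambda v: v % 2                  # example function: 1 if odd, 0 if even

# second form: sum_x g(x) p_X(x)
direct = np.sum(g(x) * p)

# first form: sum_y y p_{g(X)}(y), with p_{g(X)}(y) = sum over {x : g(x)=y} of p_X(x)
ys = np.unique(g(x))
p_gX = np.array([p[g(x) == y].sum() for y in ys])
via_pmf_of_gX = np.sum(ys * p_gX)

print(direct, via_pmf_of_gX)         # both equal 0.5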
Proposition 2.3.2.
For any constants $a,b\in\mathbb{R}$, the expectation of $Y=aX+b$ is \[ \E(aX+b)=a \E(X) + b.\]
Proof.
Using the previous proposition, we have
\begin{align*}
\E(aX+b)&=\sum_x (ax+b)p_X(x) = a \sum_x x\, p_X(x)+b \sum_x
p_X(x) \\ &= a\E(X)+b \cdot 1 = a\E(X)+b
\end{align*}
for a discrete RV. In the case of a continuous RV, we replace the summation with an integral.
In particular, the proposition above implies that
\begin{align*}\E(X+b)&=\E(X)+b, \text{ and}\\ \E(aX)&=a\E(X).\end{align*}
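A minimal numerical sanity check of these identities (an added illustration; the values of $a$ and $b$ are chosen arbitrarily):

import numpy as np

x = np.arange(1, 7); p = np.full(6, 1/6)   # fair die pmf
a, b = 2.0, -3.0                            # arbitrary constants

lhs = np.sum((a * x + b) * p)               # E(aX + b) computed from the pmf
rhs = a * np.sum(x * p) + b                 # a E(X) + b
print(lhs, rhs)                             # both equal 2 * 3.5 - 3 = 4.0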
The variance measures the amount of variability of the RV $X$ around $\E(X)$.
Definition 2.3.2.
The variance of an RV $X$ is the expectation of the RV
$Y=(X-\E(X))^2$: \[\Var(X)=\E((X-\E(X))^2).\] The standard deviation
of an RV $X$ is $\std(X)=\sqrt{\Var(X)}$.
Since variance measures squared deviation about the expectation, an RV with a narrow pmf or pdf should exhibit low variance and an RV with a wide pmf or pdf should exhibit large variance.
Example 2.3.2.
Consider the classical probability model on $(a,b)$ and its associated random variable $X$. We have $\P(X\in A)=|A|/(b-a)$ and
\begin{align*}
F_X(x)&=\begin{cases} (x-a)/(b-a) & x\in (a,b)\\
0 & x\leq a \\
1& x\geq b\end{cases}\\
f_X(x)&=\begin{cases} 1/(b-a) & x\in(a,b)\\ 0 & \text{otherwise} \end{cases}
\end{align*}
The expectation of $X$
\[ \E(X)=\int_a^b \frac{x}{(b-a)} \,dx = \frac{1}{b-a}\frac{1}{2}x^2\Big|^{x=b}_{x=a} = \frac{b^2-a^2}{2(b-a)}= \frac{(b-a)(b+a)}{2(b-a)}= \frac{a+b}{2}\]
is the midpoint of the interval $(a,b)$. The variance
\begin{align*}
\Var(X) &= \E((X-\E X)^2)=\int_a^b \frac{1}{(b-a)} (x-(a+b)/2)^2\,dx \\
&= \frac{1}{b-a}\int_a^b(x^2-x(a+b)+(a+b)^2/4)dx\\
&= \frac{1}{b-a}\left(\frac{b^3-a^3}{3} -
(a+b)\frac{b^2-a^2}{2}+\frac{(a+b)^2}{4}(b-a)\right)\\
&=\frac{b^3-a^3}{3(b-a)} - (a+b)\frac{b+a}{2}+\frac{(a+b)^2}{4}\\
&=\frac{(b-a)(a^2+ab+b^2)}{3(b-a)}-\frac{(a+b)^2}{4}
\\ &=\frac{a^2+b^2-2ab}{12}\\ &=\frac{(a-b)^2}{12}
\end{align*}
grows with the interval length $|b-a|$, confirming our intuition that narrow pdfs exhibit low variance and wide pdfs exhibit high variance.
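A short simulation (an added sketch assuming NumPy; the interval endpoints are arbitrary) agrees with the formulas $\E(X)=(a+b)/2$ and $\Var(X)=(b-a)^2/12$ derived above:

import numpy as np

a, b = 2.0, 5.0                              # arbitrary interval (a, b)
rng = np.random.default_rng(0)
x = rng.uniform(a, b, size=1_000_000)        # samples from the classical model on (a, b)

print(x.mean(), (a + b) / 2)                 # sample mean vs (a+b)/2 = 3.5
print(x.var(), (b - a) ** 2 / 12)            # sample variance vs (b-a)^2/12 = 0.75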
Proposition 2.3.3.
\begin{align} \nonumber
\Var(X)&=\E(X^2)-(\E(X))^2.
\end{align}
Proof.
Using the linearity of expectations (Proposition 2.3.2),
\begin{align*}
\Var(X)&=\E((X-\E(X))^2)=\E( X^2-2X\E (X)+(\E(X))^2)
\\ &= \E(X^2)-2\E(X)\E(X)+(\E(X))^2 \\ &=\E(X^2)-(\E(X))^2.
\end{align*}
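The identity can be verified numerically for the fair-die pmf used earlier (an illustration, not part of the original derivation):

import numpy as np

x = np.arange(1, 7); p = np.full(6, 1/6)     # fair die pmf
EX = np.sum(x * p)                           # E(X) = 3.5
var_def = np.sum((x - EX) ** 2 * p)          # E((X - E(X))^2)
var_short = np.sum(x ** 2 * p) - EX ** 2     # E(X^2) - (E(X))^2
print(var_def, var_short)                    # both equal 35/12, about 2.9167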
Proposition 2.3.4.
For any constants $a,b\in\mathbb{R}$,
\begin{align}\Var(aX+b)=a^2\Var(X).
\end{align}
Proof.
We decompose the proof into the following two steps:
\begin{align*}
\Var(aX)&=\E(a^2X^2)-(\E aX)^2= a^2 \E(X^2)-(a\E(X))^2 = a^2 (\E(X^2)-(\E X)^2) \\&= a^2 \Var(X) \\
\Var(X+b) &=\E((X+b-\E(X+b))^2)=\E((X+b-\E(X)-b)^2)=\Var(X).
\end{align*}
It then follows that $\Var(aX+b)=\Var(aX)=a^2\Var(X)$.
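A quick numerical check of Proposition 2.3.4 with the same fair-die pmf and arbitrary constants $a$, $b$ (an added illustration):

import numpy as np

x = np.arange(1, 7); p = np.full(6, 1/6)     # fair die pmf
a, b = 2.0, -3.0                             # arbitrary constants

EX = np.sum(x * p)
var_X = np.sum((x - EX) ** 2 * p)            # Var(X)

y = a * x + b                                # values of Y = aX + b (same probabilities)
EY = np.sum(y * p)
var_Y = np.sum((y - EY) ** 2 * p)            # Var(aX + b)

print(var_Y, a ** 2 * var_X)                 # both equal 4 * 35/12, about 11.667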
We can consider a constant $b\in\R$ to be a deterministic RV such that $b:\Omega\to\R$ and $b(\omega)=b$. The outcome of such a random variable is pre-determined, or "deterministic". The corresponding expectation and variance are
\begin{align*}
\E(b) &= \sum_x b p_X(x)=b\\
\Var(b) &= \E((b-\E(b))^2)=\E(0)=0.
\end{align*}