Differentiation: Taylor Expansion and Power Series

Probability

The Analysis of Data, volume 1

0
- Front Matter
- 0.1: Contents
- 0.2: Preface
1
2
- Random Variables
- 2.1: Basic Definitions
- 2.2: Functions of RVs
- 2.3: Expectation and Variance
- 2.4: Moments and MGF
- 2.5: RVs and Measure Theory
- 2.6: Notes
- 2.7: Exercises
3
4
5
- Important Vectors
- 5.1: Multinomial Vectors
- 5.2: Gaussian Vectors
- 5.3: Dirichlet Vectors
- 5.4: Mixture Vectors
- 5.5: Exponential Family
- 5.6: Notes
- 5.7: Exercises
6
- Random Processes
- 6.1: Basic Definitions
- 6.2: Marginals
- 6.3: Moments
- 6.4: Random Walk
- 6.5: Processes and Measure
- 6.6: Borell-Cantelli and Zero-One
- 6.7: Notes
- 6.8: Exercises
7
- Important RPs
- 7.1: Markov Chains
- 7.2: Poisson Process
- 7.3: Gaussian Process
- 7.4: Notes
- 7.5: Exercises
8
A
- Set Theory
- A.1: Basic Definition
- A.2: Functions
- A.3: Cardinality
- A.4: Limits of Sets
- A.5: Notes
- A.6: Exercises
B
- Metric Spaces
- B.1: Basic Definitions
- B.2: Limits
- B.3: Continuity
- B.4: Euclidean Space
- B.5: Growth of Functions
- B.6: Notes
- B.7: Exercises
C
- Linear Algebra
- C.1: Basic Definitions
- C.2: Rank
- C.3: Eigenvalues and Determinant
- C.4: Semidefinite Matrices
- C.5: SVD
- C.6: Notes
- C.7: Exercises
D
- Differentiation
- D.1: Scalar Differentiation
- D.2: Power and Taylor Series
- D.3: Notes
- D.4: Exercises
E
- Measure Theory
- E.1: Sigma Algebras
- E.2: Measure Function
- E.3: Extension Theorem
- E.4: Independence
- E.5: Important Measures
- E.6: Measurable Functions
- E.7: Notes
F

$ \def\P{\mathsf{\sf P}} \def\E{\mathsf{\sf E}} \def\Var{\mathsf{\sf Var}} \def\Cov{\mathsf{\sf Cov}} \def\std{\mathsf{\sf std}} \def\Cor{\mathsf{\sf Cor}} \def\R{\mathbb{R}} \def\c{\,|\,} \def\bb{\boldsymbol} \def\diag{\mathsf{\sf diag}} \def\defeq{\stackrel{\tiny\text{def}}{=}} $

D.2. Taylor Expansion and Power Series

Taylor's important theorem approximates a smooth function with a polynomial.

Proposition D.2.1 (Taylor Series Theorem). Let $f:[a,b] \to \R$ be a function whose higher order derivatives exist and define the $k$-order Taylor approximation at $x\in[a,b]$ around $\alpha\in[a,b]$ to be \[ P_k(x)=\sum_{l=0}^{k} \frac{f^{(l)}(\alpha)}{l!} (x-\alpha)^l.\] Then there exists a point $y$ between $x$ and $\alpha$ such that \begin{align} f(x)-P_k(x) = \frac{f^{(k+1)}(y)}{(k+1)!}(x-\alpha)^{k+1}. \tag{*} \end{align} If in addition $\sup_{k\in\mathbb{N}}|f^{(k)}(y)|$ is finite for all $y\in(a,b)$, then \[f(x)=\lim_{k\to\infty} P_k(x).\]

Proof. For an arbitrary $x,\alpha\in [a,b]$, we define \begin{align*} M &\defeq (f(x)-P_k(x))/(x-\alpha)^{k+1},\\ g(y)&\defeq f(y)-P_k(y)-M(y-\alpha)^{k+1}, \qquad y\in[a,b], \end{align*} and observe that \begin{align*} P^{(l)}(\alpha) &=f^{(l)}(\alpha)& l&=0,\ldots,k\\ g^{(l)}(\alpha)&=0& l&=0,\ldots,k. \end{align*} Since \[g^{(k+1)}(y)=f^{(k+1)}(y)-(k+1)!M,\] we will prove Equation (*) if we can show that \[g^{(k+1)}(y)=0, \qquad \text{ for some } y\in B_x(\alpha).\]

Since $g(\alpha)=0$ (since $g^{(0)}(\alpha)=0$ -- see above) and $g(x)=0$ (by definition of $M$), the mean value theorem implies that $g'(z_1)=0$ for some $z_1$ that lies between $\alpha$ and $x$. Similarly, since $g'(\alpha)=0=g'(z_1)$ we have $g^{(2)}(z_2)=0$ for some $z_2$ lying between $\alpha$ and $z_1$, and so on until $g^{(k+1)}(z_{k+1})=0$ for some $z_{k+1}$ lying between $\alpha$ and $x$. This concludes the proof of Equation (*).

If the higher order derivatives at $x$ are bounded as indicated in the second part of the proposition, then \[\lim_{k\to\infty} \Bigg | f^{(k+1)}(y) \frac{(x-\alpha)^{k+1}}{(k+1)!} \Bigg | \leq C \lim_{k\to\infty}\Bigg |\frac{(x-\alpha)^{k+1}}{(k+1)!} \Bigg | = 0,\] where the last equality follows from the fact that the factorial function grows faster to $\infty$ than the exponential function (see Section 1.6).

If the conditions in the proposition above hold, we can express the Taylor approximation using the big-oh and little-oh notations (see Section B.5) at the limit $x\to\alpha$: \begin{align*} f(x) &= \sum_{l=0}^{n} \frac{f^{(l)}(\alpha)}{l!} (x-\alpha)^l + O(|x-\alpha|^{n+1}) \end{align*} and \begin{align*} f(x) &= \sum_{l=0}^{n} \frac{f^{(l)}(\alpha)}{l!} (x-\alpha)^l + o(|x-\alpha|^{l}). \end{align*} The case $\alpha=0$ is sometimes of particular interest: we have \begin{align*} f(x) &= \sum_{l=0}^{n} \frac{f^{(l)}(0)}{l!} x^l + O(|x|^{l+1}) \end{align*} and \begin{align*} f(x)&= \sum_{l=0}^{n} \frac{f^{(l)}(0)}{l!} x^l + o(|x|^{n}). \end{align*}

The Taylor approximation $P_k$ approximates $f$ at the point $x$ using higher order derivatives of $f$, evaluated at $\alpha$. Thus, functions $f:\R\to\R$ whose higher order derivatives exist and are bounded are completely specified for all values by their derivatives at a single point (any point).

The following proposition is sometimes useful. In many cases, the third statement below is taken to be the definition of the exponential function. We prove it in order to demonstrates the Taylor series proposition above.

Proposition D.2.2. \begin{align*} \lim_{n\to\infty} \left(1+\frac{x}{n}\right)^n&= \exp(x)\\ \lim_{n\to\infty} \left(1+a_n\right)^n&=\exp\left(\lim_{n\to\infty} n a_n\right)\\ \lim_{n\to\infty} 1+\frac{x^1}{1!}+\cdots+\frac{x^n}{n!}&= \exp(x). \end{align*} (where we assume in the second statement that $y=\lim n a_n$ exists and is finite.)

Proof. \begin{align*} \log \lim_{n\to\infty} (1+x/n)^n &= \lim_{n\to\infty} \log (1+x/n)^n= \lim_{n\to\infty} n\log (1+x/n)\\ &= \lim_{h\to 0} x \frac{\log (1+h)}{h} = x \frac{d\log x}{dx}\Big|_{x=1}=x\cdot 1, \end{align*} where we used the fact that $\log$ is a continuous function and the substitution $h=x/n$. The proof of the second statement is similar: \begin{align*} \log \lim_{n\to\infty} (1+a_n)^n &= \lim_{n\to\infty} \log (1+a_n)^n = \lim_{n\to\infty} n\log (1+a_n)\\ &= \lim_{h\to 0} na_n \frac{\log (1+h)}{h} = \lim_{n\to\infty} na_n \cdot \lim_{h\to 0} \frac{\log (1+h)}{h}\\ &= \lim_{n\to\infty} na_n \frac{d\log x}{dx}\Big|_{x=1}=y\cdot 1, \end{align*} where we used the substitution $h=a_n$ and used the fact that $a_n\to 0$. The third statement follows from the Taylor series proposition above, using the fact that $\exp(x)^{(k)}=\exp(x)$, and that convergence holds since $\sup_{k\in\mathbb{N}} (\exp(x))^{(k)}$ is finite.

Proposition D.2.3. \[\sum_{k=0}^{n-1} ar^k = a \frac{1-r^n}{1-r},\qquad r\neq 1.\]

Proof. Denoting $R=\sum_{k=0}^{n-1} ar^k$, we have that \begin{align*} Rr &= ar+ar^2+\cdots+ar^n\\ R(1-r)& = R-Rr =a-ar^n\\ R &= \frac{a-ar^n}{1-r}. \end{align*}

Corollary D.2.1. \[\sum_{k=0}^{\infty} ar^k = \begin{cases} a(1-r)^{-1} & |r| < 1 \\ \text{ series does not converge } & \text{otherwise}\end{cases}\]

Proof. Take the limit $n\to\infty$ in the previous proposition and note that we need $|r|<1$ to ensure convergence.

Example D.2.1. Using the results above for $x\in B_1(0)$ we have \begin{align*} \frac{1}{1-x} &= 1+ x + x^2 + x^3+\cdots \end{align*} which is also the Taylor series expansion. Differentiating both sides of the equation above we also get for $x\in B_1(0)$, \begin{align*} \frac{1}{(1-x)^2} &= \sum_{n=0}^{\infty} (n+1)x^n = \sum_{n=0}^{\infty} nx^n + \frac{1}{1-x}, \end{align*} implying the following expression \begin{align} \sum_{n=0}^{\infty} nx^n &= \frac{1}{(1-x)^2} - \frac{1}{1-x}. \end{align} Differentiating the equation above again, we get the expression \begin{align} \sum_{n=0}^{\infty} n^2x^n &= \left(\frac{1}{(1-x)^2} - \frac{1}{1-x}\right)' x =\frac{2x}{(1-x)^3}-\frac{x}{(1-x)^2}. \end{align}

Proposition D.2.4. Let $a_n$ be a non-increasing sequence of positive numbers. Then the sequence $A=\sum_{n\in\mathbb{N}}a_n$ converges if and only if the sequence $B=\sum_{n=0}^{\infty} 2^n a_{2^n}$ converges.

Proof. Writing the sequence $A$ as \[A=(a_1)+ (a_2+a_3)+(a_4+a_5+a_6+a_7)+\cdots\] we note that each parenthetical term above is less than or equal the first summand in that term times the number of summands: $(a_1)\leq a_1$, $(a_2+a_3)\leq a_2+a_2$, $(a_4+a_5+a_6+a_7)\leq a_4+a_4+a_4+a_4$, and so on. This establishes the inequality $A\leq B$ and implies that $A$ converges if $B$ converges.

It remains to show that $B$ converges if $A$ converges. To do that we write \[B=(a_1+a_2)+ (a_2+a_4+a_4+a_4)+(a_4+a_8+\cdots)+\cdots\] and note that $(a_1+a_2)\leq (a_1+a_1)$, $(a_2+a_4+a_4+a_4)\leq (a_2+a_2+a_3+a_3)$, and so on. This implies the inequality \[ B\leq a_1+a_1+a_2+a_2 + a_3+a_3 + a_4+a_4+\cdots=2A,\] which implies that $B$ converges if $A$ converges.

Proposition D.2.5. The sequence $\sum_{n\in\mathbb{N}} n^{-\alpha}$ converges if and only if $\alpha>1$.

Proof. By Proposition D.2.4, the series $\sum_{n\in\mathbb{N}} n^{-\alpha}$ converges if and only if the series \[\sum_{n\in\mathbb{N}} 2^n (2^{-n\alpha})=\sum_{n\in\mathbb{N}} 2^{n-n\alpha}=B\] converges. This occurs by Corollary D.2.1 if and only if $\alpha>1$.

Section F.6 describes the multivariate generalization of differentiation and Taylor expansions.