27 \(\bigstar\) Exponentials, PDEs and ODEs
Highlights of this Chapter: to close out our work on derivatives, we take a look at where power series come up in analysis beyond this first course.
- We study the complex exponential, in preparation to prove \(e^{i\pi}=-1\) in the final project.
- We study the matrix exponential, and see its utility in solving systems of linear differential equations.
- We look to extend exponentiation to even more abstract settings, and consider the meaning of \(e^{\frac{d}{dx}}\).
- We use this idea of \(e\) to the power of a differential operator as a window into functional analysis.
27.1 Generalizing the Exponential
Power series are wonderful functions for many reasons, but one of the most powerful is that they are so simple. Like polynomials, to make sense of a power series all you need is addition/subtraction and multiplication, now with one more ingredient: convergence. This makes power series a very natural jumping-off point for generalizing familiar objects to unfamiliar places. As a first step, we will look at complex numbers:
Definition 27.1 A complex number is a pair \((a,b)\) of real numbers, which we will write as \(a+bi\), where \(i\) is a number such that \(i^2=-1\). Just as the real numbers form a line, pairs of real numbers form a plane.
- Addition of complex numbers is defined component-wise: \[(a+bi)+(c+di)=(a+c)+(b+d)i\]
- Multiplication of a complex number by a real number can also be computed component-wise: \[a(b+ci)=(ab)+(ac)i\]
- Multiplication of two complex numbers is defined by the field axioms together with the definition \(i^2=-1\):
\[\begin{align*} (a+bi)(c+di)&=ac+adi+bci+bdi^2\\ &=(ac-bd)+(ad+bc)i \end{align*}\]
Limits of sequences of complex numbers are defined using the fact that they are built from pairs of real numbers. A sequence \(z_n=x_n+iy_n\) of complex numbers is said to converge if and only if both of the real sequences \(x_n\) and \(y_n\) converge, and in this case
\[\lim(x_n+iy_n):=(\lim x_n)+i(\lim y_n)\]
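For readers who like to experiment alongside the text, here is a minimal sketch in Python (whose built-in complex type writes \(i\) as `1j`) checking the multiplication rule above on one concrete pair of numbers:

```python
# A quick check of the rule (a+bi)(c+di) = (ac-bd) + (ad+bc)i,
# using Python's built-in complex numbers.
a, b, c, d = 2.0, 3.0, -1.0, 5.0

product = (a + b*1j) * (c + d*1j)           # built-in complex multiplication
by_formula = (a*c - b*d) + (a*d + b*c)*1j   # the formula derived above

print(product, by_formula)                  # both print (-17+7j)
assert product == by_formula
```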
We now have a fully rigorous theory of what it means to exponentiate a real number: but what does it mean to raise something to the \(i\) power? Or the \(3-7i\) power? Because we know the power series for the real exponential, we can attempt to define a complex exponential just by using the power series directly.
Definition 27.2 The complex exponential is defined for any \(z\in\CC\) by the series
\[\exp(z)=\sum_{k\geq 0}\frac{z^k}{k!}\] \[=1+z+\frac{1}{2}z^2+\frac{1}{3!}z^3+\cdots\]
Of course, after making such a bold definition we should ask ourselves, does this make sense? That is, does the series of complex numbers converge? This may sound daunting at first, but in fact the theory of complex power series inherits much from the real theory: complex numbers are built from pairs of real numbers after all!
Theorem 27.1 Let \(\sum_n z_n\) be a series of complex numbers, \(z_n=x_n+iy_n\), and let \(|z_n|=\sqrt{x_n^2+y_n^2}\) be their magnitudes. Then \(\sum_n z_n\) converges if \(\sum_n |z_n|\) does.
Proof. Let \(z=x+iy\) be an arbitrary point in \(\CC\), and define \(|z|=\sqrt{x^2+y^2}\). Note that since \(x^2,y^2\geq 0\) we know \[|z|=\sqrt{x^2+y^2}\geq |x|\hspace{1cm}|z|=\sqrt{x^2+y^2}\geq |y|\] Thus, looking at the first inequality, we see that \[\sum_{n\geq 0}|z_n|\geq \sum_{n\geq 0}|x_n|\] We’ve assumed that \(\sum|z_n|\) is convergent, and so by comparison this implies that \(\sum|x_n|\) is convergent. But this means \(\sum x_n\) is absolutely convergent, and hence convergent. Thus, \(\sum_n x_n = \chi\) for some \(\chi\in\RR\).
A similar argument applies to the sequence of \(y\)’s: since \(\sum|z_n|\geq \sum|y_n|\) comparison shows that \(\sum y_n\) is absolutely convergent and thus convergent, so \(\sum y_n=\eta\) for some \(\eta \in\RR\).
Now, using the definition of convergence for complex numbers: since both real series converge, the complex series does as well, and we can write \[\sum_{n\geq 0 }(x_n+iy_n)=\sum_{n\geq 0}x_n+i\sum_{n\geq 0}y_n=\chi+i\eta\]
Thus, \(\sum_n z_n\) converges, as claimed.
Corollary 27.1 Let \(p(z)=\sum a_nz^n\) be a power series evaluated at a complex number \(z=x+iy\). Then \(p(z)\) converges if the corresponding real power series \(p(|z|)\) converges, where \(|z|=\sqrt{x^2+y^2}\) is the complex magnitude.
Proof. If \(p(|z|)\) converges, then we know \(\sum_n a_n|z|^n\) converges. But using properties of the absolute value (complex magnitude) we see \[p(|z|)=\sum_{n\geq 0}a_n|z|^n=\sum_{n\geq 0}a_n|z^n|=\sum_{n\geq 0}|a_nz^n|\]
Thus, we have assumed that the series of magnitudes \(\sum|a_nz^n|\) converges, and so by Theorem 27.1 the series itself \[\sum_{n\geq 0} a_n z^n\] converges, as claimed.
The complex magnitude \(|z|\) defines a kind of absolute value on the complex numbers, and so we can extend our notion of absolute convergence, saying a series \(\sum a_n z^n\) converges absolutely if the series \(\sum |a_n z^n|\) converges. The theorem above can be translated into this new language, to reveal a familiar theme:
Corollary 27.2 A complex power series is convergent if it is absolutely convergent.
Applying this to the complex exponential, we can confirm this series makes sense for all complex number inputs:
Corollary 27.3 For any \(z\in\CC\) the power series \(\exp(z)=\sum_n \frac{z^n}{n!}\) converges.
Proof. Let \(r=|z|\). Since \(\exp(x)\) converges for all real inputs, we know \(\exp(r)=\sum \frac{r^n}{n!}=\sum \frac{|z|^n}{n!}\) converges, which means \(\sum\frac{z^n}{n!}\) converges absolutely. But absolute convergence implies convergence, so \[\exp(z)=\sum_{n\geq 0}\frac{z^n}{n!}\] is convergent.
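As a numerical sketch of this convergence (in Python, using only the standard-library `cmath` module), the partial sums of the series at a particular complex input rapidly settle down to the library's value of \(e^z\):

```python
# Partial sums of exp(z) = sum z^k / k! at z = 1 + 2i, compared with cmath.exp.
import cmath

z = 1 + 2j
partial, term = 0, 1            # term holds z^k / k!, starting at k = 0
for k in range(1, 30):
    partial += term
    term *= z / k               # build z^k / k! one factor at a time

print(partial)                  # approximately -1.1312 + 2.4717i
print(cmath.exp(z))             # the same value, to many decimal places
```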
But we can go much further than this. There are many objects we know how to add/subtract and multiply in mathematics - structures with these operations are called rings. So, in any ring where one can make sense of limits, we can attempt to define an exponential function by this power series! A natural example is the ring of \(n\times n\) matrices.
Definition 27.3 Denote by \(M_n(\RR)\) the set of all \(n\times n\) matrices with real number entries. We write the \(ij^{th}\) entry of such a matrix \(A\in M_n(\RR)\) as \(A_{ij}\). This space has the following operations that are important for us:
- Addition: defined entry-wise \((A+B)_{ij}=A_{ij}+B_{ij}\)
- Scalar Multiplication: defined entry-wise \((cA)_{ij}=c A_{ij}\)
- Multiplication: defined by usual matrix multiplication \((AB)_{ij}=A_{i\star}\cdot B_{\star j}\)
We define limits in \(M_n(\RR)\) using our definition of limits for the real numbers. If \(M_{k}\) is a sequence of matrices, then looking at its entries we can think of \(M_k\) as an \(n\times n\) array of real-number sequences. We say that \(M_k\) converges if and only if every sequence of entries converges, and in this case define
\[\lim \begin{pmatrix}(a_{11})_k & (a_{12})_k & \cdots\\ (a_{21})_k& (a_{22})_k & \cdots\\ \vdots & \vdots & \ddots \end{pmatrix}:=\begin{pmatrix} \lim(a_{11})_k & \lim(a_{12})_k & \cdots\\ \lim(a_{21})_k& \lim(a_{22})_k & \cdots\\ \vdots & \vdots & \ddots \end{pmatrix} \]
Definition 27.4 The matrix exponential is defined for any \(n\times n\) matrix \(A\) as \[e^A=\exp(A)=\sum_{k\geq 0}\frac{1}{k!}A^k\] \[=I+A+\frac{1}{2}A^2+\frac{1}{3!}A^3+\cdots\]
Again, we are faced with the problem of convergence: for which matrices \(A\) does this power series make sense? A matrix is just an \(n\times n\) array of real numbers, so working entry-by-entry, this is an \(n\times n\) array of series whose convergence we need to ensure. It turns out that, just as for the complex numbers, the natural thing to do is consider some sort of absolute value on the space of matrices, and try to prove an analog of the statement that absolute convergence implies convergence.
Definition 27.5 Let \(A=[a_{ij}]\) be an \(n\times n\) matrix. Then we define the matrix norm of \(A\) as \[|A|=\sqrt{\sum_{1\leq i,j\leq n}a_{ij}^2}\]
So for example \[\left|\begin{pmatrix}1&2\\3&4\end{pmatrix}\right|=\sqrt{1^2+2^2+3^2+4^2}=\sqrt{30}\]
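Numerically (a minimal sketch assuming `numpy` and `scipy` are available), this matrix norm is what numerical libraries call the Frobenius norm, and the partial sums of the matrix exponential stabilize quickly to the value computed by `scipy.linalg.expm`:

```python
# Matrix norm of a 2x2 matrix, and partial sums of its matrix exponential.
import numpy as np
from scipy.linalg import expm

A = np.array([[1., 2.], [3., 4.]])
print(np.sqrt((A**2).sum()))                  # |A| = sqrt(30), about 5.477

partial, term = np.zeros((2, 2)), np.eye(2)   # term holds A^k / k!, starting at I
for k in range(1, 30):
    partial += term
    term = term @ A / k                       # A^k / k!

print(partial)                                # partial sums of the series
print(expm(A))                                # agrees to many decimal places
```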
Exactly analogous to the complex case, we can prove that absolute convergence implies convergence:
Theorem 27.2 If \(p(x)=\sum a_n x^n\) is a power series and \(A\) is an \(n\times n\) matrix, then \(p(A)\) converges if \(p(|A|)\) converges as a real power series. That is, absolute convergence implies convergence.
Exercise 27.1 Prove a version of Theorem 27.1 for matrices: if \(\sum_n A_n\) is a series of matrices, then it converges if the real series \(\sum_n |A_n|\) does. Then use this to show that a power series \(p(A)\) converges if the real power series converges at \(|A|\).
Exercise 27.2 Find the exponential of the matrix \[\begin{pmatrix} 2& 0\\ 0&3 \end{pmatrix}\]
Exercise 27.3 Prove that if \(A\) is a diagonalizable matrix, then \[\det(e^A)=e^{\operatorname{trace}(A)}\]
Exercise 27.4 Compute the function \[R(t)=\exp\left[\begin{pmatrix}0 &-t\\ t&0\end{pmatrix}\right]\] What matrices do you get? Hint: think about the power series for functions you learned back in calculus.
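Before working these out by hand, it can be illuminating to experiment numerically; here is a minimal sketch (again assuming `scipy.linalg.expm` is available) that prints the exponentials appearing in Exercises 27.2 and 27.4, to compare against your closed-form answers:

```python
# Numerical experiments for the two exercises above.
import numpy as np
from scipy.linalg import expm

print(expm(np.array([[2., 0.], [0., 3.]])))      # Exercise 27.2: a diagonal matrix

for t in (0.5, np.pi / 2, np.pi):                # Exercise 27.4: try a few values of t
    print(t)
    print(expm(np.array([[0., -t], [t, 0.]])))
```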
27.1.1 How far can we go?
In linear algebra, one key use of matrices is to represent linear transformations (another is just as arrays to store information, like systems of equations). But linear operators on a vector space \(V\) are also things that can be added and multiplied together (composition, as maps \(L\colon V\to V\)), so one can attempt to make sense directly of the exponential of a linear map!
Definition 27.6 If \(L\colon V\to V\) is a linear transformation, we define \(\exp(L)\) to be the linear transformation \(V\to V\) given by
\[e^L=I+L+\frac{1}{2}L\circ L +\frac{1}{3!}L\circ L\circ L+\cdots\]
For this to make sense, we need \(V\) to be a vector space where it makes sense to take limits (for example, \(V\) could be a real or complex vector space equipped with a notion of distance, among other possibilities). When \(V\) is finite dimensional this is equivalent to the matrix examples already given (as matrix multiplication is composition of linear maps), but things get really interesting if we let ourselves go beyond this.
The derivative, after all, is a linear map on the vector space of smooth real-valued functions, and these functions are things we know how to take limits of (as their inputs and outputs are real numbers!). This might make us wonder: what is the exponential of the derivative?
Definition 27.7 Let \(\mathcal{S}\) be the space of all infinitely differentiable functions, and let \[\frac{d}{dx}\colon\mathcal{S}\to\mathcal{S}\] be the derivative, \(f\mapsto \frac{d}{dx}f=f^\prime\). Then the exponential of the derivative is the operator \(\mathcal{S}\to\mathcal{S}\) defined by the series \[e^{\frac{d}{dx}}=I+\frac{d}{dx}+\frac{1}{2}\frac{d^2}{dx^2}+\frac{1}{3!}\frac{d^3}{dx^3}+\cdots\]
This acts on functions as follows, taking a function \(f\colon\RR\to\RR\) to the function \(e^{\frac{d}{dx}}(f)\colon\RR\to\RR\) given by
\[e^{\frac{d}{dx}}(f(x))=f(x)+f^\prime(x)+\frac{1}{2}f^{\prime\prime}(x)+\frac{1}{3!}f^{\prime\prime\prime}(x)+\cdots\]
You might rightly worry about convergence here: when does this expression even make sense?! The general theory of such things is beyond the scope of this course, but for functions which are themselves power series, we can actually come up with a beautifully simple answer:
Exercise 27.5 Prove that for any power series \(p(x)\), the exponentiated derivative performs a remarkably simple operation: it shifts the function's input,
\[e^{\frac{d}{dx}}[p(x)]=p(x+1)\]
Hint: show this happens for \(x^n\), use linearity of the derivative, and take some limits.
Corollary 27.4 Let \(t\frac{d}{dx}\) be the operator on functions which takes a function \(f(x)\) to the function \(tf^\prime(x)\). Then the exponential of this operator equals the shift-by-\(t\) operator on analytic functions,
\[e^{t\frac{d}{dx}}[f(x)]=f(x+t)\]
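Here is a symbolic sketch of this shift property (in Python with `sympy`, assumed available): for a polynomial the series for \(e^{t\frac{d}{dx}}\) terminates after finitely many terms, and summing those terms really does replace \(x\) by \(x+t\).

```python
# e^{t d/dx} applied to a cubic polynomial: the series has only four nonzero terms.
import sympy as sp

x, t = sp.symbols('x t')
p = x**3 - 2*x + 5

shifted = sum(t**n / sp.factorial(n) * sp.diff(p, x, n) for n in range(4))

print(sp.expand(shifted))            # both lines print the same polynomial in x and t
print(sp.expand(p.subs(x, x + t)))   # namely the expansion of p(x + t)
```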
27.2 Solving Differential Equations
One interesting application of exponentiation is to solving differential equations. We will not dive deeply into this topic but only take a quick view of some interesting examples, for those who enjoy differential equations.
27.2.1 \(y^\prime =c y\)
If \(c\) is a constant, a solution to the differential equation \(y^\prime = cy\) is a function whose derivative is \(c\) times itself. The simplest case is \(c=1\), which asks for a function that is its own derivative: \(y^\prime=y\). We know the answer to this! The natural exponential \(y=\exp(x)\) has this property - and so does any constant multiple. The same ideas carry over to more general \(c\):
Exercise 27.6 Let \(c\in\RR\) be any constant. Prove that the solutions to the differential equation \(y^\prime=cy\) are exactly the functions \[f(t)=Ae^{ct}\] for \(A\in\RR\).
Usually, a differential equation is given with an initial condition, specifying the function's behavior at a certain point. This picks out one solution from the many: if \(y_0\) is the value at \(t=0\), then the specific solution satisfying both this and \(y^\prime=cy\) is \[f(t)=y_0e^{ct}\]
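A quick numerical sketch (assuming `numpy` and `scipy` are available) comparing this closed-form solution against a direct numerical integration of \(y^\prime=cy\):

```python
# The formula y_0 e^{ct} versus a numerical solution of y' = c y.
import numpy as np
from scipy.integrate import solve_ivp

c, y0 = -0.7, 2.5
ts = np.linspace(0, 4, 9)

numerical = solve_ivp(lambda t, y: c * y, (0, 4), [y0], t_eval=ts,
                      rtol=1e-10, atol=1e-12)
exact = y0 * np.exp(c * ts)

print(np.max(np.abs(numerical.y[0] - exact)))    # difference is negligibly small
```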
27.2.2 Linear Systems
A beautiful generalization of the relatively simple idea above allows one to solve essentially all linear systems of differential equations (with constant coefficients). One learns in a differential equations course how to turn any such system into a system of first-order equations, so we focus on those here. For concreteness, assume we have the following three differential equations, for unknown functions \(x(t),y(t),z(t)\):
\[\begin{matrix} x^\prime(t) = 2x(t)+3y(t)-4z(t)\\ y^\prime(t)= 3x(t)-y(t)+z(t)\\ z^\prime(t)=x(t)-z(t) \end{matrix}\]
And suppose further that these are constrained by specific initial conditions: \(x(0)=7, y(0)=3, z(0)=2\).
Because the right hand side of each is a linear combination of \(x,y,z\) we can rewrite this more compactly using matrix notation:
\[\begin{pmatrix}x^\prime(t)\\ y^\prime(t)\\ z^\prime(t)\end{pmatrix} = \begin{pmatrix} 2&3&-4\\ 3&-1 &1\\ 1&0&-1 \end{pmatrix} \begin{pmatrix} x(t)\\ y(t) \\ z(t) \end{pmatrix} \]
In our continued attempt to simplify notation and make this problem more manageable, we define \(s(t)\colon\RR\to\RR^3\) to be the vector valued function \[s(t)=\begin{pmatrix} x(t)\\ y(t) \\ z(t) \end{pmatrix} \]
And let \(M\) be the matrix appearing in the system above. Then, since \(s^\prime(t)\) is the vector with components \(x^\prime(t),y^\prime(t),z^\prime(t)\), we can rewrite this system much more succinctly as
\[s^\prime(t)=Ms(t)\]
So, we are looking for a vector valued function \(s(t)\) for which differentiation equals multiplication by the matrix \(M\). This gives a hint of why exponentials may be involved: if everything were one dimensional, \(s\) would just be a function and \(M\) a number - we are back in the case considered in the previous section, where solutions are multiples of \(e^{Mt}\)!
Taking this as a hint, we might attempt to solve this differential equation using the matrix exponential. First, we consider a matrix-valued function, and then we will come back to the initial conditions.
Proposition 27.1 Let \(M\in M_n(\RR)\) be an \(n\times n\) matrix, and define the function \(F\colon\RR\to M_n(\RR)\) as follows:
\[F(t)=e^{tM}\]
Then \(F\) satisfies the differential equation \(F^\prime(t)=MF(t)\) in the space \(M_n(\RR)\) of \(n\times n\) matrices.
Proof. If \(M\in M_n(\RR)\) is a matrix, its norm \(|M|=\sqrt{\sum_{1\leq i,j\leq n}m_{ij}^2}\) is a real number, and so the power series for \(e^{t|M|}\) converges for all \(t\) (since it converges on the entire real line).
Thus, by Exercise 27.1 the power series \(F(t)=e^{tM}\) converges in the space of matrices \[F(t)=e^{tM}:=\lim_{k}\left[I + (tM)+\frac{1}{2}(tM)^2+\cdots +\frac{1}{k!}(tM)^k\right]\]
Now we wish to take the derivative. Recalling \(M\) is a fixed matrix, this power series defines an \(n\times n\) array of power series (one for each entry), as a function of \(t\):
\[F(t)=\sum_{k\geq 0}\frac{1}{k!}M^k t^k\hspace{1cm} [F(t)]_{ij}=\sum_{k\geq 0}\frac{1}{k!}[M^k]_{ij}t^k\]
Where we know from the above that each of these \(n\times n\) many power series converges absolutely for all values of \(t\). Thus, using our result on differentiating power series within their radius of convergence, we know for each of these we have
\[\begin{align*} [F(t)]_{ij}^\prime&=\left(\sum_{k\geq 0}\frac{1}{k!}[M^k]_{ij}t^k\right)^\prime\\ &=\sum_{k\geq 0}\left(\frac{1}{k!}[M^k]_{ij}t^k\right)^\prime\\ &=\sum_{k\geq 0}\frac{1}{k!}[M^k]_{ij} k t^{k-1}\\ &=\sum_{k\geq 1}\frac{1}{(k-1)!}[M^k]_{ij} t^{k-1}\\ &= \sum_{k\geq 0}\frac{1}{k!}[M^{k+1}]_{ij}t^k \end{align*}\]
Because we know this equation holds for each entry, we have an equation for the matrices themselves:
\[F(t)^\prime = \sum_{k\geq 0}\frac{1}{k!}M^{k+1}t^k\]
Each term on the right side shares a common factor of \(M\). For any finite sum we may factor out such a term:
\[M+tM^2+ \frac{1}{2}t^2 M^3+\cdots +\frac{1}{k!}t^k M^{k+1}\] \[=M\left[I+tM+ \frac{1}{2}t^2 M^2+\cdots +\frac{1}{k!}t^k M^{k}\right]\]
As \(k\to\infty\) the series on the right converges exactly to the original series \(F(t)\) - now multiplied by this common factor of \(M\). And, the series on the left converges to \(F^\prime(t)\) by our calculation above. So, both sides of this equality converge and taking the limit yields \[F(t)^\prime = M F(t)\]
Now we utilize this to solve our particular differential equation. We’ve constructed a function whose derivative is \(M\) times itself - the property we wanted - but it’s not yet a solution to our differential equation, because it’s a matrix-valued function, and we are looking for a vector \((x(t),y(t),z(t))\).
Proposition 27.2 Given a vector \(v_0\in\RR^n\), define the vector-valued function \(s\colon\RR\to\RR^n\) \[s(t)=e^{tM}v_0\] for a matrix \(M\in M_n(\RR)\). Then \(s\) satisfies the vector-valued differential equation \[s^\prime= Ms\hspace{1cm}s(0)=v_0\]
Proof. Defining \(s(t)\) by this formula, since \(F(t)=e^{tM}\) converges to an \(n\times n\) matrix for each \(t\) we are assured that \(s(t)\) is a well defined vector for all time. So we need only check it has the required properties.
First, at \(t=0\) the series for \(e^{tM}\) collapses to its first term, the identity matrix, and so \[s(0)=e^{0M}v_0=Iv_0=v_0\]
Next, we wish to differentiate the vector-valued function \(s(t)\). Writing this out using the limit definition of the derivative (now applied to a vector): \[s^\prime(t)=\lim_{h\to 0}\frac{e^{(t+h)M}v_0-e^{tM} v_0}{h}\]
For each value of \(h\) we can factor out the constant vector \(v_0\), and thus are left with a limit of matrices applied to this vector:
\[s^\prime(t)=\left(\lim_{h\to 0} \frac{e^{(t+h)M}-e^{tM}}{h}\right)v_0=\left(\lim_{h\to 0}\frac{F(t+h)-F(t)}{h}\right)v_0\]
But we already know the derivative of \(F\)! So, \[s^\prime(t)=F^\prime(t)v_0=MF(t)v_0\]
And, \(F(t)v_0\) is the definition of \(s(t)\): thus as claimed, \[s^\prime(t)=Ms(t)\]
This gives us an explicit solution to our example system:
\[\begin{pmatrix} x(t)\\ y(t)\\ z(t) \end{pmatrix}= \exp\left[\begin{pmatrix} 2t&3t&-4t\\ 3t&-t &t\\ t&0&-t \end{pmatrix}\right]\begin{pmatrix} 7\\3\\2\end{pmatrix} \]
And in fact, this provides a glimpse of just how powerful a tool we’ve created. The matrix-valued function \(F(t)=\exp(tM)\) is a solution generator: it produces every solution to the differential equation \(s^\prime = Ms\) simply by multiplication by the initial condition. We can think of \(e^{tM}\) itself as a map from initial conditions to solutions.
\[e^{tM}\colon\textrm{Initial Condition}\mapsto \textrm{Solution}\] \[v_0\mapsto e^{tM}v_0\]
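Here is a minimal numerical sketch of this solution generator for the example system above (assuming `numpy` and `scipy` are available), checking \(e^{tM}v_0\) against a direct numerical solution of the system:

```python
# s(t) = exp(tM) v0 for the example system, compared against an ODE solver.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

M = np.array([[2., 3., -4.],
              [3., -1., 1.],
              [1., 0., -1.]])
v0 = np.array([7., 3., 2.])
t = 0.8

via_exponential = expm(t * M) @ v0
via_solver = solve_ivp(lambda time, s: M @ s, (0, t), v0,
                       rtol=1e-10, atol=1e-12).y[:, -1]

print(via_exponential)
print(via_solver)        # the two agree to many decimal places
```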
Such a perspective becomes even more important when we turn an eye towards partial differential equations below.
27.2.3 Exponential Operators
A partial differential equation is a differential equation for multivariate functions which involves derivatives with respect to multiple variables. Partial differential equations are a cornerstone of applied mathematics and of the application of analysis to the natural sciences. Some common examples are the heat equation from thermodynamics
\[\partial_t H(x,t)=\partial_x^2 H(x,t)\]
the wave equation from fluids, materials science, and electromagnetism
\[\partial_t^2 W(x,t)=\partial_x^2 W(x,t)\]
the Schrödinger equation of quantum mechanics
\[i\partial_t \Psi(x,t)=-\frac{1}{2}\partial_x^2 \Psi(x,t)+V(x)\Psi(x,t)\]
and the Black-Scholes equation from economic theory:
\[\partial_t V(s,t)+\frac{\sigma^2}{2}s^2\partial_s^2 V(s,t)=r V(s,t)-rs\partial_s V(s,t)\]
Solving partial differential equations in general is a much more difficult process than the ordinary differential equations discussed above, and so we do not attempt a comprehensive or rigorous treatment here. Instead, we content ourselves to simply explore a few simple cases where exponentiation can play an important role.
Example 27.1 (The Equation \(\partial_t f=\partial_x f\)) Consider perhaps the simplest partial differential equation for a function \(f(x,t)\), with the initial condition given as a function of \(x\) at \(t=0\) \[\partial_t f(x,t)=\partial_x f(x,t)\hspace{1cm} f(x,0)=h(x)\]
One way to think about a function \(f(x,t)\) is as a one parameter family of functions of \(x\). That is, for each fixed value of \(t\), the function \(f(x,t)\) is just a function of \(x\) (which is often written \(f_t(x)\) instead, to emphasize that \(t\) is a parameter and at each fixed \(t\) we get a function just of \(x\)). From this perspective, the differential equation is telling us something about how this collection of functions changes over time: that the rate of change in time is the same as applying the \(x\)-partial derivative at the given moment.
We can reason about this in analogy with the system of equations we discussed above. Indeed, just as the matrix \(M\) was a linear transformation on vectors, the \(x\)-derivative is a linear transformation on functions, \(h(x)\mapsto \frac{d h}{d x}\). If we were trying to build a solution generator, it would be a family of linear operators \(F(t)\) where, given a function \(h(x)\), we could produce a solution via \(s(x,t)=F(t)[h(x)]\). One might also try to determine the nature of \(F(t)\) by analogy with the matrix case: \[\partial_t s(x,t)=\partial_t\left[F(t)[h(x)]\right] = F^\prime(t)[h(x)]\] and notice that if \(F^\prime(t)=\partial_xF(t)\), then we would have \[\partial_t s(x,t)= F^\prime(t)[h(x)]=\partial_x F(t)[h(x)]=\partial_x s(x,t)\]
How could we attempt to build a family of operators with this property, that differentiating in \(t\) would “bring down an \(x\)-derivative?” Why not propose \(F(t)=e^{t\frac{d}{dx}}\)? We can attempt to define this as a power series \[e^{t\frac{d}{dx}}=I + t\frac{d}{dx}+\frac{t^2}{2} \frac{d^2}{dx^2}+\cdots +\frac{t^n}{n!}\frac{d^n}{dx^n}+\cdots\]
And use this power series to give an explicit definition for how \(e^{t\frac{d}{dx}}\) should act on functions:
\[e^{t\frac{d}{dx}}[h(x)]=h(x)+th^\prime(x)+\frac{t^2}{2}h^{\prime\prime}(x)+\cdots+\frac{t^n}{n!}h^{(n)}(x)+\cdots\] \[=\sum_{n\geq 0}\frac{t^n}{n!}h^{(n)}(x)\]
Upon seeing this formula, one should certainly be thinking about convergence: is this infinite sum going to make sense, for all values of \(x\) and \(t\)? Or could it be that this diverges? An exercise below looks at the case that \(h\) is a power series, where we can prove explicitly using our real-analysis techniques that the series converges, so for now let’s assume convergence, and investigate the behavior of our proposed function.
Proposition 27.3 Let \(h\colon\RR\to\RR\) be a function and assume that \(s(x,t)=\sum_{n\geq 0}\frac{t^n}{n!}h^{(n)}(x)\) converges absolutely for all \(x,t\).
Furthermore, assume the following technical condition on \(h\), essentially saying the derivatives of \(h\) can't grow too fast: for each \(n\) the derivative satisfies \(|h^{(n)}(x)|\leq K_n\) for some constant \(K_n\), and \(\lim \frac{K_n}{nK_{n-1}}= 0\).
Then \(s\) solves the differential equation \(\partial_t s = \partial_x s\) with \(s(x,0)=h(x)\).
Proof. For any fixed \(x\), the proposed function \(s(x,t)\) is a power series in \(t\), and we have proven that inside the radius of convergence such a power series can be differentiated term by term. Thus,
\[\begin{align*} \partial_t s(x,t)&=\partial_t\left(\sum_{n\geq 0}\frac{t^n}{n!}h^{(n)}(x)\right)\\ &= \sum_{n\geq 0}\partial_t\left(\frac{t^n}{n!}h^{(n)}(x)\right)\\ &= \sum_{n\geq 0}\frac{nt^{n-1}}{n!}h^{(n)}(x)\\ &= \sum_{n\geq 1}\frac{t^{n-1}}{(n-1)!}h^{(n)}(x)\\ &= \sum_{n\geq 0 }\frac{t^n}{n!}h^{(n+1)}(x) \end{align*}\]
Because our original power series converged absolutely for all \(t,x\), this remains true after differentiation (as a corollary of the power series differentiation theorem, proved using dominated convergence.) For each \(n\), we can rewrite \(h^{(n+1)}(x)\) as \(\frac{d}{dx} h^{(n)}(x)\) and thus \[\frac{t^n}{n!}h^{(n+1)}(x)=\frac{t^n}{n!}\frac{d}{dx} h^{(n)}(x)=\partial_x\left(\frac{t^n}{n!}h^{(n)}(x)\right)\]
Plugging this back into our series, we see
\[\partial_t s(x,t)=\sum_{n\geq 0}\partial_x\left(\frac{t^n}{n!}h^{(n)}(x)\right)\]
Now we investigate the right-hand-side further. For any finite sum we know that \[\sum_{n\leq N}\partial_x\left(\frac{t^n}{n!}h^{(n)}(x)\right)=\partial_x\left(\sum_{n\leq N}\frac{t^n}{n!}h^{(n)}(x)\right)\]
so all that needs to be justified is that this property remains true in the limit. But this is exactly what dominated convergence is built for, exchanging the limit and sum! Let’s check the conditions of dominated convergence apply:
- For each \(n\) the term \(\frac{t^n}{n!}h^{(n)}(x)\) is differentiable.
- For each \(x\), the sum \(\sum_n \frac{t^n}{n!}h^{(n)}(x)\) is convergent.
These follow immediately from our assumptions on \(h\) and \(s\). Next we need to define a dominating series \(M_n\) of constants. Our technical assumption assures us that for each \(n\) there is some uniform constant \(K_n\) bounding the derivative \(|h^{(n)}(x)|\), so we propose \(M_n=\frac{|t|^n}{n!}K_n\). By definition this is greater than or equal to the absolute value of the \(n^{th}\) term in the series, so all we need to check is that it converges:
\[\sum_{n}M_n=\sum_n \frac{t^n}{n!}K_n\]
Performing the ratio test, we find
\[\begin{align*}\lim\left|\frac{\frac{t^n}{n!}K_n}{\frac{t^{n-1}}{(n-1)!}K_{n-1}}\right|&=\lim\left|\frac{tK_n}{nK_{n-1}}\right|\\ &=|t|\lim \frac{K_n}{nK_{n-1}}\\ &=0 \end{align*}\] Where the last equality follows directly from our technical assumption. Thus, our proposed dominating series converges absolutely, and dominated convergence allows us to switch the order of the differentiation and summation to see
\[\partial_x s(x,t)=\partial_x\left(\sum_{n\geq 0}\frac{t^n}{n!}h^{(n)}(x)\right)=\sum_{n\geq 0}\partial_x\left(\frac{t^n}{n!}h^{(n)}(x)\right)\]
But the right hand side here is exactly what we found earlier must equal the partial \(t\) derivative! Stringing these together,
\[\partial_t s(x,t)=\sum_{n\geq 0}\partial_x\left(\frac{t^n}{n!}h^{(n)}(x)\right)=\partial_x s(x,t)\]
So, \(s\) satisfies the proposed differential equation. Last but not least, we check the initial condition by evaluating at \(t=0\):
\[s(x,0)=h(x)+0\cdot h^\prime(x)+\frac{0^2}{2}h^{\prime\prime}(x)+\cdots=h(x)\]
We can rephrase the result above in more abstract language:
Corollary 27.5 Let \(\mathcal{F}\) be the space of functions, and \(F(t)=e^{t\frac{d}{dx}}\) the operator \(\mathcal{F}\to\mathcal{F}\) defined by \[e^{t\frac{d}{dx}}[h]=\sum_{n\geq 0}\frac{t^n}{n!}h^{(n)}\]
Then \(e^{t\frac{d}{dx}}\) is a solution generator for the differential equation \(\partial_t f(x,t)=\partial_x f(x,t)\). Given any initial condition \(h(x)\) with slow enough growing derivatives, \(s=e^{t\frac{d}{dx}}[h(x)]\) is a solution to this differential equation.
This is pretty incredible: just by analogy with the matrix case we were able to propose a solution using the power series for the exponential, and then with some real analysis prove this solution works! But we can go even further, and understand the solution geometrically using what we know about the exponentiated derivative operator. Indeed, in Corollary 27.4 we showed (following an exercise for you to complete) that, at least when \(h\) is a power series, we can readily understand the action of \(e^{t\frac{d}{dx}}\):
\[e^{t\frac{d}{dx}}h(x)=h(x+t)\]
Thus, after all of this hard work we end up with a ridiculously simple solution:
Corollary 27.6 If \(h(x)\) is a differentiable function of \(x\), then \[s(x,t)=h(x+t)\] is a solution to \(\partial_t s=\partial_x s\) with initial condition \(h\).
This is trivial to confirm by hand, using the chain rule! \[\partial_t h(x+t)=h^\prime(x+t)=\partial_x h(x+t)\]
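One can also watch the series itself do this numerically. Here is a minimal sketch with \(h=\sin\), whose \(n^{th}\) derivative is \(\sin(x+n\pi/2)\): the truncated series \(\sum_{n<N}\frac{t^n}{n!}h^{(n)}(x)\) lands right on \(\sin(x+t)\).

```python
# The truncated series for e^{t d/dx} applied to sin, compared with sin(x + t).
import numpy as np
from math import factorial

def nth_derivative_of_sin(x, n):
    return np.sin(x + n * np.pi / 2)     # d^n/dx^n sin(x)

x, t, N = 1.3, 0.9, 25
series = sum(t**n / factorial(n) * nth_derivative_of_sin(x, n) for n in range(N))

print(series)            # approximately 0.8085
print(np.sin(x + t))     # sin(2.2), the same value
```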
And in retrospect, we could have come up with this solution if we had just thought hard enough, instead of diving into calculations! But our ability to write the solution in a geometrically obvious manner is special to this case, and to the differential equation in question being particularly simple. The power of the technique above was that it did not require us to be clever: the exponential comes to the rescue even when - and especially when - our intuition and foresight fail us.