Differentials and local power expansions

In line with the general purpose of this thread, in this post I intend to present a bird’s eye view of the concept of power expansions (Taylor series), in the belief that these ideas are often presented in a highly technical way that makes it difficult for students to grasp their simplicity and significance.

If the side of a unit square is increased by dl, its area becomes A=(1+dl)^2, so the corresponding change of the area equals

\Delta A:=A-1=2dl+(dl)^2

The linear part of the change is called the first differential of A (to be precise, the first differential of A at the point l=1, corresponding to a change dl of the independent variable). It is denoted by dA:

dA=2dl

Thus,

\Delta A=dA+(dl)^2\qquad\qquad (*).
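As a quick numerical check of (*) (with the value dl=0.01 chosen here purely for illustration),

\Delta A=(1.01)^2-1=0.0201,\qquad dA=2\cdot 0.01=0.02,\qquad (dl)^2=0.0001,

so the linear part dA already accounts for the bulk of the change, the quadratic remainder being a hundred times smaller.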

At this point, the above relation is exact and is valid for any finite dl. However, if we consider an infinitesimal dl, the main contribution to \Delta A is given by the linear (in dl) part dA, the difference being an infinitesimal of higher order. This observation suggests the following definition: if a function y=f(x) of a variable x is such that

\Delta f(x,dx)=C(x)dx+o(dx),

for infinitesimal increments dx, we say that f is differentiable at x. We define the first differential to be dy=df(x,dx):=C(x)dx. The coefficient C(x) is what we call the derivative of f at the point x. Therefore, the derivative is a ratio of differentials, C=dy/dx. Finding the derivative from the knowledge of y(x) involves, in the general case, considering dx and dy infinitesimal (see below). For polynomials, however, it only requires algebraic manipulations.

For example, let us compute the derivative of y=x^3+2x^2+x-1 at a generic x. We have to increase x by dx and compute the corresponding change of y:

\Delta y=(x+dx)^3+2(x+dx)^2+(x+dx)-1-[ x^3+2x^2+x-1]=

=3x^2dx+3x(dx)^2+(dx)^3+4xdx+2(dx)^2+dx=

=(3x^2+4x+1)dx+(3x+2)(dx)^2+(dx)^3\qquad\qquad (**)

The linear part is

dy=(3x^2+4x+1)dx,

while the rest are quadratic and cubic terms, all of them infinitesimals of higher order, negligible with respect to dx. Thus, the derivative of y with respect to x is

\displaystyle{\frac{dy}{dx}=(3x^2+4x+1).}
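For readers who like to verify such algebra by machine, here is a minimal sketch (assuming the SymPy library is available; the symbol names are chosen just for this illustration) that reproduces the expansion (**):

```python
# Minimal SymPy sketch: expand y(x+dx) - y(x) for the cubic of the example
# and collect the powers of dx, reproducing the expansion (**).
from sympy import symbols, expand, collect

x, dx = symbols('x dx')
y = x**3 + 2*x**2 + x - 1

delta_y = expand(y.subs(x, x + dx) - y)   # total variation Delta y
print(collect(delta_y, dx))
# coefficients of dx, dx**2, dx**3: 3*x**2 + 4*x + 1, 3*x + 2, 1  (as in (**))
```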

In both previous examples (*) and (**), the total variations of A (respectively y) were sums of positive integer powers of dx. That is always the case when considering polynomial functions of x. Namely, if y=P_n(x) is a polynomial of degree n, and we look at the change of y as x is increased from x to x+dx, we will have

\Delta y=A(x)dx+B(x)(dx)^2+C(x)(dx)^3+\dots+W(x)(dx)^n,\qquad\qquad (***)

with coefficients A,B,C,\dots W depending on the base point x. In order to get (***), we just expand the powers (x+dx)^k according to Newton’s binomial theorem. We can think of the coefficients as rates of different orders, multiplying the corresponding powers of dx. Thus, we have a clear separation of the factors leading to the change \Delta y: the rates, which depend only on the base point, and the different powers of the variation of the independent variable dx. This is a simple instance of a (finite) “Taylor expansion”. No limit process is involved in the previous computations, and the final expression (***) is exact for any finite value of dx.
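For a single power y=x^k, for instance, the binomial theorem gives the rates explicitly:

\displaystyle{\Delta y=(x+dx)^k-x^k=kx^{k-1}dx+\frac{k(k-1)}{2}x^{k-2}(dx)^2+\dots+(dx)^k,}

so here A(x)=kx^{k-1}, B(x)=\frac{k(k-1)}{2}x^{k-2}, and so on; a general polynomial is just a sum of such contributions.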

One of the main breakthroughs in the development of Calculus is the fact that for non-polynomial functions like y=\sin x or y=\sqrt{1+x} a similar expansion holds. The key difference is that the expansion may contain infinitely many terms. Thus, the concept of a power series serves as a bridge between algebraic and transcendental relations. The idea that (at least smooth) functional relations are either polynomials or “infinite degree polynomials” (power series) dominated Analysis over a long period of time. Newton himself regarded the binomial series, and the use of series in general to solve differential equations, as his main mathematical achievement. Series also made it possible to define many non-elementary functions and to develop Complex Analysis. They were informally used by Euler, Lagrange, Laplace and many others, in some cases producing paradoxical results. Questions related to convergence were not systematically addressed until the first half of the nineteenth century, by Gauss, Abel, Cauchy and others. It is said that when Laplace heard about Cauchy’s convergence criteria, he rushed home to check the series he had used in his monumental “Celestial Mechanics”. Luckily for him and for the stability of the Solar System, all of them were convergent in the range of parameters he considered.

For a relation like y=\sin x, algebraic methods to find the coefficients in the expansion are not available, and need to be replaced by limit procedures. Here is the basic idea: first, we divide throughout by dx, yielding

\displaystyle{\frac{\Delta y}{dx}=A+Bdx+C(dx)^2+\dots}\qquad (!)

where we denote A(x), B(x), etc. simply by A, B, etc. As dx approaches zero, all the terms on the r.h.s. vanish except for A. Consequently, A is the limit value of \frac{\Delta y}{dx}. This is how the (first) derivative is usually defined,

\displaystyle{\frac{dy}{dx}=A=\lim\limits_{dx\to 0}\frac{\Delta y}{dx}=\lim\limits_{dx\to 0}\frac{y(x+dx)-y(x)}{dx}}
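For instance, for the cubic of example (**), dividing the total variation by dx gives

\displaystyle{\frac{\Delta y}{dx}=(3x^2+4x+1)+(3x+2)dx+(dx)^2,}

and letting dx\to 0 kills the last two terms, recovering the derivative A=3x^2+4x+1 obtained above by purely algebraic means.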

We can think of successive terms in the right-hand side of (***) as corrections to the previous ones when dx is infinitesimal. Thus, if we only keep the first differential, we obtain the linear approximation of y near x,

y(x+dx)\approx y(x)+dy(x,dx)

which renders a good estimate if dx is small enough, since we are discarding infinitesimals of higher order.
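As an illustration (the numerical values are chosen here only for concreteness), take y=\sqrt{1+x} near x=0, where the derivative equals 1/2, so that

\sqrt{1+dx}\approx 1+\frac{dx}{2}.

For dx=0.02 this gives the estimate 1.01, very close to the true value \sqrt{1.02}=1.00995\dots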

In order to find the coefficient B above, we repeat the procedure. Namely, we take A to the left in (!) and divide by dx again, yielding

\displaystyle{B= \lim\limits_{dx\to 0}\frac{\frac{\Delta y}{dx}-A}{dx}= \lim\limits_{dx\to 0}\frac{\frac{\Delta y}{dx}-\frac{dy}{dx}}{dx}=\lim\limits_{dx\to 0}\frac{\Delta y-dy}{(dx)^2}}.

Thus, the infinitesimal \Delta y-dy is typically quadratic, O((dx)^2), or of higher order, o((dx)^2), when B=0. It is natural that the quadratic correction B(dx)^2 is related to the variation of A or, equivalently, of dy(x, dx) over the interval (x,x+dx). Thus, we introduce the second differential of y to measure the variation of the first differential as the base point x changes to x+dx. Namely, we define

d^2y(x,dx)=d(dy(x, dx),dx)

where we consider dx a constant and vary the base point x again by the amount dx (this is reminiscent of arithmetic sequences). We have

dy(x+dx, dx)-dy(x,dx)=A(x+dx)dx-A(x)dx=

(A(x+dx)-A(x))dx=G(x)(dx)^2+o((dx)^2),

and therefore

d^2y(x,dx)=G(x)(dx)^2,

where G(x) is the derivative of the first derivative A at x. Thus the second differential is quadratic in dx. The coefficient G(x) is called the second derivative of y at the point x. The second derivative is thus the ratio of the second differential to the square of dx.
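For the cubic of example (**), where A(x)=3x^2+4x+1, the derivative of A is G(x)=6x+4, and therefore

d^2y(x,dx)=(6x+4)(dx)^2.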

How is G related to B? The idea is to split the step from x to x+dx at the midpoint x+dx/2 and expand each half separately, so that the variation of A over half a step comes into play. From the above expression for B,

\displaystyle{\frac{\Delta y-dy}{(dx)^2}=\frac{y(x+dx)-y(x)-Adx}{(dx)^2}=}

\displaystyle{=\frac{y(x+dx)-y(x+dx/2)+y(x+dx/2)-y(x)-Adx}{(dx)^2}=}

= \displaystyle{\frac{A(x+dx/2)dx/2+B(x+dx/2)(dx/2)^2+A(x)dx/2+B(x)(dx/2)^2-Adx+\dots}{(dx)^2}=}

= \displaystyle{\frac{[A(x+dx/2)-A(x)]dx/2+[B(x+dx/2)+B(x)](dx/2)^2+\dots}{(dx)^2}},

where the dots represent infinitesimals of order higher than (dx)^2. In the limit dx\to 0 we obtain

B(x)=G(x)/4+B(x)/2,

that is, B=G/2.
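This can be checked directly on example (**): there the quadratic coefficient is B(x)=3x+2, which is indeed one half of G(x)=6x+4.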

A completely analogous procedure allows us to find C(x) in (***). This time we need to split the interval [x,x+dx] into three equal pieces, since we want to account for the variation of the second differential, which depends on three points. Precisely, we introduce the third differential of y at x as:

d^3y(x,dx)=d(d^2y(x,dx),dx).

A computation, similar to the one above for the second differential, gives

d^3y(x,dx)=H(x)(dx)^3,

where H(x) is called the third derivative of y at x. The expression for C(x) is

\displaystyle{C= \lim\limits_{dx\to 0}\frac{\frac{\Delta y}{dx}-A-Bdx}{(dx)^2}=\lim\limits_{dx\to 0}\frac{\Delta y-dy-\frac{d^2y}{2}}{(dx)^3}}

A computation similar to the one above gives the relation C=H/(2\cdot 3). In general, we define the n-th differential recursively

d^ny(x,dx)=d(d^{n-1}y(x,dx),dx)

and the corresponding n-th derivative \displaystyle{\frac{d^ny}{(dx)^n}}. The total variation of a polynomial of degree n can then be expressed as

\displaystyle{y(x+dx)=y(x)+dy(x,dx)+\frac{d^2y(x,dx)}{2}+\frac{d^3y(x,dx)}{2\cdot 3}+\dots+\frac{d^ny(x,dx)}{2\cdot 3\cdots n}}.
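As a check, for the cubic of example (**) we have dy=(3x^2+4x+1)dx, d^2y=(6x+4)(dx)^2 and d^3y=6(dx)^3, so the formula above gives

y(x+dx)=y(x)+(3x^2+4x+1)dx+(3x+2)(dx)^2+(dx)^3,

in agreement with (**).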

Thus, if we develop machinery to compute derivatives, we can swiftly write down the expansion (***) for a polynomial, avoiding the binomial theorem and the rearrangement of terms. More importantly, transcendental functions like y=\sin x or y=e^x can be dealt with in the same way, leading to their power (Taylor) expansions:

\displaystyle{y(x+dx)=y(x)+dy+\frac{1}{2!}d^2y+\frac{1}{3!}d^3y+\dots+\frac 1{n!}d^ny+\dots}
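For instance, for y=e^x at the point x=0 (using the familiar fact that e^x coincides with all of its derivatives, so that d^ny=(dx)^n there), the expansion reads

\displaystyle{e^{dx}=1+dx+\frac{(dx)^2}{2!}+\frac{(dx)^3}{3!}+\dots}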

For computational purposes, the more terms we keep on the right hand side, the better the approximation for small values of dx. But series expansions are also important for the theory. Many of the usual manipulations with polynomials can be extended to series, including differentiation, integration, long division, etc. In particular, they can be used to solve differential equations with “nice” coefficients.
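Here is a minimal numerical sketch of this “more terms, better approximation” behaviour for the exponential series above (plain Python; the helper name and the sample values are chosen just for this illustration):

```python
# Partial sums of the exponential series 1 + dx + dx**2/2! + ... ,
# compared with the exact value exp(dx) for a small increment dx.
import math

def exp_partial_sum(dx, n_terms):
    """Sum of the first n_terms of the series for e**dx."""
    return sum(dx**k / math.factorial(k) for k in range(n_terms))

dx = 0.1
for n in (2, 3, 5):
    print(n, exp_partial_sum(dx, n), math.exp(dx))
# Each extra term reduces the error by roughly another factor of dx.
```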

However, the transition from polynomials to more general functions is non-trivial. We showed above that if the difference y(x+dx)-y(x) can be expressed in the form A(x)dx+B(x)(dx)^2+\dots, then y has to be a differentiable function of x and the coefficients A(x), B(x), etc. are uniquely determined in terms of the derivatives. The question as to whether such a representation is at all available, even for infinitely differentiable functions, leads to the concept of analytic function, to be considered in a future post.
