Viewpoints on differentials

…many mathematicians think in terms of infinitesimal quantities: apparently, however, real mathematicians would never allow themselves to write down such thinking, at least not in front of the children. —Bill McCallum 

The concepts of differentiable function and differential introduced in a previous post were related to the expansion

y(x+dx)-y(x)=A(x)dx+o(dx)=dy+o(dx),

valid as dx\to 0. During the 17th and 18th centuries, mathematicians thought of dx and dy as actual infinitesimals “à la Leibniz”. After the advent of the rigorous concept of limit, the point of view changed. The above relation is now mostly used to introduce the derivative A(x), which plays the central role, while both dx and dy are deprived of their infinitesimal nature. Namely, dx is understood as an arbitrary finite increment of the “independent” variable, whereas dy is a function of (x,dx), linear in its second argument. dy is thus a functional differential, and the relation dy/dx=A(x) is true by definition. This point of view stresses the notion of derivative, rendering differentials superfluous, a relic of the past used only in the context of linear approximation; see [1].

However, in Engineering and Physics practice, differentials come first and are perceived as actual infinitesimals, while derivatives are obtained as ratios. This seems more natural, the concept of a ratio being psychologically more complex than that of an infinitesimal. But that is not the only advantage. Indeed, as we will show through examples, reasoning with differentials as infinitesimals leads to shorter and clearer proofs of propositions and solutions to many problems. One should concede, however, that the notion of derivative is easier to put on a firm basis in the language of limits, while infinitesimals are logically problematic.

A more symmetric point of view, where no distinction between “independent” and “dependent” variables is made, is very fruitful. As an example, suppose the quantities x and y are linked by the relation

xy=5.\qquad\qquad\qquad\qquad (1)

If we increase x and y by \Delta x and \Delta y respectively, in a way consistent with the above relation,

(x+\Delta x)(y+\Delta y)=5,

we have

(x+\Delta x)(y+\Delta y)-xy=0,\qquad\textrm{or}\qquad x\Delta y+y\Delta x+\Delta x\Delta y=0.

The latter is the condition for the finite increments \Delta x and \Delta y to be “compatible” with (1). By considering infinitesimal increments dx and dy instead, we can “filter” the above relation by keeping only the terms which are linear in dx and dy:

x dy+ydx=0,\qquad\qquad\qquad\qquad (2)

where the quadratically small term dx\,dy is dropped. This is a general principle of Differential Calculus: in any computation involving infinitesimals, one keeps the leading ones (i.e. those of lowest order) and drops the higher-order ones. This principle is followed by physicists, engineers and other users of differential calculus, and is deemed intuitively clear. It can ultimately be justified in the language of limits.
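The filtering principle can be illustrated numerically. The sketch below (illustrative only, with sample values chosen here) checks that, for increments exactly compatible with xy=5, the discarded part x\,dy+y\,dx=-dx\,dy shrinks quadratically with the increment, which is what justifies dropping the product term:

```python
# Sketch: for x*y = 5, the exact compatibility condition on finite increments is
#   x*dy + y*dx + dx*dy = 0,
# so the "filtered" linear part x*dy + y*dx equals -dx*dy, a quadratically
# small residual.

def residual(x, dx):
    """Linear part x*dy + y*dx of the exact increment identity for x*y = 5."""
    y = 5.0 / x
    dy = 5.0 / (x + dx) - y        # dy chosen so that (x+dx)*(y+dy) = 5 exactly
    return x * dy + y * dx

x = 2.0
for dx in (0.1, 0.01, 0.001):
    r = residual(x, dx)
    print(dx, r, r / dx**2)        # r/dx**2 tends to a constant (here 5/4)
```

Dividing the residual by dx**2 gives an essentially constant value, confirming that the dropped term is of second order.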

Thus, (2) is the differential relation between x,y,dx,dy corresponding to the “finite” relation (1). We derived (2) from (1). This time we are not differentiating a function, but rather an equation.

The opposite process is called “integration”, or “solving a differential equation”. Of course, one cannot completely recover the finite relation from the differential one: the constant “5” in (1) could be replaced by any other constant without altering the resulting differential relation.
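In this example the integration step amounts to recognizing the left-hand side of (2) as the differential of the product xy:

```latex
% The left-hand side of (2) is an exact differential:
x\,dy + y\,dx \;=\; d(xy),
% so (2) says d(xy) = 0, i.e. xy is constant:
xy = C.
% The original relation (1) corresponds to the particular choice C = 5.
```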

In line with a more “algebraic” point of view, differential equations in modern textbooks are written in the form

\displaystyle{G(x,y,\frac{dy}{dx})=0},

where G is a general function of three variables and the unknown is a function y=y(x). This approach carries some unpleasant consequences. For example, in modern notation (2) may be presented as

\displaystyle{\frac{dy}{dx}=-\frac{y}{x}}\qquad\qquad\qquad\qquad (3)

and solutions are sought as functional relations y = f(x) on some interval. Thus, the function y=5/x on the interval (0,\infty) is a solution of (3). The function y=5/x on the interval (-\infty,0) is also a solution. But it would be much nicer to be able to say that the full hyperbola xy=5 we started with is a solution of the symmetric differential equation (2), as is any other hyperbola xy=C. Some textbooks deal with the issue by saying that equation (3) actually presupposes the simultaneous consideration of

\displaystyle{\frac{dx}{dy}=-\frac{x}{y}}\qquad\qquad\qquad\qquad (3')

where we look for solutions x=x(y). Then, the full set of solutions (to be precise, the non-prolongable ones) consists of the separate branches of the hyperbolas xy=C, together with the coordinate half-axes (the case C=0). This is a consequence of insisting on functional relations and finite rates, where the variables play an asymmetric role. This was never an issue for Leibniz, Huygens, the Bernoullis, L’Hôpital or Euler, much less for Newton, whose independent variable was always time, an extrinsic parameter. Equations in “differential form” like (2) are still presented in many texts, especially those for Engineers. The solution is then presented in the natural implicit form, apparently oblivious to the initial definition of solutions as functional relations.
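As a quick numerical sanity check (a sketch, with sample points chosen here), each branch of the hyperbola xy=5 does satisfy (3):

```python
# Check (sketch): on each branch of x*y = 5, i.e. y = 5/x with x > 0 or x < 0,
# the exact derivative dy/dx = -5/x**2 agrees with the right-hand side -y/x
# of equation (3).

def y(x):
    return 5.0 / x

def dy_dx(x):
    return -5.0 / x**2             # exact derivative of y = 5/x

sample_points = (-3.0, -1.0, 0.5, 2.0)   # points on both branches
mismatch = max(abs(dy_dx(x) - (-y(x) / x)) for x in sample_points)
print(mismatch)                    # zero up to rounding
```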

Returning to the above procedure: in order to get (2) from (1), it is assumed that both dx and dy are infinitesimals, and that (1) is satisfied up to infinitesimals of order higher than dx and dy. This process is called linearization. More generally, given an implicit relation between the variables u,v,w,\dots

F(u,v,w,\dots)=0

we replace u,v,w,\dots by u+du,v+dv,w+dw,\dots on the left-hand side and impose F(u+du,v+dv,w+dw,\dots)=0 up to infinitesimals of order higher than du,dv,dw,\dots (quadratic, cubic, etc.). That leads to a differential relation

A(u,v,w,\dots)du+B(u,v,w,\dots)dv+C(u,v,w,\dots)dw+\dots=0,\qquad\qquad\qquad\qquad (4)

valid “along” F=0 (see below).

In order to reconcile the above linearization process with our previous concept of functional differential, one can consider a new quantity z defined by

z=F(u,v, w,\dots)

and proceed as in the case of one independent variable, i.e. expanding the change of z in powers of the independent increments du,dv,dw,\dots of u,v,w,\dots respectively. For the sake of simplicity, assume there are two independent variables, u and v. Then, if the change \Delta z=z(u+du,v+dv)-z(u,v) can be written in the form

z(u+du,v+dv)-z(u,v)=A(u,v)du+B(u,v)dv+\textrm{\ higher order terms}

(where the higher-order terms contain higher powers of du and dv, products like du\,dv, etc.), we say that z is a differentiable function of its arguments, and call the linear part

dz=dF(u,v; du, dv)=A(u,v)du+B(u,v)dv

the differential of z (or F). The coefficients A(u,v) and B(u,v) are called the partial derivatives of z with respect to u and v, respectively, and are denoted

\displaystyle{\frac{\partial z}{\partial u}}=A(u,v);\qquad\qquad \displaystyle{\frac{\partial z}{\partial v}}=B(u,v).
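For instance, taking z=uv recovers the product rule for differentials used above:

```latex
% For z = uv, expand the increment directly:
(u+du)(v+dv) - uv \;=\; v\,du + u\,dv + du\,dv,
% so the linear part (the differential) and the partial derivatives are
dz = v\,du + u\,dv,
\qquad
\frac{\partial z}{\partial u} = v,
\qquad
\frac{\partial z}{\partial v} = u.
```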

Summing up, when differentiating an equation F(u,v,w,\dots)=0, we differentiate the function on the left-hand side as if the variables were independent. The resulting differential relation, however, only holds if du,dv,\dots are compatible with F(u,v,\dots)=0 to first order. Next, we present a nice geometric interpretation of this notion of compatibility.

The point of view of Differential Geometry

There is a very nice interpretation of the linearization process in the language of Differential Geometry. We can think of our relation F(u,v,w,\dots)=0 as a (hyper)surface in the space of the variables (u,v,w,\dots), contained in the domain of F. Then,

dF=A(u,v,w,\dots)du+B(u,v,w,\dots)dv+\dots

is an (exact) differential form, defined on the domain of F. At each point in this domain, it is a linear function of the differentials du, dv,dw,\dots. For example, the above relation xy=5 defines a hyperbola in the xy-plane, and the corresponding differential form is dF=xdy+ydx.

Then, the relation dF=0 holds along F=0 or, more generally, along any level set of F. More precisely, it holds when the differentials du, dv,\dots are the components of a vector, tangent to the hypersurface F=0. Those familiar with Multivariable Calculus will recognize the condition dF=0 as the requirement for (du,dv,dw,\dots) to be perpendicular to the gradient vector \nabla F(u,v,w,\dots) which is in turn perpendicular to the level set F=0. This is the geometric meaning of “differential increment being compatible with the given relation”. In the figure below, F(x,y)=x^2+2xy+3y^2-10.

\mathbf{dF(A,dr)=0;\,\,\, dr=(dx,dy)}
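These relations can be checked numerically. The sketch below uses the figure's F and a sample point A chosen here for convenience:

```python
import math

# Sketch: for F(x, y) = x**2 + 2*x*y + 3*y**2 - 10, pick a point A on the
# level set F = 0, rotate the gradient by 90 degrees to get a tangent
# vector dr, and check that (i) dF(A, dr) = <grad F(A), dr> vanishes and
# (ii) F changes only to second order when moving from A along dr.

def F(x, y):
    return x**2 + 2*x*y + 3*y**2 - 10

def grad_F(x, y):
    return (2*x + 2*y, 2*x + 6*y)      # (dF/dx, dF/dy)

A = (math.sqrt(10.0), 0.0)             # a sample point with F(A) = 0
gx, gy = grad_F(*A)
dr = (-gy, gx)                         # perpendicular to the gradient

dF = gx * dr[0] + gy * dr[1]           # the differential evaluated on dr
print(abs(dF))                         # zero up to rounding

for t in (1e-2, 1e-3):
    # F(A + t*dr) / t**2 stays bounded: the change is second order in t
    print(abs(F(A[0] + t * dr[0], A[1] + t * dr[1])) / t**2)
```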

More generally, if F is a scalar function defined on a manifold, the tangent space to the submanifold F=0 is the kernel of the differential form dF. Solving an equation in “differential form” is precisely finding a submanifold whose tangent space is the kernel of the given form.

There is no trace of infinitesimal quantities in the above linearization procedure. After all, there is no restriction on the size of the vector (du,dv); it just needs to be tangent to the hypersurface F=0. However, among Physicists, Engineers and other practitioners of Calculus, it is common to assume that the differentials of the involved variables are actual infinitesimals, and higher-order terms are dropped by virtue of their relative smallness. Formally, both approaches lead to the same result, but the latter can often be used as a computational shortcut and is more intuitive.

Yet another advantage of differential relations like (4) is that if the variables (u,v,w,\dots) depend on further variables r, s,t,\dots, we obtain a valid differential relation between the new variables by just considering du, dv,dw,\dots as functional differentials of the new independent variables and substituting accordingly. This formal invariance of the first differential was already pointed out by Leibniz and is very useful in applications. From the point of view of derivatives, it is nothing but the chain rule. From an abstract point of view, it is a consequence of linearity, and thus does not extend to higher order differentials.
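As a concrete instance of this invariance, parametrize the hyperbola xy=5 by a parameter t (an illustrative choice made here) via x=e^{t}, y=5e^{-t}; substituting the corresponding functional differentials into (2) yields an identity:

```latex
% Parametrize xy = 5 by x = e^{t}, y = 5e^{-t}, so that
dx = e^{t}\,dt, \qquad dy = -5e^{-t}\,dt;
% substituting these functional differentials into (2):
x\,dy + y\,dx \;=\; e^{t}\!\left(-5e^{-t}\right)dt + 5e^{-t}\,e^{t}\,dt \;=\; -5\,dt + 5\,dt \;=\; 0.
```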

In forthcoming posts, we will present some examples of the use of differentials when thought of as infinitesimals, and some applications of the above calculus with differentials to Physics, to the solution of optimization problems, related-rates problems, etc.

References:

[1] T. Dray and C. A. Manogue, “Putting Differentials Back into Calculus”, The College Mathematics Journal 42 (2), 2010.