Transcendental numbers? Show me a nice one!

The structure of the real line is deceptively intricate. Rational and irrational numbers are densely interwoven, yet they differ fundamentally—not just in their arithmetic properties, but in how they can be approximated. Even the rationals themselves possess a rich, hierarchical organization, elegantly captured by the Stern-Brocot tree.

If you take a course on Real Analysis where these matters are looked at in some detail, or an Algebra course where extensions of the rational field \mathbb{Q} are considered (say, a course on Galois theory), you encounter the concepts of algebraic and transcendental numbers. Rational numbers can be thought of as the ones solving a linear equation a_1x+a_0=0 with integer coefficients, a_1\neq 0. A natural generalization leads us to the definition of algebraic numbers as real numbers satisfying a polynomial equation of the form a_nx^n+a_{n-1}x^{n-1}+\dots +a_0=0 with integer coefficients a_0,a_1,\dots,a_n, where a_n\neq 0. The degree of an algebraic number is the lowest degree of a polynomial with integer coefficients having the given number as a root. Such a polynomial has to be irreducible in \mathbb{Z}[x]; otherwise the number would be a root of a polynomial of lower degree. Real numbers which are not algebraic are called transcendental. Clearly, all rational numbers are algebraic, and all transcendental numbers are irrational.

The idea of the existence of transcendental numbers goes back to Leibniz, but the first to prove it, by giving a concrete example, was J. Liouville in papers from 1844 and 1851, [1] & [2]. When transcendental numbers are presented, the examples provided usually include \pi and e, while the Euler-Mascheroni constant \gamma (not even known to be irrational) and Apéry’s constant \zeta(3) (known to be irrational, but not known to be transcendental) are mentioned as candidates. Proving that e and \pi are transcendental was accomplished by Hermite and Lindemann, respectively, in the second half of the XIX century; their proofs are far from elementary and can hardly be motivated and put in simple terms.

In contrast, Liouville’s constant has a very simple structure and the proof of its transcendence is completely elementary. It is defined as the number

\displaystyle{L=\sum_{k=1}^{\infty}\frac 1{10^{k!}}},

that is, the number 0.11000100000000000000000100000\dots where the “ones” appear at positions given by the factorials, 1,2,6,24,\dots after the decimal point. Since its decimal expansion is not eventually periodic, L is irrational.

Here is the reason why L is transcendental: it can be approximated “too well” by the partial sums of the series, contradicting a result proved by Liouville, whose (very simple) proof is given below. The statement of the result, which we accept for the time being, is as follows.

Theorem [1]: Let \alpha be an irrational algebraic number of degree d\ge 2 (i.e., \alpha is a root of an irreducible polynomial of degree d with integer coefficients). Then there exists a constant C(\alpha)>0 such that for all rational numbers \frac{p}{q} (with p,q\in\mathbb{Z},\, q>0) we have

\displaystyle{\left|\alpha-\frac{p}{q}\right|\ge \frac{C(\alpha)}{q^d}}\qquad (1).

(in words, algebraic numbers are not “too close” to rationals, in the sense that the rate of convergence of sequences of rationals with increasing denominators is limited by the degree of the given algebraic number). Formula (1) should be contrasted with the well-known fact that the continued fraction convergents \displaystyle{\frac{p_k}{q_k}} of any irrational number \alpha satisfy

\displaystyle{\left|\alpha-\frac{p_k}{q_k}\right|\le \frac{1}{q_k^2}}.

Liouville’s constant violates the theorem

The partial sums L_n=\sum_{k=1}^{n}\frac 1{10^{k!}} are rational numbers with denominator q_n=10^{n!}. It is clear that

\displaystyle{\left|L-L_n\right|=\sum_{k=n+1}^{\infty}\frac 1{10^{k!}}\le \frac{2}{ 10^{(n+1)!}}}\qquad (2)

On the other hand, if L were algebraic of degree d, we would have, according to Liouville’s Theorem,

\displaystyle{\left|L-L_n\right|\ge \frac {C}{ 10^{dn!}}}\qquad (3)

But relations (2) and (3) clearly contradict each other for large n. Indeed, for large enough n, we have

\displaystyle{\frac{2}{ 10^{(n+1)!}}<\frac {C}{ 10^{dn!}}},

since (n+1-d)n!\to\infty. Thus, L is transcendental.
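This contradiction is easy to watch numerically. Here is a minimal Python sketch (exact rational arithmetic; the helper name liouville_partial and the sample values of n and d are our own choices) showing that |L-L_n|\,q_n^d collapses once n\ge d, so no constant C>0 can survive:

```python
from fractions import Fraction
from math import factorial

def liouville_partial(n):
    # Exact partial sum L_n = sum_{k=1}^{n} 10^(-k!).
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(1, n + 1))

# L_7 agrees with L to far more decimal places than needed for n <= 5.
L_approx = liouville_partial(7)

d = 5  # a hypothetical degree for L; any fixed d fails the same way
for n in range(1, 6):
    q = 10 ** factorial(n)                  # denominator of L_n
    err = L_approx - liouville_partial(n)   # approximates |L - L_n|
    # Liouville's bound would force err * q**d >= C > 0 for all n.
    print(n, float(err * q ** d))
```

The printed values drop to about 10^{-120} already at n=5, in agreement with the exponent (d-n-1)\,n! in the estimate above.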

A proof of Liouville’s Theorem

Proof: By definition, there exists an irreducible polynomial in \mathbb{Z}[x]

P(x)=a_dx^d+a_{d-1}x^{d-1}+\dots +a_0

with P(\alpha)=0. If \,\,\displaystyle{\frac{p}{q}}\,\, is a rational number in lowest terms (i.e., \textrm{g.c.d.}\,(p,q)=1), then q^dP\left(\frac{p}{q}\right) is an integer,

q^dP\left(\frac{p}{q}\right)=a_dp^d+a_{d-1}p^{d-1}q+\dots +a_0q^d

and it is non-zero, since P, being irreducible of degree d\ge 2, has no rational roots; therefore

\displaystyle{\left|P\left(\frac{p}{q}\right)\right|\ge \frac{1}{q^d}}.

Next, we relate P\left(\frac{p}{q}\right) to P(\alpha) via the Mean Value Theorem,

\displaystyle{\left|P\left(\frac{p}{q}\right)\right|=\left|P\left(\frac{p}{q}\right)-P(\alpha)\right|=\left|P'(\xi)\right|\left|\frac{p}{q}-\alpha\right|}

for some \xi between \frac{p}{q} and \alpha. If we fix an interval around \alpha, that is, if we assume, say, \left|\frac{p}{q}-\alpha\right|<1, then \left|P'(\xi)\right|\le M(\alpha), where M(\alpha) denotes the maximum of \left|P'\right| on [\alpha-1,\alpha+1], and the previous estimate entails

\displaystyle{\left|\frac{p}{q}-\alpha\right|=\frac{\left|P\left(\frac{p}{q}\right)\right|}{\left|P'(\xi)\right|}\ge \frac{1}{M(\alpha)q^d}}.

The assertion follows with C(\alpha)=\min\left\{1,\frac 1{M(\alpha)}\right\}; indeed, when \left|\frac{p}{q}-\alpha\right|\ge 1, inequality (1) holds trivially, since C(\alpha)\le 1 and q\ge 1.

Final Remarks

During the time of Liouville and Hermite, transcendental numbers appeared rare and exotic, requiring ingenious proofs to isolate even a handful of examples. Yet just a few decades later, in 1874, Cantor demonstrated that, in fact, almost all real numbers are transcendental. What seemed like a hunt for “needles in a haystack” turned out to be a realization that the “haystack” was almost entirely made of “needles.” His proof, however, was non-constructive—relying not on explicit examples but on the abstract principles of cardinality and the countability of algebraic numbers.

Here is a question worth pondering: if almost all numbers are transcendental, why do the ones we encounter so rarely reflect that? Part of the answer is probably that humans think algorithmically. We are drawn to numbers we can compute, describe, or construct. Most numbers are not describable.

The apparent simplicity of Liouville’s constant is deceptive—a trick of human intuition. What makes it seem simple is merely the ease of describing its decimal representation. By contrast, a number like \sqrt{2} might appear ‘chaotic’ in its decimal expansion, yet it is fundamentally structured: as a quadratic irrational, its continued fraction is perfectly periodic,

\displaystyle{\sqrt{2}=1+\frac{1}{2+\displaystyle{\frac{1}{2+\displaystyle{\frac{1}{2+\displaystyle{\frac{1}{2+\dots}}}}}}}}.

The same applies, for instance, to the golden ratio, again a quadratic irrational with a very simple continued fraction representation, etc. This reveals a deeper truth: our perception of mathematical simplicity often depends on the representation we choose, not the object itself.

I hope you enjoyed this dive into Liouville’s constant — its deceptively tidy decimal mask is, after all, a very human kind of charm.

Bibliography

[1] Liouville, J. (1844), “Mémoires et communications”, Comptes rendus de l’Académie des Sciences (in French), 18 (20, 21): 883–885, 910–911.

[2] Liouville, J. (1851), “Sur des classes très étendues de quantités dont la valeur n’est ni algébrique, ni même réductible à des irrationnelles algébriques”,  Journal de Mathématiques Pures et Appliquées, 16, 133–142. https://www.numdam.org/item/JMPA_1851_1_16__133_0.pdf

The Legendre transform and some applications

The Legendre transform has its origins in the work of A.M. Legendre [1], where it was introduced to recast a first-order PDE into a more convenient form. It gained significance around 1830 with the reformulation of Lagrangian mechanics carried out by W. R. Hamilton, who replaced the “natural” configuration space by the so-called phase space with its rich symplectic structure [2], and with its use as a central tool in Hamilton-Jacobi theory. In the late 19th century, it was used in Thermodynamics (Gibbs, Maxwell) as a way of switching between different thermodynamic potentials. With the development of Convex Analysis and Optimization in the 20th century, its central role in providing so-called dual representations of functions, problems, etc. became apparent, [3].

From a high-level point of view, the transform establishes a link between a function of a variable x belonging to \mathbb{R}^n or a more general topological vector space and its “dual” or “conjugate” function, defined on the dual space of linear forms. From a more practical perspective, it can be understood as an alternative way of encoding the information contained in a function. In that sense, it is very similar to other transforms like the Fourier series/transform and the Laplace transform. The Fourier series, for instance, encodes the information about the given (periodic) function in the form of a sequence of “amplitudes” corresponding to the different “modes”. The Fourier transform furnishes a similar integral representation for non-periodic functions, where this time the different “modes” or “frequencies” form a continuum. Both the series and the transform provide a spectral representation of the function.

Suppose we are given a convex real function f of one real variable. Let us assume for simplicity that f is twice differentiable and f''(x)>0 for all x. Typical examples are quadratic functions f(x)=ax^2+bx+c with a>0, exponential functions g(x)=Ce^{kx} with C>0, the function h(x)=\ln (1+e^x), etc.

Since f''(x)>0, f' is a strictly increasing (hence one-to-one) function. This correspondence allows us to identify x with the value of the slope of f at that point, p=f'(x). In other words, p can be used as a new independent variable. At this point, one might be tempted to use the function g(p)=f(x(p)), where p=f'(x) (or x=(f')^{-1}(p)), as an alternative representation of f. This is, no doubt, a possible choice. However, a little extra work provides a function f^* with the following pleasant additional property: if the procedure is applied twice, we recover the original function, f^{**}:=(f^*)^*=f. A transformation f\to f^* with this property is called involutive (an involution).

Namely, given a slope p, we start by identifying the point x such that p=f'(x). You can think of this operation as parallel-transporting a line with slope p in the direction of the y-axis, starting from y=-\infty. Assuming that p\in\textrm{Im\ }f', there will be a first point of contact with the graph of f. At this point, the graph is tangent to the line. We will encode this information by recording the point of intersection with the y-axis; more precisely, if our line has equation y=px+b, we define f^*(p)=-b. The knowledge of the function f^* is tantamount to the knowledge of all the tangents to our graph: our original graph is their (uniquely defined) envelope. The function f^* is called the Legendre transform of f. In the figure below, the values of the transform of the function f(x)=x^2-2x+3 at p=0, \, p=1 and p=2 are represented.

It is easy to derive an explicit formula for f^*. Given p, we need to solve for x=x(p) in the equation

p=f'(x).

The equation of the tangent is y-f(x(p))=p(x-x(p)). Its y-intercept is b(p)=f(x(p))-px(p). Finally,

f^*(p)=-b(p)=px(p)-f(x(p))\qquad\qquad (1).

This new function f^* is convex. Indeed, a simple computation shows that

\displaystyle{(f^*)''(p)=\frac 1{f''(x(p))}>0}.

From formula (1) it follows that, given x, we have

f(x)=p(x)x-f^*(p(x)),

where, as before, f'(x)=p. The formula above reveals that f=(f^*)^* (i.e. the transformation is involutive), since

\displaystyle{\frac{df^*(p)}{dp}=x(p)+p\frac{dx}{dp}-f'(x(p))\frac{dx}{dp}=x(p)+\frac{dx}{dp}\left[p-f'(x(p))\right]=x(p)}

There is another way to look at this. For fixed p in the range of f', the condition p=f'(x) yields the (unique) critical point of the concave function x\to px-f(x). This point is a global maximum. Thus,

f^*(p)=\max\limits_{x}\,(px-f(x)).\qquad\qquad (2).

The latter equation implies the well-known Young’s inequality:

f^*(p)+f(x)\ge px, \qquad\qquad (3)

valid for any x,p. Equality takes place when x,p are linked by the relation p=f'(x).

Examples: If f(x)=\frac{x^2}{2}, f^*(p)=\frac{p^2}{2}. More generally, if f(x)=\frac{x^{\alpha}}{\alpha} with \alpha>1 for x>0, then f^*(p)=\frac{p^{\beta}}{\beta}, where 1/\alpha+1/\beta=1. If f(x)=e^x, then f^*(p)=p\ln p -p, etc.
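These formulas are easy to test against (2) numerically. Below is a small Python sketch (grid maximization; the grid bounds and the sample slopes are our ad hoc choices) for the example f(x)=e^x:

```python
import numpy as np

def legendre(f, p, xs):
    # Numerical version of (2): f*(p) ~ max over the grid of p*x - f(x).
    return np.max(p * xs - f(xs))

xs = np.linspace(-5.0, 10.0, 200001)  # wide enough to contain x = ln(p)
for p in [0.5, 1.0, 2.0, 4.0]:
    print(p, legendre(np.exp, p, xs), p * np.log(p) - p)
```

The two printed columns agree to several decimal places, since the true maximizer x=\ln p lies well inside the grid.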

Generalizations

So far, we have assumed that our function f is twice differentiable, with f''(x)>0 in its domain. However, the following slight modification of (2),

f^*(p)=\sup\limits_{x}\,(px-f(x))\qquad (4)

is meaningful for any real p and any function f if we accept the values \pm\infty in the range of f^*. The resulting function f^*, being the supremum of a family of linear functions, is convex on its domain. Definition (4), however, is not completely satisfactory for non-convex functions, as it loses information about the original function. For a convex function defined on \mathbb{R}, on the other hand, f^* encodes all the information about f and, moreover, f^{**}=f, a fact known as the Fenchel-Moreau theorem, [3]. Namely,

f(x)=\sup\limits_{p}\,(px-f^{*}(p))\qquad (5),

which can be thought of as a representation of f as the envelope of its tangents. For convex functions defined on some domain D\neq\mathbb{R}, an extra technical condition is needed on f for (5) to hold, namely lower semicontinuity, which is equivalent to the closedness of its epigraph. It is easy to see that this is a necessary condition for (5) to hold. Indeed, (5) furnishes a representation of the epigraph of f as an intersection of closed half-planes, hence necessarily closed.

In the context of Convex Analysis, the more general transformation given by (4) is called the Fenchel transform (or Fenchel-Legendre transform) and the more general inequality (3) is called the Fenchel inequality (or Fenchel-Young inequality).

Thus, the transformation can be applied to a piecewise linear, convex function. For instance, if f is a linear function, f(x)=ax+b, then clearly f^* is finite only for p=a, with f^*(a)=-b. If f is made of two linear pieces, f(x)=a_1x+b_1 when x\le x_0 and f(x)=a_2x+b_2 when x\ge x_0, with a_1<a_2 and a_1x_0+b_1=a_2x_0+b_2, then f^* is finite only for p\in [a_1,a_2], where it is linear and ranges from -b_1 to -b_2. In general, to each “corner” of a polygonal graph there corresponds a segment on the graph of f^*, [2].
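A quick numerical illustration of the two-piece case (a rough sketch; the grid necessarily caps the values where f^* is actually +\infty): for f(x)=|x| we have a_1=-1, a_2=1, b_1=b_2=0, so f^* should vanish on [-1,1] and be infinite elsewhere.

```python
import numpy as np

xs = np.linspace(-100.0, 100.0, 400001)
for p in [-2.0, -1.0, 0.0, 0.5, 1.0, 2.0]:
    # sup of p*x - |x| equals 0 for |p| <= 1 and +infinity otherwise;
    # on the truncated grid the "infinite" cases show up as (|p| - 1)*100.
    print(p, np.max(p * xs - np.abs(xs)))
```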

The generalization to functions f:\mathbb{R}^n\to\mathbb{R} is straightforward. Under the smoothness and strict convexity assumption

\displaystyle{\left(\frac{\partial^2f}{\partial x_i\partial x_j}\right)\succ 0},

the mapping D: x\to df(x,\cdot) is one-to-one. If p is the vector representing df(x,\cdot) via the standard Euclidean structure (that is, p=\nabla f(x)), we define the Legendre transform by

f^*(p)=\langle p, x\rangle-f(x), \qquad p\in \textrm{Im\ }D,

where \langle \cdot,\cdot\rangle denotes the inner product and x=x(p) is the unique solution of \nabla f(x)=p. Much like in the one-dimensional case, the previous definition is equivalent to

f^*(p)=\max\limits_{x}\,(\langle p, x\rangle-f(x)).

Finally, we relax the smoothness and strict convexity assumptions and arrive at the most general definition

f^*(p)=\sup\limits_{x}\,(\langle p, x\rangle-f(x))\qquad (6),

where now f is merely convex and f^*:\mathbb{R}^n\to (-\infty,+\infty]. The Fenchel-Moreau theorem holds without modifications under the extra assumption of lower semicontinuity of f if it is not defined on all of \mathbb{R}^n.

One can also consider “partial” Legendre transforms, i.e. transforms relative to some of the variables. Thus, if f:\mathbb{R}^2\to\mathbb{R} is a function of two variables, one can consider its transform with respect to the first variable,

f_1^*(p_1,x_2)=\sup\limits_{x_1}\,( p_1 x_1-f(x_1,x_2)).

For smooth, strictly convex functions and fixed x_2, the supremum is achieved at the (unique) x_1 defined by

\displaystyle{p_1=\frac{\partial f}{\partial x_1}}.

The transform can be further generalized to functions on manifolds but, given the fact that a manifold does not have a global linear structure, the duality is established locally, between functions on the tangent bundle TM and their “conjugates” or “duals” on the cotangent bundle T^*M. Namely, the transform connects functions of (x,v) with functions of (x,p), where v\in T_xM and p\in T^*_xM.

Applications

A) Clairaut’s differential equation

A standard example of an ODE not solved for the derivative y' is Clairaut’s equation

y=xy'-f(y').

It clearly admits the family of straight lines y=px-f(p) as solutions. The envelope of the family is a singular solution satisfying y=px-f(p) and x=f'(p). But these are precisely the relations defining the Legendre transform of f. We conclude that, for convex f, the singular solution of Clairaut’s equation is its Legendre transform.
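That each line of the family solves the equation can be checked symbolically in a few lines; here is a small SymPy sketch (keeping f as an abstract function):

```python
import sympy as sp

x, p = sp.symbols('x p')
f = sp.Function('f')

y = p * x - f(p)                  # a member of the family, p held fixed
yprime = sp.diff(y, x)            # equals p
residual = y - (x * yprime - f(yprime))
print(sp.simplify(residual))      # 0: the line solves y = x*y' - f(y')
```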

B) Hamiltonian Mechanics from Lagrangian Mechanics.

For many Physics, Applied Math, and Engineering students, this is their first introduction to the Legendre transform. The Lagrangian of a mechanical system L(q,\dot{q},t) on its configuration space (a differentiable, Riemannian manifold) completely describes the system. The actual path q(t) joining two states (t_1,q_1) and (t_2,q_2) is an extremal of the action functional,

S[q]=\displaystyle{\int\limits_{t_1}^{t_2}}  L(q,\dot{q},t)\,dt

(Hamilton’s principle) and therefore satisfies the Euler-Lagrange second-order differential equations

\displaystyle{\frac{d}{dt}\frac{\partial L}{\partial \dot{q}}-\frac{\partial L}{\partial q}=0}.

The geometry of the configuration space is intimately connected to the Physics. Thus, holonomic constraints are built into the manifold, the kinetic energy is nothing but the Riemannian metric on the manifold, geodesics relative to this metric represent motion “by inertia”, etc. The alternative Hamiltonian description reveals connections to a different geometry and is introduced as follows. Assuming that the Lagrangian is strictly convex in the generalized velocities \dot{q},

\displaystyle{\left(\frac{\partial^2L}{\partial \dot q_i\partial\dot q_j}\right)\succ 0},

we introduce the Hamiltonian of the system as the Legendre transform of the Lagrangian with respect to \dot{q}:

\displaystyle{H(p,q,t)=\langle p,\dot{q}\rangle -L(q,\dot{q},t)},

where \displaystyle{p=\frac{\partial L}{\partial \dot{q}}} is the generalized momentum, an element of the cotangent fiber at q. Since

\displaystyle{dH=\langle\dot{q},dp\rangle +\langle p,d\dot{q}\rangle-\langle\frac{\partial L}{\partial q},dq\rangle-\langle\frac{\partial L}{\partial \dot{q}},d\dot{q}\rangle-\frac{\partial L}{\partial t}dt}

\displaystyle{\,\,\, =\langle\dot{q},dp\rangle-\langle\dot{p},dq\rangle-\frac{\partial L}{\partial t}dt}

it follows that (p,q) satisfy the first order Hamiltonian system:

\displaystyle{\dot{p}=-\frac{\partial H}{\partial q}};\qquad \displaystyle{\dot{q}=\frac{\partial H}{\partial p}}

The space \{(p,q)\} is called the phase space, and it carries a remarkable symplectic structure. The Hamiltonian flow (if H does not depend on time) is a one-parameter subgroup of the group of symplectomorphisms of the phase space. A straightforward consequence is Liouville’s Theorem on the preservation of the phase volume (a cornerstone of Statistical Mechanics). The Lagrangian approach has, however, certain advantages, including: a) it is easier to deal with constraints, even non-holonomic ones, via Lagrange multipliers; b) it is easier to track conserved quantities via Noether’s theorem; c) non-conservative forces can be incorporated; etc.

It is worth noticing that the above “Hamiltonization” applies to general variational problems, not just to those related to mechanical systems. In the case of mechanical systems, the part of the Lagrangian depending on \dot q is usually a positive definite quadratic form (the kinetic energy of the system), hence convex. The transform of a quadratic form is especially simple: it is just another quadratic form of the conjugate variable. For instance, the Lagrangian of a simple mass-spring system, assuming that the spring is linear with stiffness k and that q represents the deviation of the mass from equilibrium, is

L(q,\dot q)=\displaystyle{\frac 1{2}m\dot q^2-\frac 1{2}k q^2}

and the Hamiltonian is

H(p,q)=\displaystyle{\frac {p^2}{2m}+\frac 1{2}k q^2}

and represents the total energy of the system. That is the case whenever the Lagrangian is quadratic in the velocities, the system is conservative and the constraints are time-independent.
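The computation is easy to reproduce symbolically. Here is a minimal SymPy sketch (the symbol names are ours) deriving H from L for the mass-spring system:

```python
import sympy as sp

m, k = sp.symbols('m k', positive=True)
q, qdot, p = sp.symbols('q qdot p')

# Lagrangian of the linear mass-spring system.
L = sp.Rational(1, 2) * m * qdot**2 - sp.Rational(1, 2) * k * q**2

# Generalized momentum p = dL/d(qdot); invert to get qdot in terms of p.
qdot_of_p = sp.solve(sp.Eq(p, sp.diff(L, qdot)), qdot)[0]

# Legendre transform in the velocity variable: H = p*qdot - L.
H = sp.simplify(p * qdot_of_p - L.subs(qdot, qdot_of_p))
print(H)   # the Hamiltonian p**2/(2*m) + k*q**2/2, up to term ordering
```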

C) Thermodynamic potentials

Yet another early use of the transform was in Thermodynamics, as a way to switch between potentials according to the most convenient independent parameters.

According to the first principle of Thermodynamics (energy conservation), there exists a function of state U (internal energy) such that, for any thermodynamical system undergoing an infinitesimal change of state,

\displaystyle{dU=\delta Q+\delta W+\sum_i\mu_idN_i},

where \delta Q represents the heat (thermal energy) added to the system, \delta W is the work of the surroundings on the system, and the last sum above accounts for the energy added to the system by means of particle exchange. Here, dN_i is the number of particles of the i-th type added to the system, and \mu_i the corresponding chemical potential. The relevant fact here is that dU is an actual differential, whereas the rest of the terms are just differential forms in the configuration space of the system. The term \delta W is in general a differential form \sum p_idq_i involving intensive (p_i) and conjugate extensive (dq_i) variables. Thus, for example, in the case of a gas expanding/compressing against the environment, the work done on the gas is \delta W=-p_{ext}dV, where dV is the infinitesimal change of volume and p_{ext} is the external pressure. Moreover, according to the second principle, for reversible processes the form \delta Q admits an integrating factor \frac 1{T}; namely, there is a function of state called entropy S such that

\displaystyle{dS= \frac{\delta Q}{T}}

Putting it all together, for an infinitesimal, reversible (hence quasi-static) process undergone by a gas we get

\displaystyle{dU=TdS-pdV+\sum_i\mu_idN_i}\qquad (7)

where p=p_{ext} is the pressure of the gas in equilibrium with its surroundings.

While U accounts for the total energy of the system, related state functions accounting for different manifestations of energy may be more convenient for particular experimental or theoretical scenarios. For instance, many chemical reactions occur at constant pressure in lab conditions. In such cases, it is convenient to include a term representing the work needed to push aside the surroundings to occupy volume V at pressure p; namely, we define the enthalpy of the system as

H=U+pV.

Given that, thanks to (7), we have \partial U/\partial V=-p, H is nothing but (minus) the Legendre transform of U with respect to V, that is,

H(S,p)=-U^*(S,p).

(In Thermodynamics, the transform is usually defined with the opposite sign, so there is no “minus” in the previous formula; a possible reason is that one prefers all forms of energy to increase/decrease in agreement.) Assuming for simplicity that dN_i=0, we have

dH=dU+pdV+Vdp=TdS-pdV+pdV+Vdp=TdS+Vdp

and, therefore, at constant pressure, dH=TdS=\delta Q. Thus, the change of enthalpy determines if a given chemical reaction is exothermic or endothermic. Moreover, \partial H/\partial S=T and \partial H/\partial p=V.

In a similar fashion, one can consider the (opposite of) Legendre transform of U with respect to entropy,

F=U-TS,

called (Helmholtz) free energy. A similar computation shows that

dF=-SdT-pdV,

thus, at fixed temperature, we can think of dF=-pdV as the pressure-volume work done on the system.

Yet another thermodynamic potential, the Gibbs free energy G=U+pV-TS=H-TS, is a measure of the maximum reversible work a system can perform at constant pressure p and temperature T, excluding expansion work. It is useful for determining whether a given process occurs spontaneously (e.g., chemical reactions, phase transitions) and for characterizing equilibrium conditions.

Bibliography

[1] Legendre, A. M., “Mémoire sur l’intégration de quelques équations aux différences partielles.” Histoire de l’Académie Royale des Sciences, 1789, pp. 309–351.

[2] Arnold, V. I. “Mathematical Methods of Classical Mechanics”, 2nd Edition,  Graduate Texts in Mathematics (60), 1989.

[3] Boyd, Stephen P. and Vandenberghe, L. “Convex Optimization”. Cambridge University Press, 2004.

The Lipschitz condition and uniqueness for ODEs

Fig. 1. (L) The Lipschitz condition for a function f. (R) Rudolf Lipschitz (1832 - 1903).

A classic condition on the right hand side of a first order normal ODE

\displaystyle{\frac{dx}{dt}=f(t,x)}

guaranteeing local uniqueness of solutions with given initial value x(t_0)=x_0 is Lipschitz continuity of f in the x variable in some open, connected neighborhood U of (t_0,x_0) where f is defined. That is, it is required that there exists some constant L>0 such that

|f(t,x_1)-f(t,x_2)|\le L|x_1-x_2|,\qquad\qquad\qquad \textrm{(LC)}

for all (t,x_1) and (t,x_2) in U. Under the above condition, there is a unique local solution of the initial value problem

\left\{\begin{array}{l}\displaystyle{ \frac{dx}{dt}=f(t,x)} ;\\[10pt] x(t_0)=x_0 \end{array}\right.\qquad\qquad\qquad \textrm{(IVP)}

where uniqueness means that two prospective solutions defined on open intervals containing t_0 coincide in their intersection. We assume that our solutions are classical, i.e., continuously differentiable. Such solutions exist when extra conditions on f are imposed. For instance, if f is assumed continuous in U, classical local solutions exist and can be extended up to the boundary of U. But here our concern is uniqueness.

My goal here is to explain in simple terms why such a condition implies uniqueness, how it can be generalized, and the relation between uniqueness and another interesting phenomenon, namely finite-time blow-up of solutions.

To illustrate the idea, we will assume that t_0=0 and x_0=0. We will also assume that f(t,0)=0 for every t and, therefore, one solution of the above IVP is x(t)\equiv 0. The general case reduces to this one, as we explain later.

We focus on forward uniqueness. Namely, we will prove that a solution x(t) with x(t_1)=x_1\neq 0 for some t_1>0 can never be zero at t=0. We will assume x_1>0, as the case x_1<0 can be handled in a similar way. Backwards uniqueness also follows easily from the forward result.

Uniqueness is violated if, as t decreases from t_1 to zero, x(t) vanishes at some point \bar t, 0\le \bar t<t_1, while x(t)>0 for \bar t<t\le t_1. By the Lipschitz condition,

|f(t,x(t))|=|f(t,x(t))-f(t,0)|\le L x(t)

while x(t)>0. It then follows from the ODE that

\displaystyle{\frac{dx}{Lx}\le dt}

along the solution for t\in (\bar t,t_1). Integrating this inequality over [t,t_1],

\displaystyle{t_1-t\ge\frac{1}{L}\int\limits_x^{x_1}\frac{ds}{s}}\qquad\qquad\qquad\textrm{(II)}

This integral inequality is the crux of the argument. Just observe that, since the improper integral is divergent, x(t) approaching zero would require t\to -\infty, contradicting the assumption x(t)\to 0 as t\to \bar t^+. The Lipschitz condition prevents x(t) from becoming zero over finite t-intervals.

The argument above is equivalent to the following: for any solution x(t) with x(t_1)=x_1>0 for some t_1>0, one can construct an exponential barrier from below of the form y(t)=Ce^{Lt} for some C>0; that is, x(t)\ge Ce^{Lt} for all t<t_1 in its domain, thus preventing x(t) from becoming zero. Indeed, the inequality

\displaystyle{\frac{dx}{dt}}\le Lx

implies that our solution x(t) is increasing at a slower pace than the solution y(t) of the IVP

\left\{\begin{array}{l}\displaystyle{ \frac{dy}{dt}=Ly} ;\\[10pt] y(t_1)=x_1 \end{array}\right.

which is nothing but y(t)=Ce^{Lt} for the appropriate C>0. Therefore, y(t)\le x(t) to the left of t_1. In other words, y(t) acts as a lower barrier for x(t), as in the figure below.

Fig 2. The exponential barrier prevents the solution through P from reaching the t-axis.

If we assume that x_1<0, we use the inequality (again a consequence of \textrm{(LC)})

\displaystyle{\frac{dx}{dt}}\ge Lx

for x<0 to conclude that y(t)=De^{Lt} with D<0 is a barrier from above for our solution, preventing it from reaching the t-axis.

The integral inequality \textrm{(II)} suggests a very natural generalization. Indeed, all we need is a diverging (at zero) improper integral on the right-hand side. We can replace the Lipschitz condition by the existence of a modulus of continuity, that is, a continuous function \Phi: [0,\infty)\to [0,\infty) with \Phi(0)=0, \Phi (u)>0 for u>0, satisfying

|f(t,x_1)-f(t,x_2)|\le \Phi(|x_1-x_2|)

in U, with the additional property

\displaystyle{\int\limits_{0^+}\frac {ds}{\Phi(s)}=\infty}.

This more general statement is due to W. Osgood. The Lipschitz condition corresponds to the choice \Phi(s)=s. The proof is identical to the one above, given that the only property we need is the divergence of the improper integral of 1/\Phi at 0^+.

Thus, for an alternative solution to branch out from the trivial one, we require a non-Lipschitz right-hand side in \textrm{(IVP)} that leads to a convergent improper integral. This condition is satisfied, for instance, in the autonomous problem

\left\{\begin{array}{l}\displaystyle{ \frac{dx}{dt}=x^{2/3}} ;\\[10pt] x(0)=0 \end{array}\right.

which, apart from the trivial solution, has solutions of the form x(t)=0 on [0,c) and x(t)=(t-c)^3/27 for t\ge c for any c>0.

There is nothing special about the power 2/3 in this example. Any power \alpha with 0<\alpha<1 would do. These examples of non-uniqueness are usually attributed to G. Peano.
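For the skeptical reader, the nontrivial branch is easy to verify symbolically (a quick SymPy check, writing s for t-c>0):

```python
import sympy as sp

s = sp.symbols('s', positive=True)         # s stands for t - c, with t > c
x = s ** 3 / 27                            # the nontrivial branch
residual = sp.diff(x, s) - x ** sp.Rational(2, 3)
print(sp.simplify(residual))               # 0: it solves x' = x^(2/3)
```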

Uniqueness for general solutions can be easily reduced to the special case above. Namely, if \hat x(t) and x_1(t) are local solutions of \textrm{(IVP)} (say on [t_0,t_1)), then \bar x(s)=\hat x(t_0+s)-x_1(t_0+s) is a local solution of

\left\{\begin{array}{l}\displaystyle{ \frac{d\bar x}{ds}=g(s,\bar x):=f(s+t_0, \bar x+x_1(s+t_0))-f(s+t_0,  x_1(s+t_0))} ;\\[10pt] \bar x(0)=0 \end{array}\right.

on [0,t_1-t_0) with g(s,0)=f(s+t_0, x_1(s+t_0))-f(s+t_0,x_1(s+t_0))=0. Moreover, g satisfies a Lipschitz condition near (s,\bar x)=(0,0) if f does near (t_0,x_0), with the same constant L. By the above particular result, \bar x(s)\equiv 0 and hence \hat x(t)\equiv x_1(t) on the corresponding intervals.

Remarks: a) A simple and widely used sufficient condition for \textrm{(LC)} to hold is the continuity of the partial derivative \displaystyle{\frac{\partial f}{\partial x}} in an x-convex region U (typically a rectangle). This follows from a straightforward application of Lagrange’s mean value theorem; b) \textrm{(LC)} is not necessary for uniqueness, as the example of \textrm{(IVP)} with f(t,x)=x^{\alpha}\sin\frac 1{x} with \alpha\in (0,2] shows; c) The Lipschitz condition is relevant in other areas of Analysis. For instance, it guarantees the uniform convergence of Fourier series.

A related phenomenon: blow-up in finite time.

Local solutions issued from (t_0,x_0) can be extended up to the boundary of U, but not necessarily indefinitely in the t-direction. The reason is a fast (superlinear) growth of f(t,x) as x\to\pm\infty, assuming that the domain of definition of f extends indefinitely in the x-direction. A simple example is the problem

\left\{\begin{array}{l}\displaystyle{ \frac{dx}{dt}=x^2} ;\\[10pt] x(t_0)=x_0>0 \end{array}\right.,

whose explicit solution \displaystyle{x(t)=\frac{x_0}{1-x_0(t-t_0)}} “blows up” as t\to (t_0+1/x_0)^-, despite the fact that f(t,x)=x^2 is smooth on the whole plane. The role of the superlinear growth at infinity is similar to the role of the Lipschitz (or Osgood) condition in bounded regions for uniqueness. The above problem is equivalent to

\displaystyle{\int\limits_{x_0}^x\frac{dx}{x^2}=t-t_0}

Convergence of the improper integral \displaystyle{\int\limits^{\infty}\frac{dx}{x^2}} prevents t from attaining arbitrarily large values. Calling \displaystyle{T=t_0+\int\limits_{x_0}^{+\infty}\frac{dx}{x^2}=t_0+1/x_0}, we have \lim_{t\to T^-}x(t)=+\infty. This phenomenon is called finite-time blow-up and is exhibited by ODEs with superlinear right-hand sides, by some evolution PDEs with superlinear sources, etc.
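Numerically, the blow-up is easy to watch. Here is a small SciPy sketch (the tolerances are ad hoc) for x'=x^2 with x(0)=1, whose exact solution 1/(1-t) blows up at t=1:

```python
from scipy.integrate import solve_ivp

sol = solve_ivp(lambda t, x: x ** 2, (0.0, 0.999), [1.0],
                rtol=1e-10, atol=1e-12, dense_output=True)
for t in [0.5, 0.9, 0.99, 0.999]:
    # numeric solution vs. the exact 1/(1 - t), which diverges as t -> 1
    print(t, sol.sol(t)[0], 1.0 / (1.0 - t))
```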

The same reasoning applies in the general case when there exists a continuous \Psi>0 such that f(t,x)>\Psi(x) as x\to\infty (resp. f(t,x)<-\Psi(x) as x\to-\infty), provided

\displaystyle{\int\limits^{\infty}\frac{ds}{\Psi(s)}<\infty,\quad\textrm{resp.}\quad\int\limits_{-\infty}\frac{ds}{\Psi(s)}<\infty}.

This time, under assumptions guaranteeing existence and uniqueness of solutions and provided the first condition above holds, the (forward) solution to \textrm{(IVP)} with x(t_0)=a>0 stays to the left of the solution of

\left\{\begin{array}{l}\displaystyle{ \frac{dx}{dt}=\Psi(x)} \\[10pt] x(t_0)=x_0 \end{array}\right.

with 0<x_0<a. The latter blows up in finite time, namely at time \displaystyle{T=t_0+\int\limits_{x_0}^{\infty}\frac{ds}{\Psi(s)}}, forcing the solution to our IVP to blow up at some T'\le T.

A story of the “real line”

Main contributors to the theory of real numbers. From left to right: Eudoxus, S. Stevin, G. Cantor and R. Dedekind.

Nowadays students are exposed to the concept of the “real line” as a model of the one-dimensional continuum in middle school. The real line contains all previously encountered classes of numbers (naturals, integers, rationals and irrationals) and no “gaps” are left. This object, so ubiquitous in modern mathematics, constitutes the basis upon which all of Calculus, Real and Complex Analysis and Linear Algebra (over \mathbb{R} or \mathbb{C}) are built.

It may come as a surprise that the concept crystallized relatively recently, after a slow and painful evolution of the notion of number over the centuries, with contributions from various civilizations shaping our current understanding.

In the Western hemisphere, as far as a systematic development is concerned, the story starts in ancient Greece around 500 BC. The Greeks derived their Mathematics from older Egyptian and Babylonian traditions, which viewed Mathematics as a practical tool for counting and measuring.

According to Plato (Philebus 56D-57E, see [2]), there were two kinds of Mathematics in Greece, “that of the people and that of philosophers”. The mathematics of the people was called logistics (λογιστική) and followed the Egyptian/Babylonian tradition. It was concerned with calculations and measurement. That of the philosophers, especially appreciated by Plato (429-348 BC), was viewed as a means for apprehending truth. Accordingly, they had a nuanced understanding of numbers. Only the counting (natural) numbers, except for the number one, were “numbers” in their own right for them. They used the term “arithmos” (ἀριθµός) for those numbers, and “arithmetic” (ἀριθμητική) for the related study. The reason why the unit was not considered a number was rather philosophical: the unit was a monad, related to the “essence” and indivisible, while a number, by definition, is a multitude of units. This ontological distinction is made by Aristotle in his Metaphysics (1039a15 and elsewhere), and it is the historical reason why “one” was not considered a prime number. It was not a number at all!

Here is another nuance of Greek mathematics: they would carefully distinguish between numbers and magnitudes. Those belonged to completely different realms. Numbers were discrete whereas magnitudes were continuous. There were different kinds of magnitudes: lengths, areas, volumes, times, etc. When Greeks referred to a magnitude, they did not have a number in mind. A length was a portion of a line; an area was a portion of the plane. Their formulation of the Pythagorean Theorem, for instance, involved squares built on the legs and the hypotenuse of a right triangle, and the proofs amounted to shape rearrangements.

Apart from “arithmos”, Greeks had the concept of “logos” (λόγος), which can be translated as “ratio”. These “fractions” were not actual numbers and served the purpose of attaching a “relative” size to magnitudes. According to Elements, Book V, Definitions 3 and 4, “a ratio is a sort of relation in respect of size between two magnitudes of the same kind”. However, it did not make sense for them to consider the ratio of a length to an area, or of length to time (mixed ratios). They only considered ratios of magnitudes of the same kind and ratios of numbers. Ratios could be compared between magnitudes of different kinds by means of proportions. Here is an example, [2]: “Triangles and parallelograms which are under the same height are to one another as their bases” (Elements, book VI).

Numbers can be added and subtracted, and so can magnitudes of the same kind, but ratios cannot. An abstract interpretation of the concept of ratio in Greek mathematics can be found in Bourbaki, [3]: “the ratios of integers are conceived by the classical Greek mathematicians as operators, defined on the set of integers or on a part of this set (the ratio of p to q is the operator which, to N, makes correspond, if N is a multiple of q, the integer p(N/q), a multiple of p)”.

Discovery of Incommensurability

A significant milestone in Greek mathematics was the discovery of incommensurability, traditionally attributed to the Pythagoreans. They found that some lengths, such as the diagonal of a square and its side, could not be “measured with a common unit”. In other words, there were not enough ratios of numbers to account for all possible ratios of magnitudes. According to Fowler [2], the discovery may be related to experimentation with anthyphairesis (the Euclidean algorithm/continued fractions), by noticing that in some cases the process continued indefinitely, and no common unit could thus be found. Anthyphairesis applied to the side and the diagonal of a square, or to the side and the diagonal of a pentagon, easily leads to a periodic continued fraction and consequent incommensurability. Thus, for example, the ratio of the diagonal of a square to its side (which we identify with the real number \sqrt{2}) produces the continued fraction

\displaystyle{\sqrt{2}=1+\frac{1}{2+\displaystyle{\frac{1}{2+\displaystyle{\frac{1}{2+\displaystyle{\frac{1}{2+\dots}}}}}}}}.

In many modern textbooks it is claimed that the Pythagoreans discovered that the number \sqrt{2} was irrational. Such a claim is anachronistic and inaccurate; for the Pythagoreans, neither fractions nor \sqrt{2} were numbers at all.

Eudoxus’ Theory of Proportions

In response to the problem of incommensurability, Eudoxus, who lived in the fourth century BC and attended Plato’s Academy, developed the so-called theory of proportions. Eudoxus’ theory, as rendered in Euclid’s Elements, provided a way to compare (homogeneous) magnitudes, narrowing the gap between arithmetic and geometry. Namely, “magnitudes are said to be in the same ratio, the first to the second and the third to the fourth, when, if any equimultiples whatever be taken of the first and third, and any equimultiples whatever of the second and fourth, the former equimultiples alike exceed, are alike equal to, or alike fall short of, the latter equimultiples respectively taken in corresponding order. When, of the equimultiples, the multiple of the first magnitude exceeds the multiple of the second, but the multiple of the third does not exceed the multiple of the fourth, then the first is said to have a greater ratio to the second than the third has to the fourth” (Elements, Book V, Def. 5). In other words, given four magnitudes a,b,c,d, we say that the first is to the second as the third is to the fourth, a : b :: c : d, if, for any integers m, n, the following equivalences hold

ma<nb \Leftrightarrow mc<nd;\qquad ma = nb \Leftrightarrow mc = nd; \qquad ma>nb\Leftrightarrow mc>nd,

and similarly for the “less than” and the “greater than” relations between ratios. If two magnitudes a and b are commensurable, there exist integers m and n such that the equality relation above holds. If they are incommensurable, the two extreme relations furnish “rational approximations” to the given ratio a:b. But again, neither those rational approximations nor the ratio itself were “numbers”.

The main advantage of Eudoxus’ theory was that it could be applied to both commensurable and incommensurable magnitudes, see Heath [1]. It foreshadows the construction of real numbers given by Dedekind over two millennia later.

Such was the official position of Greek mathematics, as portrayed in their most influential text, the Elements of Euclid, around 300 BC.

Logisticians, on the other hand, viewed fractions as numbers and freely added and subtracted them, especially fractions with numerator equal to one, following the Egyptian tradition.

The Hellenistic Period and Beyond

During the Hellenistic period, Greek Mathematics continued to build on the above concepts. Archimedes made significant contributions to the understanding of numbers and geometry, including the approximation of \pi using the method of exhaustion that had been introduced by Eudoxus, related to the above rational approximations of the perimeter/diameter ratio. Notably, Archimedes applied this method to the computation of areas of non-trivial shapes (a parabolic sector, the sector bounded by an arc of a spiral, etc.). Many applications of this method can be found in Book XII of the Elements.

With the decline of Greek Mathematics in the third century AD, philosophical considerations gave way to a return to the naiver point of view of the logisticians, [3]. A representative of that period is Diophantus of Alexandria, who was the first mathematician to recognize (positive) fractions as numbers. His main motivation came from Algebra: he accepted rational and even irrational numbers as possible values for coefficients and solutions of algebraic equations. Arithmetica is the major work of Diophantus and the most prominent work on premodern Algebra in Greek mathematics. Thus, a major advancement, namely the development of Algebra, was accompanied by a conceptual regress, [3].

Medieval Developments and Stevin’s Innovations

The medieval period saw a relative stagnation in the advancement of Mathematics in Europe, but significant progress was made in the Islamic world. Mathematicians like Al-Khwarizmi and Omar Khayyam expanded on Greek ideas and introduced algebraic methods that would later influence European mathematics. Among other things, negative numbers were introduced about 620 CE in the work of Brahmagupta (598 - 670), who used the ideas of ‘fortunes’ and ‘debts’ for positive and negative quantities. By this time, a place-value system was established in India, with zero being used in the Indian number system.

The Renaissance brought a resurgence of interest in mathematics in Europe. Arithmetica was first translated (though never published) from Greek into Latin by R. Bombelli in 1570. Bombelli realized that, once a unit of length is chosen, there is a one-to-one correspondence between ratios and lengths. He thus had a geometric version of the real numbers. In his Algebra, he was the first European who clearly stated (and gave geometric proofs of) the rules for multiplication of positive and negative numbers: +\times +=+, +\times -=-, etc. Bombelli is also responsible for an early use of “imaginary” numbers for the solution of cubic equations.

In the same century, Simon Stevin, a Flemish mathematician, adopting the point of view of Bombelli, made groundbreaking contributions by advocating for the use of decimal fractions. In his work De Thiende (The Tenth, published in 1585), Stevin demonstrated how fractions could be represented as decimal numbers, making calculations more straightforward and paving the way for the inclusion of irrational numbers in arithmetic. For the first time, the unit, fractions and irrational numbers were members of one and the same numerical field. In Stevin’s own words: “We conclude therefore that there are no absurd, irrational, irregular, inexplicable or deaf numbers; but that there is in them such excellence and agreement that we have subject matter for meditation night and day in their admirable perfection”.

Stevin’s work was crucial in transitioning from a purely geometric interpretation of numbers to a more algebraic and analytical approach. His decimal fractions also inspired Newton’s study of infinite series.

The real line

The first mention of the number line used for operational purposes is found in John Wallis’s (1616 - 1703) Treatise of Algebra. In his treatise, Wallis describes addition and subtraction on a number line in terms of moving forward and backward, under the metaphor of a person walking. The one-to-one correspondence between points on a line and numbers had been established.

René Descartes (1596 - 1650) took the idea further with his introduction of the Cartesian coordinate system on the plane and in space. In his work La Géométrie, Descartes linked Algebra and Geometry, allowing geometric problems to be solved algebraically and vice versa. This unification laid the foundation for Analytic Geometry. The analytic study of curves gave rise to the birth of Calculus. The introduction of coordinates is essential to the construction of many physico-mathematical theories like Mechanics, Electromagnetic Theory, Thermodynamics, Statistical Mechanics, &c.

A rigorous construction

Stevin’s decimals did not lend themselves to a satisfactory construction of real numbers; in particular, they did not provide sound definitions of the operations of addition and multiplication. The year 1872 saw the publication of work by G. Cantor and R. Dedekind, who independently provided a rigorous construction of real numbers out of rationals. Cantor’s construction is based on equivalence classes of “Cauchy sequences” of rational numbers, whereas Dedekind’s approach hinges on the concept of “cut”, and is conceptually very close to Eudoxus’ theory of proportions. Both constructions were possible thanks to the recently developed language of the theory of sets, which allowed one to consider sets of very large “size” (more precisely, cardinality). Such flexibility raised suspicion among many prominent scholars, and a decades-long debate ensued. Although the constructions of the reals by Cantor and Dedekind have been largely accepted and are regularly taught at colleges and universities, alternative schools of thought, including Intuitionism and Finitism, are still in existence and have developed their own Mathematics based on restricted versions of Set Theory, restricted Logic, etc.

Beyond the reals

We should also mention that alternative constructions of the continuum, built upon standard Set Theory, have been proposed starting in the last decade of the XIXth century. A full reconstruction of Analysis was proposed in the 1960s by A. Robinson, culminating with his Nonstandard Analysis, published in 1966. In his work, the real line is replaced by a non-Archimedean “hyperreal” line, containing actual infinitesimals and infinities of different orders along with ordinary real numbers. The main goal was to remove the concept of limit and construct a purely “algebraic” version of Analysis, thus vindicating the dream of Leibniz. But this is a major and intricate topic that deserves a separate post.

References

[1] Sir T. Heath, “A History of Greek Mathematics”, Oxford Clarendon Press, 1921.

[2] D.H. Fowler, “Ratio in Early Greek Mathematics”, Bulletin of the AMS, Vol 1 (6), 1979.

[3] N. Bourbaki, “Elements of the History of Mathematics”, Springer, 1999.

Asymptotes, tangents and backwards long division

A slant asymptote for a real function of one real variable f is a line y=ax+b with the property

f(x)=ax+b+g(x)\qquad\qquad (1),

where g(x)\to 0 as x\to\pm\infty. Geometrically, the graph of f comes closer and closer to the line for large positive/negative values of x. For the sake of simplicity, we will deal with asymptotes at +\infty, the other case being completely analogous. When a=0, we say that the asymptote is horizontal. A simple way to detect if a given function has a slant asymptote is by checking linearity at infinity, i.e., checking whether the limit

\displaystyle{\lim\limits_{x\to\infty}\frac{f(x)}{x}}

is finite. If that is the case, the value of the limit is the slope of the asymptote, a. The free term b is then given by the limit

\displaystyle{\lim\limits_{x\to\infty}\left(f(x)-ax\right)},

if the latter exists (and is finite). For rational functions of the form

\displaystyle{f(x)=\frac{P(x)}{Q(x)}}

where P and Q are polynomials, the situation is much simpler. In order for f to be linear at infinity, we need {\rm deg}(P)-{\rm deg}(Q)\le 1 and, if that is the case, we perform long division, which leads to

\displaystyle{f(x)=ax+b+\frac{R(x)}{Q(x)}}\qquad\qquad (2)

where {\rm deg}(R)<{\rm deg}(Q) and, therefore, the asymptote is just the quotient y=ax+b since

\displaystyle{\lim\limits_{x\to\infty}\frac{R(x)}{Q(x)}=0}

When {\rm deg}(P)-{\rm deg}(Q)> 1, the fraction is superlinear at infinity.

All this is well known and usually taught in high school.
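As a quick illustration (the particular fraction is an example of our own choosing), the slant asymptote can be read off from polynomial division in SymPy:

```python
import sympy as sp

x = sp.symbols('x')
P = x ** 3 - 2 * x + 1     # deg(P) - deg(Q) = 1, so a slant asymptote exists
Q = x ** 2 + 1
quotient, remainder = sp.div(P, Q, x)
print(quotient, remainder)  # quotient x, remainder -3*x + 1: asymptote y = x
```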

On the other hand, students are taught to find tangents to a given curve at a given point by means of derivatives. It turns out, however, that finding the tangent to a rational function at x=0 does not require derivatives at all. In order to understand this, we notice that y=cx+d is a tangent to f(x) at x=0 precisely when

f(x)=cx+d+g(x)

with g(x)=o(x) as x\to 0. The last relation is very similar to (1) except for the fact that now we are looking at x\to 0 instead of x\to\infty. Thus, if we could somehow come up with a relation like (2) where now

\displaystyle{\lim\limits_{x\to 0}\frac{R(x)}{xQ(x)}= 0},

the quotient would be the tangent.

As it happens, that is perfectly possible. All we need to do is divide the polynomials starting with the lowest powers (backwards), until we reach a “partial remainder” whose lowest degree is two or higher. Observe that the lowest-degree term of the divisor Q is necessarily of degree zero, i.e., Q(0)\neq 0 (otherwise f is undefined at x=0).

As an example, let us find the tangent to the graph of

\displaystyle{g(x)=\frac{2-2x+3x^2}{1+2x^2}}

at x=0. Starting with the lowest degree, we have 2=2\cdot 1, 2(1+2x^2)=2+4x^2 with first partial remainder R_1(x)=2-2x+3x^2-(2+4x^2)=-2x-x^2. In the next step, we add -2x to the quotient, with a partial remainder R_2(x)=-2x-x^2+2x(1+2x^2)=-x^2+4x^3. Since we are interested in the tangent line and x^2 is an infinitesimal of degree higher than one at zero, the division stops here and the equation of the tangent is y=2-2x. If we keep dividing, we get the Taylor polynomials of higher degree. For instance, the osculating parabola at x=0 is y=2-2x-x^2 and so on.
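The result agrees with the Taylor expansion, which SymPy can produce directly:

```python
import sympy as sp

x = sp.symbols('x')
g = (2 - 2 * x + 3 * x ** 2) / (1 + 2 * x ** 2)
# Degree-1 truncation = tangent; degree-2 truncation = osculating parabola.
print(sp.series(g, x, 0, 4))   # 2 - 2*x - x**2 + 4*x**3 + O(x**4)
```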

Long division is taught at school starting with the highest powers. A possible reason is that long division of numbers proceeds by reducing the remainder at each step. If we replace the base 10 in the decimal representation of numbers by x, we arrive at the usual long division algorithm for polynomials, reducing the degree at each step. I would call this procedure “division at infinity”. In contrast, the above is an example of “division at zero”.

Finding the tangent to a rational function at a point different from zero can be reduced to the previous case. If, say, we need to find the tangent to f(x)=P(x)/Q(x) at x=x_0, all we need to do is set x=x_0+z, express P and Q as polynomials in z, and then find the tangent at z=0 as before. Finally, we replace z back by x-x_0 in the resulting equation of the tangent.

The above reveals a perfect symmetry between the problems of finding the asymptote and that of finding the tangent at x=0. In some sense, we can say that an asymptote is a “tangent at infinity” and, I guess, that a tangent is an asymptote at a finite point. Both problems are algebraic in nature and can be solved without limit procedures, just by means of division (forward or backward).

More generally, for rational functions, which are the simplest non-polynomial functions, finding the Taylor expansion at zero (and, by translation, at any point) is a purely algebraic procedure. I believe this fact should be emphasized in high school, and it could be used as a motivating example to introduce more general power expansions. As a matter of fact, Newton was inspired by the algorithm of long division for numbers to start experimenting with power series, not necessarily with integer powers. The image below shows a page from his “Method of Fluxions and Infinite Series”. The “backwards” long division of a^2 by b+x is performed in order to get the power series expansion.

The differential of a quotient

Here is yet another example of “division at zero”.

When students are exposed to the differentiation rules, those are derived from the definition of the derivative as the limit of the difference quotient. Thus, for example, to prove the rule of differentiation of a product we proceed as follows.

\displaystyle{(fg)'(x)=\lim\limits_{h\to 0}\frac{(fg)(x+h)-(fg)(x)}{h}=}

\displaystyle{=\lim\limits_{h\to 0}\frac{f(x+h)g(x+h)-f(x)g(x)}{h}=\lim\limits_{h\to 0}\frac{f(x+h)g(x+h)-f(x+h)g(x)+f(x+h)g(x)-f(x)g(x)}{h}=}

\displaystyle{=\lim\limits_{h\to 0}f(x+h)\frac{g(x+h)-g(x)}{h}+\lim\limits_{h\to 0}g(x)\frac{f(x+h)-f(x)}{h}=}

\displaystyle{=f(x)g'(x)+f'(x)g(x)},

given that all the limits are assumed to exist. A similar computation can be done for the quotient. It should be noted, however, that a little algebraic trick has to be used in both cases to make the derivatives of the individual factors appear explicitly. No big deal, but a bit artificial. And, importantly, not the way the founders of Infinitesimal Calculus arrived at these rules.

To help intuition, the product rule is often presented in the form

d(uv)=(u+du)(v+dv)-uv=udv+vdu+dudv=udv+vdu,

and the last term is ignored in the last equality as being a quadratic infinitesimal (the last equality is actually an “adequality”, a term coined by Fermat). Without a doubt, the latter derivation, albeit not meeting the modern standards of rigor, reveals the reason for the presence of the “mixed” terms and the general structure of the formula. Moreover, no algebraic tricks are required. The formula follows in a straightforward manner.

A similar derivation of the quotient rule involves “division at zero”. Here is the derivation in the book “Calculus made easy” by Silvanus Thompson, from 1910.

Observe that the operation has been stopped when the remainder is a quadratic infinitesimal. The conclusion of the computation is the familiar rule

\displaystyle{d\left(\frac u{v}\right)=\frac{u+du}{v+dv}-\frac u{v}=\frac{du}v-\frac{udv}{v^2}=\frac{vdu-udv}{v^2}}.
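The same bookkeeping can be carried out symbolically, treating du and dv as multiples of a formal small parameter and keeping only the linear term (a sketch; the symbol names are ours):

```python
import sympy as sp

u, v, du, dv, eps = sp.symbols('u v du dv eps')

# Expand (u + eps*du)/(v + eps*dv) - u/v to first order in eps.
delta = (u + eps * du) / (v + eps * dv) - u / v
linear = sp.series(delta, eps, 0, 2).removeO()
print(sp.simplify(linear / eps))   # the quotient rule: (v*du - u*dv)/v**2
```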

Infinite and continuous sums in Calculus

In pre-epsilontic times, mathematicians understood definite integrals as continuous sums of infinitesimals (in the above image, a page from Leibniz where he introduces the signs d for differences and \int for sums; he uses the sign \Pi for adequalities). In the hands of the Bernoullis, Huygens, Euler, Lagrange, Legendre and many others, Infinitesimal Calculus, including the theory of differential equations as well as numerous applications to Mechanics, Optics, etc., developed for over 150 years under that conceptual framework. The situation changed during the second half of the XIX century, when set theory and a rigorous construction of the continuum were put forward by Cantor and Dedekind. Supposedly, they provided a final answer to the question of the nature of the continuum, one that has remained prevalent for over 150 years by now. Such a sparse ontology did not convince everyone, and numerous alternative models of the continuum have been developed over time, aimed at restoring the language of infinitesimals, see [1]. As for the concepts of derivative and integral, infinitesimal interpretations were replaced by epsilons and deltas at the hands of Bolzano, Cauchy and, notably, Weierstrass.

Let us recall the definitions for the sake of completeness. For a real function of a real variable f: D\to\mathbb{R}, if a is a point in the closure of D, we say that \lim_{x\to a}f(x)=L if for any \varepsilon>0 there exists some \delta=\delta(\varepsilon)>0 such that

x\in D\cap\{x:\, 0<|x-a|<\delta\}\Rightarrow |f(x)-L|<\varepsilon.

Both derivatives and integrals are limits. The derivative of f at a point a in the interior of D is the limit

\displaystyle{f'(a)=\lim_{h\to 0}\frac{f(a+h)-f(a)}{h}}

whenever the latter is finite. In the case of the definite integral over [a,b], we consider partitions \mathcal{P}=\{x_0,x_1,\dots,x_n\}, where a=x_0<x_1<x_2<\dots<x_n=b, with tags \xi_i\in [x_{i-1},x_i], and form the Riemann sums

\mathcal{R}(\mathcal{P}, \{\xi_i\})=\sum_if(\xi_i)(x_i-x_{i-1}).

If the above sums have a finite limit I as \max |x_i-x_{i-1}|\to 0 (uniformly w.r.t. the partitions and the tags), we say that f is Riemann-integrable with integral equal to I. Back to the language of epsilons and deltas, we are requiring that for any \varepsilon there exists \delta=\delta(\varepsilon) such that

\mathcal{P},\{\xi_i\}\textrm{\ \ with\ \ }\max |x_i-x_{i-1}|<\delta\Rightarrow |\mathcal{R}(\mathcal{P}, \{\xi_i\})-I|<\varepsilon.

I am not concerned here with the alternative constructions of the continuum and the non-standard approach. My main concern is the teaching of (standard) Calculus. It is my belief that the main difficulties encountered by students when faced with the above definitions are philosophical/psychological in nature, as they feel disconnected from good old Algebra and finite procedures. In order to fill the gap, I believe that the language of infinitesimals is to be preferred, especially in introductory courses. As I pointed out elsewhere, this is the language used by most users of Calculus, including Physicists and Engineers. Just like derivatives are quotients of infinitesimals, integrals are sums of infinitesimals, thus revealing a common theme. In other words, I believe that Calculus should be presented as an algebra of infinitesimals.

The main objection to this approach is the fact that infinitesimals cannot be properly defined in the realm of standard Calculus, based on the standard real line. I believe however that the pedagogical advantages stemming from its more intuitive character and its agreement with the historical evolution of the subject are worth the loss of rigor on a first acquaintance with the subject.

Let’s consider the different types of sums used in Calculus.

Adding two numbers is an operation everyone is familiar with. The (inductive) extension to a finite number of addends is trivial, thanks to associativity. Here we are in the realm of Algebra.

Things become trickier when we intend to add infinitely many numbers. We already encounter this situation in Zeno’s paradoxes: in order to travel a certain distance, say d=1, one has to travel half of the distance, then a fourth of it, then an eighth, etc. Zeno rejected the possibility of such a process made of infinitely many steps since, in his opinion, it could never be completed. Thus for Zeno and the followers of Parmenides, motion did not exist. Greek mathematics rejected completed infinite processes in general (see [2]) and therefore could not attach a numerical value to the “infinite sum”

1/2+1/4+1/8+\cdots

Standard modern mathematics, however, has embraced the idea of the continuum through the concept of real number, which originated with Simon Stevin and his decimal representations. Thanks to a hypostatic abstraction, we now declare (real) numbers to be a certain property of sequences of rational numbers. In simpler terms, we adjoin to the number system all potential “limits”. Such a construction would have abhorred Eudoxus and Euclid, but Cantor and Dedekind, along with most current mathematicians, were perfectly comfortable with it. In order to circumvent Zeno’s paradox, we start by defining the concept of infinite sum. The simplest choice, as is well known, is to consider “partial sums”

S_n=1/2+1/4+1/8+\cdots+1/2^n,

which in this case can be easily computed in closed form, S_n=1-1/2^n, and then noticing that S_n “stabilizes towards 1”, and thus has the real number 1 as its limit, according to the standard paradigm. In modern terminology, we would say that we are dealing with a convergent series, whose sum is 1.
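The stabilization is easy to watch numerically; the following few lines of Python (a toy illustration) print the first twenty partial sums next to the closed form.

```python
# Partial sums of 1/2 + 1/4 + 1/8 + ... stabilize towards 1,
# in agreement with the closed form S_n = 1 - 1/2**n.
partial = 0.0
for n in range(1, 21):
    partial += 1 / 2**n
    print(n, partial, 1 - 1 / 2**n)  # the two columns agree
```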

Several comments are in order. a) We are free to choose what we mean by an infinite sum, as the concept cannot be reduced to that of a finite sum. There are other choices (Cesàro convergence, for example); b) this mathematical construction does not solve the paradox, which belongs in the realm of metaphysics; c) in this example, the sum happens to be a rational number, but the concept of real number allows us to attach a sum to series like

1-1/2+1/3-1/4+\cdots\qquad\qquad\qquad\qquad (1)

which is the irrational number \ln 2.

A necessary condition for the series \sum a_n to have a finite sum in the above sense is that its general term converge to zero, a_n\to 0. Indeed, if S_n stabilizes, the difference a_n=S_n-S_{n-1} has to vanish asymptotically. The elementary theory of numerical series shows that this condition is not sufficient and that there are two main mechanisms of convergence: a) the general term converges to zero fast enough and b) the series contains infinitely many positive and negative terms and cancellations occur.

As an example of the first mechanism, any series of geometrically decreasing terms like the one above is convergent. Also, any series whose terms behave asymptotically as those of a convergent geometric series is convergent (this is called the ratio test). A relevant example of cancellation is given by Leibniz’s series (1) above, where the partial sums oscillate around \ln 2.
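Again, a few lines of Python (just an illustration) make the oscillation visible: the sign of S_n-\ln 2 alternates while its size shrinks.

```python
# Partial sums of Leibniz's series 1 - 1/2 + 1/3 - ... oscillate around ln 2.
import math

partial = 0.0
for n in range(1, 11):
    partial += (-1)**(n + 1) / n
    print(n, partial, partial - math.log(2))  # the error alternates in sign
```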

Summing up, for a sum over a discrete set of indices to be finite, the terms have to become small in a way that balances the number of addends, stabilizing the partial sums towards a finite limit.

Definite integrals showcase the next level of summation. In Riemann’s construction with uniform partitions, generic terms of the Riemann sums are of the order of 1/n, thus balancing the number of addends n. In contrast with series, in this case {\it all} the terms become arbitrarily small simultaneously, allowing for a “larger” number of addends, namely the continuous sum

\displaystyle{\int\limits_a^bf(x)\,dx}

over the set of indices I=[a,b]. We can think of f(x)\,dx as the infinitesimal addends making up the sum. The situation here is more delicate, since all the terms making up the Riemann sums change with every refinement of the partition, and the values of the sums may not stabilize if the function is too discontinuous. Everything works nicely for continuous functions though.

When a Physicist needs to find the total mass of a non-homogeneous bar with density \rho(x), he/she reasons as follows: choose an infinitesimal portion of the bar of length dx between two infinitesimally close points x and x+dx. The portion being infinitesimal, the density is effectively constant, equal to \rho(x), on it, and the corresponding mass is dm=\rho(x)dx. The total mass is thus the sum of the infinitesimal masses

\displaystyle{m=\int\limits_a^b\rho(x)dx}.
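A Riemann-sum rendering of this reasoning takes three lines of Python. The density below, \rho(x)=1+x^2 on [0,1], is made up for the occasion; the exact mass is 4/3.

```python
# Summing the infinitesimal masses dm = rho(x) dx for rho(x) = 1 + x**2
# on the bar [0, 1]; the exact total mass is 4/3.
n = 100_000
dx = 1.0 / n
mass = sum((1 + (i * dx)**2) * dx for i in range(n))
print(mass)  # ~1.33333, i.e. 4/3
```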

Physics and Engineering books contain plenty of such formulas to compute extensive magnitudes. Yet the way Calculus is currently taught is at odds with the above line of thought. A colleague engineer once told me “I had to relearn Calculus. The version mathematicians taught me was useless”.

The balance between the cardinality of the set of indices and the size of the terms is critical. For instance, if we consider “Riemann sums” of either form

\displaystyle{\sum\limits_i f(\xi_i)(x_i-x_{i-1})^2};\qquad \displaystyle{\sum\limits_i f(\xi_i)(x_i-x_{i-1})^{1/2}}

the corresponding limits are trivial,

\displaystyle{\int\limits_a^bf(x)\,dx^2=0};\qquad\displaystyle{\int\limits_a^bf(x)\,dx^{1/2}=\infty}\qquad\qquad(2)

for “reasonable” (say, continuous on [a,b]) functions. Indeed, in the first case the addends are too small, while they are too big in the second one: a^2\ll a\ll\sqrt{a} as a\to 0^+.
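A numerical sketch of (2), taking f\equiv 1 on [0,1] and uniform partitions, shows the imbalance immediately: the sums n\cdot dx^2=1/n collapse to zero while n\cdot dx^{1/2}=\sqrt{n} blows up.

```python
# "Riemann sums" of f = 1 on [0, 1] with weights dx**2 and dx**(1/2).
for n in (10, 100, 1000, 10_000):
    dx = 1.0 / n
    print(n, n * dx**2, n * dx**0.5)  # second column -> 0, third -> infinity
```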

We can further consider sums over a doubly continuous set of indices I=[a,b]\times[c,d]. Those are double integrals

\displaystyle{\iint\limits_{[a,b]\times[c,d]} f(x,y)\, dxdy},

where the addends are quadratic infinitesimals f(x,y)\, dxdy with respect to dx and dy. The same applies to general multiple integrals. Unbalanced Riemann sums give rise to trivial objects like

\displaystyle{\iint\limits_{[a,b]\times[c,d]}f(x,y)\,dx^2dy=0};\qquad\displaystyle{\iiint\limits_{[a,b]\times [c,d]\times [e,f]}f(x,y,z)\,dxdz=\infty},

etc.

I believe formulas like (2) provide deeper insight than all the technicalities related to the \varepsilon - \delta definition of the definite integral. They help explain why the concept of definite integral is relevant: it strikes the sweet spot where the “number of addends” (i.e. the cardinality of the set of indices) and their size balance each other, resulting in convergence to a finite, nontrivial sum.

References:

[1] “Ten misconceptions from the history of Analysis and their debunking”, P. Blaszczyk, M.G. Katz and D. Sherry, https://arxiv.org/pdf/1202.4153.pdf

[2] “Elements of the history of Mathematics”, N. Bourbaki, Springer Science & Business Media, 1998.

Geometric loci

Sets of points satisfying a certain geometric condition are ubiquitous in Mathematics. The simplest examples are straight lines, circles and more general conics. Thus a straight line can be defined as the set of points equidistant from two given points, a circle as the set of points whose distance to a fixed point (center) is constant, and a conic as the set of points such that the ratio of the distances to a given point (focus) and a given line (directrix) is constant. The type of conic depends on whether the ratio (called eccentricity) is less than, equal to or greater than one.

Straight lines and circles have been intensively studied since antiquity. They were the favorite objects of Greek geometers, and their properties are thoroughly investigated in Euclid’s Elements. In his fundamental treatise “Conics”, Apollonius of Perga, known as the “Great Geometer”, went further, tackling a systematic study of conics, establishing their focal properties, as well as those of chords and tangents, “conjugate” diameters, asymptotes, etc. It is believed that he drew heavily from previous work by Euclid as well as from Menaechmus, who is generally considered the discoverer of conic sections.

The Greeks did not stop there. For the purpose of solving construction problems not amenable to the straightedge and the compass, they introduced more sophisticated loci like conchoids and cissoids, and “kinematic” curves like the quadratrix or the Archimedean spiral.

When the method of coordinates was introduced by Fermat and Descartes in the XVII century, the sophisticated auxiliary constructions typical of synthetic geometry were replaced by more straightforward and systematic algebraic methods. The equations of the above mentioned curves were obtained right away by expressing their defining properties in the language of Algebra. For instance, a parabola is a conic with eccentricity e=1. In other words, it is the locus of points equidistant from the focus and the directrix. A two-line computation gives the equation

y^2=2px

for a parabola with focus at F(p/2,0) and directrix x=-p/2. In a similar fashion the equations for the other conics can be obtained and used to derive further properties. Conic sections correspond to quadratic equations in two variables, a fact first established by Wallis in 1655.
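Here is the “two-line computation” done by sympy (a sketch; the focus F(p/2,0) and the directrix x=-p/2 are as above): equating the squared distances to focus and directrix leaves exactly y^2-2px=0.

```python
# Equidistance from the focus F(p/2, 0) and the directrix x = -p/2.
import sympy as sp

x, y, p = sp.symbols('x y p')
dist_focus_sq = (x - p/2)**2 + y**2
dist_directrix_sq = (x + p/2)**2
print(sp.expand(dist_focus_sq - dist_directrix_sq))  # -2*p*x + y**2
```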

Yet another locus, also considered by the Greeks, is that of points such that the ratio of their distances to two given points is constant. Using coordinates, one easily arrives at the equation of a circle (or a line if the ratio is equal to one). These are the so called Apollonian circles. They appear in applications, for instance as the zero-potential line for a system of two point charges in Electrostatics.

In the previous examples, the property defining the curve could be directly translated into a finite algebraic relation between the coordinates. With the birth of Calculus, other classes of curves started to draw the attention of scholars, namely those whose defining property was more “local” in nature, in the sense that it involved the direction or some other feature of the curve at each point. In those cases, the defining property is not a finite but rather a differential equation, relating x, y, dx, dy, d^2y, etc., which in some cases can be integrated by quadratures.

The consideration of the “differential triangle” with sides dx,dy at a generic point of the sought-after curve was (and still is) a valuable tool in the derivation of the differential relations. Let’s look into some examples.

A family of equilateral hyperbolas

Consider the locus of points on the XY– plane satisfying the following property: the area of the triangle defined by the tangent, the ordinate and the subtangent is a positive constant a^2.

Let the generic point be P(x,y), let its ordinate be PQ and the subtangent QR (figure below).

Since PR is tangent to the curve, the triangle \Delta PQR is similar to the differential triangle at P. Therefore,

\displaystyle{\frac{dy}{dx}=\frac{PQ}{QR}=\frac{\pm y}{QR}}

and, consequently

\displaystyle{QR=\pm y\frac{dx}{dy}}

The given condition then reads

\displaystyle{\pm y^2 dx=2a^2dy}

or

\displaystyle{\frac{dy}{y^2}\pm\frac{dx}{2a^2}=0}

The expression on the left is a total differential,

\displaystyle{d\left(-\frac 1{y}\pm \frac{x}{2a^2}\right)=0}

giving a general solution of the form

\displaystyle{y=\frac{2a^2}{C\pm x}},

which is a family of equilateral hyperbolas with common asymptote y=0. As we move along one of these hyperbolas, the distance to the X-axis is inversely proportional to the size of the subtangent, in accordance with the constant-area property.
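As a sanity check, sympy confirms that the defining property holds identically along y=2a^2/(C+x), one member of the family (the computation is a sketch):

```python
# The triangle formed by the tangent, the ordinate and the subtangent
# has constant area a**2 along y = 2*a**2/(C + x).
import sympy as sp

x, a, C = sp.symbols('x a C', positive=True)
y = 2*a**2 / (C + x)
subtangent = y / sp.diff(y, x)       # QR = y * dx/dy
area = sp.Abs(y * subtangent) / 2    # (1/2) * ordinate * subtangent
print(sp.simplify(area))             # a**2
```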

The tractrix

The following problem was proposed by Claude Perrault in 1670, solved in 1692 by Huygens and subsequently by Leibniz, Johann Bernoulli and others.

“What is the path of an object dragged along a horizontal plane by a string of constant length when the end of the string not joined to the object moves along a straight line in the plane?”

Obviously, if the object is initially on the line along which the free end of the string moves, the path is that very line. Assume it is not. For simplicity, choose the Y-axis in the direction of the pulling motion and the X-axis through the point where the object is initially located. Let a be the initial distance from the object to the Y-axis (equal to the length of the string), so the initial position is (a,0). We look at this problem from a strictly geometric point of view, assuming that the object is a mass point that reacts instantly to the pulling force, aligning its motion with the force at all times. In other words, the goal is to find a curve whose segment of tangent between the point of tangency and the Y-axis has constant length, equal to a.

In the figure, P is a generic point on the curve, Q is the point of intersection between the tangent at P and the vertical axis and R is the foot of the perpendicular from P to the axis (so RP is the abscissa). Here, a=5.

The condition to be satisfied is |PQ|=a.

The triangle \Delta PQR and the differential triangle are similar, as before. Therefore,

\displaystyle{-\frac{dy}{dx}=\frac {|QR|}{|RP|}}

thus implying

\displaystyle{|QR|=-|RP|\frac{dy}{dx}=-x\frac{dy}{dx}}.

The condition |QP|=a, combined with Pythagoras’ theorem |QP|^2=|RP|^2+|QR|^2, then reads

\displaystyle{x^2+x^2\left(\frac{dy}{dx}\right)^2=a^2},

equivalent to two differential equations,

\displaystyle{\frac{dy}{dx}=\pm\frac{\sqrt{a^2-x^2}}{x}}.

Direct integration (say, by setting x=a\cos\theta), together with the condition y(a)=0, gives

\displaystyle{y=\pm\left(a\ln \frac{a+\sqrt{a^2-x^2}}{x} -\sqrt{a^2-x^2}\right)}

corresponding to an upper branch with negative slope (puller moving up) and a lower branch with positive slope (puller moving down). The branches meet at the initial point (a,0), which is a cusp. The vertical axis is an asymptote.
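For the skeptical reader, sympy verifies in two lines (a sketch) that the upper branch solves \frac{dy}{dx}=-\frac{\sqrt{a^2-x^2}}{x} with y(a)=0:

```python
# The displayed branch solves the tractrix equation and vanishes at x = a.
import sympy as sp

x, a = sp.symbols('x a', positive=True)
y = a*sp.log((a + sp.sqrt(a**2 - x**2))/x) - sp.sqrt(a**2 - x**2)
print(sp.simplify(sp.diff(y, x) + sp.sqrt(a**2 - x**2)/x))  # 0
print(y.subs(x, a))                                          # 0
```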

As it happens, if a tractrix is rotated about the asymptote, the obtained surface is a pseudosphere, whose Gaussian curvature is a negative constant (just like the Gaussian curvature on a sphere is a positive constant). The local geometry on a pseudosphere is hyperbolic, as shown by E. Beltrami.

Involutes

Some curves are generated from others. An involute (also called evolvent) of a curve is the locus of the tip (or any other point) on a piece of taut string as the string is either unwrapped from or wrapped around the curve. Involutes were first studied by Huygens in 1673, particularly those of a cycloid, as part of his study on isochronous pendula. There are infinitely many involutes to a given curve, depending on the point where the tip of the string detaches from it, and also depending on the direction of the wrapping/unwrapping. In the figure below, an involute to a given circle is represented in blue color. Any other involute is obtained by rotation/reflection about a line through the origin.

Let us derive the equation of the involute of a general regular curve \gamma on the plane given parametrically:

\gamma: x=x(t);\quad\quad y=y(t),

where regularity means x'(t)^2+y'(t)^2\neq 0 (if we think of the parameter t as time, the point never stops during its motion). We can easily parametrize the involute using the same parameter t. Namely, call (X(t), Y(t)) the coordinates of the point of the involute on the tangent to \gamma at (x(t),y(t)) and let s be the length of the detached portion of the string (that is, the arc length measured from the point of detachment, also called natural parameter in Differential Geometry). In the figure below, we assume that the base curve is positively oriented, so that increasing t corresponds to a counterclockwise motion along the curve. The values dx and dy represented correspond to dt>0. The string is also being unwrapped counterclockwise.

We have

X(t)=x(t)+s\cos\phi;\qquad Y(t)=y(t)+s\sin\phi,

where \phi is the angle formed by the tangent at (x(t),y(t)) with the X-axis.

On the other hand, from the differential triangle we see that

\cos\phi=-dx/ds;\qquad \qquad \sin\phi=-dy/ds;\qquad ds=\sqrt{dx^2+dy^2}.

In terms of derivatives, we conclude

\displaystyle{X(t)=x(t)-s\frac{x'(t)}{\sqrt{x'(t)^2+y'(t)^2}};\qquad Y(t)=y(t)-s\frac{y'(t)}{\sqrt{x'(t)^2+y'(t)^2}}}.

In the general case, s is obtained via integration from a given point

\displaystyle{s=\int\limits_{t_0}^tds=\int\limits_{t_0}^t\sqrt{x'(t)^2+y'(t)^2}\,dt}.

For a circle of unit radius, parametrized as x=\cos t,\, y=\sin t, clearly s=t if we choose the point (1,0) as the starting point. An application of the general formula gives

\displaystyle{X(t)=\cos t + t\sin t;\qquad Y(t)=\sin t - t\cos t}.
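A quick numerical check (numpy; just a sketch) of the taut-string picture: the distance from each involute point back to its point of detachment on the circle equals the unwound arc length s=t.

```python
# Circle involute: distance to the point of detachment equals s = t.
import numpy as np

t = np.linspace(0.1, 4*np.pi, 200)
X, Y = np.cos(t) + t*np.sin(t), np.sin(t) - t*np.cos(t)
detach = np.hypot(X - np.cos(t), Y - np.sin(t))
print(np.max(np.abs(detach - t)))  # ~0, up to floating-point error
```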

The involutes of a circle are used in the design of gear teeth, see https://www.marplesgears.com/2019/10/an-in-depth-look-at-involute-gear-tooth-profile-and-profile-shift/

Evolutes

Huygens also proved that the locus of the centers of curvature of any involute is the original curve, called the evolute. Huygens and his contemporaries defined the center of curvature as the “point of intersection of two infinitesimally close normals”. Nowadays we would say: “the center of curvature is the center of the osculating circle”. This conceptual shift clearly shows the transition from a more dynamic to a more static point of view, which runs in parallel with the abandonment of infinitesimals.

For algebraic curves, the osculating circle can be found by purely algebraic methods. Indeed, the osculating circle is the unique circle having order of tangency at least two with the curve at the given point. That was the method employed by Descartes. As Johann Bernoulli pointed out, this procedure breaks down for transcendental curves and has to be replaced by a more flexible method based on infinitesimal calculus.

I reproduce below Bernoulli’s derivation of a formula for the radius of the osculating circle (radius of curvature) as a function of x,y,dx,dy and d^2y. It is representative of the Leibnizian calculus of infinitesimals. Remarkably, the result involves second differentials.

Let ABO be a portion of a regular curve, with B and O being infinitesimally close points on it. Let the normals to the curve at B and O meet at D. We choose the origin of coordinates at A and pick the X-axis so it intersects BD and OD at points H and G. We draw a vertical auxiliary line BE and a horizontal line through B meeting OD at C, and yet another vertical through O meeting BC at F. Finally, we draw GL, perpendicular to BD. Let the coordinates of B and O be (x,y) and (x+dx, y+dy), respectively. Our goal is to compute the radius of curvature R=BD in terms of x=AE, y=EB and their differentials dx=BF and dy=OF. Due to the local nature of R, the final result will not depend on x,y directly.

First, we observe that triangles \Delta OFC, \Delta BEH, \Delta GLH are all similar (strictly or up to negligible infinitesimals) to the differential triangle \Delta BFO. From the similarity of \Delta DHG and \Delta DBC we get

\displaystyle{\frac{BD}{HD}=\frac{BC}{HG}}

Writing HD=BD-BH and solving for BD,

\displaystyle{BD=\frac{BC\cdot BH}{BC-HG}}.

Using the triangle similarities mentioned above,

\displaystyle{BC=BF+FC=BF+\frac{FO^2}{BF}=dx+\frac{dy^2}{dx}=\frac{dx^2+dy^2}{dx}} ;

\displaystyle{BH=\sqrt{BE^2+EH^2}=\sqrt{BE^2+\left(\frac{BE\cdot FO}{BF}\right)^2}=\sqrt{y^2+\left(\frac{ydy}{dx}\right)^2}=\frac{y\sqrt{dx^2+dy^2}}{dx}} ;

\displaystyle{HG=d(AH)=d(AE+EH)=d\left(x+\frac{ydy}{dx}\right)=dx+\frac{yd^2y+dy^2}{dx}}.

Putting all together,

\displaystyle{R=BD=\frac{(dx^2+dy^2)^{3/2}}{-dxd^2y}}.

Taking into account that our figure assumes d^2y<0 (otherwise the point D would be located above the curve and a few signs in the computations would change), and dividing numerator and denominator by dx^3, we obtain the familiar formula

\displaystyle{R=\frac{\left(1+y'^2\right)^{3/2}}{|y''|}}.

At points where y''=0, the osculating circle degenerates into a line and R=\infty.

If the original curve is given in parametric form, x ceases to be an independent variable and one has to modify the computation of HG above. Since d^2x\neq 0 one has

\displaystyle{d\left(x+\frac{ydy}{dx}\right)=dx+\frac{(yd^2y+dy^2)dx-ydyd^2x}{dx^2}},

leading to the formula

R(t)=\displaystyle{\frac{\left(x'^2+y'^2\right)^{3/2}}{|y''x'-x''y'|}},

where now derivatives are taken with respect to the parameter t.

Compared with a modern derivation, the one above may rightfully seem a bit clumsy and lacking a systematic approach. It is more of an art; the art of recognizing quantities that can be disregarded in the pre-limit situation. However, apart from that, it is impressive how little is actually needed to get the formula. Just similarity of triangles!

Once we have a formula for the radius of curvature, deriving the equation of the evolute of a generic curve is straightforward. We obtain the point (X(t),Y(t)) on the evolute by shifting the point (x(t),y(t)) in the direction of the normal (-dy,dx) by the amount R(t). That is,

\displaystyle{X(t)=x(t)-\frac{y'(t)R(t)}{\sqrt{x'(t)^2+y'(t)^2}}};\quad \displaystyle{Y(t)=y(t)+\frac{x'(t)R(t)}{\sqrt{x'(t)^2+y'(t)^2}}}.
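For instance, a sympy sketch (using the signed version of R, which here is positive anyway) applied to the ellipse x=a\cos t,\, y=b\sin t produces the stretched astroid listed among the examples below:

```python
# Evolute of the ellipse x = a*cos(t), y = b*sin(t) from the formulas above.
import sympy as sp

t, a, b = sp.symbols('t a b', positive=True)
x, y = a*sp.cos(t), b*sp.sin(t)
x1, y1 = sp.diff(x, t), sp.diff(y, t)
x2, y2 = sp.diff(x1, t), sp.diff(y1, t)
norm = sp.sqrt(x1**2 + y1**2)
R = norm**3 / (y2*x1 - x2*y1)   # signed radius; here y2*x1 - x2*y1 = a*b > 0
X = sp.simplify(x - y1*R/norm)
Y = sp.simplify(y + x1*R/norm)
print(X, Y)   # (a**2 - b**2)*cos(t)**3/a and -(a**2 - b**2)*sin(t)**3/b
```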

The formulas obtained make it easy to prove Huygens’ claim: a given curve is the evolute of any of its involutes. As a consequence, the evolute is the envelope of the family of normals of any of its involutes.

Involutes have cusps at the points where the string detaches from the curve. Evolutes have cusps at points corresponding to maximum/minimum curvature.

Some examples of involute/evolute pairs are: tractrix/catenary, parabola/semicubic parabola, ellipse/(stretched) astroid, logarithmic spiral/(another) logarithmic spiral, &c.

A series of videos showing several involute/evolute pairs and how they are generated can be found here https://kmr.dialectica.se/wp/research/math-rehab/learning-object-repository/geometry-2/metric-geometry/euclidean-geometry/geometry/plane-curves/evolutes/

Lagrange multipliers and their origins in Mechanics

…it is not difficult to prove, by the theory of elimination in linear equations, that one obtains the same results by simply adding to the equation of virtual velocities the various equations of condition dL=0, \, dM=0, \, dN=0, \, \&c., each multiplied by an undetermined coefficient, and then equating to zero the sum of all the terms that appear multiplied by one and the same differential… – J. L. Lagrange, “Méchanique Analytique”.

Lagrange introduced the so called “multipliers” in a very natural way to deal with constrained mechanical systems. Unfortunately, in most Calculus textbooks the method is presented in the language of vectors (unavailable to Lagrange) and illustrated with a few completely artificial examples.

In some texts, a geometric interpretation is provided in the two-dimensional case, where only one constraint can be imposed. The argument amounts to showing that at a relative extremum the level set of the objective function f cannot be “transversal” to the level set of the constraint g=0, and therefore their gradients have to be collinear. However appealing this interpretation might be, it does not correspond to the historical development of the method. It is rather an afterthought. A similar example would be motivating the Pythagorean theorem using vectors and the inner product in \mathbb{R}^2. I believe that re-interpreting mathematical facts from a higher perspective is very useful, but using those re-interpretations as motivation is misleading.

Let’s look at how Lagrange himself came up with the idea of multipliers.

A fundamental principle of Mechanics is that of virtual displacements (virtual work, d’Alembert-Lagrange principle). It reads as follows: a mechanical system is in equilibrium if the total work of applied forces is zero for any virtual displacement of the system. The term “applied forces” is used to exclude forces of constraint, {\it i.e.} those which account for constraints of the system. By definition, virtual displacements are infinitesimal displacements consistent with the constraints at a given time, see [2], [3]. Thus the main assumption in this principle is that the work of the forces of constraint is zero on the allowed displacements at any given time. Such constraints are called ideal.

A simple example is that of a block sliding along a horizontal surface. The force of constraint in this case is the normal reaction from the surface which, being perpendicular to allowed displacements, does not do work. Another example is that of a bead threaded onto a wire and forced to move along the wire. The force of constraint that keeps the bead sliding along the wire is perpendicular to the wire at each moment.

It is important to distinguish between actual displacements and virtual displacements. Even if the wire in the second example is moving and the actual displacement of the bead has a component normal to the wire, virtual displacements are still along the wire. In the terminology of Lagrangian formalism, virtual displacements are tangent vectors to the configuration space at each time, see [2], [3].

This principle has a long history, going back to the principle of the lever (Archimedes). It appeared in some form in the works of Stevin, Galileo, Wallis, Varignon and many others to tackle problems in Statics, and took its final form in the hands of Johann Bernoulli and J. d’Alembert who, in a leap of genius, generalized it to systems not necessarily in equilibrium, by including the forces of inertia. It constitutes the guiding principle in Lagrange’s monumental work “Méchanique Analytique” (1788) [1], whose first chapter contains a detailed historical account of the subject. Its main advantage compared to the Newtonian (vectorial) approach is that we do not need to know the forces of constraint, but only the constraints themselves, expressed as relations between positions, velocities, etc. The reaction forces can then be computed using Lagrange multipliers (see below).

Suppose we have a system of N particles (point masses) in space with positions (x_{i},y_{i},z_{i}), i=1,2,\dots N. Let the total force acting on the i-th particle be F_i=(P_i,Q_i,R_i). If there are no constraints on the particles, all possible displacements d\mathbf{r}_i=(dx_{i},dy_{i},dz_{i}) are admissible and, according to the principle of virtual displacements, the system is in equilibrium if and only if

\sum_i F_i\cdot d\mathbf{r}_i=0\qquad\qquad\qquad\qquad (1)

for all d\mathbf{r}_i. By freezing all of the \mathbf{r}_i's but one, we see that each of the addends in the above sum has to vanish, that is

P_idx_i+Q_idy_i+R_idz_i=0.

Given that dx_i,dy_i,dz_i are arbitrary, this in turn implies

P_i=Q_i=R_i=0\qquad\textrm{for all\ \ }i,

in accordance with Newton’s first law. It is important to note here that the conclusion follows from the fact that the virtual displacements are completely arbitrary.

What if they are not? That is, what if our system is subject to some constraints? One possibility is to add the forces responsible for the constraints, but this can actually be avoided as follows. Suppose we have an analytic description of the constraint(s) of the form

g_1(\mathbf{r}_1,\mathbf{r}_2,\dots \mathbf{r}_N)=0,\quad\dots\quad  g_m(\mathbf{r}_1,\mathbf{r}_2,\dots \mathbf{r}_N)=0,

where, for simplicity, we assume that our constraints are scleronomic, {\it i.e.} holonomic (finite) and time-independent. These constraints reduce the freedom of the virtual displacements, which now are forced to be tangent to the above surfaces in 3N-dimensional space and therefore to their (generically) (3N-m)-dimensional intersection, a manifold in \mathbb{R}^{3N}. To calculate those virtual displacements, we differentiate each one of the above equations, yielding

\left\{\begin{array}{l}\displaystyle{\frac{\partial g_1}{\partial \mathbf{r}_1}\cdot d\mathbf{r}_1+\frac{\partial g_1}{\partial \mathbf{r}_2}\cdot d\mathbf{r}_2+\dots \frac{\partial g_1}{\partial \mathbf{r}_N}\cdot d\mathbf{r}_N=0}\\ [15pt]\displaystyle{\frac{\partial g_2}{\partial \mathbf{r}_1}\cdot d\mathbf{r}_1+\frac{\partial g_2}{\partial \mathbf{r}_2}\cdot d\mathbf{r}_2+\dots \frac{\partial g_2}{\partial \mathbf{r}_N}\cdot d\mathbf{r}_N=0}\\[15pt]\cdots\cdots\cdots\cdots\end{array}\qquad\qquad\qquad\quad (2)\right.

where each addend is an inner product and

\displaystyle{\frac{\partial g_k}{\partial \mathbf{r}_i}=\left(\frac{\partial g_k}{\partial x_i},\,\frac{\partial g_k}{\partial y_i},\, \frac{\partial g_k}{\partial z_i}\right)}

are the partial gradients with respect to each particle. Given that now the d\mathbf{r}_i are not arbitrary but linked by the above relations, we cannot conclude from equation (1) that F_i=0 for all i. One possibility would be to solve the linear system (2) for m differentials in terms of 3N-m independent ones and substitute into (1). Then, the equilibrium condition would be obtained by setting the resulting coefficients of the independent differentials equal to zero.

The method of multipliers provides an elegant way to do just that. Let’s assume, for simplicity, that there is only one constraint and one particle, N=m=1. Then (1) and (2) give, respectively,

Pdx+Qdy+Rdz=0;\qquad A_xdx+A_ydy+A_zdz=0,\qquad\qquad\qquad (3),

where A_x=A_x(x,y,z)=\partial g/\partial x, etc. are the partial derivatives of some function g. At least one of A_x,A_y,A_z has to be non-zero if g=0 is a regular surface. In order to exclude one of the differentials from (3) we multiply the second equation by a constant \lambda and add it to the first. Thus at any given point we can choose \lambda in such a way that one of the quantities

P+\lambda A_x,\, Q+\lambda A_y,\, R+\lambda A_z

vanishes. But then the other two must also vanish, being the coefficients in front of the remaining independent differentials. Thus in any case, excluding \lambda from the equations

P+\lambda A_x=0,\, Q+\lambda A_y=0,\, R+\lambda A_z=0

and adjoining the constraint g(x,y,z)=0, we get the equilibria.
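To make the N=m=1 recipe concrete, here is a small sympy sketch for a made-up example: a particle with applied force (0,0,-w) constrained to the unit sphere. Solving the three multiplier equations together with g=0 returns the two expected equilibria, the lowest and the highest points.

```python
# Equilibria of a particle with applied force (0, 0, -w) on the unit sphere.
import sympy as sp

x, y, z, lam, w = sp.symbols('x y z lambda w', real=True)
g = x**2 + y**2 + z**2 - 1
P, Q, R = 0, 0, -w                      # the applied force (gravity)
eqs = [P + lam*sp.diff(g, x),
       Q + lam*sp.diff(g, y),
       R + lam*sp.diff(g, z),
       g]
print(sp.solve(eqs, [x, y, z, lam], dict=True))
# two solutions: x = y = 0 with z = -1 (lambda = -w/2) and z = 1 (lambda = w/2)
```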

Generalizing the above procedure to the case of more constraints or more particles is straightforward. We multiply each one of the equations in (2) by independent multipliers \lambda_1,\lambda_2,\dots \lambda_m and add their sum to (1). We choose the multipliers in order to remove m differentials at each point, assuming that the constraints are independent. The equations corresponding to removing those differentials and the ones resulting from equating to zero the remaining coefficients have the same form, giving the method its symmetry with respect to all variables. Finally, we adjoin the constraint equations themselves to close the system.

The number of constraints is at most 3N-1, otherwise no freedom is left for the system. For example, one particle in space can be constrained to lie on a surface or on a line (the intersection of two surfaces). Adding one more constraint would completely fix its position.

This method is used for conditional optimization. If we are to minimize/maximize a function U of, say, three variables subject to constraints, we follow the above procedure with (1) replaced by the condition dU=0. When the work of a force can be put in the form dU for some scalar function U, we say that the force is conservative or potential, and U is a potential energy. Thus, a constrained conservative system is in equilibrium at a conditional extremum of its potential energy.

It is worth mentioning though that in the application to Mechanics, the type of equilibrium (max/min/saddle) is relevant and directly related to the notion of stability. For example, an ideal pendulum has two equilibria: the lowermost position and the uppermost position which correspond to a minimum and a maximum of the gravitational potential energy. The first one is stable and the second one is unstable. In order to keep the size of this post reasonable, we refrain from going into the topic of classification of extrema, which involves the analysis of the signature of the second differential of U over the subspace defined by virtual displacements.

When used to solve dynamics problems, the principle of virtual displacements includes the “force of inertia”. Thus, along the actual motion \mathbf{r}_i(t),

\sum_i\left(-m_i\mathbf{r}_i^{''}+F_i\right)\cdot d\mathbf{r}_i=0

for all virtual displacements d\mathbf{r}_i of the system. Here m_i stands for the mass of the i-th point and \mathbf{r}_i'' for its acceleration.

As a by-product of the method of multipliers, we can find the forces of constraint (reactions). For simplicity, suppose we have one point and one constraint g(x,y,z)=0, as before. The method of multipliers gives the equations of motion

\left\{\begin{array}{l}-mx''+F_x+\lambda\frac{\partial g}{\partial x}=0\\[15pt]-my''+F_y+\lambda\frac{\partial g}{\partial y}=0\\[15pt]-mz''+F_z+\lambda\frac{\partial g}{\partial z}=0\end{array}\right.

Therefore, the reaction force is

\displaystyle{R=\lambda\left(\frac{\partial g}{\partial x},\, \frac{\partial g}{\partial y},\, \frac{\partial g}{\partial z}\right).}

A few final remarks:

1) The above method can be applied if the constraints are time-dependent (rheonomic). More precisely, if we have just one particle and a constraint of the form

g(x,y,z,t)=0

by differentiating and setting dt=0, we see that the virtual displacements dx,\, dy,\, dz satisfy the same equation as before (the second equation in (3)). In this case virtual displacements are different from real ones. Non-holonomic constraints cannot be handled by this method, except for special cases ({\it e.g.} constraints linear in the velocities). See [3].

2) Holonomic constraints can be completely eliminated by introducing independent generalized coordinates which implicitly account for them. Thus if \mathbf{q}=(q_j), j=1,2,\dots k, with k=3N-m, are generalized coordinates of the system and its kinetic energy is T=T(\mathbf{q},\mathbf{q}',t), we can write the equations of motion in the (Lagrangian) form

\displaystyle{\frac{d}{dt}\left(\frac{\partial T}{\partial q_j'}\right)-\frac{\partial T}{\partial q_j}=Q_j},

where Q_j are the generalized forces, [3]. By solving this second order system of equations we find the dynamics \mathbf{q}(t). But in some cases we are interested in the forces of constraint. One way to find them would be to substitute into Newton’s second law and solve for the unaccounted forces. However, it is generally more convenient to use dependent generalized coordinates in combination with the method of multipliers to deal with constraints and find the forces of constraint as above.

3) If all the constraints are holonomic and forces are conservative with potential energy U, the principle of virtual displacements, which for one particle takes the form

\displaystyle{\left(m\mathbf{r}^{''}+\frac{\partial U}{\partial \mathbf{r}} \right)\cdot d\mathbf{r}=0};\qquad\qquad\qquad d\mathbf{r} - \textrm{virtual displacement}

is precisely the condition of (conditional) stationarity for the action functional

\displaystyle{S[\mathbf{q}]=\int\limits_{t_1}^{t_2}L(\mathbf{q}',\mathbf{q},t)\,dt},

where \mathbf{q} are, as before, generalized coordinates on the (3-m)-dimensional manifold M defined by the constraints, T=T(\mathbf{q}',\mathbf{q},t) is the kinetic energy and L=T-U is the Lagrangian of the system. The true motion \mathbf{q}(t) is an extremal of S among curves contained in M if only variations within M are allowed, [2]. The corresponding Euler-Lagrange equations furnish the equations of motion

\displaystyle{\frac{d}{dt}\left(\frac{\partial L}{\partial q_j'}\right)-\frac{\partial L}{\partial q_j}=0}.

Generalization to N>1 particles is straightforward.

References:

[1] J. L. Lagrange, “Mécanique Analytique“, Paris, Ve Courcier, 1811-15.

[2] V.I. Arnold, “Mathematical Methods of Classical Mechanics”, Graduate Texts in Mathematics, Vol. 60, 2nd Edition, 2010.

[3] H. Goldstein, Ch. Poole, J. Safko, “Classical Mechanics”, Addison Wesley, Third Edition, 2001.

Differentials in action

Before we move on to concrete examples, let us quickly review some basic differentiation rules. In order to differentiate equations of the form

F(u,v,w,\dots)=0

we need to replace all the variables u,v,w,\dots by their perturbed values u+du,v+dv,w+dw,\dots and disregard infinitesimals which are of order higher than one with respect to du, dv,dw, \dots in the resulting expression.

For example, if F(u,v)=Au+Bv for some constants A,B, we have

F(u+du,v+dv)=A(u+du)+B(v+dv)=Adu+Bdv,

since Au+Bv=0. In this case, the expression is linear in du and dv and, therefore

d(Au+Bv)=Adu+Bdv.

To differentiate an equation involving a product, F(u,v)=uv, we notice that

F(u+du,v+dv)=uv+udv+vdu+dudv

and, since uv=0, we get

d(uv)=udv+vdu.

Let us next find dF for F(u,v)=\frac {u}{v}:

\displaystyle{F(u+du,v+dv)=\frac{u+du}{v+dv}=\frac{u}{v}\frac{1+\frac{du}{u}}{1+\frac{dv}{v}}}

Now, we observe that

\displaystyle{\left(1+\frac{dv}{v}\right)\left(1-\frac{dv}{v}\right)=1-\frac{dv^2}{v^2}},

Hence, up to linear terms,

\displaystyle{\frac 1{1+\frac{dv}{v}}=1-\frac{dv}{v}}

and therefore

\displaystyle{F(u+du,v+dv)=\frac{u}{v}\left(1+\frac{du}{u}\right)\left(1-\frac{dv}{v}\right)=\frac{u}{v}\left(1+\frac{du}{u}-\frac{dv}{v}\right)},

where we have disregarded the quadratic term \frac{dudv}{uv}. Finally, subtracting the unperturbed value F(u,v)=\frac{u}{v} (which vanishes along the equation F=0), we get

\displaystyle{dF=\frac {vdu-udv}{v^2}}.

Successive applications of the product rule give the important power rule:

d(u^2)=udu+udu=2udu;\qquad d(u^3)=ud(u^2)+u^2du=3u^2du

and, in general, d(u^n)=nu^{n-1}du. For negative integer powers the same rule applies, as follows from the quotient and power rules. Combining the previous, we can differentiate any equation involving polynomial or rational functions.
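The inductive pattern is easy to see symbolically; here is a sympy sketch expanding (u+du)^n-u^n and extracting the part linear in du.

```python
# Expanding (u + du)**n - u**n and keeping the linear part in du
# recovers d(u**n) = n*u**(n-1)*du.
import sympy as sp

u, du = sp.symbols('u du')
for n in range(2, 6):
    increment = sp.expand((u + du)**n - u**n)
    print(n, increment.coeff(du, 1))  # prints n*u**(n-1)
```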

What about irrational/transcendental functions? As for n-th roots, we have

\displaystyle{d(u^{1/n})=\frac 1{n}u^{1/n-1}\,du}

by just rearranging the power rule: writing w=u^{1/n}, we have u=w^n, so du=nw^{n-1}dw, and solving for dw gives the formula above. Therefore, the power rule extends to rational powers.

Let’s move now to the fun part.

Reasoning with infinitesimals

Suppose we want to compute d(\sin x). Modern textbooks focus on the derivative d(\sin x)/dx. The computation goes more or less like this.

\displaystyle{\frac{d(\sin x)}{dx}=\lim_{h\to 0}\frac{\sin(x+h)-\sin x}{h}=\lim_{h\to 0}\cos (x+h/2)\frac{\sin h/2}{h/2}}=\cos x,

where the trigonometric identity for the difference of sines has been used, as well as the fact that \sin t/t\to 1 as t\to 0. All this is nice and clean but does not provide compelling intuition. Next, we provide a proof based on infinitesimals. No computation needed. It is essentially a picture-based proof.

In the figure below, the unit circle is represented, and we consider a generic angle x being increased by an infinitesimal amount dx.

The corresponding change of \sin x is d(\sin x)=EB-AD=EB-CE=BC. The triangles \Delta OEB and \Delta ABC are similar since they are right and \angle BOE=\angle ABC. Therefore,

\displaystyle{\frac{d(\sin x)}{dx}=\frac{BC}{AB}=\frac{OE}{OB}=\cos x}.

From the perspective of rigor, the above argument is flawed, for many reasons. The “triangle” \Delta ABC is not really a rectilinear triangle, OE is actually \cos(x+dx), etc. BUT it provides a visual, compelling reason as to why d(\sin x)=\cos x\,dx, and the conclusion is absolutely correct. It is correct because we know that the vanishing circular arc AB is indistinguishable from its tangent, while \cos(x+dx) is indistinguishable from \cos x.

Admittedly, this type of argument relies on the ability to recognize which approximations will become exact in the limit. We believe this is a skill worth developing.

Here is another example mentioned in the beautiful book [1] and attributed there to Newton. Let us show that d(\tan\theta)=\sec^2\theta\, d\theta. In the figure below, |AB|=1 and \angle ABC is right. Then \tan\theta=|BD|, and we write L=|AD|=\sec\theta. We increase the angle \theta by an infinitesimal amount d\theta. The tangent is then increased by d\tan\theta=|CD|. Let DE be the arc of a circle of radius L centered at A, so that |ED|=L\,d\theta.

The triangles \Delta ABC and \Delta EDC are similar and thus

\displaystyle{\frac{|CD|}{|ED|}=\frac{|AC|}{|AB|}}

or

\displaystyle{\frac{d(\tan\theta)}{Ld\theta}=L}

(because |AC| differs from L by an infinitesimal) and finally

\displaystyle{\frac{d(\tan\theta)}{d\theta}=L^2=\sec^2\theta}.

Solving problems using differentials

Let’s start with a simple optimization problem which is more elegantly solved using differentials.

Consider a sliding segment of length L, with endpoints on the positive X and Y semi-axes. What is the location of the segment for which the area of the triangle that it determines on the first quadrant is maximal?

The standard approach to the problem is to express the area of the triangle as a function of some parameter (the horizontal projection of the segment, some angle, etc.) and find the value of the parameter that makes the derivative equal to zero. A more “symmetric” approach would be as follows. Let x\ge 0 and y\ge 0 be the lengths of the legs of the triangle. Then,

A=\frac{xy}{2};\qquad x^2+y^2=L^2.

Differentiating both relations,

2\,dA=xdy+ydx;\qquad xdx+ydy=0.

A necessary condition for A to reach a local maximum is dA=0. Thus we get a system of equations

xdy+ydx=0;\qquad xdx+ydy=0.

For the above homogeneous linear system to have a non-trivial solution (so that infinitesimal changes (dx,dy) are allowed) we need the determinant to vanish: y^2-x^2=0, which in our case implies x=y; then, from our second finite relation, we get x=y=\sqrt{2}L/2.
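The determinant condition, together with the finite relation, can be handed to sympy verbatim (a sketch with the same symbols):

```python
# Nontrivial (dx, dy) requires the determinant of the system to vanish.
import sympy as sp

x, y, L = sp.symbols('x y L', positive=True)
M = sp.Matrix([[y, x],    # coefficients of (dx, dy) in x*dy + y*dx = 0
               [x, y]])   # coefficients of (dx, dy) in x*dx + y*dy = 0
sol = sp.solve([M.det(), x**2 + y**2 - L**2], [x, y], dict=True)
print(sol)  # [{x: sqrt(2)*L/2, y: sqrt(2)*L/2}]
```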

If we were to follow the approach based on derivatives, we would express y=\sqrt{L^2-x^2} and substitute, leading to the problem of maximizing the function of x

A(x)=x\sqrt{L^2-x^2}/2.

We would then find A'(x) and set it equal to zero. In this simple example there is no big difference, but notice that with our approach: a) no “independent variable” is singled out; b) we avoid the annoyance of taking the derivative of a square root; c) L does not appear explicitly throughout the manipulation with differentials, but only at the beginning and at the end, when we appeal to finite relations (and this is, in principle, unavoidable given the nature of differential relations).

A similar example is presented in [2]. There, the classical problem of minimizing the length of a piece-wise straight path connecting two given points with a given line (the so-called Heron’s problem) is solved.

The given quantities (see Fig. above) are d=a+b, C and D and the quantity to minimize is l=p+q. This problem involves more parameters and, no matter what quantity you choose as independent, the solution using derivatives is a bit messy. Using differentials, it reduces to a compatibility condition for a homogeneous linear system as above. Namely, we have

p^2=C^2+a^2;\qquad q^2=D^2+b^2;\qquad a+b=d;\qquad p+q=l.

Differentiating all the equations gives

da+db=0;\qquad 2ada=2pdp;\qquad 2bdb=2qdq;\qquad dp+dq=dl

The condition of extremum is dl=0. Replacing da and db in terms of dp and dq we end up with a 2\times 2 system for dp,\, dq:

\displaystyle{\frac{p}{a}dp+\frac{q}{b}dq=0;\qquad dp+dq=0}

The determinant should be zero, that is a/p=b/q. This in turn implies

\displaystyle{\frac{b^2}{a^2}=\frac{q^2}{p^2}=\frac{b^2+D^2}{a^2+C^2}}

and, consequently (cross-multiplying and cancelling a^2b^2), b/a=D/C. Finally,

\displaystyle{a=\frac{Cd}{C+D};\qquad b=\frac{Dd}{C+D}}.

Observe in particular that a/C=b/D so the path should “reflect” on the line.
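A brute-force check with made-up data (C=3, D=1, d=8, for which the formula predicts a=6) confirms the closed form:

```python
# Minimizing l(a) = sqrt(C**2 + a**2) + sqrt(D**2 + (d - a)**2) on a grid.
import math

C, D, d = 3.0, 1.0, 8.0                 # arbitrary test data

def path_length(a):
    return math.hypot(C, a) + math.hypot(D, d - a)

grid = [i * d / 100_000 for i in range(100_001)]
print(min(grid, key=path_length), C * d / (C + D))  # both 6.0
```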

In both examples above, we solved a conditional optimization problem. A possible approach is to use the method of Lagrange multipliers. Thus in our case we are trying to minimize

f(p,q)=p+q

under the constraint

a+b=g(p,q)=\sqrt{p^2-C^2}+\sqrt{q^2-D^2}=d.

The main advantage of that method is that it keeps the symmetry between the variables, at the expense of introducing a number of multipliers. In our next post, we will discuss how the method was introduced by Lagrange to deal with constraints in mechanical systems. Not surprisingly, infinitesimals were at the heart of his analysis, in the form of virtual displacements.

References:

[1] T. Needham, “Visual Complex Analysis”,  Oxford University Press, 1999.

[2] T. Dray, Corinne A. Manogue, “Putting Differentials Back into Calculus”, The College Mathematics Journal 42 (2), 2010.

Viewpoints on differentials

…many mathematicians think in terms of infinitesimal quantities: apparently, however, real mathematicians would never allow themselves to write down such thinking, at least not in front of the children. —Bill McCallum 

The concepts of differentiable function and differential introduced in a previous post were related to the expansion

y(x+dx)-y(x)=A(x)dx+o(dx)=dy+o(dx),

valid as dx\to 0. During the XVII and XVIII centuries, mathematicians thought of dx and dy as actual infinitesimals “à la Leibniz”. After the advent of the rigorous concept of limit, the point of view changed. The above relation is now mostly used to introduce the derivative A(x), which plays a central role, and both dx and dy are deprived of their infinitesimal nature. Namely, dx is understood as an arbitrary, finite, increment of the “independent” variable whereas dy is a function of (x,dx), linear in its second argument. dy is thus a functional differential, and the relation dy/dx=A(x) is true by definition. This point of view stresses the notion of derivative, rendering differentials as superfluous, some sort of annoyance of the past, used only in the context of linear approximation, see [1].

However, in Engineering and Physics practice, differentials come first and are perceived as actual infinitesimals, while derivatives are obtained as ratios. This seems more natural, the concept of ratio being psychologically more complex than that of infinitesimal. But that is not the only advantage. Indeed, as we will show on examples, reasoning with differentials as infinitesimals leads to shorter and clearer proofs of propositions and solutions to many problems. One should concede, however, that the notion of derivative is easier to put on a firm basis in the language of limits, while infinitesimals are logically problematic.

A more symmetric point of view, where no distinction between “independent” and “dependent” variables is made, is very fruitful. As an example, suppose the quantities x and y are linked by the relation

xy=5.\qquad\qquad\qquad\qquad (1)

If we increase x and y by \Delta x (respectively \Delta y) in a way consistent with the above relation,

(x+\Delta x)(y+\Delta y)=5,

we have

(x+\Delta x)(y+\Delta y)-xy=0,\qquad\textrm{or}\qquad x\Delta y+y\Delta x+\Delta x\Delta y=0.

The latter is a condition on the finite increments \Delta x and \Delta y to be “compatible” with (1). By considering infinitesimal increments dx and dy instead, we can “filter” the above relation by keeping only the terms which are linear in dx and dy:

x dy+ydx=0,\qquad\qquad\qquad\qquad (2)

where the quadratically small term dxdy is dropped. This is a general principle in Differential Calculus: in any computation involving infinitesimals, one should keep the leading ones (i.e. the ones of lowest order), dropping the higher order ones. This principle is followed by physicists, engineers and other users of differential calculus, and deemed as intuitively clear. It can be ultimately justified using the language of limits.

Thus, (2) is the differential relation between x,y,dx,dy corresponding to the “finite” relation (1). We derived (2) from (1). This time we are not differentiating a function, but rather an equation.

The opposite process is called “integration” or “solving a differential equation”. Of course one cannot completely recover the finite relation from the differential one: the constant “5” in (1) could be replaced by any other constant without altering the resulting differential relation.

In line with a more “algebraic” point of view, differential equations in modern textbooks are written in the form

\displaystyle{G(x,y,\frac{dy}{dx})=0},

where G is a general function of three variables and the unknown is a function y=y(x). This approach carries some unpleasant consequences. Consider, for example, the relation x^2+y^2=1; differentiating it exactly as we differentiated (1) gives the symmetric relation xdx+ydy=0, which in modern notation may be presented as

\displaystyle{\frac{dy}{dx}=-\frac{x}{y}}\qquad\qquad\qquad\qquad (3)

and solutions are sought as functional relations y = f(x) on some interval. Thus, the function y=\sqrt{1-x^2} on the interval (-1,1) is a solution of (3). The function y=-\sqrt{1-x^2} on the same interval is also a solution. But it would definitely be much nicer to be able to say that the full circle x^2+y^2=1 is a solution of the symmetric differential equation xdx+ydy=0, as well as any other circle centered at the origin. In some textbooks, they deal with the issue by saying that the equation (3) actually presupposes the simultaneous consideration of

\displaystyle{\frac{dx}{dy}=-\frac{y}{x}}\qquad\qquad\qquad\qquad (3')

where we look for solutions x=x(y). Then, the full set of solutions (to be precise, non-prolongable ones) is made of right, left, upper and lower half-circles. This is a consequence of insisting on functional relations and finite rates, where the variables play an asymmetric role. This was never an issue for Leibniz, Huygens, the Bernoullis, L’Hôpital or Euler. Much less for Newton, whose independent variable was always time, an extrinsic parameter. Equations in “differential form” like (2) are still presented in many texts, especially those for Engineers. The solution is then presented in the natural implicit form, apparently oblivious of their initial definition of solutions as functional relations.

Back to the above procedure, in order to get (2) from (1) it is assumed that both dx and dy are infinitesimals, and (1) is satisfied up to infinitesimals of order higher than dx and dy. This process is called linearization. More generally, given an implicit relation between the variables u,v,w,\dots

F(u,v,w,\dots)=0

we replace u,v,w,\dots by u+du,v+dv,w+dw,\dots on the left hand side and impose F(u+du,v+dv,w+dw,\dots)=0 up to infinitesimals of order higher than du, dv,dw,\dots (quadratic, cubic, etc.). That leads to a differential relation

A(u,v,w,\dots)du+B(u,v,w,\dots)dv+C(u,v,w,\dots)dw+\dots=0,\qquad\qquad\qquad\qquad (4)

valid “along” F=0 (see below).

In order to reconcile the above linearization process with our previous concept of functional differential, one can consider a new quantity z defined by

z=F(u,v, w,\dots)

and proceed as in the case of one independent variable, i.e. expanding the change of z as a sum of powers of independent increments du, dv,dw,\dots of u,v,w\dots respectively. For the sake of simplicity, assume there are two independent variables, u and v. Then, if \Delta z can be written in the form

z(u+du,v+dv)-z(u,v)=A(u,v)du+B(u,v)dv+\textrm{\ higher order terms}

(where the higher order terms contain higher powers of du and dv, products like dudv etc.) we say that z is a differentiable function of its arguments, and call the linear part

dz=dF(u,v; du, dv)=A(u,v)du+B(u,v)dv

the differential of z (or F). The coefficients A(u,v) and B(u,v) are called partial derivatives of z with respect to u (respectively, v), denoted as

\displaystyle{\frac{\partial z}{\partial u}}=A(u,v);\qquad\qquad \displaystyle{\frac{\partial z}{\partial v}}=B(u,v)

Summing up, when differentiating an equation F(u,v,w,\dots)=0, we differentiate the function on the left hand side as if the variables were independent. The obtained differential relation, however, only holds if du, dv,\dots are compatible with F(u,v,\dots)=0 to first order. Next, we present a nice geometric interpretation of the notion of compatibility.

The point of view of Differential Geometry

There is a very nice interpretation of the linearization process in the language of Differential Geometry. We can think of our relation F(u,v,w,\dots)=0 as a (hyper)surface in the space of the variables (u,v,w,\dots), contained in the domain of F. Then,

dF=A(u,v,w,\dots)du+B(u,v,w,\dots)dv+\dots

is an (exact) differential form, defined on the domain of F. At each point in this domain, it is a linear function of the differentials du, dv,dw,\dots. For example, the above relation xy=5 defines a hyperbola in the xy-plane, and the corresponding differential form is dF=xdy+ydx.

Then, the relation dF=0 holds along F=0 or, more generally, along any level set of F. More precisely, it holds when the differentials du, dv,\dots are the components of a vector, tangent to the hypersurface F=0. Those familiar with Multivariable Calculus will recognize the condition dF=0 as the requirement for (du,dv,dw,\dots) to be perpendicular to the gradient vector \nabla F(u,v,w,\dots) which is in turn perpendicular to the level set F=0. This is the geometric meaning of “differential increment being compatible with the given relation”. In the figure below, F(x,y)=x^2+2xy+3y^2-10.

\qquad\qquad\qquad \mathbf{dF(A,dr)=0;\,\,\, dr=(dx,dy)}
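The compatibility can be checked symbolically for the figure’s function (a sympy sketch): starting at a point A on the level set and moving along a vector that annihilates dF, the value of F changes only to second order.

```python
# Along a vector in the kernel of dF, F changes only to second order.
import sympy as sp

x, y, t = sp.symbols('x y t')
F = x**2 + 2*x*y + 3*y**2 - 10
Fx, Fy = sp.diff(F, x), sp.diff(F, y)

x0 = sp.solve(F.subs(y, 1), x)[0]     # a point A on F = 0 with y = 1
dxv = -Fy.subs({x: x0, y: 1})         # (dxv, dyv) is proportional to (-Fy, Fx),
dyv = Fx.subs({x: x0, y: 1})          # hence perpendicular to the gradient

moved = F.subs({x: x0 + t*dxv, y: 1 + t*dyv})
print(sp.expand(moved))               # no constant and no linear term in t
```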

More generally, if F is a scalar function defined on a manifold, the tangent space to the submanifold F=0 is the kernel of the differential form dF. Solving an equation in “differential form” is precisely finding a submanifold whose tangent space is the kernel of the given form.

There is no trace of infinitesimal quantities in the above linearization procedure. After all, there is no restriction on the size of the vector (du, dv); it just needs to be tangent to the hyper-surface F=0. However, among Physicists, Engineers and other practitioners of Calculus, it is common to assume that the differentials of the involved variables are actual infinitesimals, and higher order terms are dropped by virtue of their relative smallness. Formally, both approaches lead to the same result, but the latter can often be used as a computational shortcut and is more intuitive.

Yet another advantage of differential relations like (4) is that if the variables (u,v,w,\dots) depend on further variables r, s,t,\dots, we obtain a valid differential relation between the new variables by just considering du, dv,dw,\dots as functional differentials of the new independent variables and substituting accordingly. This formal invariance of the first differential was already pointed out by Leibniz and is very useful in applications. From the point of view of derivatives, it is nothing but the chain rule. From an abstract point of view, it is a consequence of linearity, and thus does not extend to higher order differentials.

In forthcoming posts, we will present some examples of the use of differentials when thought of as infinitesimals, and some applications of the above calculus with differentials to Physics, to the solution of optimization problems, related rates problems, etc.

References:

[1] “Putting Differentials Back into Calculus”, T. Dray, Corinne A. Manogue, The College Mathematics Journal 42 (2), 2010.