The Lipschitz condition and uniqueness for ODEs

Fig.1. (L) The Lipschitz condition for a function f. (R) Rudolf Lipschitz (1832-1903)

A classic condition on the right-hand side of a first-order normal ODE

\displaystyle{\frac{dx}{dt}=f(t,x)}

guaranteeing local uniqueness of solutions with given initial value x(t_0)=x_0 is Lipschitz continuity of f in the x variable in some open, connected neighborhood U of (t_0,x_0) where f is defined. That is, it is required that there exist some constant L>0 such that

|f(t,x_1)-f(t,x_2)|\le L|x_1-x_2|,\qquad\qquad\qquad \textrm{(LC)}

for all (t,x_1) and (t,x_2) in U. Under the above condition there is a unique local solution of the initial value problem

\left\{\begin{array}{l}\displaystyle{ \frac{dx}{dt}=f(t,x)} ;\\[10pt] x(t_0)=x_0 \end{array}\right.\qquad\qquad\qquad \textrm{(IVP)}

where uniqueness means that two prospective solutions defined on open intervals containing t_0 coincide on their intersection. We assume that our solutions are classical, i.e., continuously differentiable. Such solutions exist when extra conditions on f are imposed. For instance, if f is assumed continuous in U, classical local solutions exist and can be extended up to the boundary of U. But here our concern is uniqueness.

My goal here is to explain in simple terms why such a condition implies uniqueness, how it can be generalized, and the relation between uniqueness and another interesting phenomenon, namely finite-time blow-up of solutions.

To illustrate the idea, we will assume that t_0=0 and x_0=0. We will also assume that f(t,0)=0 for every t and, therefore, one solution of the above IVP is x(t)\equiv 0. The general case reduces to this one, as we explain later.

We focus on forward uniqueness. Namely, we will prove that a solution x(t) with x(t_1)=x_1\neq 0 for some t_1>0 can never be zero at t=0. We will assume x_1>0, as the case x_1<0 can be handled in a similar way. Backwards uniqueness also follows easily from the forward result.

Uniqueness is violated if, as t decreases from t_1 to zero, x(t) vanishes at some point \bar t, 0\le \bar t<t_1, while x(t)>0 for \bar t<t\le t_1. By the Lipschitz condition,

|f(t,x(t))|=|f(t,x(t))-f(t,0)|\le L x(t)

while x(t)>0. It then follows from the ODE that

\displaystyle{\frac{dx}{Lx}\le dt}

along the solution for t\in (\bar t,t_1). Integrating this inequality over [t,t_1],

\displaystyle{t_1-t\ge\frac{1}{L}\int\limits_x^{x_1}\frac{ds}{s}}\qquad\qquad\qquad\textrm{(II)}

This integral inequality is the crux of the argument. Just observe that, since the improper integral is divergent, x(t) approaching zero would require t\to -\infty, contradicting the assumption x(t)\to 0 as t\to \bar t^+. The Lipschitz condition prevents x(t) from becoming zero over finite t-intervals.

The argument above is equivalent to the following: for any solution x(t) with x(t_1)=x_1>0 for some t_1>0 one can construct an exponential barrier from below of the form y(t)=Ce^{Lt} for some C>0, that is, x(t)\ge Ce^{Lt} for all t<t_1 in its domain, thus preventing x(t) from becoming zero. Indeed, the inequality

\displaystyle{\frac{dx}{dt}}\le Lx

implies that our solution x(t) is increasing at a slower pace than the solution y(t) of the IVP

\left\{\begin{array}{l}\displaystyle{ \frac{dy}{dt}=Ly} ;\\[10pt] y(t_1)=x_1 \end{array}\right.

which is nothing but y(t)=Ce^{Lt} for the appropriate C>0. Therefore, y(t)\le x(t) to the left of t_1. In other words, y(t) acts as a lower barrier for x(t), as in the figure below.

Fig 2. The exponential barrier prevents the solution through P from reaching the t-axis.
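To see the barrier argument in action, here is a small numerical sketch of my own (the choice f(t,x)=\sin x is illustrative): \sin x is Lipschitz in x with constant L=1 and vanishes at x=0, so a solution with x(1)=1, integrated backward with a hand-rolled RK4 stepper, should stay above the exponential barrier y(t)=e^{t-1}.

```python
import math

def rk4(f, t0, x0, t1, n=2000):
    """Integrate dx/dt = f(t, x) from t0 to t1 with n RK4 steps."""
    h = (t1 - t0) / n
    t, x = t0, x0
    path = [(t, x)]
    for _ in range(n):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
        path.append((t, x))
    return path

# f(t, x) = sin(x): Lipschitz in x with constant L = 1, and f(t, 0) = 0.
# Integrate backward from the data x(1) = 1 and compare with the
# exponential barrier y(t) = x1 * exp(L * (t - t1)) = exp(t - 1).
path = rk4(lambda t, x: math.sin(x), 1.0, 1.0, 0.0)
barrier_ok = all(x >= math.exp(t - 1.0) - 1e-9 for t, x in path)
x_at_0 = path[-1][1]
```

The computed solution stays above the barrier all the way back to t=0, so it never approaches the t-axis, exactly as the comparison argument predicts.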

If we assume that x_1<0, we use the inequality (again a consequence of \textrm{(LC)})

\displaystyle{\frac{dx}{dt}}\ge Lx

for x<0 to conclude that y(t)=De^{Lt} with D<0 is a barrier from above for our solution, preventing it from reaching the t-axis.

The integral inequality \textrm{(II)} suggests a very natural generalization. Indeed, all we need is a diverging (at zero) improper integral on the right-hand side. We can replace the Lipschitz condition by the existence of a modulus of continuity, that is a continuous function \Phi: [0,\infty)\to [0,\infty) with \Phi(0)=0, \Phi (u)>0 for u>0 satisfying

|f(t,x_1)-f(t,x_2)|\le \Phi(|x_1-x_2|)

in U, with the additional property

\displaystyle{\int\limits_{0^+}\frac {ds}{\Phi(s)}=\infty}.

This more general statement is due to W. Osgood. The Lipschitz condition corresponds to the choice \Phi(s)=s. The proof is identical to the one above, given that the only property we need is the divergence of the improper integral of 1/\Phi at 0^+.

Thus, for an alternative solution to branch out from the trivial one, we require a non-Lipschitz right-hand side in \textrm{(IVP)} that leads to a convergent improper integral. This condition is satisfied, for instance, in the autonomous problem

\left\{\begin{array}{l}\displaystyle{ \frac{dx}{dt}=x^{2/3}} ;\\[10pt] x(0)=0 \end{array}\right.

which, apart from the trivial solution, has solutions of the form x(t)=0 on [0,c) and x(t)=(t-c)^3/27 for t\ge c for any c>0.

There is nothing special about the power 2/3 in this example. Any power \alpha with 0<\alpha<1 would do. These examples of non-uniqueness are usually attributed to G. Peano.
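The non-uniqueness can be checked by direct computation. A short script (illustrative, with c=1) verifies that the glued branch satisfies the ODE away from the gluing point and has the same initial value as the trivial solution:

```python
# Two distinct solutions of x' = x^(2/3), x(0) = 0: the trivial one and,
# for any c > 0, the branch x(t) = (t - c)^3 / 27 glued to zero on [0, c).
def x_branch(t, c):
    return 0.0 if t < c else (t - c) ** 3 / 27.0

def rhs(x):
    return x ** (2.0 / 3.0)

c = 1.0

# Check the ODE via a central difference at sample points t > c.
def residual(t, h=1e-5):
    dxdt = (x_branch(t + h, c) - x_branch(t - h, c)) / (2 * h)
    return abs(dxdt - rhs(x_branch(t, c)))

max_res = max(residual(t) for t in [1.5, 2.0, 3.0, 5.0])
```

Both solutions vanish at t=0, yet they disagree for t>c: uniqueness fails exactly because the improper integral of s^{-2/3} converges at 0^+.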

Uniqueness for general solutions can be easily reduced to the special case above. Namely, if \hat x(t) and x_1(t) are local solutions of \textrm{(IVP)} (say on [t_0,t_1)), then \bar x(s)=\hat x(t_0+s)-x_1(t_0+s) is a local solution of

\left\{\begin{array}{l}\displaystyle{ \frac{d\bar x}{ds}=g(s,\bar x):=f(s+t_0, \bar x+x_1(s+t_0))-f(s+t_0,  x_1(s+t_0))} ;\\[10pt] \bar x(0)=0 \end{array}\right.

on [0,t_1-t_0) with g(s,0)= f(s+t_0, x_1(s+t_0))-f(s+t_0, x_1(s+t_0))=0. Moreover, g satisfies a Lipschitz condition near (s,\bar x)=(0,0) if f does near (t_0,x_0), with the same constant L. By the particular result above, \bar x(s)\equiv 0 and hence \hat x(t)\equiv x_1(t) on the corresponding intervals.

Remarks: a) A simple and widely used sufficient condition for \textrm{(LC)} to hold is the continuity of the partial derivative \displaystyle{\frac{\partial f}{\partial x}} in an x-convex region U (typically a rectangle). This follows from a straightforward application of Lagrange’s mean value theorem; b) \textrm{(LC)} is not necessary for uniqueness, as the example of \textrm{(IVP)} with f(t,x)=x^{\alpha}\sin\frac 1{x} with \alpha\in (0,2] shows; c) The Lipschitz condition is relevant in other areas of Analysis. For instance, it guarantees the uniform convergence of Fourier series.

A related phenomenon: blow up in finite time.

Local solutions issued from (t_0,x_0) can be extended to the boundary of U, but not necessarily in the t-direction. The reason is a fast (superlinear) growth of f(t,x) as x\to\pm\infty, assuming that the domain of definition of f extends indefinitely in the x-direction. A simple example is the problem

\left\{\begin{array}{l}\displaystyle{ \frac{dx}{dt}=x^2} ;\\[10pt] x(t_0)=x_0>0 \end{array}\right.,

whose explicit solution \displaystyle{x(t)=\frac{x_0}{1-x_0(t-t_0)}} “blows up” as t\to (t_0+1/x_0)^-, despite the fact that f(t,x)=x^2 is smooth on the whole plane. The role of superlinear growth at infinity is similar to the role of the Lipschitz (or Osgood) condition in bounded regions for uniqueness. The above problem is equivalent to

\displaystyle{\int\limits_{x_0}^x\frac{ds}{s^2}=t-t_0}.

Convergence of the improper integral \displaystyle{\int\limits^{\infty}\frac{dx}{x^2}} prevents t from attaining arbitrarily large values. Calling \displaystyle{T=t_0+\int\limits_{x_0}^{+\infty}\frac{dx}{x^2}=t_0+1/x_0}, we have \lim_{t\to T^-}x(t)=+\infty. This phenomenon is called finite time blow-up and is exhibited by ODEs with superlinear right-hand-sides, by some evolution PDEs with superlinear sources, etc.
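A quick sanity check of the explicit solution and its blow-up, with the illustrative values t_0=0 and x_0=1 (so T=1):

```python
# The explicit solution x(t) = x0 / (1 - x0 * (t - t0)) of x' = x^2 and
# its blow-up time T = t0 + 1/x0; here t0 = 0, x0 = 1, so T = 1.
t0, x0 = 0.0, 1.0
T = t0 + 1.0 / x0

def x(t):
    return x0 / (1.0 - x0 * (t - t0))

# Verify the ODE with a central difference, away from the blow-up time.
def residual(t, h=1e-6):
    return abs((x(t + h) - x(t - h)) / (2 * h) - x(t) ** 2)

growth = x(T - 1e-8)   # the solution is already huge just before T
```

The residual of the ODE is negligible on [0,T), while the solution exceeds 10^6 well before reaching T, illustrating that the smoothness of x^2 is no protection against blow-up.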

The same reasoning applies in the general case when there exists a continuous \Psi>0 such that f(t,x)>\Psi(x) as x\to\infty (resp. f(t,x)<-\Psi(x) as x\to-\infty), provided

\displaystyle{\int\limits^{\infty}\frac{ds}{\Psi(s)}<\infty,\quad\textrm{resp.}\quad\int\limits_{-\infty}\frac{ds}{\Psi(s)}<\infty}.

This time, under assumptions guaranteeing existence and uniqueness of solutions and provided the first condition above holds, the (forward) solution to \textrm{(IVP)} with x(t_0)=a>0 stays to the left of the solution of

\left\{\begin{array}{l}\displaystyle{ \frac{dx}{dt}=\Psi(x)} \\[10pt] x(t_0)=x_0 \end{array}\right.

with 0<x_0<a. The latter blows up in finite time, namely at time \displaystyle{T=t_0+\int\limits_{x_0}^{\infty}\frac{ds}{\Psi(s)}}, forcing the solution to our IVP to blow up at some T'\le T.

Geometric loci

Sets of points satisfying a certain geometric condition are ubiquitous in Mathematics. The simplest examples are straight lines, circles and more general conics. Thus a straight line can be defined as the set of points equidistant from two given points, a circle as the set of points whose distance to a fixed point (the center) is constant, and a conic as the set of points such that the ratio of the distances to a given point (focus) and a given line (directrix) is constant. The type of conic depends on whether the ratio (called the eccentricity) is less than, equal to or greater than one.

Straight lines and circles were intensively studied since antiquity. They were the favorite objects of Greek geometers, and their properties are thoroughly investigated in Euclid’s Elements. In his fundamental treatise “Conics”, Apollonius of Perga, known as the “Great Geometer”, went further, tackling a systematic study of conics, establishing their focal properties, as well as those of chords and tangents, “conjugate” diameters, asymptotes, etc. It is believed that he heavily drew from previous work by Euclid as well as from Menaechmus, who is generally considered the discoverer of conic sections.

Greeks did not stop there. For the purpose of solving construction problems not amenable to the straightedge and the compass, they introduced more sophisticated loci like conchoids and cissoids, and “kinematic” curves like the quadratrix or the Archimedean spiral.

When the method of coordinates was introduced by Fermat and Descartes in the XVII century, the sophisticated auxiliary constructions typical of synthetic geometry were replaced by more straightforward and systematic algebraic methods. The equations of the above-mentioned curves were obtained right away by expressing their defining properties in the language of Algebra. For instance, a parabola is a conic with eccentricity e=1. In other words, it is the locus of points equidistant from the focus and the directrix. A two-line computation gives the equation

y^2=2px

for a parabola with focus at F(p/2,0) and directrix x=-p/2. In a similar fashion the equations for the other conics can be obtained and used to derive further properties. Conic sections correspond to quadratic equations in two variables, a fact first established by Wallis in 1655.
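The focus-directrix property can be checked numerically; here is a short sketch using the normalization with focus (p/2,0) and directrix x=-p/2, which is the one consistent with y^2=2px (the value p=3 is an arbitrary illustrative choice):

```python
import math

# Focus-directrix check for y^2 = 2px: with focus F = (p/2, 0) and
# directrix x = -p/2, points of the parabola are equidistant from both.
p = 3.0

def on_parabola(y):
    return (y * y / (2 * p), y)   # x recovered from y^2 = 2px

def focal_dist(pt):
    x, y = pt
    return math.hypot(x - p / 2, y)

def directrix_dist(pt):
    x, _ = pt
    return x + p / 2              # horizontal distance to x = -p/2

gap = max(abs(focal_dist(on_parabola(y)) - directrix_dist(on_parabola(y)))
          for y in [-4.0, -1.0, 0.0, 2.5, 7.0])
```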

Yet another locus, also considered by the Greeks, is that of points such that the ratio of their distances to two given points is constant. Using coordinates, one easily arrives at the equation of a circle (or a line if the ratio is equal to one). These are the so called Apollonian circles. They appear in applications, for instance as the zero-potential line for a system of two point charges in Electrostatics.
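For concreteness, here is a sketch with A=(-1,0), B=(1,0) and ratio k=2 (illustrative values of mine). Completing the square in the distance equation gives a circle with center ((k^2+1)/(k^2-1),0) and radius 2k/|k^2-1|, and the prescribed ratio can be verified at sampled points of that circle:

```python
import math

# Apollonian circle for |PA| / |PB| = k with A = (-1, 0), B = (1, 0):
# expanding |PA|^2 = k^2 |PB|^2 and completing the square yields a circle
# with center ((k^2 + 1)/(k^2 - 1), 0) and radius 2k / |k^2 - 1|.
A, B = (-1.0, 0.0), (1.0, 0.0)
k = 2.0
cx = (k * k + 1) / (k * k - 1)
r = 2 * k / abs(k * k - 1)

def ratio(px, py):
    return math.hypot(px - A[0], py - A[1]) / math.hypot(px - B[0], py - B[1])

# Every point of that circle realizes the prescribed ratio.
worst = max(abs(ratio(cx + r * math.cos(t), r * math.sin(t)) - k)
            for t in [0.1 * i for i in range(63)])
```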

In the previous examples, the property defining the curve could be directly translated into a finite algebraic relation between the coordinates. With the birth of Calculus other classes of curves started to draw the attention of scholars, namely those whose defining property was more “local” in nature, in the sense that it involved the direction or some other feature of the curve at each point. In those cases, the defining property is not a finite, but rather a differential equation, relating x, y, dx, dy, d^2y etc. which can be integrated in quadratures in some cases.

The consideration of the “differential triangle” with sides dx, dy at a generic point of the sought-after curve was (and still is) a valuable tool in the derivation of the differential relations. Let’s look into some examples.

A family of equilateral hyperbolas

Consider the locus of points in the XY-plane satisfying the following property: the area of the triangle defined by the tangent, the ordinate and the subtangent is a positive constant a^2.

Let the generic point be P(x,y), let its ordinate be PQ and let the subtangent be QR (figure below).

Since PR is tangent to the curve, the triangle \Delta PQR is similar to the differential triangle at P. Therefore,

\displaystyle{\frac{dy}{dx}=\frac{PQ}{QR}=\frac{\pm y}{QR}}

and, consequently

\displaystyle{QR=\pm y\frac{dx}{dy}}

The given condition then reads

\displaystyle{\pm y^2 dx=2a^2dy}

or

\displaystyle{\frac{dy}{y^2}\pm\frac{dx}{2a^2}=0}

The expression on the left is a total differential,

\displaystyle{d\left(-\frac 1{y}\pm \frac{x}{2a^2}\right)=0}

giving a general solution of the form

\displaystyle{y=\frac{2a^2}{C\pm x}},

which is a family of equilateral hyperbolas with common asymptote y=0. As we move along one of these hyperbolas, the ordinate (the distance to the X-axis) is inversely proportional to the length of the subtangent.
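The constant-area property can be checked numerically on the family just obtained; the values a=1.5 and C=0.7 below are illustrative, and the derivative is approximated by a central difference:

```python
# For y = 2a^2 / (C + x), the triangle formed by the tangent, the
# ordinate and the subtangent should have constant area a^2.
a, C = 1.5, 0.7

def y(x):
    return 2 * a * a / (C + x)

def triangle_area(x, h=1e-6):
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    subtangent = abs(y(x) / dydx)          # QR = |y dx/dy|
    return 0.5 * abs(y(x)) * subtangent    # (1/2) * ordinate * subtangent

areas = [triangle_area(x) for x in [0.0, 1.0, 2.0, 5.0]]
```

All sampled areas agree with a^2 = 2.25, as the derivation requires.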

The tractrix

The following problem was proposed by Claude Perrault in 1670, solved in 1692 by Huygens and subsequently by Leibniz, Johann Bernoulli and others.

“What is the path of an object dragged along a horizontal plane by a string of constant length when the end of the string not joined to the object moves along a straight line in the plane?”

Obviously, if the object is initially on the line of the force, the path is just a line. Assume it is not. For simplicity, choose the Y-axis in the direction of the force and the X-axis containing the point where the object is initially located. Let a be the initial distance from the object to the Y-axis (equal to the length of the string) so the initial position is (a,0). We look at this problem from a strictly geometric point of view, assuming that the object is a mass point that reacts instantly to the pulling force, aligning its motion with the force at all times. In other words, the goal is to find a curve whose segment of tangent between the point of tangency and the Y-axis has constant length, equal to a.

In the figure, P is a generic point on the curve, Q is the point of intersection between the tangent at P and the vertical axis and R is the foot of the perpendicular from P to the axis (so RP is the abscissa). Here, a=5.

The condition to be satisfied is |PQ|=a.

The triangle \Delta PQR and the differential triangle are similar, as before. Therefore,

\displaystyle{-\frac{dy}{dx}=\frac {|QR|}{|RP|}}

thus implying

\displaystyle{|QR|=-|RP|\frac{dy}{dx}=-x\frac{dy}{dx}}.

The condition |QP|^2=|RP|^2+|QR|^2 then reads

\displaystyle{x^2+x^2\left(\frac{dy}{dx}\right)^2=a^2},

equivalent to two differential equations,

\displaystyle{\frac{dy}{dx}=\pm\frac{\sqrt{a^2-x^2}}{x}}.

Direct integration (say, by setting x=a\cos\theta), together with the condition y(a)=0, gives

\displaystyle{y=\pm\left(a\ln \frac{a+\sqrt{a^2-x^2}}{x} -\sqrt{a^2-x^2}\right)}

corresponding to an upper branch with negative slope (puller moving up) and a lower branch with positive slope (puller moving down). The branches meet at the initial point (a,0), which is a cusp. The vertical axis is an asymptote.
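The defining property |PQ|=a can be verified numerically on the derived branch (here a=5, as in the figure; the squared slope makes the sign choice irrelevant):

```python
import math

# For the tractrix branch y = a*ln((a + sqrt(a^2 - x^2))/x) - sqrt(a^2 - x^2),
# the tangent segment from the point of tangency to the Y-axis has length a.
a = 5.0

def y(x):
    s = math.sqrt(a * a - x * x)
    return a * math.log((a + s) / x) - s

def tangent_length(x, h=1e-6):
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    # |PQ|^2 = |RP|^2 + |QR|^2 = x^2 * (1 + (dy/dx)^2)
    return abs(x) * math.sqrt(1 + dydx * dydx)

lengths = [tangent_length(x) for x in [0.5, 1.0, 2.5, 4.0]]
```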

As it happens, if a tractrix is rotated about the asymptote, the obtained surface is a pseudosphere, whose Gaussian curvature is a negative constant (just like the Gaussian curvature on a sphere is a positive constant). The local geometry on a pseudosphere is hyperbolic, as shown by E. Beltrami.

Involutes

Some curves are generated from others. An involute (also called evolvent) of a curve is the locus of the tip (or any other point) on a piece of taut string as the string is either unwrapped from or wrapped around the curve. Involutes were first studied by Huygens in 1673, particularly those of a cycloid, as part of his study on isochronous pendula. There are infinitely many involutes to a given curve, depending on the point where the tip of the string detaches from it, and also depending on the direction of the wrapping/unwrapping. In the figure below, an involute to a given circle is represented in blue color. Any other involute is obtained by rotation/reflection about a line through the origin.

Let us derive the equation of the involute of a general regular curve \gamma on the plane given parametrically:

\gamma: x=x(t);\quad\quad y=y(t),

where regularity means x'(t)^2+y'(t)^2\neq 0 (if we think of the parameter t as time, the point never stops during its motion). We can easily parametrize the involute using the same parameter t. Namely, call (X(t), Y(t)) the coordinates of the point of the involute on the tangent to \gamma at (x(t),y(t)) and let s be the length of the detached portion of the string (that is, the arc length measured from the point of detachment, also called the natural parameter in Differential Geometry). In the figure below, we assume that the base curve is positively oriented, so the increase of the parameter t corresponds to a counterclockwise motion along the curve. The values dx and dy represented correspond to dt>0. The string is also being unwrapped counterclockwise.

We have

X(t)=x(t)+s\cos\phi;\qquad Y(t)=y(t)+s\sin\phi,

where \phi is the angle formed by the tangent at (x(t),y(t)) with the X-axis.

On the other hand, from the differential triangle we see that

\cos\phi=-dx/ds;\qquad \qquad \sin\phi=-dy/ds;\qquad ds=\sqrt{dx^2+dy^2}.

In terms of derivatives, we conclude

\displaystyle{X(t)=x(t)-s\frac{x'(t)}{\sqrt{x'(t)^2+y'(t)^2}};\qquad Y(t)=y(t)-s\frac{y'(t)}{\sqrt{x'(t)^2+y'(t)^2}}}.

In the general case, s is obtained via integration from a given point

\displaystyle{s=\int\limits_{t_0}^t\sqrt{x'(\tau)^2+y'(\tau)^2}\,d\tau}.

For a circle of unit radius, parametrized as x=\cos t,\, y=\sin t, clearly s=t if we choose the point (1,0) as the starting point. An application of the general formula gives

\displaystyle{X(t)=\cos t + t\sin t;\qquad Y(t)=\sin t - t\cos t}.
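Two features characterize the involute point: it lies on the tangent line to the circle at (\cos t,\sin t), and its distance to the point of tangency equals the unwound arc length t. A short check of both:

```python
import math

# Involute of the unit circle: X = cos t + t sin t, Y = sin t - t cos t.
def involute(t):
    return (math.cos(t) + t * math.sin(t), math.sin(t) - t * math.cos(t))

def checks(t):
    X, Y = involute(t)
    px, py = math.cos(t), math.sin(t)       # point of tangency
    tx, ty = -math.sin(t), math.cos(t)      # tangent direction there
    chord = (X - px, Y - py)
    # the involute point lies on the tangent line: cross product vanishes
    cross = abs(chord[0] * ty - chord[1] * tx)
    # its distance to the point of tangency is the unwound length t
    dist = math.hypot(*chord)
    return cross, dist

results = [checks(t) for t in [0.5, 1.0, 2.0, 4.0]]
```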

The involutes of a circle are used in the design of gear teeth, see https://www.marplesgears.com/2019/10/an-in-depth-look-at-involute-gear-tooth-profile-and-profile-shift/

Evolutes

Huygens also proved that the locus of the centers of curvature of any involute is the original curve, called the evolute. Huygens and his contemporaries defined the center of curvature as the “point of intersection of two infinitesimally close normals”. Nowadays we would say: “the center of curvature is the center of the osculating circle”. This conceptual shift clearly shows the transition from a more dynamic to a more static point of view which runs in parallel with the abandonment of infinitesimals.

For algebraic curves, the osculating circle can be found by purely algebraic methods. Indeed, the osculating circle is the unique circle having order of tangency at least two with the curve at the given point. That was the method employed by Descartes. As Johann Bernoulli pointed out, this procedure breaks down for transcendental curves and has to be replaced by a more flexible method based on infinitesimal calculus.

I reproduce below Bernoulli’s derivation of a formula for the radius of the osculating circle (radius of curvature) as a function of x,y,dx,dy and d^2y. It is representative of the Leibnizian calculus of infinitesimals. Remarkably, the result involves second differentials.

Let ABO be a portion of a regular curve, with B and O infinitesimally close points on it. Let the normals to the curve at B and O meet at D. We choose the origin of coordinates at A and pick the X-axis so it intersects BD and OD at points H and G. We draw a vertical auxiliary line BE and a horizontal line through B meeting OD at C, and yet another vertical through O meeting BC at F. Finally, we draw GL, perpendicular to BD. Let the coordinates of B be (x,y) and those of O be (x+dx, y+dy). Our goal is to compute the radius of curvature R=BD in terms of x=AE, y=EB and their differentials dx=BF and dy=OF. Due to the local nature of R, the final result will not depend on x and y directly.

First, we observe that triangles \Delta OFC, \Delta BEH, \Delta GLH are all similar (strictly or up to negligible infinitesimals) to the differential triangle \Delta BFO. From the similarity of \Delta DHG and \Delta DBC we get

\displaystyle{\frac{BD}{HD}=\frac{BC}{HG}}

Writing HD=BD-BH and solving for BD,

\displaystyle{BD=\frac{BC\cdot BH}{BC-HG}}.

Using the triangle similarities mentioned above,

\displaystyle{BC=BF+FC=BF+\frac{FO^2}{BF}=dx+\frac{dy^2}{dx}=\frac{dx^2+dy^2}{dx}} ;

\displaystyle{BH=\sqrt{BE^2+EH^2}=\sqrt{BE^2+\left(\frac{BE\cdot FO}{BF}\right)^2}=\sqrt{y^2+\left(\frac{ydy}{dx}\right)^2}=\frac{y\sqrt{dx^2+dy^2}}{dx}} ;

\displaystyle{HG=dAH=d(AE+EH)=d\left(x+\frac{ydy}{dx}\right)=dx+\frac{yd^2y+dy^2}{dx}}.

Putting all together,

\displaystyle{R=BD=\frac{(dx^2+dy^2)^{3/2}}{-dxd^2y}}.

Taking into account that our figure assumes d^2y<0 (otherwise the point D would be located above the curve and a few signs in the computations would change), and dividing throughout by dx^3 we obtain the familiar formula

\displaystyle{R=\frac{\left(1+y'^2\right)^{3/2}}{|y''|}}.

At points where y''=0, the osculating circle degenerates into a line and R=\infty.
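As a sanity check of the formula, the radius of curvature computed from finite-difference derivatives on the circle x^2+(y-1)^2=1 (lower half, written as a graph) should be identically 1:

```python
import math

# Check R = (1 + y'^2)^(3/2) / |y''| on the lower half of the circle
# x^2 + (y - 1)^2 = 1, whose radius of curvature is 1 everywhere.
def y(x):
    return 1.0 - math.sqrt(1.0 - x * x)

def radius_of_curvature(x, h=1e-4):
    d1 = (y(x + h) - y(x - h)) / (2 * h)              # central first difference
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)   # central second difference
    return (1 + d1 * d1) ** 1.5 / abs(d2)

radii = [radius_of_curvature(x) for x in [-0.5, 0.0, 0.3, 0.6]]
```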

If the original curve is given in parametric form, x ceases to be an independent variable and one has to modify the computation of HG above. Since d^2x\neq 0 one has

\displaystyle{d\left(x+\frac{ydy}{dx}\right)=dx+\frac{(yd^2y+dy^2)dx-ydyd^2x}{dx^2}},

leading to the formula

R(t)=\displaystyle{\frac{\left(x'^2+y'^2\right)^{3/2}}{|y''x'-x''y'|}},

where now derivatives are taken with respect to the parameter t.

Compared with a modern derivation, the one above may rightfully seem a bit clumsy and lacking a systematic approach. It is more of an art; the art of recognizing quantities that can be disregarded in the pre-limit situation. However, apart from that, it is impressive how little is actually needed to get the formula. Just similarity of triangles!

Once we have a formula for the radius of curvature, deriving the equation of the evolute of a generic curve is straightforward. We obtain the point (X(t),Y(t)) on the evolute by shifting the point (x(t),y(t)) in the direction of the normal (-dy,dx) by the amount R(t). That is,

\displaystyle{X(t)=x(t)-\frac{y'(t)R(t)}{\sqrt{x'(t)^2+y'(t)^2}}};\quad \displaystyle{Y(t)=y(t)+\frac{x'(t)R(t)}{\sqrt{x'(t)^2+y'(t)^2}}}.

The formulas obtained make it easy to prove Huygens’ claim: a given curve is the evolute of any of its involutes. As a consequence, the evolute is the envelope of the family of normals of any of its involutes.

Involutes have cusps at the points where the string detaches from the curve. Evolutes have cusps at points corresponding to maximum/minimum curvature.

Some examples of involute/evolute pairs are: tractrix/catenary, parabola/semicubic parabola, ellipse/(stretched) astroid, logarithmic spiral/(another) logarithmic spiral, &c.
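The parabola/semicubic parabola pair can be verified with the parametric formulas above. For the parabola x=t, y=t^2 (a normalization of my own), the evolute comes out as X=-4t^3, Y=3t^2+1/2, which satisfies the semicubical relation X^2=(16/27)(Y-1/2)^3:

```python
# Evolute of the parabola x = t, y = t^2, using the shift along the
# normal by R(t) = (x'^2 + y'^2)^(3/2) / |y''x' - x''y'| from the post
# (with the sign convention matching this orientation).
def evolute(t):
    x, y = t, t * t
    dx, dy = 1.0, 2.0 * t          # first derivatives
    ddx, ddy = 0.0, 2.0            # second derivatives
    denom = ddy * dx - ddx * dy    # here identically 2
    speed2 = dx * dx + dy * dy
    X = x - dy * speed2 / denom
    Y = y + dx * speed2 / denom
    return X, Y

# The evolute should trace the semicubical parabola X^2 = (16/27)(Y - 1/2)^3.
errors = [abs(evolute(t)[0] ** 2 - (16.0 / 27.0) * (evolute(t)[1] - 0.5) ** 3)
          for t in [-1.5, -0.5, 0.2, 1.0, 2.0]]
```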

A series of videos showing several involute/evolute pairs and how they are generated can be found here https://kmr.dialectica.se/wp/research/math-rehab/learning-object-repository/geometry-2/metric-geometry/euclidean-geometry/geometry/plane-curves/evolutes/

What is a tangent?

In simple instances, tangency can be easily characterized. For example, a straight line is tangent to a circle precisely when they share a unique point. More generally, a curve which is the smooth boundary of a convex set has a unique tangent at each point, which can be characterized as the supporting line of the convex set.

But in more general cases such description is inadequate. For example, how do we characterize the tangent to the graph of a polynomial like y=x^3-6x at the point (1,-5)? The epigraph is not convex, so the description as a supporting line is not available. Also, it clearly intersects the graph at some other point. Even worse, at the point (0,0) the tangent actually crosses the graph, so the latter is not contained in either one of the two half-planes defined by the tangent, even locally. For polynomials, an algebraic definition of tangency in terms of the multiplicity of the point of tangency as a root to a polynomial equation is available, but in more general situations an analytic description is needed.

Finding tangents, along with finding areas, is one of the geometric problems that gave impetus to the use of infinitesimals and the eventual creation of Calculus. Predecessors of Newton and Leibniz, notably Fermat and Descartes, devised methods for finding tangents. For example, according to Fermat’s method, to find the tangent to the graph below at D(x,y), he would choose a point E(x+h,y(x+h)) on the graph, very close to D, and would argue that the triangles \Delta BAD and \Delta BCE were “almost” similar. He then wrote an “adequality” (approximate proportion)

\displaystyle{\frac{|EC|}{|AD|}=\frac{|BC|}{|BA|}}

or, calling s=|AB| (the subtangent)

\displaystyle{\frac{f(x+h)}{f(x)}=\frac{s+h}{s}}.

This, in turn, can be written as

\displaystyle{\frac{f(x+h)-f(x)}{h}=\frac{f(x)}{s}}.

Expanding and simplifying the left-hand side and setting h=0 (a predecessor of “finding the derivative”) gives f(x)=ms, m being the slope of the tangent. When y is a polynomial function of x, this program can be easily carried out.
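Fermat's recipe can be replayed on the cubic y=x^3-6x at the point (1,-5) mentioned further below (the worked numbers are mine): expanding (f(x+h)-f(x))/h and discarding h gives the slope m, and the subtangent is then s=f(x)/m.

```python
# Fermat's method on f(x) = x^3 - 6x at x = 1 (the point (1, -5)):
# the adequality (f(x+h) - f(x))/h = f(x)/s gives, on setting h = 0,
# the subtangent s = f(x)/m with m the slope of the tangent.
def f(x):
    return x ** 3 - 6 * x

x = 1.0
# (f(x+h) - f(x))/h expands to 3x^2 - 6 + 3xh + h^2; setting h = 0:
m = 3 * x * x - 6            # slope of the tangent, here -3
s = f(x) / m                 # subtangent, here (-5)/(-3) = 5/3

# numerical cross-check of the slope via a small increment
h = 1e-7
m_num = (f(x + h) - f(x)) / h
```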

These ideas were further refined by Roberval, Wallis, Barrow, Newton and Leibniz. The latter defined the tangent as the line through a pair of infinitely close points on the curve.

The modern analytic characterization is the following: the tangent line at a point of a curve has to be such that, as we move along the tangent towards the point of intersection with the curve, the distance to the graph in any “transversal” direction decreases to zero faster than the distance “along” the tangent to the point of intersection. That is, the curve is “transversally” closer to the tangent near the given point. In a sense, the neglect of the transversal separation between the triangles ABD and CBE is the key to Fermat’s construction. The following figure clarifies the situation further.

The line g is tangent to the graph at A if

\displaystyle{|BC|=o(|AB|)}

when |AB| is infinitesimal. The transversal direction chosen is irrelevant. Had we chosen the direction BD in the above figure, we would still have \displaystyle{|BD|=o(|AB|)}, since as B approaches A, the curve and the tangent become “parallel” and the ratio |BD|/|BC| stabilizes to some positive constant.

Notice that, for a non-tangent, transversal line through A, the ratio |BC|/|AB| is not infinitesimal, but rather stabilizes to a positive value depending on the final non-zero angle between the curve and the chosen non-tangent line.

It follows from our definition that the tangent is unique (if there is one). It is also worth noting that the concept of tangency belongs to Euclidean, as well as to affine geometries: tangency does not break down under rigid motions, translations and more general affine transformations (shear, dilations, etc.)

It is natural to quantify the “amount” of tangency by the order of the infinitesimal |BC| with respect to |AB|. Namely, we will say that the order of tangency is k\ge 0 if

\displaystyle{|BC|=O(|AB|^{k+1})}

So, in particular, the order of tangency is zero if the line is transversal to the curve, it is one if |BC| is a quadratic infinitesimal, etc. The order of tangency need not be an integer.
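The order of tangency can also be estimated numerically: on a log-log scale the exponent of |BC| with respect to |AB| is k+1. A sketch of mine for y=x^3 and its tangent y=0 at the origin, where the order of tangency is k=2:

```python
import math

# For y = x^3 with tangent y = 0 at the origin, the vertical gap is
# |BC| = |x|^3 while |AB| is comparable to |x|, so the log-log exponent
# log|BC| / log|AB| should be k + 1 = 3, i.e. order of tangency k = 2.
def vertical_gap(x):
    return abs(x ** 3 - 0.0)     # distance to the tangent line y = 0

exponents = [math.log(vertical_gap(x)) / math.log(abs(x))
             for x in [1e-2, 1e-3, 1e-4]]
```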

Tangents and differentiability

The existence of the tangent to the curve y=y(x) at a point (x_0,y_0) is intimately related to the existence of the first differential dy(x_0,dx). To see why, we start by noticing that if the curve y=y(x) has a non-vertical tangent at (x_0,y_0) (see figure below), the vertical distance between the graph and the tangent is also infinitesimal with respect to |AE|, because |AE| and |AB| are proportional, |AE|=|AB|\cos\alpha. Therefore

\displaystyle{\frac{|BC|}{|AE|}=\frac 1{\cos\alpha}\frac{|BC|}{|AB|}\to 0}

when |AB| is infinitesimal. But |AE|=|x-x_0|, where x is the abscissa of the moving point B.

If we assume that the straight line y=y_0+m(x-x_0) is tangent at (x_0,y_0), we should have

|BC|=y(x)-[y_0+m(x-x_0)]=o(|x-x_0|)

or

y(x_0+dx)-y(x_0)=mdx+o(dx).

But this is precisely our definition of differentiable function at x_0, with dy(x_0,dx)=mdx. Therefore, the existence of a tangent with slope m implies differentiability with dy/dx=m. The value of the derivative is the slope of the tangent.

If the order of tangency is k\ge 2, that is if

y(x_0+dx)-y(x_0)=mdx+o(dx^k),

it follows from the definition that d^ry(x_0,dx)=0 for 2\le r\le k.

As an example, let’s examine the order of tangency between a circle and its tangent at some point. For simplicity, consider the circle

x^2+(y-1)^2=1

with unit radius, center (0,1) and tangent to the x-axis at the origin O, see the figure below.

We have |OA|=\sin\theta, |AB|=1-\cos\theta. Therefore,

\displaystyle{\frac{|AB|}{|OA|}=\frac{1-\cos\theta}{\sin\theta}=\frac{\sin\theta}{1+\cos\theta}\to 0}

as \theta\to 0. Hence |AB|=o(|OA|). Actually, |AB| is an infinitesimal of the second order with respect to |OA|. Indeed,

\displaystyle{\frac{|AB|}{|OA|^2}=\frac{1-\cos\theta}{\sin^2\theta}=\frac{1}{1+\cos\theta}\to \frac 1{2}}\qquad\qquad (*)

as \theta\to 0. Thus |AB|=O(|OA|^2) and the order of tangency is k=1.

The value of the limit (*) would be different for circles of different radii. Indeed, if our circle were x^2+(y-R)^2=R^2, since the numerator and denominator in (*) scale differently, the limit would be 1/(2R).

The usual (linear) angle between a circle and its tangent at one point is zero. But our previous analysis allows us to quantify the separation between the circle and the tangent on an infinitesimal quadratic scale. Observe that when R is close to zero, 1/(2R) is very large, whereas when R is very large and the circle is “very close” to its tangent, 1/(2R) is close to zero.
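A numerical look at this limit (the choice R=2 is illustrative, so the quadratic ratio should approach 1/(2R)=0.25 from above):

```python
import math

# For a circle of radius R tangent to the x-axis at the origin, the
# quadratic ratio |AB| / |OA|^2 = R(1 - cos t) / (R sin t)^2 tends to 1/(2R).
def quad_ratio(R, t):
    return R * (1 - math.cos(t)) / (R * math.sin(t)) ** 2

R = 2.0
vals = [quad_ratio(R, t) for t in [0.1, 0.01, 0.001]]
```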

Angles like the above, whose usual measure is zero but whose extent can be quantified as before, have been known since antiquity.

Horn angles

A horn angle is the angle formed between a circle and its tangent or, more generally, between two tangent circles at their point of tangency. Given two circles internally tangent, it is natural to define the measure of the horn angle between them as the difference of the angles they form with their common tangent. If they are externally tangent, we take the sum instead, as in the figure below.

Thus, \angle (c,d)=1/R_1-1/R_2 in the first case, \angle (e,f)=1/R'_1+1/R'_2 in the second. The factor “1/2” is common and omitted for simplicity. For a circle of radius R, the quantity 1/R is called its (scalar) curvature. The measure of a horn angle between two tangent circles is the difference/sum of their curvatures.

Horn angles are mentioned by Euclid in Book III (Prop. 16) of the “Elements”, and were known to Archimedes and Eudoxus. Euclid states that a horn angle is “smaller than any acute angle”. That property made horn angles problematic to Ancient Greek mathematicians, since they always assumed that any two “homogeneous” magnitudes (lengths, angles, areas, ..) were comparable, in the sense that anthyphairesis (as we would say today, the Euclidean algorithm) could be applied to them. At the end of the process, commensurable magnitudes could be assigned numbers with respect to some unit. Incommensurable magnitudes could not, but the situation could be handled by means of Eudoxus’ theory of proportions, a predecessor of Dedekind’s theory of real numbers. But the notion of a non-zero angle smaller than any acute angle was out of grasp. In modern terminology, we would say that the Archimedean property of segments, angles, areas was a basic assumption in Greek geometry.

We have been able to define a measure for horn angles using the concept of order of infinitesimals. However, to restore the Greeks’ assumption on the possibility to compare (albeit “in the limit”) any pair of angles, actual infinitesimal angles of different orders need to be included in the picture. This is the content of non-Archimedean Geometry, based on the construction of non-Archimedean fields (surreal, hyperreal numbers) in Nonstandard Analysis. The development of these ideas is a very interesting chapter of Analysis, well deserving a separate post, or even a separate thread.