Transcendental numbers? Show me a nice one!

The structure of the real line is deceptively intricate. Rational and irrational numbers are densely interwoven, yet they differ fundamentally—not just in their arithmetic properties, but in how they can be approximated. Even the rationals themselves possess a rich, hierarchical organization, elegantly captured by the Stern-Brocot tree.

If you take a course on Real Analysis where these matters are examined in some detail, or an Algebra course where extensions of the rational field \mathbb{Q} are considered (say, a course on Galois theory), you encounter the concepts of algebraic and transcendental numbers. Rational numbers can be thought of as the ones solving a linear equation a_1x+a_0=0 with integer coefficients, a_1\neq 0. A natural generalization leads us to the definition of algebraic numbers as real numbers satisfying a polynomial equation of the form a_nx^n+a_{n-1}x^{n-1}+\dots +a_0=0 with integer coefficients a_0,a_1,\dots, a_n. The degree of an algebraic number is the lowest degree of a polynomial with integer coefficients having the given number as a root. Such a polynomial has to be irreducible in \mathbb{Z}[x]; otherwise the number would be a root of a polynomial of lower degree. Real numbers which are not algebraic are called transcendental. Clearly, all rational numbers are algebraic, and all transcendental numbers are irrational.

The idea of the existence of transcendental numbers goes back to Leibniz, but the first to prove it, by giving a concrete example, was J. Liouville in papers from 1844 and 1851, [1] & [2]. When transcendental numbers are presented, the examples provided usually include \pi and e, while the Euler-Mascheroni constant \gamma is mentioned as a candidate (it is not even known whether \gamma is irrational; Apéry’s constant \zeta(3) is known to be irrational, but not known to be transcendental). Proving that e and \pi are transcendental was accomplished by Hermite and Lindemann in the second half of the nineteenth century, and their proofs are far from elementary and can hardly be motivated or put in simple terms.

In contrast, Liouville’s constant has a very simple structure and the proof of its transcendence is completely elementary. It is defined as the number

\displaystyle{L=\sum_{k=1}^{\infty}\frac 1{10^{k!}}},

that is, the number 0.11000100000000000000000100000\dots where the “ones” appear at the positions given by the factorials, 1,2,6,24,\dots, after the decimal point. Its decimal expansion being non-periodic, L is clearly irrational.

Here is the reason why L is transcendental: it can be approximated “too well” by the partial sums of the series, contradicting a result proved by Liouville, whose (very simple) proof is given below. The statement of the result, which we accept for the time being, is as follows.

Theorem [1]: Let \alpha be an irrational algebraic number of degree d\ge 2 (i.e., \alpha is a root of an irreducible polynomial of degree d with integer coefficients). Then there exists a constant C(\alpha)>0 such that for all rational numbers \frac{p}{q} (with p,q\in\mathbb{Z},\, q>0) we have

\displaystyle{\left|\alpha-\frac{p}{q}\right|\ge \frac{C(\alpha)}{q^d}}\qquad (1).

(in words, algebraic numbers are not “too close” to rationals, in the sense that the rate of convergence of sequences of rationals with increasing denominators is limited by the degree of the given algebraic number). Formula (1) should be contrasted with the well known fact that the continued fraction convergents \displaystyle{\frac{p_k}{q_k}} of any irrational number \alpha satisfy

\displaystyle{\left|\alpha-\frac{p_k}{q_k}\right|\le \frac{1}{q_k^2}}.
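The convergent bound is easy to test. Here is a short Python sketch (standard library only; the function name is mine) that generates the continued fraction convergents of \sqrt{2}=[1;2,2,2,\dots] and checks the inequality exactly via the Pell relation p_k^2-2q_k^2=\pm 1:

```python
def sqrt2_convergents(n):
    """First n continued-fraction convergents of sqrt(2) = [1; 2, 2, 2, ...],
    built with the standard recurrence p_k = 2 p_{k-1} + p_{k-2} (same for q)."""
    p0, q0, p1, q1 = 1, 1, 3, 2
    out = [(p0, q0), (p1, q1)]
    for _ in range(n - 2):
        p0, q0, p1, q1 = p1, q1, 2 * p1 + p0, 2 * q1 + q0
        out.append((p1, q1))
    return out

for p, q in sqrt2_convergents(10):
    # Pell relation: p^2 - 2 q^2 = ±1 for every convergent of sqrt(2), hence
    # |sqrt(2) - p/q| = 1/(q^2 (sqrt(2) + p/q)) < 1/q^2, confirming the bound.
    assert abs(p * p - 2 * q * q) == 1
```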

Liouville’s constant violates the theorem

The partial sums L_n=\sum_{k=1}^{n}\frac 1{10^{k!}} are rational numbers with denominator q_n=10^{n!}. It is clear that

\displaystyle{\left|L-L_n\right|=\sum_{k=n+1}^{\infty}\frac 1{10^{k!}}\le \frac{2}{ 10^{(n+1)!}}}\qquad (2)

On the other hand, if L were algebraic of degree d, we would have, according to Liouville’s Theorem,

\displaystyle{\left|L-L_n\right|\ge \frac {C}{ 10^{dn!}}}\qquad (3)

But relations (2) and (3) clearly contradict each other for large n. Indeed, for large enough n, we have

\displaystyle{\frac{2}{ 10^{(n+1)!}}<\frac {C}{ 10^{dn!}}},

since (n+1-d)n!\to\infty. Thus, L is transcendental.
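Both estimates can be checked with exact integer arithmetic. The following Python sketch (a toy illustration; the helper name is mine) verifies the tail bound (2) on a truncated tail, and exhibits, for a sample degree d and C=1, an n for which (2) already falls below the lower bound in (3):

```python
from fractions import Fraction
from math import factorial

def liouville_partial(n):
    """Partial sum L_n = sum_{k=1}^{n} 10^{-k!}, as an exact fraction."""
    return sum(Fraction(1, 10 ** factorial(k)) for k in range(1, n + 1))

# Tail bound (2), checked on the truncated tail L_m - L_n
# (the full tail only adds smaller positive terms):
n, m = 2, 5
assert liouville_partial(m) - liouville_partial(n) <= Fraction(2, 10 ** factorial(n + 1))

# Incompatibility of (2) and (3): already for d = 5, C = 1 and n = 6,
# 2/10^{(n+1)!} < C/10^{d n!}, i.e. 2 * 10^{d n!} < 10^{(n+1)!}.
d, n = 5, 6
assert 2 * 10 ** (d * factorial(n)) < 10 ** factorial(n + 1)
```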

A proof of Liouville’s Theorem

Proof: By definition, there exists an irreducible polynomial in \mathbb{Z}[x]

P(x)=a_dx^d+a_{d-1}x^{d-1}+\dots +a_0

with P(\alpha)=0. If \,\,\displaystyle{\frac{p}{q}}\,\, is a rational number in lowest terms ({\it i.e.} \textrm{g.c.d}\,(p,q)=1), then q^dP\left(\frac{p}{q}\right) is a non-zero integer (an integer by the expansion below, and non-zero because P, being irreducible of degree d\ge 2, has no rational roots),

q^dP\left(\frac{p}{q}\right)=a_dp^d+a_{d-1}p^{d-1}q+\dots +a_0q^d

and therefore

\displaystyle{\left|P\left(\frac{p}{q}\right)\right|\ge \frac{1}{q^d}}.

Next, we relate P\left(\frac{p}{q}\right) to P(\alpha) via the Mean Value Theorem,

\displaystyle{\left|P\left(\frac{p}{q}\right)\right|=\left|P\left(\frac{p}{q}\right)-P(\alpha)\right|=\left|P'(\xi)\right|\left|\frac{p}{q}-\alpha\right|}

for some \xi between \frac{p}{q} and \alpha. If we fix an interval around \alpha, that is, if we assume, say, \left|\frac{p}{q}-\alpha\right|<1, then \left|P'(\xi)\right|\le M(\alpha):=\max_{|t-\alpha|\le 1}|P'(t)| and the previous estimate entails

\displaystyle{\left|\frac{p}{q}-\alpha\right|=\frac{\left|P\left(\frac{p}{q}\right)\right|}{\left|P'(\xi)\right|}\ge \frac{1}{M(\alpha)q^d}}.

The assertion follows with C(\alpha)=\min\left\{1,\frac 1{M(\alpha)}\right\}.
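As a sanity check, one can test the theorem on \alpha=\sqrt{2}, for which P(x)=x^2-2, d=2 and M(\alpha)=\max_{|t-\alpha|\le 1}|2t|=2(\sqrt{2}+1). A quick sketch (floating point, so a tiny tolerance is included) verifies the bound for the best approximations with denominators up to 2000:

```python
import math

# Liouville's bound for alpha = sqrt(2): P(x) = x^2 - 2, degree d = 2,
# M = max |P'| on [alpha - 1, alpha + 1] = 2*(sqrt(2) + 1), C = min(1, 1/M).
alpha = math.sqrt(2)
C = min(1.0, 1.0 / (2 * (math.sqrt(2) + 1)))
for q in range(1, 2001):
    p = round(alpha * q)          # the best numerator for this denominator q
    assert abs(alpha - p / q) >= C / q ** 2 - 1e-12
```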

Final Remarks

During the time of Liouville and Hermite, transcendental numbers appeared rare and exotic, requiring ingenious proofs to isolate even a handful of examples. Yet just a few decades later, in 1874, Cantor demonstrated that, in fact, almost all real numbers are transcendental. What seemed like a hunt for “needles in a haystack” turned out to be a realization that the “haystack” was almost entirely made of “needles.” His proof, however, was non-constructive—relying not on explicit examples but on the abstract principles of cardinality and the countability of algebraic numbers.

Here is a question worth pondering: if almost all numbers are transcendental, why do the ones we encounter so rarely reflect that? Part of the answer is probably that humans think algorithmically. We are drawn to numbers we can compute, describe, or construct. Most numbers are not describable.

The apparent simplicity of Liouville’s constant is deceptive—a trick of human intuition. What makes it seem simple is merely the ease of describing its decimal representation. By contrast, a number like \sqrt{2} might appear ‘chaotic’ in its decimal expansion, yet it is fundamentally structured: as a quadratic irrational, its continued fraction is perfectly periodic,

\displaystyle{\sqrt{2}=1+\frac{1}{2+\displaystyle{\frac{1}{2+\displaystyle{\frac{1}{2+\displaystyle{\frac{1}{2+\dots}}}}}}}}.

The same applies, for instance, to the golden ratio, again a quadratic irrational with a very simple continued fraction representation, etc. This reveals a deeper truth: our perception of mathematical simplicity often depends on the representation we choose, not the object itself.

I hope you enjoyed this dive into Liouville’s constant — its deceptively tidy decimal mask is, after all, a very human kind of charm.

Bibliography

[1] Liouville, J. (1844), “Mémoires et communications”, Comptes rendus de l’Académie des Sciences (in French), 18 (20, 21): 883–885, 910–911.

[2] Liouville, J. (1851), “Sur des classes très étendues de quantités dont la valeur n’est ni algébrique, ni même réductible à des irrationnelles algébriques”, Journal de Mathématiques Pures et Appliquées, 16, 133–142. https://www.numdam.org/item/JMPA_1851_1_16__133_0.pdf

The Legendre transform and some applications

The Legendre transform has its origins in the work of A. M. Legendre [1], where it was introduced to recast a first order PDE into a more convenient form. It gained significance around 1830 with the reformulation of Lagrangian mechanics carried out by W. R. Hamilton, who replaced the “natural” configuration space by the so-called phase space with its rich symplectic structure [2], and as a central tool in Hamilton-Jacobi theory. In the late 19th century, it was used in Thermodynamics (Gibbs, Maxwell) as a way of switching between different thermodynamic potentials. With the development of Convex Analysis and Optimization in the 20th century, its central role in providing so-called dual representations of functions, problems, etc. became apparent, [3].

From a high level point of view, the transform establishes a link between a function of a variable x belonging to \mathbb{R}^n or a more general topological vector space and its “dual” or “conjugate” function, defined on the dual space of linear forms. From a more practical perspective, it can be understood as an alternative way of encoding the information contained in a function. In that sense, it is very similar to other transforms like the Fourier series/transform and the Laplace transform. The Fourier series, for instance, encodes the information about the given (periodic) function in the form of a sequence of “amplitudes” corresponding to the different “modes”. The Fourier transform furnishes a similar integral representation for non-periodic functions, where this time different “modes” or “frequencies” form a continuum. Both the series and the transform provide a spectral representation of the function.

Suppose we are given a convex real function f of one real variable. Let us assume for simplicity that f is twice differentiable and f''(x)>0 for all x. Typical examples are quadratic functions f(x)=ax^2+bx+c with a>0, exponential functions g(x)=Ce^{kx} with C>0, the function h(x)=\ln (1+e^x), etc.

Since f''(x)>0, f' is a strictly increasing (hence one-to-one) function. This correspondence allows us to identify each point x with the value of the slope of f at that point, p=f'(x). In other words, p can be used as a new independent variable. At this point, one might be tempted to use the function g(p)=f(x(p)), where p=f'(x) (or x=(f')^{-1}(p)), as an alternative representation of f. This is, no doubt, a possible choice. However, a little extra work provides a function f^* with the following pleasant additional property: if the procedure is applied twice, we recover the original function, f^{**}:=(f^*)^*=f. The transformation f\to f^* is what is called involutive (an involution).

Namely, given a slope p, we start by identifying the point x such that p=f'(x). You can think of this operation as parallel-transporting a line with slope p in the direction of the y-axis starting from y=-\infty. Assuming that p\in\textrm{Im\ }f', there will be a first point of contact with the graph of f. At this point, the graph is tangent to the line. We will encode this information by recording the point of intersection with the y-axis; more precisely, if our line has equation y=px+b, we define f^*(p)=-b. The knowledge of the function f^* is tantamount to the knowledge of all the tangents to our graph: our original graph is their (uniquely defined) envelope. The function f^* is called the Legendre transform of f. In the figure below, the values of the transform of the function f(x)=x^2-2x+3 at p=0, \, p=1 and p=2 are represented.

It is easy to derive an explicit formula for f^*. Given p, we need to solve for x=x(p) in the equation

p=f'(x).

The equation of the tangent is y-f(x(p))=p(x-x(p)). Its y-intercept is b(p)=f(x(p))-px(p). Finally,

f^*(p)=-b(p)=px(p)-f(x(p))\qquad\qquad (1).

This new function f^* is convex. Indeed, a simple computation shows that

\displaystyle{(f^*)''(p)=\frac 1{f''(x(p))}>0}.

From formula (1) it follows that, given x, we have

f(x)=p(x)x-f^*(p(x)),

where, as before, f'(x)=p. The formula above reveals that f=(f^*)^* (i.e. the transformation is involutive), since

\displaystyle{\frac{df^*(p)}{dp}=x(p)+p\frac{dx}{dp}-f'(x(p))\frac{dx}{dp}=x(p)+\frac{dx}{dp}\left[p-f'(x(p))\right]=x(p)}

There is another way to look at this. For fixed p in the range of f', the condition p=f'(x) yields the (unique) critical point of the concave function x\to px-f(x). This point is a global maximum. Thus,

f^*(p)=\max\limits_{x}\,(px-f(x)).\qquad\qquad (2).

The latter equation implies the well-known Young’s inequality:

f^*(p)+f(x)\ge px, \qquad\qquad (3)

valid for any x,p. Equality takes place when x,p are linked by the relation p=f'(x).

Examples: If f(x)=\frac{x^2}{2}, f^*(p)=\frac{p^2}{2}. More generally, if f(x)=\frac{x^{\alpha}}{\alpha} with \alpha>1 for x>0, then f^*(p)=\frac{p^{\beta}}{\beta}, where 1/\alpha+1/\beta=1. If f(x)=e^x, then f^*(p)=p\ln p -p, etc.
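These examples are easy to confirm numerically from formula (2). A crude grid-search sketch (purely illustrative; the function name and grid parameters are mine):

```python
import math

def legendre(f, p, lo=-30.0, hi=30.0, n=120001):
    """Crude numerical Legendre transform: f*(p) = max_x (p*x - f(x)),
    with the maximum taken over a uniform grid on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return max(p * (lo + i * step) - f(lo + i * step) for i in range(n))

# f(x) = x^2/2  ->  f*(p) = p^2/2
assert abs(legendre(lambda x: x * x / 2, 3.0) - 4.5) < 1e-3
# f(x) = e^x    ->  f*(p) = p ln p - p   (for p > 0)
p = 2.0
assert abs(legendre(math.exp, p) - (p * math.log(p) - p)) < 1e-3
```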

Generalizations

So far, we have assumed that our function f is twice differentiable, with f''(x)>0 in its domain. However, the following slight modification of (2),

f^*(p)=\sup\limits_{x}\,(px-f(x))\qquad (4)

is meaningful for any real p and any function f if we accept the values \pm\infty in the range of f^*. The resulting function f^*, being the supremum of a family of linear functions, is convex on its domain. Definition (4), however, is not completely satisfactory for non-convex functions, as it loses information about the original function. For a convex function defined on \mathbb{R}, on the other hand, f^* encodes all the information about f and, moreover, f^{**}=f, a fact known as the Fenchel-Moreau theorem, [3]. Namely,

f(x)=\sup\limits_{p}\,(px-f^{*}(p))\qquad (5),

which can be thought as a representation of f as the envelope of its tangents. For convex functions defined on some domain D\neq\mathbb{R}, an extra technical condition is needed on f for (5) to hold, namely lower semicontinuity, which is equivalent to the closedness of its epigraph. It is easy to see that this is a necessary condition for (5) to hold. Indeed, (5) furnishes a representation of the epigraph of f as an intersection of closed half-planes, hence necessarily closed.

In the context of Convex Analysis, the more general transformation given by (4) is called the Fenchel transform (or Fenchel-Legendre transform) and the more general inequality (3) is called the Fenchel inequality (or Fenchel-Young inequality).

Thus, the transformation can be applied to a piece-wise linear, convex function. For instance, if f is a linear function, f(x)=ax+b, then clearly f^* is finite only for p=a, with f^*(a)=b. If f is made of two linear functions, f(x)=a_1x+b_1 when x\le x_0 and f(x)=a_2x+b_2 when x\ge x_0, with a_1<a_2 and a_1x_0+b_1=a_2x_0+b_2, then f^* is finite only for p\in [a_1,a_2], where it is linear and ranges from -b_1 to -b_2. In general, to each “corner” of a polygonal graph there corresponds a segment on the graph of f^*, [2].

The generalization to functions f:\mathbb{R}^n\to\mathbb{R} is straightforward. Under the smoothness and strict convexity assumption

\displaystyle{\left(\frac{\partial^2f}{\partial x_i\partial x_j}\right)\succ 0},

the mapping D: x\to df(x,\cdot) is one-to-one. If p is the vector representing df(x,\cdot) via the standard Euclidean structure, we define the Legendre transform by

f^*(p)=\langle p, x\rangle-f(x), \qquad p\in \textrm{Im\ }D,

where \langle \cdot,\cdot\rangle denotes the inner product. Much like in the one-dimensional case, the previous definition is equivalent to

f^*(p)=\max\limits_{x}\,(\langle p, x\rangle-f(x)).

Finally, we relax the smoothness and strict convexity assumptions and arrive at the most general definition

f^*(p)=\sup\limits_{x}\,(\langle p, x\rangle-f(x)),

where now f is merely convex and f^*:\mathbb{R}^n\to (-\infty,+\infty]. The Fenchel-Moreau theorem holds without modifications under the extra assumption of lower semicontinuity of f if it is not defined on all of \mathbb{R}^n.

One can also consider “partial” Legendre transforms, i.e. transforms relative to some of the variables. Thus if f:\mathbb{R}^2\to\mathbb{R} is a function of two variables, one can consider its transform with respect to the first variable,

f_1^*(p_1,x_2)=\sup\limits_{x_1}\,( p_1 x_1-f(x_1,x_2)).

For smooth, strictly convex functions and fixed x_2, the supremum is achieved at the (unique) x_1 defined by

\displaystyle{p_1=\frac{\partial f}{\partial x_1}}.

The transform can be further generalized to functions on manifolds, but given the fact that a manifold does not have a global linear structure, the duality is established locally, between functions on the tangent bundle TM and their “conjugates” or “dual” on the cotangent bundle T^*M. Namely, the transform connects functions of (x,v) with functions of (x,p), where v\in T_xM and p\in T^*_xM.

Applications

A) Clairaut’s differential equation

A standard example of an ODE not solved for the derivative y' is Clairaut’s equation

y=xy'-f(y').

It clearly admits the family of straight lines y=px-f(p) as solutions. The envelope of the family is a singular solution satisfying y=px-f(p) and x=f'(p). But these are precisely the relations defining the Legendre transform of f. We conclude that, for convex f, the singular solution of Clairaut’s equation is the Legendre transform of f.
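For a concrete check, take f(p)=p^2/2, whose transform is f^*(x)=x^2/2. Clairaut’s equation becomes y=xy'-(y')^2/2, and a short sketch confirms that y=x^2/2 satisfies it identically:

```python
# Clairaut's equation with f(p) = p^2/2:   y = x y' - (y')^2 / 2.
# Candidate singular solution: the Legendre transform f*(x) = x^2/2.
def residual(x):
    y, yp = x * x / 2, x          # y = x^2/2, so y' = x
    return y - (x * yp - yp * yp / 2)

assert all(abs(residual(x)) < 1e-12 for x in [-2.0, 0.0, 1.5, 3.0])
```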

B) Hamiltonian Mechanics from Lagrangian Mechanics.

For many Physics, Applied Math, and Engineering students, this is their first introduction to the Legendre transform. The Lagrangian of a mechanical system L(q,\dot{q},t) on its configuration space (a differentiable, Riemannian manifold) completely describes the system. The actual path q(t) joining two states (t_1,q_1) and (t_2,q_2) is an extremal of the action functional,

S[q]=\displaystyle{\int\limits_{t_1}^{t_2}}  L(q,\dot{q},t)\,dt

(Hamilton’s principle) and therefore satisfies the second-order Euler-Lagrange differential equations

\displaystyle{\frac{d}{dt}\frac{\partial L}{\partial \dot{q}}-\frac{\partial L}{\partial q}=0}.

The geometry of the configuration space is intimately connected to the Physics. Thus, holonomic constraints are built into the manifold, the kinetic energy is nothing but the Riemannian metric on the manifold, geodesics relative to this metric represent motion “by inertia”, etc. The alternative Hamiltonian description reveals connections to a different geometry and is introduced as follows. Assuming that the Lagrangian is strictly convex in the generalized velocities \dot{q},

\displaystyle{\left(\frac{\partial^2L}{\partial \dot q_i\partial\dot q_j}\right)\succ 0},

we introduce the Hamiltonian of the system as the Legendre transform of the Lagrangian with respect to \dot{q}:

\displaystyle{H(p,q,t)=\langle p,\dot{q}\rangle -L(q,\dot{q},t)},

where \displaystyle{p=\frac{\partial L}{\partial \dot{q}}} is the generalized momentum, an element of the cotangent fiber at q. Since

\displaystyle{dH=\langle\dot{q},dp\rangle +\langle p,d\dot{q}\rangle-\langle\frac{\partial L}{\partial q},dq\rangle-\langle\frac{\partial L}{\partial \dot{q}},d\dot{q}\rangle-\frac{\partial L}{\partial t}dt}

\displaystyle{\,\,\, =\langle\dot{q},dp\rangle-\langle\dot{p},dq\rangle-\frac{\partial L}{\partial t}dt}

it follows that (p,q) satisfy the first order Hamiltonian system:

\displaystyle{\dot{p}=-\frac{\partial H}{\partial q}};\qquad \displaystyle{\dot{q}=\frac{\partial H}{\partial p}}

The space \{(p,q)\} is called the phase space, and the Hamiltonian function equips it with a remarkable symplectic structure. The Hamiltonian flow (if H does not depend on time) is a one-parameter subgroup of the group of symplectomorphisms of the phase space. A straightforward consequence is Liouville’s Theorem on the preservation of the phase volume (a cornerstone of Statistical Mechanics). The Lagrangian approach has, however, certain advantages: a) it is easier to deal with constraints, even non-holonomic ones, via Lagrange multipliers; b) it is easier to track conserved quantities via Noether’s theorem; c) non-conservative forces can be incorporated; etc.

It is worth noticing that the above “Hamiltonization” applies to general variational problems, not just to those related to mechanical systems. In the case of mechanical systems, the part of the Lagrangian depending on \dot q is usually a positive definite quadratic function (the kinetic energy of the system), hence convex. The transform of a quadratic form is especially simple: it is just another quadratic form of the conjugate variable. For instance, the Lagrangian of a simple mass-spring system, assuming that the spring is linear with stiffness k and q represents the deviation of the mass from equilibrium, is

L(q,\dot q)=\displaystyle{\frac 1{2}m\dot q^2-\frac 1{2}k q^2}

and the Hamiltonian is

H(p,q)=\displaystyle{\frac {p^2}{2m}+\frac 1{2}k q^2}

and represents the total energy of the system. That is the case whenever the Lagrangian is quadratic in the velocities, the system is conservative, and the constraints are time-independent.
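One can verify numerically that this Hamiltonian is indeed the Legendre transform of the Lagrangian in the velocity. A grid-search sketch for the mass-spring system (taking the potential energy to be \frac 12 kq^2 with stiffness k; the numeric values of m, k, q, p are illustrative):

```python
# H(p, q) should equal sup_v (p*v - L(q, v)) for the mass-spring Lagrangian
# L(q, v) = m v^2/2 - k q^2/2  (k is the spring stiffness).
m, k, q, p = 2.0, 3.0, 0.7, 1.3

def L(q, v):
    return 0.5 * m * v * v - 0.5 * k * q * q

# Maximize p*v - L(q, v) over a fine velocity grid (max at v = p/m):
H_grid = max(p * v - L(q, v) for v in (i * 1e-4 - 5.0 for i in range(100001)))
H_exact = p * p / (2 * m) + 0.5 * k * q * q
assert abs(H_grid - H_exact) < 1e-4
```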

C) Thermodynamic potentials

Yet another early use of the transform was in Thermodynamics, as a way to switch between potentials according to the most convenient independent parameters.

According to the first principle of Thermodynamics (energy conservation), there exists a function of state U (internal energy) such that, for any thermodynamical system undergoing an infinitesimal change of state,

\displaystyle{dU=\delta Q+\delta W+\sum_i\mu_idN_i},

where \delta Q represents the heat (thermal energy) added to the system, \delta W is the work of the surroundings on the system, and the last sum above accounts for the energy added to the system by means of particle exchange. Here, dN_i is the number of particles of the i-th type added to the system, and \mu_i the corresponding chemical potential. The relevant fact here is that dU is an exact differential, whereas the rest of the terms are just differential forms on the configuration space of the system. The term \delta W is in general a differential form \sum p_idq_i involving intensive (p_i) and conjugate extensive (q_i) variables. Thus, for example, in the case of a gas expanding/compressing against the environment, the work done on the gas is \delta W=-p_{ext}dV, where dV is the infinitesimal change of volume and p_{ext} is the external pressure. Moreover, according to the second principle, for reversible processes the form \delta Q admits an integrating factor \frac 1{T}; namely, there is a function of state called entropy S such that

\displaystyle{dS= \frac{\delta Q}{T}}

Putting it all together, for an infinitesimal, reversible (hence quasi-static) process undergone by a gas we get

\displaystyle{dU=TdS-pdV+\sum_i\mu_idN_i}\qquad (6)

where p=p_{ext} is the pressure of the gas in equilibrium with its surroundings.

While U accounts for the total energy of the system, related state functions accounting for different manifestations of energy may prove more convenient for particular experimental or theoretical scenarios. For instance, many chemical reactions occur at constant pressure in lab conditions. In such cases, it is convenient to include a term representing the work needed to push aside the surroundings to occupy volume V at pressure p; namely, we define the enthalpy of the system as

H=U+pV.

Since, by (6), we have \partial U/\partial V=-p, H is nothing but minus the Legendre transform of U with respect to V, that is,

H(S,p)=-U^*(S,p).

(In Thermodynamics, the transform is usually defined with the opposite sign, so there is no “minus” in the previous formula. A possible reason is that one prefers all forms of energy to increase or decrease together.) Assuming for simplicity that dN_i=0, we have

dH=dU+pdV+Vdp=TdS-pdV+pdV+Vdp=TdS+Vdp

and, therefore, at constant pressure, dH=TdS=\delta Q. Thus, the change of enthalpy determines if a given chemical reaction is exothermic or endothermic. Moreover, \partial H/\partial S=T and \partial H/\partial p=V.

In a similar fashion, one can consider the (opposite of) Legendre transform of U with respect to entropy,

F=U-TS,

called (Helmholtz) free energy. A similar computation shows that

dF=-SdT-pdV,

thus, at fixed temperature, dF is precisely the pressure-volume work done on the system.

Yet another thermodynamic potential, the Gibbs free energy G=H-TS=U+pV-TS, is a measure of the maximum reversible work a system can perform at constant pressure p and temperature T, excluding expansion work. It is useful for determining whether a given process (e.g., a chemical reaction or a phase transition) occurs spontaneously, and for characterizing equilibrium conditions.

Bibliography

[1] Legendre, A. M., “Mémoire sur l’intégration de quelques équations aux différences partielles.” Histoire de l’Académie Royale des Sciences, 1789, pp. 309–351.

[2] Arnold, V. I. “Mathematical Methods of Classical Mechanics”, 2nd Edition,  Graduate Texts in Mathematics (60), 1989.

[3] Boyd, S. P. and Vandenberghe, L., “Convex Optimization”. Cambridge University Press, 2004.

Asymptotes, tangents and backwards long division

A slant asymptote for a real function of one real variable f is a line y=ax+b with the property

f(x)=ax+b+g(x)\qquad\qquad (1),

where g(x)\to 0 as x\to\pm\infty. Geometrically, the graph of f comes closer and closer to the line for large positive/negative values of x. For the sake of simplicity, we will deal with asymptotes at +\infty, the other case being completely analogous. When a=0 we say that the asymptote is horizontal. A simple way to detect if a given function has a slant asymptote is by checking linearity at infinity, {\it i.e.} if the limit

\displaystyle{\lim\limits_{x\to\infty}\frac{f(x)}{x}}

is finite. If that is the case, the value of the limit is the slope a of the asymptote. The intercept b is then given by the limit

\displaystyle{\lim\limits_{x\to\infty}\left(f(x)-ax\right)},

if the latter exists (and is finite). For rational functions of the form

\displaystyle{f(x)=\frac{P(x)}{Q(x)}}

where P and Q are polynomials, the situation is much simpler. In order for f to be linear at infinity, we need {\rm deg}(P)-{\rm deg}(Q)\le 1 and, if that is the case, we perform long division which leads to

\displaystyle{f(x)=ax+b+\frac{R(x)}{Q(x)}}\qquad\qquad (2)

where {\rm deg}(R)<{\rm deg}(Q) and, therefore, the asymptote is just the quotient y=ax+b since

\displaystyle{\lim\limits_{x\to\infty}\frac{R(x)}{Q(x)}=0}

When {\rm deg}(P)-{\rm deg}(Q)> 1, the fraction is superlinear at infinity.

All this is well known and usually taught in high school.
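The computation of the asymptote via long division takes only a few lines of code. A minimal sketch (coefficients listed from the highest power down; the function name is mine, and the leading coefficient Q[0] is assumed non-zero):

```python
def poly_divmod(P, Q):
    """Ordinary long division of polynomials, highest powers first.
    P, Q are coefficient lists from the highest power down; returns
    (quotient, remainder)."""
    R, quot = list(P), []
    while len(R) >= len(Q):
        c = R[0] / Q[0]
        quot.append(c)
        for j in range(len(Q)):       # subtract c * x^deg * Q(x)
            R[j] -= c * Q[j]
        R.pop(0)                      # the leading coefficient is now zero
    return quot, R

# f(x) = (x^2 + 1)/(x - 1) = (x + 1) + 2/(x - 1): the asymptote is y = x + 1
assert poly_divmod([1, 0, 1], [1, -1]) == ([1.0, 1.0], [2.0])
```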

On the other hand, students are taught to find tangents to a given curve at a given point by means of derivatives. It turns out, however, that finding the tangent to a rational function at x=0 does not require derivatives at all. In order to understand this, we notice that y=cx+d is a tangent to f(x) at x=0 precisely when

f(x)=cx+d+g(x)

with g(x)=o(x) as x\to 0. The last relation is very similar to (1) except for the fact that now we are looking at x\to 0 instead of x\to\infty. Thus, if we could somehow come up with a relation like (2) where now

\displaystyle{\lim\limits_{x\to 0}\frac{R(x)}{xQ(x)}= 0},

the quotient would be the tangent.

As it happens, that is perfectly possible. All we need to do is divide the polynomials starting with the lowest powers (backwards), until we reach a “partial remainder” whose lowest degree is two or higher. Observe that the divisor Q necessarily has a non-zero constant term (otherwise f would be undefined at x=0).

As an example, let us find the tangent to the graph of

\displaystyle{g(x)=\frac{2-2x+3x^2}{1+2x^2}}

at x=0. Starting with the lowest degree, we have 2=2\cdot 1, 2(1+2x^2)=2+4x^2 with first partial remainder R_1(x)=2-2x+3x^2-(2+4x^2)=-2x-x^2. In the next step, we add -2x to the quotient, with a partial remainder R_2(x)=-2x-x^2+2x(1+2x^2)=-x^2+4x^3. Since we are interested in the tangent line and x^2 is an infinitesimal of degree higher than one at zero, the division stops here and the equation of the tangent is y=2-2x. If we keep dividing, we get the Taylor polynomials of higher degree. For instance, the osculating parabola at x=0 is y=2-2x-x^2 and so on.
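The “backwards” division is just as easy to automate. A short sketch (coefficient lists in increasing powers, with Q(0)\neq 0; the function name is mine) that reproduces the example above:

```python
def taylor_at_zero(P, Q, order):
    """Taylor coefficients at 0 of P(x)/Q(x), by long division starting
    from the LOWEST powers.  P, Q are coefficient lists in increasing
    powers of x; Q[0] != 0 (otherwise the fraction is undefined at 0)."""
    R = list(P) + [0.0] * (order + len(Q))   # running remainder
    coeffs = []
    for k in range(order + 1):
        c = R[k] / Q[0]                      # next quotient coefficient
        coeffs.append(c)
        for j, qj in enumerate(Q):           # subtract c * x^k * Q(x)
            R[k + j] -= c * qj
    return coeffs

# g(x) = (2 - 2x + 3x^2)/(1 + 2x^2): tangent y = 2 - 2x,
# osculating parabola y = 2 - 2x - x^2
assert taylor_at_zero([2, -2, 3], [1, 0, 2], 2) == [2.0, -2.0, -1.0]
```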

Long division is taught at school starting with the highest powers. A possible reason is that long division of numbers proceeds by reducing the remainder at each step. If we replace the base 10 in the decimal representation of numbers by x, we arrive at the usual long division algorithms of polynomials, reducing the degree at each step. I would call this procedure “division at infinity”. In contrast, the above is an example of “division at zero”.

Finding the tangent to a rational function at a point different from zero can be reduced to the previous case. If, say, we need to find the tangent to f(x)=P(x)/Q(x) at x=x_0, all we need to do is set x=x_0+z, express P and Q as polynomials in z, and find the tangent at z=0 as before. Finally, we replace z by x-x_0 in the resulting equation of the tangent.

The above reveals a perfect symmetry between the problems of finding the asymptote and that of finding the tangent at x=0. In some sense, we can say that an asymptote is a “tangent at infinity” and, I guess, that a tangent is an asymptote at a finite point. Both problems are algebraic in nature and can be solved without limit procedures, just by means of division (forward or backward).

More generally, since rational functions are the simplest non-polynomial functions, finding their Taylor expansion at zero (and, by translation, at any point) is a purely algebraic procedure. I believe this fact should be emphasized in high school, and it could be used as a motivating example to introduce more general power expansions. As a matter of fact, Newton was inspired by the algorithm of long division for numbers to start experimenting with power series, not necessarily with integer powers. The image below shows a page from his “Method of Fluxions and Infinite Series”. The “backwards” long division of a^2 by b+x is performed in order to get the power series expansion.

The differential of a quotient

Here is yet another example of “division at zero”.

When students are exposed to the differentiation rules, those are derived from the definition of the derivative as the limit of the difference quotient. Thus, for example, to prove the rule of differentiation of a product we proceed as follows.

\displaystyle{(fg)'(x)=\lim\limits_{h\to 0}\frac{(fg)(x+h)-(fg)(x)}{h}=}

\displaystyle{=\lim\limits_{h\to 0}\frac{f(x+h)g(x+h)-f(x)g(x)}{h}=\lim\limits_{h\to 0}\frac{f(x+h)g(x+h)-f(x+h)g(x)+f(x+h)g(x)-f(x)g(x)}{h}=}

\displaystyle{=\lim\limits_{h\to 0}f(x+h)\frac{g(x+h)-g(x)}{h}+\lim\limits_{h\to 0}g(x)\frac{f(x+h)-f(x)}{h}=}

\displaystyle{=f(x)g'(x)+f'(x)g(x)},

given that all the limits are assumed to exist. A similar computation can be done for the quotient. It should be noted, however, that a little algebraic trick has to be used in both cases to make the derivatives of the individual factors appear explicitly. No big deal, but a bit artificial. And, importantly, not the way the founders of Infinitesimal Calculus arrived at these rules.

To help intuition, the product rule is often presented in the form

d(uv)=(u+du)(v+dv)-uv=udv+vdu+dudv=udv+vdu,

and the last term is ignored in the last equality as being a quadratic infinitesimal (strictly speaking, the last equality is an “adequality”, in the terminology coined by Fermat). Without a doubt, the latter derivation, albeit not meeting modern standards of rigor, reveals the reason for the presence of the “mixed” terms and the general structure of the formula. Moreover, no algebraic tricks are required: the formula follows in a straightforward manner.

A similar derivation of the quotient rule involves “division at zero”. Here is the derivation in the book “Calculus made easy” by Silvanus Thompson, from 1910.

Observe that the operation has been stopped when the remainder is a quadratic infinitesimal. The conclusion of the computation is the familiar rule

\displaystyle{d\left(\frac u{v}\right)=\frac{u+du}{v+dv}-\frac u{v}=\frac{du}v-\frac{udv}{v^2}=\frac{vdu-udv}{v^2}}.