The Legendre transform has its origins in the work of A. M. Legendre [1], where it was introduced to recast a first-order PDE into a more convenient form. It gained significance around 1830 with W. R. Hamilton's reformulation of Lagrangian mechanics, which replaces the “natural” configuration space by the so-called phase space with its rich symplectic structure [2], and as a central tool in Hamilton-Jacobi theory. In the late 19th century, it was used in Thermodynamics (Gibbs, Maxwell) as a way of switching between different thermodynamic potentials. With the development of Convex Analysis and Optimization in the 20th century, its central role in providing so-called dual representations of functions, problems, etc. became apparent [3].
From a high-level point of view, the transform establishes a link between a function of a variable belonging to $\mathbb{R}^n$ or a more general topological vector space and its “dual” or “conjugate” function, defined on the dual space of linear forms. From a more practical perspective, it can be understood as an alternative way of encoding the information contained in a function. In that sense, it is very similar to other transforms like the Fourier series/transform and the Laplace transform. The Fourier series, for instance, encodes the information about the given (periodic) function in the form of a sequence of “amplitudes” corresponding to the different “modes”. The Fourier transform furnishes a similar integral representation for non-periodic functions, where this time the different “modes” or “frequencies” form a continuum. Both the series and the transform provide a spectral representation of the function.
Suppose we are given a convex real function $f$ of one real variable. Let us assume for simplicity that $f$ is twice differentiable and $f''(x) > 0$ for all $x$. Typical examples are quadratic functions $f(x) = ax^2 + bx + c$ with $a > 0$, exponential functions $f(x) = e^{ax}$ with $a \neq 0$, etc.
Since $f'' > 0$, $f'$ is a strictly increasing (hence one-to-one) function. This correspondence allows us to identify $x$ with the value of the slope of $f$ at that point, $p = f'(x)$. In other words, $p$ can be used as a new independent variable. At this point, one might be tempted to use the function $g(p) = f(x)$, where $x = x(p)$ (or $x = (f')^{-1}(p)$), as an alternative representation of $f$. This is, no doubt, a possible choice. However, a little extra work provides a function $f^*$ that has the following pleasant additional property: if the procedure is applied twice we recover the original function, $(f^*)^* = f$. The transformation $f \mapsto f^*$ is what is called involutive (an involution).
Namely, given a slope $p$, we start by identifying the point $x$ such that $f'(x) = p$. You can think of this operation as parallel-transporting a line with slope $p$ in the direction of the $y$-axis starting from $y = -\infty$. Assuming that $p$ lies in the range of $f'$, there will be a first point of contact with the graph of $f$. At this point, the graph is tangent to the line. We will encode this information by recording the point of intersection with the $y$-axis; more precisely, if our line has equation $y = px + b$, we define $f^*(p) = -b$. The knowledge of the function $f^*$ is tantamount to the knowledge of all the tangents to our graph: our original graph is their (uniquely defined) envelope. The function $f^*$ is called the Legendre transform of $f$.

[Figure: values of the Legendre transform of a convex function at two slopes, read off from the $y$-intercepts of the corresponding tangent lines.]

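It is easy to derive an explicit formula for $f^*$. Given $p$, we need to solve for $x = x(p)$ in the equation
$$f'(x) = p.$$
The equation of the tangent at that point is $y = f(x(p)) + p\,(x - x(p))$. Its $y$-intercept is $f(x(p)) - p\,x(p)$. Finally,
$$f^*(p) = p\,x(p) - f(x(p)).$$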
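The recipe above (solve $f'(x) = p$, then evaluate $p\,x - f(x)$) can be sketched numerically. This is a minimal illustration, not a robust implementation: the function name `legendre_transform` and the bracket `[lo, hi]` are hypothetical choices, and the code assumes $f'$ is strictly increasing on that bracket with $p$ inside its range. The test function $f(x) = e^x$, whose transform is $p\ln p - p$ for $p > 0$, is chosen only for illustration.

```python
import math

def legendre_transform(f, df, p, lo=-50.0, hi=50.0, tol=1e-12):
    """Return f*(p) = p*x(p) - f(x(p)), where x(p) solves df(x) = p.

    Assumption: df is strictly increasing on [lo, hi] and
    df(lo) < p < df(hi). The bracket is an illustrative default.
    """
    if not (df(lo) < p < df(hi)):
        raise ValueError("p is outside the range of f' on the bracket")
    # Bisection on the increasing function df(x) - p.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if df(mid) < p:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    return p * x - f(x)

# Illustrative check with f(x) = exp(x), so f'(x) = exp(x) as well.
p = 2.0
approx = legendre_transform(math.exp, math.exp, p)
exact = p * math.log(p) - p   # closed form of the transform for p > 0
```

Since $x(p)$ is a critical point of $x \mapsto px - f(x)$, small errors in the root affect the computed value only to second order, so `approx` and `exact` should agree closely.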
This new function is convex. Indeed, a simple computation shows that $(f^*)'(p) = x(p)$ and therefore
$$(f^*)''(p) = x'(p) = \frac{1}{f''(x(p))} > 0.$$
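From the formula for $f^*$ it follows that, given $x$, we have
$$f(x) = x\,p - f^*(p),$$
where, as before, $p = f'(x)$. The formula above reveals that $(f^*)^* = f$ (i.e. the transformation is involutive), since
$$(f^*)'(p) = x(p) + p\,x'(p) - f'(x(p))\,x'(p) = x(p),$$
so that $x$ is precisely the slope of $f^*$ at $p$.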
There is another way to look at this. For fixed $p$ in the range of $f'$, the condition $f'(x) = p$ yields the (unique) critical point of the concave function $x \mapsto px - f(x)$. This point is a global maximum. Thus,
$$f^*(p) = \max_{x}\,\bigl(px - f(x)\bigr).$$
The latter equation implies the well-known Young’s inequality:
$$px \le f(x) + f^*(p),$$
valid for any $x$ and any $p$ in the range of $f'$. Equality takes place when $x, p$ are linked by the relation $p = f'(x)$.
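As a worked illustration of the formulas above, consider the quadratic $f(x) = \tfrac{a}{2}x^2 + bx + c$ with $a > 0$ (an illustrative choice of normalization, not taken from the text). The steps are: solve $f'(x) = p$, substitute into $p\,x - f(x)$, and then repeat the procedure on $f^*$ to check that the original function is recovered.

```latex
\begin{align*}
f'(x) = ax + b = p
  &\;\Longrightarrow\; x(p) = \frac{p - b}{a},\\
f^*(p) = p\,x(p) - f(x(p))
  &= \frac{(p-b)^2}{a} - \frac{(p-b)^2}{2a} - c
   = \frac{(p-b)^2}{2a} - c,\\
(f^*)'(p) = \frac{p-b}{a} = x
  &\;\Longrightarrow\; p(x) = ax + b,\\
(f^*)^*(x) = x\,p(x) - f^*(p(x))
  &= x(ax+b) - \frac{a x^2}{2} + c
   = \frac{a}{2}x^2 + bx + c = f(x).
\end{align*}
```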
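Examples: If $f(x) = \frac{x^2}{2}$, then $f^*(p) = \frac{p^2}{2}$. More generally, if $f(x) = \frac{x^\alpha}{\alpha}$ with $\alpha > 1$ for $x > 0$, then $f^*(p) = \frac{p^\beta}{\beta}$, where $\frac{1}{\alpha} + \frac{1}{\beta} = 1$. If $f(x) = e^x$, then $f^*(p) = p\ln p - p$ for $p > 0$, etc.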
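Young's inequality for a conjugate pair can be spot-checked numerically. The sketch below assumes the power pair $f(x) = x^\alpha/\alpha$, $f^*(p) = p^\beta/\beta$ with $1/\alpha + 1/\beta = 1$ on $x, p > 0$; the exponent $\alpha = 3$, the sampling ranges, and all names are illustrative assumptions.

```python
import random

def f(x, alpha):
    return x ** alpha / alpha

def f_star(p, beta):
    return p ** beta / beta

alpha = 3.0
beta = alpha / (alpha - 1.0)   # conjugate exponent: 1/alpha + 1/beta = 1

rng = random.Random(0)         # fixed seed for reproducibility
worst_gap = float("inf")
for _ in range(10_000):
    x = rng.uniform(0.01, 5.0)
    p = rng.uniform(0.01, 5.0)
    # Young's inequality says this gap is never negative.
    gap = f(x, alpha) + f_star(p, beta) - p * x
    worst_gap = min(worst_gap, gap)

# Equality case: x and p linked by p = f'(x) = x**(alpha - 1).
x0 = 1.7
p0 = x0 ** (alpha - 1.0)
equality_gap = f(x0, alpha) + f_star(p0, beta) - p0 * x0
```

Up to rounding, `worst_gap` stays non-negative over the random samples, while `equality_gap` vanishes.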
Generalizations
So far, we have assumed that our function $f$ is twice differentiable, with $f'' > 0$ in its domain. However, the following slight modification of the maximum formula,
$$f^*(p) = \sup_{x}\,\bigl(px - f(x)\bigr),$$
is meaningful for any real $p$ and any function $f$ if we accept the values $\pm\infty$ in the range of $f^*$. The resulting function $f^*$, being the supremum of a family of affine functions of $p$, is convex on its domain. This definition, however, is not completely satisfactory for non-convex functions, as it loses information about the original function. However, for a convex function defined on $\mathbb{R}$, $f^*$ encodes all the information about $f$ and, moreover, $(f^*)^* = f$, a fact known as the Fenchel-Moreau theorem [3]. Namely,
$$f(x) = \sup_{p}\,\bigl(px - f^*(p)\bigr),$$
which can be thought of as a representation of $f$ as the envelope of its tangents. For convex functions defined on some domain $D \subset \mathbb{R}$, an extra technical condition is needed on $f$ for the identity $(f^*)^* = f$ to hold, namely lower semicontinuity, which is equivalent to the closedness of its epigraph. It is easy to see that this is a necessary condition for the identity to hold. Indeed, the representation of $f$ as a supremum of affine functions furnishes a representation of the epigraph of $f$ as an intersection of closed half-planes, hence necessarily closed.
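The loss of information for non-convex functions can be seen in a small grid experiment. This is a rough sketch under stated assumptions: the supremum is replaced by a maximum over finite grids (`xs` for points, `ps` for slopes), and the test functions $x^2$ and the double well $(x^2-1)^2$ are illustrative choices. Applying the discrete transform twice should reproduce the convex function up to grid error, while for the double well it returns the convex envelope, which vanishes at $x = 0$ although the function equals $1$ there.

```python
def conjugate(xs, fs, ps):
    """Discrete Fenchel transform: for each p, max over the grid of p*x - f(x)."""
    return [max(p * x - fx for x, fx in zip(xs, fs)) for p in ps]

n = 401
xs = [-2.0 + 4.0 * i / (n - 1) for i in range(n)]   # points in [-2, 2]
m = 601
ps = [-30.0 + 60.0 * k / (m - 1) for k in range(m)] # slopes in [-30, 30]

convex = [x * x for x in xs]                        # f(x) = x^2
double_well = [(x * x - 1.0) ** 2 for x in xs]      # non-convex example

def biconjugate(fs):
    """Apply the discrete transform twice, returning values on xs."""
    f_star = conjugate(xs, fs, ps)
    return conjugate(ps, f_star, xs)

bi_convex = biconjugate(convex)
bi_well = biconjugate(double_well)

# Largest deviation between the convex function and its biconjugate.
err_convex = max(abs(a - b) for a, b in zip(bi_convex, convex))

mid = n // 2   # index of the grid point x = 0
```

Comparing `double_well[mid]` with `bi_well[mid]` shows the gap between the function and its convex envelope at the origin.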
In the context of Convex Analysis, the more general transformation given by the supremum formula is called the Fenchel transform (or Fenchel-Legendre transform) and the more general inequality
$$px \le f(x) + f^*(p)$$
is called the Fenchel inequality (or Fenchel-Young inequality).
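Thus, the transformation can be applied to a piecewise linear, convex function. For instance, if $f$ is a linear function, $f(x) = ax + b$, then clearly $f^*$ is finite only for $p = a$, with $f^*(a) = -b$. If $f$ is made of two linear functions, $f(x) = a_1 x + b_1$ when $x \le x_0$ and $f(x) = a_2 x + b_2$ when $x \ge x_0$, with $a_1 < a_2$ and $a_1 x_0 + b_1 = a_2 x_0 + b_2$, then $f^*$ is finite only for $p \in [a_1, a_2]$, where it is linear and ranges from $-b_1$ to $-b_2$. In general, to each “corner” of a polygonal graph there corresponds a segment on the graph of $f^*$ [2].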
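The two-piece case can be made concrete with a short sketch. It assumes the convex function is written as $f(x) = \max(a_1 x + b_1,\; a_2 x + b_2)$ with $a_1 < a_2$, so that the corner sits at $x_0 = (b_1 - b_2)/(a_2 - a_1)$; the function name and the sample coefficients are hypothetical.

```python
import math

def conjugate_two_piece(a1, b1, a2, b2, p):
    """f*(p) for f(x) = max(a1*x + b1, a2*x + b2), assuming a1 < a2."""
    if p < a1 or p > a2:
        return math.inf              # p*x - f(x) is unbounded above
    x0 = (b1 - b2) / (a2 - a1)       # corner of the graph
    return p * x0 - (a1 * x0 + b1)   # the supremum is attained at the corner

# Illustrative coefficients: corner at x0 = 1, where f(x0) = -1.
a1, b1, a2, b2 = -1.0, 0.0, 2.0, -3.0

left_end = conjugate_two_piece(a1, b1, a2, b2, a1)    # equals -b1
right_end = conjugate_two_piece(a1, b1, a2, b2, a2)   # equals -b2
outside = conjugate_two_piece(a1, b1, a2, b2, 3.0)    # infinite

# Brute-force comparison on integer points, which include the corner.
p = 0.5
brute = max(p * x - max(a1 * x + b1, a2 * x + b2) for x in range(-100, 101))
exact = conjugate_two_piece(a1, b1, a2, b2, p)
```

The corner of $f$ at $x_0$ turns into the linear segment of $f^*$ over $[a_1, a_2]$, whose slope is $x_0$.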
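The generalization to functions $f:\mathbb{R}^n \to \mathbb{R}$ is straightforward. Under the smoothness and strict convexity assumption
$$D^2 f(x) > 0 \quad \text{(positive definite Hessian)},$$
the mapping $x \mapsto df(x)$ is one-to-one. If $p = \nabla f(x)$ is the vector representing $df(x)$ via the standard Euclidean structure, we define the Legendre transform by
$$f^*(p) = \langle p, x(p) \rangle - f(x(p)),$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product and $x(p)$ is the solution of $\nabla f(x) = p$. Much like in the one-dimensional case, the previous definition is equivalent to
$$f^*(p) = \max_{x}\,\bigl(\langle p, x \rangle - f(x)\bigr).$$
Finally, we relax the smoothness and strict convexity assumptions and arrive at the most general definition
$$f^*(p) = \sup_{x}\,\bigl(\langle p, x \rangle - f(x)\bigr),$$
where now $f$ is merely convex and $p \in \mathbb{R}^n$. The Fenchel-Moreau theorem holds without modifications under the extra assumption of lower semicontinuity of $f$ if it is not defined on all of $\mathbb{R}^n$.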
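For a concrete multivariate case, consider a positive definite quadratic form $f(x) = \tfrac12\langle Ax, x\rangle$ on $\mathbb{R}^2$. Here $\nabla f(x) = Ax$, so $x(p) = A^{-1}p$ and the transform is $f^*(p) = \tfrac12\langle p, A^{-1}p\rangle$. The sketch below is a minimal pure-Python check of that identity; the matrix `A`, the vector `p`, and the helper names are illustrative assumptions.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def solve_2x2(A, p):
    """Solve A x = p for a 2x2 matrix A by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * p[0] - b * p[1]) / det, (a * p[1] - c * p[0]) / det]

def f(A, x):
    """Quadratic form f(x) = 0.5 * <A x, x>."""
    Ax = [dot(row, x) for row in A]
    return 0.5 * dot(Ax, x)

def legendre_quadratic(A, p):
    x = solve_2x2(A, p)            # x(p) solves grad f(x) = A x = p
    return dot(p, x) - f(A, x)     # <p, x(p)> - f(x(p))

A = [[2.0, 1.0], [1.0, 3.0]]       # symmetric, positive definite
p = [1.0, -2.0]

value = legendre_quadratic(A, p)
closed_form = 0.5 * dot(p, solve_2x2(A, p))   # 0.5 * <p, A^{-1} p>

# The value should dominate <p, x> - f(x) at any other point x.
other = dot(p, [0.3, -0.4]) - f(A, [0.3, -0.4])
```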
One can also consider “partial” Legendre transforms, i.e. transforms relative to some of the variables. Thus if $f(x, y)$ is a function of two variables, one can consider its transform with respect to the first variable,
$$f^*(p, y) = \sup_{x}\,\bigl(px - f(x, y)\bigr).$$
For smooth, strictly convex functions and fixed $(p, y)$, the supremum is achieved at the (unique) $x$ defined by
$$\frac{\partial f}{\partial x}(x, y) = p.$$
The transform can be further generalized to functions on manifolds, but since a manifold $M$ does not have a global linear structure, the duality is established locally, between functions on the tangent bundle $TM$ and their “conjugates” or “duals” on the cotangent bundle $T^*M$. Namely, the transform connects functions of $(q, v)$ with functions of $(q, p)$, where $v \in T_qM$ and $p \in T_q^*M$.
Applications
A) Clairaut’s differential equation
A standard example of an ODE not solved for the derivative is Clairaut’s equation
$$y = x\,y' - f(y').$$
It clearly admits the family of straight lines $y = Cx - f(C)$ as solutions. The envelope of the family is a singular solution satisfying $x = f'(C)$ and $y = Cx - f(C)$. But these are precisely the relations defining the Legendre transform of $f$. We conclude that, for convex $f$, the singular solution of Clairaut’s equation is its Legendre transform.
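A quick worked check, with the illustrative choice $f(C) = C^2/2$ (not taken from the text): the envelope of the line family should coincide with the Legendre transform $f^*(x) = x^2/2$ and should itself solve the equation.

```latex
\begin{align*}
\text{Equation:}\quad & y = x\,y' - \tfrac12 (y')^2,\\
\text{Line family:}\quad & y = Cx - \tfrac12 C^2,\\
\text{Envelope condition:}\quad & 0 = \frac{\partial}{\partial C}\Bigl(Cx - \tfrac12 C^2\Bigr) = x - C
   \;\Longrightarrow\; C = x,\\
\text{Singular solution:}\quad & y = x\cdot x - \tfrac12 x^2 = \tfrac12 x^2 = f^*(x),\\
\text{Verification:}\quad & y' = x,\qquad x\,y' - \tfrac12 (y')^2 = x^2 - \tfrac12 x^2 = \tfrac12 x^2 = y.
\end{align*}
```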
B) Hamiltonian Mechanics from Lagrangian Mechanics.
For many Physics, Applied Math, and Engineering students, this is their first introduction to the Legendre transform. The Lagrangian $L(q, \dot q, t)$ of a mechanical system on its configuration space $M$ (a differentiable, Riemannian manifold) completely describes the system. The actual path $q(t)$ joining two states $(q_0, t_0)$ and $(q_1, t_1)$ is an extremal of the action functional,
$$S[q] = \int_{t_0}^{t_1} L(q, \dot q, t)\,dt$$
(Hamilton’s principle) and therefore satisfies the second-order Euler-Lagrange differential equations
$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0.$$
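The geometry of the configuration space is intimately connected to the Physics. Thus, holonomic constraints are built into the manifold, the kinetic energy is nothing but the Riemannian metric on the manifold, geodesics relative to this metric represent motion “by inertia”, etc. The alternative Hamiltonian description reveals connections to a different geometry and is introduced as follows. Assuming that the Lagrangian is strictly convex in the generalized velocities $\dot q$, we introduce the Hamiltonian of the system as the Legendre transform of the Lagrangian with respect to $\dot q$:
$$H(q, p, t) = \langle p, \dot q \rangle - L(q, \dot q, t),$$
where $p = \partial L / \partial \dot q$ is the generalized momentum, an element of the cotangent fiber at $q$, and $\dot q$ on the right-hand side is expressed through $(q, p, t)$ by inverting this relation. Since
$$\frac{\partial H}{\partial p} = \dot q, \qquad \frac{\partial H}{\partial q} = -\frac{\partial L}{\partial q},$$
and $\dot p = \partial L / \partial q$ by the Euler-Lagrange equations, it follows that $(q(t), p(t))$ satisfy the first-order Hamiltonian system:
$$\dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q}.$$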
The space $T^*M$ is called the phase space; it carries a remarkable canonical symplectic structure, which the flow generated by the Hamiltonian function preserves. The Hamiltonian flow (if $H$ does not depend on time) is a one-parameter subgroup of the group of symplectomorphisms of the phase space. A straightforward consequence is Liouville’s Theorem on the preservation of the phase volume (a cornerstone of Statistical Mechanics). The Lagrangian approach has, however, certain advantages, including: a) it is easier to deal with constraints, even non-holonomic ones, via Lagrange multipliers; b) it is easier to track conserved quantities via Noether’s theorem; c) non-conservative forces can be incorporated, etc.
It is worth noticing that the above “Hamiltonization” applies to general variational problems, not just to the problem related to mechanical systems. In the case of mechanical systems, the part of the Lagrangian depending on $\dot q$ is usually a positive definite quadratic function (the kinetic energy of the system), hence convex. The transform of a quadratic form is especially simple: it is just another quadratic form of the conjugate variable. For instance, the Lagrangian of a simple mass-spring system, assuming that the spring is linear and $x$ represents the deviation of the mass from equilibrium, is
$$L(x, \dot x) = \frac{m \dot x^2}{2} - \frac{k x^2}{2},$$
and the Hamiltonian is
$$H(x, p) = \frac{p^2}{2m} + \frac{k x^2}{2},$$
which represents the total energy of the system. That is the case whenever the Lagrangian is quadratic in the velocities, the system is conservative and the constraints are time-independent.
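The mass-spring example can be traced step by step in code. The sketch below assumes the standard Lagrangian $L = \tfrac12 m v^2 - \tfrac12 k x^2$, computes the Hamiltonian as $p\,v - L$ with $v = p/m$, and then integrates Hamilton's equations with a semi-implicit (symplectic) Euler step. The parameter values, step size, and function names are illustrative assumptions, and the integrator is a swapped-in numerical technique, not something taken from the text.

```python
def lagrangian(x, v, m=1.0, k=4.0):
    return 0.5 * m * v * v - 0.5 * k * x * x

def hamiltonian_from_lagrangian(x, p, m=1.0, k=4.0):
    v = p / m                       # invert p = dL/dv = m*v
    return p * v - lagrangian(x, v, m, k)

def hamiltonian_closed_form(x, p, m=1.0, k=4.0):
    return p * p / (2.0 * m) + 0.5 * k * x * x

def symplectic_euler(x, p, dt, steps, m=1.0, k=4.0):
    """Integrate dx/dt = dH/dp = p/m, dp/dt = -dH/dx = -k*x."""
    for _ in range(steps):
        p = p - dt * k * x          # momentum update uses the old position
        x = x + dt * p / m          # position update uses the new momentum
    return x, p

x0, p0 = 1.0, 0.0
h_transform = hamiltonian_from_lagrangian(0.3, -0.7)
h_closed = hamiltonian_closed_form(0.3, -0.7)

energy_start = hamiltonian_closed_form(x0, p0)
x1, p1 = symplectic_euler(x0, p0, dt=0.001, steps=10_000)
energy_end = hamiltonian_closed_form(x1, p1)
```

The two expressions for the Hamiltonian agree, and the energy along the numerical trajectory stays close to its initial value, as expected for a conservative system.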
C) Thermodynamic potentials
Yet another early use of the transform was in Thermodynamics, as a way to switch between potentials according to the most convenient independent parameters.
According to the first principle of Thermodynamics (energy conservation), there exists a function of state $U$ (internal energy) such that, for any thermodynamical system undergoing an infinitesimal change of state,
$$dU = \delta Q + \delta W + \sum_i \mu_i\,dN_i,$$
where $\delta Q$ represents the heat (thermal energy) added to the system, $\delta W$ is the work of the surroundings on the system and the last sum above accounts for the energy added to the system by means of particle exchange. Here, $dN_i$ is the amount of particles of the $i$-th type added to the system, and $\mu_i$ the corresponding chemical potential. The relevant fact here is that $dU$ is an actual (exact) differential, whereas the rest of the terms are just differential forms in the configuration space of the system. The term $\delta W$ is in general a differential form $\sum_j X_j\,dx_j$ involving intensive ($X_j$) and conjugate extensive variables $x_j$. Thus, for example, in the case of a gas expanding/compressing against the environment, the work done on the gas is $\delta W = -P_{\mathrm{ext}}\,dV$, where $dV$ is the infinitesimal change of volume and $P_{\mathrm{ext}}$ is the external pressure. Moreover, according to the second principle, for reversible processes the form $\delta Q$ admits an integrating factor $1/T$, namely there is a function of state called entropy $S$ such that
$$dS = \frac{\delta Q}{T}.$$
Putting all together, for an infinitesimal, reversible (hence quasi-static) process undergone by a gas we get
$$dU = T\,dS - P\,dV + \sum_i \mu_i\,dN_i,$$
where $P$ is the pressure of the gas in equilibrium with its surroundings.
While $U$ accounts for the total energy of the system, related state functions accounting for different manifestations of energy may be more convenient for particular experimental or theoretical scenarios. For instance, many chemical reactions occur at constant pressure in lab conditions. In such cases, it is convenient to include a term representing the work needed to push aside the surroundings to occupy volume $V$ at pressure $P$, namely we define the enthalpy of the system as
$$H = U + PV.$$
Given that, thanks to the relation $dU = T\,dS - P\,dV + \sum_i \mu_i\,dN_i$, we have
$$\frac{\partial U}{\partial V} = -P,$$
$H$ is nothing but (minus) the Legendre transform of $U$ with respect to $V$, that is,
$$H = -\bigl((-P)\,V - U\bigr).$$
(In Thermodynamics, the transform is usually defined with opposite sign so there is no “minus” in the previous formula. A possible reason is that one prefers all forms of energy to increase/decrease in agreement.) Assuming for simplicity that $dN_i = 0$, we have
$$dH = dU + P\,dV + V\,dP = T\,dS + V\,dP$$
and, therefore, at constant pressure, $dH = T\,dS = \delta Q$. Thus, the change of enthalpy determines if a given chemical reaction is exothermic or endothermic. Moreover, $T = \dfrac{\partial H}{\partial S}$ and $V = \dfrac{\partial H}{\partial P}$.
In a similar fashion, one can consider the (opposite of the) Legendre transform of $U$ with respect to entropy,
$$F = U - TS,$$
called the (Helmholtz) free energy. A similar computation shows that
$$dF = -S\,dT - P\,dV,$$
thus we can think of $dF$ as the pressure-volume work on the system under fixed temperature.
Yet another thermodynamic potential, the Gibbs free energy, is a measure of the maximum reversible work a system can perform at constant pressure (P) and temperature (T), excluding expansion work. It is useful to determine if a given process occurs spontaneously (e.g., chemical reactions, phase transitions) and equilibrium conditions.
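In the same notation, the Gibbs free energy can be written as a transform of the internal energy with respect to both entropy and volume. The block below is a brief sketch of the standard relations, assuming the symbol $G$ and the fundamental relation $dU = T\,dS - P\,dV + \sum_i \mu_i\,dN_i$ for a reversible process.

```latex
\begin{align*}
G &= U - TS + PV,\\
dG &= dU - T\,dS - S\,dT + P\,dV + V\,dP
    = -S\,dT + V\,dP + \sum_i \mu_i\,dN_i .
\end{align*}
```

At constant temperature and pressure only the particle-exchange term survives, which is consistent with the use of this potential for reactions and phase transitions.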
Bibliography
[1] Legendre, A. M., “Mémoire sur l’intégration de quelques équations aux différences partielles.” Histoire de l’Académie Royale des Sciences, 1789, pp. 309–351.
[2] Arnold, V. I. “Mathematical Methods of Classical Mechanics”, 2nd Edition, Graduate Texts in Mathematics (60), Springer, 1989.
[3] Boyd, Stephen P. and Vandenberghe, L. “Convex Optimization”. Cambridge University Press, 2004.