Lagrange multipliers and their origins in Mechanics

It is not difficult to prove, by the theory of elimination for linear equations, that we obtain the same results if we simply add to the equation of virtual velocities the various equations of condition dL=0, \, dM=0, \, dN=0, \, etc., each multiplied by an undetermined coefficient, and then set equal to zero the sum of all the terms that are multiplied by the same differential… – J. L. Lagrange, “Méchanique Analytique”.

Lagrange introduced the so-called “multipliers” in a very natural way to deal with constrained mechanical systems. Unfortunately, most Calculus textbooks present the method in the language of vectors, which was unavailable to Lagrange, and illustrate it with a few completely artificial examples.

In some texts, a geometric interpretation is provided in the two-dimensional case, where only one constraint can be imposed. The argument amounts to showing that at a relative extremum the level set of the objective function f cannot be “transversal” to the level set of the constraint g=0, and therefore their gradients have to be collinear. However appealing this interpretation might be, it does not correspond to the historical development of the method; it is rather an afterthought. A similar example would be motivating the Pythagorean theorem using vectors and the inner product in \mathbb{R}^2. I believe that re-interpreting mathematical facts from a higher perspective is very useful, but using those re-interpretations as motivation is misleading.

Let’s look at how Lagrange himself came up with the idea of multipliers.

A fundamental principle of Mechanics is that of virtual displacements (virtual work, d’Alembert-Lagrange principle). It reads as follows: a mechanical system is in equilibrium if the total work of applied forces is zero for any virtual displacement of the system. The term “applied forces” is used to exclude forces of constraint, i.e. those which account for constraints of the system. By definition, virtual displacements are infinitesimal displacements consistent with the constraints at a given time, see [2], [3]. Thus the main assumption in this principle is that the work of the forces of constraint is zero on the allowed displacements at any given time. Such constraints are called ideal.

A simple example is that of a block sliding along a horizontal surface. The force of constraint in this case is the normal reaction from the surface which, being perpendicular to allowed displacements, does not do work. Another example is that of a bead threaded onto a wire and forced to move along the wire. The force of constraint that keeps the bead sliding along the wire is perpendicular to the wire at each moment.

It is important to distinguish between actual displacements and virtual displacements. Even if the wire in the second example is moving and the actual displacement of the bead has a component normal to the wire, virtual displacements are still along the wire. In the terminology of Lagrangian formalism, virtual displacements are tangent vectors to the configuration space at each time, see [2], [3].

This principle has a long history, going back to the principle of the lever (Archimedes). It appeared in some form in the works of Stevin, Galileo, Wallis, Varignon and many others to tackle problems in Statics, and took its final form in the hands of Johann Bernoulli and J. d’Alembert who, in a leap of genius, generalized it to systems not necessarily in equilibrium, by including the forces of inertia. It constitutes the guiding principle in Lagrange’s monumental work “Méchanique Analytique” (1788) [1], whose first chapter contains a detailed historical account of the subject. Its main advantage compared to the Newtonian (vectorial) approach is that we do not need to know the forces of constraint, but only the constraints themselves, expressed as relations between positions, velocities, etc. The reaction forces can then be computed using Lagrange multipliers (see below).

Suppose we have a system of N particles (point masses) in space with positions (x_{i},y_{i},z_{i}), i=1,2,\dots,N. Let the total force acting on the i-th particle be F_i=(P_i,Q_i,R_i). If there are no constraints on the particles, all possible displacements d\mathbf{r}_i=(dx_{i},dy_{i},dz_{i}) are admissible and, according to the principle of virtual displacements, the system is in equilibrium if and only if

\sum_i F_i\cdot d\mathbf{r}_i=0\qquad\qquad\qquad\qquad (1)

for all d\mathbf{r}_i. By freezing all of the \mathbf{r}_i but one (that is, setting the corresponding d\mathbf{r}_i to zero), we see that each of the addends in the above sum has to vanish, that is

P_idx_i+Q_idy_i+R_idz_i=0.

Given that dx_i,dy_i,dz_i are arbitrary, this in turn implies

P_i=Q_i=R_i=0\qquad\textrm{for all\ \ }i,

in accordance with Newton’s first law. It is important to note here that the conclusion follows from the fact that the virtual displacements are completely arbitrary.

What if they are not? That is, what if our system is subject to some constraints? One possibility is to add the forces responsible for the constraints, but this can actually be avoided as follows. Suppose we have an analytic description of the constraint(s) of the form

g_1(\mathbf{r}_1,\mathbf{r}_2,\dots \mathbf{r}_N)=0,\quad\dots\quad  g_m(\mathbf{r}_1,\mathbf{r}_2,\dots \mathbf{r}_N)=0,

where, for simplicity, we assume that our constraints are scleronomic, i.e. holonomic (finite) and time independent. These constraints reduce the freedom of the virtual displacements, which are now forced to be tangent to the above surfaces in 3N-dimensional space and therefore to their (generically) (3N-m)-dimensional intersection, a manifold in \mathbb{R}^{3N}. To calculate those virtual displacements, we differentiate each one of the above equations, yielding

\left\{\begin{array}{l}\displaystyle{\frac{\partial g_1}{\partial \mathbf{r}_1}\cdot d\mathbf{r}_1+\frac{\partial g_1}{\partial \mathbf{r}_2}\cdot d\mathbf{r}_2+\dots \frac{\partial g_1}{\partial \mathbf{r}_N}\cdot d\mathbf{r}_N=0}\\ [15pt]\displaystyle{\frac{\partial g_2}{\partial \mathbf{r}_1}\cdot d\mathbf{r}_1+\frac{\partial g_2}{\partial \mathbf{r}_2}\cdot d\mathbf{r}_2+\dots \frac{\partial g_2}{\partial \mathbf{r}_N}\cdot d\mathbf{r}_N=0}\\[15pt]\cdots\cdots\cdots\cdots\end{array}\qquad\qquad\qquad\quad (2)\right.

where each addend is an inner product and

\displaystyle{\frac{\partial g_k}{\partial \mathbf{r}_i}=\left(\frac{\partial g_k}{\partial x_i},\,\frac{\partial g_k}{\partial y_i},\, \frac{\partial g_k}{\partial z_i}\right)}

are the partial gradients with respect to each particle. Given that now the d\mathbf{r}_i are not arbitrary but linked by the above relations, we cannot conclude from equation (1) that F_i=0 for all i. One possibility would be to solve the linear system (2) for m differentials in terms of 3N-m independent ones and substitute into (1). Then, the equilibrium condition would be obtained by setting the resulting coefficients of the independent differentials equal to zero.
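As an illustration of this elimination route, here is a sketch for a single particle and a single constraint g(x,y,z)=0. It assumes, for the sake of the example, that \partial g/\partial z\neq 0 at the point considered, so that dz can be solved for in terms of dx and dy.

```latex
\begin{aligned}
dz &= -\frac{1}{\partial g/\partial z}\left(\frac{\partial g}{\partial x}\,dx+\frac{\partial g}{\partial y}\,dy\right),\\
0 &= P\,dx+Q\,dy+R\,dz
   =\left(P-R\,\frac{\partial g/\partial x}{\partial g/\partial z}\right)dx
   +\left(Q-R\,\frac{\partial g/\partial y}{\partial g/\partial z}\right)dy,\\
&\Longrightarrow\quad
P\,\frac{\partial g}{\partial z}-R\,\frac{\partial g}{\partial x}=0,\qquad
Q\,\frac{\partial g}{\partial z}-R\,\frac{\partial g}{\partial y}=0.
\end{aligned}
```

The two resulting conditions say that (P,Q,R) is proportional to the gradient of g, but they were obtained at the price of singling out the variable z.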

The method of multipliers provides an elegant way to do just that. Let’s assume, for simplicity, that there is only one constraint and one particle, N=m=1. Then (1) and (2) give, respectively,

Pdx+Qdy+Rdz=0;\qquad A_xdx+A_ydy+A_zdz=0,\qquad\qquad\qquad (3)

where A_x=A_x(x,y,z)=\partial g/\partial x, etc. are the partial derivatives of the constraint function g. At least one of A_x,A_y,A_z has to be non-zero if g=0 is a regular surface. In order to eliminate one of the differentials from (3) we multiply the second equation by an undetermined multiplier \lambda and add it to the first. At any given point we can choose \lambda in such a way that one of the quantities

P+\lambda A_x,\, Q+\lambda A_y,\, R+\lambda A_z

vanishes. But then the other two must also vanish, being the coefficients of the remaining two differentials, which are independent. Thus in any case, eliminating \lambda from the equations

P+\lambda A_x=0,\, Q+\lambda A_y=0,\, R+\lambda A_z=0

and adjoining the constraint g(x,y,z)=0, we get the equilibria.
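As a hedged numerical illustration (not taken from Lagrange’s text), consider a particle under gravity constrained to the sphere g=x^2+y^2+z^2-l^2=0, i.e. a spherical pendulum. The sketch below picks \lambda by projecting the force onto the gradient of g, a least-squares variant of “choose \lambda so that one coefficient vanishes”, and then checks whether the whole vector F+\lambda A vanishes. The function names, the numerical values and the symbol grav for the gravitational acceleration are assumptions made for this example.

```python
import math

def gradient_g(x, y, z):
    """Gradient (A_x, A_y, A_z) of the assumed constraint g = x^2 + y^2 + z^2 - l^2."""
    return (2.0 * x, 2.0 * y, 2.0 * z)

def multiplier_and_residual(force, point):
    """Choose lambda by projecting F onto A, then return (lambda, |F + lambda*A|).

    The point is an equilibrium exactly when the returned residual is zero.
    """
    A = gradient_g(*point)
    AA = sum(a * a for a in A)  # nonzero on a regular surface
    lam = -sum(f * a for f, a in zip(force, A)) / AA
    residual = math.sqrt(sum((f + lam * a) ** 2 for f, a in zip(force, A)))
    return lam, residual

# Illustrative values (assumptions): unit mass, pendulum length 2.
m, grav, l = 1.0, 9.81, 2.0
weight = (0.0, 0.0, -m * grav)

lam_bottom, res_bottom = multiplier_and_residual(weight, (0.0, 0.0, -l))
lam_top, res_top = multiplier_and_residual(weight, (0.0, 0.0, l))
lam_side, res_side = multiplier_and_residual(weight, (l, 0.0, 0.0))

print("bottom:", lam_bottom, res_bottom)
print("top:   ", lam_top, res_top)
print("side:  ", lam_side, res_side)
```

The residual vanishes at the lowermost and uppermost points, where \lambda=\mp m\,grav/(2l), and equals the full weight at a point on the equator, which is therefore not an equilibrium.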

Generalizing the above procedure to the case of more constraints or more particles is straightforward. We multiply each of the equations in (2) by its own multiplier \lambda_1,\lambda_2,\dots,\lambda_m and add the results to (1). Assuming that the constraints are independent, we choose the multipliers so as to remove m differentials at each point. The equations corresponding to removing those differentials and the ones resulting from equating to zero the remaining coefficients have the same form, giving the method its symmetry with respect to all variables. Finally, we adjoin the constraint equations themselves to close the system.
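A minimal sketch of the case N=1, m=2 may help. It assumes a particle confined to the straight line cut out by the two planes g_1=x+y+z-1=0 and g_2=x-y=0, chosen only for illustration. The two multipliers are obtained from the 2x2 normal equations of F+\lambda_1 A_1+\lambda_2 A_2=0, and the force is an equilibrium force exactly when the residual vanishes. All names and numbers are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def two_multipliers(force, grad1, grad2):
    """Solve the 2x2 normal equations for (lambda_1, lambda_2).

    Returns the multipliers and the residual vector F + lambda_1*A1 + lambda_2*A2.
    """
    g11, g12, g22 = dot(grad1, grad1), dot(grad1, grad2), dot(grad2, grad2)
    b1, b2 = -dot(grad1, force), -dot(grad2, force)
    det = g11 * g22 - g12 * g12  # nonzero when the constraints are independent
    lam1 = (b1 * g22 - g12 * b2) / det
    lam2 = (g11 * b2 - g12 * b1) / det
    residual = tuple(f + lam1 * a + lam2 * c
                     for f, a, c in zip(force, grad1, grad2))
    return lam1, lam2, residual

# Gradients of the assumed constraints g1 = x + y + z - 1 and g2 = x - y.
A1 = (1.0, 1.0, 1.0)
A2 = (1.0, -1.0, 0.0)

# A force orthogonal to the line: the residual vanishes, so this is an equilibrium.
lam1_eq, lam2_eq, res_eq = two_multipliers((3.0, 1.0, 2.0), A1, A2)

# A force with a component along the line: no choice of multipliers balances it.
lam1_no, lam2_no, res_no = two_multipliers((0.0, 0.0, -1.0), A1, A2)

print(lam1_eq, lam2_eq, res_eq)
print(lam1_no, lam2_no, res_no)
```

Because both constraints are linear here, the gradients are the same at every point of the line, so the answer does not depend on where the particle sits.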

The number of constraints is at most 3N-1, otherwise no freedom is left for the system. For example, one particle in space can be constrained to lie on a surface or on a line (the intersection of two surfaces). Adding one more constraint would completely fix its position.

This method is used for conditional optimization. If we are to minimize/maximize a function U of, say, three variables subject to constraints, we follow the above procedure with (1) replaced by the condition dU=0. When the work of a force can be put in the form dU for some scalar function U, we say that the force is conservative or potential, and U is a potential energy. Thus, a constrained conservative system is in equilibrium at a conditional extremum of its potential energy.
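A small worked example, chosen here only for illustration: find the stationary point of U=x^2+y^2+z^2 subject to the single constraint g=x+y+z-1=0. Replacing (1) by dU=0 and adding \lambda\,dg gives

```latex
\begin{aligned}
&dU+\lambda\,dg=(2x+\lambda)\,dx+(2y+\lambda)\,dy+(2z+\lambda)\,dz=0\\
&\Longrightarrow\quad 2x+\lambda=0,\quad 2y+\lambda=0,\quad 2z+\lambda=0
 \quad\Longrightarrow\quad x=y=z=-\lambda/2,\\
&x+y+z=1\quad\Longrightarrow\quad \lambda=-\tfrac{2}{3},\qquad x=y=z=\tfrac{1}{3}.
\end{aligned}
```

The resulting point is the point of the plane closest to the origin, where U=1/3.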

It is worth mentioning though that in the application to Mechanics, the type of equilibrium (max/min/saddle) is relevant and directly related to the notion of stability. For example, an ideal pendulum has two equilibria: the lowermost position and the uppermost position which correspond to a minimum and a maximum of the gravitational potential energy. The first one is stable and the second one is unstable. In order to keep the size of this post reasonable, we refrain from going into the topic of classification of extrema, which involves the analysis of the signature of the second differential of U over the subspace defined by virtual displacements.
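For the pendulum the classification can at least be sketched numerically. Assuming the constraint circle is parametrized by the angle \theta measured from the downward vertical, the potential energy restricted to the constraint is U(\theta)=-m\,grav\,l\cos\theta, and the sign of its second derivative at an equilibrium decides stability. The names, parameter values and the finite-difference step below are assumptions of this example, not part of the original argument.

```python
import math

def potential(theta, m=1.0, grav=9.81, l=1.0):
    """Gravitational potential energy of an ideal pendulum, theta from the downward vertical."""
    return -m * grav * l * math.cos(theta)

def second_derivative(theta, h=1e-3):
    """Central finite-difference approximation of U''(theta)."""
    return (potential(theta + h) - 2.0 * potential(theta) + potential(theta - h)) / h**2

def classify(theta):
    """Label an equilibrium by the sign of U'' along the constraint."""
    curvature = second_derivative(theta)
    if curvature > 0:
        return "stable"
    if curvature < 0:
        return "unstable"
    return "degenerate"

print("theta = 0 :", classify(0.0))
print("theta = pi:", classify(math.pi))
```

With one degree of freedom the signature of the second differential reduces to a single sign; for more degrees of freedom one would examine a quadratic form on the space of virtual displacements instead.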

When used to solve dynamics problems, the principle of virtual displacements includes the “force of inertia”. Thus, along the actual motion \mathbf{r}_i(t),

\sum_i\left(-m_i\mathbf{r}_i^{''}+F_i\right)\cdot d\mathbf{r}_i=0

for all virtual displacements d\mathbf{r}_i of the system. Here m_i stands for the mass of the i-th point and \mathbf{r}_i'' for its acceleration.

As a by-product of the method of multipliers, we can find the forces of constraint (reactions). For simplicity, suppose we have one point and one constraint g(x,y,z)=0, as before. The method of multipliers gives the equations of motion

\left\{\begin{array}{l}-mx''+F_x+\lambda\frac{\partial g}{\partial x}=0\\[15pt]-my''+F_y+\lambda\frac{\partial g}{\partial y}=0\\[15pt]-mz''+F_z+\lambda\frac{\partial g}{\partial z}=0\end{array}\right.

Therefore, the reaction force is

\displaystyle{R=\lambda\left(\frac{\partial g}{\partial x},\, \frac{\partial g}{\partial y},\, \frac{\partial g}{\partial z}\right).}
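To see how \lambda is actually determined, here is a hedged sketch for the spherical constraint g=x^2+y^2+z^2-l^2=0. Differentiating g twice along the motion gives \mathbf{r}\cdot\mathbf{r}''+|\mathbf{r}'|^2=0; inserting m\mathbf{r}''=F+2\lambda\mathbf{r} from the equations of motion yields \lambda=-(m|\mathbf{r}'|^2+F\cdot\mathbf{r})/(2l^2). The function name and the numerical values are assumptions made for this example.

```python
def sphere_reaction(position, velocity, force, mass):
    """Return (lambda, reaction) for a particle on g = x^2 + y^2 + z^2 - l^2 = 0.

    Assumes position lies on the sphere and velocity is tangent to it.
    """
    r2 = sum(x * x for x in position)   # equals l^2 on the constraint
    v2 = sum(v * v for v in velocity)
    f_dot_r = sum(f * x for f, x in zip(force, position))
    lam = -(mass * v2 + f_dot_r) / (2.0 * r2)
    reaction = tuple(2.0 * lam * x for x in position)  # lambda * grad g
    return lam, reaction

# Illustrative values (assumptions): pendulum passing through its lowest point.
m, grav, l, v0 = 1.0, 9.81, 2.0, 3.0
lam, reaction = sphere_reaction(
    position=(0.0, 0.0, -l),
    velocity=(v0, 0.0, 0.0),
    force=(0.0, 0.0, -m * grav),
    mass=m,
)
print(lam, reaction)
```

At the lowest point the computed reaction points upward with magnitude m\,grav+m v_0^2/l, the familiar tension of a pendulum string: the weight plus the centripetal term.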

A few final remarks:

1) The above method can be applied if the constraints are time-dependent (rheonomic). Precisely, if we have just one particle and a constraint of the form

g(x,y,z,t)=0

then, differentiating and setting dt=0, we find that the virtual displacements dx,\, dy,\, dz satisfy the same equation as before (second equation in (3)). In this case virtual displacements are different from real ones. Non-holonomic constraints cannot be handled by this method, except for special cases (e.g. linear in the velocities). See [3].

2) Holonomic constraints can be completely eliminated by introducing independent generalized coordinates which implicitly account for them. Thus if \mathbf{q}=(q_j), j=1,2,\dots,k, with k=3N-m, are generalized coordinates of the system and its kinetic energy is T=T(\mathbf{q},\mathbf{q}',t), we can write the equations of motion in the (Lagrangian) form

\displaystyle{\frac{d}{dt}\left(\frac{\partial T}{\partial q_j'}\right)-\frac{\partial T}{\partial q_j}=Q_j},

where Q_j are the generalized forces, [3]. By solving this second order system of equations we find the dynamics \mathbf{q}(t). But in some cases we are interested in the forces of constraint. One way to find them would be to substitute the solution into Newton’s second law and solve for the unaccounted forces. However, it is generally more convenient to use dependent generalized coordinates in combination with the method of multipliers to deal with constraints and find the forces of constraint as above.

3) If all the constraints are holonomic and forces are conservative with potential energy U, the principle of virtual displacements, which for one particle takes the form

\displaystyle{\left(m\mathbf{r}^{''}+\frac{\partial U}{\partial \mathbf{r}} \right)\cdot d\mathbf{r}=0};\qquad\qquad\qquad d\mathbf{r} - \textrm{virtual displacement}

is precisely the condition of (conditional) stationarity for the action functional

\displaystyle{S[\mathbf{q}]=\int\limits_{t_1}^{t_2}L(\mathbf{q}',\mathbf{q},t)\,dt},

where \mathbf{q} are, as before, generalized coordinates on the (3-m)-dimensional manifold M defined by the constraints, T=T(\mathbf{q}',\mathbf{q},t) is the kinetic energy and L=T-U is the Lagrangian of the system. The true motion \mathbf{q}(t) is an extremal of S among curves contained in M if only variations within M are allowed, [2]. The corresponding Euler-Lagrange equations furnish the equations of motion

\displaystyle{\frac{d}{dt}\left(\frac{\partial L}{\partial q_j'}\right)-\frac{\partial L}{\partial q_j}=0}.

Generalization to N>1 particles is straightforward.
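As a hedged worked example of remarks 2) and 3), take the ideal pendulum: one particle confined to a vertical circle of radius l, so that m=2 constraints leave a one-dimensional manifold. The generalized coordinate \theta (angle from the downward vertical) and the symbol g_0 for the gravitational acceleration, chosen to avoid a clash with the constraint function g, are notational assumptions of this example.

```latex
\begin{aligned}
&x=l\sin\theta,\qquad z=-l\cos\theta,\qquad
T=\tfrac{1}{2}\,m\,l^{2}\,\theta'^{\,2},\qquad U=-m\,g_0\,l\cos\theta,\\
&L=T-U=\tfrac{1}{2}\,m\,l^{2}\,\theta'^{\,2}+m\,g_0\,l\cos\theta,\\
&\frac{d}{dt}\left(\frac{\partial L}{\partial \theta'}\right)-\frac{\partial L}{\partial \theta}
 =m\,l^{2}\,\theta''+m\,g_0\,l\sin\theta=0.
\end{aligned}
```

The constraint has disappeared from the equation of motion, and so has the tension of the rod; the equilibria \theta=0 and \theta=\pi are the stationary points of U.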

References:

[1] J. L. Lagrange, “Mécanique Analytique”, Paris, Ve Courcier, 1811-15.

[2] V.I. Arnold, “Mathematical Methods of Classical Mechanics”, Graduate Texts in Mathematics, Vol. 60, 2nd Edition, 2010.

[3] H. Goldstein, Ch. Poole, J. Safko, “Classical Mechanics”, Addison Wesley, Third Edition, 2001.