To explain why this has a chance of working, consider the two-dimensional case. Suppose we have a function f(x,y) to maximise, subject to g(x,y) = c, where c is a given constant. We can visualise 'contours' of f given by f(x,y) = d for various values of d. The constraint is to stay on one contour, the one given by fixing the value of g. Suppose we walk along that contour, taking note of the various contours f = d we cross. If we cross a contour transversally, we can increase the value of f by walking 'uphill', that is, in the direction leading to higher values of f. Only where the contour f = d we are on touches the contour g = c we are confined to, rather than crossing it, is this impossible; at a constrained maximum of f, that tangency must hold.
A familiar example can be obtained from weather maps, with their contours for temperature and pressure: the constrained extrema will occur where the superposed maps show touching lines (isopleths).
Geometrically, we translate the tangency condition into saying that the gradients of f and g are parallel vectors at the maximum. Introducing an unknown scalar λ, this means the gradient of f − λg is 0 for some value of λ. This, in geometrical form, is the Lagrange multiplier argument: at a local extremum of the constrained problem, f − λg must be stationary, where the multiplier λ is treated as a new variable.
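As a small worked instance of this argument, with f and g chosen purely for illustration, take f(x, y) = x + y subject to g(x, y) = x² + y² = 1; the tangency condition ∇f = λ∇g then pins down the extrema:

```latex
% Illustrative problem: maximise f(x,y) = x + y on the circle x^2 + y^2 = 1.
\[
\begin{aligned}
&\nabla f = (1,\,1), \qquad \nabla g = (2x,\,2y),\\
&(1,\,1) = \lambda\,(2x,\,2y) \;\Longrightarrow\; x = y = \tfrac{1}{2\lambda},\\
&x^2 + y^2 = 1 \;\Longrightarrow\; x = y = \pm\tfrac{1}{\sqrt{2}},\qquad f_{\max} = \sqrt{2}.
\end{aligned}
\]
```

The constrained maximum f = √2 occurs exactly where the line x + y = d is tangent to the circle, as the contour picture above suggests.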
The method of Lagrange multipliers
Let f(x1, x2, …, xn) be a function defined on an n-dimensional open set.
We suppose f has continuous partial derivatives on that set.
A constraint is the hypersurface defined by g(x1, x2, …, xn) = C within the open set, where C is a constant.
Additionally, we require ∇g ≠ 0 everywhere on this hypersurface, where ∇ denotes the gradient.
Given that there is a local maximum (or minimum) of f(x) on the set {x ∈ Rⁿ | g(x) = C},
by setting the derivative of f along the hypersurface to 0, one can show that there exists a real number λ (the Lagrange multiplier) such that

∇f(x) = λ ∇g(x).

Written differently,

∇(f(x) − λ g(x)) = 0

for each x that is such a local extremum. This is therefore a necessary condition: it holds at all possible solutions of the local extremum problem.
In general one can apply the reasoning to several constraints, by introducing multipliers λ, μ, ... for each.
The constraint g(x) = C has to be used again when finding the solutions, since the criterion above does not involve the particular value of C.
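To make the several-constraint case above concrete, here is a small SymPy sketch; the functions f, g1, g2 and the variable names are invented purely for illustration. It solves the stationarity condition ∇f = λ∇g1 + μ∇g2 together with both constraints:

```python
# Sketch: stationarity with two constraints, grad f = lambda*grad g1 + mu*grad g2.
# The objective and constraints below are illustrative, not from the article.
from sympy import symbols, solve, diff

x, y, z, lam, mu = symbols('x y z lambda mu', real=True)

f  = x + y + z                  # objective (illustrative)
g1 = x**2 + y**2 + z**2 - 1     # first constraint: the unit sphere
g2 = z                          # second constraint: the plane z = 0

# One stationarity equation per variable, plus the constraints themselves.
eqs = [diff(f, v) - lam * diff(g1, v) - mu * diff(g2, v) for v in (x, y, z)]
eqs += [g1, g2]

for sol in solve(eqs, [x, y, z, lam, mu], dict=True):
    print(sol, 'f =', f.subs(sol))
# Stationary points: x = y = ±1/√2, z = 0, where f = ±√2.
```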
The method of Lagrange multipliers is generalized by the Kuhn–Tucker theorem.

Example

Suppose we wish to find the discrete probability distribution p1, p2, …, pn with maximal information entropy. Then

f(p1, p2, …, pn) = −∑k pk log2 pk.
Of course, the sum of these probabilities equals 1, so our constraint function is

g(p1, p2, …, pn) = ∑k pk = 1.
To find the probability distribution of maximum entropy, we can use the method of Lagrange multipliers. We set:

h(p1, p2, …, pn) = f(p1, p2, …, pn) + λ(∑k pk − 1) = −∑k pk log2 pk + λ(∑k pk − 1).
Differentiating h with respect to each of the n probabilities and setting the derivatives to 0, we get the n equations

∂h/∂pk = −log2 pk − 1/ln 2 + λ = 0, for k = 1, …, n.

Solving for pk gives pk = 2^(λ − 1/ln 2). This shows that all pk are equal, because they depend on λ only.
By using the constraint ∑k pk = 1 again, we get the uniform distribution

pk = 1/n for all k.
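As a sanity check of this result (a sketch, not part of the original derivation; the choice n = 4, the starting point, and the SLSQP solver are all illustrative assumptions), one can numerically maximise the entropy under the constraint ∑k pk = 1 and recover the uniform distribution:

```python
# Numeric check: maximise H(p) = -sum p_k log2 p_k subject to sum p_k = 1.
# The setup (n = 4, random start, SLSQP) is illustrative only.
import numpy as np
from scipy.optimize import minimize

n = 4

def neg_entropy(p):
    # Minimising -H is the same as maximising H; clip to avoid log(0).
    p = np.clip(p, 1e-12, 1.0)
    return np.sum(p * np.log2(p))

constraint = {'type': 'eq', 'fun': lambda p: np.sum(p) - 1.0}
bounds = [(0.0, 1.0)] * n
p0 = np.random.dirichlet(np.ones(n))   # arbitrary feasible starting point

result = minimize(neg_entropy, p0, method='SLSQP',
                  bounds=bounds, constraints=[constraint])
print(result.x)      # approx [0.25, 0.25, 0.25, 0.25]: the uniform distribution
print(-result.fun)   # approx log2(4) = 2 bits, the maximal entropy
```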
For another example, see also Derivation of the partition function.