Gradient ascent and descent

Published

Fri, Feb 20

Gradient ascent and descent are a fun way to find local maxima and minima of multivariate functions: ascent climbs towards maxima, and descent walks down towards minima.

Contours

Let’s briefly review contours before we get to how the gradient fits in.

A contour of a function \(z=f(x,y)\) is formed by intersecting a horizontal plane with the three-dimensional graph of \(f.\) Algebraically, a contour is the graph of an equation of the form \[f(x,y)=c,\] where \(c\in\mathbb{R}\) is constant.

As a simple but important example, let’s play with \(f(x,y)=x^2+y^2\). The graph of this function looks like so:

The mesh on the graph of that function is generated using polar coordinates, which we’ll talk about later this semester. Of importance for us now is the fact that the graph displays a rotational symmetry. The circles that we see are curves of the form \[ x^2+y^2 = c. \]
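To make the algebraic description concrete, here is a minimal Python sketch that samples points on the contour \(x^2+y^2=c\) and checks that \(f\) takes the value \(c\) at each of them. The helper name `contour_points` and the choice \(c=4\) are illustrative assumptions, not part of the original notes.

```python
import math

def f(x, y):
    """The paraboloid f(x, y) = x^2 + y^2."""
    return x**2 + y**2

def contour_points(c, n=8):
    """Return n points on the contour f(x, y) = c, a circle of radius sqrt(c).

    Hypothetical helper for illustration; assumes c >= 0.
    """
    r = math.sqrt(c)
    return [
        (r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

# Every sampled point should give the same function value, up to rounding.
for x, y in contour_points(4.0):
    print(f"f({x:+.3f}, {y:+.3f}) = {f(x, y):.3f}")
```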

If we draw a bunch of those curves together in a single plane, we get a contour diagram for the function:

When we see closed loops like these, they generally enclose some local extremum: either a local maximum or a local minimum.

Gradient vectors

We’ve also learned recently about the gradient. Algebraically, the gradient of a function \(f(x,y)\) is defined by \[ \nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right\rangle. \] Since it’s a vector, geometrically, the gradient is described by two properties:

  • Direction: The gradient points in the direction of greatest increase of \(f\) and
  • Magnitude: The magnitude tells you the rate of increase in that direction.
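As a quick illustration, for the paraboloid \(f(x,y)=x^2+y^2\) the definition gives \(\nabla f = \langle 2x, 2y\rangle\). The Python sketch below compares that formula with a central-difference approximation at one point. The function names, the step size `h`, and the sample point \((1,2)\) are assumptions chosen for the example.

```python
import math

def f(x, y):
    """The paraboloid f(x, y) = x^2 + y^2."""
    return x**2 + y**2

def grad_f(x, y):
    """Analytic gradient of the paraboloid: <2x, 2y>."""
    return (2.0 * x, 2.0 * y)

def numerical_grad(func, x, y, h=1e-6):
    """Central-difference approximation of the two partial derivatives."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

gx, gy = grad_f(1.0, 2.0)
magnitude = math.hypot(gx, gy)  # rate of increase in the gradient direction

print("analytic gradient: ", (gx, gy))
print("numerical gradient:", numerical_grad(f, 1.0, 2.0))
print("magnitude:         ", magnitude)
```

The two gradients should agree to several decimal places, and the magnitude is the length of the vector \(\langle 2, 4\rangle\), namely \(\sqrt{20}\).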

Two great tastes that taste great together

It turns out that something nice happens if we plot contours and gradients together. Here, for example, we do that with our basic paraboloid \(f(x,y)=x^2+y^2\):

Note that the gradient vectors are generally perpendicular to the contours. This is a consequence of the fact that the gradient points in the direction of greatest increase, while the contour travels along a direction of constant value, so \(f\) has no rate of change along the contour. Thus, for differentiable functions, this perpendicularity holds wherever the gradient is nonzero.
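One way to see the perpendicularity: if \(\mathbf{r}(t)\) traces out a contour, then \(f(\mathbf{r}(t))=c\) for all \(t\), and differentiating with the chain rule gives \(\nabla f\cdot\mathbf{r}'(t)=0\). The sketch below checks this numerically for the paraboloid, whose contours are circles. The helper `circle_point_and_tangent` and the radius 2 are hypothetical choices made for the illustration.

```python
import math

def grad_f(x, y):
    """Analytic gradient of f(x, y) = x^2 + y^2."""
    return (2.0 * x, 2.0 * y)

def circle_point_and_tangent(r, t):
    """Point on the circle of radius r at angle t, plus the tangent vector there."""
    point = (r * math.cos(t), r * math.sin(t))
    tangent = (-r * math.sin(t), r * math.cos(t))
    return point, tangent

# Dot product of the gradient with the contour's tangent at several points.
dots = []
for k in range(8):
    t = 2 * math.pi * k / 8
    (x, y), (tx, ty) = circle_point_and_tangent(2.0, t)
    gx, gy = grad_f(x, y)
    dots.append(gx * tx + gy * ty)

print(max(abs(d) for d in dots))  # essentially zero, up to rounding
```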

The peaks function

We’ve met the peaks function before:

\[ f(x,y) = 3 \, (1-x)^2 e^{-x^2-(y+1)^2}-10 \, e^{-x^2-y^2} \left(-x^3+\frac{x}{5}-y^5\right)-\frac{1}{3} \, e^{-(x+1)^2-y^2}. \]
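For experimenting, the formula can be transcribed into Python term by term. This is a minimal sketch using only the standard library; the names `peaks`, `term1`, `term2`, and `term3` are my own labels. As a sanity check, the middle term vanishes at the origin, so \(f(0,0) = (3-\tfrac13)e^{-1} = \tfrac{8}{3e}\).

```python
import math

def peaks(x, y):
    """The peaks function, transcribed term by term from the formula above."""
    term1 = 3 * (1 - x) ** 2 * math.exp(-x**2 - (y + 1) ** 2)
    term2 = -10 * math.exp(-x**2 - y**2) * (-x**3 + x / 5 - y**5)
    term3 = -(1 / 3) * math.exp(-((x + 1) ** 2) - y**2)
    return term1 + term2 + term3

# Sanity check at the origin against the hand-computed value 8 / (3e).
print(peaks(0.0, 0.0), 8 / (3 * math.e))
```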

The graph of the peaks function looks like so:

Here’s a contour plot:

And here are some gradient vectors on top of the contour plot:

Gradient ascent

Since the gradient points in the direction of greatest increase, gradient vectors generally point away from minima and towards maxima. Thus, if you follow gradient vectors, then you will typically either

  • diverge to infinity or
  • eventually find yourself at a local maximum.

That leads to a maximization technique called gradient ascent.
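A minimal sketch of the idea in Python, assuming a fixed step size and a finite-difference gradient: start somewhere, take a small step in the direction of the gradient, and repeat until the gradient is nearly zero. The step size, tolerance, starting point \((0,1)\), and all function names are illustrative assumptions rather than the method used to produce the figures in these notes.

```python
import math

def peaks(x, y):
    """The peaks function from the previous section."""
    term1 = 3 * (1 - x) ** 2 * math.exp(-x**2 - (y + 1) ** 2)
    term2 = -10 * math.exp(-x**2 - y**2) * (-x**3 + x / 5 - y**5)
    term3 = -(1 / 3) * math.exp(-((x + 1) ** 2) - y**2)
    return term1 + term2 + term3

def numerical_grad(func, x, y, h=1e-5):
    """Central-difference approximation of the gradient of func at (x, y)."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

def gradient_ascent(func, start, step=0.01, tol=1e-6, max_iters=10_000):
    """Repeatedly step along the gradient until it (nearly) vanishes.

    Returns the list of visited points so the path can be plotted.
    """
    x, y = start
    path = [(x, y)]
    for _ in range(max_iters):
        gx, gy = numerical_grad(func, x, y)
        if math.hypot(gx, gy) < tol:
            break  # close enough to a critical point
        x, y = x + step * gx, y + step * gy
        path.append((x, y))
    return path

path = gradient_ascent(peaks, start=(0.0, 1.0))
x_end, y_end = path[-1]
print(f"stopped at ({x_end:.4f}, {y_end:.4f}) after {len(path) - 1} steps, "
      f"f = {peaks(x_end, y_end):.4f}")
```

From this starting point the walk should climb to the tallest peak, which sits near \(y\approx 1.58\). A fixed step keeps the sketch short, but a step that is too large can overshoot and oscillate, so a smaller step or a line search may be needed for other functions.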

Here’s what that looks like for the peaks function:

Naturally, you can walk backwards, stepping against the gradient instead of along it. Doing so leads to gradient descent, which finds local minima.
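In code, walking backwards amounts to flipping one sign: subtract the scaled gradient instead of adding it. The sketch below is a self-contained illustration under the same kind of assumptions as before (fixed step size, finite-difference gradient, hypothetical function names, and an arbitrarily chosen starting point \((0,-1)\)).

```python
import math

def peaks(x, y):
    """The peaks function from the section above."""
    term1 = 3 * (1 - x) ** 2 * math.exp(-x**2 - (y + 1) ** 2)
    term2 = -10 * math.exp(-x**2 - y**2) * (-x**3 + x / 5 - y**5)
    term3 = -(1 / 3) * math.exp(-((x + 1) ** 2) - y**2)
    return term1 + term2 + term3

def numerical_grad(func, x, y, h=1e-5):
    """Central-difference approximation of the gradient of func at (x, y)."""
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

def gradient_descent(func, start, step=0.01, tol=1e-6, max_iters=10_000):
    """Repeatedly step against the gradient until it (nearly) vanishes."""
    x, y = start
    for _ in range(max_iters):
        gx, gy = numerical_grad(func, x, y)
        if math.hypot(gx, gy) < tol:
            break  # close enough to a critical point
        x, y = x - step * gx, y - step * gy  # note the minus signs
    return x, y

x_min, y_min = gradient_descent(peaks, start=(0.0, -1.0))
print(f"stopped at ({x_min:.4f}, {y_min:.4f}), f = {peaks(x_min, y_min):.4f}")
```

Which minimum the walk reaches depends on where it starts; different starting points can land in different valleys.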

Here’s gradient descent for peaks: