Maxima vs Minima and Global vs Local in Machine learning
You might have heard or read the statement that goes something like “The algorithm might get stuck at one of the local minima and not converge to the global minimum”. But what exactly does it mean? I’ll explain the concept of maxima, minima, local and global. Additionally I’ll be covering how calculus makes it possible to identify these points.
I’ll explain the concepts for functions of single variable because they are easy to visualize. However, they extend to multivariate cases.
Let us start with a few definitions.
A real-valued function f defined on a domain X has a global (or absolute) maximum point at x∗ if f(x∗) ≥ f(x) for all x in X.
A real-valued function f defined on a domain X has a global (or absolute) maximum point at x∗ if f(x∗) ≤ f(x) for all x in X.
If the domain X is a metric space then f is said to have a local (or relative) maximum point at the point x∗if there exists some ε > 0 such that f(x∗) ≥ f(x) for all x in X within distance of x∗.
If the domain X is a metric space then f is said to have a local (or relative) maximum point at the point x∗if there exists some ε > 0 such that f(x∗) ≤ f(x) for all x in X within distance of x∗.
Graphics tend to make the concepts easier to understand. I’ve summarized these four types of points in the following figure.
As the name suggests minimum is the lowest value in a set and maximum is the highest value. Global means it is true for the entire set and local means it is true in some vicinity. A function can have multiple local maxima and minima. However there can be only one global maximum as well as minimum. Note that for Figures (a) and (b) the function domain is restricted to the values you are seeing. If it were to be infinite then there is no global minimum for the graph in Figure (a).
Now that we understand these concepts, the next step is how to find these extreme points.
Turns out derivatives in calculus are useful for finding these points. I won’t be going into the details of derivatives. However, I’ll explain enough to understand the following discussion.
Derivative gives you rate of change of something with respect to something.
How quickly would a medicine be absorbed by your system can be modeled and analysed using calculus.
Now let us understand what is a critical point.
So we know that at these critical points there will be either a local or global maximum of minimum. The next step is to identity which category it belongs to.
You can use either of the two tests i.e. the first and second derivative test to classify the maximum and minimum values. When I was in my high school I used to find the second derivative test faster since I’d to calculate only one value (without using a calculator). I’ll show you one example to give you a taste of how it is actually done.
For finding whether the point is global you’ll have to evaluate the function at the critical points and see which point is the lowest. In our examples we have seen a polynomial function. It is smooth and differentiable. There were limited points to test for and evaluating the function if you have the equation is easy.
However, now let us move to the real world. We never know the actual equation of the real life process that we deal with. Additionally there are several variables involved in the equation. These tests won’t be useful in those cases. For training a neural network you need to minimize the loss with respect to the network parameters. This is a multi-dimensional surface and multiple factors come into play. And the tests I discussed above won’t be effective.