## How can you avoid local minima when minimizing the loss function?

**Ans:** One way to keep the optimization from getting stuck in a local minimum is to add momentum to the parameter updates. Momentum accumulates past gradients into a velocity term, which gives the updates an impulse in a consistent direction and helps them roll through narrow or shallow local minima.
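As a rough illustration, here is a minimal sketch of gradient descent with momentum on a hypothetical 1-D loss with two minima of different depths; the loss function, starting point, learning rate, and momentum value are all made up for this example:

```python
# Hypothetical 1-D loss f(w) = w**4 - 4*w**2 - w: it has a shallow
# minimum near w = -1.35 and a deeper one near w = 1.47.
def grad(w):
    return 4 * w**3 - 8 * w - 1

w, velocity = -2.0, 0.0     # start on the slope above the shallow minimum
lr, momentum = 0.03, 0.9    # illustrative values, not tuned defaults

for _ in range(500):
    # The velocity accumulates past gradients; this is the "impulse"
    # that can carry the iterate through a narrow or shallow minimum.
    velocity = momentum * velocity - lr * grad(w)
    w += velocity

print(f"settled near w = {w:.2f}")
```

Setting `momentum = 0.0` recovers plain gradient descent, which from this starting point settles in the nearer, shallower minimum; with momentum, the accumulated velocity can carry the iterate over the barrier into the deeper one.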

Another option is to use stochastic gradient descent. The idea is to use not the exact gradient but a noisy estimate of it: a random gradient whose expected value is the true gradient. Because the gradient is noisy, each step can move in a direction that differs from the true gradient. This sometimes pulls us away from a nearby local minimum and can keep us from getting trapped in small local minima.
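To make the "noisy estimate" idea concrete, here is a minimal SGD sketch on a made-up one-parameter linear regression; the data, learning rate, and step count are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): y = 3x + noise, so the best slope is about 3.
X = rng.normal(size=200)
y = 3.0 * X + rng.normal(scale=0.5, size=200)

w = 0.0
lr = 0.05

for step in range(1000):
    i = rng.integers(len(X))           # pick one random example
    # Per-example gradient of the squared error (w*X[i] - y[i])**2;
    # averaged over all i it equals the full gradient, so each step
    # is an unbiased but noisy move.
    g = 2 * (w * X[i] - y[i]) * X[i]
    w -= lr * g

print(f"estimated slope: {w:.2f}")     # should land close to 3
```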

**Batch size in Neural Networks**

This fits very well with training deep networks, because the true gradient depends on all the training data and is very expensive to compute. By computing the gradient over just a mini-batch of the training data, we can produce a noisy estimate of the true gradient far more efficiently.
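Under similar toy assumptions, here is a mini-batch version; the dataset, batch size, and learning rate are invented, but the structure, averaging the gradient over a small random subset instead of the full dataset, is the point:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy regression problem with 5 weights and 10,000 examples.
X = rng.normal(size=(10_000, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w + rng.normal(scale=0.1, size=10_000)

def batch_grad(w, idx):
    # Mean-squared-error gradient over just the rows in idx: a cheap,
    # noisy estimate whose expectation is the full-data gradient.
    err = X[idx] @ w - y[idx]
    return 2 * X[idx].T @ err / len(idx)

w = np.zeros(5)
lr, batch_size = 0.1, 32

for step in range(2000):
    # Sample a random mini-batch (with replacement, for simplicity).
    idx = rng.integers(0, len(X), size=batch_size)
    w -= lr * batch_grad(w, idx)

print(np.round(w, 2))   # should be close to true_w
```

Each step touches 32 examples instead of 10,000, so the gradient estimate is far cheaper to compute, and the noise it introduces is the same kind of perturbation described above.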