Why do we use the softmax function only at the end of a neural network?
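Ans: The softmax function works like a smooth version of a max layer: it pushes the output for the largest input toward 1 and all other outputs toward 0, although each output is a value strictly between 0 and 1 rather than exactly 0 or 1. Unlike a hard max, it is differentiable, so the network can still be trained by gradient descent. The outputs always sum to 1, so they can be read as a probability distribution over the classes, and the output node with the largest input value gets the highest probability. That is what we want from the final layer of a classifier, whereas hidden layers have no need for outputs that are tied together to sum to 1.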
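The properties above can be checked with a small sketch. This is only an illustration in plain Python, not taken from any particular library: the function name `softmax` and the example scores are assumptions chosen for the demo, and subtracting the maximum is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(logits):
    """Return softmax probabilities for a list of raw scores (logits)."""
    # Subtract the largest score before exponentiating to avoid overflow;
    # this shift cancels out in the division below.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the last layer for three classes.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

print(probs)                    # roughly [0.66, 0.24, 0.10]
print(sum(probs))               # approximately 1.0
print(probs.index(max(probs)))  # 0, the class with the largest score
```

The largest score gets the largest probability, but the other classes keep small non-zero values, which is why softmax is described as a "soft" max.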