
What are the different ways of solving Gradient issues in RNN?


Ans: The lower the gradient is, the harder it is for the network to update the weights and the longer it takes to reach the final result. The output of the earlier layers is used as the input for the later layers, so the training at time step t depends on inputs that come from layers which are themselves not yet properly trained. Because of the vanishing gradient, the whole network therefore fails to train properly. If the recurrent weight w_rec is small, you have the vanishing gradient problem; if w_rec is large, you have the exploding gradient problem.

For the vanishing gradient problem, the further back you go through the network, the smaller the gradient becomes and the harder it is to train the weights, which has a domino effect on all of the other weights throughout the network.
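To make this concrete, here is a minimal numeric sketch (the values of w_rec and the number of time steps are hypothetical) showing how backpropagation through time multiplies the gradient by the recurrent weight once per unrolled step, so it shrinks or blows up roughly like w_rec raised to the number of steps:

```python
def gradient_scale(w_rec, num_steps):
    """Rough scale of the gradient after backpropagating through num_steps time steps."""
    grad = 1.0
    for _ in range(num_steps):
        grad *= w_rec          # one factor of the recurrent weight per time step
    return grad

print(gradient_scale(0.5, 50))   # ~8.9e-16 -> vanishing gradient
print(gradient_scale(1.5, 50))   # ~6.4e+08 -> exploding gradient
```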

In case of exploding gradient, you can:

- Use truncated backpropagation through time, stopping the backward pass after a fixed number of time steps.
- Apply penalties that artificially reduce the gradient.
- Use gradient clipping, capping the gradient at a maximum value, as shown in the sketch below.
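The following is a minimal NumPy sketch of gradient clipping; the maximum norm of 1.0 and the example gradient values are hypothetical choices for illustration:

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Rescale the gradient so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)   # shrink the magnitude, keep the direction
    return grad

# Hypothetical exploding gradient coming out of backpropagation through time
grad = np.array([30.0, -40.0])           # norm = 50
print(clip_gradient(grad))               # [ 0.6 -0.8], norm = 1
```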

In case of vanishing gradient, you can:

- Initialize the weights carefully (for example, identity or orthogonal initialization) so the gradient is less likely to shrink.
- Use echo state networks, which are designed to sidestep the vanishing gradient problem.
- Use Long Short-Term Memory (LSTM) or GRU units, whose gating lets the gradient flow across many time steps; see the sketch after this list.
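The most common fix is to replace plain recurrent units with LSTM units. The sketch below is a minimal Keras model under assumed input dimensions (100 time steps, 8 features per step); the layer sizes are hypothetical:

```python
import tensorflow as tf

# Minimal sketch: an LSTM layer in place of a plain RNN layer.
# The gating mechanism of the LSTM cell lets the gradient flow
# across many time steps without vanishing.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100, 8)),   # 100 time steps, 8 features per step
    tf.keras.layers.LSTM(64),                # gated memory cell
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```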
