What are the different hyperparameters used in Convolutional Neural Networks during model training?
Ans: Hyperparameter tuning
Tuning hyperparameters for a deep neural network is difficult because training a deep network is slow and there are many parameters to configure.
Learning rate
The learning rate controls how much the weights are updated at each step of the optimization algorithm. Common strategies include a fixed learning rate, a gradually decreasing learning rate, momentum-based methods, and adaptive learning rates, depending on the choice of optimizer such as SGD, Adam, Adagrad, AdaDelta or RMSProp.
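As a minimal sketch of the gradually-decreasing strategy, a step-decay schedule (the function names here are illustrative, not from any particular library) might look like:

```python
def step_decay(initial_lr, drop=0.5, epochs_per_drop=10):
    """Build a schedule that multiplies the learning rate by `drop`
    every `epochs_per_drop` epochs."""
    def schedule(epoch):
        return initial_lr * (drop ** (epoch // epochs_per_drop))
    return schedule

schedule = step_decay(initial_lr=0.1)
# schedule(0) -> 0.1, schedule(10) -> 0.05, schedule(25) -> 0.025
```

In practice, most frameworks provide equivalent schedulers out of the box; the point is that the learning rate becomes a function of the epoch rather than a single fixed constant.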
Number of epochs
The number of epochs is the number of times the entire training set passes through the neural network. Increase the number of epochs until the gap between the test error and the training error is small.
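One common way to pick the number of epochs automatically is early stopping: keep training until the held-out error stops improving for a few consecutive epochs. A hedged sketch (the class name and `patience` default are illustrative):

```python
class EarlyStopping:
    """Signal a stop when the validation error has not improved
    for `patience` consecutive epochs."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def step(self, val_error):
        if val_error < self.best:
            self.best = val_error
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience  # True => stop training
```

Calling `step` once per epoch with the current validation error returns `True` as soon as `patience` epochs pass without improvement.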
Mini-batch size
Mini-batch training is generally preferable in the learning process of a convnet. A range of 16 to 128 is a good choice to test with; convnets are sensitive to batch size.
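A minimal sketch of how a training set is split into mini-batches (assuming NumPy arrays; the function name is illustrative):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=32, shuffle=True, seed=0):
    """Yield (inputs, targets) mini-batches covering the full training set."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]
```

Each epoch then loops over these batches, computing one gradient update per batch rather than per sample or per full pass.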
Activation function
The activation function introduces non-linearity into the model. The rectifier (ReLU) works well with convnets. Alternatives to try during hyperparameter tuning include sigmoid, tanh and other activation functions, depending on the task.
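The three activations mentioned above are simple element-wise functions; in NumPy they can be sketched as:

```python
import numpy as np

def relu(x):
    """Rectifier: max(0, x), the usual default for convnets."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Squashes inputs into (-1, 1)."""
    return np.tanh(x)
```

The non-linearity is what lets stacked layers represent functions that a single linear layer cannot.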
Number of hidden layers and units
It is good to add more layers until the test error no longer improves; the tradeoff is that a deeper network is computationally more expensive to train. Having too few units may lead to underfitting, while having more units is usually not damaging when suitable regularization is applied.
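The "add layers until the test error no longer improves" rule can be sketched as a simple search loop. Here `train_eval` stands in for a hypothetical function that trains a model of a given depth and returns its held-out error:

```python
def grow_depth(train_eval, max_layers=8, tol=1e-3):
    """Add hidden layers until the held-out error stops improving by `tol`."""
    best_err = float("inf")
    best_depth = 1
    for depth in range(1, max_layers + 1):
        err = train_eval(depth)  # train a model with `depth` layers, return test error
        if err < best_err - tol:
            best_err, best_depth = err, depth
        else:
            break  # deeper model no longer helps; stop growing
    return best_depth, best_err
```

This keeps the depth search cheap by stopping at the first depth that fails to improve, rather than training every candidate.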
Weight initialization
Initialize the weights with small random numbers to prevent dead neurons, but not so small that the gradients vanish. Generally a uniform distribution works well.
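A sketch of small uniform initialization, using the Glorot/Xavier bound as one common way to choose "small but not too small" (the bound itself is an assumption beyond the text, which only asks for a uniform distribution):

```python
import numpy as np

def init_weights(fan_in, fan_out, seed=0):
    """Uniform random weights scaled so activations neither die nor saturate."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))  # Glorot/Xavier-style bound
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```

Scaling the bound by the layer's fan-in and fan-out keeps the variance of the activations roughly constant across layers.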
Dropout for regularization
Dropout is a widely used regularization technique to avoid overfitting in deep neural networks. The method randomly drops units from the network with a chosen probability. A default rate of 0.5 is a good choice to test with.
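A minimal sketch of (inverted) dropout in NumPy, assuming activations arrive as an array; the rescaling by `1 - rate` at train time means nothing needs to change at test time:

```python
import numpy as np

def dropout(activations, rate=0.5, rng=None, train=True):
    """Inverted dropout: zero each unit with probability `rate`,
    rescale the survivors so the expected activation is unchanged."""
    if not train or rate == 0.0:
        return activations  # at test time, pass activations through untouched
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)
```

With `rate=0.5`, roughly half the units are zeroed on each forward pass and the rest are doubled, which forces the network not to rely on any single unit.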
Grid search or randomized search
Tuning hyperparameters manually is painful and also impractical. Grid search exhaustively evaluates all parameter combinations for the given values, while random search samples a given number of candidates from a parameter space with a specified distribution.
To search more efficiently, start with coarse ranges of hyperparameter values; at this initial stage it also helps to run the coarse search with a smaller number of epochs or a smaller training set. The next stage then performs a narrow search with more epochs or the entire training set.
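The coarse stage of a random search can be sketched as follows. The `score_fn` argument is a hypothetical stand-in for "train the model with these hyperparameters cheaply and return a score"; the search space values below are illustrative, drawn from the ranges discussed earlier in this answer:

```python
import random

def random_search(space, score_fn, n_trials=20, seed=0):
    """Sample hyperparameter candidates from `space`, keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)  # e.g. validation accuracy after a few epochs
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Coarse stage: wide ranges, cheap evaluation (few epochs / small training subset).
coarse_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.3, 0.5, 0.7],
}
```

A second, narrow stage would then rebuild `coarse_space` around the winning values and rerun the search with more epochs or the full training set.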