What is the difference between the training, validation, and test datasets? What is the common ratio?



September 27, 2019


Training Dataset

The sample of data used to fit the model.

This is the data we actually train the model on: the model sees it and learns its parameters from it.
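As a minimal illustration (not from the original article), the Python sketch below fits a straight line y = slope * x + intercept by ordinary least squares. The function name `fit_line` and the toy data are hypothetical; the point is only that the model's parameters are computed from the training pairs and nothing else.

```python
def fit_line(train):
    """Hypothetical helper: fit y = slope * x + intercept by ordinary least squares.

    Only the training (x, y) pairs are passed in, because this is the data
    the model is allowed to learn from.
    """
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy training data lying exactly on y = 2x + 1.
train = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
slope, intercept = fit_line(train)
print(slope, intercept)  # 2.0 1.0
```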

Validation Dataset

The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased the more the model configuration is adjusted based on its performance on the validation dataset.
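The sketch below shows the validation set in its tuning role. It assumes a hypothetical one-dimensional k-nearest-neighbours regressor whose hyperparameter is k; the names `knn_predict`, `mse` and `choose_k` and the data are illustrative only. Predictions are always made from the training points, while the validation points only decide which k wins. Because k is picked to minimise validation error, the winning validation score tends to be optimistic, which is the bias described above.

```python
def knn_predict(train, x, k):
    """Hypothetical k-NN regressor: mean target of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error on `data` of a k-NN model built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def choose_k(train, val, candidates):
    """Return the candidate k with the lowest validation error.

    The model only ever learns from `train`; `val` is used to compare settings.
    """
    return min(candidates, key=lambda k: mse(train, val, k))


# Hypothetical toy data sampled from y = x * x.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0), (4.0, 16.0)]
val = [(1.5, 2.25), (2.5, 6.25)]
best_k = choose_k(train, val, [1, 2, 3])
print(best_k)  # 2
```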

Test Dataset

The sample of data used to provide an unbiased evaluation of a final model fit on the training dataset.
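One way to make the "final model only" rule concrete is to wrap the test data so it can be scored exactly once. The class `HeldOutTestSet` below is a hypothetical helper, not a library API, and the final model is assumed to be a fixed line y = 2x + 1 that was settled before the test set was touched.

```python
class HeldOutTestSet:
    """Hypothetical wrapper that lets the test data be scored only once.

    A second evaluation raises an error, since using the test score to make
    further changes would turn the test set into another validation set.
    """

    def __init__(self, data):
        self._data = list(data)
        self._used = False

    def evaluate(self, predict):
        """Return the mean squared error of `predict` on the held-out data."""
        if self._used:
            raise RuntimeError("test set already used; further tuning would bias it")
        self._used = True
        errors = [(predict(x) - y) ** 2 for x, y in self._data]
        return sum(errors) / len(errors)


# Assumed final model, fixed before looking at the test data.
def final_model(x):
    return 2.0 * x + 1.0


test_set = HeldOutTestSet([(4.0, 9.0), (5.0, 11.5)])
score = test_set.evaluate(final_model)
print(score)  # 0.125
```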

Generally, the training, validation, and test datasets are split in the ratio 60%, 20%, and 20%, respectively. When no separate validation set is used, an 80% / 20% split between the training and test datasets is also common.
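A minimal Python sketch of such a split, assuming the samples are independent so that a random shuffle before cutting is appropriate. `split_dataset` is a hypothetical helper written for illustration; libraries such as scikit-learn provide `train_test_split` for the same job. With 100 samples, a 60/20/20 split gives 60, 20 and 20 samples, and an 80/20 split gives 80 and 20.

```python
import random


def split_dataset(data, ratios, seed=0):
    """Hypothetical helper: shuffle `data` and cut it into parts with the given ratios.

    `ratios` must sum to 1, e.g. (0.6, 0.2, 0.2) for train/validation/test
    or (0.8, 0.2) for train/test. The last part absorbs any rounding remainder.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    parts, start = [], 0
    for r in ratios[:-1]:
        end = start + int(round(r * len(items)))
        parts.append(items[start:end])
        start = end
    parts.append(items[start:])
    return parts


train_set, val_set, test_set = split_dataset(range(100), (0.6, 0.2, 0.2))
print(len(train_set), len(val_set), len(test_set))  # 60 20 20

train_only, test_only = split_dataset(range(100), (0.8, 0.2))
print(len(train_only), len(test_only))  # 80 20
```

Shuffling before cutting avoids putting, say, all of the most recently collected samples into the test set when the raw data happens to be ordered.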