i2tutorials

Machine Learning-K-Fold Cross Validation


 

Cross Validation is a technique for evaluating machine learning models on a limited data sample: a specific portion of the dataset is reserved, the model is not trained on it, and it is later used to evaluate the model. It is commonly used in Machine Learning to compare and choose a model for a given predictive modeling problem because the method is easy to understand and implement, and its skill estimates generally have lower bias than those of other methods.

 

The K Fold Cross Validation method ensures that every observation from the original dataset has a chance of appearing in both the training set and the test set. It is one of the best methods when we have very limited input data.

 

The K Fold Cross Validation procedure has a single parameter, k, which refers to the number of groups that a given data sample is to be split into; hence the name K Fold Cross Validation. When a specific value for k is chosen, it may be used in place of k in the name of the method, so k=10 becomes 10-fold cross validation.
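As a concrete illustration, choosing k=10 for a dataset of 100 observations produces 10 folds of 10 observations each. The sketch below uses plain Python; the make_folds helper is a hypothetical illustration, not a function from any particular library:

```python
def make_folds(indices, k):
    """Split a list of indices into k roughly equal, contiguous folds."""
    fold_size, remainder = divmod(len(indices), k)
    folds, start = [], 0
    for i in range(k):
        # early folds absorb any leftover observations one at a time
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(indices[start:end])
        start = end
    return folds

folds = make_folds(list(range(100)), k=10)
print(len(folds))     # 10 folds
print(len(folds[0]))  # 10 observations per fold
```

Every observation lands in exactly one fold, which is what later guarantees that each observation is used for testing exactly once.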


 

The steps followed in K Fold Cross Validation are discussed below:

1. Shuffle the dataset randomly.
2. Split the dataset into k groups (folds).
3. For each unique fold: hold it out as the test set, train the model on the remaining k-1 folds, evaluate it on the held-out fold, and retain the evaluation score.
4. Summarize the skill of the model using the sample of k evaluation scores.

Finally, we test our model on the held-out samples before finalizing it; in this sense, Cross Validation is a statistical approach used to estimate the skill of models.
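The whole procedure can be sketched end to end in plain Python. This is a toy example: the "model" that predicts the training mean and the mean-squared-error scoring are illustrative assumptions, not part of any real library API:

```python
import random

def k_fold_scores(data, k, seed=42):
    """Run k-fold cross validation of a trivial mean predictor on (x, y) pairs."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)                 # step 1: shuffle
    fold_size = len(indices) // k
    folds = [indices[i * fold_size:(i + 1) * fold_size]  # step 2: split into k groups
             for i in range(k)]
    scores = []
    for i in range(k):                                   # step 3: each fold is the test set once
        test_idx = set(folds[i])
        train_y = [y for j, (_, y) in enumerate(data) if j not in test_idx]
        prediction = sum(train_y) / len(train_y)         # "train": predict the training mean
        mse = sum((y - prediction) ** 2
                  for j, (_, y) in enumerate(data) if j in test_idx) / len(folds[i])
        scores.append(mse)
    return sum(scores) / k                               # step 4: summarize skill across folds

data = [(x, 2.0 * x) for x in range(20)]
print(k_fold_scores(data, k=5))
```

Each observation is scored exactly once as test data, and the final number is the average of the k per-fold scores, which is the "skill estimate" the article refers to.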

 

There are a number of variations of the K Fold Cross Validation procedure. The four commonly used variations are explained below.

 

Train/Test Split

At one extreme, the value of k may be set to 2 (not 1), so that a single train/test split is created in order to evaluate the model.
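With k=2 the procedure degenerates to a single split. A minimal sketch in plain Python (the train_test_split helper below is a hypothetical illustration, not a library function):

```python
import random

def train_test_split(n, test_fraction=0.5, seed=0):
    """Split n observation indices into one train set and one test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(n * (1 - test_fraction))
    return indices[:cut], indices[cut:]

train, test = train_test_split(10)
print(len(train), len(test))  # 5 5
```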

 

LOOCV

LOOCV stands for Leave-One-Out Cross Validation. At the other extreme, the value of k may be set to the total number of observations in the dataset, so that every observation is given a chance to be held out of the dataset.
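For n observations, LOOCV produces n splits in which exactly one observation is held out at a time. A minimal sketch in plain Python (the leave_one_out generator is an illustrative assumption, not a library function):

```python
def leave_one_out(n):
    """Yield (train_indices, test_index) pairs; each observation is held out once."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, i

splits = list(leave_one_out(5))
print(len(splits))  # 5 splits for 5 observations
print(splits[0])    # ([1, 2, 3, 4], 0)
```

Because the model must be fit n times, LOOCV becomes expensive on large datasets, which is why it is reserved for the very-limited-data setting the article describes.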

 

Stratified

The splitting of data into folds may be governed by criteria such as ensuring that every fold has the same proportion of observations with a given categorical value, like the class outcome value. This procedure is called Stratified Cross Validation.
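One way to achieve this is to group indices by class and deal each class out to the folds in round-robin order, so every fold inherits the overall class ratio. A sketch in plain Python (the stratified_folds helper is a hypothetical illustration, not a library function):

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each index to a fold so every fold preserves the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):  # deal each class round-robin
            folds[pos % k].append(idx)
    return folds

labels = ["A"] * 8 + ["B"] * 4               # 2:1 class ratio overall
folds = stratified_folds(labels, k=4)
print([sorted(labels[i] for i in f) for f in folds])  # each fold keeps the 2:1 ratio
```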

 

Repeated

In this process, the K Fold Cross Validation procedure is repeated n times. The data sample is shuffled before each repetition, which results in a different split of the sample each time.
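A minimal sketch of repeated k-fold splitting in plain Python (the repeated_k_fold helper is an illustrative assumption, not a library function):

```python
import random

def repeated_k_fold(n, k, repeats, seed=0):
    """Repeat k-fold splitting `repeats` times, reshuffling before each repetition."""
    rng = random.Random(seed)
    all_test_folds = []
    for _ in range(repeats):
        indices = list(range(n))
        rng.shuffle(indices)                 # a fresh shuffle each repetition
        fold_size = n // k
        all_test_folds.extend(indices[i * fold_size:(i + 1) * fold_size]
                              for i in range(k))
    return all_test_folds

splits = repeated_k_fold(n=12, k=3, repeats=2)
print(len(splits))  # 3 folds x 2 repeats = 6 test folds
```

Averaging scores over all k x n folds gives a more stable skill estimate at the cost of n times the computation.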
