Machine Learning – Interview Questions Part 4

 

1. What is K in a KNN classifier, and how do you choose the optimal value of K?

To select the K for your data, we run the KNN algorithm several times with different values of K and choose the K that minimizes the number of errors while maintaining the algorithm’s ability to make accurate predictions on data it hasn’t seen before.

As we decrease the value of K to 1, our predictions become less stable.

Inversely, as we increase the value of K, our predictions become more stable due to majority voting, and thus more likely to be accurate. Eventually, however, if K grows too large, we begin to witness an increasing number of errors.

In cases where we are taking a majority vote among labels, we usually make K an odd number to act as a tiebreaker.
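As a concrete illustration, the sketch below tries a range of odd K values and keeps the one with the best cross-validated accuracy. The Iris dataset, the K range 1 to 21, and the 5-fold split are assumptions made for the example, not part of the answer above.

```python
# A minimal sketch of choosing K by cross-validation; the dataset and
# the range of K values are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd values of K so majority votes are less likely to tie.
scores = {}
for k in range(1, 22, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    # Average accuracy over 5 folds estimates error on unseen data.
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best K = {best_k}, cross-validated accuracy = {scores[best_k]:.3f}")
```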

2. Explain the KNN classifier.

The KNN algorithm assumes that similar things exist in close proximity; in other words, similar things are near each other. KNN captures this idea of similarity with mathematics, calculating the distance between points on a graph.

It should also be noted that distance measures such as Euclidean, Manhattan, and Minkowski are only valid for continuous variables. For categorical variables, the Hamming distance must be used instead. This also raises the need to standardize the numerical variables to the range 0 to 1 when the dataset contains a mixture of numerical and categorical variables.
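A small sketch of these distance measures and of min-max scaling follows; the toy vectors and feature values are illustrative assumptions.

```python
# Distance measures mentioned above, computed directly with NumPy;
# the toy vectors are illustrative assumptions.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # straight-line distance
manhattan = np.sum(np.abs(a - b))           # sum of absolute differences

# Hamming distance for categorical variables:
# the fraction of positions where the values differ.
u = np.array(["red", "small", "round"])
v = np.array(["red", "large", "round"])
hamming = np.mean(u != v)

# Min-max scaling to [0, 1] so numeric features share a common range
# before they are mixed with categorical ones.
X = np.array([[1.0, 200.0], [3.0, 400.0], [2.0, 300.0]])
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(euclidean, manhattan, hamming)
print(X_scaled)
```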

3. What is the Sigmoid function? Explain it in detail.

Sigmoid is an activation function with an S-shaped curve. It takes a real value as input and gives an output between 0 and 1. It is non-linear in nature, continuously differentiable, and has a fixed output range. Unlike the step function, it gives a continuous output, and it has a smooth gradient. At the ends of the sigmoid curve, the Y values change very little with changes in X, which gives rise to the vanishing-gradient problem: the network refuses to learn further, or becomes too slow to make accurate predictions. The sigmoid function is also not zero-centered.
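A minimal sketch of the function and its derivative makes the vanishing-gradient point visible; the sample points are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    """Map any real value into (0, 1): sigma(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative sigma'(x) = sigma(x) * (1 - sigma(x)); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

# At the tails the gradient is nearly zero -- the vanishing-gradient problem.
for x in (-10.0, 0.0, 10.0):
    print(f"x={x:+.0f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")
```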

4. What are the advantages and disadvantages of Logistic Regression?

Advantages of Logistic Regression

1. Logistic Regression performs well when the dataset is linearly separable.

2. Logistic regression is less prone to over-fitting, but it can overfit in high-dimensional datasets. You should consider regularization (L1 and L2) techniques to avoid over-fitting in these scenarios (a sketch follows after this list).

3. Logistic Regression not only gives a measure of how relevant a predictor (coefficient size) is, but also its direction of association (positive or negative).

4. Logistic regression is easy to implement and interpret, and very efficient to train.

Disadvantages of Logistic Regression

1. The main limitation of Logistic Regression is the assumption of linearity between the dependent variable and the independent variables. In the real world, data is rarely linearly separable; most of the time it is a jumbled mess.

2. If the number of observations is less than the number of features, Logistic Regression should not be used; otherwise it may lead to overfitting.

3. Logistic Regression can only be used to predict discrete outcomes, so its dependent variable is restricted to a discrete set of values. This restriction is itself problematic, as it rules out the prediction of continuous targets.
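To make the regularization advice in advantage 2 concrete, here is a minimal scikit-learn sketch. The synthetic high-dimensional dataset, the regularization strength C=1.0, and the chosen solvers are assumptions for the example, not prescriptions.

```python
# A minimal sketch of L1/L2-regularized logistic regression;
# the synthetic high-dimensional dataset is an assumption.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Many features relative to the number of samples: the regime
# where logistic regression starts to overfit without regularization.
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
    clf = LogisticRegression(penalty=penalty, solver=solver, C=1.0, max_iter=1000)
    clf.fit(X_train, y_train)
    # Coefficient signs give the direction of association mentioned above.
    print(penalty, "test accuracy:", clf.score(X_test, y_test))
```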

5. What is the difference between MSE and MAE?

The mean absolute error (MAE) is a quantity used to measure how close predictions are to the actual outcomes: it is the average of all the absolute errors, and a common measure of estimation error in time-series analysis. The mean squared error (MSE) of an estimator measures the average of the squared errors, i.e. the average squared difference between the estimated values and the actual values.

MSE is equivalent to the expected value of the squared error loss, or quadratic loss; the error itself arises from randomness, or from the estimator not accounting for information that could yield a more accurate prediction. The MSE is a measure of the quality of an estimator: it is always non-negative, and values closer to zero are better. The MSE is the second moment of the error, and it incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance of the estimator.
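A small sketch computing both measures by hand; the toy values are illustrative assumptions.

```python
# MAE and MSE computed directly; the toy values are assumptions.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # average magnitude of the errors
mse = np.mean(errors ** 2)      # average of the squared errors

# Squaring weights large errors more heavily than MAE does.
print(f"MAE = {mae:.3f}, MSE = {mse:.3f}")
```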

6. What are the differences between MSE and RMSE?

MSE (Mean Squared Error) represents the difference between the original and predicted values, obtained by averaging the squared differences over the data set. It is a measure of how close a fitted line is to the actual data points: the smaller the MSE, the closer the fit is to the data. The MSE carries the square of the units of whatever is plotted on the vertical axis.

RMSE (Root Mean Squared Error) is the square root of the MSE. RMSE is the most easily interpreted of these statistics, as it has the same units as the quantity plotted on the vertical axis, or Y-axis. Because RMSE can be interpreted directly in terms of measurement units, it is a better measure of fit than a correlation coefficient.
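A tiny self-contained sketch showing that RMSE restores the target's original units; the toy values are assumptions.

```python
# RMSE as the square root of MSE; the toy values are assumptions.
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)   # units: target units squared
rmse = np.sqrt(mse)                     # units: same as the target

print(f"MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```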

7. What are the advantages and disadvantages of Regression Algorithms?

Advantages:

  • Easy and simple implementation.
  • Low space complexity: only the learned coefficients need to be stored.
  • Fast training.
  • The magnitude of the θ coefficients gives an indication of feature significance.

Disadvantages:

  • Applicable only if the relationship is linear; in many real-life scenarios, this may not be the case.
  • The algorithm assumes the residuals (errors) to be normally distributed, an assumption that may not always hold (see the sketch after this list).
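To make the residual-normality point testable, here is a quick sketch that fits a line and runs a normality test on the residuals. The synthetic data and the 0.05 significance threshold are assumptions for the example.

```python
# Checking the normal-residuals assumption named above; the synthetic
# data and the 0.05 significance level are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1, 200)   # linear data with Gaussian noise

slope, intercept = np.polyfit(x, y, 1)       # simple linear fit
residuals = y - (slope * x + intercept)

# D'Agostino-Pearson test: a small p-value suggests non-normal residuals.
stat, p = stats.normaltest(residuals)
print(f"p-value = {p:.3f} ->", "looks normal" if p > 0.05 else "non-normal")
```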

8. What is the Ordinary Least Squares method in Machine Learning?

OLS, or Ordinary Least Squares, is a method used in Linear Regression for estimating the unknown parameters by creating a model that minimizes the sum of the squared errors between the observed data and the predicted values.

The Ordinary Least Squares method works both for univariate datasets (a single independent variable and a single dependent variable) and for multivariate datasets (multiple independent variables and a single dependent variable). OLS has a closed-form solution known as the normal equation, although for large datasets the parameters are often estimated iteratively with an optimization algorithm such as gradient descent.
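A minimal sketch of the closed-form solve, theta = (X^T X)^(-1) X^T y, using NumPy's numerically stable least-squares routine; the toy data is an assumption.

```python
# OLS via the normal-equation solution; the toy data is an assumption.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # two independent variables
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, 100)

Xb = np.column_stack([np.ones(len(X)), X])       # prepend a bias column
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)   # stable least-squares solve

print("intercept and coefficients:", theta)      # approx [3.0, 1.5, -2.0]
```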

9. Explain Gradient Descent in detail.

Gradient descent is a first-order optimization algorithm: it depends on the first-order derivative of a loss function. It calculates which way the weights should be altered so that the function can reach a minimum. Through backpropagation, the loss is passed from one layer to another, and the model’s parameters, also known as weights, are modified according to the gradients so that the loss is minimized.
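A minimal sketch of gradient descent minimizing MSE for a one-variable linear model; the learning rate and iteration count are illustrative assumptions.

```python
# Gradient descent on the MSE loss for simple linear regression;
# the learning rate and iteration count are assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 4.0 * x + 2.0 + rng.normal(0, 0.1, 100)

w, b = 0.0, 0.0          # parameters initialized at zero
lr = 0.1                 # learning rate (step size)

for _ in range(2000):
    y_hat = w * x + b
    # First-order derivatives of the MSE loss with respect to w and b.
    grad_w = 2.0 * np.mean((y_hat - y) * x)
    grad_b = 2.0 * np.mean(y_hat - y)
    # Step against the gradient, toward a minimum of the loss.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w:.2f}, b = {b:.2f}")   # should approach 4.0 and 2.0
```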

10. What do you mean by Principal Coordinate Analysis?

Principal Coordinates Analysis (PCoA) is a method to explore and visualize similarities or dissimilarities in data. It starts with a similarity or dissimilarity matrix and assigns each item a location in a low-dimensional space. PCoA tries to find the main axes through a matrix; it is a kind of eigen-analysis and calculates a series of eigenvalues and eigenvectors.

Each eigenvalue has an associated eigenvector, and there are as many eigenvalue/eigenvector pairs as there are rows in the initial matrix. Using PCoA we can visualize individual and/or group differences; individual differences can be used to identify outliers.
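A minimal sketch of PCoA, also known as classical multidimensional scaling, built from the double-centered distance matrix; the toy dissimilarity matrix is an illustrative assumption.

```python
# PCoA (classical MDS) with NumPy; the toy distance matrix is an assumption.
import numpy as np

# Symmetric pairwise dissimilarity matrix for 4 items.
D = np.array([[0.0, 2.0, 4.0, 6.0],
              [2.0, 0.0, 3.0, 5.0],
              [4.0, 3.0, 0.0, 2.5],
              [6.0, 5.0, 2.5, 0.0]])

n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
B = -0.5 * J @ (D ** 2) @ J             # double-centered Gram matrix

# Eigen-analysis: one eigenvalue/eigenvector pair per row of the matrix.
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]       # largest eigenvalues first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Coordinates along the main axes: eigenvectors scaled by sqrt(eigenvalues).
k = 2                                   # keep two dimensions for plotting
coords = eigvecs[:, :k] * np.sqrt(np.maximum(eigvals[:k], 0.0))
print(coords)
```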