Machine Learning – Interview Questions Part 5

1. What is the difference between Multi-Dimensional Scaling and Principal Component Analysis?

Principal Component Analysis

The input to PCA is the original vectors in n-dimensional space.

The data are then projected onto the directions with the most variance, so the “spread” of the data is roughly preserved as the dimensionality decreases.

Multidimensional Scaling

The input to MDS is the pairwise distances between points.

The output of MDS is a two- or three-dimensional projection of the points in which the pairwise distances are preserved as well as possible.
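To make the contrast concrete, here is a minimal sketch assuming scikit-learn (the text does not prescribe a library): PCA is fed the raw feature vectors, while MDS is fed a matrix of pairwise distances.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X = load_iris().data                              # original vectors in n-dimensional space

# PCA: project onto the directions of greatest variance
X_pca = PCA(n_components=2).fit_transform(X)

# MDS: start from pairwise distances and find a 2-D layout that preserves them
D = pairwise_distances(X)
X_mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)

print(X_pca.shape, X_mds.shape)                   # (150, 2) (150, 2)
```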

2. What do you mean by Multi-Dimensional Scaling (MDS)?

Multidimensional scaling is a visual representation of distances or dissimilarities between sets of objects. “Objects” can be colors, faces, map coordinates, and so on. MDS finds a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to some function of the input matrix, according to a criterion function called stress.

The input to multidimensional scaling is a distance matrix. The output is typically a two-dimensional scatterplot, where each of the objects is represented as a point.
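A minimal sketch, again assuming scikit-learn's MDS: the input is a small hand-made distance matrix between four hypothetical objects; the output is one 2-D point per object (ready for a scatterplot), plus the stress value measuring how well the distances were preserved.

```python
import numpy as np
from sklearn.manifold import MDS

# Symmetric distance matrix between four hypothetical objects
D = np.array([[0.0, 2.0, 5.0, 6.0],
              [2.0, 0.0, 4.0, 5.0],
              [5.0, 4.0, 0.0, 1.5],
              [6.0, 5.0, 1.5, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)    # one 2-D point per object

print(coords)                    # coordinates for a scatterplot
print(mds.stress_)               # the stress criterion (lower is better)
```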

3. What are the pros and cons of PCA?

Advantages of Principal Component Analysis

1. Removes Correlated Features:

In a real-world scenario it is very common to have thousands of features in your dataset. You cannot run your algorithm on all of them: it will hurt the performance of your algorithm, and it is not easy to visualize that many features in any kind of graph. So you must reduce the number of features in your dataset.

You need to find the correlations among the features (correlated variables). Finding correlations manually across thousands of features is nearly impossible, frustrating, and time-consuming. PCA does this for you efficiently.

After applying PCA to your dataset, all the Principal Components are independent of one another; there is no correlation among them (a quick check appears after this list of advantages).

2. Improves Algorithm Performance: 

With so many features, the performance of your algorithm will degrade drastically. PCA is a very common way to speed up a Machine Learning algorithm by getting rid of correlated variables that do not contribute to the decision making. The training time of the algorithm reduces significantly with fewer features.
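As a quick check of the decorrelation claim above, the following minimal sketch (assuming scikit-learn and NumPy, which the text does not name) compares the feature correlation matrix before and after PCA.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Correlations among the original features: large off-diagonal values
print(np.corrcoef(X, rowvar=False).round(2))

# Correlations among the principal components: off-diagonal values ~ 0
X_pc = PCA().fit_transform(X)
print(np.corrcoef(X_pc, rowvar=False).round(2))
```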

Disadvantages of Principal Component Analysis

1. Independent variables become less interpretable:

After applying PCA to the dataset, your original features are replaced by Principal Components. Each Principal Component is a linear combination of your original features, and these combinations are not as readable or interpretable as the original features.

2. Data standardization is a must before PCA:

You must standardize your data before applying PCA; otherwise, PCA will not be able to find the optimal Principal Components.

For instance, if a feature set contains data expressed in units of kilograms, light years, or millions, the variance scales in the training set are huge. If PCA is applied to such a feature set, the loadings for the features with high variance will also be large. Hence, the Principal Components will be biased towards the high-variance features, leading to misleading results (see the sketch below).
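A minimal sketch of both disadvantages, assuming scikit-learn and its bundled wine dataset (features measured on very different scales): without standardization the highest-variance feature dominates the first principal component, and the component loadings are linear combinations that are harder to read than the raw features.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_wine().data                                   # 13 features on very different scales

pca_raw = PCA(n_components=2).fit(X)                   # no standardization
pca_std = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

# Without scaling, PC1 captures almost all the variance because one
# large-scale feature dominates; after scaling the split is more balanced.
print(pca_raw.explained_variance_ratio_.round(3))
print(pca_std.explained_variance_ratio_.round(3))

# Each component is a linear combination of all 13 original features,
# which is why the components are harder to interpret.
print(pca_std.components_.round(2))
```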

4. What is dimensionality reduction in Machine Learning?

Dimensionality reduction is the process of reducing the number of random variables under consideration, by obtaining a set of principal variables. It can be divided into feature selection and feature extraction. The higher the number of features, the harder it gets to visualize the training set and then work on it. Sometimes, most of these features are correlated, and hence redundant. This is where dimensionality reduction algorithms come into play.

There are two components of dimensionality reduction:

Feature selection:

Here, we try to find a subset of the original set of variables, or features, to get a smaller subset that can be used to model the problem. It usually involves one of three approaches (a sketch of the filter approach follows this list):

  • Filter
  • Wrapper
  • Embedded
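A minimal sketch of the filter approach, assuming scikit-learn's SelectKBest (the text does not name a library): each feature is scored independently and only the top-scoring ones are kept. Wrapper methods (e.g. recursive feature elimination) and embedded methods (e.g. Lasso) pursue the same goal differently.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter method: score each feature with an ANOVA F-test and keep the best 2
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_.round(1))   # per-feature scores
print(X_selected.shape)            # (150, 2) – a smaller subset of the original features
```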

5. Explain the different types of plots that we generally use in Machine Learning.

There are several plots we commonly use in Machine Learning, all of which can be drawn using Python. They are listed below, followed by a small plotting sketch.

i) Scatter plot

ii) Box plot

iii) Bar chart

iv) Line plot

v) Histogram
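A minimal matplotlib sketch (matplotlib is an assumption; the text only says Python) drawing each of the plots above on toy data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.arange(50)
y = x + rng.normal(scale=5, size=50)

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
axes[0, 0].scatter(x, y); axes[0, 0].set_title("Scatter plot")
axes[0, 1].boxplot([y, y * 2]); axes[0, 1].set_title("Box plot")
axes[0, 2].bar(["A", "B", "C"], [3, 7, 5]); axes[0, 2].set_title("Bar chart")
axes[1, 0].plot(x, y); axes[1, 0].set_title("Line plot")
axes[1, 1].hist(y, bins=10); axes[1, 1].set_title("Histogram")
axes[1, 2].axis("off")  # unused panel
plt.tight_layout()
plt.show()
```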

6. What do you mean by Curse of Dimensionality? What are the different ways to deal with it?

When the data has too many features, understanding and executing the data analysis becomes much harder; this problem is called the curse of dimensionality. It can be mitigated through dimensionality reduction.

As the number of predictors, dimensions, or features in the dataset increases, it becomes computationally more expensive and exponentially more difficult to produce accurate predictions with classification or regression models.

PCA reduces the dimensions of a d-dimensional dataset by projecting it onto a k-dimensional subspace, where k < d.

LDA extracts the k new independent variables that maximize the separation between the classes of the dependent variable.

PCA and LDA are two common methods for reducing the dimensions or features of the data, and through them we can address the curse of dimensionality. A short sketch of both follows.
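A minimal sketch contrasting the two, assuming scikit-learn (not named in the text): PCA is unsupervised and keeps the directions of maximum variance, while LDA is supervised and needs the class labels to maximize class separation.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # d = 4 features, 3 classes

X_pca = PCA(n_components=2).fit_transform(X)      # k = 2 < d, labels not used
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # labels used

print(X_pca.shape, X_lda.shape)                   # (150, 2) (150, 2)
```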

10. What do you mean by a dummy variable? Where is it used in Machine Learning?

If there are n categories in a categorical attribute, n new attributes will be created. These new attributes are called dummy variables. They are created with one-hot encoding, and each attribute takes the value 1 or 0, representing the presence or absence of that category.

We use dummy variables in regression in Machine Learning. To transform a categorical attribute into a numerical one, we can use label encoding (label encoding assigns a unique integer to each category of data). But label encoding alone is not suitable, because the integers impose an artificial order on the categories; hence one-hot encoding is used in regression models after label encoding. This lets us create new attributes according to the number of classes present in the categorical attribute, and this is where dummy variables are used, as sketched below.
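A minimal sketch with pandas (an assumed choice; any one-hot encoder works), where the `city` column is a hypothetical example: each category becomes its own 0/1 dummy column.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Chennai", "Delhi"]})

# One dummy column per category, 1 marking presence and 0 absence
dummies = pd.get_dummies(df["city"], dtype=int)
print(dummies)
#    Chennai  Delhi  Mumbai
# 0        0      1       0
# 1        0      0       1
# 2        1      0       0
# 3        0      1       0
```

In regression models one dummy column is usually dropped (for example with `drop_first=True`) to avoid perfect multicollinearity, the so-called dummy variable trap.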