Machine Learning – Interview Questions Part 5

1. What is the difference between Multi-Dimensional Scaling and Principal Component Analysis?

Principal Component Analysis

The input to PCA is the original vectors in n-dimensional space.

The data are then projected onto the directions with the most variance, so the “spread” of the data is roughly preserved as the dimensionality decreases.

Multidimensional Scaling

The input to MDS is the pairwise distances between points.

The output of MDS is a two- or three-dimensional projection of the points in which the pairwise distances are preserved as well as possible.
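To make the contrast concrete, here is a minimal sketch assuming scikit-learn (the text does not prescribe a library): PCA is fed the raw feature vectors, while MDS is fed a matrix of pairwise distances.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

X = load_iris().data                              # original vectors in n-dimensional space

# PCA: project onto the directions of greatest variance
X_pca = PCA(n_components=2).fit_transform(X)

# MDS: start from pairwise distances and find a 2-D layout that preserves them
D = pairwise_distances(X)
X_mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)

print(X_pca.shape, X_mds.shape)                   # (150, 2) (150, 2)
```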

2. What do you mean by Multi-Dimensional Scaling (MDS)?

Multidimensional scaling is a visual representation of distances or dissimilarities between sets of objects. “Objects” can be colors, faces, map coordinates, and so on. MDS finds a set of vectors in p-dimensional space such that the matrix of Euclidean distances among them corresponds as closely as possible to some function of the input matrix, according to a criterion function called stress.

The input to multidimensional scaling is a distance matrix. The output is typically a two-dimensional scatterplot, where each of the objects is represented as a point.
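A minimal sketch, again assuming scikit-learn's MDS: the input is a small hand-made distance matrix between four hypothetical objects; the output is one 2-D point per object (ready for a scatterplot), plus the stress value measuring how well the distances were preserved.

```python
import numpy as np
from sklearn.manifold import MDS

# Symmetric distance matrix between four hypothetical objects
D = np.array([[0.0, 2.0, 5.0, 6.0],
              [2.0, 0.0, 4.0, 5.0],
              [5.0, 4.0, 0.0, 1.5],
              [6.0, 5.0, 1.5, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)    # one 2-D point per object

print(coords)                    # coordinates for a scatterplot
print(mds.stress_)               # the stress criterion (lower is better)
```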

3. What are the pros and cons of PCA?

Advantages of Principal Component Analysis

1. Removes Correlated Features:

In a real-world scenario it is very common to have thousands of features in your dataset. You cannot run your algorithm on all of them: it will hurt the performance of your algorithm, and it is not easy to visualize that many features in any kind of graph. So you must reduce the number of features in your dataset.

You need to find the correlations among the features (correlated variables). Finding correlations manually across thousands of features is nearly impossible, frustrating, and time-consuming. PCA does this for you efficiently.

After applying PCA to your dataset, all the Principal Components are independent of one another; there is no correlation among them (a quick check appears after this list of advantages).

2. Improves Algorithm Performance: 

With so many features, the performance of your algorithm will degrade drastically. PCA is a very common way to speed up a Machine Learning algorithm by getting rid of correlated variables that do not contribute to the decision making. The training time of the algorithm reduces significantly with fewer features.
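As a quick check of the decorrelation claim above, the following minimal sketch (assuming scikit-learn and NumPy, which the text does not name) compares the feature correlation matrix before and after PCA.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Correlations among the original features: large off-diagonal values
print(np.corrcoef(X, rowvar=False).round(2))

# Correlations among the principal components: off-diagonal values ~ 0
X_pc = PCA().fit_transform(X)
print(np.corrcoef(X_pc, rowvar=False).round(2))
```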

Disadvantages of Principal Component Analysis

1. Independent variables become less interpretable:

After applying PCA to the dataset, your original features are replaced by Principal Components. Each Principal Component is a linear combination of your original features, and these combinations are not as readable or interpretable as the original features.

2. Data standardization is a must before PCA:

You must standardize your data before applying PCA; otherwise, PCA will not be able to find the optimal Principal Components.

For instance, if a feature set contains data expressed in units of kilograms, light years, or millions, the variance scales in the training set are huge. If PCA is applied to such a feature set, the loadings for the features with high variance will also be large. Hence, the Principal Components will be biased towards the high-variance features, leading to misleading results (see the sketch below).
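A minimal sketch of both disadvantages, assuming scikit-learn and its bundled wine dataset (features measured on very different scales): without standardization the highest-variance feature dominates the first principal component, and the component loadings are linear combinations that are harder to read than the raw features.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_wine().data                                   # 13 features on very different scales

pca_raw = PCA(n_components=2).fit(X)                   # no standardization
pca_std = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

# Without scaling, PC1 captures almost all the variance because one
# large-scale feature dominates; after scaling the split is more balanced.
print(pca_raw.explained_variance_ratio_.round(3))
print(pca_std.explained_variance_ratio_.round(3))

# Each component is a linear combination of all 13 original features,
# which is why the components are harder to interpret.
print(pca_std.components_.round(2))
```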

4. What is dimensionality reduction in Machine Learning?

Dimensionality reduction is the process of reducing the number of random variables under consideration, by obtaining a set of principal variables. It can be divided into feature selection and feature extraction. The higher the number of features, the harder it gets to visualize the training set and then work on it. Sometimes, most of these features are correlated, and hence redundant. This is where dimensionality reduction algorithms come into play.

There are two components of dimensionality reduction:

Feature selection:

Here, we try to find a subset of the original set of variables, or features, to get a smaller subset that can be used to model the problem. It usually involves one of three approaches (a sketch of the filter approach follows this list):

  • Filter
  • Wrapper
  • Embedded
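A minimal sketch of the filter approach, assuming scikit-learn's SelectKBest (the text does not name a library): each feature is scored independently and only the top-scoring ones are kept. Wrapper methods (e.g. recursive feature elimination) and embedded methods (e.g. Lasso) pursue the same goal differently.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter method: score each feature with an ANOVA F-test and keep the best 2
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_.round(1))   # per-feature scores
print(X_selected.shape)            # (150, 2) – a smaller subset of the original features
```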

5. Explain the different types of plots that we generally use in Machine Learning.

There are several plots we commonly use in Machine Learning, all of which can be drawn using Python. They are listed below, followed by a small plotting sketch.

i) Scatter plot

ii) Box plot

iii) Bar chart

iv) Line plot

v) Histogram
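A minimal matplotlib sketch (matplotlib is an assumption; the text only says Python) drawing each of the plots above on toy data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.arange(50)
y = x + rng.normal(scale=5, size=50)

fig, axes = plt.subplots(2, 3, figsize=(10, 6))
axes[0, 0].scatter(x, y); axes[0, 0].set_title("Scatter plot")
axes[0, 1].boxplot([y, y * 2]); axes[0, 1].set_title("Box plot")
axes[0, 2].bar(["A", "B", "C"], [3, 7, 5]); axes[0, 2].set_title("Bar chart")
axes[1, 0].plot(x, y); axes[1, 0].set_title("Line plot")
axes[1, 1].hist(y, bins=10); axes[1, 1].set_title("Histogram")
axes[1, 2].axis("off")  # unused panel
plt.tight_layout()
plt.show()
```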

6. What do you mean by Curse of Dimensionality? What are the different ways to deal with it?

When the data has too many features, understanding and executing the data analysis becomes much harder; this problem is called the curse of dimensionality. It can be mitigated through dimensionality reduction.

As the number of predictors, dimensions, or features in the dataset increases, it becomes computationally more expensive and exponentially more difficult to produce accurate predictions with classification or regression models.

PCA reduces the dimensions of a d-dimensional dataset by projecting it onto a k-dimensional subspace, where k < d.

LDA extracts the k new independent variables that maximize the separation between the classes of the dependent variable.

PCA and LDA are two common methods for reducing the dimensions or features of the data, and through them we can address the curse of dimensionality. A short sketch of both follows.
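A minimal sketch contrasting the two, assuming scikit-learn (not named in the text): PCA is unsupervised and keeps the directions of maximum variance, while LDA is supervised and needs the class labels to maximize class separation.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # d = 4 features, 3 classes

X_pca = PCA(n_components=2).fit_transform(X)      # k = 2 < d, labels not used
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # labels used

print(X_pca.shape, X_lda.shape)                   # (150, 2) (150, 2)
```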

10. What do you mean by a dummy variable? Where is it used in Machine Learning?

If there are n categories in a categorical attribute, n new attributes will be created. These new attributes are called dummy variables. They are created with one-hot encoding, and each attribute takes the value 1 or 0, representing the presence or absence of that category.

We use dummy variables in regression in Machine Learning. To transform a categorical attribute into a numerical one, we can use label encoding (label encoding assigns a unique integer to each category of data). But label encoding alone is not suitable, because the integers impose an artificial order on the categories; hence one-hot encoding is used in regression models after label encoding. This lets us create new attributes according to the number of classes present in the categorical attribute, and this is where dummy variables are used, as sketched below.
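A minimal sketch with pandas (an assumed choice; any one-hot encoder works), where the `city` column is a hypothetical example: each category becomes its own 0/1 dummy column.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Mumbai", "Chennai", "Delhi"]})

# One dummy column per category, 1 marking presence and 0 absence
dummies = pd.get_dummies(df["city"], dtype=int)
print(dummies)
#    Chennai  Delhi  Mumbai
# 0        0      1       0
# 1        0      0       1
# 2        1      0       0
# 3        0      1       0
```

In regression models one dummy column is usually dropped (for example with `drop_first=True`) to avoid perfect multicollinearity, the so-called dummy variable trap.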