Machine Learning – Interview Questions Part 2

1. What do you mean by Fourier Transforms? How can we use them in Machine Learning?

The Fourier Transform decomposes a signal into its constituent frequencies, moving from the time domain to the frequency domain. To move back from the frequency domain to the time domain, we apply the Inverse Fourier Transform.

Fourier transform is widely used not only in signal (radio, acoustic, etc.) processing but also in image analysis e.g. edge detection, image filtering, image reconstruction, and image compression.

Taking the Fourier transform of your data can reveal additional information about the analyzed sample, such as its dominant frequency components.
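As a quick illustration (using NumPy, with made-up sample values), the sketch below builds a signal from two sine waves, recovers their frequencies with the FFT, and then reconstructs the original signal with the inverse FFT:

```python
import numpy as np

# Build a toy signal from 5 Hz and 12 Hz sine waves (illustrative values).
fs = 100                      # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)   # one second of samples
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)

spectrum = np.fft.rfft(signal)                # time domain -> frequency domain
freqs = np.fft.rfftfreq(len(signal), 1 / fs)  # frequency of each FFT bin
peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]  # two strongest frequencies

# Inverse FFT: frequency domain -> time domain.
reconstructed = np.fft.irfft(spectrum, n=len(signal))

print(sorted(peaks))                       # -> [5.0, 12.0]
print(np.allclose(signal, reconstructed))  # -> True (round trip is lossless)
```

The round trip demonstrates the point made above: the transform and its inverse move freely between the two domains without losing information.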

2. Explain Multicollinearity in detail. How can we reduce it?

Multicollinearity is a phenomenon in which two or more predictor (independent) variables in a regression model are highly correlated, meaning that one variable can be linearly predicted from the others with a considerable degree of accuracy. Two variables are perfectly collinear if there is an exact linear relationship between them.

There are two types of multicollinearity:

Structural multicollinearity is a mathematical artifact caused by creating new predictors from other predictors.

Data-based multicollinearity is a result of a poorly designed experiment, reliance on purely observational data, or the inability to manipulate the system on which the data are collected.

Multicollinearity can be reduced by removing or combining correlated predictors, collecting more data, or applying regularization such as ridge regression.
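A common way to detect multicollinearity is the Variance Inflation Factor (VIF): regress each predictor on all the others and compute 1 / (1 − R²). Values above roughly 5–10 are a conventional warning sign. A minimal NumPy sketch, using synthetic data where the third column is deliberately built as a near-linear combination of the other two:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = 2 * x1 - x2 + rng.normal(scale=0.1, size=200)  # almost collinear
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing x_j on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add an intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

# All three columns take part in the near-linear relation,
# so all three VIFs come out far above the usual threshold of 10.
print([round(vif(X, j), 1) for j in range(3)])
```

Dropping or combining one of the offending columns (or switching to ridge regression) would bring the remaining VIFs back down.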


3. Explain Covariance in detail.

Covariance is a measure of how changes in one variable are associated with changes in a second variable. Precisely, covariance measures the degree to which two variables are linearly associated.

If greater values of one variable mainly correspond with greater values of the other, and lesser values correspond with lesser values, the covariance is positive. Conversely, if greater values of one variable mainly correspond with lesser values of the other, the covariance is negative.
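A small NumPy sketch of both cases, with illustrative toy values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # y = 2x: the variables move together

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry
# is the covariance between x and y.
cov_xy = np.cov(x, y)[0, 1]
print(cov_xy)          # -> 5.0 (positive: larger x goes with larger y)

# Reversing y makes the variables move in opposite directions.
cov_neg = np.cov(x, y[::-1])[0, 1]
print(cov_neg)         # -> -5.0 (negative covariance)
```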

4. Explain the different types of plots that we generally use in Machine Learning.

There are different plots used in Machine Learning, which can be created in Python. Commonly used plots are listed below.

i) Scatter plot

ii) Box plot

iii) Bar chart

iv) Line plot

v) Histogram

5. What do you mean by Matplotlib?

Matplotlib is an amazing visualization library in Python for 2D plots of arrays. Matplotlib is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack.

One of the greatest benefits of visualization is that it gives us visual access to huge amounts of data in an easily digestible form. Matplotlib supports several plot types, such as line, bar, scatter, and histogram.
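A minimal sketch tying the two answers together: it draws four of the plot types listed above in one Matplotlib figure (synthetic data, off-screen rendering so no display is required):

```python
import matplotlib
matplotlib.use("Agg")           # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
data = np.random.default_rng(1).normal(size=200)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].plot(x, np.sin(x))        # line plot
axes[0, 1].scatter(x, np.cos(x))     # scatter plot
axes[1, 0].hist(data, bins=20)       # histogram
axes[1, 1].boxplot(data)             # box plot
for ax, title in zip(axes.flat, ["Line", "Scatter", "Histogram", "Box"]):
    ax.set_title(title)
fig.savefig("plots.png")             # write the figure to disk
```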

6. Explain the Line Chart in detail.

A line chart represents data as a series of data points connected by straight line segments. Line charts are most often used to visualize data that changes over time.

7. What do you mean by Box Plot?

A boxplot is a standardized way of displaying the distribution of data based on a five-number summary: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It can tell you whether there are outliers and what their values are, whether your data is symmetrical, how tightly it is grouped, and whether and how it is skewed. It is also called a box-and-whisker plot.
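The five-number summary behind a box plot can be computed directly. The sketch below uses toy values and the conventional 1.5 × IQR rule that box plots use to flag outliers:

```python
import numpy as np

data = np.array([1, 2, 2, 3, 4, 4, 5, 5, 6, 30])  # 30 is an outlier

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                                      # interquartile range
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # whisker limits
outliers = data[(data < lower) | (data > upper)]

print(data.min(), q1, median, q3, data.max())      # five-number summary
print(outliers)                                    # -> [30]
```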

8. What is Dimensionality Reduction in Machine Learning?

Dimensionality reduction is the process of reducing the number of random variables under consideration, by obtaining a set of principal variables. It can be divided into feature selection and feature extraction. The higher the number of features, the harder it gets to visualize the training set and then work on it. Sometimes, most of these features are correlated, and hence redundant. This is where dimensionality reduction algorithms come into play.
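As a concrete example of feature extraction, here is a minimal PCA sketch in NumPy (via the SVD) on synthetic data where the third feature is almost a copy of the first, so two components capture nearly all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + 0.01 * rng.normal(size=100)   # redundant: nearly identical to x1
X = np.column_stack([x1, x2, x3])

Xc = X - X.mean(axis=0)                 # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)         # variance ratio per component

X_reduced = Xc @ Vt[:2].T               # keep only the top 2 components
print(explained)          # first two components carry almost all the variance
print(X_reduced.shape)    # -> (100, 2)
```

Because the redundant feature contributes almost no new variance, dropping the third component loses essentially nothing, which is exactly the situation described above.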

9. What do you mean by Linear Discriminant Analysis?

Linear Discriminant Analysis is a supervised algorithm as it takes the class label into consideration. It is a way to reduce ‘dimensionality’ while at the same time preserving as much of the class discrimination information as possible.

LDA helps to find the boundaries around clusters of classes. It projects data points onto a line so that the clusters are as well separated as possible, with each cluster at a relative distance from its centroid. LDA finds the centroid of each class's data points.

PCA performs better when the number of samples per class is small, whereas LDA works better on large datasets with multiple classes, where class separability is an important factor in reducing dimensionality.
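A minimal two-class sketch of Fisher's LDA in NumPy (synthetic, well-separated data): it finds the projection direction from the class centroids and the within-class scatter, then projects the 2-D points onto that line, as described above:

```python
import numpy as np

rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
class1 = rng.normal(loc=[3, 3], scale=0.5, size=(50, 2))

mu0, mu1 = class0.mean(axis=0), class1.mean(axis=0)   # class centroids
Sw = np.cov(class0.T) + np.cov(class1.T)              # within-class scatter
w = np.linalg.solve(Sw, mu1 - mu0)                    # discriminant direction

z0, z1 = class0 @ w, class1 @ w                       # 1-D projections
print(z0.mean() < z1.mean())   # -> True: the classes separate on the line
```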

10. Explain about Eigen values and Eigen Vectors?

Eigenvector — Every vector (list of numbers) has a direction when it is plotted on an XY chart. The eigenvectors of a linear transformation (represented by a matrix) are those vectors whose direction does not change when the transformation is applied to them; they are only stretched or shrunk. This property is what makes eigenvectors so valuable.

Eigenvalue — The scalar by which an eigenvector is stretched (or shrunk) when the transformation is applied.

Eigenvectors and eigenvalues are used to reduce noise in data and can improve efficiency in computationally intensive tasks. They can also help eliminate strongly correlated features and reduce over-fitting.

Eigenvalues and Eigenvectors have their importance in linear differential equations where you want to find a rate of change or when you want to maintain relationships between two variables.
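The defining relation is A·v = λ·v: applying the matrix to an eigenvector v only stretches it by the eigenvalue λ. A small NumPy sketch with an illustrative 2×2 matrix:

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Columns of `eigenvectors` are the eigenvectors; verify A @ v == lam * v.
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # -> True for each pair

# For this matrix the eigenvalues are 2 and 5 (up to floating point).
print(sorted(eigenvalues))
```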