1. What are the differences between PCA and LDA?
Both LDA and PCA are linear transformation techniques: LDA is supervised, whereas PCA is unsupervised and ignores class labels.
In contrast to PCA, LDA attempts to find a feature subspace that maximizes class separability. LDA assumes that the classes are normally distributed and have equal class covariances.
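A minimal sketch of the contrast, assuming scikit-learn and its bundled Iris data (the two-component setting is just for illustration):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA ignores the labels y and keeps the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses the labels y and keeps the directions that best separate the classes.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both (150, 2), but the axes mean different things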
2. What do you mean by Principal Coordinate Analysis?
Principal Coordinates Analysis (PCoA) is a method to explore and visualize similarities or dissimilarities in data. It starts with a similarity or dissimilarity matrix and assigns each item a location in a low-dimensional space. PCoA tries to find the main axes through that matrix. It is a kind of eigenanalysis and calculates a series of eigenvalues and eigenvectors.
Each eigenvalue has an eigenvector, and there are as many eigenvectors and eigenvalues as there are rows in the initial matrix. By using PCoA we can visualize individual and/or group differences; individual differences can be used to show outliers.
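A minimal sketch of PCoA (classical multidimensional scaling) using only NumPy; the random data and the choice of two output axes are purely illustrative:

import numpy as np

X = np.random.rand(6, 4)                                   # 6 items described by 4 raw features
D2 = np.square(np.linalg.norm(X[:, None] - X[None, :], axis=-1))  # squared pairwise distances

n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n                        # centering matrix
B = -0.5 * J @ D2 @ J                                      # double-centered dissimilarity matrix

eigvals, eigvecs = np.linalg.eigh(B)                       # one eigenvalue/eigenvector per row of the matrix
order = np.argsort(eigvals)[::-1]                          # largest eigenvalues give the main axes
coords = eigvecs[:, order[:2]] * np.sqrt(np.maximum(eigvals[order[:2]], 0))
print(coords)                                              # low-dimensional coordinates for the 6 items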
3. What are the Advantages and Disadvantages of Naïve Bayes Classifier?
Advantages of Naive Bayes
1. When the assumption of independent predictors holds true, a Naive Bayes classifier performs very well compared to other models.
2. Naive Bayes requires only a small amount of training data to estimate its parameters, so the training period is short.
3. Naive Bayes is also easy to implement.
Disadvantages of Naive Bayes
1. The main limitation of Naive Bayes is the assumption of independent predictors. Naive Bayes implicitly assumes that all the attributes are mutually independent. In real life, it is almost impossible to get a set of predictors that are completely independent.
2. If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a probability of 0 (zero) and will be unable to make a prediction. This is often known as the zero-frequency problem (a small illustration follows below).
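A minimal illustration of the zero-frequency problem with made-up counts; Laplace (add-one) smoothing, controlled by alpha below, is the usual workaround:

def cond_prob(count_value_and_class, count_class, n_categories, alpha=0.0):
    # P(feature = value | class), with optional Laplace (add-alpha) smoothing
    return (count_value_and_class + alpha) / (count_class + alpha * n_categories)

# Suppose a category was never observed together with a given class in training:
print(cond_prob(0, 5, 3))           # 0.0 -> the whole Naive Bayes product collapses to zero
print(cond_prob(0, 5, 3, alpha=1))  # 0.125 -> a small positive probability instead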
4. What is the difference between Generative and Discriminative models?
Generative Model
A Generative Model learns the joint probability distribution p(x, y). It predicts the conditional probability with the help of Bayes' Theorem. A Generative Model explicitly models the actual distribution of each class.
Generative classifiers
- Assume some functional form for P(Y), P(X|Y)
- Estimate parameters of P(X|Y), P(Y) directly from training data
- Use Bayes' rule to calculate P(Y|X)
Examples of generative classifiers
- Naïve Bayes
- Bayesian networks
- Markov random fields
- Hidden Markov Models (HMM)
Discriminative Model
A Discriminative Model models the decision boundary between the classes; it learns the conditional probability distribution p(y|x). A small sketch contrasting the two kinds of model appears after the lists below.
Discriminative Classifiers
- Assume some functional form for P(Y|X)
- Estimate parameters of P(Y|X) directly from training data
Examples of discriminative classifiers
- Logistic regression
- Support Vector Machines (SVM)
- Traditional neural networks
- Nearest neighbor
- Conditional Random Fields (CRFs)
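A minimal sketch of the two styles side by side, assuming scikit-learn: GaussianNB estimates P(X|Y) and P(Y) and applies Bayes' rule, while LogisticRegression models P(Y|X) directly (the synthetic data is just for illustration):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

generative = GaussianNB().fit(X_tr, y_tr)                           # learns class-conditional densities
discriminative = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # learns the decision boundary

print("generative accuracy:", generative.score(X_te, y_te))
print("discriminative accuracy:", discriminative.score(X_te, y_te))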
5. What is the difference between Probability and Likelihood?
Probability is the chance that a success occurs. For example, consider the binomial experiment of tossing a coin. If we treat getting heads as a success, the probability of success is 0.5, because heads and tails are equally likely.
Likelihood is a conditional probability seen from the other direction. In the same example, we toss the coin 10 times and suppose that we get 7 successes (heads) and 3 failures (tails). The likelihood is then the (conditional) probability of that event (the observed set of successes) occurring, given an assumed probability of a single success.
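A minimal sketch of that coin example, assuming SciPy is available; it evaluates the likelihood of a few candidate values of the success probability given the observed 7 heads in 10 tosses:

from scipy.stats import binom

successes, tosses = 7, 10
for p in (0.3, 0.5, 0.7):
    # Likelihood: probability of the observed data, viewed as a function of p
    print(p, binom.pmf(successes, tosses, p))
# p = 0.7 gives the largest value, so it is the value the observed data supports most.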
6. Why is Naïve Bayes termed or named "Naïve"?
A naive Bayes classifier assumes that the presence or absence of a particular feature of a class is unrelated to the presence or absence of any other feature, given the class variable. Fundamentally, it’s “naive” because it makes assumptions that may or may not turn out to be correct.
In other words, Naive Bayes (NB) is ‘naive’ because it makes the assumption that features of a measurement are independent of each other. This is naive because it is (almost) never true.
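A minimal sketch of that independence assumption with made-up numbers: under the naive assumption, the class-conditional probability of all features together is just the product of the per-feature probabilities.

import numpy as np

# P(feature_i = observed value | class) for three features, for one class (illustrative values)
per_feature_probs = np.array([0.6, 0.3, 0.8])

# Naive assumption: P(x1, x2, x3 | class) = P(x1 | class) * P(x2 | class) * P(x3 | class)
joint_given_class = np.prod(per_feature_probs)
print(joint_given_class)  # 0.144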