Machine Learning – Interview Questions Part 1

1. What do you mean by Machine Learning and various applications?

Machine Learning is a set of algorithms through which a machine can learn without being explicitly programmed. The concept is simple: the machine takes data and learns from it, including learning new things as more data is given. Machine learning algorithms allow software applications to become more accurate at predicting outcomes without being explicitly programmed, using statistical analysis to predict the output. Common applications include product recommendations, spam filtering, speech and image recognition, and fraud detection.

Machine Learning is a subset of Artificial Intelligence that concentrates mainly on machines learning from data and making predictions based on their experience.

 

2. What are the differences between Supervised Machine Learning and Unsupervised Machine Learning?

In supervised learning, we train the machine using well-labeled data, which means the data is already tagged with the correct answers. A supervised learning algorithm learns from this labeled training data, which helps it predict outcomes for unseen data.

Unsupervised learning is a machine learning technique in which we do not need to supervise the model. Instead, we allow the model to work on its own to discover information. It mainly deals with unlabeled data.
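As a quick illustration, here is a minimal sketch contrasting the two settings, assuming scikit-learn is available; the toy data and class choices are made up for illustration only:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]  # features
y = [0, 0, 1, 1]                                      # labels (supervised only)

# Supervised: the model learns the mapping from X to the given labels y.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.2, 1.9]]))   # predicts a class for unseen data

# Unsupervised: no labels are supplied; the model groups X on its own.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                  # cluster assignment per sample
```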

3. What is the difference between Artificial Intelligence, Machine Learning and Deep Learning?

Machine Learning:

Machine Learning is a technique in which a system analyzes data, learns from that data, and then applies what it has learned in a model to make informed decisions.

Nowadays, many big companies use machine learning to give their users a better experience. For example, Amazon uses machine learning to give better product recommendations to its customers based on their preferences, and Netflix uses machine learning to suggest TV series, movies, and shows that its users are likely to watch.

Deep Learning:

Deep learning is a subset of machine learning. The main difference between deep learning and machine learning is in how errors are corrected: a machine learning model improves gradually but still needs some guidance, so if it returns an inaccurate prediction the programmer needs to fix that problem explicitly, whereas a deep learning model corrects the problem by itself. Self-driving car systems are a good example of deep learning.

Artificial intelligence:

Artificial Intelligence is not the same thing as machine learning and deep learning; in fact, deep learning and machine learning are both subsets of AI. AI is the ability of computer programs to function like a human brain.

AI aims to replicate the human brain: the way it thinks, works, and functions. There is no complete artificial intelligence yet, but we are getting closer; one widely cited example is Sophia, a humanoid robot often presented as one of the most advanced AI systems today.

4. What are the differences between Inductive Reasoning and Deductive Reasoning in Machine Learning?

Inductive Reasoning

Inductive reasoning involves making a generalization from specific facts and observations. It uses a bottom-up method: it moves from precise observations to a generalization or simplification. In inductive reasoning, the conclusions are probabilistic. An inductive argument can be strong or weak, which means the conclusion may be false even if the premises are true. Inductive reasoning is fast and easy to use, as it needs evidence rather than proven facts.

Deductive Reasoning

Deductive reasoning uses available facts, information, or knowledge to draw a valid conclusion. It uses a top-down approach: it moves from a generalized statement to a specific conclusion. In deductive reasoning, the conclusions are certain. Deductive arguments can be valid or invalid, which means that if the premises are true, the conclusion must be true. Deductive reasoning is harder to use, as it needs facts that are known to be true.

5. What do you mean by the terms Skewed Data, Outliers, Missing Values and Null Values?

Skewed data

A distribution of data that is not symmetric is called skewed data. Skewed data has one tail that is longer than the other.

A distribution whose right side has a long tail is called positively skewed or right-skewed. In this type of skewed data, typically Mean > Median > Mode, because the long right tail pulls the mean above the median.
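A small sketch (assuming pandas is available) showing this relationship on a made-up right-skewed sample:

```python
import pandas as pd

# Made-up right-skewed data: most values are small, a few are large.
s = pd.Series([1, 2, 2, 2, 3, 3, 4, 5, 9, 15])

print(s.skew())                           # positive -> right-skewed
print(s.mean(), s.median(), s.mode()[0])  # 4.6 > 3.0 > 2: mean > median > mode
```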

Outliers

Outliers are extreme values that deviate from the other observations in the data. For example, in a normal distribution, outliers may be values in the tails of the distribution. They may indicate variability in measurement, experimental errors, or a genuine novelty. Outliers can be of two kinds: univariate and multivariate. Univariate outliers can be found by looking at the distribution of values in a single feature space, while multivariate outliers can be found in an n-dimensional space.
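As an illustration, here is a hypothetical sketch of univariate outlier detection using the common 1.5 × IQR rule; the data and threshold are illustrative only:

```python
import pandas as pd

s = pd.Series([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])  # 102 is extreme

# Fences at 1.5 * IQR beyond the first and third quartiles.
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
print(outliers)  # flags the value 102
```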

Missing Values

As the name suggests, missing values are values that are absent from the data. They can arise from information loss as well as dropouts and non-responses of study participants. The presence of missing values leads to a smaller sample size than intended and eventually compromises the reliability of the study results. It can also produce biased results when conclusions about a population are drawn from such a sample, reducing the consistency of the data.

Null Values

NULL is the value used to denote an unknown data value. In a database context, null is the total absence of a value in a particular field and means that the field value is unknown. It is not the same as a zero value in a numerical field or an empty string in a text field. When the values in a column containing nulls are counted, the nulls are not included in the result. In data analysis libraries such as pandas, nulls are also represented by NaN.
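A minimal sketch (assuming pandas and NumPy) of how NaN marks missing/null values and how counts skip them; the column names are hypothetical:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"age": [25, np.nan, 31], "city": ["NY", "LA", None]})

print(df.isnull().sum())  # number of missing values per column
print(df["age"].count())  # -> 2: the NaN is excluded from the count

# One simple imputation option: fill missing ages with the column mean.
df_filled = df.fillna({"age": df["age"].mean()})
```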

6. What do you mean by Features and Labels in a Dataset?

Feature

Features are the individual independent variables that act as inputs to the system. Prediction models use these features to make predictions. New features can also be extracted from old features using a method known as 'feature engineering'. To keep it simple, you can consider one column of your data set to be one feature. Features are also called attributes, and the number of features is the dimensionality of the data.

Label

Labels are the final output, or target output. A label can also be considered the output class. We obtain labels as output when we provide features as input.
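For instance, here is a minimal sketch of splitting a table into features X and a label y; the column names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "area":  [50, 80, 120],    # feature
    "rooms": [2, 3, 4],        # feature
    "price": [100, 160, 240],  # label / target
})

X = df[["area", "rooms"]]  # features: independent input variables
y = df["price"]            # label: the output the model predicts
```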

7. What do you mean by Independent and Dependent Variables?

An independent variable is a variable that represents a quantity being manipulated in an experiment. The independent variable (sometimes known as the manipulated variable) is the variable whose change is not affected by any other variable in the experiment.

X is often the variable used to represent the independent variable in an equation.

A dependent variable is a variable that represents a quantity whose value depends on how the independent variable is manipulated.

The independent variable is what you change, and the dependent variable is what changes because of that.

Y is often the variable used to represent the dependent variable in an equation.

8. What do you mean by Dummy Variable? Where is it used in Machine Learning?

If a categorical attribute has n categories, n new attributes will be created. These newly created attributes are called dummy variables. Dummy variables are created with one-hot encoding, and each one takes the value 0 or 1, representing the presence or absence of the corresponding category.

We use dummy variables in regression in machine learning. To transform categorical attributes into numerical attributes, we can use the label encoding procedure (label encoding assigns a unique integer to each category). But this procedure alone is not always suitable, because the integers imply an ordering that the categories may not have; hence, one-hot encoding is used in regression models after label encoding. This lets us create new attributes according to the number of classes present in the categorical attribute, and this is where dummy variables are used.
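A minimal sketch of creating dummy variables with pandas one-hot encoding; the column and category names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Each category becomes its own 0/1 dummy column.
dummies = pd.get_dummies(df["color"], prefix="color", dtype=int)
print(dummies)
#    color_blue  color_green  color_red
# 0           0            0          1
# 1           0            1          0
# ...
```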

9. What is Feature Scaling?

Feature scaling, or standardization, is a data preprocessing step applied to the independent variables (features) of the data. It helps normalize the data within a certain range, and it also helps speed up the calculations in an algorithm.

Datasets often consist of features that differ greatly in magnitude, units, and range. Normalization should be performed when the scale of a feature is irrelevant or misleading, and should not be performed when the scale is meaningful.

Algorithms that use a Euclidean distance measure are sensitive to magnitudes; feature scaling helps them weigh all features equally.

If one feature is much larger in scale than the others, it dominates the Euclidean distance and therefore needs to be normalized, as the sketch below illustrates.
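An illustrative calculation, with made-up numbers, of how a large-scale feature swamps the Euclidean distance before scaling:

```python
import math

a = (0.30, 50_000)  # (rating, salary): wildly different scales
b = (0.90, 51_000)

d = math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)
print(d)  # ~1000.0: the salary difference completely dominates the rating
```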

10. Explain Normalization and Standardization in detail.

Standardization

Standardization is the process of rescaling the features so that they have the properties of a Gaussian distribution with

μ = 0 and σ = 1

where μ is the mean and σ is the standard deviation. The standard scores (also called z-scores) of the samples are calculated as follows:

z = (x − μ) / σ

Normalization

Normalization, often also simply called min-max scaling, shrinks the range of the data so that it is fixed between 0 and 1:

x′ = (x − xmin) / (xmax − xmin)

It works better for cases in which standardization might not work so well: if the distribution is not Gaussian or the standard deviation is very small, the min-max scaler is the better choice.
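A minimal sketch (assuming scikit-learn is available) contrasting the two rescalings on a made-up feature column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # one feature, one outlier

# Standardization: result has mean 0 and standard deviation 1.
print(StandardScaler().fit_transform(X).ravel())

# Normalization (min-max): result is squeezed into the range [0, 1].
print(MinMaxScaler().fit_transform(X).ravel())
```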