Statistics – Interview questions Part 9

1. What is selection bias?

Answer: Selection bias occurs when the sample data gathered and prepared for modeling has characteristics that are not representative of the true, future population of cases the model will see. Active selection bias occurs when a subset of the data is systematically (i.e., non-randomly) excluded from analysis.
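The effect of systematic exclusion can be illustrated with a small simulation. This is only a sketch: the population (log-normally distributed "incomes"), the sample sizes, and the exclusion rule are all hypothetical choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 right-skewed values (e.g. incomes).
population = [random.lognormvariate(10, 0.5) for _ in range(10_000)]

# Random sample: every case has the same chance of being selected.
random_sample = random.sample(population, 1_000)

# Biased sample: cases below the population median are systematically excluded.
cutoff = statistics.median(population)
biased_sample = [x for x in population if x >= cutoff][:1_000]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean:    {population_mean:.0f}")
print(f"random sample mean: {random_mean:.0f}")
print(f"biased sample mean: {biased_mean:.0f}")
```

The random sample's mean lands close to the population mean, while the sample that excludes low values overestimates it.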

 

2. What is an example of a data set with a non-Gaussian distribution?

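Answer: The Gaussian distribution is one member of the exponential family of distributions, but the family contains many others (for example the exponential, Poisson, and binomial distributions) that are often just as easy to work with, and anyone doing machine learning with a solid grounding in statistics can use them where appropriate. A simple example of non-Gaussian data is the waiting time between independent random events, such as customer arrivals, which is typically modeled as exponential: the values are non-negative and strongly right-skewed.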
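As a rough illustration of non-Gaussian data, the sketch below simulates exponentially distributed waiting times and checks two signs of asymmetry. The rate parameter, the sample size, and the helper name `sample_skewness` are assumptions made for this example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical data: 5,000 waiting times (minutes) with a mean of about 2.
waits = [random.expovariate(0.5) for _ in range(5_000)]

mean_wait = statistics.mean(waits)
median_wait = statistics.median(waits)


def sample_skewness(values):
    """Third standardized moment; roughly 0 for symmetric (Gaussian-like) data."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


skew = sample_skewness(waits)

print(f"mean: {mean_wait:.2f}, median: {median_wait:.2f}, skewness: {skew:.2f}")
```

For Gaussian data the mean and median would nearly coincide and the skewness would be near zero; here the mean exceeds the median and the skewness is clearly positive.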

 

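3. What is logistic regression?

Answer: Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The outcome is a categorical variable, most commonly binary (two possible values), and the model estimates the probability of each outcome.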
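The prediction step of a binary logistic regression model can be sketched as follows. The function name `predict_probability` and the weights and bias are hypothetical; in practice the coefficients would be estimated from data, which this sketch does not do.

```python
import math


def predict_probability(features, weights, bias):
    """Return P(outcome = 1) for one case under a logistic model."""
    # Linear combination of the independent variables.
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The logistic (sigmoid) function maps z to a value between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for two independent variables.
weights = [1.2, -0.7]
bias = 0.0

p = predict_probability([2.0, 1.0], weights, bias)
print(f"predicted probability: {p:.3f}")
```

A case is usually assigned to the positive class when the predicted probability exceeds a chosen threshold such as 0.5.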

 

4. What are the measures that are used to analyze the central tendency of data?

Answer: The mean, median, and mode are the three statistical measures used to analyze the central tendency of data. Each identifies a central value that summarizes the entire data set.
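The three measures can be computed with Python's standard `statistics` module. The data values below are made up for illustration.

```python
import statistics

# Hypothetical data set.
data = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(data)      # sum of the values divided by their count
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print(mean_value, median_value, mode_value)
```

Because the data set has an even number of values, the median is the average of the two middle values.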

 

5. Which measure of central tendency will always change if a single value in the data changes?

Answer: The mean always changes when any single value in the data set changes. Because it is computed by totaling all the values, every value contributes to it. The median and mode may or may not change when a single value is altered.
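A small check of this behavior, using made-up data in which only the largest value is altered:

```python
import statistics

# Hypothetical data set and a copy with one value changed (10 -> 100).
original = [2, 3, 3, 5, 7, 10]
changed = [2, 3, 3, 5, 7, 100]

mean_before, mean_after = statistics.mean(original), statistics.mean(changed)
median_before, median_after = statistics.median(original), statistics.median(changed)
mode_before, mode_after = statistics.mode(original), statistics.mode(changed)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("mode:  ", mode_before, "->", mode_after)
```

In this case the mean shifts while the median and mode stay the same; changing a value near the middle of the data, or one of the repeated values, could move those as well.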

 

6. What is a quartile?

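Answer: Quartiles are the values that divide an ordered data set into four equal parts. They are as follows.

first quartile (25th percentile).

second quartile (50th percentile), i.e. the median.

third quartile (75th percentile).

More generally, the kth percentile is the value below which k percent of the data fall.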

For example, with MATLAB's prctile function applied to a data vector x:

prctile(x, 25) % 25th percentile (first quartile), returns 2.25

prctile(x, 50) % 50th percentile (median), returns 3
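For comparison, quartiles can also be computed in Python with `statistics.quantiles`. This is a sketch on made-up data, since the vector x is not shown here. Software packages use different interpolation conventions for percentiles, so results for the same data need not match MATLAB's `prctile` exactly.

```python
import statistics

# Hypothetical data set.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into four parts.
# method="inclusive" treats the data as the full population, so the
# minimum and maximum act as the 0th and 100th percentiles.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print("first quartile: ", q1)
print("second quartile:", q2)
print("third quartile: ", q3)
```

The second quartile coincides with the median of the data.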