Statistics – Interview questions Part 10

1. If the variance of a dataset is correctly computed using (n – 1) in the denominator, what does that tell us about the dataset? Explain.

Answer: If the variance formula has n – 1 in the denominator, the dataset is a sample. We estimate the population variance by dividing the sum of squared differences from the sample mean by n – 1, which corrects for the bias that dividing by n would introduce. If we have the complete population data, we can divide the sum of squared differences from the mean by n instead of n – 1.
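As a minimal sketch of the two denominators, the snippet below computes both variances by hand and checks them against Python's standard `statistics` module. The data values are made up purely for illustration.

```python
import math
import statistics

# Hypothetical data, chosen only to make the arithmetic easy to follow.
data = [4.0, 7.0, 9.0, 10.0, 15.0]

n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared differences

sample_var = sum_sq / (n - 1)    # treat the data as a sample
population_var = sum_sq / n      # treat the data as the whole population

# The standard library uses the same two conventions.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(population_var, statistics.pvariance(data))

print(sample_var, population_var)
```

The sample variance is always the larger of the two, and the gap shrinks as n grows.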

 

2. Is standard deviation robust to outliers?

Answer: No. Standard deviation is computed from the squared differences between each value and the mean, so a very high or a very low value, being far from the mean, increases it sharply. Hence outliers will affect standard deviation.
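A small illustration, using made-up numbers, of how a single extreme value inflates the standard deviation:

```python
import statistics

# Hypothetical values; 100 is added as a deliberate outlier.
base = [10, 11, 12, 13, 14]
with_outlier = base + [100]

sd_base = statistics.stdev(base)
sd_outlier = statistics.stdev(with_outlier)

print(f"without outlier: {sd_base:.2f}")
print(f"with outlier:    {sd_outlier:.2f}")
```

One added point is enough to make the second figure many times larger than the first.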

 

3. What happens to the confidence interval when we introduce some outliers to the data?

Answer: The width of a confidence interval for the mean depends on the standard deviation of the data. If we introduce outliers into the data, the standard deviation increases, and hence the confidence interval becomes wider.
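The effect can be sketched as follows. This is a rough illustration that assumes a normal-approximation interval (mean ± z · s / √n) rather than a t-based one. The helper `mean_ci` and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean (an assumption;
    a t-based interval would be more usual for small samples)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * stdev(data) / math.sqrt(len(data))
    centre = mean(data)
    return centre - half_width, centre + half_width

# Hypothetical data: 20 well-behaved values, then the same plus one outlier.
base = list(range(40, 60))
with_outlier = base + [500]

low_b, high_b = mean_ci(base)
low_o, high_o = mean_ci(with_outlier)

width_base = high_b - low_b
width_outlier = high_o - low_o
print(f"width without outlier: {width_base:.2f}")
print(f"width with outlier:    {width_outlier:.2f}")
```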

 

4. What is the value of t-statistic?

Answer: The t-statistic for the given groups is the difference between the group means divided by the standard error. With group means of 10 and 7 and a standard error of 0.94:

t = (10 - 7) / 0.94 ≈ 3.191
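The same arithmetic as a short sketch; the function name `t_statistic` is just an illustrative helper:

```python
def t_statistic(mean_a, mean_b, standard_error):
    """Difference between two group means divided by the standard error."""
    return (mean_a - mean_b) / standard_error

t = t_statistic(10, 7, 0.94)
print(round(t, 3))  # 3.191
```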

 

5. What happens when we introduce more variables to a linear regression model?

Answer: R-squared always increases or at least remains constant, because in ordinary least squares the sum of squared errors never increases when more variables are added to the model. Hence R-squared does not decrease. The adjusted R-squared is a modified version of R-squared that has been adjusted for the number of predictors in the model. The adjusted R-squared increases only if the new term improves the model more than would be expected by chance, and it decreases when a predictor improves the model by less than expected by chance.
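A minimal sketch of this behaviour, using only the standard library: it fits ordinary least squares through the normal equations on simulated data, first with one predictor and then with an extra predictor that is pure noise. The helpers `solve` and `r_squared`, the simulated data, and the seed are all assumptions made for illustration. Adjusted R-squared is computed as 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(columns, y):
    """Fit OLS with an intercept; return (R-squared, adjusted R-squared)."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    y_mean = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adjusted = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adjusted

random.seed(0)  # fixed seed so the run is repeatable
n_obs = 50
x1 = [random.gauss(0, 1) for _ in range(n_obs)]
noise = [random.gauss(0, 1) for _ in range(n_obs)]   # unrelated to y
y = [2.0 * v + random.gauss(0, 1) for v in x1]

r2_small, adj_small = r_squared([x1], y)
r2_big, adj_big = r_squared([x1, noise], y)

print(f"one predictor:  R2={r2_small:.4f}  adjusted={adj_small:.4f}")
print(f"plus noise var: R2={r2_big:.4f}  adjusted={adj_big:.4f}")
```

R-squared for the larger model is never below that of the smaller one, while the adjusted value can move in either direction depending on how much the extra predictor happens to help.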

 

6. What is the relationship between significance level and confidence level?

Answer: Significance level = 1 - confidence level. For example, if the significance level is 0.05, the corresponding confidence level is 95% or 0.95. The significance level is the probability of rejecting the null hypothesis when it is actually true. A result is called significant when its p-value, that is, the probability of obtaining a result as extreme as, or more extreme than, the result actually obtained when the null hypothesis is true, falls below the significance level. A confidence interval is a range of likely values for a population parameter, such as the population mean. For example, if you compute a 95% confidence interval for the average price of an ice cream, then you can be 95% confident that the interval contains the true average price of all ice creams.

The significance level and confidence level are complementary portions of the area under the distribution: for a normal distribution at the 95% confidence level, 0.95 lies in the middle and 0.05 is split between the two tails.
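As a small sketch of the complement relationship, the snippet below converts a confidence level to its significance level and to the matching two-sided critical value of the standard normal distribution. The helper name `critical_z` is illustrative.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided standard normal critical value for a given confidence level."""
    alpha = 1 - confidence                 # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print(f"confidence={confidence:.2f}  alpha={alpha:.2f}  z={critical_z(confidence):.3f}")
```

For a 95% confidence level this gives the familiar critical value of about 1.96.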

 

7. What are the types of selection bias?

Answer: The main types are as follows:

Sampling bias: A systematic error due to a non-random sample of a population, in which some members of the population are less likely to be included than others, resulting in a biased sample.

Time interval: A trial may be terminated early at an extreme value (often for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all variables have a similar mean.

Data: Specific subsets of data are chosen to support a conclusion, or data are rejected as bad on arbitrary grounds, instead of according to previously stated or generally agreed criteria.

Attrition: A kind of selection bias caused by attrition (loss of participants), in which trial subjects or tests that did not run to completion are discounted.
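Sampling bias, the first type above, can be illustrated with a small simulation. This is only a sketch under made-up assumptions: a hypothetical population drawn uniformly between 20 and 80, and a biased sampling scheme in which the chance of being picked is proportional to a member's value, so low values are less likely to be included.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values between 20 and 80.
population = [random.uniform(20, 80) for _ in range(10_000)]

# Fair sample: every member is equally likely to be chosen.
fair_sample = random.sample(population, 2_000)

# Biased sample: selection probability proportional to the value itself,
# so members with low values are under-represented.
biased_sample = random.choices(population, weights=population, k=2_000)

population_mean = mean(population)
fair_mean = mean(fair_sample)
biased_mean = mean(biased_sample)

print(f"population mean: {population_mean:.2f}")
print(f"fair sample:     {fair_mean:.2f}")
print(f"biased sample:   {biased_mean:.2f}")
```

The fair sample tracks the population mean closely, while the biased sample overestimates it because of how members were selected, not because of chance.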