What do you mean by the terms Skewed Data, Outliers, Missing Values and Null Values?
Data whose distribution is not symmetric is called skewed data. In a skewed distribution, one tail is longer than the other.
A distribution with a long right tail is called positively skewed or right-skewed. In this type of skewed data, typically Mean > Median > Mode.
A distribution with a long left tail is called negatively skewed or left-skewed. In this type of skewed data, typically Mean < Median < Mode.
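As a rough illustration, the sketch below checks the ordering of mean, median and mode on two small made-up samples and computes a moment-based skewness coefficient. The sample values and the helper name `moment_skewness` are assumptions chosen for this example, and the ordering is a rule of thumb for unimodal data rather than a guarantee.

```python
import statistics


def moment_skewness(values):
    """Moment-based skewness: the average of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical right-skewed sample: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror it around 16 to get a left-skewed sample with the long tail on the left.
left_skewed = [16 - v for v in right_skewed]

mean_r = statistics.fmean(right_skewed)
median_r = statistics.median(right_skewed)
mode_r = statistics.mode(right_skewed)
skew_r = moment_skewness(right_skewed)

mean_l = statistics.fmean(left_skewed)
median_l = statistics.median(left_skewed)
mode_l = statistics.mode(left_skewed)
skew_l = moment_skewness(left_skewed)

print("right-skewed:", mean_r, median_r, mode_r, skew_r)
print("left-skewed: ", mean_l, median_l, mode_l, skew_l)
```

In the right-skewed sample the few large values pull the mean above the median and mode, and the skewness coefficient is positive. The mirrored sample shows the opposite ordering and a negative coefficient.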
Outliers are extreme values that deviate markedly from the other observations in the data. For example, in a normal distribution, outliers may be values far out in the tails of the distribution. They may indicate variability in a measurement, experimental error, or a genuine novelty. Outliers can be of two kinds: univariate and multivariate. Univariate outliers can be found by looking at the distribution of values of a single feature. Multivariate outliers can be found in an n-dimensional feature space, where a combination of values is unusual even if each value on its own is not.
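One common way to flag univariate outliers is the 1.5 × IQR rule, sketched below with the standard library. The rule itself, the sample values and the helper name `iqr_outliers` are assumptions made for illustration. Other conventions (for example z-score thresholds) exist, and multivariate outliers need a different approach.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements of a single feature; 95 looks suspicious.
measurements = [10, 12, 11, 13, 12, 11, 14, 12, 13, 95]

outliers = iqr_outliers(measurements)
print(outliers)
```

Whether a flagged value is a measurement error or a real but rare observation still has to be decided from the context of the data.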
As the name suggests, missing values are values that are absent from the data. Missing values can arise from information loss as well as from dropouts and nonresponses of study participants. The presence of missing values leads to a smaller sample size than intended and can compromise the reliability of the study results. It can also bias conclusions drawn about a population from such a sample and reduce the consistency of the data.
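The effect on sample size and bias can be seen in a small made-up scenario. The income figures and the assumption that participants above a certain income decline to answer are hypothetical, chosen only to show how nonresponse that depends on the value itself distorts an estimate.

```python
import statistics

# Hypothetical "true" incomes (in thousands) for ten study participants.
true_incomes = [22, 25, 28, 30, 33, 35, 40, 60, 90, 120]

# Assumption for this sketch: participants earning above 50 do not respond,
# so their values are recorded as None (missing).
recorded = [x if x <= 50 else None for x in true_incomes]

# Keep only the values that were actually observed.
observed = [x for x in recorded if x is not None]
n_missing = len(recorded) - len(observed)

true_mean = statistics.fmean(true_incomes)
observed_mean = statistics.fmean(observed)

print("intended sample size:", len(recorded))
print("usable sample size:  ", len(observed))
print("true mean:", true_mean, "observed mean:", observed_mean)
```

The usable sample shrinks from ten to seven, and because the missing values are all high incomes, the mean of the observed values underestimates the true mean.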
NULL denotes an unknown or absent value. In a database context, NULL is the total absence of a value in a particular field and means that the field value is unknown. It is not the same as zero in a numeric field or an empty string in a text field. When the values in a column that contains NULLs are counted, for example with COUNT(column), the NULLs are not included in the result. In some data analysis tools, such as pandas, a missing numeric value is commonly represented by NaN (Not a Number).
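The counting behavior can be checked with a small in-memory SQLite database from Python's standard library. The table name `readings` and its contents are made up for this sketch. The last line shows a related quirk of NaN in floating-point arithmetic: it does not compare equal to itself.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, score INTEGER)")
# Python's None is stored as SQL NULL; note that 0 is an ordinary value.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 0), (3, None), (4, None)],
)

# COUNT(*) counts rows; COUNT(score) skips rows where score is NULL.
total_rows, non_null_scores = conn.execute(
    "SELECT COUNT(*), COUNT(score) FROM readings"
).fetchone()

# A zero score and a NULL score are matched by different predicates.
zero_rows = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE score = 0"
).fetchone()[0]
null_rows = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE score IS NULL"
).fetchone()[0]
conn.close()

# NaN is a float value that is not equal to anything, including itself.
nan_equals_itself = math.nan == math.nan

print(total_rows, non_null_scores, zero_rows, null_rows, nan_equals_itself)
```

Here the table has four rows but only two non-NULL scores, and the row with a score of 0 is counted as a real value rather than as a NULL.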