Statistics – Types:
The two main types of statistics are descriptive statistics and inferential statistics. The steps in studying a survey or an experiment are to collect, organize, analyze, interpret and present the data. These steps fall into two groups: collecting, organizing and presenting the data belong to descriptive statistics, while analyzing and interpreting the data (drawing conclusions) belong to inferential statistics.
As the name suggests, descriptive statistics describes the numbers obtained in the experiment or survey and prepares the data for analysis by finding measures of central tendency, spread, shape, regression, etc., depending upon the type of data. Descriptive statistics are limited in that they only allow you to make summaries about the people or objects that you have actually measured. For example, if a new drug that targets a particular virus worked on one set of patients, we cannot claim, based on descriptive statistics alone, that it would work on another set of patients. This is where inferential statistics comes in.
According to Wikipedia, univariate analysis describes the distribution of a single variable, including its:
- central tendency, such as the mean, median, and mode
- dispersion, such as the range and quartiles of the data set
- measures of spread, such as the variance and standard deviation
- shape of the distribution, such as the skewness and kurtosis
A variable's distribution can also be depicted in graphical or tabular format, including histograms and stem-and-leaf displays.
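As a rough illustration of the univariate measures listed above, the following Python sketch computes them with the standard library. The `describe` helper and the `scores` data are made up for this example, and the skewness and kurtosis here are the simple moment-based (population) versions; other definitions exist and give slightly different values.

```python
import statistics


def describe(values):
    """Return common univariate descriptive statistics (hypothetical helper)."""
    n = len(values)
    mean = statistics.mean(values)
    # Population standard deviation, used only for the shape measures below.
    pop_sd = statistics.pstdev(values)
    # Moment-based skewness and excess kurtosis (one of several definitions).
    skewness = sum((x - mean) ** 3 for x in values) / (n * pop_sd ** 3)
    excess_kurtosis = sum((x - mean) ** 4 for x in values) / (n * pop_sd ** 4) - 3
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "quartiles": statistics.quantiles(values, n=4),  # three cut points
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration
summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A positive skewness, as in this data set, indicates a longer tail on the right side of the distribution.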
When a sample consists of more than one variable, descriptive statistics can be used to describe the relationship between pairs of variables. In this case, descriptive statistics include:
- Cross-tabulations and contingency tables
- Graphical representation via scatter plots
- Quantitative measures of dependence
- Descriptions of conditional distributions
Bivariate analysis differs from univariate analysis in that it describes the relationship between two different variables. Quantitative measures of dependence include correlation and covariance. In a regression, the unstandardized slope indicates the change in the response variable for a one-unit change in the predictor variable, and the standardized slope indicates this change in standardized (z-score) units. Highly skewed data are often transformed by taking logarithms. For positive, right-skewed data, taking logarithms often makes graphs more symmetrical and closer in appearance to the normal distribution, which makes them easier to interpret intuitively.
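The measures in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper functions, the `hours` and `marks` data, and the `incomes` data are all invented for the example, and the slope is the ordinary least-squares slope of a simple (one-predictor) regression.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def skewness(xs):
    """Moment-based (population) skewness."""
    m = mean(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


hours = [1, 2, 3, 4, 5]   # made-up predictor
marks = [2, 4, 5, 4, 5]   # made-up response

# Unstandardized slope: change in marks per one extra hour.
slope = covariance(hours, marks) / covariance(hours, hours)
# Standardized slope: the same change expressed in z-score units.
sd_hours = math.sqrt(covariance(hours, hours))
sd_marks = math.sqrt(covariance(marks, marks))
standardized_slope = slope * sd_hours / sd_marks
r = correlation(hours, marks)

# Log transform of made-up, strongly right-skewed data.
incomes = [1, 10, 100, 1000, 10000]
log_incomes = [math.log10(v) for v in incomes]
skew_raw = skewness(incomes)
skew_log = skewness(log_incomes)

print(slope, standardized_slope, r)
print(skew_raw, skew_log)
```

With a single predictor, the standardized slope coincides with the correlation coefficient, which is why the two printed values agree. The log-transformed values are evenly spaced, so their skewness drops to roughly zero.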
Inferential statistics, on the other hand, involves drawing conclusions that go beyond the data actually measured, building on the descriptive statistics. Its main methods are the estimation of parameters and the testing of statistical hypotheses. These inferences are what make a study useful beyond itself: they allow generalizations about a population from a smaller sample. When drawing conclusions, one must be careful not to draw wrong or biased ones. For example, data dredging (searching a large data set for patterns and reporting only those that appear significant) is becoming a bigger problem as computers store large amounts of information, which makes it easy, intentionally or unintentionally, to misuse inferential methods.
Inferential statistics has two important limitations. First, the population is not fully measured, so we cannot be completely sure that the values we calculate are correct: because inferential statistics uses the values measured in a sample to infer the values that would be measured in the population, there will always be a degree of uncertainty. Second, there is a risk of drawing biased conclusions.
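The uncertainty described above can be made concrete with a small simulation. The sketch below is only illustrative: the population is synthetic (drawn from a normal distribution with an assumed mean of 100 and standard deviation of 15), and `sample_means` is a hypothetical helper. It shows that means computed from different samples of the same population disagree with one another, and that they disagree less as the sample size grows.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Synthetic population; in practice the full population is not available.
population = [rng.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)


def sample_means(sample_size, repetitions=200):
    """Mean of each of `repetitions` random samples drawn without replacement."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]


# How much do the sample means vary from one sample to the next?
spread_small = statistics.stdev(sample_means(25))
spread_large = statistics.stdev(sample_means(400))

print(f"population mean: {population_mean:.2f}")
print(f"spread of sample means, n=25:  {spread_small:.2f}")
print(f"spread of sample means, n=400: {spread_large:.2f}")
```

Larger samples reduce this sampling variability but never remove it, and they do nothing to correct a biased sampling procedure.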
Inferential statistics makes propositions about a population using data drawn from that population with some form of sampling. Given a hypothesis about a population for which we wish to draw inferences, statistical inference consists of first selecting a statistical model of the process that generates the data and then deducing propositions from the model. The conclusion of a statistical inference is known as a statistical proposition. Some common forms of statistical proposition are:
- a point estimate (a particular value that best approximates a parameter of interest)
- an interval estimate, e.g. a confidence interval (or set estimate), i.e. an interval constructed from a dataset drawn from a population so that, under repeated sampling of such datasets, such intervals would contain the true parameter value with probability equal to the stated confidence level
- a credible interval (a set of values containing, for example, 95% of posterior belief)
- rejection of a hypothesis
- clustering or classification of data points into groups
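The first two kinds of proposition in the list above, a point estimate and a confidence interval, can be sketched as follows. The `readings` data are made up, and the interval uses a normal approximation; for a sample this small a t-distribution would usually be preferred, so treat the numbers as an illustration under that assumption rather than a recommended procedure.

```python
import math
import statistics

readings = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]  # made-up sample

# Point estimate of the population mean: the sample mean.
point_estimate = statistics.mean(readings)

# Standard error of the mean: sample standard deviation over sqrt(n).
standard_error = statistics.stdev(readings) / math.sqrt(len(readings))

# Critical value for a 95% interval from the standard normal distribution.
confidence_level = 0.95
z = statistics.NormalDist().inv_cdf(0.5 + confidence_level / 2)

# Interval estimate: point estimate plus or minus the margin of error.
margin = z * standard_error
interval = (point_estimate - margin, point_estimate + margin)

print(f"point estimate: {point_estimate:.3f}")
print(f"95% confidence interval: ({interval[0]:.3f}, {interval[1]:.3f})")
```

The 95% refers to the procedure, not to this one interval: if the sampling were repeated many times, about 95% of the intervals built this way would be expected to contain the true mean.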
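Descriptive statistics and inferential statistics go hand in hand: descriptive statistics summarizes the data that were collected, and inferential statistics builds on those summaries to draw conclusions about the wider population.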
Descriptive and inferential statistics can each be divided further, as shown.