
Evaluating Hypotheses: Basics of Sampling Theory

 

For estimating hypothesis accuracy, statistical methods are applied. In this blog, we’ll have a look at evaluating hypotheses and the basics of sampling theory. 

 

Let’s have a look at the terms involved and what they mean:

 

Random Variable: 

A random variable may be thought of as the name of a probabilistic experiment. Its value is the outcome of the experiment. 

 

Whenever we don’t know the outcome of an experiment for certain, that outcome can be modelled as a random variable.

 

The outcome of a coin flip is a good illustration of a random variable. Now consider an experiment whose outcomes aren’t all equally likely to occur.

 

If the number of heads we get from tossing two coins is the random variable, Y, then Y might be 0, 1, or 2. On a two-coin toss, this means that we could get no heads, one head, or both heads.

 

The two coins, on the other hand, land in four different patterns: TT, HT, TH, and HH. As a result, P(Y=0) = 1/4 because we only have one possibility of obtaining no heads (i.e., two tails [TT] when the coins are tossed). 

 

Similarly, getting two heads (HH) has a 1/4 chance of occurring. There are two outcomes in which exactly one head appears, i.e., HT and TH, so P(Y=1) = 2/4 = 1/2.
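
To make this counting concrete, here is a minimal Python sketch (added for illustration, not part of the original example) that enumerates the four equally likely two-coin outcomes and tallies the probability of each value of Y:

from itertools import product
from collections import Counter

# Enumerate the four equally likely outcomes of tossing two fair coins
outcomes = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')

# Count how many outcomes produce each possible number of heads
head_counts = Counter(outcome.count("H") for outcome in outcomes)

for heads in sorted(head_counts):
    print(f"P(Y={heads}) = {head_counts[heads]}/{len(outcomes)}")
# Prints: P(Y=0) = 1/4, P(Y=1) = 2/4, P(Y=2) = 1/4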

 

Probability Distribution: 

A probability distribution is a statistical function that specifies all possible values and probabilities for a random variable in a given range.

 

This range is bounded by the minimum and maximum possible values, but where a particular value falls within the probability distribution is determined by a number of parameters.

 

The mean (average), standard deviation, skewness, and kurtosis of the distribution are among these parameters.

 

More formally, the probability distribution of a random variable Y specifies the probability Pr(Y = yi) that Y will take on the value yi, for each possible value yi.
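
For the two-coin example above, this distribution can be written out in full (standard notation, shown here for illustration):

\Pr(Y = 0) = \tfrac{1}{4}, \qquad \Pr(Y = 1) = \tfrac{1}{2}, \qquad \Pr(Y = 2) = \tfrac{1}{4}, \qquad \textstyle\sum_i \Pr(Y = y_i) = 1.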

 

Expected Value: 

The expected value (EV) is the anticipated average value of a random variable; in finance, for instance, it is the value an investment is predicted to have at some point in the future.

 

In statistics and probability analysis, the expected value is computed by multiplying each possible outcome by the probability that it will occur and then summing all of those products.
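
In symbols (the standard definition for a discrete random variable, added here for clarity), and applied to the two-coin example:

E[Y] = \sum_i y_i \,\Pr(Y = y_i), \qquad E[Y] = 0 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} = 1.

That is, we expect one head on average per two-coin toss.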

 

By assessing expected values, investors can choose the scenario that is most likely to provide the desired result.

 

The Variance of a Random Variable: 

In statistics, variance measures how far a collection of values is spread out from its mean. For a random variable, it is calculated as the probability-weighted average of the squared deviations from the expected value.

 

As a result, the greater the variance, the greater the difference between the set’s numbers and the mean. A smaller variance, on the other hand, indicates that the numbers in the collection are closer to the mean.

 

The variance of the random variable Y is defined as Var(Y) = E[(Y − E[Y])^2].
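
As a worked instance (added for illustration), for the two-coin variable Y with E[Y] = 1:

\operatorname{Var}(Y) = (0-1)^2 \cdot \tfrac{1}{4} + (1-1)^2 \cdot \tfrac{1}{2} + (2-1)^2 \cdot \tfrac{1}{4} = \tfrac{1}{2}.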

 

Standard Deviation: 

The standard deviation is a statistic, calculated as the square root of the variance, that measures the dispersion of a dataset relative to its mean. 

 

It is determined by computing each data point’s deviation from the mean, averaging the squared deviations (the variance), and taking the square root of the result.

 

When data points are further from the mean, there is more variation within the data set; as a result, the larger the standard deviation, the more spread out the data is.

 

The standard deviation of Y is the square root of Var(Y), and is usually represented using the symbol σ_Y.
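
The short Python sketch below (added for illustration; it reuses the two-coin distribution from earlier) computes the mean, variance, and standard deviation directly from the probability distribution:

import math

# Probability distribution of Y = number of heads in two fair coin tosses
distribution = {0: 0.25, 1: 0.5, 2: 0.25}

# Expected value: probability-weighted average of the possible values
mean = sum(y * p for y, p in distribution.items())

# Variance: probability-weighted average of squared deviations from the mean
variance = sum(p * (y - mean) ** 2 for y, p in distribution.items())

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 1.0 0.5 0.7071067811865476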

 

The Binomial Distribution: 

Under a given set of parameters or assumptions, the binomial distribution expresses the probability of obtaining a certain number of successes in a fixed number of trials, where each trial has only two possible outcomes.

 

The binomial distribution is based on the premise that each trial yields just one outcome, that every trial has the same probability of success, and that the trials are independent of one another.

 

It gives the probability of observing r heads in a series of n independent coin tosses when the probability of heads on a single toss is p. 
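
In symbols (the standard binomial probability mass function, included here for reference), the probability of exactly r heads in n tosses is:

\Pr(Y = r) = \binom{n}{r}\, p^{r} (1 - p)^{\,n - r}

With n = 2 and p = 1/2, this reproduces the two-coin probabilities above: 1/4, 1/2, and 1/4.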

 

Normal Distribution: 

The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about its mean, indicating that data near the mean occur more frequently than data far from the mean. On a graph, the normal distribution appears as a bell curve.

 

This bell-shaped probability distribution approximately describes many natural phenomena. 
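
For reference (the standard formula, not derived in this post), the probability density of a normal distribution with mean μ and standard deviation σ is:

p(y) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(y - \mu)^2}{2\sigma^2}}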

 

Central Limit Theorem: 

The Central Limit Theorem is a statistical result stating that, given a large enough sample size drawn from a population with finite variance, the mean of the sampled values will be approximately equal to the mean of the entire population. 

 

Furthermore, as the sample size grows, the distribution of these sample means approaches a normal distribution, with a variance roughly equal to the variance of the population divided by the sample size.
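
A quick way to see this behaviour is to simulate it. The Python sketch below (added for illustration; the uniform population and the sample sizes are arbitrary choices) draws many samples, records each sample mean, and compares the spread of those means with the population variance divided by the sample size:

import random
import statistics

random.seed(0)

# A decidedly non-normal population of 100,000 values
population = [random.uniform(0, 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)

n = 40  # sample size
# Draw 2,000 samples of size n and record each sample mean
sample_means = [statistics.fmean(random.sample(population, n)) for _ in range(2_000)]

print("population mean:", round(pop_mean, 3))
print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("population variance / n:", round(pop_var / n, 3))
print("variance of sample means:", round(statistics.pvariance(sample_means), 3))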

 

Estimator: 

It is a random variable Y used to estimate some parameter p of an underlying population. 

 

The estimand is the quantity that is being estimated (i.e. the one you wish to know). For example, suppose you needed to discover the average height of pupils at a 1000-student school. 

 

You measure a group of 30 children and find that their average height is 56 inches. This sample mean is your estimator. Using it, you estimate the population mean (your estimand) to be around 56 inches.
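
As a small sketch of this height example (the data below are simulated for illustration, not real measurements), the sample mean of 30 randomly chosen students serves as the estimator of the school-wide average:

import random
import statistics

random.seed(1)

# Hypothetical population: heights (in inches) of 1,000 students
heights = [random.gauss(56, 4) for _ in range(1000)]

sample = random.sample(heights, 30)   # measure 30 students
estimate = statistics.fmean(sample)   # the sample mean is the estimator

print("sample-mean estimate:", round(estimate, 1))
print("true population mean:", round(statistics.fmean(heights), 1))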

 

The Estimation Bias: 

The estimation bias of Y as an estimator for p is the quantity (E[Y] – p). An unbiased estimator is one for which the bias is zero. 
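
For example (a standard fact, added here for illustration), the sample mean of n independent draws from a population with mean \mu is an unbiased estimator of \mu:

E[\bar{Y}] = \frac{1}{n} \sum_{i=1}^{n} E[Y_i] = \mu, \qquad \text{bias} = E[\bar{Y}] - \mu = 0.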

 

N% Confidence Interval: 

An N% confidence interval estimate for parameter p is an interval that includes p with probability N%. 
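
As one common concrete form (a normal-approximation interval, offered here as an illustration rather than taken from the post), if the estimator Y is approximately normally distributed with standard deviation σ_Y, then an approximate N% confidence interval for p is:

Y \pm z_N \, \sigma_Y

where z_N is the constant for which a standard normal variable lies within ±z_N with probability N% (for example, z_{95} ≈ 1.96).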

 
