
Bias in Sampling

A sampling method is called biased if the survey sample does not accurately represent the population. Sampling bias is sometimes called ascertainment bias or systematic bias. The term applies both to the sample and to the method of sampling. Bias can be either intentional or unintentional, and sometimes even a poor measurement process can lead to bias. A bias may also occur when a nonlinear functional of the probabilities is measured from a limited number of experimental samples, even when these samples are picked randomly from the underlying population and there is thus no sampling bias. This bias is called “limited sampling bias”.
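Limited sampling bias can be illustrated with a small simulation. The sketch below is an illustration only: it assumes a fair eight-sided die as the population and uses entropy (in bits) as the nonlinear functional, neither of which comes from the text above. The helper names `plugin_entropy` and `mean_plugin_entropy` are made up for this example. Every draw is random, yet the entropy computed from small samples comes out too low on average.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Entropy (bits) of the empirical distribution of the samples."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_plugin_entropy(n_samples, n_outcomes=8, n_trials=2000, seed=0):
    """Average the plug-in entropy over many independent random samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        draws = [rng.randrange(n_outcomes) for _ in range(n_samples)]
        total += plugin_entropy(draws)
    return total / n_trials

# Assumed population: a fair 8-sided die, so the true entropy is 3 bits.
true_entropy = math.log2(8)
small = mean_plugin_entropy(10)                  # few samples per estimate
large = mean_plugin_entropy(1000, n_trials=200)  # many samples per estimate

print(f"true entropy:             {true_entropy:.3f} bits")
print(f"mean estimate, n = 10:    {small:.3f} bits")
print(f"mean estimate, n = 1000:  {large:.3f} bits")
```

With only 10 draws per estimate the average falls well short of 3 bits, while with 1000 draws it is close to the true value. The shortfall comes from the small number of samples, not from how they were selected.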

Example:

Telephone sampling is common in marketing surveys. In such a survey, a random sample is chosen from a sampling frame consisting of a list of telephone numbers of people in a particular area. This method does involve taking a simple random sample, but it will miss:

People who do not have a phone

People who only have a cell phone with an area code outside the region being surveyed

People who do not wish to be surveyed, including those who screen calls with an answering machine

People who do not answer calls from telephone surveyors

The method thus systematically excludes certain types of consumers in the area.
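The effect of an incomplete sampling frame can be sketched with a simulation. All numbers here are hypothetical and chosen only for illustration: 80% of people are assumed to be reachable by phone, and reachable people are assumed to buy a product more often (70%) than unreachable people (20%). A simple random sample drawn from the phone list is compared with one drawn from the whole population.

```python
import random

rng = random.Random(42)

# Hypothetical population of (has_phone, buys_product) pairs.
population = []
for _ in range(100_000):
    has_phone = rng.random() < 0.8                        # assumed coverage
    buys = rng.random() < (0.7 if has_phone else 0.2)     # assumed behaviour
    population.append((has_phone, buys))

def proportion(people):
    """Share of people in the list who buy the product."""
    return sum(buys for _, buys in people) / len(people)

# The sampling frame only lists people with a phone.
frame = [person for person in population if person[0]]

true_rate = proportion(population)
frame_estimate = proportion(rng.sample(frame, 5000))      # random, but from the frame
full_estimate = proportion(rng.sample(population, 5000))  # random, whole population

print(f"true rate:               {true_rate:.3f}")
print(f"sample from phone frame: {frame_estimate:.3f}")
print(f"sample from population:  {full_estimate:.3f}")
```

Both samples are simple random samples, but only the one drawn from the whole population lands near the true rate. The frame-based estimate is pulled toward the behaviour of people who have a phone.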

Here are a few sources of sampling bias:

Convenience samples:  

David A. Freedman, a statistics professor, stated: “Statistical inference with convenience samples is a risky business.” In cases where it may not be possible or practical to choose a random sample, a convenience sample might be used. A convenience sample is sometimes treated as if it were a random sample, but it is often biased. The problem of undercoverage arises with convenience samples. Undercoverage occurs when some members of the population are not adequately represented in the sample. In the telephone example above, people who do not have a phone are undercovered.

Voluntary response samples:

If the researcher appeals to people to voluntarily participate in a survey, then the resulting sample is called a voluntary response sample. These samples are always biased because they only include people who choose to volunteer, whereas a random sample would need to include people whether or not they choose to volunteer. Voluntary response samples often oversample people who have strong opinions and undersample people who do not care. In the telephone example above, the exclusion of people who do not wish to be surveyed is a bias of this kind.
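A rough sketch of how voluntary response distorts a sample follows. The response probabilities are assumptions made up for this example: people with a strong opinion (score 1 or 5 on a five-point scale) are assumed to volunteer half the time, everyone else 5% of the time.

```python
import random

rng = random.Random(1)

# Hypothetical population: opinion scores from 1 (strongly against)
# to 5 (strongly in favour), all equally common.
population = [rng.randint(1, 5) for _ in range(50_000)]

def volunteers_to_respond(score):
    """Assumed behaviour: strong opinions make people far more likely to reply."""
    p_respond = 0.5 if score in (1, 5) else 0.05
    return rng.random() < p_respond

volunteers = [score for score in population if volunteers_to_respond(score)]

def share_strong(scores):
    """Fraction of scores that are strong opinions (1 or 5)."""
    return sum(score in (1, 5) for score in scores) / len(scores)

population_share = share_strong(population)
volunteer_share = share_strong(volunteers)

print(f"strong opinions in population: {population_share:.2f}")
print(f"strong opinions in volunteers: {volunteer_share:.2f}")
```

Under these assumptions, strong opinions make up a much larger share of the volunteers than of the population, so any summary computed from the volunteers overstates how polarized people are.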

Extrapolation: 

Drawing a conclusion about something beyond the range of the data is called extrapolation. Because a biased sample systematically excludes certain parts of the population under consideration, inferences drawn from it apply only to the subpopulation that was actually sampled; applying them to the excluded parts is extrapolation. Extrapolation also occurs if, for example, an inference based on a sample of senior citizens is applied to older adults or to adults without citizenship.

Self-Selection Bias:

A self-selection bias results when the non-random component occurs after the potential subject has enlisted in the experiment. Consider a hypothetical experiment in which subjects are asked about the details of their sex lives, and assume that the subjects do not know what the experiment is about until they show up. Many of the subjects would likely leave the experiment, resulting in a biased sample. Many television and website polls are prone to self-selection bias.

Correction and reduction of sampling bias:

If the statistic is unbiased, the average of the statistics from all possible samples will equal the true population parameter, even though any individual statistic may differ from the population parameter. The variability among statistics from different samples is sampling error. Increasing the sample size tends to reduce the sampling error but does not affect survey bias (undercoverage, nonresponse bias, etc.).
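The difference between sampling error and bias can be seen by repeating a biased survey many times at two sample sizes. This is a sketch under assumed numbers, not data from a real survey: the covered group is assumed to have a rate of 0.7, while the whole population (80% covered at 0.7, 20% uncovered at 0.2) has a rate of 0.6.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical rates: the survey only reaches the covered group.
TRUE_RATE = 0.8 * 0.7 + 0.2 * 0.2   # whole population, about 0.60
FRAME_RATE = 0.7                    # covered group only

def repeated_estimates(sample_size, n_repeats=300):
    """Run the biased survey n_repeats times and collect the estimates."""
    results = []
    for _ in range(n_repeats):
        hits = sum(rng.random() < FRAME_RATE for _ in range(sample_size))
        results.append(hits / sample_size)
    return results

summary = {}
for n in (100, 2500):
    estimates = repeated_estimates(n)
    bias = statistics.mean(estimates) - TRUE_RATE   # systematic offset
    spread = statistics.stdev(estimates)            # sampling error
    summary[n] = (bias, spread)
    print(f"n = {n:5d}: bias = {bias:+.3f}, spread = {spread:.3f}")
```

The spread of the estimates shrinks as the sample grows, but the offset from the true rate stays near 0.1 at both sizes. A larger sample from the same flawed frame gives a more precise estimate of the wrong quantity.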

To reduce sampling bias, two steps when designing an experiment are (i) to avoid convenience sampling and (ii) to ensure that the target population is properly defined and that the sampling frame matches it as closely as possible. Random sampling helps produce representative samples by eliminating voluntary response bias and guarding against undercoverage bias. All probability sampling methods rely on random sampling.

Solutions to the bias due to non-response can be divided into ex-ante and ex-post solutions. Ex-ante solutions help to prevent and minimize non-response in various ways (for instance, specific training of enumerators or several attempts to interview the respondent). Ex-post solutions try to gather auxiliary information about non-respondents, which is then used to calculate a probability of response for different population sub-groups and to re-weight the response data by the inverse of that probability.
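The ex-post re-weighting step can be sketched as follows. The two sub-groups, their counts, and their means are invented for illustration, and the `Group` class and function names are hypothetical. The response probability of each sub-group is estimated as the share of sampled people who responded, and each respondent is weighted by the inverse of that probability.

```python
from dataclasses import dataclass

@dataclass
class Group:
    name: str
    sampled: int            # people selected for the survey
    responded: int          # people who actually answered
    respondent_mean: float  # mean outcome among those who answered

# Hypothetical survey results for two sub-groups of equal sampled size.
groups = [
    Group("under_40", sampled=500, responded=100, respondent_mean=0.30),
    Group("40_and_over", sampled=500, responded=400, respondent_mean=0.70),
]

def naive_mean(groups):
    """Mean over respondents only, ignoring who failed to respond."""
    total = sum(g.responded * g.respondent_mean for g in groups)
    return total / sum(g.responded for g in groups)

def reweighted_mean(groups):
    """Weight each respondent by 1 / (estimated response probability)."""
    weighted_total = 0.0
    weight_sum = 0.0
    for g in groups:
        p_response = g.responded / g.sampled   # estimated from auxiliary counts
        weight = 1.0 / p_response
        weighted_total += weight * g.responded * g.respondent_mean
        weight_sum += weight * g.responded
    return weighted_total / weight_sum

print(f"naive mean:      {naive_mean(groups):.2f}")       # about 0.62
print(f"reweighted mean: {reweighted_mean(groups):.2f}")  # about 0.50
```

The naive mean is dominated by the sub-group that responds more readily. Re-weighting restores each sub-group to its share of the original sample, but it only removes the bias if respondents within a sub-group resemble the non-respondents in that same sub-group.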