Statistics – Sample:

A sample is any part of the fully defined population. A syringe full of blood drawn from the vein of a patient is a sample of all the blood in the patient’s circulation at that moment. Similarly, 100 patients with schizophrenia in a clinical study are a sample of the population of schizophrenics, provided the sample is properly chosen and the inclusion and exclusion criteria are well defined.

To make accurate inferences, the sample has to be representative. A representative sample is one in which each and every member of the population has an equal and independent chance of being selected.

Target population, study population and study sample

A population is a complete set of people with a specified set of characteristics, and a sample is a subset of the population. The usual criteria we use in defining a population are geographic, for example, “the population of Andhra Pradesh”. In medical research, the criteria for a population may be clinical, demographic and time related.

  • Clinical and demographic characteristics define the target population, the large set of people in the world to which the results of the study will be generalized (e.g. all schizophrenics).
  • The study population is the subset of the target population available for study (e.g. schizophrenics in the researcher’s town).
  • The study sample is the sample chosen from the study population.

METHODS OF SAMPLING:

Purposive (non-random samples)

  1. Volunteers who agree to participate
  2. Convenience samples, such as captive medical students or other readily available groups
  3. Quota sampling, selection of a fixed number from each group
  4. Referred cases who may be under pressure to participate

Non-random samples have certain limitations:

  1. The larger group is difficult to identify. The results would be valid for the sample only.
  2. At best, they can provide clues for further studies based on random samples.
  3. Statistical inferences such as confidence intervals and tests of significance cannot be validly derived from non-random samples.

Random sampling methods:

Simple random sampling:

A sample may be defined as random if every individual in the population being sampled has an equal likelihood of being included. Random sampling is the basis of all good sampling techniques and disallows any subjective method of selection, such as relying on volunteers or on groups chosen for convenience. Listing the whole population as a separate exercise for the study can be time-consuming and monotonous. Two-stage sampling makes the task easier.

The usual method of selecting a simple random sample from a listing of individuals is to assign a number to each individual and then select certain numbers by reference to random number tables, which are published in standard statistical textbooks. Random numbers can also be generated by statistical software such as EPI INFO, developed by WHO and CDC Atlanta.
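As a minimal sketch of this procedure, the Python snippet below numbers each individual in a listing and lets a pseudo-random number generator stand in for the printed random number table. The `patients` listing, the sample size of 20 and the seed are hypothetical values chosen only for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals, each with an equal chance of selection."""
    rng = random.Random(seed)
    # Assign a number to each individual (its position in the listing).
    numbered = list(enumerate(population, start=1))
    # Pick n of the numbered individuals without replacement.
    return rng.sample(numbered, n)


# Hypothetical listing of 500 individuals.
patients = [f"patient_{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(patients, 20, seed=42)
print(sample[:3])
```

Fixing the seed makes the draw reproducible, which can be useful when the selection has to be documented or audited.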

Systematic sampling:

A simple method of random sampling is to select a systematic sample in which every nth person is selected from a list. A systematic sample can be drawn from a queue of people or from patients ordered according to the time of their attendance at a clinic. To fulfill the statistical criteria for a random sample, a systematic sample should be drawn from subjects who are randomly ordered. The starting point for selection should be randomly chosen.
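The selection of every nth person from a random starting point might be sketched as follows. The attendance list of 200 numbered patients and the sample size of 10 are assumptions made for illustration; the sampling interval is simply the list length divided by the required sample size.

```python
import random


def systematic_sample(ordered_list, n, seed=None):
    """Select every k-th subject after a randomly chosen starting point."""
    rng = random.Random(seed)
    k = len(ordered_list) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size exceeds the length of the list")
    # Random start within the first interval, then fixed steps of k.
    start = rng.randrange(k)
    return [ordered_list[start + i * k] for i in range(n)]


# Hypothetical clinic attendance list: patients numbered in order of arrival.
attendance = list(range(1, 201))
picked = systematic_sample(attendance, 10, seed=1)
print(picked)
```

Note that the code only randomizes the starting point. Whether the list itself is in an effectively random order is a property of the data and has to be judged separately.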

Multistage sampling:

Sometimes, a strictly random sample may be difficult to obtain and it may be more feasible to draw the required number of subjects in a series of stages. For example, suppose we wish to estimate the number of CAT scan examinations made of all patients entering a hospital in a given month in the state of Maharashtra. It would be quite tedious to devise a scheme which would allow the total population of patients to be directly sampled. However, it would be easier:

  • To list the districts of the state of Maharashtra and randomly draw a sample of these districts.
  • Within this sample of districts, all the hospitals would then be listed by name, and a random sample of these can be drawn.
  • Within each of these hospitals, a sample of the patients entering in the given month could be chosen randomly for observation and recording.

Thus, by stages, we draw the required sample. If indicated, we can introduce some element of stratification at some stage (urban/rural, gender, age).

It should be cautioned that multistage sampling should only be resorted to when difficulties in simple random sampling are insurmountable.
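The three stages of the Maharashtra example (districts, then hospitals, then patients) can be sketched in Python as below. The nested `frame` dictionary, its names and the numbers drawn at each stage are all hypothetical; a real study would build the hospital and patient lists only for the units selected at the previous stage.

```python
import random


def multistage_sample(frame, n_districts, n_hospitals, n_patients, seed=None):
    """Sample districts, then hospitals within them, then patients within those."""
    rng = random.Random(seed)
    chosen = []
    # Stage 1: random sample of districts.
    for district in rng.sample(sorted(frame), n_districts):
        hospitals = frame[district]
        # Stage 2: random sample of hospitals within each chosen district.
        n_h = min(n_hospitals, len(hospitals))
        for hospital in rng.sample(sorted(hospitals), n_h):
            patients = hospitals[hospital]
            # Stage 3: random sample of patients admitted to each chosen hospital.
            n_p = min(n_patients, len(patients))
            for patient in rng.sample(patients, n_p):
                chosen.append((district, hospital, patient))
    return chosen


# Hypothetical sampling frame: 6 districts, 4 hospitals each, 50 patients each.
frame = {
    f"district_{d}": {
        f"hospital_{d}_{h}": [f"patient_{d}_{h}_{p}" for p in range(50)]
        for h in range(4)
    }
    for d in range(6)
}
staged = multistage_sample(frame, n_districts=3, n_hospitals=2, n_patients=10, seed=7)
print(len(staged))
```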

Stratified sampling:

If a condition is unevenly distributed in a population with respect to age, gender, or some other variable, it may be prudent to choose a stratified random sampling method. For example, to obtain a stratified random sample according to age, the study population can be divided into age groups such as 0–5, 6–10, 11–14, 15–20, 21–25, and so on, depending on the requirement.

A different proportion of each group can then be selected as a subsample, either by simple random sampling or by systematic sampling. If the condition becomes less frequent with advancing age, then a larger proportion may be selected from the older age groups so that they contain an adequate number of cases.
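A minimal sketch of stratified sampling with a different sampling fraction per age group is shown below. The population of `(id, age)` pairs and the fractions are invented for illustration; the age bands follow the example in the text, and simple random sampling is used within each stratum.

```python
import random
from collections import defaultdict

AGE_BANDS = [(0, 5), (6, 10), (11, 14), (15, 20), (21, 25)]


def age_band(person):
    """Return the label of the age band that a (person_id, age) pair falls in."""
    _, age = person
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError("age outside the listed bands")


def stratified_sample(people, stratum_of, fractions, seed=None):
    """Draw a simple random subsample from each stratum at its own fraction."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)
    return {
        name: rng.sample(members, round(fractions[name] * len(members)))
        for name, members in strata.items()
    }


# Hypothetical population: 40 people at each age from 0 to 25.
people = [(i, i % 26) for i in range(1040)]
# Hypothetical fractions: older groups are sampled more heavily.
fractions = {"0-5": 0.05, "6-10": 0.05, "11-14": 0.10, "15-20": 0.10, "21-25": 0.20}
subsamples = stratified_sample(people, age_band, fractions, seed=3)
print({name: len(members) for name, members in subsamples.items()})
```

Because the strata are sampled at unequal fractions, any estimate for the whole population would need to weight each stratum back to its true share.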

Cluster sampling:

Obtaining the required number of subjects for a study by simple random sampling may be costly and cumbersome. In such cases, clusters may be identified (e.g. households) and a random sample of clusters included in the study; every member of each selected cluster then becomes part of the study. This introduces two types of variation in the data, between clusters and within clusters, and this has to be taken into account when analyzing the data.
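As a rough illustration, the snippet below draws a random sample of households and then includes every member of each selected household. The `households` dictionary and the number of clusters drawn are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and include all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every member of a selected cluster enters the study.
    return [(name, member) for name in chosen for member in clusters[name]]


# Hypothetical frame: 30 households with 1 to 4 members each.
households = {
    f"household_{h:02d}": [f"person_{h:02d}_{m}" for m in range(1 + h % 4)]
    for h in range(30)
}
subjects = cluster_sample(households, n_clusters=5, seed=11)
print(len(subjects))
```

The cluster label is kept alongside each subject so that the analysis can separate between-cluster from within-cluster variation.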

Cluster sampling may produce misleading results when the disease under study is itself distributed in a clustered fashion in an area. For example, suppose we are studying malaria in a population. Malaria incidence may be clustered in villages having stagnant water collections, which serve as a source of mosquito breeding. In villages without such water stagnation, there will be fewer malaria cases. If only a few villages are chosen as clusters, the selected villages may, by chance, be quite unrepresentative of the whole population and give erroneous results.

Lot quality assurance sampling:

Lot quality assurance sampling (LQAS), which originated in the manufacturing industry for quality control purposes, was used in the 1990s to assess immunization coverage, estimate disease prevalence, and evaluate control measures and service coverage in different health programs. Using only a small sample size, LQAS can effectively differentiate between areas that have and have not met the performance targets. Thus, this method is used not only to estimate the coverage of quality care but also to identify the exact subdivisions where it is deficient so that appropriate remedial measures can be implemented.
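The text does not spell out the decision rule, so the sketch below assumes a common formulation: sample n individuals from each lot (e.g. a supervision area) and classify the lot as meeting the target if at least d of them are covered. The values n = 19 and d = 13, and the 80% and 50% coverage levels, are illustrative assumptions only. The binomial calculation shows how likely such a rule is to accept a lot at a given true coverage.

```python
from math import comb


def prob_accept(n, d, p):
    """P(at least d of n sampled individuals are covered) at true coverage p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(d, n + 1))


def meets_target(observed_covered, d):
    """Apply the assumed decision rule to one lot's observed count."""
    return observed_covered >= d


# Illustrative values, not taken from the text above.
n, d = 19, 13
high = prob_accept(n, d, 0.80)  # chance of accepting a lot with 80% coverage
low = prob_accept(n, d, 0.50)   # chance of accepting a lot with 50% coverage
print(f"accept at 80% coverage: {high:.3f}")
print(f"accept at 50% coverage: {low:.3f}")
```

Under these assumed values, a well-performing lot is accepted most of the time and a poorly performing lot is rarely accepted, which is the sense in which a small sample can still separate the two.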