Sampling

Overview

When you conduct quantitative research, it is very important that your sample is as representative as possible of the population that you are studying. Since sampling has many facets, let’s look at an example.

Setting the scene

A sports scientist, Adam, wants to study the effect of exercise on cholesterol levels amongst 45 to 65 year olds. More specifically, he wants to know if there is a relationship between the amount of exercise an individual performs each week and their cholesterol level. Not surprisingly, Adam thinks that 45 to 65 year olds who exercise regularly will have lower levels of cholesterol than those who exercise infrequently or not at all.

Since there are millions of 45 to 65 year olds in the country, it would be impractical, if not impossible, to study all of them; a study of every member of a group is called a census. Nonetheless, this entire group of 45 to 65 year olds does represent our population; that is, the complete group that we are interested in. Whilst the population in this study is a group of people, it could be anything: pencils, houses, jobs, iPods, and so on.

Where it is impractical to study the population we are interested in, perhaps because it would be very expensive or time consuming, we can study a smaller group of this population instead, which we call a sample. In our example, the sample would simply be a much smaller group of 45 to 65 year olds. The statistical guide, Power Analysis, helps to explain how much smaller this group can be.

The question arises: Is our sample representative of the population? Clearly, if we are interested in 45 to 65 year olds in general, we would not simply choose to study men and ignore women. This would not be representative at all. However, the argument is more nuanced than this. After all, small differences between the sample and population, perhaps reflecting where the study was conducted, the occupation of those involved, their economic status, or whether they smoke, may also have an impact. As a result, it becomes imperative to have as representative a sample as possible.

There is no such thing as a completely representative sample since this would be a population census and not a sample. Some degree of error between the sample and population is expected and statistics have been developed to account for this, which are explained later in this guide. The solution is to use judgement (ideally based on academic or practitioner-based theory) and more rigorous sampling techniques to minimise this error. In terms of Adam’s study of cholesterol levels and exercise, the academic and medical literature may suggest that economic status and diet have an effect on the likelihood that an individual will have higher levels of cholesterol whilst occupation has a very marginal impact. Therefore, we may choose to include people of different economic status and diet in our sample but not those of different occupation.

Before we discuss the statistics of sampling, we look at the two main approaches to sampling, probability sampling and non-probability sampling, and their associated methods. Those sampling techniques based on probability involve some form of random selection whilst non-probability sampling methods do not. Whilst both types of sampling approach are commonly used in research, probability sampling has two main advantages: (1) it helps to minimise (but not eradicate) sampling error; that is, the extent to which our sample does not reflect the population; and (2) it enables us to perform statistical analysis that, at specified levels of statistical significance, allows us to make inferences from our sample to the population.
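To make the idea of sampling error concrete, here is a minimal sketch in Python. It is an illustration only: the population is simulated, and the figures used (100,000 people, a mean cholesterol level of 5.5 mmol/L with a spread of 1.0) are invented assumptions rather than real data. The names `population` and `sampling_errors` are likewise hypothetical.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: 100,000 invented cholesterol readings (mmol/L).
population = [rng.gauss(5.5, 1.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Draw five separate random samples of 200 people and record how far
# each sample mean falls from the population mean (the sampling error).
sampling_errors = []
for _ in range(5):
    sample = rng.sample(population, 200)
    sampling_errors.append(statistics.fmean(sample) - population_mean)

for error in sampling_errors:
    print(f"sample mean differs from population mean by {error:+.3f}")
```

Each random sample gives a slightly different mean, and none matches the population mean exactly. This is the error that random selection helps to keep small but cannot remove.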

Probability Sampling

Whilst there are a large number of probability sampling techniques that can be used, the four main methods are (1) simple random sampling, (2) systematic random sampling, (3) stratified random sampling, and (4) cluster random sampling. In some cases, a number of these techniques may be required in what is known as multi-stage sampling. In order to discuss these different methods, let’s use the example of a school of 1000 students, 200 of whom a researcher needs to survey.
  • The aim of the simple random sample is to ensure that the chance of each student being surveyed is the same. It does this by assigning each student a number and then selecting numbers at random, whether using a table of random numbers, a computer program that generates random numbers, or some other technique. The easiest way is to use a computer program, which can first assign a number to each of the 1000 students’ names and then randomly select 200 of these numbers; the students with those numbers become the desired sample.
  • Where assigning a number to every item that is being studied (in this case, students) would be very time consuming and perhaps impractical, the systematic random sample can be a useful sampling method. We still need our list of all students, although this time we do not need to number them. Systematic random sampling works by first dividing the population size by the sample size; hence, 1000 students divided by 200 students (1000/200 = 5). The figure that is produced (in this case, 5) tells us that every nth item (here, every 5th student) should be selected from our list. However, we first need to select the starting student randomly, which we can do using a table of random numbers. Since we have to select every 5th student, this means that we should select a random number between 1 and 5. For example, if we selected the number 4, then the 4th student on the list would be the first student that we selected. The second would be the 9th student, the third the 14th student, and so forth (i.e. 19th, 24th, 29th, and so on).
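The two methods described above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the function names, the `start` and `seed` parameters, and the placeholder student names are all assumptions made for the example.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Select sample_size items so that every item has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

def systematic_random_sample(population, sample_size, start=None, seed=None):
    """Select every nth item from the list after a random starting point."""
    if sample_size < 1 or sample_size > len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    interval = len(population) // sample_size  # e.g. 1000 // 200 = 5
    if start is None:
        # Random starting position between 1 and the interval, inclusive.
        start = random.Random(seed).randint(1, interval)
    # Positions are counted from 1, as on a printed list of students.
    positions = range(start, len(population) + 1, interval)
    return [population[p - 1] for p in positions][:sample_size]

# Hypothetical list of the 1000 students in the school.
students = [f"Student {i}" for i in range(1, 1001)]

srs = simple_random_sample(students, 200, seed=1)
systematic = systematic_random_sample(students, 200, seed=1)

# Reproducing the worked example: a random start of 4 gives 4, 9, 14, ...
example = systematic_random_sample(students, 200, start=4)
print(example[:6])
```

Fixing `start=4` reproduces the sequence in the text (the 4th, 9th, 14th, 19th, 24th and 29th students). Leaving `start` unset lets the program choose the starting point at random, as the method requires.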
Lund Research Ltd. © 2007