From these formulae, it is evident that calculating the sample standard deviation differs from calculating the population standard deviation in two ways: we use the sample sum of squares rather than the population sum of squares, and we use the degrees of freedom rather than the number of scores in the population. Understanding what degrees of freedom means is important because it returns as a central concept in many other statistical tests.

Let’s use an example. A sample contains 5 scores and the sample mean is 6. As such, the sum of all the scores is 30 (5 × 6 = 30). At this stage, we do not know the value of any individual score. Say that we find out that the first score is 10; the 4 remaining scores must then total 20 (30 – 10 = 20). However, although we know that these 4 remaining scores total 20, we still do not know the value of each of them. If we keep finding out these scores, one at a time, there will come a point when we know the value of the final score without anyone telling us. For example, if the second, third and fourth scores were 4, 6 and 8 respectively, then the sum of all the scores we know so far would be 28 (10 + 4 + 6 + 8 = 28). If 4 of the 5 scores total 28, then the value of the final score must be 2 (30 – 28 = 2). In this sense, degrees of freedom are a measure of the number of new pieces of information we need before knowing all the scores. Therefore, there are 4 degrees of freedom when selecting 5 scores with a known mean (n – 1 = 5 – 1 = 4).

Whilst the sample mean and sample standard deviation are two important statistics when we are working with a sample rather than a population, we also need to know about the distribution of sample means and the standard error of the mean.

The Distribution of Sample Means and Standard Error of the Mean

In our example, the sample consists of 100 students.
However, since there are a lot more than 100 students in the population, which consists of every 16-year-old student nationwide, the question arises: how many samples of 100 students are there in the population? For example, if there were 100,000 students aged 16 in the population, we could divide them into 1,000 separate samples of size 100 (100,000 ÷ 100 = 1,000). As we have mentioned, each of these 1,000 samples will differ slightly from the others in terms of its sample mean and sample standard deviation. Some of the samples of 100 students will contain students who are better at maths than those in other samples. This could simply be because one group has better access to learning materials than another group (in fact, there could be a large variety of reasons why one group of 100 students performs better or worse than another, but this is not important at this stage). The key point is that each group is different, which is reflected in its sample mean and sample standard deviation.

Therefore, if we took each of these 1,000 samples and treated its sample mean as a “score”, as we would when calculating the population mean, we could plot all the sample means on a histogram. We call this the distribution of sample means. The distribution of sample means is either a normal distribution (if the population is normally distributed) or very close to it, and the mean of the sample means, $\mu_M$, is equal to the population mean, $\mu$.

From this mean value (that is, the mean of the sample means) and distribution (that is, the distribution of sample means), we can calculate the standard deviation of the distribution of sample means, which is called the standard error of the mean, $\sigma_M$:

$\sigma_M = \sigma / \sqrt{n}$

where $\sigma$ is the population standard deviation and $n$ is the sample size. If we increase the sample size, the standard distance between a sample mean and the population mean decreases, which makes the sample mean a more accurate representation of the population mean.
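To make the distribution of sample means more concrete, here is a minimal Python sketch that simulates it. The population values (100,000 scores with a mean of 60 and a standard deviation of 15) and the function names `standard_error` and `sample_means` are assumptions made purely for illustration; they do not come from the example in this guide.

```python
import math
import random
import statistics


def standard_error(population_sd, n):
    """Standard error of the mean: population SD divided by sqrt(n)."""
    return population_sd / math.sqrt(n)


def sample_means(population, sample_size):
    """Split the population into consecutive, non-overlapping samples
    and return the mean of each sample."""
    return [
        statistics.fmean(population[i:i + sample_size])
        for i in range(0, len(population), sample_size)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 maths scores (illustrative values only).
population = [rng.gauss(60, 15) for _ in range(100_000)]

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

means = sample_means(population, 100)  # 1,000 samples of size 100

print("number of samples:       ", len(means))
print("population mean:         ", round(mu, 3))
print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("SD of sample means:      ", round(statistics.pstdev(means), 3))
print("sigma / sqrt(n):         ", round(standard_error(sigma, 100), 3))
print("sigma / sqrt(n) at n=400:", round(standard_error(sigma, 400), 3))
```

When this is run, the mean of the sample means should match the population mean, and the standard deviation of the sample means should be close to (though not exactly equal to) the population standard deviation divided by the square root of 100. The last line shows that quadrupling the sample size halves the standard error.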
Therefore, since we know (1) our population mean, (2) our population standard deviation, (3) our sample size, and (4) that our distribution of sample means is either a normal distribution or close to it, we can calculate a z-score to work out probability values using our standard normal distribution tables (see the statistical guide on Standard Score (z-score)). The standard error of the mean becomes particularly important when conducting other parametric tests such as the t test, as does the sample standard deviation (see the statistical guide on One Sample t-Test, Dependent t Test and Independent t Test).

This completes the Essentials series of our statistical guides. If you are using these guides because you have conducted some research and need to examine your results, we would recommend that you next read the statistical guide on Selecting Statistical Tests, which should help you to choose an appropriate statistical test (or tests) to perform on your data. However, if you are on a maths or statistics course and need to know more about parametric and non-parametric tests, we would recommend that you visit our Statistical Guides homepage.
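As a supplement to the z-score calculation described above, the following Python sketch shows one way it could be carried out for a sample mean. The numbers used (a population mean of 60, a population standard deviation of 15, a sample of 100 students and a sample mean of 63) and the function name `z_for_sample_mean` are assumptions chosen for illustration only. The sketch uses `statistics.NormalDist` from the Python standard library in place of printed standard normal distribution tables.

```python
import math
from statistics import NormalDist


def z_for_sample_mean(sample_mean, population_mean, population_sd, n):
    """z-score of a sample mean: its distance from the population mean,
    measured in standard errors of the mean."""
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Hypothetical values, not taken from the guide's example.
z = z_for_sample_mean(sample_mean=63, population_mean=60,
                      population_sd=15, n=100)

# Probability of drawing a sample mean at least this large,
# assuming the distribution of sample means is normal.
p_above = 1 - NormalDist().cdf(z)

print("z-score:", z)
print("probability of a sample mean this high or higher:", round(p_above, 4))
```

With these assumed values the standard error is 1.5, so a sample mean of 63 lies 2 standard errors above the population mean.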
Lund Research Ltd. © 2007




