BioMath Exploration 0.1: Is there a significant difference in learning environments?
The goal of this BioMath Exploration is to help you interpret data like the learning gains in Figure 1. You need to know the concept of a mathematical average. You will learn the concepts of standard deviation, standard error, p-value, and significant differences, all of which are used throughout ICB.
In Figure 1, class environment had a much greater effect on learning for some questions than for others.
Figure 1 Average change in test scores organized by type of question. Changes are the differences between each student's pretest and posttest scores. Purple bars are averages for the active-learning course, and teal bars are averages for the lecture course (+1 SE). * = p < 0.05; ** = p < 0.01; *** = p < 0.001; p > 0.05. (Figure 1 from Udovic et al., 2002, by permission of Oxford University Press and AIBS.)
The height of each bar shows the average change in student understanding. If a purple bar is much taller than the corresponding teal bar, then the active-learning course produced larger learning gains, on average, than the lecture course. However, to conclude that there was a significant difference in average learning gains between the two environments, you need to consider both the size of the sample and the variation among different students. If the variability is high or the sample size is small, then the difference in averages might be due to chance. Let’s walk through an example to illustrate this idea.
Suppose two small classes of 10 students each (Class A and Class B) are tested for learning gains on Question 1. From pretest to posttest, scores in Class A increase by the following amounts:
19, 17, 21, 16, 18, 20, 15, 19, 15, 20
and by the following amounts in Class B:
8, 26, 11, 33, –2, 15, 28, 14, 42, 5.
The students in a class are a sample from the population of students who might have taken that class in the past, or might take it in the future. Larger samples (more students) tend to represent the population more accurately. The number of data points in a sample is the sample size, and the average of the data points in a sample is called the sample average or sample mean.
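The sample size and sample mean of the two hypothetical classes can be checked directly from the data listed above. This is a minimal sketch; the helper name `sample_mean` is hypothetical and simply divides the sum of the data points by the sample size.

```python
# Learning gains (posttest minus pretest) for the two hypothetical classes.
class_a = [19, 17, 21, 16, 18, 20, 15, 19, 15, 20]
class_b = [8, 26, 11, 33, -2, 15, 28, 14, 42, 5]

def sample_mean(data):
    """Sample mean: the sum of the data points divided by the sample size."""
    return sum(data) / len(data)

for name, data in [("Class A", class_a), ("Class B", class_b)]:
    print(f"{name}: n = {len(data)}, sample mean = {sample_mean(data)}")
# Class A: n = 10, sample mean = 18.0
# Class B: n = 10, sample mean = 18.0
```

Both classes have the same sample size and the same sample mean, so the average alone cannot distinguish them.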
BioMath Exploration Integrating Questions

These two hypothetical classes of 10 students have the same average learning gain of 18 points on Question 1, but Class B has much greater variability in learning gains than Class A. One student in Class B even scored lower on the posttest than on the pretest (a negative learning gain of –2). Variability in a data set can be quantified by computing the standard deviation (s) of each sample. The formula for the sample standard deviation, illustrated in CH00_SE.xlsx, is essentially the square root of the average squared difference between the data points and the sample average; the sum of the squared differences is divided by n − 1 (one less than the sample size) rather than by n. A related quantity is the standard error (SE), which is the standard deviation divided by the square root of the sample size. The standard error represents how much the sample average (not the individual data points) is expected to vary from one sample to another, and SE gets smaller as the sample size increases.
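Written as formulas, with x_i the individual learning gains, x̄ the sample mean, and n the sample size, the sample standard deviation and standard error work out as follows for the two hypothetical classes. The symbols are chosen here for illustration and may not match those used in CH00_SE.xlsx; the sums of squared differences from the mean of 18 (42 for Class A, 1708 for Class B) come from the data listed above.

```latex
\begin{align*}
s &= \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}, \qquad \mathrm{SE} = \frac{s}{\sqrt{n}} \\
\text{Class A:}\quad s &= \sqrt{42/9} \approx 2.16, \qquad \mathrm{SE} \approx 2.16/\sqrt{10} \approx 0.68 \\
\text{Class B:}\quad s &= \sqrt{1708/9} \approx 13.78, \qquad \mathrm{SE} \approx 13.78/\sqrt{10} \approx 4.36
\end{align*}
```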
The error bar on each column in Figure 1 represents a distance of one standard error. For example, in the active-learning environment the average learning gain on Question 1 was approximately 18, and the standard error was about 4.5. In the hypothetical example, the standard error was approximately 0.68 in Class A and 4.36 in Class B. Error bars reflect how much variability there is in the sample, and they provide much more information than the average alone. The data in Figure 1 are averages for 61 or 62 students, and a highly variable sample indicates that a different group of 61 or 62 students could have a quite different average outcome.
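The standard errors quoted for Class A and Class B can be reproduced with Python's standard `statistics` module, whose `stdev` function uses the n − 1 denominator. This is a sketch; the helper `standard_error` is a hypothetical name, and the final line is an illustrative what-if (a sample of 62 with the same spread as Class B), not data from Figure 1.

```python
import math
import statistics

# Learning gains for the two hypothetical classes.
class_a = [19, 17, 21, 16, 18, 20, 15, 19, 15, 20]
class_b = [8, 26, 11, 33, -2, 15, 28, 14, 42, 5]

def standard_error(data):
    """Standard error of the mean: sample SD (n - 1 denominator) / sqrt(n)."""
    return statistics.stdev(data) / math.sqrt(len(data))

for name, data in [("Class A", class_a), ("Class B", class_b)]:
    print(f"{name}: s = {statistics.stdev(data):.2f}, SE = {standard_error(data):.2f}")
# Class A: s = 2.16, SE = 0.68
# Class B: s = 13.78, SE = 4.36

# Hypothetical: the same spread as Class B, but 62 students instead of 10.
# A larger sample gives a smaller SE even though the data are just as variable.
s_b = statistics.stdev(class_b)
print(f"SE with n = 62: {s_b / math.sqrt(62):.2f}")
```

The what-if line shows why SE shrinks with sample size: the standard deviation stays the same, but it is divided by a larger square root.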

The p-value displayed for each question in Figure 1 goes a step further than the error bars, addressing the uncertainty in the difference between learning gains in the two learning environments. Specifically, the p-value is the probability that average learning gains as different as those displayed in the graph would be observed if there were no difference between the learning environments. A small p-value, such as 0.05 or less, means that a difference this large would rarely arise by chance alone if the two environments had the same effect, so the data provide evidence that the two learning environments truly differ. A large p-value, such as 0.3, means that a difference this large could easily arise by chance even if the two environments had the same effect, so the data provide no convincing evidence of a difference. In summary, a small p-value means there is a significant difference, and a large p-value means there is no significant difference.
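The definition of a p-value can be made concrete with an exact permutation test. This technique is swapped in here for illustration only; it is not the method behind Figure 1, and the text points to tests such as the chi-squared test or t-test for that. The sketch assumes the null hypothesis that class membership has no effect on learning gain, so every regrouping of the 20 pooled gains into two groups of 10 is equally likely. The function name `permutation_p_value` is hypothetical.

```python
from itertools import combinations

# Learning gains for the two hypothetical classes.
class_a = [19, 17, 21, 16, 18, 20, 15, 19, 15, 20]
class_b = [8, 26, 11, 33, -2, 15, 28, 14, 42, 5]

def permutation_p_value(sample_1, sample_2):
    """Two-sided exact permutation test for a difference in means.

    Pools both samples, then enumerates every way of splitting the pool into
    groups of the original sizes. The p-value is the fraction of splits whose
    difference in means is at least as large (in absolute value) as observed.
    """
    pooled = sample_1 + sample_2
    n1, n2 = len(sample_1), len(sample_2)
    total = sum(pooled)
    observed = abs(sum(sample_1) / n1 - sum(sample_2) / n2)
    as_extreme = 0
    splits = 0
    # Choose which pooled positions form group 1; the rest form group 2.
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[i] for i in idx)
        diff = abs(s1 / n1 - (total - s1) / n2)
        if diff >= observed - 1e-9:  # small tolerance for floating-point rounding
            as_extreme += 1
        splits += 1
    return as_extreme / splits

p = permutation_p_value(class_a, class_b)
print(f"p-value: {p}")  # p-value: 1.0
```

Because the two classes have identical averages, every regrouping produces a difference at least as large as the observed difference of zero, so the p-value is 1.0: no evidence of a difference, despite the very different variability. Enumerating all 184,756 regroupings is feasible for 20 students; larger samples would typically use random resampling instead.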
Computing p-values requires a statistical method such as a chi-squared test (see BME 17.2) or a t-test (see BME 23.1). A rough rule of thumb is that if the standard error bars overlap, the difference is not significant, as seen in Figure 1 for Question 10. The important thing to remember is that a meaningful comparison of two populations always requires making several observations and comparing the averages of the observations in light of the variability within each sample. You should be very skeptical of a graph that does not include error bars or describe the variability in the results.
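The overlapping-error-bar rule of thumb can be sketched for the two hypothetical classes. This assumes each bar extends one standard error both above and below the mean (Figure 1 draws only the upper half), and the helper names `se_interval` and `bars_overlap` are hypothetical. Overlap is only a heuristic; a formal comparison would still use a test such as the t-test.

```python
import math
import statistics

# Learning gains for the two hypothetical classes.
class_a = [19, 17, 21, 16, 18, 20, 15, 19, 15, 20]
class_b = [8, 26, 11, 33, -2, 15, 28, 14, 42, 5]

def se_interval(data):
    """Return (mean - 1 SE, mean + 1 SE), assuming symmetric +/-1 SE bars."""
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return mean - se, mean + se

def bars_overlap(data_1, data_2):
    """True if the +/-1 SE intervals of the two samples share any values."""
    low_1, high_1 = se_interval(data_1)
    low_2, high_2 = se_interval(data_2)
    return low_1 <= high_2 and low_2 <= high_1

for name, data in [("Class A", class_a), ("Class B", class_b)]:
    low, high = se_interval(data)
    print(f"{name}: {low:.2f} to {high:.2f}")
# Class A: 17.32 to 18.68
# Class B: 13.64 to 22.36

# Under the rule of thumb, overlapping bars suggest the difference is not significant.
print("Bars overlap:", bars_overlap(class_a, class_b))  # Bars overlap: True
```

Here the narrow Class A bar sits entirely inside the wide Class B bar, which agrees with the equal averages of the two classes.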