**Statistics Explained (by Brad Williamson and Kevin Karplus AP
Biology Experts)**

**Standard
Deviation:** is a measure of variability IF you know that your
data are approximately normally distributed (i.e., your data, when
plotted in a histogram, approach the bell-shaped normal curve).
(There are mathematical tests for normality, and methods for
"normalizing" data as well.) For many of
the questions that students will ask in the AP lab program this
(normal curve) may be the case, especially in those experiments that
involve some sort of measurement. If you have data that are not
normally distributed, then you can use boxplots
and quartiles to indicate the variability in the data. The
standard deviation should be reported along with the sample size and
mean. However, don't do what so many people do and use the standard
deviation for an error bar. There are very few instances where this
practice is appropriate; instead, use the standard deviation to
calculate the more powerful and meaningful parameter, the standard
error.
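As a concrete sketch (using only Python's standard library; the seed masses below are made-up numbers for illustration), the sample standard deviation can be computed and reported alongside the sample size and mean, as recommended:

```python
import statistics

# Hypothetical measurements (e.g., masses of 10 seeds in grams) -- made-up data
masses = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 1.3, 1.4, 1.2, 1.3]

n = len(masses)                  # sample size
mean = statistics.mean(masses)   # sample mean
sd = statistics.stdev(masses)    # sample standard deviation (n - 1 denominator)

# Report all three together, as recommended above
print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")
```

Note that `statistics.stdev` uses the sample (n − 1) formula, which is what you want when your data are a sample rather than the whole population.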

**Standard
Error:** The committee that is in charge of determining such
things has recommended that this statistic be called **Standard
Uncertainty**, which I agree is a much better label. Another name
for this particular parameter, suggested in the AP redesign, is the
standard deviation of the sample means. Again, this statistic is for
normal-curve data. The formula is the standard
deviation divided by the square root of the sample size. There's a
really cool theorem called the **central limit theorem**. In a
nutshell, what this theorem proves is that for an unknown population
with almost any kind of distribution, if you repeatedly draw out
samples and take the mean each time, eventually all the various means
(each mean is an estimate of the unknown population mean) will form a
normal curve around the unknown population mean. The width or spread
of that curve can be estimated from two factors: the standard
deviation of your sample and the size of your sample. The larger your
sample, the narrower your curve. The smaller your standard deviation,
the narrower your curve. Stop just a minute, revisit what I just
said, and think about the consequences.....the standard error is the
standard deviation of the sample means. Plus or minus one standard
error from your sample mean provides you with an estimate of how well
your sample estimates the mean of the unknown population. In fact,
with some level of precision you could state that you are about 68%
confident that the true population mean lies between plus one and
minus one standard error. Or even better, if you plotted error bars
of 2 standard errors (plus and minus), that would bound an area
where you would be about 95% confident that the true population mean
was bounded by the error bars. When appropriate (normal data), plot
error bars with standard error...er, I mean standard uncertainty.
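To make the arithmetic concrete, here is a minimal sketch (Python standard library; the plant heights are made-up data) of the standard error and the roughly 68% and 95% intervals described above:

```python
import math
import statistics

# Hypothetical sample of plant heights (cm) -- made-up data for illustration
heights = [14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.7, 15.2]

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)
se = sd / math.sqrt(n)   # standard error = SD / sqrt(n)

# ~68% confidence: mean +/- 1 SE; ~95% confidence: mean +/- 2 SE
print(f"mean = {mean:.2f}, SE = {se:.2f}")
print(f"~68% interval: {mean - se:.2f} to {mean + se:.2f}")
print(f"~95% interval: {mean - 2*se:.2f} to {mean + 2*se:.2f}")
```

Notice that doubling the sample size shrinks the standard error by a factor of √2, which is why larger samples give narrower error bars.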

__MORE on Standard Error and its uses__: The **standard error**
is the standard deviation of an estimate of a parameter. For example,
the standard error for the estimate of the mean
is the sample standard deviation divided by the square
root of the sample size. If you have a large sample, you can
get a good estimate of the mean of the population even if the
standard deviation of the population is large. A common use of
standard error in biology is in computing the Student t-test, to
determine whether the mean of a sample differs significantly from a
fixed value or (more commonly) whether the means of two samples
differ significantly from each other. Note that the Student t-test
(and variants of it) relies very strongly on the populations being
normally distributed.

**P-value** (http://en.wikipedia.org/wiki/P-value)

Data yielding a p-value of .05 means there is only a 5% chance of obtaining the observed (or a more extreme) result if no real effect exists. This definition is rather counter-intuitive, leading to many misunderstandings and misinterpretations. "Common sense" tells us to judge our hypotheses based on how well they fit the observed evidence. This is not what a p-value describes. Instead, it describes the likelihood of observing certain data given that the null hypothesis is true. (http://en.wikipedia.org/wiki/P-value)
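One way to build intuition for this definition is to simulate it. The sketch below (Python standard library; a fair-coin null hypothesis chosen purely for illustration) estimates the chance of a result at least as extreme as an observed one, assuming the null hypothesis is true:

```python
import random

random.seed(42)  # reproducible illustration

# Null hypothesis: the coin is fair (p = 0.5).
# Observed result: 16 or more heads out of 20 flips.
observed_heads = 16
n_flips = 20
trials = 100_000

# Simulate many experiments in a world where the null hypothesis is true,
# and count how often a result at least this extreme occurs
extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    if heads >= observed_heads:
        extreme += 1

p_value = extreme / trials  # fraction of null-world experiments this extreme
print(f"estimated one-sided p-value: {p_value:.4f}")  # near the exact 0.0059
```

The p-value is not the probability that the null hypothesis is true; it is the probability of data this extreme *given* that the null hypothesis is true, which is exactly what the simulation counts.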

**t-Test**....The t-Test
(of which there are several) is calculated from the standard
errors. The t-test is a way of testing a hypothesis: the test
statistic follows a probability distribution called the
t-distribution, which varies depending on how many degrees of
freedom there are (you didn't ask about degrees of freedom, but I
don't really want to go into that here). With this distribution you
can precisely determine a p-value: the probability that your results,
or more extreme results, would occur by chance alone. Again, the
t-test is for normal data (i.e., continuous, non-discrete data).
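As a sketch of the mechanics (Python standard library; the two groups below are made-up measurements), here is an unpaired two-sample t statistic built from the standard errors, as described above. Looking the statistic up in a t-table (or using statistics software) then gives the p-value:

```python
import math
import statistics

# Hypothetical control and treatment measurements -- made-up data
control   = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3, 10.2, 9.7]
treatment = [10.9, 11.2, 10.7, 11.0, 10.8, 11.3, 10.6, 11.1]

def two_sample_t(a, b):
    """Unpaired t statistic: difference of means over the combined standard error."""
    se2_a = statistics.stdev(a) ** 2 / len(a)   # squared SE of group a's mean
    se2_b = statistics.stdev(b) ** 2 / len(b)   # squared SE of group b's mean
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)

t = two_sample_t(control, treatment)
print(f"t = {t:.2f}")
# |t| is far beyond ~2.1 (the two-tailed 0.05 critical value near 14 degrees
# of freedom), so the two means differ significantly at p < 0.05
```

This is the Welch (unequal-variance) form of the statistic; the pooled-variance form differs slightly but follows the same idea of dividing a difference of means by a standard error.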

**Chi-square test: **In the AP lab manual there are labs for which
a **chi-square** test
would be more appropriate--say, a behavioral choice lab. This test
requires discrete, categorical (count) data, and so it is
very different from a lab that would require something like a t-test.

__And more on Chi-square__: **Pearson's chi-squared test** is
used for determining the goodness of fit between a set of
observations and an expected (theoretical) distribution. Most often
it is used for "categorical" distributions, where there are a finite
number of different values that the random variable can have (like
the cells of a 2x2 table). It is generally only useful when you have
at least 5 expected counts in each cell; for smaller samples you should
probably use a Fisher exact test instead, which requires more
computation. The chi-squared test is a bit old-fashioned these
days, since the Fisher
exact test is easily done by a computer and does not require the
asymptotic approximation of the chi-squared test, but the chi-squared
test is still widely used, and is good enough when samples are large.
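A minimal goodness-of-fit sketch (Python standard library; the choice-chamber counts are made up) for a behavioral choice lab like the one mentioned above, testing observed counts against a no-preference expectation:

```python
# Hypothetical choice-chamber counts: animals choosing among four conditions
observed = [26, 14, 12, 8]                  # made-up observed counts
total = sum(observed)
expected = [total / len(observed)] * 4      # null hypothesis: no preference (15 each)

# Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value for df = 3 (categories - 1) at p = 0.05 is 7.815
print(f"chi-squared = {chi2:.2f}")
if chi2 > 7.815:
    print("reject the null hypothesis of no preference (p < 0.05)")
```

Note that every expected count here is 15, comfortably above the at-least-5 rule of thumb mentioned above; with smaller counts, a Fisher exact test would be the safer choice.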

Finally, some have wondered about the t-test: will it be on the exam? For now I doubt it, but it is part of what the curriculum framework calls for when it calls for appropriate data analysis in the laboratories. Your students should be using t-tests or other such tests where appropriate. No--you don't have to teach statistics, but many university lab programs include introductory, intuitive stats for specific labs--this includes the t-test, ANOVA, Mann-Whitney U, chi-square, correlation, Spearman rank correlation, and other tests. Students need to know what a p-value is and how to interpret it when they read scientific work.

Can you explain this cartoon?

http://imgs.xkcd.com/comics/significant.png