Lecture 9: General Guide to Comparing Two Samples
- Determine whether you have paired samples or independent samples.
- If you have paired data:
- Work with the differences.
- Are the differences approximately normally distributed, or is the sample
size large? Then a paired t-test (a one-sample t-test on the differences) is the right statistical tool.
- If the differences are continuous but non-normal, and especially
if the sample size is small, use the Wilcoxon signed-rank test.
- If all or most of the differences are positive, or if all or
most of the differences are negative, then a simple sign test may suffice to show significance. But a simple
sign test is less powerful than a test that also uses ranks, so it may fail to show significance when there really
is a difference.
- If the data is non-normal and very discrete (that is, if the differences take only a few values, say -2, -1, 0, 1, 2),
then consider the sign test and/or bootstrapping.
- If you have independent samples:
- Consider whether a log transform of the data is appropriate. If it is, then work with the log transform but
interpret your results on the scale of the original data.
- Are both samples approximately normally distributed or are both sample sizes large? Then a t-statistic is appropriate.
- If the variances are approximately equal or if the design is balanced (equal sample
sizes from both populations), then you may use a pooled-variance t-test, which is the most
powerful method in that case.
- If there is any doubt about whether the variances are equal, especially with an unbalanced design,
use Welch's (unpooled) t-statistic.
- With non-normal continuous data, and especially when the sample sizes are also small, consider the Mann-Whitney
(also called the Wilcoxon rank sum) test. Make sure your two distributions have the same shape (equal variances is one
consideration here) and make sure there are not too many ties in the data.
- With very discrete data (that is, if there are only a few possible values such as 0, 1, 2), consider bootstrapping.
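
A few of the tools in this guide can be sketched in code. The block below is a minimal illustration rather than a reference implementation, and it uses only the Python standard library. The function names (sign_test_p, pooled_t, welch_t, bootstrap_diff_ci) and the sample data are hypothetical, chosen for this example. It computes an exact two-sided sign test on paired differences, the pooled and Welch t-statistics with their degrees of freedom, and a percentile bootstrap interval for a difference in means. The t functions return only the statistic and degrees of freedom, so a p-value would still require a t-table or a statistics package.

```python
import math
import random
from statistics import mean, variance


def sign_test_p(diffs):
    """Exact two-sided sign test on paired differences (zeros are dropped)."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(d > 0 for d in nonzero)
    # Under H0 the count of positive differences is Binomial(n, 1/2).
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def pooled_t(x, y):
    """Pooled-variance t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def welch_t(x, y):
    """Welch (unpooled) t statistic and Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def bootstrap_diff_ci(x, y, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(x) - mean(y)."""
    rng = random.Random(seed)
    # Resample each group with replacement and record the difference in means.
    diffs = sorted(
        mean(rng.choices(x, k=len(x))) - mean(rng.choices(y, k=len(y)))
        for _ in range(reps)
    )
    lo = diffs[int((1 - level) / 2 * reps)]
    hi = diffs[int((1 + level) / 2 * reps) - 1]
    return lo, hi


# Made-up data, for illustration only.
paired_diffs = [1, 2, 3, 1, 2, 1, 4, 2]
group_a = [1, 2, 3, 4]
group_b = [2, 4, 6, 9]

print("sign test p-value:", sign_test_p(paired_diffs))
print("pooled t, df:", pooled_t(group_a, group_b))
print("Welch t, df:", welch_t(group_a, group_b))
print("bootstrap interval:", bootstrap_diff_ci(group_a, group_b))
```

With equal sample sizes the pooled and Welch statistics coincide and only the degrees of freedom differ, which is one reason the pooled test is safe in a balanced design. In practice a package would usually do this work. SciPy, for example, provides scipy.stats.ttest_rel, scipy.stats.wilcoxon, scipy.stats.ttest_ind (with equal_var=False for Welch's test) and scipy.stats.mannwhitneyu.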