Biostatistics Lab 1
- Find R. Either download it and install it on your own computer or find it on the Lab's computer. R software is available for free from http://streaming.stat.iastate.edu/CRAN/, for instance.
- The next hurdle is to get your data into R. So we need a data set. Let's just use the one from your first homework assignment. I copied the data and the basic story behind the data from Carnegie Mellon's StatLib DASL web site:
Original source: Schmitt, Maribeth C., The Effects of an Elaborated Directed Reading Activity on the Metacomprehension Skills of Third Graders, Ph.D. dissertation, Purdue University, 1987. The data is at the very bottom of this page.
- Open something like Excel and copy and paste the data into it. You might well have to use the Text to Columns function under Data in order to get the data appropriately into the right columns.
- Save the file as tab delimited text called something like HW01.txt somewhere on the computer.
- Make sure you know where HW01.txt is on your computer.
- Open R
- Change directories in R to the directory where HW01.txt lives. You can do this by going under File and Change dir... or by typing code at the prompt like:
Single and double quotes both work in R. On Windows, write the path with forward slashes or with doubled backslashes, since a single backslash starts an escape sequence inside an R string. And clearly you should replace my directory names with your own.
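A minimal sketch of changing directories at the prompt. The folder here is a stand-in (R's temporary directory) and the Windows-style paths in the comments are hypothetical, so substitute the folder where you actually saved HW01.txt.

```r
# Stand-in folder: replace tempdir() with the folder that holds HW01.txt.
my_dir <- tempdir()

setwd(my_dir)   # change R's working directory
getwd()         # confirm where R is now looking
list.files()    # HW01.txt should appear here once it is saved in this folder

# On Windows a literal path could be written either way (hypothetical names):
#   setwd("C:/Users/yourname/Documents")
#   setwd("C:\\Users\\yourname\\Documents")
```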
- Get your data into R and assign it to a variable by typing something like the following at the prompt:
data <- read.table(file='HW01.txt', header=TRUE)
Tricks: TRUE has to be all capitals. The file has a header line if you left the line with Treatment and Response at the top of your file. If you took that out, then header should be FALSE, which is the default value. The object that read.table gives back is a data frame. We will learn about more complicated objects in R like data frames later.
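To see what the header argument does, here is a small sketch that writes a made-up two-row file to a temporary location and reads it both ways. The values are invented for illustration and are not the homework data. The sep="\t" argument is an optional extra that tells R the file is tab delimited.

```r
# Write a tiny made-up tab-delimited file (NOT the homework data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Treatment\tResponse",
             "Treated\t10",
             "Control\t7"), tmp)

with_header <- read.table(file = tmp, header = TRUE,  sep = "\t")
no_header   <- read.table(file = tmp, header = FALSE, sep = "\t")

names(with_header)  # column names are taken from the first line
nrow(with_header)   # only the two data rows are counted
nrow(no_header)     # the header line was read as a data row too
str(with_header)    # shows the type of each column
```

If the row count is one more than you expected, the header line was probably read as data.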
- Just type
data
to see the data and make sure that it is all there like it should be.
- The next hurdle is to write code to analyze your data and then to figure out whatever it is that R is telling you about the analysis.
Type help(t.test) at the prompt to read about the t.test function in R.
- Since our data is in a single column, we are going to use the formula method for conducting the test. Try typing
t.test(data[,2] ~ data[,1], data)
data[,2] is the second column of data and data[,1] is the first column and the formula tells R to break up the data by the factor in the first column.
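If your file kept its header line, the columns can also be referred to by name, which some people find easier to read. The sketch below assumes the columns are called Treatment and Response, as in the header mentioned earlier, and uses made-up numbers rather than the homework data.

```r
# Made-up values for illustration only (NOT the homework data).
toy <- data.frame(
  Treatment = rep(c("Control", "Treated"), each = 5),
  Response  = c(42, 38, 51, 45, 40, 55, 49, 60, 52, 58)
)

# Same test written two ways: by column position and by column name.
by_position <- t.test(toy[, 2] ~ toy[, 1])
by_name     <- t.test(Response ~ Treatment, data = toy)

by_position$p.value
by_name$p.value   # same test, so the same p-value
by_name           # prints the full output to read through
```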
Read the output and the help page again. What kind of t-test was performed? Is the p-value reported for a one-sided or a 2-sided test? Were the variances of the two data sets assumed to be equal or not? The confidence interval is for a difference in the means. Which group came first in the difference and which group was subtracted from the first?
- Is a 2-sided test correct for this problem? Think about the setup and the question. Perform a 1-sided t-test by typing either
t.test(data[,2] ~ data[,1], data, alternative="greater") or
t.test(data[,2] ~ data[,1], data, alternative="less")
Which one gives you the test and confidence interval you want? You can figure out the order R puts the factors in by typing
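One way to check the ordering, sketched here on made-up labels, is to ask for the levels of the grouping column. Recent versions of R read text columns in as plain character strings, so the sketch wraps the column in factor() first. With your own data the equivalent would be levels(factor(data[,1])), assuming the first column holds the group labels.

```r
# Made-up group labels standing in for the first column of the data.
groups <- c("Treated", "Control", "Treated", "Control")

# By default, factor() sorts the distinct labels to decide their order.
levels(factor(groups))
```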
- How would you get R to assume equal variances? Read the t.test help page to find out and try it out.
- There are a lot of basic statistical functions that you might want to use on your data. The manual for R is available at
For most of them, you would want to separate out the treatment from the control group. You can do that here with the commands
treatment <- subset(data[,2], data[,1]=="Treated")
control <- subset(data[,2], data[,1]=="Control")
From this you can do things like find the means and standard deviations by typing things like
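A minimal sketch using mean, sd, and summary. The two vectors are filled with made-up numbers so the code runs on its own. With the homework data you would use the treatment and control vectors created by the subset commands.

```r
# Made-up values for illustration only (NOT the homework data).
treatment <- c(55, 49, 60, 52, 58)
control   <- c(42, 38, 51, 45, 40)

mean(treatment)     # sample mean of the treatment group
sd(treatment)       # sample standard deviation (n - 1 in the denominator)
mean(control)
sd(control)
summary(treatment)  # minimum, quartiles, median, mean, and maximum
```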
- You can also plot the two datasets side by side in various ways. A simple side by side box plot can be obtained by typing
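A minimal sketch of a side by side box plot, using the same formula idea as the t-test. The data frame holds made-up numbers so the code runs on its own. With the homework data the call would be something like boxplot(data[,2] ~ data[,1], data).

```r
# Made-up values for illustration only (NOT the homework data).
toy <- data.frame(
  Treatment = rep(c("Control", "Treated"), each = 5),
  Response  = c(42, 38, 51, 45, 40, 55, 49, 60, 52, 58)
)

# One box per group; the formula splits the second column by the first.
bp <- boxplot(toy[, 2] ~ toy[, 1], toy)

bp$names  # the group labels, in the order the boxes are drawn
```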
To play with the graph, look at the help page for the boxplot by typing
help(boxplot)
You can play with little things like changing the labels using "names" by typing
boxplot(data[,2]~data[,1], data, names=c("Control", "Treatment"))
You can add titles after the fact by just typing
title(main="Effect of Activities on Reading Outcomes", sub="Treatment versus Control")
Read about all of the things you can change by reading the help pages:
Try to change the color of something - like the boxes or the labels.