Biostatistics Lab 1

  1. Find R. Either download and install it on your own computer or find it on one of the lab's computers. R is available for free from, for instance, the Comprehensive R Archive Network (CRAN).
  2. The next hurdle is to get your data into R, so we need a data set. Let's just use the one from your first homework assignment. I copied the data and the basic story behind it from Carnegie Mellon's StatLib DASL web site. The original source is: Schmitt, Maribeth C., The Effects of an Elaborated Directed Reading Activity on the Metacomprehension Skills of Third Graders, Ph.D. dissertation, Purdue University, 1987. The data are at the very bottom of this page.
    1. Open something like Excel and copy and paste the data into it. You might well have to use the Text to Columns function under Data in order to get the data appropriately into the right columns.
    2. Save the file as tab delimited text called something like HW01.txt somewhere on the computer.
    3. Make sure you know where HW01.txt is on your computer.
    4. Open R
    5. Change directories in R to the directory where HW01.txt lives. You can do this by going under File and Change dir... or by typing code at the prompt like:
      setwd("C:/Users/Elizabeth Housworth/Desktop")

      Double versus single quotes, and backslashes versus forward slashes versus doubled backslashes, are all variations that might be needed on different operating systems. Clearly, you should replace my directory names with your own.
    6. Get your data into R and assign it to a variable by typing something like the following at the prompt:
       data <- read.table(file='HW01.txt', header=TRUE) 

      Tricks: TRUE has to be written in all capitals. The file has a header line if you left the line with Treatment and Response at the top of your file. If you took that line out, then header should be FALSE, which is the default value. We will learn about more complicated objects in R, like data frames, later.
    7. Just type the name of the variable
       data
      to see the data and make sure that it is all there like it should be.
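The file-reading steps above can be sketched end to end as follows. This is only a sketch under some assumptions: instead of going through Excel, it writes the tab-delimited file itself into R's temporary directory (so the path is not the Desktop folder used above), and the names treated_scores, control_scores, and path are made up for the illustration.

```r
# The 44 responses from the table at the bottom of this page.
treated_scores <- c(24, 43, 58, 71, 43, 49, 61, 44, 67, 49, 53,
                    56, 59, 52, 62, 54, 57, 33, 46, 43, 57)
control_scores <- c(42, 43, 55, 26, 62, 37, 33, 41, 19, 54, 20, 85,
                    46, 10, 17, 60, 53, 42, 37, 42, 55, 28, 48)

# Write a tab-delimited text file with a header line, like the one saved
# from Excel. The temporary directory is an assumption so this runs anywhere.
path <- file.path(tempdir(), "HW01.txt")
writeLines(c("Treatment\tResponse",
             paste("Treated", treated_scores, sep = "\t"),
             paste("Control", control_scores, sep = "\t")),
           path)

file.exists(path)        # check that R can find the file before reading it

# Read the file; sep = "\t" says explicitly that the columns are tab separated.
data <- read.table(file = path, header = TRUE, sep = "\t")

str(data)                # column names and types
head(data)               # the first six rows
table(data$Treatment)    # how many rows fall in each group
```

If the counts from table() do not match the number of rows in the table at the bottom of this page, something went wrong in the copy and paste step.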
  3. The next hurdle is to write code to analyze your data and then to figure out whatever it is that R is telling you about the analysis.
    1. Type
       ?t.test
      to read about the t.test function in R.
    2. Since our responses are all in a single column, with the group labels in another column, we are going to use the formula method for conducting the test. Try typing
       t.test(data[,2] ~ data[,1], data)
      data[,2] is the second column of data and data[,1] is the first column and the formula tells R to break up the data by the factor in the first column.
    3. Read the output and the help page again. What kind of t-test was performed? Is the p-value reported for a one-sided or a 2-sided test? Were the variances of the two data sets assumed to be equal or not? The confidence interval is for a difference in the means. Which group came first in the difference and which group was subtracted from the first?
    4. Is a 2-sided test correct for this problem? Think about the setup and the question. Perform a 1-sided t-test by typing either
       t.test(data[,2] ~ data[,1], data, alternative="greater")
      or
       t.test(data[,2] ~ data[,1], data, alternative="less")
      Which one gives you the test and confidence interval you want? You can figure out the order R puts the factors in by typing something like
       levels(factor(data[,1]))
    5. How would you get R to assume equal variances? Read the t.test help page to find out and try it out.
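Once you have worked through the questions above, one way to check your reading of the output is to store each test in a variable and look at its pieces. The following is a sketch, not the only way to do it: it rebuilds the data frame directly instead of reading HW01.txt, it uses the column names Response and Treatment in the formula instead of data[,2] and data[,1], and the variable names welch, pooled, and one_sided are made up for the illustration.

```r
# Rebuild the data frame from the table at the bottom of this page.
treated_scores <- c(24, 43, 58, 71, 43, 49, 61, 44, 67, 49, 53,
                    56, 59, 52, 62, 54, 57, 33, 46, 43, 57)
control_scores <- c(42, 43, 55, 26, 62, 37, 33, 41, 19, 54, 20, 85,
                    46, 10, 17, 60, 53, 42, 37, 42, 55, 28, 48)
data <- data.frame(
  Treatment = rep(c("Treated", "Control"),
                  times = c(length(treated_scores), length(control_scores))),
  Response  = c(treated_scores, control_scores)
)

# The order of the groups decides which mean is subtracted from which.
levels(factor(data$Treatment))

# Default call: two-sided, variances not assumed equal.
welch <- t.test(Response ~ Treatment, data = data)

# Same test with the equal-variance assumption switched on.
pooled <- t.test(Response ~ Treatment, data = data, var.equal = TRUE)

# One-sided version; "less" is relative to the group order printed above.
one_sided <- t.test(Response ~ Treatment, data = data, alternative = "less")

welch$method        # which kind of t-test was run
welch$alternative   # one-sided or two-sided
welch$estimate      # the two group means, in the order used for the difference
welch$conf.int      # confidence interval for the difference in means
pooled$parameter    # degrees of freedom under the equal-variance assumption
one_sided$p.value   # compare with welch$p.value
```

Storing the result rather than just printing it also lets you reuse pieces such as the p-value in later calculations.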
  4. There are a lot of basic statistical functions that you might want to use on your data. The manual for R is available online. For most of these functions, you would want to separate out the treatment group from the control group. You can do that here with the commands
     treatment <- subset(data[,2], data[,1]=="Treated")
     control <- subset(data[,2], data[,1]=="Control")
    From this you can do things like find the means and standard deviations by typing things like mean(treatment) and sd(treatment).
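As a sketch of what this step can look like in full, the code below splits the responses with subset() and then summarizes each group. It rebuilds the data frame directly instead of reading HW01.txt, and the names treated_scores, control_scores, and group_means are made up for the illustration. The tapply() call at the end is an alternative that skips the splitting step; it is shown as an option, not as something the lab requires.

```r
# Rebuild the data frame from the table at the bottom of this page.
treated_scores <- c(24, 43, 58, 71, 43, 49, 61, 44, 67, 49, 53,
                    56, 59, 52, 62, 54, 57, 33, 46, 43, 57)
control_scores <- c(42, 43, 55, 26, 62, 37, 33, 41, 19, 54, 20, 85,
                    46, 10, 17, 60, 53, 42, 37, 42, 55, 28, 48)
data <- data.frame(
  Treatment = rep(c("Treated", "Control"),
                  times = c(length(treated_scores), length(control_scores))),
  Response  = c(treated_scores, control_scores)
)

# Split the second column by the labels in the first column.
treatment <- subset(data[,2], data[,1] == "Treated")
control   <- subset(data[,2], data[,1] == "Control")

# Basic summaries for each group.
mean(treatment)
sd(treatment)
mean(control)
sd(control)
summary(control)      # minimum, quartiles, median, mean, maximum

# Alternative: apply one function to every group without splitting first.
group_means <- tapply(data$Response, data$Treatment, mean)
group_means
```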
  5. You can also plot the two datasets side by side in various ways. A simple side by side box plot can be obtained by typing
     boxplot(data[,2]~data[,1], data) 
    To play with the graph, look at the help page for the boxplot function by typing
     ?boxplot
    You can play with little things like changing the labels using "names" by typing
     boxplot(data[,2]~data[,1], data, names=c("Control", "Treatment"))
    You can add titles after the fact by just typing
     title(main="Effect of Activities on Reading Outcomes", sub="Treatment versus Control")
    Read about all of the things you can change in the help pages, for example ?boxplot, ?title, and ?par.
    Try to change the color of something - like the boxes or the labels.
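Here is one possible answer to the color exercise, offered as a sketch to compare with your own attempt. The particular colors are arbitrary choices, the data frame is rebuilt directly instead of being read from HW01.txt, and the formula uses the column names Response and Treatment instead of data[,2] and data[,1]. The pdf(NULL) line is only there so the code can also run as a script without opening a window or writing a file; at the R prompt it is skipped.

```r
# Rebuild the data frame from the table at the bottom of this page.
treated_scores <- c(24, 43, 58, 71, 43, 49, 61, 44, 67, 49, 53,
                    56, 59, 52, 62, 54, 57, 33, 46, 43, 57)
control_scores <- c(42, 43, 55, 26, 62, 37, 33, 41, 19, 54, 20, 85,
                    46, 10, 17, 60, 53, 42, 37, 42, 55, 28, 48)
data <- data.frame(
  Treatment = rep(c("Treated", "Control"),
                  times = c(length(treated_scores), length(control_scores))),
  Response  = c(treated_scores, control_scores)
)

# When run as a script, draw on a device that creates no file.
if (!interactive()) pdf(NULL)

# col fills the boxes (one color per group), border colors their outlines.
bp <- boxplot(Response ~ Treatment, data = data,
              names  = c("Control", "Treatment"),
              col    = c("lightgray", "lightblue"),
              border = "darkblue",
              ylab   = "Response")

# Titles added after the fact; col.main changes the color of the main title.
title(main = "Effect of Activities on Reading Outcomes",
      sub  = "Treatment versus Control",
      col.main = "darkred")

# boxplot() also returns the numbers it drew, which is handy for checking.
bp$n             # number of observations in each box
bp$stats[3, ]    # the median of each group
```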

Treatment	Response
Treated	24
Treated	43
Treated	58
Treated	71
Treated	43
Treated	49
Treated	61
Treated	44
Treated	67
Treated	49
Treated	53
Treated	56
Treated	59
Treated	52
Treated	62
Treated	54
Treated	57
Treated	33
Treated	46
Treated	43
Treated	57
Control	42
Control	43
Control	55
Control	26
Control	62
Control	37
Control	33
Control	41
Control	19
Control	54
Control	20
Control	85
Control	46
Control	10
Control	17
Control	60
Control	53
Control	42
Control	37
Control	42
Control	55
Control	28
Control	48