Biostatistics Lab 4

In this lab, I promised to cover discrete data analysis in R. Of course, we haven't really had any lectures on discrete data analysis yet... Discrete data is essentially categorical data that can be coded with indicator variables that are 1 if you are in the category and 0 otherwise. The analysis you run depends on the question you are answering. But first, we need some data. Categorical data is often given in summary form: the raw data is a bunch of 0's and 1's in indicator variables, but the summary is usually a table of counts like the following one from https://onlinecourses.science.psu.edu/stat504/book/export/html/102:

Defendant's Race    Victim's Race    Death penalty: Yes    Death penalty: No
White               White            19                    132
White               Black            0                     9
Black               White            11                    52
Black               Black            6                     97

  1. Find R or RStudio. For analyzing categorical data, R will let you put in summary data. It will also let you put in raw data. In the case of a 2x2 table, you can either put in a 2x2 matrix as "x" or two indicator variables as x and y. Since we have summary data here, we will do the former.

    Convince yourself that the summary table for the effect of a defendant's race on whether or not the defendant receives the death penalty is:

    Defendant's Race    Death penalty: Yes    Death penalty: No
    White               19                    141
    Black               17                    149

    Convince yourself that the summary table for the effect of a victim's race on whether or not the defendant receives the death penalty is:

    Victim's Race       Death penalty: Yes    Death penalty: No
    White               30                    184
    Black               6                     106

  2. Run a chi-square test of whether the death penalty is handed out independently of the race of the defendant by typing:
    x = array(c(19, 17, 141, 149), dim=c(2, 2))
    x
    chisq.test(x)
    
    Run a chi-square test of whether the death penalty is handed out independently of the race of the victim by typing:
    x = array(c(30, 6, 184, 106), dim=c(2, 2))
    x
    chisq.test(x)
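
    The object returned by chisq.test holds more than what gets printed. A minimal sketch using the victim's-race table (the names result and uncorrected and the dimnames labels are illustrative): by default R applies Yates' continuity correction to 2x2 tables, and correct = FALSE turns it off.

```r
# Victim's race versus death penalty, with labels so the output is easier to read.
x <- array(c(30, 6, 184, 106), dim = c(2, 2),
           dimnames = list(Victim = c("White", "Black"),
                           Penalty = c("Yes", "No")))

result <- chisq.test(x)    # continuity correction applied by default for a 2x2 table
result$expected            # counts expected if penalty and victim's race were independent
result$statistic           # the chi-square statistic
result$p.value

# The same test without the continuity correction.
uncorrected <- chisq.test(x, correct = FALSE)
uncorrected$statistic
```

    Comparing result$expected with x shows which cells drive the test statistic.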
    
  3. Do it again separately for each race of defendant, testing the race of the victim against the death penalty. If the defendant is white, the table for the victim's race versus death penalty is x=array(c(19, 0, 132, 9), dim=c(2, 2)). If the defendant is black, the table for the victim's race versus death penalty is x=array(c(11, 6, 52, 97), dim=c(2, 2)).
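
    The two stratified tests can be run as sketched below; the object names are illustrative. With a zero in the table for white defendants, expect R to warn that the chi-squared approximation may be incorrect. The expected counts show why: R issues that warning whenever an expected count falls below 5.

```r
# Victim's race (rows: white, black) versus death penalty (columns: yes, no),
# one table per race of defendant.
white_def <- array(c(19, 0, 132, 9), dim = c(2, 2))
black_def <- array(c(11, 6, 52, 97), dim = c(2, 2))

res_white <- chisq.test(white_def)   # triggers the approximation warning
res_black <- chisq.test(black_def)

res_white$expected   # one expected count is well below 5
res_black$expected

res_white
res_black
```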
  4. The same data could be given by indicator variables. The data would look like:
    Defendant_white?	Victim_white?	Death_penalty?
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	1
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	1	0
    1	0	0
    1	0	0
    1	0	0
    1	0	0
    1	0	0
    1	0	0
    1	0	0
    1	0	0
    1	0	0
    0	1	1
    0	1	1
    0	1	1
    0	1	1
    0	1	1
    0	1	1
    0	1	1
    0	1	1
    0	1	1
    0	1	1
    0	1	1
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	1	0
    0	0	1
    0	0	1
    0	0	1
    0	0	1
    0	0	1
    0	0	1
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    0	0	0
    
    Get this data into R. I named the data deathpenalty. Note that R changes the question marks in the column names to periods. Then use R's table command to check that it gives the same counts as the summary table:
    table(deathpenalty)
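
    One way to get the data in without typing 326 rows is to expand the cell counts with rep(). This is a sketch of that shortcut, not the only route (reading the listing from a text file works too). The name counts is illustrative, and the column names are written with trailing periods to match what R produces from the question marks.

```r
# Number of rows for each observed combination, in the order listed above.
# The combination white defendant / black victim / death penalty has no rows.
counts <- c(19, 132, 9, 11, 52, 6, 97)

deathpenalty <- data.frame(
  Defendant_white. = rep(c(1, 1, 1, 0, 0, 0, 0), times = counts),
  Victim_white.    = rep(c(1, 1, 0, 1, 1, 0, 0), times = counts),
  Death_penalty.   = rep(c(1, 0, 0, 1, 0, 1, 0), times = counts)
)

nrow(deathpenalty)             # total number of cases
tab <- table(deathpenalty)     # three-way table of counts, levels ordered 0 then 1
tab
```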
    
  5. Perform the previous chi-square tests on the raw data with commands such as:
    chisq.test(deathpenalty[,1], deathpenalty[,3])
    chisq.test(deathpenalty[,2], deathpenalty[,3])
    
    DefendantWhite = subset(deathpenalty, deathpenalty$Defendant_white.==1)
    chisq.test(DefendantWhite[,2], DefendantWhite[,3])
    
    DefendantBlack = subset(deathpenalty, deathpenalty$Defendant_white.==0)
    chisq.test(DefendantBlack[,2], DefendantBlack[,3])
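
    As a check that raw and summary input agree, the sketch below rebuilds deathpenalty from the cell counts (an assumption made so the example runs on its own; use your own data frame if you already have one) and runs the victim's-race test both ways. The raw cross-tabulation comes out in 0/1 order, the reverse of the summary array, but the chi-square statistic does not depend on that ordering.

```r
# Rebuild the raw indicator data from the cell counts.
counts <- c(19, 132, 9, 11, 52, 6, 97)
deathpenalty <- data.frame(
  Defendant_white. = rep(c(1, 1, 1, 0, 0, 0, 0), times = counts),
  Victim_white.    = rep(c(1, 1, 0, 1, 1, 0, 0), times = counts),
  Death_penalty.   = rep(c(1, 0, 0, 1, 0, 1, 0), times = counts)
)

# Victim's race versus death penalty, from raw data and from summary counts.
raw        <- chisq.test(deathpenalty[, 2], deathpenalty[, 3])
summarized <- chisq.test(array(c(30, 6, 184, 106), dim = c(2, 2)))

# Show the two statistics and p-values side by side.
c(raw = unname(raw$statistic), summarized = unname(summarized$statistic))
c(raw = raw$p.value, summarized = summarized$p.value)
```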
    
  6. We will discuss the Mantel-Haenszel test in class, which is a way of combining the results from separate 2x2 tables. The data can be raw or summarized, but the strata are determined by the last dimension of the array (summarized data) or by the third variable (raw data). To perform a test with defendant's race as the strata, our summarized data needs to be in the form:
    data = array(c(19, 0, 132, 9, 11, 6, 52, 97), dim=c(2, 2, 2))
    
    Type data to look at it. Either of the following commands runs the test:
    mantelhaen.test(data)
    mantelhaen.test(deathpenalty$Victim_white., deathpenalty$Death_penalty., deathpenalty$Defendant_white.)
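
    For 2x2 strata, the object returned by mantelhaen.test also holds the Mantel-Haenszel estimate of the common odds ratio and its confidence interval. A sketch with labeled dimensions (the name tables and the labels are illustrative):

```r
# Victim's race by death penalty, stratified by defendant's race (last dimension).
tables <- array(c(19, 0, 132, 9, 11, 6, 52, 97), dim = c(2, 2, 2),
                dimnames = list(Victim    = c("White", "Black"),
                                Penalty   = c("Yes", "No"),
                                Defendant = c("White", "Black")))

mh <- mantelhaen.test(tables)
mh$statistic   # Mantel-Haenszel chi-square statistic
mh$p.value
mh$estimate    # estimated common odds ratio across the two strata
mh$conf.int    # confidence interval for the common odds ratio
```

    An odds ratio above 1 here means higher odds of the death penalty when the victim is white, after stratifying on the defendant's race.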
    
  7. Yet another way of analyzing discrete response data, which is especially useful if you also have continuous predictor data, is logistic regression. In R, this can be run as a regression model with our data as:
    model <- glm(Death_penalty. ~ Defendant_white. * Victim_white., data = deathpenalty, family = binomial("logit"))
    summary(model)
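
    Logistic regression can also be fit from summary counts by giving glm a two-column response of successes and failures. The sketch below is a variation and not the model above: it uses main effects only (no interaction), a simplification chosen here because the zero cell for white defendants with black victims can make the interaction estimate unstable. The data frame cells and its column names are illustrative.

```r
# One row per defendant/victim combination, with counts of yes and no.
cells <- data.frame(
  Defendant_white = c(1, 1, 0, 0),
  Victim_white    = c(1, 0, 1, 0),
  yes             = c(19, 0, 11, 6),
  no              = c(132, 9, 52, 97)
)

# Main-effects logistic regression (an assumption of this sketch; no interaction term).
fit <- glm(cbind(yes, no) ~ Defendant_white + Victim_white,
           family = binomial("logit"), data = cells)
summary(fit)

# Exponentiated coefficients are on the odds-ratio scale.
exp(coef(fit))
```

    The exponentiated slopes can be read as odds ratios for the death penalty, each adjusted for the other variable.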
    
  8. All in all, what do you think these data say about the death penalty and race?