Biostatistics Lab 4

In this lab, as promised, we cover discrete data analysis in R, even though we haven't really had any lectures on discrete data analysis yet. Discrete data here means categorical data, which can be coded with indicator variables that are 1 if a subject is in the category and 0 otherwise. Which analysis you run depends on the question you are answering. But first, we need some data. Categorical data are often given in summary form: the raw data are rows of 0's and 1's (one indicator variable per column, one subject per row), but the summary is usually a table of counts like the following one from https://onlinecourses.science.psu.edu/stat504/book/export/html/102

Defendant’s Race	Victim’s Race	Death penalty (Yes)	Death penalty (No)
White	White	19	132
White	Black	0	9
Black	White	11	52
Black	Black	6	97

Open R or RStudio. For analyzing categorical data, R lets you enter either summary data or raw data. For a 2x2 table, you can give a function such as chisq.test() either a 2x2 matrix of counts as x, or two indicator variables as x and y. Since we have summary data here, we will do the former.
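If you would like R to do the collapsing for you, one option is to enter the full three-way table as a labelled array and sum over the variable you are ignoring with margin.table(). This is a minimal sketch; the object name dp and the dimension labels are my own choices, not anything R requires.

```r
# Full table: victim's race x death penalty x defendant's race.
# array() fills the first dimension fastest, so each group of four numbers is one
# defendant's-race stratum: yes/white victim, yes/black victim, no/white victim, no/black victim.
dp <- array(c(19, 0, 132, 9, 11, 6, 52, 97), dim = c(2, 2, 2),
            dimnames = list(Victim    = c("White", "Black"),
                            Death     = c("Yes", "No"),
                            Defendant = c("White", "Black")))
dp

# Collapse over the victim's race: defendant's race x death penalty
by_defendant <- margin.table(dp, c(3, 2))
by_defendant

# Collapse over the defendant's race: victim's race x death penalty
by_victim <- margin.table(dp, c(1, 2))
by_victim
```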
Convince yourself that the summary table for the effect of a defendant's race on whether or not the defendant receives the death penalty is:

Defendant’s Race	Death penalty (Yes)	Death penalty (No)
White	19	141
Black	17	149

Convince yourself that the summary table for the effect of a victim's race on whether or not the defendant receives the death penalty is:

Victim’s Race	Death penalty (Yes)	Death penalty (No)
White	30	184
Black	6	106

Run a chi-square test of whether the death penalty is handed out independently of the defendant's race by typing:
```
# array() fills column by column: rows = defendant white/black, columns = death penalty yes/no
x = array(c(19, 17, 141, 149), dim=c(2, 2))
x
chisq.test(x)
```
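By default, chisq.test() applies Yates' continuity correction to 2x2 tables, which is why its output says "with Yates' continuity correction". If you want to see the expected counts the test compares against, or the uncorrected Pearson statistic, save the result and inspect it. A short sketch (the names res, res_raw and by_hand are arbitrary):

```r
x = array(c(19, 17, 141, 149), dim = c(2, 2))

res = chisq.test(x)                      # Yates-corrected (the default for 2x2 tables)
res$expected                             # expected counts if race and outcome were independent

res_raw = chisq.test(x, correct = FALSE) # plain Pearson chi-square, no correction

# The uncorrected statistic is sum((observed - expected)^2 / expected)
by_hand = sum((x - res$expected)^2 / res$expected)
c(corrected = unname(res$statistic), uncorrected = unname(res_raw$statistic), by_hand = by_hand)
```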
Run a chi-square test of whether the death penalty is handed out independently of the victim's race by typing:
```
# rows = victim white/black, columns = death penalty yes/no
x = array(c(30, 6, 184, 106), dim=c(2, 2))
x
chisq.test(x)
```
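A chi-square test only says whether there is an association, not which way it goes. Row proportions show the direction: prop.table() with margin 1 divides each count by its row total. A sketch for the victim's-race table (the dimension labels are my own):

```r
x = array(c(30, 6, 184, 106), dim = c(2, 2),
          dimnames = list(Victim = c("White", "Black"), Death = c("Yes", "No")))

# Share of cases in each victim's-race group that ended in a death sentence
round(prop.table(x, 1), 3)
```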
Repeat the victim's-race test separately for each defendant's race. If the defendant is white, the table of victim's race versus death penalty is x = array(c(19, 0, 132, 9), dim=c(2, 2)). If the defendant is black, the table is x = array(c(11, 6, 52, 97), dim=c(2, 2)).
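In the white-defendant table only 9 cases had a black victim and none of them received the death penalty, so one expected count is about 1. chisq.test() warns that the chi-squared approximation may be incorrect whenever an expected count is below 5, so expect that warning here. The sketch below checks the smallest expected count in each stratum and, as an alternative the lab does not otherwise use, runs Fisher's exact test with fisher.test(), which does not rely on the large-sample approximation. The helper expected_counts() is a hypothetical name of my own.

```r
dp = array(c(19, 0, 132, 9, 11, 6, 52, 97), dim = c(2, 2, 2),
           dimnames = list(Victim    = c("White", "Black"),
                           Death     = c("Yes", "No"),
                           Defendant = c("White", "Black")))

# Expected count in each cell under independence: row total * column total / grand total
expected_counts = function(tab) outer(rowSums(tab), colSums(tab)) / sum(tab)

for (d in dimnames(dp)$Defendant) {
  tab = dp[, , d]   # victim's race x death penalty for this defendant's race
  cat("Defendant:", d, "| smallest expected count:",
      round(min(expected_counts(tab)), 2),
      "| Fisher exact p-value:", signif(fisher.test(tab)$p.value, 3), "\n")
}
```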

The same data could instead be given as indicator variables, with one row per case (1 = yes or white, 0 = no or black). The data would look like:

Defendant_white?	Victim_white?	Death_penalty?
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	1
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	1	0
1	0	0
1	0	0
1	0	0
1	0	0
1	0	0
1	0	0
1	0	0
1	0	0
1	0	0
0	1	1
0	1	1
0	1	1
0	1	1
0	1	1
0	1	1
0	1	1
0	1	1
0	1	1
0	1	1
0	1	1
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	1	0
0	0	1
0	0	1
0	0	1
0	0	1
0	0	1
0	0	1
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0
0	0	0

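Get this data into R, for example by saving it as a tab-delimited text file and reading it with read.table() using header = TRUE. I named the resulting data frame deathpenalty. Note that R changes the question marks in the column names to periods, so the columns are called Defendant_white., Victim_white. and Death_penalty. in R. Then use R's table command to check that the raw data give the same counts as the summary table:

```
table(deathpenalty)
```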
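If you would rather not copy all 326 rows into a file, here is a minimal sketch that builds an equivalent deathpenalty data frame straight from the counts with rep(). The column names are written the way R renders the header (question marks already turned into periods); the vector n is a hypothetical helper holding how many times each distinct row appears, in the order the rows are listed above.

```r
# How many times each distinct row appears in the raw data, in listed order:
# (1,1,1), (1,1,0), (1,0,0), (0,1,1), (0,1,0), (0,0,1), (0,0,0)
n = c(19, 132, 9, 11, 52, 6, 97)
deathpenalty = data.frame(
  Defendant_white. = rep(c(1, 1, 1, 0, 0, 0, 0), n),
  Victim_white.    = rep(c(1, 1, 0, 1, 1, 0, 0), n),
  Death_penalty.   = rep(c(1, 0, 0, 1, 0, 1, 0), n)
)
nrow(deathpenalty)   # should be 326
table(deathpenalty)
```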

Perform the previous chi-square tests on the raw data with commands such as:

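```
# Defendant's race vs. death penalty
chisq.test(deathpenalty[,1], deathpenalty[,3])
# Victim's race vs. death penalty
chisq.test(deathpenalty[,2], deathpenalty[,3])

# Victim's race vs. death penalty, white defendants only
DefendantWhite = subset(deathpenalty, Defendant_white. == 1)
chisq.test(DefendantWhite[,2], DefendantWhite[,3])

# Victim's race vs. death penalty, black defendants only
DefendantBlack = subset(deathpenalty, Defendant_white. == 0)
chisq.test(DefendantBlack[,2], DefendantBlack[,3])
```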
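When chisq.test() is given two vectors, it tabulates them with the 0/1 codes sorted as 0 then 1, so the tables it builds have their rows and columns in the opposite order from the summary arrays. Reordering rows or columns does not change the chi-square statistic, so the raw-data and summary versions should agree. A quick check, rebuilding deathpenalty from the counts so the sketch runs on its own (n, raw and summ are my own names):

```r
n = c(19, 132, 9, 11, 52, 6, 97)
deathpenalty = data.frame(
  Defendant_white. = rep(c(1, 1, 1, 0, 0, 0, 0), n),
  Victim_white.    = rep(c(1, 1, 0, 1, 1, 0, 0), n),
  Death_penalty.   = rep(c(1, 0, 0, 1, 0, 1, 0), n)
)

# Same test from the raw indicators and from the summary array
raw  = chisq.test(deathpenalty$Victim_white., deathpenalty$Death_penalty.)
summ = chisq.test(array(c(30, 6, 184, 106), dim = c(2, 2)))
raw$observed   # 0 comes first in both rows and columns
c(raw = unname(raw$statistic), summary = unname(summ$statistic))
```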

We will discuss the Mantel-Haenszel test in class; it combines the results from separate 2x2 tables (strata) into a single test. As with chisq.test(), the data can be raw or summarized, but either way the strata are given by the last dimension of the array, or by the last variable when you pass raw vectors. To perform a test with the defendant's race as the strata, our summary data needs to be in the form:
```
# dims: victim's race x death penalty x defendant's race (strata)
data = array(c(19, 0, 132, 9, 11, 6, 52, 97), dim=c(2, 2, 2))
```
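In this array the first dimension is the victim's race (white, black), the second is the death penalty (yes, no) and the third, the strata, is the defendant's race (white, black). If you instead want to ask about the defendant's race while adjusting for the victim's race, one way is to permute the dimensions with aperm() so the victim's race comes last. A sketch (the name by_victim is arbitrary):

```r
data = array(c(19, 0, 132, 9, 11, 6, 52, 97), dim = c(2, 2, 2))
data[, , 1]   # white defendants: victim's race x death penalty
data[, , 2]   # black defendants

# New order: defendant's race x death penalty x victim's race (strata)
by_victim = aperm(data, c(3, 2, 1))
by_victim[, , 1]   # white victims: rows = defendant white/black
by_victim[, , 2]   # black victims
mantelhaen.test(by_victim)
```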
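Type data to look at it. Either of the following commands runs the test, the first on the summary array and the second on the raw data, with the defendant's race (the last argument) as the strata:
```
mantelhaen.test(data)
mantelhaen.test(deathpenalty$Victim_white., deathpenalty$Death_penalty., deathpenalty$Defendant_white.)
```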
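Besides the test statistic, mantelhaen.test() reports a common odds ratio estimate for 2x2xK tables. It is the Mantel-Haenszel estimator: the sum over strata of n11*n22/n divided by the sum over strata of n12*n21/n, where n is the stratum total. Here it compares the odds of a death sentence when the victim is white versus black, pooled across the defendant's race. A sketch that reproduces the estimate by hand (the variable names are my own):

```r
data = array(c(19, 0, 132, 9, 11, 6, 52, 97), dim = c(2, 2, 2))
mh = mantelhaen.test(data)
mh

# Cell counts per stratum (one value per defendant's race)
n11 = data[1, 1, ]; n12 = data[1, 2, ]
n21 = data[2, 1, ]; n22 = data[2, 2, ]
n   = apply(data, 3, sum)   # stratum totals

or_mh = sum(n11 * n22 / n) / sum(n12 * n21 / n)
c(by_hand = or_mh, from_test = unname(mh$estimate))
```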
Logistic regression is yet another way to analyze a discrete (binary) response, and it is especially useful when you also have continuous predictors. In R, it is fit as a generalized linear model; with our raw data:
```
# Death penalty (1/0) modeled on defendant's race, victim's race and their interaction
model <- glm(Death_penalty. ~ Defendant_white. * Victim_white., family = binomial("logit"), data = deathpenalty)
summary(model)
```
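Two notes on this model. First, because no white defendant whose victim was black received the death penalty, the interaction model is trying to fit a probability of exactly 0 for that group, so expect very large estimates and standard errors for Defendant_white. and the interaction term. Second, glm() also accepts the summary counts directly if the response is cbind(number of yes, number of no). The sketch below fits the main-effects model (no interaction) both ways; the data frame agg and its columns yes and no are my own, hypothetical names.

```r
# Summary data: one row per defendant's race / victim's race combination
agg = data.frame(Defendant_white. = c(1, 1, 0, 0),
                 Victim_white.    = c(1, 0, 1, 0),
                 yes              = c(19, 0, 11, 6),
                 no               = c(132, 9, 52, 97))
fit_agg = glm(cbind(yes, no) ~ Defendant_white. + Victim_white.,
              family = binomial("logit"), data = agg)

# Same model on the raw 0/1 data, rebuilt from the counts
n = c(19, 132, 9, 11, 52, 6, 97)
deathpenalty = data.frame(
  Defendant_white. = rep(c(1, 1, 1, 0, 0, 0, 0), n),
  Victim_white.    = rep(c(1, 1, 0, 1, 1, 0, 0), n),
  Death_penalty.   = rep(c(1, 0, 0, 1, 0, 1, 0), n)
)
fit_raw = glm(Death_penalty. ~ Defendant_white. + Victim_white.,
              family = binomial("logit"), data = deathpenalty)

# The coefficient estimates should agree; exp() turns them into odds ratios
cbind(summary = coef(fit_agg), raw = coef(fit_raw))
exp(coef(fit_agg))
```

The residual deviances of the two fits differ because the data are grouped differently, but the coefficient estimates are the same.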
All in all, what do you think these data say about the death penalty and race?
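One way to approach this question is to compare death-sentence rates by the defendant's race, first ignoring the victim's race and then within each victim's-race group. A sketch using prop.table() on the full table (the object names and dimension labels are my own):

```r
dp = array(c(19, 0, 132, 9, 11, 6, 52, 97), dim = c(2, 2, 2),
           dimnames = list(Victim    = c("White", "Black"),
                           Death     = c("Yes", "No"),
                           Defendant = c("White", "Black")))

# Proportion sentenced to death by defendant's race, ignoring the victim's race
overall = prop.table(margin.table(dp, c(3, 2)), 1)[, "Yes"]
round(overall, 3)

# Same proportion within each victim's-race group (rows = victim, columns = defendant)
within = prop.table(dp, c(1, 3))[, "Yes", ]
round(within, 3)
```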
