Lecture 7

The following data set was used in class. It is the shuttle O-ring incidents given in your text:

Below	Above
1	0
1	0
1	0
2	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	0
	1
	1
	2

In class, we wrote something like the following code in R: (However, there seems to be a problem. The t-test statistic in our data was 3.56. It is possible to get the same statistic if one zero, one 1, and two 2's are in the below column. It is possible to have a more extreme value if two 1's and two 2's are in the below column. Our result or something more extreme should be coming up about 12 times out of 1000. I don't get them to come up at all in simulations running up to 100,000. It is probably a problem with the random number generation and better code would fix it.)

out <- array(0, dim=c(10000))
pool = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,2,2)
for(i in 1:10000){
pool<-sample(pool, 23)
below = pool[1:4]
above = pool[5:23]
out[i] = t.test(above, below, var.equal=TRUE)$statistic
}
out[order(out)]

In other years, we wrote the code below for a global macro to perform a bootstrap version of a permutation test on these data. The following code creates a permutation, forms the pooled-variance t-statistic, and counts the number of times the t-statistic exceeded that of the original data. The code has to be modified if the test is not whether the mean or column 1 is greater than the mean of column 2 but a 2-sided test or a test in the other direction.

GMACRO
permutation

# this macro assumes you have 2 samples
# one stored in column 1 and one in column 2
# code after a pound sign are comments and are ignored

Name k1 "loop"
Name k2 "SampleSize1"
Name k3 "SampleSize2"
Name k4 "Total"
Name k5  "T"
Name k6  "RandomT"
Name k7 "Count"
Name k8 "pValue"
Name k9 "pooledSD"

Let Count = 0
Let SampleSize1 = N(c1)
Let SampleSize2 = N(c2)
Let Total = SampleSize1 + SampleSize2
Let pooledSD = sqrt(((SampleSize1 -1)*stdev(c1)**2 + (SampleSize2 -1)*stdev(c2)**2)/(SampleSize1 + SampleSize2 -2))
Let T =  (mean(c1) - mean(c2) )/(pooledSD*(sqrt(1/SampleSize1 + 1/SampleSize2)))

Stack c1 c2 c3;
subscripts c4.

Do k1 = 1:1000
Sample  Total c4 c5
Unstack c3 c6 c7;
Subscripts c5.
Let pooledSD = sqrt(((SampleSize1 -1)*stdev(c6)**2 + (SampleSize2 -1)*stdev(c7)**2)/(SampleSize1 + SampleSize2 -2))
Let RandomT =  (mean(c6) - mean(c7) )/(pooledSD*(sqrt(1/SampleSize1 + 1/SampleSize2)))
If RandomT >= T
Let  Count = Count + 1
ENDIF
ENDDO
Let pValue = Count/1000
Print pValue

ENDMACRO

A small change in the code gives the code for a bootstrap (sampling with replacement) procedure:

GMACRO
permutation

# this macro assumes you have 2 samples
# one stored in column 1 and one in column 2
# code after a pound sign are comments and are ignored

Name k1 "loop"
Name k2 "SampleSize1"
Name k3 "SampleSize2"
Name k4 "Total"
Name k5  "T"
Name k6  "RandomT"
Name k7 "Count"
Name k8 "pValue"
Name k9 "pooledSD"

Let Count = 0
Let SampleSize1 = N(c1)
Let SampleSize2 = N(c2)
Let Total = SampleSize1 + SampleSize2
Let pooledSD = sqrt(((SampleSize1 -1)*stdev(c1)**2 + (SampleSize2 -1)*stdev(c2)**2)/(SampleSize1 + SampleSize2 -2))
Let T =  (mean(c1) - mean(c2) )/(pooledSD*(sqrt(1/SampleSize1 + 1/SampleSize2)))

Stack c1 c2 c3;
subscripts c4.

Do k1 = 1:1000
Sample  Total c3 c5;
Replace.
Unstack c5 c6 c7;
Subscripts c4.
Let pooledSD = sqrt(((SampleSize1 -1)*stdev(c6)**2 + (SampleSize2 -1)*stdev(c7)**2)/(SampleSize1 + SampleSize2 -2))
Let RandomT =  (mean(c6) - mean(c7) )/(pooledSD*(sqrt(1/SampleSize1 + 1/SampleSize2)))
If RandomT >= T
Let  Count = Count + 1
ENDIF
ENDDO
Let pValue = Count/1000
Print pValue

ENDMACRO