R Code for the Analysis of Inter-MLH1-Foci Data

Reference: E. A. Housworth and F. W. Stahl. 2009. Is There Variation in Crossover Interference Levels among Chromosomes from Human Males? Genetics.

With the data in a tab or space delimited file with no row or column headers, such as:

30 47   23
100  
45 55
12 38   27   13   10

the following code for use in the free statistical software package R will conduct the appropriate analysis. You will need to include your file path and the genetic length of the chromosome where indicated. The code assumes the genetic length in the interference pathway is sharply known with small error compared to the interference parameter that the code estimates. This will be approximately true with large samples. Comments occur after the pound signs. With the data above, the estimate of the shape parameter is approximately 1.2 with a standard error of approximately 0.57 when the genetic length in the disjunction pathway for this chromosome is known to be 44 cM (with the caveat that this sample size is too small for the asymptotics used in determining this standard error to be correct.) Since the shape is not significantly different from 1, there is no evidence of any interference in this data set.

data = read.table('C:\\Users\\User Name\\Desktop\\file.name', 
fill=TRUE, col.names=c(1:21))
# insert your own file name between the single quotes
# if any chromosome has more than 20 crossovers, 
# increase the number 21 appropriately 

X = .44 # enter the disjunction-pathway length in Morgans
# Can be estimated as half the average number of fluorescent 
# foci observed on the chromosome from cytological images


loglike <- function(nu){ 
# the log likelihood is a function of the shape parameter, nu, 
rate = 2*nu*X/100
loglike = 0
number_tetrads = length(data[,1])
for (i in 1:number_tetrads){ # loop through each row of data
  len = length(data[i,]) - sum(is.na(data[i,]))
  g <- function(y){
  (rate/nu)*pgamma(y, shape=nu, rate=rate, lower.tail=FALSE, 
log.p=FALSE)}
  if (len == 1) loglike = loglike + log(1 - integrate(g, 0, 
100)$value)
  # there were no crossovers
  if (len == 2) loglike = loglike + log(g(data[i,1])) + 
log(pgamma(data[i,2], 
shape=nu, rate=rate, lower.tail=FALSE, log.p=FALSE))
  # there was only one crossover
  if (len > 2) { # loop through all the events
  loglike = loglike + log(g(data[i,1])) + 
log(pgamma(data[i,len], shape=nu, 
rate=rate, lower.tail=FALSE, log.p=FALSE))
    for (j in 2:(len-1)){
    loglike = loglike + log(dgamma(data[i, j], shape=nu, 
rate=rate))
		}
		}
	}
loglike # return the log likelihood value
}
MLE_shape =  optimize(loglike, lower=.01, upper=20, 
maximum=TRUE)$maximum
MLE_shape

Fisher_Information = ( 2*loglike(MLE_shape) - 
loglike(MLE_shape+0.01) -
loglike(MLE_shape-0.01) )/(0.01*0.01)

Standard_Error = 1/sqrt(Fisher_Information)

Standard_Error