Executables required: ped2tdt, calctdt, lrtdt, etdt.bat/etdt.sh
The standard transmission/disequilibrium test (TDT) examines a single allele of a marker to determine whether it is transmitted from heterozygous parents to affected offspring more often than would be expected by chance, i.e. on more than 50% of occasions. The analysis implemented in etdt (extended TDT) is one of a number of methods for extending this approach to multiallelic markers. Obviously, one method would just be to test each allele in turn, but this would require a Bonferroni correction and might not be optimal with certain patterns of association. The etdt package seeks to test specifically whether one or a group of marker alleles may be associated with the disease.
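As a rough illustration of the single-allele test described above, the standard TDT statistic can be sketched as a McNemar-type comparison of transmissions and non-transmissions from heterozygous parents. The function name and the counts below are hypothetical and are not taken from the etdt programs.

```python
import math

def tdt_chi_squared(transmitted, not_transmitted):
    """Single-allele TDT statistic (1 degree of freedom).

    transmitted / not_transmitted: how often heterozygous parents did or
    did not pass the allele of interest to an affected child.
    """
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    chi2 = (transmitted - not_transmitted) ** 2 / total
    # Upper tail of a chi-squared distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, for illustration only
chi2, p = tdt_chi_squared(30, 15)
print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
```

With a multiallelic marker, running this once per allele gives several correlated tests, which is why a correction such as Bonferroni would be needed.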
The etdt package implements likelihood ratio tests based on logistic regression analysis. One test simply treats each genotype (i.e. pair of marker alleles) separately to see whether for some genotypes there is a tendency for one or other allele to be more frequently transmitted to affected subjects. This is referred to as the genotype-wise test. However etdt also implements an allele-wise test which determines whether certain alleles are preferentially transmitted over others, combining the information across all genotypes in which these alleles occur. Likelihoods derived under the different hypotheses of no effect, a genotype-specific effect and an allele-specific effect are compared using likelihood ratio tests which yield chi-squared statistics. The goodness-of-fit of the allele-wise versus genotype-wise model is also tested with a likelihood ratio statistic.
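The genotype-wise comparison can be sketched directly from a table of transmission counts. The sketch below assumes that the genotype-wise model gives each heterozygous genotype its own transmission probability, so that the maximum likelihood estimate is simply the observed proportion. It illustrates the likelihood ratio idea and is not the code used by lrtdt; the data layout and names are hypothetical.

```python
import math

def genotype_wise_lrt(counts):
    """Likelihood ratio test of the genotype-wise model against the null.

    counts[(i, j)] = number of heterozygous i/j parents who transmitted
    allele i (and not j) to an affected child.
    Returns (chi-squared statistic, degrees of freedom).
    """
    genotypes = {tuple(sorted(pair)) for pair in counts}
    log_l0 = 0.0  # null: every transmission has probability 0.5
    log_l2 = 0.0  # genotype-wise: one probability per heterozygous genotype
    df = 0
    for a, b in genotypes:
        t_ab = counts.get((a, b), 0)
        t_ba = counts.get((b, a), 0)
        n = t_ab + t_ba
        if n == 0:
            continue  # genotype not observed, contributes nothing
        log_l0 += n * math.log(0.5)
        for t in (t_ab, t_ba):
            if t > 0:
                log_l2 += t * math.log(t / n)
        df += 1  # one parameter per observed heterozygous genotype
    return 2 * (log_l2 - log_l0), df

# Hypothetical transmission counts for a three-allele marker
example = {(1, 2): 18, (2, 1): 7, (1, 3): 12, (3, 1): 10, (2, 3): 6, (3, 2): 9}
chi2, df = genotype_wise_lrt(example)
print(f"genotype-wise chi-squared = {chi2:.2f} on {df} df")
```

The allele-wise model has no such closed form, since its parameters are shared across genotypes and have to be estimated iteratively.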
The etdt analysis is in fact performed by three different programs, ped2tdt, calctdt and lrtdt, which are run in sequence.
In practice, the example post-makeped pedigree file mar9tdt.ppd would have been produced from some central database holding information on all markers genotyped in the sample of trios. An accompanying locus data file, mar9tdt.par, provides information to ped2tdt regarding the number of alleles at each marker locus (the files could contain information regarding more than one marker). To carry out each stage of the etdt analysis, at the operating system prompt enter:
ped2tdt mar9tdt.ppd mar9tdt.par mar9tdt.tdt
calctdt mar9tdt.tdt mar9tdt.cou
lrtdt mar9tdt.cou mar9tdt.chi
To understand the process, you should use a text editor to examine the intermediate files, mar9tdt.tdt and mar9tdt.cou.
The whole process can be automated by using the etdt batch or script file. Under MSDOS enter:
etdt mar9tdt
Or under Unix enter:
etdt.sh mar9tdt
This will automatically perform each stage of the analysis.
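The three stages can also be chained from a small script. The following is a minimal sketch which assumes only the command lines shown above and that the executables are on the search path. It is not the content of the etdt batch or script file, and the function names are hypothetical.

```python
import subprocess

def etdt_commands(stem):
    """Build the three command lines for a file stem such as 'mar9tdt'."""
    return [
        ["ped2tdt", f"{stem}.ppd", f"{stem}.par", f"{stem}.tdt"],
        ["calctdt", f"{stem}.tdt", f"{stem}.cou"],
        ["lrtdt", f"{stem}.cou", f"{stem}.chi"],
    ]

def run_etdt(stem):
    """Run each stage in turn, stopping at the first one that fails."""
    for command in etdt_commands(stem):
        subprocess.run(command, check=True)

# Show the commands without executing them
for command in etdt_commands("mar9tdt"):
    print(" ".join(command))
```

Calling run_etdt("mar9tdt") would then execute the stages in order, with check=True raising an error if any stage exits with a non-zero status.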
The final output is written to the file mar9tdt.chi. Examine this file with a text editor.
The output from lrtdt consists of the maximum likelihood estimates of the association parameter for each allele, the observed data and the results which would be expected from these parameter estimates, and chi-squared statistics produced by comparing the log likelihoods under the null hypothesis and under the alternative hypotheses that transmission probabilities may deviate from 50% in an allele-specific or genotype-specific manner. For completeness, the chi-squared statistics for the consideration of each allele against the rest are also output.
If the allele-wise test is positive then one can compare L1 with L2 (the log likelihoods under the allele-wise and genotype-wise models respectively) to examine whether the parsimonious allele-wise model used to produce L1 provides a good fit to the data. 2(L2-L1) is a chi-squared statistic with degrees of freedom equal to the number of observed heterozygous genotypes plus 1 minus the number of alleles (the difference between the numbers of degrees of freedom for the first two tests).
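The arithmetic of the goodness-of-fit test can be sketched as follows, taking L1 and L2 to be log likelihoods. The function name and the numerical values are hypothetical and are not output from lrtdt.

```python
def goodness_of_fit(log_l1, log_l2, n_het_genotypes, n_alleles):
    """Goodness-of-fit of the allele-wise model against the genotype-wise model.

    log_l1: log likelihood under the allele-wise model (L1)
    log_l2: log likelihood under the genotype-wise model (L2)
    Returns (chi-squared statistic, degrees of freedom).
    """
    chi2 = 2 * (log_l2 - log_l1)
    # genotype-wise df (observed heterozygous genotypes)
    # minus allele-wise df (number of alleles - 1)
    df = n_het_genotypes + 1 - n_alleles
    return chi2, df

# Hypothetical values: 10 observed heterozygous genotypes, 5 alleles
chi2, df = goodness_of_fit(-120.5, -115.0, 10, 5)
print(f"goodness-of-fit chi-squared = {chi2:.1f} on {df} df")
```

The p value would then be read from a chi-squared distribution with df degrees of freedom, for example with scipy.stats.chi2.sf(chi2, df) if SciPy is available.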
If the goodness-of-fit test is statistically significant it indicates that the allele-wise model produces a poor fit. This might be because subjects are related, or because there is non-random mating between parents (so that alleles in the population from which the parents are drawn are not in Hardy-Weinberg equilibrium), or because there is a non-zero recombination fraction between the disease and marker loci.
The output from lrtdt also includes the transmissions for each allele individually, together with a chi-squared statistic and p value assuming one degree of freedom for those alleles observed in 10 or more heterozygous parental genotypes. These individual p values are subject to correction for multiple testing. In practice, one would seek to find evidence for a significant result using the allele-wise and/or genotype-wise analysis, and only if such evidence were obtained would one go on to examine transmissions for the different alleles individually in order to gain an understanding of the nature of the deviation from random transmission.
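The filtering and correction of the individual allele results might be sketched as below. This assumes that the number of heterozygous parental genotypes containing an allele equals its transmissions plus its non-transmissions, and it uses a Bonferroni correction over the alleles actually tested. Both are illustrative choices rather than a description of what lrtdt does, and the counts are hypothetical.

```python
import math

def per_allele_tests(transmissions, min_count=10):
    """Test each allele against the rest, then Bonferroni-correct.

    transmissions[allele] = (transmitted, not_transmitted) counts from
    heterozygous parents carrying that allele.
    Returns {allele: (chi2, p, corrected_p)} for alleles with enough data.
    """
    tested = {}
    for allele, (t, nt) in transmissions.items():
        n = t + nt
        if n < min_count:
            continue  # too few heterozygous parents for a reliable test
        chi2 = (t - nt) ** 2 / n
        p = math.erfc(math.sqrt(chi2 / 2))  # chi-squared upper tail, 1 df
        tested[allele] = (chi2, p)
    m = len(tested)  # number of tests actually performed
    return {a: (c, p, min(1.0, p * m)) for a, (c, p) in tested.items()}

# Hypothetical counts: allele 3 is seen in too few heterozygous parents
results = per_allele_tests({1: (30, 15), 2: (15, 30), 3: (4, 3)})
for allele, (chi2, p, p_corr) in sorted(results.items()):
    print(f"allele {allele}: chi2 = {chi2:.2f}, p = {p:.4f}, corrected p = {p_corr:.4f}")
```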
This section demonstrates the etdt package, which uses a logistic regression approach to implement the transmission/disequilibrium test for linkage and association with multiallelic markers.
Exercises in genetic linkage analysis
Copyright (C) Dave Curtis and Gili Koochaki 1996-2000
dcurtis@hgmp.mrc.ac.uk