Extended transmission disequilibrium test

Dave Curtis and Pak Sham, July 1995.

Sham PC and Curtis D. An extended transmission/disequilibrium test (TDT) for multi-allele marker loci. Ann Hum Genet, 1995.

The transmission disequilibrium test

The A allele is transmitted to affected offspring four times out of five.

Spielman RS, McGinnis RE, Ewens WJ. Transmission disequilibrium test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet, 1993, 52: 506-516.

The transmission disequilibrium test was proposed by Spielman as a robust test for association due to two loci being tightly linked. To test whether a marker allele exhibits transmission disequilibrium with a disease, parents of affected subjects are observed. If parents who are heterozygous for the allele transmit it to affected subjects on more than 50% of occasions this is evidence for both linkage and linkage disequilibrium between the marker and disease loci.

Features of TDT

  • Tests for linkage and linkage disequilibrium simultaneously
  • Not prone to false positives due to population stratification
  • Model-free method of analysis
  • When positive, implicates a very small region (if affected subjects are unrelated)
  • Can be applied to pedigree data

Possible drawbacks of TDT

  • Only positive if linkage disequilibrium is present as well as linkage
  • Only tests a small region
  • When subjects are related, it is difficult to know if there is evidence for linkage disequilibrium in addition to linkage
  • Can only deal with two marker alleles (or one allele against the rest)

The fact that TDT can deal conceptually with only one associated allele poses problems for modern multiallelic markers. If one tests each allele in turn against the rest then one must introduce a correction for multiple testing. Additionally, if two alleles are both associated with the disease then each may serve to mask the other and true association may be missed.

The extended transmission disequilibrium test

If a marker has multiple alleles then each may have a certain degree of association with the susceptibility allele at the disease locus. Information pertinent to transmission disequilibrium to affected subjects may be summarised in a table such as this:

                    Transmitted

                 1    2    3    4

             1        4    4    2

Not          2  13        15    5
transmitted  
             3   9   12         3
 
             4   6    3    6    

Here the entries in the cells of the table indicate the number of times a heterozygous parent transmitted the allele corresponding to the cell column to an affected offspring, while not transmitting the allele corresponding to the cell row. Thus the 4 in the second column of the first row indicates that 4 parents with genotype 12 transmitted allele 2 to an affected offspring, while the 13 in the first column of the second row indicates that 13 parents with genotype 12 transmitted allele 1 instead. If there were no transmission disequilibrium we would expect diagonally opposite elements to be equal, apart from sampling variation.

Consideration of individual alleles

If we wished to consider alleles individually then the information in the table could be summarised as follows:


                   1    2    3    4
Transmitted       28   19   25   10
Not transmitted   10   33   24   15

These totals are obtained simply by summing over the columns and rows of the original table. However, although this format may be useful for examining the behaviour of individual alleles, one cannot analyse the table as a whole because each parent is counted twice (once for the allele transmitted and once for the allele not transmitted).
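As a minimal sketch, the column and row sums can be computed as follows. The variable names are illustrative, and the empty diagonal cells of the original table are assumed to be zero.

```python
# Transmission counts from the original table.
# counts[i][j] = number of heterozygous parents who transmitted allele j+1
# and did not transmit allele i+1 (rows: not transmitted, columns: transmitted).
# Assumption: the empty diagonal cells are stored as 0.
counts = [
    [0, 4, 4, 2],
    [13, 0, 15, 5],
    [9, 12, 0, 3],
    [6, 3, 6, 0],
]

m = len(counts)

# Column sums: how often each allele was transmitted.
transmitted = [sum(counts[i][j] for i in range(m)) for j in range(m)]

# Row sums: how often each allele was not transmitted.
not_transmitted = [sum(counts[i][j] for j in range(m)) for i in range(m)]

print("Transmitted:    ", transmitted)
print("Not transmitted:", not_transmitted)
```

The two lists reproduce the "Transmitted" and "Not transmitted" rows of the summary table above.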

A model-fitting approach

An alternative way of displaying the information in the original table would be as follows:

1 2   17  13
1 3   13   9
1 4    8   6
2 3   27  12
2 4    8   3
3 4    9   6

Here, the first two columns indicate the genotypes of parents of affected subjects, the third column indicates the number of times that the genotype occurs and the fourth column indicates the number of times the first allele is transmitted to the affected subject. In terms of logistic regression analysis, the third column denotes the number of "trials" and the fourth column the number of "successes".
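The conversion from the transmitted/not-transmitted table to this genotype layout can be sketched as below. The function name genotype_table is hypothetical and the diagonal cells are again assumed to be zero.

```python
# counts[i][j] = parents who transmitted allele j+1 and did not transmit allele i+1.
# Assumption: the empty diagonal cells of the original table are stored as 0.
counts = [
    [0, 4, 4, 2],
    [13, 0, 15, 5],
    [9, 12, 0, 3],
    [6, 3, 6, 0],
]


def genotype_table(counts):
    """Return (allele_i, allele_j, trials, successes) for each genotype i < j.

    'successes' is the number of times the first allele (i) was transmitted.
    Genotypes that were never observed are left out.
    """
    m = len(counts)
    rows = []
    for i in range(m):
        for j in range(i + 1, m):
            first_transmitted = counts[j][i]   # allele i+1 passed, allele j+1 not
            second_transmitted = counts[i][j]  # allele j+1 passed, allele i+1 not
            trials = first_transmitted + second_transmitted
            if trials > 0:
                rows.append((i + 1, j + 1, trials, first_transmitted))
    return rows


rows = genotype_table(counts)
for row in rows:
    print(*row)
```

Each printed row corresponds to one line of the genotype listing above.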

Genotype-wise analysis

One test for transmission disequilibrium would be to allow a separate parameter to denote the probability of a "success" for each of the observed heterozygous parental genotypes, that is the probability of transmitting the first allele of the genotype, and to examine whether these probabilities differ from 50%. We will refer to the model with a separate transmission probability for each genotype as the saturated model, H2, and to the null hypothesis that all these probabilities are 50% as H0. Comparing the likelihood maximised under H2 with the likelihood under H0 yields a likelihood ratio test: we take twice the natural logarithm of the likelihood ratio, 2ln(LR), and treat it as a chi-squared statistic with degrees of freedom equal to the number of observed heterozygous parental genotypes. If all possible parental genotypes occur, then for a marker with m alleles the chi-squared statistic will have m(m-1)/2 degrees of freedom.
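The genotype-wise statistic for the example data can be sketched as follows. This is an illustration rather than the code of the ETDT program: the helper names are hypothetical, and the binomial coefficients are omitted from the log likelihood because they cancel in the likelihood ratio.

```python
import math

# (trials, successes) for each observed heterozygous parental genotype,
# in the order 12, 13, 14, 23, 24, 34.
data = [(17, 13), (13, 9), (8, 6), (27, 12), (8, 3), (9, 6)]


def xlogy(x, y):
    """Return x * ln(y), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_lik(data, probs):
    """Binomial log likelihood without the constant binomial coefficients."""
    return sum(
        xlogy(k, p) + xlogy(n - k, 1.0 - p)
        for (n, k), p in zip(data, probs)
    )


# Null model: every transmission probability is 50%.
L0 = log_lik(data, [0.5] * len(data))

# Saturated model: the maximum-likelihood probability for each genotype
# is simply the observed proportion successes / trials.
L2 = log_lik(data, [k / n for n, k in data])

chi2_genotype = 2.0 * (L2 - L0)
df_genotype = len(data)  # one parameter per observed genotype

print(f"L0 = {L0:.6f}, L2 = {L2:.6f}")
print(f"chi-squared = {chi2_genotype:.6f} on {df_genotype} df")
```

With four alleles and all six heterozygous genotypes observed, the statistic has 4(4-1)/2 = 6 degrees of freedom.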

Allele-wise analysis

If we consider each genotype individually we risk losing potentially valuable information. In the example shown, allele 1 is transmitted preferentially with respect to alleles 2, 3 and 4, but a genotype-wise analysis would not take any account of this. Instead, we may attempt to fit a more parsimonious model to the observed data, such that each allele has its own parameter which reflects the extent to which it is associated with the disease allele.

Consider a situation where a marker is in linkage disequilibrium with a disease, such that for each pair of marker alleles i and j the log of odds for allele i to be transmitted from a parent with genotype Gij is given by:

1. ln[P(Ti|Gij)/P(Tj|Gij)] = Bi-Bj

Here Bi and Bj are simply parameters which pertain to the alleles and which tell us something about the extent to which each allele is associated with the disease allele. These allow the application of a standard logistic regression analysis to the data shown above. Feeding the number of "trials" (parental genotypes) and "successes" (first allele transmitted) into a standard logistic regression package allows maximum-likelihood estimation of the allele-specific parameters B.

Using these parameters provides a more parsimonious test. We can term the hypothesis that each allele is associated (positively or negatively) with the disease to a certain extent as H1. To carry out an allele-wise test for transmission disequilibrium, we compare the likelihood maximised over these allele-specific parameters to the likelihood obtained when all parameters are equal (H0), and then 2ln(LR) will yield a chi-squared statistic with m-1 degrees of freedom.
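A minimal sketch of the allele-wise fit is given below. It does not use a logistic regression package. Instead it relies on the fact that the model in equation 1 has the same form as the Bradley-Terry paired-comparison model, and maximises the likelihood with the simple MM iteration for that model. This is a swapped-in technique chosen to keep the example short, not necessarily what the ETDT program does. The function names are hypothetical, and the sketch assumes every allele is transmitted at least once and not transmitted at least once.

```python
import math

# (first allele, second allele, trials, successes) for each heterozygous genotype
rows = [(1, 2, 17, 13), (1, 3, 13, 9), (1, 4, 8, 6),
        (2, 3, 27, 12), (2, 4, 8, 3), (3, 4, 9, 6)]
m = 4  # number of alleles


def fit_allelewise(rows, m, tol=1e-12, max_iter=100000):
    """Fit ln[P(Ti|Gij)/P(Tj|Gij)] = Bi - Bj by MM iteration on w_i = exp(Bi).

    Assumption: every allele is transmitted and not transmitted at least once.
    """
    wins = [0.0] * m  # number of times each allele was transmitted
    for i, j, n, k in rows:
        wins[i - 1] += k
        wins[j - 1] += n - k
    w = [1.0] * m
    for _ in range(max_iter):
        denom = [0.0] * m
        for i, j, n, _k in rows:
            share = n / (w[i - 1] + w[j - 1])
            denom[i - 1] += share
            denom[j - 1] += share
        new_w = [wins[a] / denom[a] for a in range(m)]
        scale = new_w[-1]  # fix the last allele at w = 1, i.e. B = 0
        new_w = [x / scale for x in new_w]
        converged = max(abs(x - y) for x, y in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return [math.log(x) for x in w]


def log_lik(rows, B):
    """Log likelihood of the allele-wise model (binomial coefficients omitted)."""
    total = 0.0
    for i, j, n, k in rows:
        p = 1.0 / (1.0 + math.exp(-(B[i - 1] - B[j - 1])))
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


B = fit_allelewise(rows, m)
L1 = log_lik(rows, B)          # allele-wise model, H1
L0 = log_lik(rows, [0.0] * m)  # all parameters equal, H0
chi2_allele = 2.0 * (L1 - L0)  # m - 1 degrees of freedom

print("B =", [round(b, 6) for b in B])
print(f"L1 = {L1:.6f}, chi-squared = {chi2_allele:.6f} on {m - 1} df")
```

Only differences between the B values matter, so one parameter has to be fixed. Here the last allele is set to 0.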

Goodness of fit

We can examine how well the allele-wise model fits the data by comparing it to the model in which each genotype can have a separate transmission probability. If we compare the likelihoods under H2 and H1 then 2ln(LR) forms a chi-squared statistic with (m-2)(m-1)/2 degrees of freedom (assuming that all possible parental genotypes occur).

Output from multi-allele analysis

Below is a summary of the results obtained by analysing the example data above.

Results of fitting parameters:

trials successes fitted

    17     13     12.974995
    13      9     9.010517
     8      6     6.001652
    27     12     11.123588
     8      3     3.858492
     9      6     5.136907

This section shows the predicted number of times the first allele would be transmitted (third column) compared to the observed number of times (second column). In this case, fitting one parameter for each allele produces a very close approximation to the observed data.

Fitted allele parameters with SE's:

           value      SE

Allele  1: 1.101505 0.503740
Allele  2: -0.070830 0.463126
Allele  3: 0.285143 0.459913

Correlation matrix of parameters:

1.000000 0.639812 0.629211 
0.639813 1.000000 0.740540 
0.629212 0.740540 1.000000 

These are the actual values of the fitted parameters, together with their correlations. The parameter for the last allele is arbitrarily fixed at 0.

Log likelihood under null hypothesis: L0 = -56.838066
Log likelihood under parsimonious (allele-wise) hypothesis: L1 = -51.785629
Log likelihood using saturated (genotype-wise) model: L2 = -51.367027

Chi-squared for allele-wise TDT = 2*(L1-L0) = 10.104874, 3 df, p = 0.017695
Chi-squared for genotype-wise TDT = 2*(L2-L0) = 10.942078, 6 df, p = 0.090183
Chi-squared for goodness-of-fit of allele-wise model = 2*(L2-L1)
 = 0.837204, 3 df, p=0.840549

Comparisons of the likelihoods under H0, H1 and H2 using chi-squared tests show significant evidence for transmission disequilibrium using the allele-wise analysis. However, the genotype-wise analysis incorporates additional degrees of freedom and does not yield a much higher likelihood, so is not statistically significant. The parsimonious allele-wise model is shown to fit the data well.
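The three tests can be reproduced from the reported log likelihoods with a short sketch such as the one below. The function chi2_sf is a hypothetical helper that evaluates the upper tail of the chi-squared distribution for integer degrees of freedom using the standard closed-form series. A statistics library would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #   + sqrt(2/pi) * exp(-x/2) * sum over j of x^(j - 1/2) / (1*3*...*(2j-1))
    term = math.sqrt(x)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.sqrt(2.0 / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


m = 4  # number of alleles
L0, L1, L2 = -56.838066, -51.785629, -51.367027  # values reported above

tests = {
    "allele-wise": (2 * (L1 - L0), m - 1),
    "genotype-wise": (2 * (L2 - L0), m * (m - 1) // 2),
    "goodness-of-fit": (2 * (L2 - L1), (m - 1) * (m - 2) // 2),
}
p_values = {name: chi2_sf(stat, df) for name, (stat, df) in tests.items()}

for name, (stat, df) in tests.items():
    print(f"{name}: chi-squared = {stat:.6f}, {df} df, p = {p_values[name]:.6f}")
```

The degrees of freedom follow the formulas given earlier: m-1 for the allele-wise test, m(m-1)/2 for the genotype-wise test and (m-1)(m-2)/2 for the goodness-of-fit test.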

Transmissions for individual alleles:

                  1      2      3      4 

Passed:          28     19     25     10 
Not passed:      10     33     24     15 
Chi-squared:  8.526  3.769  0.020  1.000 
p values:    0.0035 0.0522 0.8864 0.3173 
(these p values should be corrected for multiple testing)

Once the overall analysis has produced evidence for linkage disequilibrium between the loci, it may be helpful to examine which alleles appear most strongly to contribute to this. If one were simply to consider each allele separately without performing the overall analysis first, one would need to carry out a Bonferroni correction for a number of tests equal to the number of alleles. Here, the table helps focus attention on the fact that allele 1 appears preferentially transmitted over the other alleles.
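The per-allele figures can be sketched as follows, using the statistic (T - NT)^2 / (T + NT) on 1 degree of freedom, which reproduces the chi-squared values in the table above. The Bonferroni step simply multiplies each p value by the number of alleles. The variable names are illustrative.

```python
import math

transmitted = [28, 19, 25, 10]      # "Passed" row
not_transmitted = [10, 33, 24, 15]  # "Not passed" row

# One-allele-against-the-rest statistic, 1 df for each allele.
chi2 = [(t - n) ** 2 / (t + n) for t, n in zip(transmitted, not_transmitted)]

# Upper tail of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_raw = [math.erfc(math.sqrt(x / 2.0)) for x in chi2]

# Bonferroni correction for testing each allele in turn, capped at 1.
p_bonferroni = [min(1.0, p * len(p_raw)) for p in p_raw]

for allele, (x, p, pc) in enumerate(zip(chi2, p_raw, p_bonferroni), start=1):
    print(f"Allele {allele}: chi-squared = {x:.3f}, p = {p:.4f}, corrected p = {pc:.4f}")
```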

Dealing with missing parental genotypes

If one parent has not been genotyped, one may still use information from the other parent under certain conditions. The typed parent must be heterozygous, and obviously if the affected child has the same genotype then one cannot tell which allele has been transmitted. If the child is homozygous, then one can deduce which allele has been transmitted, but the pair must still be discarded from the analysis because otherwise one will introduce a bias in favour of commoner alleles. If a parent is missing, one should only incorporate the remaining parent-child pair if both are heterozygous and have different genotypes.
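The rule for a single typed parent can be sketched as a small function. The name transmitted_allele_single_parent is hypothetical, genotypes are written as pairs of allele labels, and the check for a pair with no shared allele is an added assumption to catch inconsistent data.

```python
def transmitted_allele_single_parent(parent, child):
    """Return the allele transmitted by the typed parent, or None to discard.

    The pair is used only if parent and child are both heterozygous
    and have different genotypes.
    """
    parent_alleles, child_alleles = set(parent), set(child)
    if len(parent_alleles) != 2:
        return None  # parent homozygous: uninformative
    if len(child_alleles) != 2:
        return None  # child homozygous: using it would bias towards common alleles
    if parent_alleles == child_alleles:
        return None  # same genotype: cannot tell which allele was transmitted
    shared = parent_alleles & child_alleles
    if len(shared) != 1:
        return None  # assumption: no shared allele means the data are inconsistent
    return next(iter(shared))


print(transmitted_allele_single_parent((1, 2), (1, 3)))  # usable pair
print(transmitted_allele_single_parent((1, 2), (2, 1)))  # same genotype
print(transmitted_allele_single_parent((1, 2), (1, 1)))  # homozygous child
```

For a usable pair, the transmitted allele is the one allele the parent and child share, and the parent's other allele is the one not transmitted.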

Curtis D and Sham PC. A note on the application of the transmission disequilibrium test when a parent is missing. Am J Hum Genet, 1995, 56: 811-812.

Comparison with Terwilliger's test

Joe Terwilliger has recently described another method which allows TDT analysis of multi-allelic markers (Terwilliger JD. A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic loci. Am J Hum Genet, 1995, 56: 777-787). His test makes an explicit assumption about the nature of the transmission disequilibrium, in particular that there is one associated allele and that the probability for each allele being the associated one is equal to its frequency in the general population. His test can be applied to a number of loci simultaneously to find a maximum-likelihood map position. The test described here makes no prior assumption about the nature of the association between the disease and marker alleles, and for example allows two or more alleles to be positively associated. On the other hand, the test described here can only be applied to one marker at a time.

Program availability

A simple DOS program to carry out all the analyses for the multiallele TDT is available from John Attwood's ftp site at ftp.gene.ucl.ac.uk in /pub/packages/dcurtis, with filename etdt.zip.

http://www.mds.qmw.ac.uk/statgen/dcurtis/lectures/etdtlect.html

Dave Curtis (dcurtis@hgmp.mrc.ac.uk)