Executables required: gh, ghostview (optional).
Because of theoretical difficulties concerning the application of the lod score method to complex diseases, and because there have been cases where the lod score method has appeared to produce erroneous results, a number of other methods of linkage analysis have been developed which are broadly described as nonparametric.
The classical lod score method of linkage analysis has been very successful in mapping Mendelian disease genes and DNA markers. However, in order to calculate a lod score, it is necessary for the mode of transmission of all the loci involved to be fully specified, namely the allele frequencies and penetrance values.
It is possible to analyse diseases with complex (non-Mendelian) inheritance if the values for these parameters are known. Thus one may specify particular risks for a genetically normal subject to be a phenocopy and for a genetically abnormal subject to be a non-penetrant carrier. However, when transmission is non-Mendelian it can be extremely difficult to estimate penetrance values, including phenocopy risks, and the allele frequency of the disease mutation. Indeed, different mutations at different loci may have different kinds of effect on susceptibility, some major and some minor, some dominant and some recessive.
If different modes of transmission are operative in different families, or if different loci interact in the same family, then no single transmission model may be appropriate. There is an argument that if the transmission model for a lod score analysis is specified incorrectly, the results produced from it will not be valid, and hence the lod score method should not be relied upon when analysing a disease with unknown mode of inheritance. This subject remains controversial, and some argue that the problems with the lod score method have been overstated. In any event, a variety of methods have been developed to test for linkage without the need to specify values for the parameters defining the transmission model, and these methods are termed nonparametric. Such tests may also be termed "model-free", implying that they may be applied without regard to the true transmission model.
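To make these transmission-model parameters concrete, the short Python sketch below computes the population risk of being affected implied by a disease allele frequency and three genotype penetrances, together with the fraction of affected subjects who are phenocopies. It assumes Hardy-Weinberg equilibrium, which is this sketch's assumption rather than something stated here. The illustrative values (disease allele frequency 0.0001, penetrances 0.01, 0.5 and 0.5 for genotypes 1/1, 1/2 and 2/2) are those of the ALZ locus in the locus data file used later in this practical; the function names are hypothetical.

```python
def genotype_freqs(q: float):
    """Hardy-Weinberg frequencies of genotypes 1/1, 1/2, 2/2 when allele 2 has frequency q."""
    p = 1.0 - q
    return (p * p, 2 * p * q, q * q)


def prevalence(q: float, penetrances) -> float:
    """Probability of being affected: sum over genotypes of frequency x penetrance."""
    return sum(g * f for g, f in zip(genotype_freqs(q), penetrances))


def phenocopy_fraction(q: float, penetrances) -> float:
    """Proportion of affected subjects who carry no disease allele (genotype 1/1)."""
    return genotype_freqs(q)[0] * penetrances[0] / prevalence(q, penetrances)
```

With those values, prevalence(0.0001, (0.01, 0.5, 0.5)) is about 0.0101 and about 99% of affected subjects are phenocopies, which shows how strongly such parameter choices shape the model that a lod score analysis must assume.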
Genehunter tests for an excess of identical-by-descent (IBD) allele sharing between affected subjects within a pedigree. It differs from affected sib pair analysis in that it can deal with more complex relationships between affected subjects than just sibling relationships. However, the algorithm it implements means that genehunter can be applied only to pedigrees of moderate size, rather than to the very large and complex pedigrees which the LINKAGE programs can deal with. Genehunter provides a nonparametric method for linkage analysis using the NPL (nonparametric linkage) statistic. As a useful bonus, it can also be used to calculate conventional multipoint lod scores under assumptions of homogeneity or heterogeneity.
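Genehunter's NPL statistic is computed from the inheritance information across whole pedigrees, which is beyond a short example. As a much simplified analogue, assuming one affected sib pair per family whose IBD sharing is known exactly, the amount of sharing can be standardized against its null distribution (0, 1 or 2 alleles shared IBD with probabilities 1/4, 1/2 and 1/4), so that excess sharing gives a positive score. This Python sketch illustrates the idea only; it is not genehunter's implementation, and the names are hypothetical.

```python
import math

# Null distribution of the number of alleles a full-sib pair shares IBD.
NULL_IBD_PROBS = {0: 0.25, 1: 0.5, 2: 0.25}


def null_mean_sd(probs):
    """Mean and standard deviation of a discrete distribution {value: probability}."""
    mean = sum(k * p for k, p in probs.items())
    var = sum((k - mean) ** 2 * p for k, p in probs.items())
    return mean, math.sqrt(var)


def sib_pair_score(ibd_shared: int) -> float:
    """Standardized sharing score for one affected sib pair with known IBD sharing."""
    mean, sd = null_mean_sd(NULL_IBD_PROBS)  # mean 1, variance 0.5
    return (ibd_shared - mean) / sd
```

Under this null distribution, a pair sharing 2 alleles scores about +1.41, a pair sharing 1 allele scores 0 and a pair sharing none scores about -1.41.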
Genehunter takes as input a pedigree file and a locus data file in standard LINKAGE format. These must contain information for a single affection locus followed by a number of marker loci. Files suitable for genehunter are thus similar to those used by mfmap, except that genehunter will not run if there are any blank lines in the locus data file. Using a text editor, edit the locus data file used for the mflink analysis, called alzall.par, and remove all the blank lines. The file should then appear as follows:
4 0 0 5 << no loci, risk locus, sexlinked(if 1)
0 0.0 0.0 0 << mut locus, mut rate, haplotype freq(if 1)
1 2 4 3 << order of loci
1 2 # ALZ
0.9999 0.0001 << gene freqs
1 << number of liability classes
0.01 0.5 0.5
3 4 # MAR1
0.14 0.32 0.21 0.33 << gene freqs
3 3 # MAR2
0.4 0.4 0.2
3 3 # MAR3
0.3 0.4 0.3
0 0
0.5 0.04 0.06
1 0.05 0.4
Save the file with the new name alzallgh.par.
The genehunter program is run from the operating system command line. At the system prompt enter:
gh
Then provide the following input line to genehunter:
load markers alzallgh.par
This command reads in the locus data (allele frequencies for each genetic marker, and frequency and penetrance information for the disease). The format of this file is that of a standard LINKAGE locus data file, with the provisos that there are no blank lines, that the first locus is an affection locus and that subsequent loci are markers. Now enter the following command to ensure that all output is saved to a file called total.out:
photo total.out
Next, provide genehunter with the following command, which instructs it to read in data from the corresponding pedigree file:
scan pedigrees alzall.ped
The main analysis command in genehunter is the scan command. For each pedigree found in the file indicated, the scan command will compute lod scores and NPL sharing statistics at many positions in the genetic map. The pedigree file should be in the standard LINKAGE pedigree file format (before makeped has been run on the file).
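The output from the scan command for each pedigree is written to total.out (because of the photo command above) and consists of up to 5 columns of information, giving the position in the map together with the lod score, the NPL score, the p value of the NPL score and the information content at that position.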
To turn on postscript output, enter:
ps on
Then enter:
total stat
The total command can only be used after a successful scan command on multiple pedigrees. It displays the same 5 columns of output that the scan command produced for each pedigree, but now the columns show the combined values of each statistic (sum of lod scores, combined NPL score, average information content, and p values of the raw NPL score total). In addition to this numerical display, if postscript output has been turned on, postscript graphs of the total NPL statistic (npl_plot.ps), total lod score (lod_plot.ps) and total information content (info_content.ps) are created.
Accept the default names for the postscript files.
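The combined values reported by the total command can be illustrated with a minimal Python sketch for a single map position: lod scores are summed across pedigrees and information content is averaged, as described above. Combining the per-pedigree NPL scores as their sum divided by the square root of the number of pedigrees (equal weights) is an assumption of this sketch, and genehunter's own weighting may differ. The function name is hypothetical.

```python
import math


def combine_pedigrees(lods, npl_scores, info):
    """Combine per-pedigree values at one map position.

    Returns (summed lod score, combined NPL score, average information content).
    The equal-weight NPL combination, sum(Z_i) / sqrt(n), is an assumption.
    """
    n = len(lods)
    if n == 0 or len(npl_scores) != n or len(info) != n:
        raise ValueError("need one value of each statistic per pedigree")
    total_lod = sum(lods)
    total_npl = sum(npl_scores) / math.sqrt(n)
    mean_info = sum(info) / n
    return total_lod, total_npl, mean_info
```

Dividing by the square root of the number of pedigrees keeps the combined NPL score on the same standardized scale as each pedigree's score, provided the pedigrees are independent.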
Now, in order to obtain the heterogeneity lod score as well as the total lod score, enter:
total stat het
You will see that the heterogeneity lod score with the alpha value which produces it is output alongside the total lod score. Again, accept the default names for the postscript files.
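For reference, the heterogeneity lod score is conventionally defined under the admixture model, in which a proportion alpha of families is linked: HLOD is the maximum over alpha of the sum over families of log10(alpha x 10^lod_i + 1 - alpha), where lod_i is the lod score of family i at that position. The Python sketch below maximizes this by a simple grid search over alpha. It illustrates the definition and is not necessarily how genehunter itself performs the maximization; the function name is hypothetical.

```python
import math


def heterogeneity_lod(lods, steps: int = 1000):
    """Return (HLOD, alpha) maximising sum_i log10(alpha * 10**lod_i + 1 - alpha)
    over a grid of alpha values in (0, 1]."""
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 (no linked families) scores 0
    for k in range(1, steps + 1):
        alpha = k / steps
        score = 0.0
        for lod in lods:
            if alpha == 1.0:
                score += lod  # log10(10**lod), computed directly
            else:
                score += math.log10(alpha * 10 ** lod + 1 - alpha)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha
```

At alpha = 1 the heterogeneity lod score reduces to the ordinary summed lod score, so it can never be smaller than the homogeneity lod score or zero, whichever is larger.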
Once you have obtained both the homogeneity and heterogeneity lod scores enter quit to leave the genehunter program.
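If you need to repeat this analysis, the same sequence of commands could be collected and passed to gh from a script. The Python sketch below does this, but it rests on an assumption not stated in this practical: that gh reads its commands from standard input when it is not run interactively. The ps on command is left out because it leads to prompts for postscript file names. If the assumption does not hold for your installation, enter the commands by hand as described above. The names COMMANDS, build_script and run_genehunter are hypothetical.

```python
import subprocess

# The genehunter commands used in this practical, in the order given above.
# 'ps on' is omitted so that genehunter does not prompt for postscript file names.
COMMANDS = [
    "load markers alzallgh.par",
    "photo total.out",
    "scan pedigrees alzall.ped",
    "total stat",
    "total stat het",
    "quit",
]


def build_script(commands=COMMANDS) -> str:
    """Join the commands into the text that would be typed at the gh prompt."""
    return "\n".join(commands) + "\n"


def run_genehunter(executable: str = "gh") -> subprocess.CompletedProcess:
    """ASSUMPTION: gh accepts its commands on standard input when not interactive."""
    return subprocess.run(
        [executable],
        input=build_script(),
        text=True,
        capture_output=True,
        check=False,
    )
```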
Use an editor to examine total.out. (The overall statistics across all pedigrees are towards the end of the file.) You will see that genehunter reports conventional multipoint lod scores calculated using the transmission model specified in alzallgh.par. These may differ slightly from those obtained with linkmap if genehunter has had to omit certain individuals from the analysis because of memory constraints. NPL scores are also reported along with their associated p values.
It is interesting to compare the p values obtained from the NPL and lod score analyses. Bear in mind that the asymptotic significance of a lod score of 4.4 is 0.000003 and that the ceiling for the true p value is 0.00004. The asymptotic p value for a heterogeneity lod score of 5.5 is 0.0000032.
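The figures quoted above can be reproduced with a short Python sketch. It assumes that 2 ln(10) x lod behaves asymptotically as a 50:50 mixture of zero and a chi-square with 1 df (a one-sided test), that the ceiling on the true p value is 10^(-lod), and that 2 ln(10) x HLOD behaves as a chi-square with 2 df, whose upper tail exp(-x/2) equals 10^(-HLOD). These distributional choices are assumptions made because they match the quoted numbers, not a description of how genehunter computes anything, and the function names are hypothetical.

```python
import math

LN10 = math.log(10)


def lod_asymptotic_p(lod: float) -> float:
    """One-sided asymptotic p value: P(Z > sqrt(2 * ln(10) * lod)) for standard normal Z."""
    if lod <= 0:
        return 1.0  # no evidence for linkage
    z = math.sqrt(2 * LN10 * lod)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of a standard normal


def lod_p_ceiling(lod: float) -> float:
    """Upper bound on the true p value of a lod score: 10**(-lod)."""
    return 10 ** (-lod)


def hlod_asymptotic_p(hlod: float) -> float:
    """Asymptotic p value assuming 2*ln(10)*hlod ~ chi-square with 2 df, i.e. 10**(-hlod)."""
    return math.exp(-LN10 * hlod)
```

For a lod score of 4.4 the first function gives about 3.4 x 10^-6 and the ceiling is about 4.0 x 10^-5, matching the rounded figures above; for a heterogeneity lod score of 5.5 the last function gives about 3.2 x 10^-6.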
You can see a graphical representation of the results by using the ghostview program to view the postscript files.
Copyright (C) Gili Koochaki and Dave Curtis 1999-2005
david.curtis@qmul.ac.uk