Using the linkage utility programs to perform analyses

Example data files needed for this exercise

Executables required: makeped, quiklink, unknown, mlink, table.

Introduction

In practice one does not wish to deal directly with the raw input and output files of LINKAGE programs such as mlink, because they are inconvenient to manage in the form required. Additionally, one will generally wish to store information about more than one marker in the same file and then to extract information regarding subsets of markers for different analyses. The distribution package for the LINKAGE programs includes three utility programs designed to address this issue: lcp (Linkage Control Program), lsp (Linkage Setup Program) and lrp (Linkage Report Program). However, these programs are themselves quite cumbersome to use by today's standards, and they do not provide output in a form suitable for modern methods of analysis. A variety of other packages have been designed to facilitate setting up linkage analyses and managing their results. Here we use the quiklink program as a replacement for lcp and the table program as a replacement for lrp.

Data for a second marker

Suppose that the pedigrees are now typed using a second marker called MAR2 which is linked to the first marker, and that the genotypes obtained are as follows:

[Figure: pedigree diagrams with genotypes of second marker]

In order to carry out a linkage analysis with this new marker, the genotypes need to be entered into the pedigree file and the locus data file needs to be modified to include a description of the second marker locus. This could be done using a text editor. Modified versions of the files are provided called autdom2.ped and autdom2.par.

If you examine autdom2.ped with the text editor you will see that it appears as follows:

004    1   0   0  1  2    2 2   2 3
004    2   0   0  2  1    1 2   1 3
004    3   1   2  1  2    1 2   2 3
004    4   0   0  2  1    2 3   2 2
004    5   3   4  2  2    2 3   2 2
004    6   3   4  1  2    2 2   2 2
004    7   3   4  1  1    1 3   2 3
004    8   3   4  1  2    2 3   2 2
004    9   3   4  2  1    2 2   2 3
007    1 101 102  1  2    1 4   1 2
007    2   0   0  2  1    2 4   2 3
007    3   1   2  2  2    1 2   1 2
007    4   1   2  1  2    1 2   1 2
007    5   1   2  1  1    2 4   2 3
007    6   1   2  1  2    1 4   2 3
007    7   1   2  2  1    1 2   2 2
007  101   0   0  1  2    0 0   0 0
007  102   0   0  2  1    0 0   0 0

The genotypes for the second marker have been added as two new columns of alleles.
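The column layout can be made explicit with a short script. The following is a minimal Python sketch and is not part of the LINKAGE distribution. It assumes the usual pre-makeped convention: pedigree, individual, father, mother, sex (1 = male, 2 = female), affection status (2 = affected, 1 = unaffected, 0 = unknown), then one pair of alleles per marker. The function name parse_ped_line is hypothetical.

```python
def parse_ped_line(line):
    """Split one pre-makeped pedigree line into named fields.

    Assumed layout: pedigree, person, father, mother, sex, affection
    status, then two allele columns for every marker locus.
    """
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("unexpected number of columns: %r" % line)
    alleles = [int(x) for x in fields[6:]]
    return {
        "pedigree": fields[0],
        "person": fields[1],
        "father": fields[2],   # "0" means the parent is not in the file
        "mother": fields[3],
        "sex": int(fields[4]),
        "affection": int(fields[5]),
        # Pair up consecutive allele columns: one tuple per marker.
        "genotypes": [(alleles[i], alleles[i + 1])
                      for i in range(0, len(alleles), 2)],
    }


# Individual 3 of pedigree 004, copied from autdom2.ped above.
record = parse_ped_line("004    3   1   2  1  2    1 2   2 3")
print(record["genotypes"])
```

The second tuple in the printed list is the MAR2 genotype, which is the pair of columns newly added to the file.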

If you examine the modified locus data file, autdom2.par, you will see that it appears as follows:

 3   0   0   5  << no loci, risk locus, sexlinked(if 1)
 0  0.0  0.0  0  << mut locus, mut rate, haplotype freq(if 1)
 1 2 3 << order of loci

 1 2 # DIS1
 0.9995  0.0005  << gene freqs
 1 << number of liability classes
 0.0 1.0 1.0

 3 4 # MAR1
 0.14 0.32 0.21 0.33 << gene freqs

 3 3 # MAR2
 0.4 0.4 0.2

0 0
 0.0 0.1
1 0.05 0.4
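When a locus file is edited by hand it is easy to mistype a gene frequency. As a sanity check, the following Python sketch reads the listing above and confirms that each locus has the stated number of alleles and that its frequencies sum to 1. It is an illustration only: it assumes that every locus header line carries a "# NAME" comment and is immediately followed by the frequency line, which is true of this file but not guaranteed in general. The function name read_gene_freqs is hypothetical.

```python
import math

LOCUS_TEXT = """\
 3   0   0   5  << no loci, risk locus, sexlinked(if 1)
 0  0.0  0.0  0  << mut locus, mut rate, haplotype freq(if 1)
 1 2 3 << order of loci

 1 2 # DIS1
 0.9995  0.0005  << gene freqs
 1 << number of liability classes
 0.0 1.0 1.0

 3 4 # MAR1
 0.14 0.32 0.21 0.33 << gene freqs

 3 3 # MAR2
 0.4 0.4 0.2

0 0
 0.0 0.1
1 0.05 0.4
"""


def read_gene_freqs(text):
    """Return {locus name: [gene frequencies]} for a locus file listing."""
    # Drop the "<<" annotations, which are comments for the human reader.
    lines = [ln.split("<<")[0].strip() for ln in text.splitlines()]
    freqs = {}
    for i, ln in enumerate(lines):
        if "#" not in ln:
            continue  # not a locus header line (assumption: headers have '#')
        header, name = ln.split("#", 1)
        n_alleles = int(header.split()[1])  # header is "type n_alleles"
        values = [float(x) for x in lines[i + 1].split()]
        if len(values) != n_alleles:
            raise ValueError("allele count mismatch for " + name.strip())
        freqs[name.strip()] = values
    return freqs


for name, values in read_gene_freqs(LOCUS_TEXT).items():
    ok = math.isclose(sum(values), 1.0, abs_tol=1e-9)
    print(name, len(values), "alleles, frequencies sum to 1:", ok)
```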

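In order to accommodate information for the second marker, the following modifications have been made: the number of loci on the first line is now 3; the order of loci is given as 1 2 3; a description of the third locus, MAR2, has been added, specifying a numbered-alleles locus (type 3) with three alleles and their gene frequencies; and the line of recombination fractions now contains two values, one for each interval between adjacent loci.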

Running makeped on the modified pedigree file

Before carrying out a linkage analysis with the new data, pointers and probands must be added to the pedigree file using makeped, so at the operating system prompt enter:

makeped autdom2.ped autdom2.ppd n

Setting up an analysis with quiklink

If we wish to carry out a two-point analysis of the disease against the new marker, we cannot input the files directly to mlink, because the pedigree and locus files contain data for all three loci, not just the two we are interested in. To facilitate managing data and setting up analyses, the LINKAGE programs are accompanied by utility programs. One, called lsp, can extract data for a subset of loci from the pedigree and locus files, and can also modify the locus data file to make it appropriate for a number of different types of linkage analysis. However, lsp is not often used directly. Instead, another program called lcp was supplied with the LINKAGE programs to write an MSDOS batch file or Unix shell script which automatically carries out the copying and renaming of files we performed in the previous exercise, calls lsp to set up the appropriate analyses, and calls the analysis programs themselves. Rather than use lcp we will use a different utility program which carries out a similar function, called quiklink. This program is less flexible but easier to use than lcp. Like lcp, quiklink writes an MSDOS batch file or Unix shell script to automate setting up and performing analyses.

To use quiklink to set up a two-point analysis between the disease locus and the second marker, at the system prompt enter:

quiklink autdom2.ppd autdom2.par

Then in response to the prompt Enter switches or fileroot locus_numbers (blank to finish): type in the following line and press Enter:

d1m2 dis1 mar2

(Note: the 1's above are the number one, not the letter L.)

Then press Enter again to leave the quiklink program and return to the system prompt.

This instructs quiklink to set up a two-point mlink analysis between the first locus, which is the disease locus called DIS1, and the third locus in the file, which is the second marker, MAR2. The program produces pedigree and locus files containing data for just these two loci, called d1m2.ppd and d1m2.par, and an MSDOS command file called d1m2.bat or, under Unix, a shell script called d1m2.sh. You can run this analysis by entering at the DOS prompt:

d1m2.bat (or just d1m2)

or at the Unix prompt:

sh d1m2.sh

The stream file output will be written to d1m2.out and the log file output (from outfile.dat) will be written to d1m2.res.

You can examine d1m2.res with a text editor to see the log likelihoods produced by the analysis.
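To relate log likelihoods to lod scores: a lod score is the base-10 logarithm of the ratio of the likelihood at a given recombination fraction to the likelihood at a recombination fraction of 0.5 (no linkage). The Python sketch below shows the arithmetic only. It assumes natural-log likelihoods have been read from the output by hand and does not model the layout of the mlink output; if a value is reported as -2 ln(likelihood), divide it by -2 first. The function name and the numbers in the usage line are hypothetical.

```python
import math


def lod_from_ln_likelihoods(lnl_theta, lnl_unlinked):
    """Lod score from two natural-log likelihoods.

    lnl_theta    : ln likelihood at the recombination fraction of interest
    lnl_unlinked : ln likelihood at recombination fraction 0.5
    """
    # log10(L1 / L0) = (ln L1 - ln L0) / ln 10
    return (lnl_theta - lnl_unlinked) / math.log(10.0)


# Hypothetical values, for illustration only.
print(round(lod_from_ln_likelihoods(-40.0, -43.0), 3))
```

Working with the difference of log likelihoods, rather than the ratio of the likelihoods themselves, avoids numerical underflow when the likelihoods are very small.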

Running table on mlink output

There is a simple utility called table which can read in the output produced by mlink and generate a table of lod scores. It can also read linkmap output. It differs from the lrp program supplied with the LINKAGE programs in that it cannot handle output from lodscore or ilink, and it will only work if the output file contains the results from just a single analysis. Its advantages are that it is simple to use and can carry out other functions such as preparing graph files and calculating admixture (heterogeneity) lod scores.
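The admixture lod mentioned above can be illustrated with a short calculation. The following Python sketch uses the standard admixture formulation, in which a proportion alpha of pedigrees is assumed to be linked and each pedigree contributes log10(alpha * 10^lod + 1 - alpha). It is not table's own code, and the function name is hypothetical. The per-pedigree lods are those for pedigrees 4 and 7 taken from the table shown later in this exercise.

```python
import math


def admixture_lod(pedigree_lods, alpha):
    """Heterogeneity lod for one recombination fraction.

    pedigree_lods : lod score of each pedigree at that recombination fraction
    alpha         : assumed proportion of linked pedigrees, between 0 and 1
    """
    return sum(math.log10(alpha * 10.0 ** lod + (1.0 - alpha))
               for lod in pedigree_lods)


# Lods for pedigrees 4 and 7 at theta = 0.010 (from the table in this exercise).
lods = [1.483, -0.813]

# Scan alpha on a coarse grid; alpha = 1 reproduces the ordinary total lod.
for step in range(0, 11):
    alpha = step / 10.0
    print("alpha = %.1f  admixture lod = %.3f" % (alpha, admixture_lod(lods, alpha)))
```

With alpha = 1 every pedigree is treated as linked and the result is simply the sum of the pedigree lods; values of alpha below 1 reduce the penalty from a pedigree whose lod is negative.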

In order to run table on the results from this mlink analysis of the disease locus and second marker, at the operating system prompt enter:

table d1m2.res

This command will generate a file called d1m2.tab. Examine this file using a text editor. It should appear as follows:


theta     0.000   0.010   0.050   0.100   0.150   0.200   0.300   0.400
cM        0.000   1.000   5.017  10.137  15.476  21.182  34.657  54.931
     4    1.505   1.483   1.394   1.276   1.152   1.021   0.731   0.396
     7  -99.000  -0.813  -0.186   0.022   0.100   0.124   0.095   0.031
total   -97.495   0.670   1.208   1.298   1.253   1.145   0.825   0.427

The table consists of a set of recombination fractions and the individual and total lod scores at each position. In addition the recombination fractions are converted into Kosambi centimorgans, although this is more relevant for results of multipoint analysis. Compare the lods carefully to those obtained for the first marker. Where do the maximum lods occur, and how large are they? Examine the pedigrees to try to work out which meioses are recombinant and non-recombinant, and see if this seems consistent with the lod scores you have obtained.
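The two derived rows of the table can be reproduced by hand. The Python sketch below applies the Kosambi map function, d = 25 ln((1 + 2 theta) / (1 - 2 theta)) centimorgans, to each recombination fraction, and adds the two pedigree lods to give the total. It is an illustration, not the code used by table, and the function name is hypothetical. Small differences in the last decimal place of the totals are expected, because the printed pedigree lods have already been rounded.

```python
import math


def kosambi_cm(theta):
    """Kosambi map distance in centimorgans for a recombination fraction."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


# Rows copied from d1m2.tab as listed above.
thetas = [0.000, 0.010, 0.050, 0.100, 0.150, 0.200, 0.300, 0.400]
ped4 = [1.505, 1.483, 1.394, 1.276, 1.152, 1.021, 0.731, 0.396]
ped7 = [-99.000, -0.813, -0.186, 0.022, 0.100, 0.124, 0.095, 0.031]

for theta, lod4, lod7 in zip(thetas, ped4, ped7):
    # The total lod at each theta is the sum over pedigrees.
    print("theta %.3f  cM %7.3f  total %8.3f"
          % (theta, kosambi_cm(theta), lod4 + lod7))
```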

Summary

This exercise demonstrates how to use quiklink and table to set up linkage analyses and view the results obtained. The pedigree and locus data files contain information on the disease and two markers, but quiklink is used to set up an analysis for just the disease and second marker. The table program takes the raw output from mlink and produces a table of lod scores.

Exercises in genetic linkage analysis

All material copyright (C) Dave Curtis 1996-2006

david.curtis@qmul.ac.uk