Using the qdb database to manage pedigree data

Example data files needed for this exercise

Executables required: qdb, pedraw (optional).

Using a database with pedigree data

We will now demonstrate the use of a database to manage data used for linkage analysis and other aspects of genetic studies. The example program we will use do this is called qdb, but we would emphasise that any database program could be used. The qdb program has been chosen as an example because it is free and because it runs under both Windows and X. However if you have another program which you are familiar you should use the principles illustrated to set up your own database to handle pedigree data. A database program has very important advantages over simply keeping data in LINKAGE pedigree and locus datafiles: additional data can be stored; the data can be handled far more flexibly, and it may be possible to streamline certain aspects of the analyses and data management. We would strongly recommend that for any moderate-sized dataset that you do not simply store the data in LINKAGE files but give careful consideration to using some kind of more powerful data management system.

The example qdb database

An example database has been set up which uses qdb and which contains the same data as we have used in the previous exercises. To view this database, run qdb and select File, Load header, and choose the file called autdom.hdc. (You may get a message saying Could not open index file autdomid.idc, but if this happens just click on OK.) Choose File, Reindex to make sure that the database index is up to date. Then, to examine a record, choose Records, Modify records and select First. This is the record for the subject 1 of pedigree 4. You can see that a six digit ID has been constructed, 004001, and this uniquely identifies each subject in the database. As well as fields for sex, father, mother and the genetic loci there are a number of items which can be entered in addition to the information which would normally be contained in a LINKAGE pedigree file. One could keep track of the subject's name, and one might wish to store additional information such as an address and phone number. One might keep information regarding their age, the age at onset of disease, whether they are exposed to environmental factors affecting their liability to affection, and so on. The Group field allows pedigrees with loops to be handled automatically, by marking subjects at whom loops are to be broken. (This only works if the dolink program is to be used.) One can code information regarding how the subject is to be shaded in a pedigree diagram, as the pedraw program allows a number of different patterns. Click on the OK+Next button to view the next record. This should be for subject 004002. Click OK+Next again to display the record for 004003, and you will see that the Father and Mother fields contain the ID numbers for the first two subjects. Now click on the OK or Cancel button and you will return to the Modify records menu. Choose Select and you will get a menu consisting the ID numbers of all the subjects in the database. Choose 007101 and you will see that this subject has a Y in the Dead field. This subject is deceased, which explains why no genotypes are available. Click on OK+Next and you will see that 007102 is also deceased. Click Cancel, and click Cancel again to return to the qdb main frame.

Exporting the data to LINKAGE format

The first thing to demonstrate is that it is possible to get the data from qdb into LINKAGE file format. Exporting data from a database can be done most flexibly using reports, and a report file called tolink.rep has already been set up which takes the data for the disease locus and two markers and writes it in the format of a LINKAGE pedigree file. To see this work select Reports, All records and choose tolink.rep. For the output file, type in the name tolink.ped. This should create a LINKAGE pedigree file in pre-makeped format, which should contain just the same data as autdom2.ped. Examine tolink.ped with a text editor. It should appear as follows:

004    1    0    0  1   2  2 2  2 3   
004    2    0    0  2   1  1 2  1 3   
004    3    1    2  1   2  1 2  2 3   
004    4    0    0  2   1  2 3  2 2   
004    5    3    4  2   2  2 3  2 2   
004    6    3    4  1   2  2 2  2 2   
004    7    3    4  1   1  1 3  2 3   
004    8    3    4  1   2  2 3  2 2   
004    9    3    4  2   1  2 2  2 3   
007    1  101  102  1   2  1 4  1 2   
007    2    0    0  2   1  2 4  2 3   
007    3    1    2  2   2  1 2  1 2   
007    4    1    2  1   2  1 2  1 2   
007    5    1    2  1   1  2 4  2 3   
007    6    1    2  1   2  1 4  2 3   
007    7    1    2  2   1  1 2  2 2   
007  101    0    0  1   2  0 0  0 0   
007  102    0    0  2   1  0 0  0 0   
If you wish to understand how the tolink.rep report file works you can examine it with a text editor, although you may wish to skip this section if you will be working with another database with which you are already familiar.

The report file appears as follows:

:DETAIL
[id,1,3] fformat "%03.0f  "
[id,4,3] fformat "%3.0f  "

.if (father=" ")
"  0  "
.else
[father,4,3] fformat "%3.0f  "
.endif

.if (mother=" ")
"  0  "
.else
[mother,4,3] fformat "%3.0f  "
.endif

.if (sex="M")
"1 "
.else
"2 "
.endif

.if (phe0!=" ")
"  "
[phe0,1,1]
.else
"  0"
.endif

.if (phe1!=" ")
"  "
[phe1,1,1]
" "
[phe1,2,1]
" "
.else
"  0 0 "
.endif

("123456789abcdefghijk" strstr [phe2,1,1]) fformat "%2.0f "
("123456789abcdefghijk" strstr [phe2,2,1]) fformat "%-2.0f  "

/1

The report begins with a :DETAIL statement, indicating that the following commands will be repeated for each record. (Some report files might have additional sets of commands to be performed just once, at the beginning and end of the report.) The [id,1,3] statement indicates that three characters from the ID field are to be output, beginning with the first one. This is the pedigree ID. Next, the [id,4,3] statement indicates that a number formed from three more characters of the ID field are to be output, this time beginning with the fourth one. This forms the ID for the subject within the pedigree. (The fformat commands indicate how these numbers are to be formatted, and are described in the qdb documentation.) The next set of commands outputs a 0 if the Father field is blank or else the within-pedigree ID of the father, and the Mother field is treated similarly. Next, the appropriate sex code is output, 1 for male or 2 for female. The code for the disease locus is stored in the Phe0 field, and a couple of spaces are output followed by the first character of this field, or if it is blank a 0 is output to indicate that the affection status is unknown. Similarly, the genotype for the first marker is stored in the Phe1 field, and the first two characters of this field are output with a space in between them, unless the genotype is missing in which case 0 0 is output. The second marker, which is stored in the Phe2 field, is handled somewhat differently in order to demonstrate how the database can cope with markers having more than 10 alleles. (Although in fact the second marker actually only has 3 alelles.) Alleles numbered higher than 9 can be coded with letters, so that alleles 10, 11 and 12 would be coded with the letters a, b and c. This means that only one character is needed to code for each allele and simplifies some aspects of data management, especially when the database is used in conjunction with the dolink program. In order to correctly decode such letters back to numbers, the strstr function is used. This reports the position of one string within another, and in this example it reports the position of the first and second characters of the Phe2 field within the 123456789abcdefghijk string. This position is output as the correct allele number. If the second string does not occur within the first then strstr produces a result of 0, so if one of the Phe2 characters is either blank or 0 then a 0 will be output, because these do not occur in the test string. The final command in the report consists of a /1 statement and this means that a new line character is to be output, ensuring that each entry in tolink.ped will begin on a new line.

Exporting the data for pedraw

We can also extract data from the database in a format suitable for input to pedraw, the pedigree-drawing program. If you have pedraw available, select Reports, All records and choose topedraw.rep. For the output file, type in the name topedraw.dat. (The file must have extension .dat for pedraw to know what format it is in.) Then run pedraw, load topedraw.dat and display the descendants of 007101. You will see that the first name of each subject has been added to the diagram, and that subjects 007101 and 007102 are marked as deceased. (The shading of 007101 has also been changed to make this slightly clearer.) If you wish, you can examine topedraw.rep and topedraw.dat with a text editor.

Exporting the data for other programs

Using different report files it is possible to export the data in a variety of formats suitable for other programs. For example, a program called dolink can be used to set up linkage analyses automatically, as well as having a number of other features, and qdb can output data in a format which can be read in by dolink. Likewise, there is a program called pointer which performs segregation analysis and which requires data in a completely different format from the linkage programs, but report files can be devised which will export the clinical and pedigree data from the qdb database so that it can be used by the pointer program. Fuller explanations of this will be found in the qdb and dolink documentation.

Summary

This section introduces qdb, an example database program which allows pedigree data to be stored in a central database, whence it can be exported in a variety of different formats for different purposes.

Exercises in genetic linkage analysis

All material copyright (C) Dave Curtis 1996-9

dcurtis@hgmp.mrc.ac.uk