Executables required: qdb.
This exercise goes through some of the steps involved in setting up the qdb database to deal with an additional locus and entering the genotypes for that locus. If you are using a database other than qdb you would have to go through similar steps, and this exercise provides an example of what would need to be done.
To handle a new locus one needs to modify the qdb data input screen and all the report files that need to output the new data. The data for the new locus can then be entered for each record separately or, preferably, from a special input file in which all the new genotypes are typed as a column of text and then read into the database all at once.
The example database is set up with field Phe0 containing the affection data and fields Phe1 and Phe2 containing the data for the two markers. View the data entry screen using qdb (with autdom.hdc loaded) by selecting Records, Modify records and choosing First. The data entry screen shows the labels DIS1, MAR1 and MAR2 next to the relevant fields. Note that these labels only identify which fields hold the data for which markers: they are just prompts for the user and have no effect on the way the data is held within the database. (An alternative approach would have been to rename the database fields themselves to reflect their contents, but it is somewhat simpler to leave the names unchanged and to continue to refer to the fields in report files as Phe0, Phe1 etc.)
In fact, the supplied header file has already been set up to provide a prompt for a third marker, MAR3, whose data would appear in the Phe3 field. To provide this prompt, the field description in autdom.hdc reads as follows:
NPhe3
PMAR3 Phe3
TC
L4
R11
C11
V4
The line PMAR3 Phe3 causes the relevant prompt to appear.
Currently, there is no prompt for the Phe4 field. If we wanted to deal with a fourth marker and provide a prompt for this in the Phe4 field, we would edit autdom.hdc with a text editor and change the field definition appropriately. The definition is as follows:
NPhe4
TC
L4
R12
C11
V4
To include a prompt for MAR4 we would change it to read:
NPhe4
PMAR4 Phe4
TC
L4
R12
C11
V4
There is no need to be concerned with the details of this process unless one wishes to use qdb for handling one's own data, in which case one would consult the qdb program documentation.
Suppose that the third marker has been genotyped, so that the new genotypes now need to be entered into the database. One way to do this would be to type the data into the Phe3 field of each record in turn. Suppose that the genotypes for subjects 1, 2, 3 and 4 of pedigree 4 are 23, 11, 12 and 13 respectively. Enter these genotypes into the database as follows. Select Records, Modify records and choose First. Type the genotype for subject 004001 into the Phe3 field, then click on OK+Next and repeat this procedure for the next three records. When you have added the data for the fourth record, 004004, click on OK to return to the Modify records menu, and then click on Cancel to return to the main qdb frame.
Although this method of adding or modifying data is feasible for a small number of records, it would become very tiresome if one had to do it for a couple of hundred subjects every time a new marker was genotyped. For this reason qdb provides a special facility for reading in data for a large number of records at once.
A convenient way to input data to a large number of records is to use a plain text input file, into which the data can be typed using an editor. A further advantage of this approach is that the file can be produced by the laboratory staff and then forwarded for input to the database, without the laboratory staff needing any direct access to the database.
Suppose that the genotypes for both pedigrees are as shown in this diagram:
We have provided an input file called mar3.sdf, which contains these data in the correct format. It can be viewed with a text editor and appears as follows:
autdomid.idc 1 6 Phe3 11 4
004001    23
004002    11
004003    12
004004    13
004005    12
004006    23
004007    11
004008    12
004009    13
007001    13
007002    22
007003    12
007004    12
007005    23
007006    12
007007    23
007101
007102
The first line means that the index file autdomid.idc is to be used and that the first 6 characters of each row provide the key for this index. (This key is just the subject ID.) The data are read starting at character position 11 and continuing for 4 characters, and are placed into field Phe3. The subsequent rows provide the ID keys and the data to be input.
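To make the column layout concrete, here is a minimal Python sketch that writes a file in this layout, as laboratory staff might do from a list of genotypes. It is not part of qdb. The function name write_sdf is hypothetical. It assumes that the header items are separated by single spaces, that unused columns are filled with spaces, and that an untyped subject is written as a bare ID, as 007101 and 007102 appear above.

```python
# Minimal sketch (not part of qdb): write a record-specific SDF file in the
# layout used by mar3.sdf. Assumptions, not taken from qdb documentation:
# header items are separated by single spaces, unused columns are filled with
# spaces, and an untyped subject is written as a bare ID with no data.


def write_sdf(index_file, field, genotypes,
              key_start=1, key_len=6, data_start=11, data_len=4):
    """Return SDF text for a list of (subject_id, genotype) pairs."""
    if key_start - 1 + key_len > data_start - 1:
        raise ValueError("key columns overlap the data columns")
    lines = [f"{index_file} {key_start} {key_len} {field} {data_start} {data_len}"]
    for subject_id, genotype in genotypes:
        if len(subject_id) != key_len:
            raise ValueError(f"ID {subject_id!r} is not {key_len} characters long")
        if len(genotype) > data_len:
            raise ValueError(f"genotype {genotype!r} is longer than {data_len} characters")
        # Header positions count from 1: the key starts after key_start - 1
        # characters and the data after data_start - 1 characters.
        row = (" " * (key_start - 1) + subject_id).ljust(data_start - 1) + genotype
        lines.append(row.rstrip())  # an untyped subject leaves just the ID
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(write_sdf("autdomid.idc", "Phe3",
                    [("004001", "23"), ("004002", "11"), ("007101", "")]), end="")
```

Run as a script, this prints the header line followed by `004001    23`, `004002    11` and a bare `007101`. Each genotype starts at column 11, which is what the header declares.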
Once the new marker genotypes are in a text file with the correct format, they can be read into the database. In qdb, select Utility, Record-specific SDF import (NB make sure you don't select Import SDF by mistake) and choose the file mar3.sdf. qdb will read each line of data, go to the record with the matching ID and put the data from that line into the Phe3 field. To check that this has worked correctly, choose Records, Modify records and then choose Select and view any of the records for which you have added new genotypes. You should see that the Phe3 field contains the new data and that the other fields are unchanged.
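The import step described above can be modelled in a few lines of Python. This is a sketch of the behaviour, not qdb's own code. The database is represented by a plain dict mapping subject IDs to dicts of field values. The function names parse_header and import_sdf are hypothetical. The sketch also assumes two behaviours that the exercise does not specify: a row with blank data leaves the field untouched, and IDs with no matching record are reported rather than added.

```python
# Sketch of a record-specific SDF import, modelled on the behaviour described
# above; this is not qdb's code. The database is stood in for by a dict that
# maps subject IDs to dicts of field values. Assumptions: a row with blank data
# leaves the field untouched, and IDs with no matching record are reported.


def parse_header(line):
    index_file, key_start, key_len, field, data_start, data_len = line.split()
    return index_file, int(key_start), int(key_len), field, int(data_start), int(data_len)


def import_sdf(text, records):
    """Update records in place from SDF text; return IDs that matched no record."""
    lines = text.splitlines()
    _index_file, key_start, key_len, field, data_start, data_len = parse_header(lines[0])
    unmatched = []
    for line in lines[1:]:
        if not line.strip():
            continue  # skip empty rows
        # Header positions count from 1; Python slices count from 0.
        key = line[key_start - 1:key_start - 1 + key_len]
        data = line[data_start - 1:data_start - 1 + data_len].strip()
        if key not in records:
            unmatched.append(key)
        elif data:
            records[key][field] = data  # only this one field is changed
    return unmatched


if __name__ == "__main__":
    # Toy records; 004099 is a made-up ID with no matching record.
    db = {"004001": {"Phe1": "12", "Phe3": ""},
          "004002": {"Phe1": "11", "Phe3": ""}}
    sdf = "autdomid.idc 1 6 Phe3 11 4\n004001    23\n004002    11\n004099    12\n"
    print(import_sdf(sdf, db))  # ['004099']
    print(db["004001"])         # {'Phe1': '12', 'Phe3': '23'}
```

Slicing each row by the positions given in the header, rather than splitting on whitespace, mirrors the fixed-column description of mar3.sdf. It also means a row with no genotype is handled naturally.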
Now that the data for the third marker has been incorporated into the database it is necessary to modify the accompanying report files in order to handle these data appropriately.
This section demonstrates how to modify topedraw.rep to handle the third marker. If you do not use pedraw you can skip this section.
Use a text editor to examine topedraw.rep. It appears as follows:
FDECLARE shading
:DETAIL
[ID]
/1
trunc(sex)
","
.IF (phe0="2")
shading:=1 ::
.ELSE
shading:=0 ::
.ENDIF
.IF (dead="y")
.if (shading=1)
shading:=103 ::
.else
shading:=shading+100 ::
.endif
.ENDIF
shading
/1
trunc(Father)
","
trunc(Mother)
/1
trunc(first)
/1
[phe1]
/1
[phe2]
/10
After the statements that output the subject's ID, sex, a shading code derived from the affection status (phe0) and the dead field, the parental IDs and the subject's first name come the statements that output the marker genotypes. The genotype from the Phe1 field is followed by /1, a statement that writes one newline character. The genotype from the Phe2 field is the last item for the record, so it is followed by /10, which writes ten newline characters. The resulting blank lines allow pedraw to distinguish the end of one record from the start of the next. The output this report produces for subject 004003 would appear as follows:
004003
M,1.000000
004001,004002
Harry
12
23
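The logic of topedraw.rep can be mirrored in a short Python sketch, which may help in predicting what a modified report will write. It makes four assumptions: a record is a dict of strings keyed by lower-case field names, trunc() strips trailing spaces, the numeric variable shading prints with six decimal places (as "M,1.000000" suggests), and any value of dead other than "y" means alive. The function names shading_code and pedraw_record are hypothetical. The record values are those shown in the sample output, plus the Phe3 genotype 12 entered above.

```python
# Sketch: the text topedraw.rep writes for one subject, mirrored in Python.
# Assumptions: a record is a dict of strings keyed by lower-case field names,
# trunc() strips trailing spaces, and the numeric variable shading is printed
# with six decimal places (as the sample output "M,1.000000" suggests).


def shading_code(rec):
    shading = 1 if rec["phe0"] == "2" else 0  # affected -> 1, otherwise 0
    if rec["dead"] == "y":
        # dead and affected -> 103; dead and unaffected -> 0 + 100 = 100
        shading = 103 if shading == 1 else shading + 100
    return shading


def pedraw_record(rec, markers=("phe1", "phe2")):
    """Return one subject's pedraw lines, followed by the /10 separator."""
    lines = [
        rec["id"],
        f"{rec['sex'].rstrip()},{shading_code(rec):f}",
        f"{rec['father'].rstrip()},{rec['mother'].rstrip()}",
        rec["first"].rstrip(),
    ]
    lines += [rec[m] for m in markers]  # one genotype per line, in the given order
    # Every line but the last ends with a single newline (/1); the last is
    # followed by the ten newlines written by /10.
    return "\n".join(lines) + "\n" * 10


if __name__ == "__main__":
    harry = {"id": "004003", "sex": "M", "phe0": "2", "dead": "n",
             "father": "004001", "mother": "004002", "first": "Harry",
             "phe1": "12", "phe2": "23", "phe3": "12"}
    print(pedraw_record(harry), end="")
```

Passing `markers=("phe1", "phe3", "phe2")` gives the marker order used by the modified report described next. In that order the third marker's genotype sits between the other two.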
Although we have typed the third marker after the other two, it actually lies between them on the genetic map. The pedigree diagram will be easier to understand if the genotypes for the third marker are listed between those for the other two markers. We can accomplish this by outputting the contents of the Phe3 field between those of the Phe1 and Phe2 fields in the pedraw datafile. To output the contents of the Phe3 field followed by a newline character, the following two lines need to be inserted into topedraw.rep between the lines dealing with Phe1 and Phe2:
[phe3]
/1
This has been done in a new report file called topedr3.rep. You can view this file with a text editor. It appears as follows:
FDECLARE shading
:DETAIL
[ID]
/1
trunc(sex)
","
.IF (phe0="2")
shading:=1 ::
.ELSE
shading:=0 ::
.ENDIF
.IF (dead="y")
.if (shading=1)
shading:=103 ::
.else
shading:=shading+100 ::
.endif
.ENDIF
shading
/1
trunc(Father)
","
trunc(Mother)
/1
trunc(first)
/1
[phe1]
/1
[phe3]
/1
[phe2]
/10
In qdb select Reports, All records and choose topedr3.rep. This time for the output file type in pedraw3.dat, to remind us that this file will contain the data for the third marker as well as the first two. You can then run pedraw, load pedraw3.dat and display the descendants of 004001 and 007101. You should see diagrams showing the genotypes for all three markers.
This section demonstrates how to modify tolink.rep to handle the third marker. Use a text editor to examine tolink.rep. It appears as follows:
:DETAIL
[id,1,3] fformat "%03.0f "
[id,4,3] fformat "%3.0f "
.if (father=" ")
" 0 "
.else
[father,4,3] fformat "%3.0f "
.endif
.if (mother=" ")
" 0 "
.else
[mother,4,3] fformat "%3.0f "
.endif
.if (sex="M")
"1 "
.else
"2 "
.endif
.if (phe0!=" ")
" "
[phe0,1,1]
.else
" 0"
.endif
.if (phe1!=" ")
" "
[phe1,1,1]
" "
[phe1,2,1]
" "
.else
" 0 0 "
.endif
("123456789abcdefghijk" strstr [phe2,1,1]) fformat "%2.0f "
("123456789abcdefghijk" strstr [phe2,2,1]) fformat "%-2.0f "
/1
We wish to include the data for the third marker as well as for the disease locus and the first two markers. The two lines which output the data for the second marker, held in the phe2 field, use a special coding system for highly polymorphic markers. The first 9 alleles are labelled 1-9 and alleles 10-20 are labelled a-k. The report looks up each allele character in the string "123456789abcdefghijk" and writes its position as the allele number. We will use the same system for the third marker, which involves adding the following two lines immediately above the line containing /1:
("123456789abcdefghijk" strstr [phe3,1,1]) fformat "%2.0f "
("123456789abcdefghijk" strstr [phe3,2,1]) fformat "%-2.0f "
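The lookup performed by these two strstr lines can be sketched in Python. The sketch assumes that qdb's strstr returns the 1-based position of the character in the string, and 0 when the character is absent (for example, a blank field). Python's str.find is 0-based and returns -1 when the character is absent, so adding 1 gives the same numbers. The names ALLELE_CODES, allele_number and coded_genotype are hypothetical.

```python
# Sketch of the allele coding used for phe2 and phe3 in tolink.rep.
# Assumption: qdb's strstr gives the 1-based position of the character in the
# string, or 0 if it is absent. Python's str.find is 0-based and gives -1 when
# the character is absent, so adding 1 reproduces those numbers.

ALLELE_CODES = "123456789abcdefghijk"  # '1'-'9' -> 1-9, 'a'-'k' -> 10-20


def allele_number(ch):
    """Map one allele character to its number; a blank gives 0."""
    return ALLELE_CODES.find(ch) + 1


def coded_genotype(genotype):
    """Format a two-character genotype as the two strstr/fformat lines do."""
    first, second = genotype.ljust(2)[:2]  # a short field reads as blanks
    # "%2.0f " right-justifies the first allele; "%-2.0f " left-justifies the second.
    return "%2.0f " % allele_number(first) + "%-2.0f " % allele_number(second)


if __name__ == "__main__":
    for g in ("12", "1a", "bk", ""):
        print(repr(coded_genotype(g)))
```

For example, the genotype "1a" is written as alleles 1 and 10. A blank genotype is written as 0 0, the usual code for an untyped subject.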
This has been done in a modified report file called tolink3.rep, which you can examine with a text editor. It appears as follows:
:DETAIL
[id,1,3] fformat "%03.0f "
[id,4,3] fformat "%3.0f "
.if (father=" ")
" 0 "
.else
[father,4,3] fformat "%3.0f "
.endif
.if (mother=" ")
" 0 "
.else
[mother,4,3] fformat "%3.0f "
.endif
.if (sex="M")
"1 "
.else
"2 "
.endif
.if (phe0!=" ")
" "
[phe0,1,1]
.else
" 0"
.endif
.if (phe1!=" ")
" "
[phe1,1,1]
" "
[phe1,2,1]
" "
.else
" 0 0 "
.endif
("123456789abcdefghijk" strstr [phe2,1,1]) fformat "%2.0f "
("123456789abcdefghijk" strstr [phe2,2,1]) fformat "%-2.0f "
("123456789abcdefghijk" strstr [phe3,1,1]) fformat "%2.0f "
("123456789abcdefghijk" strstr [phe3,2,1]) fformat "%-2.0f "
/1
In qdb select Reports, All records and choose tolink3.rep. This time for the output file enter autdom3.ped, to remind us that this file will contain the data for the third marker as well as the first two.
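As a rough check of what each line of autdom3.ped should contain, the following Python sketch follows tolink3.rep statement by statement for one subject. It is an illustration, not qdb's behaviour, and it makes several assumptions:

- A blank field is what the report's comparisons with " " detect.
- qdb reads a substring such as "004" as the number 4 before applying fformat.
- strstr gives a 1-based position, or 0 when the character is absent.
- The literal strings are copied exactly as printed in this exercise, so runs of spaces may differ from the real file.

The names linkage_line, is_blank and num are hypothetical.

```python
# Sketch: one line of tolink3.rep output for a single subject, in Python.
# Assumptions: fields are strings; a blank field is what the report's
# comparisons with " " detect; qdb reads a substring such as "004" as the
# number 4 before applying fformat; strstr gives a 1-based position or 0;
# and the literal strings are copied as printed in this exercise.

ALLELE_CODES = "123456789abcdefghijk"


def is_blank(value):
    return value.strip() == ""


def num(text, fmt):
    return fmt % float(text)  # e.g. "%03.0f " % 4.0 -> "004 "


def linkage_line(rec):
    out = [num(rec["id"][0:3], "%03.0f "),  # pedigree number
           num(rec["id"][3:6], "%3.0f ")]   # individual number
    for parent in ("father", "mother"):
        out.append(" 0 " if is_blank(rec[parent]) else num(rec[parent][3:6], "%3.0f "))
    out.append("1 " if rec["sex"] == "M" else "2 ")
    # Disease locus: the first character of phe0, or 0 if blank.
    out.append(" 0" if is_blank(rec["phe0"]) else " " + rec["phe0"][0])
    # First marker: the two allele characters are written as they stand.
    if is_blank(rec["phe1"]):
        out.append(" 0 0 ")
    else:
        a1, a2 = rec["phe1"].ljust(2)[:2]
        out.append(" " + a1 + " " + a2 + " ")
    # Second and third markers: 1-9 then a-k coding, 0 for a blank allele.
    for field in ("phe2", "phe3"):
        a1, a2 = rec[field].ljust(2)[:2]
        out.append("%2.0f " % (ALLELE_CODES.find(a1) + 1))
        out.append("%-2.0f " % (ALLELE_CODES.find(a2) + 1))
    return "".join(out)  # the report's final /1 ends the line


if __name__ == "__main__":
    harry = {"id": "004003", "father": "004001", "mother": "004002", "sex": "M",
             "phe0": "2", "phe1": "12", "phe2": "23", "phe3": "12"}
    print(linkage_line(harry).split())
```

For subject 004003 the whitespace-separated fields are pedigree 004, individual 3, father 1, mother 2, sex 1, affection 2, and then the allele pairs 1 2, 2 3 and 1 2 for the three markers.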
This section shows some of the steps involved in adding information on a new marker locus to the qdb database.
Exercises in genetic linkage analysis
All material copyright (C) Dave Curtis 1996-2004
david.curtis@qmul.ac.uk