genetic maps - biostatistics and medical informatics · 2010-05-13 · genetic maps from my past,...

Genetic maps from my past, present and future

Karl W Broman
Biostatistics & Medical Informatics

University of Wisconsin – Madison

www.biostat.wisc.edu/~kbroman

Eucalypt genetic map

[Figure: eucalypt genetic linkage map, linkage groups 1–6, with marker names and map positions; details not recoverable from the extraction]

Byrne et al., Theor Appl Genet 91:869–875, 1995

3

What is a genetic map?

A sequence-based map measures distance between chromosome locations in base pairs.

A genetic map measures distance between chromosome locations via the recombination rate at meiosis.

Two markers are d centiMorgans (cM) apart if there is an average of d crossovers in the intervening interval in every 100 meiotic products.

4

Meiosis

5

Genetic distance

• Genetic distance between two markers (in cM) = average number of crossovers in the interval in 100 meiotic products.

• “Intensity” of the crossover point process

• Recombination rate varies by

– Organism

– Sex

– Chromosome

– Position on chromosome

6

Recombination fraction

We generally do not observe the locations of crossovers; rather, we observe the grandparental origin of DNA at a set of genetic markers.

Recombination across an interval indicates an odd number of crossovers.

Recombination fraction = Pr(recombination in interval) = Pr(odd no. XOs in interval)
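As a worked illustration of the identity above, suppose (purely as an assumption for this sketch, not something stated on the slide) that the number of crossovers in the interval on a meiotic product is Poisson with mean λ = d/100, where d is the interval length in cM. The probability of an odd count then has a closed form:

```latex
\begin{align*}
r &= \Pr(\text{odd no. XOs})
   = \sum_{k\ \text{odd}} e^{-\lambda}\,\frac{\lambda^{k}}{k!}
   = e^{-\lambda}\sinh(\lambda)
   = \tfrac{1}{2}\bigl(1 - e^{-2\lambda}\bigr),
   \qquad \lambda = d/100 .
\end{align*}
```

Under this assumption r ≈ d/100 for short intervals (a double crossover is then very unlikely), while r approaches 1/2 as the interval grows. The recombination fraction therefore never exceeds 1/2, even though genetic distance is unbounded.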

7

Map functions

• A map function relates the genetic length of an interval and the recombination fraction.

r = M(d)

• Map functions are related to crossover interference, but a map function is not sufficient to define the crossover process.

• Haldane map function: no crossover interference

• Kosambi: similar to the level of interference in humans

• Carter-Falconer: similar to the level of interference in mice
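The three map functions named above can be sketched in a few lines of Python, using their standard textbook forms. This is a minimal illustration; the function names are hypothetical and distances are taken in cM. Carter-Falconer is usually written in inverse form, d as a function of r, so the sketch inverts it numerically by bisection.

```python
import math

def haldane(d_cM: float) -> float:
    """Haldane map function (no interference): r = (1 - exp(-2d)) / 2, d in Morgans."""
    d = d_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def kosambi(d_cM: float) -> float:
    """Kosambi map function: r = tanh(2d) / 2, d in Morgans."""
    d = d_cM / 100.0
    return 0.5 * math.tanh(2.0 * d)

def carter_falconer_inverse(r: float) -> float:
    """Carter-Falconer in inverse form: distance in cM for 0 <= r < 0.5."""
    return 100.0 * 0.25 * (math.atanh(2.0 * r) + math.atan(2.0 * r))

def carter_falconer(d_cM: float, tol: float = 1e-10) -> float:
    """Invert the Carter-Falconer formula by bisection on r in [0, 0.5)."""
    lo, hi = 0.0, 0.5 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if carter_falconer_inverse(mid) < d_cM:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for d in (1, 10, 30, 50):
        print(d, haldane(d), kosambi(d), carter_falconer(d))
```

For a given distance, stronger interference means fewer double crossovers, so the recombination fraction is larger: Haldane gives the smallest value, then Kosambi, then Carter-Falconer.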

8

Ordering markers

ABC | abc  −→

ABC  abc
ABc  abC
Abc  aBC
AbC  aBc

Marker orders: A–B–C A–C–B B–A–C

With M markers, there are M!/2 possible orderings.

For M = 100, M!/2 ≈ 10^157
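The count of orderings can be checked directly. The sketch below (hypothetical helper names) treats an order and its reverse as the same map, which is where the division by 2 comes from, and enumerates the three orders for markers A, B and C.

```python
import math
from itertools import permutations

def n_orders(m: int) -> int:
    """Number of distinct marker orders when an order and its reverse are equivalent."""
    return 1 if m < 2 else math.factorial(m) // 2

def distinct_orders(markers):
    """Enumerate orders, keeping one representative of each order/reverse pair."""
    seen = set()
    for p in permutations(markers):
        seen.add(min(p, p[::-1]))
    return sorted(seen)

if __name__ == "__main__":
    print(["-".join(o) for o in distinct_orders("ABC")])
    print(len(str(n_orders(100))), "digits in 100!/2")
```

Exhaustive enumeration is only feasible for a handful of markers, which is why map construction in practice relies on heuristic searches over orders.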

13

Eucalypt pedigree

[Pedigree diagram with founders labelled A, B, C and D]

118 progeny

14

Inferring phase

(A1 A2 | B1 B2) or (A1 B2 | B1 A2)

(C1 C2 | D1 D2) or (C1 D2 | D1 C2)

[Table: two-locus genotype classes AC, AD, BC, BD at locus 1 against AC, AD, BC, BD at locus 2; cell contents not recoverable from the extraction]

15

Eucalypt genetic map

[Figure: eucalypt genetic linkage map, linkage groups 1–6, with marker names and map positions; details not recoverable from the extraction]

Byrne et al., Theor Appl Genet 91:869–875, 1995

16

CEPH pedigrees

[Pedigree diagrams for eight CEPH families: 1331, 1332, 1347, 1362, 1413, 1416, 884 and 102]

17

Marshfield maps: Tasks

• Assemble data

• Understand marker names: AFM, UT, CHLC (GATA etc.), Mfd, D*S*

• Identify cryptic duplicates

• Order markers and identify genotyping errors (removed 764 / 969,425 genotypes)

18

CRIMAP chrompic

1332-03 ma -11-i--11--111-i111-11-1111i--1111i-1111-i--11---1--11-1111-1-1i1---1...
1332-03 pa 0000----0000000o00o00-000-000-0000o00-000-00000-00001---000-00-o000-0...

1332-04 ma -11-i--11--111-1111-11-i111i--i1111-1111-i--11---1--11-1111-1-11i--11...
1332-04 pa 1111----1111111111i11-1i1-111-i111i11-111-11111-11111---111-11-1i1111...

1332-05 ma -11-i--11--111-i111-11-1111o--0000o-0000-o--00---0--00-0000-0-0o0--00...
1332-05 pa 0000----0000000o00o00-000-111-1111i11-111-1111--11111---111-11-i11111...

1332-06 ma -00-o--00--000-o000-00-0000o--0000o-0000-o--00---0--00-0000-1-11i--11...
1332-06 pa 1111----1111111i11i11-111-111-1111i11-111-11111-11111---111-11-1i1111...

1332-07 ma -00-o--00--000-o000-00-0000o--0000o-0000-o--00---0--00-0000-0-0o0--00...
1332-07 pa 1111----1111111i11i11-111-111-1111i11-111-1111--11111---111-11-i11111...

1332-08 ma -10-o--00--000-00-0-00-0000o--o0000-0000-o--00---0--11-1111-1-1i1--11...
1332-08 pa 0000----000000000-o00-010-000-o000o00-000-00000-00000---000-00-o00000...

1332-10 ma -11-i--1---111-i111-11-1111i--1111i-1111-i--11---1--11-1111-1-1i1--11...
1332-10 pa 1000-----000000o00o00-000-000-0000o00-000-00000-00000---000-00-o00000...

1332-11 ma -11-o--00--000-o000-00-0000o--0000o-0000-o--00---0--00-0000-0-0o0--00...
1332-11 pa 1111----1111111i11i11-111-111-1111i11-111-11111-11111---111-11-i11111...

1332-12 ma -00-i--11--111-i111-11---11i--1111i-1111-i--11---1--11-1111-1-1i1---1...
1332-12 pa 0000----0000000o00o00-0---000-0000o00-000-00000-00000---000-00-o000-0...

1332-17 ma -11-i--1---11--i111-1--1111i--1111i-1111-i--11---1--11-1100-0-00o--00...
1332-17 pa 0000-----0000--o00o00-000-000-0000o-0-000-0000--00000---000-00-0o0000...
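A rough way to read output like this programmatically is to count switches of grandparental origin along each chromosome. The sketch below rests on an assumed reading of the symbols, which should be checked against the CRI-MAP documentation: `1`/`i` mark one grandparental origin, `0`/`o` the other, and `-` is uninformative. The function name is hypothetical.

```python
def count_switches(chrompic: str) -> int:
    """Count apparent crossovers (changes of grandparental origin) in one chrompic string.

    Assumed encoding (not confirmed by the slides): '1'/'i' -> origin 1,
    '0'/'o' -> origin 0, and any other character (e.g. '-') is skipped.
    """
    origin = {"1": 1, "i": 1, "0": 0, "o": 0}
    calls = [origin[c] for c in chrompic if c in origin]
    return sum(a != b for a, b in zip(calls, calls[1:]))

if __name__ == "__main__":
    # Fragments taken from the display above.
    print(count_switches("-00-0000-1-11i--11"))  # single switch from 0 to 1
    print(count_switches("00-010-000"))          # isolated 1 flanked by 0s
```

An isolated call flanked by the opposite origin, as in the second fragment, counts as two switches in a very short interval. Such apparent tight double crossovers are typical candidates for genotyping errors.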

19

Top of chr 22

    Marker        D-number   sex-ave (cM)   female (cM)   male (cM)
1   ATA2G02       Unknown        0.00           0.00         0.00
                                 1.79           0.00         2.60
2   GATA198B05    Unknown        1.79           0.00         2.60
                                 2.27           3.32         0.00
3   AFM217xf4     D22S420        4.06           3.32         2.60
                                 4.26           4.51         5.42
4   AFM288we5     D22S427        8.32           7.83         8.02
                                 5.25           7.52         3.00
5   265yf5        D22S425       13.57          15.35        11.02
                                 0.03           0.00         0.65
6   GGAA10F06     D22S686       13.60          15.35        11.67
                                 0.84           0.00         0.82
7   AFMa037zd1    D22S539       14.44          15.35        12.49
                                 0.00           0.00         0.00
8   AFM292va9     D22S446       14.44          15.35        12.49
                                 3.27           5.91         0.00
9   Mfd51         D22S257       17.71          21.26        12.49
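In this table the unnumbered rows give the distance between adjacent markers, and each marker's position is the running total of those distances. A short sketch (variable names are hypothetical; the numbers are copied from the table) reproduces the position columns from the interval rows:

```python
from itertools import accumulate

# Inter-marker distances (cM) copied from the unnumbered rows of the table.
intervals = {
    "sex_ave": [1.79, 2.27, 4.26, 5.25, 0.03, 0.84, 0.00, 3.27],
    "female":  [0.00, 3.32, 4.51, 7.52, 0.00, 0.00, 0.00, 5.91],
    "male":    [2.60, 0.00, 5.42, 3.00, 0.65, 0.82, 0.00, 0.00],
}

def positions(gaps):
    """Cumulative map positions (cM), starting at 0 for the first marker."""
    return [round(x, 2) for x in accumulate(gaps, initial=0.0)]

if __name__ == "__main__":
    for name, gaps in intervals.items():
        print(name, positions(gaps))
```

The zero-length intervals are markers that were not separated by any observed recombination in that sex.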

21

Marker search

http://research.marshfieldclinic.org/genetics/MarkerSearch/searchMarkers.asp

22

10th worst graph

Broman et al., Am J Hum Genet 63:861–869, 1998

23

Total no. crossovers

Broman et al., Am J Hum Genet 63:861–869, 1998

24

Crossover locations

Broman and Weber, Am J Hum Genet 66:1911–1926, 2000

25

Family 884, chr 6

[Figure: maternal and paternal chromosomes for individuals 3–14]

26

Autozygosity

Broman and Weber, Am J Hum Genet 65:1493–1500, 1999

27

Crossover interference

Broman and Weber, Am J Hum Genet 66:1911–1926, 2000

28

Maternal chr 8

29

Apparent triple XOs

Broman et al., In: Science and Statistics: A Festschrift for Terry Speed, 2003

30

Chr 8p inversion

Broman et al., In: Science and Statistics: A Festschrift for Terry Speed, 2003

31

Comparison to sequence

Matise et al., Am J Hum Genet 70:1398–1410, 2002

32

Comparison to sequence (revisited)

[Figure: genetic map position (cM) plotted against sequence position (Mbp); labelled markers: D22S534 (UT7213), D22S529 (UT1963), D22S531 (UT5900), D22S1175 (AFM331wc9)]

Thanks to UCSC and Ensembl!

33

Dog genetic map

Neff et al., Genetics 151:803–820, 1999

34

Dog genetic map

Neff et al., Genetics 151:803–820, 1999

35

The first dog map

Mellersh et al., Genomics 46:326–336, 1997

36

The dog X

Mellersh et al., Genomics 46:326–336, 1997

37

The dog X

Mellersh et al., Genomics 46:326–336, 1997
Neff et al., Genetics 151:803–820, 1999

38

C. savignyi (sea squirt)

Aim: Build linkage map to improve long-range ordering of draft genome sequence (which is currently represented as a few hundred “reftigs”)

39

C. savignyi pedigree

●●

48 progeny

Markers: PCR amplicon (primers in exons, spanning an intron), digested with a restriction enzyme

−→ 2, 3 or 4 banding patterns

Tricky bits: Which marker “phenotype” corresponds to which genotype?

Using information on locations of markers within “reftigs”

40

Example of two markers

                 99481-HaeIII
114467-MboI    AC    AD    BC    BD
AC              8     3     1     0
AD              0    10     0     1
BC              2     0    11     1
BD              0     1     0     9
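One way to read this table is to count, for each parent separately, the progeny whose inherited allele differs between the two markers. The sketch below makes several assumptions that the slide does not state: the last column is taken as BD; the first letter of each genotype (A or B) comes from one parent and the second (C or D) from the other; and like-labelled alleles are in coupling, so the diagonal holds the non-recombinants. All names are hypothetical.

```python
GENOTYPES = ["AC", "AD", "BC", "BD"]

# Rows: genotype at 114467-MboI; columns: genotype at 99481-HaeIII (counts from the slide).
counts = [
    [8, 3, 1, 0],
    [0, 10, 0, 1],
    [2, 0, 11, 1],
    [0, 1, 0, 9],
]

def recombination_fractions(table):
    """Return (fraction for the A/B parent, fraction for the C/D parent, total n).

    Assumes the phase in which matching labels are non-recombinant.
    """
    n = sum(map(sum, table))
    rec_ab = rec_cd = 0
    for g1, row in zip(GENOTYPES, table):
        for g2, k in zip(GENOTYPES, row):
            rec_ab += k * (g1[0] != g2[0])  # A/B allele differs between the markers
            rec_cd += k * (g1[1] != g2[1])  # C/D allele differs between the markers
    return rec_ab / n, rec_cd / n, n

if __name__ == "__main__":
    print(recombination_fractions(counts))
```

With the opposite phase in either parent the same counts would imply a recombination fraction near one, which is how a table like this points to the more plausible phase.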

41

C. savignyi map

[...] place 104 of the largest reftigs, cumulatively representing 83% (146.8 Mb) of the reference sequence, onto chromosomes. In addition, we could orient 36 of the largest reftigs that contained multiple genetically resolved markers. Of the 56 reftigs >1 Mb, 54 have now been mapped. In only nine instances were we unable to confidently order small subgroups of reftigs (e.g., Fig. 1, chromosome 10 reftigs 304 and 193).

Our approach appears to have resulted in complete coverage of the genome. In fact, in an effort to extend the map and place additional reftigs, 11 of the 182 markers were actually developed after a preliminary map had been built. All of the additional markers were found to be linked to the previously identified linkage groups. A complete list of the reference sequence reftigs, and their inferred order and orientation, is provided in Supplemental Table 1.

Tight integration of the genetic map and sequence data enabled us to estimate additional genome parameters. The newly assembled 650-cM map accounts for 83% of the sequenced genome; therefore, given certain simplifying assumptions, we estimate the full genetic map length to be at most 783 cM. From 60 pairs of loci located on the same reftig, the average rate of recombination is 200 kb/cM, and recombination events appear to be roughly Poisson distributed. Finally, the regular spacing achieved by targeting markers to fragments puts any point on the genome within an average of 4.4 cM (880 kb) from the nearest established marker.

Disagreement between the genetic map and sequence assembly

For a small number of marker loci, the location on the genetic map and their presumed physical location were incongruent. Disagreements in ordering are easily identified on the integrated genetic map (Fig. 1) by intersecting transverse lines connecting corresponding loci on the linkage groups and chromosomes. Such differences could be due to problems with markers, such as amplification of an unintended target locus, a misassembly in the reference sequence, or an actual polymorphism between the mapping population and the sequenced individual. In the cases where ordering disagreed, we confirmed that the correct locus [...]

Figure 1. Integrated genetic map and extended sequence assembly. Fourteen linkage groups in C. savignyi were identified by genetically mapping 182 markers, corresponding to 110 distinct genetic loci. Vertical black bars represent the genetic linkage map. All bars are drawn to scale; the length of the bars is proportional to the number of centimorgans (cM). Map distances at each mapped genetic locus are indicated to the left of each bar. Blue bars on the right of each pair represent reftigs. Tick marks indicate the physical location of the markers among the 104 mapped reftigs. Transverse lines link the location of each marker on the genetic and physical maps. Numbers to the right of each assembly fragment correspond to reftig identifiers in Version 2.1 of the assembly (Small et al. 2007a). When two or more markers on the same reftig allowed the orientation of the fragment to be resolved, the orientation is indicated with a red arrow pointing up or down for the positive or negative orientation, respectively. Red diamonds indicate unknown orientation. The gap size between reftigs is arbitrary since the true physical distance between mapped reftigs is unknown.
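The two derived figures quoted in the excerpt can be reproduced with simple arithmetic. One plausible reading of the "simplifying assumptions" (an interpretation, not something the excerpt spells out) is that the unmapped 17% of the sequence recombines at the same average rate as the mapped 83%:

```latex
\begin{align*}
\text{full map length} &\approx \frac{650\ \text{cM}}{0.83} \approx 783\ \text{cM},\\[4pt]
4.4\ \text{cM} \times 200\ \text{kb/cM} &= 880\ \text{kb}.
\end{align*}
```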


Hill et al., Genome Res 18:1369–1379, 2008

45

High-resolution mouse map

Shifman et al. (PLoS Biology, 4:e395, 2006) constructed a high-resolution genetic map of the mouse genome.

• 10,202 SNPs

• 80 families from the latest generations in a heterogeneous stock (HS) of outbred mice

• 4,048 meioses

• Valuable resource for mouse geneticists

• Characterization of recombination rate variation

– Particularly regarding the sex difference in recombination.

46

Concerns

• In order to accommodate the analysis of 8 complex pedigrees, Shifman et al. used a sliding window of 5–15 SNPs.

• The remaining 72 families were all nuclear, and many lacked parental genotype data or had genotype data on just one parent, and many were small (as few as 2 siblings).

• The software used (CRIMAP, last revised in 1990) makes some approximations that result in biased estimates of genetic distances (even the sex-averaged ones) in the case of small sibships with incomplete parental genotype data.

• The length of the genetic map (and that of particular chromosomes) was quite different from previous mouse maps.

47

A complex pedigree

[Pedigree diagram of one complex pedigree; individual labels not reproduced]

The diamonds are sibships.

48

What we did

• Obtained the raw data.

• Omitted 13 individuals with clear pedigree errors.

• Switched the sex of 26 individuals from female to male.

• Omitted 176 genotypes due to Mendelian inconsistencies.

• Split the large pedigrees into sibships (plus parents and grandparents).

• Split the larger sibships.

• Omitted sibships with no parental genotypes.

• Omitted small sibships (≤ 8 sibs) with genotype data on just one parent.

• Omitted 538 genotypes leading to apparent tight double crossovers.
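As an illustration of the Mendelian-inconsistency step in the list above, here is a minimal sketch of a check for a single autosomal marker in a parent-offspring trio. It is not the procedure used for the revised maps; the function name and genotype representation are hypothetical, and missing data and the X chromosome are ignored.

```python
from itertools import product

def mendel_consistent(child, mother, father):
    """True if the child's unordered genotype can be formed from one maternal
    and one paternal allele. Genotypes are 2-tuples of allele labels."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))

if __name__ == "__main__":
    print(mendel_consistent(("A", "G"), ("A", "A"), ("G", "G")))  # consistent
    print(mendel_consistent(("G", "G"), ("A", "A"), ("A", "G")))  # inconsistent
```

A failed check does not say which of the three genotypes is wrong, and an individual failing at many markers suggests a pedigree error rather than isolated genotyping errors.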

49

Substantial differences

The revised genetic maps...

• Are much smaller.

– The autosomal genome is 11% smaller in the revised maps

• Show a much smaller sex difference.

– Shifman et al.: female autosomal genome is 26% longer than the male.

– Revised maps: female autosomal genome is 9% longer than the male.

• Show fewer regions of unusually high recombination rate.

– “Torrid” regions disappear or have markedly attenuated recombination rates.

50

Recombination rates (chr 1)

[Figure: sex-averaged and sex-specific (female, male) recombination rates in cM/Mb against position in Mb; Shifman et al. versus revised maps]

Cox et al., Genetics 182:1335–1344, 2009

51

Recombination rates (chr 5)

[Figure: sex-averaged and sex-specific (female, male) recombination rates in cM/Mb against position in Mb; Shifman et al. versus revised maps]

Cox et al., Genetics 182:1335–1344, 2009

52

Recombination rates (chr 18)

[Figure: sex-averaged and sex-specific (female, male) recombination rates in cM/Mb against position in Mb; Shifman et al. versus revised maps]

Cox et al., Genetics 182:1335–1344, 2009

53

Morals

• Genetic maps continue to be useful

• Be careful about automated map construction

• Careful, tedious work is often necessary

• The simplest things can have the greatest impact

• Artifacts can be more interesting than anything else

• Don’t give a sex-averaged map of the X chromosome

• Use care in data cleaning

• Split large pedigrees into non-overlapping sibships rather than resort to the use of a sliding window of markers

• Use computer simulations to verify the appropriateness of the choices you make in a complex analysis
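The last moral can be illustrated with a toy simulation: generate data under a known model, apply the estimator, and check that it recovers the truth. The sketch below assumes a no-interference (Poisson) crossover process and fully informative meioses, which is far simpler than the pedigree analyses discussed in the talk; all names are hypothetical.

```python
import math
import random

def simulate_rec_fraction(d_cM: float, n_meioses: int, rng: random.Random) -> float:
    """Simulate meiotic products for an interval of d_cM centiMorgans and
    return the observed fraction of recombinants (odd number of crossovers)."""
    d = d_cM / 100.0  # expected crossovers per meiotic product (Morgans)
    recombinants = 0
    for _ in range(n_meioses):
        # Poisson process with rate 1 per Morgan: count arrivals before position d.
        n_xo = 0
        t = rng.expovariate(1.0)
        while t < d:
            n_xo += 1
            t += rng.expovariate(1.0)
        recombinants += n_xo % 2
    return recombinants / n_meioses

if __name__ == "__main__":
    rng = random.Random(1)
    r_hat = simulate_rec_fraction(20.0, 20000, rng)
    r_true = 0.5 * (1.0 - math.exp(-2.0 * 0.20))  # Haldane map function at 20 cM
    print(r_hat, r_true)
```

The same pattern extends to more realistic checks, for example simulating small sibships with missing parental genotypes and asking whether the estimated map length is biased.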

54

Acknowledgments

Terry Speed, University of California, Berkeley

Jim Weber, PreventionGenetics (formerly Marshfield Medical Research Foundation)

Mark Neff, University of California, Davis

Matt Hill and Arend Sidow, Stanford

Beth Dumont, University of Wisconsin–Madison

Gary Churchill, Bev Paigen, Ken Paigen, et al., The Jackson Lab

Jonathan Flint, The Wellcome Trust

55