lecture 14: mapping reads to a reference – burrows ...cs425/fall19/slides/lecture_14_bwt_f… ·...

16
Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index 1 Fall 2019 November 5, 2019

Upload: others

Post on 20-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Lecture14:MappingReadstoaReference–BurrowsWheeler

TransformandFMIndex

1

Fall2019November5,2019

Page 2: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Outline� ProblemDefinition� DifferentSolutions� Burrows-WheelerTransformation(BWT)�  Ferragina-Manzini(FM)Index�  SearchUsingFMIndex� AlignmentUsingFMIndex

2

Page 3: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

MappingReadsProblem:Wearegivenaread,R,andareferencesequence,S.FindthebestoralloccurrencesofRinS.

Example:R=AAACGAGTTAS=TTAATGCAAACGAGTTACCCAATATATATAAACCAGTTATTConsideringnoerror:oneoccurrence.Consideringupto1substitutionerror:twooccurrences.Consideringupto10substitutionerrors:manymeaninglessoccurrences!

3

Page 4: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

MappingReads(continued)Variations:�  Sequencingerror◦  Noerror:RisaperfectsubsequenceofS.◦  Onlysubstitutionerror:RisasubsequenceofSuptoafewsubstitutions.◦  Indelandsubstitutionerror:RisasubsequenceofSuptoafewshortindelsandsubstitutions.

�  Junctions(forinstanceinalternativesplicing)◦  Fixedorder/orientation R=R1R2…RnandRimaptodifferentnon-overlappinglociinS,buttothe

samestrandandpreservingtheorder.◦  Arbitraryorder/orientation R=R1R2…RnandRimaptodifferentnon-overlappinglociinS.

4

Page 5: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

DifferentSolutions�  Alignment,suchasSmith-Watermanalgorithm:◦  Pro:adequateforallvariations.◦  Con:computationallyexpensive,notsuitablefornext-generationsequencing.

�  Seed-and-Extend◦  Pro:canhandleerrorsandjunctionsmoreefficiently.◦  Con:slowwhenno(few)error(s).

�  FerraginaManzini(FM)IndexSearch◦  Pro:computationallyefficient,whennoerror.◦  Con:exponentialinthemaximumnumberoferrors.

5

Page 6: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Burrows-WheelerTransformationExample:mississippi1.  Appendtothe

inputstringaspecialchar,$,smallerthanallalphabet.

6

mississippi$

Page 7: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Burrows-WheelerTransformation(cnt’d)Example:mississippi2.  Generateall

rotations.

7

m i s s i s s i p p i $ i s s i s s i p p i $ m s s i s s i p p i $ m i s i s s i p p i $ m i s i s s i p p i $ m i s s s s i p p i $ m i s s i s i p p i $ m i s s i s i p p i $ m i s s i s s p p i $ m i s s i s s i p i $ m i s s i s s i p i $ m i s s i s s i p p $ m i s s i s s i p p i

Page 8: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Burrows-WheelerTransformation(cnt’d)Example:mississippi3.  Sort

rotationsaccordingtothealphabeticalorder.

8

$ m i s s i s s i p p i i $ m i s s i s s i p p i p p i $ m i s s i s s i s s i p p i $ m i s s i s s i s s i p p i $ m

m i s s i s s i p p i $ p i $ m i s s i s s i p p p i $ m i s s i s s i s i p p i $ m i s s i s s i s s i p p i $ m i s s s i p p i $ m i s s i s s i s s i p p i $ m i

Page 9: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Burrows-WheelerTransformation(cnt’d)Example:mississippi4.  Outputthe

lastcolumn.

9

$ m i s s i s s i p p i i $ m i s s i s s i p p i p p i $ m i s s i s s i s s i p p i $ m i s s i s s i s s i p p i $ m

m i s s i s s i p p i $ p i $ m i s s i s s i p p p i $ m i s s i s s i s i p p i $ m i s s i s s i s s i p p i $ m i s s s i p p i $ m i s s i s s i s s i p p i $ m i

Page 10: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Burrows-WheelerTransformation(cnt’d)Example:mississippi

ipssm$pissii

10

Page 11: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Ferragina-ManziniIndexExample:mississippiFirstcolumn:FLastcolumn:LLet’smakeanLtoFmap.Observation:ThenthiinListhenthiinF.

11

$ m i s s i s s i p p i i $ m i s s i s s i p p i p p i $ m i s s i s s i s s i p p i $ m i s s i s s i s s i p p i $ m

m i s s i s s i p p i $ p i $ m i s s i s s i p p p i $ m i s s i s s i s i p p i $ m i s s i s s i s s i p p i $ m i s s s i p p i $ m i s s i s s i s s i p p i $ m i

Page 12: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Ferragina-ManziniIndex(cnt’d)LtoFmapStore/computeatwodimensionalOcc(j,‘c’)tableofthenumberofoccurrencesofchar‘c’uptopositionj(inclusive).andaonedimensionalCnt(‘c’)table.

12

$ i m p s i 0 1 0 0 0 p 0 1 0 1 0 s 0 1 0 1 1 s 0 1 0 1 2 m 0 1 1 1 2 $ 1 1 1 1 2 p 1 1 1 2 2 i 1 2 1 2 2 s 1 2 1 2 3 s 1 2 1 2 4 i 1 3 1 2 4 i 1 4 1 2 4

$ i m p s 1 4 1 2 4

Occ(j,‘c’)

Cnt(‘c’)

Page 13: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Ferragina-ManziniIndexLtoFmap[Cnt(‘$’)+Cnt(‘i’)+Cnt(‘m’)+Cnt(‘p’)=8]+[Occ(9,‘s’)=3]

=11

13

1 $ m i s s i s s i p p i 2 i $ m i s s i s s i p p 3 i p p i $ m i s s i s s 4 i s s i p p i $ m i s s 5 i s s i s s i p p i $ m 6 m i s s i s s i p p i $ 7 p i $ m i s s i s s i p 8 p p i $ m i s s i s s i 9 s i p p i $ m i s s i s 10 s i s s i p p i $ m i s 11 s s i p p i $ m i s s i 12 s s i s s i p p i $ m i

‘s’ section

before ‘s’

Page 14: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Ferragina-ManziniIndexReversetraversal(1)i(2)p(7)p(8)i(3)s(9)s(11)i(4)s(10)s(12)i(5)m(6)$

14

1 $ m i s s i s s i p p i 2 i $ m i s s i s s i p p 3 i p p i $ m i s s i s s 4 i s s i p p i $ m i s s 5 i s s i s s i p p i $ m 6 m i s s i s s i p p i $ 7 p i $ m i s s i s s i p 8 p p i $ m i s s i s s i 9 s i p p i $ m i s s i s 10 s i s s i p p i $ m i s 11 s s i p p i $ m i s s i 12 s s i s s i p p i $ m i

Page 15: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Ferragina-ManziniIndexSearchissi(1)-(12)i(2)-(5)si(9)-(10)ssi(11)-(12)issi(4)-(5)

15

1 $ m i s s i s s i p p i 2 i $ m i s s i s s i p p 3 i p p i $ m i s s i s s 4 i s s i p p i $ m i s s 5 i s s i s s i p p i $ m 6 m i s s i s s i p p i $ 7 p i $ m i s s i s s i p 8 p p i $ m i s s i s s i 9 s i p p i $ m i s s i s 10 s i s s i p p i $ m i s 11 s s i p p i $ m i s s i 12 s s i s s i p p i $ m i

Page 16: Lecture 14: Mapping Reads to a Reference – Burrows ...cs425/fall19/slides/Lecture_14_bwt_f… · Lecture 14: Mapping Reads to a Reference – Burrows Wheeler Transform and FM Index

Ferragina-ManziniIndexSearchpi(1)-(12)ipi

16

1 $ m i s s i s s i p p i 2 i $ m i s s i s s i p p 3 i p p i $ m i s s i s s 4 i s s i p p i $ m i s s 5 i s s i s s i p p i $ m 6 m i s s i s s i p p i $ 7 p i $ m i s s i s s i p 8 p p i $ m i s s i s s i 9 s i p p i $ m i s s i s 10 s i s s i p p i $ m i s 11 s s i p p i $ m i s s i 12 s s i s s i p p i $ m i