lecture-25 · 2013-08-23 · 2012$%$bmmb$597d:$analyzing$next$genera;on$sequencing$data$ $...

14
2012 % BMMB 597D: Analyzing Next Genera;on Sequencing Data Week 13, Lecture 25 István Albert Biochemistry and Molecular Biology and Bioinforma;cs Consul;ng Center Penn State

Upload: others

Post on 03-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$

$$Week$13,$Lecture$25$

István'Albert''

Biochemistry$and$Molecular$Biology$$and$Bioinforma;cs$Consul;ng$Center$

$Penn$State$

Page 2: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Final$Project$

•  Will$account$for$50%$of$your$the$grade.$

•  It$will$consist$of$one$or$more$datasets$and$a$number$of$hypotheses$that$need$to$be$tested$

•  Project$given$out$Thursday,$Dec$6th$and$is$due$Tuesday,$Dec$11th$

$

Page 3: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Data$and$scripts$for$this$lecture$

•  Data,$code$and$scripts$are$packaged$in$lec25.tar.gz'

•  have$to$master$too$many$tools$and$it$is$easy$to$get$stuck$

•  Use$it$as$a$guide$%$$don’t$just$copy/paste$$

•  Develop$your$own$mini$libraries$of$tools,$shell$scripts$and$methods$$

Page 4: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Chip%Seq$peak%caller$Recap$

1.$A$peak$caller$transforms$aligned$read$coverages$to$intervals$of$enrichment$

Page 5: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

The$most$common$misconcep;on$Cau;on:$this$a$personal$opinion$that$others$disagree$with$

Most'confusion'in'peak'calling'arises'from'combining'the'peak'calling'with'sta>s>cal'tes>ng'$•  A$peaks$means$a$region$that$appears$to$have$an$enrichment.$$

$•  The$opposite$of$peak$is$a$region$with$no'data'

$•  Some$peaks$may$be$caused$by$random$agglomera;on$of$data$–$but$that$

those$are$s;ll$peaks$–$only$that$they$are$peaks$occurring$by$random$chance$$

•  Most$published$peak$callers$will$o]en$remove$peaks$based$on$o]en$non%obvious$reasons$–$$

Page 6: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Crazy$peaks$by$MACS$

Page 7: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Understanding$peak$calling$

•  We$don’t$need$to$use$the$en;re$read!$Only$the$5’$end$ma`ers.$$

•  Transform$your$data$into$reads$that$are$1$bp$long$around$the$start$sites!$

•  See$the$README.txt$for$step$by$step$instruc;ons$

$

Page 8: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

README.txt$in$the$data$

Page 9: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Effects$of$increase$of$smoothing$

Page 10: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Isolate$peak$calling$by$strands$

•  Tools$that$merge$strands$always$need$to$make$assump;ons$on$the$data$–$some;mes$it$is$not$obvious$what$these$are$

•  The$best$approach$is$to$operate$on$strand$individually,$then$combine$the$results$

•  Caveat:$low$coverage,$error$prone$data$will$be$even$more$difficult$to$analyze$once$split$$

Page 11: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Inves;ga;ng$two$error$models$$

Fidng$and$predic;ons$smoothing=5$

Fidng$and$predic;ons$smoothing=20$

DNA$fragment$length$=$factor$footprint$

Page 12: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Peaks$with$no$exclusion$zone$around$them$

Page 13: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

How$to$get$the$average$fragment$size?$

1.  Need$to$find$peak$pairs$2.  Compute$distance$between$their$limits$$Step$by$step:$•  Expand$each$fragment$to$the$le]$by$a$limit$•  Intersect$fragment$on$the$opposite$strand$•  Compute$distances$

Page 14: lecture-25 · 2013-08-23 · 2012$%$BMMB$597D:$Analyzing$Next$Genera;on$Sequencing$Data$ $ $Week$13,$Lecture$25$ István'Albert' ' Biochemistry$and$Molecular$Biology$$ and$Bioinforma;cs$Consul;ng

Homework$25$

•  Run$the$pipeline$described$in$README.txt$$

•  Report$the$average$fragment$size$that$it$generates.$