Week 1, Lecture 1 - Pennsylvania State University

BMMB 597D: Analyzing Next Generation Sequencing Data. BMMB 852: Applied Bioinformatics. Week 1, Lecture 1. István Albert, Bioinformatics Consulting Center, Penn State, 2013


Page 1:

BMMB 597D: Analyzing Next Generation Sequencing Data
BMMB 852: Applied Bioinformatics

Week 1, Lecture 1

István Albert

Bioinformatics Consulting Center

Penn State, 2013

Page 2:

Introductions

Lecturer: Istvan Albert ([email protected])

TA: Nicholas Stoler ([email protected])

Office hours: Mon-Wed from 1-3pm in 502B Wartik

Email: [email protected]

Course Webpage: http://www.personal.psu.edu/iua1/

Page 3:

Rationale for this course

•  The life sciences are becoming a data-driven science
•  Data is represented as text files in various formats that are transformed one step at a time
•  Most bioinformatics classes are focused on computer science or algorithms.
•  We will focus on information processing and applications

Page 4:

Requirements

•  Recommended: latest Mac OS X 10.8.4 (properly set up)
•  Or another Unix-based operating system
•  If you have a Windows computer, please install Linux:
   – Ubuntu Live CD
   – Dual boot Linux and Windows
   – Use VirtualBox and install Linux into it

Page 5:

Lecture topics

15 weeks, two lectures per week = 30 lectures

•  core informatics competency
•  computational foundations
•  biological data formats
•  statistical methods and visualization
•  software tools and their applications

Page 6:

Lecture Formats

•  Background information
•  Practical examples that tie in with the topic
•  Finishing with in-class exercises + homework
•  We’ll try to make it simple and easy to follow

Page 7:

Homework

•  Homework will be given out during each lecture and will correspond to that lecture. Assignments are labeled 1, 2 … 30.
•  Homework is due on the Tuesday of the week after it was given out.
•  For example: homework 1 and 2 will be due next Tuesday.
•  Note: there are office hours (Wed and Mon) before each homework’s due date
•  Homework usually fits on one sheet of paper. Show the commands and their output.

Page 8:

Grading

•  Grades will be the average of all homework + the final project
•  The final project is given out in the last week and is due on the Monday of finals week.
•  For homework and projects you may work in teams

Page 9:

Computation → Thought

•  Computational approaches reflect and affect the thought process
•  When we learn informatics, we learn how to think in a way that is easy to translate into computation
•  There is no magic: it is just like any other subject matter and needs a lot of practice (the brain is a muscle)
•  Similar to learning a foreign language: there is a vocabulary, a grammar, and idiomatic expressions

Page 10:

Realistic News

Bioinformatics will never be easy or trivial!

It is like high-altitude mountain hiking.

Never underestimate it.

Page 11:

Bioinformatics: it is like hiking up to Hallett Peak

A typical bioinformatics project is like hiking up to Hallett Peak in the Rocky Mountains.

It is hard work with a lot of effort, and if you keep it up and pay attention you’ll get there.

There is a steep but not overly dangerous trail in the back.

There is no special skill other than proper walking technique and not giving up.

There are no magical shortcuts that you will learn.

Page 12:

Expectations

You can only learn by doing it

Spend 3-6 hours outside class each week:
–  Explore behaviors
–  Expand the scope of the study
–  Try new solutions

Time flies when you know what you are doing.

Page 13:

Complexity versus decision making

•  Most bioinformatics analyses consist of a very large number of very simple decisions
•  Most of which need to be correct!
•  This is what makes it difficult
•  There are no strict rules, only guidelines: dare to improvise and adapt

Page 14:

Bioinformatics today

Large datasets generated by complex equipment

1.   Data management: storage, transfer, data transformation (domain of Information Technology)
2.   Data analysis: mapping, assembly, algorithm scaling (domain of Computer Science)
3.   Statistical challenges: traditional statistics is not well suited for modeling systematic errors over a large number of observations (domain of Statistics)
4.   Biological hypothesis testing: data interpretation (domain of Life Science)

Page 15:

Analysis Scaling

•  Analysis algorithms almost never scale linearly with the amount of data.
•  For example, naïve sequence comparisons scale as N*N: in order to compare N sequences against themselves we need to do N*N operations
•  Going from N=1 to N=10^3, the analysis time increases from 1 to 10^6.
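The quadratic growth described above can be checked with a few lines of shell arithmetic. This is only a sketch: `pair_count` is a hypothetical helper name, and it counts comparisons rather than measuring real run time.

```shell
#!/bin/sh
# Count the comparisons needed for a naive all-against-all comparison
# of N sequences: every sequence is compared against every sequence.
# pair_count is a hypothetical helper, not part of any course tool.
pair_count() {
    n=$1
    echo $(( n * n ))
}

# Print how the number of comparisons grows as N grows.
for n in 1 10 100 1000; do
    echo "N=$n comparisons=$(pair_count "$n")"
done
```

Each tenfold increase in N multiplies the work by one hundred, which is why an analysis that is instant on a test file can become impractical on a full dataset.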

Page 16:

Origins of Classical Statistics

Developed in the era of

•  Limited computational capabilities
•  Small and expensive datasets

Operates on concepts such as “null hypothesis” and “p-values”

Page 17:

Currently in life sciences

•  Powerful computational capabilities
•  Cheap and extremely large datasets

Small systematic deviations strongly influence any test; we are unable to separate the many influences

The era of absurd (silly?) p-values, p=10^-19

Page 18:

Data characteristics

•  Random errors and systematic errors accumulate and compound during each step: from sample extraction and preparation through measurement
•  A large number of measurements makes unlikely events very common
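A small calculation can make the last point concrete. The sketch below assumes independent measurements, each with the same small probability p of showing a rare event; the function name `rare_events` and the numbers used are illustrative assumptions, not values from the lecture.

```shell
#!/bin/sh
# For n independent measurements, each with probability p of a rare event:
#   expected number of events      = n * p
#   P(at least one event observed) = 1 - (1 - p)^n
# rare_events is a hypothetical helper; p and n below are made-up values.
rare_events() {
    awk -v p="$1" -v n="$2" 'BEGIN {
        printf "expected=%.1f p_at_least_one=%.4f\n", n * p, 1 - (1 - p) ^ n
    }'
}

# A one-in-a-million event across ten million measurements.
rare_events 0.000001 10000000
```

With these assumed numbers the event is expected about ten times, so something that is individually very unlikely is almost certain to appear somewhere in the data.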

Page 19:

Data produced by equipment

novel information

Example: 74 new SNVs (single nucleotide variations) per individual per generation

Page 20:

Challenges: shifting terminology

What is the difference between a SNP (single nucleotide polymorphism) and a SNV (single nucleotide variation)?

A SNV is a private mutation while a SNP is a mutation that is shared amongst a population

At what point does a SNP turn into a SNV?

Page 21:

BioStar: http://www.biostars.org

•  I started the site in 2009 during the first year that BMMB 597D was offered!
•  It was meant to support questions for this course
•  Today it has grown to attract over 40K unique visitors per month and over 2 million page views per year

Page 22:

BioStar: http://www.biostars.org

Page 23:

How to turn your computer into a computational beast

On a Mac you will need to:

1.  Update your Mac OS to the latest version, 10.8.4 (Mountain Lion)
2.  Using the App Store, download and install Xcode
3.  Download and install the command line tools

Page 24:

Xcode preferences → Downloads

Page 25:

On Linux

•  Install a well-supported Linux version: Ubuntu, Debian, Fedora etc.
•  Use a package manager to install dependencies, which leads to incantations such as:

          apt-get install zlib1g-dev

Page 26:

Success looks like this

the make tool and gcc both work
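One way to check this from the terminal is to ask the shell whether it can find each tool. The sketch below is an assumption about how you might verify the setup, not the exact check shown on the slide; `check_tool` is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether a command is available on the PATH.
# check_tool is a hypothetical helper written for this example.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

# The two tools the slide mentions.
check_tool make
check_tool gcc
```

If both lines report a location, running `make --version` and `gcc --version` should then print version information instead of a "command not found" error.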

Page 27:

Organizing your projects

Noble WS (2009) A Quick Guide to Organizing Computational Biology Projects. PLoS Comput Biol 5(7): e1000424. doi:10.1371/journal.pcbi.1000424
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000424
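As a starting point, a project skeleton can be created with a single command. The directory names below are loosely modeled on the layout suggested in the Noble paper and should be treated as an assumption to adapt; `make_project` and `myproject` are hypothetical names.

```shell
#!/bin/sh
# Create a simple project skeleton with separate places for
# documentation, raw data, source code, compiled tools, and results.
# make_project and the directory names are illustrative assumptions.
make_project() {
    project=$1
    mkdir -p "$project/doc" "$project/data" "$project/src" \
             "$project/bin" "$project/results"
}

# Build the skeleton in a scratch location so nothing real is touched.
project_root=$(mktemp -d)
make_project "$project_root/myproject"
ls "$project_root/myproject"
```

Keeping data, code, and results in separate directories makes it easier to rerun an analysis later without guessing which file came from which step.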

Page 28:

The UNIX command line

Action words

Chain words together to form statements

Open Terminal on Mac.

Page 29:

Getting to know the terminal: man (manual)

Discoverability: learn how to find out more details on a tool or process
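In an interactive terminal, `man ls` opens the manual page for `ls`. The sketch below wraps that in a hypothetical helper, `help_for`, which falls back to the tool's own `--help` output when no manual page is available; the fallback is an assumption that happens to hold for many, but not all, tools.

```shell
#!/bin/sh
# Show documentation for a command: try the manual page first,
# then fall back to the command's own --help text.
# help_for is a hypothetical helper written for this example.
help_for() {
    if command -v man >/dev/null 2>&1 && man "$1" 2>/dev/null; then
        return 0
    fi
    "$1" --help 2>&1
}

# Print only the first few lines to keep the output short.
help_for ls | head -n 5
```

When reading a manual page interactively, the pager usually lets you scroll with the arrow keys and quit with `q`.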

Page 30:

Learn more about these commands

•  ls (list directory contents)
•  rm (remove files/directories)
•  cp (copy files)
•  cd (change directory)
•  mkdir (make directory)
•  rmdir (remove directory)
•  pwd (print current working directory)

Note that each one of these commands has options (flags)
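A short practice session that touches each of these commands might look like the following. It is a sketch that works inside a temporary directory so nothing important can be deleted; the names `scratch`, `lecture1`, `notes.txt`, and `backup.txt` are made up for the example.

```shell
#!/bin/sh
# Practice session using each command from the list above.
# Everything happens inside a temporary scratch directory.
scratch=$(mktemp -d)
cd "$scratch"
pwd                          # print the current working directory

mkdir lecture1               # make a directory
cd lecture1                  # change into it
echo "hello" > notes.txt     # create a small file
cp notes.txt backup.txt      # copy the file
ls                           # list contents: backup.txt and notes.txt

rm notes.txt backup.txt      # remove both files
cd ..                        # go back up one level
rmdir lecture1               # remove the now-empty directory
ls                           # nothing left to list
```

Note that `rmdir` only removes empty directories, which is why the files are deleted first.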

Page 31:

Customizing commands

•  A flag is a small decoration used to change or customize what a tool does.
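To see a flag change a tool's behavior, compare the same command with and without one. The sketch below uses `ls -a`, `ls -l`, and `mkdir -p` as examples; the file and variable names are assumptions made for the demonstration.

```shell
#!/bin/sh
# Compare the output of ls with and without flags.
# All file and variable names here are illustrative.
flagdir=$(mktemp -d)
cd "$flagdir"
touch visible.txt .hidden.txt

plain_listing=$(ls)        # no flag: hidden files are left out
all_listing=$(ls -a)       # -a: also list entries starting with a dot

echo "ls:"
echo "$plain_listing"
echo "ls -a:"
echo "$all_listing"

ls -l                      # -l: long format with sizes and dates
mkdir -p a/b/c             # -p: create missing parent directories too
```

Flags can usually be combined, so `ls -la` asks for both behaviors at once.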

Page 32:

Homework 1

1.  Install all the required software
2.  Navigate, list contents of directories
3.  Create and delete directories, create and delete files
4.  Find help on commands
5.  Understand what flags mean
6.  Make the ls command write out the file sizes in “human friendly” mode
7.  Make the rm command ask for permission when removing a file
8.  Make the cp command ask for permission if the copy would overwrite an existing file (this is called clobbering)