week1,lecture1 - pennsylvania state university ·...
Post on 24-Jun-2020
BMMB 597D: Analyzing Next Generation Sequencing Data
BMMB 852: Applied Bioinformatics
Week 1, Lecture 1
István Albert
Bioinformatics Consulting Center
Penn State, 2013
Introductions
Lecturer: Istvan Albert (iua1@psu.edu)
TA: Nicholas Stoler (nicholas.stoler@psu.edu)
Office hours: Mon-Wed from 1-3pm in 502B Wartik
Email: iua1@psu.edu
Course Webpage: http://www.personal.psu.edu/iua1/
Rationale for this course
• Life sciences are becoming a data driven science
• Data is represented as text files in various formats that are transformed one step at a time
• Most bioinformatics classes are focused on computer science or algorithms.
• We will focus on information processing and applications
Requirements
• Recommended: latest Mac OS X 10.8.4 (properly set up)
• Or another Unix based operating system
• If you have a Windows computer please install Linux
  – Ubuntu Live CD
  – Dual boot Linux and Windows
  – Use VirtualBox and install Linux into it
Lecture topics
15 weeks – two lectures per week = 30 lectures
• core informatics competency
• computational foundations
• biological data formats
• statistical methods and visualization
• software tools and their applications
Lecture Formats
• Background information
• Practical examples that tie in with the topic
• Finishing with in class exercises + homework
• We'll try to make it simple and easy to follow
Homework
• Homework will be given out during each lecture and corresponds to that lecture. Labeled 1, 2 … 30
• Homework is due on the Tuesday of the week after it was given out.
• For example: homework 1 and 2 will be due next Tuesday.
• Note: there are office hours (Wed and Mon) between each homework being given out and its due date
• Homework usually fits on one sheet of paper. Show the commands and their output.
Grading
• Grades will be the average of all homework + final project
• The final project is given out in the last week and is due on the Monday of finals week.
• For homework and projects you may work in teams
Computation → Thought
• Computational approaches reflect and affect the thought process
• When we learn informatics, we learn how to think in a way that is easy to translate into computation
• There is no magic – it is just like any other subject matter – it needs a lot of practice (the brain is a muscle)
• Similar to learning a foreign language – there is a vocabulary, a grammar and idiomatic expressions
Realistic News
Bioinformatics will never be easy or trivial!
It is like high altitude mountain hiking
Never underestimate it.
Bioinformatics: it is like hiking up to Hallett Peak
A typical bioinformatics project is like hiking up to Hallett Peak in the Rocky Mountains.
It is hard work, with a lot of effort, and if you keep it up and pay attention you'll get there.
There is a steep but not overly dangerous trail in the back.
There is no special skill other than proper walking technique and not giving up.
There are no magical shortcuts that you will learn.
Expectations
You can only learn by doing it
Spend 3-6 hours outside class each week:
  – Explore behaviors
  – Expand the scope of the study
  – Try new solutions
Time flies when you know what you are doing.
Complexity versus decision making
• Most bioinformatics analyses consist of a very large number of very simple decisions
• Most of which need to be correct!
• This is what makes it difficult
• There are no strict rules, only guidelines; dare to improvise and adapt
Bioinformatics today
Large datasets generated by complex equipment
1. Data management: storage, transfer, data transformation; domain of Information Technology
2. Data analysis: mapping, assembly; algorithm scaling; domain of Computer Science
3. Statistical challenges: traditional statistics is not well suited for modeling systematic errors over a large number of observations; domain of Statistics
4. Biological hypothesis testing: data interpretation; domain of Life Science
Analysis Scaling
• Analysis algorithms almost never scale linearly with the amount of data.
• For example, naïve sequence comparisons scale as N*N: in order to compare N sequences against themselves we need to do N*N operations
• Going from N=1 to N=10^3, analysis time increases from 1 to 10^6.
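The quadratic growth can be checked with plain shell arithmetic. This is only a sketch: the function name pairwise_ops is made up for illustration, and it counts the N*N comparisons rather than performing any.

```shell
# pairwise_ops N: number of operations needed to compare N sequences
# against themselves with the naive all-against-all approach (N*N).
pairwise_ops() {
    echo $(( $1 * $1 ))
}

# Print the operation count for growing numbers of sequences.
for n in 1 10 100 1000; do
    echo "N=$n operations=$(pairwise_ops "$n")"
done
```

A 1000-fold increase in N turns into a 1,000,000-fold increase in operations.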
Origins of Classical Statistics
Developed in the era of
• Limited computational capabilities
• Small and expensive datasets
Operates on concepts such as "null hypothesis" and "p-values"
Currently in life sciences
• Powerful computational capabilities
• Cheap and extremely large datasets
Small systematic deviations strongly influence any test – we are unable to separate the many influences
The era of absurd (silly?) p-values, p=10^-19
Data characteristics
• Random errors and systematic errors accumulate and compound during each step: from sample extraction and preparation to measurement
• A large number of measurements makes unlikely events very common
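A quick calculation shows why rare events stop being rare at scale. The sketch below assumes independent measurements that each have the same small probability p of showing the event; the function name rare_event_summary and the numbers passed to it are illustrative assumptions, not values from the lecture.

```shell
# rare_event_summary P N: for an event with per-measurement probability P
# and N independent measurements, print the expected number of occurrences
# (P*N) and the chance of seeing at least one, 1 - (1-P)^N.
rare_event_summary() {
    awk -v p="$1" -v n="$2" 'BEGIN {
        expected = p * n
        at_least_one = 1 - exp(n * log(1 - p))
        printf "expected=%.1f at_least_one=%.4f\n", expected, at_least_one
    }'
}

# A one-in-a-million event observed over one million measurements.
rare_event_summary 0.000001 1000000
```

With a one-in-a-million event and a million measurements, the sketch reports one expected occurrence and roughly a 63% chance of seeing the event at least once.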
Data produced by equipment
Novel information
Example: 74 new SNVs (single nucleotide variations) per individual per generation
Challenges: shifting terminology
What is the difference between a SNP (single nucleotide polymorphism) and a SNV (single nucleotide variation)?
A SNV is a private mutation while a SNP is a mutation that is shared amongst a population.
At what point does a SNP turn into SNV?
BioStar: http://www.biostars.org
• I started the site in 2009 during the first year that BMMB 597D was offered!
• It was meant to support questions for this course
• Today it has grown to attract over 40K unique visitors per month and over 2 million page views per year
How to turn your computer into a computational beast
On a Mac you will need to:
1. Update your Mac OS to the latest version 10.8.4 (Mountain Lion)
2. Using the App Store download and install Xcode
3. Download and install the command line tools: Xcode preferences → Downloads
On Linux
• Install a well supported Linux version: Ubuntu, Debian, Fedora etc.
• Use a package manager to install dependencies, which leads to incantations such as:
    apt-get install zlib1g-dev
Success looks like this
The make tool and gcc both work
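One way to check that both tools are reachable from the command line is sketched below. The helper name check_tool is made up for illustration; it relies only on the standard shell built-in command -v.

```shell
# check_tool NAME: report whether a program called NAME is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# The two tools the setup is supposed to provide.
check_tool make
check_tool gcc
```

If both are reported as found, running make --version and gcc --version should also print their version information.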
Organizing your projects
Noble WS (2009) A Quick Guide to Organizing Computational Biology Projects. PLoS Comput Biol 5(7): e1000424. doi:10.1371/journal.pcbi.1000424
http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000424
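As a rough sketch of what such an organization might look like on disk, the commands below create a project skeleton loosely following the layout suggested in the Noble (2009) guide: top-level doc, data, src, bin and results directories, with dated subdirectories for results. The project name myproject and the use of a temporary location are assumptions made so the example is safe to run.

```shell
# Create the skeleton under a throwaway location (hypothetical project name).
base=$(mktemp -d "${TMPDIR:-/tmp}/layout.XXXXXX")
project="$base/myproject"

# One directory per kind of content.
mkdir -p "$project/doc" "$project/data" "$project/src" "$project/bin" "$project/results"

# Results are grouped by the date the work was done.
mkdir -p "$project/results/$(date +%Y-%m-%d)"

# Show the top-level layout.
ls "$project"
```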
The UNIX command line
Action words
Chain words together to form statements
Open Terminal on Mac.
Getting to know the terminal: man (manual)
Discover-ability: learn how to find out more details on a tool or process
Learn more about these commands
• ls (list directory contents)
• rm (remove files/directories)
• cp (copy files)
• cd (change directory)
• mkdir (make directory)
• rmdir (remove directory)
• pwd (print current working directory)
Note that each one of these commands has options (flags)
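A short session that touches each of these commands might look like the sketch below. It works inside a throwaway directory so nothing important is affected; the directory and file names (project, seq.txt, copy.txt) are made up for illustration.

```shell
start=$PWD                                   # remember where we began
sandbox=$(mktemp -d "${TMPDIR:-/tmp}/sandbox.XXXXXX")
cd "$sandbox"            # change into the throwaway directory
pwd                      # print where we are now
mkdir project            # make a directory
cd project               # move into it
echo "ACGT" > seq.txt    # create a small file
cp seq.txt copy.txt      # copy it
ls                       # lists copy.txt and seq.txt
rm seq.txt copy.txt      # remove both files
cd ..                    # go up one level
rmdir project            # remove the now-empty directory
ls                       # prints nothing: the sandbox is empty again
cd "$start"              # return to the starting directory
rmdir "$sandbox"         # clean up the sandbox itself
```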
Customizing commands
• A flag is a small decoration used to change or customize what a tool does.
Homework 1
1. Install all the required software
2. Navigate, list contents of directories
3. Create and delete directories, create and delete files
4. Find help on commands
5. Understand what flags mean
6. Make the ls command write out the file sizes in "human friendly" mode
7. Make the rm command ask for permission when removing a file
8. Make the cp command ask for permission if the copy would overwrite an existing file (this is called clobbering)