STAT115 STAT215 BIO512 BIST298 Introduction to Computational Biology and Bioinformatics Spring 2015 Xiaole Shirley Liu Please Fill Out Student Sign In Sheet


Post on 22-Dec-2015


STAT115 STAT215 BIO512 BIST298

Introduction to Computational Biology and Bioinformatics

Spring 2015

Xiaole Shirley Liu

Please Fill Out Student Sign In Sheet


Bioinformatics and Computational Biology

• Interdisciplinary – Statistics, Biology, Computer Science

• Applied
  – From freshmen to postdocs
  – Useful training for many
  – The more you practice, the better you get

• Moves with technology development


The Protein Sequence and Structure Wave

• 1955: Sanger sequenced bovine insulin

• 1970: Needleman-Wunsch algorithm (Smith-Waterman followed in 1981)

• 1971: PDB

• 1990: BLAST

• 1994: BLOCKS database

• 1994-: CASP

• 1997-: Proteomics


The Microarray Wave

• Microarray contains hundreds to millions of tiny probes

• Simultaneously detect how much each gene is expressed


ALL vs AML

• Golub et al., Science 1999


ALL vs AML


“Microarrays” Today

• Infer the expression value of all the genes from 1000 probes

• High throughput drug screen


The DNA Sequencing Wave


• 1953: DNA structure

• 1972: Recombinant DNA

• 1977: Sanger sequencing

• 1985: PCR

• 1988: NCBI

• 1990: BLAST


Sequencing in the 1970s


The Human Genome Race

• Human Genome Project: 1990-2003
  – Originally 1990-2005
  – Boosted by technology improvement and automation
  – Competition from Celera


Human Genome Sequencing

• Clone-by-clone and whole-genome shotgun


The Human Genome Race

• Human Genome Project: 1990-2003
  – Originally 1990-2005
  – Boosted by technology improvement and automation
  – Competition from Celera

• Informatics essential for both the public and private sequencing efforts
  – Sequence assembly and gene prediction
  – Working draft finished simultaneously in spring 2000


Sequencing in 2001


Sequencing in 2007


Sequencing Today

• Personal genome sequencing

• HiSeq X
  – 900 Gb (gigabases) of data / flow cell in < 3 days, 10 × 30X human genomes, at ~$1.5-2K / sample
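The throughput figures on this slide can be sanity-checked with a little arithmetic. The sketch below reads the slide's 900 figure as roughly 900 billion sequenced bases per flow cell and assumes a haploid human genome of about 3.1 billion bases; both the genome size and the helper name `coverage_per_genome` are assumptions added for illustration, not something stated on the slide.

```python
# Assumed haploid human genome size in bases (~3.1 Gb); not stated on the slide.
GENOME_SIZE_BASES = 3.1e9


def coverage_per_genome(total_bases, n_genomes, genome_size=GENOME_SIZE_BASES):
    """Average sequencing depth when total_bases are split evenly across n_genomes."""
    return total_bases / n_genomes / genome_size


# Slide figures: ~900 billion bases per flow cell, shared by 10 genomes.
flow_cell_bases = 900e9
depth = coverage_per_genome(flow_cell_bases, n_genomes=10)
print(f"Approximate depth per genome: {depth:.1f}X")
```

Under these assumptions the depth comes out close to the 30X quoted on the slide, which suggests the output figure is best read in bases rather than bytes.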


Personalized Disease Susceptibility Test and Treatment


Big Data Challenges


All biology is becoming computational, much the same way it has become molecular … Otherwise “low input, high throughput and no output science”

--- Sydney Brenner

2002 Nobel Prize


Class Information

• Course website:
  – http://stat115.org/
  – Video recording / slides online
  – Office hours, auditing
  – Background: CS, Stats, Biology

• Roughly 3 modules (2 HW each)
  – Transcriptome (microarrays and RNA-seq)
  – Gene regulation (transcriptional & epigenetic regulation)
  – Human genetics and disease (GWAS / cancer)


Class Information

• Teaching Fellows

Yang Li, Stephanie Chan

• Labs:
  – Wed 6-8pm, Science Center B09
  – Tue 6-8pm, HSPH Kresge 209, Boston
  – First Lab: Fri 1/30 3-5pm (Odyssey)!


HW and Grading

• Discussion forum: stat115.slack.com

• Submission email: [email protected]

• HW 6 * 10 or 6 * 12

• Final exams 20

• Class participation: 20

• Algorithm videos: 5

• Lecture notes: extra 5 points

• Late days


Gene Expression Microarrays


Expression Microarrays

• Grow cells under a given condition, collect the mRNA population, and label it

• A microarray has high-density (thousands to millions) sequence-specific probes at known locations for each gene/RNA

• Sample hybridizes to microarray probes by DNA (A-T, G-C) base pairing; non-specific binding is washed away

• Measure sample mRNA abundance by reading the labeled signal at each probe location
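The base-pairing step above can be illustrated with a toy model. This is a minimal sketch that treats hybridization as exact reverse-complement matching, which ignores the thermodynamics of real hybridization; the sequences and the function names `reverse_complement` and `hybridizes` are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that would pair with seq, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridizes(probe, target):
    """Toy rule: the probe binds if it is perfectly complementary
    to some stretch of the target sequence."""
    return reverse_complement(probe) in target


# Hypothetical labeled target fragment and a probe designed against part of it.
target = "GGATCCATGCAAGTTCGA"
probe = reverse_complement("ATGCAAGT")

print(probe, hybridizes(probe, target))
print("AAAAAAAA", hybridizes("AAAAAAAA", target))
```

In this simplified picture, a probe lights up only when the labeled sample contains its complementary sequence, which is why each probe location can report on one specific gene or RNA.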


Affymetrix GeneChip Arrays


Labeled Samples Hybridize to DNA Probes on GeneChip


Shining Laser Light Causes Tagged Fragments to Glow


Perfect Match (PM) vs MisMatch (MM) (control for cross hybridization)
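One simple way to use the MM probes as a cross-hybridization control is to subtract each MM intensity from its paired PM intensity and average over the probe pairs for a gene. The sketch below assumes that averaging scheme; the slide does not say which summary is used, and the intensities and the function name `pm_mm_signal` are hypothetical.

```python
from statistics import mean


def pm_mm_signal(pm, mm):
    """Average PM - MM difference across the probe pairs of one gene.

    pm and mm are intensities for paired probes, in the same order.
    """
    if len(pm) != len(mm):
        raise ValueError("PM and MM must contain the same number of probes")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for four probe pairs targeting one gene.
pm = [1200.0, 950.0, 1430.0, 1100.0]
mm = [300.0, 280.0, 410.0, 350.0]

signal = pm_mm_signal(pm, mm)
print(f"Background-corrected signal: {signal}")
```

A difference well above zero suggests specific binding, while a PM value close to its MM partner suggests the signal is mostly cross hybridization.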


NimbleGen Arrays


Agilent Arrays


Microarrays

• Array comparison:
  – # probes / array, # probes / gene, probe length
  – Flexibility vs data reuse

• Why do we bother learning about microarrays now?
  – RNA-seq is probably preferred in new expression experiments
  – The amount of useful public data
  – The data analysis techniques
