cbio course, spring 2005, Hebrew University Computational Methods In Molecular Biology CS-67693, Spring 2005 School of Computer Science & Engineering Hebrew University, Jerusalem

Post on 21-Dec-2015


cbio course, spring 2005, Hebrew University

Computational Methods In Molecular Biology

CS-67693, Spring 2005

School of Computer Science & Engineering

Hebrew University, Jerusalem


Class 1: Introduction

Introduction

What is Comp. Bio.? Why is it great?
What are the aims and basic concepts of this course?
High-level biological review: basic bio background and motivation for the tasks handled in the course
Administration…


The Cell


Example: Tissues in Stomach

DNA Components

Four nucleotide types: Adenine (A), Guanine (G), Cytosine (C), Thymine (T)

Hydrogen bonds pair A with T and C with G
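The pairing rule (A with T, C with G) means that one strand determines its partner. A minimal Python sketch of that idea follows; the function names are illustrative and not from the slides, and the reversal reflects the fact that the two strands run in opposite directions.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGGAC"))
    print(reverse_complement("ATGGAC"))
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.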

The Double Helix

Source: Alberts et al.

DNA Organization

Source: Alberts et al.

Genome Sizes

E. coli (bacteria)           4.6 x 10^6 bases
Yeast (simple fungi)         15 x 10^6 bases
Smallest human chromosome    50 x 10^6 bases
Entire human genome          3 x 10^9 bases

Related Computational Tasks

Need a way to reconstruct a DNA sequence from fragments: a major contribution of comp. bio.!

Related: sequence comparison, sequence alignment
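Sequence alignment is commonly solved with dynamic programming. The sketch below computes the score of the best global alignment of two sequences (the Needleman-Wunsch recurrence). The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not values given in the slides.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global alignment of a and b (Needleman-Wunsch)."""
    # prev[j] = best score for aligning the prefix of a seen so far with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, gap in a.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept, so memory grows with the length of one sequence while time grows with the product of both lengths.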

DNA Duplication

Source: Mathews & van Holde

Genes

The DNA strings include:
Coding regions (“genes”)
  E. coli has ~4,000 genes
  Yeast has ~6,000 genes
  C. elegans has ~13,000 genes
  Humans have ~32,000 genes
Control regions
  These are typically adjacent to the genes
  They determine when a gene should be expressed
“Junk” DNA (unknown function)

The Tree of Life

Source: Alberts et al.

Evolution

Related organisms have similar DNA
  Similarity in sequences of proteins
  Similarity in organization of genes along the chromosomes
Evolution plays a major role in biology
  Many mechanisms are shared across a wide range of organisms (e.g. orthologs)
  During the course of evolution, existing components are adapted for new functions (e.g. paralogs)

Evolution

Evolution of new organisms is driven by:
Diversity
  Different individuals carry different variants of the same basic blueprint
Mutations
  The DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
Selection bias

Related Computational Tasks

Phylogeny (not just theory!):
  Rebuild the tree of life…
  Infer relations between genes, pathways, etc. across species
  Learn models for changes and development
Major benefit: exploit the information we do have or observe to make inferences about systems for which we have very little knowledge and few observations
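One simple way to rebuild a tree from present-day sequences is distance-based clustering. The sketch below uses UPGMA (average-linkage merging of the closest clusters) on the fraction of differing positions. It assumes the sequences are already aligned to equal length; the function names and the nested-tuple tree representation are illustrative choices, not the course's method.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict):
    """Build a tree (nested tuples of names) by repeatedly merging the
    two clusters with the smallest average pairwise distance."""
    # Each cluster is (tree, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def avg_dist(c1, c2):
        pairs = [(m, n) for m in c1[1] for n in c2[1]]
        return sum(p_distance(seqs[m], seqs[n]) for m, n in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACCTTCGA", "D": "TGCATGCA"}
    print(upgma(toy))
```

UPGMA implicitly assumes a constant rate of change along all branches; other tree-building methods relax that assumption.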

How Do Genes Code for Proteins?

DNA → (Transcription) → RNA → (Translation) → Protein

Transcription

Coding sequences can be transcribed to RNA

RNA nucleotides:
  Similar to DNA, with a slightly different backbone
  Uracil (U) instead of Thymine (T)

Source: Mathews & van Holde
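At the level of letters, transcription amounts to swapping T for U. A minimal sketch, assuming the input is the coding (sense) strand written 5' to 3'; the function name is illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand: same letters, U for T."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGACAAATTAG"))
```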


RNA Editing


Translation

Translation

The ribosome attaches to the mRNA at a translation initiation site

The ribosome then moves along the mRNA sequence, constructing a polypeptide as it goes

When the ribosome encounters a stop signal, it releases the mRNA. The constructed polypeptide is released and folds into a protein.

Translation is mediated by the ribosome
  The ribosome is a complex of protein and rRNA molecules
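The steps above can be sketched in code: find a start, read one codon (triplet) at a time, and stop at a stop codon. This sketch uses the standard genetic code and makes a simplifying assumption that initiation happens at the first AUG; real initiation-site selection is more involved. All names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon.

    Returns one-letter amino-acid codes; returns "" if there is no AUG.
    """
    start = mrna.find("AUG")  # assumption: first AUG is the initiation site
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: the polypeptide is released
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate("AUGGCCUAA"))
```

If the message ends before a stop codon is seen, the function returns the peptide built so far and ignores any trailing partial codon.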

Translation (sequence of five figure slides)

Source: Alberts et al.


Genetic Code

The Central Dogma

DNA → (Transcription) → RNA → (Translation) → Protein

[Figure labels: Genes, Experiments]

Eukaryotic Transcription Regulation

“Classical Model”:
  Composition of the promoter region determines the rate of transcription initiation
  Combinations of TFs control the transcription of gene sets under specific conditions

[Figure labels: TFs, Basal Promoter, Gene (5’ to 3’), Transcription start site, RNA polymerase II, mRNA, Genes, TF]


From Data to Model

>YKL112W Chr 11 ATGGACAAATTAGTCGTGAATTATTATGAATACAAGCACCCTATAATTAATAAAGACCTGGCCATTGGAGCCCATGGAGGCAAAAAATTTCCCACCTTGGGTGCTTGGTATGATGTAATTAATGAGTACGAATTTCAGACGCGTTGCCCTATTATTTTAAAGAATTCGCATAGGAACAAACATTTTACATTTGCCTGTCATTTGAAAAACTGTCCATTTAAAGTCTTGCTAAGCTATGCTGGCAATGCTGCATCCTCAGAAACCTCATCTCCTTCTGCAAATAATAATACCAACCCTCCGGGTACTCCTGATCATATTCATCATCATAGCAACAACATGAACAACGAGGACAATGATAATAACAATGGCAGTAATAATAAGGTTAGCAATGACAGTAAACTTGACTTCGTTACTGATGATCTTGAATACCATCTGGCGAACACTCATCCGGACGACACCAATGACAAAGTGGAGTCGAGAAGCAATGAGGTGAATGGGAACAATGACGATGATGCTGATGCCAACAACATTTTTAAACAGCAAGGTGTTACTATCAAGAACGACACTGAAGATGATTCGATAAATAAGGCCTCTAT
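The record above is in FASTA style: a header line starting with ">" followed by sequence. A first step from data toward a model is to read such records and summarize them. The sketch below is a minimal reader plus a base-composition summary; the function names are illustrative, and it assumes the header sits on its own line with the sequence on the following lines.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each '>' header (without the '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def base_composition(seq: str) -> Dict[str, float]:
    """Fraction of each letter in seq (empty dict for an empty sequence)."""
    counts = Counter(seq)
    return {base: n / len(seq) for base, n in counts.items()}


if __name__ == "__main__":
    # Only the first bases of the record above are used here.
    recs = parse_fasta([">YKL112W Chr 11", "ATGGACAAAT", "TAGTCG"])
    for name, seq in recs.items():
        print(name, len(seq), base_composition(seq))
```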

Many Related Computational Tasks…

Information is in the code book →
  How, and where, is alternative splicing determined?
  Build models for regulation of genes at different levels of complexity
  Relate genotype and phenotype: What are the expression patterns of some disease? How do they relate to sequence?
What model can explain the observations?
Can we predict phenomena based on our models?


Who came first?

Chicken or egg? Egg

DNA or Protein? RNA…

Thomas Cech & Sidney Altman (1980s!): RNA as an “independent” molecule, probably closer to the ancient “source”

RNA roles

Messenger RNA (mRNA)
  Encodes protein sequences
Transfer RNA (tRNA)
  Adaptor between mRNA molecules and amino acids (protein building blocks)
Ribosomal RNA (rRNA)
  Part of the ribosome, a machine for translating mRNA to proteins
...

Transfer RNA

Anticodon: matches a codon (triplet of mRNA nucleotides)

Attachment site: matches a specific amino acid

Related Computational Tasks

RNA secondary structure prediction: based on context-free grammars (CFG) and covariance models (CM)

RNA coding area prediction …
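As a point of reference for secondary structure prediction, here is a sketch of a simpler classic technique than the grammar-based methods named above: a Nussinov-style dynamic program that maximizes the number of nested base pairs. Allowing G-U wobble pairs and requiring at least three unpaired bases inside a hairpin loop are assumptions of this sketch, as is the function name.

```python
def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style dynamic program)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}  # Watson-Crick pairs plus G-U wobble
    n = len(rna)
    # best[i][j] = maximum number of pairs within rna[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j is left unpaired
            # case: j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

Counting pairs is a crude stand-in for stability; it is used here only because it keeps the recurrence easy to read.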

How Do Proteins Perform Their Roles?

Proteins interact in various ways
Proteins change conformations; conformation → function
Major issues:
  Their “active”/functional areas, which interact
  Their 3D structure

Protein Structure

Proteins are polypeptides of 70-3000 amino acids

A protein's structure is (mostly) determined by the sequence of amino acids that make it up


Protein Structure

Related Computational Tasks

Protein 2D, 3D structure prediction
Identify sequence motifs/domains in proteins
Sequence similarity vs. functional similarity

Course Goals

Review current tasks posed by modern molecular biology
Review and experiment with some of the tools/solutions currently available (e.g. BLAST, clustalw)
Gain some tools to handle such problems:
  Dynamic programming
  Probabilistic graphical models: MM, HMM, CM, trees
    Representation, the principles that justify them, learning, inference
  Statistical tools: how do we measure our confidence in our results?

Course’s Main Point

Learn to do: Define the problem → Find comp. solution

Four aspects:
Biological
  What is the task?
Algorithmic
  How to perform the task at hand efficiently?
Learning
  How to adapt the parameters of the task from examples
Statistics
  How to differentiate true phenomena from artifacts

Topics I

Dealing with DNA/Protein sequences:
  Genome projects and how sequences are found
  Finding similar sequences
  Models of sequences: Hidden Markov Models
  Transcription regulation
  Protein families
  Gene finding

Topics II

Gene Expression:
  Genome-wide expression patterns
  Data organization: clustering
  Reconstructing transcription regulation
  Recognizing and classifying cancers

Topics III

Models of genetic change:
  Long term: evolutionary changes among species
    Reconstructing evolutionary trees from current-day sequences
  Short term: genetic variations in a population
    Finding genes by linkage and association

Topics IV

Protein World:
  How proteins fold: secondary & tertiary structure
  How to predict protein folds from sequence data alone
  How to analyze protein changes from raw experimental measurements (MassSpec)
  2D gels

Class Structure

Two weekly meetings: Mondays 16-18 (Levin 8), Wednesdays 10-12 (Kaplan)

Grade:
  Homework assignments: ~50% of the final grade. There will be up to seven homework assignments. These will include theoretical problems, use of bioinformatics tools, and programming.
  Final home assignment: ~20% of the final grade.
  Final test: ~30% of the final grade.
  Class participation: a 5% bonus for students who actively participate in discussions during classes.
  Possible: an oral presentation of any exercise may be used to determine the grade!


Exercises & Handouts

Check regularly

http://www.cs.huji.ac.il/~cbio