cbio course, spring 2005, Hebrew University
Computational Methods In Molecular Biology
CS-67693, Spring 2005
School of Computer Science & Engineering
Hebrew University, Jerusalem
Introduction
What is computational biology? Why is it great?
What are the aims and basic concepts of this course?
High-level biological review: basic biological background and motivation for the tasks handled in the course
Administration…
DNA Components
Four nucleotide types: Adenine (A), Guanine (G), Cytosine (C), Thymine (T)
Hydrogen bonds: A-T, C-G
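The A-T and C-G pairing rule means that one strand determines the other. A minimal sketch in Python, assuming an upper- or lower-case string over A, C, G, T; the names `COMPLEMENT` and `reverse_complement` are illustrative and not part of the course material:

```python
# Pairing rule from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in its own 5'->3' direction."""
    # The two strands run in opposite directions, so the input is reversed
    # before each base is replaced by its pairing partner.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGGAC"))  # GTCCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.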
Genome Sizes
E. coli (bacteria)          4.6 x 10^6 bases
Yeast (simple fungi)        15 x 10^6 bases
Smallest human chromosome   50 x 10^6 bases
Entire human genome         3 x 10^9 bases
Related Computational Tasks
We need a way to reconstruct a DNA sequence from fragments: a major contribution of computational biology!
Related: sequence comparison, sequence alignment
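To make the fragment-reconstruction task concrete, here is a toy sketch in Python that repeatedly merges the two fragments with the longest suffix-prefix overlap. It is only an illustration under strong assumptions (error-free fragments, all from the same strand, no fragment contained inside another, no repeats); it is not the method used by real genome projects. The names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example fragments are all made up for this sketch.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return frags  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


# Hypothetical fragments, given out of order.
print(greedy_assemble(["TTAGTCG", "ATGGACAA", "ACAAATTAG"]))
```

The greedy choice can go wrong when the sequence contains repeats, which is one reason the overlap step is closely tied to the sequence comparison and alignment tasks listed above.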
Genes
The DNA strings include:
Coding regions (“genes”)
  E. coli has ~4,000 genes
  Yeast has ~6,000 genes
  C. elegans has ~13,000 genes
  Humans have ~32,000 genes
Control regions
  These are typically adjacent to the genes
  They determine when a gene should be expressed
“Junk” DNA (unknown function)
Evolution
Related organisms have similar DNA
  Similarity in sequences of proteins
  Similarity in organization of genes along the chromosomes
Evolution plays a major role in biology
  Many mechanisms are shared across a wide range of organisms (e.g. orthologs)
  During the course of evolution, existing components are adapted for new functions (e.g. paralogs)
Evolution
Evolution of new organisms is driven by:
Diversity
  Different individuals carry different variants of the same basic blueprint
Mutations
  The DNA sequence can change through single-base changes, deletion or insertion of DNA segments, etc.
Selection bias
Related Computational Tasks
Phylogeny (not just theory!):
  Rebuild the tree of life…
  Infer relations between genes, pathways, etc. across species
  Learn models for changes and development
Major benefit: exploit the information we do have or observe to make inferences about systems for which we have very little knowledge and few observations
How Do Genes Code for Proteins?
DNA → (transcription) → RNA → (translation) → Protein
Transcription
Coding sequences can be transcribed to RNA
RNA nucleotides:
  Similar to DNA, with a slightly different backbone
  Uracil (U) instead of Thymine (T)
Source: Mathews & van Holde
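At the level of sequence strings, the T-to-U substitution can be sketched in a couple of lines of Python. This assumes the input is the coding (sense) strand written 5'->3', so the mRNA carries the same sequence with U in place of T; the function name `transcribe` is illustrative only, and the sketch ignores everything else that happens during transcription.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGACAAATTAG"))  # AUGGACAAAUUAG
```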
Translation
The ribosome attaches to the mRNA at a translation initiation site
The ribosome then moves along the mRNA sequence and, in the process, constructs a poly-peptide
When the ribosome encounters a stop signal, it releases the mRNA. The constructed poly-peptide is released and folds into a protein.
Translation is mediated by the ribosome
  The ribosome is a complex of protein and rRNA molecules
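The reading process described above can be mimicked on a string: find a start codon, read three bases at a time, and stop at a stop codon. The Python sketch below uses the standard genetic code and makes a simplifying assumption that translation starts at the first AUG in the sequence, which is not always true in a real cell. The names `CODON_TABLE` and `translate` are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string into one-letter amino-acid codes.

    Assumption: translation begins at the first AUG and ends at the first
    in-frame stop codon (or when fewer than three bases remain).
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation site found
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop signal: the poly-peptide is released
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGGACAAAUAAGGG"))  # MDK
```

A character outside A, C, G, U raises a `KeyError` here; a more forgiving version might map unknown codons to a placeholder such as `X`.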
The Central Dogma
DNA → (transcription) → RNA → (translation) → Protein
[Figure labels: Genes, Experiments]
Eukaryotic Transcription Regulation
“Classical Model”
  The composition of the promoter region determines the rate of transcription initiation
  Combinations of transcription factors (TFs) control the transcription of gene sets under specific conditions
[Figure labels: TFs, basal promoter, RNA polymerase II, transcription start site, gene, mRNA, 5’ and 3’ ends]
From Data to Model
>YKL112W Chr 11 ATGGACAAATTAGTCGTGAATTATTATGAATACAAGCACCCTATAATTAATAAAGACCTGGCCATTGGAGCCCATGGAGGCAAAAAATTTCCCACCTTGGGTGCTTGGTATGATGTAATTAATGAGTACGAATTTCAGACGCGTTGCCCTATTATTTTAAAGAATTCGCATAGGAACAAACATTTTACATTTGCCTGTCATTTGAAAAACTGTCCATTTAAAGTCTTGCTAAGCTATGCTGGCAATGCTGCATCCTCAGAAACCTCATCTCCTTCTGCAAATAATAATACCAACCCTCCGGGTACTCCTGATCATATTCATCATCATAGCAACAACATGAACAACGAGGACAATGATAATAACAATGGCAGTAATAATAAGGTTAGCAATGACAGTAAACTTGACTTCGTTACTGATGATCTTGAATACCATCTGGCGAACACTCATCCGGACGACACCAATGACAAAGTGGAGTCGAGAAGCAATGAGGTGAATGGGAACAATGACGATGATGCTGATGCCAACAACATTTTTAAACAGCAAGGTGTTACTATCAAGAACGACACTGAAGATGATTCGATAAATAAGGCCTCTAT
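The record above follows the FASTA convention: a header beginning with `>` that carries an identifier, followed by the sequence itself. A first step from data towards a model is simply reading such records and computing summary statistics. The Python sketch below assumes the header and sequence sit on separate lines, as in a regular FASTA file; `parse_fasta` and `gc_content` are illustrative names, and the example uses only the first few bases of the record.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word of the header line.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {ident: "".join(parts) for ident, parts in records.items()}


def gc_content(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


example = ">YKL112W Chr 11\nATGGAC\nAAATTAG\n"
for ident, seq in parse_fasta(example).items():
    print(ident, len(seq), round(gc_content(seq), 3))
```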
Many Related Computational Tasks…
Information is in the code book:
  How is alternative splicing determined, and where?
  Build models for regulation of genes at different levels of complexity
  Relate genotype and phenotype: What are the expression patterns of some disease? How do they relate to sequence?
What model can explain the observations? Can we predict phenomena based on our models?
Who came first?
Chicken or egg? Egg
DNA or Protein? RNA…
Thomas Cech & Sidney Altman (1980s!):
  RNA as an “independent” molecule
  Probably closer to the ancient “source”
RNA roles
Messenger RNA (mRNA)
  Encodes protein sequences
Transfer RNA (tRNA)
  Adaptor between mRNA molecules and amino-acids (protein building blocks)
Ribosomal RNA (rRNA)
  Part of the ribosome, a machine for translating mRNA to proteins
...
Transfer RNA
Anticodon: matches a codon (triplet of mRNA nucleotides)
Attachment site: matches a specific amino-acid
Related Computational Tasks
RNA secondary structure prediction: based on context-free grammars (CFG) and covariance models (CM)
RNA coding area prediction …
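As a warm-up for the grammar-based methods, the Python sketch below uses a much simpler technique: Nussinov-style dynamic programming, which only maximizes the number of base pairs in a nested structure. It is not the CFG or CM approach named above. The pairing set (Watson-Crick plus G-U wobble), the `min_loop` parameter, and the function name `max_base_pairs` are assumptions of this sketch.

```python
# Allowed pairs: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in `seq` (Nussinov-style DP).

    `min_loop` is the minimum number of unpaired bases enclosed by any pair.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))  # 3
```

The recursion has the same shape as parsing with a small context-free grammar, where a pair `k`-`j` encloses an independent sub-structure.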
How do Proteins Perform their Roles?
Proteins interact in various ways
Proteins change conformations; conformation → function
Major issues:
  Their “active” or functional areas, which interact
  Their 3D structure
Protein Structure
Proteins are poly-peptides of 70-3000 amino-acids
A protein's structure is (mostly) determined by the sequence of amino-acids that make up the protein
Related Computational Tasks
Protein 2D, 3D structure prediction
Identify sequence motifs/domains in proteins
Sequence similarity vs. functional similarity
Course Goals
Review current tasks posed by modern molecular biology
Review and experiment with some of the tools/solutions currently available (e.g. BLAST, ClustalW)
Gain some tools to handle such problems:
  Dynamic programming
  Probabilistic graphical models: MM, HMM, CM, Trees
    Representation, the principles that justify them, learning, inference
  Statistical tools: how do we measure confidence in our results?
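As a taste of the dynamic programming listed among the tools, here is a minimal Python sketch of global sequence alignment scoring in the Needleman-Wunsch style. The scoring values (match +1, mismatch -1, gap -2) are arbitrary choices for illustration, and the function name `global_alignment_score` is hypothetical.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of `a` and `b` with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of `a`
    # with b[:j]; only one previous row is needed to compute the score.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 1
```

Keeping only two rows gives the score in memory linear in the length of `b`; recovering the alignment itself would require storing the full table or traceback pointers.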
Course’s Main Point
Learn to do:
Define the problem → Find a computational solution
Four aspects:
Biological
  What is the task?
Algorithmic
  How to perform the task at hand efficiently?
Learning
  How to adapt the parameters of the task from examples?
Statistics
  How to differentiate true phenomena from artifacts?
Topics I
Dealing with DNA/Protein sequences:
  Genome projects and how sequences are found
  Finding similar sequences
  Models of sequences: Hidden Markov Models
  Transcription regulation
  Protein families
  Gene finding
Topics II
Gene Expression:
  Genome-wide expression patterns
  Data organization: clustering
  Reconstructing transcription regulation
  Recognizing and classifying cancers
Topics III
Models of genetic change:
  Long term: evolutionary changes among species
    Reconstructing evolutionary trees from current-day sequences
  Short term: genetic variations in a population
    Finding genes by linkage and association
Topics IV
Protein World:
  How proteins fold: secondary & tertiary structure
  How to predict protein folds from sequence data alone
  How to analyze protein changes from raw experimental measurements (MassSpec)
  2D gels
Class Structure
Two weekly meetings:
  Mondays 16-18 (Levin 8), Wednesdays 10-12 (Kaplan)
Grade:
  Homework assignments: ~50% of the final grade. There will be up to seven homework assignments. These assignments will include theoretical problems, use of bioinformatics tools, and programming.
  Final home assignment: ~20% of the final grade.
  Final test: ~30% of the final grade.
  Class participation: a 5% bonus for students who actively participate in discussions during classes.
  Possible: an oral presentation of any exercise to determine the grade!