![Page 1: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/1.jpg)
Introduction to bioinformatics (I617)
Haixu Tang
School of Informatics
Email: [email protected]
Office: EIG 1008
Tel: 812-856-1859
![Page 2: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/2.jpg)
Textbook
• A Primer of Genome Science (2nd Edition) by Greg Gibson, Spencer V. Muse, Sinauer Associates, 2004
• Suggested reading materials will be posted on the class wiki page: http://cheminfo.informatics.indiana.edu/djwild/I617_2006_wiki/index.php/Main_Page
• Office hours: MW 11:00-12:00, EIG 1008, or by appointment
![Page 3: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/3.jpg)
Grading
• Class project: selected from one of the four covered areas (bioinformatics, chemical informatics, laboratory informatics and health informatics): 25%
– Suggested bioinformatics topics will be posted on the class wiki page
• Homework: 25% in bioinformatics
– 4 assignments, each 6.25%
![Page 4: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/4.jpg)
Bioinformatics = BIOlogy + informatics?
• Not really: it is a term (somewhat arbitrarily chosen) for a multi-disciplinary area that combines life sciences, physical sciences and computer science / informatics;
• It addresses biological problems using theoretical informatics approaches, not vice versa;
• It is transforming classical biology into an information science.
![Page 5: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/5.jpg)
The birth of bioinformatics
• A revolution in biology research: the emergence of Genome Science
• Technology advancement in both biology and information science
![Page 6: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/6.jpg)
Genome science: a revolution of biology
• Classical Biology: hypothesis-driven approach
• Genome Science: data-driven approach
(Figure: for each approach, a diagram relating Hypothesis, Data and Knowledge.)
![Page 7: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/7.jpg)
Bioinformatics: from data analysis to data mining
• Classical Biology: low throughput data
• Genome Science: high throughput data
(Figure: diagrams relating Hypothesis and Data in each case, with the labels “Hypothesis confirmation / rejection”, “Hypothesis generation” and “1 2 3 …”.)
![Page 8: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/8.jpg)
Bioinformatics: in the driver’s seat
• Classical Biology
• Genome Science
(Figure: for each, a diagram relating Hypothesis, Data and Knowledge, with the labels “Data analysis” and “Data mining”.)
![Page 9: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/9.jpg)
Key technology advancements
• High throughput biotechnologies
– Genome sequencing techniques
– DNA microarray
– Mass spectrometry
• Large-scale experiments
– HGP, HapMap
– Omics / Systems Biology
• Massive data generation, storage, exchange and analysis
– CPU, storage, etc.
– High speed network (Internet)
– Bioinformatics
![Page 10: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/10.jpg)
Bioinformatics: mutually beneficial
• For biologists
– Fragment assembly in genome sequencing
– Genome comparison
– Gene clustering in DNA microarray analysis
– Protein identification in proteomics
• For computer scientists
– String algorithms / Tree algorithms
– Alternative Eulerian path (BEST theorem)
– Reversal distances
– Probabilistic graphical models (HMMs, BNs, etc.)
![Page 11: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/11.jpg)
Two origins of bioinformatics
• Combinatorial pattern matching in theoretical computer science
– DNA and protein sequence analysis
• Physical and analytical chemistry of biomolecules
– Protein structure analysis → Structural bioinformatics
– Bio-analytical chemistry → Proteomics
![Page 12: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/12.jpg)
Bioinformatics addresses computational challenges in life and medical sciences
• New computational problems for automatic data analysis
• Reformulation of old problems using new high throughput data
• Formulating new problems using high throughput data
![Page 13: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/13.jpg)
Bioinformatics addresses computational challenges in life and medical sciences
• New computational problems for automatic data analysis
– Genome sequencing
– Proteomics
– Transcriptomics
• Data representation and visualization
– Genome Browser
• Solving biological problems by in silico approaches
– Reformulation of old problems using new high throughput data
  • Gene finding
  • Protein structure and function
– Formulating new problems using high throughput data
  • Comparative genomics
  • Polymorphisms / Population genetics
  • Systems Biology
![Page 14: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/14.jpg)
Bioinformatics resources
• Databases
– Nucleic Acids Research (NAR) annual database issue
• Organization
– ISCB (International Society for Computational Biology)
• Conferences
– ISMB
– RECOMB
– Many other smaller or regional conferences, e.g. ECCB, CSB, PSB, etc., including the local Indiana Bioinformatics conference
![Page 15: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/15.jpg)
A case study
• How does bioinformatics help and transform classical biological topics?
• Molecular evolutionary studies: from anatomical features to molecular evidence
• Genome evolution: comparison of gene orders
![Page 16: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/16.jpg)
Early Evolutionary Studies
• Anatomical features were the dominant criteria used to derive evolutionary relationships between species from Darwin's time until the early 1960s
• The evolutionary relationships derived from these relatively subjective observations were often inconclusive, and some were later proved incorrect
![Page 18: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/18.jpg)
Evolution and DNA Analysis: the Giant Panda Riddle
• For roughly 100 years scientists were unable to figure out which family the giant panda belongs to
• Giant pandas look like bears but have features that are unusual for bears and typical for raccoons, e.g., they do not hibernate
• In 1985, Steven O’Brien and colleagues solved the giant panda classification problem using DNA sequences and bioinformatics algorithms
![Page 20: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/20.jpg)
Evolutionary Tree of Bears and Raccoons
![Page 21: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/21.jpg)
Evolutionary Trees: DNA-based Approach
• 40 years ago: Emile Zuckerkandl and Linus Pauling brought reconstructing evolutionary relationships with DNA into the spotlight
• In the first few years after Zuckerkandl and Pauling proposed using DNA for evolutionary studies, the possibility of reconstructing evolutionary trees by DNA analysis was hotly debated
• It is now a dominant approach to studying evolution.
![Page 22: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/22.jpg)
Evolutionary Trees
How are these trees built from DNA sequences?
– leaves represent existing species
– internal vertices represent ancestors
– root represents the common evolutionary ancestor
![Page 24: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/24.jpg)
Rooted and Unrooted Trees
In an unrooted tree, the position of the root (the “common ancestor”) is unknown. Otherwise, unrooted trees are like rooted trees.
![Page 25: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/25.jpg)
Distances in Trees
• Edges may have weights reflecting:
– Number of mutations on the evolutionary path from one species to another
– Time estimate for the evolution of one species into another
• In a tree T, we often compute $d_{ij}(T)$, the length of the path between leaves i and j
$d_{ij}(T)$: tree distance between i and j
![Page 26: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/26.jpg)
Distance in Trees: an Example
$d_{1,4} = 12 + 13 + 14 + 17 + 12 = 68$
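The path-length computation in this example can be sketched in Python. The topology below is hypothetical, since the slide's figure is not available: only the five weights on the path from leaf 1 to leaf 4 (12, 13, 14, 17, 12) come from the example, while the internal vertex names `a` to `d`, the extra leaves 2 and 3, and their edge weights are assumptions made for illustration.

```python
from collections import defaultdict


def build_tree(edges):
    """Build an undirected weighted tree as {vertex: {neighbor: weight}}."""
    tree = defaultdict(dict)
    for u, v, w in edges:
        tree[u][v] = w
        tree[v][u] = w
    return tree


def tree_distance(tree, i, j):
    """Return the length of the unique path between vertices i and j."""
    stack = [(i, None, 0)]  # (vertex, vertex we came from, distance so far)
    while stack:
        v, parent, dist = stack.pop()
        if v == j:
            return dist
        for u, w in tree[v].items():
            if u != parent:  # a tree has no cycles, so skipping the parent suffices
                stack.append((u, v, dist + w))
    raise ValueError("no path: the vertices are not in the same tree")


# Hypothetical tree: leaves are 1..4, internal vertices are "a".."d".
edges = [
    (1, "a", 12), ("a", "b", 13), ("b", "c", 14), ("c", "d", 17), ("d", 4, 12),
    ("a", 2, 5), ("c", 3, 9),  # assumed extra leaves, not from the slide
]
tree = build_tree(edges)
print(tree_distance(tree, 1, 4))  # 12 + 13 + 14 + 17 + 12 = 68
```

Because any two vertices of a tree are joined by exactly one path, a plain depth-first search is enough; no shortest-path algorithm is needed.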
![Page 27: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/27.jpg)
Distance Matrix
• Given n species, we can compute the $n \times n$ distance matrix $D_{ij}$
• $D_{ij}$ may be defined as the edit distance between the copies of a gene in species i and species j, where the gene of interest is sequenced for all n species.
$D_{ij}$: edit distance between i and j
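As a minimal sketch of this definition, the Python below fills a distance matrix with pairwise edit distances. It assumes unit costs for substitution, insertion and deletion (the slide does not specify a scoring scheme), and the three short sequences are made up for illustration.

```python
def edit_distance(a, b):
    """Unit-cost edit (Levenshtein) distance between strings a and b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Symmetric n x n matrix of pairwise edit distances."""
    n = len(seqs)
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = edit_distance(seqs[i], seqs[j])
    return D


# Made-up sequences standing in for one gene sequenced in three species.
genes = ["ACGT", "AGT", "ACGA"]
D = distance_matrix(genes)
```

The matrix is symmetric with a zero diagonal, so only the entries above the diagonal are computed.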
![Page 28: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/28.jpg)
Fitting Distance Matrix
• Given n species, we can compute the $n \times n$ distance matrix $D_{ij}$
• Evolution of these genes is described by a tree that we don’t know.
• We need an algorithm to construct a tree that best fits the distance matrix $D_{ij}$
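The slide does not say how "best fits" is measured. One common choice, used here purely as an assumption, is the sum of squared differences between the tree distances and the matrix entries over all pairs of leaves. The sketch below takes the tree distances as a precomputed matrix `d`; both matrices are made-up examples.

```python
def squared_error(d, D):
    """Sum of squared differences between tree distances d and matrix D.

    Both arguments are symmetric n x n nested lists; each unordered
    pair of leaves is counted once.
    """
    n = len(D)
    return sum((d[i][j] - D[i][j]) ** 2
               for i in range(n)
               for j in range(i + 1, n))


D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]       # observed distances (made up)
d_exact = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]  # a tree that fits perfectly
d_off = [[0, 3, 5], [3, 0, 5], [5, 5, 0]]    # a tree that is off by 1 on one pair
```

An error of zero means the tree reproduces the matrix exactly; among candidate trees, a smaller error indicates a better fit under this criterion.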
![Page 29: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/29.jpg)
Reconstructing a 3-Leaved Tree
• Tree reconstruction for any $3 \times 3$ matrix is straightforward
• We have 3 leaves i, j, k and a center vertex c
Observe:
$d_{ic} + d_{jc} = D_{ij}$
$d_{ic} + d_{kc} = D_{ik}$
$d_{jc} + d_{kc} = D_{jk}$
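Adding the first two equations and subtracting the third gives 2·d_ic = D_ij + D_ik − D_jk, and the other two edge lengths follow by symmetry. The Python sketch below applies these formulas; the function name and the sample distances are illustrative, not from the slides.

```python
def three_leaf_tree(D_ij, D_ik, D_jk):
    """Solve the three equations for the edge lengths to the center vertex c.

    Returns (d_ic, d_jc, d_kc).
    """
    d_ic = (D_ij + D_ik - D_jk) / 2
    d_jc = (D_ij + D_jk - D_ik) / 2
    d_kc = (D_ik + D_jk - D_ij) / 2
    return d_ic, d_jc, d_kc


# Illustrative distances: D_ij = 3, D_ik = 4, D_jk = 5.
d_ic, d_jc, d_kc = three_leaf_tree(3, 4, 5)
# Check against the original equations:
# d_ic + d_jc = 1 + 2 = 3, d_ic + d_kc = 1 + 3 = 4, d_jc + d_kc = 2 + 3 = 5
```

The edge lengths are non-negative exactly when the three distances satisfy the triangle inequality.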
![Page 30: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/30.jpg)
Turnip vs Cabbage: Look and Taste Different
• Although cabbages and turnips share a recent common ancestor, they look and taste different
![Page 31: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/31.jpg)
Turnip vs Cabbage: Comparing Gene Sequences Yields No Evolutionary Information
![Page 32: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/32.jpg)
Turnip vs Cabbage: Almost Identical mtDNA gene sequences
• In the 1980s Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip
• 99% similarity between genes
• These surprisingly similar gene sequences differed in gene order
• This study helped pave the way to analyzing genome rearrangements in molecular evolution
![Page 33: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/33.jpg)
Turnip vs Cabbage: Different mtDNA Gene Order
• Gene order comparison:
(Figure: gene order before and after rearrangement.)
Evolution is manifested as divergence in gene order
![Page 38: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/38.jpg)
Transforming Cabbage into Turnip
Reversal distance
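The reversal distance is the minimum number of segment reversals needed to turn one gene order into another. The Python sketch below finds it by brute-force breadth-first search, which is only practical for a handful of genes; it is not one of the efficient algorithms from the genome rearrangement literature. It also assumes unsigned gene orders (gene orientation is ignored), and the gene orders used are made up rather than the real cabbage and turnip orders.

```python
from collections import deque


def reversal_distance(source, target):
    """Minimum number of reversals turning gene order source into target.

    Brute-force BFS over all gene orders; feasible only for small inputs.
    """
    source, target = tuple(source), tuple(target)
    if sorted(source) != sorted(target):
        raise ValueError("gene orders must contain the same genes")
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        order, steps = queue.popleft()
        if order == target:
            return steps
        n = len(order)
        for i in range(n):
            for j in range(i + 1, n):
                # Reverse the segment order[i..j] (length at least 2).
                nxt = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))


# Made-up gene orders: one reversal of the middle block is enough.
print(reversal_distance([1, 2, 3, 4, 5], [1, 4, 3, 2, 5]))  # 1
```

Breadth-first search visits gene orders in increasing number of reversals, so the first time the target is reached the step count is minimal.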
![Page 39: Introduction to bioinformatics (I617)](https://reader035.vdocuments.site/reader035/viewer/2022062316/56813277550346895d991080/html5/thumbnails/39.jpg)
History of Chromosome X
Rat Consortium, Nature, 2004