What is Bioinformatics?
■ “Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.” - NCBI
■ “The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.” - NCBI
http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458
DNA Sequencing
CHAIN TERMINATOR
⇒ A 3’ hydroxyl group is essential for chain elongation
⇒ A dideoxynucleotide lacks the 3’ hydroxyl group, so incorporating one terminates the chain
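Why chain termination yields a readable sequence can be illustrated with a toy simulation. The sketch below is a simplification, not a model of the real chemistry: it assumes every position of the newly synthesized strand produces exactly one terminated fragment. The function names `terminated_fragments` and `read_from_fragments` are made up for illustration.

```python
def terminated_fragments(new_strand):
    """Return every prefix of the synthesized strand.

    Each prefix stands for a chain that stopped growing because a
    dideoxynucleotide (no 3' hydroxyl) was incorporated as its last base.
    """
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_from_fragments(fragments):
    """Sort fragments by length and read the terminal (labelled) base of each."""
    return "".join(frag[-1] for frag in sorted(fragments, key=len))


fragments = terminated_fragments("GATTACA")
# Fragment order does not matter: sorting by length restores the base order.
sequence = read_from_fragments(reversed(fragments))
print(sequence)  # GATTACA
```

Sorting by length plays the role of the gel, which separates fragments by size.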
Capillary Gel Electrophoresis
⇒ The sequencing reaction is run out in a single capillary gel.
⇒ The gel is scanned by a laser.
⇒ The sequence is read automatically using computer software from the pattern of different wavelengths emitted by the fluorescent dyes.
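The automatic read-out described above can be sketched as picking, for each peak, the dye channel with the strongest signal. This is a minimal sketch and not the software used by any real sequencer; the channel order in `CHANNEL_BASES`, the `min_ratio` cutoff, and the intensity values are all invented for illustration.

```python
# Hypothetical assignment of the four dye channels to bases.
CHANNEL_BASES = ("A", "C", "G", "T")


def call_bases(peaks, min_ratio=2.0):
    """Call one base per peak from four channel intensities.

    A peak is called 'N' when the strongest channel is not at least
    `min_ratio` times stronger than the second strongest one.
    """
    calls = []
    for intensities in peaks:
        ranked = sorted(range(4), key=lambda ch: intensities[ch], reverse=True)
        best, second = ranked[0], ranked[1]
        if intensities[best] >= min_ratio * intensities[second]:
            calls.append(CHANNEL_BASES[best])
        else:
            calls.append("N")
    return "".join(calls)


peaks = [
    (900, 40, 30, 20),   # clear signal in channel 0
    (35, 50, 880, 45),   # clear signal in channel 2
    (400, 390, 20, 10),  # two channels nearly equal: ambiguous
    (10, 25, 30, 950),   # clear signal in channel 3
]
print(call_bases(peaks))  # AGNT
```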
Automated sequencers: ABI 3700
■ Made by Applied Biosystems
■ Most widely used automated sequencers:
– 96 capillaries
– robot loading from 384-well plates
■ Two to three hours per run
■ 600–700 bases per read (one read per capillary per run)
Figure labels: 96-well plate; robotic arm and syringe; 96 glass capillaries; load bar
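A rough throughput estimate follows from the figures on this slide. The calculation assumes that the 600–700 base figure applies to each capillary read and that the instrument runs back to back for 24 hours, which ignores loading and maintenance time.

```python
def bases_per_run(capillaries, read_length):
    """Bases produced by one run if every capillary yields one read."""
    return capillaries * read_length


def runs_per_day(hours_per_run):
    """Whole runs that fit into 24 hours of continuous operation."""
    return int(24 // hours_per_run)


low = bases_per_run(96, 600) * runs_per_day(3)   # slowest runs, shortest reads
high = bases_per_run(96, 700) * runs_per_day(2)  # fastest runs, longest reads
print(low, high)  # 460800 806400
```

Under these assumptions one instrument yields roughly half a megabase to just under one megabase per day.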
Workflow of conventional vs. second-generation sequencing
■ High-throughput shotgun Sanger sequencing: 96 or 384 long reads per run
■ Cyclic array shotgun sequencing: millions of short reads per run
[Figure: workflow steps shown: template immobilization; template amplification (in parentheses in one workflow); Sanger cycle sequencing; capillary electrophoresis; sequencing by synthesis or hybridization]
Illumina
Figure from M. Metzker, Nat Rev Genet, Jan. 2010
Cost of Sequence per megabase
Benefits of Next-gen sequencing
https://genomevolution.org/wiki/images/1/16/Plant_Genome_Growth.png
Why do we sequence?
■ Genome Annotation: A complete genome sequence provides us with the raw data to construct a "parts list".
■ Comparative Genomics: Conserved regions in the genome are more likely to play an important role in the biology of the species.
■ Functional Genomics: Sequencing the RNA gives us insight into the transcriptionally active regions of the genome.
■ Population Genetics and Genomics: Genetic structure and diversity reveal the history and distribution of phenotypic traits (e.g. disease susceptibility alleles).
■ Genetic Analysis: Map and characterize the molecular basis of allelic variants.
We have the genome sequence, now what?
● Well...
● We don’t know how many genes there are!
● We don’t know where they are!
● We don’t know what they do!
Definitions of Annotation
■ Interpreting raw sequence data into useful biological information
■ Information attached to genomic coordinates (a start and an end point); this can occur at different levels
■ Addition of as much reliable and up-to-date information as possible to describe a sequence
■ Identification, structural description, characterization of putative protein products and other features in primary genomic sequence
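Annotation as "information attached to genomic coordinates" is commonly stored as one record per feature. The sketch below parses a single line in the nine-column, tab-separated GFF3 layout; the `Feature` class, the `parse_gff3_line` helper, and the example line (including the gene ID) are hypothetical and only meant to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    seqid: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str
    attributes: dict = field(default_factory=dict)

    def length(self):
        return self.end - self.start + 1


def parse_gff3_line(line):
    """Parse one GFF3 feature line (9 tab-separated columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns")
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return Feature(seqid, ftype, int(start), int(end), strand, attributes)


line = "chr1\texample\tgene\t1300\t9000\t.\t+\t.\tID=gene0001;Name=demo"
gene = parse_gff3_line(line)
print(gene.type, gene.start, gene.end, gene.length(), gene.attributes["ID"])
# gene 1300 9000 7701 gene0001
```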
Genome annotation: Two Main Levels
• Structural annotation = nucleotide- and protein-level annotation. Finding genes and other biologically relevant sites, thus building up a model of the genome as objects with specific locations.
• Functional annotation = the objects are used in database searches (and experiments); the aim is to attribute biologically relevant information to the whole sequence and to individual objects.
Large-scale genome analysis projects
• Rate-limiting step is annotation
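As a toy illustration of structural annotation, the sketch below scans the forward strand for open reading frames (an ATG followed by the first in-frame stop codon). This is a deliberate simplification: real gene finders also handle the reverse strand, introns, and statistical signals. The function name `find_orfs`, the `min_codons` parameter, and the test sequence are made up.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    `min_codons` counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop.
                for j in range(i + 3, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, frame))
                        i = j  # resume scanning after this ORF
                        break
            i += 3
    return orfs


dna = "CCATGGCTGAAAAGTAACCATGA"
print(find_orfs(dna))  # [(2, 17, 2)]
```

Each reported ORF is an "object with a specific location" in the sense used above: a start, an end, and a reading frame.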
How do we get from here … to here?
Summary of gene annotation steps
Gene prediction through comparative genomics
■ Highly similar (conserved) regions between two genomes are likely to be functional; otherwise they would have diverged
■ If genomes are too closely related, all regions are similar, not just genes
■ If genomes are too far apart, analogous regions may be too dissimilar to be found
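The idea can be sketched as a sliding-window percent identity over a pairwise alignment. The sketch assumes the two sequences are already aligned to equal length (gaps written as "-"); the window size, the threshold, and the two sequences are illustrative and not values from the slides.

```python
def window_identity(aln_a, aln_b, window=10, step=5):
    """Yield (start, fraction_identical) for each window of an alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    for start in range(0, len(aln_a) - window + 1, step):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        yield start, matches / window


def conserved_windows(aln_a, aln_b, threshold=0.9, **kwargs):
    """Return start positions of windows at or above the identity threshold."""
    return [s for s, ident in window_identity(aln_a, aln_b, **kwargs)
            if ident >= threshold]


# Two invented sequences: identical flanks, fully diverged middle.
seq_a = "ATGGCCAAGT" + "TGACCTTAGG" + "CATCGATTAC"
seq_b = "ATGGCCAAGT" + "ACTGGAATCC" + "CATCGATTAC"
print(conserved_windows(seq_a, seq_b))  # [0, 20]
```

With two nearly identical genomes every window would pass the threshold, and with very distant genomes none would, which mirrors the two caveats in the list above.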
Mouse-human comparison
From: J.W. Thomas et al., Nature, 14 August 2003
The ENCODE Project Consortium (2011) A User’s Guide to the Encyclopedia of DNA Elements (ENCODE). PLoS Biol 9(4)
[Figure labels: Automated; Manual; Merged]
Basic Distributed Annotation Systems (DAS)
Contents of an Integrated Database
■ Genome and Functional Annotation: predicted genes, GO, MIPS FunCat
■ Data to support modeling efforts: protein-protein interactions, protein-DNA interactions, pathways (KEGG, AraCyc)
■ Experimental Data: microarray, ChIP-chip
Bioinformaticians integrate the data into one database
1) Find the data. Decentralized databases; data in different formats.
2) Convert to a common format. XML is a good idea (SBML).
3) Data integration. Manual: Excel sheet comparisons (biologists). Automated: Perl scripts (informaticians). Database: queries, e.g. SQL (high-production labs).
4) Gene list intersect: experiments, function, models.
5) Modeling biological function in the gene list. Needs visualization and network modeling tools.
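The gene list intersect in step 4 can be sketched with plain set operations once every source uses the same identifier format. The gene identifiers and the three lists below are invented for illustration, and `normalize` is a hypothetical stand-in for the "convert to a common format" step.

```python
def normalize(gene_id):
    """Bring identifiers to one format: trimmed, upper case."""
    return gene_id.strip().upper()


def intersect_gene_lists(*gene_lists):
    """Return the sorted identifiers present in every list."""
    if not gene_lists:
        return []
    sets = [{normalize(g) for g in genes} for genes in gene_lists]
    return sorted(set.intersection(*sets))


experiments = ["geneA", "geneB ", "geneC", "geneD"]
function = ["GENEB", "geneC", "geneE"]
models = ["genec", "geneB", "geneF"]

print(intersect_gene_lists(experiments, function, models))
# ['GENEB', 'GENEC']
```

The same result could be obtained with an SQL `INTERSECT` query once the lists are loaded into database tables; sets are used here only to keep the sketch short.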
Annotation
UCSC browser
Examples of Large Genome Projects
■ 1000 Genomes Project (www.1000genomes.org). An effort to sequence the genomes of 1000 people to identify genetic variants present in at least 1% of the populations studied.
■ 1001 Arabidopsis thaliana Genomes Project (www.1001genomes.org). Studies the genomes and phenotypes of 1001 strains to explain differences in phenotype caused by adaptation to different conditions.
■ Metagenomics (http://commonfund.nih.gov/hmp/): Sequencing of DNA samples from environments, for example the mouth, skin, and digestive system, to identify the different bacterial species present.
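For the 1000 Genomes item above, one common reading of a 1% threshold is an allele-frequency cutoff. The sketch below assumes diploid individuals and genotypes coded as the number of alternate alleles (0, 1, or 2); the variant names and counts are invented and are not data from any of these projects.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency for diploid genotypes coded 0, 1, or 2."""
    if not genotypes:
        raise ValueError("no genotypes given")
    return sum(genotypes) / (2 * len(genotypes))


def common_variants(variant_genotypes, min_freq=0.01):
    """Names of variants whose alternate allele reaches `min_freq`."""
    return [name for name, gts in variant_genotypes.items()
            if allele_frequency(gts) >= min_freq]


# 1000 hypothetical individuals per variant.
variants = {
    "var1": [1] * 30 + [0] * 970,            # 30 / 2000 = 0.015
    "var2": [1] * 10 + [0] * 990,            # 10 / 2000 = 0.005
    "var3": [2] * 5 + [1] * 10 + [0] * 985,  # 20 / 2000 = 0.010
}
print(common_variants(variants))  # ['var1', 'var3']
```

With 1000 diploid individuals there are 2000 chromosome copies, so a variant needs at least 20 copies to reach 1%.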
Your genome
■ Personal Genome Sequencing: Several companies provide a service where you can submit your DNA to be sequenced. This can help you learn more about your heritage and about which diseases you are susceptible to.
■ Medical Genomic Studies: There is already a collection of genetic testing procedures that look for specific genes. Unfortunately, they are not always accurate, which can lead individuals to make bad decisions. The hope is that with more genes, we can make better and more informed decisions.