bioinformatics as a approach to new generation of ...d93012/20041014.pdf2004/10/14 · introduction...
1Introduction 1.0
Bioinformatics As an Approach to a New Generation of Biological Studies
Lin, Chung-Yen, Ph.D. (林仲彥)
[email protected]@nhri.org.tw
Assistant Investigator, Division of Biostatistics and Bioinformatics, National Health Research Institutes
Introduction - Objectives
• Why does bioinformatics exist?
• What is bioinformatics?
• What are the big challenges in bioinformatics?
– Research
– Discipline differences between Bio and CS
Why Is There Bioinformatics?
• Lots of new sequences being added
– Automated sequencers
– Genome projects
– EST sequencing, microarray studies, proteomics
• Patterns in datasets that can be analyzed using computers
• Huge datasets
Need for Informatics in Biology: Origins
• Gramicidin S (Consden et al., 1947), partial insulin sequence (Sanger and Tuppy, 1951)
• 1961: tRNA fragments
• Francis Crick, Sydney Brenner, and colleagues propose the existence of transfer RNA that uses a three-base code and mediates the synthesis of proteins (Crick et al., 1961. General nature of the genetic code for proteins. Nature 192: 1227-1232). In Microbiology: A Centenary Perspective, edited by Wolfgang K. Joklik, ASM Press, 1999, p. 384.
• First codon assignment UUU/phe (Nirenberg and Matthaei, 1961)
Need for Informatics in Biology: Origins
• The key to the whole field of nucleic acid-based identification of microorganisms: the introduction of molecular systematics using proteins and nucleic acids by the American Nobel laureate Linus Pauling.
Zuckerkandl, E., and L. Pauling. "Molecules as Documents of Evolutionary History." 1965. Journal of Theoretical Biology 8:357-366
• Another landmark: Nucleic acid sequencing (Sanger and Coulson, 1975)
Need for Informatics in Biology: Origins
• First genomes sequenced:
– 3.5 kb RNA bacteriophage MS2 (Fiers et al., 1976)
– 5.4 kb bacteriophage ϕX174 (Sanger et al., 1977)
– 1.83 Mb: first complete genome sequence of a free-living organism, Haemophilus influenzae KW20 (Fleischmann et al., 1995)
– First multicellular organism to be sequenced: C. elegans (C. elegans Sequencing Consortium, 1998)
• Early databases: Dayhoff, 1972; Erdmann, 1978
• Early programs: restriction enzyme sites, promoters, etc… circa 1978.
• 1978 – 1993: Nucleic Acids Research published supplemental information
Genbank Doubles Every 16 Months
(from the National Center for Biotechnology Information)
A shorter doubling time than Moore’s law (computer power doubling every 20 months!)
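To make the comparison concrete, the short sketch below converts the two doubling times quoted on this slide (16 months for GenBank, 20 months for computer power) into growth factors over a chosen period. The function name `growth_factor` is hypothetical, and the calculation assumes smooth exponential growth, which is a simplification.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Return how many times larger a quantity becomes after `months`,
    assuming it doubles every `doubling_months` (idealized exponential growth)."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Doubling times taken from the slide: data 16 months, compute 20 months.
    for years in (1, 5, 10):
        months = 12 * years
        data = growth_factor(months, 16)
        compute = growth_factor(months, 20)
        print(f"{years:>2} yr: data x{data:.1f}, compute x{compute:.1f}, "
              f"gap x{data / compute:.2f}")
```

Under these assumptions, after 80 months the data has grown 32-fold while computer power has grown 16-fold, so the gap itself keeps widening.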
Today: So many genomes…
As of Oct 6, 2004, how many….
• published, complete genomes?
• eukaryotic genome projects in progress?
• prokaryote genome projects in progress?
Guess closest number without going over!
Today: The Human Genome Project
The genome sequence is complete - almost!
– approximately 3 billion base pairs.
The next step is obviously to locate all of the genes and regulatory regions, describe their functions, and identify how they differ between groups (e.g., “disease” vs. “healthy”). Bioinformatics plays a critical role.
Implications for Biomedicine and Bioinformatics
• Physicians will use genetic information to diagnose and treat disease.
– Virtually all medical conditions (other than trauma) have a genetic component
– Individualize drugs to reduce side effects
– Single Nucleotide Polymorphisms (SNPs)
• Faster drug development research
– More targets
– Faster clinical trials (selected trial populations)
• Most biologists will analyze gene sequence information in their daily work
Bioinformatics will help with DNA Sequencing
• Automated sequencers > 40,000 bp per day
• 500 bp reads must be assembled into complete sequences
– Detecting errors, especially insertions and deletions
• Data flow management
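The idea of assembling short reads into a longer sequence can be illustrated with a toy greedy overlap merger. This is only a sketch under strong assumptions: reads are error-free, all come from the same strand, and there are no repeats. The function names `overlap` and `greedy_assemble` are hypothetical, and production assemblers use far more robust methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (0 if the best overlap is shorter than `min_len`)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap.
    Returns the remaining contigs (one contig if everything joins up)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Made-up reads for illustration only.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Handling sequencing errors such as insertions and deletions, which the slide singles out, would require approximate (alignment-based) overlaps rather than the exact string matching used here.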
Bioinformatics will help with Similarity Searching Sequence Databases
• What is similar to my sequence?
• Searching gets harder as the databases get bigger - and quality changes
• Tools: BLAST and FASTA = time-saving heuristics (approximate methods)
• Statistics + informed judgement of the biologist
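One way to see why heuristic tools save time is the word-seeding idea: index every short word (k-mer) in the database once, then look up only the query's words instead of aligning the query against every sequence. The sketch below illustrates that idea only; it is not the actual BLAST or FASTA algorithm, and the function names and the tiny database are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database: dict, k: int) -> dict:
    """Map each k-mer to the set of database sequence names containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def seed_hits(query: str, index: dict, k: int):
    """Count how many of the query's k-mers occur in each database sequence.
    Returns (name, shared_kmer_count) pairs, best candidates first."""
    counts = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            counts[name] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    db = {
        "seqA": "ATGGCGTACCTGA",
        "seqB": "TTTTTTTTTTTTT",
        "seqC": "CCCATGGCGTTTT",
    }
    index = build_kmer_index(db, k=4)
    print(seed_hits("ATGGCGTAC", index, k=4))
```

A real search would then extend and score the candidate hits and attach a significance statistic, which is where the "statistics + informed judgement" bullet comes in.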
Bioinformatics will help with Structure-Function Relationships
• Can we predict the function of protein molecules from their sequence?
sequence > structure > function
• Prediction of some simple 3-D structures (α-helix, β-sheet, membrane spanning, etc.)
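As one illustration of predicting a simple structural feature from sequence alone, the sketch below computes a sliding-window hydropathy profile using the Kyte-Doolittle scale and flags windows hydrophobic enough to be candidate membrane-spanning segments. The slides do not name a method; the choice of this scale, the 19-residue window, and the 1.6 cutoff are assumptions taken from the conventional use of that scale, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19):
    """Average hydropathy of every window of `window` residues.
    Returns an empty list if the sequence is shorter than the window."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6):
    """Start positions (0-based) of windows whose mean hydropathy exceeds
    `threshold`; these are only candidates, not confirmed helices."""
    profile = hydropathy_profile(seq, window)
    return [i for i, score in enumerate(profile) if score > threshold]


if __name__ == "__main__":
    # Artificial sequence: charged flanks around a 19-residue leucine stretch.
    toy = "D" * 10 + "L" * 19 + "K" * 10
    print(candidate_tm_windows(toy))
```

A profile like this only suggests where a hydrophobic stretch lies; going from such hints to function is the harder "sequence > structure > function" problem the slide raises.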
Bioinformatics will help with Phylogenetics
• Can we define evolutionary relationships between organisms by comparing DNA sequences?
– What is the molecular clock?
– Lots of methods and software; what is the "correct" analysis?
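A minimal sketch of the first step in many sequence-based phylogenetic analyses: estimating the evolutionary distance between two aligned DNA sequences. It computes the raw proportion of differing sites (the p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which assumes all substitutions are equally likely. The slides do not prescribe a model; Jukes-Cantor is used here only because it is the simplest, and the function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ, ignoring gap positions ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p-distance too large: sequences are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Made-up aligned sequences for illustration only.
    p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
    print(f"p = {p:.3f}, JC distance = {jukes_cantor(p):.4f}")
```

Under a strict molecular clock, such corrected distances would grow roughly in proportion to divergence time. Different substitution models give different distances, which is one reason the "correct" analysis is hard to pin down.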
Top 10 Future Challenges for Bioinformatics
• Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome
• Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue
• Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli
• Determining effective protein:DNA, protein:RNA and protein:protein recognition codes
• Accurate ab initio protein structure prediction
Top 10 Future Challenges for Bioinformatics
• Rational design of small molecule inhibitors of proteins
• Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve
• Mechanistic understanding of speciation: molecular details of how speciation occurs
• Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein
• Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education
Reference: Chris Burge, Ewan Birney, Jim Fickett. Genome Technology, issue No. 17, January, 2002
What is Bioinformatics?
• Think – Pair – Share!
The Biologist in the Age of Information
The Job of the Biologist Is Changing
• As more biological information becomes available …
– The biologist will spend more time using computers
– The biologist will spend more time on data analysis
– Biology will become a more quantitative science (think how the periodic table and atomic theory affected chemistry)
The challenge: Putting it all together
• The current state of the art requires the biologist to jump around from Web to mainframe to personal computer
• The trend is for integration
• Real Power: Being able to use and customize all resources
The Computer Scientist in the Age of Genomics
How much biology to understand?
• Increasing sophistication required for computational biologists in terms of biological knowledge
• What knowledge is important? What about all those exceptions?
• What problems are important?
What Computational Tools to Understand?
• Perl is still used extensively in bioinformatics
• Open source is prevalent in bioinformatics (Linux, MySQL, BioPerl)
• Need to be knowledgeable about both the standard bioinformatics algorithms and common tools that are based on them
• Appreciate the different databases and programs out there and what their benefits and pitfalls are: databases have widely varying quality
High Quality Bioinformatics Research
Excellent Communication and Cooperation Between Biologists and Computer Scientists Are Key
The computer scientist and biologist compared
Computer scientist
• Logic
• Problem-solving
• Process-oriented
• Algorithmic
• Optimizing
Biologist
• Knowledge gathering
• Experimentally focused
• Exceptions are as common as rules
• Describe work as a story
• Develop conclusions and models
Computer Science vs Biology
The result…
• see the world differently
• ask different questions
• come to problems with different assumptions
• pick up on different details
• use different metaphors to organize knowledge
• have different sets of analytical tools at their disposal
• can even interact with people differently
Coming together
• Communicate constantly!
• Gain a better understanding of different ways of thinking
• Try communicating in different ways
• Remember there are others: statisticians, mathematicians, engineers, physicists, chemists, physiologists…
Thoughts for the day
• What is bioinformatics?
• Why does bioinformatics exist?
• How can I use bioinformatics more effectively in my career?
• Questions?
Real World Applications of bioinformatics
• 1. Molecular medicine
– 1.1 More drug targets
– 1.2 Personalised medicine
– 1.3 Preventative medicine
– 1.4 Gene therapy
• 2. Microbial genome applications
– 2.1 Waste cleanup
– 2.2 Climate change
– 2.3 Alternative energy sources
– 2.4 Biotechnology
– 2.5 Antibiotic resistance
– 2.6 Forensic analysis of microbes
– 2.7 The reality of bioweapon creation
– 2.8 Evolutionary studies
• 3. Agriculture
– 3.1 Crops
– 3.2 Insect resistance
– 3.3 Improve nutritional quality
– 3.4 Grow crops that are drought resistant and tolerate poorer soils
• 4. Animals
• 5. Comparative studies
BIOINFORMATICS INTRODUCTION
What is Bioinformatics
• The application of computer technology to the management of biological information
• Software applications used to gather, store, analyze and integrate biological information
• Databases and algorithms designed for the purpose of enhancing the process of biological research
What is Bioinformatics
• NCBI: “Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data including those to acquire, store, archive, analyze, or visualize such data.”
• Lincoln Stein: “Biologists using computers, or the other way around.”
“Hot” Bioinformatics Topics
• Gene Expression / Regulation
• Protein / RNA Structure
• Ontologies
• Genome Sequencing / Annotation
• Molecular Interactions
Where is Bioinformatics Used
• Pharmaceuticals
• Universities
• Biotech Companies
• Public Good / Health Research Institutes
• Hardware Manufacturers
• Government Agencies
THESE ARE OUR CLIENTS
Why Do We Need Bioinformatics
• Accessibility of biological data
• Data integration… at least within an organization
• Processing of data (data mining)
• Prediction and analysis
• Storage of mass amounts of data (high-throughput experiments)
DATA INTRODUCTION
How Much Data - GenBank
Source: NCBI
How Much Data - PDB
Source: RCSB
How Much Data - BIND
Source: Blueprint North America
How Much Data - PubMed
Source: Israel Institute of Technology
What Do We Do With All This Data?
• Design data structures to represent this information unambiguously
• Develop databases to house the data
• Develop accessible software to submit new data
• Develop fast applications to query the data
• Develop fast applications to analyze the data (data mining)
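As a small illustration of the first few bullets, the sketch below defines an explicit record type for a sequence entry, parses FASTA-formatted text into such records, and builds an in-memory lookup keyed by accession. The class and function names, the field layout, and the example accessions are all hypothetical; a real system would use an established library and a proper database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    """One unambiguous sequence entry: identifier, free-text description,
    and the sequence itself."""
    accession: str
    description: str
    sequence: str


def _make_record(header: str, chunks: list) -> SequenceRecord:
    # The first whitespace-delimited token of the header is the accession.
    accession, _, description = header.partition(" ")
    return SequenceRecord(accession, description, "".join(chunks).upper())


def parse_fasta(text: str) -> list:
    """Parse FASTA text ('>' header lines followed by sequence lines)."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


if __name__ == "__main__":
    # Made-up entries for illustration only.
    text = ">demo1 example gene\nATGGCG\nTACC\n>demo2\nttgaca\n"
    by_accession = {r.accession: r for r in parse_fasta(text)}
    print(by_accession["demo1"])
```

The dictionary lookup stands in for the "fast applications to query the data" bullet; at the scale the previous slides describe, that role is played by indexed databases rather than in-memory structures.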
APPLICATIONS INTRODUCTION
Bioinformatics Application Trends
• Web based GUI tool accessibility
• Data marts
• Web services
• Integration Services
• Pre-analyzed Data Services
Languages of Bioinformatics
• Perl
• Python
• Java
• C++
• C
• And More…
Today’s World of Bioinformatics
Note: This is not intended to be an exhaustive list of bioinformatics institutions
All Sorts of Bioinformatics Tools
Note: This is not intended to be an exhaustive list of bioinformatics tools