Welcome to the Bioinformatics Workshop
July 28, 2003
Introduction
Workshop objectives
Module 1: Retrieval of literature dealing with molecular life sciences
Module 2: Sequence databases and similarity searches
Module 3: Protein structure analysis
Workshop logistics
Course website (http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html)
PowerPoint presentation
In-class workshop
Definition of Bioinformatics
There are many definitions at the moment:
Use of computers to catalog and organize molecular life science information into meaningful entities.
Subset of Computational Biology
How can Bioinformatics help make scientific discoveries?
Bioinformatics is not just the storage of data in a computer.
Bioinformatics is the use of computers to test a biological hypothesis prior to performing the experiment in the laboratory.
Bioinformatics is the design of software programs that analyze data.
Basis of molecular biology
Hierarchy of relationships (a simplification: one gene can give rise to more than one protein, and one protein can have more than one function):
Genome
Gene 1, Gene 2, Gene 3, ..., Gene X
Protein 1, Protein 2, Protein 3, ..., Protein X
Function 1, Function 2, Function 3, ..., Function X
Genome Sizes (base pairs; number of genes shown where given)
Fern: 160,000,000,000
Lungfish: 139,000,000,000
Salamander: 81,300,000,000
Newt: 20,600,000,000
Onion: 18,000,000,000
Gorilla: 3,523,200,000
Mouse: 3,454,200,000
Human: 3,400,000,000 (31,000 genes)
Drosophila: 137,000,000 (13,500 genes)
C. elegans: 96,000,000 (19,000 genes)
Yeast: 12,000,000 (6,315 genes)
E. coli: 5,000,000 (5,361 genes)
Smallest genome: ?
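The genome sizes and gene counts above can be compared by computing gene density (genes per million base pairs). The short Python sketch below uses only the five organisms for which the slide gives a gene count; the names `GENOMES` and `genes_per_megabase` are illustrative, not from any library.

```python
# Genome sizes (base pairs) and gene counts, copied from the Genome Sizes slide.
GENOMES: dict[str, tuple[int, int]] = {
    "Human": (3_400_000_000, 31_000),
    "Drosophila": (137_000_000, 13_500),
    "C. elegans": (96_000_000, 19_000),
    "Yeast": (12_000_000, 6_315),
    "E. coli": (5_000_000, 5_361),
}


def genes_per_megabase(size_bp: int, gene_count: int) -> float:
    """Return the average number of genes per million base pairs."""
    return gene_count / (size_bp / 1_000_000)


if __name__ == "__main__":
    # Print one line per organism with its gene density.
    for name, (size_bp, genes) in GENOMES.items():
        print(f"{name:<12} {genes_per_megabase(size_bp, genes):8.1f} genes/Mb")
```

With these figures, E. coli packs roughly 1,072 genes into each megabase while the human genome carries about 9, so gene density drops by more than a hundredfold even though the human genome is far larger.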
What is the approach used to sequence genomes?
Divide and conquer:
Split the genome into fragments.
Clone the fragments into vectors that can accept large fragments, such as yeast artificial chromosomes (a YAC library).
Obtain landmarks within the genome using Sequence Tagged Sites (STSs).
Match the sequences of YAC clones with each other; sequences that overlap form contigs.
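The idea that overlapping sequences form contigs can be illustrated with a toy greedy assembler. This is a minimal sketch, not the method used by the genome projects: it assumes short, error-free fragments given as strings, exact suffix-to-prefix overlaps of at least `min_len` bases, and no fragment contained inside another. The functions `overlap` and `greedy_assemble` are hypothetical helpers written for this example.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept the hit only if the rest of a (its suffix) is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap; return the contigs."""
    contigs = list(fragments)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No remaining overlaps: whatever is left are separate contigs.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
```

For example, `greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"])` merges the four overlapping fragments into the single contig `"ATTAGACCTGCCGGAATAC"`, while fragments that share no overlap are returned as separate contigs. Greedy merging is simple but can be misled by repeated sequences, which is one reason landmarks such as STSs are useful.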
History of the Human Genome Project
1953
Watson and Crick: DNA structure
1972
Berg: first recombinant DNA
1977
Maxam, Gilbert, and Sanger: methods to sequence DNA
1980
Botstein, Davis, Skolnick, and White propose to map the human genome with RFLPs
1982
Wada proposes to build automated sequencing robots
1984
MRC publishes the first large genome, Epstein-Barr virus (170 kb)
1985
Sinsheimer hosts a meeting at UC Santa Cruz to discuss the HGP; Kary Mullis develops PCR
1986
DOE begins genome studies with $5.3 million
1987
Gilbert announces plans to start a company to sequence and copyright DNA; Burke, Olson, and Carle develop YACs; Donis-Keller publishes the first map (403 markers)
1987 (continued)
Hood produces the first automated sequencer; DuPont develops fluorescent dideoxynucleotides
1988
NIH supports the HGP; Watson heads the project and allocates part of the budget to study social and ethical issues
1989
Hood, Olson, Botstein, and Cantor propose using STSs to map the human genome
1990
Proposal to sequence 20 Mb in model organisms by 2005; Lipman and Myers publish the BLAST algorithm
1991
Venter announces a strategy to sequence ESTs and plans to patent partial cDNAs; Uberbacher develops GRAIL, a gene-finding program
1992
Simon develops BACs; US and French teams publish the first physical maps of chromosomes; the first genetic maps of the mouse and human genomes are published
1993
Collins is named director of the NCHGR; the plan is revised to complete the sequence of the human genome by 2005
1995
Venter publishes the first sequence of a free-living organism, H. influenzae (1.8 Mb); Brown publishes on DNA arrays
1996
Yeast genome is sequenced (S. cerevisiae)
1997
Blattner and Plunkett complete the E. coli sequence; a capillary sequencing machine is introduced
1998
The SNP project is initiated; the rice genome project is started; Venter creates a new company called Celera and proposes to sequence the human genome within 3 years; the C. elegans genome is completed
1999
NIH proposes to sequence mouse genome in 3 years; first sequence of chromosome 22 is announced
2000
Celera and others publish the Drosophila sequence (180 Mb); human chromosome 21 is completely sequenced; sequencing of the pufferfish genome is proposed; the Arabidopsis sequence is completed
2001
Celera publishes human sequence in Science; the HGP consortium publishes the human sequence in Nature
2003
Completed genomes: 112 microbial, 18 eukaryotic, 1,275 viral
Public funding vs. Private funding
Public: taxpayers’ money; an international effort.
Private: companies that invest money hope to provide access to their information on a fee basis. Celera also provides some information free of charge to small research groups.
Both groups published the sequence of the human genome in 2001.
Bioinformatics is Multidisciplinary
Computer Science
Math
Statistics
Structural Biology
Phylogenetics
Drug Design
Genomics
Molecular Biology
Bioinformatics at CSULA
Introduction to Bioinformatics (Chem 434), offered in Spring ’04
Upper-division standing in Biology or Biochemistry, plus one course in C/C++ or Perl programming (CIS 283, CS 201)
Upper-division standing in CS, IS, or CE, plus one course in Molecular Biology/Biochemistry or Chem/Biol 154L (W’04)
www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html
How is Bioinformatics Used?
Experimental proof is still the “Gold Standard”.
Bioinformatics isn’t going to replace lab work anytime soon
Bioinformatics is used to help “focus” the experiments of the benchtop scientist
What’s Left To Do?
[Figure: genes of unknown function]
Find out what the rest of the genome does.
Sequence genomes of other organisms
Analyze genes to predict function
Analyze interactions of gene products to create genetic networks
Start making changes
Modify gene expression patterns to make better crops or better medicines
Once this is finished, then what?
Increasing levels of complexity
Genome (DNA)
Transcriptome (RNA)
Proteome (proteins)
Metabolome (metabolic pathways)
Primary public domain bioinformatics servers
European Bioinformatics Institute (EBI), United Kingdom: databases and analysis tools
National Center for Biotechnology Information (NCBI), United States: databases and analysis tools
GenomeNet (KEGG & DDBJ), Japan: databases and analysis tools
Literature Databases and NCBI
Learning objective: How does one retrieve information on a particular subject?
National Center for Biotechnology Information (NCBI)
Databases outside of NCBI
Retrieval of information
Literature Databases
Medline (PubMed)
OMIM
CSULA Library
Other biological databases
BIOSIS
Agriculture: http://www.fao.org
Melvyl (books at UC libraries)
NCBI ENTREZ
A search engine that provides access and links between various databases
Databases linked through ENTREZ: PubMed, GenBank, protein databases, Genomes, PopSet, Taxonomy, OMIM
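ENTREZ can also be queried from a program instead of the web form. The Python sketch below assumes NCBI's E-utilities ESearch service (`esearch.fcgi` with the `db`, `term`, `retmax`, and `retmode=json` parameters) and its JSON reply, in which matching record IDs sit under `esearchresult` → `idlist`. The helper names `esearch_url` and `search_ids` are hypothetical, and NCBI's usage guidelines limit how many requests a client may send per second.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Base address of NCBI's E-utilities (assumed current endpoint).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that asks one Entrez database for up to retmax matching IDs."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def search_ids(db: str, term: str, retmax: int = 20) -> list[str]:
    """Run the search over the network and return the list of record IDs."""
    with urllib.request.urlopen(esearch_url(db, term, retmax)) as resp:
        data = json.load(resp)
    return data["esearchresult"]["idlist"]
```

For example, `search_ids("pubmed", "p53 AND apoptosis", retmax=5)` would return up to five PubMed IDs (network access required); the same call with `db="omim"` or another Entrez database name searches that database instead.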
Online Mendelian Inheritance in Man (OMIM)
A catalog of human genes linked to diseases
Begun by Victor A. McKusick at Johns Hopkins University
A good place to start when you want to know about a certain disease
This database is linked to PubMed, the OMIM Morbid Map, and the OMIM Gene Map
CSULA and other resources
The best way to access articles at Cal State LA is to obtain the exact reference from PubMed and then search the CSULA library database for the article: http://www.calstatela.edu/library/mudir1.htm
Publishers to search through at the CSULA Library site: ACS, Wiley InterScience, IDEAL
Two websites offer free access to journals:
PubMed Central: http://www.pubmedcentral.nih.gov/
BioMedNet: http://www.bmn.com/
How to keep up to date on your favorite subject?
Set up Cubby, an automatic retrieval system that searches PubMed and deposits the literature citations in your own account at no charge.
Demonstration of how Cubby works (requires a login).
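The core idea of an automatic retrieval system like Cubby is a stored query that is re-run periodically and delivers only citations you have not seen yet. Cubby's actual implementation is not described here, so the Python sketch below is a hypothetical model of that behavior: `SavedSearch` is an illustrative name, and the `search` argument stands in for any function that takes a query string and returns citation IDs (for example, a wrapper around a PubMed search).

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class SavedSearch:
    """Hypothetical stored query that remembers which citation IDs it has already delivered."""

    term: str
    delivered: set[str] = field(default_factory=set)

    def run(self, search: Callable[[str], Iterable[str]]) -> list[str]:
        """Re-run the stored query and return only IDs not delivered before, in result order."""
        # dict.fromkeys drops duplicate IDs within one result list while keeping their order.
        new_ids = [pmid for pmid in dict.fromkeys(search(self.term)) if pmid not in self.delivered]
        self.delivered.update(new_ids)
        return new_ids
```

Running the same `SavedSearch` twice against a search that first returns IDs `1` and `2`, and later `2` and `3`, delivers `["1", "2"]` and then only `["3"]`. Keeping the set of delivered IDs with the query is what lets repeated searches act as an alert rather than a full re-listing.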
Workshop Exercise 1: Retrieve information on a topic from literature databases, and set up a Cubby account for yourself.