Welcome to the Bioinformatics Workshop

July 28, 2003

Introduction

Workshop objectives

Module 1: Retrieval of literature dealing with molecular life sciences

Module 2: Sequence databases and similarity searches

Module 3: Protein structure analysis

Workshop logistics

Course website (http://www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html)

PowerPoint presentation

In-class workshop

Definition of Bioinformatics

There are many definitions at the moment:

Use of computers to catalog and organize molecular life science information into meaningful entities.

Subset of Computational Biology

How can Bioinformatics help make scientific discoveries?

Bioinformatics is not just the storage of data in a computer.

Bioinformatics is the use of computers to test a biological hypothesis prior to performing the experiment in the laboratory.

Bioinformatics is the design of software programs that analyze data.

Basis of molecular biology

Hierarchy of relationships (not exactly true):

Genome

Gene 1 Gene 2 Gene 3 Gene X

Protein 1 Protein 2 Protein 3 Protein X

Function 1 Function 2 Function 3 Function X

Genome Sizes

Organism    Genome size (bp)   Genes
Fern         160,000,000,000
Lungfish     139,000,000,000
Salamander    81,300,000,000
Newt          20,600,000,000
Onion         18,000,000,000
Gorilla        3,523,200,000
Mouse          3,454,200,000
Human          3,400,000,000  31,000
Drosophila       137,000,000  13,500
C. elegans        96,000,000  19,000
Yeast             12,000,000   6,315
E. coli            5,000,000   5,361

Smallest genome: ?
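The genome-size list above shows that genome size does not track gene count. A minimal Python sketch that makes the contrast explicit by dividing each listed gene count by genome size in megabases. The figures are copied from the slide (only organisms with a listed gene count are included); "genes per Mb" is an illustrative metric chosen here, not one used on the slide.

```python
# Genome sizes (bp) and gene counts copied from the genome-size slide.
# Only organisms with a listed gene count are included.
GENOMES = {
    "Human": (3_400_000_000, 31_000),
    "Drosophila": (137_000_000, 13_500),
    "C. elegans": (96_000_000, 19_000),
    "Yeast": (12_000_000, 6_315),
    "E. coli": (5_000_000, 5_361),
}


def genes_per_mb(size_bp: int, genes: int) -> float:
    """Return the number of genes per megabase (1 Mb = 1,000,000 bp)."""
    return genes / (size_bp / 1_000_000)


if __name__ == "__main__":
    # Print gene density for each organism, largest genome first.
    for name, (size_bp, genes) in GENOMES.items():
        print(f"{name:12s} {genes_per_mb(size_bp, genes):8.1f} genes/Mb")
```

With these figures the human genome comes out at about 9 genes per Mb, while E. coli packs about 1,072 genes into each Mb.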

What is the approach used to sequence genomes?

Divide and conquer: split the genome into fragments, then clone the fragments into vectors that can accept large fragments, such as yeast artificial chromosomes (a YAC library).

Landmarks within the genome can be obtained using sequence-tagged sites (STSs).

Sequences of YAC clones are matched with each other. Sequences that overlap form contigs.
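The last step, joining overlapping sequences into contigs, can be illustrated with a toy greedy merge: repeatedly find the pair of fragments with the longest suffix/prefix overlap and join them, stopping when no pair overlaps by at least a minimum length. This is a simplified sketch, not the procedure actually used for YAC/STS mapping; it assumes short error-free fragments, no fragment contained inside another, and hypothetical function names (`overlap`, `greedy_contigs`).

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` long.
    """
    start = 0
    while True:
        # Look for b's first `min_len` characters somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The leftmost hit whose tail matches b's start is the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_contigs(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge fragments into contigs (toy sketch; see assumptions)."""
    contigs = list(fragments)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        # Find the ordered pair (a, b) with the longest overlap.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                olen = overlap(a, b, min_len)
                if olen > best_len:
                    best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Hypothetical fragments: three overlap, one does not.
    print(greedy_contigs(["ATGGCGT", "GCGTACG", "TACGGAT", "CCCTTT"]))
```

Here the first three fragments chain into one contig, `ATGGCGTACGGAT`, while `CCCTTT` shares no overlap and stays a separate contig, which is how gaps between contigs arise.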

History of the Human Genome Project

1953

Watson, Crick: DNA structure

1972

Berg: first recombinant DNA

1977

Maxam, Gilbert, Sanger sequence DNA

1980

Botstein, Davis, Skolnick, White propose to map the human genome with RFLPs

1982

Wada proposes to build automated sequencing robots

1984

MRC publishes the first large genome: Epstein-Barr virus (170 kb)

1985

Sinsheimer hosts meeting to discuss the HGP at UC Santa Cruz; Kary Mullis develops PCR

1986

DOE begins genome studies with $5.3 million

1987

Gilbert announces plans to start a company to sequence and copyright DNA; Burke, Olson, Carle develop YACs; Donis-Keller publishes the first genetic map (403 markers)

History of the Human Genome Project (continued)

1987 (cont)

Hood produces the first automated sequencer; DuPont develops fluorescent dideoxynucleotides

1988

NIH supports the HGP;Watson heads the project and allocates part of the budget to study social and ethical issues

1989

Hood, Olson, Botstein, Cantor propose using STSs to map the human genome

1990

Proposal to sequence 20 Mb in model organisms by 2005; Lipman, Myers publish the BLAST algorithm

1991

Venter announces a strategy to sequence ESTs and plans to patent partial cDNAs; Uberbacher develops GRAIL, a gene-finding program

1992

Simon develops BACs; US and French teams publish the first physical maps of chromosomes; the first genetic maps of the mouse and human genomes are published

1993

Collins is named director of NCHGR; the plan is revised to complete sequencing of the human genome by 2005

1995

Venter publishes the first sequence of a free-living organism: H. influenzae (1.8 Mb); Brown publishes on DNA arrays

1996

Yeast genome is sequenced (S. cerevisiae)

History of the Human Genome Project (continued)

1997

Blattner, Plunkett complete the E. coli sequence; a capillary sequencing machine is introduced

1998

SNP project is initiated; rice genome project is started; Venter creates a new company called Celera and proposes to sequence the human genome within 3 years; C. elegans genome is completed

1999

NIH proposes to sequence mouse genome in 3 years; first sequence of chromosome 22 is announced

2000

Celera and others publish the Drosophila sequence (180 Mb); human chromosome 21 is completely sequenced; proposal to sequence the puffer fish; Arabidopsis sequence is completed

2001

Celera publishes human sequence in Science; the HGP consortium publishes the human sequence in Nature

2003

Completed genomes: 112 microbial, 18 eukaryotic, 1,275 viral

Public funding vs. Private funding

Public: taxpayers' money, international effort.

Private: companies invest money and hope to provide access to their information on a fee basis. Celera also provides some information free of charge to small research groups.

Both groups published the sequence of the human genome in 2001.

Bioinformatics is Multidisciplinary

Computer Science

Math

Statistics

Structural Biology

Phylogenetics

Drug Design

Genomics

Molecular Biology

Bioinformatics at CSULA

Introduction to Bioinformatics (Chem 434), offered in Spring '04

Prerequisites for students with upper-division standing in Biology or Biochemistry: one course in C/C++ or Perl programming (CIS 283, CS 201)

Prerequisites for students with upper-division standing in CS, IS, or CE: one course in molecular biology/biochemistry, or Chem/Biol 154L (W'04)

www.calstatela.edu/faculty/jmomand/Bioinformaticscourse.html

How is Bioinformatics Used?

Experimental proof is still the “Gold Standard”.

Bioinformatics isn’t going to replace lab work anytime soon

Bioinformatics is used to help "focus" the experiments of the benchtop scientist

[Figure: genes of unknown function]

What's left to do? Find out what the rest of the genome does.

What is left to do?

Sequence genomes of other organisms

Analyze genes to predict function

Analyze interactions of gene products to create genetic networks

Start making changes

Modify gene expression patterns to make better crops or better medicines

Once this is finished, then what?

Increasing levels of complexity

Genome (DNA)

Transcriptome (RNA)

Proteome (proteins)

Metabolome (metabolic pathways)

Primary public domain bioinformatics servers

Public Domain Bioinformatics Facilities

European Bioinformatics Institute (EBI), United Kingdom: databases and analysis tools

National Center for Biotechnology Information (NCBI), United States: databases and analysis tools

GenomeNet (KEGG & DDBJ), Japan: databases and analysis tools

Literature Databases and NCBI

Learning objective: How does one retrieve information on a particular subject?

National Center for Biotechnology Information (NCBI)

Databases outside of NCBI

Retrieval of information

Literature Databases

Medline (PubMed)

OMIM

CSULA Library

Other biological databases: BIOSIS; Agriculture (http://www.fao.org); Melvyl (books at UC libraries)

NCBI ENTREZ

A search engine that provides access and links between various databases
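Entrez searches can also be issued programmatically through NCBI's E-utilities, whose `esearch.fcgi` endpoint takes a database name (`db`), a query (`term`) and an optional cap on returned IDs (`retmax`). A minimal Python sketch that only builds such a request URL and sends nothing over the network; the helper name `esearch_url` and the query "p53 AND mdm2" are hypothetical examples.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an Entrez esearch URL for database `db` and query `term`.

    `retmax` limits how many record IDs the search returns.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode escapes spaces as '+' and joins parameters with '&'.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("pubmed", "p53 AND mdm2", 5))
    print(esearch_url("omim", "breast cancer"))
```

The same function covers any Entrez database, for example `pubmed` for literature or `omim` for OMIM; opening the resulting URL (e.g. with `urllib.request.urlopen`) returns the matching record IDs as XML.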

ENTREZ

PubMed, GenBank, Protein databases, Genomes, PopSet, Taxonomy, OMIM

Online Mendelian Inheritance in Man (OMIM)

A catalog of human genes linked to diseases, begun by Victor A. McKusick at Johns Hopkins University. It is a good place to start when you want to know about a certain disease. This database is linked to PubMed, the OMIM Morbid Map, and the OMIM Gene Map.

CSULA and other resources

The best way to access articles at Cal State LA is to obtain the exact reference from PubMed, then search the CSULA library database for the article: http://www.calstatela.edu/library/mudir1.htm

Publishers to search through at the CSULA Library site: ACS, Wiley InterScience, IDEAL

There are two websites that offer free access to journals:

PubMed Central: http://www.pubmedcentral.nih.gov/

BioMedNet: http://www.bmn.com/


How to keep up to date on your favorite subject?

Set up Cubby, an automatic retrieval system that searches PubMed and deposits the literature citations in your own account (free of charge). Demonstration of how Cubby works. Requires a login.

Workshop Exercise 1: Retrieve information on a topic from literature databases. Set up a Cubby account for yourself.
