Introduction to biological database

Page 1: Introduction to biological database

1NTNU-SUN

Advanced Bioinformatics and Systems Biology 2008

Introduction to biological database

Lecturer: Dr. Chih-Wen Sun
Dept. of Life Sciences, NTNU

References:
- Molecular cell biology, 6th ed., Lodish et al. (2007)
- Various web resources

Page 2: Introduction to biological database

Bioinformatics
• Use or development of techniques (mathematics, informatics, statistics, computer science, chemistry, biochemistry) to solve biological problems
• Core principle: using computing tools and approaches to acquire, store, organize, archive, analyze, or visualize sequence and structure data
• Major research efforts
- Sequence alignment
- Gene finding
- Genome assembly
- Protein structure alignment and prediction
- Prediction of gene expression
- Prediction of protein-protein interaction
- Modeling of evolution

http://en.wikipedia.org/wiki/Bioinformatics

Page 3: Introduction to biological database

Systems biology
• Quantitative and systematic study of complex interactions in biological processes
• Biological systematics:
- Study of the diversity and relationships of life on planet Earth

http://en.wikipedia.org/wiki/systems_biology

Page 4: Introduction to biological database

Strategies to determine the function, location, and structure of gene products

[Figure from Molecular cell biology, 6th ed.; recoverable labels: protein-protein interaction, gene expression pattern]

Page 5: Introduction to biological database

Genomics: genome-wide analysis of gene structure and expression

Page 6: Introduction to biological database

DNA sequencing by dideoxy method

Molecular cell biology, 6th ed.

Page 7: Introduction to biological database

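[Figure from Molecular cell biology, 6th ed.: dideoxy sequencing ladder. Chain-terminated fragments of every length (TCCATGGACC, TCCATGGAC, TCCATGGA, TCCATGG, TCCATG, TCCAT, TCCA, TCC, TC, T) are separated on an electrophoresis gel, each being one of the many fragments of DNA migrating through the gel. The sequence is read by detection of fluorescent signals. Sequence label shown on the slide: CGCTTGACATCA.]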
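The fragment ladder on this slide can be mimicked in a few lines of Python. This is only a sketch of the idea: it assumes every possible chain-terminated fragment is present and that the gel separates fragments perfectly by length. The function names are made up for illustration.

```python
def chain_terminated_fragments(new_strand: str) -> list[str]:
    """Return every prefix of the newly synthesized strand, longest first.

    In the dideoxy method each fragment ends where a labeled ddNTP was
    incorporated, so the set of fragments is the set of prefixes.
    """
    return [new_strand[:n] for n in range(len(new_strand), 0, -1)]


def read_ladder(fragments: list[str]) -> str:
    """Order fragments by length (shortest migrates furthest in the gel)
    and read the terminal, fluorescently labeled base of each one."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


fragments = chain_terminated_fragments("TCCATGGACC")
for fragment in fragments:
    print(fragment)
print("Sequence read from ladder:", read_ladder(fragments))
```

Reading the last base of each fragment from shortest to longest recovers the strand that was synthesized.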

Page 8: Introduction to biological database

Three primary data banks
• GenBank: National Center for Biotechnology Information (NCBI) server, National Institutes of Health (NIH), Bethesda, Maryland, USA
• EMBL: European Bioinformatics Institute (EBI) server (Hinxton, UK), an outstation of the European Molecular Biology Laboratory (Heidelberg, Germany)
• DDBJ: DNA Data Bank of Japan, Mishima, Japan

Page 9: Introduction to biological database

Sequence comparison
• BLAST program: Basic Local Alignment Search Tool
- http://www.ncbi.nlm.nih.gov/BLAST/
- The BLAST algorithm divides the query sequence into shorter segments and then searches the database for significant matches to any of the stored sequences

Paste or import the query sequence that you want to compare
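The "shorter segments" step can be illustrated with a toy word search. This sketch covers only exact word matching. Real BLAST goes further: it scores similar words, extends hits into local alignments, and assesses statistical significance, none of which is modeled here. The function names, the word length of 3, and the two database sequences are made up for illustration.

```python
def words(sequence: str, k: int) -> list[str]:
    """Split a sequence into all overlapping segments (words) of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def seed_hits(query: str, database: dict[str, str], k: int = 3) -> dict:
    """Return {name: [(query_pos, subject_pos), ...]} for exact word matches."""
    hits = {}
    for name, subject in database.items():
        # Index every word of the stored sequence by its start positions.
        index = {}
        for j, word in enumerate(words(subject, k)):
            index.setdefault(word, []).append(j)
        found = [(i, j)
                 for i, word in enumerate(words(query, k))
                 for j in index.get(word, [])]
        if found:
            hits[name] = found
    return hits


# Hypothetical two-sequence "database".
database = {"seqA": "AATCCATGGTT", "seqB": "GGGGGGGG"}
print(seed_hits("TCCATGG", database))
```

Hits that fall on the same diagonal (constant difference between subject and query position) are the candidates a real search would try to extend into an alignment.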

Page 10: Introduction to biological database

Motifs and domains
• Motif: short sequence segment of a protein that is functionally important
• Domain: region of a protein with a distinct tertiary structure and characteristic activity
• If a protein shows no significant similarity to other proteins in a BLAST search, a search for motif similarity might still give clues

Molecular cell biology, 6th ed.
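A motif search can be sketched with a regular expression. The example uses the PROSITE N-glycosylation site pattern N-{P}-[ST]-{P} (N, then any residue except P, then S or T, then any residue except P), translated by hand into Python regex syntax. The protein sequence is invented for illustration, and a match only marks a candidate site, not a confirmed one.

```python
import re

# PROSITE pattern N-{P}-[ST]-{P} as a regular expression.
# The lookahead (?=...) lets overlapping sites be reported as well.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence: str, pattern: re.Pattern = N_GLYCOSYLATION):
    """Return (position, matched segment) pairs, with 1-based positions."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical protein sequence, not taken from any database.
print(find_motif("MKNGSAANPSANVTQ"))
```

The N at position 8 is skipped because it is followed by a proline, which the pattern excludes.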

Page 11: Introduction to biological database

Evolutionary relationship between genes
• Protein family: related protein sequences
• Gene family: corresponding genes of a protein family
• Gene homologs
- Orthologs: homologs that diverged through speciation
- Paralogs: homologs that diverged through gene duplication

Phylogenetic tree
Molecular cell biology, 6th ed.

Page 12: Introduction to biological database

Gene expression comparison
• To monitor the expression of a few genes in an organism during specific physiological responses or developmental processes
• To monitor the expression of thousands of genes simultaneously in an organism during specific physiological responses or developmental processes

Page 13: Introduction to biological database

DNA microarray
• Probe sources:
• Fix ssDNA to glass slides or membranes

Page 14: Introduction to biological database

DNA chip
• Probe sources:
• Fix ssDNA to glass slides

www.carleton.ca/catalyst/2006s/hms7.html

Page 15: Introduction to biological database

Microarray examples

[Figure from Molecular cell biology, 6th ed.: two-color microarray. Laser excitation: Cy5 at ~650 nm, Cy3 at ~550 nm. In the image overlay, spots are labeled as no changes, flower genes, or leaf genes.]
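How a two-color spot is called "changed" or "no change" can be sketched with a log ratio of the two channel intensities. The slide does not state which dye labels which tissue or what cutoff is used, so the two-fold threshold, the function name, and the intensity values below are assumptions for illustration only.

```python
import math


def classify_spot(cy5: float, cy3: float, threshold: float = 1.0) -> str:
    """Classify one spot from its Cy5 and Cy3 intensities.

    threshold is in log2 units, so 1.0 means a two-fold difference
    (an assumed cutoff, not one given in the lecture).
    """
    log_ratio = math.log2(cy5 / cy3)
    if log_ratio >= threshold:
        return "higher in Cy5-labeled sample"
    if log_ratio <= -threshold:
        return "higher in Cy3-labeled sample"
    return "no change"


# Hypothetical background-corrected intensities for three spots.
for cy5, cy3 in [(4000, 1000), (1000, 4000), (1200, 1000)]:
    print(cy5, cy3, classify_spot(cy5, cy3))
```

In an image overlay the three outcomes correspond to spots dominated by one dye, by the other dye, or by a roughly equal mix of both.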

Page 16: Introduction to biological database

Cluster analysis
• Cluster analysis groups sets of genes that exhibit similar expression changes or are co-regulated in a specific cellular process or pathway.
• This is very useful in analyzing microarray data.
• Software:

Gene expression profile at time intervals over a 24 h period after starved fibroblasts were provided with serum: A) cholesterol biosynthesis, B) the cell cycle, C) the immediate-early response, D) signaling and angiogenesis, E) wound healing and tissue remodeling

Molecular cell biology, 6th ed.
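The grouping idea can be sketched with a small hierarchical clustering routine. This is one possible approach (average linkage with 1 minus the Pearson correlation as distance), not necessarily the method behind the figure. The gene names and expression values are invented, and profiles with no variation are not handled.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def cluster_genes(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[gene] for gene in profiles]

    def distance(c1, c2):
        # Average linkage: mean pairwise distance between members.
        return mean(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]


# Hypothetical expression values at four time points.
profiles = {
    "geneA": [1, 2, 4, 8],
    "geneB": [2, 4, 8, 16],
    "geneC": [8, 4, 2, 1],
    "geneD": [16, 8, 4, 2],
}
print(cluster_genes(profiles, 2))
```

Genes whose expression rises together end up in one cluster and genes whose expression falls together in the other, regardless of absolute level.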

Page 17: Introduction to biological database

Strategies to determine the function, location, and structure of gene products

[Figure from Molecular cell biology, 6th ed.; recoverable labels: protein-protein interaction, gene expression pattern]

Page 18: Introduction to biological database

Proteomics: large-scale study of protein structures and functions

Page 19: Introduction to biological database

Molecular cell biology, 6th ed.

Protein localization

Page 20: Introduction to biological database

Determination of protein location
• Wet experiments
• Dry experiments
- ExPASy (Expert Protein Analysis System) server: Swiss Institute of Bioinformatics (SIB)

Page 21: Introduction to biological database

http://au.expasy.org/

ExPASy server

Page 22: Introduction to biological database

http://au.expasy.org/tools/

Page 23: Introduction to biological database

http://au.expasy.org/tools/

Page 24: Introduction to biological database

http://psort.ims.u-tokyo.ac.jp/

PSORT server

Page 25: Introduction to biological database

Determination of protein function
• Wet experiments
• Dry experiments
- ExPASy (or InterPro)
- BLAST
- Pfam (protein family)

Page 26: Introduction to biological database

http://au.expasy.org/

ExPASy server

Page 27: Introduction to biological database

http://au.expasy.org/sprot/
www.uniprot.org

Swiss-Prot server

Page 28: Introduction to biological database

http://au.expasy.org/prosite/

Prosite server

Page 29: Introduction to biological database

http://au.expasy.org/tools/

BLAST at ExPASy

Page 30: Introduction to biological database

BLAST at NCBI

Paste or import the query sequence that you want to compare

http://www.ncbi.nlm.nih.gov/BLAST/

Page 31: Introduction to biological database

Pfam server
• Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. For each family in Pfam you can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures

http://pfam.sanger.ac.uk/

Page 32: Introduction to biological database

Determination of protein structure
• Wet experiments
• Dry experiments
- ExPASy
- PDB (Protein Data Bank) server

Page 33: Introduction to biological database

http://au.expasy.org/tools/

Page 34: Introduction to biological database

http://www.rcsb.org/pdb/

PDB server

Page 35: Introduction to biological database

Protein-protein interaction
• Wet experiments
• Dry experiments
- APID (Agile Protein Interaction DataAnalyzer) and APID2NET (unified interactome graphic analyzer)
- cons-PPISP (consensus neural-network Protein-Protein Interaction Site Predictor)
- InterPreTS (Interaction Prediction through Tertiary Structure)
- InterProSurf (Prediction of functional sites in monomeric protein surface)
- PIP (Potential Interactions of Proteins)
- PRISM (Protein interaction by structure matching)
- SCOPPI (Structural Classification of Protein-Protein Interfaces)

http://en.wikipedia.org/wiki/Protein-protein_interaction_prediction

Page 36: Introduction to biological database

Genome projects of various eukaryotic organisms

Page 37: Introduction to biological database

Assembled genome database at NCBI

http://blast.ncbi.nlm.nih.gov/Blast.cgi

Page 38: Introduction to biological database

Examples of animal genome projects

Organism | Type | Genome size | Genes # | Organization | Year completed
Homo sapiens | Human | 3.2 Gb | 25000 | Human Genome Project Consortium and Celera Genomics | 2006 (2001)
Caenorhabditis briggsae | Nematode | 104 Mb | 19500 | Washington Univ., Sanger Inst. and Cold Spring Harbor Lab. | 2003
Apis mellifera | Honeybee | 1.8 Gb | 10157 | Honeybee Genome Sequencing Consortium | 2006
Takifugu rubripes | Pufferfish | 390 Mb | 22000-29000 | International Fugu Genome Consortium | 2002
Mus musculus | Mouse | 2.5 Gb | 24174 | International Collaboration for the Mouse Genome Sequencing | 2002
Drosophila melanogaster | Fruitfly | 165 Mb | 13600 | Celera, UC Berkeley, European DGP, Baylor College of Medicine | 2000

http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes
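One simple comparison that the genome size and gene number columns allow is gene density. The sketch below computes genes per Mb for three rows of the table. The helper names are made up, and the size parser assumes values written as a number followed by "Mb" or "Gb", as on the slide.

```python
def size_in_mb(size: str) -> float:
    """Convert a size such as '3.2 Gb' or '165 Mb' to megabases."""
    value, unit = size.split()
    factor = {"Mb": 1.0, "Gb": 1000.0}[unit]
    return float(value) * factor


def genes_per_mb(genes: int, size: str) -> float:
    """Gene density: number of genes divided by genome size in Mb."""
    return genes / size_in_mb(size)


# (gene count, genome size) taken from the table above.
genomes = {
    "Homo sapiens": (25000, "3.2 Gb"),
    "Mus musculus": (24174, "2.5 Gb"),
    "Drosophila melanogaster": (13600, "165 Mb"),
}
for organism, (genes, size) in genomes.items():
    print(f"{organism}: {genes_per_mb(genes, size):.1f} genes per Mb")
```

The calculation makes visible that gene number does not scale with genome size across these organisms.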

Page 39: Introduction to biological database

Examples of plant genome projects

Organism | Type | Genome size | Genes # | Organization | Year completed
Physcomitrella patens | Bryophyte | 500 Mb | 39458 | US Dept. of Energy Office of Science Joint Genome Inst. | 2008
Vitis vinifera | Grapevine | 490 Mb | 30434 | The French-Italian Public Consortium for Grapevine Genome Characterization | 2007
Populus trichocarpa | Poplar | 550 Mb | 45555 | The International Poplar Genome Consortium | 2006
Cyanidioschyzon merolae | Red alga | 16.5 Mb | 5331 | Univ. of Tokyo, Rikkyo Univ., Saitama Univ., Kumamoto Univ. | 2004
Oryza sativa ssp. japonica | Rice | 466 Mb | 46022-55615 | Syngenta and Myriad Genetics | 2002
Arabidopsis thaliana | Cress | 125 Mb | 27235 | Arabidopsis Genome Initiative | 2000

http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes

Page 40: Introduction to biological database

Organism-specific genome resources

http://www.ncbi.nlm.nih.gov/Genomes/

Page 41: Introduction to biological database

Organism-specific genome resources

http://www.ncbi.nlm.nih.gov/projects/genome/guide/cat/
http://www.ncbi.nlm.nih.gov/projects/genome/guide/dog/
http://www.ncbi.nlm.nih.gov/projects/genome/guide/pig/

Page 42: Introduction to biological database

Fly databases

http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=7227
http://flybase.bio.indiana.edu/
http://www.fruitfly.org/

Page 43: Introduction to biological database

Examples of UniGene identifiers

Animals
• Am for honey bee
• Bt for cow
• Dm for fruit fly
• Dr for zebrafish
• Hs for human
• Mm for mouse
• Rn for rat
• Xl for frog

Plants
• At for Arabidopsis
• Hv for barley
• Os for rice
• Ta for wheat
• Zm for maize
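The prefixes can be used to look up the organism of a UniGene cluster. The sketch assumes cluster identifiers of the form prefix, dot, number (for example Hs.2), a format that is not shown on the slide. It covers only the prefixes listed here, with Rn taken as rat (Rattus norvegicus), and the function name is hypothetical.

```python
UNIGENE_PREFIXES = {
    # Animals
    "Am": "honey bee", "Bt": "cow", "Dm": "fruit fly", "Dr": "zebrafish",
    "Hs": "human", "Mm": "mouse", "Rn": "rat", "Xl": "frog",
    # Plants
    "At": "Arabidopsis", "Hv": "barley", "Os": "rice", "Ta": "wheat",
    "Zm": "maize",
}


def organism_for(cluster_id: str) -> str:
    """Return the organism for an identifier such as 'Hs.2'."""
    prefix, dot, number = cluster_id.partition(".")
    if not dot or not number.isdigit():
        raise ValueError(f"not of the form prefix.number: {cluster_id!r}")
    if prefix not in UNIGENE_PREFIXES:
        raise ValueError(f"prefix not in this list: {prefix!r}")
    return UNIGENE_PREFIXES[prefix]


print(organism_for("Hs.2"))
print(organism_for("At.100"))
```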

Page 44: Introduction to biological database

General terms in GenBank
• Accession number
- 1 letter + 5 digits (e.g., M12345)
- 2 letters + 6 digits (e.g., AC123456)
• GenInfo identifier (GI)
- 1 or more digits
• Protein ID
- 3 letters + 5 digits (e.g., AAA35650)
• Version
- M12345.1
- M12345.2
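The formats above translate directly into regular expressions. The sketch below recognizes only the patterns listed on this slide. GenBank has other accession formats that are not covered, and the function name and labels are made up for illustration.

```python
import re

# Only the formats listed on the slide; other GenBank formats exist.
ACCESSION = r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})"
PATTERNS = [
    ("accession.version", re.compile(rf"^{ACCESSION}\.\d+$")),
    ("accession number", re.compile(rf"^{ACCESSION}$")),
    ("protein ID", re.compile(r"^[A-Z]{3}\d{5}$")),
    ("GI", re.compile(r"^\d+$")),
]


def classify_identifier(identifier: str) -> str:
    """Return which slide format an identifier matches, or 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.match(identifier):
            return label
    return "unknown"


for example in ["M12345", "AC123456", "M12345.2", "AAA35650", "4557284"]:
    print(example, "->", classify_identifier(example))
```

The version suffix is what distinguishes successive revisions of the same record, so M12345.1 and M12345.2 share one accession number.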

Page 45: Introduction to biological database

RefSeq accession numbers
• NT_123456: constructed genomic contigs
• NM_123456: mRNA
• NP_123456: proteins
• NC_123456: chromosomes
• XM_123456: predicted mRNA
• XP_123456: predicted protein
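The two-letter prefix is enough to tell what kind of record a RefSeq accession points to. The sketch covers only the six prefixes listed on this slide, although RefSeq defines others, and the function name is hypothetical.

```python
REFSEQ_PREFIXES = {
    "NT": "constructed genomic contig",
    "NM": "mRNA",
    "NP": "protein",
    "NC": "chromosome",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
}


def refseq_type(accession: str) -> str:
    """Return the record type for an accession such as 'NM_123456'."""
    prefix, underscore, number = accession.partition("_")
    if not underscore or not number:
        raise ValueError(f"not of the form XX_number: {accession!r}")
    return REFSEQ_PREFIXES.get(prefix, "prefix not listed on the slide")


for example in ["NM_123456", "XP_123456", "NC_123456"]:
    print(example, "->", refseq_type(example))
```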

Page 46: Introduction to biological database

Exercises on biological databases