
Post on 21-Mar-2018


UNIT-V: Databases in Bioinformatics

R. KAVITHA, M.PHARM
LECTURER,
DEPARTMENT OF PHARMACEUTICS
SRM COLLEGE OF PHARMACY
SRM UNIVERSITY

• Why?
• The different types of databases
• Database language: identifiers
• Nucleotide sequence databases
• Protein sequence databases
• 3D structure databases
• Ontologies

Databases in Bioinformatics

• Make biological data available to scientists
– Consolidation of data (gather data from different sources)
– Provide access to large datasets that cannot be published explicitly (genome, …)

• Make biological data available in computer-readable format
– Make data accessible for automated analysis

Bioinformatics: “a collective term for data compilation, organisation, analysis and dissemination”

Biological databases: Why?

The different types of Databases in Bioinformatics

1) Data:

Type of data:
• nucleotide sequences
• protein sequences
• 3D structures
• gene expression data
• metabolic pathways
• …

Data entry and quality control:
• data deposited directly
• curators add and update data
• treatment of erroneous data: removed, or marked
• error checking
• consistency, updates
• …

Primary or derived data:
• Primary databases: direct experimental results
• Secondary databases: result of analysis on primary databases
• Consolidation of many databases
• …
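The primary/secondary distinction above can be sketched in a few lines: a primary entry holds an experimentally determined sequence, and a secondary (derived) entry holds the result of an analysis run on it. The sketch below uses amino acid composition as the analysis. The accession `DEMO_1` and the six-residue peptide are made up for illustration and do not come from any real database.

```python
from collections import Counter


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each residue in a protein sequence."""
    residues = [r for r in sequence.upper() if r.isalpha()]
    if not residues:
        return {}
    counts = Counter(residues)
    total = len(residues)
    return {aa: n / total for aa, n in sorted(counts.items())}


# Hypothetical primary record: a made-up peptide, not a real database entry.
primary_entry = {"accession": "DEMO_1", "sequence": "MKKLLA"}

# Derived record: computed from the primary one, linked by the same accession.
derived_entry = {
    "accession": primary_entry["accession"],
    "composition": amino_acid_composition(primary_entry["sequence"]),
}
print(derived_entry)
```

Keeping the accession in the derived record is one simple way to trace a secondary entry back to the primary data it was computed from.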

The different types of Databases in Bioinformatics

2) Database:

Organisation:
• Flat files
• Relational databases
• Object-oriented databases
• …
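To make the difference between a flat file and a relational database concrete, here is a minimal sketch in Python. The record layout (a two-letter line code followed by a value, with `//` closing the entry) is a simplified, hand-written imitation of a Swiss-Prot style flat file, not real database output. The table name `protein` and its columns are assumptions chosen for this example; the values are the chicken triose phosphate isomerase entry quoted elsewhere in these slides.

```python
import sqlite3

# Simplified, hand-written record in the style of a Swiss-Prot flat file:
# a two-letter line code, then the value; "//" ends the entry.
FLAT_FILE = """\
ID   TPIS_CHICK
AC   P00940
OS   Gallus gallus
//
"""


def parse_flat_file(text):
    """Yield one dict per entry, mapping line code to value."""
    entry = {}
    for line in text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = {}
        elif line.strip():
            code, _, value = line.partition(" ")
            entry[code] = value.strip()


entries = list(parse_flat_file(FLAT_FILE))

# The same data in relational form: one row per entry, keyed by accession.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein ("
    "accession TEXT PRIMARY KEY, identifier TEXT, organism TEXT)"
)
conn.executemany("INSERT INTO protein VALUES (:AC, :ID, :OS)", entries)

row = conn.execute(
    "SELECT identifier, organism FROM protein WHERE accession = ?",
    ("P00940",),
).fetchone()
print(row)
```

The flat file has to be scanned line by line to find an entry, whereas the relational table can be queried directly by its key.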

Curators:
• Large, public institution (EMBL, NCBI)
• Quasi-academic institute (Swiss Institute of Bioinformatics, TIGR, …)
• Academic group or scientist
• Commercial company

Availability:
• Publicly available, no restriction
• Available, but with copyright
• Accessible, but not downloadable
• Academic, but not freely available
• Commercial

• Identifier: a string of letters and digits that is generally “understandable”
– Example: TPIS_CHICK (Triose Phosphate Isomerase from chicken, Gallus gallus) in SwissProt
– The identifier can change (at the curator's discretion)

• Accession code: a string of letters and digits that uniquely identifies an entry in its database
– The accession number for TPIS_CHICK in SwissProt is P00940
– The accession number should not change!

Identifiers and Accession numbers
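The practical consequence of this distinction is that anything which needs to refer to an entry over time should store the accession number, not the identifier. The toy class below is a sketch of that idea only; `EntryRegistry` and its methods are invented for this example and do not correspond to any real database API. The new name `TPI1_CHICK` is likewise hypothetical and only shows what a rename would look like.

```python
class EntryRegistry:
    """Toy registry: the accession is the stable key, the identifier is mutable."""

    def __init__(self):
        self._identifier_by_accession = {}

    def add(self, accession, identifier):
        """Register a new entry; an accession can be used only once."""
        if accession in self._identifier_by_accession:
            raise ValueError(f"accession {accession} already exists")
        self._identifier_by_accession[accession] = identifier

    def rename(self, accession, new_identifier):
        """A curator may change the identifier; the accession stays fixed."""
        if accession not in self._identifier_by_accession:
            raise KeyError(accession)
        self._identifier_by_accession[accession] = new_identifier

    def identifier(self, accession):
        """Look up the current identifier for an accession."""
        return self._identifier_by_accession[accession]


registry = EntryRegistry()
registry.add("P00940", "TPIS_CHICK")
# Hypothetical new name, for illustration only.
registry.rename("P00940", "TPI1_CHICK")
print(registry.identifier("P00940"))
```

A lookup by `P00940` keeps working after the rename, whereas a reference stored as `TPIS_CHICK` would have gone stale.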

• 3 main databases
– EMBL: www.ebi.ac.uk/embl
– GenBank: www.ncbi.nlm.nih.gov/GenBank
– DDBJ: www.ddbj.nig.ac.jp

The 3 databases are synchronized on a daily basis, and the accession numbers are consistent.

There are no legal restrictions on the use of these databases. However, some of the sequences they contain are patented.

Nucleotide Sequence Databases
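Because the three databases share accession numbers, checking whether they are in sync reduces to comparing sets of accessions. The sketch below shows only that comparison; it is not a description of how the actual daily exchange is implemented. The function name `missing_accessions` and the accession strings are made up for the example.

```python
def missing_accessions(mirrors):
    """Map each database name to the accessions it lacks relative to the union."""
    all_accessions = set().union(*mirrors.values())
    return {
        name: sorted(all_accessions - accessions)
        for name, accessions in mirrors.items()
        if all_accessions - accessions
    }


# Made-up accession strings, used only to exercise the function.
mirrors = {
    "EMBL": {"ACC0001", "ACC0002", "ACC0003"},
    "GenBank": {"ACC0001", "ACC0002", "ACC0003"},
    "DDBJ": {"ACC0001", "ACC0002"},
}

gaps = missing_accessions(mirrors)
print(gaps)
```

An empty result would mean that every database holds the same set of entries.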

Protein Sequence Databases

One of the first biological sequence databases was probably the book "Atlas of Protein Sequence and Structure" by Margaret Dayhoff and colleagues, first published in 1965. It contained the protein sequences determined at the time, and new editions of the book were published until 1978. It became the foundation of the PIR database.

http://pir.georgetown.edu/

Protein Information Resource

Protein Sequence Databases

http://www.expasy.ch/sprot/

The SWISS-PROT database has some legal restrictions: the entries are copyrighted, but freely accessible to academic researchers. Commercial companies must buy a license from SIB.

Amino Acid Composition

Size of SwissProt

SwissProt: Statistics

• PDB: http://www.rcsb.org
• SCOP: http://scop.berkeley.edu
• CATH: http://biochem.ucl.ac.uk/bsm/CATH
• ASTRAL: http://astral.berkeley.edu
• HOMSTRAD: http://www-cryst.bioc.cam.ac.uk/data/align/
• Interfaces to PDB:
– PDB at a glance: http://cmm.info.nih.gov/modeling/pdb_at_a_glance.html
– Molecules to go: http://molbio.info.nih.gov/cgi-bin/pdb/
– EBI interface: http://www.ebi.ac.uk/msd/
– PDBSum: http://www.ebi.ac.uk/thornton-srv/databases/pdbsum

Biomolecule Structure Database

• GO paper: Creating the Gene Ontology Resource: Design and Implementation. Genome Research (2001) 11:1425-1433
• The GO Website: http://www.geneontology.org
• Application of GO: The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res. 2003 Apr;13(4):662-72.

The Gene Ontology (GO)

GO Goals

From Genome Res 2001 Aug;11(8):1425-33

• Three levels of annotation:

– Molecular function - what a gene product does at the biochemical level

– Biological process - a broad biological perspective – not currently a pathway (no dynamics or dependencies)

– Cellular component - location within cellular structures (e.g. Golgi apparatus) and macromolecular complexes (e.g. ribosome)

Gene Ontology (GO)

Structure of GO

Example from molecular function:

Child: Transmembrane receptor protein tyrosine kinase
Parent (Is_a): Transmembrane receptor
Parent (Is_a): Protein tyrosine kinase
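In the example above, one child term has two parents, so GO is not a simple tree: a term can be reached along more than one Is_a path. A minimal sketch of that structure is a mapping from each term to its direct parents, plus a traversal that collects all ancestors. The dictionary `IS_A` and the function `ancestors` are names chosen for this illustration; only the three term names come from the slide.

```python
# Each term maps to its direct Is_a parents (term names from the example).
IS_A = {
    "transmembrane receptor protein tyrosine kinase": [
        "transmembrane receptor",
        "protein tyrosine kinase",
    ],
    "transmembrane receptor": [],
    "protein tyrosine kinase": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following Is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


child = "transmembrane receptor protein tyrosine kinase"
print(sorted(ancestors(child)))
```

The `seen` set matters once the graph grows: with multiple parents, the same ancestor can be reached by several routes and should be reported only once.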

Searching for papers…

http://scholar.google.com
