GenBank, Swiss-Prot and Others

Posted on 05-Jan-2016



As Problem Sources for CSE 549
Andriy Tovkach

Genetics

GENBANK OVERVIEW

Comprises data from EMBL, NCBI and DDBJ
Started 10 years ago
Exponential growth (graph)
On Saturday, the 7th: 20.2 billion bases

FILE FORMAT

A record has three parts: header, features and sequence (see files)
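To make the three-part layout concrete, here is a minimal Python sketch that splits one GenBank flat-file record into header, feature table and sequence. It assumes a single well-formed record in which the header runs from LOCUS up to the FEATURES line, the feature table runs up to ORIGIN, and the sequence lines end at "//". The function name `split_genbank_record` and the toy record are hypothetical, made up for illustration; real files usually hold many records, for which a dedicated parser library is the safer choice.

```python
# Minimal sketch (assumption): one well-formed GenBank flat-file record.
# Header: from LOCUS up to FEATURES; features: from FEATURES up to ORIGIN;
# sequence: the numbered lines after ORIGIN, until the "//" terminator.

# Hypothetical toy record, invented for illustration only.
EXAMPLE_RECORD = """\
LOCUS       TOY0001                   20 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Hypothetical toy record for illustration.
FEATURES             Location/Qualifiers
     source          1..20
ORIGIN
        1 gatcctccat atacaacggt
//
"""


def split_genbank_record(text: str) -> dict:
    """Split one record into header lines, feature lines and the bare sequence."""
    header, features, seq_lines = [], [], []
    section = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "sequence"
            continue  # the ORIGIN keyword line itself carries no bases
        elif line.startswith("//"):
            break  # end of the record
        if section == "header":
            header.append(line)
        elif section == "features":
            features.append(line)
        else:
            seq_lines.append(line)
    # Each sequence line is "<position> <10-base block> <10-base block> ...":
    # drop the position number and the spaces between blocks.
    sequence = "".join("".join(line.split()[1:]) for line in seq_lines).upper()
    return {"header": header, "features": features, "sequence": sequence}


if __name__ == "__main__":
    parts = split_genbank_record(EXAMPLE_RECORD)
    print(len(parts["header"]), "header lines")
    print(len(parts["features"]), "feature lines")
    print(parts["sequence"])
```

Keying on the section keywords rather than on fixed line counts keeps the sketch independent of how long each header or feature table happens to be.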

FASTA FORMAT

A single-line description that begins with ">", followed by the sequence data
The sequence can be either protein or DNA
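A FASTA reader follows directly from that description: every line starting with ">" opens a new record, and the lines after it are joined into one sequence. The Python sketch below assumes plain, uncompressed text; the names `read_fasta` and `looks_like_dna` are hypothetical, and the DNA check is only a crude heuristic, since the format itself does not record whether a sequence is protein or DNA.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs from FASTA-formatted text."""
    description: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)  # a sequence may be wrapped over many lines
    if description is not None:
        yield description, "".join(chunks)


def looks_like_dna(sequence: str) -> bool:
    """Crude heuristic (assumption): only A, C, G, T or N means nucleotides."""
    return bool(sequence) and set(sequence.upper()) <= set("ACGTN")


# Hypothetical two-record input: one DNA and one protein sequence.
EXAMPLE_FASTA = """\
>seq1 hypothetical DNA fragment
ATGCGT
ACGT
>seq2 hypothetical peptide
MKTAYIAKQR
"""

if __name__ == "__main__":
    for desc, seq in read_fasta(EXAMPLE_FASTA):
        kind = "DNA" if looks_like_dna(seq) else "protein"
        print(desc, seq, kind)
```

A generator is used so that large files can be processed one record at a time instead of being held in memory as a whole.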

ENTREZ AS RETRIEVAL SYSTEM

PubMed: 12 million citations from life science journals
Nucleotide: collection of DNA sequences
Protein: protein sequences, including those from SwissProt
Genome: genomes of over 800 organisms
Also Structure, PopSet, Taxonomy, OMIM
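The slides only mention the Entrez website. As an assumption beyond the slides, the sketch below uses NCBI's E-utilities web interface, which exposes the Entrez databases to programs: ESearch finds the IDs of records matching a query, and EFetch downloads records, for example as FASTA or as GenBank flat files. The sketch only builds the request URLs and does not contact NCBI; the helper names `esearch_url` and `efetch_url` are hypothetical.

```python
from typing import Sequence
from urllib.parse import urlencode

# Base address of NCBI's E-utilities (assumption: not mentioned in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL returning IDs of records in `db` that match `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db: str, ids: Sequence[str], rettype: str, retmode: str = "text") -> str:
    """Build an EFetch URL downloading the given records (e.g. rettype "fasta" or "gb")."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # The database names correspond to Entrez divisions such as PubMed and Protein.
    print(esearch_url("pubmed", "genbank"))
    print(esearch_url("protein", "swissprot", retmax=5))
```

Requesting an EFetch URL with `rettype="gb"` for the nucleotide database should return records in the GenBank flat-file layout; NCBI publishes usage guidelines (including request-rate limits) that are worth checking before scripting large downloads.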

PROTEIN DATABASES

SWISS-PROT
EBI: TrEMBL
NCBI: GenPept (already in history)

GENOME DATABASES

SGD (Saccharomyces Genome Database): homepage, example 1.1, example 1.2

WormBase, Ensembl, Human Genome Browser

CONCLUSIONS

Sequencing projects produce a lot of data.
At the very least, these data have to be structured in databases.
Ideally, all sequences need high-quality human annotation.
That is why computer scientists are welcome in biology.

LITERATURE

Manpreet Katari, GenBank presentation (CSE 549, Fall 2000)

Thomas Lengauer (Ed.), Bioinformatics: From Genomes to Drugs

Entrez website

Google
