GENBANK, SWISSPROT AND OTHERS
As Problem Sources for CSE 549
Andriy Tovkach
Genetics
GENBANK OVERVIEW
Consists of EMBL, NCBI and DDBJ
Started 10 years ago
Exponential growth (graph)
On Saturday, the 7th: 20.2 billion bases
FILE FORMAT
Header
Features
Sequence
(see files)
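The three parts of a record listed above can be separated with a few lines of Python. This is a minimal sketch, assuming the usual flat-file layout in which the feature table starts at a line beginning with FEATURES, the sequence starts after a line beginning with ORIGIN, and the record ends with //. The function name split_genbank_record and the SAMPLE record are hypothetical and made up for illustration; real records contain more header fields and may not fit this simple split.

```python
def split_genbank_record(text):
    """Split one GenBank flat-file record into (header, features, sequence).

    Assumes the common layout: header lines first, then a FEATURES block,
    then the sequence after an ORIGIN line, terminated by '//'.
    """
    header, features, sequence = [], [], []
    section = header
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if line.startswith("FEATURES"):    # feature table begins here
            section = features
        elif line.startswith("ORIGIN"):    # sequence lines follow this line
            section = sequence
            continue
        section.append(line)
    # Sequence lines look like "        1 gatcctccat ...": keep letters only.
    seq = "".join(ch for line in sequence for ch in line if ch.isalpha())
    return "\n".join(header), "\n".join(features), seq


# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE = """LOCUS       EXAMPLE1        24 bp    DNA     linear   UNK
DEFINITION  Hypothetical example record.
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="Example organism"
ORIGIN
        1 gatcctccat atacaacggt atct
//
"""

header, features, seq = split_genbank_record(SAMPLE)
print(len(seq), seq)
```

For real work, an established parser such as the one in Biopython is likely a better choice than a hand-written split like this.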
FASTA FORMAT
Single-line description begins with >
Followed by sequence data
Can be either protein or DNA
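The FASTA rules above are simple enough to parse by hand. The sketch below is one possible reader, not a reference implementation: parse_fasta, guess_alphabet and the EXAMPLE text are hypothetical names and data invented for illustration, and the DNA/protein guess is only a crude heuristic based on which letters appear.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    description, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        if line.startswith(">"):           # a new record starts here
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        elif description is not None:
            chunks.append(line)            # sequence may span several lines
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def guess_alphabet(sequence):
    """Crude heuristic: only A, C, G, T, N means DNA, anything else protein."""
    return "DNA" if set(sequence.upper()) <= set("ACGTN") else "protein"


# Hypothetical example data, not taken from any database.
EXAMPLE = """>seq1 hypothetical DNA example
GATCCTCCAT
ATACAACGGT
>seq2 hypothetical protein example
MKVLAAGIVG
"""

for description, sequence in parse_fasta(EXAMPLE):
    print(description, len(sequence), guess_alphabet(sequence))
```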
ENTREZ as RETRIEVAL SYSTEM
PubMed: 12 million citations from life science journals
Nucleotide: collection of DNA sequences
Protein: protein sequences from SwissProt
Genome: genomes of over 800 organisms
Also Structure, PopSet, Taxonomy, OMIM
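Entrez databases can also be queried from a program through NCBI's E-utilities web interface, which the slides do not cover. The sketch below only builds an efetch request URL and does not contact the server. The helper name efetch_url is hypothetical, "nuccore" is the E-utilities name for the Nucleotide database, and U49845 is used purely as an example accession.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, record_id, rettype="fasta", retmode="text"):
    """Build an efetch URL for one record (hypothetical helper)."""
    query = urlencode({
        "db": db,            # e.g. "nuccore" or "protein"
        "id": record_id,     # accession or numeric identifier
        "rettype": rettype,  # requested record format
        "retmode": retmode,  # plain text rather than XML
    })
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


print(efetch_url("nuccore", "U49845"))
```

Opening the printed URL in a browser, or fetching it with urllib.request, should return the record in FASTA format, subject to NCBI's usage guidelines.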
PROTEIN DATABASES
SWISS-PROT
EBI: TREMBL
NCBI: GENPEPT (already in history)
GENOME DATABASES
SGD: homepage, example 1.1, example 1.2
Wormbase
Ensembl Human Genome Browser
CONCLUSIONS
Sequencing projects produce a lot of data
At a minimum, these data have to be structured in databases
Ideally, all sequences need high-quality human annotation
That’s why computer scientists are welcome in biology
LITERATURE
GenBank presentation by Manpreet Katari (CSE 549, Fall 2000)
Thomas Lengauer (Ed.), Bioinformatics – From Genomes to Drugs
Entrez website
Google