What is GenBank? NCBI's Primary Sequence Database
• Nucleotide sequence database
• Archival in nature
• GenBank Data
– Direct submissions: individual records (BankIt, Sequin)
– Batch submissions via email (EST, GSS, STS)
– ftp accounts: sequencing centers
• Data shared nightly among three collaborating databases
– GenBank
– DNA Database of Japan (DDBJ), Mishima, Japan
– European Molecular Biology Laboratory Database (EMBL) at EBI, Hinxton, UK
The International Nucleotide Sequence Database Collaboration: DDBJ/EMBL/GenBank
[Diagram: the three databases exchange data; each one receives submissions and updates]
• GenBank: NCBI, NIH; retrieval via Entrez
• EMBL Data Library: EBI, EMBL; retrieval via SRS
• DDBJ: CIB, NIG; retrieval via getentry
NCBI Homepage
NCBI Databases
NCBI Databases and Services
• GenBank: largest sequence database
• Free public access to biomedical literature
– PubMed: free Medline
– PubMed Central: full text online access
• Entrez: integrated molecular and literature databases
• BLAST: highest volume sequence search service
• VAST: structure similarity searches
• Software and Databases
A Traditional GenBank Record
LOCUS       AY182241     1931 bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, 84-94 (2004)
REFERENCE   2  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
REFERENCE   3  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:27804758.
FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene            1..1931
                     /gene="AFS1"
     CDS             54..1784
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO22848.2"
                     /db_xref="GI:32265058"
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANL
                     FEKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTL
                     EDFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNI
                     KGMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTH
                     ILSLLFQPLVN"
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
//
Header
Feature Table
Sequence
The Flatfile Format
Traditional GenBank Record
ACCESSION   U07418
VERSION     U07418.1  GI:466461
Accession
• Stable
• Reportable
• Universal
Version
• Tracks changes in sequence
GI number
• NCBI internal use
well annotated
the sequence is the data
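The relationship between the three identifiers can be illustrated with a short Python sketch. It assumes the VERSION line layout shown on this slide (accession.version followed by a GI: field); the function name `parse_version_line` is made up for illustration and does not belong to any NCBI tool.

```python
def parse_version_line(line):
    """Split a flatfile VERSION line into accession, version number, and GI.

    Assumes the layout shown on the slide: 'VERSION  U07418.1  GI:466461'.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "VERSION":
        raise ValueError(f"not a VERSION line: {line!r}")
    # The accession is the stable part; the number after the dot
    # increases each time the sequence itself changes.
    accession, _, version = fields[1].partition(".")
    gi = None
    for field in fields[2:]:
        if field.startswith("GI:"):
            gi = int(field[3:])
    return {"accession": accession, "version": int(version), "gi": gi}


record = parse_version_line("VERSION     U07418.1  GI:466461")
print(record)
```

The accession stays the same across updates, while the version suffix and the GI number change together when the sequence is revised.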
Sequence and Database Identifiers: Locus, accession, gi, version
LOCUS       AF062069     3808 bp    mRNA             INV       02-MAR-2000
DEFINITION  Limulus polyphemus myosin III mRNA, complete cds.
ACCESSION   AF062069
VERSION     AF062069.2  GI:7144484
• Locus Name: AF062069
• Sequence length: 3808 bp
• mol-type: mRNA (= cDNA); other values: rRNA, snRNA, DNA
• GB Division: INV
• Modification Date: 02-MAR-2000
• DEF line (Title): the DEFINITION line
• Accession Number: AF062069
• Accession.version: AF062069.2
• gi number: 7144484
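The fields of the LOCUS line can be pulled apart with a whitespace split, as in the following Python sketch. It assumes the two layouts that appear in these slides (with and without a topology word such as "linear"); the function name `parse_locus_line` and the dictionary keys are illustrative choices, not an official parser.

```python
def parse_locus_line(line):
    """Break a LOCUS line into the fields labelled on the slide."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "LOCUS":
        raise ValueError(f"not a LOCUS line: {line!r}")
    rest = fields[5:]
    # Some records carry a topology word before the division code.
    topology = None
    if rest[0] in ("linear", "circular"):
        topology = rest.pop(0)
    if len(rest) < 2:
        raise ValueError(f"LOCUS line is missing division or date: {line!r}")
    return {
        "locus_name": fields[1],
        "length": int(fields[2]),      # sequence length in bp
        "mol_type": fields[4],         # e.g. mRNA, rRNA, snRNA, DNA
        "topology": topology,
        "division": rest[0],           # GenBank division, e.g. INV or PLN
        "modification_date": rest[1],
    }


locus = parse_locus_line("LOCUS AF062069 3808 bp mRNA INV 02-MAR-2000")
print(locus)
```

The same function accepts the apple record's line, `LOCUS AY182241 1931 bp mRNA linear PLN 04-MAY-2004`, and reports its topology as `linear`.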
Sequence
BASE COUNT     1201 a    689 c    782 g   1136 t
ORIGIN
        1 tcgacatctg tggtcgcttt ttttagtaat aaaaaattgt attatgacgt cctatctgtt
<sequence omitted>
     3721 accaatgtta taatatgaaa tgaaataaag cagtcatggt agcagtggct gtttgaaata
     3781 aagatacagt aactagggaa aaaaaaaa
//
• ORIGIN: indicates beginning of sequence data
• //: end of record
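As a rough illustration of how the ORIGIN and // markers delimit the sequence, the Python sketch below collects the bases between them and tallies each letter. The function name `read_origin_block` is hypothetical, and the example uses only the first and last sequence lines shown on the slide, so its totals describe that fragment rather than the full 3808 bp record.

```python
from collections import Counter


def read_origin_block(lines):
    """Return the bases found between 'ORIGIN' and the '//' terminator."""
    bases = []
    in_sequence = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("ORIGIN"):
            in_sequence = True
            continue
        if stripped == "//":
            break
        if in_sequence:
            # Keep the 10-base groups; skip position numbers and any
            # non-sequence text such as '<sequence omitted>'.
            bases.extend(part for part in stripped.split() if part.isalpha())
    return "".join(bases)


example = [
    "ORIGIN",
    "        1 tcgacatctg tggtcgcttt ttttagtaat aaaaaattgt attatgacgt cctatctgtt",
    "     3781 aagatacagt aactagggaa aaaaaaaa",
    "//",
]
sequence = read_origin_block(example)
counts = Counter(sequence)
print(len(sequence), dict(counts))
```

For a complete record, the per-base totals should agree with the BASE COUNT line, and their sum (1201 + 689 + 782 + 1136 = 3808) should equal the length given on the LOCUS line.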
Using Entrez
An integrated database search and retrieval system
WWW Access
Entrez & BLAST
Genomes
Taxonomy
Entrez: Neighboring and Hard Links
[Diagram: linked Entrez databases and the methods used to compute neighbors]
• Databases: PubMed abstracts, Nucleotide sequences, Protein sequences, 3-D Structure (MMDB)
• Neighboring methods: Word weight, Phylogeny, VAST, BLAST
WWW Entrez
• PubMed: all of MEDLINE plus others; abstracts; links to online journals
• Nucleotide sequences: GenBank, EMBL, DDBJ, RefSeq, PDB
• Protein sequences: GenBank, DDBJ, EMBL translations; PDB, PIR, SWISS-PROT, PRF, RefSeq
• 3-D Structure: NCBI's MMDB, derived from PDB
• Genomes: graphical views; assembled sequence and mapping data
• Taxonomy: NCBI's Taxonomy; hierarchical tree structure; access to sequences
• MIM: now in Entrez
• PopSet: population and phylogenetic studies
Entrez Nucleotides
Mouse
Document Summaries: Mouse[All Fields]
Chicken not mouse !?
3 million records
Entrez Nucleotides: Limits: Preview/Index
Mouse
Entrez Nucleotides: Limits
• Field Restriction: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title Word, Uid, Volume
• Only From: RefSeq, GenBank, EMBL, DDBJ
• Exclude unwanted categories of sequences
• Molecule: Genomic DNA/RNA, mRNA, rRNA
• Gene Location: Genomic DNA/RNA, Mitochondrion, Chloroplast
Mouse
Entrez Nucleotides: Limits: Organism
Mouse
Document Summaries: Mouse[Organism]
2,976,070 [All Fields] - 2,921,009 [Organism] = 55,061
Exclude Bulk Sequences, mRNA
Adding Terms: Preview/Index
glyceraldehyde 3 phosphate dehydrogenase
Fields: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title Word, Uid, Volume
Search History
Mouse GAPD Records
(("Mus musculus"[Organism] AND glyceraldehyde 3 phosphate dehydrogenase[Title Word]) AND ((((((1900[MDAT] : 3000[MDAT]) NOT gbdiv_est[PROP]) NOT gbdiv_sts[PROP]) NOT gbdiv_gss[PROP]) NOT gbdiv_htg[PROP]) NOT gbdiv_pat[PROP])) AND biomol_mrna[PROP]
Properties Field Terms
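The nested query on this slide is what the Limits page produces when an organism, a title phrase, excluded bulk divisions, and a molecule type are combined. The Python sketch below rebuilds the same string from its parts to make that structure visible. The function name `build_query` and its parameters are hypothetical; only the field tags ([Organism], [Title Word], [MDAT], [PROP]) and the gbdiv_/biomol_ terms come from the slide.

```python
def build_query(organism, title_words, excluded_divisions, molecule):
    """Assemble an Entrez Nucleotide query in the nested form shown on the slide."""
    # Organism and title-word restriction.
    term = f'("{organism}"[Organism] AND {title_words}[Title Word])'
    # Start from the full modification-date range, then wrap it in one
    # NOT clause per excluded GenBank division (EST, STS, GSS, ...).
    limits = "(1900[MDAT] : 3000[MDAT])"
    for division in excluded_divisions:
        limits = f"({limits} NOT gbdiv_{division}[PROP])"
    # Finally restrict to one molecule type via the Properties field.
    return f"({term} AND {limits}) AND biomol_{molecule}[PROP]"


query = build_query(
    organism="Mus musculus",
    title_words="glyceraldehyde 3 phosphate dehydrogenase",
    excluded_divisions=["est", "sts", "gss", "htg", "pat"],
    molecule="mrna",
)
print(query)
```

Each excluded division adds one level of parentheses, which is why the date range on the slide sits inside six opening brackets.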
Displaying Mouse GAPD Records
• Formats: Summary, Brief, GenBank, ASN.1, FASTA, GI list
• Links and neighbors (related records): LinkOut, PubMed Links, Protein Links, Nucleotide Neighbors, PopSet Links, Structure Links, Genome Links, Taxonomy Links, OMIM Links
>gi|193425|gb|M60978.1|MUSGAPDS Mus musculus testis-specific isoform of glycerald
GGCAGCCAGGCCATGAGATCTTAGGCCATGTCGAGACGTGACGTGGTCCTTACCAATGTTACTGTTGTCCAGCTACGGCGGGACCGATGCCCATGCCCATGCCCATGCCCATGTCCATGCCCATGCCCTGTGATCAGACCACCTCCACCCAAGCTTGAGGATCCACCACCCACGGTTGAAGAACAGCCACCGCCACCGCCGCCGCCACCTCCACCTCCACCACCACCTCCTCCTCCTCCTCCACCCCAGATAGAGCCAGACAAGTTTGAAGAGGCTCCCCCTCCCCCTCCCCCTCCTCCTCCTCCTCCCCCTCCCCCTCCTCCACCACTCCAAAAGCCAGCTAGAGAGCTGACAGTGGGTATCAATGGATTTGGACGCATTGGTCGTCTGGTGCTGCGAGTCTGCATGGAGAAGGGCATTAGGGTGGTAGCAGTGAATGACCCATTCATTGATCCAGAATACATGGTTTACATGTTCAAATATGACTCCACACATGGTAGATACAAAGGAAACGTGGAACATAAGAATGGACAACTAGTTGTGGACAACCTTGAGATCAACACGTACCAGTGCAAAGACCCTAAAGAAATCCCCTGGAGCTCTATAGGGAATCCCTACGTGGTGGAGTGTACAGGCGTCTATCTGTCCATCGAGGCAGCTTCGGCACATATTTCATCTGGTGCCAGGCGTGTGGTGGTCACTGCACCCTCCCCCGATGCACCCATGTTTGTCATGGGAGTGAACGAGAAGGACTATAACCCTGGCTCTATGACCATTGTCAGCAATGCATCCTGTACCACCAACTGCCTGGCTCCTCTCGCCAAGGTTATTCATGAAAACTTCGGGATCGTGGAAGGGCTAATGACCACAGTCCATTCCTACACAGCCACTCAGAAGACAGTGGATGGGCCATCAAAGAAGGACTGGCGAGGTGGCCGCGGCGCTCACCAAAACATCATCCCATCGTCCACTGGGGCTGCCAAGGCTGTAGGCAAAGTCATCCCAGAGCTCAAAGGGAAGCTAACAGGAATGGCATTCCGGGTGCCAACCCCAAACGTGTCAGTTGTGGACCTGACCTGCCGCCTGGCCAAGCCTGCTTCTTACTCGGCTATCACGGAGGCTGTGAAAGCTGCAGCCAAGGGACCTTTGGCTGGCATCCTTGCTTACACAGAGGACCAGGTGGTCTCCACGGACTTTAACGGCAATCCCCATTCTTCCATCTTTGATGCTAAGGCTGGAATTGCCCTCAATGACAACTTCGTGAAGCTTGTTGCCTGGTACGACAACGAATATGGCTACAGTAACCGAGTGGTCGACCTCCTCCGCTACATGTTTAGCCGAGAGAAGTAACACAAAAGGCCCCTCCTTGCTCCCCTGCGCACCTCGCGTTCCTGACTTCGGCTTCCACTCAAAGGCGCCGCCACCGGGTCAACAATGAAATAAAAACGAGAATGCGC
FASTA Format
FASTA Definition Line
>gi|193425|gb|M60978.1|MUSGAPDS
• gi number: 193425
• Database identifier: gb
• Accession number: M60978.1
• Locus Name: MUSGAPDS
Database Identifiers
• gb: GenBank
• emb: EMBL
• dbj: DDBJ
• sp: SWISS-PROT
• pdb: Protein Databank
• pir: PIR
• prf: PRF
• ref: RefSeq
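The pipe-delimited definition line can be split into its labelled parts with a few lines of Python. This is a minimal sketch that assumes the five-field layout shown on the slide (gi, gi number, database tag, accession.version, locus name); other database tags may arrange their fields differently, and the function name `parse_defline` is made up for illustration.

```python
# Database tags and names as listed on the slide.
DATABASES = {
    "gb": "GenBank",
    "emb": "EMBL",
    "dbj": "DDBJ",
    "sp": "SWISS-PROT",
    "pdb": "Protein Databank",
    "pir": "PIR",
    "prf": "PRF",
    "ref": "RefSeq",
}


def parse_defline(defline):
    """Split a '>gi|...|db|accession.version|locus title' line into parts."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line must start with '>'")
    # Everything up to the first space is the identifier; the rest is the title.
    identifier, _, title = defline[1:].partition(" ")
    parts = identifier.split("|")
    if len(parts) < 5 or parts[0] != "gi":
        raise ValueError(f"unexpected identifier layout: {identifier!r}")
    return {
        "gi": int(parts[1]),
        "database": DATABASES.get(parts[2], parts[2]),
        "accession_version": parts[3],
        "locus_name": parts[4],
        "title": title,
    }


parsed = parse_defline(">gi|193425|gb|M60978.1|MUSGAPDS")
print(parsed)
```

Unknown database tags are passed through unchanged rather than rejected, since the table on the slide may not be exhaustive.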
Break!
5 minutes