NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University of Colorado Health Sciences Center


Page 1:

National Center for Biotechnology Information

A Field Guide to GenBank

and NCBI’s Molecular Biology Resources

August 30, 2005 University of Colorado Health Sciences Center

Page 2:

Topics

• About NCBI
• GenBank overview
• Primary vs derivative databases
• The Reference Sequence (RefSeq) project
• Entrez databases
• Genome resources
• Bookshelf

-break-

• Entrez text searching
• BLAST sequence searching
• VAST structure searching
• An integrated example

Page 3:

The National Institutes of Health

Bethesda, MD

Page 4:

The National Center for Biotechnology Information

• Accepts submissions of primary data
• Develops tools to analyze these data
• Creates derivative databases based on the primary data
• Provides free search, link, and retrieval of these data, primarily through the Entrez system

Page 5:

NCBI WWW Users per Day

Page 6:

[Figure: Number of Users Per Day, 1997 to 2003. Y-axis: Number of Users, 0 to 450,000. Annotation: Christmas & New Year.]

Page 7:

Homepage - accessing the data

all[filter]

Page 8:

all[filter]

1/11/2005

3/15/2005

8/15/2005

Page 9:

Entrez Nucleotide

Primary Data

GenBank / DDBJ / EMBL: 57.3 million records (97.4 %)

Derivative Data

RefSeq: 1.47 million records (2.5 %)

RefSeq reviewed: 60,000

PDB (structures): 5,973

“Total”: 59 million records

Page 10:

GenBank: NCBI’s Primary Sequence Database

ftp://ftp.ncbi.nih.gov/genbank/ ftp://genbank.sdsc.edu/pub

ftp://bio-mirror.net/biomirror/genbank

Release 149, August 2005

47 x 10^6 records, 52 x 10^9 nucleotides

195 gigabytes, 816 files

• full release every two months
• incremental and cumulative updates daily
• available only through internet
• release notes: gbrel.txt

Over 100 billion bases!

Page 11:

What is GenBank?

• Nucleotide-only sequence database
• Archival in nature
• GenBank data
  - Direct submissions (traditional records)
  - Batch submissions (EST, GSS, STS)
  - ftp accounts (genome data)
• Three collaborating databases
  - GenBank
  - DNA Database of Japan (DDBJ)
  - European Molecular Biology Laboratory (EMBL) Database

Page 12:

GenBank Divisions

“Organismal”

PRI (28) Primate
ROD (15) Rodent
PLN (13) Plant and Fungal
BCT (11) Bacterial/Archaeal
INV (7) Invertebrate
VRT (7) Other Vertebrate
VRL (4) Viral
MAM (2) Mammalian
PHG (1) Phage
SYN (1) Synthetic
UNA (1) Unannotated

• Organized by taxonomy (sort of)
• Direct submissions (Sequin/Bankit)
• Accurate (~1 error per 10,000 bp)
• Well characterized

“Functional”

EST (377) Expressed Sequence Tag
GSS (138) Genome Survey Sequence
HTG (63) High Throughput Genomic
PAT (17) Patent
STS (9) Sequence Tagged Site
CON (1) Contigs, virtual

• Organized by sequence type
• Batch submissions (ftp/email)
• Inaccurate
• Poorly characterized

Page 13:

GenBank Functional (Bulk) Divisions

EST: Expressed Sequence Tag; 1st pass single read cDNA

GSS: Genome Survey Sequence; 1st pass single read gDNA

HTG: High Throughput Genomic; incomplete sequences of genomic clones

STS: Sequence Tagged Site; PCR-based mapping reagents

Whole Genome Shotgun

Page 14:

EST Division: Expressed Sequence Tags

nucleus: 30,000 genes

RNA gene products

make cDNA library

80-100,000 unique cDNA clones in library

- isolate unique clones
- sequence once from each end (5’ and 3’)

>IMAGE:275615 3', mRNA sequence
NNTCAAGTTTTATGATTTATTTAACTTGTGGAACAAAAATAAACCAGATTAACCACAACCATGCCTTATTATCAAATGTATAAGANGTAAATATGAATCTTATATGACAAAATGTTTCATTCATTATAACAAATTTAATAATCCTGTCAATNATATTTCTAAATTTTCCCCCAAATTCTAAGCAGAGTATGTAAATTGGAAGTTCTTATGCACGCTTAACTATCTTAACAAGCTTTGAGTGCAAGAGATTGANGAGTTCAAATCTGACCAAGGTTGATGTTGGATAAGAGAATTCTCTGCTCCCCACCTCTANGTTGCCAGCCCTC

>IMAGE:275615 5' mRNA sequence
GACAGCATTCGGGCCGAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCTTTCTGGTGGAGGTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATCCAGCAGAGAATGGAAAGTCAATTCCTGAATTGCTATGTGTCTGGGTTTCATCCATCCGACATTGAAGTTGACTTACTGAAGAATGGAGAGAATTGAAAAAGTGGAGCATTCAGACTTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACTGAATTCACCCCCACTGAAAAAGATGAGTATGCCTGCCGTGTTGAACCATGTNGACTTTGTCACAGNCAAGTTNAGTTTAAGTGGGNATCGAGACATGTAAGGCAGGCATCATGGGAGGTTTTGAAGNATGCCGCNTTGGATTGGGATGAATTCCAAATTTCTGGTTTGCTTGNTTTTTTAATATTGGATATGCTTTTG
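The two EST reads above are in FASTA format: a header line starting with ">" followed by the sequence. As a minimal sketch (not an NCBI tool), the Python below splits FASTA text into records and reports the fraction of ambiguous "N" base calls, which single-pass reads such as these tend to contain. The function names `parse_fasta` and `n_fraction` are hypothetical, and the example sequences are shortened prefixes of the two reads.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_fraction(seq):
    """Fraction of positions called as N (ambiguous base)."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


# Shortened prefixes of the 3' and 5' reads of IMAGE:275615 (illustration only).
example = """>IMAGE:275615 3', mRNA sequence
NNTCAAGTTTTATGATTTATTTAACTTGTG
>IMAGE:275615 5' mRNA sequence
GACAGCATTCGGGCCGAGATGTCTCGCTCC
"""

records = list(parse_fasta(example))
for header, seq in records:
    print(header, len(seq), round(n_fraction(seq), 3))
```

For production work a maintained parser would be a safer choice than this hand-rolled one; the sketch only shows the record structure.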

Page 15:

GSS, WGS, HTG

shred

Whole BAC insert (or genome)

isolate clones, sequence

GSS division or trace archive

Draft sequence (HTG division)

assembly: whole genome shotgun assemblies (traditional division)

Page 16:

HTG Example: Honeybee Draft Sequences

• Unfinished sequences of BACs

• Gaps and unordered pieces

• Finished sequences (Phase 3) move to traditional GenBank division

LOCUS AC141845 147720 bp DNA linear HTG 19-MAR-2004

DEFINITION Apis mellifera clone CH224-4A2, WORKING DRAFT SEQUENCE, 14 unordered pieces.

ACCESSION AC141845

VERSION AC141845.1 GI:29124029

KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT.
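The header lines of the record above carry the division, length, and version in fixed positions. The Python below is a rough sketch of pulling those fields out by splitting on whitespace. It assumes the eight-token LOCUS layout shown on this slide, and the helper names `parse_locus` and `parse_version` are hypothetical. Real flat files have more variation, so a maintained parser is preferable in practice.

```python
def parse_locus(line):
    """Split a GenBank LOCUS line into named fields (whitespace-delimited sketch)."""
    parts = line.split()
    # Expected tokens: LOCUS name length 'bp' molecule topology division date
    if len(parts) != 8 or parts[0] != "LOCUS" or parts[3] != "bp":
        raise ValueError("unexpected LOCUS layout: %r" % line)
    return {
        "name": parts[1],
        "length_bp": int(parts[2]),
        "molecule": parts[4],
        "topology": parts[5],
        "division": parts[6],
        "date": parts[7],
    }


def parse_version(line):
    """Split a VERSION line into accession, version number, and GI (if present)."""
    parts = line.split()
    accession, _, version = parts[1].partition(".")
    gi = parts[2].split(":", 1)[1] if len(parts) > 2 else None
    return {"accession": accession, "version": int(version), "gi": gi}


locus = parse_locus("LOCUS AC141845 147720 bp DNA linear HTG 19-MAR-2004")
version = parse_version("VERSION AC141845.1 GI:29124029")
print(locus["division"], locus["length_bp"], version["accession"], version["version"])
```

Here the division field reads HTG, which together with the HTGS_PHASE1 keyword marks the record as an unfinished draft.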

Page 17:

Whole Genome Shotgun Projects

351 projects

• Bacteria (251)
• Environmental sequences (6)
• Archaea (6)
• Eukaryotes (88), including:
  - Chicken, Rat, Mouse, Dog (2), Chimpanzee, Human
  - Pufferfish (2)
  - Honeybee, Anopheles, Fruit Flies (3), Silkworm
  - Nematode (2)
  - Yeasts (8), Aspergillus (2)
  - Rice (2)

Page 18:

Whole Genome Shotgun (WGS) Projects

wgs master[properties]

Page 19:

Derivative Databases

[Diagram: Labs and Sequencing Centers submit to GenBank (divisions EST, STS, HTG, GSS, PRI, ROD, PLN, MAM, BCT, INV, VRT, PHG, VRL). GenBank is updated ONLY by submitters. The derivative databases UniGene, UniSTS, and RefSeq (RefSeq: Entrez Gene and annotation pipelines) are updated by NCBI.]

Page 20:

Why Make Reference Sequences?

Entrez Nucleotide query:

human[organism] AND lipase[title]

Page 21:

Why Make Reference Sequences?

Entrez Nucleotide query:

human[organism] AND lipase[title]

Page 22:

human[organism] AND lipase[title] AND endothelial[title]

3927 bp

4150 bp

3927 bp

2323 bp

261 bp

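The fielded queries on these slides combine `term[field]` pairs with AND. The same query strings can be sent to Entrez programmatically through NCBI's E-utilities; the slides themselves only show the web interface, so this is an aside. The sketch below only builds the ESearch URL and makes no network call. The helper names `build_query` and `esearch_url` are hypothetical, and the database name `nuccore` is an assumption about the present-day name for Entrez Nucleotide.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(*terms):
    """Join (value, field) pairs into an Entrez query such as 'human[organism] AND ...'."""
    return " AND ".join("%s[%s]" % (value, field) for value, field in terms)


def esearch_url(term, db="nuccore", retmax=20):
    """Return an ESearch URL for the query; db and retmax values are illustrative defaults."""
    return EUTILS_ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


query = build_query(("human", "organism"), ("lipase", "title"), ("endothelial", "title"))
url = esearch_url(query)
print(query)
print(url)
```

Adding the third term narrows the result set in the same way the slide does when it moves from the lipase query to the endothelial lipase query.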

Page 23:

RefSeq Benefits

genomes, transcripts, proteins

• non-redundant; best representative
• updates to reflect current sequence data and biology
• distinct, stable accession series

Page 24:

Reference Sequence: RefSeq

Accession          Sequence Type

NM_123456789       mRNA
NP_123456789       protein, from NM_
NR_123456          non-coding RNA
XM_123456          predicted mRNA
XP_123456          predicted protein
XR_123456          predicted non-coding RNA
ZP_12345678        predicted from NZ_

NC_123456          genomic, e.g., chromosomes
NG_123455          genomic, incomplete region

NT_123456          genomic, BAC assembly
NW_123456          genomic, WGS assembly
NZ_ABCD12345678    genomic, WGS collection

blue=curated
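Because each RefSeq accession starts with a two-letter prefix and an underscore, the sequence type can be read off the accession itself. A minimal Python sketch of that lookup follows; the table is copied from the slide, and the function name `refseq_type` is hypothetical.

```python
# Prefix-to-type table as listed on the slide.
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NP": "protein, from NM_",
    "NR": "non-coding RNA",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
    "XR": "predicted non-coding RNA",
    "ZP": "predicted from NZ_",
    "NC": "genomic, e.g., chromosomes",
    "NG": "genomic, incomplete region",
    "NT": "genomic, BAC assembly",
    "NW": "genomic, WGS assembly",
    "NZ": "genomic, WGS collection",
}


def refseq_type(accession):
    """Return the slide's sequence type for a RefSeq accession, or None if not RefSeq-style."""
    prefix, sep, _ = accession.partition("_")
    if not sep:
        return None  # no underscore, e.g. a GenBank accession such as AC141845
    return REFSEQ_PREFIXES.get(prefix.upper())


for acc in ("NM_123456789", "XP_123456", "NZ_ABCD12345678", "AC141845"):
    print(acc, "->", refseq_type(acc))
```

The underscore is what separates these accessions from GenBank accessions, which is why the last example returns None.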

Page 25:

Annotation Process

Genomic DNA (NC, NT, NW)

Model mRNA (XM) (XR)

Curated mRNA (NM) (NR)

Model protein (XP)

Curated protein (NP)

Scanning....

GenBank Sequences

RefSeq

Page 26:

Creating NM_ Records

NM’s must have cDNA support

Genome annotation

Longest mRNA

transcript variant 1

transcript variant 2

transcript variant 3

Page 27:

Where is RefSeq?

Page 28:

The Entrez System

Entrez databases: Nucleotide, PubMed, Protein, Taxonomy, Structure, Domains, 3D Domains, Journals, PMC, OMIM, Books, PopSet, SNP, UniGene, UniSTS, Genome, Gene, GEO, MeSH, Cancer Chromosomes, Homologene, PubChem, GENSAT

Page 29:

A Few Entrez Databases

UniGene: Clusters of ESTs, mRNAs

dbSNP: Single Nucleotide Polymorphisms

GEO: Gene Expression Omnibus; microarray and other expression data

CDD: Conserved Domain Database; protein families (COGs and KOGs), single domains (PFAM, SMART, CD)

Page 30:

Gene-oriented clusters of expressed sequences

• Automatic clustering using MegaBlast
• Each cluster represents a unique gene
• Informed by genome hits
• Information on tissue types and map locations
• Useful for gene discovery and selection of mapping reagents

UniGene

unique gene

Page 31:

A Cluster of ESTs

query

5’ EST hits

3’ EST hits

Page 32:

UniGene Collections

Page 33:

Example UniGene Cluster

Page 34:

Histogram of cluster sizes for UniGene Hs Build 177

(Now at Build #186)

Page 35:

UniGene Cluster Hs.95351

SELECTED PROTEIN SIMILARITIES

Page 36:

UniGene Cluster Hs.95351

GENE EXPRESSION

Page 37:

UniGene Cluster Hs.95351: expression

Page 38:

UniGene Cluster Hs.95351: seqs

Page 39:

Download sequences

web page

ftp://ftp.ncbi.nih.gov/repository/UniGene/Homo_sapiens/

Page 40:

Entrez GEO

Page 41:

NCBI’s SNP Database

• Primary and derivative (RefSNP)
• Single nucleotide polymorphisms
• Repeat polymorphisms
• Insertion-deletion polymorphisms
• Over 19 million refSNPs (rsXXXXXXX) (August, 2005)

Page 42:

Searching dbSNP

Page 43:

RefSNP

Page 44:

RefSNP

Page 45:

RefSNP

Page 46:

RefSNP

Search Mouse SNP between strains

Page 47:

RefSNP

MapView GeneView SeqView OMIM No 3D

Page 48:

RefSNP

Page 49:

Entrez GEO

Page 50:

GPL: Platform descriptions (submitted by manufacturer*)

GSM: Raw/processed spot intensities from a single slide/chip (submitted by experimentalists). GEO SaMple: experimental conditions

GSE: Grouping of slide/chip data, “a single experiment” (submitted by experimentalists). GEO SEries: set of related samples

GDS: Grouping of experiments (curated by NCBI)

Entrez GEO, Entrez GEO Datasets

Page 51:

What’s a DataSet?

Supplied by submitter:

Platform (GPL): array definition

Sample (GSM): hyb. measurements

Series (GSE): related Samples

Assembled by GEO staff:

DataSet (GDS)

• A collection of experimentally-related samples processed using the same platform.
• Samples within DataSets are organized into subgroups based on experimental variables.
• Form the basis of GEO’s query, analysis and data display tools.
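The DataSet definition above has two constraints: every sample shares one platform, and samples are split into subgroups by an experimental variable. The Python below is a toy model of that structure, not GEO's actual schema. The class `Sample`, the function `build_dataset`, the variable name `agent`, and all accession numbers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Sample:
    accession: str                                  # GSM accession (made-up values below)
    platform: str                                   # GPL accession the sample was run on
    variables: dict = field(default_factory=dict)   # experimental variables


def build_dataset(samples, variable):
    """Group same-platform samples into subgroups by one experimental variable."""
    platforms = {s.platform for s in samples}
    if len(platforms) != 1:
        raise ValueError("a DataSet needs samples from exactly one platform")
    subgroups = defaultdict(list)
    for s in samples:
        subgroups[s.variables.get(variable, "unknown")].append(s.accession)
    return {"platform": platforms.pop(), "subgroups": dict(subgroups)}


samples = [
    Sample("GSM1", "GPL1", {"agent": "control"}),
    Sample("GSM2", "GPL1", {"agent": "control"}),
    Sample("GSM3", "GPL1", {"agent": "treated"}),
]
dataset = build_dataset(samples, "agent")
print(dataset)
```

Rejecting mixed-platform input mirrors the "same platform" rule in the first bullet; a Series (GSE), by contrast, is only described as a set of related samples.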

Page 52:

Gene Expression Omnibus (GEO)

Dataset browser

Page 53:

GEO Dataset Browser

Page 54:

GEO Dataset Report

Page 55:

GEO Profiles

… of 12625

Page 56:

Entrez CDD

Page 57:

Conserved Domain Database

• Multiple sequence alignments
• Position-specific scoring matrices (PSSM)
• Sources: SMART, PFAM, COGs, KOGs, and NCBI curated domains (structure-informed alignments)

Page 58:

CDD

>gi|45549418|gb|AAS67634.1| ATP7A [Solenodon paradoxus]
IVYQPHLITVEEIKKQIKAVGFPAFIKKQPKYLKLGAIDIERLKNIPVKSSEGSQQMSPSSTNDSKVTLTIDGMHCNSCVSNIESALSTLHYVSSIVVSLQNKSAIIKYNANSVTPEILKKAIEAISPGQYRVSITSEVESTSNSPSSSSQKAPLNVVSQPLTQVTVININGMTCNSCVQSIEGVMSKKAGVKSIQVSLANRNGTVEYDP
LLTSPEILRE
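The query protein above uses the pipe-delimited FASTA definition line of that era: a GI number, a source database tag, an accession with version, then a title and the organism in square brackets. The Python sketch below takes such a line apart with a regular expression. The pattern and the name `parse_defline` are hypothetical and cover only this single-identifier layout.

```python
import re

# Matches: >gi|<number>|<db>|<accession>| <title> [<organism>]
DEFLINE = re.compile(
    r"^>gi\|(?P<gi>\d+)\|(?P<db>[^|]+)\|(?P<accession>[^|]*)\|"
    r"\s*(?P<title>.*?)(?:\s*\[(?P<organism>[^\]]+)\])?\s*$"
)


def parse_defline(line):
    """Return a dict of defline fields, or None if the line does not match."""
    match = DEFLINE.match(line)
    return match.groupdict() if match else None


fields = parse_defline(">gi|45549418|gb|AAS67634.1| ATP7A [Solenodon paradoxus]")
print(fields)
```

In this example the `gb` tag indicates that the protein record comes from GenBank.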

Page 59:

CDD

CD

Pfam

COG

Click on a colored bar to align your sequence to the CD

Page 60:

Conserved Domain Database: cd00371.1, HMA

Page 61:

CDD

Page 62:

CDART: Conserved Domain Architecture Retrieval Tool

Page 63:

cdd

Linking from Entrez Protein

Page 64:

Genome Resources

Gene database

Trace Archive

Map Viewer

Homologene

Genomic Biology

Page 65:

Genomic Biology

Page 66:

Gen Biol: Gen Resources

Page 67:

Gen Biol: Gen Resources

Page 68:

Gen Biol: Gen Resources

Page 69:

Genome Projects: microb

Page 70:

Gen Biol: Gen Resources

Page 71:

Gen Biol: Gen Resources

Page 72:

Gen Biol: Gen Resources

Page 73:

Gen Biol: Gen Resources

Page 74:

Gen Biol: Gen Resources

Page 75:

Genome Resources

Gene database

Trace Archive

Map Viewer

Homologene

Genomic Biology

Page 76:

Entrez Gene

A single query interface to …

• Sequences
  - RefSeqs
  - GenBank
  - Homologene
• Maps – MapViewer
• Entrez links
• Linkouts

More organisms, ~ 3000

Entrez integration

Page 77:

Global Entrez: NADH2

Page 78:

Entrez Gene: NADH2

Page 79:

Gene Record for Pongo NADH2

Homo sapiens

Not found with “nadh2”

Page 80:

A Record With More Data: Human HFE

Page 81:

Human HFE: Transcripts

Transcripts with experimental evidence

Page 82:

Gene Table

Page 83:

Introns/Exons: Gene Table

links to sequence

Page 84:

Human HFE: Links

Page 85:

Genotype

Page 86:

Genotype

Page 87:

Human HFE: Links

Page 88:

GeneView in dbSNP

Page 89:

SNP in Structure

Page 90:

SNP in Structure

Page 91:

SNP in Structure

H41

S43

C260

Page 92:

Another Variation Source: OMIM

Page 93:

Variants in OMIM

Page 94:

Genome Resources

Gene database

Trace Archive

Map Viewer

Homologene

Genomic Biology

Page 95:

The New Homologene

Automated detection of homologs among the annotated genes of completely sequenced eukaryotic genomes.

• No longer UniGene based
• Protein similarities first
• Guided by taxonomic tree
• Includes orthologs and paralogs

Page 96:

The New Homologene

Homologene Build 43.1 (8/23/05)

Species Number of genes input grouped groups

Page 97:

RAG1 → Homologene

Page 98:

RAG1 → Homologene

RAG1

Page 99:

RAG1

RING-finger

Page 100:

RAG1 → Homologene

RAG1

Page 101:

RAG1

Sugar_tr

Page 102: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Homologene: alignment scores

Page 103: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

BLASTP

bl2seq

Page 104: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Genome Resources

LocusLink

Gene database

UniGene

Trace Archive

Map Viewer

Homologene

Page 105: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

List View

Page 106: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Human MapViewer

adar

Page 107: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

MapViewer: Human ADAR

Page 108: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

MV Hs ADAR

3’ UTR

5’ UTR

Page 109: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Maps & Options

--Sequence maps--

Ab initio, Assembly, Repeats, BES_Clone, Clone, NCI_Clone, Contig, Component, CpG island, dbSNP haplotype, Fosmid, GenBank_DNA, Gene, Phenotype, SAGE_Tag, STS, TCAG_RNA, Transcript (RNA), Hs_UniGene, Hs_EST, Mm_UniGene, Mm_EST, Rn_UniGene, Rn_EST, Ssc_UniGene, Ssc_EST, Bt_UniGene, Bt_EST, Gga_UniGene, Gga_EST, Variation (= SNP)

--Cytogenetic maps--

Ideogram, FISH Clone, Gene_Cytogenetic, Mitelman Breakpoint, Morbid/Disease

--Genetic Maps--

deCODE, Genethon, Marshfield

--RH maps--

GeneMap99-G3, GeneMap99-GB4, NCBI RH, Stanford-G3, TNG, Whitehead-RH, Whitehead-YAC

Page 110: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

MapViewer

UniGene

Component

Repeats

Gene

Page 111: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Gene

Phenotype

Variation

Page 112: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Maps & Options

Page 113: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Genome Resources

LocusLink

Gene database

UniGene

Trace Archive

Map Viewer

Homologene

Page 114: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Trace Archive Page

Page 115: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Macaca mulatta Traces

Page 116: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Page 117: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Trace Archive BLAST Page

Access to sequences NOT in GenBank

Page 118: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Literature Links

Page 119: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

BOOKS Database

Page 120: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

BOOKS Database: hyperlinked

Page 121: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

BOOKS Database

Page 122: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

BOOKS Database

Page 123: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

BOOKS Database

Page 124: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Genes & Dis

Page 125: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Genes & Dis

Page 126: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

For More Information…

Page 127: NCBI FieldGuide National Center for Biotechnology Information A Field Guide to GenBank and NCBI’s Molecular Biology Resources August 30, 2005 University

Intermission