day1 bootcamp 070926 - ambl · the national center for biotechnology information created in 1988 as...


Upload: lehanh

Post on 23-Jul-2019


Page 1: Day1 Bootcamp 070926 - AMBL · The National Center for Biotechnology Information Created in 1988 as a part of the National Library of Medicine at NIH – Establish public databases

BOOTCAMP

BASIC BIOINFORMATICS

Welcome to Day 1

bioteach.ubc.ca/bootcamp

[email protected]

The National Center for Biotechnology Information

Created in 1988 as a part of the National Library of Medicine at NIH

– Establish public databases

– Research in computational biology

– Develop software tools for sequence analysis

– Disseminate biomedical information

Bethesda, MD

Web Access: www.ncbi.nlm.nih.gov

Number of Users and Hits Per Day

[Figure: number of users per day, 1997 to 2003, on an axis from 0 to 450,000; the dips are labeled "Christmas & New Year’s Days".]

Currently averaging 10,000,000 to 35,000,000 hits per day!


The NCBI ftp site

30,000 files per day

620 Gigabytes per day


NCBI Databases and Services

• GenBank: largest sequence database

• Free public access to biomedical literature

– PubMed: free Medline

– PubMed Central: full text online access

• Entrez: integrated molecular and literature databases

• BLAST: highest volume sequence search service

• VAST: structure similarity searches

• Software and Databases

Types of Databases

• Primary Databases

– Original submissions by experimentalists

– Content controlled by the submitter

• Examples: GenBank, SNP, GEO

• Derivative Databases

– Built from primary data

– Content controlled by third party (NCBI)

• Examples: RefSeq, TPA, RefSNP, UniGene, NCBI Protein, Structure, Conserved Domain

What is GenBank? NCBI’s Primary Sequence Database

• Nucleotide only sequence database

• Archival in nature

– Historical

– Reflective of submitter point of view (subjective)

– Redundant

• GenBank Data

– Direct submissions (traditional records)

– Batch submissions (EST, GSS, STS)

– ftp accounts (genome data)

• Three collaborating databases

– GenBank

– DNA Database of Japan (DDBJ)

– European Molecular Biology Laboratory (EMBL) Database

[Figure: the three collaborating databases, each accepting submissions and updates: GenBank (NCBI, NIH; accessed through Entrez), EMBL (EBI; accessed through SRS), and DDBJ (CIB, NIG; accessed through getentry).]

GenBank: NCBI’s Primary Sequence Database

ftp://ftp.ncbi.nih.gov/genbank/

Release 161, August 2007

Records: 101,530,711

Total bases: 181,489,883,388 (includes WGS)

• full release every two months

• incremental updates daily

• available only via ftp


The Growth of GenBank

Release 161

Doubling time 12-14 months

Non-WGS: 79.5 billion bases

WGS: 102 billion bases

Organization of GenBank: Traditional Divisions

Records are divided into 18 divisions: 12 traditional and 6 bulk.

Traditional divisions:

• Direct submissions (Sequin and BankIt)

• Accurate

• Well characterized

PRI  Primate
PLN  Plant and Fungal
BCT  Bacterial and Archaeal
INV  Invertebrate
ROD  Rodent
VRL  Viral
VRT  Other Vertebrate
MAM  Mammalian
PHG  Phage
SYN  Synthetic (cloning vectors)
ENV  Environmental Samples
UNA  Unannotated

Entrez query: gbdiv_xxx[Properties]

Organization of GenBank: Bulk Divisions

Records are divided into 18 divisions: 12 traditional and 6 bulk.

Bulk divisions:

• Batch submission (Email and FTP)

• Inaccurate

• Poorly characterized

EST  Expressed Sequence Tag
GSS  Genome Survey Sequence
HTG  High Throughput Genomic
STS  Sequence Tagged Site
HTC  High Throughput cDNA
PAT  Patent

Entrez query: gbdiv_xxx[Properties]
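The gbdiv_xxx[Properties] pattern can be sketched in a few lines of Python. This is only an illustration: the division codes are copied from the two slides above, while the helper names division_query and division_kind are made up for this example and are not part of any NCBI tool.

```python
# GenBank division codes as listed on the slides (12 traditional, 6 bulk).
TRADITIONAL = {
    "PRI": "Primate",
    "PLN": "Plant and Fungal",
    "BCT": "Bacterial and Archaeal",
    "INV": "Invertebrate",
    "ROD": "Rodent",
    "VRL": "Viral",
    "VRT": "Other Vertebrate",
    "MAM": "Mammalian",
    "PHG": "Phage",
    "SYN": "Synthetic (cloning vectors)",
    "ENV": "Environmental Samples",
    "UNA": "Unannotated",
}
BULK = {
    "EST": "Expressed Sequence Tag",
    "GSS": "Genome Survey Sequence",
    "HTG": "High Throughput Genomic",
    "STS": "Sequence Tagged Site",
    "HTC": "High Throughput cDNA",
    "PAT": "Patent",
}


def division_kind(code: str) -> str:
    """Return 'traditional' or 'bulk' for a three-letter division code."""
    code = code.upper()
    if code in TRADITIONAL:
        return "traditional"
    if code in BULK:
        return "bulk"
    raise ValueError(f"unknown GenBank division: {code}")


def division_query(code: str) -> str:
    """Fill the xxx in gbdiv_xxx[Properties] with a known division code."""
    division_kind(code)  # raises ValueError for unknown codes
    return f"gbdiv_{code.lower()}[Properties]"


print(division_query("PRI"), "->", division_kind("PRI"))
print(division_query("EST"), "->", division_kind("EST"))
```

The resulting strings are meant to be pasted into an Entrez search box, on their own or combined with other terms using AND.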

A Traditional GenBank Record: The Flatfile Format

The record has three parts: the header, the feature table, and the sequence.

Header

LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, 84-94 (2004)
REFERENCE   2  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
REFERENCE   3  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:27804758.

Feature Table

FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene            1..1931
                     /gene="AFS1"
     CDS             54..1784
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO22848.2"
                     /db_xref="GI:32265058"
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"

Sequence

ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
//
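To make the three-part layout concrete, here is a minimal Python sketch that splits a flatfile record into header, feature table, and sequence. It assumes only what the record above shows: the feature table starts at the FEATURES line, the sequence starts after ORIGIN, and the record ends at "//". The SAMPLE text is an abridged copy of the AY182241 record, and split_flatfile is a hypothetical helper name. A full parser such as Biopython's Bio.SeqIO handles many cases this sketch ignores.

```python
SAMPLE = """\
LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
     CDS             54..1784
                     /gene="AFS1"
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
//
"""


def split_flatfile(record: str):
    """Split one flatfile record into (header lines, feature lines, bases)."""
    header, features, sequence = [], [], []
    section = header
    for line in record.splitlines():
        if line.startswith("FEATURES"):
            section = features          # feature table begins here
        elif line.startswith("ORIGIN"):
            section = sequence          # sequence lines follow this keyword
            continue
        elif line.startswith("//"):
            break                       # end-of-record marker
        section.append(line)
    # Each sequence line is a position counter followed by blocks of 10 bases.
    bases = "".join("".join(line.split()[1:]) for line in sequence)
    return header, features, bases


header, features, bases = split_flatfile(SAMPLE)
print(len(header), "header lines,", len(features), "feature lines,", len(bases), "bases")
```

Dropping the position counters and the spaces leaves one continuous string of bases, which is the form most sequence tools expect.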

Traditional GenBank Record

ACCESSION U07418

VERSION U07418.1 GI:466461

Accession

• Stable

• Reportable

• Universal

Version

• Tracks changes in sequence

GI number

• NCBI internal use

well annotated

the sequence is the data
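The relationship between accession, version, and GI number can be illustrated by taking a VERSION line apart. The sketch below is an assumption-laden example, not an NCBI utility: the regular expression and the name parse_version_line are invented here, and the pattern only covers lines shaped like the two VERSION lines shown in this deck.

```python
import re

# Matches lines such as "VERSION U07418.1 GI:466461".
# The GI part is optional in this sketch.
VERSION_RE = re.compile(
    r"^VERSION\s+(?P<accession>[A-Z_]+\d+)\.(?P<version>\d+)"
    r"(?:\s+GI:(?P<gi>\d+))?\s*$"
)


def parse_version_line(line: str):
    """Return (accession, version number, GI number or None)."""
    match = VERSION_RE.match(line)
    if match is None:
        raise ValueError(f"not a VERSION line: {line!r}")
    gi = int(match["gi"]) if match["gi"] else None
    return match["accession"], int(match["version"]), gi


print(parse_version_line("VERSION U07418.1 GI:466461"))
print(parse_version_line("VERSION     AY182241.2  GI:32265057"))
```

The accession stays the same across updates, while the number after the dot increases when the sequence changes, as in AY182241.2 above.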

Primary vs. Derivative Databases

[Figure: sequencing centers and labs submit to GenBank (divisions EST, STS, GSS, HTG, PRI, ROD, PLN, MAM, BCT, INV, VRT, PHG, VRL), which is updated ONLY by submitters. The derivative databases are updated continually by NCBI: curators maintain RefSeq through the LocusLink and Genomes pipelines, and algorithms build UniGene, UniSTS, and the RefSeq annotation pipeline.]

Derivative Databases

Entrez Protein: Derivative Database

Data Source               Sequences
GenPept                   11,585,396
RefSeq                     3,889,502
Third Party Annotation         5,263
Swiss-Prot                   273,209
PIR                           29,456
PRF                           12,079
PDB                           99,187
(PAT Division)               723,998
Total                     17,360,570
BLAST nr total             5,267,602  (no patents or env_nr; now 6 million)

GenPept: GenBank CDS translations

FEATURES             Location/Qualifiers
     source          1..2484
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="3"
                     /map="3p22-p23"
     gene            1..2484
                     /gene="MLH1"
     CDS             22..2292
                     /gene="MLH1"
                     /note="homolog of S. cerevisiae PMS1 (Swiss-Prot Accession
                     Number P14242), S. cerevisiae MLH1 (GenBank Accession
                     Number U07187), E. coli MUTL (Swiss-Prot Accession Number
                     P23367), Salmonella typhimurium MUTL (Swiss-Prot Accession
                     Number P14161) and Streptococcus pneumoniae (Swiss-Prot
                     Accession Number P14160)"
                     /codon_start=1
                     /product="DNA mismatch repair protein homolog"
                     /protein_id="AAC50285.1"
                     /db_xref="GI:463989"
                     /translation="MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKS
                     TSIQVIVKEGGLKLIQIQDNGTGIRKEDLDIVCERFTTSKLQSFEDLASISTYGFRGE
                     ALASISHVAHVTITTKTADGKCAYRASYSDGKLKAPPKPCAGNQGTQITVEDLFYNIA
                     TRRKALKNPSEEYGKILEVVGRYSVHNAGISFSVKKQGETVADVRTLPNASTVDNIRS

>gi|463989|gb|AAC50285.1| DNA mismatch repair prote...
MSFVAGVIRRLDETVVNRIAAGEVIQRPANAIKEMIENCLDAKSTSIQVIV...
EDLDIVCERFTTSKLQSFEDLASISTYGFRGEALASISHVAHVTITTKTAD...
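The FASTA definition line above carries the same identifiers as the /protein_id and /db_xref qualifiers of the CDS feature. As a rough sketch, the Python below splits a pipe-delimited defline of this shape into tag and value pairs. The function name parse_defline is hypothetical, and the code assumes the tags and values simply alternate, as they do in this one example.

```python
def parse_defline(defline: str):
    """Split '>gi|463989|gb|AAC50285.1| title' into ({tag: value}, title)."""
    if not defline.startswith(">"):
        raise ValueError("a FASTA definition line starts with '>'")
    ids, _, title = defline[1:].partition(" ")
    fields = ids.split("|")
    # Fields alternate tag, value, tag, value; a trailing empty field is ignored.
    identifiers = dict(zip(fields[0::2], fields[1::2]))
    return identifiers, title.strip()


identifiers, title = parse_defline(
    ">gi|463989|gb|AAC50285.1| DNA mismatch repair prote..."
)
print(identifiers["gi"], identifiers["gb"], title)
```

Here the gi value matches /db_xref="GI:463989" and the gb value matches /protein_id="AAC50285.1" in the feature table, which is how a GenPept record points back to the CDS it was translated from.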

RefSeq: NCBI’s Derivative Sequence Database

• Curated transcripts and proteins

– reviewed

– human, mouse, rat, fruit fly, zebrafish, arabidopsis, microbial genomes (proteins), and more

• Model transcripts and proteins

• Assembled Genomic Regions (contigs)

– human genome

– mouse genome

– rat genome

• Chromosome records

– Human genome

– microbial

– organelle

ftp://ftp.ncbi.nih.gov/refseq/release/

srcdb_refseq[Properties]

– chicken

– honeybee

– sea urchin

Selected RefSeq Accession Numbers

mRNAs and Proteins

NM_123456  Curated mRNA
NP_123456  Curated Protein
NR_123456  Curated non-coding RNA
XM_123456  Predicted mRNA
XP_123456  Predicted Protein
XR_123456  Predicted non-coding RNA

Gene Records

NG_123456  Reference Genomic Sequence

Chromosome

NC_123456  Microbial replicons, organelle genomes, human chromosomes

Assemblies

NT_123456  Contig
NW_123456  WGS Supercontig
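Because every RefSeq accession begins with a two-letter prefix and an underscore, the record type can be read off the accession itself. A small Python sketch of that lookup follows; the prefix table is taken from the slide, and describe_refseq is a name invented for this example.

```python
# Prefix meanings as listed on the slide above.
REFSEQ_PREFIXES = {
    "NM": "curated mRNA",
    "NP": "curated protein",
    "NR": "curated non-coding RNA",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
    "XR": "predicted non-coding RNA",
    "NG": "reference genomic sequence",
    "NC": "chromosome (microbial replicon, organelle genome, human chromosome)",
    "NT": "contig",
    "NW": "WGS supercontig",
}


def describe_refseq(accession: str):
    """Return the record type for a listed RefSeq prefix, else None."""
    prefix, underscore, _ = accession.partition("_")
    if not underscore:
        return None  # no underscore, so not a RefSeq-style accession
    return REFSEQ_PREFIXES.get(prefix.upper())


for acc in ("NM_123456", "XP_123456", "AY182241"):
    print(acc, "->", describe_refseq(acc))
```

A traditional GenBank accession such as AY182241 has no underscore, so the helper returns None for it.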


RefSeq Benefits

• non-redundancy

• explicitly linked nucleotide and protein sequences

• updates to reflect current sequence data and biology

• data validation

• format consistency

• distinct accession series

• stewardship by NCBI staff and collaborators

Other NCBI Databases

• Structure: imported structures (PDB); Cn3D viewer, NCBI curation

• CDD: conserved domain database; protein families (COGs and KOGs), single domains (PFAM, SMART, CD)

• dbSNP: nucleotide polymorphism

• Gene: gene records; unifies LocusLink and Microbial Genomes

• HomoloGene: neighboring function for Gene

WWW Access: Entrez & BLAST

Entrez: Database Integration

[Figure: the Entrez databases (PubMed abstracts, Nucleotide sequences, Protein sequences, 3-D Structure, Gene, HomoloGene) are connected by hard links. Within a database, neighbors are pre-computed: word weight for PubMed, BLAST for related sequences, VAST for related structures, plus BLink and Domains for proteins.]


The Links Menu: Access Links/Neighbors

SNP

GEO

Gene

PubMed

Protein

The Links Menu: Access Neighbors/Links

Neighbors: BLAST Link (pre-computed BLAST)

Neighbors: pre-computed CDD search

Neighbors

Hard Links

Database Searching with Entrez

Using limits and field restriction to find human MutL homolog

Linking and neighboring with MutL

Global NCBI (Entrez) Search

colon cancer

Global Entrez Search Results

OMIM: Human Disease Genes

Nucleotide Sequences

The Nucleotide database now has three parts:

• EST: expressed sequence tags

• GSS: genome survey sequences

• CoreNucleotide: everything else

Advanced Search Options Tabs

colon cancer[Title] AND nonpolyposis[Title]

colon cancer[Title] AND nonpolyposis[Title] AND biomol_mrna[Properties] AND srcdb_refseq[Properties]

Advanced Search Options Tabs

More Precise Nucleotides Search

colon cancer[Title] AND nonpolyposis[Title] AND human[Organism] AND biomol_mrna[Properties] AND srcdb_refseq[Properties]
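The query above is just a list of value[Field] restrictions joined with AND, so it is easy to assemble programmatically. The Python sketch below builds the same term and wraps it in a URL. Note the assumptions: the slides only show the web search box, so the E-utilities esearch endpoint and the db name "nucleotide" are additions of this example, build_term and esearch_url are hypothetical helper names, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities esearch endpoint (not mentioned in the slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(*restrictions):
    """Join (value, field) pairs into 'value[Field] AND value[Field] ...'."""
    return " AND ".join(f"{value}[{field}]" for value, field in restrictions)


def esearch_url(db: str, term: str) -> str:
    """Return a URL-encoded esearch request; the caller decides whether to fetch it."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term})}"


term = build_term(
    ("colon cancer", "Title"),
    ("nonpolyposis", "Title"),
    ("human", "Organism"),
    ("biomol_mrna", "Properties"),
    ("srcdb_refseq", "Properties"),
)
print(term)
print(esearch_url("nucleotide", term))
```

Each added restriction narrows the result set, which is why the query grows from two [Title] terms to five terms over the preceding slides.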


Useful Field Restrictions

[Title]: definition line in GenBank / GenPept format, shown in Summary format

glyceraldehyde 3 phosphate dehydrogenase[Title]

[Organism]: NCBI’s taxonomy, the organizing system for molecular databases

mouse[organism]; green plants[organism]; Streptomyces coelicolor[organism]

[Properties]: molecule type, location, database source

biomol_mrna[properties]; biomol_genomic[properties]; gene_in_mitochondrion[properties]; srcdb_pdb[properties]

[Filter]: subsets of data, Entrez links

all[filter]; nucleotide_mapview[filter]; nucleotide_omim[filter]

Organism Field: NCBI’s Taxonomy

All molecular databases

Entrez: Use Gene for everything

[Figure: Gene as the hub, linking to Entrez Protein (and BLink), to HomoloGene (gene neighbors), and to the other Entrez databases.]

MLH1 Gene Record

MLH1 Gene Record: Interactions + GO

MLH1 Gene Record: Sequences

MLH1: Sequence Links

Finding Homologs: HomoloGene

Protein

mRNA

Genomic

HomoloGene Cluster

Gene Links Protein Links

Finding Homologs 2: BLink

BLink: BLAST Link (Best Hits)

BLAST

Opossum homolog

Redundant Proteins

First 200 only

Navigate to: bioteach.ubc.ca/bootcamp

Follow the link to the practical exercise page at the NCBI, where you’ll find step-by-step instructions.

Strategy #1: search nt

Strategy #2: search Entrez Gene

Use the preview tab and feature keys.

Let’s compare our results.

Check your History

Search  Result  Most Recent Queries

#1      215     Search human[Organism] AND cancer[Text Word] AND promoter[Feature key]
                (Approach A: Entrez CoreNucleotide search)

#2      48178   CoreNucleotide Links for Gene (Search human[Organism] AND cancer[Text Word] AND gene_nucleotide[Filter])
                (Approach B: Entrez Gene, follow link to CoreNucleotide)

#3      317     Search #2 AND promoter[Feature key]
                (limit Approach B search to records with promoter annotated)

#4      173     Search #1 NOT #3
                (unique hits from Approach A: straight to Entrez CoreNucleotide search)

#5      275     Search #3 NOT #1
                (unique hits from Approach B: Entrez Gene to CoreNucleotide)
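The NOT queries in the history are set differences, and the counts are enough to work out how many records the two approaches share. The Python sketch below does that arithmetic and then reproduces it with sets. The record identifiers are synthetic integers invented for the illustration; only the four counts come from the history table.

```python
# Counts from the history: #1, #3, #4 (= #1 NOT #3), #5 (= #3 NOT #1).
approach_a_total = 215
approach_b_total = 317
a_only = 173
b_only = 275

# Records found by both approaches, computed from either side.
shared_from_a = approach_a_total - a_only
shared_from_b = approach_b_total - b_only
assert shared_from_a == shared_from_b
print("records found by both approaches:", shared_from_a)

# The same relationships with Python sets and made-up record IDs.
shared = set(range(shared_from_a))
approach_a = shared | set(range(1000, 1000 + a_only))
approach_b = shared | set(range(2000, 2000 + b_only))

print(len(approach_a - approach_b))   # corresponds to #1 NOT #3
print(len(approach_b - approach_a))   # corresponds to #3 NOT #1
print(len(approach_a & approach_b))   # corresponds to #1 AND #3
print(len(approach_a | approach_b))   # corresponds to #1 OR #3
```

Both subtractions give 42, so the two strategies overlap only slightly, and running #1 OR #3 would collect the union of the two result sets.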

Searching PubMed

• How many papers in PubMed are there:

– about cancer?

– about carrots?

• Using Entrez PubMed, can you see whether there are any scientific links between carrots and cancer?

– How many papers are there about “carrots AND cancer”?

– What is the active chemical substance in carrots that may play a role in cancers?

You can make up your own examples to search PubMed… or the Bookshelf…

http://www.ncbi.nih.gov/Database/datamodel

Links

• The About Entrez page at the NCBI: http://www.ncbi.nlm.nih.gov/Database/index.html

• Model of Entrez Databases from NCBI: http://www.ncbi.nih.gov/Database/datamodel/index.html

• PubMed Tutorial from NLM: http://www.nlm.nih.gov/bsd/pubmed_tutorial/m1001.html