church gmod2012 pt1
DESCRIPTION
Part one of my talk at the GMOD 2012 meeting
TRANSCRIPT
![Page 1: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/1.jpg)
@deannachurch
Navigating Genome Resources at NCBI
Deanna M. Church, NCBI
The Evolution of the Reference Human Genome
Part 1
![Page 2: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/2.jpg)
NCBI
BLAST PubMed GenBank
![Page 3: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/3.jpg)
[Figure: Twenty Two Years of Growth: NCBI Data and User Services. Time axis: 1989 to 2011. Series: GenBank base pairs (millions) and users per weekday (average).]
BLAST
Entrez, GenBank at NCBI, dbEST
3D Structure, Network Entrez
WWW, dbSTS
BankIt, Genomes, Taxonomy
OMIM, GeneMap, Cn3D, UniGene
PubMed, PSI-BLAST, VAST, ePCR
Microbial Genomes, PHI-BLAST, CGAP
Human Genome, LinkOut, LocusLink, RefSeq, dbSNP
PubMed Central, BLINK, MapViewer, GEO, GeneRIFs
WGS, HLA Haplotypes, Human Genome-TPA
dbMHC, BookShelf, Human Genome- Transcripts Alignments
Entrez Genes, Mouse Composite Genome, Gnomon
PubChem, Trace Archive, CCDS, Cancer Chromosomes, Environmental Samples
Public Access, Influenza Seqs., GenSAT, GeneTests
Genome-Wide Association Studies dbGap, Entrez Portal
Seq Read Archive, UniSTS, RefSeqGene, Genome Reference Consortium
Discovery Initiative, Entrez Sensors, Primer BLAST
Peptidome, BioSystems, Flu H1N1
dbVar, Epigenomics, MyNCBI, 1000 Genomes Project
ClinVar, GTR, Genome Remapping Service, PubMed Health, CloneDB, Genome Decoration Page
![Page 4: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/4.jpg)
NCBI
Tools: BLAST, GBench, Splign, Cn3D, e-PCR, e-Utilities…
Literature: PubMed, PubMed Central, Bookshelf, MeSH, Gene Reviews…
Data: GenBank, Protein DB, SRA, GEO, dbSNP, Gene, RefSeq…
![Page 5: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/5.jpg)
Entrez: Pathway to Discovery
Amino acid sequence similarity
Coding region features
Nucleotide sequence similarity
Term frequency statistics
Literature citations in sequence databases
Literature citations in sequence databases
MEDLINE abstracts
Nucleotide sequences
Protein sequences
![Page 6: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/6.jpg)
http://www.ncbi.nlm.nih.gov/books/NBK25501/
Programmatic access
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=science[journal]+AND+breast+cancer+AND+2008[pdat]&usehistory=y
<eSearchResult><Count>6</Count><RetMax>6</RetMax><RetStart>0</RetStart><IdList>
<Id>19008416</Id><Id>18927361</Id><Id>18787170</Id><Id>18487186</Id><Id>18239126</Id><Id>18239125</Id>
</IdList>…
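The request and response on this slide could be reproduced in Python roughly as follows. This is a minimal sketch, not NCBI's own client: the helper names `build_esearch_url` and `parse_esearch` are hypothetical, the base URL is copied from the slide (current NCBI documentation uses https), and the sample response closes the tags that the slide truncates with "…" so that it parses. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL as shown on the slide.
ESEARCH_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, use_history=True):
    """Build an esearch URL; urlencode escapes spaces and brackets."""
    params = {"db": db, "term": term}
    if use_history:
        params["usehistory"] = "y"
    return ESEARCH_BASE + "?" + urlencode(params)


def parse_esearch(xml_text):
    """Return (count, list of ids) from an eSearchResult document."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    ids = [elem.text for elem in root.findall("./IdList/Id")]
    return count, ids


# Response from the slide. The closing tags are an assumption added here
# because the slide cuts the document off after </IdList>.
SAMPLE_RESPONSE = """<eSearchResult><Count>6</Count><RetMax>6</RetMax><RetStart>0</RetStart><IdList>
<Id>19008416</Id><Id>18927361</Id><Id>18787170</Id><Id>18487186</Id><Id>18239126</Id><Id>18239125</Id>
</IdList></eSearchResult>"""

if __name__ == "__main__":
    url = build_esearch_url(
        "pubmed", "science[journal] AND breast cancer AND 2008[pdat]"
    )
    print(url)
    print(parse_esearch(SAMPLE_RESPONSE))
```

To run the query for real, the URL could be passed to `urllib.request.urlopen` and the body handed to `parse_esearch`.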
![Page 7: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/7.jpg)
http://www.ncbi.nlm.nih.gov/education/
http://www.youtube.com/NCBINLM @NCBI http://www.facebook.com/ncbi.nlm
![Page 8: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/8.jpg)
Collins FS et al, 1998
Throughput: 500 Mb/year
Cost: < $0.25 per base
Variation: 100,000 SNPs mapped
![Page 9: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/9.jpg)
Steve Sherry, NCBI
[Figure: NCBI dbSNP database growth, human variations, 1999 to 2011 (dbSNP build 135, November 2011). One panel plots millions of rs-ids (non-redundant annotations) by class: SNP, STR & Indel, ambiguous mapping. The other plots millions of submissions by project: TSC, HapMap, 1000 Genomes, other projects.]
![Page 10: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/10.jpg)
Kidd et al, 2007 APOBEC cluster
Black: Deletion
White: Insertion
![Page 11: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/11.jpg)
http://www.ncbi.nlm.nih.gov/dbvar
![Page 12: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/12.jpg)
![Page 13: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/13.jpg)
Church et al., 2011 PLoS
http://genomereference.org
![Page 14: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/14.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
GRC Beginnings
![Page 15: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/15.jpg)
![Page 16: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/16.jpg)
![Page 17: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/17.jpg)
![Page 18: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/18.jpg)
Build sequence contigs based on contigs defined in the TPF (tiling path file).
Check for orientation consistencies
Select switch points
Instantiate sequence for further analysis
Switch point
Consensus sequence
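The steps on this slide can be illustrated with a toy example. This is only a sketch under stated assumptions, not the GRC pipeline: the `Component` fields and the `build_contig` function are hypothetical, the real TPF format is not modeled, and switch points are given directly as the first and last base each clone contributes (1-based, in the clone's oriented coordinates).

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Component:
    """One clone in the tiling path (hypothetical structure)."""
    accession: str
    sequence: str     # sequence as submitted
    orientation: str  # "+" or "-" relative to the contig
    start: int        # first base used, after orienting the clone
    end: int          # last base used, after orienting the clone


def build_contig(components: List[Component]) -> str:
    """Orient each clone, check its switch points, and join the pieces."""
    pieces = []
    for comp in components:
        if comp.orientation not in ("+", "-"):
            raise ValueError(f"{comp.accession}: bad orientation")
        oriented = (
            comp.sequence
            if comp.orientation == "+"
            else reverse_complement(comp.sequence)
        )
        if not 1 <= comp.start <= comp.end <= len(oriented):
            raise ValueError(f"{comp.accession}: switch point out of range")
        pieces.append(oriented[comp.start - 1:comp.end])
    return "".join(pieces)


if __name__ == "__main__":
    # Two made-up clones that overlap by four bases; the second was
    # submitted on the opposite strand.
    path = [
        Component("CLONE_A", "ACGTACGTAA", "+", 1, 8),
        Component("CLONE_B", "AACCGGTTAC", "-", 3, 10),
    ]
    print(build_contig(path))
```

Switching from one clone to the next inside the overlap means each base of the contig comes from exactly one component, which is why the choice of switch point decides which clone's sequence is used where the clones differ.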
![Page 19: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/19.jpg)
![Page 20: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/20.jpg)
ftp://ftp.ncbi.nlm.nih.gov/pub/grc/human/
![Page 21: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/21.jpg)
![Page 22: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/22.jpg)
Community Input
![Page 23: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/23.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
Centralized Data
![Page 24: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/24.jpg)
Large-Scale Variation Complicates Genome Assembly
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
New Assembly model: represent both haplotypes
![Page 25: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/25.jpg)
NCBI36 (hg18)
UGT2B17 Region
![Page 26: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/26.jpg)
NCBI36 NC_000004.10 (chr4) Tiling Path
AC074378.4 AC079749.5
AC134921.2 AC147055.2
AC140484.1 AC019173.4
AC093720.2 AC021146.7
TMPRSS11E TMPRSS11E2
Xue Y et al, 2008
GRCh37 NC_000004.11 (chr4) Tiling Path
AC074378.4 AC079749.5
AC134921.1 AC147055.2
AC093720.2 AC021146.7
TMPRSS11E
GRCh37: NT_167250.1 (UGT2B17 alternate locus)
AC074378.4 AC140484.1
AC019173.4 AC226496.2
AC021146.7
TMPRSS11E2
UGT2B17 Region
![Page 27: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/27.jpg)
GRCh37 (hg19)
http://genomereference.org
7 alternate haplotypes at the MHC
Alternate loci released as:
FASTA
AGP
Alignment to chromosome
UGT2B17 MHC MAPT
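Of the release formats listed on this slide, AGP is the one that records how a sequence is tiled from components. A minimal reader might look like the sketch below. It assumes the nine tab-separated columns of the AGP 2.0 layout, with component types N and U marking gaps. The class and function names are hypothetical, and the sample rows are made up rather than taken from a GRC file.

```python
from typing import List, NamedTuple, Optional

GAP_TYPES = {"N", "U"}  # component types that denote gaps


class AgpRow(NamedTuple):
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    component_id: Optional[str] = None
    component_beg: Optional[int] = None
    component_end: Optional[int] = None
    orientation: Optional[str] = None
    gap_length: Optional[int] = None

    @property
    def is_gap(self) -> bool:
        return self.component_type in GAP_TYPES

    def span_is_consistent(self) -> bool:
        """The object span must equal the component span or gap length."""
        obj_len = self.obj_end - self.obj_beg + 1
        if self.is_gap:
            return obj_len == self.gap_length
        return obj_len == self.component_end - self.component_beg + 1


def parse_agp(text: str) -> List[AgpRow]:
    """Parse AGP text, skipping blank lines and '#' comment lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 9:
            raise ValueError(f"expected 9 columns, got {len(f)}")
        common = (f[0], int(f[1]), int(f[2]), int(f[3]), f[4])
        if f[4] in GAP_TYPES:
            rows.append(AgpRow(*common, gap_length=int(f[5])))
        else:
            rows.append(AgpRow(*common, component_id=f[5],
                               component_beg=int(f[6]),
                               component_end=int(f[7]),
                               orientation=f[8]))
    return rows


# Hypothetical rows: two finished clones separated by a 100 bp gap.
SAMPLE_AGP = "\n".join([
    "# made-up example, not a real alternate locus",
    "\t".join(["alt_example", "1", "1000", "1", "F", "CLONE_A.1", "1", "1000", "+"]),
    "\t".join(["alt_example", "1001", "1100", "2", "N", "100", "scaffold", "yes", "paired-ends"]),
    "\t".join(["alt_example", "1101", "1600", "3", "F", "CLONE_B.2", "501", "1000", "-"]),
])

if __name__ == "__main__":
    for row in parse_agp(SAMPLE_AGP):
        print(row.part_number, row.is_gap, row.span_is_consistent())
```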
![Page 28: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/28.jpg)
![Page 29: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/29.jpg)
Assembly (e.g. GRCh37)
Primary Assembly
Non-nuclear assembly unit
(e.g. MT)
ALT 1
ALT 2
ALT 3
ALT 4
ALT 5
ALT 6
ALT 7
ALT 8
ALT 9
PAR
Genomic Region (MHC)
Genomic Region (UGT2B17)
Genomic Region (MAPT)
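The assembly model in this diagram can be expressed as a small data structure. The sketch below is an illustration only: the class names are hypothetical, PAR is left out, and the assignment of particular ALT units to regions is an assumption. The slides state only that there are seven alternate haplotypes at the MHC and nine ALT units in total.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AltLocus:
    unit: str    # assembly unit holding the alternate, e.g. "ALT 1"
    region: str  # genomic region the alternate represents


@dataclass
class Assembly:
    name: str
    primary: str = "Primary Assembly"
    non_nuclear: List[str] = field(default_factory=list)
    alt_loci: List[AltLocus] = field(default_factory=list)

    def alts_by_region(self) -> Dict[str, List[str]]:
        """Group ALT unit names by the genomic region they cover."""
        grouped = defaultdict(list)
        for alt in self.alt_loci:
            grouped[alt.region].append(alt.unit)
        return dict(grouped)


def example_grch37() -> Assembly:
    """Build a GRCh37-like assembly.

    Which ALT unit holds which region is assumed for illustration.
    """
    alts = [AltLocus(f"ALT {i}", "MHC") for i in range(1, 8)]
    alts.append(AltLocus("ALT 8", "UGT2B17"))
    alts.append(AltLocus("ALT 9", "MAPT"))
    return Assembly("GRCh37", non_nuclear=["MT"], alt_loci=alts)


if __name__ == "__main__":
    assembly = example_grch37()
    for region, units in assembly.alts_by_region().items():
        print(region, units)
```

Keeping alternates in their own units, each tied to a named region, is what lets the primary assembly stay a single haplotype-like path while the extra haplotypes remain addressable.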
![Page 30: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/30.jpg)
Richa Agarwala
MHC Alternate locus
Alignment to chr6
![Page 31: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/31.jpg)
Oh No! Not a new version of the human genome!
http://genomereference.org
![Page 32: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/32.jpg)
![Page 33: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/33.jpg)
Assembly (e.g. GRCh37.p5)
Primary Assembly
Non-nuclear assembly unit
(e.g. MT)
ALT 1
ALT 2
ALT 3
ALT 4
ALT 5
ALT 6
ALT 7
ALT 8
ALT 9
PAR
…
Genomic Region (MHC)
Genomic Region (UGT2B17)
Genomic Region (MAPT)
Patches
Genomic Region (ABO)
Genomic Region (SMA)
Genomic Region (PECAM1)
![Page 34: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/34.jpg)
TBC1D3C TBC1D3
TBC1D3C
TBC1D3H
Myo19 region (17q21)
![Page 35: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/35.jpg)
60 Fix PATCHES: Chromosome will update in GRCh38
70 Novel PATCHES: Additional sequence added
(adds >1 Mb of novel sequence to the assembly)
(adds >800K of novel sequence to the assembly)
Releasing patches quarterly
![Page 36: Church gmod2012 pt1](https://reader035.vdocuments.site/reader035/viewer/2022062617/54c63d0e4a795920538b469b/html5/thumbnails/36.jpg)
Distributed data
Genome not in INSDC Database
Old Assembly Model
Centralized Data
Updated Assembly Model
Genome in INSDC Database
Genome not in INSDC Database