agbt2015 workshop schneider
Advancing the Human Reference Assembly
Valerie Schneider, NCBI
25 February 2015
The Human Reference Genome: Today, Tomorrow and Next?
http://genomereference.org
Outline
• The assembly model
  • Basics
  • Value added
  • Challenges
• Future relevance of the reference
  • Multiple genomes
  • Haploid genomes
• Assembly updates
  • Mechanisms
  • Requirements/Challenges
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
Current Assembly model: represent both haplotypes
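A toy illustration of the difference between the two models. The sequences are invented, and `mosaic_consensus` and `both_haplotypes` are hypothetical helpers, not GRC tooling. The point is only that collapsing two haplotypes into one consensus can produce a sequence found in neither, while keeping both preserves each one as observed.

```python
# Toy sequences, invented for illustration only.
HAP1 = "ACGTTAGC"
HAP2 = "ACCTTGGC"


def mosaic_consensus(hap1: str, hap2: str) -> str:
    """Collapse two equal-length haplotypes into a single sequence.

    Where the haplotypes differ, alternate which one supplies the base,
    mimicking a consensus path that switches between haplotypes.
    """
    if len(hap1) != len(hap2):
        raise ValueError("toy example assumes equal-length haplotypes")
    out = []
    take_first = True
    for a, b in zip(hap1, hap2):
        if a == b:
            out.append(a)
        else:
            out.append(a if take_first else b)
            take_first = not take_first
    return "".join(out)


def both_haplotypes(hap1: str, hap2: str) -> dict:
    """Keep each haplotype intact: one as primary, one as an alternate."""
    return {"primary": hap1, "alt": hap2}


consensus = mosaic_consensus(HAP1, HAP2)
# The consensus mixes alleles from both inputs and equals neither of them.
print(consensus, consensus in (HAP1, HAP2))
```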
GRC Assembly Model
Assembly (e.g. GRCh38)
Primary Assembly Unit
Non-nuclear assembly unit (e.g. MT)
PAR
Genomic Region (MHC)
Genomic Region (UGT2B17)
Genomic Region (MAPT)
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
GRC Assembly Model
ALT 1
ALT 2
ALT 3
ALT 4
ALT 5
ALT 6
ALT 7
GRC Assembly Model
Alt loci alignments are an integral part of the assembly model
Alignment to chr + scaffold sequence = Alt
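One way to picture "alignment to chr + scaffold sequence = Alt" is as a record that ties a scaffold to a span on a chromosome. The sketch below is a simplification under stated assumptions: the class `AltPlacement`, its field names, and the single ungapped aligned block are all hypothetical, and real alt locus alignments can contain gaps and mismatches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AltPlacement:
    """An alt scaffold together with its placement on a chromosome.

    Assumes one ungapped aligned block, which is a simplification.
    """
    alt_name: str     # scaffold name (hypothetical)
    chrom: str        # chromosome the scaffold is aligned to
    chrom_start: int  # 0-based start of the aligned block on the chromosome
    alt_start: int    # 0-based start of the aligned block on the scaffold
    length: int       # length of the aligned block

    def alt_to_chrom(self, alt_pos: int) -> Optional[int]:
        """Map a scaffold position to a chromosome position.

        Returns None when the position lies outside the aligned block,
        i.e. in sequence with no counterpart in this placement.
        """
        offset = alt_pos - self.alt_start
        if 0 <= offset < self.length:
            return self.chrom_start + offset
        return None


# Made-up placement and coordinates, for illustration only.
placement = AltPlacement("alt_example", "chrN", 1000, 50, 200)
print(placement.alt_to_chrom(60))  # inside the aligned block
print(placement.alt_to_chrom(10))  # before the aligned block
```

Without the alignment, the scaffold is just an unplaced sequence; the placement is what gives it chromosome context.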
GRCh38
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 alt loci: 3.6 Mb novel sequence relative to chromosomes
• Average alt length = 400 kb, max = ~5 Mb
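A quick arithmetic check of what these figures imply, using only the numbers on the slide. The variable names are illustrative, and because the 2% figure is rounded, the derived total is approximate.

```python
# Figures as stated on the slide.
REGION_SPAN_MB = 61.9    # chromosome sequence covered by regions with alt loci
REGION_FRACTION = 0.02   # stated as 2% of chromosome sequence (rounded)
N_ALT_LOCI = 261
NOVEL_MB = 3.6           # novel sequence relative to chromosomes

# Total chromosome sequence implied by "61.9 Mb is 2%".
implied_chromosome_mb = REGION_SPAN_MB / REGION_FRACTION

# Mean novel sequence contributed per alt locus, in kb.
novel_kb_per_alt = NOVEL_MB * 1000 / N_ALT_LOCI

print(f"implied chromosome sequence: ~{implied_chromosome_mb / 1000:.1f} Gb")
print(f"mean novel sequence per alt: ~{novel_kb_per_alt:.0f} kb")
```

The implied total is roughly 3.1 Gb, and the mean novel contribution per alt is about 14 kb. Against an average alt length of 400 kb, this suggests that most of each alt's sequence aligns to the chromosome and only a small part is novel.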
GRCh38
GRC Assembly Model
The human reference assembly represents population genomic diversity in the context of linear sequences
Assembly (e.g. GRCh38.p1)
Primary Assembly Unit
Non-nuclear assembly unit (e.g. MT)
ALT 1
ALT 2
ALT 3
ALT 4
ALT 5
ALT 6
ALT 7
PAR
Genomic Region (MHC)
Genomic Region (UGT2B17)
Genomic Region (MAPT)
Patches
Genomic Region (ABO)
Genomic Region (FOXO6)
Genomic Region (FCGBP)
Assembly Updates
Patches: FIX and NOVEL
• FIX: scaffold status at next major assembly release: -- (integrated); treat as: Preferred
• NOVEL: scaffold status at next major assembly release: ALT LOCI; treat as: Allelic
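The patch rules can be sketched as a small lookup. This assumes the usual reading of the slide's table: FIX patches are preferred over the chromosome sequence and are integrated at the next major release, while NOVEL patches are allelic and become alt loci. The names `PatchType`, `TREAT_AS`, `STATUS_AT_NEXT_MAJOR`, and `describe` are hypothetical, not part of any GRC or NCBI API.

```python
from enum import Enum


class PatchType(Enum):
    FIX = "FIX"
    NOVEL = "NOVEL"


# How a patch scaffold relates to the chromosome sequence it covers
# (assumed mapping, see the lead-in).
TREAT_AS = {
    PatchType.FIX: "preferred",
    PatchType.NOVEL: "allelic",
}

# What happens to the patch scaffold at the next major assembly release.
STATUS_AT_NEXT_MAJOR = {
    PatchType.FIX: "integrated",
    PatchType.NOVEL: "alt locus",
}


def describe(patch_type: PatchType) -> str:
    """Summarize how to treat a patch and what becomes of it."""
    return (
        f"{patch_type.value} patch: treat as {TREAT_AS[patch_type]}; "
        f"at next major release: {STATUS_AT_NEXT_MAJOR[patch_type]}"
    )


for patch_type in PatchType:
    print(describe(patch_type))
```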
Assembly Updates
GRC
Criteria for Reference Assembly Component Sequences
• Finished Quality
• INSDC Accessioned
• Representative of an actual DNA molecule
Summary
• Reference Assembly: Today
  • Multi-allelic
  • Need compatible tool suites
• Reference Assembly: Tomorrow
  • Defining sequence context
  • Providing coordinates
• Reference Assembly: Next?
  • Patches
  • Challenges
GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
GRC Credits
Workshop sponsor:
http://genomereference.org