Update on the assembly and annotation of the blueberry genome
www.P2EP.org
THE CURRENT STATUS OF THE BLUEBERRY GENOME
Robert Reid
Department of Bioinformatics & Genomics
University of North Carolina Charlotte
BLUEPRINTS FOR BLUEBERRY
BLUEBERRY PROJECT TIMELINE
2009~
• 76 bp Illumina GA II sequencing
• 3 kb & some 20 kb 454 pyrosequencing
• 36 bp Illumina sequencing
2011
• 454 pyrosequencing
• 8 kb and 20 kb paired-end insert
2013
• Illumina HiSeq (5 lanes)
• Illumina Nextera paired-end sequencing
• Vaccinium.org website (WSU)
2014
• MaSuRCA and GARM assembly
• BAC libraries (UF), BAC-end sequencing (NCSU)
2015
• SSPACE (modified) assembly
• Gene annotation, RNA-Seq (Gupta et al., 2015)
• Repeat annotations, map alignments
Some Assembly Numbers
Much room for improvement still
**Estimated genome size = 608 Mb (Costich et al., 1993)
MARKER ALIGNMENT TO SCAFFOLDS
Linkage map             # of markers   # of scaffolds   Size (bp)
*Tetraploid - Draper    689            358              121,530,818
*Tetraploid - Jewel     576            328              112,427,224
Interspecific hybrid    322            190              74,069,152
Diploid                 318            153              56,781,319
Cranberry               138            40               15,934,975
696 scaffolds were assigned to at least one linkage group; their total size was 214 Mb.
*Earlier version of map markers than what was published
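As a rough check on these figures, the anchored share of the genome can be worked out from the slide's own numbers. The sketch below assumes the 608 Mb estimate (Costich et al., 1993) as the denominator; the variable names are illustrative only. It also shows that the per-map scaffold counts sum to more than 696, which would be consistent with some scaffolds being placed on more than one map.

```python
# Per-map values copied from the marker alignment table: (scaffolds, size in bp).
maps = {
    "Tetraploid - Draper": (358, 121_530_818),
    "Tetraploid - Jewel": (328, 112_427_224),
    "Interspecific hybrid": (190, 74_069_152),
    "Diploid": (153, 56_781_319),
    "Cranberry": (40, 15_934_975),
}

# Totals reported on the slide (rounded to whole Mb there).
ESTIMATED_GENOME_BP = 608_000_000   # assumption: Costich et al. (1993) estimate
ANCHORED_BP = 214_000_000           # scaffolds assigned to >= 1 linkage group
ANCHORED_SCAFFOLDS = 696

# Naive sums across maps; these double-count any scaffold shared between maps.
scaffold_sum = sum(n for n, _ in maps.values())
size_sum = sum(bp for _, bp in maps.values())

# Share of the estimated genome that is anchored to a linkage group.
anchored_fraction = ANCHORED_BP / ESTIMATED_GENOME_BP

print(f"Sum of per-map scaffolds: {scaffold_sum} (unique anchored: {ANCHORED_SCAFFOLDS})")
print(f"Sum of per-map sizes: {size_sum:,} bp (unique anchored: {ANCHORED_BP:,} bp)")
print(f"Anchored share of estimated genome: {anchored_fraction:.1%}")
```

Under these assumptions, roughly a third of the estimated genome is tied to a linkage group.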
GENOME COMPLETENESS
[Figure: BUSCO (1645 core genes; http://busco.ezlab.org/): pie chart with categories complete, duplicate, fragments, and missing gene; slice labels 48%, 22%, 18%, 12% (slice-to-category pairing not shown).]
[Figure: CEGMA (458 core genes): bar chart of Match vs. No match, 0% to 100%, for three assemblies: Newbler (454 reads), Nextera hybrid assembly, and Nextera plus BAC-end sequencing; bar labels 356, 356, 350.]
ANNOTATING PATHWAYS VIA GAGE
GAGE identifies 84 KEGG pathways from gene predictions.
Top pathways found:
1. Pyruvate metabolism
2. Beta-alanine metabolism
3. Ribosome biogenesis
4. RNA polymerase
5. Pyrimidine metabolism
Luo et al., GAGE: Generally Applicable Gene Set Enrichment for Pathway Analysis. BMC Bioinformatics, 2009, 10:161
Workflow:
1. Predicted genes: RNA-Seq; gene prediction tools (Augustus, GeneMark, SNAP; 100,000 genes)
2. Predicted proteins: TransDecoder
3. Align to RefSeq: BLASTP to grape or potato (NCBI RefSeq)
4. Map to KEGG: GAGE/pathview
5. Identify most abundant KEGG pathways: GAGE
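The final step, identifying the most abundant KEGG pathways, comes down to tallying how many predicted genes map to each pathway. The following is a minimal Python sketch of that tally only. It is not GAGE itself (an R/Bioconductor package), and the gene IDs and gene-to-pathway assignments are invented purely for illustration; `top_pathways` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical gene -> KEGG pathway assignments (made up for illustration;
# real assignments would come from the BLASTP / KEGG mapping steps).
gene_to_pathways = {
    "gene_0001": ["vvi03020"],              # RNA polymerase
    "gene_0002": ["vvi03020", "vvi00500"],  # a gene can map to several pathways
    "gene_0003": ["vvi00500"],              # Starch and sucrose metabolism
    "gene_0004": ["vvi03020"],
    "gene_0005": ["vvi00030"],              # Pentose phosphate pathway
}


def top_pathways(mapping, n):
    """Return the n pathways with the most mapped genes as (id, count) pairs."""
    counts = Counter(
        pathway for pathways in mapping.values() for pathway in pathways
    )
    # Sort by descending count, then by ID so ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


for pathway_id, n_genes in top_pathways(gene_to_pathways, 2):
    print(pathway_id, n_genes)
```

A raw count like this favors large pathways; an enrichment method such as GAGE additionally tests whether a pathway is represented more than expected.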
MAPPED TO GRAPE
vvi04141 Protein processing in endoplasmic reticulum
vvi00510 N-Glycan biosynthesis
vvi00350 Tyrosine metabolism
vvi00561 Glycerolipid metabolism
vvi03020 RNA polymerase
vvi04120 Ubiquitin mediated proteolysis
vvi03022 Basal transcription factors
vvi00950 Isoquinoline alkaloid biosynthesis
vvi00030 Pentose phosphate pathway
vvi00730 Thiamine metabolism
vvi00960 Tropane, piperidine and pyridine alkaloid biosynthesis
vvi00500 Starch and sucrose metabolism
vvi03420 Nucleotide excision repair
vvi00196 Photosynthesis - antenna proteins
vvi03060 Protein export
vvi00565 Ether lipid metabolism
vvi03430 Mismatch repair
vvi00770 Pantothenate and CoA biosynthesis
vvi00071 Fatty acid degradation
vvi00380 Tryptophan metabolism
vvi00520 Amino sugar and nucleotide sugar metabolism
vvi00600 Sphingolipid metabolism
vvi00040 Pentose and glucuronate interconversions
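Each identifier here follows the KEGG convention of an organism code followed by a five-digit pathway number; vvi is the KEGG code for grape (Vitis vinifera). A small sketch of splitting such identifiers, for example to look up the organism-independent reference pathway (prefix map), might look like the following. The helper names `split_kegg_id` and `to_reference_id` are hypothetical.

```python
import re

# Organism code (three or four lowercase letters) followed by five digits.
KEGG_PATHWAY_ID = re.compile(r"^([a-z]{3,4})(\d{5})$")


def split_kegg_id(pathway_id):
    """Split an ID such as 'vvi04141' into ('vvi', '04141')."""
    match = KEGG_PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError(f"not a KEGG pathway ID: {pathway_id!r}")
    return match.group(1), match.group(2)


def to_reference_id(pathway_id):
    """Convert an organism-specific ID to the KEGG reference pathway ID."""
    _, number = split_kegg_id(pathway_id)
    return "map" + number


organism, number = split_kegg_id("vvi04141")
print(organism, number)
print(to_reference_id("vvi00510"))
```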
MAPPING VIA PATHVIEW*
*Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 2013
Workflow:
1. Predicted genes: RNA-Seq (1M isoforms); gene prediction tools (Augustus, GeneMark, SNAP; 100,000 genes)
2. Predicted proteins: TransDecoder
3. Align to RefSeq: BLASTP to grape or potato
4. Map to KEGG: pathview
5. Overlay onto KEGG pathway: pathview
FLAVONOID PATHWAY
2 AVAILABLE ONLINE ANNOTATION RESOURCES
• https://www.vaccinium.org (Dorrie Maine - WSU)
• http://bioviz.org/igb/index.html (Anne Lorraine)
FUTURE STEPS FOR GENOMICS
• Improving the genome (Massimo Iorizzo, Hamid Ashrafi, Jeannie Rowland)
  • Improve contiguity
  • Resolve repeat regions
  • Fill in the gaps
• Improving the linkage maps
  • Higher-density map
  • More anchoring points
• Optical map
  • To improve scaffold/contig ordering
ACKNOWLEDGEMENTS
• Allan Brown - CGIAR
• Ying-Chen Lin - NC State
• Mary Ann Lila - NC State
• Ra’ad Gharaibeh - UNCC
• Rachel Walstead - UNCC
• Gregario Lingchanco - UNCC
• Cory R Brouwer - UNCC
• Jeannie Rowland - USDA-ARS
• Dorrie Maine - WSU
• Garron Wright - DHMRI
• Mark Burk - DHMRI
• James Olmstead - U of Florida
• And many more