Update on the assembly and annotation of the blueberry genome

Upload: rob-reid

Post on 06-Apr-2017

TRANSCRIPT

Page 1: Update on the assembly and annotation of the blueberry genome

www.P2EP.org

THE CURRENT STATUS OF THE BLUEBERRY GENOME

Robert Reid

[email protected]

Department of Bioinformatics & Genomics

University of North Carolina Charlotte

BLUEPRINTS FOR BLUEBERRY

Page 2: Update on the assembly and annotation of the blueberry genome

BLUEBERRY PROJECT TIMELINE

2009~
• 76 bp Illumina GA II sequencing
• 3 kb and some 20 kb 454 pyrosequencing
• 36 bp Illumina sequencing

2011
• 454 pyrosequencing
• 8 kb and 20 kb paired-end insert

2013
• Illumina HiSeq (5 lanes)
• Illumina Nextera paired-end sequencing
• Vaccinium.org website (WSU)

2014
• MaSuRCA and GARM assembly
• BAC libraries (UF), BAC-end sequencing (NCSU)

2015
• SSPACE (modified) assembly
• Gene annotation, RNA-Seq (Gupta et al., 2015)
• Repeat annotations, map alignments

Page 3: Update on the assembly and annotation of the blueberry genome

Some Assembly Numbers

Page 4: Update on the assembly and annotation of the blueberry genome

Much room for improvement still

**Estimated genome size = 608 Mb (Costich et al., 1993)

Page 5: Update on the assembly and annotation of the blueberry genome

MARKER ALIGNMENT TO SCAFFOLDS

Linkage map            # of markers   # of scaffolds   Size (bp)
*Tetraploid - Draper   689            358              121,530,818
*Tetraploid - Jewel    576            328              112,427,224
Interspecific hybrid   322            190              74,069,152
Diploid                318            153              56,781,319
Cranberry              138            40               15,934,975

696 scaffolds were assigned to at least one linkage group, with a total size of 214 Mb.

*Earlier version of the map markers than what was published.
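
The per-map scaffold counts add up to more than the 696 scaffolds reported, which is what one would expect if some scaffolds carry markers from more than one map. A minimal sketch of that arithmetic, using only figures quoted on the slides (the 608 Mb genome size is the Costich et al., 1993 estimate from the previous slide, "214 Mb" is treated as a rounded value, and all variable names are illustrative):

```python
# Figures copied from the marker alignment table:
# map name -> (markers, scaffolds, size in bp)
maps = {
    "Tetraploid - Draper": (689, 358, 121_530_818),
    "Tetraploid - Jewel": (576, 328, 112_427_224),
    "Interspecific hybrid": (322, 190, 74_069_152),
    "Diploid": (318, 153, 56_781_319),
    "Cranberry": (138, 40, 15_934_975),
}

ANCHORED_SCAFFOLDS = 696           # scaffolds on at least one linkage group
ANCHORED_BP = 214_000_000          # "214 Mb" as reported (rounded)
ESTIMATED_GENOME_BP = 608_000_000  # Costich et al., 1993 estimate

# Naive sums count a scaffold once per map it appears in.
scaffold_sum = sum(n_scaffolds for _, n_scaffolds, _ in maps.values())
size_sum = sum(size_bp for _, _, size_bp in maps.values())

# Excess of the naive sum over the reported number of distinct scaffolds.
repeat_assignments = scaffold_sum - ANCHORED_SCAFFOLDS

# Share of the estimated genome covered by anchored scaffolds.
anchored_fraction = ANCHORED_BP / ESTIMATED_GENOME_BP

print(f"sum of per-map scaffold counts: {scaffold_sum}")
print(f"sum of per-map sizes (bp):      {size_sum:,}")
print(f"repeat assignments across maps: {repeat_assignments}")
print(f"anchored share of genome:       {anchored_fraction:.1%}")
```

Under these figures, roughly a third of the estimated genome is tied to at least one linkage group.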

Page 6: Update on the assembly and annotation of the blueberry genome

GENOME COMPLETENESS

[Figure: BUSCO2 chart (1645 core genes) with categories complete, duplicate, fragments, and missing gene; extracted values 48%, 22%, 18%, 12%.]

[Figure: CEGMA1 bar chart (458 core genes), match vs. no match on a 0% to 100% axis, for three assemblies: Newbler (454 reads), Nextera hybrid assembly, and Nextera plus BAC-end sequencing; extracted bar labels 356, 356, 350.]

2 http://busco.ezlab.org/

Page 7: Update on the assembly and annotation of the blueberry genome

ANNOTATING PATHWAYS VIA GAGE

GAGE identifies 84 KEGG pathways from gene predictions.

Top pathways found:

1. Pyruvate metabolism
2. Beta-alanine metabolism
3. Ribosome biogenesis
4. RNA polymerase
5. Pyrimidine metabolism

Workflow:

• Predicted genes: RNA-Seq; gene prediction tools Augustus, GeneMark, SNAP (100,000 genes)
• Predicted proteins: TransDecoder
• Align to RefSeq: BLASTP to grape or potato (NCBI RefSeq)
• Map to KEGG: GAGE/pathview
• Identify most abundant KEGG pathways: GAGE

Luo et al., GAGE: Generally Applicable Gene Set Enrichment for Pathway Analysis. BMC Bioinformatics, 2009, 10:161
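
The last two workflow steps (map to KEGG, then find the most abundant pathways) can be illustrated with a simple tally. This is only a sketch under stated assumptions, not GAGE itself, which is an R/Bioconductor package: the BLASTP rows use the standard 12-column tabular layout, the protein and grape gene names are made-up placeholders, and the gene-to-pathway table is hypothetical apart from the pathway IDs, which are taken from the grape list in this deck.

```python
import csv
import io
from collections import Counter

# Hypothetical BLASTP tabular output (12 columns: qseqid, sseqid, pident,
# length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore).
BLAST_TSV = (
    "bb_p1\tgrapeA\t92.1\t300\t20\t1\t1\t300\t1\t300\t1e-150\t520\n"
    "bb_p1\tgrapeB\t61.0\t280\t90\t4\t5\t284\t3\t282\t1e-60\t210\n"
    "bb_p2\tgrapeB\t88.0\t150\t18\t0\t1\t150\t1\t150\t1e-80\t290\n"
    "bb_p3\tgrapeC\t75.5\t200\t49\t2\t1\t200\t1\t200\t1e-70\t250\n"
)

# Hypothetical grape gene -> KEGG pathway assignments.
GENE_TO_PATHWAYS = {
    "grapeA": ["vvi03020"],              # RNA polymerase
    "grapeB": ["vvi00030", "vvi00500"],  # pentose phosphate; starch and sucrose
    "grapeC": ["vvi03020"],
}


def best_hits(tsv_text):
    """Keep the highest-bitscore subject for each query protein."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def count_pathways(hits, gene_to_pathways):
    """Count query proteins per pathway via their best-hit gene."""
    counts = Counter()
    for subject in hits.values():
        counts.update(gene_to_pathways.get(subject, []))
    return counts


hits = best_hits(BLAST_TSV)
pathway_counts = count_pathways(hits, GENE_TO_PATHWAYS)
for pathway, n in pathway_counts.most_common():
    print(pathway, n)
```

Taking one best hit per predicted protein keeps a protein with many weak hits from inflating a pathway's count; other choices, such as an e-value cutoff, would be equally reasonable here.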

Page 8: Update on the assembly and annotation of the blueberry genome

MAPPED TO GRAPE

vvi04141 Protein processing in endoplasmic reticulum
vvi00510 N-Glycan biosynthesis
vvi00350 Tyrosine metabolism
vvi00561 Glycerolipid metabolism
vvi03020 RNA polymerase
vvi04120 Ubiquitin mediated proteolysis
vvi03022 Basal transcription factors
vvi00950 Isoquinoline alkaloid biosynthesis
vvi00030 Pentose phosphate pathway
vvi00730 Thiamine metabolism
vvi00960 Tropane, piperidine and pyridine alkaloid biosynthesis
vvi00500 Starch and sucrose metabolism
vvi03420 Nucleotide excision repair
vvi00196 Photosynthesis - antenna proteins
vvi03060 Protein export
vvi00565 Ether lipid metabolism
vvi03430 Mismatch repair
vvi00770 Pantothenate and CoA biosynthesis
vvi00071 Fatty acid degradation
vvi00380 Tryptophan metabolism
vvi00520 Amino sugar and nucleotide sugar metabolism
vvi00600 Sphingolipid metabolism
vvi00040 Pentose and glucuronate interconversions

Page 9: Update on the assembly and annotation of the blueberry genome

MAPPING VIA PATHVIEW*

Workflow:

• Predicted genes: RNA-Seq (1M isoforms); gene prediction tools Augustus, GeneMark, SNAP (100,000 genes)
• Predicted proteins: TransDecoder
• Align to RefSeq: BLASTP to grape or potato
• Map to KEGG: pathview
• Overlay onto KEGG pathway: pathview

*Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 2013
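
An overlay of this kind needs one value per reference gene, so the blueberry-to-grape hits have to be collapsed first. The sketch below shows one way that step might look; it does not call pathview (an R package), and the protein names, gene names, and column headers are all hypothetical placeholders.

```python
import csv
import io
from collections import Counter

# Hypothetical best hits: blueberry predicted protein -> grape reference gene.
BEST_HITS = {
    "bb_p1": "grapeA",
    "bb_p2": "grapeB",
    "bb_p3": "grapeA",
    "bb_p4": "grapeC",
}


def per_gene_support(best_hits):
    """Count how many blueberry proteins hit each grape gene."""
    return Counter(best_hits.values())


def to_tsv(support):
    """Render the per-gene counts as a two-column tab-separated table."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["grape_gene", "n_blueberry_proteins"])
    for gene in sorted(support):
        writer.writerow([gene, support[gene]])
    return buf.getvalue()


support = per_gene_support(BEST_HITS)
table = to_tsv(support)
print(table, end="")
```

A table like this, keyed by reference gene ID, is the general shape of per-gene input that a pathway overlay tool consumes; the exact ID type and file format depend on the tool and would need checking against its documentation.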

Page 10: Update on the assembly and annotation of the blueberry genome

FLAVONOID PATHWAY

Page 11: Update on the assembly and annotation of the blueberry genome

2 AVAILABLE ONLINE ANNOTATION RESOURCES

• https://www.vaccinium.org (Dorrie Maine, WSU)
• http://bioviz.org/igb/index.html (Anne Lorraine)

Page 12: Update on the assembly and annotation of the blueberry genome

FUTURE STEPS FOR GENOMICS

• Improving the genome
  • Massimo Iorizzo
  • Hamid Ashrafi
  • Jeannie Rowland
  • Improve contiguity
  • Resolve repeat regions
  • Fill in the gaps
• Improving the linkage maps
  • Higher density map
  • More anchoring points
• Optical map
  • To improve scaffold / contig ordering

Page 13: Update on the assembly and annotation of the blueberry genome

ACKNOWLEDGEMENTS

• Allan Brown - CGIAR
• Ying-Chen Lin - NC State
• Mary Ann Lila - NC State
• Ra’ad Gharaibeh - UNCC
• Rachel Walstead - UNCC
• Gregario Lingchanco - UNCC
• Cory R Brouwer - UNCC
• Jeannie Rowland - USDA-ARS
• Dorrie Maine - WSU
• Garron Wright - DHMRI
• Mark Burk - DHMRI
• James Olmstead - U of Florida
• And many more