Annotating genomes using MAKER-P and iPlant
What Are Annotations?
• Annotations are descriptions of features of the genome
  – Structural: exons, introns, UTRs, splice forms, etc.
  – Coding and non-coding genes
  – Expression, repeats, transposons
• Annotations should include an evidence trail
  – Assists in quality control of genome annotations
• Examples of evidence supporting a structural annotation:
  – Ab initio gene predictions
  – ESTs
  – Protein homology
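One way an evidence trail supports quality control is by scoring how well a gene model agrees with its evidence. MAKER reports such a score, the Annotation Edit Distance (AED), where 0 means perfect agreement with the evidence and 1 means none. The sketch below computes a simplified, base-level AED as 1 minus the mean of sensitivity and specificity. It assumes 1-based inclusive exon coordinates on one strand and pools all evidence into a single set, which is simpler than how MAKER combines ESTs, proteins, and predictions; `aed` and `covered_bases` are hypothetical names.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def covered_bases(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of base positions covered by exons or alignment blocks."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(model: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """Simplified base-level Annotation Edit Distance (assumed formulation).

    AED = 1 - (SN + SP) / 2, where
      SN = overlap / evidence bases  (how much of the evidence the model explains)
      SP = overlap / model bases     (how much of the model the evidence supports)
    """
    m = covered_bases(model)
    e = covered_bases(evidence)
    if not m or not e:
        return 1.0  # no model or no evidence: treat as fully unsupported
    overlap = len(m & e)
    sn = overlap / len(e)
    sp = overlap / len(m)
    return 1.0 - (sn + sp) / 2.0


# A model whose second half is supported by an EST-like alignment.
print(aed([(1, 100)], [(51, 150)]))  # 0.5
# A two-exon model matched exactly by its evidence.
print(aed([(1, 100), (201, 300)], [(1, 100), (201, 300)]))  # 0.0
```

Sets of positions keep the sketch short; for chromosome-scale loci, merging sorted intervals would use far less memory.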
Secondary Annotation
• Protein domains
  – InterProScan: combines many signature databases, many of them HMM-based
• GO and other ontologies
• Pathway mapping
  – E.g., BioCyc Pathway Tools
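Domain and GO assignment often start from InterProScan's tab-separated output. The sketch below collects GO terms per protein, assuming the TSV layout InterProScan 5 writes when GO lookup is enabled: column 1 is the protein accession and column 14 holds pipe-separated GO IDs, with `-` when none apply. Some releases append a source tag in parentheses to each ID, so the sketch strips anything from `(` onward. The function name and the example rows (placeholder values) are illustrative only.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO


def go_terms_by_protein(tsv: TextIO) -> Dict[str, Set[str]]:
    """Collect GO IDs per protein from InterProScan-style TSV rows (assumed layout)."""
    terms: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 14:
            continue  # row lacks the optional GO column
        protein, go_field = row[0], row[13]
        if go_field in ("", "-"):
            continue
        for term in go_field.split("|"):
            term = term.split("(")[0].strip()  # drop a "(source)" suffix if present
            if term:
                terms[protein].add(term)
    return dict(terms)


# Two made-up rows with placeholder values; only prot1 carries GO terms.
sample = io.StringIO(
    "\t".join(["prot1", "md5", "250", "Pfam", "PF_X", "desc", "10", "200",
               "1e-30", "T", "01-01-2024", "IPR_X", "desc",
               "GO:0000001|GO:0000002(InterPro)"]) + "\n"
    + "\t".join(["prot2", "md5", "90", "Pfam", "PF_Y", "desc", "5", "80",
                 "1e-5", "T", "01-01-2024", "-", "-", "-"]) + "\n"
)
print({p: sorted(t) for p, t in go_terms_by_protein(sample).items()})
# {'prot1': ['GO:0000001', 'GO:0000002']}
```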
Challenges in Plant Genome Annotation
• Genomes are BIG
• Highly repetitive
• Many pseudogenes
• Assembly contamination
• Incomplete evidence
• No method is 100% accurate
Options for Protein-coding Gene Annotation
Yandell & Ence. Nature Reviews Genetics 13, 329-342 (May 2012) | doi:10.1038/nrg3174
Typical Annotation Pipeline
• Contamination screening
• Repeat/TE masking
• Ab initio prediction
• Evidence alignment (cDNA, EST, RNA-seq, protein)
• Evidence-driven prediction
• Chooser/combiner
• Evaluation/filtering
• Manual curation
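A simple sanity check after the repeat/TE masking step is the share of the assembly that was masked. Assuming repeats were soft-masked (written in lowercase, as RepeatMasker does with its `-xsmall` option), the sketch below counts lowercase bases per FASTA record, excluding `N` gap bases from the denominator. `read_fasta` and `soft_masked_fraction` are hypothetical helpers, not part of MAKER-P.

```python
import io
from typing import Dict, Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a FASTA stream."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def soft_masked_fraction(handle: TextIO) -> Dict[str, float]:
    """Fraction of non-gap bases written in lowercase (soft-masked) per record."""
    result: Dict[str, float] = {}
    for name, seq in read_fasta(handle):
        non_gap = [b for b in seq if b not in "Nn"]
        masked = sum(1 for b in non_gap if b.islower())
        result[name] = masked / len(non_gap) if non_gap else 0.0
    return result


example = io.StringIO(">ctg1 toy\nACGTacgtac\nGTNNNN\n>ctg2\nACGT\n")
print(soft_masked_fraction(example))  # {'ctg1': 0.5, 'ctg2': 0.0}
```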
MAKER-P Automated Pipeline
[Diagram: repeat library, ab initio predictions, and evidence as inputs to MAKER-P]
• MPI-enabled to allow parallel operation on large compute clusters
• Collaboration with Yandell Lab
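Running on a cluster means dividing the genome among processes or jobs; the pine megagenome run described below, for instance, split its 20.15 Gb assembly into 40 jobs. The sketch balances contigs across `n_jobs` bins with a greedy longest-first heuristic so that each job receives a similar number of bases. It illustrates the load-balancing problem under that assumption and is not a description of how MAKER-P's MPI layer schedules work; `partition_contigs` is a hypothetical helper.

```python
import heapq
from typing import Dict, List


def partition_contigs(lengths: Dict[str, int], n_jobs: int) -> List[List[str]]:
    """Greedy longest-first split of contigs into n_jobs bins.

    Each contig, largest first, goes to the bin with the fewest bases so far.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    bins: List[List[str]] = [[] for _ in range(n_jobs)]
    heap = [(0, i) for i in range(n_jobs)]  # (bases assigned, bin index)
    heapq.heapify(heap)
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        total, i = heapq.heappop(heap)
        bins[i].append(name)
        heapq.heappush(heap, (total + length, i))
    return bins


contigs = {"c1": 900, "c2": 500, "c3": 400, "c4": 300, "c5": 100}
print(partition_contigs(contigs, 2))  # [['c1', 'c4'], ['c2', 'c3', 'c5']]
```

Longest-first assignment is a common, cheap heuristic; it does not guarantee an optimal split, but it keeps one very large contig from landing on an already busy job.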
What is a GFF File?
Generic Feature Format
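A GFF file is tab-delimited, with nine columns per feature line: seqid, source, type, start, end, score, strand, phase, and attributes. In GFF3, the version MAKER writes, attributes are `key=value` pairs separated by semicolons, and `Parent` links exons and CDS features to their mRNA. The minimal parser below assumes GFF3: it skips comment and directive lines, stops at an embedded `##FASTA` section, and decodes percent-escapes in attributes. The `Feature` and `parse_gff3` names and the sample line are made up for illustration.

```python
import io
from typing import Dict, Iterator, NamedTuple, Optional, TextIO
from urllib.parse import unquote


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, str]


def parse_gff3(handle: TextIO) -> Iterator[Feature]:
    """Yield features from a GFF3 stream, stopping at an embedded ##FASTA section."""
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        attrs: Dict[str, str] = {}
        for pair in cols[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[unquote(key)] = unquote(value)
        yield Feature(
            seqid=cols[0], source=cols[1], type=cols[2],
            start=int(cols[3]), end=int(cols[4]),
            score=None if cols[5] == "." else float(cols[5]),
            strand=cols[6],
            phase=None if cols[7] == "." else int(cols[7]),
            attributes=attrs,
        )


sample = io.StringIO(
    "##gff-version 3\n"
    "chr1\tmaker\tmRNA\t1000\t2500\t.\t+\t.\tID=gene1-mRNA-1;Parent=gene1\n"
)
for f in parse_gff3(sample):
    print(f.type, f.start, f.end, f.attributes["Parent"])  # mRNA 1000 2500 gene1
```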
PAG 2014: MAKER-P at iPlant
• W559 - Annotation of the Loblolly Pine Megagenome (Jill Wegrzyn)
  – 20.15 Gb assembly split into 40 jobs, 216 CPU/job (8,640 CPU total), 17 hours
• P157 - Disease Resistance Gene Analysis on Chromosome 11 Across Ten Oryza Species
  – 10 rice species (each with 12 chromosome pseudomolecules), 96 CPU per chromosome (1,152 CPU total), ~2 hr per genome

TACC Lonestar Supercomputer: 22,656 CPU cores on 1,888 nodes

| Genome | Assembly Size (Mb) | CPU | Run Time (h:mm) |
|---|---|---|---|
| Arabidopsis thaliana TAIR10 | 120 | 600 | 2:44 |
| Arabidopsis thaliana TAIR10 | 120 | 1500 | 1:27 |
| Zea mays RefGen_v2 | 2067 | 2172 | 2:53 |

Campbell et al. Plant Physiology, December 4, 2013. DOI: 10.1104/pp.113.230144
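The two Arabidopsis runs in the table above allow a quick scaling check. Going from 600 to 1500 CPUs (2.5 times as many) cut the run time from 2:44 to 1:27, roughly a 1.9x speedup, or about 75% parallel efficiency, assuming the run times are in hours:minutes. The arithmetic below uses only the table's numbers; the helper names are hypothetical.

```python
from typing import Tuple


def to_minutes(hmm: str) -> int:
    """Convert an 'h:mm' run time to minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


def scaling(cpus_a: int, time_a: str, cpus_b: int, time_b: str) -> Tuple[float, float]:
    """Speedup and parallel efficiency of run b relative to run a."""
    speedup = to_minutes(time_a) / to_minutes(time_b)
    efficiency = speedup / (cpus_b / cpus_a)  # 1.0 would be perfect linear scaling
    return speedup, efficiency


# Arabidopsis thaliana TAIR10 rows from the table
speedup, efficiency = scaling(600, "2:44", 1500, "1:27")
print(f"speedup {speedup:.2f}, efficiency {efficiency:.0%}")  # speedup 1.89, efficiency 75%
```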
MAKER-P at iPlant
• Virtual image
• MPI-enabled for parallel computing
• Can be checked out with up to 16 CPUs
• Tested on a 4-CPU instance
  – Completed rice chromosome 1 in 8 hr 45 min
Atmosphere: MAKER_2.28 (emi-F13821D0)
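On an MPI-enabled image, MAKER is typically launched through an MPI launcher, for example `mpiexec -n 4 maker`, from a directory that holds the MAKER control files. The helper below only builds that command line and caps the process count at the 16 CPUs the image can be checked out with. The cap, the function name, and the default `mpiexec` launcher (your MPI installation may provide a different one) are assumptions for illustration.

```python
import shlex
from typing import List

MAX_CPUS = 16  # assumed cap: the largest instance size mentioned for the image


def maker_mpi_command(n_cpus: int, launcher: str = "mpiexec") -> List[str]:
    """Build an MPI launch command for MAKER with a bounded process count."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    n = min(n_cpus, MAX_CPUS)  # never request more processes than the instance has
    return [launcher, "-n", str(n), "maker"]


cmd = maker_mpi_command(4)
print(shlex.join(cmd))  # mpiexec -n 4 maker
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the run; building the list separately keeps the CPU cap easy to test.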
MAKER-P Tutorial
https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial
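MAKER-P is configured through control files: `maker -CTL` writes templates, including `maker_opts.ctl`, whose lines have the form `key=value #comment`. The sketch below fills in selected keys while preserving the comments. The keys used (`genome`, `protein`, `est2genome`) are standard `maker_opts.ctl` options, but the abbreviated template text, the file names, and the `set_ctl_options` helper are illustrative assumptions.

```python
import re
from typing import Dict

# Matches "key=value" with an optional trailing "#comment".
CTL_LINE = re.compile(r"^(?P<key>[A-Za-z0-9_]+)=(?P<value>[^#]*?)(?P<comment>\s*#.*)?$")


def set_ctl_options(text: str, options: Dict[str, str]) -> str:
    """Return control-file text with the given keys set, comments preserved."""
    out = []
    for line in text.splitlines():
        m = CTL_LINE.match(line)
        if m and m.group("key") in options:
            comment = m.group("comment") or ""
            line = f"{m.group('key')}={options[m.group('key')]}{comment}"
        out.append(line)
    return "\n".join(out) + "\n"


# Abbreviated, illustrative excerpt of a maker_opts.ctl template.
template = (
    "#-----Genome\n"
    "genome= #genome sequence\n"
    "protein= #protein sequence file\n"
    "est2genome=0 #infer gene predictions directly from ESTs\n"
)
print(set_ctl_options(template, {"genome": "assembly.fasta", "est2genome": "1"}))
```

Keys not listed in `options` and all comment lines pass through unchanged, so the edited file can still be compared line by line against the original template.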
Documentation and Help
Additional MAKER-P Resources
• MAKER-P: http://www.yandell-lab.org/software/maker-p.html
• Repeat Library construction: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Advanced
• Pseudogene identification: http://shiulab.plantbiology.msu.edu/wiki/index.php/Protocol:Pseudogene