

Transcriptome analysis of drought-tolerant CAM plants, Agave deserti and Agave tequilana

Stephen M. Gross 1,2, Jeffrey A. Martin 1,2, June Simpson 3, Zhong Wang 1,2, and Axel Visel 1,2

1. DOE Joint Genome Institute, Walnut Creek, CA
2. Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA
3. CINVESTAV, Irapuato, MX

ABSTRACT

Agaves are succulent monocotyledonous plants native to hot and arid environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and because of existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and as models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high-quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, from short-read RNA-seq data. Our analyses support the completeness and accuracy of the de novo transcriptome assemblies, with each species having approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, we use RNA-seq data to gain insights into biological functions along the A. deserti juvenile leaf proximal-distal axis. Our work presents a foundation for further investigation of agave biology and the improvement of agaves for bioenergy development.

[Figure 4 graphic: Venn diagrams. BLAST one-to-one RBH protein comparison of A. deserti and A. tequilana: 14,709 shared, 20,377 A. deserti only, 20,161 A. tequilana only. OrthoMCL protein family comparison of A. deserti and A. tequilana: 13,388 shared, 2835 A. deserti only, 2948 A. tequilana only. Plant OG totals per species: B. distachyon 16,618; O. sativa 19,643; S. bicolor 18,218; Z. mays 20,681; A. deserti 16,223; A. tequilana 16,336.]

[Figure 5 graphic. Leaf photograph with sections 1 to 4 marked from proximal (base) to distal (tip). K-means cluster plots of z-scaled RPKM across sections 1 to 4, grouped under proximal, medial, and distal expression; 79,645 of 88,718 loci clustered: Cluster A, 22,249 loci; Cluster B, 12,063 loci; Cluster C, 11,789 loci; Cluster D, 7,539 loci; Cluster E, 17,579 loci; Cluster F, 8,426 loci. Enriched GO terms shown: photosynthesis; regulation of gene expression; translation; cellular protein modification; DNA metabolism; vesicle-mediated transport. Heatmaps of relative composite RPKM (0.0 to 1.0, normalized to the section with highest expression) across sections 1 to 4. Heatmap rows: transcription factors (KNOX Class I, KNOX Class II, MADS-box, GRAS, YABBY, MYB, bHLH, Zn finger); hormones (auxin biosynthesis, auxin transport, CK biosynthesis, GA biosynthesis, ETH biosynthesis, BR biosynthesis, ABA biosynthesis); cell wall and stomata development (cell wall biosynthesis, cellulose biosynthesis, lignin biosynthesis, stomata development, cutin and suberin biosynthesis); photosynthesis (antenna proteins, photosystem II, photosystem I, cytochrome b6f and ATP synthase, Calvin cycle, C4 dicarboxylic acid cycles, CAM dark, CAM light, chlorophyll biosynthesis).]

[Figure 2 graphic. Photographs labeled Agave tequilana and Agave deserti. Unique 25-mer plot: x-axis, number of reads (2^10 to 2^30); y-axis, probability of observing a unique 25-mer; series, A. deserti and A. tequilana. GC content plots, one per species: x-axis, contig GC content (0.0 to 1.0); y-axis, density; series, agave contigs and removed contigs. Transcript length plot: x-axis, transcript length (nt, 100 to 10,000); y-axis, density; series, A. deserti and A. tequilana. Locus RPKM plots, one per species: x-axis, locus RPKM (1 to 10,000); y-axis, density; series, coding and non-coding loci.]

[Figure 3 graphic. Bar charts of the number of transcripts per alignment class (subject contig contains query sequence; query sequence contains subject contig; sequences overlap; unaligned), with alignments marked as having indels or no indels. PacBio subreads || Rnnotator, n = 4767 (bar labels 2683, 862, 1221, 1). GenBank || Rnnotator, n = 82 (bar labels 1, 38, 14, 29). McKain et al. || Rnnotator, n = 12,972 (bar labels 6578, 2710, 3560, 124). A. deserti || A. tequilana, n = 128,959 (bar labels 32,231, 37,241, 52,109, 7,378). A. tequilana || A. deserti, n = 204,530 (bar labels 28,627, 44,443, 107,821, 23,649). Histograms: x-axis, fraction of each transcript aligning to a transcript of the other species (0.0 to 0.9); y-axis, number of transcripts (thousands).]

AGAVE TRANSCRIPTOME ASSEMBLIES FROM DEEP RNA-seq

COMPARISON OF AGAVE DE NOVO ASSEMBLIES

FIGURE 3: Comparison of the de novo Agave transcriptome assemblies

(A) Comparisons of the A. tequilana de novo Rnnotator assembly to error-corrected Pacific Biosciences subreads, 82 GenBank A. tequilana sequences, and an additional A. tequilana dataset from McKain et al. 2012 [4].

(B) Comparisons between the A. tequilana and A. deserti de novo Rnnotator assemblies.

(C) Histograms of the fraction of aligned sequence lengths between A. deserti and A. tequilana.

The symbol || separates the query sequence dataset from the subject sequence dataset. The total number of sequences (n) is noted in each bar chart; the number of sequences in each alignment class is noted above its bar.
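The alignment classes in Figure 3 can be illustrated with a minimal Python sketch. The poster does not state how containment and overlap were decided, so the coverage threshold `min_cov`, the `Alignment` record, and the rule order below are all assumptions made for illustration, not the method used for the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """Best alignment of one query to one subject (0-based, end-exclusive).

    Hypothetical record type; field names are not taken from any tool.
    """
    q_start: int
    q_end: int
    q_len: int
    s_start: int
    s_end: int
    s_len: int


def classify(aln: Optional[Alignment], min_cov: float = 0.95) -> str:
    """Assign a query/subject pair to one of four alignment classes.

    min_cov is an assumed cutoff: the fraction of a sequence that must
    be covered by the alignment for it to count as "contained".
    """
    if aln is None:
        return "unaligned"
    q_cov = (aln.q_end - aln.q_start) / aln.q_len
    s_cov = (aln.s_end - aln.s_start) / aln.s_len
    if q_cov >= min_cov:
        # Nearly all of the query lies inside the subject.
        return "subject contains query"
    if s_cov >= min_cov:
        # Nearly all of the subject lies inside the query.
        return "query contains subject"
    return "sequences overlap"


# A 100-nt query fully aligned inside a 500-nt subject contig.
print(classify(Alignment(0, 100, 100, 50, 150, 500)))
```

When both sequences are almost fully covered, this sketch reports "subject contains query" because that rule is checked first; a real analysis would need an explicit choice for that case.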

FIGURE 4: Proteomic comparison of agaves to other plant species

(A) Venn diagram of BLASTP-based one-to-one reciprocal best hit proteins shared between A. deserti and A. tequilana.

(B) Venn diagram of OrthoMCL-defined protein families shared between agaves.

(C) Edwards-Venn diagram of OrthoMCL-defined plant orthologous-group protein families (Plant OGs) shared between agaves and four additional monocotyledonous plant species. The shape and color used for each species are shown at the right, with the total number of Plant OGs within each species.
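The one-to-one reciprocal best hit (RBH) comparison in panel A can be sketched as follows. This is a minimal illustration, assuming BLASTP results have already been reduced to (query, subject, bit score) tuples for each search direction; the function names, the tie-breaking rule, and the protein identifiers are hypothetical and are not taken from the poster.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    Ties keep the first hit seen, which is an arbitrary assumption.
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Made-up identifiers: d* for A. deserti proteins, t* for A. tequilana proteins.
deserti_vs_tequilana = [("d1", "t1", 250.0), ("d1", "t2", 90.0), ("d2", "t2", 120.0)]
tequilana_vs_deserti = [("t1", "d1", 245.0), ("t2", "d1", 95.0), ("t2", "d2", 80.0)]
print(reciprocal_best_hits(deserti_vs_tequilana, tequilana_vs_deserti))
```

In this toy input, d2's best hit is t2, but t2's best hit is d1, so only the d1/t1 pair is reciprocal.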

FIGURE 5: Transcriptomic analysis of the A. deserti leaf proximal-distal axis.

(A) One of the A. deserti leaves used for analysis, indicating proximal-distal (PD) sections 1–4.

(B) Six major K-means clusters of gene expression along the PD axis. Clusters are manually grouped by highest expression in proximal, medial, or distal tissues. Blue lines connect mean z-scaled RPKM values; shaded areas represent the 25th and 75th percentiles; red lines indicate the standard error at each mean. Green text beneath each cluster gives the description of the most significantly enriched GO term in that cluster.

(C, D) Heatmaps of composite gene expression for indicated biological processes along the leaf PD axis.
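The two expression scalings named in Figure 5 can be sketched in Python. This is a minimal illustration under stated assumptions: z-scaling is taken to mean centering and scaling each locus across the four leaf sections (using the population standard deviation), and the "composite" value for a biological process is taken to be the per-section sum of RPKM over its loci. Neither detail is specified on the poster, and the numbers below are made up.

```python
from statistics import mean, pstdev


def z_scale(rpkm):
    """Center and scale one locus's RPKM values across leaf sections."""
    mu = mean(rpkm)
    sd = pstdev(rpkm)  # population SD; an assumption of this sketch
    if sd == 0:
        # A locus with constant expression carries no profile shape.
        return [0.0] * len(rpkm)
    return [(x - mu) / sd for x in rpkm]


def relative_composite(loci_rpkm):
    """Sum RPKM per section over all loci, then divide by the largest section total."""
    totals = [sum(section) for section in zip(*loci_rpkm)]
    top = max(totals)
    return [t / top if top > 0 else 0.0 for t in totals]


# Hypothetical RPKM values for two loci across sections 1 to 4.
process_loci = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
print(z_scale(process_loci[0]))
print(relative_composite(process_loci))
```

The z-scaled profiles would be the input to K-means clustering, while the relative composite values correspond to the 0.0 to 1.0 heatmap scale.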

PROTEOMIC ANALYSES SUPPORT COMPREHENSIVE AGAVE TRANSCRIPTOME ASSEMBLIES

PROFILING OF THE A. DESERTI LEAF HIGHLIGHTS REGIONS CRITICAL TO DEVELOPMENT AND PHOTOSYNTHESIS

FIGURE 2: A. tequilana, A. deserti, and their respective transcriptomes

(A) Cultivated A. tequilana in Jalisco, Mexico.

(B) A. deserti (foreground) in natural habitat, Riverside County, California, USA.

(C) Plot of the fraction of unique 25-mers over indicated read depth (log2 scale).

(D) Density plot of GC content of agave transcript contigs vs. contigs from contamination and commensal organisms.

(E) Density plots of A. deserti and A. tequilana transcript lengths. Note log10 scale. Peaks at 150 and 250 nt represent single reads or paired-end reads, respectively, that were not assembled into larger contigs.

(F) Density plot of locus RPKM values for coding (dark shading) and non-coding (light shading) loci.
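For readers unfamiliar with the unit in panel F, RPKM (reads per kilobase of transcript per million mapped reads) can be computed as in the sketch below. The formula is the standard definition; the function name, argument names, and example numbers are illustrative assumptions, and the poster does not describe how read counts were obtained.

```python
def rpkm(read_count: int, locus_length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (locus_length_nt * total_mapped_reads)
    """
    if locus_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (locus_length_nt * total_mapped_reads)


# Made-up example: 500 reads on a 2,000-nt locus in a 10-million-read library.
print(rpkm(500, 2000, 10_000_000))
```

Dividing by locus length and by library size is what makes values comparable across loci of different lengths and across samples of different sequencing depth.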

OVERVIEW OF AGAVE TRANSCRIPTOME ASSEMBLIES

Species                    A. tequilana    A. deserti
Total sequencing           293.5 Gbp       184.7 Gbp
No. of loci                139,525         88,718
No. transcript contigs     204,530         128,869
N50 length                 1387 bp         1323 bp
Sum assembled length       204.9 Mbp       125.0 Mbp
No. protein-coding loci    34,870          35,086
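The N50 length reported in the overview table is a standard contiguity statistic: the contig length at which contigs of that length or longer account for at least half of the total assembled length. A minimal Python sketch follows; the function name and the toy contig lengths are illustrative and are not taken from the assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy example with eight contigs totalling 32 nt.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because short unassembled fragments add little to the total length, N50 is less sensitive to them than the mean contig length is.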

CAM PHOTOSYNTHESIS, ARID ENVIRONMENTS, AND BIOENERGY

Agave species are adapted to their native habitat in arid regions of Mexico and the United States. Agave thus holds promise as a biofuel feedstock [1,2], capable of growing on marginal lands where other proposed bioenergy plants cannot. The ability of agaves to withstand hot and arid conditions relies upon crassulacean acid metabolism (CAM), a specialized form of photosynthesis that allows agaves to keep leaf stomata (pores) closed during the hot day, minimizing water loss through evapotranspiration.

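[Figure 1 graphic. Photograph of agave; map with legend entries "Agave" and "Semi-arid regions"; CAM schematic with NIGHT and DAY panels showing CO2, C3, and C4 molecules, the vacuole, the chloroplast, light, the Calvin cycle, and sugar.]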

FIGURE 1: Agaves and CAM biology

(A) Agave tequilana cultivated in Mexico.

(B) Semi-arid regions of the United States (brown) are unsuitable for cultivation of other bioenergy plants, which require more temperate regions (green). Most Agave species are adapted to semi-arid regions in Mexico and the extreme southwestern USA (purple).

(C) Crassulacean Acid Metabolism (CAM). CO2 enters plant cells at night, joins with a 3-carbon molecule (C3), and is stored in the vacuole as a 4-carbon molecule (C4). During the day, C4 molecules diffuse out of the vacuole, and CO2 is released and assimilated into sugar in the chloroplast.

Comparison of inputs (water and nitrogen) and outputs (biomass and ethanol) of agaves and other biofuel feedstock species. Though agaves are harvested at several years of age, their annualized growth rate is on par with Miscanthus. Table is modified from reference [2].

                  Inputs                                          Outputs
Feedstock         Water        Drought      Nitrogen              Dry biomass         Ethanol
                  (cm yr-1)    tolerance    (kg ha-1 yr-1)        (Mg ha-1 yr-1)      (liters yr-1)
Corn grain        50–80        low          90–120                7–10                2900
Corn stover       (as grain)   (as grain)   (as grain)            3–6                 900
Miscanthus        75–120       low          0–15                  15–40               4600–12,400
Poplar coppice    70–105       moderate     0–50                  5–11                1500–3400
Agave spp.        30–80        high         0–12                  10–34               3000–10,500

To provide sequence resources for the Agave research community, we built de novo transcriptomes of Agave tequilana and Agave deserti from deep Illumina RNA-seq data. Sequences were assembled by Rnnotator [3], a de novo transcriptome assembly pipeline.

Analysis of assembled contigs suggests the Agave de novo assemblies are comprehensive and accurate.

Proteome comparisons between Agave species and additional monocot species suggest the majority of Agave proteins are conserved across taxa. We can also identify protein families specific to agaves.

Agaves spend the majority of their lives as compact rosettes; leaves are therefore important organs in which to study Agave developmental and bioenergetic processes.

ACKNOWLEDGEMENTS AND CITATIONS

This work, performed at the U.S. Department of Energy Joint Genome Institute, was supported in part by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH112.

[1] Davis, A. S. et al. The global potential for Agave as a biofuel feedstock. GCB Bioenergy 3, 68–78 (2011).
[2] Somerville, C. et al. Feedstocks for lignocellulosic biofuels. Science 329, 790–792 (2010).
[3] Martin, J. et al. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-seq reads. BMC Genomics 11, 663 (2010).
[4] McKain, M. et al. Phylogenomic analysis of transcriptome data elucidates co-occurrence of a paleopolyploid event and the origin of bimodal karyotypes in Agavoideae (Asparagaceae). Am J Bot 99(2), 397–406 (2012).