split photosystem protein, linear-mapping …mass/charge properties of intact psaa and atpb was...

16
Article Split Photosystem Protein, Linear-Mapping Topology, and Growth of Structural Complexity in the Plastid Genome of Chromera velia Jan Janous ˇkovec,* , y,1 Roman Sobotka, y,2,3 De-Hua Lai, y, z,4 Pavel Flegontov, 4 Peter Konı ´k, 3 Josef Komenda, 2,3 Shahjahan Ali, 5 Ondr ˇej Pra ´s ˇil, 2,3 Arnab Pain, 6 Miroslav Obornı ´k, 2,3,4 Julius Lukes ˇ, 3,4 and Patrick J. Keeling* ,1 1 Department of Botany, University of British Columbia, Vancouver, British Columbia, Canada 2 Institute of Microbiology, Czech Academy of Sciences, Tr ˇebon ˇ, Czech Republic 3 Faculty of Science, University of South Bohemia, C ˇ eske ´ Bude ˇjovice, Czech Republic 4 Biology Centre, Institute of Parasitology, Czech Academy of Sciences, C ˇ eske ´ Bude ˇjovice, Czech Republic 5 Biosciences Core Laboratory-Genomics, King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia 6 Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Kingdom of Saudi Arabia y These authors contributed equally to this work. z Present address: School of Life Sciences, Sun Yat-Sen University, Guangzhou, China *Corresponding author: E-mail: [email protected]; [email protected]. Associate editor: Charles Delwiche Abstract The canonical photosynthetic plastid genomes consist of a single circular-mapping chromosome that encodes a highly conserved protein core, involved in photosynthesis and ATP generation. Here, we demonstrate that the plastid genome of the photosynthetic relative of apicomplexans, Chromera velia, departs from this view in several unique ways. Core photosynthesis proteins PsaA and AtpB have been broken into two fragments, which we show are independently transcribed, oligoU-tailed, translated, and assembled into functional photosystem I and ATP synthase complexes. Genome-wide transcription profiles support expression of many other highly modified proteins, including several that contain extensions amounting to hundreds of amino acids in length. Canonical gene clusters and operons have been fragmented and reshuffled into novel putative transcriptional units. Massive genomic coverage by paired-end reads, coupled with pulsed-field gel electrophoresis and polymerase chain reaction, consistently indicate that the C. velia plastid genome is linear-mapping, a unique state among all plastids. Abundant intragenomic duplication probably mediated by recombination can explain protein splits, extensions, and genome linearization and is perhaps the key driving force behind the many features that defy the conventional ways of plastid genome architecture and function. Key words: plastid genome evolution, Chromera velia, split protein, linear-mapping genome. Introduction Chromera velia is an autotrophic alveolate that was discov- ered during a survey of zooxanthellae in Australian coral reefs (Moore et al. 2008). The dominant reef symbionts are dino- flagellates from the genus Symbiodinium, but C. velia was found to be related to the sister group of dinoflagellates, the apicomplexan parasites (Moore et al. 2008; Obornı ´k et al. 2009). Because apicomplexans include a number of medically and economically important pathogens (e.g., the malaria parasite Plasmodium, as well as Toxoplasma, Cryptosporidium, and Babesia), and because of the interest in the cryptic, non-photosynthetic plastid that has now been found in many of these apicomplexans, the nature of the plastid in C. velia was of immediate interest. Accordingly, the complete C. velia plastid genome was characterized and has proven critical in elucidating the origin of the apicom- plexan plastid and its relationship to that of dinoflagellates (Janous ˇkovec et al. 2010), and other plastid-related metabolic pathways have also already been compared with those of apicomplexans (Botte ´ et al. 2011; Kor ˇeny ´ et al. 2011). Although these questions have certainly directed much attention to C. velia and its plastid in particular, they have also overshadowed the intrinsic interest in this organism. Aside from being a key to understanding apicomplexan plastids, C. velia is itself potentially interesting and important both ecologically and evolutionarily (Janous ˇkovec et al. 2012). This is because C. velia is one of the few known photosyn- thetic lineages in the alveolates (the others being Vitrella and dinoflagellates; Obornı ´k et al. 2012), so its plastid represents new breadth in the study of plastid diversity never before ß The Author 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] Mol. Biol. Evol. 30(11):2447–2462 doi:10.1093/molbev/mst144 Advance Access publication August 22, 2013 2447 at The University of British Colombia Library on February 17, 2014 http://mbe.oxfordjournals.org/ Downloaded from

Upload: others

Post on 29-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

Article

Split Photosystem Protein Linear-Mapping Topology andGrowth of Structural Complexity in the Plastid Genome ofChromera velia

Jan Janouskovecy1 Roman Sobotkay23 De-Hua Laiyz4 Pavel Flegontov4 Peter Konık3

Josef Komenda23 Shahjahan Ali5 Ondrej Prasil23 Arnab Pain6 Miroslav Obornık234 Julius Lukes34 andPatrick J Keeling1

1Department of Botany University of British Columbia Vancouver British Columbia Canada2Institute of Microbiology Czech Academy of Sciences Trebon Czech Republic3Faculty of Science University of South Bohemia Ceske Budejovice Czech Republic4Biology Centre Institute of Parasitology Czech Academy of Sciences Ceske Budejovice Czech Republic5Biosciences Core Laboratory-Genomics King Abdullah University of Science and Technology (KAUST) ThuwalKingdom of Saudi Arabia6Computational Bioscience Research Center (CBRC) King Abdullah University of Science and Technology (KAUST) ThuwalKingdom of Saudi ArabiayThese authors contributed equally to this workzPresent address School of Life Sciences Sun Yat-Sen University Guangzhou China

Corresponding author E-mail janjanczgmailcom pkeelingmailubcca

Associate editor Charles Delwiche

Abstract

The canonical photosynthetic plastid genomes consist of a single circular-mapping chromosome that encodes a highlyconserved protein core involved in photosynthesis and ATP generation Here we demonstrate that the plastid genomeof the photosynthetic relative of apicomplexans Chromera velia departs from this view in several unique ways Corephotosynthesis proteins PsaA and AtpB have been broken into two fragments which we show are independentlytranscribed oligoU-tailed translated and assembled into functional photosystem I and ATP synthase complexesGenome-wide transcription profiles support expression of many other highly modified proteins including several thatcontain extensions amounting to hundreds of amino acids in length Canonical gene clusters and operons have beenfragmented and reshuffled into novel putative transcriptional units Massive genomic coverage by paired-end readscoupled with pulsed-field gel electrophoresis and polymerase chain reaction consistently indicate that the C velia plastidgenome is linear-mapping a unique state among all plastids Abundant intragenomic duplication probably mediated byrecombination can explain protein splits extensions and genome linearization and is perhaps the key driving forcebehind the many features that defy the conventional ways of plastid genome architecture and function

Key words plastid genome evolution Chromera velia split protein linear-mapping genome

IntroductionChromera velia is an autotrophic alveolate that was discov-ered during a survey of zooxanthellae in Australian coral reefs(Moore et al 2008) The dominant reef symbionts are dino-flagellates from the genus Symbiodinium but C velia wasfound to be related to the sister group of dinoflagellatesthe apicomplexan parasites (Moore et al 2008 Obornıket al 2009) Because apicomplexans include a number ofmedically and economically important pathogens (egthe malaria parasite Plasmodium as well as ToxoplasmaCryptosporidium and Babesia) and because of the interestin the cryptic non-photosynthetic plastid that has now beenfound in many of these apicomplexans the nature of theplastid in C velia was of immediate interest Accordinglythe complete C velia plastid genome was characterized and

has proven critical in elucidating the origin of the apicom-plexan plastid and its relationship to that of dinoflagellates(Janouskovec et al 2010) and other plastid-related metabolicpathways have also already been compared with those ofapicomplexans (Botte et al 2011 Koreny et al 2011)

Although these questions have certainly directed muchattention to C velia and its plastid in particular they havealso overshadowed the intrinsic interest in this organismAside from being a key to understanding apicomplexanplastids C velia is itself potentially interesting and importantboth ecologically and evolutionarily (Janouskovec et al 2012)This is because C velia is one of the few known photosyn-thetic lineages in the alveolates (the others being Vitrella anddinoflagellates Obornık et al 2012) so its plastid representsnew breadth in the study of plastid diversity never before

The Author 2013 Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution All rights reserved For permissions pleasee-mail journalspermissionsoupcom

Mol Biol Evol 30(11)2447ndash2462 doi101093molbevmst144 Advance Access publication August 22 2013 2447

at The U

niversity of British C

olombia L

ibrary on February 17 2014httpm

beoxfordjournalsorgD

ownloaded from

accessible for comparison with other plastids Indeed theinitial description of the C velia plastid genome noted severalunusual features with implications for plastid evolution andfunction (Janouskovec et al 2010) but because they weretypically not shared with either apicomplexans or dinoflagel-lates (and therefore not comparable to them) they have notbeen characterized further

Here we describe three intriguing features of the C veliaplastid genome the presence of split genes encoding splitproteins divergent characteristics of gene organization andexpression and the physical structure of the chromosomeWe show that two functionally important proteins involvedin photosynthesis PsaA representing a core subunit of pho-tosystem I (PSI) and AtpB representing the beta subunit ofthe ATP synthase are expressed in two discrete fragments atboth the RNA and protein levels which has interesting im-plications for the structure and function of these otherwisehighly conserved proteins We also show that the plastidchromosome is highly divergent in structure with a pro-nounced strand polarity altered gene order and large exten-sions in many coding sequences which appear transcribedFinally we significantly expand the depth of sequence cover-age at the DNA level and show that the coverage pattern atthe chromosome ends polymerase chain reaction (PCR)experiments genome migration on pulse field gels and pres-ence of two long terminal inverted repeats (TIRs) all suggestthat the genome is linear in structure This would representthe first documented case of a linear-mapping plastidgenome These unique characteristics substantially expandour current understanding of plastid genome diversitymuch of which we hypothesize is due to high levels of intra-genomic duplication in this lineage

Results

Fragmented Genes Encode Fragmented ProteinsExpressed from Polycistronic mRNAs

The complete C velia plastid genome contained a number ofsmall fragments of genes including psaA atpB psaB psbBrpl3 and tufA In most of these cases intact homologs werealso present but in psaA and atpB only two fragments werefound that were widely separated in the genome and thattogether would account for the entire gene (psaA-1 psaA-2atpB-1 and atpB-2) Both gene products are critical for pho-tosynthesis and ATP generation suggesting three possibleexplanations all of which are unusual the plastid fragmentsare pseudogenes and intact proteins are imported from thecytosol the plastid gene products are trans-spliced at theRNA or protein levels or the proteins function as uniquetwo-subunit forms

Expression of psaA-1 and psaA-2 fragments was examinedby transcript mapping by circularizing mRNAs and Northernanalysis (see Materials and Methods) Reverse transcriptaseRT-PCR on circularized mRNA produced a product from eachof the fragments comprising the coding region flanking se-quence and a short 6ndash12 nucleotide-long oligouridylylated(oligoU) tail (fig 1) The consistent failure to connect thefragments by RT-PCR using multiple primer sets suggested

they are not spliced at the RNA level Both psaA-1 and psaA-2are surrounded by several genes on the same strand so theirco-ordinated expression was analyzed by hybridization toprobes corresponding to psaA-1 psaA-2 the upstreamgenes and the downstream noncoding region All probeshybridized with a fragment of the expected size of a stand-alone oligoU mRNA corresponding to the fragment in ques-tion no evidence for a spliced RNA form was found Howeverprobes did hybridize to larger fragments (fig 1 and supple-mentary fig S1 Supplementary Material online) correspond-ing in size and hybridization pattern to a single dicistronic(psaA-1 + psbB) or multiple large polycistronic mRNAs(psaA-2 + psbE + psaB and surrounding genes) Linkagesbetween psaA-1 and psbB and psaA-2 and psaB were con-firmed by RT-PCR Hybridization patterns around psaA-2suggested processing of large mRNAs into the dicistronicpsaA-2 + psbE and subsequently to single gene transcripts(fig 1) Similar to psaA all attempts to link the two atpBfragments by RT-PCR yielded no products The Northernanalysis of atpB-1 and atpB-2 revealed single bands corre-sponding in size to each expected fragment and no evidencewas found for a spliced mRNA again suggesting independentexpression (fig 2A) The transcriptional profiles of both atpBfragments were supportive of this conclusion (fig 2B) All fourpsaA and atpB fragments contained structurally conserveddomains (figs 1 and 2A) and were among the top 22 mostabundantly transcribed plastid genes (table 1 and see later)Evidence at the genomic and transcriptomic levels thereforeconsistently suggested that all psaA and atpB fragments areindependently transcribed translated and code for functionalproducts

To obtain convincing evidence that psaA and atpB genefragments are expressed as separate polypeptides and that nointact version of the proteins is present we analyzed mem-brane protein complexes by combination of two-dimensional(2D) electrophoresis and mass spectrometry (MS)Membrane fraction isolated from C velia cells was solubilizedby dodecyl-b-maltoside and protein complexes were sepa-rated on Clear-Native gel in the first dimension and onsodium dodecyl sulfate (SDS) gel in the second dimension(fig 3) The most abundant spots were subjected to MS anal-ysis of their tryptic peptides which were correlated with pre-dicted peptides of plastid-encoded genes and availableexpressed sequence tag (EST) sequences This allowed us todistinguish photosystem II (PSII) cytochrome b6f and PSIand PSII supercomplexes with bound antennas (fig 3)Fragments corresponding to PsaA-1 PsaA-2 AtpB-1 andAtpB-2 were all identified but no spot with the expectedmasscharge properties of intact PsaA and AtpB was found(see supplementary tables S1 and S2 Supplementary Materialonline for a list of peptides assigned to PsaAB and AtpABproteins) Identification of spots was consistent with chloro-phyll fluorescence detected in the native gel All three PSIIcomplexes exhibited strong fluorescence which contrastedwith the minimal fluorescence of chlorophyll bound withinthe PSI supercomplex The minimal fluorescence emissionfrom the PSI complex indicated it was well preserved beforeit was separated into subunits in the second dimension

2448

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

658349813638

2604

1903

1383

955

623

281

285 kb

105

104

103

102

10

1Tran

scrip

t cov

erag

e pe

r bp

(Lo

g)

psaBpsaA -2psbBpsaA -1

psbEpsbB ψ psaA ψ

acsF

trnG

18 kb[UUUUU]

[UUUUU]095 kb

16 kb

fro frofro

[UUUUU]125 kb

035 kb

POLYCISTRONICTRANSCRIPTS

PROCESSEDTRANSCRIPTS

GENOME

PROTEINS psaA-1 psaA-2divergent conserved

65834981

36382604

1903

1383

955

623

281

psaA-1 psbB psaA-2 psbE

N N C

Cthylakoid stroma

plastid matrix

thylakoid membrane

M

M

D D

P PD

M

M

1 2 9876143 10 11

FIG 1 Expression model for the split PsaA PsaA fragments are separated at the genomic transcriptomic and protein level (top to bottom) The topgraph shows the transcriptional profile (total RNA in black upper line and polyA RNA in gray lower line) of the genomic region below In the genomicregion boxes represent genes and gray arrows show the coding DNA strand Consistent with the transcriptional profile Northern hybridization blots(boxes below) reveal that both psaA fragments are transcribed within larger polycistronic transcripts (P) and further processed into dicistronic (D) andmonocistronic (M) units The monocistronic psaA transcripts contain oligoU tails and translate into independent peptides both of which participate inPSI (thylakoid trans-membrane domains are shown by vertical boxes and numbered)

FIG 2 Expression model for the split AtpB (A) Northern analysis revealed a monocistronic transcript for each of the two atpB fragments(B) Transcriptome coverage of the two genomic regions is shown for total RNA (black upper line) and polyA RNA (gray lower line) Genes areshown by boxes and gray arrows indicate coding strands Black arrows below indicate predicted transcripts based on the combination of transcriptmapping and Northern analysis (C) Alignment of plastid cyanobacterial and yeast mitochondrial AtpBs reveals that the split of Chromera velia AtpBoccurred within a 24 amino acid insertion that is also present in the sister taxon Vitrella (the most probable positions of the split are boxed)

2449

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

(fig 3) This together with the lack of evidence for intactgenes or transcripts for either gene suggested strongly thatthe two separate subunits of both PsaA and AtpB proteins areassembled into functional PSI and ATP synthase respectively

Modeling the two-subunit forms shows that the pointat which both genes were split corresponds to a loopspanning structural domains (fig 1) In the case of PsaAthe position of this breakpoint corresponds to the largestloop in the protein which separates the first two pairs ofperipheral helices from the rest of the protein including thephotochemistry-performing core (fig 1 Jordan et al 2001Green 2003) The split has also driven a considerablechange in divergence rate between the two peptides PsaA-

1 at the photosynthetic antenna periphery is significantlyless conserved than PsaA-2 at the antenna core By calculat-ing maximum likelihood-corrected genetic distances (seeMaterials and Methods) we estimated that PsaA-1 is about46 times more divergent than PsaA-2 relative to other plastidhomologs suggesting that it is not surprisingly under weakerfunctional constraint Maximum likelihood phylogeniesusing a wide sampling of plastid and bacterial homologs(see Materials and Methods) also excluded a secondarybacterial origin for PsaA-1 as suggested previously (Mazoret al 2012) although fast-evolving both C velia PsaA-1 andPsaA-2 were most closely related to the sister taxon Vitrellaand other plastids

Table 1 Total RNA Transcript Mapping to Chromera velia Plastid Genes

Region NumberGene Length (kb) Covbpa Rel Cov vs Medb

Intergenic regionsc 105 224 3444

Genesc 107 941 33990

Ribosomal RNA 3 48 348584 4753

Transfer RNA 29 24 699 10

Protein-coding 75 869 12033 164

Photosystem II (psb) 11 66 52244 712

Photosystem I (psa) 4 57 22740 310

Cytochrome b6f (pet) 4 21 18301 250

ATP synthase (atp) 6 51 14762 201

Ribosomal proteins large subunit (rrl) 10 54 1141 16

Ribosomal proteins small subunit (rrs) 12 168 1328 18

Thylakoid import (sec tat) 3 45 817 11

RNA polymerase (rpo) 4 154 293 04

Other function (acsF ccsA clpC tufA ycf3) 7 119 3617 49

Unknown function (orf) 14 135 1339 18

Highly expressed protein-coding genes psbA 10 383384 5228

psaA-2 12 48594 663

psbE 03 41694 569

atpH1 02 41540 566

psbD 10 37785 515

psbC 14 32123 438

petD 05 27882 380

psbB 16 23102 315

psbV 05 20686 282

psaC 02 19422 265

atpA 17 17239 235

petA 09 16957 231

atpB-1 05 15992 218

psbH 03 15653 213

petB 06 15341 209

psaB 34 14407 196

tufA 13 13235 180

petG 01 13026 178

atpB-2 15 10950 149

psbK 01 9339 127

psaA-1 08 8538 116

psbJ 01 8027 109

NOTEmdashGenes encoding split proteins are underlinedaAverage coverage per base pairbRelative coverage compared with median gene coverage depth per base pair (=7334)cGeneintergenic regions in the TIRs were counted once

2450

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

When compared with PsaA the AtpB split is suggestiveof a different but equally interesting process The regionsurrounding the split is conserved in plastid mitochondrialand bacterial homologs but interrupted by a 24 amino acid(aa) insertion in Vitrella C veliarsquos closest photosyntheticrelative (fig 2C) A detailed sequence alignment of thisregion shows that the split in C velia AtpB occurred withinthis shared insertion suggesting that the ancestral acquisitionof this insertion might have provided an evolutionary oppor-tunity for the split A 31 aa long insertion at the same site is

also present in the AtpB of the related dinoflagellateAmphidinium (data not shown) Although unrelated insequence the notorious divergence of dinoflagellate plastidproteins and their greater distance to C velia and Vitrellacould easily account for the insertion divergence pointingto a possibly even deeper origin

Many Other Plastid Open Reading Frames areExtraordinary and Transcribed

PsaA and AtpB splits stand out as unprecedented cases butrepresent only a fraction of peculiarities in C velia plastidopen reading frames (ORFs) The genome contains manyunusually modified ORFs including several genes that areexpected to be present in plastid genomes but which areunusually large due to the presence of long extensions thatbear no recognizable similarity to known genes In three genes(rps7 8 and 17) the ORF region has been extended by 126 to273 amino acids toward the C-terminus effectively triplingthem in size (supplementary fig S2 SupplementaryMaterial online) In four other genes (rps3 4 8 and 11) theregion sharing homology with other plastid proteins is foundat the end of an unusually large ORF (supplementary fig S3Supplementary Material online) The most obvious exampleof this is Rps4 which is encoded within a 1998 aa long ORFcompared with 201 aa in the red alga Porphyra purpurea Inthese cases start codons may be located near the beginning ofthe homologous region but it is still puzzling why the homol-ogous region would be preceded by such a long stretch ofsequence uninterrupted by stop codons Canonical Shine-Dalgarno sequences are absent (the predicted 30-end of the16S rRNA is highly divergent in C velia) and therefore cannotsupport start codon prediction ORFs of two additional genesrpoC1 and rpoC2 contain long insertions (507 and 971 aa)in their variable regions (none of them appears to representan intein supplementary fig S4 Supplementary Materialonline) Finally a number of smaller changes are found inORFs throughout the C velia plastid genome Accountingfor extensions and indels longer than 20 aa five other ORFshave a truncated N-terminus (atpA ccsA petA ycf3 andpsaB) one ORF has a significantly extended N-terminus(rpl4) two ORFs have a truncated C-terminus (ccsA andrpoA) five ORFs have an extended C-terminus (atpA atpBatpH2 petA and rps2) and three ORFs contain interveninginsertions (rpl6 rpoB and psaB)

Altogether ORFs of various functionally unrelated proteinscontain unusual modifications Many of the proteins areindispensable and exclusively plastid-encoded in algae andplants suggesting that they are unlikely to be in the processof degeneration loss or relocation to the nucleus in C veliaAccordingly none of these proteins can be identified in theavailable nuclear transcriptomes from C velia (Woehle et al2011) To provide first insights into expression of the unusualproteins we analyzed the plastid transcriptome using totaland polyA RNA fraction sequence that we generated (polyURNA sequence was not available for this analysis) Transcriptmapping from the total RNA fraction (see Materials andMethods) revealed that all unusually long ORFs are coveredby transcriptome reads roughly equally in their entirety

FIG 3 The 2D electrophoresis of membrane protein complexes ofChromera velia and identification of individual spots Membrane pro-teins were solubilized by dodecyl-b-maltoside and separated in the firstdimension by Clear-Native electrophoresis (Clear-Native-PAGE) Afterthe separation the gel was scanned by LAS 4000 imager (Fuji Japan) inchlorophyll fluorescence mode (Chl fl) after excitation by blue LED lightto distinguish the PSII and light-harvesting complexes The proteincomplexes resolved in the first dimension were further separatedin the second dimension by denaturing gel (SDS-PAGE) and stainedby Coomassie blue Protein spots were cut from the gel and analyzedby LC-MSMS as described in Materials and Methods Identified spotsare highlighted and positions of protein complexes separated byClear-Native electrophoresis are marked as follows PSII PhotosystemII SC-PSII Supercomplexes of photosystem II with antenna SC-PSISupercomplexes of photosystem I with antenna

2451

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

(supplementary figs S2 and S3 Supplementary Materialonline) Although this approach cannot accurately pinpointthe position of start codons (many rps genes are expressedrelatively little and intergenic regions are often transcribed)the consistent coverage by transcripts suggests that the ORFsmay really have expanded in size at the protein level Much ofthe long insertions in rpo genes is also equally represented inthe transcriptome data however in this case a drop in cov-erage is found within each insertion creating a 200 bp gap(supplementary fig S4 Supplementary Material online) Thisraises the possibility that RpoC1 and RpoC2 may be splitmuch like PsaA and AtpB however both are in frame andlow coverage so more evidence is required to distinguishbetween the two alternatives

Unusual changes to gene presence have also been notedtwo genes are present in two (atpH) and three (clpC) full-length paralogs despite that a single type is found in otherplastid genomes All three clpC paralogs are highly divergentin sequence from each other and clpCs in other plastids Theyoverlap only partially and the largest and most complete ofthe paralogs (clpC3) is an order of magnitude less expressedthan the other two (supplementary fig S5A SupplementaryMaterial online) Likewise the two atpH paralogs are differentin sequence (93 nucleotide similarity) length and expres-sion AtpH1 is both highly expressed and conserved butthe expression of the duplicated atpH2 located on the TIRsis nearly 100-fold lower (supplementary fig S5BSupplementary Material online) The atpH2 gene contains

an unusual C-terminal extension which doubles it in size Acloser look at the surrounding sequence reveals that atpH2originated by a duplication event of the atpH1 region (includ-ing trnK discussed later) This together with its low expres-sion suggests that atpH2 is most likely a non-functionalpseudogene More generally however this example illustrateshow duplication can drive gene paralogy ORF expansionand altered gene order all features observed throughoutthe C velia plastid genome

Analysis of the total RNA and polyA fraction (fig 4) tran-scription profiles revealed more unexpected featuresTranscripts for all plastid genes were present in the totalRNA fraction and consistently also found in the polyA frac-tion at two to three magnitudes lower abundance probablydue to a carryover of non-polyA transcripts (fig 4) The psbAgene stood out in two ways however First it was highlyexpressed being represented at almost an order of magnitudehigher levels than other protein-coding genes (and similarlevels as the rRNA operon fig 4 and table 1) Second therepresentation of psbA in the polyA fraction was about equalto the total RNA fraction which would be the expected resultfor a polyadenylated transcript (fig 4) Only two additionalregions corresponding to rps19 and a psbB fragment (partof the annotated orf157) were relatively overrepresented inthe polyA fraction but only rps19 contained A-rich genomicsequence at the 30-end that could explain the bias Whetherthe psbA gene and psbB fragment are uniquely polyadeny-lated or a carryover into the polyA fraction occurred due to

10000 20000 30000 40000

105

104

106

103

102

10

1

atpB-2 psaA-1 psbB psaA-2 psbC psbDpsaB

position (bp)

Cov

erag

e pe

r bp

(Lo

g)

atpH2

Photosystem I + II ATPase Uknown ORFExpression Pseudogene fragmentOther functiontotal RNA polyA RNA

psbE

50000 60000 70000 80000

105

104

106

103

102

10

1

23S rRNA atpB-1 atpH1 atpI

Cov

erag

e pe

r bp

(Lo

g)

psbV K psbJ

90000 100000 110000 120000

105

104

106

103

102

10

1

psbAatpA

Cov

erag

e pe

r bp

(Lo

g)

atpH2psbHpsbT psaC petGB

petA

Cytochrome b6f

rps19

tufA

petD

16S rRNA

psbA psbB

FIG 4 Plastid total RNA and polyA RNA transcriptomic profiles Coverage by total RNA (red) and polyA RNA (blue) is shown for the whole plastidgenome (Log scale) Gene families are color-coded according to the key The highly expressed psbA gene is highlighted in bold The split psaA and atpBgenes are in red The boundaries of the TIRs are separated by dashed vertical lines

2452

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

another reason is currently not clear and awaits successfulcloning of their 30 transcript ends The transcriptomes alsoenabled us to analyze expression pattern among neigboringgenes which have been highly rearranged and strand-polar-ized in C velia compared with other plastids Canonicaloperons (atpIHGFDA psaAB petBD ribosomal proteinsupercluster ribosomal RNA operon eg Westhoff et al1983) and translationally coupled units (atpBE Gatenbyet al 1989) were also affected by rearrangements and eitherfragmented or internally reshuffled Canonical promoterscould not be identified These characteristics indicate thattranscription of many core plastid genes has been remodeledNorthern hybridization patterns in the psaA-1 and psaA-2region already revealed that this is the case in several keyproteins of the photosystems which are expressed fromnovel multicistronic mRNAs through complex processing(fig 1) Whole-genome transcriptional profiles (fig 4) confirmthis and suggest that polycistronic transcription is widespreadin the genome (compare transcript coverage at intergenicspacers and the region downstream of the 16S rRNA genefig 4) Although the primary transcript regions and processingpatterns cannot be predicted accurately the data show thatnovel polycistronic units most likely exist and often includefunctionally similar genes (compare distribution of genes forphotosynthesis and protein expression in fig 4)

Finally we estimated and compared relative expression ofall plastid genes separately and in their functional groupsbased on the total RNA transcriptome (table 1 and supple-mentary table S3 Supplementary Material online) Asexpected the rRNA genes were most highly expressedfollowed by genes of the four membrane complexes (PSIPSII cytochrome b6f and ATP synthase) and these alsocomprised all of the 22 most expressed protein-codinggenes with the single exception of tufA The least expressedwere genes related to transcription and translation RNApolymerase tRNA genes and ribosomal proteins (table 1)However unknown ORFs and most intergenic regions werealso expressed including relatively highly expressed orf389orf201 and intergenic regions downstream of psbJ andatpB-1 (fig 4) Additionally we used the transcriptome toaddress the presence of RNA editing which is known in theplastids of the related dinoflagellates (eg Dang and Green2009) but has not been found in those of apicomplexans andVitrella The consensus of transcriptomic reads at each sitewas identical to the C velia plastid genome sequence suggest-ing the absence of editing Only four sites showed significantpolymorphism (gt10) at RNA level (sites 44346 [515]46071 [2366] 46074 [3935] and 46574 [2379]) all ofwhich were located within the rRNA operon and identical(998ndash100) at the DNA level Based on their location weconclude that this is more likely due to reverse transcriptionerrors of modified rRNA nucleosides and not RNA editing

A Linear-Mapping Plastid Genome in C velia

One of the most significant features of the previouslypublished plastid genome from C velia is that it does notmap as a circle as do all other plastid genomes completed todate The gap in the sequence falls between the two copies of

the TIR containing psbA (Janouskovec et al 2010) Weattempted to close this gap using multiple approaches Westarted by massively increasing the depth of sequence cover-age The genome was originally sequenced by 454 pyrose-quencing to a 13-fold depth of coverage so we used anindependent approach and assembled the genome frompaired-end Illumina sequence reads to an average depth ofalmost 600-fold This approach extended the two TIRs by300 bp on each side which was verified by PCR (supplemen-tary fig S6 Supplementary Material online) The very end ofeach TIR was followed by a 15 bp sequence further followedby an extension into the complementary TIR at a very lowdepth of coverage (fig 5) These extensions created an overlapbetween the ends however several lines of evidencesuggested that this is not due to a circular chromosomebut rather a linear topology with structured ends Firstmapping the depth of coverage revealed a pattern expectedfor a linear molecule The depth of coverage was generallyconsistent across the large single copy (LSC) region and mostof the TIRs but reduced linearly beginning at about 600 bpfrom the end followed by an exponential reduction andpartial recovery between 300 to 50 bp from the end followedby a terminal drop (fig 5A) The size and shape of theexponential reductionrecovery would be predicted by themean DNA fragment size in the sequencing library (327 bp)and linear topology as a consequence of fragmentation biasnear the chromosome end Therefore both the overalldecrease in coverage depth and the exponential decreaserecovery close to the ends support a linear chromosomeconformation (if it were circular the whole region shouldnot differ from the rest of the genome) Second if thegenome was circular-mapping and the gap was hard tosequence we would expect some paired ends to span thegap However when more than 650000 paired ends aremapped to the genome not a single paired end spanningthe gap was identified (see Materials and Methods) Insteadboth ends of the chromosome consisted exclusively ofreverse-mapping reads (fig 5A and B) Third PCR experi-ments using six primers (all shown to successfully amplifyproducts from one side of the gap) and all their nested com-binations failed to fill the gap between the two TIRs (supple-mentary fig S6 Supplementary Material online) Fourth thesize of the plastid genome estimated directly by pulsed-fieldgel electrophoresis (PFGE) and by psbA probe hybridizationwas consistent with abundant presence of linear monomers120 kb in size (supplementary fig S7 Supplementary Materialonline and Janouskovec et al 2010) Subgenomic-sizedmaterial and unresolved DNA in the well were also abun-dantly represented whereas putative linear dimers wererare and linear concatemers were not detected The PFGEexperiment was reproduced by using probes to three plastidgenes encoded in the LSC region tufA petA and atpA (seeMaterials and Methods) and consistently led to the sameresult suggesting that linear monomers are an importantconstituent of plastid DNA in C velia Subgenomic-sized mol-ecules could represent plastid DNA fragmented during PFGEpreparation (note that intact C velia cells were digestedin agarose plugs however) nucleus-encoded plastid DNA

2453

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

fragments plastid DNA in the process of replication or otherforms of genuine plastid DNA (Ellis and Day 1986 Oldenburgand Bendich 2004 Scharff and Koop 2006) Incompletely di-gested cells of C velia in the agarose plugs (see Materials andMethods) could be responsible for much of the signal in thewell although high-molecular-weight branched DNA formscould also be present and cannot be distinguished using thecurrent approach (Bendich 1991 Bendich 2004) Altogetherbecause the plastid genome was not closed by deep sequenc-ing using either 454 or Illumina no paired-end linkage couldbe established no linkage could be established by PCR and

because the size of the plastid genome estimated by PFGE andSouthern blot hybridizations was consistent with the exis-tence of linear monomers we conclude that the genome isprincipally linear in structure and circular molecules andlinear concatemers are rare or absent

If the genome is indeed linear it will be important to knowhow the C velia plastid DNA replication and copy numbercompares to the circularconcatemeric plastid DNA inapicomplexans (Williamson et al 2001 Williamson et al2002) and other organisms To provide a first glimpse intothis we identified a putative replication origin of the C velia

A

B

FIG 5 Schematic of plastid chromosome ends (A) Coverage depth (y axis) by forward-mapping (yellow) reverse-mapping (purple) and total (red)genomic reads is projected onto the TIR Position of genes is shown by gray boxes at the bottom (K = trnK P = trnP) Total coverage depth startsdropping gradually at about 800 bp from the chromosome end and goes through a U-shaped minimum at about 150 bp from the end The calculatedmean size of the end-sequenced DNA fragments (horizontal black bar red bits correspond to sequenced end pairs) and coverage by forwardreverseread pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end A steep drop in totalcoverage depth occurs at the very end of the chromosome (B) Here the TIRs on each of the chromosome ends (green and blue respectively) divergeinto a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 30 and 50) The total depth of coverage byDNA reads (y axis) steeply decreases in the 15 bp region The 15 bp region is also enriched in nucleotide discrepancies (sequence logos in bold) The last130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired-end spanningthis region can be identified suggesting that the genome cannot be genuinely circularized

2454

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 2: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

accessible for comparison with other plastids Indeed theinitial description of the C velia plastid genome noted severalunusual features with implications for plastid evolution andfunction (Janouskovec et al 2010) but because they weretypically not shared with either apicomplexans or dinoflagel-lates (and therefore not comparable to them) they have notbeen characterized further

Here we describe three intriguing features of the C veliaplastid genome the presence of split genes encoding splitproteins divergent characteristics of gene organization andexpression and the physical structure of the chromosomeWe show that two functionally important proteins involvedin photosynthesis PsaA representing a core subunit of pho-tosystem I (PSI) and AtpB representing the beta subunit ofthe ATP synthase are expressed in two discrete fragments atboth the RNA and protein levels which has interesting im-plications for the structure and function of these otherwisehighly conserved proteins We also show that the plastidchromosome is highly divergent in structure with a pro-nounced strand polarity altered gene order and large exten-sions in many coding sequences which appear transcribedFinally we significantly expand the depth of sequence cover-age at the DNA level and show that the coverage pattern atthe chromosome ends polymerase chain reaction (PCR)experiments genome migration on pulse field gels and pres-ence of two long terminal inverted repeats (TIRs) all suggestthat the genome is linear in structure This would representthe first documented case of a linear-mapping plastidgenome These unique characteristics substantially expandour current understanding of plastid genome diversitymuch of which we hypothesize is due to high levels of intra-genomic duplication in this lineage

Results

Fragmented Genes Encode Fragmented ProteinsExpressed from Polycistronic mRNAs

The complete C velia plastid genome contained a number ofsmall fragments of genes including psaA atpB psaB psbBrpl3 and tufA In most of these cases intact homologs werealso present but in psaA and atpB only two fragments werefound that were widely separated in the genome and thattogether would account for the entire gene (psaA-1 psaA-2atpB-1 and atpB-2) Both gene products are critical for pho-tosynthesis and ATP generation suggesting three possibleexplanations all of which are unusual the plastid fragmentsare pseudogenes and intact proteins are imported from thecytosol the plastid gene products are trans-spliced at theRNA or protein levels or the proteins function as uniquetwo-subunit forms

Expression of psaA-1 and psaA-2 fragments was examinedby transcript mapping by circularizing mRNAs and Northernanalysis (see Materials and Methods) Reverse transcriptaseRT-PCR on circularized mRNA produced a product from eachof the fragments comprising the coding region flanking se-quence and a short 6ndash12 nucleotide-long oligouridylylated(oligoU) tail (fig 1) The consistent failure to connect thefragments by RT-PCR using multiple primer sets suggested

they are not spliced at the RNA level Both psaA-1 and psaA-2are surrounded by several genes on the same strand so theirco-ordinated expression was analyzed by hybridization toprobes corresponding to psaA-1 psaA-2 the upstreamgenes and the downstream noncoding region All probeshybridized with a fragment of the expected size of a stand-alone oligoU mRNA corresponding to the fragment in ques-tion no evidence for a spliced RNA form was found Howeverprobes did hybridize to larger fragments (fig 1 and supple-mentary fig S1 Supplementary Material online) correspond-ing in size and hybridization pattern to a single dicistronic(psaA-1 + psbB) or multiple large polycistronic mRNAs(psaA-2 + psbE + psaB and surrounding genes) Linkagesbetween psaA-1 and psbB and psaA-2 and psaB were con-firmed by RT-PCR Hybridization patterns around psaA-2suggested processing of large mRNAs into the dicistronicpsaA-2 + psbE and subsequently to single gene transcripts(fig 1) Similar to psaA all attempts to link the two atpBfragments by RT-PCR yielded no products The Northernanalysis of atpB-1 and atpB-2 revealed single bands corre-sponding in size to each expected fragment and no evidencewas found for a spliced mRNA again suggesting independentexpression (fig 2A) The transcriptional profiles of both atpBfragments were supportive of this conclusion (fig 2B) All fourpsaA and atpB fragments contained structurally conserveddomains (figs 1 and 2A) and were among the top 22 mostabundantly transcribed plastid genes (table 1 and see later)Evidence at the genomic and transcriptomic levels thereforeconsistently suggested that all psaA and atpB fragments areindependently transcribed translated and code for functionalproducts

To obtain convincing evidence that psaA and atpB genefragments are expressed as separate polypeptides and that nointact version of the proteins is present we analyzed mem-brane protein complexes by combination of two-dimensional(2D) electrophoresis and mass spectrometry (MS)Membrane fraction isolated from C velia cells was solubilizedby dodecyl-b-maltoside and protein complexes were sepa-rated on Clear-Native gel in the first dimension and onsodium dodecyl sulfate (SDS) gel in the second dimension(fig 3) The most abundant spots were subjected to MS anal-ysis of their tryptic peptides which were correlated with pre-dicted peptides of plastid-encoded genes and availableexpressed sequence tag (EST) sequences This allowed us todistinguish photosystem II (PSII) cytochrome b6f and PSIand PSII supercomplexes with bound antennas (fig 3)Fragments corresponding to PsaA-1 PsaA-2 AtpB-1 andAtpB-2 were all identified but no spot with the expectedmasscharge properties of intact PsaA and AtpB was found(see supplementary tables S1 and S2 Supplementary Materialonline for a list of peptides assigned to PsaAB and AtpABproteins) Identification of spots was consistent with chloro-phyll fluorescence detected in the native gel All three PSIIcomplexes exhibited strong fluorescence which contrastedwith the minimal fluorescence of chlorophyll bound withinthe PSI supercomplex The minimal fluorescence emissionfrom the PSI complex indicated it was well preserved beforeit was separated into subunits in the second dimension

2448

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

658349813638

2604

1903

1383

955

623

281

285 kb

105

104

103

102

10

1Tran

scrip

t cov

erag

e pe

r bp

(Lo

g)

psaBpsaA -2psbBpsaA -1

psbEpsbB ψ psaA ψ

acsF

trnG

18 kb[UUUUU]

[UUUUU]095 kb

16 kb

fro frofro

[UUUUU]125 kb

035 kb

POLYCISTRONICTRANSCRIPTS

PROCESSEDTRANSCRIPTS

GENOME

PROTEINS psaA-1 psaA-2divergent conserved

65834981

36382604

1903

1383

955

623

281

psaA-1 psbB psaA-2 psbE

N N C

Cthylakoid stroma

plastid matrix

thylakoid membrane

M

M

D D

P PD

M

M

1 2 9876143 10 11

FIG 1 Expression model for the split PsaA PsaA fragments are separated at the genomic transcriptomic and protein level (top to bottom) The topgraph shows the transcriptional profile (total RNA in black upper line and polyA RNA in gray lower line) of the genomic region below In the genomicregion boxes represent genes and gray arrows show the coding DNA strand Consistent with the transcriptional profile Northern hybridization blots(boxes below) reveal that both psaA fragments are transcribed within larger polycistronic transcripts (P) and further processed into dicistronic (D) andmonocistronic (M) units The monocistronic psaA transcripts contain oligoU tails and translate into independent peptides both of which participate inPSI (thylakoid trans-membrane domains are shown by vertical boxes and numbered)

FIG 2 Expression model for the split AtpB (A) Northern analysis revealed a monocistronic transcript for each of the two atpB fragments(B) Transcriptome coverage of the two genomic regions is shown for total RNA (black upper line) and polyA RNA (gray lower line) Genes areshown by boxes and gray arrows indicate coding strands Black arrows below indicate predicted transcripts based on the combination of transcriptmapping and Northern analysis (C) Alignment of plastid cyanobacterial and yeast mitochondrial AtpBs reveals that the split of Chromera velia AtpBoccurred within a 24 amino acid insertion that is also present in the sister taxon Vitrella (the most probable positions of the split are boxed)

2449

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

(fig 3) This together with the lack of evidence for intactgenes or transcripts for either gene suggested strongly thatthe two separate subunits of both PsaA and AtpB proteins areassembled into functional PSI and ATP synthase respectively

Modeling the two-subunit forms shows that the pointat which both genes were split corresponds to a loopspanning structural domains (fig 1) In the case of PsaAthe position of this breakpoint corresponds to the largestloop in the protein which separates the first two pairs ofperipheral helices from the rest of the protein including thephotochemistry-performing core (fig 1 Jordan et al 2001Green 2003) The split has also driven a considerablechange in divergence rate between the two peptides PsaA-

1 at the photosynthetic antenna periphery is significantlyless conserved than PsaA-2 at the antenna core By calculat-ing maximum likelihood-corrected genetic distances (seeMaterials and Methods) we estimated that PsaA-1 is about46 times more divergent than PsaA-2 relative to other plastidhomologs suggesting that it is not surprisingly under weakerfunctional constraint Maximum likelihood phylogeniesusing a wide sampling of plastid and bacterial homologs(see Materials and Methods) also excluded a secondarybacterial origin for PsaA-1 as suggested previously (Mazoret al 2012) although fast-evolving both C velia PsaA-1 andPsaA-2 were most closely related to the sister taxon Vitrellaand other plastids

Table 1 Total RNA Transcript Mapping to Chromera velia Plastid Genes

Region NumberGene Length (kb) Covbpa Rel Cov vs Medb

Intergenic regionsc 105 224 3444

Genesc 107 941 33990

Ribosomal RNA 3 48 348584 4753

Transfer RNA 29 24 699 10

Protein-coding 75 869 12033 164

Photosystem II (psb) 11 66 52244 712

Photosystem I (psa) 4 57 22740 310

Cytochrome b6f (pet) 4 21 18301 250

ATP synthase (atp) 6 51 14762 201

Ribosomal proteins large subunit (rrl) 10 54 1141 16

Ribosomal proteins small subunit (rrs) 12 168 1328 18

Thylakoid import (sec tat) 3 45 817 11

RNA polymerase (rpo) 4 154 293 04

Other function (acsF ccsA clpC tufA ycf3) 7 119 3617 49

Unknown function (orf) 14 135 1339 18

Highly expressed protein-coding genes psbA 10 383384 5228

psaA-2 12 48594 663

psbE 03 41694 569

atpH1 02 41540 566

psbD 10 37785 515

psbC 14 32123 438

petD 05 27882 380

psbB 16 23102 315

psbV 05 20686 282

psaC 02 19422 265

atpA 17 17239 235

petA 09 16957 231

atpB-1 05 15992 218

psbH 03 15653 213

petB 06 15341 209

psaB 34 14407 196

tufA 13 13235 180

petG 01 13026 178

atpB-2 15 10950 149

psbK 01 9339 127

psaA-1 08 8538 116

psbJ 01 8027 109

NOTEmdashGenes encoding split proteins are underlinedaAverage coverage per base pairbRelative coverage compared with median gene coverage depth per base pair (=7334)cGeneintergenic regions in the TIRs were counted once

2450

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

When compared with PsaA the AtpB split is suggestiveof a different but equally interesting process The regionsurrounding the split is conserved in plastid mitochondrialand bacterial homologs but interrupted by a 24 amino acid(aa) insertion in Vitrella C veliarsquos closest photosyntheticrelative (fig 2C) A detailed sequence alignment of thisregion shows that the split in C velia AtpB occurred withinthis shared insertion suggesting that the ancestral acquisitionof this insertion might have provided an evolutionary oppor-tunity for the split A 31 aa long insertion at the same site is

also present in the AtpB of the related dinoflagellateAmphidinium (data not shown) Although unrelated insequence the notorious divergence of dinoflagellate plastidproteins and their greater distance to C velia and Vitrellacould easily account for the insertion divergence pointingto a possibly even deeper origin

Many Other Plastid Open Reading Frames areExtraordinary and Transcribed

PsaA and AtpB splits stand out as unprecedented cases butrepresent only a fraction of peculiarities in C velia plastidopen reading frames (ORFs) The genome contains manyunusually modified ORFs including several genes that areexpected to be present in plastid genomes but which areunusually large due to the presence of long extensions thatbear no recognizable similarity to known genes In three genes(rps7 8 and 17) the ORF region has been extended by 126 to273 amino acids toward the C-terminus effectively triplingthem in size (supplementary fig S2 SupplementaryMaterial online) In four other genes (rps3 4 8 and 11) theregion sharing homology with other plastid proteins is foundat the end of an unusually large ORF (supplementary fig S3Supplementary Material online) The most obvious exampleof this is Rps4 which is encoded within a 1998 aa long ORFcompared with 201 aa in the red alga Porphyra purpurea Inthese cases start codons may be located near the beginning ofthe homologous region but it is still puzzling why the homol-ogous region would be preceded by such a long stretch ofsequence uninterrupted by stop codons Canonical Shine-Dalgarno sequences are absent (the predicted 30-end of the16S rRNA is highly divergent in C velia) and therefore cannotsupport start codon prediction ORFs of two additional genesrpoC1 and rpoC2 contain long insertions (507 and 971 aa)in their variable regions (none of them appears to representan intein supplementary fig S4 Supplementary Materialonline) Finally a number of smaller changes are found inORFs throughout the C velia plastid genome Accountingfor extensions and indels longer than 20 aa five other ORFshave a truncated N-terminus (atpA ccsA petA ycf3 andpsaB) one ORF has a significantly extended N-terminus(rpl4) two ORFs have a truncated C-terminus (ccsA andrpoA) five ORFs have an extended C-terminus (atpA atpBatpH2 petA and rps2) and three ORFs contain interveninginsertions (rpl6 rpoB and psaB)

Altogether ORFs of various functionally unrelated proteinscontain unusual modifications Many of the proteins areindispensable and exclusively plastid-encoded in algae andplants suggesting that they are unlikely to be in the processof degeneration loss or relocation to the nucleus in C veliaAccordingly none of these proteins can be identified in theavailable nuclear transcriptomes from C velia (Woehle et al2011) To provide first insights into expression of the unusualproteins we analyzed the plastid transcriptome using totaland polyA RNA fraction sequence that we generated (polyURNA sequence was not available for this analysis) Transcriptmapping from the total RNA fraction (see Materials andMethods) revealed that all unusually long ORFs are coveredby transcriptome reads roughly equally in their entirety

FIG 3 The 2D electrophoresis of membrane protein complexes ofChromera velia and identification of individual spots Membrane pro-teins were solubilized by dodecyl-b-maltoside and separated in the firstdimension by Clear-Native electrophoresis (Clear-Native-PAGE) Afterthe separation the gel was scanned by LAS 4000 imager (Fuji Japan) inchlorophyll fluorescence mode (Chl fl) after excitation by blue LED lightto distinguish the PSII and light-harvesting complexes The proteincomplexes resolved in the first dimension were further separatedin the second dimension by denaturing gel (SDS-PAGE) and stainedby Coomassie blue Protein spots were cut from the gel and analyzedby LC-MSMS as described in Materials and Methods Identified spotsare highlighted and positions of protein complexes separated byClear-Native electrophoresis are marked as follows PSII PhotosystemII SC-PSII Supercomplexes of photosystem II with antenna SC-PSISupercomplexes of photosystem I with antenna

2451

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

(supplementary figs S2 and S3 Supplementary Materialonline) Although this approach cannot accurately pinpointthe position of start codons (many rps genes are expressedrelatively little and intergenic regions are often transcribed)the consistent coverage by transcripts suggests that the ORFsmay really have expanded in size at the protein level Much ofthe long insertions in rpo genes is also equally represented inthe transcriptome data however in this case a drop in cov-erage is found within each insertion creating a 200 bp gap(supplementary fig S4 Supplementary Material online) Thisraises the possibility that RpoC1 and RpoC2 may be splitmuch like PsaA and AtpB however both are in frame andlow coverage so more evidence is required to distinguishbetween the two alternatives

Unusual changes to gene presence have also been notedtwo genes are present in two (atpH) and three (clpC) full-length paralogs despite that a single type is found in otherplastid genomes All three clpC paralogs are highly divergentin sequence from each other and clpCs in other plastids Theyoverlap only partially and the largest and most complete ofthe paralogs (clpC3) is an order of magnitude less expressedthan the other two (supplementary fig S5A SupplementaryMaterial online) Likewise the two atpH paralogs are differentin sequence (93 nucleotide similarity) length and expres-sion AtpH1 is both highly expressed and conserved butthe expression of the duplicated atpH2 located on the TIRsis nearly 100-fold lower (supplementary fig S5BSupplementary Material online) The atpH2 gene contains

an unusual C-terminal extension which doubles it in size Acloser look at the surrounding sequence reveals that atpH2originated by a duplication event of the atpH1 region (includ-ing trnK discussed later) This together with its low expres-sion suggests that atpH2 is most likely a non-functionalpseudogene More generally however this example illustrateshow duplication can drive gene paralogy ORF expansionand altered gene order all features observed throughoutthe C velia plastid genome

Analysis of the total RNA and polyA fraction (fig 4) tran-scription profiles revealed more unexpected featuresTranscripts for all plastid genes were present in the totalRNA fraction and consistently also found in the polyA frac-tion at two to three magnitudes lower abundance probablydue to a carryover of non-polyA transcripts (fig 4) The psbAgene stood out in two ways however First it was highlyexpressed being represented at almost an order of magnitudehigher levels than other protein-coding genes (and similarlevels as the rRNA operon fig 4 and table 1) Second therepresentation of psbA in the polyA fraction was about equalto the total RNA fraction which would be the expected resultfor a polyadenylated transcript (fig 4) Only two additionalregions corresponding to rps19 and a psbB fragment (partof the annotated orf157) were relatively overrepresented inthe polyA fraction but only rps19 contained A-rich genomicsequence at the 30-end that could explain the bias Whetherthe psbA gene and psbB fragment are uniquely polyadeny-lated or a carryover into the polyA fraction occurred due to

10000 20000 30000 40000

105

104

106

103

102

10

1

atpB-2 psaA-1 psbB psaA-2 psbC psbDpsaB

position (bp)

Cov

erag

e pe

r bp

(Lo

g)

atpH2

Photosystem I + II ATPase Uknown ORFExpression Pseudogene fragmentOther functiontotal RNA polyA RNA

psbE

50000 60000 70000 80000

105

104

106

103

102

10

1

23S rRNA atpB-1 atpH1 atpI

Cov

erag

e pe

r bp

(Lo

g)

psbV K psbJ

90000 100000 110000 120000

105

104

106

103

102

10

1

psbAatpA

Cov

erag

e pe

r bp

(Lo

g)

atpH2psbHpsbT psaC petGB

petA

Cytochrome b6f

rps19

tufA

petD

16S rRNA

psbA psbB

FIG 4 Plastid total RNA and polyA RNA transcriptomic profiles Coverage by total RNA (red) and polyA RNA (blue) is shown for the whole plastidgenome (Log scale) Gene families are color-coded according to the key The highly expressed psbA gene is highlighted in bold The split psaA and atpBgenes are in red The boundaries of the TIRs are separated by dashed vertical lines

2452

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

another reason is currently not clear and awaits successfulcloning of their 30 transcript ends The transcriptomes alsoenabled us to analyze expression pattern among neigboringgenes which have been highly rearranged and strand-polar-ized in C velia compared with other plastids Canonicaloperons (atpIHGFDA psaAB petBD ribosomal proteinsupercluster ribosomal RNA operon eg Westhoff et al1983) and translationally coupled units (atpBE Gatenbyet al 1989) were also affected by rearrangements and eitherfragmented or internally reshuffled Canonical promoterscould not be identified These characteristics indicate thattranscription of many core plastid genes has been remodeledNorthern hybridization patterns in the psaA-1 and psaA-2region already revealed that this is the case in several keyproteins of the photosystems which are expressed fromnovel multicistronic mRNAs through complex processing(fig 1) Whole-genome transcriptional profiles (fig 4) confirmthis and suggest that polycistronic transcription is widespreadin the genome (compare transcript coverage at intergenicspacers and the region downstream of the 16S rRNA genefig 4) Although the primary transcript regions and processingpatterns cannot be predicted accurately the data show thatnovel polycistronic units most likely exist and often includefunctionally similar genes (compare distribution of genes forphotosynthesis and protein expression in fig 4)

Finally we estimated and compared relative expression ofall plastid genes separately and in their functional groupsbased on the total RNA transcriptome (table 1 and supple-mentary table S3 Supplementary Material online) Asexpected the rRNA genes were most highly expressedfollowed by genes of the four membrane complexes (PSIPSII cytochrome b6f and ATP synthase) and these alsocomprised all of the 22 most expressed protein-codinggenes with the single exception of tufA The least expressedwere genes related to transcription and translation RNApolymerase tRNA genes and ribosomal proteins (table 1)However unknown ORFs and most intergenic regions werealso expressed including relatively highly expressed orf389orf201 and intergenic regions downstream of psbJ andatpB-1 (fig 4) Additionally we used the transcriptome toaddress the presence of RNA editing which is known in theplastids of the related dinoflagellates (eg Dang and Green2009) but has not been found in those of apicomplexans andVitrella The consensus of transcriptomic reads at each sitewas identical to the C velia plastid genome sequence suggest-ing the absence of editing Only four sites showed significantpolymorphism (gt10) at RNA level (sites 44346 [515]46071 [2366] 46074 [3935] and 46574 [2379]) all ofwhich were located within the rRNA operon and identical(998ndash100) at the DNA level Based on their location weconclude that this is more likely due to reverse transcriptionerrors of modified rRNA nucleosides and not RNA editing

A Linear-Mapping Plastid Genome in C velia

One of the most significant features of the previouslypublished plastid genome from C velia is that it does notmap as a circle as do all other plastid genomes completed todate The gap in the sequence falls between the two copies of

the TIR containing psbA (Janouskovec et al 2010) Weattempted to close this gap using multiple approaches Westarted by massively increasing the depth of sequence cover-age The genome was originally sequenced by 454 pyrose-quencing to a 13-fold depth of coverage so we used anindependent approach and assembled the genome frompaired-end Illumina sequence reads to an average depth ofalmost 600-fold This approach extended the two TIRs by300 bp on each side which was verified by PCR (supplemen-tary fig S6 Supplementary Material online) The very end ofeach TIR was followed by a 15 bp sequence further followedby an extension into the complementary TIR at a very lowdepth of coverage (fig 5) These extensions created an overlapbetween the ends however several lines of evidencesuggested that this is not due to a circular chromosomebut rather a linear topology with structured ends Firstmapping the depth of coverage revealed a pattern expectedfor a linear molecule The depth of coverage was generallyconsistent across the large single copy (LSC) region and mostof the TIRs but reduced linearly beginning at about 600 bpfrom the end followed by an exponential reduction andpartial recovery between 300 to 50 bp from the end followedby a terminal drop (fig 5A) The size and shape of theexponential reductionrecovery would be predicted by themean DNA fragment size in the sequencing library (327 bp)and linear topology as a consequence of fragmentation biasnear the chromosome end Therefore both the overalldecrease in coverage depth and the exponential decreaserecovery close to the ends support a linear chromosomeconformation (if it were circular the whole region shouldnot differ from the rest of the genome) Second if thegenome was circular-mapping and the gap was hard tosequence we would expect some paired ends to span thegap However when more than 650000 paired ends aremapped to the genome not a single paired end spanningthe gap was identified (see Materials and Methods) Insteadboth ends of the chromosome consisted exclusively ofreverse-mapping reads (fig 5A and B) Third PCR experi-ments using six primers (all shown to successfully amplifyproducts from one side of the gap) and all their nested com-binations failed to fill the gap between the two TIRs (supple-mentary fig S6 Supplementary Material online) Fourth thesize of the plastid genome estimated directly by pulsed-fieldgel electrophoresis (PFGE) and by psbA probe hybridizationwas consistent with abundant presence of linear monomers120 kb in size (supplementary fig S7 Supplementary Materialonline and Janouskovec et al 2010) Subgenomic-sizedmaterial and unresolved DNA in the well were also abun-dantly represented whereas putative linear dimers wererare and linear concatemers were not detected The PFGEexperiment was reproduced by using probes to three plastidgenes encoded in the LSC region tufA petA and atpA (seeMaterials and Methods) and consistently led to the sameresult suggesting that linear monomers are an importantconstituent of plastid DNA in C velia Subgenomic-sized mol-ecules could represent plastid DNA fragmented during PFGEpreparation (note that intact C velia cells were digestedin agarose plugs however) nucleus-encoded plastid DNA

2453

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

fragments plastid DNA in the process of replication or otherforms of genuine plastid DNA (Ellis and Day 1986 Oldenburgand Bendich 2004 Scharff and Koop 2006) Incompletely di-gested cells of C velia in the agarose plugs (see Materials andMethods) could be responsible for much of the signal in thewell although high-molecular-weight branched DNA formscould also be present and cannot be distinguished using thecurrent approach (Bendich 1991 Bendich 2004) Altogetherbecause the plastid genome was not closed by deep sequenc-ing using either 454 or Illumina no paired-end linkage couldbe established no linkage could be established by PCR and

because the size of the plastid genome estimated by PFGE andSouthern blot hybridizations was consistent with the exis-tence of linear monomers we conclude that the genome isprincipally linear in structure and circular molecules andlinear concatemers are rare or absent

If the genome is indeed linear it will be important to knowhow the C velia plastid DNA replication and copy numbercompares to the circularconcatemeric plastid DNA inapicomplexans (Williamson et al 2001 Williamson et al2002) and other organisms To provide a first glimpse intothis we identified a putative replication origin of the C velia

A

B

FIG 5 Schematic of plastid chromosome ends (A) Coverage depth (y axis) by forward-mapping (yellow) reverse-mapping (purple) and total (red)genomic reads is projected onto the TIR Position of genes is shown by gray boxes at the bottom (K = trnK P = trnP) Total coverage depth startsdropping gradually at about 800 bp from the chromosome end and goes through a U-shaped minimum at about 150 bp from the end The calculatedmean size of the end-sequenced DNA fragments (horizontal black bar red bits correspond to sequenced end pairs) and coverage by forwardreverseread pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end A steep drop in totalcoverage depth occurs at the very end of the chromosome (B) Here the TIRs on each of the chromosome ends (green and blue respectively) divergeinto a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 30 and 50) The total depth of coverage byDNA reads (y axis) steeply decreases in the 15 bp region The 15 bp region is also enriched in nucleotide discrepancies (sequence logos in bold) The last130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired-end spanningthis region can be identified suggesting that the genome cannot be genuinely circularized

2454

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 3: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

658349813638

2604

1903

1383

955

623

281

285 kb

105

104

103

102

10

1Tran

scrip

t cov

erag

e pe

r bp

(Lo

g)

psaBpsaA -2psbBpsaA -1

psbEpsbB ψ psaA ψ

acsF

trnG

18 kb[UUUUU]

[UUUUU]095 kb

16 kb

fro frofro

[UUUUU]125 kb

035 kb

POLYCISTRONICTRANSCRIPTS

PROCESSEDTRANSCRIPTS

GENOME

PROTEINS psaA-1 psaA-2divergent conserved

65834981

36382604

1903

1383

955

623

281

psaA-1 psbB psaA-2 psbE

N N C

Cthylakoid stroma

plastid matrix

thylakoid membrane

M

M

D D

P PD

M

M

1 2 9876143 10 11

FIG 1 Expression model for the split PsaA PsaA fragments are separated at the genomic transcriptomic and protein level (top to bottom) The topgraph shows the transcriptional profile (total RNA in black upper line and polyA RNA in gray lower line) of the genomic region below In the genomicregion boxes represent genes and gray arrows show the coding DNA strand Consistent with the transcriptional profile Northern hybridization blots(boxes below) reveal that both psaA fragments are transcribed within larger polycistronic transcripts (P) and further processed into dicistronic (D) andmonocistronic (M) units The monocistronic psaA transcripts contain oligoU tails and translate into independent peptides both of which participate inPSI (thylakoid trans-membrane domains are shown by vertical boxes and numbered)

FIG 2 Expression model for the split AtpB (A) Northern analysis revealed a monocistronic transcript for each of the two atpB fragments(B) Transcriptome coverage of the two genomic regions is shown for total RNA (black upper line) and polyA RNA (gray lower line) Genes areshown by boxes and gray arrows indicate coding strands Black arrows below indicate predicted transcripts based on the combination of transcriptmapping and Northern analysis (C) Alignment of plastid cyanobacterial and yeast mitochondrial AtpBs reveals that the split of Chromera velia AtpBoccurred within a 24 amino acid insertion that is also present in the sister taxon Vitrella (the most probable positions of the split are boxed)

2449

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

(fig 3) This together with the lack of evidence for intactgenes or transcripts for either gene suggested strongly thatthe two separate subunits of both PsaA and AtpB proteins areassembled into functional PSI and ATP synthase respectively

Modeling the two-subunit forms shows that the pointat which both genes were split corresponds to a loopspanning structural domains (fig 1) In the case of PsaAthe position of this breakpoint corresponds to the largestloop in the protein which separates the first two pairs ofperipheral helices from the rest of the protein including thephotochemistry-performing core (fig 1 Jordan et al 2001Green 2003) The split has also driven a considerablechange in divergence rate between the two peptides PsaA-

1 at the photosynthetic antenna periphery is significantlyless conserved than PsaA-2 at the antenna core By calculat-ing maximum likelihood-corrected genetic distances (seeMaterials and Methods) we estimated that PsaA-1 is about46 times more divergent than PsaA-2 relative to other plastidhomologs suggesting that it is not surprisingly under weakerfunctional constraint Maximum likelihood phylogeniesusing a wide sampling of plastid and bacterial homologs(see Materials and Methods) also excluded a secondarybacterial origin for PsaA-1 as suggested previously (Mazoret al 2012) although fast-evolving both C velia PsaA-1 andPsaA-2 were most closely related to the sister taxon Vitrellaand other plastids

Table 1 Total RNA Transcript Mapping to Chromera velia Plastid Genes

Region NumberGene Length (kb) Covbpa Rel Cov vs Medb

Intergenic regionsc 105 224 3444

Genesc 107 941 33990

Ribosomal RNA 3 48 348584 4753

Transfer RNA 29 24 699 10

Protein-coding 75 869 12033 164

Photosystem II (psb) 11 66 52244 712

Photosystem I (psa) 4 57 22740 310

Cytochrome b6f (pet) 4 21 18301 250

ATP synthase (atp) 6 51 14762 201

Ribosomal proteins large subunit (rrl) 10 54 1141 16

Ribosomal proteins small subunit (rrs) 12 168 1328 18

Thylakoid import (sec tat) 3 45 817 11

RNA polymerase (rpo) 4 154 293 04

Other function (acsF ccsA clpC tufA ycf3) 7 119 3617 49

Unknown function (orf) 14 135 1339 18

Highly expressed protein-coding genes psbA 10 383384 5228

psaA-2 12 48594 663

psbE 03 41694 569

atpH1 02 41540 566

psbD 10 37785 515

psbC 14 32123 438

petD 05 27882 380

psbB 16 23102 315

psbV 05 20686 282

psaC 02 19422 265

atpA 17 17239 235

petA 09 16957 231

atpB-1 05 15992 218

psbH 03 15653 213

petB 06 15341 209

psaB 34 14407 196

tufA 13 13235 180

petG 01 13026 178

atpB-2 15 10950 149

psbK 01 9339 127

psaA-1 08 8538 116

psbJ 01 8027 109

NOTEmdashGenes encoding split proteins are underlinedaAverage coverage per base pairbRelative coverage compared with median gene coverage depth per base pair (=7334)cGeneintergenic regions in the TIRs were counted once

2450

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

When compared with PsaA the AtpB split is suggestiveof a different but equally interesting process The regionsurrounding the split is conserved in plastid mitochondrialand bacterial homologs but interrupted by a 24 amino acid(aa) insertion in Vitrella C veliarsquos closest photosyntheticrelative (fig 2C) A detailed sequence alignment of thisregion shows that the split in C velia AtpB occurred withinthis shared insertion suggesting that the ancestral acquisitionof this insertion might have provided an evolutionary oppor-tunity for the split A 31 aa long insertion at the same site is

also present in the AtpB of the related dinoflagellateAmphidinium (data not shown) Although unrelated insequence the notorious divergence of dinoflagellate plastidproteins and their greater distance to C velia and Vitrellacould easily account for the insertion divergence pointingto a possibly even deeper origin

Many Other Plastid Open Reading Frames areExtraordinary and Transcribed

PsaA and AtpB splits stand out as unprecedented cases butrepresent only a fraction of peculiarities in C velia plastidopen reading frames (ORFs) The genome contains manyunusually modified ORFs including several genes that areexpected to be present in plastid genomes but which areunusually large due to the presence of long extensions thatbear no recognizable similarity to known genes In three genes(rps7 8 and 17) the ORF region has been extended by 126 to273 amino acids toward the C-terminus effectively triplingthem in size (supplementary fig S2 SupplementaryMaterial online) In four other genes (rps3 4 8 and 11) theregion sharing homology with other plastid proteins is foundat the end of an unusually large ORF (supplementary fig S3Supplementary Material online) The most obvious exampleof this is Rps4 which is encoded within a 1998 aa long ORFcompared with 201 aa in the red alga Porphyra purpurea Inthese cases start codons may be located near the beginning ofthe homologous region but it is still puzzling why the homol-ogous region would be preceded by such a long stretch ofsequence uninterrupted by stop codons Canonical Shine-Dalgarno sequences are absent (the predicted 30-end of the16S rRNA is highly divergent in C velia) and therefore cannotsupport start codon prediction ORFs of two additional genesrpoC1 and rpoC2 contain long insertions (507 and 971 aa)in their variable regions (none of them appears to representan intein supplementary fig S4 Supplementary Materialonline) Finally a number of smaller changes are found inORFs throughout the C velia plastid genome Accountingfor extensions and indels longer than 20 aa five other ORFshave a truncated N-terminus (atpA ccsA petA ycf3 andpsaB) one ORF has a significantly extended N-terminus(rpl4) two ORFs have a truncated C-terminus (ccsA andrpoA) five ORFs have an extended C-terminus (atpA atpBatpH2 petA and rps2) and three ORFs contain interveninginsertions (rpl6 rpoB and psaB)

Altogether ORFs of various functionally unrelated proteinscontain unusual modifications Many of the proteins areindispensable and exclusively plastid-encoded in algae andplants suggesting that they are unlikely to be in the processof degeneration loss or relocation to the nucleus in C veliaAccordingly none of these proteins can be identified in theavailable nuclear transcriptomes from C velia (Woehle et al2011) To provide first insights into expression of the unusualproteins we analyzed the plastid transcriptome using totaland polyA RNA fraction sequence that we generated (polyURNA sequence was not available for this analysis) Transcriptmapping from the total RNA fraction (see Materials andMethods) revealed that all unusually long ORFs are coveredby transcriptome reads roughly equally in their entirety

FIG 3 The 2D electrophoresis of membrane protein complexes ofChromera velia and identification of individual spots Membrane pro-teins were solubilized by dodecyl-b-maltoside and separated in the firstdimension by Clear-Native electrophoresis (Clear-Native-PAGE) Afterthe separation the gel was scanned by LAS 4000 imager (Fuji Japan) inchlorophyll fluorescence mode (Chl fl) after excitation by blue LED lightto distinguish the PSII and light-harvesting complexes The proteincomplexes resolved in the first dimension were further separatedin the second dimension by denaturing gel (SDS-PAGE) and stainedby Coomassie blue Protein spots were cut from the gel and analyzedby LC-MSMS as described in Materials and Methods Identified spotsare highlighted and positions of protein complexes separated byClear-Native electrophoresis are marked as follows PSII PhotosystemII SC-PSII Supercomplexes of photosystem II with antenna SC-PSISupercomplexes of photosystem I with antenna

2451

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

(supplementary figs S2 and S3 Supplementary Materialonline) Although this approach cannot accurately pinpointthe position of start codons (many rps genes are expressedrelatively little and intergenic regions are often transcribed)the consistent coverage by transcripts suggests that the ORFsmay really have expanded in size at the protein level Much ofthe long insertions in rpo genes is also equally represented inthe transcriptome data however in this case a drop in cov-erage is found within each insertion creating a 200 bp gap(supplementary fig S4 Supplementary Material online) Thisraises the possibility that RpoC1 and RpoC2 may be splitmuch like PsaA and AtpB however both are in frame andlow coverage so more evidence is required to distinguishbetween the two alternatives

Unusual changes to gene presence have also been notedtwo genes are present in two (atpH) and three (clpC) full-length paralogs despite that a single type is found in otherplastid genomes All three clpC paralogs are highly divergentin sequence from each other and clpCs in other plastids Theyoverlap only partially and the largest and most complete ofthe paralogs (clpC3) is an order of magnitude less expressedthan the other two (supplementary fig S5A SupplementaryMaterial online) Likewise the two atpH paralogs are differentin sequence (93 nucleotide similarity) length and expres-sion AtpH1 is both highly expressed and conserved butthe expression of the duplicated atpH2 located on the TIRsis nearly 100-fold lower (supplementary fig S5BSupplementary Material online) The atpH2 gene contains

an unusual C-terminal extension which doubles it in size Acloser look at the surrounding sequence reveals that atpH2originated by a duplication event of the atpH1 region (includ-ing trnK discussed later) This together with its low expres-sion suggests that atpH2 is most likely a non-functionalpseudogene More generally however this example illustrateshow duplication can drive gene paralogy ORF expansionand altered gene order all features observed throughoutthe C velia plastid genome

Analysis of the total RNA and polyA fraction (fig 4) tran-scription profiles revealed more unexpected featuresTranscripts for all plastid genes were present in the totalRNA fraction and consistently also found in the polyA frac-tion at two to three magnitudes lower abundance probablydue to a carryover of non-polyA transcripts (fig 4) The psbAgene stood out in two ways however First it was highlyexpressed being represented at almost an order of magnitudehigher levels than other protein-coding genes (and similarlevels as the rRNA operon fig 4 and table 1) Second therepresentation of psbA in the polyA fraction was about equalto the total RNA fraction which would be the expected resultfor a polyadenylated transcript (fig 4) Only two additionalregions corresponding to rps19 and a psbB fragment (partof the annotated orf157) were relatively overrepresented inthe polyA fraction but only rps19 contained A-rich genomicsequence at the 30-end that could explain the bias Whetherthe psbA gene and psbB fragment are uniquely polyadeny-lated or a carryover into the polyA fraction occurred due to

10000 20000 30000 40000

105

104

106

103

102

10

1

atpB-2 psaA-1 psbB psaA-2 psbC psbDpsaB

position (bp)

Cov

erag

e pe

r bp

(Lo

g)

atpH2

Photosystem I + II ATPase Uknown ORFExpression Pseudogene fragmentOther functiontotal RNA polyA RNA

psbE

50000 60000 70000 80000

105

104

106

103

102

10

1

23S rRNA atpB-1 atpH1 atpI

Cov

erag

e pe

r bp

(Lo

g)

psbV K psbJ

90000 100000 110000 120000

105

104

106

103

102

10

1

psbAatpA

Cov

erag

e pe

r bp

(Lo

g)

atpH2psbHpsbT psaC petGB

petA

Cytochrome b6f

rps19

tufA

petD

16S rRNA

psbA psbB

FIG 4 Plastid total RNA and polyA RNA transcriptomic profiles Coverage by total RNA (red) and polyA RNA (blue) is shown for the whole plastidgenome (Log scale) Gene families are color-coded according to the key The highly expressed psbA gene is highlighted in bold The split psaA and atpBgenes are in red The boundaries of the TIRs are separated by dashed vertical lines

2452

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

another reason is currently not clear and awaits successfulcloning of their 30 transcript ends The transcriptomes alsoenabled us to analyze expression pattern among neigboringgenes which have been highly rearranged and strand-polar-ized in C velia compared with other plastids Canonicaloperons (atpIHGFDA psaAB petBD ribosomal proteinsupercluster ribosomal RNA operon eg Westhoff et al1983) and translationally coupled units (atpBE Gatenbyet al 1989) were also affected by rearrangements and eitherfragmented or internally reshuffled Canonical promoterscould not be identified These characteristics indicate thattranscription of many core plastid genes has been remodeledNorthern hybridization patterns in the psaA-1 and psaA-2region already revealed that this is the case in several keyproteins of the photosystems which are expressed fromnovel multicistronic mRNAs through complex processing(fig 1) Whole-genome transcriptional profiles (fig 4) confirmthis and suggest that polycistronic transcription is widespreadin the genome (compare transcript coverage at intergenicspacers and the region downstream of the 16S rRNA genefig 4) Although the primary transcript regions and processingpatterns cannot be predicted accurately the data show thatnovel polycistronic units most likely exist and often includefunctionally similar genes (compare distribution of genes forphotosynthesis and protein expression in fig 4)

Finally we estimated and compared relative expression ofall plastid genes separately and in their functional groupsbased on the total RNA transcriptome (table 1 and supple-mentary table S3 Supplementary Material online) Asexpected the rRNA genes were most highly expressedfollowed by genes of the four membrane complexes (PSIPSII cytochrome b6f and ATP synthase) and these alsocomprised all of the 22 most expressed protein-codinggenes with the single exception of tufA The least expressedwere genes related to transcription and translation RNApolymerase tRNA genes and ribosomal proteins (table 1)However unknown ORFs and most intergenic regions werealso expressed including relatively highly expressed orf389orf201 and intergenic regions downstream of psbJ andatpB-1 (fig 4) Additionally we used the transcriptome toaddress the presence of RNA editing which is known in theplastids of the related dinoflagellates (eg Dang and Green2009) but has not been found in those of apicomplexans andVitrella The consensus of transcriptomic reads at each sitewas identical to the C velia plastid genome sequence suggest-ing the absence of editing Only four sites showed significantpolymorphism (gt10) at RNA level (sites 44346 [515]46071 [2366] 46074 [3935] and 46574 [2379]) all ofwhich were located within the rRNA operon and identical(998ndash100) at the DNA level Based on their location weconclude that this is more likely due to reverse transcriptionerrors of modified rRNA nucleosides and not RNA editing

A Linear-Mapping Plastid Genome in C velia

One of the most significant features of the previouslypublished plastid genome from C velia is that it does notmap as a circle as do all other plastid genomes completed todate The gap in the sequence falls between the two copies of

the TIR containing psbA (Janouskovec et al 2010) Weattempted to close this gap using multiple approaches Westarted by massively increasing the depth of sequence cover-age The genome was originally sequenced by 454 pyrose-quencing to a 13-fold depth of coverage so we used anindependent approach and assembled the genome frompaired-end Illumina sequence reads to an average depth ofalmost 600-fold This approach extended the two TIRs by300 bp on each side which was verified by PCR (supplemen-tary fig S6 Supplementary Material online) The very end ofeach TIR was followed by a 15 bp sequence further followedby an extension into the complementary TIR at a very lowdepth of coverage (fig 5) These extensions created an overlapbetween the ends however several lines of evidencesuggested that this is not due to a circular chromosomebut rather a linear topology with structured ends Firstmapping the depth of coverage revealed a pattern expectedfor a linear molecule The depth of coverage was generallyconsistent across the large single copy (LSC) region and mostof the TIRs but reduced linearly beginning at about 600 bpfrom the end followed by an exponential reduction andpartial recovery between 300 to 50 bp from the end followedby a terminal drop (fig 5A) The size and shape of theexponential reductionrecovery would be predicted by themean DNA fragment size in the sequencing library (327 bp)and linear topology as a consequence of fragmentation biasnear the chromosome end Therefore both the overalldecrease in coverage depth and the exponential decreaserecovery close to the ends support a linear chromosomeconformation (if it were circular the whole region shouldnot differ from the rest of the genome) Second if thegenome was circular-mapping and the gap was hard tosequence we would expect some paired ends to span thegap However when more than 650000 paired ends aremapped to the genome not a single paired end spanningthe gap was identified (see Materials and Methods) Insteadboth ends of the chromosome consisted exclusively ofreverse-mapping reads (fig 5A and B) Third PCR experi-ments using six primers (all shown to successfully amplifyproducts from one side of the gap) and all their nested com-binations failed to fill the gap between the two TIRs (supple-mentary fig S6 Supplementary Material online) Fourth thesize of the plastid genome estimated directly by pulsed-fieldgel electrophoresis (PFGE) and by psbA probe hybridizationwas consistent with abundant presence of linear monomers120 kb in size (supplementary fig S7 Supplementary Materialonline and Janouskovec et al 2010) Subgenomic-sizedmaterial and unresolved DNA in the well were also abun-dantly represented whereas putative linear dimers wererare and linear concatemers were not detected The PFGEexperiment was reproduced by using probes to three plastidgenes encoded in the LSC region tufA petA and atpA (seeMaterials and Methods) and consistently led to the sameresult suggesting that linear monomers are an importantconstituent of plastid DNA in C velia Subgenomic-sized mol-ecules could represent plastid DNA fragmented during PFGEpreparation (note that intact C velia cells were digestedin agarose plugs however) nucleus-encoded plastid DNA

2453

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

fragments plastid DNA in the process of replication or otherforms of genuine plastid DNA (Ellis and Day 1986 Oldenburgand Bendich 2004 Scharff and Koop 2006) Incompletely di-gested cells of C velia in the agarose plugs (see Materials andMethods) could be responsible for much of the signal in thewell although high-molecular-weight branched DNA formscould also be present and cannot be distinguished using thecurrent approach (Bendich 1991 Bendich 2004) Altogetherbecause the plastid genome was not closed by deep sequenc-ing using either 454 or Illumina no paired-end linkage couldbe established no linkage could be established by PCR and

because the size of the plastid genome estimated by PFGE andSouthern blot hybridizations was consistent with the exis-tence of linear monomers we conclude that the genome isprincipally linear in structure and circular molecules andlinear concatemers are rare or absent

If the genome is indeed linear it will be important to knowhow the C velia plastid DNA replication and copy numbercompares to the circularconcatemeric plastid DNA inapicomplexans (Williamson et al 2001 Williamson et al2002) and other organisms To provide a first glimpse intothis we identified a putative replication origin of the C velia

A

B

FIG 5 Schematic of plastid chromosome ends (A) Coverage depth (y axis) by forward-mapping (yellow) reverse-mapping (purple) and total (red)genomic reads is projected onto the TIR Position of genes is shown by gray boxes at the bottom (K = trnK P = trnP) Total coverage depth startsdropping gradually at about 800 bp from the chromosome end and goes through a U-shaped minimum at about 150 bp from the end The calculatedmean size of the end-sequenced DNA fragments (horizontal black bar red bits correspond to sequenced end pairs) and coverage by forwardreverseread pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end A steep drop in totalcoverage depth occurs at the very end of the chromosome (B) Here the TIRs on each of the chromosome ends (green and blue respectively) divergeinto a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 30 and 50) The total depth of coverage byDNA reads (y axis) steeply decreases in the 15 bp region The 15 bp region is also enriched in nucleotide discrepancies (sequence logos in bold) The last130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired-end spanningthis region can be identified suggesting that the genome cannot be genuinely circularized

2454

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 4: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

(fig 3) This together with the lack of evidence for intactgenes or transcripts for either gene suggested strongly thatthe two separate subunits of both PsaA and AtpB proteins areassembled into functional PSI and ATP synthase respectively

Modeling the two-subunit forms shows that the pointat which both genes were split corresponds to a loopspanning structural domains (fig 1) In the case of PsaAthe position of this breakpoint corresponds to the largestloop in the protein which separates the first two pairs ofperipheral helices from the rest of the protein including thephotochemistry-performing core (fig 1 Jordan et al 2001Green 2003) The split has also driven a considerablechange in divergence rate between the two peptides PsaA-

1 at the photosynthetic antenna periphery is significantlyless conserved than PsaA-2 at the antenna core By calculat-ing maximum likelihood-corrected genetic distances (seeMaterials and Methods) we estimated that PsaA-1 is about46 times more divergent than PsaA-2 relative to other plastidhomologs suggesting that it is not surprisingly under weakerfunctional constraint Maximum likelihood phylogeniesusing a wide sampling of plastid and bacterial homologs(see Materials and Methods) also excluded a secondarybacterial origin for PsaA-1 as suggested previously (Mazoret al 2012) although fast-evolving both C velia PsaA-1 andPsaA-2 were most closely related to the sister taxon Vitrellaand other plastids

Table 1 Total RNA Transcript Mapping to Chromera velia Plastid Genes

Region NumberGene Length (kb) Covbpa Rel Cov vs Medb

Intergenic regionsc 105 224 3444

Genesc 107 941 33990

Ribosomal RNA 3 48 348584 4753

Transfer RNA 29 24 699 10

Protein-coding 75 869 12033 164

Photosystem II (psb) 11 66 52244 712

Photosystem I (psa) 4 57 22740 310

Cytochrome b6f (pet) 4 21 18301 250

ATP synthase (atp) 6 51 14762 201

Ribosomal proteins large subunit (rrl) 10 54 1141 16

Ribosomal proteins small subunit (rrs) 12 168 1328 18

Thylakoid import (sec tat) 3 45 817 11

RNA polymerase (rpo) 4 154 293 04

Other function (acsF ccsA clpC tufA ycf3) 7 119 3617 49

Unknown function (orf) 14 135 1339 18

Highly expressed protein-coding genes psbA 10 383384 5228

psaA-2 12 48594 663

psbE 03 41694 569

atpH1 02 41540 566

psbD 10 37785 515

psbC 14 32123 438

petD 05 27882 380

psbB 16 23102 315

psbV 05 20686 282

psaC 02 19422 265

atpA 17 17239 235

petA 09 16957 231

atpB-1 05 15992 218

psbH 03 15653 213

petB 06 15341 209

psaB 34 14407 196

tufA 13 13235 180

petG 01 13026 178

atpB-2 15 10950 149

psbK 01 9339 127

psaA-1 08 8538 116

psbJ 01 8027 109

NOTEmdashGenes encoding split proteins are underlinedaAverage coverage per base pairbRelative coverage compared with median gene coverage depth per base pair (=7334)cGeneintergenic regions in the TIRs were counted once

2450

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

When compared with PsaA the AtpB split is suggestiveof a different but equally interesting process The regionsurrounding the split is conserved in plastid mitochondrialand bacterial homologs but interrupted by a 24 amino acid(aa) insertion in Vitrella C veliarsquos closest photosyntheticrelative (fig 2C) A detailed sequence alignment of thisregion shows that the split in C velia AtpB occurred withinthis shared insertion suggesting that the ancestral acquisitionof this insertion might have provided an evolutionary oppor-tunity for the split A 31 aa long insertion at the same site is

also present in the AtpB of the related dinoflagellateAmphidinium (data not shown) Although unrelated insequence the notorious divergence of dinoflagellate plastidproteins and their greater distance to C velia and Vitrellacould easily account for the insertion divergence pointingto a possibly even deeper origin

Many Other Plastid Open Reading Frames areExtraordinary and Transcribed

PsaA and AtpB splits stand out as unprecedented cases butrepresent only a fraction of peculiarities in C velia plastidopen reading frames (ORFs) The genome contains manyunusually modified ORFs including several genes that areexpected to be present in plastid genomes but which areunusually large due to the presence of long extensions thatbear no recognizable similarity to known genes In three genes(rps7 8 and 17) the ORF region has been extended by 126 to273 amino acids toward the C-terminus effectively triplingthem in size (supplementary fig S2 SupplementaryMaterial online) In four other genes (rps3 4 8 and 11) theregion sharing homology with other plastid proteins is foundat the end of an unusually large ORF (supplementary fig S3Supplementary Material online) The most obvious exampleof this is Rps4 which is encoded within a 1998 aa long ORFcompared with 201 aa in the red alga Porphyra purpurea Inthese cases start codons may be located near the beginning ofthe homologous region but it is still puzzling why the homol-ogous region would be preceded by such a long stretch ofsequence uninterrupted by stop codons Canonical Shine-Dalgarno sequences are absent (the predicted 30-end of the16S rRNA is highly divergent in C velia) and therefore cannotsupport start codon prediction ORFs of two additional genesrpoC1 and rpoC2 contain long insertions (507 and 971 aa)in their variable regions (none of them appears to representan intein supplementary fig S4 Supplementary Materialonline) Finally a number of smaller changes are found inORFs throughout the C velia plastid genome Accountingfor extensions and indels longer than 20 aa five other ORFshave a truncated N-terminus (atpA ccsA petA ycf3 andpsaB) one ORF has a significantly extended N-terminus(rpl4) two ORFs have a truncated C-terminus (ccsA andrpoA) five ORFs have an extended C-terminus (atpA atpBatpH2 petA and rps2) and three ORFs contain interveninginsertions (rpl6 rpoB and psaB)

Altogether ORFs of various functionally unrelated proteinscontain unusual modifications Many of the proteins areindispensable and exclusively plastid-encoded in algae andplants suggesting that they are unlikely to be in the processof degeneration loss or relocation to the nucleus in C veliaAccordingly none of these proteins can be identified in theavailable nuclear transcriptomes from C velia (Woehle et al2011) To provide first insights into expression of the unusualproteins we analyzed the plastid transcriptome using totaland polyA RNA fraction sequence that we generated (polyURNA sequence was not available for this analysis) Transcriptmapping from the total RNA fraction (see Materials andMethods) revealed that all unusually long ORFs are coveredby transcriptome reads roughly equally in their entirety

FIG 3 The 2D electrophoresis of membrane protein complexes ofChromera velia and identification of individual spots Membrane pro-teins were solubilized by dodecyl-b-maltoside and separated in the firstdimension by Clear-Native electrophoresis (Clear-Native-PAGE) Afterthe separation the gel was scanned by LAS 4000 imager (Fuji Japan) inchlorophyll fluorescence mode (Chl fl) after excitation by blue LED lightto distinguish the PSII and light-harvesting complexes The proteincomplexes resolved in the first dimension were further separatedin the second dimension by denaturing gel (SDS-PAGE) and stainedby Coomassie blue Protein spots were cut from the gel and analyzedby LC-MSMS as described in Materials and Methods Identified spotsare highlighted and positions of protein complexes separated byClear-Native electrophoresis are marked as follows PSII PhotosystemII SC-PSII Supercomplexes of photosystem II with antenna SC-PSISupercomplexes of photosystem I with antenna

2451

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

(supplementary figs S2 and S3 Supplementary Materialonline) Although this approach cannot accurately pinpointthe position of start codons (many rps genes are expressedrelatively little and intergenic regions are often transcribed)the consistent coverage by transcripts suggests that the ORFsmay really have expanded in size at the protein level Much ofthe long insertions in rpo genes is also equally represented inthe transcriptome data however in this case a drop in cov-erage is found within each insertion creating a 200 bp gap(supplementary fig S4 Supplementary Material online) Thisraises the possibility that RpoC1 and RpoC2 may be splitmuch like PsaA and AtpB however both are in frame andlow coverage so more evidence is required to distinguishbetween the two alternatives

Unusual changes to gene presence have also been notedtwo genes are present in two (atpH) and three (clpC) full-length paralogs despite that a single type is found in otherplastid genomes All three clpC paralogs are highly divergentin sequence from each other and clpCs in other plastids Theyoverlap only partially and the largest and most complete ofthe paralogs (clpC3) is an order of magnitude less expressedthan the other two (supplementary fig S5A SupplementaryMaterial online) Likewise the two atpH paralogs are differentin sequence (93 nucleotide similarity) length and expres-sion AtpH1 is both highly expressed and conserved butthe expression of the duplicated atpH2 located on the TIRsis nearly 100-fold lower (supplementary fig S5BSupplementary Material online) The atpH2 gene contains

an unusual C-terminal extension which doubles it in size Acloser look at the surrounding sequence reveals that atpH2originated by a duplication event of the atpH1 region (includ-ing trnK discussed later) This together with its low expres-sion suggests that atpH2 is most likely a non-functionalpseudogene More generally however this example illustrateshow duplication can drive gene paralogy ORF expansionand altered gene order all features observed throughoutthe C velia plastid genome

Analysis of the total RNA and polyA fraction (fig 4) tran-scription profiles revealed more unexpected featuresTranscripts for all plastid genes were present in the totalRNA fraction and consistently also found in the polyA frac-tion at two to three magnitudes lower abundance probablydue to a carryover of non-polyA transcripts (fig 4) The psbAgene stood out in two ways however First it was highlyexpressed being represented at almost an order of magnitudehigher levels than other protein-coding genes (and similarlevels as the rRNA operon fig 4 and table 1) Second therepresentation of psbA in the polyA fraction was about equalto the total RNA fraction which would be the expected resultfor a polyadenylated transcript (fig 4) Only two additionalregions corresponding to rps19 and a psbB fragment (partof the annotated orf157) were relatively overrepresented inthe polyA fraction but only rps19 contained A-rich genomicsequence at the 30-end that could explain the bias Whetherthe psbA gene and psbB fragment are uniquely polyadeny-lated or a carryover into the polyA fraction occurred due to

10000 20000 30000 40000

105

104

106

103

102

10

1

atpB-2 psaA-1 psbB psaA-2 psbC psbDpsaB

position (bp)

Cov

erag

e pe

r bp

(Lo

g)

atpH2

Photosystem I + II ATPase Uknown ORFExpression Pseudogene fragmentOther functiontotal RNA polyA RNA

psbE

50000 60000 70000 80000

105

104

106

103

102

10

1

23S rRNA atpB-1 atpH1 atpI

Cov

erag

e pe

r bp

(Lo

g)

psbV K psbJ

90000 100000 110000 120000

105

104

106

103

102

10

1

psbAatpA

Cov

erag

e pe

r bp

(Lo

g)

atpH2psbHpsbT psaC petGB

petA

Cytochrome b6f

rps19

tufA

petD

16S rRNA

psbA psbB

FIG 4 Plastid total RNA and polyA RNA transcriptomic profiles Coverage by total RNA (red) and polyA RNA (blue) is shown for the whole plastidgenome (Log scale) Gene families are color-coded according to the key The highly expressed psbA gene is highlighted in bold The split psaA and atpBgenes are in red The boundaries of the TIRs are separated by dashed vertical lines

2452

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

another reason is currently not clear and awaits successfulcloning of their 30 transcript ends The transcriptomes alsoenabled us to analyze expression pattern among neigboringgenes which have been highly rearranged and strand-polar-ized in C velia compared with other plastids Canonicaloperons (atpIHGFDA psaAB petBD ribosomal proteinsupercluster ribosomal RNA operon eg Westhoff et al1983) and translationally coupled units (atpBE Gatenbyet al 1989) were also affected by rearrangements and eitherfragmented or internally reshuffled Canonical promoterscould not be identified These characteristics indicate thattranscription of many core plastid genes has been remodeledNorthern hybridization patterns in the psaA-1 and psaA-2region already revealed that this is the case in several keyproteins of the photosystems which are expressed fromnovel multicistronic mRNAs through complex processing(fig 1) Whole-genome transcriptional profiles (fig 4) confirmthis and suggest that polycistronic transcription is widespreadin the genome (compare transcript coverage at intergenicspacers and the region downstream of the 16S rRNA genefig 4) Although the primary transcript regions and processingpatterns cannot be predicted accurately the data show thatnovel polycistronic units most likely exist and often includefunctionally similar genes (compare distribution of genes forphotosynthesis and protein expression in fig 4)

Finally we estimated and compared relative expression ofall plastid genes separately and in their functional groupsbased on the total RNA transcriptome (table 1 and supple-mentary table S3 Supplementary Material online) Asexpected the rRNA genes were most highly expressedfollowed by genes of the four membrane complexes (PSIPSII cytochrome b6f and ATP synthase) and these alsocomprised all of the 22 most expressed protein-codinggenes with the single exception of tufA The least expressedwere genes related to transcription and translation RNApolymerase tRNA genes and ribosomal proteins (table 1)However unknown ORFs and most intergenic regions werealso expressed including relatively highly expressed orf389orf201 and intergenic regions downstream of psbJ andatpB-1 (fig 4) Additionally we used the transcriptome toaddress the presence of RNA editing which is known in theplastids of the related dinoflagellates (eg Dang and Green2009) but has not been found in those of apicomplexans andVitrella The consensus of transcriptomic reads at each sitewas identical to the C velia plastid genome sequence suggest-ing the absence of editing Only four sites showed significantpolymorphism (gt10) at RNA level (sites 44346 [515]46071 [2366] 46074 [3935] and 46574 [2379]) all ofwhich were located within the rRNA operon and identical(998ndash100) at the DNA level Based on their location weconclude that this is more likely due to reverse transcriptionerrors of modified rRNA nucleosides and not RNA editing

A Linear-Mapping Plastid Genome in C velia

One of the most significant features of the previouslypublished plastid genome from C velia is that it does notmap as a circle as do all other plastid genomes completed todate The gap in the sequence falls between the two copies of

the TIR containing psbA (Janouskovec et al 2010) Weattempted to close this gap using multiple approaches Westarted by massively increasing the depth of sequence cover-age The genome was originally sequenced by 454 pyrose-quencing to a 13-fold depth of coverage so we used anindependent approach and assembled the genome frompaired-end Illumina sequence reads to an average depth ofalmost 600-fold This approach extended the two TIRs by300 bp on each side which was verified by PCR (supplemen-tary fig S6 Supplementary Material online) The very end ofeach TIR was followed by a 15 bp sequence further followedby an extension into the complementary TIR at a very lowdepth of coverage (fig 5) These extensions created an overlapbetween the ends however several lines of evidencesuggested that this is not due to a circular chromosomebut rather a linear topology with structured ends Firstmapping the depth of coverage revealed a pattern expectedfor a linear molecule The depth of coverage was generallyconsistent across the large single copy (LSC) region and mostof the TIRs but reduced linearly beginning at about 600 bpfrom the end followed by an exponential reduction andpartial recovery between 300 to 50 bp from the end followedby a terminal drop (fig 5A) The size and shape of theexponential reductionrecovery would be predicted by themean DNA fragment size in the sequencing library (327 bp)and linear topology as a consequence of fragmentation biasnear the chromosome end Therefore both the overalldecrease in coverage depth and the exponential decreaserecovery close to the ends support a linear chromosomeconformation (if it were circular the whole region shouldnot differ from the rest of the genome) Second if thegenome was circular-mapping and the gap was hard tosequence we would expect some paired ends to span thegap However when more than 650000 paired ends aremapped to the genome not a single paired end spanningthe gap was identified (see Materials and Methods) Insteadboth ends of the chromosome consisted exclusively ofreverse-mapping reads (fig 5A and B) Third PCR experi-ments using six primers (all shown to successfully amplifyproducts from one side of the gap) and all their nested com-binations failed to fill the gap between the two TIRs (supple-mentary fig S6 Supplementary Material online) Fourth thesize of the plastid genome estimated directly by pulsed-fieldgel electrophoresis (PFGE) and by psbA probe hybridizationwas consistent with abundant presence of linear monomers120 kb in size (supplementary fig S7 Supplementary Materialonline and Janouskovec et al 2010) Subgenomic-sizedmaterial and unresolved DNA in the well were also abun-dantly represented whereas putative linear dimers wererare and linear concatemers were not detected The PFGEexperiment was reproduced by using probes to three plastidgenes encoded in the LSC region tufA petA and atpA (seeMaterials and Methods) and consistently led to the sameresult suggesting that linear monomers are an importantconstituent of plastid DNA in C velia Subgenomic-sized mol-ecules could represent plastid DNA fragmented during PFGEpreparation (note that intact C velia cells were digestedin agarose plugs however) nucleus-encoded plastid DNA

2453

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

fragments plastid DNA in the process of replication or otherforms of genuine plastid DNA (Ellis and Day 1986 Oldenburgand Bendich 2004 Scharff and Koop 2006) Incompletely di-gested cells of C velia in the agarose plugs (see Materials andMethods) could be responsible for much of the signal in thewell although high-molecular-weight branched DNA formscould also be present and cannot be distinguished using thecurrent approach (Bendich 1991 Bendich 2004) Altogetherbecause the plastid genome was not closed by deep sequenc-ing using either 454 or Illumina no paired-end linkage couldbe established no linkage could be established by PCR and

because the size of the plastid genome estimated by PFGE andSouthern blot hybridizations was consistent with the exis-tence of linear monomers we conclude that the genome isprincipally linear in structure and circular molecules andlinear concatemers are rare or absent

If the genome is indeed linear it will be important to knowhow the C velia plastid DNA replication and copy numbercompares to the circularconcatemeric plastid DNA inapicomplexans (Williamson et al 2001 Williamson et al2002) and other organisms To provide a first glimpse intothis we identified a putative replication origin of the C velia

A

B

FIG 5 Schematic of plastid chromosome ends (A) Coverage depth (y axis) by forward-mapping (yellow) reverse-mapping (purple) and total (red)genomic reads is projected onto the TIR Position of genes is shown by gray boxes at the bottom (K = trnK P = trnP) Total coverage depth startsdropping gradually at about 800 bp from the chromosome end and goes through a U-shaped minimum at about 150 bp from the end The calculatedmean size of the end-sequenced DNA fragments (horizontal black bar red bits correspond to sequenced end pairs) and coverage by forwardreverseread pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end A steep drop in totalcoverage depth occurs at the very end of the chromosome (B) Here the TIRs on each of the chromosome ends (green and blue respectively) divergeinto a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 30 and 50) The total depth of coverage byDNA reads (y axis) steeply decreases in the 15 bp region The 15 bp region is also enriched in nucleotide discrepancies (sequence logos in bold) The last130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired-end spanningthis region can be identified suggesting that the genome cannot be genuinely circularized

2454

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 5: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

When compared with PsaA the AtpB split is suggestiveof a different but equally interesting process The regionsurrounding the split is conserved in plastid mitochondrialand bacterial homologs but interrupted by a 24 amino acid(aa) insertion in Vitrella C veliarsquos closest photosyntheticrelative (fig 2C) A detailed sequence alignment of thisregion shows that the split in C velia AtpB occurred withinthis shared insertion suggesting that the ancestral acquisitionof this insertion might have provided an evolutionary oppor-tunity for the split A 31 aa long insertion at the same site is

also present in the AtpB of the related dinoflagellateAmphidinium (data not shown) Although unrelated insequence the notorious divergence of dinoflagellate plastidproteins and their greater distance to C velia and Vitrellacould easily account for the insertion divergence pointingto a possibly even deeper origin

Many Other Plastid Open Reading Frames areExtraordinary and Transcribed

PsaA and AtpB splits stand out as unprecedented cases butrepresent only a fraction of peculiarities in C velia plastidopen reading frames (ORFs) The genome contains manyunusually modified ORFs including several genes that areexpected to be present in plastid genomes but which areunusually large due to the presence of long extensions thatbear no recognizable similarity to known genes In three genes(rps7 8 and 17) the ORF region has been extended by 126 to273 amino acids toward the C-terminus effectively triplingthem in size (supplementary fig S2 SupplementaryMaterial online) In four other genes (rps3 4 8 and 11) theregion sharing homology with other plastid proteins is foundat the end of an unusually large ORF (supplementary fig S3Supplementary Material online) The most obvious exampleof this is Rps4 which is encoded within a 1998 aa long ORFcompared with 201 aa in the red alga Porphyra purpurea Inthese cases start codons may be located near the beginning ofthe homologous region but it is still puzzling why the homol-ogous region would be preceded by such a long stretch ofsequence uninterrupted by stop codons Canonical Shine-Dalgarno sequences are absent (the predicted 30-end of the16S rRNA is highly divergent in C velia) and therefore cannotsupport start codon prediction ORFs of two additional genesrpoC1 and rpoC2 contain long insertions (507 and 971 aa)in their variable regions (none of them appears to representan intein supplementary fig S4 Supplementary Materialonline) Finally a number of smaller changes are found inORFs throughout the C velia plastid genome Accountingfor extensions and indels longer than 20 aa five other ORFshave a truncated N-terminus (atpA ccsA petA ycf3 andpsaB) one ORF has a significantly extended N-terminus(rpl4) two ORFs have a truncated C-terminus (ccsA andrpoA) five ORFs have an extended C-terminus (atpA atpBatpH2 petA and rps2) and three ORFs contain interveninginsertions (rpl6 rpoB and psaB)

Altogether ORFs of various functionally unrelated proteinscontain unusual modifications Many of the proteins areindispensable and exclusively plastid-encoded in algae andplants suggesting that they are unlikely to be in the processof degeneration loss or relocation to the nucleus in C veliaAccordingly none of these proteins can be identified in theavailable nuclear transcriptomes from C velia (Woehle et al2011) To provide first insights into expression of the unusualproteins we analyzed the plastid transcriptome using totaland polyA RNA fraction sequence that we generated (polyURNA sequence was not available for this analysis) Transcriptmapping from the total RNA fraction (see Materials andMethods) revealed that all unusually long ORFs are coveredby transcriptome reads roughly equally in their entirety

FIG 3 The 2D electrophoresis of membrane protein complexes ofChromera velia and identification of individual spots Membrane pro-teins were solubilized by dodecyl-b-maltoside and separated in the firstdimension by Clear-Native electrophoresis (Clear-Native-PAGE) Afterthe separation the gel was scanned by LAS 4000 imager (Fuji Japan) inchlorophyll fluorescence mode (Chl fl) after excitation by blue LED lightto distinguish the PSII and light-harvesting complexes The proteincomplexes resolved in the first dimension were further separatedin the second dimension by denaturing gel (SDS-PAGE) and stainedby Coomassie blue Protein spots were cut from the gel and analyzedby LC-MSMS as described in Materials and Methods Identified spotsare highlighted and positions of protein complexes separated byClear-Native electrophoresis are marked as follows PSII PhotosystemII SC-PSII Supercomplexes of photosystem II with antenna SC-PSISupercomplexes of photosystem I with antenna

2451

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

(supplementary figs S2 and S3 Supplementary Materialonline) Although this approach cannot accurately pinpointthe position of start codons (many rps genes are expressedrelatively little and intergenic regions are often transcribed)the consistent coverage by transcripts suggests that the ORFsmay really have expanded in size at the protein level Much ofthe long insertions in rpo genes is also equally represented inthe transcriptome data however in this case a drop in cov-erage is found within each insertion creating a 200 bp gap(supplementary fig S4 Supplementary Material online) Thisraises the possibility that RpoC1 and RpoC2 may be splitmuch like PsaA and AtpB however both are in frame andlow coverage so more evidence is required to distinguishbetween the two alternatives

Unusual changes to gene presence have also been notedtwo genes are present in two (atpH) and three (clpC) full-length paralogs despite that a single type is found in otherplastid genomes All three clpC paralogs are highly divergentin sequence from each other and clpCs in other plastids Theyoverlap only partially and the largest and most complete ofthe paralogs (clpC3) is an order of magnitude less expressedthan the other two (supplementary fig S5A SupplementaryMaterial online) Likewise the two atpH paralogs are differentin sequence (93 nucleotide similarity) length and expres-sion AtpH1 is both highly expressed and conserved butthe expression of the duplicated atpH2 located on the TIRsis nearly 100-fold lower (supplementary fig S5BSupplementary Material online) The atpH2 gene contains

an unusual C-terminal extension which doubles it in size Acloser look at the surrounding sequence reveals that atpH2originated by a duplication event of the atpH1 region (includ-ing trnK discussed later) This together with its low expres-sion suggests that atpH2 is most likely a non-functionalpseudogene More generally however this example illustrateshow duplication can drive gene paralogy ORF expansionand altered gene order all features observed throughoutthe C velia plastid genome

Analysis of the total RNA and polyA fraction (fig 4) tran-scription profiles revealed more unexpected featuresTranscripts for all plastid genes were present in the totalRNA fraction and consistently also found in the polyA frac-tion at two to three magnitudes lower abundance probablydue to a carryover of non-polyA transcripts (fig 4) The psbAgene stood out in two ways however First it was highlyexpressed being represented at almost an order of magnitudehigher levels than other protein-coding genes (and similarlevels as the rRNA operon fig 4 and table 1) Second therepresentation of psbA in the polyA fraction was about equalto the total RNA fraction which would be the expected resultfor a polyadenylated transcript (fig 4) Only two additionalregions corresponding to rps19 and a psbB fragment (partof the annotated orf157) were relatively overrepresented inthe polyA fraction but only rps19 contained A-rich genomicsequence at the 30-end that could explain the bias Whetherthe psbA gene and psbB fragment are uniquely polyadeny-lated or a carryover into the polyA fraction occurred due to

10000 20000 30000 40000

105

104

106

103

102

10

1

atpB-2 psaA-1 psbB psaA-2 psbC psbDpsaB

position (bp)

Cov

erag

e pe

r bp

(Lo

g)

atpH2

Photosystem I + II ATPase Uknown ORFExpression Pseudogene fragmentOther functiontotal RNA polyA RNA

psbE

50000 60000 70000 80000

105

104

106

103

102

10

1

23S rRNA atpB-1 atpH1 atpI

Cov

erag

e pe

r bp

(Lo

g)

psbV K psbJ

90000 100000 110000 120000

105

104

106

103

102

10

1

psbAatpA

Cov

erag

e pe

r bp

(Lo

g)

atpH2psbHpsbT psaC petGB

petA

Cytochrome b6f

rps19

tufA

petD

16S rRNA

psbA psbB

FIG 4 Plastid total RNA and polyA RNA transcriptomic profiles Coverage by total RNA (red) and polyA RNA (blue) is shown for the whole plastidgenome (Log scale) Gene families are color-coded according to the key The highly expressed psbA gene is highlighted in bold The split psaA and atpBgenes are in red The boundaries of the TIRs are separated by dashed vertical lines

2452

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

another reason is currently not clear and awaits successfulcloning of their 30 transcript ends The transcriptomes alsoenabled us to analyze expression pattern among neigboringgenes which have been highly rearranged and strand-polar-ized in C velia compared with other plastids Canonicaloperons (atpIHGFDA psaAB petBD ribosomal proteinsupercluster ribosomal RNA operon eg Westhoff et al1983) and translationally coupled units (atpBE Gatenbyet al 1989) were also affected by rearrangements and eitherfragmented or internally reshuffled Canonical promoterscould not be identified These characteristics indicate thattranscription of many core plastid genes has been remodeledNorthern hybridization patterns in the psaA-1 and psaA-2region already revealed that this is the case in several keyproteins of the photosystems which are expressed fromnovel multicistronic mRNAs through complex processing(fig 1) Whole-genome transcriptional profiles (fig 4) confirmthis and suggest that polycistronic transcription is widespreadin the genome (compare transcript coverage at intergenicspacers and the region downstream of the 16S rRNA genefig 4) Although the primary transcript regions and processingpatterns cannot be predicted accurately the data show thatnovel polycistronic units most likely exist and often includefunctionally similar genes (compare distribution of genes forphotosynthesis and protein expression in fig 4)

Finally we estimated and compared relative expression ofall plastid genes separately and in their functional groupsbased on the total RNA transcriptome (table 1 and supple-mentary table S3 Supplementary Material online) Asexpected the rRNA genes were most highly expressedfollowed by genes of the four membrane complexes (PSIPSII cytochrome b6f and ATP synthase) and these alsocomprised all of the 22 most expressed protein-codinggenes with the single exception of tufA The least expressedwere genes related to transcription and translation RNApolymerase tRNA genes and ribosomal proteins (table 1)However unknown ORFs and most intergenic regions werealso expressed including relatively highly expressed orf389orf201 and intergenic regions downstream of psbJ andatpB-1 (fig 4) Additionally we used the transcriptome toaddress the presence of RNA editing which is known in theplastids of the related dinoflagellates (eg Dang and Green2009) but has not been found in those of apicomplexans andVitrella The consensus of transcriptomic reads at each sitewas identical to the C velia plastid genome sequence suggest-ing the absence of editing Only four sites showed significantpolymorphism (gt10) at RNA level (sites 44346 [515]46071 [2366] 46074 [3935] and 46574 [2379]) all ofwhich were located within the rRNA operon and identical(998ndash100) at the DNA level Based on their location weconclude that this is more likely due to reverse transcriptionerrors of modified rRNA nucleosides and not RNA editing

A Linear-Mapping Plastid Genome in C velia

One of the most significant features of the previouslypublished plastid genome from C velia is that it does notmap as a circle as do all other plastid genomes completed todate The gap in the sequence falls between the two copies of

the TIR containing psbA (Janouskovec et al 2010) Weattempted to close this gap using multiple approaches Westarted by massively increasing the depth of sequence cover-age The genome was originally sequenced by 454 pyrose-quencing to a 13-fold depth of coverage so we used anindependent approach and assembled the genome frompaired-end Illumina sequence reads to an average depth ofalmost 600-fold This approach extended the two TIRs by300 bp on each side which was verified by PCR (supplemen-tary fig S6 Supplementary Material online) The very end ofeach TIR was followed by a 15 bp sequence further followedby an extension into the complementary TIR at a very lowdepth of coverage (fig 5) These extensions created an overlapbetween the ends however several lines of evidencesuggested that this is not due to a circular chromosomebut rather a linear topology with structured ends Firstmapping the depth of coverage revealed a pattern expectedfor a linear molecule The depth of coverage was generallyconsistent across the large single copy (LSC) region and mostof the TIRs but reduced linearly beginning at about 600 bpfrom the end followed by an exponential reduction andpartial recovery between 300 to 50 bp from the end followedby a terminal drop (fig 5A) The size and shape of theexponential reductionrecovery would be predicted by themean DNA fragment size in the sequencing library (327 bp)and linear topology as a consequence of fragmentation biasnear the chromosome end Therefore both the overalldecrease in coverage depth and the exponential decreaserecovery close to the ends support a linear chromosomeconformation (if it were circular the whole region shouldnot differ from the rest of the genome) Second if thegenome was circular-mapping and the gap was hard tosequence we would expect some paired ends to span thegap However when more than 650000 paired ends aremapped to the genome not a single paired end spanningthe gap was identified (see Materials and Methods) Insteadboth ends of the chromosome consisted exclusively ofreverse-mapping reads (fig 5A and B) Third PCR experi-ments using six primers (all shown to successfully amplifyproducts from one side of the gap) and all their nested com-binations failed to fill the gap between the two TIRs (supple-mentary fig S6 Supplementary Material online) Fourth thesize of the plastid genome estimated directly by pulsed-fieldgel electrophoresis (PFGE) and by psbA probe hybridizationwas consistent with abundant presence of linear monomers120 kb in size (supplementary fig S7 Supplementary Materialonline and Janouskovec et al 2010) Subgenomic-sizedmaterial and unresolved DNA in the well were also abun-dantly represented whereas putative linear dimers wererare and linear concatemers were not detected The PFGEexperiment was reproduced by using probes to three plastidgenes encoded in the LSC region tufA petA and atpA (seeMaterials and Methods) and consistently led to the sameresult suggesting that linear monomers are an importantconstituent of plastid DNA in C velia Subgenomic-sized mol-ecules could represent plastid DNA fragmented during PFGEpreparation (note that intact C velia cells were digestedin agarose plugs however) nucleus-encoded plastid DNA

2453

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

fragments plastid DNA in the process of replication or otherforms of genuine plastid DNA (Ellis and Day 1986 Oldenburgand Bendich 2004 Scharff and Koop 2006) Incompletely di-gested cells of C velia in the agarose plugs (see Materials andMethods) could be responsible for much of the signal in thewell although high-molecular-weight branched DNA formscould also be present and cannot be distinguished using thecurrent approach (Bendich 1991 Bendich 2004) Altogetherbecause the plastid genome was not closed by deep sequenc-ing using either 454 or Illumina no paired-end linkage couldbe established no linkage could be established by PCR and

because the size of the plastid genome estimated by PFGE andSouthern blot hybridizations was consistent with the exis-tence of linear monomers we conclude that the genome isprincipally linear in structure and circular molecules andlinear concatemers are rare or absent

If the genome is indeed linear it will be important to knowhow the C velia plastid DNA replication and copy numbercompares to the circularconcatemeric plastid DNA inapicomplexans (Williamson et al 2001 Williamson et al2002) and other organisms To provide a first glimpse intothis we identified a putative replication origin of the C velia

A

B

FIG 5 Schematic of plastid chromosome ends (A) Coverage depth (y axis) by forward-mapping (yellow) reverse-mapping (purple) and total (red)genomic reads is projected onto the TIR Position of genes is shown by gray boxes at the bottom (K = trnK P = trnP) Total coverage depth startsdropping gradually at about 800 bp from the chromosome end and goes through a U-shaped minimum at about 150 bp from the end The calculatedmean size of the end-sequenced DNA fragments (horizontal black bar red bits correspond to sequenced end pairs) and coverage by forwardreverseread pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end A steep drop in totalcoverage depth occurs at the very end of the chromosome (B) Here the TIRs on each of the chromosome ends (green and blue respectively) divergeinto a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 30 and 50) The total depth of coverage byDNA reads (y axis) steeply decreases in the 15 bp region The 15 bp region is also enriched in nucleotide discrepancies (sequence logos in bold) The last130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired-end spanningthis region can be identified suggesting that the genome cannot be genuinely circularized

2454

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 6: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

(supplementary figs S2 and S3 Supplementary Materialonline) Although this approach cannot accurately pinpointthe position of start codons (many rps genes are expressedrelatively little and intergenic regions are often transcribed)the consistent coverage by transcripts suggests that the ORFsmay really have expanded in size at the protein level Much ofthe long insertions in rpo genes is also equally represented inthe transcriptome data however in this case a drop in cov-erage is found within each insertion creating a 200 bp gap(supplementary fig S4 Supplementary Material online) Thisraises the possibility that RpoC1 and RpoC2 may be splitmuch like PsaA and AtpB however both are in frame andlow coverage so more evidence is required to distinguishbetween the two alternatives

Unusual changes to gene presence have also been notedtwo genes are present in two (atpH) and three (clpC) full-length paralogs despite that a single type is found in otherplastid genomes All three clpC paralogs are highly divergentin sequence from each other and clpCs in other plastids Theyoverlap only partially and the largest and most complete ofthe paralogs (clpC3) is an order of magnitude less expressedthan the other two (supplementary fig S5A SupplementaryMaterial online) Likewise the two atpH paralogs are differentin sequence (93 nucleotide similarity) length and expres-sion AtpH1 is both highly expressed and conserved butthe expression of the duplicated atpH2 located on the TIRsis nearly 100-fold lower (supplementary fig S5BSupplementary Material online) The atpH2 gene contains

an unusual C-terminal extension which doubles it in size Acloser look at the surrounding sequence reveals that atpH2originated by a duplication event of the atpH1 region (includ-ing trnK discussed later) This together with its low expres-sion suggests that atpH2 is most likely a non-functionalpseudogene More generally however this example illustrateshow duplication can drive gene paralogy ORF expansionand altered gene order all features observed throughoutthe C velia plastid genome

Analysis of the total RNA and polyA fraction (fig 4) tran-scription profiles revealed more unexpected featuresTranscripts for all plastid genes were present in the totalRNA fraction and consistently also found in the polyA frac-tion at two to three magnitudes lower abundance probablydue to a carryover of non-polyA transcripts (fig 4) The psbAgene stood out in two ways however First it was highlyexpressed being represented at almost an order of magnitudehigher levels than other protein-coding genes (and similarlevels as the rRNA operon fig 4 and table 1) Second therepresentation of psbA in the polyA fraction was about equalto the total RNA fraction which would be the expected resultfor a polyadenylated transcript (fig 4) Only two additionalregions corresponding to rps19 and a psbB fragment (partof the annotated orf157) were relatively overrepresented inthe polyA fraction but only rps19 contained A-rich genomicsequence at the 30-end that could explain the bias Whetherthe psbA gene and psbB fragment are uniquely polyadeny-lated or a carryover into the polyA fraction occurred due to

10000 20000 30000 40000

105

104

106

103

102

10

1

atpB-2 psaA-1 psbB psaA-2 psbC psbDpsaB

position (bp)

Cov

erag

e pe

r bp

(Lo

g)

atpH2

Photosystem I + II ATPase Uknown ORFExpression Pseudogene fragmentOther functiontotal RNA polyA RNA

psbE

50000 60000 70000 80000

105

104

106

103

102

10

1

23S rRNA atpB-1 atpH1 atpI

Cov

erag

e pe

r bp

(Lo

g)

psbV K psbJ

90000 100000 110000 120000

105

104

106

103

102

10

1

psbAatpA

Cov

erag

e pe

r bp

(Lo

g)

atpH2psbHpsbT psaC petGB

petA

Cytochrome b6f

rps19

tufA

petD

16S rRNA

psbA psbB

FIG 4 Plastid total RNA and polyA RNA transcriptomic profiles Coverage by total RNA (red) and polyA RNA (blue) is shown for the whole plastidgenome (Log scale) Gene families are color-coded according to the key The highly expressed psbA gene is highlighted in bold The split psaA and atpBgenes are in red The boundaries of the TIRs are separated by dashed vertical lines

2452

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

another reason is currently not clear and awaits successfulcloning of their 30 transcript ends The transcriptomes alsoenabled us to analyze expression pattern among neigboringgenes which have been highly rearranged and strand-polar-ized in C velia compared with other plastids Canonicaloperons (atpIHGFDA psaAB petBD ribosomal proteinsupercluster ribosomal RNA operon eg Westhoff et al1983) and translationally coupled units (atpBE Gatenbyet al 1989) were also affected by rearrangements and eitherfragmented or internally reshuffled Canonical promoterscould not be identified These characteristics indicate thattranscription of many core plastid genes has been remodeledNorthern hybridization patterns in the psaA-1 and psaA-2region already revealed that this is the case in several keyproteins of the photosystems which are expressed fromnovel multicistronic mRNAs through complex processing(fig 1) Whole-genome transcriptional profiles (fig 4) confirmthis and suggest that polycistronic transcription is widespreadin the genome (compare transcript coverage at intergenicspacers and the region downstream of the 16S rRNA genefig 4) Although the primary transcript regions and processingpatterns cannot be predicted accurately the data show thatnovel polycistronic units most likely exist and often includefunctionally similar genes (compare distribution of genes forphotosynthesis and protein expression in fig 4)

Finally we estimated and compared relative expression ofall plastid genes separately and in their functional groupsbased on the total RNA transcriptome (table 1 and supple-mentary table S3 Supplementary Material online) Asexpected the rRNA genes were most highly expressedfollowed by genes of the four membrane complexes (PSIPSII cytochrome b6f and ATP synthase) and these alsocomprised all of the 22 most expressed protein-codinggenes with the single exception of tufA The least expressedwere genes related to transcription and translation RNApolymerase tRNA genes and ribosomal proteins (table 1)However unknown ORFs and most intergenic regions werealso expressed including relatively highly expressed orf389orf201 and intergenic regions downstream of psbJ andatpB-1 (fig 4) Additionally we used the transcriptome toaddress the presence of RNA editing which is known in theplastids of the related dinoflagellates (eg Dang and Green2009) but has not been found in those of apicomplexans andVitrella The consensus of transcriptomic reads at each sitewas identical to the C velia plastid genome sequence suggest-ing the absence of editing Only four sites showed significantpolymorphism (gt10) at RNA level (sites 44346 [515]46071 [2366] 46074 [3935] and 46574 [2379]) all ofwhich were located within the rRNA operon and identical(998ndash100) at the DNA level Based on their location weconclude that this is more likely due to reverse transcriptionerrors of modified rRNA nucleosides and not RNA editing

A Linear-Mapping Plastid Genome in C velia

One of the most significant features of the previouslypublished plastid genome from C velia is that it does notmap as a circle as do all other plastid genomes completed todate The gap in the sequence falls between the two copies of

the TIR containing psbA (Janouskovec et al 2010) Weattempted to close this gap using multiple approaches Westarted by massively increasing the depth of sequence cover-age The genome was originally sequenced by 454 pyrose-quencing to a 13-fold depth of coverage so we used anindependent approach and assembled the genome frompaired-end Illumina sequence reads to an average depth ofalmost 600-fold This approach extended the two TIRs by300 bp on each side which was verified by PCR (supplemen-tary fig S6 Supplementary Material online) The very end ofeach TIR was followed by a 15 bp sequence further followedby an extension into the complementary TIR at a very lowdepth of coverage (fig 5) These extensions created an overlapbetween the ends however several lines of evidencesuggested that this is not due to a circular chromosomebut rather a linear topology with structured ends Firstmapping the depth of coverage revealed a pattern expectedfor a linear molecule The depth of coverage was generallyconsistent across the large single copy (LSC) region and mostof the TIRs but reduced linearly beginning at about 600 bpfrom the end followed by an exponential reduction andpartial recovery between 300 to 50 bp from the end followedby a terminal drop (fig 5A) The size and shape of theexponential reductionrecovery would be predicted by themean DNA fragment size in the sequencing library (327 bp)and linear topology as a consequence of fragmentation biasnear the chromosome end Therefore both the overalldecrease in coverage depth and the exponential decreaserecovery close to the ends support a linear chromosomeconformation (if it were circular the whole region shouldnot differ from the rest of the genome) Second if thegenome was circular-mapping and the gap was hard tosequence we would expect some paired ends to span thegap However when more than 650000 paired ends aremapped to the genome not a single paired end spanningthe gap was identified (see Materials and Methods) Insteadboth ends of the chromosome consisted exclusively ofreverse-mapping reads (fig 5A and B) Third PCR experi-ments using six primers (all shown to successfully amplifyproducts from one side of the gap) and all their nested com-binations failed to fill the gap between the two TIRs (supple-mentary fig S6 Supplementary Material online) Fourth thesize of the plastid genome estimated directly by pulsed-fieldgel electrophoresis (PFGE) and by psbA probe hybridizationwas consistent with abundant presence of linear monomers120 kb in size (supplementary fig S7 Supplementary Materialonline and Janouskovec et al 2010) Subgenomic-sizedmaterial and unresolved DNA in the well were also abun-dantly represented whereas putative linear dimers wererare and linear concatemers were not detected The PFGEexperiment was reproduced by using probes to three plastidgenes encoded in the LSC region tufA petA and atpA (seeMaterials and Methods) and consistently led to the sameresult suggesting that linear monomers are an importantconstituent of plastid DNA in C velia Subgenomic-sized mol-ecules could represent plastid DNA fragmented during PFGEpreparation (note that intact C velia cells were digestedin agarose plugs however) nucleus-encoded plastid DNA

2453

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

fragments plastid DNA in the process of replication or otherforms of genuine plastid DNA (Ellis and Day 1986 Oldenburgand Bendich 2004 Scharff and Koop 2006) Incompletely di-gested cells of C velia in the agarose plugs (see Materials andMethods) could be responsible for much of the signal in thewell although high-molecular-weight branched DNA formscould also be present and cannot be distinguished using thecurrent approach (Bendich 1991 Bendich 2004) Altogetherbecause the plastid genome was not closed by deep sequenc-ing using either 454 or Illumina no paired-end linkage couldbe established no linkage could be established by PCR and

because the size of the plastid genome estimated by PFGE andSouthern blot hybridizations was consistent with the exis-tence of linear monomers we conclude that the genome isprincipally linear in structure and circular molecules andlinear concatemers are rare or absent

If the genome is indeed linear it will be important to knowhow the C velia plastid DNA replication and copy numbercompares to the circularconcatemeric plastid DNA inapicomplexans (Williamson et al 2001 Williamson et al2002) and other organisms To provide a first glimpse intothis we identified a putative replication origin of the C velia

A

B

FIG 5 Schematic of plastid chromosome ends (A) Coverage depth (y axis) by forward-mapping (yellow) reverse-mapping (purple) and total (red)genomic reads is projected onto the TIR Position of genes is shown by gray boxes at the bottom (K = trnK P = trnP) Total coverage depth startsdropping gradually at about 800 bp from the chromosome end and goes through a U-shaped minimum at about 150 bp from the end The calculatedmean size of the end-sequenced DNA fragments (horizontal black bar red bits correspond to sequenced end pairs) and coverage by forwardreverseread pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end A steep drop in totalcoverage depth occurs at the very end of the chromosome (B) Here the TIRs on each of the chromosome ends (green and blue respectively) divergeinto a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 30 and 50) The total depth of coverage byDNA reads (y axis) steeply decreases in the 15 bp region The 15 bp region is also enriched in nucleotide discrepancies (sequence logos in bold) The last130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired-end spanningthis region can be identified suggesting that the genome cannot be genuinely circularized

2454

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 7: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

another reason is currently not clear and awaits successfulcloning of their 30 transcript ends The transcriptomes alsoenabled us to analyze expression pattern among neigboringgenes which have been highly rearranged and strand-polar-ized in C velia compared with other plastids Canonicaloperons (atpIHGFDA psaAB petBD ribosomal proteinsupercluster ribosomal RNA operon eg Westhoff et al1983) and translationally coupled units (atpBE Gatenbyet al 1989) were also affected by rearrangements and eitherfragmented or internally reshuffled Canonical promoterscould not be identified These characteristics indicate thattranscription of many core plastid genes has been remodeledNorthern hybridization patterns in the psaA-1 and psaA-2region already revealed that this is the case in several keyproteins of the photosystems which are expressed fromnovel multicistronic mRNAs through complex processing(fig 1) Whole-genome transcriptional profiles (fig 4) confirmthis and suggest that polycistronic transcription is widespreadin the genome (compare transcript coverage at intergenicspacers and the region downstream of the 16S rRNA genefig 4) Although the primary transcript regions and processingpatterns cannot be predicted accurately the data show thatnovel polycistronic units most likely exist and often includefunctionally similar genes (compare distribution of genes forphotosynthesis and protein expression in fig 4)

Finally we estimated and compared relative expression ofall plastid genes separately and in their functional groupsbased on the total RNA transcriptome (table 1 and supple-mentary table S3 Supplementary Material online) Asexpected the rRNA genes were most highly expressedfollowed by genes of the four membrane complexes (PSIPSII cytochrome b6f and ATP synthase) and these alsocomprised all of the 22 most expressed protein-codinggenes with the single exception of tufA The least expressedwere genes related to transcription and translation RNApolymerase tRNA genes and ribosomal proteins (table 1)However unknown ORFs and most intergenic regions werealso expressed including relatively highly expressed orf389orf201 and intergenic regions downstream of psbJ andatpB-1 (fig 4) Additionally we used the transcriptome toaddress the presence of RNA editing which is known in theplastids of the related dinoflagellates (eg Dang and Green2009) but has not been found in those of apicomplexans andVitrella The consensus of transcriptomic reads at each sitewas identical to the C velia plastid genome sequence suggest-ing the absence of editing Only four sites showed significantpolymorphism (gt10) at RNA level (sites 44346 [515]46071 [2366] 46074 [3935] and 46574 [2379]) all ofwhich were located within the rRNA operon and identical(998ndash100) at the DNA level Based on their location weconclude that this is more likely due to reverse transcriptionerrors of modified rRNA nucleosides and not RNA editing

A Linear-Mapping Plastid Genome in C velia

One of the most significant features of the previouslypublished plastid genome from C velia is that it does notmap as a circle as do all other plastid genomes completed todate The gap in the sequence falls between the two copies of

the TIR containing psbA (Janouskovec et al 2010) Weattempted to close this gap using multiple approaches Westarted by massively increasing the depth of sequence cover-age The genome was originally sequenced by 454 pyrose-quencing to a 13-fold depth of coverage so we used anindependent approach and assembled the genome frompaired-end Illumina sequence reads to an average depth ofalmost 600-fold This approach extended the two TIRs by300 bp on each side which was verified by PCR (supplemen-tary fig S6 Supplementary Material online) The very end ofeach TIR was followed by a 15 bp sequence further followedby an extension into the complementary TIR at a very lowdepth of coverage (fig 5) These extensions created an overlapbetween the ends however several lines of evidencesuggested that this is not due to a circular chromosomebut rather a linear topology with structured ends Firstmapping the depth of coverage revealed a pattern expectedfor a linear molecule The depth of coverage was generallyconsistent across the large single copy (LSC) region and mostof the TIRs but reduced linearly beginning at about 600 bpfrom the end followed by an exponential reduction andpartial recovery between 300 to 50 bp from the end followedby a terminal drop (fig 5A) The size and shape of theexponential reductionrecovery would be predicted by themean DNA fragment size in the sequencing library (327 bp)and linear topology as a consequence of fragmentation biasnear the chromosome end Therefore both the overalldecrease in coverage depth and the exponential decreaserecovery close to the ends support a linear chromosomeconformation (if it were circular the whole region shouldnot differ from the rest of the genome) Second if thegenome was circular-mapping and the gap was hard tosequence we would expect some paired ends to span thegap However when more than 650000 paired ends aremapped to the genome not a single paired end spanningthe gap was identified (see Materials and Methods) Insteadboth ends of the chromosome consisted exclusively ofreverse-mapping reads (fig 5A and B) Third PCR experi-ments using six primers (all shown to successfully amplifyproducts from one side of the gap) and all their nested com-binations failed to fill the gap between the two TIRs (supple-mentary fig S6 Supplementary Material online) Fourth thesize of the plastid genome estimated directly by pulsed-fieldgel electrophoresis (PFGE) and by psbA probe hybridizationwas consistent with abundant presence of linear monomers120 kb in size (supplementary fig S7 Supplementary Materialonline and Janouskovec et al 2010) Subgenomic-sizedmaterial and unresolved DNA in the well were also abun-dantly represented whereas putative linear dimers wererare and linear concatemers were not detected The PFGEexperiment was reproduced by using probes to three plastidgenes encoded in the LSC region tufA petA and atpA (seeMaterials and Methods) and consistently led to the sameresult suggesting that linear monomers are an importantconstituent of plastid DNA in C velia Subgenomic-sized mol-ecules could represent plastid DNA fragmented during PFGEpreparation (note that intact C velia cells were digestedin agarose plugs however) nucleus-encoded plastid DNA

2453

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

fragments plastid DNA in the process of replication or otherforms of genuine plastid DNA (Ellis and Day 1986 Oldenburgand Bendich 2004 Scharff and Koop 2006) Incompletely di-gested cells of C velia in the agarose plugs (see Materials andMethods) could be responsible for much of the signal in thewell although high-molecular-weight branched DNA formscould also be present and cannot be distinguished using thecurrent approach (Bendich 1991 Bendich 2004) Altogetherbecause the plastid genome was not closed by deep sequenc-ing using either 454 or Illumina no paired-end linkage couldbe established no linkage could be established by PCR and

because the size of the plastid genome estimated by PFGE andSouthern blot hybridizations was consistent with the exis-tence of linear monomers we conclude that the genome isprincipally linear in structure and circular molecules andlinear concatemers are rare or absent

If the genome is indeed linear it will be important to knowhow the C velia plastid DNA replication and copy numbercompares to the circularconcatemeric plastid DNA inapicomplexans (Williamson et al 2001 Williamson et al2002) and other organisms To provide a first glimpse intothis we identified a putative replication origin of the C velia

A

B

FIG 5 Schematic of plastid chromosome ends (A) Coverage depth (y axis) by forward-mapping (yellow) reverse-mapping (purple) and total (red)genomic reads is projected onto the TIR Position of genes is shown by gray boxes at the bottom (K = trnK P = trnP) Total coverage depth startsdropping gradually at about 800 bp from the chromosome end and goes through a U-shaped minimum at about 150 bp from the end The calculatedmean size of the end-sequenced DNA fragments (horizontal black bar red bits correspond to sequenced end pairs) and coverage by forwardreverseread pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end A steep drop in totalcoverage depth occurs at the very end of the chromosome (B) Here the TIRs on each of the chromosome ends (green and blue respectively) divergeinto a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 30 and 50) The total depth of coverage byDNA reads (y axis) steeply decreases in the 15 bp region The 15 bp region is also enriched in nucleotide discrepancies (sequence logos in bold) The last130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired-end spanningthis region can be identified suggesting that the genome cannot be genuinely circularized

2454

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 8: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

fragments plastid DNA in the process of replication or otherforms of genuine plastid DNA (Ellis and Day 1986 Oldenburgand Bendich 2004 Scharff and Koop 2006) Incompletely di-gested cells of C velia in the agarose plugs (see Materials andMethods) could be responsible for much of the signal in thewell although high-molecular-weight branched DNA formscould also be present and cannot be distinguished using thecurrent approach (Bendich 1991 Bendich 2004) Altogetherbecause the plastid genome was not closed by deep sequenc-ing using either 454 or Illumina no paired-end linkage couldbe established no linkage could be established by PCR and

because the size of the plastid genome estimated by PFGE andSouthern blot hybridizations was consistent with the exis-tence of linear monomers we conclude that the genome isprincipally linear in structure and circular molecules andlinear concatemers are rare or absent

If the genome is indeed linear it will be important to knowhow the C velia plastid DNA replication and copy numbercompares to the circularconcatemeric plastid DNA inapicomplexans (Williamson et al 2001 Williamson et al2002) and other organisms To provide a first glimpse intothis we identified a putative replication origin of the C velia

A

B

FIG 5 Schematic of plastid chromosome ends (A) Coverage depth (y axis) by forward-mapping (yellow) reverse-mapping (purple) and total (red)genomic reads is projected onto the TIR Position of genes is shown by gray boxes at the bottom (K = trnK P = trnP) Total coverage depth startsdropping gradually at about 800 bp from the chromosome end and goes through a U-shaped minimum at about 150 bp from the end The calculatedmean size of the end-sequenced DNA fragments (horizontal black bar red bits correspond to sequenced end pairs) and coverage by forwardreverseread pairs in this region suggest that the U-shaped minimum results from unequal DNA fragmentation near the chromosome end A steep drop in totalcoverage depth occurs at the very end of the chromosome (B) Here the TIRs on each of the chromosome ends (green and blue respectively) divergeinto a shared 15 bp region followed by a short sequence of the complementary TIR (DNA ends marked as 30 and 50) The total depth of coverage byDNA reads (y axis) steeply decreases in the 15 bp region The 15 bp region is also enriched in nucleotide discrepancies (sequence logos in bold) The last130 bp at each chromosome end is exclusively assembled from reverse-oriented paired ends (horizontal colored arrows) and no paired-end spanningthis region can be identified suggesting that the genome cannot be genuinely circularized

2454

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 9: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

plastid genome in silico using the cumulative GC-skew anal-ysis This region (68ndash69 kb from one end of the linear contigsupplementary fig S8A Supplementary Material online) ischaracterized by a minimum in cumulative GC-skew andloosely conserved tandem repeats (positions 68404ndash69029)In contrast both ends of the chromosome are at a cumulativeGC-skew maximum suggesting that they most likely corre-spond to replication termini Another feature commonlyassociated with origins of replication is the major shift ingene orientation which is located approximately 18 kbaway from the predicted replication origin in C velia (supple-mentary fig S8B Supplementary Material online) Thegenomic region surrounding the two features is relativelyoverrepresented in the Illumina data whereas both chromo-some ends are relatively underrepresented (this holds aftercorrection for base composition supplementary fig S8CSupplementary Material online) which is consistent withthe existence of replicating molecules in the Illumina libraryAltogether these three characteristics are consistent withlinear-mapping topology and suggest that replication startsat about two-thirds of the way from one end of the moleculeand proceeds bidirectionally toward the ends The copynumber for two C velia plastid genes (tufA and atpH) wasdetermined using dot blot hybridization to total genomicDNA and a synthetic construct (see Materials andMethods) The inferred copy number of both plastid geneswas 9 times higher compared with a single-copy nuclear genetopoisomerase II and confirmed that genes encoded in theLSC region and TIR are equally represented (atpH is present inthree closely related paralogs all of which hybridized to theconstruct) (supplementary fig S9 Supplementary Materialonline)

Discussion

Significance of Split Proteins for Photosynthesis andATP Generation

Both PsaA and AtpB have never been observed to be frag-mented like those of C velia Although the psaA gene is splitin Chlamydomonas reinhardtii RNA trans-splicing producesa full-length transcript which is translated to a full-lengthprotein (Merendino et al 2006) The structure of the PSIcore is highly conserved in all known oxygenic phototrophs(Nelson and Yocum 2006 Busch and Hippler 2011) and it isformed by a heterodimer of structurally similar PsaA and PsaBproteins which bind a remarkably high number of cofactorsabout 100 chlorophylls 14 carotenoids and 2 phylloquinones(Amunts et al 2010) The algal PSI is expected to be structur-ally similar to the plant PSI including several Lhca antennaproteins bound in a half-circle around the PSI (fig 6A)

PsaA in C velia is split between domains 4 and 5 (figs 1 and6A) This suggests that the split was not simply a reversal ofthe proposed ancient formation of psaA by the gene fusion ofa psbBpsbC-like antenna component and a psbApsbD-likecore of the reaction centre (domains 1ndash6 and 7ndash11 respec-tively in fig 1 Schubert et al 1998) The position of the splitand the faster evolutionary rate of PsaA-1 (peripheral frag-ment) can be reconciled with the structure of the PSI (fig 6A

and B) Indeed the presence of a supercomplex of PSI andantenna proteins in C velia is obvious from the 2D electro-phoresis (fig 3) Apart from the structure of PSI itself theprocess of the PSI assembly seems to be highly conservedthrough evolution from cyanobacteria to higher plant chlo-roplasts including assistance of the same auxiliary factors suchas Ycf3 and Ycf4 (Boudreau et al 1997 Ruf et al 1997)Although many individual steps in the PSI biogenesisremain unknown this process had to be remodeled inC velia to assemble three proteins instead of two into thefunctional PSI core complex Particularly intriguing is thebinding of chlorophyll cofactors into the split PsaA Both invivo and in vitro studies suggest that chlorophyll has to beinserted into core subunits of PSI co-translationally probablyas a prerequisite for correct protein folding (Kim et al 1994Eichacker et al 1996) The putative interface between PsaA-1and PsaA-2 is rich in chlorophyll molecules (fig 6B) andaccording to available crystal structures these chlorophyllsare coordinated by both parts of split PsaA It is not clearhow C velia inserts these chlorophylls into PSI and how sta-bility and correct folding of the nascent PsaA-12 is achievedPerhaps both PsaA fragments are assembled together co-translationally and this is synchronized with or even assistedby chlorophyll loading (which could also help to coordinatethe orientation of the two fragments) In any case the PSI corebiogenesis in C velia could involve additional synchronizingassembly step(s) and the recruitment of new nuclear-encoded auxiliary factors Interestingly the second proteinof the PSI antenna core PsaB mirrors some of the PsaAstructure at functionally analogous positions although notsplit PsaB also contains a variable loop between the fourthand fifth trans-membrane helices and a highly divergentN-terminus Whether this plays a compensatory role directlyrelated to the splitting of PsaA is not clear but it is an inter-esting possibility that would require direct biochemicaltesting

Similar to the PsaA protein any split of the AtpB proteinwould seem improbable due to its critical function andconserved structure Even though the proposed AtpB-1part of the b-subunit appears to be relatively far from thecatalytic site (fig 6C) it is known that tilting of the b-subunitsincluding the top b-sheet crown is critical for the catalyticcycle The movement of the upper part of the b-subunit issubtle but if it is restricted by inhibitor tentoxin then thecyclic interconversion of nucleotide binding sites is blocked(Groth 2002) Tentoxin binds close to the place where theAtpB protein is split (Groth 2002) Correct reassembly ofAtpB-1 and AtpB-2 proteins is therefore likely to be highlyconstrained to avoid any restrictions of AtpB-2 movementOn the other hand it is also possible that the fragmentedAtpB-2 is more flexible and perhaps less sensitive to somenatural inhibitors The ATP synthase was also noteworthy inthat it migrated at the top of the native protein gel This hasnever been observed in cyanobacteria (Herranen et al 2004)or higher plants (Jarvi et al 2011) and indicates that ATPsynthase might be integrated into a very stable megadaltonsupercomplex in the plastid of C velia

2455

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 10: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

Highly Divergent Characteristics of Gene andChromosomal Structure

A number of highly modified ORFs are found in the C veliaplastid genome and distributed without any obvious patternLong extensions or insertions in plastid proteins are uncom-mon in most plastids including those in other alveolatesSome have been identified in green algae (eg de Cambiaireet al 2006) but never in the extent or number reported hereMany of the C velia predicted proteins that contain exten-sions function in the small subunit of the ribosome Ifthe extensions are really present in the final functional pro-teins this has implications for plastid ribosome structureAlternatively transcripts of the rps ORFs and other unusualORFs could be modified so that only the homologous regionis translated This process may be related to transcript oligour-idylylation but as yet we have no evidence to support thisOligoU tailing is another modification of general interest(Wang and Morse 2006) and its significance and distribution

in C velia plastid transcripts remain unknown OligoU tailsin some C velia transcripts are found in mature mRNAs thatare derived from polycistronic primary transcripts (fig 1)This suggests a possible role in transcript processing perhapssimilar to that observed in dinoflagellate plastids (Nelsonet al 2007 Dang and Green 2009 Barbrook et al 2012)If confirmed the ubiquitous presence of polycistronictranscription would imply that oligoU tailing is widespreadin the C velia plastid and tightly intertwined with expressionof many core plastid genes

No linear-mapping plastid genome has been documentedto date although there is a longstanding debate aboutwhether circular-mapping plastid genomes might representphysically linear molecules (Bendich 2004) At a sufficientdepth of coverage the two ends of the C velia plastidgenome can be (barely) overlapped in sequence but multiplelines of evidence suggest the genome is actually linear ThePFGE data support existence of linear monomers and rare

FIG 6 A putative position of the PsaA proteins in PSI and the AtpB proteins in ATP synthase of Chromera velia as approximated from proteinalignments and available crystal structures of plant counterparts (A) A stromal view on the pea PSI-LHC complex (PDB ID code 2WSC Amunts et al2009) showing a putative position of PsaA-1 and PsaA-2 in the PSI complex Lhca antenna proteins are shown in green stromal subunits PsaC PsaD andPsaE were removed for clarity (B) A detail of the same structure showing arrangement of chlorophyll molecules bound to the PSI core Red circle markschlorophylls expected to be bound at an interface between PsaA-1 and PsaA-2 proteins in C velia (see discussion for details) (C) AtpB-1 and AtpB-2proteins fitted into structure of the spinach F1-ATPase (PDB ID code 1KMH) Catalytic site is indicated by an orange asterisk

2456

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 11: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

dimers (which could arise by TIR-mediated recombinationbetween two monomers) but is incongruent with a signifi-cant presence of linear concatemers This leaves no directcomparisons possible with other plastid DNAs includingthose in related apicomplexans which comprise various mix-tures of circular molecules linear concatemers and high-mo-lecular-weight material (Lilly et al 2001 Williamson et al 2002Bendich 2004 Day and Madesis 2007) Presence of TIRs in theC velia plastid genome is a common characteristic of lineargenomes (eg in apicomplexan mitochondria Kairo et al1994 Hikosaka et al 2010) The physical structure of thechromosome ends remains unknown but given thesequence one can speculate based on structures observedin other linear genomes These include single-strandedloops (possibly followed by a nick) or single-stranded over-hangs in the 15 bp regions (reviewed in Nosek et al 2004) ordiverse associations with end-specific proteins (eg Rekoshet al 1977 Tomaska et al 1997) Gradually decreasing depthof coverage near the chromosome end may also suggest thatthe whole terminal 600 bp region not just its very end isprotected possibly by a t-loop (Tomaska et al 2002) Norepeated sequence can be found close to the ends howeverso determining which of these structures if any most closelyrepresents the physical state of the C velia plastid chromo-some will require direct testing Altogether these data provideevidence for the first linear-mapping plastid genome inC velia and suggests that a similar topology could existin other plastid genomes particularly those that do notpresently map as a circle (Gabrielsen et al 2011)

The change in the plastid chromosome topology hasimportant implications on the process of DNA replicationand overcoming the end-replication problem In the api-complexan and other plastids several types of replicationorigins have been identified and linked to the D-loop roll-ing circle and recombination-mediated replication strate-gies The most common type of replication origin isassociated with bidirectional D-loop replication and locatedinside the duplicated rRNA inverted repeat (Kolodner andTewari 1975 Williamson et al 2002 Krishnan and Rao2009) In contrast the C velia plastid genome lacks boththe duplicated rRNA unit and associated replication originand is probably replicated bidirectionally from thepredicted putative replication origin toward the chromo-some termini which is a unique situation among all plastidgenomes The estimated plastid gene copy number (9that of a nuclear marker) is also unusually low comparedwith most photosynthetic plastids (~50ndash100 copies Dayand Madesis 2007) and is more similar to the gene copynumber in the non-photosynthetic plastids of apicomplex-ans (~25 copies Matsuzaki et al 2001) or plastid minicir-cles during exponential growth in dinoflagellates(2ndash4 copies Koumandou and Howe 2007)

Searching for Forces Driving Structural Complexity

The C velia plastid genome is unique in a number of waysHere we have examined several unusual characteristics ofits organization and expression all of which raises the

question why is this genome home to so many odditiesOne property that might explain several of these character-istics concurrently is a high level of intragenomic duplicationThe C velia plastid genome has been extensively reshuffledand in order to understand the underlying forces wesearched for palindromes short repeats and long repeatsPalindromes and short repeats were comparatively rarehowever the searches identified at least 46 pairwise matchesbetween longer genomic sequences (gt50 bp) at 65ndash100similarity (fig 7) Most of the duplicates are divergent andlow copy number (2ndash3) and apart from the TIRs they includemany intergenic regions duplicated atpH and clpC genesand duplicated fragments of otherwise complete genes(psaA psaB psbB rpl3 and tufA fig 7) The region containingmost repeated sequence is found around the TIR boundary(fig 8) Multiple parts of this region including severalgene fragments three atpH paralogs and six trnK paralogscomprising both complete and fragmentary variants arescattered in several places in the genome (fig 8) Long repeatsare rare in most plastids suggesting that duplication isan important force in shaping the C velia plastid genomeDuplication could explain occurrence of paralogs small genefragments extensive gene re-shuffling and re-structuringof operon units through movement of promoters andintergenic elements Similar processes could account for theaddition of extensions and insertions to genes The splitting ofpsaA and atpB could likewise be seen as a process involvingduplication or partial duplication For example the PsaA splitcould have proceeded through an intermediate in which the

FIG 7 Large repeats in the Chromera velia plastid genome The lineargenome is drawn as an incomplete circle with genes in black intergenicsin light gray and two TIRs (white lines) at chromosome ends (15 bpregions at the termini are not visible here) Links connect homologousregions of 50 bp or more identified by BLAST (dark gray) and additionalmatches identified by Pipmaker (medium gray) Homology between theTIRs is not shown for clarity Names of genes most affected by dupli-cation events are indicated

2457

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 12: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

canonical and split PsaAs coexisted In this scenario each ofthe two complexes could have been regulated independentlyand potentially even have different functions much alike thecyanobacterial monomeric and trimeric PSI (Majeed et al2012)

Duplicated sequences in C velia plastid are low copynumber long and divergent from each other They aredissimilar from short dispersed repeats found in some algalplastids (eg in Chlamydomonas Maul et al 2002) and longerrepeats found in angiosperm plastids (eg Guisinger et al2011) all of which are conserved in sequence and compara-tively high copy number It remains unclear how duplicationis mediated in C velia Few duplicates are in tandem so theduplication process is not likely to be related to DNA repli-cation and is more likely related to recombination A func-tional DNA recombination machinery has been documentedin several land plant and algal plastids (Palmer 1983 Boyntonet al 1988 Haberle et al 2008) where it primarily acts in DNArepair and replication Unwanted genome reorganization isknown to be actively prevented which explains why mostplastids have maintained a conserved genomic architecture(reviewed in Marechal and Brisson 2010) It is possiblethat mechanisms preventing unwanted recombination oralternatively those allowing for recombination-mediatedrepair were loosened or lost in the C velia plastid whichled to propagation of duplicates and rearrangementsThe apparent linearization of the chromosome could alsostem from a recombination-mediated rearrangementinvolving homologous TIRs and 15 bp repeats now locatedat the termini Interestingly although sequence shuffling hasbeen extensive in the C velia plastid genome it has not beencompletely random Most genes have been reorganized inlarge clusters with a strongly pronounced strand polarity(supplementary fig S8B Supplementary Material online)More importantly functionally related genes often clustertogether and are most likely transcribed polycistronically(fig 4) so even with high levels of rearrangement a function-ally reorganized genomic architecture can result fromselection

Intragenomic duplication may be the cause of most pecu-liarities in the C velia plastid genome but not all of themFeatures such as transcript oligoU tailing and noncanonicalgenetic code (Moore et al 2008 Janouskovec et al 2010) may

have different causes including increased mutation rate smallpopulation size and a short replication cycle Moreover earlyevolutionary acquisition and long-term combined effect ofsome these forces could have significantly contributed tothe genome remodeling For example all alveolate plastids(C velia apicomplexans Vitrella and dinoflagellates) have areduced gene complement somewhat modified gene orderand comparatively fast rate of protein evolution (Janouskovecet al 2010) Oligouridylylation of plastid transcripts is alsofound in dinoflagellates and may significantly predateC velia although it seems to be missing in apicomplexans(Janouskovec et al 2010 JJ unpublished data) The AtpBinsertion present in dinoflagellates and Vitrella at a homolo-gous position to the split in C velia could be viewed as anancestral acquisition that may have individualized the twoparts of the protein preparing ground for the split Similarlyrecombination has been very active in the dinoflagellate plas-tids and possibly even led to a massive fragmentation of theirgenome to small mini-circles (Zhang et al 1999 Zhang et al2001) so it might be tempting to speculate that the corecause of many of these conditions traces back to an ancientancestor Distinguishing between ancient and independentgains is nevertheless difficult The plastid genomes of apicom-plexans and Vitrella are generally not that unusual in struc-ture as C velia and dinoflagellates so it is more accurate topropose that all alveolate plastids were ancestrally somewhatdivergent but evolved in different directions

The fact that in a single organism two conservativephotosynthesis proteins are split is astonishing given thehigh efficiency of C velia photosynthesis (Quigg et al 2012)The mechanism behind a particular split can be relativelysimple but it is more difficult to see how each of the PsaAand AtpB fragments became integrated into a functionalmultiprotein complex For example both PsaA fragmentshad to coevolve with mechanisms allowing co-translationalinsertion of chlorophylls and simultaneously acquireremodelinteractions with numerous cofactors that mediate theirassembly into a fully functional PSI complex Many otherstructures in the C velia plastid appear similarly complicatedBoth the ribosome and RNA polymerase complex comprisea number of highly modified components including proteinswith hundreds of amino acid long extensions and insertionsExplaining the appearance of highly modified structures

FIG 8 Intrachromosomal duplication and recombination hotspot around the TIR boundary (vertical dashed line) The plastid genome is shown bythick black line at the bottom with TIRs (TIR-A and TIR-B) folded onto themselves to the left and the LSC region forming a loop to the right Other partsof the genome with homology to the TIR boundary region are shown by thick gray lines numbers indicate positions Genes and pseudogenes are shownby boxes Level of sequence similarity is distinguished according to the key

2458

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 13: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

is more difficult than their components individually becausemany parts of these systems are mutually constrainedLikewise molecular processes such as oligoU tailing aredifficult to justify in adaptive terms Considering the complex-ity of similar unnecessary processes in other organellessuch as RNA editing or intron splicing (Gray et al 2010)it is attractive to hypothesize that complicated molecularmachineries have evolved in the C velia plastid that serveno general advantage

Materials and Methods

DNA Extraction Sequencing Annotation andFragmented Gene Analysis

Pelleted cells of C velia were ground in liquid nitrogenusing mortar and pestle and the resulting slurry was incu-bated in CTAB buffer (2 wv cetyltrimethyl ammoniumbromide 142 M NaCl 20 mM EDTA 100 mM Tris HCl pH80 2 wv polyvinylpyrrolidone 05 b-mercaptoethanol1 mgml RNase A) at 65 C for 20 min After two extrac-tions with phenolchloroform and one with chloroformonly DNA was precipitated with isopropanol and washedwith ethanol Pellet was dried at room temperature andresuspended in TE buffer Extracted DNA was separated byCsCl-Hoechst gradient ultracentrifugation and AT-enrichedfractions were tested for the presence of plastid DNA usingthe psbA probe Plastid-enriched DNA fractions weresequenced using Illumina 54 bp paired-end reads technol-ogy (a total of 9591179 read pairs were obtained) anddeposited in GenBank Sequence Read Archive (BioprojectPRJNA193178) De novo sequence assembly using MIRA3extended the previous plastid contig (NC_0143401) atboth ends to the final length of 120426 nucleotides at572 average depth of coverage Two errors at homopol-ymeric regions were corrected which led to the merging ofrps7 with orf142 and rpl4 with orf128 The gene for 5Sribosomal RNA was identified upstream of 23S rRNAand its fold verified using Mfold (httpmfoldrnaalbanyeduq=mfold last accessed May 25 2013) Proteins werere-annotated by comparison with plastid homologs froman NTG start codon with the exception of 10 proteins thatwere more consistent with an AT[ACT] start codon Six ofthe proteins had no alternative NTG start (atpA rpl31rps7 rps12 rps14 ycf3) Unknown ORFs and genes withlong N-terminal extensions were annotated from the firstNTG start codon The extended plastid genome sequenceand new annotations were deposited in an updated C veliaplastid genome entry (NC_0143402) PsaA trans-membranedomains were predicted using TMHMM 20 (httpwwwcbsdtudkservicesTMHMM last accessed May 25 2013)and by comparison to PsaA of Synechoccocus elongatusThe corrected genetic distances for PsaA-1 and PsaA-2were calculated in Tree-Puzzle 52 (Schmidt et al 2002)using a data set of 40 representative plastid PsaA homologsand the following parameters Likelihood mapping Slow(exact) parameter estimates WAG model with fourgamma categories estimated alpha parameter and esti-mated aa frequencies PsaA-12 phylogenies were

computed using a broader data set in RAxML using -mPROTGAMMALGF -f a - 100 PsaA-12 and AtpB data setswere aligned using ndashlocalpair option in MAFFT

RNA Extraction RACE and Northern BlotHybridization

Total RNA was isolated from ~02 g of C velia cells rup-tured by repeated freezing and thawing followed by grind-ing with a pestle in liquid nitrogen by the addition of 5 mlTRI reagent followed by manufacturerrsquos instructions(Sigma) Total RNA was ligated with 50 adapter (50

RACE) or self-ligated with T4 RNA ligase (circular RACE)and RT-PCR was carried out with either random or specificprimers or with polyT primers (30 RACE) Sequences ofcircularized transcripts were deposited under GenBankNucleotide accessions KC734564ndashKC734569 For Northernblot analysis ~10mg per lane of total RNA was separatedin a 1 formaldehyde agarose gel in 1 MOPS bufferblotted and UV cross-linked as described elsewhere(Vondruskova et al 2005) PCR-amplified DNA fragmentsof selected genes (for primers see supplementary table S4Supplementary Material online) were labeled by randompriming with [32P]-dATP and used as probes for hybrid-ization at 55 C After washing the results were visualizedon Typhoon Imaging System (GE Healthcare)

Preparations of Cell Membranes and TwoDimensional Electrophoresis

Cells of C velia (optical density at 750 nm = ~05) werewashed and resuspended in 1 ml of the working buffer con-taining 25 mM MESNaOH pH 65 5 mM CaCl2 10 mMMgCl2 and 20 glycerol The resuspended cells were mixedwith 05 ml of glass beads (01 mm diameter) in 2 ml eppen-dorf tube and broken using Mini-BeadBeater (BioSpeceight shaking cycles 10 s each with 2-min breaks for coolingthe suspension on ice) Membranes were separated fromthe cell extract by centrifugation (40000 g 20 min)The isolated membranes were resuspended in the workingbuffer and solubilized by gentle shaking with 1 dodecyl-b-maltoside at 10 C for 1 h Insoluble parts were removedby centrifugation (65000 g 20 min) Analysis of nativemembrane complexes was performed using Clear-Nativeelectrophoresis as described in Wittig and Schagger (2008)Individual proteins in membrane complexes were resolvedin the second dimension by SDS-PAGE in a 12ndash20 lineargradient polyacrylamide gel containing 7 M urea (Sobotkaet al 2008)

Protein Identification by LC-MSMS Analysis

Gel slices were placed in 200ml of 40 acetonitrile 200 mMammonium bicarbonate and incubated at 37 C for 30 minThe solution was then discarded the procedure was repeatedone time and the gel was finally dried in a vacuum centrifugeTwenty microliters of 40 mM ammonium bicarbonatecontaining 04mg trypsin (proteomics grade Sigma) wasadded to the tube incubated at 4 C for 45 min and thendried in a vacuum centrifuge To digest proteins 20ml of 9

2459

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 14: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

acetonitrile in 40 mM ammonium bicarbonate was addedto the gel and incubated at 37 C overnight Peptides werepurified using ZipTip C18 pipette tips (Millipore Corporation)according to manufacturerrsquos protocol MS analysis was per-formed on NanoAcquity UPLC (Waters) on-line coupled tothe ESI Q-ToF Premier mass spectrometer (Waters) One mi-croliter of the sample was diluted in 3 acetonitril01formic acid and tryptic peptides were separated using theSymmetry C18 Trapping column (180mm id 20 mmlength particle size 5mm reverse phase Waters) with aflow rate of 15mlmin for 1 min It was followed by areverse-phase UHPLC using the BEH300 C18 analyticalcolumn (75mm id 150 mm length particle size 17mmreverse phase Waters) Linear gradient elution was from97 solvent A (01 formic acid) to 40 solvent B (01formic acid in acetonitrile) at a flow rate of 04mlminEluted peptides flowed directly into the ESI source Rawdata were acquired in data-independent MS^e identitymode (Waters) Precursor ion spectra were acquired withcollision energy 5 V and fragment ion spectra with a collisionenergy 20ndash35 V ramp in alternating 1 s scans For the secondanalysis data-dependent analysis mode was used peptidespectra were acquired with collision energy 5 V and peptideswith charge states of + 2 + 3 and + 4 were selected for MSMS analysis Fragment spectra were collected with a collisionenergy 20ndash40 V ramp In both modes acquired spectra weresubmitted for database searching using the PLGS23 software(Waters) against the predicted proteins coded by the plastidgenome of C velia and against the available EST sequences(wwwncbinlmnihgov last accessed May 25 2013) Acetyl N-terminal deamidation N and Q carbamidomethyl C andoxidation M were set as variable modificationsIdentification of three consecutive y- or b-ions was requiredfor a positive peptide match and a minimum of three peptidematches were required for a positive protein identification

Transcriptome Analysis

Total RNA and polyA RNA fraction were sequenced usingIllumina paired-end read technology Read coverage depthwas averaged across the two TIRs Coverage values for allsites were then exported and plotted in a spread sheeteditor Incorrect read mapping to duplicated genes (atpHand clpC) and repeated regions was ruled out based onmatch length or sequence divergence (all matches were966 similar or lower) Presence of RNA editing was assessedby comparing relative representation of high-quality nucleo-tides (Phred score 30 or higher) at each site in betweentranscriptomic and genomic reads mapped on the plastidcontig

DNA Read Mapping Chromosome Ends Analysis andRepeat Analysis

Illumina paired-end DNA reads were separately mappedon the linear and artificially circularized plastid genomesequence using Consed 22 and Bowtie 20 under defaultsettings (667715 read pairs mapped) Read coverage depthwas averaged across the two TIRs and incorrect read map-ping to repeats was ruled out as above Coverage depth

by forward-oriented (ie oriented outside of the chromo-some) and reverse-oriented reads was calculated usingBedtools 217 No forward-oriented reads were foundwithin the last 130 bp of the chromosome ends (all readsin this region were reverse-oriented) Only 23 forward-oriented reads were found within the last 250 bp of thechromosome ends all of which had a reverse-oriented pairmapping close to the chromosome end except 6 readswhose pairs did not map to the plastid genome sequenceor each other In order to estimate the mean DNA frag-ment size full-length read pairs (54 bp each) were mappedon the plastid contig using Bowtie 20 and sortedand analyzed using Picard Tools SortSamjar andCollectInsertSizeMetricsjar utilities Discrepant bases infigure 5B were visualized using WebLogo 3 (httpweblogothreeplusonecomcreatecgi last accessed May 252013) using ldquoBase probabilitiesrdquo as units and ldquoNo adjust-ment for base compositionrdquo setting PCR on genomic DNAwas done at the following conditions annealing at 55 C30 s to 4 min elongation times and 35 cycles All six pri-mers used in bridging the gap between the two contigends were confirmed to give functional products with adifferent primer pair under the same conditions (supple-mentary fig S6C and D Supplementary Material online)Palindromes and repeats were searched using EMBOSS 631repeat identification tools at Pasteur Mobyle website(httpmobylepasteurfr last accessed May 25 2013)Pipmaker (httppipmakerbxpsuedupipmaker lastaccessed May 25 2013) and pairwise BlastN searches(length gt50 nucleotides e-value lt001 and sequence sim-ilarity 65ndash100) all at default settings Figure 7 was plottedusing Circos

PFGE and Southern Blot Hybridization

Chromera velia cells (107ndash108) were slowly pelleted embed-ded in low-melting agarose plugs and treated with 2N-laurylsarcosine and 2 mgml proteinase K for 28 h at56 C Thick cell walls in C velia prevented efficient penetra-tion of cell membranes and DNA release leaving between 10and 30 of all cells undigested The plugs were inserted into1 agarose gel and DNA was resolved on CHEF-DR III PFGE(Bio-Rad) in 05 TBE at 14 C and using two differentsettings 1) U = 6 Vcm with 05ndash25 s pulses and 120 anglefor 20 h and 2) U = 6 Vcm with 01ndash2 s pulses and 120 anglefor 14 h After treatment with 025 M HCl for 20 min the gelswere denatured neutralized and blotted to nylon membraneand UV cross-linked following standard protocols DNAprobes were labeled as described above Southern blotanalysis with psbA tufA petA and atpA probes (see supple-mentary table S4 Supplementary Material online for primers)was performed in NaPi solution (05 M Na2HPO4 pH 721 mM EDTA 7 SDS 1 BSA) at 65 C overnight and themembranes were washed 20 min in 2 SSC 01 SDS atroom temperature and another 20 min in 02 SSC 01SDS at 65 C and visualized on Typhoon Imaging System(GE Healthcare)

2460

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 15: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

Determining the Replication Origin and Plastid GeneCopy Number

The cumulative GC skew plot was drawn using the utility athttpgcatdavidsoneduDGPBgc_skewgc_skewhtml (lastaccessed May 25 2013) Coverage depth by Illumina genomicreads was determined as above Relative abundance of se-lected plastid and nuclear genes in C velia were estimatedas follows A 519 bp-long fragment of nuclear topoisomeraseII gene (a typical single-copy nuclear gene) and 337 bp-longand 327 bp-long fragments of chloroplast tufA and atpH2genes respectively were cloned in tandem into a single plas-mid using unique restriction sites in the primers (supplemen-tary table S4 Supplementary Material online) The resultingconstruct was labeled p55+13 Separate serial dilutions oftotal DNA of C velia digested with DraI and Sph1103I andthe EcoRV-linearized p55+13 were spotted on a Biodynemembrane (Pall) cross-linked and blotted as describedabove Each gene fragment (Topo II tufA atpH) was labeledby [32P]ATP hybridized at 65 C to one of three identicalblots and visualized as above The atpH2 probe was tested tohybridize efficiently to all three atpH paralogs (two of whichare identical to the probe and the third is 93 identical) Forthis genomic DNA digested with DraI and Sph1103I wasresolved in a 075 agarose gel blotted and hybridized asabove and signals from atpH1 and atpH2-specific fragmentswere compared

Supplementary MaterialSupplementary figures S1ndashS9 and tables S1ndashS4 are available atMolecular Biology and Evolution online (httpwwwmbeoxfordjournalsorg)

Acknowledgments

This work was supported by a grant from the CanadianInstitutes of Health Research to PJK (MOP-42517) by theCzech Science Foundation projects P506121522 and P50112G055 to MO by the Praemium Academiae award to JLby Award IC201009 by the King Abdullah University ofScience and Technology (KAUST) to AP MO and JLand by the project Algatech (CZ1052100030110) to RSJK and OP PJK and JL are Fellows of the CanadianInstitute for Advanced Research PJK was supported by aFellowship from the John Simon Guggenheim Foundation

ReferencesAmunts A Toporik H Borovikova A Nelson N 2010 Structure deter-

mination and improved model of plant photosystem I J Biol Chem2853478ndash3486

Barbrook AC Dorrell RG Burrows J Plenderleith LJ Nisbet RER Howe CJ2012 Polyuridylylation and processing of transcripts from multiplegene minicircles in chloroplasts of the dinoflagellate Amphidiniumcarterae Plant Mol Biol 79347ndash357

Bendich AJ 1991 Moving pictures of DNA released upon lysis frombacteria chloroplasts and mitochondria Protoplasma 160121ndash130

Bendich AJ 2004 Circular chloroplast chromosomes the grand illusionPlant Cell 161661ndash1666

Botte CY Yamaryo-Botte Y Janouskovec J Rupasinghe T Keeling PJCrellin P Coppel RL Marechal E McConville MJ McFadden GI2011 Identification of plant-like galactolipids in Chromera velia a

photosynthetic relative of malaria parasites J Biol Chem 28629893ndash29903

Boudreau E Takahashi Y Lemieux C Turmel M Rochaix J-D 1997 Thechloroplast ycf3 and ycf4 open reading frames of Chlamydomonasreinhardtii are required for the accumulation of the photosystem Icomplex EMBO J 166095ndash6104

Boynton JE Gillham NW Harris EH et al (11 co-authors) 1988Chloroplast transformation in Chlamydomonas with high velocitymicroprojectiles Science 2401534ndash1538

Busch A Hippler M 2011 The structure and function of eukaryoticphotosystem I Biochim Biophys Acta 1807864ndash877

Dang Y Green BR 2009 Substitutional editing of Heterocapsa triquetrachloroplast transcripts and a folding model for its divergentchloroplast 16S rRNA Gene 44273ndash80

Day A Madesis P 2007 DNA replication recombination and repair inplastids In Bock R editor Cell and molecular biology of plastidsBerlin Springer p 65ndash119

de Cambiaire J-C Otis C Lemieux C Turmel M 2006 The completechloroplast genome sequence of the chlorophycean green algaScenedesmus obliquus reveals a compact gene organization and abiased distribution of genes on the two DNA strands BMC Evol Biol637

Eichacker LA Muller B Helfrich M 1996 Stabilization of the chlorophyllbinding apoproteins P700 CP47 CP43 D2 and D1 by synthesis ofZn-pheophytin a in intact etioplasts from barley FEBS Lett 395251ndash256

Ellis TH Day A 1986 A hairpin plastid genome in barley EMBO J 52769ndash2774

Gabrielsen TM Minge MA Espelund M et al (11 co-authors) 2011Genome evolution of a tertiary dinoflagellate plastid PLoS One 6e19132

Gatenby A Rothstein S Nomura M 1989 Translational coupling of themaize chloroplast atpB and atpE genes Proc Natl Acad Sci U S A 864066ndash4066

Gray MW Lukes J Archibald JM Keeling PJ Doolittle WF 2010Irremediable complexity Science 330920ndash921

Green BR 2003 The evolution of light-harvesting antennas In GreenBR Parson WW editors Light-harvesting antennas in photosynthe-sis Netherlands Kluwer Academic Publishers p 129ndash168

Groth G 2002 Structure of spinach chloroplast F1-ATPase complexedwith the phytopathogenic inhibitor tentoxin Proc Natl Acad SciU S A 993464ndash3468

Guisinger MM Kuehl JV Boore JL Jansen RK 2011 Extreme reconfigura-tion of plastid genomes in the angiosperm family Geraniaceae rear-rangements repeats and codon usage Mol Biol Evol 28583ndash600

Haberle RC Fourcade HM Boore JL Jansen RK 2008 Extensive rearran-gements in the chloroplast genome of Trachelium caeruleum areassociated with repeats and tRNA genes J Mol Evol 66350ndash361

Herranen M Battchikova N Zhang P Graf A Sirpio S Paakkarinen VAro E-M 2004 Towards functional proteomics of membrane pro-tein complexes in Synechocystis sp PCC 6803 Plant Physiol 134470ndash481

Hikosaka K Watanabe Y-I Tsuji N et al (12 co-authors) 2010Divergence of the mitochondrial genome structure in the apicom-plexan parasites Babesia and Theileria Mol Biol Evol 271107ndash1116

Janouskovec J Horak A Barott KL Rohwer FL Keeling PJ 2012 Globalanalysis of plastid diversity reveals apicomplexan-related lineages incoral reefs Curr Biol 22R518ndashR519

Janouskovec J Horak A Obornık M Lukes J Keeling PJ 2010 A commonred algal origin of the apicomplexan dinoflagellate and heterokontplastids Proc Natl Acad Sci U S A 10710949ndash10954

Jarvi S Suorsa M Paakkarinen V Aro E 2011 Optimized native gelsystems for separation of thylakoid protein complexes novelsuper- and mega-complexes Biochem J 439207ndash214

Jordan P Fromme P Witt HT Klukas O Saenger W Krauss N 2001Three-dimensional structure of cyanobacterial photosystem I at25 A resolution Nature 411909ndash917

Kairo A Fairlamb AH Gobright E Nene V 1994 A 71 kb linear DNAmolecule of Theileria parva has scrambled rDNA sequences

2461

Plastid Genome of Chromera velia doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from

Page 16: Split Photosystem Protein, Linear-Mapping …mass/charge properties of intact PsaA and AtpB was found (see supplementary tables S1 and S2, Supplementary Material online, for a list

and open reading frames for mitochondrially encoded proteinsEMBO J 13898ndash905

Kim J Eichacker LA Rudiger W Mullet JE 1994 Chlorophyll regulatesaccumulation of the plastid-encoded chlorophyll proteins P700 andD1 by increasing apoprotein stability Plant Physiol 104907ndash916

Kolodner RD Tewari KK 1975 Chloroplast DNA from higher plantsreplicates by both the Cairns and the rolling circle mechanismNature 256708ndash711

Koreny L Sobotka R Janouskovec J Keeling PJ Obornık M 2011Tetrapyrrole synthesis of photosynthetic chromerids is likelyhomologous to the unusual pathway of apicomplexan parasitesPlant Cell 233454ndash3462

Koumandou VL Howe CJ 2007 The copy number of chloroplast geneminicircles changes dramatically with growth phase in the dinofla-gellate Amphidinium operculatum Protist 15889ndash103

Krishnan NM Rao BJ 2009 A comparative approach to elucidatechloroplast genome replication BMC Genomics 10237ndash237

Lilly JW Havey MJ Jackson SA Jiang J 2001 Cytogenomic analyses revealthe structural plasticity of the chloroplast genome in higher plantsPlant Cell 13245ndash254

Majeed W Zhang Y Xue Y Ranade S Blue RN Wang Q He Q 2012RpaA regulates the accumulation of monomeric photosystem I andPsbA under high light conditions in Synechocystis sp PCC 6803 PLoSOne 7e45139

Marechal A Brisson N 2010 Recombination and the maintenance ofplant organelle genome stability New Phytol 186299ndash317

Matsuzaki M Kikuchi T Kita K Kojima S Kuroiwa T 2001 Largeamounts of apicoplast nucleoid DNA and its segregation inToxoplasma gondii Protoplasma 218180ndash191

Maul JEJ Lilly JJW Cui L DePamphilis CW Miller W Harris EH Stern DB2002 The Chlamydomonas reinhardtii plastid chromosome islandsof genes in a sea of repeats Plant Cell 142659ndash2679

Mazor Y Greenberg I Toporik H Beja O Nelson N 2012 The evolutionof photosystem I in light of phage-encoded reaction centres PhilosTrans R Soc Lond B Biol Sci 3673400ndash3405

Merendino L Perron K Rahire M Howald I Rochaix J-D Goldschmidt-Clermont M 2006 A novel multifunctional factor involved in trans-splicing of chloroplast introns in Chlamydomonas Nucleic Acids Res34262ndash274

Moore RB Obornık M Janouskovec J et al (14 co-authors) 2008 Aphotosynthetic alveolate closely related to apicomplexan parasitesNature 451959ndash963

Nelson MJ Dang Y Filek E Zhang Z Yu VWC Ishida K Green BR 2007Identification and transcription of transfer RNA genes in dinoflagel-late plastid minicircles Gene 392291ndash298

Nelson N Yocum CF 2006 Structure and function of photosystems Iand II Annu Rev Plant Biol 57521ndash565

Nosek J Tomaska L Kucejova B 2004 The chromosome end replicationlessons from mitochondrial genetics J Appl Biomed 271ndash79

Obornık M Janouskovec J Chrudimsky T Lukes J 2009 Evolution of theapicoplast and its hosts from heterotrophy to autotrophy and backagain Int J Parasitol 391ndash12

Obornık M Modry D Lukes M Cernotıkova-Strıbrna E Cihlar J TesarovaM Kotabova E Vancova M Prasil O Lukes J 2012 Morphologyultrastructure and life cycle of Vitrella brassicaformis n sp n gena novel chromerid from the Great Barrier Reef Protist 163306ndash323

Oldenburg DJ Bendich AJ 2004 Most chloroplast DNA of maize seed-lings in linear molecules with defined ends and branched formsJ Mol Biol 335953ndash970

Palmer J 1983 Chloroplast DNA exists in two orientations Nature 30192ndash93

Quigg A Kotabova E Jaresova J Kana R Setlık J Sediva B Komarek OPrasil O 2012 Photosynthesis in Chromera velia represents a simplesystem with high efficiency PLoS One 7e47036

Rekosh DM Russell WC Bellet AJ Robinson AJ 1977 Identification of aprotein linked to the ends of adenovirus DNA Cell 11283ndash295

Ruf S Kossel H Bock R 1997 Targeted inactivation of a tobacco intron-containing open reading frame reveals a novel chloroplast-encodedphotosystem I-related gene J Cell Biol 13995ndash102

Scharff LB Koop H-U 2006 Linear molecules of tobacco ptDNA end atknown replication origins and additional loci Plant Mol Biol 62611ndash621

Schmidt HA Strimmer K Vingron M von Haeseler A 2002 TREE-PUZZLE maximum likelihood phylogenetic analysis using quartetsand parallel computing Bioinformatics 18502ndash504

Schubert WD Klukas O Saenger W Witt HT Fromme P Krauss N 1998A common ancestor for oxygenic and anoxygenic photosyntheticsystems a comparison based on the structural model of photosys-tem I J Mol Biol 280297ndash314

Sobotka R Duhring U Komenda J Peter E Gardian Z Tichy M GrimmB Wilde A 2008 Importance of the cyanobacterial Gun4 protein forchlorophyll metabolism and assembly of photosynthetic complexesJ Biol Chem 28325794ndash25802

Tomaska L Makhov AM Griffith JD Nosek J 2002 t-Loops in yeastmitochondria Mitochondrion 1455ndash459

Tomaska L Nosek J Fukuhara H 1997 Identification of a putativemitochondrial telomere-binding protein of the yeast Candidaparapsilosis J Biol Chem 2723049ndash3056

Vondruskova E van den Burg J Zıkova A Ernst NL Stuart K Benne RLukes J 2005 RNA interference analyses suggest a transcript-specificregulatory role for mitochondrial RNA-binding proteins MRP1 andMRP2 in RNA editing and other RNA processing in Trypanosomabrucei J Biol Chem 2802429ndash2438

Wang Y Morse D 2006 Rampant polyuridylylation of plastid genetranscripts in the dinoflagellate Lingulodinium Nucleic Acids Res34613ndash619

Westhoff P Alt J Nelson N Bottomley W Bunemann H Herrmann RG1983 Genes and transcripts for the P700 chlorophylla apoproteinand subunit 2 of the photosystem I reaction center complex fromspinach thylakoid membranes Plant Mol Biol 295ndash107

Williamson DH Denny PW Moore PW Sato S McCready S Wilson RJ2001 The in vivo conformation of the plastid DNA of Toxoplasmagondii implications for replication J Mol Biol 306159ndash168

Williamson DH Preiser PR Moore PW McCready S Strath M WilsonRJM 2002 The plastid DNA of the malaria parasite Plasmodiumfalciparum is replicated by two mechanisms Mol Microbiol 45533ndash542

Wittig I Schagger H 2008 Features and applications of blue-native andclear-native electrophoresis Proteomics 83974ndash3990

Woehle C Dagan T Martin WF Gould SB 2011 Red and problematicgreen phylogenetic signals among thousands of nuclear genesfrom the photosynthetic and apicomplexa-related Chromera veliaGenome Biol Evol 31220ndash1230

Zhang Z Cavalier-Smith T Green BR 2001 A family of selfish minicir-cular chromosomes with jumbled chloroplast gene fragmentsfrom a dinoflagellate Mol Biol Evol 181558ndash1565

Zhang Z Green BR Cavalier-Smith T 1999 Single gene circles indinoflagellate chloroplast genomes Nature 400155ndash159

2462

Janouskovec et al doi101093molbevmst144 MBE at T

he University of B

ritish Colom

bia Library on February 17 2014

httpmbeoxfordjournalsorg

Dow

nloaded from