CHARACTERISATION AND EVOLUTION OF
GLOBIN-LIKE GENES IN PHYLUM
CNIDARIA
Hayden Lee Smith
Bachelor of Science
Submitted in fulfilment of the requirements for the degree of Master of Applied Science (Research)
Earth, Environmental and Biological Sciences
Science and Engineering Faculty
Queensland University of Technology 2018
Characterisation and evolution of globin-like genes in phylum Cnidaria i
Keywords
Actiniaria, bioinformatics, Cnidaria, differential gene expression, gene duplication,
globin gene superfamily, globin-X, hexacoordination, in silico protein prediction,
neofunctionalisation, neuroglobin, novel gene, pentacoordination, phylogenetics,
subfunctionalisation
Abstract
Globins are among the best-studied gene and protein families in biology. In
particular, the globin superfamily of vertebrates has been extensively studied, but
little research on globin genes has occurred in early-diverging eumetazoans. This
study aimed to address this knowledge gap by identifying globin genes in early-
diverging eumetazoan phyla, such as phylum Cnidaria, and to use bioinformatic
approaches to characterise and understand the evolution, structure and expression of
these genes. Sea anemones (phylum Cnidaria; order Actiniaria) represent a highly
diverse group of organisms that inhabit different environments, and are an excellent
candidate to understand the diversity and distribution of globin genes in this phylum.
Compared to other cnidarians, sea anemones are relatively easy to obtain from their
environment and typically have large populations. Additionally, many species lack
symbionts, which are a potential source of contamination in experiments. Thus, sea
anemones are ideal cnidarians for a wide range of experimental studies.
This study also aimed to infer possible relationships between phylum Cnidaria and
its sister group superphylum Bilateria, specifically the well-known vertebrate globin
repertoire. Cnidarians can give great insight into the evolutionary history of the
globin superfamily and could provide further knowledge to resolve the debate about
the ancestral globin gene in the vertebrate repertoire.
Using a bioinformatics approach, this research has addressed this knowledge
gap about the expression and evolution of globin genes and proteins in phylum
Cnidaria. This research has identified globin genes in four classes of cnidarians that
are molecularly, structurally and phylogenetically most similar to vertebrate
neuroglobin and globin-X. There was a large-scale expansion of cnidarian globin-
like genes in order Actiniaria (including sea anemones) with up to 10 genes
identified across a diverse range of taxa. In silico protein predictions revealed the
possibility of two structural conformations for cnidarian globin proteins,
hexacoordinate and pentacoordinate. The observed cnidarian globin
gene expansion consisted of hexacoordinate sequences; however, there was
a single pentacoordinate sequence found exclusively in class Anthozoa. Tissue and
development specific expression analyses suggest that the expansion of globin-like
genes in cnidarians resulted in subfunctionalisation of duplicate copies, with a
possible neofunctionalisation event resulting in a single copy of the pentacoordinate
sequence. This research has improved our understanding of the evolution and
function of the globin gene superfamily in early-diverging eumetazoan phyla.
This thesis has helped to fill the knowledge gaps about the evolution of the
globin gene superfamily. A broad expansion of globin genes has been revealed in
two early-diverging phyla, Cnidaria and Placozoa, and these genes are similar to
vertebrate neuroglobin and globin-X genes. This suggests that a globin-like gene was
present in the metazoan ancestor, and that it was most likely the progenitor of the
neuroglobin and globin-X subfamilies and of the expanded globin gene repertoire in
vertebrates. Current research has revealed that three globin subfamilies
(neuroglobin-like, myoglobin and hemoglobin) have undergone gene expansions in
divergent eumetazoan taxa, and this research has identified a globin-like gene
expansion in actiniarians. Taken together, these findings are the first report of
convergent amplification in the globin superfamily. This thesis provides a starting
point for future research into the structural and biochemical properties of the
cnidarian globin proteins identified and how protein function has evolved over more
than 600 million years of evolution. Consequently, understanding the evolution,
expression and structure of globin genes and proteins will improve our understanding
of the vertebrate globin repertoire.
Table of Contents
Keywords
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgements
Chapter 1: Introduction
1.1 Overview
1.2 Context
1.3 Research Aims and Objectives
1.4 Literature Review
Chapter 2: Methods and Results
2.1 Materials and Methods
2.2 Results
Chapter 3: General Discussion
3.1 Key Findings
3.2 Evolution of globin genes in phylum Cnidaria
3.3 Convergent amplification of globin genes in Eumetazoa
3.4 Structure and function of globin proteins in phylum Cnidaria
3.5 Effect of environment on globin gene expression in phylum Cnidaria
3.6 Research gaps and future directions
3.7 Conclusion
Bibliography
Appendices
Appendix A Supplementary Tables and Figures
List of Figures
Figure 1.1: Actiniarian (sea anemone) species of interest; (A) A. tenebrosa and (B) E. pallida.
Figure 1.2: Hypothesised evolution of animal globins (modified from Burmester & Hankeln, 2014). Abbreviations: Androglobin, Adgb; Neuroglobin, Ngb; Globin-X, GbX; Cytoglobin, Cygb; Myoglobin, Mb; Globin-E, GbE; Globin-Y, GbY; Hemoglobin, Hb.
Figure 1.3: General cnidarian morphology; (A) overview and (B) single celled dermal layers (Technau & Steele, 2011).
Figure 1.4: Unrooted molecular phylogeny based on multiple alignments of a subset of 84 sequences that comprise 138 amino acids of Ngb, Ngb-like, hemoglobin, myoglobin, and cytoglobin sequences from diverse phyla (modified from Lechauve et al., 2013).
Figure 1.5: Expression of myoglobin mRNA in selected Protopterus annectens tissue samples, as estimated by qRT-PCR (modified from Koch et al., 2016). Legend: gene copy reference.
Figure 1.6: Predictive structures of unliganded wild-type and mutant Ngb; focused on the heme pocket (modified from Azarov et al., 2016).
Figure 2.1: Maximum Likelihood bootstrap phylogenetic tree of identified candidate cnidarian globin genes in genomes of cnidarian species, with supported Bayesian posterior probabilities. Model species representations of phyla Cnidaria, Ctenophora, Placozoa and Porifera (highlighted in purple, green, brown, and yellow, respectively) with vertebrate globin genes highlighted with red branches, and the S. minutum outgroup highlighted in grey. Phylogenetic values shown as maximum likelihood bootstrap support (0-100)/Bayesian posterior probabilities (0-1.0). Bootstrap values < 50 and posterior probabilities < 0.5 shown with a ~ symbol and nodes not identical between each method shown with a - symbol.
Figure 2.2: Maximum Likelihood bootstrap phylogenetic tree of identified candidate cnidarian globin genes in transcriptomes of cnidarian species, with supported Bayesian posterior probabilities. Cnidarian pentacoordinate and hexacoordinate branches highlighted in green and blue, respectively, with vertebrate branches highlighted in red and the S. minutum outgroup branch highlighted in black. Pentacoordinate cnidarian genes represented in ortholog reference nvec7000121 are associated with protein model highlighted in green. Hexacoordinate cnidarian genes are associated with protein model highlighted in blue. Phylogenetic values shown as maximum likelihood bootstrap support (0-100)/Bayesian posterior probabilities (0-1.0). Bootstrap values < 50 and posterior probabilities < 0.5 shown with a ~ symbol and nodes not identical between each method shown with a - symbol. Collapsed clades represent sequences with the corresponding ortholog reference gene nomenclature as referenced in Appendix A: Supplementary Table 2.3 (expanded clades shown in Appendix A: Supplementary Figure 2.2).
Figure 2.3: Predictive cnidarian globin protein structure with heme pocket residues shown. (A) Structural variation of A. tenebrosa ortholog references A.tenebrosa_nvec7000121 (highlighted green) and A.tenebrosa_nvec42000019 (highlighted blue) with side chain residue structures for F (CD1 position; phenylalanine), Q/H (E7 position; distal glutamine/histidine) and H (F8 position; proximal histidine) shown. (B) Structural variation of A. tenebrosa ortholog reference A.tenebrosa_nvec7000121 (highlighted green) and E. pallida ortholog reference E.pallida_nvec7000121 (highlighted gold) showing forward and reverse position of E7 residue Q, respectively, and with side chain residues surrounding E7 position shown.
Figure 2.4: Heatmap for tissue specific RNA-seq differential gene expression (DGE) analysis with three biological replicates for each tissue type. (A) Analysis of A. tenebrosa tissue types: acrorhagi, tentacle and mesentery filament. (B) Analysis of N. vectensis tissue types: nematosome, tentacle and mesentery filament.
Figure 2.5: Heatmap for development specific RNA-seq differential gene expression (DGE) analysis with two biological replicates for each tissue type. (A) Analysis of E. pallida developmental stages: immature (larvae) and mature (adult), with three biological replicates for adult stage only. (B) Analysis of N. vectensis developmental stages: immature (planula) and mature (adult).
Supplementary Figure 2.1: Cladogram overview of phylogenetic relationships for early-diverging species, phylum Cnidaria derived from mitochondrial (Rodríguez et al., 2014) and genomic (Zapata et al., 2015) genes. Red, light green and pink highlighting represents the three most studied Superfamilies of Actiniaria; Actinioidea, Metridioidea and Edwardsioidea, respectively. Candidate cnidarian globin gene copy number in brackets after species name. Abbreviations: O, Order; C, Class.
Supplementary Figure 2.2: Maximum Likelihood phylogenetic tree of identified candidate cnidarian globin genes in transcriptomes of cnidarian species, with supported Bayesian posterior probabilities. Model and non-model representations of vertebrate globin genes, cnidarian classes Anthozoa, Cubozoa, Hydrozoa and Scyphozoa, with S. minutum as the outgroup. Phylogenetic values shown as maximum likelihood bootstrap support (0-100)/Bayesian posterior probabilities (0-1.0).
Supplementary Figure 3.1: Maximum Likelihood bootstrap phylogenetic tree of identified candidate cnidarian globin genes in transcriptomes of cnidarian species, with supported Bayesian posterior probabilities. Blue dots represent gene duplication events within Actiniaria taxa. Red brackets represent individual gene duplication events within specific species. Phylogenetic values shown as maximum likelihood bootstrap support (0-100)/Bayesian posterior probabilities (0-1.0). Bootstrap values < 50 and posterior probabilities < 0.5 shown with a ~ symbol and nodes not identical between each method shown with a - symbol. Collapsed clades represent sequences with the corresponding ortholog reference gene nomenclature as referenced in Appendix A: Supplementary Table 2.3 (expanded clades shown in Appendix A: Supplementary Figure 2.2).
List of Tables
Supplementary Table 2.1: Output from OrthoMCL for candidate cnidarian globin genes, with individual gene nomenclature used for all downstream analyses.
Supplementary Table 2.2: Primer sequences and estimated gene sequence length for candidate cnidarian globin genes in A. tenebrosa and E. pallida. Candidate gene nomenclature referenced from OrthoMCL results detailed in Supplementary Table 2.4.
Supplementary Table 2.3: Trinity De novo assembled transcriptome statistics for quality check analysis. Abbreviations: n/a, Not Applicable.
Supplementary Table 2.4: Results of data interrogation for genome and transcriptome datasets. Details represent additional information for individual candidate cnidarian globin genes. Candidate gene nomenclature referenced from OrthoMCL results detailed in Supplementary Table 2.4.
Supplementary Table 2.5: Synonymous and nonsynonymous mutations identified in validated transcriptome contigs for E. pallida species. Abbreviations: Syn, Synonymous; Non-syn, Non-synonymous; n/a, Not Applicable.
Supplementary Table 2.6: Intron-exon structure analysis of nine E. pallida globin genes. Gene, exon and intron lengths are given as nucleotide counts. N/A used to identify introns with large blocks of ambiguous nucleotides, thus true length of intron could not be determined. Abbreviations: forward, F; reverse, R.
List of Abbreviations
Androglobin (Adgb)
Carbon monoxide (CO)
Cytoglobin (Cygb)
Globin-E (GbE)
Globin-X (GbX)
Globin-Y (GbY)
Hemoglobin (Hb)
Hemoglobin-α (HbA)
Hemoglobin-β (HbB)
Hydrogen Sulphide (H2S)
Myoglobin (Mb)
Neuroglobin (Ngb)
Nitric Oxide (NO)
Real-time Reverse Transcriptase Polymerase Chain Reaction (qRT-PCR)
RNA sequencing (RNA-seq)
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the
best of my knowledge and belief, the thesis contains no material previously
published or written by another person except where due reference is made.
Signature: QUT Verified Signature
Date: 20 / 02 / 2018
Acknowledgements
I wish to express my sincere thanks to my supervisor Dr Peter Prentis and my
co-supervisors Dr Ana Pavasovic and Dr Matthew Phillips for their guidance,
suggestions and assistance. I wish to make a special mention to Joachim Surm for his
assistance with bioinformatic troubleshooting throughout my project.
This work was supported by the Evolutionary Physiological Genomics
Laboratory Group and the Central Analytical Research Facility, Queensland
University of Technology. Computational and data visualisation resources and
services used in this work were provided by the HPC and Research Support Group,
Queensland University of Technology.
Chapter 1: Introduction
1.1 OVERVIEW
Vertebrate (superphylum Bilateria, phylum Chordata) globins are among the
best-studied gene families and proteins in biology; however, there is still a general
lack of knowledge about the structure, expression and evolutionary history of globin
genes across other eumetazoan phyla. In particular, there is very little knowledge
about the globin gene superfamily in species of the early-diverging phylum,
Cnidaria. Cnidarians are the sister group to superphylum Bilateria and they represent
an ideal group to resolve the lack of knowledge surrounding the globin gene
superfamily in early-diverging taxa. The structure of globin proteins has been
characterised as an eight α-helix conformation; however, the individual amino acid
changes that occur throughout these helices can alter the efficiency with which
different ligands bind to the heme group associated with globin proteins. Consequently,
the function of each globin subfamily changes based on structural variations found
within each specific subfamily. Each globin subfamily also typically has tissue and
developmental specific expression that is associated with changes in structure and
function of these proteins. The extensive study of the vertebrate globin gene
superfamily has resulted in a greater understanding of the evolutionary history for
each globin subfamily. However, this does not address the knowledge gap associated
with globin diversity in early-diverging taxa, such as phylum Cnidaria, as well as
those which were present in the last common eumetazoan ancestor. By elucidating
the structure, expression and evolution of globin genes in phylum Cnidaria, this
study addressed this knowledge gap and provided a greater understanding of the
globin gene superfamily in eumetazoan taxa.
1.2 CONTEXT
This project utilised genomic data from cnidarian species to evaluate
the expression and evolution of cnidarian globin genes, and to predict the structure
of globin proteins in two actiniarian species, Actinia tenebrosa (Figure 1.1A)
and Exaiptasia pallida (Figure 1.1B). These two species are ideal for the study of
globin gene evolution due to their high abundance in two distinct environments of
the Australian east coast, the intertidal and shallow marine zones, respectively. Thus,
they are relatively easy to collect for experimentation compared to other cnidarian
taxa. This project has provided insight into the evolutionary history and function of
globin genes in early-diverging taxa of phylum Cnidaria. Globin genes were
identified across a wide range of cnidarians, including four classes of phylum
Cnidaria (Anthozoa, Cubozoa, Hydrozoa and Scyphozoa; Supplementary Figure
2.1), and a broad expansion of globin genes in order Actiniaria was identified.
Additionally, variations in protein structure, and tissue and development expression
were observed in actiniarian species. Consequently, this research has provided a
foundation of knowledge for understanding the evolution and function of the globin
gene superfamily in early-diverging taxa.
Figure 1.1: Actiniarian (sea anemone) species of interest; (A) A. tenebrosa and (B) E. pallida.
1.3 RESEARCH AIMS AND OBJECTIVES
This project investigated the evolution, structure and expression of globin
genes in several species of cnidarians, with a specific focus on sea anemones (order
Actiniaria). Using molecular and bioinformatics approaches, this research has
increased our understanding of the globin gene superfamily in this early-diverging
eumetazoan phylum. Published datasets for multiple classes of cnidarians, as well as
published globin genes in the early-diverging phyla Ctenophora, Placozoa and
Porifera, were obtained. These datasets were then used to better understand
the globin gene superfamily in phylum Cnidaria and in Eumetazoa in
general, and to identify the variation in structure and expression profiles of
globin genes in select species.
Overall, the project had two main objectives:
Objective 1: To understand the evolutionary history of the cnidarian globin
gene superfamily through phylogenetic and comparative genomic approaches. This
entailed a detailed examination of the diversification and distribution of cnidarian
globin genes, and a comparison of these genes to previously characterised vertebrate
globin genes using Maximum Likelihood and Bayesian Inference phylogenetic
methods.
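As a hedged illustration of what examining the distribution of globin genes can involve, the following minimal Python sketch tallies candidate gene copy number per species from sequence identifiers. It assumes identifiers of the form species_orthologreference, a pattern borrowed from labels such as A.tenebrosa_nvec7000121 used elsewhere in this thesis; the function name copy_number_by_species and the short example list are hypothetical and are not the pipeline used in this study.

```python
from collections import Counter


def copy_number_by_species(gene_ids):
    """Count candidate genes per species.

    Assumes each identifier looks like '<species>_<ortholog reference>'
    (an assumption made for this sketch only).
    """
    counts = Counter()
    for gene_id in gene_ids:
        # Split on the last underscore so the species label is kept whole.
        species, _, _ortholog = gene_id.rpartition("_")
        counts[species] += 1
    return dict(counts)


# Hypothetical input list built from identifiers that appear in this thesis.
example_ids = [
    "A.tenebrosa_nvec7000121",
    "A.tenebrosa_nvec42000019",
    "E.pallida_nvec7000121",
]
counts = copy_number_by_species(example_ids)
```

For this toy list the tally is two candidate genes for A. tenebrosa and one for E. pallida; real copy numbers require the full validated gene set.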
Objective 2: To understand the structure and expression of cnidarian globin
genes through protein modelling and quantitative transcriptomic analyses. Validated
globin genes from A. tenebrosa and E. pallida were used to determine protein model
predictions for all globin proteins found in both species based on reviewed globin
protein structures. Previously published RNA-seq data were used to determine tissue
specific expression patterns in A. tenebrosa and Nematostella vectensis, as well as
development specific expression patterns in E. pallida and N. vectensis.
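To make the idea of tissue specific expression concrete, the sketch below computes a log2 fold change between two tissues from replicate expression values. This is a simplified illustration only: the expression values are invented, the pseudocount of 1.0 is an assumption, and the function log2_fold_change is hypothetical. It is not the differential gene expression method applied in this thesis, which would normally rely on dedicated statistical software.

```python
from math import log2
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Log2 ratio of mean expression in group_a relative to group_b.

    A pseudocount (assumed value) avoids division by zero for
    genes that are not expressed in one group.
    """
    return log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


# Invented expression values for one gene, three biological replicates per tissue.
tentacle = [30.0, 32.0, 31.0]
mesentery_filament = [7.0, 6.0, 8.0]

lfc = log2_fold_change(tentacle, mesentery_filament)
```

A positive value indicates higher mean expression in the first tissue; with these invented values the gene is roughly four-fold higher in tentacle than in mesentery filament.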
1.4 LITERATURE REVIEW
1.4.1 BACKGROUND
The globin gene superfamily encodes proteins that are found in all kingdoms of
life with a ubiquitous distribution across metazoan species (Freitas et al., 2004;
Hardison, 1996; Hoogewijs et al., 2008; Lechauve et al., 2013). Globins are small
respiratory proteins that bind gaseous molecules to a heme group that contains an
iron ion and porphyrin ring within the heme pocket of the protein’s structure
(Dickerson & Geis, 1983). Globin proteins are known to bind gaseous compounds
such as oxygen, carbon monoxide (CO) and nitric oxide (NO) (Dewilde et al., 2001;
Fago et al., 2006; Jayaraman et al., 2011). The binding potential of these compounds
is dependent on both the structure of the globin protein and the presence of a heme
group. The variation in structure and binding potential within vertebrate globin
proteins has resulted in the classification of 11 globin subfamilies: hemoglobin-α
(HbA), hemoglobin-β (HbB), myoglobin (Mb), cytoglobin (Cygb), neuroglobin
(Ngb), globin-X (GbX), globin-E (GbE), globin-Y (GbY), agnathan globins,
protostome globins and androglobin (Adgb) (Burmester et al., 2004) (Figure 1.2).
The globin genes present within and among vertebrate species have arisen from
repeated rounds of single gene and whole genome duplication events (Hoffman et al.,
2010; Hoffman et al., 2012; Jeffreys et al., 1980; Shen et al., 1981; Storz et al.,
2013), with the expansion and subsequent neofunctionalisation of these globin genes
seen throughout superphylum Deuterostomia (Hoffman et al., 2012; Burmester &
Hankeln, 2014). This superphylum has a broad distribution and diversity of globin
genes; however, Burmester & Hankeln (2014) have suggested that the earliest diverging
globin subfamily in vertebrates is either Ngb, GbX or androglobin. By studying
early-diverging taxa, the identity of the ancestral globin gene can be elucidated and
this will present significant insights into the evolution of protein structure, binding
and function of this gene superfamily outside of vertebrate taxa.
Figure 1.2: Hypothesised evolution of animal globins (modified from Burmester & Hankeln, 2014).
Abbreviations: Androglobin, Adgb; Neuroglobin, Ngb; Globin-X, GbX; Cytoglobin, Cygb;
Myoglobin, Mb; Globin-E, GbE; Globin-Y, GbY; Hemoglobin, Hb.
The distribution of globin and globin-like genes in the established model
species of Bilateria (in particular vertebrates) is well known; however, this is not the
case for early-diverging eumetazoan phyla, such as Cnidaria. Bilaterians have a
diverse repertoire of globin genes that frequently show tissue and development
specific expression patterns (Burmester et al., 2004; Ebner et al., 2010; Hoogewijs et
al., 2011; Koch et al., 2016; Roesner et al., 2005). A defining feature of cnidarians is
the lack of complex organs and organ systems seen in bilaterians (Figure 1.3)
(Brusca & Brusca, 2003; Technau & Steele, 2011). They do, however, possess a
relatively simple nervous system (nerve net) and have only two dermal layers
(ectoderm and endoderm). Consequently, cnidarian species predominantly rely on
diffusion to supply oxygen to their working cells (Brusca & Brusca, 2003; Technau
& Steele, 2011) and are unlikely to possess circulating globin proteins. The lack of
morphological complexity and reliance on diffusion seen in cnidarians presents an
interesting experimental analogue to examine the diversity, distribution, expression,
and structure of globin genes and proteins in simple early-diverging eumetazoan
taxa. To date, Ngb-like genes are the only globin genes identified in cnidarians, and
they have been characterised in a single species, the hydrozoan Clytia hemisphaerica
(Lechauve et al., 2013). The two Ngb-like genes in C. hemisphaerica were found to
have tissue specific expression patterns and both were expressed predominantly in
tentacle bulbs and manubrium of this species. Specifically, these genes were found
associated with collagen-like proteins found in the wall of nematocysts (the venom
delivery organelle of cnidarians associated with nematoblast cells in the neural net;
Technau & Steele, 2011) (Lechauve et al., 2013). The identification of tissue specific
globin genes in phylum Cnidaria indicates a differentiation of globins similar to that
seen in vertebrates. It is therefore highly likely that the diversity of globin genes in
cnidarians is linked to their morphology and physiology. Characterising these genes
will improve our understanding of the roles that they and their proteins play within
cnidarians and, in turn, of how Ngb differs among eumetazoans.
Figure 1.3: General cnidarian morphology; (A) overview and (B) single celled dermal layers (Technau
& Steele, 2011).
Neuroglobins are an early branching and ancient member of the globin gene
superfamily in Eumetazoa (Burmester et al., 2000; Hoogewijs et al., 2008) and are
currently thought to have multiple functions in vertebrate neural cells (Burmester et
al., 2000; Burmester et al., 2009; Watanabe et al., 2012). Within vertebrates, the
function of Ngbs has yet to be fully characterised, but studies have shown that they
have roles in oxygen supply, storage, and interactions with mitochondria for cellular
respiration and signalling (Burmester et al., 2000; Burmester et al., 2009; Burmester
et al., 2004; Ruetz et al., 2017; Singh et al., 2013; Watanabe et al., 2012; Hoffman et
al., 2010; Teixerira et al., 2013). Some Ngb-like genes have been identified outside
of vertebrate taxa, such as in phyla Nematoda and Cnidaria. Thirty-three globin
genes have been identified in the nematode Caenorhabditis elegans, and many of
these genes have a broad distribution among other nematode species (Hoogewijs et
al., 2008). The high copy number of nematode globin genes is indicative of repeated
rounds of gene duplication increasing gene number in this phylum. In C. elegans
there is also evidence that many of the Ngb-like genes have taken on new roles, as
32 of the 33 genes show tissue specific expression profiles throughout its
body plan (Hoogewijs et al., 2008). This research, along with the study of C.
hemisphaerica, has begun to address the knowledge gap for the evolution and
expression of globin genes in eumetazoan phyla outside of phylum Chordata, but
many more studies in other phyla are needed.
1.4.2 EVOLUTION OF THE GLOBIN GENE SUPERFAMILY
While the globin gene superfamily has been extensively studied, its
evolutionary history is still poorly understood in early-diverging eumetazoan taxa.
The globin gene family is found throughout all kingdoms of life (Freitas et al., 2004;
Hardison, 1996; Hoogewijs et al., 2008; Lechauve et al., 2013), and has a prokaryote
origin (Freitas et al., 2013). This gene family is widespread but has been extensively
studied mainly in vertebrate species. To understand the evolution of the globin gene
superfamily beyond the scope of vertebrates, the study of other taxa is required.
Phylogenetic analyses have partially elucidated the ancestral history of
vertebrate globin genes with the identification of three early-diverging genes:
androglobin, neuroglobin and globin-X (Burmester et al., 2000; Burmester &
Hankeln, 2014; Hoogewijs et al., 2011; Roesner et al., 2005). The identification of
Ngb-like genes in early-diverging eumetazoans suggests that a globin-like gene was
most likely the ancestral globin gene present in the last common ancestor of phylum
Cnidaria and superphylum Bilateria (Hoogewijs et al., 2008; Lechauve et al., 2013).
In fact, phylogenetic analyses revealed that globin genes are present in most early-
diverging phyla, and form a clade with vertebrate Ngb and GbX genes (Figure 1.4)
(Lechauve et al., 2013). This conclusion is based on very limited sampling within
phylum Cnidaria and more research is needed to support or refute this idea. Overall
this lack of knowledge exposes the need for further research into the diversity and
evolution of globin genes in Eumetazoa.
Figure 1.4: Unrooted molecular phylogeny based on multiple alignments of a subset of 84 sequences
that comprise 138 amino acids of Ngb, Ngb-like, hemoglobin, myoglobin, and cytoglobin sequences
from diverse phyla (modified from Lechauve et al., 2013).
Gene duplication is one of the most important molecular mechanisms for the
generation of copy number variation and gene diversity within metazoan gene
families. Three processes give rise to gene duplication events: unequal crossing over,
chromosomal or genome duplication events and retroposition of mRNA transcripts
(Ohno, 1969; Zhang, 2003). The presence of two Ngb-like genes in C.
hemisphaerica (Lechauve et al., 2013) shows that there has been at least one gene
duplication event in phylum Cnidaria. The best evidence for the role of gene
duplication in the evolution of globin gene diversity, however, comes from the seven
globin subfamilies found exclusively in vertebrates. These seven globin subfamilies
are thought to have evolved from an ancestral globin gene present in the last
common ancestor of vertebrates (Burmester & Hankeln, 2014). In fact, phylogenetic
and comparative genomic analysis has revealed that the seven different globin
subfamilies unique to vertebrates are the result of at least two rounds of whole
genome duplication and a number of single gene duplication events (Hoffman et al.,
2012; Ohno, 1969). Consequently, gene and genome duplication events followed by
mutation have expanded the diversity of proteins encoded by globin genes in
vertebrates. The expansion of diversity has also resulted in an expansion of function,
with mutation and fixation resulting in subfunctionalisation and/or
neofunctionalisation of the different globin subfamilies into the vertebrate globin
repertoire.
Subfunctionalisation and neofunctionalisation can occur following gene
duplication events where one duplicate copy accumulates new mutations and gains a
new or related function (Ohno, 1969; Zhang, 2003). Fixation of these processes has
resulted in the current diversity of globin genes in vertebrates. Each subfamily has
undergone gene duplication followed by subfunctionalisation or neofunctionalisation
from the ancestral globin gene (Hoffman et al., 2012; Zhang, 2003). One example of
these processes is the recent characterisation of an expansion and subsequent
subfunctionalisation and/or neofunctionalisation for myoglobin genes in lungfish
(Koch et al., 2016). These genes show similar tissue specificity as myoglobin and
Ngb with gene expression predominantly in muscle, eye and brain tissues (Figure
1.5). They have also been suggested to replace the function of cytoglobin and Ngb,
as these two genes were not present. The four main globin subfamilies (hemoglobin,
myoglobin, cytoglobin and neuroglobin) are found throughout most vertebrates and
are a representation of the diversity of the globin superfamily that has revealed tissue
and development specific expression, as well as different functional roles (Brunori et
al., 2005; Burmester et al., 2002; Burmester et al., 2014; Fuchs et al., 2005; Koch et
al., 2016; Stamatoyannopoulos, 2005). Improving our understanding of globin genes
in different eumetazoan phyla may reveal that early-diverging species, other than
nematodes, have undergone similar evolutionary mechanisms as seen in vertebrates.
Figure 1.5: Expression of myoglobin mRNA in selected Protopterus annectens tissue samples, as
estimated by qRT-PCR (modified from Koch et al., 2016). Legend: gene copy reference.
Overall, metazoan globin proteins all share a largely similar function for
binding gaseous compounds (Burmester & Hankeln, 2014; Pesce et al., 2003).
Despite this, in some instances significant differences can be seen at the molecular
level that can alter their functional properties (Dewilde et al., 2005; Fuchs et al.,
2005; Koch et al., 2016), as well as having different tissue and development
expression patterns (Burmester et al., 2004). For example, in vertebrates, hemoglobin
forms a tetramer for the supply and transport of oxygen around a circulating system,
whereas Ngb forms a monomer and has multiple functions within neural cells
including oxygen supply (Burmester et al., 2000; Burmester et al., 2004; Burmester
& Hankeln, 2014; Pesce et al., 2003). Furthermore, there is a foetal and adult
developmental form of hemoglobin (Stamatoyannopoulos, 2005) but only a single
known form of Ngb and GbX. While only tissue specific expression has been
observed in C. hemisphaerica, it is unknown whether duplicated globin genes in
phylum Cnidaria display structural and functional differences at the protein level.
The knowledge gaps surrounding globin protein structure and function reveal the
importance of identifying globin genes in early-diverging eumetazoan phyla, such as
Cnidaria.
1.4.3 GLOBIN STRUCTURAL VARIATION AND LIGAND BINDING POTENTIAL
The structure and conformation of globin proteins influence their binding
potential for gaseous compounds, which can give insights into their function and
evolution (Bocahut et al., 2013; Borhani et al., 2015; Fago et al., 2006; Jayaraman et
al., 2011; Kriegl et al., 2002; Ramos-Alvarez et al., 2013). Recent studies have
expanded the knowledge of Ngb protein structure and function in phylum Chordata
(Dewilde et al., 2001; Jayaraman et al., 2011; Kiger et al., 2011), which can be used
to elucidate the functional characterisation of potential globin proteins in cnidarians.
This literature focuses on the conformational changes of the heme binding site
(Burmester et al., 2004; Pesce et al., 2003; Ota et al., 1997) and ligand binding
potential (Dewilde et al., 2001; Fago et al., 2006; Hoffman et al., 2010). There are
two conformations within globin proteins, pentacoordination and hexacoordination
(Bocahut et al., 2013; Dewilde et al., 2001; Fago et al., 2006; Jayaraman et al.,
2011). The vertebrate repertoire predominantly consists of pentacoordinate globin
proteins, with hexacoordination represented only in Ngb, GbX, androglobin and
cytoglobin (Figure 1.2) (Burmester & Hankeln, 2014). Both hexacoordinated and
pentacoordinated globin proteins have different binding potentials for gaseous
molecules such as oxygen, CO (Azarov et al., 2016; Dewilde et al., 2001; Fago et al.,
2006) and NO (Jayaraman et al., 2011; Tejero et al., 2015). The variation between
hexacoordinate and pentacoordinate conformations is the result of a single amino
acid replacement at the E7 helical position (Figure 1.6). This residue determines the
efficiency of gaseous compounds that can be bound and unbound to the heme group
(Azarov et al., 2016; Dewilde et al., 2001; Fago et al., 2006; Jayaraman et al., 2011;
Tejero et al., 2015). This is especially important as the pentacoordinate conformation
has been shown to reduce the impact of hypoxia due to nitric oxide (Jayaraman et al.,
2011) and carbon monoxide toxicity (Azarov et al., 2016; Dewilde et al., 2001; Fago
et al., 2006). Vertebrate Ngbs have hexacoordinate conformation, but have structural
modifications that alter the E7 position heme binding site to conform to the
pentacoordinate structure (Bocahut et al., 2013; Jayaraman et al., 2011), thus
allowing for greater autoxidation efficiency for heme binding and unbinding (Tejero
et al., 2015). Currently, there are no studies that have closely investigated the
structure and function of cnidarian globin proteins. Modelling the structure of
cnidarian globin proteins will help to address this knowledge gap.
Figure 1.6: Predictive structures of unliganded wild-type and mutant Ngb; focused on the heme pocket
(modified from Azarov et al., 2016).
The binding potential of Ngb in vertebrates varies based on environmental
conditions, but is limited mainly by the transition from its initial hexacoordinate
state to the pentacoordinate conformation (Fago et al., 2006; Jayaraman et al., 2011; Kriegl et al., 2002).
Studies have shown that the kinetics of oxygen, NO, CO and hydrogen sulphide
(H2S) ligand binding can be altered by pH, temperature, and globin and ligand
concentrations (Bocahut et al., 2013; Borhani et al., 2015; Fago et al., 2006;
Nienhaus et al., 2004; Ramos-Alvarez et al., 2013). By removing these
environmental variables, the binding potential of gaseous compounds is dependent
on globin protein structure around the heme pocket (Bashford et al., 1987; Brunori et
al., 2005; Dewilde et al., 2001; Giuffre et al., 2008). Studies with hexacoordinate
Ngb show that there is a faster binding mechanism involved in pentacoordinate
dissociation with specific distal and alternate side-chain residues limiting the release
of the ligands (Bocahut et al., 2013; Brunori et al., 2005; Giuffre et al., 2008; Kriegl
et al., 2002). Identifying the similarities between these studies and the key residues in
cnidarian globin proteins will help to uncover their potential binding properties.
Vertebrate globins typically have a greater binding affinity and stability to oxygen
for transport and storage than other gaseous molecules. Neuroglobin, however, has
been observed to have less affinity to oxygen and greater affinity for nitric oxide and
carbon monoxide (Brunori et al., 2005; Kiger et al., 2011; Kriegl et al., 2002),
suggesting a cellular detoxification function to be more likely than transport and
storage. Functionality in the proteins encoded by cnidarian globin genes is still to be
established but they would likely have similar gaseous affinities and biological
functions as vertebrate globins given the basic cellular characteristics observed in all
metazoan systems. Elucidating functionality and affinity of cnidarian globins for
various gases will enable further study into the mechanisms involved and how novel
functions have arisen throughout the evolutionary history of globin genes in
Eumetazoa.
1.4.4 DETOXIFICATION OF DELETERIOUS MOLECULES
Globin proteins have various physiological roles other than oxygen storage and
transport, such as the removal of toxic and deleterious molecules. These molecules
can accrue from normal biological processes or external sources due to absorption
into an organism. The removal or detoxification of deleterious molecules is
necessary in order to maintain cell homeostasis in aerobic organisms. There are three
molecules that can have toxic effects and have been studied in relation to the
different globin subfamilies: CO, NO and H2S. Hemoglobin and myoglobin have
been shown to reduce NO and H2S toxicity (Bostelaar et al., 2016; Flögel et al.,
2001; Vitvitsky et al., 2015), and cytoglobin has been shown to reduce NO toxicity
(Hundahl et al., 2013). Neuroglobin, however, can reduce the toxicity of all three of
these molecules (Azarov et al., 2016; Ruetz et al., 2017; Singh et al., 2013),
subsequently suggesting a detoxification role rather than oxygen storage and
transport. While there is still very limited research into the detoxification function of
these four globin proteins, the current literature suggests that each globin protein is
similar and that it is protein efficacy as well as tissue specificity that distinguishes
them from each other. Therefore, identifying and characterising Ngb and Ngb-like
genes in early-diverging species will give us insights into the function of this globin
subfamily outside of vertebrates.
The discovery and subsequent functionality of Ngb is still ongoing, with
several key attributes and roles being identified in vertebrates; most recently, the role
of detoxification (Azarov et al., 2016; Brunori et al., 2005; Ruetz et al., 2017; Singh
et al., 2013). The globin superfamily is highly complex and further studies are
needed to improve our knowledge of the detoxification role and why it is necessary
in eumetazoans. Recent literature has identified the capability of Ngb to neutralise
CO (Azarov et al., 2016), NO (Brunori et al., 2005; Singh et al., 2013) and H2S
(Ruetz et al., 2017) that are in excess within an organism. Azarov et al. (2016) found
that by selectively mutating the distal histidine residue in Ngb, the affinity and
binding potential of Ngb to CO increased. This was primarily achieved due to the
structural conformation change from a hexacoordinate state to a pentacoordinate
state. Detoxification is important in maintaining homeostasis in neural cells and
mitochondria of vertebrates, and therefore is likely to be necessary in other
eumetazoan groups. Detoxification is especially important for cnidarians, as
diffusion is their only known physiological process that transports gaseous molecules
in and out of their cells and tissues (Brusca & Brusca, 2003; Technau & Steele,
2012). If diffusion is impaired or inefficient at removing excess molecules then it is
logical that a globin protein would supplement this role. Consequently,
characterising the protein structures of globin genes in cnidarians will provide a
better understanding for the role of globin proteins in less complex morphologies, as
well as elucidating the similarities and differences between cnidarian and vertebrate
globin proteins.
1.4.5 ENVIRONMENTAL STRESS IN AQUATIC SPECIES
Aquatic species have been examined for the expression of globin genes under
oxygen stress conditions for a few model vertebrate species; however, information on
the expression of globin genes is lacking in non-model and non-vertebrate species.
Vertebrate studies show that gene expression and protein abundance vary in response
to hypoxia (Roesner et al., 2006; Roesner et al., 2008). For example, zebrafish Ngb
gene transcription and protein abundance increased up to five-fold under hypoxic
conditions (Roesner et al., 2006). Under similar conditions, however, goldfish Ngb
protein abundance was five-fold greater than in zebrafish, even though
goldfish Ngb transcription remained unchanged (Roesner et al., 2008). This
difference in protein abundance is suggested to be the result of adaptations to their
respective environmental conditions (Roesner et al., 2008). This is important when
comparing cnidarian species from different habitats. Intertidal cnidarians would
repeatedly be under hypoxic stress from emersion and consequently should have a
similar response mechanism to that seen in goldfish (Roesner et al., 2008). Conversely,
cnidarians that are always submerged would likely have a different response, as seen
in zebrafish (Roesner et al., 2006), as it is less likely that they would be under
hypoxic stress. Understanding the adaptations between different cnidarian species
would elicit a greater understanding as to the origins of hypoxic endurance and how
globin genes have expanded their functionality beyond a simple oxygen carrying and
storage protein. Furthermore, understanding the relationship between the expansion
of globin genes in a broad diversity of non-model taxa and different environments
would expand our knowledge of gene duplication.
Heat stress from natural environmental conditions is an important factor in
gene expression, particularly for chemical toxicity in cnidarians. However, there has
been very limited research for compounds such as nitric oxide. Nitric oxide is an
important compound across metazoan taxa for cellular processes (particularly cell
signalling) but can have toxic effects when overproduced, such as symbiont
expulsion in corals and sea anemones (Bouchard et al., 2008; Perez et al., 2006;
Trapido-Rosenthal et al., 2005). Expression of globin genes in cnidarians could
mitigate high concentrations of NO, and potentially CO and H2S, particularly for
species that are consistently under heat stress, such as A. tenebrosa in the intertidal
zone. Interestingly, Perez et al. (2006) studied the effect of heat stress and NO
concentration on symbionts, but did not report on the survivability/mortality of E.
pallida. Consequently, it is still unknown whether globin genes have a detoxification
role in cnidarians; however, observations of Ngb activity in reducing NO toxicity in
vertebrates (Singh et al., 2013) suggest that cnidarian globin proteins are likely to
have a similar functional role.
1.4.6 SUMMARY AND IMPLICATIONS
The evolution of the globin gene superfamily is well-established in vertebrates,
and by analysing taxa from an early-diverging lineage (phylum Cnidaria), a better
understanding of globin gene evolution across Eumetazoa can be gained. Current
evidence has shown that globin genes are ubiquitous across all kingdoms of life,
revealing that they are likely essential to the survival of aerobic organisms. The
diversity and distribution of globin genes found within specific taxonomic groups is
dependent on the globin subfamily and gene copy number. By identifying and
evaluating each globin subfamily and their corresponding copy number in cnidarians,
this research will be able to gain insights into the evolutionary history of the globin
gene superfamily in early-diverging taxa from Eumetazoa. Individual globin
subfamilies, such as myoglobin or Ngb, found in specific vertebrate species have
undergone expansions of these specific genes. These expansions are the result of
repeated rounds of gene duplication events, with an increase in functional diversity
from the result of subfunctionalisation and/or neofunctionalisation after duplication.
In phylum Cnidaria, only two globin genes have been identified i.e., Ngb-like genes
in C. hemisphaerica. A more thorough bioinformatics approach across multiple
cnidarian classes will elucidate a more complete repertoire of globin genes found
within this phylum. Furthermore, the evolution of globin genes in cnidarians would
likely be similar to vertebrates due to the same mechanisms involved: gene
duplication, subfunctionalisation and neofunctionalisation. These mechanisms have
resulted in variations in gene sequences, protein structures and functional roles in
vertebrates, and consequently these variations would be seen in cnidarians.
The function of previously characterised metazoan globin proteins is primarily
for oxygen storage and transport, but recent literature has revealed a range of other
physiological roles are undertaken by different globin subfamilies. This variation in
function is largely associated with alterations in protein structure and expression
patterns. There are two structural conformations observed in globin proteins:
pentacoordination and hexacoordination. These structural variations have an
important role in multicellular systems for their binding and affinity of gaseous
ligands. In particular, the binding and detoxification of toxic molecules, such as nitric
oxide and carbon monoxide, would be an important functional role needed by
cnidarians that inhabit dynamic environments, specifically sea anemones.
Subsequently, the identification of globin protein structure in these organisms will
further our understanding of the evolution of the globin superfamily and the possible
function of the ancestral globin protein. Defining protein structure and function will
also improve our understanding of the significance of gene expression, especially
for the comparison between species of different morphologies and environments.
Tissue and development specific gene expression profiles have been well-
established for vertebrate globin genes. Throughout vertebrates, there are variations
to expression patterns of each globin subfamily; however, there are also instances of
differential expression within the same subfamily. These variations are associated
with expansions of an individual subfamily, such as tissue specific expression of Ngb
in nematodes and myoglobin in lungfish. Consequently, it would be congruent that
the tissue expression variations seen in these species, as well as development
expression variations seen in other vertebrates, would also be observed in the
cnidarian globin gene expansion that has been proposed. Furthermore, vertebrate
globin expression varies based on the environmental pressures they have been
exposed to, thus similar variations would be expected for cnidarian gene expression
profiles between species of different environmental habitats. Analysis of cnidarian
species, specifically sea anemones, from different habitats but similar ancestry will
elucidate a greater understanding of the globin gene repertoire and expression within
early-diverging species. Moreover, the similarities and differences of globin gene
expression profiles between cnidarians and vertebrates, and within cnidarian taxa
will further our understanding of the globin gene superfamily.
Chapter 2: Methods and Results
2.1 MATERIALS AND METHODS
2.1.1 Transcriptome construction and quality checking
Transcriptome datasets generated with Illumina platforms were obtained from
NCBI GenBank: Acropora digitifera (PRJNA309168; Mohamed et al., 2016),
Actinia tenebrosa (SRX1604071), Alatina alata (SRX978662), Anthopleura
buddemeieri (SRX1604661; Van Der Burg et al., 2016), Aulactinia veratra
(SRX1614867; Van Der Burg et al., 2016), Aurelia aurita (PRJNA252562;
Brekhman et al., 2015), Calliactis polypus (SRX1614869; Van Der Burg et al., 2016),
Chironex fleckeri (SRX891607), Corallium rubrum (SRX675792; Pratlong et al.,
2015), Hydractinia polyclina (SRX315374), Nemanthus annamensis (SRX1634628;
Van Der Burg et al., 2016) and Protopalythoa variabilis (SRX978667). Trinity de novo
assembler software (v2.0.6) was used to assemble high quality reads (> Q30, < 1%
ambiguities) into contiguous sequences (contigs) (Haas et al., 2013). Default settings
were used with the addition of Trimmomatic to remove low quality reads and
adaptors (Haas et al., 2013). Redundant and chimeric sequences present in the
transcriptome were removed using CD-HIT (v4.6.1) by clustering sequences with >
95% similarity into a single contig (Fu et al., 2012). The quality and completeness of
the transcriptome assemblies were determined with CEGMA to report the presence
of the 248 core eukaryotic genes (CEG) that were complete (> 70% alignment with
CEG protein) (Parra et al., 2007) and BUSCO to report the presence of the 978
single-copy orthologs in metazoans that were complete (Simão et al., 2015). A
CEGMA and BUSCO score of > 80% was considered high quality.
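The quality criterion above can be sketched as a simple check. This is a minimal illustration rather than the pipeline used in this study: the function names and any example counts are hypothetical, and completeness is assumed to be expressed as the percentage of the 248 CEGs or 978 BUSCOs recovered as complete.

```python
# Hypothetical helper; names and thresholds mirror the text, not a published tool.
CEG_TOTAL = 248      # core eukaryotic genes assessed by CEGMA
BUSCO_TOTAL = 978    # metazoan single-copy orthologs assessed by BUSCO
THRESHOLD = 80.0     # percent; scores above this were treated as high quality


def completeness(n_complete: int, n_total: int) -> float:
    """Return the percentage of reference genes recovered as complete."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_complete / n_total


def is_high_quality(ceg_complete: int, busco_complete: int) -> bool:
    """True when both CEGMA and BUSCO completeness exceed the threshold."""
    return (completeness(ceg_complete, CEG_TOTAL) > THRESHOLD
            and completeness(busco_complete, BUSCO_TOTAL) > THRESHOLD)
```

Requiring both scores to pass is one reading of the criterion; an assembly could equally be reported against each score separately.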
2.1.2 Candidate gene identification
BLAST searches using vertebrate globin sequences against the genomes of
Nematostella vectensis (Putnam et al., 2007), Acropora digitifera (Shinzato et al.,
2011), Hydra vulgaris (Chapman et al., 2010), Trichoplax adhaerens (Srivastava et
al., 2008), Amphimedon queenslandica (Srivastava et al., 2010) and Mnemiopsis
leidyi (Ryan et al., 2013) were conducted. Potential globin gene sequences were
extracted from genome scaffolds and transcripts.
Transcriptomes were annotated using the SwissProt database (Haas et al.,
2013) within the Trinotate software package (v2.0.6) with an e-value stringency of
1e−6. A custom BLAST database was created using globin genes annotated in the N.
vectensis and H. vulgaris genomes. Transcriptomes were locally blasted against this
custom database to ensure any predicted proteins were contained in the globin gene
candidate list. Candidate sequences were further scrutinised against the genome and
transcriptome assemblies of the dinoflagellate, Symbiodinium minutum, using the
Okinawa Institute of Science and Technology Graduate University’s Marine
Genomics Unit genome browser (http://marinegenomics.oist.jp/symb/viewer/info?project_id=21).
Sequences with an e-value < 1e-6 were considered as potential
dinoflagellate genes and subsequently removed from downstream analyses.
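The contaminant screen can be illustrated with a short sketch. It assumes the search results are available as BLAST tabular output (-outfmt 6), in which column 1 is the query identifier and column 11 is the e-value; the searches in this study were run through the genome browser, so this file format and the function names are assumptions made for illustration only.

```python
import csv
import io

EVALUE_CUTOFF = 1e-6  # hits below this were treated as potential dinoflagellate genes


def dinoflagellate_like(blast_tsv: str, cutoff: float = EVALUE_CUTOFF) -> set[str]:
    """Return query IDs with at least one hit whose e-value is below the cutoff.

    Assumes 12-column BLAST tabular output: query ID in column 1,
    e-value in column 11.
    """
    flagged = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip blank, truncated or comment lines
        if float(row[10]) < cutoff:
            flagged.add(row[0])
    return flagged


def remove_contaminants(candidates: list[str], blast_tsv: str) -> list[str]:
    """Drop candidates flagged as potential dinoflagellate sequences."""
    flagged = dinoflagellate_like(blast_tsv)
    return [c for c in candidates if c not in flagged]
```

Candidates with no hit at all against the dinoflagellate assemblies are retained, since they never appear in the tabular output.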
Candidate genes were assigned a custom nomenclature using the OrthoMCL
database (http://orthomcl.org/orthomcl/) (Chen et al., 2006). Candidate genes were
translated into protein sequences and queried against the OrthoMCL database to
assign these genes to orthologous groups. They were assigned a custom
nomenclature based on species and the best orthologous protein hit (Appendix A:
Supplementary Table 2.1).
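The naming convention can be illustrated with a short sketch. The format below is inferred from the gene labels reported in this chapter (for example, N.vectensis_tadh6000210) and is an assumption; the function name is hypothetical, and the authoritative assignments are those listed in Supplementary Table 2.1.

```python
def custom_name(species: str, ortholog_id: str) -> str:
    """Build a label of the assumed form 'G.epithet_orthologID'.

    species is a binomial ('Genus epithet'); ortholog_id is the identifier of
    the best orthologous hit. Non-alphanumeric characters in the identifier
    are dropped (an assumption about how separators were handled).
    """
    genus, epithet = species.split()
    clean_id = "".join(ch for ch in ortholog_id if ch.isalnum())
    return f"{genus[0]}.{epithet}_{clean_id}"
```

For instance, custom_name("Nematostella vectensis", "tadh6000210") gives "N.vectensis_tadh6000210", matching the label style used in the Results.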
2.1.3 Candidate gene validation and interrogation
Candidate sequences from A. tenebrosa and Exaiptasia pallida were validated
using PCR amplification and Sanger sequencing. Primers were designed using the
NCBI primer design tool (Ye et al., 2012) in order to amplify the entire open reading
frame of the candidate globin genes (Appendix A: Supplementary Table 2.2). PCR
amplification of candidate genes was achieved using the MyFi2x Taq Polymerase Kit
(BIOLINE): 12.5 µL MyFi2x polymerase master mix, 9.5 µL ddH2O, 1 µL 10 pmol
forward primer, 1 µL 10 pmol reverse primer and 1 µL (20-50 ng) cDNA template.
Sanger sequencing was completed using a modified BigDye Terminator v3.1
protocol (Applied Biosystems). Sanger sequences were aligned and mapped back to
ORFs of the candidate gene they were designed from to validate assembly of
candidate globin genes in these two species.
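The per-reaction volumes listed for the PCR sum to 25 µL, and scaling them for several reactions is simple arithmetic. The sketch below is illustrative only: the function names and the pipetting overage are hypothetical and are not part of the protocol described here.

```python
# Per-reaction volumes in microlitres, as listed in the PCR protocol above.
REACTION_UL = {
    "MyFi2x master mix": 12.5,
    "ddH2O": 9.5,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "cDNA template": 1.0,
}


def total_volume(reaction: dict[str, float] = REACTION_UL) -> float:
    """Total volume of a single reaction in microlitres."""
    return sum(reaction.values())


def master_mix(n_reactions: int, overage: float = 0.1) -> dict[str, float]:
    """Scale the shared reagents for n reactions.

    The cDNA template is excluded because it is added to each tube
    separately. 'overage' is a hypothetical pipetting allowance
    (0.1 means 10% extra), not a value taken from the protocol.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in REACTION_UL.items()
            if name != "cDNA template"}
```

Calling total_volume() returns 25.0, confirming that the listed components make up a 25 µL reaction.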
Validated candidate sequences from E. pallida were mapped back to the
genome (Baumgarten et al., 2015) and the intron-exon structures were interrogated.
Candidate genes were used as blast queries against the genome to identify the
scaffolds they occurred on. Sequences were mapped back to these scaffolds using the
Geneious software (v9), and intron-exon boundaries were determined using the
typical GT/AG splicing rule.
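The GT/AG rule can be expressed as a short check on the sequence between two mapped exons. This is a minimal sketch under stated assumptions: coordinates are 0-based and end-exclusive, the gene lies on the forward strand of the scaffold (a reverse-strand gene would first need to be reverse complemented), and the function names are hypothetical rather than part of Geneious.

```python
def introns_from_exons(exons: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the intervals lying between consecutive exons.

    exons must be sorted, non-overlapping (start, end) pairs using
    0-based, end-exclusive coordinates on the forward strand.
    """
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def follows_gt_ag(scaffold: str, intron: tuple[int, int]) -> bool:
    """True when the intron begins with GT and ends with AG."""
    start, end = intron
    seq = scaffold[start:end].upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")
```

A gene with three exons yields two introns from introns_from_exons, and each can be passed to follows_gt_ag to confirm that the mapped boundaries are canonical.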
2.1.4 Phylogenetic Analysis
The distribution and diversification of globin-like genes was analysed using
Maximum Likelihood and Bayesian Inference phylogenetic methods. Exonic
nucleotide sequences comprising the globin protein domain (PFAM ID: PF00042)
were aligned in MEGA (v6.06) (Tamura et al., 2013) using MUSCLE codon modelling
(Edgar, 2004). The best fit model test was conducted in MEGA (v6.06) for all
phylogenetic trees. Subsequent phylogenetic analyses were conducted using IQ-
TREE (http://iqtree.cibiv.univie.ac.at) (Trifinopoulos et al., 2016) and MrBayes
(v3.2) (Ronquist et al., 2012). All phylogenies for nucleotide sequences were
undertaken using the General Time Reversible model with gamma distribution and
invariant sites. Maximum likelihood analyses were completed for 1,000 ultrafast
bootstrap replications, using Codon F3x4 state frequency, ascertainment bias
correction and 0.95 minimum correlation coefficient. Bayesian Inference analyses
were completed for 10,000,000 MCMC generations, sampling every 1,000th
generation, and used a default burn-in value of 25% (25,000 samples). The outgroup
used for all phylogenetic trees was a S. minutum gene containing a single globin
domain.
Outputs from IQ-TREE and MrBayes were further analysed using topology
tests in IQ-TREE (Trifinopoulos et al., 2016) and ancestral state reconstruction in
MrBayes (Ronquist et al., 2012), respectively. Topology testing, with the same
settings as above, was used to analyse ML trees and manually constrained trees using
the Approximately Unbiased test (Shimodaira, 2002). Constrained trees forced a
monophyletic node for cnidarian sequences with either Ngb or Ngb and GbX.
Ancestral state probabilities for three different constrained nodes (cnidarian
sequences with either Ngb, Ngb and GbX or GbX) were completed with default
settings as specified in the manual (Ronquist et al., 2012), with the following
changes: 2,000,000 MCMC generations, sampling every 2,000th generation, and
diagnosis frequency every 50,000th generation.
2.1.5 Protein modelling prediction
Validated candidate globin genes for A. tenebrosa and E. pallida were used to
model predictive protein structures. Amino acid sequences were input into RaptorX
(Källberg et al., 2012) to align against the Protein Data Bank with a stringency value
of ≤ 1e-3. This relatively low value was used due to the lack of invertebrate globin-
like protein structures available. Predictive models were subsequently loaded into the
Chimera protein editor (Pettersen et al., 2004) to annotate and visualise protein
structures, and manually align candidate cnidarian globin proteins for comparative
analyses.
2.1.6 Differential gene expression analysis
Tissue and development specific transcriptome datasets generated with
Illumina platforms were obtained from NCBI GenBank: A. tenebrosa
(PRJNA350366) and N. vectensis (PRJEB13676) for tissue data, and N. vectensis
(PRJNA213177) and E. pallida (PRJNA261862) for developmental data. Each
individual dataset was assembled with all raw reads combined into a single assembly
as per the above Transcriptome construction and quality checking section. The
Trinity RNA-seq software pipeline for determining differential gene expression was
used for each of the assembled datasets (Haas et al., 2013). Raw reads were mapped
back to the relevant combined assembly to obtain transcript abundance using the
RSEM estimation method (Li & Dewey, 2011) and the Bowtie alignment method
(Langmead et al., 2009). Principal components analysis was performed on the RSEM
abundance count and normalised FPKM data outputs to ensure no batch effects were
present. Differential expression was conducted using default settings in the Trinity
RNA-seq software pipeline for the edgeR method with a dispersion value of 0.1
(Haas et al., 2013; Robinson et al., 2010). Differentially expressed genes were
considered significant provided they had a false discovery rate (FDR) of < 1e-3.
Heatmaps were constructed in the Trinity RNA-seq software pipeline using default
Perl to R sample correlation matrix settings for the normalised data and a log2 fold
change centred on the mean, with minimum row and column expression values of 0
(Haas et al., 2013).
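The significance filter can be sketched as follows. edgeR reports false discovery rates itself, so the hand-written Benjamini-Hochberg adjustment below is only an illustration of how such values are derived from raw p-values; the function names and inputs are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(genes: list[str], pvalues: list[float],
                fdr_cutoff: float = 1e-3) -> list[str]:
    """Genes whose adjusted p-value falls below the FDR cutoff used here."""
    fdr = benjamini_hochberg(pvalues)
    return [gene for gene, q in zip(genes, fdr) if q < fdr_cutoff]
```

Because the adjustment depends on the number of tests, the same raw p-value can pass the 1e-3 cutoff in a small comparison and fail it in a transcriptome-wide one.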
2.2 RESULTS
2.2.1 Transcriptome assembly and candidate gene validation
All assembled transcriptomes were high quality based on their N50 values, and
CEGMA and BUSCO completeness scores (Appendix A: Supplementary Table 2.3).
All transcriptomes had N50 values > 1,000, with the exception of A. aurita. All
transcriptomes were largely complete with CEGMA and BUSCO scores > 80%, with
the exception of Chironex fleckeri.
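For reference, N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of the calculation is given below; the function name is hypothetical and the values reported in Supplementary Table 2.3 were not produced with this code.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Accumulate from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for a non-empty list of positive lengths")
```

For contigs of lengths 2 to 10 (54 bases in total), the three longest contigs already cover 27 bases, so n50 returns 8.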
Analysis of genome sequences from early-diverging lineages (N. vectensis, A.
digitifera, H. vulgaris, T. adhaerens, A. queenslandica, and M. leidyi) identified a
total of 23 globin-like genes in the six different taxa examined (Appendix A:
Supplementary Table 2.4; Appendix A: Supplementary Figure 2.1). The three
cnidarian species N. vectensis, A. digitifera, and H. vulgaris had nine, three and four
globin-like genes, respectively, while T. adhaerens (Placozoa) had five globin-like
genes and M. leidyi (Ctenophora) and A. queenslandica (Porifera) had one each.
Analysis of transcriptomic data identified a total of 74 globin-like genes from
15 different cnidarian taxa (Appendix A: Supplementary Table 2.4; Appendix A:
Supplementary Figure 2.1). Cnidarian species had up to 10 globin-like genes.
Compared to genome analyses, an additional globin-like gene was identified in the
N. vectensis transcriptome (ortholog reference: N.vectensis_tadh6000210), and two
additional candidate genes identified in the A. digitifera transcriptome (ortholog
reference: A.digitifera_nvec42000019, A.digitifera_nvec76000030). The order
Actiniaria had the greatest number with between 5 and 10 globin-like genes identified in
all species. In the medusozoan classes, A. alata and C. fleckeri (Cubozoa) each had
one globin-like gene, A. aurita (Scyphozoa) had five globin-like genes, while H.
polyclina (Hydrozoa) had four globin-like genes.
Seven globin-like genes in A. tenebrosa and nine globin-like genes in E.
pallida were validated using Sanger sequencing with identity matches of ≥ 99.8%
and ≥ 98.6%, respectively. A nonsynonymous mutation was observed at nucleotide
position 139 for A. tenebrosa ortholog reference A.tenebrosa_nvec76000030,
resulting in an amino acid change from Lysine (K) to Glutamic Acid (E).
Synonymous and nonsynonymous mutations (0-5 and 0-2, respectively)
were observed in all E. pallida candidate genes (Appendix A: Supplementary Table
2.5), which likely reflects the different sampling locations: Saudi Arabia (NCBI
accession number: PRJNA261862) versus Australia. The GbX membrane binding
motif (MGC) (Blank et al., 2011) was identified in six A. tenebrosa globin-like
proteins and seven E. pallida globin-like proteins. The majority of candidate globin
genes in sea anemones had the membrane binding motif, whereas the majority of
candidate globin genes in all other cnidarians analysed lacked this motif.
Additionally, the 2/3 intron-exon structure is present in all nine sequences of E.
pallida (Appendix A: Supplementary Table 2.6), with predicted intron start locations
at helix positions B12.2 and G7.0, typical of metazoan globin genes.
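The distinction between synonymous and nonsynonymous differences reported above can be illustrated with a short Python sketch. It assumes two gap-free, in-frame, upper-case coding sequences of equal length and the standard genetic code; the function names and the toy sequences are hypothetical. Each differing site is classified by comparing the translation of its whole codon, which is a simplification when a codon carries more than one change. Under 1-based in-frame numbering from the start codon (an assumption), nucleotide 139 is the first base of codon 47, and a Lysine to Glutamic Acid change at a first codon position corresponds to AAA to GAA or AAG to GAG.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitutions(reference, query):
    """List differing sites as (1-based position, ref aa, query aa, kind)."""
    if len(reference) != len(query) or len(reference) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    changes = []
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3]
        qry_codon = query[start:start + 3]
        ref_aa, qry_aa = CODON_TABLE[ref_codon], CODON_TABLE[qry_codon]
        kind = "synonymous" if ref_aa == qry_aa else "nonsynonymous"
        for offset in range(3):
            if ref_codon[offset] != qry_codon[offset]:
                changes.append((start + offset + 1, ref_aa, qry_aa, kind))
    return changes


def percent_identity(reference, query):
    """Percentage of identical positions between two equal-length sequences."""
    matches = sum(a == b for a, b in zip(reference, query))
    return 100.0 * matches / len(reference)


# Toy sequences for illustration only (not thesis data).
assembled = "ATGAAAGGC"
sanger = "ATGGAAGGT"
print(classify_substitutions(assembled, sanger))
print(round(percent_identity(assembled, sanger), 1))
```

In the toy pair, the change at position 4 converts AAA (K) to GAA (E) and is nonsynonymous, while the change at position 9 converts GGC to GGT, both encoding glycine, and is synonymous.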
2.2.2 Evolutionary and structural analyses
Phylogenetic analysis of globin sequences derived from genome data (Figure
2.1) showed that the majority of anthozoan globin genes fall within their own clade
and are sister to the vertebrate GbX gene with moderate bootstrap support. However,
other globin genes from early-diverging lineages are paraphyletic with or fall outside
of vertebrate globin genes. Comparative phylogenetic analysis of transcriptome data
(Figure 2.2) showed that the majority of cnidarian globin-like genes are
monophyletic with vertebrate GbX (strong support) or Ngb (weak support).
However, some cnidarian globin genes fall outside these clades, specifically ortholog
reference nvec7000121 and medusozoan genes.
Figure 2.1: Maximum Likelihood bootstrap phylogenetic tree of identified candidate cnidarian globin genes in genomes of cnidarian species, with supported Bayesian posterior probabilities. Model species
representations of phyla Cnidaria, Ctenophora, Placozoa and Porifera (highlighted in purple, green, brown, and yellow, respectively) with vertebrate globin genes highlighted with red branches, and the
S. minutum outgroup highlighted in grey. Phylogenetic values shown as maximum likelihood bootstrap support (0-100)/Bayesian posterior probabilities (0-1.0). Bootstrap values < 50 and posterior probabilities < 0.5 shown with a ~ symbol and nodes not identical between each method shown with a
- symbol.
Figure 2.2: Maximum Likelihood bootstrap phylogenetic tree of identified candidate cnidarian globin genes in transcriptomes of cnidarian species, with supported Bayesian posterior probabilities.
Cnidarian pentacoordinate and hexacoordinate branches highlighted in green and blue, respectively, with vertebrate branches highlighted in red and the S. minutum outgroup branch highlighted in black. Pentacoordinate cnidarian genes represented in ortholog reference nvec7000121 are associated with
protein model highlighted in green. Hexacoordinate cnidarian genes are associated with protein model highlighted in blue. Phylogenetic values shown as maximum likelihood bootstrap support (0-
100)/Bayesian posterior probabilities (0-1.0). Bootstrap values < 50 and posterior probabilities < 0.5 shown with a ~ symbol and nodes not identical between each method shown with a - symbol.
Collapsed clades represent sequences with the corresponding ortholog reference gene nomenclature as referenced in Appendix A: Supplementary Table 2.3 (expanded clades shown in Appendix A:
Supplementary Figure 2.2).
Topological and ancestral analyses of globin sequences derived from genome
data revealed that Figure 2.1 is an accurate representation of phylogenetic
distribution. Approximately unbiased tests suggest that it is highly unlikely (p-value
≤ 0.07) that cnidarian globin genes are monophyletic with either Ngb or Ngb and
GbX. However, ancestral state inferences revealed that Ngb and cnidarian globin
genes are the favoured ancestral state over GbX.
Alignment of cnidarian globin genes showed that three amino acid residues
were highly conserved; Phenylalanine (F, CD1 position), Histidine/Glutamine (H/Q,
E7 position) and Histidine (H, F8 position). Figure 2.1 and Figure 2.2 revealed the
E7 amino acid variation between ortholog reference nvec7000121 (Q), which we
have reported as pentacoordinate, and all other sequences (H), which we have
reported as hexacoordinate. Interestingly, the presence of pentacoordinate globin
proteins was only identified in the class Anthozoa, suggesting a unique role that has
neither been characterised nor elucidated from any other class in phylum Cnidaria.
Phylogenetic analysis revealed that predicted proteins with pentacoordinate
conformation have arisen once, but hexacoordinate predicted protein sequences have
undergone an expansion in actiniarian species (Figure 2.2).
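The labelling rule used above (Histidine at E7 reported as hexacoordinate, Glutamine at E7 reported as pentacoordinate) can be expressed as a small Python sketch. This is a sequence-based heuristic that mirrors the reporting convention in this chapter, not a biochemical determination of coordination state. The function names, the sequence names, the toy aligned fragments and the column index are hypothetical.

```python
def predict_coordination(e7_residue):
    """Map the distal (E7) residue to the coordination label used in this chapter."""
    residue = e7_residue.upper()
    if residue == "H":
        return "hexacoordinate"
    if residue == "Q":
        return "pentacoordinate"
    return "unclassified"


def classify_alignment(aligned, e7_column):
    """Label every aligned sequence by the residue found in the E7 column.

    `aligned` maps sequence names to aligned strings of equal length;
    `e7_column` is the 0-based alignment column of the E7 position.
    """
    return {
        name: predict_coordination(sequence[e7_column])
        for name, sequence in aligned.items()
    }


# Toy aligned fragments for illustration only (not thesis data).
toy_alignment = {
    "example_Q_type": "FKQHG",
    "example_H_type": "FKHHG",
    "example_gap": "FK-HG",
}
print(classify_alignment(toy_alignment, e7_column=2))
```

Any residue other than H or Q, including an alignment gap, is left unclassified rather than forced into one of the two categories.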
Protein models for all validated sequences in A. tenebrosa and E. pallida
showed similar conserved structures between these two species (Figure 2.3A).
Interestingly, the pentacoordinate sequences revealed a forward and reverse position
for the distal Glutamine residue in A. tenebrosa and E. pallida, respectively (Figure
2.3B). The positions of the surrounding residues highlight the significance of steric
hindrance on protein structure, especially for ligand binding to the heme group.
Figure 2.3: Predictive cnidarian globin protein structure with heme pocket residues shown. (A) Structural variation of A. tenebrosa ortholog references A.tenebrosa_nvec7000121 (highlighted green)
and A.tenebrosa_nvec42000019 (highlighted blue) with side chain residue structures for F (CD1 position; phenylalanine), Q/H (E7 position; distal glutamine/histidine) and H (F8 position; proximal
histidine) shown. (B) Structural variation of A. tenebrosa ortholog reference A.tenebrosa_nvec7000121 (highlighted green) and E. pallida ortholog reference
E.pallida_nvec7000121 (highlighted gold) showing forward and reverse position of E7 residue Q, respectively, and with side chain residues surrounding E7 position shown.
2.2.3 Differential gene expression analyses
Tissue and developmental data assemblies passed quality checking with the
exception of two datasets from the N. vectensis development assembly. These
datasets (NCBI Accession: SRX351436 and SRX351430) displayed batch effect
outliers because of the different RNA treatments used, and subsequently were
excluded from downstream analysis. The development specific dataset was further
refined to include only the planula and adult stages to represent the same
developmental stages from E. pallida and N. vectensis.
Tissue specific data revealed four and three globin genes were differentially
expressed in A. tenebrosa and N. vectensis, respectively (Figure 2.4). These cnidarian
globin genes were downregulated/unregulated in the acrorhagi of A. tenebrosa and
the nematosome of N. vectensis. Two globin genes (ortholog references:
tadh6000210, nvec141000032/nvec50000067) were upregulated in tentacle, in both
A. tenebrosa and N. vectensis, while being downregulated in the mesentery filament.
Ortholog reference nvec7000121 was upregulated in the mesentery filament of A.
tenebrosa, whereas it was upregulated in the tentacle of N. vectensis.
Figure 2.4: Heatmap for tissue specific RNA-seq differential gene expression (DGE) analysis with three biological replicates for each tissue type. (A) Analysis of A. tenebrosa tissue types: acrorhagi, tentacle and mesentery filament. (B) Analysis of N. vectensis tissue types: nematosome, tentacle and
mesentery filament.
Development specific data revealed seven and two globin genes were
differentially expressed in N. vectensis and E. pallida, respectively (Figure 2.5). E.
pallida has one globin gene upregulated at the immature stage, and the other
upregulated at the mature stage. The globin sequence from the N. vectensis
transcriptome (ortholog reference: tadh6000210), also found in the tissue specific
data, was present in the development specific transcriptomic data. This ortholog had
the same upregulation in expression for both E. pallida and N. vectensis at the mature
stage. Furthermore, there is a difference in expression pattern for globin genes from
the same clade observed in Figure 2.2 (ortholog references: nvec5000153 and
nvec141000032).
Figure 2.5: Heatmap for development specific RNA-seq differential gene expression (DGE) analysis with two biological replicates for each tissue type. (A) Analysis of E. pallida developmental stages:
immature (larvae) and mature (adult), with three biological replicates for adult stage only. (B) Analysis of N. vectensis developmental stages: immature (planula) and mature (adult).
Across the tissue and developmental analyses, nine of the ten different
orthologs were differentially expressed, with eight differentially expressed in N.
vectensis. Interestingly, only a single cnidarian globin gene (ortholog reference
tadh6000210) was upregulated in tentacle (Figure 2.4) and adult (Figure 2.5) tissues
across each species, and this globin gene clusters closely with vertebrate Ngb (Figure
2.2).
Chapter 3: General Discussion
The globin gene superfamily has been extensively studied in vertebrates, but
has been poorly investigated in early-diverging eumetazoan lineages. Studies in early
branching lineages, such as phylum Cnidaria, can resolve this knowledge gap by
enabling elucidation of the similarities and differences among globin genes between
the sister groups: Cnidaria and Bilateria. This research has in part resolved this
through the investigation of genomic data from multiple cnidarian species, with a
particular focus on order Actiniaria. This project has provided a more detailed view
of the evolution and expression of cnidarian globin genes, as well as the predicted
structure and function of the proteins encoded by these globin genes. This project
also investigated the evolution of globin genes in multiple classes of phylum
Cnidaria, with a major focus on class Anthozoa, as more genomic data exists for this
group. Subsequently, a broad expansion of globin genes was revealed and some
resolution on the ancestry of eumetazoan globins was obtained. By studying the
predicted structure of cnidarian globin proteins, two structural conformations were
identified, which likely have different and possibly unique functions within specific
cnidarian lineages. Consequently, this knowledge can be used as a starting point for
more extensive studies into the globin gene superfamily of non-vertebrate and non-
model species.
3.1 KEY FINDINGS
The expansion of globin genes has been identified in various metazoan
lineages, such as myoglobin in lungfish (Koch et al., 2016) and nerve globins in
nematodes (Hoogewijs et al., 2008), but there have been limited studies in early-
diverging species. Analyses of gene copy number, protein sequences, DGE patterns
and protein modelling suggest that large-scale duplication of cnidarian globin genes
followed by subfunctionalisation and possibly neofunctionalisation events has
occurred in actiniarians. This expansion of globin genes is similar to recent
observations in lungfish myoglobin genes (Koch et al., 2016). However, the large-
scale expansion of globin genes in phylum Cnidaria, but particularly order Actiniaria,
has not been observed in any other Metazoan lineage except for nerve globins
reported in phylum Nematoda (Hoogewijs et al., 2008). Furthermore, based on the
evolution of ancestral vertebrate globin genes suggested by Roesner et al. (2005),
Burmester et al. (2014) and this research, we cannot reject the hypothesis that an
Ngb-like/GbX-like gene was likely the ancestral globin gene in eumetazoans. In fact,
it is more likely that a globin gene similar to those identified in cnidarians was the
ancestral gene that evolved into the vertebrate Ngb and GbX gene ancestor.
The pentacoordinate and hexacoordinate conformations in vertebrate globin
proteins have different affinities for ligands, and subsequently cnidarian globin
proteins would likely exhibit similar ligand affinities under these different
conformations. Ligands other than oxygen, such as nitric oxide and carbon monoxide,
can be toxic at high concentrations. These compounds can be bound to the heme
pocket of a globin protein, and this subsequently reduces the deleterious impact of
these molecules on the organism (Azarov et al., 2016; Brunori et al., 2005; Dewilde
et al., 2001; Fago et al., 2006). Furthermore, recent studies in carbon monoxide
poisoning (Azarov et al., 2016) and nitrite reduction (Tejero et al., 2015) showed that
the pentacoordinate H64Q Ngb protein was a more effective ligand trap for toxic
compounds than hexacoordinate H64 Ngb. Consequently, the presence of a single
pentacoordinate globin protein found exclusively in class Anthozoa suggests a
unique function likely linked to a detoxification role.
Expression patterns of globin genes in actiniarians revealed some copies
displayed tissue and development specific expression. The upregulation of tentacle
specific globin genes suggests that they may have an alternative function to oxygen
storage and transport as this tissue type is in direct contact with the surrounding
water and is likely to be constantly diffusing gaseous compounds with the
surrounding environment. The requirement for cellular energy to replenish the dense
concentration of cells in the tentacle, however, makes an oxygen storage role for
mitochondrial ATP production and reactive oxygen species detoxification a possible
function for these cnidarian globin genes, a role similar to vertebrate Ngb (Bentmann
et al., 2005; Fordel et al., 2007).
3.2 EVOLUTION OF GLOBIN GENES IN PHYLUM CNIDARIA
Research into the evolution of the globin gene has been extensive, and yet the
ancestry of this ubiquitous gene family in Eumetazoa is still not resolved. The history
of vertebrate globin genes has been partially elucidated with the identification of
three genetically divergent genes: androglobin, neuroglobin and globin-X. By
evaluating early-diverging eumetazoan phyla, this research has revealed the presence
of Ngb-like and GbX-like genes in phylum Cnidaria. The presence of any chimeric
globins (androglobin-like) could not be confirmed. There was limited evidence of the
membrane-bound globin motifs in phylum Cnidaria and phylogenetic analyses
suggest cnidarian globins are sister to vertebrate GbX (GbX-like). Additionally, there
are neither chimeric nor membrane-bound globins in other early-diverging phyla:
Ctenophora, Placozoa and Porifera. Bioinformatic analyses have confirmed the
preliminary research of Lechauve et al. (2013) by also identifying the presence of
Ngb-like genes in cnidarians. These data indicate that a cnidarian-like globin gene
(molecularly similar to Ngb and GbX) is common to all eumetazoans and was likely
the ancestral globin gene in this group. The identification of globin genes in the
early-diverging phylum, Placozoa, also suggests that this ancestral gene arose early
in metazoan evolution. Furthermore, it is likely that gene duplication events followed
by subfunctionalisation and possibly neofunctionalisation of the ancestral gene may
have given rise to the rich diversity of globin protein encoding genes in eumetazoans.
The expansion of globin genes in vertebrates can be attributed to gene
duplication events followed by sub-/neofunctionalisation. The 11 different globin
subfamilies found in vertebrates have different protein sequences, as well as a range
of other properties (Burmester & Hankeln, 2014). This research has identified
sequence variations in cnidarian globin genes to determine gene copy number and to
infer potential duplication events. Within order Actiniaria, up to six duplication
events have been identified (Supplementary Figure 3.1), which is similar to
vertebrate globin evolution for sub-/neofunctionalisation. Furthermore, nine globin
gene copies for E. pallida were validated with two independent duplication events
restricted to this species, as well as two in N. vectensis and one in N. annamensis
(Supplementary Figure 3.1). Similar patterns have been observed in teleost fish,
with two cytoglobin genes (identified as cytoglobin 1 and 2) that are thought to have
different functional roles (Fuchs et al., 2005). Additionally, these genes are assumed
to be the result of a subfunctionalisation event, rather than neofunctionalisation,
based on evidence for selection pressure, mutation rates and expression patterns
(Fuchs et al., 2005). Differential expression of recently duplicated genes in E. pallida
only occurred between immature and adult stages for a single gene, although there is
a lack of knowledge about expression profiles of recently duplicated genes in
different tissues. Consequently, the limited number of recently duplicated genes that
were differentially expressed suggests that the duplication events in E. pallida may
have been recent and that these duplicates have not had time to undergo
diversification of their functional roles. However, it is also probable that older
duplicated globin gene copies in order Actiniaria have undergone
subfunctionalisation, particularly those that show altered tissue and developmental
expression patterns.
Our analysis of differential expression across tissues and development of
globin genes in order Actiniaria has also revealed similarities with the evolution in
bilaterian lineages. For example, Koch et al. (2016) have shown tissue specific
expression variations and statistically significant mitochondrial linked functional
variations among an expanded complement of myoglobin genes in lungfish.
Furthermore, Hoogewijs et al. (2008) have also shown tissue specific expression
variations following the expansion of globin genes in nematodes. Additionally,
developmental expression has been well-characterised for β-hemoglobin genes in
humans, where the embryonic, foetal and adult hemoglobin genes are sequentially
expressed during development (Stamatoyannopoulos, 2005). These three globin
genes have been shown to be differentially expressed during three developmental
stages of human growth, yet they have similar functional roles. Consequently, the
observed expression patterns of these globin genes in vertebrates suggest that
subfunctionalisation following gene duplication is one of the main driving forces in
the evolution of these genes. The tissue and development expression patterns
observed in A. tenebrosa, E. pallida and N. vectensis also indicate that
subfunctionalisation may be the dominant evolutionary force shaping the expression
of cnidarian globin genes.
3.3 CONVERGENT AMPLIFICATION OF GLOBIN GENES IN
EUMETAZOA
Eumetazoa is a large and diverse group of taxa, and there has been very limited
evidence for individual globin subfamily expansions in many groups. To our
knowledge, there have been three independent gene expansions: Ngb in nematodes,
myoglobin in lungfish and hemoglobin in mammals (Hoffmann et al., 2010;
Hoogewijs et al., 2008; Koch et al., 2016; Stamatoyannopoulos, 2005). There are up
to 33 Ngb gene copies in nematodes (Hoogewijs et al., 2008), seven myoglobin gene
copies in lungfish (Koch et al., 2016), and up to 11 hemoglobin gene copies in
mammals (Hoffmann et al., 2012; Stamatoyannopoulos, 2005). The research
presented here has revealed a convergent expansion of globin genes in actiniarians of
phylum Cnidaria with up to 10 gene copies present. This discovery represents more
evidence for the convergent expansion of globin genes in different eumetazoan taxa.
This research also provides evidence that a cnidarian-like globin gene,
molecularly and phylogenetically similar to vertebrate Ngb and GbX, is most likely
the ancestral gene in eumetazoans. This finding is interesting as it suggests that a
single ancestral gene has undergone gene duplication followed by
subfunctionalisation and neofunctionalisation, resulting in multiple functional roles
to represent the broad repertoire of globin genes found in extant eumetazoan taxa.
3.4 STRUCTURE AND FUNCTION OF GLOBIN PROTEINS IN PHYLUM
CNIDARIA
The structure of globin proteins in vertebrates is well known; however, there is
no characterised structure for globin proteins in phylum Cnidaria.
Bioinformatic analyses have produced predictive models for the globin protein structures
of two actiniarian species, A. tenebrosa and E. pallida, which have given us insights
into the possible structures of these proteins. In vertebrate taxa, the two
conformations, hexacoordinate and pentacoordinate, are typically associated with
different globin protein subfamilies, e.g. Ngb is hexacoordinate and myoglobin is
pentacoordinate. However, there is a lack of knowledge about the structure of non-
vertebrate globins, which can be addressed by discerning protein sequences from
identified globin genes for different groups of cnidarians. In silico protein analyses
suggest that cnidarian sequences have similar structures to hexacoordinate globin
proteins. However, in cnidarians there is a key conserved residue mutation (E7
Histidine to Glutamine) that is known to form pentacoordinate globin proteins in
vertebrates (Azarov et al., 2016). Thus it is expected that cnidarians also have the
pentacoordinate globin protein structure and the functional roles attributed to this
structural form. Furthermore, protein predictions have revealed differences in steric
hindrance between A. tenebrosa and E. pallida pentacoordinate proteins, based on
the reversed distal residue in the heme pocket. The significance of this structural
change has yet to be determined but is likely to have an effect on ligand binding
potentials and affinities of these cnidarian globin proteins. The identification of these
two possible structural conformations in cnidarian globin proteins is a preliminary
step in identifying the functions of these early-diverging globins.
The functional roles of vertebrate globin proteins have been well-characterised,
and it would be congruent that globin proteins in cnidarians would have similar
functions. The most well-known function in vertebrates is the transport and supply of
oxygen to working cells. Nematocysts are important cells in cnidarians used in
intraspecific aggressive encounters, defence against predators, prey capture and
during digestion, and thus the requirement for cellular energy to replenish the dense
concentration of nematocysts (especially in tentacle tissues) would be high. Notably,
upregulation of tentacle globin gene expression is likely from dense concentrations
of nematocysts or their progenitor cells, nematoblasts. This association of cnidarian
neural cell types and the observed expression patterns suggests the possibility of
functionally similar roles to vertebrate Ngb, and provides further evidence for the
identification of Ngb-like genes by Lechauve et al. (2013). Consequently, an oxygen
storage role for mitochondrial ATP production and reactive oxygen species
detoxification is a strong possibility for cnidarian globin proteins (Bentmann et al.,
2005; Fordel et al., 2007). Alternatively, vertebrate globin proteins have an affinity for
gaseous compounds other than oxygen, such as carbon monoxide and nitric oxide.
Interaction with these alternate gaseous compounds has been shown to reduce the
toxic effect that these molecules can cause, specifically for detoxification to maintain
cellular respiration and signalling (Azarov et al., 2016; Brunori et al., 2005; Flögel et
al., 2001; Kiger et al., 2011; Singh et al., 2013). In sea anemones, NO concentrations
increase due to heat stress (Perez et al., 2006), thus an intertidal species that is
regularly exposed to heat, such as A. tenebrosa, would likely have a globin protein to
reduce the toxicity of NO. Additionally, a recent study in carbon monoxide
poisoning (Azarov et al., 2016) showed that the pentacoordinated H[E7]Q Ngb
protein was a more effective ligand trap for toxic chemicals in vertebrate taxa than
the wildtype hexacoordinate Ngb. Thus, from the in silico protein models, a single
pentacoordinate globin sequence was found exclusively in order Actiniaria, which
suggests a unique functional role more likely linked to detoxification rather than an
oxygen binding role. Consequently, this research has provided preliminary evidence
of a possible neofunctionalisation event exclusive to class Anthozoa.
3.5 EFFECT OF ENVIRONMENT ON GLOBIN GENE EXPRESSION IN
PHYLUM CNIDARIA
The supply of oxygen, typically to prevent hypoxia from environmental
pressures, is the main functional role attributed to the globin superfamily in
vertebrates. Research into the expression of globin genes and proteins in vertebrates
has shown variations in gene sequence that are thought to be linked to the different
environments where species occur (Roesner et al., 2006; Roesner et al., 2008).
Similar mechanisms would likely account for the variation observed for the tissue
and development expression patterns of the three species analysed: A. tenebrosa
inhabits the marine intertidal zone (often exposed to air), N. vectensis inhabits
shallow brackish coastal waters and E. pallida inhabits shallow marine waters. These
analyses only provide a bioinformatic analysis on expression rather than a functional
assay, so this conclusion cannot be validated based on bioinformatics alone.
However, by identifying these expression profiles, a more comprehensive study into
the effect of hypoxia on gene expression and protein abundance can be conducted.
3.6 RESEARCH GAPS AND FUTURE DIRECTIONS
Genomic data were used throughout this project to complete our intended
objectives; however, further analyses of intron-exon structure were not performed.
Using RNA-seq data, published datasets were analysed to identify the evolution and
expression of the globin gene repertoire in cnidarians, and subsequently infer
structural models and functional roles of their corresponding proteins. Gene
validation for N. vectensis and Hydra vulgaris would have enabled us to identify the
intron-exon sequences from their genome datasets, which may have improved our
evolutionary analyses and could have provided further evidence for our conclusions.
Specifically, comparing the intron-exon structure between cnidarian and vertebrate
globin genes would have elucidated a greater understanding of the evolution of this
already highly studied gene superfamily. However, this approach was not considered
due to the lack of validated gene data for the anthozoan species analysed, and it
would not have been viable in the required timeframe for this project.
Previously published datasets for three Actiniaria taxa were used to identify
and analyse tissue and development specific expression patterns. The bioinformatic
approach used for this analysis required a greater number of RNA-seq datasets for both
taxon diversity and replicate sampling. Due to budget constraints, further research and
an expansion of genomic resources for this project were not achievable. Expanding
the data for tissue and development specific expression, especially for more closely
related and for highly diverse taxa, would give a more holistic understanding of the
cnidarian globin gene repertoire. Furthermore, by using molecular techniques, such
as real-time PCR, the RNA-seq data used could be validated for basal and stress
responses. This would be particularly beneficial for understanding the response of
individual organisms to globin related functional roles, such as oxygen storage and
detoxification, as well as potential environmental pressures, such as dissolved
oxygen, pH, salinity and temperature, on globin gene expression. Due to the lack of
published data and budget constraints, this approach was also unable to be applied in
this project.
This research was focused on bioinformatic techniques, such as phylogenetics
and predictive protein modelling, but these methods do not address the lack of
knowledge surrounding the biochemical functions of globin proteins in phylum
Cnidaria. The bioinformatic techniques that were used identified and validated a set
of globin genes in two cnidarian taxa, A. tenebrosa and E. pallida. This suggests that
all other gene candidates identified are likely to be present within each dataset
analysed. Solving the protein structures and functions of the identified cnidarian
globin genes would further develop the knowledge and understanding of the globin
superfamily. Specifically, identifying the effect of various stresses related to the
functional roles of globin proteins, such as hypoxia, and carbon monoxide and nitric
oxide toxicity, would further the knowledge and understanding of cnidarian globins.
This would also determine the similarities and differences between the orthologous
globin proteins within phylum Cnidaria and its sister group, superphylum Bilateria.
Our research has, however, provided the preliminary evidence for understanding the
evolution, expression, structure and function of cnidarian globin genes and proteins.
3.7 CONCLUSION
The globin gene superfamily has been extensively researched, and this project
has expanded this knowledge by analysing globin genes within the early-diverging
eumetazoan phylum, Cnidaria. Bioinformatic approaches were used to elucidate the
evolution, expression, structure and function of cnidarian globin genes and predicted
proteins. A broad expansion of globin genes was identified in four classes of
cnidarians, which have undergone subfunctionalisation events and the possibility of
neofunctionalisation events. This represents the third known large-scale expansion of
a single globin subfamily among eumetazoan taxa. In silico protein analyses have
revealed two structural conformations within the cnidarian repertoire,
pentacoordination which is exclusively found in class Anthozoa and
hexacoordination. Consequently, cnidarian globin proteins are
likely to have functional roles similar to those observed in vertebrates, with a possible unique
role for the pentacoordinate globin protein. Overall, this project has provided
foundational knowledge for cnidarian globin genes and proteins that future studies
can build upon to expand the knowledge of the well-studied globin gene superfamily.
Bibliography
Azarov, I., Wang, L., Rose, J.J., Xu, Q., Huang, X.N., Belanger, A., … Gladwin,
M.T. (2016). Five-coordinate H64Q neuroglobin as a ligand-trap antidote for
carbon monoxide poisoning. Science Translational Medicine, 8(368),
368ra173. https://doi.org/10.1126/scitranslmed.aah6571
Babonis, L.S., Martindale, M.Q., & Ryan, J.F. (2016). Do novel genes drive
morphological novelty? An investigation of the nematosomes in the sea
anemone Nematostella vectensis. BMC Evolutionary Biology, 16, 114.
Bashford, D., Chothia, C., & Lesk, A.M. (1987). Determinants of a Protein Fold:
Unique Features of the Globin Amino Acid Sequences. Journal of Molecular
Biology, 196, 199-216.
Baumgarten, S., Simakov, O., Esherick, L.Y., Liew, Y.J., Lehnert, E.M.,
Michell, C.T., … Voolstra, C.R. (2015). The genome of Aiptasia, a sea
anemone model for coral symbiosis. Proceedings of the National Academy of
Sciences of the United States of America, 112(38), 11893-11898.
Bentmann, A., Schmidt, M., Reuss, S., Wolfrum, U., Hankeln, T., & Burmester, T.
(2005). Divergent distribution in vascular and avascular mammalian retinae
links neuroglobin to cellular respiration. The Journal of Biological Chemistry,
280, 20660-20665.
Bocahut, A., Derrien, V., Bernad, S., Sebban, P., Sacquin-Mora, S., Guittet, E., &
Lescop, E. (2013). Heme orientation modulates histidine dissociation and ligand
binding kinetics in the hexacoordinated human neuroglobin. Journal of
Biological Inorganic Chemistry, 18(1), 111-122.
Borhani, H.A., Berghmans, H., Trashin, S., De Wael, K., Fago, A., Moens, L., …
Dewilde, S. (2015). Kinetic properties and heme pocket structure of two
domains of the polymeric haemoglobin of Artemia in comparison with the
native molecule. Biochimica et Biophysica Acta, 1854(10), 1307-1316.
Bostelaar, T., Vitvitsky, V., Kumutima, J., Lewis, B.E., Yadav, P.K., Brunold, T.C.,
… Banerjee, R. (2016). Hydrogen sulphide oxidation by myoglobin. Journal of
the American Chemical Society, 138, 8476-8488.
Bouchard, J.N., & Yamasaki, H. (2008). Heat Stress Stimulates Nitric Oxide
Production in Symbiodinium microadriaticum: A Possible Linkage between
Nitric Oxide and the Coral Bleaching Phenomenon. Plant and Cell Physiology,
49(4), 641-652.
Brekhman, V., Malik, A., Haas, B., Sher, N., & Lotan, T. (2015). Transcriptome
profiling of the dynamic life cycle of the scyphozoan jellyfish Aurelia aurita.
BMC Genomics, 16, 74. https://doi.org/10.1186/s12864-015-1320-z
Brunori, M., Giuffrè, A., Nienhaus, K., Nienhaus, G.U., Scandurra, F.M., & Vallone,
B. (2005). Neuroglobin, nitric oxide, and oxygen: Functional pathways and
conformational changes. Proceedings of the National Academy of Sciences of
the United States of America, 102(24), 8483-8488.
Brusca, R. C., & Brusca, G. J. (2003). Invertebrates (2nd ed.). Sunderland, Mass:
Sinauer Associates.
Burmester, T., Ebner, B., Weich, B., & Hankeln, T. (2002). Cytoglobin: a novel
globin type ubiquitously expressed in vertebrate tissues. Molecular Biology and
Evolution, 19(4), 416-421.
Burmester, T., Haberkamp, M., Mitz, S., Roesner, A., Schmidt, M., Ebner, B., …
Hankeln, T. (2004). Neuroglobin and Cytoglobin: Genes, Proteins and
Evolution. IUBMB Life, 56(11-12), 703-707.
Burmester, T., & Hankeln, T. (2009). What is the function of neuroglobin? Journal
of Experimental Biology, 212(10), 1423-1428.
Burmester, T., & Hankeln, T. (2014). Function and evolution of vertebrate globins.
Acta Physiologica, 211(3), 501-514.
Burmester, T., Weich, B., Reinhardt, S., & Hankeln, T. (2000). A vertebrate globin
expressed in the brain. Nature, 407(6803), 520-523.
Chapman, J.A., Kirkness, E.F., Simakov, O., Hampson, S.E., Mitros, T., Weinmaier,
T., … Steele, R.E. (2010). The dynamic genome of Hydra. Nature, 464(7288),
592-596.
Chen, F., Mackey, A.J., Stoeckert, C.J., & Roos, D.S. (2006). OrthoMCL-DB:
querying a comprehensive multi-species collection of ortholog groups. Nucleic
Acids Research, 34, D363-D368.
Dewilde, S., Kiger, L., Burmester, T., Hankeln, T., Baudin-Creuza, V., Aerts, T., …
Moens, L. (2001). Biochemical Characterization and Ligand Binding Properties
of Neuroglobin, a Novel member of the Globin Family. Journal of Biological
Chemistry, 276(42), 38949-38955.
Dickerson, R. E., & Geis, I. (1983). Hemoglobin: structure, function, evolution, and
pathology (Vol. 1983). Benjamin-Cummings Publishing Company.
Ebner, B., Panopoulou, G., Vinogradov, S.N., Kiger, L., Marden, M.C., Burmester,
T., & Hankeln, T. (2010). The globin gene family of the cephalochordate
amphioxus: implications for chordate globin evolution. BMC Evolutionary
Biology, 10, 370. https://doi.org/10.1186/1471-2148-10-370
Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and
high throughput. Nucleic Acids Research, 32(5), 1792-1797.
Fago, A., Mathews, A.J., Dewilde, S., Moens, L., & Brittain, T. (2006). The
reactions of neuroglobin with CO: Evidence for two forms of the ferrous
protein. Journal of Inorganic Biochemistry, 100(8), 1339-1343.
Flögel, U., Merx, M.W., Gödecke, A., Decking, U.K.M., & Schrader, J. (2001).
Myoglobin: A scavenger of bioactive NO. Proceedings of the National Academy
of Sciences of the United States of America, 98(2), 735-740.
Fordel, E., Thijs, L., Moens, L., & Dewilde, S. (2007). Neuroglobin and cytoglobin
expression in mice: Evidence for a correlation with reactive oxygen species
scavenging. The FEBS Journal, 274(5), 1312-1317.
Freitas, T.A.K., Hou, S., Dioum, E.M., Saito, J.A., Newhouse, J., Gonzalez, G., …
Alam, M. (2004). Ancestral hemoglobins in Archaea. Proceedings of the
National Academy of Sciences of the United States of America, 101(17), 6675-
6680.
Freitas, T.A.K., Saito, J.A., Hou, S., & Alam, M. (2013). Globin-coupled sensors,
protoglobins, and the last universal common ancestor. Journal of Inorganic
Biochemistry, 99(1), 23-33.
Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: accelerated for clustering
the next-generation sequencing data. Bioinformatics, 28(23), 3150-3152.
Giuffrè, A., Moschetti, T., Vallone, B., & Brunori, M. (2008). Neuroglobin:
Enzymatic reduction and oxygen affinity. Biochemical and Biophysical
Research Communications, 367(4), 893-898.
Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J.,
… Regev, A. (2008). De novo transcript sequence reconstruction from RNA-seq
using the Trinity platform for reference generation and analysis. Nature
Protocols, 8(8), 1494-1512.
Hardison, R.C. (1996). A brief history of hemoglobins: Plant, animal, protist and
bacteria. Proceedings of the National Academy of Sciences of the United States
of America, 93, 5675-5679.
Hoffmann, F.G., Opazo, J.C., & Storz, J.F. (2010). Gene cooption and convergent
evolution of oxygen transport hemoglobins in jawed and jawless vertebrates.
Proceedings of the National Academy of Sciences of the United States of
America, 107(32), 14274-14279.
Hoffmann, F.G., Opazo, J.C., Hoogewijs, D., Hankeln, T., Ebner, B., Vinogradov,
S.N., … Storz, J.F. (2012). Evolution of the Globin Gene Family in
Deuterostomes: Lineage-Specific Patterns of Diversification and Attrition.
Molecular Biology and Evolution, 29(7), 1735-1745.
Hoogewijs, D., De Henau, S., Dewilde, S., Moens, L., Couvreur, M., Borgonie, G.,
… Vanfleteren, J.R. (2008). The Caenorhabditis globin gene family reveals
extensive nematode-specific radiation and diversification. BMC Evolutionary
Biology, 8, 279. https://doi.org/10.1186/1471-2148-8-279
Hoogewijs, D., Ebner, B., Germani, F., Hoffmann, F.G., Fabrizius, A., Moens, L., …
Hankeln, T. (2011). Androglobin: a chimeric globin in metazoans that is
preferentially expressed in mammalian testes. Molecular Biology and Evolution,
29(4), 1105-1114.
Hundahl, C.A., Elfving, B., Muller, H.K., Hay-Schmidt, A., & Wegener, G. (2013). A
gene-environment study of cytoglobin in the human and rat hippocampus. PLOS
One, 8(5), e63288. https://doi.org/10.1371/journal.pone.0063288
Jayaraman, T., Tejero, J., Chen, B.B., Blood, A.B., Frizzell, S., Shapiro, C., …
Gladwin, M.T. (2011). 14-3-3 Binding and Phosphorylation of Neuroglobin
during Hypoxia Modulate Six-to-Five Heme Pocket Coordination and Rate of
Nitrite Reduction to Nitric Oxide. Journal of Biological Chemistry, 286(49),
42679-42689.
Jeffreys, A.J., Wilson, V., Wood, D., & Simons, J.P. (1980). Linkage of Adult α–
and β–Globin Genes in X. laevis and Gene Duplication by Tetraploidization.
Cell, 21(2), 555-564.
Källberg, M., Wang, H., Wang, S., Peng, J., Wang, Z., Lu, H., & Xu, J. (2012).
Template-based protein structure modeling using the RaptorX web server.
Nature Protocols, 7(8), 1511–1522.
Kiger, L., Tilleman, L., Geuens, E., Hoogewijs, D., Lechauve, C., Moens, L., …
Marden, M.C. (2011). Electron transfer function versus oxygen delivery: A
comparative study for several hexacoordinated globins across the Animal
Kingdom. PLOS One, 6(6), e20478. https://doi.org/10.1371/journal.pone.0020478
Koch, J., Lüdemann, J., Spies, R., Last, M., Amemiya, C.T., & Burmester, T. (2016).
Unusual diversity of myoglobin genes in the lungfish. Molecular Biology and
Evolution, 33(12), 3033-3041.
Kriegl, J.M., Bhattacharyya, A.J., Nienhaus, K., Deng, P., Minkow, O., & Nienhaus,
G.U. (2002). Ligand binding and protein dynamics in neuroglobin. Proceedings
of the National Academy of Sciences of the United States of America, 99(12),
7992-7997.
Langmead, B., Trapnell, C., Pop, M., & Salzberg, S.L. (2009). Ultrafast and
memory-efficient alignment of short DNA sequences to the human genome.
Genome Biology, 10, R25. https://doi.org/10.1186/gb-2009-10-3-r25
Lechauve, C., Jager, M., Laguerre, L., Kiger, L., Correc, G., Leroux, C., … Bailly,
X. (2013). Neuroglobins, Pivotal Proteins Associated with Emerging Neural
Systems and Precursors of Metazoan Globin Diversity. Journal of Biological
Chemistry, 288(10), 6957-6967.
Li, B., & Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-
Seq data with or without a reference genome. BMC Bioinformatics, 12, 323.
https://doi.org/10.1186/1471-2105-12-323
Mohamed, A.R., Cumbo, V., Harii, S., Shinzato, C., Chan, C.X., Ragan, M.A., …
Miller, D.J. (2016). The transcriptomic response of the coral Acropora digitifera
to a competent Symbiodinium strain: the symbiosome as an arrested early
phagosome. Molecular Ecology, 25(13), 3127-3141.
Nienhaus, K., Kriegl, J.M., & Nienhaus, G.U. (2004). Structural Dynamics in the
Active Site of Murine Neuroglobin and Its Effects on Ligand Binding. The
Journal of Biological Chemistry, 279(22), 22944-22952.
Ohno, S. (1969). The role of gene duplication in vertebrate evolution. The biological
basis of medicine, 4, 109-132.
Ota, M., Isogai, Y., & Nishikawa, K. (1997). Structural requirement of highly-
conserved residues in globins. FEBS Letters, 415(2), 129-133.
Parra, G., Bradnam, K., & Korf, I. (2007). CEGMA: a pipeline to accurately annotate
core genes in eukaryotic genomes. Bioinformatics, 23(9), 1061-1067.
Perez, S., & Weis, V. (2006). Nitric oxide and cnidarian bleaching: an eviction notice
mediates breakdown of a symbiosis. The Journal of Experimental Biology, 209,
2804-2810.
Pesce, A., Dewilde, S., Nardini, M., Moens, L., Ascenzi, P., Hankeln, T., …
Bolognesi, M. (2003). Human Brain Neuroglobin Structure Reveals a Distinct
Mode of Controlling Oxygen Affinity. Structure, 11(9), 1087-1095.
Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng,
E.C., & Ferrin, T.E. (2004). UCSF Chimera—a visualization system for
exploratory research and analysis. Journal of Computational Chemistry, 25(13),
1605-1612.
Pratlong, M., Haguenauer, A., Chabrol, O., Klopp, C., Pontarotti, P., & Aurelle, D.
(2015). The red coral (Corallium rubrum) transcriptome: a new resource for
population genetics and local adaptation studies. Molecular Ecology Resources,
15(5), 1205-1215.
Putnam, N.H., Srivastava, M., Hellsten, U., Dirks, B., Chapman, J., Salamov, A., …
Rokhsar, D.S. (2007). Sea Anemone Genome Reveals Ancestral Eumetazoan
Gene Repertoire and Genomic Organization. Science, 317(5834), 86-94.
Ramos-Alvarez, C., Yoo, B., Pietri, R., Lamarre, I., Martin, J., Lopez-Garriga, J., &
Negrerie, M. (2013). Reactivity and Dynamics of H2S, NO, and O2 Interacting
with Hemoglobins from Lucina pectinata. Biochemistry, 52(40), 7007-7021.
Robinson, M.D., McCarthy, D.J., & Smyth, G.K. (2010). edgeR: a Bioconductor package
for differential expression analysis of digital gene expression data.
Bioinformatics, 26(1), 139-140.
Rodríguez, E., Barbeitos, M.S., Brugler, M.R., Crowley, L.M., Grajales, A., Gusmão,
L., … Daly, M. (2014). Hidden among sea anemones: The first comprehensive
phylogenetic reconstruction of the order Actiniaria (Cnidaria, Anthozoa,
Hexacorallia) reveals a novel group of hexacorals. PLOS One, 9(5), e96998.
Roesner, A., Fuchs, C., Hankeln, T., & Burmester, T. (2005). A Globin Gene of
Ancient Evolutionary Origin in Lower Vertebrates: Evidence for Two Distinct
Globin Families in Animals. Molecular Biology and Evolution, 22(1), 12-20.
Roesner, A., Hankeln, T., & Burmester, T. (2006). Hypoxia induces a complex
response of globin expression in zebrafish (Danio rerio). Journal of
Experimental Biology, 209, 2129-2137.
Roesner, A., Mitz, S.A., Hankeln, T., & Burmester, T. (2008). Globins and hypoxia
adaptation in the goldfish, Carassius auratus. The FEBS Journal, 275, 3633-
3643.
Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Höhna, S., …
Huelsenbeck, J.P. (2012). MrBayes 3.2: Efficient Bayesian phylogenetic
inference and model choice across a large model space. Systematic Biology,
61(3), 539-542.
Ruetz, M., Kumutima, J., Lewis, B.E., Filipovic, M.R., Lehnert, N., Stemmler, T.L.,
& Banerjee, R. (2017). A distal ligand mutes the interaction of hydrogen
sulphide with human neuroglobin. Journal of Biological Chemistry, 292(16),
6512-6528.
Ryan, J.F., Pang, K., Schnitzler, C.E., Nguyen, A., Moreland, R.T., Simmons, D.K.,
… Baxevanis, A.D. (2013). The genome of the ctenophore Mnemiopsis leidyi
and its implications for cell type evolution. Science, 342(6164), 1242592.
Shen, S., Slightom, J.L., & Smithies, O. (1981). A history of the Human Fetal Globin
Gene Duplication. Cell, 26(2), 191-203.
Singh, S., Zhuo, M., Gorgun, F.M., & Englander, E.W. (2013). Overexpressed
neuroglobin raises threshold for nitric oxide-induced impairment of
mitochondrial respiratory activities and stress signalling in primary cortical
neurons. Nitric Oxide, 32, 21-28.
Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree
selection. Systematic Biology, 51(3), 492-508.
Shinzato, C., Shoguchi, E., Kawashima, T., Hamada, M., Hisata, K., Tanaka, M., …
Satoh, N. (2011). Using the Acropora digitifera genome to understand coral
responses to environmental change. Nature, 476(7360), 320-323.
Simão, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V., & Zdobnov, E.M.
(2015). BUSCO: assessing genome assembly and annotation completeness with
single-copy orthologs. Bioinformatics, 31(19), 3210-3212.
Srivastava, M., Begovic, E., Chapman, J., Putnam, N.H., Hellsten, U., Kawashima,
T., … Rokhsar, D.S. (2008). The Trichoplax genome and the nature of
placozoans. Nature, 454(7207), 955-960.
Srivastava, M., Simakov, O., Chapman, J., Fahey, B., Gauthier, M.E.A., Mitros, T.,
… Rokhsar, D.S. (2010). The Amphimedon queenslandica genome and the
evolution of animal complexity. Nature, 466(7307), 720-726.
Stamatoyannopoulos, G. (2005). Control of globin gene expression during
development and erythroid differentiation. Experimental Hematology, 33, 259-271.
Storz, J.F., Opazo, J.C., & Hoffmann, F.G. (2013). Gene duplication, genome
duplication, and the functional diversification of vertebrate globins. Molecular
Phylogenetics and Evolution, 66(2), 469-478.
Tamura, K., Stecher, G., Peterson, D., Filipski, A., & Kumar, S. (2013). MEGA6:
Molecular Evolutionary Genetics Analysis Version 6.0. Molecular Biology and
Evolution, 30(12), 2725-2729.
Technau, U., & Steele, R.E. (2012). Evolutionary crossroads in developmental
biology: Cnidaria. Development, 138, 1447-1458.
Teixeira, T., Diniz, M., Calado, R., & Rosa, R. (2013). Coral physiological
adaptations to air exposure: Heat shock and oxidative stress responses in
Veretillum cynomorium. Journal of Experimental Marine Biology and Ecology,
439, 35-41.
Tejero, J., Sparacino-Watkins, C.E., Ragireddy, V., Frizzell, S., & Gladwin, M.T.
(2015). Exploring the Mechanisms of the Reductase Activity of Neuroglobin by
Site-Directed Mutagenesis of the Heme Distal Pocket. Biochemistry, 54(3), 722-
733.
Trapido-Rosenthal, H., Zielke, S., Owen, R., Buxton, L., Boeing, B., Bhagooli, R., &
Archer, J. (2005). Increased Zooxanthellae Nitric Oxide Synthase activity is
associated with coral bleaching. The Biological Bulletin, 208, 3-6.
Trifinopoulos, J., Nguyen, L., von Haeseler, A., & Minh, B.Q. (2016). W-IQ-TREE: a
fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids
Research, 44(W1), W232-W235.
Van Der Burg, C.A., Prentis, P.J., Surm, J.M., & Pavasovic, A. (2016). Insights into the
innate immunome of actiniarians using a comparative genomic approach. BMC
Genomics, 17, 850. https://doi.org/10.1186/s12864-016-3204-2
Vitvitsky, V., Yadav, P.K., Kurthen, A., & Banerjee, R. (2015). Sulphide oxidation
by a noncanonical pathway in red blood cells generates thiosulfate and
polysulfides. The Journal of Biological Chemistry, 290(13), 8310-8320.
Watanabe, S., Takahashi, N., Uchida, H., & Wakasugi, K. (2012). Human
Neuroglobin Functions as an Oxidative Stress-responsive Sensor for
Neuroprotection. Journal of Biological Chemistry, 287(6), 30128-30138.
Ye, J., Coulouris, G., Zaretskaya, I., Cutcutache, I., Rozen, S., & Madden, T.L.
(2012). Primer-BLAST: A tool to design target-specific primers for polymerase
chain reaction. BMC Bioinformatics, 13, 134.
https://doi.org/10.1186/1471-2105-13-134
Zapata, F., Goetz, F.E., Smith, S.A., Howison, M., Siebert, S., Church, S.H., …
Cartwright, P. (2015). Phylogenetic analyses support traditional relationships
within Cnidaria. PLOS One, 10(10), e0139068.
Zhang, J. (2003). Evolution by gene duplication: an update. Trends in Ecology & Evolution,
18(6), 292-298.
Appendices
Appendix A
Supplementary Tables and Figures
Supplementary Table 2.1: Output from OrthoMCL for candidate cnidarian globin genes, with individual gene nomenclature used for all downstream analyses.
Candidate Gene Nomenclature  OrthoMCL Group  Genome Reference ID  E-value  %Ident  %Match
A.alata_oanaENSOANP00000016790  OG5_132086  oana|ENSOANP00000016790  2e-20  33  96
A.aurita_nvec3000224_1  OG5_132086  nvec|fgenesh1_pg.scaffold_3000224  1e-25  34  99
A.aurita_nvec141000032_1  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  6e-16  31  91
A.aurita_nvec141000032_2  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  2e-25  34  99
A.aurita_nvec3000224_2  OG5_132086  nvec|fgenesh1_pg.scaffold_3000224  4e-18  29  99
A.aurita_oanaENSOANP00000016790  OG5_132086  oana|ENSOANP00000016790  9e-17  33  83
A.buddemeieri_nvec5000153  OG5_132086  nvec|fgenesh1_pg.scaffold_5000153  9e-66  60  99
A.buddemeieri_tadh6000210  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  2e-28  40  92
A.buddemeieri_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  5e-35  39  99
A.buddemeieri_nvec141000032  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  6e-67  69  99
A.buddemeieri_nvec76000030  OG5_132086  nvec|fgenesh1_pg.scaffold_76000030  3e-42  47  100
A.buddemeieri_nvec3000224  OG5_132086  nvec|fgenesh1_pg.scaffold_3000224  6e-62  56  100
A.buddemeieri_nvec42000019  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  2e-63  61  99
A.digitifera_micrACO65508  OG5_132086  micr|ACO65508  8e-24  41  97
A.digitifera_nvec141000032_1  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  1e-34  43  98
A.digitifera_nvec141000032_2  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  1e-65  67  100
A.digitifera_nvec76000030  OG5_132086  nvec|fgenesh1_pg.scaffold_76000030  2e-36  42  99
A.digitifera_nvec42000019  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  8e-24  44  98
A.queenslandica_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  3e-18  34  97
A.tenebrosa_nvec42000019  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  1e-65  64  99
A.tenebrosa_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  2e-34  40  97
A.tenebrosa_tadh6000210  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  8e-29  42  92
A.tenebrosa_nvec76000030  OG5_132086  nvec|fgenesh1_pg.scaffold_76000030  3e-42  48  98
A.tenebrosa_nvec3000224  OG5_132086  nvec|fgenesh1_pg.scaffold_3000224  4e-62  57  100
A.tenebrosa_nvec141000032_1  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  6e-65  68  99
A.tenebrosa_nvec141000032_2  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  3e-67  70  99
A.veratra_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  1e-31  37  97
A.veratra_nvec141000032_1  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  9e-67  71  99
A.veratra_nvec141000032_2  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  1e-65  67  98
A.veratra_nvec76000030  OG5_132086  nvec|fgenesh1_pg.scaffold_76000030  4e-47  52  100
A.veratra_nvec3000224  OG5_132086  nvec|fgenesh1_pg.scaffold_3000224  4e-62  58  99
A.veratra_nvec42000019  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  3e-65  63  99
A.veratra_tadh6000210  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  3e-29  40  92
C.fleckeri_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  7e-22  34  77
C.polypus_nvec141000032  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  1e-65  71  99
C.polypus_tadh6000210  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  1e-25  40  92
C.polypus_nvec42000019  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  4e-61  60  98
C.polypus_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  1e-29  37  99
C.polypus_nvec76000030  OG5_132086  nvec|fgenesh1_pg.scaffold_76000030  2e-35  49  100
C.rubrum_nvec141000032_1  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  2e-39  46  99
C.rubrum_nvec141000032_2  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  1e-19  28  99
C.rubrum_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  1e-25  35  81
E.pallida_nvec42000019_1  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  4e-47  51  97
E.pallida_nvec3000224  OG5_132086  nvec|fgenesh1_pg.scaffold_3000224  5e-63  57  99
E.pallida_tadh6000210_1  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  9e-28  42  90
E.pallida_nvec141000032_1  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  2e-67  73  99
E.pallida_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  2e-30  36  98
E.pallida_nvec141000032_2  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  5e-66  71  99
E.pallida_tadh6000210_2  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  9e-24  40  90
E.pallida_nvec141000032_3  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  5e-38  44  99
E.pallida_nvec42000019_2  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  2e-62  65  99
H.polyclina_tadh6000210  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  3e-10  26  87
H.polyclina_mdomENSODP00000006771  OG5_132086  mdom|ENSMODP00000006771  3e-8  22  92
H.polyclina_phumPHUM323880  OG5_132086  phum|PHUM323880  4e-8  25  72
H.polyclina_trubENSTRU00000033639  OG5_132086  trub|ENSTRUP00000033639  8e-9  23  85
H.vulgaris_tnigENSTNIP00000020604  OG5_132086  tnig|ENSTNIP00000020604  2e-13  27  70
H.vulgaris_drerENSDARP00000045749  OG5_132086  drer|ENSDARP00000045749  2e-13  27  81
H.vulgaris_nvec50000067  OG5_132086  nvec|fgenesh1_pg.scaffold_50000067  1e-11  25  84
H.vulgaris_tnigENSTNIP00000020604  OG5_132086  tnig|ENSTNIP00000020604  1e-13  29  71
M.leidyi_micrACO65508  OG5_132086  micr|ACO65508  6e-10  31  85
N.annamensis_nvec141000032  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  6e-62  64  99
N.annamensis_tadh6000210_1  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  1e-28  44  92
N.annamensis_tadh6000210_2  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  3e-24  40  100
N.annamensis_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  2e-30  36  99
N.annamensis_nvec42000019  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  2e-55  55  99
N.annamensis_nvec76000030  OG5_132086  nvec|fgenesh1_pg.scaffold_76000030  3e-43  48  100
N.annamensis_nvec141000032_1  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  3e-66  71  99
N.annamensis_nvec141000032_2  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  2e-67  72  99
N.vectensis_nvec3000224  OG5_132086  nvec|fgenesh1_pg.scaffold_3000224  1e-106  100  100
N.vectensis_nvec141000032  OG5_132086  nvec|fgenesh1_pg.scaffold_141000032  6e-98  100  100
N.vectensis_nvec46000041  OG5_132086  nvec|fgenesh1_pg.scaffold_46000041  2e-77  100  100
N.vectensis_nvec7000121  OG5_146786  nvec|fgenesh1_pg.scaffold_7000121  1e-114  100  100
N.vectensis_nvec42000018  OG5_132086  nvec|fgenesh1_pg.scaffold_42000018  1e-138  100  100
N.vectensis_nvec42000019  OG5_132086  nvec|fgenesh1_pg.scaffold_42000019  1e-104  100  100
N.vectensis_nvec76000030  OG5_132086  nvec|fgenesh1_pg.scaffold_76000030  1e-107  100  100
N.vectensis_nvec5000153  OG5_132086  nvec|fgenesh1_pg.scaffold_5000153  1e-124  100  100
N.vectensis_nvec50000067  OG5_132086  nvec|fgenesh1_pg.scaffold_50000067  1e-105  100  100
N.vectensis_tadh6000210  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  1e-23  41  96
P.variabilis_tadh6000210  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  3e-25  39  86
P.variabilis_micrACO65508  OG5_132086  micr|ACO65508  2e-18  38  88
T.adhaerens_tadh12000183  OG5_174830  tadh|fgeneshTA2_pg.C_scaffold_12000183  2e-64  100  100
T.adhaerens_tadh42000020  OG5_211503  tadh|fgeneshTA2_pg.C_scaffold_42000020  1e-103  100  100
T.adhaerens_tadh6000210  OG5_132086  tadh|fgeneshTA2_pg.C_scaffold_6000210  7e-91  100  100
T.adhaerens_tadh3000908  OG5_173496  tadh|fgeneshTA2_pg.C_scaffold_3000908  8e-75  100  100
T.adhaerens_tadh3000909  OG5_211503  tadh|fgeneshTA2_pg.C_scaffold_3000909  3e-81  100  100
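The candidate gene names in Supplementary Table 2.1 appear to follow a regular pattern: the abbreviated species name, an underscore, the taxon code of the best OrthoMCL reference hit, the numeric scaffold identifier (or the whole reference ID where no scaffold number is present), and an optional copy suffix. The sketch below illustrates that pattern in Python. It is an inference from the table rows, not the procedure used in this study; the function name `candidate_name` and its parameters are hypothetical, and some names in the table (for example H.polyclina_mdomENSODP00000006771) do not match their reference IDs exactly.

```python
import re


def candidate_name(species, reference_id, copy=None):
    """Build a candidate gene name from a species label and a reference ID.

    Hypothetical helper: the naming rule is inferred from the table rows.
    `reference_id` has the form "<taxon>|<gene id>".
    """
    taxon, _, gene_id = reference_id.partition("|")
    # Keep only the trailing scaffold number when the gene ID contains one.
    match = re.search(r"scaffold_(\d+)$", gene_id)
    short_id = match.group(1) if match else gene_id
    name = f"{species}_{taxon}{short_id}"
    # Paralogous copies that share a reference hit receive a numeric suffix.
    return f"{name}_{copy}" if copy is not None else name


print(candidate_name("A.tenebrosa", "nvec|fgenesh1_pg.scaffold_141000032", 1))
print(candidate_name("A.digitifera", "micr|ACO65508"))
```

Under these assumptions, the first call reproduces the table entry A.tenebrosa_nvec141000032_1 and the second reproduces A.digitifera_micrACO65508.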
Supplementary Table 2.2: Primer sequences and estimated gene sequence length for candidate cnidarian globin genes in A. tenebrosa and E. pallida. Candidate gene nomenclature referenced from OrthoMCL results detailed in Supplementary Table 2.1.
Candidate Gene ID  NCBI Accession Number  Forward Primer Sequence  Reverse Primer Sequence  Estimated Gene Length
A.tenebrosa_nvec141000032_1  KY810202  TTTTTCCGTCTCGAAGATA  CAAAGTGTACACCCTCTTC  579
A.tenebrosa_nvec141000032_2  KY810203  AAACCAAGATCGACCAGTT  TACAGATCTAGACCAGGAAAG  588
A.tenebrosa_nvec3000224  KY810201  TCTTTTCAAGTTTTCCTAGCC  GGCAAGACTTTTCCAGTTTA  609
A.tenebrosa_nvec42000019  KY810197  GAGTTAAGAATTCAAGAGGC  GCTGTTCACACAGATATAAAGA  640
A.tenebrosa_nvec7000121  KY810198  AGTTTTCTTGCTCTGTTCATC  CATGCGCATCACTGTTTG  577
A.tenebrosa_nvec76000030  KY810200  CACTGCTTAAAGTCCTCATTAT  CCTGTGCGTTCTCATGTA  604
A.tenebrosa_tadh6000210  KY810199  TGATGTCCAAAATACTGATGC  CCCTTGTCGATTGATAAAGTAT  648
E.pallida_nvec141000032_1  KY810207  TCCGACTAGGCGAAATTAAA  GTTCTTTATTCATGTTTGATGTG  582
E.pallida_nvec141000032_2  KY810209  TATACAAAGAAATCCTCAAGAGA  TTAGGTGGTCGATAGTGATG  564
E.pallida_nvec141000032_3  KY810211  CCTGGTTTGCCATATTGATTG  AAGATTCTTACATATGACAAGTGG  614
E.pallida_nvec3000224  KY810205  CTGATAGAGAAGTGACGAGAT  CGATACCGCTGAACATCAAT  580
E.pallida_nvec42000019_1  KY810204  ACCAACAATCTTCATTGAACT  TAGCCATAGATTTTACGTGGA  610
E.pallida_nvec42000019_2  KY810212  TTAATTTGAAGTCTTTCGTGAAG  AATTAGACTTTGGCTTTGAGC  590
E.pallida_nvec7000121  KY810208  TAAAATCGTTCACACATCGTT  GCTATTCGTACGAGAATGAAA  620
E.pallida_tadh6000210_1  KY810206  TAGGTGTACTGGGAATTTGAT  GACAGTAGGTAAAGCAAGAAG  544
E.pallida_tadh6000210_2  KY810210  TGAAGCAATAAGCAGTTCCC  CTAAAAAGAGATGTGATTGGCT  555
Supplementary Table 2.3: Trinity de novo assembled transcriptome statistics for quality check analysis. Abbreviations: n/a, Not Applicable.
Genus Species             Accession Number  N50    No. Genes  No. Transcripts  CEGMA Score  BUSCO %  Accession Citation
Actinia tenebrosa         SRX1604071        1,995  92,938     114,252          239          97.0     Van Der Burg et al., 2016
Actinia tenebrosa         PRJNA350366       1,256  165,401    221,845          241          97.7     n/a
Acropora digitifera       PRJNA309168       1,160  101,721    133,920          236          93.8     Mohamed et al., 2016
Alatina alata             SRX978662         1,044  121,034    141,973          231          91.4     n/a
Anthopleura buddemeieri   SRX1604661        1,034  150,702    212,774          220          94.6     Van Der Burg et al., 2016
Aulactinia veratra        SRX1614867        1,333  132,909    174,203          237          97.2     Van Der Burg et al., 2016
Aurelia aurita            PRJNA252562       932    99,240     132,259          236          97.1     Brekhman et al., 2015
Calliactis polypus        SRX1614869        1,516  118,290    146,659          236          96.9     Van Der Burg et al., 2016
Chironex fleckeri         SRX891607         1,377  46,983     51,149           200          74.7     n/a
Corallium rubrum          SRX675792         n/a    n/a        n/a              244          97.4     Pratlong et al., 2015
Exaiptasia pallida        PRJNA261862       1,449  163,275    192,450          245          97.4     Baumgarten et al., 2015
Hydractinia polyclina     SRX315374         1,300  135,939    159,235          242          95.8     n/a
Nemanthus annamensis      SRX1634628        1,699  72,505     88,325           242          97.3     Van Der Burg et al., 2016
Nematostella vectensis    PRJEB13676        1,033  301,047    369,434          232          86.8     Babonis et al., 2016
Nematostella vectensis    PRJNA213177       1,255  133,272    153,212          222          96.5     n/a
Protopalythoa variabilis  SRX978667         1,094  118,609    131,993          222          88.9     n/a
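The N50 column in Supplementary Table 2.3 summarises assembly contiguity: it is the contig length at which contigs of that length or longer contain at least half of all assembled bases. The following minimal Python sketch shows that calculation, assuming only a list of contig lengths as input. The function name `n50` and the example lengths are illustrative and are not taken from this study, whose values would have come from the assembler's own statistics output.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in nucleotides)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Illustrative lengths only: total is 32, and the two 8-nt contigs cover half.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```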
Supplementary Table 2.4. Results of data interrogation for genome and transcriptome datasets. Details represent additional information for individual candidate cnidarian globin genes.
Candidate gene nomenclature referenced from OrthoMCL results detailed in Supplementary Table 2.4.
Species and Candidate Count
Dataset Accession Number
Candidate ID Details
Cnidaria
Acropora digitifera (3) Genome GCA_000222465.2 A.digitifera_micrACO65508 Genome GCA_000222465.2 A.digitifera_nvec141000032_1 Genome GCA_000222465.2 A.digitifera_nvec141000032_2 Hydra vulgaris (4) Genome XM_012702974 H.vulgaris_drerENSDARP00000045749 Genome XM_012711718 H.vulgaris_nvec50000067 Genome XM_004209707 H.vulgaris_tnigENSTNIP00000020604_1 Genome XM_004206290 H.vulgaris_tnigENSTNIP00000020604_2 Nematostella vectensis (9) Genome XM_001629427 N.vectensis_nvec141000032 Genome XM_001641595 N.vectensis_nvec3000224 Genome XM_001636028 N.vectensis_nvec42000018 Genome XM_001636029 N.vectensis_nvec42000019 Genome XM_001635585 N.vectensis_nvec46000041 Genome XM_001635260 N.vectensis_nvec50000067 Genome XM_001640935 N.vectensis_nvec5000153 Genome XM_001640512 N.vectensis_nvec7000121 Genome XM_001633077 N.vectensis_nvec76000030 Acropora digitifera (5) Transcriptome PRJNA309168 A.digitifera_micrACO65508 Transcriptome PRJNA309168 A.digitifera_nvec141000032_1 Transcriptome PRJNA309168 A.digitifera_nvec141000032_2 Transcriptome PRJNA309168 A.digitifera_nvec42000019 Incomplete ORF; Full globin
domain Transcriptome PRJNA309168 A.digitifera_nvec76000030 Actinia tenebrosa (7) Transcriptome SRX1604071 A.tenebrosa_nvec141000032_1 Transcriptome SRX1604071 A.tenebrosa_nvec141000032_2 Transcriptome SRX1604071 A.tenebrosa_nvec3000224 Transcriptome SRX1604071 A.tenebrosa_nvec42000019
68 Appendices
Transcriptome SRX1604071 A.tenebrosa_nvec7000121 Transcriptome SRX1604071 A.tenebrosa_nvec76000030 Transcriptome SRX1604071 A.tenebrosa_tadh6000210 Alatina alata (1) Transcriptome SRX978662 A.alata_oanaENSOANP00000016790 Tentacle only transcriptome Anthopleura buddemeieri (7) Transcriptome SRX1604661 A.buddemeieri_nvec141000032 Transcriptome SRX1604661 A.buddemeieri_nvec3000224 Transcriptome SRX1604661 A.buddemeieri_nvec42000019 Transcriptome SRX1604661 A.buddemeieri_nvec5000153 Transcriptome SRX1604661 A.buddemeieri_nvec7000121 Transcriptome SRX1604661 A.buddemeieri_nvec76000030 Transcriptome SRX1604661 A.buddemeieri_tadh6000210 Aulactinia veratra (7) Transcriptome SRX1614867 A.veratra_nvec141000032_1 Transcriptome SRX1614867 A.veratra_nvec141000032_2 Transcriptome SRX1614867 A.veratra_nvec3000224 Transcriptome SRX1614867 A.veratra_nvec42000019 Transcriptome SRX1614867 A.veratra_nvec7000121 Transcriptome SRX1614867 A.veratra_nvec76000030 Transcriptome SRX1614867 A.veratra_tadh6000210 Aurelia aurita (5) Transcriptome PRJNA252562 A.aurita_nvec141000032_1 Transcriptome PRJNA252562 A.aurita_nvec141000032_2 Transcriptome PRJNA252562 A.aurita_nvec3000224_1 Transcriptome PRJNA252562 A.aurita_nvec3000224_2 Transcriptome PRJNA252562 A.aurita_oanaENSOANP00000016790 Calliactis polypus (5) Transcriptome SRX1614869 C.polypus_nvec141000032 Transcriptome SRX1614869 C.polypus_nvec42000019 Transcriptome SRX1614869 C.polypus_nvec7000121 Transcriptome SRX1614869 C.polypus_nvec76000030 Incomplete ORF; Full globin
domain Transcriptome SRX1614869 C.polypus_tadh6000210 Chironex fleckeri (1) Transcriptome SRX891607 C.fleckeri_nvec7000121 Tentacle only transcriptome Corallium rubrum (3) Transcriptome SRX675792 C.rubrum_nvec141000032_1 Transcriptome SRX675792 C.rubrum_nvec141000032_2 Transcriptome SRX675792 C.rubrum_nvec7000121
69 Appendices
Exaiptasia pallida (9) Transcriptome PRJNA261862 E.pallida_nvec141000032_1 Transcriptome PRJNA261862 E.pallida_nvec141000032_2 Transcriptome PRJNA261862 E.pallida_nvec141000032_3 Transcriptome PRJNA261862 E.pallida_nvec3000224 Transcriptome PRJNA261862 E.pallida_nvec42000019_1 Transcriptome PRJNA261862 E.pallida_nvec42000019_2 Transcriptome PRJNA261862 E.pallida_nvec7000121 Transcriptome PRJNA261862 E.pallida_tadh6000210_1 Transcriptome PRJNA261862 E.pallida_tadh6000210_2 Hydractinia polyclina (4) Transcriptome SRX315374 H.polyclina_mdomENSODP00000006771 Transcriptome SRX315374 H.polyclina_phumPHUM323880 Transcriptome SRX315374 H.polyclina_tadh6000210 Transcriptome SRX315374 H.polyclina_trubENSTRU00000033639 Nemanthus annamensis (8) Transcriptome SRX1634628 N.annamensis_nvec141000032 Transcriptome SRX1634628 N.annamensis_nvec141000032 Transcriptome SRX1634628 N.annamensis_nvec141000032 Transcriptome SRX1634628 N.annamensis_nvec42000019 Transcriptome SRX1634628 N.annamensis_nvec7000121 Transcriptome SRX1634628 N.annamensis_nvec76000030 Transcriptome SRX1634628 N.annamensis_tadh6000210 Transcriptome SRX1634628 N.annamensis_tadh6000210 Nematostella vectensis (10) Transcriptome PRJNA213177 N.vectensis_nvec141000032 Transcriptome PRJNA213177 N.vectensis_nvec3000224 Transcriptome PRJNA213177 N.vectensis_nvec42000018 Transcriptome PRJNA213177 N.vectensis_nvec42000019 Transcriptome PRJNA213177 N.vectensis_nvec46000041 Transcriptome PRJNA213177 N.vectensis_nvec50000067 Transcriptome PRJNA213177 N.vectensis_nvec5000153 Transcriptome PRJNA213177 N.vectensis_nvec7000121 Transcriptome PRJNA213177 N.vectensis_nvec76000030 Transcriptome PRJNA213177 N.vectensis_tadh6000210 Protopalythoa variabilis (2) Transcriptome SRX978667 P.variabilis_micrACO65508 Transcriptome SRX978667 P.variabilis_tadh6000210
Ctenophora
Mnemiopsis leidyi (1)
Genome GCA_000226015.1 M.leidyi_micrACO65508
Placozoa
Trichoplax adhaerens (5)
Genome GCA_000150275.1 T.adhaerens_tadh3000908
Genome GCA_000150275.1 T.adhaerens_tadh3000909
Genome GCA_000150275.1 T.adhaerens_tadh6000210
Genome GCA_000150275.1 T.adhaerens_tadh42000020
Genome GCA_000150275.1 T.adhaerens_tadh12000183
Porifera
Amphimedon queenslandica (1)
Genome GCA_000090795.1 A.queenslandica_nvec7000121
Supplementary Table 2.5: Synonymous and non-synonymous mutations identified in validated transcriptome contigs for E. pallida. Abbreviations: Syn, Synonymous; Non-syn, Non-synonymous; n/a, Not Applicable.
Ortholog Reference ID | Syn Count | Non-syn Count | Non-syn Nucleotide Position | Non-syn Nucleotide Change | Non-syn Amino Acid Change
E.pallida_nvec141000032_1 | 2 | 1 | 47 | G/A | R/K
E.pallida_nvec141000032_2 | 2 | 0 | n/a | n/a | n/a
E.pallida_nvec141000032_3 | 2 | 0 | n/a | n/a | n/a
E.pallida_nvec3000224 | 1 | 0 | n/a | n/a | n/a
E.pallida_nvec42000019_1 | 5 | 2 | 223; 369 | A/T; A/T | T/S; E/D
E.pallida_nvec42000019_2 | 3 | 1 | 242 | G/A | S/N
E.pallida_nvec7000121 | 5 | 0 | n/a | n/a | n/a
E.pallida_tadh6000210_1 | 0 | 2 | 7; 38 | A/C; C/T | S/R; A/V
E.pallida_tadh6000210_2 | 4 | 0 | n/a | n/a | n/a
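The non-synonymous entries in this table can be read against the standard genetic code. As a minimal sketch, assuming nucleotide positions are 1-based and counted from the first base of the coding sequence (an assumption; the table does not state its numbering), the Python below maps a position to its codon and classifies a single-base substitution. The codon AGA is hypothetical: it is one of the arginine codons for which a G/A change at the second base would give the R/K change listed for E.pallida_nvec141000032_1, and the actual contig sequence is not reproduced here.

```python
# Standard genetic code, with codons ordered over the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_of(position):
    """Return (codon number, base within codon), both 1-based, for a 1-based CDS position."""
    index = position - 1
    return index // 3 + 1, index % 3 + 1


def classify(codon, base_in_codon, new_base):
    """Return (old amino acid, new amino acid, is_synonymous) for a single-base change."""
    mutated = codon[: base_in_codon - 1] + new_base + codon[base_in_codon:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    return old_aa, new_aa, old_aa == new_aa


# Under the numbering assumption, position 47 is the second base of codon 16.
codon_number, base_in_codon = codon_of(47)

# Hypothetical codon AGA (Arg); a G/A change at that base gives AAA (Lys).
result = classify("AGA", base_in_codon, "A")
```

Under the same numbering assumption, the other listed positions fall at codon positions where the reported amino acid changes are possible with the reported nucleotide changes (for example, position 369 is a third codon base, where an A/T change can turn GAA, Glu, into GAT, Asp).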
Supplementary Table 2.6: Intron-exon structure analysis of nine E. pallida globin genes. Gene, exon and intron lengths are given as nucleotide counts. N/A marks introns containing large blocks of ambiguous nucleotides, for which the true intron length could not be determined. Abbreviations: F, forward; R, reverse.
Ortholog Reference | Scaffold Reference | Alignment Direction | Gene Length | Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3
nvec141000032_1 | 18385412 | F | 516 | 143 | 219 | 226 | 795 | 147
nvec141000032_2 | 18385412 | F | 513 | 146 | 1366 | 226 | 537 | 141
nvec141000032_3 | 18385051 | F | 558 | 159 | 1437 | 225 | 343 | 174
nvec3000224 | 18385051 | R | 543 | 173 | 911 | 226 | 146 | 144
nvec42000019_1 | 18387879 | F | 546 | 173 | 2619 | 217 | N/A | 156
nvec42000019_2 | 18387879 | F | 534 | 170 | N/A | 217 | N/A | 147
nvec7000121 | 18385098 | R | 528 | 155 | 980 | 226 | 470 | 147
tadh6000210_1 | 18388191 | F | 498 | 161 | 1076 | 208 | 719 | 129
tadh6000210_2 | 18385444 | R | 486 | 146 | N/A | 205 | 260 | 135
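In every row of this table the gene length equals the sum of the three exon lengths, so the reported gene length appears to exclude introns. A minimal Python sketch of that check follows, with values copied from the table; the variable and function names are illustrative only.

```python
# (ortholog, gene length, exon 1, exon 2, exon 3), copied from Supplementary Table 2.6.
ROWS = [
    ("nvec141000032_1", 516, 143, 226, 147),
    ("nvec141000032_2", 513, 146, 226, 141),
    ("nvec141000032_3", 558, 159, 225, 174),
    ("nvec3000224", 543, 173, 226, 144),
    ("nvec42000019_1", 546, 173, 217, 156),
    ("nvec42000019_2", 534, 170, 217, 147),
    ("nvec7000121", 528, 155, 226, 147),
    ("tadh6000210_1", 498, 161, 208, 129),
    ("tadh6000210_2", 486, 146, 205, 135),
]


def exon_sum_mismatches(rows):
    """Return the orthologs whose gene length differs from the summed exon lengths."""
    return [name for name, gene_length, *exons in rows if gene_length != sum(exons)]


# An empty list means every gene length matches its summed exon lengths.
mismatches = exon_sum_mismatches(ROWS)
```

Intron lengths are left out of the check on purpose: three genes have at least one N/A intron, so a total including introns cannot be computed for them.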
Supplementary Figure 2.1: Cladogram overview of phylogenetic relationships for early-diverging species, phylum Cnidaria, derived from mitochondrial (Rodríguez et al., 2014) and genomic (Zapata et al., 2015) genes. Red, light green and pink highlighting represent the three most studied superfamilies of Actiniaria: Actinioidea, Metridioidea and Edwardsioidea, respectively. Candidate cnidarian globin gene copy number is given in brackets after each species name. Abbreviations: O, Order; C, Class.
Supplementary Figure 2.2: Maximum Likelihood phylogenetic tree of candidate cnidarian globin genes identified in transcriptomes of cnidarian species, with supporting Bayesian posterior probabilities. The tree includes model and non-model representatives of vertebrate globin genes and of the cnidarian classes Anthozoa, Cubozoa, Hydrozoa and Scyphozoa, with S. minutum as the outgroup. Phylogenetic values are shown as maximum likelihood bootstrap support (0-100)/Bayesian posterior probabilities (0-1.0).
Supplementary Figure 3.1: Maximum Likelihood bootstrap phylogenetic tree of candidate cnidarian globin genes identified in transcriptomes of cnidarian species, with supporting Bayesian posterior probabilities. Blue dots represent gene duplication events within Actiniaria taxa. Red brackets represent individual gene duplication events within specific species. Phylogenetic values are shown as maximum likelihood bootstrap support (0-100)/Bayesian posterior probabilities (0-1.0). Bootstrap values < 50 and posterior probabilities < 0.5 are shown with a ~ symbol, and nodes that are not identical between the two methods are shown with a - symbol. Collapsed clades represent sequences with the corresponding ortholog reference gene nomenclature as referenced in Appendix A: Supplementary Table 2.3 (expanded clades shown in Appendix A: Supplementary Figure 2.2).
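The node-labelling convention in this caption can be sketched in Python as below. Only the thresholds and the ~ and - symbols come from the caption; the function name, the two-decimal formatting of posterior probabilities, and the use of `None` to mark a node that one method did not recover are assumptions made for illustration.

```python
def support_label(bootstrap, posterior):
    """Format a node label as 'bootstrap/posterior' using the caption's symbols.

    bootstrap: ML bootstrap support (0-100), or None if the ML tree lacks the node.
    posterior: Bayesian posterior probability (0-1.0), or None if the Bayesian tree lacks it.
    """
    if bootstrap is None:
        boot_text = "-"  # node not shared between the two methods
    elif bootstrap < 50:
        boot_text = "~"  # weak bootstrap support
    else:
        boot_text = str(bootstrap)

    if posterior is None:
        post_text = "-"  # node not shared between the two methods
    elif posterior < 0.5:
        post_text = "~"  # weak posterior probability
    else:
        post_text = f"{posterior:.2f}"

    return f"{boot_text}/{post_text}"


# Illustrative values only; these are not read from the figure.
labels = [support_label(87, 0.99), support_label(42, 0.73), support_label(65, None)]
```

With these illustrative inputs the labels are `87/0.99`, `~/0.73` and `65/-`.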