conservation of the mixl1 homeobox in multiple species


Upload: nhi-hin

Post on 22-Jan-2018


Page 1: Conservation of the Mixl1 homeobox in multiple species

Conservation of the homeodomain sequence of

the Mixl1 homeobox gene in multiple species

Nhi Hin

ABSTRACT

The Mix family of paired-like homeobox genes is highly conserved throughout evolution due to its vital

roles in the formation and specification of mesoderm and endoderm during vertebrate gastrulation. The

homeodomain motif is essential for the DNA-binding activity of the transcription factor products of these

genes and is shown to be highly conserved amongst a diverse range of species despite some variation in

homeobox nucleotide sequence. In the present study, the Mix1 homeodomain sequences in the mouse,

zebrafish and platypus are isolated, sequenced and compared to gain insight into their evolutionary history.

The Mix1 homeobox sequences determined are consistent with the reported literature. Despite some

variation in nucleotide sequence, active-site amino acid residues in the N-terminal arm and recognition helix

were found to be particularly conserved throughout all species while nucleotides corresponding to non-

active site amino acids displayed greater variation.

INTRODUCTION

Homeobox (Hox) genes are crucial in the pattern

formation of many vertebrates during

embryogenesis. The genes are clustered in the

genome and encode transcription factors called

homeoproteins that specify segmental identity and

positional information along the anterior-posterior

axis. The organisation of Hox genes in the

chromosome corresponds to the order of their

spatial and temporal expression along the anterior-

posterior body axis, a phenomenon referred to as

“collinearity” (Fig.1A). Additionally, Hox genes

contain a highly conserved DNA sequence known as

the homeobox which encodes a 60-amino acid

protein structure called the homeodomain (Fig.1B).

The homeodomain has a helix-turn-helix motif that

allows homeoproteins to bind to specific DNA

sequences; the N-terminal arm contacts the minor

DNA groove while the third helix interacts with the

major DNA groove (Fig. 1C) (Burke et al. 1995;

Gehring et al. 1994).

The Mix family of paired-like homeobox genes has

been highly conserved throughout vertebrate

development and is involved in the establishment

and specification of mesoderm and endoderm germ

layers during gastrulation (Pereira et al. 2012).

Mesoderm-Inducing-Factor Inducible Homeobox

(Mix1) is predominantly expressed at the sites of

future endoderm and mesoderm development in the

blastula in many species including mice and humans

(Pereira et al. 2012). Zebrafish appear to have

multiple partially redundant Mix genes, including the

specific paired-like-homeobox gene called Mxtx1,

which is expressed in an analogous location to the

primitive endoderm in mammalian embryos (Hirata

et al. 2000).

The conservation of Mix-family genes across many

species makes them useful for investigating the

evolutionary history of these species. In the present

study, sequences corresponding to the Mixl1 gene in

mice and platypus and Mxtx1 gene in zebrafish will be

extracted and sequenced. The gene structures for

mouse and platypus Mixl1 along with zebrafish Mxtx1

are shown in Figure 2. Note the homeobox

sequences are separated by a variable intronic

sequence in each species. Primers have been

designed to extract sequences containing the

homeobox regions. The extent of conservation of

nucleotide sequence and amino acid sequence will

be determined and the evolutionary history of these

species briefly discussed.


Figure 1. Illustrations of Hox gene properties. (A) Collinear expression of Hox genes in Drosophila. The order of

Hox genes in the HOM-C cluster in the genome corresponds to spatial and temporal expression across the anterior-

posterior body axis. Image: (Mark et al. 1997). (B) General schematic representation of a Hox gene, homeoprotein and

homeodomain. The 180 bp homeobox on the Hox gene encodes the homeodomain on the homeoprotein. The

homeodomain has three helices. Image: (Lappin et al. 2006). (C) Binding of homeodomain motif to DNA. The N-terminal

arm contacts the minor groove while the third helix interacts with the major groove. Image: (Hueber 2007).

Figure 2. Schematic gene diagrams of (A) mouse Mixl1, (B) platypus Mixl1 and (C) zebrafish Mxtx1 (Daish 2016b). Panels: A. Mouse Mixl1 gene structure; B. Platypus Mixl1 gene structure; C. Zebrafish mxtx1 gene structure. Legend: primers; homeobox location.


RESULTS & DISCUSSION

Extraction & Purification of Genomic DNA

for Sequencing Reactions.

Figure 3 shows that most genomic DNA samples

(Lanes 1-5) were successfully extracted, with distinct

intense bands in most lanes indicating sufficient

amounts of gDNA at least 23 kbp in size. This

suggests excessive degradation has not occurred for

DNA in Lanes 1-5 and integrity of extracted DNA is

high. In contrast, the diffuse band in Lane 6 is less

than 0.56 kbp, suggesting only very small fragments of DNA are present. Possible causes of this include contamination by nucleases that degrade DNA,

excess salt in the sample, leaving the sample at high

temperature for too long, or using an incorrect

buffer solution (ThermoFisher 2016). Issues with the

reagents themselves are unlikely as the same

reagents were used to prepare all samples.

Streaking of bands is prevalent, signifying various

sizes of DNA in the samples, although the intense

bands near 23 kbp indicate most DNA is still intact

in large genomic fragments. Intense staining at the

wells of samples in Lanes 2, 3 and 5 suggests

particularly high concentrations of genomic DNA;

these high concentrations would block the pores of

the gel matrix, inhibiting DNA movement through

the gel. Consequently, smearing of the bands occurs

as DNA bleeds into the gel slowly, producing the

streaking observed. Band distortion in Lanes 2, 3 and

5 may have been caused by air bubbles when loading

sample or uneven heating of gel which would cause

local changes in buffer conductivity (Qiagen 2015).

Sample DNA concentration is estimated through

comparing intensity of sample bands with the

intensity of fragments in the 0.5μg of Lambda/HindIII

molecular weight marker in Lane 7. Box 1 explains

how the concentration of the zebrafish genomic

DNA in Lane 4 was estimated to be 13.2ng/μL. Bands

in other lanes (e.g. 3, 5) in Figure 3 are more intense,

indicating higher concentrations of genomic DNA.

However, it is not critical that the gDNA sample

concentration is high. The recommended

concentration of gDNA for amplification via PCR

ranges from 25-100ng/μL, so having a lower

concentration simply means that a greater amount

should be used in the PCR (see Appendix A).
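The Box 1 estimate can be written as a short calculation. The sketch below is illustrative only: it assumes, as Box 1 does, that the mass of each marker band is proportional to its fragment size, and the function names and the 25 ng target mass are hypothetical choices made for this example rather than part of the practical protocol.

```python
# Fragment sizes (kbp) of the Lambda/HindIII marker, as listed in Table 1.
MARKER_FRAGMENTS_KBP = [23, 9.6, 6.6, 4.4, 2.2, 2.0, 0.56]


def fragment_mass_ng(fragment_kbp, marker_mass_ng=500.0):
    """Mass of one marker band, assuming mass is proportional to fragment size."""
    total_kbp = sum(MARKER_FRAGMENTS_KBP)
    return marker_mass_ng * fragment_kbp / total_kbp


def sample_concentration(matching_fragment_kbp, loaded_volume_ul, marker_mass_ng=500.0):
    """Estimate concentration (ng/uL) from the marker band of matching intensity."""
    return fragment_mass_ng(matching_fragment_kbp, marker_mass_ng) / loaded_volume_ul


def volume_for_mass(target_ng, concentration_ng_per_ul):
    """Volume (uL) of sample needed to supply a target mass of template."""
    return target_ng / concentration_ng_per_ul


# Lane 4: intensity comparable to the 9.6 kbp band, 7.5 uL loaded.
lane4 = sample_concentration(9.6, 7.5)
print(round(lane4, 1))                       # 13.2 (ng/uL)
# Hypothetical target of 25 ng template for a PCR reaction.
print(round(volume_for_mass(25, lane4), 2))  # 1.89 (uL)
```

Because the estimate rests on a visual intensity match, the result is best treated as an order-of-magnitude guide.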

Isolation of DNA fragments containing mixl1

homeobox structure using PCR.

The PCR products are shown in a gel

electrophoresis image in Figure 4. Most PCR

product sizes are consistent with expected amplicon

sizes, indicating primers were highly specific to the

target sequence and amplification was successful.

The expected amplicon size for the Z1 forward and

reverse primers for the zebrafish is 653 bp (Daish

2016b). This is consistent with the gel

electrophoresis in Figure 4, which shows a PCR

product of approximately 653 bp in Lane 5.

Meanwhile, Lanes 2, 3 and 7 in Figure 4 show bands

corresponding to the expected amplicon using the

platypus PF2 and PR2 primers of 685 bp (Daish

2016b). However, Lane 7 shows an additional band

of approximately 360 bp. It is possible that this

second PCR product arose from the primers binding

to and amplifying another region on the template

DNA.

Figure 3. Electrophoresis of extracted

genomic DNA from various species on

1.5% agarose gel.

Lanes 1-6 contain 7.5 μL samples of genomic

DNA; 1 = Mouse liver, 2 = Mouse liver, 3 =

Zebrafish, 4 = Zebrafish, 5 = Platypus, 6 =

Mouse liver, 7 = 0.5μg Lambda/HindIII molecular

weight marker.


In this case, gel purifying the 685 bp PCR product

would help ensure that the purity is sufficient for a

successful sequencing reaction. Lanes 8 and 9 used

the MF1/MR1 and MF2/MR2 primer sets

respectively. Lane 8 has one band of 453 bp while

Lane 9 has one band of 391 bp. These sizes are

consistent with expected amplicon sizes and the

lack of other bands indicates the PCR products are

sufficiently pure. However, faintness of these bands

suggests low concentration of PCR products. This

could be due to non-optimal PCR conditions. For example, the denaturation temperature may have been too low, resulting in incomplete denaturation; the DNA template may have had insufficient integrity; or the denaturation time may have been too long, leading to degradation (Bio-Rad 2016).

The product in Lane 6 failed to amplify. Running the

following controls in the same gel would help

determine the cause of failure along with ensuring

the correct target sequence was amplified:

- Negative control with water to ensure no DNA contamination was in the water.
- Negative control with all PCR reagents except for DNA template to ensure that reagents are not contaminated, and there is no non-specific amplification in the reaction. Detection of positive signal in this control would indicate the presence of contaminating nucleic acids.
- Positive control using template and primers known to amplify correctly and produce distinct bands under the PCR conditions. This control should contain the same PCR reagents as the samples and should be easily distinguished from the target DNA (e.g. different size).
- Preparing several samples of different concentration may help determine if smearing is due to using too high a DNA concentration. This is particularly important if the concentration of the genomic DNA was only estimated.

Lane 4 has one smeared band where the majority of

DNA has remained in the well. A smear instead of a

single band indicates DNA fragments of varying sizes.

The expected products of a successful PCR reaction

should have the same sequence and same size,

assuming PCR conditions are optimal and the

primers are specific. Hence varying sized bands

indicate that the PCR reaction was not specific

enough in amplifying the target DNA. Possible

causes include:

- Non-specific primers, leading them to bind to other parts of the template DNA which also get amplified.
- Non-optimal cycling conditions: for example, an excessive number of cycles, excessive extension time, excessive annealing time, or an insufficiently high annealing temperature all increase the opportunity for non-specific amplification (Bio-Rad 2016).
- Too high a concentration of template DNA. This can inhibit the polymerase due to inhibitors in the template or inefficient denaturation (Qiagen 2015).
- Genomic DNA was of poor quality (e.g. sheared).

Box 1. Estimation of concentration of zebrafish genomic DNA in Lane 4.

Table 1. Known sizes of DNA fragments from Lambda/HindIII molecular weight marker

Fragment      Size (kbp)
1             23
2             9.6
3             6.6
4             4.4
5             2.2
6             2.0
7             0.56
Total size    48.36

Source: Genetics III Practical Manual (University of Adelaide, 2016).

The total size of the fragments in the DNA marker is 48.36 kbp. Since 0.5 μg of molecular weight marker was used, the corresponding ratio is:

\[ \frac{48.36\ \text{kbp}}{0.5\ \mu\text{g}} = \frac{48.36\ \text{kbp}}{500\ \text{ng}} = \frac{1\ \text{kbp}}{10.339\ \text{ng}} \]

The sample in Lane 4 has comparable intensity to Fragment 2 of the molecular weight marker, corresponding to a size of 9.6 kbp (Table 1). Scaling the ratio to 9.6 kbp gives:

\[ \frac{1\ \text{kbp}}{10.339\ \text{ng}} = \frac{9.6\ \text{kbp}}{99.26\ \text{ng}} \]

i.e. there is 99.26 ng of genomic DNA in Lane 4. Because 7.5 μL of genomic DNA had been loaded onto the gel, the concentration of genomic DNA in Lane 4 is estimated to be:

\[ \frac{99.26\ \text{ng}}{7.5\ \mu\text{L}} = \frac{13.2\ \text{ng}}{1\ \mu\text{L}} \]

Figure 2. Estimation of Zebrafish genomic DNA concentration (Lane 4 from Figure 1) and brief explanation of reasoning.

Using such a PCR product in a sequencing reaction

would likely result in many nucleotides which cannot

be accurately identified (appear as “N” in the

sequence). The presence of multiple DNA

sequences (from the non-specific PCR products)

means that the sequence of the desired PCR

product cannot be distinguished from the

contaminating sequences. The PCR products in

Lanes 5 and 7 in Figure 4 are suitable for sequencing

due to single discrete PCR bands indicating

amplicons of expected size. The concentration of

the zebrafish template DNA used in Lane 5 was

measured to be 20.28 μg/mL using a

spectrophotometer.

Figure 4. Gel electrophoresis image of PCR products of various species. Lanes 2-9 were loaded

with 25-100ng DNA template. 1 = SPP1 Molecular Markers; 2 = Platypus, PF2 and PR2 primers; 3 =

Platypus, PF2 and PR2 primers; 4 = Mouse, MF1 and MR1 primers; 5 = Zebrafish ZF1 and ZR1 primers, 6 =

Zebrafish ZF2 and ZR2 primers; 7 = Platypus PF2 and PR2 primers; 8 = Mouse MF1 and MR1 primers; 9 =

Mouse MF2 and MR2 primers. Approximate size of successful amplicons marked above PCR bands.



Sequencing of Zebrafish PCR Product & Identification of mxtx1 homeobox sequence.

Figure 6. Comparison of sequenced zebrafish and cavefish Mxtx1 homeobox sequences, and

corresponding amino acid sequences.

Figure 5 shows a sequence alignment of the zebrafish

amplicon amplified with ZF1 and ZR1 primers and

sequenced using ZF1 primer; zebrafish amplicon

amplified with ZF2 and ZR2 primers and sequenced

using ZF2 primer (obtained from demonstrators);

and the cavefish mxtx1 homeobox sequence. Both

the ZF1 and ZF2 zebrafish sequences were required

to determine the full 180 bp zebrafish homeobox, as

although the ZF1 amplicon has most of the required

homeobox sequence, the ZF2 amplicon has the

remaining small part. The zebrafish and cavefish

sequences are very similar, which is expected as

homeobox sequences tend to be highly conserved

through evolution, and the cavefish and zebrafish

share a common ancestor. There are several

nucleotide differences (36/180 = 20%), indicating

that point mutations have occurred since the

zebrafish and cavefish diverged from their common

ancestor. However, some of these may also be due

to the accuracy of the zebrafish sequences used. It may be useful to do several independent sequencing reactions using different zebrafish samples to average out random point mutations which do not reflect the consensus zebrafish homeobox sequence.

zebrafish_ZF1            TGCTACTGCTAAAACATCTGGAAGTGGAGCTGTATCCAGAAGCGCAAGTC
cavefish_mxtx1_homeobox  ----------------------------------------------AGCC
                                                                       ** *

zebrafish_ZF1            GCAGGAAAAGGACAAGTTTCTCCAAGGAACACGTTGAGCTTCTGCGAGCT
cavefish_mxtx1_homeobox  GCAGAAAGAGGACCAGCTTCTCCAAAGAGCACGTAGAGCTGCTGAGGGCC
                         **** ** ***** ** ******** ** ***** ***** *** * **

zebrafish_ZF1            ACATTTGAAACAGACCCTTACCCTGGAATCAGTCTCAGAGAGAGTCTTTC
cavefish_mxtx1_homeobox  ACATTTGAGACGGACCCGTACCCGGGCATCAGCCTGAGGGAGAGCCTGTC
                         ******** ** ***** ***** ** ***** ** ** ***** ** **

zebrafish_ZF1            CCAAACCACAGGACTGCCAGAGTCTCGCATACAGGT--------------
cavefish_mxtx1_homeobox  TCAGACCACCGGCCTGCCTGAGTCACGAATACAGGTTTGGTTCCAGAACA
zebrafish_ZF2            --------------------------------AGGTCTGGTTCCAGAATA
                          ** ***** ** ***** ***** ** ******** *********** *

cavefish_mxtx1_homeobox  GGAGGGCTCGTACGCTTAAGTGTAAG
zebrafish_ZF2            GGAGAGCTCGCACGTTGAAATGCAAG
                         **** ***** *** * ** ** ***

Figure 5. Multiple Sequence Alignment of zebrafish ZF1/ZR1 PCR product (sequenced using ZF1 primer), zebrafish ZF2/ZR2 PCR product (sequenced using ZF2 primer) and cavefish mxtx1 homeobox sequences. Asterisks signify perfectly matched nucleotides.

Point mutations result in a different RNA transcript,

which may be translated into a different amino acid

sequence. A different amino acid sequence could

affect the folding and hence function of the encoded

protein. Point mutations corresponding to active-site amino acids would likely be deleterious and hence unlikely to persist,

explaining why most nucleotides are highly

conserved between the zebrafish and cavefish

sequences. However, replacement polymorphisms

that correspond to non-active site amino acids or to

no change to the amino acid sequence (silent

polymorphisms) would have less or no effect on the

functioning of the homeoprotein and hence may be

passed onto the next generation. Figure 6 shows the

amino acid sequences corresponding to the 180 bp

zebrafish and cavefish mxtx1 homeobox sequences.

Surprisingly, despite the nucleotide substitutions,

both amino acid sequences are identical. This is

made possible by codon degeneracy which refers to

the manner in which a particular amino acid may be

specified by several different codons.
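Codon degeneracy can be illustrated by translating the first seven codons of each homeobox. The sketch below is a hedged example: the two 21 nt strings are read off the start of the alignment in Figure 5, the reading frame is assumed to begin at the first aligned cavefish nucleotide, and the helper name `translate` is hypothetical. The table is the standard genetic code written in the usual TCAG order.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons (e.g. with N) become X."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First 21 nt of each homeobox, taken from the Figure 5 alignment (assumed frame).
zebrafish = "AGTCGCAGGAAAAGGACAAGT"
cavefish = "AGCCGCAGAAAGAGGACCAGC"

differences = sum(a != b for a, b in zip(zebrafish, cavefish))
print(differences)           # 5 nucleotide differences
print(translate(zebrafish))  # SRRKRTS
print(translate(cavefish))   # SRRKRTS
```

All five substitutions in this stretch fall in third codon positions, so the encoded residues are unchanged.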

Figure 8. Comparison of sequenced platypus homeobox sequence and opossum Mixl1 homeobox sequence, with corresponding amino acid sequences.

platypus_PF1            GTCGCAGCGGCGGAAGCGCACGTCGTTCAGCCCGGAGCAGCTGCAGCTGC
opossum_MixL1_homeobox  ----CAGCGCAGGAAGAGAACGTCTTTCAGCCCCGAGCAGCTGCAGCTGC
                            *****  ***** * ***** ******** ****************

platypus_PF1            TGGAGCTCGTCTTCCGCCGCACCATGTACCCCGACATCAACCTGCGGGAC
opossum_MixL1_homeobox  TGGAACTGGTGTTTCGCCGGACCATGTACCCGGACATCACCTTGCGGGAA
                        **** ** ** ** ***** *********** ******* * *******

platypus_PF1            CGCCTGGCCGCCCTCACGCAGCTCCCCGAGTCCAGGATCCAGGTC-----
opossum_MixL1_homeobox  CGCCTGGCTACCCTCACTAGGCTCCCGGAGTCCAGGATCCAGGTCTGGTT
platypus_PF2            --------------------------------------CCAGGTCTGGTT
                        ********  *******   ****** ***********************

opossum_MixL1_homeobox  CCAGAACAGACGCGCCAAATCCCGTCGGCAGAGA
platypus_PF2            CCAGAACAGACGTGCCAAATCCCGGCGCCAGAAA
                        ************ *********** ** **** *

Figure 7. Multiple Sequence Alignment of platypus PF1/PR1 PCR product (sequenced using PF1 primer), platypus PF2/PR2 PCR product (sequenced using PF2 primer) and opossum mixl1 homeobox sequences. Asterisks signify perfectly matched nucleotides.


Sequencing of platypus PCR Product &

Identification of mixl1 homeobox sequence.

Figure 7 shows the multiple sequence alignment of

the sequenced platypus products (using PF1,

obtained from Marina Zupan; and PF2, obtained

from demonstrators). Sequence similarity is also

high, consistent with how the homeobox is highly

conserved. However, there are several mismatches

(24/180 = 13%), indicating that the platypus and

opossum underwent some evolutionary change

since diverging from their common ancestor. Figure

8 shows the amino acid sequence corresponding to

the platypus and opossum homeobox sequences. It

can be seen that most amino acids are the same due

to some of the replacement polymorphisms being

silent. However, there are several amino acid

changes (E → D at 94-96 nt; T→A at 107-109 nt;

R→Q at 115-118 nt). It is likely that these

correspond to amino acids not at the active site of

the homeoprotein, as replacements to active site

amino acids would likely be deleterious. It is also

possible that there could be errors associated with

the platypus sequence used or mismatches during

sequencing that account for some nucleotide

changes and hence amino acid changes.
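The mismatch counts quoted above can be reproduced from any pair of aligned sequences. The following is a minimal sketch, not the method used in the practical: the function name `compare_aligned` is hypothetical, and skipping gap and "N" positions is an assumption of this example (the counts reported in the text may treat such positions differently). The test pair is the final block of the Figure 7 alignment.

```python
def compare_aligned(seq_a, seq_b, gap="-"):
    """Return (match_line, mismatches, compared) for two aligned sequences.

    Columns containing a gap or an unidentified base (N) are skipped rather
    than counted as mismatches; this is an assumption of the sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    mismatches = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if gap in (a, b) or "N" in (a, b):
            marks.append(" ")
            continue
        compared += 1
        if a == b:
            marks.append("*")
        else:
            mismatches += 1
            marks.append(" ")
    return "".join(marks), mismatches, compared


# Last block of the Figure 7 alignment (opossum vs platypus PF2 read).
opossum = "CCAGAACAGACGCGCCAAATCCCGTCGGCAGAGA"
platypus = "CCAGAACAGACGTGCCAAATCCCGGCGCCAGAAA"

line, mismatches, compared = compare_aligned(opossum, platypus)
print(line)
print(f"{mismatches}/{compared} = {100 * mismatches / compared:.1f}% difference")
```

Applied block by block across the full 180 bp, the same function gives the overall percentage difference between two homeobox sequences.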

Sequencing of mouse PCR product &

Identification of mixl1 homeobox sequence.

A sequence comparison of mice PCR products (MF1,

MR1, sequenced with MF1 primer; MF2, MR2,

sequenced with MR2 primer; obtained from 2014

class) and the human mixl1 homeobox sequence is

shown in Figure 9. In contrast to the zebrafish and

platypus sequences, the mouse sequence appears to

have more “N”s (unidentified nucleotides) in the

middle of the sequence. Reasons for these “N”s are

discussed later. However, sequence similarity is still

high (34/180 = 19% difference), consistent with the

conservation of homeobox sequences through

evolution. The corresponding amino acid sequences

are shown in Figure 10. Although some amino acids

are unknown due to the presence of unidentified

nucleotides, most amino acids appear to be

conserved.
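A read obtained with a reverse primer such as MR2 comes from the opposite strand, so it has to be reverse complemented before it can be aligned with forward reads. A minimal sketch of that step is given below; the function name is hypothetical, and the example string is the mouse_f2_2014 fragment as it appears (already reverse complemented) in Figure 9.

```python
# Translation table for complementing DNA; N (unidentified base) stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Fragment as shown in the alignment, i.e. in forward orientation.
aligned_fragment = "CCAGGTATGGTTCCAGAACCGA"

# The raw MR2 read would be in the opposite orientation.
raw_read = reverse_complement(aligned_fragment)
print(raw_read)

# Reverse complementing twice recovers the original sequence.
print(reverse_complement(raw_read) == aligned_fragment)  # True
```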

It is noted that a mouse PCR product using

MF1/MR2 primers can be used in the sequencing

reactions for both the forward and reverse

sequencing primers. This is because the 2,148 bp

amplified region contains the entire homeobox

sequence (Figure 2A). Sequencing using the MF1

forward primer would yield a sequence containing

both homeobox sequences, while sequencing with

the MF2 forward primer yields a sequence

containing the smaller homeobox sequence.

Sequencing with the MR1 reverse primer yields a

sequence containing the larger homeobox sequence

while sequencing with the MR2 reverse primer

yields a sequence containing both homeobox

sequences.

mouse_f1_2014    AGGGTCGGGCGCCCCGTCGGAGCCNNNNNNCGCAAGAGT-TGTCGTTCANCTCGGAGCAG
human_mixl1_cds  ------------------------CAGCGCCGCAAGCGCACGTCTTTCAGCGCCGAACAG
                                               ****** *   *** **** * * ** ***

mouse_f1_2014    CTGCCGTTGCTGGATCTCGTCTTCCNACAGACCATGTACCCNGACATCCACTTGCGGGAG
human_mixl1_cds  CTGCAGCTGCTGGAGCTCGTCTTCCGCCGGACCCGGTACCCCGACATCCACTTGCGCGAG
                 **** * ******* **********  * ****  ****** ************** ***

mouse_f1_2014    CGCCTGGCTGCGCTCACGNTNCTACCCGAGTCCAGGATCC--------------------
human_mixl1_cds  CGCCTGGCCGCGCTCACCCTGCTCCCCGAGTCCAGGATCCAGGTATGGTTCCAGAACAGG
mouse_f2_2014                                          CCAGGTATGGTTCCAGAACCGA
                 ******** ********  * ** ********************************* *

human_mixl1_cds  CGTGCCAAGTCTCGGCGTCAGAGT
mouse_f2_2014    CGGGCCAAGTCCAGGCGCCAGAGT
                 ** ********  **** ******

Figure 9. Multiple Sequence Alignment of mouse MF1/MR1 PCR product (sequenced using

MF1 primer), mouse MF2/MR2 PCR product (sequenced using MR2 primer and reverse

complemented) and human mixl1 homeobox sequences.

Asterisks signify perfectly matched nucleotides.


Figure 10. Comparison of sequenced mouse homeobox sequence and human Mixl1

homeobox sequence, with corresponding amino acid sequences.

Evolutionary history of species using mix

homeobox sequences.

A phylogenetic tree for the sequenced species

(zebrafish, mouse, platypus) along with comparison

sequences (chicken, opossum, human, cavefish) is

shown in Figure 11. The tree supports the choice of

comparison sequences used earlier for the zebrafish,

platypus and mouse. The zebrafish is most related to

the cavefish compared to the other organisms. This

is also the case between the mouse and human, and

platypus and opossum. In the sequence comparison

of all species analysed in Figure 12, it can be seen

that species that are more related (share a more

recent common ancestor) appear to have more

Figure 11. Phylogenetic tree for various species, constructed using their 180 bp mixl1 homeobox

sequences using Jukes-Cantor Genetic Distance Model and Neighbour Joining tree build method.


similar nucleotide sequences compared to other

species. The tree shows that the chicken is less

related to the other species and this is reflected

through its Mixl1 sequence in Figure 12 which shows

comparatively more variation. In addition, although

the zebrafish and cavefish share a recent common

ancestor, their Mxtx1 nucleotide sequences display

more variation when compared to mammal,

platypus and opossum sequences. This is consistent

with their position on the tree.
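The Jukes-Cantor model named in Figure 11 converts the observed proportion of differing sites, p, into an estimated distance d = -(3/4) ln(1 - (4/3) p), which corrects for multiple substitutions at the same site. The sketch below applies the formula to the mismatch counts reported earlier; the function name is hypothetical, and the distances actually used to build the tree were computed by the tree-building software, not by this code.

```python
import math


def jukes_cantor_distance(mismatches, sites):
    """Jukes-Cantor corrected distance (substitutions per site)."""
    if sites <= 0:
        raise ValueError("sites must be positive")
    p = mismatches / sites
    if p >= 0.75:
        # The correction is undefined once sequences look random.
        raise ValueError("proportion of differences too high for Jukes-Cantor")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Mismatch counts reported in the text for each 180 bp homeobox comparison.
pairs = {
    "zebrafish vs cavefish": 36,
    "platypus vs opossum": 24,
    "mouse vs human": 34,
}
for name, mismatches in pairs.items():
    print(name, round(jukes_cantor_distance(mismatches, 180), 3))
```

The corrected distance is always at least as large as the raw proportion p, and the gap widens as sequences become more divergent.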

The branch length reflects the rate of evolutionary

change. Here it indicates the zebrafish underwent a

greater rate of evolutionary change compared to the

cavefish; the human underwent a greater rate of

evolutionary change compared to the mouse; and

the opossum underwent a greater rate of

evolutionary change compared to the platypus. Both

fish have a significantly greater rate of change

compared to the other species, consistent with their

shorter generation times and greater numbers of

offspring. However, it is known that monotremes

(e.g. platypus) are older than marsupials (e.g.

opossums) along with placental mammals like humans and

mice. From this, it is expected that the branch length

of the platypus should be longer than that observed

in the tree to reflect more evolutionary change.

Additionally, while it would be expected that monotremes like the platypus would diverge from the common ancestor before placental mammals and marsupials, the tree in Figure 11 suggests that the platypus and opossum diverged from their common ancestor at the same time, and this event happened after the divergence of placental mammals, which is inconsistent with current knowledge (see Appendix B for a more accurate phylogenetic tree). This

indicates possible issues with the sequences used to

construct the tree; perhaps the sequences had

errors or it is inaccurate to construct phylogenetic

relationships using a single homeobox sequence.

Primer design for sequencing compared to

PCR.

Some primers work well in PCR but not in a

sequencing reaction. Sequencing reactions differ

from PCR in that they involve linear amplification of

the template DNA rather than exponential

amplification. While PCR uses two primers to create

a product that has priming sites and is readily able

to be used as a template for future amplifications,

DNA sequencing uses one primer. The resulting

product is in the same direction as the primer and

cannot be used as a template for future cycles, so all

amplification is directly from the original template

DNA (DNACore 2015). Because sequencing

reactions are much more sensitive to inefficient

primers, this may mean that primers that are

sufficiently competent in the exponential PCR

process are inefficient in the linear amplification of

sequencing reactions and may not produce sufficient

sequencing product to obtain a clear sequence.
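The difference between exponential and linear amplification can be made concrete with an idealised calculation. The sketch below is a simplification under stated assumptions: a fixed per-cycle efficiency, no reagent depletion, and hypothetical function names. It is meant only to show why an inefficient primer is penalised far more in a sequencing reaction than the raw numbers of a PCR would suggest.

```python
def pcr_copies(templates, cycles, efficiency=1.0):
    """Idealised PCR: each cycle multiplies the product by (1 + efficiency)."""
    return templates * (1.0 + efficiency) ** cycles


def cycle_sequencing_products(templates, cycles, efficiency=1.0):
    """Idealised sequencing reaction: each cycle copies only the original template."""
    return templates * efficiency * cycles


for efficiency in (1.0, 0.5):
    pcr = pcr_copies(1, 30, efficiency)
    seq = cycle_sequencing_products(1, 30, efficiency)
    print(f"efficiency {efficiency}: PCR {pcr:.3g} copies, sequencing {seq:.3g} products")
```

With a single primer there is no compounding, so the yield of the sequencing reaction scales only in proportion to the number of cycles and the priming efficiency.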

Kieleczawa (2006) describes additional factors that

may inhibit the success of a sequencing reaction,

including:

- GC-rich templates: DNA templates containing >60% GC-content;
- Templates with various repeats (e.g. di- and tri-nucleotide, direct, inverted);
- Templates or primers which contain hairpin or other secondary structures which interfere with DNA polymerase movement;
- Primers which have high melting point temperatures (>65 °C) which can lead to secondary priming artefacts and noisy sequences;
- Primers which are able to self-hybridise, forming ‘primer-dimers’.
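Two of the factors above, GC content and primer melting temperature, are easy to screen for before ordering or reusing a primer. The sketch below is a rough check only: it uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is a crude approximation intended for short oligonucleotides, and both example primer sequences and the function names are hypothetical rather than the primers used in this practical.

```python
def gc_content(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Approximate melting temperature (deg C) by the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def screen_primer(primer, max_gc=60.0, max_tm=65.0):
    """Return warnings based on the thresholds quoted from Kieleczawa (2006)."""
    warnings = []
    if gc_content(primer) > max_gc:
        warnings.append("GC content above threshold")
    if wallace_tm(primer) > max_tm:
        warnings.append("melting temperature above threshold")
    return warnings


# Hypothetical primers for illustration only.
balanced = "ATGCATGCATGCATGCATGC"
gc_rich = "GCGCGCGCGCGCGCGCGCGCAT"

for primer in (balanced, gc_rich):
    print(primer, gc_content(primer), wallace_tm(primer), screen_primer(primer))
```

Hairpins, repeats and primer-dimer formation are not covered by this check and would need a dedicated secondary-structure tool.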

Sequencing Reaction Issues

The presence of “N”s in the sequence obtained

from a sequencing reaction indicates unidentifiable

nucleotides. It is also common to observe variation

in the sequence size. Possible causes include:

- Inadequately purified PCR product: This may transfer PCR reaction components to the sequencing reaction. If dNTPs were transferred, these would compete with the fluorescent ddNTPs, resulting in less fluorescence and hence low resolution readings which may contribute to “N”s. If forward and reverse primers still remained, they would both anneal to complementary strands, resulting in two sequences superimposed on each other that are not readable (contain many “N”s) (Micromon 2013). Also, the non-purified PCR reaction may contain multiple products which bind the sequencing primer, leading to multiple reads.
- Primer melting temperature too low: Will result in less stringent primer annealing, increasing probability of non-specific annealing to other sites on the template. These multiple reads would likely result in indistinguishable nucleotides (“N”s). If they overlap, different sized sequencing products could be formed.
- Secondary structures in the DNA template that limit DNA polymerase movement, contributing to shorter sequences (Kieleczawa 2006).
- Multiple priming sites on the template DNA: Resulting sequence reads will be overlapped.
- Calculation error in primer or template DNA concentration. Too little primer or template DNA will result in low amounts of extension products being generated, showing up as a low resolution signal, which may contribute to inability to distinguish nucleotides.
- Contamination by inhibitors (e.g. salt, protein, nuclease) which degrade the DNA into smaller fragments. If the primer binding site is still intact on the smaller fragments, this may result in smaller length sequencing products.

High conservation of DNA sequence between

each species is restricted to the 180 bp

homeobox. DNA sequence does not need to

be conserved for function to be conserved.

Homeobox sequences tend to be highly conserved

through evolution as they play an essential role in

differentiation during embryogenesis (Burke et al.,

1995; Gehring et al., 1994). Homeobox sequences

encode transcription factors with a homeodomain

allowing them to bind to specific DNA sequences.

These transcription factors regulate gene

expression to determine the identity of the

structures that will form on a given segment. Since

transcription factors are proteins, their amino acid

sequence dictates their tertiary structure and hence

their function. Replacement polymorphisms

corresponding to amino acid changes are usually

deleterious to homeoprotein function and selected

against during evolution; hence the homeobox

sequence tends to be highly conserved. However,

the intronic DNA flanking the homeobox does not

need to be highly conserved as it is usually spliced

out during RNA transcript processing. Nucleotide

substitutions there tend to have minimal effect on the

phenotype of the organism, hence leading to greater

flexibility for variation.
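The distinction between silent and replacement polymorphisms can be made concrete by translating aligned codons with the standard genetic code and checking whether the encoded amino acid changes. The sketch below assumes in-frame, gap-free sequences containing only A, C, G and T; the function names and example sequences are hypothetical and are not taken from the sequences determined in this study.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_change(ref: str, alt: str) -> str:
    """Classify a codon pair as identical, silent or replacement."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return "identical"
    # Same amino acid from different codons: a silent polymorphism.
    return "silent" if CODON_TABLE[ref] == CODON_TABLE[alt] else "replacement"


def summarise(ref_seq: str, alt_seq: str) -> dict:
    """Count codon classes over two aligned, in-frame coding sequences."""
    counts = {"identical": 0, "silent": 0, "replacement": 0}
    shared = min(len(ref_seq), len(alt_seq))
    for i in range(0, shared - 2, 3):
        counts[classify_codon_change(ref_seq[i:i + 3], alt_seq[i:i + 3])] += 1
    return counts


# Made-up sequences: CGT->CGC keeps Arg (silent), CGT->AGT gives Ser (replacement).
print(summarise("CGTAAACGT", "CGCAAAAGT"))
```

Codons containing “N” are not handled here and would raise a `KeyError`, so ambiguous positions would need to be masked or excluded first.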

It is not essential that the DNA sequence is

conserved for function to be conserved. The

function of the homeoprotein depends on its shape

which depends on its amino acid sequence.

Nucleotide substitutions corresponding to silent

polymorphisms (no amino acid change) would affect

neither the shape nor the function of the homeoprotein.

Nucleotide substitutions corresponding to

replacement polymorphisms in non-active site

amino-acids may affect protein folding but may not

significantly affect protein function. The

homeodomain has a helix-turn-helix structure

composed of three helices. The N-terminal arm

interacts with the minor groove of DNA, while the

third helix interacts with the major groove of DNA.

Gehring et al. (1994) derived a consensus homeobox

sequence from 346 homeodomains taken from

different species and determined that there were

seven positions in the consensus sequence that are

occupied by the same amino acid in more than 95%

of cases. These amino acid positions are marked on

the sequences in Figure 12, which show that the

homeodomain amino acid sequences in the various

species determined in this study are also highly

conserved. The exception is the mouse sequence

which has an S instead of R in the N-terminal arm;

however, there are several unidentifiable

nucleotides around this region, so this difference

may be a sequencing artefact rather than a true

substitution. Additionally, the zebrafish Mxtx1

sequence exactly matches the one determined by

Pereira et al. (2012), indicating it was accurately

sequenced. The mouse Mixl1 sequence has several

differences, likely attributable to errors in

sequencing, although the majority of the sequence is

similar.
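A comparison of this kind can be expressed as two small calculations on aligned amino acid sequences: the pairwise percent identity, and the list of positions at which every species carries the same residue. The following is a rough sketch under the assumption that the sequences are already aligned and of equal length; the function names and the short peptides are hypothetical and do not reproduce the sequences in Figure 12.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def invariant_positions(sequences: list) -> list:
    """1-based positions at which all aligned sequences share one residue."""
    return [
        index + 1
        for index, column in enumerate(zip(*sequences))
        if len(set(column)) == 1
    ]


# Made-up aligned peptides for illustration only.
aligned = ["RRKRT", "RRKRS", "SRKRT"]

print(percent_identity(aligned[0], aligned[1]))  # 80.0
print(invariant_positions(aligned))              # [2, 3, 4]
```

Positions called from unidentifiable nucleotides would appear as mismatches in this calculation, so a low identity in a region with many “N”s should be read with caution.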

Figure 12. Comparison of Mixl1 homeobox sequences and corresponding amino acids across various species.

Functional regions marked above the sequences. Amino acids found to be particularly conserved by Gehring et al.

(1994) are marked in boxes.

CONCLUSION

In the present study, the Mixl1 homeobox sequences

for the zebrafish, mouse and platypus were isolated

and successfully sequenced. Comparing these

sequences to closely related species including the

cavefish, human and opossum revealed the highly

conserved nature of the Mixl1 homeobox sequence.

Although some nucleotide differences were

observed, amino acid sequences were mostly similar

with particularly high conservation of active-site

amino acids. Highest similarity was observed

between species that shared a more recent common

ancestor, reflecting less evolutionary change

between them. The Mixl1 sequences determined are

consistent with the reported literature, although there

are some mismatches, likely due to insufficient

purification of PCR products or other experimental

error.

MATERIALS & METHODS

As stated in Daish (2016a; 2016c), except 10X load buffer was used during gel electrophoresis instead of 6X.

REFERENCES

Bio-Rad 2016, PCR Troubleshooting, Bio-Rad

Australia, viewed 16 April 2016, <http://www.bio-

rad.com/en-au/applications-technologies/pcr-

troubleshooting>.

Burke, AC, Nelson, CE, Morgan, BA & Tabin, C 1995,

‘Hox genes and the evolution of vertebrate axial

morphology’, Development, vol. 121, no. 2, pp.333-

346.

Daish, T 2016a, ‘Multispecies analysis of homeobox

domain containing transcription factors’, practical

notes for Genetics 3111, University of Adelaide,

viewed 20 April 2016,

<https://myuni.adelaide.edu.au/bbcswebdav/pid-

6999322-dt-content-rid-

9036585_1/courses/3610_GENETICS_COMBINED_

0001/Prac%20Handbook%202016.pdf>.

Daish, T 2016b, ‘PCR Amplification and Sequence

Analysis of Homeobox Containing Mixl1 Transcription

Factors’, practical notes for Genetics 3111,

University of Adelaide, viewed 20 April 2016,

<https://myuni.adelaide.edu.au/bbcswebdav/pid-

7272638-dt-content-rid-9349662_1/xid-

9349662_1>.

Daish, T 2016c, ‘GIII Prac Session 3: Transcription

factor homeobox sequencing file manipulation and

multi-species alignments’, practical notes for Genetics

3111, University of Adelaide, viewed 20 April

2016,

<https://myuni.adelaide.edu.au/bbcswebdav/pid-

7287107-dt-content-rid-

9402676_1/courses/3610_GENETICS_COMBINED_

0001/DAISH%20GIII%202016%20Prac%203%20i

ntro%282%29.pdf>.

DNACore 2015, Sanger DNA Sequencing: Template

Preparation, Harvard University, viewed 16 April

2016, <https://dnacore.mgh.harvard.edu/new-cgi-

bin/site/pages/sequencing_pages/seq_template_pr

eparation.jsp;jsessionid=B4030742B11CFAB5D82B5

7477E17E5E5>.

Gehring, WJ, Affolter, M & Burglin, T 1994,

‘Homeodomain proteins’, Annual Review of

Biochemistry, vol. 63, no. 1, pp.487-526.

Hirata, T, Yamanaka, Y, Ryu, SL, Shimizu, T, Yabe, T,

Hibi, M & Hirano, T 2000, ‘Novel mix-family

homeobox genes in zebrafish and their differential

regulation’, Biochemical and Biophysical Research

Communications, vol. 271, no. 3, pp.603-609.

Hueber, SD 2009, ‘Identification and Functional

Analysis of Hox Downstream Genes in Drosophila’,

viewed 20 April 2016, <http://nbn-

resolving.de/urn:nbn:de:bsz:21-opus-38027>.

Kieleczawa, J 2006, ‘Fundamentals of sequencing of

difficult templates-an overview’, Journal of

Biomolecular Techniques, vol. 17, no. 3, p.207.

Lappin, TR, Grier, DG, Thompson, A & Halliday, HL

2006, ‘HOX genes: seductive science, mysterious

mechanisms’, Ulster Medical Journal, vol. 75, no. 1,

pp.23-31.

Mark, M, Rijli, FM & Chambon, P 1997, ‘Homeobox

genes in embryogenesis and pathogenesis’, Pediatric

Research, vol. 42, no. 4, pp.421-429.

Micromon 2013, Common Reasons for DNA

Sequencing Failure, Monash University, viewed 16

April 2016,

<https://platforms.monash.edu/micromon/images/st

ories/forms-and-user-guides/sequencing-

failure.pdf>.

Pereira, LA, Wong, MS, Lim, SM, Stanley, EG &

Elefanty, AG 2012, ‘The Mix family of homeobox

genes—key regulators of mesendoderm formation

during vertebrate development’, Developmental

Biology, vol. 367, no. 2, pp.163-177.

Qiagen 2015, Why do I get smeared PCR products?,

Qiagen, viewed 16 April 2016,

<https://qiagen.com/au/resources/faq?id=4eb03cc

8-4623-4e9e-96b2-6a4c17c03c58>.

ThermoFisher 2016, Nucleic Acid Gel Electrophoresis

and Blotting Support/Troubleshooting, ThermoFisher

Scientific, viewed 16 April 2016,

<https://thermofisher.com/au/en/home/technical-

resources/technical-reference-library/nucleic-acid-

purification-analysis-support-center/nucleic-acid-

electrophoresis-blotting-support>.

Warren, WC, Hillier, LW, Graves, J, Birney, E,

Ponting, CP, Grützner, F, Belov, K, Miller, W, Clarke,

L, Chinwalla, AT & Yang, SP 2008, ‘Genome analysis

of the platypus reveals unique signatures of

evolution’, Nature, vol. 453, no. 7192, pp.175-183.

Appendix A

Appendix B: Evolutionary tree showing accepted placement of monotremes and marsupials

relative to mice and humans (Warren et al. 2008)

Table A. PCR reaction reagents (in μL) used for the amplification of a segment of the Mixl1 homeobox sequence from zebrafish genomic DNA.

Reagent                          Volume (μL)
Water                            22.5
10x reaction buffer              5
25 mM MgCl2                      5
2.5 mM dNTP                      4
Primers (ZF1, ZR1; 5 μM)         4
DNA template (zebrafish gDNA)    7.5
Taq polymerase                   2
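The volumes in Table A can be checked with the dilution relationship C_final = C_stock × V_added / V_total. The sketch below is illustrative only: it assumes the 4 μL primer volume is drawn from a single 5 μM stock (the table does not state whether this is per primer or combined), and the variable and function names are hypothetical.

```python
# Volumes in μL as listed in Table A.
volumes_ul = {
    "water": 22.5,
    "10x reaction buffer": 5,
    "25 mM MgCl2": 5,
    "2.5 mM dNTP": 4,
    "primers (5 uM stock)": 4,  # assumption: one combined 5 uM stock
    "DNA template": 7.5,
    "Taq polymerase": 2,
}

total_ul = sum(volumes_ul.values())


def final_concentration(stock: float, volume_ul: float, total_volume_ul: float) -> float:
    """Dilution formula: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_volume_ul


buffer_x = final_concentration(10, volumes_ul["10x reaction buffer"], total_ul)
mgcl2_mm = final_concentration(25, volumes_ul["25 mM MgCl2"], total_ul)
dntp_mm = final_concentration(2.5, volumes_ul["2.5 mM dNTP"], total_ul)
primer_um = final_concentration(5, volumes_ul["primers (5 uM stock)"], total_ul)

print(f"Total reaction volume: {total_ul} uL")
print(f"Buffer: {buffer_x}x, MgCl2: {mgcl2_mm} mM, "
      f"dNTP: {dntp_mm} mM, primers: {primer_um} uM")
```

Under these assumptions the reagents sum to a 50 μL reaction containing 1x buffer, 2.5 mM MgCl2, 0.2 mM dNTP and 0.4 μM primer.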