insect molecular genetics || dna, gene structure, and dna replication


Insect Molecular Genetics. DOI: http://dx.doi.org/10.1016/B978-0-12-415874-0.00001-9

© 2013 Elsevier Inc. All rights reserved.

DNA, Gene Structure, and DNA Replication

CHAPTER 1

Chapter Outline
1.1 Overview
1.2 DNA is the Hereditary Material: A Brief History
1.3 The Central Dogma
1.4 The “RNA World” Came First?
1.5 The Molecular Structure of DNA
1.6 The Molecular Structure of RNA
1.7 The Double Helix
1.8 Complementary Base Pairing is Fundamental
1.9 DNA Exists in Several Forms
1.10 Genes
1.11 The Genetic Code for Protein-Coding Genes is a Triplet and is Degenerate
1.12 Gene Organization
1.13 Efficient DNA Replication is Essential
1.14 DNA Replication is Semiconservative
1.15 Replication Begins at Replication Origins
1.16 DNA Replication Occurs Only in the 5′ to 3′ Direction
1.17 Replication of DNA Requires an RNA Primer
1.18 Ligation of Replicated DNA Fragments
1.19 DNA Replication during Mitosis in Eukaryotes
1.20 Telomeres at the End: A Solution to the Loss of DNA during Replication
1.21 DNA Replication Fidelity and DNA Repair
1.22 Mutations in the Genome
1.23 Common Genetic Terminology
1.24 Independent Assortment and Recombination during Sexual Reproduction
General References
References Cited

1.1 Overview

Arthropod genes are made of deoxyribonucleic acid (DNA) and are located in chromosomes that consist of proteins, RNA, and DNA. DNA is a polymer of nucleotides (nt). Each nucleotide consists of a pentose sugar, one of four nitrogenous bases, and a phosphoric acid component. DNA consists of two complementary strands in a helix form. Pairing of the nitrogenous bases adenine (A) with thymine (T) and cytosine (C) with guanine (G) on the two complementary strands occurs by hydrogen bonding. A pairs with T by two hydrogen bonds, and C pairs with G by three hydrogen bonds. DNA has chemically distinct 5′ and 3′ ends, and the two strands are antiparallel, with one strand running in the 5′ to 3′ direction and the other strand running in the 3′ to 5′ direction. The antiparallel orientation of the two strands creates a special problem when the DNA is duplicated or replicated during mitosis or meiosis.

Genetic information in protein-coding genes is determined by the sequence of nitrogenous bases (A, T, G, C) in one of the strands, with a three-base (triplet) codon designating an amino acid. The genetic code is degenerate, meaning that more than one codon specifies most amino acids. The genetic information is expressed when DNA is transcribed into pre-messenger RNA (pre-mRNA) that is processed into mRNA and then translated into polypeptides. Most insect genes have intervening noncoding sequences (introns) that must be removed from the primary RNA molecule before translation into the protein can occur.

Efficient and accurate replication of DNA must occur at each cell division, or the cell or organism may not survive. DNA replication is semiconservative, i.e., one of the nucleotide strands of each new DNA molecule is new and the other nucleotide strand is old in each “cell generation.” The new DNA strand is complementary to the parental (or template) strand. DNA replication occurs in one direction only, from the 5′ to the 3′ end of the strand, and thus replication takes place differently on the two antiparallel strands. Replication on the “leading strand” can occur in the 5′ to 3′ direction in a continuous manner. DNA replication on the other strand, the “lagging strand,” occurs in short segments (Okazaki fragments) because the DNA runs in the 3′ to 5′ direction. Subsequently, the Okazaki fragments must be ligated together. Replication of DNA in chromosomes begins at multiple sites called origins of replication along the chromosome, and it involves many enzymes and proteins. Although DNA replication is usually highly accurate, errors in DNA replication, or mutations, can result from duplications, deletions, inversions, and translocations of nucleotides, all of which may affect the functioning of the resultant polypeptide. New combinations of genes can occur through recombination during meiosis.

1.2 DNA is the Hereditary Material: A Brief History

Gregor Johann Mendel founded modern genetics in 1866 by publishing his studies on inheritance in garden peas. He confirmed that hereditary traits were transmitted from generation to generation, and he proposed the principles of Segregation and Independent Assortment, which are discussed further in the description of meiosis and mitosis in Chapter 3. His work, however, was not widely known until 1900, when Hugo de Vries, Carl Correns, and Erich von Tschermak rediscovered these laws of inheritance. Mendel described traits in peas that were “dominant” or “recessive,” showed that peas could be selected for different traits, and showed that the traits were inherited in a stable manner.

That DNA is the hereditary material was first shown using a bacterium that causes pneumonia, Streptococcus pneumoniae (Griffith 1928). Before this discovery, scientists speculated that the hereditary material might be composed of proteins or RNA. Proteins were considered the most likely hereditary material because they were known to be more variable (having 20 amino acids that could serve as the genetic code) than DNA. Furthermore, proteins are present in the nucleus in amounts nearly equal to DNA. DNA, by contrast, seemed to have only four types of structure (consisting of A, T, C, or G) that could serve as the genetic code. Griffith (1928) found that nonvirulent forms of S. pneumoniae could be “transformed” to virulent forms by combining heat-treated virulent bacteria with nonvirulent bacteria. The reverse was true and led to the conclusion that the virulence traits were heritable and that the heritable material was capable of surviving mild heat treatment. Subsequently, Avery et al. (1944) conducted experiments in which the “transforming principle” was found to have the characteristics of DNA, and the transforming factors did not test positive for proteins or RNA. Avery et al. (1944) showed that enzymes that degrade proteins or RNA did not degrade the transforming principle but that enzymes that could degrade DNA did degrade the transforming principle. Hershey and Chase (1952) conducted experiments to further resolve whether protein or DNA was the hereditary material. They labeled DNA and protein from viruses that infect bacteria (bacteriophages) with different radioactive markers and monitored whether labeled DNA or labeled protein entered the bacterial host. Only labeled DNA entered the bacteria, confirming that the transforming principle, or genetic information, was contained in DNA.

The next big questions were how the DNA was structured, how the genetic information was encoded, and how the genetic information was replicated in a reliable manner. Answers to these questions were hotly pursued by several scientists, including Francis Crick, James Watson, Rosalind Franklin, Maurice Wilkins, Linus Pauling, and others. Rosalind Franklin and Maurice Wilkins provided critical information relevant to the solution of the structure of DNA with their X-ray diffraction pictures of purified DNA. The X-ray diffraction photographs provided an essential clue that allowed Watson and Crick to propose the correct structure of DNA and to hypothesize how the genetic information was reliably replicated (Watson and Crick 1953). Previous proposals had been made that suggested that DNA “consists of three intertwined chains, with the phosphates near the fibre axis, and the bases on the outside.” Another three-chain structure also had been suggested in which “the phosphates are on the outside and the bases on the inside, linked together by hydrogen bonds.” Watson and Crick (1953) proposed that DNA “has two helical chains each coiled round the same axis… the bases on the inside of the helix and the phosphates on the outside…” and indicated the “novel feature of the structure is the manner in which the two chains are held together by the purine and pyrimidine bases…They are joined together in pairs, a single base from one chain being hydrogen-bonded to a single base from the other chain.” Watson and Crick stated, “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material,” the accuracy of which is crucial to the transmission of genetic information from cell to cell and from generation to generation. The next big question to be answered involved the issue of how the purine and pyrimidine bases were able to encode the genetic information.

Crick et al. (1961) established the triplet nature of the genetic code by showing that three bases of DNA code for one amino acid. As noted by Crick et al. (1961), “If the coding ratio is indeed 3 ... and if the code is the same throughout Nature, then the genetic code may well be solved within a year.” To resolve the question, Crick et al. (1961) found that the insertion or deletion of a single base pair in a bacteriophage gene led to a failure to produce a normal protein. The protein could be made functional again by inserting or deleting a total of three nucleotides, indicating that the genetic code uses a codon of three DNA bases that correspond to an amino acid and that the code for genes is not overlapping.
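The reasoning behind the insertion experiments can be illustrated with a short Python sketch. It is an illustration only: the repeating CAT sequence, the inserted G bases, and the helper names `codons` and `insert_base` are invented for this example and are not taken from Crick et al. (1961).

```python
def codons(seq):
    """Split a sequence into consecutive, non-overlapping triplets,
    starting at the first base and ignoring any incomplete final triplet."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq, pos, base):
    """Return a copy of seq with one base inserted before index pos."""
    return seq[:pos] + base + seq[pos:]


# Hypothetical sequence, chosen only so that the reading frame is easy to see.
original = "CATCATCATCATCAT"

# One inserted base shifts every downstream triplet out of frame.
one_insertion = insert_base(original, 3, "G")

# Two further insertions nearby (three in total) restore the frame downstream.
three_insertions = insert_base(insert_base(one_insertion, 5, "G"), 7, "G")

print(codons(original))
print(codons(one_insertion))
print(codons(three_insertions))
```

In this toy case a single insertion turns every downstream CAT triplet into TCA, whereas after three insertions only the triplets near the insertion sites differ and the downstream triplets read CAT again.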

As more is learned about genomes, the concept of the gene has had to be modified. An early definition focused on protein-coding genes, but we now know that much of the DNA in an organism is transcribed into large and small RNAs that are not translated into proteins (Collins and Penny 2009). Furthermore, protein-coding genes have a variety of regulatory elements, including enhancers and promoters, whereas some genes code for RNAs that are used directly and other RNAs regulate development, as will be described in Chapters 2 and 3.

1.3 The Central Dogma

The Central Dogma of molecular biology, as proposed by Francis Crick (Crick 1958), stated that biological information is carried in DNA, that this information subsequently is transferred to RNA (mRNA), and that finally it is translated into specific proteins based on the code in the DNA. Initially, the Central Dogma stated that the flow of information is unidirectional, with proteins unable to direct the synthesis of RNA, and RNA unable to direct the synthesis of DNA (Figure 1.1).

The Central Dogma had to be amended in 1970 when certain viruses were found to transfer information from RNA to DNA. Subsequently, mutated proteins found in the membrane of brain cells of vertebrates were shown to be inherited. Although such aberrant proteins initially were thought to be caused by slow viruses or viroids, Stanley Prusiner discovered that the mutated proteins (called prions) could cause a group of fatal neurodegenerative diseases. The term prion refers to proteinaceous infectious particles (Prusiner and Scott 1997) that cause diseases such as bovine spongiform encephalopathy (“mad cow disease”) in cattle, scrapie in sheep, and Creutzfeldt–Jakob disease or kuru in humans. These proteinaceous infective particles do not contain DNA, but they are able to transmit the disease to individuals who eat the altered proteins (Prusiner and Scott 1997). The altered protein acts as a template upon which the normal protein is refolded into a deformed molecule through a process facilitated by another protein (Prusiner and Scott 1997, Tuite 2000). Such abnormal proteins are transmitted to daughter cells, thereby propagating the mutant phenotype in the absence of any mutated nucleic acid.

Figure 1.1 The Central Dogma assumes that biological information transfers from DNA to RNA to proteins. Recent discoveries of viruses that transcribe information from RNA to DNA required modification in the Dogma. Three processes are involved in the Central Dogma: DNA replication, transcription of the genetic information into RNA, and translation of the mRNA into a polypeptide (protein).

The Central Dogma remains an important tenet of modern biology, although our knowledge of the roles of RNAs continues to expand and some have questioned its relevance (Mattick 2009, Shapiro 2009). In insects, genes (DNA) are found in complex structures called chromosomes that consist of proteins, RNAs, and DNA. This chapter reviews the structure of DNA and RNA, the basis of the genetic code for proteins, the processes involved in DNA replication, and the changes in DNA that result in mutations.

1.4 The “RNA World” Came First?

It is thought that there could have been an era on early Earth during which RNA played the role of genetic material and also served as the main agent of catalytic activity, because RNA can serve as a ribozyme (DiGiulio 1997, Jeffares et al. 1998, Poole et al. 1998, Cooper 2000, Eddy 2001, Gesteland et al. 2006, Atkins et al. 2011, Darnell 2011). This role implies that enzymatic proteins in the modern world replaced RNA as the main catalysts. The “RNA organism” is thought to have had a multiple-copy, double-stranded RNA genome capable of recombination and splicing. The RNA genome was probably fragmented into “chromosomes” (Jeffares et al. 1998). RNA could have been the first genetic material because it can serve as a template for self-replication and can catalyze chemical reactions, including the polymerization of nucleotides (Johnston et al. 2001). It is thought that interactions between RNA and amino acids then evolved into the present-day world in which DNA is the more stable repository of genetic information. Knowledge of the number of RNAs and their very diverse functions has increased and an appreciation for the role of RNAs in gene regulation and development is reflected in many publications. First, however, let’s examine the structure and function of DNA in its role as a stable repository of genetic information.

1.5 The Molecular Structure of DNA

DNA is a long, double-stranded polymeric molecule consisting of individual monomers that are linked in a series and organized in a helix. Each monomer is called a nucleotide. Each nucleotide is itself a complex molecule made up of three components: a sugar, a nitrogenous base, and a phosphoric acid (Box 1.1).

In DNA, the sugar component is a pentose (with five carbon atoms) in a ring form that is called 2′-deoxyribose (Figure 1.2). The nitrogenous bases are single- or double-ring structures that are attached to the 1′-carbon of the sugar. The bases are purines (adenine and guanine) or pyrimidines (thymine and cytosine) (Figure 1.3). When a sugar is joined to a base it is called a nucleoside.

A nucleoside is converted to a nucleotide by the attachment of a phosphoric acid group to the 5′-carbon of the sugar ring (Figure 1.4). The four nucleotides that polymerize to form DNA are 2′-deoxyadenosine 5′-triphosphate, 2′-deoxyguanosine 5′-triphosphate, 2′-deoxycytidine 5′-triphosphate, and 2′-deoxythymidine 5′-triphosphate (Figure 1.5). These names usually are abbreviated as dATP, dGTP, dCTP, and dTTP, or shortened further to A, G, C, and T, respectively.


Box 1.1 Key Points About DNA and RNA

DNA
• DNA is a double helix with antiparallel strands consisting of four types of nucleotides.
• Complementary base pairing is essential to its structure and to maintaining an accurate code during replication.
• Adenine pairs with thymine (two hydrogen bonds) and guanine pairs with cytosine (three hydrogen bonds).
• Replication is semiconservative, a mechanism that helps to maintain the genetic information without errors. Daughter molecules contain one old and one new strand.
• DNA replication occurs only from the 5′ to the 3′ direction and requires an RNA primer.
• DNA replication begins at multiple origins of replication on the chromosome.
• Replication is continuous on the leading strand of DNA but is discontinuous on the lagging strand.
• The genetic code consists of a triplet of bases and is degenerate; 64 codons code for only 20 amino acids.
• Protein-coding genes contain introns and exons in most eukaryotes; the introns must be removed from pre-mRNA before the genetic information can be translated into proteins in eukaryotes.
• DNA is organized in chromosomes with telomeres at their ends, an organization that helps to preserve the ends of the chromosome.
• DNA can be mutated or damaged and repaired by multiple repair mechanisms.
• Mutations affect the phenotype of the insect in multiple ways.
• DNA can be modified during meiosis by crossing over and can result in new combinations of genes in the gametes.

RNA
• RNA contains uracil rather than thymine and ribose sugar instead of deoxyribose.
• Pre-mRNA is processed in the nucleus before the mRNA can move to the cytoplasm for translation.
• Some RNAs are used directly (ribosomal, transfer, and a variety of small RNAs) without being translated into proteins.

Figure 1.2 Structure of sugars found in nucleic acids; 2′-deoxyribose is found in DNA and ribose is found in RNA.


Individual nucleotides are linked together by phosphodiester bonds to form polynucleotides (Figure 1.4). Polynucleotides have chemically distinct ends. In Figure 1.5, the top of the polynucleotide ends with a nucleotide in which the triphosphate group attached to the 5′-carbon has not participated in a phosphodiester bond. This end is called the 5′ or 5′-P terminus. At the other end of the molecule, the unreacted group is not the phosphate, but the 3′-hydroxyl. This end is called the 3′ or 3′-OH terminus. This distinction between the two ends (5′ and 3′) means that polynucleotides have an orientation that is very important in many molecular genetics concepts and applications.

Figure 1.3 Bases in DNA are purines (adenine and guanine) or pyrimidines (thymine and cytosine). Uracil is substituted for thymine in RNA.

Polynucleotides can be of any length and have any sequence of bases. The DNA molecules in chromosomes are probably several million nucleotides long. Because there are no restrictions on the nucleotide sequence, a polynucleotide just 10 nt long could have any one of 4^10 (or 1,048,576) different sequences. This ability to vary the sequence is what allows DNA to contain complex genetic information.
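The count of possible sequences follows from having four choices at each of n positions, giving 4^n. A minimal Python sketch of that arithmetic is shown here; the function names `count_sequences` and `enumerate_sequences` are made up for illustration.

```python
from itertools import product

BASES = "ATGC"  # the four DNA bases


def count_sequences(length):
    """Number of distinct sequences of the given length: 4 ** length."""
    return len(BASES) ** length


def enumerate_sequences(length):
    """List every sequence of the given length (practical only for short lengths)."""
    return ["".join(p) for p in product(BASES, repeat=length)]


print(count_sequences(10))          # 1048576, the figure quoted for a 10-nt polynucleotide
print(len(enumerate_sequences(3)))  # 64 possible triplets
```

Explicit enumeration is included only as a check on the formula for short lengths; the count grows fourfold with every added nucleotide.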

Figure 1.4 A nucleoside consists of a sugar joined to a base. It becomes a nucleotide (nt) when a phosphoric acid group is attached to the 5′-carbon of the sugar. Nucleotides link together by phosphodiester bonds to form polynucleotides.

Figure 1.5 The four nucleotides from which DNA is synthesized are 2′-deoxyadenosine 5′-triphosphate (dATP), 2′-deoxyguanosine 5′-triphosphate (dGTP), 2′-deoxycytidine 5′-triphosphate (dCTP), and 2′-deoxythymidine 5′-triphosphate (dTTP).

1.6 The Molecular Structure of RNA

RNA also is a polynucleotide and has multiple functions in the cell (Richter and Treisman 2011, Tuck and Tollervey 2011), including the role as mRNA. RNAs differ from DNA in two important ways. First, the sugar in RNA is ribose (Figure 1.2). Second, RNA contains the nitrogenous base uracil (U) instead of thymine (Figure 1.3). The four nucleotides that polymerize to form RNA are adenosine 5′-triphosphate, guanosine 5′-triphosphate, cytidine 5′-triphosphate, and uridine 5′-triphosphate, abbreviated as ATP, GTP, CTP, and UTP or A, G, C, and U, respectively. The individual nucleotides are linked together with 3′ to 5′ phosphodiester bonds. RNA is typically single-stranded, making it less stable than double-stranded DNA, although it can form complex structures (such as hairpins) or become double-stranded.

1.7 The Double Helix

The discovery, by Watson and Crick (1953), that DNA is a double helix of antiparallel polynucleotides ranks as one of the most important discoveries in biology because it provided a hypothesis as to how DNA replication could be achieved with a minimal error rate. Nitrogenous bases are located inside the double helix, with the sugar and phosphate groups forming the backbone of the molecule on the outside (Figure 1.6). The nitrogenous bases of the two polynucleotides interact by hydrogen bonding, with an A pairing to a T and a G to a C.

Figure 1.6 Two representations of the double-helix structure of DNA. The model on the left shows the hydrogen-bonding between nitrogenous bases that holds the two antiparallel strands together. The model on the right shows the relative sizes of the atoms in the molecule.

Hydrogen bonds are weak bonds in which two electronegative atoms share a hydrogen atom between them. Two hydrogen bonds form between A and T and three hydrogen bonds form between G and C. Bonding between G and C is thus stronger, and more energy is required to break it. The hydrogen bonds, and other molecular interactions called stacking interactions, hold the double helix together.
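The two-versus-three rule makes it easy to tally the hydrogen bonds in a duplex from the sequence of one strand. The Python sketch below does only that counting; the names `H_BONDS`, `hydrogen_bonds`, and `gc_fraction` are hypothetical, and the tally ignores stacking interactions, so it should be read as a rough indicator rather than a measure of duplex stability.

```python
# Hydrogen bonds contributed by the base pair that each base forms:
# A-T pairs have two, G-C pairs have three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bonds(strand):
    """Total hydrogen bonds between this strand and its full complement."""
    return sum(H_BONDS[base] for base in strand.upper())


def gc_fraction(strand):
    """Fraction of positions occupied by G or C."""
    strand = strand.upper()
    return sum(base in "GC" for base in strand) / len(strand)


print(hydrogen_bonds("AATT"))  # 8
print(hydrogen_bonds("GGCC"))  # 12
print(gc_fraction("ATGC"))     # 0.5
```

Two duplexes of equal length can therefore differ in their hydrogen-bond totals purely because of base composition.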

The DNA helix turns approximately every 10 base pairs (bp), with spacing between adjacent base pairs of 3.4 Å, so that a complete turn requires 34 Å (Figure 1.6). The helix is 20 Å in diameter and right-handed, thus each chain follows a clockwise path. The strands run antiparallel to each other, with one strand running in the 5′ to 3′ direction and the other strand running in the 3′ to 5′ direction; this alignment has important implications for DNA replication and several molecular genetics techniques. The DNA helix has two grooves, a major groove and a minor groove (Figure 1.6) in which proteins involved in DNA replication and transcription interact with the DNA and with each other.
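These dimensions allow a quick estimate of the contour length of a DNA molecule from its size in base pairs. The Python sketch below uses only the two figures given in the text (3.4 Å per base pair and about 10 bp per turn); the constant and function names are invented for illustration, and the one-million-bp molecule is a hypothetical example.

```python
RISE_PER_BP_ANGSTROM = 3.4   # spacing between adjacent base pairs
BP_PER_TURN = 10             # approximate base pairs per helical turn
ANGSTROM_PER_MICROMETRE = 10_000


def helix_length_angstrom(n_bp):
    """Contour length of a helix of n_bp base pairs, in angstroms."""
    return n_bp * RISE_PER_BP_ANGSTROM


def helix_turns(n_bp):
    """Approximate number of complete helical turns in n_bp base pairs."""
    return n_bp / BP_PER_TURN


# One full turn: 10 bp, about 34 angstroms.
print(helix_length_angstrom(10))

# A hypothetical molecule of one million base pairs, expressed in micrometres.
print(helix_length_angstrom(1_000_000) / ANGSTROM_PER_MICROMETRE)
```

On these assumptions a molecule of one million base pairs is roughly 340 µm long when fully extended, which illustrates why chromosomal DNA must be compacted within the cell.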

1.8 Complementary Base Pairing is Fundamental

The principle of complementary base pairing is a fundamental element of DNA structure and of great practical significance in many techniques used in molecular biology. A pairs with T and G pairs with C. Normally, no other base-pairing pattern will fit in the helix or allow hydrogen bonding to occur (Figure 1.7, Box 1.1).

Complementary base pairing provides the mechanism by which the sequence of a DNA molecule is retained during replication of the DNA molecule, which is crucial if the information contained in the gene is not to be altered or lost during cell division. Complementary base pairing is important in transcription and expression of genetic information in the living insect and is important in several molecular techniques.
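Because each base has exactly one partner, either strand fully determines the other. The Python sketch below derives the partner strand from a given strand and, because the strands are antiparallel, reverses it so that the result is also written 5′ to 3′. The names `COMPLEMENT`, `complement`, and `reverse_complement` are chosen for this example only.

```python
# Translation table implementing the pairing rules A<->T and G<->C.
COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement(strand):
    """Base-by-base partner of the strand, read in the opposite (3' to 5') direction."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand):
    """Partner strand written 5' to 3', accounting for antiparallel orientation."""
    return complement(strand)[::-1]


strand = "ATGGCA"
partner = reverse_complement(strand)
print(partner)                      # TGCCAT
print(reverse_complement(partner))  # ATGGCA, the original strand is recovered
```

Applying the operation twice returns the starting sequence, which mirrors how two rounds of copying by base pairing regenerate the original strand.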

1.9 DNA Exists in Several Forms

DNA actually is a dynamic molecule in living organisms and has several different variations in form. In some regions of the chromosome, the strands of the DNA molecule may separate and later reanneal. DNA typically is right-handed, but it can form >20 slightly different variations of right-handed helices. In some regions of the molecule, it can even form left-handed helices. If segments of nucleotides in the same strand are complementary, the DNA may even fold back upon itself in a hairpin structure.

DNA exists in different crystalline forms, depending upon the amount of water present in the DNA solution (Bustamante et al. 2003). B-DNA is the form in which DNA commonly occurs under most cellular conditions. A-DNA is more compact than B-DNA, with 11 bp/turn of the helix and a diameter of 12 Å. In addition, C-, D-, E-, and Z-DNA have been found. Z-DNA has a left-handed helix rather than a right-handed helix. A triple-helical form (H-DNA) also occurs. A, H, and Z forms of DNA are thought to occur in cells, and C, D, and E forms of DNA may be produced only under laboratory conditions.

Figure 1.7 A) Complementary base pairing of polynucleotides by hydrogen bonds holds the two strands of the DNA molecule together. B) Thymine (T) pairs with adenine (A) with two hydrogen bonds, and guanine (G) pairs with cytosine (C) with three hydrogen bonds.


1.10 Genes

The concept of a “gene” has evolved and has become increasingly difficult to define (Muller 1947, Maienschein 1992, Nelkin 2001, Pearson 2006, Pesole 2008, Brosius 2009). Genes can be a specific location on a chromosome (the bead on a string analogy), a particular type of biochemical material, and a physiological unit that directs development. Genes can consist of DNA sequences that can be spliced in alternative ways so that they are essentially coding for more than one protein (Pearson 2006). We also know that genes transcribed into RNA can regulate other genes, and many genes are never translated into proteins.

Genes are segments of a DNA molecule that may vary in size from as few as 75 nt to >200 kilobases (kb) of DNA. A kilobase is 1000 nt. Genes contain biological information by coding for the synthesis of an RNA molecule. The RNA may subsequently direct the synthesis of a protein molecule or the RNA may be the end product (e.g., transfer RNAs [tRNAs], ribosomal RNAs [rRNAs], regulatory RNAs). Proteins may regulate other genes, form part of the structure of cells, or function as enzymes. Expression of the information contained in protein-coding genes involves a two-step process of transcription and translation (Figure 1.1).

Genetic information is determined by one of the two strands of the double-helix DNA molecule. The DNA containing the genetic information is called the coding strand, and the other strand is the noncoding complement to it. Sometimes, the coding strand is known as the sense strand and the noncoding strand is known as the antisense strand. A few examples are known in which both strands are the “coding strand” for part of the length of the DNA molecule, but the genes occur in different specific regions. Thus, one strand of the double helix may be the sense strand over part of its length but be the antisense strand over other segments (Figure 1.8). A protein-coding gene typically includes a variety of regulatory structures and signals, as is described in Chapter 2.

Nonprotein-coding genes include genes that code for RNAs that are themselves the end products: the synthesized RNAs may be used directly as tRNAs, rRNAs, small nucleolar RNAs (snoRNAs), small nuclear RNAs (snRNAs), and other regulatory elements (Eddy 2001, Sharp 2009, Tuck and Tollervey 2011).

1.11 The Genetic Code for Protein-Coding Genes is a Triplet and is Degenerate

The genetic code for a protein-coding gene is based on the sequence of three nucleotides in the DNA molecule. The triplet sequence (or codon) determines which amino acids are assembled in a particular sequence into proteins. It is possible to order four different bases (A, T, C, G) in combinations of three into 64 triplets or codons. Because there are only 20 common amino acids, the question immediately arises as to what the other 44 codons do.

The answer is that the genetic code is degenerate with all amino acids, except methionine and tryptophan, determined by more than one codon (Table 1.1). A, U, C, and G represent the codons in Table 1.1 because the genetic information in DNA is transcribed into mRNA, which uses U instead of T.
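Degeneracy can be checked directly by building the standard codon table and counting how many codons specify each amino acid. The Python sketch below is one compact way to do so; the layout trick (bases ordered U, C, A, G and a 64-letter string of one-letter amino acid codes, with `*` for stop) is a common convention rather than anything taken from this chapter, and the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for the 64 codons in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons specify each amino acid (or stop).
degeneracy = Counter(CODON_TABLE.values())

single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)
print(len(CODON_TABLE))   # 64 codons in total
print(single_codon)       # ['M', 'W']: methionine and tryptophan
print(degeneracy["L"])    # 6 codons for leucine
```

The counts agree with the statement that only methionine (M) and tryptophan (W) are specified by a single codon; three of the 64 codons are stops and the remaining 61 are shared among the 20 amino acids.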

The genetic code also contains punctuation codons. Three codons (UAA, UGA, and UAG) function as “stop” messages or termination codons; they occur at the end of a protein-coding gene to indicate where translation should stop. AUG serves as an initiation or start codon when it occurs at the front end of a gene. Because AUG is the sole codon for methionine, AUGs also are found in the middle of genes.
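The role of the punctuation codons can be sketched as a simple reading procedure: begin at a start codon, read triplets in frame, and halt at the first stop codon. The Python below is a deliberately simplified model; it assumes translation begins at the first AUG in the string, which ignores the cellular machinery that selects a start site, and the function name `translate` and the example mRNA are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code in UCAG order, one-letter amino acid codes; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Returns an empty string if no AUG is present. Simplifying assumption:
    the first AUG found is treated as the start codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # UAA, UAG, or UGA: stop translating
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: AUG GCU AUG UGG UAA, flanked by untranslated bases.
print(translate("GGAUGGCUAUGUGGUAAGC"))  # MAMW
```

In the example the first AUG acts as the start codon, whereas the second AUG, lying in frame within the coding region, is read simply as methionine.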

The genetic code is not universal, although it was assumed to be so initially. In 1979, it was found that mitochondrial genes use a slightly different code (Knight et al. 2001). For example, the codon AGA typically codes for arginine, but in Drosophila mitochondria the codon AGA codes for serine.
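A variant code can be represented as the standard table plus a small set of reassignments. The Python sketch below encodes only the single difference named in the text (AGA read as serine in Drosophila mitochondria); any other differences between the nuclear and mitochondrial codes are deliberately not modelled, and the names `STANDARD_CODE` and `DROSOPHILA_MITO_EXAMPLE` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard code in UCAG order, one-letter amino acid codes; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Illustration only: apply the one reassignment mentioned in the text.
# This is not a complete mitochondrial translation table.
DROSOPHILA_MITO_EXAMPLE = {**STANDARD_CODE, "AGA": "S"}

changed = [
    codon for codon in STANDARD_CODE
    if STANDARD_CODE[codon] != DROSOPHILA_MITO_EXAMPLE[codon]
]

print(STANDARD_CODE["AGA"])            # R (arginine)
print(DROSOPHILA_MITO_EXAMPLE["AGA"])  # S (serine)
print(changed)                         # ['AGA']
```

Keeping the variant as an overlay on the standard table makes explicit that the two codes agree for every codon not listed as a reassignment in this sketch.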

Figure 1.8 Genetic information is contained in genes carried on one of the two strands (coding strand). The complementary strand in that region is the noncoding strand. Genes can occur on different strands at different points of the DNA molecule. Noncoding DNA between genes is called intergenic or spacer DNA.

Eukaryotic genes have evolutionary histories and seem to have been derived from at least two sources. There are three domains of life: Archaea (archaebacteria), Bacteria (eubacteria), and Eukarya (eukaryotes). Eukaryotes are organisms (including arthropods) that consist of cells with true nuclei bounded by nuclear membranes. Cell division in eukaryotes occurs by mitosis, reproductive cells undergo cell division by meiosis, and oxidative enzymes are packaged in mitochondria, which have their own circular chromosome. Evidence derived from analyses of genome sequences from the three domains strongly suggests that eukaryotic nuclear genes are derived from both archaebacterial (informational genes) and eubacterial (operational genes) lineages, indicating that eukaryotic genomes are chimeric (Lang et al. 1999, Nesbo et al. 2001).

1.12 Gene Organization

All genes are located on chromosomes. Each chromosome contains a single DNA molecule. These DNA molecules contain hundreds or thousands of genes in insects. For example, the fruit fly Drosophila melanogaster is estimated to have ≈13,600 genes distributed on four chromosomes (Adams et al. 2000). Genes may be spaced out along the length of a DNA molecule, with DNA sequences intervening that do not code for proteins, or the genes may be grouped into clusters. Genes in a cluster may be related or unrelated to each other in structure and function. There are segments of DNA in eukaryotes in which the nucleotide sequences apparently do not code for anything; this DNA has been called “spacer” or intergenic DNA if it occurs between genes. Studies indicate that these noncoding sequences often are transcribed into RNAs that have gene regulatory functions.

Multigene families are clusters of related genes with similar nucleotide sequences. Multigene families may have originated from a single ancestral

Table 1.1: The 20 Amino Acids that Occur in Proteins and their Codons.

Amino acid       Abbreviation   Codons

Alanine          ala            GCU GCC GCA GCG
Arginine         arg            CGU CGC CGA CGG AGA AGG
Asparagine       asn            AAU AAC
Aspartic acid    asp            GAU GAC
Cysteine         cys            UGU UGC
Glutamic acid    glu            GAA GAG
Glutamine        gln            CAA CAG
Glycine          gly            GGU GGC GGA GGG
Histidine        his            CAU CAC
Isoleucine       ile            AUU AUC AUA
Leucine          leu            UUA UUG CUU CUC CUA CUG
Lysine           lys            AAA AAG
Methionine [a]   met            AUG
Phenylalanine    phe            UUU UUC
Proline          pro            CCU CCC CCA CCG
Serine           ser            UCU UCC UCA UCG AGU AGC
Threonine        thr            ACU ACC ACA ACG
Tryptophan [a]   trp            UGG
Tyrosine         tyr            UAU UAC
Valine           val            GUU GUC GUA GUG

[a] Methionine and tryptophan are each specified by only one codon.

gene that duplicated to produce two, or more, identical genes (Francino 2005). These identical genes could have diverged in nucleotide sequence through time to produce (two or more) related functional genes. In some cases, the genes of multigene families can be found at different positions on more than one chromosome after large-scale rearrangements (translocations or inversions) that occur both within and between chromosomes. Examples of multigene families in insects include actins, tubulins, heat shock, salivary glue, chorion, cuticle, and yolk protein genes. (Note that the name of a specific gene usually is italicized.)

Pseudogenes are DNA sequences that seem similar to those of functional genes, but the genetic information has been altered (mutated) so that the former gene is no longer functional. Once the biological information has been lost, a pseudogene can undergo rapid changes in nucleotide sequence and, given sufficient time, it may degrade to the point where it is not possible to identify the sequence as a former gene. At this point, it may be called “junk” DNA.

One of the interesting discoveries in genetics was the revelation in 1977 that most protein-coding genes in eukaryotes are discontinuous. Discontinuous genes contain coding and noncoding segments called exons and introns, respectively (Figure 1.9). Considerable discussion of the origin, evolution, and importance of introns has occurred previously (Herbert 1996, Gilbert et al. 1997, Trotman 1998). Introns have been maligned as examples of junk DNA because they may be considerably longer than the coding sequences (exons), and they were thought to have no function, although we now know that some introns contain regulatory sequences. Two hypotheses have been proposed to explain the origin of introns: the introns-early hypothesis and the introns-late hypothesis.

According to the introns-early hypothesis, many introns were present in the common ancestor of all life, but large or complete losses of the introns occurred in independent lineages. In addition, introns functioned in the primordial assembly of protein genes by promoting the recombination, or shuffling, of short exons, each encoding 15–20 amino acids (minigenes) into different

Figure 1.9 Protein-coding genes in eukaryotic organisms are divided into introns and exons. Introns are removed from the pre-mRNA before the mRNA is translated into a polypeptide. In this example, there are six exons separated by five introns. The genetic message is present in exons I, II, III, IV, V, and VI.

functional genes through fusion (Gilbert et al. 1997). It is likely that there has been an average of two or three acts of such fusions of minigenes into the larger exons of today (Gilbert et al. 1997). Some introns have been inherited for millions of years, making it possible to find a consistent location for the introns when homologous genes from different organisms are examined. The actual sequences of the introns in these homologous genes may have diverged through mutation to the point that they seem to have no sequence similarity. Trotman (1998) suggests that this consistent location of introns is evidence that introns may have been integral to the development of primordial genes, leading to the hypothesis that novel genes could arise from new combinations of exons, and thereby generate novel proteins and functions.

The introns-late hypothesis assumes that mechanisms for splicing introns out were not present in the common ancestor of life but that these mechanisms arose and spread within eukaryotes during their evolution. According to this hypothesis, introns could not have played a role in ancient gene and protein assembly. As is often the case with many “either/or” debates, the truth may be a combination of the two hypotheses (Tyshenko and Walker 1997). Both concepts may be correct; the introns in the triosephosphate isomerase genes of insects may be the result of the insertion of a transposable element relatively recently, whereas other introns may have been present for a very long time (Logsdon et al. 1995). DeSouza et al. (1998) suggest that 30–40% of the present-day intron positions in ancient genes correspond to the introns originally present in the ancestral gene. The rest of the intron positions are due to the movement or addition of introns over evolutionary time. Thus, introns may be both early and late, with ≈65% of the introns having been added to preexisting genes.

Introns are present in low frequency in prokaryotes and are rare in some eukaryotes, such as yeast. The number of introns and their lengths vary from species to species and from gene to gene. Some genes in eukaryotic organisms lack introns, whereas other genes in the same species may have as many as 50 introns. Introns may interrupt a coding region, or they may occur in the untranslated regions of the gene. Some eukaryotic genes contain numerous and very large introns, but introns typically range from 100 to 10,000 bp in length. A few introns contain genes themselves; how the genes got into the middle of an intron of another gene remains a mystery.

The presence of introns within many eukaryotic protein-coding genes requires that an additional step take place between transcription and translation in eukaryotes. Thus, when the DNA is transcribed into RNA, the initial RNA tran-script is not mRNA. The synthesized RNA is a precursor to mRNA and is called

pre-mRNA. The pre-mRNA must undergo processing (splicing) in the nucleus to remove the introns before it can travel to the cytoplasm for translation into proteins. This process is described in Chapter 3, but first DNA replication is reviewed.
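The removal of introns from a pre-mRNA can be pictured as joining the exon segments in order. The sketch below is illustrative only: the sequence and the exon coordinates are invented, and `splice` is a hypothetical helper that ignores the biochemical machinery of splicing.

```python
def splice(pre_mrna, exons):
    """Join exon segments given as ordered, 0-based, half-open (start, end) intervals."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Invented pre-mRNA: exon, intron, exon, intron, exon
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUCGGA" + "GUACGAAAACAG" + "UGGUAA"
exons = [(0, 6), (18, 24), (36, 42)]

mrna = splice(pre_mrna, exons)
print(mrna)  # AUGGCCUUCGGAUGGUAA

# Everything between consecutive exons is intron sequence.
introns = [pre_mrna[a_end:b_start] for (_, a_end), (b_start, _) in zip(exons, exons[1:])]
print(len(introns))  # 2
```

Three exons are separated by two introns, so the count of introns removed is always one fewer than the number of exons joined.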

1.13 Efficient DNA Replication is Essential

Every living organism must make a copy of its genes in each cell each time the cell divides. Such replication ideally is both rapid and accurate. If not, the organism’s survival and integrity are jeopardized. Even a very small error rate of 0.001% (one mistake/100,000 nt) can lead to detrimental changes or mutations. Although many mutations are detrimental, many apparently are neutral or nearly so, and a few are beneficial. The intrinsic structure of DNA helps to ensure that replication is accurate most of the time.
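To see why even a 0.001% error rate matters, the expected number of mistakes can be scaled to the amount of DNA copied. This is a back-of-the-envelope sketch; the 10^8-nt length is a hypothetical round figure chosen for illustration, not a value taken from the text.

```python
def expected_errors(error_rate, nucleotides_copied):
    """Expected number of misincorporated nucleotides for one round of copying."""
    return error_rate * nucleotides_copied


ERROR_RATE = 0.001 / 100          # 0.001% expressed as a fraction (1 in 100,000)
HYPOTHETICAL_LENGTH_NT = 10**8    # assumption: an illustrative stretch of DNA

per_100k = expected_errors(ERROR_RATE, 100_000)
per_hypothetical = expected_errors(ERROR_RATE, HYPOTHETICAL_LENGTH_NT)

print(round(per_100k, 6))       # 1.0
print(round(per_hypothetical))  # 1000
```

At this rate, every round of copying such a stretch would introduce on the order of a thousand errors.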

1.14 DNA Replication is Semiconservative

DNA replication is semiconservative, i.e., the daughter molecules each contain one polynucleotide derived from the original DNA molecule and one newly synthesized strand (Figure 1.10, Box 1.1). Semiconservative DNA replication requires that the hydrogen bonds holding the two strands together be broken so that synthesis of new complementary strands can occur. Because each new strand is copied from an intact parental template, semiconservative replication helps keep replication error rates very low.
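Semiconservative replication can be sketched with strings: each parental strand serves as the template for a new complementary strand, so each daughter duplex pairs one old strand with one new one. The function names and the short sequence are invented for illustration, and both strands are written 5′ to 3′.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_strand(strand_5to3):
    """Return the complementary strand, also written 5' to 3' (reverse complement)."""
    return strand_5to3.translate(COMPLEMENT)[::-1]


def replicate(duplex):
    """Return two daughter duplexes, each as (old strand, newly synthesized strand)."""
    old_a, old_b = duplex
    return (old_a, complement_strand(old_a)), (old_b, complement_strand(old_b))


parent = ("ATGGCATTC", complement_strand("ATGGCATTC"))
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)  # ('ATGGCATTC', 'GAATGCCAT')
print(daughter_2)  # ('GAATGCCAT', 'ATGGCATTC')
```

Both daughters carry the same information as the parent, and each retains exactly one parental strand.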

1.15 Replication Begins at Replication Origins

During the replication of long DNA molecules, only a limited region of the DNA molecule is in an unpaired form at any one time. Replication occurs after the two strands separate; separation involves breaking the weak hydrogen bonds holding the bases of the opposite strands together. The separation of the two strands starts at multiple specific positions in the chromosome called origins of replication and moves along the molecule. In Drosophila, for example, replication origins occur at thousands of sites throughout the genome (Eaton et al. 2011). Synthesis of the new complementary polynucleotides occurs as the double helix “unzips.” The region at which the base pairs of the parent molecule are broken and the new polynucleotides are synthesized is the replication fork (Figure 1.11).

The two strands of the parent DNA molecule are broken apart by enzymes called helicases. Once the helicase has separated the two strands, single-strand binding proteins attach to the single strands to prevent them from immediately reannealing to each other (Figure 1.11). This attachment makes it possible for DNA polymerase to synthesize new complementary DNA strands. DNA

polymerases have two properties that complicate DNA synthesis. First, DNA polymerase can synthesize only in the 5′ to 3′ direction; and, second, DNA polymerase cannot initiate the synthesis of new DNA strands without a primer.

1.16 DNA Replication Occurs Only in the 5′ to 3′ Direction

Because DNA polymerases can synthesize DNA only in the 5′ to 3′ direction, the template strands must be read in the 3′ to 5′ direction (Box 1.1). This process is straightforward for one of the DNA template strands, called the leading strand, and DNA synthesis can proceed in an uninterrupted manner the entire length of the leading strand. However, DNA synthesis cannot proceed uninterrupted on the other template strand, called the lagging strand (Figure 1.12). DNA synthesis on the lagging strand is discontinuous, occurring in short sections, and produces short fragments (100–200 nt in length) of DNA called Okazaki fragments, after their discoverer who identified them in 1968 (Ogawa and Okazaki 1980).
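Discontinuous synthesis on the lagging strand can be sketched as follows. The model is deliberately simplified and rests on assumptions: RNA primers are ignored, the 150-nt default is just the midpoint of the 100–200 nt range given above, and the function names are hypothetical. The lagging-strand template is written 5′ to 3′, so the fork exposes it from its 5′ end and each new fragment is the reverse complement of the stretch most recently exposed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def okazaki_fragments(lagging_template, fragment_len=150):
    """Return newly made fragments (each 5' to 3') in the order they are synthesized."""
    return [
        reverse_complement(lagging_template[i:i + fragment_len])
        for i in range(0, len(lagging_template), fragment_len)
    ]


def ligate(fragments):
    """Join fragments into one strand; the last fragment made lies at the 5' end."""
    return "".join(reversed(fragments))


template = "ATGGCATTCG"
fragments = okazaki_fragments(template, fragment_len=4)
print(fragments)          # ['CCAT', 'AATG', 'CG']
print(ligate(fragments))  # CGAATGCCAT
```

Once joined, the fragments give the same strand that continuous synthesis would have produced, namely the reverse complement of the whole template.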

Figure 1.10 DNA replication is semiconservative, meaning that each new DNA helix contains one old and one new complementary strand. DNA synthesis relies on complementary base pairing to replicate DNA accurately.

1.17 Replication of DNA Requires an RNA Primer

Another complication of DNA synthesis is that synthesis requires an RNA primer (Figure 1.13). Apparently, the first few (50–75) nucleotides attached to either the leading or lagging strands are not deoxyribonucleotides but rather ribonucleotides that are put in place by an RNA polymerase called primase. Once these ribonucleotides have been polymerized on the DNA template, the primase detaches, and DNA polymerase is able to synthesize DNA (Figure 1.14).

1.18 Ligation of Replicated DNA Fragments

After the Okazaki fragments (sequences complementary to the lagging strand of DNA) are produced, they must be joined together to produce a continuous strand (Figure 1.12). On the lagging strand, DNA polymerase III of Escherichia coli stops

Figure 1.11 During DNA replication, only part of the DNA molecule unzips to allow synthesis of new DNA strands. Helicases break the hydrogen bonds. In this example, replication begins at an origin of replication. Eukaryotes have many origins of replication along their chromosomes so that replication can occur rapidly. To keep the strands from reannealing at the replication forks where synthesis is occurring, single-strand binding proteins (SSBs) attach.

Figure 1.12 DNA replication occurs in a different manner on the two strands. A) The leading strand is continuously copied, with synthesis occurring in the 5′ to 3′ direction. B) Synthesis on the lagging strand is discontinuous because the DNA strands are antiparallel. Synthesis occurs in short segments (Okazaki fragments) because DNA polymerase can only synthesize DNA in the 5′ to 3′ direction.

Figure 1.13 A) DNA must be primed so that DNA polymerase is able to synthesize a complementary strand. B) A primer of ribonucleotides is attached to a strand by RNA polymerase. DNA polymerase can then attach deoxyribonucleotides (dNTPs) to the DNA template in a sequence that is determined by the template strand (complementary base pairing). DNA synthesis occurs in the 5′ to 3′ direction.

when it reaches the RNA primer at the 5′ end of the next Okazaki fragment. Then, DNA polymerase I of E. coli removes the ribonucleotides from the Okazaki fragment and replaces them with deoxyribonucleotides. When all the ribonucleotides have been replaced, DNA polymerase I continues replacing nucleotides for a short distance into the DNA region before it dissociates from the new double-helix molecule. The Okazaki fragments are then joined by DNA ligase, which catalyzes the formation of a phosphodiester bond between the neighboring nucleotides.

DNA replication also requires that the double helix be unwound, as well as unzipped. With approximately 10 bp per helical turn, there are ≈40,000 turns in 400 kb of DNA. This unwinding is accomplished with the aid of enzymes called DNA topoisomerases. DNA topoisomerases unwind a DNA molecule without rotating the helix by causing short-term breaks in the polynucleotide backbone just in front of the replication fork. The reverse reaction is performed by DNA topoisomerases so that DNA molecules can be coiled.

1.19 DNA Replication during Mitosis in Eukaryotes

The goal is to replicate the genome exactly once in each cell cycle and provide equivalent chromosomes to the daughter cells (Sclafani and Holzen 2007). The replication of prokaryotic and eukaryotic DNA is similar but differs in several aspects, the details of which are not fully resolved (Gavin et al. 1995, Huberman 1995, Baker and Bell 1998, Leipe et al. 1999, Sutton and Walker 2001). DNA replication takes place during the eukaryotic cell cycle before the condensed metaphase

Figure 1.14 The cell cycle of a eukaryotic cell with a generation time of ≈24 hours. DNA synthesis occurs during the S phase. During G1 and G2, no DNA synthesis occurs. Mitosis (M) occurs after G2.

chromosomes become visible in mitosis or meiosis. DNA replication occurs at a rate of up to 1000 nt/second (Dixon 2009).

The cell division cycle consists of five distinct phases, if interphase is included. Nondividing cells are in interphase, the longest phase. Some cells remain in interphase and never divide. However, if cell division occurs, the chromosomes must be replicated in a precise manner, and any errors in replication must be detected and, ideally, corrected. There are two gap phases or periods (G1 and G2), when the cell is carrying out its normal metabolic activities (Figure 1.14), separated by the S phase, when DNA replication or synthesis occurs. Mitosis (M) occurs subsequent to the G2 phase (also known as the premitotic phase). Mitosis occurs when highly condensed duplicated chromosomes separate and segregate into daughter cells, followed by cytokinesis, in which the cell membrane forms around each daughter cell. During the G2 phase, cells grow rapidly and proteins and RNAs are synthesized. Mitosis is divided into several phases (prophase, metaphase, anaphase, and telophase), as described in Chapter 3.

To reduce the amount of time required to replicate the very long DNA molecule in eukaryotic chromosomes, DNA replication is initiated at a series of replication origins ≈40 kb apart on the linear chromosome and proceeds in both directions (Figures 1.11, 1.12) (DePamphilis 1999). For example, replication in D. melanogaster occurs at a rate of ≈2600 nt pairs/minute at 24 °C. The largest chromosome in Drosophila is ≈8 × 10^7 nt long, so, with ≈8500 replication origins/chromosome, ≈0.25–0.5 hour is required to replicate this chromosome. If replication occurred from a single replication fork, rather than from multiple replication origins, replication of a single chromosome would require ≈15 days.
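The benefit of multiple origins can be explored with an idealized calculation. This sketch assumes evenly spaced origins that all fire at the same moment and forks that move at a constant rate, assumptions that are not stated in the text. It therefore gives only a rough lower bound and is not expected to reproduce the times quoted above exactly; origins that fire at different times would lengthen the real figure.

```python
def replication_time_minutes(chromosome_nt, fork_rate_nt_per_min, origins, bidirectional=True):
    """Idealized time to copy a chromosome when all forks run simultaneously."""
    forks = origins * (2 if bidirectional else 1)
    return chromosome_nt / (forks * fork_rate_nt_per_min)


CHROMOSOME_NT = 8e7   # largest Drosophila chromosome, from the text
FORK_RATE = 2600      # nt pairs per minute at 24 °C, from the text
ORIGINS = 8500        # replication origins per chromosome, from the text

single_fork_days = replication_time_minutes(
    CHROMOSOME_NT, FORK_RATE, origins=1, bidirectional=False
) / (60 * 24)
many_origins_min = replication_time_minutes(CHROMOSOME_NT, FORK_RATE, ORIGINS)

print(f"single fork: about {single_fork_days:.0f} days")
print(f"{ORIGINS} origins: about {many_origins_min:.1f} minutes (idealized lower bound)")
```

Under these assumptions a single fork needs weeks, whereas thousands of origins reduce the time to minutes, a difference of several orders of magnitude.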

Origins of replication of eukaryotic chromosomes are recognized by a protein complex called the origin recognition complex (ORC) that is essential for initiation of DNA replication at yeast origins. The protein complex opens the DNA, stabilizes the single-stranded DNA that is formed, and allows polymerases to copy the DNA. The ORC seems to recruit other proteins (including DNA helicases) to the origin of replication, leading to the start of replication. Proteins related to yeast ORC proteins have been identified in Drosophila and other eukaryotes (Gavin et al. 1995).

Eukaryotes contain several different DNA polymerases (Hubscher et al. 2002). One DNA polymerase is located in the mitochondria and is responsible for replication of mitochondrial DNA. The other DNA polymerases are in the nucleus and are involved in DNA replication, repair, and recombination. Polymerase α complexes with primase (the RNA polymerase that primes DNA synthesis) and seems to function with primase to synthesize short RNA–DNA fragments. Two other polymerases then synthesize the leading and lagging strands, extending

the RNA–DNA primers initially synthesized by the polymerase α–primase complex. A DNA polymerase fills the gaps between the Okazaki fragments after the primers are removed (Sutton and Walker 2001, Hubscher et al. 2002).

Proteins (called sliding-clamp proteins and clamp-loading proteins) act at the eukaryote replication fork to load the polymerase onto the primer and maintain its stable association with the template. The clamp-loading proteins (called replication factor C) recognize and bind DNA at the junction between the primer and template. The sliding-clamp proteins (proliferating cell nuclear antigen) in eukaryotes bind adjacent to the clamp-loading proteins, forming a ring around the template DNA. The clamp proteins then load the DNA polymerase onto the DNA at the primer–template junction.

The ring formed by the sliding clamp maintains the association of the polymerase with its template as replication progresses, allowing the uninterrupted synthesis of long DNA molecules. Helicases unwind the template DNA ahead of the replication fork. Single-stranded DNA-binding proteins (eukaryotic replication factor A) then stabilize the unwound template DNA so that the single-stranded DNA can be replicated. The enzymes involved in DNA replication, in combination with their accessory proteins, synthesize both leading and lagging strands of DNA simultaneously at the replication fork. The idea that DNA polymerases track like locomotives along the DNA template during DNA replication is pervasive and is probably based on the misperception that the polymerase is smaller than the DNA (Cook 1999). We now know that the DNA polymerase–protein complexes involved in DNA replication can be much larger than the DNA template.

An alternative model to the “movement” of polymerase along the DNA template has been proposed in which the fixed polymerase complexes “reel in their DNA templates” as they extrude newly made DNA in replication “foci” or replication factories within the cell. This “fixed” model assumes that the DNA polymerase complex is fixed and that the DNA rotates around it. The model offers a simple solution to the potential problem of untangling DNA strands that would twine around each other if the DNA polymerase moved (Cook 1999).

Some DNA polymerases in eukaryotes have 3′ to 5′ exonuclease activity in addition to their polymerase activity, meaning that they can excise a misincorporated nucleotide (proofreading) during DNA replication. DNA mismatch correction further minimizes replication errors by surveying newly synthesized DNA strands. Furthermore, accessory factors such as DNA helicases apparently improve accuracy during DNA elongation, possibly due to resolution of stalled replication forks. Despite all these precautions, occasional misincorporated nucleotides, deletions, or insertions may remain, resulting in mutations.

1.20 Telomeres at the End: A Solution to the Loss of DNA during Replication

Because DNA synthesis occurs exclusively in the 5′ to 3′ direction and initiation requires a short RNA primer, the extreme 5′ end of a linear DNA strand consists of an RNA primer (Figure 1.13B). If the RNA primer were not replaced by deoxyribonucleotides, the chromosome would gradually decrease in length after each replication during mitosis because the segment with the RNA primer would not be copied into DNA. Shortening of the chromosome by 50–200 bp at the 3′ end of the lagging strand in each cell cycle could seriously affect gene function over time. However, linear chromosomes normally are stable because they have a specialized structure at their ends called a telomere (Zakian 1989). Without telomeres, chromosomes are sticky and could fuse with other chromosomes, resulting in growth arrest and cell death (Verdun and Karlseder 2007). The discovery of telomeres in 1978 resulted in a 2009 Nobel Prize in Physiology or Medicine to Elizabeth Blackburn, Carol Greider, and Jack Szostak.

Telomeres contain a series of species-specific repeated nucleotide sequences that are added to the ends of eukaryotic linear chromosomes by an enzyme called telomerase. In many arthropods, the highly repeated telomeric sequence has the motif TTAGG, and the telomeres may consist of 4–6 kb of sequence. Vitkova et al. (2005) found these sequences in Diplura, Collembola, crustaceans, myriapods, pycnogonids, and most chelicerates (except spiders). Telomerase is a reverse transcriptase, meaning that it can synthesize DNA from an RNA template. A few copies of a short repetitive sequence (called the telomere sequence) are required to prime the telomerase to add additional copies to form a telomere. There are also longer, moderately repetitive nucleotide sequences subterminal to the telomere sequences (subtelomeric region). Although the telomeres are maintained by telomerase during cell divisions, telomeres do shorten through time and, in vertebrates, shortened telomeres are correlated with aging (Aubert and Lansdorp 2008).
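The tandem TTAGG repeats at a chromosome end can be counted, and their extension mimicked, with a few lines of code. This is an illustrative sketch: the subterminal sequence is invented, the function names are hypothetical, and `extend_telomere` simply appends repeats rather than modeling the enzyme.

```python
MOTIF = "TTAGG"  # arthropod telomeric repeat named in the text


def count_terminal_repeats(seq, motif=MOTIF):
    """Count consecutive copies of the motif at the 3' end of seq."""
    n = 0
    while seq.endswith(motif * (n + 1)):
        n += 1
    return n


def extend_telomere(seq, copies, motif=MOTIF):
    """Append further motif copies, as telomerase does once primed by existing repeats."""
    return seq + motif * copies


chromosome_end = "ACGTACGT" + MOTIF * 3   # invented subterminal sequence plus 3 repeats
print(count_terminal_repeats(chromosome_end))                      # 3
print(count_terminal_repeats(extend_telomere(chromosome_end, 2)))  # 5

# A 4-6 kb telomere made only of 5-nt repeats would hold roughly this many copies.
print(4000 // len(MOTIF), 6000 // len(MOTIF))  # 800 1200
```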

1.21 DNA Replication Fidelity and DNA Repair

Faithful maintenance of the genome is crucial to both the individual and the species (Lindahl and Wood 1999). When DNA is replicated inaccurately or is damaged by endogenous factors (such as water or oxygen) or exogenous factors (such as UV light, chemicals, and irradiation), death can ensue. Thus, there has been strong selection for multiple mechanisms to repair damaged DNA. Generally, the cell has two classes of mechanisms with which to repair DNA: 1) direct repair and 2) excision repair, or removal of the damaged bases followed by their replacement with newly synthesized DNA.

Two types of damage, DNA damage caused by UV light and modifications of G by the addition of methyl or ethyl groups to the O6 position of the purine ring, are repaired directly (direct repair) (Cooper 2000). The most common repair mechanism in cells involves removal (excision) of damaged components of the DNA. Excision repair can be divided into three systems: base-excision repair, nucleotide-excision repair, and mismatch repair. Base-excision repair involves removal of only the damaged base from the DNA strand. Nucleotide-excision repair operates mainly on damage caused by environmental mutagens and involves DNA synthesis and ligation to replace an excised oligonucleotide (Lindahl and Wood 1999). Mismatched bases that are incorporated during replication occasionally escape the proofreading activity of DNA polymerase; these are corrected by the mismatch repair system. If the DNA is not repaired before replication by the above-mentioned mechanisms, a postreplication repair system comes into play. Postreplication repair (recombinational repair) can repair several types of damage to DNA, including double-strand breaks introduced into DNA by irradiation.

1.22 Mutations in the Genome

Changes in the genetic material (genotype) of an organism occur if DNA repair is not successful. Such changes are mutations. Mutations can occur in many places: within an exon, within an intron, or in the chromosomal regions (intergenic regions) located between the genes. If a mutation occurs in an intergenic region, it may be silent and have no detectable effect on the cell or individual. If a mutation occurs in an exon, it may alter the protein product and cause a change in the organism’s phenotype (or appearance). A mutation in an intron may not have an effect on the phenotype, but it could have an effect if there are regulatory elements in the intron that are important for proper gene function.

An organism with the “normal” appearance (phenotype) for that species is called the “wild type,” whereas an organism with a phenotype that has been changed is a mutant. If the mutation is dominant (meaning that only a single copy is required to cause the change in phenotype), the name of the gene is capitalized. If the mutation is recessive (meaning that both copies of the gene must carry the mutation for the phenotype to change), the name is not capitalized.

A mutagen is a chemical or physical agent that causes changes in bases. Mutagens include UV radiation, X-irradiation, ethyl methane sulfonate, base analogs such as 5-bromouracil, acridine dyes, and nitrous acid. Mutations occur spontaneously approximately once in every 10^8 bp/cell division, or they can be induced by the experimenter.

Mutations affect the DNA sequence, gene organization, gene regulation, or gene function (Table 1.2). A point mutation is the replacement of one nucleotide by another (substitution). A substitution can be either a transition or a transversion. Transitions involve changes between A and G (purines) or T and C

Table 1.2: Mutations Affect DNA Sequence, Gene Function, Gene Regulation, and the Phenotype of the Organism.

Changes in DNA sequence

Point mutation: Replacement of one nucleotide by another.

Transition: A point mutation in which a purine is changed to a purine (A <--> G) or a pyrimidine to a pyrimidine (T <--> C).

Transversion: A point mutation in which the change is purine to pyrimidine, or vice versa (A or G <--> T or C).

Changes in the gene

Silent mutation: Sequence changes in an intergenic region usually result in no phenotypic changes. Changes in a gene can be silent if a point mutation occurs in the third nucleotide of a codon that, because of the degeneracy of the code, does not alter the amino acid.

Nonsense mutation: A point mutation that alters a codon specifying an amino acid into a termination codon, which will prematurely terminate the polypeptide produced, changing the activity of the protein and altering the phenotype.

Frameshift mutation: Insertions or deletions that are not in multiples of three can cause changes in the amino acids downstream from the mutation, resulting in a mutant phenotype.

Insertions or deletions (indels): When nucleotides are inserted or deleted, the resulting mutations can be benign to lethal, depending on where these modifications occur.

Nonsynonymous or missense: Changes in codons that alter the amino acid specified.

Changes in gene regulation

Mutations in regulatory genes alter the organism’s ability to control expression of a gene normally subject to regulation.

Changes in the organism

Lethal mutation: Mutations that alter the function of an essential gene product so that the organism cannot survive.

Conditional lethal: Individuals with these mutations can survive under a particular set of conditions, such as a specific temperature range, but die if reared outside these conditions.

Back mutation: Organisms sometimes revert to the wild-type phenotype after a second mutation occurs that restores the original nucleotide sequence of the mutated gene.

Reversion: Mutations can be corrected by restoring the original phenotype, but not the original DNA sequence, in the mutated gene by altering a second site within the gene.

Suppression: The effects of a mutation can be altered by a new mutation that occurs in a different gene.

(pyrimidines), whereas transversions involve changes between a purine and a pyrimidine.
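The distinction between transitions and transversions reduces to asking whether the two bases belong to the same chemical class. A minimal sketch follows, with a hypothetical function name.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("no substitution: both bases are the same")
    if not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G, or T")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))  # transition
print(classify_substitution("T", "C"))  # transition
print(classify_substitution("A", "T"))  # transversion
```

Each base has one possible transition and two possible transversions.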

Insertions and deletions (collectively called indels) involve the addition or deletion of one or more nucleotides. An inversion is the excision of a part of the DNA molecule followed by its reinsertion into the same position but with a reversed orientation. An inversion in Drosophila buzzatii was caused by a transposable element called Galileo, and this may be the mechanism by which many inversions occur (Caceres et al. 1999).
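On a single strand written 5′ to 3′, reinserting a double-stranded segment in reversed orientation means that the strand now reads the reverse complement of that segment. The sketch below illustrates this with an invented sequence and a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(seq, start, end):
    """Excise seq[start:end] and reinsert it at the same position in reversed orientation."""
    segment = seq[start:end]
    return seq[:start] + segment.translate(COMPLEMENT)[::-1] + seq[end:]


original = "AAATGCCTTT"
inverted = invert_segment(original, 3, 7)
print(inverted)                        # AAAGGCATTT
print(invert_segment(inverted, 3, 7))  # AAATGCCTTT (a second inversion restores it)
```

The sequence length is unchanged, which distinguishes an inversion from an insertion or deletion.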

Some mutations are lethal, whereas others have an effect on the organism that can range from phenotypically undetectable (silent) to lethal only under certain circumstances (conditional lethal). For example, many mutations are temperature-sensitive, and the organism can survive if reared within one temperature range but will die if reared at higher temperatures.

A silent mutation may occur if the third base in a codon is altered but, because the genetic code is degenerate, there is no change in the amino acid specified. These mutations are also called synonymous mutations because the amino-acid sequence, and therefore the structure and function of the protein, is unchanged.

Some changes in codons alter the amino acid specified, and they are called nonsynonymous or missense mutations. Most point mutations that occur at the first or second nucleotide positions of a codon will be missense, as will a few third-position changes. A polypeptide with an amino-acid change may result in a changed phenotype, depending on the precise role the altered amino acid plays in the structure or function of the polypeptide. Most proteins can tolerate some changes in their amino-acid sequence if the alteration does not change a segment of the polypeptide essential for its structure or function.

Nonsense mutations are point mutations that change a codon specifying an amino acid into a termination codon, so that the gene codes for a polypeptide that is terminated prematurely. In many cases, essential amino acids will be missing, and the protein’s activity will be altered, resulting in a mutant phenotype.

Frameshift mutations result if additions or deletions of base pairs occur that are not in a multiple of three. The polypeptide produced will likely have a completely new set of amino acids downstream of the frameshift. Frameshifts usually produce mutant phenotypes.
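The effect of a frameshift can be illustrated by translating a short sequence before and after a deletion. The sketch below is a hedged example, not a model of any real gene: the sequence `wild_type` is invented for illustration, the standard genetic code is assumed, and `translate` is a hypothetical helper that reads coding-strand DNA in frame from its first base.

```python
from itertools import product

# Standard genetic code (codons in T, C, A, G order; "*" = termination codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate codon by codon from the first base, stopping at a termination codon.

    Trailing bases that do not fill a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented example sequence: ATG GCT GAA AAG CTG TTC TAA
wild_type = "ATGGCTGAAAAGCTGTTCTAA"

# Deleting one base (not a multiple of three) shifts the reading frame.
frameshift = wild_type[:3] + wild_type[4:]

# Deleting three bases removes one codon but keeps the reading frame.
in_frame_deletion = wild_type[:3] + wild_type[6:]

print(translate(wild_type))          # MAEKLF
print(translate(frameshift))         # MLKSCS
print(translate(in_frame_deletion))  # MEKLF
```

In this example the single-base deletion changes every amino acid after the first and also removes the original termination codon from the reading frame, whereas the three-base deletion removes only one amino acid.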

Occasionally, back mutations may occur to reverse a point mutation. Reversions sometimes occur when the original phenotype is restored by a new change in the nucleotide sequence. In reversions, the original mutation is not restored to its previous unmutated form; rather, the second mutation restores the code for the original amino acid because the code is degenerate. Regulatory mutations are mutations that affect the ability to control expression of a gene.

The movement of a transposable element (TE) into a gene can also create mutations. TEs are segments of foreign DNA that can move into genomes. When a TE moves into a gene, as shown in Figure 1.15, the gene will be inactivated or the gene product will be altered, producing a visible phenotype (mutation). TEs can cause other types of mutations, including inversions (Caceres et al. 1999). TEs are found in most eukaryotic organisms, and there are many types. TEs are important for understanding genome evolution and for genetic engineering, and they are discussed further in Chapters 3, 4, 9, and 14.

1.23 Common Genetic Terminology

A wild-type gene is normally identified only after a mutation has disrupted the phenotype of an organism. Mutations commonly are given a descriptive name, such as “white eyes.” The name of the gene usually is italicized (white) and is abbreviated using one, two, or three italicized letters (such as w). If the mutation is dominant, the name and abbreviation are capitalized (White and W); they are in lowercase (white) if the mutation is recessive. Individuals that are homozygous for the recessive w mutation are w/w and have white eyes. Heterozygous flies are w/w+, with the wild-type allele designated as w+, and their appearance (phenotype) should be wild type. The gene product is called the white product or white protein and is not italicized. The term for the gene product may be abbreviated as the w protein. Sometimes the protein product is designated by the gene name but capitalized to distinguish it from the gene (WHITE).

Figure 1.15 Movement (transposition) of transposable elements into chromosomes can result in mutations that inactivate genes or alter their expression. Transposable elements are also known as jumping genes.


1.24 Independent Assortment and Recombination during Sexual Reproduction

For organisms to survive and evolve with changing environmental conditions, they need to be able to generate genetic variability. Mutations are one source of genetic variability and thus are not always undesirable. Another source of genetic variability is the result of sexual reproduction.

In sexually reproducing organisms, the progeny produced by parents that have different versions of genes (different alleles, AA or aa) will have a different combination of alleles (Aa). This shuffling of the genetic information during sexual reproduction is due to the independent assortment of chromosomes into the gametes during meiosis. Thus, an individual of genotype Aa Bb, in which the genes A and B are located on different chromosomes, will produce equal numbers of four different types of gametes: AB, Ab, aB, or ab.
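The gamete types expected from independent assortment can be enumerated directly. The following Python sketch assumes that every gene listed is on a different chromosome and that each allele of a pair is equally likely to enter a gamete; the function name `gamete_types` and the list-of-pairs representation of a genotype are hypothetical choices made for this illustration.

```python
from collections import Counter
from itertools import product


def gamete_types(genotype):
    """Count the gamete types produced by independent assortment.

    `genotype` holds one (allele, allele) pair per gene, and each gene is
    assumed to lie on a different chromosome, e.g. [("A", "a"), ("B", "b")].
    """
    # One allele is drawn from each pair; every combination is equally likely.
    return Counter("".join(alleles) for alleles in product(*genotype))


# Genotype Aa Bb: four gamete types in equal numbers
aa_bb = gamete_types([("A", "a"), ("B", "b")])
print(aa_bb)

# Genotype AA Bb: only two distinct gamete types, AB and Ab
print(gamete_types([("A", "A"), ("B", "b")]))
```

Adding a third unlinked heterozygous gene to the list doubles the number of gamete types to eight, which shows how quickly independent assortment alone generates variability.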

Crossing over also leads to recombination between DNA molecules. Crossing over occurs between homologous chromosomes during the production of eggs or sperm in meiosis I, and results in an exchange of genetic material. Crossing over allows new combinations of different genes that are linked (located on the same chromosome). Thus, if a parent has one chromosome carrying the A and B alleles, and the homologous chromosome carries the a and b alleles, a physical exchange between the chromatids during meiosis I can lead to gametes that have the following combinations: A and B, A and b, a and B, and a and b. Nonhomologous recombination, crossing over between DNA lacking sequence homology, also may occur. Meiosis and mitosis are described further in Chapter 3.
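A single crossover between two linked genes can be sketched as an exchange of segments between two strings. This is a deliberately simplified illustration: each chromatid is represented as a string of alleles in the same gene order, the crossover is assumed to fall between the two genes, and the name `single_crossover` is hypothetical.

```python
def single_crossover(chromatid_1, chromatid_2, breakpoint):
    """Exchange the segments beyond `breakpoint` between two chromatids.

    Each chromatid is a string of alleles in the same gene order.
    """
    if len(chromatid_1) != len(chromatid_2):
        raise ValueError("chromatids must carry the same number of genes")
    recombinant_1 = chromatid_1[:breakpoint] + chromatid_2[breakpoint:]
    recombinant_2 = chromatid_2[:breakpoint] + chromatid_1[breakpoint:]
    return recombinant_1, recombinant_2


# Parental chromosomes: A and B linked on one, a and b on the homolog.
parental = ("AB", "ab")

# A crossover between the two genes yields the recombinant chromatids.
recombinant = single_crossover(*parental, breakpoint=1)

# Chromatids that did not cross over remain parental, so all four
# combinations can appear among the gametes.
gametes = sorted(set(parental) | set(recombinant))
print(recombinant)
print(gametes)
```

Unlike independent assortment, the four types are not necessarily produced in equal numbers, because recombinant gametes arise only when a crossover occurs between the two genes.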

General References

Brown, T.A., (2011). Introduction to Genetics: A Molecular Approach. Garland Science, New York.

Carlson, E.A., (2004). Mendel’s Legacy. The Origin of Classical Genetics. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Cooper, G.M., and Hausman, R.E., (2009). The Cell. A Molecular Approach, 5th ed. ASM Press, Washington, DC.

Deamer, D., and Szostak, J.W., (2010). The Origins of Life. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

DePamphilis, M.L., and Bell, S.D., (2010). Genome Duplication: Concepts, Mechanisms, Evolution, and Disease. Garland Science, New York.

Friedberg, E.C., (1997). Correcting the Blueprint of Life: An Historical Account of the Discovery of DNA Repair Mechanisms. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Haga, S.B., (2006). Teaching resources for genetics. Nat. Rev. Genet. 7: 223–229.

Hartl, D.L., and Jones, E.W., (1998). Genetics: Principles and Analysis. Jones and Bartlett, Sudbury, MA.

Micklos, D.A., and Freyer, G., (2010). DNA Science: A First Course, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.


Perdew, G.H., Van den Heuvel, J.P., and Peters, J.M., (2006). Regulation of Gene Expression. Humana Press, Totowa, NJ.

Russell, P.J., (2010). iGenetics. A Molecular Approach, 3rd ed. Benjamin Cummings, San Francisco, CA.

Watson, J.D., Baker, T.A., Bell, S.P., Gann, A., Levine, M., and Losick, R., (2008). Molecular Biology of the Gene, 6th Edition. Benjamin Cummings, Menlo Park, CA.

References Cited

Adams, M.D., Celniker, S.E., Holt, R.A., Evans, C.A., Gocayne, J.D., and Amanatides, P.G., et al. (2000). The genome sequence of Drosophila melanogaster. Science 287: 2185–2195.

Atkins, J.F., Gesteland, R.F., and Cech, T.R., (2011). RNA Worlds: From Life’s Origins to Diversity in Gene Regulation. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

Aubert, G., and Lansdorp, P.M., (2008). Telomeres and aging. Physiol. Rev. 86: 567–579.

Avery, O.T., Macleod, C.M., and McCarty, M., (1944). Studies on the chemical nature of the substance inducing transformation of pneumococcal types. J. Exp. Med. 79: 137–158.

Baker, T.A., and Bell, S.P., (1998). Polymerases and the replisome: machines within machines. Cell 92: 295–305.

Brosius, J. (2009). The fragmented gene. In: Natural Genetic Engineering and Natural Genome Editing. G. Witzany, Ed. Ann. N. Y. Acad. Sci. 1178: 186–193.

Bustamante, C., Bryant, Z., and Smith, S.B., (2003). Ten years of tension: single-molecule DNA mechanics. Nature 421: 423–426.

Caceres, M., Ranz, J.M., Barbadilla, A., Long, M., and Ruiz, A., (1999). Generation of a widespread Drosophila inversion by a transposable element. Science 285: 415–418.

Collins, L.J., and Penny, D., (2009). The RNA infrastructure: dark matter of the eukaryotic cell? Trends Genet. 25: 120–128.

Cook, P.R., (1999). The organization of replication and transcription. Science 284: 1790–1795.

Cooper, G.M., (2000). The Cell: A Molecular Approach, 2nd ed. ASM Press, Washington, DC.

Crick, F.H.C., (1958). On protein synthesis. Symp. Soc. Exp. Biol. XII: 139–163.

Crick, F.H.C., Barnett, L., Brenner, S., and Watts-Tobin, R.J., (1961). General nature of the genetic code for proteins. Nature 4809: 1227–1232.

Darnell, J.E., (2011). RNA: Life’s Indispensable Molecule. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

DePamphilis, M.L., (1999). Replication origins in the metazoan chromosomes: fact or fiction? Bioessays 21: 5–16.

DeSouza, S.J., Long, M., Klein, R.J., Roy, S., Lin, S., and Gilbert, W., (1998). Toward a resolution of the introns early/late debate: only phase zero introns are correlated with the structure of ancient proteins. Proc. Natl. Acad. Sci. USA 95: 5094–5099.

DiGiulio, M., (1997). On the RNA world: evidence in favor of an early ribonucleopeptide world. J. Mol. Evol. 45: 571–578.

Dixon, N.E., (2009). Prime-time looping. Nature 462: 854–856.

Eaton, M.L., Prinz, J.A., MacAlpine, H.K., Tretyakov, G., Kherchenko, P.V., and MacAlpine, D.M., (2011). Chromatin signatures of the Drosophila replication program. Genome Res. 21: 164–174.

Eddy, S.R., (2001). Non-coding RNA genes and the modern RNA world. Nat. Rev. Genet. 2: 919–929.

Francino, M.P., (2005). An adaptive radiation model for the origin of new gene functions. Nat. Genet. 37: 573–577.

Gavin, K.A., Hidaka, M., and Stillman, B., (1995). Conserved initiator proteins in eukaryotes. Science 270: 1667–1671.


Gesteland, R.F., Cech, T.R., Atkins, J.F. (Eds.), 2006. Cold Spring Harbor Monograph Series 43, 3rd ed. Cold Spring Harbor Laboratory Press. Cold Spring Harbor, NY.

Gilbert, W., DeSouza, S.J., and Long, M., (1997). Origin of genes. Proc. Natl. Acad. Sci. USA 94: 7698–7703.

Griffiths, F., (1928). The significance of pneumococcal types. J. Hyg. 27: 113–159.

Herbert, A., (1996). RNA editing, introns and evolution. Trends Genet. 12: 6–8.

Hershey, A., and Chase, M., (1952). Independent functions of viral protein and nucleic acid in growth of bacteriophage. J. Gen. Physiol. 36: 39–56.

Huberman, J.A., (1995). Prokaryotic and eukaryotic replicons. Cell 82: 535–542.

Hubscher, U., Maga, G., and Spadari, S., (2002). Eukaryotic DNA polymerases. Annu. Rev. Biochem. 71: 133–163.

Jeffares, D.C., Poole, A.M., and Penny, D., (1998). Relics from the RNA world. J. Mol. Evol. 46: 18–36.

Johnston, W.K., Unrau, P.J., Lawrence, M.S., Glasner, M.E., and Bartel, D.P., (2001). RNA-catalyzed RNA polymerization: accurate and general RNA-templated primer extension. Science 292: 1319–1325.

Knight, R.D., Landweber, L.F., and Yarus, M., (2001). How mitochondria redefine the code. J. Mol. Evol. 53: 299–313.

Lang, F., Gray, M.W., and Burger, G., (1999). Mitochondrial genome evolution and the origin of eukaryotes. Annu. Rev. Genet. 22: 351–397.

Leipe, D.D., Aravind, L., and Koonin, E.V., (1999). Did DNA replication evolve twice independently? Nucleic Acids Res. 27: 3389–3401.

Lindahl, T., and Wood, R.D., (1999). Quality control by DNA repair. Science 286: 1897–1905.

Logsdon Jr., J.M., Tyshenko, M.G., Dixon, C., Jafari, J.D., Walker, V.K., and Palmer, J.D., (1995). Seven newly discovered intron positions in the triose-phosphate isomerase gene: evidence for the introns-late theory. Proc. Natl. Acad. Sci. USA 92: 8507–8511.

Maienschein, J., (1992). Gene: historical perspectives. In: Fox-Keller, E., Lloyd, E.A. (Eds.), Keywords in Evolutionary Biology. Harvard University Press, Cambridge, MA, pp. 122–127.

Mattick, J.S., (2009). Deconstructing the dogma: a new view of the evolution and genetic programming of complex organisms. Natural Genetic Engineering and Natural Genome Editing. Ann. N. Y. Acad. Sci. 1178: 29–46.

Muller, H.J., (1947). The gene. Proc. R. Soc. Lond. B 134: 1–37.

Nelkin, D., (2001). Molecular metaphors: the gene in popular discourse. Nat. Rev. Genet. 2: 555–559.

Nesbo, C.L., Boucher, Y., and Doolittle, W.F., (2001). Defining the core of nontransferable prokaryotic genes: the euryarchaeal core. J. Mol. Evol. 53: 340–350.

Ogawa, T., and Okazaki, T., (1980). Discontinuous DNA replication. Annu. Rev. Biochem. 49: 421–457.

Pearson, H., (2006). What is a gene? Nature 441: 399–401.

Pesole, G., (2008). What is a gene? An updated operational definition. Gene 417: 1–4.

Poole, A.M., Jeffares, D.C., and Penny, D., (1998). The path from the RNA world. J. Mol. Evol. 46: 1–17.

Prusiner, S.B., and Scott, M.R., (1997). Genetics of prions. Annu. Rev. Genet. 31: 139–175.

Richter, J.D., and Treisman, J.E., (2011). Not just the messenger: RNA takes control. Curr. Opin. Gen. Dev. 21: 363–365.

Sclafani, R.A., and Holzen, T.M., (2007). Cell cycle regulation of DNA replication. Annu. Rev. Genet. 41: 237–280.

Shapiro, J.A., (2009). Revisiting the central dogma in the 21st century. In: Natural Genetic Engineering and Natural Genome Editing. G. Witzany, Ed. Ann. N. Y. Acad. Sci. 1178: 6–28.

Sharp, P.A., (2009). The centrality of RNA. Cell 126: 577–580.


Sutton, M.D., and Walker, G.C., (2001). Managing DNA polymerases: coordinating DNA replication, DNA repair, and DNA recombination. Proc. Natl. Acad. Sci. USA 98: 8342–8349.

Trotman, C.N.A., (1998). Introns-early: slipping lately? Trends Genet. 14: 132–134.

Tuck, A.C., and Tollervey, D., (2011). RNA in pieces. Trends Genet. 27: 422–432.

Tuite, M.F., (2000). Sowing the protein seeds of prion propagation. Science 289: 556–557.

Tyshenko, M.G., and Walker, V.K., (1997). Towards a reconciliation of the introns early or late views: triosephosphate isomerase genes from insects. Biochim. Biophys. Acta 1353: 131–136.

Verdun, R.E., and Karlseder, J., (2007). Replication and protection of telomeres. Nature 447: 924–931.

Vitkova, M., Kral, J., Traut, W., Zrzavy, J., and Marec, F., (2005). The evolutionary origin of insect telomeric repeats, (TTAGG)n. Chromosome Res. 13: 145–156.

Watson, J.D., and Crick, F.H.C., (1953). Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature 171: 737–738.

Zakian, O.A., (1989). Structure and function of the telomeres. Annu. Rev. Genet. 23: 579–604.