
Molecular Genetics and Genomics Department of Biology/Faculty of Sciences/LU

Dr. Fahd Nasr-All rights reserved


Anatomy of genomes in Eukaryotes and Prokaryotes

I. Opening remark

Eukaryotic genomes range in size from about 10 Mb to over 100,000 Mb. In general, the size of the genome is broadly consistent with the complexity of the organism. The eukaryotic genome is divided into two distinct components: the nuclear genome and the much smaller mitochondrial genome (photosynthetic organisms have an additional small component, which is confined within the chloroplasts). The nuclear genome of eukaryotes is, without exception, split into distinct molecules called chromosomes, which are best seen during cell division, when they adopt a highly condensed state. Each chromosome contains one linear DNA molecule. The number of DNA molecules, and thus of chromosomes, is unrelated to the biological features of the organism. For instance, the yeast Saccharomyces cerevisiae, a simple eukaryote, has 16 chromosomes, whereas the fruit fly Drosophila melanogaster has only 4. This number reflects events that took place during evolution and shaped the genome architecture of different organisms.

In contrast with eukaryotes, prokaryotes have small genomes, most of them less than 5 Mb in size. Some bacterial species have one circular chromosome, whereas others harbor several chromosomes and plasmids.

In this chapter we will first describe the eukaryotic genome, i.e. chromosome structure, genome complexity, karyotypes and sex chromosomes. Second, we will present an overview of prokaryotic genomes, many of which have been completely sequenced and published. We will also try to depict the dynamism and variation of eukaryotic as well as prokaryotic genomes.

II. Anatomy of the eukaryotic genome

II.1. Chromatin structure

During most of the cell cycle, DNA is organized within the interphase chromatin. Chromatin is the fibrous complex containing genomic DNA, RNA and two classes of proteins: histones and non-histones. There are five kinds of histone proteins; they are rich in lysine and arginine residues and are therefore positively charged. For this reason they bind tightly to the negatively charged phosphates in DNA. The term chromatin is generally used to describe the dispersed nucleoproteins present in the interphase nucleus, as distinct from the condensed chromosomes [1] that are visible during mitosis and meiosis. Two types of chromatin are distinguishable in the interphase nucleus: heterochromatin and euchromatin. The former, which consists of very densely packed nucleoprotein fibers, may be condensed permanently (constitutive heterochromatin) or temporarily (facultative heterochromatin). Euchromatin, which contains less densely packed fibers, represents the regions where gene expression can occur.

[1] The term chromosome can be defined as a complex of DNA, RNA and proteins. It also applies to each DNA molecule in either the unpacked or the compact form. Nevertheless, the word chromosome is generally associated with the typical appearance that the DNA adopts during mitosis and meiosis (metaphase chromosomes).

When released from cells and viewed in the electron microscope, chromatin appears as a mesh of fibers (ca. 11 nm in diameter): the "beads-on-a-string" structure. These beads are called nucleosomes. Treatment of the 11 nm fibers with an endonuclease separates the individual nucleosomes, each of which is called a chromatosome or simply a nucleosome. The chromatosome contains about 168 bp of DNA, one molecule of histone H1, and a histone octamer comprising two molecules each of histones H2A, H2B, H3 and H4 (Table 1). Further endonuclease action releases a core particle, a disc-shaped structure (ca. 11 nm in diameter and 5.7 nm thick) consisting of the histone octamer around which a 146 bp sequence of B-DNA (corresponding to one and three-quarter turns) is wrapped in a left-handed superhelix. In the chromatosome, the DNA forms two complete superhelical turns around the histone octamer; the turns are secured by histone H1, a three-domain protein that seals the ends of the turns to the nucleosome core. In the intact 11 nm chromatin fiber, the chromatosomes are connected to one another by linker DNA, which may vary from 40 to 60 bp.
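As a rough illustration (not part of the original notes), the figures above can be combined to estimate how many nucleosomes a diploid human nucleus would contain. The sketch below assumes the 6 × 10^9 bp diploid genome size quoted in section II.2 and treats every repeat as one 168 bp chromatosome plus one linker; the function name `nucleosome_count` is hypothetical.

```python
# Rough estimate of nucleosome number from the figures quoted in the text.
CHROMATOSOME_BP = 168        # DNA held by one chromatosome (from the text)
LINKER_RANGE_BP = (40, 60)   # range of linker DNA lengths (from the text)
DIPLOID_GENOME_BP = 6e9      # diploid human genome (from section II.2)


def nucleosome_count(genome_bp, linker_bp, chromatosome_bp=CHROMATOSOME_BP):
    """Return the approximate number of nucleosome repeats in genome_bp.

    Assumption: the whole genome is packaged as identical repeats of one
    chromatosome plus one linker, which real chromatin only approximates.
    """
    repeat_bp = chromatosome_bp + linker_bp
    return genome_bp / repeat_bp


if __name__ == "__main__":
    for linker in LINKER_RANGE_BP:
        n = nucleosome_count(DIPLOID_GENOME_BP, linker)
        print(f"linker {linker} bp: about {n / 1e6:.1f} million nucleosomes")
```

Under these assumptions the estimate falls between roughly 26 and 29 million nucleosomes per diploid nucleus.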

Table 1. General properties of histone proteins. These basic proteins, rich in arginine and lysine, form the major components of chromatin in eukaryotes. The main classes of histones are: H1 (a lysine-rich protein), H2A and H2B (slightly lysine-rich proteins), and H3 and H4 (both arginine-rich proteins). Histones interact with DNA via ionic bonds between the positively charged arginine/lysine residues and the negatively charged phosphate groups of the backbone. Note that chromatin remodeling depends largely on the many modifications that histones undergo during the cell cycle, such as methylation, acetylation and phosphorylation.

Histone protein   Ratio of lysine/arginine   Molecular weight (kDa)   Copies per nucleosome
H1                59/3                       21.2                     one, outside the core
H2A               13/13                      14.1                     two, in the core
H2B               20/8                       13.9                     two, in the core
H3                13/17                      15.1                     two, in the core
H4                11/14                      11.4                     two, in the core

Box 1- Although histones are the major constituents of chromatin in most eukaryotes, there is an exception to the rule

Dinoflagellates form a large group of photosynthetic and/or heterotrophic organisms regarded either as algae or as protozoa. The dinoflagellate nucleus contains chromosomes that are unique among eukaryotes in lacking centromeres and containing little or no protein. Dinoflagellate nuclear organization is considered intermediate between the prokaryotic and the eukaryotic, and is therefore termed mesokaryotic. It is worth noting that mitosis in dinoflagellates differs significantly from the typical mitosis that occurs in the cells of higher animals. During mitosis the nuclear membrane remains intact and the mitotic spindle forms externally (extranuclear spindle). The microtubules pass through the nucleus via channels or holes (also called fenestrae) lined with the nuclear envelope.


II.2. Organization of chromosomes

The vast majority of human cells are diploid, i.e. their genetic material consists of 23 pairs of DNA molecules in the form of chromosomes. As the human genome comprises 3 × 10^9 bp, diploid cells contain 6 × 10^9 bp, which, at an average of 0.34 nm/bp in B-DNA, represents a DNA length of just over 2 m. To package this amount of DNA into a nucleus 5 µm in diameter, the DNA must clearly be condensed by a factor of more than 10^5. This is achieved first by wrapping the DNA around histone octamers to form nucleosomes, and second by packing the nucleosomes into a helical filament that is arranged in loops associated with the nuclear matrix [2].

Although the precise position of the linker histone H1 relative to the nucleosome is not known, it is generally believed that H1 acts as a clamp that prevents the DNA from detaching from the outside of the nucleosome [3]. Moreover, consecutive nucleosomes seem to be stabilized through interactions between their respective H1 histones to create the 30 nm filament (Fig. 1).

The "beads-on-a-string" structure, thought to represent the unpacked form of chromatin, occurs only infrequently in living nuclei. The use of very gentle cell breakage techniques in the 1970s revealed a more compact form of chromatin called the 30 nm fiber, which is probably the major type of chromatin in the interphase nucleus. It is created from the 11 nm fiber when the nucleosomes are wound in a solenoid with six nucleosomes per turn (Fig. 1). Despite the packing ratio of 40 provided by the 30 nm filament, this order of condensation is still not enough to fit the DNA within the nucleus: each human chromosome would extend approximately 1 mm as a 30 nm fiber, so an average human chromosome would be about 200 times longer than the diameter of the nucleus.

The 30 nm filament therefore forms long DNA loops of variable length, each containing between 60,000 and 150,000 bp. Electron microscope analysis suggests that 18 loops are arranged radially to form a miniband unit of the chromosome.

[2] The nuclear matrix is a skeleton or scaffold of proteins providing a structural framework within the nucleus.
[3] Recent results suggest that, at least in some organisms, histone H1 is not positioned on the extreme surface of the nucleosome, as would be consistent with its role as a clamp, but rather is inserted between the core octamer and the DNA [Pennisi (1996) Linker histones, DNA's protein custodians, gain new respect. Science, 274: 503-504].

II.3. Special features of the metaphase chromosomes and karyotype [4]

As described above, during most of the life of the cell the chromosomes are too elongated and thin to be seen under a microscope. However, as mitosis (or meiosis) begins, DNA adopts a very compact form of packaging, which results in the highly condensed metaphase chromosomes. These are short (~5 µm), condensed structures that can easily be stained and observed under the light microscope. The individual metaphase chromosomes do not all look the same: under the microscope they vary in size and morphology. Each chromosome is characterized by a functionally important component called the centromere, which appears as a constriction under the microscope. The position of the centromere and the size of each chromosome are constant parameters and help in the identification of individual chromosomes (Fig. 2). The karyotype, defined as the complete set of metaphase chromosomes, does not vary within a species. However, the number, size and morphology of metaphase chromosomes vary widely among eukaryotic organisms (Table 2).

[4] The karyotype is defined as the chromosomal constitution of a eukaryotic cell in terms of the number, size and morphology of the metaphase chromosomes. A systematized photographic representation may be referred to as an idiogram or as a karyogram.

Figure 1. A model for chromosome structure. The 2 nm DNA double helix is wound twice around histone octamers to form the 11 nm fiber of nucleosomes, each of which contains 160 bp (corresponding to 80 bp per turn). The nucleosomes are wound in a solenoid structure (6 per turn) to form the 30 nm chromatin fiber. In the next level of condensation, the 30 nm filament forms loops, each containing about 60,000 bp, which are attached at their base to the nuclear scaffold; this higher order, encompassing about 50 turns per turn, ensures a packing ratio of 680. About eighteen of these loops are then arranged radially to form a miniband unit of a chromosome.
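The packaging arithmetic of section II.2 can be checked with a few lines of code. This is only a back-of-the-envelope sketch: every constant is taken from the text, the variable and function names are hypothetical, and dividing the DNA evenly among 46 chromosomes is a simplifying assumption.

```python
# Back-of-the-envelope check of the DNA packaging figures quoted in the text.
BP_PER_DIPLOID_CELL = 6e9    # diploid human genome (from the text)
RISE_PER_BP_M = 0.34e-9      # 0.34 nm per bp in B-DNA (from the text)
NUCLEUS_DIAMETER_M = 5e-6    # nucleus 5 micrometres across (from the text)
PACKING_RATIO_30NM = 40      # packing ratio of the 30 nm fiber (from the text)
CHROMOSOMES_PER_CELL = 46    # assumption: DNA shared equally among them


def extended_length_m(bp, rise_m=RISE_PER_BP_M):
    """Length in metres of bp base pairs of fully extended B-DNA."""
    return bp * rise_m


total_m = extended_length_m(BP_PER_DIPLOID_CELL)     # total DNA per cell
compaction = total_m / NUCLEUS_DIAMETER_M            # required condensation
per_chromosome_m = total_m / CHROMOSOMES_PER_CELL    # average naked DNA length
fiber_m = per_chromosome_m / PACKING_RATIO_30NM      # same DNA as 30 nm fiber
fiber_vs_nucleus = fiber_m / NUCLEUS_DIAMETER_M      # fiber length / diameter

if __name__ == "__main__":
    print(f"total DNA length:       {total_m:.2f} m")
    print(f"required condensation:  {compaction:.1e}")
    print(f"30 nm fiber per chrom.: {fiber_m * 1e3:.2f} mm")
    print(f"fiber / nucleus:        {fiber_vs_nucleus:.0f}")
```

The results (about 2 m of DNA, a condensation factor of about 4 × 10^5, about 1.1 mm per chromosome as a 30 nm fiber, and a fiber roughly 220 times longer than the nucleus is wide) agree with the rounded values given in the text.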


Simple staining procedures, such as the use of orcein or Giemsa dye without any special treatment, stain the metaphase chromosomes uniformly, which makes it difficult to distinguish between chromosomes that have the same size and morphology. To overcome this problem, a number of staining techniques were devised, each resulting in a banding pattern that is characteristic of the individual chromosome. These procedures, which generate bands that are more intensely stained than others, are summarized in Table 3. The set of chromosomes of a given organism can thus be represented as a karyogram, in which the banding pattern of each chromosome is shown. Figures 3 and 4 show the karyotype of a normal human male, with 22 pairs of autosomes and two sex chromosomes, X and Y.

Table 2. Chromosome numbers in different species, together with two extreme examples. Each species has a characteristic number of chromosomes, made up of homologous pairs and termed the 2n or diploid number. The record for the minimum number of chromosomes is held by a subspecies of the ant Myrmecia pilosula, in which females have a single pair of chromosomes. Because this species reproduces by haplodiploidy, females, which develop from fertilized eggs, are diploid, whereas males, which develop from unfertilized eggs, are haploid, i.e. they have a single chromosome. The record for the maximum number of chromosomes belongs to a polyploid fern (polyploidy is a common feature in plants). In this fern, Ophioglossum reticulatum, there are about 630 pairs of chromosomes, or 1260 chromosomes per cell. Segregating this huge number of chromosomes accurately during cell division is no trivial affair.

Species (genus and species)                                Diploid number of chromosomes
Ophioglossum reticulatum (a fern)                          1260
Equisetum arvense (field horsetail, a plant)               216
Cambarus clarkii (a crayfish)                              200
Canis familiaris (domestic dog)                            78
Equus caballus (horse)                                     64
E. asinus (donkey)                                         62
Bison bison (buffalo)                                      60
Bos taurus, B. indicus (cattle)                            60
Capra hircus (goat)                                        60
Ovis aries (sheep)                                         54
Homo sapiens (human)                                       46
Mus musculus (house mouse)                                 40
Sus scrofa (pig)                                           38
Felis catus (cat)                                          38
Xenopus laevis (South African clawed frog)                 36
Saccharomyces cerevisiae (budding yeast)                   32
Zea mays (corn or maize)                                   20
Caenorhabditis elegans (microscopic roundworm)             12
Arabidopsis thaliana (plant in the mustard family)         10
Drosophila melanogaster (fruit fly)                        8
Parascaris equorum var. univalens (parasitic roundworm)    2
Myrmecia pilosula (an ant)                                 2


Figure 2. Structure of the metaphase chromosomes. Before nuclear division (mitosis or meiosis) each chromosome is duplicated (replicated) during the S phase of interphase. As cell division starts, the duplicated chromosomes, also called dyads, become condensed and visible under the light microscope. Each dyad consists of duplicated chromosomes held together at the centromere. For convenience, and while they are still attached, the duplicated chromosomes of each dyad are called sister chromatids, but this should not obscure the fact that each chromatid is an authentic chromosome with a full complement of genes. The centromere corresponds to a region of DNA, whereas the kinetochore is a complex of proteins (11 in the yeast Saccharomyces cerevisiae) that forms at the centromere and helps to separate the sister chromatids as cell division proceeds into anaphase. Note that the centromere divides each chromosome into two arms: the shorter is called the p arm and the longer the q arm. Labels shown in the figure: centromere (DNA), kinetochore (protein), dyad, sister chromatids, non-sister chromatids, homologous pair.

Table 3. Different staining procedures used to generate a specific banding pattern for each of the metaphase chromosomes.

Technique   Procedure                                                   Banding pattern
G-banding   Mild proteolysis or heat, followed by Giemsa staining       Dark bands are AT-rich; pale bands are GC-rich
R-banding   Heat denaturation followed by Giemsa staining               Dark bands are GC-rich; pale bands are AT-rich
Q-banding   Staining with quinacrine                                    Dark bands are AT-rich; pale bands are GC-rich
C-banding   Denaturation with barium hydroxide, then Giemsa staining    Dark bands are constitutive heterochromatin

Note that the karyogram must be seen as a physical map of very low resolution. Higher-resolution mapping can be obtained with a newer procedure called FISH, or Fluorescence In Situ Hybridization (see chapter 10). This technique allows specific genes or sequences to be visualized directly by fluorescence microscopy. In this procedure the chromosomes are first treated to denature the DNA; DNA probes tagged with fluorescent chemicals are then added and bind to their complementary regions. When FISH is applied to metaphase chromosomes, only genes (or markers) that are at least 1 Mb apart can be resolved as two distinct hybridization signals. When it is applied to non-metaphase chromosomes, however, resolution is enhanced and markers that are less than 25 kb apart can be distinguished.

Figure 3. The karyogram of a normal human male. The karyotype of the human male contains 23 pairs of chromosomes: 22 pairs of autosomes, plus one X chromosome and one Y chromosome. The karyotype of the human female contains the same 22 pairs of autosomes and one pair of X chromosomes. Banding patterns are generated by the R-banding technique {adapted from Dutrillaux and Lejeune [Dollander and Fenart (1979) Eléments d'embryologie, Flammarion Médecine-Sciences, 4th Ed.]}. Group A, chromosomes 1 to 3; Group B, 4 and 5; Group C, 6 to 12; Group D, 13 to 15; Group E, 16 to 18; Group F, 19 and 20; Group G, 21 and 22. Note that each autosomal pair is represented by one chromosome in this karyogram.


Figure 4. The karyogram of a normal human male, with the chromosomes shown in their G-banding pattern. Band numbers are given to the left. rDNA is a region containing many repeats of the ribosomal RNA genes. Constitutive heterochromatin is very compact and lacks genes {adapted from Strachan and Read (1996) Human Molecular Genetics. BIOS Scientific Publishers, Oxford}.


Box 2- Human cytogenetics through the ages

Definition

Cytogenetics, a discipline that overlaps both cytology and genetics, is the study of chromosomes and of the diseases caused by their abnormalities. These abnormalities affect either the structure of chromosomes (chromosome mutations) or their number (aneuploidy and polyploidy). Cytogenetics is therefore the study of normal and abnormal chromosomes, with emphasis on structure-phenotype relationships and on the causes of chromosome abnormalities. The complete set of chromosomes is observed and characterized by examining the individual's karyotype. The latter, a photomicrographic representation of the metaphase chromosomes, provides a description of the structure and the number of chromosomes. Here we present a historical view of cytogenetics and describe one protocol for preparing a karyotype.

Historical view of cytogenetics

The year 1956 is considered to mark the beginning of modern human cytogenetics. Until then the number of chromosomes in the normal human cell was thought to be 48. Thanks to improvements in technique, it was discovered that the correct number [5] is 46. Further technological improvements then allowed the identification of individual chromosomes and the association of specific genetic disorders with specific chromosomes. Historians have divided the discipline of human cytogenetics into five "eras": the "Dark Ages", the "Hypotonic Period", the "Trisomy Period", the "Banding Era" and the "Molecular Era".

During the "Dark Ages" (prior to 1952) mammalian tissue culture techniques were developed, as were techniques for arresting cells during division, which allowed chromosomes to be visualized. Early studies reported the number of chromosomes per cell to be 48, and staining techniques allowed only limited differentiation of specific chromosomes, based on darkly versus lightly stained areas.

The "Hypotonic Period" (which started in 1952 [6]) takes its name from the use of a hypotonic solution, i.e. a solution with a lower salt concentration than the cells it contains. This causes the cells to absorb water through their membranes and swell (but not burst). In the swollen cells the chromosomes separate readily, which makes them easier to count. The correct chromosome number, namely 46, was established in this way.

During the "Trisomy Period" cytogeneticists turned their attention to patients with congenital abnormalities. Patients with Down syndrome were discovered to have an additional copy of a small chromosome, chromosome 21 [7]. The syndrome is therefore associated with trisomy 21. Other trisomies were also discovered during this period, namely trisomy 13 (Patau syndrome) and trisomy 18 (Edwards syndrome). Numerical abnormalities involving the sex chromosomes X and Y (such as XXY in Klinefelter syndrome and XO in Turner syndrome) were also described for the first time and associated with specific clinical phenotypes.

Further advances in technology led to banding techniques (hence the "Banding Era"), which brought out horizontal bands of differential staining intensity (first employed in fluorescence microscopy). The pattern of bands was specific for individual chromosomes and allowed each chromosome to be identified. This in turn made it possible to recognize the structural abnormalities associated with specific genetic syndromes. Nowadays laboratories employ high-resolution banding techniques, which increase the number of visible bands and therefore the level of resolution at which chromosomes can be studied.

The most recent developments in cytogenetics have led to the "Molecular Era". Advances in the use of DNA probes have allowed cytogeneticists to hybridize these probes to chromosomes and determine whether a specific DNA sequence is present on the target chromosome (see FISH technique, Annex 1). This has been useful in detecting abnormalities beyond the resolution of banded chromosomes viewed under the microscope, and also in determining the location of specific genes on chromosomes.

[5] Tjio, J.H. and Levan, A. (1956). The chromosome number of man. Am J Obstet Gynecol, 130:723-724.
[6] Hsu, T.C. (1952). Mammalian chromosomes in vitro. I. The karyotype of man. J Hered, 43:167-172.
[7] Lejeune, J. et al. (1959). Etude des chromosomes somatiques de neuf enfants mongoliens. Compt Rend, 248:1721-1722.


Cytogenetics in diagnosis

Cytogenetic analysis has been an invaluable tool in screening for and diagnosing genetic disorders. In the future, cytogenetic methods will become more and more closely linked to molecular techniques and will continue to play an important role in medical service and research.

Preparation of a karyotype

Metaphase cells are required to prepare a standard karyotype, and virtually any population of dividing cells can be used. Blood is by far the most frequently sampled tissue, but karyotypes are sometimes prepared from cultured skin fibroblasts or bone marrow cells. None of the leukocytes in blood normally divide, but lymphocytes can readily be induced to proliferate, providing a very accessible source of metaphase cells. There are many protocols for preparing a karyotype from peripheral blood lymphocytes, but a fairly standard series of steps is involved:

(i) A sample of blood is drawn and coagulation is prevented by the addition of heparin.
(ii) Mononuclear cells are purified from the blood by centrifugation through a dense medium that allows red cells and granulocytes to pellet but retards the mononuclear cells (lymphocytes and monocytes).
(iii) The mononuclear cells are cultured for 3-4 days in the presence of a mitogen such as phytohemagglutinin, which stimulates the lymphocytes to proliferate.
(iv) At the end of the culture period, when there is a large population of dividing cells, the culture is treated with a drug such as colcemid or colchicine, which disrupts mitotic spindles and prevents the completion of mitosis. This greatly enriches the population of metaphase cells.
(v) The lymphocytes are harvested and treated briefly with a hypotonic solution. This makes the nuclei swell osmotically and greatly helps to obtain preparations in which the chromosomes do not lie on top of one another.
(vi) The swollen cells are fixed, dropped onto a microscope slide and dried.
(vii) The slides are stained after a treatment that induces a banding pattern, as described above.

Once stained slides are prepared, they are scanned to identify "good" chromosome spreads, which are photographed. The photos are then given to students or trainees, who cut out the image of each chromosome and paste the images onto a sheet in an orderly manner. Alternatively, a digital image of the chromosomes can be cut and pasted using a computer. If standard staining was used, the orderly arrangement is limited to grouping like-sized chromosomes together in pairs, whereas if the chromosomes were banded, they can be unambiguously paired and numbered.

Karyotypes are presented in a standard form. First, the total number of chromosomes is given, followed by a comma and the sex chromosome constitution (a normal human male: 46, XY). This shorthand description is followed by the coding of any autosomal abnormalities (a human male with trisomy 21: 47, XY, +21).
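The shorthand just described can be sketched as a small helper. This is an illustrative simplification, not a full implementation of cytogenetic nomenclature: the function name `karyotype_notation` and its parameters are hypothetical, and it only handles gains ("+21") or losses ("-21") of whole chromosomes on top of the 44 human autosomes.

```python
def karyotype_notation(sex_chromosomes, abnormalities=(), autosomes=44, sep=", "):
    """Build a shorthand karyotype string such as '47, XY, +21'.

    sex_chromosomes: e.g. "XY", "XX", "X" or "XXY".
    abnormalities:   whole-chromosome gains or losses, e.g. ["+21"].
    autosomes:       number of autosomes in the normal set (44 in humans).
    sep:             separator; the text writes a comma followed by a space.
    """
    total = autosomes + len(sex_chromosomes)
    for item in abnormalities:
        if item.startswith("+"):
            total += 1          # one extra chromosome
        elif item.startswith("-"):
            total -= 1          # one missing chromosome
    return sep.join([str(total), sex_chromosomes, *abnormalities])


if __name__ == "__main__":
    print(karyotype_notation("XY"))            # normal male
    print(karyotype_notation("XY", ["+21"]))   # male with trisomy 21
    print(karyotype_notation("XXY"))           # Klinefelter syndrome
```

The `sep` parameter is exposed because published karyotypes are often written without the space after each comma.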

II.4. Centromere and telomere: revisited Two components of the metaphase chromosomes are functionally important. The first is the centromere, which plays a central role in holding the daughter chromosomes together and in acting as the attachment point for the microtubules that draw the daughters to their respective nuclei when the cell divides (the plate-like structures that are present in the dividing cell at the centromeric region are called Kinetochores). The DNA from centromeres has been sequenced and shown to be made up of repetitive units, which in humans are 171bp in length and are called alphoid. In the yeast Saccharomyces cerevisiae, centromeric sequences fall within a stretch of about 120bp, in which three conserved regions can be identified by the sequence homologies between yeast centromeres. The second important chromosomal structure is the telomere or the terminal region of each chromosome. Telomeres are important because they mark and protect the ends of chromosomes. Telomeric DNA is made up of hundreds of copies of a repeated motif such as 5'-TTAGGG-3' in human. Special telomere proteins are

Page 11: Anatomy of genomes in Eukaryotes and Prokaryotesprokaryotic genomes have small sizes and most of them are less than 5Mb in size. Some bacterial species have one circular chromosome

Molecular Genetics and Genomics Department of Biology/Faculty of Sciences/LU

Dr. Fahd Nasr-All rights reserved

62

believed to bind to the repeat sequences; as a consequence the telomeric nucleosomes may have special structures. III. Anatomy of the prokaryotic genomes III.1. The relatively small sizes of the prokaryotic genomes In 1995 Venter and colleagues reported the complete sequence of the genome of the bacterium Haemophilus influenza8. Its genome of 1.83Mb encompasses 1743 open reading frames (ORFs) of which 40% encode proteins of unknown function. The second bacterial genome to be fully sequenced was that of Mycoplasma genitalium. This is the smallest bacterial genome sequenced so far; the 580kb sequence was shown to specify 470 ORFs that cover about 88% of the genome. Many other bacterial genomes were completely sequenced; most of them have a genome size of less than 5Mb (Table 4). III.2. Dynamism of bacterial genomes Most of the bacterial genomes are circular and haploid, however, as information accumulate in the databases some exceptions to this traditional view start to emerge. In addition, the complete sequencing of genomes of some commonly studied bacteria has brought a new dimension to biology affecting more particularly the fields of genetics and biochemistry. The so-called genome projects that aim at the sequencing of entire genomes have launched a new era in science and created a new field in biology: the Genomics (see below). This is a comprehensive study of all the genes and their interactions in a given organism. Comparative analysis of bacterial genomes led to a new vision of genomes in terms of genes organization, operons structures, G+C content, transposable elements, number and architecture of the origins of replication and many other points (see table 4). Escherichia coli, the most famous bacterial species, has been extensively analyzed genetically and biochemically. E. coli9 genome consists of one circular chromosome of 4.6Mb and 4289 predicted genes. 
However, the size of prokaryotic genomes varies widely, ranging from 0.58Mb in Mycoplasma genitalium to 9.2Mb in Myxococcus xanthus. Not only the size but also the number and form of chromosomes vary between species. For instance, the genome of Vibrio cholerae, the causative agent of cholera, consists of two circular chromosomes of 2,961kb and 1,072kb. Six species of Brucella have two chromosomes of 2.1Mb and 1.2Mb. Some bacterial species contain plasmids that carry a part of the genetic information encoded by the genome. Xylella fastidiosa has two plasmids harboring 66 genes. Deinococcus radiodurans, a bacterium noted for its resistance to radiation damage, also has two plasmids.
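Because a prokaryotic genome may be spread over several replicons, its total size is the sum of the individual molecules. The short Python sketch below illustrates this with the chromosome sizes quoted above; the dictionary layout and the function names are illustrative assumptions, not part of any published tool.

```python
# Replicon sizes in kb, taken from the figures quoted in the text.
# The data structure itself is only an illustrative choice.
GENOMES_KB = {
    "Vibrio cholerae": [2961, 1072],   # two circular chromosomes
    "Brucella": [2100, 1200],          # two chromosomes (2.1Mb and 1.2Mb)
    "Escherichia coli": [4600],        # one circular chromosome
    "Mycoplasma genitalium": [580],    # one chromosome
}


def total_genome_kb(replicons_kb):
    """Total genome size (kb) as the sum of all replicons."""
    return sum(replicons_kb)


def summarize(genomes):
    """Map each species to (number of replicons, total size in kb)."""
    return {name: (len(sizes), total_genome_kb(sizes))
            for name, sizes in genomes.items()}


if __name__ == "__main__":
    for name, (count, total) in summarize(GENOMES_KB).items():
        print(f"{name}: {count} replicon(s), {total} kb in total")
```

For Vibrio cholerae the two chromosomes add up to 4,033kb, which is the genome size listed for this species in Table 4.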

8 Fleischmann, R.D. et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269: 496-512.
9 Usually, E. coli is a beneficial bacterium found in the human gastrointestinal system. Nevertheless, it also exists in harmful forms. One of them, called O157:H7, is linked to human disease and causes an unusual and severe gastrointestinal ailment. E. coli O157:H7 has a circular genome of 5.44Mb specifying about 5,416 genes.

Table 4. Genome sizes, G+C content and number of genes in some prokaryotes. The species marked with an asterisk are Archaea, also known as Archaebacteria. Historically, the term bacterium was used to designate all microscopic prokaryotes. However, it turns out that prokaryotes comprise two different groups: Bacteria and Archaea. Archaea are unicellular prokaryotes, mostly found in extreme environments, that have an independent evolutionary history. Their genes differ from those of bacteria and eukaryotes, and they are classified in a third kingdom. Studies have provided considerable evidence that archaea are more closely related to eukaryotes than they are to bacteria. Some of the species listed below are pathogens, whereas the others are not. The number of genes (not final) includes protein-coding plus RNA-coding genes. Note that the list is not exhaustive, as many other prokaryotic genomes have been entirely sequenced and published.

Species                                  Genome size (kb)   G+C (%)   No. of genes
Aeropyrum pernix*                        1,670              56.31     2,694
Aquifex aeolicus                         1,551              43.48     1,522
Archaeoglobus fulgidus*                  2,178              48.58     2,407
Bacillus subtilis                        4,215              43.52     4,098
Borrelia burgdorferi                     911                28.59     853
Campylobacter jejuni                     1,641              30.5      1,731
Chlamydia pneumoniae                     1,230              40.58     1,052
Chlamydia trachomatis                    1,043              41.31     894
Escherichia coli                         4,639              50.79     4,289
Haemophilus influenzae                   1,830              38.15     1,709
Helicobacter pylori                      1,668              38.87     1,566
Methanococcus jannaschii*                1,665              31.43     1,715
Methanobacterium thermoautotrophicum*    1,751              49.54     1,869
Mycoplasma genitalium                    581                31.69     480
Mycoplasma pneumoniae                    816                40.01     677
Mycobacterium tuberculosis               4,412              65.61     3,909
Mycobacterium leprae                     3,268              57.79     1,604
Pyrococcus abyssi*                       1,765              44.71     1,763
Pyrococcus horikoshii*                   1,739              41.88     2,058
Rickettsia prowazekii                    1,112              29.00     834
Thermotoga maritima                      1,861              46.25     1,846
Treponema pallidum                       1,138              52.77     1,031
Vibrio cholerae                          4,033              47.5      3,885
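One way to read Table 4 is to divide genome size by gene number, which gives the average amount of DNA per gene. The sketch below does this for a few rows copied from the table; the function names and the choice of species are illustrative assumptions, and the result is only a crude average that ignores gene length and non-coding regions.

```python
# (genome size in kb, number of genes) for a few species from Table 4.
TABLE4 = {
    "Escherichia coli": (4639, 4289),
    "Mycoplasma genitalium": (581, 480),
    "Mycobacterium tuberculosis": (4412, 3909),
    "Mycobacterium leprae": (3268, 1604),
    "Methanococcus jannaschii": (1665, 1715),
}


def kb_per_gene(size_kb, n_genes):
    """Average stretch of genome (kb) available per annotated gene."""
    return size_kb / n_genes


def rank_by_density(table):
    """Species names sorted from most to least gene-dense."""
    return sorted(table, key=lambda name: kb_per_gene(*table[name]))


if __name__ == "__main__":
    for name in rank_by_density(TABLE4):
        print(f"{name}: {kb_per_gene(*TABLE4[name]):.2f} kb per gene")
```

With these rows, most species come out close to 1kb per gene, whereas Mycobacterium leprae stands out with roughly 2kb per gene.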

Moreover, the finding that Borrelia burgdorferi has a linear chromosome has overturned an old paradigm stating that prokaryotic organisms have only circular genomes. The genetic information of B. burgdorferi resides on a single linear chromosome, 911kb in length and carrying 853 genes, accompanied by at least 17 linear and circular plasmids, which add another 533kb and 430 genes(10). The telomeric structures of these linear DNA molecules follow a model different from that of eukaryotic chromosomes. In B. burgdorferi the ends are

10 Fraser, C.M. et al. (1997). Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature, 390: 580-586.

covalently linked, forming hairpin structures (see DNA replication). Finally, prokaryotic genomes are not necessarily haploid; it was found that D. radiodurans has several copies of its chromosome (4 or 5). This polyploidy accounts for its resistance to radiation damage because it allows the bacterium to reconstitute a wild-type chromosome (or at least a functional chromosome) from the damaged copies by recombination.

Figure 5. A model proposed for the E. coli nucleoid. The genome is organized in loops, each of which contains approximately 100kb. About 50 DNA loops radiate from the central protein core. Because the loops seem to be secured by an as yet unknown mechanism, each DNA loop forms an independent domain. Inside a loop, the DNA can be either supercoiled or relaxed due to a break that has occurred in this segment.
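The numbers attached to the nucleoid model can be checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation for the 4.6Mb E. coli chromosome; the rise of 0.34 nm per base pair (B-form DNA) and the average mass of about 650 Da per base pair are standard textbook approximations that are assumed here, and the function names are illustrative.

```python
BP_RISE_NM = 0.34   # assumed rise per base pair of B-form DNA, in nm
DA_PER_BP = 650     # assumed average mass of one base pair, in daltons


def contour_length_mm(genome_bp):
    """Contour length of the DNA molecule in millimetres."""
    return genome_bp * BP_RISE_NM * 1e-6  # nm -> mm


def mass_kda(genome_bp):
    """Approximate molecular mass of the DNA molecule in kDa."""
    return genome_bp * DA_PER_BP / 1000


def loop_count(genome_bp, loop_bp=100_000):
    """Number of loops if every loop holds loop_bp base pairs."""
    return genome_bp / loop_bp


if __name__ == "__main__":
    genome_bp = 4_600_000  # E. coli chromosome, 4.6Mb
    print(f"contour length: {contour_length_mm(genome_bp):.2f} mm")
    print(f"mass: {mass_kda(genome_bp):.2e} kDa")
    print(f"loops of 100kb: {loop_count(genome_bp):.0f}")
```

Under these assumptions the chromosome is about 1.56 mm long, weighs roughly 3 x 10^6 kDa and would be divided into 46 loops of 100kb, in line with the 45 to 50 loops of the model.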

III.3. The structure and organization of the prokaryotic genome

A traditional view stipulates that the prokaryotic genome is a single, tightly coiled circular DNA molecule localized in the nucleoid. In E. coli the circular chromosome of 4.6Mb has a molecular mass of 3 x 10^6 kDa and a circumference of 1.6mm. As with eukaryotes, a prokaryotic genome has to squeeze into a small space; an E. coli cell, for example, measures just 1 x 2 µm. To achieve this, the genome is packaged in a highly ordered fashion with the help of DNA-binding proteins. The first feature to be recognized about the organization of the DNA in the nucleoid came from studies on the E. coli chromosome, which turned out to be negatively supercoiled. Because the circular chromosome of E. coli has no free ends, the strain or torsional stress cannot be released by rotation. Instead the molecule responds to the strain by writhing itself in a more

compact structure. Supercoiling, which is an efficient way to package a circular molecule into a tiny space, is thought to be maintained and controlled by DNA gyrase and DNA topoisomerase I (see DNA replication). Various lines of evidence show that the E. coli DNA molecule does not have unlimited freedom to rotate when a break is introduced somewhere in the chromosome. Instead, a break results only in the loss of supercoiling in a limited region of DNA. This can be explained by the fact that DNA is bound by proteins that restrict its ability to relax. The current model states that the E. coli genome is attached to a protein core from which 45 to 50 loops radiate out into the cytosol (Fig. 5). Each DNA loop forms a domain containing approximately 100kb of supercoiled DNA; this corresponds to the amount of DNA that becomes relaxed after a single break. The protein part of the nucleoid includes, in addition to DNA gyrase and DNA topoisomerase I, at least four proteins that are believed to play a specific role in DNA packaging. The most abundant of these is the protein HU, which is different from eukaryotic histones but acts in a similar fashion. HU forms a tetramer around which about 60bp of DNA becomes wound. It is estimated that a single E. coli cell contains approximately 60,000 HU proteins; however, it is not known whether these tetramers are evenly distributed along the DNA or restricted to the central protein core.

IV. Genomics(11) as a new field in biology

IV.1. Definition

The word genome is more than 75 years old and refers to an organism’s complete set of GENes and chromosOMEs. The term genomics was coined in 1986 to describe a new field of biological sciences that aims at the study of an organism's entire DNA complement. This scientific discipline concerns mapping, sequencing and analyzing the entire genome, which contains all the biological information required to specify the functions that maintain the life of an organism.
In the late 1980s, the scientific community became convinced that characterizing the molecular mechanisms of life, as well as their regulatory networks, requires a comprehensive set of all the genes involved. To this end, it launched the enterprise that we now call “genomics”. The first step is the determination of the complete DNA sequence of the model organism being studied. The genome sequence is then submitted to further analysis in order to learn how the genome functions.

IV.2. Prokaryotic genomes

When compared to their eukaryotic counterparts, prokaryotic genomes appear much smaller. The paradigm of prokaryotes is the bacterium E. coli, whose genome is contained in a single, covalently closed circular DNA molecule. However, some variations occur concerning the number of DNA molecules and the physical organization (e.g. linear genome

11 Many students from the fourth-year biology class contributed, during the academic year 2001-2002, to this section entitled “Genomics”. Their research projects were dedicated to the study of different genome projects as well as to the different functional approaches that have been devised to analyze entire genomes.

instead of circular). More than 75 prokaryotic organisms have had their genomes completely sequenced and submitted to functional analysis. The first genome project to be completed was that of the bacterium Haemophilus influenzae(12), which is a human pathogen. Two years later the two best bacterial models for biochemical and genetic studies, the gram-positive Bacillus subtilis(13) and the gram-negative E. coli, joined the list of completed genomes. Later, in 2002, the complete genome sequence of the actinomycete Streptomyces coelicolor A3(2) was published. Prokaryotic species fall into two major kingdoms: Bacteria (or eubacteria) and Archaea, each of which will be treated separately.

IV.2.1. The two kingdoms of prokaryotes: Bacteria and Archaea

Since there is increasing confusion about the relationship of the major divisions of life, it is important to have an idea about the history of living organisms. Until the 1970s, living organisms were divided into two major branches: the prokaryotes (no nucleus) and the eukaryotes (defined nucleus). The advent of RNA sequencing led to a three-kingdom world, the prokaryotes being divided into two branches: the true bacteria (Eubacteria, also known as Bacteria) and a motley group composed of organisms from diverse and extreme habitats (the archaebacteria or simply the Archaea). In 1996, for example, comparison of genomic sequences from the microbe Methanococcus jannaschii with those of other organisms confirmed the existence of the archaeal branch of life.

IV.2.2. Bacterial genomes

Many bacteria are pathogens that cause human and animal diseases. However, certain bacteria, the actinomycetes(14), produce antibiotics such as streptomycin and nocardicin; others live symbiotically in the guts of animals or elsewhere in their bodies, or on the roots of plants.
Bacteria are of immense importance because of their extreme flexibility, capacity for growth and reproduction, and great age: the oldest fossils known, nearly 3.5 billion years old, are fossils of bacteria-like organisms. Figure 6 shows six bacterial species whose genomes have been sequenced and published. More than 60 bacterial genomes have been sequenced; however, we will present only a brief description of the genome projects of E. coli and B. subtilis.

12 Fleischmann, R.D. et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269: 496-512.
13 Kunst, F. et al. (1997). The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature, 390: 249-256.
14 Actinomycete refers to any member of the order Actinomycetales. The latter includes gram-positive and typically aerobic bacteria. Actinomycetes can contain various transmissible or non-transmissible plasmids, some of which are involved in antibiotic production. Note that most members of the Actinomycetales have a G+C content higher than 55%.

Figure 6. Some of the bacterial species that were selected and submitted to genomic analysis(15). (A) E. coli (O157:H7). (B) Helicobacter pylori. (C) Salmonella typhi. (D) Staphylococcus aureus. (E) Campylobacter jejuni. (F) Vibrio cholerae.

IV.2.2.a. The complete genome of Escherichia coli K-12

E. coli is an important component of the biosphere. It colonizes the lower gut of animals and, as a facultative anaerobe, survives when released into the natural environment, allowing widespread dissemination to new hosts. Strain MG1655 of the K-12 isolate was chosen as the representative to sequence, having only been cured of the temperate bacteriophage lambda and

15 E. coli (O157:H7) is a hemorrhagic, enteric, facultatively anaerobic type that is potentially fatal to humans; it is contracted when contaminated meat is cooked inadequately. H. pylori is gram-negative and spiral to pleomorphic. It can move by means of tiny flagella at the end of the cell. There are many strains of H. pylori, which are distinguished by the human diseases that they cause. H. pylori infection is the main cause of chronic superficial gastritis and is associated with both gastric and duodenal ulcers. It lives at the surface of gastric epithelial cells (the lining of the stomach) and often clusters at the junctions of epithelial cells. S. typhi is a gram-negative, enteric, rod prokaryote (dividing); it causes typhoid fever. S. aureus is a gram-positive, methicillin-resistant, coccus prokaryote (dividing); it causes food poisoning, toxic shock syndrome and skin and wound infections. C. jejuni is a gram-negative, enteric, curved (vibrio-shaped) rod prokaryote. It is found in the gastrointestinal tract of humans and animals and can travel to the oral cavity and genitourinary tract. It causes gastroenteritis, especially in infants. V. cholerae is a gram-negative, facultatively anaerobic, curved (vibrio-shaped) rod prokaryote; it causes Asiatic cholera.

F plasmid by ultraviolet light and acridine orange respectively. The genome is very compact, as the average distance between genes is 118 bp. The 70 intergenic regions larger than 600 bp were reevaluated for the presence of open reading frames (ORFs): 15 of these regions were found to contain previously unannotated ORFs, and an additional 11 intergenic regions contain sequence features such as long untranslated leader sequences (for example, the araFGH operon control region). The remaining 44 large intergenic regions fall into three general classes: putative gene regulatory regions, large repetitive sequences and regions of unknown function. The genome of E. coli consists of 4,639,221 bp of circular duplex DNA and encodes 4,289 genes. Protein-coding genes account for 87.8% of the genome, 0.8% encodes stable RNAs and 0.7% consists of non-coding repeats. The remaining 11% or so of the genome serves regulatory and other functions. The origin and terminus of replication divide the genome into oppositely replicated halves called replichores. In replichore one the published strand is the leading strand, whereas in replichore two it is the lagging strand. Many features of E. coli are oriented with respect to replication. All seven rRNA operons and 53 of 86 tRNA genes are expressed in the direction of replication. Approximately 55% of protein-coding genes are also aligned with the direction of replication.

IV.2.2.b. The Bacillus subtilis genome

The sequence of the B. subtilis genome revealed many surprises, such as the finding of genes encoding 18 sigma factors (prokaryotic regulators of gene expression). This suggests that B. subtilis regulates many of its genes in small groups. The expansion of certain gene families (paralogues) is also remarkable, resulting in, e.g., 77 different members in the ABC family of transporter proteins. This indicates that B. subtilis has evolved an elaborate and finely tuned system for chemical communication with its environment: these transporters are vital pumping systems that use energy from ATP hydrolysis to import nutrients and signaling molecules and to export toxic byproducts and noxious agents such as antibiotics. At the other extreme, one-quarter of the genes are present as a single copy and bear no obvious similarity to any other gene discovered so far. Presumably, they play some useful role in B. subtilis physiology under conditions not yet mimicked in the laboratory. The presence of several antibiotic-production pathways, occupying 2% of the genome, indicates that B. subtilis can defend its ecological niche. Because B. subtilis is a gram-positive bacterium, its genome sequence provides a solid basis for understanding the genes and genomes of other gram-positive microorganisms. B. subtilis can be transformed into a dormant spore so that it can resist extreme environmental conditions. Because of this, we know more about gene expression during the post-exponential phase of growth in B. subtilis than in any other bacterium.

IV.2.3. Archaeal genomes

Archaea are microscopic prokaryotes whose cellular organization is similar to that of eubacteria. They live in the most extreme environments on the planet. Some live near rift vents in the deep sea, others in hot springs, brine pools, or in extremely alkaline or acid wastes. For this reason Archaea have been termed extremophiles, and later they were classified into two major groups: euryarchaeota and crenarchaeota (see Box 3). New research is showing that archaeans

are also quite abundant in the plankton of the open sea. Much is still to be learned about these microbes, but it is clear that archaea are a remarkably diverse and successful clade of organisms. As stated before, many lines of evidence suggest that prokaryotic organisms come in two kinds: Bacteria and Archaea. The distinction was made clear by comparing the ribosomal RNA genes and analyzing the chemical nature of the cell walls. Despite their prokaryotic nature, archaeal organisms share common features with both eukaryotic and bacterial systems. Among the eukaryotic traits we cite the DNA replication and transcription machineries, the presence of introns in some archaeal tRNA genes, etc. On the other hand, Archaea have many bacterial traits such as the presence of a single circular chromosome, operons, etc. The specific characteristics of Archaea and their ability to survive extreme conditions must be encoded by their genomes. The genome projects dedicated to some Archaea should therefore help us understand the genetics and physiology of these unusual living systems. A representative archaeal species is the methanogenic euryarchaeon Methanococcus jannaschii, which was first isolated in 1983 in the area of a “smoker”, a hydrothermal vent on the floor of the Pacific Ocean. Thriving at pressures that would crush a conventional submarine, these heat-loving, methane-producing microbes live without sunlight, oxygen, or organic compounds. The complete genome was sequenced and published in 1996(16). The analysis of the 1.66Mb genome sequence predicted the presence of 1738 coding genes, of which 38% could be assigned to a putative cellular role with a significant score. Interestingly, the majority of genes involved in metabolism and energy production were closely related to those found in Bacteria, while most of the genes dedicated to transcription, translation and replication were more similar to those of Eucarya. The sequenced genome of M. jannaschii was later joined by 14 other archaeal genomes, including crenarchaeotes and euryarchaeotes.

Box 3- A deeper look at Archaea(17)

A recent view of life

Microorganisms are the foundation of the biosphere from both an evolutionary and an ecological perspective. Earth’s biosphere is largely shaped by the geochemical activities of prokaryotic microorganisms, activities that have provided conditions both for the evolution of plants and animals and for the continuation of all life on earth. Consequently, it is not surprising that the genetic, metabolic and physiological diversity of microorganisms is far greater than that found in plants and animals. Our understanding of microbial diversity has recently changed. New scientific discoveries, mainly in the genomics area, have allowed biologists to build phylogenetic trees and compare all living organisms to one another on the basis of highly conserved genes that all organisms have. The sequences of the ribosomal genes that code for small subunit RNA (i.e. 16S or 18S rDNA) have been used to determine the relatedness of all living organisms. The universal tree of life also shows the tremendous diversity of microorganisms, indicating the presence of three domains of life, namely the Bacteria, the

16 Bult, C.J. et al. (1996). Complete genome sequence of the methanogenic archaeon Methanococcus jannaschii. Science, 273: 1058-1073.
17 This section is taken from a research project entitled “Functional Genomics IV: molecular analysis and phylogenetic studies of the sequenced archaeal genomes”, which was prepared, under my supervision, by Mr. Noubar Kevorkian and Miss Maya thoubian (2002-2003). Here, I would like to thank both of them for their outstanding contribution and excellent work.

Archaea and the Eukarya.

The life history and the different archaeal lineages

The Domain Archaea was not recognized as a major, third domain of life until quite recently. During the first decades of the 20th century, most biologists considered all living things to be classifiable as either a plant or an animal. But in the 1950s and 1960s, most biologists came to realize that this system failed to accommodate the fungi, protists, and bacteria. By the 1970s, a system of four kingdoms had come to be accepted as the model that encompasses all living things. A distinction was made between the prokaryotic kingdom and the three eukaryotic kingdoms (plants, animals, and eukaryotic protista). The scientific community was understandably shocked in 1977 by the discovery of an entirely new group of organisms, the Archaea. The study of relationships among prokaryotes using DNA sequences led to the identification of two distinctly different groups. Those prokaryotic species that lived at high temperatures or produced methane clustered together as a group, well away from the usual bacteria and the eukaryotes. Because of this vast difference in genetic makeup, it was proposed that life is divided into three domains: Eukarya, Eubacteria (or simply Bacteria), and Archaebacteria. Later the term Archaebacteria was shortened to Archaea. Archaeans include inhabitants of some of the most extreme environments on the planet, such as thermal vents at temperatures well over 100ºC, extremely alkaline or acidic waters, hypersaline waters, etc. Archaeans have also been found thriving inside the digestive tracts of cows and termites, in the anoxic mud of marshes (an environment that is greatly deficient in oxygen), at the bottom of the ocean, and even in petroleum deposits deep underground. In short, Archaea proliferate in habitats where the chance of survival of other organisms is null.
However, they are not restricted to extreme environments, for new research is showing their abundance in the plankton of the open sea. Although much is still to be learnt about these microbes, it is clear that Archaeans are a remarkably diverse and successful clade of organisms(18).

Classification of Archaea

On the basis of ribosomal RNA analysis, the Archaea consist of four phylogenetically distinct groups: Crenarchaeota, Euryarchaeota, Korarchaeota, and Nanoarchaeota. However, for Korarchaeota, only the nucleic acids have been detected, and no organisms have been isolated or cultured. Single stranded rRNAs of Korarchaeota have been obtained from hyperthermophilic environments, similar to those inhabited by Crenarchaeota. The Crenarchaeota consist mainly of hyperthermophilic sulfur-dependent organisms that come from several distinct phylogenetic lineages. The extreme thermophiles require a very high temperature (82 to 113ºC) for growth. Their membranes and enzymes are unusually stable at high temperatures. Most of these Archaea require elemental sulfur for growth. Some are anaerobes that use sulfur as an electron acceptor for respiration in place of oxygen. Others are lithotrophs that oxidize sulfur as an energy source. Sulfur-oxidizers grow at low pH values (less than pH 2) because they acidify their own environment by oxidizing elemental sulfur (S⁰) to sulfate (SO₄²⁻). The Euryarchaeota contain the methanogens and the extreme halophiles. Methanogens are anaerobes that will not tolerate even brief exposure to air. They have a remarkable type of metabolism that can use hydrogen gas (H₂) as an energy source and carbon dioxide gas (CO₂) as a carbon source for growth. In the process of making cell material from H₂ and CO₂, the methanogens produce methane (CH₄) in a unique energy-generating process. Extreme halophiles live in natural environments where the salt concentration is very high (as high as 5 molar or 25% NaCl). These organisms require salt for growth and are not able to survive at low salt concentrations, because Na⁺ stabilizes their cell walls, ribosomes, and enzymes. They adapt to the high salt environment by the development of “purple membrane”, which corresponds to patches of light-harvesting pigment in the plasma membrane. The pigment is a type of rhodopsin called bacteriorhodopsin, detected in the well-studied archaeon Halobacterium. Retinal in this protein absorbs a photon and uses its energy to eject a proton from the cell, forming a proton gradient that is used to generate ATP via ATPase. This is one of the few examples in nature of non-photosynthetic photophosphorylation. Another unique adaptation of halophiles is a second pigment called halorhodopsin, a light-driven chloride pump,

18 Woese, C.R. (2000). Interpreting the universal phylogenetic tree. P.N.A.S. USA, 97: 8392-8396. Woese, C.R. (2002). On the evolution of cells. P.N.A.S. USA, 99: 8742-8747.

similar to bacteriorhodopsin, that functions to accumulate Cl⁻ in the cytosol to protect the organism against passive loss of these ions across the cell membrane. Halophiles are heterotrophs that normally respire by aerobic means. The high concentration of NaCl in their environment limits the availability of O₂ for respiration, so they supplement their ATP-producing capacity by converting light energy into ATP using bacteriorhodopsin. The Nanoarchaeota are so far represented by a single species, Nanoarchaeum equitans, which is a nanosized hyperthermophilic archaeon isolated from a submarine hot vent. It grows attached to the surface of a specific archaeal host of the genus Ignicoccus, and harbors the smallest archaeal genome, about 0.5 megabases in size. The distribution of the Nanoarchaeota is so far unknown. N. equitans may provide insight into the evolution of thermophily, of tiny genomes and of interspecies communication(19).

Maintenance of structural integrity in extreme conditions

Two main questions arise: why are these organisms so unusual, and how do they survive in such harsh environments? There are many structural features of the extremophilic archaea that contribute to their survival under extreme conditions. The stability of archaeal proteins in extreme environments is due to their three-dimensional structure. In thermophiles, the amino termini of the polypeptide chains are tied down by hydrogen bonding, which prevents denaturation of the proteins. Protein stability is also due to a high concentration of acidic amino acids. These help stabilize the helices through the formation and maintenance of salt bridges. Furthermore, the structure of a glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from Sulfolobus solfataricus has recently been solved using X-ray methods. The archaeal enzyme is different from other GAPDHs because it has an increased number of helices.
The thermostability of the enzyme is attributed to a combination of ion pair clusters and an intrasubunit disulphide bond between the cofactor binding domain and the catalytic domain of each monomer(20). Membrane lipids in Archaea are different from those in other domains. They are characterized by ether linkages, instead of the ester linkages found in Bacteria and Eukarya, between the glycerol moiety and the lipid chains, which are branched isoprenoid units instead of unbranched fatty acid chains(21). Nucleic acid stability in Archaea is primarily attributed to a high G+C content, which raises the stability of the interactions between DNA strands and hence increases the melting temperature. However, in hyperthermophilic archaea, no meaningful correlation is found between optimum growth temperature (OGT) and the genomic G+C content. Cations stabilize the double-stranded conformation of DNA by canceling the negative charges of the phosphates, and increasingly higher in vivo K⁺ concentrations have been reported for some thermophilic archaeal species with increasing OGT. With the K⁺ concentration in hyperthermophiles of the Pyrococcus genus being as high as 800 mM, the double-stranded conformation can be maintained at temperatures close to 100ºC.

Mystery of early evolution

The study of the phylogenetic tree is expected to give an idea about the nature of the entity represented by its root and how this entity gave rise to the primary organismal lineages. For a long time, it seemed a hopeless quest to reconstruct the early history of life, considering the very scarce fossil record available and the paucity of useful phenotypic characters to define phylogenetic relationships between microorganisms(22). In theory, the best way to determine the evolutionary history of different genomes is to generate, and then compare and contrast,

19 Huber, H. et al. (2002). A new phylum of Archaea represented by a nanosized hyperthermophilic symbiont. Nature, 417: 63-67.
20 Berry, S. (2001). An extremely interesting conference. Trends in Biotechnology, 19: 2-4.
21 Lipid stability in Archaea is due to unique ether lipids characterized by four features: ether linkage (the alkyl chains are bound to the glycerol moiety by ether linkages instead of ester linkages), isoprenoid chains (the hydrocarbon chains are isoprenoid chains, which differ from the almost straight-chain fatty acids), enantiomeric configuration (the alkyl chains are etherified at the sn-2 and sn-3 positions of the glycerol moiety, which is the enantiomeric configuration of bacterial and eukaryal ester lipids), and tetraether bridged lipids (some archaea contain tetraether lipids). The presence of four to five branched isoprenoid units adds stability while maintaining the fluidity of the cell membrane needed to allow the passage of molecules.
22 Forterre, P. and Philippe, H. (1999). Where is the root of the universal tree of life? BioEssays, 21: 871-879.


phylogenetic trees for every single gene in every genome [23]. One of the universal trees of life depicts a division of the living world into three domains, Archaea, Bacteria, and Eucarya, with a root that separates the Bacteria from a clade grouping Eucarya and Archaea together (the bacterial rooting). This traditional tree mainly corresponds to the rRNA tree, with its rooting based on duplicated genes. Another approach uses two universal trees of paralogous proteins sharing a common root. However, this tree has been questioned on several grounds. With more and more sequences available, in particular from rapidly expanding genome projects, it turned out that many protein phylogenies contradict the rRNA tree, and also each other, in terms of the relationships between the three domains. Several authors have proposed alternative scenarios that consider only two primary domains and derive the third from a later merging of ancestral members of the two others. In most of these hypotheses the derived domain corresponds to the eukaryotes, which would have originated from a merging of the two primitive prokaryotic lineages, but many authors disagree on the nature of these lineages.

Although the rRNA tree of life was and continues to be extremely useful, it has also been the subject of much controversy. It is clear that no single gene tree can represent the evolution of species. The use of a single tree of life assumes that species are related through vertical descent; however, not all genes follow the rules of vertical descent. For example, some genes can be transferred between lineages, a phenomenon known as horizontal or lateral gene transfer. Horizontal transfer complicates evolutionary reconstruction because it means that some species are chimeric, with different histories for different parts of the genome (see chapter X).
The purpose of phylogeny

With the advent of molecular sequencing in the 1960s and 1970s, which made it possible to reconstruct gene histories and organismal genealogies, and with the recent surge in genome sequencing, especially of microbial lineages, the various adaptations of some microorganisms to harsh environments seem to be unfolding before our eyes. The recent genomic data have made it possible to analyze the different genomic and phenotypic features, paving the way to an intuitive understanding of the various domains of life. The aim of such an analysis is to identify the features that contribute to the weirdness of Archaea and to their ability to occupy diverse and unique ecological niches. Although unique, Archaea exhibit a mosaic of features present in the other two groups: they possess eukaryote-like replication and repair proteins and bacteria-like metabolic pathways and enzymes. This mosaicism has prompted the construction of a phylogenetic tree that will provide biology with a new and powerful perspective that may help trace all life forms to one common origin.

IV.3. Eukaryotic genomes

Eukaryotic genomes are larger than their prokaryotic counterparts. All eukaryotic nuclear genomes are divided into linear DNA molecules, each of which is contained in a structure termed a chromosome. In addition, all eukaryotes possess independent but smaller, and usually circular, mitochondrial genomes. Plants and other photosynthetic organisms have one more genetic component that is absent from animal cells: the chloroplast genome. Eukaryotic genomes vary in size from about 10 Mb (12.1 Mb for S. cerevisiae) to more than 100,000 Mb (some plants).

23 Eisen, J.A. (2000). Assessing evolutionary relationships among microbes from whole-genome analysis. Curr. Opin. Microbiol., 3: 475-480.


Figure 7. Genome projects of eukaryotes. (A) S. cerevisiae. (B) S. pombe. (C) A. thaliana. (D) C. elegans. (E) D. melanogaster. (F) Homo sapiens. (G) M. musculus. (H) Rattus norvegicus.

It is generally accepted that genome size is broadly consistent with the complexity of the organism. Indeed, simple eukaryotes such as fungi have the smallest genomes, whereas higher eukaryotes such as vertebrates and plants have the largest. Despite this trend, genome size correlates strictly neither with complexity nor with the number of genes. For instance, plants such as maize (Zea mays) and wheat (Triticum aestivum) have genome sizes of 5,000 and 17,000 Mb, respectively; both are larger than the



human genome of 3,000 Mb [24]. This absence of correlation has been called the C-value paradox. Sequencing of small regions of the maize genome has revealed a genome dominated by repetitive elements, a result that may explain the C-value paradox in part.

Nor is there any correspondence between the size of the genome and the number of genes. If there were, we would expect the human genome, which is 250 times the size of that of the yeast S. cerevisiae (about 6,000 genes), to contain 250 × 6,000 = 1,500,000 genes. In fact, the latest estimates indicate that the human genome contains about 30,000 genes. Sequence analysis has shown that small genomes are compact, in the sense that their genes are closely packed together, whereas in large genomes genes may be separated by long intergenic sequences (about 98% of the human genome does not code for protein). Moreover, a comparison of the S. cerevisiae and human genomes shows that only a few yeast genes are split genes: the whole S. cerevisiae genome contains about 240 introns, whereas in humans some individual genes contain more than 100 introns.

The complete DNA sequence has been determined for several eukaryotes that are considered model organisms for eukaryotic molecular biology (Fig. 7). Organisms are selected for genome projects largely on the basis of their advantages for genetics. The budding yeast S. cerevisiae and the fission yeast S. pombe are both models for the basic eukaryotic cell. Genomic studies have shown that some gene sequences of these two yeasts are as distant from each other as they are from their human homologues. The worm C. elegans and the fly D. melanogaster are model organisms for developmental genetics. The plant A. thaliana is without doubt one of the best-studied plant systems; it has become a model for plant molecular biology.
Finally, the eukaryotic genome projects include the human genome, whose functional analysis will occupy laboratories all over the world for several decades. Note that the international effort to map and sequence the genomes of the mouse M. musculus and the rat will complement the human genome and provide further insights into the mechanism of action of several genes involved in human diseases.

IV.3.1. The genome project of the budding yeast S. cerevisiae

Compared with other eukaryotes, such as humans, the yeast S. cerevisiae has a small and economical genome with an estimated size of 12 Mb. Since 1857, when Pasteur demonstrated the role of yeast in fermentation, S. cerevisiae has become a tool and a model for biochemists and geneticists. Several arguments were put forward in favor of sequencing the yeast genome. First, the genome is small, so the sequencing could be achieved with the available technology. Second, the genome is compact: 75% of it is transcribed and the intergenic distances are small. Consequently, and because introns are relatively rare, the coding information can easily be obtained with a simple gene-finding program. Third, a major advantage of this system is the ease with which yeast genes can be manipulated in vitro and in vivo. This advantage means that the structural information can

24 The genome size of the plant fritillary (Fritillaria assyriaca) is 120,000Mb!


be exploited by so-called reverse-genetic approaches to generate biological or functional information. The sequence of the S. cerevisiae genome was the result of an international collaboration of more than 600 researchers from over 100 laboratories. The final sequence of the 16 chromosomes that make up the yeast genome was assembled with an estimated error rate of 0.03% (99.97% accuracy). This was the first complete genome sequence of a eukaryotic organism. Analysis of the yeast genome revealed 6,275 ORFs, of which 390 are unlikely to be translated into proteins. Therefore, only 5,885 protein-coding genes are believed to exist. In addition, the genome contains about 450 genes for non-coding RNAs: 140 ribosomal RNA genes in a large tandem array on chromosome XII, 40 small nuclear RNA genes (dispersed over the 16 chromosomes) and 275 tRNA genes corresponding to 43 families.

IV.3.2. The genome sequence of Schizosaccharomyces pombe

The genomic sequence of the fission yeast S. pombe has also been determined. The 13.8 Mb genome is distributed over 3 chromosomes: I (5.7 Mb), II (4.6 Mb) and III (3.5 Mb). The genome contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) long and contain related repeats, including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (S. cerevisiae), possibly reflecting more extended control regions. Some 43% of the genes contain introns, of which there are 4,730 in total. Fifty genes have significant similarity to human disease genes; half of these are cancer related. The analysis identified highly conserved genes important for eukaryotic cell organization, including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes appear to have been acquired with the appearance of eukaryotic life.
Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

IV.3.3. Drosophila melanogaster and Caenorhabditis elegans, two models for molecular developmental genetics

Although S. cerevisiae has a confirmed position as a model for eukaryotic molecular genetics, it is not suitable for studying the developmental processes needed to build a multicellular eukaryote. For this purpose two other organisms, Drosophila melanogaster and Caenorhabditis elegans, have become models for multicellular eukaryotic development.

Research with C. elegans was initiated in the 1960s. It is a microscopic (~1 mm) nematode (roundworm) that normally lives in soil. It has become one of the "model" organisms in biology because of several features: it is a true animal with at least rudiments of the physiological systems found in "higher" animals like mice and humans; it is so small that large numbers can be raised in petri dishes; it reproduces rapidly; and it is transparent, so that every cell in the living animal can be seen under the microscope, from the fertilized egg to the 558 cells of the newly hatched worm and, later, the 959 somatic cells, and a variable number of germ cells,


of the adult worm. Its cells contain 5 pairs of autosomes and, usually, 2 X chromosomes. C. elegans was the first multicellular eukaryote to have its entire genome sequenced. The genome contains about 19,820 genes in 9.5 × 10^7 base pairs of DNA.

D. melanogaster is the second of these model organisms whose entire genome sequence has been determined and published. Its genome of 14 × 10^7 bp contains about 13,000 genes. The use of Drosophila in genetics dates back to 1910, when Morgan first used this organism as a model in genetic research. D. melanogaster, the fruit fly, has a number of advantageous features: it is small, enabling large numbers to be studied in a single experiment; its genome is relatively small (about 180 Mb, heterochromatin included); gene isolation is aided by the presence in the fly's salivary glands of giant chromosomes made up of multiple copies of the same DNA molecule (polyteny); and it has become a model for developmental research. Indeed, D. melanogaster has greatly contributed to our understanding of the mechanisms through which an undifferentiated embryo acquires the positional information that eventually results in the construction of a complex adult organism. Further, these mechanisms turned out to be similar to those used by other organisms, including humans.

Knowledge of the DNA sequences of different model organisms will clearly shape biological research through the present century. Among the many challenges of the genome projects are the determination of (1) gene number, exact locations, and functions; (2) gene regulation; (3) chromosomal structure and organization; (4) the coordination of gene expression, protein synthesis, and post-translational events; and (5) the characterization of proteomes (total protein content and function).
This new and exciting research area will help identify life's molecular machines, the multiprotein complexes that carry out the functions of living systems, and characterize the gene-regulatory networks and processes that control them.

IV.3.4. Arabidopsis thaliana, a model for plant molecular biology

More than 250,000 species of flowering plants decorate the world, and the entire biosphere depends on plants for food and oxygen. The sequencing and analysis of the first genome from the plant kingdom has recently been reported. The genome in question is that of Arabidopsis thaliana, which has become an excellent model for plant research. Its small size, short life cycle and prodigious seed production make it an easy and inexpensive organism to propagate in the laboratory, and its (relatively) small genome made it an ideal choice for sequencing. A. thaliana contains a complete set of genes for controlling developmental patterns, metabolism, disease resistance, etc. Thus, its genomic sequence provides a means for analyzing gene function relevant to a range of plant species. The completion of the genome sequence will certainly accelerate research. Plants generally do not move, can perpetuate indefinitely and synthesize all their metabolites. Comparison of the Arabidopsis, bacterial, fungal and animal genomes will define the genetic basis for these differences between plants and other life forms, e.g. gene families specific to the plant kingdom.

The Arabidopsis genome is split into three genomes: a nuclear genome of about 125 Mb (25,498 protein-coding genes), a mitochondrial genome of about 367 kb (58 protein-coding genes) and a plastid (chloroplast) genome of about 154 kb (79 protein-coding genes). The function of 69% of the genes


was classified according to sequence similarity to proteins of known function in all organisms; only 9% of the genes have been characterized experimentally. The genome has 589 cytoplasmic tRNAs and 27 organelle-derived tRNAs. It also has spliceosomal RNAs (U1, U2, U4, U5, U6), with between 10 and 16 copies of snRNA genes found singly or in small groups across all chromosomes. Transposable elements in Arabidopsis account for at least 10% of the genome; the classes found in many other plant genomes are well represented here. Transposon-rich regions are relatively gene-poor and have low rates of recombination. In Arabidopsis, the nucleolar organizers [25] (NORs) lie next to the telomeres of chromosomes 2 and 4 and comprise uninterrupted 18S, 5.8S, and 25S units, all oriented in the same direction on the chromosomes. In contrast, the 5S rRNA genes are localized in heterogeneous arrays in the centromeric regions of chromosomes 3, 4 and 5. Arabidopsis telomeres are composed of CCCTAAA repeats and average about 2 to 3 kb in length. Arabidopsis centromeres, like those of many higher eukaryotes, contain numerous repetitive elements, including retroelements, transposons, microsatellites and other repetitive DNA.

IV.3.5. The human genome project

The human genome is the largest genome to be extensively sequenced so far, being 25 times as large as any previously sequenced genome. It is also the first vertebrate genome to be extensively sequenced, starting from a draft genome sequence that was generated from a physical map covering over 94% of the human genome. The task ahead is to produce a finished sequence by closing all gaps and resolving all ambiguities. The genome contains about 30,000 to 40,000 protein-coding genes, generating a complex proteome. In humans, coding sequences comprise less than 3% of the genome, whereas repetitive sequences account for at least 50% and probably much more.
These repeats fall into five classes: (1) transposon-derived repeats; (2) inactive, partially retroposed copies of cellular genes; (3) simple sequence repeats (SSRs), which represent 3% of the human genome and can be microsatellites or minisatellites; (4) segmental duplications (intrachromosomal or interchromosomal); and (5) blocks of tandemly repeated sequences, such as those at centromeres, telomeres, the short arms of acrocentric chromosomes and ribosomal gene clusters.

As stated previously, the genes (or at least their coding regions) make up only a tiny fraction of human DNA, but they represent the major biological function of the genome and the main focus of interest for biologists. The ultimate goal is to have a complete list of all human genes and their encoded proteins, but this is a difficult task. In organisms with small genomes, it is straightforward to identify most genes by the presence of long ORFs. In contrast, human genes tend to have small exons (encoding an average of only 50 codons) separated by long introns (some exceeding 10 kb). Thousands of human genes produce non-coding RNAs,

25 The nucleolus is a distinct region not delimited by a membrane, found in eukaryotic nuclei, in which ribosomal RNAs are synthesized and assembled into ribosomal subunits (ribonucleoprotein subunits). The rRNA is transcribed by RNA polymerase I from a nucleolar organizer (NOR) which is a group of tandemly repeated chromosomal genes that encode rRNA.


(ncRNAs) as their ultimate product. There are several major classes of ncRNA genes: tRNA, rRNA, snoRNA, and snRNA genes.

How many genes are there in the human genome? In the mid-1980s it was suggested that there might be about 100,000 genes, based on the approximate ratio of the size of a typical gene (approximately 30,000 bp) to the size of the genome (3,000,000,000 bp). The frequency of association of CpG islands [26] with known genes gives an estimate of 70,000-80,000 genes. Estimates based on expressed sequence tag (EST) data varied between 35,000 and 120,000 genes. According to the latest estimate, based on sequence analysis, there may be approximately 30,000 human genes.

IV.3.6. Organelle genomes

As stated before, the eukaryotic genome is split into two parts: the nuclear genome, the larger one, and the much smaller mitochondrial genome. In plant cells there is also a third genome, located in the chloroplast (see chapter X for further details). The existence of organelle genomes was not accepted by scientists until the early 1960s. Although most organelle genomes are circular, like prokaryotic genomes, we now know that some are linear. In contrast with the nuclear genome, which is present in two copies in diploid cells, organelle genomes have a much higher copy number. For example, each human mitochondrion contains approximately 10 identical DNA molecules; with an average of 800 mitochondria per cell, a human cell may contain about 8,000 mitochondrial genomes. A higher plant cell contains about 5,000 chloroplast genomes. In terms of size, chloroplast genomes are less variable than their mitochondrial counterparts. Whereas chloroplast genomes average about 150 kb, mitochondrial genomes range from as little as 16 kb in humans to 2,500 kb in the flowering plant Cucumis melo (melon).

IV.4. The post-sequencing era will be dominated by the new field of functional genomics

The availability of a huge amount of sequence information, and particularly of the complete genome sequences of many organisms, has ushered in a new field of biology termed functional genomics. The first genome to be entirely sequenced was that of the bacteriophage φX174 (5,386 bp), a feat achieved by Sanger et al. in 1978 [27]. Since 1978, hundreds of viruses, more than 75 prokaryotic species and many eukaryotes, including humans, have had their genomes sequenced and subjected to extensive genetic analysis. The most challenging of the genome projects was that of humans, with more than 3 × 10^9 bp deciphered using new approaches and sophisticated technology. Comparative analysis of different genomes should help us gain new insights into how genomes have evolved. Further, it is now possible to know

26 A CpG island is a GC-rich DNA region, approximately 1 kb in size, found upstream of many vertebrate genes. For instance, some 56% of human genes are associated with an upstream CpG island. This feature of the human genome therefore provides an alternative way to estimate the number of genes, on the basis that each CpG island is associated with a single gene. As there are about 45,000 CpG islands, this suggests that there are 80,000 (45,000/0.56) genes or so.
27 Sanger, F. et al. (1978). The nucleotide sequence of bacteriophage φX174. J. Mol. Biol., 125: 225-246.
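
The successive estimates of the human gene number quoted in this chapter are simple ratios, and they can be reproduced in a few lines of Python. The sketch below is only an illustration: the function and constant names are assumptions introduced here, and the input figures are the rounded values given in the text and in footnote 26.

```python
# Back-of-the-envelope estimates of the human gene number.
# All names are illustrative; the figures are the rounded values quoted in the text.

HUMAN_GENOME_BP = 3_000_000_000   # human genome, about 3,000 Mb
YEAST_GENOME_BP = 12_000_000      # S. cerevisiae genome, about 12 Mb
YEAST_GENES = 6_000               # approximate yeast gene count
TYPICAL_GENE_BP = 30_000          # typical gene size assumed in the mid-1980s
CPG_ISLANDS = 45_000              # approximate number of CpG islands
FRACTION_WITH_ISLAND = 0.56       # fraction of genes with an upstream CpG island
SEQUENCE_BASED_GENES = 30_000     # estimate derived from sequence analysis


def estimate_by_scaling(genome_bp, ref_genome_bp, ref_genes):
    """Scale a reference gene count by the ratio of the two genome sizes."""
    return genome_bp * ref_genes // ref_genome_bp


def estimate_by_gene_size(genome_bp, gene_bp):
    """Divide the genome size by the size of a typical gene."""
    return genome_bp // gene_bp


def estimate_by_cpg_islands(islands, fraction_with_island):
    """Assume one gene per island, then correct for genes without an island."""
    return round(islands / fraction_with_island)


if __name__ == "__main__":
    estimates = {
        "scaled from yeast": estimate_by_scaling(
            HUMAN_GENOME_BP, YEAST_GENOME_BP, YEAST_GENES),
        "genome size / gene size": estimate_by_gene_size(
            HUMAN_GENOME_BP, TYPICAL_GENE_BP),
        "CpG islands": estimate_by_cpg_islands(
            CPG_ISLANDS, FRACTION_WITH_ISLAND),
    }
    for label, genes in estimates.items():
        fold = genes / SEQUENCE_BASED_GENES
        print(f"{label:<26}{genes:>10,}  ({fold:.1f}x the sequence-based estimate)")
```

The estimate scaled from yeast exceeds the sequence-based figure fifty-fold, which is the discrepancy the chapter attributes to the much lower gene density of large genomes.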


exactly how genes work, and how they interact with other genes in the human body, as well as with factors in the environment, in order to prevent some diseases. One of the major challenges of the human genome project will be to define susceptibility to diseases and to devise, in the short or long term, effective therapies for afflicted individuals. It may be hard to predict the future of biology during the 21st century; however, it is becoming increasingly clear that most of the research effort will be dedicated to functional genomics, in order to integrate all the genetic information accumulated so far, characterize the molecular mechanisms of life and reach a comprehensive understanding of their regulatory networks. To meet this challenge, functional genomics uses high-throughput technologies to study and compare entire genomes, sets of expressed RNAs or proteins, and gene families for a large number of species. Functional genomics encompasses several branches:

One. Transcriptomics: the first step of genome expression determines the make-up of the cellular RNA, known as the transcriptome. The science that deals with the transcriptome, termed transcriptomics, yields important information about when genes are turned on or off. However, RNA levels do not correlate strictly with protein expression, since RNA has to undergo post-transcriptional modifications, which may lead to the formation of different proteins with different functions. Transcriptomics therefore reveals nothing about the activity of the gene product, leaving a huge information gap about the regulation of protein expression, structure and function, which must be filled to allow fast-track exploitation of genomics.

Two. Proteomics: a detailed understanding of the control of gene networks requires information on both mRNA and protein expression levels.
The complete set of functioning proteins in a living cell is the proteome, and proteomics is the large-scale study of gene expression at the protein level.

Three. Structural genomics: aims to translate sequence data into structural information. It seeks to determine the 3D structure of every protein in an organism.

Four. Ribonomics: an evolving arm of functional genomics, also called RNomics, that studies the class of small non-coding RNAs that have a cellular function on their own or in complex with proteins, thus forming ribonucleoproteins.

Five. Bioinformatics: the genome projects of a variety of organisms have generated a vast amount of sequence information; bioinformatics, another branch of functional genomics, has emerged from the need to store, organize, analyze and integrate the huge amount of information from the various genome projects in order to facilitate the work of researchers. It also helps decode the DNA of the various genomes, interpret experimental data and better understand biological processes.
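
Gene identification by the presence of long ORFs, described earlier for organisms with small genomes, is one of the simplest tasks that bioinformatics handles. A minimal sketch in Python follows. It is an illustration only, not the method used by any genome project: the function name find_orfs and the min_codons threshold are assumptions introduced here, only the forward strand is scanned, and introns are ignored, which is why such a scan works poorly on human genes with their small exons and long introns.

```python
# Minimal forward-strand ORF scanner (illustrative sketch, hypothetical names).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop, ATG included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk through the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3    # codons before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ATG
    return orfs


if __name__ == "__main__":
    # Toy sequence: 2 bases, ATG, five GCT codons, a TAA stop, 2 bases.
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=6))  # [(2, 23, 2)]
```

A threshold of about 100 codons is a common rule of thumb for separating real genes from ORFs that arise by chance; it is used here as a default only for illustration.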

*****