molecular biology


Genomes

311 404 Molecular Biology

Watanachai Lontom, Ph.D.
Department of Biology, Faculty of Science, Khon Kaen University

References

• Brown, T. A. 2007. Genome. 3rd ed. Garland Science Publishing, New York

• Weaver, R. F. 2008. Molecular Biology. 4th ed. The McGraw-Hill Companies, Inc., New York.

E-learning

• Khan Academy: http://www.khanacademy.org

• YouTube Education

Objectives

When you have completed this chapter, you should be able to:

1. Describe the differences between prokaryotic and eukaryotic genomes,

2. Describe the organization of the genome,

3. Describe the importance of some genome projects.

Genome of Organisms

A genome is the complete collection of genetic information, including the genes and the extra DNA, that is passed down from generation to generation in a given organism.

A genome can be DNA or RNA.

Genome sizes vary among organisms. RNA viruses have the smallest genomes, some of which consist of only 3 genes.


Genome                  Form                                 Size (kb)
Eukaryotes              ds linear                            10^4-10^6
Bacteria                ds circular                          10^3
Plasmid                 ds circular (some ds linear)         2-15
Mammalian DNA viruses   ss linear, ds linear, ds circular    3-280
Bacteriophage           ss circular, ds linear               ~50
Chloroplast DNA         ds circular                          120-160
Mitochondrial DNA       ds circular (some ds linear)         Animals: 16.5; Plants: 100-2500

Table: Diversity of DNA-based genome organization (Allison et al., 2007)

Prokaryote and Eukaryote

Figure source: http://www.phschool.com/science/biology_place/biocoach/images/cells/allcell.jpg

Prokaryotic Genome

Prokaryotes do not have a nucleus. However, they must still fit DNA that is about 1,000 times the length of the cell within the cell membrane.

Most prokaryotes (for example, Escherichia coli) have 1 large chromosome, which is a circular DNA molecule.

The genome of E. coli is 4,700 kb in size and exists as one double-stranded circular DNA molecule, which has no free 5’ or 3’ ends.

Structure of prokaryotic genome


The chromosomal DNA is organized into a condensed ovoid structure called a nucleoid.

The chromosomal DNA is packed with the help of DNA-binding proteins, known as histone-like proteins or nucleoid-associated proteins.

HU (heat-unstable protein), IHF (integration host factor), H-NS (heat-stable nucleoid structuring protein), and SMC (structural maintenance of chromosomes) are histone-like proteins.


Figure: Chromosome of E. coli (HU protein; 40-50 loops)

A single E. coli cell measures 1 x 2 μm, but the E. coli chromosome has a circumference of 1.6 mm. How is this chromosome packed into the nucleoid of the E. coli cell?
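The 1.6 mm figure can be checked with a short calculation. The sketch below assumes the commonly quoted rise of about 0.34 nm per base pair for B-form DNA (this value is not given in the slides) and uses the 4,700 kb genome size and 1 x 2 μm cell size stated in this chapter. The function name is only illustrative.

```python
# Assumption (not from the slides): B-form DNA rises about 0.34 nm per base pair.
RISE_NM_PER_BP = 0.34


def contour_length_mm(genome_bp: int) -> float:
    """Return the length of a DNA molecule of genome_bp base pairs, in mm."""
    return genome_bp * RISE_NM_PER_BP * 1e-6  # nm -> mm


genome_bp = 4_700 * 1_000      # 4,700 kb, the E. coli genome size given above
cell_length_um = 2.0           # long axis of a 1 x 2 um cell

length_mm = contour_length_mm(genome_bp)
fold = (length_mm * 1_000) / cell_length_um   # mm -> um, then ratio to cell length

print(f"Chromosome length: {length_mm:.2f} mm")
print(f"Roughly {fold:.0f} times the length of the cell")
```

Under these assumptions the chromosome comes out close to 1.6 mm, several hundred times longer than the cell, which is the same order of magnitude as the "1000 times" stated earlier.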


The chromosome of E. coli is supercoiled.

Supercoiling occurs when additional turns are introduced into the DNA double helix (positive supercoiling) or when turns are removed (negative supercoiling).

In E. coli the supercoiling is thought to be generated and controlled by two enzymes, DNA gyrase and DNA topoisomerase I.


The current model has the E. coli DNA attached to a protein core from which 40-50 supercoiled loops radiate out into the cell.

Each loop contains approximately 100 kb of supercoiled DNA.
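As a quick consistency check, the loop model can be compared with the genome size. This is a minimal sketch that uses only the numbers given in this chapter (40-50 loops, about 100 kb per loop, 4,700 kb genome); the function name is hypothetical.

```python
def looped_dna_kb(n_loops: int, loop_kb: int = 100) -> int:
    """Total DNA (kb) held in n_loops supercoiled loops of loop_kb each."""
    return n_loops * loop_kb


genome_kb = 4_700   # E. coli genome size given earlier in this chapter

low, high = looped_dna_kb(40), looped_dna_kb(50)
print(f"40-50 loops of 100 kb hold {low}-{high} kb")
print("Genome size falls inside this range:", low <= genome_kb <= high)
```

The two figures agree: 40-50 loops of about 100 kb account for essentially the whole chromosome.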


Figure: Supercoiled structure of bacterial DNA (HU protein; 40-50 loops)

Although the majority of bacterial and archaeal chromosomes are circular, an increasing number of linear ones are being found.

The first of these, for Borrelia burgdorferi, the organism that causes Lyme disease, was described in 1989, and during the following years similar discoveries were made for Streptomyces.



Plasmids are small, double-stranded circular or linear DNA molecules carried by bacteria (and by some fungi and some higher plants).

They range in size from 2 to 100 kb and are self-replicating.

Some types of plasmids are able to integrate into the main genome, but others are thought to be permanently independent.

Plasmids carry genes that are not usually present in the main chromosome and that code for characteristics such as antibiotic resistance.


Structure of prokaryotic genome

• Most prokaryotes have 1 copy of each gene
• Their genes have no introns
• There is very little space between genes
• Repetitive sequences occur at very low frequency in the genome
• The genome contains groups of genes located adjacent to one another (operons), such as the lactose operon in the E. coli genome

Figure: Comparison of 50-kb segments of the genomes of humans, yeast, fruit flies, maize, and E. coli (Brown, 2007).

Eukaryotic Genome

Nuclear genome

Organelle genomes

Nuclear genome

• Large and complex
• Multiple linear DNA molecules
• In ordinary cells, the linear DNA molecules are packed into chromatin (DNA with its associated proteins).
• Chromatin is then folded into chromosomes in metaphase cells.
• More than 1 copy of many genes
• High frequency of introns and repetitive DNA

Chemical composition of eukaryotic chromosomes

1. DNA

2. Protein

Basic proteins have a positive charge at neutral pH.
• Histone proteins (H1, H2A, H2B, H3 and H4). Histones are rich in lysine and arginine, which give them their positive charge. Histones associate tightly with DNA through ionic bonds.

Acidic proteins have a negative charge at neutral pH.
• Non-histone proteins

One set of the human genome contains DNA with a total length of about 100 cm. How can it be stored as 23 chromosomes when the largest chromosome measures only 0.5 x 10 μm at metaphase?
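The scale of the packaging problem can be made concrete with a short calculation. The sketch below assumes a rise of about 0.34 nm per base pair (not stated in the slides) and takes the 3038 Mb human genome size quoted later in this chapter; the 100 cm total, 23 chromosomes, and 10 μm metaphase length come from the question above. Variable names are illustrative only.

```python
# Assumption (not from the slides): B-form DNA rises about 0.34 nm per base pair.
RISE_NM_PER_BP = 0.34

genome_bp = 3_038_000_000                       # 3038 Mb, quoted later in this chapter
total_cm = genome_bp * RISE_NM_PER_BP * 1e-7    # nm -> cm

n_chromosomes = 23
mean_dna_um = (100 / n_chromosomes) * 1e4       # 100 cm shared over 23 chromosomes, cm -> um
largest_metaphase_um = 10.0                     # length of the largest metaphase chromosome

# Lower bound on compaction: even an average-sized DNA molecule must fit
# into something no longer than the largest metaphase chromosome.
compaction = mean_dna_um / largest_metaphase_um

print(f"Estimated DNA length of one genome set: {total_cm:.0f} cm")
print(f"Mean DNA length per chromosome: {mean_dna_um:.0f} um")
print(f"Compaction needed: at least {compaction:.0f}-fold")
```

Under these assumptions the DNA has to be shortened by a factor of several thousand, which is why several levels of folding (nucleosomes, the 30 nm fiber, loop domains) are needed.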


Packaging of DNA into chromosomes

Nuclease protection experiments (1973-1974)

Olins and Olins (1974) presented electron micrographs showing protein beads on a string of DNA. Each bead is called a nucleosome.

Figure source: http://bio3400.nicerweb.com/Locked/media/ch11/11_15-nucleosome.jpg


A nucleosome comprises 8 histone molecules (2 each of H2A, H2B, H3 and H4), called the core octamer, with 140-150 bp of DNA wrapped about twice around it.

Adjacent nucleosomes are separated by 50-70 bp of linker DNA.

A single linker histone (H1) is attached to each nucleosome.
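These figures imply a repeat length per nucleosome and, from that, a rough nucleosome count for a whole genome. The sketch below combines the core and linker lengths given above with the 3038 Mb human genome size quoted later in this chapter. It is a back-of-the-envelope estimate that assumes the entire genome is packaged uniformly.

```python
def repeat_length_bp(core_bp: int, linker_bp: int) -> int:
    """DNA per nucleosome repeat: wrapped core DNA plus linker DNA."""
    return core_bp + linker_bp


shortest = repeat_length_bp(140, 50)   # lower ends of both ranges
longest = repeat_length_bp(150, 70)    # upper ends of both ranges

genome_bp = 3_038_000_000              # 3038 Mb, quoted later in this chapter
n_max = genome_bp // shortest          # short repeats -> more nucleosomes
n_min = genome_bp // longest           # long repeats -> fewer nucleosomes

print(f"Repeat length: {shortest}-{longest} bp")
print(f"Nucleosomes per genome set: about {n_min:,} to {n_max:,}")
```

The estimate lands in the range of roughly 14 to 16 million nucleosomes per haploid genome, under the uniform-packaging assumption stated above.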


The 30 nm fiber

The beads-on-a-string structure folds into a compact fiber approximately 30 nm in diameter.

Proposed structures: the solenoid model or the zig-zag ribbon structure


Loop domains
- The 30 nm fiber is compacted into loop domains.
- The length of the loops is approximately 0.25 μm.

Metaphase chromosomes
- Further condensation requires a number of ATP-hydrolyzing enzymes, including topoisomerase II and the condensin complex.
- Condensin is a large protein complex composed of 5 subunits and is one of the most abundant structural components of metaphase chromosomes.


Figure: Looped domains

Centromere

A specific position where the 2 sister chromatids are held together

Arabidopsis centromeres span 0.9-1.2 Mb of DNA, and each one is made up largely of 180-bp repeat sequences.

The 125-bp yeast centromere is divided into 3 regions:

Regions I and III have conserved sequences that are involved in the attachment of spindle fibers

Region II lies in the middle and is an AT-rich stretch of about 90 bp

Figure: Centromere. Source: http://www.cbs.dtu.dk/dtucourse/cookbooks/dave/Fig16_16.JPG

Telomere

The terminal region of a chromosome

Marks the end of the chromosome and enables the cell to distinguish a real end from an unnatural end

Made up of hundreds of copies of a repeated motif, 5’-T(1-4)A(0-1)G(1-8)-3’, where the numbers in parentheses give the allowed number of each base

Has a short extension of the 3’ terminus, which forms a T-loop through unusual hydrogen bonding

Telomerase regulates the length of the telomere
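The telomere motif given above, 1-4 T followed by 0-1 A followed by 1-8 G, maps directly onto a regular expression. The sketch below is only an illustration of how to read the motif. The example repeat units for Arabidopsis and Tetrahymena are well-known telomere repeats that are not listed in the slides, and the function name is hypothetical.

```python
import re

# 5'-T(1-4) A(0-1) G(1-8)-3' written as a regular expression
TELOMERE_UNIT = re.compile(r"T{1,4}A{0,1}G{1,8}")


def is_telomere_unit(unit: str) -> bool:
    """True if the whole string fits the general telomere repeat motif."""
    return TELOMERE_UNIT.fullmatch(unit) is not None


examples = {
    "TTAGGG": "human (given in this chapter)",
    "TTTAGGG": "Arabidopsis (not from the slides)",
    "TTGGGG": "Tetrahymena (not from the slides)",
    "GGGTTA": "wrong base order",
}
for unit, label in examples.items():
    print(unit, label, is_telomere_unit(unit))
```

The motif is strand-specific: it describes the G-rich strand read 5’ to 3’, so the same bases in a different order do not match.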

Figure: Telomere. Source: http://www.cbs.dtu.dk/dtucourse/cookbooks/dave/Fig16_16.JPG

Organization of genes in genome

Genes are distributed randomly in the genome.

Gene density varies among chromosomes and species:
• Arabidopsis: 1-38 genes per 100 kb
• Humans: 0-64 genes per 100 kb

Genes in a genome can be categorized by their function or by their protein domains.

Figure: Comparison of the gene catalogs of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, fruit fly and humans (Brown, 2002)

Multigene families: groups of genes of identical or similar nucleotide sequence that are present in multiple copies in the genome.

They are typically genes whose products are in heavy demand for cellular metabolism.

rRNA genes in plant genomes consist of sequences that code for the 25S, 18S and 5.8S rRNAs, arranged as repeating units in the nucleolar organizer region (NOR).

Figure: rRNA genes

Organelle genomes

• Both mitochondria and chloroplasts contain their own genetic information.
• The genomes are usually, but not always, circular.
• In circular form, the mitochondrial and chloroplast genomes look remarkably similar to bacterial genomes. This similarity led to the endosymbiont hypothesis.
• Organelle genomes are inherited independently of the nuclear genome, and they exhibit a uniparental mode of inheritance.
• Some organelle genes function together with genes in the nucleus.

Mitochondrial DNA (mtDNA)

• mtDNA is usually a circular, double-stranded DNA molecule that is not packaged with histones.
• Encodes essential enzymes or proteins involved in ATP production (NADH dehydrogenase, cytochrome b, cytochrome c oxidase and ATP synthase)
• Differs greatly in size among organisms:
  - 16-18 kb in animals
  - 100 kb to 2.5 Mb in plants
• Multiple copies of mtDNA per organelle

Figure: The Saccharomyces cerevisiae mitochondrial genome (Brown, 2002)

Chloroplast DNA (cpDNA)

• cpDNA is a circular, double-stranded DNA molecule
• 120-160 kb
• 20-40 copies per organelle
• Encodes enzymes involved in photosynthesis, rRNA and tRNA

Figure: The rice chloroplast genome (Brown, 2002)

Repetitive DNA in eukaryotic genome

Repetitive DNA: repeating units of nucleotide sequence found in a DNA molecule
• Tandemly repeated DNA
• Interspersed genome-wide repeats

1. Tandemly repeated DNA

• Tandemly repeated DNA is a common feature of eukaryotic genomes.
• This type of repeat is also called satellite DNA; its repeat units range from <5 to >200 bp.
• Present in centromeres and telomeres
• Minisatellites form clusters up to 20 kb in length, with repeat units of up to 25 bp. Telomeric DNA, with 100 copies of the repeat unit 5’-TTAGGG-3’, is an example of a minisatellite.
• Microsatellites form clusters shorter than 150 bp, with repeat units of 13 bp or less.
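Tandem repeats can be located in a sequence with a backreference pattern: capture a short unit, then require the same unit to follow immediately several times. The sketch below is a toy illustration, not a production repeat finder. The 13 bp unit limit follows the microsatellite definition above, while the minimum of 3 copies and the test sequences are arbitrary choices made for this example.

```python
import re


def find_tandem_repeats(seq: str, max_unit: int = 13, min_copies: int = 3):
    """Return (start, unit, copies) for each tandem repeat found in seq.

    The unit is matched lazily, so the shortest repeating unit is reported.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Made-up sequences for illustration only
print(find_tandem_repeats("AGT" + "CA" * 6 + "GGT"))
print(find_tandem_repeats("TTAGGG" * 4))
```

Real genomes contain imperfect repeats with substitutions and indels, which this exact-match pattern would split or miss.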

2. Interspersed genome-wide repeats

• These arise by the transposition of transposons.
• A transposon, or transposable element (TE), is a DNA fragment that can move (transpose) from one location to another.
• TEs are divided into:
  2.1 DNA transposons
  2.2 Retrotransposons


2.1 DNA transposon

• A transposon that transposes in a DNA-to-DNA manner.
• The DNA transposon is either cut from its original location by transposase (conservative transposition) or copied (replicative transposition).
• Ac/Ds elements in maize are an example of DNA transposons in eukaryotes.
• Insertion sequences (IS1 and IS186) in the E. coli genome are examples of DNA transposons in prokaryotes.


Figure: DNA transposon (Ac/Ds elements) in maize. Source: http://www.nature.com/nature/journal/v443/n7111/images/443521a-i1.0.jpg

2.2 Retrotransposon

• A transposon that requires an RNA intermediate for transposition
• Retrotransposons are similar to retroviruses


• LTR retrotransposons have long repeated sequences at both ends (long terminal repeats; LTR)
• Non-LTR retrotransposons
  - LINEs (long interspersed nuclear elements) have a reverse-transcriptase-like gene
  - SINEs (short interspersed nuclear elements) do not have a reverse-transcriptase-like gene


Figure: Retroelements (Brown, 2002)

Genome Projects of Some Organisms

Genome projects are scientific projects that aim to map and sequence the genomes of organisms.

There are 3 basic steps to complete a project:

Genome sequencing

Genome assembly

Genome annotation
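The assembly step can be illustrated with a toy example: short reads that overlap at their ends are merged into a longer contiguous sequence. The sketch below uses a simple greedy overlap merge on made-up reads. It is meant only to show the idea, since real assemblers use more elaborate methods and must cope with sequencing errors and repeats. All names and the minimum overlap of 3 bases are assumptions of this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:            # no overlapping pair left to merge
            break
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]   # made-up reads for illustration
print(greedy_assemble(reads))
```

Annotation would then take the assembled sequence and mark features such as genes on it; that step is not shown here.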

Figure (Weaver, 2008)

The Human Genome Project

The Human Genome Project (HGP) is an international scientific research project with the primary goal of determining the sequence of the chemical base pairs that make up DNA, and of identifying and mapping the approximately 20,000-25,000 genes of the human genome.

The project was begun in October 1990 by the U.S. Department of Energy and the National Institutes of Health and was completed in 2003.

Source: U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003

The objectives of this project were to:
1. identify all the approximately 20,000-25,000 genes in human DNA,
2. determine the sequences of the 3 billion chemical base pairs that make up human DNA,
3. store this information in databases,
4. improve tools for data analysis,
5. transfer related technologies to the private sector, and
6. address the ethical, legal, and social issues (ELSI) that may arise from the project.



What does the sequence tell us?

• The human genome size is 3038 Mb.
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
• The total number of genes is approximately 20,000-25,000.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
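A few of these figures can be combined into rough estimates. The sketch below uses only the numbers given above (3038 Mb genome, 3000 bases per average gene, 20,000-25,000 genes, 99.9% identity between people). It is simple arithmetic on the slide values and not a measurement; the function name is illustrative.

```python
genome_bp = 3_038_000_000      # 3038 Mb
avg_gene_bp = 3_000            # average gene size given above
variant_fraction = 0.001       # 100% - 99.9% of bases differ between people


def gene_share(n_genes: int) -> float:
    """Fraction of the genome taken up by n_genes average-sized genes."""
    return n_genes * avg_gene_bp / genome_bp


low, high = gene_share(20_000), gene_share(25_000)
differing_bases = genome_bp * variant_fraction

print(f"Genes occupy about {low:.1%} to {high:.1%} of the genome")
print(f"About {differing_bases / 1e6:.1f} million bases differ between two people")
```

With these slide values, genes account for only a couple of percent of the genome, and a 0.1% difference still corresponds to about 3 million bases.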



• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the human genome.
• Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.
• The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).

Anticipated benefits

Molecular Medicine
• improve diagnosis of disease
• detect genetic predispositions to disease
• create drugs based on molecular information
• use gene therapy and control systems as drugs
• design “custom drugs” (pharmacogenomics) based on individual genetic profiles

Microbial Genomics
• rapidly detect and treat pathogens (disease-causing microbes) in clinical practice
• develop new energy sources (biofuels)
• monitor environments to detect pollutants
• protect citizenry from biological and chemical warfare
• clean up toxic waste safely and efficiently

DNA Identification (Forensics)
• identify potential suspects whose DNA may match evidence left at crime scenes
• exonerate persons wrongly accused of crimes
• identify crime and catastrophe victims
• establish paternity and other family relationships
• identify endangered and protected species as an aid to wildlife officials (could be used for prosecuting poachers)
• detect bacteria and other organisms that may pollute air, water, soil, and food
• match organ donors with recipients in transplant programs
• determine pedigree for seed or livestock breeds
• authenticate consumables such as caviar and wine

Agriculture, Livestock Breeding, and Bioprocessing
• grow disease-, insect-, and drought-resistant crops
• breed healthier, more productive, disease-resistant farm animals
• grow more nutritious produce
• develop biopesticides
• incorporate edible vaccines into food products
• develop new environmental cleanup uses for plants like tobacco

Future Challenges: What We Still Don’t Know

• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Chromosomal structure and organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Correlation of SNPs (single-base DNA variations among individuals) with health and disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology including microbial consortia useful for environmental restoration
• Developmental genetics, genomics

Rice Genome Project

Rice (Oryza sativa L.) is a staple food and an important biological model species for monocot plants and for major cereal crops such as maize, wheat, barley and sorghum. Its immense economic value and relatively small genome (12 chromosomes) make it a focal point for scientific investigations. Rice was the first organism whose sequencing was pursued by four groups independently:

- International Rice Genome Sequencing Project (IRGSP)
- Monsanto
- Syngenta
- Beijing Genomics Institute (BGI)

Cultivars: japonica cultivar ‘Nipponbare’ and indica cultivar ‘93-11’

This project was started in 1998 and finished in 2004.


• A total of 37,544 genes have been predicted for the complete sequence, with an average gene density of 1 gene/9.9 kb and an average gene length of 2,699 bp.
• Chromosomes 1 and 3 have the highest gene density.
• Chromosomes 11 and 12 have the lowest gene density.
• The rice genome comprises ~35% repeat elements.
• For more details, see Vij et al. (2006).
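The gene count, gene density and average gene length given above can be combined to see what genome size and genic fraction they imply. The sketch below is plain arithmetic on those three slide values; the resulting genome size is an implied estimate, not a figure quoted in the slides.

```python
n_genes = 37_544
kb_per_gene = 9.9          # average gene density: 1 gene per 9.9 kb
avg_gene_bp = 2_699        # average gene length

implied_genome_mb = n_genes * kb_per_gene / 1_000      # kb -> Mb
genic_mb = n_genes * avg_gene_bp / 1_000_000           # bp -> Mb
genic_fraction = genic_mb / implied_genome_mb

print(f"Implied sequence length: about {implied_genome_mb:.0f} Mb")
print(f"DNA within genes: about {genic_mb:.0f} Mb ({genic_fraction:.0%})")
```

On these numbers, genes make up roughly a quarter of the sequence, which sits alongside the ~35% repeat content listed above.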

Genome databases of organism genome projects

Website: http://www.ncbi.nlm.nih.gov/sites/genome