molecular biology
1
Genomes
311 404 Molecular Biology
Watanachai Lontom, Ph.D. Department of Biology, Faculty of Science,
Khon Kaen University
2
References
• Brown, T. A. 2007. Genomes. 3rd ed. Garland Science Publishing, New York.
• Weaver, R. F. 2008. Molecular Biology. 4th ed. The McGraw-Hill Companies, Inc., New York.
3
E-learning
• Khan Academy: http://www.khanacademy.org
• YouTube Education
4
Objectives
When you have completed this chapter, you should be able to:
1. Describe the differences between prokaryotic and eukaryotic genomes.
2. Describe the organization of the genome.
3. Describe the importance of some genome projects.
5
Genome of Organisms
A genome is the complete collection of genetic information, including the genes and the extra DNA, that is passed down from generation to generation in a given organism.
A genome can consist of DNA or RNA.
Genome sizes vary among organisms. RNA viruses have the smallest genomes, the smallest of which comprise only 3 genes.
6
Genome of Organisms
7
Genome of Organisms
Genome                  Form                                Size (kb)
Eukaryotes              ds linear                           10^4-10^6
Bacteria                ds circular                         10^3
Plasmid                 ds circular (some ds linear)        2-15
Mammalian DNA viruses   ss linear, ds linear, ds circular   3-280
Bacteriophage           ss circular, ds linear              ~50
Chloroplast DNA         ds circular                         120-160
Mitochondrial DNA       ds circular (some ds linear)        Animals: 16.5; Plants: 100-2500
Diversity of DNA-based genome organization (Allison et al., 2007)
8
Prokaryote and Eukaryote
9
http://www.phschool.com/science/biology_place/biocoach/images/cells/allcell.jpg
Prokaryote and Eukaryote
10
Prokaryotic Genome
Prokaryotes do not have a nucleus. However, they must still fit DNA that is about 1,000 times the length of the cell within the cell membrane.
Most prokaryotes (for example, Escherichia coli) have one large chromosome, which is a circular DNA molecule.
The genome of E. coli is 4,700 kb in size and exists as one double-stranded circular DNA molecule with no free 5’ or 3’ ends.
Structure of prokaryotic genome
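As a rough check on these figures, the contour length of the E. coli chromosome can be estimated from its size in kb. The sketch below assumes B-form DNA with a rise of about 0.34 nm per base pair and a cell about 2 μm long; the function name and both constants are illustrative assumptions.

```python
# Rough estimate of the contour length of the E. coli chromosome.
# Assumption: B-form DNA rises about 0.34 nm per base pair.
RISE_NM_PER_BP = 0.34

def contour_length_um(genome_size_kb: float) -> float:
    """Return the DNA contour length in micrometres for a genome size in kb."""
    base_pairs = genome_size_kb * 1_000
    return base_pairs * RISE_NM_PER_BP / 1_000  # nm -> um

ecoli_length_um = contour_length_um(4_700)   # 4,700 kb genome
cell_length_um = 2.0                         # assumed cell length
fold_longer = ecoli_length_um / cell_length_um

print(f"Contour length: {ecoli_length_um / 1_000:.2f} mm")
print(f"About {fold_longer:.0f} times the cell length")
```

Under these assumptions the DNA is roughly 1.6 mm long, several hundred times the length of the cell, which is the same order of magnitude as the "1,000 times" figure quoted above.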
11
Prokaryotic Genome
The chromosomal DNA is organized into a condensed ovoid structure called a nucleoid.
The chromosomal DNA is packed with the help of DNA-binding proteins known as histone-like proteins or nucleoid-associated proteins.
HU (heat-unstable protein), IHF (integration host factor), H-NS (heat-stable nucleoid-structuring protein), and SMC (structural maintenance of chromosomes) are histone-like proteins.
Structure of prokaryotic genome
12
Chromosome of E. coli
(HU protein) 40-50 loops
Prokaryotic Genome
13
A single E. coli cell measures 1 x 2 μm, but the E. coli chromosome has a circumference of 1.6 mm. How can this chromosome be packed into the nucleoid of an E. coli cell?
Prokaryotic Genome
14
Prokaryotic Genome
The chromosome of E. coli is supercoiled.
Supercoiling occurs when additional turns are introduced into the DNA double helix (positive supercoiling) or when turns are removed (negative supercoiling).
In E. coli the supercoiling is thought to be generated and controlled by two enzymes, DNA gyrase and DNA topoisomerase I.
Structure of prokaryotic genome
15
Prokaryotic Genome
The current model has the E. coli DNA attached to a protein core from which 40-50 supercoiled loops radiate out into the cell.
Each loop contains approximately 100 kb of supercoiled DNA.
Structure of prokaryotic genome
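The loop model can be checked against the genome size with simple arithmetic. This is a minimal sketch that assumes the whole 4,700 kb genome is divided into equal loops of about 100 kb; the constant and function names are made up for illustration.

```python
# Consistency check: do 40-50 loops of ~100 kb account for a 4,700 kb genome?
GENOME_KB = 4_700      # E. coli genome size quoted in the slides
LOOP_KB = 100          # approximate DNA per supercoiled loop

def expected_loops(genome_kb: float, loop_kb: float) -> float:
    """Number of loops needed if the whole genome is divided into equal loops."""
    return genome_kb / loop_kb

loops = expected_loops(GENOME_KB, LOOP_KB)
print(f"Expected loops: {loops:.0f}")
print("Within the 40-50 range:", 40 <= loops <= 50)
```

The estimate of 47 loops falls inside the 40-50 range given by the model.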
16
Supercoiled structure of bacterial DNA
(HU protein)(40-50 loops)
Prokaryotic Genome
17
Prokaryotic Genome
Although the majority of bacterial and archaeal chromosomes are circular, an increasing number of linear ones are being found.
The first of these, for Borrelia burgdorferi, the organism that causes Lyme disease, was described in 1989, and during the following years similar discoveries were made for Streptomyces.
Structure of prokaryotic genome
18
Prokaryotic Genome
Plasmids are small, double-stranded circular or linear DNA molecules carried by bacteria (and by some fungi and some higher plants).
They range in size from 2 to 100 kb and are self-replicating.
Some types of plasmids are able to integrate into the main genome, but others are thought to be permanently independent.
Plasmids carry genes that are not usually present in the main chromosome and that code for characteristics such as antibiotic resistance.
Structure of prokaryotic genome
19
Prokaryotic Genome
20
Most prokaryotes have one copy of each gene.
Their genes have no introns.
There is very little space between genes.
The frequency of repetitive sequences in the genome is very low.
They contain groups of genes located adjacent to one another in the genome (operons), such as the lactose operon in the E. coli genome.
Prokaryotic Genome
Structure of prokaryotic genome
21
Comparison of 50-kb segments of the genomes of humans, yeast, fruit flies, maize, and E. coli (Brown, 2007).
Prokaryotic Genome
22
Prokaryotic Genome
23
Eukaryotic Genome
Nuclear genome
Organelle genomes
24
Large and complex
Multiple linear DNA molecules
In ordinary cells, the linear DNA molecules are packed into chromatin (DNA with its associated proteins). Chromatin is then folded into chromosomes in metaphase cells.
More than one copy of many genes
High frequency of introns and repetitive DNA
Nuclear genome
Eukaryotic Genome
25
Chemical composition of the eukaryotic chromosome
1. DNA
2. Protein
Basic proteins have a positive charge at neutral pH.
Histone proteins (H1, H2A, H2B, H3, and H4): histone molecules are rich in lysine and arginine, which give histones their positive charge. Histones associate tightly with DNA through ionic bonds.
Acidic proteins have a negative charge at neutral pH.
Non-histone proteins
Eukaryotic Genome
26
One set of the human genome contains DNA with a total length of about 100 cm. How can it be packaged into 23 chromosomes when the largest chromosome measures only 0.5 x 10 μm at metaphase?
Eukaryotic Genome
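The scale of the packaging problem can be put into numbers. The sketch below assumes a haploid genome of about 3.0 x 10^9 bp, a rise of 0.34 nm per base pair, and DNA shared equally among 23 chromosomes that are each at most 10 μm long at metaphase. These are simplifying assumptions for illustration, not measured values.

```python
# Packing ratio for human DNA: extended DNA length vs. metaphase chromosome length.
# Assumptions: a haploid genome of about 3.0e9 bp and 0.34 nm per bp (B-form DNA).
RISE_NM_PER_BP = 0.34

def dna_length_cm(base_pairs: float) -> float:
    """Extended length of double-stranded DNA in centimetres."""
    return base_pairs * RISE_NM_PER_BP * 1e-7  # 1 nm = 1e-7 cm

haploid_bp = 3.0e9
total_cm = dna_length_cm(haploid_bp)

# If the DNA were shared equally among 23 chromosomes (a simplification),
# each would hold this much DNA, to be fitted into at most ~10 um at metaphase.
per_chromosome_um = total_cm / 23 * 1e4      # cm -> um
packing_ratio = per_chromosome_um / 10.0

print(f"Total length: {total_cm:.0f} cm")
print(f"Average per chromosome: {per_chromosome_um:.0f} um")
print(f"Packing ratio: about {packing_ratio:.0f}-fold")
```

The total comes out close to 100 cm, and the DNA in an average chromosome has to be shortened several-thousand-fold to fit a metaphase chromosome.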
27
Packaging of DNA into chromosomes
Nuclease protection experiments (1973-1974)
Olins and Olins (1974) published electron micrographs showing protein beads on a string of DNA. Each bead is called a nucleosome.
http://bio3400.nicerweb.com/Locked/media/ch11/11_15-nucleosome.jpg
Eukaryotic Genome
28
Eukaryotic Genome
29
A nucleosome comprises 8 histone molecules (2 each of H2A, H2B, H3, and H4), called the core octamer, around which 140-150 bp of DNA is wrapped about twice.
Each nucleosome is separated from the next by 50-70 bp of linker DNA.
A single linker histone (H1) is attached to each nucleosome.
Eukaryotic Genome
Packaging of DNA into chromosomes
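The core and linker lengths above imply a repeat length per nucleosome, and from that a rough nucleosome count for a genome. The sketch below assumes a haploid human genome of about 3 billion bp and complete, evenly spaced nucleosome coverage; the function names are illustrative.

```python
# Estimate the nucleosome repeat length and the number of nucleosomes
# in a genome, using the ranges quoted above.
CORE_BP = (140, 150)     # DNA wrapped around the core octamer
LINKER_BP = (50, 70)     # linker DNA between nucleosomes

def repeat_length_range(core, linker):
    """Return (min, max) DNA per nucleosome repeat in bp."""
    return core[0] + linker[0], core[1] + linker[1]

def nucleosome_count_range(genome_bp, core, linker):
    """Return (min, max) nucleosomes needed to package genome_bp of DNA."""
    shortest, longest = repeat_length_range(core, linker)
    return genome_bp // longest, genome_bp // shortest

repeat = repeat_length_range(CORE_BP, LINKER_BP)
# Assumption: a haploid human genome of about 3 billion bp.
counts = nucleosome_count_range(3_000_000_000, CORE_BP, LINKER_BP)

print("Repeat length (bp):", repeat)
print("Nucleosomes per haploid genome:", counts)
```

With these inputs the repeat length is 190-220 bp, which corresponds to roughly 14-16 million nucleosomes per haploid genome.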
30
31
The 30 nm fiber
The beads-on-a-string structure folds into a compact fiber approximately 30 nm in diameter.
Proposed structures: the solenoid model or the zig-zag ribbon model
Eukaryotic Genome
Packaging of DNA into chromosomes
32
33
Loop domains
- The 30 nm fiber is compacted into loop domains.
- The length of the loops is approximately 0.25 μm.
Metaphase chromosomes
- Further condensation requires a number of ATP-hydrolyzing enzymes, including topoisomerase II and the condensin complex.
- Condensin is a large protein complex composed of 5 subunits and is one of the most abundant structural components of metaphase chromosomes.
Eukaryotic Genome
Packaging of DNA into chromosomes
34
Looped domains
Eukaryotic Genome
Packaging of DNA into chromosomes
35
Centromere
A specific position where the 2 sister chromatids are held together
Arabidopsis centromeres span 0.9-1.2 Mb of DNA, and each one is made up largely of 180-bp repeat sequences.
The 125-bp yeast centromere is divided into 3 regions:
Regions I and III have conserved sequences that are involved in the attachment of spindle fibers.
Region II lies in the middle and is an AT-rich region of about 90 bp.
Eukaryotic Genome
36
http://www.cbs.dtu.dk/dtucourse/cookbooks/dave/Fig16_16.JPG
Eukaryotic Genome
Centromere
37
Telomere
The terminal region of a chromosome
Marks the end of the chromosome and enables the cell to distinguish a real end from an unnatural end
Made up of hundreds of copies of a repeated motif, 5’-T(1-4)A(0-1)G(1-8)-3’, where the numbers in parentheses give how many times each base occurs in the motif
Has a short extension of the 3’ terminus, which forms a T-loop through unusual hydrogen bonding
Telomerase regulates the length of the telomere
Eukaryotic Genome
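The general telomere motif can be written as a regular expression, which makes it easy to test whether a candidate repeat unit fits. The sketch below is a minimal illustration; the function names are assumptions, and the example units (human TTAGGG, Arabidopsis TTTAGGG, Tetrahymena TTGGGG) are well-known telomeric repeats used here only as test inputs.

```python
import re

# Regular expression for one telomeric repeat unit following the general
# motif 5'-T(1-4) A(0-1) G(1-8)-3'.
TELOMERE_UNIT = re.compile(r"T{1,4}A{0,1}G{1,8}")

def is_telomere_unit(unit: str) -> bool:
    """True if the whole string is a single unit matching the motif."""
    return TELOMERE_UNIT.fullmatch(unit) is not None

def count_units(sequence: str, unit: str) -> int:
    """Count back-to-back copies of `unit` at the start of `sequence`."""
    match = re.match(f"(?:{re.escape(unit)})+", sequence)
    return len(match.group(0)) // len(unit) if match else 0

# Example repeat units: human TTAGGG, Arabidopsis TTTAGGG, Tetrahymena TTGGGG.
for unit in ("TTAGGG", "TTTAGGG", "TTGGGG", "GATTACA"):
    print(unit, is_telomere_unit(unit))

print(count_units("TTAGGG" * 5 + "ACGT", "TTAGGG"))
```

The first three units fit the motif and the last one does not; `count_units` then reports how many tandem copies open a sequence.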
38
Telomere
http://www.cbs.dtu.dk/dtucourse/cookbooks/dave/Fig16_16.JPG
Eukaryotic Genome
39
Organization of genes in genome
Genes are distributed randomly in the genome.
Gene density varies among chromosomes and species:
Arabidopsis: 1-38 genes/100 kb
Humans: 0-64 genes/100 kb
Genes in a genome can be categorized by their function or by their protein domains.
Eukaryotic Genome
40
Comparison of the gene catalogs of Saccharomyces cerevisiae, Arabidopsis thaliana, Caenorhabditis elegans, fruit fly, and humans (Brown, 2002)
Eukaryotic Genome
41
Eukaryotic Genome
42
Multigene families: groups of genes with identical or similar nucleotide sequences that are present in multiple copies in the genome.
These are typically genes whose products are in heavy demand for cellular metabolism.
rRNA genes in plant genomes consist of sequences coding for the 25S, 18S, and 5.8S rRNAs, arranged as repeating units in the nucleolar organizer region (NOR).
Eukaryotic Genome
Organization of genes in genome
43
rRNA genes
Eukaryotic Genome
44
Both mitochondria and chloroplasts contain their own genetic information.
The genomes are usually, but not always, circular.
In circular form, the mitochondrial and chloroplast genomes look remarkably similar to bacterial genomes. This similarity led to the endosymbiont hypothesis.
Organelle genomes are inherited independently of the nuclear genome, and they exhibit a uniparental mode of inheritance.
Some organelle genes function together with genes in the nucleus.
Organelle genomes
Eukaryotic Genome
45
Mitochondrial DNA (mtDNA)
mtDNA is usually a circular, double-stranded DNA molecule that is not packaged with histones.
Encodes essential enzymes or proteins involved in ATP production (NADH dehydrogenase, cytochrome b, cytochrome c oxidase, and ATP synthase)
Differs greatly in size among organisms:
16-18 kb in animals
100 kb to 2.5 Mb in plants
Multiple copies of mtDNA per organelle
Eukaryotic Genome
Organelle genomes
46
The Saccharomyces cerevisiae mitochondrial genome (Brown, 2002)
Eukaryotic Genome
47
Chloroplast DNA (cpDNA)
cpDNA is a circular, double-stranded DNA molecule
120-160 kb
20-40 copies per organelle
Encodes enzymes involved in photosynthesis, rRNAs, and tRNAs
Eukaryotic Genome
Organelle genomes
48
The rice chloroplast genome (Brown, 2002)
Eukaryotic Genome
49
Repetitive DNA: repeating units of nucleotide sequence found in a DNA molecule
Tandemly repeated DNA
Interspersed genome-wide repeats
Repetitive DNA in eukaryotic genome
Eukaryotic Genome
50
1. Tandemly repeated DNA
Tandemly repeated DNA is a common feature of eukaryotic genomes.
This type of repeat is also called satellite DNA; its repeat units range from fewer than 5 bp to more than 200 bp.
Present in centromeres and telomeres
Minisatellites form clusters up to 20 kb in length, with repeat units of up to 25 bp.
Telomeric DNA, with about 100 copies of the repeat unit 5’-TTAGGG-3’, is an example of a minisatellite.
Microsatellites form clusters shorter than 150 bp, with repeat units of 13 bp or less.
Eukaryotic Genome
Repetitive DNA in eukaryotic genome
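Tandem repeats with short units can be located in a sequence with a backreference-based regular expression. The sketch below is a simple illustration, not a production repeat finder. The 13 bp unit limit follows the microsatellite definition above, while the minimum of 3 copies and the example sequence are arbitrary choices, and no attempt is made to classify hits by cluster length.

```python
import re

def find_tandem_repeats(seq: str, max_unit: int = 13, min_copies: int = 3):
    """Return (start, unit, copies) for runs of a short unit repeated in tandem.

    max_unit follows the 13 bp upper limit for microsatellite repeat units;
    min_copies is an arbitrary threshold chosen for this illustration.
    """
    # ([ACGT]{1,max_unit}?) lazily captures the shortest candidate unit;
    # \1{n,} then requires at least n further copies immediately after it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# Made-up example: a (CA)6 run and a (TTAGGG)4 run separated by unique sequence.
example = "GGATC" + "CA" * 6 + "TTGAC" + "TTAGGG" * 4 + "AC"
for start, unit, copies in find_tandem_repeats(example):
    print(start, unit, copies)
```

On the example sequence the function reports the CA run starting at position 5 and the TTAGGG run starting at position 22 (0-based).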
51
2. Interspersed genome-wide repeats
These repeats arise by the transposition of transposons.
A transposon, or transposable element (TE), is a DNA fragment that can move (transpose) from one location to another.
TEs are divided into:
2.1 DNA transposons
2.2 Retrotransposons
Eukaryotic Genome
Repetitive DNA in eukaryotic genome
52
2.1 DNA transposon
Transposons that transpose in a DNA-to-DNA manner.
A DNA transposon is either cut from its original location by transposase (conservative transposition) or copied (replicative transposition).
Ac/Ds elements in maize are an example of DNA transposons in eukaryotes.
Insertion sequences (IS1 and IS186) in the E. coli genome are an example of DNA transposons in prokaryotes.
Eukaryotic Genome
Repetitive DNA in eukaryotic genome
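The difference between the two modes can be illustrated with a toy model on plain strings. This is only a sketch: the sequences, the lowercase element "tgca", and the function names are made up, and real features such as transposase recognition sites and target-site duplication are not modeled.

```python
# Toy model of the two modes of DNA transposition on plain strings.
# Uppercase letters are host DNA; the lowercase element is the transposon.

def insert_element(target: str, element: str, pos: int) -> str:
    """Insert the element into the target sequence at index pos."""
    return target[:pos] + element + target[pos:]

def conservative_transposition(donor: str, target: str, element: str, pos: int):
    """Cut-and-paste: the element leaves the donor and enters the target."""
    if element not in donor:
        raise ValueError("element not found in donor")
    new_donor = donor.replace(element, "", 1)   # excise one copy
    return new_donor, insert_element(target, element, pos)

def replicative_transposition(donor: str, target: str, element: str, pos: int):
    """Copy-and-paste: the donor keeps its copy and the target gains a new one."""
    if element not in donor:
        raise ValueError("element not found in donor")
    return donor, insert_element(target, element, pos)

donor, target, element = "AAAAtgcaGGGG", "CCCCTTTT", "tgca"
print(conservative_transposition(donor, target, element, 4))
print(replicative_transposition(donor, target, element, 4))
```

After conservative transposition the donor no longer carries the element, whereas after replicative transposition both sequences carry a copy.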
53
Eukaryotic Genome
54
DNA transposon (Ac/Ds elements) in maize
http://www.nature.com/nature/journal/v443/n7111/images/443521a-i1.0.jpg
Eukaryotic Genome
Repetitive DNA in eukaryotic genome
55
2.2 Retrotransposon
Transposons that require an RNA intermediate for transposition
Retrotransposons are similar to retroviruses
Eukaryotic Genome
Repetitive DNA in eukaryotic genome
56
2.2 Retrotransposon
LTR retrotransposons: have long repeated sequences at both ends (long terminal repeats; LTRs)
Non-LTR retrotransposons:
LINEs (long interspersed nuclear elements): have a reverse-transcriptase-like gene
SINEs (short interspersed nuclear elements): lack a reverse-transcriptase-like gene
Eukaryotic Genome
Repetitive DNA in eukaryotic genome
57
Retroelements (Brown, 2002)
Eukaryotic Genome
58
Genome Projects of Some Organisms
Genome projects are scientific projects that aim to map and sequence the genomes of organisms.
There are 3 basic steps to complete a project:
Genome sequencing
Genome assembly
Genome annotation
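The three steps can be mimicked on a toy scale. The sketch below treats "sequencing" as a handful of made-up, error-free reads from one strand, "assembly" as greedy merging of the reads with the largest overlap, and "annotation" as finding the first ATG-to-stop open reading frame. It is a classroom simplification under those assumptions, not the method used by real genome projects, and all names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len: int = 3):
    """Assembly step: repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))        # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break                               # no overlaps left to merge
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

def first_orf(seq: str):
    """Annotation step: return the first ATG...stop reading frame, or None."""
    start = seq.find("ATG")
    while start != -1:
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                return seq[start:pos + 3]
        start = seq.find("ATG", start + 1)
    return None

# "Sequencing" step: made-up overlapping reads, given in shuffled order.
reads = ["GTTCTGACC", "CCATGGCTA", "GCTAAGTTC"]
contigs = greedy_assemble(reads)
print(contigs)
print(first_orf(contigs[0]))
```

The three reads merge into a single 19-base contig, and the annotation step picks out the reading frame that starts at ATG and ends at the in-frame TGA.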
59
(Weaver, 2008)
Genome Projects of Some Organisms
60
The Human Genome Project (HGP)
HGP is an international scientific research project with the primary goals of determining the sequence of chemical base pairs that make up DNA and of identifying and mapping the approximately 20,000-25,000 genes of the human genome.
The project was begun in October 1990 by the U.S. Department of Energy and the National Institutes of Health and was completed in 2003.
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
The Human Genome Project
61
The objectives of this project were to:
1. identify all the approximately 20,000-25,000 genes in human DNA,
2. determine the sequences of the 3 billion chemical base pairs that make up human DNA,
3. store this information in databases,
4. improve tools for data analysis,
5. transfer related technologies to the private sector, and
6. address the ethical, legal, and social issues (ELSI) that may arise from the project.
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
The Human Genome Project
62
What does the sequence tell us?
The human genome size is 3,038 Mb.
The average gene consists of 3,000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
The total number of genes is approximately 20,000-25,000.
Almost all (99.9%) nucleotide bases are exactly the same in all people.
The functions are unknown for over 50% of discovered genes.
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
The Human Genome Project
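A few back-of-the-envelope figures follow directly from these numbers. The sketch below only rearranges the values quoted on the slide; the variable and function names are illustrative, and the results are rough estimates.

```python
# Back-of-the-envelope figures from the numbers quoted above.
GENOME_MB = 3038               # human genome size in Mb
AVG_GENE_BASES = 3000          # average gene size in bases
GENE_COUNT = (20_000, 25_000)  # estimated range for the number of genes
IDENTITY = 0.999               # fraction of bases identical in all people

def percent_of_genome(bases: float, genome_mb: float = GENOME_MB) -> float:
    """Express a number of bases as a percentage of the genome."""
    return 100 * bases / (genome_mb * 1_000_000)

# Share of the genome spanned by genes of average size (low and high estimates).
gene_span_pct = tuple(percent_of_genome(n * AVG_GENE_BASES) for n in GENE_COUNT)
# Share taken up by the dystrophin gene alone.
dystrophin_pct = percent_of_genome(2_400_000)
# Number of bases that differ, on average, between people.
variable_bases = round((1 - IDENTITY) * GENOME_MB * 1_000_000)

print(f"Genes span about {gene_span_pct[0]:.1f}-{gene_span_pct[1]:.1f}% of the genome")
print(f"Dystrophin alone: {dystrophin_pct:.3f}%")
print(f"Bases that vary between people: about {variable_bases:,}")
```

Taken at face value, genes of average size would span only about 2-2.5% of the genome, and the 0.1% of bases that vary still amounts to about 3 million positions.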
63
Chromosome 1 has the most genes (2,968), and the Y chromosome has the fewest (231).
Less than 2% of the genome codes for proteins.
Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the human genome.
Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.
The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
What does the sequence tell us?
The Human Genome Project
64
The Human Genome Project
65
Anticipated benefits
Molecular Medicine
• improve diagnosis of disease
• detect genetic predispositions to disease
• create drugs based on molecular information
• use gene therapy and control systems as drugs
• design “custom drugs” (pharmacogenomics) based on individual genetic profiles
Microbial Genomics
• rapidly detect and treat pathogens (disease-causing microbes) in clinical practice
• develop new energy sources (biofuels)
• monitor environments to detect pollutants
• protect citizenry from biological and chemical warfare
• clean up toxic waste safely and efficiently
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
The Human Genome Project
66
DNA Identification (Forensics)
• identify potential suspects whose DNA may match evidence left at crime scenes
• exonerate persons wrongly accused of crimes
• identify crime and catastrophe victims
• establish paternity and other family relationships
• identify endangered and protected species as an aid to wildlife officials (could be used for prosecuting poachers)
• detect bacteria and other organisms that may pollute air, water, soil, and food
• match organ donors with recipients in transplant programs
• determine pedigree for seed or livestock breeds
• authenticate consumables such as caviar and wine
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
The Human Genome Project
Anticipated benefits
67
Agriculture, Livestock Breeding, and Bioprocessing
• grow disease-, insect-, and drought-resistant crops
• breed healthier, more productive, disease-resistant farm animals
• grow more nutritious produce
• develop biopesticides
• incorporate edible vaccines into food products
• develop new environmental cleanup uses for plants like tobacco
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
The Human Genome Project
Anticipated benefits
68
Future Challenges: What We Still Don’t Know
• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Chromosomal structure and organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Correlation of SNPs (single-base DNA variations among individuals) with health and disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology including microbial consortia useful for environmental restoration
• Developmental genetics, genomics
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
The Human Genome Project
69
Rice (Oryza sativa L.) is a staple food and an important biological model species for monocot plants and for major cereal crops such as maize, wheat, barley, and sorghum.
Its immense economic value and relatively small genome (12 chromosomes) make it a focal point for scientific investigations.
Rice was the first organism whose sequencing was pursued by four groups independently:
- International Rice Genome Sequencing Project (IRGSP)
- Monsanto
- Syngenta
- Beijing Genomics Institute (BGI)
Rice Genome Project
japonica cultivar ‘Nipponbare’
indica cultivar ‘93-11’
70
This project was started in 1998 and finished in 2004.
Rice Genome Project
71
A total of 37,544 genes have been predicted for the complete sequence, with an average gene density of 1 gene/9.9 kb and an average gene length of 2,699 bp.
Chromosomes 1 and 3 have the highest gene density.
Chromosomes 11 and 12 have the lowest gene density.
The rice genome comprises ~35% repeat elements.
For more details, see Vij et al. (2006).
Rice Genome Project
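The gene count, gene density, and average gene length can be combined into two rough estimates: the genome length implied by the density, and the share of that length taken up by genes. The sketch below only rearranges the figures quoted above; the variable names are illustrative, and the results are approximations rather than reported values.

```python
# Rough estimates from the rice annotation figures quoted above.
GENES = 37_544            # predicted genes
KB_PER_GENE = 9.9         # average density: 1 gene per 9.9 kb
AVG_GENE_BP = 2_699       # average gene length in bp

# Genome length implied by the gene count and the average spacing.
implied_genome_mb = GENES * KB_PER_GENE / 1_000
# Total length of all predicted genes.
genic_mb = GENES * AVG_GENE_BP / 1_000_000
# Fraction of the implied genome length covered by genes.
genic_fraction = genic_mb / implied_genome_mb

print(f"Implied genome length: {implied_genome_mb:.1f} Mb")
print(f"Total gene length: {genic_mb:.1f} Mb")
print(f"Genes cover about {genic_fraction:.0%} of that length")
```

Under these figures the sequence is on the order of 370 Mb, with predicted genes covering a little over a quarter of it.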
72
Genome databases of organism genome projects
Website: http://www.ncbi.nlm.nih.gov/sites/genome
73
Genome databases of organism genome projects