Finding Genes in the Rice Genome
Hao Bailin
T-Life Research Center, Fudan University
Beijing Genomics Institute, Academia Sinica
Institute of Theoretical Physics, Academia Sinica
(www.itp.ac.cn/~hao/)
On-going work by a team of 10-12 people since August 2001: Zheng Weimou, Xie Huimin, Liu Jinsong, Xu Zhao, Fang Lin, Li Heng, Gao Lei, Jin Jiao, et al. Nothing written yet.
Two Cultivars of Rice
• Oryza sativa ssp. indica ( 籼稻 )
• Oryza sativa ssp. japonica ( 粳稻 )
The difference was described in Xu Shen’s Chinese dictionary Shuowen Jiezi ( 许慎《说文解字》 ) of the Eastern Han Dynasty (~2nd century AD).
J.H. Zhang et al., Rice cultivation of Jianhu Remains in Henan Province, Science J. ( 《科学》杂志 ), 53(4), 2002, 3 (in Chinese)
cccaatatcttgcttcagcaagatattgggtatttctagctttcctttcttcaaaaattgctatatgttagcagaaaagccttatccattaagagatggaacttcaagagcagctaggtctagagggaagttgtgagcattacgttcgtgcattacttccataccaagattagcacggttgatgatatcagcccaagtattaataacgcgaccttggctatcaactacagattggttgaaattgaatccgtttagattgaaagccatagtactaatacctaaagcagtgaaccaaatccctactacaggccaagcagccaagaagaagtgtaaagaacgagagttgttaaaactagcatattggaagattaatcggccaaaataaccatgagcggccacaatattataagtttcttcctcttgaccaaatctgtaaccctcattagcagattcgttttcagtggtttccctgatcaaactagaggttaccaaggaaccatgcatagcactgaatagggaaccgccgaatacaccagctacacctaacatgtgaaatggatgcataaggatgttatgctctgcctggaatacaatcataaagttgaaagtaccagatattcctaaaggcataccatcagagaaacttccttgaccaatagggtaaatcaagaaaacagcagtagcagctgcaacaggagctgaatatgcaacagcaatccaaggacgcatacccagacggaaactcagttcccactcacgacccatataacaagctacaccaagtaagaagtgtagaacaattagctcataaggaccaccattgtataaccactcatcaacagatgcagcttcccaaattgggtaaaagtgcaatccgatcgccgcagaagtaggaataatggcaccagagataatattgtttccgtaaagtaaagaaccagaaacaggctcacgaataccatcaatatctactggaggggcagcgatgaaggcgataataaatacagaagttgcggtcaataaggtagggatcatcaaaacaccgaaccatccgatgtaaagacggttttcggtgctagttatccagttgcagaagcgaccccacaggcttgtactttcgcgtctctctaaaattgcagtcatggtaagatcttggtttattcaaattgcaaggactcccaagcacacgtattaactagaaagataatagaaggcttgttatttaacagtataatatagactatataccaatgtcaaccaagccagccccgacagttgtatatccatacaacaaaatttaccaaaccaaaaaattttgtaaatgaagtgagtgaaaaatcaaaactcagattgctcctttctagtttccatatgggttgcccgggactcgaacccggaactagtcggatggagtagataattattccttgttacaatagagaaaaaacctctccccaaatcgtgcttgcatttttcattgcacacgactttccctatgtagaaataggctatttctattccgaagaggaagtctactaatttttttagtagtaagttgattcacttactatttattatagtacagagaacatttcagaatggaaactgtgaaagttttaccttgatcatttatcaatcatttctagtttattagttttgtttaatgattaattaagaggattcaccagatcattgatacggagaatatccaaataccaaatacgctcactgtgcgatccacggaaagaaaagtaagttgttttggcgaacatcaaagaaaaaacttgctcttcttccgtaaaaaattcttctaaaaataccgaacccaaccattgcataaaagctcgtaccgtgcttttatgtttacgagctaaagttctagcgcatgaaagtcgaagtatatactttagtcgatacaaagtcttcttttttgaagatccactgtgataatgaaaaagatttctacatatccgaccaaaccgatcaagaatatcccaatc
cgataaatcggtccaaattggtttactaataggatgccccgatccagtacaaaattgggcttttgctaaagatccaatgagaggagtaacagggactttggtatcgaattttttcatttgagtatctattagaaatgaattctccagcatttgattccttactaacaaagaatttattggtacacttgaaaagtaccccagaaaatcgaagcaagagttttctaattggtttagatggatcctttgcggttgagtccaaaaagagaaagaatattgccacaaacggacaaggtaacatttccatttcttcttcaaaagaagagttccttttgatgcaagaattgcctttccttgatatcgaacataatgcataaggggatccataacgaaccatatggttttccgaaaaaaagcagggtacattaacccaaaatgttccatcttcctagaaaagatgattcgttccagaaaggttccggaagaagttaatcgcaagcaagaagattgtttacgaagaaacaacaagaaaaattcatattctgatacataagagttatataggaaccgaaatagtcttttattttcttttttcaaaataaaaatggatttcattgaagtaataaaactattccaattcgagtagtagttgagaaagaatcgcaataaatgcaaggatggaacatcttggatccggtattgaaggagttgaagcaagatatccaaatggataggatagggtatttctatatgtgctagataatgtaagtgcaaaaatttgtcttctaaaaaaggaaatattgaatgaatagatcgtaaattctgaaactttggtatttctttttcttccggacaagactgttctcgtagcgagaatgggatttctacaacgatcgcaaacccctcagatagaatctgagaataaaactcagaataaaaaaaattgttgtaatccaataatcgatcttggttaggatgattaaccaaattaatccaaaaattctgctgatacattcgaatcattaaccgtttcacaagtagtgaactaaatttcttgttattagaaccaataatttcgacaagttcggaaccatttaatccataatcatgggcaaacacataaatgtactcctgaaagagtagtgggtagacgaaatattgtctaggaaatttaagtttttctgaataaccctcgaatttttccatttgtatttctacttgaatcagagagagagaaatatttctcggtttatcaaatggtgatacatagtacaatatggtcagaacagggtgttgcattttttaatacaaacccctggggaagaaaaggagtctaatccacggatctttttccgctccttttctatccaatttgtttatgtttgttctaattacaaaagagaacaaatcctttatttttgcaggccaattgctcttttgactttgggatacagtctctttatcaatatactgcttcttttacacattcaatccataacatccttttcaatccaaaatcaagaataattaggatttctaaaaaaaaaagaaaaaatcaaaggtctactcataggaaaaccagcttttccctacatcaggcactaatctatttttaacgtctaattagatcagggagttcttccaattaagaagttaagctcgttgctttttgttttaccagaattggagccaggctctatccatttattcattagacccagaaaatcagaatttttttattccattccaaaaatccaaaataagaaattgattttattacgacatgctattttttccattcattacccttgaggatcagtcgcggtcttatagactctaccaagagtctggacgaattttttgcttcatccaaatgtgtaaaagatcatagtcgcacttaaaagccgagtactctaccattgagttagcaacccagataaactaggatcttagatacgatcgaaatccaaaaatcaatggaattacaccgcacacccctgtca
aaatcttaaaatagcaagacattaaaagaaagattttatcaccattgaaaacactcagataccaaaaggaacgggtctggttaaatttcactaaggttaaaagtggcaccaatcacgatcgtaaaattgtcatttttttagcatttttatttaaataaataaataaatcttgtatgagagtacaaacaagagggacaaccctaccatttgagcaaagtgtaggcaaaaaacctaatagggagtgaggataaagagacttatccatctacaaattctagatgttcaatggacctttgtcaatggaaatacaatggtaagaaaaaaattagatagaaaaactcaaaaaaataaaggcttatgttggattggcacgacataaatccagtcaaaaataggattaagaaagaggcaaattatttctaaatagttagacaacaagggatactagtgagcctctcctagttttttattcatttagttcttcaattaactcaaagttctttctttttctttaaagaattccgccttccttaaaatatcagaaacggttcttgtaggttgagcacctttttcaaggaaatagagaatagctggaacatttaaacaagtttgattctttatcggatcataaaaacctacttttcgaagatctcttccttctcttcgagatcgaacatcaattgcaacgattcgatagacagcttattgggatagatgtagataaataaagccccccctagaaacgtataggaggttttctcctcatacggctcgagaatatgacttgcattaatttccgtacagaaaaaacaaatttcatttatactcatgactcaagttgactaattttgattgacagacttgaaagaaaaaaatcctttgaaattttttgagtcgtctctaaactcttttctttgcctcatctcgaacaaattcacttttattccttattccggtccaattctattgttgagacagttgaaaatcgtgtttacttgttcgggaatcctttatctttgatttgtgaaatccttgggtttaaacattacttcgggaattcttattcttttttctttcaaaagagtagcaacatacccttttttcttatttccttcgataaagcatttccctcttctatagaaatcgaatatgagcgattgattctgatagactttaatcaaaagagttttcccatatcttccaaaattggactttcttcttattttaaccttttgatttctatattatttcgatttctatattaagggtagaatgacaaagttggcctaatttattagttttcactaaccctagattctttcccttgataaaaaataaattctgtcctctcgagctccatcgtgtactatttacttagcttacttacaaacaacccagcgaaaattcggttcgggacgaatagaacagactatgtcgagccaagagcattttcattactatggaaaatggtggatagcaaaatccacaatcgatcgtgtccttcaagtcgcacgttgctttctaccacatcgttttaaacgaagttttaacataacattcctctaatttcattgcaaagtgttatagggaattgatccaatatggatggaatcatgaatagtcattagtttcgttttttgtatactaattcaaacttgctttgctatctatggagaaatatgaataaaagaaattaagtatttatcgggaaagactccgcaaagagccaatttatttaaacccatattctatcatatgaatgaaatatagttcgaaaaaagggaataaacaagtttgcttaagacttatttattatggaatttccatcctcaacagaggactcgagatgatcaatccaatcctgaaatgataagagaagaattgactcttctccaacaaataaactatcaacctcccgtttaattaatttaattaatatattagattagcaatctatttttccataccatttttccgtaacaaaactaattaactattaa
ctagttaaactattgcaatgaaaagaaagttttttggtagttatagaattctcgtatttcttcgactcgaataccaaaagaaagaaaaaaatgaagtaaaaaaaacgcatttcctgtaaagtaaaattaaggtctttgcttttacttattttttcttttacctaaaagaagcaactccaaatcaaaattgaatccattctatctaacgagcagttcttatcttatctttaccgggatggatcattctggatatttaaaaaatcgcggatcgagatcgtttttgcttaaccaaagaaagaaaaagaagaaggaaccttttttactaataaaatactataaaaaaaatttatctctatcataaatctatctctaccataaaggaataggtctcgttttttatacaatgttctacgtcaagtttaaaattttttcatgaaaaaaagattttcaatttgactggacttgacactggattatgttttctgagacagaaaatgaacgcattaggactgcatcgaatctaagagtttataagagaaaaaaattctctttaataaactttatgtctcgtgcagaatacaatacgatttcatctttcgtttcatcagaaaaaatctgggacggaaggattcgaacctccgagtaacgggaccaaaacccgctgccttaccacttggccacgccccatttcgggttttatgcgacactaataaacagtattatgtttatttcttattcgtcaatcctacttcaattacataaaaatggggggtattctcttggtaggattctagacatgcgaataatatagaatccaaaaaatgcattgatcattacatggaattctattaagatattatatgaaagtcgaatttcttccactctcatttgagagtgcgaatacaaggaggtattttgtgtttgggaaagtccgaagaaaaaaggattttgaatcctccttttcctttttcccttagaaaaataactcaatcaaaatccaattatctactctacaagaacgaaacgcttgttatgcctaatatacttagtttaacctgtatttgttttaattctgttatttatccgactagttttttcttcgccaaattgcccgaagcttatgccattttcaatccaatcgtggattttatgcctgtcatacctgtactcttttttctattagcctttgtttggcaagctgctgtaagttttcgatgaaatctttactactctgtctgccaaattgaatcatgtattcattctaaaaaaattcgaaaaatggataagagccgagaagtcttatattatgaaccttcgattctaaaattcaaattcttctacattgaatgtatagctgcagcaataaatttggatcagcctttctactccctgcatctacgttgagcaggtatctttaggtaaccgcacaatacctaacctaatttattgataagagtgcttattataaatcaattcttgcaatttttttcaaaaattgatttttgcatttttaggtgtcaaaataaacaaaacccatcctagtggatttgtgtggtaaggaaaaacgggtaatctattccttaaaaaaaaatcttggagattatgtaatgcttactctcaaactttttgtttatacagtagtgatattctttgtttccctctttatctttggattcttatctaatgatccaggacgtaatcctgggcgtgacgagtaaaaatccaaaattttttcttacaaattggatttgtttcatacatttatctacgagaaaatccgggggtcagaattccttccaattcgaaagtcccaaacgatccgagggggcggaaagagagggattcgaaccctcggtacaaaaaaattgtacaacggattagcaatccgccgctttagtccactcagccatctctccccgttccaaatcgaaaggtttccgtgatatgacagaggcaagaaataacgattgcaaaaaatccttcctttttctttcaaaagt
tcaaaaaaattatattgccaattccattttagttatattcttttttcttaatgttaataaaaaaaagaagaaaattcttcttttttctttctaattctaaaattggatattggctaaaagacaatcagatagattttctcttcagcaggcatttccatataggacttgttataataaaacaagcaggttatagaaaaaaactcttttttttattatttatcaacaaagcaaaaaggggtcttatcaaaccaacccaccccataaaattggaaagaaagataaagtaagtggacctgactccttgaatgaggcctctatccgctattctgatatataaattcgatgtagatgaaattgtataagtggatttttttgtatttccttagacttagaccacgcaaggcaagaatttctcgctatttactatttcatattcttgttactagatgttctataggaataagaagaaatcgcaacccctttccgctacacataaaaatggatttcgaaagtcaatttttcttttcaatatctttactttttttcagaatcctatttttgttcttatacccatgcaatagagagcgagtgggaaaagggaggttactttttttcattttttccttaaaaaataggctttcttggaaataggaatcatggaataatctgaattccaatgtttatttctatagtataagaaaaactaattgaatcaaattcatggatttaccacgacctcggctgtgaccccatagataaaaatgcaaaatttctatcttcgagaccattgaaaaaaggcattgaacgagaaaaaatcgtccacagataatctatcgtatgccttggaagtgatataaggtgctcggaaatggttgaagtaattgaataggaggatcactatgactatagcccttggtagagttactaaagaagaaaatgatttatttgatattatggacgactggttacgaagggaccgttttgtttttgtaggatggtctggcctattgctttttccttgtgcttatttcgctttaggaggttggtttacagggacaacttttgtaacttcttggtatacccatggattggcgagttcctatttggaaggttgcaatttcttaaccgcagcagtttccacccctgccaatagtttagcacactctttgttgctactatggggcccggaagcacaaggggattttactcgttggtgtcaattaggtggtctgtggacttttgttgctctccatggggcttttgcactaataggtttcatgttacgtcaatttgaacttgctcggtctgttcaattgcggccttataatgcaatttcattctctggcccaatcgctgtttttgtttccgtattcctgatttat
ccactggggcaatccggttggttctttgcgccgagttttggcgtagcagcgatatttcgattcatcctcttcttccaaggatttcataattggacgttgaacccatttcatatgatgggagttgccggagtattaggcgcggctctgctatgcgctattcatggggcaaccgtgga
Gene-Finding by Computer
Starting from the early 1980s:
• “Ab initio” or “de novo” algorithms: GeneMark, GenScan, FgeneSH, Genie, …, based on gene-structure models and training data. (Our on-going project: BGF, the BGI Gene Finder)
• Homology methods based on sequence alignment with known genes in databases
• Mixed approach using both strategies: TwinScan
Different Stages of Gene-Finding
• Use all possible existing programs and services on the web with a public-domain or home-made genome viewer
• Write your own gene-finder, trained for the specific organism
• A dream for the time being: design a self-training and self-developing program “for any species” which would improve itself iteratively starting from a few available reads, cDNAs, and ESTs
Performance of Gene-Finders in Eukaryote Genomes
• M. Q. Zhang, Nature Reviews Genetics, 3 (2002) 698-710 (mostly for the human genome):
Nucleotide level: 80%; Exon level: 45%; Whole gene structure: 20%
• FgeneSH and BGF for rice (our tests on 128 cDNA-confirmed single-gene genomic sequences):
Nucleotide level: 90%; Exon level: 60%; Whole gene structure: 40%
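The slide does not say how the three accuracy levels were computed. The sketch below shows one common convention, as an assumption rather than the authors' procedure: sensitivity is the fraction of true features that were predicted, and "specificity" (in the gene-finding sense) is the fraction of predicted features that are true. The function names and the toy coordinates are hypothetical.

```python
def nucleotide_accuracy(true_exons, pred_exons):
    """Sensitivity and specificity over coding positions.

    Exons are (start, end) pairs, 0-based, end exclusive.
    """
    true_pos = {i for s, e in true_exons for i in range(s, e)}
    pred_pos = {i for s, e in pred_exons for i in range(s, e)}
    shared = len(true_pos & pred_pos)
    sensitivity = shared / len(true_pos) if true_pos else 0.0
    specificity = shared / len(pred_pos) if pred_pos else 0.0
    return sensitivity, specificity


def exon_accuracy(true_exons, pred_exons):
    """An exon counts as correct only if both boundaries match exactly."""
    exact = len(set(true_exons) & set(pred_exons))
    sensitivity = exact / len(true_exons) if true_exons else 0.0
    specificity = exact / len(pred_exons) if pred_exons else 0.0
    return sensitivity, specificity


# Toy annotation: the second predicted exon misses the true start by 10 bases.
true_exons = [(100, 200), (300, 400), (500, 650)]
pred_exons = [(100, 200), (310, 400), (500, 650)]

print(nucleotide_accuracy(true_exons, pred_exons))
print(exon_accuracy(true_exons, pred_exons))
```

A single shifted boundary barely changes the nucleotide-level score but removes a whole exon from the exon-level score, which is one reason the exon and whole-gene figures are so much lower than the nucleotide figure.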
5’ 3’
3’ 5’
• The two strands are equivalent in information content.
• The two strands are not equivalent in gene content: each carries a different set of genes.
• Biological processing (replication, transcription) goes from 5’ to 3’.
• Genes can be found on one strand at a time or on both strands at the same time: one-pass or two-pass programs.
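Because both strands are read 5’ to 3’, a program that scans one strand at a time can handle the other strand by scanning the reverse complement and mapping coordinates back. A minimal sketch follows; the function names and the toy sequence are illustrative and not taken from any of the gene-finders named in these slides.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_on_both_strands(seq, motif):
    """Return (strand, start) hits; start is the leftmost forward-strand index."""
    n, m = len(seq), len(motif)
    rc = reverse_complement(seq)
    hits = []
    for i in range(n - m + 1):
        if seq[i:i + m] == motif:
            hits.append(("+", i))
        if rc[i:i + m] == motif:
            # Position i on the reverse strand covers forward positions
            # n - m - i .. n - i - 1.
            hits.append(("-", n - m - i))
    return sorted(hits, key=lambda h: h[1])


seq = "ccatgaaacatgg"
print(reverse_complement(seq))
print(find_on_both_strands(seq, "atg"))
```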
Genomic DNA
↓ transcribe (RNA Pol II + …)
Pre-mRNA
↓ splice (spliceosome: U1, U2, U4, U5, U6 snRNPs)
mRNA (5’, 5’-UTR, start … stop, 3’-UTR, 3’)
↓ translate (ribosome + initiation, elongation, and termination factors)
AA seq (protein primary seq)
↓ fold (chaperonin)
Protein fold
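The splice and translate steps of this flow can be sketched on a toy sequence. The sketch assumes the standard genetic code, DNA letters in lowercase (t instead of u), and exon coordinates that are already known; the sequence and coordinates are invented for illustration.

```python
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, built from the conventional t, c, a, g ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def splice(genomic, exons):
    """Join exon segments; exons are (start, end), 0-based, end exclusive."""
    return "".join(genomic[s:e] for s, e in exons)


def translate(cds):
    """Translate codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


exon1, intron, exon2 = "atggcc", "gtaagtttttcag", "aaatga"
genomic = exon1 + intron + exon2
exons = [(0, 6), (19, 25)]

cds = splice(genomic, exons)
print(cds, translate(cds))
```

The toy intron starts with gt and ends with ag, so it also obeys the minimal splicing signals used later in these slides.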
Three Scales of Search
• Local: signals with minimal signature (start, stop, splicing); movable signals (caps, promoters, polyAs, branching points, some very weak) --- clustering, discriminant analysis, various statistical models
• Intermediate: exons, introns, intergenic --- Markov, semi-Markov, Hidden-Markov models; intron length distribution
• Global: optimal combination of the above --- dynamic programming
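The global step can be illustrated with the simplest possible dynamic program: given scored candidate exons, pick the non-overlapping chain with the highest total score. This is a deliberately reduced sketch. Real gene-finders also enforce reading-frame and signal compatibility between neighbours, and the candidate list and scores here are invented.

```python
from bisect import bisect_right


def best_exon_chain(candidates):
    """Max-score chain of non-overlapping (start, end, score) candidates.

    Coordinates are 0-based with end exclusive.
    """
    cands = sorted(candidates, key=lambda c: c[1])
    ends = [c[1] for c in cands]
    best = [0.0] * (len(cands) + 1)     # best[i]: optimum over the first i candidates
    choice = [None] * (len(cands) + 1)  # (previous prefix size, candidate index) if taken
    for i, (start, _end, score) in enumerate(cands, start=1):
        # Number of earlier candidates that end at or before this start.
        j = bisect_right(ends, start, 0, i - 1)
        take = best[j] + score
        if take > best[i - 1]:
            best[i], choice[i] = take, (j, i - 1)
        else:
            best[i] = best[i - 1]
    chain, i = [], len(cands)
    while i > 0:  # trace back the chosen candidates
        if choice[i] is None:
            i -= 1
        else:
            j, idx = choice[i]
            chain.append(cands[idx])
            i = j
    return best[-1], chain[::-1]


candidates = [(0, 100, 5.0), (50, 150, 6.0), (120, 200, 4.0), (160, 260, 3.0)]
print(best_exon_chain(candidates))
```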
{()【( . )( . )( . )】()}
Signals:
• { transcription start (downstream of promoters)
• } transcription end (upstream of poly-A)
• 【 translation start (atg, 1/64 in a random seq.)
• 】 translation end (tag, tga, taa, 3/64)
• ( splicing donor site (minimal signal = gt, 1/16)
• ) splicing acceptor site (minimal signal = ag, 1/16)
• · branching point (very weak …a…)
Labels: { transcription start, 【 translation start, 】 translation end, } transcription end
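The quoted frequencies follow from treating each base as equally likely and independent: a 3-letter signal occurs with probability (1/4)^3 = 1/64 per position, three stop codons give 3/64, and a 2-letter signal gives 1/16. The sketch below checks this on a simulated random sequence, assuming atg as the start codon. The sequence length and the seed are arbitrary choices.

```python
import random

SIGNALS = {
    "translation start": ["atg"],
    "translation end": ["tag", "tga", "taa"],
    "splicing donor": ["gt"],
    "splicing acceptor": ["ag"],
}


def expected_frequency(motifs):
    """Per-position probability in an equal-probability i.i.d. sequence."""
    return sum(0.25 ** len(m) for m in motifs)


def observed_frequency(seq, motifs):
    """Fraction of windows that match any motif (all motifs share one length)."""
    k = len(motifs[0])
    windows = len(seq) - k + 1
    hits = sum(seq[i:i + k] in motifs for i in range(windows))
    return hits / windows


random.seed(0)
seq = "".join(random.choices("acgt", k=200_000))
for name, motifs in SIGNALS.items():
    print(f"{name}: expected {expected_frequency(motifs):.4f}, "
          f"observed {observed_frequency(seq, motifs):.4f}")
```

Signals this short occur by chance every few dozen bases, which is why the minimal signatures alone cannot locate genes.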
{()【( . )( . )( . )】()}
• 【( First exon
• )( Internal exon
• )】 Last exon
• {( Non-coding 5’ exon
• )【 Non-coding 5’ exon
• ( . ) Intron
• 】( Non-coding 3’ exon (rare)
• )} Non-coding 3’ exon (rare)
• }{ Intergenic region
Labels: { transcription start, 【 translation start, 】 translation end, } transcription end
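Read this way, every pair of adjacent signals determines the type of the segment between them. A minimal sketch of that lookup, using only the pairs listed on this slide (any other pair is reported as "unlisted"; the function name is hypothetical):

```python
SEGMENT_TYPES = {
    ("【", "("): "first exon",
    (")", "("): "internal exon",
    (")", "】"): "last exon",
    ("{", "("): "non-coding 5' exon",
    (")", "【"): "non-coding 5' exon",
    ("(", ")"): "intron",
    ("】", "("): "non-coding 3' exon",
    (")", "}"): "non-coding 3' exon",
    ("}", "{"): "intergenic region",
}


def label_segments(signals):
    """Label the segment between each pair of adjacent signals.

    Branch points and spaces are ignored; only the six bracket signals count.
    """
    marks = [ch for ch in signals if ch in "{}【】()"]
    return [SEGMENT_TYPES.get(pair, "unlisted") for pair in zip(marks, marks[1:])]


labels = label_segments("{()【( . )( . )( . )】()}")
for label in labels:
    print(label)
```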
Signal and Sequence Models
• eiid: equal probability independently and identically distributed
• niid: non-equal probability independently and identically distributed
• WWM: Windowed weight matrix, etc.
• MMn: Markov chain model of order n; homogeneous and period-3 MM5 models are used in many gene-finders
• Consensus sequence
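An order-n Markov chain model (MMn) predicts each base from the n bases before it. The sketch below trains such a model from counts and scores a sequence by log-likelihood. It is a homogeneous model only (a period-3 model would keep one table per codon position), the add-one pseudocount is an assumption, and the training strings are toy data.

```python
import math
from collections import defaultdict


def train_markov(seqs, order, alphabet="acgt", pseudocount=1.0):
    """Return prob(context, base) estimated from training sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + pseudocount * len(alphabet)
        return (seen.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seq, prob, order):
    """Sum of log P(base | previous `order` bases) along the sequence."""
    return sum(math.log(prob(seq[i - order:i], seq[i]))
               for i in range(order, len(seq)))


ORDER = 2  # toy order; the slides mention order 5 for real gene-finders
model_a = train_markov(["acgacgacgacgacg"], ORDER)
model_b = train_markov(["aaaattttaaaatttt"], ORDER)

query = "acgacg"
print(log_likelihood(query, model_a, ORDER), log_likelihood(query, model_b, ORDER))
```

Comparing the two log-likelihoods (for example a coding model against a non-coding model) gives a discrimination score for a candidate region.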
Consensus Sequences
• TATAAT (Pribnow or -10 box):
T80 A95 T45 A60 A50 T96 (numbers: % occurrence of the base at that position)
• TTGACA (-35 box):
T82 T84 G78 A65 C54 A45
• CAAT (CAAT or -75 box):
GGYCAATCT
• TATA (TATA or Goldberg-Hogness box):
TATAWAW
• ATG (translation start point)
However, in Aful: ATG 76%, GTG 22%, TTG 2%
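A consensus with per-position percentages can be turned into a simple weight matrix. The slide gives only the frequency of the consensus base, so the sketch assumes the remaining probability is split equally among the other three bases and scores against a uniform background. Both are assumptions, and the function names are hypothetical.

```python
import math
import re


def parse_consensus(spec):
    """'T80A95T45' -> [('T', 0.80), ('A', 0.95), ('T', 0.45)]"""
    return [(base, int(pct) / 100) for base, pct in re.findall(r"([ACGT])(\d+)", spec)]


def log_odds_matrix(spec, background=0.25):
    """One {base: log2 odds} column per position of the consensus."""
    matrix = []
    for base, p in parse_consensus(spec):
        other = (1 - p) / 3  # assumption: non-consensus bases are equally likely
        matrix.append({b: math.log2((p if b == base else other) / background)
                       for b in "ACGT"})
    return matrix


def score(matrix, site):
    """Sum the column scores for a site of the same length as the matrix."""
    return sum(col[b] for col, b in zip(matrix, site.upper()))


def best_hit(matrix, seq):
    """Start index of the highest-scoring window in seq."""
    width = len(matrix)
    return max(range(len(seq) - width + 1),
               key=lambda i: score(matrix, seq[i:i + width]))


pribnow = log_odds_matrix("T80A95T45A60A50T96")
print(score(pribnow, "TATAAT"), score(pribnow, "GCGCGC"))
print(best_hit(pribnow, "ggcgcgtataatgcgcc"))
```

A position quoted at 100% (as in the GT-AG rule) would give a zero probability for the other bases under this scheme, so a pseudocount would be needed there.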
GT-AG Rule for Introns
5’ splicing donor site: exon … A64 G73 | G100 T100 A62 A68 G84 T63 …
3’ splicing acceptor site: … 12Py N C65 A100 G100 | N … exon
Exon and intron size distribution (figure: exon and intron panels for Arabidopsis, rice, and human)
Algorithms
• Sequence models and scores for signals
• Dynamic programming: optimal parse
• Hidden Markov Model: geometric distribution of intron lengths
• Semi-Hidden Markov Model: needs sequence-generating models and length probability for each node
• Language theory approach
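The remark about geometric lengths can be made concrete. In an ordinary HMM a state that loops back to itself with probability p produces segment lengths with P(L) = p^(L-1) (1 - p), which always peaks at L = 1 and has mean 1/(1 - p). A semi-HMM replaces this with an explicit length distribution per state. The sketch below only illustrates the geometric case; the mean of 400 is an arbitrary number, not a measured intron length.

```python
def geometric_length_pmf(p_stay, length):
    """P(L = length) for a state with self-transition probability p_stay."""
    return (p_stay ** (length - 1)) * (1 - p_stay)


def mean_length(p_stay):
    """Expected segment length of a self-looping state."""
    return 1 / (1 - p_stay)


def p_stay_for_mean(mean):
    """Self-transition probability that yields the requested mean length."""
    return 1 - 1 / mean


p = p_stay_for_mean(400)
print(p, mean_length(p))
for length in (1, 100, 400, 1000):
    print(length, geometric_length_pmf(p, length))
```

The probability only decreases with length, so the model cannot represent a length distribution with a peak away from 1, which is the motivation for giving each node its own length probability.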
Flow Chart of GenScan
Chris Burge (1996):
• A 27-state semi-HMM
• A simpler model: 19-state
• A model taking UTR introns into account: 35-state
Figure: N, intergenic region; P, promoter; F, 5’UTR; E_sngl, single-exon gene; E_init, initial exon; E_k (0 ≤ k ≤ 2), phase k internal exon; E_term, terminal exon; T, 3’UTR; A, polyadenylation signal; and I_k (0 ≤ k ≤ 2), phase k intron. States are shown for each strand.
Problems: Minor and Major
• Ambiguity symbols (N, W, S, R, …)
• (1-p) at flanking D-type nodes
• Indels and frame-shifts
• Gradient effects in gene structure
• Introns in 5’-UTRs and 3’-UTRs: leading to 35-state Markov Models
• Alternative splicing and sub-optimal paths
• Limit of probabilistic models
• Deterministic approaches
Dyck language: A language of nested parentheses
• Many types of parentheses
• Finite depth of nesting
• Context-free language
Our case:
• Only 3 types of parentheses
• Shallow nesting
• Conjecture: may be a regular language
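When the nesting depth is bounded, the set of possible stacks of open parentheses is finite, so a finite automaton can recognise the language. The sketch below shows this for three parenthesis types and depth 3. It checks nesting only and ignores every other constraint on real gene structure, so it illustrates why the conjecture is plausible rather than modelling the gene grammar itself.

```python
from itertools import product

PAIRS = {"{": "}", "【": "】", "(": ")"}
CLOSERS = set(PAIRS.values())


def build_states(max_depth):
    """All stacks of open parentheses up to max_depth: a finite state set."""
    states = [()]
    for depth in range(1, max_depth + 1):
        states.extend(product(PAIRS, repeat=depth))
    return states


def accepts(text, max_depth):
    """Run the automaton; the current state is the stack of open parentheses."""
    state = ()
    for ch in text:
        if ch in PAIRS:
            if len(state) == max_depth:
                return False  # deeper nesting than the automaton allows
            state = state + (ch,)
        elif ch in CLOSERS:
            if not state or PAIRS[state[-1]] != ch:
                return False  # closing symbol does not match the last opener
            state = state[:-1]
        # any other character (branch point, space) leaves the state unchanged
    return state == ()


print(len(build_states(3)))
print(accepts("{()【( . )( . )( . )】()}", 3))
print(accepts("{()【( . )( . )( . )】()}", 2))
```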
Two Test Datasets for Rice Gene-Finders
• The 28469 japonica full-length cDNAs (Kikuchi et al., Science 301 (18 July 2003))
• Select a high-quality subset without overlaps with publicly available cDNAs
• A single-gene set: 500 sequences with one gene in each
• A multi-gene set: 46 sequences with 199 genes in total (at least 4 genes in a sequence)
Assessment of Gene-Finders
Tests done between 22 July and 2 August 2003
• FgeneSH (trained on monocotyledons)
• GeneMark.hmm
• RiceHMM
• GlimmerR
• GenScan (trained on maize)
• BGF
Our Ultimate Goal
• An iterative, self-training, self-improving gene-finder “for any species”, starting from a small number of reads, with or without EST or cDNA support
• Annotation and re-annotation of the rice genomes
• Plant comparative genomics, especially that of Gramineae and Crucifers
tRNA features
• tRNA gene → pre-tRNA → mature tRNA
• Mature tRNA: 75 – 95 bases
• Cloverleaf-like structure
• Five arms: acceptor arm, D arm, anticodon arm, V loop (extra arm), TΨC arm
How many tRNA genes are present in an organism?
• Codon ↔ tRNA ↔ amino acid
• 61 amino-acid-encoding codons
• 20 amino acids
• Are there 61 species of tRNA with all possible anticodons ?
• Met (M) has one codon but two tRNAs
Wobble hypothesis (Crick, 1966)
• Many tRNAs recognize more than one codon
• Through non-Watson-Crick base pairings
• Fewer than 61 tRNAs are needed
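The original wobble rules can be written as a small lookup: the first (5’) base of the anticodon pairs loosely with the third base of the codon, while the other two positions pair in the usual Watson-Crick way. The sketch below uses Crick's 1966 pairings only (I stands for inosine) and does not include the later modifications; the function name is hypothetical.

```python
# Anticodon 5' base -> codon third bases it can pair with (Crick, 1966).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon):
    """Codons (5'->3') recognised by an anticodon (5'->3')."""
    first, second, third = anticodon
    # Pairing is antiparallel: the anticodon's 3' base pairs with codon position 1.
    c1 = WATSON_CRICK[third]
    c2 = WATSON_CRICK[second]
    return [c1 + c2 + c3 for c3 in WOBBLE[first]]


for anticodon in ("GAA", "CAU", "IGC"):
    print(anticodon, codons_read(anticodon))
```

A single GAA anticodon covers both phenylalanine codons, UUC and UUU, so no tRNA with an AAA anticodon is required.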
The Modified Wobble Hypothesis (Guthrie & Abelson 1982)
• In eukaryotes, 46 different tRNA species would be enough.
• The modified wobble hypothesis holds almost perfectly in H. sapiens, S. cerevisiae, A. thaliana, and C. elegans, whose complete collections of tRNAs are now known.
aa codon   A   C   H anti | aa codon   A   C   H anti | aa codon   A   C   H anti | aa codon   A   C   H anti
F  UUU     0   0   0 AAA | S  UCU    37  14  10 AGA | Y  UAU     0   0   1 AUA | C  UGU     0   0   0 ACA
F  UUC    16  16  14 GAA | S  UCC     1   0   0 GGA | Y  UAC    76  19  11 GUA | C  UGC    15  13  30 GCA
L  UUA     6   5   8 UAA | S  UCA     9   7   5 UGA | *  UAA     0   0   1 UUA | *  UGA     0   0   0 UCA
L  UUG    10   7   6 CAA | S  UCG     4   5   4 CGA | *  UAG     0   0   1 CUA | W  UGG    14  11   7 CCA
L  CUU    11  18  13 AAG | P  CCU    16   6  11 AGG | H  CAU     0   0   0 AUG | R  CGU     9  18   9 ACG
L  CUC     1   0   0 GAG | P  CCC     0   0   0 GGG | H  CAC    10  17  12 GUG | R  CGC     0   1   0 GCG
L  CUA    10   3   2 UAG | P  CCA    39  34  10 UGG | Q  CAA     8  18  11 UUG | R  CGA     6  10   7 UCG
L  CUG     3   5   6 CAG | P  CCG     5   3   4 CGG | Q  CAG     9   7  21 CUG | R  CGG     4   3   5 CCG
I  AUU    20  19  13 AAU | T  ACU    10  17   8 AGU | N  AAU     0   0   1 AUU | S  AGU     0   0   0 ACU
I  AUC     0   0   1 GAU | T  ACC     0   0   0 GGU | N  AAC    16  20  33 GUU | S  AGC    13   9   7 GCU
I  AUA     5   8   5 UAU | T  ACA     8  11  10 UGU | K  AAA    13  16  16 UUU | R  AGA     9   7   5 UCU
M  AUG    23  20  17 CAU | T  ACG     6   7   7 CGU | K  AAG    18  33  22 CUU | R  AGG     8   3   4 CCU
V  GUU    15  19  20 AAC | A  GCU    16  21  25 AGC | D  GAU     0   0   0 AUC | G  GGU     1   0   0 ACC
V  GUC     0   0   0 GAC | A  GCC     0   0   0 GGC | D  GAC    23  22  10 GUC | G  GGC    23  14  11 GCC
V  GUA     7   6   5 UAC | A  GCA    10  10  10 UGC | E  GAA    12  17  14 UUC | G  GGA    12  33   5 UCC
V  GUG     8   5  19 CAC | A  GCG     7   4   5 CGC | E  GAG    13  20   8 CUC | G  GGG     5   3   8 CCC
tRNA copies in Arabidopsis (A), C. elegans (C), and Human (H). aa: amino acid (* = stop codon); anti: anticodon.
tRNA Genes in the Rice Genome (found by tRNAscan-SE + BLASTN)
Chromosome   Indica (BGI)        Japonica/Syngenta (IRGSP)
1            85                  71 (85)
2            57                  59
3            79                  68
4            45                  46 (41)
5            58                  56
6            38                  32
7            34                  35
8            45                  42
9            34                  32
10           28                  23 (28)
11           23                  24
12           38                  36
Total        564 (in 382 Mbp)    519 (in 360 Mbp)
Chloroplast tRNA genes in ssp. indica and japonica
• 33 tRNA genes were found in each of the indica and japonica chloroplast genomes.
• They are completely identical; no mutation was found (E. C. Kemmerer and Ray Wu found two tRNA genes perfectly conserved).
• It is remarkable that, in spite of more than 9000 years of separation, no mutation could be observed in the chloroplast tRNA genes of the two subspecies.