Comparative Analysis on the Conservation of Two Orthologous Genome Regions

Upload: leilani-davidson

Post on 31-Dec-2015

Introduction

Recently, draft sequences of the human and mouse genomes have been released. Such genomic sequence data are increasing very rapidly, and effective annotation methods are required. Investigating orthologous contigs of the two genomes can be very useful for understanding their functional similarities and differences. In particular, as the mouse becomes a leading model for studying disease processes in humans, the emerging mouse genome data have become a valuable resource for functional studies [1].

Methods

Sets of orthologous contigs were collected from the Human Genome Consortium and the Mouse Genome Database, respectively. We targeted contigs that contain genes related to neurological disorders. Each pair of orthologous contigs shares at least one gene. Global similarities between the two orthologous groups were examined and visualized with dot plots. Contigs were ranked by their degree of completeness and their annotation information. On this basis, 22 sets of human and 13 sets of mouse contigs were chosen as data sets for further comparison.

Functional regions were predicted with Genscan [3] (exons, promoters, polyA signals, etc.), which is among the most advanced and reliable gene prediction algorithms based on Hidden Markov Models, with RepeatMasker [4] (various repeat regions), and with Sim4 [5] (CDS regions). In addition, known annotations were extracted from GenBank flat files. All of these tasks were carried out systematically after the human and mouse genomes were collected from the International Human Genome Mapping Consortium [6]. The platform was Linux, and most of the processes were automated. Both the putative and the known annotations of the human genome were applied to annotate the mouse genome as a whole. Finally, the human-mouse contigs were displayed as cross-species sequence comparisons using the web-based visualization tool Pip [7] (percent identity plot).
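To make the idea behind a percent identity plot concrete, the following is a minimal sketch that computes percent identity over fixed-size windows of an already aligned sequence pair. It is not the procedure used by the Pip tool cited above; the function name, the window size, and the rule that gap columns count as mismatches are assumptions chosen for illustration, and the two sequences are toy data.

```python
def window_percent_identity(aligned_a, aligned_b, window=50):
    """Return (start, percent_identity) for consecutive alignment windows.

    aligned_a and aligned_b are the two rows of a pairwise alignment,
    equal in length, with '-' marking gaps. A column that contains a gap
    is counted as a mismatch (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(aligned_a), window):
        cols = list(zip(aligned_a[start:start + window],
                        aligned_b[start:start + window]))
        matches = sum(1 for x, y in cols if x == y and x != "-")
        results.append((start, 100.0 * matches / len(cols)))
    return results


# Toy alignment rows (hypothetical data, not taken from the contigs).
human = "ACGTACGTAC"
mouse = "ACGTTCGA-C"
print(window_percent_identity(human, mouse, window=5))
# prints [(0, 80.0), (5, 60.0)]
```

Plotting the window start against the percent identity gives a rough picture of where two orthologous contigs are well conserved and where they diverge.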

Results

[Figure: Conserved region: Human vs Mouse]

Conclusion and Discussion

The overall conservation can be interpreted in four ways: 1) the regions contain elements that are important for regulation, 2) the genomes were under pressure from structural and stability constraints, 3) the genes in the regions are functionally critical to the whole organism, or 4) the evolutionary distance between mice and humans is not large enough to have allowed divergence. As is known from yeast and prokaryotes, genomes tend to contain shuffled and highly mutated regions outside the coding regions, even between very closely related species. Even though mice and humans are relatively close to each other, there are many cases where the genes for a given function are distributed differently across the chromosomes and genomes, so the conservation observed here cannot be explained by the evolutionary distance between the two genomes alone. The conserved non-coding regions may therefore be hidden but significant elements that relate to the disease-associated genes through direct or indirect functional causes. This approach of analyzing contigs that are not necessarily coding or well annotated could give further insight into genomic regions containing neurological disorder genes.
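One way to isolate conserved non-coding regions of the kind discussed here is to subtract annotated coding intervals from the intervals found to be conserved. The sketch below illustrates that interval arithmetic only; the function name, the half-open coordinate convention, and the example coordinates are hypothetical and do not come from the poster.

```python
def subtract_intervals(conserved, coding):
    """Return the parts of `conserved` intervals not covered by `coding`.

    Intervals are half-open (start, end) pairs in the coordinates of a
    single contig. Both lists may be unsorted; coding intervals may overlap.
    """
    coding = sorted(coding)
    result = []
    for start, end in sorted(conserved):
        pos = start  # left edge of the part not yet classified
        for c_start, c_end in coding:
            if c_end <= pos:
                continue          # coding interval lies entirely to the left
            if c_start >= end:
                break             # coding interval lies entirely to the right
            if c_start > pos:
                result.append((pos, c_start))  # non-coding stretch before it
            pos = max(pos, c_end)
            if pos >= end:
                break
        if pos < end:
            result.append((pos, end))          # trailing non-coding stretch
    return result


# Hypothetical coordinates for illustration.
conserved = [(100, 400), (900, 1000)]
exons = [(150, 200), (350, 500)]
print(subtract_intervals(conserved, exons))
# prints [(100, 150), (200, 350), (900, 1000)]
```

In practice the coding intervals would come from the predicted and known annotations, and the conserved intervals from the cross-species alignment.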

References

[1] Battey, J., Jordan, E., Cox, D. and Dove, W. (1999) An action plan for mouse genomics. Nature Genet., 21, 73-75.

[2] http://www.tigr.org/tdb/hgi/index.html

[3] Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78-94.

[4] http://ftp.genome.washington.edu/RM/RepeatMasker.html

[5] Florea, L., Hartzell, G. and Zhang, Z. A Computer Program for Aligning a cDNA Sequence with … Genome Research, 8, 967-974.

[6] The International Human Genome Mapping Consortium. A physical map of the human genome. Nature, 409, 15 February 2001.

[7] Schwartz, S. PipMaker: A Web Server for Aligning Two Genomic DNA Sequences. Genome Research, 10, 577-586. http://bio.cse.psu.edu/pipmaker/

Acknowledgement

This work was funded in part by the Bioinformatics Training Grant of the Ministry of Health & Welfare, and supported by Pusan National University, Korea, and the MRC, UK.

[Figure labels: Conserved non-coding region; False negative; False positive]