medaka genome project - pdfs.semanticscholar.org...medaka genome project daisuke kobayashi and...

12
Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias latipes is a small egg-laying freshwater teleost, and has become an excellent model system for ecotoxicology, developmental genetics, genomics and evolutionary biology studies. A high-quality draft genome sequence (700 Mb) of the medaka inbred strain, Hd-rR, has now been completed. In this review, we present an overview of the medaka genome project and describe the substantial genomic resources and genome browsers through which researchers can now freely access these resources. We also present specific findings that have been obtained using these tools, such as sex chromosome differentiation, gene prediction and gene evolution. Keywords: Whole genome shotgun; mutant; sex chromosome; SNP; evolution INTRODUCTION Over the past decade, most model organisms, including human and mice, have had their genomes completely sequenced. Medaka (Oryziaslatipes) is one of the model vertebrates that has generated increas- ing interest in developmental genetics, genomics and evolutionary biology [1–7] and has also become an important test system in ecotoxicology and carcino- genesis studies [8]. Notably, the first demonstration in any species of Y-linked inheritance [9], the Erst successful sex-reversal in vertebrates [10] and the identification of the male-determining gene, DMY, the first non-mammalian equivalent of SRY were achieved [11]. Comparative studies of distantly related vertebrate species are essential to identify the conserved, as well as the species-specific, genetic and molecular mechanisms that underlie development and evolu- tion. Medaka and zebrafish are ideal for this purpose. In addition to these species, another teleost fish, the fugu (Takifugu and Tetraodon) has been utilized in genome biology. The complete draft sequence of fugu [12, 13] had a large impact on medaka genomics, as medaka is a much more closely related to fugu than to zebraEsh [14] (Figure 1). In 2000, the zebraEsh genome sequencing project began at the Sanger Centre and sequencing data continues to be updated frequently online (http://pre.ensembl.org/ Danio_rerio/). It was anticipated also that whole- genome sequencing of the medaka would facilitate comparative genomic studies of both teleosts and vertebrate organisms. In addition to common shared features with zebraEsh, the medaka has several advantages as a model vertebrate: (i) a smaller genome (800 Mb, half the size of the zebraEsh genome); (ii) the availability of polymorphic and highly fertile inbred strains derived by Hyodo- Taguchi and Egami [15, 16]. A low number of polymorphisms within a strain is one of the keys to a successful genome sequence assembly. In addition, the high degree of polymorphism between the inbred strains derived from the southern and north- ern populations of medaka was the basis for the establishment of a genetic map for this species [5]; and (iii) permissive growth at a wide range of temperatures (6–40 C) during embryonic develop- ment, which increases the chance of isolating temperature-sensitive mutants. Recently, large-scale mutagenesis projects have been conducted by several groups in Japan and delivered a vastly expanded pool of medaka mutant stocks [4, 17]. Rapid completion of the genomic sequence of medaka was a crucial step in enabling the rapid movement from mutant phenotypes to gene functions. It also greatly facilitated the positional cloning of medaka developmental mutants and thus Corresponding author. Daisuke Kobayashi, Department of Anatomy and Developmental Biology, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, Kyoto 602-0841, Japan. Tel: þ81-75-251-5302; Fax: þ81-75-231-5304; E-mail: [email protected] Daisuke Kobayashi is a research associate at Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto Prefectural University of Medicine, Kyoto, Japan. Hiroyuki Takeda is a Professor at the Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS. VOL 7. NO 6. 415^ 426 doi:10.1093/bfgp/eln044 ß The Author 2008. Published by Oxford University Press. All rights reserved. For permissions, please email: [email protected]

Upload: others

Post on 10-Jun-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

Medaka genome projectDaisuke Kobayashi and Hiroyuki Takeda

Advance Access publication date 4 October 2008

AbstractThe medaka Oryzias latipes is a small egg-laying freshwater teleost, and has become an excellent model system forecotoxicology, developmental genetics, genomics and evolutionary biology studies. A high-quality draft genomesequence (700Mb) of the medaka inbred strain, Hd-rR, has now been completed. In this review, we present anoverview of the medaka genome project and describe the substantial genomic resources and genome browsersthrough which researchers can now freely access these resources.We also present specific findings that have beenobtained using these tools, such as sex chromosome differentiation, gene prediction and gene evolution.

Keywords:Whole genome shotgun; mutant; sex chromosome; SNP; evolution

INTRODUCTIONOver the past decade, most model organisms,

including human and mice, have had their genomes

completely sequenced. Medaka (Oryziaslatipes) is one

of the model vertebrates that has generated increas-

ing interest in developmental genetics, genomics and

evolutionary biology [1–7] and has also become an

important test system in ecotoxicology and carcino-

genesis studies [8]. Notably, the first demonstration

in any species of Y-linked inheritance [9], the Erst

successful sex-reversal in vertebrates [10] and the

identification of the male-determining gene, DMY,

the first non-mammalian equivalent of SRY were

achieved [11].

Comparative studies of distantly related vertebrate

species are essential to identify the conserved, as well

as the species-specific, genetic and molecular

mechanisms that underlie development and evolu-

tion. Medaka and zebrafish are ideal for this purpose.

In addition to these species, another teleost fish,

the fugu (Takifugu and Tetraodon) has been utilized

in genome biology. The complete draft sequence

of fugu [12, 13] had a large impact on medaka

genomics, as medaka is a much more closely related

to fugu than to zebraEsh [14] (Figure 1). In 2000, the

zebraEsh genome sequencing project began at the

Sanger Centre and sequencing data continues to be

updated frequently online (http://pre.ensembl.org/

Danio_rerio/). It was anticipated also that whole-

genome sequencing of the medaka would facilitate

comparative genomic studies of both teleosts and

vertebrate organisms. In addition to common shared

features with zebraEsh, the medaka has several

advantages as a model vertebrate: (i) a smaller

genome (�800 Mb, half the size of the zebraEsh

genome); (ii) the availability of polymorphic and

highly fertile inbred strains derived by Hyodo-

Taguchi and Egami [15, 16]. A low number of

polymorphisms within a strain is one of the keys to a

successful genome sequence assembly. In addition,

the high degree of polymorphism between the

inbred strains derived from the southern and north-

ern populations of medaka was the basis for the

establishment of a genetic map for this species [5];

and (iii) permissive growth at a wide range of

temperatures (6–40�C) during embryonic develop-

ment, which increases the chance of isolating

temperature-sensitive mutants.

Recently, large-scale mutagenesis projects have

been conducted by several groups in Japan and

delivered a vastly expanded pool of medaka mutant

stocks [4, 17]. Rapid completion of the genomic

sequence of medaka was a crucial step in enabling the

rapid movement from mutant phenotypes to gene

functions. It also greatly facilitated the positional

cloning of medaka developmental mutants and thus

Corresponding author. Daisuke Kobayashi, Department of Anatomy and Developmental Biology, Graduate School of Medical

Science, Kyoto Prefectural University of Medicine, Kyoto 602-0841, Japan. Tel: þ81-75-251-5302; Fax: þ81-75-231-5304;

E-mail: [email protected]

Daisuke Kobayashi is a research associate at Department of Anatomy and Developmental Biology, Graduate School of Medicine,

Kyoto Prefectural University of Medicine, Kyoto, Japan.

Hiroyuki Takeda is a Professor at the Department of Biological Sciences, Graduate School of Science, The University of Tokyo,

Tokyo, Japan.

BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS. VOL 7. NO 6. 415^426 doi:10.1093/bfgp/eln044

� The Author 2008. Published by Oxford University Press. All rights reserved. For permissions, please email: [email protected]

Page 2: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

provided a powerful tool for comparative genomic

analyses which are underway in many laboratories

worldwide. Indeed, Yokoi etal. [18] recently reported

the unique-phenotype medaka mutant, headfish,which is defective in fgf receptor 1 ( fgfr1). The authors

elucidated an interesting feature of ligand–receptor

evolution by comparing the phenotypes of medaka

and zebrafish fgfr1- and fgf8-defective embryos.

In this review, we provide an overview of the

medaka draft genome and describe some of the

practical applications of the available genome

information and related biological resources. We

also illustrate some of the unique aspects of the

medaka genome which were found during the

course of the project.

OVERVIEWOF THEMEDAKAGENOME PROJECTThe medaka genome project commenced in late

2002 as a collaborative effort of three core labo-

ratories led by H. Takeda (University of Tokyo,

Japan), S. Morishita (University of Tokyo, Japan) and

Y. Kohara [National Institute of Genetics (NIG),

Japan]. Funding support for this project was obtained

from Grants-in-Aid for Scientific Research in

Priority Area ‘Genome Science’ from the Ministry

of Education, Culture, Sports, Science and Tech-

nology of Japan. The sequencing was carried out at

the Academia Sequencing Centre of the NIG and

was successfully completed in 2006, providing a

high-quality draft genome [19]. For this medaka

genome project, we adopted the whole genome

shotgun (WGS) sequencing approach rather than of

the use of Bacterial Artificial Chromosome (BAC)

clones, due to its simplicity and rapid early coverage

of the whole genome at lower cost. There were

three key aspects underlying the success of our WGS

approach: (i) the inbred medaka strain Hd-rR was

used so that no polymorphisms were present other

than in the region around the sex-determining gene;

(ii) pair-mate sequence data from large inserts in

BAC and Fosmid libraries were used to assemble the

contigs into scaffolds; and (iii) a dense SNP genetic

map (see below) was constructed and used to align

the scaffolds we generated on the chromosomes.

The Hd-rR genome was re-constructed from 13.8

million reads of the WGS plasmid, Fosmid and BAC

libraries (Table 1). The total size of the assembled

Figure1: Phylogenetic relationship of teleost.Our scenario of genome evolution from the ancestralkaryotype to thethree teleost genomes.The date we adopt for whole-genome duplication (WGD) and lineage divergence is based onmolecular clock estimates [14]. The last common ancestor of both medaka and zebrafish lived more than 323�9.1million years ago (Mya), and medaka are a much more closely related to Fugu (184�6.8 million years (Mys) apart)than to zebraEsh (329�9.1Mys apart).The figure depicts a model for the distribution of ancestral chromosome seg-ments in the medaka, zebrafish and Tetraodon genomes.Thirteen reconstructed ancestral chromosomes are repre-sented by the gray-scale bars. Major rearrangements are represented by arrows and lineage-specific small-scaletranslocations by dotted arrows.

416 Kobayashi and Takeda

Page 3: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

contigs was 700.4 Mb. Aligning the Hd-rR assembly

with reference BACs totaling 2.3 Mb of sequence

showed that our overall nucleotide level accuracy

was 99.96% when 100 base pairs of contig ends were

excluded [19].

We obtained 648 Mb of genomic sequence from

another inbred strain HNI. Hd-rR and HNI belong

to different isolated populations of medaka in south-

ern and northern areas, respectively, of Japan, and

hence a high degree of polymorphism was expected.

The alignment of these two medaka genomes

identified about 16.4 million single nucleotide

polymorphisms (SNPs), from which 2401 SNPs

were genetically mapped onto medaka chromosomes

using a backcross panel between Hd-rR and HNI.

Where possible, at least one SNP marker was selected

in each Hd-rR scaffold of >60 kb. The resultant

genetic map was then used for aligning generated

scaffolds. The newly developed genome assembler,

RAMEN, as well as the SNP genetic map was vital

for generating a high-quality draft genome of the

medaka [19]. Two scaffolds linked by a single BAC

are connected into one ultracontig if it shows

consistency with genetic markers. The N50 ultra-

contig size became �5.1 Mb (excluding gaps),

indicating that half of the nucleotides were within

ultracontigs of greater than�5.1 Mb in size (Table 2),

and that 89.7% of the assembled nucleotides were

anchored to the chromosome (Table 3). Given the

relatively lower coverage of the preceding Tetraodon

genome (65%), our medaka genome project had

therefore generated the highest quality genome avai-

lable for a Esh species. According to the Ensembl

database (http://www.ensembl.org/Danio_rerio/)

released in July 2007, the zebraEsh genome also

reached a coverage of 88.6%. Hence, as of the end

of 2007, the medaka and zebraEsh genomes are of

comparable quality to their counterparts in mammals

including human, mouse, rat and dog.

GENOMIC RESOURCE ANDBROWSERTo enable free access to our genomic information for

the medaka, the University of Tokyo Genome

Browser Medaka (UTGB/medaka) database has

been made publicly available at http://medaka.

utgenome.org/ [20] (Figure 2A). This database

contains the draft genome sequence and other

fundamental features such as predicted genes,

expressed sequence tags (ESTs), GC content, infor-

mation regarding repeats and comparative genomics

and unique data sets. The unique data resources

in UTGB/medaka include (i) genetic markers

[Figure 2A (iv)]; (ii) SNPs and insertions/deletions

(In/Dels) detected between the Hd-rR and HNI

genomes [Figure 2A (v)]; (iii) the positions of BAC/

Fosmid clones on the medaka chromosomes

[Figure 2A (i, ii)]; and (iv) 50 SAGE tags (Figure 2A

(iii), see Figure 1 in reference [21] for further details)

Table 1: The list of the libraries used for the HdrR assembly

Type Size (kb) Strain Collectedreads

Passed reads Passing rate (%) Passed bases Sequencecoverage

Clonecoverage

Plasmid 2.6 Hd-rR 15278100 12 533986 82.04 6788 830 895 9.69 23.3Plasmid 7.5 Hd-rR 978 816 808 789 82.63 335777 725 0.48 4.3Fosmid 37.5 Hd-rR 139 008 87077 62.64 31333 412 0.04 2.3Fosmid 35.5 Hd-rR 390 912 347612 88.92 151266 641 0.22 8.8BAC 135 Hd-rR 114161 104 410 91.46 62 582733 0.09 10.1BAC 180 HNI N/A 13968 N/A 6 642999 0.01 1.3BAC 210 Hd-rR N/A 24 036 N/A 11467501 0.02 3.6Total 13919 878 7387901906 10.55 53.7

Table 2: Medaka genome assembly statistics

Total reads 13 800 000Sequence covarage 10.6�Scaffold (>2kb) 8134Contig N50 9.8kbScaffold N50 1.41MbUltraconting N50 5.1MbTotal length 700.4Mb

Table 3: Anchoring statistics

Bases (Mb) Percentage (%)

oriented 582.1 83.11Anchored unoriented 16.1 2.29

unordered 29.8 4.26Unanchored 72.4 10.34Total 700.4 100

Medaka genome project 417

Page 4: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

that identify transcription start sites (TSS) within the

genome.

The genetic markers used in the medaka genome

project and other reported genetic markers [5, 22]

have been incorporated into UTGB/medaka. These

markers are anchored on the scaffolds and their

primer information is also available. To identify the

genes responsible for mutant phenotypes, the estab-

lishment of a linkage between the mutated locus

and polymorphic markers is an important first step.

Figure 2: Snapshots of genomebrowsers. (A) UTGB/medaka.Theunique data set inUTGB/medaka, such as 50 SAGEtags, BAC, Fosmid, geneticmarkers and polymorphic information between Hd-rR and HNI, can be accessed from thispage. (i and ii) Positions of BAC and Fosmid end sequences. (iii) The positions of individual 50SAGE tags are illustrated.(iv) Genetic marker positions are indicated. (v) HNI alignment track represents SNPs and In/Dels between Hd-rR andHNI. Sequencemismatches are highlighted by different colors. It is illustrated how to identify BAC/Fosmid clones har-boring genes of interest using UTGB/medaka. BAC and Fosmid clones that cover the sonic hedgehog (shh) gene areindicated by gray boxes. (B) Ensembl genome browser.Click on ‘View Region in Medaka Genome Browser’ will bringthe corresponding region in UTGB/medaka. (C) UCSC genome browser.Click on ‘UTGB’will bring the correspondingregion in UTGB/medaka.

418 Kobayashi and Takeda

Page 5: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

These genetic markers will accelerate positional

cloning in the medaka. The alignment of the

Hd-rR and HNI genomes revealed 16 519 460

SNPs and 2 859 905 In/Dels, and the locations of

individual SNPs and In/Dels are shown as a HNI

alignment track. This very large polymorphism

collection will also be a useful resource for evolu-

tional biology studies and a repository of new genetic

markers.

Three BAC and two Fosmid libraries were

constructed and end-sequenced during the project.

These end-sequences are aligned on the genome and

the BAC (total 142 414) and Fosmid (total 217 344)

clones are available at the National Bioresource

Project (NBRP) medaka at http://www.nbrp.jp/

report/reportProject.jsp?project¼medaka.

The 50 SAGE tags (841 235) that were used to

identify 344 266 TSS within the medaka genome are

shown in the 50SAGE tag track. The sequence

information of these tags is available and can be used

for amplifying the 50 ends of transcripts (i.e. used as a

forward primer). This will complement the current

shortage of transcript information in the medaka.

The medaka genome sequence has now been

incorporated into the other public genome browsers

Ensembl (Figure 2B) and University of California,

Santa Cruz (UCSC) (Figure 2C). The Ensembl Oct.

2005 MEDAKA1 assembly and the UCSC Medaka

April 2006 (oryLat1) assembly is equivalent to the

version 1.0 draft assembly in UTGB/medaka. These

public genome browsers also provide their own

unique data. For example, another gene annotation

was independently achieved by the Ensembl pipe-

line. Moreover, UCSC provides several comparative

genomic data. Furthermore, you can switch from

Ensembl or UCSC to UTGB/medaka to view the

corresponding region, that is to say, unique data in

UTGB/medaka can be accessed from these other

public genome browsers. For the user’s convenience,

UTGB/medaka also retrieves data from other

databases, such as Ensembl predicted genes and the

NCBI UniGene data sets. Hence, these genome

browsers can be used cooperatively if required.

SEX-CHROMOSOMEDIFFERENTIATION INMEDAKASimilar to mammals, the medaka has an XX–XY

sex-determining system [9], but the differentiation

of its sex chromosomes appears to be far more

primitive. The X and Y chromosomes of the medaka

are not cytogenetically distinct and crossing-over is

possible over almost their entire lengths [23]. In

addition, fully viable and fertile YY males and

females and XX males can be obtained [10, 24]. The

linkage map of these sex chromosomes [linkage

group 1 (LG1)] contains in large part an even

distribution of markers like on the autosomes.

Around the sex-determining locus, however, a

pronounced clustering of markers indicates a region

in which crossing-over is suppressed [25, 26]. In

addition, the medaka sex-determining gene DMY(dmrt1b) [11, 27–29] has been identified and LG1

with a DMY insertion serves as a Y chromosome,

whereas LG1 without this gene insert functions as

the X chromosome (Figure 3A).

As mentioned above, the mechanisms underlying

medaka sex determination and the sex chromosomes

in this species have been studied extensively. The

medaka genome project shed further light on the

differences between the Y- and X-chromosome

at the nucleotide sequence level. Because we

sequenced the genome of an inbred strain, essentially

no polymorphisms between allelic regions were

found other than sequencing errors. Thus, although

we sequenced male genomic DNA, it was difficult to

ascertain whether the sequence reads of LG1 were

derived from the X or Y chromosome. However,

we observed unusual haplotypes for LG1 in a region

extending over 3.5 Mb around the Y-speciEc region

(Figure 3). This directly demonstrates the local

suppression of crossing-over near the male-

determining region, which has long been evident

in medaka genetic recombination tests [25]. On the

other hand, the region that is far from the sex-

determining locus in LG1 is highly homozygous,

indicating free crossing over. These results indicate

that the inserted male-determining region has an

impact on the recipient LG1 with respect to

restriction of recombination, but at present, the

effect is limited to about 10% of its length, 3.5/

33.7 Mb. It is widely accepted that as a consequence

of ‘proto-X’ and ‘proto-Y’ failing to cross over with

the sex-determining locus, they are genetically

isolated from one another and will diverge over

evolutionary time [30]. The medaka genome

sequencing project has therefore revealed the

suppression of recombination between the primitive

sex chromosomes at the sequence level for the first

time. The medaka Y chromosome is, so to speak,

at an initial stage of sexual differentiation at the

chromosomal level. Similar local suppression of

Medaka genome project 419

Page 6: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

crossing-over near the sex-determining regions has

also been reported in other Esh species such as the

stickleback [31]. Further, comparative analysis of the

medaka Y chromosome may yield important insights

into the evolution of vertebrate sex chromosomes.

GENE PREDICTIONWITH ACOMBINATIONOFEVIDENCE-BASEDANDIN SILICOMETHODOLOGIESBecause of the limited availability of EST and full-

length cDNA information for the medaka, the gene

identification program GENSCAN, in combination

with 50SAGE, was used to annotate the draft genome

sequences. Hashimoto etal. [21] recently showed that

the 50SAGE method could detect TSS in the human

genome with a 99% degree of accuracy. A large

collection of 50 SAGE tags, each tag representing

19–20 bases of the 50-end of an mRNA, allowed us

to globally identify the medaka TSS. The protein

coding regions were subsequently predicted from

these sites using GENSCAN. During the medaka

genome project, we collected 1 186 742 50 SAGE

tags in total from a mixture of 0–7 dpf (days post

fertilization) medaka embryos and adult whole

bodies. The combination of the tag information

(evidence-based) and use of GENSCAN with our

newly developed algorithm (an in silico method)

Enally produced 20 141 non-redundant sets of

predicted genes. Predicted genes are supported by

a single or multiple TSS, possibly reflecting

their expression levels. The predicted genes were

then compared with those of human, Tetraodon,Takifugu, zebraEsh, chicken, mouse using TBALSTX

for characterization purposes. A total of 16 414

Figure 3: Sex chromosome differentiation in medaka. (A) Sex chromosomes in medaka. The Y-speciEc region(250kb) itself is derived from the duplication of a fragment from linkage group (LG) 9 containing dmrt1. One of theduplicated fragments was inserted into LG1, and its dmrt1became themale-determining gene,DMY/dmrt1b.The 3.5Mbregion inwhichhaplotypes are observed is shown inblack line. (B) An example of haplotypes near theY-speciEc region.(B) Courtesy of Dr Masahiro Kasahara (University of Tokyo).

420 Kobayashi and Takeda

Page 7: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

(81.5%, Figure 4a, light gray) were found to have

homologues in at least one of six species with 11 809

(58.6%, Figure 4d, black) having homologues in all

six species (E< 10�4). Based on these data, we also

generated a transcriptome map, in which the density

of the 50SAGE tags (Figure 4B, bar chart, black

arrowhead), together with the gene density

(Figure 4B, line chart, gray arrowhead), was plotted.

The correlation coefficient between the density of

50SAGE tags, possibly reflecting the gene expression

levels and the gene density in the entire genome was

0.383, thus exhibiting a weak correlation. However,

correlation coefficients in individual chromosomes

were largely divergent, ranging from �0.07 to 0.58,

indicating the preferential localization of highly

expressed genes in the medaka genome.

RAPIDLYAND SLOWLY EVOLVINGGENE CATEGORIES IN THEMEDAKA SPECIESTo gain insight into gene evolution and species

differentiation, we examined rapidly and slowly

evolving gene categories between the two medaka

inbred strains Hd-rR and HNI. The average SNP

rate between Hd-rR and HNI was found to be

3.42% and the average Ka/Ks ratio (where Ka is the

number of non-synonymous substitutions and Ks,the number of synonymous substitutions) of 8889

medaka predicted genes (qualiEed by certain criteria)

measured as 0.413. These figures are significantly

higher than those determined for the human–

chimpanzee lineage (1.23% for SNP and 0.23 for

Ka/Ks) [32], which of course has experienced major

speciation events for >6 Myr. The high degree of

intra-species variation prompted us to undertake a

genome-wide comparison of the evolutionary rate

in protein-coding genes between the northern–

southern medaka and human–chimpanzee lineages,

to gain further insight into the relationship between

genetic variations and species differentiation. The

Ka/Ks ratio has been commonly used as an indicator

of selective constraints and is typically calculated

from interspecies alignments [33]. Proteins with

rigorous functional or structural requirements are

subject to strong purifying (negative) selective

pressure, resulting in smaller numbers of amino

acid changes. Therefore, these proteins tend to

evolve more slowly than proteins with weaker

constraints.

To measure evolutionary constraints among

different classes of proteins, we calculated the

median Ka/Ks ratio of each functional category of

genes, based on the Gene Ontology (GO) classifica-

tion. We conducted the large-scale survey of Ka/Ksratios between the northern–southern medaka and

human–chimpanzee lineages (Figure 4C). The results

we obtained revealed the following interesting

tendencies for rapidly and slowly evolving gene

categories in the medaka species relative to the

hominid lineage. In this analysis, we looked at the

speciEc categories outlined by previous studies in

mammalian species as they identified immunity, host

defense, reproduction (sexual reproduction, gama-

togenesis etc.) and olfaction as rapidly evolving

categories whereas intracellular signaling, neurogen-

esis and neurophysiology (e.g. synaptic transmission)

were found to be slowly evolving [33]. The rapidly

evolving categories are thought to be involved in

adaptation to environmental conditions and sexual

separation, both of which are essential processes

during and/or after speciation. Intriguingly, these

rapidly evolving categories are not evident in the

medaka lineage, and they are mostly ranked as

intermediate. In particular, the reduced evolutionary

rate in the reproduction- and sex-related categories

(Figure 4C, ovals) might explain why the two

medaka strains can mate and produce fertile offspring

even after a long period of geographical and genetic

separation. The difference in sex-determining sys-

tems between the medaka and mammals could affect

the evolutionary rate on genes with reproduction-

related functions. In contrast, slowly evolving

neural-related categories in mammals (Figure 4C,

boxes) exhibit relaxed constraints in the medaka

lineage, and this tendency may reflect less compli-

cated neural circuits in Esh, or potentiate the

adaptation ability of the Esh nervous system to a

variety of living environment. It is generally accepted

that the extent of phenotypic variation between

organisms is not strictly related to the degree of

sequence variation. This holds true in the case of

species differentiation. The two medaka poly-

morphic strains, Hd-rR and HNI, remain as a

single species, although their sequence differences is

far greater than that observed between human and

chimpanzee and between the mouse species Musmusculus and Mus spretus. Our comparative analyses

thus suggest that differential selective pressures act on

speciEc categories in a lineage-speciEc manner,

which may contribute to a pattern of evolution,

Medaka genome project 421

Page 8: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

Figure 4: Medaka genes. (A) Breakdown of medaka gene homologues in other species. (a) 16 414 homologues, eachfound in at least one of six species, (Tetraodon, zebrafish,Takifugu, chicken, mouse, human). (b) 15565, each found in atleast one of three fishes (Tetraodon, zebrafish, Takifugu). (c) 12 821, each found in all three fishes (Tetraodon, zebrafish,Takifugu). (d) 11809, each found in all six species (Tetraodon, zebrafish, Takifugu, chicken, mouse, human). (B) Medakatranscriptomemap.The vertical lines represent themedaka linkage groups. Each horizontal bar (black arrowhead) tothe right represents themedian of 50SAGE tags.The curved line (gray arrowhead) represents the gene density with aslidewindow size of1Mb.Figures under linkage groups numbers showcorrelation coefficients between the expressionlevel and the gene density in individual chromosomes. (C) Dots representpairs ofmedians of Ka/Ks ratios in the north-ern-southernmedaka strains and thehuman^ chimpanzee lineage forGeneOntology (GO) categorieswith�10 genes.Rapidly and slowly evolving GO categories inmedaka relative to the hominid lineage are shown in closed rhombus andopen triangle, respectively.

422 Kobayashi and Takeda

Page 9: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

for example, adaptation with or without speciation.

The genome sequences of polymorphic medaka

inbred lines thus provide unique resources for the

analysis of vertebrate gene evolution.

POLYMORPHISMS BETWEENTWO INBREDMEDAKA STRAINS,Hd-rR ANDHNIIn the medaka genome project, 16.4 million SNPs,

1.4 million insertions and 1.45 million deletions

were identified by aligning the HNI contigs with the

Hd-rR genome. These very large SNP and In/Dels

resources enabled us to analyze genome-wide SNP

and In/Dels distributions.

The genome-wide SNP rate (see Table 2,

Supplementary Material in reference [19] for further

details) between the two inbred medaka strains was

measured at 3.42%, which is, to our knowledge, the

highest seen in any vertebrate species. To further

study the SNP distribution pattern, a 150 kb region

harboring the HoxAa cluster was analyzed and whilst

the SNPs showed a roughly uniform distribution at

an average frequency of 3.4% as a whole, some local

fluctuations with a tendency towards the lower rate

in the coding regions (1.8%) was observed. In

addition, the distribution of small In/Dels, where

>50% are one or two nucleotides in length, are also

uniform across the genome but their frequency is

5–7-fold lower than the SNP frequency. The

relative frequency of In/Dels to SNP is 0.173,

which is higher than that of either human and

chimpanzee (0.145; [32]) or chicken [34]. This

tendency, together with a recent whole-genome

duplication (WGD) event, may contribute to the

genomic diversity that enables the rapid adaptive

radiation of teleost species.

MEDAKAGENOME EVOLUTIONThe WGD events and subsequent asymmetric

changes are known to be important in evolution.

With the accumulation of genomic information

from several teleost species, a number of studies have

examined WGD in the teleost linage [13, 35, 36].

These reports have estimated the number of proto-

karyotypes prior to WGD but a scenario for

the evolution of the teleost genome (i.e. inter-

chromosomal rearrangements) had not been

adequately developed thus far. Our higher quality

medaka draft genome sequences prompted us to

reconstruct proto-karyotypes and thus the chromo-

somal history of this species. We have thereby

conducted large-scale four-way comparisons of the

medaka, human, zebrafish and Tetraodon genomes.

The following analysis was performed: (i) 5918

orthologous genes were found between the medaka

and Tetraodon chromosomes, and 1365 orthologs

between the medaka and zebrafish chromosomes,

allowing us to detect clear orthologous chromosome

correspondence; (ii) using the human genome as a

reference genome that diverged before the WGD in

the teleost lineage, we were able to identify 2009

pairs of medaka duplicated genes that were derived

from the WGD by associating each pair of such

genes with their single human ortholog. Similarly,

2134 pairs of Tetraodon paralogs were also detected

whereby we observed clear paralogous chromosome

correspondence; and (iii) in the human lineage,

due to inter-/intra-chromosome rearrangements, the

ancestral chromosome was broken into smaller

segments that were likely to have syntenic segments

on a pair of duplicated medaka chromosomes. These

segments are known as doubly conserved synteny

(DCS) [13]. In reality, most regions of medaka

chromosomes have DSC with human chromosomes.

Based on these observations, we reconstructed a

proto-karyotype. The key events we propose from

these analyses are as follows: in a relatively short

period of about 50 Myr after the WGD event

[336–404 million years ago (Mya), the MTZ-

ancestor (the last common ancestor of medaka,

Tetraodon and zebrafish) had 24 chromosomes and had

undergone eight major inter-chromosomal rearran-

gements (two fissions, four fusions and two translo-

cations). The medaka genome has therefore

preserved its ancestral genomic structure without

undergoing major inter-chromosomal rearrange-

ments for >300 Myr (Figure 1).

As stated above, the high-quality medaka genome

that we produced has contributed to the analysis of

vertebrate genome evolution [37]. Likewise, this

genomic resource is providing an important refer-

ence point for various ray-Enned Eshes that have yet

to be sequenced. This holds true not only for model

fish organisms, but also aquaculture species of fish

that are commercially important. Indeed, the medaka

genome information is now being utilized to

construct a syntenic relationship between fully

sequenced model fish species, including medaka,

and tree marine species of commercial and evolu-

tionary interest [38].

Medaka genome project 423

Page 10: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

PERSPECTIVESAmong teleost fish fully sequenced genomes are

available for five species: zebrafish (Danio rerio),Takifugu (Fugurubripes),Tetraodon (Tetraodon nigroviridis),medaka (O. latipes) and three-spined stickleback

(Gasterosteus aculeatus). Comparative studies among

these species will greatly facilitate the study of the

genetic and molecular mechanisms that underlie

development and evolution. In addition to further

knowledge of the basic biology of these species, fish

industries will also benefit from these genome

resources. In addition, although the latest expansion

in the available genome resources for aquaculture fish

species are still at an early stage the accumulated

genomic resources for model fish species including

the medaka greatly facilitates comparative genome

analysis of commercially important fish species.

Comparative analysis of multiple genomes in

related species also benefits the study of evolutionary

biology. Recently, the Drosophila 12 Genomics

Consortium reported the genomes of 12 Drosophilaspecies and revealed a comprehensive picture of the

evolution in this organism [39]. This comparative

analysis also enabled the computational prediction of

functional elements, including coding sequences as

well as regulatory motifs and noncoding RNAs [40].

Medaka is one of the best model organisms to use in

the study of evolutionary biology in vertebrates.

There are also many close relatives of medaka that

are indigenous to East and South–East Asia [41],

many of which are now maintained by NBRP

medaka. Together with the medaka draft genome,

an initial low-coverage sequence of the genomes of

these fish species would shed further light on the

mechanisms underlying speciation and diversity,

which have yet to be fully addressed at the

genome sequence level. Next generation sequencers

will greatly facilitate this.

The medaka genome project has provided a high-

resolution mapping of conErmed genetic markers

as well as genomic sequences on which individ-

ual genetic markers are located in this species.

These genomic resources have enabled the recent,

successful positional cloning of medaka mutants

[18, 42–46]. Both published [5, 19] and additional

genetic markers designed using genome sequence

data have also facilitated the rapid mapping of

mutant loci. In addition, BAC contig construction,

which was formerly a prerequisite step for sequen-

cing of the regions spanning a mutant locus, is no

longer necessary which greatly assists with the

positional cloning of medaka mutants. Recently, an

expanded pool of medaka mutant stocks [4, 17] has

been deposited in NBRP medaka. These mutants are

maintained as living and/or frozen stocks and are

available for distribution. Together with medaka

genome resources, medaka mutants represent new

genes and, therefore, are important for exploring

genomic functions.

In summary, the medaka draft genome is the first

high-quality fish genome with an appreciable

number of useful genetic markers. This comprehen-

sive genomic information available online is accel-

erating positional cloning of many mutant genes

leading to the elucidation of novel genes and

pathways that have crucial functions in vertebrate

development and in human disease. It also provides a

high-quality reference sequence for studies of basic

biology and for use by commercial fisheries.

AcknowledgementsThe medaka genome project was jointly conducted by nearly 40

researchers, including seven principal investigators in Japanese

laboratories of the University of Tokyo, National Institute of

Genetics, National Institute of Informatics, RIKEN, Keio

University and Niigata University. Here, we would like to

thank again all these people for their devoted contribution to the

project.

References1. Ishikawa Y. Medakafish as a model system for vertebrate

developmental genetics. Bioessays 2000;22:487–95.

2. Wittbrodt J, Shima A, Schartl M. Medaka-a modelorganism from the far East. Nat RevGenet 2002;3:53–64.

3. Shima A, Mitani H. Medaka as a research organism: past,present and future. Mech Dev 2004;121:599–604.

4. Halpern M. Medaka special issue. Mech Dev 2004;121:593–1008.

5. Naruse K, Fukamachi S, Mitani H, et al. A detailed linkagemap of medaka, Oryzias latipes: comparative genomics andgenome evolution. Genetics 2000;154:1773–84.

Key Points� The medaka genome sequencing project achieved to obtain a

high-quality draft genome sequence.� Medaka genome resources are available online.� Sex chromosome differentiation was analyzed at the nucleotide

sequence level.� Proto-karyotypes of teleost and their chromosomal history

were analyzed.� Medaka gene evolution was assessed using medaka gene catalo-

gue that are predictedbyde novo.

424 Kobayashi and Takeda

Page 11: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

6. Naruse K, Hori H, Shimizu N, et al. Medaka genomics: abridge between mutant phenotype and gene function. MechDev 2004;121:619–28.

7. Sasaki T, Shimizu A, Ishikawa SK, et al. The DNAsequence of medaka chromosome LG22. Genomics 2007;89:124–33.

8. Patyna PJ, Davi RA, Parkerton TF, et al. A proposedmultigeneration protocol for Japanese medaka (Oryziaslatipes) to evaluate effects of endocrine disruptors. SciTotalEnviron 1999;233:211–20.

9. Aida T. On the inheritance of color in a fresh-waterEsh, Aplocheilus latipes temmick and schlegel, withspecial reference to sex-linked inheritance. Genetics 1921;6:554–73.

10. Yamamoto T. Artificially induced sex-reversal in genotypicmales of the medaka (Oryzias latipes). J Exp Zool 1953;123:571–94.

11. Matsuda M, Nagahama Y, Shinomiya A, et al. DMY is aY-specific DM-domain gene required for male develop-ment in the medaka fish. Nature 2002;417:559–63.

12. Aparicio S, Chapman J, Stupka E, et al. Whole-genomeshotgun assembly and analysis of the genome of Fugurubripes. Science 2002;297:1301–10.

13. Jaillon O, Aury J-M, Brunet F, et al. Genome duplication inthe teleost fish Tetraodon nigroviridis reveals the earlyvertebrate proto-karyotype. Nature 2004;431:946–57.

14. Yamanoue Y, Miya M, Inoue JG, et al. The mitochondrialgenome of spotted green pufferfish Tetraodon nigroviridis(Teleostei: Tetraodontiformes) and divergence time estima-tion among model organisms in fishes. Genes Genet Syst2006;81:29–39.

15. Hyodo-Taguchi Y, Egami N. Establishment of inbredstrains of the medaka Oryzias latipes and the usefulnessof the strains for biomedical research. Zool Sci 1985;2:305–16.

16. Hyodo-Taguchi Y. Inbred strains of the medaka, Oryziaslatipes. Fish BiolJMedaka 1990;8:11–4.

17. Furutani-Seiki M, Sasado T, Morinaga C, et al. A systematicgenome-wide screen for mutations affecting organo-genesis in Medaka, Oryzias latipes. Mech Dev 2004;121:647–58.

18. Yokoi H, Shimada A, Carl M, et al. Mutant analyses revealdifferent functions of fgfr1 in medaka and zebrafishdespite conserved ligand-receptor relationships. Dev Biol2007;304:326.

19. Kasahara M, Naruse K, Sasaki S, et al. The medaka draftgenome and insights into vertebrate genome evolution.Nature 2007;447:714–9.

20. Ahsan B, Kobayashi D, Yamada T, et al. UTGB/medaka:genomic resource database for medaka biology. NucleicAcidsRes 2008;36:D747–52.

21. Hashimoto S-i, Suzuki Y, Kasai Y, et al. 50-end SAGE forthe analysis of transcriptional start sites. Nat Biotech 2004;22:1146–9.

22. Kimura T, Yoshida K, Shimada A, et al. Genetic linkagemap of medaka with polymerase chain reaction lengthpolymorphisms. Gene 2005;363:24–31.

23. Matsuda M, Matsuda C, Hamaguchi S, et al. Identificationof the sex chromosomes of the medaka, Oryzias latipes, byfluorescence in situ hybridization. CytogenetCellGenet 1998;82:257–62.

24. Yamamoto T. Artificial induction of functional sex-reversalin genotypic females of the medaka (Oryzias latipes). J ExpZool 1958;137:227–63.

25. Kondo M, Nagao E, Mitani H, et al. Differences inrecombination frequencies during female and male meiosesof the sex chromosomes of the medaka, Oryzias latipes.Genet Res 2001;78:23–30.

26. Matsuda M, Satoyama S, Hamaguchi S, et al. Male-specificrestriction of recombination frequency in the sex chromo-somes of the medaka Oryzias latipes. Genet Res 1991;73:225–31.

27. Kondo M, Nanda I, Hornung U, et al. Evolutionary originof the medaka Y chromosome. Curr Biol 2004;14:1664–9.

28. Nanda I, Kondo M, Hornung U, et al. A duplicated copy ofDMRT1 in the sex-determining region of the Y chromo-some of the medaka, Oryzias latipes. ProcNatl Acad SciUSA2002;99:11778–83.

29. Kondo M, Nanda I, Hornung U, et al. Absence of thecandidate male sex-determining gene dmrt1b(Y) of medakafrom other fish species. Curr Biol 2003;13:416–20.

30. Charlesworth B. Sex determination: primitive Y chromo-somes in fish. Curr Biol 2004;14:R745–7.

31. Peichel CL, Ross JA, Matson CK, et al. The master sex-determination locus in threespine sticklebacks is on anascent Y chromosome. Curr Biol 2004;14:1416–24.

32. CSAAC. Initial sequence of the chimpanzee genome andcomparison with the human genome. Nature 2005;437:69–87.

33. Hurst LD. The Ka/Ks ratio: diagnosing the form ofsequence evolution. Trends Genet 2002;18:486–7.

34. Wong GK, Liu B, Wang J, etal. A genetic variation map forchicken with 2.8 million single-nucleotide polymorphisms.Nature 2004;432:717–22.

35. Naruse K, Tanaka M, Mita K, et al. A medaka gene map:the trace of ancestral vertebrate proto-chromosomesrevealed by comparative gene mapping. Genome Res 2004;14:820–8.

36. Woods IG, Wilson C, Friedlander B, et al. The zebrafishgene map defines ancestral vertebrate chromosomes.Genome Res 2005;15:1307–14.

37. Nakatani Y, Takeda H, Kohara Y, et al. Reconstruction ofthe vertebrate ancestral genome reveals dynamic genomereorganization in early vertebrates. Genome Res 2007;17:1254–65.

38. Sarropoulou E, Nousdili D, Magoulas A, et al. Linking thegenomes of nonmodel teleosts through comparativegenomics. Mar Biotechnol 2008;10:227–33.

39. Clark AG, Eisen MB, Smith DR, et al. Evolution of genesand genomes on the Drosophila phylogeny. Nature 2007;450:203–18.

40. Stark A, Lin MF, Kheradpour P, et al. Discovery offunctional elements in 12 Drosophila genomes usingevolutionary signatures. Nature 2007;450:219–32.

41. Takehana Y, Naruse K, Sakaizumi M. Molecular phylogenyof the medaka fishes genus Oryzias (Beloniformes:Adrianichthyidae) based on nuclear and mitochondrialDNA sequences. Mol Phylogenet Evol 2005;36:417–28.

42. Saito D, Morinaga C, Aoki Y, et al. Proliferation of germcells during gonadal sex differentiation in medaka: insightsfrom germ cell-depleted mutant zenzai. Dev Biol 2007;310:280–90.

Medaka genome project 425

Page 12: Medaka genome project - pdfs.semanticscholar.org...Medaka genome project Daisuke Kobayashi and Hiroyuki Takeda Advance Access publication date 4 October 2008 Abstract The medaka Oryzias

43. Sasado T, Yasuoka A, Abe K, et al. Distinct contributions ofCXCR4b and CXCR7/RDC1 receptor systems in regula-tion of PGC migration revealed by medaka mutants kazuraand yanagi. Dev Biol; 320:328–39.

44. Morinaga C, Saito D, Nakamura S, et al. The hotei mutationof medaka in the anti-Mullerian hormone receptor causesthe dysregulation of germ cell and sexual development. ProcNatl Acad Sci 2007;104:9691–6.

45. Takashima S, Shimada A, Kobayashi D, et al. Phenotypicanalysis of a novel chordin mutant in medaka. Dev Dyn2007;236:2298–310.

46. Yu J-F, Fukamachi S, Mitani H, et al. Reduced expressionof vps11 causes less pigmentation in medaka, Oryzias latipes.Pigment Cell Res 2006;19:628–34.

426 Kobayashi and Takeda