kazusamart: 植物関連生物...

12
KazusaMart: 植物関連生物 ゲノムデータの統合と管理 ○藤澤貴智1),中尾 光輝2),岡本忍2),中村保一3) 1)かずさDNA研究所,2)情報・システム研究機構 ライフサイエンス統合 データベースセンター,3)DDBJ

Upload: others

Post on 20-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

KazusaMart: 植物関連生物ゲノムデータの統合と管理

○藤澤貴智1),中尾 光輝2),岡本忍2),中村保一3) 1)かずさDNA研究所,2)情報・システム研究機構 ライフサイエンス統合データベースセンター,3)DDBJ

Page 2: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

Background  バイオ関連プロジェクトのゲノム分野において、データベースの作成・公開は研究成果を広く公開し、今後の様々な用途に活用されるためのデータを提供する手段として重要な意義を持っている。また、データベースは作成するのみではなく、データの追加・更新など作成した後も維持・継続していく必要がある。

 植物ゲノムは、一般的に動物と比較してゲノムサイズが大きく、またリファレンスとなる生物種が限定されている。そのため、研究者はEST解析やマーカー解析などから得られた実験データを手がかりに既存のゲノム情報を有効に活用する必要がある。植物ゲノム情報の統合化は非常に重要であり、理研、かずさ、農水ははじめとした国内の植物微生物ゲノム(スケール)情報の統合的な検索と閲覧環境の構築を目指している。 

http://www.biomart.org/

Page 3: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

BioMart  B i o M a r t は E B I ( E u r o p e a n Bioinformatics Institute)と CSHL(the Cold Spring Harbor Laboratory)が共同開発したクエリー指向型のデータ管理システムである。 様々なタイプのデータを扱うことが可能で、多様なデータを公開するためのデータベース作成機能を持っている。ユーザーはWebブラウザ上で、ゲノム情報を反映した細かい検索条件でデータを収集して閲覧でき、簡便な方法でダウンロードすることが可能である。また、B i o c l i p s e 、 b i o m a R t (BioConductor)、Cytoscape、Galaxy、Taverna、WebLabなどの多数のソフトウェアからアクセスが可能でありデータの再利用性が高い。

Page 4: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

BioMartによる植物ゲノム情報統合

Intra-org.

genome infogenome resource infoexperimental info...

org. 1gene A’project A

genome infogenome resource infoexperimental info...

org. 1gene A

ref.

IDs

Page 5: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

BioMartによる植物ゲノム情報統合

Arabidopsisgene A

genome infogenome resource infoexperimental info...

Arabidopsisgene A’RIKEN TIAR

IDs

relations

Intra-org. Inter-org.

genome infogenome resource infoexperimental info...

Ricegene B

RAP

genome infogenome resource infoexperimental info...

genome infogenome resource infoexperimental info...

Lotusgene CKazusa

genome infogenome resource infoexperimental info...

Grassgene DGramene

Reference

Page 6: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

Genome Database at KDRI

JOURNAL OF BACTERIOLOGY, Apr. 2007, p. 2750–2758 Vol. 189, No. 70021-9193/07/$08.00!0 doi:10.1128/JB.01903-06Copyright © 2007, American Society for Microbiology. All Rights Reserved.

Coordinated High-Light Response of Genes Encoding Subunits ofPhotosystem I Is Achieved by AT-Rich Upstream Sequences in

the Cyanobacterium Synechocystis sp. Strain PCC 6803!

Masayuki Muramatsu† and Yukako Hihara*Department of Biochemistry and Molecular Biology, Faculty of Science, Saitama University, 255 Shimo-okubo,

Saitama 338-8570, Japan

Received 16 December 2006/Accepted 23 January 2007

Genes encoding subunits of photosystem I (PSI genes) in the cyanobacterium Synechocystis sp. strain PCC6803 are actively transcribed under low-light conditions, whereas their transcription is coordinately andrapidly down-regulated upon the shift to high-light conditions. In order to identify the molecular mechanismof the coordinated high-light response, we searched for common light-responsive elements in the promoterregion of PSI genes. First, the precise architecture of the psaD promoter was determined and compared withthe previously identified structure of the psaAB promoter. One of two promoters of the psaAB genes (P1) andof the psaD gene (P2) possessed an AT-rich light-responsive element located just upstream of the basalpromoter region. These sequences enhanced the basal promoter activity under low-light conditions, and theiractivity was transiently suppressed upon the shift to high-light conditions. Subsequent analysis of psaC, psaE,psaK1, and psaLI promoters revealed that their light response was also achieved by AT-rich sequences locatedat the !70 to !46 region. These results clearly show that AT-rich upstream elements are responsible for thecoordinated high-light response of PSI genes dispersed throughout Synechocystis genome.

Photosynthetic organisms have ability to cope with thechanges in light environment by modulating both the structureand the function of the photosynthetic machinery (31, 59). Atypical example is the flexible control of the amounts of pho-tosystem (PS) and light-harvesting antenna complexes depend-ing on the availability of light energy (4, 27, 38). Under light-limiting conditions, the amount of these complexes is maintainedat high level, because maximal capture of light energy is re-quired to fulfill the energy demand of cells. Under light-satu-rating conditions, on the other hand, they are largely down-regulated since absorption of excess light energy tends to causethe generation of harmful reactive oxygen species (6).

The dynamics of reaction center complexes during the pro-cess of high-light (HL) acclimation have been well character-ized in cyanobacteria. Amount of PSI is more strictly down-regulated than that of PSII upon the exposure to HL (28, 40).The analysis of the pmgA mutant deficient in down-regulationof PSI content revealed that the selective repression of PSI isessential for growth under continuous HL conditions (28, 54).Although the primary determinant of PSI content under HLconditions has not been identified, transcriptional regulation islikely to be one of the important factors. The cyanobacterialPSI complex is comprised of about 11 subunits, with someexceptions (23), and genes encoding these subunits (PSI genes)are dispersed throughout the genome. In Synechocystis sp.

strain PCC 6803, PSI genes are actively transcribed underlow-light (LL) conditions, whereas their transcription is coor-dinately and rapidly down-regulated upon the shift to HL con-ditions (26, 29, 30, 42, 57), except for the psaK2 gene encodingan HL-inducible isoform of the PsaK subunit (19). PSI tran-scripts become barely detectable within 1 h of HL exposureand then gradually reaccumulate after 3 h. The change inpromoter activities of PSI genes is well coincident with thechange in transcript levels (42, 43), suggesting that the coor-dinated light response of PSI genes is achieved at the level oftranscriptional regulation. In the course of HL acclimation,cells need to activate genes related to several processes such asCO2 fixation, protection from photoinhibition, and generalstress management (29). The down-regulation of high pro-moter activities of PSI genes upon the shift to HL conditionsmay be important not only for the repression of PSI content,but also for the recruitment of RNA polymerases to activetranscription of such HL-inducible genes.

As the first step for the elucidation of the molecular mech-anism of coordinated HL response of PSI genes in Synecho-cystis sp. strain PCC 6803, we recently dissected the promoterarchitecture of the psaAB genes encoding reaction center sub-units (43). The psaAB genes have two promoters, P1 and P2,both of which are responsible for the photon flux density-dependent transcription. Deletion analysis of the upstreamregion of psaAB fused to bacterial luciferase reporter genes(luxAB) indicated that the light responses of P1 and P2 areachieved in different manners. The cis element required for thelight response of P1, designated as PE1, was located just up-stream of the "35 element of P1 and was comprised of AT-richsequence. PE1 activated P1 under LL conditions, and thedown-regulation of P1 was achieved by rapid inactivation ofPE1 upon the shift to HL conditions. On the other hand, thecis element required for the light response of P2, designated as

* Corresponding author. Mailing address: Department of Biochem-istry and Molecular Biology, Faculty of Science, Saitama University,255 Shimo-okubo, Saitama 338-8570, Japan. Phone: 81-48-858-3396.Fax: 81-48-858-3384. E-mail: [email protected].

† Present address: Department of Integrated Biosciences, GraduateSchool of Frontier Sciences, University of Tokyo, Kashiwa, Chiba277-8562, Japan.

! Published ahead of print on 2 February 2007.

2750

by o

n S

ep

tem

be

r 7, 2

00

7

jb.a

sm

.org

Do

wn

loa

de

d fro

m

the analysis since the arrangement of regulatory elements fortwo overlapping promoter is difficult to predict without precisepromoter analysis.

Effect of the !70 to !46 region on the promoter activity ofPSI genes. Figure 4A shows the bioluminescence level of Syn-echocystis cells harboring PSI promoter-luxAB reporter geneswith or without the upstream region under LL conditions. Thereporter activities of the downstream promoter fragmentsalone were generally low, but there existed some differencesamong them. For example, the promoter activity of the down-stream region was low in the case of psaD [(8.2 ! 0.2) " 105

relative units/OD730] and psaK1 [(2.3 ! 0.4) " 105 relativeunits/OD730], whereas that of psaLI was significantly high [(1.1 !0.2) " 107 relative units/OD730]. It is possible that the highactivity of the psaLI promoter is brought about by a positiveregulatory element located within the downstream region.When the AT-rich upstream region was added, each promoterdisplayed much higher activity compared with the correspond-ing derivative containing only the downstream region. Thisdemonstrates that the #70 to #46 region can work as a posi-

tive regulatory element for every PSI gene examined here. Thelow activity of psaD and psaK1 promoters was largely up-regulated in the presence of the upstream region by 40.1- and99.9-fold, respectively. On the other hand, strong promoteractivity of psaLI was not enhanced as much by the upstreamregion (5.1-fold). As a result, similar promoter activity (around5.0 " 107 relative units/OD730) was attained among PSI genesirrespective of the activity of the downstream promoter region.

Next, we transformed E. coli cells with the above mentionedreporter constructs and measured the level of bioluminescenceto see whether the upstream region can work as a positiveregulatory element in E. coli cells (Fig. 4B). In all strainsharboring PSI promoter-luxAB constructs, the luminescencelevel was higher than that of the control cells having promot-erless luxAB genes [(1.4 ! 0.5) " 108 relative units/OD600],showing that PSI promoters can be recognized by RNA poly-merase of E. coli. The rank orders of promoter strength aresimilar in both Synechocystis and E. coli cells. Namely, theactivities of the downstream promoter fragments of psaD andpsaK1 were low [(2.4 ! 1.0) " 108 relative units/OD600 and

FIG. 3. Mapping of the 5$ ends of PSI transcripts. (A) Total RNA was isolated from the wild-type cells incubated under HL conditions for 0,1, 3, and 6 h and used for primer extension analysis of psaC, psaE, psaFJ, psaK1, and psaLI. Detected 5$ ends of the major transcripts are indicatedby asterisks, and those of minor ones are indicated by dots. (B) Nucleotide sequences of the core promoter and its upstream region of PSIpromoters. Transcriptional start points are shown in boldface letters. The promoters are aligned according to the major transcriptional start pointnoted as %1. Putative #35 and #10 hexamers are boxed. Light-responsive positive elements identified in psaAB and psaD promoters are shadedin gray. The nucleotides shown to be critical for the light response of psaAB promoter (43) are underlined. The numbers in parentheses shownabove the nucleotide sequence of the P2 promoter of psaFJ indicate the position according to the major transcriptional start point of the P1promoter.

2754 MURAMATSU AND HIHARA J. BACTERIOL.

by o

n S

ep

tem

be

r 7, 2

00

7

jb.a

sm

.org

Do

wn

loa

de

d fro

m

mutant

Gene Indexing

genomic contents

gene

motif

homologs

external links

transcript

annotation

データ変換(配列、GFF3)

データ変換(ブックマーク)

相互リンク

データ入力

移行

Legacy DB

mutant info

publication

かずさGenome DBおよびKazusa Annotationの光合成細菌ゲノムデータを利用し、遺伝子IDによってデータ統合を行った。データの互換性を保持し、 定期的な更新データの変換によってデータ管理、運用が可能である

KazusaMart

Page 7: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

How to use KazusaMart

2. Datasetを選択3. Attributesを選択

4. Filtersを選択

5. “Count”でデータ件数を確認

6. “Result”で結果(の一部)を確認

1. Databaseを選択

7. 配列やテーブル形式 (HTML, CSV, TSV, XLS)のデータをダウンロード

Page 8: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

簡単な検索例

SynechocystisのChromosome上200000から1000000の領域の遺伝子のリスト

SynechocystisのChromosome上200000から1000000の領域の遺伝子のアミノ酸配列

Page 9: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

複雑な検索(例)

1. キーワードABCでヒットする遺伝子のうち、SMART:SM00382にヒットする遺伝子の上流200bpの塩基配列を取得する

2. Anabaena, Thermo, GloeobacterにIdentities 50%以上のOrthologが存在するが、Chlorobiumに存在しない遺伝子のリストを取得する

3. データベース上のgene descriptionが”hypothetical protein”のうち文献に出現してかつ文献中に遺伝子名が記述された遺伝子リストを取得する

4. Gene Ontology: Cellular ComponentのGOをもちかつ完全な遺伝子破壊株が報告されていない遺伝子のリストを取得する

Synechocystis sp. PCC 6803において

Page 10: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

理研オミックスデータとの連携

TAIR8の遺伝子とIDを介して理研の実験情報が統合されている(試験導入)

リンク関係

データ変換 (RDF)

Page 11: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

Summary

• ゲノム情報の統合のケーススタディとして植物関連微生物である光合成細菌34生物種のゲノムデータを用いてBioMartデータベースを作成し、絞り込み検索フィルタのデザインを行った

• ソーシャルブックマークとしてKazusaAnnotationに集積された構造化されたゲノムアノテーション情報 (Gene Indexing) およびMutant情報 (CyanoGene)を遺伝子IDによってデータ統合を行った

• KazuusaMart (β版) として公開し、試験運用を開始した

http://mart.kazusa.or.jp/

Page 12: KazusaMart: 植物関連生物 ゲノムデータの統合と管理lifesciencedb.jp/symposium2009/poster/A-23.pdf · annotation データ変換(配列、GFF3) データ変換(ブックマーク)

Feature works

• 植物共生する根粒菌16種データ公開を行う• 理研シロイヌナズナデータのAttribute、filterの設計と導入および農水RAP、かずさミヤコグサゲノムデータの導入を行う