gabipd: the gabi primary database - a plant integrative

U n i v e r s i t ä t P o t s d a m

Diego Mauricio Riano-Pachon, Axel Nagel, Jost Neigenfind,Robert Wagner, Rico Basekow, Elke Weber, Bernd Muller-Rober,Svenja Diehl, Birgit Kersten

GabiPD : the GABI primary database - aplant integrative omicsdatabase

first published in:Nucleic acids research 37 (2009), Database issue,S. D954 - D959DOI: 10.1093/nar/gkn611

Postprint published at the Institutional Repository of the Potsdam University:In: Postprints der Universitat PotsdamMathematisch-Naturwissenschaftliche Reihe ; 137http://opus.kobv.de/ubp/volltexte/2010/4507/http://nbn-resolving.de/urn:nbn:de:kobv:517-opus-45075

Postprints der Universitat PotsdamMathematisch-Naturwissenschaftliche Reihe ; 137

D954-D959 Nucleic Acids Research, 2009, Vo l. 37, Database issue doi: 10. 1093 /nar/gkn611

Published on line 23 September 2008

GabiPD: the GABI primary database-a plant integrative 'omics' databaste Diego Mauricio Riano-Pachon1

,2, Axe I Nagel\ Jost Neigenfind\ Robert Wagner\ Rico Basekow\ Elke Weber\ Bernd Mueller-Boeber2

, Svenja Diehl3 and Birgit Kersten 1,2 ,*

1GabiPD team, Bioinformat ics group, Max Planck InstitutE~ of Molecular Plant Physiology, Am MOhlenberg 1, 2Department of Molecular Biology, University of Potsdam, Karl-Liebknecht-Strasse 24-25, Haus 20, 14476 Potsdam-Golm and 3Former RZPD German Resource Center for Genome Research GmbH, Berlin, Germany

Received August 6, 2008; Accepted September 9, 2008

ABSTRACT

The GABI Primary Database, GabiPD (http:// www.gabipd.org/), was established in the frame of the German initiative for Genome Analysis of the Plant Biological System (GAB I). The goal of GabiPD is to collect, integrate, analyze and visualize primary information from GABI projects. GabiPD constitutes a repository and analysis platform for a wide array of heterogeneous data from high-throughput experiments in several plant species. Data from different 'omics' fronts are incorporated (Le. genomics, transcriptomics, proteomics and metabolomics), originating from 14 different model or crop species. We have developed the concept of GreenCards for textbased retrieval of all data types in GabiPD (e.g. clones, genes, mutant lines). All data types point to a central Gene GreenCard, where gene information is integrated from genome projects or NCBI UniGene sets. The centralized Gene GreenCard allows visualizing ESTs aligned to annotated transcripts as well as displaying identified protein domains and gene structure. Moreover, GabiPD makes available interactive genetic maps from potato and barley, and protein 2DE gels from Arabidopsis thaliana and Brassica napus. Gene expression and metabolic-profiling data can be visualized through MapManWeb. By the integration of complex data in a framework of existing knowledge, GabiPD provides new insights and allows for new interpretations of the data.

INTRODUCTION

Experimental studies in the post-genomic era generate a very large amount of data from high-throughput experiments on biological systems. Current studies include,

among others , expression and metabolite profiles , proteome and interaction data (e .g. DNA- protein and protein- protein interactions), collected at different space and time scales . This increasing flow of data requires computa tional systems that, besides managing efficiently the enormous quantity of data , are capable of integrating and displaying these disparate data collections in a meaningful and user-friendly way. We have developed the GABI primary database, GabiPD , in order to fulfill these requirements .

GabiPD is a web-accessible database that was developed in the frame of the German initiative for Genome Analysis of the Plant Biological System (Genomanalyse im biologischen System Pflanze, GABI). GabiPD allows a seamless integration of varied 'omics' data types obtained from plant systems and will follow the MIAME (1) and MIAMET (2) standards for storing gene expression and metabolic-profiling data , respectively . Its flexible design allows for a high level of data integration, and eases cross-referencing the different GabiPD data types among each other (e.g. mapping information, sequences and single nucleotide polymorphisms (SNP), 2DE gel images and protein information) and access to public gene/protein-specific information, which in turn provides the users a comprehensive overview of the available information for their particular gene or protein of interest. The integration with genome databases like TAIR (3) and general nucleotide databases like GenBank, as well as cross-links to secondary databases, such as ARAMEMNON (4), PlnTFDB (5), GABI-KAT (6), PhosPhAt (7) and ProMEX (8) further increase the usefulness of GabiPD.

METHODS AND CONTENTS

Design and implementation

GabiPD's web interface was developed using Ped and Java in combination with template processing to separate the visualization from the application logic. Our

' To whom co rrespo ndence should be addressed . Tel: + 49-(0)33 1-5 67-8750; Fax: + 49-(0)33 1-567-898-750; Email: kersten@mpimp-go lm.mpg.de

© 2008 The Author(sl This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creati vecommons.org/licenses/ by-nc/2.0/uk/l which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

o o :E ~

0-Ol D.. CD D..

a 3

o

applications are database driven , which means that the application interface logic is derived directly from the database structure (shown in blue in Figure 1). To achieve this , we deploy reverse engineering methods in combination with template processing to generate interfaces to programming languages like Perl or Java , thus supporting all the database-specific actions like 'insert' , 'update' , 'delete' or ' select' . These object-oriented interfaces are automatically generated from the database schema, supporting inheritance, automated key generation and advanced exception handling. The separation of the database application interface from the application and visualization logic (shown in yellow in Figure 1) facilitates fast adjustment to modifications of the data structure and diminishes the efforts on fixing existing application logic during larger database changes.

GabiPD content and gene-centric views

Currently, GabiPD includes data originating from 14 different angiosperm species representing the most important lineages in the flowering plants (Figure 2). Arabidopsis thaliana is the most widely represented model species, followed by the crop plants Solanum tuberosum (potato) and Hordeum vulgare (barley). In GabiPD, genomic, transcriptomic, proteomic and metabolomic data are integrated from those species. Genomic data comprise mapping information, sequences and SNP/InDel information. Transcriptomics is represented by a large number of ESTs and corresponding sequence trace fi les . ESTs are further analysed by BLAST and ORF analysis. For barley, in addition, EST clustering results and corresponding information on a new 27K unigene set is accessible and downloadable . As a type of proteomic data , annotated 2DE gel images from Arabidopsis thaliana and Brassica napus are integrated . Moreover, transcript and metabolite-profiling data are provided via MapManWeb.

~ Templates J read 4. ,-------,

I API Generator read r "--------' 1 generate

Schema

Data

Database

Web Interface

GreenCards

BioTools

YAMB

Web Services

~L.2:::1 = BiO=MOb:::::::y ===::::J11

D meta information and api generation

D data flow and data processing

Figure 1. Schematic overview of the Ga biPD structure. The 'API Generator' transla tes the 'Templates' using the database metainformation (shown in blue), generating the database application interface (Java and Per! API ). The main applications, i.e. 'Web Interface', 'Web Services' and data ma nipulation routines, interacts with the 'Database' through the database application interface (shown in yellow).

Nucleic Acids Research, 2009, Vol, 37, Database issue D955

Most entries of all GabiPD data types are pointing to the central Gene GreenCard and vice versa (see Figure 3 and next section). In the Gene GreenCard, gene information from genome annotation projects or NCBI UniGene sets is integrated and useful links to secondary databases are provided . Currently, the genome annotation (TAIR version 7.0) for A . thaliana (3) is integrated . Annotations for other sequenced species will follow . In order to ease the transfer of knowledge from sequenced to nonsequenced species, i.e. crop plants , we have performed similarity-based mappings between closely related species, i.e. Arabidopsis and Brassica spp.

Querying the database

We have developed the concept of 'GreenCards' as a central entry-point for text-based data queries and visualization, which grant public as well as credentials-based access to the integrated data in GabiPD. 'GreenCards' enable users to comprehensively query GabiPD by genotype name, marker or gene name, keyword or GenBank sequence accession number. Searches can be restricted to selected species or data types, while wildcards can be used to broaden the scope of the query. The result of a 'GreenCard' search is presented as a list of hits with links to complete descriptions , i.e. GreenCards. Figure 3 shows an example of this type of search, where the user had entered the gene name 'FLOWERIN G LOC US T ' as a search term. This search retrieves , among others , an

Populus trichocarpa*

11 Salicaceae

Populus tremula Salicaceae

Medicago truncatula '" Fabaceae '0 'Vi

Arabidopsis thaliana

1] 0 0::

Brassicaceae

Brassica napus '6 Brassicaceae .~

Vitis vinifera * '0 ::l

'" Vitaceae I.U E ~ a;

Solanum tuberosum 0 Q. 0 '" Solan aceae 0

'0,

Solanum lycopersicum c

'" «

Solanaceae '0

Capsicum annuum 2 '"

Solanaceae «

'" Nicotiana tabacum Q)

Solanaceae ~ >-

Beta vulgaris .!::

1 fr Amaranthaceae ~

m Hordeum vulgare 0

IJ Poaceae

Oryza sativa * Poaceae

Figure 2. Phylogenetic tree depicting the evolutionary relationships among the species represented in GabiPD (15- 17) . Species for which whole-genome sequences and annotations are available a re shown in blue. Asterisk indicates species that will soon be integrated in G abiPD. Species not shown: S . bulbocastanum, S. demissum, S. phureja and S. spegazzinii.

o o :E ~

0' Ol D.. CD D..

a 3

o

D956 Nucleic Acids Research, 2009, Vol . 37, Database issue

Arabidopsis 'Gene GreenCard' (gene: AT I G65480.1) corresponding to the genome annotation project and 'Plant GreenCards' representing T-DNA insertion lines, e.g. plant 290E08, with flanking sequences that have BLAST hits to the gene ATlG65480.1 and with seeds available from GABI-KAT (6). Moreover, several 'Clone GreenCards' of cDNA clones (e.g. clone: MPMGp2011E01215) in which the keyword is found were retrieved by the search. A more strict relationship between the Gene GreenCard and the Plant and Clone GreenCards is established by similarity-based searches. The best BLAST hit of the sequence, e.g. representing relationship to a cDNA clone or a mutant plant line, appears

seaICh!allspeCIl!S

101 1R..O'.VERlNG LOCUS T

Functional annotation

Related clones

Identified in 2DE gels

External resources

Sequence features

.....

Resu~sllCrl"'!je: 1020 50 jOO?00500

TA!~

CellefPlJI'owser

Pho<;~'

plnTFOB

Sequl'llce:AI1G6S4BO.1

$Ullllllfl erof seIIUIlfM;e:~

length: 855

(jenesh"uCU_e - UTR - la rltodOl, _ ,Iu~ J(m --I! re,"

CloneallgnlllllnlS - fnr\fl3td - rl'Vl'lSe - ~IOf)

AfAllTTUA G..UGCI .. c..u..A (,AJ.J..AJ.fllC r.u..u.r.A'::.u ACUTCAACA O .. CA(;A..U.C(, ACCTCTtTCT TcAAc ... r c ....... A(;ATCT':TAT AAATATAACA CACC~TCtTA l ACTMCCA(; AGTTGTTCGII. CACGTl(,YTC ATCCGTn.u lACAfCIlATr AC'FTA.U.GG TTACTTATGG CCA.UGAG'AG GTGACU.U r, GC" TT G(;UCT AACGCCTTCT CAGGTTCA.U. ACllGCCUC AGTTCAGA7TG<;TGGAr;.u.GAC(,TCAGG.u(,TTCTATACJTTC;GnJ.TI;GT~TC(,Ar;A

TGTTCCAACT CC lAGC.ucc CTCACCTCCC ACMJA'I'CTC CA TTGG Tl (;G TG-ACT(;AUT CCCTGCT~CA At;TCCU.CAA C{rTTCCCM TGAGATTCTC TGrTACC.v..+. ATCCAAC:TCC CACTGCA.;GA ATTCATCC TC TCC:TGHTAT A!'l'GTTTCCA CACCUG'C!.'"A C:(;'cAAACACT GTATCCACCA C(;(; TGCCCC~ AGMCrtCM CACTC(;cCA C TTI~TG"'~A tCUC ...... TCI CCGCCITCCC GT"G'cr(;("A" TTTTt;TACAA HCTCA .. "'';'' CA,,"GT"c:rT c;rC;';A(;l;;A.lC; MCAC TTIAe: ATGC:CTTC TT CC rTUYMC CM HCATAT tC:CATAnCT GATCAe:ATrT

ACTATTTTAA TTTMTAACC ATTTTATGJlT AC('~CTA.M·C MCCCTCATC tAGTT:llIA tAtllGTCtC TllT.UUAt CA·;A.CCCGCA C.;.u..uTCA.C TTATATAU~ t"r·;AT"C"A TMTTATATT M1CTAO.TG .v..+.TC;UC;TC;

in the section 'Related with' , and the users can go from the clone or the plant line to the associated gene or vice versa.

Furthermore, the 'Gene GreenCard ' , which displays information from genome annotation projects and NCBI UniGene sets, has been extended to include links to secondary databases, such as ARAMEMNON (4) , GABIKAT (6) and ProMEX (8) . Additionally, schematic representations of gene sequence features are provided to highlight protein domains identified using the latest PFAM library (9), exon-exon borders and untranslated regions (UTRs) identified by the genome annotation projects (Figure 3). These features are displayed onto a representation of the cDNA sequence.

::J ----

-~~.:hrI·~·~;:: ~. . ..: -=.:..~ .- . ......

' "

Figure 3. Example of a keyword search using GreenCards. (A) The user had performed a search for the keywords 'FLOWERING LOCUS T' , which retrieves links to the GreenCards of Genes (genome annotation projects), C lones (ESTs) and Plants (mutant plant lines). (B) Display of the Gene GreenCard, corresponding to the A rabidopsis annotated gene ATlG65480.l Here, the users find high confidence matching EST sequences displayed in the 'Related with ' section. Sequence features and the sequences themselves are displayed as well. The selected gene has a matching EST (Clone: MPMGp201IEOI2l 5), this Clone GreenCard is shown in (C) and links back to the Gene GreenCard a nd to the original EST trace file, displayed by JTrev (D). A protein spot in rapeseed (B. napus) has been identified by 2DE/MS as the protein encoded by the retrieved gene, and a link directs the user to the 2DE gel image with the identified protein spot highlighted with blue cross-hair (E). The spot identified links to a description of the protein (F) that provides links to the origina l Gene GreenCard.

o o :E ~

0-Ol D.. CD D..

a 3

o

Nucleic Acids Research, 2009, Vol, 37, Database issue D957

Alternatively, users can enter their own amino acid or nucleotide sequence to identify, by a BLAST search (10), similar sequences integrated in GabiPD.

In addition to the GreenCard and BLAST search functionality, users can browse and search the genetic maps and 2DE gels stored in GabiPD, through specifically designed visualization tools: (i) 2DEGelViewer by which

2DE gel images can be viewed in an interactive way, which allows retrieving extra information on 2D spots as identified by mass spectrometry (Figure 3); (ii) genetic mapping data can be visualized using YAMB (Yet Another Map Browser; Figure 4) with the possibility to view details on all mapped elements (11), (iii) MapManWeb (Figure 5) allows the visualization

I BarleyChromosomeJ

~~

00

50

100

150

Consensus Stein2006

GBR0642~ G8R1184 GBR0077 GBR0093 ~~=~~g~~ GBR04S1/"

GBR0849

I Marker (RFLP): GBR1211

St. pto. x ""ex :::: ~~o J=--- - 11 b'.9 '/ mar) srJecies chr. m8PrJedas rJOsttlon subrmtter

~~~ ,onsensus Stelll2006 8arley 1 G8R1211 148 cm Or. Nrls Stern

~~ ~9R1019 Stleptoe x Morex 8a,ley I G8R1211 8 cm Or. Nils Stein

BR0702 4elated to Se(ltlence. correspondig to BioOhject: HY06E20 BR0706 9R0394 Se'tuence: HY06E20u BR02 18

----~G9R184S

submrtter of sequence: Dr. Wotfgang Michalek

length: 642

prinuug: 3-pnme

EBI·III: HVUL 11728

GenBankAc: AL510886

IPK eR·EST: HY06E20u

Selluence features and Clone alignments (BLAST):

0 100 200 300 I I I I

400 I

500 600 I I

Gene structure - UTR

Clone ahglllllents - forward

Protein domains - domain

- start codon - slop codon - - exon·exon border I - reverse ~ start ... stop

GBR184S-------J 200

c~~~~:=======~_

ClICk on sequence features 10 oblarn addrtronallllformatron

HY06E20'-l CCCGTTGHTA CGCCCTCCGA AGTACTCrCG ACTTTT!TTT 111111111T TTTTTTTGCA A.Ao_o\cn.TT TGTIIGTATT T.6011CCC.lI,.,.o\G TTTTATCTJI.G TATTGGATGG A..UCGCACAG

ATGATGGTGe ATGACACIT! ATTTTTCATT GCTACATCGA TACTCCAATC GAGTJI_o\CJ!.AT

GAAGGTCCAT CGATelAGTA TTAC}l_~.CTCT TGTCATTTCA TICGCCACeA GTGCCGCrAG

J CCATGCTACC A.A ... lt.TGGTGAG CA.~.TJ_Ii.GGTG GG.Io,.C .~.TGCAC AT TGCATAAC ATAGGAAGTG /1 G8R1154 TCTGAATCAC CAATCACCCA ACCACCTTGA ACTGGGCTTG TTGTIGAGGT AGGAC}U'JI.GG

F====---------'--....:.;;I-=----------=='1~_~~~~~~~_~~ ~~~!~~!~~~ ~~~~~~~~~~~ ~~!~~~~~~~ ~~~~~~!~~ !~~~~!~~~~ legeml

_ SSR marker

C A.O\TGGTTTG GATGCeICGC CATCGGG.ll.TT GCTCATTGAT ATTTGCTAGT TGTTGACJI_~.C

A.TTGTTGTCG CTTCAATTGG CL1o.CTGTTTT GTTGCGJo3·.GA TATGTGCTHC GGAGG}.t_lo.CGA

C·ATCACTHGA CTGG.~\GTAA TCGCTACCTT CTCCGGTCCA AG

_ Public Domarne Marker

_ RFLP marker 0 Twe PosH ion Orientation ,----, SNP marker W &;' __ ..:.. 1 ;... _________________________ -1 PRF 1 . 195: forward

PRF 2 . 97: forward rO~------------------1:-:0:::0:::'0----1 PRF 3 - 131: forward

1 1

RUS129Gl2w RUS32E02w

( RUS33Dl2w

RUS34A04w RUS43E03w

RUS43H08w ( S'i RUS44f1Ow

RU 31 E02w HY02nOu

( HYOJl.22u

HY06E20V

~6F02V HY06E20u l

HY06F02u I (

HY09F16V HV09F18u

I HYlOL22u (

HY'ODI 2V

r r

,

, ,

I ,

r

, ,

~ , r

PRF 131 ·256: fo,ward

PRF 282 · 142: reverse

PRF 434 ·526: forward

(lRF 465 · 533: forward

PRF 503.249: reverse

PRF 516 ·310: reverse

PRF 640 - 221: reverse

(lRF 642 ·556: reverse

Cluster IlIfo:

~:onti!l30438.1 (18 members,]

\new Blast Searches

I

I

60 120 180 240 300 360 420 480

5<0

600 642

Figure 4. Visualization of the genetic maps published by Stein et al. (IS) . (A) The region between OcM and 25 cM of barley chromosome I is shown in Y AMB; markers with SNPs are shown in light blue, markers with restriction fragment length polymorphisms (RFLPs) are shown in dark green . A selected marker is displayed in red , and links to the Marker GreenCard (B) , which contains information on a related EST therewith connecting genomic with transcriptomic information. With the EST description, cluster information is included (Contig3043S .I ) that links to the schematic representation of all ClusterContig members displayed onto the related consensus sequence (C) . ESTs that were selected from this ClusterContig as representatives for the new 27K barley unigene set are shown in turquoise.

o o :E ~

0' Ol D.. CD D..

a 3

o

D958 Nucleic Acids Research, 2009, Vol . 37, Database issue

Select all assay I. 0111 the listiJox ht!low (2 entries) :

_ 1. AraLer umreated-Aral er Senescence

? AraLer untreated-AIjlletJ:!eatTrfLated

1. Alaler _untreated_Araler _Senescence:

.uln-like CC AAT. IWl2

E2f·OP -.. .

ARR.B

"' .. H

HB bZI'

C2C2-(0 Wte C2C2 Dot' M .. OS

C2C2·Cala ClCZ.YA 08Y

... , ~-, .. ~cd

HAC

CC AA1 _ORt

.=

CPP

ElL

GRAS

GR'

HS'

lHcer -like

Pathway: I tlanscrip1Jon

Sample: 250597 _at (J~ffymetrix GeneChip: Arabidopsis enome ATH1 Array F'eature)

species: Arahirlo!lsfs lhaliana

tyJle: -...'( I dt~pe

~~ta

len{J1h: 948

Ilescfllltlon: Al5gU7b81l. f\!A~' (no ,~f)I(a eles 31792

"'SOS97_At tQ'Q':.tttqc ctcctQ'Q'ttt ta'1OjJtttcat cCAacaq ~tCCllc.a'Ja aqqttcttga cct:t:'J<'l'~ttC tcq9cta ,,"ca""')"'-9 Iioqcc .. t'i1'JII"' gtt:'1,:c ... t .. t "'''''''9C'''''

Gene: AT5G07680.1

AHelnatlW nl.lllle:li,TN.II.(.4 AlternatIVe name:~rabldopS1S NA. domain (ontaUlInIJ protein 79 Alte,nalrve nallle:".t.JAC079 AHmnalr.re nome:ATSG07680

ellot~lIle (GenotyJle)

species: ArabidQIlSIS thaliana

fyJIC: W1 ldtiJlE'

COmll1011l1l.1l1le: thalecres"

cuttrvar: COlumbia

~AC079f,A.NACOBorATNAC4 V\l.abluol)sIS NI\C dom,:,m conl.aulmg protein 79, Arabldupsr", NAC dom ain contamlng protein 80), transcriptIOn tartar (T,AIR 7 0)

'NAC079JANA.COSOfATN.A,''>t V\labldopsls Nt.,C domam contailllng ploleln 79, .A.r .. bidopsls NAC domain conlill!1ing prolein 80) Iranstrlp!lon fatlor similar 10 ANAC. tOOfATNAC5 (Arabldopsrs NAC. domain (onl3lnmg prCrlem 100), tranSCrIption factor [l\rabldoPsls Ulahanal (TAJR:ATSG61430 '1, $10111311

10 NA('5 protein IG~lne md:1 (G8 MX85982 1), onl3ms InlerPro domain t l o aorcal merlstem (NMI) prol eln. (InterPro IPR00344 P ~Map~lan)

.ttttctqtq tq .... gq .. t.q aaiolJ1: ...... cC':l .. ctqqtct MaflMan Annotation: ccqqtt .. tt q<l" ..... qc9.c .go'l9"aa'illlat aaago.q.t qtatqaa'la .... CaCttqt tt:t:cta,:aqa 9<;Jaaqagc

_999'=9at9C .tqa\tta'!aq q'!~ag .. ':q'Ja .. aactctc cta.a.gaa.tq a.at .. qOjJtg.t ctQc.a9';I..,ea tttcata CCJll.tttc:gll. C~'C3a'Ccc", aatC9~tct. taCQ'g3ac ctqattct.t caCCll.'!aclI.lI. cqa.caa.lkGCC _qBCCq

BIN 27.3.27 RI'lA.lel,lulatroo oflram.Ltlptlon NAC domain tr<lnscnptlon f<lc lol fdmr ly

BIN 27.3 RnA, reglll", [Jon ottranscrtpIJon

BIN 27 R~IA

~F=----.,--l'ecaaecAaq ctqaa.!l.~t.g agq.!l..!I.c,aflo>:a ctcaatt

BIN 33.99 de.elopmenll1nspeClfied

BIN 33 de'.elopmenl eCAtecaac ca\l'a'!tt:tct eca~at, .. at.t. cc.ctct.

:~::~:::: ::::;::~~ ~::::~:!~:: ::::::~~ Relaled wllh

,oI.,,,,,,,,",,,i-_..::..._raccgataact catcqgtttt t9aattl:999 ""~9aaqC_joiO~'~2 ~BO~'~(:D:",,:,:,,:ne:::=:=:::~~~~~ ________ ~~~

TYlle Poslllon DeSCflllltton \I OUgo 395 419: TCTCTGCTCACMCTTGCCT.A,AAA,C

... Ollgo 476 500: AGA.TCCCGATTICGACGTIAATCCG

• Ollgo 530 554: TACCACCTITAACTGATTCTTCAI.. C

L.-----.... --..... - ..... - ...... -------......... - ...... -I~: :~: :~~ -:~~~ ~:;~~:~~(~Z~~~.~~~:.~'t~ Ollgo 702 726; ACCTCAATCTCTCAA(IO,TTICCGAG Ollgo 717 741: (Il,TTTCC (;A.GAGTT(CAA.TC(,GGTI

Olrgo 810 834: Gt.,GCATCTCGCMGAAACCGGGGTT

OIlgo 8]0 854: GGGTTICGAAC,A.CCGA.TAACTCATC

Oliuo 88] 907; GA7CATCAA'c·MC·TECATC TCCCT

Ollgo 905 929: C(TCTICC'3G T({~GGTIGATCTTGA

Figure 5. Visualization of expression profile data in MapManWeb. (A) The Affymetrix® NASC Array experiment on programmed cell dea th in Arabidopsis is di splayed (NASCArrays reference number: 30). MapManWeb allows the visualization of expressed genes in different biological processes; here only probesets (i.e. genes) involved in transcription regulation a re shown. (B) Details for a strongly down-regulated probeset, with links to the related Gene GreenCard in A. thaliana . (C) The Gene GreenCard for the selected gene (ANAC79) links back to the probeset of the Affymetrix® ATHI array.

and extraction of relevant information from transcript and metabolite-profiling data and the graphical mapping of such data onto diagrams of metabolic pathways and other biological processes (12); and (iv) an extended version of JTrev (13) allows the display of sequence traces with integrated SNP information.

SATlotyper: a software tool designed for inferring ha plo types and phased genotypes from unphased SNP data for polyploid and polyallelic heterozygous populations (14).

The GabiPD project page serves as an additional gateway to specific data by providing project-specific views, such as BreedCAM or PoMaMo (11) where potato genomic data and Solanaceae function maps for pathogen resistance are accessible.

ADDITIONAL TOOLS AVAILABLE FROM GabiPD

In addition to the data and data visualization available from our site, the newest versions of the following tools are made available for download:

MapMan desktop version: a user-driven software tool that displays large datasets (e.g. gene expression data from Arabidopsis Affymetrix arrays) onto diagrams of biological processes, such as metabolic pathways (12).

FUTURE DIRECTIONS

The presentation of a wide spectrum of different plant species in GabiPD paves the way for cross-species comparisons that are facilitated by the availability of BLAST hits between the GabiPD sequences and plant NCBI UniGene sets. To ease the transfer of knowledge from sequenced to non-sequenced plant species, the genome annotations of Oryza sativa, Populus trichocarpa and Vitis vinifera will be added and mapped to closely related species in the near future . Furthermore, information about orthologous genes will be included for cross-species studies. Moreover, we will extend our WebServices to provide programmatic access to multiple data types for all plant species in GabiPD.

o o :E ~

Cl Ol D.. CD D..

a 3

o

ACKNOWLEDGEMENTS

The GabiPD team thanks the GABI and the WPG (Wirtschaftsverbund Pfianzengenomforschung GABI e.V.) community for providing data and supporting the continuous development of the database . Dr Bj6rn Usadel is acknowledged for helpful discussions and his support in the development of MapManWeb. We wish to thank Dr Patrick Schweitzer, Dr Lothar Altschmied, Dr Uwe Scholz and Dr Nils Stein for the collaboration in the generation of the new 27K barley unigene set and Dr Kathryn F. Beal for sharing the source code of the trace viewer JTrev. Ozgiir Demir and Sebastian K6hler are acknowledged for further developments of user interfaces and for data integration.

FUNDING

German Ministry for Education and Research (BMBF) (GABl I: 0312272, GABI 11: 0313112 and GABIFUTURE: 0315046); the former German Resource Center for Genome Research (RZPD) GmbH; Max Planck Society. Funding for open access charge: Max Planck Society.

Conflict of interest statement. None declared .

REFERENCES

I. Ba ll ,C.A. and Brazma,A (2006) MGED standards: work in progress . Omics, 10, 13S- 144.

2. Jenkins,H., Hardy,N ., Beckmann,M., Draper,1., Smith,AR., Taylor,J. , Fiehn,O. , Goodacre,R., Bino,R.J., Hall ,R. et al. (2004) A proposed framework for the description of plant meta bolomics experiments and their results. Nat. Biotechnol., 22, 1601 - 1606.

3. Swarbreck,D. , Wilks,C. , Lamesch,P. , Berardini ,T.Z. , G arcia-Hernandez,M. , Foerster,H. , Li ,D. , Meyer,T. , Muller,R. , Ploetz,L. et al. (200S) The Arabidopsis Information Resource (TAIR): gene structure and function annota tion. N ucleic A cids Res ., 36, DI009- DIOI4.

4. Schwacke,R. , Schneider,A. , van der Graa ff,E. , Fischer,K. , Catoni,E., D esimone,M., Frommer,W.B., Flugge,U .I. and Kunze,R. (2003) ARAMEMN ON, a novel da tabase for Arabidopsis integral membra ne proteins. Plant Physiol., 131, 16- 26.

Nucleic Acids Research, 2009, Vol. 37, Database issue D959

5. Riafio-Pach6n,D.M., Ruzicic,S., Dreyer,I. and Mueller-Roeber,B. (2007) PlnTFDB: an integra tive pla nt transcription factor da tabase . BMC Bioinformatics, 8, 42.

6. Li ,Y. , Rosso,M.G. , Viehoever,P. and Weisshaa r,B. (2007) GABIKat SimpleSearch: an Arahidopsis thaliana T-DNA mutant da tabase with deta iled information fo r confirmed insertions. N ucleic Acids R es., 35, D S74-D S7S .

7. Heazlewood,1.L. , Durek,P. , Hummel,1. , Selbig,1. , Weckwerth ,W. , Walther,D. and Schulze,W.X. (200S) PhosPhAt: a da tabase of phosphorylation sites in Arahidopsis thaliana a nd a plant-specific phosphorylation site predicto r. N ucleic Acids Res ., 36, DIOI5- DI02!.

S. Hummel,1., Niemann,M., Wienkoop,S., Schulze,W., Steinhauser,D., Selbig,J., Walther,D. and Weckwerth,W. (2007) ProMEX: a mass spectral reference da tabase for proteins and protein phosphorylation sites . BMC Bio informatics, 8, 216.

9. Finn,R.D. , Tate,1. , Mistry,1. , Coggill ,P.C. , Sammut,S.J. , Hotz ,H.R. , Ceric,G. , Forslund,K ., Eddy,S.R. , Sonnhammer,E.L. et al. (200S) The Pfam protein families database . N ucleic Acids R es., 36, D2SI- D2SS .

10. Altschul ,S.F. , Gish,W. , Miller,W. , Myers,E.W. and Lipman,D.J. (1 990) Basic loca l a lignment search too l. J. Mol. Bioi., 215, 403- 410.

11. Meyer,S. , N agel,A and Gebhardt,C. (2005) PoMaMo-a comprehensive da tabase for potato genome data . N ucleic A cids Res ., 33, D666- D670.

12. Usadel,B., Nagel,A., Thimm,O. , Redestig,H., Blaesing,O.E., Palacios-Rojas ,N. , Selbig,1. , Hannemann,1. , Piques ,M.C. , Steinhauser,D. et al. (2005) Extension of the visua liza tion tool MapMan to a llow sta tistica l analysis of a rrays , display of corresponding genes, and comparison with known responses. Plant Physiol., 138, 11 95- 1204.

13. Bonfield,1.K. , Bea l,K.F. , Betts,M.J. and Staden,R. (2002) Trev : a DNA trace editor and viewer. Bioinformatics, 18, 194- 195.

14. Neigenfind,1. , Gyetva i,G. , Basekow,R. , Diehl ,S. , Achenbach,U ., G ebhardt,C. , Selbig,1. and Kersten,B. (200S) Haplotype inference from unphased SN P data in heterozygous polyploids based on SAT. BMC Genomics, 9, 356.

IS. Soltis,P.S. and Soltis,D.E. (2004) The origin and diversifica tion of angiosperms. Am. J. Bot ., 91, 161 4- 1626.

16. Angiosperm Phylogeny Group. (2003) An update of the Angiosperm Phylogeny Group classifica tion for the orders and families of flowering pla nts: APG 11. Bot . J. £inn. Soc., 141 , 399-436.

17. Knapp,S. (2002) Tobacco to toma toes : a phylogenetic perspective on fruit diversity in the Sola naceae . J. Exp. Bot ., 53 , 2001 - 2022.

IS. Stein,N. , Prasad,M. , Scholz,U ., Thiel,T. , Zhang,H. , Wolf,M. , K ota,R. , Varshney,R.K. , Perovic,D. , Grosse,I. et al. (2007) A I,OOO-Ioci transcript map of the barley genome: new anchoring points fo r integra tive grass genomics. Theor. App l. Genet ., 114, S23- S39.

o o :E ~

0' Ol D.. CD D..

a 3

o

gabipd: the gabi primary database - a plant integrative

Documents