WP4 Current status. Paolo Romano & WP4 group. VII EBRCN GM, Berlin, 26-27/09/2004

Page 1: WP4 Current status

VII EBRCN GM, Berlin, 26-27/09/2004 1

WP4

Current status

Paolo Romano & WP4 group

Page 2: WP4 Current status

WP4 objectives

Improved accessibility and interconnection

• Links to external resources
  • Literature, sequence, and special-interest databases
• Extracted databases
  • Available at interested SRS sites
• Inventory of data and usage
  • Local and remote search, sites’ map

Page 3: WP4 Current status

Links to external resources

• Literature: Medline, Taxon
• Special interest: micro-organism images, plasmid maps
• Sequences: EMBL Data Library

Page 4: WP4 Current status

Links to Medline

Syntax: add [PMID: <number>] after the bibliographic reference

Links in place (> 7000):
• Plasmids: LMBP (375), NCCB (30)
• Cell lines: ICLC (294), DSMZ (905)
• Fungi: CBS (454)
• Yeasts: CBS (1132)
• Phages: NCCB (30)
• Literature reference file: DSMZ (3818)
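The [PMID: <number>] convention lends itself to automatic link generation. The following is a minimal Python sketch, assuming the tag appears literally in the reference text. The sample reference is invented, and the URL pattern is today's public PubMed address, which is not necessarily what the CABRI sites used in 2004.

```python
import re

# Matches the tag described on the slide: [PMID: <number>]
PMID_TAG = re.compile(r"\[PMID:\s*(\d+)\]")

# Assumption: current public PubMed URL pattern, used only for illustration.
PUBMED_URL = "https://pubmed.ncbi.nlm.nih.gov/{pmid}/"


def medline_links(reference: str) -> list[str]:
    """Return one PubMed URL per [PMID: n] tag found in a bibliographic reference."""
    return [PUBMED_URL.format(pmid=pmid) for pmid in PMID_TAG.findall(reference)]


# Invented reference, for illustration only.
sample = "Doe J. et al. (1999) An example paper. J. Example 1:1-10 [PMID: 1234567]"
print(medline_links(sample))
```

A reference without a tag simply yields an empty list, so the same function can be run over a whole catalogue without special cases.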

Page 5: WP4 Current status

Special interest databases

Plasmid maps:
• Syntax: new FDS field ‘External_links map <name>’
• Links in place: Plasmids: LMBP (777)

Images of micro-organisms:
• Syntax: new FDS field ‘External_links image <name>’
• Links in place: none (waiting for the next catalogue update)
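As a rough illustration of how such a field could be read back, here is a small Python sketch. It assumes the field appears on one line as the field name, the link kind (map or image) and the name, separated by whitespace. The actual flat-file layout is not given in the slides, and the sample value is invented.

```python
from typing import NamedTuple, Optional


class ExternalLink(NamedTuple):
    kind: str   # "map" or "image", as listed on the slide
    name: str   # the <name> part of the field


def parse_external_link(line: str) -> Optional[ExternalLink]:
    """Parse 'External_links <kind> <name>'; return None for any other line."""
    parts = line.split(None, 2)
    if len(parts) != 3 or parts[0] != "External_links":
        return None
    _, kind, name = parts
    if kind not in ("map", "image"):
        return None
    return ExternalLink(kind, name.strip())


# Invented value, for illustration only.
print(parse_external_link("External_links map pEXAMPLE1"))
```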

Page 6: WP4 Current status

Linking to EMBL (i)

• Linking “on-the-fly” to EMBL Data Library through SRS, without IDs, gave negative results:
  • Links are different for different materials and can use various EMBL fields:
    • Organism (micro-organisms), Division (viruses and plasmids), Feature Table (definition of the source through Key, Qualifier, Description)
  • Annotation problems (e.g., missing spaces)
  • Indexing problems (e.g., use of dots)

Page 7: WP4 Current status

Linking to EMBL (ii)

A (well-known) example of an “on-the-fly” search:

• Searching for filamentous fungi strain CBS 100.20
  • Involves: fungi & source & cbs 100.20

( ( ([emblrelease-FtKey:source] & [emblrelease-FtQualifier:strain] & ( ( [emblrelease-FtDescription:cbs] & [emblrelease-FtDescription:100] ) | [emblrelease-FtDescription:cbs100] ) & [emblrelease-FtDescription:20]) ) < [emblrelease-Organism:fungi*] )
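To show why such queries are awkward to generate, the sketch below assembles a query of the same shape in Python. It only mirrors the structure of the query shown above (field names and operators are copied from it). The function and parameter names are hypothetical, and the whitespace and bracketing of the result differ slightly from the slide.

```python
def srs_field(lib: str, name: str, value: str) -> str:
    """Format one SRS field term, e.g. [emblrelease-FtKey:source]."""
    return f"[{lib}-{name}:{value}]"


def embl_strain_query(acronym: str, number: str, organism: str,
                      lib: str = "emblrelease") -> str:
    """Build an on-the-fly query for a strain such as CBS 100.20."""
    acr = acronym.lower()
    # The description is matched word by word, so '100.20' is split at the dot.
    words = number.lower().split(".")
    first, rest = words[0], words[1:]
    # The acronym may be written apart from the number ('cbs' + '100')
    # or fused with it ('cbs100'), so both spellings are accepted.
    spaced = (f"( {srs_field(lib, 'FtDescription', acr)} & "
              f"{srs_field(lib, 'FtDescription', first)} )")
    fused = srs_field(lib, "FtDescription", acr + first)
    terms = [
        srs_field(lib, "FtKey", "source"),
        srs_field(lib, "FtQualifier", "strain"),
        f"( {spaced} | {fused} )",
    ]
    terms += [srs_field(lib, "FtDescription", w) for w in rest]
    source = " & ".join(terms)
    return f"( ( {source} ) < {srs_field(lib, 'Organism', organism.lower() + '*')} )"


print(embl_strain_query("CBS", "100.20", "fungi"))
```

Every variation in how a strain is written in the EMBL annotation needs another branch of this kind, which is one reason the ID-based approach is preferable.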

Page 8: WP4 Current status

Linking to EMBL (iii)

• Agreement with EBI
  • Identification of cross-references from CABRI catalogues to EMBL (and vice versa) by unique IDs
  • Submission of the list to EBI
  • ID-based links to CABRI included in EMBL Data Library and distributed with it
  • Use these links when linking from CABRI
• Links from LMBP to EMBL managed differently

Page 9: WP4 Current status

Linking to EMBL (iv)

• Work started against EMBL release 79
• Common (new) SRS site for CABRI and EMBL
  • Modified indexing -> common keys format
  • SRS links established
• Preliminary list of references sent to collections
  • Comments returned

Page 10: WP4 Current status

Common site established

Page 11: WP4 Current status

Common keys format

CABRI indexing: by whole ID
  CBS 100.20 -> ‘CBS 100.20’

EMBL indexing: by single words
  CBS 100.20 -> ‘CBS’ + ‘100’ + ‘20’
  CBS100.20 -> ‘CBS100’ + ‘20’

Common indexing: name (letters only), optionally followed by a space, followed by a string (letters, digits, dots, dashes); punctuation is removed
  CBS 100.20 -> ‘CBS10020’
  CBS100.20 -> ‘CBS10020’

Special case (not currently managed): NCCB LMD and Phabagen bacteria catalogues
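The common indexing rule can be sketched in a few lines of Python. This is an illustration of the rule as stated on the slide, not the actual SRS parser. The function name is hypothetical, and upper-casing the key is an added assumption so that differently cased spellings collapse to the same key.

```python
import re

# name (letters only), optional single space, then letters/digits/dots/dashes
_ID_PATTERN = re.compile(r"^\s*([A-Za-z]+) ?([A-Za-z0-9.\-]+)\s*$")


def common_key(identifier: str) -> str:
    """Reduce a collection ID to the common key format, e.g. 'CBS 100.20' -> 'CBS10020'."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        # IDs that do not fit the rule (the slide mentions some catalogues
        # as a special case) are rejected rather than guessed at.
        raise ValueError(f"identifier does not fit the common format: {identifier!r}")
    name, rest = match.groups()
    # Remove punctuation (dots, dashes) from the second part.
    rest = re.sub(r"[^A-Za-z0-9]", "", rest)
    # Assumption: keys are upper-cased for case-insensitive matching.
    return (name + rest).upper()


for raw in ("CBS 100.20", "CBS100.20"):
    print(raw, "->", common_key(raw))
```

Both spellings from the slide reduce to CBS10020, which is what allows a single SRS link to join the CABRI and EMBL indexes.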

Page 12: WP4 Current status

SRS links EMBL - CABRI

#links Embl to Cabri Bact & Fun & Yeasts

$Link: [from:$EMBLRELEASE_DB to:$BCCM_LMG_DB fromField:$DF_FtDescription toField:$DF_CABRI_Strain_number]

$Link: [from:$EMBLRELEASE_DB to:$CBS_BACT_DB fromField:$DF_FtDescription toField:$DF_CABRI_Strain_number]

Page 13: WP4 Current status

Automatic identification of links

Page 14: WP4 Current status

Custom views (i)

Page 15: WP4 Current status

Custom views (ii)

Page 16: WP4 Current status

Links to EMBL: current status

Almost ready for submission of the list of cross-references

EBI objection: many databases, some of them small, instead of one big one

New proposal from EBI:
• Links added in the SRS site at EBI only
• Links not searchable
• Links not distributed with EMBL Data Library

Alternative proposals from us:
• Making CABRI virtual catalogues by resource type (bacteria, cell lines, …)
• Making an intermediate database

Page 17: WP4 Current status

SRS virtual libraries

SRS virtual libraries:
• Include many member libraries
• Appear and can be searched as a single database
• Use indexes of member libraries
• Member libraries must have a common data structure

CABRI virtual libraries:
• Can be created for each resource type
  • Interconnected Bacteria DB
  • Interconnected Cell Lines DB
• May be created for similar resource types
  • Interconnected Micro-organisms DB
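The idea of a virtual library can be modelled in a few lines. The Python sketch below is a conceptual illustration only, not SRS code. The class names, the member library labels and the key format are assumptions, and the two sample records reuse strains mentioned elsewhere in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class MemberLibrary:
    name: str
    fields: tuple                                # data structure, e.g. ("Identification", "Name")
    index: dict = field(default_factory=dict)    # key -> record


class VirtualLibrary:
    """Presents several member libraries as one searchable database."""

    def __init__(self, name, members):
        members = list(members)
        # Member libraries must share a common data structure.
        if len({m.fields for m in members}) > 1:
            raise ValueError("member libraries must share a common data structure")
        self.name = name
        self.members = members

    def search(self, key):
        # No index of its own: each member library's index is consulted in turn.
        return [(m.name, m.index[key]) for m in self.members if key in m.index]


schema = ("Identification", "Name")
lmg = MemberLibrary("BCCM_LMG", schema,
                    {"LMG3589": {"Identification": "LMG 3589", "Name": "Bacillus subtilis"}})
cip = MemberLibrary("CIP", schema,
                    {"CIP7034": {"Identification": "CIP 70.34", "Name": "Acinetobacter baumannii"}})

bacteria = VirtualLibrary("Interconnected Bacteria DB", [lmg, cip])
print(bacteria.search("CIP7034"))
```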

Page 18: WP4 Current status

Intermediate database

An intermediate CABRI database would:
• Include very limited information: identification and name
• Be linked by EMBL and link to the related CABRI catalogue
  • EMBL -> Intermediate db -> CABRI

Example:
  Identification  CIP 70.34
  Name            Acinetobacter baumannii

  Identification  ECACC 88020401
  Name            Vero

  Identification  LMG 3589
  Name            Bacillus subtilis (Ehrenberg 1835) Cohn 1872 AL
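The two-hop route EMBL -> Intermediate db -> CABRI can be sketched as a simple lookup. In the Python sketch below, the three records are the ones from the example above. The catalogue labels, the key normalisation and the function name are hypothetical and only serve to show the flow.

```python
import re

# Intermediate database: identification and name only, plus the catalogue
# that holds the full record (catalogue labels are hypothetical).
INTERMEDIATE_DB = {
    "CIP7034": {"identification": "CIP 70.34",
                "name": "Acinetobacter baumannii",
                "catalogue": "CIP_BACT"},
    "ECACC88020401": {"identification": "ECACC 88020401",
                      "name": "Vero",
                      "catalogue": "ECACC_CELL"},
    "LMG3589": {"identification": "LMG 3589",
                "name": "Bacillus subtilis (Ehrenberg 1835) Cohn 1872 AL",
                "catalogue": "BCCM_LMG"},
}


def resolve(embl_strain: str):
    """Hop 1: EMBL strain text -> intermediate record (None if unknown).

    Hop 2 is then a lookup of record['identification'] in record['catalogue'].
    """
    key = re.sub(r"[^A-Za-z0-9]", "", embl_strain).upper()
    return INTERMEDIATE_DB.get(key)


record = resolve("cip 70.34")
print(record["catalogue"], "->", record["identification"], "/", record["name"])
```

With this layout EMBL would link to a single database, which addresses the objection to many small ones, while the full records stay in the individual CABRI catalogues.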

Page 19: WP4 Current status

Extracted databases

• Intended to improve accessibility of CABRI catalogues by distributing them within a controlled framework
• Include a subset of information: CABRI MDS + link to CABRI site (new field Full_details)
• Established agreement with EBI
• Preparation of extracted databases:
  • Setting up of a dedicated Web site: http://export.cabri.org/
  • Setting up of an FTP site for distributing data and SRS configuration files: ftp.cabri.org (not anonymous)
  • Upload of catalogues to EBI: March 2004
  • Automatic updating by FTP through SRS Prisma
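Producing an extracted record amounts to keeping the MDS fields and adding the Full_details link. The following is a minimal Python sketch. The list of MDS fields, the sample record and the example.org address are placeholders, not the real CABRI field set or URL scheme.

```python
def extract_record(record: dict, mds_fields: tuple, details_url: str) -> dict:
    """Keep only the MDS fields of a catalogue record and add a Full_details link."""
    extracted = {name: record[name] for name in mds_fields if name in record}
    # New field pointing back to the complete record on the CABRI site.
    extracted["Full_details"] = details_url
    return extracted


# Placeholder MDS field list and record, for illustration only.
MDS_FIELDS = ("Strain_number", "Name")
full_record = {
    "Strain_number": "LMG 3589",
    "Name": "Bacillus subtilis",
    "History": "kept only in the full catalogue",
}

print(extract_record(full_record, MDS_FIELDS, "https://example.org/cabri/LMG3589"))
```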

Page 20: WP4 Current status

Catalogues at EBI

Page 21: WP4 Current status

CABRI views in place

Page 22: WP4 Current status

Link to CABRI for details & orders

Page 23: WP4 Current status

Quick searches at EBI (i)

Page 24: WP4 Current status

Quick searches at EBI (ii)

Page 25: WP4 Current status

Inventory of data usage and data sets

• GlobalSearch on CABRI site available

• GlobalSearch on partners’ sites
  • Not stable
  • Partial (give me URLs!)
• Virtual BRCs’ Library
  • Map of sites’ maps
  • Includes links to archives/databases

• PLEASE SUBMIT YOUR DATA!

Page 26: WP4 Current status

That’s all, folks!

• MEDLINE
  • Links to Medline already in place for many catalogues
  • New links added with periodic updates
• EMBL
  • Common site and index keys in place
  • Implementation of links under study with EBI staff
• Other external links
  • Plasmid maps in place
  • Micro-organism images ongoing
• Extracted databases
  • Procedure implemented
  • Dedicated Web and FTP sites available
  • Uploaded to EBI in March 2004
• Inventory of data usage and data sets
  • Search on partners’ site contents (ht://dig) soon available
  • List of partners’ site contents (a sort of “Map of sites’ maps”) under construction

Page 27: WP4 Current status

Thoughts about the future (i)

• CABRI as it is
  • Many links to external databases are being set up and are already in place for some of the catalogues
  • Extracted databases have been uploaded to EBI
  • Integration made possible (mainly) because of the adoption of SRS
  • CABRI sites are now well known, appreciated and use network services
• GBIF perspective
  • GBIF has designed a nice and innovative architecture
  • Distributed architecture can help management by avoiding conversions and updates
  • It requires sound expertise and good computer skills, not always available at collections/BRCs
  • The ABCD Schema is not adequate for catalogues’ contents

Page 28: WP4 Current status

Thoughts about the future (ii)

• We need to keep current links and set up new ones
  • Current links with the molecular biology world should be kept
  • SRS is an essential key for this connection
  • The Web Services based GBIF architecture must be taken into account for future links with the (quickly) evolving biodiversity information environment
• SRS is evolving
  • Since SRS 6, XML has been incorporated
  • With SRS 7, XML is essential (alternative to flat files)
  • With SRS 8, Web Services have been added, and SRS itself is able to provide Web Services and to access them remotely

Page 29: WP4 Current status

Thoughts about the future (iii)

• Proposal
  • Start by extending the ABCD Schema to meet our needs
  • Continue with SRS and follow its evolution
  • Adopt the new SRS Web Services features as early as possible and start offering information to GBIF
  • Individual collections/BRCs willing to go autonomous can stop submitting data, provided they offer an agreed interface for remote access by the central SRS-based system
  • Finally, reach a mixed distributed/centralized architecture, based on SRS and offering both standard SRS services and Web Services