BioImage Database Project Director, Image Bioinformatics Laboratory, Oxford e-Science Centre, Department of Zoology, University of Oxford, Oxford OX1 3PS, UK. E-mail: david.shotton@zoo.ox.ac.uk. David Shotton, IDF Members’ Meeting, 22/06/04. © David Shotton 2004. The BioImage Database and identifiers


Page 1

BioImage Database Project Director

Image Bioinformatics Laboratory, Oxford e-Science Centre

Department of Zoology, University of Oxford

Oxford OX1 3PS, UK

e-mail: david.shotton@zoo.ox.ac.uk

David Shotton

IDF Members’ Meeting

22/06/04

© David Shotton 2004

The BioImage Database and identifiers

Page 2

The nature of bioinformatics databases

Provide Open Access, free to academics, for data generated by publicly funded research

Few databases have assured long-term funding; many run on a shoestring

None are run for the purpose of generating profit, although some generate income to offset costs by selling services to commercial companies

Data are rarely replicated between databases (with the exception of basic sequence and crystal structure data), so little need for multiple resolution

Many add value to raw data by providing expert annotations, organizing data according to protein families, etc.

Databases typically use distinct data models and lack interoperability

Some are actively adopting new Semantic Web technologies

All are ripe for the use of universal resolvable identifiers

Page 3

The aims of the BioImage Database Project

The aims of the BioImage Database (www.bioimage.org), funded by the European Commission ORIEL project (www.oriel.org), are:

To be a searchable database of high-quality multidimensional research images of biological specimens, both ‘raw’ and processed, with detailed supporting metadata concerning:

the biological specimen itself

the experimental procedure

details of image formation and subsequent digital processing

the people, institutions and funding agencies involved

the curation and provenance of the image and its metadata

To integrate such multi-dimensional digital image data with other life science resources by providing links to literature and ‘factual’ databases

To store, use and conform with standard external identifiers such as DOIs where these are available, particularly when referencing articles

Page 4

The basic BioImage metadata model (with thanks to <indecs>)

Cell or organism

Experiment or study

Image capture

Image sets of multidimensional images, including videos

Subject or specimen

Researcher

Photographer or microscopist

Camera or microscope

Experimental conditions or manipulations

Page 5

The BioImage home page

www.bioimage.org

Note the hierarchical browse categories and the alternative Browse / Search arrangement

Page 6

Search result, showing Studies

Page 7

What are Life Science Identifiers?

LSIDs have been developed by IBM and I3C (the Interoperable Informatics Infrastructure Consortium; www.i3c.org) to serve the life sciences

They uniquely identify single digital objects

They provide persistent URNs resolvable through normal DNS mechanisms

They are location independent

They permit provenance records (versioning)

While developed for the life sciences, they are in fact completely generic

Page 8

What do LSIDs look like?

A five-part format: urn:lsid:Authority:Namespace:Object_ID[:Revision-ID]

urn:lsid: This namespace identifier (NID) is a mandatory prefix

Authority is the root DNS name of the issuing authority

Namespace is chosen by the issuing authority and constrains the scope of the object

Object_ID is an alphanumeric object ID unique to the namespace

Revision-ID is an optional version of the object

For example:

urn:lsid:bioimage.org:BIOIMAGE:76

refers to entry 76 in the BioImage Database

urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434

references a PubMed article

urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2

refers to the second version of an entry in GenBank
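
The format described on this slide can be checked mechanically. The following is a minimal sketch, not a reference implementation: the `LSID` tuple, the `parse_lsid` function and the regular expression are illustrative names and choices made here, and the pattern only enforces the colon-separated layout shown above rather than every rule of the LSID specification.

```python
import re
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The parts of an LSID that follow the mandatory urn:lsid: prefix."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


# Assumption: each part is any run of characters without colons or whitespace.
_LSID_RE = re.compile(
    r"^urn:lsid:([^:\s]+):([^:\s]+):([^:\s]+)(?::([^:\s]+))?$",
    re.IGNORECASE,  # only affects the literal "urn:lsid:" prefix
)


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts, or raise ValueError."""
    match = _LSID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a well-formed LSID: {text!r}")
    return LSID(*match.groups())


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:bioimage.org:BIOIMAGE:76"))
    print(parse_lsid("urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2"))
```

Applied to the examples on this slide, the BioImage entry has no revision, while the GenBank entry carries the revision "2".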

Page 9

How do they work?

LSIDs are in a sense just a sociological con trick, since they are nothing more than cheap and cheerful URNs

They can be published and resolved, either over the Web using DNS mechanisms, or using Web Services protocols (UDDI, WSDL, SOAP)

If an LSID names an abstract concept, such as a protein name, for which multiple relevant datasets may exist, that LSID will not have any byte data associated with it, but instead will have metadata pointing to other LSIDs that themselves name ‘concrete’ versions of the object, e.g. the protein sequence, the crystal structure. Those LSIDs do name actual byte data

They share many of the weaknesses of conventional Web mechanisms, particularly regarding security and access control
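
The distinction between abstract and concrete LSIDs can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `LsidRecord` class, the `concrete_data` function, the `example.org` identifiers and the dictionary registry are all hypothetical, and a real resolver would return RDF metadata over the network rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LsidRecord:
    lsid: str
    data: Optional[bytes] = None  # None when the LSID names an abstract concept
    related: List[str] = field(default_factory=list)  # metadata: LSIDs of concrete versions


def concrete_data(lsid: str, registry: Dict[str, LsidRecord]) -> Dict[str, bytes]:
    """Follow metadata links from an LSID to every record that holds byte data."""
    found: Dict[str, bytes] = {}
    seen = set()
    stack = [lsid]
    while stack:
        current = stack.pop()
        if current in seen or current not in registry:
            continue
        seen.add(current)
        record = registry[current]
        if record.data is not None:
            found[current] = record.data
        stack.extend(record.related)
    return found


# Hypothetical identifiers, for illustration only.
PROTEIN = "urn:lsid:example.org:protein:P1"
SEQUENCE = "urn:lsid:example.org:sequence:S1"
STRUCTURE = "urn:lsid:example.org:structure:X1"

REGISTRY = {
    PROTEIN: LsidRecord(PROTEIN, related=[SEQUENCE, STRUCTURE]),  # abstract: no bytes
    SEQUENCE: LsidRecord(SEQUENCE, data=b"MKT"),
    STRUCTURE: LsidRecord(STRUCTURE, data=b"ATOM"),
}

if __name__ == "__main__":
    print(concrete_data(PROTEIN, REGISTRY))
```

Here the protein LSID carries no bytes of its own; asking for its data yields the sequence and structure records that its metadata points to.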

Page 10

Who is using LSIDs?

LSIDs were first introduced in 2003

LSIDs are being adopted by highly respected and influential leaders in the bioinformatics Web Services and Semantic Web community, including

Mark Wilkinson of BioMOBY (www.biomoby.org),

Carole Goble of MyGrid (www.mygrid.org.uk), and

Eric Neumann, Global Head of Knowledge Management for Aventis

They form the resolution mechanism behind Haystack, the first Semantic Web browser, based on Eclipse (haystack.lcs.mit.edu)

Page 11

Why are people choosing LSIDs?

LSIDs can be retro-fitted to existing databases, permitting you to convert your own internal identifiers into unique resolvable identifiers, without altering your existing naming system

Open Source software exists to permit you to establish DNS and Web Services resolution services for your URNs, so that anyone addressing your URL can send an LSID and receive in return an RDF document describing what that LSID represents

Just as anyone can become a Web publisher, anyone can become an LSID registration agency

No central third-party registration agency is required, and there are no fees to pay

This no-cost decentralized mechanism, while lacking many of the safeguards and refinements of DOIs, has the same ingredients for success as Tim Berners-Lee’s original Web protocols

We have adopted LSIDs for the BioImage Database and will establish our own LSID resolution authority as soon as we go public
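
Retro-fitting, as described on this slide, amounts to wrapping an existing internal identifier in the LSID prefix without changing it. The sketch below assumes the `bioimage.org` authority and `BIOIMAGE` namespace from the earlier example; the function names `to_lsid` and `from_lsid` are hypothetical, as is the assumption that internal identifiers are integers.

```python
from typing import Optional

AUTHORITY = "bioimage.org"  # from the example LSID given earlier
NAMESPACE = "BIOIMAGE"


def to_lsid(internal_id: int, revision: Optional[int] = None) -> str:
    """Wrap an unchanged internal identifier in an LSID."""
    lsid = f"urn:lsid:{AUTHORITY}:{NAMESPACE}:{internal_id}"
    if revision is not None:
        lsid += f":{revision}"
    return lsid


def from_lsid(lsid: str) -> int:
    """Recover the internal identifier from one of our own LSIDs."""
    parts = lsid.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    # Assumption: authority and namespace are compared case-insensitively.
    if parts[2].lower() != AUTHORITY or parts[3].lower() != NAMESPACE.lower():
        raise ValueError(f"LSID issued by another authority or namespace: {lsid!r}")
    return int(parts[4])


if __name__ == "__main__":
    print(to_lsid(76))
    print(from_lsid("urn:lsid:bioimage.org:BIOIMAGE:76"))
```

Because the mapping is a pure string transformation, the existing naming system and database keys stay exactly as they were.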

Page 12

End