talk1 ben sadi for_gmod_bosc_2011

SADI for GMOD: Bringing Model Organism Databases onto the Semantic Web. Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson. James Hogg Research Centre, Heart + Lung Institute, University of British Columbia. http://code.google.com/p/sadi/wiki/SADIforGMOD

Upload: bioinformatics-open-source-conference

Posted on 12-Jan-2015


TRANSCRIPT

Page 1

SADI for GMOD: Bringing Model Organism Databases onto the Semantic Web 

Ben Vandervalk, Luke McCarthy, Edward Kawas, Mark Wilkinson

James Hogg Research Centre, Heart + Lung Institute, University of British Columbia

http://code.google.com/p/sadi/wiki/SADIforGMOD

Page 2

SADI for GMOD: Background

SADI (Semantic Automated Discovery and Integration)

• Standard for Web services that consume/generate RDF

• Motivation: automated integration of bioinformatics data and software 

GMOD (Generic Model Organism Database)

• Toolkit for building a model organism database and website

• Collection of related open source projects: e.g. Chado, GBrowse, Pathway Tools

• Many sites use GMOD components: FlyBase, BeetleBase, DictyBase, etc. 

Page 3

SADI in a Nutshell

• to invoke a SADI service:
  o HTTP POST an RDF document to the service URI
  o e.g. $ curl --data-binary @input.rdf http://sadiframework.org/examples/hello

• to get service metadata:
  o HTTP GET on the service URI
  o returns an RDF document with the service name, description, etc.
  o e.g. $ curl http://sadiframework.org/examples/hello

• structure of input/output data is described in OWL
  o the service provider specifies one input OWL class and one output OWL class

• strengths of SADI
  o no framework-specific messaging formats or ontologies
  o supports batch processing of inputs
  o supports long-running services (asynchronous services)

more info: http://sadiframework.org/
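Because invocation and metadata retrieval are plain HTTP, any HTTP client can drive a SADI service. The sketch below uses Python's standard urllib to build (but not send) the same POST and GET requests as the curl examples. The curl command on the slide does not set a Content-Type; the application/rdf+xml default used here is an assumption based on the .rdf input file, and the helper names are hypothetical.

```python
import urllib.request

# Example service URI from the slide above.
SERVICE_URI = "http://sadiframework.org/examples/hello"


def build_invocation(service_uri: str, rdf_input: bytes,
                     content_type: str = "application/rdf+xml") -> urllib.request.Request:
    """Build an HTTP POST that sends an RDF document to a SADI service.

    The default content_type is an assumption; set it to match the
    serialization of rdf_input.
    """
    return urllib.request.Request(
        service_uri,
        data=rdf_input,
        headers={"Content-Type": content_type},
        method="POST",
    )


def build_metadata_request(service_uri: str) -> urllib.request.Request:
    """Build an HTTP GET that retrieves the service's RDF description."""
    return urllib.request.Request(service_uri, method="GET")
```

Passing either request to urllib.request.urlopen() would send it; that step needs network access to the service and is left out of the sketch.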

Page 4

SADI for GMOD
• SADI services for accessing sequence feature data
• implemented as Perl CGI scripts

Service Name | Input | Relationship | Output
get_feature_info | database identifier | is about | feature description
get_features_overlapping_region | genomic coordinates | overlaps | collection of feature descriptions
get_sequence_for_region | genomic coordinates | is represented by | DNA, RNA, or amino acid sequence
get_child_features | feature description | has part / derives into | collection of feature descriptions
get_parent_feature | feature description | is part of / derives from | collection of feature descriptions
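get_features_overlapping_region returns every feature whose extent overlaps the query coordinates. In the actual services that lookup is handled by the Bio::DB::SeqFeature::Store database; the sketch below only illustrates the overlap test itself, assuming 1-based, inclusive start/end coordinates as in GFF3. The Feature record and the helper functions are hypothetical, not the SADI for GMOD data model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal feature record."""
    feature_id: str
    seq_id: str   # reference sequence, e.g. a chromosome arm
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Closed intervals [a_start, a_end] and [b_start, b_end] overlap
    iff each one starts no later than the other one ends."""
    return a_start <= b_end and b_start <= a_end


def features_overlapping_region(features: Iterable[Feature], seq_id: str,
                                start: int, end: int) -> List[Feature]:
    """Linear scan for illustration; a database would use an index."""
    return [f for f in features
            if f.seq_id == seq_id and overlaps(f.start, f.end, start, end)]
```

For instance, a feature stored at 26994..32391 (the coordinates of FBgn0040037 in the output example on the next slide) would be returned for a query of 32000..35000 on the same reference sequence, but not for 32392..35000.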

Page 5

SADI for GMOD: Structure of Service Input/Output RDF

Input RDF (N3), sent by HTTP POST to get_feature_info:

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .
    @prefix sio: <http://semanticscience.org/resource/> .

    GeneID:49962
        a lsrn:GeneID_Record ;
        sio:SIO_000008 [                  # p = 'has attribute'
            a lsrn:GeneID_Identifier ;
            sio:SIO_000300 "49962"        # p = 'has value'
        ] .

Output RDF (N3):

    @prefix lsrn: <http://purl.oclc.org/SADI/LSRN/> .
    @prefix GeneID: <http://lsrn.org/GeneID:> .
    @prefix sio: <http://semanticscience.org/resource/> .
    @prefix FlyBase: <http://flybase.org/cgi-bin/sadi.gmod/feature?id=> .
    @prefix GenBank: <http://lsrn.org/GB:> .

    # p = 'is about'
    GeneID:49962 sio:SIO_000332 FlyBase:FBgn0040037 .

    # feature
    FlyBase:FBgn0040037
        a SO:SO_0000704 ;                 # o = 'gene'
        range:position [
            a range:RangedSequencePosition ;
            sio:SIO_000053 [              # p = 'has proper part'
                a range:StartPosition ;
                sio:SIO_000300 26994
            ] ;
            sio:SIO_000053 [              # p = 'has proper part'
                a range:EndPosition ;
                sio:SIO_000300 32391
            ] ;
            range:in_relation_to _:minus_strand_seq
        ] .

    _:minus_strand_seq
        sio:SIO_000011 [                  # p = 'represents'
            a strand:MinusStrand ;
            sio:SIO_000093 GenBank:AE014135   # p = 'is proper part of'
        ] .

    # reference feature (chromosome)
    FlyBase:4                             # chromosome 4
        a SO:SO_0000105 .                 # o = 'chromosome arm'
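The input for get_feature_info is small enough to generate without an RDF library. The sketch below writes the same four triples as the input example, serialized as N-Triples (a line-based subset of N3) and using only the namespaces declared above. The function name is hypothetical, and whether a given service accepts N-Triples, or needs RDF/XML or N3 instead, depends on the content types it supports.

```python
import re

# Namespaces taken from the @prefix declarations in the input example.
LSRN = "http://purl.oclc.org/SADI/LSRN/"
GENEID = "http://lsrn.org/GeneID:"
SIO = "http://semanticscience.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def geneid_input_ntriples(gene_id: str) -> str:
    """Serialize a GeneID record and its identifier node as N-Triples."""
    if not re.fullmatch(r"[0-9]+", gene_id):
        # NCBI GeneID identifiers are numeric; rejecting anything else also
        # keeps the value safe to embed in IRIs and literals unescaped.
        raise ValueError(f"not a numeric GeneID: {gene_id!r}")
    record = f"<{GENEID}{gene_id}>"
    identifier = "_:identifier"  # blank node standing in for [ ... ]
    triples = [
        (record, f"<{RDF_TYPE}>", f"<{LSRN}GeneID_Record>"),
        (record, f"<{SIO}SIO_000008>", identifier),          # 'has attribute'
        (identifier, f"<{RDF_TYPE}>", f"<{LSRN}GeneID_Identifier>"),
        (identifier, f"<{SIO}SIO_000300>", f'"{gene_id}"'),  # 'has value'
    ]
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)
```

Calling geneid_input_ntriples("49962") produces a document equivalent to the N3 input shown above; the resulting bytes could then be posted to the service as on the "SADI in a Nutshell" slide.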

Page 6

SADI for GMOD: Setting up the Services

1. Load your GFF files into a Bio::DB::SeqFeature::Store database (MySQL)

2. Install the SADI for GMOD dependencies with CPAN

3. Download the SADI for GMOD tarball and unpack it into cgi-bin

4. Set the DB connection parameters in cgi-bin/sadi.gmod/sadi.gmod.conf:

   [GENERAL]
   db_adaptor = Bio::DB::SeqFeature::Store
   db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
   base_url = http://flybase.org/cgi-bin/sadi.gmod/

5. Configure the Dbxref mappings in cgi-bin/sadi.gmod/dbxref.conf:

   [DBXREF_TO_LSRN]
   SwissProt = UniProt
   UniProtKB = UniProt
   SwissProt/TrEMBL = UniProt
   ...

6. Register the services in the public SADI registry: http://sadiframework.org/registry

more info: http://code.google.com/p/sadi/wiki/SADIforGMOD
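Both configuration files use an INI-style [SECTION] / key = value layout, so they can also be inspected from other tools. The sketch below reads the fragments shown above with Python's configparser and applies the Dbxref mapping to a GFF Dbxref value. It assumes the files contain only simple key = value lines; the services themselves are Perl and do not use this code. The helper names, the "Namespace:id" output form, and returning None for unmapped prefixes are all hypothetical choices, since the slide does not say how unmapped Dbxrefs are handled.

```python
import configparser
from typing import Optional

# Fragments copied from the two configuration files shown above
# (the trailing "..." in dbxref.conf is omitted here).
SADI_GMOD_CONF = """
[GENERAL]
db_adaptor = Bio::DB::SeqFeature::Store
db_args = -adaptor DBI::mysql -dsn dbi:mysql:database=flybase
base_url = http://flybase.org/cgi-bin/sadi.gmod/
"""

DBXREF_CONF = """
[DBXREF_TO_LSRN]
SwissProt = UniProt
UniProtKB = UniProt
SwissProt/TrEMBL = UniProt
"""


def load(text: str) -> configparser.ConfigParser:
    # interpolation=None: values are used verbatim (no %(...)s expansion).
    parser = configparser.ConfigParser(interpolation=None)
    # Keep key case: configparser lowercases keys by default, but Dbxref
    # database names such as "SwissProt" should be matched as written.
    parser.optionxform = str
    parser.read_string(text)
    return parser


def dbxref_to_lsrn(dbxref: str,
                   mapping: configparser.SectionProxy) -> Optional[str]:
    """Map a Dbxref like 'UniProtKB:P12345' to 'UniProt:P12345'.

    Returns None when the database prefix has no mapping (hypothetical).
    """
    prefix, sep, local_id = dbxref.partition(":")
    if not sep or not local_id:
        raise ValueError(f"malformed Dbxref: {dbxref!r}")
    namespace = mapping.get(prefix)
    if namespace is None:
        return None
    return f"{namespace}:{local_id}"
```

Usage: load(DBXREF_CONF)["DBXREF_TO_LSRN"] gives the mapping section, and load(SADI_GMOD_CONF)["GENERAL"]["base_url"] gives the base URL that the feature URIs in the output example are built on.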

Page 7

SADI Client Software

SADI Taverna Plugin
• design SADI workflows
• http://sadiframework.org/content/2010/05/03/sadi-taverna-plugin-tutorial/

SHARE Query Engine
• SPARQL query => SADI workflow
• http://biordf.net/cardioSHARE/query

Page 8

Acknowledgements

 

Team
• Mark Wilkinson: Principal Investigator
• Luke McCarthy: Lead Programmer, SADI & SHARE
• Edward Kawas: Perl Programmer, SADI

Funding
• Microsoft Research

http://sadiframework.org/

Page 9

Extra Slides

Page 10

Demo with SHARE Query Engine

SPARQL Query

"What proteins are homologous to FlyBase protein FBpp0288804?"

PREFIX FlyBase: <http://lsrn.org/FLYBASE:>
PREFIX sio: <http://semanticscience.org/resource/>

SELECT ?homolog
WHERE {
    # SIO_000332 = 'is about'
    FlyBase:FBpp0288804 sio:SIO_000332 ?protein .
    # SIO_000205 = 'is represented by'
    ?protein sio:SIO_000205 ?sequence .
    # SIO_010302 = 'is homologous to'
    ?protein sio:SIO_010302 ?homolog .
}

[Figure: SADI Workflow]

online demo: http://biordf.net/cardioSHARE/query
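SHARE answers a query like this by turning it into a SADI workflow. As a rough illustration only (a simplified sketch, not SHARE's actual planning algorithm), the triple patterns can be ordered so that each pattern's subject is already bound when it is reached, since a SADI service takes the subject node as input and attaches the predicate's values to it. The pattern tuples restate the query above; the function names are hypothetical.

```python
from typing import List, Set, Tuple

Pattern = Tuple[str, str, str]  # (subject, predicate, object)

# Triple patterns from the SPARQL query above.
QUERY: List[Pattern] = [
    ("FlyBase:FBpp0288804", "sio:SIO_000332", "?protein"),  # 'is about'
    ("?protein", "sio:SIO_000205", "?sequence"),           # 'is represented by'
    ("?protein", "sio:SIO_010302", "?homolog"),             # 'is homologous to'
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def order_by_bound_subject(patterns: List[Pattern]) -> List[Pattern]:
    """Greedily pick a pattern whose subject is a constant or an
    already-bound variable; each pick binds that pattern's object."""
    bound: Set[str] = set()
    remaining = list(patterns)
    plan: List[Pattern] = []
    while remaining:
        ready = [p for p in remaining if not is_var(p[0]) or p[0] in bound]
        if not ready:
            raise ValueError("no remaining pattern has a bound subject")
        chosen = ready[0]
        plan.append(chosen)
        remaining.remove(chosen)
        if is_var(chosen[2]):
            bound.add(chosen[2])
    return plan


for subject, predicate, obj in order_by_bound_subject(QUERY):
    print(f"call a service for {predicate}: {subject} -> {obj}")
```

Whatever order the patterns are written in, the pattern anchored on the constant FlyBase:FBpp0288804 comes first, because it is the only one whose input is known at the start.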