Andy Jenkinson, EBI
The DAS Protocol
Summary of Topics
• Technical overview
• Principles of communication
• Pros and cons
• DAS capabilities
DAS Architecture
• A client asks for data from many servers
  • HTTP requests
  • identically structured URLs, the same parameters
• Each server behaves in the same way
  • pre-defined set of behaviours
  • e.g. provide a sequence, provide annotations of a sequence
• Each server provides different data in the same format
  • DAS-XML
DAS Concepts
Reference object
• usually a sequence
• e.g. “chromosome X” or “NT_025741”
Annotation
• information attached to a location within a segment
• e.g. “substitution at residue 326 of BRCA1”
DAS Concepts
Reference server
• server that provides “core” reference object data
• e.g. GRCh37 sequence data
Annotation server
• server that provides annotations of reference objects
Segment
• part of a reference object
• e.g. “bases 100 to 200 of chromosome X”
• ties together annotation and reference servers
Architectural Overview
The DAS Protocol
Defines 3 constraints
• transport layer: HTTP
• query format: constrained REST URLs
• response format: constrained XML
Keyword: constrained
Data transport
• Standard HTTP
• Includes compression
• Some additional headers, e.g. to indicate DAS version
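The transport points above can be sketched in a few lines of Python. This is a minimal, offline illustration rather than a reference client: the helper names are hypothetical, the header values are made up, and `X-DAS-Version` is assumed to be the header a server uses to report its DAS version.

```python
import gzip
import urllib.request


def build_das_request(url):
    """Create an HTTP GET request that invites a gzip-compressed response."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw_body, headers):
    """Decompress the body if the server says it is gzip-encoded."""
    encoding = headers.get("Content-Encoding", "")
    if encoding.lower() == "gzip":
        return gzip.decompress(raw_body)
    return raw_body


def das_version(headers):
    """Return the X-DAS-Version header value (assumed name), or None."""
    return headers.get("X-DAS-Version")


# Offline demonstration with made-up header values (no network access).
sample_headers = {"Content-Encoding": "gzip", "X-DAS-Version": "DAS/1.6"}
sample_body = gzip.compress(b"<DASSEQUENCE/>")
print(decode_body(sample_body, sample_headers))
print(das_version(sample_headers))
```

A real client would pass the request to `urllib.request.urlopen` and feed the response body and headers to the same two helpers.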
Well-defined query URLs
• A client can issue a command
http://das.sanger.ac.uk/das/ccds_mouse/features?segment=...
• site: http://das.sanger.ac.uk
• prefix: das
• das source: ccds_mouse
• command: features
• arguments: segment=...
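To make the URL layout concrete, here is a small sketch that splits a DAS query URL into the parts labelled on the slide. The function name `split_das_url` is hypothetical, and the segment value `X:100,200` is a stand-in for the elided arguments.

```python
from urllib.parse import urlsplit


def split_das_url(url):
    """Split a DAS query URL into site, prefix, source, command, arguments."""
    parts = urlsplit(url)
    path_segments = [s for s in parts.path.split("/") if s]
    # The fixed "das" path element separates the site from the source name.
    das_index = path_segments.index("das")  # ValueError if it is missing
    leading = "".join("/" + s for s in path_segments[:das_index])
    # Exactly two elements are expected after "das": source and command.
    source, command = path_segments[das_index + 1:das_index + 3]
    return {
        "site": f"{parts.scheme}://{parts.netloc}{leading}",
        "prefix": "das",
        "source": source,
        "command": command,
        "arguments": parts.query,
    }


# The segment value is illustrative; the slide leaves it elided.
example = split_das_url(
    "http://das.sanger.ac.uk/das/ccds_mouse/features?segment=X:100,200"
)
print(example)
```

Because every server follows the same layout, one function like this is enough for any DAS source.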
XML format
• server responds with a simple XML document
<SEGMENT id="X" start="1" end="100">
  <FEATURE id="exon1">
    <TYPE id="exon">exon</TYPE>
Why DAS?
Fast, targeted queries
• suitable for visual display
Based on existing simple tech
• XML/HTTP/CGI
• “dumb server, clever client”: relatively low knowledge barrier for bioinformaticians with data to expose
Scalable
• integrators (client software) get more data for zero cost
Why not DAS?
One-dimensional queries
• query only by sequence position
• not by developmental stage, tissue type, etc
• (yet)
Constrained generic format
• clients aren’t “tailored” to each data source
• possible data types are to some extent limited
Not semantically rich
• ontology support optional
Commands: the basics
Sequence
• give me the DNA sequence for a given segment of a reference object
• e.g. “bases 100k – 200k of chromosome 15”
Features
• give me all annotations offered by the data source that are attached to a given segment of the sequence
The sequence command
/das/<source>/sequence?<params>
Parameters:
segment=ID:start,end (one or more)
ID of reference object
Example:
/das/<source>/sequence?segment=X:100,200;segment=Y:500,600
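The `ID:start,end` notation is easy to model in client code. The sketch below is one possible representation, not part of the protocol itself; the `Segment` class and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A stretch of a reference object, written ID:start,end in DAS URLs."""
    ref_id: str
    start: int
    end: int

    @classmethod
    def parse(cls, text):
        """Parse a string such as 'X:100,200' into a Segment."""
        ref_id, sep, span = text.partition(":")
        if not sep or not ref_id:
            raise ValueError(f"expected ID:start,end, got {text!r}")
        start_text, end_text = span.split(",")
        start, end = int(start_text), int(end_text)
        if start > end:
            raise ValueError("start must not exceed end")
        return cls(ref_id, start, end)

    def length(self):
        """Number of positions covered; both ends are inclusive."""
        return self.end - self.start + 1

    def __str__(self):
        return f"{self.ref_id}:{self.start},{self.end}"


seg = Segment.parse("X:100,200")
print(seg, seg.length())
```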
The sequence command
Response:
<DASSEQUENCE>
  <SEQUENCE id="X" start="100" stop="200" version="1.0">
    cctgagccagcagtggcaacccaatggggtccctttcca...
  </SEQUENCE>
  <SEQUENCE id="Y" start="500" stop="600" version="1.0">
    ctggacagcccggaaaatgagctcctcatctctaaccca...
  </SEQUENCE>
</DASSEQUENCE>
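A response of this shape can be read with Python's standard XML parser. The following is a sketch under stated assumptions: the sample document mirrors the slide but uses shortened, made-up residues, and `parse_sequences` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative response; the residues are shortened and invented.
SAMPLE_SEQUENCE_XML = """<DASSEQUENCE>
  <SEQUENCE id="X" start="100" stop="110" version="1.0">
    cctgagccagc
  </SEQUENCE>
  <SEQUENCE id="Y" start="500" stop="510" version="1.0">
    ctggacagccc
  </SEQUENCE>
</DASSEQUENCE>"""


def parse_sequences(xml_text):
    """Return one dict per SEQUENCE element in a DASSEQUENCE document."""
    root = ET.fromstring(xml_text)
    results = []
    for node in root.iter("SEQUENCE"):
        # Residues may be wrapped over several lines; drop all whitespace.
        residues = "".join((node.text or "").split())
        results.append({
            "id": node.get("id"),
            "start": int(node.get("start")),
            "stop": int(node.get("stop")),
            "residues": residues,
        })
    return results


sequences = parse_sequences(SAMPLE_SEQUENCE_XML)
for seq in sequences:
    print(seq["id"], seq["start"], seq["stop"], seq["residues"])
```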
The features command
/das/<source>/features?<params>
Parameters:
segment=ID:start,end (one or more)
type=foo (zero or more)
category=bar (zero or more)
Example:
/das/<source>/features?segment=X:100,200;segment=Y:500,600;type=SNP
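A client might assemble such a query as follows. This is a sketch only: `build_features_url`, the host `example.org` and the source name `mysource` are all hypothetical, and parameters are joined with semicolons as in the example above.

```python
from urllib.parse import quote


def build_features_url(site, source, segments, types=(), categories=()):
    """Build a features query with repeated segment/type/category parameters."""
    params = [("segment", s) for s in segments]
    params += [("type", t) for t in types]
    params += [("category", c) for c in categories]
    # Keep ':' and ',' readable; percent-encode anything else that is unsafe.
    query = ";".join(f"{key}={quote(value, safe=':,')}" for key, value in params)
    return f"{site.rstrip('/')}/das/{source}/features?{query}"


url = build_features_url(
    "http://example.org",          # hypothetical site
    "mysource",                    # hypothetical data source name
    ["X:100,200", "Y:500,600"],
    types=["SNP"],
)
print(url)
```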
The features command
Response:
<DASGFF>
  <GFF version="1.01" href="...">
    <SEGMENT id="X" start="100" stop="200">
      <FEATURE id="X">
        <START>100</START>
        <END>200</END>
        <TYPE id="SNP" category="variation">SNP</TYPE>
        <METHOD id="sequencing">sequencing</METHOD>
        <SCORE>86.4</SCORE>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      ...
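Extracting features from a response like this takes little code. The sketch below assumes a complete document: the slide's example is truncated, so the sample closes the open elements and uses made-up values (feature id `snp1`, an `example.org` href). `parse_features` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative, completed version of the truncated slide example.
SAMPLE_FEATURES_XML = """<DASGFF>
  <GFF version="1.01" href="http://example.org/das/mysource/features">
    <SEGMENT id="X" start="100" stop="200">
      <FEATURE id="snp1">
        <START>150</START>
        <END>150</END>
        <TYPE id="SNP" category="variation">SNP</TYPE>
        <METHOD id="sequencing">sequencing</METHOD>
        <SCORE>86.4</SCORE>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Return one dict per FEATURE, tagged with the segment it belongs to."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.findall("FEATURE"):
            type_node = feat.find("TYPE")
            score_text = (feat.findtext("SCORE") or "").strip()
            features.append({
                "segment": segment.get("id"),
                "id": feat.get("id"),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
                "type": type_node.get("id") if type_node is not None else None,
                "category": type_node.get("category") if type_node is not None else None,
                "method": feat.findtext("METHOD"),
                # Treat a missing or "-" score as "no score".
                "score": float(score_text) if score_text not in ("", "-") else None,
                "orientation": feat.findtext("ORIENTATION"),
            })
    return features


features = parse_features(SAMPLE_FEATURES_XML)
print(features)
```

The same parser works for any annotation server, which is the practical payoff of the constrained XML format.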
Other Commands
Stylesheet
• hints on how to render different types of feature
• e.g. “exons as blue boxes, SNPs as red triangles”
/das/<source>/stylesheet
Types
• lists the types of feature available
/das/<source>/types
Metadata
Can make a client that knows how to query a server and parse the response
BUT something missing…
• which data sources are available on a server?
• which commands does a source support?
• what kind of reference objects does it know about?
The sources command
<server>/das/sources
• Lists a server’s data sources
For each source:
• text description
• list of “capabilities” (commands)
• list of coordinate systems (type of reference object)
• etc
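A client could use the sources listing to decide which commands to send to which source. The sketch below is illustrative only: the sample document is invented, and its element and attribute names (`SOURCE`, `VERSION`, `COORDINATES`, `CAPABILITY`, `description`, `type`) are assumed to follow the sources format of the DAS 1.6 specification and should be checked against it. `parse_sources` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented sample; the structure is an assumption about the sources format.
SAMPLE_SOURCES_XML = """<SOURCES>
  <SOURCE uri="example_source" title="Example source"
          description="Hypothetical SNP annotations">
    <VERSION uri="example_source" created="2010-01-01">
      <COORDINATES uri="http://example.org/coords/1" source="Chromosome"
                   authority="GRCh" version="37"
                   taxid="9606">GRCh_37,Chromosome,Homo sapiens</COORDINATES>
      <CAPABILITY type="das1:features"
                  query_uri="http://example.org/das/example_source/features"/>
      <CAPABILITY type="das1:types"
                  query_uri="http://example.org/das/example_source/types"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""


def parse_sources(xml_text):
    """Summarise each source: description, capabilities, coordinate systems."""
    root = ET.fromstring(xml_text)
    summaries = []
    for source in root.iter("SOURCE"):
        summaries.append({
            "title": source.get("title"),
            "description": source.get("description"),
            "capabilities": [c.get("type") for c in source.iter("CAPABILITY")],
            "coordinates": [(c.text or "").strip()
                            for c in source.iter("COORDINATES")],
        })
    return summaries


sources = parse_sources(SAMPLE_SOURCES_XML)
# Example use: keep only sources that advertise the features command.
with_features = [s for s in sources if "das1:features" in s["capabilities"]]
print(with_features)
```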
DAS Registry
• third component of DAS
• catalogue of DAS sources
Human interface
• validate, register, search, view statistics
Programmatic interface
• http://www.dasregistry.org/das/sources
• http://www.dasregistry.org/das/coordinatesystem
• http://www.dasregistry.org/das/organism
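SOA
[Diagram: Registry, Client and Server linked by three actions. Publish (Server to Registry), Find (Client to Registry), Bind (Client to Server).]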
Links
DAS Homepage
• http://www.biodas.org/
DAS Specification
• http://www.biodas.org/documents/spec-1.6.html
DAS in Ensembl
• http://www.ensembl.org/info/docs/das/index.html
Mailing list
• http://biodas.org/mailman/listinfo/das