aip-workshop1-dev-tutorial
TRANSCRIPT
araport.org
Developing Data APIs for the Arabidopsis Information Portal
Matt Vaughn, Director, Life Sciences Computing, Texas Advanced Computing Center
What’s In It For You?
• Discoverability and reusability of your services and UI codes
• Discoverability and reusability of other people’s services and UI codes
• Community exposure and reputation for you
• Comprehensive usage analysis logs
• AIP-provided DOIs for apps and APIs
• Satisfaction of helping to build and sustain an Arabidopsis informatics ecosystem
The AIP Strategy
• Key Design Decisions
  – Centralized (but powerful) data warehousing capability PLUS infrastructure enabling data federation
    • Web service access to A LOT of Arabidopsis data to enable powerful bioinformatics
  – App store and Data API store model
    • Robust 3rd party development path for each
    • Accessible languages and frameworks
  – Liberal adoption of GMOD technologies
    • JBrowse as a genome browser platform
    • WebApollo + Tripal for community annotation
    • InterMine for data warehousing
  – Secure & modern single sign-on
  – Geo-replication and high availability*
  – Full code release in real time via GitHub
Araport Architecture
[Figure: Araport architecture. CLI clients, scripts, and 3rd party applications connect through the Agave Enterprise Service Bus. One path leads through Agave services (apps, meta, files, profile, jobs, systems) to physical resources (HPC, files, DB). The other leads through ADAMA (manage, enroll) and its mediators (query, map, generic, passthrough) to AIP & 3rd party data providers. ESB features: single sign-on, throttling, unified logging, API versioning, automatic HTTPS. Provider interfaces range from REST* and REST-like to SOAP, POX, and "Cambrian" CGI.]
Gold standard Data APIs
• Implement RESTful queries via HTTP GET
• Are served over HTTPS with a valid SSL certificate
• Allow Cross-Origin Resource Sharing (CORS)
• Require authentication so that developers can understand and respond to client demographics
• Adopt AIP’s controlled vocabulary for query parameters
• Provide descriptive metadata for parameters*
• Return JSON conforming to AIP community-established schemas (unless there’s reason not to)
• Can participate in future AIP deep caching and mining infrastructure
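Some items on this checklist can be checked mechanically from the request URL and the response headers: HTTPS, a CORS header, and a JSON response. What follows is a minimal sketch of such a check. The helper name `gold_standard_problems` is hypothetical. It does not cover authentication, the controlled vocabulary, or schema conformance.

```python
from urllib.parse import urlparse


def gold_standard_problems(url, headers):
    """List the gold-standard checks a Data API response fails, judged only
    from the request URL and the response headers (hypothetical helper)."""
    problems = []
    # HTTP header names are case-insensitive, so normalise them first.
    lower = {name.lower(): value for name, value in headers.items()}
    if urlparse(url).scheme != "https":
        problems.append("not served over HTTPS")
    if "access-control-allow-origin" not in lower:
        problems.append("no CORS header")
    # Ignore parameters such as "; charset=utf-8" when checking the media type.
    media_type = lower.get("content-type", "").split(";")[0].strip().lower()
    if media_type != "application/json":
        problems.append("response is not JSON")
    return problems
```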
ADAMA: Araport DAta Mediator API
[Figure: ADAMA components: Agave API Manager and a NoSQL intermediary]
Endpoint: https://api.araport.org/community/v0.3/
Live Docs: https://adama-dev.tacc.utexas.edu/api/adama.html
Data API Types
Types of web services:
• Adapters build web services
  – query
  – map
  – generic
  – passthrough (just changes the URL)
Data API Types
• query
  – Inputs: AIP-aligned parameters (mandatory)
  – Outputs: AIP-aligned JSON
  – Notes: Gold standard data APIs
• map
  – Inputs: AIP-aligned parameters (preferred)
  – Outputs: Transformed JSON
  – Notes: Ideal for implementing namespace transformations or filters
• generic
  – Inputs: AIP-aligned parameters (preferred)
  – Outputs: Specified within code; can be any valid Content-Type
  – Notes: Implement return of non-JSON data
• passthrough
  – Inputs: Specified by remote service
  – Outputs: Specified by remote service
  – Notes: Allows existing services to be discoverable from the AIP data store
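A `map` service transforms JSON that a remote service already returns. The sketch below shows the two uses named for this type. One is a namespace transformation, where transcript IDs are rewritten as AGI loci. The other is a filter, where records below a score threshold are dropped. The record layout (`transcript`, `score`) and the function names are hypothetical and are not ADAMA's map interface.

```python
def to_locus(identifier):
    """Map an AGI transcript ID such as AT2G26230.1 into the locus namespace
    by dropping the version suffix; locus IDs pass through (upper-cased)."""
    return identifier.split(".", 1)[0].upper()


def map_record(record, min_score=0.0):
    """Hypothetical map-type transform of one remote JSON record: drop it
    (return None) if its 'score' is below min_score, otherwise replace the
    'transcript' key with an AIP-aligned 'locus' key."""
    if float(record.get("score", 0)) < min_score:
        return None
    mapped = {key: value for key, value in record.items() if key != "transcript"}
    mapped["locus"] = to_locus(record["transcript"])
    return mapped


def map_all(records, min_score=0.0):
    """Apply map_record to every record and keep the ones that survive."""
    mapped = (map_record(record, min_score) for record in records)
    return [record for record in mapped if record is not None]
```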
Data API Parameters (1)
• Diversity is a liability when parameterizing access to web services
• AIP is claiming a few parameter names that we want you to adhere to
• We would like you as a community to define extensions to our controlled vocabulary for parameters
Data API Parameters (2)
Validators are case-insensitive.
• locus: AGI gene locus identifiers. Validator: ^AT[1-5CM]G[0-9]{5}$
• transcript: AGI transcript identifiers. Validator: ^AT[1-5CM]G[0-9]{5}\.[0-9]{1,3}$
• identifier: another string plausibly expected to identify a gene or transcript. Validator: valid alphanumeric string, no whitespace.
• chromosome: A. thaliana Col-0 chromosome identifiers. Validator: ^CHR[1-5MC]$
• start/end: coordinates within the Col-0 assembly. Validator: numeric; should be range-checked.
• strand: defines the genomic strand. Validator: [\+\-\.]{1,1}
• accession: ecotypes or natural accessions. Not validated at present.
• term: generic search term. Validator: valid text string. Useful for implementing full-text search.
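These validators translate directly into compiled regular expressions. AGI locus IDs have the form "AT", then the chromosome (1-5, C, or M), then "G" and five digits (e.g. AT2G26230). Transcript IDs add a dot and a version number. In this minimal sketch the `identifier` pattern is an assumption. It allows "_", "." and "-" in addition to letters and digits, so that names such as URIC_ARATH pass. The coordinate bound is also supplied by the caller rather than taken from the assembly.

```python
import re

# AGI patterns: "AT", chromosome (1-5, C or M), "G", five digits.
VALIDATORS = {
    "locus": re.compile(r"^AT[1-5CM]G[0-9]{5}$", re.IGNORECASE),
    "transcript": re.compile(r"^AT[1-5CM]G[0-9]{5}\.[0-9]{1,3}$", re.IGNORECASE),
    "chromosome": re.compile(r"^CHR[1-5MC]$", re.IGNORECASE),
    "strand": re.compile(r"^[+\-.]$"),
    # Assumption: besides letters and digits, allow "_", "." and "-"
    # (e.g. URIC_ARATH); whitespace is rejected.
    "identifier": re.compile(r"^[A-Za-z0-9_.\-]+$"),
}


def is_valid(param, value):
    """True if value passes the validator for param. Parameters without a
    validator here (e.g. accession, term) are accepted as-is."""
    pattern = VALIDATORS.get(param)
    return pattern is None or pattern.fullmatch(value) is not None


def valid_coordinate(value, upper_bound):
    """start/end check: a positive integer no larger than upper_bound
    (assumes 1-based coordinates; the bound is supplied by the caller)."""
    return value.isdigit() and 1 <= int(value) <= upper_bound
```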
Specifying Parameters
• At present, ADAMA services don’t know about or advertise their parameterization.
  – This will change VERY SOON!
• You will specify parameters and their metadata inside your service’s metadata.yml file
• ADAMA will use these to validate inbound HTTP requests
• ADAMA will use these parameters to dynamically create Swagger-based docs for community APIs
• Your API and its usage pattern will be discoverable inside the JavaScript console
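Parameter metadata can feed Swagger-based documentation. Here is a minimal sketch that converts parameter descriptions into Swagger 2.0 query-parameter objects (`name`, `in`, `type`, `required`, `description`, and optionally `pattern` and `default`). The input layout is an assumption for illustration and is not ADAMA's actual metadata.yml schema.

```python
def swagger_parameters(params):
    """Turn a list of parameter-metadata dicts (hypothetical layout) into
    Swagger 2.0 query parameter objects."""
    out = []
    for p in params:
        obj = {
            "name": p["name"],
            "in": "query",
            "type": p.get("type", "string"),
            "required": bool(p.get("required", False)),
            "description": p.get("description", ""),
        }
        # Optional fields are emitted only when the metadata provides them.
        if "pattern" in p:
            obj["pattern"] = p["pattern"]
        if "default" in p:
            obj["default"] = p["default"]
        out.append(obj)
    return out
```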
Mockup: Parameters inside metadata.yml
Why?
1. YAML is easier to read/write than JSON
2. Help client application developers create intuitive GUIs for accessing your services
3. Guide users towards success
Responses: The Araport Data API JSON schema
• Facilitate creation of mash-up client applications
• Enable extraction and mining of the Arabidopsis deep web
• Facilitate interoperability with semantic web technologies without forcing their adoption by API developers
Responses: Araport JSON 1
curl -skL -XGET -H "Authorization: Bearer 624513772fbc2caf662b9accbf10380" "https://api.araport.org/community/v0.3/aip/resolver_fetch_locus_by_synonym_v0.2/search?identifier=URIC_ARATH"
{
  "result": [
    {
      "relationships": [
        {
          "direction": "undirected",
          "type": "synonymous_with",
          "scores": [{"confidence": 1}]
        }
      ],
      "related_entity": "URIC_ARATH",
      "class": "locus_id_mapping",
      "locus": "AT2G26230",
      "related_entity_kind": "UniProtKB-ID"
    }
  ],
  "metadata": {"time_in_main": 0.020552873611450195},
  "status": "success"
}
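The same request can be made from Python's standard library. This sketch assumes the URL pattern `<endpoint>/<namespace>/<service>/search?<parameters>` seen in the curl examples. `TOKEN` stands in for a real OAuth2 bearer token.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://api.araport.org/community/v0.3"


def build_search_request(namespace, service, params, token):
    """Build, without sending, a GET request equivalent to the curl examples:
    ENDPOINT/<namespace>/<service>/search?<params>, with a bearer token."""
    url = f"{ENDPOINT}/{namespace}/{service}/search?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def search(namespace, service, params, token, timeout=30):
    """Send the request and decode the JSON body."""
    request = build_search_request(namespace, service, params, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `search("aip", "resolver_fetch_locus_by_synonym_v0.2", {"identifier": "URIC_ARATH"}, token)` corresponds to the first curl command. curl's `-k` flag turns off certificate verification, whereas `urlopen` verifies certificates by default. Default verification should be all that a service with a valid SSL certificate needs.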
Responses: Araport JSON 2
curl -sk -L -H "Authorization: Bearer 624513772fbc2caf662b9accbf10380" "https://api.araport.org/community/v0.3/vaughn-dev/expressologs_by_locus_v0.1/search?locus=At2g26230"
{
  "result": [
    {
      "relationships": [
        {
          "direction": "undirected",
          "type": "coexpression",
          "scores": [{"correlation_coefficient": "0.2975"}]
        },
        {
          "direction": "undirected",
          "type": "sequence_similarity",
          "scores": [{"percentage": "66"}]
        }
      ],
      "reference": "TAIR10",
      "locus": "AT2G26230",
      "related_entity": "Contig7331",
      "class": "locus_relationship",
      "other_data": {
        "probeset_A": "267374_at",
        "dataSource": "barley_mas",
        "efp_link": "http://bar.utoronto.ca/efp_barley/cgi-bin/efpWeb.cgi?dataSource=barley_mas&primaryGene=Contig7331&modeInput=Absolute",
        "probeset_B": "Contig7331_at"
      }
    }
  ],
  "metadata": {"time_in_main": 0.11528205871582031},
  "status": "success"
}
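Records of class `locus_relationship` carry an array of typed relationships, each with its own list of scores, so a client usually needs to flatten them. This minimal sketch assumes only the structure visible in the response above. Score values arrive as strings here, so numeric conversion is left to the caller.

```python
import json


def relationship_scores(response_text):
    """Return (related_entity, relationship type, scores) triples from an
    Araport JSON response whose records carry a 'relationships' array."""
    doc = json.loads(response_text)
    if doc.get("status") != "success":
        raise ValueError(f"service reported status {doc.get('status')!r}")
    triples = []
    for record in doc.get("result", []):
        for rel in record.get("relationships", []):
            # Each relationship holds a list of small score dicts; merge them.
            merged = {}
            for score in rel.get("scores", []):
                merged.update(score)
            triples.append((record.get("related_entity"), rel.get("type"), merged))
    return triples
```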
Creating an Araport Data API (1)
• Decide which type of Data API you need to build
• Create a local Git repository
• Author a main function (Python only for now)
• Test that it works in your local Python interpreter
• Author a metadata.yml file describing the service
• Submit your repository to GitHub
• Perform an authenticated HTTP POST to the community API with a link to your repo
• Verify that the service was created successfully
• Test it out via HTTP request
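The "author a main function" and "test it locally" steps might look like the sketch below. The entry-point name `search(args)`, its return convention, and the in-memory `FAKE_SOURCE` are assumptions for illustration. The contract ADAMA actually expects is defined by ADAMA's documentation and by the example repositories listed at the end of this talk.

```python
import json

# Hypothetical in-memory stand-in for a remote data source, so the adapter
# can be exercised without network access.
FAKE_SOURCE = {
    "AT2G26230": [{"related_entity": "URIC_ARATH", "kind": "UniProtKB-ID"}],
}


def search(args):
    """Hypothetical query-type entry point: take request parameters as a
    dict and return AIP-style records for the requested locus."""
    locus = args["locus"].upper()
    return [
        {
            "class": "locus_id_mapping",
            "locus": locus,
            "related_entity": hit["related_entity"],
            "related_entity_kind": hit["kind"],
        }
        for hit in FAKE_SOURCE.get(locus, [])
    ]


if __name__ == "__main__":
    # Exercise the function in a local interpreter before deploying it.
    print(json.dumps(search({"locus": "At2g26230"}), indent=2))
```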
Creating an Araport Data API (2)
• Updating a deployed API
  – Incrementing the version number (recommended)
    • Change the version number in metadata.yml
    • Please use SEMANTIC VERSIONING
    • Commit the change to the Git repo
    • POST the service
    • DELETE the old version (optional)
  – Without incrementing the version number:
    • Now: DELETE the previous version and re-POST
    • Soon: Re-submit the same repo via PUT
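Semantic versioning increments MAJOR for incompatible API changes, MINOR for backward-compatible additions, and PATCH for backward-compatible fixes, resetting the lower components each time. Here is a minimal helper. It assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffixes.

```python
def bump(version, part):
    """Return the next semantic version; part is 'major', 'minor' or 'patch'.
    Raises ValueError for anything that is not three dot-separated integers."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```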
ADAMA Road Map
• Automatic documentation for query, map, and generic types
• Parameter validation
• Per-namespace and per-service access control lists
• Provenance and attribution
• Support for Java and JavaScript
• Integration with Agave CLI
• Validation of responses against JSON schema(s)
Next Steps
• We need you to take the APIs we have started with and make them better
• Lots of assistance needed to design and document useful JSON schemas
• Need guidance on attribution and provenance
• Create new and innovative demonstrations of data federation in action!
• Look for a call for participation after this workshop concludes – we can’t do this alone!
Questions?
• We’ll be having a breakout session tomorrow from 1:00 – 3:00 with hands-on tutorial material on API building
• Check out the example APIs on our project GitHub – they’re short and readable!
Code Examples
• https://github.com/Arabidopsis-Information-Portal/jcvi-rtpcr-demos
• https://github.com/Arabidopsis-Information-Portal/aip_thalemine_webservices
• https://github.com/Arabidopsis-Information-Portal/atted_webservices
• https://github.com/Arabidopsis-Information-Portal/bar_webservices_demos
In addition to our tutorial code, these are good, illustrative examples of ADAMA web services.
Chris Town, PI
Lisa McDonald, Education and Outreach Coordinator
Chris Nelson, Project Manager
Jason Miller, Co-PI, JCVI Technical Lead
Erik Ferlanti, Software Engineer
Vivek Krishnakumar, Bioinf. Engineer
Svetlana Karamycheva, Bioinf. Engineer
Eva Huala, Project lead, TAIR
Bob Muller, Technical lead, TAIR
Gos Micklem, co-PI
Sergio Contrino, Software Engineer
Matt Vaughn, co-PI
Steve Mock, Portal Engineer
Rion Dooley, API Engineer
Matt Hanlon, Portal Engineer
Maria Kim, Bioinf. Engineer
Ben Rosen, Bioinf. Analyst
Joe Stubbs, API Engineer
Walter Moreira, API Engineer
Araport Architecture (2)
[Figure: API Manager + Enterprise Service Bus. Consumer applications use secure, rationalized REST services. Behind the bus, mediators (simple proxy, cache, XML-to-JSON, SOAP-to-REST, CGI-to-REST, throttle) front ThaleMine, data integration, and other services, as well as Legacy API A, Legacy API B, and REST API C.]
• Single sign-on
• Throttling
• Unified logging
• API versioning
• Mediation and translation
• Dev-friendly interfaces
• Rationalized REST for consumer apps
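The following sketch illustrates the XML-to-JSON mediation step, converting an XML document into JSON using only the standard library. It uses one generic conversion convention chosen for this example and is not the mediator the Araport ESB actually runs. Element text is dropped when an element also has attributes or children.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an XML element to plain data: text-only elements become
    strings; otherwise attributes and child elements become keys, with
    repeated child tags collected into lists. Text of elements that also
    have attributes or children is dropped in this sketch."""
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    node = dict(elem.attrib)
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: promote the existing value to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_to_json(xml_text):
    """Serialise an XML document as JSON, keyed by its root tag."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})
```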
Science Objectives
• Make more, varied data available to the Arabidopsis (and other) communities within a unified user experience
• Enhance the innate value of data by offering enhanced search, retrieval, and display capabilities
• Facilitate analysis of user data
• Enable community participation in functional annotation
Technical Objectives
• Deploy a responsive, flexible community-extensible system
• Provide APIs everywhere!
• Promote and facilitate data integration
• Enable language- and region-specific presentation of scientific content
• Meet mobile computing on its own terms
Local vs. Data-driven Apps
• Local apps (e.g. Photoshop Express): resources are local and inherently offline, operating on local data using local computing.
• Data-driven apps (e.g. KAYAK Pro): resources are cloud-based and inherently online; multiple data streams are integrated, queried, and presented in the context of a broader objective.
Araport Bill of Materials
• Araport is currently built using
  – Drupal 7.25
    • Developer-oriented content management system
  – Bootstrap.js and some other JavaScript toolkits
  – InterMine (with modifications)
  – Bioinformatics infrastructure + misc. other bits
  – Agave 2.0 Software as a Service platform
    • Developed by the iPlant Collaborative project
    • Bulk data, metadata, authentication, HPC app and job management, notifications & events, and more
    • OAuth2 out of the box
    • Enterprise service bus (ESB) architecture
    • http://agaveapi.co/
Araport APIM Architecture (1)
[Figure: Araport API Manager behind the Agave WSO2 interface. A JSON query enters the API manager and a JSON response is returned. For each remote service (a POLYMORPH CGI form returning CSV; SNP by Locus REST; Indel by Position REST), a mediator chain applies an input key map and input transform, sends the request and listens for the reply, then applies an output transform and output key map before responding. Also shown: a cache (technology TBD), ElasticSearch, and Enroll/Manage operations.]
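The input and output key maps in this diagram rename parameters on the way to a remote service and rename fields on the way back. Here is a minimal sketch in which both maps and the remote field names (`gene_id`, `chr`, `pos`) are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical maps between AIP parameter names and one remote service's
# native field names; real names depend on the remote service.
INPUT_KEY_MAP = {"locus": "gene_id", "chromosome": "chr"}
OUTPUT_KEY_MAP = {"gene_id": "locus", "pos": "start"}


def to_remote_query(aip_params):
    """Input side: rename AIP parameters and URL-encode them for the remote call.
    Parameters without a mapping keep their AIP names."""
    native = {INPUT_KEY_MAP.get(k, k): v for k, v in aip_params.items()}
    return urlencode(native)


def to_aip_record(remote_row):
    """Output side: rename the remote service's keys back to AIP names."""
    return {OUTPUT_KEY_MAP.get(k, k): v for k, v in remote_row.items()}
```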
Araport Architecture: Use Cases (1)
• 1001 Genomes POLYMORPH tools
  – Provides variation data via locus or positional search
  – Total of seven variant types available for search
  – Search parameterization depends a lot on variant type
  – Example of a plain-text CGI service
  – Returns results as CSV with named columns
• Objective: Transform into a RESTful API that expects and returns rationalized JSON
http://polymorph.weigelworld.org
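Transforming a CSV-returning CGI service into JSON can start from Python's `csv.DictReader`, which uses the header row as keys. The column names in this sketch (`chromosome`, `start`, `end`, `allele`) are hypothetical, because the slide does not list POLYMORPH's actual columns. The `result`/`status` envelope mirrors the Araport JSON examples earlier.

```python
import csv
import io
import json


def csv_to_records(csv_text, numeric=("start", "end")):
    """Parse CSV with a header row of named columns into a list of dicts,
    converting the listed columns to int when they hold plain digits."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in numeric:
            if (row.get(col) or "").isdigit():
                row[col] = int(row[col])
        records.append(row)
    return records


def csv_to_araport_json(csv_text):
    """Wrap parsed rows in the result/status envelope used by Araport JSON."""
    return json.dumps({"result": csv_to_records(csv_text), "status": "success"})
```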
Araport Architecture: Use Cases (2)
• ThaleMine
  – Has a native REST interface for general queries
  – Has templates which can form the basis of specific services
• Objective: Offer both InterMine-native and AIP-conformant interfaces as Data APIs
• Current path
  – Enroll native services in our APIM
  – Develop template-based AIP-conformant services
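A template-based service ultimately issues an InterMine template query. The sketch below builds such a URL against InterMine's `template/results` web service, using numbered `constraintN`/`opN`/`valueN` parameters. The service root, template name, and constraint path are placeholders. Consult ThaleMine's template list and the InterMine web service documentation for real names and for any additional fields a template requires, such as constraint codes.

```python
from urllib.parse import urlencode

# Placeholder service root; substitute ThaleMine's real service URL.
THALEMINE_SERVICE = "https://example.org/thalemine/service"


def template_url(template, constraints, fmt="json"):
    """Build an InterMine-style template query URL. constraints is a list of
    (path, op, value) tuples, numbered constraint1/op1/value1, and so on."""
    params = [("name", template), ("format", fmt)]
    for i, (path, op, value) in enumerate(constraints, start=1):
        params += [(f"constraint{i}", path), (f"op{i}", op), (f"value{i}", value)]
    return f"{THALEMINE_SERVICE}/template/results?{urlencode(params)}"
```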
Data APIs: Getting Started
• BAR eFP – queries: locus
• BAR Expressologs – queries: locus
• BAR Interactions – queries: locus
• COGe – queries: position; special case, output transform only
• NASC $SERVICE – queries: locus; SOAP-based but may be offline permanently
• OrthologFinder – queries: locus; based on a ThaleMine template
• POLYMORPH – queries: locus, position; actually seven CGI services
• SUBA3 – queries: locus
We are compiling example queries, parameter mappings and descriptions, and ideal results for use in implementing the system.
Developing a Data API
• In order of preference, we would like you to have ready:
  – Well-documented REST
  – Moderately well-documented REST
  – SOAP services (plus WSDL or WADL)
  – Plain Old XML
  – Plaintext CGI
  – HTML CGI
  – No web services at all
• Work with us to enroll your services as a data source. This will involve a minor amount of coding.
Computational App Model (1)
[Figure: components shown include host file systems; a host OS running Docker.io; a CentOS 6.4 container with a custom repo; /scratch/; a database; araport-compute-00; araport-storage-00; a host FS (250 GB); TACC Corral (PB+) reached via sftp; and Agave apps, data, and jobs exposed as a REST API exchanging JSON objects.]
Science Apps: Grid View
• Current scheme
  – 2-3 column view with draggable apps
  – Apps are normal, full-size, or collapsed
  – Single app screen
• Later in 2014
  – N x X grid scheme implementing resizable app “tiles” like one sees in Android or Win8.x
  – App SDK libraries will have “help” for enabling resizable design
  – Multiple app screens
Data API Details (2)
• For service-specific parameters
  – Provide human-readable names mapped to original parameter names
  – Offer minimal descriptive text
  – Specify validation
    • Cardinality
    • Pattern validator (regex)
    • Type (number, string, etc.)
  – Indicate whether required
  – Indicate whether they should be visible in a UI
  – Specify reasonable default values
• Seem familiar?
  – This approach is used to abstract command-line apps
  – Allows automatic generation of a minimally functional UI
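These metadata fields are enough to validate an inbound request mechanically. This sketch uses a hypothetical `ParamSpec` dataclass whose field names are assumptions, not ADAMA's schema. It checks required parameters, cardinality, regex patterns, and types. It also applies defaults and renames parameters to the remote service's original names. Unknown query parameters are ignored.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParamSpec:
    name: str                      # human-readable name exposed to clients
    original: str                  # parameter name the remote service expects
    description: str = ""
    pattern: Optional[str] = None  # regex validator, matched case-insensitively
    kind: type = str               # type conversion: str, int, float, ...
    required: bool = False
    visible: bool = True           # whether a generated UI should show it
    default: Optional[str] = None
    repeatable: bool = False       # cardinality: may the parameter repeat?


def validate(specs, query):
    """Check a query (name -> list of strings, as from urllib.parse.parse_qs)
    against specs. Returns parameters keyed by their original names, or
    raises ValueError on the first problem found."""
    out = {}
    for spec in specs:
        values = query.get(spec.name) or []
        if not values:
            if spec.required:
                raise ValueError(f"missing required parameter {spec.name!r}")
            if spec.default is None:
                continue
            values = [spec.default]
        if len(values) > 1 and not spec.repeatable:
            raise ValueError(f"{spec.name!r} may appear only once")
        converted = []
        for value in values:
            if spec.pattern and not re.fullmatch(spec.pattern, value, re.IGNORECASE):
                raise ValueError(f"{spec.name!r}: {value!r} does not match {spec.pattern}")
            try:
                converted.append(spec.kind(value))
            except ValueError:
                raise ValueError(
                    f"{spec.name!r}: {value!r} is not a valid {spec.kind.__name__}"
                ) from None
        out[spec.original] = converted if spec.repeatable else converted[0]
    return out
```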
Data APIs: Response types (1)
• locus_relationship – pairwise relationship between A and B
  – Directionality
  – Type
  – Array of scores (weights, etc.)
• sequence_feature – positional attribute
  – Extension of GFF model plus
    • Build
    • Attributes array
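`sequence_feature` extends the GFF model with a build and an attributes array, so one way to produce it is to parse a GFF3 line and add the build. In this sketch the output keys other than `class` are assumptions, not the published AIP schema.

```python
from urllib.parse import unquote


def gff3_to_feature(line, build):
    """Parse one GFF3 data line into a dict shaped like a sequence_feature:
    the nine GFF columns, plus the assembly build and an attributes array."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("GFF3 data lines have exactly nine tab-separated columns")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = []
    # Column 9 holds tag=value pairs separated by ";", percent-encoded.
    for pair in filter(None, attrs.split(";")):
        key, _, value = pair.partition("=")
        attributes.append({"key": unquote(key), "value": unquote(value)})
    return {
        "class": "sequence_feature",
        "build": build,
        "chromosome": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attributes,
    }
```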
Data APIs: Response types (2)
• locus_feature – key-value attributes per locus
  – Optional controlled vocabulary* for keys
  – Support for both slots and arrays
• raw – for returning images or other binary formats
  – Source and other metadata carried in X-headers instead of the JSON result
  – Outbound transformation still supported
  – Not a preferred response mode
• text – for returning either the native service response or a non-conformant JSON document
  – Source and other metadata carried in X-headers instead of the JSON result
  – Not a preferred response mode
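In the `raw` and `text` modes the payload is returned untouched and provenance moves into headers. Here is a minimal sketch of what a service handler might return. The header names `X-Source` and `X-<key>` are illustrative, not names Araport defines.

```python
def raw_response(body, content_type, source, extra=None):
    """Hypothetical raw/text response: return the payload as-is and carry
    provenance in X- headers rather than in a JSON envelope."""
    headers = {"Content-Type": content_type, "X-Source": source}
    for key, value in (extra or {}).items():
        headers[f"X-{key}"] = str(value)
    return 200, headers, body
```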
Data API Details (6)
• Transparent caching will compensate for transient remote service failures
• Automatic indexing of certain response types via ElasticSearch, allowing for sophisticated global search
  – ElasticSearch allows us to index everything we “know about” and return it quickly
  – iPlant uses it to live-index >700 TB of user data
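Transparent caching that compensates for transient remote failures can be sketched as a cache with three behaviours. It serves fresh entries directly, refreshes stale ones from the remote service, and falls back to the stale copy if the refresh fails. This is an illustration only; the architecture slides list the actual cache technology as TBD.

```python
import time


class FallbackCache:
    """Serve fresh entries from cache, refresh stale ones from the remote
    service, and fall back to a stale copy when the remote call fails."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self.fetch = fetch          # callable: key -> response
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}             # key -> (timestamp, response)

    def get(self, key):
        now = self.clock()
        hit = self.store.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]           # fresh cache hit
        try:
            value = self.fetch(key)
        except Exception:
            if hit:
                return hit[1]       # remote failed: serve the stale copy
            raise                   # nothing cached: surface the error
        self.store[key] = (now, value)
        return value
```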
Developing an app
• Understand and document the user stories you’re addressing with your app
• Identify all requisite data sources AND help us prepare them as Data APIs
– This may involve coding
• Understand the data integration or aggregation needs of your app
  – This may involve coding
• Develop the user interface(s) for your app using our tool kits and suggested practices
  – This will involve coding.
  – But you will learn tools like jQuery, Bootstrap, & D3 and will thus be eminently employable!
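Data integration in an app often means joining records from several Data APIs on a shared key. Here is a minimal sketch that groups records by AGI locus and by their `class`, assuming each record carries a `locus` field as in the response examples above.

```python
from collections import defaultdict


def merge_by_locus(*sources):
    """Aggregate records from several Data API responses (each a list of
    dicts with a 'locus' key) into one entry per locus, grouped by the
    record's 'class'. Locus IDs are upper-cased so case differences
    between services do not split entries."""
    merged = defaultdict(lambda: defaultdict(list))
    for records in sources:
        for rec in records:
            locus = rec["locus"].upper()
            merged[locus][rec.get("class", "unknown")].append(rec)
    # Convert nested defaultdicts to plain dicts for JSON serialisation.
    return {locus: dict(by_class) for locus, by_class in merged.items()}
```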