Cytoscape: Now and Future
DESCRIPTION
Presentation slides for the Ideker Lab meeting on 12/1/2014. Briefly describes the current status of the projects and the future vision.
TRANSCRIPT
Cytoscape: Now and Future
Keiichiro Ono, UCSD Trey Ideker Lab, Cytoscape Core Team
Lab Meeting 12/1/2014
Now
Future
Now: Current Status
- Overall Project Status
- Cytoscape 3.2
- cyREST
Future
Service-oriented workflow
Cytoscape-as-a-service
Docker
Reproducible Science
Web Technologies
NIH The Commons
Status
> 700 downloads/month
App Store Daily Download Plot
Project Status
- Cytoscape Core: ✔
- Apps: ✔
- Publications: ?
Status: Cytoscape Core
Cytoscape 3.2
- Released early November
- What’s New?
- Chart Editor
- Export as Web Application
- Performance Improvements
- Lots of bug fixes
Chart Editor
- Visualize multiple data points in a single view
- Time series data
- Multiple GO terms
- Chart types: Bar, Box, Pie, Heat Map, Ring
- Part of standard Visual Style Editor
- Everything will be saved into session files
Gradient Editor
Export as Web Application
Exporting Cytoscape-generated visualizations as a complete web application using Cytoscape.js
cyREST
Users
User Type I
- Average computing skills
- Use Excel as their primary workbench for data analysis
- For them, bioinformatics means using some of NCBI/EBI web tools or DAVID
- Have tons of data not analyzed / visualized yet
- Excel is my friend.
User Type II
- Advanced computing skills
- Use Python + SciPy/NumPy, R + Bioconductor, or MATLAB every day
- If necessary, write their own packages
- Use HPC technologies a lot
- Manual operation is evil.
Both of them are Important!
- Type I: “Bench Biologists”
- Domain experts
- Data producers
- Type II: Computational Biologists
- Experts in large-scale data analysis
- Especially important for genome-scale data analysis
They have been ignored for a long time in the Cytoscape world…
Requests from Type II Users
- I have 200 networks in my session and I need to create one PDF per view. How can I do it with Cytoscape?
- I need to use igraph for network analysis, but its visualization features are limited. I want to use Cytoscape as an external visualization engine for R.
- I usually use IPython Notebook to record my work. How can I integrate Cytoscape into my workflow?
- I want to generate a Style for each time point and create small multiples of networks.
What is cyREST?
- Platform-independent, RESTful API module for Cytoscape
- Means you can access basic Cytoscape data objects programmatically
[Diagram: cyREST connects Cytoscape Desktop to external tools and services]
Your Workstation
- Cytoscape Desktop (Cytoscape 3.1+): Core, Apps, REST
Clients (POST, PUT, DELETE, GET)
- Interactive Data Analysis Environments
- RStudio: igraph, RCurl
- IPython Notebook: NumPy, SciPy, Pandas, NetworkX
- Command Line Tools: sed, awk, grep, curl
- Web Browsers
Data Bus (Internet)
- In-House Databases
- External Computing Resources: Graph Layout, Statistical Analysis, Data Pre-processing
- File / Code Hosting Services
- Public Data Repository
- PSICQUIC Services
- EBI RDF Platform
- Other Bioinformatics Web Applications / Services
- Data Repository & Collaboration Service
- Cytoscape App Store
Mapping Cytoscape API to HTTP Methods
Cytoscape Operations: HTTP Methods
- Create: POST
- Read: GET
- Update: PUT
- Delete: DELETE
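The mapping above can be sketched in a few lines of Python. This is only an illustration, not cyREST client code: the `CRUD_TO_HTTP` table and the `build_request` helper are hypothetical names, and the base URL assumes a local cyREST instance at `http://localhost:1234/v1`. The request is built but never sent, so the sketch runs without Cytoscape.

```python
import urllib.request

# Mapping from the slide: Cytoscape operation -> HTTP method.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}

# Assumed base URL of a local cyREST instance (an assumption of this sketch).
BASE_URL = "http://localhost:1234/v1"


def build_request(operation, resource, body=None):
    """Build (but do not send) an HTTP request for a Cytoscape operation.

    `body`, when given, must be JSON already encoded as bytes.
    """
    method = CRUD_TO_HTTP[operation.lower()]
    url = f"{BASE_URL}/{resource.lstrip('/')}"
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(url, data=body, headers=headers, method=method)
```

For instance, `build_request("read", "networks/52")` yields a GET request for one network, and passing the same resource with `"delete"` yields the matching DELETE request.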
Get full network with unique ID 52 as JSON
GET http://localhost:1234/v1/networks/52
http://localhost:1234/v1/networks/52
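From Python, the same GET call might look like the sketch below. It uses only the standard library; `network_url` and `fetch_network` are hypothetical helper names, and the `opener` parameter is an addition of this sketch so the function can be tried without a running Cytoscape instance. The structure of the returned JSON is not assumed here.

```python
import json
import urllib.request


def network_url(network_id, base_url="http://localhost:1234/v1"):
    """Return the URL of one network resource, following the slide's example."""
    return f"{base_url}/networks/{network_id}"


def fetch_network(network_id, opener=urllib.request.urlopen):
    """GET a network and decode the JSON body into Python objects.

    `opener` is injectable: by default it performs a real HTTP request,
    but any callable returning a readable, closable object will do.
    """
    with opener(network_url(network_id)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_network(52)` performs the real request, so it needs Cytoscape with cyREST listening on port 1234 and a network with that ID.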
Future
History
2005
- Cytoscape 2.2: Simple Java Application
- Google released an application called Google Maps beta
- “Re-discovery” of JavaScript, or Ajax
2014
- Cytoscape 3.2.0: (Modularized) Java Application
- Client applications are migrating to web browsers
- “Pure” desktop applications are dying slowly…
- Even desktop applications depend on external services
- JavaScript everywhere
- Cloud Computing
- Scale-out over scale-up
Trend in Software Design
- An application is a collection of smaller services
- JavaScript is a first-class citizen in the world of programming languages
- Design application with cloud services in mind
http://12factor.net/
In the modern era, software is commonly delivered as a service: called web apps, or software-as-a-service. The twelve-factor app is a methodology for building software-as-a-service apps that:
• Use declarative formats for setup automation, to minimize time and cost for new developers joining the project
• Have a clean contract with the underlying operating system, offering maximum portability between execution environments
• Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration
• Minimize divergence between development and production, enabling continuous deployment for maximum agility
• And can scale up without significant changes to tooling, architecture, or development practices.
This MANIFESTO counters current trends in bioinformatics where institutes and companies are creating monolithic software solutions aimed mostly at end-users.
Let’s see what’s happening in (scientific) computing…
Bioinformatics Open Source Conference (BOSC)
in Boston
@New York Times
@Facebook HQ
What I Have Learned…
- Python is becoming the standard language for “Data Scientists”
- Python itself is a very slow language, but it is a perfect glue
- Lots of tools are made by scientists (e.g. Anaconda by Continuum)
- They understand the current problems in modern scientific computing and are trying to solve them
What I Have Learned…
- Data visualization
- Visualization needs vary, especially for complex data sets like those from the life science domain
- For that purpose, Java is not the best language to implement applications
- Even large-scale data visualization applications are moving to web browsers
- Canvas (Cytoscape.js), WebGL (Three.js), SVG (D3.js)
- Most of the talented hackers are working on web browsers, i.e., in JavaScript
WikiGalaxy: http://wiki.polyfra.me/#
Problems in Scientific Computing
- No more free lunch
- Even if you buy expensive machines, you no longer get performance gains for free. You have to design your code for massively distributed environments. (From Scale-up to Scale-out)
- Complex Data Analysis Pipeline
- Need to build pipelines by connecting multiple resources, or services
- Need for complex, customized data visualization
- Reproducibility
➡ But building, deploying, and maintaining a reproducible pipeline is not straightforward
What does this mean to biologists?
- “Omics-Scale” Data Analysis
- Need computing power beyond your workstations
- Need to build pipelines by connecting multiple resources, or services
➡ But developing, deploying, and maintaining a reproducible, or “portable”, pipeline is not straightforward
What does this mean to biologists?
- Collaboration between scientists and software engineers will be more important
- Scientists should spend their time on science, not on the details of JavaScript syntax or how to build large-scale pipelines
- In other words, building a bioinformatics computing environment is itself a research project
What does this mean to Cytoscape team?
- Cytoscape should work nicely with other tools
- All bioinformatics tools should work as building blocks of large workflows
- In the long term, Cytoscape should be a collection of services
Universe of Tools for Bioinformatics
Cytoscape as a Collection of Services
Case Study 1
PanGIA App
Srivas, Rohith et al. “Assembling Global Maps of Cellular Function through Integrative Analysis of Physical and Genetic Networks.” Nature Protocols 6.9 (2011): 1308–1323. PMC. Web. 1 Dec. 2014.
History of PanGIA Application
[Diagram: one biological problem, implemented several times over]
- Biological Problem
- Core algorithm 1 as Python, Core algorithm 2 as Python, …, Core algorithm n as Python
- Java Implementation of Algorithms
- Cytoscape 2.x Plugin
- Cytoscape 3.x App
- PanGIA Service (Implement in Python again…?)
Contributor labels in the diagram: by Sourav; by Greg, Rohith; by Greg, Rohith and Cytoscape Team; by David
Lots of Duplicate Efforts!
Case Study 2
NeXO Web
- Term Enrichment Analysis
- From a list of genes, perform a hypergeometric test over the set of machine-generated ontology (NeXO) terms and display the terms with p-values
- It is independent of all other parts of the NeXO Web application
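The hypergeometric test behind term enrichment can be sketched as follows. This is a minimal illustration, not the NeXO Web implementation: it uses `math.comb` from the standard library (Python 3.8+) rather than SciPy, it computes the upper tail P(X >= overlap), and the function names and data layout (`term_to_genes` as a dict of gene sets) are assumptions made for the example.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background
    annotated:  number of background genes assigned to the term
    sample:     number of genes in the query list
    overlap:    number of query genes assigned to the term
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, sample)


def enrich(gene_list, term_to_genes, background):
    """Return (term, p-value) pairs sorted by ascending p-value."""
    background = set(background)
    genes = set(gene_list) & background
    results = []
    for term, members in term_to_genes.items():
        members = set(members) & background
        overlap = len(genes & members)
        p = hypergeom_pvalue(len(background), len(members), len(genes), overlap)
        results.append((term, p))
    return sorted(results, key=lambda pair: pair[1])
```

A real analysis would normally also correct the p-values for multiple testing across terms, which this sketch leaves out.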
Overview of NeXO Term Enrichment Service
- NeXO Web RESTful API
- Term Enrichment Service API by Flask
- Python Core, SciPy, NumPy
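To give a feel for how small such a service can be, here is a hedged sketch of an enrichment endpoint. The slides name Flask; to stay dependency-free this example swaps in a bare WSGI callable from the standard library instead. The `/enrich` path, the JSON request and response shapes, and the toy `TERMS` and `BACKGROUND` data are all hypothetical and are not the NeXO Web API.

```python
import json
from math import comb

# Hypothetical annotations and background; a real service would load
# the ontology terms instead of hard-coding them.
TERMS = {"term_a": {"g1", "g2", "g3"}, "term_b": {"g4", "g5", "g6", "g7"}}
BACKGROUND = {f"g{i}" for i in range(1, 11)}


def pvalue(population, annotated, sample, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap)."""
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, min(annotated, sample) + 1)
    )
    return tail / comb(population, sample)


def app(environ, start_response):
    """WSGI app: POST /enrich with a JSON body {"genes": [...]}."""
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/enrich":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
    genes = set(payload.get("genes", [])) & BACKGROUND
    results = [
        {
            "term": term,
            "p_value": pvalue(
                len(BACKGROUND), len(members), len(genes), len(genes & members)
            ),
        }
        for term, members in TERMS.items()
    ]
    results.sort(key=lambda row: row["p_value"])
    body = json.dumps({"results": results}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the callable on port 8000. Because the service speaks plain HTTP and JSON, any client (a browser, curl, R, or a notebook) could call it.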
Option 1: As a Cytoscape App
- Re-implement this algorithm as a Cytoscape App (Java Application)
- Pros:
- Easy to install
- Cons:
- A lot of work…
- Should be written in Java
- Does not scale out!
Option 2: As a Service
- Wrap existing applications and deploy to the platform of the users’ choice:
- Laptops, private servers, and commercial cloud services (AWS/Google Computing Cloud, etc.)
- Pros:
- Scales out
- Client-independent
- Workflow-friendly
- Cons:
- Need to adapt to the new way of software design
- Relatively more complex deployment
Summary
- Best practice: implement future applications as services, then call them from Cytoscape, IPython, RStudio, and other tools
- To make your algorithms available to both Type I (domain experts) and Type II (hardcore computational biologists) users, it is better to deploy them as a service rather than as an App
Is the technology available to implement such applications / workflows?
Yes!
Key: Provenance
- Data
- Workflow
- Environment
Software Distribution Problem
- “It-worked-on-my-machine” syndrome
- This is a serious problem especially when you want to share your workflow with collaborators.
What is Docker?
- Container to run applications in an isolated environment
- Application = Layer of images
- Sharable Environments
- Environments as code
https://www.docker.com/whatisdocker/
Docker Hub
- Sharing environments as code!
- Dockerfile - Definition of your container
- Example: http://bit.ly/15N23P8
Goal: Reproducible Science
We (the NIH) Are Working On, But As Yet Do Not Have Good Answers To:
1. Today, how much are we actually spending on data and software related activities?
2. How much should we be spending to achieve the maximum benefit to biomedical science relative to what we spend in other areas?
Biomedical Research as an Open Digital Enterprise by Philip E. Bourne, Ph.D., Associate Director for Data Science (NIH)
Reproducibility
- Most of the 27 Institutes and Centers of the NIH are currently reviewing the ability to reproduce research they are funding
- The NIH recently convened a meeting with publishers to discuss the issue – a set of guiding principles arose
Biomedical Research as an Open Digital Enterprise by Philip E. Bourne, Ph.D., Associate Director for Data Science (NIH)
The Cytoscape to a Cytoscape
- Shares Core Concepts
- Graph Model
- Tables associated with the graph
- Style (Collection of visual mappings)
- Implemented as different collections of services
- Desktop Cytoscape
- Interactive network data visualizer on the web
- Optimized for ontology browsing (i.e., future version of NeXO Web)
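The three shared core concepts can be written down independently of any one implementation. The sketch below is a language-neutral illustration in Python, not Cytoscape's actual data model: the `Network` and `Style` classes, their fields, and the `"size"` property are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Graph model plus the table associated with its nodes."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)         # (source, target) pairs
    node_table: dict = field(default_factory=dict)  # node -> {column: value}

    def add_edge(self, source, target):
        """Add an edge, registering both endpoints as nodes."""
        self.nodes.update((source, target))
        self.edges.add((source, target))


@dataclass
class Style:
    """A collection of visual mappings: property name -> function of a table row."""
    mappings: dict = field(default_factory=dict)

    def apply(self, network):
        """Return {node: {visual_property: value}} for every node."""
        return {
            node: {
                prop: fn(network.node_table.get(node, {}))
                for prop, fn in self.mappings.items()
            }
            for node in network.nodes
        }


# Usage with made-up data: node size grows with a "degree" column.
net = Network()
net.add_edge("a", "b")
net.node_table["a"] = {"degree": 1}
style = Style({"size": lambda row: 20 + 10 * row.get("degree", 0)})
view = style.apply(net)
```

Because the model is just a graph, tables, and mapping functions, the same description could in principle be rendered by a desktop application or by a web-based viewer.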
Photo Credits
• https://flic.kr/p/bFZpyg
• https://flic.kr/p/bmXUz1
2014 Keiichiro Ono