Cytoscape: Now and Future Keiichiro Ono UCSD Trey Ideker Lab Cytoscape Core Team Lab Meeting 12/1/2014

Upload: keiichiro-ono

Post on 07-Jul-2015


DESCRIPTION

Presentation slides for the Ideker Lab meeting on 12/1/2014, briefly describing the current status of the projects and the future vision.

TRANSCRIPT

Page 1: Cytoscape: Now and Future


Page 2: Cytoscape: Now and Future

Now / Future

Page 3: Cytoscape: Now and Future

Now: Current Status

- Overall Project Status

- Cytoscape 3.2

- cyREST

Page 4: Cytoscape: Now and Future

Future

- Service-oriented workflow

- Cytoscape-as-a-service

- Docker

- Reproducible Science

- Web Technologies

- NIH: The Commons

Page 5: Cytoscape: Now and Future

Status

Page 6: Cytoscape: Now and Future

> 700 downloads/month

Page 9: Cytoscape: Now and Future

App Store Daily Download Plot

Page 12: Cytoscape: Now and Future

Project Status

- Cytoscape Core:

- Apps:

- Publications: ?

Page 13: Cytoscape: Now and Future

Status: Cytoscape Core

Page 14: Cytoscape: Now and Future

Cytoscape 3.2

Page 15: Cytoscape: Now and Future

Cytoscape 3.2

- Released in early November

- What’s New?

- Chart Editor

- Export as Web Application

- Performance Improvements

- Lots of bug fixes

Page 16: Cytoscape: Now and Future

Chart Editor

Page 17: Cytoscape: Now and Future

Chart Editor

Page 18: Cytoscape: Now and Future

Chart Editor

- Visualize multiple data points in a single view

- Time series data

- Multiple GO terms

- Chart types: Bar, Box, Pie, Heat Map, Ring

- Part of the standard Visual Style Editor

- Everything is saved in session files

Page 19: Cytoscape: Now and Future

Gradient Editor

Page 20: Cytoscape: Now and Future

Export as Web Application

Page 21: Cytoscape: Now and Future

Export as Web Application

Exporting Cytoscape-generated visualizations as a complete web application using Cytoscape.js
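For illustration, the snippet below builds a small network in the Cytoscape.js elements JSON shape (nodes and edges, each wrapped in a `data` object with an `id`, plus `source`/`target` for edges), which is the kind of data a Cytoscape.js page renders. It is a minimal sketch assuming a plain edge list as input; it does not reproduce the exact files written by Cytoscape's web-application exporter, and the helper name `edges_to_cyjs` is hypothetical.

```python
import json


def edges_to_cyjs(edges):
    """Convert (source, target) pairs into Cytoscape.js-style elements JSON.

    Hypothetical helper for illustration; node IDs are the node names themselves.
    """
    node_ids = []
    seen = set()
    for source, target in edges:
        for name in (source, target):
            if name not in seen:  # keep first-seen order, no duplicates
                seen.add(name)
                node_ids.append(name)
    nodes = [{"data": {"id": n, "name": n}} for n in node_ids]
    edge_elems = [
        {"data": {"id": f"e{i}", "source": s, "target": t}}
        for i, (s, t) in enumerate(edges)
    ]
    return {"elements": {"nodes": nodes, "edges": edge_elems}}


if __name__ == "__main__":
    print(json.dumps(edges_to_cyjs([("A", "B"), ("B", "C")]), indent=2))
```

Keeping the network as plain JSON is what lets the same data move between the desktop application and a browser-based viewer.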

Page 25: Cytoscape: Now and Future

cyREST

Page 26: Cytoscape: Now and Future

Users

Page 27: Cytoscape: Now and Future

User Type I: Average computing skills

- Use Excel as their primary workbench for data analysis

- For them, bioinformatics means using a few NCBI/EBI web tools or DAVID

- Have tons of data that has not been analyzed or visualized yet

- Excel is my friend.

Page 28: Cytoscape: Now and Future

User Type II: Advanced computing skills

- Use Python + SciPy/NumPy, R + Bioconductor, or MATLAB every day

- If necessary, write their own packages

- Use HPC technologies a lot

- Manual operation is evil.

Page 29: Cytoscape: Now and Future

Both of Them Are Important!

- Type I: “Bench Biologists”

- Domain experts

- Data producers

- Type II: Computational Biologists

- Experts in large-scale data analysis

- Especially important for genome-scale data analysis

Type II users have long been ignored in the Cytoscape world…


Page 31: Cytoscape: Now and Future

Requests from Type II Users

- I have 200 networks in my session and I need to create one PDF per view. How can I do it with Cytoscape?

- I need to use igraph for network analysis, but its visualization features are limited. I want to use Cytoscape as an external visualization engine for R.

- I usually use IPython Notebook to record my work. How can I integrate Cytoscape into my workflow?

- I want to generate a Style for each time point and create small multiples of networks.

Page 32: Cytoscape: Now and Future

REST

Page 33: Cytoscape: Now and Future

What is cyREST?

- Platform-independent, RESTful API module for Cytoscape

- This means you can access basic Cytoscape data objects programmatically

Page 34: Cytoscape: Now and Future

[Diagram: Cytoscape Desktop (Core, Apps, REST) runs on your workstation alongside interactive data analysis environments (RStudio with igraph and RCurl; IPython Notebook with NumPy, SciPy, pandas, and NetworkX), command line tools (sed, awk, grep, curl), and web browsers. Over a data bus (the Internet), these connect to in-house databases, external computing resources (graph layout, statistical analysis, data pre-processing), file/code hosting services, public data repositories, PSICQUIC services, the EBI RDF Platform, other bioinformatics web applications/services, data repository & collaboration services, and the Cytoscape App Store.]

Page 35: Cytoscape: Now and Future

[Diagram: clients send POST, PUT, DELETE, and GET requests to the REST API of Cytoscape 3.1+.]

Page 36: Cytoscape: Now and Future

Mapping Cytoscape API to HTTP Methods

Cytoscape operation: HTTP method

- Create: POST

- Read: GET

- Update: PUT

- Delete: DELETE
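The mapping above can be written down directly as a lookup table. This is a minimal sketch using only Python's standard library: it builds (but does not send) a request for each operation. The helper name `build_request` is hypothetical, and which methods a particular cyREST path actually accepts should be checked against the cyREST documentation.

```python
import json
import urllib.request

# Mapping from the slide: Cytoscape operation -> HTTP method
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation, url, payload=None):
    """Build (but do not send) an HTTP request for a CRUD operation.

    Hypothetical helper; a JSON payload is attached for create/update calls.
    """
    method = CRUD_TO_HTTP[operation.lower()]
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

For example, `build_request("delete", "http://localhost:1234/v1/networks/52")` yields a DELETE request; passing it to `urllib.request.urlopen` would send it, which requires a running Cytoscape with cyREST.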

Page 37: Cytoscape: Now and Future

Get the full network with unique ID 52 as JSON

GET http://localhost:1234/v1/networks/52
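The same call can be made from a Python session. This is a minimal sketch assuming Cytoscape with the cyREST app is listening on the default `localhost:1234` shown above, and assuming the response follows the Cytoscape.js JSON layout with node attributes under `elements.nodes[].data`. The helper names are hypothetical.

```python
import json
import urllib.request

CYREST_BASE = "http://localhost:1234/v1"  # address used in the slides


def network_url(suid, base=CYREST_BASE):
    """URL of a single network resource (hypothetical helper)."""
    return f"{base}/networks/{suid}"


def fetch_network(suid, base=CYREST_BASE, timeout=10):
    """GET a network as JSON; needs a running Cytoscape with cyREST."""
    with urllib.request.urlopen(network_url(suid, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def node_names(cyjs):
    """Extract node 'name' values from Cytoscape.js-style network JSON."""
    nodes = cyjs.get("elements", {}).get("nodes", [])
    return [n["data"].get("name") for n in nodes]


if __name__ == "__main__":
    print(node_names(fetch_network(52)))
```

Separating the HTTP call from the parsing keeps the parsing testable without a running Cytoscape instance.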

Page 38: Cytoscape: Now and Future

http://localhost:1234/v1/networks/52

Page 39: Cytoscape: Now and Future

Demo: Cytoscape Controlled from IPython Notebook

http://bit.ly/1wcKXVV

Page 40: Cytoscape: Now and Future

Ready to Use Now!


http://apps.cytoscape.org/apps/cyrest

Page 41: Cytoscape: Now and Future

Future

Page 42: Cytoscape: Now and Future

History

Page 43: Cytoscape: Now and Future

2005

Page 45: Cytoscape: Now and Future

2005

- Cytoscape 2.2: Simple Java Application

- Google released an application called Google Maps (beta)

- “Re-discovery” of JavaScript through Ajax

Page 46: Cytoscape: Now and Future

2014

Page 48: Cytoscape: Now and Future

2014

- Cytoscape 3.2.0: (Modularized) Java Application

- Client applications are migrating to web browsers

- “Pure” desktop applications are dying slowly…

- Even desktop applications depend on external services

- JavaScript everywhere

- Cloud Computing

- Scale-out over scale-up

Page 49: Cytoscape: Now and Future

Trend in Software Design

- An application is a collection of smaller services

- JavaScript is a first-class citizen in the world of programming languages

- Design applications with cloud services in mind

Page 50: Cytoscape: Now and Future

http://12factor.net/

Page 51: Cytoscape: Now and Future

In the modern era, software is commonly delivered as a service: called web apps, or software-as-a-service. The twelve-factor app is a methodology for building software-as-a-service apps that:

• Use declarative formats for setup automation, to minimize time and cost for new developers joining the project

• Have a clean contract with the underlying operating system, offering maximum portability between execution environments

• Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration

• Minimize divergence between development and production, enabling continuous deployment for maximum agility

• And can scale up without significant changes to tooling, architecture, or development practices.
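One concrete twelve-factor practice (factor III, "Config") is to store deployment settings in environment variables rather than in code, so the same program runs unchanged on a laptop or a cloud host. A minimal sketch, assuming hypothetical variable names `CYREST_HOST` and `CYREST_PORT`, with defaults matching the cyREST address used earlier in these slides:

```python
import os


def cyrest_base_url(env=None):
    """Build the cyREST base URL from environment variables.

    CYREST_HOST and CYREST_PORT are hypothetical names chosen for this sketch;
    the defaults match the localhost:1234 address used in these slides.
    """
    env = os.environ if env is None else env
    host = env.get("CYREST_HOST", "localhost")
    port = int(env.get("CYREST_PORT", "1234"))
    return f"http://{host}:{port}/v1"
```

Running `CYREST_HOST=10.0.0.5 python client.py` would then point the same client at a remote instance without editing any code.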

Page 53: Cytoscape: Now and Future

This MANIFESTO counters current trends in bioinformatics where institutes and companies are creating monolithic software solutions aimed mostly at end-users.

Page 54: Cytoscape: Now and Future

Let’s see what’s happening in (scientific) computing…

Page 55: Cytoscape: Now and Future

Bioinformatics Open Source Conference (BOSC)

Page 56: Cytoscape: Now and Future

@New York Times@Facebook HQ

in Boston

Page 57: Cytoscape: Now and Future

What I Have Learned…

- Python is becoming the standard language for “Data Scientists”

- Python itself is slow compared with compiled languages, but it is a perfect glue language

- Lots of tools are made by scientists (e.g. Anaconda by Continuum)

- They understand the current problems in modern scientific computing and are trying to solve them

Page 59: Cytoscape: Now and Future

What I Have Learned…

- Data visualization

- Visualization needs vary, especially for complex data sets like those from the life science domain

- For that purpose, Java is not the best language for implementing applications

- Even large-scale data visualization applications are moving to web browsers

- Canvas (Cytoscape.js), WebGL (Three.js), SVG (D3.js)

- Most of the talented hackers are working on web browsers, i.e., in JavaScript

Page 60: Cytoscape: Now and Future

WikiGalaxy: http://wiki.polyfra.me/#

Page 62: Cytoscape: Now and Future

Problems in Scientific Computing

- No more free lunch

- Even if you buy expensive machines, you no longer get a free performance gain. You have to design your code for massively distributed environments (from scale-up to scale-out).

- Complex data analysis pipelines

- Need to build pipelines by connecting multiple resources, or services

- Need for complex, customized data visualization

- Reproducibility

➡ But building, deploying, and maintaining reproducible pipelines is not straightforward

Page 63: Cytoscape: Now and Future

What does this mean to biologists?

- “Omics-Scale” Data Analysis

- Need computing power beyond your workstations

- Need to build pipelines by connecting multiple resources, or services

➡ But developing, deploying, and maintaining reproducible, or “portable”, pipelines is not straightforward

Page 64: Cytoscape: Now and Future

What does this mean to biologists?

- Collaboration between scientists and software engineers will be more important

- Scientists should spend their time on science, not on the details of JavaScript syntax or how to build large-scale pipelines

- In other words, building a bioinformatics computing environment is itself a research project

Page 65: Cytoscape: Now and Future

What does this mean to Cytoscape team?

- Cytoscape should work nicely with other tools

- All bioinformatics tools should work as building blocks of large workflows

- In the long term, Cytoscape should be a collection of services

Page 66: Cytoscape: Now and Future

Universe of Tools for Bioinformatics


Page 67: Cytoscape: Now and Future

Cytoscape as a Collection of Services

Page 68: Cytoscape: Now and Future

Case Study 1

Page 69: Cytoscape: Now and Future

PanGIA App

Page 70: Cytoscape: Now and Future

Srivas, Rohith et al. “Assembling Global Maps of Cellular Function through Integrative Analysis of Physical and Genetic Networks.” Nature Protocols 6.9 (2011): 1308–1323. PMC. Web. 1 Dec. 2014.

Page 71: Cytoscape: Now and Future

[Diagram: History of PanGIA Application. Starting from a biological problem, the work went through core algorithms 1 to n written in Python, a Java implementation of the algorithms, a Cytoscape 2.x plugin, and a Cytoscape 3.x app, with a PanGIA service (implemented in Python again…?) as a possible next step. Contributors credited in the diagram: Sourav; Greg and Rohith; Greg, Rohith, and the Cytoscape Team; David.]

Page 72: Cytoscape: Now and Future

Lots of Duplicated Effort!

Page 73: Cytoscape: Now and Future

Case Study 2

Page 74: Cytoscape: Now and Future

NeXO Web

Page 75: Cytoscape: Now and Future

NeXO Web

- Term Enrichment Analysis

- From a list of genes, perform a hypergeometric test over a set of machine-generated ontology (NeXO) terms and display the terms with their p-values

- It is independent of all other parts of the NeXO Web application
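The enrichment step described above can be sketched in a few lines. This is a minimal illustration, not the NeXO Web code: it assumes a flat mapping from term IDs to gene sets plus a background gene universe, computes the upper-tail hypergeometric p-value P(X >= k) with exact integer arithmetic (Python 3.8+ for `math.comb`), and applies no multiple-testing correction. The function names are hypothetical. With SciPy, as the service used, the same tail is `scipy.stats.hypergeom.sf(k - 1, M, n, N)`.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n term genes, N draws)."""
    upper = min(n, N)
    total = comb(M, N)
    # Sum the probabilities of seeing k, k+1, ..., upper term genes in the query.
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def term_enrichment(query_genes, term_to_genes, background):
    """Rank terms by hypergeometric p-value for a query gene list (sketch)."""
    background = set(background)
    query = set(query_genes) & background
    M, N = len(background), len(query)
    results = []
    for term, genes in term_to_genes.items():
        members = set(genes) & background
        k = len(query & members)
        if k == 0:
            continue  # no overlap, nothing to report
        results.append((term, k, hypergeom_sf(k, M, len(members), N)))
    return sorted(results, key=lambda r: r[2])
```

For instance, with 10 background genes and a 3-gene term fully matched by a 3-gene query, the p-value is 1 / C(10, 3) = 1/120.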

Page 76: Cytoscape: Now and Future

Overview of NeXO Term Enrichment Service

[Diagram: the NeXO Web RESTful API calls a Term Enrichment Service API built with Flask, which wraps a Python core using SciPy and NumPy.]
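Wrapping a Python core as a small JSON-over-HTTP service can be sketched as follows. The slides use Flask; to keep this sketch dependency-free it uses the standard library's WSGI interface (`wsgiref`) instead, which is the same interface Flask applications implement. The request shape (`{"genes": [...]}`), the toy `TERMS` table, and `count_overlaps` are hypothetical placeholders; a real service would call the enrichment test instead of counting overlaps.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical toy ontology used only for this sketch
TERMS = {"term_1": ["A", "B", "C"], "term_2": ["C", "D"]}


def count_overlaps(genes, term_to_genes):
    """Placeholder analysis: overlap size per term (stand-in for the real test)."""
    query = set(genes)
    return {term: len(query & set(members)) for term, members in term_to_genes.items()}


def app(environ, start_response):
    """WSGI app: POST a JSON body {"genes": [...]} and receive JSON results."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "application/json")])
        return [json.dumps({"error": "use POST"}).encode("utf-8")]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = json.loads(environ["wsgi.input"].read(length) or b"{}")
    payload = json.dumps({"results": count_overlaps(body.get("genes", []), TERMS)}).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(payload)))])
    return [payload]


if __name__ == "__main__":
    with make_server("", 5000, app) as server:  # port 5000 is an arbitrary choice
        server.serve_forever()
```

Because the analysis sits behind plain HTTP and JSON, the same endpoint could be called from Cytoscape, IPython, RStudio, or `curl` without re-implementing it.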


Page 78: Cytoscape: Now and Future

Option 1: As a Cytoscape App

- Re-implement this algorithm as a Cytoscape App (Java application)

- Pros:

- Easy to install

- Cons:

- A lot of work…

- Has to be written in Java

- Does not scale out!

Page 79: Cytoscape: Now and Future

Option 2: As a Service

- Wrap existing applications and deploy them to the platform of the user’s choice:

- Laptops, private servers, and commercial cloud services (AWS, Google Cloud Platform, etc.)

- Pros:

- Scales out

- Client-independent

- Workflow-friendly

- Cons:

- Need to adapt to a new way of software design

- Relatively more complex deployment

Page 80: Cytoscape: Now and Future

Summary

- Best practice: implement future applications as services, then call them from Cytoscape, IPython, RStudio, and other tools

- To make your algorithms available to both Type I (domain experts) and Type II (hardcore computational biologists) users, it is better to deploy them as services instead of as Apps

Page 81: Cytoscape: Now and Future

Is the technology available to implement such applications/workflows?

Page 82: Cytoscape: Now and Future

Yes!

Page 83: Cytoscape: Now and Future

Key: Provenance

Page 84: Cytoscape: Now and Future

Data Workflow Environment


Page 88: Cytoscape: Now and Future

Software Distribution Problem

- “It-worked-on-my-machine” syndrome

- This is a serious problem, especially when you want to share your workflow with collaborators.

Page 91: Cytoscape: Now and Future

What is Docker?

- Container to run applications in an isolated environment

- Application = layers of images

- Shareable environments

- Environments as code

Page 92: Cytoscape: Now and Future

https://www.docker.com/whatisdocker/

Page 93: Cytoscape: Now and Future

Docker Hub

- Sharing environments as code!

- Dockerfile - Definition of your container

- Example: http://bit.ly/15N23P8

Page 94: Cytoscape: Now and Future

Goal: Reproducible Science

Page 95: Cytoscape: Now and Future

We (the NIH) Are Working On, But As Yet Do Not Have Good Answers To:

1. Today, how much are we actually spending on data and software related activities?

2. How much should we be spending to achieve the maximum benefit to biomedical science relative to what we spend in other areas?

Biomedical Research as an Open Digital Enterprise by Philip E. Bourne, Ph.D., Associate Director for Data Science (NIH)

Page 96: Cytoscape: Now and Future

Reproducibility

- Most of the 27 Institutes and Centers of the NIH are currently reviewing the ability to reproduce research they are funding

- The NIH recently convened a meeting with publishers to discuss the issue – a set of guiding principles arose

Biomedical Research as an Open Digital Enterprise by Philip E. Bourne, Ph.D., Associate Director for Data Science (NIH)

Page 97: Cytoscape: Now and Future

The Cytoscape to a Cytoscape

- Shares core concepts

- Graph model

- Table associated with the graph

- Style (collection of visual mappings)

- Implemented as different collections of services

- Desktop Cytoscape

- Interactive network data visualizer on the web

- Optimized for ontology browsing (i.e., a future version of NeXO Web)
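The three shared core concepts (graph, table attached to the graph, and a Style as a set of visual mappings) can be sketched as plain data structures that any front end could consume. This is a toy illustration, not Cytoscape's API: the class and field names are hypothetical, and only passthrough and discrete mappings are modeled (Cytoscape also has continuous mappings, omitted here).

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Graph model plus a node table keyed by node ID."""
    nodes: list
    edges: list  # (source, target) pairs
    node_table: dict = field(default_factory=dict)  # node ID -> {column: value}


@dataclass
class Style:
    """A collection of visual mappings from table columns to visual properties."""
    defaults: dict = field(default_factory=dict)
    passthrough: dict = field(default_factory=dict)  # visual property -> column
    discrete: dict = field(default_factory=dict)     # visual property -> (column, {value: visual})

    def apply(self, network):
        """Compute visual properties per node; a renderer could draw these."""
        views = {}
        for node in network.nodes:
            row = network.node_table.get(node, {})
            props = dict(self.defaults)
            for prop, column in self.passthrough.items():
                if column in row:
                    props[prop] = row[column]  # copy the column value as-is
            for prop, (column, table) in self.discrete.items():
                if row.get(column) in table:
                    props[prop] = table[row[column]]  # look up value -> visual
            views[node] = props
        return views
```

Because the Style only reads from the table and produces plain values, the desktop application, a web visualizer, and an ontology browser could each render the same model in their own way.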

Page 98: Cytoscape: Now and Future

Photo Credits

- https://flic.kr/p/bFZpyg

- https://flic.kr/p/bmXUz1