Cytegeist: a framework for biological data analysis · 2017-11-20
Cytegeist: A framework for biological data analysis
Adam Treister (1), Anja Mirenska (2), Christian Hennig (2)
(1) Gladstone Institutes, San Francisco, CA; (2) Zellkraftwerk GmbH, Hannover, Germany
A New Cellular Metaphor
Complex experimental analysis is broken down into individual
steps executed by individual pieces of software. The traditional
process of chaining analytical steps together is known as
pipelining. The plumbing metaphor is accepted and ubiquitous,
but also reflects the limitations of an industrial age, where
architectures are static and changes require intervention by skilled
laborers to fundamentally reconstruct the pipeline. We propose
instead a cellular metaphor for combining tools. We argue that
this architecture is better suited to modern (Big Data)
programming problems, in which small amounts of work are
performed in parallel by independent work cells.
This is not a new computer architecture. It is merely a rebranding
of existing technology to better fit the mental model of
the biologist. The Service Oriented Architecture (SOA)
approach to enterprise computing is well-accepted as a scalable
abstraction that can distribute problems across a sea of virtual
machines. Well-defined service contracts are required to move
data between services, analogous to pathway diagrams used to
explain biological mechanisms.
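The idea of independent work cells behind a service contract can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the names WorkCellSketch, WorkCell and runInParallel are hypothetical and are not taken from the Cytegeist code base, and a local thread pool stands in for the sea of virtual machines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Cytegeist itself.
class WorkCellSketch {

    // The "service contract": a cell takes one typed input and returns one typed output.
    interface WorkCell<I, O> extends Function<I, O> {}

    // Applies the same cell to many small inputs in parallel, using `clones`
    // worker threads, and returns the outputs in input order.
    static <I, O> List<O> runInParallel(WorkCell<I, O> cell, List<I> inputs, int clones) {
        ExecutorService pool = Executors.newFixedThreadPool(clones);
        try {
            List<Future<O>> pending = new ArrayList<>();
            for (I input : inputs) {
                Callable<O> task = () -> cell.apply(input);
                pending.add(pool.submit(task));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> future : pending) {
                results.add(future.get()); // blocks until that cell has finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("work cell failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Example cell: mean of a small batch of measurements.
        WorkCell<double[], Double> mean =
                values -> Arrays.stream(values).average().orElse(Double.NaN);
        List<double[]> batches = Arrays.asList(new double[] {1, 2, 3}, new double[] {10, 20});
        System.out.println(runInParallel(mean, batches, 2));
    }
}
```

Because the caller sees only the contract, the number of clones can be changed without touching the cell or the code that uses it.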
As with cellular processes, there is a strict distinction between
inside and outside the cell, as well as a nucleus of essential data
that is translated into pieces of output. Internal pathways are used
to:
• normalize data
• map terms and abbreviations
• visualize data sets
• track workflow protocols.
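Two of the internal pathways listed above, normalizing data and mapping abbreviations, might look like the following. This is a minimal sketch, not Cytegeist code: the class and method names are hypothetical, min-max scaling is an assumed choice of normalization (the poster does not name one), and the two dictionary entries are only illustrative.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of two internal pathways; names are not from Cytegeist.
class InternalPathwaySketch {

    // Min-max normalization to the range [0, 1] (an assumed method).
    // A constant or empty input yields all zeros rather than dividing by zero.
    static double[] normalize(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    // Expands a known abbreviation; unknown terms are returned unchanged.
    static String mapTerm(Map<String, String> dictionary, String term) {
        String key = term.trim().toUpperCase(Locale.ROOT);
        return dictionary.getOrDefault(key, term);
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("PBMC", "peripheral blood mononuclear cell");
        dictionary.put("FITC", "fluorescein isothiocyanate");

        System.out.println(Arrays.toString(normalize(new double[] {2, 4, 6})));
        System.out.println(mapTerm(dictionary, " pbmc "));
    }
}
```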
Intermediate results and internal regulatory mechanisms float in a
protected environment inside the cell. At the same time a wide
array of services can access external databases and ontologies,
interact with robots, outsource extended calculations, share data,
and publish reports. As select analyses mature, they can be
cloned to scale up to the limits of available resources.
Cytegeist is written primarily in Java (1.8 or later). It is
platform-agnostic, data type-independent, and open source. Through
the CyREST interface, it inherits REST (representational state
transfer) access to Python, Ruby and R scripting. Interface
elements are coded in JavaFX for desktop clients and in JavaScript
for web clients. Data interchange generally goes through JSON, XML
and RDF channels, but other formats are easy to plug into the
extensible architecture.
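One way such a format plug-in point could work is a registry that maps a format name to a writer function. The sketch below is an assumption for illustration only: FormatRegistrySketch, register and export are hypothetical names, not the Cytegeist API, the JSON writer handles only flat string records, and the sample record is made up.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a pluggable export registry; not the Cytegeist API.
class FormatRegistrySketch {

    private final Map<String, Function<Map<String, String>, String>> writers =
            new LinkedHashMap<>();

    // Plugs in a writer under a case-insensitive format name.
    void register(String format, Function<Map<String, String>, String> writer) {
        writers.put(format.toLowerCase(Locale.ROOT), writer);
    }

    // Serializes a record with the writer registered for the given format.
    String export(String format, Map<String, String> record) {
        Function<Map<String, String>, String> writer =
                writers.get(format.toLowerCase(Locale.ROOT));
        if (writer == null) {
            throw new IllegalArgumentException("No writer registered for format: " + format);
        }
        return writer.apply(record);
    }

    // Minimal JSON writer for flat string maps. It escapes only backslashes
    // and double quotes, so it is not a general-purpose JSON serializer.
    static String toJson(Map<String, String> record) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        FormatRegistrySketch registry = new FormatRegistrySketch();
        registry.register("json", FormatRegistrySketch::toJson);
        // A second format is added without changing the registry itself.
        registry.register("tsv", r ->
                String.join("\t", r.keySet()) + "\n" + String.join("\t", r.values()));

        Map<String, String> record = new LinkedHashMap<>();
        record.put("marker", "CD4");   // illustrative values only
        record.put("channel", "FITC");
        System.out.println(registry.export("json", record));
        System.out.println(registry.export("tsv", record));
    }
}
```

Adding XML or RDF output would then mean registering one more writer rather than editing the export path.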
Summary
We propose a framework of analysis tools that interact through a
service API resembling an antigen binding model. The goal is to
provide a context that captures a modern service architecture in a
format comfortable to the biologist. We are looking for partners in
all aspects of developing open methods of analysis
and visualization.
Contact at: [email protected]
References
1. Wikipedia; Instruction Pipelining; https://en.wikipedia.org/wiki/Instruction_pipelining
2. Oracle; An Enterprise Architect’s Guide to Big Data; http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf
3. Zellkraftwerk GmbH; Clinical Immunology; http://www.zellkraftwerk.com/files/Clinical_Immunology.pdf
4. Pratt D, et al.; NDEx, the Network Data Exchange; http://www.cell.com/cell-systems/abstract/S2405-4712(15)00147-7
5. CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API; http://f1000research.com/articles/4-478/v1
The National Resource for Network Biology is an NIH Biomedical Technology Research Center (BTRC)
supported by the National Institute of General Medical Sciences (NIGMS) under grant P41 GM103504.
[Figure labels: Core Data; Classifiers; Normalizers; Intermediate Data (Projectplasm); Visualizations; ID Mapping; Methods in Experimental Description Language; Image Retrieval; Literature Searching; Zellkraftwerk; Pathway Diagrams]