
The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

Randal Burns∗  William Gray Roncal†  Dean Kleissas†
Kunal Lillaney∗  Priya Manavalan∗  Eric Perlman§
Daniel R. Berger∥  Davi D. Bock§  Kwanghun Chung‡‡
Logan Grosenick‡‡  Narayanan Kasthuri¶  Nicholas C. Weiler§§
Karl Deisseroth‡‡  Michael Kazhdan∗  Jeff Lichtman¶
R. Clay Reid∗∗  Stephen J. Smith§§  Alexander S. Szalay††
Joshua T. Vogelstein‡  R. Jacob Vogelstein†

ABSTRACT
We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes (neural connectivity maps of the brain) using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me.

The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems (reads to parallel disk arrays and writes to solid-state storage) to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effectiveness of spatial data organization.

Categories and Subject Descriptors
H.2.4 [Database Management]: Systems - Distributed Databases; H.2.8 [Database Management]: Database Applications - Scientific Databases; J.3 [Computer Applications]: Life and Medical Sciences - Biology and Genetics

General Terms
Data-intensive computing, Connectomics

This work was supported by the National Institutes of Health (NIBIB 1RO1EB016411-01) and the National Science Foundation (OCI-1244820 and CCF-0937810).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SSDBM '13, July 29-31, 2013, Baltimore, MD, USA
Copyright 2013 ACM 978-1-4503-1921-8/13/07 $15.00

1. INTRODUCTION
The neuroscience community faces a scalability crisis as new high-throughput imaging technologies come online. Most notably, electron microscopes that image serial sections now produce data at more than one terabyte per day. Data at this scale can no longer be stored, managed, and analyzed on workstations in the labs of the scientists that collect them.

In response to this crisis, we have developed a community database cluster based on the principles of data-intensive computing [10] and Open Science [26]. Labs contribute imaging data to the Open Connectome Project (OCP). In exchange, OCP provides storage and analysis Web services that relieve the data management burden for neuroscientists and create an incentive to make data publicly available. (Data analysis products may be kept private.) This open science model replicates that pioneered by the Sloan Digital Sky Survey [36] for observational astronomy, democratizing world-class data sets by making them freely available over the Internet. To date, the model has been well received by leading neuroscientists; OCP manages the largest image stack [3] and the most detailed neural reconstruction [16] collected to date, and we have partnered with both teams to receive data streams from the next generation of instruments.

Data-intensive computing will create the capability to reconstruct neural circuits at a scale that is relevant to characterizing brain systems and will help to solve grand challenge problems, such as building biologically-inspired computer architectures for machine learning and discovering "connecto-pathies": signatures for brain disease, based on neural connectivity, that are diagnostic and prognostic. Previous studies have been limited by analysis capabilities to image volumes representing tens of neurons [3]. At present, automated reconstruction tools are neither sufficiently accurate nor scalable [32, 13]. The largest dense neural reconstruction describes only 100s of GB of data [31]. As a consequence, current analyses depend on humans using manual annotation tools to describe neural connectivity [4] or to correct the output of computer vision algorithms [12]. The gap between the size of the system and the state of current technology scopes the problem. A graph representing the human brain has a fundamental size of 10^11 vertices (neurons) and 10^15 edges (synapses). Mouse brains have 10^9 vertices and 10^13 edges. We conclude that manual annotation cannot reach these scales. More accurate computer vision and scalable data systems must be developed in order to realize the ultimate goal of a full reconstruction of the brain: the human connectome, or human brain map.


Figure 1: Visualization of the spatial distribution of synapses detected in the mouse visual cortex of Bock et al. [3].

The Open Connectome Project was specifically designed to be a scalable data infrastructure for parallel computer vision algorithms that discover neural connectivity through image processing. Researchers have many different approaches to the task. For example, the segmentation process, which divides the image into bounded regions, may be performed with feed-forward neural networks [14] or by geometric tracing [17]. However, neural reconstruction has some fundamental properties that we capture in system design. Algorithms realize parallelism through a geometric decomposition of the data. OCP provides a cutout service to extract spatially contiguous regions of the data, including projections to lower dimensions. We use indexes derived from space-filling curves to partition data, which makes cutout queries efficient and (mostly) uniform across lower-dimensional projections. Partitions are striped across disk arrays to realize parallel I/O and across multiple nodes for scale-out. Vision algorithms output descriptions of neural connectivity. We capture these in a relational database of neural object metadata [19] linked to a spatial annotation database that stores the structures. Algorithms run in multiple phases, assembling structure from the output of previous stages, e.g. fusing previous segmentations into neurons. We support queries across images and annotations. Annotation databases are spatially registered to images. Finally, we support spatial queries for individual objects and regions that are used in analysis to extract volumes, find nearest neighbors, and compute distances.

The Open Connectome Project stores more than 50 unique data sets totaling more than 75TB of data. Connectomes range from the macro (magnetic resonance imaging of human subjects at 1 mm^3) to the micro (electron microscopy of mouse visual cortex at 4nm x 4nm x 40nm). We have demonstrated scalable computer vision in the system by extracting more than 19 million synapse detections from a 4 trillion pixel image volume, one quarter scale of the largest published EM connectome data [3]. This involved a cluster of three physical nodes with 186 cores running for three days, communicating with the OCP cutout and annotation service over the Internet.

2. DATA AND APPLICATIONS
We present two example data sets and the corresponding analyses as use cases for Open Connectome Project services.

Figure 2: Electron microscopy imaging of a mouse somatosensory cortex [16], overlaid by manual annotations describing neural objects including axons, dendrites, and synapses. These images were cutout from two spatially registered databases and displayed in the CATMAID Web viewer [34].

The data themselves are quite similar: high-resolution electron microscopy of a mouse brain. However, the analyses of these data highlight different services.

The bock11 data [3] demonstrates state-of-the-art scalability and the use of parallel processing to perform computer vision. The image data are the largest published collection of high-resolution images, covering a volume of roughly 450x350x50 microns with 20 trillion voxels at a resolution of 4x4x40nm. Volumes at this scale just start to contain the connections between neurons. Neurons have large spatial extent, and connections can be analyzed when both cells and all of the connection wiring (dendrite/synapse/axon) lie within the volume. We are using this data to explore the spatial distribution of synapses, identifying clusters and outliers, to generate a statistical model of where neurons connect. Our synapse-finding vision algorithm extracts more than 19 million locations in the volume (Figure 1). We have not yet characterized the precision and recall of this technique; thus, this exercise is notable for its scale only: we ran 20 parallel instances and processed the entire volume in less than 3 days. For comparison, Bock et al. [3] collected this data so that they could manually trace 10 neurons, 245 synapses, and 185 postsynaptic targets over the course of 270 human days. We built a framework for running this task within the LONI [33] parallel execution environment.

The kasthuri11 data [16] shows how spatial analysis can be performed using object metadata and annotations (Figure 2). This data set has the most detailed and accurate manual annotations. Two regions of 1000x1000x100 and 1024x1024x256 voxels have been densely reconstructed, labeling every structure in the volume. Three dendrites that span the entire 12000x12000x1850 voxel volume have had all synapses that attach to dendritic spines annotated. OCP has ingested all of these manual annotations, including object metadata for all structures and a spatial database of annotated regions. This database has been used to answer questions about the spatial distribution of synapses (connections) with respect to the target dendrite (major neuron branch). The analysis proceeds by (1) using metadata to get the identifiers of all synapses that connect to the specified dendrite and then (2) querying the spatial extent of the synapses and dendrite to compute distances. The latter stage can be done by extracting each object individually or by specifying a list of objects and a region and having the database filter out all other annotations. We also use the densely annotated regions as a "ground truth" for evaluating machine vision reconstruction algorithms.

Figure 3: Visualization of six channels of array tomography data, courtesy of Nick Weiler and Stephen Smith [28, 22]. Data were drawn from a 17-channel database and rendered by the OCP cutout service.

3. DATA MODEL
The basic storage structure in OCP is a dense multi-dimensional spatial array, partitioned into cuboids (rectangular subregions) in all dimensions. Cuboids in OCP are similar in design and goal to chunks in ArrayStore [39]. Each cuboid gets assigned an index using a Morton-order space-filling curve (Figure 4). Space-filling curves organize data recursively so that any power-of-two aligned subregion is wholly contiguous in the index [30]. Space-filling curves minimize the number of discontiguous regions needed to retrieve a convex shape in a spatial database [23]. While the Hilbert curve has the best properties in this regard, we choose the Morton-order curve for two reasons. It is simple to evaluate, using bit interleaving of offsets in each dimension, unlike other curves that are defined recursively. Also, cube addresses are strictly non-decreasing in each dimension, so that the index works on subspaces. Non-decreasing offsets also aid in interpolation, filtering, and other image processing operations [15, 7].
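A minimal sketch of the Morton-order key computation by bit interleaving, which is the property the design relies on; the function name and the 21-bit-per-dimension width are illustrative, not taken from the OCP code base.

```python
def morton3(x, y, z, bits=21):
    """Interleave the bits of (x, y, z) cuboid offsets into one Morton key.

    With 21 bits per dimension the key fits in a 63-bit integer.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

# Offsets that differ only in low-order bits map to nearby keys, so any
# power-of-two aligned subcube is contiguous in the index.
assert morton3(0, 0, 0) == 0
assert morton3(1, 0, 0) == 1
assert morton3(1, 1, 1) == 7
```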

Image data contain up to 5 dimensions and often exhibit anisotropy. For example, serial-section electron microscopy data come from imaging sections created by slicing or ablating a sample. The resolution of the image plane (XY) is determined by the instrument and that of the sections (Z) by the sectioning technique. The difference in resolution is often a factor of 10. The fourth and fifth dimensions arise in other imaging modalities. Some techniques produce time series, such as functional magnetic resonance imaging and light microscopy of calcium channels. Others image multiple channels that correspond to different proteins or receptors (Figure 3). These data are studied by correlating multiple channels, e.g. spatial co-occurrence or exclusion, to reveal biology.

3.1 Physical Design
We store a multi-resolution hierarchy for each image data set so that analyses and visualization can choose the appropriate scale. For EM data, each lower resolution reduces the data size by a factor of four, halving the scale in X and Y. Typically we do not scale Z, because it is poorly resolved. We also do not scale time or channels. The bock11 data has nine levels and kasthuri11 six. As an example of choosing scale, our bock11 synapse detector runs on high-resolution data, because synapses have limited spatial extent (tens of voxels in any dimension). We detect synapses at resolution one, four times smaller and four times faster than the raw image data. We found that the algorithm was no less accurate at this scale. We analyze large structures that cannot contain synapses, such as blood vessels and cell bodies, to mask out false positives. We developed this software for this analysis, but many of the techniques follow those described in ilastik [38]. We conduct the analysis at resolution 5, in which each voxel represents a (32,32,1) voxel region in the raw data. The structures are large and detectable at low resolution, and the computation requires all data to be in memory to run efficiently.

Figure 4: Partitioning the Morton (z-order) space-filling curve. For clarity, the figure shows 16 cuboids in 2 dimensions mapping to four nodes. The z-order curve is recursively defined and scales in dimensions and data size.

Figure 5: The resolution hierarchy scales the XY dimensions of cuboids but not Z, so that cuboids contain roughly equal lengths in all dimensions.

Resolution scaling and workload dictate the dimension and shape of cuboids. For EM data, computer vision algorithms operate on data regions that are roughly cubic in the original sample, so that they can detect brain anatomy in all orientations. Because the X and Y dimensions are scaled but not Z, this results in different-sized voxel requests at different resolutions. For this reason, we use different-shaped cuboids at different levels in the hierarchy (Figure 5). For example, at the highest three resolutions in bock11, cuboids are flat (128x128x16), because each voxel represents 10 times as much length in Z as in X and Y. Beyond level 4 we shift to a cube of (64x64x64). We include time series in the spatial index, using a 4-d space-filling curve. Time is often as large as other dimensions: 1000s of time points in MR data. This supports queries that analyze the time history of a smaller region. We do not include channel data in the index (there are different cuboids for different channels), because the number of channels tends to be few (up to 17) and most analyses look at combinations of a few channels.
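An illustrative helper that mirrors the cuboid shapes described above; the exact level at which the layout switches from flat to cubic is an assumption based on the text, not the OCP configuration.

```python
def cuboid_shape(resolution):
    """Pick a cuboid shape (x, y, z) for a resolution level.

    At the highest resolutions voxels are ~10x longer in Z than in X/Y,
    so flat cuboids keep roughly equal physical lengths per dimension;
    lower resolutions use near-isotropic cubes. The cutoff is assumed.
    """
    if resolution <= 4:
        return (128, 128, 16)   # anisotropic, high-resolution levels
    return (64, 64, 64)         # roughly isotropic, lower levels
```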

Cuboids contain only 2^18 = 256K voxels of data, which is a compromise among the different uses of the data. This size may be too small to optimize I/O throughput. Space-filling curves mitigate this by ensuring that larger aligned regions are stored sequentially and can be read in a single streaming I/O. An important use of our service extracts lower-dimensional projections for visualization of EM data and sub-space clustering of 4-d and 5-d data. Keeping the cuboid size small reduces the total amount of data to be read (and discarded) for these queries.

3.2 Annotations
An annotation project contains a description of the spatial extent and metadata for objects detected in an image database. Each project has a spatial database registered to an image data set. Projects may be shared among teams of human annotators that want a coherent view of what structures have been labeled. Projects are also plentiful: each parameterization of a computer vision algorithm may have a project, so that outputs can be compared. Annotations also have a resolution hierarchy, so that large structures, such as cell bodies, may be found and marked at low resolution, and fine detail, such as synapses and dendritic spines, at high resolution.

An "annotation" consists of an object identifier, which is linked to object metadata in the RAMON (Reusable Annotation Markup for Open coNectomes) neuroscience ontology [19], and a set of voxels labeled with that identifier in the spatial database. RAMON is a hierarchical systems-engineering framework for computer vision and brain anatomy that includes concepts such as synapses, seeds, neurons, organelles, etc. We developed RAMON for use within the Open Connectome Project community of collaborators; it is not a standard. Although annotations are often sparse, we store them in dense cuboids. We found that this design is flexible and efficient. When annotations are dense, such as the output of automatic image segmentation [31], storing them in cuboids outperforms sparse lists by orders of magnitude. To improve performance when annotations are sparse, we allocate cuboids lazily: regions with no annotations use no storage and can be ignored during reads. We also gzip-compress cube data. Cube labels compress well because they have low entropy, with both many zero values and long repeated runs of non-zero values (labeled regions). Recent implementations of run-length encoding [1, 44] may be preferable, but we have not evaluated them.
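A toy sketch of lazy cuboid allocation with compressed label volumes; the in-memory dictionary stands in for the MySQL cuboid table, and all names are illustrative.

```python
import zlib
import numpy as np

class AnnotationStore:
    """Cuboids are allocated lazily and kept compressed at rest."""

    def __init__(self, shape=(64, 64, 64)):
        self.shape = shape
        self.cuboids = {}            # Morton key -> compressed bytes

    def read(self, key):
        if key not in self.cuboids:  # unallocated regions cost no storage
            return np.zeros(self.shape, dtype=np.uint32)
        raw = zlib.decompress(self.cuboids[key])
        return np.frombuffer(raw, dtype=np.uint32).reshape(self.shape).copy()

    def write(self, key, labels):
        # Low-entropy label volumes (many zeros, long runs) compress well.
        self.cuboids[key] = zlib.compress(labels.astype(np.uint32).tobytes())
```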

We support multiple annotations per voxel through exceptions. An exceptions list per cuboid tracks multiply-labeled voxels. Exceptions are activated on a per-project basis, which incurs a minor runtime cost to check for exceptions on every read, even if no exceptions are defined. Each annotation write specifies how to deal with conflicting labels: by overwriting or preserving the previous label, or by creating an exception.
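A minimal sketch of the three write disciplines applied per voxel; only the overwrite/preserve/exception semantics come from the text, and the function and the exceptions dictionary are illustrative.

```python
import numpy as np

def apply_labels(existing, new, mode="overwrite", exceptions=None):
    """Merge a new label volume into an existing cuboid.

    'overwrite' replaces prior labels, 'preserve' keeps them, and
    'exception' keeps them but records the conflicting new label in an
    exceptions dict (keyed by flat voxel index) passed by the caller.
    """
    out = existing.copy()
    labeled = new != 0
    conflict = (existing != 0) & labeled
    if mode == "overwrite":
        out[labeled] = new[labeled]
    elif mode == "preserve":
        out[labeled & ~conflict] = new[labeled & ~conflict]
    elif mode == "exception":
        out[labeled & ~conflict] = new[labeled & ~conflict]
        for idx in np.flatnonzero(conflict):
            exceptions.setdefault(int(idx), []).append(int(new.flat[idx]))
    return out
```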

For performance reasons, we initially store annotations at a single level in the resolution hierarchy and propagate them to all levels as a background batch I/O job. The consequence is that annotations are only immediately visible at the resolution at which they are written. The alternative would be to update all levels of the hierarchy for each write. While there are data structures and image formats for this task, such as wavelets [27], the incremental maintenance of a resolution hierarchy always makes updates more complex and expensive. The decision not to make annotations consistent instantaneously reflects how we perform vision and analysis. Multiple algorithms are run to detect different structures at different levels and fuse them together. Then we build the hierarchy of annotations prior to performing spatial analysis. Because write I/O performance limits system throughput, we sacrifice data consistency to optimize this workflow. We may need to revisit this design decision for other applications.

Figure 6: Original (left) and color-corrected (right) images across multiple serial sections [16].

3.3 Tiles
At present, we store a redundant version of the image data in a 2-d tile stack as a storage optimization for visualization in the Web viewer CATMAID [34]. Tiles are 256x256 to 1024x1024 image sections used in a pan-and-zoom interactive viewer. CATMAID dynamically loads and prefetches tiles so that the user can experience continuous visual flow through the image region.

The cutout service provides the capability to extract image planes as an alternative to storing tiles. To do so, it reads 3-d cubes, extracts the requested plane, and discards the vast majority of the data. To dynamically build tiles for CATMAID, we use an HTTP rewrite rule to convert a request for a file URL into an invocation of the cutout Web service.

Our current practice stores tiles for the image plane (the dimension of highest isotropic resolution) and dynamically builds tiles from the cutout service for the orthogonal dimensions, including time. Most visualization is done in the image plane, because anisotropy, exposure differences, and imperfect registration between slices make images more difficult to interpret (Figure 6). We modify CATMAID's storage organization to create locality within each file system directory. By default, CATMAID keeps a directory for each slice, so that a tile at resolution r at coordinates (x, y, z) is named z/y_x_r.png. This places files from multiple resolutions in the same directory, which decreases lookup speed and sequential I/O. Again using HTTP rewrite rules, we restructure the hierarchy into r/z/y_x.png. This halves the number of files per directory and, more importantly, each directory corresponds to a single CATMAID viewing plane.
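A small sketch of the path mapping that the HTTP rewrite rules compute, assuming the default and restructured layouts read z/y_x_r.png and r/z/y_x.png as described above; the function itself is illustrative.

```python
def tile_paths(z, y, x, r):
    """Return (default CATMAID tile path, resolution-first restructured path)."""
    default = f"{z}/{y}_{x}_{r}.png"        # all resolutions share one slice dir
    restructured = f"{r}/{z}/{y}_{x}.png"   # one directory per viewing plane
    return default, restructured

# Example: tile_paths(z=100, y=3, x=7, r=2)
# -> ('100/3_7_2.png', '2/100/3_7.png')
```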

Figure 7: OCP Data Cluster and clients, as configured to run and visualize the parallel computer vision workflow for synapse detection (Section 2).

We believe that we can provide performance similar to tiles in the cutout service through a combination of prefetching and caching. Doing so would eliminate the redundant storage of tiles. Instead of doing a planar cutout for each tile, we could round the request up to the next cuboid and materialize and cache all the nearby tiles, either on the server or in a distributed memory cache. This is future work.

3.4 Data Cleaning
While not a database technology, we mention that we color correct image data to eliminate exposure differences between image planes. The process solves a global Poisson equation to minimize steep gradients in the low frequencies, outputting smoothed data. High frequencies from the original image are added back in to preserve the edges and structures used in computer vision. The process was derived from work by Kazhdan [18]. The resulting color-corrected data are easier to interpret in orthogonal views (Figure 6). We also believe that they will improve the accuracy of computer vision, a hypothesis that we are exploring.

4. SYSTEM DESIGN
We describe the OCP Data Cluster based on the configuration we used for running the synapse detector (Figure 7), because this demonstrates a parallel computer vision application. The figure includes parallel vision pipelines using Web services and a visualization function run from a Web browser, both connecting over the public Internet.

4.1 Architecture
The OCP Data Cluster consists of heterogeneous nodes, each designed for a different function or workload. These include application servers, database nodes, file system nodes, and SSD I/O nodes. The database nodes store all of the image and annotation data for cutout (except for high-I/O annotation projects on SSD I/O nodes). We currently have two Dell R710s with dual Intel Xeon E5630 quad-core processors, 64GB of memory, and Dell H700 RAID controllers. The twelve hot-plug drive bays hold a RAID-6 array of 11 2TB (resp. 3TB) SATA drives and a hot spare, for an 18TB (resp. 27TB) file system. Two drives have failed and rebuilt from the hot spare automatically in the last 18 months. We use MySQL for all database storage.

Application servers perform all network and data manipulation, except for database query processing. This includes partitioning spatial data requests into multiple database queries and assembling, rewriting, and filtering the results. We currently deploy two Web servers in a load-balancing proxy. They run on the same physical hardware as our database nodes, because the processor capabilities of these nodes exceed the compute demand of the databases. These functions are entirely separable.

File server nodes store CATMAID image tiles for visualization, image data streamed from the instruments over the Internet, and other "project" data from our partners that needs to be ingested into OCP formats. The nodes are designed for capacity and sequential read I/O. We have a Dell R710 with less memory (16GB) and compute (one E5620 processor) than the database nodes. The machine has the same I/O subsystem, with a 27TB file system. We use two Data-Scope nodes (described below) as file servers that each have a 12TB software RAID-10 array built on 24 1TB SATA disks.

SSD I/O nodes are designed for the random write workloads generated by parallel computer vision algorithms. The two nodes are Dell R310s with an Intel Xeon 3430 processor with 4 2.4GHz cores. Each machine has a striped (RAID-0) volume on two OCZ Vertex4 solid-state drives. The system realizes 20K IOPS of the theoretical hardware limit of 120K; an improved I/O controller is needed. We deployed the SSD I/O nodes in response to our experience writing 19M synapses in 3 days when running the parallel synapse-finding algorithm. For synapse finding, we had to throttle the write rate to 50 concurrent outstanding requests to avoid overloading the database nodes.

Our growth plans involve moving more services into the Data-Scope [41], a 90-node data-intensive cluster with a 10PB ZFS file system. We are currently using two machines as file system nodes and have an unused allocation of 250TB on the shared file system. Data-Scope nodes have 2 Intel Xeon 5690 hex-core processors, 64 GB of RAM, 3 Vertex3 SSDs, and 24 local SATA disks, and thus serve all of our cluster functions well.

Data Distribution: We place concurrent workloads on distinct nodes in order to avoid I/O interference. This arises in two applications. Computer vision algorithms, e.g. our synapse detector, read large regions from the cutout service and perform many small writes. We map cutouts to a database node and small writes to an SSD node. Visualization reads image data from either the tile stack or the cutout database and overlays annotations. Again, annotation databases are placed on different nodes than image databases. Notably, the cutouts and tile stack are not used together and could be placed on the same node. Because SSD storage is a limited resource, OCP migrates databases from SSD nodes to database nodes when they are no longer actively being written. This is an administrative action, implemented with MySQL's dump and restore utilities, and most often performed when we build the annotation resolution hierarchy (Section 3.2).

We shard large image data across multiple database nodes by partitioning the Morton-order space-filling curve (Figure 4). Currently we do this only for our largest data set (bock11), for capacity reasons. We have not yet found a performance benefit from sharding data. For sharded databases, the vast majority of cutout requests go to a single node. We do note that multiple concurrent users of the same data set would benefit from parallel access to the nodes of a sharded database. Our sharding occurs at the application level: the application is aware of the data distribution and redirects requests to the node that stores the data. We are in the process of evaluating multiple NoSQL systems in which sharding and scale-out are implemented in the storage system rather than the application. We are also evaluating SciDB [40, 43] for use as an array store for spatial data.
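A sketch of application-level shard routing under the partitioning described above: contiguous segments of the Morton-order curve are assigned to nodes, and a cuboid key is routed to the owner of its segment. Node names and segment boundaries are illustrative, not the OCP configuration.

```python
import bisect

class MortonShardMap:
    """Route cuboid Morton keys to database nodes.

    Each node owns a contiguous range of the space-filling curve, so a
    cutout over a power-of-two aligned region usually hits one shard.
    """
    def __init__(self, boundaries, nodes):
        self.boundaries = boundaries      # sorted upper bounds, one per split
        self.nodes = nodes                # len(nodes) == len(boundaries) + 1

    def node_for(self, morton_key):
        return self.nodes[bisect.bisect_right(self.boundaries, morton_key)]

shards = MortonShardMap([1 << 30, 1 << 31], ["db1", "db2", "db3"])
print(shards.node_for(42))                # -> "db1"
```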

4.2 Web Services
Web services provide rich object representations that both capture the output of computer vision algorithms and support the spatial analysis of brain anatomy. We have used the service to perform analysis tasks such as spatial density estimation, clustering, and building distance distributions. Analysis queries typically involve selecting a set of objects based on metadata properties, examining the spatial extent of these objects, and computing statistics on spatial properties such as distances or volumes. For example, one analysis looked at the distribution of the lengths of dendritic spines, skinny connections from dendrites to synapses, in order to understand the region of influence of neural wiring. The OCP queries used metadata to find all synapses of a certain type that connect to the selected dendrite and extracted voxels for the synapses and the object.

All programming interfaces to the OCP Data Cluster use RESTful (REpresentational State Transfer) [8] interfaces that are stateless, uniform, and cacheable. Web-service invocations perform HTTP GET/PUT/DELETE requests to human-readable URLs. REST's simplicity makes it easy to integrate services into many environments; we have built applications in Java, C/C++, Python, Perl, PHP, and Matlab.

We have selected HDF5 as our data interchange format. HDF5 is a scientific data model and format. We prefer it to more common Web data formats, such as JSON or XML, because of its support for multidimensional arrays and large data sets.

Cutout: Providing efficient access to arbitrary sub-volumes of image data guides the design of the OCP Data System. The query, which we call a cutout, specifies a set of dimensions and ranges within each dimension in a URL and GETs the URL from the Web service, which returns an HDF5 file that contains a multidimensional array (Table 1). Every database in OCP supports cutouts. EM image databases return arrays of 8-bit grayscale values, annotation databases return 32-bit annotation identifiers, and we also support 16-bit (TIFF) and 32-bit (RGBA) image formats.
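A client-side sketch of a cutout request following the URL pattern in Table 1; the host and example ranges come from the table, while the name of the dataset inside the returned HDF5 file is an assumption, so the code simply takes the first dataset it finds.

```python
import io
import h5py
import requests

# 512^3 voxel cutout of bock11 at resolution 4, per the Table 1 example.
url = "http://openconnecto.me/bock11/hdf5/4/512,1024/512,1024/512,1024/"
resp = requests.get(url)
resp.raise_for_status()

with h5py.File(io.BytesIO(resp.content), "r") as f:
    name = list(f.keys())[0]      # dataset name inside the file is assumed
    cutout = f[name][...]         # numpy array of 8-bit grayscale values
print(cutout.shape, cutout.dtype)
```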

Projects and Datasets: A dataset configuration describes the dimensions of spatial databases. This includes the number of dimensions (time series and channels), the size of each dimension, and the number of resolutions in the hierarchy. A project defines a specific database for a dataset, including the project type (annotations or images), the storage configuration (nodes and sharding), and properties such as whether the database supports exceptions and whether it is read-only. We often have tens of projects for a single dataset, including original data, cleaned data, and multiple annotation databases that describe different structures or use different annotation algorithms.

Figure 8: A cutout of an annotation database (left) and the dense read of a single annotation (right).

Object Representations: Annotation databases combine all annotations of the same value (identifier) into an object associated with metadata and support spatial queries against individual objects. The service provides several data options that allow the query to specify whether to retrieve data and in what format (Table 1). The default retrieves metadata only from the database tables that implement the RAMON ontology. A query may request a bounding box around the annotation, which queries a spatial index but does not access voxel data. All the data may be retrieved as a list of voxels or as a dense array by specifying cutout. The dense data set has the dimensions of the bounding box. Dense data may also be restricted to a specific region by specifying ranges. Data queries retrieve voxel data from cuboids and then filter out all voxel labels that do not match the requested annotation.

We provide both sparse (voxel lists) and dense (cutout) data interfaces to provide different performance options when retrieving spatial data for an object. The best choice depends upon both the data and the network environment. At the server, it is always faster to compute the dense cutout: the server reads cuboids from disk and filters the data in place in the read buffer. To compute voxel lists, the matching voxel locations are written to another array. However, many neural structures have large spatial extent and are extremely sparse. For these, the voxel representations are much smaller. On WAN and Internet connections, the reduced network transfer time dominates the additional compute time. For example, dendrite 13 in the kasthuri11 set comprises 8 million voxels in a bounding box of more than 1.9 billion voxels, i.e. less than 0.4% of the voxels are in the annotation. Other neural structures, such as synapses, are compact, and dense interfaces always perform better.

To write an annotation, clients make an HTTP PUT request to a project that includes an HDF5 file. All writes use the same base URL, because the HDF5 file specifies the annotation identifier, or gives no identifier, causing the server to choose a unique identifier for a new object. The data options specify the write discipline for voxels in the current annotation that are already labeled in the database: overwrite replaces prior labels, preserve keeps prior labels, and exception keeps the prior label and marks the new label as an exception. Other data options include update, when modifying an existing annotation, and dataonly, to write voxel labels without changing metadata.
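A sketch of an annotation write: an HDF5 file is built in memory and PUT to the project URL with a data option, following Table 1. The internal group and dataset names are assumptions; the paper states only that the file carries the identifier, metadata, and voxel data.

```python
import io
import h5py
import numpy as np
import requests

buf = io.BytesIO()
with h5py.File(buf, "w") as f:
    ann = f.create_group("1000")                    # annotation identifier
    ann.create_dataset("ANNOTATION_TYPE", data=2)   # assumed field name
    voxels = np.array([[100, 200, 30], [101, 200, 30]], dtype=np.uint32)
    ann.create_dataset("VOXELS", data=voxels)       # assumed field name

# 'preserve' keeps any labels already present at conflicting voxels.
resp = requests.put("http://openconnecto.me/annoproj/preserve/",
                    data=buf.getvalue(),
                    headers={"Content-Type": "application/hdf5"})
resp.raise_for_status()
```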

Batch Interfaces: OCP provides interfaces to read or write multiple annotations at once, in order to amortize the fixed costs of making a Web service request over multiple writes and reads. Batching requests is particularly important when creating neural structures with little or no voxel data, in which case the Web service invocation dominates the I/O costs to the database. This was the case with our synapse finder (Section 2), in which we doubled throughput by batching 40 writes at a time. HDF5 supports a directory structure that we use to encode multiple RAMON objects, placing each object in its own directory by annotation identifier.

3-d image cutout: http://openconnecto.me/token/hdf5/resolution/x-range/y-range/z-range/
  512^3 voxels at offset (512, 512, 512), resolution 4: http://openconnecto.me/bock11/hdf5/4/512,1024/512,1024/512,1024/

Read an annotation: http://openconnecto.me/token/identifier/data-options/
  voxel list for identifier 75: http://openconnecto.me/annoproj/75/voxels/
  bounding box: http://openconnecto.me/annoproj/75/boundingbox/
  cutout restricted to a region: http://openconnecto.me/annoproj/75/cutout/2/1000,2000/1000,2000/10,20/

Write an annotation: http://openconnecto.me/token/data-options/

Batch read: http://openconnecto.me/annoproj/1000,1001,1002/

Predicate query (find all synapses): http://openconnecto.me/annoproj/objects/type/synapse/

Table 1: RESTful interfaces to the OCP Cutout and Annotation Web services. Italics indicate arguments and parameters; annoproj is an example annotation project.

Querying Metadata: OCP provides a key/value query interface to object metadata. The queries extract a list of objects that match a predicate against metadata. We currently allow equality queries against integers, enumerations, strings, and user-defined key/value pairs, and range queries against floating point values. Although a limited interface, these queries are simple and expressive for certain tasks. The objects Web service (Table 1) requests one or more metadata fields and values (or metadata field, inequality operator, and value for floating point fields), and the service returns a list of matching annotation identifiers. As an example, we use the URL openconnecto.me/objects/type/synapse/confidence/geq/0.99 to visualize the highest-confidence objects found by the synapse detection pipeline.

Although we do not currently provide full relational access to databases, we will provide SQL access for more complex queries and move power users to SQL. To date, Web services have been sufficient for computer vision pipelines. But we see no reason to re-implement an SQL-equivalent parser and language through Web services.

Spatial Queries and Indexing Objects: Critical to spatial analysis of annotation databases are two queries: (1) what objects are in a region, and (2) what voxels comprise an object. On these primitives, OCP applications build more complex queries, such as nearest neighbors, clustering, and spatial distributions. OCP's database design is well suited to identifying the annotations in a region: the service performs a cutout of the region and extracts all unique elements in that region. In fact, the Numpy library in Python provides a built-in function to identify unique elements that is implemented in C. Locating the spatial extent of an object requires an additional index.

OCP uses a simple sparse indexing technique that supports batch I/O to locate the spatial extent of an object: its voxels and bounding box. The index comprises a database table that enumerates the list of cuboids that contain voxels for each annotation identifier (Figure 9). The list itself is a BLOB that contains a Python array. Although this index is not particularly compact, it has several desirable properties.

Adding to the index is efficient: it is a batch operation and uses append-only I/O. When writing an annotation in a cuboid, we determine that the annotation is new to the cuboid based on the current contents and add the cuboid's Morton-order location to a temporary list. After all cuboids have been updated, a single write transaction appends all new cuboids to the list as a batch. The annotation process, be it manual or machine vision, tends to create new objects and only rarely deletes or prunes existing objects. The workload suits an append-mostly physical design.

Figure 9: The sparse index for an object (green) is a list of the Morton-order locations of the cuboids that contain voxels for that annotation. The index describes the disk blocks that contain voxels for that object, which can be read in a single pass.

Retrieving an object allows for batch I/O and retrieves all data in a single sequential pass. The query retrieves the list of cuboids and sorts them by Morton-order location. Then all cuboids are requested as a single query. Cuboids are laid out in increasing Morton order on disk and thus are retrieved in a single sequential pass over the data.
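A minimal sketch of the per-object sparse index: batched, append-only updates of the cuboid list, and retrieval sorted into Morton order so cuboids can be read in one sequential pass. The in-memory dict stands in for the MySQL table whose BLOB column holds the list.

```python
class ObjectIndex:
    """Per-annotation list of Morton locations of cuboids containing it."""

    def __init__(self):
        self.index = {}                    # annotation id -> set of keys

    def append_batch(self, ann_id, new_keys):
        # One batched, append-only update per annotation write request.
        self.index.setdefault(ann_id, set()).update(new_keys)

    def cuboids_for(self, ann_id):
        # Sorted Morton keys match the increasing on-disk layout, so the
        # cuboids can be fetched in a single sequential pass.
        return sorted(self.index.get(ann_id, ()))
```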

We chose this design over existing spatial indexing techniques because it is particularly well suited to neuroscience objects, which are sparse and have large spatial extent. For OCP data, indexes that rely on bounding boxes either grow very large (R-Trees [11]) or have inefficient search (R+-Trees [37]). Informally, neural objects are long and skinny, and there are many in each region, so that bounding boxes intersect and overlap pathologically. An alternative is to use directed local search techniques that identify objects by a centroid and locate the object by searching nearby data [29, 42]. For OCP data, these techniques produce much more compact indexes that are faster to maintain. However, querying the index requires many I/Os to conduct an iterative search, and we prefer lookup performance to maintenance costs. We plan to quantify and evaluate these informal comparisons in the future.

Software: Application servers run a Django/Python stack, integrated with Apache2 using WSGI, that processes Web service requests. This includes performing database queries, assembling and filtering spatial data, and formatting data and metadata into and out of HDF5 files. We use parallel Cython to accelerate the most compute-intensive tasks.

Figure 10: The performance of the cutout Web service that extracts three-dimensional subvolumes from the kasthuri11 image database: (a) maximum, (b) by cutout size, (c) by cutout size (log-log scale).

Cython compiles Python code into the C language so that it executes without the overhead of interpretation. Parallel Cython activates OpenMP multicore parallelism within Cython routines. The compute-intensive routines that we accelerate with Cython operate against every voxel in a cutout. Examples include (1) false coloring annotations for image overlays, in which every 32-bit annotation is mapped to an RGBA color in an output image buffer, and (2) filtering out annotations that do not match a specified criterion, e.g. to find all synapses in a region, one queries the metadata to get a list of synapse ids and then filters all non-matching identifiers out of the cutout.
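A sketch of the filtering step as it could be written with vectorized Numpy (the production code uses parallel Cython): zero out every voxel whose label is not in the list of identifiers returned by the metadata query.

```python
import numpy as np

def filter_annotations(cutout, keep_ids):
    """Zero out every voxel whose label is not in keep_ids.

    cutout is a 32-bit label volume from an annotation database;
    keep_ids is e.g. the list of synapse identifiers returned by the
    objects (metadata) service.
    """
    mask = np.isin(cutout, keep_ids)      # per-voxel membership test in C
    return np.where(mask, cutout, 0).astype(cutout.dtype)
```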

5. PRELIMINARY PERFORMANCE
We conducted experiments against the live Web services to provide an initial view of performance. The major conclusion is that memory and I/O performance both limit throughput. The process of array slicing and assembly for cutout requests keeps all processors fully utilized reorganizing data in memory. We had not witnessed the memory bottleneck prior to this study, because OCP serves data over a 1 Gbps Ethernet switch and a 1 Gbps Internet uplink, which we saturate long before other cluster resources. As we migrate to the Data-Scope cluster (Section 4.1), we will upgrade to a 40 Gbps Internet2 uplink and more compute-capable nodes, which will make memory an even more critical resource.

Experiments were conducted on an OCP Database node, a Dell R710 with two Intel Xeon E5630 quad-core processors, 64GB of memory, and a Dell H700 RAID controller managing a RAID-6 array of 11 3TB SATA drives for a 27TB file system. All Web service requests are initiated on the same node and use the localhost interface, allowing us to test the server beyond the 1 Gbps network limit. Experiments show the cutout size, not the size of the data transferred or read from disk. Each cuboid is compressed on disk, read, and uncompressed. The data are then packaged into a cutout, which is compressed prior to data transfer. The cutout size is the amount of data that the server must handle in memory. The EM mouse brain data in question has high entropy and tends to compress by less than 10%. The annotation data are highly compressible.

Cutout throughput is the principal performance measure for our system, dictating how much data the service provides to parallel computer vision workflows. Figure 10(a) shows the maximum throughput achieved over all configurations when issuing 16 parallel cutout requests. When data are in memory and cutout requests are aligned to cuboid boundaries (aligned, in memory), the system performs no I/O and minimally rearranges data. Processing in the application stack bounds performance. In this case, a single node realizes a peak throughput of more than 173 MB/s for the largest transfers. Cutouts to random offsets aligned to cuboid boundaries add I/O costs to each request and bring performance down to a peak of 121 MB/s. Unaligned cutouts require data to be reorganized in memory, moving byte ranges that are unaligned with the cache hierarchy. This incurs further performance penalties, and peak throughput drops to only 61 MB/s. The I/O cost of these requests is only marginally more than aligned cutouts, rounding each dimension up to the next cuboid. Unaligned cutouts reveal the dominance of memory performance.

Figure 11: Throughput of 256MB cutout requests to kasthuri11 as a function of the number of concurrent requests.

Scalability results reveal fixed costs in both Web-service invocation and I/O that become amortized for larger cutouts. Figures 10(b) and 10(c) show the throughput as a function of cutout size in normal and log-log scale. The experiments use 16 parallel requests each. Performance scales nearly linearly until 256K for reads (I/O cost) and almost 1 MB for in-cache requests (Web-service costs). Beyond this point, performance continues to increase, albeit more slowly. For aligned and unaligned reads, we attribute the continued increase to the Morton-order space-filling curve: larger cutouts intersect larger aligned regions of the Morton-order curve, producing larger contiguous I/Os [23]. We do not report performance above a 256MB cutout size; beyond this point, the buffers needed by the application and Web server exceed memory capacity.

Figure 12: Throughput of writing annotations as a function of the size of the annotated region.

To realize peak throughput, we had to initiate multiple requests in parallel. The application stack runs each Web-service request on a single process thread, and memory limits the per-thread performance. Figure 11 shows throughput as a function of the number of parallel requests. Throughput scales with the number of parallel requests, beyond the eight physical cores of the machine, to 16 when reading data from disk and to 32 when reading from memory. Performance scales beyond the 8 cores owing to some combination of overlapping I/O with computation and hyperthreading. Too much parallelism eventually leads to a reduction in throughput, because all hardware resources are fully utilized and more parallelism introduces more resource contention.

Parallelism is key to performance in the OCP Data Cluster, and the easiest way to realize it is through initiating concurrent Web-service requests. Parallelizing individual cutouts would benefit the performance of an individual request, but not overall system throughput, because multiple requests already consume the memory resources of all processors. We note that the Numpy Python library we use for array manipulation implements parallel operators using BLAS, but does not parallelize array slicing.

Figure 12 shows the write throughput as a function of the volume size when uploading annotated volumes. Annotations differ from cutouts in that the data are 32 bits per voxel and are highly compressible. For this experiment, we uploaded a region of dense manual annotations of the kasthuri11 data set in which more than 90% of voxels are labeled. Data compress to 6% of the original size. However, performance scales with the uncompressed size of the region, because that dictates how many cuboids must be manipulated in memory. This experiment uses 16 parallel requests. Write performance scales well up to 2MB cutouts and is faster than read at the same cutout size, because the data compress much better. However, beyond 2MB transfers, write throughput collapses. The best performance over all configurations, 19 MB/s, does not compare well with the 121 MB/s when reading image cutouts. Updating a volume of annotations is much more complex than a cutout. It (1) reads the previous annotations, (2) applies the new annotations to the volume database, resolving conflicts on a per-voxel basis, (3) writes back the volume database, (4) reads index entries for all new annotations, (5) updates each list by unioning new and old cuboid locations, and (6) writes back the index. I/O is doubled, and index maintenance costs are added on top. The critical performance issue is updating the spatial index. Parallel writes to the spatial index result in transaction retries and timeouts in MySQL due to contention. Often a single annotation volume will result in the update of hundreds of index entries, one for each unique annotation identifier in the volume. We can scale to larger writes at the expense of using fewer parallel writers. Improving annotation throughput is a high priority for the OCP Data Cluster.

Figure 13: Performance comparison of Database nodes and SSD nodes when writing synapses (small random writes).

We also include a comparison of small-write performance between an SSD node and a Database node. The SSD node is a Dell R310 with an Intel Xeon 3430 with 4 2.4GHz cores, 16 GB of memory, and two OCZ Vertex 4 drives configured in a RAID-0 array. The experiment uploads all of the synapse annotations in the kasthuri11 data in random order, committing after each write. Figure 13 shows that the SSD node achieves more than 150% of the throughput of the Database array on small random writes. The number of synapse writes per second is surprisingly low: we write only about 6 RAMON objects per second. However, a single RAMON object write generates updates to three different metadata tables, the index, and the volume database. The takeaway from this experiment is that an inexpensive SSD node (<$3,000) offloads the write workload of an entire Database node (>$18,000). All I/O in this experiment is random. In practice, we achieve much higher write throughput because of locality and batching of requests. Our synapse-finder workload uploaded more than 73 synapses a second per node, but that number reflects caching effects when updating many synapses in a small region.

6. RELATED WORK
The performance issues faced by the spatial databases in the OCP Data Cluster parallel those of SciDB [40, 43], including array slicing, selection queries, joins, and clustering/scale-out. SciDB might benefit OCP by making data distribution transparent and by offloading application functions implemented in Python to SciDB.

Many of the physical design principles of OCP follow the ArrayStore of Soroush et al. [39]. They give a model for multi-level partitioning (chunking and tiling) that is more general than OCP cuboids and can handle both regular and irregular data. ArrayStore extends decades of work on regular [6, 24] and irregular [5] tiling. The RasDaMan multidimensional array database [2] also builds spatial data services for image processing on top of a relational database; they too have explored the issues of tiling, data organization, and data distribution [9]. In contrast to general-purpose array databases, OCP focuses on the specific application of parallel computer vision for neuroscience and, as such, tunes its design to application concepts and properties. Examples include using the partitioning of space-filling curves in the data distribution function and the indexing of neural objects.

OCP's design incorporates many techniques developed in the spatial data management community. Samet [35] wrote the authoritative text on the subject, which we have used as a reference for region quad-trees, space-filling curves, tessellations, and much more.

OCP represents the current state of evolution of the scale-out database architectures developed by the Institute for Data-Intensive Engineering and Science at Johns Hopkins. Gray and Szalay developed the progenitor system in the Sloan Digital Sky Survey [36]. This has led to more than 30 different data products over the past decade, including the JHU Turbulence Database Cluster [20] and the Life Under Your Feet soil ecology portal [25].

7. FINAL THOUGHTS
In less than two years, the Open Connectome Project has evolved from a problem statement to a data management and analysis platform for a community of neuroscientists with the singular goal of mapping the brain. High-throughput imaging instruments have driven data management off the workstation and out of the lab, into data-intensive clusters and Web services. Also, the scope of the problem has forced experimental biologists to engage statisticians, systems engineers, machine learners, and big-data scientists. The Open Connectome Project aims to be the forum in which they all meet.

The OCP Data Cluster has built a rich set of features to meet the analysis needs of its users. Application-specific capabilities have been the development focus to date. The platform manages data sets from many imaging modalities (five to fifteen, depending on how one counts them).

Presently, our focus must change to throughput and scalability. The near future holds two 100-teravoxel data sets, each larger than the sum of all our data. We will move to a new home in the Data-Scope cluster, which will increase our outward-facing Internet bandwidth by a factor of forty. We also expect the connectomics community to expand rapidly: the US President's Office recently announced a decade-long initiative to build a comprehensive map of the human brain [21]. This study reveals many limitations of the OCP data cluster and provides guidance to focus our efforts, specifically toward servicing many small writes to record brain structures and toward efficient, parallel manipulation of data to alleviate memory bottlenecks.

Acknowledgments
The authors would like to thank additional members of the Open Connectome Project that contributed to this work, including Disa Mhembere and Ayushi Sinha. We would also like to thank our collaborators Forrest Collman, Alberto Cardona, Cameron Craddock, Michael Milham, Partha Mitra, and Sebastian Seung.

8. ADDITIONAL AUTHORS
∗ Department of Computer Science and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
† Johns Hopkins University Applied Physics Laboratory
‡ Department of Statistical Science and Mathematics and the Institute for Brain Science, Duke University
§ Janelia Farm Research Campus, Howard Hughes Medical Institute
¶ Department of Molecular and Cellular Biology, Harvard University
∥ Department of Computational Neuroscience, Massachusetts Institute of Technology
∗∗ Allen Institute for Brain Science
†† Department of Physics and Astronomy and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
‡‡ Department of Bioengineering, Stanford University
§§ Department of Molecular and Cellular Physiology, Stanford University

9. REFERENCES
[1] D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006.
[2] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and B. Widmann. The multidimensional database system RasDaMan. In SIGMOD, 1998.
[3] D. D. Bock, W.-C. A. Lee, A. M. Kerlin, M. L. Andermann, G. Hood, A. W. Wetzel, S. Yurgenson, E. R. Soucy, H. S. Kim, and R. C. Reid. Network anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337), 2011.
[4] A. Cardona, S. Saalfeld, J. Schindelin, I. Arganda-Carreras, S. Preibisch, M. Longair, P. Tomancak, V. Hartenstein, and R. J. Douglas. TrakEM2 software for neural circuit reconstruction. PLoS ONE, 7(6), 2012.
[5] C. Chang, A. Acharya, A. Sussman, and J. Saltz. T2: a customizable parallel database for multi-dimensional data. SIGMOD Record, 27(1), 1998.
[6] C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz. Titan: A high-performance remote sensing database. In ICDE, 1997.
[7] F. Crow. Summed-area tables for texture mapping. In SIGGRAPH, 1984.
[8] R. T. Fielding and R. N. Taylor. Principled design of the modern Web architecture. ACM Transactions on Internet Technology, 2(2), 2002.
[9] P. Furtado and P. Baumann. Storage of multidimensional arrays based on arbitrary tiling. In ICDE, 1999.
[10] J. Gray and A. Szalay. Science in an exponential world. Nature, 440(23), 23 March 2006.
[11] A. Guttman. R-trees: a dynamic index structure for spatial searching. In ACM SIGMOD Conference, 1984.
[12] H. S. Seung et al. Eyewire. Available at eyewire.org, 2012.
[13] V. Jain, H. S. Seung, and S. C. Turaga. Machines that learn to segment images: a crucial technology for connectomics. Current Opinion in Neurobiology, 20(5), 2010.
[14] V. Jain, S. Turaga, K. Briggman, W. Denk, and S. Seung. Learning to agglomerate superpixel hierarchies. In Neural Information Processing Systems, 2011.
[15] K. Kanov, R. Burns, G. Eyink, C. Meneveau, and A. Szalay. Data-intensive spatial filtering in large numerical simulation datasets. In Supercomputing, 2012.
[16] N. Kasthuri and J. Lichtman. Untitled. In preparation, 2013.
[17] V. Kaynig, T. Fuchs, and J. M. Buhmann. Geometrical consistent 3D tracing of neuronal processes in ssTEM data. In MICCAI, 2010.
[18] M. Kazhdan and H. Hoppe. Streaming multigrid for gradient-domain operations on large images. ACM Transactions on Graphics, 27, 2008.
[19] D. M. Kleissas, W. R. Gray, J. M. Burck, J. T. Vogelstein, E. Perlman, P. M. Burlina, R. Burns, and R. J. Vogelstein. CAJAL3D: toward a fully automatic pipeline for connectome estimation from high-resolution EM data. In Neuroinformatics, 2012.
[20] Y. Li, E. Perlman, M. Wang, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. Journal of Turbulence, 9(31):1–29, 2008.
[21] J. Markoff. Obama seeking to boost study of human brain. New York Times, February 17, 2013.
[22] K. D. Micheva, B. Busse, N. C. Weiler, N. O'Rourke, and S. J. Smith. Single-synapse analysis of a diverse synapse population: Proteomic imaging methods and markers. Neuron, 68(1), 2010.
[23] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1), 2001.
[24] B. Moon and J. Saltz. Scalability analysis of declustering methods for multidimensional range queries. Transactions on Knowledge and Data Engineering, 2(10), 1998.
[25] R. A. Musaloiu-E., A. Terzis, K. Szlavecz, A. Szalay, J. Cogan, and J. Gray. Life under your feet: Wireless sensors in soil ecology. In Embedded Networked Sensors, 2006.
[26] M. Nielsen. Reinventing Discovery. Princeton University Press, 2011.
[27] A. Norton and J. Clyne. The VAPOR visualization application. In High Performance Visualization, 2012.
[28] N. O'Rourke, N. C. Weiler, K. D. Micheva, and S. J. Smith. Deep molecular diversity of mammalian synapses: why it matters and how to measure it. Nature Reviews Neuroscience, 13(1), 2012.
[29] S. Papadomanolakis, A. Ailamaki, J. C. Lopez, T. Tu, D. R. O'Hallaron, and G. Heber. Efficient query processing on unstructured tetrahedral meshes. In ACM SIGMOD, 2006.
[30] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
[31] A. V. Reina, M. Gelbart, D. Huang, J. Lichtman, E. L. Miller, and H. Pfister. Segmentation fusion for connectomics. In International Conference on Computer Vision, 2011.
[32] A. V. Reina, W.-K. Jeong, J. Lichtman, and H. Pfister. The connectome project: discovering the wiring of the brain. ACM Crossroads, 18(1):8–13, 2011.
[33] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI pipeline processing environment. Neuroimage, 19(3), 2003.
[34] S. Saalfeld, A. Cardona, V. Hartenstein, and P. Tomancak. CATMAID: collaborative annotation toolkit for massive amounts of image data. Bioinformatics, 25(15), 2009.
[35] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco, 2006.
[36] The Sloan Digital Sky Survey, 2013. Available at http://www.sdss.org.
[37] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In VLDB, 1987.
[38] C. Sommer, C. Straehle, U. Koethe, and F. A. Hamprecht. Ilastik: Interactive learning and segmentation toolkit. In Biomedical Imaging, 2011.
[39] E. Soroush, M. Balazinska, and D. Wang. ArrayStore: A storage manager for complex parallel array processing. In SIGMOD, 2011.
[40] M. Stonebraker, J. Becla, D. DeWitt, K.-T. Lim, D. Maier, O. Ratzesberger, and S. Zdonik. Requirements for science data bases and SciDB. In Conference on Innovative Data Systems Research, 2009.
[41] A. S. Szalay, K. Church, C. Meneveau, A. Terzis, and S. Zeger. MRI: The Development of Data-Scope, a multi-petabyte generic data analysis environment for science. Available at https://wiki.pha.jhu.edu/escience_wiki/images/7/7f/DataScope.pdf, 2012.
[42] F. Tauheed, L. Biveinis, T. Heinis, F. Schurmann, H. Markram, and A. Ailamaki. Accelerating range queries for brain simulations. In ICDE, 2012.
[43] The SciDB Development Team. Overview of SciDB: large scale array storage, processing and analysis. In SIGMOD, 2010.
[44] K. Wu. FastBit: Interactively searching massive data. Journal of Physics: Conference Series, 180(1), 2009.

data ArrayStore extends decades of work on regular [6 24]and irregular [5] tiling The RasDaMan multidimensionalarray database [2] also builds spatial data services for im-age processing on top of a relational database They toohave explored the issues of tiling data organization anddata distribution [9] In contrast to general-purpose arraydatabases OCP has a focus on the specific application ofparallel computer vision for neuroscience and as such tunesits design to application concepts and properties Examplesinclude using the partitioning of space-filling curves in thedata distribution function and indexing of neural objects

OCPrsquos design incorporates many techniques developed inthe spatial data management community Samet [35] wrotethe authoritative text on the subject which we have used asa reference for region quad-trees space-filling curves tessel-lations and much more

OCP represents the a current state of evolution of the scale-out database architectures developed by the Institute forData-Intensive Engineering and Science at Johns HopkinsGray and Szalay developed the progenitor system in theSloan Digital Sky Survey [36] This has lead to more than 30different data products over the past decade including theJHU Turbulence Database Cluster [20] and the Life UnderYour Feet soil ecology portal [25]

7 FINAL THOUGHTSIn less than two years The Open Connectome Projecthas evolved from a problem statement to a data manage-ment and analysis platform for a community of neurosci-entists with the singular goal of mapping the brain High-throughput imaging instruments have driven data manage-ment off the workstation and out of the lab into data-intensive clusters and Web-services Also the scope of theproblem has forced experimental biologists to engage statis-ticians systems engineers machine learners and big-datascientists The Open Connectome Project aims to be theforum in which they all meet

The OCP Data Cluster has built a rich set of features tomeet the analysis needs of its users Application-specificcapabilities have been the development focus to date Theplatform manages data sets from many imaging modalities(Five to fifteen depending on how one counts them)

Presently our focus must change to throughput and scala-bility The near future holds two 100 teravoxel data setsmdasheach larger than the sum of all our data We will move to anew home in the Data-Scope cluster which will increase ouroutward-facing Internet bandwidth by a factor of forty Wealso expect the Connectomics community to expand rapidlyThe US Presidentrsquos Office recently announced a decade-longinitiative to build a comprehensive map of the human brain[21] This study reveals many limitations of the OCP datacluster and provides guidance to focus our efforts specif-ically toward servicing many small writes to record brainstructures and efficient and parallel manipulation of data toalleviate memory bottlenecks

AcknowledgmentsThe authors would like to thank additional members of theOpen Connectome Project that contributed to this workincluding Disa Mhembere and Ayushi Sinha We wouldalso like to thank our collaborators Forrest Collman AlbertoCardona Cameron Craddock Michael Milham Partha Mi-tra and Sebastian Seung

8 ADDITIONAL AUTHORSlowast Department of Computer Science and the Institute forData Intensive Engineering and Science Johns Hopkins Uni-versity

dagger Johns Hopkins University Applied Physics LaboratoryDagger Department of Statistical Science and Mathematics andthe Institute for Brain Science Duke Universitysect Janelia Farm Research Campus Howard Hughes MedicalInstitutepara Department of Molecular and Cellular Biology HarvardUniversity Department of Computational Neuroscience MassachusettsInstitute of Technologylowastlowast Allen Institute for Brain Sciencedaggerdagger Department of Physics and Astronomy and the Institutefor Data Intensive Engineering and Science Johns HopkinsUniversityDaggerDagger Department of Bioengineering Stanford Universitysectsect Department of Molecular and Cellular Physiology Stan-ford University

9 REFERENCES[1] D J Abadi S R Madden and M Ferreira

Integrating compression and execution incolumn-oriented database systems In SIGMOD 2006

[2] P Baumann A Dehmel P Furtado R Ritsch andB Widmann The multidimensional database systemRasDaMan In SIGMOD 1998

[3] D D Bock W-C A Lee A M Kerlin M LAndermann G Hood A W Wetzel S YurgensonE R Soucy H S Kim and R C Reid Networkanatomy and in vivo physiology of visual corticalneurons Nature 471(7337) 2011

[4] A Cardona S Saalfeld J SchindelinI Arganda-Carreras S Preibisch M LongairP Tomancak V Hartenstein and R J DouglasTrakEM2 software for neural circuit reconstructionPLoS ONE 7(6) 2012


Figure 3: Visualization of six channels of array tomography data, courtesy of Nick Weiler and Stephen Smith [28, 22]. Data were drawn from a 17-channel database and rendered by the OCP cutout service.

database filter out all other annotations. We also use the densely annotated regions as a "ground truth" for evaluating machine vision reconstruction algorithms.

3. DATA MODEL
The basic storage structure in OCP is a dense multi-dimensional spatial array partitioned into cuboids (rectangular subregions) in all dimensions. Cuboids in OCP are similar in design and goal to chunks in ArrayStore [39]. Each cuboid is assigned an index using a Morton-order space-filling curve (Figure 4). Space-filling curves organize data recursively so that any power-of-two aligned subregion is wholly contiguous in the index [30]. Space-filling curves minimize the number of discontiguous regions needed to retrieve a convex shape in a spatial database [23]. While the Hilbert curve has the best properties in this regard, we choose the Morton-order curve for two reasons. It is simple to evaluate using bit interleaving of offsets in each dimension, unlike other curves that are defined recursively. Also, cuboid addresses are strictly non-decreasing in each dimension, so the index works on subspaces. Non-decreasing offsets also aid in interpolation, filtering, and other image processing operations [15, 7].
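To make the bit interleaving concrete, the following is a minimal sketch (illustrative, not our production code) of a 3-d Morton index; the function name and the 21-bit-per-dimension limit are assumptions.

    def morton3(x, y, z, bits=21):
        """Interleave the bits of (x, y, z) to produce a Morton (z-order) index.

        Bit i of each coordinate lands at position 3*i (+0, +1, +2), so any
        power-of-two aligned subcube maps to a contiguous index range.
        """
        key = 0
        for i in range(bits):
            key |= ((x >> i) & 1) << (3 * i)
            key |= ((y >> i) & 1) << (3 * i + 1)
            key |= ((z >> i) & 1) << (3 * i + 2)
        return key

    # Cuboid coordinates, not voxel coordinates, are interleaved: a voxel at
    # (x, y, z) in a database with 128x128x16 cuboids lives in cuboid
    # (x // 128, y // 128, z // 16), and morton3 of that triple names its key.
    print(morton3(3, 1, 0))  # -> 11: bits of x=3 (011) and y=1 (001) interleaved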

Image data contain up to 5 dimensions and often exhibit anisotropy. For example, serial section electron microscopy data come from imaging sections created by slicing or ablating a sample. The resolution of the image plane (XY) is determined by the instrument and that of the sections (Z) by the sectioning technique. The difference in resolution is often a factor of 10. The fourth and fifth dimensions arise in other imaging modalities. Some techniques produce time series, such as functional magnetic resonance and light microscopy of calcium channels. Others image multiple channels that correspond to different proteins or receptors (Figure 3). These data are studied by correlating multiple channels, e.g., spatial co-occurrence or exclusion, to reveal biology.

3.1 Physical Design
We store a multi-resolution hierarchy for each image data set so that analyses and visualization can choose the appropriate scale. For EM data, each lower resolution reduces the data size by a factor of four, halving the scale in X and Y. Typically we do not scale Z because it is poorly resolved. We also do not scale time or channels. The bock11 data has nine levels and kasthuri11 six. As an example of choosing scale, our bock11 synapse detector runs on high-resolution

Figure 4: Partitioning the Morton (z-order) space-filling curve. For clarity, the figure shows 16 cuboids in 2 dimensions mapping to four nodes. The z-order curve is recursively defined and scales in dimensions and data size.

Figure 5: The resolution hierarchy scales the XY dimensions of cuboids, but not Z, so that cuboids contain roughly equal lengths in all dimensions.

data because synapses have limited spatial extent (tens of voxels in any dimension). We detect synapses at resolution one, four times smaller and four times faster than the raw image data. We found that the algorithm was no less accurate at this scale. We analyze large structures that cannot contain synapses, such as blood vessels and cell bodies, to mask out false positives. We developed this software for this analysis, but many of the techniques follow those described in ilastik [38]. We conduct the analysis at resolution 5, in which each voxel represents a (32x32x1) voxel region in the raw data. The structures are large and detectable at low resolution, and the computation requires all data to be in memory to run efficiently.

Resolution scaling and workload dictate the dimensions and shape of cuboids. For EM data, computer vision algorithms operate on data regions that are roughly cubic in the original sample so that they can detect brain anatomy in all orientations. Because the X and Y dimensions are scaled but not Z, this results in different sized voxel requests at different resolutions. For this reason, we use different shaped cuboids at different levels in the hierarchy (Figure 5). For example, at the highest three resolutions in bock11, cuboids are flat (128x128x16) because each voxel represents 10 times as much length in Z as in X and Y. Beyond level 4, we shift

to a cube of (64x64x64). We include time series in the spatial index using a 4-d space-filling curve. Time is often as large as other dimensions, with 1000s of time points in MR data. This supports queries that analyze the time history of a smaller region. We do not include channel data in the index (there are different cuboids for different channels) because the number of channels tends to be few (up to 17) and most analyses look at combinations of a few channels.
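Returning to the cuboid shapes above, a sketch of how a shape might be selected per resolution level; the function name and the exact cutoff level are an illustrative reading of the bock11 example, not the configuration code itself.

    def cuboid_shape(resolution):
        """Pick a cuboid shape (x, y, z voxels) for a resolution level.

        At high resolutions, voxels are roughly 10x longer in Z than in XY, so
        flat 128x128x16 cuboids cover comparable physical lengths per axis.
        Once XY has been downsampled enough (around level 4 in bock11), voxels
        are closer to isotropic and a 64x64x64 cube is used instead.
        """
        if resolution < 4:            # illustrative cutoff
            return (128, 128, 16)
        return (64, 64, 64)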

Cuboids contain only 2^18 = 256KB of data, which is a compromise among the different uses of the data. This size may be too small to optimize I/O throughput. Space-filling curves mitigate this by ensuring that larger aligned regions are stored sequentially and can be read in a single streaming I/O. An important use of our service extracts lower-dimensional projections for visualization of EM data and sub-space clustering of 4-d and 5-d data. Keeping the cuboid size small reduces the total amount of data to be read (and discarded) for these queries.

3.2 Annotations
An annotation project contains a description of the spatial extent and metadata for objects detected in an image database. Each project has a spatial database registered to an image data set. Projects may be shared among teams of human annotators that want a coherent view of what structures have been labeled. Projects are also plentiful: each parameterization of a computer vision algorithm may have a project so that outputs can be compared. Annotations also have a resolution hierarchy so that large structures, such as cell bodies, may be found and marked at low resolution, and detail, such as synapses and dendritic spines, at high resolution.

An "annotation" consists of an object identifier that is linked to object metadata in the RAMON (Reusable Annotation Markup for Open coNectomes) neuroscience ontology [19] and a set of voxels labeled with that identifier in the spatial database. RAMON is a hierarchical systems engineering framework for computer vision and brain anatomy that includes concepts such as synapses, seeds, neurons, organelles, etc. We developed RAMON for use within the Open Connectome Project community of collaborators; it is not a standard. Although annotations are often sparse, we store them in dense cuboids. We found that this design is flexible and efficient. When annotations are dense, such as the output of automatic image segmentation [31], storing them in cuboids outperforms sparse lists by orders of magnitude. To improve performance when annotations are sparse, we allocate cuboids lazily: regions with no annotations use no storage and can be ignored during reads. We also gzip compress cube data. Cube labels compress well because they have low entropy, with both many zero values and long repeated runs of non-zero values (labeled regions). Recent implementations of run-length encoding [1, 44] may be preferable, but we have not evaluated them.
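A minimal sketch of the dense-cuboid storage idea: label cuboids are materialized only when an annotation touches them and are compressed before being stored. The helper names, the cuboid shape constant, and the use of zlib (a DEFLATE implementation, standing in for gzip compression) are illustrative assumptions, not OCP's schema.

    import zlib
    import numpy as np

    CUBOID_SHAPE = (16, 128, 128)          # illustrative (z, y, x) cuboid shape

    def pack_cuboid(labels):
        """Compress a dense uint32 label cuboid; all-zero cuboids are not stored."""
        if not labels.any():
            return None                     # lazy allocation: empty regions cost nothing
        return zlib.compress(labels.tobytes())

    def unpack_cuboid(blob):
        """Inverse of pack_cuboid; a missing blob decodes to an all-zero cuboid."""
        if blob is None:
            return np.zeros(CUBOID_SHAPE, dtype=np.uint32)
        raw = zlib.decompress(blob)
        return np.frombuffer(raw, dtype=np.uint32).reshape(CUBOID_SHAPE)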

We support multiple annotations per voxel through exceptions. An exceptions list per cuboid tracks multiply-labeled voxels. Exceptions are activated on a per-project basis, which incurs a minor runtime cost to check for exceptions on every read, even if no exceptions are defined. Each annotation write specifies how to deal with conflicting labels: by overwriting or preserving the previous label, or by creating an exception.
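The per-voxel conflict rules can be summarized as in the sketch below; the function name and the exceptions dictionary are our own illustrative stand-ins for the semantics just described, not the service's implementation.

    import numpy as np

    def apply_labels(cuboid, new_labels, mode, exceptions):
        """Write new_labels into a label cuboid, resolving per-voxel conflicts.

        mode is one of 'overwrite', 'preserve', or 'exception'; exceptions maps
        a voxel index to the extra labels recorded for it.
        """
        write = new_labels != 0
        conflict = write & (cuboid != 0) & (cuboid != new_labels)
        if mode == "overwrite":
            cuboid[write] = new_labels[write]
        elif mode == "preserve":
            cuboid[write & ~conflict] = new_labels[write & ~conflict]
        elif mode == "exception":
            # Keep the prior label and record the new one as an exception.
            cuboid[write & ~conflict] = new_labels[write & ~conflict]
            for idx in zip(*np.nonzero(conflict)):
                exceptions.setdefault(idx, []).append(int(new_labels[idx]))
        return cuboid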

For performance reasons, we initially store annotations at a single level in the resolution hierarchy and propagate them to all levels as a background batch I/O job. The consequence is that annotations are only immediately visible at the resolution at which they are written. The alternative would be to update all levels of the hierarchy for each write. While there are data structures and image formats for this

Figure 6: Original (left) and color corrected (right) images across multiple serial sections [16].

task, such as wavelets [27], the incremental maintenance of a resolution hierarchy always makes updates more complex and expensive. The decision not to make annotations consistent instantaneously reflects how we perform vision and analysis. Multiple algorithms are run to detect different structures at different levels and fuse them together. Then we build the hierarchy of annotations prior to performing spatial analysis. Because write I/O performance limits system throughput, we sacrifice data consistency to optimize this workflow. We may need to revisit this design decision for other applications.

3.3 Tiles
At present, we store a redundant version of the image data in a 2-d tile stack as a storage optimization for visualization in the Web viewer CATMAID [34]. Tiles are 256x256 to 1024x1024 image sections used in a pan-and-zoom interactive viewer. CATMAID dynamically loads and prefetches tiles so that the user can experience continuous visual flow through the image region.

The cutout service provides the capability to extract image planes as an alternative to storing tiles. To do so, it reads 3-d cubes, extracts the requested plane, and discards the vast majority of the data. To dynamically build tiles for CATMAID, we use an HTTP rewrite rule to convert a request for a file URL into an invocation of the cutout Web service.

Our current practice stores tiles for the image plane (the dimension of highest isotropic resolution) and dynamically builds tiles from the cutout service for the orthogonal dimensions, including time. Most visualization is done in the image plane because anisotropy, exposure differences, and imperfect registration between slices make images more difficult to interpret (Figure 6). We modify CATMAID's storage organization to create locality within each file system directory. By default, CATMAID keeps a directory for each slice, so that a tile at resolution r at coordinates (x, y, z) is named z/y_x_r.png. This places files from multiple resolutions in the same directory, which decreases lookup speed and sequential I/O. Again using HTTP rewrite rules, we restructure the hierarchy into r/z/y_x.png. This halves the number of files per directory and, more importantly, each directory corresponds to a single CATMAID viewing plane.
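The directory restructuring amounts to the path remapping sketched below (a sketch only; the exact rewrite-rule syntax we deploy in the Web server is not shown, and the path layout follows the description above):

    import re

    def remap_tile_path(path):
        """Map a per-slice tile path z/y_x_r.png to the per-resolution form r/z/y_x.png.

        The first form keeps one directory per slice with all resolutions mixed
        together; the second groups tiles by resolution first, so each leaf
        directory holds a single viewing plane at a single scale.
        """
        m = re.fullmatch(r"(\d+)/(\d+)_(\d+)_(\d+)\.png", path)
        if m is None:
            raise ValueError("unrecognized tile path: " + path)
        z, y, x, r = m.groups()
        return "{r}/{z}/{y}_{x}.png".format(r=r, z=z, y=y, x=x)

    # e.g. remap_tile_path("120/4_7_2.png") -> "2/120/4_7.png"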

Figure 7: OCP Data Cluster and clients as configured to run and visualize the parallel computer vision workflow for synapse detection (Section 2).

We believe that we can provide performance similar to tiles in the cutout service through a combination of prefetching and caching. Doing so would eliminate the redundant storage of tiles. Instead of doing a planar cutout for each tile, we could round the request up to the next cuboid and materialize and cache all the nearby tiles, either on the server or in a distributed memory cache. This is future work.

3.4 Data Cleaning
While not a database technology, we mention that we color correct image data to eliminate exposure differences between image planes. The process solves a global Poisson equation to minimize steep gradients in the low frequencies, outputting smoothed data. High frequencies from the original image are added back in to preserve the edges and structures used in computer vision. The process was derived from work by Kazhdan [18]. The resulting color corrected data are easier to interpret in orthogonal views (Figure 6). We also believe that they will improve the accuracy of computer vision, a hypothesis that we are exploring.

4. SYSTEM DESIGN
We describe the OCP Data Cluster based on the configuration we used for running the synapse detector (Figure 7) because this demonstrates a parallel computer vision application. The figure includes parallel vision pipelines using Web services and a visualization function run from a Web browser, both connecting over the public Internet.

4.1 Architecture
The OCP Data Cluster consists of heterogeneous nodes, each designed for a different function or workload. These include application servers, database nodes, file system nodes, and SSD I/O nodes. The database nodes store all of the image and annotation data for cutout (except for high-I/O annotation projects on SSD I/O nodes). We currently have two Dell R710s with dual Intel Xeon E5630 quad-core processors, 64GB of memory, and Dell H700 RAID controllers. The twelve hot-plug drive bays hold a RAID-6 array of 11 2TB (resp. 3TB) SATA drives and a hot spare, for an 18TB (resp. 27TB) file system. Two drives have failed and been rebuilt from the hot spare automatically in the last 18 months. We use MySQL for all database storage.

Application servers perform all network and data manipulation except for database query processing. This includes partitioning spatial data requests into multiple database queries and assembling, rewriting, and filtering the results. We currently deploy two Web servers in a load-balancing proxy. They run on the same physical hardware as

our database nodes because the processor capabilities of these nodes exceed the compute demand of the databases. These functions are entirely separable.

File server nodes store CATMAID image tiles for visualization, image data streamed from the instruments over the Internet, and other "project" data from our partners that need to be ingested into OCP formats. The nodes are designed for capacity and sequential read I/O. We have a Dell R710 with less memory (16GB) and compute (one E5620 processor) than the database nodes. The machine has the same I/O subsystem, with a 27TB file system. We use two Data-Scope nodes (described below) as file servers; each has a 12TB software RAID 10 array built on 24 1TB SATA disks.

SSD I/O nodes are designed for the random write workloads generated by parallel computer vision algorithms. The two nodes are Dell R310s with an Intel Xeon 3430 processor with 4 2.4GHz cores. Each machine has a striped (RAID 0) volume on two OCZ Vertex4 solid-state drives. The system realizes 20K IOPS out of the theoretical hardware limit of 120K; an improved I/O controller is needed. We deployed the SSD I/O nodes in response to our experience writing 19M synapses in 3 days when running the parallel synapse finding algorithm. For synapse finding, we had to throttle the write rate to 50 concurrent outstanding requests to avoid overloading the database nodes.

Our growth plans involve moving more services into the Data-Scope [41], a 90-node data-intensive cluster with a 10PB ZFS file system. We are currently using two machines as file system nodes and have an unused allocation of 250TB on the shared file system. Data-Scope nodes have 2 Intel Xeon 5690 hex-core processors, 64 GB of RAM, 3 Vertex3 SSDs, and 24 local SATA disks, and thus serve all of our cluster functions well.

Data Distribution. We place concurrent workloads on distinct nodes in order to avoid I/O interference. This arises in two applications. Computer vision algorithms, e.g., our synapse detector, read large regions from the cutout service and perform many small writes. We map cutouts to a database node and small writes to an SSD node. Visualization reads image data from either the tile stack or cutout database and overlays annotations. Again, annotation databases are placed on different nodes than image databases. Notably, the cutouts and tile stack are not used together and could be placed on the same node. Because SSD storage is a limited resource, OCP migrates databases from SSD nodes to database nodes when they are no longer

actively being written. This is an administrative action, implemented with MySQL's dump and restore utilities, and most often performed when we build the annotation resolution hierarchy (Section 3.2).

We shard large image data across multiple database nodes by partitioning the Morton-order space-filling curve (Figure 4). Currently we do this only for our largest data set (bock11), for capacity reasons. We have not yet found a performance benefit from sharding data. For sharded databases, the vast majority of cutout requests go to a single node. We do note that multiple concurrent users of the same data set would benefit from parallel access to the nodes of a sharded database. Our sharding occurs at the application level: the application is aware of the data distribution and redirects requests to the node that stores the data, as in the sketch below. We are in the process of evaluating multiple NoSQL systems in which sharding and scale-out are implemented in the storage system rather than the application. We are also evaluating SciDB [40, 43] for use as an array store for spatial data.
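A sketch of the application-level routing, assuming contiguous ranges of the Morton curve are assigned to nodes; the range boundaries and node names are hypothetical, not our actual shard map.

    import bisect

    # Illustrative shard map: each entry is the first Morton key owned by a node.
    SHARD_STARTS = [0, 1 << 24, 2 << 24, 3 << 24]
    SHARD_NODES = ["dm1", "dm2", "dm3", "dm4"]     # hypothetical node names

    def node_for_cuboid(morton_key):
        """Route a cuboid to the database node owning its stretch of the curve."""
        return SHARD_NODES[bisect.bisect_right(SHARD_STARTS, morton_key) - 1]

    # A cutout touches few distinct stretches of the curve, so most requests
    # resolve to a single node, as noted above.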

4.2 Web Services
Web services provide rich object representations that both capture the output of computer vision algorithms and support the spatial analysis of brain anatomy. We have used the service to perform analysis tasks such as spatial density estimation, clustering, and building distance distributions. Analysis queries typically involve selecting a set of objects based on metadata properties, examining the spatial extent of these objects, and computing statistics on spatial properties such as distances or volumes. For example, one analysis looked at the distribution of the lengths of dendritic spines, skinny connections from dendrites to synapses, in order to understand the region of influence of neural wiring. The OCP queries used metadata to find all synapses of a certain type that connect to the selected dendrite and extracted voxels for the synapses and the object.

All programming interfaces to the OCP Data Cluster use RESTful (REpresentational State Transfer) [8] interfaces that are stateless, uniform, and cacheable. Web-service invocations perform HTTP GET/PUT/DELETE requests to human-readable URLs. REST's simplicity makes it easy to integrate services into many environments; we have built applications in Java, C/C++, Python, Perl, PHP, and MATLAB.

We have selected HDF5 as our data interchange format. HDF5 is a scientific data model and format. We prefer it to more common Web data formats, such as JSON or XML, because of its support for multidimensional arrays and large data sets.

Cutout. Providing efficient access to arbitrary sub-volumes of image data guides the design of the OCP Data System. The query, which we call a cutout, specifies a set of dimensions and ranges within each dimension in a URL and GETs the URL from the Web service, which returns an HDF5 file that contains a multidimensional array (Table 1). Every database in OCP supports cutouts. EM image databases return arrays of 8-bit grayscale values, annotation databases return 32-bit annotation identifiers, and we also support 16-bit (TIFF) and 32-bit (RGBA) image formats.
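A client-side sketch of a cutout call, following the URL layout of Table 1; the dataset name inside the returned HDF5 file and the use of an in-memory file object (which requires an h5py version that accepts file-like objects) are assumptions.

    import io
    import h5py
    import requests

    def read_cutout(token, resolution, x_range, y_range, z_range,
                    server="http://openconnecto.me"):
        """GET a 3-d cutout and return it as a NumPy array.

        The URL follows the cutout pattern of Table 1; the dataset name inside
        the HDF5 response ('CUTOUT' here) is illustrative.
        """
        url = "{0}/{1}/hdf5/{2}/{3},{4}/{5},{6}/{7},{8}/".format(
            server, token, resolution,
            x_range[0], x_range[1], y_range[0], y_range[1],
            z_range[0], z_range[1])
        resp = requests.get(url)
        resp.raise_for_status()
        with h5py.File(io.BytesIO(resp.content), "r") as f:
            return f["CUTOUT"][...]

    # e.g. a 512^3 block of bock11 at resolution 4 (assumes the public service):
    # cube = read_cutout("bock11", 4, (512, 1024), (512, 1024), (512, 1024))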

Projects and Datasets. A dataset configuration describes the dimensions of spatial databases. This includes the number of dimensions (time series and channels), the size of each dimension, and the number of resolutions in the hierarchy. A project defines a specific database for a dataset, including the project type (annotations or images), the storage configuration (nodes and sharding), and properties such as whether the database supports exceptions and whether it is read-only. We often have tens of projects for a single dataset,

Figure 8: A cutout of an annotation database (left) and the dense read of a single annotation (right).

including original data, cleaned data, and multiple annotation databases that describe different structures or use different annotation algorithms.

Object Representations. Annotation databases combine all annotations of the same value (identifier) into an object associated with metadata and support spatial queries against individual objects. The service provides several data options that allow the query to specify whether to retrieve data and in what format (Table 1). The default retrieves metadata only, from the database tables that implement the RAMON ontology. A query may request a bounding box around the annotation, which queries a spatial index but does not access voxel data. All the data may be retrieved as a list of voxels or as a dense array by specifying cutout. The dense data set has the dimensions of the bounding box. Dense data may also be restricted to a specific region by specifying ranges. Data queries retrieve voxel data from cuboids and then filter out all voxel labels that do not match the requested annotation.

We provide both sparse (voxel lists) and dense (cutout) data interfaces to provide different performance options when retrieving spatial data for an object. The best choice depends upon both the data and the network environment. At the server, it is always faster to compute the dense cutout: the server reads cuboids from disk and filters the data in place in the read buffer. To compute voxel lists, the matching voxel locations are written to another array. However, many neural structures have large spatial extent and are extremely sparse. For these, the voxel representations are much smaller. On WAN and Internet connections, the reduced network transfer time dominates the additional compute time. For example, dendrite 13 in the kasthuri11 data set comprises 8 million voxels in a bounding box of more than 19 trillion voxels, i.e., less than 0.4% of voxels are in the annotation. Other neural structures, such as synapses, are compact, and dense interfaces always perform better.

To write an annotation, clients make an HTTP PUT request to a project that includes an HDF5 file. All writes use the same base URL because the HDF5 file specifies the annotation identifier or gives no identifier, causing the server to choose a unique identifier for a new object. The data options specify the write discipline for voxels in the current annotation that are already labeled in the database: overwrite replaces prior labels, preserve keeps prior labels, and exception keeps the prior label and marks the new label as an exception. Other data options include update, when modifying an existing annotation, and dataonly, to write voxel labels without changing metadata.

Batch Interfaces. OCP provides interfaces to read or write multiple annotations at once in order to amortize the fixed costs of making a Web service request over multiple writes and reads. Batching requests is particularly important when creating neural structures with little or no voxel data, in which the Web service invocation dominates I/O costs to the database. This was the case with our synapse finder (Section 2), in which we doubled throughput by batching 40 writes at a time. HDF5 supports a directory structure that

3-d image cutout: http://openconnecto.me/token/hdf5/resolution/x-range/y-range/z-range/
  512^3 at offset (1024, 1024, 1024), resolution 4: http://openconnecto.me/bock11/hdf5/4/512,1024/512,1024/512,1024/

Read an annotation: http://openconnecto.me/token/identifier/data-options/
  voxel list for identifier 75: http://openconnecto.me/annoproj/75/voxels/
  bounding box: http://openconnecto.me/annoproj/75/boundingbox/
  cutout restricted to a region: http://openconnecto.me/annoproj/75/cutout/2/1000,2000/1000,2000/10,20/

Write an annotation: http://openconnecto.me/token/data-options/
Batch read: http://openconnecto.me/annoproj/1000,1001,1002/

Predicate query (find all synapses): http://openconnecto.me/annoproj/objects/type/synapse/

Table 1: RESTful interfaces to the OCP Cutout and Annotation Web services. Italics indicate arguments and parameters; annoproj is an example annotation project.

we use to encode multiple RAMON objects, placing each object in its own directory by annotation identifier.
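A sketch of such a batch payload, with one group per annotation identifier; the group and field names are illustrative, not the exact RAMON layout.

    import io
    import h5py
    import numpy as np

    def build_batch(annotations):
        """Pack {identifier: voxel_array} into one HDF5 payload, one group per id."""
        buf = io.BytesIO()
        with h5py.File(buf, "w") as f:
            for ann_id, voxels in annotations.items():
                grp = f.create_group(str(ann_id))
                grp.create_dataset("VOXELS",
                                   data=np.asarray(voxels, dtype=np.uint32))
                grp.attrs["ANNOTATION_TYPE"] = "synapse"  # illustrative metadata field
        return buf.getvalue()

    # One PUT of this payload amortizes the per-request overhead across all objects.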

Querying Metadata. OCP provides a key/value query interface to object metadata. The queries extract a list of objects that match a predicate against metadata. We currently allow equality queries against integers, enumerations, strings, and user-defined key/value pairs, and range queries against floating-point values. Although a limited interface, these queries are simple and expressive for certain tasks. The objects Web service (Table 1) requests one or more metadata fields and values (or a metadata field, inequality operator, and value for floating-point fields), and the service returns a list of matching annotation identifiers. As an example, we use the URL openconnecto.me/objects/type/synapse/confidence/geq/0.99 to visualize the highest confidence objects found by the synapse detection pipeline.

Although we do not currently provide full relational access to databases, we will provide SQL access for more complex queries and move power users to SQL. To date, Web services have been sufficient for computer vision pipelines. But we see no reason to re-implement an SQL-equivalent parser and language through Web services.

Spatial Queries and Indexing Objects. Critical to spatial analysis of annotation databases are two queries: (1) what objects are in a region, and (2) what voxels comprise an object. On these primitives, OCP applications build more complex queries, such as nearest neighbors, clustering, and spatial distributions. OCP's database design is well suited to identifying the annotations in a region: the service performs a cutout of the region and extracts all unique elements in that region. In fact, the Numpy library in Python provides a built-in function to identify unique elements that is implemented in C. Locating the spatial extent of an object requires an additional index.

OCP uses a simple sparse indexing technique that supports batch I/O to locate the spatial extent of an object, its voxels and bounding box. The index comprises a database table that enumerates the list of cuboids that contain voxels for each annotation identifier (Figure 9). The list itself is a BLOB that contains a Python array. Although this index is not particularly compact, it has several desirable properties.

Adding to the index is efficient. It is a batch operation and uses append-only I/O. When writing an annotation in a cuboid, we determine that the annotation is new to the cuboid based on the current contents and add the cuboid's Morton-order location to a temporary list. After all cuboids have been updated, a single write transaction appends all new cuboids to the list as a batch. The annotation process, be it manual or machine vision, tends to create new objects and only rarely deletes or prunes existing objects. The workload suits an append-mostly physical design.

Figure 9: The sparse index for an object (green) is a list of the Morton-order locations of the cuboids that contain voxels for that annotation. The index describes the disk blocks that contain voxels for that object, which can be read in a single pass.

Retrieving an object allows for batch I/O and retrieves all data in a single sequential pass. The query retrieves the list of cuboids and sorts them by Morton-order location. Then all cuboids are requested as a single query. Cuboids are laid out in increasing Morton order on disk and thus are retrieved in a single sequential pass over the data.
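A sketch of that lookup path, assuming a cuboid table keyed by Morton index; the database handle, table, and column names are our own illustrative stand-ins.

    def read_annotation_cuboids(db, annotation_id):
        """Fetch every cuboid that contains voxels of one annotation.

        The per-annotation index row stores the Morton locations of those
        cuboids; sorting them yields an ascending key list, so the cuboid table
        is swept in one forward pass over the on-disk Morton order.
        """
        locations = sorted(db.get_index(annotation_id))   # Morton keys, ascending
        placeholders = ",".join(["%s"] * len(locations))
        query = ("SELECT zindex, cube FROM cuboids "
                 "WHERE zindex IN (%s) ORDER BY zindex" % placeholders)
        return db.execute(query, locations)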

We chose this design over existing spatial indexing techniques because it is particularly well suited to neuroscience objects, which are sparse and have large spatial extent. For OCP data, indexes that rely on bounding boxes either grow very large (R-trees [11]) or have inefficient search (R+-trees [37]). Informally, neural objects are long and skinny, and there are many in each region, so that bounding boxes intersect and overlap pathologically. An alternative is to use directed local search techniques that identify objects by a centroid and locate the object by searching nearby data [29, 42]. For OCP data, these techniques produce much more compact indexes that are faster to maintain. However, querying the index requires many I/Os to conduct an iterative search, and we prefer lookup performance to maintenance costs. We plan to quantify and evaluate these informal comparisons in the future.

Software. Application servers run a Django/Python stack, integrated with Apache2 using WSGI, that processes Web service requests. This includes performing database queries, assembling and filtering spatial data, and formatting data and metadata into and out of HDF5 files. We use parallel Cython to accelerate the most compute-intensive tasks.

(a) Maximum. (b) By cutout size. (c) By cutout size (log-log scale).

Figure 10: The performance of the cutout Web service that extracts three-dimensional subvolumes from the kasthuri11 image database.

Cython compiles Python code into the C language so that it executes without the overhead of interpretation. Parallel Cython activates OpenMP multicore parallelism within Cython routines. The compute-intensive routines that we accelerate with Cython operate against every voxel in a cutout. Examples include (1) false coloring annotations for image overlays, in which every 32-bit annotation is mapped to an RGBA color in an output image buffer, and (2) filtering out annotations that do not match a specified criterion, e.g., to find all synapses in a region, one queries the metadata to get a list of synapse ids and then filters all non-matching identifiers out of the cutout.
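A sketch of the kind of per-voxel filter that benefits from parallel Cython; the module and function names are illustrative, and the routine must be compiled with OpenMP enabled for prange to run in parallel.

    # filter.pyx -- sketch of a parallel Cython voxel filter (illustrative names).
    # Compile with OpenMP, e.g. extra_compile_args=["-fopenmp"], to enable prange.
    import numpy as np
    cimport cython
    from cython.parallel import prange

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def filter_ids(unsigned int[:, :, ::1] cutout, keep_ids):
        """Zero every voxel whose label is not in keep_ids (e.g. synapse ids)."""
        # Build a dense lookup table so the hot loop never touches Python objects.
        cdef unsigned int[::1] keep = np.zeros(
            int(np.asarray(cutout).max()) + 1, dtype=np.uint32)
        for i in keep_ids:
            keep[i] = 1
        cdef Py_ssize_t z, y, x
        with nogil:
            for z in prange(cutout.shape[0], schedule="static"):
                for y in range(cutout.shape[1]):
                    for x in range(cutout.shape[2]):
                        if keep[cutout[z, y, x]] == 0:
                            cutout[z, y, x] = 0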

5. PRELIMINARY PERFORMANCE
We conducted experiments against the live Web services to provide an initial view of performance. The major conclusion is that memory and I/O performance both limit throughput. The process of array slicing and assembly for cutout requests keeps all processors fully utilized reorganizing data in memory. We had not witnessed the memory bottleneck prior to this study because OCP serves data over a 1 Gbps Ethernet switch and a 1 Gbps Internet uplink, which we saturate long before other cluster resources. As we migrate to the Data-Scope cluster (Section 4.1), we will upgrade to a 40 Gbps Internet2 uplink and more compute-capable nodes, which will make memory an even more critical resource.

Experiments were conducted on an OCP Database node: a Dell R710 with two Intel Xeon E5630 quad-core processors, 64GB of memory, and a Dell H700 RAID controller managing a RAID-6 array of 11 3TB SATA drives for a 27TB file system. All Web service requests are initiated on the same node and use the localhost interface, allowing us to test the server beyond the 1 Gbps network limit. Experiments show the cutout size, not the size of the data transferred or read from disk. Each cuboid is compressed on disk, read, and uncompressed. The data are then packaged into a cutout, which is compressed prior to data transfer. The cutout size is the amount of data that the server must handle in memory. The EM mouse brain data in question have high entropy and tend to compress by less than 10%. The annotation data are highly compressible.

Cutout throughput is the principal performance measure for our system, dictating how much data the service provides to parallel computer vision workflows. Figure 10(a)

Figure 11: Throughput of 256MB cutout requests to kasthuri11 as a function of the number of concurrent requests.

shows the maximum throughput achieved over all configurations when issuing 16 parallel cutout requests. When data are in memory and cutout requests are aligned to cuboid boundaries (aligned memory), the system performs no I/O and minimally rearranges data. Processing in the application stack bounds performance. In this case, a single node realizes a peak throughput of more than 173 MB/s for the largest transfers. Cutouts to random offsets aligned to cuboid boundaries add I/O costs to each request and bring performance down to a peak of 121 MB/s. Unaligned cutouts require data to be reorganized in memory, moving byte ranges that are unaligned with the cache hierarchy. This incurs further performance penalties, and peak throughput drops to only 61 MB/s. The I/O cost of these requests is only marginally more than aligned cutouts, rounding each dimension up to the next cuboid. Unaligned cutouts reveal the dominance of memory performance.

Scalability results reveal fixed costs in both Web-service invocation and I/O that become amortized for larger cutouts. Figures 10(b) and 10(c) show the throughput as a function of cutout size in normal and log-log scale. The experiments use 16 parallel requests each. Performance scales nearly linearly until 256KB for reads (I/O cost) and almost 1 MB for

Figure 12: Throughput of writing annotations as a function of the size of the annotated region.

in-cache requests (Web-service costs). Beyond this point, performance continues to increase, albeit more slowly. For aligned and unaligned reads, we attribute the continued increase to the Morton-order space-filling curve: larger cutouts intersect larger aligned regions of the Morton-order curve, producing larger contiguous I/Os [23]. We do not report performance above a 256MB cutout size; beyond this point, the buffers needed by the application and Web server exceed memory capacity.

To realize peak throughput, we had to initiate multiple requests in parallel. The application stack runs each Web-service request on a single process thread, and memory limits the per-thread performance. Figure 11 shows throughput as a function of the number of parallel requests. Throughput scales with the number of parallel requests beyond the eight physical cores of the machine: to 16 when reading data from disk and to 32 when reading from memory. Performance scales beyond the 8 cores owing to some combination of overlapping I/O with computation and hyperthreading. Too much parallelism eventually leads to a reduction in throughput because all hardware resources are fully utilized and more parallelism introduces more resource contention.

Parallelism is key to performance in the OCP Data Cluster, and the easiest way to realize it is through initiating concurrent Web-service requests. Parallelizing individual cutouts would benefit the performance of individual requests, but not overall system throughput, because multiple requests already consume the memory resources of all processors. We note that the Numpy Python library we use for array manipulation implements parallel operators using BLAS but does not parallelize array slicing.

Figure 12 shows the write throughput as a function of the volume size when uploading annotated volumes. Annotations differ from cutouts in that the data are 32 bits per voxel and are highly compressible. For this experiment, we uploaded a region of dense manual annotations of the kasthuri11 data set in which more than 90% of voxels are labeled. Data compress to 6% of the original size. However, performance scales with the uncompressed size of the region because that dictates how many cuboids must be manipulated in memory. This experiment uses 16 parallel requests. Write performance scales well up to 2MB cutouts and is faster than read at the same cutout size because the data compress much better. However, beyond 2MB transfers, write throughput collapses. The best performance over all configurations of 19 MB/s does not compare well with the 121 MB/s when reading image cutouts.

Figure 13: Performance comparison of Database nodes and SSD nodes when writing synapses (small random writes).

Updating a volume of annotations is much more complex than a cutout. It (1) reads the previous annotations, (2) applies the new annotations to the volume database, resolving conflicts on a per-voxel basis, (3) writes back the volume database, (4) reads index entries for all new annotations, (5) updates each list by unioning new and old cuboid locations, and (6) writes back the index. I/O is doubled, and index maintenance costs are added on top. The critical performance issue is updating the spatial index. Parallel writes to the spatial index result in transaction retries and timeouts in MySQL due to contention. Often a single annotation volume will result in the update of hundreds of index entries, one for each unique annotation identifier in the volume. We can scale to larger writes at the expense of using fewer parallel writers. Improving annotation throughput is a high priority for the OCP Data Cluster.

We also include a comparison of small write performance between an SSD node and a Database node. The SSD node is a Dell R310 with an Intel Xeon 3430 with 4 2.4GHz cores, 16 GB of memory, and two OCZ Vertex 4 drives configured in a RAID 0 array. The experiment uploads all of the synapse annotations in the kasthuri11 data in random order, committing after each write. Figure 13 shows that the SSD node achieves more than 150% of the throughput of the Database array on small random writes. The number of synapse writes per second is surprisingly low: we write only about 6 RAMON objects per second. However, a single RAMON object write generates updates to three different metadata tables, the index, and the volume database. The takeaway from this experiment is that an inexpensive SSD node (<$3,000) offloads the write workload of an entire Database node (>$18,000). All I/O in this experiment is random. In practice, we achieve much higher write throughput because of locality and batching of requests. Our synapse finder workload uploaded more than 73 synapses a second per node, but that number reflects caching effects when updating many synapses in a small region.

6. RELATED WORK
The performance issues faced by the spatial databases in the OCP Data Cluster parallel those of SciDB [40, 43], including array slicing, selection queries, joins, and clustering/scale-out. SciDB might benefit OCP by making data distribution transparent and by offloading application functions implemented in Python to SciDB.

Many of the physical design principles of OCP follow the ArrayStore of Soroush et al. [39]. They give a model for multi-level partitioning (chunking and tiling) that is more general than OCP cuboids. They can handle both regular and irregular

data. ArrayStore extends decades of work on regular [6, 24] and irregular [5] tiling. The RasDaMan multidimensional array database [2] also builds spatial data services for image processing on top of a relational database. They too have explored the issues of tiling, data organization, and data distribution [9]. In contrast to general-purpose array databases, OCP has a focus on the specific application of parallel computer vision for neuroscience and, as such, tunes its design to application concepts and properties. Examples include using the partitioning of space-filling curves in the data distribution function and the indexing of neural objects.

OCP's design incorporates many techniques developed in the spatial data management community. Samet [35] wrote the authoritative text on the subject, which we have used as a reference for region quad-trees, space-filling curves, tessellations, and much more.

OCP represents the current state of evolution of the scale-out database architectures developed by the Institute for Data-Intensive Engineering and Science at Johns Hopkins. Gray and Szalay developed the progenitor system in the Sloan Digital Sky Survey [36]. This has led to more than 30 different data products over the past decade, including the JHU Turbulence Database Cluster [20] and the Life Under Your Feet soil ecology portal [25].

7. FINAL THOUGHTS
In less than two years, the Open Connectome Project has evolved from a problem statement to a data management and analysis platform for a community of neuroscientists with the singular goal of mapping the brain. High-throughput imaging instruments have driven data management off the workstation and out of the lab, into data-intensive clusters and Web services. Also, the scope of the problem has forced experimental biologists to engage statisticians, systems engineers, machine learners, and big-data scientists. The Open Connectome Project aims to be the forum in which they all meet.

The OCP Data Cluster has built a rich set of features to meet the analysis needs of its users. Application-specific capabilities have been the development focus to date. The platform manages data sets from many imaging modalities (five to fifteen, depending on how one counts them).

Presently, our focus must change to throughput and scalability. The near future holds two 100-teravoxel data sets, each larger than the sum of all our data. We will move to a new home in the Data-Scope cluster, which will increase our outward-facing Internet bandwidth by a factor of forty. We also expect the connectomics community to expand rapidly: the US President's Office recently announced a decade-long initiative to build a comprehensive map of the human brain [21]. This study reveals many limitations of the OCP Data Cluster and provides guidance to focus our efforts, specifically toward servicing many small writes to record brain structures and efficient, parallel manipulation of data to alleviate memory bottlenecks.

Acknowledgments
The authors would like to thank additional members of the Open Connectome Project that contributed to this work, including Disa Mhembere and Ayushi Sinha. We would also like to thank our collaborators Forrest Collman, Alberto Cardona, Cameron Craddock, Michael Milham, Partha Mitra, and Sebastian Seung.

8. ADDITIONAL AUTHORS
* Department of Computer Science and the Institute for Data Intensive Engineering and Science, Johns Hopkins University

† Johns Hopkins University Applied Physics Laboratory
‡ Department of Statistical Science and Mathematics and the Institute for Brain Science, Duke University
§ Janelia Farm Research Campus, Howard Hughes Medical Institute
¶ Department of Molecular and Cellular Biology, Harvard University
∥ Department of Computational Neuroscience, Massachusetts Institute of Technology
** Allen Institute for Brain Science
†† Department of Physics and Astronomy and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
‡‡ Department of Bioengineering, Stanford University
§§ Department of Molecular and Cellular Physiology, Stanford University

9. REFERENCES
[1] D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006.
[2] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and B. Widmann. The multidimensional database system RasDaMan. In SIGMOD, 1998.
[3] D. D. Bock, W.-C. A. Lee, A. M. Kerlin, M. L. Andermann, G. Hood, A. W. Wetzel, S. Yurgenson, E. R. Soucy, H. S. Kim, and R. C. Reid. Network anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337), 2011.
[4] A. Cardona, S. Saalfeld, J. Schindelin, I. Arganda-Carreras, S. Preibisch, M. Longair, P. Tomancak, V. Hartenstein, and R. J. Douglas. TrakEM2 software for neural circuit reconstruction. PLoS ONE, 7(6), 2012.
[5] C. Chang, A. Acharya, A. Sussman, and J. Saltz. T2: a customizable parallel database for multi-dimensional data. SIGMOD Record, 27(1), 1998.
[6] C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz. Titan: A high-performance remote sensing database. In ICDE, 1997.
[7] F. Crow. Summed-area tables for texture mapping. In SIGGRAPH, 1984.
[8] R. T. Fielding and R. N. Taylor. Principled design of the modern Web architecture. ACM Transactions on Internet Technology, 2(2), 2002.
[9] P. Furtado and P. Baumann. Storage of multidimensional arrays based on arbitrary tiling. In ICDE, 1999.
[10] J. Gray and A. Szalay. Science in an exponential world. Nature, 440(23), 23 March 2006.
[11] A. Guttman. R-trees: a dynamic index structure for spatial searching. In ACM SIGMOD Conference, 1984.
[12] H. S. Seung et al. Eyewire. Available at eyewire.org, 2012.
[13] V. Jain, H. S. Seung, and S. C. Turaga. Machines that learn to segment images: a crucial technology for connectomics. Current Opinion in Neurobiology, 20(5), 2010.
[14] V. Jain, S. Turaga, K. Briggman, W. Denk, and S. Seung. Learning to agglomerate superpixel hierarchies. In Neural Information Processing Systems, 2011.
[15] K. Kanov, R. Burns, G. Eyink, C. Meneveau, and A. Szalay. Data-intensive spatial filtering in large numerical simulation datasets. In Supercomputing, 2012.
[16] N. Kasthuri and J. Lichtman. Untitled. In preparation, 2013.
[17] V. Kaynig, T. Fuchs, and J. M. Buhmann. Geometrical consistent 3d tracing of neuronal processes in ssTEM data. In MICCAI, 2010.
[18] M. Kazhdan and H. Hoppe. Streaming multigrid for gradient-domain operations on large images. ACM Transactions on Graphics, 27, 2008.
[19] D. M. Kleissas, W. R. Gray, J. M. Burck, J. T. Vogelstein, E. Perlman, P. M. Burlina, R. Burns, and R. J. Vogelstein. CAJAL3D: toward a fully automatic pipeline for connectome estimation from high-resolution EM data. In Neuroinformatics, 2012.
[20] Y. Li, E. Perlman, M. Wang, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. Journal of Turbulence, 9(31):1-29, 2008.
[21] J. Markoff. Obama seeking to boost study of human brain. New York Times, February 17, 2013.
[22] K. D. Micheva, B. Busse, N. C. Weiler, N. O'Rourke, and S. J. Smith. Single-synapse analysis of a diverse synapse population: Proteomic imaging methods and markers. Neuron, 68(1), 2010.
[23] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1), 2001.
[24] B. Moon and J. Saltz. Scalability analysis of declustering methods for multidimensional range queries. Trans. of Knowledge and Data Engineering, 2(10), 1998.
[25] R. A. Musaloiu-E., A. Terzis, K. Szlavecz, A. Szalay, J. Cogan, and J. Gray. Life under your feet: Wireless sensors in soil ecology. In Embedded Networked Sensors, 2006.
[26] M. Nielsen. Reinventing Discovery. Princeton University Press, 2011.
[27] A. Norton and J. Clyne. The VAPOR visualization application. In High Performance Visualization, 2012.
[28] N. O'Rourke, N. C. Weiler, K. D. Micheva, and S. J. Smith. Deep molecular diversity of mammalian synapses: why it matters and how to measure it. Nature Reviews Neuroscience, 13(1), 2012.
[29] S. Papadomanolakis, A. Ailamaki, J. C. Lopez, T. Tu, D. R. O'Hallaron, and G. Heber. Efficient query processing on unstructured tetrahedral meshes. In ACM SIGMOD, 2006.
[30] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
[31] A. V. Reina, M. Gelbart, D. Huang, J. Lichtman, E. L. Miller, and H. Pfister. Segmentation fusion for connectomics. In International Conference on Computer Vision, 2011.
[32] A. V. Reina, W.-K. Jeong, J. Lichtman, and H. Pfister. The connectome project: discovering the wiring of the brain. ACM Crossroads, 18(1):8-13, 2011.
[33] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI pipeline processing environment. Neuroimage, 19(3), 2003.
[34] S. Saalfeld, A. Cardona, V. Hartenstein, and P. Tomancak. CATMAID: collaborative annotation toolkit for massive amounts of image data. Bioinformatics, 25(15), 2009.
[35] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco, 2006.
[36] The Sloan Digital Sky Survey, 2013. Available at http://www.sdss.org.
[37] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In VLDB, 1987.
[38] C. Sommer, C. Straehle, U. Koethe, and F. A. Hamprecht. ilastik: Interactive learning and segmentation toolkit. In Biomedical Imaging, 2011.
[39] E. Soroush, M. Balazinska, and D. Wang. ArrayStore: A storage manager for complex parallel array processing. In SIGMOD, 2011.
[40] M. Stonebraker, J. Becla, D. DeWitt, K.-T. Lim, D. Maier, O. Ratzesberger, and S. Zdonik. Requirements for science data bases and SciDB. In Conference on Innovative Data Systems Research, 2009.
[41] A. S. Szalay, K. Church, C. Meneveau, A. Terzis, and S. Zeger. MRI: The development of Data-Scope, a multi-petabyte generic data analysis environment for science. Available at https://wiki.pha.jhu.edu/escience_wiki/images/7/7f/DataScope.pdf, 2012.
[42] F. Tauheed, L. Biveinis, T. Heinis, F. Schurmann, H. Markram, and A. Ailamaki. Accelerating range queries for brain simulations. In ICDE, 2012.
[43] The SciDB Development Team. Overview of SciDB: large scale array storage, processing and analysis. In SIGMOD, 2010.
[44] K. Wu. FastBit: Interactively searching massive data. Journal of Physics: Conference Series, 180(1), 2009.

  • 1 Introduction
  • 2 Data and Applications
  • 3 Data Model
    • 31 Physical Design
    • 32 Annotations
    • 33 Tiles
    • 34 Data Cleaning
      • 4 System Design
        • 41 Architecture
        • 42 Web Services
          • 5 Preliminary Performance
          • 6 Related Work
          • 7 Final Thoughts
          • 8 Additional Authors
          • 9 References
Page 4: The Open Connectome Project Data Cluster: Scalable ... · The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience Randal Burns William

to a cube of (64x64x64) We include time series in the spa-tial index using a 4-d space filling curve Time is oftenas large as other dimensions 1000s of time points in MRdata This supports queries that analyze the time historyof a smaller region We do not include channel data in theindexmdashthere are different cuboids for different channelsmdashbecause the number of channels tend to be few (up to 17)and most analyses look at combinations of a few channels

Cuboids contain only 218 = 256K of data which is a com-promise among the different uses of the data This sizemay be too small to optimize IO throughput Space fillingcurves mitigate this by ensuring that larger aligned regionsare stored sequentially and can be read in a single stream-ing IO An important use of our service extracts lower-dimensional projections for visualization of EM data andsub-space clustering of 4-d and 5-d data Keeping the cuboidsize small reduces the total amount of data to be read (anddiscarded) for these queries

32 AnnotationsAn annotation project contains a description of the spa-tial extent and metadata for objects detected in an imagedatabase Each project has a spatial database registered toan image data set Projects may be shared among teams ofhuman annotators that want a coherent view of what struc-tures have been labeled Projects are also plentiful eachparameterization of a computer vision algorithm may havea project so that outputs can be compared Annotations alsohave a resolution hierarchy so that large structures such ascell bodies may be found and marked at low resolution anddetail synapses and dendritic spines at high resolution

Anldquoannotationrdquoconsists of an object identifier that is linkedto object metadata in the RAMON (Reusable AnnotationMarkup for Open coNectomes) neuroscience ontology [19]and a set of voxels labeled with that identifier in the spa-tial database RAMON is a hierarchical systems engineeringframework for computer-vision and brain anatomy that in-clude concepts such as synapses seeds neurons organellesetc We developed RAMON for use within the Open Con-nectome Project community of collaborators it is not a stan-dard Although annotations are often sparse we store themin dense cuboids We found that this design is flexible andefficient When annotations are dense such as the outputof the automatic image segmentation [31] storing them incuboids outperforms sparse lists by orders of magnitude Toimprove performance when annotations are sparse we allo-cate cuboids lazily regions with no annotations use no stor-age and can be ignored during reads We also gzip compresscube data Cube labels compress well because they havelow entropy with both many zero values and long repeatedruns of non-zero values (labeled regions) Recent implemen-tations of run length encoding [1 44] may be preferable butwe have not evaluated them

We support multiple annotations per voxel through excep-tions An exceptions list per cuboid tracks multiply-labeledvoxels Exceptions are activated on a per project basiswhich incurs a minor runtime cost to check for exceptionson every read even if no exceptions are defined Each anno-tation write specifies how to deal with conflicting labels byoverwriting or preserving the previous label or by creatingan exception

For performance reasons we initially store annotations atsingle level in the resolution hierarchy and propagate themto all levels as a background batch IO job The conse-quence is that annotations are only immediately visible atthe resolution at which they are written The alternativewould be to update all levels of the hierarchy for each writeWhile there are data structures and image formats for this

Figure 6 Original (left) and color corrected (right)images across multiple serial sections [16]

task such as wavelets [27] the incremental maintenance ofa resolution hierarchy always makes updates more complexand expensive The decision to not make annotations con-sistent instantaneously reflects how we perform vision andanalysis Multiple algorithms are run to detect differentstructures at different levels and fuse them together Thenwe build the hierarchy of annotations prior to performingspatial analysis Because write IO performance limits sys-tem throughput we sacrifice data consistency to optimizethis workflow We may need to revisit this design decisionfor other applications

33 TilesAt present we store a redundant version of the image datain a 2-d tile stack as storage optimization for visualizationin the Web Viewer CATMAID [34] Tiles are 256x256 to1024x1024 image sections used in a pan and zoom interactiveviewer CATMAID dynamically loads and prefetches tiles sothat the user can experience continuous visual flow throughthe image region

The cutout service provides the capability to extract imageplanes as an alternative to storing tiles To do so it reads3-d cubes extracts the requested plane and discards thevast majority of the data To dynamically build tiles forCATMAID we use an http rewrite rule to convert a requestfor a file url into an invocation of the cutout Web service

Our current practice stores tiles for the image planendashthedimension of highest isotropic resolutionndashand dynamicallybuilds tiles from the cutout service for the orthogonal di-mensions including time Most visualization is done in theimage plane because anisotropy exposure differences andimperfect registration between slices makes images more dif-ficult to interpret (Figure 6) We modify CATMAIDrsquos stor-age organization to create locality within each file systemdirectory By default CATMAID keeps a directory for eachslice so that a tile at resolution r at coordinates (x y z) isnamed zy_x_rpng This places files from multiple reso-lutions in the same directory which decreases lookup speedand sequential IO Again using http rewrite rules we re-structure the hierarchy into rzy_xpng This halves thenumber of files per directory and more importantly eachdirectory corresponds to a single CATMAID viewing plane

Figure 7 OCP Data Cluster and clients as configured to run and visualize the parallel computer visionworkflow for synapse detection (Section 2)

We believe that we can provide performance similar to tilesin the cutout service through a combination of prefetchingand caching Doing so would eliminate the redundant stor-age of tiles Instead of doing a planar cutout for each tilewe could round the request up to the next cuboid and ma-terialize and cache all the nearby tiles either on the serveror in a distributed memory cache This is future work

34 Data CleaningWhile not a database technology we mention that we colorcorrect image data to eliminate exposure differences betweenimage planes The process solves a global Poisson equa-tion to minimize steep gradients in the low frequencies out-putting smoothed data High-frequencies from the originalimage are added back in to preserve the edges and struc-tures used in computer vision The process was derived fromwork by Kazhdan [18] The resulting color corrected dataare easier to interpret in orthogonal views (Figure 6) Wealso believe that they will improve the accuracy of computervision a hypothesis that we are exploring

4. SYSTEM DESIGN

We describe the OCP Data Cluster based on the configuration we used for running the synapse detector (Figure 7), because this demonstrates a parallel computer vision application. The figure includes parallel vision pipelines using Web services and a visualization function run from a Web browser, both connecting over the public Internet.

4.1 Architecture

The OCP Data Cluster consists of heterogeneous nodes, each designed for a different function or workload. These include application servers, database nodes, file system nodes, and SSD I/O nodes. The database nodes store all of the image and annotation data for cutout (except for high-I/O annotation projects on SSD I/O nodes). We currently have two Dell R710s with dual Intel Xeon E5630 quad-core processors, 64 GB of memory, and Dell H700 RAID controllers. The twelve hot-plug drive bays hold a RAID-6 array of 11 2 TB (resp. 3 TB) SATA drives and a hot-spare for an 18 TB (resp. 27 TB) file system. Two drives have failed and rebuilt from the hot spare automatically in the last 18 months. We use MySQL for all database storage.

Application servers perform all network and data manipulation, except for database query processing. This includes partitioning spatial data requests into multiple database queries and assembling, rewriting, and filtering the results. We currently deploy two Web servers in a load-balancing proxy. They run on the same physical hardware as our database nodes because the processor capabilities of these nodes exceed the compute demand of the databases. These functions are entirely separable.

File server nodes store CATMAID image tiles for visualization, image data streamed from the instruments over the Internet, and other "project" data from our partners that needs to be ingested into OCP formats. The nodes are designed for capacity and sequential read I/O. We have a Dell R710 with less memory (16 GB) and compute (one E5620 processor) than the database nodes. The machine has the same I/O subsystem, with a 27 TB file system. We use two Data-Scope nodes (described below) as file servers that each have a 12 TB software RAID 10 array built on 24 1 TB SATA disks.

SSD I/O nodes are designed for the random write workloads generated by parallel computer vision algorithms. The two nodes are Dell R310s with an Intel Xeon 3430 processor with four 2.4 GHz cores. Each machine has a striped (RAID 0) volume on two OCZ Vertex4 solid-state drives. The system realizes 20K IOPS of the theoretical hardware limit of 120K; an improved I/O controller is needed. We deployed the SSD I/O nodes in response to our experience writing 19M synapses in 3 days when running the parallel synapse finding algorithm. For synapse finding, we had to throttle the write rate to 50 concurrent outstanding requests to avoid overloading the database nodes.

Our growth plans involve moving more services into the Data-Scope [41], a 90-node data-intensive cluster with a 10 PB ZFS file system. We are currently using two machines as file system nodes and have an unused allocation of 250 TB on the shared file system. Data-Scope nodes have two Intel Xeon 5690 hex-core processors, 64 GB of RAM, three Vertex3 SSDs, and 24 local SATA disks, and thus serve all of our cluster functions well.

Data Distribution. We place concurrent workloads on distinct nodes in order to avoid I/O interference. This arises in two applications. Computer vision algorithms, e.g., our synapse detector, read large regions from the cutout service and perform many small writes. We map cutouts to a database node and small writes to an SSD node. Visualization reads image data from either the tile stack or cutout database and overlays annotations. Again, annotation databases are placed on different nodes than image databases. Notably, the cutouts and tile stack are not used together and could be placed on the same node. Because SSD storage is a limited resource, OCP migrates databases from SSD nodes to database nodes when they are no longer actively being written. This is an administrative action implemented with MySQL's dump and restore utilities, and most often performed when we build the annotation resolution hierarchy (Section 3.2).

We shard large image data across multiple database nodes by partitioning the Morton-order space-filling curve (Figure 4). Currently we do this only for our largest data set (bock11), for capacity reasons. We have not yet found a performance benefit from sharding data. For sharded databases, the vast majority of cutout requests go to a single node. We do note that multiple concurrent users of the same data set would benefit from parallel access to the nodes of a sharded database. Our sharding occurs at the application level. The application is aware of the data distribution and redirects requests to the node that stores the data. We are in the process of evaluating multiple NoSQL systems in which sharding and scale-out are implemented in the storage system rather than the application. We are also evaluating SciDB [40, 43] for use as an array store for spatial data.
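A minimal sketch of the application-level distribution function follows: interleave the bits of the cuboid coordinates into a Morton key and map contiguous ranges of the curve to nodes. The equal-width range split is an assumption; the production mapping may size ranges by node capacity.

def morton3(x, y, z, bits=21):
    # Interleave the bits of cuboid coordinates (x, y, z) into a Morton key.
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def node_for_cuboid(x, y, z, num_nodes, max_key):
    # Partition the Morton curve into num_nodes contiguous ranges and
    # route the cuboid to its owner node (equal-width split assumed).
    return min(morton3(x, y, z) * num_nodes // max_key, num_nodes - 1)

# e.g. node_for_cuboid(700, 300, 40, num_nodes=2, max_key=1 << 63)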

4.2 Web Services

Web services provide rich object representations that both capture the output of computer vision algorithms and support the spatial analysis of brain anatomy. We have used the service to perform analysis tasks such as spatial density estimation, clustering, and building distance distributions. Analysis queries typically involve selecting a set of objects based on metadata properties, examining the spatial extent of these objects, and computing statistics on spatial properties such as distances or volumes. For example, one analysis looked at the distribution of the lengths of dendritic spines, skinny connections from dendrites to synapses, in order to understand the region of influence of neural wiring. The OCP queries used metadata to find all synapses of a certain type that connect to the selected dendrite and extracted voxels for the synapses and the object.

All programming interfaces to the OCP Data Cluster use RESTful (REpresentational State Transfer) [8] interfaces that are stateless, uniform, and cacheable. Web-service invocations perform HTTP GET/PUT/DELETE requests to human-readable URLs. REST's simplicity makes it easy to integrate services into many environments; we have built applications in Java, C/C++, Python, Perl, PHP, and Matlab.

We have selected HDF5 as our data interchange format. HDF5 is a scientific data model and format. We prefer it to more common Web data formats, such as JSON or XML, because of its support for multidimensional arrays and large data sets.

Cutout. Providing efficient access to arbitrary sub-volumes of image data guides the design of the OCP Data System. The query, which we call a cutout, specifies a set of dimensions and ranges within each dimension in a URL; a GET of the URL invokes the Web service, which returns an HDF5 file that contains a multidimensional array (Table 1). Every database in OCP supports cutouts. EM image databases return arrays of 8-bit grayscale values, annotation databases return 32-bit annotation identifiers, and we also support 16-bit (TIFF) and 32-bit (RGBA) image formats.
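A client-side example of the cutout interface, following the URL pattern in Table 1: the Python sketch below GETs a subvolume and opens the returned HDF5 file in memory. The dataset name inside the returned file and the requests/h5py tooling are assumptions; any HDF5-capable HTTP client works.

import io
import h5py
import requests

def get_cutout(server, token, resolution, x, y, z):
    # Build the cutout URL from the Table 1 pattern and fetch it.
    url = "http://{0}/{1}/hdf5/{2}/{3},{4}/{5},{6}/{7},{8}/".format(
        server, token, resolution, x[0], x[1], y[0], y[1], z[0], z[1])
    resp = requests.get(url)
    resp.raise_for_status()
    with h5py.File(io.BytesIO(resp.content), "r") as f:
        # The dataset name "CUTOUT" is an assumption for this sketch.
        return f["CUTOUT"][...]

# e.g. cube = get_cutout("openconnecto.me", "bock11", 4, (512, 1024), (512, 1024), (512, 1024))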

Projects and Datasets. A dataset configuration describes the dimensions of spatial databases. This includes the number of dimensions (time series and channels), the size of each dimension, and the number of resolutions in the hierarchy. A project defines a specific database for a dataset, including the project type (annotations or images), the storage configuration (nodes and sharding), and properties such as whether the database supports exceptions and whether it is read-only. We often have tens of projects for a single dataset, including original data, cleaned data, and multiple annotation databases that describe different structures or use different annotation algorithms.

Figure 8: A cutout of an annotation database (left) and the dense read of a single annotation (right).

Object Representations. Annotation databases combine all annotations of the same value (identifier) into an object associated with metadata and support spatial queries against individual objects. The service provides several data options that allow the query to specify whether to retrieve data and in what format (Table 1). The default retrieves metadata only from the database tables that implement the RAMON ontology. A query may request a bounding box around the annotation, which queries a spatial index but does not access voxel data. All the data may be retrieved as a list of voxels or as a dense array by specifying cutout. The dense data set has the dimensions of the bounding box. Dense data may also be restricted to a specific region by specifying ranges. Data queries retrieve voxel data from cuboids and then filter out all voxel labels that do not match the requested annotation.

We provide both sparse (voxel lists) and dense (cutout) data interfaces to provide different performance options when retrieving spatial data for an object. The best choice depends upon both the data and the network environment. At the server, it is always faster to compute the dense cutout: the server reads cuboids from disk and filters the data in place in the read buffer. To compute voxel lists, the matching voxel locations are written to another array. However, many neural structures have large spatial extent and are extremely sparse. For these, the voxel representations are much smaller. On WAN and Internet connections, the reduced network transfer time dominates the additional compute time. For example, dendrite 13 in the kasthuri11 data set comprises 8 million voxels in a bounding box of more than 1.9 billion voxels, i.e., less than 0.4% of voxels are in the annotation. Other neural structures, such as synapses, are compact, and dense interfaces always perform better for them.

To write an annotation, clients make an HTTP PUT request to a project that includes an HDF5 file. All writes use the same base URL because the HDF5 file specifies the annotation identifier, or gives no identifier, causing the server to choose a unique identifier for a new object. The data options specify the write discipline for voxels in the current annotation that are already labeled in the database: overwrite replaces prior labels, preserve keeps prior labels, and exception keeps the prior label and marks the new label as an exception. Other data options include update, when modifying an existing annotation, and dataonly, to write voxel labels without changing metadata.
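A write looks symmetric to a read: build an HDF5 file in memory and PUT it to the project URL with the desired data options. The group and dataset names below are illustrative stand-ins, not the exact RAMON HDF5 schema; creating several groups in one file is how the batch interface described below packs multiple objects.

import io
import h5py
import numpy as np
import requests

def put_annotation(server, token, ann_id, voxels, options="overwrite"):
    # Build the HDF5 payload in memory; names are placeholders, not the OCP schema.
    buf = io.BytesIO()
    with h5py.File(buf, "w") as f:
        grp = f.create_group(str(ann_id))      # one group per RAMON object
        grp.create_dataset("VOXELS", data=np.asarray(voxels, dtype=np.uint32))
    url = "http://{0}/{1}/{2}/".format(server, token, options)
    resp = requests.put(url, data=buf.getvalue())
    resp.raise_for_status()
    return resp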

Batch Interfaces. OCP provides interfaces to read or write multiple annotations at once in order to amortize the fixed costs of making a Web service request over multiple writes and reads. Batching requests is particularly important when creating neural structures with little or no voxel data, in which the Web service invocation dominates I/O costs to the database. This was the case with our synapse finder (Section 2), in which we doubled throughput by batching 40 writes at a time. HDF5 supports a directory structure that we use to encode multiple RAMON objects, placing each object in its own directory by annotation identifier.

3-d image cutout: http://openconnecto.me/token/hdf5/resolution/x-range/y-range/z-range/
  512^3 spanning (512, 512, 512) to (1024, 1024, 1024) at resolution 4: http://openconnecto.me/bock11/hdf5/4/512,1024/512,1024/512,1024/

Read an annotation: http://openconnecto.me/token/identifier/data options/
  voxel list for identifier 75: http://openconnecto.me/annoproj/75/voxels/
  and bounding box: http://openconnecto.me/annoproj/75/boundingbox/
  and cutout restricted to a region: http://openconnecto.me/annoproj/75/cutout/2/1000,2000/1000,2000/10,20/

Write an annotation: http://openconnecto.me/token/data options/
Batch read: http://openconnecto.me/annoproj/1000,1001,1002/

Predicate query (find all synapses): http://openconnecto.me/annoproj/objects/type/synapse/

Table 1: RESTful interfaces to the OCP Cutout and Annotation Web services. Italics indicate arguments and parameters; annoproj is an example annotation project.


Querying Metadata. OCP provides a key/value query interface to object metadata. The queries extract a list of objects that match a predicate against metadata. We currently allow equality queries against integers, enumerations, strings, and user-defined key/value pairs, and range queries against floating-point values. Although a limited interface, these queries are simple and expressive for certain tasks. The objects Web service (Table 1) requests one or more metadata fields and values (or metadata field, inequality operator, and value for floating-point fields), and the service returns a list of matching annotation identifiers. As an example, we use the URL openconnecto.me/objects/type/synapse/confidence/geq/0.99 to visualize the highest confidence objects found by the synapse detection pipeline.
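A hypothetical client wrapper for this interface might look like the following; the predicate-to-URL encoding follows Table 1, while the shape of the response (a plain list of identifiers) is our assumption.

import requests

def find_objects(server, token, **predicates):
    # e.g. find_objects("openconnecto.me", "annoproj", type="synapse")
    parts = "/".join("{0}/{1}".format(k, v) for k, v in predicates.items())
    url = "http://{0}/{1}/objects/{2}/".format(server, token, parts)
    resp = requests.get(url)
    resp.raise_for_status()
    return resp.text.split()   # assumed: whitespace-separated identifiers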

Although we do not currently provide full relational access to databases, we will provide SQL access for more complex queries and move power users to SQL. To date, Web services have been sufficient for computer vision pipelines. But we see no reason to re-implement an SQL-equivalent parser and language through Web services.

Spatial Queries and Indexing Objects. Critical to spatial analysis of annotation databases are two queries: (1) what objects are in a region, and (2) what voxels comprise an object? On these primitives, OCP applications build more complex queries, such as nearest neighbors, clustering, and spatial distributions. OCP's database design is well suited to identifying the annotations in a region. The service performs a cutout of the region and extracts all unique elements in that region. In fact, the Numpy library in Python provides a built-in function to identify unique elements that is implemented in C. Locating the spatial extent of an object requires an additional index.
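In code, query (1) reduces to a cutout followed by a unique-elements pass; the zero-means-unlabeled convention in this sketch is an assumption.

import numpy as np

def objects_in_region(cutout_labels):
    # Which annotation objects appear in a region?  The region has already
    # been fetched as a cutout of 32-bit labels; np.unique runs in C.
    ids = np.unique(cutout_labels)
    return ids[ids != 0]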

OCP uses a simple sparse indexing technique that supports batch I/O to locate the spatial extent of an object: its voxels and bounding box. The index comprises a database table that enumerates the list of cuboids that contain voxels for each annotation identifier (Figure 9). The list itself is a BLOB that contains a Python array. Although this index is not particularly compact, it has several desirable properties.

Adding to the index is efficient. It is a batch operation and uses append-only I/O. When writing an annotation in a cuboid, we determine that the annotation is new to the cuboid based on the current contents and add the cuboid's Morton-order location to a temporary list. After all cuboids have been updated, a single write transaction appends all new cuboids to the list as a batch. The annotation process, be it manual or machine vision, tends to create new objects and only rarely deletes or prunes existing objects. The workload suits an append-mostly physical design.

Figure 9: The sparse index for an object (green) is a list of the Morton-order locations of the cuboids that contain voxels for that annotation. The index describes the disk blocks that contain voxels for that object, which can be read in a single pass.

Retrieving an object allows for batch I/O and retrieves all data in a single sequential pass. The query retrieves the list of cuboids and sorts them by Morton-order location. Then all cuboids are requested as a single query. Cuboids are laid out in increasing Morton order on disk and thus are retrieved in a single sequential pass over the data.
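The following sketch mimics the append and lookup paths with a Python dict standing in for the MySQL table; in the real index the Morton-key list is a BLOB holding a Python array, keyed by annotation identifier.

from array import array

index = {}   # annotation identifier -> array of Morton-order cuboid keys

def index_new_cuboids(ann_id, new_keys):
    # Append-only batch update: after all cuboids of a write are updated,
    # append the Morton keys that are new for this annotation in one pass.
    entry = index.setdefault(ann_id, array("Q"))
    entry.extend(k for k in new_keys if k not in entry)

def cuboids_for_object(ann_id):
    # Lookup path: sort the cuboid list by Morton order so the cuboids can
    # be read from disk in a single sequential pass.
    return sorted(index.get(ann_id, array("Q")))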

We chose this design over existing spatial indexing techniques because it is particularly well suited to neuroscience objects, which are sparse and have large spatial extent. For OCP data, indexes that rely on bounding boxes either grow very large (R-trees [11]) or have inefficient search (R+-trees [37]). Informally, neural objects are long and skinny and there are many in each region, so that bounding boxes intersect and overlap pathologically. An alternative is to use directed local search techniques that identify objects by a centroid and locate the object by searching nearby data [29, 42]. For OCP data, these techniques produce much more compact indexes that are faster to maintain. However, querying the index requires many I/Os to conduct an interactive search, and we prefer lookup performance to maintenance costs. We plan to quantify and evaluate these informal comparisons in the future.

Software. Application servers run a Django/Python stack integrated with Apache2 using WSGI that processes Web service requests. This includes performing database queries, assembling and filtering spatial data, and formatting data and metadata into and out of HDF5 files. We use parallel Cython to accelerate the most compute-intensive tasks.

Figure 10: The performance of the cutout Web service that extracts three-dimensional subvolumes from the kasthuri11 image database: (a) maximum, (b) by cutout size, and (c) by cutout size (log-log scale).

Cython compiles Python code into the C language so that it executes without the overhead of interpretation. Parallel Cython activates OpenMP multicore parallelism within Cython routines. The compute-intensive routines that we accelerate with Cython operate against every voxel in a cutout. Examples include (1) false coloring annotations for image overlays, in which every 32-bit annotation is mapped to an RGBA color in an output image buffer, and (2) filtering out annotations that do not match a specified criterion, e.g., to find all synapses in a region, one queries the metadata to get a list of synapse ids and then filters all non-matching identifiers out of the cutout.
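The NumPy equivalent of the second routine is a one-line mask; the parallel Cython version iterates the voxels explicitly but computes the same result.

import numpy as np

def filter_annotations(cutout, keep_ids):
    # Zero out every voxel whose identifier is not in keep_ids, e.g. the
    # synapse ids returned by a metadata query (zero = unlabeled, assumed).
    mask = np.isin(cutout, np.asarray(list(keep_ids), dtype=cutout.dtype))
    return np.where(mask, cutout, 0)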

5. PRELIMINARY PERFORMANCE

We conducted experiments against the live Web services to provide an initial view of performance. The major conclusion is that memory and I/O performance both limit throughput. The process of array slicing and assembly for cutout requests keeps all processors fully utilized reorganizing data in memory. We had not witnessed the memory bottleneck prior to this study because OCP serves data over a 1 Gbps Ethernet switch and a 1 Gbps Internet uplink, which we saturate long before other cluster resources. As we migrate to the Data-Scope cluster (Section 4.1), we will upgrade to a 40 Gbps Internet2 uplink and more compute-capable nodes, which will make memory an even more critical resource.

Experiments were conducted on an OCP Database node: a Dell R710 with two Intel Xeon E5630 quad-core processors, 64 GB of memory, and a Dell H700 RAID controller managing a RAID-6 array of 11 3 TB SATA drives for a 27 TB file system. All Web service requests are initiated on the same node and use the localhost interface, allowing us to test the server beyond the 1 Gbps network limit. Experiments report the cutout size, not the size of the data transferred or read from disk. Each cuboid is compressed on disk, read, and uncompressed. The data are then packaged into a cutout, which is compressed prior to data transfer. The cutout size is the amount of data that the server must handle in memory. The EM mouse brain data in question have high entropy and tend to compress by less than 10%. The annotation data are highly compressible.

Cutout throughput is the principal performance measure for our system, dictating how much data the service provides to parallel computer vision workflows.

Figure 11: Throughput of 256 MB cutout requests to kasthuri11 as a function of the number of concurrent requests.

Figure 10(a) shows the maximum throughput achieved over all configurations when issuing 16 parallel cutout requests. When data are in memory and cutout requests are aligned to cuboid boundaries (aligned memory), the system performs no I/O and minimally rearranges data. Processing in the application stack bounds performance. In this case, a single node realizes a peak throughput of more than 173 MB/s for the largest transfers. Cutouts to random offsets aligned to cuboid boundaries add I/O costs to each request and bring performance down to a peak of 121 MB/s. Unaligned cutouts require data to be reorganized in memory, moving byte ranges that are unaligned with the cache hierarchy. This incurs further performance penalties, and peak throughput drops to only 61 MB/s. The I/O cost of these requests is only marginally more than aligned cutouts, rounding each dimension up to the next cuboid. Unaligned cutouts reveal the dominance of memory performance.

Scalability results reveal fixed costs in both Web-service invocation and I/O that become amortized for larger cutouts. Figures 10(b) and 10(c) show the throughput as a function of cutout size in normal and log-log scale. The experiments use 16 parallel requests each. Performance scales nearly linearly until 256 KB for reads (I/O cost) and almost 1 MB for in-cache requests (Web-service costs).

Figure 12: Throughput of writing annotations as a function of the size of the annotated region.

Beyond this point, performance continues to increase, albeit more slowly. For aligned and unaligned reads, we attribute the continued increase to the Morton-order space-filling curve: larger cutouts intersect larger aligned regions of the Morton-order curve, producing larger contiguous I/Os [23]. We do not report performance above a 256 MB cutout size. Beyond this point, the buffers needed by the application and Web server exceed memory capacity.

To realize peak throughput, we had to initiate multiple requests in parallel. The application stack runs each Web-service request on a single process thread, and memory limits the per-thread performance. Figure 11 shows throughput as a function of the number of parallel requests. Throughput scales with the number of parallel requests beyond the eight physical cores of the machine: to 16 when reading data from disk and to 32 when reading from memory. Performance scales beyond the 8 cores owing to some combination of overlapping I/O with computation and hyperthreading. Too much parallelism eventually leads to a reduction in throughput, because all hardware resources are fully utilized and more parallelism introduces more resource contention.

Parallelism is key to performance in the OCP Data Cluster, and the easiest way to realize it is through initiating concurrent Web-service requests. Parallelizing individual cutouts would benefit the performance of an individual request, but not overall system throughput, because multiple requests already consume the memory resources of all processors. We note that the Numpy Python library we use for array manipulation implements parallel operators using BLAS, but does not parallelize array slicing.

Figure 12 shows the write throughput as a function of the volume size when uploading annotated volumes. Annotations differ from cutouts in that the data are 32 bits per voxel and are highly compressible. For this experiment, we uploaded a region of dense manual annotations of the kasthuri11 data set in which more than 90% of voxels are labeled. Data compress to 6% of the original size. However, performance scales with the uncompressed size of the region, because that dictates how many cuboids must be manipulated in memory. This experiment uses 16 parallel requests. Write performance scales well up to 2 MB cutouts and is faster than read at the same cutout size because the data compress much better. However, beyond 2 MB transfers, write throughput collapses. The best performance over all configurations of 19 MB/s does not compare well with the 121 MB/s when reading image cutouts.

Figure 13: Performance comparison of Database nodes and SSD nodes when writing synapses (small random writes).

Updating a volume of annotations is much more complex than a cutout. It (1) reads the previous annotations, (2) applies the new annotations to the volume database, resolving conflicts on a per-voxel basis, (3) writes back the volume database, (4) reads index entries for all new annotations, (5) updates each list by unioning new and old cuboid locations, and (6) writes back the index. I/O is doubled, and index maintenance costs are added on top. The critical performance issue is updating the spatial index. Parallel writes to the spatial index result in transaction retries and timeouts in MySQL due to contention. Often a single annotation volume will result in the update of hundreds of index entries, one for each unique annotation identifier in the volume. We can scale to larger writes at the expense of using fewer parallel writers. Improving annotation throughput is a high priority for the OCP Data Cluster.

We also include a comparison of small write performance between an SSD node and a Database node. The SSD node is a Dell R310 with an Intel Xeon 3430 with four 2.4 GHz cores, 16 GB of memory, and two OCZ Vertex 4 drives configured in a RAID 0 array. The experiment uploads all of the synapse annotations in the kasthuri11 data in random order, committing after each write. Figure 13 shows that the SSD node achieves more than 150% of the throughput of the Database array on small random writes. The number of synapse writes per second is surprisingly low: we write only about 6 RAMON objects per second. However, a single RAMON object write generates updates to three different metadata tables, the index, and the volume database. The takeaway from this experiment is that an inexpensive SSD node (<$3,000) offloads the write workload of an entire Database node (>$18,000). All I/O in this experiment is random. In practice, we achieve much higher write throughput because of locality and batching of requests. Our synapse finder workload uploaded more than 73 synapses a second per node, but that number reflects caching effects when updating many synapses in a small region.

6. RELATED WORK

The performance issues faced by the spatial databases in the OCP Data Cluster parallel those of SciDB [40, 43], including array slicing, selection queries, joins, and clustering/scale-out. SciDB might benefit OCP by making data distribution transparent and by offloading application functions implemented in Python to SciDB.

Many of the physical design principles of OCP follow the ArrayStore of Soroush [39]. They give a model for multi-level partitioning (chunking and tiling) that is more general than OCP cuboids. They can handle both regular and irregular data. ArrayStore extends decades of work on regular [6, 24] and irregular [5] tiling. The RasDaMan multidimensional array database [2] also builds spatial data services for image processing on top of a relational database. They too have explored the issues of tiling, data organization, and data distribution [9]. In contrast to general-purpose array databases, OCP has a focus on the specific application of parallel computer vision for neuroscience, and as such tunes its design to application concepts and properties. Examples include using the partitioning of space-filling curves in the data distribution function and indexing of neural objects.

OCP's design incorporates many techniques developed in the spatial data management community. Samet [35] wrote the authoritative text on the subject, which we have used as a reference for region quad-trees, space-filling curves, tessellations, and much more.

OCP represents the current state of evolution of the scale-out database architectures developed by the Institute for Data-Intensive Engineering and Science at Johns Hopkins. Gray and Szalay developed the progenitor system in the Sloan Digital Sky Survey [36]. This has led to more than 30 different data products over the past decade, including the JHU Turbulence Database Cluster [20] and the Life Under Your Feet soil ecology portal [25].

7. FINAL THOUGHTS

In less than two years, the Open Connectome Project has evolved from a problem statement to a data management and analysis platform for a community of neuroscientists with the singular goal of mapping the brain. High-throughput imaging instruments have driven data management off the workstation and out of the lab into data-intensive clusters and Web services. Also, the scope of the problem has forced experimental biologists to engage statisticians, systems engineers, machine learners, and big-data scientists. The Open Connectome Project aims to be the forum in which they all meet.

The OCP Data Cluster has built a rich set of features to meet the analysis needs of its users. Application-specific capabilities have been the development focus to date. The platform manages data sets from many imaging modalities (five to fifteen, depending on how one counts them).

Presently, our focus must change to throughput and scalability. The near future holds two 100-teravoxel data sets, each larger than the sum of all our data. We will move to a new home in the Data-Scope cluster, which will increase our outward-facing Internet bandwidth by a factor of forty. We also expect the Connectomics community to expand rapidly. The US President's Office recently announced a decade-long initiative to build a comprehensive map of the human brain [21]. This study reveals many limitations of the OCP data cluster and provides guidance to focus our efforts, specifically toward servicing many small writes to record brain structures and efficient and parallel manipulation of data to alleviate memory bottlenecks.

Acknowledgments

The authors would like to thank additional members of the Open Connectome Project that contributed to this work, including Disa Mhembere and Ayushi Sinha. We would also like to thank our collaborators Forrest Collman, Alberto Cardona, Cameron Craddock, Michael Milham, Partha Mitra, and Sebastian Seung.

8. ADDITIONAL AUTHORS

∗ Department of Computer Science and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
† Johns Hopkins University Applied Physics Laboratory
‡ Department of Statistical Science and Mathematics and the Institute for Brain Science, Duke University
§ Janelia Farm Research Campus, Howard Hughes Medical Institute
¶ Department of Molecular and Cellular Biology, Harvard University
‖ Department of Computational Neuroscience, Massachusetts Institute of Technology
∗∗ Allen Institute for Brain Science
†† Department of Physics and Astronomy and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
‡‡ Department of Bioengineering, Stanford University
§§ Department of Molecular and Cellular Physiology, Stanford University

9. REFERENCES

[1] D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006.
[2] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and B. Widmann. The multidimensional database system RasDaMan. In SIGMOD, 1998.
[3] D. D. Bock, W.-C. A. Lee, A. M. Kerlin, M. L. Andermann, G. Hood, A. W. Wetzel, S. Yurgenson, E. R. Soucy, H. S. Kim, and R. C. Reid. Network anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337), 2011.
[4] A. Cardona, S. Saalfeld, J. Schindelin, I. Arganda-Carreras, S. Preibisch, M. Longair, P. Tomancak, V. Hartenstein, and R. J. Douglas. TrakEM2 software for neural circuit reconstruction. PLoS ONE, 7(6), 2012.
[5] C. Chang, A. Acharya, A. Sussman, and J. Saltz. T2: a customizable parallel database for multi-dimensional data. SIGMOD Record, 27(1), 1998.
[6] C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz. Titan: A high-performance remote sensing database. In ICDE, 1997.
[7] F. Crow. Summed-area tables for texture mapping. In SIGGRAPH, 1984.
[8] R. T. Fielding and R. N. Taylor. Principled design of the modern Web architecture. ACM Transactions on Internet Technology, 2(2), 2002.
[9] P. Furtado and P. Baumann. Storage of multidimensional arrays based on arbitrary tiling. In ICDE, 1999.
[10] J. Gray and A. Szalay. Science in an exponential world. Nature, 440(23), 23 March 2006.
[11] A. Guttman. R-trees: a dynamic index structure for spatial searching. In ACM SIGMOD Conference, 1984.
[12] H. S. Seung et al. Eyewire. Available at eyewire.org, 2012.
[13] V. Jain, H. S. Seung, and S. C. Turaga. Machines that learn to segment images: a crucial technology for connectomics. Current Opinion in Neurobiology, 20(5), 2010.
[14] V. Jain, S. Turaga, K. Briggman, W. Denk, and S. Seung. Learning to agglomerate superpixel hierarchies. In Neural Information Processing Systems, 2011.
[15] K. Kanov, R. Burns, G. Eyink, C. Meneveau, and A. Szalay. Data-intensive spatial filtering in large numerical simulation datasets. In Supercomputing, 2012.
[16] N. Kasthuri and J. Lichtman. Untitled. In preparation, 2013.
[17] V. Kaynig, T. Fuchs, and J. M. Buhmann. Geometrical consistent 3d tracing of neuronal processes in ssTEM data. In MICCAI, 2010.
[18] M. Kazhdan and H. Hoppe. Streaming multigrid for gradient-domain operations on large images. ACM Transactions on Graphics, 27, 2008.
[19] D. M. Kleissas, W. R. Gray, J. M. Burck, J. T. Vogelstein, E. Perlman, P. M. Burlina, R. Burns, and R. J. Vogelstein. CAJAL3D: toward a fully automatic pipeline for connectome estimation from high-resolution EM data. In Neuroinformatics, 2012.
[20] Y. Li, E. Perlman, M. Wang, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. Journal of Turbulence, 9(31):1-29, 2008.
[21] J. Markoff. Obama seeking to boost study of human brain. New York Times, February 17, 2013.
[22] K. D. Micheva, B. Busse, N. C. Weiler, N. O'Rourke, and S. J. Smith. Single-synapse analysis of a diverse synapse population: Proteomic imaging methods and markers. Neuron, 68(1), 2010.
[23] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1), 2001.
[24] B. Moon and J. Saltz. Scalability analysis of declustering methods for multidimensional range queries. Transactions on Knowledge and Data Engineering, 2(10), 1998.
[25] R. A. Musaloiu-E., A. Terzis, K. Szlavecz, A. Szalay, J. Cogan, and J. Gray. Life under your feet: Wireless sensors in soil ecology. In Embedded Networked Sensors, 2006.
[26] M. Nielsen. Reinventing Discovery. Princeton University Press, 2011.
[27] A. Norton and J. Clyne. The VAPOR visualization application. In High Performance Visualization, 2012.
[28] N. O'Rourke, N. C. Weiler, K. D. Micheva, and S. J. Smith. Deep molecular diversity of mammalian synapses: why it matters and how to measure it. Nature Reviews Neuroscience, 13(1), 2012.
[29] S. Papadomanolakis, A. Ailamaki, J. C. Lopez, T. Tu, D. R. O'Hallaron, and G. Heber. Efficient query processing on unstructured tetrahedral meshes. In ACM SIGMOD, 2006.
[30] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
[31] A. V. Reina, M. Gelbart, D. Huang, J. Lichtman, E. L. Miller, and H. Pfister. Segmentation fusion for connectomics. In International Conference on Computer Vision, 2011.
[32] A. V. Reina, W.-K. Jeong, J. Lichtman, and H. Pfister. The connectome project: discovering the wiring of the brain. ACM Crossroads, 18(1):8-13, 2011.
[33] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI pipeline processing environment. Neuroimage, 19(3), 2003.
[34] S. Saalfeld, A. Cardona, V. Hartenstein, and P. Tomancak. CATMAID: collaborative annotation toolkit for massive amounts of image data. Bioinformatics, 25(15), 2009.
[35] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco, 2006.
[36] The Sloan Digital Sky Survey, 2013. Available at http://www.sdss.org.
[37] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In VLDB, 1987.
[38] C. Sommer, C. Straehle, U. Koethe, and F. A. Hamprecht. ilastik: Interactive learning and segmentation toolkit. In Biomedical Imaging, 2011.
[39] E. Soroush, M. Balazinska, and D. Wang. ArrayStore: A storage manager for complex parallel array processing. In SIGMOD, 2011.
[40] M. Stonebraker, J. Becla, D. DeWitt, K.-T. Lim, D. Maier, O. Ratzesberger, and S. Zdonik. Requirements for science data bases and SciDB. In Conference on Innovative Data Systems Research, 2009.
[41] A. S. Szalay, K. Church, C. Meneveau, A. Terzis, and S. Zeger. MRI: The development of Data-Scope, a multi-petabyte generic data analysis environment for science. Available at https://wiki.pha.jhu.edu/escience_wiki/images/7/7f/DataScope.pdf, 2012.
[42] F. Tauheed, L. Biveinis, T. Heinis, F. Schurmann, H. Markram, and A. Ailamaki. Accelerating range queries for brain simulations. In ICDE, 2012.
[43] The SciDB Development Team. Overview of SciDB: large scale array storage, processing and analysis. In SIGMOD, 2010.
[44] K. Wu. FastBit: Interactively searching massive data. Journal of Physics: Conference Series, 180(1), 2009.

  • 1 Introduction
  • 2 Data and Applications
  • 3 Data Model
    • 31 Physical Design
    • 32 Annotations
    • 33 Tiles
    • 34 Data Cleaning
      • 4 System Design
        • 41 Architecture
        • 42 Web Services
          • 5 Preliminary Performance
          • 6 Related Work
          • 7 Final Thoughts
          • 8 Additional Authors
          • 9 References
Page 5: The Open Connectome Project Data Cluster: Scalable ... · The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience Randal Burns William

Figure 7 OCP Data Cluster and clients as configured to run and visualize the parallel computer visionworkflow for synapse detection (Section 2)

We believe that we can provide performance similar to tilesin the cutout service through a combination of prefetchingand caching Doing so would eliminate the redundant stor-age of tiles Instead of doing a planar cutout for each tilewe could round the request up to the next cuboid and ma-terialize and cache all the nearby tiles either on the serveror in a distributed memory cache This is future work

34 Data CleaningWhile not a database technology we mention that we colorcorrect image data to eliminate exposure differences betweenimage planes The process solves a global Poisson equa-tion to minimize steep gradients in the low frequencies out-putting smoothed data High-frequencies from the originalimage are added back in to preserve the edges and struc-tures used in computer vision The process was derived fromwork by Kazhdan [18] The resulting color corrected dataare easier to interpret in orthogonal views (Figure 6) Wealso believe that they will improve the accuracy of computervision a hypothesis that we are exploring

4 SYSTEM DESIGNWe describe the OCP Data Cluster based on the configu-ration we used for running the synapse detector (Figure 7)because this demonstrates a parallel computer vision appli-cation The figure includes parallel vision pipelines usingWeb services and a visualization function run from a Webbrowser both connecting over the public Internet

41 ArchitectureThe OCP Data Cluster consists of heterogeneous nodes eachdesigned for a different function or workload These includeapplication servers database nodes file system nodes andSSD IO nodes The database nodes store all of the imageand annotation data for cutout (except for high-IO anno-tation projects on SSD IO nodes) We currently have twoDell R710s with dual Intel Xeon E5630 quad-core proces-sors 64GB of memory and Dell H700 RAID controllersThe twelve hot-plug drive bays hold a RAID-6 array of 112TB (resp 3TB) SATA drives and a hot-spare for an 18TB(resp 27TB) file system Two drives have failed and rebuiltfrom the hot spare automatically in the last 18 months Weuse MySQL for all database storage

Application servers perform all network and data manipula-tion except for database query processing This includespartitioning spatial data requests into multiple databasequeries and assembling rewriting and filtering the re-sults We currently deploy two Web-servers in a load-balancing proxy They run on the same physical hardware as

our database nodes because processor capabilities of thesenodes exceed the compute demand of the databases Thesefunctions are entirely separable

File server nodes store CATMAID image tiles for visualiza-tion image data streamed from the instruments over theInternet and other ldquoprojectrdquo data from our partners thatneeds to be ingested into OCP formats The nodes are de-signed for capacity and sequential read IO We have a DellR710 with less memory (16GB) and compute (one E5620processor) than the database nodes The machine has thesame IO subsystem with a 27TB file system We use twoData-Scope nodes (described below) as file servers that eachhave a 12TB software RAID 10 array built on 24 1TB SATAdisks

SSD IO nodes are designed for the random write workloadsgenerated by parallel computer visions algorithms The twonodes are Dell R310s with an Intel Xeon 3430 processorwith 4 24GHz cores Each machine has a striped (RAID0) volume on two OCZ Vertex4 solid states drives Thesystem realizes 20K IOPS of the theoretical hardware limitof 120K an improved IO controller is needed We deployedthe SSD IO nodes in response to our experience writing19M synapses in 3 days when running the parallel synapsefinding algorithm For synapse finding we had to throttlethe write rate to 50 concurrent outstanding requests to avoidoverloading the database nodes

Our growth plans involve moving more services into theData-Scope [41] a 90 node data-intensive cluster with a10PB ZFS file system We are currently using two machinesas file system nodes and have an unused allocation of 250TB on the shared file system Data-Scope nodes have 2 IntelXeon 5690 hex core processors 64 GB of RAM 3 Vertex3SSDs and 24 local SATA disks and thus serve all of ourcluster functions well

Data Distribution We place concurrent workloads on dis-tinct nodes in order to avoid IO interference This arisesin two applications Computer vision algorithms eg oursynapse detector reads large regions from the cutout ser-vice and performs many small writes We map cutouts toa database node and small writes to an SSD node Vi-sualization reads image data from either the tile stack orcutout database and overlays annotations Again anno-tation databases are placed on different nodes than imagedatabases Notably the cutouts and tile stack are not usedtogether and could be placed on the same node BecauseSSD storage is a limited resource OCP migrates databasesfrom SSD nodes to database nodes when they are no longer

actively being written This is an administrative action im-plemented with MySQLrsquos dump and restore utilities andmost often performed when we build the annotation reso-lution hierarchy (Section 32)

We shard large image data across multiple database nodesby partitioning the Morton-order space filling curve (Fig-ure 4) Currently we do this only for our largest dataset (bock11) for capacity reasons We have not yet founda performance benefit from sharding data For shardeddatabases the vast majority of cutout requests go to a sin-gle node We do note that multiple concurrent users of thesame data set would benefit from parallel access to the nodesof a sharded database Our sharding occurs at the applica-tion level The application is aware of the data distributionand redirects requests to the node that stores the data Weare in the process of evaluating multiple NoSQL systems inwhich sharding and scaleout are implemented in the storagesystem rather than the application We are also evaluatingSciDB [40 43] for use as an array store for spatial data

42 Web ServicesWeb-services provide rich object representations that bothcapture the output of computer vision algorithms and sup-port the spatial analysis of brain anatomy We have usedthe service to perform analysis tasks such as spatial densityestimation clustering and building distance distributionsAnalysis queries typically involve selecting a sets of objectsbased on metadata properties examining the spatial extentof these objects and computing statistics on spatial proper-ties such as distances or volumes For example one analysislooked at the distribution of the lengths of dendritic spinesskinny connections from dendrites to synapses in order tounderstand the region of influence of neural wiring TheOCP queries used metadata to find all synapses of a certaintype that connect to the selected dendrite and extractedvoxels for synapses and the object

All programming interfaces to the OCP Data Cluster useRESTful (REpresentational State Transfer) [8] interfacesthat are stateless uniform and cacheable Web-service in-vocations perform HTTP GETPUTDELETE requests tohuman readable URLs RESTrsquos simplicity makes it easy tointegrate services into many environments we have built ap-plications in Java CC++ Python Perl php and Matlab

We have selected HDF5 as our data interchange formatHDF5 is a scientific data model and format We prefer itto more common Web data formats such as JSon or XMLbecause of its support for multidimensional arrays and largedata sets

Cutout Providing efficient access to arbitrary sub-volumesof image data guides the design to the OCP Data SystemThe query which we call a cutout specifies a set of dimen-sions and ranges within each dimension in a URL GETsthe URL from the Web service which returns an HDF5file that contains a multidimensional array (Table 1) Everydatabase in OCP supports cutouts EM image databases re-turn arrays of 8-bit grayscale values annotation databasesreturn 32-bit annotation identifiers and we also support 16-bit (TIFF) and 32-bit (RBGA) image formats

Projects and Datasets A dataset configuration describesthe dimensions of spatial databases This includes the num-ber of dimensions (time series and channels) the size of eachdimensions and the number of resolutions in the hierarchyA project defines a specific database for a dataset includingthe project type (annotations or images) the storage config-uration (nodes and sharding) and properties such as doesthe database support exceptions and is the database read-only We often have tens of projects for a single dataset in-

Figure 8 A cutout of an annotation database (left)and the dense read of a single annotation (right)

cluding original data cleaned data and multiple annotationdatabases that describe different structures or use differentannotations algorithms

Object Representations Annotation databases combineall annotations of the same value (identifier) into an ob-ject associated with metadata and support spatial queriesagainst individual objects The service provides several dataoptions that allow the query to specify whether to retrievedata and in what format (Table 1) The default retrievesmetadata only from the databases tables that implementthe RAMON ontology A query may request a bounding-box around the annotation which queries a spatial indexbut does not access voxel data All the data may be re-trieved as a list of voxels or as a dense array by specify-ing cutout The dense data set has the dimensions of thebounding box Dense data may also be restricted to a spe-cific region by specifying ranges Data queries retrieve voxeldata from cuboids and then filter out all voxel labels thatdo not match the requested annotation

We provide both sparse (voxels lists) and dense (cutout)data interfaces to provide different performance optionswhen retrieving spatial data for an object The best choicedepends upon both the data and the network environmentAt the server it is always faster to compute the dense cutoutThe server reads cuboids from disk and filter the data inplace in the read buffer To compute voxel lists the match-ing voxels locations are written to another array Howevermany neural structures have large spatial extent and areextremely sparse For these the voxel representations aremuch smaller On WAN and Internet connections the re-duced network transfer time dominates additional computetime For example the dendrite 13 in kasthuri11 set com-prises 8 million voxels in a bounding box of more than 19trillion voxels ie less that 04 of voxels are in the annota-tion Other neural structures such as synapses are compactand dense interfaces always perform better

To write an annotation clients make an HTTP PUT requestto a project that includes an HDF5 file All writes use thesame base URL because the HDF5 file specifies the anno-tation identifier or gives no identifier causing the server tochoose a unique identifier for a new object The data optionsspecify the write discipline for voxels in the current anno-tation that are already labeled in the database overwritereplaces prior labels preserve keeps prior labels and ex-ception keeps the prior label and marks the new label as anexception Other data options include update when mod-ifying an existing annotation and dataonly to write voxellabels without changing metadata

Batch Interfaces OCP provides interfaces to read or writemultiple annotations at once in order to amortize the fixedcosts of making a Web service request over multiple writesand reads Batching requests is particularly important whencreating neural structures with little or no voxel data inwhich the Web service invocation dominates IO costs tothe database This was the cases with our synapse finder(Section 2) in which we doubled throughput by batching 40writes at a time HDF5 supports a directory structure that

3-d image cutout httpopenconnectometokenhdf5resolutionx-rangey-rangez-range 5123 at offset (1024 1024 1024) resolution 4 httpopenconnectomebock11hdf54512102451210245121024

Read an annotation httpopenconnectometokenidentifierdata options and voxel list for identifier 75 httpopenconnectomeannoproj75voxels and bounding box httpopenconnectomeannoproj75boundingbox and cutout restricted to a region httpopenconnectomeannoproj75cutout210002000100020001020

Write an annotation httpopenconnectometokendata optionsBatch read httpopenconnectomeannproj100010011002

Predicate query (find all synapses) httpopenconnectomeannoprojobjectstypesynapse

Table 1 RESTful interfaces to the OCP Cutout and Annotation Web services Italics indicate argumentsand parameters annoproj is an example annotation project

we use to encode multiple RAMON objects placing eachobject in its own directory by annotation identifier

Querying Metadata OCP provides a keyvalue queryinterface to object metadata The queries extract a listof objects that match a predicate against metadata Wecurrently allow equality queries against integers enumera-tions strings and user-defined keyvalue pairs and rangequeries against floating points values Although a limitedinterface these queries are simple and expressive for cer-tain tasks The objects Web service (Table 1) requests oneor more metadata fields and values (or metadata field in-equality operator and value for floating point fields) andthe service returns a list of matching annotation identi-fiers As an example we use the url openconnectomeobjectstypesynapseconfidencegeq099 to visualizethe highest confidence objects found by the synapse detec-tion pipeline

Although we do not currently provide full relational accessto databases we will provide SQL access for more complexqueries and move power users to SQL To date Web serviceshave been sufficient for computer vision pipelines But wesee no reason to re-implement an SQL-equivalent parser andlanguage through Web Services

Spatial Queries and Indexing Objects Critical to spa-tial analysis of annotation databases are two queries (1)what objects are in a region and (2) what voxels comprisean object On these primitives OCP applications buildmore complex queries such as nearest neighbors cluster-ing and spatial distributions OCPrsquos database design is wellsuited to identifying the annotations in a region The serviceperforms a cutout of the region and extracts all unique ele-ments in that region In fact the Numpy library in Pythonprovides a built in function to identify unique elements thatis implemented in C Locating the spatial extent of an objectrequires an additional index

OCP uses a simple sparse indexing technique that supportsbatch IO to locate the spatial extent of an object its voxelsand bounding box The index comprises a database tablethat enumerates the list of cuboids that contain voxels foreach annotation identifier (Figure 9) The list itself is aBLOB that contains a Python array Although this index isnot particularly compact it has several desirable properties

Adding to the index is efficient It is a batch operationand uses append only IO When writing an annotation ina cuboid we determine that the annotation is new to thecuboid based on the current contents and add the cuboidrsquosMorton-order location to a temporary list After all cuboidshave been updated a single write transaction appends allnew cuboids to the list as a batch The annotation pro-cess be it manual of machine vision tends to create newobjects and only rarely deletes or prunes existing objectsThe workload suits an append-mostly physical design

Figure 9 The sparse index for an object (green) isa list of the Morton-order location of the cuboidsthat contain voxels for that annotation The indexdescribes the disk blocks that contain voxels for thatobject which can be read in a single pass

Retrieving an object allows for batch IO and retrieves alldata in a single sequential pass The query retrieves the listof cuboids and sorts then by Morton-order locations Thenall cuboids are requested as a single query Cuboids arelaid out in increasing Morton order on disk and thus areretrieved a single sequential pass over the data

We chose this design over existing spatial indexing tech-niques because it is particularly well suited to neuroscienceobjects which are sparse and have large spatial extent ForOCP data indexes that rely on bounding boxes either growvery large (R-Trees [11]) or have inefficient search (R+-Trees[37]) Informally neural objects are long and skinny andthere are many in each region so that bounding boxes inter-sect and overlap pathologically An alternative is to use di-rected local search techniques that identify objects by a cen-troid and locate the object by searching nearby data [29 42]For OCP data these techniques produce much more com-pact indexes that are faster to maintain However query-ing the index requires many IOs to conduct an interactivesearch and we prefer lookup performance to maintenancecosts We plan to quantify and evaluate these informal com-parisons in the future

Software Application servers run a DjangoPython stackintegrated with Apache2 using WSGI that processes Webservice requests This includes performing database queriesassembling and filtering spatial data and formatting dataand metadata into and out of HDF5 files We uses paral-lel Cython to accelerate the most compute intensive tasks

(a) Maximum (b) By cutout size (c) By cutout size (loglog scale)

Figure 10 The performance of the cutout Web-service that extracts three-dimensional subvolumes from thekasthuri11 image database

Cython compiles Python code into the C language so thatit executes without the overhead of interpretation Paral-lel Cython activates OpenMP multicore parallelism withinCython routines The compute intensive routines that weaccelerate with Cython operate against every voxel in acutout Examples include (1) false coloring annotations forimage overlays in which every 32-bit annotation is mappedto an RGBA color in an output image buffer and (2) filter-ing out annotations that do not match a specified criteriaeg to find all synapses in a region one queries the metadatato get a list of synapse ids and then filters all non-matchingidentifiers out of the cutout

5 PRELIMINARY PERFORMANCEWe conducted experiments against the live Web-services toprovide an initial view of performance The major con-clusion is that memory and IO performance both limitthroughput The process of array slicing and assembly forcutout requests keeps all processors fully utilized reorganiz-ing data in memory We had not witnessed the memorybottleneck prior to this study because OCP serves data overa 1 Gbps Ethernet switch and a 1 Gbps Internet uplinkwhich we saturate long before other cluster resources Aswe migrate to the Data-Scope cluster (Section 41) we willupgrade to a 40 Gbps Internet2 uplink and more computecapable nodes which will make memory an even more crit-ical resource

Experiments were conducted on an OCP Database node aDell R710 with two Intel Xeon E5630 quad-core processors64GB of memory and a Dell H700 RAID controller manag-ing a RAID-6 array of 11 3TB SATA drives for a 27TB filesystem All Web service requests are initiated on the samenode and the use the localhost interface allowing us to testthe server beyond the 1 Gbps network limit Experimentsshow the cutout size not the size of the data transferred orread from disk Each cuboid is compressed on disk read anduncompressed The data are then packaged into a cutoutwhich is compressed prior to data transfer The cutout sizeis the amount of data that the server must handle in mem-ory The EM mouse brain data in question has high entropyand tends to compress by less than 10 The annotationdata are highly compressible

Cutout throughput is the principal performance measurefor our system dictating how much data the service pro-vides to parallel computer vision workflows Figure 10(a)

Figure 11 Throughput of 256MB cutout requests tokasthuri11 as a function of the number of concurrentrequests

shows the maximum throughput achieved over all configu-rations when issuing 16 parallel cutout requests When dataare in memory and cutout requests are aligned to cuboidboundaries (aligned memory) the system performs no IOand minimally rearranges data Processing in the appli-cation stack bounds performance In this case a singlenode realizes a peak throughput of more than 173 MBsfor the largest transfers Cutouts to random offsets alignedto cuboid boundaries add IO costs to each request andbring performance down to a peak of 121 MBs Unalignedcutouts require data to be reorganized in memory movingbyte ranges that are unaligned with the cache hierarchyThis incurs further performance penalties and peak through-put drops to only 61 MBs The IO cost of these requestsis only marginally more than aligned cutouts rounding eachdimension up to the next cuboid Unaligned cutouts revealsthe dominance of memory performance

Scalability results reveal fixed costs in both Web-service invocation and I/O that become amortized for larger cutouts. Figures 10(b) and 10(c) show throughput as a function of cutout size on linear and log-log scales; each experiment uses 16 parallel requests. Performance scales nearly linearly up to 256 KB for reads (I/O cost) and almost 1 MB for in-cache requests (Web-service costs). Beyond this point, performance continues to increase, albeit more slowly. For aligned and unaligned reads, we attribute the continued increase to the Morton-order space-filling curve: larger cutouts intersect larger aligned regions of the Morton-order curve, producing larger contiguous I/Os [23]. We do not report performance above a 256 MB cutout size; beyond this point, the buffers needed by the application and Web server exceed memory capacity.

Figure 12: Throughput of writing annotations as a function of the size of the annotated region.
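For readers unfamiliar with this layout, the following sketch shows the 3-d Morton (Z-order) encoding that determines cuboid placement on disk. Because nearby cuboids receive nearby keys, a large aligned cutout maps to long runs of consecutive keys and hence larger contiguous reads; the bit width here is arbitrary.

```python
def morton3(x, y, z, bits=21):
    """Interleave the bits of (x, y, z) into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)        # x occupies bit positions 0, 3, 6, ...
        key |= ((y >> i) & 1) << (3 * i + 1)    # y occupies bit positions 1, 4, 7, ...
        key |= ((z >> i) & 1) << (3 * i + 2)    # z occupies bit positions 2, 5, 8, ...
    return key

# Cuboids are stored in increasing Morton order, so reading an aligned region
# amounts to a sorted, largely sequential sweep over a compact range of keys.
keys = sorted(morton3(x, y, z) for z in range(2) for y in range(4) for x in range(4))
```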

To realize peak throughput, we had to initiate multiple requests in parallel. The application stack runs each Web-service request on a single process thread, and memory limits per-thread performance. Figure 11 shows throughput as a function of the number of parallel requests. Throughput scales with the number of parallel requests beyond the eight physical cores of the machine: to 16 requests when reading data from disk and to 32 when reading from memory. Performance scales beyond the eight cores owing to some combination of overlapping I/O with computation and hyperthreading. Too much parallelism eventually reduces throughput, because all hardware resources are fully utilized and more parallelism introduces more resource contention.

Parallelism is key to performance in the OCP Data Cluster, and the easiest way to realize it is by initiating concurrent Web-service requests. Parallelizing individual cutouts would benefit the performance of an individual request but not overall system throughput, because multiple requests already consume the memory resources of all processors. We note that the NumPy Python library we use for array manipulation implements parallel operators using BLAS but does not parallelize array slicing.
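A client-side sketch of this pattern follows: it issues cutout GETs concurrently from a thread pool. The base URL, token, resolution, and ranges are placeholders modeled on the cutout interface described in Section 4.2, not exact production paths.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

BASE = "http://openconnecto.me/ocp/ca"   # placeholder service root
TOKEN, RES = "kasthuri11", 1              # placeholder token and resolution

def fetch_cutout(ranges):
    """GET one cutout as an HDF5 blob; ranges are (x0, x1, y0, y1, z0, z1)."""
    x0, x1, y0, y1, z0, z1 = ranges
    url = f"{BASE}/{TOKEN}/hdf5/{RES}/{x0},{x1}/{y0},{y1}/{z0},{z1}/"
    with urllib.request.urlopen(url) as response:
        return response.read()

# Sixteen concurrent requests: per-request throughput is memory bound, so
# aggregate throughput comes from overlapping many Web-service calls.
slabs = [(0, 512, 0, 512, z, z + 16) for z in range(0, 256, 16)]
with ThreadPoolExecutor(max_workers=16) as pool:
    blobs = list(pool.map(fetch_cutout, slabs))
```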

Figure 12 shows write throughput as a function of volume size when uploading annotated volumes. Annotations differ from cutouts in that the data are 32 bits per voxel and highly compressible. For this experiment, we uploaded a region of dense manual annotations of the kasthuri11 data set in which more than 90% of voxels are labeled; the data compress to 6% of their original size. However, performance scales with the uncompressed size of the region, because that dictates how many cuboids must be manipulated in memory. This experiment uses 16 parallel requests. Write performance scales well up to 2 MB cutouts and is faster than reads at the same cutout size because the data compress much better. Beyond 2 MB transfers, however, write throughput collapses. The best performance over all configurations, 19 MB/s, does not compare well with the 121 MB/s achieved when reading image cutouts. Updating a volume of annotations is much more complex than a cutout. It (1) reads the previous annotations, (2) applies the new annotations to the volume database, resolving conflicts on a per-voxel basis, (3) writes back the volume database, (4) reads index entries for all new annotations, (5) updates each list by unioning new and old cuboid locations, and (6) writes back the index. I/O is doubled, and index-maintenance costs are added on top. The critical performance issue is updating the spatial index: parallel writes to the spatial index result in transaction retries and timeouts in MySQL due to contention. Often a single annotation volume results in updates to hundreds of index entries, one for each unique annotation identifier in the volume. We can scale to larger writes at the expense of using fewer parallel writers. Improving annotation throughput is a high priority for the OCP Data Cluster.

Figure 13: Performance comparison of Database nodes and SSD nodes when writing synapses (small random writes).
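The six-step update path can be sketched as follows. The storage helpers (read_cuboids, write_cuboids, read_index, write_index) are hypothetical stand-ins for the MySQL-backed volume and index tables, and only the overwrite and preserve write disciplines are shown.

```python
import numpy as np

def write_annotation_volume(db, corner, new_labels, overwrite=True):
    """Sketch of the annotation write path: read-modify-write the cuboids,
    then union the touched cuboid locations into each object's sparse index."""
    # (1)-(2) read the affected cuboids and resolve conflicts per voxel
    old_labels, cuboid_keys = db.read_cuboids(corner, new_labels.shape)
    merged = new_labels.copy()
    if not overwrite:
        both = (old_labels != 0) & (new_labels != 0)
        merged[both] = old_labels[both]            # 'preserve' keeps prior labels
    merged[merged == 0] = old_labels[merged == 0]  # never erase unrelated labels

    # (3) write back the volume database
    db.write_cuboids(corner, merged, cuboid_keys)

    # (4)-(6) update the sparse index: one entry per annotation identifier,
    # unioning the Morton-order locations of old and newly touched cuboids.
    for ann_id in np.unique(merged[merged != 0]):
        locations = set(db.read_index(ann_id))
        db.write_index(ann_id, sorted(locations | set(cuboid_keys)))
```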

We also include a comparison of small-write performance between an SSD node and a Database node. The SSD node is a Dell R310 with an Intel Xeon 3430 with four 2.4 GHz cores, 16 GB of memory, and two OCZ Vertex 4 drives configured in a RAID-0 array. The experiment uploads all of the synapse annotations in the kasthuri11 data in random order, committing after each write. Figure 13 shows that the SSD node achieves more than 150% of the throughput of the Database array on small random writes. The number of synapse writes per second is surprisingly low: we write only about six RAMON objects per second. However, a single RAMON object write generates updates to three different metadata tables, the index, and the volume database. The takeaway from this experiment is that an inexpensive SSD node (<$3,000) offloads the write workload of an entire Database node (>$18,000). All I/O in this experiment is random. In practice, we achieve much higher write throughput because of locality and batching of requests. Our synapse-finder workload uploaded more than 73 synapses a second per node, but that number reflects caching effects when updating many synapses in a small region.

6. RELATED WORK

The performance issues faced by the spatial databases in the OCP Data Cluster parallel those of SciDB [40, 43], including array slicing, selection queries, joins, and clustering/scale-out. SciDB might benefit OCP by making data distribution transparent and by offloading application functions implemented in Python to SciDB.

Many of the physical design principles of OCP follow the ArrayStore of Soroush et al. [39]. They give a model for multi-level partitioning (chunking and tiling) that is more general than OCP cuboids and can handle both regular and irregular data. ArrayStore extends decades of work on regular [6, 24] and irregular [5] tiling. The RasDaMan multidimensional array database [2] also builds spatial data services for image processing on top of a relational database; they too have explored the issues of tiling, data organization, and data distribution [9]. In contrast to general-purpose array databases, OCP focuses on the specific application of parallel computer vision for neuroscience and, as such, tunes its design to application concepts and properties. Examples include using the partitioning of space-filling curves in the data distribution function and the indexing of neural objects.

OCP's design incorporates many techniques developed in the spatial data management community. Samet [35] wrote the authoritative text on the subject, which we have used as a reference for region quad-trees, space-filling curves, tessellations, and much more.

OCP represents the current state of evolution of the scale-out database architectures developed by the Institute for Data-Intensive Engineering and Science at Johns Hopkins. Gray and Szalay developed the progenitor system in the Sloan Digital Sky Survey [36]. This has led to more than 30 different data products over the past decade, including the JHU Turbulence Database Cluster [20] and the Life Under Your Feet soil-ecology portal [25].

7. FINAL THOUGHTS

In less than two years, the Open Connectome Project has evolved from a problem statement to a data management and analysis platform for a community of neuroscientists with the singular goal of mapping the brain. High-throughput imaging instruments have driven data management off the workstation and out of the lab, into data-intensive clusters and Web-services. The scope of the problem has also forced experimental biologists to engage statisticians, systems engineers, machine learners, and big-data scientists. The Open Connectome Project aims to be the forum in which they all meet.

The OCP Data Cluster has built a rich set of features to meet the analysis needs of its users, and application-specific capabilities have been the development focus to date. The platform manages data sets from many imaging modalities (five to fifteen, depending on how one counts them).

Presently, our focus must change to throughput and scalability. The near future holds two 100-teravoxel data sets, each larger than the sum of all our data. We will move to a new home in the Data-Scope cluster, which will increase our outward-facing Internet bandwidth by a factor of forty. We also expect the connectomics community to expand rapidly: the US President's Office recently announced a decade-long initiative to build a comprehensive map of the human brain [21]. This study reveals many limitations of the OCP Data Cluster and provides guidance to focus our efforts, specifically toward servicing many small writes to record brain structures and toward efficient, parallel manipulation of data to alleviate memory bottlenecks.

Acknowledgments

The authors would like to thank additional members of the Open Connectome Project who contributed to this work, including Disa Mhembere and Ayushi Sinha. We would also like to thank our collaborators Forrest Collman, Alberto Cardona, Cameron Craddock, Michael Milham, Partha Mitra, and Sebastian Seung.

8. ADDITIONAL AUTHORS

∗ Department of Computer Science and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
† Johns Hopkins University Applied Physics Laboratory
‡ Department of Statistical Science and Mathematics and the Institute for Brain Science, Duke University
§ Janelia Farm Research Campus, Howard Hughes Medical Institute
¶ Department of Molecular and Cellular Biology, Harvard University
‖ Department of Computational Neuroscience, Massachusetts Institute of Technology
∗∗ Allen Institute for Brain Science
†† Department of Physics and Astronomy and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
‡‡ Department of Bioengineering, Stanford University
§§ Department of Molecular and Cellular Physiology, Stanford University

9. REFERENCES

[1] D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006.

[2] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and N. Widmann. The multidimensional database system RasDaMan. In SIGMOD, 1998.

[3] D. D. Bock, W.-C. A. Lee, A. M. Kerlin, M. L. Andermann, G. Hood, A. W. Wetzel, S. Yurgenson, E. R. Soucy, H. S. Kim, and R. C. Reid. Network anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337), 2011.

[4] A. Cardona, S. Saalfeld, J. Schindelin, I. Arganda-Carreras, S. Preibisch, M. Longair, P. Tomancak, V. Hartenstein, and R. J. Douglas. TrakEM2 software for neural circuit reconstruction. PLoS ONE, 7(6), 2012.

[5] C. Chang, A. Acharya, A. Sussman, and J. Saltz. T2: A customizable parallel database for multi-dimensional data. SIGMOD Record, 27(1), 1998.

[6] C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz. Titan: A high-performance remote sensing database. In ICDE, 1997.

[7] F. Crow. Summed-area tables for texture mapping. In SIGGRAPH, 1984.

[8] R. T. Fielding and R. N. Taylor. Principled design of the modern Web architecture. ACM Transactions on Internet Technology, 2(2), 2002.

[9] P. Furtado and P. Baumann. Storage of multidimensional arrays based on arbitrary tiling. In ICDE, 1999.

[10] J. Gray and A. Szalay. Science in an exponential world. Nature, 440(23), 23 March 2006.

[11] A. Guttman. R-trees: A dynamic index structure for spatial searching. In ACM SIGMOD Conference, 1984.

[12] H. S. Seung et al. Eyewire. Available at eyewire.org, 2012.

[13] V. Jain, H. S. Seung, and S. C. Turaga. Machines that learn to segment images: A crucial technology for connectomics. Current Opinion in Neurobiology, 20(5), 2010.

[14] V. Jain, S. Turaga, K. Briggman, W. Denk, and S. Seung. Learning to agglomerate superpixel hierarchies. In Neural Information Processing Systems, 2011.

[15] K. Kanov, R. Burns, G. Eyink, C. Meneveau, and A. Szalay. Data-intensive spatial filtering in large numerical simulation datasets. In Supercomputing, 2012.

[16] N. Kasthuri and J. Lichtman. Untitled. In preparation, 2013.

[17] V. Kaynig, T. Fuchs, and J. M. Buhmann. Geometrical consistent 3d tracing of neuronal processes in ssTEM data. In MICCAI, 2010.

[18] M. Kazhdan and H. Hoppe. Streaming multigrid for gradient-domain operations on large images. ACM Transactions on Graphics, 27, 2008.

[19] D. M. Kleissas, W. R. Gray, J. M. Burck, J. T. Vogelstein, E. Perlman, P. M. Burlina, R. Burns, and R. J. Vogelstein. CAJAL3D: Toward a fully automatic pipeline for connectome estimation from high-resolution EM data. In Neuroinformatics, 2012.

[20] Y. Li, E. Perlman, M. Wang, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. Journal of Turbulence, 9(31):1–29, 2008.

[21] J. Markoff. Obama seeking to boost study of human brain. New York Times, February 17, 2013.

[22] K. D. Micheva, B. Busse, N. C. Weiler, N. O'Rourke, and S. J. Smith. Single-synapse analysis of a diverse synapse population: Proteomic imaging methods and markers. Neuron, 68(1), 2010.

[23] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1), 2001.

[24] B. Moon and J. Saltz. Scalability analysis of declustering methods for multidimensional range queries. Transactions on Knowledge and Data Engineering, 2(10), 1998.

[25] R. A. Musaloiu-E., A. Terzis, K. Szlavecz, A. Szalay, J. Cogan, and J. Gray. Life under your feet: Wireless sensors in soil ecology. In Embedded Networked Sensors, 2006.

[26] M. Nielsen. Reinventing Discovery. Princeton University Press, 2011.

[27] A. Norton and J. Clyne. The VAPOR visualization application. In High Performance Visualization, 2012.

[28] N. O'Rourke, N. C. Weiler, K. D. Micheva, and S. J. Smith. Deep molecular diversity of mammalian synapses: Why it matters and how to measure it. Nature Reviews Neuroscience, 13(1), 2012.

[29] S. Papadomanolakis, A. Ailamaki, J. C. Lopez, T. Tu, D. R. O'Hallaron, and G. Heber. Efficient query processing on unstructured tetrahedral meshes. In ACM SIGMOD, 2006.

[30] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.

[31] A. V. Reina, M. Gelbart, D. Huang, J. Lichtman, E. L. Miller, and H. Pfister. Segmentation fusion for connectomics. In International Conference on Computer Vision, 2011.

[32] A. V. Reina, W.-K. Jeong, J. Lichtman, and H. Pfister. The connectome project: Discovering the wiring of the brain. ACM Crossroads, 18(1):8–13, 2011.

[33] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI pipeline processing environment. Neuroimage, 19(3), 2003.

[34] S. Saalfeld, A. Cardona, V. Hartenstein, and P. Tomancak. CATMAID: Collaborative annotation toolkit for massive amounts of image data. Bioinformatics, 25(15), 2009.

[35] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco, 2006.

[36] The Sloan Digital Sky Survey, 2013. Available at http://www.sdss.org.

[37] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In VLDB, 1987.

[38] C. Sommer, C. Straehle, U. Koethe, and F. A. Hamprecht. ilastik: Interactive learning and segmentation toolkit. In Biomedical Imaging, 2011.

[39] E. Soroush, M. Balazinska, and D. Wang. ArrayStore: A storage manager for complex parallel array processing. In SIGMOD, 2011.

[40] M. Stonebraker, J. Becla, D. DeWitt, K.-T. Lim, D. Maier, O. Ratzesberger, and S. Zdonik. Requirements for science data bases and SciDB. In Conference on Innovative Data Systems Research, 2009.

[41] A. S. Szalay, K. Church, C. Meneveau, A. Terzis, and S. Zeger. MRI: The development of Data-Scope – a multi-petabyte generic data analysis environment for science. Available at https://wiki.pha.jhu.edu/escience_wiki/images/7/7f/DataScope.pdf, 2012.

[42] F. Tauheed, L. Biveinis, T. Heinis, F. Schurmann, H. Markram, and A. Ailamaki. Accelerating range queries for brain simulations. In ICDE, 2012.

[43] The SciDB Development Team. Overview of SciDB: Large scale array storage, processing and analysis. In SIGMOD, 2010.

[44] K. Wu. FastBit: Interactively searching massive data. Journal of Physics: Conference Series, 180(1), 2009.

(a) Maximum (b) By cutout size (c) By cutout size (loglog scale)

Figure 10 The performance of the cutout Web-service that extracts three-dimensional subvolumes from thekasthuri11 image database

Cython compiles Python code into the C language so thatit executes without the overhead of interpretation Paral-lel Cython activates OpenMP multicore parallelism withinCython routines The compute intensive routines that weaccelerate with Cython operate against every voxel in acutout Examples include (1) false coloring annotations forimage overlays in which every 32-bit annotation is mappedto an RGBA color in an output image buffer and (2) filter-ing out annotations that do not match a specified criteriaeg to find all synapses in a region one queries the metadatato get a list of synapse ids and then filters all non-matchingidentifiers out of the cutout

5 PRELIMINARY PERFORMANCEWe conducted experiments against the live Web-services toprovide an initial view of performance The major con-clusion is that memory and IO performance both limitthroughput The process of array slicing and assembly forcutout requests keeps all processors fully utilized reorganiz-ing data in memory We had not witnessed the memorybottleneck prior to this study because OCP serves data overa 1 Gbps Ethernet switch and a 1 Gbps Internet uplinkwhich we saturate long before other cluster resources Aswe migrate to the Data-Scope cluster (Section 41) we willupgrade to a 40 Gbps Internet2 uplink and more computecapable nodes which will make memory an even more crit-ical resource

Experiments were conducted on an OCP Database node aDell R710 with two Intel Xeon E5630 quad-core processors64GB of memory and a Dell H700 RAID controller manag-ing a RAID-6 array of 11 3TB SATA drives for a 27TB filesystem All Web service requests are initiated on the samenode and the use the localhost interface allowing us to testthe server beyond the 1 Gbps network limit Experimentsshow the cutout size not the size of the data transferred orread from disk Each cuboid is compressed on disk read anduncompressed The data are then packaged into a cutoutwhich is compressed prior to data transfer The cutout sizeis the amount of data that the server must handle in mem-ory The EM mouse brain data in question has high entropyand tends to compress by less than 10 The annotationdata are highly compressible

Cutout throughput is the principal performance measurefor our system dictating how much data the service pro-vides to parallel computer vision workflows Figure 10(a)

Figure 11 Throughput of 256MB cutout requests tokasthuri11 as a function of the number of concurrentrequests

shows the maximum throughput achieved over all configu-rations when issuing 16 parallel cutout requests When dataare in memory and cutout requests are aligned to cuboidboundaries (aligned memory) the system performs no IOand minimally rearranges data Processing in the appli-cation stack bounds performance In this case a singlenode realizes a peak throughput of more than 173 MBsfor the largest transfers Cutouts to random offsets alignedto cuboid boundaries add IO costs to each request andbring performance down to a peak of 121 MBs Unalignedcutouts require data to be reorganized in memory movingbyte ranges that are unaligned with the cache hierarchyThis incurs further performance penalties and peak through-put drops to only 61 MBs The IO cost of these requestsis only marginally more than aligned cutouts rounding eachdimension up to the next cuboid Unaligned cutouts revealsthe dominance of memory performance

Figure 12: Throughput of writing annotations as a function of the size of the annotated region.

Scalability results reveal fixed costs in both Web-service invocation and I/O that become amortized for larger cutouts. Figures 10(b) and 10(c) show the throughput as a function of cutout size in normal and log-log scale. The experiments use 16 parallel requests each. Performance scales nearly linearly until 256 KB for reads (I/O cost) and almost 1 MB for in-cache (Web-service costs). Beyond this point, performance continues to increase, albeit more slowly. For aligned and unaligned reads, we attribute the continued increase to the Morton-order space-filling curve. Larger cutouts intersect larger aligned regions of the Morton-order curve, producing larger contiguous I/Os [23]. We do not report performance above a 256 MB cutout size. Beyond this point, the buffers needed by the application and Web server exceed memory capacity.
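The contiguity argument can be made concrete with a minimal Morton (Z-order) encoding sketch; the bit widths and region sizes below are illustrative and do not reflect the production cuboid dimensions:

def morton3(x, y, z, bits=10):
    # Interleave the low `bits` bits of x, y, z into a Morton (Z-order) code.
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# A region aligned to a power-of-two boundary occupies one contiguous run of
# Morton codes, so an aligned cutout becomes a few large sequential reads.
aligned = sorted(morton3(x, y, z) for x in range(4) for y in range(4) for z in range(4))
print(aligned == list(range(aligned[0], aligned[0] + 64)))   # True: contiguous

# The same-sized region shifted off alignment scatters across the curve.
shifted = sorted(morton3(x, y, z) for x in range(2, 6) for y in range(2, 6) for z in range(2, 6))
print(shifted == list(range(shifted[0], shifted[0] + 64)))   # False: scattered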

To realize peak throughput, we had to initiate multiple requests in parallel. The application stack runs each Web-service request on a single process thread, and memory limits the per-thread performance. Figure 11 shows throughput as a function of the number of parallel requests. Throughput scales with the number of parallel requests beyond the eight physical cores of the machine: to 16 when reading data from disk and to 32 when reading from memory. Performance scales beyond the 8 cores owing to some combination of overlapping I/O with computation and hyperthreading. Too much parallelism eventually leads to a reduction in throughput because all hardware resources are fully utilized and more parallelism introduces more resource contention.

Parallelism is key to performance in the OCP Data Cluster, and the easiest way to realize it is through initiating concurrent Web-service requests. Parallelizing individual cutouts would benefit the performance of an individual request, but not overall system throughput, because multiple requests already consume the memory resources of all processors. We note that the NumPy Python library we use for array manipulation implements parallel operators using BLAS but does not parallelize array slicing.
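A client therefore obtains the best throughput by issuing several cutout requests at once. The sketch below uses only the Python standard library; the URL layout is hypothetical and stands in for the service's RESTful cutout interface:

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical cutout URL template: substitute the real RESTful cutout path
# for the token, resolution, and x/y/z ranges of interest.
BASE = "http://openconnecto.me/ocp/ca/kasthuri11/npz/1"

def cutout_url(x0, x1, y0, y1, z0, z1):
    return f"{BASE}/{x0},{x1}/{y0},{y1}/{z0},{z1}/"

def fetch(url):
    with urlopen(url) as resp:  # one Web-service request per worker thread
        return resp.read()

# 16 concurrent requests, mirroring the configuration used in the experiments.
urls = [cutout_url(x, x + 512, 0, 512, z, z + 16)
        for x in range(0, 2048, 512) for z in range(0, 64, 16)]
with ThreadPoolExecutor(max_workers=16) as pool:
    volumes = list(pool.map(fetch, urls))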

Figure 13: Performance comparison of Database nodes and SSD nodes when writing synapses (small random writes).

Figure 12 shows the write throughput as a function of the volume size when uploading annotated volumes. Annotations differ from cutouts in that the data are 32 bits per voxel and are highly compressible. For this experiment, we uploaded a region of dense manual annotations of the kasthuri11 data set in which more than 90% of voxels are labeled. Data compress to 6% of the original size. However, performance scales with the uncompressed size of the region because that dictates how many cuboids must be manipulated in memory. This experiment uses 16 parallel requests. Write performance scales well up to 2 MB cutouts and is faster than read at the same cutout size because the data compress much better. However, beyond 2 MB transfers, write throughput collapses. The best performance over all configurations of 19 MB/s does not compare well with the 121 MB/s when reading image cutouts. Updating a volume of annotations is much more complex than a cutout. It (1) reads the previous annotations, (2) applies the new annotations to the volume database, resolving conflicts on a per-voxel basis, (3) writes back the volume database, (4) reads index entries for all new annotations, (5) updates each list by unioning new and old cuboid locations, and (6) writes back the index. I/O is doubled, and index maintenance costs are added on top. The critical performance issue is updating the spatial index. Parallel writes to the spatial index result in transaction retries and timeouts in MySQL due to contention. Often a single annotation volume will result in the update of hundreds of index entries, one for each unique annotation identifier in the volume. We can scale to larger writes at the expense of using fewer parallel writers. Improving annotation throughput is a high priority for the OCP Data Cluster.
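A schematic of this write path, with hypothetical helper objects for the volume database and spatial index (not the OCP code itself), makes the doubled I/O and the index maintenance visible:

def write_annotation_volume(volume_db, index_db, region, new_labels):
    # Sketch of the six-step annotation write path described above.
    # Transactions and error handling are omitted.
    old_labels = volume_db.read(region)                    # (1) read previous annotations
    merged = [new if new != 0 else old                     # (2) resolve conflicts per voxel:
              for old, new in zip(old_labels, new_labels)] #     non-zero new labels win here
    volume_db.write(region, merged)                        # (3) write back the volume database
    touched = set(merged) - {0}
    entries = {ann_id: set(index_db.read(ann_id))          # (4) read index entries for all
               for ann_id in touched}                      #     identifiers in the write
    cuboids = set(region.cuboids())
    for ann_id, locations in entries.items():              # (5) union new and old cuboid locations
        index_db.write(ann_id, locations | cuboids)        # (6) write back the index (the
                                                           #     contended step under parallel writers)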

We also include a comparison of small-write performance between an SSD node and a Database node. The SSD node is a Dell R310 with an Intel Xeon 3430 with four 2.4 GHz cores, 16 GB of memory, and two OCZ Vertex 4 drives configured in a RAID-0 array. The experiment uploads all of the synapse annotations in the kasthuri11 data in random order, committing after each write. Figure 13 shows that the SSD node achieves more than 150% of the throughput of the Database array on small random writes. The number of synapse writes per second is surprisingly low: we write only about 6 RAMON objects per second. However, a single RAMON object write generates updates to three different metadata tables, the index, and the volume database. The takeaway from this experiment is that an inexpensive SSD node (<$3,000) offloads the write workload of an entire Database node (>$18,000). All I/O in this experiment is random. In practice, we achieve much higher write throughput because of locality and batching of requests. Our synapse finder workload uploaded more than 73 synapses a second per node, but that number reflects caching effects when updating many synapses in a small region.

6. RELATED WORK

The performance issues faced by the spatial databases in the OCP Data Cluster parallel those of SciDB [40, 43], including array slicing, selection queries, joins, and clustering/scale-out. SciDB might benefit OCP by making data distribution transparent and by offloading application functions implemented in Python to SciDB.

Many of the physical design principles of OCP follow the ArrayStore of Soroush [39]. They give a model for multi-level partitioning (chunking and tiling) that is more general than OCP cuboids. They can handle both regular and irregular data. ArrayStore extends decades of work on regular [6, 24] and irregular [5] tiling. The RasDaMan multidimensional array database [2] also builds spatial data services for image processing on top of a relational database. They too have explored the issues of tiling, data organization, and data distribution [9]. In contrast to general-purpose array databases, OCP has a focus on the specific application of parallel computer vision for neuroscience and, as such, tunes its design to application concepts and properties. Examples include using the partitioning of space-filling curves in the data distribution function and indexing of neural objects.

OCP's design incorporates many techniques developed in the spatial data management community. Samet [35] wrote the authoritative text on the subject, which we have used as a reference for region quad-trees, space-filling curves, tessellations, and much more.

OCP represents the current state of evolution of the scale-out database architectures developed by the Institute for Data-Intensive Engineering and Science at Johns Hopkins. Gray and Szalay developed the progenitor system in the Sloan Digital Sky Survey [36]. This has led to more than 30 different data products over the past decade, including the JHU Turbulence Database Cluster [20] and the Life Under Your Feet soil ecology portal [25].

7. FINAL THOUGHTS

In less than two years, the Open Connectome Project has evolved from a problem statement to a data management and analysis platform for a community of neuroscientists with the singular goal of mapping the brain. High-throughput imaging instruments have driven data management off the workstation and out of the lab, into data-intensive clusters and Web-services. Also, the scope of the problem has forced experimental biologists to engage statisticians, systems engineers, machine learners, and big-data scientists. The Open Connectome Project aims to be the forum in which they all meet.

The OCP Data Cluster has built a rich set of features to meet the analysis needs of its users. Application-specific capabilities have been the development focus to date. The platform manages data sets from many imaging modalities (five to fifteen, depending on how one counts them).

Presently, our focus must change to throughput and scalability. The near future holds two 100-teravoxel data sets, each larger than the sum of all our data. We will move to a new home in the Data-Scope cluster, which will increase our outward-facing Internet bandwidth by a factor of forty. We also expect the connectomics community to expand rapidly. The US President's Office recently announced a decade-long initiative to build a comprehensive map of the human brain [21]. This study reveals many limitations of the OCP data cluster and provides guidance to focus our efforts, specifically toward servicing many small writes to record brain structures and efficient, parallel manipulation of data to alleviate memory bottlenecks.

Acknowledgments

The authors would like to thank additional members of the Open Connectome Project that contributed to this work, including Disa Mhembere and Ayushi Sinha. We would also like to thank our collaborators Forrest Collman, Alberto Cardona, Cameron Craddock, Michael Milham, Partha Mitra, and Sebastian Seung.

8. ADDITIONAL AUTHORS

∗ Department of Computer Science and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
† Johns Hopkins University Applied Physics Laboratory
‡ Department of Statistical Science and Mathematics and the Institute for Brain Science, Duke University
§ Janelia Farm Research Campus, Howard Hughes Medical Institute
¶ Department of Molecular and Cellular Biology, Harvard University
‖ Department of Computational Neuroscience, Massachusetts Institute of Technology
∗∗ Allen Institute for Brain Science
†† Department of Physics and Astronomy and the Institute for Data Intensive Engineering and Science, Johns Hopkins University
‡‡ Department of Bioengineering, Stanford University
§§ Department of Molecular and Cellular Physiology, Stanford University

9. REFERENCES

[1] D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006.

[2] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and B. Widmann. The multidimensional database system RasDaMan. In SIGMOD, 1998.

[3] D. D. Bock, W.-C. A. Lee, A. M. Kerlin, M. L. Andermann, G. Hood, A. W. Wetzel, S. Yurgenson, E. R. Soucy, H. S. Kim, and R. C. Reid. Network anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337), 2011.

[4] A. Cardona, S. Saalfeld, J. Schindelin, I. Arganda-Carreras, S. Preibisch, M. Longair, P. Tomancak, V. Hartenstein, and R. J. Douglas. TrakEM2 software for neural circuit reconstruction. PLoS ONE, 7(6), 2012.

[5] C. Chang, A. Acharya, A. Sussman, and J. Saltz. T2: a customizable parallel database for multi-dimensional data. SIGMOD Record, 27(1), 1998.

[6] C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz. Titan: A high-performance remote sensing database. In ICDE, 1997.

[7] F. Crow. Summed-area tables for texture mapping. In SIGGRAPH, 1984.

[8] R. T. Fielding and R. N. Taylor. Principled design of the modern Web architecture. ACM Transactions on Internet Technology, 2(2), 2002.

[9] P. Furtado and P. Baumann. Storage of multidimensional arrays based on arbitrary tiling. In ICDE, 1999.

[10] J. Gray and A. Szalay. Science in an exponential world. Nature, 440(23), 23 March 2006.

[11] A. Guttman. R-trees: a dynamic index structure for spatial searching. In ACM SIGMOD Conference, 1984.

[12] H. S. Seung et al. Eyewire. Available at eyewire.org, 2012.

[13] V. Jain, H. S. Seung, and S. C. Turaga. Machines that learn to segment images: a crucial technology for connectomics. Current Opinion in Neurobiology, 20(5), 2010.

[14] V. Jain, S. Turaga, K. Briggman, W. Denk, and S. Seung. Learning to agglomerate superpixel hierarchies. In Neural Information Processing Systems, 2011.

[15] K. Kanov, R. Burns, G. Eyink, C. Meneveau, and A. Szalay. Data-intensive spatial filtering in large numerical simulation datasets. In Supercomputing, 2012.

[16] N. Kasthuri and J. Lichtman. Untitled. In preparation, 2013.

[17] V. Kaynig, T. Fuchs, and J. M. Buhmann. Geometrical consistent 3d tracing of neuronal processes in ssTEM data. In MICCAI, 2010.

[18] M. Kazhdan and H. Hoppe. Streaming multigrid for gradient-domain operations on large images. ACM Transactions on Graphics, 27, 2008.

[19] D. M. Kleissas, W. R. Gray, J. M. Burck, J. T. Vogelstein, E. Perlman, P. M. Burlina, R. Burns, and R. J. Vogelstein. CAJAL3D: toward a fully automatic pipeline for connectome estimation from high-resolution EM data. In Neuroinformatics, 2012.

[20] Y. Li, E. Perlman, M. Wang, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. Journal of Turbulence, 9(31):1–29, 2008.

[21] J. Markoff. Obama seeking to boost study of human brain. New York Times, February 17, 2013.

[22] K. D. Micheva, B. Busse, N. C. Weiler, N. O'Rourke, and S. J. Smith. Single-synapse analysis of a diverse synapse population: Proteomic imaging methods and markers. Neuron, 68(1), 2010.

[23] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1), 2001.

[24] B. Moon and J. Saltz. Scalability analysis of declustering methods for multidimensional range queries. Transactions on Knowledge and Data Engineering, 2(10), 1998.

[25] R. A. Musaloiu-E., A. Terzis, K. Szlavecz, A. Szalay, J. Cogan, and J. Gray. Life under your feet: Wireless sensors in soil ecology. In Embedded Networked Sensors, 2006.

[26] M. Nielsen. Reinventing Discovery. Princeton University Press, 2011.

[27] A. Norton and J. Clyne. The VAPOR visualization application. In High Performance Visualization, 2012.

[28] N. O'Rourke, N. C. Weiler, K. D. Micheva, and S. J. Smith. Deep molecular diversity of mammalian synapses: why it matters and how to measure it. Nature Reviews Neuroscience, 13(1), 2012.

[29] S. Papadomanolakis, A. Ailamaki, J. C. Lopez, T. Tu, D. R. O'Hallaron, and G. Heber. Efficient query processing on unstructured tetrahedral meshes. In ACM SIGMOD, 2006.

[30] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.

[31] A. V. Reina, M. Gelbart, D. Huang, J. Lichtman, E. L. Miller, and H. Pfister. Segmentation fusion for connectomics. In International Conference on Computer Vision, 2011.

[32] A. V. Reina, W.-K. Jeong, J. Lichtman, and H. Pfister. The connectome project: discovering the wiring of the brain. ACM Crossroads, 18(1):8–13, 2011.

[33] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI pipeline processing environment. Neuroimage, 19(3), 2003.

[34] S. Saalfeld, A. Cardona, V. Hartenstein, and P. Tomancak. CATMAID: collaborative annotation toolkit for massive amounts of image data. Bioinformatics, 25(15), 2009.

[35] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco, 2006.

[36] The Sloan Digital Sky Survey, 2013. Available at http://www.sdss.org.

[37] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: A dynamic index for multi-dimensional objects. In VLDB, 1987.

[38] C. Sommer, C. Straehle, U. Koethe, and F. A. Hamprecht. ilastik: Interactive learning and segmentation toolkit. In Biomedical Imaging, 2011.

[39] E. Soroush, M. Balazinska, and D. Wang. ArrayStore: A storage manager for complex parallel array processing. In SIGMOD, 2011.

[40] M. Stonebraker, J. Becla, D. DeWitt, K.-T. Lim, D. Maier, O. Ratzesberger, and S. Zdonik. Requirements for science data bases and SciDB. In Conference on Innovative Data Systems Research, 2009.

[41] A. S. Szalay, K. Church, C. Meneveau, A. Terzis, and S. Zeger. MRI: The Development of Data-Scope, a multi-petabyte generic data analysis environment for science. Available at https://wiki.pha.jhu.edu/escience_wiki/images/7/7f/DataScope.pdf, 2012.

[42] F. Tauheed, L. Biveinis, T. Heinis, F. Schurmann, H. Markram, and A. Ailamaki. Accelerating range queries for brain simulations. In ICDE, 2012.

[43] The SciDB Development Team. Overview of SciDB: large scale array storage, processing and analysis. In SIGMOD, 2010.

[44] K. Wu. FastBit: Interactively searching massive data. Journal of Physics: Conference Series, 180(1), 2009.

  • 1 Introduction
  • 2 Data and Applications
  • 3 Data Model
    • 31 Physical Design
    • 32 Annotations
    • 33 Tiles
    • 34 Data Cleaning
      • 4 System Design
        • 41 Architecture
        • 42 Web Services
          • 5 Preliminary Performance
          • 6 Related Work
          • 7 Final Thoughts
          • 8 Additional Authors
          • 9 References
Page 8: The Open Connectome Project Data Cluster: Scalable ... · The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience Randal Burns William

(a) Maximum (b) By cutout size (c) By cutout size (loglog scale)

Figure 10 The performance of the cutout Web-service that extracts three-dimensional subvolumes from thekasthuri11 image database

Cython compiles Python code into the C language so thatit executes without the overhead of interpretation Paral-lel Cython activates OpenMP multicore parallelism withinCython routines The compute intensive routines that weaccelerate with Cython operate against every voxel in acutout Examples include (1) false coloring annotations forimage overlays in which every 32-bit annotation is mappedto an RGBA color in an output image buffer and (2) filter-ing out annotations that do not match a specified criteriaeg to find all synapses in a region one queries the metadatato get a list of synapse ids and then filters all non-matchingidentifiers out of the cutout

5 PRELIMINARY PERFORMANCEWe conducted experiments against the live Web-services toprovide an initial view of performance The major con-clusion is that memory and IO performance both limitthroughput The process of array slicing and assembly forcutout requests keeps all processors fully utilized reorganiz-ing data in memory We had not witnessed the memorybottleneck prior to this study because OCP serves data overa 1 Gbps Ethernet switch and a 1 Gbps Internet uplinkwhich we saturate long before other cluster resources Aswe migrate to the Data-Scope cluster (Section 41) we willupgrade to a 40 Gbps Internet2 uplink and more computecapable nodes which will make memory an even more crit-ical resource

Experiments were conducted on an OCP Database node aDell R710 with two Intel Xeon E5630 quad-core processors64GB of memory and a Dell H700 RAID controller manag-ing a RAID-6 array of 11 3TB SATA drives for a 27TB filesystem All Web service requests are initiated on the samenode and the use the localhost interface allowing us to testthe server beyond the 1 Gbps network limit Experimentsshow the cutout size not the size of the data transferred orread from disk Each cuboid is compressed on disk read anduncompressed The data are then packaged into a cutoutwhich is compressed prior to data transfer The cutout sizeis the amount of data that the server must handle in mem-ory The EM mouse brain data in question has high entropyand tends to compress by less than 10 The annotationdata are highly compressible

Cutout throughput is the principal performance measurefor our system dictating how much data the service pro-vides to parallel computer vision workflows Figure 10(a)

Figure 11 Throughput of 256MB cutout requests tokasthuri11 as a function of the number of concurrentrequests

shows the maximum throughput achieved over all configu-rations when issuing 16 parallel cutout requests When dataare in memory and cutout requests are aligned to cuboidboundaries (aligned memory) the system performs no IOand minimally rearranges data Processing in the appli-cation stack bounds performance In this case a singlenode realizes a peak throughput of more than 173 MBsfor the largest transfers Cutouts to random offsets alignedto cuboid boundaries add IO costs to each request andbring performance down to a peak of 121 MBs Unalignedcutouts require data to be reorganized in memory movingbyte ranges that are unaligned with the cache hierarchyThis incurs further performance penalties and peak through-put drops to only 61 MBs The IO cost of these requestsis only marginally more than aligned cutouts rounding eachdimension up to the next cuboid Unaligned cutouts revealsthe dominance of memory performance

Scalability results reveal fixed costs in both Web-service in-vocation and IO that become amortized for larger cutoutsFigures 10(b) and 10(c) show the throughput as a functionof cutout size in normal and loglog scale The experimentsuses 16 parallel requests each Performance scales nearly lin-early until 256K for reads (IO cost) and almost 1 MB for in-

Figure 12 Throughput of writing annotations as afunction of the size of the annotated region

cache (Web-service costs) Beyond this point performancecontinues to increase albeit more slowly For aligned andunaligned reads we attribute the continued increase to theMorton-order space-filling curve Larger cutouts intersectlarger aligned regions of the Morton-order curve producinglarger contiguous IOs [23] We do not report performanceabove a 256M cutout size Beyond this point the buffersneeded by the application and Web server exceed memorycapacity

To realize peak throughput we had to initiate multiple re-quests in parallel The application stack runs each Web-service request on a single process thread and memory lim-its the per-thread performance Figure 11 shows throughputas a function of the number of parallel requests Through-put scales with the number of parallel requests beyond theeight physical cores of the machine to 16 when reading datafrom disk and to 32 when reading from memory Perfor-mance scales beyond the 8 cores owing to some combina-tion of overlapping IO with computation and hyperthread-ing Too much parallelism eventually leads to a reduction inthroughput because all hardware resources are fully utilizedand more parallelism introduces more resource contention

Parallelism is key to performance in the OCP Data Clusterand the easiest way to realize it is through initiating concur-rent Web-service requests Parallelizing individual cutoutswould benefit the performance of individual request butnot overall system throughput because multiple requests al-ready consume the memory resources of all processors Wenote that the Numpy Python library we use for array ma-nipulation implements parallel operators using BLAS butdoes not parallelize array slicing

Figure 12 shows the write throughput as a function ofthe volume size when uploading annotated volumes An-notations differ from cutouts in that the data are 32-bitsper voxel and are highly compressible For this experi-ment we uploaded a region dense manual annotations ofthe kasthuri11 data set in which more than 90 of voxelsare labeled Data compress to 6 the original size How-ever performance scales with the uncompressed size of theregion because that dictates how many cuboids must bemanipulated in memory This experiment uses 16 parallelrequests Write performance scales well up to 2MB cutoutsand is faster than read at the same cutout size because thedata compress much better However beyond 2MB trans-fers write throughput collapses The best performance overall configurations of 19 MBs does not compare well withthe 121 MBs when reading image cutouts Updating a vol-

Figure 13 Performance comparison of Databasenodes and SSD nodes when writing synapses (smallrandom writes)

ume of annotations is much more complex than a cutoutIt (1) reads the previous annotations (2) applies the newannotations to the volume database resolving conflicts ona per voxel basis (3) writes back the volume database (4)reads index entries for all new annotations (5) updates eachlist by unioning new and old cuboid locations and (5) writesback the index IO is doubled and index maintenance costsare added on top The critical performance issues is updat-ing the spatial index Parallel writes to the spatial indexresult in transaction retries and timeouts in MySQL dueto contention Often a single annotation volume will re-sult in the update of hundreds of index entries one for eachunique annotation identifier in the volume We can scale tolarger writes at the expense of using fewer parallel writersImproving annotation throughput is a high priority for theOCP Data Cluster

We also include an comparison of small write performancebetween an SSD node and a Database node The SSD nodeis a Dell R310 with an Intel Xeon 3430 with 4 24GHz cores16 GB of memory and two OCZ Vertex 4 drives config-ured in a RAID 0 array The experiment uploads all of thesynapse annotations in the kasthuri11 data in random or-der committing after each write Figure 13 shows that theSSD node achieves more than 150 the throughput of theDatabase array on small random writes The number ofsynapse writes per second is surprisingly low We write onlyabout 6 RAMON objects per second However a single Ra-mon object write generates updates to three different meta-data tables the index and the volume database The take-away from this experiment is that an inexpensive SSD node(lt$3000) offloads the write workload of an entire Databasenode (gt$18000) All IO in this experiment is randomIn practice we achieve much higher write throughput be-cause of locality and batching of requests Our synapsefinder workload uploaded more than 73 synapses a secondper node but that number reflects caching and effects whenupdating many synapses in a small region

6 RELATED WORKThe performance issues faced by the spatial databases in theOCP Data Cluster parallel those of SciDB [40 43] includingarray slicing selection queries joins and clusteringscale-out SciDB might benefit OCP by making data distributiontransparent and by offloading application functions imple-mented in Python to SciDB

Many of the physical design principles of OCP follow the Ar-rayStore of Soroush [39] They give a model for multi-levelpartitioning (chunking and tiling) that is more general thanOCP cuboids They can handle both regular and irregular

data ArrayStore extends decades of work on regular [6 24]and irregular [5] tiling The RasDaMan multidimensionalarray database [2] also builds spatial data services for im-age processing on top of a relational database They toohave explored the issues of tiling data organization anddata distribution [9] In contrast to general-purpose arraydatabases OCP has a focus on the specific application ofparallel computer vision for neuroscience and as such tunesits design to application concepts and properties Examplesinclude using the partitioning of space-filling curves in thedata distribution function and indexing of neural objects

OCPrsquos design incorporates many techniques developed inthe spatial data management community Samet [35] wrotethe authoritative text on the subject which we have used asa reference for region quad-trees space-filling curves tessel-lations and much more

OCP represents the a current state of evolution of the scale-out database architectures developed by the Institute forData-Intensive Engineering and Science at Johns HopkinsGray and Szalay developed the progenitor system in theSloan Digital Sky Survey [36] This has lead to more than 30different data products over the past decade including theJHU Turbulence Database Cluster [20] and the Life UnderYour Feet soil ecology portal [25]

7 FINAL THOUGHTSIn less than two years The Open Connectome Projecthas evolved from a problem statement to a data manage-ment and analysis platform for a community of neurosci-entists with the singular goal of mapping the brain High-throughput imaging instruments have driven data manage-ment off the workstation and out of the lab into data-intensive clusters and Web-services Also the scope of theproblem has forced experimental biologists to engage statis-ticians systems engineers machine learners and big-datascientists The Open Connectome Project aims to be theforum in which they all meet

The OCP Data Cluster has built a rich set of features tomeet the analysis needs of its users Application-specificcapabilities have been the development focus to date Theplatform manages data sets from many imaging modalities(Five to fifteen depending on how one counts them)

Presently our focus must change to throughput and scala-bility The near future holds two 100 teravoxel data setsmdasheach larger than the sum of all our data We will move to anew home in the Data-Scope cluster which will increase ouroutward-facing Internet bandwidth by a factor of forty Wealso expect the Connectomics community to expand rapidlyThe US Presidentrsquos Office recently announced a decade-longinitiative to build a comprehensive map of the human brain[21] This study reveals many limitations of the OCP datacluster and provides guidance to focus our efforts specif-ically toward servicing many small writes to record brainstructures and efficient and parallel manipulation of data toalleviate memory bottlenecks

AcknowledgmentsThe authors would like to thank additional members of theOpen Connectome Project that contributed to this workincluding Disa Mhembere and Ayushi Sinha We wouldalso like to thank our collaborators Forrest Collman AlbertoCardona Cameron Craddock Michael Milham Partha Mi-tra and Sebastian Seung

8 ADDITIONAL AUTHORSlowast Department of Computer Science and the Institute forData Intensive Engineering and Science Johns Hopkins Uni-versity

dagger Johns Hopkins University Applied Physics LaboratoryDagger Department of Statistical Science and Mathematics andthe Institute for Brain Science Duke Universitysect Janelia Farm Research Campus Howard Hughes MedicalInstitutepara Department of Molecular and Cellular Biology HarvardUniversity Department of Computational Neuroscience MassachusettsInstitute of Technologylowastlowast Allen Institute for Brain Sciencedaggerdagger Department of Physics and Astronomy and the Institutefor Data Intensive Engineering and Science Johns HopkinsUniversityDaggerDagger Department of Bioengineering Stanford Universitysectsect Department of Molecular and Cellular Physiology Stan-ford University

9 REFERENCES[1] D J Abadi S R Madden and M Ferreira

Integrating compression and execution incolumn-oriented database systems In SIGMOD 2006

[2] P Baumann A Dehmel P Furtado R Ritsch andB Widmann The multidimensional database systemRasDaMan In SIGMOD 1998

[3] D D Bock W-C A Lee A M Kerlin M LAndermann G Hood A W Wetzel S YurgensonE R Soucy H S Kim and R C Reid Networkanatomy and in vivo physiology of visual corticalneurons Nature 471(7337) 2011

[4] A Cardona S Saalfeld J SchindelinI Arganda-Carreras S Preibisch M LongairP Tomancak V Hartenstein and R J DouglasTrakEM2 software for neural circuit reconstructionPLoS ONE 7(6) 2012

[5] C Chang A Acharya A Sussman and J Saltz T2a customizable parallel database for multi-dimensionaldata SIGMOD Record 27(1) 1998

[6] C Chang B Moon A Acharya C ShockA Sussman and J Saltz Titan A high-performanceremote sensing database In ICDE 1997

[7] F Crow Summed-area tables for texture mapping InSIGGRAPH 1984

[8] R T Fielding R Taylor and N Richard Principleddesign of the modern Web architecture ACMTransactions on Internet Technology 2(2) 2002

[9] P Furtado and P Baumann Storage ofmultidimensional arrays based on arbitrary tiling InICDE 1999

[10] J Gray and A Szalay Science in an exponentialworld Nature 440(23) 23 March 2006

[11] A Guttman R-trees a dynamic index structure forspatial searching In ACM SIGMOD Conference 1984

[12] H S Seung et al Eyewire Available at eyewireorg2012

[13] V Jain H S Seung and S C Turaga Machines thatlearn to segment images a crucial technology forconnectomics Current opinion in neurobiology 20(5)2010

[14] V Jain S Turaga K Briggman W Denk andS Seung Learning to agglomerate superpixelhierarchies In Neural Information Processing Systems2011

[15] K Kanov R Burns G Eyink C Meneveau andA Szalay Data-intensive spatial filtering in largenumerical simulation datasets In Supercomputing2012

[16] N Kasthuri and J Lichtman Untitled Inpreparation 2013

[17] V Kaynig T Fuchs and J M BuhmannGeometrical consistent 3d tracing of neuronalprocesses in ssTEM data In MICCAI 2010

[18] M Kazhdan and H Hoppe Streaming multigrid forgradient-domain operations on large images ACMTransactions on Graphics 27 2008

[19] D M Kleissas W R Gray J M Burck J TVogelstein E Perlman P M Burlina R Burns andR J Vogelstein CAJAL3D toward a fully automaticpipeline for connectome estimation fromhigh-resolution em data In Neuroinformatics 2012

[20] Y Li E Perlman M Wang Y Yang C MeneveauR Burns S Chen A Szalay and G Eyink A publicturbulence database cluster and applications to studyLagrangian evolution of velocity increments inturbulence Journal of Turbulence 9(31)1ndash29 2008

[21] J Markoff Obama seeking to boost study of humanbrain New York Times February 17 2013

[22] K D Micheva B Busse N C Weiler N OrsquoRourkeand S J Smith Single-synapse analysis of a diversesynapse population Proteomic imaging methods andmarkers Neuron 68(1) 2010

[23] B Moon H V Jagadish C Faloutsos and J HSaltz Analysis of the clustering properties of theHilbert space-filling curve IEEE Transactions onKnowledge and Data Engineering 13(1) 2001

[24] B Moon and J Saltz Scalability analysis ofdeclustering methods for multidimensional rangequeries Trans of Knowledge and Data Engineering2(10) 1998

[25] R A Musaloiu-E A Terzis K Szlavecz A SzalayJ Cogan and J Gray Life under your feet Wirelesssensors in soil ecology In Embedded NetworkedSensors 2006

[26] M Nielsen Reinventing Discovery PrincetonUniversity Press 2011

[27] A Norton and J Clyne The VAPOR visualizationapplication In High Performance Visualization 2012

[28] N OrsquoRourke N C Weiler K D Micheva and S JSmith Deep molecular diversity of mammaliansynapses why it matters and how to measure itNature Reviews Neuroscience 13(1) 2012

[29] S Papadomanolakis A Ailamaki J C Lopez T TuD R OrsquoHallaron and G Heber Efficient queryprocessing on unstructured tetrahedral meshes InACM SIGMOD 2006

[30] E Perlman R Burns Y Li and C Meneveau Dataexploration of turbulence simulations using a database

cluster In Supercomputing 2007[31] A V Reina M Gelbart D Huang J Lichtman

E L Miller and H Pfister Segmentation fusion forconnectomics In International Conference onComputer Vision 2011

[32] A V Reina W-K Jeong J Lichtman andH Pfister The connectome project discovering thewiring of the brain ACM Crossroads 18(1)8ndash132011

[33] D E Rex J Q Ma and A W Toga The LONIpipeline processing environment Neuroimage 19(3)2003

[34] S Saalfeld A Cardona V Hartenstein andP Tomancak CATMAID collaborative annotationtoolkit for massive amounts of image dataBioinformatics 25(15) 2009

[35] H Samet Foundations of Multidimensional andMetric Data Structures Morgan Kaufmann PublishersInc San Francisco 2006

[36] The Sloan Digital Sky Survey 2013 Available athttpwwwsdssorg

[37] T Sellis N Roussopoulos and C Faloutsos TheR+-tree A dynamic index for multi-dimensionalobjects In VLDB 1987

[38] C Sommer C Straehle U Koethe and F AHamprecht rdquoilastik Interactive learning andsegmentation toolkitrdquo In Biomedical Imaging 2011

[39] E Soroush M Balazinska and D Wang ArraystoreA storage manager for complex parallel arrayprocessing In SIGMOD 2011

[40] M Stonebraker J Becla D DeWitt K-T LimD Maier O Ratzesberger and S ZdonikRequirements for science data bases and SciDB InConference on Innovative Data Systems Research2009

[41] A S Szalay K Church C Meneveau A Terzis andS Zeger MRI The Development of Data-Scopemdashamulti-petabyte generic data analysis environment forscience Available at httpswikiphajhueduescience_wikiimages77fDataScopepdf 2012

[42] F Tauheed L Biveinis T Heinis F SchurmannH Markram and A Ailamaki Accelerating rangequeries for brain simulations In ICDE 2012

[43] The SciDB Development Team Overview of SciDBlarge scale array storage processing and analysis InSIGMOD 2010

[44] K Wu FastBit Interactively searching massive dataJournal of Physics Conference Series 180(1) 2009

  • 1 Introduction
  • 2 Data and Applications
  • 3 Data Model
    • 31 Physical Design
    • 32 Annotations
    • 33 Tiles
    • 34 Data Cleaning
      • 4 System Design
        • 41 Architecture
        • 42 Web Services
          • 5 Preliminary Performance
          • 6 Related Work
          • 7 Final Thoughts
          • 8 Additional Authors
          • 9 References
Page 9: The Open Connectome Project Data Cluster: Scalable ... · The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience Randal Burns William

Figure 12 Throughput of writing annotations as afunction of the size of the annotated region

cache (Web-service costs) Beyond this point performancecontinues to increase albeit more slowly For aligned andunaligned reads we attribute the continued increase to theMorton-order space-filling curve Larger cutouts intersectlarger aligned regions of the Morton-order curve producinglarger contiguous IOs [23] We do not report performanceabove a 256M cutout size Beyond this point the buffersneeded by the application and Web server exceed memorycapacity

To realize peak throughput we had to initiate multiple re-quests in parallel The application stack runs each Web-service request on a single process thread and memory lim-its the per-thread performance Figure 11 shows throughputas a function of the number of parallel requests Through-put scales with the number of parallel requests beyond theeight physical cores of the machine to 16 when reading datafrom disk and to 32 when reading from memory Perfor-mance scales beyond the 8 cores owing to some combina-tion of overlapping IO with computation and hyperthread-ing Too much parallelism eventually leads to a reduction inthroughput because all hardware resources are fully utilizedand more parallelism introduces more resource contention

Parallelism is key to performance in the OCP Data Clusterand the easiest way to realize it is through initiating concur-rent Web-service requests Parallelizing individual cutoutswould benefit the performance of individual request butnot overall system throughput because multiple requests al-ready consume the memory resources of all processors Wenote that the Numpy Python library we use for array ma-nipulation implements parallel operators using BLAS butdoes not parallelize array slicing

Figure 12 shows the write throughput as a function ofthe volume size when uploading annotated volumes An-notations differ from cutouts in that the data are 32-bitsper voxel and are highly compressible For this experi-ment we uploaded a region dense manual annotations ofthe kasthuri11 data set in which more than 90 of voxelsare labeled Data compress to 6 the original size How-ever performance scales with the uncompressed size of theregion because that dictates how many cuboids must bemanipulated in memory This experiment uses 16 parallelrequests Write performance scales well up to 2MB cutoutsand is faster than read at the same cutout size because thedata compress much better However beyond 2MB trans-fers write throughput collapses The best performance overall configurations of 19 MBs does not compare well withthe 121 MBs when reading image cutouts Updating a vol-

Figure 13 Performance comparison of Databasenodes and SSD nodes when writing synapses (smallrandom writes)

ume of annotations is much more complex than a cutoutIt (1) reads the previous annotations (2) applies the newannotations to the volume database resolving conflicts ona per voxel basis (3) writes back the volume database (4)reads index entries for all new annotations (5) updates eachlist by unioning new and old cuboid locations and (5) writesback the index IO is doubled and index maintenance costsare added on top The critical performance issues is updat-ing the spatial index Parallel writes to the spatial indexresult in transaction retries and timeouts in MySQL dueto contention Often a single annotation volume will re-sult in the update of hundreds of index entries one for eachunique annotation identifier in the volume We can scale tolarger writes at the expense of using fewer parallel writersImproving annotation throughput is a high priority for theOCP Data Cluster

We also include an comparison of small write performancebetween an SSD node and a Database node The SSD nodeis a Dell R310 with an Intel Xeon 3430 with 4 24GHz cores16 GB of memory and two OCZ Vertex 4 drives config-ured in a RAID 0 array The experiment uploads all of thesynapse annotations in the kasthuri11 data in random or-der committing after each write Figure 13 shows that theSSD node achieves more than 150 the throughput of theDatabase array on small random writes The number ofsynapse writes per second is surprisingly low We write onlyabout 6 RAMON objects per second However a single Ra-mon object write generates updates to three different meta-data tables the index and the volume database The take-away from this experiment is that an inexpensive SSD node(lt$3000) offloads the write workload of an entire Databasenode (gt$18000) All IO in this experiment is randomIn practice we achieve much higher write throughput be-cause of locality and batching of requests Our synapsefinder workload uploaded more than 73 synapses a secondper node but that number reflects caching and effects whenupdating many synapses in a small region

6 RELATED WORKThe performance issues faced by the spatial databases in theOCP Data Cluster parallel those of SciDB [40 43] includingarray slicing selection queries joins and clusteringscale-out SciDB might benefit OCP by making data distributiontransparent and by offloading application functions imple-mented in Python to SciDB

Many of the physical design principles of OCP follow the Ar-rayStore of Soroush [39] They give a model for multi-levelpartitioning (chunking and tiling) that is more general thanOCP cuboids They can handle both regular and irregular

data ArrayStore extends decades of work on regular [6 24]and irregular [5] tiling The RasDaMan multidimensionalarray database [2] also builds spatial data services for im-age processing on top of a relational database They toohave explored the issues of tiling data organization anddata distribution [9] In contrast to general-purpose arraydatabases OCP has a focus on the specific application ofparallel computer vision for neuroscience and as such tunesits design to application concepts and properties Examplesinclude using the partitioning of space-filling curves in thedata distribution function and indexing of neural objects

OCPrsquos design incorporates many techniques developed inthe spatial data management community Samet [35] wrotethe authoritative text on the subject which we have used asa reference for region quad-trees space-filling curves tessel-lations and much more

OCP represents the a current state of evolution of the scale-out database architectures developed by the Institute forData-Intensive Engineering and Science at Johns HopkinsGray and Szalay developed the progenitor system in theSloan Digital Sky Survey [36] This has lead to more than 30different data products over the past decade including theJHU Turbulence Database Cluster [20] and the Life UnderYour Feet soil ecology portal [25]

7 FINAL THOUGHTSIn less than two years The Open Connectome Projecthas evolved from a problem statement to a data manage-ment and analysis platform for a community of neurosci-entists with the singular goal of mapping the brain High-throughput imaging instruments have driven data manage-ment off the workstation and out of the lab into data-intensive clusters and Web-services Also the scope of theproblem has forced experimental biologists to engage statis-ticians systems engineers machine learners and big-datascientists The Open Connectome Project aims to be theforum in which they all meet

The OCP Data Cluster has built a rich set of features tomeet the analysis needs of its users Application-specificcapabilities have been the development focus to date Theplatform manages data sets from many imaging modalities(Five to fifteen depending on how one counts them)

Presently our focus must change to throughput and scala-bility The near future holds two 100 teravoxel data setsmdasheach larger than the sum of all our data We will move to anew home in the Data-Scope cluster which will increase ouroutward-facing Internet bandwidth by a factor of forty Wealso expect the Connectomics community to expand rapidlyThe US Presidentrsquos Office recently announced a decade-longinitiative to build a comprehensive map of the human brain[21] This study reveals many limitations of the OCP datacluster and provides guidance to focus our efforts specif-ically toward servicing many small writes to record brainstructures and efficient and parallel manipulation of data toalleviate memory bottlenecks

AcknowledgmentsThe authors would like to thank additional members of theOpen Connectome Project that contributed to this workincluding Disa Mhembere and Ayushi Sinha We wouldalso like to thank our collaborators Forrest Collman AlbertoCardona Cameron Craddock Michael Milham Partha Mi-tra and Sebastian Seung

8 ADDITIONAL AUTHORSlowast Department of Computer Science and the Institute forData Intensive Engineering and Science Johns Hopkins Uni-versity

dagger Johns Hopkins University Applied Physics LaboratoryDagger Department of Statistical Science and Mathematics andthe Institute for Brain Science Duke Universitysect Janelia Farm Research Campus Howard Hughes MedicalInstitutepara Department of Molecular and Cellular Biology HarvardUniversity Department of Computational Neuroscience MassachusettsInstitute of Technologylowastlowast Allen Institute for Brain Sciencedaggerdagger Department of Physics and Astronomy and the Institutefor Data Intensive Engineering and Science Johns HopkinsUniversityDaggerDagger Department of Bioengineering Stanford Universitysectsect Department of Molecular and Cellular Physiology Stan-ford University

9 REFERENCES[1] D J Abadi S R Madden and M Ferreira

Integrating compression and execution incolumn-oriented database systems In SIGMOD 2006

[2] P Baumann A Dehmel P Furtado R Ritsch andB Widmann The multidimensional database systemRasDaMan In SIGMOD 1998

[3] D D Bock W-C A Lee A M Kerlin M LAndermann G Hood A W Wetzel S YurgensonE R Soucy H S Kim and R C Reid Networkanatomy and in vivo physiology of visual corticalneurons Nature 471(7337) 2011

[4] A Cardona S Saalfeld J SchindelinI Arganda-Carreras S Preibisch M LongairP Tomancak V Hartenstein and R J DouglasTrakEM2 software for neural circuit reconstructionPLoS ONE 7(6) 2012

[5] C Chang A Acharya A Sussman and J Saltz T2a customizable parallel database for multi-dimensionaldata SIGMOD Record 27(1) 1998

[6] C Chang B Moon A Acharya C ShockA Sussman and J Saltz Titan A high-performanceremote sensing database In ICDE 1997

[7] F Crow Summed-area tables for texture mapping InSIGGRAPH 1984

[8] R T Fielding R Taylor and N Richard Principleddesign of the modern Web architecture ACMTransactions on Internet Technology 2(2) 2002

[9] P Furtado and P Baumann Storage ofmultidimensional arrays based on arbitrary tiling InICDE 1999

[10] J Gray and A Szalay Science in an exponentialworld Nature 440(23) 23 March 2006

[11] A Guttman R-trees a dynamic index structure forspatial searching In ACM SIGMOD Conference 1984

[12] H S Seung et al Eyewire Available at eyewireorg2012

[13] V Jain H S Seung and S C Turaga Machines thatlearn to segment images a crucial technology forconnectomics Current opinion in neurobiology 20(5)2010

[14] V Jain S Turaga K Briggman W Denk andS Seung Learning to agglomerate superpixelhierarchies In Neural Information Processing Systems2011

[15] K Kanov R Burns G Eyink C Meneveau andA Szalay Data-intensive spatial filtering in largenumerical simulation datasets In Supercomputing2012

[16] N Kasthuri and J Lichtman Untitled Inpreparation 2013

[17] V Kaynig T Fuchs and J M BuhmannGeometrical consistent 3d tracing of neuronalprocesses in ssTEM data In MICCAI 2010

[18] M Kazhdan and H Hoppe Streaming multigrid forgradient-domain operations on large images ACMTransactions on Graphics 27 2008

[19] D M Kleissas W R Gray J M Burck J TVogelstein E Perlman P M Burlina R Burns andR J Vogelstein CAJAL3D toward a fully automaticpipeline for connectome estimation fromhigh-resolution em data In Neuroinformatics 2012

[20] Y Li E Perlman M Wang Y Yang C MeneveauR Burns S Chen A Szalay and G Eyink A publicturbulence database cluster and applications to studyLagrangian evolution of velocity increments inturbulence Journal of Turbulence 9(31)1ndash29 2008

[21] J Markoff Obama seeking to boost study of humanbrain New York Times February 17 2013

[22] K D Micheva B Busse N C Weiler N OrsquoRourkeand S J Smith Single-synapse analysis of a diversesynapse population Proteomic imaging methods andmarkers Neuron 68(1) 2010

[23] B Moon H V Jagadish C Faloutsos and J HSaltz Analysis of the clustering properties of theHilbert space-filling curve IEEE Transactions onKnowledge and Data Engineering 13(1) 2001

[24] B Moon and J Saltz Scalability analysis ofdeclustering methods for multidimensional rangequeries Trans of Knowledge and Data Engineering2(10) 1998

[25] R A Musaloiu-E A Terzis K Szlavecz A SzalayJ Cogan and J Gray Life under your feet Wirelesssensors in soil ecology In Embedded NetworkedSensors 2006

[26] M Nielsen Reinventing Discovery PrincetonUniversity Press 2011

[27] A Norton and J Clyne The VAPOR visualizationapplication In High Performance Visualization 2012

[28] N OrsquoRourke N C Weiler K D Micheva and S JSmith Deep molecular diversity of mammaliansynapses why it matters and how to measure itNature Reviews Neuroscience 13(1) 2012

[29] S Papadomanolakis A Ailamaki J C Lopez T TuD R OrsquoHallaron and G Heber Efficient queryprocessing on unstructured tetrahedral meshes InACM SIGMOD 2006

[30] E Perlman R Burns Y Li and C Meneveau Dataexploration of turbulence simulations using a database

cluster In Supercomputing 2007[31] A V Reina M Gelbart D Huang J Lichtman

E L Miller and H Pfister Segmentation fusion forconnectomics In International Conference onComputer Vision 2011

[32] A V Reina W-K Jeong J Lichtman andH Pfister The connectome project discovering thewiring of the brain ACM Crossroads 18(1)8ndash132011

[33] D E Rex J Q Ma and A W Toga The LONIpipeline processing environment Neuroimage 19(3)2003

[34] S Saalfeld A Cardona V Hartenstein andP Tomancak CATMAID collaborative annotationtoolkit for massive amounts of image dataBioinformatics 25(15) 2009

[35] H Samet Foundations of Multidimensional andMetric Data Structures Morgan Kaufmann PublishersInc San Francisco 2006

[36] The Sloan Digital Sky Survey 2013 Available athttpwwwsdssorg

[37] T Sellis N Roussopoulos and C Faloutsos TheR+-tree A dynamic index for multi-dimensionalobjects In VLDB 1987

[38] C Sommer C Straehle U Koethe and F AHamprecht rdquoilastik Interactive learning andsegmentation toolkitrdquo In Biomedical Imaging 2011

[39] E Soroush M Balazinska and D Wang ArraystoreA storage manager for complex parallel arrayprocessing In SIGMOD 2011

[40] M Stonebraker J Becla D DeWitt K-T LimD Maier O Ratzesberger and S ZdonikRequirements for science data bases and SciDB InConference on Innovative Data Systems Research2009

[41] A S Szalay K Church C Meneveau A Terzis andS Zeger MRI The Development of Data-Scopemdashamulti-petabyte generic data analysis environment forscience Available at httpswikiphajhueduescience_wikiimages77fDataScopepdf 2012

[42] F Tauheed L Biveinis T Heinis F SchurmannH Markram and A Ailamaki Accelerating rangequeries for brain simulations In ICDE 2012

[43] The SciDB Development Team Overview of SciDBlarge scale array storage processing and analysis InSIGMOD 2010

[44] K Wu FastBit Interactively searching massive dataJournal of Physics Conference Series 180(1) 2009

  • 1 Introduction
  • 2 Data and Applications
  • 3 Data Model
    • 31 Physical Design
    • 32 Annotations
    • 33 Tiles
    • 34 Data Cleaning
      • 4 System Design
        • 41 Architecture
        • 42 Web Services
          • 5 Preliminary Performance
          • 6 Related Work
          • 7 Final Thoughts
          • 8 Additional Authors
          • 9 References
Page 10: The Open Connectome Project Data Cluster: Scalable ... · The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience Randal Burns William

data ArrayStore extends decades of work on regular [6 24]and irregular [5] tiling The RasDaMan multidimensionalarray database [2] also builds spatial data services for im-age processing on top of a relational database They toohave explored the issues of tiling data organization anddata distribution [9] In contrast to general-purpose arraydatabases OCP has a focus on the specific application ofparallel computer vision for neuroscience and as such tunesits design to application concepts and properties Examplesinclude using the partitioning of space-filling curves in thedata distribution function and indexing of neural objects

OCPrsquos design incorporates many techniques developed inthe spatial data management community Samet [35] wrotethe authoritative text on the subject which we have used asa reference for region quad-trees space-filling curves tessel-lations and much more

OCP represents the a current state of evolution of the scale-out database architectures developed by the Institute forData-Intensive Engineering and Science at Johns HopkinsGray and Szalay developed the progenitor system in theSloan Digital Sky Survey [36] This has lead to more than 30different data products over the past decade including theJHU Turbulence Database Cluster [20] and the Life UnderYour Feet soil ecology portal [25]

7 FINAL THOUGHTSIn less than two years The Open Connectome Projecthas evolved from a problem statement to a data manage-ment and analysis platform for a community of neurosci-entists with the singular goal of mapping the brain High-throughput imaging instruments have driven data manage-ment off the workstation and out of the lab into data-intensive clusters and Web-services Also the scope of theproblem has forced experimental biologists to engage statis-ticians systems engineers machine learners and big-datascientists The Open Connectome Project aims to be theforum in which they all meet

The OCP Data Cluster has built a rich set of features tomeet the analysis needs of its users Application-specificcapabilities have been the development focus to date Theplatform manages data sets from many imaging modalities(Five to fifteen depending on how one counts them)

Presently our focus must change to throughput and scala-bility The near future holds two 100 teravoxel data setsmdasheach larger than the sum of all our data We will move to anew home in the Data-Scope cluster which will increase ouroutward-facing Internet bandwidth by a factor of forty Wealso expect the Connectomics community to expand rapidlyThe US Presidentrsquos Office recently announced a decade-longinitiative to build a comprehensive map of the human brain[21] This study reveals many limitations of the OCP datacluster and provides guidance to focus our efforts specif-ically toward servicing many small writes to record brainstructures and efficient and parallel manipulation of data toalleviate memory bottlenecks

AcknowledgmentsThe authors would like to thank additional members of theOpen Connectome Project that contributed to this workincluding Disa Mhembere and Ayushi Sinha We wouldalso like to thank our collaborators Forrest Collman AlbertoCardona Cameron Craddock Michael Milham Partha Mi-tra and Sebastian Seung

8 ADDITIONAL AUTHORSlowast Department of Computer Science and the Institute forData Intensive Engineering and Science Johns Hopkins Uni-versity

dagger Johns Hopkins University Applied Physics LaboratoryDagger Department of Statistical Science and Mathematics andthe Institute for Brain Science Duke Universitysect Janelia Farm Research Campus Howard Hughes MedicalInstitutepara Department of Molecular and Cellular Biology HarvardUniversity Department of Computational Neuroscience MassachusettsInstitute of Technologylowastlowast Allen Institute for Brain Sciencedaggerdagger Department of Physics and Astronomy and the Institutefor Data Intensive Engineering and Science Johns HopkinsUniversityDaggerDagger Department of Bioengineering Stanford Universitysectsect Department of Molecular and Cellular Physiology Stan-ford University

9. REFERENCES

[1] D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006.
[2] P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and B. Widmann. The multidimensional database system RasDaMan. In SIGMOD, 1998.
[3] D. D. Bock, W.-C. A. Lee, A. M. Kerlin, M. L. Andermann, G. Hood, A. W. Wetzel, S. Yurgenson, E. R. Soucy, H. S. Kim, and R. C. Reid. Network anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337), 2011.
[4] A. Cardona, S. Saalfeld, J. Schindelin, I. Arganda-Carreras, S. Preibisch, M. Longair, P. Tomancak, V. Hartenstein, and R. J. Douglas. TrakEM2 software for neural circuit reconstruction. PLoS ONE, 7(6), 2012.
[5] C. Chang, A. Acharya, A. Sussman, and J. Saltz. T2: a customizable parallel database for multi-dimensional data. SIGMOD Record, 27(1), 1998.
[6] C. Chang, B. Moon, A. Acharya, C. Shock, A. Sussman, and J. Saltz. Titan: a high-performance remote sensing database. In ICDE, 1997.
[7] F. Crow. Summed-area tables for texture mapping. In SIGGRAPH, 1984.
[8] R. T. Fielding and R. N. Taylor. Principled design of the modern Web architecture. ACM Transactions on Internet Technology, 2(2), 2002.
[9] P. Furtado and P. Baumann. Storage of multidimensional arrays based on arbitrary tiling. In ICDE, 1999.
[10] J. Gray and A. Szalay. Science in an exponential world. Nature, 440(23), 23 March 2006.
[11] A. Guttman. R-trees: a dynamic index structure for spatial searching. In ACM SIGMOD Conference, 1984.
[12] H. S. Seung et al. EyeWire. Available at eyewire.org, 2012.
[13] V. Jain, H. S. Seung, and S. C. Turaga. Machines that learn to segment images: a crucial technology for connectomics. Current Opinion in Neurobiology, 20(5), 2010.
[14] V. Jain, S. Turaga, K. Briggman, W. Denk, and S. Seung. Learning to agglomerate superpixel hierarchies. In Neural Information Processing Systems, 2011.
[15] K. Kanov, R. Burns, G. Eyink, C. Meneveau, and A. Szalay. Data-intensive spatial filtering in large numerical simulation datasets. In Supercomputing, 2012.
[16] N. Kasthuri and J. Lichtman. Untitled. In preparation, 2013.
[17] V. Kaynig, T. Fuchs, and J. M. Buhmann. Geometrical consistent 3D tracing of neuronal processes in ssTEM data. In MICCAI, 2010.
[18] M. Kazhdan and H. Hoppe. Streaming multigrid for gradient-domain operations on large images. ACM Transactions on Graphics, 27, 2008.
[19] D. M. Kleissas, W. R. Gray, J. M. Burck, J. T. Vogelstein, E. Perlman, P. M. Burlina, R. Burns, and R. J. Vogelstein. CAJAL3D: toward a fully automatic pipeline for connectome estimation from high-resolution EM data. In Neuroinformatics, 2012.
[20] Y. Li, E. Perlman, M. Wang, Y. Yang, C. Meneveau, R. Burns, S. Chen, A. Szalay, and G. Eyink. A public turbulence database cluster and applications to study Lagrangian evolution of velocity increments in turbulence. Journal of Turbulence, 9(31):1–29, 2008.
[21] J. Markoff. Obama seeking to boost study of human brain. New York Times, February 17, 2013.
[22] K. D. Micheva, B. Busse, N. C. Weiler, N. O'Rourke, and S. J. Smith. Single-synapse analysis of a diverse synapse population: proteomic imaging methods and markers. Neuron, 68(1), 2010.
[23] B. Moon, H. V. Jagadish, C. Faloutsos, and J. H. Saltz. Analysis of the clustering properties of the Hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1), 2001.
[24] B. Moon and J. Saltz. Scalability analysis of declustering methods for multidimensional range queries. Transactions on Knowledge and Data Engineering, 2(10), 1998.
[25] R. A. Musaloiu-E., A. Terzis, K. Szlavecz, A. Szalay, J. Cogan, and J. Gray. Life under your feet: wireless sensors in soil ecology. In Embedded Networked Sensors, 2006.
[26] M. Nielsen. Reinventing Discovery. Princeton University Press, 2011.
[27] A. Norton and J. Clyne. The VAPOR visualization application. In High Performance Visualization, 2012.
[28] N. O'Rourke, N. C. Weiler, K. D. Micheva, and S. J. Smith. Deep molecular diversity of mammalian synapses: why it matters and how to measure it. Nature Reviews Neuroscience, 13(1), 2012.
[29] S. Papadomanolakis, A. Ailamaki, J. C. Lopez, T. Tu, D. R. O'Hallaron, and G. Heber. Efficient query processing on unstructured tetrahedral meshes. In ACM SIGMOD, 2006.
[30] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.
[31] A. V. Reina, M. Gelbart, D. Huang, J. Lichtman, E. L. Miller, and H. Pfister. Segmentation fusion for connectomics. In International Conference on Computer Vision, 2011.
[32] A. V. Reina, W.-K. Jeong, J. Lichtman, and H. Pfister. The connectome project: discovering the wiring of the brain. ACM Crossroads, 18(1):8–13, 2011.
[33] D. E. Rex, J. Q. Ma, and A. W. Toga. The LONI pipeline processing environment. Neuroimage, 19(3), 2003.
[34] S. Saalfeld, A. Cardona, V. Hartenstein, and P. Tomancak. CATMAID: collaborative annotation toolkit for massive amounts of image data. Bioinformatics, 25(15), 2009.
[35] H. Samet. Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann Publishers Inc., San Francisco, 2006.
[36] The Sloan Digital Sky Survey, 2013. Available at http://www.sdss.org.
[37] T. Sellis, N. Roussopoulos, and C. Faloutsos. The R+-tree: a dynamic index for multi-dimensional objects. In VLDB, 1987.
[38] C. Sommer, C. Straehle, U. Koethe, and F. A. Hamprecht. ilastik: interactive learning and segmentation toolkit. In Biomedical Imaging, 2011.
[39] E. Soroush, M. Balazinska, and D. Wang. ArrayStore: a storage manager for complex parallel array processing. In SIGMOD, 2011.
[40] M. Stonebraker, J. Becla, D. DeWitt, K.-T. Lim, D. Maier, O. Ratzesberger, and S. Zdonik. Requirements for science data bases and SciDB. In Conference on Innovative Data Systems Research, 2009.

[41] A. S. Szalay, K. Church, C. Meneveau, A. Terzis, and S. Zeger. MRI: The development of Data-Scope, a multi-petabyte generic data analysis environment for science. Available at https://wiki.pha.jhu.edu/escience_wiki/images/7/7f/DataScope.pdf, 2012.

[42] F. Tauheed, L. Biveinis, T. Heinis, F. Schurmann, H. Markram, and A. Ailamaki. Accelerating range queries for brain simulations. In ICDE, 2012.
[43] The SciDB Development Team. Overview of SciDB: large scale array storage, processing and analysis. In SIGMOD, 2010.
[44] K. Wu. FastBit: interactively searching massive data. Journal of Physics: Conference Series, 180(1), 2009.
