
Post on 23-Feb-2016


Page 1: Extreme Scale  Analytics  on  Spatio -Temporal Datasets

Extreme Scale Analytics on Spatio-Temporal Datasets

Joel Saltz
Center for Comprehensive Informatics & Biomedical Informatics Department
Emory University


Morphometric Image Analysis Pipeline

• Preprocessing: normalization, tiling, etc.
• Segmentation: identify nuclei as objects
• Feature Extraction: compute morphometric features
• Classification: unsupervised learning (k-means) after patient-level aggregation and analysis
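The final stage of the pipeline above can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: it assumes per-nucleus feature vectors have already been extracted, that patient-level aggregation is a per-feature mean (the slides do not name the aggregate), and it uses plain Lloyd's k-means. The names `aggregate_by_patient` and `kmeans` and the toy feature values are hypothetical.

```python
from collections import defaultdict
import random


def aggregate_by_patient(nuclei):
    """Average per-nucleus feature vectors into one vector per patient.

    `nuclei` is an iterable of (patient_id, feature_vector) pairs.
    Using the mean is an assumption; the slides do not name the aggregate.
    """
    grouped = defaultdict(list)
    for patient_id, features in nuclei:
        grouped[patient_id].append(features)
    return {
        pid: [sum(col) / len(rows) for col in zip(*rows)]
        for pid, rows in grouped.items()
    }


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster stays put
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Toy data: (patient id, [nuclear area, circularity]); values are made up.
nuclei = [
    ("p1", [10.0, 0.9]), ("p1", [12.0, 0.8]),
    ("p2", [11.0, 0.85]),
    ("p3", [30.0, 0.4]), ("p3", [32.0, 0.5]),
    ("p4", [31.0, 0.45]),
]
patients = aggregate_by_patient(nuclei)
ids = sorted(patients)
centroids, labels = kmeans([patients[i] for i in ids], k=2)
```

Aggregating before clustering keeps the k-means input at one vector per patient, so the clustering cost no longer depends on the number of segmented nuclei.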


Satellite Data Analysis for Monitoring and Change Analysis


Subsurface Reservoir Management

• Numerical models of porous media
  – Fluids flow from one region of reservoir to another region
  – Rock and sediment properties change over time

• Simulate multiple realizations of multiple models and management strategies

• Evaluate geologic uncertainty and management strategies simultaneously

• Enable on-demand exploration and comparison of multiple scenarios
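The idea of evaluating geologic uncertainty and management strategies together can be illustrated with a small scenario sweep. The sketch below is only a shape for that workflow under loud assumptions: `toy_simulate` is a made-up stand-in for a real porous-media simulator, and the realization and strategy values are illustrative, not data from the presentation.

```python
from itertools import product
from statistics import mean, pstdev


def toy_simulate(permeability, injection_rate):
    """Hypothetical stand-in for one reservoir simulation run.

    Returns a made-up 'recovery' score; a real study would invoke a
    porous-media flow simulator here.
    """
    return permeability * injection_rate / (1.0 + injection_rate)


def evaluate_scenarios(realizations, strategies, simulate):
    """Run every (realization, strategy) pair and summarise per strategy."""
    outcomes = {name: [] for name in strategies}
    for (_, permeability), (s_name, rate) in product(
        realizations.items(), strategies.items()
    ):
        outcomes[s_name].append(simulate(permeability, rate))
    # Mean ranks the strategies; spread reflects geologic uncertainty.
    return {
        name: {"mean": mean(values), "spread": pstdev(values)}
        for name, values in outcomes.items()
    }


realizations = {"geo_a": 0.8, "geo_b": 1.0, "geo_c": 1.2}  # illustrative only
strategies = {"low_rate": 1.0, "high_rate": 3.0}           # illustrative only
summary = evaluate_scenarios(realizations, strategies, toy_simulate)
```

Because every pair is independent, the sweep is a natural unit for parallel execution, and keeping the per-strategy outcome lists supports on-demand comparison of scenarios.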


Core Operation Categories and Patterns

Data Cleaning and Low Level Transformations
  Operations: Transformations to reduce effects of sensor/measurement artifacts. Transform sensor acquired measurements to domain specific variables.
  Data Access Patterns and Computational Complexity: Mainly local and regular data access patterns. Moderate computational complexity.

Data Subsetting, Filtering, Subsampling
  Operations: Select portions of a dataset corresponding to regions in atlas and/or time intervals. Select portions of a dataset based on value ranges (e.g., regions with temperature larger than X degrees). Subsample data to reduce resolution and data size.
  Data Access Patterns and Computational Complexity: Local data access patterns as well as indexed access. Low to moderate, mainly data intensive computations.

Spatio-temporal Mapping and Registration
  Operations: Map datasets to an atlas. Resolve data redundancy at tile boundaries to form mosaics. Create composite dataset from multiple spatially co-incident datasets. Create derived dataset from spatially co-incident datasets obtained at different times.
  Data Access Patterns and Computational Complexity: Irregular local and global data access patterns. Moderate to high computational complexity.

Object Segmentation
  Operations: Segment “base level” objects such as nuclei, buildings, lakes. Extract features from “base level” objects.
  Data Access Patterns and Computational Complexity: Irregular, but primarily local, data access patterns. High computational complexity.

Object Classification
  Operations: Classify “base level” objects through possibly iterative combination of clustering, machine learning and human input (active learning).
  Data Access Patterns and Computational Complexity: Irregular and global data access patterns. High computational complexity.

Spatio-temporal Aggregation
  Operations: Construct “high level” objects composed of classified “base level” object aggregates, e.g., residential areas vs industrial complexes. Compute time-series aggregates over a given imaged area.
  Data Access Patterns and Computational Complexity: Primarily local with a crucial global component for aggregation. Moderate/high computation complexity.

Change Detection, Comparison, and Quantification
  Operations: Quantify changes over time in domain specific low level variables, base level objects and high level objects. Construct “change objects” to describe changes in low level domain specific variables, base level and high level objects. Spatial queries for selecting and comparing segmented regions and objects.
  Data Access Patterns and Computational Complexity: Compute and data-intensive computations. Mixture of local and global data access patterns as well as indexed access.
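The three operations in the "Data Subsetting, Filtering, Subsampling" category can be made concrete on a small 2D grid. This is a minimal sketch under assumptions: the data is a list-of-lists grid, the region is a half-open row/column box, and the function names and temperature values are hypothetical.

```python
def subset_region(grid, row_range, col_range):
    """Select the cells of a 2D grid inside a half-open bounding box."""
    r0, r1 = row_range
    c0, c1 = col_range
    return [row[c0:c1] for row in grid[r0:r1]]


def filter_by_value(grid, threshold):
    """Return (row, col, value) for every cell whose value exceeds threshold."""
    return [
        (r, c, value)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value > threshold
    ]


def subsample(grid, step):
    """Keep every `step`-th cell along both axes to reduce resolution."""
    return [row[::step] for row in grid[::step]]


# Made-up temperature grid for illustration.
temperature = [
    [10, 12, 14, 16],
    [11, 13, 15, 17],
    [20, 22, 24, 26],
    [21, 23, 25, 27],
]
window = subset_region(temperature, (2, 4), (0, 2))  # region selection
hot = filter_by_value(temperature, 24)               # value-range selection
coarse = subsample(temperature, 2)                   # resolution reduction
```

Region selection and subsampling touch contiguous or regularly strided cells, which matches the local access pattern in the table; the value filter scans every cell here, and an index over values would be what turns it into the indexed access the table mentions.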


Challenges

• Spatial-temporal disk-resident, on-the-fly, dynamically updated datasets
• Access and manipulate multiple datasets generated and stored on multiple, distributed systems
• Analysis of raw data can generate millions to trillions of features (e.g., millions of cells and nuclei in high resolution tissue images) to be mined and compared
• Take advantage of hardware platforms for analysis
  – Clusters containing hybrid CPU-GPU nodes
  – Extreme scale machines consisting of hundreds of thousands of CPU cores
  – Systems with deep memory and storage hierarchies
  – Cloud computing platforms


Using Hybrid CPU-GPU Systems


Data Structures: Region Templates

• Describe 2D/3D static and temporal regions.
• Provide a container for points, arrays, regions, and object sets within a spatial and temporal bounding box.
• A region template can represent collections of spatial areas and objects where these entities vary from one another in size and shape; e.g., regions generated by segmenting cells in microscopy images, man-made structures or hurricanes in satellite imagery.
• Primary datasets are defined as point data elements and arrays, and derived datasets as sets of regions and objects.
• Region templates may be related to one another in a defined manner.
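One way to picture the container described above is a small data class. The sketch below is an illustration only and not the actual region template API: the class names, fields, and the `relate` method are all hypothetical, chosen to mirror the bullets (a spatio-temporal bounding box, primary data as points and arrays, derived data as objects, and named relations between templates).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned spatial box plus a time interval (bounds inclusive)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]
    t_start: float
    t_end: float

    def contains(self, point, t):
        if len(point) != len(self.lower):
            return False  # dimensionality mismatch
        in_space = all(
            lo <= x <= hi for lo, x, hi in zip(self.lower, point, self.upper)
        )
        return in_space and self.t_start <= t <= self.t_end


@dataclass
class RegionTemplate:
    """Hypothetical container for data inside one spatio-temporal box."""
    name: str
    bounds: BoundingBox
    # Primary data: point elements and named arrays.
    points: List[Tuple[Tuple[float, ...], float]] = field(default_factory=list)
    arrays: Dict[str, Any] = field(default_factory=dict)
    # Derived data: segmented regions and objects of varying size and shape.
    objects: List[Dict[str, Any]] = field(default_factory=list)
    # Named relations to other templates.
    related: Dict[str, "RegionTemplate"] = field(default_factory=dict)

    def add_point(self, point, t):
        if not self.bounds.contains(point, t):
            raise ValueError("point lies outside the template's bounding box")
        self.points.append((tuple(point), t))

    def relate(self, relation, other):
        self.related[relation] = other


tile = RegionTemplate("tile_0", BoundingBox((0, 0), (1024, 1024), 0.0, 0.0))
tile.add_point((10, 20), 0.0)
tile.objects.append({"kind": "nucleus", "outline": [(5, 5), (9, 5), (9, 9)]})
neighbour = RegionTemplate("tile_1", BoundingBox((1024, 0), (2048, 1024), 0.0, 0.0))
tile.relate("adjacent_east", neighbour)
```

Keeping the bounding box on the container rather than on each element lets a runtime decide which templates a query or task needs without inspecting their contents.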


Programming Abstractions and Runtime Middleware Services

• Programming abstractions
  – Multi-level dataflow pipelines
  – MapReduce style programs
  – Spatial query capabilities
• I/O and Storage Services
  – Indexing and metadata management for ensembles of datasets
  – I/O support for retrieving data from multiple storage systems and for streaming data
  – Query capabilities
• Memory Management
  – Careful management and staging of large data structures across memory hierarchies. Masking data movement costs with computation.
• Execution Services
  – Distributing and rearranging computations and data to minimize data movement
  – Coordinated scheduling and mapping of analysis operations to heterogeneous and hybrid (CPU cores and GPUs) systems to increase overall application throughput
  – Quality of service/data requirements
  – Function variants
• Provenance Tracking, Fault-detection and tolerance
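The Memory Management item, masking data movement costs with computation, is commonly realised by prefetching: the next chunks are staged while the current one is processed. The sketch below shows that pattern with a background thread and a bounded queue. It is an assumption about one possible realisation, not the middleware's actual mechanism; `prefetching_pipeline`, `load_tile`, and the `depth` parameter are hypothetical names.

```python
import queue
import threading


def prefetching_pipeline(chunk_ids, load, compute, depth=2):
    """Overlap loading of upcoming chunks with computation on the current one.

    A background thread stages up to `depth` chunks ahead in a bounded
    queue while the calling thread computes.
    """
    staged = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def loader():
        try:
            for chunk_id in chunk_ids:
                staged.put(load(chunk_id))  # blocks once `depth` chunks wait
        finally:
            # Always signal the end so the consumer never waits forever;
            # if `load` fails, the consumer stops early with partial results.
            staged.put(done)

    threading.Thread(target=loader, daemon=True).start()

    results = []
    while True:
        chunk = staged.get()
        if chunk is done:
            break
        results.append(compute(chunk))
    return results


def load_tile(tile_id):
    # Stand-in for a read from disk or remote storage.
    return list(range(tile_id, tile_id + 4))


sums = prefetching_pipeline(range(3), load_tile, sum)
```

The bounded queue caps how much staged data occupies memory at once, which is the trade-off between hiding latency and respecting the memory hierarchy. In CPython this overlap helps when `load` is I/O-bound; CPU-bound loading would need processes or native code to run concurrently.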


End