1
XmdvTool: Interactive Visual Data Exploration System for High-dimensional Data Sets
Matthew O. Ward, Elke A. Rundensteiner, Jing Yang, Punit Doshi, Geraldine Rosario, Allen R. Martin, Ying-Huey Fua, Daniel Stroe
Worcester Polytechnic Institute
http://davis.wpi.edu/~xmdv
This work partially funded by NSF Grants IIS-9732897, IRIS-9729878 and IIS-0119276
2
XmdvTool Features
• Hierarchical visualization and interaction tools for exploring very large high-dimensional data sets to discover patterns, trends and outliers
• Applications: Bioterrorism Detection, Bioinformatics and Drug Discovery, Space Science, Geology and Geochemistry, Systems Monitoring and Performance Evaluation, Economics and Business, Simulation Design and Analysis
• Multi-platform support (Unix, Linux, Windows)
• Public domain software: http://davis.wpi.edu/~xmdv
3
Xmdv: Main Features
• Scale-up to High Dimensions: Visual Hierarchical Dimension Reduction
• Scale-up to Large Data Sets: Interactive Hierarchical Displays, Database Backend with MinMax Encoding, Semantic Caching and Adaptive Prefetching
• Interlinked Multi-Displays: Parallel Coordinates, Glyphs, Scatterplot Matrices, Dimensional Stacking
• Visual Interaction Tools: N-Dimensional Brushes, Structure-Based Brushing, InterRing
4
Scale-Up for Large Number of Dimensions
Solution to High Dimensional Datasets:
• Group Similar Dimensions into Dimension Hierarchy
• Navigate Dimension Hierarchy by InterRing
• Form Lower Dimensional Spaces by Dimension Clusters
• Convey Dimension Cluster Information by Dissimilarity Display
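The first step, grouping similar dimensions, could be sketched as follows. This is only an illustration: the slides do not say which similarity measure XmdvTool uses, so the correlation-based dissimilarity, the `threshold` parameter, and the function names `pearson` and `group_dimensions` are all assumptions. The sketch builds a single level of grouping, whereas a full dimension hierarchy would repeat the grouping with progressively looser thresholds.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences.
    Assumes neither sequence is constant (otherwise the denominator is zero)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def group_dimensions(columns, threshold):
    """Greedily group dimensions whose dissimilarity to a cluster's first
    member is at most `threshold`. Dissimilarity here is 1 - |correlation|,
    which is an assumed measure, not one taken from the slides."""
    clusters = []
    for name, values in columns.items():
        for cluster in clusters:
            representative = columns[cluster[0]]
            if 1 - abs(pearson(values, representative)) <= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster is similar enough: start a new one.
            clusters.append([name])
    return clusters

# Hypothetical data: "b" is a scaled copy of "a"; "c" is only weakly related.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
clusters = group_dimensions(columns, threshold=0.1)
```

Each resulting cluster can then be displayed as a single axis, which is how a lower-dimensional space is formed from dimension clusters.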
5
Visual Hierarchical Dimension Reduction Process
6
Visual Hierarchical Dimension Reduction Process
[Figure panels: A 42-dimensional Data Set; Dimension Hierarchy Interaction Tool: InterRing; A 4-Dimensional Subspace]
7
InterRing - Dimension Hierarchy Navigation and Manipulation
[Figure panels: Roll-up/Drill-down, Rotate, Zoom in/out, Distort, Modify]
8
Dissimilarity Display
Three Axes Method
Mean-Band Method
Diagonal Plot Method
Axis Width Method
9
Scale-up for Large Number of Records
Solution to Large Scale Datasets:
• Group Similar Records into Data Hierarchy
• Navigate Data Hierarchy by Structure-Based Brushing
• Represent Data Clusters by Mean-Band Method
• Provide Database Backend Support using MinMax Tree, Caching, Prefetching
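As a rough illustration of representing a data cluster by its mean and a surrounding band, the sketch below summarizes a cluster per dimension. It assumes the band is drawn from the minimum to the maximum of the cluster's records in each dimension. The function name `mean_band` and the sample records are hypothetical.

```python
def mean_band(records):
    """Summarize a cluster as (min, mean, max) per dimension.
    `records` is a non-empty list of equal-length numeric tuples."""
    per_dimension = zip(*records)  # transpose rows into columns
    return [(min(d), sum(d) / len(d), max(d)) for d in per_dimension]

# Hypothetical cluster of three 2-dimensional records.
cluster = [(1.0, 10.0), (3.0, 20.0), (2.0, 60.0)]
summary = mean_band(cluster)
```

In a parallel coordinates display, the means would be connected as one polyline and the min/max pairs would bound the band drawn around it, so one shape stands in for every record in the cluster.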
10
Interactive Hierarchical Display
[Figure: 2D example. Panels: Hierarchical Clustering; Structure-Based Brushing]
11
Interactive Hierarchical Display
[Figure: Mean-Band Method in Parallel Coordinates. Panels: Flat Display; Hierarchical Display]
12
Interactive Hierarchical Display
[Figure: Mean-Band Method in Parallel Coordinates. Panels: Flat Display; Hierarchical Display]
13
Scalability of Data Access
• Approach
– Attach database system to visualization front-end
• MinMax hierarchy encoding
– Key idea: avoid recursive processing
– Pre-computed
• Caching
– Key idea: reduce response time and network traffic
• Prefetching
– Key idea: use application hints and predict user patterns
– Performed during idle time
14
Scalability of Data Access: MinMax Hierarchy Encoding
• Pre-compute object positions
– level-of-detail (L)
– extent values (x,y)
– preserve tree structure
• New query semantics
– objects are now rectangles
– select objects that touch L
– select objects that touch (x, y)
– structure-based brush = intersection of two selections
[Figure: objects drawn as rectangles, with extent values (x, y) on one axis and level of detail (L) on the other; query = (x, y, L)]
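The query semantics on this slide can be sketched as a flat filter over pre-computed rectangles, which is what lets the system avoid recursive tree traversal. The sketch below is an illustration under assumptions: the field names, the half-open level interval, and the overlap tests are not taken from the slides, and a real backend would express the same predicates as a database range query.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """A tree node stored as a rectangle (assumed layout)."""
    name: str
    ext_lo: float  # pre-computed extent interval
    ext_hi: float
    lvl_lo: float  # levels of detail at which the node is displayed
    lvl_hi: float

def touches_level(node, L):
    # Assumed rule: a node is shown for lvl_lo <= L < lvl_hi.
    return node.lvl_lo <= L < node.lvl_hi

def touches_extent(node, x, y):
    # Interval overlap between the node's extent and the brush (x, y).
    return node.ext_lo < y and x < node.ext_hi

def structure_based_brush(nodes, x, y, L):
    """Intersection of the two selections; a flat scan with no recursion."""
    return [n.name for n in nodes
            if touches_level(n, L) and touches_extent(n, x, y)]

# Hypothetical three-level hierarchy: root -> (A, B), A -> (A1, A2).
nodes = [
    Node("root", 0.0, 1.0, 0.0, 0.3),
    Node("A", 0.0, 0.5, 0.3, 0.6),
    Node("B", 0.5, 1.0, 0.3, 1.0),
    Node("A1", 0.0, 0.25, 0.6, 1.0),
    Node("A2", 0.25, 0.5, 0.6, 1.0),
]
coarse = structure_based_brush(nodes, x=0.0, y=0.2, L=0.4)
fine = structure_based_brush(nodes, x=0.2, y=0.6, L=0.7)
```

Because every node carries its own intervals, the tree structure is preserved in the data while the selection itself needs only two range comparisons per object.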
15
Scalability of Data Access: Caching
• Purpose
– reduce response time and network traffic
• Issues
– visual query cannot directly translate into object IDs
– high-level cache specification to avoid complete scans
• Semantic caching
– queries are cached rather than objects
– minimize cost of cache lookup
– dynamically adapt cached queries to patterns of queries
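A minimal sketch of the idea that queries are cached rather than objects is shown below. It assumes one-dimensional range queries and a containment-only lookup. The class `SemanticCache`, the `fetch` callback, and the `backend_calls` counter are hypothetical names for illustration; the adaptive merging of cached queries mentioned on the slide is not modeled.

```python
class SemanticCache:
    """Caches query ranges with their results instead of individual objects."""

    def __init__(self, fetch):
        self.fetch = fetch      # backend call: (lo, hi) -> list of (key, value)
        self.entries = []       # list of ((lo, hi), rows)
        self.backend_calls = 0  # counts round trips to the backend

    def query(self, lo, hi):
        # Lookup compares query descriptions, not object IDs.
        for (cached_lo, cached_hi), rows in self.entries:
            if cached_lo <= lo and hi <= cached_hi:
                # The new query is contained in a cached one: answer locally.
                return [r for r in rows if lo <= r[0] <= hi]
        self.backend_calls += 1
        rows = self.fetch(lo, hi)
        self.entries.append(((lo, hi), rows))
        return rows

# Hypothetical backend table of (key, value) rows.
table = [(i, i * i) for i in range(10)]
cache = SemanticCache(lambda lo, hi: [r for r in table if lo <= r[0] <= hi])

first = cache.query(2, 7)   # miss: goes to the backend
second = cache.query(3, 5)  # hit: contained in the cached range [2, 7]
calls_after_hit = cache.backend_calls
third = cache.query(6, 9)   # miss: only partially covered by [2, 7]
```

The lookup cost grows with the number of cached queries rather than the number of cached objects, which is one way to read "minimize cost of cache lookup".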
16
Scalability of Data Access: Prefetching
• Strategy
– Speculative (no specific hints)
– navigation remains local
– both user and data set influence exploration
– Adaptive (strategy changes over time)
– evolves as more knowledge becomes available
– Non-pure (interruptible prefetching)
– leave buffer in consistent state
• Requirements
– non-pure prefetching + large transactions & small object size + semantic caching → small granularity (object level)
– speculative, non-pure prefetcher → cache replacement policy + guessing method
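The non-pure (interruptible) requirement can be illustrated with a loop that fetches one object at a time and checks for user activity between objects, so the buffer never holds a partially loaded result. This is a sketch under assumptions: the names `prefetch`, `fetch`, and `interrupted` are hypothetical, and a real system would run this during idle time in the front-end's event loop.

```python
def prefetch(candidates, buffer, fetch, interrupted):
    """Load predicted objects into `buffer` until interrupted.
    Working at object granularity keeps the buffer consistent: each
    object is either fully present or absent when the loop stops."""
    fetched = 0
    for key in candidates:
        if interrupted():   # user issued a new operation: stop at once
            break
        if key in buffer:   # already cached, nothing to do
            continue
        buffer[key] = fetch(key)
        fetched += 1
    return fetched

# Hypothetical run: the user becomes active at the third check.
checks = {"count": 0}

def interrupted():
    checks["count"] += 1
    return checks["count"] > 2

buffer = {"a": "A"}
loaded = prefetch(["a", "b", "c", "d"], buffer, str.upper, interrupted)
```

Here "a" is skipped because it is already buffered, "b" is loaded, and the interruption arrives before "c", leaving the buffer with two complete entries.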
17
Scalability of Data Access: Experimental Evaluation
[Chart: Effectiveness of Caching. Response time (seconds) for four caching configurations: Client OFF/Server OFF, Client OFF/Server ON, Client ON/Server OFF, Client ON/Server ON]
[Chart: Effectiveness of Prefetcher. % improvement in response time vs. delay between user operations (seconds)]
Conclusions:
• Caching reduces response time by 80%
• Prefetching further reduces response time by 30%
• Designing better prefetching strategies might help further reduce response time
18
Scalability of Data Access: Prefetching
Localized Speculative Strategies
• Random Strategy
• Direction Strategy [figure labels: (m-1), m, (m+1)]
• Focus Strategy [figure labels: Hot Regions, Current Navigation Window]
• Vector Strategies
– Mean Strategy [figure labels: m(n-2), m(n-1), m(n), m(n+1)]
– Exponential Weight Average Strategy [figure labels: m(n-2), m(n-1), m(n), m(n+1)]
• Data Set Driven Strategy
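The two vector strategies can be sketched as predictors of the next movement m(n+1) from the history m(n-2), m(n-1), m(n). The slides give only the strategy names and figure labels, so the formulas below are assumptions: movements are treated as scalar offsets of the navigation window, and the smoothing factor `alpha` is a hypothetical parameter.

```python
def mean_strategy(moves):
    """Predict the next movement as the plain average of past movements."""
    return sum(moves) / len(moves)

def ewa_strategy(moves, alpha=0.5):
    """Predict the next movement as an exponentially weighted average,
    so recent movements count more than older ones (alpha is assumed)."""
    estimate = moves[0]
    for m in moves[1:]:
        estimate = alpha * m + (1 - alpha) * estimate
    return estimate

# Hypothetical history m(n-2), m(n-1), m(n): the user is accelerating.
history = [1.0, 2.0, 4.0]
mean_guess = mean_strategy(history)
ewa_guess = ewa_strategy(history)

# The prefetcher would then load the region around the predicted position.
position = 10.0
predicted_position = position + ewa_guess
```

With this history the exponentially weighted estimate is larger than the plain mean, because it gives more weight to the most recent, largest movement.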
19
Xmdv System Implementation
• Tools
– C/C++
– TCL/TK
– OpenGL
– Oracle 8i
– Pro*C
[Architecture diagram. Regions: OFF-LINE PROCESS, ON-LINE PROCESS, MEMORY. Components: User, GUI, Flat Data, Hierarchical Data, MinMax Labeling, Schema Info, Loader, Rewriter, Translator, Queries, Buffer, Estimator, Exploration Variables, Prefetcher, DB. Prefetcher library: Random, Direction, Focus, EWA, Mean]
20
Publications (available at http://davis.wpi.edu/~xmdv)
• Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, "InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures", InfoVis 2002, to appear
• Punit R. Doshi, Elke A. Rundensteiner, Matthew O. Ward and Daniel Stroe, "Prefetching For Visual Data Exploration", Technical Report WPI-CS-TR-02-07, 2002
• Jing Yang, Matthew O. Ward and Elke A. Rundensteiner, "Interactive Hierarchical Displays: A General Framework for Visualization and Exploration of Large Multivariate Data Sets", Computers and Graphics Journal, 2002, to appear
• Daniel Stroe, Elke A. Rundensteiner and Matthew O. Ward, "Scalable Visual Hierarchy Exploration", Database and Expert Systems Applications, pages 784-793, Sept. 2000
• Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner, "Hierarchical Parallel Coordinates for Exploration of Large Datasets", IEEE Proc. of Visualization, pages 43-50, Oct. 1999
• Ying-Huey Fua, Matthew O. Ward and Elke A. Rundensteiner, "Navigating Hierarchies with Structure-Based Brushes", IEEE Proceedings of Visualization, pages 43-50, Oct. 1999