Interactive Visualization and Tuning of Multi-Dimensional Clusters for Indexing
Dasari Pavan Kumar (MS by Research Thesis)
Centre for Visual Information Technology, IIIT Hyderabad

Large scale visualizations
Dasari Pavan Kumar
IIIT Hyderabad
Overview
Provide a framework to generate better clusters for high dimensional data points
Provide a fast cluster analysis/generation tool
A difficult task however!
Data generation in previous decade consisted mostly of textual information
Inverted index, suffix trees, N-grams, etc.
More data!
Non-textual information (images)
Underlying processes remain similar
can’t be fully automated.
Data Visualization
Cluster analysis – descriptive modeling
Identify important features/patterns
XMDV tool (M. Ward)
Past Attempts!
Indexing images/videos
Apply clustering to compute bag of words
Generate feature histogram and perform some ML methods
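The two steps above can be illustrated with a minimal sketch. It assumes the cluster centers ("visual words") have already been computed, and the names `nearest_center` and `bag_of_words_histogram` are hypothetical helpers chosen for illustration, not part of the tool described in these slides.

```python
import math

def nearest_center(descriptor, centers):
    """Return the index of the cluster center closest to the descriptor."""
    best_index, best_dist = 0, math.inf
    for i, center in enumerate(centers):
        # Squared Euclidean distance is enough for finding the nearest center.
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

def bag_of_words_histogram(descriptors, centers):
    """Count how many descriptors of one image fall into each visual word."""
    histogram = [0] * len(centers)
    for d in descriptors:
        histogram[nearest_center(d, centers)] += 1
    total = sum(histogram)
    # Normalize so images with different numbers of descriptors are comparable.
    return [c / total for c in histogram] if total else histogram

# Toy data: two visual words and four 2-D descriptors (real SIFT has 128 dims).
centers = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.5], [9.0, 11.0], [0.2, 0.1], [8.5, 9.5]]
hist = bag_of_words_histogram(descriptors, centers)
```

The resulting histogram is the fixed-length feature vector that a classifier would consume; the choice of classifier is left open here, as it is in the slides.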
Other low-level image features exist
GLOH, steerable filter, spin images
Clusters + visualization
The problem
Cluster analysis
Identify better subspaces
Efficiently/quickly compute clusters
Compare clustering schemas
Hence PCA can’t be trivially applied
Clusters could be lost in cloud of dimensions (curse of dimensionality)
Difficult to interpret the combination
Feature selection
Wrapper model
Filter model
Difficult to compare since it is highly dependent on the density parameter
Rank dimensions
Uniformity (Entropy)
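One way to read "rank dimensions by uniformity" is to build a 1D histogram per dimension and score it with Shannon entropy. The sketch below does that under stated assumptions: the bin count, the equal-width binning, and the choice to list the least uniform (lowest entropy) dimensions first are illustrative, not taken from the slides.

```python
import math

def histogram_1d(values, bins=8):
    """Equal-width histogram of one dimension's values."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if hi == lo:
        # Constant dimension: everything lands in a single bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def entropy(counts):
    """Shannon entropy (bits) of a histogram; higher means more uniform."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def rank_dimensions(points, bins=8):
    """Return dimension indices ordered from least to most uniform."""
    dims = len(points[0])
    scores = []
    for d in range(dims):
        column = [p[d] for p in points]
        scores.append((entropy(histogram_1d(column, bins)), d))
    return [d for _, d in sorted(scores)]

# Dimension 0 is spread evenly; dimension 1 has two sharp modes.
points = [[i, 0 if i < 4 else 7] for i in range(8)]
ranking = rank_dimensions(points)
```

Here dimension 1 ranks first because its values concentrate in two bins, which is the kind of structure a 1D histogram view makes visible.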
Ranked dimensions
1D Histogram of distribution
Manual
Cluster such data on a commodity PC
Almost impossible
Data clustering
Currently using k-means (GPU)
Extracted low-level
image descriptors
Manageable size
(high dimensional)
Statistical sampling
Not feasible
Plug-in any graph drawing
Current – 2D force based
Similar nodes must be close
Can be estimated using MST
Generate minimum spanning tree (MST) of cluster centers
Single linkage dendrogram
Prim’s method
Takes 0.2 sec for 1000 nodes
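As a rough illustration of the MST step, the following is a plain O(n²) version of Prim's method over cluster centers, using Euclidean distance as the edge weight. It is a CPU sketch for small inputs; the function name `prim_mst` and the distance choice are assumptions, and it makes no attempt to reproduce the timing quoted above.

```python
import math

def prim_mst(centers):
    """Return MST edges as (parent, child, distance) over the complete graph."""
    n = len(centers)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    parent = [-1] * n
    best_dist[0] = 0.0           # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connection.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best_dist[u]))
        # Relax connections from the newly added node.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(centers[u], centers[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    parent[v] = u
    return edges

centers = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [1.0, 1.0]]
edges = prim_mst(centers)
total_weight = sum(w for _, _, w in edges)
```

The edge list can be handed to any graph layout routine; sorting the edges by weight also gives the merge order of a single linkage dendrogram.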
Drill-down “visual word” to actually see the SIFT interest points to understand the similarity
MST without layout
MST with layout
Cluster validation
Three basic strategies
External – build an independent partition according to our intuition
Comparison with schema C or proximity matrix.
Relative – choose the one that best fits!
Computationally not feasible
GPU implementation
Obtain min/max of the graph – optimal clusters Nc
Iteration
Index
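The relative strategy can be sketched as: compute a validity index for each candidate clustering schema and keep the schema at the extreme of the curve. The slides do not name the index used, so the Davies-Bouldin index below is a stand-in chosen for illustration (lower is better), and `best_schema` is a hypothetical helper.

```python
import math

def centroid(points):
    """Mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def davies_bouldin(points, labels):
    """Davies-Bouldin index; needs >= 2 clusters with distinct centroids."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    keys = sorted(clusters)
    centers = {k: centroid(clusters[k]) for k in keys}
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = {k: sum(math.dist(p, centers[k]) for p in clusters[k])
                  / len(clusters[k]) for k in keys}
    total = 0.0
    for i in keys:
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j])
                     / math.dist(centers[i], centers[j])
                     for j in keys if j != i)
    return total / len(keys)

def best_schema(points, schemas):
    """schemas maps Nc -> labels; return the Nc with the smallest index."""
    scores = {nc: davies_bouldin(points, labels)
              for nc, labels in schemas.items()}
    return min(scores, key=scores.get), scores

# Two well-separated groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0]]
schemas = {
    2: [0, 0, 0, 1, 1, 1],   # matches the two groups
    3: [0, 0, 2, 1, 1, 1],   # splits the first group unnecessarily
}
best_nc, scores = best_schema(points, schemas)
```

Evaluating the index for every candidate Nc means re-clustering the data many times, which is why this strategy is costly without a fast clustering back end.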
15 categories
Interesting observation
Same with corner cells
78, 79, 83, 84, 110, 116}
1D histograms corresponding to dimensions (a) 84, (b) 110, (c) 124
More clusters do not mean better classification
Fei-Fei et al. report a mean accuracy of 52.5%
VW = Number of visual words, EW = K-means using uniform weights, IW = K-means with weights adjusted interactively, IW-Ds = K-means with Ds dimensions given a weight zero and weights of other dimensions adjusted interactively.
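The EW, IW and IW-Ds variants differ only in the per-dimension weights fed to K-means. A minimal CPU sketch of K-means with a weighted squared Euclidean distance is shown below; the function names, the fixed iteration count, and the explicit initial centers are assumptions for illustration, and the GPU implementation mentioned in the slides is not reproduced here.

```python
def weighted_sq_dist(a, b, weights):
    """Squared Euclidean distance with one weight per dimension."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))

def weighted_kmeans(points, centers, weights, iterations=10):
    """Lloyd-style K-means; a zero weight makes a dimension irrelevant."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center under the weighted distance.
        labels = [min(range(len(centers)),
                      key=lambda k: weighted_sq_dist(p, centers[k], weights))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Dimension 1 has a large range that swamps the grouping in dimension 0.
points = [[0, 100], [1, -100], [10, 100], [11, -100]]
init = [points[0], points[1]]

labels_ew, _ = weighted_kmeans(points, init, weights=[1, 1])    # EW
labels_ds, _ = weighted_kmeans(points, init, weights=[1, 0])    # IW-Ds style
```

With uniform weights the toy points are grouped by dimension 1; giving that dimension a weight of zero lets the grouping along dimension 0 emerge instead.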
More clusters do not necessarily mean better classification
Provide a framework for better cluster generation
Provide a fast cluster analysis/generation tool for a commodity PC with a GPU
Able to analyze distributions across dimensions
Identified redundant dimensions
Publications
Interactive Visualization and Tuning of SIFT Indexing, Dasari Pavan Kumar and P. J. Narayanan, Vision, Modelling and Visualization, 2010, Siegen, Germany.
User needs to get familiarized with the tool
Visual decoding of data is sometimes difficult
Cluster generation still depends on parameters such as K (the number of clusters)
Future Work
Incorporate support for subspace clustering
Conduct experiments based on wrapper clustering methods
Thank you