TRANSCRIPT
On Compression of Machine-derived Context Sets for
Fusion of Multi-modal Sensor Data
Nurali Virani, Department of Mechanical Engineering, The Pennsylvania State University
Co-authors: Shashi Phoha and Asok Ray, The Pennsylvania State University
1st International Conference on InfoSymbiotics / DDDAS, Session 10: Image and Video Computing Methods
This work has been supported by U.S. Air Force Office of Scientific Research (AFOSR) under Grant No. FA9550-12-1-0270 (Dynamic Data-driven Application Systems)
What is context and where does it enter the DDDAS framework?
[Figure: Schematic of a DDDAS framework for situation awareness, showing context affecting the data as well as context affecting the decision but not the data.]
Context is the set of all factors which affect the data
[Figure: Seismic signal amplitude for a hammer blow, before and after rain*. Examples of context for each modality: geophone/seismic — dry vs. moist soil; acoustic — calm vs. windy; camera — sunny vs. foggy.]
* J. McKenna and M. McKenna, "Effects of local meteorological variability on surface and subsurface seismic-acoustic signals," in 25th Army Science Conference, 2006.
Definition: A finite non-empty set $\mathbb{C}(x)$ is called a context set if, for all $c \in \mathbb{C}(x)$, the sensor observations are conditionally independent given the state and the context:
$p(y_1, \dots, y_n \mid x, c) = \prod_{i=1}^{n} p(y_i \mid x, c)$
Context can be learned from heterogeneous sensor data using density estimation
[Figure: Graphical model with the state $x$, context $c$, and sensor observations $y_1, y_2, \dots, y_n$.]
With nonparametric density estimation (Virani et al., 2015)*, conditional independence is guaranteed for the modality-independent contexts.
*N. Virani, J.-W. Lee, S. Phoha, and A. Ray, "Learning context-aware measurement models," in 2015 American Control Conference (ACC), pp. 4491-4496. IEEE, 2015.
In the presence of contextual effects, assuming conditional independence given only the state might be incorrect.
[Figure: Graphical model with the state $x$ and sensor observations $y_1, y_2, \dots, y_n$, without a context node.]
Machine-derived context is an output of the mixture-modeling technique
Joint conditional density as a mixture model:
$p(y_1, \dots, y_n \mid x) = \sum_{c \in \mathbb{C}(x)} \Pr(c \mid x) \prod_{i=1}^{n} p(y_i \mid x, c)$,
where $\mathbb{C}(x)$ is the context set.
The outputs of context learning include:
• Machine-derived context set $\mathbb{C}(x)$
• Contextual observation densities $p(y_i \mid x, c)$
• Context priors $\Pr(c \mid x)$
N. Virani, S. Sarkar, J.-W. Lee, S. Phoha, and A. Ray, "Algorithms for Context Learning and Information Representation for Multi-Sensor Teams," in Context-Enhanced Information Fusion, edited by L. Snidaro et al., Springer, 2016.
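To make the mixture structure concrete, here is a minimal sketch of evaluating the learned joint conditional density; the containers `priors` and `densities` are illustrative stand-ins for the learned outputs listed above, not the authors' implementation.

```python
# Minimal sketch: evaluate p(y_1, ..., y_n | x) as the mixture
#   sum_{c in C(x)} Pr(c | x) * prod_i p(y_i | x, c),
# using conditional independence of the sensors given (x, c).
def joint_conditional_density(ys, x, priors, densities):
    """ys: per-sensor observations [y_1, ..., y_n]
    priors: dict with priors[x][c] = Pr(c | x)
    densities: list of callables, densities[i](y, x, c) = p(y_i | x, c)"""
    total = 0.0
    for c, p_c in priors[x].items():
        likelihood = 1.0
        for i, y in enumerate(ys):
            likelihood *= densities[i](y, x, c)  # independence given (x, c)
        total += p_c * likelihood
    return total
```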
Cardinality of the context set governs the complexity of the decision-making process

Context-aware pattern classification^:
• Forward loop of the DDDAS architecture
• Complexity in the test phase:
  – Time complexity: $O(|\mathbb{C}|)$
  – Space complexity: $O(|\mathbb{C}|)$

Context-aware sensor selection*:
• Inverse (feedback) loop of the DDDAS architecture
• Complexity in the test phase:
  – Time complexity: $O(1)$
  – Space complexity: $O(K^{|\mathbb{C}|-1})$ (where $K$ is typically 20-30)
^S. Phoha, N. Virani, P. Chattopadhyay, S. Sarkar, B. Smith, and A. Ray, "Context-aware dynamic data-driven pattern classification," Procedia Computer Science, vol. 29, pp. 1324-1333, Elsevier, 2014.
*N. Virani, J.-W. Lee, S. Phoha, and A. Ray, "Dynamic context-aware sensor selection for sequential hypothesis testing," in 53rd IEEE Conference on Decision and Control, pp. 6889-6894. IEEE, 2014.
Compression of context set: Graph-theoretic Clustering
[Figure: Machine-derived context set $\mathbb{C}(x) = \{c_1, \dots, c_7\}$ partitioned into two clusters.]
Partition the machine-derived context set by choosing a desired level of acceptable error ($\varepsilon$).
Example: $\mathbb{C}(x) = \{c_1, c_2, c_3, c_4, c_5, c_6, c_7\}$ is compressed to $\mathbb{M}(x) = \{m_1, m_2\}$, with $m_1 = \{c_1, c_2, c_3, c_4\}$ and $m_2 = \{c_5, c_6, c_7\}$, reducing the cardinality from $|\mathbb{C}(x)| = 7$ to $|\mathbb{M}(x)| = 2$.
Algorithm for cardinality reduction by clustering
• Graph construction: build a complete graph with vertex set $V = \mathbb{C}(x)$ and assign each edge the weight $w_{ij} = w_{ji} = w(c_i, c_j) = d\!\left(p(y \mid x, c_i),\, p(y \mid x, c_j)\right)$
• Thresholding: remove all edges with $w_{ij} \geq \varepsilon$
• Maximal clique enumeration: find all maximal cliques of the resulting graph
• Minterms: compute all set differences and intersections of the maximal cliques (see the sketch after the example below)
[Figure: Complete graph on $\mathbb{C}(x) = \{c_1, c_2, c_3, c_4, c_5\}$ with edge weights $w_{12}, w_{13}, \dots, w_{45}$.]
Example: $\mathbb{C}(x) = \{c_1, c_2, c_3, c_4, c_5\}$
Maximal cliques: $\mathcal{M}(x) = \{\{c_1, c_2, c_5\}, \{c_2, c_4\}, \{c_3\}\}$
Minterm partition: $\mathbb{M}(x) = \{\{c_1, c_5\}, \{c_2\}, \{c_4\}, \{c_3\}\}$
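The clustering step can be sketched as follows, assuming the pairwise distances between contextual observation densities have already been computed; the names (`contexts`, `dist`, `eps`) are illustrative, and `networkx` supplies the maximal clique enumeration.

```python
# Minimal sketch of graph-theoretic compression of a context set.
import itertools
import networkx as nx

def compress_context_set(contexts, dist, eps):
    """contexts: list of context labels, e.g. ["c1", ..., "c5"]
    dist: dict mapping frozenset({ci, cj}) -> distance between
          p(y | x, ci) and p(y | x, cj)
    eps:  acceptable level of approximation error (edge threshold)"""
    g = nx.Graph()
    g.add_nodes_from(contexts)
    # Keep only edges whose densities are closer than eps.
    for ci, cj in itertools.combinations(contexts, 2):
        if dist[frozenset((ci, cj))] < eps:
            g.add_edge(ci, cj)
    cliques = [frozenset(q) for q in nx.find_cliques(g)]  # maximal cliques
    # Minterms: contexts that belong to exactly the same set of maximal
    # cliques end up in the same cell of the partition.
    cells = {}
    for c in contexts:
        key = frozenset(q for q in cliques if c in q)
        cells.setdefault(key, set()).add(c)
    return list(cells.values())
```

On the example above, this returns the partition {{c1, c5}, {c2}, {c4}, {c3}}.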
Approximate context-aware measurement models
Given $\mathbb{C}(x)$ and $p(y \mid x, c)$, $\forall c \in \mathbb{C}(x)$, $\forall x \in \mathcal{X}$:
Cluster membership: each context $c \in \mathbb{C}(x)$ belongs to exactly one cluster $m \in \mathbb{M}(x)$
Context priors: $\Pr(m \mid x) = \sum_{c \in m} \Pr(c \mid x)$
Context-aware measurement models: $p(y \mid x, m) = \frac{1}{\Pr(m \mid x)} \sum_{c \in m} \Pr(c \mid x)\, p(y \mid x, c)$
Approximation: replace each cluster's mixture model with a single component, $p(y \mid x, m) \approx p(y \mid x, c_m^*)$ for a representative $c_m^* \in m$
Theorem 1 (bound on the error in density estimation and the choice of $c^*$)
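As a small illustration of these quantities, the sketch below aggregates cluster priors, forms the cluster mixture, and picks a single representative component; choosing the highest-prior context as $c^*$ is an assumption made here for illustration (Theorem 1 governs the actual choice).

```python
# Minimal sketch: build an approximate cluster-level measurement model.
# `priors` and `densities` follow the earlier sketches; `cluster` is one
# cell of the partition M(x), e.g. {"c1", "c5"}; i indexes the sensor.
def cluster_model(cluster, x, priors, densities, i=0):
    pm = sum(priors[x][c] for c in cluster)  # cluster prior Pr(m | x)
    def density(y):  # exact cluster mixture p(y_i | x, m)
        return sum(priors[x][c] * densities[i](y, x, c) for c in cluster) / pm
    # Single-component approximation: keep only a dominant context c*
    # (max-prior choice is an illustrative assumption).
    c_star = max(cluster, key=lambda c: priors[x][c])
    return pm, density, c_star
```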
Graph-theoretic clustering-based compression:
Pros:
• Approximation error is directly the design parameter
• Based on well-established concepts from graph theory
• Ensures representation from all regions of the information space
Cons:
• The relationship between the error threshold $\varepsilon$ (design parameter) and the cardinality of the compressed context set $\mathbb{M}(x)$ is not known a priori
• Computationally expensive if $|\mathbb{C}(x)|$ is large (exponential in the worst case)

Can we find an alternative approach that does not have these limitations?
Compression of a finite set: Subset selection
Select only a k-subset of the machine-derived context set.
[Figure: Machine-derived context set $\mathbb{C}(x) = \{c_1, \dots, c_7\}$ and a selected $k$-subset, e.g. $S_3(x) = \{c_2, c_3, c_5\}$.]
Example of nested $k$-subsets: $S_1(x) = \{c_5\}$, $S_2(x) = \{c_2, c_5\}$, $S_3(x) = \{c_2, c_3, c_5\}$, ..., $S_7(x) = \mathbb{C}(x)$
For $k = 2$, the cardinality is reduced from $|\mathbb{C}(x)| = 7$ to $|S_2(x)| = 2$.
Error in density estimation due to subset selection
Original density: the mixture over the full context set $\mathbb{C}(x)$; compressed density: the mixture restricted to the $k$-subset $S_k(x)$.
Supremum norm: $\lVert p - \hat{p} \rVert_{\infty} = \sup_{y} \lvert p(y \mid x) - \hat{p}(y \mid x) \rvert$
Theorem 2 (bound on the error in density estimation)
Optimal $k$-subset and optimal $k$
Choose the first $k$ machine-derived contexts as the optimal $k$-subset.
To choose the optimal $k$, we can perform a multi-objective optimization that minimizes the approximation error as well as the model complexity, where model complexity can be measured as time or space complexity in the test phase.
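As a concrete illustration, here is a minimal sketch of ranking the contexts and trading off error against test-phase complexity; the error proxy (total prior mass of the dropped contexts) and the weight `lam` are simplifying assumptions, not the bound of Theorem 2.

```python
# Minimal sketch: choose k by a weighted error/complexity trade-off.
def select_k_subset(priors, lam=0.05):
    """priors: dict mapping context -> Pr(c | x); lam weights complexity."""
    ranked = sorted(priors, key=priors.get, reverse=True)  # sort by prior
    best_cost, best_k = float("inf"), 1
    for k in range(1, len(ranked) + 1):
        dropped_mass = sum(priors[c] for c in ranked[k:])  # error proxy
        cost = dropped_mass + lam * k  # k reflects O(k) test-phase cost
        if cost < best_cost:
            best_cost, best_k = cost, k
    return best_k, ranked[:best_k]

# Example: select_k_subset({"c1": 0.45, "c2": 0.30, "c3": 0.15, "c4": 0.10})
```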
Graph-theoretic clustering-based compression:
Pros:
• Approximation error is directly the design parameter
• Based on well-established concepts from graph theory
• Ensures representation from all regions of the information space
Cons:
• The relationship between the error threshold $\varepsilon$ (design parameter) and the cardinality of the compressed context set $\mathbb{M}(x)$ is not known a priori
• Computationally expensive if $|\mathbb{C}(x)|$ is large (exponential in the worst case)

Subset selection-based compression:
Pros:
• The relationship between the subset size $k$ (design parameter) and the cardinality of the compressed context set $S_k(x)$ is known a priori
• Computationally inexpensive (same complexity as a sorting algorithm)
Cons:
• Does not ensure representation from all regions of the information space
Test procedure
• Cross-validation: 60% training / 40% testing, repeated 10 times
• Symbolic time-series analysis for feature extraction (see the sketch after this list):
  – Discrete Markov model with alphabet size $|\Sigma| = 6$, depth $D = 2$, and 7 states
  – The stationary distribution of the Markov chain is used as the feature ($\mathbb{R}^{2000} \to \mathbb{R}^{7}$)
  – PCA is used to reduce $\mathbb{R}^{6} \to \mathbb{R}^{2}$
• Learning the context set using 2 seismic sensors:
  – Gaussian kernel with parameter 0.01
  – Cardinality of the learned context set:
• Maximum likelihood classifier: $\hat{x} = \arg\max_{x} p(y_1, \dots, y_n \mid x)$
• Base case result: 99.78% accuracy
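To make the feature-extraction pipeline concrete, below is a minimal sketch of the symbolization and stationary-distribution steps; the equal-frequency partitioning and the first-order ($D = 1$) transition counts are simplifying assumptions for illustration, not the exact $D = 2$ model used here.

```python
# Minimal sketch: symbolic time-series features from a raw signal.
import numpy as np

def symbolize(signal, n_symbols=6):
    # Equal-frequency partitioning into n_symbols bins.
    edges = np.quantile(signal, np.linspace(0, 1, n_symbols + 1)[1:-1])
    return np.digitize(signal, edges)  # symbols in {0, ..., n_symbols - 1}

def stationary_feature(symbols, n_states=6):
    # Laplace-smoothed symbol-to-symbol transition counts.
    counts = np.ones((n_states, n_states))
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1
    P = counts / counts.sum(axis=1, keepdims=True)  # row-stochastic matrix
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    return pi / pi.sum()  # feature vector on the probability simplex
```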
Results on cardinality reduction by clustering
[Figure: Context-set cardinality and classification accuracy after compression by clustering.]
Accuracy is the same as the base case, but with a greatly reduced context-set cardinality.

Results on cardinality reduction by subset selection
[Figure: Context-set cardinality and classification accuracy after compression by subset selection.]
Accuracy is the same as the base case, but with a greatly reduced context-set cardinality.
Conclusion
Two techniques were developed for cardinality reduction of machine-derived context sets:
• Maximal clique enumeration (graph-theoretic clustering)
• Subset selection
Upper bounds on the approximation error due to compression were derived.
Demonstrated cardinality reduction of context sets on seismic sensor data:
• Accuracy does not decrease significantly with cardinality reduction

Thank you