
Page 1

Eiman Elnahrawy WSNA’03

Cleaning and Querying Noisy Sensors

Eiman Elnahrawy and Badri Nath, Rutgers University

WSNA September 2003

This work was supported in part by NSF grant ANI-0240383 and DARPA under contract number N-666001-00-1-8953

Page 2

I can’t rely on this sensor data anymore. It has too many problems!
  – Noise
  – Bias
  – Missing information
  – Hmm, is this a malicious sensor?
  – Something strange, or a sensor gone bad?

Page 3

Outline

• Motivation
• General Framework
• Cleaning Noise
• Querying Noisy Sensors Statistically
• Preliminary Evaluations
• Challenges and Future Work
• Conclusion

Page 4

Motivation

• Measurements are subject to many sources of error

• Systematic errors → bias, addressed by calibration [Bychkovskiy03]

• Random errors (noise): external, uncontrollable environmental factors, hardware, inaccuracy/imprecision

• Current technology: cheap, noisy sensors that vary in tolerance, precision, and accuracy

• The industry is focusing on even cheaper sensors, which are noisier; noise varies with the cost of the sensor

Page 5

So What?

• Uncertainty

• Interest is generally in queries over a set of noisy sensors
  – Predicate/range queries
  – Aggregates: SUM, MIN
  – Other

• Error accumulation seriously affects decision-making and triggers

• False positives/negatives

• Misleading answers

• May cost you money


Page 6

Problem Definition

• Research has focused on homogeneous sensors, in-network aggregation, query languages, and optimization

• These primitives now work fairly well, so why not move on to more complex data-quality problems?

• If the collected data or query results are erroneous or misleading, why would we need such networks?

• Given any query and user-defined confidence metrics, how do we answer the query “efficiently” using noisy sensors?

• What is the effect of noise on queries?

Page 7

Is this a new problem?

• Traditional databases
  – Data entry, transactional activity
  – Clean data: no noise
  – Supervised off-line cleaning

• Sensors
  – Streaming data
  – Decision-making in real time
  – Online cleaning and query processing
  – Many resource constraints

Page 8

General Framework

• Two steps:
  – Online cleaning. Inputs: noisy data + error models + prior knowledge; output: uncertainty models (clean data)
  – Query evaluation on the clean data (uncertainty models)

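[Figure: framework diagram. Noisy observations from sensors, error models, and prior knowledge feed the Cleaning Module, which produces uncertainty models (posteriors); the Query Processing Module evaluates the user query on these models and returns the query answer.]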

Page 9

• Observation: noisy reading from the sensor
• Prior knowledge: r.v., distribution of the true reading
  – Facts, learning, using less noisy sensors as priors for noisier ones, experts, dynamic (parametric model)
• Error model: r.v., noise characteristic
  – Any appropriate distribution, e.g., Gaussian
  – Heterogeneity → a model for each type, or even each individual sensor
• Uncertainty model (true unknown): r.v. with a distribution that we would like to estimate

[Figure: the Cleaning Module takes noisy observations from sensors, error models, and prior knowledge as inputs and produces uncertainty models (posteriors).]

Page 10

Cleaning

• Single-sensor fusion using Bayes’ rule: posterior = (likelihood × prior) / evidence
• Single-attribute sensors

• Example: a Gaussian prior (μs, σs²) and Gaussian error (0, δ²) yield a Gaussian posterior (uncertainty model)
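For the Gaussian example above, Bayes’ rule has a closed form. The sketch below assumes a prior N(μs, σs²) on the true reading and zero-mean Gaussian noise with variance δ², in which case an observation x gives a Gaussian posterior with mean (σs²·x + δ²·μs)/(σs² + δ²) and variance σs²·δ²/(σs² + δ²). The function name `gaussian_posterior` is hypothetical and not taken from the paper.

```python
def gaussian_posterior(x, prior_mean, prior_var, noise_var):
    """Fuse one noisy reading x with a Gaussian prior (hypothetical helper).

    Assumes prior N(prior_mean, prior_var) on the true reading and additive
    noise N(0, noise_var). Returns (posterior_mean, posterior_var).
    """
    # Weight on the observation grows as the noise shrinks relative to the prior.
    gain = prior_var / (prior_var + noise_var)
    # Equals (prior_var * x + noise_var * prior_mean) / (prior_var + noise_var).
    post_mean = prior_mean + gain * (x - prior_mean)
    # Equals prior_var * noise_var / (prior_var + noise_var).
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var
```

For example, `gaussian_posterior(1030.0, 1000.0, 100.0, 300.0)` returns `(1007.5, 75.0)`: the noisy reading is pulled toward the prior mean, and the posterior variance is smaller than both the prior variance and the noise variance.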

Page 11

Cleaning

• Multi-attribute sensors

• Example: a Gaussian prior (μs, Σs) and Gaussian error (0, Σ) yield a Gaussian posterior (uncertainty model)

• The terms Σs[Σs + Σ]⁻¹ and ΣT are computed off-line
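The multi-attribute case follows the same pattern with matrices. A minimal sketch, assuming a prior N(μs, Σs) and zero-mean Gaussian noise with covariance Σ: the gain K = Σs[Σs + Σ]⁻¹ is the off-line term on the slide, and the posterior covariance Σs − KΣs likewise does not depend on the reading, so the sketch precomputes both. Only the posterior mean μs + K(x − μs) is computed per reading. The helper names are hypothetical, and the tiny pure-Python matrix routines stand in for a linear-algebra library.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def precompute(prior_cov, noise_cov):
    """Off-line step: gain K = Σs [Σs + Σ]^-1 and posterior covariance Σs - K Σs."""
    n = len(prior_cov)
    total = [[prior_cov[i][j] + noise_cov[i][j] for j in range(n)] for i in range(n)]
    gain = mat_mul(prior_cov, mat_inv(total))
    k_sigma = mat_mul(gain, prior_cov)
    post_cov = [[prior_cov[i][j] - k_sigma[i][j] for j in range(n)] for i in range(n)]
    return gain, post_cov


def posterior_mean(x, prior_mean, gain):
    """On-line step: μs + K (x - μs) for one multi-attribute reading x."""
    innovation = [[xi - mi] for xi, mi in zip(x, prior_mean)]
    shift = mat_mul(gain, innovation)
    return [mi + s[0] for mi, s in zip(prior_mean, shift)]
```

With diagonal covariances this reduces to the single-attribute update applied to each attribute separately; correlations in Σs let one attribute’s reading refine the estimate of another.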

Page 12

• Classification of queries
  – What is the reading(s) of sensor x? Single Source Queries (SSQ)
  – Which sensors have at least a c% chance of satisfying a given predicate? Set Non-Aggregate Queries (SNAQ)
  – Over the sensors that have at least a c% chance of satisfying a given predicate, what is the value of a given aggregate?
    • Summary Aggregate Queries (SUM, AVG, COUNT): SAQ
    • Exemplary Aggregate Queries (MIN, MAX, etc.): EAQ

[Figure: the Query Processing Module takes the user query and the uncertainty models (posteriors) and produces the query answer.]

Page 13

Single Source Queries

• Approach 1: output the expected value of the probability distribution

• Approach 2: output a p% confidence interval [μs − ε, μs + ε] using Chebyshev’s inequality

  – p is user-defined, with a default value, e.g., 95%

• Multi-attribute: first compute the marginal pdf of each attribute, then proceed as above
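Approach 2 can be sketched as follows. Chebyshev’s inequality gives P(|X − μ| ≥ ε) ≤ σ²/ε², so choosing ε = σ/√(1 − p) yields an interval that contains the true value with probability at least p for any distribution with mean μ and variance σ². The sketch assumes the posterior mean and variance are already known (Approach 1 simply returns that mean); the name `chebyshev_interval` is hypothetical.

```python
from math import sqrt


def chebyshev_interval(mean, var, p=0.95):
    """Return (mean - eps, mean + eps) covering the value with probability >= p.

    Chebyshev: P(|X - mean| >= eps) <= var / eps**2. Setting the bound to 1 - p
    gives eps = sqrt(var) / sqrt(1 - p), valid for any distribution.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    eps = sqrt(var) / sqrt(1.0 - p)
    return mean - eps, mean + eps
```

Because the bound is distribution-free it is conservative: at p = 95% it gives μ ± 4.47σ, whereas the exact interval for a Gaussian posterior is μ ± 1.96σ.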

Page 14

Set Non-Aggregate Queries

• Output: sensor id and confidence (p_i)

• Confidence p_i = probability that the sensor satisfies the given predicate (range R); report the sensor if p_i ≥ the user-defined confidence

p_i = ∫_R p_{s_i}(t) dt

• {s_i} = S_R, the eligible set

• If the readings are required, compute them using the SSQ algorithms

• Multi-attribute: compute S_R over a region rather than a single interval
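For a Gaussian uncertainty model, the integral p_i = ∫_R p_{s_i}(t) dt over a range R = [a, b] reduces to a difference of normal CDFs. A minimal sketch, assuming independent Gaussian posteriors stored in a hypothetical `{sensor_id: (mean, var)}` mapping and the user confidence c given as a fraction (0.9 for 90%):

```python
from math import erf, sqrt


def normal_cdf(t, mean, var):
    """CDF of N(mean, var) evaluated at t."""
    return 0.5 * (1.0 + erf((t - mean) / sqrt(2.0 * var)))


def snaq(posteriors, low, high, c):
    """Return [(sensor_id, p_i)] for sensors with P(low <= s_i <= high) >= c.

    posteriors maps sensor id -> (mean, var) of a Gaussian uncertainty model;
    the returned ids form the eligible set S_R.
    """
    eligible = []
    for sid, (mean, var) in posteriors.items():
        p_i = normal_cdf(high, mean, var) - normal_cdf(low, mean, var)
        if p_i >= c:
            eligible.append((sid, p_i))
    return eligible
```

For other posterior shapes, the CDF difference would be replaced by numerical integration of the density over R.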

Page 15

Summary Aggregate Queries

• SUM: compute the sum of independent continuous r.v.s

• Z = sum(s1, s2, …, sm)

• Convolve the distributions of two sensors, then repeatedly add one more sensor from the eligible set (S_R)

• Output the expected value or a p% confidence interval of the overall sum
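A sketch of the repeated convolution, assuming each posterior has been discretized onto a shared grid with spacing `step` and stored as a hypothetical `(lo, pmf)` pair, where `pmf[k]` is the probability mass at `lo + k*step`. Convolving two such vectors gives the distribution of their sum; folding in one eligible sensor at a time gives the distribution of Z, from which the expected value (or, via the variance, a Chebyshev interval) can be reported. The discretization is an assumption made for illustration.

```python
def convolve(p, q):
    """Discrete convolution of two probability mass vectors on the same grid."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def sum_distribution(sensors):
    """Fold (lo, pmf) pairs one sensor at a time; returns (lo, pmf) of the sum."""
    lo, pmf = sensors[0]
    for s_lo, s_pmf in sensors[1:]:
        lo += s_lo              # grid origin of the sum is the sum of origins
        pmf = convolve(pmf, s_pmf)
    return lo, pmf


def mean_var(lo, pmf, step):
    """Expected value and variance of a discretized distribution."""
    values = [lo + k * step for k in range(len(pmf))]
    mean = sum(v * w for v, w in zip(values, pmf))
    var = sum((v - mean) ** 2 * w for v, w in zip(values, pmf))
    return mean, var
```

For Gaussian posteriors the convolution has a closed form (means and variances add), so the numerical loop is only needed for other distributions.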

Page 16

Summary Aggregate Queries

• COUNT: output |S_R| for the given predicate

• AVG: output SUM/COUNT

• Multi-attribute: compute S_R, marginalize over the aggregated attribute, then proceed as above
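Continuing the Gaussian case, a sketch of COUNT and AVG, assuming the eligible set S_R has already been computed and its sensors are independent with Gaussian posteriors given as `(mean, var)` pairs. Because S_R is fixed once the predicate is evaluated, COUNT = |S_R| is a constant, and AVG = SUM/COUNT scales the mean of the sum by 1/n and its variance by 1/n². The helper name `count_and_avg` is hypothetical.

```python
def count_and_avg(eligible):
    """Return (COUNT, (AVG mean, AVG variance)) for independent Gaussian posteriors.

    eligible: list of (mean, var) for the sensors in S_R. AVG is None when S_R is empty.
    """
    n = len(eligible)
    if n == 0:
        return 0, None
    # SUM of independent Gaussians: means add and variances add.
    sum_mean = sum(m for m, _ in eligible)
    sum_var = sum(v for _, v in eligible)
    return n, (sum_mean / n, sum_var / n ** 2)
```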

Page 17

Exemplary Aggregate Queries

• MIN: compute the minimum of independent continuous r.v.s

• Z = min(s1, s2, …, sm)

• Output the expected value or a p% confidence interval
• Other order statistics (MAX, Top-K, Min-K, and median) are computed in a similar manner
• Multi-attribute: analogous
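The minimum of independent random variables has CDF F_Z(z) = 1 − Π_i (1 − F_i(z)). A sketch, assuming Gaussian posteriors as `(mean, var)` pairs: the expected value uses E[Z] ≈ lo + ∫_lo^hi (1 − F_Z(z)) dz on an interval where F_Z(lo) ≈ 0 and F_Z(hi) ≈ 1, and quantiles for a p% interval come from bisection on F_Z. The ±10σ bounds, the trapezoidal rule, and all function names are choices made here, not taken from the paper.

```python
from math import erf, sqrt


def normal_cdf(t, mean, var):
    """CDF of N(mean, var) evaluated at t."""
    return 0.5 * (1.0 + erf((t - mean) / sqrt(2.0 * var)))


def min_cdf(z, posteriors):
    """F_Z(z) for Z = min(s1, ..., sm) with independent sensors: 1 - prod(1 - F_i(z))."""
    survive = 1.0
    for mean, var in posteriors:
        survive *= 1.0 - normal_cdf(z, mean, var)
    return 1.0 - survive


def _bounds(posteriors):
    """Interval wide enough that F_Z is ~0 at the left end and ~1 at the right end."""
    lo = min(m - 10.0 * sqrt(v) for m, v in posteriors)
    hi = max(m + 10.0 * sqrt(v) for m, v in posteriors)
    return lo, hi


def min_expected(posteriors, n_steps=20000):
    """E[Z] = lo + integral_lo^hi (1 - F_Z(z)) dz, via the trapezoidal rule."""
    lo, hi = _bounds(posteriors)
    h = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * (1.0 - min_cdf(lo + k * h, posteriors))
    return lo + h * total


def min_quantile(q, posteriors, tol=1e-9):
    """Smallest z with F_Z(z) >= q, found by bisection."""
    lo, hi = _bounds(posteriors)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if min_cdf(mid, posteriors) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A central p% interval is then `[min_quantile((1 - p) / 2, ps), min_quantile((1 + p) / 2, ps)]`. MAX follows by using F_max(z) = Π_i F_i(z) in place of `min_cdf`.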

Page 18

Tradeoffs: “Sensors” vs. “Database”

• Sensor level
  – Storage cost
  – Communication cost (“sending priors”)
  – Processing cost (“computing posteriors”)
  – Advantage: point estimates, in-network aggregation with error bounds

• Database level
  – Zero cost, assuming free processing and storage
  – Communication cost saved
  – Exact query answer
  – Disadvantage: no distributed query processing

Page 19

Evaluations

• Synthetic data
• “Unknown” true readings
  – 1000 sensors, drawn at random from 5 clusters
  – Gaussian, μ = 1000, 2000, 3000, 4000, 5000, δ² = 100
• Noisy data (raw data)
  – Added random Gaussian noise, μ = 0, at different noise levels
• Posteriors (Bayesian data)
  – Prior: the distribution of the cluster that generated the reading
• Predicates: 500 random range queries at each noise level, with the error averaged
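The data-generation part of this setup can be sketched as below, restricted to the single-source MSE comparison (the 500 range queries are omitted). It follows the description above: 1000 sensors, each drawn from one of five Gaussian clusters with δ² = 100, Gaussian noise added to the true reading, and the generating cluster used as the prior. The noise variances and the seed are arbitrary choices for illustration, and the function name `simulate` is hypothetical; the sketch does not reproduce the paper’s reported numbers.

```python
import random

CLUSTER_MEANS = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
PRIOR_VAR = 100.0  # delta^2 = 100 for every cluster, as in the setup above


def simulate(noise_var, n_sensors=1000, seed=0):
    """Return (raw_mse, bayes_mse) of single-source answers at one noise level."""
    rng = random.Random(seed)
    gain = PRIOR_VAR / (PRIOR_VAR + noise_var)  # Gaussian posterior weight on the reading
    raw_se = bayes_se = 0.0
    for _ in range(n_sensors):
        mu = rng.choice(CLUSTER_MEANS)                    # cluster generating the reading
        true = rng.gauss(mu, PRIOR_VAR ** 0.5)            # "unknown" true reading
        noisy = true + rng.gauss(0.0, noise_var ** 0.5)   # raw data
        post_mean = mu + gain * (noisy - mu)              # posterior mean, cluster as prior
        raw_se += (noisy - true) ** 2
        bayes_se += (post_mean - true) ** 2
    return raw_se / n_sensors, bayes_se / n_sensors
```

For example, `for noise_var in (100.0, 400.0, 900.0): print(noise_var, simulate(noise_var))` compares raw and Bayesian MSE across noise levels.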

Page 20

• Single source queries
  – Metric is MSE
  – Reduces uncertainty and yields far smaller errors
  – Error scaled down by a factor of δp² / (δp² + δn²)
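The scaling factor can be derived from the single-sensor Gaussian update. Assume a prior with mean μs and variance δp², a true reading θ drawn from it, and an observation x = θ + n with noise n ~ N(0, δn²) independent of θ. The raw reading’s MSE is δn², while the posterior mean θ̂ has an MSE smaller by exactly the factor above:

```latex
\begin{aligned}
\text{raw:}\quad & \mathbb{E}\big[(x - \theta)^2\big] = \delta_n^2,\\
\text{posterior:}\quad & \hat\theta = \mu_s + g\,(x - \mu_s),\qquad g = \frac{\delta_p^2}{\delta_p^2 + \delta_n^2},\\
& \hat\theta - \theta = (1 - g)(\mu_s - \theta) + g\,(x - \theta),\\
& \mathbb{E}\big[(\hat\theta - \theta)^2\big] = (1 - g)^2\,\delta_p^2 + g^2\,\delta_n^2
  = \frac{\delta_p^2\,\delta_n^2}{\delta_p^2 + \delta_n^2}
  = \delta_n^2 \cdot \frac{\delta_p^2}{\delta_p^2 + \delta_n^2}.
\end{aligned}
```

The cross term vanishes because the noise x − θ is independent of θ and has zero mean.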

Page 21

• Set non-aggregate queries: prior δ = 10
  – Metrics are precision and recall
  – Recall: fraction of relevant objects that are retrieved
  – Precision: fraction of retrieved objects that are relevant
  – Higher recall and precision (fewer false negatives and false positives, respectively) are better
  – Maintained high recall and precision at different confidence levels: 95% versus 70% for noisy readings
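Given the set of sensors a query retrieves and the set whose true reading actually satisfies the predicate (known in the synthetic setup), both metrics follow directly from the definitions above. A minimal sketch; the function name and the convention of returning 1.0 for an empty set are assumptions:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a set query.

    retrieved: sensor ids the query returned.
    relevant: sensor ids whose true reading satisfies the predicate.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Empty sets are treated as perfect by convention (an assumption).
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall
```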

Page 22

• Summary aggregate queries: prior δ = 10
  – Metric is absolute error
  – More accurate priors yield smaller error
  – SUM: noisy readings caused four times the error
  – COUNT: 2 versus 14 for noisy data

Page 23

Challenges and Future Work

• Prototype and more evaluations on real data
• Just scratched the surface!
  – Other estimation techniques
  – Other uncertainty problems: outliers, missing data, etc.
  – Other queries
  – Effect of noise on queries

• “Efficient” distributed query processing

Page 24

Challenges and Future Work

• Given a query and specific quality requirements (confidence, number of false positives/negatives), what should we do if the confidence cannot be satisfied?
  – Sensors are not homogeneous
  – Change the sampling method at run time
  – Turn on “specific” sensors at run time
  – Routing
  – Up-to-date metadata about sensors’ resources/characteristics
  – Cost and query optimization

Page 25

Conclusion

• Taking noise into consideration is important
• Single-sensor fusion
• Statistical queries
• Works well in preliminary evaluations on synthetic data
• Many open problems and future work directions

Page 26

Thank You