Query Processing for the Semantic Sensor Web
Antonios Deligiannakis
Technical University of Crete
Vision for Semantic Sensor Networks
Universal, web-based access to sensor data
– Simpler to consider collections of sensor networks
• Each network with some kind of authority and administration
Of Course, Nothing Comes Easy…
Requires additional info for collected data
– Location/orientation of sensor, time, authority, measured quantities, units, errors etc.
• Some of them are static, some may change (time, location…)
– Additional info may significantly impact volume of transmitted data
Requires proper languages for querying data
– Query execution still needs to be optimized within each network
Large Scale Querying
“Semantic Reality - Connecting the Real and the Virtual World”
– Manfred Hauswirth, Stefan Decker
“Query processing, reasoning, and planning based on real world sensor information bases will be core functionalities to exploit the full potential of Semantic Reality. However, the size and the physical distribution of data will require new approaches which will have to trade logical correctness with statistical”
More complex, large-scale processing:
– Issued queries are transformed
– Relevant networks are identified
– Queries issued over data of individual networks
• Query may involve log of historical data extracted from each network; or
• Online query: Its execution still needs to be optimized within each network
– Results annotated, combined
Topic of our talk
Data Collection – Potential Queries
Collecting all data (SELECT * queries)
– Entire network or a subregion, periodically or frequently
Collect aggregates of data
– Report AVG, SUM, MAX quantities in an area/network
Data Reduction based on user-specified data quality
– In all types of queries (historical, aggregate…)
– Minimize bandwidth based on quality (or the dual problem)
Detecting Outliers
– “Strange” readings: Interesting phenomenon or malfunction?
Joins
– Report information based on combined readings of sensors (e.g., report when a lion is close to a deer)
– Harder to optimize. Naïve solution of sending potential joining tuples (or projected attributes of them) to base station is often not far from best case
One-shot vs Continuous Queries
One-shot Queries
– Ask a query, get results, DONE
Continuous Queries
– Perform a query
• Specify how OFTEN it should be executed (query epoch)
• Specify until WHEN it should run (optional)
– “Report avg temp per room every 30 sec, for the next hour”
– “Report all measurements of sensor nodes in Room X every 1 min”
– More typical for monitoring applications
– More data, more chances of doing something clever…
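To make the epoch/lifetime distinction concrete, here is a minimal Python sketch of the first example query ("avg temp per room every 30 sec, for the next hour"). It uses simulated time rather than real sleeping, and `sample` is a hypothetical callback standing in for the sensor network; neither the function names nor the data layout come from the talk.

```python
from statistics import mean

def epochs(epoch_s, duration_s):
    """Yield the start time (in seconds) of every epoch within the query lifetime."""
    t = 0
    while t < duration_s:
        yield t
        t += epoch_s

def avg_temp_per_room(readings):
    """readings: list of (room, temperature) pairs collected in one epoch."""
    rooms = {}
    for room, temp in readings:
        rooms.setdefault(room, []).append(temp)
    return {room: mean(temps) for room, temps in rooms.items()}

def run_continuous_query(sample, epoch_s=30, duration_s=3600):
    """Evaluate the query once per epoch; sample(t) is a hypothetical sensor reader."""
    return [(t, avg_temp_per_room(sample(t))) for t in epochs(epoch_s, duration_s)]
```

A one-shot query is the special case of a single epoch; the continuous form is what gives the network repeated, predictable work that it can schedule around.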
Data Collection – Types of Queries
Pull vs Push Based Queries
– Pull-based queries: sensors transmit data only in response to queries that request them
– Push-based queries: transmit data to cache node proactively
• If I know that someone will likely request it soon, transmit to avoid query propagation, organization etc.
– Tradeoff based on how often data is requested by different queries
• Hybrid Push-Pull Query Processing for Sensor Networks
– Niki Trigoni, Yong Yao, Alan Demers, Johannes Gehrke, Rajmohan Rajaraman
Data Collection – Types of Queries
Pull vs Push Based Queries
– Data are usually pulled based on user queries
– Exceptions for queries to historical data
• Node may need to push its batch of latest measurements if memory becomes full
– Metadata changes likely need to be pushed to some external directory
• Allows queries to be performed based on “correct” knowledge
• Is this overkill for metadata that change constantly?
– E.g., time of acquired measurements is common
– Careful organization helps. E.g., time can sometimes be inferred
– Possible to combine both approaches
Outline
Sensor Nodes
– Brief Intro: Parts of sensors, capabilities, constraints
Techniques for Query Processing
– Topologies for Data Collection
– Data Reduction based on user-specified data quality
– Detecting Outliers
Conclusions
Parts of a Sensor
Sensing equipment (sensor and data acquisition boards)
– Internal (“built-in”) vs external sensing capabilities
CPU
Memory
Battery
– Some sensors may gather energy from the sun, vibrations etc.
Radio to transmit/receive data from other sensors
Sensor Parts - Example
| Part | Berkeley Mica2 | Stargate (Intel PXA255 CPU) | Constraint |
|---|---|---|---|
| Battery | 2 AA | Li-Ion | Conserve to increase network lifetime |
| CPU | 7.38 MHz | 400 MHz | Computationally cheap algorithms |
| Memory | 4 KB SRAM, 512 KB EEPROM | up to 256 MB FLASH | Algorithms with low memory requirements |
| Radio | 300 meters | Depends on radio model | Transmission range, bandwidth (bits/sec) |

Main constraint: battery (energy)
Energy Constraints
Battery capacity increases by only 3-5% per year
– CPU speed increases much faster
• However, energy per CPU instruction decreases
Some applications: unattended deployment
– E.g., disaster scenarios, military environments…
– Often hard or impossible to replace batteries
Maximizing network lifetime is the main target
– Cost-effective only if sensor networks last long
– Applications with sensors without power constraints are much easier to handle
Sources of Energy Drain
CPU computations
Measurements from sensing equipment (cost depends on what you sense)
Very small energy consumption in sleep mode
Radio is main source
– cost(transmitting) > cost(receiving) ≥ cost(idle listening)
– Popular goal: reduce #transmitted bits
– Synchronization + communication protocols equally important
• E.g., cost of transmitting K bits depends on duty cycle (percentage of time sensor is awake to listen for data)
• Idle listening for too long is extremely costly
Assumptions and Goals in Subsequent Algorithms
Research emphasis on more constrained environments
– Wireless communication, short transmission ranges
• One or more base stations with increased capabilities may exist
• Candidates for gateways to the semantic sensor web
– Energy limitations (battery-powered sensors)
Goal of algorithms in all applications that follow:
– Preserve energy
– Organize sensors and their schedules
• Good schedules allow sensors to power down their radios/CPUs and go into a sleep mode
– Reduce size of transmitted data
Processing (esp. aggregation) focuses on numeric measurements
Implication of having a strict schedule on when to collect data: base station knows when quantities are collected
– Such metadata may not even need to be transmitted
Outline
Sensor Nodes
– Brief Intro: Parts of sensors, capabilities, constraints
Techniques for Query Processing
– Topologies for Data Collection
– Data Reduction (SELECT * and aggregate queries)
– Detecting Outliers
Conclusions
June 1, 2009 Antonios Deligiannakis 15
1a. Data Collection: TAG
TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks
– Samuel Madden, Michael Franklin, Joseph Hellerstein
Goals
– Specify SQL-type query
– Organize sensors and schedule them to reduce energy consumption
– Emphasizes/uses IN-NETWORK query processing
– Targets aggregate queries, but similar functionality can be used for SELECT * queries
Results of paper incorporated into TinyDB
– Data processing system built on top of TinyOS
TAG Operation
Users pose queries at a base station
Messages flooded towards the sensors
– Reverse aggregation tree is formed
– Each sensor belongs to a level, based on hops to root
Each epoch is (equally) partitioned amongst the levels
– Nodes listen ONLY when children nodes send data
– When nodes transmit, parent node has radio open
– Synchronization allows each node to transmit ONCE per epoch
• Transmission includes aggregate for subtree
[Figure: Base Station, Area A. The picture is from the paper.]
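The level-by-level operation described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the tree is given as a hypothetical child-to-parent map, there is no message loss, and AVG is computed by passing (sum, count) partial states upward. Processing nodes from the deepest level to the shallowest stands in for the per-level time slots of an epoch.

```python
def levels(parent, root):
    """Hop distance of every node from the root, given a child -> parent map."""
    depth = {root: 0}

    def hops(node):
        if node not in depth:
            depth[node] = hops(parent[node]) + 1
        return depth[node]

    for node in parent:
        hops(node)
    return depth

def tag_epoch_avg(parent, root, reading):
    """Simulate one epoch of in-network AVG; returns (average, messages sent)."""
    depth = levels(parent, root)
    # Partial state per node: (sum, count). Nodes without a reading contribute nothing.
    state = {n: (reading.get(n, 0.0), 1 if n in reading else 0) for n in depth}
    messages = 0
    # Deepest level transmits first, so every parent has heard all of its children
    # before its own slot; each node transmits exactly once per epoch.
    for node in sorted(parent, key=lambda n: depth[n], reverse=True):
        s, c = state[node]
        ps, pc = state[parent[node]]
        state[parent[node]] = (ps + s, pc + c)
        messages += 1
    total, count = state[root]
    return total / count, messages
```

With four sensors the base station receives the exact average after four messages, one per node, regardless of how many readings sit below each node.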
TAG Query Language
The picture is from the paper
TAG Contributions
Support of multiple aggregate functions AND group-by queries
– Classification and behavior based on type of function
In-network processing dramatically reduces transmitted data
Synchronization allows sensors to sleep most of the time
Further optimizations for monotonic aggregates
– E.g., MAX aggregate: Don’t transmit aggregate if you overhear a sibling that reports a larger aggregate
Considerations for message loss etc.
1b. Data Collection: WaveScheduling
WaveScheduling: Energy-Efficient Data Dissemination for Sensor Networks
– Niki Trigoni, Yong Yao, Alan J. Demers, Johannes Gehrke, Rajmohan Rajaraman
– Proposed in the Cougar system
Observation: In TAG, nodes at the same level transmit at the same time
– Many collisions, message losses
– Many retransmissions, energy drain
Goal: Organize nodes in order to minimize message collisions
Main Idea: WaveScheduling
Organize nodes in a grid
– Each grid area has a leader
– Nodes within each area transmit data to their leader
– Leaders communicate at specific intervals and directions
[Figure: East Wave example]
WaveScheduling Paths
The large right-turn path has a lower latency for N, E, S, W
They follow a North, East, South, West direction
[Figure: paths from Source to Destination]
WaveScheduling Contributions
Transmissions without collisions
– Much lower energy drain
– But also, significantly larger delays
• Few nodes transmit at each time, long paths to follow
Outline
Sensor Nodes
– Brief Intro: Parts of sensors, capabilities, constraints
Techniques for Query Processing
– Topologies for Data Collection
– Data Reduction (SELECT * and aggregate queries)
– Detecting Outliers
Conclusions
2a. Compressing Historical Measurements
Compressing Historical Information in Sensor Networks
– A. Deligiannakis, Y. Kotidis, N. Roussopoulos
Application: If past measurements are important, they should be periodically transmitted to base station, before the memory is exhausted
Transmitting all measurements is costly
2a. Compressing Historical Measurements
Observation: Sensors may measure several quantities
Potential correlations
– Within the same quantity
• Periodicity, similar trends
– Between different quantities
• Temperature and voltage [Deshpande04], pressure and humidity
– Between different sensors in an area
• E.g., similar temperature and noise levels
Can we take advantage of such correlations to compress the data?
Examples of Correlated Signals
Regression at XY level = scaling and shifting
[Figure: signals X and Y, and the XY graph]
Main Idea
Create small dictionary of trends that appear frequently
Partition data into intervals
Encode each interval through some part of the dictionary
– Use linear regression for encoding: $\hat{Y} = aX + b$
[Figure: Dictionary, Part of Data (segments of width W), Regression Parameters (a,b)]
What is Transmitted
[Figure labels: Sensor; Base Station; Dictionary (Size = B); Measurements; Dictionary Updates; Dictionary Log; Log with received compressed data; Compressed Data; M_Base; M; 1, 2, …, N]
For each data interval transmit 4 values: 1) start position in data array, 2) location of best approximation in dictionary, 3-4) regression parameters
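The four transmitted values per interval can be illustrated with a small Python sketch. It is a simplification, not the authors' algorithm: intervals have a fixed width, the data length is assumed to be a multiple of that width, the dictionary is given rather than learned, and every dictionary offset is tried exhaustively. All function names are hypothetical.

```python
def fit(x, y):
    """Least-squares a, b minimizing sum((a*x_i + b - y_i)^2); returns (a, b, sse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    sse = sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse

def encode(data, dictionary, width):
    """Return one (start, dictionary position, a, b) record per data interval."""
    records = []
    for start in range(0, len(data) - width + 1, width):
        y = data[start:start + width]
        # Try every dictionary segment of the same width and keep the best fit.
        a, b, _, pos = min(
            (fit(dictionary[p:p + width], y) + (p,)
             for p in range(len(dictionary) - width + 1)),
            key=lambda r: r[2],
        )
        records.append((start, pos, a, b))
    return records

def decode(records, dictionary, width):
    """Rebuild the approximate signal at the base station from the records."""
    out = []
    for _, pos, a, b in records:
        out.extend(a * x + b for x in dictionary[pos:pos + width])
    return out
```

Each interval of `width` measurements is replaced by four numbers, so the saving grows with the interval width, at the price of the regression error.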
Cooperative Compression
Exploring spatial correlations…
Group leader partitions part of bandwidth to each sensor
– To save energy, group leader may transmit its dictionary updates to the group
Sensors compress their data, report resulting error (NOT data)
More space is assigned to nodes with larger errors
Compressed data transmitted to group leader
Combination with its own data and transmission
[Figure: the Base Station sends Query, Bandwidth to the Group Leader; sensors 1, 2, 3, …, S-1 are given bandwidths B1, B2, B3, …, BS-1, report errors E1, E2, E3, …, ES-1, and are then given bandwidths B1’, B2’, B3’, …, BS-1’]
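As a rough illustration of the step where more space goes to nodes with larger errors, the following Python sketch splits a bandwidth budget proportionally to the reported errors. The proportional rule and the per-sensor `floor` are assumptions made for this example; the talk does not specify the exact reallocation policy.

```python
def reallocate(total_bandwidth, errors, floor=1):
    """Give every sensor `floor` units, then split the rest proportionally to error."""
    n = len(errors)
    spare = total_bandwidth - floor * n
    if spare < 0:
        raise ValueError("not enough bandwidth for the per-sensor floor")
    total_error = sum(errors)
    if total_error == 0:
        # Nobody reports any error: fall back to an even split.
        return [total_bandwidth / n] * n
    return [floor + spare * e / total_error for e in errors]
```

Because the sensors report only their errors in this round, the leader can decide the final split before any compressed data is sent.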
Algorithm Benefits
To minimize different error metrics, simply change regression algorithm
– E.g., SSE, SSRE and Max errors are simple to handle
More space to difficult signals/sensors
Group organization saves group members the need to compute the dictionary (expensive part)
– Need to rotate group leader selection, to avoid draining energy
– Can apply HEED protocol for group leader selection
• Probability of becoming group leader is proportional to E_curr / E_init
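The energy-proportional election probability can be sketched as follows. This covers only the initial probability of a HEED-style election, not the full iterative protocol; the base fraction `c_prob` and the lower bound `p_min` are illustrative default values chosen for this example.

```python
import random

def leader_probability(e_curr, e_init, c_prob=0.05, p_min=1e-4):
    """Probability of volunteering as leader, proportional to remaining energy."""
    return max(c_prob * e_curr / e_init, p_min)

def elect(nodes, rng=random.random):
    """nodes: id -> (e_curr, e_init). Returns the ids that volunteer this round."""
    return [n for n, (e_curr, e_init) in nodes.items()
            if rng() < leader_probability(e_curr, e_init)]
```

Nodes that have already spent most of their battery rarely volunteer, which is what rotates the expensive dictionary computation across the group.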
2b. Model-Driven Data Acquisition
Model-driven Data Acquisition in Sensor Networks
– Amol Deshpande, Carlos Guestrin, Samuel R. Madden, Joseph M. Hellerstein, Wei Hong
Idea: Learn a probabilistic model of past measurements/patterns
– Multivariate Gaussian pdfs
– Also learn transitional probabilities $P(X_{t+1} \mid X_t)$
At each epoch, base station decides for which sensors (and which quantities) it is confident about their current values
– This confidence decays over time if no samples are taken
Generate query plan to retrieve only the remaining quantities
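A heavily simplified Python sketch of the confidence test follows. It assumes an independent one-dimensional Gaussian belief per sensor, so it ignores the correlations that the real multivariate model exploits, and it models the decay of confidence as added variance per unobserved epoch. The function names and the `drift_var` parameter are hypothetical.

```python
import math

def confidence(sigma, eps):
    """P(|X - mu| <= eps) for X ~ N(mu, sigma^2)."""
    if sigma == 0:
        return 1.0
    return math.erf(eps / (sigma * math.sqrt(2)))

def plan(beliefs, eps, delta):
    """Sensors whose value the model cannot answer within eps at confidence delta."""
    return [name for name, (_, sigma) in beliefs.items()
            if confidence(sigma, eps) < delta]

def decay(beliefs, drift_var):
    """One epoch without samples: the mean is kept, the variance grows."""
    return {name: (mu, math.sqrt(sigma * sigma + drift_var))
            for name, (mu, sigma) in beliefs.items()}
```

Only the sensors returned by `plan` need to be contacted; the remaining values are answered from the model.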
Method Sketch (slide from the authors’ presentation)
[Figure: a Query (SQL query, with desired confidence bounds) is posed to the Probabilistic Model, which returns a Model Estimate; a Data Collection Plan gathers data D_t that Feed the Model; New Query. The plots show probabilities (0 to 0.4) over values 10 to 30.]
Algorithm Characteristics
Takes advantage of correlations
– May decide to sample voltage instead of temperature, to decrease energy
Model can be used for missing data (inaccessible sensors)
Can handle point and range queries
However, hard to handle previously unseen patterns
– “Thus, for models to perform accurate predictions, they must be trained in the kind of environment where they will be used”
Centralized model, difficult to scale
– Subsequent work by same authors proposed a more distributed system (KEN)
2c. Snapshot Queries
Snapshot Queries: Towards Data-Centric Sensor Networks
– Yannis Kotidis
Idea: Nodes in proximity likely observe similar things
Find if you are needed to answer queries
– Does there exist a representative that can approximate your data accurately?
Nodes that are needed (representatives) constitute network snapshot
– Can estimate and answer queries for remaining nodes as well
Completely decentralized approach
– Representatives tested, change with few messages
Snapshot Queries
Q’: select loc, temperature from Sensors where loc in SOUTH_EAST_QUADRANT use snapshot
[Figure: query Q’ and nodes A, B, C, D]
Example of Network Snapshot
Dark nodes show representatives
Network inside the network
Immediate visualization of common data patterns
[Figure: network snapshot with nodes A, B, C and query Q]
Testing for Representatives
$N_i$ maintains model for each neighbor $N_j$
– Constructed using cache of measurements from $N_j$
– Updated at random epochs by measurements transmitted by $N_j$
Can use model to predict measurement of $N_j$ if: $\mathrm{error}(x'_j(t) - x_j(t)) \le T$
Supports multiple error functions, user-provided threshold T
Snapshot adapts over time to evolving data characteristics
No training is necessary
[Figure: Temperature vs. Time for $x_i(t)$ and $x_j(t)$]
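The representative test can be sketched as below, assuming for illustration that the model kept by N_i is a least-squares line fitted to cached pairs of its own and its neighbor's measurements, and that the error function is the absolute difference. Both choices are assumptions of this example; the slide only requires some model and a pluggable error function.

```python
def fit_line(pairs):
    """Least-squares line x_j ~ a * x_i + b from cached (x_i, x_j) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def can_represent(cache, x_i, x_j, threshold,
                  error=lambda estimate, real: abs(estimate - real)):
    """True if N_i's estimate of N_j's current reading is within the threshold T."""
    a, b = fit_line(cache)
    return error(a * x_i + b, x_j) <= threshold
```

Swapping the `error` argument (for example, to a relative error) changes the notion of "accurate" without touching the rest of the test.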
Approximate Aggregate Queries
In TAG, each node transmits aggregates at EACH epoch
– Can we do better than that?
– Aggregates (e.g., avg temperature) may change slowly
Processing approximate aggregate queries in wireless sensor networks
– A. Deligiannakis, Y. Kotidis, N. Roussopoulos
Application: If application tolerates E_Global = $|V - \hat{V}|$, organize aggregation to minimize #messages
– $V$, $\hat{V}$: Real/estimated aggregate result
Idea:
– Don’t transmit small changes in aggregates of subtrees
Dual problem a little harder
– Bandwidth Constrained Queries in Sensor Networks (same authors)
Error Filters
Algorithm applies error filters to sensor nodes
– Interval [L..H]
Each node computes aggregate for subtree
– Transmit only if new aggregate is outside the filter
– At each transmission, re-center error filter
Of course, the trick is how to decide how large each error filter should be
– Always respect accuracy constraints E_Global
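A single error filter behaves as in the following Python sketch, which assumes a symmetric interval [C - W, C + W] around the last transmitted aggregate C. The class and method names are hypothetical.

```python
class ErrorFilter:
    """Suppress an aggregate while it stays within `width` of the last sent value."""

    def __init__(self, width):
        self.width = width
        self.center = None  # nothing has been transmitted yet

    def update(self, value):
        """Return True if `value` must be transmitted; re-center the filter if so."""
        if self.center is not None and abs(value - self.center) <= self.width:
            return False
        self.center = value
        return True
```

For a filter of width 1 and aggregates 17, 17.5, 18, 19.5, 20, only 17 and 19.5 are transmitted; the parent keeps using the last received value in between.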
Example of Error Filters
[Figure: aggregation tree of nodes 1-8 under the Base Station, with error filters E=1, E=4 and E=3 shown as intervals around the last reported aggregates (e.g., [16, 18] around 17, [46, 54] around 50, [62, 68] around 65); a filter is re-centered when a new aggregate falls outside it (e.g., to [19, 21] around 20 and [41, 49] around 45)]
E_Global = $\sum_i E_i$
Why not Uniform Allocation of Error?
Reasons not to select uniform filters
– Different range of aggregates in nodes/subtrees
– Changes in subtrees may cancel out
• E.g., vehicle movement from observation area of node 4 to the one of node 5
Goal: Use larger filters where you expect larger decrease in transmitted messages
Filters periodically adjusted
– Adapt to changes in data characteristics
[Figure: nodes 4 and 5 under node 2]
Sketch Of Technique
Filters periodically shrink to W x F
– Creates error budget (1-F) x E_Global to redistribute to nodes
Algorithm computes simple statistics to estimate gain of increasing filter
– Statistics computed per node
[Figure: #messages vs. Filter Width for the shrunk filter F x W, the default W and the expanded W + dW, with labels B_shrink, B_expand and UpdGain; a filter [C-W, C+W] around C shrinks to [C-W*F, C+W*F]]
Sketch of Technique (2)
Statistics aggregated bottom-up before reorganization
– Important note: Compare this cost (1 aggregate/sensor) with cost of transmitting individual statistics to base station (SELECT *)
Error budget redistributed top-down
– Partition error budget to your subtrees (and yourself), proportionally to the gain of each subtree
Often 5-fold reduction in transmitted data compared to uniform allocation, order of magnitude when compared
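The shrink-and-redistribute step can be sketched in Python as follows. The sketch is flat rather than hierarchical: it treats all filters as one list instead of partitioning the budget top-down through subtrees, and the per-node `gains` are taken as given instead of being estimated from message statistics. The function name and the default shrink factor are assumptions.

```python
def redistribute(widths, gains, shrink=0.6):
    """Shrink every filter to W x F, then hand the freed budget out by gain."""
    e_global = sum(widths)
    shrunk = [w * shrink for w in widths]
    budget = e_global - sum(shrunk)  # equals (1 - F) x E_Global
    total_gain = sum(gains)
    if total_gain == 0:
        # No node expects to benefit: keep the current allocation.
        return list(widths)
    return [s + budget * g / total_gain for s, g in zip(shrunk, gains)]
```

The sum of the new widths equals the sum of the old ones, so the global accuracy constraint is preserved while the nodes with the largest expected message savings get wider filters.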
Outline
Sensor Nodes
– Brief Intro: Parts of sensors, capabilities, constraints
Techniques for Query Processing
– Topologies for Data Collection
– Data Reduction (SELECT * and aggregate queries)
– Detecting Outliers
Conclusions
The Need for Outlier Detection
Outlier Detection is Useful
– Outliers may denote malfunctioning sensors
• Sensor measurements are often unreliable
• Sensors may fail dirty
– Outliers may also represent interesting events detected by few sensors
• Fire detected by a sensor
Results of aggregate queries are often meaningless
– Consider a MAX/MIN calculation in the presence of outlier measurements
– Other aggregates are also influenced
What do we Need?
Goals for aggregate queries:
– A “clean” aggregate
– Reporting of outlier values
– Both in a SINGLE, in-network framework
What to Consider as an Outlier?
Need to support several similarity metrics
Also consider characteristics of monitored quantity
– Measurements may depend on distance from source (e.g., noise, heat)
– Simply relying on values for testing similarity between sensors is not enough; comparing recent trends may be more appropriate
Provision for user-specified “minimum support”
– How many other sensors need to be similar to you, so that you are not considered as an outlier?
Motivational Example
• S6, S7 and S8 observe a fire
• Their measurements fluctuate more
A voting process at S3 will reject the reading of S6
Smoothing at S3 also obscures the reading
S10 and S9 fail dirty and need to be excluded
Example is partitioned in 2 areas: Our framework supports group-by queries
Sketch of Framework
1. Assume a required minimum support of 2
2. S6, S7 and S8 can only be tested for similarity at their closest common ancestor (S2)
3. At node S2, their values can be merged into aggregate
4. S12, S4 and S5 are in the same group. In S3 their values can be tested for similarity
5. Similarly, test S3, S2 and S11 for similarity in S2 etc.
6. Nodes S10 and S9 have failed dirty. Readings without the minimum support are not included in the aggregate
Even if the readings of S6, S7 and S8 are incorporated into the aggregate, one of these readings (revealing the fire) will be received by the root node
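The minimum-support rule can be illustrated with a small centralized Python sketch. The real framework performs these tests in-network at common ancestors; here all windows are compared in one place, and the similarity function (every one of the latest K readings within `tol`) is a hypothetical stand-in for the pluggable metrics the talk mentions.

```python
def similar(a, b, tol):
    """Hypothetical similarity test: the latest K readings differ by at most tol."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

def split_outliers(windows, min_support, tol):
    """windows: sensor id -> latest K readings. Returns (clean, outliers) dicts."""
    clean, outliers = {}, {}
    for sid, window in windows.items():
        # Count how many OTHER sensors "witness" this sensor's recent behavior.
        support = sum(1 for other, w in windows.items()
                      if other != sid and similar(window, w, tol))
        (clean if support >= min_support else outliers)[sid] = window
    return clean, outliers
```

Only the `clean` readings would feed the aggregate, while the `outliers` are reported separately instead of being silently dropped.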
Framework Features
Performs similarity tests over the latest K readings of sensors
– Can plug in similarity functions with minimal cost
Allows for minimum support
GROUP-BY support
– Grouping based on latest measurement, OR static predicates (area, id etc.)
Can limit tests within each group using a CONSTRAIN TEST clause
– Semantic information (e.g., location) could be useful here
– E.g., only perform tests between sensors on the same floor
Collection Tree periodically reorganized
– Move towards places you will find witnesses, outliers
Just to See what Happens…
Some More Notable Approaches
Cluster-Based Communication
– LEACH, PEGASIS, HEED
– Goal: Organize sensors into clusters
– Intra-cluster and inter-cluster communication
– Overhead for clusterheads, so probabilistic election and rotation
Optimizing Scheduling
– Nodes have different loads to transmit
– Can determine minimal times to transmit/listen based on worst case estimates of time to transmit/link
– See: “Workload-aware Optimization of Query Routing Trees in Wireless Sensor Networks” (MicroPulse) by P. Andreou, D. Zeinalipour-Yazti, P. Chrysanthis, G. Samaras
Some More Notable Approaches (2)
Sensor Localization
– Few sensor nodes are actually GPS-enabled
• Catalog for Crossbow gives only 1 such data acquisition board (MTS420)
– Several algorithms for sensor localization
• Option 1: Sensors are not localized, but there exist landmarks with GPS knowledge of themselves
• Option 2: Mobile sensors can localize even without any infrastructure, if (a) the sensors are free to move, (b) they have a common reference point (e.g., direction of North)
– GPS-Free Node Localization in Mobile Wireless Sensor Networks, by Huseyin Akcan, Vassil Kriakov, Herve Bronnimann, Alex Delis
Duplicate-Insensitive sketches for aggregate queries
– Minimizes impact of data loss
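To show why such sketches tolerate duplicates, here is a minimal Python illustration using a single Flajolet-Martin style bitmap for COUNT DISTINCT. The talk does not name a specific sketch, so this choice is an assumption; a single bitmap also gives a very coarse estimate, and practical schemes average many of them.

```python
import hashlib

def fm_sketch(items, bits=32):
    """Bitmap with bit r set if some item's hash has exactly r trailing zero bits."""
    sketch = 0
    for item in items:
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        # Index of the least-significant 1 bit, capped to the bitmap size.
        r = (h & -h).bit_length() - 1 if h else bits - 1
        sketch |= 1 << min(r, bits - 1)
    return sketch

def merge(a, b):
    """Combining two sketches is a bitwise OR, so duplicates change nothing."""
    return a | b

def estimate(sketch):
    """Coarse distinct-count estimate from the lowest unset bit position."""
    r = 0
    while sketch & (1 << r):
        r += 1
    return 2 ** r / 0.77351
```

Because merging is idempotent, a partial result can be sent along several paths at once: receiving it twice does not inflate the aggregate, which is what limits the impact of lost messages.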
Conclusions
Presented several query processing techniques
– All aim to process data in-network
• Crucial for network lifetime
– Techniques for extraction of historical measurements, or online querying (point, range, aggregate, group-by)
– All data reduction techniques presented are easily tunable
• Given desired error/accuracy, minimize bandwidth consumption; or
• Given bandwidth consumption, minimize error of produced results
• Both useful to satisfy different user requirements
– Important to properly schedule nodes
• Increases available time to sleep, decreases the energy drain
• If possible, avoid creating different schedule per each query
– Too many conflicts, sensor continuously working…