![Page 1: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/1.jpg)
Data Storage in Sensor Networks
Leonidas Guibas, Stanford University
Sensing, Networking, Computation
CS428
![Page 2: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/2.jpg)
Sensor Systems as DBs
Sensor networks:
- collect measurements from the physical world
- organize and store these measurements over time
- serve continuous or single-shot queries about current or past events
So sensor networks can be thought of as distributed databases over these physical measurements.
![Page 3: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/3.jpg)
Logical vs. Physical Data Access
A sensor net DB organization allows queries to be expressed at a level close to the application semantics, just like in a traditional DB.
This allows the system to hide physical-layer details, such as where the data is stored, replication for robustness, and so on.
Of course, this increased convenience comes at a loss of efficiency.
![Page 4: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/4.jpg)
Traditional SN Programs
Procedural addressing of individual sensor nodes; user specifies how the task is executed; data may be processed centrally.
DB Approach
Declarative querying; user isolated from “how the network works”; in-network distributed processing.
The DB View of Sensor Networks
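Five small tables appear in the figure, each with Time and Value columns:
| # | Attribute | Time | Value |
|---|---|---|---|
| 1 | Temperature | 2:00 | 15 |
| 1 | Temperature | 4:00 | 12 |
| 2 | Temperature | 2:00 | 10 |
| 2 | Temperature | 4:00 | 13 |
| 3 | Humidity | 2:30 | 70 |
| 3 | Humidity | 3:30 | 75 |
| 4 | Temperature | 2:00 | 20 |
| 4 | Temperature | 3:00 | 18 |
| 5 | Pressure | 1:00 | 30 |
| 5 | Pressure | 4:00 | 35 |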
Queries
![Page 5: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/5.jpg)
Database Approaches for Accessing Sensor Networks
Warehousing approach:
- Device data is extracted in a predefined way
- Device data is stored in a centralized DB server
- Queries are evaluated on the centralized DB server
Distributed approach:
- Queries are evaluated by contacting devices
- Portions of queries are executed on the devices
[Figure labels: Event, Data]
![Page 6: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/6.jpg)
Data-Centric Storage (DCS)
Data-centric: data is named by attributes.
Event data is stored, by name, at home nodes; home nodes are selected by the named attributes.
Queries also go to the home nodes to retrieve the data (instead of to the nodes that detected the events).
Home nodes are determined by a hash function + GPSR.
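A minimal sketch of the home-node idea, under stated assumptions: the event name is hashed to a point in a rectangular deployment area, and the node closest to that point acts as the home node. The function names, the SHA-256 hash, the 100 x 100 area and the node coordinates are all illustrative, and a global nearest-neighbor search stands in for GPSR routing, which would deliver the packet to the node nearest the target location.

```python
import hashlib
import math

def hash_to_location(name, width, height):
    """Map an event name to a point in the deployment area (hypothetical scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32 * width
    y = int.from_bytes(digest[4:8], "big") / 2**32 * height
    return (x, y)

def home_node(name, nodes, width=100.0, height=100.0):
    """Return the id of the node closest to the hashed location.

    A global nearest-neighbor search stands in for geographic routing here.
    """
    target = hash_to_location(name, width, height)
    return min(nodes, key=lambda nid: math.dist(nodes[nid], target))

# Made-up node positions.
nodes = {1: (10.0, 10.0), 2: (80.0, 20.0), 3: (50.0, 90.0), 4: (30.0, 60.0)}
store_at = home_node("elephant-sighting", nodes)   # where a detecting node stores the event
query_at = home_node("elephant-sighting", nodes)   # where a query for that name is sent
```

Because storage and query both hash the same name, they meet at the same node without either side knowing which node detected the event.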
![Page 7: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/7.jpg)
Database Organization Overview
![Page 8: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/8.jpg)
What is a (Traditional) Database?
Very large, integrated collection of data
[Usually] models real-world enterprises:
- Entities (e.g., students, courses)
- Relationships (e.g., John is taking CS428)
A DataBase Management System (DBMS) is a software package designed to store and manage databases.
Many common examples exist, such as Oracle and other SQL-based systems.
![Page 9: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/9.jpg)
Why Use a DBMS (instead of just Files)?
- Data independence and efficient access
- Reduced application development time
- Data integrity and security
- Uniform data administration
- [Consistent] concurrent access, recovery from crashes
![Page 10: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/10.jpg)
Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational data model is the most widely used model today:
- Main concept: relation, basically a table with rows and columns
- Every relation has a schema, which describes its columns (fields)
![Page 11: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/11.jpg)
Levels of Abstraction
Many views, a single conceptual (logical) schema, and a physical schema:
- Views describe how users see the data
- Conceptual schema defines logical structure
- Physical schema describes files and indexes used
[Figure: View 1, View 2 and View 3 over the Conceptual Schema, over the Physical Schema]
![Page 12: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/12.jpg)
Example: a University Database
Conceptual schema:
  Students(sid:string, name:string, login:string, age:integer, gpa:real)
  Courses(cid:string, cname:string, credits:integer)
  Enrolled(sid:string, cid:string, grade:string)
Physical schema:
- Relations stored as unordered files
- Index on first column of Students
![Page 13: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/13.jpg)
Example: University Database
External Schema (View): Course_info(cid:string,enrollment:integer)
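The three levels can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the SQL column types, the sample rows and the exact view definition are assumptions, chosen so that `Course_info` reports the enrollment per course. Declaring `sid` as the primary key makes SQLite build an index on the first column of `Students`, which loosely mirrors the physical schema of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema
CREATE TABLE Students(sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses(cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled(sid TEXT, cid TEXT, grade TEXT);
-- External schema: a view derived from the conceptual schema
CREATE VIEW Course_info AS
    SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
    FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
    GROUP BY c.cid;
""")

# Made-up sample rows.
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("CS428", "Sensor Networks", 3), ("CS101", "Intro", 4)])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "CS428", "A"), ("s2", "CS428", "B")])

course_info = dict(conn.execute("SELECT cid, enrollment FROM Course_info"))
```

An application that reads only `Course_info` keeps working if the underlying tables are later reorganized, as long as the view is redefined to match.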
![Page 14: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/14.jpg)
Data Independence
Applications are insulated from how data is structured and stored.
- Logical data independence: protection from changes in the logical structure of data
- Physical data independence: protection from changes in the physical organization and format of data
![Page 15: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/15.jpg)
Concurrency Control
Concurrent execution of user programs is essential for good DBMS performance.
- Because disk accesses are frequent, and relatively slow, it is important to keep the CPU working on several user programs concurrently
Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
DBMS ensures such problems don’t arise: users can pretend they are using a single-user system.
In sensor networks the network plays the role of the disks ...
![Page 16: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/16.jpg)
Execution of a DBMS Program
Key concept is the transaction, which is an atomic sequence of database actions (reads/writes).
Each transaction, executed completely, must leave the DB in a consistent state (assuming the DB is consistent when the transaction begins).
![Page 17: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/17.jpg)
Scheduling Concurrent Transactions
DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’ (in some order, not necessarily the order in which they were initiated).
Two-phase locking: before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock.
![Page 18: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/18.jpg)
Deadlock
Say an action of Ti (say, writing X) affects Tj (which perhaps reads X). One of them, say Ti, will obtain the lock on X first, so Tj is forced to wait until Ti completes (this effectively orders the transactions).
But what if Tj already has a lock on Y and Ti later requests a lock on Y? Ti or Tj must be aborted and restarted!
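One common way to notice the situation above is to look for a cycle in a wait-for graph, where an edge from one transaction to another means the first is waiting for a lock the second holds. The sketch below is a generic depth-first cycle search, not a description of any particular DBMS; the function name and the dictionary encoding of the graph are assumptions.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle in the wait-for graph, or None."""
    visiting, done = set(), set()

    def visit(txn, path):
        visiting.add(txn)
        path.append(txn)
        for other in waits_for.get(txn, ()):
            if other in visiting:
                # Reached a transaction already on the current path: a cycle.
                return path[path.index(other):]
            if other not in done:
                cycle = visit(other, path)
                if cycle:
                    return cycle
        visiting.discard(txn)
        done.add(txn)
        path.pop()
        return None

    for txn in waits_for:
        if txn not in done:
            cycle = visit(txn, [])
            if cycle:
                return cycle
    return None

# Tj waits for Ti's lock on X, and Ti waits for Tj's lock on Y.
deadlocked = find_deadlock({"Tj": ["Ti"], "Ti": ["Tj"]})
# Tj waits for Ti, but Ti waits for nobody: no deadlock.
no_deadlock = find_deadlock({"Tj": ["Ti"], "Ti": []})
```

Once a cycle is found, aborting any one transaction on it breaks the deadlock.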
![Page 19: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/19.jpg)
Ensuring Atomicity
DBMS ensures atomicity (all-or-nothing property) even if the system crashes in the middle of a transaction.
Keeps a log (history) of all actions carried out by transactions while executing:
- Before a change is made to the database, the corresponding log entry is forced to a safe location (Write-Ahead Log protocol; OS support for this is often inadequate)
- After a crash, the effects of partially executed transactions are undone (recovery) using the log
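A toy illustration of the write-ahead rule and undo recovery, under strong simplifying assumptions: the "log" is an in-memory list standing in for stable storage, a key that did not exist before is recorded with old value `None`, and there is no redo phase. The class and method names are hypothetical.

```python
class TinyStore:
    """Toy key-value store with an undo log (a sketch, not a real DBMS)."""

    def __init__(self):
        self.data = {}
        self.log = []  # stands in for the log kept in a safe location

    def write(self, txn, key, value):
        # Write-ahead rule: record the old value before changing the data.
        self.log.append(("update", txn, key, self.data.get(key)))
        self.data[key] = value

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        """Undo, newest first, every update of a transaction that never committed."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in reversed(self.log):
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old = rec
                if old is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old

store = TinyStore()
store.write("T1", "balance", 100)
store.commit("T1")
store.write("T2", "balance", 40)   # T2 is interrupted by a crash before committing
store.write("T2", "fee", 5)
store.recover()                    # leaves only T1's committed effects
```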
![Page 20: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/20.jpg)
Structure of a DBMS
A typical DBMS has a layered architecture.
The diagram shows one of several possible architectures; each system has its own variations.
Layers, top to bottom:
- Query Optimization and Execution
- Relational Operators
- Files and Access Methods
- Buffer Management
- Disk Space Management
- DB
These layers must consider concurrency control and recovery.
![Page 21: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/21.jpg)
Summary
- DBMS used to maintain and query large datasets
- Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security
- Levels of abstraction give data independence
- A DBMS typically has a layered architecture
But all these operations assume fast processing and inexpensive storage
![Page 22: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/22.jpg)
Some Recent Trends
- Distributed databases: information may be stored on remote disks, accessed via a network
- P2P systems: a network of nodes that come and go, sharing files (Napster, Gnutella, Kazaa)
- Data streams: large data streams that cannot be stored; data summaries must be maintained to serve queries
![Page 23: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/23.jpg)
Sensor Network DataBases
![Page 24: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/24.jpg)
Sensor Network DB Challenges
These days disks used in DB systems are essentially free; sensor nodes instead have to deal with pitifully small memories, so data summarization, aggregation, and aging are essential.
In a sensor network, links (and nodes) come and go; the stored information and access to it must be protected from this physical volatility.
![Page 25: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/25.jpg)
Challenges, Cont’d
- Continuous, rather than single-shot, queries will be the norm; thus query optimization is important for saving energy
- Latencies in access to data can be highly variable; thus query execution plans must continuously adapt to the network state
- Query executions can interact and cause conflicts and resource contention in sensor tasking
![Page 26: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/26.jpg)
An Example
Many A detections, little correlation with B
Roughly the same A and B detections, highly correlated
![Page 27: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/27.jpg)
The Data is Different, Too
- Sensor net data inherently contains errors; exact data comparisons are of little value
- A distinction has to be made between data that could potentially be acquired, and data that actually has been acquired; resource contention and other issues could prevent the capture of potential data
- The relational view of large tables whose entries can be modified is not realistic; may need to work with append-only relations
![Page 28: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/28.jpg)
What Should Queries be Like?
![Page 29: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/29.jpg)
SQL-Like Query Examples
Snapshot (single shot) queries:
- How many empty bird nests are in the northeastern quadrant of the forest?
    SELECT SUM(s)
    FROM SensorData s
    WHERE s.nest = empty and s.loc in (50,50,100,100)
Long-running (continuous) queries:
- Notify me over the next hour whenever the number of empty nests in an area exceeds a threshold.
    SELECT s.area, SUM(s)
    FROM SensorData s
    WHERE s.nest = empty
    GROUP BY s.area
    HAVING SUM(s) > T
    DURATION (now, now+60)
    EVERY 5
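To make the semantics of the two example queries concrete, here is a small sketch that evaluates them over an in-memory list of readings. Everything here is illustrative: the record layout, the interpretation of `(50,50,100,100)` as a bounding box `(x0, y0, x1, y1)`, and the sample data are assumptions. A continuous query would simply re-run the second function at every sampling instant for the stated duration.

```python
def count_empty_in_box(readings, box):
    """Snapshot query: count empty nests whose location falls inside the box."""
    x0, y0, x1, y1 = box
    return sum(
        1 for r in readings
        if r["nest"] == "empty"
        and x0 <= r["loc"][0] <= x1 and y0 <= r["loc"][1] <= y1
    )

def areas_over_threshold(readings, threshold):
    """One evaluation of the continuous query: areas with more than `threshold` empty nests."""
    counts = {}
    for r in readings:
        if r["nest"] == "empty":
            counts[r["area"]] = counts.get(r["area"], 0) + 1
    return {area: n for area, n in counts.items() if n > threshold}

# Made-up readings.
readings = [
    {"nest": "empty", "loc": (60, 70), "area": "NE"},
    {"nest": "empty", "loc": (90, 95), "area": "NE"},
    {"nest": "full", "loc": (55, 55), "area": "NE"},
    {"nest": "empty", "loc": (10, 20), "area": "SW"},
]
snapshot = count_empty_in_box(readings, (50, 50, 100, 100))
alerts = areas_over_threshold(readings, threshold=1)
```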
![Page 30: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/30.jpg)
What is New?
- Duration for continuous queries
- Sampling rates
- New data types, to account for data uncertainty:
  - ranges
  - parametric distributions (e.g., Gaussians)
  - operations for computing probabilities, equality likelihood, ...
![Page 31: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/31.jpg)
Using Distributions Instead of Values
- Properly reflects the uncertainty in all sensor measurements
- Answers computed by the sensor net can be given a confidence
- But even simple arithmetic operations can become very costly
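As a hedged illustration of values carried as distributions, the sketch below models a reading as a Gaussian, adds two independent readings, and attaches a confidence to a threshold test. The `Gaussian` class and its method names are hypothetical. Addition is the easy case, because sums of independent Gaussians stay Gaussian; products or maxima do not, which is one way simple arithmetic becomes costly.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Gaussian:
    mean: float
    var: float

    def __add__(self, other):
        # Sum of independent Gaussians: means add and variances add.
        return Gaussian(self.mean + other.mean, self.var + other.var)

    def prob_greater_than(self, threshold):
        """P(X > threshold), computed from the normal CDF via erf."""
        z = (threshold - self.mean) / math.sqrt(2.0 * self.var)
        return 0.5 * (1.0 - math.erf(z))

# Made-up readings from two sensors.
a = Gaussian(mean=20.0, var=1.0)
b = Gaussian(mean=18.0, var=3.0)
total = a + b
confidence = total.prob_greater_than(38.0)  # threshold equals the mean of the sum
```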
![Page 32: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/32.jpg)
A Sensor Database Example:
TinyDB from UC Berkeley [Madden, Franklin, Hellerstein, Hong, ’03]
![Page 33: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/33.jpg)
Using Declarative Queries
Users specify the data they want:
- Simple, intuitive, SQL-like queries
- Using user predicates, not specific node addresses
Challenge is to provide:
- Expressive and easy-to-use DB interface
- High-level operators
  - With well-defined interactions
  - With transparent optimizations that many programmers would miss
- Sensor-net specific techniques
- Power-efficient execution framework
![Page 34: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/34.jpg)
TinyDB
Programming sensor nets is hard
Declarative queries are easy
TinyDB: in-network processing via declarative queries
Example: vehicle tracking application
- Custom code: 1-2 weeks to develop, hundreds of lines of C
- TinyDB query: 2 minutes to develop, comparable functionality
[Madden et al., ’03]
![Page 35: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/35.jpg)
TinyDB Interface
![Page 36: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/36.jpg)
Overview
- TinyDB: Queries for Sensor Nets
- Processing Aggregate Queries (TAG)
- Taxonomy and Experiments
- Acquisitional Query Processing
![Page 37: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/37.jpg)
![Page 38: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/38.jpg)
Declarative Queries for Sensor Networks
Examples:
    SELECT nodeid, nestNo, light
    FROM sensors
    WHERE light > 400
    EPOCH DURATION 1s
Sensors
| Epoch | Nodeid | nestNo | Light |
|---|---|---|---|
| 0 | 1 | 17 | 455 |
| 0 | 2 | 25 | 389 |
| 1 | 1 | 17 | 422 |
| 1 | 2 | 25 | 405 |
![Page 39: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/39.jpg)
Aggregation Queries
| Epoch | region | CNT(…) | AVG(…) |
|---|---|---|---|
| 0 | North | 3 | 360 |
| 0 | South | 3 | 520 |
| 1 | North | 3 | 370 |
| 1 | South | 3 | 520 |
![Page 40: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/40.jpg)
Tiny Aggregation (TAG)
In-network processing of aggregates
- Common data analysis operation, aka gather operation or reduction in parallel programming
- Communication reduction; operator-dependent benefit
Across nodes during same epoch
Exploit query semantics to improve efficiency!
![Page 41: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/41.jpg)
Query Propagation Via Tree-Based Routing
Tree-based routing
Used in:
- Query delivery
- Data collection
Topology selection is important
Continuous process; mitigates failures
[Figure: routing tree over nodes A, B, C, D, E, F]
![Page 42: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/42.jpg)
Basic Aggregation
In each epoch (= system sampling period):
- Each node samples local sensors once
- Generates a partial state record (PSR) from its own local readings and the readings from its children
- Outputs the PSR during its assigned communication interval
At the end of the epoch, the PSR for the whole network is output at the root
A new result is produced at each successive epoch
Extras: predicate-based partitioning via GROUP BY
[Figure: routing tree over nodes 1 to 5]
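The per-epoch flow of partial state records can be sketched as a recursion over the routing tree. This is a simplified model under stated assumptions: the aggregate is AVG with a `(sum, count)` PSR, the tree shape and readings are made up, and communication intervals, loss and sleeping are ignored.

```python
def merge(a, b):
    """Combine two partial state records (sum, count) for AVG."""
    return (a[0] + b[0], a[1] + b[1])

def epoch_psr(node, children, readings):
    """PSR a node outputs in one epoch: its own sample merged with its children's PSRs."""
    psr = (readings[node], 1)
    for child in children.get(node, ()):
        psr = merge(psr, epoch_psr(child, children, readings))
    return psr

# Hypothetical five-node routing tree rooted at node 1, with made-up readings.
children = {1: [2, 3], 3: [4], 4: [5]}
readings = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0, 5: 50.0}

total, count = epoch_psr(1, children, readings)
average = total / count   # evaluated once, at the root
```

Each node forwards a single small record regardless of how many descendants it has, which is where the communication savings come from.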
![Page 43: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/43.jpg)
Illustration: Aggregation
[Figure, five animation frames: partial results propagate up the five-node routing tree (nodes 1 to 5), one communication interval at a time (4, 3, 2, 1); the value 5 appears in interval 1, and the next epoch starts again at interval 4.]
![Page 48: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/48.jpg)
Interval Assignment: An Approach
[Figure: interval assignment within an epoch for the five-node tree, by tree level (Level = 1 and deeper), with slots marked L and T. The accompanying slide text was not recoverable.]
![Page 49: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/49.jpg)
Aggregation Framework
![Page 50: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/50.jpg)
Types of Aggregates
SQL supports MIN, MAX, SUM, COUNT, AVERAGE
Any function over a set can be computed via TAG
In-network benefit for many operations
- E.g., standard deviation, top/bottom n, spatial union/intersection, histograms, etc.
- Compactness of PSR
![Page 51: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/51.jpg)
Partial State
Growth of PSR vs. number of aggregated values (n):
- Distributive: |PSR| = 1 (e.g., MIN)
- Algebraic: |PSR| = c (e.g., AVG)
- Holistic: |PSR| = n (e.g., MEDIAN)
- Unique: |PSR| = d (e.g., COUNT DISTINCT), where d = # of distinct values
- Content sensitive: |PSR| < n (e.g., HISTOGRAM)

| Property | Examples | Affects |
|---|---|---|
| Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG |
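The growth classes above can be made tangible by building a partial state record for a few aggregates and measuring its length. The representations chosen here (tuples, a sorted list for MEDIAN, a set for COUNT DISTINCT) are one plausible encoding, not the one TAG necessarily uses, and the input values are made up.

```python
def psr_max(values):
    return (max(values),)               # |PSR| = 1: same size as the final answer

def psr_avg(values):
    return (sum(values), len(values))   # |PSR| = c: constant-size (sum, count)

def psr_median(values):
    return tuple(sorted(values))        # |PSR| = n: every value must be kept

def psr_count_distinct(values):
    return tuple(sorted(set(values)))   # |PSR| = d: one entry per distinct value

values = [7, 3, 7, 9, 3, 3]
sizes = {
    "MAX": len(psr_max(values)),
    "AVG": len(psr_avg(values)),
    "MEDIAN": len(psr_median(values)),
    "COUNT DISTINCT": len(psr_count_distinct(values)),
}
```

Only the first two stay small as more values are aggregated, which is why in-network aggregation helps them most.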
![Page 52: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/52.jpg)
Simulation Environment
Evaluated TAG via simulation
- Coarse-grained, event-based simulator
- Sensors arranged on a grid
- Two communication models:
  - Lossless: all neighbors hear all messages
  - Lossy: messages lost with probability that increases with distance
Communication (message counts) as performance metric
![Page 53: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/53.jpg)
Benefit of In-Network Processing
Simulation Results
2500 Nodes
50x50 Grid
Depth = ~10
Neighbors = ~20
Uniform Dist.
[Figure: bar chart of simulation results; axis and series labels not recoverable.]
![Page 54: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/54.jpg)
Optimization: Channel Sharing (“Snooping”)
Insight: Shared channel can reduce communication
- Suppress messages that won’t affect the aggregate
  - E.g., MAX
  - Applies to all exemplary, monotonic aggregates
- Only snoop in listen/transmit slots
  - Future work: explore snooping/listening tradeoffs
![Page 55: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/55.jpg)
Optimization: Hypothesis Testing
Insight: Guess from root can be used for suppression
- E.g., ‘MIN < 50’
- Works for monotonic & exemplary aggregates
  - Also summary, if imprecision allowed
How is the hypothesis computed?
- Blind or statistically informed guess
- Observation over network subset
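A small sketch of the suppression effect for the ‘MIN < 50’ example: once the root broadcasts its guess, only nodes whose reading beats the guess need to transmit. The function name and readings are made up, and the handling of a wrong guess (no node reports, so the root must fall back) is an assumption about how such a scheme could be completed.

```python
def reports_for_min(readings, hypothesis):
    """Nodes that still need to report when the root guesses MIN < hypothesis."""
    return {node: v for node, v in readings.items() if v < hypothesis}

readings = {1: 72, 2: 48, 3: 65, 4: 31, 5: 90}   # made-up values
reports = reports_for_min(readings, hypothesis=50)

# If nobody reports, the guess was too low and the root would have to retry.
answer = min(reports.values()) if reports else None
```

Here three of the five nodes stay silent, yet the root still learns the exact minimum.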
![Page 56: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/56.jpg)
Experiment: Snooping vs. Hypothesis Testing
Uniform Value Distribution
Dense Packing
Ideal Communication
[Figure: chart comparing snooping and hypothesis testing; axis and series labels not recoverable.]
![Page 57: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/57.jpg)
Duplicate Sensitivity
| Property | Examples | Affects |
|---|---|---|
| Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG |
| Monotonicity | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping |
| Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss |
| Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy |
![Page 58: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/58.jpg)
Use Multiple Parents
Use graph structure
- Increase delivery probability with no communication overhead
For duplicate-insensitive aggregates, or aggregates expressible as a sum of parts
- Send (part of) the aggregate to all parents
  - In just one message, via multicast
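Assuming independence, decreases variance.
With a single parent (the whole count c travels over two links to reach R):
$$P(\text{link transmission succeeds}) = p$$
$$P(\text{success from A to R}) = p^2$$
$$E(cnt) = c \cdot p^2$$
$$\mathrm{Var}(cnt) = c^2 \cdot p^2 \cdot (1 - p^2) \equiv V$$
With n parents, each receiving c/n:
$$E(cnt) = n \cdot \left(\tfrac{c}{n} \cdot p^2\right)$$
$$\mathrm{Var}(cnt) = n \cdot \left(\tfrac{c}{n}\right)^2 \cdot p^2 \cdot (1 - p^2) = V/n$$
[Figure: nodes A, B, C and R; the count c sent whole through one parent versus split as c/n through each of n = 2 parents]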
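The variance claim can be checked numerically. The sketch below assumes, as the slide does, that deliveries are independent, and further assumes that each share of the count must survive two hops (probability p squared) to reach the root; it then enumerates every success/failure pattern to get the exact mean and variance. The function name and the values c = 10, p = 0.9 are illustrative.

```python
from itertools import product

def count_stats(c, p, n):
    """Exact mean and variance of the count reaching the root.

    The child splits its count c evenly over n parents; each share arrives only
    if both hops on its path succeed, i.e. with probability p**2, independently.
    """
    q = p * p
    mean = 0.0
    second_moment = 0.0
    for outcome in product((0, 1), repeat=n):
        prob = 1.0
        for ok in outcome:
            prob *= q if ok else (1.0 - q)
        received = sum(outcome) * c / n
        mean += prob * received
        second_moment += prob * received ** 2
    return mean, second_moment - mean ** 2

mean1, var1 = count_stats(c=10, p=0.9, n=1)
mean2, var2 = count_stats(c=10, p=0.9, n=2)
```

The expected count is the same in both cases, while splitting over two parents halves the variance.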
![Page 59: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/59.jpg)
TAG Contributions
Simple but powerful data collection language
- Vehicle tracking:
    SELECT ONEMAX(mag,nodeid)
    EPOCH DURATION 50ms
Distributed algorithm for in-network aggregation
- Communication reducing
- Power aware: integration of sleeping, computation
- Predicate-based grouping
Taxonomy-driven API
- Enables transparent application of techniques to improve quality (parent splitting) and reduce communication (snooping, hypo. testing)
![Page 60: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/60.jpg)
Acquisitional Query Processing (ACQP)
Closed world assumption does not hold
- Could generate an infinite number of samples
An acquisitional query processor controls when, where, and with what frequency data is collected!
Versus traditional systems, where data is provided a priori
![Page 61: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/61.jpg)
ACQP: What is Different?
How should the query be processed?
- Sampling as a first-class operation
- Event-join duality
How does the user control acquisition?
- Rates or lifetimes
- Event-based triggers
Which nodes have relevant data?
- Index-like data structures
Which samples should be transmitted?
- Prioritization, summary, and rate control
![Page 62: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/62.jpg)
Event-Based Processing
ACQP – want to initiate queries in response to events
![Page 63: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/63.jpg)
More Events
ON EVENT bird_detect(loc) AS bd
SELECT AVG(s.light), AVG(s.temp)
FROM sensors AS s
WHERE dist(bd.loc,s.loc) < 10m
SAMPLE PERIOD 1s for 10
![Page 64: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/64.jpg)
Event Based Processing
![Page 65: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/65.jpg)
E(sampling mag) >> E(sampling light): 1500 µJ vs. 90 µJ
Operator Ordering: Interleave Sampling + Selection
    SELECT light, mag
    FROM sensors
    WHERE pred1(mag)
    AND pred2(light)
    EPOCH DURATION 1s
[Figure: alternative query plans that apply the selections on pred1 and pred2 to the mag and light samples in different orders.]
At 1 sample/sec, total power savings could be as much as 3.5 mW, comparable to the processor!
Correct ordering (unless pred1 is very selective and pred2 is not): [shown in the figure]
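Why the ordering matters can be shown with a back-of-the-envelope cost model: the second sensor is sampled only when the first predicate passes, so the expected energy is the first sampling cost plus the first predicate's selectivity times the second sampling cost. The per-sample costs and selectivities below are assumed, illustrative numbers (a costly magnetometer sample and a cheap light sample), not measurements.

```python
def expected_energy(first, second):
    """Expected energy per epoch when `first` is sampled and filtered before `second`.

    Each argument is (sampling_cost_uJ, selectivity); the second sensor is
    sampled only if the first predicate passes.
    """
    cost1, sel1 = first
    cost2, _ = second
    return cost1 + sel1 * cost2

# Assumed costs in microjoules per sample, with both predicates passing half the time.
mag = (1500.0, 0.5)
light = (90.0, 0.5)
light_first = expected_energy(light, mag)
mag_first = expected_energy(mag, light)

# The exception: pred1 (on mag) very selective, pred2 (on light) not selective at all.
mag_sel = (1500.0, 0.01)
light_unsel = (90.0, 1.0)
exception_light_first = expected_energy(light_unsel, mag_sel)
exception_mag_first = expected_energy(mag_sel, light_unsel)
```

With these numbers, sampling light first costs roughly half as much, while in the exceptional case sampling mag first is slightly cheaper.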
![Page 66: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/66.jpg)
Exemplary Aggregate Pushdown
    SELECT WINMAX(light,8s,8s)
    FROM sensors
    WHERE mag > x
    EPOCH DURATION 1s
[Figure: query plans for the WINMAX query over the mag and light samples; remaining slide text not recoverable.]
![Page 67: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/67.jpg)
Sensor Network Challenge DB Problems
Temporal aggregates
Sophisticated, sensor network specific aggregates:
- Isobar finding
- Vehicle tracking
- Lossy compression
  - Wavelets
![Page 68: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/68.jpg)
TinyDB Deployments
Initial efforts:
- Network monitoring
- Vehicle tracking
Ongoing deployments:
- Environmental monitoring
- Generic Sensor Kit
- Building monitoring
- Golden Gate Bridge
![Page 69: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/69.jpg)
Summary
Declarative queries are the right interface for data collection in sensor nets!
- Easier, faster, & more robust
The Acquisitional Query Processing framework addresses many new issues that arise in sensor networks, e.g.:
- Order of sampling and selection
- Languages, indices, approximations that give the user control over which data enters the system
![Page 70: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/70.jpg)
Multi-Dimensional Range Searching
![Page 71: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/71.jpg)
Range Queries
- Range queries ask for attribute readings with data values in certain ranges, e.g., temperature T ∈ [-15 °C, +15 °C]
- They are well-suited to data with uncertainty, such as sensor readings
- Usually multiple attributes are involved
- Typically, the number of records satisfying the query is small compared to the total number of records
![Page 72: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/72.jpg)
Data-Base Indices
- When repeated queries are made on the same data, it makes sense to preprocess the database so as to make query processing faster
- The auxiliary structures we build to facilitate this processing are called indices
- A large body of literature exists on building indices for one-dimensional attributes
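As a minimal example of a one-dimensional index, the sketch below sorts (value, record id) pairs once and then answers range queries with two binary searches from Python's standard `bisect` module. The record layout, function names and temperatures are made up; real database indices (B-trees and the like) also support efficient updates, which a plain sorted list does not.

```python
from bisect import bisect_left, bisect_right

def build_index(records, attr):
    """Preprocess: sort (value, record id) pairs once so queries avoid a full scan."""
    return sorted((rec[attr], rid) for rid, rec in enumerate(records))

def range_query(index, low, high):
    """Ids of records with low <= value <= high, found via two binary searches."""
    start = bisect_left(index, (low, -1))
    stop = bisect_right(index, (high, float("inf")))
    return [rid for _, rid in index[start:stop]]

# Made-up temperature readings.
records = [{"temp": 21.5}, {"temp": -20.0}, {"temp": 3.0}, {"temp": 14.9}, {"temp": 15.0}]
index = build_index(records, "temp")
hits = sorted(range_query(index, -15.0, 15.0))
```

Building the index costs a sort up front; each query then takes logarithmic time plus the size of the answer.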
![Page 73: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/73.jpg)
Metrics for Evaluating Indices
For a data base of n records, the relevant metrics are
- the index size, S(n)
- the preprocessing time required to build the index, P(n)
- the query cost the index enables, Q(n)
- the update cost to allow for record insertions and deletions to the database, U(n)
![Page 74: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/74.jpg)
Distributed Range Searching
All structures we saw so far are hierarchical; in a distributed setting, nodes that hold data close to the root are likely to be overloaded
We discuss one sensor network range searching approach:
The DIMENSIONS system from UCLA
![Page 75: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/75.jpg)
Some Issues to Consider
How is information aggregated spatially and temporally?
How does the system decide where to store information?
How are queries routed to the correct nodes?
What steps does the system take to reduce energy use?
![Page 76: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/76.jpg)
DIMENSIONS System: Key Ideas
Construct a distributed, load-balanced quad-tree hierarchy of lossy wavelet-compressed summaries corresponding to different resolutions and spatio-temporal scales.
Queries drill-down from root of hierarchy to focus search on small portions of the network.
Progressively age summaries for long-term storage and graceful degradation of query quality over time.
[From Ganesan et al., 2003]
![Page 77: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/77.jpg)
Constructing the Hierarchy
Initially, nodes fill up their own storage with raw sampled data.
![Page 78: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/78.jpg)
Constructing the Hierarchy
Tessellate the network space into a grid; use hashing in each cell to determine the location of the clusterhead (ref: DCS).
Send wavelet-compressed local time-series to the clusterhead.
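A sketch of how a clusterhead could be chosen by hashing, under stated assumptions: each grid cell is hashed to a point inside the cell, and the member node closest to that point acts as clusterhead. The function names, the use of SHA-256, and the closest-node rule are illustrative choices, not the actual DIMENSIONS or DCS code.

```python
import hashlib

def cell_of(position, cell_size):
    """Grid cell (column, row) containing an (x, y) position."""
    x, y = position
    return (int(x // cell_size), int(y // cell_size))

def hashed_point(cell, cell_size):
    """Deterministically hash a cell to a point inside that cell."""
    digest = hashlib.sha256(f"{cell[0]}:{cell[1]}".encode()).digest()
    fx = int.from_bytes(digest[:4], "big") / 2**32   # fraction in [0, 1)
    fy = int.from_bytes(digest[4:8], "big") / 2**32  # fraction in [0, 1)
    return ((cell[0] + fx) * cell_size, (cell[1] + fy) * cell_size)

def clusterhead(cell, cell_size, nodes):
    """Pick the node of the cell that lies closest to the hashed point."""
    hx, hy = hashed_point(cell, cell_size)
    members = {n: p for n, p in nodes.items() if cell_of(p, cell_size) == cell}
    return min(
        members,
        key=lambda n: (members[n][0] - hx) ** 2 + (members[n][1] - hy) ** 2,
    )

# Hypothetical deployment: node name -> (x, y) position.
nodes = {"a": (1.0, 1.0), "b": (3.0, 2.0), "c": (6.0, 1.0), "d": (7.5, 3.5)}
head = clusterhead((0, 0), 4.0, nodes)
print(head)
```

Because every node can evaluate the same hash, all members of a cell agree on the clusterhead without exchanging messages.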
![Page 79: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/79.jpg)
Processing at Each Level
[Figure: local time-series data over x, y, and time pass through a wavelet encoder/decoder]
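To illustrate lossy wavelet summarization, here is a sketch that uses a one-dimensional Haar transform and keeps only the largest coefficients. This is a simplified stand-in chosen for brevity: the system described in these slides uses a 3D wavelet codec, and the function names and sample signal below are assumptions.

```python
def haar_forward(signal):
    """Full Haar decomposition of a list whose length is a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avgs = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = avgs + diffs  # averages first, then detail coefficients
        n = half
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [a + d, a - d]
        out[:2 * n] = merged
        n *= 2
    return out

def compress(signal, keep):
    """Lossy summary: keep the `keep` largest-magnitude coefficients."""
    coeffs = haar_forward(signal)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

samples = [10.0, 12.0, 14.0, 16.0, 15.0, 15.0, 13.0, 11.0]
coarse = haar_inverse(compress(samples, keep=1))  # only the overall mean survives
print(coarse)
```

Keeping fewer coefficients gives a smaller, coarser summary; keeping all of them reproduces the input exactly.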
![Page 80: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/80.jpg)
Constructing the Hierarchy
Recursively send data to higher levels of the hierarchy.
![Page 81: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/81.jpg)
Distributing the Storage Load
Hash to different locations over time to distribute load among nodes in the network.
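One way to realize this rotation, sketched under assumptions: include a time epoch in the hash key, so that the hashed clusterhead location for a cell moves from epoch to epoch. The key format, the use of SHA-256, and the notion of a numbered epoch are illustrative and not taken from the slides.

```python
import hashlib

def clusterhead_point(cell, epoch, cell_size):
    """Hash (cell, epoch) to a point inside the cell; it changes with the epoch."""
    key = f"{cell[0]}:{cell[1]}:{epoch}".encode()
    digest = hashlib.sha256(key).digest()
    fx = int.from_bytes(digest[:4], "big") / 2**32   # fraction in [0, 1)
    fy = int.from_bytes(digest[4:8], "big") / 2**32  # fraction in [0, 1)
    return ((cell[0] + fx) * cell_size, (cell[1] + fy) * cell_size)

# Hashed locations for one cell over eight consecutive epochs.
points = [clusterhead_point((2, 3), epoch, 10.0) for epoch in range(8)]
for epoch, point in enumerate(points):
    print(epoch, point)
```

The node nearest each point would hold that epoch's summaries, so storage and communication load are spread over the cell rather than pinned to a single node.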
![Page 82: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/82.jpg)
What Happens when Storage Fills Up?
Eventually, all available storage gets filled, and we have to decide when and how to drop summaries.
Allocate storage to each resolution and use each allocated storage block as a circular buffer.
[Figure: local storage capacity partitioned into blocks for Res 1, Res 2, Res 3, Res 4]
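The circular-buffer policy can be sketched with `collections.deque`, whose `maxlen` argument discards the oldest entry once the buffer is full. The class name, the resolution labels, and the slot counts are assumptions for illustration.

```python
from collections import deque

class AgingStore:
    """Per-resolution circular buffers: the oldest summary is dropped first."""

    def __init__(self, slots_per_resolution):
        self.buffers = {
            res: deque(maxlen=slots)
            for res, slots in slots_per_resolution.items()
        }

    def store(self, resolution, epoch, summary):
        # When the buffer is full, append() silently evicts the oldest entry.
        self.buffers[resolution].append((epoch, summary))

    def oldest_epoch(self, resolution):
        buf = self.buffers[resolution]
        return buf[0][0] if buf else None

# Hypothetical allocation: more slots for the coarse resolution.
store = AgingStore({"coarse": 4, "fine": 2})
for epoch in range(6):
    store.store("coarse", epoch, f"coarse summary {epoch}")
    store.store("fine", epoch, f"fine summary {epoch}")

print(store.oldest_epoch("coarse"), store.oldest_epoch("fine"))  # 2 4
```

With this allocation, coarse summaries reach further back in time than fine ones, which is the behavior the aging scheme aims for.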
![Page 83: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/83.jpg)
Graceful Query Degradation: Provide more accurate responses to queries on recent data and less accurate responses to queries on older data.
Tradeoff Between Age and Storage Requirements for Summary
![Page 84: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/84.jpg)
Match system performance to user requirements
Objective: Minimize the worst-case difference between user-desired query quality (green curve) and the query quality that the system can provide (red step function).
[Figure: query quality versus age of data. Quser is the user-desired curve; Qsystem is a step function, with steps at the times Age_i when summaries are aged. Axis marks at 95% and 50%.]
![Page 85: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/85.jpg)
What Do We Know?
Given
N sensor nodes.
Each node has storage capacity S.
Data is generated at resolution i at rate Ri.
Quser: user-desired quality degradation.
We might be provided
A set of typical queries, T.
D(q,k): query error when drill-down for query q terminates at level k.
![Page 86: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/86.jpg)
Determining Query Quality from Multiple Queries
[Figure: query error axis, marked at 50% and 5%]
Edge Query: Find nodes along a boundary between high and low precipitation areas.
Max Query: Find the node which has the maximum precipitation in January.
We need to translate the performance of different drill-down queries to a single “query quality” metric.
Only coarsest summary is queried.
All resolutions (from coarsest to finest) are queried
![Page 87: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/87.jpg)
Definition: Query Quality
Given:
T = set of typical queries.
D(q,k) = query error when drill-down for query q in set T terminates at resolution k.
The query quality for queries that refer to data at time t in the past, Qsystem(t), if k is the finest available resolution, is:
$$Q_{system}(t) = \frac{1}{|T|} \sum_{q \in T} D(q, k)$$
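The definition averages the drill-down error D(q,k) over the typical query set T, with k the finest resolution available for data of age t. A minimal sketch follows; the error table is made up for illustration and is not measured data.

```python
import math

def query_quality(typical_queries, drilldown_error, k):
    """Mean of D(q, k) over the typical query set T."""
    return sum(drilldown_error(q, k) for q in typical_queries) / len(typical_queries)

# Hypothetical drill-down errors D(q, k); k = 1 is the coarser level here.
ERRORS = {
    ("edge", 1): 0.50, ("edge", 2): 0.20,
    ("max", 1): 0.05, ("max", 2): 0.04,
}

def drilldown_error(query, k):
    return ERRORS[(query, k)]

typical = ["edge", "max"]
q1 = query_quality(typical, drilldown_error, k=1)
q2 = query_quality(typical, drilldown_error, k=2)
print(q1, q2)
```

In this toy table the finer level gives the lower averaged error, so the metric improves when more resolutions are still stored.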
![Page 88: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/88.jpg)
How Many Levels of Resolution k Are Available at Time t?
Given:
Ri = total transmitted data rate from level i clusterheads to level i+1 clusterheads.
Define si = storage allocated at each node for summaries of resolution i.
$$Age_i = \frac{N s_i}{R_i}$$
[Figure: summaries flow from Level i to Level i+1]
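A small worked example of Age_i = N s_i / R_i: the network stores N s_i units of resolution-i summaries in total, and they arrive at rate R_i, so the oldest retained summary has that age. The numbers below are assumptions chosen to give round results.

```python
def ages(num_nodes, storage_per_level, rate_per_level):
    """Age_i = N * s_i / R_i for each level i (lists start at level 1)."""
    return [num_nodes * s / r for s, r in zip(storage_per_level, rate_per_level)]

def available_levels(level_ages, t):
    """Levels (numbered from 1) that still hold data of age t."""
    return [i for i, age in enumerate(level_ages, start=1) if age >= t]

# Hypothetical network: 64 nodes, 10 storage units per level at each node,
# with the data rate dropping by a factor of 4 at each coarser level.
level_ages = ages(64, [10, 10, 10], [640, 160, 40])
print(level_ages)                       # [1.0, 4.0, 16.0]
print(available_levels(level_ages, 3))  # [2, 3]
```

For data of age 3, level 1 has already been overwritten, so a drill-down query has to stop at level 2.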
![Page 89: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/89.jpg)
Storage Allocation: Constraint-Optimization problem
Objective: Find {si}, i = 1..log4 N, that achieve:
$$\min_{\{s_i\}} \; \max_{0 \le t < \infty} \left( Q_{user}(t) - Q_{system}(t) \right)$$
Given constraints:
Storage constraint: No node can store more than its storage limit.
$$\sum_{i=1}^{\log_4 N} s_i \le S$$
Drill-down constraint: It is not useful to store finer-resolution data if coarser resolutions of the same data are not present.
$$Age_{i+1} \ge Age_i$$
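For a handful of levels, the allocation problem can be explored by brute force. The sketch below enumerates integer allocations, discards those that violate the storage or drill-down constraints, and keeps the one with the smallest worst-case gap between user-desired and system quality. Everything numeric here is an assumption: the per-level quality scores, the linear user curve, the sampled times, and the exhaustive search itself, which is a stand-in for a real solver.

```python
from itertools import product

def ages(alloc, num_nodes, rates):
    """Age_i = N * s_i / R_i for each level, finest level first."""
    return [num_nodes * s / r for s, r in zip(alloc, rates)]

def q_system(t, level_ages, level_quality):
    """Quality of the finest level that still holds data of age t."""
    for age, quality in zip(level_ages, level_quality):
        if age >= t:
            return quality
    return 0.0  # nothing that old is stored any more

def worst_gap(alloc, num_nodes, rates, level_quality, q_user, times):
    level_ages = ages(alloc, num_nodes, rates)
    return max(q_user(t) - q_system(t, level_ages, level_quality) for t in times)

def best_allocation(num_nodes, capacity, rates, level_quality, q_user, times):
    best = None
    for alloc in product(range(capacity + 1), repeat=len(rates)):
        if sum(alloc) > capacity:                      # storage constraint
            continue
        level_ages = ages(alloc, num_nodes, rates)
        if any(b < a for a, b in zip(level_ages, level_ages[1:])):
            continue                                   # drill-down constraint
        gap = worst_gap(alloc, num_nodes, rates, level_quality, q_user, times)
        if best is None or gap < best[0]:
            best = (gap, alloc)
    return best

# Hypothetical inputs, finest level first.
NUM_NODES, CAPACITY = 16, 6
RATES = [16, 4, 1]
LEVEL_QUALITY = [0.95, 0.80, 0.50]
TIMES = range(1, 65)

def q_user(t):
    return max(0.0, 0.95 - 0.01 * t)

gap, alloc = best_allocation(NUM_NODES, CAPACITY, RATES, LEVEL_QUALITY, q_user, TIMES)
print(alloc, gap)
```

The search space grows exponentially with the number of levels, so this only serves to make the objective and constraints concrete.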
![Page 90: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/90.jpg)
Determining Rates and Drilldown Query Errors
Ri: How do we determine communication rates to bound query error?
D(q,k): How do we determine the drill-down query error when prior information about deployment and data is limited?
Assume: Rates are fixed a priori by communication constraints.
![Page 91: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/91.jpg)
Solve Constraint Optimization
Strategies, ordered from full to no a priori information about the sampled data:
Omniscient Strategy: Baseline. Use all data to decide optimal allocation.
Training Strategy: Can be used when a small training dataset from initial deployment is available.
Greedy Strategy: When no data is available, use a simple weighted allocation to summaries, e.g. Coarse : Finer : Finest = 1 : 2 : 4.
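The greedy strategy's weighted allocation can be sketched as a proportional split of each node's storage. The 1 : 2 : 4 weights are the ones shown on the slide; the capacity of 70 units and the function name are assumptions for illustration.

```python
def greedy_allocation(capacity, weights):
    """Split a node's storage across resolutions in proportion to the weights."""
    total = sum(weights)
    return [capacity * w / total for w in weights]

# Weights for coarse : finer : finest summaries.
shares = greedy_allocation(70, [1, 2, 4])
print(shares)  # [10.0, 20.0, 40.0]
```

No data is consulted, which is why this strategy fits the case where nothing is known about the deployment in advance.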
![Page 92: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/92.jpg)
Distributed Trace-Driven Implementation
Linux implementation for iPAQ-class nodes uses EmStar (J. Elson et al.), a Linux-based emulator/simulator for sensor networks.
3D wavelet codec based on freeware by Geoff Davis, available at http://www.geoffdavis.net.
Query processing in Matlab.
Geo-spatial precipitation dataset
15x12 grid (50 km edge) of precipitation data from 1949-1994, from the Pacific Northwest. (Caveat: not real sensor data.)
System parameters
Compression ratio: 6:12:24:48.
Training set: 6% of total dataset.
![Page 93: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/93.jpg)
How Efficient is The Search?
Search is very efficient (<5% of network queried) and accurate for different queries studied.
![Page 94: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/94.jpg)
Comparing Aging Strategies
Training performs within 1% of optimal. Careful selection of parameters for the greedy algorithm can provide surprisingly good results (within 2-5% of optimal).
![Page 95: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/95.jpg)
Conclusion
Range searching is an important capability for sensor networks
To allow efficient query processing, data aggregation over space and time is required
Many methods employ hierarchical structures
New communication problems arise in how to avoid overloading nodes high in the hierarchy
Limited node memory implies that data aging issues have to be addressed
![Page 96: Data Storage in Sensor Networks](https://reader030.vdocuments.site/reader030/viewer/2022040911/624e4bafb7c7fd43bf7e1089/html5/thumbnails/96.jpg)
The End