A Database Perspective on Sensor Networks
2
Outline
• Introduction
  – Applications
  – Sensor Networks & Database Technology
• Part I: Sensor Networks
  – What are the capabilities of sensor nodes and of sensor networks?
  – What is the nature of sensor data?
• Part II: Database Technology
  – What are the relevant aspects of DB technology?
  – Can they be applied in the context of sensor networks?
  – What are the new problems?
3
Sensor-based Application #1
http://www.spyplanes.com/
http://www.millennium.berkeley.edu/tinyos/uav.html
4
Sensor-based Application #2
http://www.media.mit.edu/resenv/ (Ara Knaian’s thesis)
Internet
http://www.media.mit.edu/resenv/vehicles.html
5
Sensor-based Application #3
Long-range Radio
http://birds.cornell.edu/
6
Area Monitoring Applications
• Energy Efficient
• Scalable
• Accurate
• Reliable
• Low Latency
Signal Processing (Sensor Tasking)
Declarative Access
7
Area Monitoring Applications
On-demand access to sensor data
Predefined access to sensor data
On-demand Sensor Tasking
One-Time Sensor Tasking
Application #3
Application #1
Application #2
(fixed point for data collection)
(mobile point for data collection)
8
Declarative Access to Sensor Data
• SQL Queries over a Sensor Network [T00][BS00]
  – Access to a large collection of sensors
  – Associative access independent of the physical organization of the sensor network
Example #1: Every minute, return the measurement obtained from Region X.
Example #2: Whenever two sensors within 5 yards of each other detect a bird, return their locations.
Example #3: Every five minutes, return the number of birds detected in Region X.
9
Database Analogy
Data Extraction
Sensor Network
Sensors
Declarative SQL Query
SQL Engine
Storage Manager
Data on Disk
10
Sensor Database System
Declarative SQL Query
SQL Engine
Sensor Network
Sensors
Storage Manager
Data on Disk
Adapting database technology to support declarative access to sensor data in the context of area monitoring applications
11
Other Sensor-based Applications
• Condition-based maintenance
  – Product Quality Monitoring
• Device management
  – Smart office spaces
  – Home automation
  – Networked cars
• …
The opportunities for database technology might exist but are less obvious
12
Part I: Sensor Networks
13
Issues from a Database Perspective
• What is sensor data?
• How is sensor data accessed?
• What about data storage and processing capabilities on sensor nodes?
• What is the cost of accessing sensor data?
• What kind of abstraction to use in order to represent a sensor network?
• Ideas to reuse?
14
WINS NG Sensor Nodes
http://www.sensoria.com/
Analog I/O
Digital I/O
DSP
Control Proc.
GPS
Ethernet
Real-Time Interface Processor
PowerPC 32 bits Processor
RF Modem
Power supply
15
Smart Dust Motes
http://robotics.eecs.berkeley.edu/~pister/SmartDust/
Laser diode: III-V process
Passive CCR comm.: MEMS/polysilicon
Active beam steering laser comm.: MEMS/optical quality polysilicon
Sensor: MEMS/bulk, surface, ...
Analog I/O, DSP, Control: COTS CMOS
Solar cell: CMOS or III-V
Thick film battery: Sol/gel V2O5
Power capacitor: Multi-layer ceramic
1-2 mm
16
COTS Macro Dust Motes
http://www-bsac.eecs.berkeley.edu/~shollar/macro_motes/macromotes.html
17
Processing Capabilities
• WINS NG:
  – General Purpose Processor - PowerPC
    • 66 MHz
    • 87 MIPS
    • 16 MB RAM
  – DSP - TI5402
    • 100 MHz, 25 ksps input, 5 ksps output to processor
• Macro Motes:
  – Micro-controller - Atmel MCU
    • 4 MHz, 8 KB of program memory, 512 bytes of data memory.
    • Idle, power down, power save modes.
18
Communication Capabilities
• Radio Frequency
  – WINS NG
    • WINS2.0 modem - 2.4 GHz - Frequency Hopping - 56 kbps - 30 m range
  – Macro Motes
    • RFM T1000 - 900 MHz - On/Off Key Encoding - 10 kbps - 20 m range
• Optical Communication
  – Smart Dust
    • Passive Corner Cube Reflector - On/Off Key Encoding (downlink) - 1 kbps link over 500 m range
19
Optical Networking
Top View of the Interrogator
CCD Camera Lens
Frequency-Doubled Beam
45° mirror
Polarizing Beamsplitter
Quarter-wave Plate
Filter
0.25% reflectance on each surface
YAG Green Laser Expander
J. M. Kahn, R. H. Katz and K. S. J. Pister, "Mobile Networking for Smart Dust", ACM/IEEE Intl. Conf. on Mobile Computing and Networking (MobiCom 99).
20
Piconet
• Cluster: 1 Master / N Slaves
• Master synchronizes communications in a cluster (TDMA)
• Dual radio used in WINS NG to allow for multi-hop communication across clusters
[Figure: clusters of master (M) and slave (S) nodes]
ftp://ftp.uk.research.att.com/pub/docs/att/tr.97.9.pdf
Jaap C. Haartsen. The Bluetooth Radio System. IEEE PC, Feb 2000.
22
Batteries
• Energy densities (Wh/L)
  – Li-ion: 500 (~1.8 J/mm3)
  – Li/SO2: 176
  – Alkaline: 80
  – Nickel Cadmium: 40
• Moore’s law does not apply to batteries
Joe Paradiso’s survey of “renewable energy sources for the future of mobile and embedded computing”
http://www.media.mit.edu/resenv/
[Figure: 18650 Li-ion Cell Energy Density. Energy density (Wh/L, axis from 0 to 600) by year, 1995 to 2009. Courtesy of Marc Doyle, DuPont]
23
Energy Consumption
• Smart Dust
  – Objective: each mote should consume less than 1 J / day (amount of energy produced by solar cells)
  – Towards 10 pJ / instruction for dedicated microcontrollers
  – 1 nJ to transmit a bit with CCR passive transmitter
• Macro Motes
  – 1 µJ to transmit a bit; 0.5 µJ to receive a bit (10 kbps & 10 mW)
  – 10 nJ / instruction
• WINS
  – 10 µJ to transmit a bit (i.e., 100 mW transmit power and 100 ms to send a 32-byte packet; very conservative estimate)
  – 1 nJ / instruction
Executing an instruction costs orders of magnitude less than sending a bit of data
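The per-bit figures on this slide follow from dividing radio power by bit rate. A minimal sketch of that arithmetic, using the Macro Mote numbers quoted above (10 mW at 10 kbps, 10 nJ per instruction); the function names are illustrative and do not come from the talk.

```python
def energy_per_bit(power_watts: float, bitrate_bps: float) -> float:
    """Energy in joules spent per bit at a given radio power and bit rate."""
    return power_watts / bitrate_bps


def instructions_per_bit(energy_bit_j: float, energy_instr_j: float) -> float:
    """Number of instructions that cost as much energy as sending one bit."""
    return energy_bit_j / energy_instr_j


# Macro Mote figures from the slide: 10 mW radio at 10 kbps, 10 nJ per instruction.
macro_bit = energy_per_bit(10e-3, 10e3)               # roughly 1e-6 J, i.e. 1 microjoule
macro_ratio = instructions_per_bit(macro_bit, 10e-9)  # roughly 100 instructions per bit
```

Under these numbers a mote can execute on the order of a hundred instructions for the energy of one transmitted bit, which is the trade-off the rest of the talk relies on.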
24
Signal Processing: Basics
• Measurement
• Detection
• Classification
• Localization
• Tracking
[Figure: detection pipeline with blocks Time Series, FFT, Adaptive Normalizer, Energy Detect, Decision, Timer and Threshold, ending in Event / No Event]
Steven M. Kay. Fundamentals of Statistical Signal Processing, Vol. I & II.
A time stamp is associated with each signal processing output
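The detection chain named on this slide can be illustrated with a small energy detector. This is a sketch under stated assumptions, not the signal processing used on any of the platforms above: energy is computed in the time domain instead of through an FFT, the "adaptive normalizer" is modeled as an exponential moving average of background energy, and the function name, window size and threshold are hypothetical.

```python
from typing import List, Tuple


def detect_events(samples: List[float], window: int, threshold: float,
                  alpha: float = 0.1) -> List[Tuple[int, bool]]:
    """Return one (time_stamp, event) pair per non-overlapping window.

    The time stamp is the index of the first sample in the window.
    """
    results = []
    noise = None  # running estimate of the background energy
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        energy = sum(x * x for x in chunk) / window
        if noise is None:
            # Assumption: the first window contains background only.
            noise = energy
        if noise > 0:
            normalized = energy / noise
        else:
            normalized = 0.0 if energy == 0 else float("inf")
        event = normalized > threshold
        if not event:
            # Only quiet windows update the background estimate.
            noise = (1 - alpha) * noise + alpha * energy
        results.append((start, event))
    return results


quiet = [0.1, -0.1] * 8   # 16 background samples
loud = [1.0, -1.0] * 4    # 8 samples of a strong signal
events = detect_events(quiet + loud + quiet[:8], window=8, threshold=4.0)
```

Each output carries a time stamp, matching the remark that every signal processing output is time stamped.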
25
Signal Processing: Data Fusion
• Data Fusion
  – In: Observations from different sensors
  – Out: Weight associated to hypothesis
• Approach
  – Inferences (Bayesian, genetic algorithm, …)
  – Peer-tasking
R. Brooks and S. Iyengar. Multi-sensor Fusion: Fundamentals and Applications with Software. Prentice Hall.
26
RF Networking: Directed Diffusion
• Publish-Subscribe interface
• Gradient based routing
  – Data is sent on multiple routes
• Reinforcement learning
  – Chooses good route
  – Adapts to node failures
• In-network aggregation
SCADDS Project - http://www.isi.edu/scadds
DataSpaces - http://www.cs.rutgers.edu/dataman
DSN Project - http://www.east.isi.edu/projects/DSN/
27
Operating System: Requirements
• Compact scale
  – Small footprint, efficient use of instruction set
• Efficient Multithreading
  – Concurrency-intensive operations
    • Sensor data + network data (+ GPS data)
• Efficient drivers
  – Limited levels of abstractions
  – Migration across hardware/software boundaries
• Modularity
  – Composition of modules for each type of sensor node
  – Support for mobile code
• Robust operations
  – Memory management
28
Operating System: TinyOS
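[Figure: TinyOS component graph spanning hardware (HW) and software (SW). Labels: RFM, radio byte, radio packet, UART, serial packet, i2c, temp, photo, clocks, Active Messages, and an application layer (route map, router, sensor application), organized at bit, byte and packet levels]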
J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, K. Pister. System Architecture Directions for Networked Sensors. ASPLOS 2000.
http://www.cs.berkeley.edu/~jhill/tos/
29
Design Space
Multi-hop topology
Star topology
Sensor Pack
“System on a chip”
WINS NG
Macro Motes
Smart Dust
Front-end
Front-end
30
What is Sensor Data?
• Sensor data is generated by signal processing functions
  – Measurements
  – Detections
  – Classification
• Time stamp associated to each sensor data item
• Sensor data produced by individual sensors or groups of sensors
  – If no “peer tasking” is used then the group of sensors that produce data is the group of sensors on which the signal processing functions are invoked.
31
How is Sensor Data Accessed?
• Multi-hop RF network
  – Front-end connected to gateway nodes
  – Sensor nodes that produce data are sources, gateway nodes are sinks.
  – Processing can be pushed into the multi-hop network in order to trade increased local processing for reduced traffic.
• Optical network
  – Front-end obtains data from all the nodes in its line of sight.
  – Star Topology.
32
What About Data Storage and Processing Capabilities on the Nodes?
• Sensor pack
  – Large processing capabilities and buffer space
• System on a chip
  – Restricted processing capabilities and buffer space
    • Data items should be processed as they are generated
    • No elaborate processing on the sensor nodes
    • No historical data is maintained
• Possible hierarchy of sensor nodes
  – A few sensor packs arranged in a multi-hop network
  – To each sensor pack is attached lots of miniature sensors (system on a chip).
33
What is the Cost of Accessing Sensor Data?
• Energy is the scarce resource
  – Processing
  – Storage
  – Transmission
• Local processing is orders of magnitude cheaper than transmission
  – Propagation with nodes on the ground accentuates this characteristic
34
What kind of abstraction to represent a sensor network?
• G = (V,E)
  – Vertices represent sensor nodes
  – Edges represent connected sensor nodes
• Model #1: The graph of connected nodes is fully connected. Each edge is annotated with the cost of the transmission between any two nodes.
  – Relies on routing layer
  – How to estimate cost of transmission?
• Model #2: The graph of connected nodes is not fully connected. An edge represents a single hop.
  – Relies on physical layer
  – Stable for limited periods of time
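The two graph models can be related in code. The sketch below starts from a Model #2 graph (single-hop links with a per-hop cost) and derives a Model #1 view in which every pair of nodes gets one cost. It assumes that the routing layer uses cheapest multi-hop paths, which is only one possible answer to the cost-estimation question on the slide; the node names and costs are made up.

```python
import heapq
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: transmission cost}


def path_costs(single_hop: Graph, source: str) -> Dict[str, float]:
    """Dijkstra: cheapest multi-hop cost from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in single_hop[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def to_model_1(single_hop: Graph) -> Graph:
    """Annotate every reachable pair of nodes with an end-to-end cost."""
    return {
        u: {v: c for v, c in path_costs(single_hop, u).items() if v != u}
        for u in single_hop
    }


# Hypothetical three-node chain: a - b - c.
hops: Graph = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 2.0}, "c": {"b": 2.0}}
full = to_model_1(hops)
```

Because Model #2 is only stable for limited periods, a derived view like `full` would have to be recomputed whenever links change.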
35
Ideas to Reuse?
• Energy efficient, small footprint solutions
• Easy to reconfigure, “0 administration” systems
• Reinforcement learning
  – Finding an optimal solution in a dynamic environment
• Event-based processing
  – Streams of sensor data items need be processed as they are produced
36
Break
37
Part II: Sensor Networks & Databases
38
Declarative Access to Sensor Data
• Sensors are data sources
• Queries to access sensor data regardless of physical organization
Example #1: Every minute, return the measurement obtained from Region X.
Example #2: Whenever two sensors within 5 yards of each other detect a bird, return their locations.
Example #3: Every five minutes, return the number of birds detected in Region X.
39
Queries over a Sensor Network
• Do data fusion, directed diffusion, and query processing share the same notion of query?
  – Yes
    • Collect, filter, correlate, aggregate sensor data
  – … and No
    • Data Fusion: hypothesis testing in a neighborhood
    • Directed Diffusion: efficient, scalable cross-layer routing
    • Query Processing: SQL queries over sensor data
• From a query processing viewpoint
  – Support for data fusion?
  – Integration with network routing?
40
Warehousing Approach
• Data is extracted from sensors and stored on a front-end server
• Query processing takes place on the front-end.
Warehouse
Front-end
Sensor Nodes
41
Sensor Database System
• Sensor Database System supports distributed query processing over a sensor network
[Figure: a SensorDB instance on each sensor node]
Front-end
Sensor Nodes
42
Sensor Database System
• Characteristics of a Sensor Network: Streams of data, uncertain data, large number of nodes, multi-hop network, no global knowledge about the network, failure is the rule, energy is the scarce resource, limited memory, no administration, …
1. Can existing database techniques be reused in this new context? What are their limitations?
2. What are the new problems? What are the new solutions?
43
Issues
• Representing sensor data
• Representing sensor queries
• Processing query fragments on sensor nodes
• Distributing query fragments
• Adapting to changing network conditions
• Dealing with site and communication failures
• Deploying and Managing a sensor database system
44
Performance Metrics
• High accuracy
  – Distance between ideal answer and actual answer?
  – Ratio of sensors participating in answer?
• Low latency
  – Time between data is generated on sensors and answer is returned
• Limited resource usage
  – Energy consumption:
    E (J) = Wcpu (J/inst) * CPU (inst) + Wram (J/b) * RAM (b) + Wmsg (J/msg sent) * nb msg sent + Wbdw (J/b) * bytes sent (b)
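The energy formula on this slide is a weighted sum of four resource counts. A minimal sketch of how it could be evaluated; the class and function names are illustrative and the weights in the example are hypothetical values chosen only to exercise the formula, not measurements.

```python
from dataclasses import dataclass


@dataclass
class EnergyWeights:
    w_cpu: float  # J per instruction
    w_ram: float  # J per unit of RAM used (the slide writes J/b)
    w_msg: float  # J per message sent (fixed per-message overhead)
    w_bdw: float  # J per byte sent


def query_energy(w: EnergyWeights, instructions: float, ram: float,
                 msgs_sent: float, bytes_sent: float) -> float:
    """E = Wcpu*CPU + Wram*RAM + Wmsg*(messages sent) + Wbdw*(bytes sent)."""
    return (w.w_cpu * instructions
            + w.w_ram * ram
            + w.w_msg * msgs_sent
            + w.w_bdw * bytes_sent)


# Hypothetical weights: 10 nJ/instruction, RAM ignored,
# 0.1 mJ per message, 8 microjoules per byte.
weights = EnergyWeights(w_cpu=10e-9, w_ram=0.0, w_msg=1e-4, w_bdw=8e-6)
energy = query_energy(weights, instructions=1000, ram=0, msgs_sent=2, bytes_sent=64)
```

With these made-up weights the communication terms dominate the CPU term, in line with the earlier observation that transmission is far more expensive than computation.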
45
Representing Sensor Data and Sensor Queries
• Sensor Data:
  – Output of signal processing functions
    • Time-stamped values produced over a given duration
  – Inherently distributed
• Sensor Queries
  – Conditions on time and space
    • Location dependent queries
    • Constraints on time stamps or aggregates over time windows
  – Event notification
46
The COUGAR Model
• Schema-Level
  – Each type of sensor is represented as an ADT
  – To each signal-processing function is associated an ADT function that returns a sequence
  – A sequence associates sets of records with positions (elements in an ordered domain).
detect
TimeStamp | SensorId | In | Out
T1        | #1       | 90 | True
T2        | #1       | 90 | True
T2        | #2       | 90 | True
T4        | #3       | 90 | True
47
The COUGAR Model
• Long-running SQL queries
  – Sequence functions over sensor ADT functions (returning sequences)
  – New sensor data items appended to sequence as they are produced
  – Materialized view updated as sensor data items are appended
Select R.s.detect(90).project(s1.sensorId)
From R
Where $every(60);
detect
TimeStamp | SensorId | In | Out
T1        | #1       | 90 | True
T2        | #1       | 90 | True
T2        | #2       | 90 | True
T4        | #3       | 90 | True
P. Bonnet, J. Gehrke, P. Seshadri. Towards Sensor Database Systems. MDM’01
http://www.cs.cornell.edu/database/cougar
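One possible reading of the long-running query on this slide, sketched as a simulation: every 60 time units the detect function is invoked on each sensor and positive detections are appended to a sequence. This is not the COUGAR implementation. The `SensorADT` class, the `run_query` function and the interpretation of the argument 90 as a threshold are assumptions made for illustration; the slide does not say what the argument means.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str, int, bool]  # (time_stamp, sensor_id, input, output)


class SensorADT:
    """Hypothetical stand-in for a sensor ADT with one signal-processing function."""

    def __init__(self, sensor_id: str, read: Callable[[int], float]):
        self.sensor_id = sensor_id
        self._read = read  # read(t) returns the raw measurement at time t

    def detect(self, threshold: int, t: int) -> bool:
        # Assumption: detection means the reading reaches the threshold.
        return self._read(t) >= threshold


def run_query(sensors: List[SensorADT], threshold: int,
              period: int, duration: int) -> List[Row]:
    """Evaluate detect on every sensor once per period; append positive results."""
    sequence: List[Row] = []
    for t in range(0, duration, period):
        for s in sensors:
            if s.detect(threshold, t):
                sequence.append((t, s.sensor_id, threshold, True))
    return sequence


s1 = SensorADT("#1", lambda t: 95.0)
s2 = SensorADT("#2", lambda t: 95.0 if t >= 60 else 10.0)
rows = run_query([s1, s2], threshold=90, period=60, duration=180)
sensor_ids = [sensor_id for _, sensor_id, _, _ in rows]  # the projection step
```

In a real deployment the sequence would grow for as long as the query runs; the fixed `duration` is there only to make the sketch terminate.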
48
A Measure Theoretic Probabilistic Data Model
• Outputs of a signal processing function might be continuous probability distributions
• Extension of data model for discrete probability distributions using measure theory
• Specific model for multidimensional parametric distributions (e.g., Gaussians)
  – Event probabilities
  – Comparisons
[Figure: detect sequence with columns TimeStamp, SensorId, In and Out; the Out value at T1 is drawn as a probability distribution labeled Detection]
T. Faradjian, J. Gehrke, P. Bonnet. A Model Theoretic Probabilistic Data Model. Cornell Technical Report. December 2000.
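For the Gaussian case mentioned on this slide, event probabilities and comparisons have closed forms. The sketch below covers only one-dimensional, independent Gaussians, which is a simplification of the multidimensional model in the talk; the function names are illustrative.

```python
import math


def normal_cdf(x: float, mean: float, std: float) -> float:
    """P(X <= x) for X ~ N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_greater_than(mean: float, std: float, threshold: float) -> float:
    """Event probability P(X > threshold) for X ~ N(mean, std^2)."""
    return 1.0 - normal_cdf(threshold, mean, std)


def prob_x_exceeds_y(mean_x: float, std_x: float,
                     mean_y: float, std_y: float) -> float:
    """Comparison P(X > Y) for independent Gaussians X and Y.

    X - Y is Gaussian with mean mean_x - mean_y and
    variance std_x^2 + std_y^2.
    """
    diff_std = math.sqrt(std_x ** 2 + std_y ** 2)
    return prob_greater_than(mean_x - mean_y, diff_std, 0.0)


p_event = prob_greater_than(0.0, 1.0, 1.0)      # one standard deviation above the mean
p_compare = prob_x_exceeds_y(1.0, 1.0, 1.0, 1.0)  # identical distributions
```

A query predicate such as "temperature above a threshold" then evaluates to a probability for each sensor reading instead of a Boolean.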
49
WebDust
• Data Model
  – DataSpaces: spatial decomposition of physical space
  – Each sensor is an abstract data type
• InfoDispensers
  – Data aggregation devices
• Spatial Web
  – For organizing and representing information aggregated by InfoDispensers
http://www.cs.rutgers.edu/dataman/webdust
T. Imielinski, S. Goel. DataSpace: Querying and Monitoring Deeply Networked Collections in Physical Space. MobiDE 1999.
50
Control Language in Sagres
• Data model
  – Ontology that contains class information
  – World State that contains device data
  – XML encoding
• DevL language
  – Rules are defined for each device
  – ECA model for querying and updating the World State
http://data.cs.washington.edu/ubiquitous/sagres/
51
Subscription Language in LeSubscribe
• Event Model
  – Similar to LDAP data model
  – An event type is associated to a set of attributes
  – An event instance includes a set of values
• Subscription Language
  – A subscription is a conjunction of conditions on attributes
    • An event instance e matches a subscription s if e provides a binding for every attribute occurring in s and all predicates in s are true with respect to this binding
J. Pereira et al. Publish/Subscribe on the Web at Extreme Speed. VLDB 2000.
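The matching rule stated on this slide translates directly into code. The sketch below is a naive check of one event against one subscription, not the indexing scheme of the cited system; the representation of predicates as (attribute, comparison, constant) triples and the example attributes are assumptions.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# A predicate is (attribute, comparison, constant); a subscription is a
# conjunction of predicates.
Predicate = Tuple[str, Callable[[Any, Any], bool], Any]


def matches(event: Dict[str, Any], subscription: List[Predicate]) -> bool:
    """True if the event binds every attribute in the subscription
    and every predicate holds under that binding."""
    for attribute, compare, constant in subscription:
        if attribute not in event:
            return False  # no binding for this attribute
        if not compare(event[attribute], constant):
            return False  # predicate is false under the binding
    return True


# Hypothetical subscription: confident bird detections in region X.
bird_in_x: List[Predicate] = [
    ("type", operator.eq, "bird"),
    ("region", operator.eq, "X"),
    ("confidence", operator.ge, 0.8),
]

hit = matches({"type": "bird", "region": "X", "confidence": 0.9}, bird_in_x)
unbound = matches({"type": "bird", "confidence": 0.9}, bird_in_x)
weak = matches({"type": "bird", "region": "X", "confidence": 0.5}, bird_in_x)
```

An event may carry extra attributes that the subscription does not mention; they are simply ignored by the match.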
52
Discussion
• Data Model
  – Representing sensors and signal-processing functions
    • Abstract Data Types vs. attribute-value pairs
  – Capturing the temporal aspect of sensor data
    • Sequences vs. event model
    • New operators on data streams
  – Representing uncertain data
    • Probabilistic Data Model
  – Data Format
    • XML vs. byte array
• Query Language
  – Manipulating sensor data
    • Long-running SQL queries vs. active rules
  – Need for a propagation mechanism for sensor data (as events)
53
Processing query fragments on sensor nodes
• Processing query fragments on sensor nodes allows trading increased processing on sensor nodes for reduced network traffic
  – Valid trade-off in multi-hop networks
• Need for a light-weight query engine on sensor nodes
• Limited Resources:
  – How to scale down the footprint of the query engine?
  – How to manage the resource consumption of the query engine (including CPU, RAM and energy)?
• Event-based processing
  – Query processing takes place as data items are produced by signal processing functions (or obtained from other sensor nodes). How does this impact the architecture of the query engine?
54
Light-weight query engines
• Commercial DBMS for palm-sized PCs including query processing and replication capabilities
  – Footprint limited to several hundred kbytes.
• PicoDBMS for the SmartCard
  – Focus on query processing without RAM.
• RISC-style Database System
C. Bobineau, L. Bouganim, P. Pucheral, P. Valduriez. PicoDBMS: Scaling down Database Techniques for the Smartcard. VLDB 2000.
S. Chaudhuri, G. Weikum. Rethinking Database System Architecture: Towards a Self-Tuning RISC-style Database System.
55
Discussion
• Need for scaled down database systems
  – PicoDBMS focuses on RAM
  – Need for energy-aware query processing: managing CPU mode to reduce energy usage
• Need for composition of database components
  – Building systems adapted to sensor capabilities (RAM, CPU, energy); TinyOS argument, similar to the objective of wrapper generators.
  – Predictable performance for capacity planning and admission control
M. Weiser et al. Scheduling for Reduced CPU Energy. OSDI 1994.
56
Distributing query fragments
• Because producing and transmitting data is energy expensive, only the sensors involved in a query should be tasked to produce and transmit data.
• When placing query fragments, the system should consider the performance trade-off between increased processing on the nodes and reduced network traffic
  – Accuracy
  – Response Time
  – Resource Usage
Cost model or Admission Control?
57
Distributing query fragments
• Distributed Database Systems assume
  – A centralized optimizer has global knowledge about all the nodes
  – Meta-data is static
• These assumptions are challenged in the context of large-scale multi-hop sensor networks:
  – No global knowledge
  – Mobile sensors
  – Meta-data is dynamic
Decentralized Meta-data Management
58
Decentralized Meta-data Management
• No global knowledge
  – Resource Discovery on the Internet
    • Index structure imposed on the network
• Dynamic Meta-data
  – Indexing Moving Objects
  – Decisions taken at one point in time might be challenged later on!
Astrolabe - http://www.cs.cornell.edu/Info/People/rvr/astrolabe/
Tapestry (OceanStore) - http://oceanstore.cs.berkeley.edu/
S. Saltenis et al. Indexing the Positions of Continuously Moving Objects. SIGMOD 2000.
O. Wolfson et al. Location Prediction and Queries for Tracking Moving Objects. ICDE 2000.
59
Cost Model or Admission Control?
• Mariposa
  – Each autonomous site bids for queries in order to increase the value of a reward function
• Quality of Service and Query Processing
  – Budget associated to each query
    • Accuracy, Latency, Resource Usage
  – The system guarantees that each query is evaluated within the given budget
    • Admission Control
    • Monitoring and Adaptation
http://www.db.fmi.uni-passau.fr:8000/projects/OG
http://s2k-ftp.cs.berkeley.edu:8000/mariposa
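The budget idea on this slide can be sketched as a simple admission test: a query is accepted only if its estimated accuracy, latency and energy fit its budget and the node still has the energy to run it. This is an illustration under assumptions, not the mechanism of Mariposa or of the quality-of-service work referenced above; all class names, fields and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    max_error: float      # accuracy: largest tolerated error
    max_latency_s: float  # latency bound in seconds
    max_energy_j: float   # resource usage bound in joules


@dataclass
class Estimate:
    error: float
    latency_s: float
    energy_j: float


class AdmissionController:
    """Admits a query only if its estimate fits both the budget and the node."""

    def __init__(self, energy_capacity_j: float):
        self.remaining_j = energy_capacity_j

    def admit(self, estimate: Estimate, budget: Budget) -> bool:
        within_budget = (estimate.error <= budget.max_error
                         and estimate.latency_s <= budget.max_latency_s
                         and estimate.energy_j <= budget.max_energy_j)
        if within_budget and estimate.energy_j <= self.remaining_j:
            self.remaining_j -= estimate.energy_j  # reserve the energy
            return True
        return False


ctrl = AdmissionController(energy_capacity_j=1.0)
budget = Budget(max_error=0.2, max_latency_s=5.0, max_energy_j=1.0)
first = ctrl.admit(Estimate(error=0.1, latency_s=2.0, energy_j=0.6), budget)
second = ctrl.admit(Estimate(error=0.1, latency_s=2.0, energy_j=0.6), budget)
```

The second query fits its own budget but is rejected because the node no longer has enough energy, which is the situation where monitoring and adaptation would come into play.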
60
Discussion
• Decentralized Meta-data management
  – Adapting data structures defined for resource discovery on the Internet seems promising
  – Dealing with continuously changing meta-data
  – Similar problem for large-scale mediator systems
• Decentralized Query Planning
  – Query Decomposition
    • Bottom-up? Top Down?
  – Negotiation between sites to reach agreement on which site processes which query fragments
    • Need for adaptation and renegotiations when meta-data change
61
Adapting to changing network conditions
• During query execution, streams of data flow from a large number of sensors to front-ends or between sensors
  – Dataflow engine
• Because of the nature of sensor data and because of congestion or failures, it is impossible to predict how data will be obtained at a query processing site.
  – Adaptive query processing at each site
62
Dataflow Engines
• Same set of operations (query fragment) performed in parallel on multiple sites
• Mechanisms for load balancing
  – River: over a cluster
  – Mayr et al.: over heterogeneous resources
Telegraph: http://telegraph.cs.berkeley.edu/
River: http://now.cs.berkeley.edu/River/
http://www.research.microsoft.com/~gray/river
Heterogeneous Resources: http://www.cs.cornell.edu/mayr
[Figure: parallel dataflow in which each site runs Merge, Op and Split stages]
63
Adaptive Query Processing
• Given a query fragment: for each record, which operator should be executed next?
• Decision based on “back pressure” at the queue associated to each operator
  – Reinforcement learning
Ron Avnur and Joseph M. Hellerstein. Eddies: Continuously Adaptive Query Processing. SIGMOD 2000.
Eddy
64
Discussion
• Integration of adaptive query processing with dataflow engines over a sensor network
  – How to take site or communication failure into account?
    • Using reinforcement learning to take decisions over multiple dataflows?
  – How to establish dataflow?
    • No centralized site that establishes a dataflow. Need to take mobile sites into account.
    • Need for distributed scheduling. Data driven control might not be sufficient. Using admission control to establish dataflow schedules?
65
Dealing with Site or Communication Failures
• Because sensors run out of energy, site and communication failures are the rule and not the exception in a sensor network
• Taking site or communication failure into account in dataflow processing:
  – Sensor data is uncertain in the first place. Combining uncertainty and unavailability?
  – Fault-tolerance mechanisms for intermediate query processing sites
  – Trading resource usage and delay for increased accuracy in case of communication failure
• Assessing the quality of each answer
  – Approximate Query Processing
  – Quality of Service
    • Accuracy requirement
    • The system guarantees that requirements are met
66
Deploying and Managing a Sensor Database System
• Sensor networks should be deployed and left unattended.
• It should be easy to add or remove sensor nodes.
• A sensor database system should
  – Take advantage of all sensors in the system
  – Be as easy to deploy and manage as all other components
• Need for mechanisms to acquire and distribute meta-data
• Need for mechanisms to adjust dataflow depending on the status of the sensor network
• It should be easy to configure, install and reboot sensor database components
  – RISC-style architecture?
67
Summary
• What database techniques can be reused?
  – Data model and query languages
    • Sequences
    • Subscription languages
  – Adaptive query processing
  – Small footprint and modular architecture for query engine
• What is new?
  – Uncertain data and unavailable data
  – Decentralized meta-data management and query planning
  – Combining dataflow engine and adaptive query processing
  – Failure handling in dataflow engines
  – Quality of service and query processing
68
Other Issues
• Historical analysis over data cached in the sensor network
• Asynchronous query processing
  – User submits a query at a given location and obtains the answer later on at a different location
Example: What was the average temperature in Region X between 10 am and 1 pm yesterday?
69
Queries over a Sensor Network
• Support for data fusion
  – Peer-tasking: extending dataflow dynamically
  – Fully decentralized system: each sensor node can submit a query
• Integration with network routing
  – Sharing meta-data
  – Dataflow engine as application in a cross layer routing mechanism
  – Quality of service or cost information provided by routing layer
70
Acknowledgements
DARPA SensIT Program
http://www.darpa.mil/ito/research/sensit/
Many thanks to Steve Beck, Richard Brooks, Jason Hill, Bill Kaiser, Donald Kossmann, Sri Kumar, Tobias Mayr, Kris Pister, Joe Paradiso