Peer-to-Peer (P2P) and Sensor Networks
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute
[email protected] http://www.ecse.rpi.edu/Homepages/shivkuma
Based in part upon slides of Don Towsley, Ion Stoica, Scott Shenker, Joe Hellerstein, Jim Kurose, Hung-Chang Hsiao, Chung-Ta King
Overview
 P2P networks: Napster, Gnutella, Kazaa
 Distributed Hash Tables (DHTs)
 Database perspectives: data-centricity, data independence
 Sensor networks and their connection to P2P
P2P: Key Idea
 Share the content, storage, and bandwidth of individual (home) users across the Internet
What is P2P (Peer-to-Peer)?
 P2P as a mindset: Slashdot
 P2P as a model: Gnutella
 P2P as an implementation choice: application-layer multicast
 P2P as an inherent property: ad-hoc networks
P2P Application Taxonomy
 P2P Systems:
  Distributed Computing: SETI@home
  File Sharing: Gnutella
  Collaboration: Jabber
  Platforms: JXTA
How to Find an Object in a Network?
A Straightforward Idea
 Use a BIG server:
  store the objects
  provide a directory
 But how to do it in a distributed way?
Why Distributed?
 Client-server model:
  the client is dumb; the server does most things (compute, store, control)
  centralization makes things simple, but introduces a single point of failure, a performance bottleneck, tighter control, access fees and management costs, …
  ad hoc participation?
 Estimates of PCs on the net:
  ~10 billion MHz of CPU
  ~10,000 terabytes of storage
 Clients are not that dumb after all:
  use the resources in the clients (at the net edges)
First Idea: Napster
 Distribute the objects, but centralize the directory
Today: P2P Video Traffic Is Dominant
 Video, BitTorrent, eDonkey (source: CacheLogic)
40-60%+ P2P traffic
2006 P2P Data
 Between 50 and 65 percent of all download traffic is P2P related.
 Between 75 and 90 percent of all upload traffic is P2P related.
 And it seems that more people are using P2P today:
  in 2004, one CacheLogic server registered 3 million IP addresses in 30 days; in 2006, one CacheLogic server registered 3 million IP addresses in 8 days
 So what do people download?
  61.4 percent video
  11.3 percent audio
  27.2 percent games/software/etc.
 The average size of shared files is 1 gigabyte!
 Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/
A More Aggressive Idea
 Distribute both the objects and the directory
 How to find objects without a directory? Blind flooding!
Gnutella
 Distribute the file location
 Idea: flood the request
 How to find a file:
  send the request to all neighbors
  neighbors recursively multicast the request
  eventually a machine that has the file receives the request, and it sends back the answer
 Advantages:
  totally decentralized, highly robust
 Disadvantages:
  not scalable; the entire network can be swamped with requests (to alleviate this problem, each request carries a TTL)
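The flooding search with a TTL can be sketched in a few lines of Python. The graph and file placement below are toy data for illustration; real Gnutella also deduplicates queries by message ID.

```python
# Sketch of Gnutella-style blind flooding with a TTL.
def flood_search(graph, files, start, wanted, ttl=3):
    """Return the set of nodes holding `wanted`, reachable within `ttl` hops."""
    hits, seen = set(), {start}
    frontier = [start]
    while frontier and ttl >= 0:
        next_frontier = []
        for node in frontier:
            if wanted in files.get(node, ()):   # this peer has the file
                hits.add(node)
            for nbr in graph[node]:             # forward to all neighbors
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
        ttl -= 1                                # each hop decrements the TTL
    return hits

# Ring of 6 peers; only peer 4 has the file, 2 hops away from peer 0.
g = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(flood_search(g, {4: {"xyz"}}, start=0, wanted="xyz", ttl=2))   # {4}
```

With ttl=1 the same query fails, which is exactly the scalability/recall trade-off the slide describes.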
Gnutella: Unstructured P2P
 Ad-hoc topology
 Queries are flooded for a bounded number of hops
 No guarantees on recall
 [Figure: a query for “xyz” flooding the overlay until nodes holding xyz are reached]
Now BitTorrent & eDonkey2000! (2006)
Lessons and Limitations
 Client-server performs well
  but is not always feasible
 Ideal performance is often not the key issue!
 Things that flood-based systems do well:
  organic scaling
  decentralization of visibility and liability
  finding popular stuff
  fancy local queries
 Things that flood-based systems do poorly:
  finding unpopular stuff [Loo et al., VLDB 04]
  fancy distributed queries
  vulnerabilities: data poisoning, tracking, etc.
  guarantees about anything (answer quality, privacy, etc.)
Detour… BitTorrent
BitTorrent – joining a torrent
 Peers are divided into:
  seeds: have the entire file
  leechers: still downloading
 1. obtain the metadata file (from a website)
 2. contact the tracker
 3. obtain a peer list (contains seeds & leechers)
 4. contact peers from that list for data
 [Figure: a new leecher contacting the website, tracker, and a seed/leecher]
BitTorrent – exchanging data
 ● Verify pieces using hashes
 ● Download sub-pieces in parallel
 ● Advertise received pieces to the entire peer list
 ● Look for the rarest pieces
 [Figure: a seed and leechers A, B, C exchanging pieces]
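The rarest-first rule can be sketched as follows. The peer bitfields are made-up data; each set holds the piece indices a peer advertises.

```python
# Sketch of BitTorrent's rarest-first piece selection.
from collections import Counter

def rarest_first(my_pieces, peer_bitfields):
    """Pick the piece we still need that is held by the fewest peers."""
    counts = Counter()
    for bitfield in peer_bitfields.values():
        counts.update(bitfield)                # availability of each piece
    needed = sorted(p for p in counts if p not in my_pieces)
    # Rarest piece first; ties broken by lowest piece index.
    return min(needed, key=lambda p: counts[p], default=None)

peers = {"A": {0, 1, 2}, "B": {0, 1}, "C": {0, 2}}
print(rarest_first(my_pieces={0}, peer_bitfields=peers))   # 1 (piece 0 is common)
```

Downloading rare pieces first keeps every piece well replicated, so the swarm survives seeds leaving.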
BitTorrent – unchoking
 ● Periodically calculate data-receiving rates
 ● Upload to (unchoke) the fastest downloaders
 ● Optimistic unchoking:
  ▪ periodically select a peer at random and upload to it
  ▪ continuously look for the fastest partners
 [Figure: a seed and leechers A–D choosing whom to unchoke]
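A minimal sketch of rate-based unchoking with one optimistic slot. The number of slots and the rates are illustrative, not BitTorrent's exact parameters.

```python
# Sketch of tit-for-tat unchoking plus one random optimistic unchoke.
import random

def pick_unchoked(download_rates, slots=3, seed=None):
    """Unchoke the `slots` fastest peers, plus one random choked peer."""
    rng = random.Random(seed)
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = ranked[:slots]                  # reward the fastest uploaders to us
    choked = ranked[slots:]
    if choked:                                 # optimistic unchoke: discover new partners
        unchoked.append(rng.choice(choked))
    return unchoked

rates = {"A": 50, "B": 10, "C": 80, "D": 5, "E": 30}
print(pick_unchoked(rates, slots=2, seed=1))
```

The optimistic slot is what lets a newcomer with nothing to offer eventually get unchoked and bootstrap.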
End of Detour…
Back to… P2P Structures
 Unstructured P2P architecture:
  Napster, Gnutella, Freenet
  no “logically” deterministic structure to organize the participating peers
  no guarantee that objects will be found
 How to find objects within some bounded number of hops?
  extend hashing
 Structured P2P architecture:
  CAN, Chord, Pastry, Tapestry, Tornado, …
  viewed as a distributed hash table acting as a directory
How to Bound Search Quality?
 Many ideas…, again: work on placement!
High-Level Idea: Indirection
 Indirection in space:
  logical (content-based) IDs, routing to those IDs
  a “content-addressable” network
  tolerant of churn: nodes joining and leaving the network
 Indirection in time:
  want some scheme to temporally decouple send and receive
  persistence required; the typical Internet solution is soft state
  combo of persistence via storage and via retry: the “publisher” requests a TTL on storage and republishes as needed
 Metaphor: a distributed hash table
Basic Idea
 Objects have hash keys: H(y) for object “y”
 Peer nodes also have hash keys in the same hash space: H(x) for peer “x”
 Operations: Join(H(x)), Publish(H(y))
 Place each object at the peer with the closest hash key
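Closest-key placement can be sketched directly. The 32-bit key space and peer names below are illustrative, not any particular DHT's parameters.

```python
# Sketch of hash-based placement: each object is stored at the peer whose
# key is closest (with wrap-around) in a shared key space.
import hashlib

SPACE = 2 ** 32

def key(name):
    """Hash a name into the shared 32-bit key space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")

def owner(obj, peers):
    """The peer whose key is closest on the ring to the object's key."""
    k = key(obj)
    def ring_dist(p):
        d = abs(key(p) - k)
        return min(d, SPACE - d)               # wrap-around distance
    return min(peers, key=ring_dist)

peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
print(owner("song.mp3", peers))
```

Because every node hashes names the same way, any node can compute where an object belongs without a central directory.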
Distributed Hash Tables (DHTs)
 Abstraction: a distributed hash-table data structure:
  insert(id, item); item = query(id) (or lookup(id))
  note: the item can be anything: a data object, document, file, pointer to a file, …
 Proposals:
  CAN, Chord, Kademlia, Pastry, Tapestry, etc.
 Goals:
  make sure that an identified item (file) is always found
  scale to hundreds of thousands of nodes
  handle rapid arrival and failure of nodes
Viewed as a Distributed Hash Table
 The hash table spans keys 0 to 2^128 − 1
 Each peer node is responsible for a range of the hash table, according to the peer’s hash key
 Objects are placed at the peer with the closest key
 Note that the peers are Internet edge nodes
How to Find an Object?
 Simplest idea: everyone knows everyone else!
  one hop to find the object
  but we want to keep only a few entries per node!
Distributed Hash Tables (DHTs): Structured Networks
 Hash table interface: put(key, item), get(key)
 O(log n) hops
 Guarantees on recall
 [Figure: structured overlay of (K, I) key–item pairs; put(K1, I1) stores the pair and get(K1) returns I1]
Content Addressable Network (CAN)
 A distributed hash table
 The hash table is laid out in a Cartesian coordinate space
 A peer only needs to know its logical neighbors
 Dimension-ordered multihop routing
Content Addressable Network (CAN)
 Associate to each node and item a unique id in a d-dimensional Cartesian space on a d-torus
 Properties:
  routing table size O(d)
  guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
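Greedy routing on the torus can be sketched on an idealized grid (one node per unit cell, which simplifies CAN's variable-size zones; the 8×8 field is an assumption for illustration):

```python
# Sketch of CAN-style greedy routing on a 2-D torus grid: each hop moves
# to the neighbor closest to the target coordinates.
def torus_dist(a, b, size):
    """Sum of per-axis wrap-around distances on the torus."""
    return sum(min(abs(x - y), s - abs(x - y))
               for x, y, s in zip(a, b, size))

def can_route(src, dst, size=(8, 8)):
    """Greedy neighbor-by-neighbor routing; returns the path of cells."""
    path, cur = [src], src
    while cur != dst:
        best = cur
        for dim in range(len(size)):           # try a unit step along each dimension
            for step in (-1, 1):
                nbr = list(cur)
                nbr[dim] = (nbr[dim] + step) % size[dim]
                nbr = tuple(nbr)
                if torus_dist(nbr, dst, size) < torus_dist(best, dst, size):
                    best = nbr
        cur = best
        path.append(cur)
    return path

print(can_route((1, 2), (7, 5)))               # wraps around in x: 2 + 3 = 5 hops
```

With n = 64 nodes and d = 2 the bound d·n^(1/d) = 16 steps; the route above takes 5.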
CAN Example: Two-Dimensional Space
 The space is divided between nodes
 Together, all nodes cover the entire space
 Each node covers either a square or a rectangular area of ratio 1:2 or 2:1
 Example: node n1:(1, 2), the first node that joins, covers the entire space
 [Figure: 2-D coordinate grid, axes 0–7, owned entirely by n1]
CAN Example: Two-Dimensional Space
 Node n2:(4, 2) joins: the space is divided between n1 and n2
 [Figure: grid split between n1 and n2]
CAN Example: Two-Dimensional Space
 Node n3:(3, 5) joins: the space is further divided among n1, n2, and n3
 [Figure: grid split among n1, n2, and n3]
CAN Example: Two-Dimensional Space
 Nodes n4:(5, 5) and n5:(6, 6) join
 [Figure: grid split among n1 through n5]
CAN Example: Two-Dimensional Space
 Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
 Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
 [Figure: grid showing nodes n1–n5 and items f1–f4 at their coordinates]
CAN Example: Two-Dimensional Space
 Each item is stored by the node that owns its mapping in the space
 [Figure: each of f1–f4 placed in the zone of the node that owns its coordinates]
CAN: Query Example
 Each node knows its neighbors in the d-space
 Forward the query to the neighbor that is closest to the query id
 Example: assume n1 queries f4
 Can route around some failures
 [Figure: query path from n1 toward f4 across the grid]
Another Design: Chord
 Node and object keys: random locations around a circle
 Neighbors: nodes 2^-i of the way around the circle, found by routing to the desired key
 Routing: greedily pick the neighbor closest to the destination
 Storage: each node “owns” an interval: the key range between its own key and the previous node’s key
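A Chord-style lookup can be sketched on a toy identifier circle. The 8-bit ring and node ids are assumptions for illustration; real Chord uses 160-bit SHA-1 keys plus join/stabilize protocols.

```python
# Sketch of Chord-style greedy lookup via finger tables on an 8-bit ring.
RING = 2 ** 8

def in_arc(x, start, end):
    """True if x lies on the clockwise arc (start, end]."""
    if start < end:
        return start < x <= end
    return x > start or x <= end               # arc wraps past zero

def successor(ids, k):
    """The node that owns key k: first node id clockwise at or after k."""
    cands = [n for n in ids if n >= k]
    return min(cands) if cands else min(ids)   # wrap around the circle

def lookup(ids, start, key):
    """Hop via the farthest finger preceding the key; returns (owner, path)."""
    cur, path = start, [start]
    while not in_arc(key, cur, successor(ids, (cur + 1) % RING)):
        fingers = [successor(ids, (cur + 2 ** i) % RING) for i in range(8)]
        nxt = cur
        for f in fingers:                      # keep the farthest finger before the key
            if f != cur and in_arc(f, cur, (key - 1) % RING):
                nxt = f
        if nxt == cur:
            break
        cur = nxt
        path.append(cur)
    return successor(ids, key), path

ids = [10, 60, 100, 170, 230]
print(lookup(ids, start=230, key=25))          # node 60 owns key 25
```

Each hop at least halves the remaining distance around the circle, which is where the O(log n) hop count comes from.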
OpenDHT
 A shared DHT service:
  the Bamboo DHT
  hosted on PlanetLab
  simple RPC API
 You don’t need to deploy or host anything to play with a real DHT!
Review: DHTs vs. Unstructured P2P
 DHTs good at:
  exact match for “rare” items
 DHTs bad at:
  keyword search, etc. [you can’t build a DHT-based Google]
  tolerating extreme churn
 Gnutella etc. (unstructured P2P) good at:
  general search
  finding common objects
  very dynamic environments
 Gnutella etc. bad at:
  finding “rare” items
Distributed Systems Pre-Internet
 Connected by LANs (low loss and delay)
 Small scale (10s, maybe 100s of nodes per server)
 The PODC literature focused on algorithms to achieve strict semantics in the face of failures:
  two-phase commit
  synchronization
  Byzantine agreement
  etc.
Distributed Systems Post-Internet
 Very different context:
  huge scale (thousands if not millions of nodes)
  highly variable connectivity
  failures common
  organic growth
 Abandoned distributed strict semantics:
  adaptive apps rather than “guaranteed” infrastructure
 Adopted a pairwise client-server approach:
  the server is centralized (even if a server farm)
  a relatively primitive approach (no sophisticated distributed algorithms)
  little support from infrastructure or middleware
A Database Viewpoint on DHTs: Towards Data-Centricity and Data Independence
Host-centric Protocols
 Protocols are defined in terms of IP addresses:
  unicast: IP address = one host
  multicast: IP address = a set of hosts
 The destination address is given to the protocol
 The protocol delivers data from one host to another:
  unicast: conceptually trivial
  multicast: the address is logical, not physical
Host-centric Applications
 Classic applications: the destination is “intrinsic”:
  telnet: the target machine
  FTP: the location of the files
  electronic mail: the email address turns into a mail server
  multimedia conferencing: the machines of the participants
 The destination is specified by the user (not the network):
  usually specified by hostname, not address
  DNS translates names into addresses
Domain Name System (DNS)
 DNS is built around recursive delegation:
  top-level domains (TLDs): .com, .net, .edu, etc.
  TLDs delegate authority to subdomains, e.g. berkeley.edu
  subdomains can further delegate, e.g. cs.berkeley.edu
 The hierarchy fits the host administrative structure:
  local decentralized control
  crucial to efficient hostname resolution
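The delegation chain can be sketched with a toy in-memory zone map. The server names and the address below are hypothetical, and this ignores the real DNS protocol, record types, and caching.

```python
# Sketch of DNS-style resolution by recursive delegation over toy zones.
ZONES = {
    ".":           {"edu": "edu-servers"},
    "edu-servers": {"berkeley.edu": "berkeley-ns"},
    "berkeley-ns": {"cs.berkeley.edu": "cs-ns"},
    "cs-ns":       {"www.cs.berkeley.edu": "169.229.0.1"},  # hypothetical address
}

def resolve(name, server="."):
    """Follow referrals down the hierarchy until a server returns an address."""
    zone = ZONES[server]
    # Match the longest known suffix of the queried name at this server.
    for label, target in sorted(zone.items(), key=lambda kv: -len(kv[0])):
        if name == label or name.endswith(label):
            if target in ZONES:                # a referral to a lower server
                return resolve(name, target)
            return target                      # a final answer
    raise KeyError(name)

print(resolve("www.cs.berkeley.edu"))          # 169.229.0.1
```

Each level only needs to know its own delegations, which is exactly the "local decentralized control" the slide highlights.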
Modern Web Data-Centricity
 URLs often function as names of data:
  users think of www.cnn.com as data, not as a host
  the fact that www.cnn.com is a hostname is irrelevant
 Users want data, not access to a particular host
 The web is now data-centric
Data-centric Apps in a Host-centric World
 Data is still associated with host names (URLs):
  the administrative structure of data is the same as that of hosts
  a weak point in the current web
 Key enabler: search engines
  searchable databases map keywords to URLs
  allowed users to find desired data
 Networkers focused on technical problems: HTTP, persistence (URNs), replication (CDNs), …
A DNS for Data? DHTs…
 Can we map data names into addresses?
  a data-centric DNS, distributed and scalable
  doesn’t alter net protocols, but aids data location
  not just about stolen music, but a general facility
 A formidable challenge:
  data does not have a clear administrative hierarchy
  we likely need to support a flat namespace
  can one do this scalably?
 Data-centrism requires scalable flat lookups => DHTs
Data Independence in DB Design
 Decouple the app-level API from the data organization:
  changes can be made to the data layout without modifying applications
 Simple version: location-independent names
 Fancier: declarative queries
 “As clear a paradigm shift as we can hope to find in computer science” – C. Papadimitriou
The Pillars of Data Independence
 Indexes:
  value-based lookups have to compete with direct access
  must adapt to shifting data distributions
  must guarantee performance
 Query optimization:
  support declarative queries beyond lookup/search
  must adapt to shifting data distributions
  must adapt to changes in the environment
 In a DBMS: B-trees; join ordering, access-method selection, etc.
Generalizing Data Independence
 A classic “level of indirection” scheme:
  indexes are exactly that
  complex queries are a richer indirection
 The key to data independence: it’s all about rates of change
 Hellerstein’s Data Independence Inequality: data independence matters when
  d(environment)/dt >> d(app)/dt
Data Independence in Networks
 d(environment)/dt >> d(app)/dt
 In databases, the RHS is unusually small:
  this drove the relational database revolution
 In extreme networked systems, the LHS is unusually high:
  and the applications are increasingly complex and data-driven
  simple indirections (e.g., local lookaside tables) are insufficient
Hierarchical Networks (& Queries)
 IP:
  hierarchical name space (www.vldb.org, 141.12.12.51)
  hierarchical routing: Autonomous Systems correlate with the name space (though not perfectly)
 DNS:
  hierarchical name space (“clients” + a hierarchy of servers)
  hierarchical routing with aggressive caching
  13 managed “root servers”
 Traditional pros/cons of hierarchical data management:
  works well for things aligned with the hierarchy, esp. physical locality à la Astrolabe
  inflexible: no data independence!
The Pillars of Data Independence, in P2P
 Indexes:
  value-based lookups have to compete with direct access
  must adapt to shifting data distributions
  must guarantee performance
 Query optimization:
  support declarative queries beyond lookup/search
  must adapt to shifting data distributions
  must adapt to changes in the environment
 DBMS answers: B-trees; join ordering, access-method selection, etc.
 P2P answers: content-addressable overlay networks (DHTs); multiquery dataflow sharing?
Sensor Networks: The Internet Meets the Environment
Today: Internet Meets Mobile Wireless Computing
 Computing: smaller, faster
 Disks: larger capacity, smaller form factor
 Communications: wireless voice and data
 Multimedia integration: voice, data, video, games
 Examples: Samsung cameraphone with camcorder; iPod (impact of disk size/cost); BlackBerry (phone + PDA); Sony PSP (mobile gaming)
Tomorrow: Embedded Networked Sensing Apps
 Micro-sensors, on-board processing, and wireless interfaces are feasible at very small scale: they can monitor phenomena “up close”
 Enables spatially and temporally dense environmental monitoring
 Embedded networked sensing will reveal previously unobservable phenomena
 Examples: seismic structure response; contaminant transport; marine microorganisms; ecosystems and biocomplexity
Embedded Networked Sensing: Motivation
 Imagine:
  high-rise buildings self-detect structural faults (e.g., weld cracks)
  schools detect airborne toxins at low concentrations and trace contaminant transport to its source
  buoys alert swimmers to dangerous bacterial levels
  an earthquake-rubbled building is infiltrated with robots and sensors to locate survivors and evaluate structural damage
  ecosystems are infused with chemical, physical, acoustic, and image sensors to track global-change parameters
  a battlefield is sprinkled with sensors that identify and track friendly/foe air and ground vehicles and personnel
Embedded Sensor Nets: Enabling Technologies
 Embed numerous distributed devices to monitor and interact with the physical world
 Network the devices to coordinate and perform higher-level tasks
 Exploit spatially/temporally dense, in situ/remote sensing and actuation
 Key ingredients: control systems with small form factor; untethered nodes; collaborative sensing and action; tight coupling to the physical world
Sensornets
 Vision:
  many sensing devices, each with a radio and processor
  enable fine-grained measurements over large areas
  huge potential impact on science and society
 Technical challenges:
  untethered: power consumption must be limited
  unattended: robust and self-configuring
  wireless: ad hoc networking
Similarity with P2P Networks
 Sensornets are inherently data-centric:
  users know what data they want, not where it is
  Estrin, Govindan, Heidemann (2000, etc.)
 A centralized database is infeasible:
  vast amounts of data, constantly being updated
  only a small fraction of the data will ever be queried
  sending everything to a single site expends too much energy
Sensor Nets: New Design Themes
 Self-configuring systems that adapt to an unpredictable environment:
  dynamic, messy (hard to model) environments preclude pre-configured behavior
 Leverage data processing inside the network:
  exploit computation near the data to reduce communication
  collaborative signal processing
  achieve the desired global behavior with localized algorithms (distributed control)
 Long-lived, unattended, untethered, low-duty-cycle systems:
  energy is a central concern
  communication is the primary consumer of the scarce energy resource
From Embedded Sensing to Embedded Control
 Embedded in unattended “control systems”:
  control the network, and act in the environment
  critical apps extend beyond sensing to control and actuation: transportation, precision agriculture, medical monitoring and drug delivery, battlefield apps
  concerns extend beyond traditional networked systems and apps: usability, reliability, safety
 Need a systems architecture to manage the interactions:
  current system development is one-off, incrementally tuned, stove-piped
  repercussions of piecemeal, uncoordinated design: insufficient longevity, interoperability, safety, robustness, scaling
Why can’t we simply adapt Internet protocols and the “end-to-end” architecture?
 The Internet routes data using IP addresses in packets and lookup tables in routers
 Humans get data by “naming data” to a search engine:
  many levels of indirection between the name and the IP address
 Embedded, energy-constrained (untethered, small-form-factor), unattended systems can’t tolerate the communication overhead of this indirection
 Special-purpose system functions: we don’t need/want the Internet’s general-purpose functionality, which was designed for elastic applications
Sample Layered Architecture
 Resource constraints call for more tightly integrated layers
 Open question: what are the defining architectural principles?
 The stack, top to bottom:
  user queries, external database
  in-network: application processing, data aggregation, query processing
  data dissemination, storage, caching
  adaptive topology, geo-routing
  MAC, time, location
  phy: communication, sensing, actuation, signal processing
Coverage Measures
 Given: a sensor field (either known sensor locations, or a spatial density)
 Area coverage: fraction of the area covered by sensors
 Detectability: probability that the sensors detect moving objects
 Node coverage: fraction of sensors covered by other sensors
 Control:
  where to add new nodes for maximum coverage
  how to move existing nodes for maximum coverage
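Area coverage is easy to estimate numerically. The sketch below assumes disk-shaped sensing ranges and a 10×10 field, both illustrative choices not taken from the slides.

```python
# Sketch: Monte Carlo estimate of area coverage for disk-shaped sensing ranges.
import random

def area_coverage(sensors, radius, field=(10.0, 10.0), samples=20000, seed=0):
    """Fraction of random field points within `radius` of some sensor."""
    rng = random.Random(seed)
    w, h = field
    covered = 0
    for _ in range(samples):
        x, y = rng.uniform(0, w), rng.uniform(0, h)
        if any((x - sx) ** 2 + (y - sy) ** 2 <= radius ** 2
               for sx, sy in sensors):
            covered += 1
    return covered / samples

grid = [(x, y) for x in (2.5, 7.5) for y in (2.5, 7.5)]   # 4 non-overlapping sensors
print(round(area_coverage(grid, radius=2.0), 2))          # ~0.50 (4 * pi * 4 / 100)
```

The same sampling loop extends to the control questions: evaluate candidate node positions and keep the one that raises the estimate most.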
In-Network Processing
 Communication is expensive when power and bandwidth are limited
 Perform (data) processing in the network, close to (or at) the data
 Forward fused/synthesized results, e.g., find the max of the data
 Distributed data, distributed computation
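The "find the max" example can be sketched over a routing tree: each node fuses its children's partial results with its own reading and forwards a single value. The tree and readings are toy data.

```python
# Sketch of in-network aggregation: one fused value per link instead of
# forwarding every raw reading to the sink.
def aggregate_max(tree, readings, node):
    """Max over the subtree rooted at `node`."""
    partial = readings[node]
    for child in tree.get(node, []):
        # Each child sends up a single fused value, not its raw data.
        partial = max(partial, aggregate_max(tree, readings, child))
    return partial

tree = {"sink": ["a", "b"], "a": ["c", "d"], "b": []}
readings = {"sink": 17, "a": 23, "b": 12, "c": 31, "d": 8}
print(aggregate_max(tree, readings, "sink"))   # 31
```

With n nodes this sends n - 1 values total (one per link), versus shipping every reading multiple hops to the sink, which is the energy saving the slide is after.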
Distributed Representation and Storage
 Data-centric protocols, in-network processing
 Goal: interpretation of spatially distributed data (per-node processing alone is not enough)
 The network does in-network processing based on the distribution of the data
 Queries are automatically directed towards the nodes that maintain relevant/matching data
 Examples:
  pattern-triggered data collection
  multi-resolution data storage and retrieval
  distributed edge/feature detection
  indexing data for easy temporal and spatial searching
  finding global statistics (e.g., a distribution)
 [Figure: key–value pairs distributed across nodes, evolving over time]
Directed Diffusion: Data-Centric Routing
 Basic idea:
  name data (not nodes) with externally relevant attributes: data type, time, location of the node, SNR, …
  diffuse requests and responses across the network using application-driven routing (e.g., geo-sensitive or not)
  support in-network aggregation and processing
 Data sources publish data; data clients subscribe to data
  however, all nodes may play both roles:
   a node that aggregates/combines/processes incoming sensor-node data becomes a source of new data
   a node that publishes only when a combination of conditions arises is a client for the triggering event data
 A true peer-to-peer system?
Traditional Approach: Warehousing
 Data is extracted from the sensors and stored on a server
 Query processing takes place on the server
 [Figure: sensor nodes feeding a front-end and a warehouse]
Sensor Database System
 A sensor database system supports distributed query processing over the sensor network
 [Figure: sensor DB instances running on the sensor nodes themselves, coordinated by a front-end]
Sensor Database System
 Characteristics of a sensor network:
  streams of data
  uncertain data
  large number of nodes
  multi-hop network
  no global knowledge about the network
  node failure and interference are common
  energy is the scarce resource
  limited memory
  no administration, …
 Can existing database techniques be reused? What are the new problems and solutions?
  representing sensor data
  representing sensor queries
  processing query fragments on sensor nodes
  distributing query fragments
  adapting to changing network conditions
  dealing with site and communication failures
  deploying and managing a sensor database system
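A query fragment running on a node often amounts to stream processing over readings, for example a sliding-window average. This is an illustrative fragment, not the compilation output of any particular sensor database system.

```python
# Sketch of a per-node query fragment: sliding-window average over a
# stream of readings, emitting one fused value per new reading.
from collections import deque

def windowed_avg(stream, window=3):
    """Yield the average of the last `window` readings as each one arrives."""
    buf = deque(maxlen=window)                 # old readings fall out automatically
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

print(list(windowed_avg([10, 20, 30, 40], window=3)))   # [10.0, 15.0, 20.0, 30.0]
```

Pushing such fragments onto the nodes, rather than shipping raw streams to a warehouse, is the core difference from the traditional approach above.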
Summary
 P2P networks: Napster, Gnutella, Kazaa
 Distributed Hash Tables (DHTs)
 Database perspectives: data-centricity, data independence
 Sensor networks and their connection to P2P