A Resilient Transport System for Wireless Sensor
Networks
Chieh-Yih Wan
Submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
in the Graduate School of Arts and Sciences
Columbia University
2005
© 2005
Chieh-Yih Wan
All Rights Reserved
ABSTRACT
A Resilient Transport System for Wireless Sensor Networks
Chieh-Yih Wan
This thesis contributes toward the design of a new resilient transport system for
wireless sensor networks. Sensor networks have recently emerged as a vital new area
in networking research, one that tightly marries sensing, computing, and wireless
communications for the first time. Wireless sensors are embedded in the real world
and interact closely with the physical environment in which they reside. These net-
works must be designed to effectively deal with the network’s dynamically changing
resources, including available energy, bandwidth, processing power, node density,
and connectivity. This dissertation focuses on making the sensor network transport
system resilient to such changes, which in many cases are abrupt. We define
transport resilience as the ability of the network to deliver a sufficient amount of
sensing events to meet the applications’ fidelity requirement for a set of different
traffic classes while reducing the energy consumption of the network. More specifi-
cally, we investigate, study, and analyze two classes of transport resilience: (1) the
need to reliably deliver data under various error conditions; and (2) the need to
maintain the application’s fidelity under congested network conditions. We take an
experimental systems research approach to the problem of supporting resilience in
sensor networks by building an experimental sensor network testbed and evaluating
a set of new resilient transport algorithms under various workloads and changing
network conditions. We study the behavior of these algorithms under testbed condi-
tions, and apply what is learned toward the construction of larger and more scalable
resilient networks.
This thesis makes a number of contributions. First, we propose a new reliable
delivery transport paradigm for sensor networks called Pump Slowly Fetch Quickly
(PSFQ). PSFQ represents a lightweight, scalable and robust transport protocol that
is customizable to meet a wide variety of applications needs (e.g., re-programming,
actuation, reliable event delivery). We present the design and implementation of
PSFQ, and evaluate the protocol using the ns-2 simulator and an experimental wire-
less sensor testbed based on Berkeley motes and the TinyOS operating system. The
PSFQ protocol represents the first reliable transport protocol proposed for wireless
sensor networks.
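To make the pump/fetch interplay concrete, the sketch below illustrates the receiver-side logic at one hop. It is an illustrative simplification only: the class name, the returned event list, and the timer values are assumptions of this sketch, and the actual protocol drives relays and NACK retries with timers in TinyOS rather than returning events.

```python
import random

# Illustrative timer bounds (seconds); assumptions for this sketch, not
# the values derived in the thesis.
T_MIN, T_MAX = 0.1, 0.5   # pump slowly: pacing between relayed segments
T_R = 0.02                # fetch quickly: NACK retry interval, T_R << T_MIN

class PsfqNode:
    """Sketch of PSFQ's receiver-side pump/fetch logic at one hop."""

    def __init__(self):
        self.buffer = {}          # seq -> payload, cached for local repair
        self.next_expected = 0    # lowest in-sequence segment not yet relayed
        self.fetch_mode = False

    def on_segment(self, seq, payload):
        """Process a pumped segment; return a list of (action, info) events."""
        events = []
        self.buffer[seq] = payload
        if seq > self.next_expected:
            # Gap detected: enter fetch mode and NACK the missing segments
            # upstream every T_R seconds (much faster than the pump)
            # until local repair succeeds.
            missing = [s for s in range(self.next_expected, seq)
                       if s not in self.buffer]
            if missing:
                self.fetch_mode = True
                events.append(("nack", missing))
        # Relay every segment that is now in sequence, pacing each relay
        # by a random delay drawn from [T_MIN, T_MAX].
        while self.next_expected in self.buffer:
            events.append(("relay",
                           (self.next_expected, random.uniform(T_MIN, T_MAX))))
            self.next_expected += 1
        # Stay in fetch mode only while a buffered segment sits beyond a hole.
        self.fetch_mode = any(s > self.next_expected for s in self.buffer)
        return events
```

The key invariant is that the NACK retry interval T_R is much smaller than the pump spacing T_MIN, so loss recovery proceeds far faster than new data arrives.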
Next, we present the design of an energy-efficient congestion control scheme for
sensor networks called CODA (COngestion Detection and Avoidance). We define
a new objective function for traffic control in sensor networks, which maximizes
the operational lifetime of the network while delivering acceptable data fidelity to
sensor network applications. CODA is founded on three important distributed con-
trol mechanisms: (1) an accurate and energy-efficient congestion detection scheme;
(2) a hop-by-hop backpressure algorithm; and (3) a sink-to-multi-source regulation
scheme. We evaluate a number of congestion scenarios and define new performance
metrics that capture the impact of CODA on the sensing application performance.
We analyze the performance benefits and practical engineering challenges of implementing
CODA in an experimental mote-based sensor network testbed. CODA represents
the first comprehensive solution to the congestion problem in sensor networks.
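The open-loop portion of this design can be summarized in a short sketch. The thresholds, method names, and the local drop policy below are illustrative assumptions; the actual mechanisms sample the channel only at appropriate times to keep listening costs low and derive the channel threshold from a measured fraction of the radio's maximum utilization.

```python
# Illustrative thresholds; assumptions for this sketch.
CHANNEL_LOAD_THRESHOLD = 0.7    # fraction of sampled channel capacity
BUFFER_THRESHOLD = 0.8          # fraction of local queue occupancy

class CodaNode:
    """Sketch of CODA's open-loop, hop-by-hop backpressure at one node."""

    def __init__(self, queue_capacity=10):
        self.queue_capacity = queue_capacity
        self.queue = []
        self.throttled = False   # set when backpressure is heard downstream

    def congested(self, sampled_channel_load):
        # Detection combines a cheap buffer-occupancy check with periodic
        # channel-load sampling.
        return (sampled_channel_load > CHANNEL_LOAD_THRESHOLD
                or len(self.queue) / self.queue_capacity > BUFFER_THRESHOLD)

    def on_packet(self, pkt, sampled_channel_load):
        """Return the action taken for an incoming data packet."""
        if self.congested(sampled_channel_load):
            # Broadcast a suppression message one hop upstream; receivers
            # throttle and may propagate it further if they too are congested.
            return "backpressure"
        if self.throttled:
            return "drop"        # one possible local policy while throttled
        self.queue.append(pkt)
        return "forward"

    def on_backpressure(self):
        self.throttled = True
```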
The final contribution of this dissertation explores a complementary solution to
CODA called dual radio virtual sinks that boosts the performance of sensor networks
even under persistent overload conditions. We propose to randomly distribute a
small number of all-wireless dual radio virtual sinks throughout the sensor field. In
essence, these virtual sinks operate as safety valves in the sensor field by selectively
siphoning off overload traffic in order to maintain the fidelity of the application
signal delivered to the network’s physical sink. A key feature of virtual sinks is
that they are equipped with a secondary higher-bandwidth, long-range radio (e.g.,
IEEE 802.11), in addition to their primary low-bandwidth, low-power mote
radio. Virtual sinks are capable of dynamically forming a secondary ad hoc radio
network that can be used on-demand by the mote radio network. Rather than rate-
controlling packets during periods of congestion (as is the case with CODA), virtual
sinks take the congested traffic off the low-powered sensor network and move it on
to the secondary radio network, transiting it to the network’s physical sink. We
study, propose, and evaluate a set of algorithms for virtual sink discovery, selection,
traffic transiting and load balancing. We leverage the use of the Stargate platform
to support an all-wireless virtual sink approach in our sensor network testbed.
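The per-packet redirection decision at a node can be sketched as follows; the threshold value, attribute names, and return labels are assumptions of this illustration, not the actual discovery, selection, and redirection algorithms.

```python
class SiphonNode:
    """Sketch of the traffic-redirection decision at a node in a network
    with dual-radio virtual sinks (VSs)."""

    def __init__(self, has_secondary_radio=False, nearest_vs_hops=None):
        self.has_secondary_radio = has_secondary_radio
        # Hop count to the nearest virtual sink advertised within this
        # node's visibility scope; None if no VS is in scope.
        self.nearest_vs_hops = nearest_vs_hops

    def route(self, congestion_level, redirect_threshold=0.6):
        """Choose how to forward a data packet toward the physical sink."""
        if congestion_level < redirect_threshold:
            return "primary_radio"          # normal mote-radio forwarding
        if self.has_secondary_radio:
            # A virtual sink siphons overload traffic onto its long-range
            # radio, transiting it to the physical sink out-of-band.
            return "secondary_radio"
        if self.nearest_vs_hops is not None:
            # An ordinary mote marks packets for redirection toward the
            # nearest in-scope virtual sink over the mote radio.
            return "redirect_to_vs"
        return "primary_radio"              # no safety valve in scope
```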
We believe that sensor networks must be built to be robust to various software
and hardware failures, and be resilient to dynamic resource changes such as node
failures, increased packet error rates, and traffic surges. Collectively, PSFQ, CODA,
and virtual sinks provide a set of energy-efficient, robust transport mechanisms that
serve as a foundation for making sensor networks more resilient.
Contents
1 Introduction 1
1.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 A Resilient Transport System for Sensor Networks . . . . . . . 5
1.1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . 10
1.1.3 Technical Barriers . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2. Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.1 Pump Slowly Fetch Quickly (PSFQ) . . . . . . . . . . . . . . 16
1.2.2 CODA (COngestion Detection and Avoidance) . . . . . . . . . 17
1.2.3 Dual Radio Virtual Sinks . . . . . . . . . . . . . . . . . . . . . 18
1.3. Thesis Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 Pump-Slowly, Fetch-Quickly (PSFQ): A Reliable Transport Proto-
col for Sensor Networks 22
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2. Protocol Design Space . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.1 Hop-by-Hop Error Recovery . . . . . . . . . . . . . . . . . . . 25
2.2.2 Fetch/Pump Relationship . . . . . . . . . . . . . . . . . . . . 27
2.2.3 Multi-modal Operations . . . . . . . . . . . . . . . . . . . . . 29
2.3. Protocol Description . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.1 Pump Operation . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.2 Fetch Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.3 Report Operation . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.3.4 Single-Packet Message Delivery . . . . . . . . . . . . . . . . . 40
2.4. Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.1 Simulation Approach . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5. Experimental Testbed Results . . . . . . . . . . . . . . . . . . . . . . 48
2.5.1 PSFQ Parameter Space and Timer Bounds . . . . . . . . . . . 48
2.5.2 Messaging Overhead . . . . . . . . . . . . . . . . . . . . . . . 49
2.5.3 Network Size versus Network Density . . . . . . . . . . . . . . 51
2.6. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3 Energy-Efficient Congestion Detection and Avoidance in Sensor
Networks 56
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2. Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.2.1 CSMA Considerations . . . . . . . . . . . . . . . . . . . . . . 63
3.2.2 Congestion Detection . . . . . . . . . . . . . . . . . . . . . . . 66
3.3. CODA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3.1 Open-Loop Hop-by-Hop Backpressure . . . . . . . . . . . . . . 72
3.3.2 Closed-Loop Multi-Source Regulation . . . . . . . . . . . . . . 77
3.4. Experimental Sensor Network Testbed . . . . . . . . . . . . . . . . . 82
3.4.1 Measuring the β Value . . . . . . . . . . . . . . . . . . . . . . 83
3.4.2 Channel Loading Measurement and Utilization . . . . . . . . . 84
3.4.3 Energy Tax, Fidelity Penalty, and Power . . . . . . . . . . . . 88
3.4.4 Open-loop Control . . . . . . . . . . . . . . . . . . . . . . . . 90
3.4.5 Combining Open-loop and Closed-loop Control . . . . . . . . 92
3.5. Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.5.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . 95
3.5.2 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . 98
3.6. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4 Dual Radio Virtual Sinks 110
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.2. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.3. Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.3.1 Funneling Effect . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.3.2 Small World Observations and Shortcuts . . . . . . . . . . . . 117
4.3.3 Traffic Redirection and Prioritization Issues . . . . . . . . . . 119
4.3.4 Transparency and Compatibility Issues . . . . . . . . . . . . . 120
4.4. Siphon Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.4.1 Virtual Sink Discovery and Visibility Scope Control . . . . . . 123
4.4.2 Congestion Detection . . . . . . . . . . . . . . . . . . . . . . . 126
4.4.3 Traffic Redirection . . . . . . . . . . . . . . . . . . . . . . . . 129
4.4.4 Congestion in the Secondary Network . . . . . . . . . . . . . . 131
4.5. Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.5.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . 132
4.5.2 Delay Device and Directed Diffusion . . . . . . . . . . . . . . 134
4.5.3 Energy Tax, Fidelity Ratio and Residual Energy . . . . . . . . 136
4.5.4 Early Congestion Detection . . . . . . . . . . . . . . . . . . . 137
4.5.5 Virtual Sink’s Visibility Scope Impact . . . . . . . . . . . . . . 139
4.5.6 Always-on versus On-demand Virtual Sinks . . . . . . . . . . 140
4.5.7 Partitioned Secondary Network . . . . . . . . . . . . . . . . . 142
4.5.8 VS Density Impact . . . . . . . . . . . . . . . . . . . . . . . . 143
4.5.9 Load Balancing Feature . . . . . . . . . . . . . . . . . . . . . 146
4.6. Sensor Network Testbed Implementation . . . . . . . . . . . . . . . . 148
4.6.1 Stargate and Mica Mote Testbed . . . . . . . . . . . . . . . . 148
4.6.2 Congestion Detection for Traffic Redirection Decision . . . . . 149
4.6.3 A Generic Data Dissemination Application . . . . . . . . . . . 153
4.6.4 Post-Facto Traffic Siphoning . . . . . . . . . . . . . . . . . . . 155
4.7. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5 Conclusion 158
5.1. The Critical Issue of Transport Resilience . . . . . . . . . . . . . . . . 158
5.2. Reliable Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
5.3. Congestion Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.4. Endnote . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6 My Publications as a PhD Candidate 166
6.1. Journal Papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.2. Journal Papers under Submission . . . . . . . . . . . . . . . . . . . . 167
6.3. Magazine Papers, Review Articles and Book Chapters . . . . . . . . . 167
6.4. Conference Papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.5. IETF Internet Drafts . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
References 170
List of Figures
1-1 The funneling effect. Sensors within the range of an event region/epicenter
(enclosed by the dotted ellipse line) generate data that travels along
a propagation funnel (enclosed by dotted line) toward the sink when
an event occurs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2-1 Probability of successful delivery of a message using an end-to-end
model across a multi-hop network. . . . . . . . . . . . . . . . . . . . . 26
2-2 Probability of successful delivery of a message over one hop when the
mechanism allows multiple retransmissions before the next packet
arrival. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2-3 Sensor network in a building. A user node at location 0 injects 50
packets into the network within 0.5 seconds. . . . . . . . . . . . . . . 43
2-4 Error tolerance comparison - average delivery ratio as a function
of the number of hops under various channel condition for different
packet error rates. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2-5 Comparison of average latency as a function of channel error rate. . . 46
2-6 Average delivery overhead as a function of channel error rate. . . . . 47
2-7 A 4-hop network physically arranged in a string/chain topology. . . . 49
2-8 Breakdown of PSFQ messages. Average delivery overhead is 1.2±0.13. 50
2-9 Average delivery overhead as a function of network size and density. . 51
2-10 Average delivery latency as a function of network size and density. . . 52
3-1 Total number of packets dropped by the sensor network per data
event packet delivered at the sink (Drop Rate) as a function of the
source rate. The x axis is plotted in log scale to highlight data points
with low reporting rates. All packets that are dropped during the
50 second simulation session are counted as part of the drop rate
including the MAC signaling (e.g., RTS/CTS/ACK and ARP), data
event, and diffusion messaging packets. . . . . . . . . . . . . . . . . . 58
3-2 A simple IEEE 802.11 wireless network of 5 nodes illustrates receiver-
based congestion detection. . . . . . . . . . . . . . . . . . . . . . . . . 67
3-3 Channel load and buffer occupancy time series traces with and with-
out virtual carrier sense (VC)+link-layer ACK, and packet delivery
trace with VC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3-4 Queueing performance of a real sensor network of Mica motes. . . . . 69
3-5 Closed-loop control model. The impact of Wsink and the multiplica-
tive decrease factor d. . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3-6 MAC layer stopwatch placement for β measurement. Diagram of
receive and transmit state flows in the TinyOS MAC component code.
Placement of the stopwatch start/stop trigger points are marked with
an X. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3-7 A limit on measured channel load is imposed by β. Nominal load
curve increases with constant slope as the source packet rate increases,
while the measured load saturates at a value below 70%. . . . . . . . 87
3-8 Experimental sensor network testbed topology. Nodes are well con-
nected. Packets are unicast. . . . . . . . . . . . . . . . . . . . . . . . 89
3-9 Improvement in energy tax with small fidelity penalty using CODA.
The priority of Src-2 is evident from the fidelity penalty results. . . . 91
3-10 Experimental sensor network testbed topology to capture the funnel-
ing effect in a larger network with sparsely located sources. . . . . . . 92
3-11 Time series traces that present the rate control dynamics and the
event fidelity/delivery performance of CODA. CODA’s rate control
scheme does not increase the degree of variability to the event delivery
performance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3-12 Tradeoff between fidelity and energy tax that obtains the most benefit,
i.e. maximum “power”, for the network. . . . . . . . . . . . . . . . . 94
3-13 Power of CODA versus non-CODA in an experimental Mica motes
testbed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3-14 Network of 30 nodes. Sensors within the range of the event epicenter,
which is enclosed by the dotted ellipse, generate impulse data when
an event occurs. The circle represents the radio range (40m) of the
sensor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
3-15 Time series traces for densely deployed sources that generate high
rate data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3-16 (a) Packet delivery and (b) Packet drop time series traces for a 15-
node network with low rate traffic. The plots show the traces for
three cases: when only open-loop control (OCC) is used, when both
open-loop and closed-loop control (CCC) are enabled, and when congestion
control is disabled (noCC). . . . . . . . . . . . . . . . . . . . . . . . . 100
3-17 Average energy tax and fidelity penalty as a function of the network
size when only CODA’s open loop control is used. . . . . . . . . . . . 101
3-18 Energy tax as a function of network size for high and low rate data
traffic. The difference between the data points with and without
CODA indicates the energy saving achieved by CODA. . . . . . . . . 102
3-19 Fidelity penalty as a function of the network size for high and low
rate data traffic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4-1 The funneling effect. Sensors within the range of an event region/epicenter
(enclosed by the dotted ellipse) generate impulse data that travel
along a propagation funnel (enclosed by dotted line) toward the sink
when an event occurs. . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4-2 Reduction of average distance in a network with increasing percentage
of dual-radio nodes that provide the shortcuts between nodes. . . . . 118
4-3 Early Congestion Detection. Different congestion level thresholds
that can avoid congestion down the funnel. . . . . . . . . . . . . . . . 138
4-4 The impact of the visibility scope of a VS for a network of 30 nodes. . 139
4-5 Fidelity and Energy Tax performance in a network where there are
always-on virtual sinks. . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4-6 Fidelity and Energy Tax performance in a network where there are
virtual sinks that are put into service only when congestion is detected. 141
4-7 Number of sensor nodes required to ensure connectivity in the cor-
responding areas of network coverage as well as the number of VSs
(right vertical axis) required to ensure performance improvement. . . 144
4-8 Fraction of Virtual Sinks needed to assure improved network perfor-
mance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
4-9 Requirement for a connected secondary network. The transmission
range of the long-range radio is expressed as the multiples of transmis-
sion radius of the low-power radio. The visibility scope requirement
that assures both energy tax and fidelity improvement is plotted as
filled square in the figure. . . . . . . . . . . . . . . . . . . . . . . . . 146
4-10 Energy Distribution (Complementary CDF) of a 70-node network
with 3 virtual sinks scattered randomly across the network. With
Siphon’s load balancing feature, more nodes share the energy load.
Therefore, fewer nodes have residual energy larger than 85%, but
more nodes have larger residual energy (e.g., the percentage of nodes
having residual energy larger than 75% increases from 60% to 85%),
effectively increasing the operation lifetime of the network. . . . . . . 147
4-11 A sensor network testbed of 30 nodes . . . . . . . . . . . . . . . . . . 149
4-12 Early congestion detection threshold. An appropriate choice for the
early congestion detection threshold must be based on application
loss tolerance parameters. . . . . . . . . . . . . . . . . . . . . . . . . 150
4-13 Queueing performance and buffer occupancy threshold for congestion
avoidance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4-14 Siphon performance in a real sensor network of 30 nodes. Notice the
priority that CODA gives to src-3. . . . . . . . . . . . . . . . . . . . 154
4-15 Post-facto traffic redirection versus early-detection approach. . . . . . 155
Acknowledgements
First of all, I would like to thank my adviser Professor Andrew T. Campbell for his
support and constant prodding over the past few years. Professor Campbell's high
expectations for research have brought my thesis to the next level of practicability
and academic contribution. Furthermore, his persistence and
optimism have helped me overcome many obstacles and accomplish the impossible.
I want to thank Dr. Lakshman Krishnamurthy for giving me the opportunity to
work with him in 2001 for my summer internship with Intel. Through Lakshman, I
have learned to be practical in exploring the unknowns and to exercise practicality
in seeking solutions. Because of him, there came the creation of PSFQ, the first
chapter of my thesis.
I am deeply grateful to Dr. Wai Chen. I was inspired by his teaching at Columbia
University, and I was also given the opportunity to work with him at Telcordia
Technologies during my first summer internship in the United States. Dr. Chen has
become more than just a professor or a superior at work to me; he is a true mentor.
He has given me consolation and encouragement as I faltered during this time of
my life.
Many thanks to Professor Mischa Schwartz for his conscientious editing of my
thesis; I have great admiration for his wisdom.
Special thanks go to Shane B. Eisenman for being such a great help to my
research and, most of all, such a wonderful friend. His friendship has lit up the
otherwise boring and seemingly never-ending Ph.D. life. I will always treasure our
“mini ball game” on the nights we spent together at the COMET lab burning
the midnight oil for our research papers.
Deep appreciation and thanks to my dearest wife for her endurance in putting
up with my hectic lifestyle as a Ph.D. student. Lastly, I want to dedicate my
thesis to my father in Malaysia and also to my beloved mother, who passed away a
few years ago.
Chapter 1
Introduction
1.1. Overview
Over the last several years sensor networks have emerged as a vital new area in
networking research, one that tightly marries sensing, computing, and wireless com-
munications for the first time. This new frontier in wireless sensing presents many
new technical challenges for the research community as well as many untapped op-
portunities for a diverse set of industries and early adopters, from environmental
sensing to ubiquitous computing. This new era is also likely to have a significant
impact on how we as individuals interact with the physical world around us. In this
dissertation, we address some important networking problems associated with this
nascent field that, if left unsolved, could limit this broad vision.
The notion of sensor networks has been around for over two decades now [1], but
recently the coming together of sensing and wireless communications has revolution-
ized the field and enabled significant advances. Early sensor networks emerged in
the 1980s and included radar networks used in air traffic control systems, and the
national power grid. However, these networks were wired networks, and while some
researchers working on these early systems envisioned large numbers of small sen-
sors, the technology to do this has only recently emerged. Today’s sensor networks
comprise multiple distributed sensors that can wirelessly collect information
associated with events of interest. Each sensor node has embedded processing
capability, onboard storage, and communication capability, and potentially has multiple
onboard sensors interfacing with, or monitoring, the physical environment in which
the sensor resides.
There have been a number of important enablers for advances in networked sensing.
Today, silicon technology continues to push in two complementary directions:
increased processing speed, measured in GHz, and decreased circuit feature
size, measured in µm and nm. Together with important advances in miniaturization
and low-power wireless technologies, the emergence of low-cost, small-scale devices
that combine sensing, communication, and computing has begun to enable the much-vaunted
era of ubiquitous computing [2] proposed by Mark Weiser [3] in the early 1990s. The
necessary enabling technologies to revolutionize the world of sensing have finally
arrived. The current state of the art in microelectronics and microelectromechanical
systems (MEMS) [4] provides a foundation for the autonomous microsystems that
compose wireless sensor networks. These networks offer low-cost, distributed monitoring
solutions for a wide variety of applications and systems [5] [6] [7]. Anything from
monitoring weather patterns to tracking human movement can be done with these
large-scale, highly distributed systems of small, untethered, low-power, unattended
sensors and actuators.
Wireless sensor networks represent a new class of distributed systems that op-
erate under a new set of constraints. Many tasks are greatly complicated in sensor
networks due to their unique requirements and constraints: energy efficiency,
resource limitations in computing and communication, scalability, automatic
adaptation to environmental dynamics, and so on. In many ways, sensor networks are breathing
new life into a number of old problems. For example, because of the unpredictable
and possibly harsh environments where the sensors might be deployed, assuring re-
liable communications between sensor nodes for the purposes of control and data
collection is a significant challenge. This is in contrast with traditional networks
such as the Internet, in which reliable networking has been well studied and applied
with great success in support of everyday applications; the same comment applies
to cellular networks and wireless LAN-based networks that we use every day.
There has been a growing amount of networking research conducted on wireless
sensor networks focusing on several important problems; these include but are not
limited to:
• Data dissemination mechanisms [8] [9] [10] in which researchers study the data
routing problems in sensor networks. These studies have also identified the
unique data-centric nature [11] of sensor networks.
• Power conservation/coordination schemes [12] [13] that allow nodes to schedule
appropriate sleep cycles among themselves to save power, while maintaining
network connectivity.
• Energy-efficient medium access schemes [14] [15] [16] for low-power radio com-
munications.
• Coverage problems [17] [18] in which researchers study the sensing coverage
and the minimal/maximal exposure paths in breach-detection applications.
• Efficient and modular operating system support [19] [20] [21] [22] for low-end,
low-power sensor platforms such as the popular Berkeley motes series [23] [24]
[6] [25].
• Distributed time synchronization [26] [27] and lightweight geographic localiza-
tion [28] [29] mechanisms without GPS support.
• Data fusion and signal processing techniques [30] [31] for reliable event detec-
tion/sensing, etc.
Sensor networks are embedded in the real world and interact closely with the
changing physical environment in which they reside. To the best of our knowledge,
there has been little or no work on understanding to what degree sensor networks
and their associated data collection and network management algorithms need to
be resilient to dynamic changes in network conditions and in the environment in
which sensors are embedded. This is the broad question that this thesis addresses. We
argue that without making the network algorithms resilient to potentially harsh and
fast-changing operating conditions, the rollout of sensor network technology may be
jeopardized.
Furthermore, depending on the nature of the applications, sensor networks are
expected to operate under a wide variety of environmental conditions, and under
some scenarios, during moments of natural disasters. As a result, sensor networks
must be fault-tolerant and capable of adapting to the environment. Early sensor
network research suggests that in dense sensor networks, node or communication
failures in the network could be absorbed by the natural redundancy of large num-
bers of nodes or the large correlation of the signal in time and space [32]. We
conjecture in this dissertation that the benefits of such redundancy in the network
(while helpful) are limited and not sufficient to support resilient sensor network
operations and applications. We believe that there is a pressing need for reliable
communication protocols and congestion control in sensor networks. This disser-
tation studies resilience in sensor networks, specifically at the data transport level
where the problems of packet losses due to communication errors and congestion
can make these networks unstable, energy wasting, and potentially nonoperational.
Our conjecture is that existing sensor network design at the data transport level is
not robust enough to operate even under moderate conditions, and certainly inca-
pable of delivering high fidelity data to applications under higher traffic loads, or
in environments where serious error conditions are experienced. This thesis investi-
gates aspects of this problem and explores a spectrum of general design principles
for resilient control and transport mechanisms that enable the robust operation of
sensor networks under a wide variety of conditions.
1.1.1 A Resilient Transport System for Sensor Networks
Sensor networks must deal with resources that are dynamically changing, including
energy, bandwidth, processing power, node density and connectivity. Furthermore,
they must also deal with the adverse effects from uncertain and dynamic physical
environments. Therefore, a resilient sensor network must operate autonomously,
changing its configuration as required and running algorithms that are optimized
for node survivability and energy usage.
We observe that there are three classes of data traffic patterns in sensor networks
associated with three different classes of applications. The first class is periodic
traffic, generated by applications that issue periodic reports on environmental
conditions. Such applications include habitat monitoring [33] and the
structural/environmental monitoring of buildings or machines
[34], e.g., the periodic reporting of the temperature readings of a room or the acous-
tic signature of a machine. The next class represents discrete event traffic, which
is associated with event sensing applications that generate discrete event reports
triggered by an event of interest, e.g., enemy and target movements on the battle-
field. Finally, the third class represents impulse wave traffic, which is associated
with data impulse applications that typically monitor large-scale disastrous events
such as earthquakes, biochemical attacks, forest fires, etc. These applications can
generate large impulse waves of correlated data across a large area that can easily
overwhelm the sensor network if it is not designed to be robust to these conditions.
It is likely that future sensor networks will need to be designed to carry multiple
traffic classes simultaneously. This is very challenging for the existing state of the
art.
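The three traffic patterns can be caricatured as simple report-time generators; the function names and parameters below are illustrative assumptions, not definitions from the thesis.

```python
def periodic_traffic(rate_hz, duration_s):
    """Class 1: periodic reports, e.g. temperature readings of a room."""
    period = 1.0 / rate_hz
    t = 0.0
    while t < duration_s:
        yield round(t, 3)      # report timestamp (seconds)
        t += period

def discrete_event_traffic(event_times_s):
    """Class 2: one report per discrete event of interest."""
    yield from sorted(event_times_s)

def impulse_traffic(onset_s, n_sources, reports_per_source, spacing_s=0.01):
    """Class 3: a correlated burst from many sources after a large-scale
    event -- the pattern most likely to overwhelm the network."""
    for i in range(n_sources * reports_per_source):
        yield round(onset_s + i * spacing_s, 3)
```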
We define transport resilience in sensor networks as the ability to deliver a suf-
ficient amount of event data to meet an application’s fidelity (e.g., rate of events)
[35] needs (for the different traffic patterns discussed above) while minimizing the
energy consumption. We investigate two classes of transport resilience in this thesis:
1. The need to reliably deliver data with minimal energy expenditure under various, and potentially very harsh, error conditions.
2. The need to maintain the fidelity of the signal delivered to the applications [35] under congested network conditions, which we show in this dissertation is a significant problem in emerging sensor networks.
In typical sensor network applications, data from the active sensors are delivered
through the network to a relatively small number of sink points (see Figure 1-1) that
are attached to the regular communication infrastructure (e.g., Internet). On the
other hand, users control/manage the network or provide actuation instructions [36]
to the network through control signals sent from the sinks. Hence, in a general sense,
there are two information flows in opposite directions in the network, as seen from
the sink’s point of view. In this dissertation, we consider the problems associated
with supporting a resilient transport in sensor networks from the same perspective,
i.e., sinks to sources and sources to sinks.
First, we consider the data that flows from sinks to sources for the purpose of
controlling (e.g., actuation) or managing (e.g., reprogramming) the network. For
example, new applications/software should be ready for rapid deployment in an ad
hoc fashion, supporting the re-tasking or reprogramming of sensors, or allowing the
network to adaptively reconfigure and repair itself in response to unpredictable run-
time dynamics. For instance, a user can install a new data encoding scheme on each
sensor that supports a lower transmission rate but is less sensitive to the channel
quality after the network has been deployed in a high interference environment1. As
a general comment, the information flow from sink to sources is much more sensitive
to message loss in comparison to communications between the sources and sinks. In
this dissertation, we refer to the problem of delivering data to loss-sensitive applications as the reliable transport problem.
Second, we consider data that flows from sources to sinks for the purpose of ex-
tracting useful, reliable, and timely information from the deployed sensor network.
The information flow in this respect is generally more tolerant to packet loss be-
cause of the natural redundancy inherent in disseminated sensor data [32] [8] but
is prone to suffer significant fidelity degradation due to network congestion. In this
dissertation, we refer to this as the congestion control problem.
Note that, in general, neither the reliable transport problem nor the congestion problem is limited to a specific downlink or uplink direction of data flow, although we observe that existing sensor network applications often exhibit this directional property. In what follows, we discuss the
challenges in supporting reliable transport and congestion control in sensor networks.
1Note that the RFM [37] radio used by the Rene2 Berkeley mote [23] sensor platform supports this kind of operation.
1.1.1.1 Reliable Transport Challenges
The ability to provide reliable data delivery is the first step toward the creation
of resilient sensor networks. Such a capability benefits a spectrum of new sensor
network services, for example, the assured delivery of important target information, or the ability to modify the software algorithms running in sensors (i.e., the over-the-air dynamic reconfiguration or re-tasking of sensor networks). There are several
sensor platforms that have the capability to support the re-tasking of individual
sensors but not a network of sensors. For example, the Berkeley motes [19] [23]
are capable of receiving code segments from the network and assembling them into
a completely new execution image in EEPROM secondary store before re-tasking
a sensor. Currently, however, there is no transport protocol capable of reliably
delivering code segments to groups of sensors. A number of groups have recently
started projects to build reliability into sensor networks and develop on-demand
over-the-air reprogramming infrastructure for re-tasking. For example, the SOS [22]
[38] project is developing an operating system that supports the dynamic loading of
software modules to create a system supporting dynamic addition, modification, and
removal of network services on a sensor node. Many other applications of reliable transport are emerging, e.g., reliable actuation, reliable signaling, and the duplication of databases [39] stored in remote sensors.
There are a number of challenges associated with the development of a reliable
transport protocol for sensor networks. Early experimental testbed results [40] [41]
reveal that the link quality of the wireless channel in sensor networks is highly vari-
able and unpredictable. For example, the packet reception rate can range from
50% to 90%. In the case of a re-tasking application, how can a transport support
such an application when possibly hundreds or thousands of nodes need to be re-
programmed in a controlled, reliable, robust, timely and scalable manner? Such a
reliable transport protocol must be lightweight and energy-efficient to be realized
on low-end sensor nodes, and capable of isolating applications from the unreliable
nature of wireless sensor networks. We address this problem in this dissertation
through the development of a reliable transport protocol and use a reprogramming
application capable of re-tasking sensor networks as an application driver.
1.1.1.2 Congestion Control Challenges
While developing the reliable transport protocol discussed in this dissertation, we
analyzed the loss patterns from a sensor network testbed and observed that signifi-
cant loss is also due to congestion for a wide range of workloads, including light and
moderate traffic. This observation leads us to the study of the congestion problem
in sensor networks.
Sensor networks are likely to suffer from various degrees of congestion depending
on the different classes of applications they run, which in turn generate different
traffic patterns (as discussed in Section 1.1.1). Even with a simple application
that generates periodic workloads, congestion can occur in wireless sensor networks
because of the poor and time-varying channel quality [41]. In this case, the time-
varying channel suffers from occasional deep fades for an extended period of time.
During these periods, the queue of the sending node grows quickly (when link-layer
ARQ is in use). This results in the eventual overflow of the sensor’s buffer and
potentially significant packet drops. In sensor networks that support applications
that generate discrete events and impulse waves, the congestion problem is much
more severe. Event-driven sensor networks typically operate under light load, but
can suddenly become active in response to a detected event. The transport of
event impulses is likely to trigger varying degrees of congestion, potentially leading
to the congestion collapse of a sensor network. Although a sensor network may
spend only a small fraction of its time dealing with data impulses, it is during these impulse periods that the information in transit is of greatest importance, and it is at exactly this time that it is most likely to be lost; this is what makes the congestion problem in sensor networks so severe. While some researchers have broadly discussed congestion issues in sensor networks [35], there is no comprehensive approach to solving this important problem.
1.1.2 Problem Statement
In traditional networks (e.g., IP networks), congestion control and transport relia-
bility are often coupled into a single protocol solution (e.g., TCP). This approach,
however, is not necessarily correct in the context of sensor networks. In sensor net-
works, the energy expenditure is more important than occasional data loss because
of the natural redundancy inherent in disseminated sensor data. Depending on the
application and the direction of the information flow as discussed in Section 1.1.1,
not all data packets require strict reliability. For example, applications that monitor
the temperature of a certain geographic region can tolerate occasional packet loss.
Therefore, the complex protocol machinery that would ensure the reliable delivery
of data is not always needed. Due to this application-specific nature of sensor net-
works, we argue that there is a need for the separation of transport reliability and
congestion control in sensor networks.
This dissertation investigates the tradeoffs and performance limits of applying
existing techniques to solving the reliable transport and congestion problems in
sensor networks. Based on this analysis we propose new approaches that are specif-
ically designed to best fit the unique constraints of sensor networks and emerging
application needs. The resulting transport and control algorithms proposed in this
dissertation provide a general set of mechanisms that can be plugged into applications or the appropriate layers of the protocol stack in support of energy-efficient
reliable transport and congestion control.
1.1.3 Technical Barriers
Sensor networks have unique system characteristics and constraints [5] [6] [7] that are
significantly different from traditional networks. For example, the communication
channel condition is highly variable and unpredictable due to (i) a low-power radio
that is susceptible to channel fading; and (ii) the highly dynamic and harsh physical
environment. It may be reasonable to expect that low-power radio performance
can be improved significantly in the foreseeable future, but it is unlikely that the
improvement in the radio design itself can counter all existing adverse effects from
the environment, for example during disasters such as flooding or fires. In
what follows, we discuss the important technical barriers to realizing energy-efficient
reliable delivery and congestion avoidance in sensor networks.
1.1.3.1 Reliable Transport
Reliable data delivery in IP networks relies on end-to-end error recovery mechanisms
(e.g., TCP) in which only the final destination node is responsible for detecting loss
and requesting retransmissions. The end-to-end paradigm [42] has had a large impact, underpinning the success of the Internet. However, we argue that the end-to-end
paradigm is not appropriate for the design of networking protocols in sensor net-
works, which are chiefly considered to be data-centric [11] [32] instead of user or
host-centric. In the case of reliable data delivery, the biggest problem with end-to-
end recovery has to do with the physical characteristics of the transport medium.
Sensor networks usually operate in harsh radio environments, and rely on multihop
forwarding techniques to exchange messages. Errors accumulate exponentially over multiple hops; therefore, packet loss and reordering are more likely. Recent studies [40]
[43] [41] show that in sensor networks that use low-power radios without frequency
diversity, packet delivery performance is both highly variable and exhibits spatial
and temporal dependency. For example, the packet reception rate can range from
50% to 90%. Under such error-prone channel conditions, it is almost impossible to
deliver a single event using an end-to-end approach for larger networks. This obser-
vation suggests that end-to-end error recovery is not a good candidate for reliable
delivery in sensor networks. In Chapter 2, we propose an alternative approach that
is characterized by hop-by-hop error recovery and rate control.
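The argument against end-to-end recovery can be quantified with a simplified independent-loss model. The following Python sketch compares the delivery probability of the two strategies; the per-hop success probability, hop count, and retry limit are illustrative parameters, not measurements from our testbed:

```python
def e2e_success(p, hops, retries):
    """End-to-end recovery: a packet must traverse all hops in one attempt
    (probability p per hop), and only the final destination can request a
    retransmission of the entire path."""
    single_try = p ** hops
    return 1 - (1 - single_try) ** retries

def hbh_success(p, hops, retries):
    """Hop-by-hop recovery: each intermediate node detects loss and
    retransmits locally, so errors do not compound across the path."""
    per_hop = 1 - (1 - p) ** retries
    return per_hop ** hops
```

With p = 0.7 over 10 hops and 3 retries, hop-by-hop recovery delivers roughly three quarters of the packets while end-to-end recovery delivers fewer than a tenth; this gap is the intuition behind the hop-by-hop design adopted in Chapter 2.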
TCP uses a positive acknowledgment (ACK) approach for loss detection. It is
well-known that a positive ACK approach performs better in high error rate envi-
ronments than a negative acknowledgment (NAK) approach. However, the control
overhead of the positive ACK approach is very high even under error-free conditions
because of the requirement to acknowledge each individual packet. Each of these
ACKs consumes energy and is therefore costly. Energy conservation is a particularly
critical issue in sensor networks. In a NAK based system, a node pays a price in
terms of control overhead only when the channel condition is poor. In addition, an
ACK approach cannot support multicast or broadcast operations because of the
ACK implosion problem [44]. Many communication scenarios in sensor networks
are often concerned with group-based communications, i.e., one-to-many commu-
nications. This is especially true for data that flows from sinks to sources for the
purpose of control or management of the network. As a result, a NAK approach
makes more sense to provide reliable delivery in sensor networks since packet deliv-
ery is free of overhead in error-free environments and group-based communications
can be supported.
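The control-overhead asymmetry between the two schemes can be expressed with a first-order model. In the sketch below, an idealized accounting that ignores lost or duplicated control packets, a positive-ACK receiver acknowledges every delivered packet, while a NAK receiver signals only the losses it detects:

```python
def ack_control_msgs(n_packets, loss_rate):
    """Positive ACK: every successfully received packet is acknowledged,
    so control cost is paid even on a perfect channel."""
    return n_packets * (1 - loss_rate)  # one ACK per delivered packet

def nak_control_msgs(n_packets, loss_rate):
    """NAK: the receiver signals only the gaps it detects, so an
    error-free channel costs no control traffic at all."""
    return n_packets * loss_rate  # one NAK per lost packet (idealized)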
Another approach would be to layer a NAK transport upon a positive ACK link
Figure 1-1: The funneling effect. Sensors within the range of an event region/epicenter (enclosed by the dotted ellipse) generate data that travels along a propagation funnel (enclosed by the dotted line) toward the sink when an event occurs. (Legend: physical sink, virtual sink, sensor, active sensor.)
layer in a unicast scenario. The feasibility of using such an approach relies on the
sensor’s radio design. Many tradeoffs exist: for example, some low-power radios
support low-overhead link-layer synchronous ACK [45] (e.g., the RFM radio used
in Mica [24]), others support built-in link-layer ACK for higher data rates up to
250 Kbps (e.g., the IEEE 802.15.4 radio used in Telos [25]), while with other radios
the support of ACKs is costly (e.g., the Chipcon radio [46] used in Mica2 [45]) in
terms of the energy and bandwidth consumption. Further complicating the issue,
these link-layer ACK approaches assume a symmetrical link environment, which is
not always present in today’s sensor networks [41]. As a result, there is a need to
study new transport approaches capable of supporting a wide range of channel error
conditions (including non-symmetrical link environments) that take energy savings,
scalability, and lightweight operations into account as key design goals.
1.1.3.2 Congestion Avoidance and Control
As shown in Figure 1-1, we observe that sensor networks exhibit a unique funneling
effect that significantly complicates the design of these networks. The funneling
effect results when events or periodic reports are generated and then move through
the sensor network on a hop-by-hop basis toward a relatively small number of phys-
ical sink points that are attached to the regular communication infrastructure (e.g.,
the Internet). The flow of events out of the network has similarities to the flow of
people from a large arena after sporting events complete. This leads to a number
of significant challenges including increased transit traffic intensity, congestion, and
packet loss at nodes closer to the sink.
In wired networks, congestion is signified by buffer drops and increased delays.
Therefore, monitoring the buffer size and transmission delays provides an accurate
measure for congestion detection. In sensor networks, since the transmission medium
is shared, traffic between other nodes in the neighborhood may cause interference.
As a result, radio channel quality can be seriously degraded in times of congestion,
resulting in an increase in packet error rates. Today, variants of the CSMA MAC are widely used in many sensor platforms [23]. In a CSMA-based network, because of contention on the medium, congestion causes an increase in packet collisions in addition to an increase in packet error rates. These account for the majority of packet drops in times of congestion, as opposed to packet drops due to buffer overflows in wired networks. As a result, the buffer occupancy and transmission
delays alone can no longer provide an accurate and timely indication of congestion
in sensor networks. There is a need for new congestion detection techniques that
are efficient, accurate, and incur low cost in terms of energy, computation and
complexity.
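One such technique infers congestion by sampling the shared channel rather than inspecting only the local buffer. The sketch below is a minimal illustration of this idea; the threshold is an assumed tuning knob, not a value prescribed by any protocol in this thesis:

```python
def channel_congested(busy_samples, threshold=0.5):
    """Congestion detection by listening to the channel: the fraction of
    sampling instants at which the carrier was sensed busy approximates
    channel utilization. CSMA channels saturate well below 100%
    utilization, so the threshold is set conservatively (assumed value)."""
    if not busy_samples:
        return False
    utilization = sum(busy_samples) / len(busy_samples)
    return utilization > threshold
```

Because sampling only needs the radio's carrier-sense primitive, this style of detection is cheap in energy and computation compared with continuous monitoring.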
TCP offers end-to-end flow control through its window-based control mechanism
and avoids congestion via the Additive Increase and Multiplicative Decrease (AIMD)
of the window size at the sending host. This mitigates congestion by aggressively
metering traffic being admitted into the network when congestion is detected. In
sensor networks, however, throttling transmission rates at the sources alone resolves neither congestion nor its negative impact on the network. This is because
the major concerns during congestion are the degraded application data fidelity
measured at the sink, and the energy wasted due to the packet loss/drops in the
network. These occur as a result of packet collisions and degraded channel quality
due to interference. The main goal of congestion control in sensor networks is thus
to maintain data fidelity and reduce packet drops due to collision, in addition to
regulating the admitted traffic into the network at the sources.
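For reference, the window-based AIMD rule described above can be written as a one-line control step; the additive and multiplicative constants below are illustrative, not TCP's actual parameters:

```python
def aimd_step(rate, congested, add=1.0, mult=0.5, floor=0.1):
    """One AIMD control step: probe the available capacity additively
    while the path is clear, and back off multiplicatively when a
    congestion signal arrives. The floor prevents the rate from
    collapsing to zero (assumed constants for illustration)."""
    if congested:
        return max(rate * mult, floor)
    return rate + add
```

The limitation the text identifies is that this step acts only at the sending host: in a shared wireless medium, the damage (collisions, interference) has already occurred inside the network by the time the source backs off.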
A number of distinct congestion scenarios are likely to arise in sensor networks.
First, densely deployed sensors generating impulse data events create persistent
hotspots (i.e., congested nodes or areas) proportional to the impulse rate at loca-
tions close to the sources (e.g., within one or two hops). In this scenario, localized,
fast time scale mechanisms capable of providing backpressure from the points of
congestion back to the sources may be potentially effective. Second, sparsely de-
ployed sensors generating low data rate events create transient hotspots potentially
anywhere in the sensor field but likely farther from the sources, toward the sink.
In this case, fast time scale resolution of localized hotspots using a combination
of localized backpressure (between nodes identified in a hotspot region) and rate
limiting techniques may be potentially more effective. Because of the transient na-
ture of congestion, source nodes may not be involved in the backpressure. Third,
sparsely deployed sensors generating high data-rate events create both transient and
persistent hotspots distributed throughout the sensor field. In this final scenario, a
combination of fast time scale actions to resolve localized transient hotspots, and
closed loop rate regulation of all sources that contribute toward creating persistent
congestion may be potentially effective.
1.2. Thesis Outline
In order to overcome the technical barriers to supporting a resilient transport system in sensor networks discussed above, we propose to use a combination of simulation, experimentation, and analytical modeling to best understand the problem
and solution space. We emphasize a methodology founded on an experimental sys-
tems research approach that builds a small experimental sensor network testbed in
the laboratory to best understand the problems discussed and our proposed algo-
rithms. We study resilience issues in this small testbed, and apply what is learned
toward the construction of larger and more scalable distributed systems. We adopt
a general design principle when studying reliable transport and congestion control
that makes minimal assumptions about the network and applications. Such an
approach is important because of the application-specific nature of sensor networks.
The outline of our study is as follows.
1.2.1 Pump Slowly Fetch Quickly (PSFQ)
Chapter 2 describes the development of Pump Slowly Fetch Quickly (PSFQ), a
lightweight, scalable and robust transport protocol that is customizable to meet
the needs of supporting reliable control and management of sensor networks, and
remotely programming/re-tasking sensor nodes over-the-air. PSFQ represents a sim-
ple approach with reduced requirements on the routing infrastructure (as opposed
to IP multicast routing requirements), and reduced signaling, thereby reducing the
communication cost for data reliability. Further, PSFQ is responsive to high error
rates, allowing successful operation even under highly error-prone conditions.
Several important contributions come out of this work. First, we propose and
justify hop-by-hop error recovery in which intermediate nodes also take responsi-
bility for loss detection and recovery, so that reliable data exchange is done on a
hop-by-hop basis rather than end-to-end. Second, we analyze a simplified model of
our NAK-based algorithm and determine a near-optimal ratio between the timers as-
sociated with the forwarding (pump) and retransmission (fetch) operations. Third,
PSFQ exhibits a unique multi-modal communication property that provides a grace-
ful tradeoff between the packet switching and store-and-forward paradigms, depend-
ing on the channel conditions encountered. This multi-modal transport behavior is
crucial to the performance of the reliable delivery service in sensor networks and is
responsive across a wide range of bit error rates (i.e., low, moderate, and high).
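The pump/fetch interplay can be sketched, under simplifying assumptions, as a per-hop cache that relays only in-order data and aggressively requests any detected gaps. The timer constants and their 4:1 ratio below are illustrative placeholders, not the near-optimal values derived in Chapter 2:

```python
class PsfqHopState:
    """Per-hop sketch of PSFQ's core rule: forward (pump) a packet only
    when no gap precedes it; on a gap, fetch (NAK) aggressively before
    relaying. The point of the design is T_PUMP >> T_FETCH, so several
    fetch attempts fit inside one pump interval (values assumed here)."""
    T_PUMP, T_FETCH = 1.0, 0.25  # seconds; illustrative 4:1 ratio

    def __init__(self):
        self.received = set()
        self.next_expected = 0

    def on_packet(self, seq):
        self.received.add(seq)
        forwarded = []
        # Pump: relay only in-order packets, holding back anything past a gap.
        while self.next_expected in self.received:
            forwarded.append(self.next_expected)
            self.next_expected += 1
        # Fetch: NAK every missing segment below the highest sequence seen.
        missing = [s for s in range(self.next_expected, max(self.received) + 1)
                   if s not in self.received]
        return forwarded, missing
```

Holding back out-of-order data at each hop is what lets the protocol degrade gracefully from packet switching toward store-and-forward as the channel worsens.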
We present the design and implementation of PSFQ, and evaluate the proto-
col using the ns-2 simulator and an experimental wireless sensor testbed based on
Berkeley motes and the TinyOS operating system. We show that PSFQ can out-
perform existing related techniques and is highly responsive to the various error
conditions experienced in sensor networks.
1.2.2 CODA (COngestion Detection and Avoidance)
A resilient sensor network must be capable of balancing the offered load, while at-
tempting to maintain acceptable fidelity (e.g., rate of events) of the delivered signal
at the sink during periods of transient and more persistent congestion. In Chap-
ter 3, we present the design of an energy-efficient congestion control scheme for
sensor networks called CODA (COngestion Detection and Avoidance). We explore
and identify a new objective function for traffic control mechanisms in wireless sensor networks, which attempts to maximize the operational lifetime of the network
while delivering acceptable data fidelity to the applications. CODA implements
three components to realize such an objective function: (1) a timely, accurate and
energy-efficient congestion detection scheme; (2) a hop-by-hop backpressure algo-
rithm; and (3) a sink to multi-source regulation scheme. To evaluate CODA in a
realistic environment, we study and analyze a number of congestion scenarios that
we believe will be prevalent in sensor networks. We define new performance metrics
suitable for sensor network transport that capture the impact of CODA on a sensing
application’s performance. Furthermore, we discuss the performance benefits and
practical engineering challenges of implementing CODA in an experimental sensor
network testbed based on Berkeley motes using CSMA. Both testbed and simu-
lation results indicate that CODA significantly improves the performance of data
dissemination applications such as Directed Diffusion [11] by mitigating hotspots,
and reducing the energy consumption and fidelity penalty of sensing applications.
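The hop-by-hop backpressure component can be illustrated with a minimal sketch; the field names, the halving factor, and the depth bound are assumptions for exposition, not CODA's actual message format:

```python
def signal_congestion(node, max_depth=4):
    """Open-loop, hop-by-hop backpressure: a congested node broadcasts a
    suppression signal; each upstream neighbor throttles its sending rate,
    and propagates the backpressure further only if it is itself congested.
    max_depth bounds the propagation (assumed safeguard for this sketch)."""
    for up in node['upstream']:
        up['rate'] *= 0.5  # throttle on receiving backpressure
        if up['congested'] and max_depth > 1:
            signal_congestion(up, max_depth - 1)
```

Because the signal travels only as far as congestion persists, this mechanism resolves localized, fast time scale hotspots without involving distant sources.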
1.2.3 Dual Radio Virtual Sinks
The combination of the funneling effect and low-power radio channel can signifi-
cantly limit the network’s ability to deliver high fidelity data from sources to sinks
(i.e., to applications). To overcome this capacity limitation, new technologies must
be studied and developed. We observe that CODA provides a conservative solution
to mitigating congestion in sensor networks and assumes that all nodes are equal
(with the exception of the sink) in trying to counter and react to the onset of conges-
tion. When congestion occurs and the channel becomes saturated, the application
fidelity, which can be viewed as the application’s quality of service measured at
the sink, can be significantly degraded. This is because CODA’s congestion control
policy at sources and forwarding nodes is to rate control the traffic during peri-
ods of persistent congestion. While CODA and other traditional congestion control
schemes are capable of avoiding congestion and costly packet loss, and therefore energy waste, they do so at the expense of the maximum number of events that can be delivered to the sink.
In Chapter 4, we study an alternative but complementary solution to CODA
that maintains the application’s fidelity during persistent overload conditions. A
number of observations inform our study and design; primarily, “small worlds” [47]
research has shown that a small fraction of shortcut nodes randomly distributed
in a network is enough to effectively reduce the network diameter, resulting in a
fast distribution network. Inspired by this result, we propose a new approach to
mitigating congestion in sensor networks based on the concept of dual radio virtual
sinks. Our proposal is as follows. After randomly distributing a small number of
all-wireless dual radio virtual sinks throughout the sensor field, we propose to enable
these virtual sinks to operate as safety valves in the sensor field. Specifically, virtual
sinks selectively siphon off high load traffic in order to maintain the fidelity of the
application signal at the physical sink. Virtual sinks are equipped with a secondary
long-range radio interface, such as the IEEE 802.11, in addition to their primary low
power mote radio. Virtual sinks are capable of dynamically forming a secondary ad
hoc radio network. Rather than rate controlling packets as is the case with CODA,
virtual sinks take the congested traffic off the low-powered low-bandwidth primary
sensor network and move it on to the higher-bandwidth secondary radio network,
transiting it to the final physical data sink.
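The forwarding decision at a virtual sink reduces to a small sketch; all field and radio names below are hypothetical:

```python
def choose_radio(node):
    """Siphon-style forwarding decision at a dual-radio node: when the
    surrounding primary (mote) network is overloaded, a virtual sink
    diverts traffic onto its secondary long-range radio toward the
    physical sink; otherwise, and at ordinary nodes, packets stay on
    the normal low-power primary path."""
    if node.get('is_virtual_sink') and node.get('primary_overloaded'):
        return 'secondary'  # siphon traffic off the congested mote network
    return 'primary'        # ordinary low-power multihop forwarding
```

The design choice here is that siphoning is selective: the secondary network is engaged only under load, so the energy cost of the long-range radio is paid only when fidelity is actually at risk.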
Chapter 4 explores algorithms for virtual sink discovery, selection, traffic tran-
siting, and load balancing. We leverage the use of heterogeneous sensors and study
the use of Stargate [48] nodes to support an all-wireless virtual sink approach in our
sensor network testbed. We show that a small number of virtual sinks are sufficient
to significantly improve the data fidelity of the sensor networks while operating in
overload conditions.
1.3. Thesis Contributions
This dissertation provides several broad contributions summarized as follows:
1. There is a growing need to support reliable data communications in sensor
networks that are capable of supporting new applications, such as the assured
delivery of high priority events to sinks, the duplication of a database [39]
stored in a remote sensor on other sensors, the reliable control and manage-
ment of sensor networks, and remotely programming/re-tasking sensor nodes
over-the-air [19] [22]. This represents a significant challenge because there
is no prior work on reliable transports for sensor networks, and most of the
existing approaches in wired and mobile ad hoc (MANET) networks cannot
guide an efficient solution to this problem. Our work in PSFQ as described in
Chapter 2 has been widely recognized as the first contribution to the problem
of reliable delivery in sensor networks. Following the publication of PSFQ
[49], a number of follow-up studies have been conducted that continue to ad-
vance the development of a de facto standard for reliable transport in sensor
networks.
2. While some researchers have discussed congestion issues [35], there has been
no comprehensive approach to the problem. CODA is the first such general
algorithmic approach that includes a low-cost sampling scheme for congestion
detection, a backpressure algorithm, and sink to multi-source regulation, as
presented in Chapter 3. Hop-by-hop error recovery and flow control have been
proposed before in wired networks [50] [51] [52] but not for sensor networks.
The major contributions of CODA include a low cost congestion detection
technique, and a new objective function that governs the design of two energy-
efficient control mechanisms, that are responsive to a wide set of congestion
scenarios and congestion time scales.
3. Dual radio wireless systems have been proposed in the literature [53] for cel-
lular and WLAN-based networks. However, there is limited understanding
of the use of dual radio nodes in sensor networks. The contribution of virtual sinks and the supporting Siphon protocol, discussed in Chapter 4, allows the network to maintain application fidelity even under overload conditions. We
believe it is essential that networks comprising a large number of sensors in-
corporate heterogeneity by means of special nodes (e.g., Stargate nodes) that
offer enhanced services to applications. These special nodes will likely offer services other than the ones we study, such as additional storage and computational capability. As such, the Siphon protocol discussed in Chapter 4 is more
broadly applicable to new special node services. Therefore, this contribution
explores general design principles that exploit dual radio nodes in sensor net-
works. However, the specific contribution of virtual sinks is that they add a
new level of resilience to sensor networks.
Collectively, PSFQ, CODA, and virtual sinks considerably enhance the resilience
and performance of the transport system for wireless sensor networks.
Chapter 2
Pump-Slowly, Fetch-Quickly (PSFQ): A Reliable
Transport Protocol for Sensor Networks
2.1. Introduction
There is a considerable amount of research in the area of wireless sensor networks
ranging from real-time tracking to ubiquitous computing where users interact with
potentially large numbers of embedded devices. This chapter addresses the design of
system support for a new class of applications emerging in wireless sensor networks
that require reliable data delivery. One such application that is driving our research
is the reprogramming or “re-tasking” of groups of sensors over-the-air, which requires the underlying transport protocol to support reliable data delivery. Today, sensor networks tend to be application
specific and are typically hard-wired to perform a specific task efficiently at low cost.
We believe that as the number of sensor network applications grows, there will be a
need to build more powerful general-purpose hardware and software environments
capable of reprogramming or re-tasking sensors to do a variety of tasks. These
general-purpose sensors would be capable of servicing new and evolving classes of
applications. Such systems are beginning to emerge. For example, the Berkeley
motes [23] [54] [24] are capable of receiving code segments from the network and
assembling them into a completely new execution image in EEPROM secondary
store before re-tasking a sensor.
Unlike in traditional networks (e.g., IP networks), reliable data delivery is still an
open research question in the context of wireless sensor networks. To our knowledge
there has been little work on the design of reliable transport protocols for sensor
networks. This is expected because the vast majority of sensor network applica-
tions do not require reliable data delivery. For example, in applications such as
temperature monitoring or animal location tracking, the occasional loss of sensor
readings is tolerable, and therefore, the complex protocol machinery that would
ensure the reliable delivery of data is not needed. Directed Diffusion [8] is representative of a class of data dissemination mechanisms specifically designed for a
general class of applications in sensor networks. Directed Diffusion provides robust
dissemination through the use of multi-path data forwarding, but the correct recep-
tion of all data messages is not assured. We observed that in the context of sensor
networks, data that flows from sources to sinks is generally tolerant of loss. On
the other hand, data that flows from sinks to sources for the purpose of control or
management (e.g., re-tasking sensors, actuation) is sensitive to message loss. For
example, disseminating a program image to sensor nodes is problematic. Loss of
a single message associated with a code segment or script would render the image
useless and the re-tasking operation a failure.
There are a number of challenges associated with the development of a reliable
transport protocol for sensor networks. For example, in the case of a re-tasking
application there may be a need to reprogram certain groups of sensors (e.g., within
a disaster recovery area). This would require addressing groups of sensors, loading
new binaries into them, and then, switching over to the new re-tasked application in
a controlled manner. Another example of new reliable data requirements relates to
simply injecting scripts into sensors to customize them rather than sending complete,
and potentially bandwidth-demanding, code segments. Such re-tasking becomes in-
creasingly challenging as the number of sensor nodes in the network grows. How can
a transport protocol offer suitable support for such a re-tasking application where
possibly hundreds or thousands of nodes need to be reprogrammed in a controlled,
reliable, robust and scalable manner? Such a reliable transport protocol must be
lightweight and energy-efficient to be realized on low-end sensor nodes, such as the
Berkeley mote series of sensors, and capable of isolating applications from the
unreliable nature of wireless sensor networks in an efficient and robust manner. The error
rates experienced by these wireless networks can vary widely, and therefore, any
reliable transport protocol must be capable of delivering reliable data to potentially
large groups of sensors under such conditions.
In this chapter, we propose PSFQ (Pump Slowly, Fetch Quickly), a new reli-
able transport protocol for wireless sensor networks. Due to the application-specific
nature of sensor networks, it is hard to generalize a specific scheme that can be
optimized for every application. Rather, the focus of this chapter is the design
and evaluation of a new transport system that is simple, robust, scalable, and cus-
tomizable to different applications' needs. PSFQ represents a simple approach that
places reduced requirements on the routing infrastructure (as opposed to IP multicast
routing requirements), uses reduced signaling, thereby lowering the communication cost
of data reliability, and remains responsive to high error rates, allowing successful
operation even under highly error-prone conditions.
This chapter represents an extended version of the work that first appeared in
[49] and is organized as follows. Section 2.2. presents the PSFQ model and discusses
important design choices. Section 2.3. details the design of the PSFQ pump, fetch
and report mechanisms. Section 2.4. presents an evaluation of the protocol and
comparison to existing related techniques such as Scalable Reliable Multicast (SRM)
[55] using the ns-2 simulator. We show that PSFQ can outperform an idealized SRM
scheme and is highly responsive to the various error conditions experienced in sensor
networks. Section 2.5. discusses experimental results from the implementation of
PSFQ in a wireless sensor testbed based on Berkeley motes. Section 2.6. discusses
related work, and finally, we present some concluding remarks in Section 2.7.
2.2. Protocol Design Space
The key idea that underpins the design of PSFQ is to distribute data from a source
node by pacing data at a relatively slow speed (“pump slowly”), but allowing nodes
that experience data loss to fetch (i.e., recover) any missing segments from imme-
diate neighbors very aggressively (local recovery, “fetch quickly”). A lost message
is detected when a node receives a higher sequence number than expected, which
triggers the fetch operation; in other words, PSFQ is based on an energy-efficient
negative acknowledgment system. The motivation behind our simple model is to
achieve loose delay bounds while reducing the loss recovery cost by using localized
recovery of data among immediate neighbors.
2.2.1 Hop-by-Hop Error Recovery
To achieve these goals we have taken a different approach in comparison to tradi-
tional end-to-end error recovery mechanisms in which only the final destination node
is responsible for detecting loss and requesting retransmission. The biggest problem
with end-to-end recovery has to do with the physical characteristics of the transport
medium. Sensor networks usually operate in harsh radio environments, and
rely on multihop forwarding techniques to exchange messages. Error accumulates
Figure 2-1: Probability of successful delivery of a message using an end-to-end model across a multi-hop network. (Plot: success rate versus network size in number of hops, for packet error rates of 1%, 5%, 10%, 20%, 30%, 40%, and 50%.)
exponentially over multiple hops; therefore, packet loss and reordering are more likely.
To illustrate this simply, assume that the packet error rate of a wireless channel
is p; then the probability of exchanging a message successfully across n hops decreases
quickly, as (1 − p)^n. Figure 2-1 illustrates this problem numerically: it plots
the success rate as a function of the network size in number of hops, and shows
that for larger networks it is almost impossible to deliver a single message using
an end-to-end approach in a lossy link environment when the error rate is larger
than 10%. In [56] [41] the authors show that it is not unusual to experience error
rates of 10% or above in dense wireless sensor networks. We believe that the error
rate could be even higher in many cases, such as, military applications, industrial
process monitoring, and disaster recovery activities. This observation suggests that
end-to-end error recovery is not a good candidate for reliable transport in wireless
sensor networks.
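The exponential decay described above is easy to reproduce. The following sketch (the function name is ours, purely for illustration) computes the end-to-end success probability (1 − p)^n for a few of the error rates shown in Figure 2-1:

```python
# Illustrative sketch (not from the thesis): end-to-end success probability
# (1 - p)^n for a message crossing n hops with per-hop packet error rate p.

def e2e_success(p: float, hops: int) -> float:
    """Probability that a message survives every one of `hops` lossy links."""
    return (1.0 - p) ** hops

for p in (0.01, 0.10, 0.30):
    rates = [round(e2e_success(p, n), 3) for n in (2, 6, 14)]
    print(f"p={p:.2f}: success over 2/6/14 hops = {rates}")
```

At p = 0.30 and 14 hops the success probability is already below 1%, which is the regime in which end-to-end recovery becomes impractical.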
Figure 2-2: Probability of successful delivery of a message over one hop when the mechanism allows multiple retransmissions before the next packet arrival. (Plot: success rate versus packet loss rate on a log scale, for 0, 1, 3, 5, and 7 allowed retransmissions.)
We propose hop-by-hop error recovery in which intermediate nodes also take
responsibility for loss detection and recovery, so reliable data exchange is done on
a hop-by-hop basis rather than end-to-end. This approach essentially segments
multihop forwarding operations into a series of single hop transmission processes
that eliminate error accumulation. The hop-by-hop approach thus scales better
and is more tolerant of errors while reducing the likelihood of packet reordering in
comparison to end-to-end approaches.
2.2.2 Fetch/Pump Relationship
For a negative acknowledgment system, the data delivery latency would be depen-
dent on the expected number of retransmissions for successful delivery. To reduce
the latency, it is essential to maximize the probability of successful delivery of a
packet within a “controllable time frame”. An intuitive approach to doing this
would be to enable the possible multiple retransmissions of packet i (therefore in-
creasing the chances of successful delivery) before the next packet i + 1 arrives; in
other words, clear the queue at a receiver (e.g., an intermediate sensor) before new
packets arrive in order to keep the queue length small and hence reduce the delay.
However, it is non-trivial to determine the optimal number of retransmissions that
trades off the success rate (i.e., the probability of successful delivery of a single message
within a time frame) against wasting too much energy on retransmissions. In order
to investigate and justify this design decision, we analyze a simple model, which
approximates this mechanism. Assuming that the packet loss rate p stays constant
during the controllable time frame, it can be shown that in a negative acknowledg-
ment system, the probability of a successful delivery of a packet between two nodes
that allows k retransmissions can be expressed recursively as:
(1 − p) + p × Ω(k)   (k ≥ 1)   (2.1)

where

Ω(k) = Φ(1) + Φ(2) + · · · + Φ(k)

Φ(k) = (1 − p)² × [1 − p − Φ(1) − · · · − Φ(k − 1)]   (Φ(0) = 0)   (2.2)

Here Ω(k) is the probability of a successful recovery of a missing segment within k
retransmissions, and Φ(k) is the probability of a successful recovery of the missing segment
at the kth retransmission. The above expressions are evaluated numerically against the
packet loss rate p, as shown in Figure 2-2, demonstrating the impact of increasing
the number of retransmissions up to k equal to 7. We can see that substantial
improvements in the success rate can be gained in the region where the channel
error rate is between 0 and 60%. However, the additional benefit of allowing more
retransmissions diminishes quickly and becomes negligible when k is larger than 5.
This simple analysis suggests that a suitable ratio between the timers associated
with the pump and fetch operations lies in the range of 3 to 5.
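The recursion above can be evaluated directly. The following sketch (function name is ours) implements Equations (2.1) and (2.2) as written and reproduces the diminishing-returns behavior of Figure 2-2:

```python
# Numerical evaluation of Equations (2.1)-(2.2) as stated in the text:
# success(k) = (1 - p) + p * Omega(k), where Omega(k) = Phi(1) + ... + Phi(k)
# and Phi(k) = (1 - p)^2 * [1 - p - Phi(1) - ... - Phi(k-1)], with Phi(0) = 0.

def success_prob(p: float, k: int) -> float:
    """Probability of successful delivery allowing k retransmissions."""
    phis = []
    for _ in range(k):
        phi = (1 - p) ** 2 * (1 - p - sum(phis))
        phis.append(phi)
    return (1 - p) + p * sum(phis)

# The marginal benefit of going from k=5 to k=7 retransmissions is negligible.
for p in (0.2, 0.4, 0.6):
    print(p, [round(success_prob(p, k), 4) for k in (1, 3, 5, 7)])
```

Running this shows that the gain from k = 5 to k = 7 is well below 1% for these loss rates, consistent with the choice of a pump/fetch timer ratio between 3 and 5.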
2.2.3 Multi-modal Operations
In a negative acknowledgment system, a local loss event could propagate to down-
stream nodes if higher sequence number packets are continuously forwarded. The
propagation of a loss event could cause a serious waste of energy because a loss
event will trigger error recovery operations that attempt to fetch the missing packet
quickly from immediate neighbors, whereas none of the downstream nodes' neighbors
would have the missing packet. Therefore, the loss cannot be recovered and the
control messages associated with the fetch operation are wasted. As a result, it is
necessary to make sure that intermediate nodes only relay messages with continuous
sequence numbers.
The use of a data cache is required to buffer messages to ensure in-sequence
data forwarding and complete recovery for any fetch operation from downstream
nodes. Note that the effect of cache size is not investigated here, but for our
reference application (i.e., re-tasking) the cache keeps all code segments. This pump
mechanism not only prevents propagation of loss events and the triggering of un-
necessary fetch operations from downstream nodes, but it also greatly contributes
toward the error tolerance of the protocol against channel conditions. By localizing
loss events and not relaying any higher sequence number messages until recovery
has taken place, this mechanism operates in a similar fashion to a store-and-forward
approach where an intermediate node relays a file only after the node has received
the complete file. The store-and-forward approach is effective in highly error-prone
environments because it essentially segments the multi-hop forwarding operations
into a series of single hop transmission processes.
PSFQ benefits from the following tradeoff between store-and-forward and packet
switching. The pump operation operates in a multihop packet-switching mode dur-
ing periods of low errors when lost packets can be recovered quickly, and behaves
more like store-and-forwarding communications when the channel is highly error-
prone. Therefore, PSFQ exhibits a novel multi-modal communications property
that provides a graceful tradeoff between the packet switching and store-and-forward
paradigms, depending on the channel conditions encountered.
2.3. Protocol Description
PSFQ comprises three protocol functions: message relaying (pump operation), relay-
initiated error recovery (fetch operation), and selective status reporting (report op-
eration). A user injects messages into the network and intermediate nodes buffer
and relay messages with the proper schedule to achieve loose delay bounds. A relay
node maintains a data cache and uses cached information to detect data loss, initi-
ating error recovery operations if necessary. It is important for the user to obtain
statistics about the dissemination status in the network as a basis for subsequent
decision-making, such as the correct time to switch over to the new task in the
case of re-tasking/ programming sensors over-the-air. Therefore, it is necessary to
incorporate a feedback and reporting mechanism into PSFQ that is flexible (i.e.,
adaptive to the environment) and scalable (i.e., minimizes the overhead).
In what follows, we describe the main PSFQ operations (viz. pump, fetch and
report) with specific reference to a re-tasking application – one in which a user
needs to re-task a set of sensor nodes by distributing control scripts or binary code
segments into the targeted sensor nodes.
2.3.1 Pump Operation
Recall that PSFQ is not a routing solution but a transport scheme. In cases where
a specific node needs to be addressed directly, instead of a whole group of sensors
(which is the norm), PSFQ can operate on top of existing routing schemes (e.g.,
DSDV [57]) to support reliable data transport1. A user node can use TTL-based
as well as group address filtering [19] methods to control the scope of its re-tasking
operation. Note, however, that this method does not provide accurate scope control
because in many cases the intended receivers cannot be neatly defined by a limit of
TTL. To enable local loss recovery and in-sequence data delivery, a data cache is
created and maintained at intermediate nodes. The pump operation is important
in controlling the timely dissemination of code segments to all target nodes, and
providing basic flow control so that the re-tasking operation does not overwhelm
the regular operations of the sensor network. This requires proper scheduling for
data forwarding. We adopt a simple scheduling scheme, which uses two timers, Tmin
and Tmax.
2.3.1.1 Pump Timers
A user node broadcasts a packet to its neighbors every Tmin until all the data
fragments have been sent out. Neighbors that receive this packet will check against
their local data cache discarding any duplicates. If this is a new message PSFQ
will buffer the packet and decrease the TTL by 1. If the TTL value is not zero
and there is no gap in the sequence number, then PSFQ sets a schedule to forward
the message. The packet will be delayed for a random period between Tmin and
1To support reliable transport for any-to-any communication scenarios, PSFQ is layered upon a routing scheme and uses the unicast address of the destination node instead of a broadcast address in data packets. In order to support hop-by-hop error recovery, a “snoop” component is needed to copy packets from the routing agent to the PSFQ agent. In this case, only nodes en-route to the destination node, as determined by the routing algorithm, participate in the PSFQ operations.
Tmax and then relayed to its neighbors that are one or more hops away from the
source. In this specific reference case PSFQ simply rebroadcasts the packet. A
packet propagates outward from the source node up to TTL hops away in this
mode. The random delay before forwarding a message is necessary to avoid collisions
because RTS/CTS dialogues are inappropriate in broadcasting operations, and the
timing of rebroadcasts among interfering nodes can be highly correlated.
The choice of Tmin reflects several considerations. First, there is a need to provide a time-buffer
for local packet recovery. One of the main motivations behind the PSFQ paradigm
is to recover lost packets quickly among immediate neighboring nodes within a
controllable time frame. Tmin serves such a purpose in the sense that a node has an
opportunity to recover any missing segment before the next segment comes from its
upstream neighbors, since a node must wait at least Tmin before forwarding a packet
as part of the pump operation. Next, there is a need to reduce redundant broadcasts.
In a densely deployed network it is not unusual to have multiple immediate neighbors
within radio transmission range. In [58], the authors show that a rebroadcast system
can provide only 0-61% additional coverage over what was already covered by the
previous transmissions. Furthermore, it is shown that if a message has been heard
more than 4 times the additional coverage is below 0.05%. Tmin associated with
the pump operation provides an opportunity for a node to hear the same message
from other rebroadcasting nodes before it would actually have started to transmit
the message. A counter is used to keep track of the number of times the same
broadcast message is heard. If the counter reaches 4 before the scheduled rebroadcast
of a message then the transmission is cancelled and the node will not relay the
specific message because the expected benefit (additional coverage) is very limited
in comparison to the cost of transmission. Tmax can be used to provide a loose
statistical delay bound for the last hop to successfully receive the last segment of
a complete file (e.g., a program image or script). Assuming that any missing data
is recovered within one Tmax interval using the aggressive fetch operation described
in the next section, then the relationship between the delay bound D(n) and Tmax is as
follows: D(n) = Tmax × n × (number of hops), where n is the number of fragments
of a file.
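The pump receive path described above can be sketched as follows. This is an illustrative simplification (class and method names are ours, not from the PSFQ implementation), assuming a single file; a full implementation would also relay buffered out-of-order fragments once a gap is repaired.

```python
# Hedged sketch of the pump receive path: a node buffers new in-sequence
# packets and schedules a rebroadcast after a random delay in [Tmin, Tmax],
# suppressing the relay if the same message is overheard 4 times first.

import random

class PumpState:
    def __init__(self, t_min: float, t_max: float):
        self.t_min, self.t_max = t_min, t_max
        self.cache = {}        # seq -> payload (data cache for local recovery)
        self.heard = {}        # seq -> times this message was heard
        self.expected_seq = 0  # next in-sequence fragment

    def on_packet(self, seq, ttl, payload):
        """Return a relay delay, or None if the packet should not be relayed."""
        if seq in self.cache:                       # duplicate: just count it
            self.heard[seq] = self.heard.get(seq, 1) + 1
            return None
        self.cache[seq] = payload                   # buffer new message
        self.heard[seq] = 1
        if ttl - 1 <= 0 or seq != self.expected_seq:
            return None                             # TTL spent or sequence gap
        self.expected_seq += 1
        return random.uniform(self.t_min, self.t_max)

    def should_relay(self, seq):
        """Cancel the rebroadcast if the message was heard 4 or more times."""
        return self.heard.get(seq, 0) < 4

def delay_bound(t_max: float, fragments: int, hops: int) -> float:
    """Loose statistical bound D(n) = Tmax * n * (number of hops)."""
    return t_max * fragments * hops
```

For example, a 10-fragment file pumped with Tmax = 2 s over 5 hops has a loose delay bound of delay_bound(2.0, 10, 5) = 100 seconds.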
2.3.2 Fetch Operation
A node goes into the PSFQ fetch mode once a sequence number gap in a file’s
fragments is detected. A fetch operation is the proactive act of requesting a retrans-
mission from neighboring nodes once loss is detected at a receiving node. PSFQ
uses the concept of loss aggregation whenever loss is detected; that is, it attempts
to batch up all message losses in a single fetch operation whenever possible.
2.3.2.1 Loss Aggregation
Data loss is often correlated in time because of fading conditions and other chan-
nel impairments. As a result loss usually occurs in batches (bursty loss). PSFQ
aggregates loss such that the fetch operation deals with a “window” of lost packets
instead of a single packet loss. In a dense network where a node usually has more
than one neighbor it is possible that each of its neighbors only obtains or retains
part of the missing segments in the loss window. PSFQ allows different segments
of the loss window to be recovered from different neighbors. In order to reduce
redundant retransmissions of the same segment each neighbor waits for a random
time before transmitting segments. Other nodes that have the data and scheduled
retransmissions will cancel their timers if they hear the same repair from a neigh-
boring node. In poor radio environments successive loss could occur including loss
of retransmissions and fetch control messages. Therefore, it is not unusual to have
multiple gaps in the sequence number of messages received by a node after several
such failures. Aggregating multiple loss windows in the fetch operation increases
the likelihood of successful recovery in the sense that as long as one fetch control
message is heard by one neighbor then all the missing segments could be resent by
this neighbor.
2.3.2.2 Fetch Timer
In fetch mode, a node aggressively sends out NAK messages to its immediate neigh-
bors to request missing segments. If no reply is heard or only a partial set of missing
segments are recovered within a period Tr (Tr < Tmax, this timer defines the ratio
between pump and fetch, as discussed earlier) then the node will resend the NAK
every Tr interval (with slight randomization to avoid synchronization between neigh-
bors) until all the missing segments are recovered or the number of retries exceeds
a preset threshold, thereby ending the fetch operation. The first NAK is scheduled
to be sent out within a short delay that is randomly computed between 0 and ∆
(∆ ≪ Tr). The first NAK is cancelled (to keep the number of duplicates low) in the
case where a NAK for the same missing segments is overheard from another node
before the NAK is sent. Since ∆ is small the chance of this happening is relatively
small. In general, retransmissions in response to a NAK coming from other nodes
are not guaranteed to be overheard by the node that cancelled its first NAK. In
[58] the authors show that at most there is a 40% chance that the canceling node
receives the retransmitted data under such conditions. Note, however, that a node
that cancels its NAK will eventually resend a NAK within Tr if the missing seg-
ments are not recovered, therefore, such an approach is safe and beneficial given the
tradeoffs.
To avoid the message implosion problem NAK messages never propagate; that
is, neighbors do not relay NAK messages unless the number of times the same NAK
is heard exceeds a predefined threshold while the missing segments requested by the
NAK message are no longer retained in a node’s data cache. In this case, the NAK
is relayed once, which in effect broadens the NAK scope to one more hop to increase
the chances of recovery.
Each neighbor that receives a NAK message checks the loss window field. If
the missing segment is found in its data cache, the neighboring node schedules a
reply event (sending the missing segment) at a random time between (Tr/4, Tr/2).
Neighbors will cancel this event whenever a reply to the same NAK for the same
segment is overheard. In the case where the loss window in a NAK message contains
more than one segment to be resent, or more than one loss window exists in the NAK
message, then neighboring nodes that are capable of recovering missing segments
will schedule their reply events such that packets are sent in-sequence at a speed
that is not faster than once every Tr/4.
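Loss aggregation as described above amounts to grouping missing sequence numbers into contiguous windows so that one NAK can request them all. A minimal sketch (our naming, not the PSFQ packet format):

```python
# Illustrative sketch of loss aggregation: missing fragments are grouped into
# contiguous "loss windows" so a single NAK can request all of them.

def loss_windows(received_seqs, highest_seq):
    """Return a list of (start, end) windows of missing sequence numbers,
    given the fragments received so far and the highest sequence seen."""
    have = set(received_seqs)
    windows, start = [], None
    for s in range(highest_seq + 1):
        if s not in have and start is None:
            start = s                      # a gap begins
        elif s in have and start is not None:
            windows.append((start, s - 1))  # a gap ends
            start = None
    if start is not None:                   # gap runs to the highest sequence
        windows.append((start, highest_seq))
    return windows

print(loss_windows([0, 1, 4, 5, 9], 9))  # -> [(2, 3), (6, 8)]
```

As long as one neighbor hears a NAK carrying these windows, it can resend any of the missing segments it holds, which is what makes aggregation robust to bursty loss.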
2.3.2.3 Proactive Fetch
As in many negative acknowledgment systems the fetch operation described above
is a reactive loss recovery scheme in the sense that a loss is detected only when a
packet with a higher sequence number is received. This could cause problems on
rare occasions; for example, if the last segment of a file is lost there is no way for
the receiving node to detect this loss since no packet with a higher sequence number
will be sent. In addition, if the file to be injected into the network is small (e.g., a
script instead of binary code), it is not unusual to lose all subsequent segments up
to the last segment following a bursty loss. In this case, the loss is also undetectable
and thus non-recoverable with such a reactive loss detection scheme. In order to
cope with these problems PSFQ supports a timer-based proactive fetch operation
such that a node can also enter the fetch mode proactively and send a NAK message
for the next segment or the remaining segments if the last segment has not been
received and no new packet is delivered after a period of time Tpro.
The proactive fetch mechanism is designed to automatically trigger the fetch
mode at the proper time. If the fetch mode is triggered too early, then the extra
control messaging might be wasted since upstream nodes may still be relaying mes-
sages or they may not have received the necessary segments. In contrast, if the fetch
mode is triggered too late, then the target node might waste too much time waiting
for the last segment of a file, significantly increasing the overall delivery latency of
a file transfer. The correct choice of Tpro must consider these two cases. In our
reference application, where each segment of a file needs to be kept in a data cache
or external storage for the re-tasking operation, the proactive fetch mechanism will
“Nak” for all the remaining segments up to the last segment if the last segment
has not been received and no new packet arrives after a period of time Tpro. Tpro
should be proportional to the difference between the last highest sequence number
(Slast) packet received and the largest sequence number (Smax) of the file (the dif-
ference is equal to the number of remaining segments associated with the file), i.e.,
Tpro = α × (Smax − Slast) × Tmax (α ≥ 1). α is a scaling factor to adjust the delay
in triggering the proactive fetch and should be set to 1 for most operational cases.
This definition of Tpro guarantees that a node will wait long enough until all
upstream nodes have received all segments before it moves into the proactive
fetch mode. This enables a node to start the proactive fetch earlier when it is closer
to the end of a file, and wait longer when it is further from completion. Such an
approach adapts nicely to the quality of the radio environment. If the channel is in a
good condition, then it is unlikely to experience successive packet loss; therefore, the
reason for the reception of no new messages prior to the anticipated last segment is
most likely due to the loss of the last segment, hence, it is wise to start the proactive
fetch promptly. In contrast, a node is likely to suffer from successive packet loss
when the channel is error-prone; therefore, it makes sense to wait longer before
pumping more control messages into the channel. If the sensor network is known
to be deployed in a harsh radio environment then α should be set larger than 1 so
that a node waits longer before starting the proactive fetch operation.
In other applications where the data cache size is small and nodes can keep only
a portion of the segments that have been received, the proactive fetch mechanism
will “Nak” for at most the number of segments that the data cache can
maintain. In this case, Tpro should be proportional to the size of the data cache. If
the data cache keeps n segments, then Tpro = α × n × Tmax(α ≥ 1). As discussed
previously, α should be set to 1 in low error environments and to a larger value in
harsher radio environments. This approach keeps the sequence number gap at any
node smaller than n, i.e., it makes sure that a node will fetch proactively after n
successive missing segments. Recall that a node waits at most Tmax before relaying
a message in the pump operation so that the probability of finding missing segments
in the data cache of upstream nodes is maximized.
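The two proactive-fetch timer variants above can be written directly from their formulas; the sketch below (our function names) is purely illustrative:

```python
# Sketch of the proactive-fetch timer Tpro from the text, in its two variants:
# a full data cache (re-tasking) and a cache limited to n segments.

def t_pro_full_cache(s_max: int, s_last: int, t_max: float, alpha: float = 1.0):
    """Tpro = alpha * (Smax - Slast) * Tmax, with alpha >= 1."""
    assert alpha >= 1.0
    return alpha * (s_max - s_last) * t_max

def t_pro_limited_cache(cache_segments: int, t_max: float, alpha: float = 1.0):
    """Tpro = alpha * n * Tmax for a data cache holding n segments."""
    assert alpha >= 1.0
    return alpha * cache_segments * t_max

# Closer to the end of the file -> shorter wait before proactively fetching.
print(t_pro_full_cache(s_max=100, s_last=99, t_max=0.5))  # 0.5
print(t_pro_full_cache(s_max=100, s_last=50, t_max=0.5))  # 25.0
```

Setting alpha larger than 1, as the text recommends for harsh radio environments, lengthens the wait uniformly in both variants.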
The proactive fetch operation ensures that all intended receivers eventually
receive all of the data. However, like any protocol that retries up to a maximum
number of times before giving up, PSFQ's proactive fetch can stop after reaching a
retry threshold, which is an application-specific choice.
2.3.2.4 Signal Strength based Fetch
Recent studies [56] [41] [43] show that in sensor networks that use low-power radios
without frequency diversity, there exists very high variability in the packet delivery
performance that is both spatially and temporally dependent. Because intermittent
packet reception from nodes that are more than a single hop away (however weak the
signal) can cause nodes to send unnecessary NAK messages and retransmissions,
PSFQ also takes into consideration the received signal strength of a packet during
the fetch and repair operations. A node maintains a table of parent nodes (i.e.,
nodes from which it receives messages) with their associated average signal quality
measurements. When a node detects a gap in the sequence number upon receiving
a packet, it responds and sends out a NAK only if this packet comes from the parent
with the strongest average signal quality measurement. This effectively suppresses
unnecessary NAK messages triggered by the reception of packets that come from
nodes that are multiple hops away.
Similarly, when a node transmits a NAK message it includes the preferred parent
with the strongest average signal in the message. Nodes that receive this NAK will
determine if they are the preferred parent/neighbor. All non-preferred neighbors
double their response time delay in sending repair packets so that they have a greater
chance of hearing the repair packet from a better candidate node (i.e., preferred
parent/neighbor), allowing the node to cancel a repair whenever a response is heard
before sending. This approach prevents nodes from sending redundant retransmissions
when they do not have a good chance of delivering these messages to a fetching
node.
2.3.3 Report Operation
PSFQ supports an optional report operation designed specifically to feed back data
delivery status to users in a simple and scalable manner. In wireless communication
it is well known that the communication cost of sending a long message is less
than sending the same amount of data using many shorter messages [59]. Given
the potentially large number of target nodes in a sensor network in addition to
potentially long paths (i.e., longer multi-hop paths greatly increase the
delivery cost of data), the network would become overwhelmed if each node sent
feedback in the form of report messages. Therefore, there is a need to minimize the
number of messages used for feedback purposes.
A node enters the report mode when it receives a data message with the report bit
set in the message header. The user node sets the report bit in its injected message
whenever it needs to know the latest status of the surrounding nodes. To reduce the
number of report messages and to avoid report implosion, only the last-hop nodes
(i.e., those at which TTL = 1) respond immediately, each initiating a report message
and sending it to its parent node (the node from which the previous segment came)
at a random time between (0, ∆). Each node along the path toward the source node will piggyback its report
message by adding its own status information into the report, and then propagate
the aggregated report toward the user node. To avoid looping, a node will ignore a
report in which it finds its own ID. Nodes that are not last-hop nodes
but are in report mode will wait for a period of time (Treport = Tmax × TTL + ∆)
sufficient to receive a report message from a last hop node, enabling it to piggyback
its state information. A node that has not received a report message after Treport in
the report mode will initiate its own report message and send it to its parent node.
If the network is very large then it is possible for a node to receive a report message
that has no space to append more state information. In this case, a node will create
a new report message and send it prior to relaying the previously received report
that had no space remaining to piggyback its state information. This ensures that
other nodes en-route toward the user node will use the newer report message rather
than creating new reports because they themselves received the original report with
no space for piggybacking additional status.
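The report timer and the piggybacking rules above can be sketched as follows (names and the report representation are ours, chosen only to illustrate the logic):

```python
# Sketch of report aggregation: a node waits Treport = Tmax * TTL + delta for a
# downstream report to piggyback on; a full or looping report is handled
# by starting a fresh one. A report is modeled as a list of (node_id, status).

def t_report(t_max: float, ttl: int, delta: float) -> float:
    """Waiting time sufficient to receive a report from a last-hop node."""
    return t_max * ttl + delta

def piggyback(report, node_id, status, max_entries):
    """Append this node's status. Returns (report_to_relay, new_report):
    new_report is None unless the incoming report had no space left."""
    if any(nid == node_id for nid, _ in report):
        return report, None                   # own ID present: ignore (loop)
    if len(report) >= max_entries:
        return report, [(node_id, status)]    # full: relay old, start new one
    return report + [(node_id, status)], None

r, new = piggyback([(1, "ok"), (2, "ok")], 3, "ok", max_entries=4)
print(r)  # [(1, 'ok'), (2, 'ok'), (3, 'ok')]
```

The key property is that each hop adds one entry to an existing message instead of originating its own, so the number of report packets stays small even on long paths.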
2.3.4 Single-Packet Message Delivery
There is a need to support the reliable delivery of single-shot atomic messages in
sensor networks, for example, in support of reliable control and management of
sensors. For messages that fit into a single packet (e.g., smaller than the network
MTU) delivery failure is undetectable using PSFQ’s NAK-based protocol without
the addition of explicit signaling. This is because PSFQ detects loss by observing
sequence number gaps or timeouts. To address this service need PSFQ makes use
of its reporting primitive to acquire application-specific feedback at the sink. PSFQ
sets the report bit at the sink in every single-packet message that requires reliable
delivery. Based on the feedback status the sink resends the packet until all receivers
confirm reception. This essentially turns PSFQ into a positive aggregated-ACK
protocol used in an on-demand manner by the sink for these special case messages.
The use of the report mechanism to support reliable data delivery of single-shot
atomic messages highlights the flexible use of PSFQ mechanisms to meet application
specific needs.
2.4. Performance Evaluation
We use packet-level simulation to study the performance of PSFQ in relation to
several evaluation metrics and discuss the benefits of some of our design choices.
Simulation results indicate that PSFQ is capable of delivering reliable data in wire-
less sensor networks even under highly error prone conditions.
2.4.1 Simulation Approach
We implement PSFQ as part of our reference re-tasking application using the ns-
2 network simulator [60]. In order to highlight the different design choices made
we compare the performance of PSFQ to an idealized version of Scalable Reliable
Multicast (SRM) [55], which has some similar properties to PSFQ, but is designed
to support reliable multicast services in IP networks. While there is a growing
body of work in multicast [61] [62] in mobile ad hoc networks and some initial work
on reliable multicast support [63] [64], we have chosen SRM as the best possible
candidate that is well understood in the literature. SRM supports reliable multicast
on top of IP and uses three control messages for reliable delivery, including session,
request and repair messaging. Because SRM is designed to operate on top of an IP
multicasting substrate, it assumes an environment where there is a single path from
a source to an individual receiver, and each node receives each multicast packet at
most once. SRM is also intended for a topology where routers are not active members
of the group and do not maintain state, except for that needed for multicast routing.
SRM represents a scheme that uses explicit signaling for reliable data delivery while
PSFQ is a more minimalist transport that can be unicast (on top of routing) or
broadcast, and does not require periodic signaling.
We compare PSFQ with the loss detection/recovery approach of SRM but extract
out the IP multicast substrate and replace it with an idealized omniscient multicast
routing scheme. We therefore only compare the reliable delivery portions of the
SRM and PSFQ protocols. Since PSFQ uses a simple broadcast mechanism as a
means for routing in our reference application, it makes sense to layer SRM over an
ideal omniscient multicast routing layer for simulation purposes. Using omniscient
multicast, the source transmits its data along the shortest-path multicast tree to all
intended receivers in which the shortest path computation and the tree construction
to every destination is free in terms of communication cost.
The major purpose of our comparison is to highlight the impact of different
design choices made. SRM represents a traditional receiver-based reliable transport
solution and is designed to be highly scalable for Internet applications. The SRM
service model has the closest resemblance to our reference application in sensor
networks. However, SRM is designed to operate in the wired Internet in which the
transport medium is highly reliable and does not suffer from the unique problems
found in wireless sensor networks, such as hidden terminals and interference. To
make a fair comparison, we idealize the lower layer for simulation purposes to
minimize the differences in the transport medium (for which SRM was designed),
and focus solely on the reliable data delivery mechanism; we term this idealized
SRM scheme SRM-I.
The goal of our evaluation is also to justify the design choices of PSFQ. We choose
three metrics that underpin the major motivation behind the design of PSFQ:
• Average Delivery Ratio, which measures the ratio of the number of messages
a target node received to the number of messages a user node injects into the
network. This metric indicates the error tolerance of a scheme, i.e., the point
at which a scheme fails to deliver 100% of the messages injected by a user node
within certain time limits.
• Average Latency, which measures the average time elapsed between the trans-
mission of the first data packet from the user node until the reception of the
last packet by the last target node in the sensor network. This metric examines
the delay bound performance of a scheme.
• Average Delivery Overhead, which measures the total number of messages
sent per data message received by a target node. This metric examines the
communication cost to achieve reliable delivery over the network.
We study these metrics as a function of the channel error rate as well as the
network size.
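For concreteness, the three metrics above could be computed from simple run logs as sketched below. This is an illustrative helper, not code from our simulator, and the function name and signature are assumptions:

```python
def delivery_metrics(injected, received, first_tx, last_rx, transmissions):
    """Compute the three evaluation metrics from one simulation run.

    injected       -- number of messages the user node injected
    received       -- list of per-target-node received message counts
    first_tx       -- time of the first data packet sent by the user node
    last_rx        -- time the last target node received the last packet
    transmissions  -- total messages sent network-wide
    """
    total_received = sum(received)
    # Average delivery ratio: messages received vs. messages injected,
    # averaged over all target nodes.
    ratio = total_received / (injected * len(received))
    # Average latency for this run (averaged over runs in the experiments).
    latency = last_rx - first_tx
    # Average delivery overhead: messages sent per data message received.
    overhead = transmissions / total_received
    return ratio, latency, overhead
```

For example, 50 injected packets, three targets receiving 50, 50, and 45 packets, and 290 total transmissions give a delivery ratio of about 0.97 and an overhead of 2.0 transmissions per received packet.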
Figure 2-3: Sensor network in a building. A user node at location 0 injects 50 packets into the network within 0.5 seconds.
To evaluate PSFQ in a realistic scenario, we simulate the re-tasking of a simple
sensor network in a disaster recovery scenario where the sensor nodes are deployed
along the hallway on each floor of a building. Figure 2-3 shows such a simple sensor
network in a space of dimensions 100 × 100 m2. Each sensor node is located 20
m from each other. Nodes use radios with 2 Mbps bandwidth with nominal radio
range of 25 m. The channel access is the simple CSMA/CA and we use a uniformly
distributed channel error model. A user node at location 0 attempts to inject a
program image file of size equal to 2.5 KB into every node on the floor for the
purposes of re-tasking. The packet size is 50 bytes. Packets are generated from
the user node and transmitted at a rate of one packet every 10 ms. For PSFQ, the
timer parameters are set conservatively to follow the PSFQ paradigm: Tmax is 100
ms, Tmin is 50 ms, and Tr is 20 ms. Therefore, the fetch operation is 5 times faster
than pump operation. Each experiment is run 10 times and the results shown are
an average of these runs.
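The derived quantities of this scenario can be checked with a short sketch (the values are taken from the text; the variable names are ours): the 2.5 KB image splits into 50 packets, injection takes 0.5 seconds, and the fetch timer is 5 times faster than the pump timer.

```python
FILE_SIZE   = 2500   # bytes: the 2.5 KB program image
PACKET_SIZE = 50     # bytes per packet
SEND_PERIOD = 0.010  # seconds between packets at the user node
T_MAX, T_MIN, T_R = 0.100, 0.050, 0.020  # PSFQ timers, in seconds

num_packets = FILE_SIZE // PACKET_SIZE   # 50 packets in the image
inject_time = num_packets * SEND_PERIOD  # 0.5 s to inject the whole image
fetch_speedup = T_MAX / T_R              # fetch runs 5x faster than pump
```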
2.4.2 Simulation Results
One of the major goals of PSFQ is to be able to work correctly under a wide variety
of wireless channel conditions. The first experiment examines the error tolerance of
PSFQ and SRM-I, and compares their results.
Figure 2-4: Error tolerance comparison - average delivery ratio as a function of the number of hops for SRM-I and PSFQ under different packet error rates (30%, 50%, and 70%).
In Figure 2-4, we present the results for PSFQ and SRM-I under various channel
error conditions as we increase the network size in terms of the number of hops. As
one might expect, the average delivery ratio of both schemes decreases as the channel
error rate increases. For larger error rates, the delivery ratio decreases rapidly when
the network size increases. Notice that the user node starts sending data packets
into the network at a constant rate of one packet every 10 ms at 2 seconds into
the simulation trace and finishes sending all 50 packets within 0.5 seconds. The
simulation ran for 100 seconds after the user node stopped sending data packets.
Observe from Figure 2-4 that SRM-I can achieve 100% delivery up to 13 hops away from
the source node only when the channel error rate is smaller than 30%. For 50% error
rate, the 100% delivery point decreases to within 5 hops; and for larger error rates,
SRM-I is only able to deliver a portion of the file up to two hops away from the user
node. In contrast, PSFQ achieves a much higher delivery ratio for all cases under
consideration for a wide range of channel error conditions. PSFQ achieves 100%
delivery up to 10 hops away from the user node even at 50% error rate and delivers
more than 90% of the packets up to 13 hops away. Even under extremely error-prone
conditions (a 70% channel error rate), PSFQ is still able to deliver 100% of the data up to
4 hops away and 70% of the packets up to 13 hops, while SRM-I can only deliver
less than 30% of data up to 2 hops.
The better error tolerance exhibited by PSFQ in comparison to SRM-I justifies
the design paradigm of pump slowly and fetch quickly for wireless sensor networks.
The in-sequence data pump operation prevents the propagation of loss events, as
discussed in Section 2.2.3. SRM-I does not attempt to provide ordered delivery of
data and loss events are propagated along the multicast tree. In contrast, PSFQ’s
aggressive fetch operation and loss aggregation techniques support multiple loss
windows in a single control message. One high-level design lesson here is the
ineffectiveness of control messages in high-loss rate scenarios. SRM relies on the
underlying MAC layer to reliably deliver explicit and periodic control messages
between members of a multicast group. The failure of the virtual carrier sense in
the IEEE 802.11 MAC under high-loss environments causes SRM-I to fail, whereas
PSFQ's minimalist approach enables it to perform efficient control broadcasting,
even under high-loss conditions.
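To illustrate the loss aggregation idea, the following hypothetical helper (not PSFQ's actual code; the name and representation are our own) derives the loss windows that a single NAK could carry from the set of sequence numbers received so far:

```python
def loss_windows(received, highest):
    """Aggregate gaps in the received sequence numbers into (start, end)
    windows, so a single NAK can report multiple loss windows at once.

    received -- set of sequence numbers received so far
    highest  -- highest sequence number seen
    """
    windows, start = [], None
    for seq in range(1, highest + 1):
        if seq not in received:
            if start is None:
                start = seq                   # open a new loss window
        elif start is not None:
            windows.append((start, seq - 1))  # close the current window
            start = None
    if start is not None:                     # gap runs to the end
        windows.append((start, highest))
    return windows
```

For instance, having received packets {1, 2, 5, 6, 9}, a node would request the two windows (3, 4) and (7, 8) in one NAK rather than four individual sequence numbers.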
Our second experiment examines the data delivery latency of both schemes under
various channel conditions. The results are shown in Figure 2-5. The delivery
latency is determined only when all the intended target nodes have received all of
the data packets before the simulation terminates. For SRM-I, we know that 100%
delivery can be achieved only within a limited number of hops when the error rate
is high. In this experiment, we compare the two schemes using a 3-hop network and
investigate PSFQ’s performance with a larger number of hops since PSFQ has better
Figure 2-5: Comparison of average latency as a function of channel error rate (SRM-I on a 3-hop network; PSFQ on 3-, 4-, and 5-hop networks).
error tolerance properties. Figure 2-5 shows that SRM-I has a smaller delay than
PSFQ when the error rate is smaller than 40%, but its delay grows exponentially
as the error rate increases, while PSFQ grows more slowly until it hits its error
tolerance barrier at 70% error rate. The reason that SRM-I performs better than
PSFQ in terms of delay in the lower error rate region is due to the “pump slowly”
mechanism, where each node delays a random period of time between Tmin and Tmax
before forwarding packets. Despite this small penalty in the lower error rate region
the coupling of this mechanism with the “fetch quickly” operation proves to be very
effective. As shown in Figure 2-5, PSFQ can provide delay assurances even at very
high error rates.
In the next experiment, we study the communication cost for reliability in both
schemes under various channel conditions using a 3-hop network, including a 16-
node (4 × 4) 3-hop grid network to explore PSFQ's performance in a dense network
Figure 2-6: Average delivery overhead (transmissions/packet) as a function of channel error rate, for SRM-I (total, and with MAC signaling excluded) and PSFQ (chain and grid topologies); vertical lines mark the 100% delivery limits of each scheme.
where nodes can have up to four neighbors. Communication cost is measured as the
average number of transmissions per data packet (i.e., average delivery overhead).
For SRM-I, we separate the communication cost of the SRM-specific loss recovery
mechanisms from the total communication cost, which includes the cost associated
with the link-layer loss recovery mechanisms (RTS/CTS/ACK). Figure 2-6 shows
that the cost for PSFQ is consistently smaller than SRM-I by several times even af-
ter excluding the link-layer cost of SRM-I. We can observe from Figure 2-6 that the
communication cost in a denser grid network closely matches but is lower than its
chain-network counterpart, indicating that PSFQ can exploit neighbor redundancy
while suppressing unnecessary redundant transmissions. Figure 2-6 also illustrates
the 100% delivery barrier of both schemes (the two vertical lines). The 52% error
rate mark shows the limit for SRM-I while the 70% error rate mark shows the oper-
ational boundary for PSFQ. The different performance observed under simulation
is rooted in the distinct design choices made for each protocol. PSFQ utilizes a
passive, on-demand loss recovery mechanism, whereas SRM employs a periodic
exchange of session messages for loss detection/recovery.
2.5. Experimental Testbed Results
In what follows, we discuss experiences implementing PSFQ in an experimental
wireless sensor testbed using the TinyOS platform [23] [19] and Rene2 motes [23].
The Rene2 sensor device has an ATMEL 4 MHz, low-power, 8-bit micro-controller
with 16 KB of program memory and 1 KB of data memory; a 32 KB EEPROM
serves as secondary storage. The radio is a single-channel RF transceiver operating
at 916 MHz and capable of transmitting at 10 kbps using on-off-keying encoding.
TinyOS [19] is an event-based operating system employing a CSMA MAC. The
packet size is 36 bytes. With a link speed of 10 kbps the channel can deliver at
most 16 packets per second. Tuning the transmission power can change the radio
transmission range of the motes.
We implement the PSFQ pump, fetch, and report operations as a component
of TinyOS that interfaces with the lower layer radio components. In the imple-
mentation, every data fragment that is received correctly is stored in the external
EEPROM at a predefined location based on its sequence number. The sequence
number is used as an index to locate and retrieve data segments when a node re-
ceives a NAK from its neighbors.
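The following sketch mimics this indexing scheme in plain Python, as an in-memory stand-in for the EEPROM; the class and method names are hypothetical, not taken from our TinyOS implementation:

```python
class FragmentStore:
    """Illustrative stand-in for the external EEPROM fragment store: each
    fragment is written at a slot derived from its sequence number, so a
    NAK's sequence number indexes straight to the repair data."""

    def __init__(self, fragment_size=36):
        self.fragment_size = fragment_size  # TinyOS packet size is 36 bytes
        self.slots = {}

    def slot_address(self, seq):
        # Predefined EEPROM location: sequence number times fragment size.
        return seq * self.fragment_size

    def store(self, seq, data):
        # Store a correctly received fragment at its predefined location.
        self.slots[self.slot_address(seq)] = data

    def fetch_for_nak(self, seq):
        # Retrieve the repair segment requested by a neighbor's NAK,
        # or None if this node never received that fragment.
        return self.slots.get(self.slot_address(seq))
```

The sequence-number-to-address mapping avoids any per-fragment lookup table, which matters on a device with 1 KB of data memory.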
2.5.1 PSFQ Parameter Space and Timer Bounds
Among the various PSFQ operations, the most aggressive timer is the fetch timer,
as defined in Section 2.3.2.2. A successful recovery after a sequence number gap has
been detected relies on two successful packet receptions being accomplished within
Figure 2-7: A 4-hop network physically arranged in a string/chain topology.
Tr (i.e., one for receiving the NAK at the neighbors and another for receiving
the repair packet at the fetching node). Since the transmission time of a single
packet is non-negligible in low bandwidth environments (approximately 67 ms
for the Rene2 mote), Tr should be long enough to accommodate at least two packet
transmissions. There exists a lower bound for Tr defined at the granularity of
the transmission time of a single packet; call this Tpkt. Recall that upon
receiving a NAK, a node schedules a repair to be sent at a random time in
[Tr/4, Tr/2]. Therefore, Tr must be long enough to wait for the largest delayed
repair from the neighborhood, to avoid unnecessary retransmission of NAK
messages: Tr ≥ Tr/2 + 2Tpkt, which gives Tr ≥ 4Tpkt. In reality, to avoid
using up all the available
channel bandwidth during fetch operations, we increase the lower bound by one
or two Tpkt times to allow at least one packet transmission for other operations or
applications, and to accommodate other possible processing delays. For example, a
reasonable bound is: Tr ≥ 6Tpkt and Tmax ≥ 5Tr ≥ 30Tpkt. These values are used in
all of our testbed experiments discussed in the remainder of this section.
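These bounds can be expressed as a small helper (illustrative only; the function name and the safety-margin parameter are our own): with Tpkt ≈ 67 ms for the Rene2 mote, the bounds work out to Tr ≥ 402 ms and Tmax ≥ 2010 ms.

```python
def timer_bounds(t_pkt, safety_pkts=2):
    """Conservative PSFQ timer lower bounds from the per-packet
    transmission time: Tr >= (4 + safety)*Tpkt and Tmax >= 5*Tr.

    t_pkt       -- transmission time of a single packet, in seconds
    safety_pkts -- extra packet times reserved for other traffic and
                   processing delays (the text uses one or two)
    """
    t_r = (4 + safety_pkts) * t_pkt  # fetch timer lower bound
    t_max = 5 * t_r                  # pump timer: 5x the fetch timer
    return t_r, t_max
```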
2.5.2 Messaging Overhead
In our experiments, we manipulate the radio transmission power of the motes to cre-
ate multi-hop networks such that motes that are separated by 5 inches can maintain
90% to 100% reception rates, while motes that are separated by 10 inches can hardly
hear each other (i.e., the reception rate is between 0% and 15%). Figure 2-7 shows
a 4-hop network in a string/chain topology in which each node is separated by 5
inches. Here we refer to a hop distance as the distance between nodes that can
maintain excellent communication (i.e., more than a 90% packet reception rate).
Figure 2-8: Breakdown of PSFQ messages. Average delivery overhead is 1.2 ± 0.13.
Our
test scenario sends a new execution image (i.e., image file of the TinyOS BLINK [19]
application segmented into 70 over-the-air packets) from the base station (BS) con-
nected to a PC to all the sensor nodes using PSFQ. When the base station confirms
the 100% reception of the image by all sensors (using the PSFQ report operation)
then it sends a single control message that propagates to all the sensor nodes to
initiate the process of transferring the new image from external EEPROM to the
internal Flash to complete the reprogramming of the application. Note that we use
PSFQ’s single packet reliable service to do this controlled application switchover at
sensors, as discussed in Section 2.3.4.
Figure 2-8 shows the result of our experiments in terms of communication over-
head with the breakdown of the PSFQ messages. Each data point in the figure
is an average of 10 independent experiments and the 95% confidence intervals are
all within 10% of the average value. The overall average delivery overhead is 1.2
transmissions per received packet.
Figure 2-9: Average delivery overhead as a function of network size and density (1x, 2x, and 3x density).
2.5.3 Network Size versus Network Density
In what follows, we examine the impact of the network density and the network
size on the performance of PSFQ in terms of delivery latency and average delivery
overhead.
Using the same test scenario described in Section 2.5.2, we measure both the
communication cost and delivery latency of PSFQ with various network sizes as
well as various node densities in our Rene2 testbed, in which motes are arranged in
a string/chain topology. Figure 2-9 and 2-10 present the results of these experiments.
Each data point is an average of 10 independent experiments and the corresponding
95% confidence intervals are plotted as y-axis error bars in the figures, respectively.
Figure 2-9 shows that the communication cost for reliable delivery increases
rapidly when the network size increases from 1 hop to multiple hops, but it also
levels off and stabilizes quickly for a network size of 4 to 5 hops. The reason for
Figure 2-10: Average delivery latency as a function of network size and density (1x, 2x, and 3x density).
the rapid rise of the communication cost is due to the well-known hidden terminals
problem in CSMA networks, which becomes evident only in multi-hop environments
and creates collisions that force packet drops. Nevertheless, PSFQ’s pump/fetch op-
erations can effectively prevent loss propagation along the distribution chain, and
therefore is able to maintain relatively low overhead (∼ 1.3 transmissions/packet)
as the network size increases. Interestingly, we can observe from Figure 2-9 that as
we increase the network density the communication cost actually decreases. This
indicates that PSFQ effectively suppresses redundant transmissions and takes ad-
vantage of overhearing transmissions from a dense neighborhood to reduce a node’s
transmissions, and hence, reduces the overall delivery overhead. Figure 2-10 shows
that the delivery latencies of PSFQ increase almost linearly with the network size
but they are rather independent of the network density, which indicates that PSFQ
can adapt well in a high-density environment.
2.6. Related Work
To the best of our knowledge, PSFQ represents the first reliable transport protocol
for sensor networks. In what follows, we contrast more recent contributions [65]
[66] [67] for reliable data delivery in sensor networks that followed the initial
publication of PSFQ [49].
RMST (Reliable Multi-Segment Transport) [65] is a transport layer paradigm
for sensor networks that is closest to our work. RMST is designed to complement
Directed Diffusion [8] by adding a reliable data transport service on top of it. RMST
is, like PSFQ, a NAK-based protocol with primarily timer-driven loss detection
and repair mechanisms. The authors analyze the tradeoff between hop-by-hop and
end-to-end repair and conclude that hop-by-hop recovery is essential, which is
consistent with our analysis and simulation results. In contrast to PSFQ, which
provides reliability purely at the transport layer, RMST involves both the transport
and MAC layers (e.g., SMAC [14]) to provide reliable delivery.
In ESRT [66] the authors propose using an event-to-sink reliability model in
providing reliable event detection that embeds a congestion control component. In
contrast to PSFQ, ESRT does not deal with data flows that require strict delivery
guarantees; rather, the authors define the “desired event reliability” as the num-
ber of data packets required for reliable event detection that is determined by the
application.
A sink-to-sensor reliability solution is presented in [67] that focuses on communi-
cation reliability from the sink to the sensors in a static network. The authors pro-
pose using a two-radio approach where each node is equipped with a low frequency
“busy-tone” radio in addition to the default radio that is used for data transmis-
sion and reception. The busy-tone radio is used to ensure delivery of single-packet
messages or the first packet of a longer message. A NAK-based recovery core is
constructed from the minimum dominating set of the underlying graph.
2.7. Conclusion
We have presented PSFQ, a reliable transport protocol for wireless sensor networks.
PSFQ is a lightweight, simple mechanism that is scalable and robust, making minimal
assumptions about the underlying transport infrastructure. Based on our ref-
erence application for remotely programming sensors over-the-air, we have discussed
a number of important design goals that underpin the protocol's development, in-
cluding the correct and efficient operation under high packet error rate conditions
and support for loose delay bounds for reliable data delivery. We evaluated PSFQ
using simulation and through implementation in an experimental motes testbed. We
found that PSFQ outperforms SRM-I in terms of error tolerance, communication
overhead, and delivery latency.
Our work in PSFQ has been widely recognized as the first contribution to the
problem of reliable delivery in sensor networks. Several important contributions
came out of this work [65] [67] [68]. First, we proposed hop-by-hop error recovery
in which intermediate nodes also take responsibility for loss detection and recovery,
so that reliable data exchange is done on a hop-by-hop basis rather than end-to-
end. Second, we analyzed a simplified model of our NAK-based algorithm and
determined a near-optimal ratio between the timers associated with the forwarding
(pump) and retransmission (fetch) operations. Third, PSFQ exhibits a novel multi-
modal communication property that provides a graceful tradeoff between packet
switching and store-and-forward paradigms, depending on the channel conditions
encountered.
While investigating the reliable transport issues, we analyzed the loss patterns
in our sensor network testbed and observed that significant loss is also due to con-
gestion over a wide range of workloads, including light and moderate traffic. This
observation led us to study congestion problems in sensor networks. While some
researchers have discussed congestion issues [35] in sensor networks there has been
no comprehensive approach to the problem proposed in the literature. In the next
chapter, we address this challenge and propose the first such general algorithmic
approach called CODA (COngestion Detection and Avoidance) that includes a low-
cost sampling scheme for congestion detection, a backpressure algorithm, and sink
to source regulation.
Chapter 3
Energy-Efficient Congestion Detection and
Avoidance in Sensor Networks
3.1. Introduction
Sensor networks come in a wide variety of forms, covering different geographical
areas, being sparsely or densely deployed, using devices with a variety of energy
constraints, and implementing an assortment of sensing applications. One applica-
tion driving the development of sensor networks is the reporting of conditions within
a region where the environment abruptly changes due to observed events, such as
target detection, earthquakes, floods, or fires, and in habitat monitoring. Sensor
networks may typically operate under light load, but can suddenly become active in
response to a detected event. These scenarios require data to be delivered through
the sensor network quickly to a relatively small number of physical sink points that
are attached to the regular communication infrastructure. Sensor networks exhibit
a unique funneling effect where events are generated en masse and then must be
quickly moved toward a sink point. The flow of events out of the network has
similarities to the flow of people from a large arena after sporting events complete.
This leads to a number of significant challenges including increased transit traffic
intensity, congestion, and packet loss (and therefore energy and bandwidth waste)
at nodes closer to the sink.
Depending on the application this can result in the generation of large, sudden,
and correlated impulses of data that must be delivered to a small number of sinks
without significantly disrupting the performance (i.e., fidelity) of the sensing appli-
cation. Although a sensor network may spend only a small fraction of time dealing
with impulses, it is during this time that the information it delivers is of greatest
importance.
The transport of event impulses is likely to lead to varying degrees of conges-
tion in sensor networks. In order to illustrate the congestion problem consider the
following simple but realistic simulation scenario. Figure 3-1 shows the impact of
congestion on data dissemination in a sensor network for a moderate number of
active sources with varying reporting rates. The ns-2 simulation results are for the
well-known Directed Diffusion scheme [11] operating in a moderately sized 30-node
sensor network using a 2 Mbps IEEE 802.11 MAC with 6 active sources and 3 sinks.
The 6 sources are randomly selected among the 30 nodes in the network and the
3 sinks are uniformly scattered across the sensor field. Each source generates data
event packets at a common fixed rate while the sinks subscribe (i.e., broadcast cor-
responding interest packets) to different sources at random times within the first 20
seconds of the 50-second simulation scenario. Event and interest packets are 64 and
36 bytes in size, respectively. The plot illustrates that as the source rate increases
beyond a certain network capacity threshold (10 events/s in this network), conges-
tion occurs more frequently and the total number of packets dropped per received
data packet at the sink increases rapidly. The plot shows that even with low to
moderate source event rates there is a large drop rate observed across the sensor
network. For example, with a source event rate of 20 events/s in the network one
Figure 3-1: Total number of packets dropped by the sensor network per data event
packet delivered at the sink (drop rate) as a function of the source rate. The x axis
is plotted in log scale to highlight data points with low reporting rates. All packets
dropped during the 50-second simulation session are counted as part of the drop
rate, including the MAC signaling (e.g., RTS/CTS/ACK and ARP), data event, and
diffusion messaging packets.
packet is dropped across the sensor field for every data event packet received at the
sink. Dropped packets can include MAC signaling, data event packets themselves,
and the diffusion messaging packets. The drop rates shown in Figure 3-1 repre-
sent not only significant packet losses in the sensor network, but more importantly,
energy wasted by the sensing application.
Depending on the type of sensing application the rate of event impulses may vary
in frequency. Some applications may only generate light traffic from small regions
of the sensor network (e.g., target detection) while others (e.g., fires, earthquakes
detection) may generate large waves of impulses, potentially across the whole sens-
ing area (which causes high loss, as shown in Figure 3-1). In traditional computer
networks, throughput and delay are two important performance metrics that im-
pact the users’ experience. Therefore, the objective function for control mechanisms
adopted to control the traffic is often defined as maximizing the ratio of throughput
to delay [69], i.e., the power. However, in the context of sensor networks, because
of their limited resources and application-specific nature, we observe that maximiz-
ing this ratio does not necessarily result in the optimal performance. Rather, the
objective of sensor networks is to maximize the operational lifetime while delivering
acceptable data fidelity to the applications.
In response to this, future congestion control mechanisms for sensor networks
must be capable of balancing the offered load, while attempting to maintain accept-
able fidelity (e.g., rate of events) of the delivered signal at the sink during periods
of transient and more persistent congestion. A number of distinct congestion sce-
narios are likely to arise. First, densely deployed sensors generating impulse data
events will create persistent hotspots proportional to the impulse rate beginning
at a location very close to the sources (e.g., within one or two hops). In this sce-
nario, localized, fast time scale mechanisms capable of providing backpressure from
the points of congestion back to the sources could be effective. Second, sparsely
deployed sensors generating low data rate events will create transient hotspots po-
tentially anywhere in the sensor field but likely farther from the sources, toward the
sink. In this case, fast time scale resolution of localized hotspots using a combi-
nation of localized backpressure (between nodes identified in a hotspot region) and
rate limiting techniques could be more effective. Because of the transient nature of
congestion, source nodes may not be involved in the backpressure. Third, sparsely
deployed sensors generating high data-rate events will create both transient and
persistent hotspots distributed throughout the sensor field. In this final scenario, a
combination of fast time scale actions to resolve localized transient hotspots, and
closed loop rate regulation of all sources that contribute toward creating persistent
congestion could be effective.
In this chapter, we propose an energy efficient congestion control scheme for
sensor networks called CODA (COngestion Detection and Avoidance) that comprises
three mechanisms:
• Congestion detection. Accurate and efficient congestion detection plays an
important role in the congestion control of wireless networks. CODA uses
a combination of the present and past channel loading conditions, and the
current buffer occupancy, to infer accurate detection of congestion at each
receiver with low cost. Sensor networks must know the state of the channel
since the transmission medium is shared and may be congested with traffic
between other nodes in the neighborhood. Listening to the channel to measure
local loading incurs high energy costs, if performed all the time. Therefore,
CODA uses a sampling scheme that activates local channel monitoring at the
appropriate time to minimize cost while forming an accurate estimate. Once
congestion is detected, nodes signal their upstream neighbors via a backpres-
sure mechanism that is discussed next.
• Open-loop, hop-by-hop backpressure. In CODA a node broadcasts backpressure
messages as long as it detects congestion. Backpressure signals are propagated
upstream toward the source. In the case of impulse data events in dense net-
works it is very likely that backpressure will propagate directly to the sources.
Nodes that receive backpressure signals can throttle their sending rates based
on the local congestion policy (e.g., silence for a random time or AIMD, etc.).
When an upstream node (toward the source) receives a backpressure message
it decides whether or not to further propagate the backpressure upstream,
based on its own local measured network conditions.
• Closed-loop, multi-source regulation. In CODA, closed-loop regulation oper-
ates over a slower time scale and is capable of asserting congestion control
over multiple sources from a single sink in the event of persistent congestion.
When a source event rate is less than some fraction of the maximum theoret-
ical throughput of the channel, the source regulates itself. When this value
is exceeded, however, a source is more likely to contribute to congestion, and
therefore, closed-loop congestion control is triggered. The source only enters
sink regulation if this threshold is exceeded. At this point a source requires
constant, slow time-scale feedback (e.g., ACK) from the sink to maintain its
rate. The reception of ACKs at sources serves as a self-clocking mechanism
allowing sources to maintain their current event rates. In contrast, failure to
receive ACKs forces a source to reduce its own rate. Once a source has de-
termined congestion has passed it takes itself out of sink regulation under its
own direction.
The chapter is organized as follows. Section 3.2 discusses a number of important
design considerations for mitigating congestion in sensor networks, including MAC
and congestion detection issues. Section 3.3 details CODA's backpressure and rate
regulation mechanisms. Following this, an implementation of CODA is evaluated in
an experimental sensor testbed in Section 3.4. We define three important perfor-
mance metrics (i.e., energy tax, fidelity penalty, and power) to evaluate the impact
of CODA on the performance of sensing applications. Because CODA is designed
to interwork with existing data dissemination schemes, we also evaluate it using
one well-known dissemination mechanism. Section 3.5 presents our performance
evaluation of CODA working with Directed Diffusion [11] using the ns-2 simula-
tor. Section 3.6 presents the related work. Finally, some concluding remarks are
discussed in Section 3.7.
3.2. Design Considerations
In what follows, we discuss the technical considerations that underpin the design of
CODA; the detailed design itself is presented in Section 3.3.
The medium access control plays a significant role in the performance of man-
aging impulses of data in a wireless shared medium, including the detection of
congestion. A growing number of sensor networks use CSMA or variants for the
medium access control. For example, the widely used Berkeley motes [23] use a sim-
ple CSMA MAC as part of the TinyOS [19] platform. In [14] the authors proposed
a modified version of CSMA called S-MAC, which combines TDMA scheduling with
CSMA’s contention-based medium access, without a strict requirement for time syn-
chronization. S-MAC uses virtual carrier sense to avoid hidden-terminal problems,
allowing nodes other than the sender and receiver to enter sleep mode (during the
NAV after the RTS/CTS exchange), thus saving energy. A collision-minimizing
CSMA MAC is proposed in [70] that is optimized for event-driven sensor networks.
The authors propose to utilize a non-uniform probability distribution for nodes to
randomly select contention slots such that collisions between contending stations
are minimized.
There is a growing effort to design suitable TDMA schemes for sensor networks
where energy can be conserved by turning off nodes periodically. Congestion can
still occur when using TDMA or other schedule-based schemes when the incoming
traffic exceeds the node capacity and the queue overflows. Because TDMA and
other schedule-based access schemes (e.g., [71] [72]) can control and schedule traffic
flows in the network to provide collision-free communication, the impact of con-
gestion is less severe and the congestion control mechanism is simpler compared
to a CSMA MAC. For example, congestion can be reliably detected by monitoring
the queue size at each node (i.e., when buffer overflows), eliminating the need for
the new congestion detection schemes proposed in the rest of this section. Never-
theless, the new objective function for congestion control in sensor networks (as
discussed in the previous section) demands new feedback control mechanisms even
for TDMA/schedule-based networks. These new control mechanisms are discussed
in detail in Sections 3.3.1 and 3.3.2 and can be used seamlessly on both contention-
based and schedule-based networks.
A number of considerations shape the design of CODA. In what follows, we
discuss the MAC and congestion detection considerations with a focus on CSMA
and other contention-based schemes.
3.2.1 CSMA Considerations
3.2.1.1 Throughput Issues
The theoretical maximum throughput (channel utilization) for the CSMA scheme
is approximately [73]:

Smax ≈ 1 / (1 + 2√β)   (for β ≪ 1),   (3.1)

where

β = τC / L.   (3.2)
The performance of CSMA is highly dependent on the value of β, which is a measure
of radio propagation delay and channel idle detection delay. τ is the sum of both
radio propagation delay and channel idle detection delay in seconds, C is the raw
channel bit rate and L is the expected number of bits in a data packet. If nodes can
detect idle periods quickly, in other words have a very small β value, then CSMA
can offer very good channel utilization regardless of the offered load.
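As a numerical illustration of Equations (3.1) and (3.2), the bound can be evaluated directly; the radio parameters below are hypothetical, chosen only to show the shape of the relationship.

```python
import math

def csma_max_throughput(tau_s, bit_rate_bps, packet_bits):
    """Approximate CSMA utilization bound S_max = 1 / (1 + 2 * sqrt(beta)),
    with beta = tau * C / L (Equations 3.1 and 3.2); valid for beta << 1."""
    beta = (tau_s * bit_rate_bps) / packet_bits
    return 1.0 / (1.0 + 2.0 * math.sqrt(beta))

# Hypothetical radio: 0.1 ms combined propagation + idle-detection delay,
# a 40 kbps channel, and 36-byte (288-bit) packets.
s_max = csma_max_throughput(0.0001, 40_000, 288)  # beta ~ 0.014, S_max ~ 0.81
```

Shrinking the idle-detection delay (and hence β) pushes the bound toward full utilization, which is why β dominates CSMA performance.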
Equation (3.1) gives the channel capacity of CSMA within one hop. In [74] the
authors show that an ideal ad hoc multihop forwarding chain should be able to
achieve 25% of the throughput that a single-hop transmission can achieve. This
observation has important implications in the design of our congestion detection
and closed-loop regulation mechanisms, as discussed in Section 3.2.2 and Section
3.3.2, respectively.
3.2.1.2 Hidden Terminals
CSMA suffers from the well-known hidden terminal problem in multihop environ-
ments. IEEE 802.11 utilizes virtual carrier sense (VC), namely an RTS/CTS ex-
change, to eliminate hidden terminals. In order to reduce the signaling overhead
incurred by adding VC, IEEE 802.11 does not exchange RTS/CTS for small pack-
ets. In sensor networks, packets are usually small in nature (i.e., on the order of few
tens of bytes) because of the low duty cycle requirement and traffic characteristics
[5]. Therefore, the signaling cost is high if the RTS/CTS exchange is used for ev-
ery message. Furthermore, sensor nodes have a limited energy budget making the
energy cost of doing this prohibitively high.
Usually, nodes other than event source nodes and the forwarding nodes will
be silent most of the time. Therefore, loss due to hidden terminals rarely occurs
when the workload of the network is low. In [75], the authors show that in general,
when nodes are nicely randomized and coupled with appropriate delay in send-
ing/forwarding packets, the probability of having hidden terminals is low even in
dense networks. In S-MAC [14], an RTS/CTS exchange is used in an aggregated
manner (i.e., not for every single packet) to reduce the energy cost.
In the context of sensor networks, the VC scheme is costly and mostly unneces-
sary during normal operation. There is a need, however, to devise a scheme that can
work satisfactorily with or without the VC for collision avoidance, that incurs low
cost or no cost during normal operations, and yet is responsive enough to quickly
resolve congestion1. In Section 3.2.2, we discuss such a scheme.
3.2.1.3 Link-layer ARQ
In the IEEE 802.11 MAC, a packet will be kept in the sending buffer until an
ACK is received or the number of retransmissions exceeds a certain threshold. This
mechanism increases the link reliability at the expense of energy and buffer space.
However, both of these resources are scarce in sensor nodes where support for re-
liability may not always be necessary under normal operations (i.e., due to the
application-specific nature of sensor networks not all data packets require strict
reliability2). Today different sensor platforms utilize different radio technologies;
some radios support low-overhead synchronous ACK [45] (e.g., the RFM radio used
in Mica [24]) and some radios include built-in link-layer ACK supporting higher
data rates up to 250 Kbps (e.g., the IEEE 802.15.4 radio used in Telos [19]), while
in others supporting ACK could be costly (e.g., the Chipcon radio used in Mica2
[45]) in terms of energy and bandwidth consumption.
We believe there is a need for a separation between reliability and congestion
control in the design of sensor networks protocols. The use of VC and link-layer ARQ
as a reliable means of communication is essential for critical information exchange
(e.g., routing signaling), but they are not necessarily relevant during congestion. In
sensor networks, energy expenditure is more important than occasional data loss
because of the natural redundancy inherent in disseminated sensor data. The main
objective function is therefore to minimize energy expenditure. This is in contrast
1 Depending on the sensing applications and the radio technologies, a user might choose to omit the VC for data packets but retain it for critical signaling messages (e.g., control packets for a routing protocol) in order to reduce overhead.
2 For example, applications that generate periodic workload can often reasonably assume that subsequent reports will supersede any lost data.
to TCP where the lost data is always recovered. In our design, congestion control
elements do not explicitly look at loss (unlike TCP), allowing CODA to decouple
reliability from other control mechanisms. CODA is therefore capable of working
with or without reliability elements, such as link-layer ARQ, depending on the
application’s needs and the radio technology used in the sensor platform.
3.2.2 Congestion Detection
Accurate and efficient congestion detection plays an important role in congestion
control of sensor networks. There is a need for new congestion detection techniques
that incur low cost in terms of energy and computation complexity. Several tech-
niques are possible.
3.2.2.1 Buffer Queue Length
Queue management is often used in traditional data networks for congestion detec-
tion. However, without link-layer ACK (some applications might not require this
and hence would omit it to save the overhead, as discussed above), buffer occu-
pancy or queue length cannot be used as a reliable indication of congestion. To
illustrate this, we perform an ns-2 simulation of the simple IEEE 802.11 wireless
5-node network shown in Figure 3-2. In the simulation, nodes 1 and 4 each start
sending (1 second apart in simulation time) CBR traffic that consumes 50% of the
channel capacity through node 2 to node 3 and 5, respectively. One of the sources
stops sending data after 10 seconds. We ran two simulation trials, one with the VC
enabled (including link ARQ), the other with it disabled and no link ARQ.
Figure 3-3 shows the time series traces for both channel loading and buffer
occupancy as well as the packet delivery ratio measured at the intermediate node 2.
It is clear from the plot that the channel loading almost immediately rises to 90%
Figure 3-2: A simple IEEE 802.11 wireless network of 5 nodes illustrates receiver-based congestion detection.
during the time both sources are on. Congestion occurs and the packet delivery
ratio drops from 100% to around 20% during this period. Note that the buffer
occupancy grows at a slower rate during this congestion period, particularly in
the trace corresponding to the simulation where the VC is disabled. The buffer
occupancy (without link ACK) even drops at around 5 seconds into the simulation,
which provides false information about the congestion state. This is because without
the link-layer ACK, the clearing of the queue at the transmitter does not mean that
congestion is alleviated since packets that leave the queue might fail to reach the
next hop as a result of collisions. Note that CSMA does not guarantee collision-free
transmissions among neighboring nodes because of the detection delay [73].
This simple simulation shows that the buffer occupancy alone does not provide
an accurate and timely indication of congestion even when the link ARQ is enabled,
except in the extreme case when the queue is empty or about to overflow. The first
case indicates good traffic conditions and the latter one signals serious congestion.
As shown in the figure, the queue takes a much longer time to grow beyond a
Figure 3-3: Channel load and buffer occupancy time series traces with and without virtual carrier sense (VC) + link-layer ACK, and packet delivery trace with VC.
high watermark level (e.g., 0.8) that signifies congestion compared to the channel
load. We argue that this bimodal effect and detection latency is not responsive
enough and too coarse to provide accurate, timely and efficient congestion control,
especially in the case of event-driven sensor networks where short-lived hotspots
are likely to occur across different time-scales. Therefore, we propose augmenting
buffer monitoring with channel load measurement for fast and reliable congestion
detection in sensor networks.
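A minimal sketch of the combined detector argued for here (the function name and threshold values are our own illustrative choices, not constants from CODA's implementation):

```python
def congestion_detected(channel_load, queue_len, queue_cap,
                        load_threshold=0.7, queue_watermark=0.9):
    """Flag congestion when either signal fires: the measured channel load
    nears the utilization bound (fast, fine-grained indication), or the
    buffer nears overflow (slow, bimodal indication)."""
    return (channel_load >= load_threshold
            or queue_len >= queue_watermark * queue_cap)
```

Either condition alone misses cases the simulation above exposes; together they cover both the collision-dominated and the overflow-dominated regimes.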
3.2.2.2 Channel Loading
In CSMA networks, it is straightforward for sensors to listen to the channel, trace the
channel busy time, and calculate the local channel loading conditions. Since Smax in
Equation (3.1) gives the optimal utilization of the channel when β is determined, if
one senses that the channel loading reaches a certain fraction of the channel capacity,
this would indicate a high probability of collision [74].
Listening to the channel consumes a significant portion of energy [13] in a node.
Figure 3-4: Queueing performance (average queue size in packets vs. channel utilization) of a real sensor network of Mica motes.
Therefore, performing this operation all of the time is not practical in sensor net-
works. In Section 3.3.1, we propose a sampling scheme that activates local channel
monitoring only at the appropriate time to minimize the energy cost while forming
an accurate estimate of conditions.
Channel loading and buffer occupancy give accurate information about how busy
the surrounding network is. These provide a good congestion detection measure for
hop-by-hop flow control, but the scope of this control is inherently local. Hop-by-
hop flow control has limited effect, for example, in mitigating large-scale congestion
caused by data impulses from sparsely located sources that generate high-rate traffic.
To understand this limitation in a practical sensor network, we study the channel
load and queue performance using our Mica mote [23] testbed. We generate data
packets at different rates that drive the network to different levels of congestion and
measure the average queue size of the nodes in a small neighborhood that share the
wireless medium. The experimental results shown in Figure 3-4 plot the measured
average queue size against the channel load (utilization). The figure shows that the
queue size is very small (≪ 1) for all channel loads before the channel saturates
at a utilization of approximately 70%. Note that the curve resembles a typical
M/M/1 queue, except that it saturates at a utilization far lower than one, which
is a limitation imposed by the channel idle detection delay (this result is further
confirmed when we measure the β value in Section 3.4.1).
In CODA’s hop-by-hop flow control, a congested node, determined by the mea-
sured channel load and queue size, provides backpressure to its upstream neighbors.
Because of the funneling effect in sensor networks, particularly for sparsely located
sources, congestion is most likely to occur at downstream sensors closer to the sink.
Therefore, upstream sensors located closer to the sources within the propagation
funnel (i.e., data flowing from multiple sources toward a sink) are likely to experi-
ence lower channel load, and hence a low queue occupancy according to Figure 3-4.
As a result, the backpressure signal would most likely stop propagating before the
feedback signal reaches the sources. Therefore, the hop-by-hop backpressure mecha-
nism alone is not enough to mitigate large-scale congestion. A new mechanism that
resembles end-to-end closed-loop control, that allows a user to control the desired
reporting rate of an application is needed and explored in the next section.
3.2.2.3 Reporting Rate/Fidelity Measurement
For typical applications in sensor networks [11], the sinks expect a certain sampling
rate or reporting rate coming from the sources. This rate is application-specific, and
can be seen as an indication of event fidelity [35]; that is, the reporting rate from
the sources with respect to certain phenomena should be high enough to satisfy
the applications’ desired accuracy. When a sink consistently receives a less than
desired reporting rate, it can be inferred that packets are being dropped along the
path, most probably due to congestion. On the other hand, most applications can
tolerate a certain degree of fidelity loss, allowing the user to trade off fidelity to
avoid congestion in the network, if needed. Therefore, the fidelity measurements
such as the number of event packets received within a time period can be used as a
congestion signal and a metric for end-to-end control.
Such fidelity measurement schemes need to operate on a much longer time-scale
compared to the packet transmission time-scale, and consider:
• End-to-end delay between sources and sink nodes since only the sink can
recognize its own requirements on the sampling rate.
• Stability - to avoid unnecessary reaction to transient phenomena that could
cause oscillations, a sink should not respond too quickly to events, and there-
fore, should define an appropriate “observation period” (i.e., window) over a
longer time-scale for measurement.
We conjecture that pure window-based end-to-end control schemes like TCP
are not well suited to sensor networks. In addition to the excessive end-to-end
acknowledgment overhead, there exists a mismatch of the traffic model with the
applications (i.e., the data traffic in most sensor network applications is CBR in
nature and might experience a sudden increase in the data rate when an interesting
event occurs). In TCP, since every incoming ACK increases the transmission window
size, low-rate CBR could falsely inflate the window to a very large size that could
easily overwhelm the network with event-based applications. To avoid this well-
known large-window problem in TCP and the excessive ACK overhead, in Section
3.3.2 we propose a novel low-cost end-to-end closed-loop control mechanism that
combines window-based and rate-based control.
3.3. CODA Design
Hotspots (i.e., congestion) can occur in different regions of a sensor field due to
different congestion scenarios that arise. This motivates the need for CODA’s open-
loop hop-by-hop backpressure and closed-loop multi-source regulation mechanisms.
These two control mechanisms, while insufficient in isolation, complement each other
nicely. Different rate control functions are required at different nodes in the sen-
sor network depending on whether they are sources, sinks, or intermediate nodes.
Sources know the properties of the traffic they inject while intermediate nodes do
not. Sinks are best placed to understand the fidelity rate of the received signal, and
in some applications, sinks are powerful nodes that are capable of performing sophis-
ticated heuristics. The goal of CODA is to maintain low or no cost operations during
normal conditions, but be responsive enough to quickly mitigate congestion around
hotspots once it is detected. In what follows, we discuss CODA’s backpressure and
multi-source regulation mechanisms.
3.3.1 Open-Loop Hop-by-Hop Backpressure
Backpressure is the primary fast time scale control mechanism when congestion
occurs. The main idea is to use the components mentioned in Section 3.2.2 to do
local congestion detection at each node with low cost. Once congestion is detected,
the receiver will broadcast a backpressure message to its neighbors and at the same
time make local adjustments to prevent propagating the congestion downstream.
A node broadcasts backpressure messages as long as it detects congestion. Back-
pressure signals are propagated upstream toward the source. In the case of impulse
data events in dense networks it is very likely that the backpressure may propagate
directly to the sources. Nodes that receive backpressure signals could throttle their
sending rates (e.g., remain silent for a random period of time) or regulate data rates
based on some local congestion policy (e.g., AIMD).
When an upstream node (toward the source) receives a backpressure message,
based on its own local network conditions it determines whether or not to further
propagate the backpressure signal upstream. For example, nodes do not propagate
the backpressure message if they are not congested.
We use the term depth of congestion to indicate the number of hops that the
backpressure message has traversed before a non-congested node is encountered.
The depth of congestion can be used by the routing protocol and local packet drop
policies to help balance the energy consumed during congestion across different
paths. Two simple schemes can be used:
• Consider the instantaneous depth of congestion as an indicator to the routing
protocol to select better paths, thereby reducing traffic over the paths suffering
deep congestion.
• Alternatively, rather than coupling congestion control and routing, the nodes
can silently suppress or drop important signaling messages associated with
routing or data dissemination protocols (e.g., interests [11], data advertise-
ments [9], etc.). Such actions would help to push event flows out of congested
regions and away from hotspots in a more transparent way.
Further investigation of using depth of congestion to assist routing is out of the
scope of this chapter. The rest of this section will describe the main elements and
detailed operations of CODA’s open-loop control.
3.3.1.1 Receiver-based Detection
As mentioned in Section 3.2.2, there are multiple good indications of congestion:
• a nearly overflowing queue.
• a measured channel load higher than a fraction of the optimum utilization.
This provides a probabilistic indication of congestion by observing how closely
the channel load approaches the upper bound.
Monitoring the queue size comes almost for free except for a little processing
overhead, but it provides only a bimodal indication with non-negligible latency.
Listening to the channel either to measure the channel loading or to acquire signaling
information for collision detection provides a fast and good indication but incurs high
energy cost if performed all the time. Therefore, it is crucial to activate the latter
component only at the appropriate time in order to minimize cost.
Consider the typical packet forwarding behavior of a sensor network node and
its normal radio operational modes. The radio stays in the listening mode except
when it is turned off or transmitting. When a carrier is detected on the channel,
the radio switches into the receiving mode to look for a transmission preamble and
continues to receive the packet bit stream. Before forwarding this packet to the
next hop, CSMA requires the radio to detect an idle channel which implies listening
for a certain amount of time. If the channel is clear during this period, then the
radio switches into the transmission mode and sends out a packet. There is no extra
cost to listen and measure channel loading when a node wants to transmit a packet
since carrier sense is required anyway before a packet transmission. Based on this
observation, we conclude that the proper time to activate the detection mechanism
is when a node’s send buffer is not empty. In other words, a node’s radio might
be turned off most of the time according to some node coordination schemes (e.g.,
GAF [13], SPAN [12], S-MAC [14], etc.), but, whenever receiving or transmitting a
packet, the radio must reside in the listening mode for a time.
Figure 3-2 illustrates a typical scenario in sensor networks in which hotspots or
congestion areas could be created. In this example, nodes 1 and 4 each send CBR
traffic that consumes 50% of the channel capacity through node 2 to node 3 and 5,
respectively. Packets that are received by node 2 stay in its queue because of the
very busy channel and are eventually dropped. This simple example shows that in
a congested neighborhood, a receiver’s (e.g., node 2, the forwarding node) buffer
occupancy is high or at least non-empty. A node that activates the channel loading
measurement during the moment when its buffer is not empty is highly responsive
with almost no cost. The channel loading measurement will stop naturally when
the buffer is cleared, which indicates with high probability that any congestion is
mitigated and data flows smoothly around the neighborhood. Based on this obser-
vation, there is little extra cost to measure the channel loading if a node activates
channel monitoring only when it is “receiving” a packet and needs to forward it
later on. The only time CODA needs to do this is when a node has something to
send, and it has to do carrier sense anyway for those situations.
3.3.1.2 Minimum Cost Sampling
A sensing epoch is defined as a multiple of the packet transmission time. When a
node starts sensing the channel (i.e., when it has something to send in its buffer),
we probe the MAC for at least 1 epoch time to measure the channel load. During an
epoch period, instead of forcing the MAC to continuously listen during the backoff
time, a node performs periodic sampling of the radio states (non-invasive probing,
i.e., we do not modify the radio state machine) so that the radio can be turned off
during the interval. This non-invasive sampling scheme provides an elegant way to
measure the channel load without adding any energy cost to the radio other than
the cost required by the original CSMA state machine. We use a simple sampling
scheme where the channel load is measured for N consecutive sensing epochs of
length E, with a predefined sampling rate to obtain channel state information; that
is, the number of times that the channel state is busy or idle within a single sensing
epoch. We then calculate the sensed channel load Φ̄ as the exponential average of
Φn (the measured channel load during epoch n) with parameter α over the previous
N consecutive sensing epochs, as shown in Equation (3.3).

Φ̄n+1 = αΦ̄n + (1 − α)Φn,   (n ∈ {1, 2, ..., N}, Φ̄1 = Φ1).   (3.3)
If the send buffer is cleared before n counts to N, then the average value is ignored
and n is reset to 1. The tuple (N,E, α) offers a way to tune the sampling scheme
to accurately measure the channel load for specific radio and system architectures.
In Section 3.4.2, we describe and demonstrate the tuning of these three parameters
in an experimental sensor network testbed comprised of Berkeley Mica motes.
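The sampling scheme of Equation (3.3) can be sketched in code as follows; the class name and the default (N, α) values are illustrative assumptions, not taken from the mote implementation.

```python
class ChannelLoadEstimator:
    """Exponential average of per-epoch channel load samples, following
    Equation (3.3). A sample is the fraction of busy probes in one epoch."""

    def __init__(self, n_epochs=4, alpha=0.85):
        self.n_epochs = n_epochs   # N: epochs needed for a valid estimate
        self.alpha = alpha         # exponential-averaging weight
        self.reset()

    def reset(self):
        """Called when the send buffer empties before N epochs elapse."""
        self.avg = None
        self.count = 0

    def add_epoch(self, busy_probes, total_probes):
        """Fold in one epoch's sample; return the averaged load once N
        epochs have been observed, else None."""
        sample = busy_probes / total_probes
        self.avg = sample if self.avg is None else (
            self.alpha * self.avg + (1.0 - self.alpha) * sample)
        self.count += 1
        return self.avg if self.count >= self.n_epochs else None
```

The (N, E, α) tuple from the text maps onto `n_epochs`, the epoch length used by the probing loop (not modeled here), and `alpha`.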
3.3.1.3 Backpressure Message
When the sensed channel load exceeds a threshold (this can simply be Smax, as
shown in later evaluation sections) or the buffer occupancy reaches a certain high
watermark level, congestion is inferred. A node broadcasts a message as a backpressure
signal and at the same time exercises the local congestion policy. Although
there is no guarantee that all neighboring nodes will get this message, at least some
nodes will get it probabilistically. A node broadcasts a backpressure message when
it detects congestion based on channel loading and buffer occupancy. A node will
continue broadcasting this message up to a certain maximum number of times with
minimum separation as long as congestion persists. Alternatively, a node can set a
congestion bit in the header of every outgoing packet [76] instead of sending explicit
backpressure signals. However, this scheme requires all nodes to overhear traffic
from the neighborhood, which is difficult to realize in non-CSMA based MAC such
as TDMA.
The backpressure message provides the basis for the open loop backpressure
mechanism and can also serve as an on-demand “Clear To Send” (CTS) signal, so
that all other neighbors except a single sender (which could be picked randomly,
or a node can assign more chances to more desirable senders) can be silenced at
least for a single packet transmission time. This deals with hidden terminals and
supports an implicit priority scheme in CODA. The “chosen node” embedded in
the backpressure message can be selected based on data type or other metrics that
essentially assign the chosen sender a higher priority to use the bandwidth. All
nodes can share a priority list of data types, with a certain data type having higher
priority than others.
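The signaling rules above and the propagation rule of Section 3.3.1 can be sketched together; the message cap, separation interval, and callback names are illustrative assumptions.

```python
import time

MAX_MSGS = 5            # illustrative cap on messages per congestion episode
MIN_SEPARATION_S = 0.1  # illustrative minimum spacing between messages

class BackpressureState:
    """Per-node bookkeeping for backpressure signaling."""
    def __init__(self):
        self.sent = 0
        self.last_sent = 0.0

def maybe_signal_backpressure(state, congested, broadcast, now=None):
    """Broadcast a backpressure message while congestion persists, up to a
    maximum count with a minimum separation between messages."""
    now = time.monotonic() if now is None else now
    if not congested:
        state.sent = 0       # episode over; allow signaling next time
        return False
    if state.sent < MAX_MSGS and now - state.last_sent >= MIN_SEPARATION_S:
        broadcast()
        state.sent += 1
        state.last_sent = now
        return True
    return False

def on_backpressure_received(locally_congested, throttle, propagate):
    """Upstream handling: apply the local congestion policy, and propagate
    the signal further only if this node is itself congested."""
    throttle()
    if locally_congested:
        propagate()
```

Nodes that are not themselves congested absorb the signal, which is what bounds the depth of congestion discussed earlier.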
3.3.2 Closed-Loop Multi-Source Regulation
In sensor networks there is a need to assert congestion control over multiple sources
from a single sink in the event of persistent congestion, where the sink plays an
important role as a 1-to-N controller over multiple sources. Note that backpres-
sure alone cannot resolve congestion under all scenarios because our design does
not propagate the congestion signal in cases where nodes do not locally experi-
ence congestion - to do so would be very costly in terms of power and bandwidth
consumption.
The cost of closed-loop flow control is typically high in comparison to simple
open-loop control because of the required feedback signaling. We propose an ap-
proach that would dynamically regulate all sources associated with a particular
data event. Under normal operation sources would regulate themselves at prede-
fined rates (e.g., based on the data dissemination protocol [11] [9]) without the
intervention of closed loop sink regulation.
When the source event rate (r) is less than some fraction η of the maximum
theoretical throughput (Smax) of the channel the source regulates itself. When this
value is exceeded (r ≥ ηSmax), a source is more likely to contribute to congestion and
therefore closed-loop control is triggered. The threshold η here is not the same as
the threshold used in local congestion detection; in fact, η should be much smaller,
following the result suggested in [74]. The source only enters sink regulation if this
threshold is exceeded. At this point a source requires steady periodic feedback (e.g.,
ACKs) from the sink to maintain its rate (r). A source triggers sink regulation
when it detects (r ≥ ηSmax) by setting the regulate bit in the event packets it
forwards toward the sink. Reception of packets with the regulate bit set forces the
sink to send “aggregated ACKs” (e.g., 1 ACK per 100 events received at the sink)
to regulate all sources associated with a particular data event. ACKs could be sent
in an application specific manner. For example, the sink could send the ACK only
along paths it wants to reinforce in the case of a Directed Diffusion [11] application.
The reception of ACKs at sources serves as a self-clocking mechanism allowing the
sources to maintain the current event rate (r).
When a source sets its regulate bit it expects to receive an ACK from the sink at
some predefined rate, or better, a certain number of ACKs over a predefined period
allowing for the occasional loss of ACKs due to transient congestion. If a source
receives a prescribed number of ACKs during this interval it maintains its rate (r).
When congestion builds up ACKs can be lost, forcing sources to drop their event
rate (r) according to some rate decrease function (e.g., multiplicative decrease, etc.).
The sink can stop sending ACKs based on its view of network conditions. The sink
is capable of measuring its own local channel loading (ρ) conditions and if this is
excessive (ρ ≥ γSmax) it can stop sending ACKs to sources.
Because the sink expects a certain reporting rate it can also take application-
specific actions when this rate is consistently less than the desired reporting rate
(i.e., the fidelity of the signal [35]). In this case the sink infers that packets are
being dropped along the path due to persistent congestion and stops sending ACKs
to sources. When congestion clears the sink can start to transmit ACKs again, and
as a result, the event rate of the source nodes will increase according to some rate
increase function (e.g., additive increase).
Because in some applications the sink is a point of data collection and is powerful
in comparison to the sensors, it can maintain state information associated with
specific data types. By observing packet streams from the sources, the sink can,
when congestion is inferred, send explicit control signals to selected sources to lower
their threshold value η, forcing them to trigger sink regulation at a lower rate than
other, more important observers. This provides an implicit priority mechanism as
part of the closed-loop congestion control.
When the event rate at the sources is reset (e.g., via reinforcement [11]) to a value
(r) that is less than some factor η of the maximum theoretical throughput (Smax)
of the channel then the sources begin again to regulate themselves without the need
of ACKs from the sink. Such a multimodal congestion control scheme provides
the foundation for designing efficient and low cost control that can be practically
implemented in sensor networks based on the Berkeley mote series [23][24], as
discussed in Section 3.4. Overall, closed-loop multi-source regulation works closer
to the application layer and operates on a much larger time scale (an order of
magnitude larger) than its open-loop counterpart.
3.3.2.1 A Hybrid Window-based and Rate-based Algorithm
In essence, CODA's closed-loop control can be realized as a combination of window-
based and rate-based schemes. We define the drop rate (i.e., the number of packets
dropped in the network per packet received at the sink) as an energy metric called
the energy tax, or ETax. The packet loss rate p is thus ETax/(1 + ETax). With a
source event rate of r, the expected number of event packets received at the sink,
which is a measure of application fidelity, is r(1 − p) = r/(1 + ETax). The application
fidelity is thus approximately inversely proportional to ETax.

Figure 3-5: Closed-loop control model. The impact of Wsink and the multiplicative
decrease factor d.
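The relations between the energy tax, the loss rate, and the delivered fidelity are easy to check numerically. The following sketch simply evaluates them (the function names are ours):

```python
def loss_rate(etax):
    """p = ETax / (1 + ETax): fraction of transmitted packets that are dropped."""
    return etax / (1.0 + etax)

def fidelity(r, etax):
    """Expected event packets received at the sink: r(1 - p) = r / (1 + ETax)."""
    return r * (1.0 - loss_rate(etax))
```

For example, an energy tax of 1 (one packet dropped per packet received) implies a loss rate of 0.5 and halves the delivered fidelity.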
Recall a key objective of sensor networks is to maximize the operational lifetime
while delivering acceptable data fidelity to the applications. This demands a mech-
anism to control the network so that the energy tax does not exceed an acceptable
value, which is in essence an application-specific choice. This is the objective
function for CODA's closed-loop control. Under overload conditions, we assume that
the network does not drop ACKs from the sinks (i.e., ACKs are delivered through
high-priority queues) and that the majority of packet loss in the network is due to
congestion.
We can then realize this objective through a hybrid rate-based and window-based
algorithm. This algorithm governs the window sizes at both source and sink with
the ETax in the following equation:
Wsrc = r(τf + τb) + Wsink(1 + ETax) (3.4)
Wsrc is the window size or the number of event packets a source is allowed to send
at the current rate r without receiving an ACK from the sink. Wsink is the window
size or the number of accumulated event packets a sink receives before it sends
an aggregated ACK. r is the source rate during the current observation cycle and
(τf + τb) is the sum of the forward and backward one-way delays between a source
and the sink. The algorithm is such that, if a source does not receive an ACK after
it has sent out Wsrc event packets at rate r, it should decrease its rate from r to d · r
(d < 1 multiplicative decrease). If later an ACK is received at the source within
the next observation cycle Wsrc, then the source increases its rate from r to r + b
(additive increase). In other words, this control scheme ensures that a source would
cut its rate whenever the perceived energy tax rises beyond an acceptable value
ETax. Wsink determines the control overhead and the length of the decision period
that controls the convergence time of the rate control algorithm. To understand
the tradeoff between the control overhead and the convergence time, we numerically
evaluate Equation (3.4), simulating a network that experiences congestion when the
source rate exceeds 3 pkt/s but no congestion when the source rate is below 1.5
pkt/s.
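This numerical evaluation can be reproduced with a small simulation sketch. The congestion thresholds (losses above 3 pkt/s, a clean channel below 1.5 pkt/s) come from the text; the step structure, the default values, and the behavior between the two thresholds (rate held constant) are our assumptions.

```python
def simulate(d, w_sink, b=0.1, r0=10.0, rtt=0.1, etax_thresh=1.0, steps=400):
    """AIMD source driven by aggregated ACKs, following Equation (3.4).

    Assumed congestion model: ACKs are lost whenever the source rate
    exceeds 3 pkt/s; below 1.5 pkt/s the channel is loss-free; in
    between, the rate is held.
    """
    r, trace = r0, []
    for _ in range(steps):
        # Wsrc = r(tau_f + tau_b) + Wsink(1 + ETax)
        w_src = r * rtt + w_sink * (1.0 + etax_thresh)
        cycle = w_src / r                # seconds needed to send Wsrc packets
        if r > 3.0:
            r = max(d * r, 0.1)          # no ACK arrives: multiplicative decrease
        elif r < 1.5:
            r = r + b                    # ACK arrives: additive increase
        trace.append((cycle, r))
    return trace
```

With d = 0.5 the rate halves each cycle until it lands in the congestion-free band, mirroring the faster drop (and coarser granularity) of the small-d traces in Figure 3-5.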
In Figure 3-5, we evaluate the impact of two values of multiplicative decrease
factor d and two values of Wsink. For a fixed Wsink (e.g., equal to 50 or 2% control
overhead for sending ACKs), we observe that the source rate with a smaller d (i.e.,
0.5) drops more quickly than a source with a larger d value (i.e., 0.8). However, the
rate with a smaller d oscillates and thus takes a longer time to restore and converge
to an acceptable rate that avoids congestion. Therefore, a smaller d can reduce the
energy tax but most likely will hurt the fidelity because of the longer convergence
time. On the other hand, a larger d would have a larger energy tax because of the
slower rate reduction, even though it could achieve higher data fidelity because of
the finer levels of granularity of rate reduction and thus can converge faster to an
acceptable rate. Note that Wsink controls the length of the “observation cycle” and
thus a smaller Wsink can accelerate the rate reduction process. In Figure 3-5, we can
see that a smaller Wsink (i.e., 25) causes the rate of a source with d = 0.8 to decrease
as fast as d = 0.5. This allows the algorithm to achieve the same reduction in energy
tax while maintaining high fidelity, at the expense of higher control overhead (i.e.,
an increase from 2% to 4%) because of the smaller value of Wsink. We study these
parameter tradeoffs in our mote testbed and discuss the result in Section 3.4.5 under
real-world experimental conditions.
3.4. Experimental Sensor Network Testbed
In this section, we discuss experiences implementing CODA on a real sensor system
using the TinyOS platform [19] on Mica motes [23]. We report evaluation results,
including measuring the β value, tuning the parameters for accurate channel load
measurement, and finally, evaluating CODA with a generic data dissemination ap-
plication.
The sensor device has an ATMEL 4 MHz, low-power, 8-bit microcontroller with
128 KB of program memory, 4 KB of data memory, and 512 KB of external flash
serving as secondary storage. The radio is a single-channel RF transceiver
operating at 916 MHz, capable of transmitting at 10 kbps using on-off-keying
encoding. For all our experiments we use a Non-Persistent CSMA MAC on top of
the Mica motes.
3.4.1 Measuring the β Value
An important decision that must be made when using CODA’s open-loop control
mechanism described in Section 3.3.1 is the congestion threshold at which we should
start applying backpressure. A first step in making this decision is to determine the
maximum channel utilization achievable with the radio and MAC protocol being
used.
As noted in Equation 3.1 in Section 3.2.1, for the CSMA MAC protocol, the
channel utilization in a wireless network depends on the propagation delay between
the nodes with the maximum physical separation that can still interfere with each
other’s communications, and the channel idle detection delay. In sensor networks,
the maximum physical separation is typically tens of meters or less and as such the
propagation delay is negligible for most purposes. Thus, if the channel idle detection
delay is also negligible, CSMA should provide almost 100% utilization of the offered
load of the channel. However, in practice, the utilization is much less due to the
latency in the idle channel detection at the MAC layer. We can use the parameter β
as defined in Equation 3.2 to predict how much this latency degrades the maximum
channel utilization.
We measure the β value for the Mica mote using a simple experimental setup
involving two motes both running TinyOS [19]. Stopwatches inserted in the MAC
provide the basis for the measurement of β. Figure 3-6 shows the placement of the
stopwatches within the receive and transmit flows of the MAC layer. Mote A starts
its watch when the MAC receives a packet to be sent from the upper layers of the
network stack and stops its watch when it detects the start-symbol of an incoming
packet from mote B. The locations of the stopwatch trigger points in the mote B
MAC are the same as in mote A, but the operations are reversed. It starts the
watch when it receives a packet and stops it when it starts to transmit.
A single iteration of the measurement consists of mote A sending a packet to
mote B and mote B immediately reflecting the packet back to mote A. Due to the
symmetry inherent in the placement of the stopwatch trigger points, β is propor-
tional to half the difference between Stopwatch A and Stopwatch B:

β = (StopwatchA − StopwatchB) / (2 × (packet transmission time)).    (3.5)
Over 50 iterations, we measure an average β of 0.030 ± 0.003 (with confidence
level of 95%) for the Mica motes. Substituting β into Equation 3.1, the standard
expression for CSMA throughput (Smax), we predict a maximum channel utilization
of approximately 73%. The same measurement procedure executed on the Mica2
mote predicts a maximum throughput of approximately 36% with the default MAC
in TinyOS-1.1.0. Note that the measurement of β is simply a way to provide theo-
retical rationale to determine a reasonable threshold. Alternatively, one can always
determine a suitable threshold experimentally.
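The prediction can be reproduced by numerically maximizing the throughput expression over the offered load. The sketch below assumes Equation 3.1 is the classic unslotted non-persistent CSMA form; with that assumption, β ≈ 0.03 yields a maximum utilization of roughly 70%, broadly in line with the prediction above and the sub-70% saturation measured later in Figure 3-7.

```python
import math

def csma_smax(beta, g_max=50.0, step=0.01):
    """Numerically maximize the classic unslotted non-persistent CSMA
    throughput S(G) = G e^{-bG} / (G(1 + 2b) + e^{-bG}) over the offered
    load G. (Assumes Equation 3.1 takes this standard form.)"""
    best, g = 0.0, step
    while g <= g_max:
        s = g * math.exp(-beta * g) / (g * (1 + 2 * beta) + math.exp(-beta * g))
        best = max(best, s)
        g += step
    return best
```

A smaller β (e.g., the 0.01 used later in the simulations) pushes the achievable utilization up toward 80%.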
3.4.2 Channel Loading Measurement and Utilization
Setting the channel loading threshold that will trigger the backpressure mechanism
requires consideration of the tradeoff between energy savings and fidelity. Conserv-
ing energy implies a strategy that senses the channel load sparsely in time (fewer
timer interrupts and processing). However, the channel load measurement is most
accurate when sensing densely in time. As a compromise between dense and sparse
sampling, we use the scheme discussed in Section 3.3.1.2 where the channel load
is measured for N consecutive epochs of length E (with some fixed channel state
sampling rate within this epoch), and an exponential average, with parameter α,
is calculated to represent the sensed channel load. The problem then becomes to
manipulate these three parameters (N, E, α) so that the node's sensed channel load
is as close as possible to the actual channel load.

Figure 3-6: MAC layer stopwatch placement for β measurement. Diagram of receive
and transmit state flows in the TinyOS MAC component code. The placement of
the stopwatch start/stop trigger points is marked with an X.
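The (N, E, α) scheme can be sketched as follows, with the per-epoch channel sampling abstracted into lists of busy/idle booleans (the class and method names are ours):

```python
class ChannelLoadEstimator:
    """Epoch-based channel-load sensing with exponential averaging."""

    def __init__(self, n_epochs, alpha):
        self.n_epochs = n_epochs  # N: epochs folded into one estimate
        self.alpha = alpha        # exponential-averaging weight

    def estimate(self, epochs):
        """epochs: per-epoch sample lists (True = channel sensed busy)."""
        avg = None
        for samples in epochs[-self.n_epochs:]:
            load = sum(samples) / len(samples)  # busy fraction this epoch
            if avg is None:
                avg = load
            else:
                avg = self.alpha * avg + (1 - self.alpha) * load
        return avg
```

A large α weights the history heavily, smoothing transient spikes at the cost of slower reaction; the calibration below settles on α = 0.85.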
To do this optimization experimentally, we use two motes running TinyOS with
a CSMA MAC. Mote S is a randomized CBR source that sends at 4 packets per
second. Mote R is the receiver that senses the channel load using the scheme men-
tioned in the previous paragraph. The channel is sampled once per millisecond for
each epoch E for a total of N epochs. Using this setup we tested all combinations
of N ∈ {2, 3, 4, 5}, E ∈ {100 ms, 200 ms, 300 ms}, and α ∈ {0.75, 0.80, 0.85, 0.90}.
A time-series average of the exponential averages is taken over 256 seconds for each
combination (1024 packets are sent). Using this method we found that the
combination (4, 100 ms, 0.85) yielded the average sensed channel load at mote R closest to
the actual average channel load (in % terms) calculated by mote S with an accuracy
of 0.16±0.07. In general, we observe that the detection accuracy is not very sensitive
(the difference is within 5%) to these three parameters. Therefore, manual calibration
for each new CSMA-based radio might not be necessary. Our experience with the
new-generation Mica2 mote, which uses a different radio/MAC than the Mica, is
consistent with this conjecture.
In order to address the more realistic case of a node that both listens to and
forwards packets, a third mote F is added to the previous experimental setup with
all motes well within the transmission range of each other. Mote F forwards packets
sent from mote S in a random interval between 30 and 130 milliseconds after it
receives them, and also senses the channel load using the same scheme with the same
(N,E, α) parameters that mote R uses. There is now contention for the channel
since there are two packet sources (motes S and F). To minimize the probability
of dropping packets from the application layer because of buffer limitations, we use
a buffer size of 3 packets at the MAC layer. This decision is based on the queue
performance result shown in Figure 3-4, where we observe that the average queue
size is 3 before the channel saturates.

Figure 3-7: A limit on measured channel load is imposed by β. The nominal load
curve increases with constant slope as the source packet rate increases, while the
measured load saturates at a value below 70%.

Mote R remains as a reference node to
check the channel load sensed by mote F and also to keep track of the number of
packets sent by motes S and F to calculate the delivery ratio.
With mote S sending 1024 packets, we measure the packet delivery ratio and
channel load sensing accuracy using different source packet rates (viz. 4, 5, 6.25,
7.69, 9.09, 10, 16.67). The average sensed channel load at R and F, along with the
nominal channel load (calculated based strictly on offered load), are plotted against
the source packet rate in Figure 3-7.
Figure 3-7 shows the β-dependency of the CSMA MAC on the Mica mote. We
can see from the plot of the nominal channel load that the offered load is more
than enough to saturate the channel at points above 7.69 packets per second (source
packet rate). However, we can also observe that regardless of the source packet rate,
the measured channel load/utilization saturates below 70%. This is in agreement
with the limitation predicted by β (as shown in Section 3.4.1), if we can assume that
packet collision and buffer limitation do not contribute significantly to the observed
reduced channel load. To verify this assumption, we analyze the packet delivery
ratio at both the MAC and application layer in Figure 3-7.
We define the MAC packet delivery ratio as the percentage of packets sent by the
MAC layer at motes S and F that are actually received by mote R. The application
delivery ratio is the percentage of packets sent by the application layer (i.e., passed
down to the MAC queue) at motes S and F that are actually received by mote R.
Figure 3-7 shows that the application and MAC delivery ratios match each other
closely, indicating that nearly every packet that enters the MAC queue is sent and
received successfully, ruling out packet collisions and buffer overflow as significant
contributors to the reduced channel load.
3.4.3 Energy Tax, Fidelity Penalty, and Power
We define three metrics to analyze the performance of CODA on sensing applica-
tions:
• Average Energy Tax - this metric calculates the ratio between the total number
of packets dropped in the sensor network and the total number of packets
received at the sinks over an experiment, as introduced in Section 3.3.2.1.
Since packet transmission/reception consumes the main portion of the energy
of a node, the number of wasted packets per received packet directly indicates
the energy saving aspect of CODA when compared to the case of systems
without CODA.
• Average Fidelity Penalty - we define the data fidelity as the delivery to the
sink of the required number of data event packets within a certain time limit
(i.e., event delivery rate). This metric measures the difference between the
average number of data packets received at a sink when using CODA and when
using the ideal scheme discussed in Appendix A.

Figure 3-8: Experimental sensor network testbed topology. Nodes are well connected.
Packets are unicast.

Since CODA's control policy is to rate control the sources during periods of congestion, fidelity is
necessarily degraded on average. This fidelity difference, when normalized
to the ideal fidelity obtained at the sink, indicates the fidelity penalty for using
CODA. A low fidelity penalty is desirable: it means CODA alleviates
congestion efficiently while minimally impacting the system performance seen by
sensing applications.
• Power - this metric calculates the ratio of data fidelity to energy tax. Tradi-
tional end-to-end congestion control schemes often define power as the ratio
of throughput to delay where the objective function is to maximize the power.
We borrow the same idea but maximize the power by operating the network
to minimize energy tax (thereby maximizing the operational lifetime of the
network) while delivering acceptable data fidelity to the applications. This is
the objective of our closed-loop control.
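The three metrics reduce to simple ratios over per-experiment packet counts, e.g. (function names ours):

```python
def energy_tax(dropped, received):
    """Packets wasted in the network per packet delivered to the sinks."""
    return dropped / received

def fidelity_penalty(received, ideal_received):
    """Fidelity shortfall normalized to the ideal (congestion-free) scheme."""
    return (ideal_received - received) / ideal_received

def power(fidelity, etax):
    """Data fidelity per unit energy tax; closed-loop control maximizes this."""
    return fidelity / etax
```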
3.4.4 Open-loop Control
We create a simple generic data dissemination application to evaluate our congestion
control scheme in a wireless sensor network. The simple application implements the
open-loop fast time scale component of our scheme using TinyOS and runs on our
Mica mote testbed. When an intermediate (non source/sink) node receives a packet
to forward, it enables channel load sensing. It disables sensing when its packet
queue is emptied. If the channel load exceeds a given threshold value (e.g., 73%
as discussed in Section 3.4.1) during the sensing period or its buffer overflows, it
transmits a backpressure packet. The sources use a multiplicative rate reduction
policy. When a source receives a backpressure message, it reduces its rate by half.
A minimum rate of 2 packets per second is imposed such that a source sending
at this rate will ignore subsequent backpressure messages. An intermediate node
stops transmitting for a random amount of time (up to 400 ms) when it receives
a backpressure message except if it is the “chosen node”, as discussed in Section
3.3.1.3. No link-layer ACKs are used in any testbed experiments.
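The per-node reactions just described amount to a few lines of control logic. The sketch below restates them in Python for clarity; the constants (2 pkt/s floor, 400 ms maximum suppression) come from the text, while the helper names are ours.

```python
import random

MIN_RATE = 2.0  # pkt/s floor below which backpressure is ignored

def source_on_backpressure(rate):
    """Multiplicative rate reduction at a source: halve, never below the floor."""
    if rate <= MIN_RATE:
        return rate  # already at the floor: ignore further backpressure
    return max(rate / 2.0, MIN_RATE)

def intermediate_on_backpressure(is_chosen, rng=random):
    """Suppression delay (seconds) at an intermediate node: the chosen node
    keeps forwarding; all others pause for a random time up to 400 ms."""
    return 0.0 if is_chosen else rng.uniform(0.0, 0.4)
```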
The experimental sensor network testbed topology is shown in Figure 3-8. Pack-
ets are unicast, with the arrows in Figure 3-8 indicating the unicast paths. The
topology represents a dense deployment of motes so that the radio range of many
of the motes in the graph overlap. The local congestion policy of the intermediate
nodes can include the designation of a “chosen parent” (i.e., the chosen node, as
discussed in Section 3.3.1.3) or set of parents, such that a backpressure message
sent by this node will invoke the suppression method at its neighbors except for
the chosen parent(s). This supports traffic prioritization. In Figure 3-8, the thick
arrows show the “chosen paths”. Paths funnel events toward the sink node. The
three source nodes provide a high traffic load to the network, representing a data
impulse. The source rates are Src-1: 8 pps (packets per second), Src-2: 4 pps, and
Src-3: 7 pps, respectively.

Figure 3-9: Improvement in energy tax with small fidelity penalty using CODA.
The priority of Src-2 is evident from the fidelity penalty results.
The sink node counts the number of packets it receives from each respective
source. Each source node counts the number of packets it actually sends over the
air and the number of packets the application tries to send. The difference between
these last two counters measures the number of packets a source’s MAC layer drops.
Using ten 120-second trials, we obtain average values for the packets received,
sent, and attempted to be sent but failed (e.g., because of a busy channel, buffer
overflow, etc.) corresponding to each of the three sources. From this measured
data, we calculate the energy tax and fidelity penalty for each of the three sources.
Figure 3-9 shows the results of experiments with and without CODA enabled. We
can see from the figure that, with a small fidelity penalty compared with the
non-CODA system, we achieve a 3x reduction in energy tax on average. We observe that
without CODA the fidelity penalty is the same for all three sources. With CODA,
the penalty for Src-2 is lower than that of the other two sources; in fact, Src-2's
fidelity penalty is lower with CODA than without it. The reason is that the data
type of Src-2 has the highest priority: when CODA operates in the presence of
congestion, the suppression mechanism favors Src-2 packets over the others.

Figure 3-10: Experimental sensor network testbed topology capturing the funneling
effect in a larger network with sparsely located sources.
3.4.5 Combining Open-loop and Closed-loop Control
We reuse the application described above but increase the network size by adding
more motes into the testbed to capture the funneling effect in a larger network as
shown in Figure 3-10. We implement CODA’s closed-loop control component, as
discussed in Section 3.3.2, into the application running in parallel with the open-
loop component. The first experiment examines the rate control dynamics of CODA.
Figure 3-11 presents time series traces taken at one of the sources in the topology,
i.e., Src1 in Figure 3-10. Wsink is set to 25 (representing 4% of control overhead)
and we examine our closed-loop model using two values for multiplicative factor d,
of 0.5 and 0.8, respectively. The two time series traces (source rate) closely resemble
the numerical example traces shown in Figure 3-5.

Figure 3-11: Time series traces presenting the rate control dynamics and the event
fidelity/delivery performance of CODA. CODA's rate control scheme does not
increase the degree of variability in the event delivery performance.

We observe that the source rate oscillates when using a smaller d = 0.5 and converges more slowly compared to
its counterpart when d = 0.8. In the experiment, the open-loop control component
is running in parallel and backpressure signals are originated from the mote closest
to the sink (i.e. at the funnel neck). However, we observe that none of the signals
propagate back to any of the sources, confirming our postulation regarding the
limitations of open-loop control discussed in Section 3.2.2.2.
To understand the impact of our rate control algorithm on the stability of event
delivery/fidelity at the sink, we plot the event delivery rate measured at the sink as
time series traces in Figure 3-11. While the traces exhibit a high degree of variability
even without any rate control (trace with no CODA), we observe that CODA rate
control does not increase the degree of variation. Rather, the trace with CODA is
more stable (has less variation) after the rate converges to a value that is determined
by the ETax threshold in the closed-loop model.
Figure 3-12: Tradeoff between fidelity and energy tax that obtains the most benefit,
i.e., maximum "power", for the network.

The next experiment examines our closed-loop control model, which controls the
tradeoff between the perceived energy tax of the network and the perceived application
fidelity. As discussed in Section 3.3.2.1, a smaller value of d yields a larger
saving of energy tax but negatively impacts the data fidelity. Similarly, allowing a
smaller ETax threshold in the network (Equation 3.4) would reduce Wsrc and hence
shorten the observation cycle. This makes the control algorithm more sensitive
to packet loss, reducing the rate and energy tax more aggressively but adversely
affecting the data fidelity. To achieve a balance and obtain the best benefit out
of the closed-loop control, we calculate the power metric, defined in Section 3.4.3 as
the ratio of data fidelity to energy tax, and present the results with different control
parameters in Figure 3-12. The result clearly indicates that a smaller value of ETax
almost always guarantees a higher power. Therefore, a smaller observation cycle
can gain more in energy tax than it harms the fidelity. On the other hand, although
a smaller value of d gives a higher average power, the gain is less stable as observed
in the high degree of variability (indicated by the error bars, which represent the
corresponding 95% confidence intervals).
Finally, Figure 3-13 presents the performance gain of CODA compared to the
cases without CODA under different network workloads. We observe that CODA is
able to prevent the network power from degrading exponentially as the workload
increases.

Figure 3-13: Power of CODA versus non-CODA in an experimental Mica mote
testbed.
3.5. Simulation Results
We use packet-level simulation to study the scalability performance and the network
dynamics of CODA in large networks.
3.5.1 Simulation Environment
We implemented both open-loop backpressure and closed-loop regulation in the ns-
2 [60] simulator in their simplest instantiation; that is, a simple AIMD function
is implemented at each sensor source by an application agent. The reception of
backpressure messages at the source, or, in the case of closed-loop control, not
receiving a sufficient number of ACKs from the sink over a predefined period of time,
will cause a source to cut its rate by half (i.e., d = 0.5). For intermediate nodes
(non source/sink), local congestion policy is such that backpressure messages will
halt a node’s transmission for a small random number of packet times (i.e., packet
transmission times) unless a node is the chosen node specified in the backpressure
message, as discussed in Section 3.3.1.3.
In all our experiments, we use random topologies with different network sizes.
We generate sensor networks of different sizes by placing nodes randomly in a square
area. Different sizes are generated by scaling the square size and keeping the nominal
radio range (40 meters in our simulations) constant in order to approximately keep
the average density of sensor nodes constant. In most of our simulations, we study
five different sensor fields with size ranging from 30 to 120 nodes in increments of 20
nodes. For each network size, our results are averaged over five different generated
topologies and each value is reported with its corresponding 95% confidence interval.
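The constant-density scaling can be sketched as follows. The constant-density goal and the fixed 40 m radio range come from the text; the base square side chosen for the 30-node field is our assumed calibration, for illustration only.

```python
import math
import random

def make_topology(n_nodes, base_n=30, base_side=100.0, rng=random):
    """Place n_nodes uniformly at random in a square whose side is scaled
    so that node density stays constant as the network grows (base_side
    for base_n nodes is an assumed calibration; the radio range stays 40 m)."""
    side = base_side * math.sqrt(n_nodes / base_n)
    nodes = [(rng.uniform(0.0, side), rng.uniform(0.0, side))
             for _ in range(n_nodes)]
    return nodes, side
```

Scaling the side with the square root of the node count keeps nodes-per-unit-area, and hence the expected neighbor count within radio range, roughly constant across the five field sizes.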
Our simulations use a 2 Mbps IEEE 802.11 MAC provided in ns-2 simulator,
with some modifications. First, we disable the use of RTS/CTS exchanges and link-
layer ARQ for data packets. We do this for the reasons discussed in Section 3.2.1
because we want to capture the realistic cases where reliable delivery of data is not
needed and the fidelity can be compromised to save energy. Although we use IEEE
802.11 in the simulation, most sensor platforms use simpler link technologies where
ARQ is not enabled by default (e.g., Berkeley motes). Next, we added code to
the MAC to measure the channel loading using the epoch parameters (N = 3, E =
200ms, α = 0.5), as defined in Section 3.3.1.2. The choice of the parameters is not
crucial because the ns-2 simulator does not model the details of the IEEE 802.11
physical layer. The MAC broadcasts backpressure messages when the measured
channel load exceeds a threshold of 80%. We have added code to model the channel
idle detection delay with a β of 0.01, which yields a Smax of 80%. Closed-loop
multi-source regulation is implemented as an application agent attached to source-sink
pairs. Wsink is set to 100 and the ETax threshold to 2 for the closed-loop control
parameters.

Figure 3-14: Network of 30 nodes. Sensors within the range of the event epicenter,
which is enclosed by the dotted ellipse, generate impulse data when an event occurs.
The circle represents the radio range (40 m) of the sensor.
Finally, we use Directed Diffusion [11] as the routing core in the simulations
since our congestion control fits nicely into the diffusion paradigm, and since doing
so allows insight into CODA’s interaction with a realistic data routing model where
congestion can occur.
In most of our simulations, we use a fixed workload that consists of 6 sources
and 3 sinks. All sources are randomly selected from nodes in the network. Sinks
are uniformly scattered across the sensor field. A sink subscribes to 2 data types
corresponding to two different sources. This models the typical case in which there
are fewer sinks than sources in a sensor field. Each source generates packets at a
different rate. An event packet is 64 bytes and an interest packet is 36 bytes in size
[11].
3.5.2 Results and Discussion
We evaluate CODA under the three distinct congestion scenarios discussed in the
Introduction section to best understand its behavior and dynamics in responding
to the different types of congestion found in sensor networks. First we look at a
densely deployed sensor field that generates impulse data events. Next, we examine
the behavior of our scheme when dealing with transient hotspots in sparsely deployed
sensor networks of different sizes. Last, we examine the case where both transient
and persistent hotspots occur in a sparsely deployed sensor field generating data at
a high rate.
3.5.2.1 Congestion Scenario - Dense Sources, High Rate
We simulate a network with 30 nodes, as shown in Figure 3-14, emulating a disaster-
related event (e.g., fire, earthquake) that occurs 10 seconds into the simulation. Each
node within the epicenter region, which is enclosed by the dotted ellipse, generates
at least 100 packets per second sent toward the sinks, shown as filled black dots in
the figure.
Figure 3-15 shows both the number of packets delivered and the packets dropped
as time series traces. For the packet delivery trace, we count the number of data
packets a sink receives every fixed interval of 500ms, which indicates the fidelity
of the data samples. For the packet dropped trace, we count the number of data
packets dropped within the whole network every 500ms.
From the traces, it is clear that the difference in data delivery (fidelity) with and
without CODA is small, while the number of packets dropped is an order of magni-
tude smaller (hence the energy savings) when congestion control is applied. We can
also observe from the plot that the congestion is effectively relieved within 2 to 3
seconds. This shows the adaptive property of CODA.

Figure 3-15: Time series traces for densely deployed sources that generate high-rate
data.

The delivery plot reflects the real system goodput, which is highly dependent on the system capacity, indicating
the maximum channel utilization. When impulses happen, the channel is saturated
so it can deliver only a fraction of the event’s data. CODA’s open-loop backpressure
(even with a very simple policy) adapts well to operate close enough to the channel
saturation, as shown in Figure 3-15, while efficiently alleviating congestion. This
greatly reduces the number of packets dropped thereby saving energy, which is the
key objective function for CODA. The same simulation scenario is repeated 5 times
using different topologies of the same size. Overall, CODA achieves packet (energy)
savings of up to 88 ± 2% while the fidelity penalty paid is only 3 ± 11%.
3.5.2.2 Congestion Scenario - Sparse Sources, Low Rate
To examine the ability to deal with transient hotspots, in these simulations all six
sources send at low data rates, at most 20 packets per second. Four of the sources
are randomly selected so that they are turned on and off at a random time between
10 and 20 seconds into the simulation.
Figure 3-16: (a) Packet delivery and (b) packet drop time series traces for a 15-node network with low rate traffic. The plots show the traces for three cases: when only open-loop control (OCC) is used, when both open-loop and closed-loop control (CCC) are enabled, and when congestion control is disabled (noCC).
Figure 3-17: Average energy tax and fidelity penalty as a function of the network size when only CODA's open-loop control is used.
Figure 3-16 shows the packet delivery and packet drop traces for one of the simu-
lation sessions in a network of 15 nodes. Observe in Figure 3-16(a) that the difference in fidelity between the three cases is small, except at around 20 seconds into the trace when only open-loop control is used. Figure 3-16(b) shows a large improvement in
energy savings (i.e., packet drop reduction) especially when closed-loop control is
also enabled together with open-loop control. Again, the figure shows that at around
20 seconds into the trace, open-loop control cannot resolve congestion as there is no
reduction in the number of dropped packets and there is low delivery during this pe-
riod. This is because transient hotspots turn into persistent congestion at around 18
seconds into the trace until four of the sources turn off after 20 seconds. Open-loop
control cannot deal with persistent congestion unless the hotspots are close to the
sources, as discussed in Section 3.2.2.2. On the other hand, the trace corresponding
to closed-loop regulation also shows that the fidelity is maintained while effectively
alleviating congestion with only a small amount of additional signaling overhead.
Importantly, the signaling cost of CODA is less than 1% with respect to the number
of data packets delivered to the sink.
Figure 3-18: Energy tax as a function of network size for high and low rate data traffic. The difference between the data points with and without CODA indicates the energy saving achieved by CODA.
The same behavior can be observed in Figure 3-17, where the two metrics (i.e.,
energy tax and fidelity penalty) are plotted as a function of the network size. Note
that when using only open-loop control, the energy savings has a large variation,
indicated by the error bars that represent 95% confidence intervals. This indicates
that congestion is not always resolved, especially for larger-sized networks. This is
because in larger networks, persistent hotspots, which localized open-loop control is
unable to resolve, are more likely to occur given the long routes between source-sink
pairs. When closed-loop control is also enabled, the energy savings is large, up to 500%, with a small variation, and increases with growing network size, as shown in Figure 3-18.
Overall, the gain from using open-loop control in larger networks is limited.
Hotspots are likely to persist when the sources are generating data at a low rate
because of possible long routes and the funneling effect. Enabling closed-loop control
even at low source rates can improve the performance significantly, with the addition
of a small overhead for the control packets from sinks. Note that the amount of
overhead is only a small fraction (i.e., 1% of the number of data packets that the
sink receives). This result suggests that except for small networks, always enabling
closed-loop control is beneficial, regardless of the source rate. This is an important
observation that guides the use of CODA’s mechanisms in sensor networks.
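The two metrics used throughout this evaluation can be computed directly from the traces. The sketch below states them as inferred from their use here, namely energy tax as packets dropped in the network per data packet delivered to the sink, and fidelity penalty as the delivery shortfall relative to the ideal scheme; treat these formulations as a paraphrase, not the formal definitions.

```python
# Sketch of the evaluation metrics as used in this section. The numeric
# inputs in the example are illustrative, not measurements from the thesis.

def energy_tax(total_dropped, total_delivered):
    """Packets dropped in the network per data packet delivered to the sink."""
    return total_dropped / total_delivered

def fidelity_penalty(delivered, ideal_delivered):
    """Delivery shortfall relative to the ideal congestion control scheme."""
    return (ideal_delivered - delivered) / ideal_delivered

# e.g. a run delivering 900 of an ideal 1000 packets while dropping 1800
print(energy_tax(1800, 900))        # 2.0 dropped per delivered packet
print(fidelity_penalty(900, 1000))  # 0.1, i.e., a 10% fidelity penalty
```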
3.5.2.3 Congestion Scenario - Sparse Sources, High Rate
We examine the performance of our scheme in resolving both transient and per-
sistent hotspots where sparsely located sources generate high data traffic. In the
simulations, all sources generate 50 packets per second data traffic over the 30 sec-
ond simulation time. Both open-loop and closed-loop control are used throughout
the simulations. Figure 3-18 shows that CODA can achieve up to 15-fold (1500%) energy savings. Figure 3-19 shows that CODA can maintain a relatively low fidelity
penalty of less than 40% as compared to the ideal scheme. Observe that energy tax
increases as the network grows in general. However, in Figure 3-18 we can see that
the energy tax actually decreases when the network grows beyond the size of 100
nodes (the same behavior can be observed in Figure 3-17). This is because under a fixed workload, which is the case in our simulations, a network's capacity can increase when the network grows beyond a certain size: the data dissemination paths from the sources to the sinks spread across a broader network and the funneling effect is lessened.
3.6. Related Work
There is a growing interest in the problem of congestion in sensor networks. The
need for congestion avoidance techniques is identified in [35] while discussing the
infrastructure tradeoffs for sensor networks. Tilak, Abu-Ghazaleh, and Heinzelman
[35] show the impact of increasing the density and reporting rate on the performance
Figure 3-19: Fidelity penalty as a function of the network size for high and low rate data traffic.
of the network. While the authors do not propose any congestion avoidance mech-
anisms, they do note that any such mechanism must converge on a reporting rate
that is just sufficient to meet the performance or fidelity of the sensing application.
This is an important observation in the context of sensor networks.
Some existing data dissemination schemes [11] [49] can be configured or mod-
ified to be responsive to congestion. For example, Directed Diffusion [11] can use
in-network data reduction techniques such as aggressive aggregation when conges-
tion is detected. Other protocols, such as PSFQ (Pump Slowly Fetch Quickly [49], a
reliable transport protocol for sensor networks discussed in Chapter 2) can adapt the
protocol (i.e., modulate its pump/fetch ratio) to avoid congestion. However, such
approaches involve highly specialized parameter tuning, accurate timing configura-
tion, and an in-depth understanding of the protocol’s internal operations. There is a
need for a comprehensive set of congestion control mechanisms specifically designed
to best fit the unique constraints and requirements of sensor networks and their
emerging applications. These mechanisms should provide a general set of components that can be plugged into applications or the MAC in support of energy efficient
congestion control.
In [75] a comprehensive study of carrier sensing mechanisms for sensor networks
is reported. The authors propose an adaptive rate control mechanism that supports
fair bandwidth allocation for all nodes in the network. Implicit loss (i.e., failed
attempts to inject a packet into the network) is used as a collision signal to ad-
just the transmission rate of nodes. The paper focuses on fairness issues in access
control but not congestion control. In [77] the authors assume homogeneous appli-
cations in an indoor environment where sinks are sensor access points (SAPs) that
work collaboratively to collect data from a sensor field. The authors propose using
a combination of a hop-by-hop flow control scheme and a SAP selection routing
metric that considers packet loss probabilities, path load, and path length to select
congestion-free paths to SAPs, improving the capacity of the network.
In [66] an event-to-sink reliable transport protocol (ESRT) provides support for
congestion control. ESRT regulates the reporting rate of sensors in response to
congestion detected in the network. This paper is inspired, as our work is, by the
observations of Tilak, Abu-Ghazaleh, and Heinzelman [35] discussed above. ESRT
monitors the local buffer level of sensor nodes and sets a congestion notification
bit in the packets it forwards to sinks if the buffer overflows. If a sink receives a
packet with the congestion notification bit set it infers congestion and broadcasts
a control signal informing all source nodes to reduce their common reporting fre-
quency according to some function. As discussed in [66] the sink must broadcast
this control signal at high energy so that all sources in the sensor field can hear it.
Such a signal has a number of potential drawbacks, however, particularly in large
sensor networks. Any on-going event transmission would be disrupted by such a
high powered congestion signal to sources. In addition, rate regulating all sources
in the manner proposed in [66] is fine for homogeneous applications where all sen-
sors in the network have the same reporting rate but not for heterogeneous sources.
Even with homogeneous sources, ESRT always regulates all sources regardless of
where the hotspot occurs in the sensor field or whether the observed hotspot im-
pacts a path between a source and sink. We believe there is a need to support
heterogeneous sources and only regulate those sources that are responsible for, or
impacted by, transient or persistent congestion conditions. Furthermore, we be-
lieve that closed-loop regulation of sources should not use high energy but instead
hop-by-hop signaling that does not interfere with on-going data dissemination.
More recently, Ee and Bajcsy study the fairness issues of congestion control in
sensor networks [78]. They propose a distributed congestion control algorithm in
the transport layer of the traditional network stack model to ensure the fair delivery of packets to a central node. In [76], Hull et al. experimentally investigate the end-to-end performance of various congestion avoidance techniques in a 55-node sensor network. They propose a strategy called Fusion that combines three con-
gestion control techniques that operate at different layers of the traditional protocol
stack. These techniques include a version of hop-by-hop flow control (similar to
CODA’s open-loop control), a source rate limiting scheme (similar to the adaptive
rate control mechanism proposed in [75]) that meters traffic being admitted into
the network, and a prioritized MAC layer that gives a backlogged node priority
over non-backlogged nodes for access to the shared medium. Based on an extensive
amount of experimental data from the sensor network, the paper shows the ad-
verse effects of network congestion and demonstrates that Fusion, the combination
of these three techniques, can greatly improve the network efficiency (up to 300%)
under realistic workloads.
A number of other groups have looked at the issue of congestion control in
wireless networks other than sensor networks. For example, WTCP [79] monitors
the ratio of inter-packet separation for senders and receivers to detect and react to
congestion in wireless LANs. SWAN [80] forces sources to re-negotiate end-to-end
flows if congestion is detected in wireless ad hoc networks. RALM [81] employs
TCP-like congestion and error control mechanisms for multicast support in wireless
ad hoc networks. While multicast congestion control and congestion control in
wireless networks are of interest they do not address the same problem space as
energy efficient congestion detection and avoidance for sensor networks.
3.7. Conclusion
In this chapter, we have presented an energy efficient congestion control scheme for
sensor networks called CODA. The framework is targeted at CSMA-based sensors3,
and comprises three key mechanisms: (i) receiver-based congestion detection, (ii)
open-loop hop-by-hop backpressure, and (iii) closed-loop multi-source regulation.
We have presented experimental results from a small sensor network testbed based
on TinyOS running on Berkeley Mica motes. We defined three performance metrics,
average energy tax, average fidelity penalty and power, which capture the impact
of CODA on sensing applications’ performance. A number of important results
came out of our study and implementation. It was straightforward to measure
β, channel loading at the receiver, and to evaluate CODA with a generic data
dissemination scheme. We have also demonstrated through simulation that CODA
can be integrated to support data dissemination schemes and be responsive to a
number of different congestion control scenarios that we believe will be prevalent in
future sensor network deployments. Simulation results indicated that CODA can
3 Other than the congestion detection component, the two control components are independent of the MAC used and can work with other schedule-based MACs such as TDMA.
improve the performance of Directed Diffusion by significantly reducing the average
energy tax with minimal fidelity penalty to sensing applications. These results are
very promising and provide a basis for further larger scale experimentation.
Our study of congestion problems in this chapter also reveals that the unique
funneling effect in sensor networks as well as the low-power radio communication
channel can significantly limit the networks’ ability to deliver high fidelity data
from sources to sinks. For example, in Section 3.4.4 our experiment with CODA
in a real testbed setting showed that the application fidelity penalty (i.e., the measured degradation in application quality at the sink) during periods of congestion can be as high as 80% (see Figure 3-9). To overcome this capacity
limitation, new technologies must be introduced. In the next chapter, we address
this challenge and explore alternative or complementary solutions that can maintain
the application fidelity during persistent overload conditions based on the concept
of dual radio virtual sinks.
A. Experimentally determining the ideal fidelity of a network
Assume that there exists an ideal congestion control scheme that is capable of
rate-controlling each source to share the network capacity equally without dropping
each other’s packets. The problem then becomes finding out the network capac-
ity or at least the upper bound of the network capacity. The actual capacity of
the network is application-specific depending on several factors including the radio
bandwidth, the MAC operations, the routing/data dissemination schemes, and the
traffic pattern. Assume that the network is homogeneous in the sense that all wire-
less links are symmetrical and equal. We can determine the upper bound of the
network capacity in a simple and practical manner through experimentation. The
idea is as follows:
Def: Cmax,i = the maximum data delivery rate of path i associated with source i, at which the packet drop rate is minimal.
Consider that multiple distinct sources send data toward a common sink trav-
elling along different paths. Assume these dissemination paths from the sources to
the sink coincide with each other and share at least one common link. This is a reasonable assumption given the funneling effect: these transmissions have to share at least the air around the sink. Therefore, the data
dissemination capacity for a sink is limited by max_i Cmax,i. Thus we can find the upper bound and calculate the ideal fidelity by measuring max_i Cmax,i experimentally.
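The procedure above can be sketched in a few lines. The measurement data below (offered rate, drop fraction pairs per path) is purely illustrative, as is the convention of taking Cmax,i to be the highest offered rate at which the path's drop rate is minimal.

```python
# Hypothetical sketch: estimating the ideal-fidelity upper bound from
# per-path delivery measurements, following the definition above.

def c_max(measurements):
    """Given (offered_rate, drop_rate) pairs measured for one path,
    return the highest delivery rate at which the drop rate is minimal."""
    min_drop = min(drop for _, drop in measurements)
    return max(rate for rate, drop in measurements if drop == min_drop)

def ideal_fidelity_bound(per_path_measurements):
    """Upper bound on sink delivery capacity: max_i Cmax,i over all paths."""
    return max(c_max(m) for m in per_path_measurements)

# Example: two paths probed at increasing offered loads (pkts/s, drop fraction)
paths = [
    [(10, 0.01), (20, 0.01), (30, 0.08)],   # path 1: saturates past 20 pkt/s
    [(10, 0.02), (20, 0.05), (30, 0.20)],   # path 2: saturates earlier
]
print(ideal_fidelity_bound(paths))  # 20: upper bound on per-sink delivery rate
```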
Chapter 4
Dual Radio Virtual Sinks
4.1. Introduction
Wireless sensor networks [5] [11] comprise emerging technologies that offer a low
cost, distributed monitoring solution for a wide variety of applications and systems.
One application driving the development of sensor networks is the reporting of
conditions within a region of interest where the environment can abruptly change
due to a sudden event, such as target movements on the battlefield, biochemical
attack or fire. This chapter focuses specifically on sensor systems that are to be
designed to efficiently deliver information during and immediately following an event
that triggers an abrupt change.
This chapter describes a strategy of handling sudden impulses of data, which
will otherwise move the sensor network almost instantaneously from light load to
overload, while maintaining application fidelity. Data must be delivered through
the sensor network quickly to a relatively small number of physical sink points that
are attached to the regular communication infrastructure. Sensor networks exhibit
a unique funneling effect where events are generated en masse and then have to
quickly move toward a sink point. The flow of events has similarities to the flow of
people from a large arena after a sporting event completes. This leads to a number
of significant challenges that include increased transit traffic intensity, congestion
and packet loss (and therefore energy and bandwidth waste) at nodes closer to the
sink. These have a detrimental effect on the operational lifetime of sensor networks.
The major limitation in the design of existing sensor networks is that they are
ill-equipped to deal with data impulses. In [82] [41], the authors show that existing
sensor network protocols and technologies experience and allow large packet loss
even under light to moderate load. What is needed is a well-planned, coordinated
(in a distributed and scalable manner) data exit strategy through a likely small
number of physical sinks.
We propose the novel idea of randomly distributing throughout the sensor field
a small number of all-wireless dual-radio virtual sinks (VSs) that are capable of
offering enhanced congestion avoidance services to the existing low-power sensor
network. While such special nodes can be exploited to support a variety of ap-
plication specific (e.g. aggregation, coding) and common network functions (e.g.,
storage, localized activation), we focus here on the ability to selectively siphon off
data events from regions of the sensor field under critical overload. In essence vir-
tual sinks operate as safety valves in the sensor field to divert selected packets from
congested areas in order to maintain the fidelity of the application signal (e.g., as
simple as events/sec, or more complex) at the physical sink, and to alleviate the
funneling effect.
In this chapter we call these specialized nodes virtual sinks (VSs) to distinguish
them from physical sinks that typically provide a gateway to the Internet via a
wireline interface. Virtual sinks are equipped with a secondary long-range radio
interface, such as IEEE 802.11, in addition to their primary low power mote radio.
Virtual sinks are capable of dynamically forming a secondary ad hoc radio network
that is rooted at a physical sink. Rather than rate controlling as in the case of
the congestion avoidance techniques such as those used by CODA [82] described in the previous chapter, virtual sinks take some traffic off the low-powered sensor network
(i.e., off the primary radio network) when persistent congestion is detected, and
move it to the physical sink using the secondary radio network.
The chapter is organized as follows. Section 4.2. presents the related work. Sec-
tion 4.3. discusses a number of important design considerations such as the funneling
effect, small world observations that can be exploited, and traffic redirection strate-
gies. Section 4.4. provides the detailed design of the Siphon algorithms. While
our design addresses overload traffic management in sensor networks we believe the
Siphon algorithms are more generally applicable to a broader class of new applica-
tions that exploit special nodes with additional capability (e.g., dual radio, more
computational capability or more storage). Section 4.5. studies the performance
properties of Siphon using the ns-2 simulator, which is enhanced to support dual
radio virtual sink nodes. Experimental results from a Stargate implementation of
virtual sinks in a Mica sensor network testbed are reported in Section 4.6. Section 4.7. concludes.
4.2. Related Work
Event-based sensor networks generate impulse data traffic triggered by events of
interest. Large scale events (e.g. forest fires, earthquakes) can generate large im-
pulse waves of correlated data across a large area, creating a bottleneck down the
propagation funnel toward a sink even when the report rate is low.
Existing congestion control [82] [66] [76] schemes do not adequately address the
funneling effect and the type of congestion scenarios exhibited because of this effect.
Our prior work on CODA discussed in previous chapter is representative of the first
generation congestion control schemes. CODA provides a conservative solution to
mitigating congestion in sensor networks and assumes that all nodes are equal (with
the exception of the sink) in trying to counter and react to the onset of congestion.
CODA’s congestion control policy at source and forwarding nodes is to rate control
the traffic through a hop-by-hop backpressure mechanism as well as a closed-loop
multi-source regulation scheme during periods of persistent congestion. Thus, when
congestion occurs and the channel becomes saturated, the application fidelity [35],
which can be viewed as the application’s quality of service measured at the sink,
can be significantly degraded.
ESRT [66] regulates the reporting rate of sensors in response to congestion de-
tected in the network by monitoring the local buffer level of sensor nodes. In [76],
Hull et al. experimentally investigate the end-to-end performance of various con-
gestion avoidance techniques in a 55-node sensor network. They propose a strategy
called Fusion that combines three congestion control techniques that operate at dif-
ferent layers of the traditional protocol stack. These techniques include a version
of hop-by-hop flow control similar to CODA’s open-loop control, a source rate lim-
iting scheme similar to the adaptive rate control mechanism proposed in [75] that
meters traffic being admitted into the network, and a prioritized MAC layer that
gives a backlogged node priority over non-backlogged nodes for access to the shared
medium.
A collision-minimizing CSMA MAC is proposed in [70] that is optimized for
event-driven sensor networks. The authors propose to utilize a non-uniform prob-
ability distribution for nodes to randomly select contention slots such that colli-
sions between contending stations are minimized. This MAC can reduce packet loss
around hotspots but cannot completely resolve congestion due to the funneling
effect when the incoming traffic exceeds the node capacity and the queue overflows.
While CODA and other schemes are capable of avoiding congestion/collision and
costly packet loss and therefore energy waste, they do so at the cost of the maximum number of events that can be funneled to the sink. The fundamental question that
this chapter addresses is whether alternative or complementary solutions exist that
maintain the application fidelity during persistent overload conditions.
Recently, the idea of utilizing multiple coordinated radios operating over multiple
channels to improve and optimize wireless network capacity has been proposed. In
[83] the authors exploit the possibility of adding a second low-power radio of lower
complexity and capability into a node in a wireless LAN network to increase the
battery lifetime of the node. The main idea is to use the secondary lower-power
radio to wake up a node, allowing the node to shutdown the primary radio during
idle periods. In [53], the authors explore the implications of using multiple radios
on each node that work in an integrated manner to solve a number of existing
problems in wireless networking with a focus toward the energy management and
capacity enhancement issues in wireless LAN environments.
In [84], the authors propose leveraging processing and energy heterogeneity in
sensor networks to improve network performance, including a topology control
protocol which systematically shifts the network’s routing burden to energy-rich
nodes. More recently, in [85] the authors propose to extend the idea of exploiting
heterogeneity in sensor networks to use a modest number of line-powered back-
hauled nodes that connect to the wired backbone network. They prove analytically
that this approach increases network reliability and lifetime in a grid network. They
do not, however, consider congestion or the funneling effect in the network.
Finally, [86] [87] investigate the optimal radio transmission range that balances
the wireless capacity and network connectivity from a theoretical viewpoint. Results
suggest that the smallest transmission range that is just enough to assure network
connectivity is optimal. However, the papers do not consider the dual-radio config-
uration, where a separate secondary channel can increase the network capacity.
4.3. Design Considerations
A number of questions arise when studying the deployment of virtual sinks. What
is the optimal number and distribution of virtual sinks to minimize congestion and
energy consumption? Utilizing a longer-range radio is usually demanding in terms of
energy consumption. Therefore, one should only activate the secondary long-range
radio when it is needed. When does a virtual sink offer such hotspot services to local
sensors? How do sensors discover local virtual sinks? When congestion or overload
conditions occur which packets should be redirected onto the secondary long-range
network? How can sensor networks automatically benefit from the existence of
virtual sinks in their neighborhoods, but maintain uninterrupted services in their
absence using the existing congestion avoidance mechanisms, such as those discussed
earlier [82] [66] [76]? What if the virtual sinks cannot form a connected network
with the physical sink on their own? In what follows, we explore these questions
and discuss the technical considerations that underpin the design of Siphon. The
detailed design is presented in Section 4.4.
4.3.1 Funneling Effect
Conventional networks assume traffic flows in all directions. However, sensor net-
works exhibit a unique funneling effect where events are generated en masse and
then have to quickly move toward a relatively small number of physical sink points.
Figure 4-1 illustrates the funneling effect. Sensors within the range of an event
epicenter generate impulse data that travels along a propagation funnel toward a
sink when an event occurs. One or more physical sinks can exist at any location in
the sensor field to collect the event data from the active sensors. Sensors located
Figure 4-1: The funneling effect. Sensors within the range of an event region/epicenter (enclosed by the dotted ellipse) generate impulse data that travel along a propagation funnel (enclosed by dotted line) toward the sink when an event occurs.
within the propagation funnel between the event epicenter and the physical sink
will typically consume more energy. This leads to a number of significant challenges
that virtual sinks can help address.
First, the funneling effect places heavier load on sensors that are closer to a
sink point. As a result the sensors nearest the sink will use energy at the fastest
rate, significantly impacting the operational lifetime of the network. Second, traffic
intensifies at the neck of the funnel causing congestion, packet loss, and therefore
wasted energy and bandwidth. The aggregation of data events can help offset con-
gestion and the disproportionate amount of energy consumed by forwarding nodes
located nearer the sink by trading off computation and communications resources.
However, it is unlikely that aggregation techniques alone can completely resolve the
congestion problem and funneling effect. Because of the build up of traffic close to
the sink, loss of aggregated data packets is more likely. This can severely impact the
reporting capability (i.e., the fidelity) of the network to meet the application’s needs.
Aggregate packets not only represent the reporting of accumulated events from the network and are considered more "valuable" than non-aggregated packets, but they also consume more energy on average by the time they eventually reach the sink. This argues for priority treatment of
aggregated packets in the case of congestion.
4.3.2 Small World Observations and Shortcuts
By using specialized dual-radio nodes as virtual sinks, the second long-range radio
can serve the purpose of creating “shortcuts” in the sensor network among other
virtual sinks and one or more physical sinks. Our goal is to design control mecha-
nisms such that the network can automatically benefit from the existence of virtual
sinks in the neighborhood, in a way that the gain increases when the number of
these nodes increases, but to maintain uninterrupted baseline services even without
any of these nodes.
While a dual-radio sensor platform (e.g. Stargate [48]) is feasible, the cost is still
much higher than a single-radio platform (e.g., Berkeley motes series [23]). There-
fore, the cost of deploying a large number of dual-radio sensor platforms in sensor
networks is prohibitive. Recent “small world” studies conducted by Watts and Stro-
gatz [47] have shown that a small fraction of shortcut nodes randomly distributed
in a network is enough to effectively reduce the network diameter, resulting in a
fast distribution network. From this, we conjecture that only a small fraction of
shortcut nodes (i.e., virtual sinks) would be needed to create a fast secondary radio
distribution network for overload traffic.
To examine this conjecture, we simulate a sensor network of 100 nodes using the
ns-2 simulator, with nodes randomly distributed across a 350m x 350m square. The
Figure 4-2: Reduction of average distance in a network with increasing percentage of dual-radio nodes that provide the shortcuts between nodes.
transmission radii of the sensors are 30m for the primary low power, short-range
radio and 150m for the secondary long-range radio. We vary the number of dual-
radio virtual sink nodes from 1% to 30% of the total number of nodes in the sensor
field. A node is randomly selected as a virtual sink from the set of all nodes until we
have the designated fraction of virtual sinks in the network. The shortest distances
for the network are computed using the Floyd-Warshall algorithm [88] which has an
O(N³) complexity. We independently generate ten networks. Figure 4-2 shows the
average shortest distance and the maximum shortest distance in the network with
a varying fraction of virtual sinks. Both shortcut metrics are normalized against
results when there are no virtual sinks present. Each data point is an average
over the ten network configurations. Assuming physical sinks can be located at
any location in the network, the figure clearly shows that when only 5% of the nodes are virtual sinks, the average distance is halved. This is a very promising
result because it indicates that a small number of nodes can be used to form a
fast, low-diameter secondary distribution network that can serve to re-route traffic
to a physical sink. This result underpins the viability of the virtual sink concept,
showing that enhanced congestion control and load balancing in a sensor network can be implemented in a cost-effective manner.
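The experiment above can be sketched as follows. The field size, radii, and node count follow the text; the random seeding, unweighted hop counting, and restriction of the average to connected node pairs are assumptions of this sketch, and it is not the ns-2 setup itself.

```python
# Hypothetical re-creation of the small-world experiment: a random geometric
# network of 100 nodes in a 350 m x 350 m field (30 m primary radio range),
# with a fraction of dual-radio virtual sinks linked by 150 m shortcuts, and
# hop distances computed with Floyd-Warshall (O(N^3)).
import random

def avg_shortest_distance(n=100, field=350.0, r1=30.0, r2=150.0,
                          vs_fraction=0.05, seed=1):
    random.seed(seed)
    pts = [(random.uniform(0, field), random.uniform(0, field)) for _ in range(n)]
    vs = set(random.sample(range(n), int(vs_fraction * n)))  # virtual sinks
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]
            dist2 = dx * dx + dy * dy
            # primary short-range link, or long-range link between two VSs
            if dist2 <= r1 * r1 or (i in vs and j in vs and dist2 <= r2 * r2):
                d[i][j] = d[j][i] = 1  # count hops
    for k in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            dik = d[i][k]
            if dik == INF:
                continue
            for j in range(n):
                if dik + d[k][j] < d[i][j]:
                    d[i][j] = dik + d[k][j]
    finite = [d[i][j] for i in range(n) for j in range(n)
              if i != j and d[i][j] < INF]
    return sum(finite) / len(finite)

base = avg_shortest_distance(vs_fraction=0.0)
with_vs = avg_shortest_distance(vs_fraction=0.05)
print(with_vs / base)  # normalized average shortest distance
```

A production study would average over many random topologies, as the text does over ten networks, rather than a single seed.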
To ensure a reasonable probability of finding a local virtual sink in the neigh-
borhood of a propagation funnel, there is a lower bound on the number of virtual
sinks needed in the network. In Section 4.5.8 we present an analytical model to offer
insights into finding this lower bound.
4.3.3 Traffic Redirection and Prioritization Issues
The goal of virtual sinks is to steer overload traffic, i.e., redirect event packets
away from congested regions toward a physical sink. Figure 4-1 illustrates this
idea; virtual sinks behave as local sinks that “attract” part of the traffic from a
congested neighborhood, and send it over a long-range radio link that provides a
shortcut distribution path toward the physical sink. Since the secondary, long-range
distribution path usually involves shorter hop distances, delivery latency is likely
to be reduced especially when the secondary radio has higher bandwidth, as in the
case of the Stargate [48] platform.
Utilizing long-range radio is usually expensive in terms of energy consumption;
therefore a virtual sink should only activate its secondary radio when it is necessary
and beneficial. Assuming nodes can measure and quantify their local congestion
levels, the point at which sensors should start (or stop) using available local virtual
sink services must be determined. Further, there is a need to develop redirect
algorithms capable of redirecting designated traffic (e.g., data impulses) as early in
the propagation funnel as possible. That is, data impulse traffic will be redirected
to a virtual sink at the farthest upstream locations within the propagation funnel,
possibly even under lower congestion levels. This would effectively divert part of the
traffic to nearby virtual sinks (if they exist) to counter the funneling effect. Section
4.4.2 presents an efficient scheme to address this issue.
Another important issue is deciding what portion of the local forwarding traffic
virtual sinks should redirect. As a general approach, nodes will maintain a dynamic
list of event packet types, e.g., data types (possibly prioritized), aggregates, and
control messages. When congestion exceeds a certain threshold, a node that has
a virtual sink within l hops distance determines whether to redirect certain data
types to its local sink according to local policy. One example is only redirecting
data types associated with impulse data (e.g., seismic data) as opposed to periodic
events (e.g., temperature). Another policy would be to redirect prioritized or im-
portant target event data directly to the nearest virtual sink because its expedited
service can increase the timeliness and reliability of packet delivery. Traffic
redirection through virtual sinks may unavoidably introduce out-of-order data
delivery into the network, at least for a short period, during which packets sent
through the primary network before the redirection occurred reach the physical
sink after later packets delivered through the virtual sinks.
However, because sensor data are typically time-stamped [26] for the purposes of
local collaborations or aggregations, the impact of a short period of out-of-order
data delivery is minimal to the applications.
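A local redirection policy of the kind described above can be sketched as follows. The type names, threshold, and function signature are illustrative assumptions, not part of Siphon's specification:

```python
# Illustrative local redirection policy: impulse and prioritized traffic is
# diverted to a virtual sink under congestion; periodic data stays on the
# primary network. All names here are assumptions for illustration.
REDIRECT_TYPES = {"seismic_impulse", "priority_target"}   # impulse / prioritized
KEEP_ON_PRIMARY = {"temperature", "aggregate_report"}     # periodic, tolerant

def should_redirect(pkt_type, congestion_level, threshold=0.7, vs_within_scope=True):
    """Return True when this packet type should be diverted to a local VS."""
    if not vs_within_scope or congestion_level < threshold:
        return False
    return pkt_type in REDIRECT_TYPES

print(should_redirect("seismic_impulse", 0.9))  # True: impulse data, congested
print(should_redirect("temperature", 0.9))      # False: periodic data stays put
```

The policy table could equally be keyed on priority levels rather than data types; the decision structure is the same.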
4.3.4 Transparency and Compatibility Issues
Our goal is to design a general solution that is “application-agnostic”, i.e., to provide
a common set of fundamental building blocks that can be readily combined and in-
corporated into emerging and off-the-shelf sensor networking technologies, including
MAC [14] [16] [72], transport and routing [11] [89] protocols. Hence, it is important
to design a scheme that maintains maximum transparency and compatibility with a
wide variety of sensor technologies, as mentioned above. Furthermore, virtual sinks
can be incrementally deployed in sensor networks in response to different workloads
(i.e., traffic patterns, such as continuous event, and impulse-based offered load) and
fidelity needs.
One way to maintain maximum transparency is to use the same routing logic
on both primary and secondary radio networks, i.e. the networking layer views the
network as a homogeneous system, despite the existence of special dual-radio nodes.
A virtual sink simply advertises the lower of the routing costs calculated over its two
radio interfaces to its neighbors. In this approach, a virtual sink is likely to always be
utilized by its neighbors (even if there is no congestion) because of the shortcut
routes (hence lower routing cost) through the secondary radio network. Routing
protocols in sensor networks [11] [89] often take into account the link qualities of
the neighbors in calculating routing cost. Therefore, congestion will eventually be
reflected on the routing cost as it degrades the link qualities of neighboring nodes,
forcing data packets to be routed through other non-congested nodes, possibly a
virtual sink. While this is a simple approach to exploit virtual sinks, it lacks the
flexibility to fulfill our goals, specifically for the following reasons:
1. A virtual sink is not necessarily energy-rich; it might not be feasible to
continuously utilize the secondary longer-range radio for an extended amount
of time.
2. Because of a routing protocol's stability requirements as well as its signaling
overhead, routing cost updates occur on a relatively long time scale. Relying
on routing cost updates to route around a hotspot
could be too slow to deal with impulse data traffic since the congestion could
be short-lived.
3. Shortest paths or minimum cost paths are not always the best choice during
congestion, considering the funneling effect as discussed in Section 4.3.1. Our
goal is to divert data traffic in a timely manner to avoid congestion while
maximizing the application fidelity, even if it requires involving sub-optimal
paths through a virtual sink.
In this chapter, we investigate a different approach. We explicitly expose the
existence of special nodes, the virtual sinks, to the network to fully exploit the het-
erogeneity in the network. Different radio technologies are designed and optimized
for different usages. For example, the RFM [37] or Chipcon [46] radios used on
the Berkeley Mote platforms [23] are energy efficient, but support low data rates
in short range, while the IEEE 802.11 radio supports high data rate, long range
operations but consumes a large amount of energy. Because of the different radio
characteristics, the best routing strategy for different radio networks is likely to be
different. For example, the RFM radio’s lack of frequency diversity makes it vulner-
able to multipath fading and it suffers from time-varying link quality, while IEEE
802.11 radio offers a much better resistance to multipath channel fading because of
its spread spectrum design. Therefore, different routing metrics are likely to be used
in different radio networks to take advantage of, or to complement, the characteristics
of the different radio technologies. Section 4.4 presents an algorithmic approach that
supports the above features.
4.4. Siphon Design
The key aim driving the design of Siphon is to exploit the existence of virtual sinks
in the network to siphon overload traffic from the network, thereby mitigating the
funneling effect. The realization of this idea requires a protocol with the follow-
ing characteristics: (i) an energy-efficient mechanism to discover the existence of
virtual sinks in the neighborhood, (ii) an accurate congestion detection technique
to determine the correct time to utilize the discovered virtual sinks, and finally
(iii) a transparent mechanism to influence the dissemination path of event data. In
what follows, we discuss the detailed design of these components. While our design
addresses overload traffic management in sensor networks we believe the Siphon al-
gorithms are more generally applicable to a broader class of new applications that
exploit special nodes with additional capability (e.g., dual radio, more computa-
tional capability or more storage).
4.4.1 Virtual Sink Discovery and Visibility Scope Control
Like any new service, we envision Siphon may be deployed in an incremental fash-
ion, either for logistical reasons or in response to anticipated traffic characteristics.
Specifically, the physical sink might not be equipped with a secondary radio1. As a
result, there is no guarantee that the virtual sinks (VSs) can form a connected sec-
ondary network rooted at a physical sink through their long-range radio. Further,
due to the relatively sparse concentration of VSs required, as discussed in Sections
4.3.2 and 4.5.8, there is no assurance that a VS is adjacent to a congested region.
Consequently, a congested node requires a method to discover, in an energy-efficient
manner, the existence of a local VS that could be multiple hops away.
We propose an in-band signaling approach that embeds a signature byte into any
periodic control packets originated by a physical sink. In typical sensor network ap-
plications, a physical sink is required to send periodic signaling into the network for
management purposes. For example, Directed Diffusion requires periodic interest
refreshes [11], and in MultiHopRouter [89], a routing protocol included in TinyOS
1Siphon’s algorithms do not assume that the physical sinks have secondary radios and can operate even under these conditions. However, in the case of our application we generally assume that physical sinks are equipped with secondary radios. We do not assume, however, that a connected VS overlay exists that includes the physical sinks, even though this is likely under the applications we consider.
[19] for mote-based sensor networks, route control messages are periodically broad-
cast from each node in the network to estimate routing cost and monitor link quality.
In these cases the Siphon signature byte can ride for free, allowing for nearly zero-
overhead VS discovery. For applications that do not require periodic sink control
messages, an independent signature-byte application is invoked that broadcasts
low-rate (once every few minutes) VS signaling messages from a physical sink, resulting in
a small overhead. This overhead can be minimized through smart management at
the sink, as discussed in Section 4.4.2.2. As shown in the following discussion, the
embedded signature byte approach to VS discovery is also used for controlling the
visibility of the VS to its neighbors.
This signature byte contains a VS-TTL (virtual sink TTL) field that specifies
the scope (hop count) over which a VS is advertised. A VS-TTL of l allows nodes
up to l hops from a VS to utilize Siphon’s congestion mitigation services. Clearly,
a larger value of l allows more nodes to utilize a local VS, but increasing l does not
necessarily lead to better network performance. First, packets from nodes reached
only by a large l have longer paths to the VS and may not benefit from its use.
Also, a broad VS scope advertisement increases the chance of localized congestion
around a VS (where each VS potentially creates a funneling effect similar to the
original problem). On the other hand, a smaller value of l implies shorter redirect
paths, improving delivery latency and energy consumption, but confines the benefit
to fewer nodes. Section 4.5.5 investigates the tradeoffs involved in determining the
optimal value of l under different conditions.
The handling of signature byte messages is different for VS and non-VS nodes
in the network; the process flow for each case is outlined below. Note that physical
sinks that do not have a secondary radio broadcast Siphon control packets (i.e.,
any non-data packet that includes a Siphon signature byte) with the VS-TTL set to
NULL; otherwise, physical sinks set VS-TTL to l.
For VS nodes:
For any incoming packets,
IF (non-data packet)
IF (signature byte exists)
IF (packet arrives via secondary radio)
set VS-TTL to l;
ELSE
leave VS-TTL as NULL;
ENDIF
identify the forwarder of this packet,
set it as the next Siphon hop;
ENDIF
forward the packet through both radios;
ENDIF
Notice that VSs receiving control packets containing the Siphon signature byte
via their low power radios leave the VS-TTL as NULL and thus do not advertise
their presence to the neighborhood. Such a VS has no path to a physical sink via
its secondary network and, thus, other nodes derive no extra benefit by forwarding
packets through this node. However, the Siphon protocol definition allows for a
graph of VSs not connected to any dual-radio physical sink to carry traffic on its
secondary network. We evaluate this scenario in Section 4.5.7 and discuss whether
this yields any performance benefits.
For non-VS nodes:
For any incoming packets,
IF (non-data packet)
IF (signature byte exists)
IF (VS-TTL > 0)
identify the forwarder of this packet,
set it as a VS neighbor;
VS-TTL--;
ENDIF
ENDIF
forward the packet;
ENDIF
Note that the existence of a VS neighbor indicates a VS is located in the neigh-
borhood and can be reached through this specific neighbor. Through this procedure
a sensor maintains a list of neighbors through which neighborhood VSs are accessi-
ble. Because of the small fraction of VSs in the network (as governed by our small
world insights from Section 4.3.2), there is usually only one neighbor in the list.
Therefore, the memory overhead for maintaining a VS list is negligible. In fact, in
many cases the overhead could be reduced to a single bit in each neighbor entry of
the routing table.
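The two pseudocode fragments above can be restated compactly. The following sketch assumes dictionary-style packets with hypothetical field names (vs_ttl, via_secondary, forwarder); the on-wire layout of the signature byte is not specified here.

```python
# Compact restatement of the VS and non-VS signature-byte handling above.
# Field names are assumptions for illustration.
L_SCOPE = 2  # advertised visibility scope l, in hops

def vs_handle(pkt, state):
    """Signature-byte handling at a virtual sink (dual-radio node)."""
    if pkt["is_data"] or "vs_ttl" not in pkt:
        return pkt
    # Only a VS reached over the secondary overlay advertises itself.
    pkt["vs_ttl"] = L_SCOPE if pkt["via_secondary"] else None
    state["next_siphon_hop"] = pkt["forwarder"]
    return pkt  # caller then forwards the packet through both radios

def non_vs_handle(pkt, state):
    """Signature-byte handling at an ordinary single-radio sensor."""
    if pkt["is_data"] or "vs_ttl" not in pkt:
        return pkt
    if pkt["vs_ttl"]:  # None or 0 means no VS within scope
        state.setdefault("vs_neighbors", []).append(pkt["forwarder"])
        pkt["vs_ttl"] -= 1
    return pkt  # caller forwards the packet as usual

sensor_state = {}
ctrl = {"is_data": False, "vs_ttl": 2, "forwarder": "n7", "via_secondary": False}
ctrl = non_vs_handle(ctrl, sensor_state)
print(sensor_state["vs_neighbors"], ctrl["vs_ttl"])  # ['n7'] 1
```

As in the prose above, a control packet arriving at a VS over its low-power radio leaves VS-TTL as NULL (None here), so the VS stays invisible to its neighborhood.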
4.4.2 Congestion Detection
Accurate and efficient congestion detection plays an important role in the Siphon
framework inasmuch as it indicates the proper time a sensor should attempt to
utilize any VSs it has discovered. We describe two techniques for congestion detec-
tion control: (i) node-initiated congestion detection; and (ii) physical sink initiated
“post-facto” actuation of the VS infrastructure. In what follows we discuss the two
techniques and their application in Siphon.
4.4.2.1 Node-initiated Congestion Detection
In our previous CODA work discussed in Chapter 3 we describe a CSMA-based,
energy-efficient congestion detection technique where wireless receivers use a com-
bination of the present and past channel loading conditions, obtained through a
low-cost sampling technique, and the current buffer occupancy to infer congestion.
In Siphon, we adopt these mechanisms proposed in Chapter 3 to determine the local
congestion levels that a node is experiencing.
While the above congestion detection scheme is CSMA-based or contention-
based, the idea can be generalized to other MACs that are often used in sensor net-
works, including schedule-based [71] [72] and hybrid-based MACs (e.g., S-MAC [14],
T-MAC [16]). For pure schedule-based MACs that attempt to guarantee collision-
free communication, queue occupancy provides a good measure of the congestion
level. For hybrid-based MACs such as T-MAC [16], a good measure is a combina-
tion of the queue occupancy and the duty cycle length of the scheduled activity of
a node.
However congestion level is measured, when the local channel load approaches
or exceeds the theoretical upper bound of the channel throughput [82], or when the
buffer occupancy grows beyond a high water mark, a sensor node located within
the visibility scope of a VS will activate its redirect algorithm (see Section 4.4.3 for
details) to divert designated traffic (e.g., data impulses, prioritized traffic, etc.) out
of the neighborhood, utilizing the VSs. To best counter the funneling effect, it is
essential to redirect data impulses as early as possible in the propagation funnel.
However, in order not to diminish any possible aggregation effort of correlated data
in the network (aggregation is most effective deep in the funnel), it is beneficial to
redirect traffic later in the funnel. To achieve a balance, it is best to redirect data
at a location just before congestion is most likely to occur in the funnel. In Section
4.5.4 we verify this conjecture.
4.4.2.2 Physical Sink initiated “Post-Facto” Congestion Detection
As an alternative approach to the node initiated congestion detection discussed in
the previous section, we consider the “post-facto” activation of the VS infrastruc-
ture via congestion inference at a physical sink. The physical sink, as a point of
data collection in the funnel, can do smart monitoring of the event data quality and
the measured application fidelity [35], and initiate VS signaling only when the mea-
sured application fidelity is degraded below a certain threshold. In this approach,
the siphoning service is enabled only after congestion or fidelity degradation is mea-
sured in the primary low-power radio network. As such, the approach has limited
capabilities dealing with transient congestion deep in the network, but may be ad-
equate when congestion occurs closer to the physical sink. This technique has the
advantage of not requiring underlying congestion detection support at each node.
To propagate the signal in a timely manner from the physical sink, a control mes-
sage is broadcast through its non-congested secondary radio network (a connected
secondary network is required). Because the traffic siphoning in the “post-facto”
approach is based on the perceived performance measured at the physical sink but
not the congestion levels experienced in the network, we conjecture that it also has
the advantage of avoiding premature traffic siphoning especially when network-wide
aggregation [90] is used. In Section 4.6., we examine the effectiveness of employing
the “post-facto” congestion approach in a sensor network testbed.
4.4.3 Traffic Redirection
Traffic redirection in Siphon is enabled by the use of one redirection bit in the net-
work layer header. We consider two approaches in setting the redirection bit: (i)
on-demand redirection, in which the redirection bit is set only when congestion is
detected; and (ii) always-on redirection, in which the redirection bit is always set.
We discuss the tradeoffs of these two approaches in Section 4.5.6. The basic redirec-
tion mechanism is as follows. A sensor that receives a packet with the redirection
bit set forwards the packet to its VS neighbor, a process through which the redi-
rected packet would eventually reach a VS. If the redirection bit is not set then
routing follows the paths determined by the underlying data dissemination/routing
protocol.
When a VS receives a redirected packet, it forwards it to the neighbor through
which it most recently received a control message embedding the signature byte.
As discussed in Section 4.4.1, such control packets can arrive either through a VS’s
primary or secondary radio interface. In the best case, all VSs are connected to a
physical sink via the secondary network overlay, and all sink-bound packets routed
through a VS are forwarded on a fast track all the way to the physical
sink. When the secondary network is partitioned, the last VS (closest to the sink)
in the secondary network fragment must direct all sink-bound packets back on the
primary network, specifically to the sensor it has identified in the discovery phase.
From here, packets are again routed to the physical sink according to the default
routing paths.
Recent experimental studies [41] [89] show that sensor networks using low-power
radios often suffer from highly variant wireless link quality that is both time and
location dependent. To ensure that traffic siphoning through the VS infrastructure
does not degrade the network’s primary packet forwarding service, only neighbors
with good link quality are utilized to redirect packets to a VS. Many routing proto-
cols (e.g., MultiHopRouter [89]) maintain a neighbor table that includes a continu-
ously updated link quality estimation for a selected set of neighbors. When a sensor
located within the visibility scope of a VS detects congestion while forwarding event
packets, it makes a decision to redirect a specific type of data packet based on local
policy, as discussed in Section 4.3.3.
As a general policy rule for traffic redirection in Siphon, an alternate (redirect)
next hop neighbor must have a link estimate that is within 15% (a lower bound) of
the link estimate of the currently chosen next hop. Otherwise the VS should not be
utilized. If the redirect policy parameters are met, the congested sensor marks the
redirection bit in the routing header of the data packet being forwarded and redirects
it to a VS neighbor selected from its local list. Conformance with an appropriate
policy allows use of the VS infrastructure to improve application data fidelity by
bypassing funnel congestion, without potentially incurring an unacceptable level of
packet loss through the use of low-quality links to the local VS.
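The 15% rule can be sketched as a small selection function. Here link_quality maps a neighbor identifier to a link estimate in [0, 1]; the names and the quality scale are assumptions for illustration.

```python
# Sketch of Siphon's redirect policy: a VS neighbor is usable only if its link
# estimate is within 15% of the currently chosen next hop's estimate.
def pick_redirect_neighbor(current_next_hop, vs_neighbors, link_quality, margin=0.15):
    """Return the best qualifying VS neighbor, or None if none qualifies."""
    floor = link_quality[current_next_hop] * (1.0 - margin)
    candidates = [n for n in vs_neighbors if link_quality[n] >= floor]
    return max(candidates, key=lambda n: link_quality[n]) if candidates else None

lq = {"parent": 0.90, "vsA": 0.80, "vsB": 0.70}
print(pick_redirect_neighbor("parent", ["vsA", "vsB"], lq))  # vsA: 0.80 >= 0.765
```

With a 0.90 estimate to the current next hop, the floor is 0.765, so only vsA qualifies; if no neighbor clears the floor, the function returns None and the packet stays on the primary path.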
VSs offer shortcuts and possibly higher bandwidth pipes for data delivery in
sensor networks. Traffic siphoning through VSs may subtly impact the routing pro-
tocols operating in the primary and secondary network only if the routing metrics
used are sensitive to enhanced service characteristics, such as the delay or loss as-
sociated with the data delivery paths in the network. For example, data-centric
dissemination protocols such as Directed Diffusion [11] and variants of DSDV-based
routing protocol [84], are capable of choosing empirically good paths that dynam-
ically adapt to changing network conditions. These protocols are therefore delay-
sensitive and their routing decisions could be impacted by traffic siphoning in a
subtle way2. In Section 4.5.2 we discuss these interactions and propose a simple
2As a general comment, only protocols that base their routing decisions on the actual data
and elegant method to deal seamlessly with such behavior. As an example, we show
how Siphon interworks seamlessly with Directed Diffusion.
Using the same routing logic (e.g., Directed Diffusion) on both radio networks
has the benefit of simpler and more consistent control/management at the network
level. However, as discussed in Section 4.3.4, this approach limits the flexibility for
optimization. In general, different routing protocols use different routing metrics.
As discussed in Section 4.3.4, the primary network and the secondary network can
run different routing protocols that are completely independent and optimized for
each radio network. We conjecture this is the favorable approach when the two
radio networks have significantly different communication properties. In that case,
there is minimal interaction between the two networks once packets are redirected
to a nearby VS. For example, Stargate supports running AODV [93] on its IEEE
802.11 network while interfacing with a mote-based sensor network running the
single-destination MultiHopRouter protocol [89]. The routing decisions on both
networks are completely independent from each other and hence traffic siphoning
requires minimal or no interaction with the routing protocols. As a result, Siphon
works naturally with this approach.
4.4.4 Congestion in the Secondary Network
The traffic siphoning service is complementary to the first generation congestion
control schemes such as CODA and Fusion [76], and as such can be run in par-
allel with these techniques on the primary and secondary networks. When the
secondary network is also overloaded, traffic redirection through VSs offers little
benefit. Therefore, a VS always monitors its own congestion levels on both primary
delivery service perceived at the receiver are potentially impacted. Other protocols, which base their routing decisions on fixed routing metrics, such as shortest-path routing, geographical routing [91], or routing on a curve [92], are not affected.
and secondary radio channels and does not advertise its existence when either one
of its radio networks is overloaded. For the IEEE 802.11 radio (which we use in our
experimentation), Murty et al. [94] propose an algorithm to calculate the normal-
ized collision-induced bit error rate as part of their scheme to predict congestion
and dynamically adjust the MAC parameters for throughput optimization. We use
this technique as a reliable scheme to detect congestion on Siphon’s secondary IEEE
802.11 network. This forces a VS to refrain from offering service or to reduce its
scope of service according to the level of detected congestion.
When both primary and secondary networks are overloaded, the congestion lev-
els on both networks will eventually rise beyond certain thresholds. In that case,
CODA’s backpressure mechanism (or the similar mechanism in Fusion) will be
triggered (i.e., the system falls back to the traditional schemes that rate-control the
source and forwarding nodes to alleviate congestion [82]). In general, VSs are less
likely to be congested since they can send and receive packets at the same time
through the two different radios, in channels with different characteristics (fading,
throughput, delay, etc.).
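A VS's decision to advertise, scale back, or withhold its service can be sketched as follows. The linear scope-scaling rule is an assumption for illustration; the text above only requires that an overloaded VS refrain from advertising or reduce its scope of service.

```python
# Illustrative sketch: a VS shrinks its advertised scope as the load on either
# radio approaches the congestion threshold, and stays silent once overloaded.
def advertised_scope(l_max, primary_load, secondary_load, threshold=0.70):
    """Return the VS-TTL to advertise; 0 means do not advertise at all."""
    worst = max(primary_load, secondary_load)
    if worst >= threshold:
        return 0  # one of the radio networks is overloaded
    return max(1, round(l_max * (1.0 - worst / threshold)))

print(advertised_scope(3, 0.10, 0.20))  # lightly loaded: nearly full scope
print(advertised_scope(3, 0.80, 0.10))  # 0: primary network overloaded
```

Any monotone scaling rule would do here; the essential property is that congestion on either radio, not just the primary one, suppresses the advertisement.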
4.5. Performance Evaluation
We use packet-level simulation to obtain preliminary performance evaluation results
for Siphon. We also discuss the implications of our results on the design choices that
shape Siphon.
4.5.1 Simulation Environment
We implement Siphon as an extension to the ns-2 [60] simulator in its simplest in-
stantiation. First, to model a VS node we add support for a second long-range radio
interface that has a transmission range of 250m. The primary low-power radio used
in our simulations is configured to have a 40m transmission range to model a typical
sensor node. We use Directed Diffusion [11] as the routing core in the simulations,
which allows the simulations to shed light on Siphon’s interaction with a realistic
data routing model where congestion can occur. For data-centric dissemination pro-
tocols that utilize empirically good paths, such as Directed Diffusion [11], there is
a simple and elegant scheme for transparent traffic redirection through the use of
a delay device. We describe the specifics of this scheme, including the appropriate
local redirection rules, in Section 4.5.2.
Our simulations use the 2 Mbps IEEE 802.11 MAC provided in ns-2 with some
modifications. We add code to the MAC to measure channel loading using the epoch
parameters (N = 3, E = 200ms, α = 0.5), as defined in Chapter 3. The MAC of
a node sets a congestion flag at the routing agent when the measured channel load
exceeds a threshold of 70%. To perform early congestion detection, as discussed
in Section 4.4.2, it might be beneficial to trigger the congestion flag at a lower
threshold. We verify this in Section 4.5.4.
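The epoch-based channel-load estimator with these parameters (N = 3 samples per E = 200 ms epoch, α = 0.5, 70% flag threshold) can be sketched as follows; the carrier-sense sampling itself is abstracted into a list of busy/idle booleans, which is an assumption of this sketch.

```python
# Sketch of the epoch-based channel-load estimator used for congestion
# detection (parameters as in the simulations; sampling source abstracted).
ALPHA = 0.5
LOAD_THRESHOLD = 0.70

class ChannelLoadMonitor:
    def __init__(self, n_samples=3):
        self.n = n_samples  # carrier-sense probes per 200 ms epoch
        self.load = 0.0     # smoothed channel-load estimate in [0, 1]

    def end_of_epoch(self, busy_samples):
        """Fold one epoch's N busy/idle samples into the running estimate and
        report whether the congestion flag should be raised."""
        epoch_load = sum(busy_samples) / self.n
        # Exponentially weighted average over present and past channel loading.
        self.load = ALPHA * epoch_load + (1.0 - ALPHA) * self.load
        return self.load > LOAD_THRESHOLD

mon = ChannelLoadMonitor()
flags = [mon.end_of_epoch(s) for s in ([True, False, False],
                                       [True, True, True],
                                       [True, True, True])]
print(flags)  # the flag trips only after sustained high channel load
```

Because the estimate is smoothed over epochs, a single busy epoch does not trip the flag; sustained load does, which matches the intent of combining present and past channel loading conditions.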
In all our experiments, we use random topologies with different network sizes.
For each network size, our results are averaged over five different generated topolo-
gies and each value is reported with its corresponding 95% confidence interval. In
most of our simulations, we use a fixed workload that consists of six sources and
two physical sinks. A sink subscribes to three data types corresponding to three
different sources [11]. Note, however, that the network dynamics in the simulations
are non-deterministic because each sink subscribes at a random time to a set of
sources that is randomly chosen over different simulation sessions. Therefore, the
congestion periods and locations are also non-deterministic because of Directed Dif-
fusion’s ability to choose empirically good paths that are dynamically adapted to
the network conditions.
To model the large-scale, impulse type data traffic that is generated from an event
epicenter, all sources are located in a neighborhood of a node randomly selected from
nodes in the network. Sinks are uniformly scattered across the sensor field.
4.5.2 Delay Device and Directed Diffusion
Here we describe a scheme to seamlessly integrate Siphon redirection with data-
centric dissemination protocols using Directed Diffusion as an example. In Directed
Diffusion, the sources initially generate low rate data packets that are marked ex-
ploratory and disseminated through multiple paths toward the physical sink. Based
on the measured delivery performance, the sink later reinforces one or more empir-
ically good paths capable of delivering high quality data traffic, (i.e., with lowest
latency and highest fidelity delivery). Subsequently, the sources generate higher
rate data packets, no longer marked exploratory, which are transported along the
reinforced paths.
As mentioned in Section 4.4.3, when routing protocols are delay-sensitive, the
enhanced service offered by Siphon can affect routing decisions. This is certainly the
case for Directed Diffusion, which is used as the routing protocol for both primary
and secondary networks in our simulations. Specifically, exploratory data packets
traversing the low-delay paths provided by the VS secondary network will almost
certainly reach the physical sink before packets passing through the primary net-
work. As a result, paths using the VS secondary network will always be reinforced,
regardless of the congestion state of the network when using delay sensitive pro-
tocols such as Directed Diffusion. In Section 4.5.6, we discuss the merits of such
always-on operation of the secondary network. In general, however, a mechanism
that allows for the conditional (i.e., “on-demand”) usage of the low-delay VS paths
is required.
To this end, we implement a delay device on each VS that operates on the
secondary radio interface and is activated whenever a VS forwards an exploratory
data packet through the long-range radio. The device delays the forwarding of
exploratory data packets via the secondary radio by D seconds. D should be large
enough such that these exploratory data packets will not be the first to be delivered
to the physical sink, instead allowing packets to reach the physical sink via the
primary radio network first. For example, in our simulations we use the maximum
round-trip delay between two nodes that are furthest apart from each other in the
network as the value of D for the delay device. In this manner, paths on the primary
network (instead of the secondary network via VSs) will be reinforced by Directed
Diffusion, and this situation will persist while the network is in a non-congested
state.
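The delay device's behavior can be sketched as a small priority queue keyed on release time. The scheduling interface (submit/release_due and the packet fields) is an assumption of this sketch, not Siphon's implementation.

```python
import heapq
import itertools

class DelayDevice:
    """Sketch of the per-VS delay device: exploratory packets leaving on the
    secondary radio are held for D seconds so the primary network's copies
    win reinforcement while the network is uncongested."""
    def __init__(self, d_seconds):
        self.d = d_seconds            # e.g. the network's maximum round-trip delay
        self.enabled = True
        self._tiebreak = itertools.count()
        self._queue = []              # (release_time, seq, packet) min-heap

    def submit(self, pkt, now):
        """Return pkt to forward immediately, or None if it was delayed."""
        if self.enabled and pkt.get("exploratory") and not pkt.get("redirect_bit"):
            heapq.heappush(self._queue, (now + self.d, next(self._tiebreak), pkt))
            return None
        # Redirected exploratory data bypasses the delay, as described below.
        return pkt

    def release_due(self, now):
        """Pop every delayed packet whose D-second hold has expired."""
        out = []
        while self._queue and self._queue[0][0] <= now:
            out.append(heapq.heappop(self._queue)[2])
        return out

dev = DelayDevice(d_seconds=4.0)
held = dev.submit({"exploratory": True}, now=0.0)  # None: held for D seconds
fast = dev.submit({"exploratory": True, "redirect_bit": True}, now=0.0)
print(held, fast is not None, dev.release_due(4.0))
```

Disabling the device (enabled = False) corresponds to a VS that asserts active control and forwards exploratory data immediately, as discussed at the end of this section.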
When a node within the visibility scope of a VS detects congestion while for-
warding data packets, it takes action such that the VS secondary network becomes
more attractive to Directed Diffusion, allowing traffic to be siphoned from the con-
gested region through the VSs. Specifically, such a node selectively duplicates a
data packet (e.g., one in every fifty data packets), marks it exploratory (using the
Directed Diffusion exploratory bit) and sets the redirection bit (the same bit as
described in Section 4.4.3), and forwards it to its VS neighbors. Note that the
original packet is still forwarded along the existing routing paths during this period.
A VS receiving an exploratory data message with the redirection bit set will dis-
able the delay device and forward the message immediately through both interfaces
(assuming they have matching gradient entries [11]). Without the delay added by
the delay device, a dissemination path over the VS secondary network is likely to
be reinforced by the sink. Subsequently, high rate data will be redirected over the
secondary network, until the congestion ends. At that point, the node that had
originally signaled the congestion stops setting the redirection bit in data packets
it forwards through the VS. The VS will reinstitute the delay device, and Directed
Diffusion will ultimately again reinforce the best path(s) on the primary network.
Note that the low rate exploratory data message serves the purpose of probing for
new routes that are better (e.g., have lower latencies). If the VS’s long-range radio
link is overloaded, then delivery performance will suffer and it will not necessarily
be used. This self-balancing feature helps to prevent an overloaded VS from being
utilized.
The use of delay devices on the VSs provides an elegant way for Siphon to
seamlessly interact with Directed Diffusion. The delay device resolves any
undesirable behavior brought about by Siphon’s interaction with delay-sensitive
routing schemes such as Directed Diffusion. It can also be used by the VS
to assert active control. For example, a VS may selectively enable/disable the delay
device based on its own view of the environment or local conditions, such as the
amount of energy left. A VS that is not energy-constrained might disable the delay
device, thus enabling the siphoning function even when there is no congestion in the
neighborhood.
4.5.3 Energy Tax, Fidelity Ratio and Residual Energy
We define three metrics to analyze the performance of Siphon on sensing applica-
tions:
• Average Energy Tax - this metric is the ratio between the total number of packets dropped³ in the sensor network and the total number of packets received at the physical sinks over a simulation session. Since packet transmission/reception consumes the main portion of a node's energy, the number of wasted packets per received packet directly indicates the energy savings of Siphon compared to the case without it.
³Dropped packets include MAC signaling (e.g., RTS/CTS/ACK and ARP), event data, and Diffusion messaging packets.
• Average Fidelity Ratio - we define the data fidelity as the delivery of the
required number of data packets within a certain time limit. This metric is
the ratio between the average number of data packets received at a physical
sink when using Siphon and when using vanilla Directed Diffusion. The ratio
indicates the fidelity improvement or degradation from using Siphon.
• Residual Energy - we use the ns-2 energy model for the IEEE 802.11 network
to measure the remaining energy of each node at the end of a simulation.
This metric is calculated by normalizing the remaining energy to each node’s
initial energy. The residual energy distribution allows us to examine the load
balancing feature of Siphon and to estimate effective network lifetime.
We use these three metrics to evaluate and quantify the benefits of using Siphon
under different scenarios and configurations in the following sections.
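To make the definitions above concrete, the following sketch computes the three metrics from end-of-simulation counters. The function and variable names are ours, and the numbers in the usage lines are illustrative, not results from the thesis.

```python
def energy_tax(packets_dropped: int, packets_received: int) -> float:
    """Average Energy Tax: drops (MAC signaling, event data, Diffusion
    messaging) per data packet received at the physical sinks."""
    return packets_dropped / packets_received

def fidelity_ratio(recv_with_siphon: float, recv_with_diffusion: float) -> float:
    """Average Fidelity Ratio: a value above 1 means Siphon improved delivery."""
    return recv_with_siphon / recv_with_diffusion

def residual_energy(remaining: list[float], initial: list[float]) -> list[float]:
    """Per-node residual energy, normalized to each node's initial energy."""
    return [rem / init for rem, init in zip(remaining, initial)]

# Illustrative end-of-simulation numbers:
print(energy_tax(4200, 1400))       # 3.0 wasted packets per delivered packet
print(fidelity_ratio(1400, 1000))   # 1.4: a 40% fidelity improvement
print(residual_energy([72.0, 86.0], [100.0, 100.0]))  # [0.72, 0.86]
```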
4.5.4 Early Congestion Detection
Using fidelity and energy tax performance as a guide, we first search for a congestion
(channel load) threshold that will trigger traffic siphoning to best avoid congestion
in the funnel. We simulate a network of 30 nodes, where 2 nodes are randomly
selected as VSs, one of which is also selected as the physical sink. There is only one
VS within the network that can be utilized to redirect data traffic. Six nodes are
selected as sources according to the process described previously in Section 4.5.1.
Each source generates 15 packets per second sent toward the physical sink starting
at a random time distributed uniformly from 10 to 15 seconds into the simulation,
Figure 4-3: Early Congestion Detection. Different congestion level thresholds that can avoid congestion down the funnel. (Normalized fidelity ratio and energy tax plotted against the congestion level threshold, as channel load %.)
and run for 100 more seconds. In the simulations, we vary the congestion threshold
at which we should start redirecting data traffic to a nearby VS. Figure 4-3 plots
both metrics (fidelity and energy tax) against different congestion levels.
In the simulation, we strategically place the VS at a location within a few hops
of the propagation funnel toward the physical sink. Figure 4-3 shows that as long
as the VS is utilized for traffic siphoning, the data fidelity is improved regardless of
the congestion level threshold. However, the energy tax of the network rises quickly
when the threshold is set higher than 80%. In our simulations, we observe that a
channel utilization of 80% is where the channel saturates and suffers from frequent
collisions between neighboring nodes. Note that this is also the threshold chosen to
trigger CODA’s open-loop backpressure scheme in Chapter 3. This indicates that
when the threshold is set too high, it is too late to divert traffic at a location deep in the funnel. Considering that Siphon is a complementary scheme (to CODA) that prevents congestion by diverting traffic earlier in the funnel, Figure 4-3 indicates that a threshold slightly lower than the channel saturation level would be appropriate. For example, 70% is appropriate since the energy tax is only slightly
Figure 4-4: The impact of the visibility scope of a VS for a network of 30 nodes. (Fidelity ratio and energy tax saving plotted against the visibility scope of the virtual sink, in hops.)
higher than that incurred by the lower thresholds. Notice that utilizing a VS at a
lower threshold means its energy is more quickly drained.
While a high buffer occupancy can also serve as a good indicator for congestion,
we observe that, in our simulator, it grows at a much slower rate than the channel
load. In Section 4.6.2 we investigate an appropriate buffer occupancy level threshold
that best predicts congestion in our sensor testbed.
4.5.5 Virtual Sink’s Visibility Scope Impact
In what follows, we investigate the visibility scope of a virtual sink. We vary the
scope l from 1 to 5 and measure the fidelity ratio as well as the average energy tax.
In Figure 4-4, the energy tax is normalized such that it represents the energy tax
savings when using Siphon.
Figure 4-4 shows that for all values of l, the average fidelity ratio is larger than
1 (despite the high variability when l is larger than 2), indicating that fidelity can
be improved whenever a VS is utilized. However, the energy tax savings decreases
when l is larger than 2, and drops rapidly below zero, indicating that the nodes
actually consume more energy when they utilize and redirect data traffic to a VS
Figure 4-5: Fidelity and Energy Tax performance in a network where there are always-on virtual sinks. (Normalized ratio plotted against network size, in number of nodes.)
that is more than two hops away. Through careful examination of the details of our simulation, we observe that a visibility scope larger than 2 often creates local congestion around the VS, as more nodes within the funnel try to redirect data through the same VS. This causes frequent collisions and therefore more packet drops and
more retransmissions.
Figure 4-4 shows that when l is 2, both the fidelity gain (20%) and the energy tax saving (60%) are highest and have smaller confidence intervals, indicating that the optimal scope is l = 2.
4.5.6 Always-on versus On-demand Virtual Sinks
An always-on VS continuously powers up the secondary radio to help forward nearby
data traffic, regardless of the congestion conditions in the neighborhood. This can
help to enhance data delivery service in the field at the expense of consuming more
of its own energy. On the other hand, an on-demand VS will not power up its
long-range radio unless its visibility scope overlaps a congestion region. In what
follows, we specifically investigate the tradeoff of these two approaches. To model
Figure 4-6: Fidelity and Energy Tax performance in a network where there are virtual sinks that are put into service only when congestion is detected. (Normalized ratio plotted against network size, with additional curves for the partitioned secondary network scenario.)
an always-on VS, we simply disable the delay device on the node, as discussed in
Section 4.5.2.
Figures 4-5 and 4-6 present the fidelity ratio and energy tax saving performance of these two approaches in a set of networks of different sizes. Six sources and two physical sinks form two propagation funnels in the network. In each simulation, 5% of the nodes are randomly selected to be VSs, using a process similar to that discussed in Section 4.3.2. In this scenario, these VSs are uniformly distributed across the field and form a connected secondary network over long-range radios.
Figure 4-6 also includes another set of plots that present the scenario in which the VSs cannot form a connected secondary network, as discussed in Section 4.5.7.
Always-on VSs are utilized whenever they are able to deliver data events to a
physical sink with lower delay and higher fidelity. Figure 4-5 shows that Siphon is
able to obtain greater fidelity gain in a larger network, although the energy gain does
not follow the same trend. The fidelity gain increases almost linearly with increasing
network sizes, while the gain in energy tax levels off after a network size of 50 nodes.
This indicates that when the number of nodes in the network increases, the number
of dropped packets increases almost linearly because of longer propagation paths and a more intense funneling effect. But with Siphon, the VSs are able to siphon off events to maintain the fidelity level regardless of the linearly increasing number of packet drops. Without Siphon, the packet delivery service degrades linearly while the number of packets dropped (wasted) increases rapidly. This indicates that the energy tax of Siphon grows much more slowly than that of vanilla Directed Diffusion.
When the network size increases, Siphon can continue to obtain larger fidelity gains,
although the energy benefit obtained does not keep pace.
Figure 4-6 closely agrees with Figure 4-5, except with a much higher degree of variability (indicated by the error bars, which represent 95% confidence intervals). This indicates that instead of utilizing VSs in an always-on fashion, possibly exhausting their energy, on-demand VSs that power up the secondary long-range radio only in times of congestion may be preferred, since they achieve almost as good energy savings and fidelity improvement. Figure 4-6 also shows the efficacy
of our congestion detection scheme since it enables the nodes within the visibility
scope of a VS to correctly detect congestion and utilize the nearby VS. However,
the on-demand nature of this approach increases the dynamics and introduces more
disturbance into the network, hence the high degree of variability in the plot. This
result clearly illustrates the tradeoff between data delivery, service stability, and
energy consumption of the VSs.
4.5.7 Partitioned Secondary Network
If only a small number of VSs are deployed in the network or if the physical sink
does not support a secondary radio, then the VSs may not form a connected network
and the primary short-range channel must be used to deliver packets between VSs.
To model this, we move VS functionality from one of the physical sinks to another
node. This keeps the number of VSs constant while partitioning the secondary network. Each of the simulations described in the previous section is repeated. The
result is presented in Figure 4-6, which shows that both the fidelity and energy tax
gains are much smaller (and have higher variability) than their connected network
counterparts, especially for energy tax performance in smaller networks. For example, in a 30-node network, the energy tax is sometimes even higher than without siphoning, as indicated by the error bars of energy tax saving that include negative values in Figure 4-6. We observe that in smaller networks, the paths that connect VSs
through the primary channel often coincide with the original propagation funnels
toward the physical sinks. Although this could improve load balancing by diverting
traffic through VSs and their surrounding neighbors, it does not eliminate network
bottlenecks caused by funneling effect. This result suggests that a connected sec-
ondary network is required to reap a consistent benefit from traffic siphoning for
the purpose of congestion avoidance.
4.5.8 VS Density Impact
In Section 4.5.5 we demonstrate that a visibility scope of 2 hops is most appropriate, and Section 4.5.7 describes the disadvantages of a partitioned secondary network. These results influence the distribution of VSs in a network. Kumar and Xue [86] proved an asymptotic result for full connectivity in a randomly distributed wireless network: a wireless network of n nodes is asymptotically connected if each node connects to more than 5.1774 log n nearest neighbors.
This result provides an analytical foundation and an elegant strategy for connected ad hoc network deployment. Considering a radio communication range of r and using the uniform i.i.d. node placement assumption in [86], we can derive that a randomly distributed network of N nodes can fully cover an area and guarantee full connectivity therein if it fulfills the constraint:

Area = Nπr² / (5.1774 log N).    (4.1)

Figure 4-7: Number of sensor nodes required to ensure connectivity in the corresponding areas of network coverage, as well as the number of VSs (right vertical axis) required to ensure performance improvement.
Meanwhile, considering a visibility scope of l hops, we can calculate the number of VSs required in this network deployment as:

N_vsink = Area / (π(lr)²) = N / (5.1774 log N · l²).    (4.2)
Using Equations 4.1 and 4.2, one can determine the number of sensors and VSs required to populate a designated area of interest. Figure 4-7 plots the above expressions numerically against the number of sensors N. The radio communication range of a sensor is r = 40 m, the long-range radio communication range of a VS is 250 m, and the visibility scope l is 2 hops. With this specific setup, according
Figure 4-8: Fraction of Virtual Sinks needed to assure improved network performance. (Percentage of V-sinks plotted against the number of sensor nodes.)
to Figure 4-7, an area of 600 m × 600 m would require 1000 randomly distributed sensors to ensure network connectivity, while 16 VSs are enough to guarantee performance improvement from siphoning. To further understand the cost of siphoning, Figure 4-8 presents the fraction of VSs needed in this scenario. We observe that as the network size increases, the cost actually decreases; e.g., only 1.6% of the nodes need to be VSs for a 1000-node network.
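The two expressions can be evaluated numerically for this setup (r = 40 m, l = 2). A caveat on our part: in this sketch the logarithm is taken base-10 and the visibility scope enters squared (substituting Eq. 4.1 into Eq. 4.2 gives N / (5.1774 l² log N)); with these assumptions the sketch reproduces the 16 VSs and 1.6% cited above, so treat it as a plausibility check rather than the thesis code.

```python
import math

def sensors_area(n: int, r: float) -> float:
    """Eq. 4.1: area fully covered and connected by n randomly placed
    sensors with radio range r (log taken base-10 in this sketch)."""
    return n * math.pi * r**2 / (5.1774 * math.log10(n))

def vsinks_needed(n: int, l: int) -> float:
    """Eq. 4.2: number of VSs with visibility scope l hops needed to
    cover the same area, i.e. Area / (pi * (l*r)**2)."""
    return n / (5.1774 * math.log10(n) * l**2)

n, r, l = 1000, 40.0, 2
print(round(math.sqrt(sensors_area(n, r))))  # side of the covered square, roughly 570 m
print(round(vsinks_needed(n, l)))            # 16 VSs
print(100 * round(vsinks_needed(n, l)) / n)  # 1.6 (% of nodes)
```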
On the other hand, the connected secondary network requirement imposes a lower bound on the ratio between the transmission range of the long-range radio and that of the low-power radio. In Figure 4-9, we consider the network area covered by the secondary long-range radios of VSs with different transmission range ratios, and plot it against the number of VSs in the network. We observe that for a visibility scope of l = 2, when the transmission ratio is greater than 5, the network coverage of the required number of VSs is always smaller than the network coverage of the same number of VSs needed to ensure connectivity in the secondary network. In other
words, for a specific area, the number of VSs required for a visibility scope of 2 is
Figure 4-9: Requirement for a fully connected secondary network. The transmission range of the long-range radio is expressed as multiples of the transmission radius of the low-power radio. The visibility scope requirement that assures both energy tax and fidelity improvement is plotted as filled squares in the figure.
always larger than that required to ensure full connectivity in the same area. This
indicates that if the transmission range of a VS's long-range radio is at least 5 times that of its low-power radio, then the number of VSs required for a visibility scope of 2 also guarantees a connected secondary network. Therefore, Figures 4-7 and 4-8
provide an important and accurate roadmap for network deployment.
4.5.9 Load Balancing Feature
In what follows, we study the load balancing feature of Siphon in terms of its energy
impact on the network. We simulate a moderate-size network of 70 nodes, with three VSs scattered at random locations in the network. One of these VSs is selected as the physical sink, subscribing to six randomly designated sources that generate data at 10 packets per second. We measure the residual energy of each node at the end of each simulation and plot the average complementary cumulative distribution function (complementary CDF) of the residual energy distribution of the network in Figure 4-10.
Figure 4-10: Energy Distribution (Complementary CDF) of a 70-node network with 3 virtual sinks scattered randomly across the network. With Siphon's load balancing feature, more nodes share the energy load. Therefore, fewer nodes have residual energy larger than 85%, but more nodes have larger residual energy (e.g., the percentage of nodes having residual energy larger than 75% increases from 60% to 85%), effectively increasing the operational lifetime of the network.
Figure 4-10 shows that the minimum residual energy of the network increases
from 67% to 72%. In other words, all of the nodes have residual energy larger than 72% of their initial energy capacity. However, the plot also shows that the probability of any node having residual energy larger than 86% is zero, while without Siphon the probability is higher. This indicates that the maximum residual energy of the nodes decreases because more nodes are involved in forwarding packets under Siphon; thus, more nodes share the energy consumption. In other words, there are fewer very energy-rich nodes, but overall more nodes remain relatively energy-rich as a result of the load balancing feature of Siphon. Note that no node possesses residual energy of more than 88%; all nodes spend at least some energy.
This is because of the periodic interest flooding requirement of Directed Diffusion.
In summary, Siphon can balance the load in the network so that more nodes have
higher residual energy as more nodes share the energy load, effectively increasing
the operational lifetime of the network.
4.6. Sensor Network Testbed Implementation
In this section, we discuss the implementation of Siphon on a real sensor network
using the TinyOS platform [19] on Mica motes [23] and the Stargate platform [48].
We report evaluation results, including appropriate threshold values for congestion
levels that should trigger the traffic redirection, and an evaluation of Siphon in a
generic data dissemination application as compared to CODA [82]. We also evaluate
a “post-facto” approach that activates the VSs only after congestion has occurred
and impacted the application’s fidelity, as discussed in Section 4.4.2.2.
4.6.1 Stargate and Mica Mote Testbed
The sensor device (Mica) has an ATMEL 4 MHz, low-power, 8-bit microcontroller with 128K bytes of program memory and 4K bytes of data memory. The radio is a single-channel RFM radio transceiver operating at 916 MHz, capable of transmitting at 10 kbps using on-off keying encoding. The Stargate is a powerful single-board computer with enhanced communications and sensor signal processing capabilities that runs a version of embedded Linux. The Stargate uses Intel's 400 MHz X-Scale processor and supports serial communication with the Mica through a 51-pin connector. The Stargate also supports a PCMCIA slot in which we can install an IEEE 802.11b network card, enabling the Stargate to become a VS that talks to both the long-range IEEE 802.11 network and the short-range RFM radio network formed by the Mica motes.
We implement CODA’s open-loop control function that supports priority for-
warding of packets from a list of pre-defined data types. This includes a channel
load measurement MAC module as described in Chapter 3 on Mica. To support
Figure 4-11: A sensor network testbed of 30 nodes (three sources, one sink, and one virtual sink).
Siphon, we use Stargates as VSs and implement the traffic redirecting function as
well as the VS visibility scope control function on both Mica and Stargate platforms.
4.6.2 Congestion Detection for Traffic Redirection Decision
An important decision that must be made when using Siphon is the congestion
threshold at which it is appropriate to start redirecting data traffic to a nearby VS.
To determine this threshold experimentally, we deployed a 30-node sensor network
using the Mica motes, as shown in Figure 4-11. The topology is arranged to capture
the funneling effect described in Section 4.3.1.
4.6.2.1 Channel Load Threshold
Due to the application-specific nature of sensor networks, an appropriate choice for
the early congestion detection threshold must be based on application loss tolerance
parameters. The energy tax that we define in Section 4.5.3 is an appropriate metric
Figure 4-12: Early congestion detection threshold. An appropriate choice for the early congestion detection threshold must be based on application loss tolerance parameters. (Energy tax plotted against channel load %, for a single stream and for three streams.)
to measure the loss tolerance of an application because it represents the number of drops in the network per event delivered at the physical sink. In the experiment, we set up a node at one end to be a source and a sink at the other end (Figure
to different levels of congestion. Each scenario is repeated five times. We calculate
the average energy tax of the network and plot it against the average channel load
measurement of a node that is located at the middle position in the topology. Figure
4-12 shows that for a single stream of data traffic, the energy tax increases from 0 to 3 as the channel load increases from 10% to 30%. It increases exponentially when the channel load reaches 70%, which indicates that the channel is saturated⁴.
To capture a more realistic traffic pattern and congestion scenario, Figure 4-12 also plots the curve for three forwarding streams involving three different
⁴Note that this result closely matches the result reported in Chapter 3, in which we measured the β value that limits the channel throughput upper-bound of a Mica mote radio to be around 70%.
data sources disseminating data toward a sink (Figure 4-11). We observe that the two curves closely match each other in the channel load range of 30% to 70%. This indicates that our channel load measurement accurately reflects
the energy tax irrespective of the number of traffic streams in the network. This
is an important feature since a node is not generally aware of the number of traffic
streams in the network and should thus not be required to make routing decisions based on it (e.g., when to redirect the traffic to a nearby VS).
Following the rationale suggested by the simulation results in Section 4.5.4, a threshold slightly lower than the channel saturation level would be appropriate. According to Figure 4-12, a threshold between 60% and 70% should be
appropriate to trigger traffic siphoning to avoid congestion. However, considering
the much higher energy tax (around 10 for a channel load of 60%) of the RFM radio
network compared to the IEEE 802.11 network, we choose a smaller threshold, i.e.,
30%, for early congestion detection.
4.6.2.2 Buffer Occupancy Threshold
Queue management is often used in traditional data networks for congestion de-
tection, i.e., congestion is signified when a node’s buffer occupancy grows beyond a
high water mark level. However, as discussed previously in Section 4.5.4, we observe
through simulations that the channel load provides a much faster and more reliable indication of network congestion than buffer occupancy. While the same observation
holds true for our mote-based sensor testbed, we observe one exception: the time-varying channel occasionally suffers deep fades for an extended time period. During such periods, while the measured channel load is low, few packets can be delivered between forwarding nodes. Therefore, the queue of the sending node grows quickly (when link-ARQ is used) and eventually overflows and starts
Figure 4-13: Queueing performance and buffer occupancy threshold for congestion avoidance. (Packet delivery ratio and normalized buffer occupancy plotted against channel load/utilization.)
dropping packets. Based on this observation, it is beneficial to determine an appropriate buffer occupancy level that can reliably indicate congestion, in addition to the channel load indication.
In our testbed, we generate data packets at different rates and measure the aver-
age queue size of the nodes in a small neighborhood that share the wireless medium.
Figure 4-13 plots the measured normalized average buffer occupancy against the
channel load (utilization). We also plot the packet delivery ratio between neigh-
boring nodes in the same figure. We observe that the buffer occupancy is small
(≤ 15%) when the channel quality is excellent and the packet delivery ratio is high.
On the other hand, when the buffer occupancy is ≥ 20%, the packet delivery ratio drops quickly, signifying the onset of congestion. Based on this result, we set the
buffer occupancy threshold to 20% in our testbed for all experiments discussed in the next section.
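The two thresholds can be combined into a single redirection decision. The sketch below is our illustration (the function and constant names are assumptions, not testbed code): congestion is signaled when either the 30% channel-load threshold or the 20% buffer-occupancy threshold is crossed, so that deep-fade periods with low measured channel load are still caught by the queue check.

```python
CHANNEL_LOAD_THRESHOLD = 0.30      # chosen for the RFM radio testbed (Sec. 4.6.2.1)
BUFFER_OCCUPANCY_THRESHOLD = 0.20  # chosen from Figure 4-13 (Sec. 4.6.2.2)

def should_redirect(channel_load: float, buffer_occupancy: float) -> bool:
    """Signal congestion (and hence traffic redirection toward a nearby VS)
    when either indicator crosses its threshold. The buffer check catches
    deep-fade periods in which the measured channel load stays low."""
    return (channel_load >= CHANNEL_LOAD_THRESHOLD
            or buffer_occupancy >= BUFFER_OCCUPANCY_THRESHOLD)

print(should_redirect(0.10, 0.05))  # False: lightly loaded, shallow queue
print(should_redirect(0.45, 0.05))  # True: channel load above 30%
print(should_redirect(0.10, 0.35))  # True: deep fade, queue grows despite low load
```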
4.6.3 A Generic Data Dissemination Application
In what follows, we evaluate Siphon using a realistic data dissemination application
and compare the results with CODA's open-loop control. We reuse the sensor network
topology in Figure 4-11 to carry out the experiments, only this time we add two
Stargates into the sensor network testbed, one of which is also a physical sink.
The other is a VS that is placed at random locations in the testbed for different
experiments.
For every scenario, we collect data for three different cases:
• The vanilla application, without any congestion control/avoidance mechanism
• CODA open-loop control with priority support, in which one of the sources (Src-3) generates data packets with higher priority than the other two
• Siphon with one VS, placed at a random location for each experiment
Five independent experiments are conducted for each case and we calculate the
average energy tax savings and fidelity ratio. Both metrics are normalized to the
result obtained for the case without any congestion control/avoidance mechanism.
Figure 4-14 presents the results in bar charts. From the figure, we observe that
in terms of energy tax, CODA’s open-loop hop-by-hop backpressure scheme has
limited benefits in this scenario since the hotspot is far away from the sources and
the congestion is persistent. However, CODA's priority support for Src-3 improves both its energy tax (by up to 55%) and fidelity (by up to 200%). Siphon, on the other hand, improves both energy tax (by 12% to 68%) and fidelity (by 10% to 110%) for all sources.
Figure 4-14: Siphon performance in a real sensor network of 30 nodes (energy tax saving and fidelity ratio for Src-1, Src-2, and Src-3 under Siphon and CODA). Notice CODA's priority treatment of Src-3.
Figure 4-15: Post-facto traffic redirection versus the early-detection approach. (Normalized fidelity ratio and energy tax saving plotted against average traffic/channel load.)
4.6.4 Post-Facto Traffic Siphoning
As discussed in Section 4.4.2.2, a physical sink can infer congestion by monitoring the event data quality and enable "post-facto" traffic siphoning through the secondary network only when the measured application fidelity degrades below a certain threshold. We implement an application agent that analyzes, in real time, the event
data delivery ratio of each source on the physical sink. The agent calculates the
moving average of the data delivery ratio using a window of five seconds, and will
initiate VS signaling when the measured delivery ratio is lower than 60% for at least
10 seconds. Figure 4-15 presents the results as compared to its early-detection based
counterpart under different traffic loads.
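The agent's trigger logic can be sketched as follows; this is a simplified Python model with names of our choosing, not the actual sink-side implementation. It keeps a five-second moving average of the per-second delivery ratio and initiates VS signaling once the average has stayed below 60% for 10 consecutive seconds.

```python
from collections import deque

WINDOW = 5        # moving-average window, in seconds
THRESHOLD = 0.60  # delivery-ratio threshold
HOLD_TIME = 10    # seconds the average must stay below the threshold

class PostFactoAgent:
    """Sink-side agent that triggers post-facto siphoning (sketch)."""
    def __init__(self) -> None:
        self.samples = deque(maxlen=WINDOW)  # one delivery-ratio sample per second
        self.below_for = 0                   # consecutive seconds below threshold

    def observe(self, delivery_ratio: float) -> bool:
        """Feed one per-second delivery-ratio sample; return True when
        VS signaling should be initiated."""
        self.samples.append(delivery_ratio)
        avg = sum(self.samples) / len(self.samples)
        self.below_for = self.below_for + 1 if avg < THRESHOLD else 0
        return self.below_for >= HOLD_TIME

agent = PostFactoAgent()
# 20 seconds of degraded delivery (40%): signaling fires on the 10th second.
fired_at = next(t for t in range(1, 21) if agent.observe(0.40))
print(fired_at)  # 10
```

A healthy stream (delivery ratio above 60%) resets the counter, so transient single-second dips never trigger siphoning.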
Figure 4-15 shows that while the post-facto approach does not perform as well as
the early-detection approach under high traffic load scenarios (≥ 50% channel load),
it performs as well in the lower traffic load region. In fact, the post-facto approach
performs better than the early-detection approach at traffic loads lower than 30%.
We observe that under low traffic load, the network sometimes suffers from poor
connectivity or frequent collisions due to hidden terminals during the periods in
which both the measured channel load and buffer occupancy are low. As a result,
the measured data delivery ratio degrades and triggers post-facto traffic siphoning that improves subsequent data delivery, whereas in the early-detection approach the VS is not utilized because of the perceived low channel load and buffer occupancy.
4.7. Conclusion
There is a growing need for improved congestion control and load balancing support
in emerging sensor networks. The first generation of congestion avoidance mechanisms for sensor networks is effective at limiting packet loss due to congestion and allowing the network to find a stable operating point under increasing load. However, these
mechanisms are not sufficient to deal with the new types of congestion that are
an artifact of the funneling effect and a product of data impulse applications. In
this chapter, we have taken a new approach and proposed dual-radio virtual sinks.
We have discussed Siphon and its algorithms for deploying virtual sinks in sensor
networks. Siphon is evaluated using extensive simulations to gain insights into its
performance and properties under a variety of different conditions. Preliminary results from an implementation of Siphon in an experimental testbed, using Mica motes and Stargate virtual sinks, show that our approach provides substantial improvements over first-generation congestion control approaches.
Recent studies [53, 95, 83] show that smart utilization of multiple radios can increase network capacity, improve channel performance, or save energy.
In this chapter, we utilize dual-radio virtual sinks to counter traffic funnels caused
by impulse type applications, avoiding congestion and balancing load. As a broader
comment, our contribution is the exploration of general design principles that enable the exploitation of special nodes, such as dual-radio virtual sinks, to increase the resilience of sensor networks at affordable cost. The idea of using special nodes can
be pushed to a higher level of abstraction. For example, in this chapter we exploit
a virtual sink’s characteristic of longer transmission range. The same concept can
be extended to special nodes with higher transit bandwidth or larger storage space.
The same signaling mechanisms in traffic siphoning can be used for the different
kinds of exploitation mentioned above. Therefore, we believe Siphon's algorithms are broadly applicable to a class of new applications that exploit special nodes with additional capabilities (e.g., dual radio, more computational capability, more storage).
Chapter 5
Conclusion
5.1. The Critical Issue of Transport Resilience
This dissertation has focused on new transport and control paradigms for a class
of new wireless networks based on sensing and actuation. Sensor networks are
embedded in the real world and interact closely with the physical environment in
which they reside. These networks must be designed to effectively deal with the
network’s dynamically changing resources, including energy, bandwidth, processing
power, node density, and connectivity. Importantly, these sensor networks must be
designed to be responsive to such changing conditions while supporting a wide range
of traffic demands from sensors. Traffic demands in sensor networks differ from those of traditional networks because the injected traffic is strongly influenced by, and coupled to, changes in the physical environment that has been instrumented. Furthermore, sensor networks must deal with the adverse effects of uncertain and dynamic physical environments.
Because of these challenges we believe that sensor networks must be designed
and built with resilience as a primary design goal, and not, as in many cases in the current state of the art, as a secondary add-on. As a general architectural comment,
a resilient sensor network must operate autonomously, changing its configuration as
required and running algorithms that are optimized for node survivability and energy
usage. This thesis does not address the resilience issues across the broad architecture
(from radio, MAC, routing, aggregation, transport to application). Rather, we focus
on one aspect of this broader architectural problem: making the sensor network
transport systems much more resilient to changes (in many cases abrupt) in the
network resources and environment, and application traffic demands.
In this dissertation, we have argued that a resilient transport system is a funda-
mental building block of future sensor networks; it is fundamental to the operations
of the network, fundamental to the stability of the network, and finally, fundamen-
tal to the energy-conserving performance goals of the network. In this dissertation,
we define transport resilience as the ability to deliver a sufficient number of events
to meet the applications’ fidelity requirements for a set of different traffic patterns
(i.e., periodic, discrete, impulse) while minimizing the energy consumption of the
network. When our study began there was little in the literature related to transport
resilience. Our investigation identified two classes of transport resilience:
1. The need to reliably deliver data with minimal energy expenditure under var-
ious error conditions.
2. The need to maintain the fidelity of the signal delivered to the applications
under congested network conditions.
We addressed the first challenge of reliable data delivery in Chapter 2 and the
congestion problem in Chapter 3 and Chapter 4.
5.2. Reliable Delivery
The first contribution of this dissertation focused on proposing a new reliable de-
livery transport paradigm for sensor networks. The Pump Slowly Fetch Quickly
(PSFQ) protocol was the first reliable transport protocol proposed for wireless sensor networks. PSFQ is a lightweight, scalable, and robust transport protocol that can be customized to meet a wide variety of application needs (e.g., reprogramming, actuation, reliable event delivery). We specifically focused on one novel reliable data delivery application: remotely programming/re-tasking sensor nodes over the air. To the best of our knowledge, this was the first realization of such remote over-the-air programming capability for sensor networks. PSFQ thus represents an enabling technology for advanced applications that had not been feasible prior to its development. In Chapter 2, we presented
the design and implementation of PSFQ, and evaluated the protocol using the ns-
2 simulator and an experimental wireless sensor testbed based on Berkeley motes
and the TinyOS operating system. We showed that PSFQ can outperform existing
related techniques (e.g., an idealized SRM) and is highly responsive to the various
error conditions experienced in sensor networks.
Chapter 2 provided several important contributions to the problem of reliable
transport in sensor networks. First, we proposed and justified hop-by-hop error
recovery in which intermediate nodes also take responsibility for loss detection and
recovery, so that reliable data exchange is done on a hop-by-hop basis rather than
end-to-end. Second, we analyzed a simplified model of our NAK-based algorithm
and showed the optimal ratio between the timers associated with the forwarding
(pump) and retransmission (fetch) operations. Third, PSFQ exhibits a novel multi-
modal communication property that provides a graceful tradeoff between the packet
switching and store-and-forward paradigms, depending on the channel conditions
encountered.
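As an illustration of the pump/fetch timing at the heart of this design, the following sketch (a toy model in Python, not the authors' TinyOS implementation; the timer values, loss rate, and single-hop loss model are assumptions chosen for clarity) shows how a slow pump timer spaces out forwarding while a much faster fetch timer drives aggressive NAK-based recovery of missing fragments:

```python
import random

# Toy model of PSFQ's hop-by-hop recovery at a single intermediate node:
# fragments are "pumped" downstream slowly (T_PUMP ticks apart), while a gap
# in the fragment sequence triggers a "fetch" (NAK) to the previous hop,
# repeated on a much faster timer (T_FETCH ticks).
T_PUMP, T_FETCH = 10, 2          # pump slowly, fetch quickly
LOSS_RATE, NUM_FRAGMENTS = 0.3, 20

def transfer_file(seed=1):
    rng = random.Random(seed)
    cache, ticks = {}, 0
    # First pass: pump each fragment once over a lossy link.
    for seq in range(NUM_FRAGMENTS):
        ticks += T_PUMP
        if rng.random() > LOSS_RATE:
            cache[seq] = True
    # Fetch quickly: NAK each missing fragment until the cache is complete.
    while len(cache) < NUM_FRAGMENTS:
        missing = next(s for s in range(NUM_FRAGMENTS) if s not in cache)
        ticks += T_FETCH         # aggressive retransmission timer
        if rng.random() > LOSS_RATE:
            cache[missing] = True
    return ticks                 # total ticks to complete the hop transfer

print(transfer_file())
```

Because the fetch timer is an order of magnitude shorter than the pump timer, recovering a lost fragment is cheap relative to forwarding a new one, which is the intuition behind the pump-to-fetch timer ratio analyzed in Chapter 2.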
The results presented in Chapter 2 assume the data cache of each node keeps all
fragments of a file. They also assume a fixed sensor network where there is no node
mobility in the network. Future work in this area includes the investigation of cache
size limits and the impact of node mobility on the basic reliable transport protocol
design and operation. We also plan to explore other important transport issues in sensor networks utilizing the PSFQ paradigm; for example, we plan to investigate variants of PSFQ optimized for different metrics, such as “delay-sensitive” reliable delivery (e.g., increasing the degree of pipelining when pumping data packets, thereby speeding up data delivery) or adaptive “network density-aware” transport (e.g., modulating the pump/fetch ratio to take advantage of node redundancy in a high-density environment).
5.3. Congestion Control
Chapter 3 presented the design of an energy-efficient congestion control scheme for
sensor networks called CODA (COngestion Detection and Avoidance). We explored
a new objective function for traffic control in sensor networks that maximizes the
operational lifetime of the network while delivering acceptable data fidelity to sensor
network applications.
To enable the dynamic adaptation of sensor applications to the network condi-
tions while meeting the proposed objective function, CODA was founded on three
important distributed control mechanisms: (1) an accurate and energy-efficient congestion detection scheme; (2) a hop-by-hop backpressure algorithm; and (3) a sink-to-multi-source regulation scheme. In Chapter 3, we explored a number of congestion
scenarios and defined new performance metrics that captured the impact of CODA
on sensing applications’ performance. We discussed the performance benefits and
practical engineering challenges of implementing CODA in an experimental sensor
network testbed based on Berkeley motes. Both testbed and simulation results in-
dicated that CODA significantly improved the performance of data dissemination
applications such as Directed Diffusion by mitigating hotspots, and reducing the
energy consumption and fidelity penalty on sensing applications. These results are
very promising and provide a basis for further larger-scale experimentation.
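To make the backpressure mechanism concrete, the following toy model sketches how a congested node suppresses its upstream neighbors. The queue-occupancy congestion signal, the rate-halving response, and the three-source funnel topology are assumptions chosen for illustration; CODA's actual detection also samples channel load.

```python
# Toy model of hop-by-hop backpressure: a node that detects local congestion
# (here, queue occupancy above a threshold) emits a suppression message, and
# its upstream neighbors halve their send rate.
QUEUE_LIMIT, CONGESTION_THRESHOLD = 10, 0.8

class Node:
    def __init__(self, name):
        self.name, self.queue, self.rate = name, [], 1.0
        self.upstream = []

    def enqueue(self, pkt):
        if len(self.queue) < QUEUE_LIMIT:
            self.queue.append(pkt)
        if len(self.queue) / QUEUE_LIMIT >= CONGESTION_THRESHOLD:
            self.send_backpressure()

    def send_backpressure(self):
        # Suppression propagates one hop upstream; a throttled neighbor that
        # is itself congested would propagate the signal further.
        for nbr in self.upstream:
            nbr.rate /= 2.0

sink_side = Node("funnel")
sources = [Node(f"src{i}") for i in range(3)]
sink_side.upstream = sources
for pkt in range(9):                 # an impulse of events fills the queue
    sink_side.enqueue(pkt)
print([s.rate for s in sources])     # → [0.25, 0.25, 0.25]
```

In CODA proper, this open-loop, hop-by-hop suppression handles transient hotspots, while the closed-loop sink-to-multi-source regulation scheme handles persistent congestion.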
CODA represents the first comprehensive solution to the congestion problem in
sensor networks. That is not to say that CODA is the final answer to this prob-
lem. Many other open issues exist. As part of future work we are studying the
performance benefits of using CODA with reliable transport mechanisms such as
PSFQ. The results from Chapter 3 also highlighted significant problems associated
with the stable operation of sensor networks that complicate the design of more
effective congestion control mechanisms; the funneling effect dominates transport control design in sensor networks. As shown in Chapter 3, the funneling effect significantly limits the network's capacity to deliver high-fidelity data to
the applications. While first generation congestion control schemes such as CODA
are capable of avoiding congestion and costly packet loss (energy waste) in sensor
networks, they do so by using rate control techniques at sources and intermediate
sensors that limit the maximum number of events that can be transported to the
sink. This raised a significant problem that we addressed in Chapter 4 where we
explored a complementary solution to CODA that was capable of maintaining the
application fidelity during persistent overload conditions - something that CODA
was not capable of doing.
Chapter 4 introduced the concept of dual-radio virtual sinks. We proposed to
randomly distribute a small number of all-wireless dual-radio virtual sinks throughout the sensor field. In essence, these virtual sinks operated as safety valves in the
sensor field by selectively siphoning off overload traffic in order to maintain the fi-
delity of the application signal at the physical sink. A key attribute of virtual sinks
is that they are equipped with a secondary long-range radio interface, such as the
IEEE 802.11, in addition to their primary low power mote radio. Virtual sinks are
capable of dynamically forming a secondary ad hoc radio network. Rather than
rate-controlling packets during congestion, as is the case with CODA, virtual sinks
take the congested traffic off the low-powered sensor network and move it on to the
secondary radio network, transiting it to the final physical data sink.
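The per-packet forwarding decision that this siphoning implies can be sketched as follows. This is an illustrative model only; the field names, the redirection bit, and the scope flag are assumptions, not the Siphon implementation.

```python
# Toy model of the forwarding decision at a sensor node once virtual sinks
# have been discovered: if the congestion/redirection bit is set and a
# virtual sink is within scope, traffic is siphoned to it; the virtual sink
# then carries the packet over its long-range radio toward the physical sink.

def next_hop(pkt, node):
    """Choose the next hop; siphon to a virtual sink under overload."""
    if pkt["redirect"] and node["vsink_in_scope"]:
        return node["vsink"]      # enters the secondary long-range network
    return node["parent"]         # normal low-power mote-radio path

node = {"parent": "mote_7", "vsink": "vsink_2", "vsink_in_scope": True}
print(next_hop({"redirect": True}, node))   # overload: siphoned to vsink_2
print(next_hop({"redirect": False}, node))  # normal: routed to mote_7
```

A packet carrying the redirection bit is diverted onto the secondary radio network at the first in-scope virtual sink it reaches; otherwise it follows the ordinary low-power mote-radio route toward the physical sink.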
Chapter 4 explored algorithms for virtual sink discovery, selection, traffic tran-
siting and load balancing. We described the use of the Stargate [48] platform to
support an all-wireless virtual sink approach in our sensor network testbed. We
showed that a small number of virtual sinks is sufficient to significantly improve the data fidelity of the sensor network. Virtual sinks and the Siphon protocols
discussed in Chapter 4 represent a new direction in traffic overload management in sensor networks. In fact, one can design radically different protocol architectures using dual-radio systems, as indicated by the use of multiple radios in wireless mesh networks. We believe networks incorporating multi-radio platforms constitute a
promising direction for sensor networks. Future work will extend our model to more
than two separate radio networks and examine more affordable secondary radio sys-
tems. The interaction between the two radio systems in support of sensor traffic
needs further study. For example, we presented some initial ideas on pushing congestion from one radio network to the other. Such behavior could lead
to instabilities if the dual-radio network is not designed with these types of subtle
interactions in mind. We plan to explore these interactions and to build more robust
multi-radio network systems.
5.4. Endnote
There have been a number of significant advances in the area of protocol design
for sensor networks over the last few years. However, we believe that the issue of
transport resilience has been neglected in the design of these protocols. As a result, we argue, the usefulness of existing sensor networks, and their ability to operate in a stable and energy-efficient manner when resources, the environment, or traffic demands suddenly shift, is limited. Without building resilience into the transport, we believe the network will at best significantly underperform and at worst be unusable, for example, because of congestion collapse.
In this dissertation we have shown the types of problems that can emerge if
resilience is not built into the transport as a first class citizen. We discussed a
number of considerations that must be included when designing resilient transport
systems for sensor networks. Our contributions are three-fold. At the transport layer, we designed, implemented, and experimentally evaluated the PSFQ protocol for reliable data delivery in sensor networks. PSFQ is capable of successful operation even under conditions with significant packet losses. The last two contributions are linked by the goal of solving the congestion problem, which significantly limits the operation and rollout of sensor networks because of the unique funneling effect that these networks exhibit. We first presented the design, implementation,
and experimental evaluation of CODA to mitigate congestion in sensor networks.
While CODA’s simple control mechanisms are effective at suppressing congestion and allowing the network to operate at a stable point under varying traffic loads, they do so at the cost of limiting the application’s fidelity. The final contribution of the dissertation addresses this issue. Virtual sinks represent an enabling technology that allows sensor networks to quickly control congestion using techniques such as CODA while still meeting the application’s fidelity needs. We consider this interplay, controlled traffic transit under potentially high impulse loads while maximizing the fidelity of the network, a compelling problem that virtual sinks begin to address.
The work presented in this dissertation takes an experimental systems research approach, building small-scale testbeds and studying how PSFQ, CODA, and Virtual Sinks can scale to even larger distributed networked systems. The source code for PSFQ, CODA, and Virtual Sinks is freely available on the web [96] for further experimentation.
Collectively, we hope that PSFQ, CODA, and Virtual Sinks provide a set of suitable energy-efficient, robust transport building blocks that can serve as a foundation for building more resilient sensor networks.
Chapter 6
My Publications as a PhD Candidate
My publications as a Ph.D. candidate (2000-2005) are listed below. This list also
includes research papers that are indirectly related to the work presented in this
thesis, including the design and implementation of a seamless mobility solution
(Cellular IP) for mobile networks, and the investigation and comparison of various
micromobility protocols based on ns-2 simulations.
6.1. Journal Papers
• Chieh-Yih Wan, Andrew T. Campbell, and Lakshman Krishnamurthy. Pump-
Slowly, Fetch-Quickly (PSFQ): A Reliable Transport Protocol for Sensor Net-
works. IEEE Journal on Selected Areas in Communications, Vol. 23, No. 4,
pp. 862-872, April 2005.
• A. T. Campbell, J. Gomez, C-Y. Wan, S. Kim, Z. Turanyi, and A. Valko.
Internet Micromobility. Journal of High Speed Networks, Special Issue on
Multimedia in Wired and Wireless Environment, 11(3-4):177-198, September
2002.
6.2. Journal Papers under Submission
• Chieh-Yih Wan, Shane B. Eisenman, and Andrew T. Campbell. Congestion
Detection and Avoidance in Sensor Networks. IEEE/ACM Transactions on
Networking (in submission).
6.3. Magazine Papers, Review Articles and Book Chapters
• Chieh-Yih Wan, Andrew T. Campbell, and Lakshman Krishnamurthy. Reliable Transport for Sensor Networks. In Taieb Znati, Krishna M. Sivalingam, and Cauligi Raghavendra (Eds.), Wireless Sensor Networks, Kluwer Academic/Springer Verlag Publishers, Chapter 8, pp. 153-182, ISBN 1-4020-7883-8, May 2004.
• A. T. Campbell, J. Gomez, C-Y. Wan, S. Kim, Z. Turanyi, and A. Valko.
Comparison of IP Micro-Mobility Protocols. IEEE Wireless Communications
Magazine, Vol. 9, No. 1, pp 72-78, February 2002.
• J. Gomez, S. Kim, C-Y. Wan, Z. Turanyi, A. Valko, and A. T. Campbell.
Design, Implementation and Evaluation of Cellular IP. IEEE Personal Com-
munications, Special Issue on IP-based Mobile Telecommunications Networks,
Vol. 7, No. 4, pp. 42-49, August 2000.
6.4. Conference Papers
• Chieh-Yih Wan, Shane B. Eisenman, Andrew T. Campbell and Jon Crowcroft.
Siphon: Overload Traffic Management using Multi-Radio Virtual Sinks in
Sensor Networks. Accepted for publication in SenSys 2005.
• Chieh-Yih Wan, Andrew T. Campbell, and Jon Crowcroft. A Case for All-
Wireless, Dual Radio Virtual Sinks (poster abstract). In Proc. of Second
ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pp.
267-268, Baltimore, Nov 3-5, 2004.
• Chieh-Yih Wan, Shane B. Eisenman, and Andrew T. Campbell. CODA +
PSFQ + Virtual Sinks = Enabling Technologies for Resilient Sensor Net-
working (demo abstract). In Proc. of Second ACM Conference on Embedded
Networked Sensor Systems (SenSys 2004), pp. 308, Baltimore, Nov 3-5, 2004.
• Chieh-Yih Wan, Shane B. Eisenman, and Andrew T. Campbell. CODA: COn-
gestion Detection and Avoidance in Sensor Networks. In Proc. of First ACM
Conference on Embedded Networked Sensor Systems (SenSys 2003), pp. 266-
279, Los Angeles, November 5-7, 2003.
• Chieh-Yih Wan, Andrew T. Campbell, and Lakshman Krishnamurthy. PSFQ:
A Reliable Transport Protocol For Wireless Sensor Networks. In Proc. of First
ACM International Workshop on Wireless Sensor Networks and Applications
(WSNA 2002), pp. 1-11, Atlanta, September 28, 2002.
• S. Kim, C-Y. Wan, A. T. Campbell, J. Gomez and A. G. Valko. A Cellular
IP Testbed Demonstrator. In Proc. Sixth IEEE International Workshop on
Mobile Multimedia Communications (MOMUC’99), San Diego, California, 15-
17 November 1999.
6.5. IETF Internet Drafts
• Z. D. Shelby, D. Gatzounas, C-Y. Wan, and A. T. Campbell. “Cellular IPv6”, Internet Draft, draft-ietf-seamoby-cellularipv6-00.txt, IETF Mobile IP Working Group Document, November 2000.
• A. T. Campbell, S. Kim, J. Gomez, C-Y. Wan, Z. Turanyi, and A. Valko. “Cellular IP”, Internet Draft, draft-ietf-mobileip-cellularip-00.txt, IETF Mobile IP Working Group Document, December 1999.
• A. T. Campbell, S. Kim, J. Gomez, C-Y. Wan, Z. Turanyi, and A. Valko.
“Cellular IP Performance”, Internet Draft, draft-gomez-cellularip-perf-00.txt,
October 1999.
• A. T. Campbell, J. Gomez, C-Y. Wan, Z. Turanyi, and A. Valko. “Cellular
IP”, Internet Draft, draft-valko-cellularip-01.txt, October 1999.
References
[1] Chee-Yee Chong and Srikanta P. Kumar. Sensor Networks: Evolution, Opportunities, and Challenges. Proceedings of the IEEE, 91(8):1247–1256, August 2003.
[2] Ubiquitous Computing - The Third Wave in Computing. http://www.ubiq.com/hypertext/weiser/UbiHome.html.
[3] Mark Weiser. http://www.ubiq.com/weiser/.
[4] J. W. Gardner, V. K. Varadan, and O. O. Awadelkarim. Microsensors, MEMS and Smart Devices. Wiley, New York, 2001.
[5] G. Pottie and W. J. Kaiser. Wireless integrated network sensors. Communications of the ACM, 43(5):51–58, May 2000.
[6] Brett Warneke, Matt Last, Brian Liebowitz, and Kristofer S. J. Pister. Smart Dust: Communicating with a cubic-millimeter computer. Computer, 34(1):44–51, 2001.
[7] David Culler, Deborah Estrin, and Mani Srivastava. Overview of sensor networks. IEEE Computer, Special Issue in Sensor Networks, August 2004.
[8] Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, John Heidemann, and Fabio Silva. Directed diffusion for wireless sensor networking. ACM/IEEE Transactions on Networking, 11(1):2–16, February 2002.
[9] W. R. Heinzelman, J. Kulik, and H. Balakrishnan. Adaptive protocols for information dissemination in wireless sensor networks. In Proc. of the 5th Annual International Conference on Mobile Computing and Networking (Mobicom 1999), pages 174–185, 1999.
[10] John Heidemann, Fabio Silva, and Deborah Estrin. Matching data dissemination algorithms to application requirements. In Proceedings of the ACM SenSys Conference, pages 218–229, Los Angeles, California, USA, November 2003. ACM.
[11] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed diffusion: A scalable and robust communication paradigm for sensor networks. In Proc. of the 6th Annual International Conference on Mobile Computing and Networking (Mobicom 2000), pages 56–67, August 2000.
[12] Benjie Chen, Kyle Jamieson, and Hari Balakrishnan. An energy efficient coordination algorithm for topology maintenance in ad hoc wireless networks. In Proc. of the 7th Annual International Conference on Mobile Computing and Networking (Mobicom 2001), pages 85–96, July 2001.
[13] Y. Xu, J. Heideman, and D. Estrin. Geography-informed energy conservation for ad hoc routing. In Proc. of the 7th Annual International Conference on Mobile Computing and Networking (Mobicom 2001), pages 70–84, July 2001.
[14] W. Ye, J. Heidemann, and D. Estrin. An energy efficient MAC protocol for wireless sensor networks. In Proc. of the 21st International Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2002), pages 1567–1576. New York, June 2002.
[15] Joe Polastre, Jason Hill, and David Culler. Versatile low power media access for wireless sensor networks. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 95–107. Baltimore, November 3-5, 2004.
[16] T. V. Dam and K. Langendoen. An adaptive energy-efficient MAC protocol for wireless sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 171–180. Los Angeles, November 5-7, 2003.
[17] X. Wang, G. Xing, Y. Zhang, C. Lu, R. Pless, and C. Gill. Integrated coverage and connectivity configuration in wireless sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 28–39. Los Angeles, November 5-7, 2003.
[18] G. Veltri, Q. Huang, G. Qu, and M. Potkonjak. Minimal and maximal exposure path algorithms for wireless embedded sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 40–50. Los Angeles, November 5-7, 2003.
[19] TinyOS homepage. http://webs.cs.berkeley.edu/tos/.
[20] Philip Levis and David Culler. Mate: A tiny virtual machine for sensor networks. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS X), 2002.
[21] Philip Levis, Sam Madden, David Gay, Joe Polastre, Robert Szewczyk, Alec Woo, Eric Brewer, and David Culler. The emergence of networking abstractions and techniques in TinyOS. In Proc. of the First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI 2004), 2004.
[22] Roy Shea, Chih-Chieh Han, and Ram Rengaswamy. Motivations Behind SOS. Technical Report SOS2000-1, University of California Los Angeles, Networked Embedded Systems Lab, Los Angeles, CA, February 2004.
[23] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, and Kristofer Pister. System architecture directions for networked sensors. In Proc. of the 9th International Conf. on Arch. Support for Programming Languages and Operating Systems, pages 93–104, November 2000.
[24] Jason Hill and David Culler. Mica: A wireless platform for deeply embedded networks. IEEE Micro, 22(6):12–24, November/December 2002.
[25] moteiv: Wireless sensor networks. http://www.moteiv.com/.
[26] Jeremy Elson. Time synchronization in wireless sensor networks. Ph.D. dissertation, May 2003.
[27] S. Ganeriwal, R. Kumar, and M. B. Srivastava. Timing-sync protocol for sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 138–149. Los Angeles, November 5-7, 2003.
[28] Lewis Girod and Deborah Estrin. Robust range estimation using acoustic and multimodal sensing. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2001). Maui, Hawaii, October 2001.
[29] Nirupama Bulusu, John Heidemann, and Deborah Estrin. GPS-less low cost outdoor localization for very small devices. IEEE Personal Communications Magazine, 7(5):28–34, October 2000.
[30] S. Grime and H. F. Durrant-Whyte. Data fusion in decentralized sensor networks. Control Engineering Practice, 2(5):849–863, 1994.
[31] D. Guo and X. Wang. Dynamic sensor collaboration via sequential Monte Carlo. IEEE Journal on Selected Areas in Communications, 22(6):1037–1047, August 2004.
[32] D. Estrin, R. Govindan, J. Heidemann, and S. Kumar. Next century challenges: Scalable coordination in sensor networks. In Proc. of the 5th Annual International Conference on Mobile Computing and Networking (ACM Mobicom 1999), August 1999.
[33] Robert Szewczyk, Joe Polastre, Alan Mainwaring, John Anderson, and David Culler. An analysis of a large scale habitat monitoring application. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 214–226. Baltimore, November 3-5, 2004.
[34] Mark D. Yarvis, W. Steven Conner, Lakshman Krishnamurthy, Jasmeet Chhabra, Brent Elliott, and Alan Mainwaring. Real-world experiences with an interactive ad hoc sensor network. In Proceedings of the International Workshop on Ad Hoc Networking (IWAHN 2002), pages 143–151. Vancouver, British Columbia, Canada, August 2002.
[35] Sameer Tilak, Nael B. Abu-Ghazaleh, and Wendi Heinzelman. Infrastructure tradeoffs for sensor networks. In Proc. of First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), pages 49–58. Atlanta, September 2002.
[36] I. F. Akyildiz and I. H. Kasimoglu. Wireless sensor and actor networks: Research challenges. Ad Hoc Networks Journal (Elsevier), 2:351–367, October 2004.
[37] RF Monolithics. TR1000 916.50 MHz hybrid transceivers. http://www.rfm.com.
[38] SOS operating system. http://nesl.ee.ucla.edu/projects/sos.
[39] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, and S. Shenker. GHT: A Geographic Hash Table for Data-Centric Storage. In Proc. of First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), pages 78–87. Atlanta, September 2002.
[40] Chalermek Intanagonwiwat, Deborah Estrin, Ramesh Govindan, and John Heidemann. Impact of network density on data aggregation in wireless sensor networks. In Proc. of International Conference on Distributed Computing Systems (ICDCS), July 2002.
[41] Jerry Zhao and Ramesh Govindan. Understanding packet delivery performance in dense wireless sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 1–13. Los Angeles, November 5-7, 2003.
[42] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, 2(4), November 1984.
[43] D. Ganesan, B. Krishnamachari, A. Woo, D. Culler, D. Estrin, and S. Wicker. Complex behavior at scale: An experimental study of low-power wireless sensor networks. Technical Report UCLA/CSD-TR02-0013, Computer Science Department, UCLA, July 2002.
[44] Sridhar Pingali, Don Towsley, and James F. Kurose. A comparison of sender-initiated and receiver-initiated reliable multicast protocols. In Proceedings of the Sigmetrics Conference on Measurement and Modeling of Computer Systems, pages 221–230, New York, NY, USA, 1994. ACM Press.
[45] Philip Levis, Sam Madden, David Gay, Joe Polastre, Robert Szewczyk, Alec Woo, Eric Brewer, and David Culler. The emergence of networking abstractions and techniques in TinyOS. In Proc. of the First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI 2004), 2004.
[46] Chipcon Corporation. CC1000 low power FSK transceiver, April 2002. http://www.chipcon.com/files/CC1000 Data Sheet 2 1.pdf.
[47] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393:440–442, 1998.
[48] Stargate datasheet. http://www.xbow.com/Products/Product-pdf-files/.
[49] Chieh-Yih Wan, Andrew T. Campbell, and Lakshman Krishnamurthy. PSFQ: A Reliable Transport Protocol for Wireless Sensor Networks. In Proc. of First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2002), pages 1–11. Atlanta, September 2002.
[50] P. Mishra and H. Kanakia. A hop by hop rate-based congestion control scheme. In Proc. of the ACM SIGCOMM Conf., pages 112–123. Baltimore, MD, August 1992.
[51] W. Noureddine and F. Tobagi. Selective Backpressure in Switched Ethernet LANs. In Proc. of the IEEE GLOBECOM Conf., pages 1256–1263. Rio De Janeiro, Brazil, December 1999.
[52] C. Ozveren, R. Simcoe, and G. Varghese. Reliable and efficient hop-by-hop flow control. In Proc. of the ACM SIGCOMM Conf. London, UK, August 1994.
[53] P. Bahl, A. Adya, J. Padhye, and A. Wolman. Reconsidering wireless systems with multiple radios. ACM SIGCOMM Computer Communications Review (CCR), July 2004.
[54] COTS Dust - large scale models for Smart Dust. http://www-bsac.eecs.berkeley.edu/shollar/macro motes/macromotes.html.
[55] S. Floyd, V. Jacobson, C. Liu, S. McCanne, and L. Zhang. A reliable multicast framework for light-weight sessions and application level framing. IEEE/ACM Transactions on Networking, 5(6):784–803, December 1997.
[56] J. Zhao, R. Govindan, and D. Estrin. Computing aggregates for monitoring wireless sensor networks. In Proc. of the IEEE ICC Workshop on Sensor Network Protocols and Applications. Anchorage, AK, May 2003.
[57] C. E. Perkins and P. Bhagwat. Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 212–225, September 1994.
[58] S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu. The broadcast storm problem in a mobile ad hoc network. In Proc. of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking (Mobicom 1999), pages 151–162, August 1999.
[59] D. A. Maltz. On-demand routing in multi-hop wireless mobile ad hoc networks. Ph.D. dissertation, 2001.
[60] The Network Simulator - ns-2. http://www.isi.edu/nsnam/ns/.
[61] J. J. Garcia-Luna-Aceves and E. L. Madruga. The core-assisted mesh protocol. IEEE Journal on Selected Areas in Communications, 17(8):1380–1394, August 1999.
[62] S.-J. Lee, M. Gerla, and C.-C. Chiang. On-demand multicast routing protocol. In Proc. IEEE Wireless Communications and Networking Conf., pages 1298–1304, September 21-25, 1999.
[63] C. Ho, K. Obraczka, G. Tsudik, and K. Viswanath. Flooding for Reliable Multicast in Multi-Hop Ad Hoc Networks. In Mobicom Workshop on Discrete Algorithms and Methods for Mobility (DialM '99), August 1999.
[64] E. Pagani and G. Rossi. Reliable Broadcast in Mobile Multihop Packet Networks. In Proc. of the Third Annual ACM/IEEE International Conference on Mobile Computing and Networking, pages 34–42, September 1997.
[65] F. Stann and J. Heidemann. RMST: Reliable data transport in sensor networks. In 1st IEEE International Workshop on Sensor Net Protocols and Applications (SNPA). Anchorage, Alaska, USA, May 2003.
[66] Yogesh Sankarasubramaniam, Ozgur Akan, and Ian Akyildiz. Event-to-sinkreliable transport in wireless sensor networks. In Proc. of the 4th ACM Sym-posium on Mobile Ad Hoc Networking & Computing (MobiHoc 2003), pages177–188. Annapolis, Maryland, June 2003.
[67] S-J. Park and R. Sivakumar. Sink-to-Sensors Reliability in Sensor Networks. InProc. of the 4th ACM Symposium on Mobile Ad Hoc Networking & Computing(MobiHoc 2003). Annapolis, Maryland, June 2003.
[68] Jonathon hui and David Culler. The dynamic behavior of a data disseminationprotocol for network programming at scale. In Proc. of Second ACM Con-ference on Embedded Networked Sensor Systems (SenSys 2004), pages 81–94.Baltimore, November 3-5 2004.
[69] K. K. Ramakrishnan and R. Jain. A binary feedback scheme for congestionavoidance in computer networks. ACM Transactions on Computer Systems,8(2):158–181, May 1990.
[70] Y.C. Tay, Kyle Jamieson, and Hari Balakrishnan. Collision-minimizing csmaand its applications to wireless sensor networks. IEEE Journal on SelectedAreas in Communications, 22(6):1048–1057, August 2004.
[71] L. P. Clare, G. Pottie, and J. R. Agre. Self-organizing distributed microsensornetworks. In Proc. of SPIE 13th Annual Internl. Symp. on Aerospace/DefenseSensing, Simulation, and Controls (AeroSense). Orlando, FL, April 1999.
[72] V. Rajendran, K. Obraczka, and J.J. Garcia. Energy-efficient, collision-freemedium access control for wireless sensor networks. In Proc. of First ACMConference on Embedded Networked Sensor Systems (SenSys’03), pages 181–192. Los Angeles, November 5-7 2003.
[73] D. Bertsekas and R. Gallager. DATA NETWORKS, second edition. PrenticeHall, Upper Saddle River, New Jersey, 1992.
[74] J. Li, C. Blake, D. De Couto, H. Lee, and R. Morris. Capacity of ad hoc wirelessnetworks. In Proc. of the Seventh Annual International Conference on MobileComputing and Networking, pages 61–69, July 2001.
[75] Alec Woo and David Culler. A transmission control scheme for media access in sensor networks. In Proc. of the 7th Annual International Conference on Mobile Computing and Networking (Mobicom 2001), pages 221–235, July 2001.
[76] Bret Hull, Kyle Jamieson, and Hari Balakrishnan. Mitigating congestion in wireless sensor networks. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 134–147. Baltimore, November 3-5 2004.
[77] Bret Hull, Kyle Jamieson, and Hari Balakrishnan. Bandwidth management in wireless sensor networks (poster abstract). In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 306–307. Los Angeles, November 5-7 2003.
[78] Cheng Tien Ee and Ruzena Bajcsy. Congestion control and fairness for many-to-one routing in sensor networks. In Proc. of Second ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), pages 148–161. Baltimore, November 3-5 2004.
[79] P. Sinha, N. Venkitaraman, R. Sivakumar, and V. Bharghavan. WTCP: A reliable transport protocol for wireless wide-area networks. In Proc. of the 5th Annual International Conference on Mobile Computing and Networking (Mobicom 1999). Seattle, August 1999.
[80] G.-S. Ahn, A. T. Campbell, Andras Veres, and Li-Hsiang Sun. Supporting service differentiation for real-time and best effort traffic in stateless wireless ad hoc networks (SWAN). IEEE Transactions on Mobile Computing, 1(3):192–207, July-September 2002.
[81] K. Tang and M. Gerla. Reliable on-demand multicast routing with congestion control in wireless ad hoc networks. In Proceedings of SPIE 2001. Denver, August 2001.
[82] Chieh-Yih Wan, Shane B. Eisenman, and Andrew T. Campbell. CODA: Congestion Detection and Avoidance in sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 266–279. Los Angeles, November 5-7 2003.
[83] Eugene Shih, Paramvir Bahl, and Michael J. Sinclair. Wake on wireless: An event driven energy saving strategy for battery operated devices. In Proc. of the 8th Annual International Conference on Mobile Computing and Networking. Atlanta, GA, September 2002.
[84] W. S. Conner, J. Chhabra, M. Yarvis, and L. Krishnamurthy. Experimental evaluation of synchronization and topology control for in-building sensor network applications. In Proc. of 2nd ACM International Workshop on Wireless Sensor Networks and Applications (WSNA 2003), pages 38–49. San Diego, CA, September 2003.
[85] M. Yarvis, N. Kushalnagar, H. Singh, A. Rangarajan, Y. Liu, and S. Singh. Exploiting heterogeneity in sensor networks. In Proceedings of IEEE INFOCOM 2005 (to appear). Miami, FL, March 2005.
[86] Feng Xue and P. R. Kumar. The number of neighbors needed for connectivity of wireless networks. Wireless Networks, 10(2):169–181, March 2004.
[87] P. Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions on Information Theory, IT-46(2):388–404, March 2000.
[88] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press/McGraw-Hill, 1990.
[89] Alec Woo and David Culler. Taming the underlying challenges of reliable multihop routing in sensor networks. In Proc. of First ACM Conference on Embedded Networked Sensor Systems (SenSys 2003), pages 14–27. Los Angeles, November 5-7 2003.
[90] T. He, B. M. Blum, J. A. Stankovic, and T. Abdelzaher. AIDA: Adaptive application-independent data aggregation in wireless sensor networks. ACM Transactions on Embedded Computing Systems, 3(2):426–457, May 2004.
[91] J. C. Navas and Tomasz Imielinski. Geographic addressing and routing. In Proc. of the 3rd Annual International Conference on Mobile Computing and Networking (Mobicom 1997). Budapest, Hungary, September 1997.
[92] Badri Nath and Dragos Niculescu. Routing on a curve. In Proc. of the First Workshop on Hot Topics in Networks (HotNets-I). Princeton, NJ, October 2002.
[93] C. E. Perkins, E. M. Belding-Royer, and S. Das. Ad hoc on demand distance vector (AODV) routing. IETF RFC 3561.
[94] R. Murty, E. H. Qi, and M. Hazra. An adaptive approach to wireless network performance optimization. In Wireless World Research Forum (WWRF 11 Meeting). Oslo, Norway, June 10-11 2004.
[95] A. Adya, P. Bahl, J. Padhye, A. Wolman, and L. Zhou. A multi-radio unification protocol for IEEE 802.11 wireless networks. Technical Report MSR-TR-2003-44, Microsoft Research, July 2003.
[96] The Armstrong Project. http://www.comet.columbia.edu/armstrong.