PROGRAMMING OUTDOOR DISTRIBUTED EMBEDDED SYSTEMS
BY CRISTIAN M. BORCEA
A dissertation submitted to the
Graduate School—New Brunswick
Rutgers, The State University of New Jersey
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
Graduate Program in Computer Science
Written under the direction of
Professor Liviu Iftode
and approved by
New Brunswick, New Jersey
October, 2004
© 2004
CRISTIAN M. BORCEA
ALL RIGHTS RESERVED
ABSTRACT OF THE DISSERTATION
Programming Outdoor Distributed Embedded Systems
by CRISTIAN M. BORCEA
Dissertation Director: Professor Liviu Iftode
The next generation of computing systems will be embedded everywhere in the physical world.
These ubiquitous systems, deployed in a virtually unbounded number and dynamically con-
nected, will create outdoor computing environments. The thesis of my dissertation is that these
environments can be programmed to execute distributed applications using programming mod-
els and system architectures specifically designed to address their volatility, heterogeneity, and
scale.
My dissertation proposes Spatial Programming, a location-aware programming model for
outdoor distributed computing, and Smart Messages, a system architecture based on execution
migration that supports distributed computing over ad hoc networks of embedded systems.
Spatial Programming is a location-aware programming model that enables programmers to
easily develop distributed applications over dynamic networks of potentially mobile embedded
systems. Central to Spatial Programming is the concept of spatial reference, which defines a
virtual name space over networks of embedded systems using the expected locations and prop-
erties of these systems. Programmers use spatial references to access the content or services
provided by nodes in the network in the same way they use variables in a conventional program.
Similar to the mappings from virtual to physical memory in a conventional computer system, a
runtime system maintains mappings between spatial references and nodes in the physical space.
A Spatial Programming runtime is implemented on top of the Smart Messages system ar-
chitecture, which provides a cooperative execution environment in networks of embedded sys-
tems. A Smart Message is a user-defined distributed program that executes on nodes of interest
named by their properties and reached using explicit execution migration. Smart Messages rep-
resent an attractive alternative to traditional distributed computing based on message passing in
mobile ad hoc networks because they adapt quickly to highly dynamic networks and provide
support for deploying new applications in existing networks.
To demonstrate the feasibility of the proposed solutions, we have designed and implemented
a prototype system, and we have performed simulations for larger scale networks. The experi-
mental results for several applications executed over wireless networks of pocket PCs indicate
that Spatial Programming and Smart Messages are viable solutions for outdoor distributed com-
puting.
Acknowledgements
I would like to begin by expressing my gratitude to Liviu Iftode, my thesis advisor and mentor
for the past four years. His passion to conduct research that matters along with his enthusiasm
for exploring new ideas have been a constant source of inspiration for me. His guidance during
my time at Rutgers has been invaluable.
With Uli Kremer, I had countless discussions about the design of both Smart Messages and
Spatial Programming. I thank Uli for helping me focus on the high-level picture of my research.
His advice and feedback have greatly enhanced and strengthened the work. I also appreciate
very much the other two members of my thesis committee, Badri Nath and Yuanyuan Zhou, for
their support.
During my years at Rutgers, I certainly benefited from the interaction with many professors.
I especially enjoyed the system classes or seminars taught by Thu Nguyen, Rich Martin, and
Ricardo Bianchini. I also appreciated Ricardo’s skills on the soccer field, even though he was
usually my opponent.
The Smart Messages project involved the participation of a larger group of people. Special
thanks to Porlin Kang for his work in the implementation of Smart Messages and for being
always around to help. I also enjoyed a short, but very productive collaboration with Chalermek
Intanagonwiwat during his PostDoc at Rutgers.
I especially acknowledge all my colleagues from DiscoLab, and I would like to thank
Aniruddha Bohra, Murali Rangarajan, and Florin Sultan for providing me with feedback for
many practice talks.
To my friend Anda Iamnitchi, I owe my interest in pursuing a Ph.D. in the United States. I
also thank her for always being online to answer the many questions that I had.
I would like to thank all my Romanian friends at Rutgers who made my life in grad school
much more colorful than I initially hoped. Special thanks to Andreea and Cristi Francu who
helped me get through the difficult period of adapting to a new country. I would like to thank
Cristi Popescu for the useful advice that he constantly offered. Costel Serban has been a great
friend for such a long time, starting with our undergrad years in Romania and continuing all
this period at Rutgers.
I am deeply grateful to my brother for helping me with everything I asked him during the
last six years. Knowing that I can rely on him has provided me with peace of mind in many
situations. Last, but certainly not least, I am indebted to my parents for everything I am.
This work was supported in part by the NSF grant ANI-0121416.
Dedication
To My Family
Table of Contents
Abstract
Acknowledgements
Dedication
List of Tables
List of Figures
List of Abbreviations
1. Introduction
1.1. Thesis
1.2. Outdoor Computing Environments
1.3. The Programmability Problem
1.4. Dissertation Contributions
1.5. Related Work
1.6. Contributors to Dissertation
1.7. Dissertation Roadmap
2. Spatial Programming
2.1. Motivation
2.2. Overview
2.3. Space Regions
2.4. Spatial References
2.5. Reference Consistency
2.6. Space Casting
2.7. Spatial Reference Access Timeout
2.8. Defining New Space Regions
2.9. Creating/Removing Network Resources
2.10. Putting It All Together: Program Example
2.11. Summary
3. Smart Messages
3.1. Smart Messages Architecture
3.2. Cooperative Node Architecture
3.2.1. Virtual Machine
3.2.2. Local Injector
3.2.3. Scheduler
3.2.4. Admission Manager
3.2.5. Tag Space
3.2.6. Synchronization Mechanism
3.3. Smart Messages API
3.3.1. Creation
3.3.2. Migration
3.3.3. Synchronization
3.3.4. Setting Resource Requirements
3.3.5. Tag Space Access
3.4. Security Architecture
3.4.1. Access Control
3.4.2. Protection Domains
3.5. Application Examples
3.5.1. Background
3.5.2. SPIN using Smart Messages
3.5.3. Directed Diffusion using Smart Messages
3.6. Smart Messages Simulator
3.7. Simulation Results
3.8. Summary
4. Smart Messages Self-Routing Mechanism
4.1. Content-Based Migration
4.2. Application Examples
4.2.1. Selecting the Routing Algorithm
4.2.2. Dynamically Changing the Routing Algorithm
4.3. Implementing Routing Algorithms with Smart Messages
4.3.1. On-Demand Content-Based Routing
4.3.2. Geographical Routing
4.3.3. Proactive Routing using Bloom Filters
4.3.4. Rendez-Vous Routing
4.4. Simulation Results
4.5. Summary
5. Prototype Implementation and Evaluation
5.1. Smart Messages Implementation
5.1.1. Creating New Smart Messages
5.1.2. Memory Management
5.1.3. Lightweight Migration
5.1.4. Code Caching
5.1.5. I/O Tags for Interaction with the OS and I/O System
5.2. Smart Messages Evaluation
5.2.1. Cost of SM Creation
5.2.2. Cost of SM Migration
5.2.3. Tag Space Operations
5.2.4. Routing Algorithms
5.2.5. Application Case Study: EZCab
5.3. Spatial Programming using Smart Messages
5.4. Spatial Programming Evaluation
5.5. Experiences and Lessons Learned from Building our Prototypes
5.6. Summary
6. Conclusions
References
Vita
List of Tables
1.1. Traditional Distributed Computing Environments vs. Outdoor Distributed Computing Environments
3.1. Smart Messages API
5.1. Effect of Code Brick Size on createSMFromFiles
5.2. Effect of Data Brick Size on spawnSM and createSM
5.3. Cost of Tag Space Primitives for Application Tags
5.4. Cost of Reading I/O Tags
5.5. Completion Time for Routing Algorithms
List of Figures
2.1. How to Program Motion Sensors and Intelligent Cameras Deployed over Two Hills to Perform Distributed Object Tracking?
2.2. Analogy Between Spatial Programming and Two Traditional Programming Models
2.3. Example of Spatial References for Object Tracking in a Network Consisting of Motion Sensors and Intelligent Cameras Deployed over Two Hills
2.4. Example of Program using Spatial References
2.5. Reference Consistency Example: A Spatial Reference is Mapped to the Same System as long as this System Remains in the Same Space Region
2.6. Space Casting: The Same System is Referenced in Different Space Regions
2.7. Code Example for Spatial Reference Access Timeout
2.8. Dynamic Definition of a Relative Space Region
2.9. Spatial Programming Application for Object Tracking
3.1. Traditional Distributed Applications vs. Smart Messages Applications
3.2. Distributed Computing Using Smart Messages
3.3. Smart Message Code Example
3.4. Execution Path for the Above Smart Message
3.5. Cooperative Node Architecture
3.6. Application and I/O Tag Structures
3.7. SM Protection Domains for Tag Space Access
3.8. Access Control Example For Smart Message Family Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag)
3.9. Access Control Example For Single Originator Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag)
3.10. Access Control Example for Code-based Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag)
3.11. Implementation of SPIN with Smart Messages
3.12. Directed Diffusion using Smart Messages
3.13. SPIN using Smart Messages
3.14. Directed Diffusion - Multiple Smart Messages
3.15. SPIN - Multiple Smart Messages
4.1. Example of Smart Message Using Content-based Migration
4.2. Example of migrate Implementation
4.3. Dynamic Change of Routing Due to Application’s Requirements
4.4. Dynamic Change of Routing Due to Network’s Conditions
4.5. Example of On-demand Routing Implementation with Smart Messages
4.6. Lookup in Proactive Routing: An SM arrives at node A, looking for a “fire” tag. Applying the hash functions on “fire”, it concludes that the neighbors of C might know better about “fire”, and migrates to C. A lookup on node C leads to the conclusion that the “fire” tag exists on node F.
4.7. Rendez-Vous Routing with Smart Messages
4.8. Completion Time for Experiment 1
4.9. Bytes Sent in the Network for Experiment 1
4.10. Completion Time for Experiment 2
4.11. Bytes Sent in the Network for Experiment 2
4.12. Completion Time for Experiment 3
4.13. Bytes Sent in the Network for Experiment 3
5.1. Smart Message Transfer (Main Operations)
5.2. I/O Tag Example (Using GPS to Get the Current Location)
5.3. Cost of Data Brick Serialization
5.4. Cost of Data Brick De-Serialization
5.5. Effect of Code Brick Size on Single Hop Migration
5.6. Effect of Data Brick Size on Single Hop Migration
5.7. Network Topology for Routing Experiments
5.8. Route Discovery in EZCab
5.9. Cab Booking following a Route Discovery in EZCab
5.10. EZCab Prototype
5.11. Estimated Completion Time for EZCab
5.12. Implementation of Spatial References with Smart Messages
5.13. Example of Spatial Reference Access
5.14. Java Code for Intrusion Detection Application
5.15. The Network Topology for Intrusion Detection Application
5.16. Typical Camera Node with GPS Receiver Attached
5.17. Smart Message Code Breakdown for Intrusion Detection Application
5.18. Spatial Programming Runtime Library Code Breakdown
5.19. Execution Time for Intrusion Detection Application
List of Abbreviations
NES Networks of Embedded Systems
SP Spatial Programming
SM Smart Message
GPS Global Positioning System
VM Virtual Machine
AN Active Networks
Chapter 1
Introduction
1.1 Thesis
The thesis of my dissertation is that outdoor computing environments can be programmed to
execute distributed applications using programming models and system architectures specifi-
cally designed to address their volatility, heterogeneity, and scale.
1.2 Outdoor Computing Environments
Recent advances in technology will likely realize the vision of ubiquitous computing [85, 53],
where the physical world is populated with a vast number of heterogeneous embedded systems
that can sense, monitor, or control our surrounding environment. Unlike traditional embedded
systems, which have very limited resources and are used to execute simple, dedicated functions,
these emergent systems are much more powerful and can be programmed to perform a variety of tasks.
For instance, we are witnessing computers embedded in cars, video cameras, cell phones, or
even watches [6] that are powerful enough to run applications on top of reduced versions of
traditional operating systems.
So far, these systems have been mostly used for local computations. Most of them are,
however, equipped with short-range wireless network interfaces (e.g., IEEE 802.11, Bluetooth).
Hence, they can create mobile ad hoc networks of embedded systems (NES) which can be pro-
grammed to execute distributed applications. NES offer the opportunity to program a large
spectrum of distributed applications, ranging from simple data collection and data dissemina-
tion [42, 39, 34] to remote object tracking using robots equipped with video cameras [43] or
inter-car collaboration to improve the safety and fluidity of the traffic [4]. This type of
distributed application will soon span non-traditional computing domains, such as health care,
transportation, or homeland security. This huge potential will not be achieved, however, without
proper support for programming outdoor distributed applications.
Term of Comparison  Traditional Networks                Networks of Embedded Systems
Location            Indoor                              Outdoor
Nodes               Functionally Homogeneous            Functionally Heterogeneous
Operation           Under User’s Control                Unattended
Scale               Relatively Small                    Large
Topology            Stable                              Ad Hoc and Volatile
Resources           Known A Priori/Infrequent Changes   Limited A Priori Knowledge/Highly Dynamic
Table 1.1: Traditional Distributed Computing Environments vs. Outdoor Distributed Computing Environments
1.3 The Programmability Problem
This dissertation tries to answer the question: how to program outdoor distributed applications?
Most of the recent research in the NES area has focused on hardware, operating
systems, or network protocols for sensor networks [42, 71, 35, 39]. We believe that a crucial
challenge which has been only marginally tackled is how to program distributed applications in
outdoor computing environments. Developing outdoor distributed applications requires us to
understand the unique challenges posed by NES. Table 1.1 presents a comparison between
the networks used in traditional distributed computing and NES. Unlike traditional distributed
computing which takes place “indoors” over relatively small scale networks with stable con-
figurations, distributed computing over NES takes place “outdoors” over large scale networks
with highly dynamic configurations. Since NES are composed of a massive number of hetero-
geneous systems, which may be mobile and volatile, it is impossible to know the exact number
or location of various network resources over time.
To leverage the raw computing power provided by NES into distributed applications, we
need programming models and system architectures that are able to overcome the NES volatil-
ity, heterogeneity, and scale. Traditional distributed computing models (e.g., message passing,
shared memory) cannot satisfy this requirement because they have not been designed for out-
door computing environments. Writing distributed programs under these models is relatively
easy when the underlying networks are composed of functionally homogeneous nodes and
have stable configurations with acceptable delays. On the other hand, when the networks are
composed of functionally heterogeneous nodes and have volatile configurations with unknown
delays, such as NES, developing distributed applications becomes much more difficult.
Several basic assumptions in traditional distributed computing models render them unusable
in NES. Their main assumption is that data is transferred end to end between applications residing
on different nodes. One problem with end-to-end data transfers is that they may complete
very slowly, or may not complete at all, in volatile networks [84]. Even if the network topology
is stable, the wireless nature of communication (e.g., in sensor networks, the experienced
packet loss between two neighbor nodes is as high as 40-50%) will lead to the same problems.
Since applications have no control over the network, they are forced to wait indefinitely (or
until the connection times out) each time something goes wrong in the network. To be able
to adapt quickly to network volatility, applications need to regain control as soon as
possible.
Another problem with traditional end-to-end data transfers is that they do not allow in-
network processing in order to reduce the size of data transferred by applications [33]. Reduc-
ing the amount of traffic in the network is important in mobile ad hoc networks, such as NES,
since it leads to reduced bandwidth and energy consumption. Therefore, outdoor applications
would also benefit from the ability to perform in-network processing.
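The traffic argument can be made concrete with a small worked example. The following Python sketch rests on simplifying assumptions that are not taken from the dissertation: nodes form a single chain toward a sink, node i sits i + 1 hops away, and the application only needs the average of all readings. The function names and the sample readings are hypothetical.

```python
def end_to_end_cost(readings_per_node):
    """Values transmitted when every node ships its raw readings to the sink.

    Assumption: node i (0-based) is i + 1 hops away from the sink, so each
    of its readings is transmitted i + 1 times.
    """
    return sum(len(r) * (i + 1) for i, r in enumerate(readings_per_node))


def in_network_cost(readings_per_node):
    """Values transmitted when each hop forwards one running (sum, count) pair."""
    return 2 * len(readings_per_node)


def in_network_average(readings_per_node):
    """Merge local readings into a running (sum, count) while moving to the sink."""
    total, count = 0.0, 0
    # Walk from the farthest node toward the sink.
    for readings in reversed(readings_per_node):
        total += sum(readings)
        count += len(readings)
    return total / count


# Three sensor nodes with four temperature readings each (made-up values).
readings = [[20, 22, 21, 23], [19, 21, 20, 22], [25, 24, 26, 25]]
raw_cost = end_to_end_cost(readings)         # 4*1 + 4*2 + 4*3 transmitted values
aggregated_cost = in_network_cost(readings)  # one (sum, count) pair per hop
average = in_network_average(readings)
```

Under these assumptions the aggregated variant transmits 6 values instead of 24, and the gap widens as the chain gets longer or the nodes hold more readings.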
Traditional distributed computing assumes fixed bindings between names and node addresses.
This naming is too rigid for NES. After a fixed binding has been established during
the name resolution phase, an application is forced to contact the same node each time
it needs to access a resource of the same type. Commonly, name resolvers react slowly to
network changes, so applications may keep trying to contact a node long after this node has
become unreachable, even though nodes with similar resources exist in the network. To prevent
such a situation, more flexible naming is needed in NES. We believe that content-based
naming [39, 13] can provide a solution because it allows applications to contact any node that
has a certain resource.
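The contrast between fixed bindings and content-based naming can be illustrated with a minimal Python sketch. The `Node` class, the `resolve_fixed` and `resolve_by_content` helpers, and the addresses are hypothetical and serve only this illustration; they are not part of any system described in this dissertation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: an address, the resources it offers, and reachability."""
    address: str
    resources: set = field(default_factory=set)
    reachable: bool = True


def resolve_fixed(bindings, name, nodes):
    """Fixed binding: the name always maps to the address chosen at resolution time."""
    node = nodes[bindings[name]]
    return node if node.reachable else None


def resolve_by_content(resource, nodes):
    """Content-based naming: any reachable node offering the resource will do."""
    for node in nodes.values():
        if node.reachable and resource in node.resources:
            return node
    return None


nodes = {
    "10.0.0.1": Node("10.0.0.1", {"camera"}),
    "10.0.0.2": Node("10.0.0.2", {"camera"}),
}
bindings = {"camera": "10.0.0.1"}  # established once, during name resolution

nodes["10.0.0.1"].reachable = False  # the bound node moves out of range

fixed_result = resolve_fixed(bindings, "camera", nodes)
content_result = resolve_by_content("camera", nodes)
```

In this sketch the fixed binding fails once the bound node becomes unreachable, while the content-based lookup still finds the second camera node.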
Since content-based naming makes fixed addresses (e.g., IP) irrelevant, routing and
name resolution should be integrated in NES. Additionally, given the diversity of applications,
no single routing algorithm will provide good performance for all applications. Therefore, similar to
active networks [25, 59, 77], it would be desirable to let applications use the routing algorithm best suited
to their needs. For instance, an application may use geographical routing to reach a node with
known location, while another application may use content-based routing to reach a node with
a certain property.
In traditional distributed computing models, the programmers install the applications manually
on all the nodes involved in computation. Under this assumption, it is practically impossible
to deploy a new application in NES after the network deployment phase. Thus, we would
like to have an automatic method for deploying new applications in existing networks.
The conclusion of the above arguments is that traditional distributed computing models
cannot work in NES due to the unique characteristics exhibited by these networks. Currently,
the only alternative approaches are ad hoc and provide limited flexibility; they are designed for
specific classes of applications (e.g., querying the network for certain data) and can hardly ac-
commodate new applications or services after the network has been deployed. As the domain of
possible applications diversifies, there will be an increasing demand for a common distributed
computing platform to support arbitrary applications over NES. Such a platform has to support
simple development and rapid prototyping of new distributed applications. It also has to allow
applications to cope with the uncertainty encountered in NES (i.e., the network topology as
well as the resources at nodes are unknown a priori and can vary greatly over time).
1.4 Dissertation Contributions
Spatial Programming is a location-aware programming model that enables programmers to
easily develop distributed applications over dynamic networks of potentially mobile embedded
systems. Similar to the view of the network as a database (used in sensor networks), Spa-
tial Programming (SP) views the network as a single virtual name space. Central to SP is the
concept of spatial reference which defines a virtual name space over networks of embedded
systems using the expected locations and properties of these systems. In SP, network resources
(content or services provided by nodes) are accessed using spatial references in the same way
memory is accessed using variables in conventional programming. Similar to the mappings
from virtual to physical memory in a conventional computer system, a runtime system main-
tains mappings between spatial references and nodes in the physical space. For every access
to a spatial reference, the runtime system takes care of name resolution and binding, commu-
nication, and routing. We implemented a runtime system for SP on top of Smart Messages.
SP is presented in two papers that appeared in the 9th International Workshop on Future Trends of
Distributed Computing Systems (FTDCS 2003) [38] and the 24th International Conference on
Distributed Computing Systems (ICDCS 2004) [19], respectively.
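The analogy between spatial references and virtual memory can be sketched in a few lines of Python. This is only an illustration under stated assumptions: space regions are rectangles, a reference is bound to the first matching node, and a binding is kept for as long as the node stays in the region and keeps the property. `Region`, `Node`, and `SpatialRuntime` are hypothetical names and do not reproduce the API of the SP runtime.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical embedded system with a position and a set of properties."""
    name: str
    x: float
    y: float
    properties: frozenset


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular space region."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, node):
        return (self.x_min <= node.x <= self.x_max
                and self.y_min <= node.y <= self.y_max)


class SpatialRuntime:
    """Keeps the mapping from spatial references to nodes, loosely like a page table."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.bindings = {}  # (region, property) -> Node

    def access(self, region, prop):
        key = (region, prop)
        bound = self.bindings.get(key)
        # Keep the existing binding while the node still matches the reference.
        if bound is not None and region.contains(bound) and prop in bound.properties:
            return bound
        # Otherwise (re)bind to the first node that matches, if any.
        for node in self.nodes:
            if region.contains(node) and prop in node.properties:
                self.bindings[key] = node
                return node
        self.bindings.pop(key, None)
        return None


hill1 = Region(0, 0, 100, 100)
cam_a = Node("camA", 10, 10, frozenset({"camera"}))
cam_b = Node("camB", 50, 50, frozenset({"camera"}))
runtime = SpatialRuntime([cam_a, cam_b])

first = runtime.access(hill1, "camera")   # binds the reference to a node
second = runtime.access(hill1, "camera")  # same node while it stays in the region
cam_a.x = 500                             # the bound node leaves the region
third = runtime.access(hill1, "camera")   # the runtime rebinds the reference
```

The program only names a region and a property; which physical node answers is decided by the runtime, in the same way a program names a variable without knowing its physical memory address.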
The Smart Messages system architecture, based on execution migration, content-based
naming, and self-routing, has been designed specifically to support the development of arbitrary
distributed applications over NES. The SM computing platform assumes a decentralized
architecture, where nodes in the network act as peers. A Smart Message (SM) is a user-defined
distributed program which executes on nodes of interest named by their properties and reached
using explicit migrations. An SM carries its execution state (and possibly its code) during migrations
and self-routes at each intermediate node between two nodes of interest. SMs do not
make any assumptions about the underlying network configuration, except for minimal system
support provided by nodes, which includes a virtual machine and a name-based memory,
called tag space. The virtual machine offers a hardware abstraction layer for SM execution
which shields SMs from heterogeneous node configurations. The tag space offers a name-based
memory, persistent across SM executions. It consists of (name, data) pairs, called tags,
which are used for data exchange among SMs (application tags) or for accessing the local host
properties (I/O tags).
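The interplay of tag spaces and execution migration described above can be illustrated with a toy Python simulation. It is a sketch under stated assumptions and not the SM prototype: the `Node` class, `link`, `migrate`, and `collect` are hypothetical names, and the breadth-first search stands in for whatever routing an SM would actually carry out hop by hop.

```python
from collections import deque


class Node:
    """Hypothetical node with a tag space (tag name -> data) and neighbors."""

    def __init__(self, name, tags=None):
        self.name = name
        self.tag_space = dict(tags or {})
        self.neighbors = []


def link(a, b):
    """Create a bidirectional wireless link between two nodes."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def migrate(start, wanted):
    """Return the hop-by-hop path to the nearest node whose tag space satisfies
    the predicate `wanted`, or None. BFS is a stand-in for SM self-routing."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if wanted(node.tag_space):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in node.neighbors:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def collect(start, count):
    """Toy SM: visit `count` distinct nodes exposing a "temperature" tag.

    The readings list plays the role of execution state carried across
    migrations; the "visited" tag is left behind for later lookups.
    """
    readings, current = [], start
    while len(readings) < count:
        path = migrate(current, lambda t: "temperature" in t and "visited" not in t)
        if path is None:
            break  # no route found: control returns to the application
        current = path[-1]
        readings.append(current.tag_space["temperature"])
        current.tag_space["visited"] = True
    return readings, current.name


a, b, c, d = Node("a"), Node("b", {"temperature": 20}), Node("c"), Node("d", {"temperature": 24})
link(a, b)
link(b, c)
link(c, d)
readings, final_node = collect(a, 2)
```

Nodes of interest are named only by the tags they expose, and the program never refers to a node address.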
In networks of embedded systems, SMs represent an attractive alternative to traditional
distributed computing based on end-to-end message passing for several reasons. First, SMs
allow applications to adapt to highly dynamic network configurations. The routing code can
be instructed to return control to the application as soon as a route cannot be found or after
an application-set timeout. Since its execution state is already at the same node, the SM can
quickly adapt to changes in the network. Second, content-based routing provides the flexibility
to reach a node that offers a certain property in an application-controlled manner. Third,
SMs simplify the deployment of new applications in the network after the network deployment
phase has ended. A user can inject SMs at any node in the network, and the
SMs subsequently migrate their code each time the code is not cached at the node they are executing on.
Fourth, SMs can significantly reduce the amount of traffic generated by certain classes of
applications (e.g., by processing large amounts of data at their source).
To demonstrate the feasibility of SMs, we have designed and implemented the SM prototype
in Java over Linux. The SM system support is implemented within Sun Microsystems’
K Virtual Machine, which was designed specifically for resource-constrained devices (its size
is 160KB). The testbed used for the prototype’s evaluation consisted of an ad hoc network of
PDAs (HP iPAQs equipped with IEEE 802.11 wireless cards). The details of the SM prototype
are presented in a Computer Journal paper [44].
For larger scale evaluation, we have simulated SM-based implementations of existing applications
for sensor networks. The results are presented in a paper that appeared in the 22nd International
Conference on Distributed Computing Systems (ICDCS 2002) [21] and a chapter
in the Handbook of Sensor Networks [37]. We have also demonstrated the benefits of the SM
self-routing mechanism using different routing algorithms (e.g., content-based on-demand routing,
geographical routing, proactive routing with Bloom filters). This work resulted in a paper that appeared
in the 1st IEEE Conference on Pervasive Computing and Communication (PerCom 2003) [20].
For this research, we developed an event-driven simulator extended with support for SM execution.
The security issues and solutions for the SM architecture have been presented in a paper
that appeared in the 1st Workshop on Mobile Distributed Computing (MDC 2003) [86]. To
demonstrate the feasibility of the SM computing platform for real-world applications, we have
developed EZCab, an application for locating and booking free cabs in densely crowded traffic
environments, which resulted in a technical report [67].
1.5 Related Work
Recent projects [31, 73, 12, 15, 70] have presented programming models for ubiquitous/pervasive
computing. SP shares some of their goals, but its main design goal is to provide simple abstrac-
tions to program distributed applications for systems embedded in the physical space. These
abstractions decouple the access to network resources from the networking details.
Although geographical routing [48, 49] and content-based naming and routing [13, 39]
have been extensively studied, a simple and intuitive programming model that allows the user
to express the computation in terms of physical location and content (or services) provided by
nodes is still missing. SP offers such a model, and its runtime system takes advantage of these
routing algorithms (especially of those developed for ad hoc networks).
The “database” model [18, 55] for programming sensor networks is research complementary
to SP. For instance, TAG [55] defines an SQL-like language for sensor networks. Both SP
and TAG provide simple programming constructs that shield the programmer from the underly-
ing network. There are two main differences between SP and this work. First, the programmer
has fine-grained control over execution in SP, while TAG depends entirely on the compiler (i.e.,
essentially SP offers an imperative language, while TAG offers a declarative language). Sec-
ond, SP focuses on flexible abstractions that support programming for uncertainty in highly
dynamic networks, while TAG focuses on a set of queries executed efficiently in the network.
Abstract regions [54] and EnviroTrack [82] have a similar goal to SP. While SP provides
a high-level programming model for outdoor distributed computing, they offer high-level pro-
gramming abstractions for sensor networks. Abstract regions select a subset of nodes of interest
based on certain properties and allow programmers to apply various operations (e.g., maximum,
reduction) on these subsets. Unlike abstract regions, SP is able to access individual nodes in
the network in a consistent fashion. Additionally, its implementation based on SMs allows
simple deployment of new applications in existing networks and quick adaptability to network
conditions. EnviroTrack defines a distributed middleware that provides a convenient API for
programmers writing applications that track objects in the physical world. While EnviroTrack
is focused on a specific category of applications, SP provides a general programming model
that can be used for any type of distributed application over networks of embedded systems.
SensorWare [22] can be an alternative solution for the SP runtime system, especially in
networks composed of devices with extremely limited resources (e.g., sensor networks). Sen-
sorWare is similar to SMs in the sense that both SensorWare and SMs are systems based on
code migration. Therefore, both are suitable for re-programming the network. SMs, however,
offer the advantage of programming in a well known language (Java) which is supported on
many embedded systems today [1]. Also, the tag space abstraction provided by the SM ar-
chitecture and the SM self-routing mechanism simplify the implementation of the SP runtime
system.
The SM platform shares the idea of execution migration with process migration [58, 64, 11],
mobile agents [46, 29], and active networks [25, 59, 77].
Unlike process migration which has been used to increase performance or availability in
stable networks, the main goal of the SM platform is to provide flexible support for program-
ming distributed applications over highly dynamic NES. Additionally, process migration and
SM migration differ in two aspects. First, the SM migration is explicit (i.e., the programmer
decides when and where to migrate), while process migration is implicit (i.e., the system de-
cides when and where to migrate a process). Second, the SM architecture avoids one of the
most difficult problems in process migration: transferring the kernel state (e.g., sockets, file de-
scriptors). The SM platform does not transfer any kernel state because SMs interact with local
hosts through atomic operations performed on the tag space, and they do not explicitly open
communication channels.
SMs are influenced by the design of mobile agents. Similar to a mobile agent, an SM may
be viewed as an application that explicitly migrates between nodes of interest. Mobile agents,
however, name nodes by fixed addresses and commonly know the network configuration a pri-
ori, while SMs name nodes by content and discover the network configuration dynamically.
In contrast to mobile agents, SMs are responsible for their own routing at each node in the
path between two nodes of interest. This feature allows SMs to adapt quickly to changes that
may occur both in the network topology and the availability of resources at nodes. Further-
more, the SM system architecture is suitable for resource constrained devices since it defines a
lightweight system support at nodes, with most of the “intelligence” incorporated into SMs.
Although the SM computing platform (especially the self-routing mechanism) shares some
of the design goals and leverages work done in active networks (AN), it differs from AN in
several key features. A first difference comes from the problems they try to solve: AN target
improved performance for end-to-end data transfer in relatively stable networks, while the SM
platform helps the development of distributed applications on top of a new computing infras-
tructure which is significantly under-used due to the lack of programmability support. Unlike
AN, we define a computing model whereby several SMs can cooperate, exchange data, and
synchronize with each other through the tag space. In terms of migration, AN do not transfer
the execution state from node to node whereas the SM model does. The migration of the execu-
tion state for SMs trades off overhead for flexibility to react “on-the-spot” to adverse network
conditions.
Sensor networks represent the first attempt toward deploying large scale NES. Most of
the research in this area has focused on hardware [42, 71], operating systems [35], or net-
work protocols [39, 34, 17]. Even though sensor networks act primarily as huge distributed
databases [18, 56], more sophisticated applications might be needed in the future. Toward this
end, SensorWare [22] and Mate [51] have proposed solutions for network re-programmability.
The SM architecture takes one step further and proposes a distributed computing model that is
flexible enough to be implemented for nodes with very limited resources such as those encoun-
tered in sensor networks.
Among many projects that target the programmability of ubiquitous computing environments,
one.world [31] is similar to SMs in the sense that both consider migration an essential
mechanism to adapt to highly dynamic computing environments. Each application in one.world
has at least one environment that contains tuples (similar to tags in the SM architecture),
application’s components, and other nested environments. When needed, a migration moves a
checkpointed copy of an environment to another node. A significant difference between SMs
and one.world is that our work proposes a computing model based on execution migration,
while one.world uses migration just as a mechanism to adapt to changes (i.e., in their programming
model, the applications reside on nodes and communicate through remote event passing).
Another difference is that the SM architecture is more suitable for resource constrained devices
whereas one.world is designed for more powerful nodes.
SMs represent the underlying platform for Spatial Views, a high-level programming model
for networks of embedded systems, targeting their dynamic, space-sensitive and resource-
restrained characteristics. The core of the model is iterative programming over a dynamic
collection of nodes identified by the physical spaces they are in and the services they provide.
Hidden in the iteration is execution migration, as the main collaboration paradigm, constrained
by user specified limits on resource usage such as response time and energy consumption. A
Spatial Views prototype has been implemented and first results are reported in [62]. A Spatial
Views compiler with SMs as its target is currently being implemented.
The tag space bears some similarity with tuple spaces [24, 50]. While both offer persistent
shared memory for applications, the essential difference is that the tag space is local to each
node. Also, unlike tuple spaces, the tag space provides SMs with I/O tags for interaction with
the local OS and I/O subsystem. The concept of I/O tags shares the same goal as Linux
Procfs [7], which allows user-level programs to access certain kernel information.
Content-based naming has been recently presented for both the Internet [13, 83, 32] and
sensor networks [33]. SMs use content-based migration to reach the nodes of interest. This
high-level migration function implements routing algorithms which leverage work done for
mobile ad hoc networks [41, 68, 48].
Although security for both mobile agents [30, 47] and ad hoc networks [69, 36] has
been extensively studied, we have faced a new and more difficult problem: how to define a
security architecture for a system based on execution migration over mobile ad hoc networks?
Given the complexity of this problem, our current architecture provides solutions for protecting
the hosts against SMs and SMs against each other. It is much harder, however, to prevent
an SM from being tampered with by a malicious host. Since SMs have to execute at any host,
end-to-end authentication based on digital signatures or encryption of the entire message is not
possible. Hardware solutions [9, 66] represent an option, but they involve extra costs. Complete
software solutions are not known yet, but code confusion and encryption techniques have been
investigated [27, 76] in the context of mobile agents.
Coupled with security comes the issue of admission control at nodes. A significant amount
of research has been done to solve this problem for real time systems [78, 74] and active net-
works [28, 59]. Given that we did not want to limit the expressibility of the programming
language (e.g., SNAP [59]), our solution is based on user-provided lower bounds for resources
and non-preemptive execution. Each node has the flexibility to implement its own schedul-
ing and resource allocation policies which are typically integrated. These policies guarantee
enough resources to satisfy the lower bounds and let the SM migrate in case no more resources
are allocated. A problem that remains to be solved is how to protect the network, as a whole,
against malicious SMs that waste network resources, but respect the admission contract at each
node. TTL-based [25] or market-based [30] schemes offer possible solutions.
1.6 Contributors to Dissertation
Porlin Kang has contributed significantly to the current implementation of the Smart Messages
prototype [44]. Deepa Iyer has implemented a preliminary Smart Message prototype [21].
Phillip Stanley-Marbell has participated in the initial design of Smart Messages [79]. Akhilesh
Saxena has implemented the routing algorithm based on Bloom filters, which has been used
together with other routing algorithms to demonstrate the Smart Messages self-routing mech-
anism [20]. Gang Xu has implemented the protection domains and the corresponding API
for Smart Messages security architecture [86]. Peng Zhou has implemented a flooding-based
routing algorithm and the GUI for the EZCab application [67]. The following is a list of all
my colleagues who co-authored papers from which I used material in this dissertation: Porlin
Kang, Chalermek Intanagonwiwat, Deepa Iyer, Akhilesh Saxena, Gang Xu, Peng Zhou, Phillip
Stanley-Marbell, Kiran Nagaraja, Andrzej Kochut, and Tamer Nadeem.
1.7 Dissertation Roadmap
This dissertation is organized as follows. Chapter 2 describes Spatial Programming, a location-aware
programming model for outdoor distributed computing. In Chapter 3, we present
Smart Messages, a system architecture based on execution migration which provides system
support for Spatial Programming. The Smart Messages self-routing mechanism, which al-
lows applications to dynamically change the routing algorithm, is discussed in Chapter 4. The
prototype implementation and evaluation for Smart Messages and Spatial Programming are
presented in Chapter 5. The dissertation concludes with Chapter 6.
Chapter 2
Spatial Programming
This chapter presents the design of Spatial Programming (SP), a location-aware programming
model for outdoor distributed computing. We start with the motivation for a high-level programming
model that can shield the programmers from the complex networking aspects encountered
in NES, thus allowing them to focus on the algorithmic details of the applications.
SP is introduced through an analogy with conventional programming and distributed program-
ming using shared virtual memory. After presenting a short SP overview, we describe the
main SP concepts, including spatial references, space regions, reference consistency, and ac-
cess timeout. The chapter concludes with an SP application for distributed object tracking that
illustrates all these concepts.
2.1 Motivation
Massive networks of embedded systems (NES) will become common in the near future as the
trend of embedding “intelligence” everywhere in the physical world increases. These networks
can be programmed to execute a large variety of distributed applications. Traditionally, the
main focus of distributed computing has been on performance or availability. Instead, the focus
of distributed computing over NES will be on enabling the systems embedded in the physi-
cal world to perform collaborative tasks. This type of distributed computing is more difficult
than traditional distributed computing because the state of the network evolves continuously
over time. Therefore, it is practically impossible to know the network topology or the node
properties at any moment in time.
To motivate the need for a novel programming model for outdoor distributed computing, let
us consider a collaborative object tracking application as illustrated in Figure 2.1. For this ap-
plication, two types of nodes are assumed available across a given geographical region: motion
sensors and intelligent cameras. Each node is capable of determining its location (i.e., using
GPS [45] or other localization methods [71, 63]). The motion sensors remain static after deployment,
but the cameras can be mobile (e.g., carried by mobile robots [43]). The nodes may
fail or be deployed far from each other preventing them from participating in the computation.
Since motion sensors are less expensive, their number is significantly greater than the number
of cameras.
Figure 2.1: How to Program Motion Sensors and Intelligent Cameras Deployed over Two Hills to Perform Distributed Object Tracking?
A potentially mobile user can start an application (e.g., from a wireless-enabled PDA) that
performs object tracking across a given geographical region. This application checks the status
of motion sensors in the desired region. Each time motion is detected, the application turns on a
certain number of cameras located in the proximity of that sensor and instructs them to perform
collaborative object tracking in order to identify the object that triggered the motion sensor.
During this process, the application repeatedly accesses the selected cameras and uses the partial
results computed at each node to dynamically determine the next action. Once the object
tracking completes, the active cameras are turned off. This application emphasizes the main
question that any programming model for outdoor distributed computing has to answer: how
to program an unknown number of volatile embedded systems (i.e., mobile or even disposable)
to execute a user-defined application in a certain geographical area?
The task stated above is difficult and tedious to program using the traditional message pass-
ing programming model. Characteristics of message passing systems (e.g., Message Passing
Interface (MPI) standard [8]) include explicit management of communication with possible
deadlocks due to mismatched communication pairs and “all or nothing” semantics. The pro-
grammers would also have to take care of all the details involved in reaching the area of interest
and contacting the target nodes located there. This is not a trivial task in a volatile network with
unknown configurations. In our example, the programmer does not know how many camera
nodes there are, or where exactly they are located. Additionally, the network dynamics (caused
by failures, mobility, or deployment of new nodes) may cause the application to fail since fixed
addressing schemes treat exceptions as failures.
To simplify the development of distributed applications in NES as well as to allow for rapid
prototyping, we need a programming model that shields the programmers from most of the
networking aspects. A simple way to present the programmers with high level abstractions
for writing distributed applications is to use a declarative programming style. Declarative pro-
gramming, used mostly for querying databases, is goal-oriented in the sense that programmers
simply specify what they want instead of how to algorithmically obtain the results. Multiple so-
lutions for programming sensor networks illustrate this programming style in NES [18, 55, 56].
The “database” model of programmability suits sensor networks well because these networks
act primarily as large “distributed databases” for the environments where they are deployed.
Despite its simplicity, declarative programming is not a panacea for every type of task
or NES. Imperative programming is more appropriate for complex tasks that go beyond data
collection, especially tasks whereby algorithmic details matter. Also, networks composed of
more powerful nodes (e.g., systems in cars, cell phones, intelligent cameras, mobile robots [6,
43]) cannot be programmed in a simple and effective way without having fine-grained control
over individual network resources. To summarize, a programming model for NES needs to
answer the following questions:
• How to write simple and intuitive programs for NES?
• How to refer to nodes in a network-transparent way?
• How to use the location of the nodes in computation?
• How to discover and repeatedly access resources at nodes?
• How to cope with network dynamics?
Figure 2.2: Analogy Between Spatial Programming and Two Traditional Programming Models
2.2 Overview
Spatial Programming (SP) is a location-aware programming model designed to satisfy these
requirements. The main idea of SP is to offer network-transparent, fine-grained access to data
and services distributed on systems embedded in the physical space. In SP, a network of physi-
cally distributed systems is viewed as a single virtual address space, and its individual resources
at nodes can be accessed by applications like normal variables. SP hides the distributed nature
of the underlying infrastructure. An application written under the SP model is a sequential
program that can transparently read and write network resources as if they were local variables
declared in this program. Similar to the mappings from virtual to physical memory in a con-
ventional computer system, a runtime system maintains mappings between spatial references
and nodes in the physical space. SP applications can cooperate indirectly through shared net-
work resources. The SP model allows for a large spectrum of outdoor distributed applications,
ranging from computing the average/maximum temperature over a given geographical region
to collaborative applications such as distributed object tracking or coordinating military forces
on the battlefield. Typical applications for SP are those which execute a distributed algorithm
over a set of nodes selected based on their location and properties.
Given the scale of NES and the fact that most of the nodes work unattended, it is practi-
cally impossible to re-program each node individually for every new application. Therefore,
the SP implementation (described in Chapter 5) moves the application, sequentially, to each
node whose properties or content have to be accessed by the application. Thus, although trans-
parent to the application programmer, the actual execution takes place on the nodes hosting the
resources being accessed.
The high level view of the network as a single virtual address space is similar to the one
presented by shared virtual memory systems [52] (i.e., shared virtual memory shields the pro-
grammers from message passing communication, while offering a shared virtual address space
for distributed applications). A major difference, however, is that shared virtual memory is per-
formed over a stable and robust network, with an acceptable upper bound for memory access
time, while SP must tolerate dynamic network configurations, with unknown time bounds for
accessing systems embedded in the physical space. Figure 2.2 illustrates this analogy and the
simple abstractions defined by SP to support outdoor distributed programming: space regions
and spatial references.
2.3 Space Regions
Unlike traditional distributed systems where the physical location of the nodes does not mat-
ter, the spatial distribution of nodes across physical space is a key feature of massive NES.
These networks will span buildings, large facilities such as campuses or airports, or even roads
and forests. Most envisioned distributed applications for NES will exhibit a location-aware
behavior. In order to achieve their prescribed objectives, they will need to run within certain
geographical regions. For instance, the motivating application described at the beginning of
the chapter may want to activate intelligent cameras within a physical range of the trigger node
(the sensor that detected motion) since otherwise no causal relation can be established.
SP considers location a first order programming concept and exposes it to applications
through space regions. A space region is a virtual representation of a given physical space.
SP applications may use statically defined spaces or dynamically create new spaces. Static
definitions are used to describe physical spaces that do not change over time and are commonly
provided in the form of names associated with geographical regions (e.g., using topological
maps). In Figure 2.3, Hill1 and Hill2 are defined as two circular regions in a two-dimensional
space. For the clarity of exposition, we will describe the creation of dynamic space regions in
a subsequent section after the introduction of spatial references.
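To make the notion of a statically defined space region more concrete, the following Java sketch represents a named circular region in a two-dimensional space together with a containment test. The class name CircularRegion, its fields, and all coordinates and radii are assumptions chosen for illustration; the text does not prescribe how regions are represented.

```java
import java.util.Map;

// Hypothetical representation of a statically defined, circular space region.
class CircularRegion {
    final String name;
    final double centerX, centerY, radius;

    CircularRegion(String name, double centerX, double centerY, double radius) {
        this.name = name;
        this.centerX = centerX;
        this.centerY = centerY;
        this.radius = radius;
    }

    // True when the point (x, y) lies inside or on the boundary of the region.
    boolean contains(double x, double y) {
        double dx = x - centerX;
        double dy = y - centerY;
        return dx * dx + dy * dy <= radius * radius;
    }

    public static void main(String[] args) {
        // Static definitions: names associated with geographical regions.
        // The coordinates and radii below are made-up values.
        Map<String, CircularRegion> regions = Map.of(
                "Hill1", new CircularRegion("Hill1", 0, 0, 100),
                "Hill2", new CircularRegion("Hill2", 300, 0, 100));
        System.out.println(regions.get("Hill1").contains(30, 40));
        System.out.println(regions.get("Hill2").contains(30, 40));
    }
}
```

A runtime system could use such a test to decide whether a node's reported location falls within the space named by an application.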
Figure 2.3: Example of Spatial References for Object Tracking in a Network Consisting of Motion Sensors and Intelligent Cameras Deployed over Two Hills
(Spatial references shown in the figure: {Hill1:motion[0]}, {Hill1:camera[0]}, {Hill1:camera[1]}, {Hill1:camera[2]}, {Hill2:motion[0]}, {Hill2:camera[0]})
2.4 Spatial References
A spatial reference is defined as a {space:tag} pair which is mapped to a system embedded in
the physical space. The space is a space region that represents the geographical scope of this
system. The tag is the name of a property or service provided by the same system. Tags are
not globally unique because they name properties or services that can be provided by multiple
systems. Spatial references, like variables, are defined within applications; hence, a spatial
reference has meaning only within the application that defined it.
Spatial references provide applications with a virtual resource naming in the network. Ap-
plications access network resources using spatial references in the same way they access phys-
ical memory through variables in conventional systems (or in shared virtual memory systems).
Given that programmers have only limited knowledge about such dynamic networks (i.e., a
programmer does not know how many resources are in a given space, what types they are,
or even if they exist at all), spatial references offer a convenient method to refer to network
resources using their expected locations and properties.
Figure 2.3 presents examples of spatial references. To differentiate among systems with
the same space-tag pair referenced in the same application, programmers can use indexes to
refer to distinct systems. Thus, a spatial reference becomes a triplet {space:tag[index]}. SP
guarantees that spatial references with distinct indexes (but the same space-tag pair) map to
different systems. The figure shows how a programmer can use three distinct indexes to refer
to distinct cameras on Hill1.
1 Image[] getImages(Location location, int n){
2   Image[] image = new Image[n];
3   for(int i=0; i<n; i++){
4     {Hill1:camera[i]}.active = ON;
5     {Hill1:camera[i]}.focus = location;
6     image[i] = {Hill1:camera[i]}.image;
7   }
8   return image;
9 }
Figure 2.4: Example of Program using Spatial References
An SP application can name and access multiple network resources provided by a node
using just one spatial reference. The construct {space:tag[index]}.resource refers to a certain
resource located on the system referenced by {space:tag[index]}. In Figure 2.3,
{Hill1:camera[0]}.active may denote the status of the camera, while {Hill1:camera[0]}.location
may represent the location of this system in space. To better illustrate the use of these concepts
in applications, Figure 2.4 shows a code fragment in which a program activates three cameras on
Hill1, focuses them toward a certain location, and collects the images taken by these cameras.
This example demonstrates the simplicity of SP: applications can write (lines 4-5) or read
(line 6) resources at nodes using spatial references in a similar fashion to the way they use
variables. Spatial references relieve programmers from the burden of having to cope with all
the networking details of reaching the nodes of interest and accessing data or services on those
nodes. This is possible since applications are built on top of an SP runtime system which takes
care of name resolution, communication, and access to resources. The SP runtime system also
guarantees that each index maps to a different system in the same space region and ensures
reference consistency.
2.5 Reference Consistency
Conventional computer systems maintain reference consistency for variables. The operating
system uses per-application page tables to guarantee that each time an allocated variable is
used, it accesses the same physical memory location. Similarly, SP guarantees that each time an
application uses a certain spatial reference, it accesses the same system as long as this system
remains in its original space region. This property provides the ability to perform arbitrary
distributed computations over a subset of nodes selected based on their location and properties.
Figure 2.5: Reference Consistency Example: A Spatial Reference is Mapped to the Same System as long as this System Remains in the Same Space Region
The SP runtime system maintains mappings between spatial references and the nodes they
refer to. These mappings are maintained in a per-application mapping table and are persistent
during the SP program execution. At the time of the first access, a spatial reference is mapped to
a node located in the desired space region which provides the required property. Each mapping
table entry contains the location of the referenced node and a unique per-application network
address for this node. The location is used for faster subsequent accesses to this node. The
network address is assigned by the application (i.e., it has no global meaning) and is used to
confirm the identity of the node for subsequent accesses (a referenced node may move from
its recorded location, and another node may take its place). This address can also be used to
locate, in the same space, referenced nodes that moved from their recorded locations. Figure 2.5
shows how reference consistency works for SP applications. Once a spatial reference has been
mapped to a camera node, it can be used repeatedly by its application to access the same camera
even when this camera moves. To be semantically acceptable, the node has to remain in the
space region it was in at the time of the first access (i.e., when the mapping was created).
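The behavior of the mapping table can be sketched in plain Java as follows. This is a simplified, single-process model under several assumptions: the network is a list of in-memory Node objects, space regions are circles, the per-application address is a counter stored on the node, and a failed access returns null where the actual runtime would report a timeout. All names (MappingTableSketch, resolve, Entry, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-process model of a per-application mapping table.
class MappingTableSketch {
    // A node in the simulated network: the tag it offers and its current position.
    static class Node {
        final String tag;
        double x, y;
        int appAddress = -1; // per-application address; -1 until the node is mapped
        Node(String tag, double x, double y) { this.tag = tag; this.x = x; this.y = y; }
    }

    // A circular space region (assumed shape).
    static class Region {
        final double cx, cy, radius;
        Region(double cx, double cy, double radius) { this.cx = cx; this.cy = cy; this.radius = radius; }
        boolean contains(Node n) {
            double dx = n.x - cx, dy = n.y - cy;
            return dx * dx + dy * dy <= radius * radius;
        }
    }

    // One table entry: last known location plus the address assigned by the application.
    static class Entry {
        double x, y;
        final int appAddress;
        Entry(double x, double y, int appAddress) { this.x = x; this.y = y; this.appAddress = appAddress; }
    }

    private final Map<String, Entry> table = new HashMap<>();
    private int nextAddress = 0;

    // Resolves {space:tag[index]}; returns null where the real runtime would time out.
    Node resolve(String spaceName, Region space, String tag, int index, List<Node> network) {
        String key = spaceName + ":" + tag + "[" + index + "]";
        Entry entry = table.get(key);
        if (entry == null) {
            // First access: pick a not-yet-mapped node with this tag inside the region,
            // so that distinct indexes never share a node.
            for (Node n : network) {
                if (n.appAddress < 0 && n.tag.equals(tag) && space.contains(n)) {
                    n.appAddress = nextAddress++;
                    table.put(key, new Entry(n.x, n.y, n.appAddress));
                    return n;
                }
            }
            return null;
        }
        // Later accesses: the address confirms the node's identity even if it has moved.
        for (Node n : network) {
            if (n.appAddress == entry.appAddress) {
                if (!space.contains(n)) return null; // left its region: not acceptable
                entry.x = n.x; // refresh the recorded location
                entry.y = n.y;
                return n;
            }
        }
        return null; // the node no longer exists
    }

    public static void main(String[] args) {
        Region hill1 = new Region(0, 0, 100); // made-up geometry
        List<Node> network = new ArrayList<>();
        network.add(new Node("camera", 10, 10));
        network.add(new Node("camera", 40, 0));
        MappingTableSketch table = new MappingTableSketch();
        Node first = table.resolve("Hill1", hill1, "camera", 0, network);
        first.x = 60; // the camera moves but stays on Hill1
        System.out.println(table.resolve("Hill1", hill1, "camera", 0, network) == first);
    }
}
```

In this sketch the recorded location is only refreshed, whereas the runtime described above also uses it to speed up subsequent accesses; the address check is what preserves reference consistency when nodes move.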
In some situations, reference consistency is not necessary. For instance, an application that
needs to periodically contact a number of temperature sensors located in a certain region and
compute the average temperature may accept any sensor that provides the desired space-tag
pair. In such a case, if a referenced node cannot be found in its space region, the runtime
system should transparently remap the spatial reference to a similar node rather than returning
an exception for a failed access. To implement this feature, SP allows an application to specify
a remap flag for spatial references.
Figure 2.6: Space Casting: The Same System is Referenced in a Different Space Region
2.6 Space Casting
The SP runtime system locates the same node each time an application uses the same spatial
reference, provided that the node is still in its space region. If a node moves out of its space re-
gion, it becomes semantically unacceptable. Thus, the application receives a timeout exception
(the system could not find the node during the timeout interval). However, if the programmer
still wants to access this node and has knowledge about the node’s mobility patterns, the space
region for the spatial reference mapped to this node can be modified using space casting. The
construct {space2:(space1:tag[index])} changes the geographical scope of the spatial reference
from space1 to space2. Figure 2.6 shows how space casting is used to reach a camera
carried by a mobile robot which has moved from Hill1 to Hill2. If the new space for a node
is unknown, a programmer can use the Anywhere space constant to cast a spatial reference to
any space. Note that in such a case the timeout ensures that the attempted access will not take
forever.
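The effect of space casting can be illustrated with a small Java sketch in which a mapped reference keeps its node but changes the region used to validate accesses. The names SpaceCastSketch, SpatialRef, castTo, reachable, and ANYWHERE, as well as the circular regions and coordinates, are assumptions introduced for this sketch only; the timeout that bounds an access is not modeled.

```java
// Hypothetical sketch of space casting; class and method names are illustrative only.
class SpaceCastSketch {
    static class Region {
        final String name;
        final double cx, cy, radius;
        Region(String name, double cx, double cy, double radius) {
            this.name = name; this.cx = cx; this.cy = cy; this.radius = radius;
        }
        boolean contains(double x, double y) {
            double dx = x - cx, dy = y - cy;
            return dx * dx + dy * dy <= radius * radius;
        }
    }

    // Stand-in for the Anywhere constant: a region that contains every point.
    static final Region ANYWHERE = new Region("Anywhere", 0, 0, Double.POSITIVE_INFINITY);

    static class Node {
        double x, y;
        Node(double x, double y) { this.x = x; this.y = y; }
    }

    // A mapped spatial reference: the node it is bound to and its geographical scope.
    static class SpatialRef {
        final Region space;
        final Node node;
        SpatialRef(Region space, Node node) { this.space = space; this.node = node; }

        // {space2:(space1:tag[index])}: same node, new geographical scope.
        SpatialRef castTo(Region newSpace) { return new SpatialRef(newSpace, node); }

        // An access is acceptable only while the node is inside the reference's scope.
        boolean reachable() { return space.contains(node.x, node.y); }
    }

    public static void main(String[] args) {
        Region hill1 = new Region("Hill1", 0, 0, 100);   // made-up geometry
        Region hill2 = new Region("Hill2", 300, 0, 100);
        Node camera = new Node(10, 0);
        SpatialRef ref = new SpatialRef(hill1, camera);
        camera.x = 290; // the robot carries the camera to Hill2
        System.out.println(ref.reachable());
        System.out.println(ref.castTo(hill2).reachable());
        System.out.println(ref.castTo(ANYWHERE).reachable());
    }
}
```

The cast returns a new reference rather than mutating the old one, so the original {Hill1:camera[0]} reference keeps its meaning elsewhere in the application; this is a design choice of the sketch, not something the text specifies.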
2.7 Spatial Reference Access Timeout
Unlike traditional computer systems where the access time to memory is finite and an upper
bound for this time can be computed (i.e., by adding the miss penalties in the memory
hierarchy), in a volatile and dynamic NES, it is difficult to estimate how long it takes to access a
network resource. This problem happens both for new references (no more available systems
with the required space-tag pair) and for mapped references (they may become invalid because
the referenced node can move from its space or simply cease to exist).
 1 Image[] getImages(Location location, int n, int timeout){
 2   Image[] image = new Image[n]; int i = 0;
 3   try{
 4     for(i=0; i<n; i++){
 5       {Hill1:camera[i], timeout}.active = ON;
 6       {Hill1:camera[i], timeout}.focus = location;
 7       image[i] = {Hill1:camera[i], timeout}.image;
 8     }
 9   }catch(TimeoutException e){
10     if (i < n/2)
11       return null; // abort if fewer than half of the cameras responded
12     // otherwise continue with a lower quality result
13   }
14   return image;
15 }
Figure 2.7: Code Example for Spatial Reference Access Timeout
SP requires application programmers to reason about the possibility of not reaching a node
by imposing a timeout on each spatial reference (i.e., the format of a spatial reference becomes
{space:tag[index], timeout}). This timeout allows a programmer to limit the access time to
a network resource which, given the volatility of the network, may take forever. Essentially,
SP defines a “best effort” semantics that allows an application to make progress and get a
semantically acceptable result even in adverse network conditions. If a node cannot be reached
in the specified time interval, the SP runtime throws a timeout exception; once the application
catches this exception, it can decide about further actions.
Figure 2.7 shows the same code presented before in Figure 2.4, except for the added timeout
at each spatial reference. If the access to one of the spatial references times out, the application
catches a Timeout exception (line 9). In this example, the application goes ahead with a partial
result if images from at least half of the required number of cameras have been acquired already
(lines 10-12). Otherwise, it aborts the computation and returns null.
Commonly, the programmer sets each timeout based on a constraint imposed by the user on
the total execution time (e.g., the total time is divided equally among all accesses, or each new
access can have the entire remaining time). If no such constraint is imposed, the SP runtime
system considers a “default” value for the timeout.
[Figure 2.8 diagram: a motion sensor on Hill1 is referenced as {Hill1:motion[0]}; a camera within distance Range of that sensor is referenced as {rangeOf({Hill1:motion[0]}, Range):camera[0]}.]
Figure 2.8: Dynamic Definition of a Relative Space Region
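The two budgeting strategies mentioned above (an equal share per access, or the entire remaining time for each new access) can be compared with a short Python sketch. The function names equal_split and remaining_time are hypothetical, and the per-access durations are made-up inputs used only for illustration.

```python
def equal_split(total_budget, num_accesses):
    """Give every access the same fixed share of the total budget."""
    return [total_budget / num_accesses] * num_accesses


def remaining_time(total_budget, durations):
    """Give each new access the entire time still left in the budget.

    durations[i] is the time access i would need. Returns the timeout granted
    to each attempted access and the number of accesses that completed.
    """
    timeouts, left, done = [], total_budget, 0
    for needed in durations:
        if left <= 0:
            break
        timeouts.append(left)
        if needed > left:
            break  # this access times out; the budget is exhausted
        left -= needed
        done += 1
    return timeouts, done
```

With a total budget of 12 time units and four accesses, the equal split grants 3 units to each access, so an access that needs 5 units times out. Under the remaining-time strategy the same access succeeds, at the price of leaving less time for the later ones.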
2.8 Defining New Space Regions
Besides statically defined space regions, SP also supports dynamically defined space regions.
Composed space regions can be defined using the union or intersection operators (i.e., these
space regions are also defined as circles that circumscribe the actual physical space). If we
consider the hills from our examples throughout this section, a spatial reference
{(Hill1 + Hill2):camera[0]} returns a camera node located on either Hill1 or Hill2.
Defining relative space regions based on the position of a referenced node offers two benefits
for applications: access to systems located in dynamically defined space regions, and
the possibility to “remember” a space region where a certain event took place, even after the node
that produced (or detected) this event is no longer there.
The rangeOf operator defines a space region in the proximity of a node referenced by a
spatial reference. Figure 2.8 shows how such a relative space is dynamically defined and used
to refer to a camera node located in the proximity of a motion sensor. Similar to rangeOf, SP
defines the northOf, southOf, eastOf, and westOf operators. They create space regions relative
to the position of a referenced node and the respective cardinal direction (the center of the
circular region is located toward that cardinal direction at a given distance from the position of
the referenced node).
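Since all these regions are circles, the operators reduce to simple geometry. The Python sketch below is one possible reading, with several assumptions that are not stated in the text: planar coordinates with north along the positive y axis, an explicit radius parameter for the directional operators, and a union computed as the smallest circle enclosing both operands. The names Circle, range_of, direction_of, and union are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Circle:
    x: float
    y: float
    r: float

    def contains(self, px, py):
        return math.hypot(px - self.x, py - self.y) <= self.r


def range_of(node_xy, radius):
    """Region in the proximity of a referenced node."""
    return Circle(node_xy[0], node_xy[1], radius)


# Unit offsets for the cardinal directions (assumption: north is +y).
_OFFSETS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def direction_of(node_xy, direction, distance, radius):
    """Region centered `distance` away from the node in a cardinal direction."""
    dx, dy = _OFFSETS[direction]
    return Circle(node_xy[0] + dx * distance, node_xy[1] + dy * distance, radius)


def union(a, b):
    """Smallest circle that circumscribes both regions."""
    d = math.hypot(b.x - a.x, b.y - a.y)
    if d + b.r <= a.r:
        return a  # b lies entirely inside a
    if d + a.r <= b.r:
        return b  # a lies entirely inside b
    r = (d + a.r + b.r) / 2
    t = (r - a.r) / d  # fraction of the way from a's center to b's center
    return Circle(a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, r)
```

For two unit circles whose centers are 4 units apart, union returns a circle of radius 3 centered halfway between them. That circle also covers space belonging to neither operand, which is the cost of keeping every region circular.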
2.9 Creating/Removing Network Resources
In addition to accessing resources that already exist at nodes, SP programs can also dynami-
cally create/remove their own resources. For instance, an application may need to create new
resources in order to store data in the network (i.e., similar to creating files in a file system).
The primitives that offer this functionality are:
create({space:tag[index], timeout}.resource)
remove({space:tag[index], timeout}.resource)
Currently, SP provides just a limited resource sharing policy: the resources provided by
nodes are shared, and the resources created by applications are private.
2.10 Putting It All Together: Program Example
We conclude this chapter by presenting the code (Figure 2.9) for the object tracking application
used throughout the chapter. This application emphasizes the novel concepts introduced by
SP, as well as the simplicity of programming under this model. Additionally, it represents the
class of applications that can benefit most from the SP model: applications that execute a
distributed algorithm over a set of nodes selected based on their content and spatial properties,
and that repeatedly access the nodes from this set during the computation.
The application checks the status of Ns motion sensors on Hill1 (lines 1-3). Once the
motion is detected at one of the monitored sensors, a relative space, motionSpace, is created around
that sensor in order to perform object tracking within its proximity (line 4). Any node located
in motionSpace that is not active (i.e., not working for other applications) is turned on, focused
to the location of motion, and added to the set of active cameras until the desired number of
Nc active cameras has been reached (lines 5-11). If a timeout exception is raised during this
computation, the application has to decide what to do next. In our example, the application
accepts a possibly lower quality of result and goes ahead if at least half of the desired number
of cameras is found. Otherwise, it restarts monitoring the motion sensors (lines 13-17). During
the object tracking (line 18), the cameras may be accessed multiple times due to the reference
consistency feature of SP. The actions taken at a camera node depend on the partial results
computed at previously visited nodes. If a camera moves out of motionSpace during the object
tracking, the application may just ignore it or use space casting to re-discover it (considering
the execution time and typical motion speeds, the camera should be in the proximity of
motionSpace). The application ends by turning off the set of active cameras (lines 19-20). This
operation is also enabled by the reference consistency property of SP.
 1 for(i=0; i<Ns; i++){ // loop over Ns motion sensors
 2   try{
 3     if ({Hill1:motion[i], timeout}.detect == true){
 4       motionSpace = rangeOf({Hill1:motion[i], timeout}, Range);
 5       location = {Hill1:motion[i], timeout}.location;
 6       for(j=0, k=0; j<Nc; k++) // build the set of Nc cameras
 7         if ({motionSpace:camera[k], timeout}.active == OFF){
 8           {motionSpace:camera[k], timeout}.active = ON;
 9           {motionSpace:camera[k], timeout}.focus = location;
10           activeCameras[j++] = {motionSpace:camera[k], timeout};
11         }
12     }
13   }catch(TimeoutException e){
14     if (j < Nc/2)
15       continue; // continue monitoring the motion sensors
16     // otherwise, do object tracking with lower quality of result
17   }
18   result=objectTracking(activeCameras);
19   for(k=0; k<j; k++)
20     activeCameras[k].active = OFF;
21   return result;
22 }
Figure 2.9: Spatial Programming Application for Object Tracking
2.11 Summary
In this chapter, we have presented the design of Spatial Programming (SP), a location-aware
programming model for outdoor distributed computing. SP offers fine-grained, network-transparent
access to systems embedded in the physical space. Central to SP is the concept of spatial refer-
ence, which defines a virtual name space over NES using the expected locations and properties
of these systems. Programmers use spatial references to access the content or services pro-
vided by nodes in the network in the same way they use variables in a conventional program.
The main benefits of SP are the flexibility and simplicity of programming user-defined distributed
applications in highly volatile outdoor computing environments.
Chapter 3
Smart Messages
This chapter describes the Smart Messages (SMs) distributed computing platform for networks
of embedded systems, which can be used to program any user-defined distributed application.
To simplify the application development, we have used the SM platform to implement Spatial
Programming. In this chapter, we describe the SM system architecture, based on execution
migration, content-based naming, and self-routing. Additionally, we present the node archi-
tecture (i.e., nodes in the network cooperate by providing a common system support) and the
security architecture for SMs. After describing the SM API, we demonstrate the features of the
SM platform by implementing and evaluating two previously proposed applications for sen-
sor networks (SPIN [34] and Directed Diffusion [39]). For evaluation, we have developed an
event-driven simulator, extended with support for SM execution. The chapter concludes with
simulation results for these two applications.
3.1 Smart Messages Architecture
Smart Messages (SM) define a distributed computing platform for NES based on execution
migration. Instead of transferring data (i.e., data migration) among nodes involved in the com-
putation, applications developed over the SM platform transfer the execution to each of these
nodes. Figure 3.1 illustrates the difference between the traditional distributed computing using
data migration and distributed computing with Smart Messages.
Let us assume that a user needs to contact three nodes that provide certain services. In the
data migration approach, the user application gets the addresses of the nodes, and then it sends
requests and waits for answers. This approach works well in relatively stable networks such as
the Internet. On the other hand, in more volatile networks such as NES, the user application may
take an indefinite amount of time to complete with this approach. For instance, let us consider
that these responses are time-sensitive. If the third one does not arrive (e.g., congestion, broken
routes, failed service node), the application can only wait (for an indefinite amount of time) or
re-issue all the requests.
[Figure 3.1 diagram, two panels. Data migration: the user node runs for(i=0; i<3; i++){ send(request, address[i]); receive(data, address[i]); // do computation } and exchanges "Request Data" and "Receive Data" messages, steps (1)-(6), with Node1, Node2, and Node3 over the network. Execution migration: the program for(i=0; i<3; i++){ migrate(property[i]); read(data); // do computation } moves among the user node, Node1, Node2, and Node3 in migration steps (1)-(4).]
Figure 3.1: Traditional Distributed Applications vs. Smart Messages Applications
Using execution migration, however, the application can adapt dynamically to changing
network conditions. In the SM architecture, the application discovers the nodes of interest
sequentially and executes on each of them. Thus, the application can make incremental progress
and eventually complete even in highly volatile networks. Additionally, since the nodes are
named by properties, the application can discover similar nodes even if its initial targets become
unavailable.
Distributed applications built on top of the SM architecture are collections of SMs. An
SM is a user-defined application whose execution is moved sequentially over a series of nodes
using execution migration. The nodes on which SM applications execute, called “nodes of in-
terest”, are named by properties, discovered using application-controlled routing, and switched
when the SM application calls for execution migration. The payload of an SM consists of data
“bricks”, explicitly identified in the application, and execution control state. Code “bricks”
may also be transferred if the code is not cached at destination. An SM can carry multiple data
and code bricks, and it can use them to create new SMs during its execution. In this way, an
application can eventually generate multiple SMs although it has started as a single SM.
[Figure 3.2 diagram: an SM migrates from Node 1 (node of interest) through Node 2 (intermediate node) to Node 3 (node of interest). Each node provides a virtual machine, a tag space, and a code cache. Application code runs on Node 1 and Node 3; routing code runs on Node 2.]
Figure 3.2: Distributed Computing Using Smart Messages
The SM computing platform assumes a decentralized architecture, where nodes in the network
act as peers. SMs do not make any assumptions about the underlying network configuration,
except for a minimal system support provided by nodes: a virtual machine, a name-based
memory, called tag space, and a code cache. The virtual machine offers a hardware abstraction
layer for SM execution, which shields SMs from heterogeneous node configurations. The
tag space offers a name-based memory, persistent across SM executions. It consists of (name,
data) pairs, called tags, which are used for data exchange among SMs. Special I/O tags are
used as an interface to the host OS and I/O system. Tags also serve to name the destination of
SM migrations and store routing information (routing tags). The code cache stores frequently
accessed code bricks in order to amortize the cost of transferring code over time. Figure 3.2
depicts the execution of an SM over three nodes. The SM application code starts on Node1 and
finishes on Node3. The SM reaches Node3 by explicitly migrating from node to node. Node2
is used as an intermediate hop, where only the SM routing code executes. Note that an SM
executes and potentially carries both application and routing code.
To illustrate how NES are programmed using SMs, we present a very simple example con-
sisting of an SM that books cabs in a densely populated city. Let us consider a group of people
attending a conference, who want to return to the conference venue after an “off-site” lunch.
Instead of calling a cab company or waiting on the street for a free cab, one of them uses her
handheld device to inject an SM in the network to book a certain number of free cabs. Each cab
provides support for SM execution and is identified by a FreeCab tag name. The code for this
application is shown in Figure 3.3, and the SM execution path through the network is depicted
in Figure 3.4. The SM migrates to free cabs, changes their status from free to occupied (by
removing the FreeCab tag), and instructs them to come to the client’s location (by writing to
the Location tag). The SM completes after booking the desired number of cabs.
int numCabs, i;  // stored in data brick
Location loc;    // stored in data brick
for(i=0; i<numCabs; i++){
  migrate("FreeCab");
  deleteTag("FreeCab");
  writeTag("Location", loc);
}
Figure 3.3: Smart Message Code Example
[Figure 3.4 diagram: the Smart Message (data brick, application code, routing code) starts at the cab client and travels hop by hop, using sys_migrate, through occupied cabs. Each migrate("FreeCab") call returns at a free cab, and the value of i carried in the data brick advances from 0 to 1 to 2 along the path.]
Figure 3.4: Execution Path for the Above Smart Message
The key operation in the SM programming model is multi-hop, content-based migration,
which implements routing using tags. An SM names the nodes of interest by tag names (which
represent properties or content of that node), and then calls a high-level migrate function to
route itself to a node that has the desired tags. In our example, migrate(“FreeCab”) routes
the SM to free cabs using the occupied cabs as intermediate nodes. This high-level function
uses the low-level sys_migrate primitive, provided by the SM system software, for one-hop
migration. After a migration, the SM resumes from the next instruction following the migrate
call. It is important to notice that migration is explicit (i.e., the programmer calls migrate when
needed).
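The layering of a content-based migrate function over one-hop migration can be illustrated with a small Python simulation of the cab example. It is a sketch under strong assumptions rather than the SM implementation: the network is a static adjacency map, the routing is a breadth-first search with global knowledge of all tags (a real migrate function would route hop by hop using routing tags), and the names SmartMessage, sys_migrate, migrate, and book_cabs are stand-ins chosen for this illustration.

```python
from collections import deque


class SmartMessage:
    def __init__(self, start, data_brick):
        self.node = start        # node currently hosting the SM
        self.data = data_brick   # variables explicitly carried across migrations
        self.hops = []           # trace of one-hop migrations


def sys_migrate(sm, next_hop):
    """One-hop migration (stand-in for the system primitive)."""
    sm.hops.append((sm.node, next_hop))
    sm.node = next_hop


def migrate(sm, tag, links, tags):
    """Route the SM to the nearest node exposing `tag`; False if none exists."""
    parent = {sm.node: None}
    queue = deque([sm.node])
    target = None
    while queue:
        cur = queue.popleft()
        if tag in tags[cur]:
            target = cur
            break
        for nxt in links[cur]:
            if nxt not in parent:
                parent[nxt] = cur
                queue.append(nxt)
    if target is None:
        return False
    path = []
    while target != sm.node:
        path.append(target)
        target = parent[target]
    for hop in reversed(path):  # intermediate nodes only forward the SM
        sys_migrate(sm, hop)
    return True


def book_cabs(sm, links, tags):
    """Application code: mirrors the loop of the cab-booking SM."""
    while sm.data["i"] < sm.data["numCabs"]:
        if not migrate(sm, "FreeCab", links, tags):
            break
        del tags[sm.node]["FreeCab"]               # deleteTag("FreeCab")
        tags[sm.node]["Location"] = sm.data["loc"]  # writeTag("Location", loc)
        sm.data["i"] += 1
    return sm.data["i"]
```

On a chain of nodes client, occupied cab, free cab, occupied cab, free cab, booking two cabs takes four one-hop migrations, while the application code runs only on the two free cabs.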
Figure 3.4 emphasizes two major characteristics of the SM architecture. First, the high-
level, content-based migration shields the application programmer from the routing details.
Although the routing code is executed at each node as the SM migrates hop-by-hop through the
network, migrate returns the control to the application only on nodes of interest (i.e., free cabs).
Second, the data transferred during a migration is specified by the programmer as data bricks;
the variables numCabs, i, and loc are stored in a data brick and carried from node to node during
migrations (the figure shows how i is modified during execution).
From a user’s perspective, this model offers resilience to dynamic network configurations
and simple deployment of new distributed applications in the network. An application pro-
grammer can write simple sequential programs that migrate to nodes named by content and
execute there, while ignoring the routing, which is embedded in migrate functions. These are
user-level functions, typically developed by system programmers. Applications can choose
between multiple migrate functions and adapt to dynamic network configurations by switching
these functions during execution.
To achieve good performance in networks composed of resource constrained nodes, we
have decided against involving the VM in determining which data is needed across migrations.
In our architecture, the VM captures the minimal execution control state required for SMs to
resume at the instruction following a migration. Although this decision clearly puts a burden
on programmers, it avoids the overhead of having the VM collect the “live data” of SMs; many
times this operation is not only time consuming, but also collects more data than necessary (i.e.,
conservative approach), thus increasing the amount of traffic in the network.
3.2 Cooperative Node Architecture
In order to execute SM-based applications, the nodes must cooperate to support SM execution
and routing. The entire SM model is built under the assumption that the node architecture
must be kept as simple and flexible as possible. Figure 3.5 shows the system components of a
cooperative node.
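[Figure 3.5 diagram: the SM platform on a cooperative node, layered over the OS and I/O system. Components: admission manager, code cache, SM ready queue, scheduler, virtual machine, local injector, authorization, and tag space (application tags and I/O tags). Incoming SMs arrive from the network, and migrating SMs leave to the network.]
Figure 3.5: Cooperative Node Architecture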
3.2.1 Virtual Machine
The virtual machine (VM) executes VM-level threads generated by incoming SMs. To migrate
an SM, the VM captures the execution state and sends it along with the code and data bricks to
the next hop. The VM at destination resumes the SM at the instruction following the migrate
call.
3.2.2 Local Injector
The local injector allows the users to start new SMs at the local node. A VM-level thread is
generated for each new SM. This thread is stored in the SM ready queue and dispatched for
execution according to the scheduling policies.
3.2.3 Scheduler
The SM execution is non-preemptive; other SMs can be accepted, but they are not dispatched
for execution before the current SM completes. The non-preemptive scheduling simplifies the
implementation of inter-SM synchronization and sharing. Additionally, we envision that the
overhead introduced by more complex scheduling will not be justified for NES applications,
which typically have short execution time.
3.2.4 Admission Manager
To prevent excessive use of resources (e.g., processor cycles, tag space memory, runtime mem-
ory, bandwidth), the nodes have to perform admission control on incoming SMs. The admission
control at nodes ensures the progress of all SMs running in the network. It also prevents SMs
from migrating to nodes where they cannot achieve anything due to resource constraints. SMs
present their resource requirements in a resource table. The admission manager receives the re-
source table, decides whether to accept the SMs or not, and enqueues the accepted SMs into the
SM ready queue. It also instructs an accepted SM to transfer only the missing code bricks (i.e.,
the code bricks that are not stored locally) and stores them in the code cache upon reception.
The admission manager makes the admission decision based on the current state of the node,
the SM’s resource requirements, and the admission policy in effect
at that node. An accepted SM is guaranteed non-preemptive execution as long as its resource
usage does not exceed certain limits defined by the admission policy. For instance, a node that
is running low on battery may decide to accept only SMs for which it is a node of interest, but reject all
SMs that need to route through it. If an SM is rejected, the migration call fails at the source,
and the SM regains the control.
Precise resource usage for SMs cannot be predicted in advance because their computations
depend not only on user-provided input data, but also on data gathered from the network during
execution. To be able to perform admission, the admission manager needs, however, at least
approximate information about SMs’ resource requirements. One solution would be to specify
upper bounds for the resource requirements. We have dismissed this idea for two reasons: com-
puting relatively precise upper bounds is as hard as predicting the actual resource usage (i.e.,
we do not have knowledge about data acquired at runtime), and large upper bounds may lead
to frequent rejections at nodes even though the SM may consume significantly fewer resources
during its execution.
Our solution requires each SM to specify its lower bounds for resource requirements. The
programmers can set them before any one-hop migration, and they define the minimum amount
of resources that may lead to SM completion or migration. The programmers may use com-
piler support to derive lower bounds for resource requirements. The declaration of these lower
bounds serves two purposes: protect SMs from migrating to a node that cannot offer enough
resources for any semantically acceptable result, and protect the resources at the node from be-
ing wasted on such SMs. Based on the admission policy, the system may grant more resources
to SMs that have exceeded their lower bounds during execution. If no more resources could
be granted, the system raises an exception which, by default, terminates the SM. The SM is al-
lowed, however, to catch this exception, to save data of interest in data bricks, and to migrate. A
limited amount of resources is reserved during admission for the exception handler. To ensure
a successful migration for this case, the SM has to declare, during admission, the maximum
amount of data it plans to carry to the next hop.
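The admission scheme based on lower bounds can be sketched in Python as follows. The sketch is illustrative only: it tracks just two resources, it uses a trivial policy (grant more only if the node still has free resources), and the names ResourceTable, AdmissionManager, admit, and request_more are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResourceTable:
    vm_cycles: int
    memory_bytes: int


class ResourceExceeded(Exception):
    """Raised when a running SM needs more than the node can grant."""


class AdmissionManager:
    def __init__(self, free_cycles, free_memory, allow_growth=True):
        self.free = ResourceTable(free_cycles, free_memory)
        self.allow_growth = allow_growth  # stand-in for the admission policy
        self.ready_queue = []

    def admit(self, sm_id, lower_bound):
        """Accept the SM only if its declared lower bounds fit the free resources."""
        if (lower_bound.vm_cycles > self.free.vm_cycles
                or lower_bound.memory_bytes > self.free.memory_bytes):
            return False  # the migration call fails at the source
        self.free.vm_cycles -= lower_bound.vm_cycles
        self.free.memory_bytes -= lower_bound.memory_bytes
        self.ready_queue.append(sm_id)
        return True

    def request_more(self, extra):
        """Called when a running SM exceeds its lower bounds."""
        if (not self.allow_growth
                or extra.vm_cycles > self.free.vm_cycles
                or extra.memory_bytes > self.free.memory_bytes):
            raise ResourceExceeded("no more resources can be granted")
        self.free.vm_cycles -= extra.vm_cycles
        self.free.memory_bytes -= extra.memory_bytes
```

An SM that catches ResourceExceeded corresponds to the case described above, in which the SM saves its data of interest in data bricks and migrates instead of being terminated. The resources reserved for that exception handler are omitted from this sketch.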
3.2.5 Tag Space
The tag space provides a name-based memory and a unique interface to the local OS and I/O
system. It consists of a collection of tags that can be divided into two categories: (1) application
tags, which are created by SMs and used for inter-SM communication and synchronization, and
(2) I/O tags which belong to nodes and allow SMs to access system resources. The structures
of these tags are shown in Figure 3.6. Each tag has a name (unique at a node, but not globally
unique) which is similar to a file name in a file system. SMs use this name for content-based
naming.
Application tags are commonly used for data exchange among SMs because their data
portion can store application-specific data. For instance, an SM can build a routing table in a
tag, and other SMs can subsequently read the routes from this tag. Each application tag has a
lifetime that specifies the duration after which the tag expires and its memory is reclaimed by
the node.
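A minimal Python model of application tags with lifetimes is sketched below, using the routing table example from the paragraph above. The class TagSpace and its method names are hypothetical, expired tags are reclaimed lazily on access, and the current time is passed explicitly so that the behavior is deterministic; the actual reclamation strategy of the SM platform is not specified here.

```python
class TagSpace:
    def __init__(self):
        self._tags = {}  # name -> (data, expiry time)

    def create_tag(self, name, lifetime, data, now):
        self._tags[name] = (data, now + lifetime)

    def _purge(self, now):
        """Reclaim the memory of every tag whose lifetime has elapsed."""
        expired = [n for n, (_, expiry) in self._tags.items() if expiry <= now]
        for name in expired:
            del self._tags[name]

    def read_tag(self, name, now):
        self._purge(now)
        entry = self._tags.get(name)
        return None if entry is None else entry[0]

    def write_tag(self, name, data, now):
        self._purge(now)
        if name not in self._tags:
            raise KeyError(name)
        self._tags[name] = (data, self._tags[name][1])  # lifetime is unchanged
```

One SM can create a routing tag with a lifetime of 10 time units and store a next-hop table in it. Other SMs visiting the node can read or update the table until the tag expires, after which a read returns None.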
I/O tags act as a gateway between SMs and the underlying OS and I/O system. Usually,
each I/O tag is associated with an external process, which communicates with the VM through
a standard interface. Each time an I/O tag is accessed by an SM, its associated external process
interacts with the local resources and returns a response to the SM.
The access to tag space is protected using an access control list (ACL). The application tags
also have ownership information (i.e., OwnerID and FamilyID). We defer the description of the
protection mechanism to Section 3.4.
Application Tag: Name | Data | Lifetime | OwnerID | FamilyID | ACL | SM Blocked Queue
I/O Tag: Name | ACL | I/O Handler
Figure 3.6: Application and I/O Tag Structures
Similar to existing solutions [10, 5], we use namespaces to avoid naming conflicts; a tag
name is preceded by a namespace (i.e., namespace:tagname). The I/O tags have a pre-defined
namespace, ions, which is known by any SM. The namespaces for application tags, on the other
hand, are defined by the SMs that create them. Each SM has a unique default namespace which
is used when a reference to a tag name is not preceded by a namespace. The system where the
SM is injected generates this unique namespace, and every SM created dynamically inherits it
from its parent SM.
An SM may use other namespaces to cooperate with SMs that do not belong to its family.
Accessing tags in other namespaces does not create problems because the access is subject
to access control. Creating new tags, however, may lead to naming conflicts. For instance,
two different SMs may create two tags with the same name, but with different semantics. A
solution to this issue is to ensure that conflicting namespaces are extremely rare in practice
(e.g., a namespace is a long random string of bits). The developers that need to cooperate can
exchange these namespaces off-line.
Although simple, this solution is not bullet-proof. If an SM needs to ensure that conflicts are
avoided, it has to use secure namespaces (i.e., by definition, a secure namespace is preceded by
the keyword secure). At compilation time, the compiler builds the list of secure namespaces
used in tag creation invocations throughout each code brick. The compiler has to be able
to generate the list of namespaces (i.e., the namespaces are either directly specified, or the
compiler is able to determine them using static analysis); if the compiler is not able to find at
least one possible namespace for a tag, the compilation fails.
At injection time, the SM must present a capability for each namespace in the compiler-
generated list. Therefore, the developer of a code brick (or the developer of an SM) has to
acquire these capabilities such that each code brick of an SM has an associated list of capabili-
ties. During SM injection, the system verifies the capabilities and creates a list of namespaces
for each code brick. This list together with the default namespace is maintained in the SM
structure and cannot be modified over time. A child SM inherits the list of namespaces for the
code bricks that compose it. If an SM does not present a capability for every namespace in the
list generated by the compiler, it will be rejected during the injection phase.
A central authority (CA) keeps track of all secure namespaces and their owners. Each time
a namespace owner decides to allow a code brick to create tags within that namespace, she
associates a capability, digitally signed by the CA, with this code brick; the capability contains
the hash value of the code brick. Similar to ANTS [25], this value is obtained by applying a
hash function on the code itself. Each node has the public key of the CA and the common hash
function. During SM injection, the VM uses the CA’s public key and the capability to verify
that the code bricks are authorized to use the secure namespaces.
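The capability check performed at injection time can be sketched in Python. One substitution must be stated plainly: the text describes capabilities digitally signed by the CA and verified with the CA's public key, whereas this sketch uses an HMAC with a shared secret as a stand-in, because the Python standard library has no public-key signatures. SHA-256 as the common hash function, the key value, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac

# Assumption: symmetric stand-in for the CA key pair, for illustration only.
CA_KEY = b"demo-ca-key"


def brick_hash(code):
    """Hash computed over the code brick itself."""
    return hashlib.sha256(code).hexdigest()


def issue_capability(namespace, code):
    """CA side: bind a namespace to the hash of one code brick and sign it."""
    payload = f"{namespace}:{brick_hash(code)}".encode()
    signature = hmac.new(CA_KEY, payload, hashlib.sha256).hexdigest()
    return {"namespace": namespace, "hash": brick_hash(code), "sig": signature}


def verify_capability(cap, code):
    """Node side: check the signature, then check that the code brick matches."""
    payload = f"{cap['namespace']}:{cap['hash']}".encode()
    expected = hmac.new(CA_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cap["sig"]) and cap["hash"] == brick_hash(code)


def admit_at_injection(required_namespaces, capabilities, code):
    """Reject unless every compiler-listed namespace has a valid capability."""
    granted = {c["namespace"] for c in capabilities if verify_capability(c, code)}
    return set(required_namespaces) <= granted
```

Because the capability embeds the hash of the code brick, it cannot be reused for a modified brick or transferred to a different namespace without invalidating the signature.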
3.2.6 Synchronization Mechanism
Given the non-preemptive SM execution, we have devised a simple update-based synchroniza-
tion mechanism for inter-SM communication. An SM can block on an application tag until
another SM performs a write on that tag. A blocked SM is appended to the SM blocked queue
and yields the processor (this is the only exception to our run-to-completion model of execu-
tion). After an SM blocks, the scheduler may dispatch other SMs for execution. When an SM
writes to an application tag with a non-empty SM blocked queue, all SMs in the queue are
woken up and made ready for scheduling. To prevent infinite blocking, if no write operation
takes place within a given timeout, SMs are unblocked and made ready for scheduling.
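The update-based synchronization mechanism can be modeled in a few lines of Python. The sketch assumes a single node with logical time supplied by the caller; SyncNode, block_sm, write_tag, and tick are hypothetical names, and SMs are represented only by their identifiers.

```python
from collections import deque


class SyncNode:
    def __init__(self):
        self.ready = deque()  # SM ready queue
        self.blocked = {}     # tag name -> list of (sm_id, deadline)
        self.tags = {}

    def block_sm(self, sm_id, tag, timeout, now):
        """The SM yields the processor until `tag` is written or the timeout expires."""
        self.blocked.setdefault(tag, []).append((sm_id, now + timeout))

    def write_tag(self, tag, data):
        """A write wakes up every SM blocked on the tag."""
        self.tags[tag] = data
        for sm_id, _ in self.blocked.pop(tag, []):
            self.ready.append(sm_id)

    def tick(self, now):
        """Unblock SMs whose timeout expired without a write."""
        for tag in list(self.blocked):
            waiting = self.blocked[tag]
            expired = [sm for sm, deadline in waiting if deadline <= now]
            self.blocked[tag] = [(sm, d) for sm, d in waiting if d > now]
            self.ready.extend(expired)
            if not self.blocked[tag]:
                del self.blocked[tag]
```

If two SMs block on the same routing tag with timeouts of 10 and 3 time units, the second one is made ready by the timeout at time 3 or later, while the first one is made ready by the next write to the tag.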
3.3 Smart Messages API
The SM API is presented in Table 3.1. SMs are allowed to create new SMs dynamically, migrate
one-hop to neighbor nodes, access the tag space, set lower bounds for resource requirements,
and synchronize on tags. Also, the SMs can use the uniform interface provided by the tag space
to execute system calls on the local host (i.e., through I/O tags). The use of the SM primitives
is extensively illustrated in Section 3.5 and throughout the next chapter.
Category        Primitives
Smart Messages  createSMFromFiles(codefiles, databricks);
                createSM(codebricks, databricks);
                spawnSM();
                sys_migrate();
                blockSM(tagname, timeout);
                setResources(resources);
Tag Space       createTag(tagname, lifetime, data);
                deleteTag(tagname);
                readTag(tagname);
                writeTag(tagname, data);
Table 3.1: Smart Messages API
3.3.1 Creation
Initially, an SM is injected at a node as a program file, and it calls createSMFromFiles with a list
of program file names and data bricks to create a new SM structure. An SM may use createSM
to assemble a new, possibly smaller SM using some of its code and data bricks. A createSM call
is commonly used to build an SM that cooperates with the current one (e.g., a route discovery
SM). An application that needs to clone itself calls spawnSM (similar to the fork system call in
Unix). Typically, spawnSM is invoked when the current SM needs to migrate a copy of itself
to nodes of interest while continuing the execution at the local node. A new SM generated by
createSM or spawnSM is scheduled for execution at the local node.
3.3.2 Migration
The sys_migrate primitive implements one-hop migration. It captures the execution state, sends
the resource table for admission, transfers the accepted SMs, and resumes these SMs at the
destination. The sys_migrate primitive is used by high-level migrate functions to route SMs to nodes of interest.
More details about the SM self-routing mechanism are presented in Chapter 4.
3.3.3 Synchronization
The blockSM primitive allows SMs to block on a tag pending a write by another SM. Typically,
an SM uses this primitive to wait for a route. For instance, an SM can create a route discovery
SM and block on a routing tag until the route discovery SM returns (i.e., the route discovery
SM writes to the routing tag, and thus wakes up the blocked SM).
3.3.4 Setting Resource Requirements
Programmers invoke setResources each time they need to set new lower bounds for resource
requirements. Typically, this primitive is called once per high-level migration invocation and
specifies two categories of lower bounds: resources needed for routing, and resources needed
for computation at the node of interest (i.e., the target of migration). The resources are imple-
mentation specific, but they include at least: number of VM cycles, amount of runtime memory,
amount of tag space memory and the duration for which this memory is needed, I/O tags to be
accessed, and maximum number of bytes that would be generated when migrating this SM to
another node. An SM is not required, however, to set the resource requirements. In such a case,
the admission is based only on the size of the SM, but the node does not provide any type of
guarantees (i.e., the SM can be terminated or asked to migrate at any moment). Our current
prototype, described in Section 5.1, uses this very simple solution.
3.3.5 Tag Space Access
An SM can create, delete, or access application tags. As mentioned in Section 3.2, the tags are
accessed subject to authorization. The same interface is used to access the I/O tags: SMs can
issue commands to I/O devices by writing into I/O tags, or can get I/O data by reading from
I/O tags (an SM cannot create or delete I/O tags).
3.4 Security Architecture
One of the traditional pitfalls of existing systems based on mobile code is security. Similar
to mobile agents, there are three main issues that have to be solved: (1) protecting recipient
hosts from SMs, (2) protecting SMs from each other, and (3) protecting SMs from malicious
hosts. These problems become more severe for SMs due to the volatile nature of NES. Unlike
traditional mobile agents for relatively stable IP-based networks, the SMs have to overcome
the lack of an infrastructure or a central authority, specific to mobile ad hoc networks, which
increases significantly the difficulty of key authentication and group management.
In this section, we present a basic security architecture for SMs, which focuses on providing
protected access to the tag space. This security architecture offers protection against malicious
SMs under the assumption that the SM system software at nodes is trusted (i.e., we do not
protect SMs against compromised hosts). To protect against compromised systems, we plan
to develop a distributed trust mechanism [23], which helps a node assign trust values to its
one-hop neighbors; a node deemed untrusted is simply removed from the list of neighbors.
Optionally, an SM may ask to be migrated in an encrypted form between neighbor nodes. To
support this, each node carries a pair of public/private keys.
3.4.1 Access Control
A unique characteristic of SMs is that no direct access is allowed to system resources (i.e.,
the SMs access both their data and system resources through the tag space). The advantage of
this design is that the tag space is a single point of access control, which can be implemented
and enforced uniformly. Compared to mobile agent systems [30], the tag space greatly
simplifies the control mechanisms. The SM creating a tag, called the tag’s owner, determines the
access control policy and delegates the host to enforce this policy on its behalf. Protecting the
application tags ensures that SM executions do not interfere with each other, and therefore,
provides a secure channel for SM cooperation.
A tag incorporates the ID of its owner, the ID of its owner’s family, the address of the
node where its owner’s family originated, and its ACL (access control list). SMs are uniquely
identified by the node address where they originated and the time of their creation. We define a
family of SMs as all SMs originated from an SM injected in the network by a user. The family
ID is the ID of the original SM. Since an SM can migrate or spawn new SMs at intermediate
nodes, its family information can be used to enforce access control for an entire family of SMs.
The ACL is a matrix of subjects and their access permissions to tags, read(r) or write(w). The
ACL contains five protection domains: Owner, Family, Origin, Code, and Others.
Each time an SM tries to execute an operation on a tag, the VM performs the authorization
process. Based on the credentials presented during admission and the currently executing code
brick, the SM is associated with at least one protection domain. A user or the SM itself cannot
forge an SM’s identification information because this information is set automatically by the
system. The request is granted if the SM has the necessary permissions to access the tag in any
of the protection domains it has been associated with.
Figure 3.7: SM Protection Domains for Tag Space Access
Figure 3.8: Access Control Example for Smart Message Family Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag)
3.4.2 Protection Domains
The Owner and Others protection domains define the access permissions for the SM that owns
the tag and for any SM, respectively. The group concept, defined as an arbitrary relation over
SMs, supports more flexible cooperation, but also incurs a high overhead for managing the
group membership on the fly. Currently, our architecture does not support dynamic cooperation
among totally independent SMs. Instead, we define three protection domains that allow
cooperation among well-defined groups of SMs (i.e., Family, Origin, Code). Figure 3.7 shows
that an SM can be associated with multiple protection domains for a tag. In the following, we
present three scenarios that illustrate the protection domains for group cooperation.
Family cooperation. In Figure 3.8, all cooperative SMs originate from a common SM
ancestor. For instance, SM1 is created on N1 and migrates to N2. At this node it creates a child,
SM2, which migrates and creates a tag T on node N5. To allow SM1 to access this tag, SM2 sets
the ACL to {Family, rw} (i.e., the family ID of T is the same as the family ID of SM1).
Figure 3.9: Access Control Example for Single Originator Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag)
Figure 3.10: Access Control Example for Code-based Cooperation (Ni are Nodes, SMi are Smart Messages, and T is a Tag)
Single originator cooperation. Figure 3.9 shows the scenario when the group of cooperative
SMs originate from a common node. SM1 and SM2 are created on node N1 and migrate to
a target node N5 via different paths. SM1 arrives at N5 before SM2 and creates a tag T. It also
sets the ACL as {Origin, rw} such that SM2 will be able to access T (i.e., the unique IDs of
SM1 and SM2 contain the same origin ID). This scenario is very likely to be encountered since
many nodes are small devices, such as PDAs or cell phones, owned by a single user.
Code-based cooperation. In addition to the simple groups described before, the SM group
cooperation can be coordinated more flexibly based on code bricks. To ensure cooperation
among SMs that are aware of the code used for data sharing or data exchange, each tag has a
list of associated hash values for certain code bricks. These hash values define the members
of the Code group (they may or may not belong to the owner of the tag). By definition, an
SM is a member of the Code group if the hash value of its currently executing code brick
belongs to this list. For instance, SMs using the same routing brick can add the hash value
corresponding to this brick to the tag’s list of hash values in order to facilitate route sharing
among them. Figure 3.10 presents such an example. SM1 creates a tag T and sets the ACL to
{Code=(Cr), rw} to grant access to all the other SMs using the Cr routing brick. Hence, SM2
has the permissions to use T.
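The authorization rule of Sections 3.4.1 and 3.4.2 (an SM is mapped to every protection domain it qualifies for, and the request is granted if any of those domains carries the permission) can be sketched as below. The type names AclDomain, AclPermission, SmIdentity, and ProtectedTag, and the use of plain strings for IDs and brick hashes, are assumptions for illustration only and do not reflect the prototype's data structures.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

enum AclDomain { OWNER, FAMILY, ORIGIN, CODE, OTHERS }

enum AclPermission { READ, WRITE }

/** Hypothetical identification information, set by the system and not by the SM. */
final class SmIdentity {
    final String smId, familyId, originNode, currentBrickHash;

    SmIdentity(String smId, String familyId, String originNode, String currentBrickHash) {
        this.smId = smId;
        this.familyId = familyId;
        this.originNode = originNode;
        this.currentBrickHash = currentBrickHash;
    }
}

/** Hypothetical tag carrying owner, family, origin, code hashes, and an ACL. */
final class ProtectedTag {
    private final String ownerId, familyId, originNode;
    private final Set<String> codeHashes = new HashSet<>();
    private final Map<AclDomain, Set<AclPermission>> acl = new EnumMap<>(AclDomain.class);

    ProtectedTag(String ownerId, String familyId, String originNode) {
        this.ownerId = ownerId;
        this.familyId = familyId;
        this.originNode = originNode;
    }

    void grant(AclDomain domain, AclPermission... permissions) {
        Set<AclPermission> set = acl.computeIfAbsent(domain, d -> EnumSet.noneOf(AclPermission.class));
        Collections.addAll(set, permissions);
    }

    /** Adds a code brick hash to the list that defines the Code group. */
    void allowCode(String brickHash) {
        codeHashes.add(brickHash);
    }

    /** Grants the request if any domain the SM belongs to holds the permission. */
    boolean isAllowed(SmIdentity sm, AclPermission wanted) {
        Set<AclDomain> domains = EnumSet.of(AclDomain.OTHERS); // every SM is in Others
        if (sm.smId.equals(ownerId)) domains.add(AclDomain.OWNER);
        if (sm.familyId.equals(familyId)) domains.add(AclDomain.FAMILY);
        if (sm.originNode.equals(originNode)) domains.add(AclDomain.ORIGIN);
        if (codeHashes.contains(sm.currentBrickHash)) domains.add(AclDomain.CODE);
        for (AclDomain d : domains) {
            Set<AclPermission> granted = acl.get(d);
            if (granted != null && granted.contains(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

With a tag created as in Figure 3.8 (owner SM2, family of SM1, ACL {Family, rw}), a call such as isAllowed for SM1 with WRITE would be granted, while an unrelated SM would be refused.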
3.5 Application Examples
To prove that virtually any protocol or application can be written using SMs, we have imple-
mented two previously proposed applications: SPIN [34] and Directed Diffusion [39]. They
present different paradigms for content-based communication and computation in sensor net-
works: SPIN is a protocol for data dissemination, and Directed Diffusion implements data
collection.
3.5.1 Background
SPIN [34] is a family of adaptive protocols that disseminates information among nodes in a
sensor network. We present an implementation of SPIN-1 which is a three-stage handshake
protocol for data dissemination. Each time a node obtains new data, it disseminates this data in
the network by sending an advertisement to its neighbors. The node receiving the advertisement
checks if it has already received or requested that data. If not, it sends a request message back
to the sender asking for the advertised data. The initiator sends the requested data, and then,
the process is executed recursively for the entire network.
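For reference, the three-stage handshake just described can be sketched as a small message-passing simulation. This is a simplified model, not the SPIN implementation of [34]: the class name Spin1Sketch is hypothetical, the network is a static adjacency list, and messages are delivered instantly and in order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Simplified, loss-free model of the advertise/request/data handshake. */
final class Spin1Sketch {
    /** Returns the message counts {advertisements, requests, data} for one new data item. */
    static int[] disseminate(List<List<Integer>> neighbors, int source) {
        boolean[] hasData = new boolean[neighbors.size()];
        int adv = 0, req = 0, data = 0;
        Deque<Integer> pending = new ArrayDeque<>(); // nodes that still have to advertise
        hasData[source] = true;
        pending.add(source);
        while (!pending.isEmpty()) {
            int sender = pending.poll();
            for (int receiver : neighbors.get(sender)) {
                adv++;                      // stage 1: advertisement to a neighbor
                if (hasData[receiver]) {
                    continue;               // neighbor already has the data: no request
                }
                req++;                      // stage 2: request sent back to the sender
                data++;                     // stage 3: sender delivers the data
                hasData[receiver] = true;
                pending.add(receiver);      // the receiver repeats the process
            }
        }
        return new int[] {adv, req, data};
    }
}
```

In a connected network this model sends one data message per node other than the source, and one advertisement per directed link.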
In Directed Diffusion [39], a sink node requests data by sending “interests” for named data.
Data matching an interest is then “drawn” from source nodes toward the sink node. Interme-
diate nodes can cache and aggregate data; they may also direct interests based on previously
cached data. At the beginning, the sink may receive data from multiple paths, but after a while it
will reinforce the path providing the best data rate. All future data will arrive on the reinforced
path only.
3.5.2 SPIN using Smart Messages
To illustrate a distributed application written using SMs, Figure 3.11 presents the code for our
implementation of SPIN. The tag space at each node hosts two tags: the value of the most
recent data received (tagData), and the timestamp associated with this data (tagTimestamp).
 1 DisseminateSM(String tag, int timeout){
 2   // Data Brick
 3   int timestamp;
 4   Data data;
 5   String tagData=tag+"data";
 6   String tagTimestamp=tag+"timestamp";
 7   Address src, dest;
 8   // Code Brick
 9   while(true){ // SM at source
10     blockSM(tagData, timeout);
11     timestamp = readTag(tagTimestamp);
12     if (spawnSM() == 0){ // child SM
13       while(true){ // SM at every node
14         src = getLocalAddress();
15         sys_migrate(all); // migrate to all neighbors
16         int localTimestamp = readTag(tagTimestamp);
17         if (timestamp <= localTimestamp){
18           // the same or more recent data exists at this node
19           System.exit(0);
20         }
21         writeTag(tagTimestamp, timestamp);
22         dest = getLocalAddress();
23         sys_migrate(src); // migrate back to source
24         data = readTag(tagData);
25         sys_migrate(dest); // bring data to destination
26         writeTag(tagData, data);
27       }
28     }
29   }
30 }
Figure 3.11: Implementation of SPIN with Smart Messages
The protocol is initiated by injecting a Disseminate SM into a node that produces data. This
SM blocks on tagData (line 10) waiting for new data. Each time new data is produced, the
SM reads the tagTimestamp and spawns itself (lines 11-12). The “child” SM migrates to all
one-hop neighbors to advertise the new data (line 15). If a destination node does not have this
data or more recent data, the “child” SM updates the tagTimestamp and migrates back to the
source to bring the data (lines 16-23). Upon data arrival (lines 24-26), the “child” SM executes
recursively the same algorithm until the data is disseminated in the entire network.
3.5.3 Directed Diffusion using Smart Messages
For the implementation of Directed Diffusion using SMs, the tag space at each node hosts
three tags: the most recent data value (tagData), the best data rate available at that node
(tagDataRate), and the best next hop toward the source (tagBestRoute). Directed Diffusion is
initiated by injecting an SM at the sink. The execution of this SM has two main phases:
(1) exploration starts at the sink and floods the network to find data of interest, and
(2) reinforcement chooses the best path and brings data from source to sink.
If the information of interest is not locally available (no tagDataRate value), the explore
SM spawns itself; the “child” SM migrates to all neighbors, while the “parent” SM blocks on
tagDataRate. This operation is performed recursively at every node until an SM reaches a node
containing the tagDataRate. At this point, the “child” SM migrates back to its parent carrying
the discovered data rate. If the new data rate is better than the value stored in tagDataRate, the
SM updates tagDataRate with the new value and tagBestRoute with its source as the best node
in the path toward the source of data. This update unblocks the “parent” SM which will carry
the data rate one hop back. Eventually, the sink node is reached and the reinforcement phase
begins.
During the reinforcement phase, a collect SM migrates to the best next hop starting from the
sink. At each intermediate node, this SM spawns; the “child” SM migrates to the best next hop,
while the “parent” SM blocks waiting for data. When the SM reaches the source, it spawns
new SMs to carry the data one hop back at the promised data rate. Recursively, a blocked SM
is awakened by the data arrival, and it will carry the data back until it reaches the sink.
3.6 Smart Messages Simulator
For large scale evaluation, we have developed an event-driven simulator, similar to ns-2 [57],
extended with support for SM execution. The simulator is written in Java to allow rapid pro-
totyping of applications. To get accurate results, both the communication and the execution
time have to be accounted for. The simulator provides accurate measurements of the execution
time by counting, at the VM level, the number of cycles per VM instruction. To account for
the execution time, we have simulated each node with a Java thread, and we have implemented
a new mechanism for scheduling these threads inside the JVM. The communication model used in
our simulator is “generic wireless”, with contention solved at the message level. Before any
transmission, a node “senses” the medium and backs-off in case of contention.
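The core of an event-driven simulator of this kind is a time-ordered event queue, in which both communication delays and execution costs (for example, VM cycles converted to time) appear as delays between events. The sketch below shows only that generic core. It is not the simulator described above: EventSimulator and its methods are hypothetical names, and it does not model the per-node threads, the thread scheduling inside the JVM, or medium contention.

```java
import java.util.PriorityQueue;

/** Minimal discrete-event loop: events run in timestamp order on a virtual clock. */
final class EventSimulator {
    private static final class Event implements Comparable<Event> {
        final long time;
        final long seq; // insertion order, used to break ties deterministically
        final Runnable action;

        Event(long time, long seq, Runnable action) {
            this.time = time;
            this.seq = seq;
            this.action = action;
        }

        @Override
        public int compareTo(Event other) {
            int byTime = Long.compare(time, other.time);
            return byTime != 0 ? byTime : Long.compare(seq, other.seq);
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private long now = 0;
    private long nextSeq = 0;

    /** Current virtual time. */
    long now() {
        return now;
    }

    /** Schedules an action after the given delay (transmission time or execution cost). */
    void schedule(long delay, Runnable action) {
        queue.add(new Event(now + delay, nextSeq++, action));
    }

    /** Runs until no events remain, advancing the virtual clock to each event's time. */
    void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time;
            e.action.run();
        }
    }
}
```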
3.7 Simulation Results
The main goal in conducting the simulation experiments was to quantify the data convergence
time for our implementations of SPIN and Directed Diffusion using SMs and to compare these
results with the results for traditional message passing implementations. We define the data
convergence time as the time when a certain percentage of the total number of nodes have
received the data (SPIN), or the data rate (Directed Diffusion). In both cases, due to flooding,
all nodes end up receiving the data and the data rate. SPIN completes after all nodes have
received the data, while Directed Diffusion will start the reinforcement phase after all nodes
have received the data rate. We use the same network configuration for all experiments. The
network has 256 nodes distributed uniformly over a square area, and each node has the same
transmission range. The average number of neighbors per node is 4.
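The data convergence time defined above can be computed from per-node arrival times as in the following sketch. The class and method names are hypothetical, and the rounding rule (the smallest number of nodes that reaches the requested fraction) is an assumption, since the text does not state how fractional node counts are handled.

```java
import java.util.Arrays;

final class ConvergenceTime {
    /**
     * arrivalTimes[i] is the time at which node i received the data (or the data rate).
     * Returns the earliest time at which at least the given fraction of nodes has received it.
     */
    static double convergenceTime(double[] arrivalTimes, double fraction) {
        if (arrivalTimes.length == 0 || fraction <= 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("need at least one node and 0 < fraction <= 1");
        }
        double[] sorted = arrivalTimes.clone();
        Arrays.sort(sorted);
        int needed = (int) Math.ceil(fraction * sorted.length); // nodes required to reach the fraction
        return sorted[needed - 1];
    }
}
```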
The first set of experiments evaluates the data convergence time when only one SM is in-
jected in the network. Figure 3.12 presents the data convergence time for a single Directed
Diffusion SM, with the sink and source located at the diagonal corners of the square region.
We plot the data convergence time for three different cases of the same SM and a base case
for the same application using passive communication (no SM). The top curve shows the time
when code caching is not used. In the second curve, we can see an improvement of more than
4 times in performance when code caching is activated during the first execution of the SM in
the network. The code is cached when an SM visits a node for the first time and will be used by
subsequent SMs during the same execution. The effects of caching are very important in this
case because the SMs visit a node multiple times in Directed Diffusion: they travel the network
both forward (looking for the source) and backward (diffusion of data rate). In the third curve
we can observe a 30% decrease in the completion time when the code is already cached at all
nodes. The fourth curve shows the data convergence time for a traditional implementation: the
protocol is implemented at each node, only data is transferred through the network, and the
execution time is not accounted for. We observe that the degradation in performance for our
implementation, when the code is cached at all nodes, compared to the traditional
implementation is only 5%. We believe that this is a reasonable price for the flexibility to program any
user-defined distributed application in NES.
Figure 3.12: Directed Diffusion using Smart Messages
Figure 3.13: SPIN using Smart Messages
Figure 3.14: Directed Diffusion - Multiple Smart Messages
Figure 3.15: SPIN - Multiple Smart Messages
Figure 3.13 plots the same curves for a single SPIN SM launched in the network at a node
located in a corner of the square area. During the first execution, code caching leads to a 3 times
improvement in performance (i.e., reducing the size of SMs is essential for a protocol based
on flooding and three-stage communication). The third curve shows a 30% decrease in the
completion time (similar to Directed Diffusion) when the code is already cached at all nodes.
The completion time is 10% to 15% higher than for the traditional implementation.
The second set of experiments quantifies the performance of our applications when multiple
SMs run simultaneously in the network. Figures 3.14 and 3.15 show the data convergence time
for both Directed Diffusion and SPIN with the code already cached at nodes. For these experi-
ments, data convergence time is the time when a certain percentage of nodes have received the
data (or data rate) for all the SMs running in parallel. The nodes at which the SMs start are dis-
tributed uniformly in the network. The results show that data convergence time increases with
the number of SMs, but only during the initial flooding phase because of increased contention
in the network. After that, the shapes of the curves are the same, independent of the number of
SMs. The results also indicate that SPIN completes faster than Directed Diffusion in all cases
(i.e., 2.3 s compared to 3.4 s for the top curves in the figures). The cause is that SPIN floods
only the neighbors and then brings the data to them, while Directed Diffusion needs to flood
the entire network until it finds the source and then brings the data rate back to all nodes. In
the initial phase Directed Diffusion generates more messages in the network leading to higher
contention, but its performance will increase as soon as the reinforcement phase begins.
3.8 Summary
In this chapter, we have presented the Smart Messages (SM) distributed computing platform
which provides a common execution environment for distributed applications developed on top
of highly dynamic networks of embedded systems. The SM platform overcomes the volatility,
heterogeneity, and scale encountered in these networks by using execution migration, content-
based naming, and self-routing. Furthermore, the SM system architecture is suitable for re-
source constrained devices since it defines a lightweight system support at nodes, with most of
the “intelligence” incorporated into SMs.
To prove that virtually any user-defined distributed application can be implemented using
SMs, we have implemented and evaluated through simulations two previously proposed appli-
cations for sensor networks, SPIN and Directed Diffusion. The simulation results show that the
SM platform is able to provide high flexibility for user-defined distributed applications while
limiting the increase in the response time to at most 15% over the traditional non-active com-
munication implementations.
Chapter 4
Smart Messages Self-Routing Mechanism
This chapter presents the Smart Messages (SMs) self-routing mechanism. Similar to most
mobile ad hoc networks, the separation between hosts and routers disappears in NES. In our
approach, there is no support for routing at nodes. SMs are responsible for their own routing
in the network (i.e., self-routing), and they can control routing in two ways: select their routing
algorithms, or change the routing algorithm during execution. To show how routing algorithms
can be implemented with SMs, we describe four such implementations corresponding to dif-
ferent types of routing. The chapter concludes with simulation results that demonstrate the
benefits of the self-routing mechanism.
4.1 Content-Based Migration
The key SM operation is content-based migration, which implements routing. Each SM has to
include at least one routing brick among its code bricks. A routing brick defines a high-level
migrate function. SMs name the nodes of interest by tag names, which denote content or
properties, and then call migrate to route them to a node that has the desired tags. Additionally,
migrate can be instructed to check if the nodes with these tags meet certain conditions (i.e.,
migrate implements a conditional content-based migration). This function is a user-level function,
which can be provided as a library routing brick (e.g., implemented by system programmers)
or implemented directly by application programmers. For instance, a simple implementation
of migrate takes a list of tag names as parameter and migrates the SM to a node that contains
all those tags. Nothing, however, precludes a programmer from expressing more complex conditions
within this function.
Commonly, migrate takes a timeout as an additional parameter in order to deal with network
volatility. If a timeout occurs (i.e., the routing algorithm has not been able to find a node of
interest during the given period), the SM regains the control at an arbitrary node. In this way,
the SM is able to quickly adapt to changing network conditions. For instance, it may decide to
change the routing, change the nodes of interest, or abandon the migration.
 1 int n = 0, sum = 0, lifetime = 1000; //stored in data brick
 2 String tempTag = "Temp", avgTag = "AvgTemp"; //stored in data brick
 3 createTag(avgTag, lifetime, null);
 4 while(n < 10){
 5   if (migrate(tempTag, timeout)){ // true on a node of interest
 6     sum += readTag(tempTag);
 7     n++;
 8   } else{ // migrate returns false in case of timeout
 9     if (n >= 5)
10       break; // go ahead if average over at least 5 nodes
11     return; // otherwise, abort the execution
12   }
13 }
14 if (migrate(avgTag, timeout))
15   writeTag(avgTag, sum/n);
Figure 4.1: Example of Smart Message Using Content-based Migration
Figure 4.1 illustrates the use of migrate in an SM. To compute the average temperature over
a certain geographic region, the SM needs to run on ten nodes providing temperature sensors.
To simplify the example, we use a single tag name (“Temp”) as parameter of migrate. The SM
starts by creating a tag for average temperature at the source node (line 3). Then, it calls migrate
(line 5) until ten nodes are visited and the sum of temperatures is computed. If ten nodes have
been found, the SM calls migrate again to return to the source and writes the average value in
the avgTag (lines 14-15). The migration to the source node may use a different routing brick
than the first one, and implicitly, another implementation of migrate.
If a route to a node of interest is not found, the SM will not stay in the network forever (i.e.,
an SM can use limited resources and if it stays for too long in the network, it will eventually
be dropped by a node). This is ensured by the timeout parameter of migrate. If the timeout
expires before finding one of the ten nodes, migrate times out on an arbitrary node and returns
the control to the SM. In this example, if a timeout happens, the SM accepts a partial result if
at least half of the nodes have been visited (lines 8-10). This is a simple example of
application-defined quality of result, which shows the ability of SMs to adapt to adverse network conditions.
For instance, the SM might never complete if ten nodes providing temperature readings do not
exist in that region.
 1 String tagID, routeTagID; // stored in data brick
 2 int migrateTimeout; // stored in data brick
 3 boolean migrate(String tag, int timeout){
 4   tagID = tag;
 5   routeTagID = "route" + tag;
 6   migrateTimeout = timeout + getLocalTime();
 7   while(readTag(tagID) == null){
 8     Address nextHop = readTag(routeTagID);
 9     if (nextHop != null){
10       sys_migrate(nextHop);
11       if (migrateTimeout <= getLocalTime())
12         return false; // migrate timed out
13     }else{
14       RouteDiscovery rd = getDataBrick("RouteDiscovery");
15       rd.setTag(tagID);
16       createSM("RouteDiscovery", rd);
17       int blockTimeout = migrateTimeout - getLocalTime();
18       if (blockSM(routeTagID, blockTimeout) == TIMEOUT)
19         return false; // migrate timed out
20     }
21   }
22   return true; // migrated to a node of interest
23 }
Figure 4.2: Example of migrate Implementation
Figure 4.2 shows an example of a migrate implementation using sys_migrate for one-hop
migration and routing tags. To be capable of routing, SMs need to maintain routing information
within the tag space. They create tags at visited nodes, caching discovered routing information
in the data portion of these tags. Since tags are persistent across SM executions (as long as
their lifetimes have not expired), the routing information can be used by subsequent SMs with
similar interests, thus amortizing the route discovery effort over time.
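The persistence of routing tags up to their lifetime can be illustrated with a small expiring map. This is a sketch under assumptions: ExpiringTagSpace is a hypothetical class, the current time is passed explicitly instead of being read from a node clock, and authorization is omitted.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tag space whose entries disappear once their lifetime has elapsed. */
final class ExpiringTagSpace {
    private static final class Entry {
        final Object value;
        final long expiresAt;

        Entry(Object value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> tags = new HashMap<>();

    /** Creates (or replaces) a tag that stays readable for the given lifetime. */
    void createTag(String name, long lifetime, Object value, long now) {
        tags.put(name, new Entry(value, now + lifetime));
    }

    /** Returns the tag value, or null if the tag is absent or its lifetime has expired. */
    Object readTag(String name, long now) {
        Entry e = tags.get(name);
        if (e == null) {
            return null;
        }
        if (now >= e.expiresAt) {
            tags.remove(name); // lazily discard the expired tag
            return null;
        }
        return e.value;
    }
}
```

A later SM looking for the same tag name would read the cached next hop while the entry is alive, and would fall back to route discovery once readTag returns null.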
In our example, we present a simple on-demand routing based on flooding the network
when looking for a tag name. As long as a next hop toward a node of interest is available, the
entire SM eagerly migrates there (lines 8-10). If the migration timeout has expired, the routing
code returns the control to the application code of the SM (lines 11-12). The assumption in
this example is that all the nodes have the time synchronized (i.e., using either GPS receivers
or more accurate time synchronization algorithms [40, 75]).
If a route toward a node of interest is not available, a route discovery SM is created and
its data brick is initialized with the tag name that defines the nodes of interest (lines 13-16).
The goal of the newly created SM is to migrate through the network, find nodes of interest,
set routes to these nodes, and report back the newly learned routes. During this process, the
current SM is blocked waiting for routing information (line 18). The blocked SM is woken up
when the discovery SM returns with a route and writes the routing tag. The implementation of
the route discovery SM is presented in Section 4.3, which describes various classes of routing
algorithms implemented with SMs. A problem generated by content-based routing (not shown
in this example) is how to ensure that the SM does not end up on a node of interest already
visited. In our programs, we have used two solutions. One is to let the SM record the nodes
of interest visited and pass this list as a parameter to migrate. The other one is to “mark” the
visited nodes with temporary tags.
4.2 Application Examples
To illustrate the flexibility provided by self-routing, we present several scenarios for applica-
tions that benefit from this mechanism. These scenarios correspond to the two possible ways
for an SM to control the routing: (1) choosing its routing algorithms, and (2) dynamically
changing its current routing algorithm. Section 4.3 describes the SM implementations of the
routing algorithms supporting these applications.
4.2.1 Selecting the Routing Algorithm
A first scenario involves an application that needs to perform image recognition on a number
of camera nodes that have acquired an image with a certain resolution within a given time
interval. In the absence of routing information, a naive solution would be to use an on-demand
content-based routing algorithm to discover camera nodes. Once migrated to a camera node,
the SM has to check if the resolution of the image and its acquiring time satisfy the application’s
requirements, and then proceed with the computation. The disadvantage of such a method is
that the SM has to pay the cost of migrating to nodes that do not satisfy the requirements of
the application (e.g., they have low resolutions or old images). The self-routing mechanism
allows the application to define its own routing that discovers only the nodes having the desired
combination of tag names and values (i.e., they satisfy the required content-based condition).
Thus, the network bandwidth, the energy consumed, and the response time are all reduced for
this application. It is important to mention that self-routing offers the power to use any arbitrary
condition expressed by a program to select the nodes of interest.
Figure 4.3: Dynamic Change of Routing Due to Application’s Requirements
A second example presents an SM routing algorithm that builds an ad hoc content-based
topology over a network of hand-held devices belonging to the attendees at a conference. For
instance, CEOs attending a conference may decide to have an important discussion and, for se-
curity reasons, they would like to have their messages sent directly to destinations or forwarded
toward destinations only by other CEO devices. Under the assumption that it is possible to ob-
tain a connected graph using only CEO nodes, a simple SM routing algorithm can be developed
such that the routing entries stored in the tag space have the next hop value set always to a CEO
node.
4.2.2 Dynamically Changing the Routing Algorithm
Using multiple routing bricks during the lifetime of an SM may improve the completion time
or even help the application complete in the presence of adverse network conditions.
Figure 4.3 presents an SM that incorporates two routing bricks: a geographical
routing and an on-demand content-based routing. The nodes containing the tag of interest are
colored grey, but the application is interested only in the grey nodes located in the circular
region. Therefore, a simple on-demand content-based routing would perform poorly since it
would have to flood the entire network to discover the nodes of interest. The performance can
be radically improved if the application has knowledge about the geographical region where
the nodes of interest should reside. In such a case, a geographical routing is used to reach the
desired region. Once there (the black node in the figure), the SM changes its routing to the
on-demand content-based algorithm which will flood only a limited area.
Figure 4.4: Dynamic Change of Routing Due to Network’s Conditions
Figure 4.4 shows another example of an SM that changes its routing dynamically. The
grey nodes are nodes of interest for the application. In the dense and relatively stable part of
the network, the SM may use routes established by a proactive routing algorithm. Once the
SM enters the unstable part of the network, the adverse conditions (low density of nodes, high
mobility) lead to a timeout in the migrate call. Let us assume that the SM is executing on
the black node when the timeout expires. At this time, the SM decides to change its routing.
It does so by calling a migrate function which corresponds to an on-demand content-based
routing. Using the new routing, the SM is able to visit all nodes of interest and complete its
execution.
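The pattern of switching to another routing brick after a timeout can be sketched as trying a list of migrate implementations in order. The interface RoutingBrick and the class FallbackMigration are hypothetical names introduced here for illustration; in an SM, the decision to switch is taken by application code after migrate returns false.

```java
import java.util.List;

/** Hypothetical view of a routing brick: migrate returns false when it times out. */
interface RoutingBrick {
    boolean migrate(String tag, long timeoutMillis);
}

final class FallbackMigration {
    /**
     * Tries each routing brick in order and returns the index of the first one whose
     * migrate call succeeds, or -1 if every brick timed out.
     */
    static int migrateWithFallback(List<RoutingBrick> bricks, String tag, long timeoutMillis) {
        for (int i = 0; i < bricks.size(); i++) {
            if (bricks.get(i).migrate(tag, timeoutMillis)) {
                return i;
            }
        }
        return -1;
    }
}
```

For the scenario of Figure 4.4, the list would hold a proactive routing brick first and an on-demand content-based brick second.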
4.3 Implementing Routing Algorithms with Smart Messages
In the following, we describe briefly the proof-of-concept implementations for several routing
algorithms using SMs. It is not our intention to show finely tuned routing implementations.
Our goal is to show the potential of the SM self-routing mechanism in implementing flexible
content-based routing in NES. With this mechanism, virtually any routing algorithm for ad hoc
networks can be used to implement a migrate function.
4.3.1 On-Demand Content-Based Routing
Previous research, such as DSR [41] and AODV [68], has shown that on-demand routing is
suitable for highly mobile environments. We extend this work to implement an on-demand
content-based routing algorithm using SMs. Figure 4.5 presents a simplified implementation
of an on-demand routing (similar to AODV). Essentially, AODV builds routes using a route
request/route reply query cycle. When a source node needs a route to a destination for which it
does not already have a route, it broadcasts a route request packet across the network. Nodes
receiving this packet update their information for the source node and set up backward pointers
to the source node in the routing tables.
Each time routing information is not available at the current node, route discovery SMs
flood the network looking for either a node of interest (defined by a certain tag name) or for
a node containing routing information about a node of interest. An SM that arrives at a node
already visited stops its execution (lines 5-6). A new node is marked, and if it does not have
the required data, the SM migrates to all one-hop neighbors and sets backpointers to the source
of one-hop migration (lines 7-10). After finding a node of interest or a route to a node of
interest, a route discovery SM returns to its source and sets up the routing tags at each node in
the path (lines 13-19). The first SM updating the routing tag at the source unblocks the initial
SM, which subsequently migrates to the next hop, as shown in Figure 4.2. Each time the next
hop becomes unavailable, the route discovery process is restarted. Thus, routing around broken
paths is possible. Such situations emphasize one of the advantages of using self-routing SMs
over the traditional request/reply paradigm: an application is able to make progress even in
poor network conditions, moving toward nodes of interest and eventually arriving there. In the
 1 String tagID, markTag, routeTagID, prevTagID; // stored in data brick
 2 Address prevHop, nextHop; // stored in data brick
 3 int lifetime; // stored in data brick
 4 while((readTag(tagID) == null) && (readTag(routeTagID) == null)){
 5   if (readTag(markTag) != null)
 6     return;
 7   createTag(markTag, lifetime, "visited");
 8   prevAddr = getLocalAddress();
 9   sys_migrate(all); // migrate to all neighbors
10   createTag(prevTagID, lifetime, prevAddr);
11 }
12 // found tagID or a tag with a route to tagID
13 while(prevHop != null){
14   nextHop = getLocalAddress();
15   sys_migrate(prevHop);
16   createTag(routeTagID, lifetime, nextHop);
17   prevHop = readTag(prevTagID);
18   writeTag(RouteTagID, previous());
19 }
Figure 4.5: Example of On-demand Routing Implementation with Smart Messages
request/reply paradigm, the round-trip communication may never complete and the application
may fail to achieve any result.
4.3.2 Geographical Routing
Unlike traditional distributed systems where the physical location of nodes does not matter,
the spatial distribution of nodes across the physical space is a key feature of massive NES.
Many times, the applications running in NES will prefer to express their interest for content
located within well-defined geographical regions. Therefore, a geographical routing algorithm
becomes a necessity. GPSR [48] is a well-known geographical routing algorithm that makes greedy
forwarding decisions using only information about a node’s immediate neighbors. When a packet
reaches a region where greedy forwarding is impossible, the algorithm recovers by routing
around the perimeter of that region. We have implemented a simple geographical routing algorithm,
similar to the greedy forwarding used by GPSR, that takes a circular region as a parameter and keeps
migrating the SM to the neighbor node closest to the center of the region until it reaches a
node located within that area. The SM system software at nodes provides the list of one-hop
neighbors together with their locations.
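The greedy step described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the Position type, the nextHop and inRegion names, and the choice to return -1 when no neighbor is closer to the center than the current node are hypothetical and are not part of the SM API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical 2-D node position in meters (not an SM API type). */
final class Position {
    final double x;
    final double y;

    Position(double x, double y) {
        this.x = x;
        this.y = y;
    }

    double distanceTo(Position other) {
        return Math.hypot(x - other.x, y - other.y);
    }
}

final class GreedyGeoRouting {
    /** True if the position lies inside the circular target region. */
    static boolean inRegion(Position p, Position center, double radius) {
        return p.distanceTo(center) <= radius;
    }

    /**
     * Returns the index of the neighbor closest to the region center, or -1
     * when no neighbor is strictly closer than the current node. The -1 case
     * (greedy forwarding is stuck) is an assumption of this sketch.
     */
    static int nextHop(Position self, List<Position> neighbors, Position center) {
        int best = -1;
        double bestDistance = self.distanceTo(center);
        for (int i = 0; i < neighbors.size(); i++) {
            double d = neighbors.get(i).distanceTo(center);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Position center = new Position(100, 100);
        List<Position> neighbors = Arrays.asList(
                new Position(10, 0), new Position(20, 20), new Position(0, 5));
        System.out.println(nextHop(new Position(0, 0), neighbors, center));
    }
}
```

An SM using such a routine would call nextHop at every node and migrate to the returned neighbor until inRegion holds for the local position.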
4.3.3 Proactive Routing using Bloom Filters
Exchanging routing information among all nodes in NES is practically impossible, but a lim-
ited exchange of information among neighbors can be useful even in the absence of global con-
vergence. We have implemented an algorithm that maintains approximate information (sum-
maries) about content location in the network as Bloom filters [16]. This algorithm is similar to
probabilistic routing [72]. A Bloom filter is a bit vector of length n that uses several independent
hash functions to map the elements of a set to integers in the interval [0, n). To form a Bloom
filter summary, each element in the set is hashed and the bits at the positions given by the hash
functions are set. For an element lookup, the element is hashed and the corresponding
bits are checked. If all the bits are set, the element is probably contained in the set, but false
positives can occur. If any one of the bits is not set, the element is guaranteed not to be in
the set.
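A minimal Bloom filter along these lines can be sketched as follows. The class and method names are hypothetical, and the k hash functions are derived from String.hashCode by double hashing, which is an assumption made for brevity rather than the hash family used in our implementation.

```java
import java.util.BitSet;

final class BloomFilter {
    private final BitSet bits;
    private final int n; // length of the bit vector
    private final int k; // number of hash functions

    BloomFilter(int n, int k) {
        this.n = n;
        this.k = k;
        this.bits = new BitSet(n);
    }

    /** i-th bit position for an element, always in [0, n). Double hashing is an assumption. */
    private int index(String element, int i) {
        int h1 = element.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step derived from the same hash code
        return Math.floorMod(h1 + i * h2, n);
    }

    /** Sets the k bits associated with the element. */
    void add(String element) {
        for (int i = 0; i < k; i++) {
            bits.set(index(element, i));
        }
    }

    /** False means "definitely absent"; true means "possibly present". */
    boolean mightContain(String element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(element, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilter summary = new BloomFilter(64, 4);
        summary.add("fire");
        System.out.println(summary.mightContain("fire"));
    }
}
```

A lookup that returns true must still be confirmed at the node itself, since only negative answers are exact.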
Our algorithm builds summaries for the content (i.e., tag names) present at each node. These
summaries are disseminated among neighbors, and they are diluted as they move away from the
source. Nodes closer to certain content have more accurate knowledge about its existence than
nodes farther away from it. This information continues to degrade as we move farther from the
content. However, it is still possible for an SM to discover a route to a content located far away
from its current node using the approximate information maintained locally. This knowledge
may not be accurate, but it is expected that the next hop will be able to provide more precise
information. Thus, choosing nodes which have an a priori better knowledge about the location
of the content as intermediate hops may finally lead to the desired destination.
Initializing the network for the proactive algorithm can be done on demand by injecting
a Routing SM that will replicate itself at the participating nodes. The Routing SMs maintain
summaries about the information learned so far and store them in the tag space. They maintain
exact summaries for the local node and its one-hop neighbors, but only approximate information
about their larger neighborhood. The approximate information for a node located N hops
away from a content is a logical OR of the summaries for the nodes located up to N-1 hops
away (N is an implementation parameter).
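The dilution produced by the logical OR can be illustrated with a short sketch. The SummaryMerge class and its method names are hypothetical; summaries are modeled as java.util.BitSet values, and the bit positions passed to mayContain stand for the outputs of the hash functions.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

final class SummaryMerge {
    /** Logical OR of several Bloom filter summaries of the same length. */
    static BitSet merge(List<BitSet> summaries) {
        BitSet merged = new BitSet();
        for (BitSet s : summaries) {
            merged.or(s);
        }
        return merged;
    }

    /** True if every bit position produced by the hash functions is set. */
    static boolean mayContain(BitSet summary, int[] hashedBits) {
        for (int b : hashedBits) {
            if (!summary.get(b)) {
                return false;
            }
        }
        return true;
    }

    private static BitSet of(int... positions) {
        BitSet s = new BitSet(8);
        for (int p : positions) {
            s.set(p);
        }
        return s;
    }

    public static void main(String[] args) {
        int[] fire = {7, 4, 1, 0}; // hypothetical hash output for the "fire" tag
        BitSet a = of(0, 1);
        BitSet b = of(4, 7);
        // Neither summary alone matches, but their OR does: a false positive.
        System.out.println(mayContain(a, fire));
        System.out.println(mayContain(b, fire));
        System.out.println(mayContain(merge(Arrays.asList(a, b)), fire));
    }
}
```

The example in main shows why merged summaries become less precise with distance: OR-ing two summaries can report a tag that neither node holds, so a match only indicates a direction worth exploring.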
Routing SMs block on a tag, and they wake up in two situations. First, they are woken
[Figure 4.6 diagram: nodes A through G with Bloom filter summaries drawn as 8-bit vectors, labeled
S(A), S(B), S(C), S(D), S(E), S(F), S(G), S(D OR E), and S(F OR G). Labels in the diagram: queries
“rain ?” and “fire ?”, tags FIRE and RAIN, hash(rain) = {7, 6, 4, 2}, hash(fire) = {7, 4, 1, 0}.]
Figure 4.6: Lookup in Proactive Routing: An SM arrives at node A, looking for a “fire” tag.
Applying the hash functions on “fire”, it concludes that the neighbors of C might know better
about “fire”, and migrates to C. A lookup on node C leads to the conclusion that the “fire” tag
exists on node F.
up by SMs bringing new summaries. Second, they wake up periodically to disseminate
information in the network. In the initialization phase, the local summaries are disseminated
to each of the neighbors one hop away. After the exact summaries of the neighbors have been
received, the local summaries are updated, and new SMs propagate them. Periodically, each
Routing SM creates a heartbeat SM that migrates to each of its immediate neighbors, informing
them that the local node is still alive. If no heartbeat is received from a node within a timeout
period, the node is assumed to be dead and its summary is discarded. When summaries are
modified, the Routing SM incorporates the differences (if any) in the heartbeat, and informs all
its immediate neighbors about the change. This change is recursively propagated by heartbeats
created by Routing SMs residing on the neighbor nodes.
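The heartbeat bookkeeping can be sketched as a small table of last-seen times. The NeighborTable class, its method names, and the millisecond timestamps are assumptions made for illustration; in the SM system this state would live in the tag space rather than in a Java map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class NeighborTable {
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();
    private final long timeoutMs;

    NeighborTable(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Records a heartbeat received from a neighbor at the given time. */
    void onHeartbeat(String neighbor, long nowMs) {
        lastHeartbeatMs.put(neighbor, nowMs);
    }

    /**
     * Removes neighbors whose last heartbeat is older than the timeout and
     * returns them, so that the caller can discard their summaries.
     */
    List<String> expire(long nowMs) {
        List<String> dead = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeatMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > timeoutMs) {
                dead.add(entry.getKey());
                it.remove();
            }
        }
        return dead;
    }

    boolean isAlive(String neighbor) {
        return lastHeartbeatMs.containsKey(neighbor);
    }

    public static void main(String[] args) {
        NeighborTable table = new NeighborTable(3000);
        table.onHeartbeat("B", 0);
        table.onHeartbeat("C", 2000);
        System.out.println(table.expire(4000));
    }
}
```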
A migrating SM checks the summaries at each node to find routes to the desired content.
An SM arriving at a previously visited node looks up the summary and chooses a different
neighbor that may have the desired content. If no such neighbor exists, the SM randomly
chooses a neighboring node for migration. It stops its execution if it arrives again at the same
node, in order to avoid a loop.
Figure 4.6 shows an example of a lookup operation performed in a network with intelligent
cameras. The cameras are programmed such that a tag is set when fire or rain is detected. When
[Figure 4.7 diagram: dissemination SMs traveling West, East, South, and North; an SM blocked
waiting for routes.]
Figure 4.7: Rendez-Vous Routing with Smart Messages
the application arrives at node A, it looks up the summaries to find the next hop which has the
“fire” tag, or more precisely information about the location of this tag of interest. The routing
algorithm applies the hash functions on “fire” and checks if the hash value can be matched
against the local summaries. It concludes that the neighbors of C might know better about
“fire” and migrates to C. The same algorithm is applied at C, and the SM discovers the “fire”
on node F.
4.3.4 Rendez-Vous Routing
We introduce the term “rendez-vous” routing to define a category of routing algorithms that
use a combination of on-demand and proactive routing. Such an approach can be beneficial
for certain applications both in terms of scalability and adaptability to highly dynamic network
configurations. For example, with such an approach, we can disseminate routes for important
content in certain locations across the network, and then the SMs can migrate to these locations
to find the necessary routing information. Conceptually, the idea of rendez-vous routing is
similar to the one presented in the Internet Indirection Infrastructure [81].
Figure 4.7 illustrates a simple rendez-vous routing that combines geographic dissemination
with limited flooding. An SM, running at the grey circle in the figure, needs routing information
for a certain tag name. The algorithm starts by broadcasting Explore SMs to the one-hop
neighbors, and then blocks waiting for routing tag updates. The Explore SMs check if the
neighbor nodes have the given routing tag. If the tag is found, an Explore SM returns to the source
and updates the routing tag; this operation unblocks the initial SM. If routing information is not
available at the neighbors, the Explore SMs create a tag for the desired routing data, and block on
it. If no result is received for a certain amount of time (passed as a timeout parameter to the block
call), each Explore SM broadcasts itself one more hop and doubles the timeout. The algorithm
works recursively until it reaches the established limit on the number of hops to be visited. The
exploring process is stopped by the initial SM after it receives the required route. This SM
floods Cancel SMs, which let the Explore SMs know that they have to finish. A Cancel
SM stops at the first node that does not contain the desired routing tag (i.e., no Explore SM has
passed through that node).
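The timeout doubling gives each additional hop twice the waiting time of the previous one. The sketch below computes the resulting schedule; the ExploreSchedule class, its method names, and the 100 ms initial timeout in main are hypothetical values chosen only to make the arithmetic concrete.

```java
import java.util.Arrays;

final class ExploreSchedule {
    /** Timeouts used at hops 1..maxHops when every expansion doubles the previous timeout. */
    static long[] timeouts(long initialTimeoutMs, int maxHops) {
        long[] result = new long[maxHops];
        long current = initialTimeoutMs;
        for (int hop = 0; hop < maxHops; hop++) {
            result[hop] = current;
            current *= 2;
        }
        return result;
    }

    /** Sum of all timeouts: the longest time spent blocked before the hop limit is reached. */
    static long totalWait(long initialTimeoutMs, int maxHops) {
        long total = 0;
        for (long t : timeouts(initialTimeoutMs, maxHops)) {
            total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(timeouts(100, 4)));
        System.out.println(totalWait(100, 4));
    }
}
```

With an initial timeout t and a limit of H hops, the total blocking time along one exploration path is bounded by t(2^H - 1), so the hop limit also bounds the delay before the search gives up.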
SMs that create important tags generate a new SM to disseminate routing information.
According to our algorithm, an SM running at the black node in the figure creates four SMs
that travel in four directions, based on geographic coordinates: East, West, North, and South.
These SMs are inherently loop free. We assume that each node knows its own location and
its neighbors’ locations. The intuitive idea behind our approach is that the rendez-vous can
happen in two situations: (1) one of the dissemination SMs intersects the flooded area, or (2)
an Explore SM reaches a node storing the disseminated information. There are two advantages
to this rendez-vous algorithm. First, we avoid a global dissemination, which would be too
expensive in terms of network resources, but at the same time we propagate routing information
eagerly. Second, we limit the flooding process that takes place in on-demand algorithms.
Consequently, routes to important information are discovered faster, and the response time for
applications decreases. In the example presented in the figure, the SM moving North updates
the routing tag at the grey square node, and the Explore SM blocked there brings the routing
information to the source (grey circle).
[Figure 4.8 plot: completion time (sec) versus number of nodes of interest, for On-Demand Routing
and Conditional On-Demand Routing.]
Figure 4.8: Completion Time for Experiment 1
[Figure 4.9 plot: bytes sent in the network (KBytes) versus number of nodes of interest, for
On-Demand Routing and Conditional On-Demand Routing.]
Figure 4.9: Bytes Sent in the Network for Experiment 1
4.4 Simulation Results
Our main goal in conducting the simulation experiments was to quantify the effects of the
self-routing mechanism for applications running in large-scale NES. We chose two metrics
to analyze the performance of our solution: (1) the completion time, which measures the
user-observed response time for an application, and (2) the total number of bytes sent, which
measures the total amount of traffic (generated by an application) throughout the network. The
second metric reflects the energy and bandwidth consumed by an application and, consequently,
also indicates the overall lifetime of the network. For all the simulations, we uniformly distribute
256 nodes in a 1000 m by 1000 m square. The transmission range for each node is 100 m. A
node can communicate with an average of 6 neighbors (ranging from 2 to 11 neighbors) at a
network bandwidth of 2 Mb/s.
Our first set of simulation experiments studies the SM feature that allows programmers
to select the most appropriate routing for their applications or even to implement their own
routing. The SM starts on a node located in the bottom-left corner of the square region that
contains the network. The goal of this application is to visit a number of nodes of interest
(defined by a given tag name) which satisfy a certain condition. Without loss of generality,
we simply check if the value associated with the given tag is over a certain threshold. We use
two on-demand routing algorithms for this experiment (similar to those described in 4.2.1): a
simple on-demand content-based routing, and a conditional on-demand content-based routing
(which enhances the simple on-demand algorithm with a few lines of code that check the
desired condition).
[Figure 4.10 plot: completion time (sec) versus region radius (meters), for On-Demand Routing
and Geographic + On-Demand Routing.]
Figure 4.10: Completion Time for Experiment 2
[Figure 4.11 plot: bytes sent in the network (KBytes) versus region radius (meters), for
On-Demand Routing and Geographic + On-Demand Routing.]
Figure 4.11: Bytes Sent in the Network for Experiment 2
[Figure 4.12 plot: completion time (sec) versus region radius (meters), for On-Demand Routing
and Geographic + On-Demand Routing.]
Figure 4.12: Completion Time for Experiment 3
[Figure 4.13 plot: bytes sent in the network (KBytes) versus region radius (meters), for
On-Demand Routing and Geographic + On-Demand Routing.]
Figure 4.13: Bytes Sent in the Network for Experiment 3
We distribute uniformly over the network area a total of five nodes containing the tag of
interest and vary the number of nodes of interest (in this experiment, nodes whose tag values
satisfy the desired condition) from one to four (by setting the values of the tags of interest).
Since the results of using the simple on-demand content-based routing depend on the order
of visiting the nodes, we take all the possible combinations and compute the average for both
routing algorithms. Our results indicate that the conditional routing improves the response
time by as much as 40% (see Figure 4.8) because it does not visit any unnecessary nodes
(i.e., nodes that have the desired tag, but the tag value does not meet the condition) whereas the
simple routing does.
Additionally, our bytes-sent results (see Figure 4.9) indicate that the conditional routing
consumes significantly less energy and bandwidth (40% fewer bytes sent for one node of inter-
est) than the simple routing. As expected, when the number of nodes of interest increases, the
savings of our conditional routing are less evident because the simple on-demand routing visits
fewer unnecessary nodes. When the number of nodes of interest is close to the number of nodes
hosting the tag of interest, the simple routing even performs slightly better than the conditional
routing. The primary reason is that the code size of the conditional routing is approximately
150 bytes larger than that of the simple routing. Even though this additional size is small, the
impact of the additional overhead for programming the network becomes noticeable, given that
the network size is sufficiently large.
In the second set of experiments, we study the ability of an SM to change its routing during
execution. Specifically, we compare SMs using only on-demand routing with SMs using a
combination of geographical and on-demand routing (as described in 4.2.2). The SM starts
on a node located at the bottom-left corner of the region. The goal of this SM is to visit five
nodes of interest identified by a given tag name. The network contains exactly five nodes of
interest uniformly distributed over a region delimited by a circular area with the center at the
opposite corner and a 500m radius. If the SM has approximate information of the geographical
region containing these nodes, it can migrate to this area using geographical routing. Upon
reaching the specified area, the SM dynamically changes its routing to geographically bound
on-demand routing (i.e., on-demand routing that floods a limited region) in order to discover
the target nodes. In our simulations, we vary the approximate geographical information (of the
target nodes) by changing the radius of the circular region defined above (the nodes of interest
remain the same).
The performance of the on-demand routing remains constant (regardless of the radius) be-
cause this simple on-demand routing always floods the entire network (see Figure 4.10). Con-
versely, the more accurate the target area is, the faster the combination scheme completes (as
much as 38% reduction in completion time). For the 1500m radius, the combination scheme
performs roughly the same as the on-demand algorithm because the target region already covers
the entire network.
It is well documented that the use of flooding in large scale networks adversely impacts the
system scalability [61]. Figure 4.11 shows that the combination approach can significantly im-
prove the scalability by reducing the total number of bytes sent in the network (consuming less
energy and bandwidth). The combination scheme can achieve up to 80% energy and bandwidth
savings. Surprisingly, for a larger radius (≥ 1100m), on-demand routing sends fewer bytes than
the combination scheme. There are two reasons for this result. First, given such a large target
region, the combination scheme unavoidably floods almost the entire network. Second, the
code size of geographically bound on-demand routing is 400 bytes larger than that of simple
on-demand routing. This additional code size can significantly decrease the performance, given
the sufficiently large network size and the flooding nature of our on-demand routing.
Nevertheless, for some SMs, the combination scheme can achieve much better perfor-
mance. Similar to the previous experiment, we consider an SM that starts at the same node
at the bottom-left corner. However, unlike the previous experiment, the goal of this SM is to
visit three nodes, each of which resides in one of the other corners. Additionally, the SM
has to visit these three nodes (identified by different tag names) in clockwise order. Under our
investigated scenarios, the combination scheme (with limited flooding) completes, as expected,
faster (between 25% and 40%) than the on-demand algorithm, which floods the entire network
(Figure 4.12). The difference between full flooding and limited flooding is more evident because
the on-demand routing floods the entire network three times. The faster completion
time is consistent with the reduction in bytes sent shown in Figure 4.13 (between 62% and 92%
savings).
4.5 Summary
In this chapter, we have presented the Smart Messages (SM) self-routing mechanism. The main
feature of SM self-routing is its flexibility in the presence of highly dynamic network config-
urations. Content-based migration is the high level primitive used by applications to name the
nodes of interest by content and to migrate the execution there. Using this primitive, SM appli-
cations can choose the most suitable routing for their needs, implement their own routing, or
change the routing dynamically. Our simulation results indicate that the above flexibility can
improve the responsiveness of SM applications and provide significant energy and bandwidth
savings.
Chapter 5
Prototype Implementation and Evaluation
This chapter presents the design and implementation of the Smart Messages (SMs) prototype,
as well as the implementation of Spatial Programming (SP) over an SM runtime system. The
SM prototype is implemented in Java over Linux. The SM system support is implemented
within Sun Microsystems’ K Virtual Machine, which has a memory footprint suitable for
resource-constrained devices.
We also describe EZCab, an SM-based application for locating and booking free cabs
in densely crowded traffic environments using only short-range wireless communication. To
demonstrate the simplicity of SP, we have implemented a simple intrusion detection application.
Throughout this chapter, we present experimental results for the basic SM operations, SM
routing algorithms, and the two applications mentioned above. The testbed used for the evalu-
ation consisted of ad hoc networks of PDAs (HP iPAQs) equipped with IEEE 802.11 wireless
cards. The experimental results demonstrate the feasibility of our approach in programming
distributed applications for outdoor computing environments.
5.1 Smart Messages Implementation
To leverage the existing user base, we have implemented the SM prototype in the Java
programming environment over Linux. Specifically, we have modified Sun Microsystems’ KVM
(Kilobyte Virtual Machine) [2] because its source code is available and it has a small memory
footprint (i.e., it is suitable for resource-constrained devices such as those encountered in NES).
The SM API is encapsulated in two Java classes: SmartMessage and TagSpace. For efficiency,
we have implemented the API as Java native methods. Besides the KVM interpreter thread,
we have introduced two additional threads for admission control and local code injection. The
design of the SM computing platform is not specific to any hardware or software environment.
It can be implemented on any virtual machine (e.g., Mate [51], Scylla [80]), programming
language, or underlying operating system.
In the rest of this section, we describe the most important components of our prototype
implementation: the primitives for SM creation, the memory management mechanism which
ensures thread-safety in KVM, the lightweight migration mechanism, the code caching, and
the I/O tags. Currently, the admission manager is very simple; it accepts any SM as long as the
destination node has enough memory to accommodate this SM.
5.1.1 Creating New Smart Messages
New SMs can be created at a node by the local injector or the VM interpreter. Each SM in the
system is associated with a VM-level thread. The admission manager can also create VM-level
threads for SMs arriving from the network.
A user can inject a new SM by passing a Java class name and a list of arguments to the local
injector. The injector attempts to load, link, verify, and initialize the class file. Upon successful
initialization, the injector creates a new VM-level thread with an initial stack frame for the main
method of the class and inserts the thread into the ready queue. The arguments passed by the
user are pushed onto the stack as arguments of the main method. At this point, the VM-level
thread has no associated SM structure. When the VM-level thread starts its execution, it has to
call createSMFromFiles to associate itself with a new SM structure.
The interpreter thread also creates new VM-level threads in response to createSM and
spawnSM invocations. When an SM calls createSM, the data bricks of the new SM are cloned
from the current SM, and the code bricks of the new SM refer to the verified code bricks in the
code cache. The spawnSM call is similar to createSM, except that the new SM starts its
execution from the next bytecode after spawnSM. To implement this primitive, the execution stack
frame associated with the VM-level thread of the original SM is duplicated onto the VM-level
thread of the new SM.
5.1.2 Memory Management
The garbage collector in KVM is designed for a single-threaded environment. Since any of
the three threads in the SM prototype (i.e., interpreter, local injector, admission manager) could
allocate memory from the dynamic heap, we protect the garbage collector data structures using
a heap lock and restrict the garbage collection to a limited number of locations (i.e., GC
Points [14]). We have modified the mark-sweep garbage collector in KVM such that garbage
collection is performed by the interpreter only during context switches (i.e., the interpreter
has a single GC Point). The interpreter triggers a garbage collection during a context switch
if the available memory falls below a threshold. Before performing garbage collection, the
VM ensures that the admission manager and the local injector threads have reached their GC
Points (defined as the regions where all valid memory references are reachable from the garbage
collector’s root set). The GC Points of the three VM threads are demarcated using a single
read-write lock. During garbage collection, the interpreter thread holds the write lock. The
admission manager and the injector hold the read lock to protect the critical regions from garbage
collection.
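The locking discipline can be illustrated with a Java analogy. The prototype implements this inside the VM itself, so the sketch below is not the actual code: GcCoordinator and its methods are hypothetical, and java.util.concurrent's ReentrantReadWriteLock merely stands in for the read-write lock described above.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class GcCoordinator {
    private final ReentrantReadWriteLock gcLock = new ReentrantReadWriteLock();
    private int collections = 0;

    /**
     * Used by the admission manager or the injector around a region that holds
     * heap references not yet reachable from the root set. Holding the read
     * lock keeps the collector out while the region runs.
     */
    void runCritical(Runnable region) {
        gcLock.readLock().lock();
        try {
            region.run();
        } finally {
            gcLock.readLock().unlock();
        }
    }

    /**
     * Called by the interpreter at a context switch (its single GC Point).
     * Collects only if free memory is below the threshold; acquiring the write
     * lock waits until no other thread is inside a critical region.
     */
    boolean maybeCollect(long freeBytes, long thresholdBytes) {
        if (freeBytes >= thresholdBytes) {
            return false;
        }
        gcLock.writeLock().lock();
        try {
            collections++; // a mark-sweep pass would run here
            return true;
        } finally {
            gcLock.writeLock().unlock();
        }
    }

    int collections() {
        return collections;
    }

    public static void main(String[] args) {
        GcCoordinator gc = new GcCoordinator();
        gc.runCritical(() -> System.out.println("deserializing an incoming SM"));
        System.out.println(gc.maybeCollect(512, 1024));
    }
}
```

Several readers can hold the lock at once, so the admission manager and the injector do not block each other; only the collector needs exclusive access.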
5.1.3 Lightweight Migration
One of the main obstacles in implementing an efficient execution migration arises from the
strong coupling between the execution entity and the host. For example, traditional process
migration needs to deal with sockets and file descriptors during migration. Two key features in
the design of our system helped us circumvent the problem of strong coupling.
First, the tag space shields the SMs from direct coupling with the underlying OS. The read
and write operations on tags are complete and atomic transactions; no state of the underlying
OS resources is kept in the SM structure. Hence, an SM can be completely extracted from its
execution environment, migrated, and resumed at destination.
Second, an SM program never creates a communication endpoint directly since it is based
on execution migration, not message passing. Communication channels are managed implicitly
by the underlying system. In contrast, traditional message passing programs create
communication channels explicitly to transfer data. Hence, SM programs do not have any reference to
OS network descriptors.
Our migration is lightweight in the sense that we do not migrate the complete memory
referred to by SMs. Instead, we migrate data bricks which are explicitly identified in the SM.
To simplify the task of programmers, however, we also migrate the this self-reference for non-static
methods. Therefore, these methods can use object member variables safely after migration.
For clarity of exposition, we will describe the SM migration mechanism as three logical
phases: SM capture, SM transfer, and SM resumption.
SM Capture Phase. An SM enters this phase when it invokes sys_migrate directly
or as part of a routing library. In this phase, we convert the SM into a machine-independent
representation. The code bricks are already in the machine-independent Java class format, and
therefore, only the data bricks and execution stack frames need to be converted.
To implement this conversion, we have developed a simple object serialization mechanism
(KVM does not provide one). Each data brick is serialized recursively into values and types
representing its internal structure. During serialization, we also generate a temporary
structure which provides a unique identifier for each data brick reference. The unique identi-
fiers of a data brick object and its sub-objects are determined solely by the structure of the data
bricks.
The execution control state of an SM is represented by the execution stack frames of its
associated VM-level thread. Each stack frame is serialized into a tuple of six values: the current
offsets of the instruction and operand stack pointers, the method name, the signature name, the
class name, and a flag indicating whether the method is non-static. For non-static methods, we
also encode the machine-independent identifier for the this self-reference.
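The six-value tuple can be pictured as a small record written in a fixed byte order. The sketch below is an illustration only: FrameRecord and its field names are hypothetical, java.io.DataOutputStream (which always writes big-endian) stands in for the prototype's own serializer, and the identifier of the this self-reference is omitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class FrameRecord {
    final int instructionOffset;  // offset of the instruction pointer in the method
    final int operandStackOffset; // offset of the operand stack pointer in the frame
    final String methodName;
    final String signature;
    final String className;
    final boolean nonStatic;

    FrameRecord(int instructionOffset, int operandStackOffset, String methodName,
                String signature, String className, boolean nonStatic) {
        this.instructionOffset = instructionOffset;
        this.operandStackOffset = operandStackOffset;
        this.methodName = methodName;
        this.signature = signature;
        this.className = className;
        this.nonStatic = nonStatic;
    }

    /** Encodes the tuple in a machine-independent (big-endian) layout. */
    byte[] serialize() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(instructionOffset);
            out.writeInt(operandStackOffset);
            out.writeUTF(methodName);
            out.writeUTF(signature);
            out.writeUTF(className);
            out.writeBoolean(nonStatic);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds the tuple at the destination, reading fields in the same order. */
    static FrameRecord deserialize(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int ip = in.readInt();
            int sp = in.readInt();
            String method = in.readUTF();
            String sig = in.readUTF();
            String cls = in.readUTF();
            boolean nonStatic = in.readBoolean();
            return new FrameRecord(ip, sp, method, sig, cls, nonStatic);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Storing offsets and names instead of raw pointers is what makes the frame meaningful on a different node: the destination resolves the class and method by name and recomputes the pointers from the offsets.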
SM Transfer Phase. Using the data brick and stack information sizes obtained during
the capture phase, the interpreter initiates a three-way handshake protocol with the destination
node. The operation of this protocol is shown in Figure 5.1. If the SM is accepted, the admission
manager sends back a list of missing code bricks as part of the acknowledgment. Otherwise,
the admission manager simply drops the request. Upon receipt of the acknowledgment, the source
node sends the complete SM, which consists of the missing code bricks, serialized data bricks, and
execution control state. To simplify the implementation, we have used TCP for reliable single-hop
communication between neighbors. For better performance, single-hop communication
[Figure 5.1 diagram: Node1 and Node2, each with an Interpreter, an Admission Manager, a Code
Cache, a Tag Space, and an SM Ready Queue. The migrating SM (code bricks CB1 and CB2, data
bricks DB1 and DB2, stack) calls sys_migrate; the destination code cache already holds CB1.
Steps: (1) Send Resource Table, (2) Check Cache, (3) Ack with Missing = CB2, (4) Send SM
(control, stack, CB2, DB1, DB2), (5) Add CB2, (6) Enqueue SM.]
Figure 5.1: Smart Message Transfer (Main Operations)
can be implemented on top of a reliable single-hop protocol over 802.11.
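The destination side of the handshake reduces to an admission test followed by a cache lookup. The sketch below is a simplification under stated assumptions: AdmissionCheck and handleRequest are hypothetical names, code bricks are identified by plain strings, and an empty Optional models a dropped request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class AdmissionCheck {
    /**
     * Returns the code bricks the source still has to send, or an empty
     * Optional if the SM does not fit in memory and the request is dropped.
     */
    static Optional<List<String>> handleRequest(List<String> codeBrickIds,
                                                long requiredBytes,
                                                long freeBytes,
                                                Set<String> codeCache) {
        if (requiredBytes > freeBytes) {
            return Optional.empty(); // no acknowledgment is sent
        }
        List<String> missing = new ArrayList<>();
        for (String id : codeBrickIds) {
            if (!codeCache.contains(id)) {
                missing.add(id);
            }
        }
        return Optional.of(missing);
    }

    public static void main(String[] args) {
        // Mirrors Figure 5.1: the SM uses CB1 and CB2, and the destination caches CB1 and CB3.
        Set<String> cache = new HashSet<>(Arrays.asList("CB1", "CB3"));
        System.out.println(handleRequest(Arrays.asList("CB1", "CB2"), 2048, 8192, cache));
    }
}
```

Because only the missing bricks travel in the final message, an SM that revisits nodes along a cached route pays the code transfer cost once per node rather than once per migration.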
SM Resumption Phase. After the admission manager has successfully received the code
bricks, data bricks, and execution control information from a source node, a new VM-level
thread and its associated SM structure are constructed. The missing code bricks sent from the
source node are verified by the KVM verifier and stored in the code cache by the admission
manager. We have modified the existing KVM class loader to search the code cache each time
the VM needs a class. During data brick de-serialization, the admission manager constructs
a temporary structure (similar to the structure constructed during the data brick capture at the
source node) which maps a unique identifier to each data brick reference. The execution stack
frames are reconstructed using the tuples sent from the source. Finally, the interpreter thread is
notified if it is currently idle.
5.1.4 Code Caching
Each code cache entry consists of the Java class file of a code brick, a reference count, and a
reference to the internal VM class representation. The original class format is stored for future
migrations to nodes that do not have it cached. The reference count keeps track of the number
of SMs currently referring to this code brick. Each time an SM referring to this code brick
migrates or terminates, the reference count is decremented. When the reference count becomes
zero, the code cache entry is moved to a free list. Should the same code brick be referenced by
a new SM, the cache entry is resurrected from the free list. The memory associated with free
list entries is reclaimed according to an LRU policy. When a cache entry is evicted, the code
brick memory is freed, and the corresponding internal VM class representation is unloaded
(since KVM does not have a class unloading capability, we have implemented our own class
unloading mechanism).
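The life cycle of a cache entry can be sketched as follows. This is not the prototype code: CodeCache and its methods are hypothetical, the bound on the number of free-list entries is an assumption (the text does not state what triggers reclamation), and LRU order is approximated by the order in which entries were released.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

final class CodeCache {
    private static final class Entry {
        final byte[] classFile; // original class format, kept for future migrations
        int refCount;           // number of SMs currently referring to the brick

        Entry(byte[] classFile) {
            this.classFile = classFile;
        }
    }

    private final Map<String, Entry> active = new HashMap<>();
    // Insertion-ordered: the first key is the entry released the longest time ago.
    private final LinkedHashMap<String, Entry> freeList = new LinkedHashMap<>();
    private final int maxFreeEntries;

    CodeCache(int maxFreeEntries) {
        this.maxFreeEntries = maxFreeEntries;
    }

    /** Stores a verified code brick on behalf of one referring SM. */
    void add(String id, byte[] classFile) {
        if (acquire(id)) {
            return; // already cached; only the reference count changes
        }
        Entry e = new Entry(classFile);
        e.refCount = 1;
        active.put(id, e);
    }

    /** A new SM refers to a cached brick; returns false on a cache miss. */
    boolean acquire(String id) {
        Entry e = active.get(id);
        if (e == null) {
            e = freeList.remove(id); // resurrect from the free list, if present
            if (e == null) {
                return false;
            }
            active.put(id, e);
        }
        e.refCount++;
        return true;
    }

    /** An SM referring to the brick migrates away or terminates. */
    void release(String id) {
        Entry e = active.get(id);
        if (e == null) {
            return;
        }
        e.refCount--;
        if (e.refCount == 0) {
            active.remove(id);
            freeList.put(id, e);
            evictOldest();
        }
    }

    private void evictOldest() {
        Iterator<String> it = freeList.keySet().iterator();
        while (freeList.size() > maxFreeEntries && it.hasNext()) {
            it.next();
            it.remove(); // the brick memory would be freed and the class unloaded here
        }
    }

    boolean isCached(String id) {
        return active.containsKey(id) || freeList.containsKey(id);
    }
}
```

Keeping released entries on a free list instead of discarding them immediately lets an SM that returns to the node shortly afterwards reuse the verified class without a new transfer.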
5.1.5 I/O Tags for Interaction with the OS and I/O System
An application uses the readTag and writeTag primitives to access an I/O tag. It is up to the
system to define the source of the data, but readTag typically translates to an OS call. A
writeTag translates to an OS call which sets certain parameters for an I/O device. Examples of
I/O tags currently available in our prototype can be found in Table 5.4.
Since each I/O tag requires specific native code, adding new I/O tags involves adding new
native code to the node. We have identified three possible solutions for this issue. The first
option is to statically link the native code into the VM. This is not viable because adding
new I/O tags would involve shutting down the VM. The second option is to implement new
I/O tags as dynamic shared libraries. This is not viable because we cannot assume that every
node supports dynamic linking. The third option is to implement new I/O tags as external
processes which communicate with the VM using a standard interface. We have chosen the
third alternative since it enables users to dynamically extend the I/O tags without requiring the
VM to be shut down or the host to support dynamic shared libraries. For efficiency, a few basic
I/O tags (e.g., free memory and system time) are implemented and linked permanently into the
VM executable.
Commonly, an I/O tag is associated with an external program, termed a handler, which
incorporates the code for reading and writing this I/O tag. When the VM receives a request to add
a new I/O tag, it creates a new Unix process for this handler. We use Unix pipes for
communication between the VM and the handler process. Figure 5.2 shows the interaction between an SM
and a handler process for a GPS device. When the SM issues a read request for the Location
tag, the interpreter sends a read command to the handler and blocks waiting for an answer.
Once the handler has obtained the data from the GPS device (connected on the serial port in
68
Unix PipesGPS GPS
Device
Read Command
Handler
Interpreter
Tag Space
Serial (/dev/ttyS)
I/O Handler
Location ACL
Location l = readTag("Location");
Location Object
Figure 5.2: I/O Tag Example (Using GPS to Get the Current Location)
our example), the handler encodes the data and sends it back to the VM. The VM de-serializes
the results into a Java Object and returns it to the SM. A write operation is performed similarly.
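The read path described above can be sketched in a few lines of Java. This is a minimal
illustration under stated assumptions, not the prototype's code: the handler runs as a thread
connected through Java piped streams instead of a separate Unix process, the one-line text
protocol ("read" answered by "latitude,longitude") is invented for the example, and the class
IoTagPipeSketch, its Location type, and the coordinates are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class IoTagPipeSketch {
    /** Hypothetical value object handed back to the SM. */
    static final class Location {
        final double latitude;
        final double longitude;
        Location(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** Handler side: answers every "read" command with one encoded line. */
    static void startHandler(InputStream fromVm, OutputStream toVm) {
        Thread handler = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(fromVm, StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(toVm, StandardCharsets.UTF_8), true)) {
                String command;
                while ((command = in.readLine()) != null) {
                    // A real handler would query the GPS device here; this fix is made up.
                    out.println(command.equals("read") ? "40.5,-74.25" : "error");
                }
            } catch (IOException e) {
                // The pipe was closed; the handler simply exits.
            }
        });
        handler.setDaemon(true);
        handler.start();
    }

    /** VM side: send the command, block for the answer, decode it into an object. */
    static Location readLocationTag(PrintWriter toHandler, BufferedReader fromHandler)
            throws IOException {
        toHandler.println("read");
        String reply = fromHandler.readLine(); // blocks until the handler answers
        if (reply == null || reply.equals("error")) {
            throw new IOException("handler did not return a location");
        }
        String[] fields = reply.split(",");
        return new Location(Double.parseDouble(fields[0]), Double.parseDouble(fields[1]));
    }

    /** Wires both ends together and performs one read of the Location tag. */
    static Location demo() {
        try {
            PipedOutputStream vmOut = new PipedOutputStream();
            PipedInputStream handlerIn = new PipedInputStream(vmOut);
            PipedOutputStream handlerOut = new PipedOutputStream();
            PipedInputStream vmIn = new PipedInputStream(handlerOut);
            startHandler(handlerIn, handlerOut);
            try (PrintWriter toHandler = new PrintWriter(
                     new OutputStreamWriter(vmOut, StandardCharsets.UTF_8), true);
                 BufferedReader fromHandler = new BufferedReader(
                     new InputStreamReader(vmIn, StandardCharsets.UTF_8))) {
                return readLocationTag(toHandler, fromHandler);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Location l = demo();
        System.out.println(l.latitude + ", " + l.longitude);
    }
}
```

Keeping the handler behind a byte stream is what lets it live in another process: the VM side
only needs the two pipe ends and the agreed encoding.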
Certain SMs may have a user interface (in the form of an external process) which allows
users to interact with SMs via special I/O tags, termed UI tags. Unlike regular I/O tags, a UI
tag behaves similarly to a producer-consumer circular buffer. Each UI process can communicate
with multiple SMs. This communication is done through a pair of UI tags: a write tag for
passing data to SMs, and a read tag for receiving data from SMs. These tags persist for the
entire duration of the UI process.
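The producer-consumer behavior of a UI tag can be illustrated with a bounded FIFO queue. The
sketch below is only an analogy under assumptions: the class UiTagSketch, the capacity of two
slots, and the string payloads are hypothetical, and the standard ArrayBlockingQueue stands in
for whatever buffer the prototype uses internally.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class UiTagSketch {
    /** A bounded FIFO buffer standing in for one UI tag. */
    static final class UiTag {
        private final BlockingQueue<String> slots;
        UiTag(int capacity) {
            slots = new ArrayBlockingQueue<>(capacity);
        }
        /** Producer side: blocks while the buffer is full. */
        void write(String item) throws InterruptedException {
            slots.put(item);
        }
        /** Consumer side: blocks while the buffer is empty. */
        String read() throws InterruptedException {
            return slots.take();
        }
    }

    /** A UI thread produces three items; the caller (the "SM") consumes them in order. */
    static String demo() {
        UiTag toSm = new UiTag(2);
        Thread ui = new Thread(() -> {
            try {
                for (String item : new String[] {"a", "b", "c"}) {
                    toSm.write(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        ui.start();
        try {
            StringBuilder received = new StringBuilder();
            for (int i = 0; i < 3; i++) {
                received.append(toSm.read());
            }
            ui.join();
            return received.toString();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```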
5.2 Smart Messages Evaluation
To evaluate the performance of our prototype, we have measured the cost of the SM primitives
and the completion time of two routing algorithms (on-demand and geographical routing).
Additionally, we have implemented and analyzed EZCab, a real-life application for booking
cabs in densely populated cities using only short-range wireless communication. The testbed
consists of HP iPAQ 3870 handhelds running Linux 2.4.18. Each iPAQ contains an Intel StrongARM
206MHz processor, 32MB of flash memory, and 64MB of RAM. For communication, we
use Orinoco 802.11b Silver PC Cards.
5.2.1 Cost of SM Creation
createSMFromFiles. This primitive allows a user to inject a new SM at a node. After an
invocation, the VM loads the class files from the local file system, unless the classes are already
in the VM code cache, and creates a new SM structure. To evaluate its cost, we have performed
two series of experiments. In the first, we invoke createSMFromFiles for un-cached classes
of different sizes while keeping the data brick size constant (53 bytes). Then, we repeat the
same experiment with the classes cached. In both experiments, we have used 1KB class files
and we varied the number of class files used to create an SM. Table 5.1 shows that the cost of
createSMFromFiles almost doubles (when the code is not cached) as we double the size of the
code brick. These results show that the cost of class loading dominates the cost of creating a
new SM structure. The cost of creating a new SM structure is essentially the cost measured
when the code is cached.
Size (KB)    Time (ms), Uncached    Time (ms), Cached
1            2.622                  0.032
2            5.112                  0.034
4            9.953                  0.042
8            20.151                 0.063
Table 5.1: Effect of Code Brick Size on createSMFromFiles
createSM and spawnSM. Table 5.2 shows the costs of spawnSM and createSM for different
data brick sizes. The code brick and stack size are fixed at 1527 and 131 bytes, respectively.
Typically, an SM has a mixture of static and non-static call frames. Therefore, we consider a
stack consisting of two stack frames, one for a static method and the other for a virtual method
call. Although these two primitives are similar, the results show that the cost of spawnSM
is slightly higher than the cost of createSM. The difference is the time spent to duplicate the
execution stack frames for spawnSM.
Size (KB)    Time (ms), spawnSM    Time (ms), createSM
2            0.270                 0.243
4            0.367                 0.326
8            0.508                 0.469
16           0.913                 0.822
Table 5.2: Effect of Data Brick Size on spawnSM and createSM
5.2.2 Cost of SM Migration
The most significant factors that determine the cost of our migration are the data brick serial-
ization, the SM transfer, and data brick de-serialization.
Data Brick Serialization and De-Serialization. Since the code bricks need not be
serialized, we perform this operation only on data bricks and execution stack frames. Our
measurements indicate that the serialization cost for the execution stack frames is small
compared to the cost of data brick serialization; it varies from 0.204ms to 0.567ms as we vary
the execution stack from 2 to 15 frames. To study the effect of data brick serialization, we vary
the data brick size from 2KB to 16KB, while using a fixed size code brick (1197 bytes) and two
fixed size stack frames (131 bytes).
Figure 5.3: Cost of Data Brick Serialization
Figure 5.4: Cost of Data Brick De-Serialization
Commonly, the data bricks in an SM consist of a mixture of objects and primitive types.
We use two types of data bricks in this evaluation: an array of integers, and an array of objects.
The serialization costs for these two data bricks provide practical lower and upper bounds for
the cost of data brick serialization. The object array represents an upper bound since each of its
elements causes a call to the top level VM serialization method. The integer array represents a
lower bound since it involves only one call to the top level VM serialization method.
Figure 5.3 shows that the serialization cost is below 6ms for data bricks as large as 16KB.
Commonly, the SMs process data at its source, and therefore, they carry small size data. The
applications that we have developed carry less than 2KB, which costs less than 1ms to seri-
alize. Figure 5.4 presents the de-serialization cost for the same data bricks. We observe that
de-serialization cost is as much as 30% higher than the cost of serialization due to memory
allocation during object de-serialization.
SM Transfer. The variation of execution control state size is small compared to that of
code bricks and data bricks. Thus, we only consider the effect of code bricks and data bricks in
the subsequent experiments. We have performed two sets of experiments to evaluate the cost of
migration (serialization, transfer, de-serialization) for different code brick and data brick sizes.
In the first set, we vary the code brick size while keeping the data brick size and stack frame
size fixed at 53 bytes and 131 bytes, respectively. In the second set, we vary the data
brick size while keeping the code brick size and stack frame size fixed at 1197 bytes and 131
bytes, respectively. Figures 5.5 and 5.6 show the results of these two experiments.
Figure 5.5: Effect of Code Brick Size on Single Hop Migration
Figure 5.6: Effect of Data Brick Size on Single Hop Migration
The values in Figure 5.5 represent the total time for single hop migration in two situations:
the code is not cached, and the code is cached. The time to transfer the SM when the code is
cached is constant and represents the overhead of the three-way handshake protocol. Figure 5.6
shows that the data brick size contributes significantly to the total cost of migration. Thus, it is
important to have a serialization scheme with minimal space overhead.
5.2.3 Tag Space Operations
Table 5.3 shows the cost of the tag space operations for application tags. The readTag primitive
has the lowest cost since it performs the fewest operations; when an SM reads a tag,
the interpreter acquires a lock, performs a lookup in the tag space, verifies the access rights,
and returns the data to the SM. The writeTag operation costs slightly more since the interpreter
has to check for and unblock any SMs blocked on the tag. The blockSM operation costs more
than both readTag and writeTag since it also needs to append the SM to the SM blocked queue
and suspend the VM-level thread. The deleteTag primitive has the second highest cost since the
interpreter needs to wake up all SMs blocked on the tag, remove the timer for the tag lifetime,
and remove the tag structure from the tag space, while the createTag primitive has the highest
cost since it involves additional steps to register a timer for the tag lifetime and create access
control data structures.
Operation    Time (µs)
createTag    101.781
deleteTag    75.071
readTag      34.548
writeTag     50.289
blockSM      59.844
Table 5.3: Cost of Tag Space Primitives for Application Tags
Tag Name               Time (ms)
gps location           0.20
neighborlist           0.34
imagecapture (32 Kb)   341.23
light sensor           0.11
batterylifetime        25.63
systemtime             0.09
free memory            0.12
Table 5.4: Cost of Reading I/O Tags
Figure 5.7: Network Topology for Routing Experiments (legend: User Node, Intermediate Node,
Node of Interest)
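The steps listed for each primitive in Table 5.3 can be made concrete with a small tag space
sketch. It is an illustration under assumptions rather than the prototype's data structure: the
class TagSpaceSketch, the use of SM identifiers as an access control list, and the return
values are hypothetical, the single lock is a Java monitor, and the lifetime timer that
createTag registers and deleteTag removes is reduced to comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TagSpaceSketch {
    private static final class Tag {
        Object value;
        final Set<String> allowedSms;                      // simplified access control list
        final List<String> blockedSms = new ArrayList<>(); // SMs waiting on this tag
        Tag(Object value, Set<String> allowedSms) {
            this.value = value;
            this.allowedSms = allowedSms;
        }
    }

    private final Map<String, Tag> tags = new HashMap<>();

    /** Common path: look up the tag and verify the caller's access rights. */
    private Tag lookup(String name, String smId) {
        Tag tag = tags.get(name);
        if (tag == null) {
            throw new IllegalArgumentException("no such tag: " + name);
        }
        if (!tag.allowedSms.contains(smId)) {
            throw new SecurityException(smId + " may not access " + name);
        }
        return tag;
    }

    synchronized void createTag(String name, Object value, Set<String> allowedSms) {
        tags.put(name, new Tag(value, allowedSms)); // a lifetime timer would be registered here
    }

    synchronized Object readTag(String name, String smId) {
        return lookup(name, smId).value;
    }

    /** Updates the tag and returns the SMs that were blocked on it and are now released. */
    synchronized List<String> writeTag(String name, String smId, Object value) {
        Tag tag = lookup(name, smId);
        tag.value = value;
        List<String> released = new ArrayList<>(tag.blockedSms);
        tag.blockedSms.clear();
        return released;
    }

    synchronized void blockSM(String name, String smId) {
        lookup(name, smId).blockedSms.add(smId); // the SM's thread would be suspended here
    }

    /** Removes the tag and returns the SMs that must be woken up. */
    synchronized List<String> deleteTag(String name, String smId) {
        Tag tag = lookup(name, smId);
        tags.remove(name); // the lifetime timer would be cancelled here
        return new ArrayList<>(tag.blockedSms);
    }

    static String demo() {
        TagSpaceSketch space = new TagSpaceSketch();
        Set<String> acl = new HashSet<>(Arrays.asList("sm1", "sm2"));
        space.createTag("FreeCab", "yes", acl);
        space.blockSM("FreeCab", "sm2");
        List<String> released = space.writeTag("FreeCab", "sm1", "no");
        return space.readTag("FreeCab", "sm1") + " " + released;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Even in this reduced form, readTag touches the least state, while writeTag, blockSM, and
deleteTag each add work on the blocked queue, which mirrors the ordering of the measured costs.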
Table 5.4 presents the access time for several I/O tags that are currently implemented in
our prototype: GPS location query, neighbor discovery, camera image capture, light sensor,
and system status inquiry (battery lifetime, system time, and amount of free memory). The
gps location tag is updated by a user-level process which reads from the GPS serial interface.
The locations of the neighbors, along with their identifiers, are returned by reading the
neighborlist tag. This tag is typically used by geographical routing algorithms carried and
executed by SMs. To get the information about neighbor nodes, we have implemented a neighbor
discovery protocol which maintains a cache of known neighbors. For the imagecapture tag, the I/O
handler converts the image received from the camera from YUYV format to RGB format before
returning it to the SM. All the other tag values are obtained directly from Linux using system
calls.
5.2.4 Routing Algorithms
We present the evaluation of two simple SM routing algorithms (geographical and on-demand
content-based) executed over our SM prototype. Since one of these routing algorithms might
be more suitable than the other for some applications, we do not intend to compare them. In
fact, a judicious use of both algorithms might yield significantly better results than each of them
separately.
Routing Algorithm    Code not cached (ms)    Code cached (ms)
Geographical         415.6                   126.6
On-demand            506.6                   314.7
Table 5.5: Completion Time for Routing Algorithms
Our goals in conducting this experimental evaluation study were three-fold: (1) to demon-
strate the flexibility of the SM architecture for application-level self-routing, (2) to understand
the re-programmability issues in NES, and (3) to explore the influence of code caching on our
unattended re-programmable system. Our testbed consists of eight HP iPAQs running Linux
and using Orinoco 802.11b PC cards for wireless communication. The network topology is
typically four hops across (see Figure 5.7). The SM starts at the grey node and discovers the tag
of interest at the black node using geographical routing or on-demand content-based routing.
In the first experiment, we measure the completion time of an SM using geographical rout-
ing. The SM routes itself from the grey node to the black node and returns on a different path.
The round-trip time for this task is 415.6 ms (Table 5.5). At the beginning of our experiments,
there was no SM program (or routing) installed at any node. Therefore, the result also includes
the latency imposed by programming the network. The program size of our SM with geograph-
ical routing is approximately 4.4KB. To factor out the installation latency, we study the impact
of code caching on this experiment by re-running the same SM at the grey node. The second
execution of the same SM (the code is cached by all nodes) takes only 126.6 ms (or 3.2 times
faster).
We also conduct a similar experiment for an SM with on-demand content-based routing.
When the code is not cached, the route discovery time for this SM is 506.6 ms. This result is a
bit surprising, given that the program size of this route discovery SM is only 2.8KB. However,
the result is reasonable given the significant delay imposed by the wireless contention (due
to route discovery flooding). When the code is cached, the route discovery time for this SM
decreases to 314.7 ms (only 1.6 times faster). One might expect a speedup of about three times
for this SM as well after code caching. However, the impact of code caching is less
evident when the program size is smaller, given the unavoidable overhead of
wireless contention.
Figure 5.8: Route Discovery in EZCab
Figure 5.9: Cab Booking following a Route Discovery in EZCab
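The speedups quoted for the two routing algorithms follow directly from Table 5.5. The
arithmetic can be checked with the short sketch below; the class and method names are
hypothetical, and the only inputs are the four completion times from the table.

```java
class CodeCachingSpeedupSketch {
    /** Ratio of the uncached completion time to the cached one. */
    static double speedup(double uncachedMs, double cachedMs) {
        return uncachedMs / cachedMs;
    }

    /** Latency removed by caching, in milliseconds. */
    static double savedMs(double uncachedMs, double cachedMs) {
        return uncachedMs - cachedMs;
    }

    public static void main(String[] args) {
        // Completion times taken from Table 5.5.
        System.out.printf("geographical: %.2fx faster, %.1f ms saved%n",
                speedup(415.6, 126.6), savedMs(415.6, 126.6));
        System.out.printf("on-demand:    %.2fx faster, %.1f ms saved%n",
                speedup(506.6, 314.7), savedMs(506.6, 314.7));
    }
}
```

Caching removes roughly 289 ms for geographical routing but only about 192 ms for on-demand
routing, which is consistent with the explanation that the contention caused by flooding is not
affected by code caching.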
5.2.5 Application Case Study: EZCab
To demonstrate the feasibility of the SM computing platform for real-world applications, we
have developed EZCab, an application for locating and booking free cabs in densely crowded
traffic environments (like Manhattan, where looking for a free cab can be an annoying experi-
ence). We envision that the use of embedded devices in cars will soon become a reality [60, 65].
Instead of calling a cab company or merely “gesturing” to negotiate a cab for her destination,
a client can simply inject an SM through her handheld device to perform seamlessly the same
action. Unlike the existing solutions for inter-car communication that are based on certain in-
frastructures (which are expensive, cannot be deployed on every road, and provide only limited
information), EZCab uses a peer-to-peer approach whose key benefits are scalability and prac-
ticality. The minimal infrastructure needed by EZCab is the availability of the SM support in
the cabs, a location service (e.g., GPS), and wireless connectivity.
The main component of EZCab is an SM that migrates to a cab identified by a FreeCab tag,
negotiates the price according to a client-established limit, lets the cab know the identity of the
client, and instructs the cab to go to the client's location. The booking is complete after the cab
sends a message with its identification to the client and the client acknowledges this message.
When the cab arrives at the client's location, a validation process takes place to ensure that the
client gets her booked cab (and the cab takes the client that booked it). In the following, we
present a brief description of the basic operations in EZCab: (1) discovering the routes to free
cabs, (2) booking a free cab, and (3) performing the validation between the cab and the client.
We conclude the section with an analysis of our application based on experimental results.
Route Discovery. The EZCab application starts at the client node and takes as a parameter
the radius of the circular geographical region to be covered (the maximum number of hops,
maxHops, for which any EZCab SM is allowed to migrate is computed based on this radius and
the transmission range of the nodes). To reach a free cab, the SM uses routing tables that
specify, for each next hop, the probability of reaching a free cab from the current node (similar
to probabilistic routing [72]). If the probability of finding a free cab using the existing routes is
too low, or there are no routes at all, the SM creates a route discovery SM and blocks waiting
for routes (Figure 5.8 illustrates this process). Each route discovery SM migrates through the
network until it arrives at a node already visited by another discovery SM (i.e., it ends its
execution) or reaches the maximum number of hops that it is allowed to migrate. Once this
threshold is reached, the SM migrates back one hop and reports its current information. This is
a recursive process that builds the routing tables at nodes. We have chosen to wait for replies
for a given period of time because it is difficult to wait for a fixed number of replies in a
volatile network (i.e., those replies may never arrive).
Cab Booking. Booking a cab is a three-way handshake protocol. If a node has routes to
free cabs, the application creates a booking SM to find a free cab and blocks for a certain amount
of time. If the cab is not free, the booking SM chooses the next neighbor greedily (i.e., using
the greatest probability in the routing table), as shown in Figure 5.9. Once a free cab is found,
the SM removes the FreeCab tag, writes the client's location in the Location tag, and creates a
reporting SM to confirm the booking with the client. Then, it blocks at the cab waiting for an
acknowledgment from the client.
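The greedy choice made by the booking SM amounts to picking the routing table entry with the
greatest probability. A minimal sketch follows, assuming a routing table represented as a map
from neighbor identifier to the probability of reaching a free cab through that neighbor; the
class name, the neighbor names, and the probabilities are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GreedyNextHopSketch {
    /** Returns the neighbor with the greatest probability, or null if there are no routes. */
    static String chooseNextHop(Map<String, Double> routingTable) {
        String best = null;
        double bestProbability = -1.0;
        for (Map.Entry<String, Double> route : routingTable.entrySet()) {
            if (route.getValue() > bestProbability) {
                bestProbability = route.getValue();
                best = route.getKey();
            }
        }
        return best;
    }

    static String demo() {
        Map<String, Double> routingTable = new LinkedHashMap<>();
        routingTable.put("carA", 0.2);
        routingTable.put("carB", 0.7);
        routingTable.put("carC", 0.5);
        return chooseNextHop(routingTable);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

An empty table yields null, which corresponds to the case where the application has to fall
back to route discovery.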
The reporting SM migrates to the client’s location using geographical routing to improve
the efficiency. Once it has informed the client that a cab is on its way, it returns to the cab
with an acknowledgment to let the cab know that the handshake has succeeded. If no reply is
received from a cab after a timeout, EZCab will re-start with a new best route. Consequently,
the booking SM waiting at the cab times out and re-creates a FreeCab tag to reflect the change
in the cab’s status.
Validation. When the cab reaches the client's location, the validation mechanism is
initiated. To make the validation possible, the booking SM carries the public key of the client
to the cab, and the reporting SM carries the public key of the cab to the client. To validate the
client, the cab broadcasts a challenge in the zone by encrypting a text using the client's public
key. The client, upon receiving the encrypted text, decrypts it using its private key. In turn, it
uses the cab's public key to encrypt the text again and sends it to the cab. If the text in the
reply is identical to the original text, the client is validated.
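The challenge-response exchange described above can be sketched with the standard Java
cryptography API. The choices below are assumptions made for illustration and are not taken
from the EZCab prototype: 2048-bit RSA keys, OAEP padding, a random 16-byte challenge text, and
the class and method names. The responder parameter stands for whoever answers the broadcast,
so that both a genuine client and an impostor can be tried.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;

class CabValidationSketch {
    private static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    static byte[] encrypt(PublicKey key, byte[] plain) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_OAEP);
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(plain);
    }

    static byte[] decrypt(PrivateKey key, byte[] encrypted) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(RSA_OAEP);
        cipher.init(Cipher.DECRYPT_MODE, key);
        return cipher.doFinal(encrypted);
    }

    /** Runs one validation round; returns true only if the responder is the booked client. */
    static boolean validate(KeyPair cab, PublicKey bookedClient, KeyPair responder)
            throws GeneralSecurityException {
        // Cab: encrypt a fresh text with the public key carried by the booking SM.
        byte[] text = new byte[16];
        new SecureRandom().nextBytes(text);
        byte[] challenge = encrypt(bookedClient, text);

        // Responder: decrypt with its own private key, re-encrypt with the cab's public key.
        byte[] reply;
        try {
            byte[] recovered = decrypt(responder.getPrivate(), challenge);
            reply = encrypt(cab.getPublic(), recovered);
        } catch (GeneralSecurityException e) {
            return false; // the responder does not hold the booked client's private key
        }

        // Cab: decrypt the reply and compare it with the original text.
        return MessageDigest.isEqual(text, decrypt(cab.getPrivate(), reply));
    }

    static boolean demo(boolean genuineClient) {
        try {
            KeyPair cab = newKeyPair();
            KeyPair client = newKeyPair();
            KeyPair responder = genuineClient ? client : newKeyPair();
            return validate(cab, client.getPublic(), responder);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("genuine client validated: " + demo(true));
        System.out.println("impostor validated: " + demo(false));
    }
}
```

Using a fresh random text for every round is a design choice of this sketch; it keeps an
eavesdropper from replaying an earlier reply.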
Analysis. For EZCab, it is of particular importance to evaluate its completion time given re-
alistic configurations. In the following, we present an analysis which demonstrates that EZCab
can cover a circular area up to 1km radius around the client’s location, and the user-perceived
response time is less than 2 seconds. Figure 5.10 shows our EZCab prototype.
The first part of our evaluation tries to determine the maximum distance at which two mov-
ing cars can communicate and the time for which the topology is relatively stable. Using two
HP iPAQs with 802.11 cards for communication, and various mobility scenarios (as much as
170km/h relative speed between two cars moving in opposite directions), we have experienced
a substantial increase in the packet loss rate for distances bigger than 60m. We consider this
distance feasible for our target networks (we have also experimented with external antennas
and amplifiers to increase the range to as much as 400m). Given this distance, two cars are in the
communication range of each other for approximately 2 seconds at a relative speed of 120km/h
(i.e., typical speed for two cars moving in opposite directions in a crowded city). Therefore,
our application should complete faster than that in order to reduce the effects of mobility on the
established routes.
The second part of our evaluation examines whether EZCab can finish within this time bound, and
how big the geographical region covered by our application is (i.e., a bigger region increases the
probability to locate a free cab). The response time for EZCab is defined as the time spent until
the client receives a confirmation from the cab. The design of EZCab makes it easy to bound
this response time. All the main operations (route discovery, booking a cab, and reporting a
booked cab) are bounded by a timeout. Therefore, the maximum response time for a successful
booking is the sum of the timeouts for route discovery and booking a cab.
We compute the timeout for each SM generated by EZCab as the product of the round
trip time of that SM between two nodes (RTT) and the maximum number of hops traveled
by an SM (maxHops). We further assume that all cabs have the code cached. SMs transfer
only small size data bricks and execution control state when the code is cached. Hence, the
measured values of the RTTs for the three SMs are almost identical (24.3ms, 25.4ms, 25.1ms).
To include the costs of SM execution and wireless contention, we consider, conservatively, a
value three times greater. Since booking a cab and reporting back to the client do not involve
any broadcast, we just double the minimum timeout value for booking a cab and reporting back
to the client.
Figure 5.10: EZCab Prototype
Figure 5.11: Estimated Completion Time for EZCab
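The timing argument can be written out as a short calculation. The sketch below is a rough
reading of the text, not the computation behind Figure 5.11: it assumes that the three RTT
values are listed in the order route discovery, booking, reporting, that the route discovery
timeout uses the factor of three and the booking timeout the factor of two, and that the time
two cars stay in range is the communication distance divided by their relative speed. The hop
count of 10 in the example and all names are hypothetical.

```java
class EzCabTimeoutSketch {
    /** Timeout for one kind of SM: RTT per hop, times hops, times a safety factor. */
    static double timeoutMs(double rttMs, int maxHops, double safetyFactor) {
        return rttMs * maxHops * safetyFactor;
    }

    /** Upper bound on the response time: route discovery timeout plus booking timeout. */
    static double maxResponseMs(double discoveryRttMs, double bookingRttMs, int maxHops) {
        return timeoutMs(discoveryRttMs, maxHops, 3.0) + timeoutMs(bookingRttMs, maxHops, 2.0);
    }

    /** Time two cars remain within range, given the range and their relative speed. */
    static double contactWindowSeconds(double rangeMeters, double relativeSpeedKmh) {
        double metersPerSecond = relativeSpeedKmh * 1000.0 / 3600.0;
        return rangeMeters / metersPerSecond;
    }

    public static void main(String[] args) {
        // 60 m range at 120 km/h relative speed, as in the first part of the evaluation.
        System.out.printf("contact window: %.1f s%n", contactWindowSeconds(60.0, 120.0));
        // Example with a hypothetical limit of 10 hops.
        System.out.printf("response bound for 10 hops: %.0f ms%n",
                maxResponseMs(24.3, 25.4, 10));
    }
}
```

With these assumptions, the contact window evaluates to 1.8 seconds, which matches the
"approximately 2 seconds" used as the time bound in the text.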
Figure 5.11 shows the response time as a function of the size of the covered region. The re-
sults indicate that EZCab can finish in less than 2 seconds for a region radius of approximately
1km even if it needs to perform a route discovery. As determined by the first part of our evalua-
tion, we expect the network topology to be relatively stable during this time period. Therefore,
we conclude that SMs offer the flexibility to program the EZCab application without any in-
frastructure, and the analysis of the actual SM implementation demonstrates the feasibility of
EZCab for densely populated cities.
5.3 Spatial Programming using Smart Messages
SP requires a set of programming constructs that have to be exposed to programmers and a
runtime system to support the model. The constructs can be added as extensions to any
programming language or implemented as library calls. In this section, we describe the SP
implementation using SMs. Under this implementation, SP applications are Java programs. The SP
programming constructs can be invoked as Java methods, which are supported by our SM-based
runtime system.
Figure 5.12: Implementation of Spatial References with Smart Messages. [Flowchart: an access
read/write {space:tag[index], timeout}.resource starts with migrate(space, timeout), followed by
lookup {space:tag[index]} in the mapping table. If the reference exists, the SM reads the
location and the node's unique tagId, calls migrate(location, timeout), and verifies the node's
unique tagId; if the location is unreachable or the operation times out, it falls back to
migrate(tagId, space, timeout). If the reference does not exist, the SM creates a list of
ineligible nodes [mapped_nodes], calls migrate(tag, space, mapped_nodes, timeout), creates a
unique tagId at the node, and creates a new entry in the mapping table. On success, the
resource is read or written; an unreachable space or a timeout throws an exception.]
An SM-based runtime system is suitable for SP not only because SMs provide the abil-
ity to re-program the network on-the-fly, but also because the tag space offers a simple and
uniform interface for accessing data or services at nodes. Additionally, SP benefits from the
SM self-routing mechanism; in reaching a node, the runtime system may use different routing
algorithms and change the routing dynamically.
The main idea in our implementation is to translate high level SP programs into SMs.
However, SP programs (written in Java) are not aware of the underlying SMs. To use the
SM-based runtime system, they have to follow three simple rules: (1) extend an SMWrapper
class which provides methods for the SP programming constructs, (2) initialize the SMWrapper
by passing the class names for all classes that do not belong to our SM distribution (i.e., in order
to be incorporated in the SM as code bricks), (3) use only class member variables (in this way,
the SMWrapper knows what data needs to be transferred as data bricks). Under these rules, SP
applications are just normal Java programs that transparently access network resources using
spatial references.
At initialization, the SMWrapper creates the mapping table which maintains the mappings
between spatial references and nodes. Also, the SMWrapper includes the code and data bricks
for two routing algorithms: geographical routing (to reach the space of interest), and
space-bound content-based routing (to reach a node of interest within a given space). After the
initialization is done, the SMWrapper creates and injects a new SM in the network. This SM
includes the code and data for the SP program.
Essentially, the SMWrapper performs the SP-to-SM translation by transforming each ac-
cess to a network resource (read/write) into an SM migration. Figure 5.12 illustrates the main
steps necessary to read/write a resource located on a referenced node. For both mapped and
unmapped spatial references, the SM migrates to the desired space using geographical routing.
We have implemented a greedy geographical routing similar to GPSR [48].
When the space is reached, the SM checks if the spatial reference exists by performing
a lookup in the mapping table. In the left part of the figure, we show how to reach a node
referenced by an existent spatial reference (i.e., reference consistency). The right part shows
how a new node of interest is found and mapped to a spatial reference.
If the reference does not exist, the SM has to discover a node of interest in the given space.
Therefore, it dynamically changes its routing to a content-based on-demand routing (similar to
AODV [68]) which is used to discover a node of interest. Due to its limited geographical scope,
flooding does not represent a major problem for scalability. Once a matching node is found,
the SM assigns a unique network address to this node by creating a unique tagID in this node's
tag space. Subsequently, the tagID and the location of the node are stored in the associated
mapping table entry. In the process of mapping a new spatial reference, the mapped nodes
having the same space-tag pair must be avoided (i.e., the application asked for a new node). To
solve this problem, we retrieve the list of unique tagIDs corresponding to the mapped nodes
and pass it to the routing algorithm. It is the responsibility of the routing to find an unmapped
node.
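The bookkeeping described above can be sketched as a small mapping table. This is a
hypothetical data structure written for illustration, not the SMWrapper's actual one: the class
MappingTableSketch, the string key built from space, tag, and index, the planar coordinates
used as a location, and the second tag value are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MappingTableSketch {
    /** What the table stores for one mapped spatial reference. */
    static final class Mapping {
        final String uniqueTag; // tagID created in the node's tag space
        final double x;         // last known location of the node
        final double y;
        Mapping(String uniqueTag, double x, double y) {
            this.uniqueTag = uniqueTag;
            this.x = x;
            this.y = y;
        }
    }

    private final Map<String, Mapping> entries = new LinkedHashMap<>();

    private static String key(String space, String tag, int index) {
        return space + ":" + tag + "[" + index + "]";
    }

    /** Returns the mapping for an existing reference, or null if it is still unmapped. */
    Mapping lookup(String space, String tag, int index) {
        return entries.get(key(space, tag, index));
    }

    /** Records the node found by content-based routing for a new reference. */
    void map(String space, String tag, int index, String uniqueTag, double x, double y) {
        entries.put(key(space, tag, index), new Mapping(uniqueTag, x, y));
    }

    /** Unique tags of nodes already mapped for this space-tag pair (ineligible nodes). */
    List<String> mappedUniqueTags(String space, String tag) {
        String prefix = space + ":" + tag + "[";
        List<String> ineligible = new ArrayList<>();
        for (Map.Entry<String, Mapping> entry : entries.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                ineligible.add(entry.getValue().uniqueTag);
            }
        }
        return ineligible;
    }

    static String demo() {
        MappingTableSketch table = new MappingTableSketch();
        table.map("Hill1", "Camera", 1, "yU78GH5", 10.0, 20.0);
        table.map("Hill1", "Camera", 2, "aB12cd3", 15.0, 25.0);
        boolean thirdIsUnmapped = table.lookup("Hill1", "Camera", 3) == null;
        return table.lookup("Hill1", "Camera", 1).uniqueTag + " "
                + table.mappedUniqueTags("Hill1", "Camera") + " " + thirdIsUnmapped;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

When a third camera reference is requested, the list returned by mappedUniqueTags is what
would be handed to the routing algorithm so that it skips the two cameras already mapped.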
To ensure reference consistency, subsequent accesses to an existing spatial reference must
reach the same node. Therefore, the SM retrieves the location of the mapped node from the
mapping table and migrates directly to this location. According to the semantics of spatial
references, if the node is not present at that location anymore, the SM will try to reach it in
the same space region using its unique tagID.
    // Instruction in an SP Application
    image = {Hill1:Camera[1], timeout}.image;
    // Content of Mapping Table for the Above Spatial Reference
    // (space, tag, index) = (unique_tag, location)
    {Hill1, Camera, 1} = {yU78GH5, location}
    // SMWrapper Code to Access the Above Spatial Reference
    try{
        location = MappingTable.getLocation(Hill1, Camera, 1);
        GeographicalRouting.migrate(location, timeout);
    }catch(Exception e){ // LocationUnreachable or Timeout
        try{
            unique_tag = MappingTable.getUniqueTag(Hill1, Camera, 1);
            ContentBasedRouting.migrate(unique_tag, Hill1, timeout);
        }catch(TimeoutException te){
            throw te; // the node was not found in the space within the timeout
        }
    }
    return TagSpace.readTag(image);
Figure 5.13: Example of Spatial Reference Access
When the node of interest is reached, the SP program resumes its execution (i.e., it starts
with the read/write operation which triggered the entire migration process). The tag space
primitives are used to give the application access to local resources. If a node of interest is
not found during the time interval specified by the application, or the space is unreachable, an
exception is thrown to let the application decide further actions.
Figure 5.13 illustrates the main operations performed for a spatial reference access. The
SP application uses an already mapped spatial reference to read an image at a Camera node on
Hill1. The mapping table (carried in the SM data brick) contains the unique tag created on this
node at the time of the first access, as well as the location of this node. The SM tries first to use
geographical routing to migrate to the referenced node. If this operation fails, either because
of a timeout or because the location is unreachable (e.g., the node might have moved from its
previously recorded location), the SM tries to use the space-bound content-based routing to find
the node. If the node is still in the same space region (Hill1), the SM migrates to it, reads the
image tag, and returns its value. Otherwise, the SP application receives a timeout exception.
5.4 Spatial Programming Evaluation
This section presents the implementation and evaluation of an SP application executed over our
SM-based runtime system. We have evaluated this application on a testbed consisting of ten HP
iPAQs. For wireless communication, we use Orinoco 802.11b PC cards in ad hoc mode. Each
node supports the Smart Messages (SM) architecture. Our goal in conducting this evaluation
study was twofold: (1) to verify the viability of the SP model in terms of ease of programming,
and (2) to analyze the performance of our SM-based runtime system.
The application is similar to the object tracking application described in Chapter 2. Essen-
tially, the application (injected by a user from a handheld device) performs intrusion detection
over a monitored space region. It verifies the status of the motion sensors, and if one of them
has detected motion, the application turns on a certain number of cameras to perform face
recognition. After all these cameras have been turned on, the application returns to each of
them to verify the result of the face recognition program. If at least half of the cameras have
recognized a face, the application informs the user that an intruder has been detected.
For this application, some of our nodes are identified by a Camera tag (i.e., they have
an attached video camera), while others are identified by a Light tag (i.e., instead of motion
sensors, we use light sensors incorporated in iPAQs; we consider that motion was detected
when the light intensity is above a certain threshold). The camera nodes also provide tags to
activate the camera and get the result of the face recognition program. Figure 5.16 shows a
typical camera node used in our experiments.
The Java code for this application, presented in Figure 5.14, demonstrates the main benefit
of SP: flexibility to program complex distributed applications in outdoor computing
environments in a simple, network-transparent fashion. The run method shows how spatial
references shield the programmers from the networking details. It also demonstrates reference
consistency; the runtime system guarantees that the same cameras which have been activated to
perform face recognition are turned off after the operation completes. Note that the SM-based
runtime system is transparent to the programmer, except in the main method which performs
the initialization (i.e., the SMWrapper is initialized in order to allow it to create the SM that
will carry the SP application through the network).
    public class IntruderDetection extends SMWrapper{
        public Space userSpace, monitoredSpace;
        public int i, j, count, numSensors, numCameras, timeout, threshold;
        public SpatialReference srLight, srUser;
        public SpatialReference[] srCamera;

        public static void main(String []args){
            IntruderDetection intruderDetection = new IntruderDetection();
            // read and store application’s parameters
            String []userClasses = {"IntruderDetection"};
            intruderDetection.initSMWrapper(userClasses, intruderDetection);
            intruderDetection.run();
        }

        public void run(){
            try{
                for (i=0; i<numSensors; i++){
                    srLight = getSpatialReference(monitoredSpace, "Light", i, timeout);
                    if (((Integer)srLight.read("Intensity")).intValue() > threshold){
                        srCamera = new SpatialReference[numCameras];
                        for (j=0; j<numCameras; j++){
                            srCamera[j] = getSpatialReference(monitoredSpace, "Camera", j, timeout);
                            srCamera[j].write("Active", "ON");
                        }
                        for(j=0,count=0; j<numCameras; j++){
                            if (((Boolean)srCamera[j].read("FaceRecognition")).booleanValue())
                                count++;
                            srCamera[j].write("Active", "OFF");
                        }
                        if (count > numCameras/2){
                            srUser = getSpatialReference(userSpace, "User", 0, timeout);
                            srUser.write("Message", "intruder detected!!");
                            return;
                        }
                    }
                }
            }catch(TimeoutException e){}
        }
    }
Figure 5.14: Java Code for Intrusion Detection Application
For experiments, we have considered the simple network topology presented in Figure 5.15.
The response time is heavily influenced by the size of the payload carried by the SM
"incarnation" of our SP application. Figure 5.17 presents the breakdown of the SM payload (code,
data, and execution state). The code consists of SP application and SM-based runtime library
code (i.e., the SM needs to carry the runtime library code to those nodes where this code is not
cached). We can see that the execution state is small (under 3% of the total size). The biggest
contribution comes from the library code (the sizes of its components are shown in Figure 5.18).
This code, however, is cached at nodes in the common case. In Figure 5.19, we present the
total execution time for the application in two cases: (1) the code is not cached at any node
when the application starts (but the caching is activated in the network), and (2) the SM-based
runtime library code is cached at every node (i.e., only the application code is migrated through
the network). In this experiment, we do not perform the face recognition because our goal is to
evaluate the performance of the SP runtime system (i.e., the execution time for the face
recognition is an order of magnitude greater than the rest of the application). The results
indicate that our SP runtime implementation based on SMs can achieve good performance,
especially when the runtime library is cached at nodes. We observe that caching leads to a 57%
decrease in the overall response time. The time breakdown shows how each basic operation is
affected by code caching. The time to reach the space of interest and the time to migrate to
target nodes are significantly reduced (as much as 70%). The route discovery time experiences a
less significant decrease due to the unavoidable contention encountered in wireless networks for
flooding-based algorithms.
Figure 5.15: The Network Topology for Intrusion Detection Application (legend: user node,
light sensor, camera, regular node; monitored space)
Figure 5.16: Typical Camera Node with GPS Receiver Attached
Figure 5.17: Smart Message Code Breakdown for Intrusion Detection Application
Figure 5.18: Spatial Programming Runtime Library Code Breakdown
Figure 5.19: Execution Time for Intrusion Detection Application
5.5 Experiences and Lessons Learned from Building our Prototypes
In this section, we present the experiences and lessons learned from building prototypes for
Smart Messages (SM) and Spatial Programming (SP), as well as testing them on top of ad
hoc networks of PDAs. In many respects, this was pioneering work because the technology
that allows the deployment of wireless embedded systems in the physical world has started to
mature (and, as a result, to become commercially available) only in the last two to three years.
The first lesson learned during our initial evaluation of possible hardware and software options
for the prototype is that a good balance needs to exist between the amount of resources
available at nodes and the ease of programming. The hardware limitations specific to embedded
systems can lead to a very low-level programming interface and, consequently, make
programming extremely tedious. From a programmer’s perspective, working with resource-constrained
systems is similar to “going several decades back in time” (i.e., these systems are
similar to the computers used twenty or thirty years ago). The use of extremely limited systems
should be avoided, if possible, for the sake of faster implementation of real prototypes.
The danger is that an idea becomes irrelevant if too much time is spent on non-essential
implementation issues.
Since we needed an open platform, we discarded from the beginning any systems
based on Microsoft Windows CE and PalmOS. Thus, the only remaining options were either
TinyOS-based sensors or Linux-based embedded systems. Programming sensor networks using
TinyOS [35] has been demonstrated to be very difficult and time consuming due to the low-level
programming interface. Additionally, our target systems were significantly more powerful
than sensors. Therefore, we have focused on systems capable of running Linux. One of our
original ideas was to build a prototype using Axis micro-controllers [3] and Bluetooth for
communication. Although these micro-controllers ran a reduced version of Linux (microLinux),
writing distributed applications on top of them was difficult for two main reasons. The
first was the lack of a memory management unit on these micro-controllers; hence, it was
very easy to write buggy programs that crashed the entire system. The second was
the immaturity of Bluetooth technology at that time: the Bluetooth chips had bugs that
ultimately rendered them unusable.
Finally, we settled on HP iPAQ PDAs running Linux and communicating through
802.11 PC cards. The iPAQs provided a good balance between the amount of resources (much
less than traditional PCs, but enough to develop our prototype) and ease of programming. From
a programmer’s point of view, their best feature was the ability to run an unmodified Linux kernel.
Thus, we have been able to develop our prototype on PCs using traditional programming
tools for Linux and then cross-compile it for ARM-based processors (iPAQs have a 206MHz
StrongARM processor). From a hardware point of view, iPAQs come equipped with two PC
card slots and a serial interface. This configuration allowed us, for instance, to use one PC card
slot for wireless communication, the other for a video camera, and the serial interface for a GPS
receiver. Such a node is relatively powerful and can be used in non-trivial outdoor distributed
applications.
A problem that we faced during our experiments was the accuracy of raw GPS data, which
sometimes varies significantly. However, this problem can be solved using various “smoothing”
techniques. In a different work [26], we have shown how such techniques can improve the
localization of cars on the roads despite the relative inaccuracy of raw GPS data.
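As an illustration of what such a “smoothing” step can look like, the following minimal sketch applies an exponential moving average to a stream of raw GPS fixes. This is not the technique used in [26], which is not detailed here; the class name GpsSmoother, the parameter alpha, and the sample coordinates are assumptions made only for this example. The sketch averages latitude and longitude directly, so it ignores the wrap-around of longitude at ±180 degrees.

```java
/**
 * Illustrative sketch only: exponential moving average over raw GPS fixes.
 * GpsSmoother and alpha are hypothetical names, not part of the SP/SM prototype.
 */
public class GpsSmoother {
    private final double alpha;          // weight of the newest fix, in (0, 1]
    private boolean hasEstimate = false; // false until the first fix arrives
    private double lat;
    private double lon;

    public GpsSmoother(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Folds one raw fix into the estimate and returns {latitude, longitude}. */
    public double[] update(double rawLat, double rawLon) {
        if (!hasEstimate) {
            // The first fix is taken as-is; there is nothing to average with yet.
            lat = rawLat;
            lon = rawLon;
            hasEstimate = true;
        } else {
            lat = alpha * rawLat + (1.0 - alpha) * lat;
            lon = alpha * rawLon + (1.0 - alpha) * lon;
        }
        return new double[] { lat, lon };
    }

    public static void main(String[] args) {
        GpsSmoother smoother = new GpsSmoother(0.3);
        // Made-up fixes that jitter around one position.
        double[][] rawFixes = {
            { 40.5000, -74.4500 }, { 40.5004, -74.4497 },
            { 40.4996, -74.4504 }, { 40.5001, -74.4499 }
        };
        for (double[] fix : rawFixes) {
            double[] s = smoother.update(fix[0], fix[1]);
            System.out.printf("raw=(%.4f, %.4f) smoothed=(%.4f, %.4f)%n",
                              fix[0], fix[1], s[0], s[1]);
        }
    }
}
```

A smaller alpha suppresses more jitter but makes the estimate lag behind a moving node, so the value would have to be chosen according to the expected node speed.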
The biggest problem that we encountered was testing and running experiments in real-life
conditions. The lesson learned is that a good emulator is needed to ensure that the software
functions properly indoors; only after that does it make sense to run outdoor experiments. Otherwise,
the entire process is a huge waste of time for many people (outdoor experiments
require nodes distributed across large areas; thus, many people are needed).
Testing ad hoc networks with multi-hop communication has presented major logistical challenges
because it is hard to build topologies for certain experiments when the connectivity
between nodes varies a lot. This variation occurs mostly because of obstacles located between
the nodes. When the nodes are mobile, maintaining a certain degree of connectivity while
having multi-hop communication is also very difficult. This is especially true for small
networks, where a node can easily move out of range. In a different testbed [26], we have used
omni-directional antennas to increase the communication range. Another solution we came up
with recently is to emulate mobility indoors. In this way, the implement-test-debug cycle can
be shortened significantly.
Many times we have experienced faults due to lost connectivity or lost GPS coverage. Designing
robust applications in outdoor computing environments is a problem that needs further
study. Since it is impossible to provide real-time guarantees, our mechanisms are based on
soft deadlines. Currently, it is the programmer’s responsibility to decide what to do when the
application receives a timeout exception generated by an unmet soft deadline. A systematic
study of the problems encountered in outdoor computing environments may help us
improve the system support for fault tolerance and, consequently, relieve the programmer of
the burden of handling all the exceptions.
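To make the programmer’s side of this contract concrete, the sketch below shows one possible policy for reacting to a missed soft deadline: retry a bounded number of times with a longer timeout, then fall back to a default value. It is only an illustration under stated assumptions. The interface TimedRead and the method readWithRetry are hypothetical and are not part of the SP API, and java.util.concurrent.TimeoutException is used as a stand-in for the timeout exception raised by the SP runtime.

```java
import java.util.concurrent.TimeoutException;

/** Illustrative sketch only: one way to react to an unmet soft deadline. */
public class SoftDeadlineRetry {

    /** Hypothetical stand-in for an SP operation that may miss its soft deadline. */
    interface TimedRead {
        int read(long timeoutMillis) throws TimeoutException;
    }

    /**
     * Tries the operation up to maxAttempts times, doubling the timeout after
     * each miss, and returns the fallback value if every attempt times out.
     */
    static int readWithRetry(TimedRead op, long timeoutMillis, int maxAttempts, int fallback) {
        long timeout = timeoutMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.read(timeout);
            } catch (TimeoutException e) {
                timeout *= 2; // give the next attempt more time
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        final int[] calls = { 0 };
        // Simulated sensor read that misses its deadline twice, then succeeds.
        TimedRead flaky = timeout -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TimeoutException("soft deadline missed");
            }
            return 42;
        };
        System.out.println(readWithRetry(flaky, 100, 5, -1)); // prints 42

        // Simulated read that never completes in time: the fallback is returned.
        TimedRead unreachable = timeout -> { throw new TimeoutException("no node found"); };
        System.out.println(readWithRetry(unreachable, 100, 3, -1)); // prints -1
    }
}
```

Whether retrying, falling back, or aborting is appropriate depends on the application; the intrusion detection code in Figure 5.14, for example, simply ignores the exception.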
One problem we have observed is that programmers are accustomed to the traditional
message passing model (send/receive between two fixed end-points), and it takes a certain
amount of time until they learn how to program with migration when writing SM-based distributed
applications. Furthermore, to make execution migration efficient, we require the programmers
to explicitly specify the data that needs to be accessed across migrations. The most
common bug encountered in SMs was forgetting to include a certain variable in a data brick
(i.e., forgetting to make it a global object variable in that data brick). This situation occurs in
more complex programs, such as those using recursion.
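The following sketch is a loose analogy for this bug, not SM code. It assumes a simplified model in which only the fields of an explicitly declared object travel with the computation, and it simulates a migration with standard Java serialization. The names Brick, Task, migrate, and resumeOnNextNode are hypothetical and do not belong to the SM API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Illustrative analogy only: state left out of the migrated data is lost. */
public class DataBrickSketch {

    /** Hypothetical stand-in for a data brick: only its fields are carried along. */
    static class Brick implements Serializable {
        private static final long serialVersionUID = 1L;
        int hopsVisited;
    }

    /** Simulates a migration by copying the brick through serialization. */
    static Brick migrate(Brick brick) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(brick);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Brick) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("simulated migration failed", e);
        }
    }

    static class Task {
        Brick brick = new Brick(); // declared state: carried to the next node
        int readingsSeen;          // forgotten state: not in the brick, so not carried

        /** Builds the task as it would look after arriving at the next node. */
        Task resumeOnNextNode() {
            Task resumed = new Task();
            resumed.brick = migrate(brick);
            return resumed;
        }
    }

    public static void main(String[] args) {
        Task task = new Task();
        task.brick.hopsVisited = 3;
        task.readingsSeen = 3;

        Task moved = task.resumeOnNextNode();
        System.out.println(moved.brick.hopsVisited); // prints 3
        System.out.println(moved.readingsSeen);      // prints 0: the value was left behind
    }
}
```

In this simplified model the fix is the one described above: move the forgotten variable into the object that is declared as migrating state.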
Although both SP and SM can be used to write any type of distributed application, we have
used them successfully for relatively simple, sequential programs. Commonly, we had
one or a few applications running concurrently in the network. Additionally, these applications
were mostly cooperative applications. Injecting multiple competing applications into the network
should provide better insights into the design and implementation of our systems.
5.6 Summary
In this chapter, we have described the prototype implementation of Smart Messages (SM) and
Spatial Programming (SP) using SMs. We have demonstrated the feasibility of these implemen-
tations through applications executed over ad hoc networks of PDAs (HP iPAQs equipped with
IEEE 802.11 wireless cards). Although difficult to build and test, our prototype has enabled
rapid development of outdoor distributed applications. The experimental results have indicated
that the SP model and the SM system architecture are viable solutions for outdoor distributed
computing.
Chapter 6
Conclusions
With the emergence of outdoor computing environments consisting of massive numbers of net-
worked embedded systems deployed everywhere in the physical world, computing is becoming
pervasive for the first time in history. However, this huge computing infrastructure lacks proper
support for programmability. The volatility, heterogeneity, and scale that characterize the net-
works of embedded systems (NES) make programming these networks a very challenging task.
The main question that this dissertation has tried to answer is:
• Can we take advantage of the ubiquity of NES and program distributed applications on
top of them?
The traditional distributed computing models and system architectures have not been de-
signed for networks such as NES, where the systems are extremely heterogeneous and the
network configuration evolves continuously over time. Therefore, we have raised the following
questions:
• Can we provide a simple and intuitive programming model that allows programmers to
reason about the algorithmic details of the applications rather than spend time coping
with the highly volatile nature of NES?
• Can we provide a common distributed computing platform that supports a cooperative
execution environment across NES for virtually any user-defined application?
This dissertation has presented the design and implementation of Spatial Programming
(SP) and Smart Messages (SM), which provide a programming model and a distributed computing
platform for programming distributed applications on top of NES. To the best of our
knowledge, SP is the first attempt to design and implement a location-aware programming
model for outdoor distributed computing. SP offers fine-grained, network-transparent access
to systems embedded in the physical space. Central to SP is the concept of spatial reference,
which defines a virtual name space over NES using the expected locations and properties of
these systems. Programmers use spatial references to access the content or services provided
by nodes in the network in the same way they use variables in a conventional program. The
main benefits of SP are the flexibility and simplicity it offers for programming user-defined
distributed applications in highly volatile outdoor computing environments.
The SM architecture provides a common distributed computing platform across NES. SMs
overcome the volatility, heterogeneity, and scale encountered in NES by migrating the execu-
tion to nodes of interest and self-routing between these nodes. The SM system architecture
is suitable for resource constrained systems because it defines a lightweight system support at
nodes, with most of the “intelligence” incorporated into SMs. SMs represent an attractive alter-
native to traditional distributed computing based on end-to-end message passing because they
adapt quickly to highly dynamic networks and provide support for deploying new applications
in existing networks.
To demonstrate the feasibility of the proposed solutions, we have designed and implemented
a prototype system. The experimental results for several applications executed over ad hoc
wireless networks of PDAs have indicated that the SP model and the SM architecture are viable
solutions for outdoor distributed computing. Additionally, simulation results for larger scale
networks have shown the performance benefits of the SM self-routing mechanism.
The conclusion of this dissertation is that, although difficult, programming outdoor dis-
tributed applications is possible when the programming models and system architectures are
specifically designed to address the volatility, heterogeneity, and scale exhibited by networks
of embedded systems.
References
[1] Java 2 Platform, Micro Edition (J2ME). http://java.sun.com/j2me/.
[2] K Virtual Machine. http://java.sun.com/products/cldc/.
[3] “Axis Communications.” http://www.axis.com.
[4] “Intelligent Transportation Systems, U.S. Department of Transportation.” http://www.its.dot.gov.
[5] “JavaSpaces.” http://wwws.sun.com/software/jini/specs/jini1.1html/js-title.html.
[6] “Linux Devices.” http://www.linuxdevices.com.
[7] “Linux Kernel Procfs.” http://www.kernelnewbies.org/documents/kdoc/procfs-guide/intro.html.
[8] “The Message Passing Interface (MPI) Standard.” http://www-unix.mcs.anl.gov/mpi/.
[9] “Trusted Computing.” http://www.cl.cam.ac.uk/~rja14/tcpa-faq.html.
[10] “XML.” http://www.w3.org/XML/.
[11] M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young, “Mach: A new kernel foundation for UNIX development,” in Proceedings of the USENIX 1986 Summer Conference, (Atlanta, GA), July 1986, pp. 93–113.
[12] S. Adhikari, A. Paul, and U. Ramachandran, “D-Stampede: Distributed Programming System for Ubiquitous Computing,” in Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS 2002), (Vienna, Austria), July 2002, pp. 209–216.
[13] W. Adjie-Winoto, E. Schwartz, H. Balakrishnan, and J. Lilley, “The Design and Implementation of an Intentional Naming System,” in Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP 1999), (Charleston, SC), ACM Press, New York, NY, 1999, pp. 186–201.
[14] O. Agesen, “GC Points in a Threaded Environment,” Technical Report SMLI TR-98-70, Sun Microsystems Laboratories, Palo Alto, CA, December 1998.
[15] G. Banavar, J. Beck, E. Gluzberg, J. Munson, J. Sussman, and D. Zukowski, “Challenges: An Application Model for Pervasive Computing,” in Proceedings of the Sixth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2000), (Boston, MA), August 2000, pp. 266–274.
[16] B. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, July 1970.
[17] B. Blum, P. Nagaraddi, A. Wood, T. Abdelzaher, S. Son, and J. Stankovic, “An Entity Maintenance and Connection Service for Sensor Networks,” in Proceedings of the First International Conference on Mobile Systems, Applications, and Services (MobiSys 2003), (San Francisco, CA), May 2003, pp. 201–214.
[18] P. Bonnet, J. E. Gehrke, and P. Seshadri, “Querying the Physical World,” IEEE Personal Communications, vol. 7, no. 5, pp. 10–15, October 2000.
[19] C. Borcea, C. Intanagonwiwat, P. Kang, U. Kremer, and L. Iftode, “Spatial Programming using Smart Messages: Design and Implementation,” in Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 2004), (Tokyo, Japan), March 2004, pp. 690–699.
[20] C. Borcea, C. Intanagonwiwat, A. Saxena, and L. Iftode, “Self-Routing in Pervasive Computing Environments using Smart Messages,” in Proceedings of the 1st IEEE International Conference on Pervasive Computing and Communications (PerCom 2003), (Dallas-Fort Worth, TX), March 2003, pp. 87–96.
[21] C. Borcea, D. Iyer, P. Kang, A. Saxena, and L. Iftode, “Cooperative Computing for Distributed Embedded Systems,” in Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS 2002), (Vienna, Austria), July 2002, pp. 227–236.
[22] A. Boulis, C. Han, and M. Srivastava, “Design and Implementation of a Framework for Efficient and Programmable Sensor Networks,” in Proceedings of the First International Conference on Mobile Systems, Applications, and Services (MobiSys 2003), (San Francisco, CA), May 2003, pp. 187–200.
[23] V. Cahill et al., “Using trust for secure collaboration in uncertain environments,” IEEE Pervasive Computing, vol. 2, no. 3, pp. 52–61, 2003.
[24] N. Carriero and D. Gelernter, “Linda in context,” Communications of the ACM, vol. 32, no. 4, pp. 444–458, April 1989.
[25] D. Wetherall, “Active Network Vision and Reality: Lessons from a Capsule-based System,” in Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP 1999), (Charleston, SC), ACM Press, New York, NY, December 1999, pp. 64–79.
[26] S. Dashtinezhad, T. Nadeem, B. Dorohonceanu, C. Borcea, P. Kang, and L. Iftode, “TrafficView: A Driver Assistant Device for Traffic Monitoring based on Car-to-Car Communication,” in Proceedings of the 59th IEEE Semiannual Vehicular Technology Conference, May 2004.
[27] F. Hohl, “Time Limited Blackbox Security: Protecting Mobile Agents from Malicious Hosts,” in G. Vigna, editor, Mobile Agents and Security, volume 1419 of Lecture Notes in Computer Science, pp. 92–113, Springer-Verlag, London, UK, 1998.
[28] V. Galtier, K. Mills, Y. Carlinet, S. Bush, and A. Kulkarni, “Predicting resource demand in heterogeneous active networks,” in Military Communications Conference, 2001 (MILCOM 2001). Communications for Network-Centric Operations: Creating the Information Force, (Washington, D.C.), October 2001, pp. 905–909.
[29] R. Gray, G. Cybenko, D. Kotz, and D. Rus, “Mobile agents: Motivations and state of the art,” in J. Bradshaw, editor, Handbook of Agent Technology, AAAI/MIT Press, 2002.
[30] R. Gray, D. Kotz, G. Cybenko, and D. Rus, “D’Agents: Security in a multiple-language, mobile-agent system,” in G. Vigna, editor, Mobile Agents and Security, volume 1419 of Lecture Notes in Computer Science, pp. 154–187, Springer-Verlag, London, UK, 1998.
[31] R. Grimm et al., “Systems Directions for Pervasive Computing,” in Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), (Elmau/Oberbayern, Germany), IEEE Computer Society, Washington, DC, May 2001, pp. 147–151.
[32] M. Gritter and D. Cheriton, “An Architecture for Content Routing Support in the Internet,” in Proceedings of the 3rd USENIX Symposium on Internet Technologies and Systems (USITS 2001), (San Francisco, CA), March 2001, pp. 37–48.
[33] J. Heidemann, F. Silva, C. Intanagonwiwat, R. Govindan, D. Estrin, and D. Ganesan, “Building Efficient Wireless Sensor Networks with Low-Level Naming,” in Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP 2001), (Banff, Canada), ACM Press, New York, NY, October 2001, pp. 146–159.
[34] W. R. Heinzelman, J. Kulik, and H. Balakrishnan, “Adaptive Protocols for Information Dissemination in Wireless Sensor Networks,” in Proceedings of the Fifth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 1999), (Seattle, WA), ACM Press, New York, NY, August 1999, pp. 174–185.
[35] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, and K. Pister, “System Architecture Directions for Networked Sensors,” in Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), (Cambridge, MA), ACM Press, New York, NY, November 2000, pp. 93–104.
[36] Y. Hu, A. Perrig, and D. Johnson, “Ariadne: a secure on-demand routing protocol for ad hoc networks,” in Proceedings of the 8th annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2002), (Atlanta, GA), ACM Press, New York, NY, September 2002, pp. 12–23.
[37] L. Iftode, C. Borcea, and P. Kang, “Cooperative Computing in Sensor Networks,” in M. Ilyas, editor, Handbook of Sensor Networks: Compact Wireless and Wired Sensing Systems, CRC Press, July 2004.
[38] L. Iftode, C. Borcea, A. Kochut, C. Intanagonwiwat, and U. Kremer, “Programming Computers Embedded in the Physical World,” in Proceedings of the 9th Workshop on Future Trends of Distributed Computing Systems (FTDCS 2003), May 2003, pp. 78–85.
[39] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks,” in Proceedings of the Sixth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2000), (Boston, MA), ACM Press, New York, NY, August 2000, pp. 56–67.
[40] J. Elson, L. Girod, and D. Estrin, “Fine-Grained Network Time Synchronization using Reference Broadcasts,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 2002), December 2002, pp. 64–79.
[41] D. Johnson and D. Maltz, Dynamic Source Routing in Ad Hoc Wireless Networks. T. Imielinski and H. Korth (Eds.). Kluwer Academic Publishers, 1996.
[42] P. Juang, H. Oki, Y. Wang, M. Martonosi, L. Peh, and D. Rubenstein, “Energy-Efficient Computing for Wildlife Tracking: Design Tradeoffs and Early Experiences with ZebraNet,” in Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), (San Jose, CA), ACM Press, New York, NY, October 2002, pp. 96–107.
[43] B. Jung and G. S. Sukhatme, “Cooperative Tracking using Mobile Robots and Environment-Embedded, Networked Sensors,” in the 2001 IEEE International Symposium on Computational Intelligence in Robotics and Automation.
[44] P. Kang, C. Borcea, G. Xu, A. Saxena, U. Kremer, and L. Iftode, “Smart Messages: A Distributed Computing Platform for Networks of Embedded Systems,” The Computer Journal, Special Focus-Mobile and Pervasive Computing, vol. 47, no. 4, pp. 475–494, July 2004. The British Computer Society. Oxford University Press.
[45] E. Kaplan, editor, Understanding GPS: Principles and Applications. Artech House, 1996.
[46] N. Karnik and A. Tripathi, “Agent Server Architecture for the Ajanta Mobile-Agent System,” in Proceedings of the 1998 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’98), (Las Vegas, NV), July 1998, pp. 66–73.
[47] N. Karnik and A. Tripathi, “Security in the Ajanta Mobile Agent System,” Software Practice and Experience, vol. 31, no. 4, pp. 301–329, January 2001.
[48] B. Karp and H. Kung, “Greedy Perimeter Stateless Routing for Wireless Networks,” in Proceedings of the Sixth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2000), (Boston, MA), ACM Press, New York, NY, August 2000, pp. 243–254.
[49] Y.-B. Ko and N. H. Vaidya, “Location-Aided Routing (LAR) in Mobile Ad Hoc Networks,” in Proceedings of the Fourth annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), October 1998, pp. 66–75.
[50] T. Lehman, A. Cozzi, Y. Xiong, J. Gottschalk, V. Vasudevan, S. Landis, P. Davis, B. Khavar, and P. Bowman, “Hitting the distributed computing sweet spot with tspaces,” Computer Networks: The International Journal of Computer and Telecommunications Networking, vol. 35, no. 4, pp. 457–472, March 2001.
[51] P. Levis and D. Culler, “Mate: A Virtual Machine for Tiny Networked Sensors,” in Proceedings of the Tenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), (San Jose, CA), ACM Press, New York, NY, October 2002, pp. 85–95.
[52] K. Li, “Shared virtual memory on loosely-coupled multiprocessors.” Ph.D. Thesis, Yale University, October 1986. Tech Report YALEU-RR-492.
[53] M. Satyanarayanan, “Pervasive Computing: Vision and Challenges,” IEEE Personal Communications, August 2001.
[54] M. Welsh and G. Mainland, “Programming Sensor Networks Using Abstract Regions,” in Proceedings of the First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI 2004), March 2004.
[55] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, “TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks,” in Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI), December 2002.
[56] S. Madden, M. Franklin, J. Hellerstein, and W. Hong, “The Design of an Acquisitional Query Processor for Sensor Networks,” in Proceedings of the 2003 ACM SIGMOD international conference on Management of data, (San Diego, CA), ACM Press, New York, NY, June 2003, pp. 491–502.
[57] S. McCanne and S. Floyd. ns Network Simulator. http://www.isi.edu/nsnam/ns/.
[58] D. Milojicic, F. Douglis, Y. Paindaveine, R. Wheeler, and S. Zhou, “Process migration,” ACM Computing Surveys, vol. 32, no. 3, pp. 241–299, September 2000.
[59] J. Moore, M. Hicks, and S. Nettles, “Practical Programmable Packets,” in Proceedings of the 20th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2001), (Anchorage, AK), April 2001, pp. 41–50.
[60] R. Morris, J. Jannotti, F. Kaashoek, J. Li, and D. Decouto, “CarNet: A Scalable Ad Hoc Wireless Network System,” in Proceedings of the 9th ACM SIGOPS European Workshop, (Kolding, Denmark), ACM Press, New York, NY, September 2000, pp. 61–65.
[61] S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu, “The Broadcast Storm Problem in a Mobile Ad Hoc Network,” in Proceedings of the Fifth Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 1999), (Seattle, WA), 1999, pp. 151–162.
[62] Y. Ni, U. Kremer, and L. Iftode, “Spatial Views: space-aware programming for networks of embedded systems,” in Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003), (College Station, TX), October 2003.
[63] D. Niculescu and B. Badrinath, “Ad hoc positioning system (APS),” in Proceedings of the GLOBECOM 2001 Conference, 2001.
[64] J. Ousterhout, A. Cherenson, F. Douglis, M. Nelson, and B. Welch, “The sprite network operating system,” IEEE Computer, vol. 21, no. 2, pp. 23–36, February 1988.
[65] P. Koopman, “Critical Embedded Automotive Networks,” IEEE Micro, vol. 22, no. 4, pp. 14–18, July-August 2002.
[66] E. Palmer, “An Introduction to Citadel - A Secure Crypto Coprocessor for Workstations,” in Proceedings of IFIP SEC’94 Conference, (Curacao, Dutch Antilles), May 1994.
[67] P. Zhou, T. Nadeem, P. Kang, C. Borcea, and L. Iftode, “EZCab: A Cab Booking Application Using Short-Range Wireless Communication,” Technical Report DCS-TR-550, Rutgers University, March 2004.
[68] C. Perkins and E. Royer, “Ad Hoc On Demand Distance Vector Routing,” in Proceedings of the 2nd IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 1999), (New Orleans, LA), February 1999, pp. 90–100.
[69] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. Tygar, “SPINS: Security Protocols for Sensor Networks,” in Proceedings of the 7th annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2001), (Rome, Italy), ACM Press, New York, NY, July 2001, pp. 189–199.
[70] S. Ponnekanti, B. Lee, A. Fox, P. Hanrahan, and T. Winograd, “ICrafter: A Service Framework for Ubiquitous Computing Environments,” in Proceedings of the Third International Conference on Ubiquitous Computing (Ubicomp), (Atlanta, GA), Springer-Verlag, London, UK, September 2001, pp. 56–75.
[71] N. Priyantha, A. Miu, H. Balakrishnan, and S. Teller, “The Cricket Compass for Context-Aware Mobile Applications,” in Proceedings of the 7th annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom 2001), ACM Press, New York, NY, July 2001, pp. 1–14.
[72] S. Rhea and J. Kubiatowicz, “Probabilistic Location and Routing,” in Proceedings of the 21st Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM’02), (New York, NY), June 2002, pp. 1248–1257.
[73] M. Roman and R. Campbell, “GAIA: Enabling Active Spaces,” in Proceedings of the 9th ACM SIGOPS European Workshop, (Kolding, Denmark), ACM Press, New York, NY, September 2000, pp. 229–234.
[74] D. Rosu, K. Schwan, and S. Yalamanchili, “Fara - a framework for adaptive resource allocation in complex real-time systems,” in Proceedings of the Fourth IEEE Real-Time Technology and Applications Symposium, (Denver, CO), May 1998, pp. 79–84.
[75] S. Ganeriwal, R. Kumar, and M. Srivastava, “Timing-sync Protocol for Sensor Networks,” in Proceedings of the 1st International Conference on Embedded Networked Sensor Systems (Sensys 2003), November 2003, pp. 138–149.
[76] T. Sander and C. Tschudin, “Protecting Mobile Agents against Malicious Hosts,” in G. Vigna, editor, Mobile Agents and Security, volume 1419 of Lecture Notes in Computer Science, pp. 44–60, Springer-Verlag, 1998.
[77] B. Schwartz, A. Jackson, W. Strayer, W. Zhou, R. Rockwell, and C. Partridge, “Smart packets: Applying active networks to network management,” ACM Transactions on Computer Systems, vol. 18, no. 1, pp. 67–88, 2000.
[78] J. Stankovic and K. Ramamritham, “The spring kernel: A new paradigm for real-time systems,” IEEE Software, vol. 8, pp. 62–72, May 1991.
[79] P. Stanley-Marbell, C. Borcea, K. Nagaraja, and L. Iftode, “Smart messages: A system architecture for large networks of embedded systems,” in Proceedings of HotOS-VIII, May 2001. Position Paper. Longer version: Rutgers University Technical Report DCS-TR-430.
[80] P. Stanley-Marbell and L. Iftode, “Scylla: A smart virtual machine for mobile embedded systems,” in 3rd IEEE Workshop on Mobile Computing Systems and Applications, WMCSA2000, (Monterey, CA), December 2000, pp. 41–50.
[81] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana, “Internet Indirection Infrastructure,” in Proceedings of ACM SIGCOMM ’02, August 2002, pp. 73–86.
[82] T. Abdelzaher, B. Blum, Q. Cao, D. Evans, J. George, S. George, T. He, L. Luo, S. Son, R. Stoleru, J. Stankovic, and A. Wood, “EnviroTrack: Towards an Environmental Computing Paradigm for Distributed Sensor Networks,” in Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 2004), March 2004, pp. 582–589.
[83] A. Vahdat, M. Dahlin, T. Anderson, and A. Aggarwal, “Active Names: Flexible Location and Transport of Wide-Area Resources,” in Proceedings of the Second USENIX Symposium on Internet Technologies and Systems (USITS 1999), (Boulder, CO), October 1999, pp. 151–164.
[84] C. Wan, A. Campbell, and L. Krishnamurthy, “PSFQ: A Reliable Transport Protocol For Wireless Sensor Networks,” in Proceedings of the 1st ACM international workshop on Wireless sensor networks and applications (WSNA 2002), (Atlanta, GA), ACM Press, New York, NY, September 2002, pp. 1–11.
[85] M. Weiser, “The computer for the twenty-first century,” Scientific American, September 1991.
[86] G. Xu, C. Borcea, and L. Iftode, “Toward a Security Architecture for Smart Messages: Challenges, Solutions, and Open Issues,” in Proceedings of the 1st International Workshop on Mobile Distributed Computing (MDC’03), May 2003.
Vita
Cristian Borcea
Education
Ph.D. Computer Science, Rutgers University, New Jersey (2004)
M.S. Computer Science, Rutgers University, New Jersey (2002)
M.S. Computer Science, Polytechnic University of Bucharest, Romania (1997)
B.S. Computer Science, Polytechnic University of Bucharest, Romania (1996)
Publications
• Nishkam Ravi, Cristian Borcea, Porlin Kang, and Liviu Iftode, “Portable Smart Messages for Ubiquitous Java-Enabled Devices”. Proceedings of The First Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous 2004), August 2004.
• Liviu Iftode, Cristian Borcea, and Porlin Kang, “Cooperative Computing in Sensor Networks”. Handbook of Sensor Networks: Compact Wireless and Wired Sensing Systems, Mohammad Ilyas (ed.), CRC Press, July 2004.
• Porlin Kang, Cristian Borcea, Gang Xu, Akhilesh Saxena, Ulrich Kremer, and Liviu Iftode, “Smart Messages: A Distributed Computing Platform for Networks of Embedded Systems”. The Computer Journal, Special Focus on Mobile and Pervasive Computing, Volume 47, British Computer Society, Oxford University Press, July 2004.
• Sasan Dashtinezhad, Tamer Nadeem, Bogdan Dorohonceanu, Cristian Borcea, Porlin Kang, and Liviu Iftode, “TrafficView: A Driver Assistant Device for Traffic Monitoring based on Car-to-Car Communication”. Proceedings of the 59th IEEE Semiannual Vehicular Technology Conference (VTC 2004 Spring), May 2004.
• Liviu Iftode, Cristian Borcea, Nishkam Ravi, Porlin Kang, and Peng Zhou, “Smart Phone: An Embedded System for Universal Interactions”. Proceedings of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS 2004), May 2004.
• Cristian Borcea, Chalermek Intanagonwiwat, Porlin Kang, Ulrich Kremer, and Liviu Iftode, “Spatial Programming using Smart Messages: Design and Implementation”. Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 2004), March 2004.
• Peng Zhou, Tamer Nadeem, Porlin Kang, Cristian Borcea, and Liviu Iftode, “EZCab: A Cab Booking Application Using Short-Range Wireless Communication”. Rutgers University Technical Report DCS-TR-550, March 2004.
• Liviu Iftode, Cristian Borcea, Andrzej Kochut, Chalermek Intanagonwiwat, and Ulrich Kremer, “Programming Computers Embedded in the Physical World”. Proceedings of the 9th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS 2003), May 2003.
• Gang Xu, Cristian Borcea, and Liviu Iftode, “Toward a Security Architecture for Smart Messages: Challenges, Solutions, and Open Issues”. Proceedings of the 1st International Workshop on Mobile Distributed Computing (MDC 2003), May 2003.
• Cristian Borcea, Chalermek Intanagonwiwat, Akhilesh Saxena, and Liviu Iftode, “Self-Routing in Pervasive Computing Environments using Smart Messages”. Proceedings of the 1st IEEE Annual Conference on Pervasive Computing and Communications (PerCom 2003), March 2003.
• Cristian Borcea, Deepa Iyer, Porlin Kang, Akhilesh Saxena, and Liviu Iftode, “Cooperative Computing for Distributed Embedded Systems”. Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS 2002), July 2002.
• Phillip Stanley-Marbell, Cristian Borcea, Kiran Nagaraja, and Liviu Iftode, “Smart Messages: A System Architecture for Large Networks of Embedded Systems”. Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), Position Summary, May 2001.