information extraction from sensor networks using the watershed transform algorithm

11
Information extraction from sensor networks using the Watershed transform algorithm Mohammad Hammoudeh a,, Robert Newman b a Manchester Metropolitan University, Chester Street, Manchester M1 5GD, UK b University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, UK article info Article history: Available online xxxx Keywords: Information extraction Watershed segmentation Query scoping Macroprogramming Wireless sensor networks abstract Wireless sensor networks are an effective tool to provide fine resolution monitoring of the physical envi- ronment. Sensors generate continuous streams of data, which leads to several computational challenges. As sensor nodes become increasingly active devices, with more processing and communication resources, various methods of distributed data processing and sharing become feasible. The challenge is to extract information from the gathered sensory data with a specified level of accuracy in a timely and power-effi- cient approach. This paper presents a new solution to distributed information extraction that makes use of the morphological Watershed algorithm. The Watershed algorithm dynamically groups sensor nodes into homogeneous network segments with respect to their topological relationships and their sensing- states. This setting allows network programmers to manipulate groups of spatially distributed data streams instead of individual nodes. This is achieved by using network segments as programming abstractions on which various query processes can be executed. Aiming at this purpose, we present a reformulation of the global Watershed algorithm. The modified Watershed algorithm is fully asynchro- nous, where sensor nodes can autonomously process their local data in parallel and in collaboration with neighbouring nodes. Experimental evaluation shows that the presented solution is able to considerably reduce query resolution cost without scarifying the quality of the returned results. When compared to similar purpose schemes, such as ‘‘Logical Neighborhood’’, the proposed approach reduces the total query resolution overhead by up to 57.5%, reduces the number of nodes involved in query resolution by up to 59%, and reduces the setup convergence time by up to 65.1%. Ó 2013 Elsevier B.V. All rights reserved. 1. Introduction Wireless Sensor Networks (WSNs) are enabling the production of applications that previously were not practical. They are cur- rently being applied to a variety of use domains ranging from hab- itat monitoring to space exploration, and from scientific to military. Such applications share several aspects: (1) the demand for information; (2) the response to this demand generally exists in multiple unstructured, potentially unbounded sequences of data points; (3) the generation of great volume of data that is ‘imperfect’ in nature and is characterised by significant redundancy. The char- acteristics of the sense data coupled with the resource constraints on sensor nodes necessitate the development of resource-efficient WSN applications. In-network information extraction is one meth- od to minimise resource utilisation while achieving applications objectives. This is due to the fact that local processing of sensed data is more energy and bandwidth efficient than the transfer of raw data to a central location for processing. Information extraction is the sub-discipline of artificial intelli- gence [1] that selectively structures, identifies, filters, classifies, and merges multi-modal data produced by multiple sensor nodes to discover recurring patterns that form a coherent and meaningful information. Initially, information extraction converts, possibly un- bounded, sequences of raw data into a more uniform and conve- nient structural format preparing it for further processing and analysis. It exploits domain-specific knowledge and data structural properties to harvest information by finding and combining rele- vant data while excluding irrelevant and erroneous ones. This def- inition will be used throughout this work to describe the term information extraction. This work addresses distributed query-based information extraction systems that use in-network computation to provide cost effective query responses. Nevertheless, the proposed solution is general enough to be applicable to other information extraction models, e.g., periodic or threshold-based. The query-based model of information extraction is widely used in data intensive applica- tions, where data is stored at multiple locations. Such systems are user controlled where users specify and inject a request for the information they require through simple queries; then the 1566-2535/$ - see front matter Ó 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.inffus.2013.07.001 Corresponding author. Tel.: +44 0 161 247 2845. E-mail addresses: [email protected] (M. Hammoudeh), [email protected] (R. Newman). Information Fusion xxx (2013) xxx–xxx Contents lists available at ScienceDirect Information Fusion journal homepage: www.elsevier.com/locate/inffus Please cite this article in press as: M. Hammoudeh, R. Newman, Information extraction from sensor networks using the Watershed transform algorithm, Informat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

Upload: m-h

Post on 12-May-2015

264 views

Category:

Technology


1 download

DESCRIPTION

Wireless sensor networks are an effective tool to provide fine resolution monitoring of the physical environment. Sensors generate continuous streams of data, which leads to several computational challenges. As sensor nodes become increasingly active devices, with more processing and communication resources, various methods of distributed data processing and sharing become feasible. The challenge is to extract information from the gathered sensory data with a specified level of accuracy in a timely and power-efficient approach. This paper presents a new solution to distributed information extraction that makes use of the morphological Watershed algorithm. The Watershed algorithm dynamically groups sensor nodes into homogeneous network segments with respect to their topological relationships and their sensing-states. This setting allows network programmers to manipulate groups of spatially distributed data streams instead of individual nodes. This is achieved by using network segments as programming abstractions on which various query processes can be executed. Aiming at this purpose, we present a reformulation of the global Watershed algorithm. The modified Watershed algorithm is fully asynchronous, where sensor nodes can autonomously process their local data in parallel and in collaboration with neighbouring nodes. Experimental evaluation shows that the presented solution is able to considerably reduce query resolution cost without scarifying the quality of the returned results. When compared to similar purpose schemes, such as “Logical Neighborhood”, the proposed approach reduces the total query resolution overhead by up to 57.5%, reduces the number of nodes involved in query resolution by up to 59%, and reduces the setup convergence time by up to 65.1%.

TRANSCRIPT

Page 1: Information extraction from sensor networks using the Watershed transform algorithm

Information Fusion xxx (2013) xxx–xxx

Contents lists available at ScienceDirect

Information Fusion

journal homepage: www.elsevier .com/locate / inf fus

Information extraction from sensor networks using the Watershedtransform algorithm

1566-2535/$ - see front matter � 2013 Elsevier B.V. All rights reserved.http://dx.doi.org/10.1016/j.inffus.2013.07.001

⇑ Corresponding author. Tel.: +44 0 161 247 2845.E-mail addresses: [email protected] (M. Hammoudeh),

[email protected] (R. Newman).

Please cite this article in press as: M. Hammoudeh, R. Newman, Information extraction from sensor networks using the Watershed transform algInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

Mohammad Hammoudeh a,⇑, Robert Newman b

a Manchester Metropolitan University, Chester Street, Manchester M1 5GD, UKb University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, UK

a r t i c l e i n f o

Article history:Available online xxxx

Keywords:Information extractionWatershed segmentationQuery scopingMacroprogrammingWireless sensor networks

a b s t r a c t

Wireless sensor networks are an effective tool to provide fine resolution monitoring of the physical envi-ronment. Sensors generate continuous streams of data, which leads to several computational challenges.As sensor nodes become increasingly active devices, with more processing and communication resources,various methods of distributed data processing and sharing become feasible. The challenge is to extractinformation from the gathered sensory data with a specified level of accuracy in a timely and power-effi-cient approach. This paper presents a new solution to distributed information extraction that makes useof the morphological Watershed algorithm. The Watershed algorithm dynamically groups sensor nodesinto homogeneous network segments with respect to their topological relationships and their sensing-states. This setting allows network programmers to manipulate groups of spatially distributed datastreams instead of individual nodes. This is achieved by using network segments as programmingabstractions on which various query processes can be executed. Aiming at this purpose, we present areformulation of the global Watershed algorithm. The modified Watershed algorithm is fully asynchro-nous, where sensor nodes can autonomously process their local data in parallel and in collaboration withneighbouring nodes. Experimental evaluation shows that the presented solution is able to considerablyreduce query resolution cost without scarifying the quality of the returned results. When compared tosimilar purpose schemes, such as ‘‘Logical Neighborhood’’, the proposed approach reduces the total queryresolution overhead by up to 57.5%, reduces the number of nodes involved in query resolution by up to59%, and reduces the setup convergence time by up to 65.1%.

� 2013 Elsevier B.V. All rights reserved.

1. Introduction Information extraction is the sub-discipline of artificial intelli-

Wireless Sensor Networks (WSNs) are enabling the productionof applications that previously were not practical. They are cur-rently being applied to a variety of use domains ranging from hab-itat monitoring to space exploration, and from scientific tomilitary. Such applications share several aspects: (1) the demandfor information; (2) the response to this demand generally existsin multiple unstructured, potentially unbounded sequences of datapoints; (3) the generation of great volume of data that is ‘imperfect’in nature and is characterised by significant redundancy. The char-acteristics of the sense data coupled with the resource constraintson sensor nodes necessitate the development of resource-efficientWSN applications. In-network information extraction is one meth-od to minimise resource utilisation while achieving applicationsobjectives. This is due to the fact that local processing of senseddata is more energy and bandwidth efficient than the transfer ofraw data to a central location for processing.

gence [1] that selectively structures, identifies, filters, classifies,and merges multi-modal data produced by multiple sensor nodesto discover recurring patterns that form a coherent and meaningfulinformation. Initially, information extraction converts, possibly un-bounded, sequences of raw data into a more uniform and conve-nient structural format preparing it for further processing andanalysis. It exploits domain-specific knowledge and data structuralproperties to harvest information by finding and combining rele-vant data while excluding irrelevant and erroneous ones. This def-inition will be used throughout this work to describe the terminformation extraction.

This work addresses distributed query-based informationextraction systems that use in-network computation to providecost effective query responses. Nevertheless, the proposed solutionis general enough to be applicable to other information extractionmodels, e.g., periodic or threshold-based. The query-based modelof information extraction is widely used in data intensive applica-tions, where data is stored at multiple locations. Such systems areuser controlled where users specify and inject a request for theinformation they require through simple queries; then the

orithm,

Page 2: Information extraction from sensor networks using the Watershed transform algorithm

2 M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx

information extraction infrastructure efficiently retrieves and pro-cesses the data within the network. Generally, query-based sys-tems support high-level declarative languages for incorporatingdata retrieval and analysis. In this setting, most query languagesprovide an interface to the WSN user whilst concealing low leveldetails such as the network topology or communication paths. Of-ten, the query issuer is unaware of how the query is disseminatedin the network and how the data is retrieved or analysed.

A fundamental requirement to effectively use the query-basedinformation extraction model is the user’ s prior knowledge ofthe semantic data model and its contextual meaning. This allowsnetwork users to better understand, query, and control the ex-tracted information. For instance, it could be essential to recognisewhat kind of events are occurring in a specific segment of the mon-itored area and when these events occurred. Depending on thetype of the application, sensor data can be labelled with temporal,spatial, or other thematic semantic metadata. Accordingly, basedon the user information requirements, queries can be built to dis-tinguish a variety of events at various levels of semantic detail. Intime critical applications, for example, it may be sufficient to statethat a portion of a query is a temporal operator; whereas in otherapplications, it may be essential to distinguish among diverse tem-poral operators, e.g., operators specifying past, present and future.In event-driven applications, the semantic nature, scope, and unitof the requested information could be predefined. The unit ofextraction defines the granularity of individual information chunksthat are extracted from the sensor nodes. The scope of extraction isthe process of identifying a subset of data or nodes holding datathat is relevant to the requested information. To choose the chunksof information that should be considered in preparing the query re-sponse, an information extraction system uses a group of condi-tions. These conditions specify the formal attributes a specificchunk of information must have to belong to a certain semanticgroup.

As an example of this kind of requirement, from a real-worldand practical application, in the recently completed RFID fromFarm to Fork project [2], undertaken by one of the present authors,a system was built to trace provenance and condition of food prod-ucts. The general strategy was that RFID systems were used tolocalise food products, while WSNs were used to monitor the envi-ronment through which they passed and condition information astemperature and humidity. It is required that the condition infor-mation be associated with an individual food item, so the resolu-tion of a query requires determination of the location of an itemat a particular time and estimation of the environmental parame-ters at that location and time. Sensor information not associatedwith any food product (the majority of it) is surplus to require-ments. In-network information extraction allows only the requiredinformation to be collected, avoiding the swamping of the systemswith unnecessary data.

The focus in this work is on large-scale sensor networks wheremultiple sensor types are fitted to each node. In such networks,many low-observability events require the retrieval of multi-mod-al data. To cope with the high-volume sensor data, we explore net-work segmentation techniques to provide effective mechanism formulti-modal query dissemination and local collaboration amongnodes. Network segmentation provides high level abstractions forlocal and spatial data processing. The aggregation of sensor dataand informed selective collaboration of sensor nodes can improveextracted information accuracy, reduce latency, and minimisebandwidth consumption. However, every node in the network per-forms sensing operations on every sensor without in-node pre-processing.

This paper proposes a distributed information extraction ap-proach that retrieves data and accurately process it to respond toqueries that users pose. This approach focuses on reducing the

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

amount of data that has to be collected and analysed to obtainthe desired information. It only query nodes that are closely linkedto the user requested information. The ability to limit a queryscope to arbitrary subsets of nodes results in considerable energysavings, lower bandwidth utilisation, and better information qual-ity. The absolute result of the information extraction system de-pends on the nature of the query; however, it usually entailsidentifying entities sharing a common sensing state and the rela-tions between those entities to provide accurate information ac-cess. Our proposed approach makes use of the Watershedtransform algorithm [3] to assign homogenous nodes’ to welldelimited segments using nodes physical location and theirsensors measurements. The purpose of the network segmentationprocess is to provide high-level primitives that abstracts away low-level system details. Unlike traditional information extraction ap-proaches where each node is treated as an isolated computationalunit, network segmentation allows efficient processing of high-le-vel queries of logical interest over the entire network. Networksegmentation is especially useful to dynamically build informationrules and frames to quickly provide accurate answers to informa-tion queries at a low cost.

The Watershed algorithm is able to cope with the irregular anddata dependent information requests by using the most suitableoption at every processing step instead of using predefined thresh-old values. To comply with application constraints, typically therequirement for real-time processing and limited energy consump-tion, we propose a new asynchronous and parallel reformulation ofthe Watershed transform algorithm. The presented reformulationis able to calculate the Watershed transform based on the node’slocal data. It achieves similar outcome with just single point of syn-chronisation, hence reducing the running time without scarifyingthe segmentation effectiveness. These two claims are discussedin this paper and verified in experimental evaluations.

The remainder of this paper is organised as follows. The nextsection discusses the differences of our work from other closely re-lated work. Section 3 introduces the Watershed algorithm. Section 4explains the suitability of this algorithm for WSNs. Section 5 de-scribes our modifications to the Watershed algorithm. Section 6 ex-plain the use of segments as a query scoping mechanism. Section 7presents the parallel asynchronous implementation of the Wa-tershed algorithm. Section 8 explains the mechanism for resolvingqueries. The effectiveness of the presented approach is studied bysimulation in Section 9. Section 10 concludes the work. Section 11gives a number of interesting directions for future work.

2. Related work

The information extraction process begins by specifying theinformation needed by a user/application. Often, this takes theform of a query or event trap. Then, an information extraction ap-proach selects a relevant subset of data or nodes carrying data thatis relevant to the needed information from a larger set. To improvethe accuracy of the extracted information and to minimise the costof collecting raw data, high-level programming abstractions can beused [4–9]. Programming abstractions allow the user to specifyhow and where a query is disseminated. A large and growing bodyof literature has investigated micro- and macro-programmingabstractions in the context of WSNs. Many methods and formal-isms for abstraction have been proposed, but their design has oftenbeen focussed on synthetic problems, rather than requirements ofworking systems, such as the example given above. In this sectiondifferent mechanisms to achieve abstraction are presented. We re-fer the interested reader to the recent survey [10] and the refer-ences therein for a comprehensive review of WSNs programmingapproaches.

traction from sensor networks using the Watershed transform algorithm,

Page 3: Information extraction from sensor networks using the Watershed transform algorithm

M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx 3

SPINE [11] and MAPS [12] are two effective paradigms for theprogramming of WSN applications. The former is a function-ori-ented and domain-specific framework that supports distributedsignal processing and data classification functions over sensornodes. The latter adopts an agent-oriented view in the design ofa WSN programming framework for the Oracle Sun SPOT [13] de-vices. In MAPS, mobile agents support the programming of WSNsat various levels. The inherent properties of mobile agents, e.g.,autonomy and adaptability to distributed and open environments,means that MAPS can significantly improve the design and thedevelopment of WSN applications. However, the SPINE and MAPSframeworks remains platform-specific, which limits there applica-bility. To deal with this issue, SPINE2 [14] has been proposed.

SPINE2 is designed for the development of signal processingapplications on WSNs through a task-oriented programmingabstraction. In SPINE2, applications are considered as a set of inter-connected tasks. A task encapsulates a well defined activity, e.g.,feature extraction. SPINE2 is based on the concept of Virtual Sen-sors to provide platform-independent programming environmentand to improve architectural modularity and design reusability.Also, SPINE2 uses the C-language to achieve platform indepen-dency for C-like programmable sensor platforms. The very highportability of the framework is, moreover, due to the strong soft-ware decoupling between the platform independent runtime exe-cution logic and the platform-dependent components needed foraccessing the services and resources offered by the platform.However, the platform-dependent components have to be custom-ised for different platforms, which limits reusability and seamlesssoftware composition. Titan [15] is another task-oriented butservice-centric framework that uses interconnected services as aprogramming abstraction. It defines a programming model thatlinks nodes at run-time based on their opportunistically discoveredresources to compose activity or context aware applications. Titanis similar to our proposed approach in respect to its ability todynamically cope with changing node conditions. However, Titancannot benefit from opportunistic sensing in the absence of com-prehensive expert-knowledge and activity recognition servicesfor each possible available sensor.

A significant category of programming languages and systems isthe neighbourhood-based and the high-level representation of thenetwork. The Hood [16] and Regiment [17] are two examples ofsystems that are based on local behaviour to manipulate sensornodes collectively. They provide a group of operations to allow net-work programmers use network regions and data streams as thebasic programming abstraction. Kairos [18] provides programmerswith three levels of abstraction: node abstraction, list of directneighbours, and remote data access. Compared to Regiment, Kairosprovides a small number of constructs for its three levels ofabstraction, whereas, Regiments offers a bigger set of operationsand data types. EnviroTrack [19] is a programming abstraction par-ticularly designed for target-tracking applications. It has a complexgroup management protocol that assigns nodes that sensed thesame event to the one logical group. These systems, such as Regi-ment and Kairos, are tailored for rather narrow range of targetapplications. In SPIDEY [9], a node is defined using a logical nodetemplate that has several exported attributes (dynamic and static);the abstraction definition is composed of a node template and aninstantiation. Pathak et al. [4] presented a data-driven macropro-gramming framework to solve the problem of initial placementof high-level task graph representation onto the nodes of the targetnetwork. Srijan [6] is another graphical toolkit designed to aidapplication developers in the compilation of the macroprograminto individual customised runtimes for each constituent node ofthe target network. Pathak and Srijan are centralised in naturebecause the sink knows the location and initial energy levels ofall sensor nodes. Hnat et al. [5] implemented a macroprogramming

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

compiler for MacroLab [7]. The system-wide or global computationscope provided by MacroLab limits the efficiency of the newcompiler.

In the approaches described above, the utilisation of the physi-cal network topology as a programming abstraction introducessome rigidity to the programming approach. Handling frequenttopology changes can be expensive, potentially involving globalabstraction changes. Besides, node-level abstraction still requiresthe programmer to be responsible for ensuring that the distributedapplication achieves its intended functions efficiently. The infor-mation extraction approach proposed in this paper is differentfrom these systems in one vital characteristic; it provide loose timesensor node state synchronisation among all nodes in the network,which makes it more suited for real-time applications character-ised by frequent topological and state changes. Our proposed ap-proach uses asynchronous operations to dynamically maintainsegments memberships in a distributed mode to guarantee thatthe system remains in the correct state for the most part of its life-cycle with minimum interruptions.

One particular approach that is similar to the one proposed inthis paper is the Logical Neighborhood. Both approaches utilisecommunication cost and similar patterns of sensed data to assignnodes into logical collections. The generated abstraction allowsthe user to extract information at a pre-defined cost and at higheraccuracy by limiting a query scope to arbitrary subsets of nodeswith desired characteristics. We compare the performance of theproposed approach against that of the Logical Neighborhood. Webelieve that the comparison is fair as the two approaches are sim-ilar in spirit but different in approach. Below we provide a briefdescription of the Logical Neighborhood approach.

Logical Neighborhood [9] is a virtual node programmingabstraction. A node is characterised by several exportable attri-butes that can be static or dynamic. These attributes are used to re-place the notion of geographic neighbourhood (defined by radiorange) with the notion of logical neighborhood (defined by applica-tion conditions). Logical neighborhood regions are defined declar-atively using the state of the node and on the communicationcost. These logical neighborhoods are implemented using SPIDEYprogramming language [20] and are supported by a special routingstrategy enabled by SPIDEY programming constructs. The authorsof the Logical Neighborhood assure that their approach can be ontop of any routing protocol. However, they confirm that the utilisa-tion of existing routing mechanisms to support their abstractionswill result with various performance drawbacks. Another limita-tion of this approach is that the user is expected to manuallydeclaratively specify using the SPIDEY language which nodes toconsider as neighbours. This is not always easy to achieve; the con-figuration of various parameters requires the developers to havedeep understanding of the nature of data being collected and goodknowledge of the network topology.

3. Description of the Watershed algorithm

Watershed transformation has proven to be one of the mostpopular segmentation techniques used in grey-scale mathematicalmorphology [3,21–29]. Its name originates from its similarity withhydrology. This algorithm is usually used for segmenting an imageinto a set of non-overlapping regions. It uses an intensity-basedtopographical representation, where light pixels represent higheraltitudes and dark pixels represent valleys of an image landscape.The basic idea can be easily explained using the analogy of rainfalling on a landscape. All the rain falling flow with gravity fromhigh altitude areas along lines of steepest descent until it istrapped in a catchment basin. Each catchment basin of the imagehas a corresponding regional minima. The catchment basin growsin size as the amount of falling rain increases. When two basins

traction from sensor networks using the Watershed transform algorithm,

Page 4: Information extraction from sensor networks using the Watershed transform algorithm

Fig. 1. An example of a regional minimum.

Fig. 2. An example of a steepest descending path.

4 M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx

start spilling into one another, a dam is built to stop them frommerging into a larger basin. These dams, called the Watershedlines, split these catchment basins and correspond to the desiredsegmentation.

Because of its importance, specific variants and implementa-tions of the original Watershed transform algorithm has been pro-posed to suit a wide range of applications. These algorithms maybe categorised into two theoretically separate approaches: rainingand immersion.

1. Raining: Simulates pouring down water from top (as raining) ontopographic surface. It is regarded as a distributed approachsince every droplet follows, independently from neighbouringdroplets, the steepest descent until reaching a minimum. Thisprocess continues for every droplet that falls on the surfaceand stops when each point is allocated to a minimum. All drop-lets that follow to the same local minimum form a catchmentbasin (otherwise known as segment). No point belongs to aWatershed line; instead each point is considered as belongingto a particular catchment basin. This view was adopted in thework of [23–26]. There are a number of raining implementa-tions targeting parallel processing [28,29] and hardware adap-tation [27].

2. Immersion: The immersion approach can be conceptuallyviewed as an approach that starts from low altitude to high alti-tude. It simulates gradually immersing the entire topographicsurface in a water tank. The immersion process proceeds stepby step, globally increasing the altitude. The sources for theflooding are the local surface minima. Watershed lines are con-structed whenever the water from different sources meet. Thecatchment basins, catchment areas or Watersheds are the sur-faces delimited by these lines. There exists many modificationsaround this technique [22].

On flat zones or plateaus, the water droplet follows the path to-wards the closest edge of a descending slope and it stops once itarrives at a local minimum. An important factor of developing aparallel asynchronous algorithm is the utilisation of the data local-ity for minimising the computation and communication costs. Toachieve this aim, we present a reformulation of the raining Wa-tershed algorithm because of its suitability for parallel processingsystems.

To begin with, we present Meyer’s formalism [30] thatdirectly segments the input image using local conditions. TheWatershed developed by Meyer is the ground of our parallelasynchronous reformulation. The authors in [31] reviewed someof the significant parallel and distributed variations of theWatershed algorithm on computationally rich devices. Priorgiving the details of our proposed synchronous and parallelreformulation, we introduce some basic definitions as given byMeyer [30].

Suppose that f(p) is a function of grey-values, representing acontinuous digital image with the domain X 2 Z2. All pixelsp 2X have a grey-value f(p) and a set of neighbouring pixelsp0 2 N(p) with a distance function d(p,p0) to every neighbour;the majority of algorithms in the literature use a 4- to 8-neighbours.

Definition 1 (Regional minimum). Is a connected components ofpixels with a constant grey-value, where external boundary pixelsall have a higher value [23]. In keeping with the rain analogy, aregional minimum is the zone where the water would settle downwithout overflowing to lower levels. Fig. 1 gives an illustration ofthe regional minima using an example digital image. The double-line square encloses pixels that are members of the regionalminima indicated in the figure.

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

Definition 2. The lower slope (LS) of p is defined as the maximalslope connecting p to any of its neighbours of lower grey-level thanitself.

ðLSÞ ¼ max8p02NðpÞf ðpÞ � f ðp0Þdistðp;p0Þ jf ðp

0Þ 6 f ðpÞ� �

Definition 3 (Steepest Descending Path). "p 2X, the SteepestDescending Path (SDP) is the set of points p0 2 N(p) defined asfollows:

p0 2 NðpÞ f ðpÞ � f ðp0Þdistðp;p0Þ ¼ LSðpÞ; f ðp0Þ < f ðpÞ����

� �

i.e. the SDP is a chain of linked points where every point has a grey-value precisely less than its predecessor. Each point may have morethan one neighbour with a grey-value less than its own value;therefore, there may be many descending paths from a given point.The SDP is the path where every point in that path is linked to theneighbour with the lowest grey-value. In keeping with the rainanalogy, a droplet that falls over a point on the surface flows tothe regional minimum along the path of steepest descent. Fig. 2shows the SDP from the point (0,0) to the regional minimum situ-ated in (4,4).

Definition 4 (Cost function based on lower slope). The slope of thesurface has strong influence on path selection; flatter surfaceallows for faster, easier, and more direct walking. The slope-basedcost function, cost(pi�1,pi), for travelling from point pi�1 topi 2 N(pi�1) is:

LSðpi�1Þ � distðpi�1; piÞ f ðpi�1Þ > f ðpiÞLSðpiÞ � distðpp�1;piÞ f ðpi�1Þ < f ðpiÞ12 ðLSðpi�1Þ þ LSðpiÞÞ � distðpi�1;piÞ f ðpi�1Þ ¼ f ðpiÞ

8><>:

Definition 5 (Topographic distance). Distance is the primary costof walking on a surface. The topographic distance between twopoints p and q is the minimal p distance among all paths p�between p and q:

traction from sensor networks using the Watershed transform algorithm,

Page 5: Information extraction from sensor networks using the Watershed transform algorithm

M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx 5

TDf ðp; qÞ ¼ infTDpf ðp; qÞ

where TDf ðp; qÞ ¼Pn

i¼2 costðpi�1; piÞ is the topographical distance ofa path p = (p = p1, p2, . . . , pn = q), such that "i, pi 2 N(pi�1) and pi 2X

Definition 6 (Catchment basin based on topographic dis-tance). CBTD(mi) of a local minimum mi is defined as the set ofpoints p 2X that are topographically closer to mi than to any otherregional minimum mj,:

CBTDðmiÞ ¼ p f ðmiÞ þ TDf ðp;miÞ þ TDf ðp;mjÞ8j–i��� �

Specifically, a CB is composed of a uniquely labelled regional mini-mum and all the points whose SDP lead to it [24]. In keeping withthe rain simulation analogy, a CB is a zone where all rain fall isconveyed to. Fig. 3 shows a digital image segmented into two CB(highlighted with different grey levels). The squares indicate theregional local minima of each CB and the arrows represent thesteepest descending paths indicating the direction of descent.

4. The Watershed algorithm suitability for WSNs

The Watershed algorithm is intuitively understandable and of-fers a wide range of prospective modifications to fit specific goals.It can be easily customised to include diverse external factors thatmay have influence on the segmentation outcome, e.g., physicalobstacles between nodes. Watershed algorithm can easily beadapted to work with multimodal WSNs. Each sense modalitycan be viewed as a separate digital image to be segmented. A nodemay have many SDPs leading to different minima and it may joinmultiple catchment basins each correspond to one sense modality.The segmentation steps across the different sense modalities canbe combined in one step to reduce resource utilisation.

The concept of Watersheds can be implemented in a distributedasynchronous mode with fast computation time. The localisationof the process of building catchment basins is a beneficial factorfor immense and frequently varying data sets, posing it appropri-ate for WSNs applications. A parallel and asynchronous implemen-tation is expected to reduce data communication across thenetwork and evade the need for global re-computation/synchroni-sation of the entire network segments once one or more observa-tions are changed. Watershed algorithm can also be modified tobase the segmentation process on local conditions, which helpsin parallelising this process (see Section 5).

In the Watershed algorithm, the topographical distance be-tween a point and its regional minimum equals to f(p)–f(mi). Thismeans that a SDP guarantees a minimal communication cost. Theminimisation of the bridging distance between a point and itsregional minimum helps the WSN to save energy. The effective en-ergy gain is the minimum squared distance between two points.Finally, in Watersheds there are few parameter decisions and itdoes not make any assumptions that limits its applicability.

Fig. 3. Image segmented into two catchment basins.

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

5. The modified Watershed algorithm

5.1. Neighbours definition using Shepard’s method

In the wide body of literature, most Watershed transform algo-rithm variations use a 4- to 8-pixel neighbours to define theboundary of a point. However, the topographical-distance-basedtransform may result in non-minima plateaus with nonempty inte-rior in situations where non-minima points do not have a neigh-bour of lower value. Moreover, nodes may be located at differenttopographic distances from the node of interest. Hence, an addi-tional ordering relation between such points is required. To ad-dress these issues, we propose using Shepard’s method [32] toconstruct the set of neighbours of a point P. Shepard defined twometrics:

(a) Arbitrary distance metric:This metric is to compute the geodesic distance to the lowerboundary of the plateau. All points located within radius r from pare considered in determining the SDP. The calculation of this met-ric is computationally simple, but there is a possibility that thereare zero or excessive number of neighbours within radius r.

(b) Arbitrary number metric:Just the nearest n neighbours are included in determining the SDP.This metric disregards the spacing and relative location of thepoints and necessitates complicated ranking process and expensivesearching for points. Moreover, it accepts a fixed number, n, ofneighbouring nodes as optimal.

A combination of the two metrics captures their advantages. Aninitial search radius r is established depending on the total numberof points. On average, a maximum of seven points was defined tolimit the complexity and the amount of computation required. ris defined as:

pr2 ¼ 7AN

were A is the area of the minimum bounding box enclosing allpoints. The authors of [33] present a detailed study about the suit-ability of this inclusion function for WSNs.

5.2. Incorporate the hop count in determining the SDP

A path from location x to location y on a terrain is a descendingpath if the value of a point p never increases as we move p alongthe path from x to y. If p always moves through the neighbour withthe smallest value, then the path from x to y is the SDP. It is possi-ble to have several descending paths from p to its local minimum,the selection among them is mostly nondeterministic or imple-mentation dependent. However, when the Watershed algorithmis applied in WSNs applications, a computationally efficientdescending path is preferred. The authors of [34] identified com-munication as the most energy expensive operation on a sensornode; they calculated the energy consumption and found that ageneral-purpose processor could execute 3 million instructionsfor the same amount of energy used to transmit 1 Kbit of data byradio over 100 m. The communication cost is dependent on thetransmission distance, number of intermediate communicationhops, link quality, and other elements. Hop count is a metric thatis commonly utilised by routing protocols to determine the dis-tance between two hosts or for organising nodes in resource-effi-cient clusters (e.g., [35]). The energy spent in communication isrelative to the square of the communication distance betweenthe sending and receiving hosts. The distance between two nodes

traction from sensor networks using the Watershed transform algorithm,

Page 6: Information extraction from sensor networks using the Watershed transform algorithm

6 M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx

can be estimated on the basis of incoming signal strengths or di-rectly using the global positioning system. The energy expenditureis also proportional to the scale of the network [36]. Consequently,the lower communication distances should be considered as afunction of the inter-sensor Euclidean distance and the numberof intermediate communication hops. The resulting distance func-tion can be defined as the summation of the single inter-nodeEuclidean distances divided by the hop count.

Distðp;p0Þ ¼P

distðpi;pjÞhc

where i, j < hc � 1 and hc is the hop count.

6. Segments as a scoping mechanism

In the wider body of literature [4–9], most existing program-ming abstractions are a part of the query language used, thus lim-iting its applicability. In fact, system developers are concernedwith the application logic as well as specifying the nodes to partic-ipate in a specific task and how to communicate with them. Thebenefits of restricting the number of nodes participating in a taskare to reduce energy and bandwidth consumption along withimproving information accuracy by removing task-irrelevantnodes from the task computation. For instance, querying tempera-ture data over the whole network might end up getting the amountof data that is linked with the density of sensors but not necessarilypertinent to the monitored phenomena.

A key contribution of the work presented here is the creation oflogical groups of nodes within the network, called segments. Thesesegments are used to constrain dissemination of queries to a sub-set of the most-relevant nodes. Thus, each query is sent to thesmallest possible number of nodes, and per-node query dissemina-tion overhead does not grow with network size. Segments can alsobe useful to many services other than information extraction, e.g.,the steepest path can be used by the routing service to determineenergy efficient routes.

The user can exploit segments as a component of a query, orsegments can be used by the system to autonomously make deci-sions on where a query should be disseminated. The formerrequires the extension of existing query languages to incorporatethe segment construct and the development of an associatedparser for the language. The building of query execution abstrac-tion as part of the used query language is beyond the scope of thispaper. In the latter, segments form a vital factor in making a deci-sion on where a query should be disseminated, how the query willbe processed within and across segments, and where the query an-swer will be finally resolved.

In the absence of dedicated query constructs, the concept ofscoping in the context of WSNs macroprogramming presents apractical solution. Our Watershed-based approach allows thedevelopers to reason about the network as a whole, instead ofviewing the network as a set of isolated nodes. Segments allowsdynamic complex interactions between system portions, which re-sults in less programming effort, reduced complexity, and morereliable code. In a segmented network, query dissemination andthe generation of query running plans are based on Watershed-de-fined (distance and sensor value) neighbourhood instead of theunreliable wireless communication range. Watershed segmentsare dynamically updated at low cost. This is due to the localisedindependent processing of segment membership. When the cur-rent state of the node or the state of a node in its steepest pathchanges significantly, the node initialises a local update processindependently from other nodes. Local computation results in re-duced calculation times and costs.

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

The Watershed algorithm groups sensor node segments, suchthat each segment is homogeneous with respect to some property.For instance, a segment may be a collection of nodes that are geo-graphically adjacent and they share a common temperature read-ing. When performing segmentation, the Watershed usesinformation about node location, its sensory values, and informa-tion about the boundaries of segmented nodes. Hence, the Wa-tershed algorithm integrates the topological (e.g., [16,17]) anddata-centric (e.g., [37,38]) group abstraction methods to eliminatetheir drawbacks. This property allows for obtaining accurateresults and presenting several local behaviours effectively, makingit possible to develop programs that articulate higher-level behav-iour further than that of the standard query-based approaches. Toaddress this emerging requirement additional programmingabstractions to distribute the computational load while maintain-ing results effectiveness are essential. The logical segmentsproduced by the Watershed algorithm substitute the rigid geo-graphical neighbourhood offered by the radio range with applica-tion-based, higher level abstractions. Such segments aregenerated in a way that the logical notion of proximity is defineddeclaratively and dynamically according to nodes properties, be-side restrictions on communication costs (particularly the widthof the segment). A network segment created using the concept oflogical neighborhood specified by applicative conditions is, thus,able to extract requested information with high fidelity.

All nodes within the same segment are automatically labelledwith an integer identifier called marker. The Watershed algorithmis supplied with membership templates. These templates encapsu-late a set of logical constraints that a node has to satisfy to belongto one or more segments. Templates are also used to dynamicallyupdate and maintain segment membership. Such segments givesystem programmers the ability to manipulate logical segmentsrather than isolated nodes or groups of adjacent nodes. Yet, pro-grammers are still capable of manipulating individual nodes orbroadcast regions, but they may indicate declaratively the part ofthe network to involve and therefore manage the scope of a queryto reduce computation cost.

7. Parallel asynchronous watershed implementation

This section, presents our proposed reformulation of the Wa-tershed algorithm that complies with the sensor nodes constraints,i.e., limited memory, energy, processing speed, and bandwidth. Pre-viously published Watershed algorithms required at least three glo-bal synchronisation points: minima detection; labelling; andflooding. This work describes a new fully Parallel and AsynchronousWatershed algorithm (for short called PA Watershed), in which sen-sor nodes perform local computations independently and in paral-lel without a global synchronisation point. This is of particularimportance when applied to in-network processing in WSNs, wherethe problem of synchronisation is exacerbated by slow and unreli-able data communication; and mitigation using sophisticated syn-chronisation protocols is difficult due to the constraints overcommunication bandwidth given by power consumption.

To apply the PA Watershed to WSNs segmentation, for a three-dimensional image, each pixel is considered as a sensor node. Inthis representation, the numerical value of each pixel determinesthe corresponding sense modality of a node located at position(x,y). The image resolution corresponds to the network density,i.e., the number of nodes per area unit. To simplify the implemen-tation of the PA Watershed algorithm, we assume a virtual grid isformed throughout the deployed network. Each node is assigned toa virtual grid cell and each cell can contain only one node.

A crucial aspect of the design of the PA Watershed algorithm isto exploit the data locality to reduce the algorithm’s processing andcommunication cost. The PA Watershed algorithm is based on the

traction from sensor networks using the Watershed transform algorithm,

Page 7: Information extraction from sensor networks using the Watershed transform algorithm

M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx 7

rain falling simulation model because it involves less communica-tion overhead, compared to immersion-based approaches whichare global in nature. The PA Watershed algorithm may be regardedas an asynchronous relaxation of the ‘Hill Climbing’ algorithm [30].This algorithm is similar to the rainfall simulation except that it be-gins from the minima and climbs by the steepest ascending slope.In this algorithm, most computations can be viewed as s collectionof independent tasks that can be run in parallel on different sensornodes. Each node runs a simple finite state machine linked to everyindividual sensed modality. Moreover, nodes use non-blockingcommunication to improve performance by overlapping computa-tion and communication. This approach avoids the need of any glo-bal synchronisation among sensor nodes and results in a very highcomputational efficiency and running speed of the algorithm.

The PA Watershed algorithm labels each node with the identi-fier of its catchment basin by ‘walking’ along the SDP towards toa regional minimum. Primarily, all nodes are marked by initialtemporary labels and assumed to be non-minima nodes. Each nodestarts searching for the steepest descent among its neighbours. If asteepest descent is detected, then the node will be relabelled tomatch the label of its corresponding steepest neighbour. When aregional minimum is finally reached, its label is propagated up toall nodes along the path. The sequence of nodes along the path willbe relabelled to mark the association to that regional minimum,hence concluding this search process. Because the search processis executed and terminated autonomously by individual nodes,the total idle time and communication overhead is reduced signif-icantly. In situations where a regional minima cannot be reached,e.g., flat zone, the steepest descent must be calculated from thecomplete set of lower borders.

Algorithm 1. The proposed PA Watershed algorithm.

Input state S(u) of node u; current sensor reading d(u); d(v)sensor readings of neighbours

Output segment membership after processing new datacase (S(u) = ‘‘initial’’)

u broadcasts d(u) and label l(u) to its neighbours N(v)u listens for data from every neighbour vi 2 N(v)u computes the following:

N(v)= all neighbours with sensor value equal to d(u)N(v)P all neighbours with sensor reading greater than

d(u)N(v)Ln = vi a unit set where slope(u,vi) = LS(u) and vi 2 N(v)if N(v)Ln = ø

lðvÞ minv2NðvÞ¼ ðlðvÞÞS(u) ‘‘MP’’

elseS(u) ‘‘NP’’

case (S(u) = ‘‘MP’’)u listens for new data, h0(v), from any neighbourif h0(v) < h(u)

N(v)Ln v;S(u) ‘‘NP’’else

l(u) min(l(u), l(v)); S(u) ‘‘MP’’case (S(u) = ‘‘NP’’)

u waits for data from N(v)Ln

l(u) min(l(u), l(v))S(u) ‘‘NP’’

In all cases, u sends any new reading >T to all nodes in N(v)P

In multi-modal sensor networks, a number of paths from a sin-gle source to multiple destinations exists. Each path represent adifferent sense modality. Sensor nodes are assigned a separate la-bel for each sense modality. The search process for the regional

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

minima associated with each sense modality can be combined inone step. This does not add extra cost on the nodes, or on the net-work in general, due to the broadcast nature of wireless communi-cation. In practice, a node sensor reading can change frequentlynecessitating relabeling all nodes above it in the SDP with thenew minimum label. A threshold can be defined to avoid unneces-sary updates resulting from insignificant changes in sensor read-ing. The segment membership update process can be mergedwith other communication tasks, e.g., data transmission, to reducemembership management overhead. Depending on the applicationrequirements, the membership update for various sense modalitiescan be aggregated in one step. To keep track of segment member-ship update along the SDP, a flag (called reset) is defined at eachnode to monitor whether a change took place since the last datacommunication.

The PA Watershed does not require frequent synchronisationamong nodes. It takes advantage of the asynchronous computa-tions at the programming level and for its implementation. Theminima detection, labelling, and climbing of the SDPs run on allnodes asynchronously and in parallel. An important shortcomingof the PA Watershed is that inner nodes of a plateau cannot deter-mine independently if they belong to non-minimum plateau (NP)or minimum plateau (MP). In its current form, the PA Watershedrequires global synchronisation to recognise and label MPs. Toovercome this problem, all inner nodes of a plateau are classifiedinto two groups: plateau and minimum. The plateau is the groupof nodes whose neighbours have a sensor value of equal or greatervalue than its sensor value. The minimum contains every node ofthe plateau that has a neighbour with a sensor value less than itsown sensor value. Nodes in the minimum serve as seeds for prop-agating their labels to inner nodes on the plateau. The labels prop-agate to progressively fill the entire plateau.

Algorithm 1 shows the pseudocode of the PA Watershed algo-rithm, which runs on every node. Regardless of the network size,each node only communicates with a limited number of neigh-bours located within its radio range. Every node will learn onlythe next hop, not the complete hop-by-hop route, to the minima.The algorithm starts by each node broadcasting its ID, its currentstate, as well as its sensors readings and corresponding labels(di(u), li(u)) to its neighbours. Then, nodes use the gathered infor-mation to group their neighbours in three sets: N(u)= contains allneighbours with sensor values equal to d(u); N(u)P is the set ofall neighbouring nodes with sensor readings greater than d(u);N(u)Ln = vi is a unit subset of !, where !(u) = {vi 2 N(u)—slope(u,vi) = LS(u). If !(u) has two or more nodes, then the nearestnode to (u) is chosen from !(u), if the set is empty, then the node ison a MP. The node will keep monitoring changes in neighboursdata. When the node current state is minima or plateau and re-ceives new data such that h0(v) < h(u), then it changes its state toNP; otherwise, a representative label is calculated for that plateauor minima. The representative label is that of the node with thesmallest reading in the plateau. This means that all neighbours ofa minimum node with equal reading to that node, relate to oneconnected component. A connected component of an undirectedgraph is a subgraph such that any pair of vertices are connectedby paths and are connected to no additional vertices [39]. Whena non-minimum plateau node receives a new data message, it up-dates its label to the minimal label of a minimum plateau. There-fore, all changes are dealt with locally and independently fromother nodes, which keeps the cost of dynamic segmentation verylow. Nodes exploit the broadcast nature of wireless communica-tion in updating their state or label. When a node forwards the dataof its predecessors or when it hears neighbours broadcast mes-sages, it can use these messages to update its label. To increasethe algorithm efficiency, a threshold T is defined to indicate

traction from sensor networks using the Watershed transform algorithm,

Page 8: Information extraction from sensor networks using the Watershed transform algorithm

8 M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx

significant changes that need to be reported to neighbours. Sincedata messages can be captured by nodes due to the broadcast nat-ure of wireless communications, nodes will also use the thresholdto decide locally whether they should start an update process ornot.

8. Query resolution mechanism in PA Watershed

The PA Watershed divides the feature space into distinct seg-ments returning a hierarchical description of the data in the formof a tree. Moreover, it provides each sensor with a path, i.e., theSDP, over which quires and their answers can be transmitted tothe querying node. SDPs can be exploited by queries to exploresensors data starting at the root and going only as deep as is nec-essary to provide answers with the desired quality. This greedy as-cent algorithm is reasonably straightforward; the query is initiatedat the root of the tree, i.e., at the local minimum, and every node inthe segment makes decisions independently based on the satisfi-ability of the query conditions by its local data. If an ascent is nec-essary, the query is forwarded to the next node on the SDP.

A traversal of the SDP using our greedy algorithm will stop atnodes at various levels that all satisfy the query. Let the section Sof the SDP be the set of nodes at which the query stops. Then,the structure of the tree above S, i.e., all nodes that have a prede-cessor in S, is irrelevant to the cost of query, as the query traversalwill never ascend that far. In this simple version of the query res-olution method, if a decision is made to forward, then only the nextchild will receive the query. More elaborate schemes can be de-signed to assign different priorities to children and perform morecomplex selective forwarding.

Consider the following example to explain the mechanism forresolving a specific multi-modal query: ‘‘Return the temperatureat each sensor node, where the air pressure is less than 950mbar’’.This query is expressed using simple inclusion predicate, which is acondition of the form ‘‘current reading > threshold’’. We call suchpredicates, whose evaluation depends also on the readings takenby the nodes, dynamic predicates as they specify which nodesshould include their response in the query, i.e., nodes whose valuesexceed a given threshold.

First, a query is propagated from the sink toward relevant localminima by end-to-end routing. A minimum being defined by somecriterion, for instance that the pressure 6950. Once the set of seg-ments to sample from has been selected, data must be queriedfrom the subtree rooted at local minimum. The choice of the datapoint can be made by problem of ascending up the subtree untila node that is qualified to reply the query is found. After the querytraverses a node with pressure reading P950 or when the queryreaches the leaf node, sensor nodes with pressure readings fallinginto the query window recursively report their temperature read-ings to the nodes from which they received the query. Partial re-sults on every SDP are returned level by level up the tree untilreaching the root node. Whenever possible, data is aggregated asquery responses move down the tree and sent in the same datapacket. After aggregating the data from all SDPs, the aggregated re-sult is returned back to the sink node, again by an end-to-end rout-ing protocol.

9. Evaluation

In this section, the efficiency of the PA Watershed producedprogramming abstraction is evaluated in comparison with moretraditional query resolution methods, in-network and centralisedprocessing, as a baseline. Moreover, in the absence of recent di-rectly applicable approaches to compare against, we choose theLogical Neighborhoods [9] approach. To this end, we implemented

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

PA Watershed and Logical Neighborhood abstractions on top ofMuMHR [35] routing protocol and evaluated them using the Dingo[40] simulator. In our experiments, programming abstractions areused as a query scoping mechanism to constrain dissemination ofqueries so that the entire network is not flooded.

9.1. Dingo WSNs simulator

Dingo provides tools for the simulation and deployment ofhigh-level, Python code on real WSNs. Dingo users write high levelsimulations, which can then be refined to various hardware plat-forms (e.g., Gumstix [41]). In addition, Dingo allows mixed modesimulation using a combination of real and simulated nodes. Theauthors found Dingo not only easy to use, but also powerful en-ough to model and simulate the behaviour of systems at variousdesign stages. In Dingo, nodes have the ability to obtain theirsensed data from a database or graphical objects like maps. Theuse of real-world data improves the fidelity of simulations as itmakes it possible to check the simulation results against the realdata. Furthermore, Dingo has several features in the form ofplugins. These can be activated/deactivated on the plugin menu.Network topologies can be loaded and saved. The ‘‘Topology’’ menucan be used to change the network topology of a simulation from arandom topology to/from a grid.

9.2. MuMHR routing protocol

In our experiments we make use of MuMHR (Multi-hop Multi-path Hierarchal Routing) because it has proven to be an energy effi-cient and robust routing protocol. MuMHR is already implementedin Dingo, which allowed the authors to direct the implementationefforts on the programming abstractions.

9.3. Simulation setup and scenario

In all simulations, we configured a number of parameters basedon the Crossbow [42] wireless sensor hardware platform. Concur-rently, we tried to conform to values set in the synthetic scenarioin [9] for better comparison. Nodes were dispersed over themonitored region such that no two nodes share the same locationand the transmission range of each node is bound to 75 m. Thebandwidth of the channel was set to 1 Mbps, each data messagewas 500 bytes long, and the packet header for various messagetypes was fixed to 30 bytes. A simple model for radio hardware en-ergy dissipation is also assumed. The transmitter dissipates energyto run the radio electronics and power amplifier; and the receiverdissipates energy to run the radio electronics. All the nodes weregiven an equal initial supply of energy. The processing delay fortransmitting a message is randomly chosen between 0 and 5 ms,simulating real-world characteristics of low-power radiotransmission.

The simulated scenario is the deployment of a WSN to monitorthe temperature in an environmentally sensitive area. Each node isinitially configured with a single attribute, i.e., temperature. Thenode’s temperature reading is perceived as the colour intensitycorresponding to its position on a thermal map. All simulationsemploy a thermal map, Fig. 4(a), taken from [43]. This thermalmap is fed in the simulator as an image at the system startup. Tosimulate dynamic conditions, a list of events in terms of significantincrease of temperature readings on some nodes was configured.The set of nodes at which these events are to occur was defined,initially nondeterministically, before the start of each simulationrun. In each run, between 5 and 10 events occur at predefinedintervals. An event occurrence is followed by a query to locatethe lowest and highest temperature readings in the entire network.Each simulation run lasted 1000 s.

traction from sensor networks using the Watershed transform algorithm,

Page 9: Information extraction from sensor networks using the Watershed transform algorithm

Fig. 5. Comparison of the messaging overhead involved in resolving the query atdifferent network densities.

M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx 9

The PA Watershed algorithm has been implemented startingfrom the Python code of SciPy Cookbook available at [44]. Eachnode runs a copy of Algorithm 1 until stabilisation. Then, a request(query) is made to the sink node using a simple graphical userinterface. Dingo provides a chart pane where individual node dataor network data can be plotted during the simulation.

Due to the effect that network density and nodes distributionhave on the performance analysis results, multiple simulation runsare combined to estimate uncertainties in the simulations. In otherwords, to demonstrate that the results are not biased to specificnetwork setup, we ran the same experiments for 5 different distri-butions. This makes our simulation a Monte Carlo simulation, asrepeated sampling from a distribution is performed. Therefore, ineach simulation run, network nodes were randomly positioned inthe simulation window. At each point, the measured metric of fiveruns was averaged.

9.4. Performance metrics, results and analysis

Query resolution is implemented in four algorithms: PA Wa-tershed, segments are used as high-level programming model tosupport query resolution; Logical Neighborhood, the network ispartitioned into logical groups as described in [9]; In-network pro-cessing, by data aggregation up the network spanning tree; andcentralised, by collecting and analysing all data at the sink.

Fig. 4(b) shows the network segmentation results using the PAWatershed algorithm. It is straightforward to extend the segmen-tation results of the thermal map using the generalised Voronoi,i.e., every location where there is no sensor node is allocated toits closest segment, Fig. 4(c). Since the most power consumingoperation is wireless communication, a query resolution cost ismeasured as the traffic overhead generated to answer the query.Fig. 5 shows the communication overhead generated by each ofthe four studied query resolution methods. The results indicatethat on average, segment-based query processing resulted in an al-most 79.5% decrease in the total number of transmissions over thecentralised approach, 66% decrease on average over the in-networkprocessing approach, and 40% decrease on average over the LogicalNeighborhood approach.

Centralised query collection incurs the highest messaging over-head amongst the four tested approaches. This is because the queryis received by every node in the network. If a node is carrying datarelevant to the posed query, then it sends it to the base stationthrough multi-hop communication.

In-network data aggregation attempts to limit the number ofexchanged messages while providing accurate answers to posedqueries. Sensor nodes form an aggregation tree, where parentnodes aggregate the data received from their children and forwardthe result to their own parents. However, similar to centralisedquery resolution, in-network data aggregation query resolutiondoes not provide a mechanism for isolating nodes carrying datairrelevant to a query. Some queries include the characteristics ofnodes to participate in the response; however, the query still hasto be received by all nodes in the network. Moreover, when the

Fig. 4. (a) Test thermal map; (b) PA Watershed segmentation results; and (c)Extend segmentation results.

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

query is dealing with multiple elements targets, the size of col-lected data can potentially be excessive. This may require the frag-mentation of data to be transmitted through multiple messages,which results in dramatic increase in the number of transmissionsas the data moves towards the base station.

It is clear from Fig. 5 that Watershed-based query resolutionoutperforms Logical Neighborhood in terms of messagingoverhead. The separation of the Logical Neighborhood from itsexpressly devised routing protocol resulted in an increased com-munication overhead. This is because of incorporating the proto-col-dependent communication cost in the group membershipfunction. As a result, the span of communication also changes inan undesirable manner. Watershed segment abstraction leveragedlocalised interactions without pre-installing neighborhoodtemplates.

The number of nodes that participated in answering a specificquery has impact on the total response time, energy consumption,and accuracy of the response to that query. Eliminating nodes car-rying information unrelated to a specific query reduces energyconsumption and data analysis time as well as improves the re-sponse accuracy. Fig. 6 presents the number of nodes participatedin resolving a query at various network densities using the PA Wa-tershed and the Logical Neighborhoods algorithms. A node iscounted as involved in generating a query if it sends or forwardsthe query message or a response message associated to this query.The results show that on average, segment-based query processingresulted in an almost 24.5% decrease in the total number of nodesthat participated in answering the query over the Logical Neigh-borhoods approach at an identical accuracy level. The exclusionof non-relevant nodes to the query initial search allows drillingdown into the exact response; this new capability is a positive indi-cator of the scalability of the approach. Watershed leverages theknowledge of the information distribution, which makes it morefeasible than the Logical Neighborhoods and other traditional ap-proaches. In the Watershed-based approach, the neighbours setis built using the steepest paths metric, which provides an efficientway to sort nodes according to the data they carry. Accordingly,many query types such as maximum, minimum, or average are re-solved involving small portions of the network. Moreover, theShepard distance metric adopted in the Watershed algorithm helpsto enhance localised interactions by limiting the span of the neigh-bourhood. The combined Shepard neighbourhood metric resultedin better results than the cost construct defined in the LogicalNeighborhood. In the latter, cost is measured in credits on a per-node basis without considering the local and global node density.Thus, our approach performed better in determining the networksections to participate in resolving the posed query and how toreach them.

traction from sensor networks using the Watershed transform algorithm,

Page 10: Information extraction from sensor networks using the Watershed transform algorithm

Fig. 6. The number of nodes responded by sending their data to resolve the query.

10 M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx

Fig. 7 shows the convergence time for the PA Watershed and theLogical Neighborhoods averaged across multiple runs at differentnetwork densities. It can be seen that Watershed-based segmenta-tion always generates better quality logical groups without incur-ring an unacceptable increase in convergence time (�30%)compared to the Logical Neighborhoods. This low convergencetime is due to its asynchronous operation, which makes groupsegmentation a fast process. Also, the time required to update seg-ments is small, because each node is only dependent on the set ofnodes on its path to the minima.

10. Conclusion

This paper advocates PA Watershed segments as a high-levelprogramming abstractions for query-based information extractionsystems. It presents network segmentation as a possible effectivesolution to the problem of the limitations of pure query-based sys-tems. The logical constructs provided by Watershed offer an effec-tive method for processing user queries. Segments provide a usefulabstraction for the low level programming of nodes (the establish-ment of network topology and of radio communications) while stillproviding users with the ability to extract high-level informationfrom the network. The advantages of Watershed segments include:dynamic construction and updating of segments; eliminating theneed for compilers or new programming constructs; catering fornode level communication; localised computation; segment setupconsiders the physical topology provided by wireless broadcast;and the span of the segment improves response accuracy whilereducing the processing cost.

Experimental results presented in Section 9 show that in-net-work, segment-based query resolution results in substantialenergy savings compared to the baseline and Logical Neighborhoodschemes. The PA Watershed groups nodes into logical energy effi-

Fig. 7. Watershed and Logical Neighborhoods convergence time with logarithmicregression.

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

cient segments that minimise unnecessary communication and en-hance queries responses accuracy. Compared to the originalWatershed algorithm, the key enhancement in the PA Watershedis that travelling along the steepest paths and labelling are concur-rently and locally ran according to nodes states, throughout thecomplete segmentation process. A number of shortcomings so farin the work can be dealt with in the future, for instance the de-tailed study of segmentation setup and maintenance cost.

11. Future work

Leading directly from the work in this paper, there are a numberof avenues that need to be followed. Most importantly, investigatethe usefulness of the Watershed gradients to various network andapplication functions. For instance, in agent-oriented WSN systemsuch as [12], the gradient can be used for itinerary planning of themobile agents. The order in which sensor nodes are visited by themobile agent and the number of nodes it migrates to can have asignificant impact on energy consumption. Similarly, the gradientcan be used in generating more efficient query execution plans,i.e., the number of nodes to process a query and the order in whicha query reaches these nodes. For example, if the temperature atlocation (x,y) is 50 �C, then nearby locations temperature shouldbe correlated with that based on distance. The presented work usesthis natural gradient as a key characteristic to forward the querytowards the heat source. To select an optimal subset of sensornodes and to decide on an optimal order of how to incorporatethese measurements can be equally well supported by other meth-ods such as exploiting the relationships among multi-modal sensedata. As information about a particular event of interest is usuallycaptured in multiple sensed modalities, then, it is feasible to ex-ploit multi-modal relationships to merge multi-modal query dis-semination and data collection for different modalities together.Mathematically, this can be viewed as calculating a gradient thatis the derivative of a multi-variable function. However, now thatthere are multiple directions to consider; the direction of gradientis no longer simply ‘‘up’’ or ‘‘down’’ along the x-axis, like it is withfunctions of a single modality. The Watershed gradients can also beuseful to WSNs applications such as flood management. The gradi-ent can be utilised to create high-risk flood maps to predict, report,and control floods. These maps contain answers to queries relevantto the such tasks; for example, when rain falls on a catchment,what is the amount of rainwater that reached the waterways?

We also intend to investigate possible ways to exploit the gra-dient to give meaning to the collected data. For instance, in forestfire monitoring applications, the gradient at each point shows thedirection the temperature rises most quickly. The magnitude of thegradient determines how fast the temperature rises in that direc-tion. This information can be useful for firefighting operations. Bytaking a dot product, it is possible to measure how temperaturechanges in directions other than the direction of greatest change.Moreover, the gradient can be used as a visualisation tool similarto a vector map. It can be used to visually display the location ofsenor nodes as well as the direction and magnitude of data. Thisdata representation can be overlaid over geographic maps to allowusers from different communities to derive conclusions based onvisual information. This visualisation and analysis tool is visuallycommunicative, it provides information on spatial patterns, it im-plies the distributions and states, and it implies the relations ofvarious phenomena.

References

[1] B. Liu, H. Ju, Y. Yao, Object recognition and centroid detection based onmachine vision, in: Second International Conference on Mechanic Automationand Control Engineering, 2011, pp. 5945–5947.

traction from sensor networks using the Watershed transform algorithm,

Page 11: Information extraction from sensor networks using the Watershed transform algorithm

M. Hammoudeh, R. Newman / Information Fusion xxx (2013) xxx–xxx 11

[2] F2F, RFID from Farm to Fork, 2013. <http://www.rfid-f2f.eu/> (accessed15.03.13).

[3] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm basedon immersion simulations, IEEE Trans. Pattern Anal. Mach. Intell. 13 (6) (1991)583–598.

[4] A. Pathak, V.K. Prasanna, Energy-efficient task mapping for data-driven sensornetwork macroprogramming, IEEE Trans. Comput. 59 (7) (2010) 955–968.

[5] T.W. Hnat, T.I. Sookoor, P. Hooimeijer, W. Weimer, K. Whitehouse, A modularand extensible macroprogramming compiler, in: Proceedings of the 2010 ICSEWorkshop on Software Engineering for Sensor Network Applications, SESENA’10, 2010, pp. 49–54.

[6] A. Pathak, M.K. Gowda, Srijan: a graphical toolkit for sensor networkmacroprogramming, in: Proceedings of the 7th Joint Meeting of theEuropean Software Engineering Conference and the ACM SIGSOFTSymposium on The Foundations of Software Engineering, ESEC/FSE ’09, 2009,pp. 301–302.

[7] T.I. Sookoor, T.W. Hnat, K. Whitehouse, Programming cyber-physical systemswith macrolab, in: Proceedings of the 6th ACM Conference on EmbeddedNetwork Sensor Systems, SenSys ’08, 2008, pp. 363–364.

[8] R. Newton, G. Morrisett, M. Welsh, The regiment macroprogramming system,in: Proceedings of the 6th International Conference on Information Processingin Sensor Networks, IPSN ’07, 2007, pp. 489–498.

[9] L. Mottola, G.P. Picco, Using logical neighborhoods to enable scoping inwireless sensor networks, in: Proceedings of the 3rd International MiddlewareDoctoral Symposium, 2006, pp. 6–12.

[10] L. Mottola, G.P. Picco, Programming wireless sensor networks: fundamentalconcepts and state of the art, ACM Comput. Surv. 43 (3) (2011). 19:1–19:51.

[11] F. Bellifemine, G. Fortino, R. Giannantonio, R. Gravina, A. Guerrieri, M. Sgroi,Spine: a domain-specific framework for rapid prototyping of WBSNapplications, Softw. Pract. Exp. 41 (3) (2011) 237–265. http://dx.doi.org/10.1002/spe.998.

[12] F. Aiello, G. Fortino, R. Gravina, A. Guerrieri, A java-based agent platform forprogramming wireless sensor networks, Comput. J. 54 (3) (2011) 439–454.http://dx.doi.org/10.1093/comjnl/bxq019.

[13] Oracle Labs, Small Programmable Object Technology (sun spot), 2012. <http://www.sunspotworld.com/> (accessed 26.09.12).

[14] N. Raveendranathan, S. Galzarano, V. Loseu, R. Gravina, R. Giannantonio, M.Sgroi, R. Jafari, G. Fortino, From modeling to implementation of virtual sensorsin body sensor networks, IEEE Sens. J. 12 (3) (2012) 583–593. http://dx.doi.org/10.1109/JSEN.2011.2121059.

[15] D. Roggen, C. Lombriser, M. Rossi, G. Tröster, Titan: an enabling framework foractivity-aware ‘pervasive apps’ in opportunistic personal area networks,EURASIP J. Wirel. Commun. Netw. (2011) 1:1–1:22. http://dx.doi.org/10.1155/2011/172831.

[16] K. Whitehouse, C. Sharp, E. Brewer, D. Culler, Hood: a neighborhoodabstraction for sensor networks, in: Proceedings of the 2nd InternationalConference on Mobile Systems, Applications, and Services, 2004, pp. 99–110.

[17] M. Welsh, G. Mainland, Programming sensor networks using abstract regions,in: Proceedings of the 1st Conference on Symposium on Networked SystemsDesign and Implementation, vol. 1, 2004, pp. 3–3.

[18] R. Gummadi, N. Kothari, R. Govindan, T. Millstein, Kairos: a macro-programming system for wireless sensor networks, in: Proceedings of theTwentieth ACM Symposium on Operating Systems Principles, SOSP ’05, 2005,pp. 1–2.

[19] T. Abdelzaher, B. Blum, Q. Cao, Y. Chen, D. Evans, J. George, S. George, L. Gu, T.He, S. Krishnamurthy, L. Luo, S. Son, J. Stankovic, R. Stoleru, A. Wood,Envirotrack: Towards an environmental computing paradigm for distributedsensor networks, in: Proceedings of the 24th International Conference onDistributed Computing Systems (ICDCS’04), 2004, pp. 582–589.

[20] L. Mottola, G.P. Picco, Logical neighborhoods: a programming abstraction forwireless sensor networks, in: Proceedings of the Second IEEE InternationalConference on Distributed Computing in Sensor Systems, Springer-Verlag,Berlin, Heidelberg, 2006, pp. 150–168.

Please cite this article in press as: M. Hammoudeh, R. Newman, Information exInformat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001

[21] V. Osma-Ruiz, J.I. Godino-Llorente, N. Sáenz-Lechón, P. Gómez-Vilda, Animproved watershed algorithm based on efficient computation of shortestpaths, Pattern Recognit. Lett. 40 (2007) 1078–1090.

[22] C. Kuo, S. Odeh, M. Huang, Image segmentation with improved watershedalgorithm and its FPGA implementation, in: The 2001 IEEE InternationalSymposium on Circuits and Systems, vol. 2, 2001, pp. 753–756.

[23] A. Bleau, L.J. Leon, Watershed-based segmentation and region merging,Comput. Vis. Image Underst. 77 (2000) 317–370.

[24] A. Bieniek, A. Moga, An efficient watershed algorithm based on connectedcomponents, Pattern Recognit. Lett. 33 (6) (2000) 907–916.

[25] V. Grau, A.U.J. Mewes, M. Alcaniz, R. Kikinis, S.K. Warfield, Improvedwatershed transform for medical image segmentation using priorinformation, IEEE Trans. Med. Imaging 23 (4) (2004) 447–458.

[26] H. Sun, J. Yang, M. Ren, A fast watershed algorithm based on chain code and itsapplication in image segmentation, Pattern Recogn. Lett. 26 (2005) 1266–1274.

[27] C. Rambabu, T. Rathore, I. Chakrabarti, A new watershed algorithm based onhillclimbing technique for image segmentation, in: Conference on ConvergentTechnologies for the Asia-Pacific Region, vol. 4, 2003, pp. 1404–1408.

[28] M. Swiercz, M. Iwanowski, Fast, parallel watershed algorithm based on pathtracing, in: Proceedings of the 2010 International Conference on ComputerVision and Graphics: Part II, ICCVG’10, 2010, pp. 317–324.

[29] B. Wagner, A. Dinges, P. Müller, G. Haase, Parallel volume image segmentationwith watershed transformation, in: Proceedings of the 16th ScandinavianConference on Image Analysis, SCIA ’09, Springer-Verlag, 2009, pp. 420–429.

[30] F. Meyer, Topographic distance and watershed lines, Signal Process. 38 (1994)113–125.

[31] J.B.T.M. Roerdink, A. Meijster, The watershed transform: definitions,algorithms and parallelization strategies, Fundam. Inform. 41 (1–2) (2000)187–228.

[32] D. Shepard, A two-dimensional interpolation function for irregularly-spaceddata, in: Proceedings of the 1968 23rd ACM National Conference, ACM, 1968,pp. 517–524.

[33] M. Hammoudeh, R. Newman, C. Dennett, S. Mount, Interpolation techniquesfor building a continuous map from discrete wireless sensor network data,Wirel. Commun. Mobile Comput. 2 (2011) 41–60. http://dx.doi.org/10.1002/wcm.1139.

[34] G.J. Pottie, W.J. Kaiser, Wireless integrated network sensors, Commun. ACM 43(5) (2000) 51–58.

[35] M. Hammoudeh, A. Kurz, E. Gaura, Mumhr: Multi-path, multi-hop hierarchicalrouting, in: Proceedings of the 2007 International Conference on SensorTechnologies and Applications, 2007, pp. 140–145.

[36] P. Sun, W.K. Seah, P.W. Lee, Efficient data delivery with packet cloning forunderwater sensor networks, in: Symposium on Underwater Technology andWorkshop on Scientific Use of Submarine Cables and Related Technologies,2007, pp. 34–41.

[37] M. Chu, J.J. Liu, F. Zhao, State-centric programming for sensor-actuatornetwork systems, IEEE Pervasive Comput. 2 (4) (2003) 50–62.

[38] E. Cheong, J. Liebman, J. Liu, F. Zhao, TinyGALS: a programming model forevent-driven embedded systems, in: Proceedings of the 2003 ACM symposiumon Applied computing, SAC ’03, 2003, pp. 698–704.

[39] L. He, Y. Chao, K. Suzuki, K. Wu, Fast connected-component labeling, PatternRecognit. Lett. 42 (2009) 1977–1987.

[40] S. Mount, M. Hammoudeh, A Development Tool for Wireless Sensor Networks,2011. <http://www.code.google.com/p/dingo-wsn/> (accessed 21.03.11).

[41] Gumstix.com, Gumstix Way Small Computing, 2011. <http://www.gumstix.com> (accessed 26.03.111).

[42] Crossbow Technology Inc, The Smart Sensors Company, 2011. <http://www.xbow.com> (accessed 26.03.1111).

[43] FLIR Systems, Flir Thermography, 2011. <http://www.goinfrared.com>(accessed 26.04.11).

[44] SciPy Cookbook, Cookbook Watershed, 2012. <http://www.scipy.org/Cookbook/Watershed> (accessed 15.10.12).

traction from sensor networks using the Watershed transform algorithm,