secure multidimensional queries in tiered sensor networks · secure multidimensional queries in...

25
arXiv:0911.4238v2 [cs.NI] 16 Dec 2009 Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu †§ , Chun-Shien Lu , and Sy-Yen Kuo § Institute of Information Science, Academia Sinica, Taipei, Taiwan § Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan Abstract—In this paper, aiming at securing range query, top-k query, and skyline query in tiered sensor networks, we propose the Secure Range Query (SRQ), Secure Top-k Query (STQ), and Secure Skyline Query (SSQ) schemes, respectively. In particular, SRQ, by using our proposed prime aggregation technique, has the lowest communication overhead among prior works, while STQ and SSQ, to our knowledge, are the first proposals in tiered sensor networks for securing top-k and skyline queries, respectively. Moreover, the relatively unexplored issue of the security impact of sensor node compromises on multidimensional queries is studied; two attacks incurred from the sensor node compromises, collusion attack and false-incrimination attack, are investigated in this paper. After developing a novel technique called subtree sampling, we also explore methods of efficiently mitigating the threat of sensor node compromises. Performance analyses regarding the probability for detecting incomplete query-results and communication cost of the proposed schemes are also studied. I. I NTRODUCTION Tiered Sensor Networks. Sensor networks are expected to be deployed on some harsh or hostile regions for data collec- tion or environment monitoring. Since there is the possibility of no stable connection between the authority and the network, in-network storage is necessary for caching or storing the data sensed by sensor nodes. A straightforward method is to attach external storage to each node, but this is economically infeasible. Therefore, various data storage models for sensor networks have been studied in the literature. In [6], [22], a notion of tiered sensor networks was discussed by introducing an intermediate tier between the authority and the sensor nodes. The purpose of this tier is to cache the sensed data so that the authority can efficiently retrieve the cache data, avoiding unnecessary communication with sensor nodes. The network model considered in this paper is the same as the ones in [6], [22]. More specifically, some storage- abundant nodes, called storage nodes, which are equipped with several gigabytes of NAND flash storage [24], are deployed as the intermediate tier for data archival and query response. In practice, some currently available sensor nodes such as RISE [21] and StarGate [29] can work as the storage nodes. The performance of sensor networks wherein external flash memory is attached to the sensor nodes was also studied in [15]. In addition, some theoretical issues concerning the tiered sensor networks, such as the optimal storage node placement, were also studied in [24], [30]. In fact, such a two-tiered network architecture has been demonstrated to be useful in increasing network capacity and scalability, reducing network management complexity, and prolonging network lifetime. Multidimensional Queries. Although a large amount of sensed data can be stored in storage nodes, the authority might be interested in only some portions of them. To this end, the authority issues proper queries to retrieve the desired portion of sensed data. Note that, when the sensed data have multiple attributes, the query could be multidimensional. We have observed that range query, top-k query, and skyline query are the most commonly used queries. Range query [12], [17], which could be useful for correlating events occurring within the network, is used to retrieve sensed data whose attributes are individually within a specified range. After mapping the sensed data to a ranking value, top-k query [33], which can be used to extract or observe the extreme phenomenon, is used to retrieve the sensed data whose ranking values are among the first k priority. Skyline query [5], [11], due to its promising application in multi-criteria decision making, is also useful and important in environment monitoring, industry control, etc. Nonetheless, in the tiered network model, the storage nodes become the targets that are easily compromised because of their significant roles in responding to queries. For example, the adversary can eavesdrop on the communications among nodes or compromise the storage nodes to obtain the sensed data, resulting in the breach of data confidentiality. After the compromise of storage nodes, the adversary can also return falsified query-results to the authority, leading to the breach of query-result authenticity. Even more, the compromised storage nodes can cause query-result incompleteness, creating an incomplete query-result for the authority by dropping some portions of the query-result. Related Work. Secure range queries in tiered sensor networks have been studied only in [23], [31], [40]. Data confidentiality and query-result authenticity can be preserved very well in [23], [31], [40] owing to the use of the bucket scheme [8], [9]. Unfortunately, encoding approach [23] is only suitable for the one-dimensional query scenario in the sensor networks for environment monitoring purposes. On the other hand, crosscheck approaches [31], [40] can be applied on sensor networks for event-driven purposes at the expense of the reduced probability for detecting query-result incompleteness. The security issues incurred from the compromise of storage nodes have been addressed in [23], [31], [40]. The impact of collusion attacks defined as the collusion among compromised sensor nodes and compromised storage nodes, however, was

Upload: others

Post on 24-Jun-2020

14 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

arX

iv:0

911.

4238

v2 [

cs.N

I] 1

6 D

ec 2

009

Secure Multidimensional Queries in Tiered SensorNetworks

Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§†Institute of Information Science, Academia Sinica, Taipei, Taiwan

§Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan

Abstract—In this paper, aiming at securing range query, top-kquery, and skyline query in tiered sensor networks, we proposethe Secure Range Query (SRQ), Secure Top-k Query (STQ), andSecure Skyline Query (SSQ) schemes, respectively. In particular,SRQ, by using our proposedprime aggregation technique, hasthe lowest communication overhead among prior works, whileSTQ and SSQ, to our knowledge, are the first proposals intiered sensor networks for securing top-k and skyline queries,respectively. Moreover, the relatively unexplored issue of thesecurity impact of sensor node compromises on multidimensionalqueries is studied; two attacks incurred from the sensor nodecompromises,collusion attack and false-incrimination attack, areinvestigated in this paper. After developing a novel techniquecalled subtree sampling, we also explore methods of efficientlymitigating the threat of sensor node compromises. Performanceanalyses regarding the probability for detecting incompletequery-results and communication cost of the proposed schemesare also studied.

I. I NTRODUCTION

Tiered Sensor Networks.Sensor networks are expected tobe deployed on some harsh or hostile regions for data collec-tion or environment monitoring. Since there is the possibilityof no stable connection between the authority and the network,in-network storage is necessary for caching or storing thedata sensed by sensor nodes. A straightforward method is toattach external storage to each node, but this is economicallyinfeasible. Therefore, various data storage models for sensornetworks have been studied in the literature. In [6], [22], anotion of tiered sensor networks was discussed by introducingan intermediate tier between the authority and the sensornodes. The purpose of this tier is to cache the sensed dataso that the authority can efficiently retrieve the cache data,avoiding unnecessary communication with sensor nodes.

The network model considered in this paper is the sameas the ones in [6], [22]. More specifically, some storage-abundant nodes, calledstorage nodes, which are equipped withseveral gigabytes of NAND flash storage [24], are deployedas the intermediate tier for data archival and query response.In practice, some currently available sensor nodes such asRISE [21] and StarGate [29] can work as the storage nodes.The performance of sensor networks wherein external flashmemory is attached to the sensor nodes was also studied in[15]. In addition, some theoretical issues concerning the tieredsensor networks, such as the optimal storage node placement,were also studied in [24], [30]. In fact, such a two-tierednetwork architecture has been demonstrated to be useful in

increasing network capacity and scalability, reducing networkmanagement complexity, and prolonging network lifetime.

Multidimensional Queries. Although a large amount ofsensed data can be stored in storage nodes, the authoritymight be interested in only some portions of them. To thisend, the authority issues proper queries to retrieve the desiredportion of sensed data. Note that, when the sensed data havemultiple attributes, the query could be multidimensional.Wehave observed that range query, top-k query, and skyline queryare the most commonly used queries. Range query [12], [17],which could be useful for correlating events occurring withinthe network, is used to retrieve sensed data whose attributesare individually within a specified range. After mapping thesensed data to a ranking value, top-k query [33], which can beused to extract or observe the extreme phenomenon, is used toretrieve the sensed data whose ranking values are among thefirst k priority. Skyline query [5], [11], due to its promisingapplication in multi-criteria decision making, is also useful andimportant in environment monitoring, industry control,etc.

Nonetheless, in the tiered network model, the storage nodesbecome the targets that are easily compromised because oftheir significant roles in responding to queries. For example,the adversary can eavesdrop on the communications amongnodes or compromise the storage nodes to obtain the senseddata, resulting in the breach ofdata confidentiality. After thecompromise of storage nodes, the adversary can also returnfalsified query-results to the authority, leading to the breachof query-result authenticity. Even more, the compromisedstorage nodes can causequery-result incompleteness, creatingan incomplete query-result for the authority by dropping someportions of the query-result.

Related Work. Secure range queries in tiered sensornetworks have been studied only in [23], [31], [40]. Dataconfidentiality and query-result authenticity can be preservedvery well in [23], [31], [40] owing to the use of the bucketscheme [8], [9]. Unfortunately, encoding approach [23] is onlysuitable for the one-dimensional query scenario in the sensornetworks for environment monitoring purposes. On the otherhand, crosscheck approaches [31], [40] can be applied onsensor networks for event-driven purposes at the expense ofthereduced probability for detecting query-result incompleteness.The security issues incurred from the compromise of storagenodes have been addressed in [23], [31], [40]. The impact ofcollusion attacksdefined as the collusion among compromisedsensor nodes and compromised storage nodes, however, was

Page 2: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

2

only discussed in [40], wherein only a naive method wasproposed as a countermeasure. When the compromised sensornodes are taken into account, a Denial-of-Service attack, calledfalse-incrimination attack, not addressed in the literature, canbe extremely harmful. In such an attack, the compromisedsensor nodes subvert the functionality of the secure queryschemes by simply claiming that their sensed data have beendropped by the storage nodes. After that, the innocent storagenodes will be considered compromised and will be revoked bythe authority. It should be noted that all the previous solutionssuffer from false-incrimination attacks.

Contribution. Our major contributions are:

• The Secure Range Query (SRQ) scheme is proposed tosecure the range query in tiered networks (Sec. III-A).By taking advantage of our proposedprime aggregationtechnique for securely transmitting the amount of datain specified buckets, SRQ has the lowest communicationcost among prior works in all scenarios (environmentmonitoring and event detection purposes), while preserv-ing the probability for detecting incomplete query-resultsclose to1. It should be noted that although incorporatingbucket scheme [8], [9] (described in Sec. III-A1) in theprotocol design [23], [31], [40] is not new, the noveltyof our method lies on the use ofprime aggregationin reducing the overhead and guaranteeing query-resultcompleteness.

• For the first time in the literature, the issues of securingtop-k and skyline queries in tiered networks are studied(Secs. III-B and III-C). Our solutions to these two issuesare Secure Top-k Query (STQ) and Secure Skyline Query(SSQ), respectively. The former is built upon the pro-posed SRQ scheme to detect query-result completeness,while the efficiency of SSQ is based on our proposedgrouping technique.

• The security impact of sensor compromises is studied(Sec. V); collusion attackis formally addressed, and anew Denial-of-Service attack,false-incrimination attack,which can thwart the security purpose in prior works,is first identified in our paper. The resiliency of SRQ,STQ, and SSQ against these two attacks is investigated.With a novel technique calledsubtree sampling, someminor modifications are introduced for SRQ and STQas countermeasures to these two attacks. Moreover, thecompromised nodes can even be efficiently identified andbe further attested [25], [26], [28].

II. SYSTEM MODEL

In general, the models used in this paper are very similarto those in [6], [22], [23], [31], [40].

Network Model. As shown in Fig. 1, the sensor networkconsidered in this paper is composed of a large number ofresource-constrained sensor nodes and a few so-calledstoragenodes. Storage nodes are assumed to be storage-abundant andmay be compromised. In addition, in certain cases, storagenodes could also have abundant resources in energy, computa-tion, and communication. The storage nodes can communicate

with the authority via direct or multi-hop communications.Thenetwork is connected such that, for two arbitrary nodes, at leastone path connecting them can be found.

A cell is composed of a storage node and a number ofsensor nodes. In a cell, sensor nodes could be far awayfrom the associated storage node so that they can com-municate with each other only through multi-hop commu-nication. For example, in Fig. 1, without the relay of thegray node, the black node cannot reach the storage node.

Query

Query

Query

Reply

Reply

Reply The Authority

Storage Node

SensorNode

Cell

Cell-region

Fig. 1: A tiered sensor network.

The nodes in the networkhave synchronized clocks[27] and the time is dividedinto epochs. As in [23],[24], [31], [40], each nodeis assumed to be aware ofthe geographic position itlocates [14], [38] so thatthe association between thesensor node and storagenode can be established. Asa matter of fact, informa-tion about the time and ge-ographic position is indis-pensable for most sensornetwork applications.

For each cell, aggregation is assumed to be performedover an aggregation tree rooted at the storage node. Sincethe optimization of the aggregation tree structure is out ofthe scope of this paper, we adopt the method described inTAG [16] to construct an aggregation tree. We follow theconventional assumption that the topology of the aggregationtree is known by the authority [2], [4]. Similar to sensor nodes,storage nodes also perform the sensing task. Each sensor nodesenses the data and temporarily stores the sensed data in itslocal memory within an epoch. At the end of each epoch,the sensor nodes in a cell report the sensed data stored inlocal memory to the associated storage node. Throughout thispaper, we focus on a cellC, composed ofN−1 sensor nodes,{si}N−1

i=1 , and a storage nodeM.Security Model. We consider the adversary who can com-

promise an arbitrary number of storage nodes. After nodecompromises, all the information stored in the compromisedstorage nodes will be exposed to the adversary. The goalof the adversary is to breach at least one of the following:data confidentiality, query-result authenticity, and query-resultcompleteness. We temporarily do not consider the compromiseof sensor nodes in describing SRQ, STQ, and SSQ in Sec. III.The impact of sensor node compromise on the security breach,however, will be explored in Sec. V. Many security issuesin sensor networks, such as key management [3], [7], [36],broadcast authentication [13], [19], and secure localization[14], [38], have been studied in the literature. This paper fo-cuses on securing multidimensional queries that are relativelyunexplored in the literature, while the protocol design of theaforementioned issues are beyond the scope of this paper.

Query Model. The sensed data can be represented as ad-

Page 3: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

3

dimensional tuple,(A1, A2, . . . , Ad), whereAg, ∀g ∈ [1, d],denotes theg-th attribute. The authority may issue a properd-dimensional query to retrieve the desired portion of data storedin storage nodes. Three types of queries, including range query,top-k query, and skyline query, are considered in this paper.For range query, its form, issued by the authority, is expressedas〈C, t, l1, h1, . . . , ld, hd〉, which means that the sensed data tobe reported to the authority should be generated by the nodesin cell C at epocht, and theirg-th attributes,Ag ’s, should bewithin the range of[lg, hg], g ∈ [1, d]. Top-k query is usuallyassociated with a scalar (linear)ranking function. With rankingfunction,R, the sensed data, even if it is multidimensional, canbe individually mapped to a one-dimensionalranking value.The top-k query issued by the authority is in the form of〈C, t, R, k〉. As the first attempt to achieve secure top-k query,the goal of top-k query in this paper is simply assumed toobtain the sensed data generated by the nodes in cellC at epocht with the first k smallest ranking values. For skyline query,the desiredskyline dataare defined as those notdominatedbyany other data. Assuming that smaller values are preferabletolarge ones for all attributes, for a set ofd-dimensional data, adatumci dominatesanother datumcj if both the conditions,Ag(ci) ≤ Ag(cj), ∀g ∈ [1, d], andAg(ci) < Ag(cj), ∃g ∈[1, d], whereAg(ci) denotes theg-th attribute value of thedatumci, hold. Hence, the form of the skyline query issuedby the authority is given as〈C, t〉, which is used to retrievethe skyline data generated in cellC at epocht.

III. SECURING MULTIDIMENSIONAL QUERIES

In this section, aiming at securing range query, top-k query,and skyline query, we propose the SRQ (Sec. III-A), STQ(Sec. III-B), and SSQ (Sec. III-C) schemes, respectively. Notethat though SRQ, STQ, and SSQ use the bucket scheme [8],[9], the novelty of them is due to their design in efficientlydetecting the incomplete query-result (described later).

A. Securing Range Queries (SRQ)

Our proposed SRQ scheme consists of a confidentiality-preserving reporting phase (Sec. III-A1) that can simultane-ously prevent the adversary from accessing data stored inthe storage nodes, authenticate the query results, and ensureefficient multidimensional query processing, and a query-resultcompleteness verification phase (Sec. III-A2) for guaranteeingthe completeness of query-results.

1) Confidentiality-preserving reporting:Data encryption isa straightforward and common method of ensuring data confi-dentiality against a compromised storage node. Moreover, wehope that even when the adversary compromises the storagenode, the previously stored information should not be exposedto the adversary. To this end, the keys used in encryptionshould be selected from a one-way hash chain. In particular,assume that a keyKi,0 is initially stored in sensor nodesi.At the beginning of epocht, the keyKi,t, which is used onlywithin epocht, is calculated ashash(Ki,t−1), wherehash(·)is a hash function, andKi,t−1 is dropped. Suppose that sensornodesi has sensed dataD at epocht. One method for storing

D in the storage nodeM while preserving the privacy is tosend{D}Ki,t

, which denotes the encryption ofD with thekeyKi,t. With this method, when an OCB-like authenticatedencryption primitive [20] is exploited, the authenticity of Dcan be guaranteed. At the same time,D will not be knownby the adversary during message forwarding and even afterthe compromise of the storage node at epocht because theadversary cannot recover the keys used in the time beforeepocht. Nevertheless, no query can be answered byM ifonly encrypted data is stored inM. Hence, thebucket schemeproposed in [8], [9], which uses the encryption keys generatedvia a one-way hash chain, is used in the SRQ scheme.

In the bucket scheme, the domain of each attributeAg,∀g ∈ [1, d], is assumed to be known in advance, and isdivided into wg ≥ 1 consecutive non-overlapping intervalssequentially indexed from1 to wg, under a publicly knownpartitioning rule. For ease of representation, in the following,we assume thatwg = w, ∀g ∈ [1, d]. A d-dimensional bucketis defined as a tuple,(v1, v2, . . . , vd) (hereafter calledbucketID), wherevg ∈ [1, w], g ∈ [1, d]. The sensor nodesi, whenit has sensed data at epocht, sends toM the correspondingbucket IDs, which are constructed by mapping each attributeof the sensed data to the proper interval index, and the senseddata encrypted by the keyKi,t. For example, whensi hassensed data(1, 3), (2, 4), and(2, 11) at epocht, the messagetransmitted to the storage node at the end of epocht is〈i, t, (1, 1), {(1, 3), (2, 4)}Ki,t

, (1, 2), {(2, 11)}Ki,t〉, assuming

thatA1, A2 ∈ [1, 20], w = 2, and each interval length, set at10, is the same.

Let V be the set of all possible bucket IDs. Assume thatthere are on averageY andY/N data generated in a cell andin a node, respectively, at epocht. Assume that,Di,t,V is aset containing all the data within the bucketV ∈ V sensedby si at epocht. The messages sent fromsi to M at theend of epocht can be abstracted as〈i, t, Jσ, {Di,t,Jσ

}Ki,t〉,

whereJσ ∈ V , Jσ 6= Jσ′ , 1 ≤ σ, σ′ ≤ Y/N if there are somedata sensed bysi within epocht. Note thatsi sends nothingto M if Di,t,Jσ

’s, ∀Jσ ∈ V , are empty. After that,M cananswer the range query according to the information revealedby the bucket IDs. Assume thatlg andhg are located withintheαg-th andβg-th intervals, respectively, whereαg ≤ βg, αg,βg ∈ [1, w], andg ∈ [1, d]. The encrypted data falling into thebuckets in the setA = {(ρ1, . . . , ρd)|αg ≤ ρg ≤ βg, g ∈[1, d]} are reported to the authority. In other words, oncereceiving the range query,M first translates the informationl1, h1, . . . , ld, hd into the proper bucket IDs and then repliesall the encrypted data falling into the buckets1 in A.

Nevertheless, in tiered sensor networks, even when theoriginal bucket scheme is used,M could still maliciouslydrop some encrypted data and only report part of the resultsto the authority, resulting in an incomplete query-result.In

1There is a tradeoff between the communication cost and confidentialityin terms of bucket sizes because larger bucket size implies higher dataconfidentiality and higher communication cost due to more superfluous databeing returned to the authority. The design of optimal bucketing strategies isbeyond the scope of this paper, and we refer to [8], [9] for more details.

Page 4: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

4

the following, we will describe an extended bucket scheme,which incorporates the prime aggregation strategy into theoriginal bucket scheme, to detect the incomplete reply in acommunication-efficient manner.

2) Query-Result Completeness Verification:With primeaggregationtechnique, SRQ detects an incomplete reply bytaking advantage of aggregation for counting the amount ofsensed data falling into specified buckets. Together with ahash for verification purpose, the count forms a so-calledproof in detecting an incomplete reply. The storage nodeMis required to provide the proof to the authority at the epochspecified in the query so that the authority can use the proofto verify the completeness of received query-results. Since inour design all the sub-proofs generated by the nodes can beaggregated to yield the final proof, the communication cost canbe significantly reduced. The details are described as follows.

Assume that an aggregation tree [16] has been constructedafter sensor deployment. Recall that the domain of attributeAgis divided intow intervals. Before the sensor deployment, aset{pVi , p∅i |∀i ∈ [1, N−1], V ∈ V} of (wd+1)(N−1) primenumbers is selected by the authority such thatpVi 6= pV

i′ andp∅i 6= p∅i′ if i 6= i′ or V 6= V ′. Then, the set{pVi , p∅i |V ∈ V} ofwd+1 prime numbers, called the set ofbucket primesof si, isstored in each sensor nodesi. In addition, a set{kVi,0, k∅i,0|V ∈V} of wd + 1 keys is selected by the authority and is storedin each sensor nodesi initially. For fixed i and t, the set of{kVi,t, k∅i,0|V ∈ V} is called the set ofbucket keysof si at epocht. Bucket primes could be publicly-known, while bucket keysshould be kept secret. Each sensor nodesi, at the beginning ofepocht, calculateskVi,t = hash(kVi,t−1) and then dropskVi,t−1,∀V ∈ V . In addition,si also calculatesk∅i,t = hash(k∅i,t−1)

and then dropsk∅i,t−1.Recall that each nodesi on average hasY/N sensed data

at epoch t, and assume that the set ofY/N bucket IDsassociated with theseY/N sensed data isBi,t = {vi,t,σ|σ =1, . . . , Y/N}, which could be a multiset. Then, according toits sensed data,si calculatesHi,t = hashKi,t

(∑Y/N

σ=1 kvi,t,σ

i,t ),where hashK(·) denotes the keyed hash function with keyK, if it has sensed data, andHi,t = hashKi,t

(k∅i,t) oth-

erwise. Moreover,si computesPi,t =∏Y/Nσ=1 p

vi,t,σ

i if ithas sensed data, andPi,t = p∅i otherwise. Moreover, oncereceiving〈jρ, t, Ejρ,t,Bjρ,t,Hjρ,t,Pjρ,t〉, jρ ∈ [1, N−1], ∀ρ ∈[1, χ] from its χ children, sj1 , . . . , sjχ , si calculatesEi,t =(⋃χρ=1 Ejρ,t)

⋃Ei,t, whereEi,t denotes the set of encrypted

data sensed bysi at epocht, andBi,t = (⋃χρ=1 Bjρ,t)

⋃Bi,t,

whereBi,t denotes the set of bucket IDs ofEi,t. In addition,si also calculatesHi,t = hashKi,t

(∑χ

ρ=1 Hjρ,t + Hi,t) andPi,t =

∏χρ=1 Pjρ,t · Pi,t, ρ ∈ [1, χ]. Finally, si reports

〈i, t, Ei,t,Bi,t,Hi,t,Pi,t〉 to its parent node on the aggregationtree. Note that, ifsi is a leaf node on the aggregation tree,then we assume that it receives〈∅, ∅, ∅, ∅, 0, 1〉.

Assume that the set{pVM, p∅M|V ∈ V} of wd + 1 primenumbers stored inM are all different from those stored insensor nodes, and the set{kVM,0, k

∅M,0|V ∈ V} of wd + 1

bucket keys are selected by the authority and stored inM. M

computeskVM,t = hash(kVM,t−1) and dropskVM,t−1 at epocht. In addition,M also computesk∅M,t = hash(k∅M,t−1) anddropsk∅M,t−1 at epocht. For the storage nodeM, it can alsocalculateEM,t, BM,t, HM,t, andPM,t according to the itsown sensed data at epocht. In fact, the proceduresM needs toperform after messages are received from the child nodes arethe same as the ones performed by the sensor nodes. Actingas the root of the aggregation tree, however,M keeps theaggregated results, which are denoted asEM,t,BM,t,HM,t

andPM,t, respectively, in its local storage and waits for thequery issued by the authority. Note thatPM,t can be thoughtof as a compact summary of the sensed data of the wholenetwork and can be very useful for the authority in checkingthe completeness of the query-result, whileHM,t can be usedby the authority to verify the authenticity ofPM,t.

Assume that a range query〈C, t, l1, h1, l2, h2, . . . , ld, hd〉is issued by the authority. The encrypted data falling intothe buckets in the setA, along with the proof composedof HM,t and PM,t, are sent to the authority. OncePM,t

is received, the authority immediately performs the primefactor decomposition ofPM,t. Due to the construction of{pVi , pVM, p∅i , p

∅M|i ∈ [1, N − 1], V ∈ V}, which guarantees

that the bucket primes are all distinct, after the prime factordecomposition ofPM,t, the authority can be aware of whichnode contributes which data within specified buckets. As aresult, the authority can know which keys should be used toverify the authenticity and integrity ofHM,t. More specifi-cally, assume thatPM,t = (p1)

a1 · · · (pγ)aγ , a1, . . . , aγ > 0,γ ≥ 0, and thatp1, . . . , pγ are distinct prime numbers. Fromthe construction ofPM,t, we know that(pk)

ak , for k ∈ [1, γ],

is equal to(pk′′

k′ )ak , for k′ ∈ [1, N − 1] and k′′ ∈ V . From

the procedure performed by each node, it can also be knownthat the appearance of(pk)

ak = (pk

′′

k′ )ak in PM,t means that

at epocht the sensor nodesk′ producesak data falling intobucketk′′, contributing the bucket keykk

′′

k′,t in totalak times inHM,t. Here, the sensor nodesk′ producing the data falling intothe bucket∅ means thatsk′ senses nothing. Thus, we can inferthe total amount of data falling into specified buckets at epocht. Recall that the authority is aware of the topology of theaggregation tree. Thus, after the prime factor decompositionof PM,t, the authority can reconstructHM,t according to thederivedak’s andpk’s by its own effort, because it knowsKi,t

andkVi,t, ∀i ∈ {1, . . . , N−1,M}, ∀t ≥ 0, ∀V ∈ V . Therefore,we know that theHM,t reconstructed by the authority is equalto the receivedHM,t if and only if the receivedPM,t areconsidered authentic. When the verification ofPM,t fails, Mis considered compromised. When the verification ofPM,t issuccessful, the authority decrypts all the received encryptions,and checks whether the number of query-results falling intothe buckets inA matches those indicated byPM,t. If andonly if there are matches in all the buckets inA, the receivedquery-results are considered complete.

B. Securing Top-k Queries (STQ)

Basically, the proposed STQ scheme for securing top-kquery is built upon SRQ in that both confidentiality-preserving

Page 5: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

5

reporting and query-result completeness verification phases inSRQ are exploited. In particular, based on the proof generatedin SRQ, since it can know which buckets contain data, theauthority can also utilize such information to examine thecompleteness of query-results of top-k query. In other words,top-k query can be secured by the use of the SRQ scheme.Because of the similarity between the SRQ and STQ schemes,some details of the STQ scheme will be omitted in thefollowing description.

Here, a bucket data setis defined to be composed ofbucket IDs. We use ad-dimensional tuple,(v1, . . . , vd), wherevg ∈ [1, w], to represent the bucket IDs in a bucket dataset. With this representation, we can use the ranking func-tion, R, to calculate the ranking value of each bucket ID.Assume that theν-th interval in theg-th attribute containsthe values in[uℓg,ν , u

hg,ν]. The ranking value of the bucket ID,

(v1, . . . , vd), is evaluated asR(uℓ1,v1

+uh1,v1

2 , . . . ,uℓd,vd

+uhd,vd

2 ),

where thed-dimensional tuple,(uℓ1,v1

+uh1,v1

2 , . . . ,uℓd,vd

+uhd,vd

2 ),whose individual entry is simply averaged over the minimumand maximum values in each interval, acts as the representativeof the bucket(v1, . . . , vd) for simplicity.

Recall that we simply assume that the data with the firstk smallest ranking values are desired. The general form ofthe message sent fromsi to its parent node at the end ofepoch t is 〈i, t, Ei,t,Bi,t,Hi,t,Pi,t〉, where Ei,t, Bi,t, Hi,t,andPi,t are the same as those defined in SRQ. Assume thatζ1, . . . , ζk ∈ V arek bucket IDs in the bucket data set whoseranking values are among the firstk smallest ones. Accordingto BM,t, M can calculate the ranking values of bucket IDsin BM,t and, therefore, knowsζ1, . . . , ζk. To answer a top-kquery, 〈C, t, k〉, the storage nodeM reports the bucket IDs,ζ1, . . . , ζk, and their corresponding encrypted data, along withHM,t andPM,t, to the authority because it can be known thatthe data with the firstk smallest ranking values must be withinζ1, . . . , ζk. After receiving the query-result, the authority canfirst verify the authenticity ofPi,t by usingHi,t, and verify thequery-result completeness by usingPi,t. Note that both of theabove verifications can be performed in a way similar to theone described in Sec. III-A. Actually, after receivingPi,t, theauthority knows which buckets contain data and the amount ofdata. Hence, knowinguℓg,ν anduhg,ν , ∀g ∈ [1, d], ∀ν ∈ [1, w],the authority can also obtainζ1, . . . , ζk. Afterwards, what theauthority should do is to check if it receives the bucket IDs,ζ1, . . . , ζk, and if the number of data in bucketζg′ , g′ ∈ [1, k],is consistent with the number indicated byPi,t. If and onlyif these two verifications pass, the authority considers thereceived query-result to be complete and extracts the top-kresult from the encrypted data sent fromM.

C. Securing Skyline Queries (SSQ)

To support secure skyline query in sensor networks, in thefollowing we first present a naive approach as baseline, andthen propose an advanced approach that employs a groupingtechnique for simultaneously reducing the computation andcommunication cost.

1) Baseline scheme:To ensure the data confidentiality andauthenticity, as in the SRQ and STQ schemes, the sensed dataare also encrypted by using the bucket scheme mentioned inSec. III-A1. At the end of epocht, eachsi broadcasts itssensor ID, all the sensed data encrypted by keyKi,t, andthe proper bucket IDs to all the nodes within the same cell.Then, according to the broadcast messages, each sensor nodesi at epocht has a bucket data set composed of the bucketIDs extracted from broadcast messages and the bucket IDscorresponding to its own sensed data. In fact, the bucket datasets constructed by different nodes in the same cell at epocht will be the same. Treating these bucket IDs as data points,si can find the setΦt of skyline bucketsthat are defined asthe bucket IDs not dominated by the other bucket IDs. Here,since the bucket IDs are represented also byd-dimensionaltuples, the notion of domination is the same as the one definedin the query model of Sec. II. Definequasi-skyline dataasthe set of data falling into the skyline buckets. It can beobserved that the set of skyline data must be a subset of quasi-skyline data. After doing so, each node can locally find2 thequasi-skyline data, although there could be the cases wheresuperfluous data are also included. At the end of epocht, ifKi,t is smaller than a pre-determined threshold, thensi sendsits sensor ID andhashKi,t

(Φt) to M. Here,hashKi,t(Φt)

works as a kind of proof so that it can be used for checkingthe query-result completeness. Note that onlyhashKi,t

(Φt)needs to be transmitted toM becauseM also receives all theencrypted data and bucket IDs, and can calculate the quasi-skyline data by itself after message broadcasting.

To answer the skyline query〈C, t〉 the storage node reportsthe quasi-skyline data calculated at epocht and the hash valuesreceived at epocht to the the authority. Since the authorityknows the threshold andKi,0 ∀i ∈ [1, N ], it will expect toreceive hash values from a set of sensor nodes whose keys aresmaller than the threshold. Unfortunately, due to the network-wide broadcast, this baseline scheme works but is inefficientin terms of communication overhead. Hence, an efficient SSQscheme exploiting a grouping strategy is proposed as follows.

2) Grouping technique:Like the baseline scheme, thebucket scheme mentioned in Sec. III-A1 is also used. Givena data setO and a collection{O1, · · · ,Oτ} of subsets ofOsatisfying

⋃τj=1 Oj = O and Oj ∩ Oj′ = ∅, ∀j 6= j′ for

τ ≥ 1. An observation is that the skyline data ofO must bea subset of the union of the skyline data ofOj , ∀j ∈ [1, τ ].Thus, the key idea of our proposed SSQ scheme is to partitionthe sensor nodes in a cell into groups so that broadcasting canbe limited within a group, resulting in reduced computationand communication costs. In what follows, the SSQ schemewill be described in more detail.

The sensor nodes in a cell are divided intoµ disjointgroups,Gη, ∀η ∈ [1, µ], each of which is composed of|Gη|

2The design of an algorithm for efficiently finding the skylinedata given adata set is a research topic, but is beyond the scope of this paper. We considerthe naive data-wise comparison-based algorithm with running timeO(n2) ifthe size of the data set isn, but each node, in fact, can implement an arbitraryalgorithm for finding skyline data in our setting.

Page 6: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

6

sensor nodes. The grouping needs to be performed only onceright after the sensor deployment. Note that each group isformed by nearby sensor nodes and the grouping procedureis independent of the structure of the aggregation tree withoutaffecting SRQ and STQ. Letcell-regionbe part of the sensingregion monitored by one specified cell. For example, with theassumption that the shape of each cell-region is approximatelya square, as shown in Fig. 1, and sensor nodes with uniformdeployment are considered, the grouping can be achieved bysimply dividing a cell-region intoµ (=

√N ) sub-cell-regions.

The sensor nodes in the same sub-cell-region form a group.Note that the square cell-region is assumed here for ease ofexplanation, but is not necessary for the grouping procedures3.

At the end of epocht, each sensor nodesi broadcasts itssensor ID, itsorder seedθi,t = hashKi,t

(i), all the sensed dataencrypted with the keyKi,t, and the proper bucket IDs to allthe nodes within the same group. After doing so, as in baselinescheme, the sensor nodes in the same group can locally findthe quasi-skyline data from the bucket data set whose entriesare generated by the sensor nodes in the same group. LetΦη,tbe the set of skyline bucket IDs andφη,t their correspondingquasi-skyline data encrypted by the sensor nodes in groupηusing proper keys at epocht. At the end of epocht, if θi,tis among the firstξ1 smallest ones in the set of order seedsin groupη at epocht, whereξ1 is a pre-determined thresholdknown by each node, thensi reportsΦη,t, φη,t, its verificationseedhashKi,t

(Φη,t), and the IDs of sensor nodes generatingφη,t to M. In fact, ξ1 = 1 is sufficient for the verificationpurpose.ξ1, however, is also related to the resiliency againstsensor node compromises. Thus, we still keepξ1 as a variableand defer the explanation of the purpose ofξ1 to Sec. V.Here, the purpose of verification seeds is that the completenessof quasi-skyline data can be guaranteed by exactlyξ1 sensornodes for each group, while the purpose of order seed is toguarantee that at each epoch exactlyξ1 sensor nodes will sendthe verification seeds as the proofs.

To answer a skyline query,〈C, t〉, the storage nodeMreports a hash of all the received verification seeds,ht =hash(||i∈Γt

hashKi,t(Φi,t)), where || denotes the bit-string

concatenation andΓt is the set of sensor nodes responsiblefor sending a hash value toM in each group at epocht, theset of skyline bucket IDs, and their corresponding encrypteddata received at epocht to the authority. Since it knows thethresholdξ1 andKi,t, ∀i ∈ {1, . . . , N − 1,M}, the authoritywill expect to receive a particular hash value fromM. Ifand only if the hash sent from theM matches the hash ofthe verification seeds calculated according to the knowledgeof Γt andKi,t by the authority itself, the received data areconsidered complete, and contain the skyline data.

3For example, the authority knowing the position of each nodeor the use ofclustering algorithms can also divide nodes into groups. Ingeneral, after thelocalization, each sensor node can join the proper group possibly accordingto its geographic position when the grouping information such as the sizes ofsensing region and cell-region are preloaded in sensor nodes.

IV. PERFORMANCEEVALUATION

We will focus on analyzing the critical issue of detectingan incomplete query-result in tiered networks. In this section,the detection probability and communication cost of query-result completeness verification in the proposed schemes willbe analyzed. It is assumed that the number of hops betweenM and each sensor node is

√n for a collection ofn uniformly

deployed nodes [1]. In this section, both detection probabilityand communication cost are discussed at a fixed epocht.

As the communication cost of encoding approach [23]grows exponentially with the number of attributes, and somecrosscheck approaches [31], [40] have relatively low detectionprobability, in the following, we compare SRQ with onlyhybrid crosscheck[40], which achieves the best balance be-tween the detection probability and communication cost in theliterature. Note that, the parameter setting required in hybridcrosscheck is the same as that listed in [40].

A. Detection Probability

The detection probability is defined as the probability thatthe compromised storage nodeM is detected if it returnsan incomplete query-result. With the fact that the larger theportion of query-resultM drops, higher the probability thatthe authority detects it, we consider the worst case that onlyone bucket and its corresponding data in the query-result aredropped byM and the number of data sensed by a node iseither0 or 1 as the lower bound of detection probability.

1) Detection probability for SRQ and STQ:To return anincomplete query-result without being detected by the author-ity, M should create a proof,〈HM,t, PM,t〉, correspondingto the incomplete query-result. Since bucket primes can beknown by the adversary,PM,t can be easily constructed.Nevertheless,HM,t cannot be constructed, since the bucketkeys of sensor nodes generating the bucket dropped byM arenot known by the adversary. Therefore, only two options canbe chosen by the adversary. First, the adversary can directlyguess to obtainHM,t, with probability being2−ℓh , whereℓhis the number of bits output by a keyed hash function. Thisimplies that the detection probabilityPSRQdet,1 is 1 − 2−ℓh forthe first case. Second, knowing the aggregation tree topology,the adversary can also follow the rule of SRQ to construct theHM,t without considering the bucket key of dropped bucket.Assume that the probability that a sensor node has senseddata isδ. The size of the bucket key pool can be derived as(N−2)(2δ+1−δ)+2 = Nδ+N−2δ. Thus, the probabilityfor the adversary to guess successfully is2−ℓk(Nδ+N−2δ),where ℓk is the number of bits of a key, leading to thedetection probabilityPSRQdet,2 is 1 − 2−ℓk(Nδ+N−2δ) for the

second case. Overall, the final detection probability,PSRQdet ,

is min{PSRQdet,1,PSRQdet,2}. On the other hand, as stated in Sec.

III-B, the STQ scheme is built upon the SRQ scheme. Thus,the detection probability,PSTQdet , will be the same asPSRQdet .As Fig. 2 depicts, the detection probability of SRQ is closeto 1 in any case. However, hybrid crosscheck is effective onlywhen a few sensed data are generated in the network. Such

Page 7: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

7

a performance difference can be attributed to the fact that thesensed data in the network are securely and deterministicallysummarized in the proof of SRQ but they are probabilisticallysummarized in hybrid crosscheck.

0500

10001500

2000

0

5

10

15

200.9992

0.9994

0.9996

0.9998

1

Hybrid Crosscheck

SRQ

Det

ectio

n pr

obab

ility

Nd

(a)

0500

10001500

2000

0

5

10

15

200.4

0.5

0.6

0.7

0.8

0.9

1

Hybrid Crosscheck

SRQ

Det

ectio

n pr

obab

ility

Nd

(b)

Fig. 2: The detection probability of SRQ and hybrid crosscheck in the cases that (a)Y = 100 and (b)Y = 5000.

2) Detection probability for SSQ:To return an incom-plete query-result without being detected by the authority,M should forge a proof,i.e., a hashht of all the receivedverification seeds, corresponding to the incomplete query-result. Therefore, only two options can be chosen by theadversary. First, the adversary can directly guess a hash valueht. The probability for the adversary to guess successfully is2−ℓh , implying the detection probabilityPSSQdet,1 is 1 − 2−ℓh

in this case. Second, the adversary can also follow the ruleof SRQ to constructht for incomplete data. In this case, theadversary is forced to guessξ1 keys for each one ofµ groups,leading to the probability of success guess being2−ℓkµξ1 andthe detection probability beingPSSQdet,2 = 1 − 2−ℓkµξ1 . The

final detection probability,PSSQdet , is thus,min{PSSQdet,1,PSSQdet,2}.

Obviously,PSSQdet will also be close to1 when appropriate keylength or hash function is selected.

B. Communication Cost

The communication cost,T, is defined as the number of bitsin the communications required for the proposed schemes. Weare mainly interested in the asymptotic result in terms ofdandN because they reflect the scalability of the number ofattributes and the network size, respectively. We do not countthe number of bits in representing data,Ei,t, since the sendingof Ei,t is necessary in any data collection scheme. We furtherassume that there are on averagepdtobY data buckets, where0 ≤ pdtob ≤ 1, generated in cellC.

1) Communication Cost of SRQ and STQ:Each sensornode si in SRQ is required to sendBi,t,Hi,t, and Pi,t toits parent node. Nevertheless,Bi,t,Hi,t, and Pi,t can beaggregated along the path in the aggregation tree. As a con-sequence,si actually has only one-hop broadcast containingBi,t,Hi,t, andPi,t once at each epoch. In addition, to answera range query,M is responsible for sendingHM,t, PM,t,and the bucket IDs inA. In summary, the communicationcost, TSRQ, can be calculated as(N − 1)(ℓh + ℓP ) +pdtobY ⌈logw⌉d logN + ℓh + ℓP + |A|⌈logw⌉d = O(N +d logN), whereℓP is the number of bits used to represent thebucket primePM,t. Due to the similarity between SRQ andSTQ, the communication cost,TSTQ, can also be calculatedasO(N + d logN) in a way similar to the one for obtainingTSRQ.

0500

10001500

2000

0

5

10

15

200

1

2

3

4

5

x 107

SRQ

Hybrid Crosscheck

Total number of attributesNumber of nodes in a cell

Com

mun

icat

ion

cost

(a)

0500

10001500

2000

0

5

10

15

200

2

4

6

8

x 107

SRQ

Total number of attributes

Hybrid Crosscheck

Number of nodes in a cell

Com

mun

icat

ion

cost

(b)

Fig. 3: The communication cost of SRQ and hybrid crosscheck in the cases that (a)Y = 100 and (b)Y = 5000.

As shown in Fig. 3 where the parametersℓh = 80 andℓP = 1000 are used, the communication cost of SRQis significantly lower than that of hybrid crosscheck. Morespecifically, as the communication cost of hybrid crosscheckcan be asymptotically represented asO(N2+Nd) and will bedrastically increased withN andd, the proposed SRQ scheme,however, exhibits low communication cost regardless of theamount of sensed data in the network due to the fact that thesize of the proof used in SRQ is always a constant. Hence,the communication cost of SRQ will be dominated by theaggregation procedure, the average hop distance betweenMand each node, and the transmission of bucket IDs, resultingO(N + d logN) communication cost.

2) Communication Cost of SSQ:Since grouping is onlyperformed once after sensor deployment, we ignore its com-munication cost. Note that, in the following, the communica-tion cost of the data should be counted because it is involvedin the design of SSQ. After the grouping, eachsi broadcasts

Page 8: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

8

the data bucket IDs, sensor ID, and the order seed to all thenodes within the same group. Assuming that the nodes employa duplicate suppression algorithm, by which each node onlybroadcasts a given message once, one node should broadcasta message withYN ℓd +

pdtobYN ⌈logw⌉d+ ℓid + ℓh bits, where

ℓd is the average size of a datum andℓid is the number of bitsrequired to represent sensor IDs, resulting in communicationcost ofC1 = |Gη|2( YN ℓd +

pdtobYN ⌈logw⌉d + ℓid + ℓh) bits

in each group. Let0 ≤ pq ≤ 1 be the average ratio ofthe quasi-skyline data to all the sensed data.|Φη,t| is equalto pdtobpqY on average. Then, once the the order seed isamong the firstξ1 smallest ones of the order seeds in a group,si ∈ Gη is required for sendingΦη,t, φη,t, hashKi,t

(Φη,t),its sensor ID, and the IDs of sensor nodes generating bucketsin Φη,t to M, implying the communication cost ofC2 =pdtobpqY ⌈logw⌉d+pqY ℓd+ℓh+ℓid+|Gη|ℓid bits in the worstcase. Note that the verification seeds cannot be aggregated,although the verification seed can also be delivered along thepath to M on the aggregation tree.M needs to send theskyline bucket IDs, its sensor ID, and a hashht as the proof tothe authority for answering the query. The communication costof M is C3 = ℓid+ ℓh+µ

∑µη=1 pdtobpqY ⌈logw⌉d+pqY ℓd.

Consequently, the upper bound of communication cost,TSSQ,

can be obtained asµC1 +µξ1C2 logN +C3 = O(N3

2 +Nd)when µ =

√N and |Gη| =

√N , ∀η ∈ [1, µ]. By similar

derivation, the communication cost of the baseline scheme isO(N2d). Thus, exploiting the proposed grouping techniquedoes reduce the required communication cost. The trends ofcommunication cost in SSQ are shown in Fig. 4.

0500

10001500

2000

0

5

10

15

200

2

4

6

8

10

x 106

Nd

Com

mun

icat

ion

cost

(a)

0500

10001500

2000

0

5

10

15

200

2

4

6

8

10

x 107

Nd

Com

mun

icat

ion

cost

(b)

Fig. 4: The communication cost of SSQ for (a)Y = 100 and (b)Y = 5000.

V. I MPACT OF SENSORNODE COMPROMISE

In the previous discussions, we have ignored the impact ofthe compromise of sensor nodes. In practice, the adversarycould take the control of sensor nodes to enhance the ability

of performing malicious operations. The notations is usedto denote a set of random sensor nodes compromised bythe adversary. Incollusion attackconsidered here,M col-ludes withs in the hope that more portions of query-resultsgenerated by innocent sensor nodes can be dropped. Sincecrosscheck approaches [40] suffer from collusion attacks,theimpact of collusion attack on secure range query for tieredsensor networks was addressed in [40]. Their proposed methodis random probing, by which the authority occasionally checksif there is no data sensed by some randomly selected sensornodes by directly communicating with them. Random probing,however, can only discover the incomplete query-results withan inefficient but possibly lucky way and cannot identifys.

On the other hand, in this paper, we identify a new Denial-of-Service attack never addressed in the literature, called false-incrimination attack, by which s provides false sub-proofs tothe innocent storage node so that the innocent storage nodewill be regarded as the compromised one and be revoked.Unfortunately, all the prior works [23], [31], [40] suffer fromthis attack. In summary, with minor modifications involved,our proposed SRQ, STQ, and SSQ schemes are resilientagainst both the collusion attack and false-incriminationattack.

It should be especially noted that, in the following discus-sion of SRQ, we temporarily make two unrealistic assumptionsthat the compromised nodes can only disobey the proceduresof the proposed schemes4 and the (subtree) proofs (definedlater) will not be manipulated by the compromised storageand sensor nodes, in order to emphasize on the effectivenessof our proposed technique in identifying compromised nodes.Nevertheless, these two assumptions will be relaxed later.

Impact of Sensor Node Compromise on SRQ and STQ.Under the above two assumptions, the SRQ scheme is in-herently resilient against collusion attack, because, regardlessof the existence and position ofs, the proper bucket keyswill be embedded into the proofs and cannot be removedby s. Nevertheless, SRQ could be vulnerable to the false-incrimination attack since false sub-proofs injected bys willbe integrated with the other correct sub-proofs to constructa false proof, leading to the revocation of innocentM. Here,we present a novel technique calledsubtree samplingenablingSRQ, with a slight modification, to efficiently mitigate thethreat of false-incrimination attacks. The idea ofsubtreesampling is to check if the proof constructed by the nodesin a random subtree with fixed depth is authentic so asto perform the attestation only on the remaining suspiciousnodes. Letm be a user-selected constant indicating thesubtree depth. In the modified SRQ scheme, once receiving〈jρ, t, Ejρ,t,Bjρ,t,Hjρ,t,Pjρ,t, ϑ0jρ,t, . . . , ϑ

m−1jρ,t

〉, jρ ∈ [1, N −1], ∀ρ ∈ [1, χ], from its χ children, sj1 , . . . , sjχ , each sicalculatesEjρ,t, Bjρ,t, Hjρ,t, andPjρ,t as in the original SRQscheme. Note that, ifsi is a leaf node on the aggregationtree, it is assumed thatsi receives〈∅, ∅, ∅, ∅, 0, 1, ∅, . . . , ∅〉. Inthe modified SRQ scheme, however,si additionally performs

4In other words, compromised nodes are assumed tonot inject bogus sensorreadings. They can only manipulate its own subproof and the proofs sent fromits descendant sensor nodes.

Page 9: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

9

the following operations. Assume thatHυjρ,t, P

υjρ,t ∈ ϑυjρ,t,

∀υ ∈ [0,m − 1], ∀jρ ∈ [1, N − 1], and ∀ρ ∈ [1, χ]. sicalculatesHυ

i,t = hashKi,t(∑χ

ρ=1 Hυ−1jρ,t

+ Hi,t) and P υi,t =∏χρ=1 P

υ−1jρ,t

· Pi,t, ∀υ ∈ [1,m]. si also calculatesH0i,t = Hi,t

and P 0i,t = Pi,t, whereHi,t andPi,t are computed in a way

stated in Sec. III-A. Then,Hυi,t and P υi,t are assigned to set

ϑυi,t, ∀υ ∈ [0,m − 1]. If hashKi,t(i) ≤ ξ2, where ξ2 is a

pre-determined threshold known by each node and will beanalyzed later, thensi sendsϑmi,t to (possibly compromised)M. Let Tmi be a subtree of the underlying aggregation tree,rooted atsi with depthm. ϑmi,t generated bysi can be thoughtof as thesubtree proofof the data sensed by the nodes inTmi .Finally, si sends〈i, t, Ei,t,Bi,t,Hi,t,Pi,t, ϑ0i,t, . . . , ϑm−1

i,t 〉 toits parent node. LetWt be thewitness setof sensor nodessatisfyinghashKi,t

(i) ≤ ξ2 at epocht. The nodes inWt arecalledwitness nodesat epocht.

(a)

(b)

Fig. 5: The conceptual diagrams of identifying the compromised nodes. (a) Only thenodes in red area need to be attested. (b) Only the nodes in gray area need to be attested.

We first consider the simplest case, where no compromisednodes act as the witness nodes. We further assume that onlyone compromised node (i.e., |s| = 1) injects false sub-prooffor simplicity and our method can be adapted to the case ofmultiple compromised nodes injecting false sub-proofs. Wehave the following observation regarding each witness nodesiand its generatedϑmi,t. Assume that the query-result is found tobe incomplete at epocht. The authority requestsϑmi,t for eachwitness nodesi stored inM. To identify the compromisednodes, all the nodes at first are considered neutral. The nodesin Tmi become innocent if the verification ofϑmi,t passes, andthe nodes inTmi become suspicious otherwise5. The aboveverification is performed as follows. According to thePmi,t inthe subtree proofϑmi,t, the authority knows which nodes in thesubtreeTmi contribute data and their amount. The authority

5Note that there would be the cases that a node is deemed to be bothinnocent and suspicious when different subtree proofs are considered. Whensuch a case happens, that node obviously should be innocent.

can, based on this information, calculateHmi,t by itself. Define

ETmi,t as the set of data sensed by the nodes inTmi , andBTm

i,t

as the corresponding bucket IDs ofETmi ,t. As a consequence,

the verification passes if and only if theHi,t calculated bythe authority itself is equal to theHi,t extracted from thereceived receivedϑmi,t, and the amount of the data inETm

i ,t

falling into the specified buckets inBTmi,t matches the one

indicated inPi,t. As a whole, each time the above checkingprocedures are performed according toϑmi,t at epocht, nodeswill be partitioned into three sets, innocent setIit, neutralset, and suspicious setSi

t containing innocent nodes, neutralnodes, and suspicious nodes, respectively. We can concludethat (

⋂i∈Wt,Si

t 6=∅ Sit)⋃{M} contains at least one compro-

mised node injecting the false subproof if at least oneSit is

nonempty and({s1, . . . , sN−1}\(⋃i∈Wt

Iit))⋃{M} contains

at least one compromised node injecting the false subproofotherwise. In more details, the authority at first only performsthe attestation [25], [26], [28] on the nodes in

⋂i∈Wt,Si

t 6=∅ Sit

or in {s1, . . . , sN−1} \ (⋃i∈Wt

Iit). Nonetheless, after theattestation, if the nodes being attested are ensured to be notcompromised, thenM should be the compromised node. Itcan be observed that the size of the set of the nodes theauthority needs to perform the attestation has possibilityofbeing drastically shrunk so that the computation and commu-nication cost required in the attestation will be substantiallyreduced as well. It can also be observed that each time theattestation is performed, at least one compromised node canbe recovered. The intuition behind the checking procedure canbe illustrated in Fig. 5. In addition, Fig. 6 depicts the numberof nodes to be attested after an incomplete reply is foundin different settings. Since the number of witness nodes isapproximatelyξ2(N − 1)/2ℓh, it can be observed from Fig. 6that the larger theξ2 andm, the lower the number of nodesto be attested. Nevertheless, whenξ2 andm become larger,the communication cost of the modified SRQ scheme, whichwill be presented later, is increased as well.

There, however, would still be the cases where the compro-mised nodes luckily act as witness nodes so that they can beconsidered innocent by sending genuine subtree proof toM.In our consideration, this case does happen, but our techniquealso successfully mitigates the threat of false-incriminationattacks because the effectiveness of false-incriminationattacksis now limited within the case where some compromised nodeswork as witness nodes.

Now, we have to remove the two unrealistic assumptions wemade before, allowing that the bogus data can be injected andthe subtree proof sent from the witness node has possibilityto be maliciously altered on its way to the authority. Afterthe removal of these two unrealistic assumptions, SRQ isstill resilient against collusion attacks because the use ofthe proofs guarantees that the misbehavior of deleting thesensed data will be detected. Unfortunately, on the one hand,the compromised nodes injecting the bogus data can deceivethe authority into accepting the falsified sensor reading. Onthe other hand, the compromised nodes lying on the path

Page 10: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

10

0

50

100

150

200 0

2

4

6

80

100

200

300

400

500

Subtree depth (m)Number of witness nodes

Num

ber

of nodes to b

e a

tteste

d

(a)

0

50

100

150

200 0

2

4

6

8

0

200

400

600

800

1000

Subtree depth (m)Number of witness nodes

Num

ber

of nodes to b

e a

tteste

d

(b)

Fig. 6: The number of nodes to be attested for (a)N = 500 and (b)N = 1000.

betweenM and witness nodes can manipulate the subtreeproofs so that the compromised nodes can avoid the detectionand the innocentM can still be falsely incriminated. In ourstrategy, we use theredundancy property, which have beenwidely used in the design of the other security protocols inWSNs such as en-route filtering [34], [35], [37], [39], tomitigate the former threat, while we, motivated by the IPtraceback technique in internet security literature, develop anew traceback technique suitable for WSNs to resist againstthe latter attack. Here, the redundancy property means thatusually WSNs are densely deployed so that an event in thesensing region can be simultaneously detected by multiplenodes. In general, the redundancy property can be obtainedby achieving the so-calledt-coverage [10], [18], [32] andtherefore, an event can be simultaneously observed by at leastt nodes. In the following description of our remedy to theseproblems, due to its similarity to our modified SRQ schemepreviously mentioned, we will omit some notational details,stressing on the procedures itself.

When the redundancy property is used, in essence, ourSRQ does not need to be changed. Assume for now that, theauthority issues a range query and receives the query-resultfrom M. In addition, we also assume that all the checkingprocedures stated in (modified) SRQ are passed. The authoritynow wants to know whether the received data are falsified byand sent from the compromised nodes. Recall that each nodecan be aware of its geographic position. After knowing whichnodes contribute the sensed data to its issued query from thequery-result, the authority further acquires the encrypted dataof the neighbors of those nodes fromM. Note that this canbe achieved because when geographic positions of all nodes

are known by the authority6, inferring the one-hop neighborsof a specific node can be easily achieved. Finally, for eachnode in the query-result, the authority checks the consistencyof its sensor reading and the sensor readings of its one-hopneighbors. The sensed data will be rejected as long as it isinconsistent with the sensor readings of its neighbors. Theconsistency checking procedure here may allow for certainmeasurement errors or environmental factors, which shouldbe domain-specific and user-defined.

Now, we turn to address another problem that the subtreeproofs transmitted from the witness node to the authoritycould be maliciously altered by the compromised (storageor sensor) nodes. To deal with this kind of threat, we needa mechanism, by which the receiver not only can knowwhether the received message is modified by the intermediatenodes, but also can point out the node modifying the messageif it does exist. Motivated by the IP traceback techniques,we develop arecursive tracebackmechanism. To be morespecifically, each nodesi on the path connecting theM andthe witness node, after receivingD, whereD denotes thesubtree proof, from its descendant node, attacheshashKi,t

(D)toD and then forwardsD||hashKi,t

(D) to its ascendant nodeon the underlying aggregation tree. Note that it is assumed thatthe witness node receivesD||∅. Thus, with this modification, asubtree proof received by the authority should be accompaniedwith ψ hashes, whereψ is the number of nodes (includingstorage nodes and the witness node itself) between a specificwitness node and the authority. Here, we should note thatbecause the topology of the aggregation tree is known bythe authority, when the subtree proof is sent, the IDs ofintermediate nodes except for the ID of the witness nodeitself do not need to be attached7. Hence, when the query-result is deemed incomplete, before conducting the proceduresdefined in the subtree sampling technique to attest nodes,the authority checks whether the received subtree proof ismaliciously altered by the intermediate node. In particular,assume that the subtree proof and itsψ associated hashes,D||hD1 || · · · ||hDψ , wherehDi , 1 ≤ i ≤ ψ, is the hash calculatedby the node that is(i − 1)-hop away from the witness nodeandhD1 is computed by the witness node itself according toits asserted subtree proof, are received by the authority. Afterthe reception ofD||hD1 || · · · ||hDψ , the authority checks theconsistency of the hash backward;i.e., it first checkshDψ , andthenhDψ−1, and so on. If such a verification can be successfullyproceeded allψ hashes, then the subtree proofD is consideredto be intact. Otherwise, once the verification fails in, say,hDj ,we can conclude that the subtree proof was altered by thecorresponding node since the innocent node is not assumed tobehave in such a way.

6As long as each node knows its position, it sends in a multihopmannerits position to the authority with the MAC constructed by thekey uniquelyshared with the authority. This kind of operations are only performed onceafter the sensor deployment.

7The path from any sensor node in the aggregation tree toM is unique.The authority can infer the nodes the subtree proof traverses once it is awareof the ID of the witness node.

Page 11: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

11

Recall that the communication cost of original SRQ isO(N + d logN). In the modified SRQwithout those twounrealistic assumptions, each nodesi at epocht is requiredto additionally sendϑ0i,t, . . . , ϑ

m−1i,t , leading toO(N) com-

munication cost. At epocht, approximatelyξ2(N − 1)/2ℓh

witness nodes will send their subtree proofs toM. Never-theless, when the subtree proofs are sent toM, since theadditional hashes will be added, its communication cost willbe O(N · (

√N + (

√N − 1) + · · · + 1)). As a result, the

communication cost becomesO(N2 + d logN).Basically, STQ can be regarded as a special use of SRQ.

Thus, its resiliency against false-incrimination attack is thesame as that of SRQ. Nevertheless, due to the nature of top-k query, the injection of the falsified sensor reading fromthe compromised nodes will imply a significant query-resultdeviation. Nevertheless, due to the use of the redundancyproperty in resisting against false data injection, STQ isresilient against the collusion attacks and false-incriminationattacks even when the compromised nodes can inject bogusdata and can manipulate the proofs. Finally, because of itssimilarity to the SRQ scheme, STQ has the communicationoverhead the same as SRQ’s.

Impact of Sensor Node Compromise on SSQ.Recallthat two aforementioned unrealistic assumptions are made.Wefirst consider the resiliency against false-incriminationattacks.The simplest method is to introduce a parameterξ3 ≤ ξ1so thatM reports to the authority all the verification seeds,instead of the hash of them in original SSQ. After that, foreach group, if at leastξ3 out of ξ1 verification seeds canbe successfully verified, then the quasi-skyline data in thatgroup are considered complete. Hence, the threat of false-incrimination attacks will be mitigated because the adversaryis forced to send at leastξ1 − ξ3 + 1 false proofs, instead ofsingle one false proof. With even one verification seed froma specific group failed to be verified, we can conclude that atleast one compromised sensor node exists in that group.

Now, we consider collusion attacks. In SSQ, bothMand sensor nodes only know the quasi-skyline data. Recallthat quasi-skyline data are not necessarily the skyline data.Thus, even when there is more than one compromised sensornode in a group, the probability of successfully dropping theskyline data is actually small. More specifically, to drop theskyline data at a specified epoch,s should contain at leastξ3sensor nodes responsible for sending hash values in order tosuccessfully forge a proof of incomplete quasi-skyline data,and at the same time should be fortunate enough to selectgroups whose quasi-skyline data contain skyline data8.

Now, we consider both the collusion and false-incriminationattacks. To drop the skyline data, the only thingM can dois to drop the data of certain groups. Here, for simplicity, weconsider the case whereM drops the data of a fixed groupGη,η ∈ [1, µ], in which a sets of x sensor nodes is compromised.To prevent the detection of incomplete query-result,ξ3 out ofx

8Definitely, M can simply drop all the sensed data. Nevertheless, underthis option, it is forced to forge at leastµ(ξ1 − ξ3 + 1) proofs (µ =

√N in

our analysis), leading to high probability of being detected.

compromised sensor nodes should be the sensor nodes respon-sible for sending the proofs. The probability that at leastξ3 outof ξ1 nodes responsible for sending the proofs are containedin s is p1 =

∑ξ1j=ξ3

(ξ1j

)(|Gη|−ξ1x−j

)/(|Gη|x

). In other words,

this is equal to the probability that the compromised sensornodes can drop the quasi-skyline data without being detected.Quasi-skyline data, however, are not necessarily equivalent tothe skyline data. The probability that the quasi-skyline datadropped by compromised nodes indeed contain skyline datacan be represented asp2 =

(Y−pcY

|Gη|Y/N−pcY

)/(

Y|Gη|Y/N

), where

pc is the average ratio of the skyline data to all the sensed data.In short, this is equal to the probability that the operations per-formed by compromised nodes cause the loss of skyline data.As a whole, even if the adversary hasx compromised nodes inGη, the probability of successfully making skyline query-resultincomplete is merelyp1 · p2. The trends of such a probabilityare depicted in Fig. 7 under different parameter settings. Asshown in Fig. 7a,p1 · p2 is decreased with an increase ofN andY . This is because whenN andY become larger, itis more unlikely that the quasi-skyline data dropped by theadversary contains the skyline data. Nevertheless, as shown inFig. 7b,p1 ·p2 is increased with an increase of(ξ1− ξ3). Thisis because when(ξ1 − ξ3) becomes larger, it is more likelythat s can forge a proof of an incomplete quasi-skyline data.As a whole, from the false-incrimination attack point of view,the larger the(ξ1 − ξ3), the lower thep1 · p2, but from thecollusion attack point of view, the smaller the(ξ1 − ξ3), thelower thep1 · p2. This would be an optimization problem thatdeserves further studying. In addition, compared with originalSSQ, the additional communication cost incurred from themodified SSQ comes from the transmission of verificationseeds fromM to the authority. Thus, the communication costof the modified SSQ scheme remainsO(N

3

2 +Nd).

020

4060

80100

200

400

600

800

10000

0.2

0.4

0.6

0.8

1

YN

p 1⋅ p2

(a)

1 2 3 4 5 6 7

7

7.5

8

8.5

90

0.5

1

1.5

2

2.5

x 10−9

ξ3

ξ1

p 1⋅ p2

(b)

Fig. 7: The probability of successfully dropping skyline data in the cases that (a)ξ1 = 8 andξ3 = 4, and (b)N = 500 andY = 100.

Page 12: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

12

Now, the two unrealistic assumptions will be relaxed so thatthe compromised nodes can provide falsified sensor readingsand contaminate the proofs. The same as the top-k query,skyline query is vulnerable to the falsified data. That is,a compromised node injecting an falsified extreme sensorreading can gain the effect of deleting all the data sensed bythe other sensor nodes. Thus, the redundancy property is sillrequired to be applied on our SSQ scheme so as to detectthe false data injection. Because the use of the redundancyproperty in SSQ is also similar to its use in SRQ, we omitthe detailed description as well. In addition, the adversarymay compromise the sensor nodes near the innocentM sothat it can falsify all the verification seeds. Therefore, ourrecursive traceback mechanism is also required to be appliedon SSQ to secure the verification seeds. Because of these twoadditional changes in SSQ, the communication cost becomesO(N2 +Nd).

VI. CONCLUSION

We propose schemes for securing range query, top-k query,and skyline query, respectively. Two critical performancemetrics, detection probability and communication cost, areanalyzed. In particular, the performance of SRQ is superiorto all the prior works, while STQ and SSQ act as thefirst proposals for securing top-k query and skyline query,respectively, in tiered sensor networks. We also investigate thesecurity impact of collusion attacks and newly identified false-information attacks, and explore the resiliency of the proposedschemes against these two attacks.

REFERENCES

[1] H. Chan and A. Perrig. PIKE: Peer Intermediaries for Key Establishmentin Sensor Networks. inProceeding of IEEE Conference on ComputerCommunications (INFOCOM), 2005.

[2] H. Chan and A. Perrig. Efficient Security Primitives froma SecureAggregation Algorithm. inProceeding of ACM Conference on Computerand Communications Security (CCS), 2008.

[3] H. Chan, A. Perrig, and D. Song. Random Key Predistribution Schemesfor sensor networks. inProceeding of IEEE Symposium on Security andPrivacy (S&P), 2003.

[4] H. Chan, A. Perrig, and D. Song. Secure Hierarchical In-NetworkAggregation in Sensor Networks. inProceeding of ACM Conference onComputer and Communications Security (CCS), 2006.

[5] H. Chen, S. Zhou, and J. Guan. Towards Energy-Efficient SkylineMonitoring in Wireless Sensor Networks. inProceeding of EuropeanConference on Wireless Sensor Networks (EWSN), 2007.

[6] P. Desnoyers, D. Ganesan, H. Li, M. Li, and P. Shenoy. PRESTO: APredictive Storage Architecture for Sensor Networks. inProceeding ofWorkshop on Hot Topics in Operating Systems (HotOS), 05.

[7] L. Eschenauer and V. Gligor. A Key-management Scheme forDistributedsensor networks. inProceeding of ACM Conference on Computer andCommunications Security (CCS), 2002.

[8] H. Hacigumus, B. R. Iyer, C. Li, and S. Mehrotra, Executing SQL overencrypted data in the database service provider model. inProceedingof ACM International Conference on Management of Data (SIGMOD),2002.

[9] B. Hore, S. Mehrotra, and G. Tsudik, A privacy-preserving index forrange queries. inProceeding of International Conference on Very LargeData Bases (VLDB), 2004.

[10] C. -F. Huang and Y. -C. Tseng, The Coverage Problem in a WirelessSensor Network.ACM Mobile Networks and Applications (MONET), vol.10, no. 4, pp. 519-528, 2005.

[11] W. Liang, B. Chen, J. Yu. Energy-efficient skyline queryprocessing andmaintenance in sensor networks. inProceeding of ACM Conference onInformation and Knowledge Management (CIKM), 2008.

[12] X. Li, Y. Kim, R. Govindan, and W. Hong. Multi-dimensional rangequeries in sensor networks. in Proc.Proceeding of ACM Conference onEmbedded Networked Sensor Systems (SenSys), 2003.

[13] D. Liu, P. Ning. Multi-level µTESLA: broadcast authentication fordistributed sensor networks.ACM Transactions in Embedded ComputingSystems (TECS), vol. 3, no. 4, pp. 800-836, 2004.

[14] D. Liu, P. Ning, and W. Du. Attack-resistant location estimation insensor networks. inProceeding of ACM/IEEE International Conferenceon Information Processing in Sensor Networks (IPSN), 2005.

[15] G. Mathur, P. Desnoyers, D. Ganesan, and P. Shenoy. Ultra-low powerdata storage for sensor networks. inProceeding of ACM/IEEE Interna-tional Conference on Information Processing in Sensor Networks (IPSN),2006.

[16] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TAG: aTiny AGgregation Service for Ad-Hoc Sensor Networks. inProceeding ofUSENIX Symposium on Operating Systems Design and Implementation(OSDI), 2002.

[17] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TinyDB:An Acquisitional Query Processing System for Sensor Networks. ACMTransactions on Database Systems (TODS), vol. 30, no. 1, pp. 122-173,Mar. 2005.

[18] S. Meguerdichian, F. Koushanfar, M. Potkonjak, and M. B. Srivastava.Coverage Problems in Wireless Ad-hoc Sensor Networks. inProceedingof IEEE Conference on Computer Communications (INFOCOM), 2001.

[19] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. D. Tygar. SPINS:Security Protocols for Sensor Networks. inProceeding of ACM Inter-national Conference on Mobile Computing and Networking (Mobicom),2001.

[20] P. Rogaway, M. Bellare, and J. Black. OCB: A Block-Cipher Mode ofOperation for Efficient Authenticated Encryption.ACM Transactions onInformation and System Security (TISSEC), vol. 6, no. 3, pp. 365-403,2003.

[21] RISE project. Available:http://www.cs.ucr.edu/˜rise/[22] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and

F. Yu. Data-centric storage in sensornets with GHT, a geographic hashtable.Mobile Networks and Applications, vol. 8, no. 4, pp. 427-442, 2003.

[23] B. Sheng and Q. Li. Verifiable privacy-preserving rangequery in two-tiered sensor networks. inProceeding of IEEE Conference on ComputerCommunications (INFOCOM), 2008.

[24] B. Sheng, Q. Li and W. Mao. Data storage placement in sensor networks.in Proceeding of ACM International Symposium on Mobile Ad HocNetworking and Computing (MobiHoc), 2006.

[25] A. Seshadri, M. Luk, A. Perrig, L. V. Doorn, and P. Khosla. SCUBA:Secure Code Update By Attestation in Sensor Networks. inProceedingof ACM Workshop on Wireless Security (WiSe), 2006.

[26] M. Shaneck, K. Mahadevan, V. Kher, and Y. Kim. Remote Software-based Attestation for Wireless Sensors. inProceeding of European Work-shop on Security and Privacy in Ad-hoc and Sensor Networks (ESAS),2005

[27] K. Sun, P. Ning, C. Wang, A. Liu, and Y. Zhou. TinySeRSync:Secure and Resilient Time Synchronization in Wireless Sensor Networks.in Proceeding of ACM Conference on Computer and CommunicationsSecurity (CCS), 2006.

[28] A. Seshadri, A. Perrig, L. V. Doorn, and P. Khosla. SWATT: SoftWare-based ATTestation for Embedded Devices. inProceeding of IEEE Sym-posium on Security and Privacy (S&P), 2004.

[29] Stargate gateway (SPB400). Available:http://www.xbow.com[30] B. Sheng, C. C. Tan, Q. Li, and W. Mao. An Approximation Algorithm

for Data Storage Placement in Sensor Networks. inProceeding of WASA,2007.

[31] J. Shi, R. Zhang and Y. Zhang. Secure range queries in tiered sensor net-works. inProceeding of IEEE Conference on Computer Communications(INFOCOM), 2009.

[32] X. Wang, G. Xing, Y. Zhang, C. Lu, R. Pless, and C. Gill. IntegratedCoverage and Connectivity Configuration in Wireless SensorNetworks.in Proc.Proceeding of ACM Conference on Embedded Networked SensorSystems (SenSys), 2003.

[33] M. Wu, J. Xu, X. Tang, W.-C. Lee. Top-k Monitoring in Wireless SensorNetworks. IEEE Transactions on Knowledge and Data Engineering(TKDE), vol. 19, no. 6, pp. 962-976, 2007.

[34] Z. Yu and Y. Guan. A Dynamic En-route Scheme for Filtering False DataInjection in Wireless Sensor Networks. inProceeding of IEEE Conferenceon Computer Communications (INFOCOM), 2006.

Page 13: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

13

[35] L. Yu and J. Li. Grouping-based Resilient Statistical En-route Filteringfor Sensor Networks. inProceeding of IEEE Conference on ComputerCommunications (INFOCOM), 2009.

[36] C.-M. Yu, C.-S. Lu, and S.-Y. Kuo. A simple non-interactive pairwisekey establishment scheme in sensor networks. inProceeding of IEEEConference on Sensor, Mesh and Ad Hoc Communications and Networks(SECON), 2009.

[37] F. Ye, H. Luo, S. Lu, and L. Zhang. Statistical En-route Filteringof Injected False Data in Sensor Networks. inProceeding of IEEEConference on Computer Communications (INFOCOM), 2004.

[38] Y. Zhang, W. Liu, Y. Fang, and D. Wu. Secure localizationandauthentication in ultra-wideband sensor networks. inIEEE Journal onSelected Areas in Communications, Special Issue on UWB WirelessComms. - Theory and Applications, vol. 24, no. 4, pp. 829-835, 2006.

[39] S. Zhu, S. Setia, S. Jajodia, and P. Ning. An InterleavedHop-by-HopAuthentication Scheme for Filtering False Data in Sensor Networks. inProceeding of IEEE Symposium on Security and Privacy (S&P), 2004.

[40] R. Zhang, J. Shi, and Y. Zhang. Secure multidimensionalrange queriesin sensor networks. inProceeding of ACM International Symposium onMobile Ad Hoc Networking and Computing (MobiHoc), 2009.

Page 14: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

arX

iv:0

911.

4238

v2 [

cs.N

I] 1

6 D

ec 2

009

Secure Multidimensional Queries in Tiered SensorNetworks

Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§†Institute of Information Science, Academia Sinica, Taipei, Taiwan

§Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan

Abstract—In this paper, aiming at securing range query, top-kquery, and skyline query in tiered sensor networks, we proposethe Secure Range Query (SRQ), Secure Top-k Query (STQ), andSecure Skyline Query (SSQ) schemes, respectively. In particular,SRQ, by using our proposedprime aggregation technique, hasthe lowest communication overhead among prior works, whileSTQ and SSQ, to our knowledge, are the first proposals intiered sensor networks for securing top-k and skyline queries,respectively. Moreover, the relatively unexplored issue of thesecurity impact of sensor node compromises on multidimensionalqueries is studied; two attacks incurred from the sensor nodecompromises,collusion attack and false-incrimination attack, areinvestigated in this paper. After developing a novel techniquecalled subtree sampling, we also explore methods of efficientlymitigating the threat of sensor node compromises. Performanceanalyses regarding the probability for detecting incompletequery-results and communication cost of the proposed schemesare also studied.

I. I NTRODUCTION

Tiered Sensor Networks.Sensor networks are expected tobe deployed on some harsh or hostile regions for data collec-tion or environment monitoring. Since there is the possibilityof no stable connection between the authority and the network,in-network storage is necessary for caching or storing thedata sensed by sensor nodes. A straightforward method is toattach external storage to each node, but this is economicallyinfeasible. Therefore, various data storage models for sensornetworks have been studied in the literature. In [6], [20], anotion of tiered sensor networks was discussed by introducingan intermediate tier between the authority and the sensornodes. The purpose of this tier is to cache the sensed dataso that the authority can efficiently retrieve the cache data,avoiding unnecessary communication with sensor nodes.

The network model considered in this paper is the sameas the ones in [6], [20]. More specifically, some storage-abundant nodes, calledstorage nodes, which are equipped withseveral gigabytes of NAND flash storage [22], are deployedas the intermediate tier for data archival and query response.In practice, some currently available sensor nodes such asRISE [19] and StarGate [27] can work as the storage nodes.The performance of sensor networks wherein external flashmemory is attached to the sensor nodes was also studied in[14]. In addition, some theoretical issues concerning the tieredsensor networks, such as the optimal storage node placement,were also studied in [22], [28]. In fact, such a two-tierednetwork architecture has been demonstrated to be useful in

increasing network capacity and scalability, reducing networkmanagement complexity, and prolonging network lifetime.

Multidimensional Queries. Although a large amount ofsensed data can be stored in storage nodes, the authoritymight be interested in only some portions of them. To thisend, the authority issues proper queries to retrieve the desiredportion of sensed data. Note that, when the sensed data havemultiple attributes, the query could be multidimensional.Wehave observed that range query, top-k query, and skyline queryare the most commonly used queries. Range query [11], [16],which could be useful for correlating events occurring withinthe network, is used to retrieve sensed data whose attributesare individually within a specified range. After mapping thesensed data to a ranking value, top-k query [30], which can beused to extract or observe the extreme phenomenon, is used toretrieve the sensed data whose ranking values are among thefirst k priority. Skyline query [5], [10], due to its promisingapplication in multi-criteria decision making, is also useful andimportant in environment monitoring, industry control,etc.

Nonetheless, in the tiered network model, the storage nodesbecome the targets that are easily compromised because oftheir significant roles in responding to queries. For example,the adversary can eavesdrop on the communications amongnodes or compromise the storage nodes to obtain the senseddata, resulting in the breach ofdata confidentiality. After thecompromise of storage nodes, the adversary can also returnfalsified query-results to the authority, leading to the breachof query-result authenticity. Even more, the compromisedstorage nodes can causequery-result incompleteness, creatingan incomplete query-result for the authority by dropping someportions of the query-result.

Related Work. Secure range queries in tiered sensornetworks have been studied only in [21], [29], [34]. Dataconfidentiality and query-result authenticity can be preservedvery well in [21], [29], [34] owing to the use of the bucketscheme [8], [9]. Unfortunately, encoding approach [21] is onlysuitable for the one-dimensional query scenario in the sensornetworks for environment monitoring purposes. On the otherhand, crosscheck approaches [29], [34] can be applied onsensor networks for event-driven purposes at the expense ofthereduced probability for detecting query-result incompleteness.The security issues incurred from the compromise of storagenodes have been addressed in [21], [29], [34]. The impact ofcollusion attacksdefined as the collusion among compromisedsensor nodes and compromised storage nodes, however, was

Page 15: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

2

only discussed in [34], wherein only a naive method wasproposed as a countermeasure. When the compromised sensornodes are taken into account, a Denial-of-Service attack, calledfalse-incrimination attack, not addressed in the literature, canbe extremely harmful. In such an attack, the compromisedsensor nodes subvert the functionality of the secure queryschemes by simply claiming that their sensed data have beendropped by the storage nodes. After that, the innocent storagenodes will be considered compromised and will be revoked bythe authority. It should be noted that all the previous solutionssuffer from false-incrimination attacks.

Contribution. Our major contributions are:

• The Secure Range Query (SRQ) scheme is proposed tosecure the range query in tiered networks (Sec. III-A).By taking advantage of our proposedprime aggregationtechnique for securely transmitting the amount of datain specified buckets, SRQ has the lowest communicationcost among prior works in all scenarios (environmentmonitoring and event detection purposes), while preserv-ing the probability for detecting incomplete query-resultsclose to1. It should be noted that although incorporatingbucket scheme [8], [9] (described in Sec. III-A1) in theprotocol design [21], [29], [34] is not new, the noveltyof our method lies on the use ofprime aggregationin reducing the overhead and guaranteeing query-resultcompleteness.

• For the first time in the literature, the issues of securingtop-k and skyline queries in tiered networks are studied(Secs. III-B and III-C). Our solutions to these two issuesare Secure Top-k Query (STQ) and Secure Skyline Query(SSQ), respectively. The former is built upon the pro-posed SRQ scheme to detect query-result completeness,while the efficiency of SSQ is based on our proposedgrouping technique.

• The security impact of sensor compromises is studied(Sec. V); collusion attackis formally addressed, and anew Denial-of-Service attack,false-incrimination attack,which can thwart the security purpose in prior works,is first identified in our paper. The resiliency of SRQ,STQ, and SSQ against these two attacks is investigated.With a novel technique calledsubtree sampling, someminor modifications are introduced for SRQ and STQas countermeasures to these two attacks. Moreover, thecompromised nodes can even be efficiently identified andbe further attested [23], [24], [26].

II. SYSTEM MODEL

In general, the models used in this paper are very similarto those in [6], [20], [21], [29], [34].

Network Model. As shown in Fig. 1, the sensor networkconsidered in this paper is composed of a large number ofresource-constrained sensor nodes and a few so-calledstoragenodes. Storage nodes are assumed to be storage-abundant andmay be compromised. In addition, in certain cases, storagenodes could also have abundant resources in energy, computa-tion, and communication. The storage nodes can communicate

with the authority via direct or multi-hop communications.Thenetwork is connected such that, for two arbitrary nodes, at leastone path connecting them can be found.

A cell is composed of a storage node and a number ofsensor nodes. In a cell, sensor nodes could be far awayfrom the associated storage node so that they can com-municate with each other only through multi-hop commu-nication. For example, in Fig. 1, without the relay of thegray node, the black node cannot reach the storage node.

Query

Query

Query

Reply

Reply

Reply The Authority

Storage Node

SensorNode

Cell

Cell-region

Fig. 1: A tiered sensor network.

The nodes in the networkhave synchronized clocks[25] and the time is dividedinto epochs. As in [21],[22], [29], [34], each nodeis assumed to be aware ofthe geographic position itlocates [13], [33] so thatthe association between thesensor node and storagenode can be established. Asa matter of fact, informa-tion about the time and ge-ographic position is indis-pensable for most sensornetwork applications.

For each cell, aggregation is assumed to be performedover an aggregation tree rooted at the storage node. Sincethe optimization of the aggregation tree structure is out ofthe scope of this paper, we adopt the method described inTAG [15] to construct an aggregation tree. We follow theconventional assumption that the topology of the aggregationtree is known by the authority [2], [4]. Similar to sensor nodes,storage nodes also perform the sensing task. Each sensor nodesenses the data and temporarily stores the sensed data in itslocal memory within an epoch. At the end of each epoch,the sensor nodes in a cell report the sensed data stored inlocal memory to the associated storage node. Throughout thispaper, we focus on a cellC, composed ofN−1 sensor nodes,{si}N−1

i=1 , and a storage nodeM.Security Model. We consider the adversary who can com-

promise an arbitrary number of storage nodes. After nodecompromises, all the information stored in the compromisedstorage nodes will be exposed to the adversary. The goalof the adversary is to breach at least one of the following:data confidentiality, query-result authenticity, and query-resultcompleteness. We temporarily do not consider the compromiseof sensor nodes in describing SRQ, STQ, and SSQ in Sec. III.The impact of sensor node compromise on the security breach,however, will be explored in Sec. V. Many security issuesin sensor networks, such as key management [3], [7], [31],broadcast authentication [12], [17], and secure localization[13], [33], have been studied in the literature. This paper fo-cuses on securing multidimensional queries that are relativelyunexplored in the literature, while the protocol design of theaforementioned issues are beyond the scope of this paper.

Query Model. The sensed data can be represented as ad-

Page 16: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

3

dimensional tuple,(A1, A2, . . . , Ad), whereAg, ∀g ∈ [1, d],denotes theg-th attribute. The authority may issue a properd-dimensional query to retrieve the desired portion of data storedin storage nodes. Three types of queries, including range query,top-k query, and skyline query, are considered in this paper.For range query, its form, issued by the authority, is expressedas〈C, t, l1, h1, . . . , ld, hd〉, which means that the sensed data tobe reported to the authority should be generated by the nodesin cell C at epocht, and theirg-th attributes,Ag ’s, should bewithin the range of[lg, hg], g ∈ [1, d]. Top-k query is usuallyassociated with a scalar (linear)ranking function. With rankingfunction,R, the sensed data, even if it is multidimensional, canbe individually mapped to a one-dimensionalranking value.The top-k query issued by the authority is in the form of〈C, t, R, k〉. As the first attempt to achieve secure top-k query,the goal of top-k query in this paper is simply assumed toobtain the sensed data generated by the nodes in cellC at epocht with the first k smallest ranking values. For skyline query,the desiredskyline dataare defined as those notdominatedbyany other data. Assuming that smaller values are preferabletolarge ones for all attributes, for a set ofd-dimensional data, adatumci dominatesanother datumcj if both the conditions,Ag(ci) ≤ Ag(cj), ∀g ∈ [1, d], andAg(ci) < Ag(cj), ∃g ∈[1, d], whereAg(ci) denotes theg-th attribute value of thedatumci, hold. Hence, the form of the skyline query issuedby the authority is given as〈C, t〉, which is used to retrievethe skyline data generated in cellC at epocht.

III. SECURING MULTIDIMENSIONAL QUERIES

In this section, aiming at securing range query, top-k query,and skyline query, we propose the SRQ (Sec. III-A), STQ(Sec. III-B), and SSQ (Sec. III-C) schemes, respectively. Notethat though SRQ, STQ, and SSQ use the bucket scheme [8],[9], the novelty of them is due to their design in efficientlydetecting the incomplete query-result (described later).

A. Securing Range Queries (SRQ)

Our proposed SRQ scheme consists of a confidentiality-preserving reporting phase (Sec. III-A1) that can simultane-ously prevent the adversary from accessing data stored inthe storage nodes, authenticate the query results, and ensureefficient multidimensional query processing, and a query-resultcompleteness verification phase (Sec. III-A2) for guaranteeingthe completeness of query-results.

1) Confidentiality-preserving reporting:Data encryption isa straightforward and common method of ensuring data confi-dentiality against a compromised storage node. Moreover, wehope that even when the adversary compromises the storagenode, the previously stored information should not be exposedto the adversary. To this end, the keys used in encryptionshould be selected from a one-way hash chain. In particular,assume that a keyKi,0 is initially stored in sensor nodesi.At the beginning of epocht, the keyKi,t, which is used onlywithin epocht, is calculated ashash(Ki,t−1), wherehash(·)is a hash function, andKi,t−1 is dropped. Suppose that sensornodesi has sensed dataD at epocht. One method for storing

D in the storage nodeM while preserving the privacy is tosend{D}Ki,t

, which denotes the encryption ofD with thekeyKi,t. With this method, when an OCB-like authenticatedencryption primitive [18] is exploited, the authenticity of Dcan be guaranteed. At the same time,D will not be knownby the adversary during message forwarding and even afterthe compromise of the storage node at epocht because theadversary cannot recover the keys used in the time beforeepocht. Nevertheless, no query can be answered byM ifonly encrypted data is stored inM. Hence, thebucket schemeproposed in [8], [9], which uses the encryption keys generatedvia a one-way hash chain, is used in the SRQ scheme.

In the bucket scheme, the domain of each attributeAg,∀g ∈ [1, d], is assumed to be known in advance, and isdivided into wg ≥ 1 consecutive non-overlapping intervalssequentially indexed from1 to wg, under a publicly knownpartitioning rule. For ease of representation, in the following,we assume thatwg = w, ∀g ∈ [1, d]. A d-dimensional bucketis defined as a tuple,(v1, v2, . . . , vd) (hereafter calledbucketID), wherevg ∈ [1, w], g ∈ [1, d]. The sensor nodesi, whenit has sensed data at epocht, sends toM the correspondingbucket IDs, which are constructed by mapping each attributeof the sensed data to the proper interval index, and the senseddata encrypted by the keyKi,t. For example, whensi hassensed data(1, 3), (2, 4), and(2, 11) at epocht, the messagetransmitted to the storage node at the end of epocht is〈i, t, (1, 1), {(1, 3), (2, 4)}Ki,t

, (1, 2), {(2, 11)}Ki,t〉, assuming

thatA1, A2 ∈ [1, 20], w = 2, and each interval length, set at10, is the same.

Let V be the set of all possible bucket IDs. Assume thatthere are on averageY andY/N data generated in a cell andin a node, respectively, at epocht. Assume that,Di,t,V is aset containing all the data within the bucketV ∈ V sensedby si at epocht. The messages sent fromsi to M at theend of epocht can be abstracted as〈i, t, Jσ, {Di,t,Jσ

}Ki,t〉,

whereJσ ∈ V , Jσ 6= Jσ′ , 1 ≤ σ, σ′ ≤ Y/N if there are somedata sensed bysi within epocht. Note thatsi sends nothingto M if Di,t,Jσ

’s, ∀Jσ ∈ V , are empty. After that,M cananswer the range query according to the information revealedby the bucket IDs. Assume thatlg andhg are located withintheαg-th andβg-th intervals, respectively, whereαg ≤ βg, αg,βg ∈ [1, w], andg ∈ [1, d]. The encrypted data falling into thebuckets in the setA = {(ρ1, . . . , ρd)|αg ≤ ρg ≤ βg, g ∈[1, d]} are reported to the authority. In other words, oncereceiving the range query,M first translates the informationl1, h1, . . . , ld, hd into the proper bucket IDs and then repliesall the encrypted data falling into the buckets1 in A.

Nevertheless, in tiered sensor networks, even when theoriginal bucket scheme is used,M could still maliciouslydrop some encrypted data and only report part of the resultsto the authority, resulting in an incomplete query-result.In

1There is a tradeoff between the communication cost and confidentialityin terms of bucket sizes because larger bucket size implies higher dataconfidentiality and higher communication cost due to more superfluous databeing returned to the authority. The design of optimal bucketing strategies isbeyond the scope of this paper, and we refer to [8], [9] for more details.

Page 17: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

4

the following, we will describe an extended bucket scheme,which incorporates the prime aggregation strategy into theoriginal bucket scheme, to detect the incomplete reply in acommunication-efficient manner.

2) Query-Result Completeness Verification:With primeaggregationtechnique, SRQ detects an incomplete reply bytaking advantage of aggregation for counting the amount ofsensed data falling into specified buckets. Together with ahash for verification purpose, the count forms a so-calledproof in detecting an incomplete reply. The storage nodeMis required to provide the proof to the authority at the epochspecified in the query so that the authority can use the proofto verify the completeness of received query-results. Since inour design all the sub-proofs generated by the nodes can beaggregated to yield the final proof, the communication cost canbe significantly reduced. The details are described as follows.

Assume that an aggregation tree [15] has been constructedafter sensor deployment. Recall that the domain of attributeAgis divided intow intervals. Before the sensor deployment, aset{pVi , p∅i |∀i ∈ [1, N−1], V ∈ V} of (wd+1)(N−1) primenumbers is selected by the authority such thatpVi 6= pV

i′ andp∅i 6= p∅i′ if i 6= i′ or V 6= V ′. Then, the set{pVi , p∅i |V ∈ V} ofwd+1 prime numbers, called the set ofbucket primesof si, isstored in each sensor nodesi. In addition, a set{kVi,0, k∅i,0|V ∈V} of wd + 1 keys is selected by the authority and is storedin each sensor nodesi initially. For fixed i and t, the set of{kVi,t, k∅i,0|V ∈ V} is called the set ofbucket keysof si at epocht. Bucket primes could be publicly-known, while bucket keysshould be kept secret. Each sensor nodesi, at the beginning ofepocht, calculateskVi,t = hash(kVi,t−1) and then dropskVi,t−1,∀V ∈ V . In addition,si also calculatesk∅i,t = hash(k∅i,t−1)

and then dropsk∅i,t−1.Recall that each nodesi on average hasY/N sensed data

at epoch t, and assume that the set ofY/N bucket IDsassociated with theseY/N sensed data isBi,t = {vi,t,σ|σ =1, . . . , Y/N}, which could be a multiset. Then, according toits sensed data,si calculatesHi,t = hashKi,t

(∑Y/N

σ=1 kvi,t,σ

i,t ),where hashK(·) denotes the keyed hash function with keyK, if it has sensed data, andHi,t = hashKi,t

(k∅i,t) oth-

erwise. Moreover,si computesPi,t =∏Y/Nσ=1 p

vi,t,σ

i if ithas sensed data, andPi,t = p∅i otherwise. Moreover, oncereceiving〈jρ, t, Ejρ,t,Bjρ,t,Hjρ,t,Pjρ,t〉, jρ ∈ [1, N−1], ∀ρ ∈[1, χ] from its χ children, sj1 , . . . , sjχ , si calculatesEi,t =(⋃χρ=1 Ejρ,t)

⋃Ei,t, whereEi,t denotes the set of encrypted

data sensed bysi at epocht, andBi,t = (⋃χρ=1 Bjρ,t)

⋃Bi,t,

whereBi,t denotes the set of bucket IDs ofEi,t. In addition,si also calculatesHi,t = hashKi,t

(∑χ

ρ=1 Hjρ,t + Hi,t) andPi,t =

∏χρ=1 Pjρ,t · Pi,t, ρ ∈ [1, χ]. Finally, si reports

〈i, t, Ei,t,Bi,t,Hi,t,Pi,t〉 to its parent node on the aggregationtree. Note that, ifsi is a leaf node on the aggregation tree,then we assume that it receives〈∅, ∅, ∅, ∅, 0, 1〉.

Assume that the set{pVM, p∅M|V ∈ V} of wd + 1 primenumbers stored inM are all different from those stored insensor nodes, and the set{kVM,0, k

∅M,0|V ∈ V} of wd + 1

bucket keys are selected by the authority and stored inM. M

computeskVM,t = hash(kVM,t−1) and dropskVM,t−1 at epocht. In addition,M also computesk∅M,t = hash(k∅M,t−1) anddropsk∅M,t−1 at epocht. For the storage nodeM, it can alsocalculateEM,t, BM,t, HM,t, andPM,t according to the itsown sensed data at epocht. In fact, the proceduresM needs toperform after messages are received from the child nodes arethe same as the ones performed by the sensor nodes. Actingas the root of the aggregation tree, however,M keeps theaggregated results, which are denoted asEM,t,BM,t,HM,t

andPM,t, respectively, in its local storage and waits for thequery issued by the authority. Note thatPM,t can be thoughtof as a compact summary of the sensed data of the wholenetwork and can be very useful for the authority in checkingthe completeness of the query-result, whileHM,t can be usedby the authority to verify the authenticity ofPM,t.

Assume that a range query〈C, t, l1, h1, l2, h2, . . . , ld, hd〉is issued by the authority. The encrypted data falling intothe buckets in the setA, along with the proof composedof HM,t and PM,t, are sent to the authority. OncePM,t

is received, the authority immediately performs the primefactor decomposition ofPM,t. Due to the construction of{pVi , pVM, p∅i , p

∅M|i ∈ [1, N − 1], V ∈ V}, which guarantees

that the bucket primes are all distinct, after the prime factordecomposition ofPM,t, the authority can be aware of whichnode contributes which data within specified buckets. As aresult, the authority can know which keys should be used toverify the authenticity and integrity ofHM,t. More specifi-cally, assume thatPM,t = (p1)

a1 · · · (pγ)aγ , a1, . . . , aγ > 0,γ ≥ 0, and thatp1, . . . , pγ are distinct prime numbers. Fromthe construction ofPM,t, we know that(pk)

ak , for k ∈ [1, γ],

is equal to(pk′′

k′ )ak , for k′ ∈ [1, N − 1] and k′′ ∈ V . From

the procedure performed by each node, it can also be knownthat the appearance of(pk)

ak = (pk

′′

k′ )ak in PM,t means that

at epocht the sensor nodesk′ producesak data falling intobucketk′′, contributing the bucket keykk

′′

k′,t in totalak times inHM,t. Here, the sensor nodesk′ producing the data falling intothe bucket∅ means thatsk′ senses nothing. Thus, we can inferthe total amount of data falling into specified buckets at epocht. Recall that the authority is aware of the topology of theaggregation tree. Thus, after the prime factor decompositionof PM,t, the authority can reconstructHM,t according to thederivedak’s andpk’s by its own effort, because it knowsKi,t

andkVi,t, ∀i ∈ {1, . . . , N−1,M}, ∀t ≥ 0, ∀V ∈ V . Therefore,we know that theHM,t reconstructed by the authority is equalto the receivedHM,t if and only if the receivedPM,t areconsidered authentic. When the verification ofPM,t fails, Mis considered compromised. When the verification ofPM,t issuccessful, the authority decrypts all the received encryptions,and checks whether the number of query-results falling intothe buckets inA matches those indicated byPM,t. If andonly if there are matches in all the buckets inA, the receivedquery-results are considered complete.

B. Securing Top-k Queries (STQ)

Basically, the proposed STQ scheme for securing top-kquery is built upon SRQ in that both confidentiality-preserving

Page 18: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

5

reporting and query-result completeness verification phases inSRQ are exploited. In particular, based on the proof generatedin SRQ, since it can know which buckets contain data, theauthority can also utilize such information to examine thecompleteness of query-results of top-k query. In other words,top-k query can be secured by the use of the SRQ scheme.Because of the similarity between the SRQ and STQ schemes,some details of the STQ scheme will be omitted in thefollowing description.

Here, a bucket data setis defined to be composed ofbucket IDs. We use ad-dimensional tuple,(v1, . . . , vd), wherevg ∈ [1, w], to represent the bucket IDs in a bucket dataset. With this representation, we can use the ranking func-tion, R, to calculate the ranking value of each bucket ID.Assume that theν-th interval in theg-th attribute containsthe values in[uℓg,ν , u

hg,ν]. The ranking value of the bucket ID,

(v1, . . . , vd), is evaluated asR(uℓ1,v1

+uh1,v1

2 , . . . ,uℓd,vd

+uhd,vd

2 ),

where thed-dimensional tuple,(uℓ1,v1

+uh1,v1

2 , . . . ,uℓd,vd

+uhd,vd

2 ),whose individual entry is simply averaged over the minimumand maximum values in each interval, acts as the representativeof the bucket(v1, . . . , vd) for simplicity.

Recall that we simply assume that the data with the firstk smallest ranking values are desired. The general form ofthe message sent fromsi to its parent node at the end ofepoch t is 〈i, t, Ei,t,Bi,t,Hi,t,Pi,t〉, where Ei,t, Bi,t, Hi,t,andPi,t are the same as those defined in SRQ. Assume thatζ1, . . . , ζk ∈ V arek bucket IDs in the bucket data set whoseranking values are among the firstk smallest ones. Accordingto BM,t, M can calculate the ranking values of bucket IDsin BM,t and, therefore, knowsζ1, . . . , ζk. To answer a top-kquery, 〈C, t, k〉, the storage nodeM reports the bucket IDs,ζ1, . . . , ζk, and their corresponding encrypted data, along withHM,t andPM,t, to the authority because it can be known thatthe data with the firstk smallest ranking values must be withinζ1, . . . , ζk. After receiving the query-result, the authority canfirst verify the authenticity ofPi,t by usingHi,t, and verify thequery-result completeness by usingPi,t. Note that both of theabove verifications can be performed in a way similar to theone described in Sec. III-A. Actually, after receivingPi,t, theauthority knows which buckets contain data and the amount ofdata. Hence, knowinguℓg,ν anduhg,ν , ∀g ∈ [1, d], ∀ν ∈ [1, w],the authority can also obtainζ1, . . . , ζk. Afterwards, what theauthority should do is to check if it receives the bucket IDs,ζ1, . . . , ζk, and if the number of data in bucketζg′ , g′ ∈ [1, k],is consistent with the number indicated byPi,t. If and onlyif these two verifications pass, the authority considers thereceived query-result to be complete and extracts the top-kresult from the encrypted data sent fromM.

C. Securing Skyline Queries (SSQ)

To support secure skyline query in sensor networks, in thefollowing we first present a naive approach as baseline, andthen propose an advanced approach that employs a groupingtechnique for simultaneously reducing the computation andcommunication cost.

1) Baseline scheme:To ensure the data confidentiality andauthenticity, as in the SRQ and STQ schemes, the sensed dataare also encrypted by using the bucket scheme mentioned inSec. III-A1. At the end of epocht, eachsi broadcasts itssensor ID, all the sensed data encrypted by keyKi,t, andthe proper bucket IDs to all the nodes within the same cell.Then, according to the broadcast messages, each sensor nodesi at epocht has a bucket data set composed of the bucketIDs extracted from broadcast messages and the bucket IDscorresponding to its own sensed data. In fact, the bucket datasets constructed by different nodes in the same cell at epocht will be the same. Treating these bucket IDs as data points,si can find the setΦt of skyline bucketsthat are defined asthe bucket IDs not dominated by the other bucket IDs. Here,since the bucket IDs are represented also byd-dimensionaltuples, the notion of domination is the same as the one definedin the query model of Sec. II. Definequasi-skyline dataasthe set of data falling into the skyline buckets. It can beobserved that the set of skyline data must be a subset of quasi-skyline data. After doing so, each node can locally find2 thequasi-skyline data, although there could be the cases wheresuperfluous data are also included. At the end of epocht, ifKi,t is smaller than a pre-determined threshold, thensi sendsits sensor ID andhashKi,t

(Φt) to M. Here,hashKi,t(Φt)

works as a kind of proof so that it can be used for checkingthe query-result completeness. Note that onlyhashKi,t

(Φt)needs to be transmitted toM becauseM also receives all theencrypted data and bucket IDs, and can calculate the quasi-skyline data by itself after message broadcasting.

To answer the skyline query〈C, t〉 the storage node reportsthe quasi-skyline data calculated at epocht and the hash valuesreceived at epocht to the the authority. Since the authorityknows the threshold andKi,0 ∀i ∈ [1, N ], it will expect toreceive hash values from a set of sensor nodes whose keys aresmaller than the threshold. Unfortunately, due to the network-wide broadcast, this baseline scheme works but is inefficientin terms of communication overhead. Hence, an efficient SSQscheme exploiting a grouping strategy is proposed as follows.

2) Grouping technique:Like the baseline scheme, thebucket scheme mentioned in Sec. III-A1 is also used. Givena data setO and a collection{O1, · · · ,Oτ} of subsets ofOsatisfying

⋃τj=1 Oj = O and Oj ∩ Oj′ = ∅, ∀j 6= j′ for

τ ≥ 1. An observation is that the skyline data ofO must bea subset of the union of the skyline data ofOj , ∀j ∈ [1, τ ].Thus, the key idea of our proposed SSQ scheme is to partitionthe sensor nodes in a cell into groups so that broadcasting canbe limited within a group, resulting in reduced computationand communication costs. In what follows, the SSQ schemewill be described in more detail.

The sensor nodes in a cell are divided intoµ disjointgroups,Gη, ∀η ∈ [1, µ], each of which is composed of|Gη|

2The design of an algorithm for efficiently finding the skylinedata given adata set is a research topic, but is beyond the scope of this paper. We considerthe naive data-wise comparison-based algorithm with running timeO(n2) ifthe size of the data set isn, but each node, in fact, can implement an arbitraryalgorithm for finding skyline data in our setting.

Page 19: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

6

sensor nodes. The grouping needs to be performed only onceright after the sensor deployment. Note that each group isformed by nearby sensor nodes and the grouping procedureis independent of the structure of the aggregation tree withoutaffecting SRQ and STQ. Letcell-regionbe part of the sensingregion monitored by one specified cell. For example, with theassumption that the shape of each cell-region is approximatelya square, as shown in Fig. 1, and sensor nodes with uniformdeployment are considered, the grouping can be achieved bysimply dividing a cell-region intoµ (=

√N ) sub-cell-regions.

The sensor nodes in the same sub-cell-region form a group.Note that the square cell-region is assumed here for ease ofexplanation, but is not necessary for the grouping procedures3.

At the end of epocht, each sensor nodesi broadcasts itssensor ID, itsorder seedθi,t = hashKi,t

(i), all the sensed dataencrypted with the keyKi,t, and the proper bucket IDs to allthe nodes within the same group. After doing so, as in baselinescheme, the sensor nodes in the same group can locally findthe quasi-skyline data from the bucket data set whose entriesare generated by the sensor nodes in the same group. LetΦη,tbe the set of skyline bucket IDs andφη,t their correspondingquasi-skyline data encrypted by the sensor nodes in groupηusing proper keys at epocht. At the end of epocht, if θi,tis among the firstξ1 smallest ones in the set of order seedsin groupη at epocht, whereξ1 is a pre-determined thresholdknown by each node, thensi reportsΦη,t, φη,t, its verificationseedhashKi,t

(Φη,t), and the IDs of sensor nodes generatingφη,t to M. In fact, ξ1 = 1 is sufficient for the verificationpurpose.ξ1, however, is also related to the resiliency againstsensor node compromises. Thus, we still keepξ1 as a variableand defer the explanation of the purpose ofξ1 to Sec. V.Here, the purpose of verification seeds is that the completenessof quasi-skyline data can be guaranteed by exactlyξ1 sensornodes for each group, while the purpose of order seed is toguarantee that at each epoch exactlyξ1 sensor nodes will sendthe verification seeds as the proofs.

To answer a skyline query,〈C, t〉, the storage nodeMreports a hash of all the received verification seeds,ht =hash(||i∈Γt

hashKi,t(Φi,t)), where || denotes the bit-string

concatenation andΓt is the set of sensor nodes responsiblefor sending a hash value toM in each group at epocht, theset of skyline bucket IDs, and their corresponding encrypteddata received at epocht to the authority. Since it knows thethresholdξ1 andKi,t, ∀i ∈ {1, . . . , N − 1,M}, the authoritywill expect to receive a particular hash value fromM. Ifand only if the hash sent from theM matches the hash ofthe verification seeds calculated according to the knowledgeof Γt andKi,t by the authority itself, the received data areconsidered complete, and contain the skyline data.

3For example, the authority knowing the position of each nodeor the use ofclustering algorithms can also divide nodes into groups. Ingeneral, after thelocalization, each sensor node can join the proper group possibly accordingto its geographic position when the grouping information such as the sizes ofsensing region and cell-region are preloaded in sensor nodes.

IV. PERFORMANCEEVALUATION

We will focus on analyzing the critical issue of detectingan incomplete query-result in tiered networks. In this section,the detection probability and communication cost of query-result completeness verification in the proposed schemes willbe analyzed. It is assumed that the number of hops betweenM and each sensor node is

√n for a collection ofn uniformly

deployed nodes [1]. In this section, both detection probabilityand communication cost are discussed at a fixed epocht.

As the communication cost of encoding approach [21]grows exponentially with the number of attributes, and somecrosscheck approaches [29], [34] have relatively low detectionprobability, in the following, we compare SRQ with onlyhybrid crosscheck[34], which achieves the best balance be-tween the detection probability and communication cost in theliterature. Note that, the parameter setting required in hybridcrosscheck is the same as that listed in [34].

A. Detection Probability

The detection probability is defined as the probability thatthe compromised storage nodeM is detected if it returnsan incomplete query-result. With the fact that the larger theportion of query-resultM drops, higher the probability thatthe authority detects it, we consider the worst case that onlyone bucket and its corresponding data in the query-result aredropped byM and the number of data sensed by a node iseither0 or 1 as the lower bound of detection probability.

1) Detection probability for SRQ and STQ:To return anincomplete query-result without being detected by the author-ity, M should create a proof,〈HM,t, PM,t〉, correspondingto the incomplete query-result. Since bucket primes can beknown by the adversary,PM,t can be easily constructed.Nevertheless,HM,t cannot be constructed, since the bucketkeys of sensor nodes generating the bucket dropped byM arenot known by the adversary. Therefore, only two options canbe chosen by the adversary. First, the adversary can directlyguess to obtainHM,t, with probability being2−ℓh , whereℓhis the number of bits output by a keyed hash function. Thisimplies that the detection probabilityPSRQdet,1 is 1 − 2−ℓh forthe first case. Second, knowing the aggregation tree topology,the adversary can also follow the rule of SRQ to construct theHM,t without considering the bucket key of dropped bucket.Assume that the probability that a sensor node has senseddata isδ. The size of the bucket key pool can be derived as(N−2)(2δ+1−δ)+2 = Nδ+N−2δ. Thus, the probabilityfor the adversary to guess successfully is2−ℓk(Nδ+N−2δ),where ℓk is the number of bits of a key, leading to thedetection probabilityPSRQdet,2 is 1 − 2−ℓk(Nδ+N−2δ) for the

second case. Overall, the final detection probability,PSRQdet ,

is min{PSRQdet,1,PSRQdet,2}. On the other hand, as stated in Sec.

III-B, the STQ scheme is built upon the SRQ scheme. Thus,the detection probability,PSTQdet , will be the same asPSRQdet .As Fig. 2 depicts, the detection probability of SRQ is closeto 1 in any case. However, hybrid crosscheck is effective onlywhen a few sensed data are generated in the network. Such

Page 20: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

7

a performance difference can be attributed to the fact that thesensed data in the network are securely and deterministicallysummarized in the proof of SRQ but they are probabilisticallysummarized in hybrid crosscheck.

0500

10001500

2000

0

5

10

15

200.9992

0.9994

0.9996

0.9998

1

Hybrid Crosscheck

SRQ

Det

ectio

n pr

obab

ility

Nd

(a)

0500

10001500

2000

0

5

10

15

200.4

0.5

0.6

0.7

0.8

0.9

1

Hybrid Crosscheck

SRQ

Det

ectio

n pr

obab

ility

Nd

(b)

Fig. 2: The detection probability of SRQ and hybrid crosscheck in the cases that (a)Y = 100 and (b)Y = 5000.

2) Detection probability for SSQ:To return an incom-plete query-result without being detected by the authority,M should forge a proof,i.e., a hashht of all the receivedverification seeds, corresponding to the incomplete query-result. Therefore, only two options can be chosen by theadversary. First, the adversary can directly guess a hash valueht. The probability for the adversary to guess successfully is2−ℓh , implying the detection probabilityPSSQdet,1 is 1 − 2−ℓh

in this case. Second, the adversary can also follow the ruleof SRQ to constructht for incomplete data. In this case, theadversary is forced to guessξ1 keys for each one ofµ groups,leading to the probability of success guess being2−ℓkµξ1 andthe detection probability beingPSSQdet,2 = 1 − 2−ℓkµξ1 . The

final detection probability,PSSQdet , is thus,min{PSSQdet,1,PSSQdet,2}.

Obviously,PSSQdet will also be close to1 when appropriate keylength or hash function is selected.

B. Communication Cost

The communication cost,T, is defined as the number of bitsin the communications required for the proposed schemes. Weare mainly interested in the asymptotic result in terms ofdandN because they reflect the scalability of the number ofattributes and the network size, respectively. We do not countthe number of bits in representing data,Ei,t, since the sendingof Ei,t is necessary in any data collection scheme. We furtherassume that there are on averagepdtobY data buckets, where0 ≤ pdtob ≤ 1, generated in cellC.

1) Communication Cost of SRQ and STQ:Each sensornode si in SRQ is required to sendBi,t,Hi,t, and Pi,t toits parent node. Nevertheless,Bi,t,Hi,t, and Pi,t can beaggregated along the path in the aggregation tree. As a con-sequence,si actually has only one-hop broadcast containingBi,t,Hi,t, andPi,t once at each epoch. In addition, to answera range query,M is responsible for sendingHM,t, PM,t,and the bucket IDs inA. In summary, the communicationcost, TSRQ, can be calculated as(N − 1)(ℓh + ℓP ) +pdtobY ⌈logw⌉d logN + ℓh + ℓP + |A|⌈logw⌉d = O(N +d logN), whereℓP is the number of bits used to represent thebucket primePM,t. Due to the similarity between SRQ andSTQ, the communication cost,TSTQ, can also be calculatedasO(N + d logN) in a way similar to the one for obtainingTSRQ.

0500

10001500

2000

0

5

10

15

200

1

2

3

4

5

x 107

SRQ

Hybrid Crosscheck

Total number of attributesNumber of nodes in a cell

Com

mun

icat

ion

cost

(a)

0500

10001500

2000

0

5

10

15

200

2

4

6

8

x 107

SRQ

Total number of attributes

Hybrid Crosscheck

Number of nodes in a cell

Com

mun

icat

ion

cost

(b)

Fig. 3: The communication cost of SRQ and hybrid crosscheck in the cases that (a)Y = 100 and (b)Y = 5000.

As shown in Fig. 3 where the parametersℓh = 80 andℓP = 1000 are used, the communication cost of SRQis significantly lower than that of hybrid crosscheck. Morespecifically, as the communication cost of hybrid crosscheckcan be asymptotically represented asO(N2+Nd) and will bedrastically increased withN andd, the proposed SRQ scheme,however, exhibits low communication cost regardless of theamount of sensed data in the network due to the fact that thesize of the proof used in SRQ is always a constant. Hence,the communication cost of SRQ will be dominated by theaggregation procedure, the average hop distance betweenMand each node, and the transmission of bucket IDs, resultingO(N + d logN) communication cost.

2) Communication Cost of SSQ:Since grouping is onlyperformed once after sensor deployment, we ignore its com-munication cost. Note that, in the following, the communica-tion cost of the data should be counted because it is involvedin the design of SSQ. After the grouping, eachsi broadcasts

Page 21: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

8

the data bucket IDs, sensor ID, and the order seed to all thenodes within the same group. Assuming that the nodes employa duplicate suppression algorithm, by which each node onlybroadcasts a given message once, one node should broadcasta message withYN ℓd +

pdtobYN ⌈logw⌉d+ ℓid + ℓh bits, where

ℓd is the average size of a datum andℓid is the number of bitsrequired to represent sensor IDs, resulting in communicationcost ofC1 = |Gη|2( YN ℓd +

pdtobYN ⌈logw⌉d + ℓid + ℓh) bits

in each group. Let0 ≤ pq ≤ 1 be the average ratio ofthe quasi-skyline data to all the sensed data.|Φη,t| is equalto pdtobpqY on average. Then, once the the order seed isamong the firstξ1 smallest ones of the order seeds in a group,si ∈ Gη is required for sendingΦη,t, φη,t, hashKi,t

(Φη,t),its sensor ID, and the IDs of sensor nodes generating bucketsin Φη,t to M, implying the communication cost ofC2 =pdtobpqY ⌈logw⌉d+pqY ℓd+ℓh+ℓid+|Gη|ℓid bits in the worstcase. Note that the verification seeds cannot be aggregated,although the verification seed can also be delivered along thepath to M on the aggregation tree.M needs to send theskyline bucket IDs, its sensor ID, and a hashht as the proof tothe authority for answering the query. The communication costof M is C3 = ℓid+ ℓh+µ

∑µη=1 pdtobpqY ⌈logw⌉d+pqY ℓd.

Consequently, the upper bound of communication cost,TSSQ,

can be obtained asµC1 +µξ1C2 logN +C3 = O(N3

2 +Nd)when µ =

√N and |Gη| =

√N , ∀η ∈ [1, µ]. By similar

derivation, the communication cost of the baseline scheme isO(N2d). Thus, exploiting the proposed grouping techniquedoes reduce the required communication cost. The trends ofcommunication cost in SSQ are shown in Fig. 4.

0500

10001500

2000

0

5

10

15

200

2

4

6

8

10

x 106

Nd

Com

mun

icat

ion

cost

(a)

0500

10001500

2000

0

5

10

15

200

2

4

6

8

10

x 107

Nd

Com

mun

icat

ion

cost

(b)

Fig. 4: The communication cost of SSQ for (a)Y = 100 and (b)Y = 5000.

V. I MPACT OF SENSORNODE COMPROMISE

In the previous discussions, we have ignored the impact ofthe compromise of sensor nodes. In practice, the adversarycould take the control of sensor nodes to enhance the ability

of performing malicious operations. The notations is usedto denote a set of random sensor nodes compromised bythe adversary. Incollusion attackconsidered here,M col-ludes withs in the hope that more portions of query-resultsgenerated by innocent sensor nodes can be dropped. Sincecrosscheck approaches [34] suffer from collusion attacks,theimpact of collusion attack on secure range query for tieredsensor networks was addressed in [34]. Their proposed methodis random probing, by which the authority occasionally checksif there is no data sensed by some randomly selected sensornodes by directly communicating with them. Random probing,however, can only discover the incomplete query-results withan inefficient but possibly lucky way and cannot identifys.

On the other hand, in this paper, we identify a new Denial-of-Service attack never addressed in the literature, called false-incrimination attack, by which s provides false sub-proofs tothe innocent storage node so that the innocent storage nodewill be regarded as the compromised one and be revoked.Unfortunately, all the prior works [21], [29], [34] suffer fromthis attack. In summary, with minor modifications involved,our proposed SRQ, STQ, and SSQ schemes are resilientagainst both the collusion attack and false-incriminationattack.

It should be especially noted that, in the following discus-sion of SRQ, we temporarily make two unrealistic assumptionsthat the compromised nodes can only disobey the proceduresof the proposed schemes4 and the (subtree) proofs (definedlater) will not be manipulated by the compromised storageand sensor nodes, in order to emphasize on the effectivenessof our proposed technique in identifying compromised nodes.Nevertheless, these two assumptions will be relaxed later.

Impact of Sensor Node Compromise on SRQ and STQ.Under the above two assumptions, the SRQ scheme is in-herently resilient against collusion attack, because, regardlessof the existence and position ofs, the proper bucket keyswill be embedded into the proofs and cannot be removedby s. Nevertheless, SRQ could be vulnerable to the false-incrimination attack since false sub-proofs injected bys willbe integrated with the other correct sub-proofs to constructa false proof, leading to the revocation of innocentM. Here,we present a novel technique calledsubtree samplingenablingSRQ, with a slight modification, to efficiently mitigate thethreat of false-incrimination attacks. The idea ofsubtreesampling is to check if the proof constructed by the nodesin a random subtree with fixed depth is authentic so asto perform the attestation only on the remaining suspiciousnodes. Letm be a user-selected constant indicating thesubtree depth. In the modified SRQ scheme, once receiving〈jρ, t, Ejρ,t,Bjρ,t,Hjρ,t,Pjρ,t, ϑ0jρ,t, . . . , ϑ

m−1jρ,t

〉, jρ ∈ [1, N −1], ∀ρ ∈ [1, χ], from its χ children, sj1 , . . . , sjχ , each sicalculatesEjρ,t, Bjρ,t, Hjρ,t, andPjρ,t as in the original SRQscheme. Note that, ifsi is a leaf node on the aggregationtree, it is assumed thatsi receives〈∅, ∅, ∅, ∅, 0, 1, ∅, . . . , ∅〉. Inthe modified SRQ scheme, however,si additionally performs

4In other words, compromised nodes are assumed tonot inject bogus sensorreadings. They can only manipulate its own subproof and the proofs sent fromits descendant sensor nodes.

Page 22: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

9

the following operations. Assume thatHυjρ,t, P

υjρ,t ∈ ϑυjρ,t,

∀υ ∈ [0,m − 1], ∀jρ ∈ [1, N − 1], and ∀ρ ∈ [1, χ]. sicalculatesHυ

i,t = hashKi,t(∑χ

ρ=1 Hυ−1jρ,t

+ Hi,t) and P υi,t =∏χρ=1 P

υ−1jρ,t

· Pi,t, ∀υ ∈ [1,m]. si also calculatesH0i,t = Hi,t

and P 0i,t = Pi,t, whereHi,t andPi,t are computed in a way

stated in Sec. III-A. Then,Hυi,t and P υi,t are assigned to set

ϑυi,t, ∀υ ∈ [0,m − 1]. If hashKi,t(i) ≤ ξ2, where ξ2 is a

pre-determined threshold known by each node and will beanalyzed later, thensi sendsϑmi,t to (possibly compromised)M. Let Tmi be a subtree of the underlying aggregation tree,rooted atsi with depthm. ϑmi,t generated bysi can be thoughtof as thesubtree proofof the data sensed by the nodes inTmi .Finally, si sends〈i, t, Ei,t,Bi,t,Hi,t,Pi,t, ϑ0i,t, . . . , ϑm−1

i,t 〉 toits parent node. LetWt be thewitness setof sensor nodessatisfyinghashKi,t

(i) ≤ ξ2 at epocht. The nodes inWt arecalledwitness nodesat epocht.

(a) (b)

Fig. 5: The conceptual diagrams of identifying the compromised nodes. (a) Only thenodes in red area need to be attested. (b) Only the nodes in gray area need to be attested.

We first consider the simplest case, where no compromisednodes act as the witness nodes. We further assume that onlyone compromised node (i.e., |s| = 1) injects false sub-prooffor simplicity and our method can be adapted to the case ofmultiple compromised nodes injecting false sub-proofs. Wehave the following observation regarding each witness nodesiand its generatedϑmi,t. Assume that the query-result is found tobe incomplete at epocht. The authority requestsϑmi,t for eachwitness nodesi stored inM. To identify the compromisednodes, all the nodes at first are considered neutral. The nodesin Tmi become innocent if the verification ofϑmi,t passes, andthe nodes inTmi become suspicious otherwise5. The aboveverification is performed as follows. According to thePmi,t inthe subtree proofϑmi,t, the authority knows which nodes in thesubtreeTmi contribute data and their amount. The authoritycan, based on this information, calculateHm

i,t by itself. DefineETm

i,t as the set of data sensed by the nodes inTmi , andBTm

i,t

as the corresponding bucket IDs ofETmi ,t. As a consequence,

the verification passes if and only if theHi,t calculated bythe authority itself is equal to theHi,t extracted from thereceived receivedϑmi,t, and the amount of the data inETm

i ,t

falling into the specified buckets inBTmi,t matches the one

indicated inPi,t. As a whole, each time the above checkingprocedures are performed according toϑmi,t at epocht, nodeswill be partitioned into three sets, innocent setIit, neutralset, and suspicious setSi

t containing innocent nodes, neutral

5Note that there would be the cases that a node is deemed to be bothinnocent and suspicious when different subtree proofs are considered. Whensuch a case happens, that node obviously should be innocent.

nodes, and suspicious nodes, respectively. We can concludethat (

⋂i∈Wt,Si

t 6=∅ Sit)⋃{M} contains at least one compro-

mised node injecting the false subproof if at least oneSit is

nonempty and({s1, . . . , sN−1}\(⋃i∈Wt

Iit))⋃{M} contains

at least one compromised node injecting the false subproofotherwise. In more details, the authority at first only performsthe attestation [23], [24], [26] on the nodes in

⋂i∈Wt,Si

t 6=∅ Sit

or in {s1, . . . , sN−1} \ (⋃i∈Wt

Iit). Nonetheless, after theattestation, if the nodes being attested are ensured to be notcompromised, thenM should be the compromised node. Itcan be observed that the size of the set of the nodes theauthority needs to perform the attestation has possibilityofbeing drastically shrunk so that the computation and commu-nication cost required in the attestation will be substantiallyreduced as well. It can also be observed that each time theattestation is performed, at least one compromised node canbe recovered. The intuition behind the checking procedure canbe illustrated in Fig. 5. In addition, Fig. 6 depicts the numberof nodes to be attested after an incomplete reply is foundin different settings. Since the number of witness nodes isapproximatelyξ2(N − 1)/2ℓh, it can be observed from Fig. 6that the larger theξ2 andm, the lower the number of nodesto be attested. Nevertheless, whenξ2 andm become larger,the communication cost of the modified SRQ scheme, whichwill be presented later, is increased as well.

There, however, would still be the cases where the compro-mised nodes luckily act as witness nodes so that they can beconsidered innocent by sending genuine subtree proof toM.In our consideration, this case does happen, but our techniquealso successfully mitigates the threat of false-incriminationattacks because the effectiveness of false-incriminationattacksis now limited within the case where some compromised nodeswork as witness nodes.

0

50

100

150

200 0

2

4

6

80

100

200

300

400

500

Subtree depth (m)Number of witness nodes

Num

ber

of nodes to b

e a

tteste

d

(a)

0

50

100

150

200 0

2

4

6

8

0

200

400

600

800

1000

Subtree depth (m)Number of witness nodes

Num

ber

of nodes to b

e a

tteste

d

(b)

Fig. 6: The number of nodes to be attested for (a)N = 500 and (b)N = 1000.

Page 23: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

10

Now, we have to remove the two unrealistic assumptions wemade before, allowing that the bogus data can be injected andthe subtree proof sent from the witness node has possibilityto be maliciously altered on its way to the authority. Afterthe removal of these two unrealistic assumptions, SRQ isstill resilient against collusion attacks because the use of theproofs guarantees that the misbehavior of deleting the senseddata will be detected. Unfortunately, on the one hand, thecompromised nodes injecting the bogus data can deceive theauthority into accepting the falsified sensor reading. On theother hand, the compromised nodes lying on the path betweenM and witness nodes can manipulate the subtree proofs sothat the compromised nodes can avoid the detection and theinnocentM can still be falsely incriminated. In our strategy,we use theredundancy property, which have been widely usedin the design of the other security protocols in WSNs suchas en-route filtering, to mitigate the former threat, while we,motivated by the IP traceback technique in internet security lit-erature, develop a new traceback technique suitable for WSNsto resist against the latter attack. Here, the redundancy propertymeans that usually WSNs are densely deployed so that anevent in the sensing region can be simultaneously detected bymultiple nodes. In general, the redundancy property can beobtained by achieving the so-calledt-coverage and therefore,an event can be simultaneously observed by at leastt nodes.In the following description of our remedy to these problems,due to its similarity to our modified SRQ scheme previouslymentioned, we will omit some notational details, stressingonthe procedures itself.

When the redundancy property is used, in essence, ourSRQ does not need to be changed. Assume for now that, theauthority issues a range query and receives the query-resultfrom M. In addition, we also assume that all the checkingprocedures stated in (modified) SRQ are passed. The authoritynow wants to know whether the received data are falsified byand sent from the compromised nodes. Recall that each nodecan be aware of its geographic position. After knowing whichnodes contribute the sensed data to its issued query from thequery-result, the authority further acquires the encrypted dataof the neighbors of those nodes fromM. Note that this canbe achieved because when geographic positions of all nodesare known by the authority6, inferring the one-hop neighborsof a specific node can be easily achieved. Finally, for eachnode in the query-result, the authority checks the consistencyof its sensor reading and the sensor readings of its one-hopneighbors. The sensed data will be rejected as long as it isinconsistent with the sensor readings of its neighbors. Theconsistency checking procedure here may allow for certainmeasurement errors or environmental factors, which shouldbe domain-specific and user-defined.

Now, we turn to address another problem that the subtreeproofs transmitted from the witness node to the authority

6As long as each node knows its position, it sends in a multihopmannerits position to the authority with the MAC constructed by thekey uniquelyshared with the authority. This kind of operations are only performed onceafter the sensor deployment.

could be maliciously altered by the compromised (storageor sensor) nodes. To deal with this kind of threat, we needa mechanism, by which the receiver not only can knowwhether the received message is modified by the intermediatenodes, but also can point out the node modifying the messageif it does exist. Motivated by the IP traceback techniques,we develop arecursive tracebackmechanism. To be morespecifically, each nodesi on the path connecting theM andthe witness node, after receivingD, whereD denotes thesubtree proof, from its descendant node, attacheshashKi,t

(D)toD and then forwardsD||hashKi,t

(D) to its ascendant nodeon the underlying aggregation tree. Note that it is assumed thatthe witness node receivesD||∅. Thus, with this modification, asubtree proof received by the authority should be accompaniedwith ψ hashes, whereψ is the number of nodes (includingstorage nodes and the witness node itself) between a specificwitness node and the authority. Here, we should note thatbecause the topology of the aggregation tree is known bythe authority, when the subtree proof is sent, the IDs ofintermediate nodes except for the ID of the witness nodeitself do not need to be attached7. Hence, when the query-result is deemed incomplete, before conducting the proceduresdefined in the subtree sampling technique to attest nodes,the authority checks whether the received subtree proof ismaliciously altered by the intermediate node. In particular,assume that the subtree proof and itsψ associated hashes,D||hD1 || · · · ||hDψ , wherehDi , 1 ≤ i ≤ ψ, is the hash calculatedby the node that is(i − 1)-hop away from the witness nodeandhD1 is computed by the witness node itself according toits asserted subtree proof, are received by the authority. Afterthe reception ofD||hD1 || · · · ||hDψ , the authority checks theconsistency of the hash backward;i.e., it first checkshDψ , andthenhDψ−1, and so on. If such a verification can be successfullyproceeded allψ hashes, then the subtree proofD is consideredto be intact. Otherwise, once the verification fails in, say,hDj ,we can conclude that the subtree proof was altered by thecorresponding node since the innocent node is not assumed tobehave in such a way.

Recall that the communication cost of original SRQ isO(N + d logN). In the modified SRQwithout those twounrealistic assumptions, each nodesi at epocht is requiredto additionally sendϑ0i,t, . . . , ϑ

m−1i,t , leading toO(N) com-

munication cost. At epocht, approximatelyξ2(N − 1)/2ℓh

witness nodes will send their subtree proofs toM. Never-theless, when the subtree proofs are sent toM, since theadditional hashes will be added, its communication cost willbe O(N · (

√N + (

√N − 1) + · · · + 1)). As a result, the

communication cost becomesO(N2 + d logN).Basically, STQ can be regarded as a special use of SRQ.

Thus, its resiliency against false-incrimination attack is thesame as that of SRQ. Nevertheless, due to the nature of top-k query, the injection of the falsified sensor reading from

7The path from any sensor node in the aggregation tree toM is unique.The authority can infer the nodes the subtree proof traverses once it is awareof the ID of the witness node.

Page 24: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

11

the compromised nodes will imply a significant query-resultdeviation. As a consequence, the resilience against collusionattack of STQ should be reconsidered and it is discussed inthe extended version [32] of this paper.

Impact of Sensor Node Compromise on SSQ.Recallthat two aforementioned unrealistic assumptions are made.Wefirst consider the resiliency against false-incriminationattacks.The simplest method is to introduce a parameterξ3 ≤ ξ1so thatM reports to the authority all the verification seeds,instead of the hash of them in original SSQ. After that, foreach group, if at leastξ3 out of ξ1 verification seeds canbe successfully verified, then the quasi-skyline data in thatgroup are considered complete. Hence, the threat of false-incrimination attacks will be mitigated because the adversaryis forced to send at leastξ1 − ξ3 + 1 false proofs, instead ofsingle one false proof. With even one verification seed froma specific group failed to be verified, we can conclude that atleast one compromised sensor node exists in that group.

Now, we consider collusion attacks. In SSQ, bothMand sensor nodes only know the quasi-skyline data. Recallthat quasi-skyline data are not necessarily the skyline data.Thus, even when there is more than one compromised sensornode in a group, the probability of successfully dropping theskyline data is actually small. More specifically, to drop theskyline data at a specified epoch,s should contain at leastξ3sensor nodes responsible for sending hash values in order tosuccessfully forge a proof of incomplete quasi-skyline data,and at the same time should be fortunate enough to selectgroups whose quasi-skyline data contain skyline data8.

Now, we consider both the collusion and false-incriminationattacks. To drop the skyline data, the only thingM can dois to drop the data of certain groups. Here, for simplicity, weconsider the case whereM drops the data of a fixed groupGη,η ∈ [1, µ], in which a sets of x sensor nodes is compromised.To prevent the detection of incomplete query-result,ξ3 out ofxcompromised sensor nodes should be the sensor nodes respon-sible for sending the proofs. The probability that at leastξ3 outof ξ1 nodes responsible for sending the proofs are containedin s is p1 =

∑ξ1j=ξ3

(ξ1j

)(|Gη|−ξ1x−j

)/(|Gη|x

). In other words,

this is equal to the probability that the compromised sensornodes can drop the quasi-skyline data without being detected.Quasi-skyline data, however, are not necessarily equivalent tothe skyline data. The probability that the quasi-skyline datadropped by compromised nodes indeed contain skyline datacan be represented asp2 =

(Y−pcY

|Gη|Y/N−pcY

)/(

Y|Gη|Y/N

), where

pc is the average ratio of the skyline data to all the sensed data.In short, this is equal to the probability that the operations per-formed by compromised nodes cause the loss of skyline data.As a whole, even if the adversary hasx compromised nodes inGη, the probability of successfully making skyline query-resultincomplete is merelyp1 · p2. The trends of such a probabilityare depicted in Fig. 7 under different parameter settings. Asshown in Fig. 7a,p1 · p2 is decreased with an increase of

8Definitely, M can simply drop all the sensed data. Nevertheless, underthis option, it is forced to forge at leastµ(ξ1 − ξ3 + 1) proofs (µ =

√N in

our analysis), leading to high probability of being detected.

N andY . This is because whenN andY become larger, itis more unlikely that the quasi-skyline data dropped by theadversary contains the skyline data. Nevertheless, as shown inFig. 7b,p1 ·p2 is increased with an increase of(ξ1− ξ3). Thisis because when(ξ1 − ξ3) becomes larger, it is more likelythat s can forge a proof of an incomplete quasi-skyline data.As a whole, from the false-incrimination attack point of view,the larger the(ξ1 − ξ3), the lower thep1 · p2, but from thecollusion attack point of view, the smaller the(ξ1 − ξ3), thelower thep1 · p2. This would be an optimization problem thatdeserves further studying. In addition, compared with originalSSQ, the additional communication cost incurred from themodified SSQ comes from the transmission of verificationseeds fromM to the authority. Thus, the communication costof the modified SSQ scheme remainsO(N

3

2 +Nd).

020

4060

80100

200

400

600

800

10000

0.2

0.4

0.6

0.8

1

YN

p 1⋅ p2

(a)

1 2 3 4 5 6 7

7

7.5

8

8.5

90

0.5

1

1.5

2

2.5

x 10−9

ξ3

ξ1

p 1⋅ p2

(b)

Fig. 7: The probability of successfully dropping skyline data in the cases that (a)ξ1 = 8 andξ3 = 4, and (b)N = 500 andY = 100.

VI. CONCLUSION

We propose schemes for securing range query, top-k query,and skyline query, respectively. Two critical performancemetrics, detection probability and communication cost, areanalyzed. In particular, the performance of SRQ is superiorto all the prior works, while STQ and SSQ act as thefirst proposals for securing top-k query and skyline query,respectively, in tiered sensor networks. We also investigate thesecurity impact of collusion attacks and newly identified false-information attacks, and explore the resiliency of the proposedschemes against these two attacks.

REFERENCES

[1] H. Chan and A. Perrig. PIKE: Peer Intermediaries for Key Establishmentin Sensor Networks. inProceeding of IEEE Conference on ComputerCommunications (INFOCOM), 2005.

[2] H. Chan and A. Perrig. Efficient Security Primitives froma SecureAggregation Algorithm. inProceeding of ACM Conference on Computerand Communications Security (CCS), 2008.

Page 25: Secure Multidimensional Queries in Tiered Sensor Networks · Secure Multidimensional Queries in Tiered Sensor Networks Chia-Mu Yu†§, Chun-Shien Lu†, and Sy-Yen Kuo§ †Institute

12

[3] H. Chan, A. Perrig, and D. Song. Random Key Predistribution Schemesfor sensor networks. inProceeding of IEEE Symposium on Security andPrivacy (S&P), 2003.

[4] H. Chan, A. Perrig, and D. Song. Secure Hierarchical In-NetworkAggregation in Sensor Networks. inProceeding of ACM Conference onComputer and Communications Security (CCS), 2006.

[5] H. Chen, S. Zhou, and J. Guan. Towards Energy-Efficient SkylineMonitoring in Wireless Sensor Networks. inProceeding of EuropeanConference on Wireless Sensor Networks (EWSN), 2007.

[6] P. Desnoyers, D. Ganesan, H. Li, M. Li, and P. Shenoy. PRESTO: APredictive Storage Architecture for Sensor Networks. inProceeding ofWorkshop on Hot Topics in Operating Systems (HotOS), 05.

[7] L. Eschenauer and V. Gligor. A Key-management Scheme forDistributedsensor networks. inProceeding of ACM Conference on Computer andCommunications Security (CCS), 2002.

[8] H. Hacigumus, B. R. Iyer, C. Li, and S. Mehrotra, Executing SQL overencrypted data in the database service provider model. inProceedingof ACM International Conference on Management of Data (SIGMOD),2002.

[9] B. Hore, S. Mehrotra, and G. Tsudik, A privacy-preserving index forrange queries. inProceeding of International Conference on Very LargeData Bases (VLDB), 2004.

[10] W. Liang, B. Chen, J. Yu. Energy-efficient skyline queryprocessing andmaintenance in sensor networks. inProceeding of ACM Conference onInformation and Knowledge Management (CIKM), 2008.

[11] X. Li, Y. Kim, R. Govindan, and W. Hong. Multi-dimensional rangequeries in sensor networks. in Proc.Proceeding of ACM Conference onEmbedded Networked Sensor Systems (SenSys), 2003.

[12] D. Liu, P. Ning. Multi-level µTESLA: broadcast authentication fordistributed sensor networks.ACM Transactions in Embedded ComputingSystems (TECS), vol. 3, no. 4, pp. 800-836, 2004.

[13] D. Liu, P. Ning, and W. Du. Attack-resistant location estimation insensor networks. inProceeding of ACM/IEEE International Conferenceon Information Processing in Sensor Networks (IPSN), 2005.

[14] G. Mathur, P. Desnoyers, D. Ganesan, and P. Shenoy. Ultra-low powerdata storage for sensor networks. inProceeding of ACM/IEEE Interna-tional Conference on Information Processing in Sensor Networks (IPSN),2006.

[15] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TAG: aTiny AGgregation Service for Ad-Hoc Sensor Networks. inProceeding ofUSENIX Symposium on Operating Systems Design and Implementation(OSDI), 2002.

[16] S. R. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong. TinyDB:An Acquisitional Query Processing System for Sensor Networks. ACMTransactions on Database Systems (TODS), vol. 30, no. 1, pp. 122-173,Mar. 2005.

[17] A. Perrig, R. Szewczyk, V. Wen, D. Culler, and J. D. Tygar. SPINS:Security Protocols for Sensor Networks. inProceeding of ACM Inter-national Conference on Mobile Computing and Networking (Mobicom),2001.

[18] P. Rogaway, M. Bellare, and J. Black. OCB: A Block-Cipher Mode ofOperation for Efficient Authenticated Encryption.ACM Transactions onInformation and System Security (TISSEC), vol. 6, no. 3, pp. 365-403,2003.

[19] RISE project. Available:http://www.cs.ucr.edu/˜rise/[20] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, and

F. Yu. Data-centric storage in sensornets with GHT, a geographic hashtable.Mobile Networks and Applications, vol. 8, no. 4, pp. 427-442, 2003.

[21] B. Sheng and Q. Li. Verifiable privacy-preserving rangequery in two-tiered sensor networks. inProceeding of IEEE Conference on ComputerCommunications (INFOCOM), 2008.

[22] B. Sheng, Q. Li and W. Mao. Data storage placement in sensor networks.in Proceeding of ACM International Symposium on Mobile Ad HocNetworking and Computing (MobiHoc), 2006.

[23] A. Seshadri, M. Luk, A. Perrig, L. V. Doorn, and P. Khosla. SCUBA:Secure Code Update By Attestation in Sensor Networks. inProceedingof ACM Workshop on Wireless Security (WiSe), 2006.

[24] M. Shaneck, K. Mahadevan, V. Kher, and Y. Kim. Remote Software-based Attestation for Wireless Sensors. inProceeding of European Work-shop on Security and Privacy in Ad-hoc and Sensor Networks (ESAS),2005

[25] K. Sun, P. Ning, C. Wang, A. Liu, and Y. Zhou. TinySeRSync:Secure and Resilient Time Synchronization in Wireless Sensor Networks.

in Proceeding of ACM Conference on Computer and CommunicationsSecurity (CCS), 2006.

[26] A. Seshadri, A. Perrig, L. V. Doorn, and P. Khosla. SWATT: SoftWare-based ATTestation for Embedded Devices. inProceeding of IEEE Sym-posium on Security and Privacy (S&P), 2004.

[27] Stargate gateway (SPB400). Available:http://www.xbow.com[28] B. Sheng, C. C. Tan, Q. Li, and W. Mao. An Approximation Algorithm

for Data Storage Placement in Sensor Networks. inProceeding of WASA,2007.

[29] J. Shi, R. Zhang and Y. Zhang. Secure range queries in tiered sensor net-works. inProceeding of IEEE Conference on Computer Communications(INFOCOM), 2009.

[30] M. Wu, J. Xu, X. Tang, W.-C. Lee. Top-k Monitoring in Wireless SensorNetworks. IEEE Transactions on Knowledge and Data Engineering(TKDE), vol. 19, no. 6, pp. 962-976, 2007.

[31] C.-M. Yu, C.-S. Lu, and S.-Y. Kuo. A simple non-interactive pairwisekey establishment scheme in sensor networks. inProceeding of IEEEConference on Sensor, Mesh and Ad Hoc Communications and Networks(SECON), 2009.

[32] Extension of this paper. Availiable:http://homepage.ntu.edu.tw/˜d95921015/index.htm

[33] Y. Zhang, W. Liu, Y. Fang, and D. Wu. Secure localizationandauthentication in ultra-wideband sensor networks. inIEEE Journal onSelected Areas in Communications, Special Issue on UWB WirelessComms. - Theory and Applications, vol. 24, no. 4, pp. 829-835, 2006.

[34] R. Zhang, J. Shi, and Y. Zhang. Secure multidimensionalrange queriesin sensor networks. inProceeding of ACM International Symposium onMobile Ad Hoc Networking and Computing (MobiHoc), 2009.