1 improved spectrum access control of cognitive radios ...liu/paper/fabioiet.pdf · an information...

22
1 Improved Spectrum Access Control of Cognitive Radios based on Primary ARQ Signals Fabio E. Lapiccirella, Zhi Ding and Xin Liu Electrical and Computer Engineering University of California, Davis, California 95616 Email: [email protected] [email protected] [email protected] Abstract Cognitive radio systems capable of opportunistic spectrum access represent a new paradigm for improving the efficiency of current spectrum utilization. In this work, we present a novel cognitive channel access method based on learning from both primary channel transmissions and the receiver ARQ feedback signals. This new sensing-plus-confirmation scheme constitutes a non-trivial gen- eralization of the more traditional “Listen-Before-Talk” (LBT) strategy that merely listens to and yields to primary transmissions regardless of primary receiver responses. Our new method exploits the bi-directional and interactive nature of most wireless communication links to facilitate better opportunistic secondary access while achieving primary receiver protection. By allowing the sec- ondary users to learn from both primary transmissions and the corresponding receiver confirmations, our approach allows secondary cognitive users to exploit critical information that primary receivers regularly send to their transmitters. We show that, by monitoring both primary transmissions and receiver feedback signals, secondary radio access can improve throughput over the traditional LBT while limiting the probability of collision with primary user signals. 1. I NTRODUCTION The topic of opportunistic spectrum access for cognitive radios has recently attracted substantial research interest in wireless research community as a new paradigm for overcoming the dilemma of spectrum overcrowding and underutilization due to static spectrum allocation. As pointed by the FCC (of US) in [1], much of the spectrum suitable for wireless communications has been licensed to primary users. In many applications, sporadic channel access by licensed users leads to underutilization

Upload: others

Post on 18-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

1

Improved Spectrum Access Control of

Cognitive Radios based on Primary ARQ

Signals

Fabio E. Lapiccirella, Zhi Ding and Xin Liu

Electrical and Computer Engineering

University of California, Davis, California 95616

Email: [email protected] [email protected] [email protected]

Abstract

Cognitive radio systems capable of opportunistic spectrumaccess represent a new paradigm for

improving the efficiency of current spectrum utilization. In this work, we present a novel cognitive

channel access method based on learning from both primary channel transmissions and the receiver

ARQ feedback signals. This new sensing-plus-confirmation scheme constitutes a non-trivial gen-

eralization of the more traditional “Listen-Before-Talk”(LBT) strategy that merely listens to and

yields to primary transmissions regardless of primary receiver responses. Our new method exploits

the bi-directional and interactive nature of most wirelesscommunication links to facilitate better

opportunistic secondary access while achieving primary receiver protection. By allowing the sec-

ondary users to learn from both primary transmissions and the corresponding receiver confirmations,

our approach allows secondary cognitive users to exploit critical information that primary receivers

regularly send to their transmitters. We show that, by monitoring both primary transmissions and

receiver feedback signals, secondary radio access can improve throughput over the traditional LBT

while limiting the probability of collision with primary user signals.

1. INTRODUCTION

The topic of opportunistic spectrum access for cognitive radios has recently attracted substantial

research interest in wireless research community as a new paradigm for overcoming the dilemma

of spectrum overcrowding and underutilization due to static spectrum allocation. As pointed by the

FCC (of US) in [1], much of the spectrum suitable for wirelesscommunications has been licensed to

primary users. In many applications, sporadic channel access by licensed users leads to underutilization

Page 2: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

2

of their allocated spectra. By allowing secondary users to opportunistically access the bandwidth

allocated to but underutilized by primary users, cognitiveradio networks have been championed as

an enabling new technology to overcome the current spectrumscarcity caused from the traditional

static spectrum allocation.

A main challenge of opportunistic spectrum access (OSA) is the need to ensure that cognitive users

avoid detrimental disruption to licensed user communications when they are active. To achieve such

primary user protection, “Listen-Before-Talk” (LBT) opportunistic spectrum access scheme has been

studied extensively with significant progress both in theory and in practice. For example, LBT strategy

is shown to protect incumbent TV users during the second FCC experimentation phase [2] over TV

bands. Although conceptually simple, LBT exhibits severaldrawbacks. First, it only senses primary

transmission signals. Thus, to protect potential primary receivers, LBT-based cognitive radios must

be designed very conservatively and typically needs a listening sensitivity well below−100dBm.

Such extreme sensitivity may leave very little room for cognitive access [3]. Second, many robust

primary user (PU) systems can withstand some concurrent interference. This robustness to interference

provides room for concurrent SU access without disrupting PU communications. LBT does not allow

SU access to fully exploit link robustness offered by error protection in such PU systems. Third, LBT

does not provide any information regarding the aggregated effect of multiple cognitive transmitters

on the primary receivers; hence making distributed cognitive access protocol difficult.

To overcome the aforementioned weaknesses of LBT, we advocate a different strategy that in-

corporates the intrinsic feedback information in typical two-way PU communication links. More

specifically, our principle is to exploit data-link-control (DLC) feedback information in PU systems

for cognitive access. Because DLC messages carry importantinformation about the quality of the

link between the primary sender and its receiver, they reflect the sum interference effect of secondary

user transmissions on the primary receiver. By observing such feedback information, secondary users

(SUs) can devise more intelligent decisions with respect tospectrum access. As a generalization

of “Listen-Before-Talk”, we develop a new “sensing-and-confirmation” OSA control algorithm by

allowing secondary users to exploit information sensed from both primary transmissions and receiver

responses. In particular, beyond the “traditional” sensing of primary transmitter, the SU also listens to

the primary receiver ARQ signals; i.e., ACK/NAK. The successful decoding of an ACK/NAK packet

from PU receiver indicates (with high accuracy) the presence of PU transmission, and thus enhances

(or confirms) the detection of primary idle/busy state. It isimportant to note that opportunistic access

is not necessary unlicensed. In fact, secondary users may belong to the same legacy as the primary

Page 3: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

3

ones. For example, in this work the SU transmitter could be a femto base-station in a4G network,

opportunistically serving its user provided that the macrocell users are protected. We, therefore,

advocate that secondary users should be allowed to decode encrypted packets such as DLC feedback

packets to increase the average system performance.

Opportunities to exploit DLC information exist widely in bidirectional digital communication

systems. Examples include the1-bit closed-loop-power-control (CLPC) information in IS-95, the

ACK/NAK feedback-packets in WiFi networks, or the CQI feedback in LTE and HSDPA. Also,

most standards include feedback control channel to communicate ACK/NAK feedback packets. In

the present work, we focus on the ACK/NAK information that primary receivers (PRxs) send to

primary transmitters (PTxs) to indicate whether or not the latest data packet was correctly received.

The main scope of this work is to show that exploiting DLC information, specifically, ACK/NAK

packets from the PRx, can be beneficial to secondary users to achieve improved channel access and,

consequently secondary throughput-performance, while preserving primary protection. The work is

not limited to a particular wireless network standard or protocol, but applies to systems that fit the

descriptions in Section 3.

Our scheme can be viewed as a special case of cooperative sensing in which multiple signal

observations are used to better detect the presence of spectrum opportunities. In this special case, we

are not usingmultiple cooperative secondary users. Instead, we exploitmultiple sources of information

received by the same secondary transmitter. Moreover, integrating the detection information from

sensing-and-confirmation (SAC) with past observations as well as prior knowledge of primary user

characteristics in terms of idle and busy distribution, we present an effective channel access control

algorithm that can optimize the utility of spectrum idle periods of the primary users.

Our work will be presented as follows. Section 2 details someof the related works in cognitive

networks. In Section 3, we describe in detail the system model that underlies the cognitive sensing

and channel access of primary user channels. In Section 4, wedefine our problem formulation and in

Section 5 we develop an optimized spectrum access policy based on dynamic programming with

information collection through sensing-and-confirmation(SAC). We present simulation results in

section 6 to illustrate the performance of the optimized spectrum access policy. Section 7 summarizes

the current work and discusses future research directions.

2. RELATED WORKS

Existing works on cognitive radio access are mostly based onLBT and, particularly, on spectrum

sensing of primary transmissions. Spectrum sensing may involve energy detection and cyclostationary

Page 4: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

4

feature detection. These sensing schemes are presented, among others, in [4] and [5] respectively. A

survey on spectrum sensing for cognitive radios is given by [6]. Distributed spectrum sensing schemes

have been studied in [7] and [8].

A fundamental result of spectrum sensing was given in [9], where the authors described the

concept of “SNR-walls”. To overcome such impairment, cooperative spectrum sensing among multiple

nodes has been proposed. Among others, [11] demonstrated that the sensitivity-requirement changes

under fading channels for spectrum opportunity detection.Also, [12] presented a multi-band joint

detection algorithm for cooperative secondary users to mitigate the effect of possibly degraded channel

conditions between the primary transmitter and some of the cognitive radios.

The authors of [13], among others, tackled the design of optimum sensing time to maximize

secondary users throughput under a primary user protectionconstraint. Different sensing techniques

to help secondary users designers to implement the reliablesensors can be found in [14]. It is

important to note that both [11] and [14] stress the need for cooperation among SUs to improve

detection reliability. For this reason, we consider a special type of cooperation where each secondary

transmitter (STx), instead of sharing its observation withthe other SUs, collects observations from

both the primary transmitter (PTx) and the primary receiver(PRx). This approach enables improved

detection of channel vacancies and also makes it possible tomore accurately assess STx’s interference

effect at the PRx, thereby improving the average secondary throughput performance. Since we assume

that PRx feedback transmission time is very short compared to PTx data packet duration, channel

usage during PRx feedback exchange does not constitute a significant opportunity for the STx.

An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

to maximize primary channel access time have been studied in[18] and [19]. A generalization of

those works for optimal single-SU channel access with primary exponential or general idle time

distributions is presented in [23]. The authors of [24] useda partially observable Markov decision

process (POMDP) framework to devise a distributed MAC protocol for cognitive radios.

Closely related to our SAC proposal are the works in [25] and [26]. In [25], the authors used a1-bit

CINR (carrier-to-interference plus noise ratio) feedbackto optimize an SU power control strategy,

whereas [26] proposed to maximize cognitive user data rate by exploiting the primary closed-loop-

power-control. Finally, in [27] the authors proposed another POMDP framework for dual sensing to

utilize the ACK/NAK feedback that SUs can collect from the primary receiver in order to achieve

distributed-collective primary receiver protection.

There are a number of works assuming prior knowledge of primary busy-idle time duration at the

Page 5: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

5

STx e.g, in [20], [21] and [22]. Such information can be obtained at STx by observing the primary

channel long-enough and by using historic information.

The authors of [28] apply a Markov-decision-process framework for devising an optimal oppor-

tunistic channel access policy, subject to limits on primary’s performance loss. In [29], the authors

allow primary H-ARQ feedback exploitation by secondary users to realize overlay opportunistic

communications.

3. SYSTEM MODEL

3.1. Cognitive Overlay atop Random Access Networks

PU−Rx

SU−Rx

SU−TX

SU−Rx

SU−TX

DATA−PACKETS

FEEDBACK

PU−Tx

Fig. 1. System model

Figure 1 illustrates a typical scenario in our study, where acognitive radio network is overlaid on

top of a primary user network that hosts random transmissions of primary subscribers when needed.

As shown, a primary transmitter (PTx) and a primary receiver(PRx) communicate by transmitting

packets of variable lengths by randomly access their assigned channel. The PRx responds to the packet

transmission by feeding back ACK for success and NAK for failure. Our objective is to allow the

overlay of secondary users that include secondary transmitters (STx) and secondary receivers (SRx)

such that, when the primary pairs are silent, the STx can activate and transmit data packets to their

receivers (SRx).

More specifically, we consider the case when the primary system consists of a transmitter (PTx)

and a receiver (PRx). They communicate through two logically distinct channels (i.e., through TDD or

FDD). Figure 3 illustrates a TDD PU channel access in which the data frame from the PTx is followed

by an acknowledgment from the PRx via the reverse link. As shown in Figure 2, in the forward channel

(FCH), data packets are sent from the primary transmitter tothe primary receiver, whereas the reverse

channel (RCH) is a feedback channel where DLC packets are sent from PRx to PTx. The primary

transmitter gains channel-access at will (whenever a packet is ready for transmission), following an

Page 6: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

6

ON/OFF traffic pattern with known Idle/Busy-period distributions. We assume the access time to be

slotted and that the secondary time slots have the same duration and timing of the primary ones. We

associate the slott to the time interval:[t · ts, (t + 1) · ts) and we denotets as the slot duration.

The objective of the secondary transmitter is to maximize its channel access while maintaining the

desired PU collision rate.

Fig. 2. Distance model for secondary transmitter and primary pair.

In this work, we make the following assumptions:

• A wireless scenario that includes one pair of primary does and one secondary pair.

• STx has prior knowledge on the primary busy/idle distribution.

• Primary traffic statistics does not change in the consideredtime-scale.

• The PTx is oblivious to secondary activity such that its channel access policy does not consider

secondary activity.

• The STx always has packets to send.

• STx is allowed to decrypt the feedback-packets from the PRx.

• The transmission of the ACK/NAK packets from the PRx are short enough not to constitute a

significant opportunity for increasing secondary throughput.

• Primary and secondary time slot synchronization.

• The STx cannot sense while transmitting.

We assume that the SU only attempts to access PU forward channel and always listens to the PU

reverse channel. When the SU correctly decoded a feedback packet, it can be used to enhance its

detection of the primary transmitter’s ON/OFF activity.

For simplicity, we apply the following notations

• P, P0, P1, P2: PTx transmission power, PTx signal power at the STx, PRx transmission power,

Page 7: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

7

and STx transmission power, respectively;

• d0, g0: distance and channel (fading) gain between the PTx and the STx, respectively;

• d1, g1: distance and channel (fading) gain between the PRx and the PTx, respectively;

• d2, g2: distance and channel (fading) gain between the PRx and the STx, respectively;

• δ: path-loss factor, typically between 2 and 5;

• N0: background noise power at the PRx;

• Ns: the number of samples used for the radiometer detector;

• γ: the likelihood-ratio test threshold;

The secondary transmitter applies a sensing-based detection (e.g. energy detection) characterized by:

PF = Pr[false alarm] ,

PM = Pr[missed detection] .

Note that, for a target probability of missed detectionPM , we can calculate the corresponding

false-alarm probabilityPF , through a Neyman-Pearson test. Specifically for Gaussian noise and

approximately Gaussian signals, the probability of false alarm and missed-detectionPF and PM

can be expressed as follows, [10]:

PF = Q

γ −N0√

2

Ns

N0

(1)

PM = 1−Q

γ − (P0 +N0)√

2

Ns

(P0 +N0)

. (2)

whereQ(·) is the standard Gaussian integral function whileP0 =P

d−δ0

. In our algorithm, given a target

value ofPM , we can derive the corresponding likelihood ratio testγ and the false alarm probability

PF .

In our work, we assume imperfect STx reception of primary receiver’s feedback characterized by:

η = Pr[correctly decoding PRx feedback] .

Page 8: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

8

Note that a value ofη = 0 means that the STx has no feedback information from the PRx and

that its access strategy degenerates to the LBT strategy.

3.2. Primary Transmission Outage

Because the essential objective of secondary user access isto minimize its negative impact on

primary data transmission, the primary transmission outage probability must be kept low. To quantify

the impact of SU access on PU channel quality, we define probabilities

ξ0 = Pr[PRx outage| STx not transmitting]

ξ1 = Pr[PRx outage| STx transmitting] .

Whenever an outage event occurs at the primary receiver, thePRx sends a NAK packet to the

primary transmitter. Note that the error (outage) probability at the PRx is based on the detection SNR

thresholdγp. Following the exponential path model and the assumption ofRayleigh fading channel

with unit variance, the error probabilityξ0 without STx transmission is:

ξ0 = P

Pd−δ1 |g1|

2

N0≤ γp

,

= 1− exp

− γpN0

Pd−δ1

;

(3)

Similarly, the outage probability when STx is active is:

ξ1 = P

Pd−δ1 |g1|2

P2d−δ2 |g2|2 +N0

≤ γp

,

= 1−1

1 + γpP−1dδ1P2d−δ2

· exp

−γpN0

Pd−δ1

;

(4)

Naturally, ξ0 ≤ ξ1.

We note that bothξ0 andξ1 can be estimated by the STx. In particular, we require the secondary

transmitter to be able to decode primary receiver’s ARQ packets and estimate the values ofξ0 and

ξ1 over time.

3.3. Cognitive User Utilization of PU Signals

In what follows, we detail the stochastic dynamic programming formulation used to devise the

optimal channel access policy applied in this paper. Stochastic dynamic programming deals with

Page 9: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

9

problems where decisions are made in stages. Dynamic programming captures the trade-offs between

the instantaneous outcome and future ones. In this work, theobjective of the decision-maker (the STx)

is to find an optimal decision-selection policy that maximizes the expected reward (13). Note that we

make use of the terms “action” and “decision” interchangeably. To devise an optimal action-policy

at each stage, we consider a discrete-time dynamic-system for which the result of every decision is

not fully predictable, but can be estimated through sensingand through collecting DLC feedback

information from the PRx. Since we value more the reward fromearlier stages than rewards that

can be accrued in the future, we define a “discount factor”0 < α < 1 that exponentially discounts

the rewards collected in the future (see [31] and [32]). Given a policy, that is a set of actions to

be performed at every stage of the algorithm, we define the average future-discounted total reward

that the STx accrues as the value function associated to the policy in Eq. (15). In this section, we

quantify:

• The per-stage reward function in Eq. (13).

• The per-stage action set in Eq. (5).

• The per-stage observation set in Eqs. (7) and (8).

• The STx value-function in Eq. (15).

• The optimal policy of Eq. (16) that maximizes Eq. (15).

At time slot t, we let the secondary transmitter take the following three potential actions

at =

0, stay idle, with unit costc0

S, sense, with unit costcS

T, transmit, with unit costcT

(5)

The implicit assumption is that the STx cannot transmit and sense at the same time. Thus, we define

the STx action space as

A = 0,S,T.

At the start of each time slott, the secondary transmitter can decide whether or not to sense primary

transmitter’s activity. Following the forward channel sensing of primary transmission, regardless of

the outcome, our secondary transmitter always listens to the reverse channel to over-hear possible

primary receiver’s DLC signals addressed to the primary transmitter. Our STx can utilize these signals

Page 10: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

10

Fig. 3. Sequences of primary states, secondary actions and relative observation sets.

for sensing confirmation. The probabilityη of correctly decode PU-PRx’s feedbacks is:

η = P

P1d−δ2 |g2|

2

N0> γs

= exp

γsN0

P1d−δ2

; (6)

whereγs is the SNR threshold for packet reception at the STx.

We note that DLC packets are typically easier to decode than data packets. We let there be an

adequate CRC to protect the feedback packet. Hence, if the SUcorrectly decodes the feedback

packet, it is accurate and it confirms with certainty that PU is active in the preceding time slot,

thereby allowing the secondary user to adapt its policy. Let∅ denote the event that STx did not

decode any feedback packet while∅ denotes the complement event. Failure to decode may indicate

an idle PTx or a severely degraded channel between the PRx to the STx.

After performing actionat ∈ A, the STx will collect an observationO(at). Clearly, the observation

space is action-dependent. Fora = S, the observation space is idle (I) or busy (B) followed by the

ACK/NAK decoding result:

Ω(S) = I;B × ∅; ∅. (7)

On the other hand, the observation space fora = 0 or a = T is limited to the ACK/NAK decoding

result:

Ω(T) = Ω(0) = ∅; ∅. (8)

Figure 3 shows a possible realization of primary channel states and the relative secondary transmitter

actions and corresponding observation spaces. Because we have imperfect sensing, the PTx may be

sensed as idle but a feedback ACK/NAK may instead be correctly decoded at the end of the time

slot, informing the STx that there was a sensing error on the PTx activity. This allows sensing-and-

Page 11: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

11

confirmation (SAC) to improve the STx’s knowledge on the PTx’s activity.

3.4. Partially Observable Markov Model

In order to determine an optimized action to take at the STx, we apply a partially observable

Markov decision framework to devise an optimal opportunistic spectrum access strategy that achieves

primary receiver protection in terms of collision probability while maximizing secondary transmitter’s

channel access opportunities. In particular, define the true state of the forward channel att as

st =

0 : PTx = B (busy),

1 : PTx = I (idle).

(9)

DefineS = 0; 1. This binary statest ∈ S is partially observable at the cognitive user through PTx

sensing and through PRx feedback decoding.

Clearly, if the STx is fully aware ofst, then its access strategy can be optimized accordingly to

maximize a user defined utility function. However, both PTx’s activity sensing and PRx’s feedback

decoding are imperfect and prone to errors. For this reason,we define

σt = P[st = 1|STx observations] (10)

as the STx’s information state on the PTx statest based on currentO(at) and past observations.

Observations are collected by the STx to estimate the primary channel state [31]. In order to update

σt, the STx collects the observations up to timet− 1 in a vector

O<t = [O(a0), O(a1), . . . , O(at−1)].

Our access algorithm depends on the state information estimateσt. Hence, we must determine how

the estimateσt can be updated, given each new observation before a new action at+1. We outline

the development of this optimized access control based on the partially observable Markov process

in the next section.

4. ACCESSCONTROL OPTIMIZATION

Let us defineDI , DB as two discrete random variables representing the number ofslots the FCH

has been in the “idle” or the “busy” state, respectively. In this work, we assume that the secondary

Page 12: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

12

transmitter knows the probability distribution of the idleand busy durations:

PI(k) = P [DI = k] , (11)

PB(k) = P [DB = k] . (12)

Assuming a stationary PU activity distribution, this knowledge can be acquired either directly from the

PU hosts or by monitoring PU activities prior to channel access. Note that we are implicitly considering

a time-scale during which the primary Busy/Idle distributions do not change. This assumption is

reasonable because traffic patterns do not dramatically change, e.g., in cellular systems traffic patterns

are normally assumed to stay constant for periods as long as1 hour.

For estimating the duration of a primary idle or busy period upon deployment, the STx senses the

forward channel to detect the first idle to busy transition inprimary traffic. This kind of transition is

easier to detect than a busy-to-idle transition because, inpractice, the start of each packet typically

contains a preamble and/or a pilot sequence for receiver synchronization. Such preamble has unique

characteristics and can be more easily detected by the cognitive radios to detect the beginning of the

first busy period upon their deployment.

To optimize STx control, we define a reward function for each stage (slot). Recall the STx

information stateσt ∈ [0, 1] and its actionat ∈ A. We specify a reward function as

R(σt, at) =

−c0, at = 0,

−cS, at = S,

rT(1− PNAK )− cT · PNAK , at = T.

(13)

where the probability ofPNAK = (1 − σ) · ξ1 is an outage, or collision, probability estimate at the

PRx. Note thatc0 is a constant cost associated to the feedback packet overhearing in case of idle

action;cS is a constant sensing cost;rT is a constant reward that the STx gets after every successful

transmission whereascT is the penalty of collision with a primary transmission.

We define an access control policy as a sequence of functionsπ = µ0, . . . , µt, . . ., in which

µt(σt) is the action taken according to the result inσt

µt : [0, 1] → A. (14)

Let us defineτ0(t0) as the first forward channel state transition from busy to idle from timet0. Given

Page 13: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

13

a policyπ, we define the corresponding value function as:

Vπ(σ, t0, τ0(t0)) = Eπ

[

+∞∑

t=t0

αtr(σt, at)|σt0 = σ

]

. (15)

The optimized access control policyπ∗ is an admissible policy that maximizes the value function

(15) via

π∗ = argmaxπ

Vπ(σ, ts, τ0(ts)). (16)

5. DYNAMIC PROGRAMMING FOR OPTIMIZING SPECTRUM ACCESS

We now present the details of our dynamic programming formulation for optimizing spectrum

access.

5.1. PTx Transition Estimation

In order to estimateσt, the secondary transmitter needs to know the number of time slots in the

same “busy” or “idle” state. For this reason, if, at timet, the STx believes that the PU channel is

in stateq ∈ 0; 1, it uses the past and current observations to estimate the next PU traffic instant

τ1−q(t) that marks the transition from stateq to state1−q. In order to do so, at every time slot, upon

performing an action and collecting corresponding observation, our STx uses a maximum a posterior

(MAP) estimator to estimateτ1−q(t).

At time t, after collecting the observationO(at) corresponding to the actionat ∈ A, the STx

estimates the transition epoch from stateq ∈ 0, 1 to state1− q via

τ1−q(t) = argmaxx>τq(t−1)

ln(P [O(at),O<t|τ1−q(t) = x]) + ln(P [τ1−q(t) = x) . (17)

Note that there are two possible outcomes of the MAP estimateτ1−q(t).

I) τq(t − 1) < τ1−q(t) < t: this MAP outcome suggests that the transition from stateq to 1 − q

happened beforet. In this case, if the posterior probability associated withthe MAP estimation

is above a given threshold, the traffic distribution used to determine when the information state

σt changes fromPq(k) = P [Dq = k] to P1−q(k) = P [D1−q = k]. Moreover, the STx will

erase the vectorO<t and start a new one associated toτ1−q(t).

II) τ1−q(t) ≥ t: this MAP estimate suggests that the PU channel state will stay in stateq at time

t; Hence, STx discards the estimateτ1−q(t).

When τ1−q(t) < t, it is still risky to conclude that a PU state transition tookplace in order to

update the distribution associated to the PU channel state.It is in fact important to take into account

the reliability of this MAP estimate by evaluating its corresponding posterior probabilityσt.

Page 14: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

14

5.2. Information State Update

Consider that, at timet, the STx performed an actionat and collectedO(at) andO<t. The STx

then will use the current and past observations to determinethe next PU traffic transition via the MAP

estimator of Eq. (17). Since the MAP estimator is not perfect, the STx would exploit the information

stateσt to decide the next action to take.

Let us assume that, at the end of time slott, τ1−q(t) ≥ t. In this case (known as case II), the STx

assumes the primary forward channel state did not change. Note that the primary traffic transition

estimateτ1−q(t) is discarded and that the STx will conclude that the FCH is in state q, storing the

value of τq(t− 1) in τq(t).

In the dynamic programming implementation, we use the following information state approxima-

tion:

σt ≃ ρt

= E [st = 1|O(at−1), τq(t)] , t > 0. (18)

Note that, in (18), the information of the vectorO<t, contained inσt (10), is carried over by the

transition epoch estimateτq(t).

Upon performing actionat ∈ A and collecting observationO(at), the STx calculatesρt+1 using

two steps:

1) Determining the posterior probability:

βt = P [st = 1|O(at), τq(t)]. (19)

In order to findβt, the STx determines first the probabilities

v1 = P [O(at)|st = 1], v0 = P [O(at)|st = 0]. (20)

The posterior probability (19) is defined as

βt =v1 · ρt

v1 · ρt + v0 · (1− ρt). (21)

2) Calculating the information vectorρt+1. Let Kq be the estimate of the number of time slots

the PU channel has been in stateq ∈ 0; 1.

Let Dq be a random variable denoting the number of time slots in state q. For computational

tractability, we assume that sinceτq(t), at most one transition may take place. Hence, the

Page 15: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

15

probability thatst remains in stateq at t+ 1 is

P [Dq ≥ Kq + 1|Dq ≥ Kq] =P [Dq ≥ Kq + 1]

P [Dq ≥ Kq]. (22)

To update the probability estimate thatst+1 = 1, we need to define this probability:

=

Kq∑

k=1

p (Dq = k) ·

[

1−P [D1−q ≥ Kq − k + 1]

P [D1−q ≥ Kq − k]

]

. (23)

Hence, the updated information vector is

ρt+1 =P [Dq≥Kq+1]P [Dq≥Kq]

· βt + · (1− βt). (24)

For the sake of simplicity, we will simply denote this updateprocedure by

ρt+1 = UPDATE(O(at), ρt, t, τq(t)). (25)

5.3. Value Iteration and Policy Decision

Based on dynamic programming, we defineV (ρ, t, τq(t)) as the maximum mean discounted value

function that the STx can get at time slott with ρt = ρ. Without loss of generality, let the latest PU

state transition be to stateq. From [30], [31] and [32], we have:

V (ρ, t, τq(t)) = maxa∈0, T, S

Va(ρ, t, τq(t)), (26)

where V0(ρ, t, τq(t)), VS(ρ, t, τq(t)), VT(ρ, t, τq(t)) are the value function associated with actions

at = 0, at = S, andat = T, respectively. These value functions are defined by

V0(ρ, t, τq(t)) = R(ρ, at = 0) + α∑

ω∈Ω(0)

P [O(at) = ω]V (ρt+1(ω), t+ 1, τ1−q(t+ 1));

VS(ρ, t, τq(t)) = R(ρ, at = S) + α∑

ω∈Ω(S)

P [O(at) = ω]V (ρt+1(ω), t+ 1, τ1−q(t+ 1));

VT (ρ, t, τq(t)) = R(ρ, at = T) + α∑

ω∈Ω(T)

P [O(at) = ω]V (ρt+1(ω), t+ 1, τ1−q(t+ 1));

where

ρt+1(ω) = UPDATE(O(at) = ω, ρt = ρ, t, τq(t)))

Note thatR(ρt, a) is the per-stage reward function associated with the actiona; τq(t+1) is the MAP

estimate of the PU traffic transition-epoch at timet+ 1 to state1− q,

Page 16: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

16

6. SIMULATION RESULTS

We now present test results of our optimized access policy based on SAC. Our test scenario involves

a single pair primary nodes and a single pair secondary nodes. Throughout our tests, we set the path

lossδ = 4, to account for surface reflections. Additionally, we setPB(k), PI(k) as uniform in[1, 10]

and [1, 20], respectively. Performance evaluation is in terms of secondary average channel utilization

rate (28), and of collision probability at the primary receiver (27).

Since our scheme applies sensing-and-confirmation (SAC), we wish to analyze the effect of each

detection outcome on secondary performance. For this reason, we took into consideration a scenario

such as the one depicted in Fig. 2, where primary transmitter’s activity detection performance depends

on the distance between the primary and secondary transmitters d0; whereas the probabilitiesη of

correctly decoding a feedback packet from the primary receiver andξ1 of outage in case of concurrent

primary-secondary transmissions depend on the distanced2 between the STx and the PRx. Asd0

increases, the PTx activity detection performance by the STx becomes worse either in terms of false

alarm or missed detection; asd2 increases, the feedback decoding probability decreases together with

the collision probabilityξ1 at the primary receiver.

The collision probability at the primary receiver,Pcoll, and the secondary average channel utilization

rate, (SACUR), are defined as:

Pcoll =

∑Nt=0 I(at = T, ft = NAK)

N + 1, (27)

SACUR=

∑Nt=0 I (at = T, ft 6= NAK)

N + 1, (28)

whereft ∈ ACK,NAK , ∅ is the primary receiver’s feedback at timet; Ns is the simulation run time

andI(·) is the standard indicator function. Note that the above metric counts the effective percentage of

time the STx causes a NAK at the PRx. The parametersη, PM andPF directly affect the belief-vector

dynamics through the information-state update procedure summarized in Eq. (25). The performance

metrics SACUR andPcoll are determined by an optimal action-selection policy derived through Eqs.

(16) and (26). In particular, a collision is expected by the action of STxat = T followed by an NAK

reception. In our experiments, the performance estimates are averaged over N= 100 time slots. This

value is large enough since, as will be shown in Fig. 4, averaging over largerN does not noticeably

change our results.

Page 17: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

17

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

0.1

0.2

0.3

0.4

0.5

0.6

0.7

η

N=500N=100

SACUR

Pcoll

Fig. 4. SACUR andPcoll as functions ofη.

6.1. Impact of η.

Before checking the effect ofd0 and d2 on the secondary pair, we show the impact of different

values ofη for ξ1 = 0.99, PF = 0.2 andPM = 0.1.

Our goal is to show that the probability of correctly decoding PRx feedbacks plays a fundamental

role in SUs performance. We expect that, as the value ofη grows to one, the SUs take more advantage

of the information on FCH activities, thereby improving their throughput while keeping the collision

probability low. We also expect that, for high value ofη, the SUs performance will be significantly

higher than the case withη = 0. We kept the value ofξ1 as a constant equal to0.99 to favor

a conservative behavior at the STx. The results are shown in Figure 4. Note that we adjusted the

collision costcT, to keep the collision probability at the primary receiver at a nearly-constant level.

For constantPcoll, the SACUR is a non-decreasing function ofη, as expected. Note that our policy

for η = 0 is equivalent to a LBT strategy.

From the simulation results, we can see that, as long as the probability of correctly decoding a

feedback from the primary receiver is reasonably high, sensing-and-confirmation (SAC) significantly

outperforms the basic LBT access not taking into account thereceiver feedback. These results validate

our motivation that PRx feedbacks give confirmation information about the primary channel state and

therefore helps improve opportunistic spectrum access.

Page 18: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

18

The results are shown for bothN = 100 andN = 500. From the simulation comparison,N = 100

is a sufficiently large.

6.2. Impact of d0

500 1000 1500 2000 2500 3000 3500 40000

0.1

0.2

0.3

0.4

0.5

0.6

d0 [m]

ξ

1=0.6

ξ1=0.7

ξ1=0.8

ξ1=0.99

SACURPcoll

Fig. 5. SACUR andPcoll as functions of d0, whereη = 0.2.

To inspect the effect of primary transmitter’s activity detection on the SACUR, we fixed the

parameterη to 0.2 and compiled results in Figure 2 for different values ofd0.

In this set of simulations,PM is kept at0.1 while the false alarm ratePF varies as determined

by a Neyman-Pearson energy based test as shown in Eqs. (1), (2). This scenario happens when

the PU systems impose a constraint on the missed detection probability for protecting primary

communications as the secondary energy detector experiences differentPF due to different SNR

conditions. Figure 5 shows that SU performance varies in terms of SACUR for differentPF values.

It can be seen that, for high values ofξ1, the SACUR dramatically decreases asd0 increases. This

effect is due to the fact that, when the estimation of the collision probability is high while the channel

conditions between the STx and the PTx do not guarantee a goodPF, the STx policy tends to be

more conservative.

Page 19: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

19

500 600 700 800 900 1000 1100 1200 1300 14000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

d2 [m]

Pcoll

SACUR

Fig. 6. SACUR andPcoll as functions ofd2, whered0 = 875[m].

6.3. Impact of d2

To inspect the effect of the distanced2 between the PRx and the STx, we allowed the feedback

decoding probabilityη, and the outage probabilityξ1 to be determined by (6), and (4) respectively. We

let the detection performance be determined by the distancebetween the PTx and STx:d0. We expect

that, asd2 increases, the STx experience more spectrum opportunitiesincreasing its throughput.

Figure 6 shows the SACUR and collision rate as functions of the distanced2. This result confirms

that the performance of both PRx and STx depend on their mutual interferences. Smallerd2 allows STx

to more accurately estimate the FCH state but also leads to a stronger interference and consequently

higher collision probability. Asd2 increases, the SU throughput increases because its effect on the

PRx is less detrimental. This result confirms our hypothesisthat secondary users opportunistic access

is largely improved by our SAC strategy taking into account the primary receiver reception conditions.

6.4. Algorithm comparison

Since our scheme collects past observations to employ a MAP estimator for primary transmitter’s

traffic transitions forecasting, we compared our channel access algorithm to a “pure dual-sensing”

channel access strategy that can be viewed as a special “Cooperative-Listen-Before-Talk”.

We provide dynamic programming results achieved under two different scenarios: the first assuming

Page 20: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

20

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

0.1

0.2

0.3

0.4

0.5

0.6

0.7

η

MAPGENIECLBT

Pcoll

SACUR

Fig. 7. SACUR andPcoll comparison: PF = 0.4, PM = 0.4.

perfectly known primary channel state transitions; the second when the dynamic programming is

carried over through the help of the MAP estimator of Eq. (17). The purpose of this test is owing to

the fact that, when the primary state transition epochs are estimated through the MAP estimator of

Eq. (17), the value function resulting from the dynamic programming is not optimal. For the same

value of the collision probability, we compared the STx throughput of the three schemes. We expect

the MAP-based opportunistic channel access not to be worse than the pure dual-sensing algorithm,

where the FCH is accessed only based on the current observations from both the primary receiver and

transmitter and not on the observation history. This expectation is confirmed by the results of Figure

7, where “CLBT” stands for “Cooperative-Listen-Before-Talk”, “Genie” is the dynamic programming

formulation with perfectly known channel transitions, and“MAP” is the dynamic programming carried

over the help of the MAP estimator. It can be seen that the MAP scheme is always better than pure

CLBT. For largerη, the MAP channel access outperforms CLBT by up to50%, since the channel

state transition estimations becomes more reliable as the decoding probability increases.

7. CONCLUSIONS

In this paper, we introduced a MAP-based opportunistic spectrum access scheme that intelligently

exploits the bi-directional nature of most wireless communication systems. We assumed that the

primary system follows an ON/OFF traffic pattern and that theSU is able to sense the PU forward

Page 21: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

21

channel and to detect PU receiver feedbacks. We demonstrated the advantage of listening to the

primary receiver feedback to improve SUs performance in terms of opportunistic channel access rate.

Moreover, we established a link between our algorithm and cooperative sensing, where we collect

at the same node observations relative to the primary transmitter’s activity and primary receiver’s

reception quality. We showed that, by exploiting historic observations, even sub-optimally through

the use of a MAP estimator, our new strategy can outperform the traditional cooperative sensing

schemes.

8. ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under Grant

CNS-0917251.

REFERENCES

[1] ’Federal communications commission: spectrum policy task force report’, Federal Communications Commission ET

Docket 02-135, Nov. 2002.

[2] ’Federal communications commission: office of engineering and technology releases TV white space phase II test

report’, http://www.fcc.gov/oet/projects/tvbanddevice/Welcome.html, Nov. 2008.

[3] Woyach K., Pyapali P., and Sahai A.: ’Can we incentivize sensing in a light-handed way?’, IEEE Symp. on New Fron.

in Dyn. Spect. Access Networks, Singapore, Apr. 2010 pp. 1-12

[4] Digham F.F., Alouini M.S., and Simon M.K., ’On the energydetection of unknown signals over fading channels’,

IEEE ICC, May 2003 pp. 3575-3579

[5] Lunden J., Koivunen V., Huttunen A., and Poor H.V.: ’Spectrum sensing in cognitive radios based on multiple cyclic

frequencies’, IEEE CrownCom, USA, Orlando FL, Aug. 2007 pp.37-43

[6] Yucek T., and Arslan H.: ’A survey of spectrum sensing algorithms for cognitive radios’, IEEE Comm. surv. and tut.,

11, (1), 2009 pp. 116-130

[7] Tsitsiklis J.N.: ’Decentralized detection’, Adv.in Statistical Signal Processing, 2, 1993.

[8] da Silva C.R.C.M., Choi B., and K. Kim: ’Distributed spectrum sensing for cognitive radio systems’, ITA Workshop,

USA, San Diego CA, Feb. 2007 pp. 120-123

[9] Tandra R., and Sahai A.: ’Fundamental limits on detection in low SNR under noise uncertainty’, IEEE WirelessCom

Symp. on Emerging Networks, Technologies and Standards, USA, Hawaii, June 2005, pp. 464-469

[10] Tandra R., Sahai A., ’SNR Walls for Signal Detection’, IEEE Journal of Selected Topics in Signal Processing, 2, (1),

2008, pp. 4-17

[11] Mishra S.M., Sahai A., and Brodersen R.W., ’Cooperative sensing among cognitive radios’, IEEE ICC, Turkey, Istanbul,

June 2006 pp. 1658-1663

[12] Quan Z., Cui S., Sayed A.H., Poor H.V., ’Optimal multiband joint detection for spectrum sensing in cognitive radios’,

IEEE Transactions on Signal Processing, 57, (3), 2009 pp. 1128-1140

[13] Liang Y.C., Zeng Y., Peh E.C.Y., and Hoang A.T., ’Sensing-Throughput Tradeoff for Cognitive Radio Networks’,

IEEE Transactions on Wireless Communications, 7, (4), Apr.2008

Page 22: 1 Improved Spectrum Access Control of Cognitive Radios ...liu/paper/FabioIET.pdf · An information theoretic viewpoint for OSA appeared in [15], [16] and [17]. Secondary user policies

22

[14] Cabric D., Mishra S.M., and Brodersen R.W.: ’Implementation Issues in Spectrum Sensing for Cognitive Radios’,

Asylomar Conf. on Signals, Systems and Computers, USA, Monterey CA, Nov. 2004, pp. 772-776

[15] Devroye N., Mitran P., and Tarokh V.: ’Achievable ratesin cognitive radio channels’, IEEE Trans. on Information

Theory, 52, (5), May 2006, pp. 1813-1827

[16] Simeone O., Bar-Ness Y., and Spagnolini U.: ’Stable throughput of cognitive radios with and without relaying

capability’, IEEE trans. on Communications, 55, (12), Dec.2007, pp. 2351-2360

[17] Devroye N., Mitran P., and Tarokh V.: ’Cognitive multiple access networks’, International Symposium on Information

Theory, Australia, Adelaide, SA, Sep. 2005 pp. 57-61

[18] Huang S., Liu X., and Ding Z.: ’Opportunistic spectrum access in cognitive radio networks’, IEEE INFOCOM, USA,

Phoenix AZ, April 2008 pp. 1427-1435

[19] Huang S., Liu X., and Ding Z.: ’Optimization of transmission strategies for opportunistic access in cognitive radio

networks’, UC Davis report, April 2008

[20] Zhao Q., Geirhofer S., Tong L., Sadler B.M.: ’Optimal Dynamic Spectrum Access via Periodic Channel Sensing’,

IEEE Conf. on Wir. Comm. and Networking, China, Kowloon, Mar. 2007 pp. 33-37

[21] Zhao Q., Tong L., and Swami A.: ’Decentralized Cognitive MAC for Dynamic Spectrum Access’, IEEE Symp. on

New Fron. in Dyn. Spect. Access Networks, USA, Baltimore MD,Nov. 2005 pp. 224-232

[22] Geirhofer S., Tong L., and Sadler B.M.: ’A measurement-based model for dynamic spectrum access’, IEEE MILCOM,

USA, Washington DC, Oct. 2006, pp. 1-7

[23] Jung E., and Liu X.: ’Opportunistic spectrum access in heterogeneous user environments’, IEEE Symp. on New Fron.

in Dyn. Spect. Access Networks, USA, Chicago IL, Oct. 2008, pp. 1-11

[24] Zhao Q., Tong L., Swami A., and Chen Y.: ’Decentralized cognitive MAC for opportunistic spectrum access in ad

hoc networks: A POMDP framework’, IEEE Journal Selected Areas in Communications, 2007, 25, (3), pp. 589-600

[25] Huang S., Liu X., and Ding Z.: ’Optimal Sensing-Transmission Structure for Dynamic Spectrum Access’, IEEE

INFOCOM, Brasil, Rio de Janeiro, Apr., 2009 pp. 2295-2303

[26] Zhang R., and Liang Y.C.: ’Exploiting hidden power feedbacks in cognitive radio networks’, IEEE Symp. on New

Fron. in Dyn. Spect. Access Networks, USA, Chicago IL, Oct. 2008, pp. 1-5

[27] Lapiccirella F.E., Huang S., Liu X., and Ding Z.: ’Feedback-based access and power control for distributed multiuser

cognitive networks’, Info. Theo. and Appl. Work., USA, San Diego CA, Feb. 2009, pp. 85-89

[28] Levorato M., Mitra U., and Zorzi M.: ’Cognitive Interference Management in Retransmission-Based Wireless

Networks’, Allerton conf. on Comm. Cont. and Comp., USA, Monticello IL, Sep. 2009, pp. 94-101

[29] Levorato M., Mitra U., and Zorzi M.: ’Cooperation and Coordination in Cognitive Networks with Packet Retransmis-

sion’, Info. Theo. Work., Italy, Taormina, Oct. 2009, pp. 495-499

[30] Bellman R.: ’Dynamic programming’, (Princeton University Press, 1957)

[31] Bertsekas D.P.: ’Dynamic programming and optimal control’, (Athena scientific, 1995)

[32] Ross S.: ’Introduction to Stochastic Dynamic Programming’, (Academic press, 1983)