
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 10, NO. 8, AUGUST 2011 2497

Opportunistic Access to Spectrum Holes Between Packet Bursts: A Learning-Based Approach

Kae Won Choi, Member, IEEE, and Ekram Hossain, Senior Member, IEEE

Abstract: We present a cognitive radio (CR) mechanism for opportunistic access to the frequency bands licensed to a data-centric primary user (PU) network. Secondary users (SUs) aim to exploit the short-lived spectrum holes (or opportunities) created between packet bursts in the PU network. The PU traffic pattern changes over both time and frequency according to upper layer events in the PU network, and fast variation in PU activity may cause high sensing error probability and low spectrum utilization in dynamic spectrum access. The proposed mechanism learns a PU traffic pattern in real time and uses the acquired information to access the frequency channel in an efficient way while limiting the probability of collision with the PUs below a target limit. To design the channel learning algorithm, we model the CR system as a hidden Markov model (HMM) and present a gradient method to find the underlying PU traffic pattern. We also analyze the identifiability of the proposed HMM to provide a condition for the convergence of the proposed learning algorithm. Simulation results show that the proposed algorithm greatly outperforms the traditional listen-before-talk algorithm, which does not possess any learning functionality.

Index Terms: Cognitive radio, opportunistic spectrum access, energy detection, hidden Markov model (HMM), partially observable Markov decision process (POMDP).

I. INTRODUCTION

THE concept of opportunistic spectrum access (OSA) is motivated by low spectrum utilization of traditional fixed spectrum allocation strategies. In order to make efficient use of precious spectrum resources, OSA allows a secondary user (SU) to exploit the spectrum bands that a primary user (PU) has priority to access, under the condition that the SU does not cause harmful interference to the PU. Without explicit negotiation with the PU, the SU autonomously senses spectrum bands, finds spectrum holes (i.e., spectrum temporarily unused by the PUs), and accesses them by tuning its operating parameters. This process requires an intelligent cognition cycle, and therefore, an SU network is considered as a cognitive radio (CR) network.

In this paper, we propose a CR mechanism for an SU network which shares spectrum bands with a data-centric PU network. In particular, we are interested in exploiting short-lived spectrum opportunities created between packet bursts of a PU network. Experimental studies on potential PU networks (e.g., GSM networks) [1]–[6] have shown that there exist abundant spectrum opportunities between packet bursts. In [1], [2], it was revealed that there are plenty of gaps between consecutive packets in an 802.11b-based WLAN, even when a WLAN continuously uses a channel for packet transmissions. However, exploiting these spectrum opportunities poses significant challenges due to the following two characteristics of a data-centric PU network.

Manuscript received February 2, 2010; revised October 30, 2010 and February 7, 2011; accepted May 21, 2011. The associate editor coordinating the review of this paper and approving it for publication was Q. Zhang.

This work was supported by Natural Sciences and Engineering Research Council (NSERC), Canada.

K. W. Choi is with the Department of Computer Science and Engineering, Seoul National University of Science and Technology, Gongneung 2-dong, Nowon-gu, Seoul, Korea.

E. Hossain is with the Dept. of Electrical and Computer Engineering, University of Manitoba, Canada (e-mail: [email protected]).

Digital Object Identifier 10.1109/TWC.2011.060711.100154

First, the channel usage pattern of PUs changes over time and frequencies according to upper layer events and traffic loads. Therefore, it is very difficult for an SU to have a proper knowledge of the channel usage pattern. Accessing a spectrum without knowing the channel usage pattern potentially leads to harmful interference to PUs and also performance degradation of the SU. In the literature (e.g., in [7]–[13]), the channel usage pattern of PUs was modeled either as a two-state Markov or a semi-Markov chain, and the distributions of the lengths of a spectrum opportunity and a packet burst were assumed to be stationary and known to the SU. However, in a data-centric PU network, an SU may not know the channel usage pattern in advance. Therefore, an SU should estimate the channel usage pattern by using an online learning algorithm.

The second characteristic of a data-centric PU network is that the lengths of spectrum opportunities and packet bursts are very short (e.g., on the order of milliseconds to seconds). This means that an SU has to perform channel sensing very frequently to catch up with the fast variations of PU activity. Since an SU (with a single radio) has to stop data transmission during channel sensing, frequent channel sensing leads to low spectrum utilization [14]. Moreover, the channel sensing time should be much shorter than the average length of a spectrum opportunity. Due to the short channel sensing time, the sensing error probability (i.e., false alarm and misdetection probabilities) tends to be high. Most of the related work in the literature (e.g., in [7]–[13], [15], [16]) assumed perfect sensing (i.e., sensing error probability is zero) and that the channel sensing time is short enough to be neglected. In a practical CR network, an SU needs to be resilient to high sensing error probability while reducing the channel sensing time in an intelligent way.

The above-mentioned problems related to spectrum sharing with a data-centric PU network have not been addressed well in the previous studies in the literature. This motivates us to design a channel sensing and channel access scheme considering the characteristics of a data-centric PU network. The proposed scheme operates on a learning and access cycle where it learns the channel usage pattern and then accesses the channel based on the learned channel usage pattern. These two functionalities are carried out by a channel learning algorithm and a channel access algorithm, respectively. Note that the functionality of channel selection in a multi-channel scenario (i.e., determining the order in which the channels need to be sensed and/or accessed) is out of the scope of the proposed scheme. The optimal frequency channel selection problem was addressed in [7], [8], [16]–[18].

Taking the sensing results obtained by a channel sensing method as inputs, the channel learning algorithm estimates the channel usage pattern in the PU network. To deal with erroneous sensing results, we design this algorithm by using a hidden Markov model (HMM) [19]. Based on a sequence of sensing results, which act as observations in the HMM, the channel usage pattern is calculated iteratively by using the gradient method [20]. This algorithm estimates not only the traffic pattern of PUs but also the signal-to-noise ratio (SNR) corresponding to a PU signal. To show under what condition the channel usage pattern can be estimated, we provide an analysis of the equivalence and the identifiability of the proposed HMM. The channel usage pattern is used by the channel access algorithm for efficient data transmission in the SU network. Although a few algorithms for estimating the PU traffic pattern have been proposed in the literature (e.g., in [15], [21]), they are neither robust to high sensing error probabilities nor able to estimate the SNR of a PU signal. A few works (e.g., [22] and [23]) have modeled a CR system as an HMM. However, these works did not address the problem of parameter estimation from the erroneous sensing results.

Using the channel access algorithm, which is developed based on a partially observable Markov decision process (POMDP) framework [24], an SU transmits data packets while avoiding interference to the PU network. The algorithm adaptively decides whether to perform channel sensing or transmit user data in each time slot to prevent unnecessary sensing.

The main contributions of the paper can be summarized as follows:

∙ We present an optimized OSA scheme for cognitive radios coexisting with a data-centric PU network. With this scheme, an SU can effectively use spectrum opportunities between packet bursts, maximize spectrum utilization, and maintain its data connection even when a spectrum is densely occupied by PUs. The proposed scheme not only detects instantaneous PU activity but also learns the channel usage pattern in the PU network. Based on the estimated channel usage information, the proposed scheme adjusts the parameters for accessing a frequency channel. This learning and access cycle makes it possible for an SU to adapt itself to a time-varying channel usage pattern in the PU network. Also, the proposed scheme is favorable to practical implementation, since it needs very little prior knowledge about the PU network.

∙ The channel learning algorithm is developed by solving the parameter estimation problem in the HMM. This algorithm is resilient to sensing errors and can estimate the SNR of a PU signal, which the existing parameter estimation algorithms for the CR systems are not capable of. We also analyze the identifiability of the proposed HMM and show that the proposed channel learning algorithm can estimate the channel usage pattern under some mild conditions. To our knowledge, the problem of the identifiability of an HMM was not addressed in the existing works on CR systems.

TABLE I
TABLE OF SYMBOLS

| Symbol | Definition |
|---|---|
| $M$ | Number of frequency channels |
| $W$ | Bandwidth of a frequency channel |
| $\lambda$ | Transition rate from state 0 to state 1 in PU traffic model |
| $\mu$ | Transition rate from state 1 to state 0 in PU traffic model |
| $\gamma$ | SNR of a PU signal |
| $\mathbf{u}$ | Channel usage pattern, i.e., $\mathbf{u} := (\lambda, \mu, \gamma)$ |
| $\mathcal{U}$ | Set of possible channel usage patterns |
| $\delta$ | Threshold for energy detection |
| $D(\rho)$ | Probability that an SU detects PU to be active during a slot when the average SNR of PU signal is $\rho$ |
| $N_L$ | Number of slots in a channel learning subframe |
| $N_A$ | Number of slots in a channel access subframe |
| $T$ | Length of a slot |
| $\mathbf{u}_k$ | Channel usage pattern in frame $k$ |
| $\hat{\mathbf{u}}_k$ | Estimate of the channel usage pattern in frame $k$ |
| $\zeta^L_{k,n}$ | Sensing result generated in slot $n$ in the channel learning subframe of frame $k$ |
| $\zeta^A_{k,n}$ | Sensing result generated in slot $n$ in the channel access subframe of frame $k$ |
| $\alpha_n$ | PU activity at time $t = (n-1)T$, when $t = 0$ at the start of a subframe |
| $s_n$ | State of slot $n$, i.e., $s_n := (\alpha_n, \alpha_{n+1})$ |
| $\mathcal{S}$ | State space, i.e., $\mathcal{S} := \{(0,0), (0,1), (1,0), (1,1)\}$ |
| $o_n$ | Observation in slot $n$ |
| $\mathcal{O}$ | Observation space |
| $p^{i,j}_{l,m}$ | State transition probability from $s_n = (l,m)$ to $s_{n+1} = (i,j)$ |
| $r_{i,j}$ | State transition probability from $\alpha_n = i$ to $\alpha_{n+1} = j$ |
| $q^m_{i,j}$ | Observation probability that the observation $o_n$ is $m$ given that the state $s_n$ is $(i,j)$ |
| $a_n$ | Action in slot $n$ |
| $\mathcal{A}$ | Action space, i.e., $\mathcal{A} := \{0, 1\}$ |
| $C$ | Collision probability |
| $C_{\mathrm{lim}}$ | Collision probability limit |
| $R(s, a)$ | Reward for given state $s$ and action $a$ |
| $\boldsymbol{\pi}^n$ | Belief vector for slot $n$, i.e., $\boldsymbol{\pi}^n := (\pi^n_{0,0}, \pi^n_{0,1}, \pi^n_{1,0}, \pi^n_{1,1})$ |
| $\Pi$ | Domain of a belief vector |
| $V^*_n$ | Optimal value function for slot $n$ |
| $\boldsymbol{\beta}^*$ | Optimal policy, i.e., $\boldsymbol{\beta}^* := (\beta^*_1, \ldots, \beta^*_{N_A})$ |
| $\boldsymbol{\beta}^{\mathrm{sub}}$ | Suboptimal policy, i.e., $\boldsymbol{\beta}^{\mathrm{sub}} := (\beta^{\mathrm{sub}}_1, \ldots, \beta^{\mathrm{sub}}_{N_A})$ |

The rest of the paper is organized as follows. Section II describes the system model and assumptions and proposes the OSA scheme for exploiting short spectrum opportunities between packet bursts. The channel learning algorithm is described in Section III. In Section IV, we introduce the channel access algorithm. In Section V, we present representative numerical results. Section VI concludes the paper. A list of the key mathematical symbols used in this paper is shown in Table I.

II. SYSTEM MODEL AND PROPOSED SPECTRUM ACCESS PROTOCOL

A. Network Model

The PU network has a license to use $M$ frequency channels, each of which has a bandwidth of $W$. In Section II-B, we will describe the channel usage model of the PU network. The SU network could be either an ad hoc or an infrastructure-based network. We focus on the operation of a single SU in the SU network. The SU can communicate with other SUs (or the secondary network controller) via one radio transceiver that can be tuned to one of the $M$ frequency channels at a time. The SU can access a frequency channel only when there is no PU activity in that channel. We assume that the SU performs spectrum sensing by means of energy detection. The spectrum sensing model will be described in Section II-C. We will explain the details of the OSA scheme for an SU in Section II-D.

Fig. 1. Two-state Markov model and an example of channel usage pattern. (a) Two-state Markov chain. (b) Time-domain example of channel usage pattern.

B. Primary User Channel Usage Model

We adopt a two-state continuous-time Markov chain (CTMC) to model PU traffic in a channel [7]–[11], [13], [25].¹ Fig. 1 shows the two-state CTMC model in which the states represent PU activity in a channel. The PU activity on a frequency channel alternates between state 1 (i.e., active) and state 0 (i.e., inactive). The lengths of an active period and an inactive period in a channel are exponentially distributed with the average length of $1/\mu$ and $1/\lambda$, respectively, where $\lambda$ and $\mu$ denote PU state transition rates. We also incorporate the SNR of a PU signal, $\gamma$, into the PU channel usage model, since it significantly affects the channel sensing performance. Now, the PU channel usage is completely determined by three parameters $\lambda$, $\mu$, and $\gamma$. We define the “channel usage pattern”, denoted by $\mathbf{u}$, as the vector of these parameters, i.e., $\mathbf{u} := (\lambda, \mu, \gamma)$.
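The two-state model above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function names, the seed, and the choice to draw the initial state from the long-run distribution are illustrative and do not come from the paper. It alternates exponentially distributed inactive periods (mean $1/\lambda$) and active periods (mean $1/\mu$), so the long-run fraction of active time should approach $\lambda/(\lambda+\mu)$.

```python
import random

def stationary_active_fraction(lam, mu):
    """Long-run fraction of time the PU is active in the two-state CTMC."""
    return lam / (lam + mu)

def simulate_pu_activity(lam, mu, horizon, seed=0):
    """Return (start_time, state) segments covering [0, horizon).

    state is 1 (active) or 0 (inactive); names and seed are illustrative.
    """
    rng = random.Random(seed)
    t = 0.0
    # Assumption: draw the initial state from the long-run distribution.
    state = 1 if rng.random() < stationary_active_fraction(lam, mu) else 0
    segments = []
    while t < horizon:
        segments.append((t, state))
        rate = mu if state == 1 else lam  # rate of leaving the current state
        t += rng.expovariate(rate)        # exponential sojourn time
        state = 1 - state
    return segments

def active_time(segments, horizon):
    """Total time spent in state 1 within [0, horizon)."""
    total = 0.0
    for idx, (start, state) in enumerate(segments):
        end = segments[idx + 1][0] if idx + 1 < len(segments) else horizon
        if state == 1:
            total += min(end, horizon) - start
    return total
```

For a long horizon, `active_time(segments, horizon) / horizon` can be compared against `stationary_active_fraction(lam, mu)` as a sanity check of the model.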

Many experimental studies on potential PU networks have shown that traffic characteristics vary over time [1], [2], [4], [6] and frequencies [3], [5]. There can be several reasons for this PU behavior. First, the channel usage pattern can vary according to the configurations of the upper layer protocols. For example, the channel usage pattern is affected by the type of PU application (e.g., voice call, video streaming, file transfer, and web browsing) and its parameter settings (e.g., source rate of video streaming).² PU applications determine the traffic properties such as the packet length and the packet arrival rate, which, in turn, affect the channel usage pattern. Second, the channel usage pattern depends on the traffic load in the PU network, which may vary over time. In [4], [6], it was shown that traffic load in voice-centric cellular networks varies according to the time of the day.

¹ In some works (e.g., [2], [12], [15], [21]), PU traffic was modeled by a two-state semi-Markov process, which is a generalization of the two-state Markov process. In the semi-Markov process, the sojourn time on each state follows an arbitrary distribution (e.g., hyper-Erlang distribution [2]). Although the semi-Markov process provides a more accurate fit for empirical data, the Markov process is a good approximation with mathematical tractability [11].

² For example, in [2], the authors presented the distribution of idle periods experimentally estimated from an IEEE 802.11b-based WLAN with the user datagram protocol (UDP) traffic. It was shown that the distribution of idle periods differs for two different packet arrival rates of 25 packets/s and 100 packets/s.

An SU should track the variation of the channel usage pattern in order to access the channel in an optimal way. We assume that the channel usage pattern is restricted to a certain region $\mathcal{U}$, i.e., $\mathbf{u} \in \mathcal{U}$. Also, it is assumed that the channel usage pattern varies slowly so that an SU can estimate the channel usage pattern by gathering statistical information from a number of packet bursts and spectrum opportunities.

C. Secondary User Energy Detection Model

An SU performs energy detection on a frequency channel for a time duration of $T$. Recall that $W$ denotes the bandwidth of a frequency channel. The energy detector takes $WT$ baseband complex signal samples during an energy detection period. Let $y_i$ denote the $i$th signal sample. Then, we have $y_i = x_i + n_i$, where $x_i$ is a PU signal and $n_i$ is the thermal noise with the noise spectral density of $N_o$. To generate a test statistic, denoted by $\xi$, the energy detector estimates the normalized energy in the signal samples as $\xi = \frac{1}{WTN_o}\sum_{i=1}^{WT} |y_i|^2$. Let $\zeta$ denote the sensing result. To conclude whether the channel is in use or not, the energy detector compares $\xi$ with a given threshold $\delta$. If $\xi > \delta$, the detector concludes that the channel is in use (i.e., $\zeta = 1$). Otherwise, $\zeta = 0$.
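The decision rule above can be written in a few lines. The following is a minimal sketch, not an implementation from the paper: the function name `energy_detect` and its argument names are hypothetical, and the number of samples passed in plays the role of $WT$.

```python
def energy_detect(samples, noise_psd, threshold):
    """Return (xi, zeta) for one energy detection period.

    samples   : list of complex baseband samples y_i (its length stands for W*T)
    noise_psd : noise spectral density N_o
    threshold : detection threshold delta
    """
    num_samples = len(samples)
    # Normalized energy: xi = (1 / (W*T*N_o)) * sum |y_i|^2
    xi = sum(abs(y) ** 2 for y in samples) / (num_samples * noise_psd)
    # zeta = 1 means "channel in use", zeta = 0 means "channel idle"
    zeta = 1 if xi > threshold else 0
    return xi, zeta
```

With unit noise density, four samples of magnitude 1 give a normalized energy of 1, which stays below a threshold of 1.2, whereas four samples of magnitude 2 give a normalized energy of 4 and trigger a detection.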

We need to find the distribution of the test statistic and calculate the detection probability. Let $\rho$ denote the average SNR of a PU signal during an energy detection period, i.e., $\rho := \frac{1}{WTN_o}\sum_{i=1}^{WT} \mathrm{E}[|x_i|^2]$. If the number of signal samples (i.e., $WT$) is sufficiently large, the test statistic $\xi$ follows a normal distribution with mean $(1+\rho)$ and variance $(1+2\rho)/(WT)$ [26]. From the distribution of the test statistic, we can calculate the probability that an SU senses the channel to be active (i.e., $\zeta = 1$) as a function of the average SNR, $\rho$. From [26], we have

$$D(\rho) := \Pr[\xi \geq \delta] = Q\left(\frac{\delta - (1+\rho)}{\sqrt{(1+2\rho)/(WT)}}\right) \tag{1}$$

where $Q$ denotes the Q-function defined as $Q(x) := \frac{1}{\sqrt{2\pi}}\int_x^{\infty} \exp\left(-\frac{u^2}{2}\right) du$.
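Equation (1) is straightforward to evaluate numerically. The sketch below uses the identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the function names are illustrative, and `num_samples` stands for the product $WT$.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), expressed through erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def detection_probability(rho, threshold, num_samples):
    """D(rho) from Eq. (1): probability that xi >= delta at average SNR rho."""
    std = math.sqrt((1.0 + 2.0 * rho) / num_samples)  # std. dev. of xi
    return q_function((threshold - (1.0 + rho)) / std)
```

Setting `rho = 0` gives the false alarm probability $D(0)$, and setting `threshold` equal to `1 + rho` gives exactly 0.5 because the threshold then sits at the mean of the test statistic.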

D. Channel Sensing and Access to Exploit Short-Lived Spectrum Opportunities

For the proposed scheme, time is divided into frames (Fig. 2) which are indexed by $k$. It is assumed that frame synchronization is maintained in the SU network. The length of a frame is short enough so that the channel usage pattern remains unchanged during a frame. A frame is further divided into a channel learning subframe and a channel access subframe.³ An SU estimates the channel usage pattern on the current channel during a channel learning subframe, and based on the estimated channel usage pattern, it exchanges user data with other SUs during a channel access subframe. A channel learning subframe and a channel access subframe consist of $N_L$ slots and $N_A$ slots, respectively. The length of a slot is $T$. We have to set the length of a slot short enough to prevent PU activity from changing multiple times during a slot. An SU senses the channel and produces a sensing result in each slot during the channel learning subframe. On the other hand, an SU either performs sensing or transmits user data during the channel access subframe.

³ We will explain the rationale behind this frame structure in Section III-E.

Fig. 2. Frame structure of the proposed scheme.

Fig. 3. Overall operation of the proposed scheme.

The overall operation of the proposed scheme for an SU is summarized in Fig. 3. From the sensing results obtained during a channel learning subframe of frame $k$, the SU calculates the estimate of the channel usage pattern in frame $k$, denoted by $\hat{\mathbf{u}}_k = (\hat{\lambda}_k, \hat{\mu}_k, \hat{\gamma}_k)$. Then, based on the estimated channel usage pattern, $\hat{\mathbf{u}}_k$, it decides whether to change the channel or not. If the SU judges that there are sufficient spectrum opportunities to support its quality-of-service (QoS) requirements,⁴ it stays on the current channel and exchanges data packets during the following channel access subframe. Otherwise, it switches to another frequency channel in the next frame. The SU can simply switch to the next available frequency channel, or it can use more sophisticated algorithms proposed for the frequency channel selection problem in the literature (e.g., in [7], [8], [16]–[18]).

⁴ For example, the SU can decide that the QoS is supported if the duty cycle, $\hat{\lambda}_k/(\hat{\lambda}_k + \hat{\mu}_k)$, and the SNR, $\hat{\gamma}_k$, exceed their respective thresholds.

During the channel learning subframe in frame $k$, the SU estimates the current channel usage pattern, denoted by $\mathbf{u}_k = (\lambda_k, \mu_k, \gamma_k)$. Each of the $N_L$ slots in the channel learning subframe is indexed by $n = 1, \ldots, N_L$. In each slot, the SU performs energy detection and generates a binary sensing result. Let $\zeta^L_{k,n}$ denote the sensing result generated in slot $n$ in the channel learning subframe of frame $k$. From the sequence of the sensing results, $\boldsymbol{\zeta}^L_k := \{\zeta^L_{k,1}, \ldots, \zeta^L_{k,N_L}\}$, the “channel learning algorithm” in the SU calculates the estimate of the channel usage pattern, $\hat{\mathbf{u}}_k$. In Section III, we will explain the channel learning algorithm in detail.

Let us explain the operation of an SU when it decides to access the current channel during a channel access subframe. Each of the $N_A$ slots in the channel access subframe is indexed by $n = 1, \ldots, N_A$. During a slot of the channel access subframe, the SU can either perform sensing or transmit user data. If it chooses to perform sensing in slot $n$, it obtains $\zeta^A_{k,n}$, which denotes the sensing result generated in slot $n$ in the channel access subframe of frame $k$. Otherwise, it transmits data packet(s) in slot $n$. For each slot $n$ in the channel access subframe, the “channel access algorithm” residing in the SU decides whether to perform sensing or data transmission, based on the sensing results from slot 1 to slot $(n-1)$. The channel access algorithm also utilizes the channel usage pattern estimated in the preceding channel learning subframe. From this information, the channel access algorithm adjusts its parameters so that it can maximize the channel utilization while limiting the interference caused to the PU network to a tolerable level. We will explain the channel access algorithm in Section IV.

III. LEARNING CHANNEL USAGE PATTERN DURING CHANNEL LEARNING SUBFRAME

A. Hidden Markov Model for Channel Learning Subframe

We model a channel learning subframe as an HMM [19]. An HMM is described by state space, state transition probability, observation space, and observation probability. Consider regularly spaced discrete time instants (e.g., beginning of time slots). At any time instant, the system is in one of the states in the countable state space. The evolution of states over time follows a Markov process in accordance with the state transition probability. The state is hidden to the agent and can only be inferred from noisy observations. At each time, the agent receives an observation from the observation space according to the observation probability. For an HMM, the standard gradient method can be used to find the model parameters that are most likely given the received observation sequence [27]. We will use this technique to estimate the channel usage pattern. For more information on HMM, please refer to [19] and [27].

In our system model, the SU (i.e., the agent) obtains noisy sensing results about underlying PU activities. Therefore, PU activities in a channel can be modeled as hidden states, while sensing results are modeled as observations. Then, the state transition probabilities depend on the state transition rates in PU activity (i.e., $\lambda$ and $\mu$), and the observation probabilities are related to the detection probabilities, which in turn are determined mainly by the SNR of a PU signal (i.e., $\gamma$). This means that the state transition and observation probabilities are functions of the channel usage pattern. From the HMM, we can calculate the log-likelihood of the received sensing results, $\boldsymbol{\zeta}^L_k$, given the channel usage pattern, $\mathbf{u}$, that is, $\ln(\Pr[\boldsymbol{\zeta}^L_k \mid \mathbf{u}])$. To find the most likely channel usage pattern for the received sensing results, the SU updates the estimate of the channel usage pattern toward the gradient direction so that $\ln(\Pr[\boldsymbol{\zeta}^L_k \mid \mathbf{u}])$ increases in each iteration. We will explain the details of the algorithm later in this section.

To set up an HMM, we first define states and observations. As seen in Fig. 4, a state is defined for each slot to reflect the PU activities at the start and the end of the slot. Let $t = 0$ at the start of the channel learning subframe. Then, $\alpha_n$ denotes the PU activity at time $t = (n-1)T$ (i.e., at the start of slot $n$ or at the end of slot $(n-1)$). We have $\alpha_n = 1$ if the PU is active at $t = (n-1)T$, and $\alpha_n = 0$ otherwise. The state of slot $n$, which is denoted by $s_n$, is defined as the vector of the PU activities at the start and the end of slot $n$, i.e., $s_n := (\alpha_n, \alpha_{n+1})$. Then, $s_n$ is one of four possible states in the state space $\mathcal{S} := \{(0,0), (0,1), (1,0), (1,1)\}$. If we consider an HMM of length $N$, a sequence of the states is given by $\mathbf{s} := \{s_1, \ldots, s_N\}$. We assume that a slot is short enough so that the PU activity does not change more than once within a slot. Then, if the state is $(0,0)$ or $(1,1)$, the PU stays inactive or active throughout the slot. On the other hand, if the state is $(0,1)$ or $(1,0)$, the PU activity changes once during the slot. The observation in slot $n$, which is from the observation space $\mathcal{O} := \{0, 1\}$, is denoted by $o_n$. The observation $o_n$ is equal to the sensing result from slot $n$. That is, if the current frame is $k$, we have $o_n = \zeta_{k,n}$. Let $\mathbf{o} := \{o_1, \ldots, o_N\}$ be a sequence of the observations.

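Now, we define the state transition and observation probabilities. Let $p^{i,j}_{l,m}$ denote the state transition probability from state $(l,m)$ to state $(i,j)$. That is, $p^{i,j}_{l,m} := \Pr[s_{n+1} = (i,j) \mid s_n = (l,m)]$. Since the PU activity at the end of a slot is the same as that at the start of the next slot, we have $p^{i,j}_{l,m} = 0$ for $m \neq i$. If $m = i$, then $p^{i,j}_{l,m}$ is equal to the probability that $\alpha_{n+1} = j$ given $\alpha_n = i$, i.e., $\Pr[\alpha_{n+1} = j \mid \alpha_n = i]$. Let $r_{i,j}$ denote $\Pr[\alpha_{n+1} = j \mid \alpha_n = i]$. If $\mathbf{u} = (\lambda, \mu, \gamma)$ is the channel usage pattern in the frame of interest, we can calculate $r_{0,0} = e^{-\lambda T}$, $r_{0,1} = 1 - e^{-\lambda T}$, $r_{1,0} = 1 - e^{-\mu T}$, and $r_{1,1} = e^{-\mu T}$. Therefore, we can calculate the state transition probability matrix as

$$\mathbf{p} := \begin{bmatrix} p^{0,0}_{0,0} & p^{0,0}_{0,1} & p^{0,0}_{1,0} & p^{0,0}_{1,1} \\ p^{0,1}_{0,0} & p^{0,1}_{0,1} & p^{0,1}_{1,0} & p^{0,1}_{1,1} \\ p^{1,0}_{0,0} & p^{1,0}_{0,1} & p^{1,0}_{1,0} & p^{1,0}_{1,1} \\ p^{1,1}_{0,0} & p^{1,1}_{0,1} & p^{1,1}_{1,0} & p^{1,1}_{1,1} \end{bmatrix} = \begin{bmatrix} e^{-\lambda T} & 0 & e^{-\lambda T} & 0 \\ 1 - e^{-\lambda T} & 0 & 1 - e^{-\lambda T} & 0 \\ 0 & 1 - e^{-\mu T} & 0 & 1 - e^{-\mu T} \\ 0 & e^{-\mu T} & 0 & e^{-\mu T} \end{bmatrix}. \tag{2}$$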
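The construction of the matrix in (2) can be checked with a short sketch. The function names and the list-of-lists representation are illustrative choices, not part of the paper. Rows index the next state $(i,j)$ and columns index the current state $(l,m)$, so each column should sum to one.

```python
import math

# State order used for both rows and columns of p.
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def activity_transition_probs(lam, mu, slot_len):
    """r[i][j] = Pr[alpha_{n+1} = j | alpha_n = i], as given in the text."""
    stay_idle = math.exp(-lam * slot_len)
    stay_active = math.exp(-mu * slot_len)
    return [[stay_idle, 1.0 - stay_idle],
            [1.0 - stay_active, stay_active]]

def slot_transition_matrix(lam, mu, slot_len):
    """4x4 matrix p of Eq. (2): entry [row][col] is p^{i,j}_{l,m}."""
    r = activity_transition_probs(lam, mu, slot_len)
    # A transition (l, m) -> (i, j) is possible only if m == i,
    # and then it has probability r[i][j].
    return [[r[i][j] if m == i else 0.0 for (_, m) in STATES]
            for (i, j) in STATES]
```

Calling `slot_transition_matrix(lam, mu, slot_len)` for any positive rates reproduces the zero pattern of (2), with the second and fourth columns depending only on $\mu$ and the first and third only on $\lambda$.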

The initial state distribution is denoted by $\boldsymbol{\pi} := (\pi_{0,0}, \pi_{0,1}, \pi_{1,0}, \pi_{1,1})^T$, where $\pi_{i,j} := \Pr[s_1 = (i,j)]$. It is assumed that the initial state distribution is equal to the stationary state distribution. Therefore, we have

$$\boldsymbol{\pi} = \left(\frac{r_{0,0}\, r_{1,0}}{r_{0,1} + r_{1,0}},\ \frac{r_{0,1}\, r_{1,0}}{r_{0,1} + r_{1,0}},\ \frac{r_{1,0}\, r_{0,1}}{r_{0,1} + r_{1,0}},\ \frac{r_{1,1}\, r_{0,1}}{r_{0,1} + r_{1,0}}\right)^T.$$

Fig. 4. State transition in a subframe.

We define $q^m_{i,j}$ as the probability that the observation $o_n$ is $m$ given that the state $s_n$ is $(i,j)$. That is, $q^m_{i,j} := \Pr[o_n = m \mid s_n = (i,j)]$. Recall that $D(\rho)$ is the probability of detecting PU activity during a slot when the average SNR corresponding to a PU signal is $\rho$. If the state is $(0,0)$, the average SNR of a PU signal during the slot is 0, and therefore $q^1_{0,0} = D(0)$. In the case that the state is $(1,1)$, the average SNR during the slot is $\gamma$, since the SU receives a PU signal throughout the slot. Thus, we have $q^1_{1,1} = D(\gamma)$. On the other hand, when the state is $(1,0)$, the PU activity changes from active to inactive at a time point during the slot. If the channel becomes inactive after time $t$ from the start of the slot, the average SNR during the slot is $\gamma t/T$. Also, the probability density function (pdf) of the elapsed time until the PU activity changes is given as $\frac{\mu e^{-\mu t}}{1 - e^{-\mu T}}$. Therefore, we have

$$q^1_{1,0} = \int_0^T \frac{\mu e^{-\mu t}}{1 - e^{-\mu T}}\, D(\gamma t/T)\, dt.$$

We can also calculate

$$q^1_{0,1} = \int_0^T \frac{\lambda e^{-\lambda t}}{1 - e^{-\lambda T}}\, D(\gamma - \gamma t/T)\, dt$$

in a similar way. To simplify the HMM, we introduce $\Upsilon(\gamma) := \frac{1}{T}\int_0^T D(\gamma t/T)\, dt$. When $\lambda$ and $\mu$ are sufficiently small, the conditional pdf of the change time in each integral approaches the uniform density $1/T$ on $[0, T]$, and the substitution $t \to T - t$ turns $D(\gamma - \gamma t/T)$ into $D(\gamma t/T)$. Hence, both $q^1_{1,0}$ and $q^1_{0,1}$ can be well approximated by $\Upsilon(\gamma)$. From this approximation, the observation probability matrix is given as

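$$\mathbf{q} := \begin{bmatrix} q^0_{0,0} & q^0_{0,1} & q^0_{1,0} & q^0_{1,1} \\ q^1_{0,0} & q^1_{0,1} & q^1_{1,0} & q^1_{1,1} \end{bmatrix} = \begin{bmatrix} 1 - D(0) & 1 - \Upsilon(\gamma) & 1 - \Upsilon(\gamma) & 1 - D(\gamma) \\ D(0) & \Upsilon(\gamma) & \Upsilon(\gamma) & D(\gamma) \end{bmatrix}. \tag{3}$$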
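The matrix in (3) can be computed numerically once $\Upsilon(\gamma)$ is evaluated. The sketch below is illustrative only: the midpoint rule, the number of integration steps, and all function names are assumptions made here, and `num_samples` stands for $WT$. With the substitution $v = t/T$, $\Upsilon(\gamma) = \int_0^1 D(\gamma v)\, dv$, so the slot length does not appear.

```python
import math

def detection_probability(rho, threshold, num_samples):
    """D(rho) from Eq. (1), with Q(x) = 0.5 * erfc(x / sqrt(2))."""
    std = math.sqrt((1.0 + 2.0 * rho) / num_samples)
    return 0.5 * math.erfc((threshold - (1.0 + rho)) / (std * math.sqrt(2.0)))

def upsilon(gamma, threshold, num_samples, steps=200):
    """Upsilon(gamma) = integral over v in [0, 1] of D(gamma * v), midpoint rule."""
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) / steps  # midpoint of the k-th subinterval
        total += detection_probability(gamma * v, threshold, num_samples)
    return total / steps

def observation_matrix(gamma, threshold, num_samples):
    """2x4 matrix q of Eq. (3); columns follow (0,0), (0,1), (1,0), (1,1)."""
    d0 = detection_probability(0.0, threshold, num_samples)
    dg = detection_probability(gamma, threshold, num_samples)
    ups = upsilon(gamma, threshold, num_samples)
    detect = [d0, ups, ups, dg]                 # row m = 1: sensing result 1
    return [[1.0 - x for x in detect], detect]  # row m = 0 is the complement
```

Because $D(\rho)$ increases with $\rho$ for a positive threshold, the value of $\Upsilon(\gamma)$ returned here lies between $D(0)$ and $D(\gamma)$, which matches the intuition that a slot with a mid-slot transition is harder to classify than a fully active one.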

Given the HMM defined by the state transition and observation probabilities, the problem at hand is the parameter estimation problem in which the true channel usage pattern is estimated from the received sensing results (i.e., the observation, $\mathbf{o} = \{o_1, \ldots, o_N\}$). The true channel usage pattern is denoted by $\mathbf{u}^* = (\lambda^*, \mu^*, \gamma^*)$.

B. Equivalence, Identifiability, and Consistency of Proposed Hidden Markov Model

The problem of parameter estimation in the proposed HMM is not a trivial problem since the SU can only see the observations, not the underlying states. For example, when the observation changes, the SU does not know whether it is caused by a PU state transition or a channel sensing error. Thus, one can suspect that a high sensing error rate can be misinterpreted as a high PU transition rate, leading to incorrect estimation of the true channel usage pattern. Fortunately, the PU state transition and the channel sensing error induce different statistical characteristics of the observation sequence, and the true channel usage pattern is identifiable from the standpoint of the SU only by imposing some mild conditions.

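Let us explain the equivalence and the identifiability of HMMs. Two HMMs with different parameters, $\mathbf{u}$ and $\tilde{\mathbf{u}}$, are said to be equivalent if and only if they generate the same stochastic observation sequence as

$$\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \Pr[\mathbf{o} = \mathbf{x} \mid \tilde{\mathbf{u}}], \quad \forall N = 1, 2, \ldots, \text{ and } \forall x_n \in \{0, 1\} \text{ for } n = 1, \ldots, N \tag{4}$$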

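where $\mathbf{x} := \{x_1, \ldots, x_N\}$. With a slight abuse of notation, let $\pi_{i,j}(\mathbf{u}) := \Pr[s_1 = (i,j) \mid \mathbf{u}]$, $r_{i,j}(\mathbf{u}) := \Pr[\alpha_{n+1} = j \mid \alpha_n = i, \mathbf{u}]$, and $q^m_{i,j}(\mathbf{u}) := \Pr[o_n = m \mid s_n = (i,j), \mathbf{u}]$ denote the initial, transition, and observation probabilities given the channel usage pattern $\mathbf{u}$. We can calculate

$$\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \sum_{y_1, \ldots, y_{N+1}} \pi_{y_1, y_2}(\mathbf{u}) \prod_{n=2}^{N} r_{y_n, y_{n+1}}(\mathbf{u}) \prod_{n=1}^{N} q^{x_n}_{y_n, y_{n+1}}(\mathbf{u}) \tag{5}$$

where $y_n \in \{0, 1\}$ for all $n$. If two HMMs are equivalent, it is impossible to distinguish these HMMs based on the observations.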
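The sum in (5) runs over $2^{N+1}$ activity paths, but it can be evaluated in time linear in $N$ with a forward recursion over the PU activity $\alpha_n$. The sketch below is an illustration rather than the paper's algorithm: the function names are hypothetical, `r[i][j]` stands for $r_{i,j}$, and `q1[i][j]` stands for $q^1_{i,j}$. It computes (5) both by direct enumeration and by the recursion, using the stationary initial distribution $\pi_{i,j} = \Pr[\alpha_1 = i]\, r_{i,j}$ stated in the text.

```python
from itertools import product

def stationary_activity(r):
    """Stationary distribution of alpha_n for the 2x2 transition matrix r."""
    denom = r[0][1] + r[1][0]
    return [r[1][0] / denom, r[0][1] / denom]

def obs_prob(q1, i, j, x):
    """q^x_{i,j}: probability of observing x in a slot whose state is (i, j)."""
    return q1[i][j] if x == 1 else 1.0 - q1[i][j]

def likelihood_bruteforce(x, r, q1):
    """Direct evaluation of Eq. (5): sum over all paths y_1, ..., y_{N+1}."""
    n = len(x)
    pi_alpha = stationary_activity(r)
    total = 0.0
    for y in product((0, 1), repeat=n + 1):
        term = pi_alpha[y[0]] * r[y[0]][y[1]]   # pi_{y1,y2}
        for k in range(1, n):
            term *= r[y[k]][y[k + 1]]           # r_{y_n,y_{n+1}}, n = 2..N
        for k in range(n):
            term *= obs_prob(q1, y[k], y[k + 1], x[k])
        total += term
    return total

def likelihood_forward(x, r, q1):
    """Same quantity in O(N): propagate one weight per value of alpha_n."""
    weights = stationary_activity(r)            # weights[i] = Pr[alpha_1 = i]
    for obs in x:
        weights = [sum(weights[i] * r[i][j] * obs_prob(q1, i, j, obs)
                       for i in (0, 1))
                   for j in (0, 1)]
    return sum(weights)
```

The two functions should agree up to floating-point rounding for any observation sequence, and summing either one over all binary sequences of a fixed length should give one.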

To test the equivalence of two HMMs, we can apply the algorithm proposed in [28] for the aggregated Markov process (AMP). The AMP is a class of the HMM where an observation is a deterministic function of a state. Our HMM can be converted to an AMP. Different from the state of an HMM, the state of the corresponding AMP is a vector composed of a sensing result and a PU state, that is, $s_n = (o_n, \alpha_{n+1})$. The transition probability matrix of an AMP is a 4-by-4 matrix such that

$$\mathbf{h} := \begin{bmatrix} \mathbf{h}_0 & \mathbf{h}_0 \\ \mathbf{h}_1 & \mathbf{h}_1 \end{bmatrix}, \quad \text{where } \mathbf{h}_m := \begin{bmatrix} r_{0,0}\, q^m_{0,0} & r_{1,0}\, q^m_{1,0} \\ r_{0,1}\, q^m_{0,1} & r_{1,1}\, q^m_{1,1} \end{bmatrix}, \quad \text{for } m = 0, 1. \tag{6}$$

The initial state distribution is equal to the stationary state distribution. Let $f$ denote the deterministic function mapping the state to the observation. We have $f((0,0)) = 0$, $f((0,1)) = 0$, $f((1,0)) = 1$, and $f((1,1)) = 1$. We can easily verify that this AMP is exactly the same as the original HMM. The following theorem states the condition for two AMPs to be equivalent.

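Theorem 1 (Equivalence of two AMPs). The AMP with the transition probability matrix $\mathbf{h}$ is equivalent to the AMP with the transition probability matrix $\tilde{\mathbf{h}}$ if and only if the following conditions are met.

∙ If $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau} = 0$ and $\mathbf{1}^T \tilde{\mathbf{h}}_0 \boldsymbol{\tau} = 0$, the following equality holds: $\mathbf{1}^T \mathbf{h}_0 = \mathbf{1}^T \tilde{\mathbf{h}}_0$.

∙ Otherwise, there exists a 2-by-2 matrix $\mathbf{X}$ such that $\mathbf{1}^T \mathbf{X} = \mathbf{1}^T$, $\mathbf{X}\mathbf{h}_0 = \tilde{\mathbf{h}}_0 \mathbf{X}$, and $\mathbf{X}\mathbf{h}_1 = \tilde{\mathbf{h}}_1 \mathbf{X}$,

where $\boldsymbol{\tau} = (1, -1)^T$ and $\mathbf{1}$ is a column vector of all ones.

Proof: See Appendix A for the proof.

An HMM with the true parameter $\mathbf{u}^* \in \mathcal{U}$ is said to be identifiable if and only if for all $\mathbf{u} \in \mathcal{U}$ such that $\mathbf{u} \neq \mathbf{u}^*$, the HMM with the parameter $\mathbf{u}$ is not equivalent to the HMM with the true parameter $\mathbf{u}^*$. We can estimate the true parameter of an HMM from the observations only if the HMM is identifiable. In the following theorem, we provide a condition for the AMP corresponding to an HMM to be identifiable.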

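Theorem 2 (Identifiability of an AMP). The AMP with the transition probability matrix $\mathbf{h}$ is identifiable if $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau} \neq 0$ and there does not exist any 2-by-2 matrix $\mathbf{X} \neq \mathbf{I}$ and $\gamma \geq 0$ that satisfies

$$\mathbf{1}^T \mathbf{X} = \mathbf{1}^T \quad \text{and} \quad \mathbf{F}(\gamma) \circ (\mathbf{X}\mathbf{h}_0\mathbf{X}^{-1}) = \mathbf{G}(\gamma) \circ (\mathbf{X}\mathbf{h}_1\mathbf{X}^{-1}) \tag{7}$$

where $\mathbf{I}$ is the identity matrix, the notation $\circ$ is the entrywise (Hadamard) product, and $\mathbf{F}(\gamma)$ and $\mathbf{G}(\gamma)$ are 2-by-2 matrices such that

$$\mathbf{F}(\gamma) := \begin{bmatrix} D(0) & \Upsilon(\gamma) \\ \Upsilon(\gamma) & D(\gamma) \end{bmatrix} \quad \text{and} \quad \mathbf{G}(\gamma) := \begin{bmatrix} 1 - D(0) & 1 - \Upsilon(\gamma) \\ 1 - \Upsilon(\gamma) & 1 - D(\gamma) \end{bmatrix}. \tag{8}$$

Proof: See Appendix B for the proof.

Roughly speaking, $\mathbf{X}$ and $\gamma$ satisfying the condition in (7) do not exist in general, since the condition involves five variables (i.e., $\gamma$ and four entries in $\mathbf{X}$) while there are six equations. Although it is hard to make a more precise statement, we can say that the proposed HMM is identifiable in most cases if $\mathbf{1}^T \mathbf{h}_0 \boldsymbol{\tau} \neq 0$ is satisfied.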

As long as an HMM is identifiable, the maximum likelihood (ML) estimation can find the true channel usage pattern. Let us define $\Xi(\mathbf{o};\mathbf{u}) := \ln(\Pr[\mathbf{o} \mid \mathbf{u}])$ as the log-likelihood of the observation $\mathbf{o}$ given the channel usage pattern $\mathbf{u}$. The ML estimator of the true channel usage pattern $\mathbf{u}^*$ is obtained from

$$\hat{\mathbf{u}} = \arg\max_{\mathbf{u} \in \mathcal{U}} \Xi(\mathbf{o};\mathbf{u}). \tag{9}$$

The ML estimator $\hat{\mathbf{u}}$ of $\mathbf{u}^*$ is said to be strongly consistent when $\hat{\mathbf{u}}$ almost surely converges to $\mathbf{u}^*$ as the length of observations, $N$, goes to infinity. In [29], it was proven that the strong consistency holds if an HMM with the true parameter $\mathbf{u}^*$ is identifiable. In our problem, the strong consistency means that the ML estimator in (9) can estimate the true channel usage pattern $\mathbf{u}^*$ in $\mathcal{U}$ if the length of the channel learning subframe is long enough.

C. Gradient Method for Maximum Likelihood Estimation of Channel Usage Pattern

For the given observation, the ML estimator in (9) can be found by using either the expectation-maximization (EM) algorithm or the standard gradient method [19]. In this paper, we adopt the gradient method since the EM algorithm can only be used in case of the usual parametrization and the gradient method can easily be modified so that it recursively updates the parameter. Unfortunately, the gradient method as well as the EM algorithm can only find a local optimal point since $\Xi(\mathbf{o};\mathbf{u})$ is not a concave function. Algorithms that globally maximize the log-likelihood function of a general HMM are not known yet [27].

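In each iteration, the gradient method updates the estimate of the channel usage pattern toward the gradient direction of the log-likelihood function $\Xi(\mathbf{o};\mathbf{u})$. Let $\hat{\mathbf{u}}^{(j)}$ denote the estimate of the channel usage pattern at the $j$th iteration. The initial estimate $\hat{\mathbf{u}}^{(0)}$ can be set to an arbitrary channel usage pattern in $\mathcal{U}$. At the $j$th iteration, the gradient method updates the estimate as follows:

$$\hat{\mathbf{u}}^{(j)} = \Theta_{\mathcal{U}}\big[\hat{\mathbf{u}}^{(j-1)} + \sigma^{(j)} \cdot \nabla\Xi(\mathbf{o};\hat{\mathbf{u}}^{(j-1)})\big] \tag{10}$$

where $\sigma^{(j)}$ is a step size, $\Theta_{\mathcal{U}}[\cdot]$ is the projection onto the set $\mathcal{U}$, and $\nabla\Xi(\mathbf{o};\mathbf{u})$ is the gradient of $\Xi(\mathbf{o};\mathbf{u})$ such that

$$\nabla\Xi(\mathbf{o};\mathbf{u}) := \left(\frac{\partial\Xi}{\partial\lambda}(\mathbf{o};\mathbf{u}),\ \frac{\partial\Xi}{\partial\mu}(\mathbf{o};\mathbf{u}),\ \frac{\partial\Xi}{\partial\gamma}(\mathbf{o};\mathbf{u})\right). \tag{11}$$

The iteration stops when $\hat{\mathbf{u}}^{(j)}$ sufficiently converges to a certain channel usage pattern.

The gradient in (11) can be derived by calculating the partial derivatives of $\Xi(\mathbf{o};\mathbf{u})$ with respect to $\lambda$, $\mu$, and $\gamma$. In Appendix C, we calculate the partial derivatives. We can calculate $\phi(\mathbf{o};\mathbf{u})$, $\omega_i(\mathbf{o};\mathbf{u})$, $\chi_{i,j}(\mathbf{o};\mathbf{u})$, and $\psi^m_{i,j}(\mathbf{o};\mathbf{u})$ by using the forward-backward method in [19].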
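The update in (10) has the generic shape of projected gradient ascent. The following minimal sketch shows only that shape: it assumes a box-shaped feasible set (so that the projection is componentwise clipping) and uses a toy concave objective in place of the log-likelihood. The names `project_box`, `projected_gradient_ascent`, and `toy_grad` are hypothetical and are not taken from the paper.

```python
def project_box(u, lower, upper):
    """Projection onto a box-shaped feasible set (componentwise clipping)."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(u, lower, upper)]


def projected_gradient_ascent(grad, u0, lower, upper, step, tol=1e-10, max_iter=10000):
    """Iterate u <- Proj[u + step * grad(u)] as in (10) until the update is small."""
    u = project_box(u0, lower, upper)
    for _ in range(max_iter):
        g = grad(u)
        u_new = project_box([x + step * gx for x, gx in zip(u, g)], lower, upper)
        if max(abs(a - c) for a, c in zip(u_new, u)) < tol:
            return u_new
        u = u_new
    return u


# Toy concave objective standing in for the log-likelihood (an assumption):
# f(u) = -(u[0] - 0.3)^2 - (u[1] - 2.0)^2, with unconstrained maximizer (0.3, 2.0).
def toy_grad(u):
    return [-2.0 * (u[0] - 0.3), -2.0 * (u[1] - 2.0)]


u_hat = projected_gradient_ascent(toy_grad, [0.9, 0.1], [0.0, 0.0], [1.0, 1.0], step=0.1)
```

In this toy case the first coordinate settles at its interior maximizer, while the second is held at the upper bound by the projection, which is the role $\Theta_{\mathcal{U}}$ plays when the unconstrained step would leave $\mathcal{U}$.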

D. Recursive Algorithm for Maximum Likelihood Estimation

The above-mentioned gradient method has to update the channel usage pattern multiple times within a frame, which can be computationally complex. To reduce the complexity, we can alternatively adopt the recursive algorithm [20]. The recursive algorithm updates the estimate of the channel usage pattern only once in each frame $k$ on the basis of its sensing result $\boldsymbol{\zeta}^L_k$. Over multiple frames, the estimate gradually converges to the true channel usage pattern. If $\hat{\mathbf{u}}_k$ denotes the estimate of the channel usage pattern in frame $k$, the recursive algorithm updates the estimate as

$$\hat{\mathbf{u}}_k = \Theta_{\mathcal{U}}\big[\hat{\mathbf{u}}_{k-1} + \sigma_k \cdot \nabla\Xi(\boldsymbol{\zeta}^L_k;\hat{\mathbf{u}}_{k-1})\big] \tag{12}$$

where $\sigma_k$ is the step size for frame $k$.

The recursive algorithm minimizes the following Kullback-Leibler divergence [20]:

$$K(\mathbf{u}) = \mathrm{E}_{\mathbf{u}^*}\left[\ln\frac{\Pr[\mathbf{o} \mid \mathbf{u}^*]}{\Pr[\mathbf{o} \mid \mathbf{u}]}\right]. \tag{13}$$

If the HMM with the true parameter $\mathbf{u}^*$ is identifiable, the Kullback-Leibler divergence has a unique minimizer at $\mathbf{u}^*$. In addition, $-\nabla\Xi(\boldsymbol{\zeta}^L_k;\hat{\mathbf{u}}_{k-1})$ in (12) is the stochastic gradient of the Kullback-Leibler divergence. Therefore, the recursive algorithm in (12) can estimate the true channel usage pattern by minimizing the Kullback-Leibler divergence. Similar to the gradient method in (10) for the ML estimator, the recursive algorithm can only find a local minimum since the Kullback-Leibler divergence is generally not a convex function. However, if the initial estimate is close enough to $\mathbf{u}^*$, we can say that $\hat{\mathbf{u}}_k$ converges to $\mathbf{u}^*$ with high probability.
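The link between (12) and (13) can be written out in one line. The following is a sketch, assuming that the gradient and the expectation can be interchanged and that $\mathbf{o}$ stands for the sensing result of one channel learning subframe drawn under the true pattern $\mathbf{u}^*$:

```latex
\nabla K(\mathbf{u})
  = \nabla\,\mathrm{E}_{\mathbf{u}^*}\!\big[\ln\Pr[\mathbf{o}\mid\mathbf{u}^*] - \ln\Pr[\mathbf{o}\mid\mathbf{u}]\big]
  = -\,\mathrm{E}_{\mathbf{u}^*}\!\big[\nabla\Xi(\mathbf{o};\mathbf{u})\big].
```

The first term inside the expectation does not depend on $\mathbf{u}$, so only the log-likelihood term survives. Under these assumptions, the per-frame quantity $-\nabla\Xi(\boldsymbol{\zeta}^L_k;\mathbf{u})$ has expectation $\nabla K(\mathbf{u})$, and the update in (12) can be read as a projected stochastic gradient descent step on $K(\mathbf{u})$.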

E. Rationale Behind the Proposed Frame Structure

In the proposed frame structure, we have assigned the channel learning subframe dedicated to the estimation of the channel usage pattern, instead of just embedding the estimation algorithm in the traditional listen-before-talk policy and making use of the sensing results generated for data transmission. In this section, we will explain the advantages of the proposed structure over the latter strategy.

We can easily adapt the proposed HMM (AMP) so that it can also be applied to the listen-before-talk policy. The listen-before-talk policy senses the channel every $J$ slots and uses the rest of the slots for data transmission. Without loss of generality, sensing slot $n$ starts at time $t = (n-1)JT$ and ends at time $t = (n-1)JT + T$. Let $\alpha_{n+1}$ denote the PU activity at time $t = (n-1)JT + T$ and let $o_n$ denote the sensing result from sensing slot $n$. Then, we can define the transition probability $r_{i,j}$ and the observation probability $q^m_{i,j}$ in the same way as the original HMM.

We will show that the estimation of the channel usage pattern becomes more difficult as $J$ increases. As $J$ increases, the PU activity $\alpha_{n+1}$ becomes less dependent upon the previous PU activity $\alpha_n$. Therefore, the transition probability $r_{i,j}$ converges to the stationary probability as $J$ goes to infinity. That is, $r_{i,0} \to \mu/(\lambda+\mu)$ and $r_{i,1} \to \lambda/(\lambda+\mu)$ for $i = 0, 1$ as $J \to \infty$. Similarly, the observation probability also converges as $q^m_{1,j} - q^m_{0,j} \to 0$ for $j = 0, 1$ and $m = 0, 1$ as $J \to \infty$. From (6), we can see that $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \to 0$ as $J \to \infty$. Recall that, according to Theorem 2, an AMP is unidentifiable if $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} = 0$. Therefore, we can say that an AMP becomes less identifiable as $J$ increases. Roughly speaking, this is because, when $J$ is large, the transition in PU activity looks similar to the sensing error due to statistical independence between the PU activities at consecutive sensing slots.

From this observation, we can conclude that the proposed channel learning subframe (i.e., $J = 1$) performs better than the estimation algorithm used in the listen-before-talk policy (i.e., $J > 1$) and is capable of estimating the channel usage pattern with high transition rates.

IV. DATA TRANSMISSION DURING CHANNEL ACCESS SUBFRAME

A. Partially Observable Markov Decision Process Model for Channel Access Subframe

During a channel access subframe, the SU exploits spectrum opportunities to transmit its own data. The channel access algorithm is responsible for transmitting user data while limiting the probability of collision with a PU. This algorithm should be able to cope with sensing errors. At the same time, it should reduce the time wasted on channel sensing as much as possible to maximize channel utilization. The proposed algorithm adopts a strategy different from the traditional listen-before-talk policy. First, the algorithm combines the most recent sensing result with previous sensing results to extract reliable information from erroneous sensing results. Second, the algorithm adaptively decides whether to perform sensing or transmit user data in each time slot to prevent unnecessary sensing [30]. We devise an algorithm that accomplishes these tasks by using a POMDP framework [24]. In addition, the channel access algorithm should have correct knowledge of the current channel usage pattern of the PU so that it can properly configure the parameters for channel access. Therefore, the algorithm makes use of the channel usage pattern estimated in the preceding channel learning subframe.

To design the channel access algorithm, we model the channel access subframe as a POMDP [24], [31]. In a POMDP model, similar to an HMM, the agent only receives probabilistic observations, while the states are hidden to the agent. However, unlike in an HMM, the agent does not only receive observations in a passive manner, but also takes actions to exert influence on the system. The action taken by the agent affects state transition and observation probabilities. Moreover, the agent acquires a reward according to the action. At each time point, the agent takes into account the observations received until then to choose the action that is expected to return the maximum reward. In our model, the agent (i.e., the SU) chooses an action between sensing and data transmission. A reward value depends on whether data transmission is successful or results in collision with PU traffic.

We need to define the states, the actions, and the observations for our model. The definition of a state is the same as that in the HMM. Thus, $\mathbf{s}_n$ denotes the state of slot $n$ during the channel access subframe, which represents the PU activity at the start and the end of slot $n$. Let $a_n$ denote the action in slot $n$. If the SU opts to transmit data in slot $n$, we have $a_n = 1$; if it chooses to sense during slot $n$, we have $a_n = 0$. We define $\mathcal{A} := \{0, 1\}$ as the action space. The observation is also similar to that in the HMM, except for the case that the SU does not perform sensing because it is transmitting data. If the SU performs sensing during slot $n$, i.e., if $a_n = 0$, the observation (i.e., $o_n$) is equal to the sensing result, $\zeta^A_{k,n}$. For slot $n$ with $a_n = 1$, the observation $o_n$ is a null observation, $\emptyset$. Hence, the observation space for a channel access subframe is $\mathcal{O} := \{\emptyset, 0, 1\}$.

The state transition and observation probabilities are calculated from the channel usage pattern estimated in the channel learning subframe. In our model, an action does not affect the state transition probabilities. The state transition probabilities in the POMDP are the same as those in the HMM. That is, we use $p^{i,j}_{l,m}$ to denote the state transition probability from state $(l,m)$ to state $(i,j)$, and calculate it from the state transition probability matrix (2) by substituting $\lambda$ and $\mu$ with $\hat{\lambda}_k$ and $\hat{\mu}_k$, respectively. Different from the HMM, the observation probabilities in the POMDP depend on an action, since the SU receives a null observation when it selects to transmit data. Let $q^m_{i,j}(a)$ denote the observation probability such that $o_n = m$ given $\mathbf{s}_n = (i,j)$ and $a_n = a$, i.e., $q^m_{i,j}(a) := \Pr[o_n = m \mid \mathbf{s}_n = (i,j), a_n = a]$. If the action is sensing, i.e., if $a = 0$, the observation probability $q^m_{i,j}(a)$ is equal to $q^m_{i,j}$ of the HMM for $(i,j) \in \mathcal{S}$ and $m = 0, 1$. Therefore, these observation probabilities can be derived from the observation probability matrix (3) by using the estimate of the channel usage pattern, $\hat{\mathbf{u}}_k$. In addition, we have $q^{\emptyset}_{i,j}(0) = 0$, $q^{\emptyset}_{i,j}(1) = 1$, $q^0_{i,j}(1) = 0$, and $q^1_{i,j}(1) = 0$.

Let us explain the reward model. First, we define two performance measures: channel utilization and collision probability. The channel utilization is defined as the probability of successful data transmission. Data transmission is successful in the case that the SU transmits data (i.e., $a_n = 1$) in a slot during which there is no PU activity (i.e., $\mathbf{s}_n = (0,0)$). Then, the channel utilization is $\sum_{n=1}^{N_A}\Pr[\mathbf{s}_n = (0,0), a_n = 1]/N_A$. We define the collision probability as the probability that the PU is active (i.e., $\mathbf{s}_n \neq (0,0)$) when the SU attempts to transmit data (i.e., $a_n = 1$). Formally, the collision probability is defined as $C := \big(\sum_{n=1}^{N_A}\Pr[\mathbf{s}_n \neq (0,0), a_n = 1]\big)/\big(\sum_{n=1}^{N_A}\Pr[a_n = 1]\big)$.

We maximize the channel utilization while limiting the collision probability as follows:

$$\begin{aligned} \max\quad & \frac{\sum_{n=1}^{N_A}\Pr[\mathbf{s}_n = (0,0), a_n = 1]}{N_A} \\ \text{s. t.}\quad & C = \frac{\sum_{n=1}^{N_A}\Pr[\mathbf{s}_n \neq (0,0), a_n = 1]}{\sum_{n=1}^{N_A}\Pr[a_n = 1]} \leq C_{\lim} \end{aligned} \tag{14}$$

where $C_{\lim}$ denotes the collision probability limit. We relax the constraint by applying the Lagrange multiplier $\nu$ to the constraint. Then, the optimization problem reduces to

$$\max \sum_{n=1}^{N_A}\mathrm{E}[R(\mathbf{s}_n, a_n)] \tag{15}$$

where $R(\mathbf{s}, a)$ is the reward for given state $\mathbf{s}$ and action $a$, such that

$$R(\mathbf{s}, a) = \begin{cases} \nu \cdot C_{\lim} + 1/N_A, & \text{if } \mathbf{s} = (0,0) \text{ and } a = 1 \\ \nu \cdot C_{\lim} - \nu, & \text{if } \mathbf{s} \neq (0,0) \text{ and } a = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{16}$$
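The step from (14) to the reward in (16) can be spelled out as follows. This is a sketch that assumes the constraint in (14) is first multiplied through by its denominator and that the multiplier $\nu \geq 0$ is held fixed; it uses $\Pr[a_n = 1] = \Pr[\mathbf{s}_n = (0,0), a_n = 1] + \Pr[\mathbf{s}_n \neq (0,0), a_n = 1]$.

```latex
\begin{aligned}
&\frac{1}{N_A}\sum_{n=1}^{N_A}\Pr[\mathbf{s}_n=(0,0),a_n=1]
 -\nu\Big(\sum_{n=1}^{N_A}\Pr[\mathbf{s}_n\neq(0,0),a_n=1]
 -C_{\lim}\sum_{n=1}^{N_A}\Pr[a_n=1]\Big)\\
&\quad=\sum_{n=1}^{N_A}\Big[\Big(\nu C_{\lim}+\frac{1}{N_A}\Big)\Pr[\mathbf{s}_n=(0,0),a_n=1]
 +(\nu C_{\lim}-\nu)\Pr[\mathbf{s}_n\neq(0,0),a_n=1]\Big]\\
&\quad=\sum_{n=1}^{N_A}\mathrm{E}[R(\mathbf{s}_n,a_n)].
\end{aligned}
```

The two bracketed coefficients are exactly the two nonzero cases of $R(\mathbf{s}, a)$, and sensing ($a = 0$) contributes nothing to either sum.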

B. Channel Access Algorithm

We now design the channel access algorithm that selects an action in each slot in order to maximize the objective function in (15). To decide an action for slot $n$, the algorithm considers the observations obtained until slot $n$, i.e., $o_1, \ldots, o_{n-1}$. Instead of directly using the observations, the algorithm calculates the belief vector and uses it to decide an action. It is known that the belief vector summarizes all the necessary information required to make an optimal decision [31]. Let $\boldsymbol{\pi}^n := (\pi^n_{0,0}, \pi^n_{0,1}, \pi^n_{1,0}, \pi^n_{1,1})$ denote the belief vector for slot $n$. In the belief vector, $\pi^n_{i,j}$ represents the belief that the state in slot $n$ is $(i,j)$ given $a_1, \ldots, a_{n-1}$ and $o_1, \ldots, o_{n-1}$. That is, $\pi^n_{i,j} := \Pr[\mathbf{s}_n = (i,j) \mid \boldsymbol{\pi}^1, a_1, \ldots, a_{n-1}, o_1, \ldots, o_{n-1}]$. Let $\Pi$ denote the domain of a belief vector, i.e., $\Pi := \{(\pi_{i,j})_{(i,j)\in\mathcal{S}} \mid \sum_{(i,j)\in\mathcal{S}}\pi_{i,j} \leq 1 \text{ and } \pi_{i,j} \geq 0 \text{ for } (i,j) \in \mathcal{S}\}$. The initial belief vector $\boldsymbol{\pi}^1$ is the stationary distribution of the hidden process. The belief vector in slot $n$ is updated from the belief vector in slot $(n-1)$ as follows:

$$\pi^n_{i,j} = \eta_{i,j}(\boldsymbol{\pi}^{n-1}; a_{n-1}, o_{n-1}), \quad \text{for } (i,j) \in \mathcal{S} \tag{17}$$

where

$$\eta_{i,j}(\boldsymbol{\pi}; a, o) = \frac{\sum_{(l,m)\in\mathcal{S}} p^{i,j}_{l,m} \cdot q^o_{l,m}(a) \cdot \pi_{l,m}}{\theta(\boldsymbol{\pi}; a, o)} \tag{18}$$

and

$$\theta(\boldsymbol{\pi}; a, o) = \sum_{(i,j)\in\mathcal{S}}\ \sum_{(l,m)\in\mathcal{S}} p^{i,j}_{l,m} \cdot q^o_{l,m}(a) \cdot \pi_{l,m}. \tag{19}$$

Note that the update of the belief vector is slightly different from the one in [31], since only the observations up to the previous slot are available.
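The belief update in (17)-(19) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the transition probability $p^{i,j}_{l,m}$ is assumed to equal $r_{i,j}$ when $i = m$ and zero otherwise (matrix (2) is not reproduced here), and all numeric values, including the sensing probabilities standing in for $D(0)$, $\Upsilon(\gamma)$, and $D(\gamma)$, are hypothetical.

```python
import math

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]   # s_n = (alpha_n, alpha_{n+1})
NULL = None                                  # null observation when transmitting


def make_model(lam, mu, T, D0, ups, Dg):
    """Build r[i][j] and the sensing probabilities q[m][(i, j)] from assumed inputs."""
    r = [[math.exp(-lam * T), 1 - math.exp(-lam * T)],
         [1 - math.exp(-mu * T), math.exp(-mu * T)]]
    q1 = {(0, 0): D0, (0, 1): ups, (1, 0): ups, (1, 1): Dg}
    q = {1: q1, 0: {s: 1 - q1[s] for s in STATES}}
    return r, q


def trans(r, s_from, s_to):
    """p^{i,j}_{l,m}: assumed to be r[i][j] when i == m (consecutive slots chain), else 0."""
    (l, m), (i, j) = s_from, s_to
    return r[i][j] if i == m else 0.0


def obs_prob(q, o, s, a):
    """q^o_{i,j}(a): sensing (a = 0) yields 0 or 1; transmitting (a = 1) yields NULL."""
    if a == 1:
        return 1.0 if o is NULL else 0.0
    return 0.0 if o is NULL else q[o][s]


def belief_update(pi, a, o, r, q):
    """Equations (17)-(19): weight by the observation, propagate, then normalize."""
    unnorm = {}
    for s_to in STATES:
        unnorm[s_to] = sum(trans(r, s, s_to) * obs_prob(q, o, s, a) * pi[s] for s in STATES)
    theta = sum(unnorm.values())             # (19): probability of observing o
    return {s: v / theta for s, v in unnorm.items()}


r, q = make_model(lam=200.0, mu=200.0, T=20e-6, D0=0.1, ups=0.5, Dg=0.8)
b = [0.5, 0.5]                               # stationary PU distribution for lam == mu
pi1 = {(i, j): b[i] * r[i][j] for (i, j) in STATES}
pi_idle = belief_update(pi1, a=0, o=0, r=r, q=q)      # sensing result "idle"
pi_busy = belief_update(pi1, a=0, o=1, r=r, q=q)      # sensing result "busy"
pi_null = belief_update(pi1, a=1, o=NULL, r=r, q=q)   # transmission, no sensing
```

With these assumed values, an "idle" sensing result raises the belief of state $(0,0)$ and a "busy" result lowers it, while a null observation leaves the stationary belief unchanged because the update then reduces to a pure prediction step.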

The channel access algorithm selects an action according to a policy. Let $\boldsymbol{\beta} := (\beta_1, \ldots, \beta_{N_A})$ denote a policy. A policy in slot $n$, i.e., $\beta_n : \Pi \to \mathcal{A}$, is a mapping of a belief vector $\boldsymbol{\pi}^n$ to an action $a_n$. In slot $n$, the channel access algorithm chooses $\beta_n(\boldsymbol{\pi}^n)$ as an action. Among the policies, we define the optimal policy $\boldsymbol{\beta}^* := (\beta^*_1, \ldots, \beta^*_{N_A})$ as the one that maximizes the objective function in (15). To derive the optimal policy, we define the optimal value function $V^*_n : \Pi \to \mathbb{R}$ as the maximum expected reward that will be earned from slot $n$ for the current belief vector. The optimal value function can be found by the following dynamic programming recursion [31]:

$$V^*_{N_A}(\boldsymbol{\pi}) = \max_{a\in\mathcal{A}}\Big\{\sum_{(i,j)\in\mathcal{S}}\pi_{i,j}R((i,j), a)\Big\} \tag{20}$$

$$V^*_n(\boldsymbol{\pi}) = \max_{a\in\mathcal{A}}\Big\{\sum_{(i,j)\in\mathcal{S}}\pi_{i,j}R((i,j), a) + \sum_{o\in\mathcal{O}}\theta(\boldsymbol{\pi}; a, o)\cdot V^*_{n+1}(\boldsymbol{\eta}(\boldsymbol{\pi}; a, o))\Big\} \tag{21}$$

where $\boldsymbol{\eta}(\boldsymbol{\pi}; a, o) := (\eta_{i,j}(\boldsymbol{\pi}; a, o))_{(i,j)\in\mathcal{S}}$. The optimal policy $\boldsymbol{\beta}^*$ is a policy such that $\beta^*_n$ for each $n$ maps a belief vector to a maximizing argument in (20) and (21).

Although we can calculate the optimal policy from (20) and (21), the complexity of the dynamic programming in an uncountable set can be prohibitive [31]. Moreover, we should also find the Lagrange multiplier $\nu$ that makes the collision probability constraint in (14) satisfied, which requires a high complexity iterative algorithm such as the subgradient method. To overcome this difficulty, we suggest a simple stationary suboptimal policy that exhibits a near-optimal performance in terms of channel utilization while restricting the collision probability within the collision probability limit $C_{\lim}$. The suboptimal policy is

$$\beta^{\mathrm{sub}}_n(\boldsymbol{\pi}) = \begin{cases} 1, & 1 - \pi_{0,0} \leq C_{\lim} \\ 0, & \text{otherwise} \end{cases} \qquad \forall n = 1, \ldots, N_A. \tag{22}$$

In Appendix D, we prove that this suboptimal policy satisfies the collision probability constraint. Also, in Section V, it is shown by using simulations that the suboptimal policy achieves a near-optimal performance. In Fig. 5, we summarize the operation of the channel access algorithm when the suboptimal policy is applied.
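The threshold rule in (22) can be exercised with a short loop. The sketch below is a simplified illustration rather than the authors' code: instead of the four-state belief vector it tracks $w = \Pr[\alpha_n = 0 \mid \text{history}]$, from which $\pi^n_{0,0} = w \cdot r_{0,0}$ follows under the assumption that the state is the pair of PU activities at the start and end of the slot. The function name `run_access_subframe`, the callback `sense`, and all numeric values are hypothetical.

```python
import math


def run_access_subframe(r, q1, w0, c_lim, num_slots, sense):
    """Apply the threshold policy (22) slot by slot and return the chosen actions.

    r[i][j]  : Pr[alpha_{n+1} = j | alpha_n = i]
    q1[i][j] : Pr[sensing result = 1 | s_n = (i, j)]
    w0       : initial Pr[alpha_1 = 0]
    sense(n) : returns the sensing result (0 or 1) for slot index n
    """
    w = w0
    actions = []
    for n in range(num_slots):
        p = [w, 1.0 - w]
        pi_00 = w * r[0][0]                      # belief that the slot stays idle
        if 1.0 - pi_00 <= c_lim:
            actions.append(1)                    # transmit; null observation
            nxt = [sum(p[i] * r[i][j] for i in range(2)) for j in range(2)]
        else:
            actions.append(0)                    # sense and use the result
            o = sense(n)
            nxt = []
            for j in range(2):
                total = 0.0
                for i in range(2):
                    like = q1[i][j] if o == 1 else 1.0 - q1[i][j]
                    total += p[i] * r[i][j] * like
                nxt.append(total)
        w = nxt[0] / (nxt[0] + nxt[1])           # Pr[alpha_{n+1} = 0 | history]
    return actions


# Hypothetical values: 0.2 kHz rates, 20 microsecond slots, assumed sensing probabilities.
lam = mu = 200.0
T = 20e-6
r = [[math.exp(-lam * T), 1 - math.exp(-lam * T)],
     [1 - math.exp(-mu * T), math.exp(-mu * T)]]
q1 = [[0.1, 0.5], [0.5, 0.8]]
actions = run_access_subframe(r, q1, w0=0.5, c_lim=0.03, num_slots=10,
                              sense=lambda n: 0)
```

With a sensor that always reports "idle" (a deliberately artificial input), the SU senses for the first few slots until the belief of an idle slot is high enough, and only then starts transmitting. This shows how the policy accumulates evidence over several sensing results instead of acting on a single one.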

V. NUMERICAL RESULTS

We first evaluate the performances of the channel learning and the channel access algorithms separately, and then study the benefit of the combined use of both algorithms. The simulation parameters are as follows: bandwidth of a frequency channel ($W$) is 10 MHz; length of a frame is 200 ms; length of a slot ($T$) is 20 $\mu$s. There are 1000 and 9000 slots in a channel learning subframe and in a channel access subframe, respectively. The threshold for energy detection ($\delta$) is set to 1.16. The set of possible channel usage patterns is given as $\mathcal{U} = \{(\lambda, \mu, \gamma) \mid \lambda \leq 1 \text{ kHz}, \mu \leq 1 \text{ kHz}, \rho \geq -10 \text{ dB}\}$. We use the recursive algorithm for estimating the channel usage pattern. We use a constant step size, $\sigma_k = 10^{-5}$, for the recursive algorithm. We assume that the SU does not switch the frequency channel during simulation time.

Fig. 6 demonstrates how well the channel learning algorithm estimates the time-varying channel usage pattern. The channel usage pattern changes in frames 1000, 2000, and 3000. In this figure, we can see that the estimate fluctuates around the real channel usage pattern due to the constant step size. Nonetheless, the channel learning algorithm tracks the variations of the channel usage pattern well. Note that the speed and the accuracy of convergence can be controlled by adjusting the step size $\sigma_k$.

    Calculate the state transition and observation probabilities from $\hat{\mathbf{u}}_k$
    Calculate the initial belief vector, $\boldsymbol{\pi}^1$
    for $n = 1$ to $N_A$ do
        if $1 - \pi^n_{0,0} \leq C_{\lim}$ then
            SU exchanges user data in slot $n$
            $a_n \leftarrow 1$
            $o_n \leftarrow \emptyset$
        else
            SU performs energy detection in slot $n$ and calculates the sensing result $\zeta^A_n$
            $a_n \leftarrow 0$
            $o_n \leftarrow \zeta^A_n$
        end if
        $\pi^{n+1}_{i,j} \leftarrow \eta_{i,j}(\boldsymbol{\pi}^n; a_n, o_n)$ for $(i,j) \in \mathcal{S}$
    end for

Fig. 5. The channel access algorithm in the channel access subframe of frame $k$.

[Figure 6: state transition rates (kHz, left axis) and SNR (dB, right axis) versus frame index, 0 to 4000.]

Fig. 6. Estimates of the channel usage pattern over frames.

We evaluate the performance of the channel access algorithm in Figs. 7 and 8. For these figures, we assume that the channel usage pattern remains the same over time and is known to the SU so that we can focus on the performance of the channel access algorithm. Fig. 7 shows the utilization and the collision probability for the proposed channel access algorithm with the suboptimal policy as a function of the collision probability limit. We can see in the figure that the utilization converges to the probability that a slot is not occupied by the PU as the collision probability increases. This figure shows that the collision probability does not exceed the collision probability limit, regardless of the channel usage pattern. By lowering the collision probability limit, we can decrease the collision probability at the cost of the utilization.

[Figure 7: utilization and collision probability (log scale) versus collision probability limit, for $\lambda = \mu = 0.2$ kHz at SNR = -3 dB and for transition rates of 0.2 kHz and 0.1 kHz at SNR = -5 dB.]

Fig. 7. Variations in utilization and collision probability with collision probability limit for the proposed channel access algorithm.

[Figure 8: utilization versus collision probability for the proposed algorithm with the suboptimal policy, the proposed algorithm with the optimal policy, and the heuristic algorithm, for $\lambda = \mu = 0.2$ kHz and for transition rates of 0.2 kHz and 0.1 kHz.]

Fig. 8. Performance comparison of the proposed channel access algorithms with suboptimal and optimal policies, and the heuristic channel access algorithm in terms of utilization and collision probability. The SNR of a PU signal is set to -4 dB.

Fig. 8 compares the performance of the proposed channel access algorithm (with suboptimal and optimal policies) with that of a simple listen-before-talk heuristic algorithm. If the sensing result in slot $(n-1)$ indicates that the channel is inactive, the heuristic algorithm transmits data for $\tau$ consecutive slots from slot $n$ until it performs another energy detection. Thus, $\tau$ balances the tradeoff between the utilization and the collision probability for the heuristic algorithm. The graphs are plotted by varying $C_{\lim}$ for the proposed algorithm with the suboptimal policy, $\nu$ and $C_{\lim}$ for the proposed algorithm with the optimal policy, and $\tau$ for the heuristic algorithm. In this figure, we can see that the proposed algorithm with the suboptimal policy exhibits performance very close to the optimal one. Therefore, we can say that the suboptimal policy is a very useful low-complexity alternative to the optimal policy, accomplishing a near-optimal performance as well as effectively limiting the collision probability. We also observe that the proposed algorithm outperforms the heuristic algorithm. The proposed algorithm can achieve very low collision probability owing to its resilience to sensing errors, whereas the heuristic algorithm cannot.

[Figure 9: utilization and collision probability (log scale) versus frame index for the proposed scheme with learning and the proposed scheme without learning.]

Fig. 9. Time variation of utilization and collision probability for the proposed schemes with and without the channel learning algorithm. The utilization and collision probability are time-averaged over every 100 frames.

[Figure 10: cdf versus utilization and collision probability for the proposed scheme with learning, the proposed scheme without learning, and the heuristic algorithm.]

Fig. 10. Cumulative distribution functions of utilization and collision probability when the proposed schemes with and without the channel learning algorithm and the heuristic channel access algorithm are used.

In Figs. 9 and 10, we consider the channel learning algorithm as well as the channel access algorithm to investigate the impact of channel learning on the system performance. Fig. 9 shows the time variation of the utilization and the collision probability of the proposed schemes with and without the channel learning algorithm. Since the proposed scheme with learning consumes additional $N_L$ slots for channel learning, for fairness in comparison, we multiply the utilization of the proposed scheme with learning by $N_A/(N_L + N_A)$. For both schemes, the collision probability limit, $C_{\lim}$, is set to 0.03. While the proposed scheme with learning utilizes the channel usage pattern estimated by the channel learning algorithm to adjust the parameters of the channel access algorithm, the proposed scheme without learning just assumes that $\lambda = \mu = 0.3$ kHz and $\gamma = -3$ dB. The channel usage pattern $\mathbf{u}_k$ varies over time as follows: (0.4 kHz, 0.4 kHz, -3 dB) for $k = 1, \ldots, 1000$, (0.6 kHz, 0.2 kHz, -5 dB) for $k = 1001, \ldots, 2000$, (0.1 kHz, 0.6 kHz, -2 dB) for $k = 2001, \ldots, 3000$, and (0.4 kHz, 0.2 kHz, -6 dB) for $k = 3001, \ldots, 4000$. From Fig. 9, we observe that the proposed scheme without learning violates the collision probability limit and imposes excessive interference on PU traffic when the channel usage pattern is unfavorable. On the other hand, for the proposed scheme with learning, the collision probability remains below the collision probability limit, irrespective of how the channel usage pattern varies. This is due to the fact that the scheme with learning is able to adapt its parameters to the varying channel usage pattern.

In Fig. 10, we compare the cumulative distribution functions (cdf's) of the utilization and the collision probability when the proposed schemes with and without learning and the heuristic channel access algorithm are used. We estimate the utilization and the collision probability in each frame and calculate the corresponding cumulative distribution functions. The channel usage pattern randomly changes over frames. The duration between consecutive changes in the channel usage pattern follows a geometric distribution with an average of 1000 frames. The state transition rates $\lambda$ and $\mu$ are selected from a uniform distribution over [0.1 kHz, 1 kHz], and the SNR of PU signals is uniformly distributed over [-6 dB, -3 dB]. The collision probability limit is set to 0.03. The proposed scheme without learning assumes that $\lambda = \mu = 0.8$ kHz and $\gamma = -6$ dB. For the heuristic algorithm, we set $\tau = 1$ to reduce its collision probability as much as possible. From Fig. 10, we observe that the collision probability limit is frequently violated by the proposed scheme without learning and the heuristic algorithm, while the proposed scheme with learning keeps the collision probability below the limit well. The proportions of the frames in which the collision probability exceeds the limit are 0.07, 0.08, and 0.61 for the proposed schemes with and without learning, and the heuristic algorithm, respectively. From this figure, we can conclude that the proposed scheme with learning can effectively maintain the collision probability under the target limit. While keeping the collision probability below the limit, the proposed scheme with learning also achieves an average utilization (0.31) considerably higher than that of the proposed scheme without learning (0.18) and the heuristic algorithm (0.24).

VI. CONCLUSION

We have proposed a channel sensing and channel access scheme that opportunistically exploits frequency channels occupied by a data-centric primary user network. The proposed scheme repeats a learning and access cycle, driven by the channel learning and the channel access algorithms. To make the scheme robust to high sensing error probability, we have applied the hidden Markov model (HMM) and partially observable Markov decision process (POMDP) frameworks to the channel learning and the channel access algorithms, respectively. The simulation results have shown that, by adapting to the varying channel usage pattern, the proposed scheme provides efficient access to spectrum opportunities while constraining the interference to the primary users below the target limit. The proposed scheme outperforms a heuristic algorithm without any learning functionality. Extension of the scheme to a distributed multiuser scenario will be considered in our future work.

APPENDIX

A. Proof of the Condition for Equivalence of Two AMPs

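Let $\mathbf{u}$ and $\tilde{\mathbf{u}}$ be the channel usage patterns corresponding to the AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$, respectively. The probability of an observation sequence $\mathbf{x} = \{x_1, \ldots, x_N\}$ given the channel usage pattern $\mathbf{u}$ can be rewritten as

$$\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = \mathbf{1}^T \cdot \mathbf{I}_{x_N}\mathbf{h} \cdot \mathbf{I}_{x_{N-1}}\mathbf{h} \cdots \mathbf{I}_{x_2}\mathbf{h} \cdot \mathbf{I}_{x_1}\boldsymbol{\pi} = \mathbf{1}^T\mathbf{h}_{x_N}\mathbf{h}_{x_{N-1}} \cdots \mathbf{h}_{x_2}\boldsymbol{\pi}_{x_1} \tag{23}$$

where $\boldsymbol{\pi} := (\boldsymbol{\pi}_0, \boldsymbol{\pi}_1)^T$ is a column vector of the initial state distribution in which $\boldsymbol{\pi}_0$ and $\boldsymbol{\pi}_1$ are 2-by-1 column vectors, $\mathbf{I}_0 := \mathrm{diag}(1, 1, 0, 0)$, and $\mathbf{I}_1 := \mathrm{diag}(0, 0, 1, 1)$.

We first consider the case that $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} = 0$ and $\mathbf{1}^T\tilde{\mathbf{h}}_0\boldsymbol{\tau} = 0$. In this case, we have $\mathbf{1}^T\mathbf{h}_x = y_x\mathbf{1}^T$ and $\mathbf{1}^T\tilde{\mathbf{h}}_x = \tilde{y}_x\mathbf{1}^T$ for $x = 0, 1$ and some real values $y_0$, $y_1$, $\tilde{y}_0$, and $\tilde{y}_1$. Then, we have $\Pr[\mathbf{o} = \mathbf{x} \mid \mathbf{u}] = y_{x_N}y_{x_{N-1}} \cdots y_{x_2}y_{x_1}$ and $\Pr[\mathbf{o} = \mathbf{x} \mid \tilde{\mathbf{u}}] = \tilde{y}_{x_N}\tilde{y}_{x_{N-1}} \cdots \tilde{y}_{x_2}\tilde{y}_{x_1}$. The AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are equivalent if and only if $y_{x_N}y_{x_{N-1}} \cdots y_{x_2}y_{x_1}$ and $\tilde{y}_{x_N}\tilde{y}_{x_{N-1}} \cdots \tilde{y}_{x_2}\tilde{y}_{x_1}$ are the same for all observation sequences $\mathbf{x}$. This condition is satisfied only when $y_x = \tilde{y}_x$ for $x = 0, 1$. Therefore, we can conclude that $\mathbf{1}^T\mathbf{h}_0 = \mathbf{1}^T\tilde{\mathbf{h}}_0$ should be satisfied for the equivalence of two AMPs.

We now consider the case that $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \neq 0$ or $\mathbf{1}^T\tilde{\mathbf{h}}_0\boldsymbol{\tau} \neq 0$. The proof for this case is based on the result in [28]. Let $\mathcal{V}$ denote the null space defined by

$$\mathcal{V} := \{\boldsymbol{\pi} \mid \mathbf{1}^T \cdot \mathbf{I}_{x_N}\mathbf{h} \cdot \mathbf{I}_{x_{N-1}}\mathbf{h} \cdots \mathbf{I}_{x_2}\mathbf{h} \cdot \mathbf{I}_{x_1}\boldsymbol{\pi} = 0 \ \ \forall \mathbf{x}\}. \tag{24}$$

The vector in the null space should satisfy $\mathbf{1}^T\boldsymbol{\pi}_0 = 0$, $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\pi}_0 = 0$, $\mathbf{1}^T\boldsymbol{\pi}_1 = 0$, and $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\pi}_1 = 0$. If $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \neq 0$, the only vector satisfying the condition is the zero vector. In [28], it is shown that the AMPs with $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are equivalent if and only if $\mathbf{h}$ and $\tilde{\mathbf{h}}$ are similar via some block diagonal matrix preserving the probability, on the quotient space where the null space is factored out. Since the null space has zero dimension in this case, the AMPs are equivalent if and only if there exists a 2-by-2 matrix $\mathbf{X}$ such that

$$\mathbf{1}^T\mathbf{X} = \mathbf{1}^T, \quad \mathbf{X}\mathbf{h}_0 = \tilde{\mathbf{h}}_0\mathbf{X}, \quad \text{and} \quad \mathbf{X}\mathbf{h}_1 = \tilde{\mathbf{h}}_1\mathbf{X}. \tag{25}$$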

B. Proof of the Condition for Identifiability of an AMP

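If $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} = 0$, there can be an infinite number of AMPs with the transition probability matrix $\tilde{\mathbf{h}} \neq \mathbf{h}$ that satisfies $\mathbf{1}^T\tilde{\mathbf{h}}_0 = \mathbf{1}^T\mathbf{h}_0$. Since these AMPs are equivalent to the AMP with $\mathbf{h}$ from Theorem 1, it should be satisfied that $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \neq 0$ for the AMP to be identifiable.

Suppose that there exists an AMP with $\tilde{\mathbf{h}}$ that is equivalent to the AMP with $\mathbf{h}$ when $\mathbf{1}^T\mathbf{h}_0\boldsymbol{\tau} \neq 0$. Then, from Theorem 1, there exists $\mathbf{X} \neq \mathbf{I}$ such that $\mathbf{1}^T\mathbf{X} = \mathbf{1}^T$, $\mathbf{X}\mathbf{h}_0 = \tilde{\mathbf{h}}_0\mathbf{X}$, and $\mathbf{X}\mathbf{h}_1 = \tilde{\mathbf{h}}_1\mathbf{X}$. We can calculate $\tilde{\mathbf{h}}_0 = \mathbf{X}\mathbf{h}_0\mathbf{X}^{-1}$ and $\tilde{\mathbf{h}}_1 = \mathbf{X}\mathbf{h}_1\mathbf{X}^{-1}$. These matrices should satisfy, for some $\tilde{r}_{i,j}$ and $\gamma$,

$$\tilde{\mathbf{h}}_0 = \begin{bmatrix} \tilde{r}_{0,0}(1 - D(0)) & \tilde{r}_{1,0}(1 - \Upsilon(\gamma)) \\ \tilde{r}_{0,1}(1 - \Upsilon(\gamma)) & \tilde{r}_{1,1}(1 - D(\gamma)) \end{bmatrix} \tag{26}$$

and

$$\tilde{\mathbf{h}}_1 = \begin{bmatrix} \tilde{r}_{0,0}D(0) & \tilde{r}_{1,0}\Upsilon(\gamma) \\ \tilde{r}_{0,1}\Upsilon(\gamma) & \tilde{r}_{1,1}D(\gamma) \end{bmatrix}. \tag{27}$$

Therefore, we have

$$\mathbf{1}^T\mathbf{X} = \mathbf{1}^T \quad \text{and} \quad \mathbf{F}(\gamma) \circ (\mathbf{X}\mathbf{h}_0\mathbf{X}^{-1}) = \mathbf{G}(\gamma) \circ (\mathbf{X}\mathbf{h}_1\mathbf{X}^{-1}). \tag{28}$$

If there is no $\mathbf{X} \neq \mathbf{I}$ and $\gamma \geq 0$ satisfying the above condition, we can say that there is no AMP equivalent to the AMP with $\mathbf{h}$.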

C. Calculation of the Gradient of Ξ(o;u)

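We calculate the partial derivatives of $\Xi(\mathbf{o};\mathbf{u})$ with respect to $\lambda$, $\mu$, and $\gamma$. To do this, we first define $\phi(\mathbf{o};\mathbf{u}) := \Pr[\mathbf{o} \mid \mathbf{u}]$. Recall that $\alpha_n$ is the PU activity at time $t = (n-1)T$ when $t = 0$ at the start of the channel learning subframe. Let us define $\boldsymbol{\alpha} := (\alpha_1, \ldots, \alpha_{N_L+1})$. Then, $\phi(\mathbf{o};\mathbf{u})$ can be rewritten as the sum of the probabilities $\Pr[\mathbf{o}, \boldsymbol{\alpha} \mid \mathbf{u}]$ for all possible $\boldsymbol{\alpha}$, that is, $\phi(\mathbf{o};\mathbf{u}) = \sum_{\boldsymbol{\alpha}}\kappa(\mathbf{o}, \boldsymbol{\alpha};\mathbf{u})$, where

$$\kappa(\mathbf{o}, \boldsymbol{\alpha};\mathbf{u}) = \Pr[\mathbf{o}, \boldsymbol{\alpha} \mid \mathbf{u}] = b_{\alpha_1}\prod_{n=1}^{N_L} r_{\alpha_n,\alpha_{n+1}} \cdot q^{o_n}_{\alpha_n,\alpha_{n+1}}. \tag{29}$$

In the above equation, we define $b_i := \Pr[\alpha_1 = i]$ and $r_{i,j} := \Pr[\alpha_{n+1} = j \mid \alpha_n = i]$. Then, we have $b_0 = \mu/(\lambda+\mu)$, $b_1 = \lambda/(\lambda+\mu)$, $r_{0,0} = e^{-\lambda T}$, $r_{0,1} = 1 - e^{-\lambda T}$, $r_{1,0} = 1 - e^{-\mu T}$, and $r_{1,1} = e^{-\mu T}$. In addition, using the definition of $\Upsilon(\gamma)$, we have $q^0_{0,0} = 1 - D(0)$, $q^1_{0,0} = D(0)$, $q^0_{0,1} = 1 - \Upsilon(\gamma)$, $q^1_{0,1} = \Upsilon(\gamma)$, $q^0_{1,0} = 1 - \Upsilon(\gamma)$, $q^1_{1,0} = \Upsilon(\gamma)$, $q^0_{1,1} = 1 - D(\gamma)$, and $q^1_{1,1} = D(\gamma)$.

First, we calculate the derivative of $\kappa(\mathbf{o}, \boldsymbol{\alpha};\mathbf{u})$ with respect to an arbitrary variable $x$. That is,

$$\begin{aligned} \frac{\partial\kappa}{\partial x}(\mathbf{o}, \boldsymbol{\alpha};\mathbf{u}) ={} & \sum_{i\in\{0,1\}}\frac{\partial b_i}{\partial x} \cdot \frac{1}{b_i} \cdot 1_{\alpha_1 = i}\Pr[\mathbf{o}, \boldsymbol{\alpha} \mid \mathbf{u}] \\ & + \sum_{(i,j)\in\mathcal{S}}\frac{\partial r_{i,j}}{\partial x} \cdot \frac{1}{r_{i,j}} \cdot \sum_{n=1}^{N_L} 1_{\mathbf{s}_n = (i,j)}\Pr[\mathbf{o}, \boldsymbol{\alpha} \mid \mathbf{u}] \\ & + \sum_{(i,j)\in\mathcal{S}}\sum_{m\in\mathcal{O}}\frac{\partial q^m_{i,j}}{\partial x} \cdot \frac{1}{q^m_{i,j}} \cdot \sum_{n=1}^{N_L} 1_{\mathbf{s}_n = (i,j),\, o_n = m}\Pr[\mathbf{o}, \boldsymbol{\alpha} \mid \mathbf{u}] \end{aligned} \tag{30}$$

where $\mathcal{S}$ is the state space, $\mathcal{O}$ is the observation space, and $1_X$ is a function that is 1 if $X$ is true and 0 otherwise. Now, we calculate $\partial\Xi/\partial x$ as

$$\begin{aligned} \frac{\partial\Xi}{\partial x}(\mathbf{o};\mathbf{u}) ={} & \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \frac{\partial\phi(\mathbf{o};\mathbf{u})}{\partial x} = \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \sum_{\boldsymbol{\alpha}}\frac{\partial\kappa(\mathbf{o}, \boldsymbol{\alpha};\mathbf{u})}{\partial x} \\ ={} & \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \Bigg(\sum_{i\in\{0,1\}}\frac{\partial b_i}{\partial x} \cdot \frac{1}{b_i} \cdot \omega_i(\mathbf{o};\mathbf{u}) + \sum_{(i,j)\in\mathcal{S}}\frac{\partial r_{i,j}}{\partial x} \cdot \frac{1}{r_{i,j}} \cdot \chi_{i,j}(\mathbf{o};\mathbf{u}) \\ & + \sum_{(i,j)\in\mathcal{S}}\sum_{m\in\mathcal{O}}\frac{\partial q^m_{i,j}}{\partial x} \cdot \frac{1}{q^m_{i,j}} \cdot \psi^m_{i,j}(\mathbf{o};\mathbf{u})\Bigg) \end{aligned} \tag{31}$$

where we define $\omega_i(\mathbf{o};\mathbf{u}) := \Pr[\alpha_1 = i, \mathbf{o} \mid \mathbf{u}]$, $\chi_{i,j}(\mathbf{o};\mathbf{u}) := \sum_{n=1}^{N_L}\Pr[\mathbf{s}_n = (i,j), \mathbf{o} \mid \mathbf{u}]$, and $\psi^m_{i,j}(\mathbf{o};\mathbf{u}) := \sum_{n \mid o_n = m}\Pr[\mathbf{s}_n = (i,j), \mathbf{o} \mid \mathbf{u}]$. From this equation, we can calculate $\partial\Xi/\partial\lambda$, $\partial\Xi/\partial\mu$, and $\partial\Xi/\partial\gamma$. For example, we can derive $\partial\Xi/\partial\lambda$ as

$$\begin{aligned} \frac{\partial\Xi}{\partial\lambda}(\mathbf{o};\mathbf{u}) ={} & \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \Bigg(\frac{\partial b_0}{\partial\lambda} \cdot \frac{1}{b_0} \cdot \omega_0(\mathbf{o};\mathbf{u}) + \frac{\partial b_1}{\partial\lambda} \cdot \frac{1}{b_1} \cdot \omega_1(\mathbf{o};\mathbf{u}) \\ & + \frac{\partial r_{0,0}}{\partial\lambda} \cdot \frac{1}{r_{0,0}} \cdot \chi_{0,0}(\mathbf{o};\mathbf{u}) + \frac{\partial r_{0,1}}{\partial\lambda} \cdot \frac{1}{r_{0,1}} \cdot \chi_{0,1}(\mathbf{o};\mathbf{u})\Bigg) \\ ={} & \frac{1}{\phi(\mathbf{o};\mathbf{u})} \cdot \Bigg(\frac{\mu \cdot \omega_1(\mathbf{o};\mathbf{u})}{\lambda(\lambda+\mu)} - \frac{\omega_0(\mathbf{o};\mathbf{u})}{\lambda+\mu} + \frac{T \cdot \chi_{0,1}(\mathbf{o};\mathbf{u})}{e^{\lambda T} - 1} - T \cdot \chi_{0,0}(\mathbf{o};\mathbf{u})\Bigg). \end{aligned} \tag{32}$$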

We can also calculate ∂Ξ/∂𝜇 and ∂Ξ/∂𝛾 in a similar way.
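The four coefficients that appear in the closed form of (32) can be checked against finite differences. The sketch below is only a sanity check under hypothetical values of $\lambda$, $\mu$, and $T$; the helper `log_derivative` is an illustrative name and is not part of the paper.

```python
import math


def b0(lam, mu):
    return mu / (lam + mu)


def b1(lam, mu):
    return lam / (lam + mu)


def r00(lam, T):
    return math.exp(-lam * T)


def r01(lam, T):
    return 1.0 - math.exp(-lam * T)


def log_derivative(f, x, h):
    """Central-difference estimate of (df/dx) / f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h * f(x))


lam, mu, T = 200.0, 100.0, 20e-6   # hypothetical values (Hz, Hz, s)
h = 1e-3

# Numerical values of (d/d lambda of each term) divided by the term itself.
numeric = [
    log_derivative(lambda x: b0(x, mu), lam, h),
    log_derivative(lambda x: b1(x, mu), lam, h),
    log_derivative(lambda x: r00(x, T), lam, h),
    log_derivative(lambda x: r01(x, T), lam, h),
]
# Closed-form coefficients of omega_0, omega_1, chi_{0,0}, chi_{0,1} in (32).
analytic = [
    -1.0 / (lam + mu),
    mu / (lam * (lam + mu)),
    -T,
    T / (math.exp(lam * T) - 1.0),
]
rel_err = [abs(n - a) / abs(a) for n, a in zip(numeric, analytic)]
```

Each entry of `rel_err` should be small, which supports the signs and denominators in the second line of (32). The same check can be repeated for $\mu$ by differentiating $b_0$, $b_1$, $r_{1,0}$, and $r_{1,1}$ instead.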

D. The Suboptimal Policy Satisfies the Collision ProbabilityConstraint

Proof: We prove that collision probability does not exceedthe collision probability limit, i.e., 𝐶 ≤ 𝐶lim, when thesuboptimal policy 𝜷sub = (𝛽sub

1 , . . . , 𝛽sub𝑁𝐴

) is applied. Providedthat 𝜷sub is used, we can rewrite the collision probability as

𝐶 =

∑𝑁𝐴

𝑛=1 Pr[s𝑛 ∕= (0, 0), 𝑎𝑛 = 1]∑𝑁𝐴

𝑛=1 Pr[𝑎𝑛 = 1]

=

∑𝑁𝐴

𝑛=1

∑Γ𝑛

Pr[s𝑛 ∕= (0, 0), 1− 𝜋0,0 ≤ 𝐶lim∣Γ𝑛] ⋅ Pr[Γ𝑛]∑𝑁𝐴

𝑛=1

∑Γ𝑛

Pr[1− 𝜋0,0 ≤ 𝐶lim∣Γ𝑛] ⋅ Pr[Γ𝑛]

(33)

where Γ𝑛 := {𝝅1, 𝑎1, . . . , 𝑎𝑛−1, 𝑜1, . . . , 𝑜𝑛−1}.Since 𝜋0,0 only depends on Γ𝑛, the value of Pr[1− 𝜋0,0 ≤

𝐶lim∣Γ𝑛] in the denominator in (33) is one if 1− 𝜋0,0 ≤ 𝐶lim;and zero, otherwise. Also, Pr[s𝑛 ∕= (0, 0), 1−𝜋0,0 ≤ 𝐶lim∣Γ𝑛]in the numerator in (33) is calculated as

Pr[s𝑛 ∕= (0, 0), 1− 𝜋0,0 ≤ 𝐶lim∣Γ𝑛] ={1− 𝜋0,0, if 1− 𝜋0,0 ≤ 𝐶lim

0, otherwise.(34)

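Therefore, the inequality $\Pr[\mathbf{s}_n \neq (0,0),\, 1-\pi_{0,0} \le C_{\text{lim}} \mid \Gamma_n] \le C_{\text{lim}}\cdot\Pr[1-\pi_{0,0} \le C_{\text{lim}} \mid \Gamma_n]$ is satisfied. Applying this inequality to (33), we can conclude that

$$
C \le \frac{\sum_{n=1}^{N_A} \sum_{\Gamma_n} C_{\text{lim}}\cdot\Pr[1-\pi_{0,0} \le C_{\text{lim}} \mid \Gamma_n]\cdot\Pr[\Gamma_n]}{\sum_{n=1}^{N_A} \sum_{\Gamma_n} \Pr[1-\pi_{0,0} \le C_{\text{lim}} \mid \Gamma_n]\cdot\Pr[\Gamma_n]} = C_{\text{lim}}.
$$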
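The argument above can be illustrated numerically. The Python sketch below evaluates the ratio in (33) for an arbitrary finite collection of histories, each summarized by a pair (probability of the history, belief $\pi_{0,0}$ under that history), and applies the threshold rule "transmit only if $1-\pi_{0,0} \le C_{\text{lim}}$". The flat list-of-pairs representation, the function names, and the randomly drawn values are assumptions made for illustration and are not part of the paper's simulation setup.

```python
import random


def collision_probability(histories, c_lim):
    """Evaluate the ratio in (33) under the threshold rule.

    histories: iterable of (prob_history, pi_00) pairs, where prob_history
    plays the role of Pr[Gamma_n] and pi_00 is the belief that the state
    is (0, 0) given that history (hypothetical flat representation).
    """
    numerator = 0.0    # accumulates Pr[collision and transmit]
    denominator = 0.0  # accumulates Pr[transmit]
    for prob_history, pi_00 in histories:
        risk = 1.0 - pi_00          # conditional collision probability
        if risk <= c_lim:           # the policy transmits only in this case
            numerator += risk * prob_history
            denominator += prob_history
    # Convention assumed here: report 0.0 when the policy never transmits,
    # since the ratio in (33) is then undefined.
    return numerator / denominator if denominator > 0.0 else 0.0


def random_histories(count, seed):
    """Draw illustrative (probability, belief) pairs; probabilities sum to 1."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(count)]
    total = sum(weights)
    return [(w / total, rng.random()) for w in weights]


if __name__ == "__main__":
    c_lim = 0.1  # hypothetical collision probability limit
    hist = random_histories(count=1000, seed=0)
    print(collision_probability(hist, c_lim) <= c_lim)
```

Each history that contributes to the numerator does so with a weight of at most $C_{\text{lim}}$ times its contribution to the denominator, which is exactly the term-by-term inequality used in the proof.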



REFERENCES

[1] S. Geirhofer, L. Tong, and B. M. Sadler, “A measurement-based modelfor dynamic spectrum access in WLAN channels,” in Proc. IEEE MIL-COM Oct. 2006.

[2] S. Geirhofer, L. Tong, and B. M. Sadler, “Dynamic spectrum accessin the time domain: modeling and exploiting white space,” IEEECommun. Mag., vol. 45, no. 5, pp. 66–72, May 2007.

[3] S. D. Jones, E. Jung, X. Liu, N. Merheb, and I. J. Wang, “Char-acterization of spectrum activities in the U.S. public safety band foropportunistic spectrum access,” in Proc. IEEE DySPAN Apr. 2007.

[4] M. Wellens, J. Riihijarvi, and P. Mahonen, “Empirical time and fre-quency domain models of spectrum use,” Physical Commun. (Elsevier),vol. 2, no. 1–2, pp. 10–32, Mar. 2009.

[5] M. Wellens and P. Mahonen, “Lessons learned from an extensivespectrum occupancy measurement campaign and a stochastic duty cyclemodel,” in Proc. TridentCom Apr. 2009.

[6] D. Willkomm, S. Machiraju, J. Bolot, and A. Wolisz, “Primary userbehavior in cellular networks and implications for dynamic spectrumaccess,” IEEE Commun. Mag., vol. 47, no. 3, pp. 88–95, Mar. 2009.

[7] Q. Zhao, L. Tong, A. Swami, and Y. Chen, “Decentralized cognitiveMAC for opportunistic spectrum access in ad hoc networks: a POMDPframework,” IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 589–600,Apr. 2007.

[8] Q. Zhao, B. Krishnamachari, and K. Liu, “On myopic sensing formulti-channel opportunistic access: structure, optimality, and perfor-mance,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 5431–5440,Dec. 2008.

[9] Q. Zhao, S. Geirhofer, L. Tong, and B. M. Sadler, “Opportunisticspectrum access via periodic channel sensing,” IEEE Trans. SignalProcess., vol. 56, no. 2, pp. 785–796, Feb. 2008.

[10] H. Su and X. Zhang, “Cross-layer based opportunistic MAC protocolsfor QoS provisionings over cognitive radio wireless networks,” IEEEJ. Sel. Areas Commun, vol. 26, no. 1, pp. 118–129, Jan. 2008.

[11] S. Geirhofer, L. Tong, and B. M. Sadler, “Cognitive medium access: con-straining interference based on experimental models,” IEEE J. Sel. AreasCommun, vol. 26, no. 1, pp. 95–105, Jan. 2008.

[12] S. Huang, X. Liu, and Z. Ding, “Opportunistic spectrum access incognitive radio networks,” in Proc. IEEE INFOCOM Apr. 2008.

[13] R. Urgaonkar and M. J. Neely, “Opportunistic scheduling with reliabilityguarantees in cognitive radio networks,” IEEE Trans. Mobile Comput.,vol. 8, no. 6, pp. 766–777, June 2009.

[14] Y.-C. Liang, Y. Zeng, E. C. Y. Peh, and A. T. Hoang, “Sensing-throughput tradeoff for cognitive radio networks,” IEEE Trans. WirelessCommun., vol. 7, no. 4, pp. 1326–1337, Apr. 2008.

[15] H. Kim and K. G. Shin, “Efficient discovery of spectrum opportunitieswith MAC-layer sensing in cognitive radio networks,” IEEE Trans. Mo-bile Comput., vol. 7, no. 5, pp. 533–545, May 2008.

[16] L. Lai, H. El Gamal, H. Jiang, and H. V. Poor, “Cogni-tive medium access: exploration, exploitation and competition,”IEEE/ACM Trans. Netw., submitted for publication. Available:http://www.ece.osu.edu/∼helgamal/

[17] H. Jiang, L. Lai, R. Fan, and H. V. Poor, “Optimal selection of channelsensing order in cognitive radio,” IEEE Trans. Wireless Commun., vol. 8,no. 1, pp. 297–307, Jan. 2009.

[18] R. Fan and H. Jiang, “Channel sensing-order setting in cognitive radionetworks: a two-user case,” IEEE Trans. Veh. Technol., vol. 58, no. 9,pp. 4997–5008, Nov. 2009.

[19] L. R. Rabiner, “A tutorial on hidden Markov models and selectedapplications in speech recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.

[20] T. Ryden, “On recursive estimation for hidden Markov models,”Stochastic Processes and their Applications, vol. 66, no. 1, pp. 79–96,Feb. 1997.

[21] S. Huang, X. Liu, and Z. Ding, “Optimal transmission strategies for dy-namic spectrum access in cognitive radio networks,” IEEE Trans. MobileComput., vol. 8, no. 12, pp. 1636–1648, Dec. 2009.

[22] T. Clancy and B. Walker, “Predictive dynamic spectrum access,” inProc. SDR Forum Technical Conference, Nov. 2006.

[23] I. A. Akbar and W. H. Tranter, “Dynamic spectrum allocation incognitive radio using hidden Markov models: Poisson distributed case,”in Proc. SoutheastCon Mar. 2007.

[24] G. E. Monahan, “A survey of partially observable Markov decision pro-cesses: theory, models, and algorithms,” Management Science, vol. 28,no. 1, pp. 1–16, Jan. 1982.

[25] J. Jia, Q. Zhang, and X. Shen, “HC-MAC: a hardware-constrainedcognitive MAC for efficient spectrum management,” IEEE J. Sel. AreasCommun, vol. 26, no. 1, pp. 106–117, Jan. 2008.

[26] H. Urkowitz, “Energy detection of unknown deterministic signals,”Proc. IEEE, vol. 55, no. 4, pp. 523–531, Apr. 1967.

[27] Y. Ephraim and N. Merhav, “Hidden Markov processes,” IEEETrans. Inf. Theory, vol. 48, no. 6, pp. 1518–1569, June 2002.

[28] H. Ito, S.-I. Amari, and K. Kobayashi, “Identifiability of hidden Markovinformation sources and their minimum degrees of freedom,” IEEETrans. Inf. Theory, vol. 38, no. 2, pp. 324–333, Mar. 1992.

[29] L. E. Baum and T. Petrie, “Statistical inference for probabilisticfunctions of finite state Markov chains,” The Annals of MathematicalStatistics, vol. 37, no. 6, pp. 1554–1563, Dec. 1966.

[30] K. W. Choi, “Adaptive sensing technique to maximize spectrum uti-lization in cognitive radio,” IEEE Trans. Veh. Technol., vol. 59, no. 2,pp. 992–998, Feb. 2010.

[31] W. S. Lovejoy, “A survey of algorithmic methods for partially observableMarkov decision processes,” Annals of Operations Research, vol. 28,no. 1, pp. 47–66, Dec. 1991.

Kae Won Choi received the B.S. degree in civil, urban, and geosystem engineering in 2001, and the M.S. and Ph.D. degrees in electrical engineering and computer science in 2003 and 2007, respectively, all from Seoul National University, Seoul, Korea. From 2008 to 2009, he was with Telecommunication Business of Samsung Electronics Co., Ltd., Korea. From 2009 to 2010, he was a postdoctoral researcher in the Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, MB, Canada. In 2010, he joined the faculty at Seoul National University of Science and Technology, Korea, where he is currently an assistant professor in the Department of Computer Science. His research interests include cognitive radio, wireless network optimization, radio resource management, and mobile cloud computing.

Ekram Hossain (S’98-M’01-SM’06) is a full Professor in the Department of Electrical and Computer Engineering at University of Manitoba, Winnipeg, Canada. He received his Ph.D. in Electrical Engineering from University of Victoria, Canada, in 2001. Dr. Hossain’s research interests include design, analysis, and optimization of wireless/mobile communications networks and cognitive radio systems (http://www.ee.umanitoba.ca/~ekram). He serves as the Area Editor for the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS in the area of “Resource Management and Multiple Access,” an Editor for the IEEE TRANSACTIONS ON MOBILE COMPUTING, the IEEE COMMUNICATIONS SURVEYS AND TUTORIALS, and IEEE Wireless Communications. Dr. Hossain has several research awards to his credit which include the University of Manitoba Merit Award in 2010 (for Research and Scholarly Activities) and the 2011 IEEE Communications Society Fred Ellersick Prize Paper Award. He is a registered Professional Engineer in the province of Manitoba, Canada.

