dynamic proﬁt maximization of cognitive mobile virtual...

1

Dynamic Profit Maximization of Cognitive MobileVirtual Network Operator

Shuqin Li, Member, IEEE, Jianwei Huang, Senior Member, IEEE, and Shuo-Yen Robert Li, Fellow, IEEE

Abstract—We study the profit maximization problem of a cognitive virtual network operator in a dynamic network environment. Weconsider a downlink OFDM communication system with various network dynamics, including dynamic user demands, uncertain sensingspectrum resources, dynamic spectrum prices, and time-varying channel conditions. In addition, heterogenous users and imperfectsensing technology are incorporated to make the network model more realistic. By exploring the special structural of the problem,we develop a low-complexity on-line control policies that determine pricing and resource scheduling without knowing the statistics ofdynamic network parameters. We show that the proposed algorithms can achieve arbitrarily close to the optimal profit with a propertrade-off with the queuing delay.

Index Terms—Cognitive Radio, Profit Maximization, Pricing, Virtual Network Operator.

✦

1 INTRODUCTIONThe limited wireless spectrum is becoming a bottleneck formeeting today’s fast growing demands for wireless data ser-vices. More specifically, there is very little spectrum left thatcan be licensed to new wireless services and applications.However, extensive field measurements [2] showed that muchof the licensed spectrum remains idle most of the time, evenin densely populated metropolitan areas such as New YorkCity and Chicago. A potential way to solve this dilemma is tomanage and utilize the licensed spectrum resource in a moreefficient way.

This is why the concept of Dynamic Spectrum Access(DSA) has received enthusiastic support from governmentsand industries worldwide [3]–[5]. We can roughly classifyvarious DSA approaches into two main categories: the spec-trum sensing based ones and the spectrum leasing (or market)based ones. The first category indicates a hierarchical accessmodel, where unlicensed secondary users opportunisticallyaccess the under-utilized part of the licensed spectrum, withcontrolled interference to the licensed primary users. Duringthis process, spectrum sensing helps the secondary users todetect the currently available spectrum resource. In contrast,the second category relates to a dynamic exclusive use model,which allows licensees to trade spectrum usage right to thesecondary users. In both categories, it is possible to have asecondary operator coordinating the transmissions of multiplesecondary users.

There are pros and cons for both DSA categories. Spectrum

• Shuqin Li is with Research and Innovation, Alcatel-Lucent Shanghai BellCo., Ltd., D400, Bldg. 3, 388 Ningqiao Rd, Shanghai, 201206, China. E-mail: [email protected] This work was done when Shuqin Liwas in The Chinese University of Hong Kong.

• Jianwei Huang and Shuo-Yen Robert Li are with Department of In-formation Engineering, The Chinese University of Hong Kong. E-mail:{jwhuang, bobli}@ie.cuhk.edu.hk. Jianwei Huang is the correspondingauthor.

Part of the results have appeared in IEEE ICC 2012 [1]. This work issupported by the General Research Funds (Project Number CUHK 412710and CUHK 412511) established under the University Grant Committee of theHong Kong Special Administrative Region, China.

sensing detects and identifies the available unused licensedspectrum through technologies such as beacons, geolocationsystem, and cognitive radio. Form the secondary operator’sperspective, the spectrum acquired by sensing is an unreliableresource, since it cannot determine how much resource isavailable before sensing. Furthermore, imperfect sensing maylead to collisions with primary users, and thus reduce theincentives for the licensee to share the spectrum. Thereforethe secondary operator needs to carefully design sensing andaccess algorithm to control the collision probability under anacceptable level. In dynamic spectrum leasing, a secondaryoperator acquires the exclusive right to use spectrum withina limited time period by paying the corresponding leasingprice. Thus the spectrum acquired by spectrum leasing is areliable resource. However, the cost can be high comparedto the spectrum sensing cost, and is dynamically changingaccording to the demand and supply relationship in the market.

In this paper, we will consider a hybrid model, where asecondary operator obtains resources from the primary li-censees through both spectrum sensing and dynamic spectrumleasing, and provides services to the secondary unlicensedusers. Our study is motivated by [6], [7], in which theauthors introduced the new concept of Cognitive MobileVirtual Network Operator (C-MVNO). The C-MVNO is ageneralization of the existing business model of MVNO [8],which refers to the network operator who does not own alicensed frequency spectrum or even wireless infrastructure,but resells wireless services under its own brand name. TheMVNO business model has been very successful after morethan 10 years’ development, and there are more than 600MVNOs today [9], [10]. The C-MVNO model generalizes theMVNO model with DSA technologies, which allow the virtualoperator to obtain spectrum resources through both spectrumsensing and leasing. The C-MVNO model can be applied toa wild range of wireless scenarios. One example is the IEEE802.22 standard [11], which suggests that the cognitive radionetwork using white space in TV spectrum will operate ona point to multipoint basis (i.e., a base station to customer-premises equipments). Such a secondary base station can be

2

operated by a C-MVNO.The key difference between our work and the ones in

[6], [7] is that we study a much more realistic dynamicnetwork in this paper. In [6], [7], the authors formulated theproblem based on a static network scenario, and providedinteresting equilibrium results through a one-shot Stackelberggame. However, the real network is highly dynamic. Forexample, users arrive and leave the systems randomly, thestatistics of spectrum availability changes over time, and thespectrum-sensing results are imperfect. Also the leasing priceis often unpredictable and changing from time to time. Thesedynamics and realistic concerns make the network model andthe corresponding analysis rather challenging.

In this paper, we focus on the profit maximization problemfor C-MVNO in a dynamic network scenario. Our key resultsand contributions are summarized as follows.

• A dynamic network decision model: Our model incor-porates various key dynamic aspects of a cognitive radionetwork and the dynamic decision process of a C-MVNO.We model sensing channel availability, leasing marketprice, and channel conditions as exogenous stochasticprocesses.

• Dynamic user demands: We allow users to dynamicallyjoin the network with random demands (file sizes). Thedemand is affected by both the transmission prices (deci-sion variables) and market states (exogenous stochastics).

• Realistic cognitive radio model: We incorporate variouspractical issues such as imperfect spectrum sensing, pri-mary users’ collision tolerance, and sensing technologyselection. The operator needs to choose a sensing tech-nology to trade-off between cost and performance.

• A low-complexity on-line control policy: By exploitingthe special structure of the problem, we design a low-complexity on-line pricing and resource allocation policy,which can achieve arbitrarily close to the operator’soptimal profit. The policy does not require precise in-formation of the dynamic network parameters, has a lowsystem overhead, and is easy to implement.

The remainder of the paper is organized as follows. InSection II, we introduce the related work. In Section 3, weintroduce the system model. Section 4 describes the problemformulation. In Section 5, we propose the profit maximizationcontrol (PMC) policy for homogeneous users and analyze itsperformance. We further extend profit maximization controlpolicy (M-PMC policy) to heterogeneous users in Section 6.Section 7 provides simulation results for both PMC and M-PMC polices. Finally, we conclude the paper in Section 8.

2 RELATED WORKAmong the vast literature on cognitive radio, we will focuson the results on operator-oriented cognitive radio networks,where secondary operators play key roles in terms of coordi-nating the transmissions of the secondary users. These studiesonly started to emerge recently, e.g., [6], [7], [12]–[21]. Wecan further classify these studies into two clusters: monopolymodels with one operator, and oligopoly models with multipleoperators.

References [6], [7], [12], [13] studied monopoly modelsusing the Stackelberg game formulation. Daoud et al. in [12]proposed a profit-maximizing pricing strategy for uplink powercontrol problem in wide-band cognitive radio networks. Yu etal. in [13] proposed a pricing scheme that can guarantee afair and efficient power allocation among the secondary users.

References [14]–[21] looked at the oligopoly issues, eitherbetween two operators [14], [15] or among many operators[16]–[21]. For the case of two operators, Jia and Zhang in[14] proposed a non-cooperative two-stage game model tostudy the duopoly competition. Duan et al. in [15] formulatedthe economic interaction among the spectrum owner, twosecondary operators and the users as a three-stage game. Forthe case of many operators, Ileri et al. in [16] developed anon-cooperative game to model competition of operators in amixed commons/property-rights regime under the regulation ofa spectrum policy server. Elias and Martignon in [17] showedthat polynomial pricing functions lead to unique and efficientNash equilibrium for the two-stage Stackelberg game betweennetwork operators and secondary users. Niyato et al. in [18]formulated an evolutionary game for modeling the dynamics ofa multiple-seller, multiple-buyer spectrum trading market. Inaddition, several auction mechanisms were proposed to studythe investment problems of cognitive network operators (e.g.,[19]–[21]).

All results mentioned above considered a rather staticnetwork model. In contrast, our work adopts a dynamicnetwork model to characterize the stochastic nature of wirelessnetworks. We will focus on a monopoly model in this paper.

In this paper, we use Lyapunov stochastic optimizationto show the optimality and stability of the proposed profitmaximizing control algorithms. Several closely related pre-vious results applying Lyaunov stochastic optimization towireless networks include [22]–[24]. Huang and Neely in [22]considered revenue maximization problem for a conventionalwireless access point without considering the cognitive radiotechnologies. Urgaonkar and Neely in [23] and Lotfinezhadet al. in [24] studied cognitive radio networks based ona user-oriented approach, by designing joint scheduling andresource allocation algorithms to maximize the utility of agroup of secondary users. Our paper focused on an operator-oriented approach to address profit maximization problem. Inparticular, we need to deal with the combinatorial problem ofchannel selection and channel assignment that usually leads toa high computational complexity. By discovering and utilizingthe special problem structure, we design a low-complexityalgorithm that is suitable for online implementation.

3 SYSTEM MODELConsider a C-MVNO that provides wireless communicationsservices to its own secondary users by acquiring spectrumresource from some spectrum owner. For example, Googlemay acquire spectrum from AT&T to provide its own wirelessservices through the C-MVNO model. The spectrum owner’sspectrum can be divided into two types: the sensing band andthe leasing band. In the sensing band, AT&T serves its ownprimary users, but allows Google to identify available spec-trum in this band through spectrum sensing without explicit

3

communications with AT&T. In the leasing band, AT&T willdoes not allow spectrum sensing, and will lease the band toGoogle for economic returns.

More specifically, we consider a time-slotted OFDM sys-tem, where the C-MVNO serves the downlink transmissionsfrom its base station to the secondary users. The system modelis illustrated in Fig. 1. Secondary users randomly arrive atthe secondary network and request files with random sizesto be downloaded from the base station. This requested filesare queued at the server in the base station until they aresuccessfully transmitted to the requesting users.

! " # $ % & ' ( ) !" !! !# !$ ! # $ % & ' ( )

!"#$%#&'()#*'+,%-.'/011%$%0#'203#*4 5")$%#&'()#*'+,%-.'*6#)7%/'

7)89"-':8%/"4

! " # % & ( ) !" !! !$ # $ ' )

*+,-.,/01+2.-.3, 4+5-.,/01+2.-.3,

;3"3"'0<'-8)#$7%$$%0#':)/9"-$=

>6#)7%/'3$"8$',%-.'<%1"$',%-.'8)#*07'1"#&-.$

!"8?"8='

,%8"1"$$'/.)##"1$

!"##$%$"&

*6+2789:0;34+ <,5=5.4>4+0>5,1

?.--0

@+7+27.3,

A54-+0B458:

CD5-7+1E

*+,-.,/0

-757+

F;5,,+40

-757+

( )l tB( )s tB

maxs

B maxl

B

Fig. 1. Business model of the operator (Cognitive VirtualNetwork Operator).

The rest of the section introduces each part of the systemmodel in more details. The C-MVNO (or “operator” forsimplicity) obtains wireless channels through spectrum sensing(Sections 3.1 and 3.2) and spectrum leasing (Section 3.3), andallocates power over the obtained channels (Section 3.4). Sec-ondary users dynamically arrive and request file downloadingservices (based on the demand model in Section 3.5), and wemodel the requests as a queue (Section 3.6).

3.1 Imperfect Spectrum SensingSensing band Bs

max!= {1, . . . , Bs

max} includes all channelsthat the spectrum owner allows sensing by the operator.1 Wedefine the state of a channel i ! Bs

max(t) in time slot t asSi(t), which equals 0 if channel i is busy (being used by aprimary user), and equals 1 if channel i is idle.

We assume that Si(t) is an i.i.d. Bernoulli random variable,with an idle probability p0 ! (0, 1) and a busy probability1" p0. This approximates the reality well if the time slots forsecondary transmissions are sufficiently long or the primarytransmissions are highly bursty [27]. (We will further studythe general Markovian model in Section 5.5.) We define thesensing state of a channel i ! Bs

max in time slot t as Wi(t),which equals to 0 if channel i is sensed busy, and 1 if sensedidle.

1. The operator will collect the sensing information from a sensor networkor geolocation database and provide it to its users, i.e., providing “sensingas service” [25], [26]. This means that the network can accommodatelegacy mobile devices without cognitive radio capabilities. For more detaileddiscussions, see [7].

Notice that Wi(t) may not equal to Si(t) due to imperfectsensing. The accuracy of spectrum sensing depends on thesensing technology [28]. If we denote Cs as the sensing cost(per channel)2, then we can write the false alarm probability asPfa(Cs)

!= Pr{Wi = 0|Si = 1} (same for all channel i) and

the missed detection probability as Pmd(Cs)!= Pr{Wi =

1|Si = 0} (same for all channel i). Both functions aredecreasing in Cs. Intuitively, a better technology will havea higher cost Cs, a lower false alarm probability Pfa(Cs),and a lower missed detection probability Pmd(Cs). We denoteall choices of cost Cs (and thus the corresponding sensingtechnologies) by a finite set Cs.

As different channels have different conditions (to be ex-plained in details in Section 3.4), the operator needs to decidewhich channels to sense at the beginning of each time slot.We use Bs(t) to denote the set of channels sensed by theoperator at time t, which satisfies

Bs(t) # Bsmax, $ t. (1)

3.2 Collision ConstraintMissed detections in spectrum sensing lead to transmissioncollisions with the primary users. We denote the collision inchannel i ! Bs

max at time t as a binary random variableXi(t) ! {0, 1}. We have Xi(t)

!= (1 " Si(t))Wi(t), i.e., the

collision happens if and only if the channel is busy but issensed idle.

To protect primary users’ transmissions, the operator needsto ensure that the average collision in each channel i does notexceed a tolerable level !i (measured in terms of the averagenumber of collisions per unit time) specified by the spectrumowner. The tolerable level !i can be channel specific, since theprimary users in different channels may have different QoSrequirements. We define the time-average number of collisionin channel i as Xi

!= limt!"

1t

!t#1!=0 E [Xi(")]. The collision

constraints areXi % !i, $i ! Bs

max(t). (2)

3.3 Spectrum Leasing with a Dynamic Market PriceA spectrum owner may have some channels that do not want tobe sensed, for either privacy reasons or the fear of collisionsdue to sensing errors. However, these channels may not bealways fully utilized. The spectrum owner can lease the unusedpart of these channels to the operator dynamically over timeto earn more revenue. Recall that we denote the set of thesechannels as the leasing band Bl

max!= {1, . . . , Bl

max}. (Ingeneral, we may represent it as Bl

max(t), since our modelallows leasing band to be time-varying. For the simplicity ofnotations, we denote it as Bl

max whenever it is clear.) We useBl

i(t) to denote the set of channels leased by the operatorat time t, which satisfies

Bl(t) # Blmax, $ t. (3)

These channels will be exclusively used by the operator in thecurrent time slot. We denote the leasing price per channel as

2. The cost corresponds to, for example, power or time used for sensing.

4

Cl(t), which stochastically changes according to the supplyand demand relationship in the spectrum market (which mightinvolve many spectrum owners and operators). It can bemodeled by an exogenous (not affected by this particularoperator’s decisions) random process with countable discretestates and stationary distribution (not necessarily known bythe operator).

3.4 Power AllocationIn wireless network, there are usually channel fading dueto multipath propagation or shadowing from obstacles. Tocombat channel fading, it is necessary for the operator to doproper power allocation in both sensing channels and leasingchannels to achieve satisfactory data rates. For each channeli ! Bmax

!= Bs

max & Blmax, hi(t) represents its channel gain

in time slot t and follows an i.i.d. distribution over time.Different channels have independent and possibly differentchannel gain distributions. We assume that secondary usersare homogeneous and experience the same channel conditionfor the same channel. But channel conditions can be differentin different channels.3 (The heterogenous user scenario willbe further discussed in Section 6.) The operator can measurehi(t) for each i at the beginning of each slot t, but may notknow the distributions. Let Pi(t) denote the power allocatedto channel i at time t. Since we consider a downlink casehere, the operator needs to satisfy the total power constraintPmax at its base station,

"

i$Bmax

Pi(t) % Pmax, $t. (4)

In addition, for a channel i ! Bsmax in the sensing band,

we use the binary variable Ii(t)!= Si(t)Wi(t) to denote the

transmission result of a secondary user, i.e., Ii(t) = 1 ifsuccessful (i.e., Si(t) = 1 and Wi(t) = 1) and Ii(t) = 0otherwise (either not sensed, or sensed busy, or sensed idlebut actually busy). Based on the discussion of the leasingagreement, we have Ii(t) ' 1, i ! Bl

max for all channel inleasing band. Then the rate in channel i at time slot t is (basedon the Shannon formula)

ri(t) = Ii(t) log2(1 + hi(t)Pi(t)), (5)and total transmission rate obtained by the operator is

r(t) ="

i$Bs(t)%Bl(t)

ri(t). (6)

Furthermore, we assume that the operator has a finitemaximum transmission rate, i.e., r(t) % rmax, $t, under anyfeasible power allocation.

3.5 Demand ModelWe will focus on elastic data traffic in this paper. Secondaryusers randomly arrive at the network to request files withrandom and finite file sizes (measured in the number ofpackets) from the operator. A user will leave the networkonce it has downloaded the complete requested file. The

3. This is the case where the users are located close by, and thus thedownlink channel condition from the base station to the users is userindependent.

operator can price the packet transmission dynamically overtime, which will affect the users’ arrival rate. For example, ahigher price at peak time can refrain users from downloadingfiles, as they can wait until a later time with a lower price.To model this, we use M(t) to denote the random marketstate, which can be measured precisely at the beginning ofeach time slot t and can help estimate the users demand4. Therandom variable is drawn from a finite set M over time in ani.i.d. fashion. The distribution of M(t) may not be known bythe operator.

At a time t, the operator will decide whether to accept newfile downloading requests from newly arrived secondary users.We define the binary demand control variable as O(t),where O(t) = 1 means that the operator accepts the incomingrequests in time t, and O(t) = 0 otherwise. When the operatordecides to accept new requests of packet transmissions, it willalso announce a price q(t) for transmitting one packet(to any user). This price will affect the users’ incentives ofdownloading requests, e.g., when price q(t) is high, some usersmay choose to postpone their requests.

More precisely, we denote the number of incoming users attime t as a discrete random variable N(t)

!= N (M(t), q(t)) !

{0, 1, 2 . . .}, the distribution of which is a function of thetransmission price q(t) and market state M(t). Further, auser n’s requested file size is denoted Ln(t), with n !{1, 2, . . . , N(q(t))}, which is assumed to be independent ofeach other and does not depend on q(t) or M(t). Moreover,we assume that users are using a set K = {1, 2, . . . , K} ofdifferent applications, and denote #k as the probability that anincoming user is using application k ! K with

!Kk=1 #k = 1.

The distributions of the file length for different applicationscan be different, and we denote lk as the expected file lengthof application k ! K.

To summarize, users’ instantaneous demand at time t is

A(t)!=

N(M(t),q(t))"

n=1

Ln(t), (7)

which is a random variable due to random file sizes andthe random number of incoming users (even given q(t) andM(t)). We define the users’ (expected) demand functionas D(t)

!= D (M(t), q(t))

!= E [A (M(t), q(t))], and its

value is completely determined by M(t) and q(t). We cancalculate that D(M(t), q(t)) = E [N(M(t), q(t))]

!k$K #klk.

Then it is reasonable to assume that the operator can ratheraccurately characterize the expected number of incoming usersE [N(M(t), q(t))] through long-term observations. Thus thedemand function D(M(t), q(t)) is known by the operator.We further assume that the instantaneous demand is upper-bounded as A(t) % Amax for all t, and that the demandfunction D(t) is non-negative and non-increasing function ofthe price q(t). When the price is higher than some upper-bound, i.e., q(t) ( qmax, the demand function D(t) will bezero. The optimization of O(t) and q(t) based on the demandfunction will be further discussed in Section 5.2.1.

4. For example, M(t) can be users’ willingness to pays, or whether thesystem is in peak time or off-peak time.

5

3.6 Queuing dynamicsSince we focus on the profit maximization problem in thispaper, we will take a simple view of the network and modelusers’ dynamic arrivals and departures as a single server queue.When a user accesses the network, the corresponding filewill be queued in a server at the base station, waiting tobe transmitted to the user according to the First Come FirstServe (FCFS) discipline. Shama and Lin in [29] showed thatthe single server queue model is a good approximation foran OFDM system, especially when the number of users andchannels are large.

We denote the queue length (i.e., the backlog, or thenumber of all packets from all queued files) at time t as Q(t).Thus the queuing dynamic can be written as

Q(t + 1) =#Q(t)" r(t)

$++ O(t)A(t), (8)

where (a)+!= max(a, 0), r(t) and A(t) are the transmission

rate and incoming rate at time t, and O(t) is the binary demandcontrol variable (i.e., O(t) = 1 means the operator admit theusers’ transmission requests at time t). Throughout the paper,we adopt the following notion of queue stability:

Q!= lim sup

t!"

1

t

t#1"

!=0

E [Q(")] <). (9)

4 PROBLEM FORMULATIONFor notation convenience, we introduce several condensednotations and use them together with the original notations.

We define $(t)!= (M(t), h(t), Cl(t)) as observable pa-

rameters, including the market state M(t), channel conditions(vector) h(t), and the leasing price Cl(t) in the spectrummarket. Based on previous assumptions, $(t)’s are i.i.d overtime and take values from a finite set !.

We define %(t)!= (O(t), q(t), Cs(t),Bs(t),Bl(t), P (t)) as

decision variables, including the demand control variableO(t), the transmission price for users q(t), the sensing cost(with the corresponding sensing technology) Cs(t), the set ofsensing channels Bs(t), the set of leasing channels Bl(t), andpower allocations (vector) P (t) of the operator. We assumethat %(t) takes values form a countable (finite or infinite) set""(t), which is a Cartesian product of the feasible regionsof all variables, i.e., non-negative values satisfying constraints(1), (3), and (4). With the condensed notations, functions inthis paper can be simply represented as functions of %(t) withparameter $(t).

We further define the instantaneous profit in time t

R(t)!= R(%(t); $(t))!= q(t)O(t)A(t) " Cs(t)|Bs(t)|" Cl(t)|Bl(t)|. (10)

The time average profit is denoted as

R!= lim sup

t!"

1

t

t#1"

!=0

E [R(t)].

All expectations in this paper are taken with respect to systemparameters $(t) unless stated otherwise.

We look at the profit maximization problem through pricingdetermination and resource allocations. At the beginning of

each time slot t, the operator observes the value of $(t) andmakes a decision %(t) to maximize the time average profit,subject to the system stability constraint (11) and the collisionupper-bound requirement (12). The Profit Maximization (PM)problem is formulated as

PM: Maximize R

Subject to Q <), (11)Xi % !i, i ! Bs

max, (12)Variables %(t) ! ""(t), $t,

Parameters $(t), $t.

We represent its optimal solution as %&(t) =%O&(t), q&(t), Cs&(t),Bs&(t),Bl&(t), P &(t)

&, and denote

R& as the maximum profit. The PM problem is an infinite

horizon stochastic optimization problem, which is in generalhard to solve directly, especially when the distribution ofdynamic parameter $(t) is unknown. For example, the futureleasing price is hard to predict due to the dynamic suppliesand demands in the market; and the primary users’ activitiescan not be estimated precisely before hand.

5 PROFIT MAXIMIZATION CONTROL POLICYNow, we adopt Lyapunov stochastic optimization technique tosolve the PM problem.

5.1 Lyapunov stochastic optimizationWe first introduce a virtual queue for constraint (12), and thenderive the optimal control policy to solve the PM problemthrough the technique of drift-plus-penalty function minimiza-tion [30].

We denote Zi(t) as the number of collisions happeningin sensing channel i ! Bs

max. The counter Zi(t) can beunderstood as a “virtual queue”, in which the incoming rate isXi(t), and the serving rate is !i (the collision tolerant level).The queue dynamic is

Zi(t + 1) =#Zi(t)" !i

$++ Xi(t), (13)

with Zi(0) = 0. By this notion, if the virtual queue is stable,then it implies that the average incoming rate is no larger thanthe average serving rate. This is just the same as the collisionupper-bound constraint (12).

We introduce the general queue length vector !(t)!=

{Q(t), Z(t)}. We then define the Lyapunov function

L(!(t))!=

1

2

'Q(t)2 +

"

i$Bsmax

Zi(t)2(,

and the Lyapunov drift

#(!(t))!= E

'L(!(t + 1))" L(!(t))|!(t)

(. (14)

According to the Lyapunov stochastic optimization tech-nique, we can obtain instantaneous control policy that cansolve the PM problem though minimizing some upper boundof the following drift-plus-penalty function in every slot t:

#(!(t))" V E [R(t)|!(t)]. (15)There are two terms in the above function. The first term isthe Lyapunov drift defined in (14). It is shown by Lyapunov

6

stochastic optimization [30] that we can achieve the systemstabilities (i.e., constraints (11) and (12) of the PM problem)by showing the existence of a constant upper bound for thedrift function. The second term in (15) is just the objective ofthe PM problem, i.e., to minimize the minus profit, whichis equivalent to maximize the profit. Here parameter V isintroduced to achieve the desired tradeoff between profit andqueuing delay in the control policy. We first find an upperbound for (15).

By the queue dynamic (8), we have

Q(t+1)2 % (Q(t)"r(t))2+A(t)2+2Q(t)O(t)A(t)

= Q(t)2+r(t)2 + A(t)2+2Q(t)(O(t)A(t)"r(t)) . (16)

Similarly, for virtual queue (13), we have

Zi(t+1)2 % Zi(t)2+!2

i +Xi(t)2+2Zi(t)(Xi(t)"!i). (17)

Substituting (16) and (17) into (15), we have

#(!(t))" V E [R(t)|!(t)] % D "D1(t)

" V E)q(t)O(t)A(t)"Cs(t)|Bs(t)|"Cl(t)|Bl(t)||!(t)

*

+ Q(t)E [O(t)A(t) " r(t)|!(t)] +"

i$Bsmax

Zi(t)E [Xi(t)|!(t)]

where D is a positive constant satisfying the following condi-tion for all t,

D(1

2

+

,E

'r(t)2+(O(t)A(t))2|!(t)

(+"

i$Bsmax

E

'Xi(t)

2+!2i |!(t)

(-

.,

and D1(t)!=!

i$Bsmax

Zi(t)!i is a known constant at time t,since the values of Zi(t)s are known at time t.

To further simplify the above expression, we introduce twonew notations: channel selection B(t)

!= Bs(t) & Bl(t), and

channel cost

Ci(t)!=

/0Cs(t), if i ! Bs(t),

Cl(t), if i ! Bl(t),(18)

where 0Cs(t) is the virtual sensing cost and is defined as0Cs(t)

!= Cs(t) + (1/V )Zi(t)E [Xi(t)|!(t)]. Note that this

virtual sensing cost depends not only on the sensing cost butalso on the collision history in this channel. More frequent pastcollisions in this channel will increase the virtual sensing cost,hence makes the operator more conservative about choosingthis sensing channel. It then follows:

#(!(t))" V E [R(t)|!(t)] % D "D1(t)

+ V E

12Q(t)

V" q(t)

3O(t)A(t)

4444!(t)

5

+ V E

6

7"

i$B(t)

Ci(t)"Q(t)ri(t)

V

444444!(t)

8

9 , (19)

where we use the fact that collisions between secondary andprimary users can only happen in channels that are chosenfor sensing, i.e., i ! Bs(t). Next we propose the ProfitMaximization Control (PMC) policy to minimize the righthand side of inequality (19) for each time t.

5.2 Profit Maximization Control (PMC) policyIt is clear that minimizing the right hand side of (19) isequivalent to minimizing the last two terms in (19). Note thatthe last two terms are decouple in decision variables, thus wehave the two parallel parts in the PMC policy as follows:

5.2.1 Revenue MaximizationHere we determine two variables: the transmission price q(t)and the market control decision O(t). The optimal transmis-sion price q(t) is obtained by solving the following revenuemaximization problem:

Maximize q(t)D (q(t), M(t))"Q(t)

VD (q(t), M(t)) (20)

Variables q(t) ( 0

To obtain the above problem formulation of revenuemaximization, we use the fact that the demand functionD(M(t), q(t))

!= E [A(t)], which is independent of the queu-

ing states of the system.Note that the first term in (20) is just the revenue that

the operator collects from its users. The second term can beviewed as a shift of the queuing effect, which is introducedby the Lyapunov drift for system stability.

If the maximum objective in (20) (under the optimal choiceof q(t)) is positive, the operator sets the demand controlvariable O(t) = 1 and accepts the present incoming requestsA(t) at the price q(t). Otherwise, the operator sets O(t) = 0and rejects any new requests.

5.2.2 Cost MinimizationWe determine channels selection B(t), sensing technology(or cost) Cs(t), and power allocation P (t), by solving thefollowing optimization problem to control the costs of theoperator to provide transmission services to its users.

Minimize"

i$B(t)

Ci(t)"Q(t)

VE [ri(t)|!] (21)

Subject to (1), (3), (4)

Variables Cs(t),Bs(t),Bl(t), Pi(t) ( 0

To obtain the above problem formulation of cost minimiza-tion, we use the fact that Xi(t) = (1 " Si(t))Wi(t), whichis independent of the queuing state. Thus the virtual sensingcost can be updated as 0Cs(t) = Cs(t) + (1/V )Zi(t)(1 "p0)Pmd(Cs(t)), which increases with the virtual queue andmissed detection probability.

Note that the first term in the summation in (21) is thecost of each channel. The second term in the summation is aqueuing-weighted expected transmission rate, again is a shiftintroduced by Lyapunov drift for system stability. This shiftcan be also viewed as the “gain” collected from the channelto help clear the queue.

5.2.3 Intuitions behind the PMC policyWe discuss some intuitions behind the PMC policy. Tomaximize the profit, the operator needs to perform revenuemaximization and cost minimization. To guarantee the queuingstability, some shifts (i.e., all queue-related terms) are intro-duced by the Lyapunov drift in these problems. In Appendix

7

Section C of our technical report [31], we show that thequeueing effect will increase the optimal price announced bythe operator (comparing with not considering queueing), as ahigher price will reduce the users’ demands and maintain thesystem stability.

The Lyapunov stochastic optimization approach providesa way to decompose a long-term average goal (e.g., thePM problem) into instantaneous optimization problems (e.g.,revenue maximization and cost minimization problems in thePMC policy). In the stochastic optimization problem, thecurrent decisions always have impacts on the future prob-lems. These impacts are characterized and incorporated bythe queueing shift terms in the instantaneous optimizationproblems. Therefore, we can achieve the long term goalthrough focusing on the instantaneous decisions in every timeslot. The flowchart for the PMC policy is illustrated in Fig. 2.

Revenge Maximization:

Determine price ! "and market control strategy # $"% by

solving (21)

Updating:

" ← " ' 1

Update queues ) " and* $"% by (8) and (13)

Initiation: ) " + 0 and * " + 0

Power allocation:Compute - " by

Algorithm 3

sensing results .$"%

Sensing cost selection & channel selection:Compute / " , 0 " , 0 " by Algorithm 1

! "#$%&

' "#$%&

Cost Minimization: Profit Maximization Control (PMC) Policy:

Fig. 2. Flowchart of the dynamic PMC policy

Although the revenue maximization problem is relativelyeasy to solve, the cost minimization problem is very com-plicated. It is actually a two-stage decision problem. In thefirst stage, the operator determines the sensing technology,and chooses which channels to sense and which channels tolease. Then spectrum sensing is performed to identify availablechannels. With this information, the operator further allocatesdownlink transmission power in the available channels (sensedidle ones and leasing ones). In Section 5.3, we focus ondesigning algorithms to solve the cost minimization problem.

5.3 Algorithms for Cost Minimization ProblemNow we use backward induction to solve the cost mimmationproblem.

5.3.1 The Second Stage ProblemWe first analyze the power allocation in the second stage,where the sensing results Wi(t), the channel selection Bs(t),and the sensing technology Cs(t) have been determined.Therefore, the power allocation problem of (21) is as follows:

Maximize"

i$B(t)

&i(t) log (1 + hi(t)Pi(t)) (22)

Subject to"

i$Bmax

Pi(t) % Pmax,

Variables Pi(t) ( 0

where

&i(t)!=

/&0

!= E [Si(t)|Wi(t) = 1], if i ! Bs(t),

1, if i ! Bl(t).(23)

and we can calculateE [Si(t)|Wi(t) = 1]

=p0(1" Pfa(Cs(t)))

p0(1" Pfa(Cs(t))) + (1" p0)Pmd(Cs(t)). (24)

By using the Lagrange duality theory, we can show that theproblem (22) has the following optimal solution

P &i (t) =

:;

<&i(t)

#1

#(t) "1

$i(t)hi(t)

$+i ! B(t),

0 i /! B(t),(25)

where '(t) is the Lagrange multiplier of the total powerconstraint (4). The optimal value of '(t) is the following waterfilling solution,

'(t) =

!i$Bp(t) &i(t)

Pmax +!

i$Bp(t)1

hi(t)

, (26)

where Bp(t)!= {i ! B(t) : Pi(t) > 0}. Note that (26) is a

fixed-point equation of '(t), and the precise value of '(t) isnot given here.

When the values of all parameters (i.e., hi(t), &i(t)) aregiven, we can use a simple water level searching Algorithm 1and similar as the searching algorithms in [32], [33]) todetermine the exact optimal value of '(t). In the followingpseudo code of Algorithm 1, we define a function $(m) asfollows:

$(m)!=

!mi=1 &i

Pmax +!m

i=11hi

.

Algorithm 1 Power Allocation1: Rearrange the channel indices i ! Bmax as a decreasing

order of &i(t)hi(t)2: m* |B(t)|, '+ $(m)3: while ' ( hm(t)&m(t) do4: m* m" 15: '+ $(m)6: end while

The main complexity in this algorithm is to sort the channelsaccording to the channel gains. We can adopt established sort-ing algorithms [34] to obtain the index rearrangement with acomplexity O(|Bmax| log(|Bmax|)). Thus the total complexityof Algorithm 1 is O(|Bmax|3 log(|Bmax|)).

5.3.2 The First Stage ProblemLet us consider the first stage problem to determine the sensingtechnology, the sensing set, and the leasing set. Note that sincethe sensing has not been performed at this stage yet, thus thesensing result Wi(t) is not known. We denote

(i(t) =

/(0

!= E [Si(t)Wi(t)] if i ! Bs(t)

1 if i ! Bl(t),(27)

and we can calculateE [Si(t)Wi(t)] = p0(1 " Pfa(Cs(t))) (28)

8

Substitute the optimal power allocation (25) into the prob-lem (21), we have

Minimize"

i$B(t)

Ci(t)"Q(t)

V(i(t)

2log

2hi(t)&i(t)

'(t)

33+

(29)

Subject to Bs(t) # Bsmax, Bl(t) # Bl

max, Cs(t) ! C

Variables Bs(t),Bl(t), Cs(t)

We first consider the above problem for a fixed sensing costCs(t). This problem is a combinatorial optimization problemof Bs(t) and Bs(t). The worst case of searching complexity(i.e., exhaustive searching) can be O(2|Bmax|), exponential inthe number of total channels.

However, we can reduce the complexity by exploring thespecial structure of this problem.

Proposition 1: (Threshold Property)• We rearrange the leasing channel indices i ! Bl

max in thedecreasing order of gi(t), which is defined as

gi(t)!= hi(t) exp

=

"Ci(t)Q(t)V

>

. (30)

There exists a threshold index ilth, such that a channel i ischosen for leasing (i.e., i ! Bl(t)) if and only if i % ilth.

• We rearrange the sensing channel indices j ! Bsmax in

the decreasing order of gj(t), which is defined as

gj(t)!= &0hj(t) exp

=

"Cj(t)Q(t)V

(0

>

. (31)

For all leasing channels j ! Bsmax, there exists a threshold

index jsth, such that a channel j is chosen for sensing (i.e.,

j ! Bs(t)) if and only if j % jsth.

Proof: For each channel in the optimal channel selectionset i ! B&(t), it satisfies the following condition

Ci(t)%Q(t)

V(i(t)

2log

2&i(t)hi(t)

'(t)

33+

. (32)

This result is easy to see from the objective function in (29):to optimize the profit, we should only pick the channel withits cost no larger than its gain. Thus by (30) and (31), theoptimization problem in (29) can be written in the followingequivalent form:

Maximize"

i$Bl(t)

2log

2gi(t)

'(t)

33+

+(0

"

j$Bs(t)

2log

2gj(t)

'(t)

33+

(33)

Subject to Bs(t) # Bsmax, Bl(t) # Bl

max

Variables Cs(t),Bs(t),Bl(t)

Thus by the log function in the objective of (33), the thresholdproperty immediately follows.

This proposition suggests that we should select the channelwith a large gi (for leasing channels) or gj (for sensingchannels). Note that as defined in (30) and (31), gi and gj

are equal to channel information (i.e., hi for leasing channels,&jhj for sensing channels) multiplying a decaying factorrelated to the channel cost. They can be understood as virtualchannel gains by taking channel costs into consideration. Alarge value of gi or gj means that the channel is cost-effective,i.e., the channel has a good channel gain as well as a low cost.

By Proposition 1, it is clear that we can obtain the optimalchannel selection by an exhaustive search of the optimalsensing and leasing thresholds. Algorithm 2 gives a pseudocode for the searching procedure.

Algorithm 2 Optimal Channel Selection (for a given Cs(t))1: procedure COMPUTING B(Cs(t))2: invoke procedure SearchingThreshold(Cs) to calculate

%l and %s

3: U(Cs(t))* 04: for i = 0, 1 . . . , %l do5: for j = 0, 1 . . . , %s do6: Calculate '(t) as (26) with Bp(t) = Bl

i & Bsj

7: if gi(t) > '(t) and gj(t) > '(t) then8: Calculate U(i, j)9: if U(i, j) < U(Cs(t)) then

10: U(Cs(t))* U(i, j)11: B(Cs(t))* Bl

i & Bsj

12: end if13: end if14: end for15: end for16: end procedure

In Algorithm 2, U(i, j) denotes the optimal value of (33)with the channel selection set B = Bl

i & Bsj . To decrease the

number of searching loops, we can first run Algorithm 5 (inAppendix C) to determine the maximum possible thresholds%l for leasing channels or %s for sensing channels. (If we donot run Algorithm 5 , we can just set %l = |Bl

max| and %s =|Bs

max|. Whether we run Algorithm 5 or not, the complexityof Algorithm 2 is no worse than O(|Bs

max| , |Blmax|).) Thus

the searching complexity is reduced to O(|Bmax|2), given thechannel indices are rearranged as in the Proposition 1. We canadopt established sorting algorithms [34] to obtain the indexrearrangement with a complexity O(|Bmax| log(|Bmax|)). Thusthe total complexity of finding the optimal channel selectionis O(|Bmax|3 log(|Bmax|)).

Note that in real systems, the channel conditions and theleasing cost may not change as frequently as every time slot.We usually can update these network parameters every timeframe (which is composed by several time slots instead ofone time slot). Accordingly, the above algorithm will also beoperated based on the time frames, which will greatly reducethe computation complexity in practice.

Furthermore, let us find the optimal sensing cost Cs(t) byenumerating all possible sensing costs Cs(t) ! Cs. For thesensing cost Cs(t), we denote the objective value in (33) asU(Cs(t)) and the optimal channel selection set as B(Cs(t)).The corresponding pseudo code is given in Algorithm 3, thecomplexity of which is O(|C|, |Bmax|3 log(|Bmax|)).

So far, we have completely solved the two-stage optimiza-tion problem in (21). For each time t, the operator first runs Al-gorithm 3 to choose the channel sets B&(t) = Bs&(t)&Bl&(t).Then it uses the sensing technology with a cost Cs&(t) tosense channels in Bs&(t). Based on the sensing results, itfurther runs Algorithm 1 to determine the power allocationP &

i (t), i ! B&(t).

9

Algorithm 3 Optimal Sensing Cost and Channel Selection1: U& * 02: for Cs(t) ! Cs do3: Determine the optimal channel selection B(Cs(t)) (see

Algorithm 2)4: Calculate U(Cs(t))5: if U& > U then6: U& * U , Cs&(t)* Cs(t), B&(t)* B(Cs(t))7: end if8: end for

5.3.3 Sensing vs. LeasingWe are also interested in how the PMC policy makes the besttradeoff between sensing and leasing based on the sensing costCs(t) and the leasing cost Cl(t). To make the comparison easyto understand, we will consider perfect sensing with no sensingerrors (i.e., &0 = 1 and (0 = p0). We will further assumethat a leasing channel i and a sensing channel j have thesame channel gain hi(t) = hj(t). Finally, we assume that twochannels have the same availability-price-ratio, i.e., the costssatisfy Cs(t) = (0Cl(t). We want to answer the followingquestion: is the PMC policy indifferent in choosing either ofthe two channel?

By (30) and (31), we have gi = gj for these two channels.By (33), we can calculate the net gains by channel i andj:#log#

gi(t)#(t)

$$+( (0

#log#

gj(t)#(t)

$$+. To maximize the

objective in (33), it is clear that PMC policy will prefer theleasing channel i over the sensing channel j, and this tendencyincreases as (0 decreases. If we view the channel unavailabil-ity (1"(0) as the risk of choosing the sensing channel, thenthe PMC policy is a risk averse one. This is mainly due tothe concavity of the rate function. This preference order willalso hold in the imperfect sensing case, in which case we willhave gi > gj and

#log#

gi(t)#(t)

$$+( (0

#log#

gj(t)#(t)

$$+.

5.4 Performance of the PMC PolicyWe can characterize the performance of the PMC Policy asfollows:

Theorem 1: For any positive value V , the PMC Policy hasthe following properties:(a) The queue stability (11) and collision constraints (12) are

satisfied. The queue length is upper bounded by

Q(t) % Qmax!= V qmax+Amax, $ t; (34)

and the virtual queue length is upper-bounded by

Zi(t) % Zmax!= )(V qmax + Amax) + 1, $ i, t. (35)

where)

!=

rmaxp0(1" Pfa(Cs0)))

(1" p0)Pmd(Cs0)

and Cs0 != max

Cs

p0(1#Pfa(Cs)))(1#p0)Pmd(Cs) .

(b) The average profit RPMC obtained by the PMC policysatisfies

inf RPMC ( R&"O(1/V ), (36)

where R& is the optimal value of the PM problem.

According to the Little’s law, the average queuing delayis proportional to the queue length. Thus users experiencebounded queuing delays under the PMC algorithm by (34).By (36), we find that the profit obtained by the PMC Policycan be made closer to the optimal profit by increasing V .However, as V increases, the queuing delay also increases asshown in (34). The best choice of V depends on the desiredtrade-off between queuing delay and profit optimality.

A detailed proof of Theorem 1 is provided in Appendices Aand B.

5.5 Extension: More General Model of Primary Ac-tivitiesIn the previous analysis, we have assumed that primaryusers’ activities in each sensing channel follow a simple i.i.d.Bernoulli random process. Next we will show that the PMCpolicy can be easily adapted to the more general Markovchain model of the primary users’ activities shown in Fig. 3.In this model, for any time t, Si(t) is unknown, but thehistory information Si(t"1) is known, and also the transitionprobabilities Pr(Si(t) = s'|Si(t " 1) = s)

!= pi

s!s! , s !{0, 1}, s' ! {0, 1}, i ! Bs

max are known from long-timestatistics.

! "

p1! 0

p0! 1

p0! 0p1! 1

Fig. 3. Markov chain model of the PUs’ activities

All previous analysis for PMC policy will still hold if weupdate two parameters &i(t) and (i(t) as follows:

&i(t)!=

/E [Si(t)|Wi(t) = 1, Si(t" 1)], if i ! Bs(t)

1, if i ! Bl(t)

whereE [Si(t)|Wi(t) = 1, Si(t" 1) = s]

=pi

s!1(1" Pfa(Cs(t)))

pis!1(1" Pfa(Cs(t))) + ps!0Pmd(Cs(t))

;

and

(i(t) =

/E [Si(t)Wi(t)|Si(t" 1)] if i ! Bs(t)

1 if i ! Bl(t)

whereE [Si(t)Wi(t)|Si(t" 1) = s] = pi

s!1(1" Pfa(Cs(t))).

There is no change in the revenue maximization part,and the cost minimization part still involves a combinatorialoptimization problem. But the complexity of solving costminimization problem becomes O(|C| , 2|B

smax

||Blmax + 1|),

since we lose the structure information in sensing channels,i.e., the threshold structure does not hold for sensing channels.In the worst case (Bmax = Bs

max, Blmax = - ), it comes back

to O(|C|,2|Bmax|), which is the complexity of the exhaustivesearch without considering the threshold structure.

10

A C-MVNO

Fig. 4. Heterogeneous user model: Users in a hexagonsare nearby homogeneous users, who have the samechannel experience. Users in deferent hexagons can havedifferent channel experience.

Let us further consider a special Markov chain model wherethe transition probability for each sensing channel is the same,i.e., Pr(Si(t) = s'|Si(t"1) = s)

!= ps!s! , $i ! Bs

max. In thismodel, all sensing channels can be categorized into two types,channels being busy in the last slot (i.e., Si(t " 1) = 0), orchannels being idle in the last slot (i.e., Si(t " 1) = 1). Wecan still show threshold structures for both types. Thus thecomplexity is reduced to O(|C|, |Bmax|4 log(|Bmax|)).

The above analysis shows that it is critical to exploit theproblem structure to reduce the algorithm complexity.

6 HETEROGENEOUS USERSIn Section 4, we adopt the single queue analysis for homoge-neous users who are assumed to be located nearby and havethe same channel condition on each channel. However, thesingle queue analysis no longer works for a more generalscenario of heterogeneous users, where users can be locatedat different places, and have different channel conditions. Inthis section, we introduce the multi-queue model to deal withthe heterogenous user scenario as shown in Fig. 4.

We divide the total coverage of the secondary base stationinto J

!= {1, 2, . . . , J} disjoint small areas (illustrated as

hexagons in Fig. 4) according to users’ different channelexperiences. Users in one of these small area are nearbyhomogeneous users. They share the same channel conditions,and form a queue based on the FCFS discipline. We use Qj(t)to denote the queue length in area j. Since the queue and thecorresponding area is one-to-one mapping, we also call theusers in area j as queue j users.

For each queue j ! J , hij(t) represents the users’ channelgain to channel i ! Bmax at time t, which follows an i.i.ddistribution over time. The indicator variable Tij(t) ! {0, 1}denotes the operator’s channel assignment at time t:Tij(t) = 1 if channel i is allocated to queue j, and Tij(t) = 0otherwise. Meanwhile, the assignment Tij must satisfy

"

j$J

Tij(t) % 1, $t. (37)

The power allocation for queue j on channel i is denotedby Pij(t). The total power allocation must satisfy

"

j$J

"

i$Bmax

Pij(t) % Pmax, $t, (38)

Thus the rate of queue j ! J can be calculated

rj(t) ="

i$B(t)

Ii(t)Tij(t) log

21 +

hij(t)Pij(t)

Tij(t)

3(39)

where Ii(t) is the transmission result in channel i, followingthe same definition in Section 3.4.

For each queue j, we follow the same demand model as inSection 3.5 for homogenous users. We assume the number ofincoming users, the market state, and the user’s instantaneousdemand are i.i.d among different queues, and denote themas Nj(t), Mj(t), and Aj(t) respectively. The market controlvariable and price for queue j are denoted as Oj(t) and qj(t).The queuing dynamic for queue j is as follows:

Qj(t + 1) = (Qj(t)" rj(t))+ + Oj(t)Aj(t). (40)

Thus the homogeneous user model in Section 4 can also beviewed as a special case of the heterogeneous user model,where J is a singleton.

6.1 Multi-queue Profit Maximization Control Policy6.1.1 Revenue MaximizationFor any queue j ! J , we compute the optimal transmissionprice q&j (t) by solving the following problem.

Maximize2

qj(t)"Qj(t)

V

3D (qj(t), Mj(t)) (41)

Variables qj(t) ( 0

If the maximum objective in (41) is positive, the operator setstransmission control variable O&

j (t) = 1 and accepts users’new file download requests at the price q&j (t). Otherwise, theoperator sets O&

j (t) = 0 and rejects any new requests.

6.1.2 Cost MinimizationWe solve the following optimization problem to determinesensing cost and resource allocation:

Minimize"

i$B(t)

Ci(t)""

j$J

Qj(t)

VE [rj(t)] (42)

Subject to (1), (3), (38), (37)

Variables Cs(t),Bs(t),Bl(t), Pij(t) ( 0, Tij(t) ! {0, 1}

where the cost Ci(t) follows the same definition of (18).Similar to the homogenous model in Section5.3, it is a two-stage decision problem. In the first stage, we determine thesensing technology, and choose the sensing channels and theleasing channels. Then in the second stage, we determine thechannel assignment and power allocation based on the sensingresults. We use backward induction to solve this problem (42).To simplify the notation, we will ignore the time index in thefollowing analysis.

We first analyze the channel assignment and power allo-cation in the second stage. In this stage, since the sensingresults Wi, the channels Bs, and the sensing technology Cs

are determined, the second stage problem of (21) is shown asfollows:

Maximize"

j$J

"

i$B

Qj&iTij log

21 +

hijPij

Tij

3(43)

Subject to (38), (37)

Variables Pij ( 0, Tij ! {0, 1}

11

where &i follows the same definition of (23).Compared to the power allocation problem in (22), the

binary channel assignment variables Tij’s make the secondstage problem in (43) much more complex.

We first solve problem (43) assuming fixed Tij’s, in whichcase the power allocation problem is a convex optimizationproblem. Following the same method of solving power allo-cation for homogeneous users as in (25), we have

Pij = TijQj&i

21

'"

1

Qj&ihij

3+

, i ! B. (44)

where ' is the Lagrange multiplier of the total power constraint(38), which satisfies

' =

!i$Bp

&iQjTij

Pmax +!

i$Bp

Tij

hi(t)

, (45)

where Bp!= {i ! B : Pij > 0}. When Tij is known, we can

design a simple search algorithm similar to Alg. 1 to determinethe optimal value of '.

We then substitute this result in (43), and further maximizethe objective over Tij’s. Then we have

Maximize"

i$B

&i

"

j$J

QjTij

2log

2&iQihi

'

33+

(46)

Subject to"

j$J

Tij % 1

Variables Tij ! {0, 1}

For each channel i ! B, let us denote the set

J &i

!=arg max

j$JQj

2log

2&iQjhij

'

33+

(47)

Here the solution J &i is a set of indices of the chosen queues.

If J &i is a singleton, then we denote its unique element as j&i .

If J & is not singleton, since all elements in J & lead to thesame value in objective, then we can randomly pick one ofthe its element and denote it as j&i .

Since the objective in problem (46) is linear in Tij , it iseasy to see the optimal solution is

Tij =

/1, if j = j&i , i ! B

0, otherwise.(48)

Now let us consider how to calculate the value of ' andj&i . By (45) and (47), we find that they are actually coupledtogether. To determine ' in (45), we need to know Tij (orequivalently J &

i , j&i ), i.e., which queue is chosen for whichchannel. But determining J &

i in (47) requires the value of'. One way to solve this problem is to enumerate everypossible channel assignment combinations to find the solutionssatisfying both (45) and (47). Since each channel i ! Bmax canbe assigned to one of J queues, there are a total of |Bmax|J

channel assignment combinations. When the J or |Bmax| islarge, the complexity can be very high. However, we canreduce the search complexity by exploring the special structureof the problem.

Property 1: For each channel allocated positive power i !Bp, we have

J &i = arg max {Qj |hij >

'

Qj&i, j ! J }. (49)

This property comes from (47). This means that for aparticular channel i, if the channel gain for the longest queueQj is good enough (i.e., hij > #

Qj$i), we should assign

channel i to queue with the longest queue length. If J &i is a

singleton, then we denote its unique element as j&i . Otherwise,we denote J &

i!= arg max {hij | j ! J &

i }. In this case, thereare multiple channels with the same channel gain and thesame queue length, and we can randomly pick one and denoteit as j&i . Thus we can search ' and J &

i (and also j&i ) by asimple greedy algorithm as follows. First, for all channels, weassume J &

i = arg maxj$J Qj , and calculate the value of 'by the waterfilling algorithm (as the procedure “Waterfilling”in Alg. 4). For each unchosen channel, i.e., the channel iwith &iQj"i

hij"i% ', we check whether &iQjhij > ' can be

satisfied when another queue is chosen instead of j&i . If thereis some set Ji of queues satisfying &iQjhij > ', we replaceJ &

i with the one with the longest queue length in this set,i.e., arg maxj$Ji

Qj . We repeat the process iteratively untilwe find the ' and j&i that satisfies both (45) and (47). Thepseudo code is given in Alg. 4. To simplify the expression ofAlg. 4, with a little abuse of notations, we denote hi = hij"i

,Qi = Qj"i

, and $(m)!=

Pmi=1

$iQi

Pmax+Pm

i=1

1

hi

.

Algorithm 4 Channel Assignment1: J &

i * argmaxj$J Qj

2: procedure WATERFILLING(hi(t), Qi(t))3: Rearrange the channel indices i ! Bmax in the de-

creasing order of &i(t)Qi(t)hi(t)4: m* |B(t)|, '+ $(m)5: while ' ( hm(t)&m(t) do6: m* m" 17: '+ $(m)8: end while9: end procedure

10: while &i(t)hij(t)Qj(t) > ', $j, $i > m, do11: J &

i * arg max {Qj|&i(t)hij(t)Qj(t) > '}12: invoke procedure Waterfilling( hi(t), Qi(t))13: end while

The complexity of Alg. 4 is O(|Bmax|3 log(|Bmax|), sincethe while loop runs no more than |Bmax| times in theworst case, and the complexity of waterfilling part isO(|Bmax|2 log(|Bmax|) (the same as the waterfilling power al-location algorithm in Section 5.3.1). Compared to the exhaus-tive search, the complexity of solving the channel assignmentis greatly reduced.

With the solution of channel assignment, we can update thepower allocation solution of (42) as

P &ij =

:;

<&iQj

#1#" 1

$iQjhi

$+, if j = j&, i ! B,

0 otherwise,

where the value of ', hi and Qi are calculated by Alg. 4.After solving the second stage problem, we move to the first

stage. Following the channel assignment in (48), we find that

12

the first stage problem for the heterogeneous users is the samewith the one of homogeneous users problem in (29). We cansimply run the same Alg. 3 to determine sensing technologyCs&(t), sensing channel set Bs&(t) and leasing channel setBl&(t).

6.2 The Performance of M-PMC PolicyNext we show the performance bounds of the M-PMC Policy.The proof method is similar to that of Theorem 1, and thedetails are omitted due to space limit.

Theorem 2: For any positive value V , the M-PMC Policyhas the following properties:(a) The queue stability (11) and collision constraints (12) are

satisfied. The queue length is upper bounded by

Qj(t) % Qmax != V qmax+Amax, $ t, (50)

and the virtual queues are bounded by

Zi(t) % Zmax != )(V qmax + Amax) + 1, $ i, t. (51)

where )!= rmaxp0

p0(1#Pfa(Cs(t)))+(1#p0)Pmd(Cs(t))(1#p0) ,

and Cs0 denotes the highest sensing cost.(b) The average profit RM#PMC obtained by the M-PMC

policy satisfiesinf RM#PMC ( R

&"O(1/V ), (52)

where R& is the optimal value of the multi-queue PM

problem.

7 SIMULATIONIn this section we provide simulation results for PMC andM-PMC policies.

We conduct simulations with the following parameters. Thenumber of incoming users in each slot satisfies a Poissondistribution with a rate D(q(t), M(t)) = 1

M(t) (q(t)"5)2. Themarket state M(t) satisfies Bernoulli distribution, M(t) = 1with probability 0.5, and M(t) = 2 with probability 0.5. Thefile length of each user satisfies the i.i.d. (discrete) uniformdistribution between 1 and 10. There are 32 channels in total.20 of them belong to the sensing band Bs

max, and the rest12 channels belong to the leasing band Bl

max. The primarycollision probability tolerant levels are set as !i = 0.001for sensing channel i = 1, 2, . . . , 10, and !i = 0.005 forsensing channel i = 11, 12, . . . , 20. The channel gain hi ofeach channel satisfies i.i.d. (continuous) Rayleigh distributionwith parameter * = 4.5 The total power constraint of thebase station is Pmax = 8. There are 3 different sensingtechnologies with costs Cs = {0 (not sensing at all), 0.1, 0.5}.The corresponding false alarm probabilities are Pfa ={0.5, 0.1, 0.008}, and the missed detection probabilities arePmd = {0.5, 0.08, 0.005}. We assume the idle time proba-bility of sensing band p0 is 0.6, and the control parameterV ! {5, 10, 50, 100, 200}.

Figure 5 shows a collision situation of all sensing channelswith the control parameter V = 100. We find that primaryusers’ collision tolerant bound (12) is satisfied as time in-creases. We also find that we obtain similar curves for thecollision probabilities with other values of control parameterV .

0 1000 2000 3000 4000 50000

0.002

0.004

0.006

0.008

0.01

0.012

Time t

Col

lisio

n Pr

obab

lity

X

Collision situation of all sensing channels (V=100)

PU’s collision tolerant boundfor Channel 1~10

PU’s collision tolerant boundfor Channel 11~20

Fig. 5. A collision situation of all sensing channels withV = 100

Figure. 6 (a) shows that the average queue length growslinearly in V , and is always less than the worst case boundV qmax + Amax. Figure. 6 (b) shows that the average profitachieved by PMC policy converges quickly as V grows, andis close to the maximum profit when V ( 100.

0 50 100 150 2000

500

1000

1500

V

Aver

age

queu

e le

ngth

(a) Average queue length vs. Parameter V

Average queue length of PMC algorithmAverage queue length upper bound

0 100 20050 1500

20

40

60

V

Aver

age

prof

it

(b) Average profit vs. Parameter V

Fig. 6. (a) Average queue length vs. Parameter V , (b)Average profit vs. Parameter V

We further vary the idle time probability of sensing band p0

from 0 to 1. In Fig. 7, we show the average profit with differentsensing available probabilities p0 ! [0, 1] and different fixedsensing technologies. The black curve is with zero sensingcost, where Pfa = Pmd = 0.5, which means the operatordoes not perform sensing and takes random guesses of primaryusers’ activities in sensing channels. The blue curve is withthe low sensing cost 0.1, where Pfa = 0.1 and Pmd = 0.08.The purple curve is with the high sensing cost 0.5, wherePfa = 0.008 and Pmd = 0.005. The red curve is thePMC policy, which adaptively choose sensing cost from theabove three (sensing cost Cs = 0, 0.1, 0.5). When the sensingavailable probability is small (e.g., p0 ! [0, 0.2]), all strategiestend to only choose the leasing channels, thus all curves obtainsimilar profits. When the sensing available probability p0

further increases, the advantage of exploring sensing channels

13

0 0.2 0.4 0.6 0.8 130

35

40

45

50

55

60

Sensing Available Probablity: p0

Aver

age

Prof

t

Average profits with different sensing techniques

zero sensing costlow sensing costhigh sensing costoptimal sensing cost

Fig. 7. Average Profit with different sensing technologies

becomes more significant. The performances for strategiesusing the zero and low sensing cost are not good. The reason isthat their detection accuracy is not good enough. To achievethe primary users’ collision bounds, these strategies choosesensing channels less often, and replace with more expensiveleasing channels. When the sensing available probability ishigh and close to 1, sensing seems unnecessary. Therefore,the performance increases as sensing cost decreases, where thezero cost is the best and the high cost is the worst. The PMCpolicy (the red curve) adaptively chooses the sensing cost, i.e.,when p0 is medium, it utilizes the high sensing cost strategy inmost of time slots; as p0 keeps increasing, it gradually changesto utilize the low cost and zero cost strategies more frequently;and when p0 goes to 1, it utilizes the zero sensing cost strategyin most of time slots. It has the best performance, since it cantake advantage of different sensing technologies for differentsensing available probabilities.

For the M-PMC policy, we conduct simulations for a simpletwo-queue system. For queue-1, the channel gain hi1 of eachchannel satisfies i.i.d. (continuous) Rayleigh distributions withparameter * = 4.5. For queue-2, the channel gain hi2 of eachchannel satisfies i.i.d. (continuous) Rayleigh distributions withparameter * = 5.5. This is because queue-2 users are closerto the operator’s base station than queue-1 users.

Figure 8 (a) shows that the average transmission rateobtained by queue-2 users is higher than that of the queue-1 users. This is because queue-2 users usually have betterchannel conditions, and M-PMC policy prefers to allocatemore powers to better channels to improve the transmissionrate. Figure 8 (b) shows that the revenues obtained by theoperator from users of two queues are almost the same whenall queues are stable. It is an interesting observation. We canunderstand it in this way: when two queues make differentrevenue, to maximize the profit (also the revenue), the operatorwill allocate more transmission rates to the queue with ahigher revenue. Thus the length of the queue with a highertransmission rate will be shortened, and the negative queuingeffect in revenue maximization problem will be soon diluted.It further leads to a decreasing price and a decreasing revenue.

In contrast, the length of the queue with a lower transmissionrate increases, which results in an increasing price and anincreasing revenue. Therefore, when all queues are stablefinally, the average revenue generated by each queue is thesame.

0 2000 4000 6000 8000 100000

5

10

15

Time t

Aver

age

Tran

smis

sion

Rat

e (a) Average rate

Average rate of queue 1Average rate of queue 2

0 2000 4000 6000 8000 100000

20

40

60

80

Time t

Rev

enue

(b) Average Revenue

Average revenue of queue 1Average revenue of queue 2

Fig. 8. (a) Average transmission rates of a two-queue M-PMC policy, (b) Average revenues of a two-queue M-PMCpolicy

8 CONCLUSIONIn this paper, we study the profit maximization problem ofa cognitive mobile virtual network operator in a dynamicnetwork environment. We propose low-complexity PMC andM-PMC policies which perform both revenue maximizationwith pricing and market control, and cost minimization withproper resource investment and allocation. We show that thesepolicies can achieve arbitrarily close to the optimal profit, andhave flexible trade-offs between profit optimality and queuingdelay.

We also find several interesting features in these close-to-optimal policies. In revenue maximization, the dynamicpricing strategy performs the functionality of congestion con-trol to users’ demands, i.e., the longer the queue length ofdemands, the higher price the operator should charge. In costminimization, the operator is risk averse towards spectruminvestment, and prefers stable leasing spectrum to unstablesensing spectrum with the same channel condition and thesame availability-price-ratio.

In this paper, we only looked at the issue of elastic traffic.It would be worthwhile to incorporate inelastic traffic, whichusually has strict constraints on transmission rates and delays.Typical examples include real-time multimedia applications,e.g., audio streaming, Video on Demand (VoD), and Voiceover IP (VoIP). In the most general case, we can considera hybrid system with both elastic and inelastic traffic, whichis more realistic and practical. In addition, as mentioned inSection 2, the literature about competition in cognitive radionetworks mainly focus on the static network scenario. It isalso interesting to extend our dynamic model to incorporatecompetition among several network operators.

14

REFERENCES[1] S. Li, J. Huang, and S. Li, “Profit maximization of cognitive virtual

network operator in a dynamic wireless network,” in Proc. of IEEEICC, 2012, pp. 1–6.

[2] M. McHenry, P. Tenhula, and D. McCloskey, “Chicago spectrum oc-cupancy measurements and analysis and a long-term studies proposal,”Shared Spectrum Company, Tech. Rep., 2005.

[3] “Fcc released the final rules for the cognitive use of tv white spacesin the us,” 2010, federal Communications Commission:Report FCC-10-174, online available at http://transition.fcc.gov/Daily Releases/DailyBusiness/2010/db0923/FCC-10-174A1.pdf.

[4] I. Akyildiz, W. Lee, M. Vuran, and S. Mohanty, “Next genera-tion/dynamic spectrum access/cognitive radio wireless networks: a sur-vey,” Computer Networks, vol. 50, no. 13, pp. 2127–2159, 2006.

[5] Q. Zhao and B. Sadler, “A survey of dynamic spectrum access,” IEEESignal Processing Magazine, vol. 24, no. 3, pp. 79–89, 2007.

[6] L. Duan, J. Huang, and B. Shou, “Cognitive mobile virtual networkoperator: Investment and pricing with supply uncertainty,” in Proc. ofIEEE INFOCOM, 2010.

[7] ——, “Investment and pricing with spectrum uncertainty: A cogni-tive operators perspective,” IEEE Transactions on Mobile Computing,vol. 10, no. 11, pp. 1590 – 1604, November 2011.

[8] “Introduction of mvno,” internet Open Resource: online available athttp://en.wikipedia.org/wiki/Mobile virtual network operator and http://www.mobilein.com/what is a mvno.htm.

[9] C. Camaran and D. D. Miguel, “Mobile virtual network operator(mvno) basics: What is behind this mobile business trend,” ValorisTelecommunication Practice, Tech. Rep., 2008.

[10] “List of mvno worldwide,” internet Open Resource: online available athttp://www.mobileisgood.com/mvno.php.

[11] C. Stevenson, G. Chouinard, Z. Lei, W. Hu, S. Shellhammer, andW. Caldwell, “Ieee 802.22: The first cognitive radio wireless regionalarea network standard,” IEEE Communications Magazine, vol. 47, no. 1,pp. 130–138, 2009.

[12] A. Al Daoud, T. Alpcan, S. Agarwal, and M. Alanyali, “A stackelberggame for pricing uplink power in wide-band cognitive radio networks,”in Proc. of IEEE CDC, 2008, pp. 1422–1427.

[13] H. Yu, L. Gao, Z. Li, X. Wang, and E. Hossain, “Pricing for uplink powercontrol in cognitive radio networks,” IEEE Transactions on VehicularTechnology, vol. 59, no. 4, pp. 1769–1778, 2010.

[14] J. Jia and Q. Zhang, “Competitions and dynamics of duopoly wirelessservice providers in dynamic spectrum market,” in Proc. of ACMMobihoc, 2008, pp. 313–322.

[15] L. Duan, J. Huang, and B. Shou, “Duopoly competition in dynamicspectrum leasing and pricing,” IEEE Transactions on Mobile Computing,vol. 11, no. 11, pp. 1706 – 1719, November 2012.

[16] O. Ileri, D. Samardzija, and N. Mandayam, “Demand responsive pricingand competitive spectrum allocation via a spectrum server,” in Proc. ofIEEE DySPAN, 2005, pp. 194–202.

[17] J. Elias and F. Martignon, “Joint spectrum access and pricing in cognitiveradio networks with elastic traffic,” in Proc. of IEEE ICC, 2010, pp. 1–5.

[18] D. Niyato, E. Hossain, and Z. Han, “Dynamics of multiple-seller andmultiple-buyer spectrum trading in cognitive radio networks: a game-theoretic modeling approach,” IEEE Transactions on Mobile Computing,pp. 1009–1022, 2008.

[19] S. Sengupta, M. Chatterjee, and S. Ganguly, “An economic frameworkfor spectrum allocation and service pricing with competitive wirelessservice providers,” in Proc. of IEEE DySPAN, 2007, pp. 89–98.

[20] L. Gao, Y. Xu, and X. Wang, “Map: Multi-auctioneer progressive auctionfor dynamic spectrum access,” IEEE Transactions on Mobile Computing,vol. 10, no. 8, pp. 1144 – 1161, August 2011.

[21] J. Jia, Q. Zhang, Q. Zhang, and M. Liu, “Revenue generation fortruthful spectrum auction in dynamic spectrum access,” in Proc. of ACMMobiHoc, 2009, pp. 3–12.

[22] L. Huang and M. Neely, “The optimality of two prices: Maximizingrevenue in a stochastic communication system,” IEEE/ACM Transactionson Networking, vol. 18, no. 2, pp. 406–419, 2010.

[23] R. Urgaonkar and M. Neely, “Opportunistic scheduling with reliabilityguarantees in cognitive radio networks,” IEEE Transactions on MobileComputing, vol. 8, no. 6, pp. 766–777, 2009.

[24] M. Lotfinezhad, B. Liang, and E. Sousa, “Optimal control of constrainedcognitive radio networks with dynamic population size,” in Proc. ofIEEE INFOCOM, 2010, pp. 1–9.

[25] M. Weiss, S. Delaere, and W. Lehr, “Sensing as a service: An explorationinto practical implementations of DSA,” Proc. of IEEE DySPAN, 2010.

[26] “Digital dividend: Geolocation for cognitive access,” Independent regu-lator and competition authority for the UK communications industries,Tech. Rep., 2010.

[27] A. Anandkumar, N. Michael, and A. Tang, “Opportunistic spectrumaccess with multiple users: learning under competition,” in Proc. of IEEEINFOCOM, 2010, pp. 1–9.

[28] K. Woyach, P. Pyapali, and A. Sahai, “Can we incentivize sensing in alight-handed way?” in Proc. of IEEE DySPAN, 2010, pp. 1–12.

[29] M. Sharma and X. Lin, “Ofdm downlink scheduling for delay-optimality: Many-channel many-source asymptotics with general arrivalprocesses,” in Proc. of IEEE ITA, 2011.

[30] M. Neely, “Stochastic Network Optimization with Application to Com-munication and Queueing Systems,” Synthesis Lectures on Communica-tion Networks, vol. 3, no. 1, pp. 1–211, 2010.

[31] S. Li, J. Huang, and S. Li, “Dynamic profit maximization ofcognitive mobile virtual network operator,” Tech. Rep., available athttp://jianwei.ie.cuhk.edu.hk/publication/DynamicCVNOTechReport.pdf.

[32] J. Huang, V. Subramanian, R. Agrawal, and R. Berry, “Downlinkscheduling and resource allocation for ofdm systems,” IEEE Transac-tions on Wireless Communications, vol. 8, no. 1, pp. 288–296, 2009.

[33] D. Palomar and J. Fonollosa, “Practical algorithms for a family ofwaterfilling solutions,” IEEE Transactions on Signal Processing, vol. 53,no. 2, pp. 686–695, 2005.

[34] D. Knuth et al., “Sorting and searching, the art of computer program-ming, vol. 3,” 1973.

Shuqin Li received Ph.D. in Information Engi-neering from The Chinese University of HongKong in 2012. She now works as a researchscientist in Research and Innovation, Alcatel-Lucent Shanghai Bell Co., Ltd. Her researchinterests include resource allocation, pricing andrevenue management in communication net-works, applied game theory, contract theory andincentive mechanism design in network eco-nomics, network coding and stochastic networkoptimization.

Jianwei Huang is an Assistant Professor in theDepartment of Information Engineering at theChinese University of Hong Kong. He receivedPh.D. in Electrical and Computer Engineeringfrom Northwestern University in 2005. Dr. Huangcurrently leads the Network Communicationsand Economics Lab (ncel.ie.cuhk.edu.hk). Heis the co-recipient of five best paper awards,including the 2011 IEEE Marconi Prize PaperAward in Wireless Communications .

Dr. Huang currently serves as Editor of IEEETransactions on Wireless Communications, Editor of IEEE Journal onSelected Areas in Communication - Cognitive Radio Series, and Chairof IEEE ComSoc Multimedia Communications Technical Committee.

Shuo-Yen Robert Li S.-Y. Robert Li (Bob Li)received the PhD degree from UC Berkeley in1974. He taught applied math at MIT in 1974-76and math/statistics/CS at U. Illinois, Chicago in1976-79. After working at Bell Labs/Bellcore fora decade, he joined CUHK as a chair professorin 1989. He now also serves as Co-director ofInstitute of Network Coding as well as honoraryprofessor in a few universities. Bob initiated alge-braic switching theory and is a cofounder of net-work coding theory. His ”martingale of patterns”

(1980) is widely applied to genetics.

dynamic proﬁt maximization of cognitive mobile virtual...

Documents