optimal probabilistic policy for dynamic resource activation using markov decision process in green...


Upload: peng-yong

Post on 30-Jan-2017


Page 1: Optimal Probabilistic Policy for Dynamic Resource Activation Using Markov Decision Process in Green Wireless Networks

1536-1233 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TMC.2014.2307328, IEEE Transactions on Mobile Computing

Optimal Probabilistic Policy for Dynamic Resource Activation Using Markov Decision Process in Green Wireless Networks

Peng-Yong Kong, Senior Member, IEEE

Abstract—With increasing awareness toward protecting our environment, this paper intends to reduce the CO2 emission of a wireless cellular network by reducing the power consumption of its base station. We propose to reduce power consumption by dynamically activating and deactivating the modular resources at the base station depending on the instantaneous network traffic. In order to achieve the objective, we develop a discrete time Markov Decision Process (DTMDP) to capture the dynamics of the system. In the DTMDP, the action to be taken at each decision epoch is to activate a new resource module, to deactivate a currently active resource module, or to stay the same. We further develop a linear programming approach to model the DTMDP so that it can be solved for the optimal probabilistic decision policy. Evaluation results show that the optimal probabilistic policy for resource activation can reduce power consumption by more than 50% under various traffic load conditions, without compromising network service quality, which is measured in terms of user blocking probability.

Index Terms—Markov decision process, green wireless networks, green communications, energy efficiency.


1 INTRODUCTION

In November 2012, the World Bank published a report to remind us of the environmental and humanitarian crisis brought by climate change and global warming [1]. If nothing is done, the world will warm by 4 degrees Celsius by the end of this century. The warmer climate will present a significant threat to our economic development and human life. Global warming and climate change are caused by heat-trapping carbon dioxide (CO2) in the atmosphere. We can reduce CO2 emission by reducing our power consumption.

In 2008, the information and communication technology (ICT) industry as a whole consumed 7% of the total electrical power generated worldwide [2]. This percentage of power consumption is expected to grow to 14% in 2020, given the existing popularity of, and our increasing dependency on, telecommunication services. With the introduction of iPhone and Android smart mobile devices, the proliferation of ebook readers such as iPad and Kindle, and the success of social networking platforms such as Facebook and Twitter, we are demanding even more access to telecommunication services.

Generally, the ICT industry can be more environmentally friendly by reducing its power consumption and improving its energy efficiency. In the literature, environmentally friendly ICT is also called green ICT or green communications. Green ICT covers both the efforts of: (a) reducing power consumption in other industrial sectors by using ICT [3], and (b) reducing power consumption within the ICT sector. This paper focuses on the latter. More specifically, this paper aims at reducing power consumption in a wireless cellular network. We focus on wireless communication because it is the only means to enable anywhere, anytime communications, and because of the prevalent adoption of mobile communication devices. Rapidly growing wireless communication systems have accounted for a significant portion of the total power consumption in the ICT industry as a whole.

• Peng-Yong Kong is with the Department of Electrical and Computer Engineering, Khalifa University of Science, Technology and Research (KUSTAR), Abu Dhabi, United Arab Emirates. E-mail: [email protected]

As an effort to reduce the power consumption of a computer network, [4] has introduced a sleep mode in the network which allows communication nodes to be switched off. In addition to sleep mode, [5] has introduced the idea of rate adaptation, where a slower data rate is provided to a user with a lower traffic rate to save even more energy. However, [4] and [5] are for the wired Internet, whereas this paper focuses on energy efficient wireless cellular networks.

In wireless cellular networks, there is a great interest in saving energy at mobile devices. Here, energy is saved by switching mobile devices to a low power mode when there is no network activity within the length of an inactivity timer [6]. The length of the inactivity timer is a subject of optimization because it controls the trade-off between energy conservation and performance degradation. The performance degradation is due to the delay suffered by the mobile devices, which need to switch back from the low power mode before restarting transmissions. The trade-off between energy saving and performance degradation given the user inactivity timer is analyzed in [7]. LTE and WiMAX have also introduced discontinuous reception mode and discontinuous transmission mode to save energy at the mobile devices by momentarily powering down the devices while keeping them connected to the network with a reduced throughput [8].

Instead of saving energy at the mobile devices, this paper focuses on reducing power consumption and improving energy efficiency at the base stations. This is because the base stations are estimated to account for about 80% of the total power consumption of a telecommunication network. In the literature [9], there are various existing efforts on conserving energy at the base stations. These efforts can be classified into two categories: (a) improve the energy efficiency of the base station, and (b) reduce the required number of base stations for each telecommunication network. The first category may involve controlling the transmission power more optimally through parameter optimization after taking into account coverage and capacity requirements [10], [11], or re-designing the base station by using equipment, components and architecture that are more energy efficient [12]. According to [13], the power amplifier is the most critical component in a base station as it accounts for almost 40% of total power consumption, and significant improvement in energy efficiency can be achieved by using an advanced power amplifier. For the second category, the number of macro base stations can be minimized by introducing a higher density of low energy micro and pico base stations [14], [15]. In [16], a queueing analysis is performed to relate power consumption to packet delay through femto base station density. In [17], a femto cell deployment architecture is presented and analyzed to quantify the achievable power saving in off-loading network traffic from macro cells to femto cells. In [18], an MDP-based vertical handover scheme is designed to conserve energy by optimally switching users between macro and femto base stations. All these works consider the telecommunication network to carry a specific fixed volume of traffic, which is generally a representation of the busy hours or peak hours scenario. In reality, network traffic is dynamic as it changes temporally from time to time and spatially from location to location. Such variations in network traffic are dependent on the demographics, seasons and personal user habits. For instance, an area that is predominantly residential can expect to have a different daily traffic profile compared to that of an industrial area.

In [19], [20], [21], dynamic network traffic characteristics are exploited in reducing power consumption by shutting down some base stations during low traffic hours, or by controlling the cell size depending on the instantaneous traffic load. However, these existing works assume each base station is a single entity which must be controlled as a whole unit. We argue that each base station can be managed as a collection of modular resource units, each with its own power consumption. These resource modules can be radios, baseband processors, feeders, power amplifiers, air-conditioners, etc. This modular resource model is reasonable based on [13], and is similar to the system model adopted by [22]. With the modular base station, this paper intends to exploit the dynamic nature of network traffic in performing optimal resource module activation and deactivation at the base station such that power consumption can be reduced without compromising service quality. Specifically, we propose a Markov Decision Process (MDP) [23], [24] model to capture the dynamic nature of network traffic and to optimally activate and deactivate the base station resource modules subject to a tolerable user blocking probability. Our problem and objective are the same as in [25], but [25] assumes a different system model. Specifically, [25] has assumed a heterogeneous network where energy saving can be achieved by off-loading network traffic from a macro base station to its femto base stations. Similar to this paper, [25] has also considered dynamic network traffic, in addition to the location information of users, in identifying which femto cells to wake up. More recently, [26] has dealt with a problem similar to that of this paper but using a different approach, which is based on reinforcement learning.

The rest of this paper is organized as follows. We describe the system model in Section 2. In Section 3, we propose the MDP model and find its optimal probabilistic policy using linear programming. In Section 4, we present and discuss the evaluation results. This paper ends with concluding remarks in Section 5.

2 SYSTEM MODEL

We adopt a discrete time model as illustrated in Fig. 1, where the time domain is divided into repetitive time slots of fixed duration $T$. In this paper, the term "time $t$" is used interchangeably with the term "time slot $n$", where time slot $n$ is related to time $t$ through $nT \le t < (n+1)T$ and $n = 0, 1, 2, \cdots$. A time-varying variable changes only at the beginning of a time slot: its value stays the same within the slot and may change again, with some probability, at the beginning of the next time slot.

We consider a base station with a pool of resource modules where all modules are co-located and have the same radio coverage. Each resource module supports only one type of resource, such as relay stations for LTE, or sub-carriers for WiMAX. In a resource module, there are a number of resource units. For example, each wireless channel module provides $U$ wireless channels. In our model, the resource modules can be activated and deactivated separately from time to time, similar to that in [22] and [27]. When a resource module is activated (deactivated), all the $U$ resource units within the module become available (unavailable) immediately. An activated resource module contributes to the total power consumption of the base station. The base station power consumption $P_{bs}[n]$ in time slot $n$ is determined as follows:

$$P_{bs}[n] = P_{cnt} + \sum_{i \in \mathcal{I}} m_i[n] P_{r,i}, \tag{1}$$

Fig. 1. System model for dynamic activation of base station resource modules.

where $P_{cnt}$ is the base station's fundamental power consumption within a time slot, which depends on the load-independent factors of all equipment. Here, $\mathcal{I}$ is the set of all types of available resource modules. Also, $m_i[n]$ is the number of activated resource modules of type $i$ in time slot $n$, and $P_{r,i}$ is the power consumption of each resource module of type $i$. The term $\sum_i m_i[n] P_{r,i}$ is the load-dependent power consumption in time slot $n$. It is load dependent because $m_i[n]$ changes when the traffic load changes, since each resource module is capable of supporting only a fixed traffic load. This power consumption model is consistent with [28]. For simplicity, we consider only one type of resource hereafter since the results can be easily extended to the case of multiple resource types. Hence, (1) becomes

$$P_{bs}[n] = P_{cnt} + m[n] P_r. \tag{2}$$
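The power model in (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function names, the module type labels and all wattage figures are hypothetical and do not come from the paper.

```python
from typing import Mapping


def base_station_power(p_cnt: float,
                       active_modules: Mapping[str, int],
                       module_power: Mapping[str, float]) -> float:
    """Eq. (1): P_bs[n] = P_cnt + sum over types i of m_i[n] * P_r,i."""
    # active_modules[i] plays the role of m_i[n]; module_power[i] of P_r,i.
    return p_cnt + sum(count * module_power[kind]
                       for kind, count in active_modules.items())


def base_station_power_single(p_cnt: float, m: int, p_r: float) -> float:
    """Eq. (2): the single-resource-type special case."""
    return p_cnt + m * p_r


# Hypothetical figures in watts, chosen only to exercise the formulas.
multi = base_station_power(100.0,
                           {"radio": 2, "amplifier": 1},
                           {"radio": 50.0, "amplifier": 80.0})
single = base_station_power_single(100.0, 3, 50.0)
```

With these made-up numbers, the load-independent part stays at 100 W while the load-dependent part scales with the number of activated modules.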

Traffic load at a base station is measured in terms of the number of active users. A user becomes active when it is making an outgoing call or receiving an incoming call. When the call ends, the user becomes inactive. We use the user arrival rate $\lambda[n]$ to indicate the average number of users becoming active per second in time slot $n$. Similarly, we use the user departure rate $\mu[n]$ to indicate the average number of users becoming inactive per second in time slot $n$. The traffic load is time-varying when $\lambda[n]$ or $\mu[n]$ changes from time to time. For example, during peak hours, $\lambda[n]$ is higher compared to its value at non-peak hours. While the user arrival rate is time-varying, it changes much more slowly than the time slots advance. Specifically, the user arrival rate may fall over several hours after the peak hours, but the duration of a time slot $T$ is in the scale of seconds or minutes. In our system model, $T$ is not an optimization parameter but a system parameter which is chosen on practical grounds after considering the computational requirement. We will describe in detail in the next section that the base station needs to make a decision to select an action in each time slot. Thus, a smaller $T$ implies a higher computational requirement. As such, we set $T$ to 1 minute, which forms part of the system setting for which we develop an optimal decision making policy. Let $\dot{\lambda}$ be the speed of change in user arrival rate, with $\dot{\lambda} \ll 1/T$. Then, with the large difference between $1/T$ and $\dot{\lambda}$, as depicted in Fig. 1, $\lambda[n]$ may remain unchanged for a number of time slots. We assume similar slow time-varying characteristics for $\mu[n]$ and avoid redundant explanation here.

Following the literature, user arrival and user departure are two independent stochastic processes in our system model. Specifically, user arrival is a Poisson process with parameter $\lambda[n]T$ in time slot $n$. As such, the probability of having $x$ user arrivals within a time slot $n$ is given as follows:

$$P\{x \text{ users arrive}\} = \frac{(\lambda[n]T)^x e^{-\lambda[n]T}}{x!}. \tag{3}$$

Similar to user arrival, user departure is a Poisson process with parameter $\mu[n]T$ in time slot $n$. Accordingly, the probability of having $y$ user departures within a time slot $n$ is given as follows:

$$P\{y \text{ users depart}\} = \frac{(\mu[n]T)^y e^{-\mu[n]T}}{y!}. \tag{4}$$
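As a rough numerical illustration of (3) and (4), the sketch below evaluates the Poisson probability of a given number of arrivals (or departures) in one slot. The function name and the example rate of 0.05 users per second are assumptions made for illustration; only the slot length of 60 seconds follows the setting stated above.

```python
import math


def poisson_count_probability(count: int, rate_per_second: float,
                              slot_seconds: float) -> float:
    """Probability of `count` events in one slot, as in Eqs. (3) and (4).

    rate_per_second stands for lambda[n] (arrivals) or mu[n] (departures),
    and slot_seconds stands for T.
    """
    if count < 0:
        return 0.0
    mean = rate_per_second * slot_seconds
    if mean == 0.0:
        return 1.0 if count == 0 else 0.0
    # Work in the log domain so that large counts do not overflow.
    log_p = count * math.log(mean) - mean - math.lgamma(count + 1)
    return math.exp(log_p)


# Hypothetical arrival rate of 0.05 users/s with T = 60 s (mean of 3 per slot).
p_no_arrival = poisson_count_probability(0, 0.05, 60.0)
p_three_arrivals = poisson_count_probability(3, 0.05, 60.0)
```

The same function serves both processes because (3) and (4) differ only in the rate that is plugged in.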

Recall that $U$ is the number of resource units in a resource module and $m[n]$ is the number of activated resource modules in time slot $n$. There are $m[n] \times U$ resource units available in time slot $n$. Out of these available resource units, only $m[n] \times U - u[n]$ resource units are free, where $u[n]$ is the number of occupied resource units in time slot $n$. Let $r_i$ be the number of resource units required by a user $i$. The user $i$ is admitted into the system if there are at least $r_i$ resource units free at the base station. After the user is admitted, the number of free resource units is reduced by $r_i$. A call (user) blocking occurs when a newly arrived user $i$ finds fewer than $r_i$ free resource units. Then, the call blocking probability at time slot $n$ is given as follows:

$$P_{blocking}[n] = \sum_{r_i} \Big( P\{\text{new user } i \text{ requires } r_i \text{ resource units}\} \times P\{m[n] \times U - u[n] < r_i\} \Big). \tag{5}$$
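To make the structure of (5) concrete, the following sketch evaluates it for given probability tables. It assumes that a distribution over the number of occupied units $u[n]$ is already available as a dictionary; how that distribution is obtained is not covered here, and the function name and the sample numbers are hypothetical.

```python
from typing import Mapping


def blocking_probability(demand_pmf: Mapping[int, float],
                         occupied_pmf: Mapping[int, float],
                         active_modules: int,
                         units_per_module: int) -> float:
    """Eq. (5): sum over r of P{user needs r units} * P{m*U - u < r}.

    demand_pmf maps r (units required) to its probability.
    occupied_pmf maps u (occupied units) to its probability; treating it
    as a given input is an assumption of this sketch.
    """
    capacity = active_modules * units_per_module  # m[n] * U
    total = 0.0
    for required, p_required in demand_pmf.items():
        p_not_enough = 0.0
        for occupied, p_occupied in occupied_pmf.items():
            if capacity - occupied < required:
                p_not_enough += p_occupied
        total += p_required * p_not_enough
    return total


# Hypothetical example: one module of two units, made-up occupancy table.
occupancy = {0: 0.5, 1: 0.3, 2: 0.2}
voice_only = blocking_probability({1: 1.0}, occupancy, 1, 2)
mixed = blocking_probability({1: 0.5, 2: 0.5}, occupancy, 1, 2)
```

In the voice-only case every user needs one unit, so blocking happens only when all units are occupied; with mixed demand, users that need two units are also blocked when a single unit is taken.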

We use this call blocking probability as the measure of service quality of the wireless cellular network. It is desirable to have a very low call blocking probability, say 0.01, at all times for a good network service.

3 OPTIMAL POLICY FOR DYNAMIC MODULAR RESOURCE ACTIVATION

With the system model described in the previous section, we now state our goal: to minimize the average power consumption of a base station over a time slot while keeping the call blocking probability below a certain limit. To achieve the goal, we propose to model the system using a stochastic Markov chain, called a Markov Decision Process (MDP), and to solve for its optimal policy in activating and deactivating the resource modules at the base station. We develop the MDP next.

Consider a discrete time Markov chain (DTMC) $\{X_n, n = 0, 1, 2, \cdots\}$, where $X_n$ is the state of the system in time slot $n$. The system transits from one state to another from time to time. The transition probability matrix $P_n$ depends on the action $\{A_n, n = 0, 1, 2, \cdots\}$ taken in time slot $n$. When the system is in state $X_n$ in time slot $n$, taking action $A_n$ incurs a cost $c(X_n, A_n)$ to the system. The joint process $\{(X_n, A_n), n = 0, 1, 2, \cdots\}$ is the discrete time Markov Decision Process (DTMDP). Each DTMDP is completely described by its states, actions, transition probabilities, and costs. We define these four components next. In defining the components, we first consider a system that supports only voice users. Then, we extend the voice-only DTMDP model to include data (non-voice) users with different data rates.

3.1 Voice Users

Following the DTMDP described above, we let $\mathcal{S}$ be the state space which includes all possible states, such that $X_n \in \mathcal{S}$. We consider a system with a finite number of states such that $\mathcal{S} = \{1, 2, \cdots, S\}$, where $S = |\mathcal{S}|$ is the number of states. We define the state through two variables such that $X_n = (m_n, u_n)$, where $m_n = m[n]$ and $u_n = u[n]$ are respectively the number of activated resource modules and the number of occupied resource units at time slot $n$. We define voice users such that each user $i$ requires only $r_i = 1$ resource unit. As such, the number of occupied resource units is indeed the number of active users.

Since we intend to activate and deactivate resource modules dynamically to reduce power consumption, $m_n$ varies from time to time such that $0 \le m_n \le M$, where $M$ is the maximum number of resource modules. When $m_n = 0$, no resource module is activated and the base station is practically shut down. Here, $u_n$ varies from time to time due to active user arrival and departure as described earlier, such that $0 \le u_n \le m_n \times U$, where $U$ is the maximum number of active users supported by each resource module. Therefore, we may list all the possible states as follows: $\mathcal{S} = \{(0, 0), (1, 0), (1, 1), (1, 2), \cdots, (1, U), (2, 0), (2, 1), (2, 2), \cdots, (2, 2U), \cdots, (M, 0), (M, 1), (M, 2), \cdots, (M, MU)\}$. Then, there exists a function $h_1(\cdot)$ that maps these states into $\mathcal{S} = \{1, 2, \cdots, S\}$ sequentially by returning the index into the list, such that $h_1((0, 0)) = 1$, $h_1((1, 0)) = 2$, $h_1((1, 1)) = 3$, $\cdots$, $h_1((M, MU)) = S$. For simplicity, when referring to a state $X_n$, we will use the two-variable representation $(m_n, u_n)$ and the index $h_1((m_n, u_n))$ interchangeably hereafter. Following $h_1(\cdot)$, we further define two functions, $h_2(\cdot)$ and $h_3(\cdot)$, that respectively return the first variable and the second variable for a given two-variable state representation. As such, $h_2((m_n, u_n)) = m_n$ and $h_3((m_n, u_n)) = u_n$.
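The state list and the three helper functions can be sketched as follows. The listing order matches the enumeration above, so the first listed state $(0, 0)$ receives index 1. The function names other than the roles of $h_1$, $h_2$ and $h_3$, and the small values of $M$ and $U$ in the example, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (activated modules m, occupied units u)


def enumerate_states(max_modules: int, units_per_module: int) -> List[State]:
    """List (m, u) with 0 <= m <= M and 0 <= u <= m * U, in listing order."""
    return [(m, u)
            for m in range(max_modules + 1)
            for u in range(m * units_per_module + 1)]


def build_h1(states: List[State]) -> Dict[State, int]:
    """h1: map each state to its 1-based position in the list."""
    return {state: index for index, state in enumerate(states, start=1)}


def h2(state: State) -> int:
    """Return the number of activated modules m."""
    return state[0]


def h3(state: State) -> int:
    """Return the number of occupied resource units u."""
    return state[1]


# Hypothetical small system: M = 2 modules, U = 2 units per module.
states = enumerate_states(2, 2)
h1 = build_h1(states)
```

Keeping the states as tuples and the index as a dictionary makes it easy to switch between the two representations that the text uses interchangeably.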

In the DTMDP, the number of states $S$ is given as follows:

$$S = \sum_{m=0}^{M} (m \times U + 1) = 1 + \frac{U M^2 + (U + 2) M}{2}. \tag{6}$$

From the equation above, we see that the number of states in our DTMDP is not bounded: it grows with the square of the number of resource modules $M$. This is the typical problem of state explosion in MDP. Fortunately, we envisage the number of resource modules in a base station to be around 10, and thus the number of states will be manageable. Specifically, our numerical results in Section 4 show that a system with $M = 6$ resource modules is already very good.
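A quick way to get a feel for the growth in (6) is to evaluate the closed form and compare it with the direct sum. In the sketch below the value $U = 8$ is a hypothetical choice, since the number of units per module is not fixed in this part of the text.

```python
def state_count(max_modules: int, units_per_module: int) -> int:
    """Closed form of Eq. (6): S = 1 + (U*M^2 + (U + 2)*M) / 2."""
    m, u = max_modules, units_per_module
    # The numerator equals U*M*(M + 1) + 2*M, which is always even.
    return 1 + (u * m * m + (u + 2) * m) // 2


def state_count_by_sum(max_modules: int, units_per_module: int) -> int:
    """Direct evaluation of the sum over m of (m*U + 1)."""
    total = 0
    for m in range(max_modules + 1):
        total += m * units_per_module + 1
    return total


# Hypothetical U = 8 units per module.
count_m6 = state_count(6, 8)
count_m10 = state_count(10, 8)
```

Under this assumption the state space has 175 states for $M = 6$ and 451 states for $M = 10$, which illustrates the quadratic growth while remaining small enough to enumerate.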

After observing the state $X_n = i \in \{1, 2, \cdots, S\}$, an action $A_n$ is taken from the set of all feasible actions $\mathcal{A}(i)$ at that state. We define three possible actions as $-1$, $0$, $+1$, where "$-1$" means deactivate a currently active resource module, "$0$" means neither activate nor deactivate any resource module, and "$+1$" means activate a currently deactivated resource module. Since we cannot deactivate a module when none is activated, action "$-1$" is not feasible for states $X_n = (m_n = 0, u_n)$. Similarly, we cannot activate any additional module when all $M$ modules have been activated. Thus, action "$+1$" is not feasible for states $X_n = (m_n = M, u_n)$. Therefore, the set of state-dependent feasible actions is given as follows:

$$\mathcal{A}(i) = \begin{cases} \{0, +1\} & \text{if } i = (0, u_n) \\ \{-1, 0\} & \text{if } i = (M, u_n) \\ \{-1, 0, +1\} & \text{otherwise} \end{cases} \tag{7}$$
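The feasible action set in (7) translates directly into a small function. The sketch below is illustrative; the function name and the example value $M = 3$ are assumptions.

```python
from typing import List, Tuple

State = Tuple[int, int]  # (activated modules m, occupied units u)


def feasible_actions(state: State, max_modules: int) -> List[int]:
    """Eq. (7): actions allowed in a state, out of -1, 0 and +1."""
    activated, _occupied = state
    actions = []
    if activated > 0:
        actions.append(-1)  # deactivation needs at least one active module
    actions.append(0)       # staying the same is always allowed
    if activated < max_modules:
        actions.append(+1)  # activation needs at least one inactive module
    return actions


# Hypothetical system with M = 3 modules.
at_empty = feasible_actions((0, 0), 3)
at_full = feasible_actions((3, 2), 3)
in_between = feasible_actions((1, 1), 3)
```

Only the first component of the state matters here, which reflects that feasibility in (7) depends on the number of activated modules and not on the number of occupied units.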

As a result of taking an action $A_n \in \mathcal{A}(i)$, the system transits from one state $i$ to another state. Since we have only a very limited set of actions, not all states in the state space $\mathcal{S}$ are reachable from all other states after taking an action. For example, state $X_{n+1} = (m_n + 2, u_{n+1})$ is not reachable from state $X_n = (m_n, u_n)$ after taking any one of the three actions. Thus, we further define $\mathcal{S}(i, a)$ as the set of reachable states if action $a$ is taken in state $i$.

The transition probability from state $i$ to reachable state $j$, given that action $a$ has been taken in time slot $n$, is given as follows:

$$p_{ij,n}(a) = P\{X_{n+1} = j \mid X_n = i, A_n = a\}, \tag{8}$$


where $i \in \mathcal{S}$, $j \in \mathcal{S}(i, a)$, and $a \in \mathcal{A}(i)$. According to our system model, user arrival and departure are independent Poisson processes, which also do not depend on the action taken. Thus, the transition probability depends only on the user arrival and departure processes, while the action affects only the set of reachable states. To explain this further, for any given states $i = (m, u)$ and $j = (m^*, u^*)$, we define $z = u^* - u$ as the net user arrivals within time slot $n$. Note that $z$ can be positive or negative. It is positive when more users arrive than depart within time slot $n$. On the other hand, $z$ is negative when fewer users arrive than depart within time slot $n$. Hence, we can determine the transition probability $p_{ij,n}(a)$ as follows:

$$p_{ij,n}(a) = \begin{cases} P\{z \text{ net arrivals}\} & \text{if } h_2(j) = h_2(i) + a \text{ and } h_3(j) = h_3(i) + z \\ 0 & \text{otherwise} \end{cases} \tag{9}$$
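The rule in (9) can be sketched as a function that returns the non-zero transition probabilities out of one state for one action. The net arrival distribution is passed in as a callable so the sketch does not commit to a particular formula. Two points are assumptions of this sketch rather than statements from the text: target states are limited to $0 \le u^* \le m^* \times U$, and any probability mass that would fall outside that range is simply dropped, so a returned row may sum to less than one.

```python
from typing import Callable, Dict, Tuple

State = Tuple[int, int]  # (activated modules m, occupied units u)


def transition_row(state: State,
                   action: int,
                   max_modules: int,
                   units_per_module: int,
                   net_arrival_pmf: Callable[[int], float]) -> Dict[State, float]:
    """Non-zero entries of Eq. (9) for a given state i and action a."""
    activated, occupied = state
    next_activated = activated + action  # h2(j) = h2(i) + a
    if not 0 <= next_activated <= max_modules:
        raise ValueError("action is not feasible in this state")
    row: Dict[State, float] = {}
    for next_occupied in range(next_activated * units_per_module + 1):
        net_arrivals = next_occupied - occupied  # z = u* - u
        probability = net_arrival_pmf(net_arrivals)
        if probability > 0.0:
            row[(next_activated, next_occupied)] = probability
    return row


def toy_net_arrival_pmf(z: int) -> float:
    """Made-up distribution used only to exercise the function."""
    return {-1: 0.25, 0: 0.5, 1: 0.25}.get(z, 0.0)


stay = transition_row((1, 1), 0, 2, 2, toy_net_arrival_pmf)
grow = transition_row((1, 1), +1, 2, 2, toy_net_arrival_pmf)
```

The action only shifts the first component of the target states, while the probabilities themselves come from the net arrival distribution, in line with the explanation above.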

From (9), the transition probability matrix can be determined as soon as we find the probability distribution of net user arrivals, $z$. Recall that $z = u^* - u$, where $u^*$ and $u$ are the number of active users in the new state and old state, respectively. Practically, $z$ is a random variable that depends on the difference between new active user arrivals and departures within a time slot $n$. We can write the probability density of $z$ as follows:

$$P\{z \text{ net arrivals}\} = P\{x \text{ arrivals}\} - P\{y \text{ departures}\} = \frac{(\lambda[n]T)^x e^{-\lambda[n]T}}{x!} - \frac{(\mu[n]T)^y e^{-\mu[n]T}}{y!}. \tag{10}$$

From (10), we notice that $z$ is the difference between two Poisson random variables. As such, the probability distribution of $z$ is given by the Skellam distribution and (10) can be rewritten as follows:

$$P\{z \text{ net arrivals}\} = e^{-(\lambda[n] + \mu[n])T} \left( \frac{\lambda[n]}{\mu[n]} \right)^{z/2} I_{|z|}\left( 2T \sqrt{\lambda[n]\mu[n]} \right), \tag{11}$$

where $I_z(\cdot)$ is the modified Bessel function of the first kind, and it is given as follows:

$$I_z(x) = \sum_{m=0}^{\infty} \frac{1}{m!\,\Gamma(m + z + 1)} \left( \frac{x}{2} \right)^{2m+z}, \tag{12}$$

where $\Gamma(\cdot)$ is the Gamma function.

Combining (9) and (11), we obtain the transition probability matrix as given in (13). There is a subscript $n$ in the transition probability matrix indicating that the matrix changes from time to time. This is because the network traffic, as characterized by $\lambda[n]$ and $\mu[n]$, may change from time to time. However, as described earlier in Section 2 and depicted in Fig. 1, the user traffic changes much more slowly than the time slots advance. As such, the transition probability matrix need not be updated in every time slot. This is good news because renewing the transition probability in every time slot may not be practical. For practicality, we will update the transition probability matrix only once in every $W$ time slots as illustrated in Fig. 2.
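A numerical sketch of (11) and (12) is given below, using only the Python standard library. The Bessel series is truncated after a fixed number of terms, which is an assumption of this sketch, and a second function computes the same probability by directly summing over pairs of independent Poisson arrival and departure counts whose difference is $z$, as a cross-check. The function names and the example rates are hypothetical; both rates must be strictly positive for the formulas as coded.

```python
import math


def bessel_i(order: int, x: float, terms: int = 200) -> float:
    """Eq. (12), truncated: modified Bessel function of the first kind."""
    total = 0.0
    for m in range(terms):
        log_term = ((2 * m + order) * math.log(x / 2.0)
                    - math.lgamma(m + 1) - math.lgamma(m + order + 1))
        total += math.exp(log_term)
    return total


def skellam_pmf(z: int, arrival_rate: float, departure_rate: float,
                slot_seconds: float) -> float:
    """Eq. (11): probability of z net arrivals in one slot."""
    lam, mu, t = arrival_rate, departure_rate, slot_seconds
    return (math.exp(-(lam + mu) * t)
            * (lam / mu) ** (z / 2.0)
            * bessel_i(abs(z), 2.0 * t * math.sqrt(lam * mu)))


def _poisson(count: int, mean: float) -> float:
    return math.exp(count * math.log(mean) - mean - math.lgamma(count + 1))


def poisson_difference_pmf(z: int, arrival_rate: float, departure_rate: float,
                           slot_seconds: float, max_departures: int = 200) -> float:
    """Cross-check: sum over y of P{y + z arrivals} * P{y departures}."""
    total = 0.0
    for y in range(max(0, -z), max_departures):
        total += (_poisson(y + z, arrival_rate * slot_seconds)
                  * _poisson(y, departure_rate * slot_seconds))
    return total


# Hypothetical rates in users/s with T = 60 s.
p_net_zero = skellam_pmf(0, 0.05, 0.04, 60.0)
```

Working with logarithms inside the series keeps the terms within floating-point range even for large orders; a library routine for the Bessel function could be used instead if one is available.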

Fig. 2. The process of updating $\lambda[n]$ and $\mu[n]$, as well as creating a new transition probability matrix at the interval of $W$ time slots.

In Fig. 2, within the time period of W time slots, λ[n]and µ[n] remains unchanged. A set of new user arrivalrate and departure rate are adopted at the beginning ofeach period of W time slots. The new rates are time-average values over the last W time slots. These newlyadopted rates may not be the same as the actual rates.Thus, we will determine W such that the differencebetween the actual rates and the adopted time-averagevalues are within a tolerable range. Let ∆λ be the errorin the arrival rate, such that the actual arrival rate isλ[n] +∆λ. We focus on explaining for arrival rate whilethe explanation will also apply to departure rate. Giventhe error in arrival rate, the transition probability matrixwill also have an error. Let p̂ij,n(a) be the correspondingtransition probability with λ[n] +∆λ. Consider only thenon-zero terms in (13), p̂ij,n(a) is given by (14), wherethe approximation is achieved through series expansion,and by assuming the error ∆λ ≪ 1. Now, the errorin transition probability is the difference between (13)and (14). Obviously, when ∆λ = 0, the error in tran-sition probability is also zero and this is achieved byupdating λ[n] in every time slot. However, this is notpractical and computational intensive. Practically, weare not interested in minimizing ∆λ but maximizingit for a given tolerable error in transition probability,which may in turn determined by the tolerable error inperformance metrics. Recall that λ̇ is the speed of change


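$$
p_{ij,n}(a) =
\begin{cases}
e^{-(\lambda[n]+\mu[n])T} \left(\dfrac{\lambda[n]}{\mu[n]}\right)^{z/2} I_{|z|}\!\left(2T\sqrt{\lambda[n]\mu[n]}\right) & \text{if } h_2(j) = h_2(i) + a \text{ and } h_3(j) = h_3(i) + z \\
0 & \text{otherwise}
\end{cases}
\tag{13}
$$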

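$$
\begin{aligned}
\hat{p}_{ij,n}(a) &= e^{-(\lambda[n]+\Delta\lambda+\mu[n])T} \left(\frac{\lambda[n]+\Delta\lambda}{\mu[n]}\right)^{z/2} I_{|z|}\!\left(2T\sqrt{(\lambda[n]+\Delta\lambda)\mu[n]}\right) \\
&\approx e^{-(\lambda[n]+\mu[n])T} e^{-\Delta\lambda T} \left(\frac{\lambda[n]}{\mu[n]}\right)^{z/2} \left(1 + \frac{z}{2}\frac{\Delta\lambda}{\lambda[n]}\right) I_{|z|}\!\left(2T\sqrt{\lambda[n]\mu[n]}\left(1 + \frac{\Delta\lambda}{2\lambda[n]}\right)\right)
\end{aligned}
\tag{14}
$$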

in user arrival rate. Given a ∆λ, we want to avoid slope overload, which occurs when the user arrival rate changes too fast for the algorithm in Fig. 2 to catch up. Slope overload can be avoided when the following condition is satisfied:

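$$
\begin{aligned}
\frac{\Delta\lambda}{WT} &\ge \dot{\lambda} \\
W &\le \frac{\Delta\lambda}{\dot{\lambda}T}.
\end{aligned}
\tag{15}
$$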

As described above, we want to maximize W to reduce the computational requirement. Hence, we set W = ∆λ/(λ̇T). Reasonably, when the user arrival rate changes very rapidly, we need a small W.
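The choice of W in (15) and the windowed update of Fig. 2 can be illustrated with a minimal Python sketch. The function names, the rounding down to an integer number of slots, the lower clamp at one slot, and the numerical values are assumptions made for illustration; they are not taken from the paper.

```python
import math

def window_length(delta_lambda, lambda_dot, slot_duration):
    """Largest whole number of slots W with W <= delta_lambda / (lambda_dot * T).

    Rounding down and clamping at 1 slot are assumptions of this sketch.
    """
    if lambda_dot <= 0 or slot_duration <= 0:
        raise ValueError("lambda_dot and slot_duration must be positive")
    return max(1, math.floor(delta_lambda / (lambda_dot * slot_duration)))

def windowed_rates(arrivals_per_slot, window, slot_duration):
    """Rate adopted after each complete window: the time average over that window."""
    rates = []
    for end in range(window, len(arrivals_per_slot) + 1, window):
        previous = arrivals_per_slot[end - window:end]
        rates.append(sum(previous) / (window * slot_duration))
    return rates

# Hypothetical numbers: tolerable error 0.5, rate of change 0.001, slot of 60 time units.
W = window_length(0.5, 0.001, 60.0)
adopted = windowed_rates([2, 4, 6, 8], window=2, slot_duration=1.0)
```

A faster-changing arrival rate (larger `lambda_dot`) gives a smaller `W`, which matches the remark above that rapid traffic changes call for a short update window.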

When the system is in state Xn = i, taking an action a ∈ A(i) incurs a cost c(i, a). In our model, the cost is the power consumption given by (2). From that equation, we notice that action a = −1 reduces power consumption while a = +1 increases power consumption, as follows:

c(i, a) = Pcnt + (h2(i) + a)× Pr. (16)
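The non-zero branch of (13) has the form of a Skellam distribution, i.e., the distribution of the difference between two independent Poisson counts with means λ[n]T and µ[n]T. The following Python sketch evaluates it numerically. The helper names and the power-series evaluation of the modified Bessel function are choices made here for illustration only; the paper reports using MATLAB and does not describe its implementation.

```python
import math

def bessel_i(order, x, terms=200):
    """Modified Bessel function of the first kind I_order(x) for integer
    order >= 0 and x > 0, summed from its power series.

    Each term (x/2)^(2m+order) / (m! (m+order)!) is formed in the log domain
    to avoid overflow in the factorials.
    """
    total = 0.0
    for m in range(terms):
        log_term = ((2 * m + order) * math.log(x / 2.0)
                    - math.lgamma(m + 1) - math.lgamma(m + order + 1))
        total += math.exp(log_term)
    return total

def net_change_probability(z, lam, mu, T):
    """Non-zero branch of (13): probability of a net change of z units in one slot."""
    return (math.exp(-(lam + mu) * T)
            * (lam / mu) ** (z / 2.0)
            * bessel_i(abs(z), 2.0 * T * math.sqrt(lam * mu)))

# Sanity check with hypothetical rates: the probabilities over z should sum to
# about 1 and their mean should be about (lam - mu) * T.
zs = range(-40, 41)
probs = [net_change_probability(z, 1.0, 2.0, 1.0) for z in zs]
total_probability = sum(probs)
mean_change = sum(z * p for z, p in zip(zs, probs))
```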

3.2 Voice and Data Users

We now extend the DTMDP model developed in the previous subsection to include data users with different data rates. Compared to a voice user that requires only 1 resource unit, different data users i may require different numbers of resource units ri. This is reasonable assuming that each resource unit provides a given throughput and that different data users have diverse traffic profiles. For example, a web browsing user needs a much lower data rate than a user who is watching video on a smartphone. In this case, the video watching user requires more resource units than the web browsing user. As such, the number of occupied resource units is no longer the same as the number of active users. Since, in the voice-only case, we have already defined the system state Xn based on the number of active resource modules m[n] and the number of occupied resource units u[n], we continue to use the same state definition and state space S without change. However, we highlight that while M remains the maximum number of resource modules supported in the system, U is the maximum number of resource units in a resource module, which does not always equal the number of active users. Given a state, there is also no change to the set of feasible actions defined by (7), as we continue to look for opportunities to turn off resource modules to conserve energy in the same way. Given the same state space and feasible action set, we adopt the same cost function defined by (16) for each action taken in a given state.

While three of the four components in the new DTMDP are the same as those in the voice-only case, the transition probability matrix is different. Consider a group of heterogeneous voice and data users. We assume that the number of resource units ri required by a user i is a random variable uniformly distributed within the range [1, R], where R is the maximum number of resource units that any user may require. With the same user arrival process described earlier by (3), the probability of having a total request for x new resource units in time slot n is given as follows:

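$$
\begin{aligned}
P\{x \text{ resource units required}\}
&= \sum_{\alpha=0}^{\infty} \left( P\left\{ \left( \sum_{i=1}^{\alpha} r_i \right) = x \right\} \times P\{\alpha \text{ users arrive}\} \right) \\
&= \sum_{\alpha=0}^{\infty} \left( P\left\{ \left( \sum_{i=1}^{\alpha} r_i \right) = x \right\} \times \frac{(\lambda[n]T)^{\alpha} e^{-\lambda[n]T}}{\alpha!} \right).
\end{aligned}
\tag{17}
$$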

Unfortunately, there is no closed-form expression for the probability function (17). Nevertheless, we may efficiently compute the probability values after truncating the Poisson distribution for large α. This truncation is acceptable because, for a small time slot duration, the probability of having more than 50 user arrivals within a time slot is negligible. Fig. 3 shows the probability density for the total new resource units required by all newly arrived users in a time slot. The values are computed using MATLAB. In computing the term $P\{(\sum_{i=1}^{\alpha} r_i) = x\}$, we have exploited the fact that the probability density function of a sum of independent random variables is the convolution of the individual probability density functions of the random variables being summed. We notice that the probability of requiring more than 65 new resource units per time slot is negligible. Also, occurrence of the peak probability


value is reasonably shifted to a lower value of x with a smaller λ[n]T.
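The truncated evaluation of (17) by repeated convolution can be sketched in Python as follows. This is only an illustration under stated assumptions: the paper's computation was done in MATLAB and its code is not given, the function names are hypothetical, and the cut-off of 50 arrivals follows the truncation argument above.

```python
import math

def convolve(p, q):
    """Discrete convolution of two probability lists indexed from 0."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

def requested_units_pmf(mean_arrivals, R, max_users=50):
    """Approximate (17): distribution of the total units requested in one slot.

    mean_arrivals is lambda[n] * T (must be positive); each user requests
    1..R units with equal probability; the Poisson sum is truncated at
    max_users arrivals.
    """
    per_user = [0.0] + [1.0 / R] * R       # index k holds P{r_i = k}
    pmf = [0.0] * (max_users * R + 1)
    sum_pmf = [1.0]                        # empty sum: zero units with probability 1
    for alpha in range(max_users + 1):
        # Poisson weight P{alpha users arrive}, formed in the log domain.
        weight = math.exp(-mean_arrivals + alpha * math.log(mean_arrivals)
                          - math.lgamma(alpha + 1))
        for x, p in enumerate(sum_pmf):
            pmf[x] += weight * p
        sum_pmf = convolve(sum_pmf, per_user)   # distribution of a sum of alpha + 1 users
    return pmf

# Parameters of panel (a) in Fig. 3: lambda[n]T = 4 and R = 5.
pmf_a = requested_units_pmf(4.0, 5)
mean_units = sum(x * p for x, p in enumerate(pmf_a))
```

For these parameters the mean of the computed distribution should be close to λ[n]T(R + 1)/2 = 12, which gives a quick check that the truncation has not discarded noticeable probability mass.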

[Figure 3: plots omitted. Panels: (a) λ[n]T = 4, R = 5; (b) λ[n]T = 10, R = 5. Horizontal axis: number of new resource units requested, x. Vertical axis: probability density.]

Fig. 3. The probability density function for the total number of resource units requested by all newly arrived users within a time slot.

Similar to the voice-only case, when a user departs, the occupied resource units are released. However, the number of released resource units is not always 1 per user. For the user departure process, we assume the same uniform distribution for the number of resource units per user that was adopted above for the arrival process. In the voice-only case, the user departure rate is time-varying but independent of the number of active users. Here, in the presence of data users, the user departure rate depends on the number of users. This is because the data throughput of each resource unit decreases as an increasing number of active users contributes to a higher level of multi-user interference in the wireless system. As such, while we continue to use the algorithm in Fig. 2 to update λ[n] and µ[n] periodically, the effective departure rate may differ within a time window W when the DTMDP moves from one state to another. Here, the effective departure rate µ̄[n] is given as follows:

$$
\bar{\mu}[n] = \frac{\mu[n]}{\max\{1, \gamma[n]\}}, \tag{18}
$$

where max{·} in the denominator is needed to upper bound the effective user departure rate because there is a practical limit on system throughput, and γ[n] determines how sensitive the effective user departure rate is to the number of occupied resource units in time slot n. Here, γ[n] is computed as follows:

$$
\gamma[n] = \epsilon - \frac{\epsilon - 1}{e^{\kappa u[n] - \kappa}}, \tag{19}
$$

where u[n] has been defined earlier as the number of occupied resource units in time slot n. In the equation, 1/ϵ is the maximum degradation factor suffered by the user departure rate when the number of occupied resource units increases. Here, κ determines the speed of the degradation such that a larger κ produces a slower degradation with respect to u[n]. Recall that user departures follow a Poisson process, which implies an exponentially distributed service time. For a data user performing file transfer, the file size is exponentially distributed. Therefore, the average file size affects the values of µ[n] and R, and consequently affects µ̄[n] in (18). Given that the system is in state Xn = (mn, un) in time slot n, the probability of having a total of y resource units released by all departing users is given as follows:

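$$
\begin{aligned}
P\{y \text{ resource units released} \mid X_n\}
&= \sum_{\beta=0}^{\infty} \left( P\left\{ \left( \sum_{i=1}^{\beta} r_i \right) = y \right\} \times P\{\beta \text{ users depart} \mid X_n\} \right) \\
&= \sum_{\beta=0}^{\infty} \left( P\left\{ \left( \sum_{i=1}^{\beta} r_i \right) = y \right\} \times P\{\beta \text{ users depart} \mid u_n\} \right) \\
&= \sum_{\beta=0}^{\infty} \left( P\left\{ \left( \sum_{i=1}^{\beta} r_i \right) = y \right\} \times \frac{(\bar{\mu}[n]T)^{\beta} e^{-\bar{\mu}[n]T}}{\beta!} \right).
\end{aligned}
\tag{20}
$$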

Similar to (17), we compute the conditional probability density numerically because there is no closed-form expression. We do not show the computed probability values here to avoid redundancy because they are similar to those in Fig. 3.
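The effective departure rate in (18) and (19), which feeds the Poisson term of (20), can be computed as in the following Python sketch. The function names are hypothetical, and the early return for u[n] ≤ 1 is an implementation choice made here to avoid floating-point overflow; it relies on γ[n] ≤ 1 in that range, which holds when ϵ ≥ 1 and κ ≥ 0.

```python
import math

def gamma_factor(occupied_units, epsilon, kappa):
    """Sensitivity factor of (19) for occupied_units >= 1.

    Written as epsilon - (epsilon - 1) * exp(-(kappa*u - kappa)), which equals
    epsilon - (epsilon - 1) / exp(kappa*u - kappa) without a large exponent.
    """
    return epsilon - (epsilon - 1.0) * math.exp(-kappa * (occupied_units - 1))

def effective_departure_rate(mu, occupied_units, epsilon, kappa):
    """Effective departure rate of (18): mu / max(1, gamma)."""
    if occupied_units <= 1:
        # gamma <= 1 here (assuming epsilon >= 1, kappa >= 0), so max(1, gamma) = 1.
        return mu
    return mu / max(1.0, gamma_factor(occupied_units, epsilon, kappa))

# Hypothetical values: with epsilon = 1.5 the rate falls from mu towards mu / 1.5.
rate_light = effective_departure_rate(3.0, 1, 1.5, 1.0)
rate_heavy = effective_departure_rate(3.0, 100, 1.5, 1.0)
```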

Let z = x − y be the net resource units required when the system transits from one state to another. Notice that z is the difference between the new resource units requested and those released. This is similar to the variable z defined earlier for the voice-only case. Then, in the absence of a closed-form expression, we may compute the probability density values for z given that we have already derived (17) and (20). Fig. 4 shows examples of computed probability density values in different system states. We see that z can be positive or negative, where a positive number means an increase in the number of occupied resource units and a negative number means a reduction. The occurrence of the peak probability value shifts to a higher positive number when the system is in a state with more occupied resource units u[n]. This is reasonable because, as discussed earlier, a higher u[n], which implies more active users, leads to a lower throughput and thus a smaller


effective user departure rate. A lower departure rate at the same arrival rate produces a higher chance of a larger z.
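Given the two distributions from (17) and (20), the distribution of z = x − y can be obtained by summing products over all pairs (x, y). The Python sketch below assumes that, conditioned on the state, the requested and released unit counts are independent; the paper does not state this explicitly, and the function name and toy input values are hypothetical.

```python
def net_units_pmf(request_pmf, release_pmf):
    """Distribution of z = x - y for independent x (requested) and y (released).

    Both inputs are lists indexed by the number of units; the result maps
    each attainable z to its probability.
    """
    pmf = {}
    for x, p_x in enumerate(request_pmf):
        for y, p_y in enumerate(release_pmf):
            pmf[x - y] = pmf.get(x - y, 0.0) + p_x * p_y
    return pmf

# Toy input: x is 0 or 1 with equal probability; y is 0 w.p. 0.25 and 1 w.p. 0.75.
toy = net_units_pmf([0.5, 0.5], [0.25, 0.75])
```

In the toy case the mass is spread over z = −1, 0 and +1, with the negative side heavier because releases are more likely than requests.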

[Figure 4: plots omitted. Panels: (a) λ[n]T = 10, µ[n]T = 15, R = 5, Xn = (mn, 1); (b) λ[n]T = 10, µ[n]T = 15, R = 5, Xn = (mn, 5). Horizontal axis: net resource units required, z. Vertical axis: probability density.]

Fig. 4. The probability density function for the net number of new resource units required within a time slot. The probability values are computed with γ = 1.

With the computed conditional probability density for z, we can now create the transition probability matrix for the case that supports both voice and data users using (9). This transition probability matrix, together with the state space, action set and cost function described earlier in this subsection, completes our DTMDP model for mixed voice and data users.

3.3 Solving the DTMDP

Both the voice-only DTMDP and the mixed-user DTMDP can be solved for their respective optimal policies using the method presented in this subsection. Recall that when an action is taken in time slot n, a state is realized in the next time slot n + 1. In general, the action taken depends on the history of the process up to time slot n. Given a process history at time slot n, a decision rule is a function fn that assigns a probability distribution over the set A(Xn). A sequence of decision rules f = (f0, f1, f2, · · · ) is called a policy. A policy is stationary when it is completely described by a single decision rule, f = (f0, f0, f0, · · · ). We consider stationary policies hereafter, assuming each DTMDP is stationary within the window of W time slots in Fig. 2. Our goal is to find an optimal stationary policy that minimizes the average cost per time slot (stage), where the average cost v̄f(i) for a given policy f is given as follows:

$$
\bar{v}_f(i) = \lim_{N\to\infty} \frac{1}{N} E_f\left[ \sum_{n=0}^{N} c(X_n, A_n) \,\middle|\, X_0 = i \right], \tag{21}
$$

where Ef represents expectation conditioned on policy f. Then, the optimal cost is given by

$$
g = \bar{v}^{*}(i) = \inf_{f\in F} \bar{v}_f(i), \tag{22}
$$

where F is the set of all possible policies.

A stationary policy is deterministic if there is a function f(Xn) ∈ A(Xn) such that

$$
P\{A_n = a \mid X_n = i\} =
\begin{cases}
1 & \text{if } a = f(i) \\
0 & \text{otherwise}
\end{cases}
\tag{23}
$$

For an ergodic and stationary DTMDP, the optimal deterministic policy can be determined easily using dynamic programming with policy iteration or value iteration through Bellman's optimality equation. Unfortunately, our DTMDP model is not ergodic under a deterministic policy because not all the states in S communicate. Therefore, we cannot use the simple Bellman's optimality equation to find the optimal policy. Even if our DTMDP were ergodic, using Bellman's optimality equation would still not be a good option because it does not deal with optimization constraints. Recall that we want to minimize power consumption subject to meeting the minimum network service quality in terms of blocking probability, and this service quality requirement is our constraint.

Fortunately, we can make the DTMDP ergodic by making the decision rule probabilistic, such that there is a probability distribution f(i, a) over the three possible actions a at each state i, as follows:

f(i, a) = P{An = a|Xn = i}. (24)

We call f(i, a) ∈ F the probabilistic policy for the DTMDP. The commonly used policy iteration and value iteration cannot find the optimal probabilistic policy, and they are also not capable of enforcing constraints. In view of this, we model the DTMDP as a linear programming problem and then solve it for the optimal probabilistic policy. Linear programming suits our DTMDP because it can handle optimization constraints.
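To make the notion of a probabilistic policy in (24) concrete, the following Python sketch draws an action at a given state according to f(i, a). The dictionary layout of the policy, the function name, and the example probabilities are assumptions made for illustration; states are written as (m, u) pairs and actions as −1, 0 and +1, following the notation of the model.

```python
import random

def sample_action(policy, state, rng=random):
    """Draw one action for `state` with probabilities f(i, a) = P{A_n = a | X_n = i}.

    `policy` maps a state to a dict {action: probability}.
    """
    actions = list(policy[state].keys())
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Hypothetical policy: in state (2, 10), deactivate a module 30% of the time.
example_policy = {(2, 10): {-1: 0.3, 0: 0.7, +1: 0.0}}
rng = random.Random(0)
draws = [sample_action(example_policy, (2, 10), rng) for _ in range(10000)]
fraction_deactivate = draws.count(-1) / len(draws)
```

A deterministic policy as in (23) is the special case in which one action has probability 1 at every state.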

Let πi be the stationary probability of state i. Then, we define a new variable xia = πif(i, a), which is the long-run fraction of time that the system is in state i and taking action a. As such, we further define the linear


programming problem as follows:

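$$
\begin{aligned}
\text{minimize} \quad & \sum_{i\in S}\sum_{a\in A(i)} c(i,a)\,x_{ia} && (25) \\
\text{subject to} \quad & \sum_{a\in A(j)} x_{ja} - \sum_{i\in S^{-}(j)}\sum_{a\in A(i)} p_{ij}(a)\,x_{ia} = 0, \quad j\in S && (26) \\
& \sum_{i\in S}\sum_{a\in A(i)} x_{ia} = 1 && (27) \\
& x_{ia} \ge 0, \quad i\in S,\ a\in A(i) && (28) \\
& x_{ia} = 0, \quad i\in S,\ a\notin A(i) && (29) \\
& \sum_{i\in S^{+}}\sum_{a\in A(i)} x_{ia} \le Q && (30)
\end{aligned}
$$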

where S−(j) is the set of possible predecessors of state j, i.e., S−(j) = {i : j ∈ S(i, a), a ∈ A(i)}, and S+ is the set of states {(m, u) : u = m × U}.

In the linear programming problem, there are a total of 5 constraints. The first constraint is the balance equation of the Markov chain, which says that the stationary probability of a state equals the sum of all its incoming probabilities. The second constraint is the normalization equation of the Markov chain, which says that the stationary probabilities of all states must sum to unity. The third constraint is the non-negativity requirement on all probability values. The fourth constraint accounts for infeasible actions in a given state. The fifth constraint enforces the network service quality requirement. Recall that network service quality is measured in terms of user blocking probability, which is computed using (5), and the tolerable upper bound on this probability is given as Q in the constraint. We set Q to the blocking probability suffered by the system when the proposed dynamic activation policy is not applied. As such, we can observe the reduction in power consumption at the same network service quality.
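Once a linear programming solver returns the variables xia, the quantities in (25) to (30) and the probabilistic policy f(i, a) = xia/πi can be read off directly. The Python sketch below does not solve the linear program; it only evaluates a candidate solution. The function name, the dictionary layout, and the two-state toy chain with states "idle" and "full" and actions "stay" and "alt" are hypothetical and serve only to exercise the formulas.

```python
def evaluate_solution(x, cost, transitions, full_states):
    """Evaluate a candidate solution of the linear program (25)-(30).

    x[(i, a)]           long-run fraction of time in state i taking action a
    cost[(i, a)]        c(i, a)
    transitions[(i, a)] dict mapping next state j to p_ij(a)
    full_states         the set S+ of states in which every active unit is occupied
    """
    pi, inflow = {}, {}
    for (i, a), value in x.items():
        pi[i] = pi.get(i, 0.0) + value                 # pi_i = sum over a of x_ia
        for j, p in transitions[(i, a)].items():
            inflow[j] = inflow.get(j, 0.0) + p * value
    states = set(pi) | set(inflow)
    return {
        "objective": sum(cost[key] * value for key, value in x.items()),      # (25)
        "balance_residual": max(abs(pi.get(j, 0.0) - inflow.get(j, 0.0))
                                for j in states),                              # (26)
        "total": sum(x.values()),                                              # (27)
        "blocking": sum(v for (i, _), v in x.items() if i in full_states),     # (30)
        "policy": {(i, a): v / pi[i] for (i, a), v in x.items() if pi[i] > 0.0},
    }

# Toy chain: every state-action pair moves to "idle" w.p. 0.75 and "full" w.p. 0.25.
step = {"idle": 0.75, "full": 0.25}
x = {("idle", "stay"): 0.5, ("idle", "alt"): 0.25, ("full", "stay"): 0.25}
cost = {("idle", "stay"): 15.0, ("idle", "alt"): 25.0, ("full", "stay"): 15.0}
transitions = {key: step for key in x}
result = evaluate_solution(x, cost, transitions, full_states={"full"})
```

In this toy case the balance residual is zero, the mass in the full state is 0.25, and the recovered policy at "idle" chooses "stay" with probability 2/3.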

4 PERFORMANCE EVALUATION

We have implemented the proposed DTMDP and solved for its optimal probabilistic policy using MATLAB. Different system configurations lead to different optimal policies. In the performance evaluation, we have assumed the following nominal values for various parameters: T = 1 minute, ∆λ/λ̇ = 36000, M = 6, U = 25, R = 3, ϵ = 1.5, κ = 1000, Q = 0.01, Pr = 10 and Pcnt = 5. These values are used unless stated otherwise. Notice that the nominal value of ∆λ/λ̇ implies that the user arrival and departure rates in Fig. 2 are updated once every hour. We did not assign a specific unit such as watts to Pcnt and Pr because the actual power can vary widely depending on the equipment. We use a normalized power unit such that each module consumes 10 units of power, where in practice each unit of power can be 500 W.

For each identified optimal policy, we can determine its power consumption. For performance comparison, we have created a baseline system that has M resource modules which are activated at all times. We normalize the power consumption of the optimal policy by that of the baseline system to produce the normalized power consumption, which is our key performance metric. A normalized value smaller than 1 indicates that the proposed dynamic resource activation scheme is more energy efficient than the baseline system. A normalized power consumption larger than 1 is obtained only when the proposed scheme is less energy efficient than the baseline scheme.
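The normalized power consumption can be illustrated with a short Python sketch. It assumes that the long-run distribution of the number of active modules under the policy is already known (for example, from the solved linear program) and that the power drawn with m active modules is Pcnt + m × Pr, as in the cost function (16). The function name and the example distributions are hypothetical.

```python
def normalized_power(active_module_distribution, M, P_cnt, P_r):
    """Average policy power divided by the power of the always-on baseline.

    active_module_distribution maps a number of active modules m to the
    long-run fraction of time the policy keeps m modules active.
    """
    policy_power = sum(prob * (P_cnt + m * P_r)
                       for m, prob in active_module_distribution.items())
    baseline_power = P_cnt + M * P_r      # all M modules active at all times
    return policy_power / baseline_power

# With P_cnt = 0 and M = 2, spending half the time with one module active and
# half with two gives (0.5 * 10 + 0.5 * 20) / 20 = 0.75.
example = normalized_power({1: 0.5, 2: 0.5}, M=2, P_cnt=0.0, P_r=10.0)
```

With the nominal Pcnt = 5 and Pr = 10, keeping three of six modules active at all times would give 35/65, which is above 0.5 because the fixed term Pcnt is not reduced by deactivating modules.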

[Figure 5: plot omitted. Horizontal axis: number of supported voice-only users per resource module, U. Vertical axis: normalized power consumption. Curves: number of departures per time slot µ[n]T = 1 and µ[n]T = 2.]

Fig. 5. Normalized power consumption with increasing number of voice-only users supported per resource module, at different average numbers of user departures per time slot µ[n]T and number of resource modules M = 2.

To start, we consider the voice-only scenario with Pcnt = 0 and λ[n]T = 1, for a small network with M = 2. There are only two resource modules. If one of the two modules can be deactivated, we achieve a normalized power consumption of 0.5, which implies a 50% power saving. Fig. 5 shows that the proposed policy is always more power efficient than the baseline system, as the normalized power consumption never exceeds unity. The figure also indicates that it is possible to achieve more than 50% reduction in power consumption with certain settings.

Fig. 5 shows results for two very different traffic conditions. When µ[n]T = 1, the ratio λ[n]/µ[n] = 1, and this is a very high traffic load because any communication system must keep the ratio no larger than 1 in order to be stable. On the other hand, when µ[n]T = 2, the ratio λ[n]/µ[n] = 0.5, and this is a very low traffic load because only half of the system capability is utilized on average. Fig. 5 shows that there are two contradicting effects of increasing the number of supported users per resource module U. When the traffic load is very low, a larger U can lead to more power saving. When the traffic load is very high, a larger U can lead to less power saving. A larger U implies a bigger system capacity because


the system can now support more users, as given by U × M. At the very high traffic load (µ[n]T = 1), a bigger system capacity means a lower user blocking probability Q. Recall that we have used Q as the constraint in our linear programming problem. A tighter constraint leads to a smaller feasible region in finding an optimal policy and, therefore, a lower power saving. On the other hand, at the very low traffic load (µ[n]T = 2), the blocking probability is already negligible even if one of the two resource modules is deactivated, and the Q constraint is hardly enforced. The dynamic resource activation policy can exploit the situation to turn off one of the resource modules most of the time, leading to a significant improvement in power saving. The chance of this happening is higher with a larger U.

[Figure 6: plot omitted. Horizontal axis: number of supported voice-only users per resource module, U. Vertical axis: blocking probability, Pblock. Curves: number of departures per time slot µ[n]T = 1 and µ[n]T = 2.]

Fig. 6. Blocking probability with increasing number of voice-only users supported per resource module, at different average numbers of user departures per time slot µ[n]T and number of resource modules M = 2.

Fig. 6 shows the corresponding user blocking probability for each data point in Fig. 5. We confirm that the blocking probabilities never exceed the constraint. In Fig. 6, the blocking probability is quite large when µ[n]T = 1. Specifically, the blocking probability can be as high as 11.5% when only 6 voice users are supported per resource module. This is because, as explained earlier, the ratio λ[n]/µ[n] = 1 is the upper limit for stability in any communication network. At such a boundary condition, it is reasonable to see a higher blocking probability. The actual value of the blocking probability is not important to us because we are only concerned with the relative power consumption with and without our scheme, given any blocking probability. The important point is that our scheme can indeed reduce power consumption without compromising the service quality, given in terms of blocking probability, seen by the baseline scheme.

Fig. 7 shows the normalized power consumption at λ[n]/µ[n] = 1 for different combinations of M and U. Similar to U, adding more resource modules M also increases the system capacity. Adding a resource module has the effect of rapidly reducing the traffic load for a given ratio of λ[n]/µ[n] because each resource module adds as many as U resource units to the system. Therefore, we see an opposite trend in the normalized power consumption for M = 2 compared to that for other values of M. This is a result of switching the system from very high load (M = 2) to low load. Similar to Fig. 5, at very high load, increasing U reduces the power saving. This is the opposite of what we see for other values of M.

[Figure 7: plot omitted. Horizontal axis: number of supported voice-only users per resource module, U. Vertical axis: normalized power consumption. Curves: number of resource modules M = 2, 4, 6 and 8.]

Fig. 7. Normalized power consumption with increasing number of voice-only users supported per resource module, at different numbers of resource modules M and λ[n]T = 1.

Fig. 7 shows that, for a given U, an increasing M leads to a rapid decrease in normalized power consumption. For example, for U = 18, the dynamic activation scheme consumes about 50% of the power required by the baseline system with M = 2. This percentage drops to about 15% when M is increased to 8. This implies that a system with more resource modules has a higher potential for energy saving. This is reasonable because more resource modules offer greater flexibility, with each module supporting fewer resource units. As such, we call for a design with a larger M but smaller U for a desired M × U.

While Fig. 7 suggests that the power saving can be as high as 85%, we want to highlight that the values are computed with Pcnt = 0. In practice, Pcnt is hardly close to zero because the base station needs to consume some power even when it is not supporting any network traffic. Fig. 8 shows how the normalized power consumption is affected by Pcnt. Reasonably, the normalized power consumption increases when Pcnt increases. This result tells us that reducing the fundamental power consumption of a base station is a very important research activity, in addition to our effort in saving power by dynamically turning off resource modules. Similar to the trend we see in Fig. 7, the normalized power consumption is lower with a higher number of resource modules. This is because, at a given U and µ[n]T, a larger M means a higher chance of finding a resource module that is not being used and can therefore be turned off to save


power.

[Figure 8: plot omitted. Horizontal axis: number of resource modules, M. Vertical axis: normalized power consumption. Curves: fundamental power consumption Pcnt = 10, Pcnt = 5 and Pcnt = 0.]

Fig. 8. Normalized power consumption with increasing number of resource modules, with U = 18 voice-only users supported per module, and the average number of user departures per time slot, µ[n]T = 1.

Cross-referencing Fig. 5 and Fig. 6, we notice that it is possible to achieve more than 50% power saving while keeping the blocking probability at a negligible level. Specifically, for U = 18 and µ[n]T = 2, the normalized power consumption is about 0.5 when the blocking probability is almost zero. This is clear evidence that we can conserve power without compromising network service quality.

[Figure 9: plot omitted. Horizontal axis: number of voice-only user departures per time slot, µ[n]T, running from 5 down to 1. Vertical axis: normalized power consumption. Curves: number of supported voice users per module = 8, 9 and 10.]

Fig. 9. Normalized power consumption with increasing average number of user departures per time slot, µ[n]T.

Recall that the blocking probability, which is imposed as a service quality requirement, has the effect of limiting the power saving. Now, we further investigate the effect of not equating Q to the baseline blocking probability. Instead, we set Q to a target value, say 1%. In this case, we tolerate a blocking probability of up to 0.01 regardless of the baseline system. Fig. 9 shows that the dynamic policy can consistently achieve up to 50% power saving. In Fig. 9, there is no result for U = 8 and µ[n]T = 1 because there is no feasible dynamic policy that can achieve 1% blocking under that condition.

[Figure 10: plot omitted. Horizontal axis: number of resource units per resource module, U. Vertical axis: normalized power consumption. Curves: number of resource modules M = 6, 8 and 10.]

Fig. 10. Normalized power consumption with increasing number of resource units per resource module, at different numbers of resource modules. In this figure, the blocking probability Q = 0.01, average number of user arrivals per time slot λ[n]T = 25, and average number of user departures per time slot µ[n]T = 27.5.

Up to this point, we have focused on voice-only users. We now study the performance given a mix of voice and data users. For these evaluations, we set Q to 0.01. We also set λ[n]T = 25 and µ[n]T = 27.5 so that λ[n]/µ[n] ≈ 0.9 < 1. Each data user may require up to R = 3 resource units. Fig. 10 indicates that we need only M = 6 resource modules to achieve a power saving of 50%. Since the system is not at the boundary load condition (λ[n]/µ[n] = 1), the result is consistent with Fig. 5 and Fig. 7, where a larger U leads to a lower normalized power consumption. As such, we conclude that the mix of data and voice users does not change the performance trend but only the system setting necessary for a desired performance. Specifically, a mixed-user system needs a larger M than a voice-only system with the same user arrival and departure rates.

[Figure 11: plot omitted. Horizontal axis: hour of the day. Vertical axis: normalized power consumption. Curves: blocking probability Q = 0.01 and Q = 0.005.]

Fig. 11. Normalized power consumption at different hours of a day, where hour 01 is 01:00AM. In this figure, number of resource modules M = 6 and number of resource units per module U = 25.


Now, we present some evaluation results as a function of time so that we may observe the effect of time-varying traffic load. In this evaluation, the user arrival rate λ[n] is updated hourly, at the beginning of each hour, using the procedure in Fig. 2, and the actual user arrival rate λ(t) at time t is given by:

$$
\lambda(t) = 20\left(1 + 0.25 \times \sin\left(\frac{\pi\,(t/3600 - 3)}{12}\right)\right). \tag{31}
$$

The user arrival rate is generated in this way to highlight the time-varying and periodic characteristics of network traffic. In this traffic model, the user arrival rate is highest at 9AM and lowest at 9PM. We let the user departure rate be 1.1 times the corresponding user arrival rate. We configure the system to use M = 6 and U = 25. Fig. 11 shows that the normalized power consumption increases and decreases, tracking the changes in network traffic. The power consumption is highest when the network traffic is highest. This is a good indication that our system is efficient because it consumes more power only when there is more traffic to support. More importantly, this dynamic activation and deactivation of resource modules is done without compromising service quality. Specifically, the user blocking probability requirement is always met for both Q = 0.01 and Q = 0.005. The normalized power consumption for Q = 0.005 is higher than that for Q = 0.01 because a smaller Q is a tighter quality requirement, resulting in a lower chance to deactivate a resource module.
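The traffic model in (31) is easy to reproduce. The Python sketch below evaluates it at the start of each hour, assuming that t is measured in seconds from midnight; this reading is inferred from the t/3600 term and the stated 9AM peak, and the function names are hypothetical.

```python
import math

def arrival_rate(t_seconds):
    """Actual user arrival rate of (31), with t in seconds since midnight (assumed)."""
    return 20.0 * (1.0 + 0.25 * math.sin(math.pi * (t_seconds / 3600.0 - 3.0) / 12.0))

def departure_rate(t_seconds):
    """Departure rate set to 1.1 times the arrival rate, as in the evaluation."""
    return 1.1 * arrival_rate(t_seconds)

# Arrival rate sampled at the beginning of every hour of one day.
hourly = [arrival_rate(hour * 3600) for hour in range(24)]
peak_hour = hourly.index(max(hourly))
quiet_hour = hourly.index(min(hourly))
```

The sampled rate swings between 15 and 25 around a mean of 20, with the peak at hour 9 and the minimum at hour 21, which agrees with the 9AM and 9PM extremes stated above.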

5 CONCLUSION

In this paper, we have studied the possibility of reducing the power consumption of a base station by deactivating some of its modular resources depending on the user traffic arrival. Simply put, the idea is to activate more resource modules when there are more users, and deactivate some resource modules when there are fewer users, given that network traffic is time-varying. We aim to reduce power consumption without compromising network service quality, which is measured in terms of user blocking probability. We have developed a DTMDP with linear programming and solved for its optimal probabilistic policy. Considering voice users and data users, the evaluation results show that it is possible to save up to 50% of the power consumption while keeping the user blocking probability as low as 0.005. In our future work, we will investigate the effects of radio propagation impairments and user mobility on reducing power consumption for a wireless base station with modular resources.

REFERENCES

[1] The World Bank, “Turn Down the Heat: Why a 4°C Warmer World Must Be Avoided,” November 2012.

[2] Willem Vereecken, Ward Van Heddeghem, Didier Colle, Mario Pickavet and Piet Demeester, “Overall ICT Footprint and Green Communication Technologies,” Proceedings of the International Symposium on Communications, Control and Signal Processing, March 2010.

[3] Molly Webb, “Smart 2020: Enabling The Low Carbon Economy in The Information Age,” The Climate Group, Tech. Rep., 2008.

[4] Maruti Gupta and Suresh Singh, “Greening of The Internet,” Proceedings of the ACM SIGCOMM conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 19-26, August 2003.

[5] Sergiu Nedevschi, Lucian Popa, Gianluca Iannaccone, Sylvia Ratnasamy, and David Wetherall, “Reducing Network Energy Consumption via Sleeping and Rate-Adaptation,” Proceedings of the USENIX Symposium on Networked Systems Design and Implementation, pp. 323-336, April 2008.

[6] M. Chuah, Wei Luo and X. Zhang, “Impacts of Inactivity Timer Values on UMTS System Capacity,” Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), pp. 897-903, March 2002.

[7] Jui-Hung Yeh, Jyh-Cheng Chen and Chi-Chen Lee, “Comparative Analysis of Energy-Saving Techniques in 3GPP and 3GPP2 Systems,” IEEE Transactions on Vehicular Technology, vol. 58, no. 1, pp. 432-447, January 2009.

[8] Michele Polignano, Dario Vinella, Daniela Laselva, Jeroen Wigard and Troels B. Sorensen, “Power Savings and QoS Impact for VoIP Application with DRX/DTX Feature in LTE,” Proceedings of the IEEE Vehicular Technology Conference (VTC), May 2011.

[9] Ziaul Hasan, Hamidreza Boostanimehr and Vijay K. Bhargava, “Green Cellular Networks: A Survey, Some Research Issues and Challenges,” IEEE Communications Surveys & Tutorials, vol. 13, no. 4, pp. 524-540, Fourth Quarter 2011.

[10] Jacques Palicot, “Cognitive Radio: An Enabling Technology for The Green Radio Communications Concept,” Proceedings of the International Conference on Wireless Communications and Mobile Computing (IWCMC), pp. 489-494, June 2009.

[11] Holger Claussen, Lester T. W. Ho and Florian Pivit, “Effects of Joint Macrocell and Residential Picocell Deployment on the Network Energy Efficiency,” Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Communications (PIMRC), September 2008.

[12] Vandana Bassoo, Kevin Tom, A. K. Mustafa, Ellie Cijvat, Henrik Sjoland and Mike Faulkner, “A Potential Transmitter Architecture for Future Generation Green Wireless Base Station,” EURASIP Journal on Wireless Communications and Networking, November 2009.

[13] Jyrki T. Louhi, “Energy Efficiency of Modern Cellular Base Stations,” Proceedings of the International Conference on Telecommunications Energy, pp. 475-476, October 2007.

[14] Fred Richter, Albrecht J. Fehske, and Gerhard P. Fettweis, “Energy Efficiency Aspects of Base Station Deployment Strategies for Cellular Networks,” Proceedings of the IEEE Vehicular Technology Conference (VTC), September 2009.

[15] B. Badic, T. O’Farrell, P. Loskot and J. He, “Energy Efficient Radio Access Architectures for Green Radio: Large versus Small Cell Size Deployment,” Proceedings of the IEEE Vehicular Technology Conference (VTC), September 2009.

[16] Peng-Yong Kong, “Power Consumption and Packet Delay Relationship for Heterogeneous Wireless Networks,” IEEE Communications Letters, vol. 17, no. 7, pp. 1376-1379, July 2013, doi:10.1109/LCOMM.2013.052013.130423.

[17] D. Calin, H. Claussen and H. Uzunalioglu, “On Femto Deployment Architectures and Macrocell Offloading Benefits in Joint Macro-Femto Deployments,” IEEE Communications Magazine, vol. 48, no. 1, pp. 26-32, January 2010.

[18] Yujae Song, Peng-Yong Kong and Youngnam Han, “Power-optimized Vertical Handover Scheme for Heterogeneous Wireless Networks,” IEEE Communications Letters, 2014, doi:10.1109/LCOMM.2013.120713.132279.

[19] Zhisheng Niu, Yiqun Wu, Jie Gong and Zexi Yang, “Cell Zooming for Cost-Efficient Green Cellular Networks,” IEEE Communications Magazine, vol. 48, no. 11, pp. 74-79, November 2010.

[20] Marco Ajmone Marsan and Michela Meo, “Energy Efficient Wireless Internet Access with Cooperative Cellular Networks,” Elsevier Computer Networks, vol. 55, no. 2, pp. 386-398, February 2011.

[21] Junhyuk Kim, Peng-Yong Kong, Nah-Oak Song, June-Koo Kevin Rhee and Saleh Al-Araji, “MDP Based Dynamic Base Station Management for Power Conservation in Self-Organizing Networks,” Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), April 2014.

[22] Salah-Eddine Elayoubi, Louai Saker and Tijani Chahed, “Optimal Control for Base Station Sleep Mode in Energy Efficient Radio Access Networks,” Proceedings of the IEEE INFOCOM, pp. 106-110, April 2011.

[23] Martin L. Puterman, “Markov Decision Processes: Discrete Stochastic Dynamic Programming,” John Wiley and Sons, 1994.

[24] Ghasem N. Shirazi, Peng-Yong Kong and Chen-Khong Tham, “Markov Decision Process Frameworks for Cooperative Retransmission in Wireless Networks,” Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), April 2009, doi:10.1109/WCNC.2009.4917801.

[25] L. Saker, S. E. Elayoubi, R. Combes and T. Chahed, “Optimal Control of Wake Up Mechanisms of Femto Cells in Heterogeneous Networks,” IEEE Journal on Selected Areas in Communications, vol. 30, no. 3, pp. 664-672, April 2012.

[26] Peng-Yong Kong and Dorin Panaitopol, “Reinforcement Learning Approach to Dynamic Activation of Base Station Resources in Wireless Networks,” Proceedings of the Personal, Indoor and Mobile Radio Communications (PIMRC), September 2013, doi:10.1109/PIMRC.2013.6666710.

[27] Gilbert Micallef, Preben Mogensen and Hans-Otto Scheck, “Cell Size Breathing and Possibilities to Introduce Cell Sleep Mode,” Proceedings of the European Wireless Conference, October 2010.

[28] Oliver Arnold, Fred Richter, Gerhard Fettweis and Oliver Blume, “Power Consumption Modeling of Different Base Station Types in Heterogeneous Cellular Networks,” Proceedings of the Future Network and Mobile Summit, June 2010.

Peng-Yong Kong (S’99-M’03-SM’12) is currently an Assistant Professor at the Department of Electrical and Computer Engineering, Khalifa University of Science, Technology and Research (KUSTAR), Abu Dhabi, United Arab Emirates. He was previously an adjunct Assistant Professor at the Electrical & Computer Engineering Department, National University of Singapore (NUS), concurrent with his appointment as Research Scientist at the Institute for Infocomm Research (I2R), Agency for Science, Technology & Research (A*STAR), Singapore. Prior to his PhD study, he was an Engineer with Intel Malaysia. He received the BEng degree in electrical & electronic engineering (first-class honors) from the Universiti Sains Malaysia, and the PhD degree in electrical & computer engineering from the National University of Singapore. His research interests are in wireless networking, medium access control, routing, and scheduling.