

Delay-Optimal and Energy-Efficient Communications with Markovian Arrivals

Xiaoyu Zhao, Wei Chen, Senior Member, IEEE, Joohyun Lee, Member, IEEE, and Ness B. Shroff, Fellow, IEEE

Abstract

In this paper, delay-optimal and energy-efficient communication is studied for a single link under Markov random arrivals. We present the optimal tradeoff between delay and power over Additive White Gaussian Noise (AWGN) channels and extend the optimal tradeoff to block fading channels. Under time-correlated traffic arrivals, we develop a cross-layer solution that jointly considers the arrival rate, the queue length, and the channel state in order to minimize the average delay subject to a power constraint. For this purpose, we formulate the average delay and power problem as a Constrained Markov Decision Process (CMDP). Based on steady-state analysis of the CMDP, a Linear Programming (LP) problem is formulated to obtain the optimal delay-power tradeoff. We further derive the optimal transmission strategy using a Lagrangian relaxation technique. Specifically, the optimal adaptive transmission is shown to have a threshold-type structure, where thresholds on the queue length are given for the different transmission rates under the given arrival rates and channel states. By exploiting this result, we develop a threshold-based algorithm to efficiently obtain the optimal delay-power tradeoff. We also show how a trajectory-sampling version of the proposed algorithm can be developed without prior knowledge of the arrival statistics.

Index Terms

Cross-layer design, Markovian Arrivals, Queuing, Markov Decision Process, Energy efficiency,

Average delay, Delay-power tradeoff, Linear programming.

X. Zhao and W. Chen are with the Department of Electronic Engineering and Beijing National Research Center for Information Science and Technology, Tsinghua University. E-mail: [email protected], [email protected].

J. Lee is with the Division of Electrical Engineering, Hanyang University. E-mail: [email protected].

Ness B. Shroff holds a joint appointment in both the Department of ECE and the Department of CSE at The Ohio State University. E-mail: [email protected].

This research was supported in part by the National Natural Science Foundation of China under Grant No. 61671269, the Beijing Natural Science Foundation under Grant No. 4191001, and the National Program for Special Support for Eminent Professionals of China (10,000-Talent Program).

September 2, 2019 DRAFT

arXiv:1908.11797v1 [cs.IT] 30 Aug 2019


I. INTRODUCTION

There is increasing interest in developing strategies to achieve low-latency transmissions in

a wide variety of applications, e.g., in mission critical applications for the Internet of Things

(IoT), or Ultra Reliable and Low Latency Communications (URLLC) in Fifth-Generation (5G)

systems [1, 2]. At the same time, there is also a push towards developing strategies to make

devices and networks more energy efficient [3, 4]. Thus, in our work, we will aim to understand

the fundamental tradeoff between delay and energy. More specifically, we will develop a cross-

layer solution that minimizes the delay for a given power constraint.

Cross-layer design has been used as a potential enabler to satisfy the requirements of low

latency [5]. In [6], a tradeoff between delay and throughput was established based on a cross-

layer design that combines adaptive modulation and coding with a truncated Automatic Repeat

reQuest (ARQ). In [7], the authors proposed a cross-layer power and rate allocation control to

minimize power consumption with a delay constraint in Multiple-Input Multiple-Output (MIMO).

Moreover, energy-efficient cross-layer designs are also studied for packet transmission in wireless

networks. In [8], a cross-layer online algorithm was proposed to obtain a more energy efficient

transmission over wireless networks. For multi-hop wireless networks, a cross-layer framework

was also presented to jointly consider power control and scheduling in [9]. With the stringent requirements of 5G, cross-layer designs have been studied to achieve low-latency and energy-efficient transmissions in multiple scenarios, such as the tactile Internet [10] and wireless mesh networks [11].

In this work, we take a cross-layer design approach to analytically establish the power-delay

tradeoff. To jointly optimize the delay and power, the design problem can be formulated using

a Markov Decision Process (MDP). In [12], Collins and Cruz considered cross-layer scheduling

of an adaptive transmitter over a two-state fading channel. In their work, the authors established

a tradeoff between the average delay and power consumption based on Dynamic Programming

(DP), where the only objective of the MDP is formulated as the weighted sum of average

power and delay. Follow-up papers [13–15] extended this study in various directions with the

DP formulation in [12] employed. In [13], Berry and Gallager formulated the optimal delay-

power tradeoff curve for a multi-state block fading channel, where the fixed-length coding

and variable-length coding are discussed. With the DP formulation, the authors have presented

all the Pareto optimal power-delay operating points and studied the optimal tradeoff in the


regime of asymptotically large delays. For the regime of asymptotically small delays, Berry has

further presented the behavior of the optimal delay-power tradeoff in [14]. Moreover, a single-

parameter scheduler, labeled log-linear scheduler, was proposed over a block fading channel in

[15] with near-optimal performance. In our previous work [16], the optimal delay-power tradeoff

was attained by formulating a Constrained MDP (CMDP). With a probabilistic scheduling framework employed, we converted the CMDP problem into an LP problem. By solving the derived LP problem, we obtain an arbitrary power-delay operating point on the optimal tradeoff curve.

We further focus on the structural properties of the optimal transmission policies in the cross-

layer design. By exploiting structural properties of the optimal policy, a substantial reduction

in computational complexity can be obtained for finding the optimal delay-power tradeoff.

For example, in [17], the structure of the optimal policy was investigated for an adaptive transmitter over a fading channel with interference. The authors of [18] further developed an explicit formula for the optimal transmission rate, through which the optimal rate of a single link over a static channel is expressed as an increasing function of the queue length. In [19], the

optimal scheduling was presented in correlated fading channel with the ARQ protocol employed.

The monotonicity of the optimal scheduling was also shown by presenting the optimal rate as

an increasing function of the buffer occupancy. Moreover, by using the policy structures, the

complexity of point-to-point network transmission control in [20] was effectively reduced with

the tools from graph signal processing employed for large state space. In [21], based on the

structural properties, a novel accelerated reinforcement learning (RL) algorithm was formulated

for an energy-harvesting wireless sensor with latency-sensitive data. Based on the formulated

LP problem in our previous work [16], we also showed a threshold-based structure for the

optimal transmission policies. For the optimal threshold-based policy, we further give a detailed

description by showing that the transmission rates are selected deterministically for all the queue

lengths except a particular threshold. The work about the optimal threshold-based policies was

also extended to the communication systems with adaptive transmission [22], arbitrary burstiness

random arrival [23], and multi-state fading channels [24], respectively.

In this work, we generalize our previous work in [25] to show delay optimality with Markov

arrivals. Our generalization is motivated by the work of [26], where network arrivals are shown

to exhibit time-correlations. By modeling the user’s arrival as a Markov chain, we first present a

cross-layer design to determine the transmission rate. In particular, we determine the transmission


rates through their probability distribution, which is conditioned on the current queue length, arrival rate, and channel state. With a degenerate probability distribution, a deterministic rate selection is obtained as a special case of the probabilistic transmission policies. Under the probabilistic cross-layer design, we then formulate the adaptive transmission as a CMDP. In this way, we next show delay optimality for AWGN channels, where the impact of the Markovian arrivals on the optimal delay-power tradeoff is presented. Furthermore, the optimal

tradeoff between the delay and power consumption is extended to block fading channels.

For AWGN channels, we first convert the formulated CMDP into an equivalent LP problem.

By this means, we construct the optimal delay-power tradeoff to minimize the average delay

under an average power constraint. We further show the optimal tradeoff by using a curve

that consists of all the optimal power-delay pairs for different power constraints. We refer to this curve as the optimal delay-power tradeoff curve, and show its typical geometric properties under Markovian arrivals, i.e., the tradeoff curve is piecewise linear, decreasing, and convex.

By jointly exploiting the properties of both the optimal tradeoff curve and the corresponding

optimal policies, we then show that the optimal average delay is generated by a threshold-

based optimal adaptive transmission policy. Based on the threshold-based structure, we finally

develop an algorithm to efficiently determine the optimal transmission strategies, through which

the optimal delay-power tradeoff is presented. In practice, we show that an online version of the threshold-based algorithm can also be used without any knowledge of the random arrival statistics.

Moreover, we extend the optimal delay-power tradeoff by considering block fading channels.

With a block fading channel employed, we can obtain the equivalent LP problem for the adaptive

transmitter that is derived based on the formulated CMDP. We then obtain a similar threshold-

based structure on the queue length for fading channels. As a result, with the current arrival

rate and channel state given, we can particularly attain the corresponding transmission rate by

comparing the current queue length with the thresholds for different transmission rates.

The rest of this paper is organized as follows. In Section II, the system model is presented

as a CMDP. By formulating the CMDP as an LP problem, Section III investigates the optimal

delay-power tradeoff over AWGN channels. Then, the corresponding optimal transmission policy

is presented in Section IV under the threshold-based structure. In Section V, we further extend

the optimal delay-power tradeoff to a block fading channel. Finally, numerical results and

conclusions are given in Sections VI and VII, respectively.


[Figure: block diagram with the labels Tx, Temporally Dependent Random Arrival, Cross-Layer Scheduling, Channel Encoder, Channel State Information, Adaptive Transmission, and Additive Gaussian Noise, plus an inset plot of transmission power versus transmission rate.]

Fig. 1. System Model

II. SYSTEM MODEL

In this paper, we focus on a single link of an adaptive transmitter that serves traffic arriving

according to a general Markovian process. As shown in Fig. 1, the system is assumed to be

time-slotted. The data packets arrive at the beginning of each timeslot according to a stationary

and ergodic Markov chain that has finite states. The state of the Markov chain corresponds to the

number of packets that arrive in timeslot n, and is denoted by a[n], where the maximum value

of a[n] is defined as A, i.e., a[n] ∈ {0, 1, · · · , A}. Given that a[n] packets arrive in timeslot n,

$a[n+1]$ is characterized by the transition probability $\gamma_{a,a'}$, defined as

$$\gamma_{a,a'} = \Pr\{a[n+1] = a' \mid a[n] = a\}, \tag{1}$$

where $a$ and $a'$ belong to the set $\{0, 1, \cdots, A\}$. In other words, $\gamma_{a,a'}$ is the probability that $a[n+1] = a'$ given that $a[n] = a$. Note that $\gamma_{a,a'} \ge 0$ and $\sum_{a'=0}^{A} \gamma_{a,a'} = 1$. With the transition probabilities $\gamma_{a,a'}$, the expected number of arrivals in a timeslot, $\alpha$, is given by

$$\alpha = \sum_{a=0}^{A} a\,\phi_a, \tag{2}$$

where $\phi_a$ denotes the steady-state probability of $a$ arrivals in a timeslot.
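As an illustration of Eqs. (1) and (2), the following Python sketch computes the steady-state arrival probabilities by power iteration and then the mean arrival rate. The two-state matrix `GAMMA` and the function names are hypothetical and chosen only for this example; power iteration additionally assumes the chain is aperiodic.

```python
def stationary_distribution(gamma, iters=10_000, tol=1e-13):
    """Power iteration for phi = phi * gamma, where gamma[a][a2] = Pr{a -> a2}."""
    n = len(gamma)
    phi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(phi[a] * gamma[a][a2] for a in range(n)) for a2 in range(n)]
        if max(abs(x - y) for x, y in zip(nxt, phi)) < tol:
            return nxt
        phi = nxt
    return phi


def mean_arrival_rate(gamma):
    """Eq. (2): alpha = sum_a a * phi_a."""
    phi = stationary_distribution(gamma)
    return sum(a * p for a, p in enumerate(phi))


# Hypothetical bursty arrival chain with A = 1 (not taken from the paper).
GAMMA = [[0.9, 0.1],
         [0.3, 0.7]]
phi = stationary_distribution(GAMMA)
alpha = mean_arrival_rate(GAMMA)
```

For this example chain the balance condition 0.1 φ_0 = 0.3 φ_1 gives φ = (0.75, 0.25), so α = 0.25 packets per timeslot.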

Arriving packets enter a buffer of size Q. At each time n, the queue length q[n] belongs to

set {0, 1, · · · , Q}, and evolves as

q[n+ 1] = min{max{q[n]− s[n], 0}+ a[n+ 1], Q}, (3)

where s[n] denotes the number of packets that are transmitted in timeslot n.
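The queue recursion in Eq. (3) can be written as a one-line function. This is a minimal sketch; the function and argument names are ours.

```python
def next_queue_length(q, s, a_next, Q):
    """Eq. (3): q[n+1] = min{max{q[n] - s[n], 0} + a[n+1], Q}."""
    return min(max(q - s, 0) + a_next, Q)


normal = next_queue_length(3, 2, 1, Q=5)     # 3 - 2 + 1
underflow = next_queue_length(1, 2, 0, Q=5)  # transmission exceeds backlog
overflow = next_queue_length(5, 0, 2, Q=5)   # buffer is already full
```

The inner `max` clips an over-ambitious transmission at an empty queue, and the outer `min` drops arrivals that do not fit in the buffer.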

Due to the limited throughput of the transmitter, the number of packets that can be transmitted in each timeslot is upper bounded by $S$, so the transmission rate $s[n]$ belongs to the set $\{0, 1, \cdots, S\}$. We assume that the maximum transmission rate is greater than or equal to the maximum data arrival rate, i.e., $S \ge A$. This ensures the stability of the queue under an arbitrary Markov arrival process, whose average arrival rate can range from 0 to $A$. Further, to avoid underflow and overflow of the buffer, $s[n]$ needs to satisfy $0 \le q[n] - s[n] \le Q - A$. In other words, for each given queue length $q$, we have $q - Q + A \le s \le q$. Therefore, for a given queue length $q$, we define the feasible region of the transmission rate as $\mathcal{S}(q) = \{s \mid \max\{q - Q + A, 0\} \le s \le \min\{q, S\}\}$.¹
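The feasible region S(q) can be enumerated directly from its definition. The sketch below uses hypothetical parameter values (Q = 5, A = 2, S = 3) and also checks that the region is nonempty for every queue length when S ≥ A and Q ≥ A.

```python
def feasible_rates(q, Q, A, S):
    """S(q) = {s : max(q - Q + A, 0) <= s <= min(q, S)}."""
    low = max(q - Q + A, 0)
    high = min(q, S)
    return list(range(low, high + 1))


Q, A, S = 5, 2, 3  # illustrative values only
regions = {q: feasible_rates(q, Q, A, S) for q in range(Q + 1)}
all_nonempty = all(len(r) > 0 for r in regions.values())
```

At q = Q the lower bound equals A, which is why at least A packets must be sent from a full buffer to leave room for the worst-case arrival.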

To transmit $s[n]$ packets in timeslot $n$, we determine the corresponding power consumption of the adaptive transmitter with the available Channel State Information (CSI). In particular, the channel state $h[n]$ of timeslot $n$ is given by the current channel coefficient of the fading channel, so $h[n]$ belongs to the field of complex numbers $\mathbb{C}$. With the channel state $h[n]$ given as $h \in \mathbb{C}$, we express the power consumption by a function $P_h(s)$ of the transmission rate $s$, where $P_h(0) = 0$ for each $h$. In typical communication scenarios, a greater transmission rate requires a greater power consumption. Meanwhile, the power efficiency degrades as the transmission rate increases [8]. Therefore, we focus on functions $P_h(s)$ that are monotonically increasing and convex in $s$ for each given $h$. With $P_h(s)$ given for channel state $h[n]$, the power consumption in timeslot $n$ is defined as $\rho[n] = P_{h[n]}(s[n])$.
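The paper does not fix a specific power function at this point. As one possible instance, the sketch below assumes a Shannon-type cost $P_h(s) = (2^s - 1)/|h|^2$, which is our assumption for illustration only, and checks the three stated properties: zero cost at zero rate, monotonicity, and convexity in $s$.

```python
def power(s, h=1.0):
    """Hypothetical cost P_h(s) = (2**s - 1) / |h|**2 (an assumption, not from the paper)."""
    return (2 ** s - 1) / abs(h) ** 2


def is_increasing_and_convex(values):
    """Check first differences are positive and non-decreasing."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    increasing = all(d > 0 for d in diffs)
    convex = all(d2 >= d1 for d1, d2 in zip(diffs, diffs[1:]))
    return increasing and convex


costs = [power(s, h=2.0) for s in range(5)]  # rates s = 0..4 at |h| = 2
ok = is_increasing_and_convex(costs)
```

Any other function that is increasing and convex in $s$ with zero cost at $s = 0$ would serve equally well in the formulation.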

We further adopt an $L$-state block fading channel model, in which the channel coefficient stays invariant during each timeslot and is quantized into $L$ states, i.e., $h_1, h_2, \cdots, h_L$. The channel state $h[n]$ therefore belongs to the set $\{h_1, \cdots, h_L\}$, so we only need to consider the power functions $P_{h_\iota}(s)$, $\iota = 1, \cdots, L$, for the block fading channel. More specifically, the channel states satisfy $0 < |h_1| < |h_2| < \cdots < |h_L| < +\infty$. In other words, a channel state $h_\iota$ with a greater index $\iota$, $1 \le \iota \le L$, corresponds to a better channel condition. Moreover, we assume that the channel state $h[n]$ follows an independent and identically distributed (i.i.d.) process across timeslots. The probability that the channel state $h[n]$ in timeslot $n$ is equal to $h_\iota$ is defined as

$$\Pr\{h[n] = h_\iota\} = \eta_\iota, \tag{4}$$

where we have $\sum_{\iota=1}^{L} \eta_\iota = 1$.

Under the cross-layer adaptive transmission policy, the transmission rate $s[n]$ is determined by the current queue length $q[n]$, the arrival rate $a[n]$, and the channel state $h[n]$. With $q[n]$, $a[n]$, and $h[n]$ given as $q$, $a$, and $h_\iota$, respectively, we define the probability $f^s_{q,a,\iota}$ that the transmission rate $s[n]$ is equal to $s$ as

$$f^s_{q,a,\iota} = \Pr\{s[n] = s \mid q[n] = q, a[n] = a, h[n] = h_\iota\}, \tag{5}$$

where we have $\sum_{s=0}^{S} f^s_{q,a,\iota} = 1$, and $f^s_{q,a,\iota} = 0$ for each $s \notin \mathcal{S}(q)$. Based on the probabilities $f^s_{q,a,\iota}$, the cross-layer adaptive transmission policy $F$ is expressed as $\{f^s_{q,a,\iota} : 0 \le q \le Q, 0 \le a \le A, 1 \le \iota \le L, 0 \le s \le S\}$. A deterministic transmission policy is the special case in which the distribution of the transmission rate is degenerate for each given queue length, arrival rate, and channel state. A deterministic policy $F_D$ is then equivalently expressed as $\{s_{F_D}(q, a, \iota) : 0 \le q \le Q, 0 \le a \le A, 1 \le \iota \le L\}$, where we have $s_{F_D}(q, a, \iota) = \sum_{s \in \mathcal{S}(q)} s f^s_{q,a,\iota}$. The set of deterministic policies is denoted by $\mathcal{F}_D \subsetneq \mathcal{F}$, where $\mathcal{F}$ is the set of all policies. For random arrivals that are temporally correlated, the same probabilistic strategy is constructed by determining the probabilities of the transmission rate given the current queue length and channel state together with the historical information of the arrival rates, as presented in Eq. (5).

¹ To avoid underflow and overflow, we also need $S \ge A$, which follows directly from requiring the feasible region $\mathcal{S}(q)$ to be nonempty for $q = Q$.
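One way to make the policy definition concrete is to store $F$ as a dictionary that maps each state $(q, a, \iota)$ to a distribution over rates. The sketch below is our own representation, not the paper's. It builds a deterministic "send as much as possible" policy as degenerate distributions, validates it against the constraints attached to Eq. (5), and recovers the deterministic rate $s_{F_D}(q, a, \iota)$.

```python
def feasible_rates(q, Q, A, S):
    """S(q) = {s : max(q - Q + A, 0) <= s <= min(q, S)}."""
    return range(max(q - Q + A, 0), min(q, S) + 1)


def greedy_policy(Q, A, S, L=1):
    """Degenerate distributions that always pick s = min(q, S)."""
    return {(q, a, l): {min(q, S): 1.0}
            for q in range(Q + 1) for a in range(A + 1) for l in range(1, L + 1)}


def is_valid_policy(policy, Q, A, S, tol=1e-12):
    """Each distribution sums to one and puts no mass outside S(q)."""
    for (q, _a, _l), dist in policy.items():
        allowed = set(feasible_rates(q, Q, A, S))
        if any(p < 0 or (p > 0 and s not in allowed) for s, p in dist.items()):
            return False
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True


def deterministic_rate(policy, q, a, l):
    """s_{F_D}(q, a, iota) = sum_s s * f^s_{q,a,iota} for a degenerate distribution."""
    return sum(s * p for s, p in policy[(q, a, l)].items())


Q, A, S = 5, 2, 3  # illustrative values only
policy = greedy_policy(Q, A, S)
bad_policy = {(5, 0, 1): {0: 1.0}}  # idles at a full buffer, so it violates S(Q)
```

A randomized policy uses the same structure with several rates carrying positive probability in one state.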

By using the probabilistic transmission policies, we present a Markov Decision Process (MDP), where the system state is the triple $(q[n], a[n], h[n])$. With the system state at timeslot $n$ given as $(q, a, h_\iota)$, each adaptive transmission policy $F \in \mathcal{F}$ determines the transmission rate $s[n]$ according to the probability distribution $\{f^s_{q,a,\iota} : 0 \le s \le S\}$. Under the given transmission rate $s[n]$, the system state $(q[n+1], a[n+1], h[n+1])$ in timeslot $n+1$ follows from the Markov arrival process and the channel fading process. In particular, the transition probability for the next timeslot is

$$\Pr\{q[n+1] = q', a[n+1] = a', h[n+1] = h_{\iota'} \mid q[n] = q, a[n] = a, h[n] = h_\iota, s[n] = s\} = \gamma_{a,a'}\,\eta_{\iota'}\,\mathbf{1}\{s = q + a' - q'\}, \tag{6}$$

where we have $q' \in \{0, 1, \cdots, Q\}$, $a' \in \{0, 1, \cdots, A\}$, and $\iota' \in \{1, \cdots, L\}$. With the system state employed, the MDP evolves from the given initial queue length $q_0$, arrival rate $a_0$, and channel state $h_{\iota_0}$, where we define $q_0 = q[0]$, $a_0 = a[0]$, and $h_{\iota_0} = h[0]$.
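The transition kernel of Eq. (6) is simple enough to code directly. In the sketch below, the matrices `GAMMA` and `ETA` are hypothetical example values, and states are tuples `(q, a, iota)` with `iota` counted from 1. For a feasible rate the probabilities over all next states should sum to one.

```python
def transition_prob(state, s, nxt, gamma, eta):
    """Eq. (6): gamma[a][a'] * eta[iota'] * 1{s == q + a' - q'}."""
    q, a, _ = state
    q2, a2, l2 = nxt
    return gamma[a][a2] * eta[l2 - 1] if s == q + a2 - q2 else 0.0


GAMMA = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical arrival chain, A = 1
ETA = [0.4, 0.6]                   # hypothetical i.i.d. channel distribution, L = 2
Q = 5

# Sum over every next state (q', a', iota') from state (3, 0, h_1) with s = 2.
total = sum(transition_prob((3, 0, 1), 2, (q2, a2, l2), GAMMA, ETA)
            for q2 in range(Q + 1) for a2 in range(2) for l2 in (1, 2))
single = transition_prob((3, 0, 1), 2, (2, 1, 2), GAMMA, ETA)
```

The indicator leaves exactly one reachable $q'$ for each pair $(a', \iota')$, namely $q' = q - s + a'$.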

With the formulated MDP, the long-term average power consumption and delay are also

formulated based on the power consumption ρ[n] = Ph[n](s[n]) and the queue length q[n] in

each timeslot, respectively. First, the average power consumption PF can be presented as

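$$P_F = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}^F_{q_0,a_0,h_{\iota_0}} \left\{ \sum_{n=1}^{N} \rho[n] \right\}, \tag{7}$$

where $\mathbb{E}^F_{q_0,a_0,h_{\iota_0}}\{\cdot\}$ is the expectation with respect to policy $F$ as well as the initial system state $(q_0, a_0, h_{\iota_0})$. The average delay $D_F$ is given from Little's Law as

$$D_F = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}^F_{q_0,a_0,h_{\iota_0}} \left\{ \frac{1}{\alpha} \sum_{n=1}^{N} q[n] \right\}, \tag{8}$$

where, as defined in Eq. (2), $\alpha$ is the expected number of packets that arrive in each timeslot.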
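The long-run averages in Eqs. (7) and (8) can be estimated from a single sample path. The sketch below simulates an AWGN link ($L = 1$) under a hypothetical arrival chain, a hypothetical power function $2^s - 1$, and the simple rule $s = \min\{q, S\}$; none of these choices come from the paper. The sums start at the initial slot rather than $n = 1$, which does not affect the limit.

```python
import random


def simulate(rate_of, gamma, power, Q, N=10_000, seed=0):
    """Time averages over N slots of power, queue length, and arrivals."""
    rng = random.Random(seed)
    counts = list(range(len(gamma)))           # possible arrivals 0..A
    q = a = 0                                  # initial state (q0, a0) = (0, 0)
    tot_power = tot_queue = tot_arrivals = 0.0
    for _ in range(N):
        s = rate_of(q, a)                      # rate chosen from the current state
        tot_power += power(s)
        tot_queue += q
        tot_arrivals += a
        a = rng.choices(counts, weights=gamma[a])[0]  # next arrival, Eq. (1)
        q = min(max(q - s, 0) + a, Q)                 # queue update, Eq. (3)
    return tot_power / N, tot_queue / N, tot_arrivals / N


GAMMA = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical arrival chain, A = 1
Q, S = 5, 2
avg_power, avg_queue, avg_arrivals = simulate(
    lambda q, a: min(q, S), GAMMA, lambda s: 2 ** s - 1, Q)
```

Under this rule with $S \ge A$ the buffer is emptied in every slot, so the queue length always equals the latest arrival count and the delay estimate `avg_queue / avg_arrivals` is exactly one slot. Slower policies trade a larger value of this ratio for lower power.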

Based on the average power consumption and delay in Eqs. (7) and (8), we can formulate the

optimal delay-power tradeoff under Markov random arrivals. Intuitively, a higher transmission

rate can reduce the packets’ delay, but degrades the power efficiency because Ph(s) is convex

on s for each channel state h. For a lower transmission rate, the reverse holds true, i.e., we


have a greater power efficiency but also a larger transmission delay. Therefore, a tradeoff exists

between the delay and power consumption. To obtain the optimal tradeoff, we formulate a

cross-layer optimization problem as a Constrained Markov Decision Process (CMDP) under the

probabilistic transmission strategy. In the CMDP, we aim at minimizing the average delay subject

to the constraint on the average power. In particular, the optimization problem is given as

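\begin{align}
\min_{F \in \mathcal{F}} \quad & D_F \tag{9a} \\
\text{s.t.} \quad & P_F \le P_{\mathrm{th}}. \tag{9b}
\end{align}

By solving this CMDP under different power constraints $P_{\mathrm{th}}$, we can show the optimal delay-power tradeoff under Markov arrivals. As a result, we obtain the minimum average delay $D_{F^*}$ and the optimal policy $F^*$ for each given $P_{\mathrm{th}}$.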

To particularly show the impact of Markovian arrivals, we first focus on the optimal delay-

power tradeoff for an AWGN channel in Sections III and IV. Then, we extend the optimal

tradeoff by considering the fading channel in Section V. More specifically, we present the AWGN

channel by setting $L = 1$ and $|h_1| = 1$. With a single channel state, we simplify the notation of the power function and the adaptive transmission policy to $P(s)$ and $F = \{f^s_{q,a} : 0 \le q \le Q, 0 \le a \le A, 0 \le s \le S\}$, respectively, in the following two sections. As a result, a degenerate CMDP is formulated with the system state $(q[n], a[n])$.

III. OPTIMAL DELAY-POWER TRADEOFF FOR AWGN CHANNELS

In this section, we focus on the optimal delay-power tradeoff for AWGN channels, which is

described by the cross-layer optimization problem (9). We first show that the optimal delay-power

tradeoff can be formulated by an equivalent LP problem based on the steady-state analysis for

a single user. With the LP problem being solved over the set of all the obtainable power-delay

pairs, we then generate an optimal delay-power tradeoff curve for AWGN channels, under which

minimized average delays are obtained for different power constraints. Further, we show some

interesting geometric properties of the optimal tradeoff curve. Based on these geometric prop-

erties, we finally demonstrate that the same optimal tradeoff is obtained by the optimal policies

with an arbitrary initial system state. In other words, the optimal policies over AWGN channels

have the same average delay and power consumptions regardless of the initial system states.

A. The Equivalent LP Problem

First, we show the optimal delay-power tradeoff by expressing the cross-layer optimization problem (9) as an LP problem. In particular, we formulate the LP problem based on a Markov Reward Process (MRP) that is generated by the CMDP with the transmission policy given. For a given policy $F$, we first describe the resulting MRP to analytically present the average delay and power. In the MRP, $\lambda_{(q,a),(q',a')}$ denotes the transition probability from $(q, a)$ to $(q', a')$. Based on the evolution of $q[n]$ and $a[n]$ in Eq. (3), the transition probability $\lambda_{(q,a),(q',a')}$ is given by

$$\lambda_{(q,a),(q',a')} = \gamma_{a,a'}\, f^{\,q-q'+a'}_{q,a}\, \mathbf{1}\{\max\{q-S, 0\} \le q'-a' \le \min\{q, Q-A\}\}. \tag{10}$$

With the probabilities $\lambda_{(q,a),(q',a')}$, we then obtain the steady-state probabilities by formulating the balance equations. Let $\pi_F(q, a)$ denote the steady-state probability. We present the balance equations as

$$\sum_{a=0}^{A} \sum_{q=\max\{q'-a', 0\}}^{\min\{q'-a'+S, Q\}} \pi_F(q, a)\, \lambda_{(q,a),(q',a')} = \pi_F(q', a'), \tag{11}$$

where we have $\sum_{q=0}^{Q} \sum_{a=0}^{A} \pi_F(q, a) = 1$. More specifically, $\pi_F(q, a)$ indicates how often, on average in the long run, the queue length is equal to $q$ and the arrival rate is equal to $a$. Considering the evolution of $q[n]$ in Eq. (3) with $s[n] \in \mathcal{S}(q[n])$, we have $q[n+1] - a[n+1] = q[n] - s[n] \le Q - A$ for each timeslot. Therefore, the steady-state probability $\pi_F(q, a)$ is equal to 0 if $q - a > Q - A$. By solving the balance equations for all queue lengths $q$ and arrival rates $a$, we obtain the steady-state probability distribution $\pi_F$, defined as $\{\pi_F(q, a) : \forall q, a\}$.

The balance equations given by Eq. (11) can be expressed in the following matrix form

$$\boldsymbol{\Lambda}_F \boldsymbol{\pi}_F = \boldsymbol{\pi}_F, \tag{12}$$

where $\boldsymbol{\pi}_F$ is a vector with the probabilities $\pi_F(q, a)$ as its elements. In particular, $\pi_F(q, a)$ is the $(a \times (Q+1) + q + 1)$th element of $\boldsymbol{\pi}_F$. Based on this ordering, the stochastic matrix $\boldsymbol{\Lambda}_F$ is defined with $\lambda_{(q,a),(q',a')}$ as its elements: when $\pi_F(q, a)$ and $\pi_F(q', a')$ are the $i$th and $j$th elements of $\boldsymbol{\pi}_F$, respectively, $\lambda_{(q,a),(q',a')}$ is located in the $i$th column and $j$th row of $\boldsymbol{\Lambda}_F$.

By using the steady-state probability, we next present the average power consumption and

delay. Given the steady-state probability πF (q, a), we express the average power consumption as

$$P_F = \sum_{q=0}^{Q} \sum_{a=0}^{A} \sum_{s=0}^{S} P(s)\, \pi_F(q, a)\, f^s_{q,a}. \tag{13}$$

Similarly, the average delay is given as

$$D_F = \frac{1}{\alpha} \sum_{q=0}^{Q} \sum_{a=0}^{A} q\, \pi_F(q, a). \tag{14}$$
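For a fixed policy, the steady-state analysis above can be reproduced numerically. The sketch below is a minimal illustration under our own assumptions: an AWGN link, a hypothetical two-state arrival chain with $\alpha = 0.25$, a hypothetical power function $2^s - 1$, and the policy $s = \min\{q, S\}$. It iterates the chain defined by Eq. (10) until the distribution settles (which presumes an aperiodic chain) and then evaluates Eqs. (13) and (14).

```python
def steady_state(policy, gamma, Q, A, iters=2000):
    """Iterate pi <- Lambda_F pi (Eq. (12)) from the feasible start state (0, 0)."""
    states = [(q, a) for q in range(Q + 1) for a in range(A + 1)]
    pi = dict.fromkeys(states, 0.0)
    pi[(0, 0)] = 1.0
    for _ in range(iters):
        nxt = dict.fromkeys(states, 0.0)
        for (q, a), mass in pi.items():
            if mass == 0.0:
                continue
            for s, f in policy[(q, a)].items():
                for a2 in range(A + 1):
                    # Eq. (10): move to (q - s + a', a') w.p. gamma[a][a'] * f
                    nxt[(q - s + a2, a2)] += mass * f * gamma[a][a2]
        pi = nxt
    return pi


def average_power_and_delay(policy, gamma, power, alpha, Q, A):
    pi = steady_state(policy, gamma, Q, A)
    avg_power = sum(power(s) * pi[st] * f
                    for st in pi for s, f in policy[st].items())   # Eq. (13)
    avg_delay = sum(q * p for (q, _), p in pi.items()) / alpha     # Eq. (14)
    return avg_power, avg_delay


Q, A, S = 3, 1, 1
GAMMA = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical arrival chain, alpha = 0.25
greedy = {(q, a): {min(q, S): 1.0} for q in range(Q + 1) for a in range(A + 1)}
P_F, D_F = average_power_and_delay(greedy, GAMMA, lambda s: 2 ** s - 1, 0.25, Q, A)
```

For this example the chain settles on $\pi_F(0, 0) = 0.75$ and $\pi_F(1, 1) = 0.25$, giving $P_F = 0.25$ and $D_F = 1$ timeslot.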

Then, we demonstrate the optimal delay-power tradeoff under the cross-layer transmission policies. As shown in Eqs. (13) and (14), the average power consumption and delay are expressed through the steady-state probability $\pi_F(q, a)$ with policy $F$ given. Considering the steady-state probabilities that satisfy the balance equations in Eq. (12), the optimal delay-power tradeoff given by problem (9) is obtained by solving the following problem for each value of $P_{\mathrm{th}}$.

\begin{align}
\min_{\{\boldsymbol{\pi}_F, F\}} \quad & \frac{1}{\alpha} \sum_{q=0}^{Q} \sum_{a=0}^{A} q\, \pi_F(q, a) \tag{15a} \\
\text{s.t.} \quad & \sum_{q=0}^{Q} \sum_{a=0}^{A} \sum_{s=0}^{S} P(s)\, \pi_F(q, a)\, f^s_{q,a} \le P_{\mathrm{th}} \tag{15b} \\
& \boldsymbol{\Lambda}_F \boldsymbol{\pi}_F = \boldsymbol{\pi}_F \tag{15c} \\
& \sum_{q=0}^{Q} \sum_{a=0}^{A} \pi_F(q, a) = 1 \tag{15d} \\
& \sum_{s=0}^{S} f^s_{q,a} = 1 \quad \forall\, q, a \tag{15e} \\
& \pi_F(q, a) \ge 0,\; f^s_{q,a} \ge 0 \quad \forall\, q, a, s, \tag{15f}
\end{align}

where the optimal delay of the problem is attained by the optimal transmission policy $F^*$ with the corresponding steady-state probability $\pi^*_{F^*}(q, a)$.

With the cross-layer optimization problem (15) given, we finally convert it to an equivalent LP problem, through which the optimal average delay is obtained for each given power constraint $P_{\mathrm{th}}$. To formulate the LP problem, we use the product of $\pi_F(q, a)$ and $f^s_{q,a}$ as the optimization variables. Defining $x^s_{q,a} = \pi_F(q, a) f^s_{q,a}$, we can present the optimal delay-power tradeoff by the equivalent LP problem given in the following theorem.

Theorem 1. Problem (15) is equivalent to the following linear programming problem.

\begin{align}
\min_{\{x^s_{q,a}\}} \quad & \frac{1}{\alpha} \sum_{q=0}^{Q} \sum_{a=0}^{A} \sum_{s=0}^{S} q\, x^s_{q,a} \tag{16a} \\
\text{s.t.} \quad & \sum_{q=0}^{Q} \sum_{a=0}^{A} \sum_{s=0}^{S} P(s)\, x^s_{q,a} \le P_{\mathrm{th}} \tag{16b} \\
& \sum_{q=\max\{q'-a', 0\}}^{\min\{q'-a'+S, Q\}} \sum_{a=0}^{A} \sum_{s=0}^{S} \gamma_{a,a'}\, x^s_{q,a}\, \mathbf{1}\{s = q + a' - q'\} = \sum_{s=0}^{S} x^s_{q',a'} \quad \forall\, 0 \le q' \le Q,\; 0 \le a' \le A \tag{16c} \\
& \sum_{q=0}^{Q} \sum_{a=0}^{A} \sum_{s=0}^{S} x^s_{q,a} = 1 \tag{16d} \\
& x^s_{q,a} \ge 0 \quad \forall\, 0 \le q \le Q,\; 0 \le a \le A,\; 0 \le s \le S. \tag{16e}
\end{align}

Proof: To show the equivalence of problems (15) and (16), we divide the proof into two parts. We first show that problem (15) is converted into LP problem (16) by replacing $\pi_F(q, a) f^s_{q,a}$ with $x^s_{q,a}$. For each feasible solution $\pi_F$ and $F$ of problem (15), we can generate a feasible solution of problem (16), i.e., $\{x^s_{q,a} = \pi_F(q, a) f^s_{q,a}\}$. With the corresponding $\{x^s_{q,a}\}$ in problem (16), we also obtain the same average power consumption and delay as with $\pi_F$ and $F$ in problem (15).


For each feasible solution $\{x^s_{q,a}\}$ of problem (16), we then construct the corresponding policy $F$ by setting the probability $f^s_{q,a}$ as

$$f^s_{q,a} = \begin{cases} \dfrac{x^s_{q,a}}{\pi_F(q, a)}, & \pi_F(q, a) > 0, \\[2mm] \mathbf{1}\{s = \min\{q, S\}\}, & \pi_F(q, a) = 0, \end{cases} \tag{17}$$

where the steady-state probability $\pi_F(q, a)$ under policy $F$ is expressed as $\pi_F(q, a) = \sum_{s=0}^{S} x^s_{q,a}$. By substituting the obtained $f^s_{q,a}$ and $\pi_F(q, a)$ into problem (15), we can check that the constructed solution satisfies the balance equations in Eq. (15c) while the average delay and power consumption remain unchanged, which completes the proof.
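The construction in Eq. (17) is mechanical and can be sketched in a few lines. The dictionary layout `x[(q, a, s)]` and the example frequencies below are our own illustrative choices; the example corresponds to a chain that spends 75% of the time in state (0, 0) sending nothing and 25% in state (1, 1) sending one packet.

```python
def policy_from_frequencies(x, Q, A, S):
    """Eq. (17): f = x / pi where pi(q, a) = sum_s x[(q, a, s)] > 0,
    otherwise the degenerate choice s = min(q, S)."""
    policy, pi = {}, {}
    for q in range(Q + 1):
        for a in range(A + 1):
            mass = sum(x.get((q, a, s), 0.0) for s in range(S + 1))
            pi[(q, a)] = mass
            if mass > 0:
                policy[(q, a)] = {s: x.get((q, a, s), 0.0) / mass
                                  for s in range(S + 1)}
            else:
                policy[(q, a)] = {s: float(s == min(q, S))
                                  for s in range(S + 1)}
    return policy, pi


# Hypothetical state-action frequencies; missing keys are treated as zero.
x = {(0, 0, 0): 0.75, (1, 1, 1): 0.25}
policy, pi = policy_from_frequencies(x, Q=2, A=1, S=1)
```

States that are never visited, such as (2, 0) here, receive the default rate $\min\{q, S\}$, which does not affect the averages because their steady-state probability is zero.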

With the equivalent LP problem (16) formulated, we demonstrate the optimal delay-power

tradeoff for AWGN channels. By solving the derived LP problem, we can particularly obtain

the minimum average delay with the optimal policy F ∗ given by Eq. (17).

B. The Optimal Delay-Power Tradeoff Curve

In this subsection, we obtain the optimal delay-power tradeoff by solving the LP problem (16) that is formulated in Theorem 1 for AWGN channels. By solving the LP problem over a power-delay plane that contains all the obtainable power and delay pairs under the policies, we present the optimal delay-power tradeoff curve. In this way, the minimum average delay can be obtained for the single link under a given average power constraint.

To obtain the optimal delay-power tradeoff curve, we first solve LP problem (16) by considering the set of all obtainable average power-delay pairs. In particular, a power-delay plane is first formulated to contain all the average power-delay pairs $(P_F, D_F)$ that are generated by the cross-layer transmission policies $F \in \mathcal{F}$. However, for a given transmission policy $F = \{f^s_{q,a}\}$, we can only express the corresponding average power-delay pair $(P_F, D_F)$ in terms of $f^s_{q,a}$ with the aid of $\pi_F(q, a)$, as in Eqs. (13) and (14), respectively. Since $\pi_F(q, a)$ under policy $F$ is determined by the series of balance equations in Eq. (11), the power-delay pair $(P_F, D_F)$ can hardly be written as an analytical expression of $f^s_{q,a}$. We therefore generate the power-delay plane based on the optimization variables $x^s_{q,a}$ in LP problem (16), which can be referred to as the state-action frequencies of the MDP [27, Section 8.9]. With the obtainable state-action frequencies $\{x^s_{q,a}\}$ given, we can analytically express the average power-delay pair through the objective function and power constraint in LP problem (16). The corresponding policy $F$ is also obtained by following the bijective map presented in Theorem 1.

Thus, we first express the set that consists of all the obtainable state-action frequencies {xsq,a}

under the transmission policies as

$$\mathcal{G} = \left\{ \{x^s_{q,a} : \forall\, q, a, s\} \;\middle|\; \text{Eqs. (16c), (16d), and (16e)} \right\}. \tag{18}$$


According to the linear functions in objective function (16a) and power constraint (16b), we

then present the average delay and power, respectively, for the feasible {xsq,a}. As a result, the

power-delay plane is generated to contain all the obtainable average power-delay pairs.

We then express the feasible state-action frequencies {xsq,a} as a ((Q + 1) × (A + 1) ×

(S + 1))-dimension vector. We can straightforwardly demonstrate set G as a polyhedron in a

high dimensional Euclidean space. The obtainable power-delay pairs are next presented as the

projection of the state-action frequencies on the power-delay plane. In other words, the set R

of all the obtainable average power-delay pairs is defined as

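$$\mathcal{R} = \left\{ (P, D) \;\middle|\; \forall \{x^s_{q,a}\} \in \mathcal{G},\; P = \sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} P(s)\, x^s_{q,a},\; D = \frac{1}{\alpha}\sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} q\, x^s_{q,a} \right\}, \tag{19}$$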

where set R is a polyhedron on the power-delay plane.

With definition of set R in Eq. (19), we rewrite the LP problem (16) over the power-delay

plane. In particular, we have

$$\min_{(P,D)\in\mathcal{R}} \; D \tag{20a}$$
$$\text{s.t.} \quad P \le P_{\mathrm{th}}. \tag{20b}$$

In this way, we demonstrate the optimal delay-power tradeoff described in cross-layer optimiza-

tion problem (9) over the power-delay plane. With the derived LP problem in Eq. (20), we

obtain the optimal power-delay pair (P ∗, D∗) by searching the power-delay pair that minimizes

the delay in set R∩ {(P,D) | P ≤ Pth}.

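We finally formulate the optimal delay-power tradeoff curve for AWGN channels as

$$\mathcal{L} = \left\{ (P^*, D^*) \in \mathcal{R} \;\middle|\; \forall (\tilde{P}, \tilde{D}) \in \mathcal{R},\ \text{either } P^* \le \tilde{P} \text{ or } D^* \le \tilde{D} \right\}, \tag{21}$$

which consists of all the optimal delay-power pairs under different power constraints. For each optimal power-delay pair $(P^*, D^*)$ in problem (20), $(P^*, D^*)$ belongs to $\mathcal{L}$ because we have $D^* \le \tilde{D}$ if $(\tilde{P}, \tilde{D}) \in \mathcal{R} \cap \{\tilde{P} \le P_{\mathrm{th}}\}$, and $P^* \le P_{\mathrm{th}} \le \tilde{P}$ if $(\tilde{P}, \tilde{D}) \in \mathcal{R} \cap \{\tilde{P} \ge P_{\mathrm{th}}\}$.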

Meanwhile, each element (P ∗, D∗) in set L can minimize the average delay in problem (20)

with power constraint Pth as P ∗. Further, the geometric properties of the optimal delay-power

tradeoff curve are then presented in the following theorem.

Theorem 2. The optimal tradeoff curve L is piecewise linear, decreasing, and convex.

Proof: The proof of the geometric properties follows directly from [22, Corollary 3]. We include its main idea for completeness. With the optimal tradeoff curve $\mathcal{L}$ expressed as Eq. (21), we first show that $\mathcal{L}$ is convex and decreasing according to the definitions of convex and decreasing functions, respectively. By showing that $\mathcal{L}$ is part of the boundary of the polyhedron $\mathcal{R}$, we next establish that $\mathcal{L}$ is a piecewise linear curve.
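As a minimal sketch of the geometry in Theorem 2, assume the obtainable power-delay pairs are available only as a finite list of sample points (a hypothetical input, not something the paper provides). The decreasing part of the lower convex hull of these points then plays the role of the tradeoff curve. The function names `cross` and `tradeoff_vertices` are illustrative.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def tradeoff_vertices(points):
    """Vertices of the decreasing lower convex hull of (power, delay) points."""
    pts = sorted(set(points))
    hull = []
    for p in pts:
        # Drop the last vertex while it does not make a strict left turn,
        # so that only the lower hull (convex from below) is kept.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    # Keep only the part where the delay strictly decreases with power.
    curve = [hull[0]]
    for p in hull[1:]:
        if p[1] >= curve[-1][1]:
            break
        curve.append(p)
    return curve


samples = [(1, 5), (2, 3), (4, 2), (3, 4), (2, 4), (5, 2), (6, 3)]
print(tradeoff_vertices(samples))
```

The returned list is ordered by increasing power, so consecutive entries are the adjacent vertices referred to in the text.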


In this way, we present the optimal delay-power tradeoff by solving the equivalent LP problem

on the power-delay plane. By employing the state-action frequencies, we analytically present the

optimal delay-power tradeoff curve for AWGN channels, under which the minimized average

delay is attained for the adaptive transmitter with a given power constraint.

C. The Optimal Delay-Power Tradeoff with an Arbitrary Initial State

In this subsection, we show that the same optimal delay-power tradeoff is obtained for AWGN

channels by the optimal adaptive transmission policies under an arbitrary initial state. With

different initial queue lengths and arrival rates, we may have different average delays and powers

under a given transmission policy because different steady-state distributions can be obtained

with multiple closed classes existing in the corresponding MRP [28, Section 4.3]. However, for

the optimal transmission policies of LP problem (16), we show that the same average delay and

power consumption is obtained for AWGN channels with an arbitrary initial state.

For each power-delay pair on curve L, the corresponding optimal adaptive transmission policy

is first formulated by solving LP problem (16). In particular, with the optimal solution {x∗sq,a}

of LP problem (16), we obtain the optimal policy by determining f sq,a according to Eq. (17).

Then, we demonstrate that the optimal adaptive transmission policy can obtain the same

optimal tradeoff under an arbitrary initial state. In other words, we show that the average delay and power consumption of the optimal policy are independent of the initial state. For this purpose, we only need to show that the Markov chain induced by the MRP has only one closed communication class under an optimal policy. Such a Markov chain is referred to as a unichain. First, we present the structure of the Markov chains for the vertices of the optimal

delay-power tradeoff curve L in the following theorem.

Theorem 3. The optimal delay-power tradeoff curve L satisfies that

1) All vertices of L can be obtained by adaptive transmission policies with unichains;

2) All vertices of L can be obtained by deterministic transmission policies;

3) The policies corresponding to two adjacent vertices of L have different transmission rates

only on one state.

Proof: See Appendix A.

The vertices of the optimal tradeoff curve L can be generated by the optimal deterministic

transmission policies, under which the Markov chains have only one closed class. As a result,

for all the vertices of curve L, the same optimal delay-power tradeoff can be presented by the

corresponding optimal transmission policies for any arbitrary initial state.


We next show that the same minimized average delay can be obtained under an arbitrary initial

state for the other power-delay pairs on L. Since L is piecewise linear, we first consider the

optimal power-delay points by dividing the curve into several segments with a pair of adjacent

vertices as endpoints. By using the two adaptive transmission policies for the pair of adjacent

vertices, we then construct the optimal policies with unichains for each segment of curve L. In

particular, the construction of the optimal policies relies on the following lemma.

Lemma 1. Let $F' = \{f'^{s}_{q,a}\}$ and $F'' = \{f''^{s}_{q,a}\}$ be two transmission policies with unichains that have different distributions on the transmission rate only when $q = \hat{q}$ and $a = \hat{a}$. We define policy $F = \varepsilon F' + (1-\varepsilon)F''$, where each $f^s_{q,a}$ is equal to $\varepsilon f'^{s}_{q,a} + (1-\varepsilon) f''^{s}_{q,a}$, and $0 \le \varepsilon \le 1$. Then, we have

1) The Markov chain under policy $F = \varepsilon F' + (1-\varepsilon)F''$ is a unichain for each $0 \le \varepsilon \le 1$;

2) There exists $0 \le \varepsilon' \le 1$ such that $P_F = \varepsilon' P_{F'} + (1-\varepsilon')P_{F''}$ and $D_F = \varepsilon' D_{F'} + (1-\varepsilon')D_{F''}$;

3) Parameter $\varepsilon'$ increases from 0 to 1 as $\varepsilon$ increases over the interval $[0, 1]$.

Proof: See Appendix B.

For each pair of adjacent vertices $(P^-, D^-)$ and $(P^+, D^+)$ on $\mathcal{L}$, we obtain two optimal deterministic policies $F^*_-$ and $F^*_+$ with unichains, according to Theorem 3. The pair of policies has different transmission rates only for one particular queue length and arrival rate. According to Lemma 1, we can present the optimal policy $F^*$ as $(1-\varepsilon)F^*_- + \varepsilon F^*_+$, by which the average power-delay pair is presented as $(\varepsilon' P^+ + (1-\varepsilon')P^-,\ \varepsilon' D^+ + (1-\varepsilon')D^-)$, and the Markov chain is a unichain. As a result, we show the existence of an optimal policy with a unichain for each power-delay pair $(P, D)$ on the optimal delay-power tradeoff curve $\mathcal{L}$.

According to Theorem 3 and Lemma 1, we can finally show that the optimal

delay-power tradeoff curve is obtained under an arbitrary initial state in the following theorem.

Theorem 4. All the average power-delay pairs of the optimal delay-power tradeoff curve L can

be obtained using the adaptive transmission policies with unichains.

Therefore, the optimal delay-power tradeoff for AWGN channels is obtained by the optimal

policy that is given by the LP problem (16). Meanwhile, the same optimal tradeoff is presented

for the single link with different initial queue lengths and arrival rates.

IV. THRESHOLD-BASED OPTIMAL TRANSMISSION POLICY OVER AWGN CHANNELS

In this section, we show the threshold-based structure for the optimal adaptive transmission policies over AWGN channels. For each optimal average power-delay pair, we present the delay-optimal transmission strategy by using a threshold-based structure on the queue length, in which the thresholds for different transmission rates are given for the arrival rates. To this end, we first present the threshold-based optimal policies for the vertices of the optimal tradeoff curve $\mathcal{L}$ based on the Lagrangian relaxation of the cross-layer optimization problem (9). Further, by using the optimal policies on the vertices, we formulate the threshold-based transmission policy for each average power-delay pair on curve $\mathcal{L}$. With the threshold-based structure, we finally develop a threshold-based algorithm to efficiently obtain the optimal delay-power tradeoff.

[Figure 2: axes are Average Power (horizontal) and Average Delay (vertical); legend: Optimal Tradeoff Curve, Vertices on the Curve, Feasible Region.]

Fig. 2. The stretch of the optimal delay-power tradeoff: We denote a vertex of the curve by $(\hat{P}, \hat{D})$, while the two vertices adjacent to $(\hat{P}, \hat{D})$ are $(P^-, D^-)$ and $(P^+, D^+)$ with $P^- < \hat{P} < P^+$. As for the two end points of the curve, there is no $(P^-, D^-)$ for the vertex with the lowest power and no $(P^+, D^+)$ for the vertex with the largest power.

A. Threshold-based Optimal Deterministic Policy for the Lagrangian Relaxation Problem

In this subsection, the threshold-based optimal deterministic policies are shown for all the

vertices of the optimal delay-power tradeoff curve L that is formulated for AWGN channels. For

each vertex on L, we first obtain an optimal deterministic policy by exploiting the Lagrangian

relaxation problem for cross-layer optimization problem (9). Then, for the optimal deterministic

policies, we show that there exists a threshold-based structure on the queue lengths.

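First, we formulate the Lagrangian relaxation problem for each vertex. As shown in Fig. 2, for each $\mu > 0$, we can always find a vertex on tradeoff curve $\mathcal{L}$ that attains the minimum value of $D + \mu P$. For each vertex $(\hat{P}, \hat{D})$ on $\mathcal{L}$, with adjacent vertices $(P^-, D^-)$ and $(P^+, D^+)$ satisfying $P^- < \hat{P} < P^+$, we further give an interval of multipliers $(\mu_{\min}, \mu_{\max})$¹ under which the vertex attains the minimum value of $D + \mu P$. In particular, $\mu_{\min}$ and $\mu_{\max}$ are the absolute slopes of the two segments of $\mathcal{L}$ that meet at the vertex: we have $\mu_{\min} = \frac{\hat{D} - D^+}{P^+ - \hat{P}}$ for all the vertices except the one with the largest power, while we set $\mu_{\min}$ as 0 for this vertex based on the observation of Fig. 2. Similarly, we have $\mu_{\max} = \frac{D^- - \hat{D}}{\hat{P} - P^-}$ for the vertices that have an adjacent vertex with lower power, and $\mu_{\max} = +\infty$ for the vertex with the lowest power.

¹ When $\mu$ is equal to $\mu_{\min}$ or $\mu_{\max}$, two adjacent vertices attain the minimum value of $D + \mu P$.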
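The multiplier intervals can be illustrated with a short sketch. It assumes the vertices of a convex, decreasing, piecewise linear curve are given as a list sorted by increasing power (a hypothetical input), and it takes the interval end points to be the absolute slopes of the two segments meeting at each vertex, which is what minimizing $D + \mu P$ over such a curve requires. The names `multiplier_intervals` and `best_vertex` are illustrative.

```python
import math


def multiplier_intervals(vertices):
    """Return (mu_min, mu_max) for each vertex of a convex decreasing curve.

    vertices: list of (power, delay) pairs sorted by increasing power.
    """
    intervals = []
    for i, (p, d) in enumerate(vertices):
        if i + 1 < len(vertices):
            p_hi, d_hi = vertices[i + 1]
            mu_min = (d - d_hi) / (p_hi - p)   # slope toward higher power
        else:
            mu_min = 0.0                       # vertex with the largest power
        if i > 0:
            p_lo, d_lo = vertices[i - 1]
            mu_max = (d_lo - d) / (p - p_lo)   # slope toward lower power
        else:
            mu_max = math.inf                  # vertex with the lowest power
        intervals.append((mu_min, mu_max))
    return intervals


def best_vertex(vertices, mu):
    """Vertex minimizing delay + mu * power."""
    return min(vertices, key=lambda v: v[1] + mu * v[0])


curve = [(1, 5), (2, 3), (4, 2)]
print(multiplier_intervals(curve))
print(best_vertex(curve, 1.0))
```

For any multiplier strictly inside a vertex's interval, `best_vertex` returns that vertex, which mirrors the role of $\mu$ in the relaxation.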


Since set R consists of all the power-delay pairs given by policies F ∈ F , we can show the

optimal policy for each vertex by the following Lagrangian relaxation problem

$$\min_{F \in \mathcal{F}} \; D_F + \mu P_F - \mu P_{\mathrm{th}}, \tag{22}$$

where the multiplier $\mu$ belongs to the corresponding interval $(\mu_{\min}, \mu_{\max})$ for the given vertex $(\hat{P}, \hat{D})$ on $\mathcal{L}$.

Therefore, we obtain the optimal policy for each vertex $(\hat{P}, \hat{D})$ by solving Lagrangian relaxation problem (22) with the corresponding $\mu$. In particular, we formulate problem (22) as an

unconstrained infinite-horizon MDP with the objective function as DF + µPF . According to

the result given by [27, Theorem 9.1.8], we have that the unconstrained MDP is minimized by

a deterministic policy, under which the corresponding Markov chain is a unichain. Further, we

show that the deterministic policy is presented by a threshold-based structure on the queue length.

Theorem 5. For each vertex $(\hat{P}, \hat{D})$ on curve $\mathcal{L}$, the optimal deterministic policy $F^*$ has a threshold-based structure on the queue length, in which thresholds $q_{F^*}(s, a)$ exist for every $0 \le s \le S$, $0 \le a \le A$, and the probabilities $f^{*s}_{q,a}$ satisfy

$$f^{*s}_{q,a} = \begin{cases} 1, & q_{F^*}(s-1, a) < q \le q_{F^*}(s, a), \\ 0, & \text{otherwise}, \end{cases} \tag{23}$$

where we have $0 \le q_{F^*}(0,a) \le q_{F^*}(1,a) \le \cdots \le q_{F^*}(S,a) \le Q$ and $q_{F^*}(-1,a) = -1$ for each $a$.

Proof: See Appendix C.

With a threshold-based optimal deterministic policy $F^*$ given, we obtain a series of thresholds

{qF ∗(s, a) : s = 0, 1, · · · , S} for each arrival rate a ∈ {0, 1, · · · , A}. By using the thresholds

on the queue length, we then can completely describe the corresponding optimal deterministic

policy for each vertex of the optimal tradeoff curve L. Moreover, we can determine the delay-

optimal transmission strategy by using the order relation of queue lengths with the thresholds

under different arrival rates.
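The threshold rule of Eq. (23) amounts to a sorted lookup. The sketch below assumes the thresholds for one arrival rate are stored as a nondecreasing Python list indexed by the transmission rate $s$; the function name `rate_from_thresholds` and the example thresholds are hypothetical.

```python
from bisect import bisect_left


def rate_from_thresholds(q, thresholds):
    """Transmission rate s with thresholds[s-1] < q <= thresholds[s].

    thresholds[s] stands for q_F(s, a) at a fixed arrival rate a, for
    s = 0..S, and is nondecreasing; q_F(-1, a) = -1 is implicit.
    """
    s = bisect_left(thresholds, q)  # first index whose threshold is >= q
    if s >= len(thresholds):
        raise ValueError("queue length exceeds the largest threshold")
    return s


example = [0, 2, 5, 7]  # illustrative thresholds for S = 3 and Q = 7
print([rate_from_thresholds(q, example) for q in range(8)])
```

Because the thresholds are ordered in $s$, the selected rate never decreases as the queue grows.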

B. Threshold-Based Optimal Adaptive Transmission Policy

We now present the threshold-based optimal policy for each power-delay pair on the optimal

delay-power tradeoff curve L. With a given power-delay pair on curve L, we construct the

threshold-based optimal policy as a convex combination of the optimal deterministic policies

for the vertices on L which are presented in Theorem 5. In particular, we present the threshold-

based optimal policies for AWGN channels in the following theorem.

Theorem 6. For the optimal policy $F^*$, there exist $(A+1)\times(S+1)$ thresholds $q_{F^*}(s,a)$, where we have $0 \le q_{F^*}(0,a) \le q_{F^*}(1,a) \le \cdots \le q_{F^*}(S,a) \le Q$ for each arrival rate $a = 0, 1, \cdots, A$. With all the thresholds $q_{F^*}(s,a)$ given, the optimal policy $F^*$ satisfies

$$\begin{cases} f^{*s}_{q,a} = 1, & q_{F^*}(s-1,a) < q \le q_{F^*}(s,a),\ a \ne a^* \text{ or } s \ne s^*, \\ f^{*s}_{q,a} = 1, & q_{F^*}(s-1,a) < q < q_{F^*}(s,a),\ a = a^* \text{ and } s = s^*, \\ f^{*s}_{q,a} + f^{*(s-1)}_{q,a} = 1, & q = q_{F^*}(s,a),\ a = a^* \text{ and } s = s^*, \\ f^{*s}_{q,a} = 0, & \text{otherwise}, \end{cases} \tag{24}$$

where the specific transmission rate $s^*$ and arrival rate $a^*$ are given by optimal policy $F^*$, and we have $q_{F^*}(-1,a) = -1$ for each $0 \le a \le A$.

Proof: Our proof starts with the observation that the optimal policies corresponding to the

vertices of the optimal delay-power tradeoff curve L satisfy Eq. (24). Then, we only need to

construct the optimal policies satisfying Eq. (24) for the other average power-delay pairs on

curve L. In particular, we show the construction by using the properties of L in Theorem 3 and

the threshold-based structure for the optimal policies on the vertices.

For each power-delay pair $(P, D)$ on $\mathcal{L}$, we can find a pair of adjacent vertices $(P^-, D^-)$ and $(P^+, D^+)$ such that the power-delay pair lies exactly on the line segment with the two vertices as the endpoints. According to Theorem 5, the pair of vertices on the curve $\mathcal{L}$ is generated by two threshold-based deterministic policies $F^*_-$ and $F^*_+$, respectively. In other words, both policies satisfy Eq. (23) as well as Eq. (24). Meanwhile, according to Theorem 3, the two policies $F^*_-$ and $F^*_+$ employ different transmission rates only for one particular queue length and arrival rate. As a result, according to Lemma 1, we can formulate the corresponding optimal policy for $(P, D)$ as the convex combination of the two threshold-based deterministic policies.

Since the two deterministic policies $F^*_-$ and $F^*_+$ for the two adjacent vertices are threshold-based, there exist a specific transmission rate $s^*$ and arrival rate $a^*$ under which the corresponding thresholds of the two policies are different. Further, the thresholds under the two policies are adjacent on the queue length, i.e., $|q_{F^*_-}(s^*, a^*) - q_{F^*_+}(s^*, a^*)| = 1$. Therefore, we show that the threshold-based optimal policy satisfies Eq. (24)

for each (P,D) on curve L, and the proof is completed.

For a given threshold-based optimal policy $F^*$, we obtain a series of thresholds $\{q_{F^*}(s,a) : 0 \le s \le S\}$ under different arrival rates $a$. Based on the order relation of $\{q_{F^*}(s,a)\}$ in Theorem 6, the transmission rate increases with the queue length. Moreover, according to Theorem 6, the threshold-based optimal policy can be expressed as the convex combination of two adjacent deterministic threshold-based policies shown in Theorem 5. As a result, for each system state $(q[n], a[n])$ except $(q_{F^*}(s^*, a^*), a^*)$, the transmission rate for AWGN channels is determined by the queue length and arrival rate with probability 1. When the queue length is $q_{F^*}(s^*, a^*)$ and the arrival rate is $a^*$, the transmission rate is $s^*$ or $s^* - 1$ with probabilities $f^{*s^*}_{q_{F^*}(s^*,a^*),a^*}$ and $f^{*(s^*-1)}_{q_{F^*}(s^*,a^*),a^*}$, respectively.

[Figure 3: axes are Average Power (horizontal) and Average Delay (vertical); legend: Optimal Tradeoff Curve, Feasible Region, Adjacent Policies for Θ.]

Fig. 3. Demonstration of the algorithm to obtain the optimal delay-power tradeoff curve.
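The randomized rule of Eq. (24) can be sketched as a function returning the distribution over transmission rates for a given state. The data layout (`thresholds[a][s]` standing for $q_{F^*}(s,a)$), the parameter `prob_s_star` standing for the probability of rate $s^*$ at the mixing state, and the example numbers are assumptions made for illustration only.

```python
from bisect import bisect_left


def rate_distribution(q, a, thresholds, s_star, a_star, prob_s_star):
    """Distribution {rate: probability} following the structure of Eq. (24).

    thresholds[a][s] stands for q_F(s, a) and is nondecreasing in s.
    Only the state (q_F(s_star, a_star), a_star) is randomized, between
    rates s_star and s_star - 1.
    """
    s = bisect_left(thresholds[a], q)
    at_mixing_state = (
        a == a_star and s == s_star and s >= 1 and q == thresholds[a][s]
    )
    if at_mixing_state:
        return {s: prob_s_star, s - 1: 1.0 - prob_s_star}
    return {s: 1.0}


th = {0: [0, 2, 5, 7], 1: [0, 1, 4, 7]}  # illustrative thresholds
print(rate_distribution(4, 1, th, s_star=2, a_star=1, prob_s_star=0.25))
print(rate_distribution(3, 1, th, s_star=2, a_star=1, prob_s_star=0.25))
```

Every state other than the single mixing state gets a deterministic rate, so the policy is fully described by the thresholds plus one probability.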

C. Algorithm to Obtain the Optimal Tradeoff

We finally develop a threshold-based algorithm to efficiently obtain the optimal delay-power

tradeoff curve for AWGN channels. In this way, the minimized delay can be generated by the

optimal threshold-based policy for the given power constraint, which will be adjusted by practical

systems based on the time varying delay and power efficiency requirements. Considering the

piecewise linearity of the optimal tradeoff curve L, we first attain all the vertices of L and

the corresponding threshold-based optimal deterministic policies. As shown in Algorithm 1, we

search the vertex sequence $\{\Theta_0, \Theta_1, \cdots, \Theta_N\}$ starting from $\Theta_0$ with an iterative procedure. The vertex $\Theta_0$ in Fig. 3 is obtained by the policy that transmits the packets as soon as they arrive at the buffer. In particular, we denote this transmission policy by $F^0$.
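To make the evaluation of a policy such as $F^0$ concrete, the following sketch computes the steady-state average power and delay of a deterministic policy. It rests on several assumptions that are not spelled out in this section: the queue evolves as $q' = \min\{q - s + a', Q\}$ (consistent with the indicator $s = q + a' - q'$ in the balance equations), the chain is a unichain, the delay is the mean queue length divided by the mean arrival rate, and the stationary distribution is found by a lazy power iteration rather than the matrix inversion mentioned later. All function and variable names are hypothetical.

```python
def evaluate_policy(rate, gamma, power, Q, iters=2000):
    """Average (power, delay) of a deterministic policy rate(q, a) -> s.

    gamma[a][a2]: arrival-rate transition probabilities.
    power[s]: transmission power for rate s.
    Assumed dynamics: q' = min(q - s + a', Q).
    """
    A = len(gamma) - 1
    states = [(q, a) for q in range(Q + 1) for a in range(A + 1)]
    pi = {st: 1.0 / len(states) for st in states}
    for _ in range(iters):
        nxt = dict.fromkeys(states, 0.0)
        for (q, a), mass in pi.items():
            s = rate(q, a)
            for a2 in range(A + 1):
                nxt[(min(q - s + a2, Q), a2)] += mass * gamma[a][a2]
        # Lazy step: averaging with the previous iterate keeps the same
        # stationary distribution and avoids oscillation of periodic chains.
        pi = {st: 0.5 * pi[st] + 0.5 * nxt[st] for st in states}
    mean_q = sum(q * m for (q, a), m in pi.items())
    mean_a = sum(a * m for (q, a), m in pi.items())
    mean_p = sum(power[rate(q, a)] * m for (q, a), m in pi.items())
    return mean_p, mean_q / mean_a


# Illustrative setting: A = S = 3, Q = 7, arrival rate kept with
# probability 0.5 and otherwise uniform over the other rates.
gamma = [[0.5 if i == j else 1.0 / 6.0 for j in range(4)] for i in range(4)]
power = [0.0, 9.0e-12, 18.2e-12, 59.5e-12]
print(evaluate_policy(lambda q, a: min(q, 3), gamma, power, Q=7))
```

Under these assumptions, the policy that sends everything in the buffer yields a delay of one timeslot, the smallest value the model allows, at the cost of the largest average power.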

We next present the iteration procedure in Algorithm 1 to find the current vertex $\Theta_{n+1}$ based on the previous vertex $\Theta_n$. With the optimal deterministic policy $F^*_n$ for previous vertex $\Theta_n = (P_n, D_n)$, we can detect the current vertex $\Theta_{n+1} = (P_{n+1}, D_{n+1})$ by focusing on all the adjacent threshold-based deterministic policies of $F^*_n$. Over all the candidate transmission policies, we obtain the threshold-based optimal deterministic policy $F^*_{n+1}$ for vertex $\Theta_{n+1}$ based on the decreasing and convex shape of $\mathcal{L}$. More specifically, the average power-delay pair generated by $F^*_{n+1}$, i.e., $\Theta_{n+1}$, has a slower increment of the average delay per decrement of the average power consumption starting from vertex $\Theta_n$ than that generated by any other

candidate. Therefore, the current vertex $\Theta_{n+1}$ and optimal policy $F^*_{n+1}$ can be determined by enumerating all the deterministic policies that are adjacent to $F^*_n$. Further, we narrow down the alternatives of policy $F^*_{n+1}$ by using the threshold-based structure presented in Theorem 6. In Algorithm 1, we denote by $\mathcal{F}_p$ the set of the threshold-based policies under which the previous vertex $\Theta_n$ is generated as the average power-delay pair. By enumerating the adjacent threshold-based deterministic policies for each policy in $\mathcal{F}_p$, we can obtain the current vertex $\Theta_{n+1}$ and the corresponding threshold-based optimal policies. During the search, we keep the candidates for the optimal policy in set $\mathcal{F}_c$, namely those yielding the smallest absolute slope and, among equal slopes, the smallest power decrease on the power-delay plane. As a result, once we have traversed all the optimal policies that generate the vertex $\Theta_n$, the threshold-based optimal deterministic policy $F^*_{n+1}$ is also attained for the current vertex $\Theta_{n+1}$.

Algorithm 1 Obtain the Optimal Delay-Power Tradeoff for AWGN Channels
 1: $F \leftarrow F^0$, $n \leftarrow 0$
 2: $D_F \leftarrow$ average delay under policy $F$, $P_F \leftarrow$ average power under policy $F$
 3: $\mathcal{F}_c \leftarrow [F]$, $D_c \leftarrow D_F$, $P_c \leftarrow P_F$
 4: while $\mathcal{F}_c \neq \emptyset$ do
 5:     $\mathcal{F}_p \leftarrow$ the set containing an arbitrary policy in $\mathcal{F}_c$, $\mathcal{F}_c \leftarrow \emptyset$, $\bar{\mathcal{F}}_p \leftarrow \emptyset$
 6:     $D_p \leftarrow D_c$, $P_p \leftarrow P_c$, slope $\leftarrow +\infty$
 7:     while $\mathcal{F}_p \neq \emptyset$ do
 8:         $F = \mathcal{F}_p$.pop(0), $\bar{\mathcal{F}}_p$.append($F$), $\tilde{\mathcal{F}}_p \leftarrow \emptyset$
 9:         $\mathcal{F}(F) \leftarrow$ the set of all threshold-based deterministic policies satisfying Eq. (23) with only one threshold different from $F$
10:         for all $F' \in \mathcal{F}(F)$ do
11:             $D_{F'} \leftarrow$ average delay under $F'$, $P_{F'} \leftarrow$ average power under $F'$
12:             if $D_{F'} = D_p$, $P_{F'} = P_p$, and $F' \notin \bar{\mathcal{F}}_p$ then
13:                 $\tilde{\mathcal{F}}_p$.append($F'$)
14:             else if $D_{F'} = D_c$ and $P_{F'} = P_c$ then
15:                 $\mathcal{F}_c$.append($F'$)
16:             else if $\frac{D_{F'} - D_p}{P_p - P_{F'}} <$ slope, or $\frac{D_{F'} - D_p}{P_p - P_{F'}} =$ slope and $P_{F'} > P_c$ then
17:                 $\mathcal{F}_c \leftarrow [F']$, $D_c \leftarrow D_{F'}$, $P_c \leftarrow P_{F'}$, slope $\leftarrow \frac{D_{F'} - D_p}{P_p - P_{F'}}$
18:             end if
19:         end for
20:         $\mathcal{F}_p \leftarrow \tilde{\mathcal{F}}_p$
21:     end while
22:     $n \leftarrow n + 1$
23:     $\Theta_n = (P_c, D_c)$, $F^*_n = \mathcal{F}_c$.pop(0)
24: end while
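The candidate-selection rule at the core of Algorithm 1 (the slope comparison and its tie-break) can be isolated in a few lines. This sketch assumes the power-delay pairs of the adjacent policies have already been computed and are passed in as a plain list; skipping candidates that do not lower the power is an added assumption, and the name `next_vertex` is hypothetical.

```python
def next_vertex(prev, candidates):
    """Pick the next vertex when moving from prev toward lower power.

    prev: (P_p, D_p) of the previous vertex.
    candidates: (P, D) pairs generated by adjacent policies.
    The candidate with the smallest delay increase per unit of power
    decrease wins; equal slopes are resolved in favor of larger power.
    Returns None if no candidate lowers the power.
    """
    p_prev, d_prev = prev
    best, best_slope = None, float("inf")
    for p, d in candidates:
        if p >= p_prev:
            continue  # assumption: only moves toward lower power count
        slope = (d - d_prev) / (p_prev - p)
        if slope < best_slope or (slope == best_slope and p > best[0]):
            best, best_slope = (p, d), slope
    return best


print(next_vertex((4, 2), [(2, 3), (1, 5), (3, 2.5), (3, 4)]))
```

Preferring the larger power among equal slopes returns the nearest point on the same segment, so no vertex lying between the two is skipped.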

Considering all the vertices are detected for curve L, we finally show the optimal delay-

power tradeoff under an arbitrary power constraint. With the power constraint Pth given, we

construct the corresponding optimal policy as a convex combination of two threshold-based

policies. According to Theorem 6, the two threshold-based policies correspond to two adjacent vertices $(P_n, D_n)$ and $(P_{n+1}, D_{n+1})$ that satisfy $(P_n - P_{\mathrm{th}})(P_{n+1} - P_{\mathrm{th}}) \le 0$. We can find the two adjacent vertices on $\mathcal{L}$ by checking the sequence $\{\Theta_0, \cdots, \Theta_N\}$. Since Algorithm 1 starts from the vertex $\Theta_0$ with the largest power and proceeds toward lower power, the sequence is ordered with decreasing power components, and we end the search when finding the first vertex whose power component is no greater than $P_{\mathrm{th}}$. According to Lemma 1, we

obtain the multiplier of the convex combination by the binary search over interval [0, 1]. By

this means, the optimal delay-power tradeoff can be demonstrated under an arbitrary power

constraint. The threshold-based optimal transmission policy is also effectively formulated based

on the threshold-based deterministic policies for the vertices.
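The lookup of the two adjacent vertices and the resulting minimum delay can be sketched as follows, assuming the vertex sequence is stored as a list with strictly decreasing power. The returned weight is the coefficient $\varepsilon'$ on the higher-power vertex in the power-delay plane; finding the policy mixing weight $\varepsilon$ would still require the binary search described above. The name `min_delay` and the example numbers are hypothetical.

```python
def min_delay(vertices, p_th):
    """Minimum average delay under the power constraint p_th.

    vertices: [(P_0, D_0), ..., (P_N, D_N)] with strictly decreasing power.
    Returns (delay, weight), where weight multiplies the higher-power
    vertex of the segment that contains p_th.
    """
    if p_th >= vertices[0][0]:
        return vertices[0][1], 1.0  # constraint is not binding
    for (p_hi, d_hi), (p_lo, d_lo) in zip(vertices, vertices[1:]):
        if (p_hi - p_th) * (p_lo - p_th) <= 0:
            weight = (p_th - p_lo) / (p_hi - p_lo)
            return weight * d_hi + (1.0 - weight) * d_lo, weight
    raise ValueError("power constraint is below the minimum achievable power")


seq = [(4.0, 2.0), (2.0, 3.0), (1.0, 5.0)]  # illustrative vertex sequence
print(min_delay(seq, 3.0))
print(min_delay(seq, 1.5))
```

The linear interpolation is valid because the tradeoff curve is piecewise linear between adjacent vertices.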

Furthermore, we present the complexity of the proposed algorithm. Since Algorithm 1 is iterative, we first bound the number of iterations that search the adjacent policies for set $\mathcal{F}_p$, and then analyze the complexity of each iteration. As indicated in Algorithm 1, we update set $\mathcal{F}_p$ in each iteration by changing one particular state's transmission rate. Meanwhile, under two arbitrary deterministic policies, the number of states with different transmission rates is no more than the number of system states, which is on the order of $QA$. As a result, the number of iterations is no more than $QA$. For each iteration, we further calculate the average delay and power for the $AS$ adjacent threshold-based policies of $F$, where the most time-consuming operation for each candidate, namely the matrix inversion, costs $O(Q^3A^3)$ in terms of time. In this way, the time complexity of Algorithm 1 is $O(Q^4A^5S)$. Moreover, since the set $\mathcal{F}_p$ has the largest space consumption with at most $QA$ policies, the space complexity is $O(Q^2A^2S)$, where each policy in $\mathcal{F}_p$ is stored as $QAS$ probabilities.
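The two bounds can be read as products of the factors named in the text. The grouping below is only a restatement of that counting, with constant offsets such as $Q+1$ versus $Q$ ignored:

```latex
\underbrace{QA}_{\text{iterations}}
\times \underbrace{AS}_{\text{candidates per iteration}}
\times \underbrace{O(Q^{3}A^{3})}_{\text{matrix inversion}}
= O(Q^{4}A^{5}S),
\qquad
\underbrace{QA}_{\text{policies stored}}
\times \underbrace{QAS}_{\text{probabilities per policy}}
= O(Q^{2}A^{2}S).
```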

For practical systems, we can formulate a trajectory-sampling version of the algorithm. More specifically, we estimate the average delay and power as the mean values of $\frac{1}{\alpha} q[n]$ and $\rho[n]$, respectively, based on long-term sampling of $s[n]$, $q[n]$, and $a[n]$. The optimal delay-power tradeoff is then obtained for practical systems over AWGN channels without prior knowledge of the arrival statistics.
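A minimal sketch of such a trajectory-sampling estimate is given below. It assumes the same queue dynamics as the balance equations, $q' = \min\{q - s + a', Q\}$, uses the transition matrix only to generate a synthetic trajectory (a real system would observe $a[n]$ directly), and estimates $\alpha$ from the samples themselves. The function name `sample_tradeoff`, the seed, and the run length are hypothetical choices.

```python
import random


def sample_tradeoff(rate, gamma, power, Q, steps=20000, seed=0):
    """Estimate (average power, average delay) from one sampled trajectory.

    rate(q, a) -> s is the policy; gamma drives the simulated arrivals.
    The delay estimate is mean(q[n]) / mean(a[n]), so the arrival
    statistics are not needed in advance.
    """
    rng = random.Random(seed)
    A = len(gamma) - 1
    q, a = 0, 0
    sum_q = sum_a = sum_p = 0.0
    for _ in range(steps):
        s = rate(q, a)
        sum_q += q
        sum_a += a
        sum_p += power[s]
        a = rng.choices(range(A + 1), weights=gamma[a])[0]  # next arrival rate
        q = min(q - s + a, Q)                               # assumed dynamics
    alpha_hat = sum_a / steps
    return sum_p / steps, (sum_q / steps) / alpha_hat


gamma = [[0.5 if i == j else 1.0 / 6.0 for j in range(4)] for i in range(4)]
power = [0.0, 9.0e-12, 18.2e-12, 59.5e-12]
print(sample_tradeoff(lambda q, a: min(q, 3), gamma, power, Q=7))
```

Longer trajectories reduce the variance of both estimates at the cost of a longer observation period.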

V. OPTIMAL DELAY-POWER TRADEOFF FOR BLOCK FADING CHANNELS

In this section, we extend the optimal delay-power tradeoff to block fading channels. Based

on the analyses of the optimal tradeoff for AWGN channels, we first show the optimal delay-

power tradeoff for block fading channels by converting the CMDP to an LP problem. By solving

the equivalent LP problem, we then formulate an optimal delay-power tradeoff curve, where we

show that the curve has the same properties as those in Section III. We finally present the

optimal transmission policies over the fading channel with a threshold type of structure on the

queue length. For the optimal threshold-based policies, we further show an order relation of the

thresholds under different channel states, when the power functions follow a particular condition.


First, we present the optimal delay-power tradeoff over block fading channels. For the gener-

alized system over the fading channel, we employ the steady-state analysis for each transmission

policy F = {f sq,a,ι : ∀ q, a, ι, s}, as presented in Section III-A. For this purpose, we formulate

a Markov reward process for each given policy F , through which the average delay and power

consumption are presented by the steady-state probability. In this way, we further show the

optimal delay-power tradeoff by using an LP problem, where all the obtainable power-delay

pairs are presented for the transmission policies in terms of the state-action frequencies {xsq,a,ι}.

In particular, we present the LP problem as follows.

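$$\min_{\{x^s_{q,a,\iota}\}} \; \frac{1}{\alpha} \sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{\iota=1}^{L}\sum_{s=0}^{S} q\, x^s_{q,a,\iota} \tag{25a}$$
$$\text{s.t.} \quad \sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{\iota=1}^{L}\sum_{s=0}^{S} P_{h_\iota}(s)\, x^s_{q,a,\iota} \le P_{\mathrm{th}} \tag{25b}$$
$$\sum_{q=\max\{q'-a',0\}}^{\min\{q'-a'+S,\,Q\}} \sum_{a=0}^{A}\sum_{\iota=1}^{L}\sum_{s=0}^{S} \gamma_{a,a'}\,\eta_{\iota'}\, x^s_{q,a,\iota}\, \mathbb{1}\{s = q + a' - q'\} = \sum_{s=0}^{S} x^s_{q',a',\iota'} \quad \forall\, q', a', \iota' \tag{25c}$$
$$\sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{\iota=1}^{L}\sum_{s=0}^{S} x^s_{q,a,\iota} = 1 \tag{25d}$$
$$x^s_{q,a,\iota} \ge 0 \quad \forall\, q, a, \iota, s. \tag{25e}$$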

By solving the LP problem under different power constraints Pth, we then formulate the optimal

delay-power tradeoff curve, which contains all the optimal average power-delay operating points

under different power constraints. With the same method in Sections III-B and III-C employed,

we straightforwardly obtain the same properties of the optimal tradeoff curve as follows.

Theorem 7. For block fading channels, the optimal delay-power tradeoff curve is piecewise

linear, decreasing, and convex. The vertices of the optimal tradeoff curve are obtained by a

series of deterministic transmission policies with unichains. For each two adjacent vertices, the

corresponding two policies have different transmission rates only on one state.

Proof: The proof of this theorem follows directly from the method used for Theorems 2 and 3.

As a result, the optimal delay-power tradeoff over the fading channel is obtained by solving

the LP problem (25), where the optimal policies are generated by the optimal solutions based

on the extension of Eq. (17) for fading channels. By jointly exploiting the results in Theorem

7 and Lemma 1 over fading channels, we have that the optimal average delays are obtained by

the corresponding optimal policies regardless of the initial system state.


Based on the analyses of the optimal tradeoff curve, we finally show that the optimal delay

can be obtained by the optimal threshold-based policies over the fading channel. In a way similar to that in Section IV, we show the threshold-based structure of the optimal policies in

the following theorem. In particular, we first present the optimal deterministic threshold-based

policies for the vertices of the tradeoff curve, where we employ the same method in Theorem 5

for the CMDP generated over the fading channel. Then, for other points on the optimal tradeoff

curve, we show the threshold-based structure of the optimal policies by presenting them as

the convex combination of two adjacent deterministic threshold-based policies, as indicated in

Theorem 6. We show the optimal threshold-based policies in the following theorem.

Theorem 8. For the optimal policy $F^*$, there exist $(A+1)\times(S+1)\times L$ thresholds $q_{F^*}(s,a,\iota)$, where we have $0 \le q_{F^*}(0,a,\iota) \le \cdots \le q_{F^*}(S,a,\iota) \le Q$ for each arrival rate $a$, $0 \le a \le A$, and index of channel state $\iota$, $1 \le \iota \le L$. With the thresholds $q_{F^*}(s,a,\iota)$ given, the optimal policy $F^*$ satisfies

$$\begin{cases} f^{*s}_{q,a,\iota} = 1, & q_{F^*}(s-1,a,\iota) < q \le q_{F^*}(s,a,\iota),\ a \ne a^* \text{ or } s \ne s^* \text{ or } \iota \ne \iota^*, \\ f^{*s}_{q,a,\iota} = 1, & q_{F^*}(s-1,a,\iota) < q < q_{F^*}(s,a,\iota),\ a = a^* \text{ and } s = s^* \text{ and } \iota = \iota^*, \\ f^{*s}_{q,a,\iota} + f^{*(s-1)}_{q,a,\iota} = 1, & q = q_{F^*}(s,a,\iota),\ a = a^* \text{ and } s = s^* \text{ and } \iota = \iota^*, \\ f^{*s}_{q,a,\iota} = 0, & \text{otherwise}, \end{cases} \tag{26}$$

where the specific $s^*$, $a^*$, and $\iota^*$ are given by $F^*$, and $q_{F^*}(-1,a,\iota) = -1$ for each $a$ and $\iota$.

Proof: The proof of this theorem follows directly from the method used for Theorems 5 and 6.

With the threshold-based structure of the optimal policies, we can efficiently determine the

transmission rate for the adaptive transmitter over fading channels. With the current arrival rate

a[n] and channel state h[n] given as a and hι, respectively, we present the transmission rate by

comparing the current queue length with the series of thresholds {qF ∗(s, a, ι) : 0 ≤ s ≤ S}. As

a result, we can also obtain the optimal delay-power tradeoff for fading channels by developing

an algorithm similar to Algorithm 1. Moreover, we show an order relation of the thresholds under different channel states in the following theorem.

Theorem 9. The thresholds $q_{F^*}(s,a,\iota)$, $1 \le \iota \le L$, of the optimal policy $F^*$ satisfy

$$q_{F^*}(s,a,\iota^+) \le q_{F^*}(s,a,\iota^-), \quad \forall\, 1 \le \iota^- < \iota^+ \le L, \tag{27}$$

for each transmission rate $s$ and arrival rate $a$, when the power functions $P_{h_\iota}(s)$, $1 \le \iota \le L$, satisfy

$$P_{h_{\iota^+}}(s^+) - P_{h_{\iota^+}}(s^-) \le P_{h_{\iota^-}}(s^+) - P_{h_{\iota^-}}(s^-), \tag{28}$$

where we have $0 \le s^- < s^+ \le S$.

Proof: See Appendix D.

According to the order relation of thresholds in Eq. (27), a greater rate will be employed for a better channel condition under the optimal threshold-based policies, if the condition in Eq. (28) holds. Actually, for a typical communication system, the power consumption for a transmission rate is inversely proportional to the square of the amplitude of the channel coefficient, i.e., $\frac{P_{h_{\iota^+}}(s)}{|h_{\iota^-}|^2} = \frac{P_{h_{\iota^-}}(s)}{|h_{\iota^+}|^2}$. As a result, we can straightforwardly check the condition in Eq. (28) for the typical system, so that the order relation of the thresholds of the optimal policies is satisfied.

Fig. 4. Optimal Delay-Power Tradeoff Curves

VI. NUMERICAL RESULTS

In this section, we present the numerical results to validate the optimal delay-power tradeoff

for the adaptive transmitter with Markov random arrivals. In a practical scenario, we consider

that the maximum transmission rate S is equal to 3, under which we employ three optional

modulations BPSK, QPSK, or 8-PSK to transmit 1, 2, or 3 packets in a timeslot, respectively.

We assume that each packet contains 10,000 bits and the duration of a timeslot is 10 ms. With a bandwidth of 1 MHz and a one-sided noise power spectral density $N_0$ of $-150$ dBm/Hz, we calculate the transmission powers over AWGN channels as $P(0) = 0$ W, $P(1) = 9.0 \times 10^{-12}$ W, $P(2) = 18.2 \times 10^{-12}$ W, and $P(3) = 59.5 \times 10^{-12}$ W, which provide a bit error rate of $10^{-5}$. Moreover, we consider a specific class of the arrival processes. For each arrival process,

we determine the transition matrix Γ = [γa,a′ ] by a constant ψ and a vector ζ = [ζ0, ζ1, · · · , ζA]T .

In particular, we define matrix Γ by presenting each element γa,a′ as

$$\gamma_{a,a'} = \frac{1 - \zeta_a}{A} + \frac{(A+1)\zeta_a - 1}{A}\, \mathbb{1}\{a' = (a+\psi) \bmod (A+1)\}. \tag{29}$$

As a result, we construct an arrival process by using a tuple (ζ, ψ), and have that ζa ∈ [0, 1]

and ψ ∈ {−A,−A+ 1, · · · , A}.
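The construction in Eq. (29) can be checked numerically with a short sketch. It builds the transition matrix from a tuple $(\zeta, \psi)$ and estimates the stationary mean arrival rate by a lazy power iteration, which is a choice made here for simplicity. The function names are hypothetical.

```python
def arrival_matrix(zeta, psi):
    """Transition matrix [gamma_{a,a'}] built from (zeta, psi) as in Eq. (29)."""
    A = len(zeta) - 1
    gamma = []
    for a, z in enumerate(zeta):
        row = [(1.0 - z) / A] * (A + 1)
        # The favored next state (a + psi) mod (A + 1) ends up with
        # total probability zeta_a.
        row[(a + psi) % (A + 1)] += ((A + 1) * z - 1.0) / A
        gamma.append(row)
    return gamma


def mean_arrival_rate(gamma, iters=2000):
    """Stationary mean of the arrival rate via lazy power iteration."""
    n = len(gamma)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[a] * gamma[a][b] for a in range(n)) for b in range(n)]
        pi = [0.5 * x + 0.5 * y for x, y in zip(pi, nxt)]
    return sum(a * p for a, p in enumerate(pi))


for zeta in ([0.7, 0.7, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5], [0.3, 0.3, 0.5, 0.5]):
    print(round(mean_arrival_rate(arrival_matrix(zeta, 0)), 2))
```

With $\psi = 0$, each $\zeta_a$ is the probability of keeping the current arrival rate, so larger values of $\zeta_a$ make the traffic more strongly time-correlated.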

Fig. 5. Typical optimal threshold-based policy. (a) The average transmission rates for the optimal policy. (b) The thresholds for the optimal policy.

First, Fig. 4 presents the optimal delay-power tradeoff curves for AWGN channels, where we consider the impact of different average arrival rates. We validate the theoretical tradeoff curves by Monte-Carlo simulation. We set the maximum arrival rate and the maximum transmission rate to 3, and the buffer size to 7. The optimal delay-power tradeoff curves are presented for three arrival processes, which are characterized as (ζi, 0), i = 1, 2, 3. In particular, we have ζ1 = [0.7, 0.7, 0.5, 0.5], ζ2 = [0.5, 0.5, 0.5, 0.5], and ζ3 = [0.3, 0.3, 0.5, 0.5]. The average rates of the three arrival processes are equal to 1.25, 1.50, and 1.67, respectively. As presented in Fig. 4, the optimal delay-power tradeoff given by Algorithm 1 and by solving the LP problem matches the results of the Monte-Carlo simulation. On each optimal tradeoff curve, the optimal average delay decreases as the average power consumption increases. Further, a close observation shows that each curve is piecewise linear and convex, which confirms Theorem 2. We then compare the optimal tradeoff curves under different average arrival rates. When the power constraint is $P_{th} = 18 \times 10^{-12}$ W, the average delay under (ζ2, 0) is 47% lower than that under (ζ3, 0). To achieve the average delay D = 1.2 × 10 ms, arrival processes (ζ2, 0) and (ζ3, 0) require greater power consumption, namely 126% and 143% of that for (ζ1, 0).
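The construction in Eq. (29) and the average rates quoted above can be checked numerically. Under Eq. (29), ζ_a is the probability of moving from rate a to rate (a + ψ) mod (A + 1), and each of the other A rates is reached with probability (1 − ζ_a)/A. The following Python sketch is illustrative only: the function names `transition_matrix`, `stationary`, and `mean_rate` are hypothetical, and approximating the steady-state distribution by plain power iteration is an assumption of this sketch, not a method stated in the paper.

```python
from typing import List


def transition_matrix(zeta: List[float], psi: int) -> List[List[float]]:
    """Build Gamma = [gamma_{a,a'}] from the tuple (zeta, psi) following Eq. (29)."""
    A = len(zeta) - 1
    gamma = []
    for a in range(A + 1):
        # Python's % returns a non-negative result, so negative psi is handled.
        target = (a + psi) % (A + 1)
        row = [
            (1 - zeta[a]) / A + ((A + 1) * zeta[a] - 1) / A * (ap == target)
            for ap in range(A + 1)
        ]
        gamma.append(row)
    return gamma


def stationary(gamma: List[List[float]], iters: int = 5000) -> List[float]:
    """Approximate the steady-state distribution by power iteration (sketch assumption)."""
    n = len(gamma)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[a] * gamma[a][ap] for a in range(n)) for ap in range(n)]
    return pi


def mean_rate(zeta: List[float], psi: int) -> float:
    """Average arrival rate under the steady-state distribution."""
    pi = stationary(transition_matrix(zeta, psi))
    return sum(a * p for a, p in enumerate(pi))


# The three arrival processes (zeta_i, 0) used for Fig. 4.
rates = [
    mean_rate([0.7, 0.7, 0.5, 0.5], 0),
    mean_rate([0.5, 0.5, 0.5, 0.5], 0),
    mean_rate([0.3, 0.3, 0.5, 0.5], 0),
]
print([round(r, 2) for r in rates])
```

With these inputs the sketch reproduces the average rates 1.25, 1.50, and 1.67 stated in the text.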

Then, we turn our attention to the threshold-based structure of the optimal cross-layer transmission policy over AWGN channels. Fig. 5 presents a typical threshold-based optimal policy F* for the same system configuration as in Fig. 4, with the arrival process given as (ζ2, 0) and $P_{th} = 14.82 \times 10^{-12}$ W. Fig. 5(a) shows the average transmission rates under different queue lengths and arrival rates. Fig. 5(b) indicates the threshold-based structure, where the thresholds are marked by red solid lines. In accordance with the order relation of the thresholds in Theorem 6, the optimal policy F* employs a greater transmission rate for a longer queue length. Following Theorem 6, the typical policy is a convex combination of two adjacent deterministic policies, both of which have the threshold-based structure given in Theorem 5. As a result, the transmission rates under F* are deterministic for all system states except the one with arrival rate 3 and queue length 6.

Fig. 6. Optimal Delay-Power Tradeoff Curves under different arrival processes

We next show the impact of different patterns of Markov arrivals on the optimal delay-power tradeoff over AWGN channels, even when these random arrivals have the same average rate and covariance. In particular, we focus on three arrival patterns denoted by A1, A2, and A3, whose transition matrices are given by $(\kappa_1\mathbf{1}, 1)$, $(\kappa_2\mathbf{1}, 0)$, and $(\kappa_3\mathbf{1}, -1)$, respectively. We have $\kappa_i \in [0, 1]$ for i = 1, 2, 3, and all the elements of vector $\mathbf{1}$ are equal to 1. Then, the random arrivals under all three patterns have the same steady-state probability distribution, in which every arrival rate has probability $\frac{1}{A+1}$. Therefore, the average arrival rate of each arrival process is equal to $\frac{A}{2}$. To obtain the same covariance $\mathrm{cov}(a[n], a[n+1])$ under the three arrival patterns, we set $\kappa_1 = \kappa_3 = \kappa$ and $\kappa_2 = \frac{3-2\kappa}{10}$, under which we have $\mathrm{cov}(a[n], a[n+1]) = \frac{1-4\kappa}{10}$.
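The matching of the three patterns can be checked with a short Python sketch. It assumes A = 3 (the value used for Fig. 4; the text does not restate it here) and uses the uniform steady-state distribution given above. The function names are hypothetical. The sketch only checks that the choice κ2 = (3 − 2κ)/10 equalizes the lag-one covariance across A1, A2, and A3, and that the covariance vanishes at κ = 0.25, where the three processes coincide.

```python
from typing import List, Tuple

A = 3  # assumed maximum arrival rate (sketch assumption)


def transition_matrix(zeta: List[float], psi: int) -> List[List[float]]:
    """Transition matrix from the tuple (zeta, psi) following Eq. (29)."""
    n = len(zeta)
    return [
        [
            (1 - zeta[a]) / (n - 1)
            + (n * zeta[a] - 1) / (n - 1) * (ap == (a + psi) % n)
            for ap in range(n)
        ]
        for a in range(n)
    ]


def lag_one_covariance(kappa_i: float, psi: int) -> float:
    """cov(a[n], a[n+1]) for the pattern (kappa_i * 1, psi) in steady state."""
    gamma = transition_matrix([kappa_i] * (A + 1), psi)
    pi = 1.0 / (A + 1)  # uniform steady-state probability, as stated in the text
    mean = A / 2
    e_prod = sum(
        pi * a * ap * gamma[a][ap] for a in range(A + 1) for ap in range(A + 1)
    )
    return e_prod - mean * mean


def pattern_covariances(kappa: float) -> Tuple[float, float, float]:
    """Covariances of A1, A2, A3 with kappa1 = kappa3 = kappa, kappa2 = (3 - 2 kappa) / 10."""
    kappa2 = (3 - 2 * kappa) / 10
    return (
        lag_one_covariance(kappa, 1),
        lag_one_covariance(kappa2, 0),
        lag_one_covariance(kappa, -1),
    )


for kappa in (0.1, 0.25, 0.7):
    c1, c2, c3 = pattern_covariances(kappa)
    print(kappa, abs(c1 - c2) < 1e-9 and abs(c1 - c3) < 1e-9)
```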

As shown in Fig. 6, we present the optimal delay-power tradeoff curves for the three arrival patterns with parameter κ given as 0.1, 0.25, and 0.7.¹ In particular, arrival processes A1 and A2 have a lower average delay if we increase κ, i.e., decrease the covariance. When $P_{th} = 17 \times 10^{-12}$ W, the average delay under A1 with κ = 0.7 is reduced by 37% and 43% compared to that with κ = 0.25 and κ = 0.1, respectively. For arrival process A2, the average delay is reduced by 11% and 14%. However, for arrival process A3, the order of the average delays under the three values of κ changes as the average power constraint varies.

¹When κ is equal to 0.25, the arrival processes under the three arrival patterns are the same, so the corresponding curves coincide.

Fig. 7. Optimal Delay-Power Tradeoff Curves

In Fig. 7, we present the procedure to obtain the optimal delay-power tradeoff for AWGN channels, which is given by Algorithm 1. To simplify the figure, we assume Q = 4 and S = A = 2, and consider the arrival process given by ([0.6, 0.6]T , 1). We first show the power-delay pairs obtained by the deterministic policies with marker 'o', and connect the two points generated by two adjacent policies by black dashed lines. With the vertex Θ0 and the corresponding policy F0 given, we seek the vertices Θn among the threshold-based deterministic policies that are adjacent to the previous optimal policies in set Fp. We show the power-delay pairs of the policies investigated in Algorithm 1 with marker '×' and connect them with the adjacent vertices on the optimal tradeoff curve L by red dashed lines. As shown in Fig. 7, Algorithm 1 obtains the optimal delay-power tradeoff efficiently, since it investigates only a few of all the deterministic policies.
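The relation between the 'o' markers and curve L can be illustrated by brute force. Since the vertices of L are generated by deterministic policies and L is decreasing, convex, and piecewise linear, L can be recovered as the decreasing part of the lower convex envelope of the deterministic power-delay pairs. The Python sketch below does exactly that and is not Algorithm 1: it assumes all pairs have already been enumerated, the function names are hypothetical, and the sample points are made-up values (power in units of 10^-12 W, delay in timeslots), not data from Fig. 7.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (average power, average delay)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def tradeoff_vertices(points: List[Point]) -> List[Point]:
    """Vertices of the decreasing part of the lower convex envelope."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        # Monotone-chain lower hull: drop points that lie on or above the envelope.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    curve = [hull[0]]
    for p in hull[1:]:
        if p[1] < curve[-1][1]:  # keep only points that still lower the delay
            curve.append(p)
    return curve


def delay_at(curve: List[Point], p_th: float) -> float:
    """Delay on the piecewise-linear curve for a power constraint p_th."""
    if p_th < curve[0][0]:
        raise ValueError("power constraint below the smallest feasible power")
    for (p0, d0), (p1, d1) in zip(curve, curve[1:]):
        if p0 <= p_th <= p1:
            w = (p_th - p0) / (p1 - p0)  # weight of the right-hand vertex
            return (1 - w) * d0 + w * d1
    return curve[-1][1]  # more power than needed for the minimum delay


# Hypothetical power-delay pairs of deterministic policies.
pairs = [(1.0, 5.0), (2.0, 3.0), (2.0, 4.0), (2.5, 3.5),
         (3.0, 2.5), (3.0, 4.0), (4.0, 2.25), (5.0, 2.25)]
curve = tradeoff_vertices(pairs)
print(curve)
print(delay_at(curve, 2.5))
```

The interpolation in `delay_at` mirrors the fact that a point between two adjacent vertices is achieved by a convex combination of the two corresponding deterministic policies.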

Fig. 8. Optimal delay-power tradeoff over fading channel. (a) The optimal delay-power tradeoff curve over the block fading channel. (b) The average transmission rate under the optimal policy with the arrival rate given as 2.

We finally show the optimal delay-power tradeoff for block fading channels. In Fig. 8, we consider an L-state block fading channel with L = 4. The amplitudes of the four channel states, i.e., $|h_\iota|$, $\iota = 1, \cdots, 4$, are given as 0.314, 2.50, 3.54, and 5.00, respectively, and the corresponding probabilities $\eta_\iota$ are 0.394, 0.232, 0.239, and 0.135. We obtain the power consumption under each channel state by defining the power consumption function as $P_{h_\iota}(s) = \frac{P(s)}{|h_\iota|^2}$ for each s and ι. We present the optimal delay-power tradeoff curves for different arrival processes in Fig. 8(a), where we employ the same system configuration as in Fig. 4. Moreover, we show the average transmission rates under an optimal threshold-based policy in Fig. 8(b) with the current arrival rate a[n] given as 2. Fig. 8(b) also presents the threshold-based structure of the optimal policy, in which the thresholds on the queue lengths are shown by red solid lines, and illustrates the order relation of the thresholds under different channel states given by Theorem 9. That is, for a given queue length, a greater transmission rate is employed under a better channel condition.
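As a small worked example, the per-state power consumption defined above can be tabulated from the AWGN powers of Section VI and the four channel amplitudes. The Python sketch below is illustrative; the names `P_AWGN`, `H_AMPL`, `ETA`, and `fading_power` are hypothetical.

```python
from typing import List

# AWGN transmission powers P(s) in watts for s = 0, 1, 2, 3 (from Section VI).
P_AWGN: List[float] = [0.0, 9.0e-12, 18.2e-12, 59.5e-12]
# Channel amplitudes |h_iota| and their probabilities eta_iota (from the text).
H_AMPL: List[float] = [0.314, 2.50, 3.54, 5.00]
ETA: List[float] = [0.394, 0.232, 0.239, 0.135]


def fading_power(s: int, h: float) -> float:
    """Power needed to send s packets when the channel amplitude is h."""
    return P_AWGN[s] / h ** 2


# table[i][s] is the power for rate s in channel state i.
table = [[fading_power(s, h) for s in range(len(P_AWGN))] for h in H_AMPL]

for h, row in zip(H_AMPL, table):
    print(h, ["%.3e" % p for p in row])
```

A better channel (larger amplitude) lowers the power required for every nonzero rate, which is consistent with the order relation of the thresholds discussed for Fig. 8(b).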

VII. CONCLUSION

In this paper, we have obtained the optimal delay-power tradeoff required for transmission

over a wireless link under Markov arrivals. The problem can be formulated as a CMDP, under

which we jointly consider the queue length, arrival rate, and channel state to minimize the

average delay under an average power constraint. To obtain the optimal delay-power tradeoff,

we have shown an equivalent LP problem based on the steady-state analysis of the Markov

reward process. Varying the power constraints in the derived LP problem, we show that the

optimal delay-power tradeoff curve is decreasing, convex and piecewise linear. Based on these

geometric properties, we have also presented the optimal adaptive transmission policies for the

optimal power-delay pairs on the tradeoff curve. Further, the threshold-based structure of the

optimal policies has been demonstrated in the queue length by using the Lagrangian relaxation.

With the threshold-based structure, we have developed a threshold-based algorithm to efficiently

obtain the optimal delay-power tradeoff for practical communications.

APPENDIX A

PROOF OF THEOREM 3

The proof falls into three parts. We first show that the policies generating the vertices of curve L have unichains. To obtain a contradiction, we suppose that there exists a policy F for a vertex of L, under which a multichain is generated with I closed classes. Then, the set $\{x^s_{q,a} = \pi_F(q,a) f^s_{q,a}\}$ varies with the initial state. Moreover, we can construct a series of policies $F^i$, $i = 1, 2, \cdots, I$, with unichains, among which the policy $F^i$ employs the same transmission rates as policy F for each state in the ith recurrent closed class. The existence of policy $F^i$ is provided by the communicating property of the CMDP in [27, Section 8.3.1]. As a result, the same steady-state distribution is obtained under the policies $F^i$ and F when the system starts from the ith recurrent closed class. In this way, the state-action frequencies $\{x^s_{q,a}\}$ generated by F can be expressed as a convex combination of the state-action frequencies of the policies $F^i$, where the convex multipliers are determined by the initial state. As a corollary, $\{x^s_{q,a}\}$ is not a vertex of G. Since set R is the projection of G and contains L, the vertices of L must be projections of vertices of G, which leads to a contradiction.

Then, we show that all the vertices of curve L are obtained by deterministic policies. For this purpose, we apply an argument similar to [29, Theorem 4.2]. This theorem shows that the vertices of G are generated by deterministic policies if all the considered policies have unichains. Combined with the above analysis, the claim follows directly from this theorem.

We finally show the relationship of policies for two adjacent vertices on L. We start the

proof with the observation that the edge connecting the two adjacent vertices on curve L is the

projection of an edge on G, where the vertices are generated by the deterministic policies. For

the two adjacent vertices on G, the corresponding deterministic policies are different only on

one state. The conclusion also holds for the degenerate case in which the edges connecting a series of adjacent vertices of L are collinear. This completes the proof of the theorem.

APPENDIX B

PROOF OF LEMMA 1

We begin by recalling that the probabilities of the transmission rates in policy F are the same as those in policies F′ and F′′ for all the states except state (q, a). When the system state is (q, a), we randomly select the employed policy as F′ or F′′ with probabilities ε and 1−ε, respectively. Since F′ and F′′ have unichains, the system visits (q, a) within a finite time starting from any other state under F′ and F′′. Because F coincides with F′ and F′′ in every state other than (q, a), the same property holds under policy F, i.e., the system visits (q, a) within a finite time starting from any given state. Therefore, there exists only one recurrent closed class in the Markov chain under policy F, i.e., policy F has a unichain.


Then, we show that the average power and delay under F are convex combinations of those under F′ and F′′, based on the relationship among the three policies’ state-action frequencies. To this end, we first express $x^s_{q,a}$ under F by the method in [28, Eq. 4.3.8] as

$$x^s_{q,a} = \pi_F(q,a) f^s_{q,a} = \frac{E^{F}_{q,a}\{N(q,a,s)\}}{E^{F}_{q,a}\{T\}},$$

where we have $T = \inf\{n \ge 1 : q[n] = q, a[n] = a\}$ and $N(q,a,s) = \sum_{n=0}^{T-1} \mathbb{1}\{q[n]=q,\, a[n]=a,\, s[n]=s\}$.

Considering that the average power and delay in Eqs. (16a) and (16b) are linear functions of $x^s_{q,a}$, we only need to show the relationship among $x^s_{q,a}$, ${x'}^{s}_{q,a}$, and ${x''}^{s}_{q,a}$, where we define ${x'}^{s}_{q,a} = \pi_{F'}(q,a) {f'}^{s}_{q,a}$ and ${x''}^{s}_{q,a} = \pi_{F''}(q,a) {f''}^{s}_{q,a}$. Based on the above definition of $x^s_{q,a}$, we have

$$x^s_{q,a} = \frac{\varepsilon E^{F'}_{q,a}\{N(q,a,s)\} + (1-\varepsilon) E^{F''}_{q,a}\{N(q,a,s)\}}{\varepsilon E^{F'}_{q,a}\{T\} + (1-\varepsilon) E^{F''}_{q,a}\{T\}} = \frac{\varepsilon E^{F'}_{q,a}\{T\}\, {x'}^{s}_{q,a} + (1-\varepsilon) E^{F''}_{q,a}\{T\}\, {x''}^{s}_{q,a}}{\varepsilon E^{F'}_{q,a}\{T\} + (1-\varepsilon) E^{F''}_{q,a}\{T\}}, \quad (30)$$

where the first equality holds based on the above analysis of the transmission process under F.

By defining $\varepsilon' = \frac{\varepsilon E^{F'}_{q,a}\{T\}}{\varepsilon E^{F'}_{q,a}\{T\} + (1-\varepsilon) E^{F''}_{q,a}\{T\}}$, we finally have $x^s_{q,a} = \varepsilon' {x'}^{s}_{q,a} + (1-\varepsilon') {x''}^{s}_{q,a}$. Therefore, the average power and delay under policy F are given as $P_F = \varepsilon' P_{F'} + (1-\varepsilon') P_{F''}$ and $D_F = \varepsilon' D_{F'} + (1-\varepsilon') D_{F''}$, respectively. An easy computation shows that ε′ is monotonically increasing in ε for given $E^{F'}_{q,a}\{T\}$ and $E^{F''}_{q,a}\{T\}$. Meanwhile, policy F degenerates to policy F′ or F′′ with ε equal to 1 or 0, respectively, where the corresponding ε′ is equal to 1 or 0.
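The monotonicity of ε′ in ε can be verified directly. As shorthand introduced here only for brevity, write $T'$ and $T''$ for the expected return times $E^{F'}_{q,a}\{T\}$ and $E^{F''}_{q,a}\{T\}$. Both are at least 1, since $T \ge 1$ by definition. Differentiating with respect to ε gives:

```latex
\begin{align*}
\varepsilon' &= \frac{\varepsilon T'}{\varepsilon T' + (1-\varepsilon) T''}, \\
\frac{d\varepsilon'}{d\varepsilon}
  &= \frac{T'\,[\varepsilon T' + (1-\varepsilon) T''] - \varepsilon T'\,(T' - T'')}
          {[\varepsilon T' + (1-\varepsilon) T'']^{2}}
   = \frac{T'\,T''}{[\varepsilon T' + (1-\varepsilon) T'']^{2}} > 0 .
\end{align*}
```

Hence ε′ increases strictly from 0 to 1 as ε goes from 0 to 1, so every convex combination of the two power-delay pairs is attained by some ε.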

APPENDIX C

PROOF OF THEOREM 5

The main idea of the proof is to formulate the optimal policies for the vertices of the optimal tradeoff curve L based on the value iteration algorithm. As presented in Fig. 2, we obtain vertex (P, D) as the only optimal power-delay pair of the Lagrangian relaxation problem (22) with a specific µ. The same optimal power-delay pair is obtained by the primal problem (9) with $P_{th} = P$. To obtain optimal policies for the vertices, we first formulate the MDP to minimize $D_F + \mu P_F$. According to [27, Theorem 9.1.8], we obtain the optimal deterministic policy F* by using value iteration, which is presented in Algorithm 2 with $\omega^{(m+1)}(q,a,s)$ defined as

$$\omega^{(m+1)}(q,a,s) = \frac{1}{\alpha} q + \mu P(s) + \sum_{a'=0}^{A} \gamma_{a,a'}\, \nu^{(m)}(q-s+a', a'). \quad (31)$$

Further, we have a unichain under the optimal policy F* that is generated by Algorithm 2.

Then, we show the threshold-based structure for policy F ∗. Since the optimal policy is

generated by an iteration process, we present the threshold-based structure by induction on

m. In particular, we first show the existence of thresholds for the deterministic policy $F^{(m+1)} = \{s^{(m+1)}(q,a) : \forall q, a\}$ with the assumption that $\nu^{(m)}(q,a)$ is convex in q, i.e.,

$$\nu^{(m)}(q-1,a) + \nu^{(m)}(q+1,a) \ge 2\nu^{(m)}(q,a). \quad (32)$$

Algorithm 2 Value Iteration Algorithm for Markov Decision Processes
    m ← 0
    for all q and a do
        ν^(0)(q, a) ← arbitrary value    // Initialization
    end for
    repeat
        for all q and a do    // Policy Improvement
            s^(m+1)(q, a) ← arg min_{s∈S(q)} {ω^(m+1)(q, a, s)}
        end for
        for all q and a do    // Policy Evaluation
            ν^(m+1)(q, a) ← ω^(m+1)(q, a, s^(m+1)(q, a))
        end for
        m ← m + 1
    until s^(m)(q, a) = s^(m−1)(q, a) holds for all q and a
    s*(q, a) ← s^(m)(q, a) for all q and a
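Algorithm 2 with the update in Eq. (31) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the feasible set S(q) is taken to be the rates s with max(0, q + A − Q) ≤ s ≤ min(q, S) so that the buffer of size Q never overflows, only states with q ≥ a are considered, `alpha` stands for the constant α in Eq. (31), and a reference value is subtracted in each iteration to keep the values bounded (this normalization does not change the arg min). The function name and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (queue length q, arrival rate a)


def value_iteration(gamma: List[List[float]], power: List[float], mu: float,
                    alpha: float, Q: int, max_iter: int = 1000) -> Dict[State, int]:
    """Return a deterministic policy {(q, a): s} for minimizing D_F + mu * P_F."""
    A = len(gamma) - 1   # maximum arrival rate
    S = len(power) - 1   # maximum transmission rate

    def feasible(q: int) -> range:
        # Assumed S(q): send at most q packets, leave room for A new arrivals.
        return range(max(0, q + A - Q), min(q, S) + 1)

    states = [(q, a) for q in range(Q + 1) for a in range(A + 1) if q >= a]
    nu = {st: 0.0 for st in states}  # nu^(0): a convex (constant) initial value
    policy: Dict[State, int] = {}
    for _ in range(max_iter):
        new_nu: Dict[State, float] = {}
        new_policy: Dict[State, int] = {}
        for q, a in states:
            best_s, best_w = -1, float("inf")
            for s in feasible(q):
                # omega^(m+1)(q, a, s) from Eq. (31)
                w = q / alpha + mu * power[s] + sum(
                    gamma[a][a2] * nu[(q - s + a2, a2)] for a2 in range(A + 1))
                if w < best_w:
                    best_s, best_w = s, w
            new_policy[(q, a)] = best_s   # policy improvement
            new_nu[(q, a)] = best_w       # policy evaluation
        ref = new_nu[states[0]]           # normalization (sketch assumption)
        nu = {st: v - ref for st, v in new_nu.items()}
        if new_policy == policy:          # stopping rule of Algorithm 2
            break
        policy = new_policy
    return policy


# Example: uniform arrivals (zeta_2, 0), powers in units of 1e-12 W, hypothetical mu.
gamma = [[0.25] * 4 for _ in range(4)]
policy = value_iteration(gamma, [0.0, 9.0, 18.2, 59.5], mu=0.05, alpha=1.5, Q=7)
for a in range(4):
    print(a, [policy[(q, a)] for q in range(a, 8)])
```

Printing the rates row by row for each arrival rate makes it easy to inspect whether the resulting policy is nondecreasing in the queue length, as the threshold-based structure requires.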

To this end, we only need to show that the transmission rate $s^{(m+1)}(q+1,a)$ is equal to $s^*$ or $s^*+1$ when $s^{(m+1)}(q,a)$ is equal to $s^*$. For a given arrival rate a, the transmission rate $s^{(m+1)}(q,a)$ under deterministic policy $F^{(m+1)}$ is then monotonically increasing in the queue length q. As a result, the thresholds $q_{F^{(m+1)}}(s,a)$ exist, and policy $F^{(m+1)}$ satisfies Eq. (23). With $s^*$ given as $\arg\min_{s\in S(q)}\{\omega^{(m+1)}(q,a,s)\}$, a sufficient condition for the existence of the thresholds is

$$\omega^{(m+1)}(q+1,a,s^*) \le \omega^{(m+1)}(q+1,a,s^*-\delta) \quad (33)$$

$$\omega^{(m+1)}(q+1,a,s^*+1) \le \omega^{(m+1)}(q+1,a,s^*+1+\delta), \quad (34)$$

where $\delta \ge 0$. Since $s^*$ minimizes $\omega^{(m+1)}(q,a,s)$ over $s \in S(q)$, Eqs. (33) and (34) hold if

$$\omega^{(m+1)}(q,a,s^*-\delta) + \omega^{(m+1)}(q+1,a,s^*) \le \omega^{(m+1)}(q,a,s^*) + \omega^{(m+1)}(q+1,a,s^*-\delta), \quad (35)$$

$$\omega^{(m+1)}(q,a,s^*+\delta) + \omega^{(m+1)}(q+1,a,s^*+1) \le \omega^{(m+1)}(q,a,s^*) + \omega^{(m+1)}(q+1,a,s^*+1+\delta), \quad (36)$$

respectively. According to Eq. (31), we can expand every component of the two inequalities. As a result, the two inequalities follow immediately from the convexity of $P(s)$ and $\nu^{(m)}(q,a)$.

We next show the convexity of $\nu^{(m+1)}(q,a)$ based on the threshold-based structure of $F^{(m+1)}$. In particular, from the definition of $\nu^{(m+1)}(q,a)$, its convexity is given by

$$\omega^{(m+1)}(q+1,a,\bar{s}) + \omega^{(m+1)}(q-1,a,\underline{s}) \ge 2\omega^{(m+1)}(q,a,s^*), \quad (37)$$

where we have $\bar{s} = \arg\min_{s\in S(q+1)}\{\omega^{(m+1)}(q+1,a,s)\}$ and $\underline{s} = \arg\min_{s\in S(q-1)}\{\omega^{(m+1)}(q-1,a,s)\}$. Further, $\bar{s}$ and $s^*$ are selected from sets $\{s^*, s^*+1\}$ and $\{\underline{s}, \underline{s}+1\}$, respectively. We first present a sufficient condition for Eq. (37) as

$$\omega^{(m+1)}(q+1,a,\bar{s}) + \omega^{(m+1)}(q-1,a,\underline{s}) \ge \omega^{(m+1)}(q,a,s^*) + \omega^{(m+1)}(q,a,s'), \quad (38)$$

where $s'$ is an arbitrary transmission rate belonging to set $S(q)$, and the sufficiency is guaranteed by $\omega^{(m+1)}(q,a,s^*) = \min_{s\in S(q)}\{\omega^{(m+1)}(q,a,s)\}$. Then, we show the sufficient condition by considering two cases, where $s^*$ is given as $\underline{s}$ or $\underline{s}+1$, respectively. When $s^* = \underline{s}$, we set $s' = \bar{s}$. By expanding every component in Eq. (38), we verify the sufficient condition based on the convexity of $\nu^{(m)}(q,a)$. When $s^* = \underline{s}+1$, we set $s' = \bar{s}-1$, under which the sufficient condition holds based on the convexity of $P(s)$. Since the initial $\nu^{(0)}(q,a)$ is convex in q, we have that deterministic policy $F^{(m)}$ satisfies the threshold-based structure expressed in Eq. (23).

We finally supplement the proof for the degenerate case, in which one vertex may lie on the line segment generated by the two vertices that are adjacent to it. As a result, multiple points on the segment can minimize the Lagrangian relaxation problem. In other words, we may not obtain the optimal policy for this vertex by Algorithm 2. In this case, we obtain the optimal policy based on the sensitivity analysis of the equivalent LP problem. With a slight perturbation of P(s), the degenerate case is removed in the derived Lagrangian problem under the new P(s), while the corresponding optimal policy remains unchanged. Therefore, we can show that the optimal policy is threshold-based by the same argument as above.

APPENDIX D

PROOF OF THEOREM 9

Our proof starts with the observation that the optimal threshold-based policy can be presented as a convex combination of two adjacent deterministic threshold-based policies that correspond to two adjacent vertices on the optimal tradeoff curve L. As a result, we only need to show Eq. (27) in Theorem 9 for the vertices of L. For each vertex of L, we can also obtain the optimal policy for the system over a fading channel by using value iteration, as shown in the proof of Theorem 5, through which we further show Eq. (27) under the condition in Eq. (28). In particular, a sufficient condition for Eq. (27) is given as

$$\omega^{(m+1)}(q,a,\iota^+,s^*) + \omega^{(m+1)}(q,a,\iota^-,s^*+\delta) \le \omega^{(m+1)}(q,a,\iota^+,s^*+\delta) + \omega^{(m+1)}(q,a,\iota^-,s^*), \quad (39)$$

where we denote the value function of the generalized system by $\omega^{(m+1)}(q,a,\iota,s)$, and define $s^* = \arg\min_{s\in S(q)}\{\omega^{(m+1)}(q,a,\iota,s)\}$. Further, by expanding each component, we immediately show the sufficient condition under the condition in Eq. (28). As a result, we have a greater transmission rate under the channel state $h_{\iota^+}$ than under $h_{\iota^-}$. With the threshold-based structure of the optimal policy, we finally show the order relation in Eq. (27), which completes the proof.

REFERENCES

[1] A. Osseiran, F. Boccardi, V. Braun, K. Kusume, P. Marsch, M. Maternia, O. Queseth, M. Schellmann, H. Schotten, H. Taoka, H. Tullberg, M. A. Uusitalo, B. Timus, and M. Fallgren, “Scenarios for 5G mobile and wireless communications: The vision of the METIS project,” IEEE Communications Magazine, vol. 52, no. 5, pp. 26–35, May 2014.

[2] M. Simsek, A. Aijaz, M. Dohler, J. Sachs, and G. Fettweis, “5G-enabled tactile internet,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 3, pp. 460–473, March 2016.

[3] S. Buzzi, C. I, T. E. Klein, H. V. Poor, C. Yang, and A. Zappone, “A survey of energy-efficient techniques for 5G networks and challenges ahead,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 4, pp. 697–709, April 2016.

[4] R. Q. Hu and Y. Qian, “An energy efficient and spectrum efficient wireless heterogeneous network framework for 5G systems,” IEEE Communications Magazine, vol. 52, no. 5, pp. 94–101, May 2014.

[5] C. She, C. Yang, and T. Q. S. Quek, “Radio resource management for ultra-reliable and low-latency communications,” IEEE Communications Magazine, vol. 55, no. 6, pp. 72–78, June 2017.

[6] Q. Liu, S. Zhou, and G. B. Giannakis, “Cross-Layer combining of adaptive Modulation and coding with truncated ARQ over wireless links,” IEEE Transactions on Wireless Communications, vol. 3, no. 5, pp. 1746–1755, Sep. 2004.

[7] D. V. Djonin and V. Krishnamurthy, “MIMO transmission control in fading channels-a constrained Markov decision process formulation with monotone randomized policies,” IEEE Transactions on Signal Processing, vol. 55, no. 10, pp. 5069–5083, 2007.

[8] A. E. Gamal, C. Nair, B. Prabhakar, E. Uysal-Biyikoglu, and S. Zahedi, “Energy-efficient scheduling of packet transmissions over wireless networks,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), June 2002, pp. 1773–1782.

[9] U. C. Kozat, I. Koutsopoulos, and L. Tassiulas, “A framework for cross-layer design of energy-efficient communication with QoS provisioning in multi-hop wireless networks,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), March 2004, pp. 1446–1456.

[10] Z. Hou, C. She, Y. Li, T. Q. S. Quek, and B. Vucetic, “Burstiness aware bandwidth reservation for ultra-reliable and low-latency communications (URLLC) in tactile internet,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 11, pp. 2401–2410, Nov. 2018.

[11] J. Hu, L. Yang, and L. Hanzo, “Energy-efficient cross-layer design of wireless mesh networks for content sharing in online social networks,” IEEE Transactions on Vehicular Technology, vol. 66, no. 9, pp. 8495–8509, Sep. 2017.

[12] B. Collins and R. L. Cruz, “Transmission policies for time varying channels with average delay constraints,” in Proc. Allerton Conference on Communication, Control, and Computing (Allerton), 1999, pp. 709–717.

[13] R. A. Berry and R. G. Gallager, “Communication over fading channels with delay constraints,” IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1135–1149, 2002.

[14] R. Berry, “Optimal power-delay tradeoffs in fading channels–small-delay asymptotics,” IEEE Transactions on Information Theory, vol. 59, no. 6, pp. 3939–3952, June 2013.

[15] D. Rajan, A. Sabharwal, and B. Aazhang, “Delay-bounded packet scheduling of bursty traffic over wireless channels,” IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 125–144, 2004.

[16] W. Chen, Z. Cao, and K. B. Letaief, “Optimal delay-power tradeoff in wireless transmission with fixed modulation,” in Proc. IEEE International Workshop on Cross Layer Design (IWCLD), 2007, pp. 60–64.

[17] M. Goyal, A. Kumar, and V. Sharma, “Power constrained and delay optimal policies for scheduling transmission over a fading channel,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), 2003, pp. 311–320.

[18] B. Ata, “Dynamic power control in a wireless static channel subject to a quality-of-service constraint,” Operations Research, vol. 53, no. 5, pp. 842–851, 2005.

[19] M. H. Ngo and V. Krishnamurthy, “Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ,” IEEE Transactions on Signal Processing, vol. 58, no. 1, pp. 438–451, 2010.

[20] L. Liu, A. Chattopadhyay, and U. Mitra, “On solving MDPs with large state space: Exploitation of policy structures and spectral properties,” IEEE Transactions on Communications, Early Access, 2019.

[21] N. Sharma, N. Mastronarde, and J. Chakareski, “Accelerated structure-aware reinforcement learning for delay-sensitive energy harvesting wireless sensors,” CoRR, vol. abs/1807.08315, 2018. [Online]. Available: http://arxiv.org/abs/1807.08315

[22] X. Chen, W. Chen, J. Lee, and N. B. Shroff, “Delay-optimal buffer-aware scheduling with adaptive transmission,” IEEE Transactions on Communications, vol. 65, no. 7, pp. 2917–2930, July 2017.

[23] M. Wang, J. Liu, W. Chen, and A. Ephremides, “Joint queue-aware and channel-aware delay optimal scheduling of arbitrarily bursty traffic over multi-state time-varying channels,” IEEE Transactions on Communications, vol. 67, no. 1, pp. 503–517, Jan 2019.

[24] J. Liu, W. Chen, and K. B. Letaief, “Delay optimal scheduling for ARQ-aided power-constrained packet transmission over multi-state fading channels,” IEEE Transactions on Wireless Communications, vol. 16, no. 11, pp. 7123–7137, Nov. 2017.

[25] X. Chen, W. Chen, J. Lee, and N. B. Shroff, “Delay-optimal probabilistic scheduling in green communications with arbitrary arrival and adaptive transmission,” in Proc. IEEE International Conference on Communications (ICC), May 2017, pp. 1–6.

[26] V. Paxson and S. Floyd, “Wide area traffic: The failure of Poisson modeling,” IEEE/ACM Transactions on Networking, vol. 3, no. 3, pp. 226–244, June 1995.

[27] M. L. Puterman, Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, 2014.

[28] E. P. Kao, An introduction to stochastic processes. Cengage Learning, 1997.

[29] E. Altman, Constrained Markov decision processes. CRC Press, 1999.