Delay-Optimal and Energy-Efficient
Communications with Markovian Arrivals
Xiaoyu Zhao, Wei Chen, Senior Member, IEEE, Joohyun Lee, Member, IEEE,
and Ness B. Shroff, Fellow, IEEE
Abstract
In this paper, delay-optimal and energy-efficient communication is studied for a single link under
Markov random arrivals. We present the optimal tradeoff between delay and power over Additive White
Gaussian Noise (AWGN) channels and extend the optimal tradeoff to block fading channels. Under
time-correlated traffic arrivals, we develop a cross-layer solution that jointly considers the arrival rate, the
queue length, and the channel state in order to minimize the average delay subject to a power constraint.
For this purpose, we formulate the average delay and power problem as a Constrained Markov Decision
Process (CMDP). Based on steady-state analysis for the CMDP, a Linear Programming (LP) problem
is formulated to obtain the optimal delay-power tradeoff. We further show the optimal transmission
strategy using a Lagrangian relaxation technique. Specifically, the optimal adaptive transmission is
shown to have a threshold-type structure, where thresholds on the queue length are given for
different transmission rates under the given arrival rates and channel states. By exploiting this result,
we develop a threshold-based algorithm to efficiently obtain the optimal delay-power tradeoff. We show
how a trajectory-sampling version of the proposed algorithm can be developed without prior knowledge of
the arrival statistics.
Index Terms
Cross-layer design, Markovian Arrivals, Queuing, Markov Decision Process, Energy efficiency,
Average delay, Delay-power tradeoff, Linear programming.
X. Zhao and W. Chen are with the Department of Electronic Engineering and Beijing National Research Center for Information Science and Technology, Tsinghua University. E-mail: [email protected], [email protected].
J. Lee is with the Division of Electrical Engineering, Hanyang University. E-mail: [email protected]
N. B. Shroff holds a joint appointment in both the Department of ECE and the Department of CSE at The Ohio State University. E-mail: [email protected]
This research was supported in part by the National Natural Science Foundation of China under Grant No. 61671269, the Beijing Natural Science Foundation under Grant No. 4191001, and the National Program for Special Support for Eminent Professionals of China (10,000-Talent Program).
September 2, 2019 DRAFT
arXiv:1908.11797v1 [cs.IT] 30 Aug 2019
I. INTRODUCTION
There is increasing interest in developing strategies to achieve low-latency transmissions in
a wide variety of applications, e.g., in mission-critical applications for the Internet of Things
(IoT), or Ultra Reliable and Low Latency Communications (URLLC) in Fifth-Generation (5G)
systems [1, 2]. At the same time, there is also a push towards developing strategies to make
devices and networks more energy efficient [3, 4]. Thus, in our work, we will aim to understand
the fundamental tradeoff between delay and energy. More specifically, we will develop a cross-
layer solution that minimizes the delay for a given power constraint.
Cross-layer design has been used as a potential enabler to satisfy the requirements of low
latency [5]. In [6], a tradeoff between delay and throughput was established based on a cross-
layer design that combines adaptive modulation and coding with a truncated Automatic Repeat
reQuest (ARQ). In [7], the authors proposed a cross-layer power and rate allocation control to
minimize power consumption under a delay constraint in Multiple-Input Multiple-Output (MIMO) systems.
Moreover, energy-efficient cross-layer designs have also been studied for packet transmission in wireless
networks. In [8], a cross-layer online algorithm was proposed to achieve more energy-efficient
transmission over wireless networks. For multi-hop wireless networks, a cross-layer framework
that jointly considers power control and scheduling was presented in [9]. Given the stringent
requirements of 5G, cross-layer designs have been studied to achieve low-latency and
energy-efficient transmission in multiple scenarios, such as the tactile Internet [10] and wireless
mesh networks [11].
In this work, we take a cross-layer design approach to analytically establish the power-delay
tradeoff. To jointly optimize the delay and power, the design problem can be formulated using
a Markov Decision Process (MDP). In [12], Collins and Cruz considered cross-layer scheduling
of an adaptive transmitter over a two-state fading channel. In their work, the authors established
a tradeoff between the average delay and power consumption based on Dynamic Programming
(DP), where the single objective of the MDP is the weighted sum of the average
power and delay. Follow-up papers [13–15] extended this study in various directions using the
DP formulation in [12]. In [13], Berry and Gallager formulated the optimal delay-power
tradeoff curve for a multi-state block fading channel, considering both fixed-length and
variable-length coding. Using the DP formulation, the authors characterized
all the Pareto-optimal power-delay operating points and studied the optimal tradeoff in the
regime of asymptotically large delays. For the regime of asymptotically small delays, Berry
further characterized the behavior of the optimal delay-power tradeoff in [14]. Moreover, a
single-parameter scheduler, termed the log-linear scheduler, was proposed for a block fading channel in
[15] with near-optimal performance. In our previous work [16], the optimal delay-power tradeoff
was attained by formulating a Constrained MDP (CMDP). Using a probabilistic scheduling framework,
we converted the CMDP into an LP problem. By solving the derived LP
problem, we obtain any power-delay operating point on the optimal tradeoff curve.
We further focus on the structural properties of the optimal transmission policies in the cross-
layer design. By exploiting structural properties of the optimal policy, a substantial reduction
in computational complexity can be obtained for finding the optimal delay-power tradeoff.
For example, in [17], the structure of the optimal policy was investigated for an adaptive
transmitter over a fading channel with interference. The authors of [18] further developed an
explicit formula for the optimal transmission rate, through which the optimal rate of a single
link over a static channel is expressed as an increasing function of the queue length. In [19], the
optimal scheduling was presented for a correlated fading channel with the ARQ protocol employed.
The monotonicity of the optimal scheduling was also shown by presenting the optimal rate as
an increasing function of the buffer occupancy. Moreover, by using the policy structures, the
complexity of point-to-point network transmission control in [20] was effectively reduced with
tools from graph signal processing for large state spaces. In [21], based on the
structural properties, a novel accelerated reinforcement learning (RL) algorithm was formulated
for an energy-harvesting wireless sensor with latency-sensitive data. Based on the LP problem
formulated in our previous work [16], we also showed a threshold-based structure for the
optimal transmission policies. For the optimal threshold-based policy, we further showed
that the transmission rates are selected deterministically for all queue
lengths except a particular threshold. The work on optimal threshold-based policies was
also extended to communication systems with adaptive transmission [22], random arrivals with
arbitrary burstiness [23], and multi-state fading channels [24].
In this work, we generalize our previous work in [25] to show delay optimality with Markov
arrivals. Our generalization is motivated by the work of [26], where network arrivals are shown
to exhibit time-correlations. By modeling the user’s arrival as a Markov chain, we first present a
cross-layer design to determine the transmission rate. In particular, we determine the transmission
rates through their probability distribution, which is conditioned on the current queue length,
arrival rate, and channel state. Using a degenerate probability distribution, deterministic rate
selection is included as a special case of the probabilistic transmission policies. Under
the probabilistic cross-layer design, we then formulate the adaptive transmission as a CMDP.
In this way, we next show delay optimality for AWGN channels, where the impact of the
Markovian arrivals on the optimal delay-power tradeoff is presented. Furthermore, the optimal
tradeoff between the delay and power consumption is extended to block fading channels.
For AWGN channels, we first convert the formulated CMDP into an equivalent LP problem.
By this means, we construct the optimal delay-power tradeoff that minimizes the average delay
under an average power constraint. We further show the optimal tradeoff by using a curve
that consists of all the optimal power-delay pairs for different power constraints. We refer to this
curve as the optimal delay-power tradeoff curve, and show that under Markovian arrivals it retains
its typical geometric properties, i.e., it is piecewise linear, decreasing, and convex.
By jointly exploiting the properties of both the optimal tradeoff curve and the corresponding
optimal policies, we then show that the optimal average delay is achieved by a threshold-based
optimal adaptive transmission policy. Based on the threshold-based structure, we finally
develop an algorithm to efficiently determine the optimal transmission strategies, through which
the optimal delay-power tradeoff is obtained. In practice, we show that an online version of the
threshold-based algorithm can also be used without prior knowledge of the random arrival statistics.
Moreover, we extend the optimal delay-power tradeoff to block fading channels.
For a block fading channel, we likewise derive an equivalent LP problem for the adaptive
transmitter from the formulated CMDP. We then obtain a similar threshold-based structure
on the queue length for fading channels. As a result, given the current arrival
rate and channel state, the corresponding transmission rate is determined by
comparing the current queue length with the thresholds for different transmission rates.
The rest of this paper is organized as follows. In Section II, the system model is presented
as a CMDP. By formulating the CMDP as an LP problem, Section III investigates the optimal
delay-power tradeoff over AWGN channels. Then, the corresponding optimal transmission policy
is presented in Section IV under the threshold-based structure. In Section V, we further extend
the optimal delay-power tradeoff over a block fading channel. Finally, numerical results and
conclusions are given in Sections VI and VII, respectively.
[Fig. 1 (diagram); labeled blocks: temporally dependent random arrival, transmitter (Tx) with cross-layer scheduling and channel encoder, channel state information, additive Gaussian noise, and adaptive transmission, with an inset of transmission power versus transmission rate (0 to 6).]
Fig. 1. System Model
II. SYSTEM MODEL
In this paper, we focus on a single link of an adaptive transmitter that serves traffic arriving
according to a general Markovian process. As shown in Fig. 1, the system is assumed to be
time-slotted. The data packets arrive at the beginning of each timeslot according to a stationary
and ergodic Markov chain with a finite state space. The state of the Markov chain corresponds to the
number of packets that arrive in timeslot n and is denoted by a[n], whose maximum value is A,
i.e., a[n] ∈ {0, 1, · · · , A}. Given that a[n] packets arrive in timeslot n, a[n+1] is characterized
by the transition probability
$$\gamma_{a,a'} = \Pr\{a[n+1] = a' \mid a[n] = a\}, \tag{1}$$
where a, a′ ∈ {0, 1, · · · , A}, $\gamma_{a,a'} \ge 0$, and $\sum_{a'=0}^{A} \gamma_{a,a'} = 1$. Given the transition
probabilities $\gamma_{a,a'}$, the expected number of arrivals in a timeslot, α, is
$$\alpha = \sum_{a=0}^{A} a\,\phi_a, \tag{2}$$
where $\phi_a$ denotes the steady-state probability of a arrivals in a timeslot, i.e., the stationary
distribution of the arrival chain, which satisfies $\phi_{a'} = \sum_{a=0}^{A} \phi_a \gamma_{a,a'}$ for all a′ and $\sum_{a=0}^{A} \phi_a = 1$.
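As a minimal sketch (not part of the paper), the steady-state probabilities $\phi_a$ and the mean arrival rate α in Eq. (2) can be computed from the transition matrix $\gamma_{a,a'}$ by fixed-point iteration. The function names, the stopping tolerance, and the two-state example chain below are illustrative assumptions.

```python
def stationary_arrivals(gamma, tol=1e-13, max_iter=100_000):
    """Stationary distribution phi of the arrival chain.

    gamma[a][a2] = Pr{a[n+1] = a2 | a[n] = a}; each row must sum to 1.
    The lazy update phi <- (phi + phi @ gamma) / 2 has the same fixed point
    as phi = phi @ gamma and converges for any irreducible finite chain.
    """
    n = len(gamma)
    phi = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(phi[a] * gamma[a][a2] for a in range(n)) for a2 in range(n)]
        new = [0.5 * (p + s) for p, s in zip(phi, step)]
        if max(abs(x - y) for x, y in zip(new, phi)) < tol:
            return new
        phi = new
    return phi


def mean_arrival_rate(gamma):
    """alpha in Eq. (2): expected number of arrivals per timeslot."""
    return sum(a * p for a, p in enumerate(stationary_arrivals(gamma)))


# Hypothetical two-state chain (A = 1): bursty "on" periods of one packet per slot.
gamma = [[0.9, 0.1],
         [0.2, 0.8]]
print(stationary_arrivals(gamma))  # approximately [2/3, 1/3]
print(mean_arrival_rate(gamma))    # approximately 1/3
```

For this chain, $\phi_0 = 0.9\phi_0 + 0.2\phi_1$ gives $\phi_0 = 2\phi_1$, so α = 1/3 even though arrivals come in correlated bursts.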
Arriving packets enter a buffer of size Q. At each time n, the queue length q[n] belongs to the
set {0, 1, · · · , Q} and evolves as
$$q[n+1] = \min\{\max\{q[n] - s[n], 0\} + a[n+1], Q\}, \tag{3}$$
where s[n] denotes the number of packets that are transmitted in timeslot n.
Due to the limited throughput at the transmitter, the number of packets that can be transmitted
in each timeslot is upper bounded by S, i.e., the transmission rate s[n] belongs to the set
{0, 1, · · · , S}. We assume that the maximum transmission rate is greater than or equal to
the maximum number of arrivals, i.e., S ≥ A. This guarantees the stability of the queue
under an arbitrary Markov arrival process, where the average arrival rate can range from
0 to A under different arrival processes. Further, to avoid buffer underflow, as well as overflow when up to A packets
arrive in the next timeslot, s[n] needs to satisfy $0 \le q[n] - s[n] \le Q - A$. In other words, for each given queue length q,
we have $q - Q + A \le s \le q$. Therefore, for a given queue length q, we define the feasible
region of the transmission rate as $\mathcal{S}(q) = \{s \mid \max\{q - Q + A, 0\} \le s \le \min\{q, S\}\}$.¹
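The feasible region and the queue update in Eq. (3) can be sketched directly; the helper names and the sizes Q = 5, A = 2, S = 3 are illustrative assumptions. The final check shows numerically that, for any s in $\mathcal{S}(q)$ and any next arrival of at most A packets, the truncation at Q in Eq. (3) never binds.

```python
def feasible_rates(q, Q, A, S):
    """Feasible region S(q) = {s | max{q - Q + A, 0} <= s <= min{q, S}}."""
    return list(range(max(q - Q + A, 0), min(q, S) + 1))


def next_queue(q, s, a_next, Q):
    """Queue update of Eq. (3): q[n+1] = min{max{q[n] - s[n], 0} + a[n+1], Q}."""
    return min(max(q - s, 0) + a_next, Q)


# Illustrative sizes (assumed): Q = 5, A = 2, S = 3, so S >= A holds.
Q, A, S = 5, 2, 3
for q in range(Q + 1):
    print(q, feasible_rates(q, Q, A, S))

# With s in S(q), the truncation at Q in Eq. (3) never binds: q - s + a <= Q.
assert all(next_queue(q, s, a, Q) == q - s + a
           for q in range(Q + 1)
           for s in feasible_rates(q, Q, A, S)
           for a in range(A + 1))
```

For instance, with a full buffer (q = 5) the transmitter must send at least Q − (Q − A) = 2 packets, so $\mathcal{S}(5) = \{2, 3\}$.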
To transmit s[n] packets in timeslot n, the adaptive transmitter determines the corresponding power consumption
using the available Channel State Information (CSI). In particular, the channel state h[n] in timeslot n is
represented by the current coefficient of the fading channel, so that h[n] belongs to the field of complex
numbers ℂ. Given the channel state h[n] = h ∈ ℂ, we express the power consumption for each
transmission rate s by the function $P_h(s)$, where $P_h(0) = 0$ for each h. In typical communication
scenarios, a higher transmission rate requires more power, while the power efficiency degrades
as the transmission rate increases [8]. Therefore, we focus on a function $P_h(s)$ that is monotonically
increasing and convex in s for each given h. With $P_h(s)$ given for channel state h[n], the power
consumption in timeslot n is $\rho[n] = P_{h[n]}(s[n])$.
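To make the assumptions on $P_h(s)$ concrete, the sketch below uses a hypothetical Shannon-type power function $P_h(s) = N_0(2^s - 1)/|h|^2$ (assuming one packet corresponds to one bit per channel use and noise power $N_0$). This form is an illustrative assumption, not the paper's model; the second helper checks the three properties stated above on {0, …, S}.

```python
def power_shannon_like(s, h=1.0 + 0.0j, noise=1.0):
    """Hypothetical P_h(s) = noise * (2**s - 1) / |h|^2 (illustrative assumption)."""
    return noise * (2 ** s - 1) / abs(h) ** 2


def is_valid_power_function(p, S):
    """Check P(0) = 0, strictly increasing, and discretely convex on {0, ..., S}."""
    v = [p(s) for s in range(S + 1)]
    increasing = all(v[s + 1] > v[s] for s in range(S))
    convex = all(v[s + 1] - 2 * v[s] + v[s - 1] >= 0 for s in range(1, S))
    return v[0] == 0 and increasing and convex


print(is_valid_power_function(power_shannon_like, 6))     # True
print(is_valid_power_function(lambda s: s ** 0.5, 6))     # False: concave in s
```

Convexity with $P_h(0) = 0$ implies that the energy per packet, $P_h(s)/s$, grows with s, which is why sending at lower rates saves energy at the cost of delay.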
We further adopt an L-state block fading channel model, in which the channel coefficient
stays invariant during each timeslot and is quantized into L states, i.e., $h_1, h_2, \cdots, h_L$.
In this way, the channel state h[n] belongs to the set $\{h_1, \cdots, h_L\}$, so that only the power
functions $P_{h_\iota}(s)$, $\iota = 1, \cdots, L$, need to be considered for the block fading channel.
More specifically, the channel states satisfy $0 < |h_1| < |h_2| < \cdots < |h_L| < +\infty$.
In other words, a channel state $h_\iota$ with a greater index ι corresponds to a better channel condition,
$1 \le \iota \le L$. Moreover, we consider that the channel state h[n] follows an independent and
identically distributed (i.i.d.) process across timeslots. As a result, we define the probability
that the channel state h[n] in each timeslot n is equal to $h_\iota$ as
$$\Pr\{h[n] = h_\iota\} = \eta_\iota, \tag{4}$$
where $\sum_{\iota=1}^{L} \eta_\iota = 1$.
Under the cross-layer adaptive transmission policy, the transmission rate s[n] is determined by
the current queue length q[n], the arrival rate a[n], and the channel state h[n]. With q[n],
a[n], and h[n] given as q, a, and $h_\iota$, respectively, we define the probability $f^s_{q,a,\iota}$ that the
transmission rate s[n] is equal to s as
$$f^s_{q,a,\iota} = \Pr\{s[n] = s \mid q[n] = q, a[n] = a, h[n] = h_\iota\}, \tag{5}$$
where $\sum_{s=0}^{S} f^s_{q,a,\iota} = 1$ and $f^s_{q,a,\iota} = 0$ for each $s \notin \mathcal{S}(q)$. Based on the probabilities
$f^s_{q,a,\iota}$, the cross-layer adaptive transmission policy F is expressed as $\{f^s_{q,a,\iota} : 0 \le q \le Q, 0 \le a \le A, 1 \le \iota \le L, 0 \le s \le S\}$.
Deterministic transmission policies correspond to a degenerate probability distribution on the
transmission rate for each given queue length, arrival rate, and channel state. A deterministic
policy $F_D$ is thus equivalently expressed as $\{s_{F_D}(q,a,\iota) : 0 \le q \le Q, 0 \le a \le A, 1 \le \iota \le L\}$,
where $s_{F_D}(q,a,\iota) = \sum_{s \in \mathcal{S}(q)} s\, f^s_{q,a,\iota}$.
¹ To avoid underflow and overflow, we also need $S \ge A$, which follows directly from requiring the feasible region $\mathcal{S}(q)$ to be nonempty for q = Q.
The set of deterministic policies $\mathcal{F}_D$ satisfies $\mathcal{F}_D \subsetneq \mathcal{F}$, where $\mathcal{F}$ is the set of all policies. For
temporally correlated random arrivals, the probabilistic strategy in Eq. (5) determines the probabilities
of the transmission rates given the current queue length and channel state, together with the
current arrival rate a[n], which carries the historical information of the Markovian arrival process.
By using the probabilistic transmission policies, we present a Markov Decision Process
(MDP), where we express the system state as the triple (q[n], a[n], h[n]). With the system state
(q[n], a[n], h[n]) at timeslot n given as $(q, a, h_\iota)$, each adaptive transmission policy $F \in \mathcal{F}$
determines the transmission rate s[n] according to the probability distribution $\{f^s_{q,a,\iota} : 0 \le s \le S\}$.
Given the transmission rate s[n], the system state (q[n+1], a[n+1], h[n+1]) in timeslot (n+1) is
then determined by the Markov arrival process and the channel fading process. In particular,
the transition probability to the next timeslot is
$$\Pr\{q[n+1]=q',\, a[n+1]=a',\, h[n+1]=h_{\iota'} \mid q[n]=q,\, a[n]=a,\, h[n]=h_\iota,\, s[n]=s\} = \gamma_{a,a'}\,\eta_{\iota'}\,\mathbb{1}\{s = q + a' - q'\}, \tag{6}$$
where $q' \in \{0, 1, \cdots, Q\}$, $a' \in \{0, 1, \cdots, A\}$, and $\iota' \in \{1, \cdots, L\}$. The truncation at Q in Eq. (3)
does not appear here because $s \in \mathcal{S}(q)$ guarantees $q - s + a' \le (Q - A) + A = Q$. Starting from the initial
queue length $q_0 = q[0]$, arrival rate $a_0 = a[0]$, and channel state $h_{\iota_0} = h[0]$, the MDP then evolves according to Eq. (6).
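As a sanity check of Eq. (6), the sketch below verifies numerically that the kernel sums to 1 over all next states (q′, a′, ι′) for every feasible rate $s \in \mathcal{S}(q)$. Channel states are indexed 0, …, L−1 in code (the paper uses 1, …, L); the helper names, γ, η, Q, and S in the example are illustrative assumptions.

```python
def kernel(q, a, iota, s, q2, a2, iota2, gamma, eta):
    """Eq. (6): Pr{(q2, a2, iota2) | (q, a, iota), s}, channel indices 0..L-1.

    The current channel index iota does not appear on the right-hand side
    because the channel is i.i.d. across timeslots (Eq. 4).
    """
    return gamma[a][a2] * eta[iota2] * (1.0 if s == q + a2 - q2 else 0.0)


def max_kernel_error(gamma, eta, Q, S):
    """Largest deviation from 1 of the row sums of Eq. (6) over feasible (q, a, iota, s)."""
    A, L = len(gamma) - 1, len(eta)
    worst = 0.0
    for q in range(Q + 1):
        for a in range(A + 1):
            for iota in range(L):
                for s in range(max(q - Q + A, 0), min(q, S) + 1):   # s in S(q)
                    total = sum(kernel(q, a, iota, s, q2, a2, i2, gamma, eta)
                                for q2 in range(Q + 1)
                                for a2 in range(A + 1)
                                for i2 in range(L))
                    worst = max(worst, abs(total - 1.0))
    return worst


gamma = [[0.9, 0.1], [0.2, 0.8]]   # illustrative Markov arrivals, A = 1
eta = [0.3, 0.7]                   # illustrative i.i.d. channel, L = 2
print(max_kernel_error(gamma, eta, Q=4, S=2))   # only floating-point round-off
```

For each a′ exactly one q′ = q − s + a′ satisfies the indicator, and it lies in {0, …, Q} precisely because s ∈ $\mathcal{S}(q)$.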
With the formulated MDP, the long-term average power consumption and delay are formulated
based on the power consumption $\rho[n] = P_{h[n]}(s[n])$ and the queue length q[n] in each timeslot,
respectively. First, the average power consumption $P_F$ is
$$P_F = \lim_{N\to\infty} \frac{1}{N}\, \mathbb{E}^F_{q_0,a_0,h_{\iota_0}}\Big\{\sum_{n=1}^{N} \rho[n]\Big\}, \tag{7}$$
where $\mathbb{E}^F_{q_0,a_0,h_{\iota_0}}\{\cdot\}$ is the expectation with respect to policy F and the initial system state
$(q_0, a_0, h_{\iota_0})$. By Little's Law, the average delay $D_F$ is
$$D_F = \lim_{N\to\infty} \frac{1}{N}\, \mathbb{E}^F_{q_0,a_0,h_{\iota_0}}\Big\{\frac{1}{\alpha}\sum_{n=1}^{N} q[n]\Big\}, \tag{8}$$
where, recall, α is the expected number of packets that arrive in each timeslot.
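Eqs. (7) and (8) can be approximated by finite-horizon time averages along one simulated trajectory. The estimator below is a minimal Monte Carlo sketch, not the paper's method; the function signature, the initial state $q_0 = a_0 = 0$, and the example (a hypothetical 3-state arrival chain with α = 1, an AWGN channel with L = 1, Q = 4, S = 2, and P(s) = 2^s − 1) are assumptions for illustration.

```python
import random


def simulate(gamma, eta, Q, S, policy, power, alpha, n_slots=100_000, seed=0):
    """Time-average estimates of P_F (Eq. 7) and D_F (Eq. 8).

    gamma[a][a2]: arrival transitions (Eq. 1); eta[i]: i.i.d. channel law (Eq. 4),
    channel indices 0..L-1. policy(q, a, i) -> s must lie in S(q);
    power(s, i) plays the role of P_{h_i}(s).
    """
    rng = random.Random(seed)
    A, L = len(gamma) - 1, len(eta)
    q, a = 0, 0                                              # initial state q0 = a0 = 0
    i = rng.choices(range(L), weights=eta)[0]
    total_power = total_queue = 0.0
    for _ in range(n_slots):
        s = policy(q, a, i)
        if not max(q - Q + A, 0) <= s <= min(q, S):
            raise ValueError(f"rate {s} is outside S({q})")
        total_power += power(s, i)                           # rho[n]
        total_queue += q
        a = rng.choices(range(A + 1), weights=gamma[a])[0]   # a[n+1] given a[n]
        q = min(max(q - s, 0) + a, Q)                        # Eq. (3)
        i = rng.choices(range(L), weights=eta)[0]            # h[n+1], Eq. (4)
    return total_power / n_slots, total_queue / (alpha * n_slots)


# Illustrative AWGN example (L = 1): 3-state arrivals with alpha = 1, Q = 4, S = 2.
gamma = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
P_est, D_est = simulate(gamma, [1.0], Q=4, S=2,
                        policy=lambda q, a, i: min(q, 2),    # send the whole queue
                        power=lambda s, i: 2 ** s - 1, alpha=1.0)
```

Under this "send everything" policy, q[n] = a[n] in every slot, so the exact values are $D_F = 1$ and $P_F = \sum_a \phi_a P(a) = 4/3$; the estimates should be close to these.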
Based on the average power consumption and delay in Eqs. (7) and (8), we can formulate the
optimal delay-power tradeoff under Markov random arrivals. Intuitively, a higher transmission
rate can reduce the packets' delay but degrades the power efficiency, because $P_h(s)$ is convex
in s for each channel state h. For a lower transmission rate, the reverse holds, i.e., we
have a greater power efficiency but also a larger transmission delay. Therefore, a tradeoff exists
between the delay and power consumption. To obtain the optimal tradeoff, we formulate a
cross-layer optimization problem as a Constrained Markov Decision Process (CMDP) under the
probabilistic transmission strategy. In the CMDP, we aim at minimizing the average delay subject
to a constraint on the average power. In particular, the optimization problem is
$$\min_{F\in\mathcal{F}} \; D_F \tag{9a}$$
$$\text{s.t.} \;\; P_F \le P_{th}. \tag{9b}$$
By solving this CMDP under different power constraints $P_{th}$, we obtain the optimal delay-power
tradeoff under Markov arrivals, i.e., the minimized average delay $D_{F^*}$
and the optimal policy $F^*$ for each given $P_{th}$.
To highlight the impact of Markovian arrivals, we first focus on the optimal delay-power
tradeoff for an AWGN channel in Sections III and IV. Then, we extend the optimal
tradeoff to the fading channel in Section V. More specifically, we model the AWGN
channel by setting L = 1 and |h1| = 1. With this single channel state, we simplify the
notation of the power function and the adaptive transmission policy to P(s) and
$F = \{f^s_{q,a} : 0 \le q \le Q, 0 \le a \le A, 0 \le s \le S\}$, respectively, in the following two sections. As
a result, a reduced CMDP is formulated with the system state (q[n], a[n]).
III. OPTIMAL DELAY-POWER TRADEOFF FOR AWGN CHANNELS
In this section, we focus on the optimal delay-power tradeoff for AWGN channels, which is
described by the cross-layer optimization problem (9). We first show that the optimal delay-power
tradeoff can be formulated by an equivalent LP problem based on the steady-state analysis for
a single user. With the LP problem being solved over the set of all the obtainable power-delay
pairs, we then generate an optimal delay-power tradeoff curve for AWGN channels, under which
minimized average delays are obtained for different power constraints. Further, we show some
interesting geometric properties of the optimal tradeoff curve. Based on these geometric
properties, we finally demonstrate that the same optimal tradeoff is obtained by the optimal policies
under an arbitrary initial system state. In other words, the optimal policies over AWGN channels
achieve the same average delay and power consumption regardless of the initial system state.
A. The Equivalent LP Problem
First, we show the optimal delay-power tradeoff by expressing the cross-layer optimization
problem (9) as an LP problem. In particular, we formulate the LP problem based on a Markov
Reward Process (MRP) that is induced by the CMDP under a given transmission policy. For
a given policy F, we first describe the resulting MRP to analytically express the average delay
and power. In the MRP, $\lambda_{(q,a),(q',a')}$ denotes the transition probability from (q, a) to (q′, a′). Based
on the arrival process in Eq. (1) and the queue evolution in Eq. (3), the transition probability $\lambda_{(q,a),(q',a')}$ is
$$\lambda_{(q,a),(q',a')} = \gamma_{a,a'}\, f^{\,q-q'+a'}_{q,a}\, \mathbb{1}\{\max\{q-S,0\} \le q'-a' \le \min\{q, Q-A\}\}. \tag{10}$$
Here, q − q′ + a′ is the transmission rate that moves the queue from q to q′ when a′ packets arrive, and the indicator restricts this rate to $\mathcal{S}(q)$.
With the transition probabilities $\lambda_{(q,a),(q',a')}$, we then obtain the steady-state probabilities from the balance
equations. Let $\pi_F(q,a)$ denote the steady-state probability of state (q, a). The balance equations are
$$\sum_{a=0}^{A} \sum_{q=\max\{q'-a',0\}}^{\min\{q'-a'+S,Q\}} \pi_F(q,a)\,\lambda_{(q,a),(q',a')} = \pi_F(q',a'), \tag{11}$$
where $\sum_{q=0}^{Q}\sum_{a=0}^{A} \pi_F(q,a) = 1$. More specifically, $\pi_F(q,a)$ indicates the long-run fraction of time in which the
queue length is q and the arrival rate is a. Considering the evolution of q[n] in Eq. (3) with $s[n] \in \mathcal{S}(q[n])$,
we have $q[n+1] - a[n+1] = q[n] - s[n] \le Q - A$ in each timeslot. Therefore, the steady-state probability
$\pi_F(q,a)$ is equal to 0 if $q - a > Q - A$. By solving the balance equations for all queue lengths q and arrival rates
a, we obtain the steady-state probability distribution $\pi_F = \{\pi_F(q,a) : \forall q, a\}$.
The balance equations in Eq. (11) can be expressed in the matrix form
$$\Lambda_F \boldsymbol{\pi}_F = \boldsymbol{\pi}_F, \tag{12}$$
where $\boldsymbol{\pi}_F$ is the vector whose elements are the probabilities $\pi_F(q,a)$. In particular, $\pi_F(q,a)$ is the
$(a(Q+1)+q+1)$th element of $\boldsymbol{\pi}_F$. Following this ordering, the stochastic matrix $\Lambda_F$ has the
transition probabilities $\lambda_{(q,a),(q',a')}$ as its elements: when $\pi_F(q,a)$ and $\pi_F(q',a')$ are the ith and jth
elements of $\boldsymbol{\pi}_F$, respectively, $\lambda_{(q,a),(q',a')}$ is located in the jth row and ith column of $\Lambda_F$.
By using the steady-state probabilities, we next present the average power consumption and
delay. Given the steady-state probability $\pi_F(q,a)$, we express the average power consumption as
$$P_F = \sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} P(s)\,\pi_F(q,a)\,f^s_{q,a}. \tag{13}$$
Similarly, the average delay is given as
$$D_F = \frac{1}{\alpha}\sum_{q=0}^{Q}\sum_{a=0}^{A} q\,\pi_F(q,a). \tag{14}$$
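Eqs. (10) to (14) can be evaluated numerically for any fixed policy. The sketch below builds $\Lambda_F$ with the element ordering of Eq. (12), solves $\Lambda_F \boldsymbol{\pi}_F = \boldsymbol{\pi}_F$ together with the normalization by least squares, and returns $P_F$ and $D_F$. It assumes the induced chain is a unichain (so $\boldsymbol{\pi}_F$ is unique); the function name, the 3-state arrival chain, Q = 4, S = 2, P(s) = 2^s − 1, and both example policies are illustrative assumptions.

```python
import numpy as np


def power_delay_of_policy(gamma, f, power):
    """pi_F (Eq. 12), P_F (Eq. 13), and D_F (Eq. 14) for an AWGN link.

    gamma: (A+1, A+1) arrival transitions; f: (Q+1, A+1, S+1) array with
    f[q, a, s] = f^s_{q,a} (zero outside S(q)); power[s] = P(s).
    """
    gamma = np.asarray(gamma, dtype=float)
    f = np.asarray(f, dtype=float)
    power = np.asarray(power, dtype=float)
    Q1, A1, S1 = f.shape                      # Q+1, A+1, S+1
    n = Q1 * A1

    def idx(q, a):                            # 0-based version of a(Q+1)+q+1
        return a * Q1 + q

    Lam = np.zeros((n, n))
    for q in range(Q1):
        for a in range(A1):
            for s in range(S1):
                if f[q, a, s] == 0.0:
                    continue
                for a2 in range(A1):
                    q2 = min(max(q - s, 0) + a2, Q1 - 1)                # Eq. (3)
                    # lambda_{(q,a),(q2,a2)}: row of (q2,a2), column of (q,a), Eq. (10)
                    Lam[idx(q2, a2), idx(q, a)] += gamma[a, a2] * f[q, a, s]

    # Eq. (12) plus the normalization sum(pi) = 1, solved in the least-squares sense.
    M = np.vstack([Lam - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi = np.linalg.lstsq(M, rhs, rcond=None)[0].reshape(A1, Q1).T   # pi[q, a]

    alpha = float(pi.sum(axis=0) @ np.arange(A1))     # a-marginal of pi_F is phi, Eq. (2)
    P_F = float(np.einsum("qa,qas,s->", pi, f, power))               # Eq. (13)
    D_F = float(np.arange(Q1) @ pi.sum(axis=1)) / alpha              # Eq. (14)
    return pi, P_F, D_F


# Illustrative example (assumed values): A = 2, alpha = 1, Q = 4, S = 2, P(s) = 2**s - 1.
gamma = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
Q, S = 4, 2
power = [0.0, 1.0, 3.0]
f_all = np.zeros((Q + 1, 3, S + 1))   # transmit the whole queue
f_thr = np.zeros((Q + 1, 3, S + 1))   # at most one packet, two only when q = Q
for q in range(Q + 1):
    f_all[q, :, min(q, S)] = 1.0
    f_thr[q, :, [0, 1, 1, 1, 2][q]] = 1.0
_, P_all, D_all = power_delay_of_policy(gamma, f_all, power)
_, P_thr, D_thr = power_delay_of_policy(gamma, f_thr, power)
```

Working the chains out by hand for these assumed values, f_all gives $D_F = 1$, $P_F = 4/3$, while f_thr gives $D_F = 2$, $P_F = 38/33 \approx 1.15$: lower power is paid for with a longer queue.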
Then, we characterize the optimal delay-power tradeoff under the cross-layer transmission
policies. As shown in Eqs. (13) and (14), the average power consumption and delay are determined
by the steady-state probabilities $\pi_F(q,a)$ for a given policy F. Since these steady-state
probabilities satisfy the balance equations in Eq. (12), the optimal delay-power tradeoff given by
problem (9) is obtained, for each value of $P_{th}$, by solving the following problem:
$$\begin{aligned}
\min_{\{\pi_F, F\}} \quad & \frac{1}{\alpha}\sum_{q=0}^{Q}\sum_{a=0}^{A} q\,\pi_F(q,a) && \text{(15a)}\\
\text{s.t.} \quad & \sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} P(s)\,\pi_F(q,a)\,f^s_{q,a} \le P_{th} && \text{(15b)}\\
& \Lambda_F \boldsymbol{\pi}_F = \boldsymbol{\pi}_F && \text{(15c)}\\
& \sum_{q=0}^{Q}\sum_{a=0}^{A} \pi_F(q,a) = 1 && \text{(15d)}\\
& \sum_{s=0}^{S} f^s_{q,a} = 1 \quad \forall\, q, a && \text{(15e)}\\
& \pi_F(q,a) \ge 0,\ f^s_{q,a} \ge 0 \quad \forall\, q, a, s, && \text{(15f)}
\end{aligned}$$
where the optimal delay is achieved by the optimal transmission policy $F^*$
with the corresponding steady-state probabilities $\pi_{F^*}(q,a)$.
Given the cross-layer optimization problem (15), we finally convert it into an
equivalent LP problem, through which the optimal average delay is obtained for each given
power constraint $P_{th}$. Problem (15) is not linear in $\{\pi_F, F\}$, because constraints (15b) and (15c)
contain products of $\pi_F(q,a)$ and $f^s_{q,a}$; we therefore use these products as the optimization variables.
Defining $x^s_{q,a} = \pi_F(q,a) f^s_{q,a}$, the optimal delay-power tradeoff is given by the equivalent LP problem in the following theorem.
Theorem 1. Problem (15) is equivalent to the following linear programming problem:
$$\begin{aligned}
\min_{\{x^s_{q,a}\}} \quad & \frac{1}{\alpha}\sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} q\,x^s_{q,a} && \text{(16a)}\\
\text{s.t.} \quad & \sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} P(s)\,x^s_{q,a} \le P_{th} && \text{(16b)}\\
& \sum_{q=\max\{q'-a',0\}}^{\min\{q'-a'+S,Q\}}\sum_{a=0}^{A}\sum_{s=0}^{S} \gamma_{a,a'}\,x^s_{q,a}\,\mathbb{1}\{s=q+a'-q'\} = \sum_{s=0}^{S} x^s_{q',a'}, \quad \forall\, 0\le q'\le Q,\ 0\le a'\le A && \text{(16c)}\\
& \sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} x^s_{q,a} = 1 && \text{(16d)}\\
& x^s_{q,a} \ge 0, \quad \forall\, 0\le q\le Q,\ 0\le a\le A,\ 0\le s\le S. && \text{(16e)}
\end{aligned}$$
Proof: To show the equivalence of problems (15) and (16), we divide the proof into two
parts. We first show that problem (15) is converted into LP problem (16) by replacing $\pi_F(q,a) f^s_{q,a}$
with $x^s_{q,a}$. For each feasible solution $\pi_F$ and F of problem (15), we can generate a feasible solution
of problem (16), i.e., $\{x^s_{q,a} = \pi_F(q,a) f^s_{q,a}\}$. With the corresponding $\{x^s_{q,a}\}$ in problem (16),
we also obtain the same average power consumption and delay as with $\pi_F$ and F in problem (15).
For each feasible solution $\{x^s_{q,a}\}$ of problem (16), we then construct the corresponding policy
F by setting
$$f^s_{q,a} = \begin{cases} \dfrac{x^s_{q,a}}{\pi_F(q,a)}, & \pi_F(q,a) > 0,\\[4pt] \mathbb{1}\{s = \min\{q,S\}\}, & \pi_F(q,a) = 0, \end{cases} \tag{17}$$
where the steady-state probability under policy F is $\pi_F(q,a) = \sum_{s=0}^{S} x^s_{q,a}$. By
substituting the resulting $f^s_{q,a}$ and $\pi_F(q,a)$ into problem (15), we can check that the constructed
solution satisfies the balance equations in Eq. (15c), while the average delay and power
consumption remain unchanged, which completes the proof.
With the equivalent LP problem (16), we obtain the optimal delay-power tradeoff for AWGN
channels: solving the LP problem yields the minimum average delay, and the optimal policy $F^*$
follows from the optimal solution via Eq. (17).
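LP (16) is small enough to hand to a generic LP solver. The sketch below assembles (16a) to (16e) for SciPy's `linprog` (HiGHS backend) and recovers the policy via Eq. (17). Two choices are ours rather than the paper's: variables $x^s_{q,a}$ with $s \notin \mathcal{S}(q)$ are fixed to zero through their bounds (mirroring $f^s_{q,a} = 0$ outside $\mathcal{S}(q)$), and α is computed from γ by least squares. The function name and the example parameters are illustrative assumptions; this generic-solver sketch is not the threshold-based algorithm developed later in the paper.

```python
import numpy as np
from scipy.optimize import linprog


def solve_lp16(gamma, Q, S, power, p_th):
    """Sketch of LP (16) for an AWGN link.

    Returns (D_star, P_star, f) with f[q, a, s] recovered via Eq. (17),
    or None if the power constraint p_th cannot be met.
    """
    gamma = np.asarray(gamma, dtype=float)
    A = gamma.shape[0] - 1
    Q1, A1, S1 = Q + 1, A + 1, S + 1
    n = Q1 * A1 * S1

    # alpha from the stationary arrival distribution phi, Eq. (2).
    M = np.vstack([gamma.T - np.eye(A1), np.ones((1, A1))])
    rhs = np.zeros(A1 + 1)
    rhs[-1] = 1.0
    phi = np.linalg.lstsq(M, rhs, rcond=None)[0]
    alpha = float(phi @ np.arange(A1))

    def var(q, a, s):                 # flat index of x^s_{q,a}
        return (q * A1 + a) * S1 + s

    def row(q, a):                    # one balance constraint (16c) per (q', a')
        return a * Q1 + q

    c = np.zeros(n)                   # objective (16a)
    p_row = np.zeros(n)               # power constraint (16b)
    A_eq = np.zeros((Q1 * A1 + 1, n))
    b_eq = np.zeros(Q1 * A1 + 1)
    bounds = [(0.0, 0.0)] * n         # (16e); opened below only for s in S(q)
    for q in range(Q1):
        for a in range(A1):
            for s in range(S1):
                k = var(q, a, s)
                c[k] = q / alpha
                p_row[k] = power[s]
                A_eq[row(q, a), k] -= 1.0                 # right-hand side of (16c)
                if max(q - Q + A, 0) <= s <= min(q, S):   # s in S(q)
                    bounds[k] = (0.0, None)
                    for a2 in range(A1):                  # left-hand side of (16c)
                        A_eq[row(q - s + a2, a2), k] += gamma[a, a2]
    A_eq[-1, :] = 1.0                                     # normalization (16d)
    b_eq[-1] = 1.0

    res = linprog(c, A_ub=p_row[None, :], b_ub=[p_th], A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if res.status != 0:
        return None

    x = res.x.reshape(Q1, A1, S1)
    pi = x.sum(axis=2)                                    # pi_F(q,a) = sum_s x^s_{q,a}
    f = np.zeros_like(x)
    for q in range(Q1):
        for a in range(A1):
            if pi[q, a] > 1e-12:
                f[q, a] = x[q, a] / pi[q, a]              # Eq. (17), first case
            else:
                f[q, a, min(q, S)] = 1.0                  # Eq. (17), second case
    return float(res.fun), float(p_row @ res.x), f


# Illustrative parameters (assumed): A = 2, alpha = 1, Q = 4, S = 2, P(s) = 2**s - 1.
gamma = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
for p_th in (1.2, 1.3, 10.0):
    out = solve_lp16(gamma, Q=4, S=2, power=[0.0, 1.0, 3.0], p_th=p_th)
    print(p_th, None if out is None else round(out[0], 4))
```

Since $q[n] \ge a[n]$ always holds, no policy can achieve $D_F < 1$; with a loose power budget the LP attains exactly this bound by transmitting the entire queue in every slot.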
B. The Optimal Delay-Power Tradeoff Curve
In this subsection, we attain the optimal delay-power tradeoff for AWGN channels by solving LP
problem (16) formulated in Theorem 1. By solving the LP problem
over a power-delay plane that contains all the power-delay pairs obtainable under the policies,
we present the optimal delay-power tradeoff curve. In this way, the minimized average delay
is obtained for the single link under a given average power constraint.
To obtain the optimal delay-power tradeoff curve, we first solve LP problem (16) by considering
the set of all obtainable average power-delay pairs. In particular, a power-delay plane is
formulated to contain all the average power-delay pairs $(P_F, D_F)$ generated by the cross-layer
transmission policies $F \in \mathcal{F}$. However, for a given transmission policy $F = \{f^s_{q,a}\}$, the
corresponding pair $(P_F, D_F)$ can only be expressed in terms of $f^s_{q,a}$ with the help of
$\pi_F(q,a)$, as in Eqs. (13) and (14). Since $\pi_F(q,a)$ under policy F is determined by the set of
balance equations in Eq. (11), $(P_F, D_F)$ can hardly be expressed as an analytical function of $f^s_{q,a}$.
Instead, we generate the power-delay plane from the optimization variables $x^s_{q,a}$ of LP problem (16),
which are referred to as state-action frequencies in MDPs [27, Section 8.9]. Given the obtainable
state-action frequencies $\{x^s_{q,a}\}$, the average power-delay pair is given analytically by the objective function
and the power constraint of LP problem (16). The corresponding policy F is obtained through
the correspondence established in Theorem 1, i.e., Eq. (17).
Thus, we first express the set that consists of all the obtainable state-action frequencies $\{x^s_{q,a}\}$
under the transmission policies as
$$\mathcal{G} = \big\{\{x^s_{q,a} : \forall\, q, a, s\} \;\big|\; \text{Eqs. (16c), (16d), and (16e)}\big\}. \tag{18}$$
The average delay and power of each feasible $\{x^s_{q,a}\}$ are then given by the linear functions in
objective (16a) and power constraint (16b), respectively. As a result, the
power-delay plane is generated to contain all the obtainable average power-delay pairs.
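We then express the feasible state-action frequencies $\{x^s_{q,a}\}$ as a $((Q+1)(A+1)(S+1))$-dimensional
vector. Since $\mathcal{G}$ is defined by finitely many linear equalities and inequalities, it is a polyhedron in a
high-dimensional Euclidean space, which is bounded because of (16d) and (16e). The obtainable power-delay
pairs are then given by the projection of the state-action frequencies onto the power-delay plane. In other words,
the set $\mathcal{R}$ of all the obtainable average power-delay pairs is defined as
$$\mathcal{R} = \Big\{(P, D) \;\Big|\; P = \sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} P(s)\,x^s_{q,a},\ D = \frac{1}{\alpha}\sum_{q=0}^{Q}\sum_{a=0}^{A}\sum_{s=0}^{S} q\,x^s_{q,a},\ \{x^s_{q,a}\} \in \mathcal{G}\Big\}, \tag{19}$$
where $\mathcal{R}$, as the image of the bounded polyhedron $\mathcal{G}$ under a linear map, is a polyhedron on the power-delay plane.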
With the definition of $\mathcal{R}$ in Eq. (19), we rewrite LP problem (16) over the power-delay
plane as
$$\min_{(P,D)\in\mathcal{R}} \; D \tag{20a}$$
$$\text{s.t.} \;\; P \le P_{th}. \tag{20b}$$
In this way, the optimal delay-power tradeoff described by cross-layer optimization problem (9)
is expressed on the power-delay plane. With the LP problem in Eq. (20), we
obtain the optimal power-delay pair $(P^*, D^*)$ by searching for the pair that minimizes
the delay in $\mathcal{R} \cap \{(P,D) \mid P \le P_{th}\}$.
We finally formulate the optimal delay-power tradeoff curve for AWGN channels as
$$\mathcal{L} = \{(P^*, D^*) \in \mathcal{R} \mid \forall (\tilde{P}, \tilde{D}) \in \mathcal{R},\ \text{either } P^* \le \tilde{P} \text{ or } D^* \le \tilde{D}\}, \tag{21}$$
which consists of all the optimal power-delay pairs under different power constraints. Each
optimal power-delay pair $(P^*, D^*)$ of problem (20) belongs to $\mathcal{L}$ because $D^* \le \tilde{D}$ if
$(\tilde{P}, \tilde{D}) \in \mathcal{R} \cap \{P \le P_{th}\}$, and $P^* \le P_{th} \le \tilde{P}$ if $(\tilde{P}, \tilde{D}) \in \mathcal{R} \cap \{P \ge P_{th}\}$.
Moreover, each element $(P^*, D^*)$ of $\mathcal{L}$ minimizes the average delay in problem (20)
with the power constraint $P_{th} = P^*$. The geometric properties of the optimal delay-power
tradeoff curve are presented in the following theorem.
Theorem 2. The optimal tradeoff curve L is piecewise linear, decreasing, and convex.
Proof: The proof of the geometric properties follows directly from [22, Corollary 3]; we
include its main idea for completeness. With the optimal tradeoff curve $\mathcal{L}$ expressed as in
Eq. (21), we first show that $\mathcal{L}$ is convex and decreasing according to the definitions of convex
and decreasing functions, respectively. By showing that $\mathcal{L}$ is a part of the boundary of the
polyhedron $\mathcal{R}$, we next show that $\mathcal{L}$ is a piecewise linear curve.
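Theorem 2 says that $\mathcal{L}$ is the decreasing, convex, piecewise linear lower-left boundary of $\mathcal{R}$. As a sketch, assuming we are given a finite list of achievable (P, D) pairs whose convex hull is $\mathcal{R}$ (for instance, pairs evaluated for candidate deterministic policies via Eqs. (13) and (14)), the vertices of $\mathcal{L}$ can be extracted with Andrew's monotone-chain lower hull. The function names and sample points are hypothetical, and this is not the paper's threshold-based algorithm.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def tradeoff_curve(points):
    """Vertices of the decreasing lower convex boundary of (P, D) pairs, by increasing P."""
    hull = []
    for p in sorted(set(points)):          # sort by power, then by delay
        # Lower hull: drop the last vertex while it does not make a left turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    d_min = min(d for _, d in hull)
    k = next(i for i, (_, d) in enumerate(hull) if d == d_min)
    return hull[:k + 1]                    # keep only the decreasing part


pts = [(1, 5), (2, 3), (3, 2.5), (4, 1), (2, 4), (3, 4), (5, 1), (6, 2)]
print(tradeoff_curve(pts))   # [(1, 5), (2, 3), (4, 1)]
```

Here (3, 2.5) is achievable but not on $\mathcal{L}$: time-sharing between the vertices (2, 3) and (4, 1) reaches delay 2 at power 3, and the points right of (4, 1) only add power without reducing delay.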
In this way, we present the optimal delay-power tradeoff by solving the equivalent LP problem
on the power-delay plane. By employing the state-action frequencies, we analytically present the
optimal delay-power tradeoff curve for AWGN channels, under which the minimized average
delay is attained for the adaptive transmitter with a given power constraint.
C. The Optimal Delay-Power Tradeoff with an Arbitrary Initial State
In this subsection, we show that the same optimal delay-power tradeoff is obtained for AWGN
channels by the optimal adaptive transmission policies under an arbitrary initial state. With
different initial queue lengths and arrival rates, a given transmission policy may yield different
average delays and powers, because different steady-state distributions can arise when the
corresponding MRP has multiple closed classes [28, Section 4.3]. However, for
the optimal transmission policies of LP problem (16), we show that the same average delay and
power consumption are obtained for AWGN channels under an arbitrary initial state.
For each power-delay pair on curve $\mathcal{L}$, the corresponding optimal adaptive transmission policy is first obtained by solving LP problem (16). In particular, given the optimal solution $\{x^{*s}_{q,a}\}$ of LP problem (16), we obtain the optimal policy by determining $f^{s}_{q,a}$ according to Eq. (17).
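Eq. (17) is not reproduced in this section, so the sketch below assumes the standard CMDP normalization $f^{s}_{q,a} = x^{s}_{q,a} / \sum_{s'} x^{s'}_{q,a}$ for states with positive frequency. It is a minimal Python illustration with hypothetical names, storing the frequencies in nested dictionaries keyed by $(q, a)$ and then by $s$.

```python
from typing import Dict, Tuple

State = Tuple[int, int]               # (queue length q, arrival rate a)
Table = Dict[State, Dict[int, float]]  # x[(q, a)][s] or f[(q, a)][s]


def policy_from_frequencies(x: Table) -> Table:
    """Convert state-action frequencies x^s_{q,a} into probabilities f^s_{q,a}.

    Assumed form of Eq. (17): f = x / sum_s x whenever the denominator is
    positive. States with zero total frequency are not visited in the
    steady state that x describes; here they get the largest listed rate,
    an arbitrary choice that does not affect the averages computed from x.
    """
    f: Table = {}
    for state, row in x.items():
        total = sum(row.values())
        if total > 0:
            f[state] = {s: v / total for s, v in row.items()}
        else:
            top = max(row)
            f[state] = {s: 1.0 if s == top else 0.0 for s in row}
    return f
```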
Then, we demonstrate that the optimal adaptive transmission policy achieves the same optimal tradeoff from an arbitrary initial state. In other words, we show that the average delay and power consumption of the optimal policy are independent of the initial state. For this purpose, it suffices to show that, under an optimal policy, the Markov chain induced by the MRP has exactly one closed communicating class; such Markov chains are referred to as unichains. First, we present the structure of the Markov chains for the vertices of the optimal delay-power tradeoff curve $\mathcal{L}$ in the following theorem.
Theorem 3. The optimal delay-power tradeoff curve $\mathcal{L}$ satisfies the following:
1) All vertices of $\mathcal{L}$ can be obtained by adaptive transmission policies with unichains;
2) All vertices of $\mathcal{L}$ can be obtained by deterministic transmission policies;
3) The policies corresponding to two adjacent vertices of $\mathcal{L}$ have different transmission rates in only one state.
Proof: See Appendix A.
The vertices of the optimal tradeoff curve $\mathcal{L}$ can therefore be generated by optimal deterministic transmission policies whose Markov chains have only one closed class. As a result, for every vertex of $\mathcal{L}$, the corresponding optimal transmission policy achieves the same average power-delay pair from any initial state.
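Whether a given policy induces a unichain can be checked directly from the transition matrix of the induced MRP. The sketch below assumes the matrix is given as a dense list of rows over some enumeration of the states $(q, a)$; it finds the closed communicating classes by plain reachability, which is adequate for small state spaces. Function names are illustrative.

```python
from collections import deque
from typing import List, Set

Matrix = List[List[float]]


def reachable(P: Matrix, i: int) -> Set[int]:
    """States reachable from i (including i) along positive-probability edges."""
    seen = {i}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def closed_classes(P: Matrix) -> List[Set[int]]:
    """Closed communicating classes of the chain with transition matrix P."""
    reach = [reachable(P, i) for i in range(len(P))]
    classes: List[Set[int]] = []
    for i in range(len(P)):
        cls = {j for j in reach[i] if i in reach[j]}  # states communicating with i
        # The class is closed iff nothing outside it is reachable from i.
        if reach[i] == cls and cls not in classes:
            classes.append(cls)
    return classes


def is_unichain(P: Matrix) -> bool:
    return len(closed_classes(P)) == 1
```

For instance, a chain in which state 0 is transient and {1, 2} is the only closed class is a unichain, whereas a chain with two absorbing states is not.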
We next show that the same minimized average delay can be obtained from an arbitrary initial state for the remaining power-delay pairs on $\mathcal{L}$. Since $\mathcal{L}$ is piecewise linear, we divide the curve into segments whose endpoints are pairs of adjacent vertices. Using the two adaptive transmission policies of each pair of adjacent vertices, we then construct optimal policies with unichains for the corresponding segment of $\mathcal{L}$. The construction relies on the following lemma.
Lemma 1. Let $F' = \{f'^{s}_{q,a}\}$ and $F'' = \{f''^{s}_{q,a}\}$ be two transmission policies with unichains whose distributions on the transmission rate differ only when $q = \hat{q}$ and $a = \hat{a}$. Define the policy $F = \varepsilon F' + (1-\varepsilon) F''$, i.e., $f^{s}_{q,a} = \varepsilon f'^{s}_{q,a} + (1-\varepsilon) f''^{s}_{q,a}$ for all $q$, $a$, $s$, where $0 \le \varepsilon \le 1$. Then:
1) The Markov chain under policy $F = \varepsilon F' + (1-\varepsilon) F''$ is a unichain for each $0 \le \varepsilon \le 1$;
2) There exists $0 \le \varepsilon' \le 1$ such that $P_F = \varepsilon' P_{F'} + (1-\varepsilon') P_{F''}$ and $D_F = \varepsilon' D_{F'} + (1-\varepsilon') D_{F''}$;
3) $\varepsilon'$ increases monotonically from 0 to 1 as $\varepsilon$ increases over $[0, 1]$.
Proof: See Appendix B.
For each pair of adjacent vertices $(\underline{P}, \underline{D})$ and $(\overline{P}, \overline{D})$ on $\mathcal{L}$, Theorem 3 provides two optimal deterministic policies $\underline{F}^*$ and $\overline{F}^*$ with unichains. The two policies have different transmission rates for only one particular queue length and arrival rate. According to Lemma 1, the optimal policy can be constructed as $F^* = (1-\varepsilon)\underline{F}^* + \varepsilon\overline{F}^*$, whose average power-delay pair is $(\varepsilon'\overline{P} + (1-\varepsilon')\underline{P},\ \varepsilon'\overline{D} + (1-\varepsilon')\underline{D})$ and whose Markov chain is a unichain. As a result, for each power-delay pair $(P, D)$ on the optimal delay-power tradeoff curve $\mathcal{L}$, there exists an optimal policy with a unichain.
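Item 2) of Lemma 1 can be checked numerically on a toy chain. The sketch below uses a hypothetical three-state chain (not the paper's queue model): two deterministic policies differ only in state 0, and mixing them with weight $\varepsilon$ yields an average power-delay pair on the segment between the two endpoints, with a weight $\varepsilon'$ that in general differs from $\varepsilon$. All transition probabilities and costs are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical 3-state example: the delay cost of a state equals its index,
# and the power depends on the action taken in that state.
ROW0_PRIME, POW0_PRIME = [0.2, 0.8, 0.0], 1.0    # F' in state 0: cheap, slow
ROW0_DPRIME, POW0_DPRIME = [0.7, 0.3, 0.0], 3.0  # F'' in state 0: costly, fast
COMMON_ROWS = [[0.5, 0.2, 0.3], [0.6, 0.4, 0.0]]  # states 1 and 2, shared
COMMON_POW = [2.0, 4.0]
DELAY_COST = [0.0, 1.0, 2.0]


def stationary(P: Matrix, iters: int = 5000) -> List[float]:
    """Stationary distribution via power iteration on the lazy chain (I + P)/2."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
              for j in range(n)]
    return pi


def mixed_averages(eps: float) -> Tuple[float, float]:
    """Average (power, delay) of the mixture eps * F' + (1 - eps) * F''."""
    row0 = [eps * u + (1 - eps) * v for u, v in zip(ROW0_PRIME, ROW0_DPRIME)]
    pow0 = eps * POW0_PRIME + (1 - eps) * POW0_DPRIME  # expected power in state 0
    pi = stationary([row0] + COMMON_ROWS)
    powers = [pow0] + COMMON_POW
    return (sum(p * c for p, c in zip(pi, powers)),
            sum(p * c for p, c in zip(pi, DELAY_COST)))


p1, d1 = mixed_averages(1.0)  # pure F'
p0, d0 = mixed_averages(0.0)  # pure F''
for eps in (0.25, 0.5, 0.75):
    p, d = mixed_averages(eps)
    eps_prime = (p - p0) / (p1 - p0)  # weight eps' of Lemma 1, item 2)
    print(f"eps={eps:.2f}  eps'={eps_prime:.4f}  P={p:.4f}  D={d:.4f}")
```

The printed $\varepsilon'$ values lie strictly between 0 and 1 and increase with $\varepsilon$, in line with item 3); they differ from $\varepsilon$ because the two policies lead to different mean return times to the state in which they differ.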
Combining Theorem 3 and Lemma 1, we conclude that the optimal delay-power tradeoff curve is achieved from an arbitrary initial state, as stated in the following theorem.
Theorem 4. All the average power-delay pairs of the optimal delay-power tradeoff curve L can
be obtained using the adaptive transmission policies with unichains.
Therefore, the optimal delay-power tradeoff for AWGN channels is achieved by the optimal policy obtained from LP problem (16), and the same optimal tradeoff holds for the single link under any initial queue length and arrival rate.
IV. THRESHOLD-BASED OPTIMAL TRANSMISSION POLICY OVER AWGN CHANNELS
In this section, we show the threshold-based structure of the optimal adaptive transmission policies over AWGN channels. For each optimal average power-delay pair, the delay-optimal transmission strategy has a threshold-based structure on the queue length, in which the thresholds for different transmission rates depend on the arrival rate. To this end, we first present the threshold-based optimal policies for the vertices of the optimal tradeoff curve $\mathcal{L}$ based on the Lagrangian relaxation of the cross-layer optimization problem (9). Then, using the optimal policies on the vertices, we construct the threshold-based transmission policy for each average power-delay pair on curve $\mathcal{L}$. Finally, exploiting the threshold-based structure, we develop a threshold-based algorithm to efficiently obtain the optimal delay-power tradeoff.
[Figure: Average Delay versus Average Power, showing the feasible region, the optimal tradeoff curve, its vertices, and slopes labelled µ]
Fig. 2. Sketch of the optimal delay-power tradeoff. A vertex of the curve is denoted by $(P, D)$; its two adjacent vertices are $(\underline{P}, \underline{D})$ and $(\overline{P}, \overline{D})$ with $\underline{P} < P < \overline{P}$. At the two end points of the curve, the vertex with the lowest power has no $(\underline{P}, \underline{D})$, and the vertex with the largest power has no $(\overline{P}, \overline{D})$.
A. Threshold-based Optimal Deterministic Policy for the Lagrangian Relaxation Problem
In this subsection, we show threshold-based optimal deterministic policies for all the vertices of the optimal delay-power tradeoff curve $\mathcal{L}$ for AWGN channels. For each vertex on $\mathcal{L}$, we first obtain an optimal deterministic policy from the Lagrangian relaxation of cross-layer optimization problem (9). Then, we show that these optimal deterministic policies have a threshold-based structure on the queue length.
First, we formulate the Lagrangian relaxation problem for each vertex. As shown in Fig. 2, for each $\mu > 0$, there is always a vertex on tradeoff curve $\mathcal{L}$ that attains the minimum value of $D + \mu P$. Conversely, for each vertex on $\mathcal{L}$, there is an interval $(\mu_{\min}, \mu_{\max})$¹ of multipliers under which the vertex attains the minimum value of $D + \mu P$. In particular, $\mu_{\min} = \frac{D - \overline{D}}{\overline{P} - P}$ for all vertices except the one with the largest power, for which we set $\mu_{\min} = 0$ based on Fig. 2. Similarly, $\mu_{\max} = \frac{\underline{D} - D}{P - \underline{P}}$ for all vertices except the one with the lowest power, for which $\mu_{\max} = +\infty$.
¹ When $\mu$ equals $\mu_{\min}$ or $\mu_{\max}$, two adjacent vertices both attain the minimum value of $D + \mu P$.
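Following Fig. 2 and the expressions for $\mu_{\min}$ and $\mu_{\max}$, the multiplier interval of each vertex can be read off from the slopes of its two adjacent segments. A small Python sketch, assuming the vertices are supplied as $(P, D)$ pairs sorted by increasing power; the function name is illustrative:

```python
import math
from typing import List, Tuple


def multiplier_ranges(vertices: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """(mu_min, mu_max) for each vertex (P, D) of a decreasing convex curve.

    `vertices` must be sorted by increasing power. A vertex minimises
    D + mu * P for mu between mu_min and mu_max; the vertex with the largest
    power gets mu_min = 0 and the one with the lowest power gets mu_max = +inf.
    """
    ranges = []
    for k, (p, d) in enumerate(vertices):
        if k + 1 < len(vertices):  # higher-power neighbour exists
            p_hi, d_hi = vertices[k + 1]
            mu_min = (d - d_hi) / (p_hi - p)
        else:
            mu_min = 0.0
        if k > 0:  # lower-power neighbour exists
            p_lo, d_lo = vertices[k - 1]
            mu_max = (d_lo - d) / (p - p_lo)
        else:
            mu_max = math.inf
        ranges.append((mu_min, mu_max))
    return ranges
```

For vertices (1, 10), (2, 6), (3, 5), (4, 4.5) it returns (4, +∞), (1, 4), (0.5, 1), (0, 0.5); with $\mu = 2$, for example, $D + \mu P$ takes the values 12, 10, 11, 12.5, so the second vertex is indeed the minimizer.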
Since set $\mathcal{R}$ consists of all the power-delay pairs given by policies $F \in \mathcal{F}$, we can obtain the optimal policy for each vertex from the following Lagrangian relaxation problem
$$\min_{F \in \mathcal{F}} \; D_F + \mu P_F - \mu P_{th}, \quad (22)$$
where the multiplier $\mu$ belongs to the interval associated with the given vertex $(P, D)$ on $\mathcal{L}$.
Therefore, we obtain the optimal policy for each vertex $(P, D)$ by solving Lagrangian relaxation problem (22) with a specific $\mu$. In particular, we formulate problem (22) as an unconstrained infinite-horizon MDP with objective function $D_F + \mu P_F$. According to [27, Theorem 9.1.8], this unconstrained MDP is minimized by a deterministic policy under which the corresponding Markov chain is a unichain. Further, we show that this deterministic policy has a threshold-based structure on the queue length.
Theorem 5. For each vertex $(P, D)$ on curve $\mathcal{L}$, the optimal deterministic policy $F^*$ has a threshold-based structure on the queue length: there exist thresholds $q_{F^*}(s, a)$ for every $0 \le s \le S$ and $0 \le a \le A$ such that the probabilities $f^{*s}_{q,a}$ satisfy
$$f^{*s}_{q,a} = \begin{cases} 1, & q_{F^*}(s-1, a) < q \le q_{F^*}(s, a), \\ 0, & \text{otherwise}, \end{cases} \quad (23)$$
where $0 \le q_{F^*}(0, a) \le q_{F^*}(1, a) \le \cdots \le q_{F^*}(S, a) \le Q$ and $q_{F^*}(-1, a) = -1$ for each $a$.
Proof: See Appendix C.
Given a threshold-based optimal deterministic policy $F^*$, we obtain a series of thresholds $\{q_{F^*}(s, a) : s = 0, 1, \cdots, S\}$ for each arrival rate $a \in \{0, 1, \cdots, A\}$. These thresholds on the queue length completely describe the optimal deterministic policy for each vertex of the optimal tradeoff curve $\mathcal{L}$. Moreover, the delay-optimal transmission rate is determined by comparing the queue length with the thresholds for the current arrival rate.
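Theorem 5 reduces the per-slot decision to a search in a sorted list. A minimal sketch, assuming the thresholds for the current arrival rate are stored in a list indexed by $s$ (the name `threshold_rate` is hypothetical):

```python
from bisect import bisect_left
from typing import Sequence


def threshold_rate(q: int, thresholds: Sequence[int]) -> int:
    """Transmission rate of a deterministic threshold policy, Eq. (23).

    `thresholds[s]` is q_F(s, a) for the current arrival rate a and is
    non-decreasing in s. The policy sends s packets when
    q_F(s - 1, a) < q <= q_F(s, a), i.e. s is the first index whose
    threshold is >= q (with q_F(-1, a) = -1).
    """
    if not 0 <= q <= thresholds[-1]:
        raise ValueError("queue length not covered by the thresholds")
    return bisect_left(thresholds, q)
```

With thresholds [1, 3, 3, 7], queue lengths 0 to 7 map to rates 0, 0, 1, 1, 3, 3, 3, 3; rate 2 is never used because $q_{F^*}(1, a) = q_{F^*}(2, a)$, which Eq. (23) allows.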
B. Threshold-Based Optimal Adaptive Transmission Policy
We now present the threshold-based optimal policy for each power-delay pair on the optimal delay-power tradeoff curve $\mathcal{L}$. For a given power-delay pair on $\mathcal{L}$, we construct the threshold-based optimal policy as a convex combination of the optimal deterministic policies for the vertices of $\mathcal{L}$ presented in Theorem 5. In particular, the threshold-based optimal policies for AWGN channels are given in the following theorem.
Theorem 6. For the optimal policy $F^*$, there exist $(A+1) \times (S+1)$ thresholds $q_{F^*}(s, a)$ with $0 \le q_{F^*}(0, a) \le q_{F^*}(1, a) \le \cdots \le q_{F^*}(S, a) \le Q$ for each arrival rate $a = 0, 1, \cdots, A$. Given these thresholds, the optimal policy $F^*$ satisfies
$$\begin{cases}
f^{*s}_{q,a} = 1, & q_{F^*}(s-1, a) < q \le q_{F^*}(s, a),\ a \neq a^* \text{ or } s \neq s^*, \\
f^{*s}_{q,a} = 1, & q_{F^*}(s-1, a) < q < q_{F^*}(s, a),\ a = a^* \text{ and } s = s^*, \\
f^{*s}_{q,a} + f^{*(s-1)}_{q,a} = 1, & q = q_{F^*}(s, a),\ a = a^* \text{ and } s = s^*, \\
f^{*s}_{q,a} = 0, & \text{otherwise},
\end{cases} \quad (24)$$
where the specific transmission rate $s^*$ and arrival rate $a^*$ are determined by the optimal policy $F^*$, and $q_{F^*}(-1, a) = -1$ for each $0 \le a \le A$.
Proof: Our proof starts with the observation that the optimal policies corresponding to the vertices of the optimal delay-power tradeoff curve $\mathcal{L}$ satisfy Eq. (24). Then, we only need to construct optimal policies satisfying Eq. (24) for the other average power-delay pairs on curve $\mathcal{L}$. In particular, the construction uses the properties of $\mathcal{L}$ in Theorem 3 and the threshold-based structure of the optimal policies on the vertices.
For each power-delay pair $(P, D)$ on $\mathcal{L}$, we can find a pair of adjacent vertices $(\underline{P}, \underline{D})$ and $(\overline{P}, \overline{D})$ such that $(P, D)$ lies on the line segment with these two vertices as endpoints. According to Theorem 5, the two vertices are generated by two threshold-based deterministic policies $\underline{F}^*$ and $\overline{F}^*$, respectively; in other words, both policies satisfy Eq. (23) as well as Eq. (24). Meanwhile, by Theorem 3, $\underline{F}^*$ and $\overline{F}^*$ employ different transmission rates only for one particular queue length and arrival rate. As a result, according to Lemma 1, the optimal policy for $(P, D)$ can be formulated as a convex combination of the two threshold-based deterministic policies.
Since the two deterministic policies $\underline{F}^*$ and $\overline{F}^*$ for the two adjacent vertices are threshold-based, there exist a specific transmission rate $s^*$ and arrival rate $a^*$ for which the corresponding thresholds of the two policies differ. Further, these two thresholds are adjacent on the queue length, i.e., $|q_{\underline{F}^*}(s^*, a^*) - q_{\overline{F}^*}(s^*, a^*)| = 1$. Therefore, the threshold-based optimal policy satisfies Eq. (24) for each $(P, D)$ on curve $\mathcal{L}$, which completes the proof.
For a given threshold-based optimal policy $F^*$, we obtain a series of thresholds $\{q_{F^*}(s, a) : 0 \le s \le S\}$ for each arrival rate $a$. By the ordering of $\{q_{F^*}(s, a)\}$ in Theorem 6, the transmission rate is non-decreasing in the queue length. Moreover, according to Theorem 6, the threshold-based optimal policy can be expressed as the convex combination of two adjacent deterministic threshold-based policies shown in Theorem 5. As a result, for each system state $(q[n], a[n])$ except $(q_{F^*}(s^*, a^*), a^*)$, the transmission rate over AWGN channels is determined by the queue length and arrival rate with probability 1. When the queue length is $q_{F^*}(s^*, a^*)$ and the arrival rate is $a^*$, the transmission rate is $s^*$ or $s^* - 1$ with probabilities $f^{*s^*}_{q_{F^*}(s^*,a^*),a^*}$ and $f^{*(s^*-1)}_{q_{F^*}(s^*,a^*),a^*}$, respectively.
[Figure: Average Delay versus Average Power, showing the feasible region, the optimal tradeoff curve, vertices Θ, and the adjacent policies examined for each Θ]
Fig. 3. Demonstration of the algorithm to obtain the optimal delay-power tradeoff curve.
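The randomized rule of Theorem 6 differs from the deterministic one in a single state. The sketch below assumes the thresholds are stored per arrival rate and that $s^*$, $a^*$, and the probability $f^{*s^*}_{q_{F^*}(s^*,a^*),a^*}$ (here `p_star`) are known; all names are hypothetical.

```python
import random
from bisect import bisect_left
from typing import Dict, Sequence


def sample_rate(q: int, a: int, thresholds: Dict[int, Sequence[int]],
                a_star: int, s_star: int, p_star: float,
                rng: random.Random) -> int:
    """Sample the transmission rate of a threshold policy of the form (24).

    thresholds[a][s] = q_F*(s, a). In every state except q = q_F*(s*, a*),
    a = a*, the rate is deterministic (Eq. (23)); in that state the policy
    sends s* packets with probability p_star and s* - 1 packets otherwise.
    """
    th = thresholds[a]
    if a == a_star and q == th[s_star]:
        return s_star if rng.random() < p_star else s_star - 1
    return bisect_left(th, q)  # first s with q <= q_F*(s, a)
```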
C. Algorithm to Obtain the Optimal Tradeoff
We finally develop a threshold-based algorithm to efficiently obtain the optimal delay-power tradeoff curve for AWGN channels. With this curve, the minimized delay can be achieved by the optimal threshold-based policy for any given power constraint, a constraint that practical systems may adjust according to time-varying delay and power-efficiency requirements. Since the optimal tradeoff curve $\mathcal{L}$ is piecewise linear, we first obtain all the vertices of $\mathcal{L}$ and the corresponding threshold-based optimal deterministic policies. As shown in Algorithm 1, we search for the vertex sequence $\{\Theta_0, \Theta_1, \cdots, \Theta_N\}$ iteratively, starting from $\Theta_0$. The vertex $\Theta_0$ in Fig. 3 is obtained by the policy that transmits packets as soon as they arrive at the buffer; we denote this transmission policy by $F^0$.
We next present the iteration procedure in Algorithm 1 that finds the current vertex $\Theta_{n+1}$ from the previous vertex $\Theta_n$. Given the optimal deterministic policy $F^*_n$ for the previous vertex $\Theta_n = (P_n, D_n)$, we can detect the current vertex $\Theta_{n+1} = (P_{n+1}, D_{n+1})$ by examining all the threshold-based deterministic policies adjacent to $F^*_n$. Over all these candidate policies, we obtain the threshold-based optimal deterministic policy $F^*_{n+1}$ for vertex $\Theta_{n+1}$ by exploiting the fact that $\mathcal{L}$ is decreasing and convex. More specifically, starting from vertex $\Theta_n$, the average power-delay pair $\Theta_{n+1}$ generated by $F^*_{n+1}$ has a smaller increase in average delay per unit decrease in average power than the pair generated by any other candidate. Therefore, the current vertex $\Theta_{n+1}$ and optimal policy $F^*_{n+1}$ can be determined by
enumerating all the deterministic policies that are adjacent to $F^*_n$. Further, we narrow down the alternatives for policy $F^*_{n+1}$ by using the threshold-based structure presented in Theorem 6. In Algorithm 1, we denote by $\mathcal{F}_p$ the set of threshold-based policies that generate the previous vertex $\Theta_n$ as their average power-delay pair. By enumerating the adjacent threshold-based deterministic policies of each policy in $\mathcal{F}_p$, we obtain the current vertex $\Theta_{n+1}$ and the corresponding threshold-based optimal policies. During the search, we keep in set $\mathcal{F}_c$ the candidates for the optimal policy, i.e., those attaining the smallest absolute slope on the power-delay plane and, among equal slopes, the smallest decrease in power. As a result, once we have traversed all the optimal policies that generate vertex $\Theta_n$, the threshold-based optimal deterministic policy $F^*_{n+1}$ for the current vertex $\Theta_{n+1}$ is also obtained.
Algorithm 1 Obtain the Optimal Delay-Power Tradeoff for AWGN channels
1: $F \leftarrow F^0$, $n \leftarrow 0$
2: $D_F \leftarrow$ average delay under policy $F$, $P_F \leftarrow$ average power under policy $F$
3: $\mathcal{F}_c \leftarrow [F]$, $D_c \leftarrow D_F$, $P_c \leftarrow P_F$
4: while $\mathcal{F}_c \neq \emptyset$ do
5:   $\mathcal{F}_p \leftarrow$ the set containing an arbitrary policy in $\mathcal{F}_c$, $\mathcal{F}_c \leftarrow \emptyset$, $\bar{\mathcal{F}}_p \leftarrow \emptyset$
6:   $D_p \leftarrow D_c$, $P_p \leftarrow P_c$, slope $\leftarrow +\infty$
7:   while $\mathcal{F}_p \neq \emptyset$ do
8:     $F = \mathcal{F}_p$.pop(0), $\bar{\mathcal{F}}_p$.append($F$), $\tilde{\mathcal{F}}_p \leftarrow \emptyset$
9:     $\mathcal{F}(F) \leftarrow$ the set of all threshold-based deterministic policies satisfying Eq. (23) that differ from $F$ in exactly one threshold
10:    for all $F' \in \mathcal{F}(F)$ do
11:      $D_{F'} \leftarrow$ average delay under $F'$, $P_{F'} \leftarrow$ average power under $F'$
12:      if $D_{F'} = D_p$, $P_{F'} = P_p$, and $F' \notin \bar{\mathcal{F}}_p$ then
13:        $\tilde{\mathcal{F}}_p$.append($F'$)
14:      else if $D_{F'} = D_c$ and $P_{F'} = P_c$ then
15:        $\mathcal{F}_c$.append($F'$)
16:      else if $\frac{D_{F'} - D_p}{P_p - P_{F'}} <$ slope, or $\frac{D_{F'} - D_p}{P_p - P_{F'}} =$ slope and $P_{F'} > P_c$ then
17:        $\mathcal{F}_c \leftarrow [F']$, $D_c \leftarrow D_{F'}$, $P_c \leftarrow P_{F'}$, slope $\leftarrow \frac{D_{F'} - D_p}{P_p - P_{F'}}$
18:      end if
19:    end for
20:    $\mathcal{F}_p \leftarrow \tilde{\mathcal{F}}_p$
21:  end while
22:  $n \leftarrow n + 1$
23:  $\Theta_n = (P_c, D_c)$, $F^*_n = \mathcal{F}_c$.pop(0)
24: end while
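The selection step of Algorithm 1 (lines 10 to 18) can be isolated from the policy enumeration. The sketch below works on already evaluated candidates given as (label, power, delay) triples; unlike the pseudocode, it explicitly discards candidates that do not reduce power, an assumption made here so that the slope is well defined (policies at the previous vertex are handled by the traversal itself). Exact float comparisons mirror the equality tests of the algorithm; a practical implementation would use a tolerance. Names are hypothetical.

```python
import math
from typing import Hashable, Iterable, List, Tuple

Candidate = Tuple[Hashable, float, float]  # (policy label, power P, delay D)


def next_vertex(prev: Tuple[float, float],
                candidates: Iterable[Candidate]) -> Tuple[float, float, List[Hashable]]:
    """Pick the next vertex from candidates adjacent to the previous one.

    Among candidates with lower power than (P_p, D_p), keep those with the
    smallest slope (D - D_p) / (P_p - P); ties in slope go to the larger
    power, as in line 16. Returns (P_c, D_c, labels of policies at it).
    """
    p_prev, d_prev = prev
    best_slope, p_c, d_c = math.inf, math.nan, math.nan
    labels: List[Hashable] = []
    for label, p, d in candidates:
        if p >= p_prev:
            continue  # only moves that save power are considered
        slope = (d - d_prev) / (p_prev - p)
        if (p, d) == (p_c, d_c):
            labels.append(label)  # another policy reaching the current best
        elif slope < best_slope or (slope == best_slope and p > p_c):
            best_slope, p_c, d_c, labels = slope, p, d, [label]
    return p_c, d_c, labels
```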
Once all the vertices of curve $\mathcal{L}$ are detected, we can finally obtain the optimal delay-power tradeoff under an arbitrary power constraint. Given the power constraint $P_{th}$, we construct the corresponding optimal policy as a convex combination of two threshold-based policies. According to Theorem 6, these two threshold-based policies correspond to two adjacent vertices $(P_n, D_n)$ and $(P_{n+1}, D_{n+1})$ that satisfy $(P_n - P_{th})(P_{n+1} - P_{th}) \le 0$. We can find these two adjacent vertices on $\mathcal{L}$ by scanning the sequence $\{\Theta_0, \cdots, \Theta_N\}$. Since the power components of the sequence are decreasing (it starts from $\Theta_0$, the policy that transmits packets immediately), we end the search at the first vertex whose power component is less than $P_{th}$. According to Lemma 1, we then obtain the weight of the convex combination by a binary search over the interval $[0, 1]$. In this way, the optimal delay-power tradeoff can be achieved under an arbitrary power constraint, and the threshold-based optimal transmission policy is efficiently constructed from the threshold-based deterministic policies for the vertices.
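Because Lemma 1 makes the average power of the mixture monotone in $\varepsilon$, the weight that meets $P_{th}$ can be found by bisection. A minimal sketch, assuming a callable `power_of_mix` that evaluates the mixed policy (hypothetical; in practice it would compute the steady state of the mixed MRP):

```python
from typing import Callable


def mixing_weight(power_of_mix: Callable[[float], float], p_th: float,
                  tol: float = 1e-10, max_iter: int = 200) -> float:
    """Bisection for the weight eps in [0, 1] whose mixed policy meets p_th.

    `power_of_mix(eps)` returns the average power of eps * F' + (1 - eps) * F'';
    it is monotone in eps by Lemma 1. Assumes p_th lies between the powers
    at eps = 0 and eps = 1.
    """
    lo, hi = 0.0, 1.0
    increasing = power_of_mix(1.0) >= power_of_mix(0.0)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        too_much = power_of_mix(mid) > p_th
        # Too much power: move towards the lower-power end of the bracket.
        if too_much == increasing:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For example, with the toy function $2 + 3\varepsilon^2$ and $P_{th} = 3.2$, the returned weight is $\sqrt{0.4} \approx 0.632$.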
Furthermore, we analyze the complexity of the proposed algorithm. Since Algorithm 1 is iterative, we first bound the maximum number of iterations that search the adjacent policies of set $\mathcal{F}_p$, and then analyze the complexity of each iteration. As indicated in Algorithm 1, set $\mathcal{F}_p$ is updated in each iteration by changing the transmission rate in one particular state. Meanwhile, two arbitrary deterministic policies differ in the transmission rates of at most as many states as the system has, i.e., $QA$. As a result, the number of iterations is at most $QA$. In each iteration, we calculate the average delay and power for the $AS$ adjacent threshold-based policies of $F$, where the most time-consuming operation for each candidate, the matrix inversion, costs $O(Q^3A^3)$ time. Hence, the time complexity of Algorithm 1 is $O(Q^4A^5S)$. Moreover, since set $\mathcal{F}_p$ dominates the space consumption with at most $QA$ policies, each stored with its $QAS$ probabilities, the space complexity is $O(Q^2A^2S)$.
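Multiplying the per-step counts stated above (and, as the text does, counting the $(Q+1)(A+1)$ system states as $QA$) gives the two bounds:
$$\underbrace{QA}_{\text{iterations}} \cdot \underbrace{AS}_{\text{candidates per iteration}} \cdot \underbrace{O(Q^3A^3)}_{\text{evaluation per candidate}} = O(Q^4A^5S), \qquad \underbrace{QA}_{\text{policies in } \mathcal{F}_p} \cdot \underbrace{QAS}_{\text{probabilities per policy}} = O(Q^2A^2S).$$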
For practical systems, we can formulate a trajectory-sampling version of the algorithm. More specifically, we estimate the average delay and power as the sample means of $\frac{1}{\alpha} q[n]$ and $\rho[n]$, respectively, based on long-term samples of $s[n]$, $q[n]$, and $a[n]$. The optimal delay-power tradeoff can then be obtained for practical systems over AWGN channels without prior knowledge of the arrival statistics.
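A trajectory-sampling estimate of this kind can be sketched in a few lines. The simulator below is illustrative: it assumes the queue evolves as $q[n+1] = q[n] - s[n] + a[n+1]$ (as read off the balance equation (25c)), uses a deterministic threshold rule of the form (23), and estimates $\alpha$ by the sample mean of $a[n]$, so the delay estimate is mean$(q)$/mean$(a)$ by Little's law. The function name and parameters are hypothetical.

```python
import random
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def simulate(gamma: List[List[float]], thresholds: Dict[int, Sequence[int]],
             power: Sequence[float], n_slots: int,
             seed: int = 0) -> Tuple[float, float]:
    """Estimate (average power, average delay in slots) from one trajectory.

    Assumed dynamics: q[n+1] = q[n] - s[n] + a[n+1], where a[n] is a Markov
    chain with transition matrix gamma and s[n] follows the threshold rule
    of Eq. (23). The thresholds must keep q within the buffer.
    """
    rng = random.Random(seed)
    rates = range(len(gamma))
    q, a = 0, 0
    sum_q = sum_a = sum_p = 0.0
    for _ in range(n_slots):
        s = bisect_left(thresholds[a], q)  # first s with q <= q_F(s, a)
        sum_q += q
        sum_a += a
        sum_p += power[s]                  # rho[n] in the text
        a = rng.choices(rates, weights=gamma[a])[0]
        q = q - s + a
    # Little's law: delay = mean queue length / mean arrival rate (alpha).
    return sum_p / n_slots, sum_q / sum_a
```

With thresholds [0, 1, 2, Q] every packet is sent on arrival, so $q[n] = a[n]$ and the estimated delay is exactly one timeslot; raising the thresholds trades power for delay.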
V. OPTIMAL DELAY-POWER TRADEOFF FOR BLOCK FADING CHANNELS
In this section, we extend the optimal delay-power tradeoff to block fading channels. Building on the analysis of the optimal tradeoff for AWGN channels, we first obtain the optimal delay-power tradeoff for block fading channels by converting the CMDP into an LP problem. By solving the equivalent LP problem, we then formulate an optimal delay-power tradeoff curve, which has the same properties as the curve in Section III. We finally present the optimal transmission policies over the fading channel, which have a threshold-type structure on the queue length. For these optimal threshold-based policies, we further show an order relation among the thresholds under different channel states when the power functions satisfy a particular condition.
First, we present the optimal delay-power tradeoff over block fading channels. For the generalized system over the fading channel, we employ the steady-state analysis of Section III-A for each transmission policy $F = \{f^{s}_{q,a,\iota} : \forall q, a, \iota, s\}$. For this purpose, we formulate a Markov reward process for each given policy $F$, through which the average delay and power consumption are expressed in terms of the steady-state probabilities. In this way, we obtain the optimal delay-power tradeoff from an LP problem, in which all the achievable power-delay pairs are expressed in terms of the state-action frequencies $\{x^{s}_{q,a,\iota}\}$:
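$$\begin{aligned}
\min_{\{x^{s}_{q,a,\iota}\}} \quad & \frac{1}{\alpha} \sum_{q=0}^{Q} \sum_{a=0}^{A} \sum_{\iota=1}^{L} \sum_{s=0}^{S} q\, x^{s}_{q,a,\iota} && (25a)\\
\text{s.t.} \quad & \sum_{q=0}^{Q} \sum_{a=0}^{A} \sum_{\iota=1}^{L} \sum_{s=0}^{S} P_{h_\iota}(s)\, x^{s}_{q,a,\iota} \le P_{th} && (25b)\\
& \sum_{q=\max\{q'-a',0\}}^{\min\{q'-a'+S,Q\}} \sum_{a=0}^{A} \sum_{\iota=1}^{L} \sum_{s=0}^{S} \gamma_{a,a'}\, \eta_{\iota'}\, x^{s}_{q,a,\iota}\, \mathbb{1}_{\{s = q + a' - q'\}} = \sum_{s=0}^{S} x^{s}_{q',a',\iota'} \quad \forall\, q', a', \iota' && (25c)\\
& \sum_{q=0}^{Q} \sum_{a=0}^{A} \sum_{\iota=1}^{L} \sum_{s=0}^{S} x^{s}_{q,a,\iota} = 1 && (25d)\\
& x^{s}_{q,a,\iota} \ge 0 \quad \forall\, q, a, \iota, s. && (25e)
\end{aligned}$$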
By solving the LP problem under different power constraints $P_{th}$, we then obtain the optimal delay-power tradeoff curve, which contains all the optimal average power-delay operating points under different power constraints. Applying the same method as in Sections III-B and III-C, we directly obtain the same properties of the optimal tradeoff curve, as follows.
Theorem 7. For block fading channels, the optimal delay-power tradeoff curve is piecewise linear, decreasing, and convex. The vertices of the optimal tradeoff curve are obtained by a series of deterministic transmission policies with unichains. For each pair of adjacent vertices, the corresponding two policies have different transmission rates in only one state.
Proof: The proof follows directly from the method used for Theorems 2 and 3.
As a result, the optimal delay-power tradeoff over the fading channel is obtained by solving LP problem (25), and the optimal policies are generated from its optimal solutions by the extension of Eq. (17) to fading channels. By jointly exploiting Theorem 7 and Lemma 1 over fading channels, the optimal average delays are achieved by the corresponding optimal policies regardless of the initial system state.
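To make the index bookkeeping in (25c) concrete, the sketch below lists, for every state $(q', a', \iota')$ on the right-hand side, the coefficient $\gamma_{a,a'}\eta_{\iota'}$ of each frequency $x^{s}_{q,a,\iota}$ on the left-hand side. It is a Python illustration with hypothetical names; channel states are indexed from 0 instead of 1, and the sparse rows would still have to be combined with (25a), (25b), (25d), and (25e) and passed to an LP solver of one's choice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[int, int, int, int]  # (q, a, iota, s) indexing x^s_{q,a,iota}
Row = Tuple[int, int, int]       # (q', a', iota') on the right-hand side


def balance_rows(Q: int, A: int, S: int, L: int,
                 gamma: List[List[float]], eta: List[float]) -> Dict[Row, Dict[Key, float]]:
    """Left-hand-side coefficients of the balance constraint (25c).

    rows[(q2, a2, i2)][(q, a, i, s)] = gamma[a][a2] * eta[i2] for every term
    of the sum, where s = q + a2 - q2 and q runs over
    [max(q2 - a2, 0), min(q2 - a2 + S, Q)].
    """
    rows: Dict[Row, Dict[Key, float]] = defaultdict(dict)
    for q2 in range(Q + 1):
        for a2 in range(A + 1):
            for i2 in range(L):
                for q in range(max(q2 - a2, 0), min(q2 - a2 + S, Q) + 1):
                    s = q + a2 - q2  # rate that moves q to q2 given arrival a2
                    for a in range(A + 1):
                        for i in range(L):
                            rows[(q2, a2, i2)][(q, a, i, s)] = gamma[a][a2] * eta[i2]
    return dict(rows)
```

As a sanity check, when $s \le q$ and $q - s + A \le Q$, the coefficients of a single $x^{s}_{q,a,\iota}$ summed over all rows equal 1, since each row of $\Gamma$ and the vector $\eta$ sum to 1.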
Based on the analysis of the optimal tradeoff curve, we finally show that the optimal delay can be achieved by optimal threshold-based policies over the fading channel. Similarly to Section IV, we first present the optimal deterministic threshold-based policies for the vertices of the tradeoff curve, applying the method of Theorem 5 to the CMDP over the fading channel. Then, for the other points on the optimal tradeoff curve, we show the threshold-based structure of the optimal policies by expressing them as the convex combination of two adjacent deterministic threshold-based policies, as in Theorem 6. The resulting optimal threshold-based policies are given in the following theorem.
Theorem 8. For the optimal policy $F^*$, there exist $(A+1) \times (S+1) \times L$ thresholds $q_{F^*}(s, a, \iota)$ with $0 \le q_{F^*}(0, a, \iota) \le \cdots \le q_{F^*}(S, a, \iota) \le Q$ for each arrival rate $a$, $0 \le a \le A$, and channel state index $\iota$, $1 \le \iota \le L$. Given these thresholds, the optimal policy $F^*$ satisfies
$$\begin{cases}
f^{*s}_{q,a,\iota} = 1, & q_{F^*}(s-1, a, \iota) < q \le q_{F^*}(s, a, \iota),\ a \neq a^* \text{ or } s \neq s^* \text{ or } \iota \neq \iota^*, \\
f^{*s}_{q,a,\iota} = 1, & q_{F^*}(s-1, a, \iota) < q < q_{F^*}(s, a, \iota),\ a = a^* \text{ and } s = s^* \text{ and } \iota = \iota^*, \\
f^{*s}_{q,a,\iota} + f^{*(s-1)}_{q,a,\iota} = 1, & q = q_{F^*}(s, a, \iota),\ a = a^* \text{ and } s = s^* \text{ and } \iota = \iota^*, \\
f^{*s}_{q,a,\iota} = 0, & \text{otherwise},
\end{cases} \quad (26)$$
where the specific $s^*$, $a^*$, and $\iota^*$ are determined by $F^*$, and $q_{F^*}(-1, a, \iota) = -1$ for each $a$ and $\iota$.
Proof: The proof follows directly from the method used for Theorems 5 and 6.
With the threshold-based structure of the optimal policies, we can efficiently determine the transmission rate of the adaptive transmitter over fading channels. Given the current arrival rate $a[n] = a$ and channel state $h[n] = h_\iota$, the transmission rate is determined by comparing the current queue length with the thresholds $\{q_{F^*}(s, a, \iota) : 0 \le s \le S\}$. As a result, the optimal delay-power tradeoff for fading channels can also be obtained by an algorithm similar to Algorithm 1. Moreover, we show an order relation among the thresholds under different channel states in the following theorem.
Theorem 9. The thresholds $q_{F^*}(s, a, \iota)$, $1 \le \iota \le L$, of the optimal policy $F^*$ satisfy
$$q_{F^*}(s, a, \iota^+) \le q_{F^*}(s, a, \iota^-), \quad \forall\, 1 \le \iota^- < \iota^+ \le L, \quad (27)$$
for each transmission rate $s$ and arrival rate $a$, when the power functions $P_{h_\iota}(s)$, $1 \le \iota \le L$, satisfy
$$P_{h_{\iota^+}}(s^+) - P_{h_{\iota^+}}(s^-) \le P_{h_{\iota^-}}(s^+) - P_{h_{\iota^-}}(s^-), \quad (28)$$
where $0 \le s^- < s^+ \le S$.
Proof: See Appendix D.
According to the order relation of thresholds in Eq. (27), the optimal threshold-based policies employ a greater transmission rate under a better channel condition whenever the condition in Eq. (28) holds. In fact, in a typical communication system, the power consumption for a given transmission rate is inversely proportional to the squared amplitude of the channel coefficient, i.e.,
$$\frac{P_{h_{\iota^+}}(s)}{|h_{\iota^-}|^2} = \frac{P_{h_{\iota^-}}(s)}{|h_{\iota^+}|^2}.$$
As a result, with the channel states indexed in increasing order of $|h_\iota|$ and the power increasing in the rate, condition (28) can be checked directly in such a typical system, so the order relation of the thresholds of the optimal policies holds.
Fig. 4. Optimal Delay-Power Tradeoff Curves
VI. NUMERICAL RESULTS
In this section, we present the numerical results to validate the optimal delay-power tradeoff
for the adaptive transmitter with Markov random arrivals. In a practical scenario, we consider a maximum transmission rate of $S = 3$, for which we employ one of three modulations, BPSK, QPSK, or 8-PSK, to transmit 1, 2, or 3 packets in a timeslot, respectively. We assume that each packet contains 10,000 bits and that a timeslot lasts 10 ms. With a bandwidth of 1 MHz and a one-sided noise power spectral density of $N_0 = -150$ dBm/Hz, the transmission powers over AWGN channels that provide a bit error rate of $10^{-5}$ are $P(0) = 0$ W, $P(1) = 9.0 \times 10^{-12}$ W, $P(2) = 18.2 \times 10^{-12}$ W, and $P(3) = 59.5 \times 10^{-12}$ W. Moreover, we consider a specific class of arrival processes. For each arrival process, we determine the transition matrix $\Gamma = [\gamma_{a,a'}]$ by a constant $\psi$ and a vector $\zeta = [\zeta_0, \zeta_1, \cdots, \zeta_A]^T$. In particular, each element $\gamma_{a,a'}$ of $\Gamma$ is defined as
$$\gamma_{a,a'} = \frac{1 - \zeta_a}{A} + \frac{(A+1)\zeta_a - 1}{A}\, \mathbb{1}_{\{a' = (a+\psi) \bmod (A+1)\}}. \quad (29)$$
Equivalently, the process moves from $a$ to $(a+\psi) \bmod (A+1)$ with probability $\zeta_a$ and to each of the other $A$ rates with probability $(1-\zeta_a)/A$. As a result, each arrival process is constructed from a tuple $(\zeta, \psi)$ with $\zeta_a \in [0, 1]$ and $\psi \in \{-A, -A+1, \cdots, A\}$.
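The per-rate powers quoted above can be reproduced approximately from textbook PSK bit-error-rate expressions. The section does not state which expressions were used, so the sketch assumes $P = (E_b/N_0) \cdot N_0 \cdot R_b$ with $N_0 = -150$ dBm/Hz $= 10^{-18}$ W/Hz, $R_b = 10^6$ bit/s per packet per slot, the exact BER $Q(\sqrt{2E_b/N_0})$ for BPSK and QPSK, and the nearest-neighbour approximation for 8-PSK. Function names are illustrative.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def psk_ber(k: int, ebn0: float) -> float:
    """Bit error rate of Gray-coded 2^k-PSK at Eb/N0 = ebn0 (linear scale).

    Exact for BPSK and QPSK (k = 1, 2); for k >= 3 the nearest-neighbour
    approximation (2/k) Q(sqrt(2 k Eb/N0) sin(pi / 2^k)) is assumed.
    """
    if k <= 2:
        return q_func(math.sqrt(2.0 * ebn0))
    return (2.0 / k) * q_func(math.sqrt(2.0 * k * ebn0) * math.sin(math.pi / 2 ** k))


def required_power(k: int, target_ber: float = 1e-5, n0: float = 1e-18,
                   bits_per_packet: float = 1e4, slot_s: float = 1e-2) -> float:
    """Transmit power (W) for k packets per slot: P = (Eb/N0) * N0 * Rb."""
    lo, hi = 0.0, 1e3  # bracket for Eb/N0; the BER decreases in Eb/N0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psk_ber(k, mid) > target_ber:
            lo = mid
        else:
            hi = mid
    bit_rate = k * bits_per_packet / slot_s  # 10^6 bit/s per packet
    return hi * n0 * bit_rate


for k in (1, 2, 3):
    print(k, f"{required_power(k):.3e} W")
```

Under these assumptions the sketch gives roughly 9.1, 18.2, and $59.5 \times 10^{-12}$ W for one, two, and three packets per slot, close to the values listed above.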
First, Fig. 4 presents the optimal delay-power tradeoff curves for AWGN channels under different average arrival rates. We validate the theoretical results by Monte-Carlo simulation. We set the maximum arrival rate and the maximum transmission rate to 3 and the buffer size to 7. The optimal delay-power tradeoff curves are presented for three different arrival processes, characterized as $(\zeta_i, 0)$, $i = 1, 2, 3$, with $\zeta_1 = [0.7, 0.7, 0.5, 0.5]$, $\zeta_2 = [0.5, 0.5, 0.5, 0.5]$, and $\zeta_3 = [0.3, 0.3, 0.5, 0.5]$. The average rates of the three arrival processes are 1.25, 1.50, and 1.67, respectively. As shown in Fig. 4, the optimal delay-power tradeoff obtained by Algorithm 1 and by solving the LP problem matches the Monte-Carlo simulation results. On each optimal tradeoff curve, the optimal average delay decreases as the average power consumption increases. Further, a close observation shows that each curve is piecewise linear and convex, which confirms Theorem 2. Comparing the curves, the optimal delay-power tradeoff clearly depends on the average arrival rate. When the power constraint is $P_{th} = 18 \times 10^{-12}$ W, the average delay under $(\zeta_2, 0)$ is 47% lower than that under $(\zeta_3, 0)$. To achieve the average delay $D = 1.2 \times 10$ ms, arrival processes $(\zeta_2, 0)$ and $(\zeta_3, 0)$ require greater power consumption, namely 126% and 143% of that for $(\zeta_1, 0)$, respectively.
(a) The average transmission rates for the optimal policy (b) The thresholds for the optimal policy
Fig. 5. Typical optimal threshold-based policy
Then, we turn our attention to the threshold-based structure of the optimal cross-layer transmission policy over AWGN channels. Fig. 5 presents a typical threshold-based optimal policy $F^*$ for the same system configuration as in Fig. 4, with the arrival process $(\zeta_2, 0)$ and $P_{th} = 14.82 \times 10^{-12}$ W. Fig. 5(a) shows the average transmission rates under different queue lengths and arrival rates, and Fig. 5(b) shows the threshold-based structure, where the thresholds are drawn as red solid lines. Consistent with the order relation of the thresholds in Theorem 6, the optimal policy $F^*$ uses a greater transmission rate for a longer queue. Following Theorem 6, this policy is a convex combination of two adjacent deterministic policies, both of which have the threshold-based structure of Theorem 5. As a result, the transmission rates under $F^*$ are deterministic in all system states except one, in which the arrival rate is 3 and the queue length is 6.
Fig. 6. Optimal Delay-Power Tradeoff Curves under different arrival processes
We next show the impact of different patterns of Markov arrivals on the optimal delay-power tradeoff over AWGN channels, even when these random arrivals have the same average rate and covariance. In particular, we focus on three arrival patterns, denoted by A1, A2, and A3, whose transition matrices are given by $(\kappa_1 \mathbf{1}, 1)$, $(\kappa_2 \mathbf{1}, 0)$, and $(\kappa_3 \mathbf{1}, -1)$, respectively, where $\kappa_i \in [0, 1]$ for $i = 1, 2, 3$ and $\mathbf{1}$ is the all-ones vector. Under all three patterns, the random arrivals have the same steady-state distribution, in which every arrival rate has probability $\frac{1}{A+1}$. Therefore, the average arrival rate of each arrival process equals $\frac{A}{2}$. To obtain the same covariance $\mathrm{cov}(a[n], a[n+1])$ under the three arrival patterns, we set $\kappa_1 = \kappa_3 = \kappa$ and $\kappa_2 = \frac{3 - 2\kappa}{10}$, under which $\mathrm{cov}(a[n], a[n+1]) = \frac{1 - 4\kappa}{12}$.
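Eq. (29) and the two properties used in this section, namely the average arrival rates quoted for $\zeta_1$, $\zeta_2$, $\zeta_3$ and the equal lag-1 covariances of patterns A1, A2, and A3, can be checked numerically. The sketch below is illustrative (names are hypothetical), assumes $A = 3$ as in Fig. 4, and computes the stationary distribution by power iteration on the lazy chain.

```python
from typing import List, Sequence, Tuple


def arrival_matrix(zeta: Sequence[float], psi: int) -> List[List[float]]:
    """Gamma from Eq. (29): from rate a, move to (a + psi) mod (A + 1) with
    probability zeta[a], and to each of the other A rates with (1 - zeta[a]) / A."""
    A = len(zeta) - 1
    gamma = []
    for a, z in enumerate(zeta):
        target = (a + psi) % (A + 1)
        gamma.append([(1 - z) / A + ((A + 1) * z - 1) / A * (1.0 if b == target else 0.0)
                      for b in range(A + 1)])
    return gamma


def stationary(gamma: List[List[float]], iters: int = 5000) -> List[float]:
    """Stationary distribution via power iteration on the lazy chain (I + Gamma)/2."""
    n = len(gamma)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [0.5 * pi[j] + 0.5 * sum(pi[i] * gamma[i][j] for i in range(n))
              for j in range(n)]
    return pi


def mean_and_lag1_cov(gamma: List[List[float]]) -> Tuple[float, float]:
    """(E[a[n]], cov(a[n], a[n+1])) for the stationary arrival process."""
    pi = stationary(gamma)
    n = len(pi)
    mean = sum(a * pi[a] for a in range(n))
    joint = sum(a * b * pi[a] * gamma[a][b] for a in range(n) for b in range(n))
    return mean, joint - mean * mean
```

It returns average rates of 1.25, 1.50, and 5/3 for $(\zeta_1, 0)$, $(\zeta_2, 0)$, and $(\zeta_3, 0)$, and for each $\kappa$ the three patterns yield the same covariance.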
As shown in Fig. 6, we present the optimal delay-power tradeoff curves for the three arrival patterns with parameter $\kappa$ equal to 0.1, 0.25, and 0.7¹. In particular, arrival processes A1 and A2 attain a lower average delay as $\kappa$ increases, i.e., as the covariance decreases. When $P_{th} = 17 \times 10^{-12}$ W, the average delay under A1 with $\kappa = 0.7$ is 37% and 43% lower than that with $\kappa = 0.25$ and $\kappa = 0.1$, respectively. For arrival process A2, the corresponding reductions are 11% and 14%. However, for arrival process A3, the order of the average delays under the three values of $\kappa$ changes as the average power constraint varies.
¹ When $\kappa$ equals 0.25, the three arrival patterns yield the same arrival process, so the corresponding curves coincide.
Fig. 7. Optimal Delay-Power Tradeoff Curves
In Fig. 7, we illustrate how Algorithm 1 obtains the optimal delay-power tradeoff for AWGN channels. To simplify the figure, we assume Q = 4 and S = A = 2, and consider the arrival process given by ([0.6, 0.6]^T, 1). The power-delay pairs of the deterministic policies are marked by 'o', and the points of each two adjacent policies are connected by black dashed lines. Starting from the vertex Θ_0 and its policy F^0, Algorithm 1 searches for the next vertex Θ_n among the threshold-based deterministic policies that are adjacent to the previously found optimal policies in set F_p. The power-delay pairs of the policies examined by Algorithm 1 are marked by '×' and connected to the adjacent vertices on the optimal tradeoff curve L by red dashed lines. As shown in Fig. 7, Algorithm 1 obtains the optimal delay-power tradeoff efficiently, since it examines only a few of all the deterministic policies.
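For small instances, the output of Algorithm 1 can be cross-checked against a brute-force construction. Because the tradeoff curve is convex and piecewise linear with vertices generated by deterministic policies (Theorem 3), it can be computed as the decreasing part of the lower convex hull of their power-delay pairs. The Python sketch below does this with Andrew's monotone-chain algorithm and interpolates the curve at a power budget. It is not Algorithm 1, which avoids enumerating all deterministic policies, and the pairs in the example are made-up numbers rather than the points of Fig. 7.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def tradeoff_vertices(pairs):
    """Vertices of the decreasing part of the lower convex hull of (power, delay) pairs."""
    pts = sorted(set(pairs))  # ascending power, ties broken by ascending delay
    hull = []
    for p in pts:
        # Andrew's monotone chain: drop points that do not make a strict left turn
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    # Keep the prefix on which delay strictly decreases; by convexity,
    # once delay stops decreasing it never decreases again.
    vertices = [hull[0]]
    for p in hull[1:]:
        if p[1] < vertices[-1][1]:
            vertices.append(p)
    return vertices


def min_delay(vertices, p_th):
    """Linear interpolation of the tradeoff curve at a power budget p_th."""
    if p_th < vertices[0][0]:
        raise ValueError("power budget below the minimum achievable power")
    for (p0, d0), (p1, d1) in zip(vertices, vertices[1:]):
        if p_th <= p1:
            t = (p_th - p0) / (p1 - p0)  # mixing weight between adjacent vertices
            return d0 + t * (d1 - d0)
    return vertices[-1][1]  # budget beyond the minimum-delay vertex


# Made-up (power, delay) pairs, not data from Fig. 7
pairs = [(1.0, 5.0), (2.0, 3.0), (2.5, 4.0), (3.0, 2.5), (3.0, 3.0), (4.0, 1.0), (5.0, 1.2)]
curve = tradeoff_vertices(pairs)
print(curve)                 # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
print(min_delay(curve, 3.0))  # 2.0
```

Between two adjacent vertices, the interpolated delay corresponds to mixing the two vertex policies, which Lemma 1 shows traces the connecting segment.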
We finally show the optimal delay-power tradeoff for block fading channels. In Fig. 8, we consider an L-state block fading channel with L = 4. The amplitudes of the four channel states, $|h_\iota|$, $\iota = 1, \dots, 4$, are 0.314, 2.50, 3.54, and 5.00, with corresponding probabilities $\eta_\iota$ equal to 0.394, 0.232, 0.239, and 0.135. The power consumption under channel state $h_\iota$ is given by $P_{h_\iota}(s) = \frac{P(s)}{|h_\iota|^2}$ for each s and ι. Fig. 8(a) shows the optimal delay-power tradeoff curves for different arrival processes, using the same system configuration as in Fig. 4. Fig. 8(b) shows the average transmission rates under an optimal threshold-based policy when the current arrival rate a[n] equals 2. The threshold-based structure of the optimal policy is visible in Fig. 8(b), where the thresholds on the queue length are drawn as red solid lines. Fig. 8(b) also illustrates the order relation of the thresholds under different channel states given by Theorem 9: for a given queue length, a greater transmission rate is employed under a better channel condition.
(a) The optimal delay-power tradeoff curve over the block fading channel
(b) The average transmission rate under the optimal policy with the arrival rate given as 2
Fig. 8. Optimal delay-power tradeoff over fading channel
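The direction of the effect behind Theorem 9 can be illustrated with a one-step toy calculation. The Python sketch below picks, for a fixed queue length, the rate that minimizes µP(s)/|h|² plus a cost-to-go, and evaluates it for the four amplitudes listed above. The rate-power function P(s) = 2^s − 1, the convex cost-to-go V(r) = r², the queue length q = 4, and the assumption that the cost-to-go does not depend on the current channel state are all hypothetical; the sketch does not reproduce Fig. 8. It only shows that a smaller 1/|h|² makes higher rates cheaper, so the chosen rate is nondecreasing in |h|.

```python
def one_step_rate(q, h_amp, mu, P, V, S):
    """Rate s in {0, ..., min(q, S)} minimizing mu * P(s) / |h|^2 + V(q - s).

    One-step toy only: V is a hypothetical convex cost-to-go assumed not to
    depend on the current channel state.
    """
    return min(range(min(q, S) + 1), key=lambda s: mu * P(s) / h_amp ** 2 + V(q - s))


def P(s):
    return 2 ** s - 1  # hypothetical convex rate-power function


def V(r):
    return r ** 2  # hypothetical convex cost-to-go


amps = [0.314, 2.50, 3.54, 5.00]  # |h_iota| from the text
rates = [one_step_rate(q=4, h_amp=h, mu=1.0, P=P, V=V, S=4) for h in amps]
print(rates)  # [0, 3, 4, 4]
```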
VII. CONCLUSION
In this paper, we have obtained the optimal delay-power tradeoff for transmission over a wireless link under Markov arrivals. We formulated the problem as a CMDP that jointly considers the queue length, arrival rate, and channel state to minimize the average delay under an average power constraint. To obtain the optimal delay-power tradeoff, we derived an equivalent LP problem based on the steady-state analysis of the Markov reward process. By varying the power constraint in this LP problem, we showed that the optimal delay-power tradeoff curve is decreasing, convex, and piecewise linear. Based on these geometric properties, we also presented the optimal adaptive transmission policies for the optimal power-delay pairs on the tradeoff curve. Further, using Lagrangian relaxation, we showed that the optimal policies have a threshold-based structure in the queue length. Exploiting this structure, we developed a threshold-based algorithm that efficiently obtains the optimal delay-power tradeoff for practical communications.
APPENDIX A
PROOF OF THEOREM 3
The proof falls into three parts. We first show that the policies generating the vertices of curve L are unichain. Suppose, to obtain a contradiction, that there exists a policy F for a vertex of L under which a multichain with I closed classes is generated. Then the state-action frequencies $\{x^s_{q,a} = \pi_F(q,a) f^s_{q,a}\}$ vary with the initial state. Moreover, we can construct a series of unichain policies $F^i$, $i = 1, 2, \dots, I$, where $F^i$ employs the same transmission rates as F in every state of the ith recurrent closed class. The existence of $F^i$ is guaranteed by the communicating property of the CMDP [27, Section 8.3.1]. As a result, $F^i$ and F yield the same steady-state distribution when the system starts from the ith recurrent closed class. Hence, the state-action frequencies $\{x^s_{q,a}\}$ generated by F can be expressed as a convex combination of the state-action frequencies of the policies $F^i$, where the convex multipliers are determined by the initial state. Consequently, $\{x^s_{q,a}\}$ is not a vertex of G. Since set R is the projection of G and contains L, every vertex of L must be the projection of a vertex of G, which yields a contradiction.
Next, we show that all vertices of curve L are obtained by deterministic policies. For this purpose, we follow an argument similar to that of [29, Theorem 4.2], which shows that the vertices of G are generated by deterministic policies if all the considered policies are unichain. Combined with the first part, the claim follows directly from this theorem.
Finally, we characterize the policies for two adjacent vertices on L. We start with the observation that the edge connecting two adjacent vertices on curve L is the projection of an edge of G whose vertices are generated by deterministic policies. The deterministic policies for two adjacent vertices of G differ in only one state. The conclusion also holds in the degenerate case where the edges connecting a series of adjacent vertices of L are collinear. This completes the proof.
APPENDIX B
PROOF OF LEMMA 1
We begin by recalling that policy F assigns the same transmission-rate probabilities as policies F′ and F″ in all states except state (q, a). Whenever the system is in state (q, a), F follows F′ with probability ε and F″ with probability 1 − ε. Since F′ and F″ are unichain, the system reaches (q, a) within finite time from any other state under both F′ and F″. Because F acts as either F′ or F″ in every state, the same holds under F: starting from any state, the system reaches (q, a) within finite time. Therefore, the Markov chain under F has only one recurrent closed class, i.e., policy F is unichain.
Next, we show that the average power and delay under F are convex combinations of those under F′ and F″, using the relationship among the state-action frequencies of the three policies. To this end, we first express $x^s_{q,a}$ under F by the method in [28, Eq. 4.3.8] as
$$x^s_{q,a} = \pi_F(q,a) f^s_{q,a} = \frac{\mathbb{E}^F_{q,a}\{N(q,a,s)\}}{\mathbb{E}^F_{q,a}\{T\}},$$
where $T = \inf\{n \ge 1 : q[n] = q, a[n] = a\}$ and $N(q,a,s) = \sum_{n=0}^{T-1} \mathbf{1}\{q[n]=q, a[n]=a, s[n]=s\}$.
Since the average power and delay in Eqs. (16a) and (16b) are linear functions of $x^s_{q,a}$, we only need to show the relationship among $x^s_{q,a}$, $x'^s_{q,a}$, and $x''^s_{q,a}$, where $x'^s_{q,a} = \pi_{F'}(q,a) f'^s_{q,a}$ and $x''^s_{q,a} = \pi_{F''}(q,a) f''^s_{q,a}$. Based on the above expression for $x^s_{q,a}$, we have
$$x^s_{q,a} = \frac{\varepsilon\,\mathbb{E}^{F'}_{q,a}\{N(q,a,s)\} + (1-\varepsilon)\,\mathbb{E}^{F''}_{q,a}\{N(q,a,s)\}}{\varepsilon\,\mathbb{E}^{F'}_{q,a}\{T\} + (1-\varepsilon)\,\mathbb{E}^{F''}_{q,a}\{T\}} = \frac{\varepsilon\,\mathbb{E}^{F'}_{q,a}\{T\}\,x'^s_{q,a} + (1-\varepsilon)\,\mathbb{E}^{F''}_{q,a}\{T\}\,x''^s_{q,a}}{\varepsilon\,\mathbb{E}^{F'}_{q,a}\{T\} + (1-\varepsilon)\,\mathbb{E}^{F''}_{q,a}\{T\}}, \tag{30}$$
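where the first equality follows from the above analysis of the transmission process under F, and the second from $\mathbb{E}^{F'}_{q,a}\{N(q,a,s)\} = \mathbb{E}^{F'}_{q,a}\{T\}\,x'^s_{q,a}$ (and likewise for F″). By defining $\varepsilon' = \frac{\varepsilon\,\mathbb{E}^{F'}_{q,a}\{T\}}{\varepsilon\,\mathbb{E}^{F'}_{q,a}\{T\} + (1-\varepsilon)\,\mathbb{E}^{F''}_{q,a}\{T\}}$, we finally have $x^s_{q,a} = \varepsilon' x'^s_{q,a} + (1-\varepsilon') x''^s_{q,a}$. Therefore, the average power and delay under policy F are $P_F = \varepsilon' P_{F'} + (1-\varepsilon') P_{F''}$ and $D_F = \varepsilon' D_{F'} + (1-\varepsilon') D_{F''}$, respectively. An easy computation shows that ε′ is monotonically increasing in ε for given $\mathbb{E}^{F'}_{q,a}\{T\}$ and $\mathbb{E}^{F''}_{q,a}\{T\}$. Moreover, policy F reduces to F′ when ε = 1 and to F″ when ε = 0, and the corresponding ε′ equals 1 and 0, respectively.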
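The claim that ε′ increases with ε follows from differentiating its definition. Writing T′ and T″ as shorthand (introduced only here) for $\mathbb{E}^{F'}_{q,a}\{T\}$ and $\mathbb{E}^{F''}_{q,a}\{T\}$, which are at least 1 by the definition of T and assumed finite:

```latex
\varepsilon' = \frac{\varepsilon T'}{\varepsilon T' + (1-\varepsilon)T''},
\qquad
\frac{d\varepsilon'}{d\varepsilon}
= \frac{T'\left(\varepsilon T' + (1-\varepsilon)T''\right) - \varepsilon T'\,(T' - T'')}
       {\left(\varepsilon T' + (1-\varepsilon)T''\right)^{2}}
= \frac{T'\,T''}{\left(\varepsilon T' + (1-\varepsilon)T''\right)^{2}} > 0 .
```

Since ε′ is continuous in ε with ε′ = 0 at ε = 0 and ε′ = 1 at ε = 1, every point of the segment between $(P_{F''}, D_{F''})$ and $(P_{F'}, D_{F'})$ is achieved by some ε ∈ [0, 1].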
APPENDIX C
PROOF OF THEOREM 5
The main idea of the proof is to construct the optimal policies for the vertices of the optimal tradeoff curve L using the value iteration algorithm. As illustrated in Fig. 2, the vertex (P, D) is the unique optimal power-delay pair of the Lagrangian relaxation problem (22) for a specific µ. The same optimal power-delay pair is obtained by the primal problem (9) with Pth = P. To obtain the optimal policies for the vertices, we first formulate the MDP that minimizes $D_F + \mu P_F$. According to [27, Theorem 9.1.8], the optimal deterministic policy F∗ can be obtained by value iteration, as presented in Algorithm 2, where $\omega^{(m+1)}(q,a,s)$ is defined as
$$\omega^{(m+1)}(q,a,s) = \frac{q}{\alpha} + \mu P(s) + \sum_{a'=0}^{A} \gamma_{a,a'}\,\nu^{(m)}(q-s+a', a'). \tag{31}$$
Moreover, the optimal policy F∗ generated by Algorithm 2 is unichain.

Algorithm 2 Value Iteration Algorithm for Markov Decision Processes
m ← 0
for all q and a do
    ν^(0)(q, a) ← arbitrary value convex in q (e.g., 0)   // Initialization
end for
repeat
    for all q and a do   // Policy Improvement
        s^(m+1)(q, a) ← arg min_{s∈S(q)} {ω^(m+1)(q, a, s)}
    end for
    for all q and a do   // Policy Evaluation
        ν^(m+1)(q, a) ← ω^(m+1)(q, a, s^(m+1)(q, a))
    end for
    m ← m + 1
until s^(m)(q, a) = s^(m−1)(q, a) holds for all q and a
s∗(q, a) ← s^(m)(q, a) for all q and a

Next, we show the threshold-based structure of policy F∗. Since the optimal policy is generated by an iterative process, we prove the threshold-based structure by induction on m. In particular, we first show that thresholds exist for the deterministic policy $F^{(m+1)} = \{s^{(m+1)}(q,a) : \forall q, a\}$, assuming that $\nu^{(m)}(q,a)$ is convex in q, i.e.,
$$\nu^{(m)}(q-1,a) + \nu^{(m)}(q+1,a) \ge 2\nu^{(m)}(q,a). \tag{32}$$
To this end, it suffices to show that the transmission rate $s^{(m+1)}(q+1,a)$ equals s∗ or s∗ + 1 whenever $s^{(m+1)}(q,a)$ equals s∗. Then, for a given arrival rate a, the transmission rate $s^{(m+1)}(q,a)$ under the deterministic policy $F^{(m+1)}$ is nondecreasing in the queue length q. As a result, the thresholds $q_{F^{(m+1)}}(s,a)$ exist, and policy $F^{(m+1)}$ satisfies Eq. (23). With $s^* = \arg\min_{s\in\mathcal{S}(q)} \omega^{(m+1)}(q,a,s)$, a sufficient condition for the existence of the thresholds is
$$\omega^{(m+1)}(q+1,a,s^*) \le \omega^{(m+1)}(q+1,a,s^*-\delta), \tag{33}$$
$$\omega^{(m+1)}(q+1,a,s^*+1) \le \omega^{(m+1)}(q+1,a,s^*+1+\delta), \tag{34}$$
where δ ≥ 0. Since s∗ minimizes $\omega^{(m+1)}(q,a,s)$ over $s \in \mathcal{S}(q)$, Eqs. (33) and (34) are implied by
$$\omega^{(m+1)}(q,a,s^*-\delta) + \omega^{(m+1)}(q+1,a,s^*) \le \omega^{(m+1)}(q,a,s^*) + \omega^{(m+1)}(q+1,a,s^*-\delta), \tag{35}$$
$$\omega^{(m+1)}(q,a,s^*+\delta) + \omega^{(m+1)}(q+1,a,s^*+1) \le \omega^{(m+1)}(q,a,s^*) + \omega^{(m+1)}(q+1,a,s^*+1+\delta), \tag{36}$$
respectively. Expanding every term of the two inequalities according to Eq. (31), both follow immediately from the convexity of P(s) and $\nu^{(m)}(q,a)$.
We next show the convexity of $\nu^{(m+1)}(q,a)$ based on the threshold-based structure of $F^{(m+1)}$. By the definition of $\nu^{(m+1)}(q,a)$, its convexity is equivalent to
$$\omega^{(m+1)}(q+1,a,\bar{s}) + \omega^{(m+1)}(q-1,a,\underline{s}) \ge 2\,\omega^{(m+1)}(q,a,s^*), \tag{37}$$
where $\bar{s} = \arg\min_{s\in\mathcal{S}(q+1)} \omega^{(m+1)}(q+1,a,s)$ and $\underline{s} = \arg\min_{s\in\mathcal{S}(q-1)} \omega^{(m+1)}(q-1,a,s)$. By the threshold-based structure, $\bar{s} \in \{s^*, s^*+1\}$ and $s^* \in \{\underline{s}, \underline{s}+1\}$. We first present a sufficient condition for Eq. (37):
$$\omega^{(m+1)}(q+1,a,\bar{s}) + \omega^{(m+1)}(q-1,a,\underline{s}) \ge \omega^{(m+1)}(q,a,s^*) + \omega^{(m+1)}(q,a,s'), \tag{38}$$
where s′ is an arbitrary transmission rate in $\mathcal{S}(q)$; sufficiency follows from $\omega^{(m+1)}(q,a,s^*) = \min_{s\in\mathcal{S}(q)} \omega^{(m+1)}(q,a,s)$. We then verify this condition in two cases, $s^* = \underline{s}$ and $s^* = \underline{s}+1$. When $s^* = \underline{s}$, we set $s' = \bar{s}$; expanding every term in Eq. (38), the condition follows from the convexity of $\nu^{(m)}(q,a)$. When $s^* = \underline{s}+1$, we set $s' = \bar{s}-1$, under which the condition follows from the convexity of P(s). Since the initial $\nu^{(0)}(q,a)$ is convex in q, induction on m shows that every deterministic policy $F^{(m)}$ satisfies the threshold-based structure expressed in Eq. (23).
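The induction above can be checked on a toy instance. The following Python sketch implements the iteration of Algorithm 2 with ω from Eq. (31), under assumptions that are ours rather than the paper's: the queue evolves as q − s + a′, the feasible set S(q) is the range of rates that avoids buffer overflow, ν is renormalized by subtracting ν(0, 0) at every step (relative value iteration, which leaves each arg min unchanged and keeps the values bounded), and iteration stops when the span of successive value differences is small rather than when the policy first repeats, since a policy can repeat before the values settle. The instance (i.i.d. arrivals on {0, 1, 2}, P(s) = 2^s − 1, µ = α = 1) is hypothetical.

```python
from itertools import product


def relative_value_iteration(Q, S, A, gamma, P, mu, alpha, tol=1e-10, max_iter=100_000):
    """Sketch of Algorithm 2 for minimizing D + mu * P over states (q, a).

    Assumed (hypothetical) model details: the next state is (q - s + a', a')
    with a' ~ gamma[a][a'], and S(q) = {max(0, q + A - Q), ..., min(q, S)},
    so the queue never exceeds Q.
    """
    states = list(product(range(Q + 1), range(A + 1)))

    def feasible(q):
        return range(max(0, q + A - Q), min(q, S) + 1)

    def omega(q, a, s):
        # Eq. (31) evaluated with the current iterate nu (i.e., nu^(m))
        return q / alpha + mu * P(s) + sum(
            gamma[a][a2] * nu[(q - s + a2, a2)] for a2 in range(A + 1))

    nu = {st: 0.0 for st in states}  # nu^(0) = 0 is convex in q
    policy = {}
    for _ in range(max_iter):
        # Policy improvement; min() breaks ties toward the smallest rate
        policy = {(q, a): min(feasible(q), key=lambda s: omega(q, a, s)) for q, a in states}
        new_nu = {(q, a): omega(q, a, policy[(q, a)]) for q, a in states}
        ref = new_nu[(0, 0)]
        new_nu = {st: v - ref for st, v in new_nu.items()}  # relative values
        diffs = [new_nu[st] - nu[st] for st in states]
        nu = new_nu
        if max(diffs) - min(diffs) < tol:  # span stopping rule
            break
    return policy


gamma = [[1 / 3] * 3 for _ in range(3)]  # hypothetical i.i.d. arrivals on {0, 1, 2}
policy = relative_value_iteration(Q=4, S=2, A=2, gamma=gamma,
                                  P=lambda s: 2 ** s - 1, mu=1.0, alpha=1.0)
for a in range(3):
    print(a, [policy[(q, a)] for q in range(5)])  # rates for q = 0, ..., 4
```

In this instance, the returned rates are 0, 1, 2, 2, 2 for q = 0, ..., 4 under every arrival rate: nondecreasing in q with steps of at most one, which is consistent with the threshold structure of Eq. (23).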
Finally, we address the degenerate case, in which a vertex lies on the line segment joining its two adjacent vertices. Then multiple points on the segment minimize the Lagrangian relaxation problem, so Algorithm 2 may not return the optimal policy for this vertex. In this case, we rely on sensitivity analysis of the equivalent LP problem: a slight perturbation of P(s) removes the degeneracy in the resulting Lagrangian problem while leaving the corresponding optimal policy unchanged. Applying the argument above to the perturbed problem shows that the optimal policy is threshold-based.
APPENDIX D
PROOF OF THEOREM 9
Our proof starts with the observation that the optimal threshold-based policy can be expressed as a convex combination of two adjacent deterministic threshold-based policies corresponding to two adjacent vertices on the optimal tradeoff curve L. Hence, we only need to show Eq. (27) in Theorem 9 for the vertices of L. For each vertex of L, the optimal policy for the system over a fading channel can also be obtained by value iteration, as in the proof of Theorem 5, which allows us to show Eq. (27) under the condition in Eq. (28). In particular, a sufficient condition for Eq. (27) is
$$\omega^{(m+1)}(q,a,\iota_+,s^*) + \omega^{(m+1)}(q,a,\iota_-,s^*+\delta) \le \omega^{(m+1)}(q,a,\iota_+,s^*+\delta) + \omega^{(m+1)}(q,a,\iota_-,s^*), \tag{39}$$
where $\omega^{(m+1)}(q,a,\iota,s)$ denotes the value function of the generalized system and $s^* = \arg\min_{s\in\mathcal{S}(q)} \omega^{(m+1)}(q,a,\iota,s)$. Expanding each term, the sufficient condition follows directly from the condition in Eq. (28). As a result, a greater transmission rate is employed under channel state $h_{\iota_+}$ than under $h_{\iota_-}$. Combined with the threshold-based structure of the optimal policy, this yields the order relation in Eq. (27), which completes the proof.
REFERENCES
[1] A. Osseiran, F. Boccardi, V. Braun, K. Kusume, P. Marsch, M. Maternia, O. Queseth, M. Schellmann, H. Schotten, H. Taoka, H. Tullberg, M. A. Uusitalo, B. Timus, and M. Fallgren, “Scenarios for 5G mobile and wireless communications: The vision of the METIS project,” IEEE Communications Magazine, vol. 52, no. 5, pp. 26–35, May 2014.
[2] M. Simsek, A. Aijaz, M. Dohler, J. Sachs, and G. Fettweis, “5G-enabled tactile internet,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 3, pp. 460–473, March 2016.
[3] S. Buzzi, C. I, T. E. Klein, H. V. Poor, C. Yang, and A. Zappone, “A survey of energy-efficient techniques for 5G networks and challenges ahead,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 4, pp. 697–709, April 2016.
[4] R. Q. Hu and Y. Qian, “An energy efficient and spectrum efficient wireless heterogeneous network framework for 5G systems,” IEEE Communications Magazine, vol. 52, no. 5, pp. 94–101, May 2014.
[5] C. She, C. Yang, and T. Q. S. Quek, “Radio resource management for ultra-reliable and low-latency communications,” IEEE Communications Magazine, vol. 55, no. 6, pp. 72–78, June 2017.
[6] Q. Liu, S. Zhou, and G. B. Giannakis, “Cross-Layer combining of adaptive Modulation and coding with truncated ARQ over wireless links,” IEEE Transactions on Wireless Communications, vol. 3, no. 5, pp. 1746–1755, Sep. 2004.
[7] D. V. Djonin and V. Krishnamurthy, “MIMO transmission control in fading channels-a constrained Markov decision process formulation with monotone randomized policies,” IEEE Transactions on Signal Processing, vol. 55, no. 10, pp. 5069–5083, 2007.
[8] A. E. Gamal, C. Nair, B. Prabhakar, E. Uysal-Biyikoglu, and S. Zahedi, “Energy-efficient scheduling of packet transmissions over wireless networks,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), June 2002, pp. 1773–1782.
[9] U. C. Kozat, I. Koutsopoulos, and L. Tassiulas, “A framework for cross-layer design of energy-efficient communication with QoS provisioning in multi-hop wireless networks,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), March 2004, pp. 1446–1456.
[10] Z. Hou, C. She, Y. Li, T. Q. S. Quek, and B. Vucetic, “Burstiness aware bandwidth reservation for ultra-reliable and low-latency communications (URLLC) in tactile internet,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 11, pp. 2401–2410, Nov. 2018.
[11] J. Hu, L. Yang, and L. Hanzo, “Energy-efficient cross-layer design of wireless mesh networks for content sharing in online social networks,” IEEE Transactions on Vehicular Technology, vol. 66, no. 9, pp. 8495–8509, Sep. 2017.
[12] B. Collins and R. L. Cruz, “Transmission policies for time varying channels with average delay constraints,” in Proc. Allerton Conference on Communication, Control, and Computing (Allerton), 1999, pp. 709–717.
[13] R. A. Berry and R. G. Gallager, “Communication over fading channels with delay constraints,” IEEE Transactions on Information Theory, vol. 48, no. 5, pp. 1135–1149, 2002.
[14] R. Berry, “Optimal power-delay tradeoffs in fading channels–small-delay asymptotics,” IEEE Transactions on Information Theory, vol. 59, no. 6, pp. 3939–3952, June 2013.
[15] D. Rajan, A. Sabharwal, and B. Aazhang, “Delay-bounded packet scheduling of bursty traffic over wireless channels,” IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 125–144, 2004.
[16] W. Chen, Z. Cao, and K. B. Letaief, “Optimal delay-power tradeoff in wireless transmission with fixed modulation,” in Proc. IEEE International Workshop on Cross Layer Design (IWCLD), 2007, pp. 60–64.
[17] M. Goyal, A. Kumar, and V. Sharma, “Power constrained and delay optimal policies for scheduling transmission over a fading channel,” in Proc. IEEE International Conference on Computer Communications (INFOCOM), 2003, pp. 311–320.
[18] B. Ata, “Dynamic power control in a wireless static channel subject to a quality-of-service constraint,” Operations Research, vol. 53, no. 5, pp. 842–851, 2005.
[19] M. H. Ngo and V. Krishnamurthy, “Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ,” IEEE Transactions on Signal Processing, vol. 58, no. 1, pp. 438–451, 2010.
[20] L. Liu, A. Chattopadhyay, and U. Mitra, “On solving MDPs with large state space: Exploitation of policy structures and spectral properties,” IEEE Transactions on Communications, Early Access, 2019.
[21] N. Sharma, N. Mastronarde, and J. Chakareski, “Accelerated structure-aware reinforcement learning for delay-sensitive energy harvesting wireless sensors,” CoRR, vol. abs/1807.08315, 2018. [Online]. Available: http://arxiv.org/abs/1807.08315
[22] X. Chen, W. Chen, J. Lee, and N. B. Shroff, “Delay-optimal buffer-aware scheduling with adaptive transmission,” IEEE Transactions on Communications, vol. 65, no. 7, pp. 2917–2930, July 2017.
[23] M. Wang, J. Liu, W. Chen, and A. Ephremides, “Joint queue-aware and channel-aware delay optimal scheduling of arbitrarily bursty traffic over multi-state time-varying channels,” IEEE Transactions on Communications, vol. 67, no. 1, pp. 503–517, Jan. 2019.
[24] J. Liu, W. Chen, and K. B. Letaief, “Delay optimal scheduling for ARQ-aided power-constrained packet transmission over multi-state fading channels,” IEEE Transactions on Wireless Communications, vol. 16, no. 11, pp. 7123–7137, Nov. 2017.
[25] X. Chen, W. Chen, J. Lee, and N. B. Shroff, “Delay-optimal probabilistic scheduling in green communications with arbitrary arrival and adaptive transmission,” in Proc. IEEE International Conference on Communications (ICC), May 2017, pp. 1–6.
[26] V. Paxson and S. Floyd, “Wide area traffic: The failure of Poisson modeling,” IEEE/ACM Transactions on Networking, vol. 3, no. 3, pp. 226–244, June 1995.
[27] M. L. Puterman, Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons, 2014.
[28] E. P. Kao, An introduction to stochastic processes. Cengage Learning, 1997.
[29] E. Altman, Constrained Markov decision processes. CRC Press, 1999.