TRANSCRIPT
Learning Equilibria with Partial Information in Cognitive Radio Networks
Samir M. Perlaza, Alcatel Lucent Chair in Flexible Radio at Supelec.
Joint work with L. Rose and C. Le Martret from Thales Communications and M. Debbah from the Alcatel Lucent Chair in Flexible Radio at Supelec.
10 ans de Radio Intelligente : bilan et perspectives. GDR-ISIS Seminar. May 2011. Paris, France.
Outline
Motivation
Learning Equilibria in Cognitive Radio Networks
Study Cases
Conclusions
Motivation Learning Equilibria in Cognitive Radio Networks Study Cases Conclusions
Cognitive Radios (CR)
A CR is a radio that can change its transmitter parameters based on interaction with the environment in which it operates.
Federal Communications Commission (FCC)
Game Theory
GT is... a branch of mathematics which studies the interaction between several decision-makers, each aiming to maximize a common or individual benefit.
One decision-maker: Optimization and Control Theory.
Several decision-makers: Game Theory (due to the interactions).
Modeling Cognitive Radio Networks as Games
Consider the game in normal form:

    G = \left(\mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{u_k\}_{k \in \mathcal{K}}\right)    (1)

Players: transmitters or receivers,

    \mathcal{K} = \{1, \ldots, K\}.    (2)

Actions (power allocation, modulation-coding, decoding order, ...):

    \mathcal{A}_k = \left\{A_k^{(1)}, \ldots, A_k^{(N_k)}\right\}.    (3)

Utility function (data rate, BER, energy efficiency, ...):

    u_k : \mathcal{A}_1 \times \ldots \times \mathcal{A}_K \to \mathbb{R}.    (4)
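The normal form (1)-(4) can be sketched as a small data structure. This is an illustrative container only; the class name and the toy payoffs below are ours, not from the talk:

```python
# Illustrative container (not from the talk) for a normal-form game
# G = (K, {A_k}, {u_k}): a set of players, one finite action set per
# player, and one utility function per player over joint profiles.
from itertools import product

class NormalFormGame:
    def __init__(self, action_sets, utilities):
        # action_sets[k]: list of actions A_k^(1), ..., A_k^(N_k)
        # utilities[k]: callable mapping a joint profile (a_1, ..., a_K) to R
        self.action_sets = action_sets
        self.utilities = utilities
        self.K = len(action_sets)

    def profiles(self):
        """Enumerate the joint action space A_1 x ... x A_K."""
        return product(*self.action_sets)

    def payoff(self, k, profile):
        """Utility u_k evaluated at a joint action profile."""
        return self.utilities[k](profile)

# Toy identical-interest example: both players want the profile (1, 1).
g = NormalFormGame(
    action_sets=[[0, 1], [0, 1]],
    utilities=[lambda a: a[0] * a[1], lambda a: a[0] * a[1]],
)
assert len(list(g.profiles())) == 4
assert g.payoff(0, (1, 1)) == 1
```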
Nash Equilibrium and Cognitive Radio Networks
J. F. Nash, "Equilibrium points in n-person games", Proceedings of the National Academy of Sciences of the United States of America, vol. 36, no. 1, pp. 48-49, 1950.

Player k chooses its action a_k \in \mathcal{A}_k following a probability distribution [Borel-1921],

    \pi_k = \left(\pi_{k,A_k^{(1)}}, \ldots, \pi_{k,A_k^{(N_k)}}\right) \in \triangle(\mathcal{A}_k).    (5)

Here, \pi_{k,A_k^{(n_k)}} = \Pr\left(a_k = A_k^{(n_k)}\right).

Definition (Nash Equilibrium [Nash-1950]): A strategy profile \pi^* \in \triangle(\mathcal{A}_1) \times \ldots \times \triangle(\mathcal{A}_K) is an NE if, for all players k \in \mathcal{K} and for all \pi_k \in \triangle(\mathcal{A}_k),

    u_k(\pi_k^*, \pi_{-k}^*) \geq u_k(\pi_k, \pi_{-k}^*).
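The pure-strategy version of this definition can be checked by enumeration. A minimal sketch for a two-player matrix game; the payoff tables are hypothetical examples, not from the talk:

```python
# Hedged sketch: checking the pure-strategy NE condition
# u_k(a_k*, a_-k*) >= u_k(a_k, a_-k*) for every k and every unilateral
# deviation a_k, in a 2-player matrix game with payoff tables U1, U2.
def pure_nash_equilibria(U1, U2):
    """Return all (i, j) where neither player gains by deviating alone."""
    eq = []
    rows, cols = len(U1), len(U1[0])
    for i in range(rows):
        for j in range(cols):
            # Row player cannot improve by changing its row i ...
            best_row = all(U1[i][j] >= U1[r][j] for r in range(rows))
            # ... and column player cannot improve by changing its column j.
            best_col = all(U2[i][j] >= U2[i][c] for c in range(cols))
            if best_row and best_col:
                eq.append((i, j))
    return eq

# Coordination game: two pure NE, on the diagonal.
U1 = [[2, 0], [0, 1]]
U2 = [[2, 0], [0, 1]]
assert pure_nash_equilibria(U1, U2) == [(0, 0), (1, 1)]
```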
Example: A Parallel Multiple Access Channel (1)
S. M. Perlaza, H. Tembine, S. Lasaulce, and V. Quintero-Florez, "On the Fictitious play and channel selection games", in IEEE Latin-American Conference on Communications (LATINCOM), Bogota, Colombia, 2010.

Consider the game G = \left(\mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{u_k\}_{k \in \mathcal{K}}\right).

\mathcal{K} = \{1, 2\} and, for all k \in \mathcal{K}, \mathcal{A}_k = \{(0, p_{max}), (p_{max}, 0)\}.

    u_k = \frac{1}{2} \sum_{n=1}^{2} \log_2\left(1 + \frac{p_{k,n} g_{k,n}}{\sigma^2 + p_{-k,n} g_{-k,n}}\right).

Payoff table:

Tx1 \ Tx2                  A_2^{(1)} = (p_max, 0)                            A_2^{(2)} = (0, p_max)
A_1^{(1)} = (p_max, 0)     ½ log2(σ² + p_max(g11 + g21)) + ½ log2(σ²)        ½ log2(σ² + p_max g11) + ½ log2(σ² + p_max g22)
A_1^{(2)} = (0, p_max)     ½ log2(σ² + p_max g12) + ½ log2(σ² + p_max g21)   ½ log2(σ² + p_max(g12 + g22)) + ½ log2(σ²)
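As a quick sanity check, the utility of this channel-selection game can be evaluated numerically. The channel gains below are made-up values for illustration, not the talk's:

```python
# Sketch of the parallel MAC channel-selection game from the slide,
# u_k = (1/2) sum_n log2(1 + p_{k,n} g_{k,n} / (sigma^2 + p_{-k,n} g_{-k,n})).
# The gains in G are hypothetical example values.
from math import log2

P_MAX, SIGMA2 = 1.0, 0.1
G = {1: (0.9, 0.2), 2: (0.3, 0.8)}      # G[k] = (g_k1, g_k2), made up
ACTIONS = [(P_MAX, 0.0), (0.0, P_MAX)]  # A_k = {(pmax, 0), (0, pmax)}

def utility(k, a_k, a_other):
    """Sum rate of player k over the two sub-channels."""
    other = 2 if k == 1 else 1
    return 0.5 * sum(
        log2(1 + a_k[n] * G[k][n] / (SIGMA2 + a_other[n] * G[other][n]))
        for n in range(2)
    )

# Player 1 on channel 1: it earns more when player 2 moves to channel 2.
u_share = utility(1, ACTIONS[0], ACTIONS[0])  # both transmit on channel 1
u_split = utility(1, ACTIONS[0], ACTIONS[1])  # player 2 uses channel 2
assert u_split > u_share  # the interference-free channel pays more
```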
Example: A Parallel Multiple Access Channel (2)
S. M. Perlaza, H. Tembine, S. Lasaulce, and V. Quintero-Florez, "On the Fictitious play and channel selection games", in IEEE Latin-American Conference on Communications (LATINCOM), Bogota, Colombia, 2010.

[Figure: NE action profiles as a function of the ratios g11/g12 and g21/g22, with region boundaries at ψ(g11), 1/ψ(g12), ψ(g21) and 1/ψ(g22). The four regions correspond to the NE profiles (p1, p2) = ((0, pmax), (pmax, 0)), ((pmax, 0), (0, pmax)), ((0, pmax), (0, pmax)) and ((pmax, 0), (pmax, 0)). The function ψ : R+ → R+ is defined as ψ(x) = 1 + SNR·x. Here, it has been arbitrarily assumed that ψ(g11) < ψ(g21).]
The Big Challenge
How to achieve a (Nash) equilibrium in cognitive radio networks?
A CR possesses only local information.
Topology is dynamically changing.
Signaling between CRs is limited.
Spectrum sensing is not reliable.

Some existing contributions: [Sastry-1994], [Yu-2004], [Pang-2008], [Scutari-2009], [Leshem-2009], [Larsson-2009], [Iellamo-2011], [Rose-2011], [Perlaza-2011a].
Learning Equilibria in Cognitive Radio Networks
Learning Iterative Steps:

1. Choose action a_k(t) ∼ π_k(t).
2. Observe the game outcome, e.g., (i) a_{-k}(t), or (ii) u_k(a_k(t), a_{-k}(t)).
3. Improve π_k(t + 1).

[Diagram: player k maps the history h(t) and its strategy π_k(t) to an action a_k(t), then updates π_k from the observation (i) a_{-k}(t) or (ii) u_k(t).]

Thus, we can expect that, for all k ∈ \mathcal{K},

    \pi_k(t) \to \pi_k^* as t \to \infty,    (6)
    u_k(\pi_k(t), \pi_{-k}(t)) \to u_k(\pi_k^*, \pi_{-k}^*) as t \to \infty,    (7)

where \pi^* = (\pi_1^*, \ldots, \pi_K^*) is an NE strategy profile.
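The three iterative steps above can be sketched as a generic loop. The "improve" rule below is a deliberately toy placeholder standing in for any concrete algorithm, not one of the named techniques:

```python
# Generic sketch of the learning loop on the slide:
# draw a_k(t) ~ pi_k(t), observe an outcome, update pi_k(t+1).
# The update rule 'reinforce' is a toy stand-in, not a specific algorithm.
import random

def learn(actions, improve, rounds, seed=0):
    """actions[a]: payoff of action a; improve: (pi, a, outcome) -> new pi."""
    rng = random.Random(seed)
    pi = [1.0 / len(actions)] * len(actions)  # uniform initial strategy
    for _ in range(rounds):
        a = rng.choices(range(len(actions)), weights=pi)[0]  # a_k(t) ~ pi_k(t)
        outcome = actions[a]                                 # observed payoff
        pi = improve(pi, a, outcome)                         # pi_k(t+1)
    return pi

def reinforce(pi, a, outcome, step=0.05):
    # Toy rule: add probability mass to the played action in proportion
    # to its payoff, then renormalize so pi stays a distribution.
    new = list(pi)
    new[a] += step * outcome
    total = sum(new)
    return [p / total for p in new]

pi = learn(actions=[0.2, 1.0], improve=reinforce, rounds=500)
assert abs(sum(pi) - 1.0) < 1e-9            # pi remains a distribution
assert reinforce([0.5, 0.5], 1, 1.0)[1] > 0.5  # played action gains mass
```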
Learning Techniques in Cognitive Radio Networks
Best Response Dynamics (BRD):[Scutari-2008a] [Pang-2008] [Larsson-2009] [Perlaza-2009]
Fictitious Play (FP):[Perlaza-2010]
Reinforcement Learning (RL):[Sastry-1994]
Joint Utility and STrategy lEarning (JUSTE): [Perlaza-2010c]

Other techniques not treated here: Regret Matching Learning [Altman-2008], Q-Learning [Bennis-2010], Bandits [Liu-2010], Imitation Learning [Iellamo-2011], etc.
Best Response Dynamics (1)
Definition (Best-Response Correspondence)
In the game G, the correspondence BR_k : \mathcal{A}_{-k} \to \mathcal{A}_k, such that

    BR_k(a_{-k}) = \arg\max_{a_k' \in \mathcal{A}_k} u_k(a_k', a_{-k}),    (8)

is the best-response correspondence of player k, given the actions a_{-k}.

Sequential BRD: one player at a time updates its action.
Simultaneous BRD: all players update their actions.
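A sketch of sequential BRD in a two-player matrix game: at each step a player switches to a best response (8), and the dynamics stop when no unilateral deviation is profitable. The coordination payoffs are a hypothetical example:

```python
# Sketch of sequential best-response dynamics: each player in turn
# switches to arg max u_k(a_k, a_-k); stop when nobody wants to move.
# The payoff tables used below are a hypothetical coordination game.
def sequential_brd(U, profile, max_steps=100):
    """U[k][i][j]: payoff of player k when the joint action is (i, j)."""
    profile = list(profile)
    for _ in range(max_steps):
        moved = False
        for k in range(2):
            other = profile[1 - k]
            candidates = range(len(U[k]) if k == 0 else len(U[k][0]))
            def u(a):  # utility of player k for its own action a
                return U[k][a][other] if k == 0 else U[k][other][a]
            best = max(candidates, key=u)
            if u(best) > u(profile[k]):  # strictly profitable deviation
                profile[k] = best
                moved = True
        if not moved:  # no profitable unilateral deviation: a pure NE
            return tuple(profile)
    return tuple(profile)

U = [[[2, 0], [0, 1]], [[2, 0], [0, 1]]]  # coordination game
assert sequential_brd(U, (1, 0)) == (0, 0)  # converges to the (0, 0) NE
```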
Best Response Dynamics (2)
                        BRD
Convergence             NE
Observations            a_{-k}(t)
Closed expr. for u_k    Yes
Calculation cap.        Optimization
Noise tolerance         No
Environment             Static

BRD converges to NE in potential games [Monderer-Shapley-1996b].
Fictitious Play (1)
G. W. Brown, "Iterative solution of games by Fictitious play", Activity Analysis of Production and Allocation, vol. 13, no. 1, pp. 374-376, 1951.

At game stage t, player k observes a_{-k}(t - 1).

Each player calculates the empirical frequency of all actions A_k^{(n_k)}, i.e.,

    f_{k,A_k^{(n_k)}}(t) = \frac{1}{t-1} \sum_{s=0}^{t-1} 1_{\left\{a_k(s) = A_k^{(n_k)}\right\}},    (9)

Player k chooses its action a_k(t) to maximize its expected utility:

    a_k(t) \in \arg\max_{a_k' \in \mathcal{A}_k} \sum_{a_{-k} \in \mathcal{A}_{-k}} u_k(a_k', a_{-k}) \prod_{j \in \mathcal{K} \setminus \{k\}} f_{j,a_j}(t).    (10)
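Equations (9)-(10) can be sketched for a two-player matrix game, where each player best-responds to the empirical counts of the opponent's past actions (counts and frequencies give the same arg max). The payoff matrices form a hypothetical coordination game:

```python
# Sketch of fictitious play in a 2-player matrix game: each player
# best-responds to the empirical frequency of the opponent's history.
# Payoff matrices U1 (row player) and U2 (column player) are hypothetical.
def fictitious_play(U1, U2, rounds, start=(0, 0)):
    n1, n2 = len(U1), len(U1[0])
    c1 = [0] * n1  # how often player 1 has played each row (seen by 2)
    c2 = [0] * n2  # how often player 2 has played each column (seen by 1)
    a1, a2 = start
    for _ in range(rounds):
        c1[a1] += 1
        c2[a2] += 1
        # Best response to the opponent's empirical play (eq. 10);
        # using raw counts instead of frequencies leaves arg max unchanged.
        a1 = max(range(n1), key=lambda i: sum(U1[i][j] * c2[j] for j in range(n2)))
        a2 = max(range(n2), key=lambda j: sum(U2[i][j] * c1[i] for i in range(n1)))
    return a1, a2

U1 = [[2, 0], [0, 1]]
U2 = [[2, 0], [0, 1]]
assert fictitious_play(U1, U2, 50, start=(1, 1)) == (1, 1)  # stays at an NE
```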
Fictitious Play (2)
                        BRD         FP
Convergence             NE          NE
Observations            a_{-k}(t)   a_{-k}(t)
Closed expr. for u_k    Yes         Yes
Calculation cap.        Optim.      Optim.
Noise tolerance         No          No
Environment             Static      Stationary

FP converges to NE in:
Zero-sum games [Robinson-1951],
Potential games [Monderer-Shapley-1996a],
Generic 2×2 games [Miyazawa-1961].

Does FP really converge in potential games? [Perlaza-2010b]
Cumulative Payoff Matching (1)
Player k observes u_k(t) = u_k(a_k(t), a_{-k}(t)) at each stage.

Player k calculates its cumulative utility \theta_{k,A_k^{(n_k)}} with each action A_k^{(n_k)}:

    \theta_{k,A_k^{(n_k)}}(t) = \sum_{s=0}^{t-1} u_k(s) 1_{\left\{a_k(s) = A_k^{(n_k)}\right\}}.    (11)

The higher \theta_{k,A_k^{(n_k)}}(t), the higher the probability \pi_{k,A_k^{(n_k)}}(t), i.e.,

    \pi_{k,A_k^{(n_k)}}(t) = \frac{\theta_{k,A_k^{(n_k)}}(t)}{\sum_{a_k \in \mathcal{A}_k} \theta_{k,a_k}(t)}.    (12)
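Equations (11)-(12) can be sketched directly. The payoff values and the unit prior on θ (which keeps (12) well defined before any action has been played) are illustrative choices of ours, not part of the slide's rule:

```python
# Sketch of cumulative payoff matching: theta accumulates the payoff
# earned with each action (eq. 11) and the mixed strategy is theta
# normalized to a distribution (eq. 12). Payoffs here are hypothetical.
import random

def cumulative_payoff_matching(payoff, n_actions, rounds, seed=0):
    rng = random.Random(seed)
    # Unit prior mass so eq. (12) is defined at t = 0 (our choice).
    theta = [1.0] * n_actions
    for _ in range(rounds):
        total = sum(theta)
        pi = [th / total for th in theta]                # eq. (12)
        a = rng.choices(range(n_actions), weights=pi)[0]  # play a ~ pi
        theta[a] += payoff(a)                             # eq. (11)
    total = sum(theta)
    return [th / total for th in theta]

# Two actions with (hypothetical) payoffs 0.1 and 1.0.
pi = cumulative_payoff_matching(lambda a: [0.1, 1.0][a], 2, 2000)
assert abs(sum(pi) - 1.0) < 1e-9        # valid probability distribution
assert all(0.0 <= p <= 1.0 for p in pi)
```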
Cumulative Payoff Matching (2)
                        BRD         FP          RL
Convergence             NE          NE          −−
Observations            a_{-k}(t)   a_{-k}(t)   u_k(t)
Closed expr. for u_k    Yes         Yes         No
Calculation cap.        Optim.      Optim.      Algebraic oper.
Noise tolerance         No          No          No
Environment             Static      Stationary  Stationary

In some cases, RL can converge to NE [Sastry-1994].
Joint Utility and STrategy lEarning - JUSTE
S. M. Perlaza, S. Lasaulce, H. Tembine, and M. Debbah, "Radio resource sharing in decentralized wireless networks: A logit equilibrium approach", submitted to the IEEE Journal of Selected Topics in Signal Processing, June 2011.

Player k observes u_k(t) = u_k(a_k(t), a_{-k}(t)) at each stage.

Player k calculates its expected utility \hat{u}_{k,A_k^{(n_k)}} with each of its actions A_k^{(n_k)}:

    \hat{u}_{k,A_k^{(n_k)}}(t) = \frac{1}{T_{k,A_k^{(n_k)}}(t)} \sum_{s=0}^{t-1} u_k(s) 1_{\left\{a_k(s) = A_k^{(n_k)}\right\}},    (13)

where T_{k,A_k^{(n_k)}}(t) is the number of times action A_k^{(n_k)} has been played up to stage t.

The higher \hat{u}_{k,A_k^{(n_k)}}(t), the higher the probability \pi_{k,A_k^{(n_k)}}(t), i.e.,

    \pi_{k,A_k^{(n_k)}}(t) = \pi_{k,A_k^{(n_k)}}(t-1) + \lambda_k(t) \left( \beta_{k,A_k^{(n_k)}}(u_k(t)) - \pi_{k,A_k^{(n_k)}}(t-1) \right),

where \beta_k^{(\gamma_k)}(u_k(\cdot, \pi_{-k})) is a logit function.
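One JUSTE-style update step can be sketched as a logit (softmax) of the per-action average payoffs followed by a convex step of size λ toward it. The values of γ, λ and the average payoffs below are illustrative, not the paper's:

```python
# Hedged sketch of a JUSTE-style strategy update: map the running
# average payoffs through a logit function, then move the strategy a
# step lambda toward that logit distribution. Parameters are illustrative.
from math import exp

def logit(u_hat, gamma):
    """beta_a = exp(gamma * u_a) / sum_b exp(gamma * u_b)."""
    z = [exp(gamma * u) for u in u_hat]
    total = sum(z)
    return [x / total for x in z]

def juste_step(pi, u_hat, gamma, lam):
    # pi(t) = pi(t-1) + lambda * (beta(u_hat) - pi(t-1))
    beta = logit(u_hat, gamma)
    return [p + lam * (b - p) for p, b in zip(pi, beta)]

pi = [0.5, 0.5]
u_hat = [0.2, 1.0]  # hypothetical average payoffs per action
pi = juste_step(pi, u_hat, gamma=5.0, lam=0.5)
assert abs(sum(pi) - 1.0) < 1e-9  # convex combination of distributions
assert pi[1] > pi[0]              # mass moves toward the better action
```

A large γ makes the logit close to a hard best response; a small γ keeps the strategy nearly uniform, which is the temperature trade-off illustrated on the next slide.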
Joint Utility and STrategy lEarning - JUSTE (1)
[Figure: (left) Mean utility achieved with each of the actions, versus the action index n. (right) Corresponding probability distribution given by the logit function, shown for smooth best responses with κ = 0.001, κ = 0.1 and κ = 1.]
Joint Utility and STrategy lEarning - JUSTE (2)
                        BRD         FP          RL              JUSTE
Convergence             NE          NE          −−              ε-NE
Observations            a_{-k}(t)   a_{-k}(t)   u_k(t)          u_k(t)
Closed expr. for u_k    Yes         Yes         No              No
Calculation cap.        Optim.      Optim.      Algebraic oper. Algebraic oper.
Noise tolerance         No          No          No              Yes
Environment             Static      Stationary  Stationary      Stationary

JUSTE converges to an ε-NE in potential games [Perlaza-2011a] and in two-player zero-sum games [Leslie-2003].
Example: A Parallel Interference Channel (1)
L. Rose, S. M. Perlaza, and M. Debbah, "On the Nash equilibria in decentralized parallel interference channels", ICC 2011, Kyoto, Japan, Jun. 2011.

Consider the game G = \left(\mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{u_k\}_{k \in \mathcal{K}}\right).

\mathcal{K} = \{1, 2\} and, for all k \in \mathcal{K}, \mathcal{A}_k = \{1, 0\}.

    u_k(\alpha_k, \alpha_{-k}) = \log_2\left(1 + \frac{\alpha_k g_{k,k}^{(1)}}{1 + \alpha_{-k} g_{k,-k}^{(1)}}\right) + \log_2\left(1 + \frac{(1-\alpha_k) g_{k,k}^{(2)}}{1 + (1-\alpha_{-k}) g_{k,-k}^{(2)}}\right).

Tx1 \ Tx2    α2 = 1                  α2 = 0
α1 = 1       (u1(1,1), u2(1,1))      (u1(1,0), u2(0,1))
α1 = 0       (u1(0,1), u2(1,0))      (u1(0,0), u2(0,0))

Single-antenna radios with limited power p_max.
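The 2×2 payoff table can be evaluated numerically from the utility above. The channel gains g_{k,j}^{(c)} below are made-up values for illustration:

```python
# Sketch of the 2x2 parallel interference channel game on the slide:
# alpha_k = 1 puts player k's power on channel 1, alpha_k = 0 on channel 2.
# The channel gains in g are hypothetical example values.
from math import log2

# g[(k, j, c)]: gain from transmitter j to receiver k on channel c
g = {(1, 1, 1): 2.0, (1, 1, 2): 1.0, (1, 2, 1): 0.5, (1, 2, 2): 0.3,
     (2, 2, 1): 1.5, (2, 2, 2): 2.5, (2, 1, 1): 0.4, (2, 1, 2): 0.6}

def u(k, a_k, a_other):
    """u_k(alpha_k, alpha_-k): sum rate over the two channels."""
    j = 2 if k == 1 else 1
    return (log2(1 + a_k * g[(k, k, 1)] / (1 + a_other * g[(k, j, 1)]))
            + log2(1 + (1 - a_k) * g[(k, k, 2)] / (1 + (1 - a_other) * g[(k, j, 2)])))

# Build the 2x2 payoff table of the slide.
table = {(a1, a2): (u(1, a1, a2), u(2, a2, a1)) for a1 in (1, 0) for a2 in (1, 0)}
assert len(table) == 4
# With no interference on a channel, the rate term reduces to log2(1 + g).
assert abs(u(1, 1, 0) - log2(1 + 2.0)) < 1e-9
```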
Example: A Parallel Interference Channel (2)
L. Rose, S. M. Perlaza, and M. Debbah, "On the Nash equilibria in decentralized parallel interference channels", ICC 2011, Kyoto, Japan, Jun. 2011.

[Figure: Probability of observing either one or two NE in the game G = \left(\mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{u_k\}_{k \in \mathcal{K}}\right), as a function of the signal to noise ratio [dB].]

Two cases can be observed:

Unique NE: (\alpha_1^*, \alpha_2^*) is one of the profiles (\alpha_1, \alpha_2) with \alpha_1, \alpha_2 \in \{0, 1\}.
2 NE: (\alpha_1^*, \alpha_2^*) \in \{(0, 1), (1, 0)\}.
Example: A Parallel Interference Channel (4)
L. Rose, S. M. Perlaza, S. Lasaulce, and M. Debbah, "Learning Equilibria with Partial Information in Wireless Networks", IEEE Communications Magazine, Special Issue in Game Theory for Wireless Communications, August 2011.

Figure: Average number of iterations required to observe convergence to a NE in the game G = \left(\mathcal{K}, \{\mathcal{A}_k\}_{k \in \mathcal{K}}, \{u_k\}_{k \in \mathcal{K}}\right).
Example: A Parallel Interference Channel (5)
L. Rose, S. M. Perlaza, S. Lasaulce, and M. Debbah, "Learning Equilibria with Partial Information in Wireless Networks", IEEE Communications Magazine, Special Issue in Game Theory for Wireless Communications, August 2011.

Figure: Average sum spectral efficiency [bps/Hz] as a function of the number of iterations when the signal to noise ratio (SNR) is fixed to 10 dB.
Example: A Parallel Interference Channel (6)
L. Rose, S. M. Perlaza, S. Lasaulce, and M. Debbah, "Learning Equilibria with Partial Information in Wireless Networks", IEEE Communications Magazine, Special Issue in Game Theory for Wireless Communications, August 2011.

Figure: Average sum spectral efficiency [bps/Hz] as a function of the signal to noise ratio (SNR) when the maximum number of iterations has been fixed to 40.
Example: A Parallel Interference Channel (7)
L. Rose, S. M. Perlaza, S. Lasaulce, and M. Debbah, "Learning Equilibria with Partial Information in Wireless Networks", IEEE Communications Magazine, Special Issue in Game Theory for Wireless Communications, August 2011.

[Figure: Achieved sum spectral efficiency [bps/Hz] in the game G with JUSTE, as a function of the number of channels S (from 2 to 20), at the logit equilibrium with 1/\gamma_k = 0.2, 0.4 and 1 for all k \in \mathcal{K}. Here \alpha_1(n) = \alpha_2(n) = 1/n^{3/4}, \lambda_2(n) = 1/n^{2/3}, \lambda_1(n) = 1/n, and SNR = p_{k,max}/\sigma^2 = 10 dB.]
Conclusions
Properties of NE (existence, multiplicity) in CRNs are topology-dependent. A general algorithm for achieving NE in CRNs is still unknown.

It is possible to converge to NE using only local information (always?).

More information does not imply better convergence properties.

More choices do not imply better performance.
References

Borel-1921 Émile Borel, "La théorie du jeu et les équations à noyau symétrique," Comptes Rendus de l'Académie des Sciences, vol. 173, pp. 1304-1308, Sept. 1921.

Borkar-1997 V. Borkar, "Stochastic approximation with two timescales," Systems & Control Letters, vol. 29, pp. 291-294, 1997.

Larsson-2009 E. Larsson, E. Jorswieck, J. Lindblom, and R. Mochaourab, "Game theory and the flat fading Gaussian interference channel," IEEE Transactions on Signal Processing, October 2009.

Leshem-2009 A. Leshem and E. Zehavi, "Game theory and the frequency selective interference channel," IEEE Transactions on Signal Processing, October 2009.

Leslie-2003 D. S. Leslie and E. J. Collins, "Convergent multiple-timescales reinforcement learning algorithms in normal form games," Annals of Applied Probability, vol. 13, no. 4, pp. 1231-1251, 2003.

Liu-2010 K. Liu and Q. Zhao, "Distributed learning in multi-armed bandit with multiple players," IEEE Transactions on Signal Processing, vol. 58, no. 11, pp. 5667-5681, November 2010.

Miyazawa-1961 K. Miyazawa, "On the convergence of the learning process in a 2×2 non-zero-sum two-person game," Econometric Research Program, Princeton, 1961.

Monderer-Shapley-1996a D. Monderer and L. S. Shapley, "Fictitious play property for games with identical interests," Journal of Economic Theory, vol. 68, pp. 258-265, 1996.

Monderer-Shapley-1996b D. Monderer and L. S. Shapley, "Potential games," Games and Economic Behavior, vol. 14, pp. 124-143, 1996.

Nash-1950 J. Nash, "Equilibrium points in n-person games," Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48-49, 1950.

Pang-2008 J.-S. Pang, G. Scutari, F. Facchinei, and C. Wang, "Distributed power allocation with rate constraints in Gaussian parallel interference channels," IEEE Transactions on Information Theory, vol. 54, no. 8, pp. 3471-3489, Aug. 2008.
References

Perlaza-2011a S. M. Perlaza, S. Lasaulce, H. Tembine, and M. Debbah, "Radio resource sharing in decentralized wireless networks: A logit equilibrium approach," submitted to the IEEE Journal of Selected Topics in Signal Processing, Special Issue on Heterogeneous Networks for Future Broadband Wireless Systems, June 2011.

Perlaza-2011b S. M. Perlaza, H. Tembine, S. Lasaulce, and M. Debbah, "A general framework for quality-of-service provisioning in decentralized networks," submitted to the IEEE Journal of Selected Topics in Signal Processing, Special Issue on Game Theory for Signal Processing, 2011.

Perlaza-2010c S. M. Perlaza, H. Tembine, and S. Lasaulce, "How can ignorant but patient cognitive terminals learn their strategy and utility?," SPAWC 2010, Marrakech, Morocco, June 2010.

Perlaza-2010d S. M. Perlaza, H. Tembine, S. Lasaulce, and M. Debbah, "Learning to use the spectrum in self-configuring heterogeneous networks: A logit equilibrium approach," 4th International ICST Workshop on Game Theory in Communication Networks, 2011.

Perlaza-2010e S. M. Perlaza, H. Tembine, S. Lasaulce, and V. Quintero-Florez, "On the fictitious play and channel selection games," IEEE Latin-American Conference on Communications (LATINCOM), Bogota, Colombia, Sept. 2010, pp. 1-5.

Robinson-1951 J. Robinson, "An iterative method of solving a game," The Annals of Mathematics, Second Series, vol. 54, no. 2, pp. 296-301, Sep. 1951.

Rose-2011 L. Rose, S. M. Perlaza, S. Lasaulce, and M. Debbah, "Learning equilibria with partial information in wireless networks," IEEE Communications Magazine, Special Issue on Game Theory for Wireless Communications, Aug. 2011.

Rose-2010 L. Rose, S. M. Perlaza, and M. Debbah, "On the Nash equilibria in decentralized parallel interference channels," IEEE ICC 2011 Workshop on Game Theory and Resource Allocation for 4G, Kyoto, Japan, June 2011.
References

Sastry-1994 P. Sastry, V. Phansalkar, and M. Thathachar, "Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information," IEEE Transactions on Systems, Man and Cybernetics, vol. 24, no. 5, pp. 769-777, May 1994.

Scutari-2009 G. Scutari, D. P. Palomar, and S. Barbarossa, "The MIMO iterative waterfilling algorithm," IEEE Transactions on Signal Processing, vol. 57, no. 5, pp. 1917-1935, May 2009.

Scutari-2008a G. Scutari, D. P. Palomar, and S. Barbarossa, "Asynchronous iterative water-filling for Gaussian frequency-selective interference channels," IEEE Transactions on Information Theory, vol. 54, no. 7, pp. 2868-2878, July 2008.

Scutari-2008b G. Scutari, D. P. Palomar, and S. Barbarossa, "Optimal linear precoding strategies for wideband noncooperative systems based on game theory - Part I: Nash equilibria," IEEE Transactions on Signal Processing, vol. 56, no. 3, pp. 1230-1249, March 2008.

Scutari-2008c G. Scutari, D. P. Palomar, and S. Barbarossa, "Optimal linear precoding strategies for wideband noncooperative systems based on game theory - Part II: Algorithms," IEEE Transactions on Signal Processing, vol. 56, no. 3, pp. 1250-1267, March 2008.

Yu-2004 W. Yu, W. Rhee, S. Boyd, and J. Cioffi, "Iterative water-filling for Gaussian vector multiple-access channels," IEEE Transactions on Information Theory, vol. 50, no. 1, pp. 145-152, Jan. 2004.