
Solution of Evolutionary Games via Hamilton-Jacobi-Bellman Equations

Nikolay Krasovskii*, Arkady Kryazhimskiy**, Alexander Tarasyev**,***

∗ Ural State Agrarian University, Ekaterinburg, Russia
∗∗ International Institute for Applied Systems Analysis, Laxenburg, Austria

∗∗∗ N.N. Krasovskii Institute of Mathematics and Mechanics, UrB RAS, Ekaterinburg, Russia

Introduction

The paper focuses on the construction of solutions for bimatrix evolutionary games based on methods of optimal control theory and generalized solutions of Hamilton-Jacobi-Bellman equations. It is assumed that the evolutionary dynamics describes interactions of agents in large population groups in biological and social models, or interactions of investors on financial markets. Interactions of agents are subject to a dynamic process which provides the possibility to control flows between different types of behavior or investment. Parameters of the dynamics are not fixed a priori and can be treated as controls constructed either as open-loop programs or as feedbacks.

Payoff functionals in the evolutionary game of two coalitions are determined by the limits of average matrix gains on an infinite horizon. The notion of a dynamical Nash equilibrium is introduced in the class of control feedbacks within Krasovskii's theory of differential games.

Elements of a dynamical Nash equilibrium are based on guaranteed feedbacks constructed within the framework of the theory of generalized solutions of Hamilton-Jacobi-Bellman equations. The value functions for the corresponding series of differential games are constructed analytically, and their stability properties are verified using the technique of conjugate derivatives.

The equilibrium trajectories are generated on the basis of positive feedbacks originated by the value functions. It is shown that the proposed approach provides new qualitative results for the equilibrium trajectories in evolutionary games and ensures better outcomes for the payoff functionals than replicator dynamics in evolutionary games or Nash values in static bimatrix games. The efficiency of the proposed approach is demonstrated by applications to the construction of equilibrium dynamics for agents' interactions on financial markets.

Evolutionary Game

Let us consider the system of differential equations which describes the behavioral dynamics of two coalitions:

$$\dot{x} = -x + u, \qquad \dot{y} = -y + v. \qquad (1)$$

The parameter x, 0 ≤ x ≤ 1, is the probability that a randomly chosen individual of the first coalition holds the first strategy. The parameter y, 0 ≤ y ≤ 1, is the probability that an individual of the second coalition chooses the first strategy. The control parameters u and v satisfy the restrictions 0 ≤ u ≤ 1, 0 ≤ v ≤ 1 and can be interpreted as signals for individuals to change their strategies. The system dynamics (1) is interpreted as a version of controlled Kolmogorov equations [5] and generalizes evolutionary game dynamics [1, 2, 3, 9].

The terminal payoff functionals of the coalitions are defined as mathematical expectations corresponding to the payoff matrices A = {a_ij}, B = {b_ij}, i, j = 1, 2, and can be interpreted as "local" interests of the coalitions:
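As an illustration (not part of the original poster), the following minimal Python sketch integrates the dynamics (1) for arbitrary control signals; the constant controls in the usage example are placeholders chosen only to show the interface.

```python
import numpy as np

def simulate(u, v, x0, y0, T=10.0, dt=1e-3):
    """Euler integration of dx/dt = -x + u(t, x, y), dy/dt = -y + v(t, x, y).

    u, v : callables returning control values, clipped to [0, 1].
    Returns arrays of times and states (x, y); states stay in the unit square.
    """
    n = int(T / dt)
    ts = np.linspace(0.0, T, n + 1)
    xs, ys = np.empty(n + 1), np.empty(n + 1)
    xs[0], ys[0] = x0, y0
    for k in range(n):
        uk = np.clip(u(ts[k], xs[k], ys[k]), 0.0, 1.0)
        vk = np.clip(v(ts[k], xs[k], ys[k]), 0.0, 1.0)
        xs[k + 1] = xs[k] + dt * (-xs[k] + uk)
        ys[k + 1] = ys[k] + dt * (-ys[k] + vk)
    return ts, xs, ys

# Example: with constant signals the trajectory converges to (u, v) = (0.7, 0.3).
ts, xs, ys = simulate(lambda t, x, y: 0.7, lambda t, x, y: 0.3, x0=0.1, y0=0.9)
```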

$$g_A(x(T), y(T)) = C_A\, x(T) y(T) - \alpha_1 x(T) - \alpha_2 y(T) + a_{22}, \qquad (2)$$

at a given instant T. Here the parameters C_A, α_1, α_2 are determined according to the classical theory of bimatrix games [12]:

$$C_A = a_{11} - a_{12} - a_{21} + a_{22}, \qquad \alpha_1 = a_{22} - a_{12}, \qquad \alpha_2 = a_{22} - a_{21}. \qquad (3)$$
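For concreteness, here is a small Python helper (an illustrative addition, not from the poster) that computes the coefficients (3) and evaluates the local payoff (2) for a 2×2 payoff matrix.

```python
import numpy as np

def bimatrix_coeffs(M):
    """Return (C, alpha1, alpha2) for a 2x2 payoff matrix M, formula (3)."""
    C = M[0, 0] - M[0, 1] - M[1, 0] + M[1, 1]
    alpha1 = M[1, 1] - M[0, 1]
    alpha2 = M[1, 1] - M[1, 0]
    return C, alpha1, alpha2

def local_payoff(M, x, y):
    """Expected payoff g(x, y) = C*x*y - alpha1*x - alpha2*y + m22, formula (2)."""
    C, a1, a2 = bimatrix_coeffs(M)
    return C * x * y - a1 * x - a2 * y + M[1, 1]

A = np.array([[10.0, 0.0], [1.75, 3.0]])   # matrix from Application 1 below
print(bimatrix_coeffs(A))    # C_A = 11.25, alpha_1 = 3.0, alpha_2 = 1.25
print(local_payoff(A, 0.5, 0.5))   # 3.6875 = 0.25 * (10 + 0 + 1.75 + 3)
```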

The payoff function g_B for the second coalition is determined according to the coefficients of matrix B. "Global" interests J^∞_A of the first coalition are defined as:

$$J^\infty_A = [J^-_A, J^+_A], \qquad J^-_A = \liminf_{t \to \infty} g_A(x(t), y(t)), \qquad J^+_A = \limsup_{t \to \infty} g_A(x(t), y(t)). \qquad (4)$$

Interests J^∞_B of the second coalition are defined analogously.

We consider the solution of the evolutionary game based on optimal control theory [10] and the theory of differential games [8]. Following [4, 7, 8, 9], we introduce the notion of a dynamical Nash equilibrium in the class of closed-loop strategies (feedbacks) U = u(t, x, y, ε), V = v(t, x, y, ε).

Definition 1. Let ε > 0 and (x_0, y_0) ∈ [0, 1] × [0, 1]. A pair of feedbacks U^0 = u^0(t, x, y, ε), V^0 = v^0(t, x, y, ε) is called a Nash equilibrium for the initial position (x_0, y_0) if for any other feedbacks U = u(t, x, y, ε), V = v(t, x, y, ε) the inequalities

$$J^-_A(x^0(\cdot), y^0(\cdot)) \ge J^+_A(x^1(\cdot), y^1(\cdot)) - \varepsilon, \qquad J^-_B(x^0(\cdot), y^0(\cdot)) \ge J^+_B(x^2(\cdot), y^2(\cdot)) - \varepsilon \qquad (5)$$

are valid for all trajectories:

$$(x^0(\cdot), y^0(\cdot)) \in X(x_0, y_0, U^0, V^0), \quad (x^1(\cdot), y^1(\cdot)) \in X(x_0, y_0, U, V^0), \quad (x^2(\cdot), y^2(\cdot)) \in X(x_0, y_0, U^0, V).$$

Here the symbol X stands for the set of trajectories that start from the initial point (x_0, y_0) and are generated by the corresponding strategy pairs (U^0, V^0), (U, V^0), (U^0, V).

A dynamical Nash equilibrium can be constructed by pasting together positive feedbacks u^0_A, v^0_B and punishing feedbacks u^0_B, v^0_A according to the relations [4]:

$$U^0 = u^0(t, x, y, \varepsilon) = \begin{cases} u^\varepsilon_A(t), & \|(x, y) - (x^\varepsilon(t), y^\varepsilon(t))\| < \varepsilon, \\ u^0_B(x, y), & \text{otherwise,} \end{cases} \qquad (6)$$

$$V^0 = v^0(t, x, y, \varepsilon) = \begin{cases} v^\varepsilon_B(t), & \|(x, y) - (x^\varepsilon(t), y^\varepsilon(t))\| < \varepsilon, \\ v^0_A(x, y), & \text{otherwise.} \end{cases} \qquad (7)$$
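A schematic Python rendering of the pasting rule (6) may make the switching logic concrete. It is only a sketch: the guaranteed program u_A^ε(·), the reference trajectory (x^ε(·), y^ε(·)) and the punishing feedback u_B^0(·,·) are hypothetical placeholders that would have to be supplied by the constructions described below.

```python
import math

def pasted_feedback_u(t, x, y, eps, u_eps_A, ref_traj, u0_B):
    """Pasting rule (6): follow the guaranteed program while the state stays
    eps-close to the reference trajectory, switch to the punishing feedback
    otherwise.

    u_eps_A : callable t -> control of the guaranteed (positive) program
    ref_traj: callable t -> (x_eps, y_eps), the reference trajectory
    u0_B    : callable (x, y) -> punishing feedback control
    """
    x_ref, y_ref = ref_traj(t)
    if math.hypot(x - x_ref, y - y_ref) < eps:
        return u_eps_A(t)
    return u0_B(x, y)
```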

Value Function for Positive Feedback

The main role in the construction of the dynamical Nash equilibrium belongs to the positive feedbacks u^0_A, v^0_B, which maximize with guarantee the mean values of g_A, g_B on the infinite horizon T → ∞. For this purpose we construct the value functions w_A, w_B of zero-sum games with infinite horizon. Based on the method of generalized characteristics for Hamilton-Jacobi-Bellman equations, we obtain the analytical structure of the value functions. For example, in the case C_A < 0 the value function w_A is determined by the system of four functions:

$$w_A(x, y) = \psi^i_A(x, y), \quad \text{if } (x, y) \in E^i_A, \quad i = 1, \ldots, 4, \qquad (8)$$

$$\psi^1_A(x, y) = a_{21} + \frac{((C_A - \alpha_1)x + \alpha_2(1 - y))^2}{4 C_A x (1 - y)}, \qquad \psi^2_A(x, y) = a_{12} + \frac{(\alpha_1(1 - x) + (C_A - \alpha_2)y)^2}{4 C_A (1 - x) y},$$

$$\psi^3_A(x, y) = C_A x y - \alpha_1 x - \alpha_2 y + a_{22}, \qquad \psi^4_A(x, y) = v_A = \frac{a_{22} C_A - \alpha_1 \alpha_2}{C_A}.$$

Here v_A is the value of the static game with matrix A. The structure of the value function w_A is presented in Figure 1.

Fig. 1. Structure of the value function w_A.
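The following Python sketch (added for illustration) evaluates the four branches ψ^1_A, ..., ψ^4_A of (8) and the static value v_A at an interior point; the domain decomposition E^1_A, ..., E^4_A of the unit square is not reproduced in this text, so selecting the correct branch at a given point is left out.

```python
def psi_branches(A, x, y):
    """Candidate branches of the value function w_A in formula (8).

    A is a 2x2 payoff matrix as nested lists; C_A is assumed nonzero and the
    point (x, y) interior, 0 < x, y < 1 (psi1 and psi2 blow up on the boundary).
    Returns (psi1, psi2, psi3, psi4); psi4 equals the static game value v_A.
    """
    a11, a12, a21, a22 = A[0][0], A[0][1], A[1][0], A[1][1]
    C = a11 - a12 - a21 + a22
    alpha1, alpha2 = a22 - a12, a22 - a21
    psi1 = a21 + ((C - alpha1) * x + alpha2 * (1 - y)) ** 2 / (4 * C * x * (1 - y))
    psi2 = a12 + (alpha1 * (1 - x) + (C - alpha2) * y) ** 2 / (4 * C * (1 - x) * y)
    psi3 = C * x * y - alpha1 * x - alpha2 * y + a22
    psi4 = (a22 * C - alpha1 * alpha2) / C   # v_A, value of the static game
    return psi1, psi2, psi3, psi4
```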

It is shown that the value function w_A has the properties of u-stability and v-stability [6, 8], which can be expressed in terms of conjugate derivatives [11]:

$$D^* w_A(x, y)\,|\,(s) \le H(x, y, s), \quad (x, y) \in (0, 1) \times (0, 1), \quad s = (s_1, s_2) \in \mathbb{R}^2, \qquad (9)$$

$$D_* w_A(x, y)\,|\,(s) \ge H(x, y, s), \quad (x, y) \in (0, 1) \times (0, 1), \quad w_A(x, y) < g_A(x, y), \quad s = (s_1, s_2) \in \mathbb{R}^2. \qquad (10)$$

Here the conjugate derivatives D^* w_A, D_* w_A and the Hamiltonian H are determined by:

$$D^* w_A(x, y)\,|\,(s) = \sup_{h \in \mathbb{R}^2} \big( \langle s, h \rangle - \partial^- w_A(x, y)\,|\,(h) \big), \qquad (11)$$

$$D_* w_A(x, y)\,|\,(s) = \inf_{h \in \mathbb{R}^2} \big( \langle s, h \rangle - \partial^+ w_A(x, y)\,|\,(h) \big), \qquad (12)$$

$$H(x, y, s) = -s_1 x - s_2 y + \max\{0, s_1\} + \min\{0, s_2\}. \qquad (13)$$
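As a brief added note, the form of the Hamiltonian (13) can be read off directly from the dynamics (1) and the control bounds, under the natural reading that the first coalition maximizes and the second minimizes in the auxiliary zero-sum game for w_A:

$$H(x, y, s) = \max_{0 \le u \le 1} \; \min_{0 \le v \le 1} \big[ s_1(-x + u) + s_2(-y + v) \big] = -s_1 x - s_2 y + \max\{0, s_1\} + \min\{0, s_2\},$$

where the order of max and min is immaterial because the terms in u and v separate.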

Model Applications

Application 1. Let us consider payoff matrices for two players on financial markets of bonds and assets. Matrices A, B reflect the behavior of "bulls" and "bears", respectively:

$$A = \begin{pmatrix} 10 & 0 \\ 1.75 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} -5 & 3 \\ 10 & 0.5 \end{pmatrix}.$$
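For reference (an added illustration, not part of the poster), the interior static Nash equilibrium of this 2×2 bimatrix game follows from the indifference conditions C_A y − α_1 = 0 and C_B x − β_2 = 0, where C_B, β_1, β_2 are defined from B in the same way as (3); this point presumably corresponds to NE in Figure 2. A short Python check:

```python
import numpy as np

def coeffs(M):
    """C, alpha1, alpha2 of a 2x2 matrix M, as in formula (3)."""
    return (M[0, 0] - M[0, 1] - M[1, 0] + M[1, 1],
            M[1, 1] - M[0, 1],
            M[1, 1] - M[1, 0])

A = np.array([[10.0, 0.0], [1.75, 3.0]])
B = np.array([[-5.0, 3.0], [10.0, 0.5]])

C_A, a1, a2 = coeffs(A)
C_B, b1, b2 = coeffs(B)

# Interior mixed equilibrium: each player's own-probability gradient vanishes.
x_NE = b2 / C_B      # from C_B * x - beta2 = 0
y_NE = a1 / C_A      # from C_A * y - alpha1 = 0
print(round(x_NE, 3), round(y_NE, 3))   # approximately 0.543, 0.267
```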

Figure 2 depicts the static Nash equilibrium NE, the switching lines K_A, K_B of the feedback strategies, the new equilibrium point ME at their intersection, and the equilibrium trajectories T_1, T_2, T_3. The new equilibrium point ME differs essentially from the static Nash equilibrium NE and provides better results for the payoff functions of both players.

Fig. 2. Equilibrium trajectories for the financial markets game.

Application 2. Let us consider an example of coordination games, which call for coordinated solutions. Such a situation describes the investment process in parallel projects:

$$A = \begin{pmatrix} 10 & 0 \\ 6 & 20 \end{pmatrix}, \qquad B = \begin{pmatrix} 20 & 0 \\ 4 & 10 \end{pmatrix}.$$

Figure 3 presents the case with three static Nash equilibria N_1, N_2, N_3. The intersection point of the switching lines K_A, K_B does not attract the equilibrium trajectories T_1, T_2, T_3, T_4. The trajectories converge to the intersection points of the lines K_A, K_B with the edges of the unit square and provide better payoff results than the Nash equilibrium N_2.

Fig. 3. Equilibrium trajectories in the coordination game of investments.
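As a quick added check of the statement about three equilibria, a best-response enumeration in Python finds the two pure-strategy Nash equilibria of this coordination game; together with the interior mixed equilibrium (computable as in Application 1) this gives three static equilibria, the labeling N_1, N_2, N_3 in Figure 3 being assumed.

```python
import numpy as np

A = np.array([[10.0, 0.0], [6.0, 20.0]])   # row player (first coalition)
B = np.array([[20.0, 0.0], [4.0, 10.0]])   # column player (second coalition)

pure_ne = []
for i in range(2):
    for j in range(2):
        row_best = A[i, j] >= A[1 - i, j]   # no profitable row deviation
        col_best = B[i, j] >= B[i, 1 - j]   # no profitable column deviation
        if row_best and col_best:
            pure_ne.append((i + 1, j + 1))

print(pure_ne)   # [(1, 1), (2, 2)]: both coalitions coordinate on the same project
```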

References

[1] Basar T., Olsder G.J. Dynamic Noncooperative Game Theory. London: Academic Press, 1982. 519 p.

[2] Friedman D. Evolutionary Games in Economics // Econometrica. 1991. Vol. 59, No. 3. P. 637–666.

[3] Hofbauer J., Sigmund K. The Theory of Evolution and Dynamical Systems. Cambridge: Cambridge Univ. Press, 1988. 341 p.

[4] Kleimenov A.F. Nonantagonistic Positional Differential Games. Ekaterinburg: Nauka, 1993. 184 p.

[5] Kolmogorov A.N. On Analytical Methods in Probability Theory // Progress of Mathematical Sciences. 1938. Vol. 5. P. 5–41.

[6] Krasovskii A.N., Krasovskii N.N. Control Under Lack of Information. Boston etc.: Birkhauser, 1994. 320 p.

[7] Krasovskii N.A., Kryazhimskiy A.V., Tarasyev A.M. Hamilton-Jacobi Equations in Evolutionary Games // Proceedings of IMM UrB RAS. 2014. Vol. 20, No. 3. P. 114–131.

[8] Krasovskii N.N., Subbotin A.I. Game-Theoretical Control Problems. New York; Berlin: Springer, 1988. 517 p.

[9] Kryazhimskii A.V., Osipov Y.S. On Differential-Evolutionary Games // Proceedings of the Mathematical Institute of RAS. 1995. Vol. 211. P. 257–287.

[10] Pontryagin L.S., Boltyanskii V.G., Gamkrelidze R.V., Mishchenko E.F. The Mathematical Theory of Optimal Processes. New York: Interscience, 1962. 360 p.

[11] Subbotin A.I., Tarasyev A.M. Conjugate Derivatives of the Value Function of a Differential Game // Soviet Math. Dokl. 1985. Vol. 32, No. 2. P. 162–166.

[12] Vorobyev N.N. Game Theory for Economists and System Scientists. Moscow: Nauka, 1985. 271 p.

Acknowledgments

The research is supported by the Russian Science Foundation, grant No. 15-11-10018.

Contact Information

E-mails: [email protected] (Nikolay Krasovskii); [email protected], [email protected] (Alexander Tarasyev).