
1192 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 5, MAY 2012

Nash Equilibrium Seeking in Noncooperative Games

Paul Frihauf, Student Member, IEEE, Miroslav Krstic, Fellow, IEEE, and Tamer Başar, Fellow, IEEE

Abstract—We introduce a non-model-based approach for locally stable convergence to Nash equilibria in static, noncooperative games with N players. In classical game theory algorithms, each player employs the knowledge of the functional form of his payoff and the knowledge of the other players' actions, whereas in the proposed algorithm, the players need to measure only their own payoff values. This strategy is based on the extremum seeking approach, which has previously been developed for standard optimization problems and employs sinusoidal perturbations to estimate the gradient. We consider static games with quadratic payoff functions before generalizing our results to games with non-quadratic payoff functions that are the output of a dynamic system. Specifically, we consider general nonlinear differential equations with inputs and outputs, where in the steady state, the output signals represent the payoff functions of a noncooperative game played by the steady-state values of the input signals. We employ standard local averaging theory and obtain local convergence results for both quadratic payoffs, where the actual convergence is semi-global, and non-quadratic payoffs, where the potential existence of multiple Nash equilibria precludes semi-global convergence. Our convergence conditions coincide with conditions that arise in model-based Nash equilibrium seeking. However, in our framework the user is not meant to check these conditions because the payoff functions are presumed to be unknown. For non-quadratic payoffs, convergence to a Nash equilibrium is not perfect, but is biased in proportion to the perturbation amplitudes and the higher derivatives of the payoff functions. We quantify the size of these residual biases and confirm their existence numerically in an example noncooperative game. In this example, we present the first application of extremum seeking with projection to ensure that the players' actions remain in a given closed and bounded action set.

Index Terms—Extremum seeking, learning, Nash equilibria, noncooperative games.

I. INTRODUCTION

WE study the problem of solving noncooperative games with N players in real time by employing a non-model-based approach where the players determine their actions using only their own measured payoff values. By utilizing deterministic extremum seeking with sinusoidal perturbations, the players attain their Nash strategies without the need for any model information. We analyze both static games and games with dynamics, where the players' actions serve as inputs to a general, stable nonlinear dynamic system whose outputs are the players' payoff values. In the latter scenario, the dynamic system evolves on a faster time scale than that of the players' strategies, resulting in a static game being played at the steady state of the dynamic system. In these scenarios, the players possess either quadratic or non-quadratic payoff functions, which may result in multiple, isolated Nash equilibria. We quantify the convergence bias relative to the Nash equilibria that results from payoff function terms of higher order than quadratic.

Manuscript received August 30, 2010; revised May 15, 2011; accepted September 23, 2011. Date of publication October 25, 2011; date of current version April 19, 2012. This work was supported by the Department of Defense (DoD), Air Force Office of Scientific Research (AFOSR), National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a, and by grants from the National Science Foundation, Department of Energy (DOE), and AFOSR. Recommended by Associate Editor J. Grizzle.

P. Frihauf and M. Krstic are with the Department of Mechanical and Aerospace Engineering, University of California at San Diego, La Jolla, CA 92093 (e-mail: [email protected]; [email protected]).

T. Başar is with the Coordinated Science Laboratory, University of Illinois, Urbana, IL 61801-2307 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TAC.2011.2173412

1) Literature Review: Most algorithms designed to achieve convergence to Nash equilibria require modeling information of the game and assume the players can observe the actions of the other players. Two classical examples are the best response and fictitious play strategies, where each player chooses the action that maximizes its payoff given the actions of the other players. The Cournot adjustment, a myopic best response strategy, was first studied by Cournot [1] and refers to the scenario where a firm in a duopoly adjusts its output to maximize its payoff based on the known output of its competitor. The strategy known as fictitious play (employed in finite games), where a player devises a best response based on the history of the other players' actions, was introduced in [2] in the context of mixed-strategy Nash equilibria in matrix games. In [3], a gradient-based learning strategy where a player updates its action according to the gradient of its payoff function was developed. Stability of general player adjustments was studied in [4] under the assumption that a player's response mapping is a contraction, which is ensured by a diagonal dominance condition for games with quadratic payoff functions. Distributed iterative algorithms for the computation of equilibria in a general class of non-quadratic convex Nash games were designed in [5], and conditions for the contraction of general nonlinear operators were obtained to achieve convergence.

In more recent work, a dynamic version of fictitious play and gradient response, which also includes an entropy term, is developed in [6] and is shown to converge to a mixed-strategy Nash equilibrium in cases where previously developed algorithms did not converge. In [7], a synchronous distributed learning algorithm, where players remember their own actions and utility values from the previous two time steps, is shown to converge in probability to a set of restricted Nash equilibria. An approach that is similar to our Nash seeking method (found in [8], [9] and in this paper) is studied in [10] to solve coordination problems in mobile sensor networks. Additional results on learning in games can be found in [11]–[15]. Some diverse engineering applications of game theory include the design of communication networks in [16]–[20], integrated structures and controls in [21], and distributed consensus protocols in [22]–[24]. A comprehensive treatment of static and dynamic noncooperative game theory can be found in [25].

0018-9286/$26.00 © 2011 IEEE

The results of this work extend the extremum seeking method [26]–[33], originally developed for standard optimization problems. Many works have used extremum seeking, which performs non-model-based gradient estimation, for a variety of applications, such as steering vehicles toward a source in GPS-denied environments [34]–[36], optimizing the control of homogeneous charge-compression ignition (HCCI) engines [37] and nonisothermal continuously stirred tank reactors [38], reducing the impact velocity of an electromechanical valve actuator [39], and controlling flow separation [40] and Tokamak plasmas [41].

2) Contributions: We develop a Nash seeking strategy to stably attain a Nash equilibrium in noncooperative N-player games. The key feature of our approach is that the players do not need to know the mathematical model of their payoff functions or of the underlying model of the game. They only need to measure their own payoff values when determining their respective time-varying actions, which classifies this learning strategy as radically uncoupled according to the terminology of [12].

As noted earlier, a game may possess multiple, isolated Nash equilibria if the players have non-quadratic payoff functions. Hence, we pursue local convergence results because, in non-quadratic problems, global results come under strong restrictions. We do, however, make the connection to semi-global practical asymptotic stability when the players have quadratic payoff functions. For non-quadratic payoff functions, we show that the convergence is biased in proportion to the amplitudes of the perturbation signals and the third derivatives of the payoff functions, and confirm this convergence bias in a numerical example. In this example, we impose a closed and bounded action set for the players and implement a Nash seeking strategy with projection to ensure that the players' actions remain in this set. This example is the first such application of extremum seeking with projection.

3) Organization: We motivate our Nash seeking strategy with a duopoly price game in Section II and prove convergence to the Nash equilibrium in N-player games with quadratic payoff functions in Section III. Also in Section III, we show that the players converge to their best response strategies when a subset of players have fixed actions. In Section IV, we provide our most general result for N-player games with non-quadratic payoff functions and a dynamic mapping from the players' actions to their payoff values. A detailed numerical example is provided in Section V before concluding with Section VI.

II. TWO-PLAYER GAME

To introduce our Nash seeking algorithm, we first consider a specific two-player noncooperative game, which, for example, may represent two firms competing for profit in a duopoly market structure. Common duopoly examples include the soft drink companies, Coca-Cola and Pepsi, and the commercial aircraft companies, Boeing and Airbus. We present a duopoly price game in this section for motivational purposes before proving convergence to the Nash equilibrium when players with quadratic payoff functions employ our Nash seeking strategy in Section III.

Fig. 1. Deterministic Nash seeking schemes applied by players in a duopoly market structure.

Let players P1 and P2 represent two firms that produce the same good, have dominant control over a market, and compete for profit by setting their prices u_1 and u_2, respectively. The profit of each firm is the product of the number of units sold and the profit per unit, which is the difference between the sale price and the marginal or manufacturing cost of the product. In mathematical terms, the profits are modeled by

J_i = s_i (u_i − m_i)   (1)

where s_i is the number of sales, m_i the marginal cost, and i = 1, 2 for P1 and P2. Intuitively, the profit of each firm will be low if it either sets the price very low, since the profit per unit sold will be low, or if it sets the price too high, since then consumers will buy the other firm's product. The maximum profit is to be expected to lie somewhere in the middle of the price range, and it crucially depends on the price level set by the other firm.

To model the market behavior, we assume a simple, but quite realistic model, where for whatever reason, the consumer prefers the product of P1, but is willing to buy the product of P2 if its price is sufficiently lower than the price set by P1. Hence, we model the sales for each firm as

(2)

(3)

where the total consumer demand is held fixed for simplicity, the preference of the consumer for P1 is quantified by a preference parameter, and certain inequalities relating the prices and parameters are assumed to hold.

Substituting (2) and (3) into (1) yields expressions for the profits J_1 and J_2 that are both quadratic functions of the prices u_1 and u_2, namely,

(4)

(5)

and thus, the Nash equilibrium is easily determined to be

(6)

(7)


Fig. 2. (a) Price and (b) profit time histories for P1 and P2 when implementing the Nash seeking scheme (9)–(10). The dashed lines denote the values at the Nash equilibrium.

To make sure the constraints are satisfied by the Nash equilibrium, we assume that the relevant model parameters lie in a suitable interval; in a special case, this condition is automatically satisfied.

For completeness, we provide here the definition of a Nash equilibrium u^* = (u_1^*, …, u_N^*) in an N-player game:

J_i(u_i^*, u_{-i}^*) ≥ J_i(u_i, u_{-i}^*), for all u_i ∈ U_i, i = 1, …, N   (8)

where J_i is the payoff function of player i, u_i its action, U_i its action set, and u_{-i} denotes the actions of the other players. Hence, no player has an incentive to unilaterally deviate its action from u_i^*. In the duopoly example, U_i = ℝ_+, where ℝ_+ denotes the set of positive real numbers.

To attain the Nash strategies (6)–(7) without any knowledge of modeling information, such as the consumer's preference, the total demand, or the other firm's marginal cost or price, the firms implement a non-model-based real-time optimization strategy, e.g., deterministic extremum seeking with sinusoidal perturbations, to set their price levels. Specifically, P1 and P2 set their prices u_1 and u_2, respectively, according to the time-varying strategy (Fig. 1):

u_i(t) = û_i(t) + a_i sin(ω_i t)   (9)

dû_i(t)/dt = k_i a_i sin(ω_i t) J_i(t)   (10)

where the amplitudes a_i, the gains k_i, and the frequencies ω_i are positive. Further, the frequencies are of the form

ω_i = ω ω̄_i   (11)

where ω is a positive, real number and ω̄_i is a positive, rational number. This form is convenient for the convergence analysis performed in Section III. The resulting pricing, sales, and profit transients when the players implement (9)–(10) are shown in Fig. 2. From Fig. 2(a), we see that convergence to the Nash equilibrium is not trivially achieved: one firm's price first moves away from its equilibrium value before converging to it, due to the overall system dynamics of (9)–(10) with (4)–(5). In these simulations (and those shown in Section III-D), we utilize a washout (high-pass) filter in each player's extremum seeking loop [28], but we do not include this filter in the theoretical derivations since it is not needed to derive our stability results. Its inclusion would substantially lengthen the presentation due to the doubling of the state dimension and obfuscate the stability results. The washout filter removes the DC component from the measured payoff signal, which, while not necessary, typically improves performance.
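The perturbation-based scheme (9)–(10) can be illustrated with a short simulation. The sketch below applies the sinusoidal-perturbation update to a hypothetical quadratic duopoly; the profit coefficients, gains, amplitudes, and frequencies are illustrative choices, not the values behind Fig. 2. Each player integrates only the product of its own measured payoff with its own perturbation.

```python
import numpy as np

# Hypothetical quadratic duopoly profits (illustrative coefficients only).
def profits(u):
    u1, u2 = u
    j1 = -u1**2 + 0.5 * u1 * u2 + 2.0 * u1   # concave in u1
    j2 = -u2**2 + 0.5 * u1 * u2 + 1.0 * u2   # concave in u2
    return np.array([j1, j2])

# Nash equilibrium from the first-order conditions:
# -2*u1 + 0.5*u2 + 2 = 0 and 0.5*u1 - 2*u2 + 1 = 0.
u_star = np.linalg.solve(np.array([[-2.0, 0.5], [0.5, -2.0]]),
                         np.array([-2.0, -1.0]))

def nash_seek(T=100.0, dt=1e-3):
    a = np.array([0.1, 0.1])     # perturbation amplitudes a_i
    k = np.array([20.0, 20.0])   # adaptation gains k_i
    w = np.array([70.0, 50.0])   # distinct frequencies with rational ratio
    uhat = np.zeros(2)           # action estimates
    t = 0.0
    for _ in range(int(T / dt)):
        s = np.sin(w * t)
        u = uhat + a * s                    # (9): action = estimate + dither
        J = profits(u)                      # each player measures own payoff
        uhat = uhat + dt * k * a * s * J    # (10): Euler step of the update
        t += dt
    return uhat

u_final = nash_seek()   # settles in a small neighborhood of u_star
```

On average, the update acts like gradient ascent of each player's own payoff, with a residual oscillation of order a_i, which is why convergence is to a neighborhood rather than to the exact equilibrium.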

In contrast, the firms are also guaranteed to converge to the Nash equilibrium when employing the standard parallel action update scheme [25, Proposition 4.1]

(12)

(13)

which requires each firm to know both its own marginal cost and the other firm's price at the previous step of the iteration, and also requires P1 to know the total demand and the consumer preference parameter. In essence, P1 must know nearly all of the relevant modeling information. When using the extremum seeking algorithm (9)–(10), the firms only need to measure the values of their own payoff functions J_1 and J_2. Convergence of (12)–(13) is global, whereas the convergence of the Nash seeking strategy for this example can be proved to be semi-global, following [29], or local, by applying the theory of averaging [43]. We establish local results in this paper since we consider non-quadratic payoff functions in Section IV. We do, however, state a non-local result for static games with quadratic payoff functions using the theory found in [42]. For a detailed analysis of the non-local convergence of extremum seeking controllers applied to general convex systems, the reader is referred to [29].
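A parallel best-response iteration in the spirit of (12)–(13) can be sketched as follows; the quadratic profit coefficients here are hypothetical, and the point is that each firm must know its own profit model and observe the other firm's previous price.

```python
# Parallel best-response iteration sketch (model-based, cf. (12)-(13)).
# Hypothetical profits: J1 = -u1^2 + 0.5*u1*u2 + 2*u1,
#                       J2 = -u2^2 + 0.5*u1*u2 + 1*u2.
def best_response(u_other, cross, lin):
    # Maximizer of -u^2 + cross*u*u_other + lin*u over the own price u.
    return (cross * u_other + lin) / 2.0

u1, u2 = 0.0, 0.0
for _ in range(60):
    # Both firms update simultaneously from the previous prices.
    u1, u2 = best_response(u2, 0.5, 2.0), best_response(u1, 0.5, 1.0)
```

The iteration converges geometrically here because the response mapping is a contraction, in line with the diagonal dominance discussion for quadratic games in [4]; the extremum seeking strategy reaches the same point without any of this model knowledge.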

III. N-PLAYER GAMES WITH QUADRATIC PAYOFF FUNCTIONS

We now generalize the duopoly example in Section II to static noncooperative games with N players that wish to maximize their quadratic payoff functions. We prove convergence to a neighborhood of the Nash equilibrium when the players employ the Nash seeking strategy (9)–(10).


A. General Quadratic Games

First, we consider games with general quadratic payoff functions. Specifically, the payoff function of player i is of the form

(14)

where the action of player j is u_j and the remaining coefficients are constants, with each player's payoff strictly concave in his own action. Quadratic games of this form are studied in [25, Sec. 4.6], where Proposition 4.6 states that the N-player game with payoff functions (14) admits a Nash equilibrium if and only if the set of first-order conditions

(15)

admits a solution. Rewritten in matrix form, these conditions constitute a linear system for the equilibrium actions

(16)

and the Nash equilibrium u^* is unique if the matrix in (16) is invertible. We make the following stronger assumption concerning this matrix:

Assumption 3.1: The matrix given by (16) is strictly diagonally dominant, i.e., in each row, the magnitude of the diagonal entry exceeds the sum of the magnitudes of the off-diagonal entries:

(17)

By Assumption 3.1, the Nash equilibrium u^* exists and is unique, since strictly diagonally dominant matrices are nonsingular by the Levy–Desplanques theorem [44], [45]. To attain u^* stably in real time, without any modeling information, each player employs the extremum seeking strategy (9)–(10).
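Assumption 3.1 and the Levy–Desplanques theorem can be illustrated numerically: a strictly diagonally dominant coefficient matrix is nonsingular, so the first-order conditions pin down a unique equilibrium. A sketch with a randomly generated hypothetical game (the matrix and linear terms are not from any example in the paper):

```python
import numpy as np

# Build a hypothetical game matrix with a dominant negative diagonal,
# check strict diagonal dominance, and solve the linear first-order
# conditions (cf. (15)-(16)) for the unique Nash equilibrium.
rng = np.random.default_rng(0)
N = 5
A = rng.uniform(-0.1, 0.1, size=(N, N))   # weak off-diagonal couplings
np.fill_diagonal(A, -1.0)                 # strong negative diagonal
d = rng.uniform(0.5, 1.5, size=N)         # linear payoff terms

off_diag_sums = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
assert np.all(np.abs(np.diag(A)) > off_diag_sums)   # Assumption 3.1 holds

u_star = np.linalg.solve(A, -d)   # nonsingular by Levy-Desplanques
```

Because no strictly diagonally dominant matrix is singular, the solve never fails for matrices constructed this way.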

Theorem 1: Consider the system (9)–(10) with (14) under Assumption 3.1 for an N-player game, where ω_i ≠ ω_j, ω_i ≠ ω_j + ω_k, and 2ω_i ≠ ω_j + ω_k for all distinct i, j, and k, and where ω_i/ω_j is rational for all i, j. There exist constants such that, for ω sufficiently large and for sufficiently small initial error, for all t ≥ 0,

(18)

where |·| denotes the Euclidean norm.

Proof: Denote the relative Nash equilibrium error as

ũ_i(t) = û_i(t) − u_i^*   (19)

By substituting (14) into (9)–(10), we get the error system

(20)

Let τ = ωt, where ω is the positive, real number in (11). Rewriting (20) in the time scale τ and rearranging terms yields

(21)

where the rescaled frequencies ω̄_i = ω_i/ω are rational numbers. Hence, the error system (21) is periodic with period equal to the least common multiple of the individual perturbation periods 2π/ω̄_i, where LCM denotes the least common multiple. With 1/ω as a small parameter, (21) admits the application of the averaging theory [43] for stability analysis. The average error system can be shown to be

(22)

which in matrix form is a linear time-invariant system with system matrix A given by

(23)

(The details of computing (22), which require that ω_i ≠ ω_j, ω_i ≠ ω_j + ω_k, and 2ω_i ≠ ω_j + ω_k for all distinct i, j, and k, are shown in Appendix A.)

From the Gershgorin Circle Theorem [45, Theorem 6.1.1], the spectrum of A is contained in the union of the Gershgorin discs

(24)

Since the diagonal entries of A are negative and A is strictly diagonally dominant, the union of the Gershgorin discs lies strictly in the left half of the complex plane, and we conclude that every eigenvalue of A has negative real part. Thus,
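The Gershgorin argument can be checked numerically: each eigenvalue lies in a disc centered at a diagonal entry with radius equal to the corresponding off-diagonal absolute row sum, so a strictly diagonally dominant matrix with negative diagonal has its spectrum in the open left half-plane. A sketch with a hypothetical matrix of that type:

```python
import numpy as np

# Gershgorin disc check: with negative diagonal entries that dominate
# the off-diagonal row sums, every disc (hence every eigenvalue) lies
# strictly in the left half of the complex plane, i.e., A is Hurwitz.
rng = np.random.default_rng(1)
N = 6
A = rng.uniform(-0.1, 0.1, size=(N, N))
np.fill_diagonal(A, -1.0)

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)
right_edges = centers + radii            # rightmost point of each disc
eigs = np.linalg.eigvals(A)

assert np.all(right_edges < 0)           # discs in the open left half-plane
assert np.all(eigs.real <= right_edges.max() + 1e-12)
assert np.all(eigs.real < 0)             # so the average system is stable
```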


given any symmetric positive definite matrix Q, there exists a symmetric positive definite matrix P satisfying the Lyapunov equation PA + AᵀP = −Q, where A is the average system matrix in (23). Using V = ũᵀPũ as a Lyapunov function, we obtain

(25)

Noting that V satisfies the bounds λ_min(P)|ũ|² ≤ V ≤ λ_max(P)|ũ|², and applying the Comparison Lemma [43] gives

(26)

where

(27)

From (26) and [43, Th. 10.4], we obtain the exponential convergence bound of Theorem 1, provided the initial error is sufficiently close to zero. Reverting to the time scale t completes the proof.

From the proof, we see that the convergence result holds if (23) is Hurwitz, which does not require the strict diagonal dominance assumption. However, Assumption 3.1 allows convergence to hold for any choice of the positive parameters k_i and a_i, whereas merely assuming (23) is Hurwitz would create a potentially intricate dependence on the unknown game model and the selection of these parameters. Also, while we have considered only the case where the action variables of the players are scalars, the results equally apply to the vector case by simply considering each component of a player's action variable to be controlled by a different (virtual) player. In this case, the payoff functions of all virtual players corresponding to a given player will be the same.

Even though the Nash equilibrium is unique for quadratic payoffs, Theorem 1 is local due to our use of standard local averaging theory. From the theory in [42], we have the following non-local result:
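The Lyapunov step in the proof (find P = Pᵀ > 0 with PA + AᵀP = −Q for Hurwitz A) can be reproduced numerically by vectorizing the equation with Kronecker products; the matrix A below is a hypothetical strictly diagonally dominant stand-in for the average system matrix in (23).

```python
import numpy as np

# Numeric sketch of the Lyapunov step: for a Hurwitz A and any
# Q = Q^T > 0, solve P A + A^T P = -Q via column-major vectorization,
# then check that P is symmetric positive definite.
rng = np.random.default_rng(2)
N = 4
A = rng.uniform(-0.1, 0.1, size=(N, N))
np.fill_diagonal(A, -1.0)                # Hurwitz by Gershgorin
Q = np.eye(N)

I = np.eye(N)
M = np.kron(A.T, I) + np.kron(I, A.T)    # vec(PA + A^T P) = M vec(P)
vecP = np.linalg.solve(M, -Q.flatten(order="F"))
P = vecP.reshape(N, N, order="F")

assert np.allclose(P, P.T, atol=1e-10)           # P is symmetric
assert np.all(np.linalg.eigvalsh(P) > 0)         # and positive definite
assert np.allclose(P @ A + A.T @ P, -Q)          # solves the equation
```

With column-major vec, vec(PA + AᵀP) = (Aᵀ ⊗ I + I ⊗ Aᵀ) vec(P), so the Lyapunov equation reduces to a single linear solve; for larger problems a dedicated solver such as SciPy's solve_continuous_lyapunov is preferable.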

Corollary 1: Consider N players with quadratic payoff functions (14) that implement the Nash seeking strategy (9)–(10) with frequencies satisfying the inequalities stated in Theorem 1. Then, the Nash equilibrium is semi-globally practically asymptotically stable.

Proof: In the proof of Theorem 1, the average error system (22) is shown to be globally asymptotically stable. By [42, Theorem 2], with the error system (21) satisfying the theorem's conditions, the origin of (21) is semi-globally practically asymptotically stable.

For more details on semi-global convergence with extremum seeking controllers, the reader is referred to [29].

B. Symmetric Quadratic Games

If the matrix in (16) is symmetric, we can develop a more precise expression for the convergence rate in Theorem 1. Specifically, we assume the following.

Assumption 3.2: The matrix given by (16) is symmetric.

Under Assumptions 3.1 and 3.2, this matrix is negative definite and symmetric. Noncooperative games satisfying Assumption 3.2 are a class of games known as potential games [46, Th. 4.5]. In potential games, the maximization of the players' payoff functions corresponds to the maximization of a single global function, known as the potential function.

Corollary 2: Consider the system (9)–(10) with (14) under Assumptions 3.1 and 3.2 for an N-player game, where ω_i ≠ ω_j, ω_i ≠ ω_j + ω_k, and 2ω_i ≠ ω_j + ω_k for all distinct i, j, and k, and where ω_i/ω_j is rational for all i, j. The convergence properties of Theorem 1 hold with

(28)

(29)

Proof: From the proof of Theorem 1, given any symmetric positive definite matrix Q, there exists a symmetric positive definite matrix P satisfying the Lyapunov equation, since the matrix in (23) is Hurwitz. Under Assumption 3.2, a convenient choice of Q yields an explicit solution P. Then, we directly have

(30)

and using the Gershgorin Circle Theorem [45, Th. 6.1.1], we obtain the bound

(31)

From (27), (30), and (31), we obtain the result.

Of note, the coefficient in Corollary 2 is determined completely by the extremum seeking parameters, while the convergence rate depends on both the extremum seeking parameters and the unknown game parameters. Thus, the convergence rate cannot be fully known since it depends on the unknown structure of the game.

C. Duopoly Price Game

Revisiting the duopoly example in Section II, we see that the profit functions (4) and (5) have the form (14). Moreover, this game satisfies both Assumptions 3.1 and 3.2, since the corresponding matrix is symmetric and strictly diagonally dominant. By Theorem 1 and Corollary 2, the firms converge to a neighborhood of (6)–(7), with the coefficient in the bound given by (28).

Fig. 3. Model of sales s_1, s_2, s_3 in a three-firm oligopoly with prices u_1, u_2, u_3 and total consumer demand. The desirability of product i is proportional to the inverse of the resistance R_i.

D. Oligopoly Price Game

Consider a static noncooperative game with N firms in an oligopoly market structure that compete to maximize their profits by setting the price of their product. As in the duopoly example in Section II, let m_i be the marginal cost of player i, s_i its sales volume, and J_i its profit, given by (1). The simplified sales model for the duopoly (2)–(3), where one firm has a clear advantage over the other in terms of consumer preference, is no longer appropriate. Instead, we model the sales volume as

(32)

where the total consumer demand is held fixed and the resistances R_i appearing in (32) are positive.

The sales model (32) is motivated by an analogous electric circuit, shown in Fig. 3, where the total demand acts as an ideal current generator, the prices act as ideal voltage generators, and, most importantly, the resistors R_i represent the "resistance" that consumers have toward buying product i. This resistance may be due to quality or brand image considerations: the most desirable products have the lowest R_i. The sales s_i in (32) are inversely proportional to R_i and grow as the own price decreases and as the competitors' prices increase. The profit (1), in electrical analogy, corresponds to the power absorbed by a portion of the voltage generator.

Proposition 1: There exists ω* such that, for all ω > ω*, the system (9)–(10) with (1) and (32) for the oligopoly price game, where ω_i ≠ ω_j for all distinct i and j, and where ω_i/ω_j is rational for all i, j, exponentially converges to a neighborhood of the Nash equilibrium

(33)

Namely, if the initial error is sufficiently small, then for all t ≥ 0

(34)

where the decay rate is as defined in Theorem 1 and the coefficient is given by (28).

Proof: Substituting (32) into (1) yields payoff functions that are of the form (14). Therefore, the Nash equilibrium satisfies the linear system (16), whose elements take one value on the diagonal and another off the diagonal, for i, j = 1, …, N. Assumption 3.1 is satisfied since each diagonal element strictly dominates the sum of the magnitudes of the off-diagonal elements in its row. Thus, the Nash equilibrium of this game exists, is unique, and can be shown to be (33). (The various parameters here are assumed to be selected such that the equilibrium price is positive for all firms.) Moreover, the matrix is symmetric, so it is a negative definite symmetric matrix, satisfying Assumption 3.2. Then, by Theorem 1 and Corollary 2, we have the convergence bound (34), with the decay rate obtained by a direct computation. Finally, the error system (20) for this game does not contain any terms with products of two different perturbation signals, so the requirements ω_i ≠ ω_j + ω_k and 2ω_i ≠ ω_j + ω_k for all distinct i, j, and k do not arise when computing the average error system.

The resulting pricing, sales, and profit transients when four firms implement (9)–(10) are shown in Fig. 4 for a simulation with representative choices of the game parameters, extremum seeking parameters, and initial conditions.

E. N-Player Games With Quadratic Payoff Functions and Stubborn Players

An interesting scenario to consider is when not all of the players utilize the Nash seeking strategy (9)–(10). Without loss of generality, we assume that the first group of players implements


1198 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 57, NO. 5, MAY 2012

Fig. 4. (a) Price and (b) profit time histories of firms P1, P2, P3, and P4 when implementing the Nash seeking scheme (9)–(10). The dashed lines denote the values at the Nash equilibrium.

(9)–(10) while the remaining players are stubborn and use fixed actions

(35)

To discuss the convergence of the Nash seeking players, we introduce their reaction curves [25, Def. 4.3], which are the players' best response strategies given the actions of the other players. When the players have quadratic payoff functions (14), their reaction curves are

(36)

where the argument denotes the actions of the other players. In the presence of stubborn players, the remaining players' unique best response is given by

(37)
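The role of (37) can be sketched numerically: with quadratic payoffs, each Nash-seeking player's stationarity condition is linear, so the seekers' joint best response against fixed stubborn actions solves a small linear system. The payoff coefficients below are hypothetical and are not the paper's:

```python
# Hedged sketch of the fixed best response under stubborn players: two
# seekers (players 1, 2) and one stubborn player (player 3, fixed at
# u3_fixed). Assumed payoffs: J_i = -u_i^2 + b*u_i*(sum of others) + e*u_i
# for i = 1, 2, giving the linear stationarity conditions
#   2*u1 - b*u2 = b*u3 + e  and  2*u2 - b*u1 = b*u3 + e.

def best_response_with_stubborn(u3_fixed, b=0.5, e=1.0):
    rhs = b * u3_fixed + e
    det = 4.0 - b * b                 # determinant of [[2, -b], [-b, 2]]
    u1 = (2.0 * rhs + b * rhs) / det  # Cramer's rule
    u2 = (b * rhs + 2.0 * rhs) / det
    return u1, u2

u1, u2 = best_response_with_stubborn(0.5)
# Each seeker's action is a fixed point of its own reaction curve:
assert abs(u1 - (0.5 * (u2 + 0.5) + 1.0) / 2.0) < 1e-12
```

With no stubborn players, the same construction recovers the full Nash equilibrium, mirroring the remark following (37).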

When there are no stubborn players, the best response (37) coincides with the Nash equilibrium.

Theorem 2: Consider the N-player game with (14) under Assumption 3.1, where a subset of the players implements the strategy (9)–(10), with probing frequencies that are distinct, have distinct pairwise sums, and have rational ratios, and the stubborn players implement (35). There exist constants such that, for all sufficiently small parameters, if the initial error is sufficiently small, then for all time

(38)

where the bound's constants are as in Theorem 1 and the limit point is given by (37).

Proof: Following the proof of Theorem 1 for the Nash seeking players, one can obtain

(39)

where the stubborn players' errors relative to the Nash equilibrium enter as constant forcing terms. The unique, exponentially stable equilibrium of (39) is

(40)

and so from the proof of Theorem 1 and [43, Th. 10.4], we have

(41)

for the Nash seeking players. What remains to be shown is that the limit point is the best response of each Nash seeking player.

From (35) and (41), the players' actions converge on average to the point determined by (35) and (40). At that point, the best response of a Nash seeking player is

(42)

Page 8: 1192 IEEE TRANSACTIONS ON AUTOMATIC …flyingv.ucsd.edu/papers/PDF/157.pdf · Nash Equilibrium Seeking in Noncooperative Games Paul Frihauf, ... In classical game theory algorithms,

FRIHAUF et al.: NASH EQUILIBRIUM SEEKING IN NONCOOPERATIVE GAMES 1199

where we have substituted the stubborn players' fixed actions. Noting (36) and using (40) to substitute yields

(43)

Hence, the Nash seeking players converge to their best response actions.

To provide more insight into this result, we note that the average response of player i is

(44)

which is the scaled, continuous-time best response for games with quadratic payoff functions. Continuous-time best response dynamics are studied for more general scenarios in [47], [48].
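Continuous-time best response dynamics of this kind can be sketched for a hypothetical two-player quadratic game; the payoffs and gains below are illustrative, not the paper's:

```python
# Minimal sketch of continuous-time best response dynamics,
#   du_i/dt = -u_i + BR_i(u_{-i}),
# for an assumed two-player quadratic game with payoffs
#   J1 = -u1^2 + u1*u2 + u1  and  J2 = -u2^2 + u1*u2,
# whose reaction curves are BR1(u2) = (u2 + 1)/2 and BR2(u1) = u1/2.
# The unique Nash equilibrium of this toy game is (2/3, 1/3).

def simulate_br(u1=0.0, u2=0.0, dt=0.01, T=30.0):
    steps = int(T / dt)
    for _ in range(steps):
        br1 = (u2 + 1.0) / 2.0
        br2 = u1 / 2.0
        u1 += dt * (-u1 + br1)   # forward Euler on the BR dynamics
        u2 += dt * (-u2 + br2)
    return u1, u2

u1, u2 = simulate_br()   # converges near the Nash equilibrium (2/3, 1/3)
```

For this contractive quadratic game, the linearized dynamics have strictly negative eigenvalues, so the trajectory settles exponentially at the Nash equilibrium.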

IV. N-PLAYER GAMES WITH NON-QUADRATIC PAYOFF FUNCTIONS

Now consider a more general noncooperative game with N players and a dynamic mapping from the players' actions to their payoff values. Each player attempts to maximize the steady-state value of its payoff. Specifically, we consider a general nonlinear model

(45)

(46)

where the first equation governs the state, driven by the vector of the players' actions, and the second gives each player's payoff value; the system maps are smooth, and the payoff output is a possibly non-quadratic function. Oligopoly games may possess nonlinear demand and cost functions [50], which motivates the inclusion of the dynamic system (45) in the game structure and the consideration of non-quadratic payoff functions. For this scenario, we pursue local convergence results since the payoff functions may be non-quadratic and multiple, isolated Nash equilibria may exist. If the payoff functions are quadratic, semi-global practical stability can be achieved following the results of [29].

We make the following assumptions about this N-player game.

Assumption 4.1: There exists a smooth steady-state map such that

(47)

Assumption 4.2: For each fixed action vector, the corresponding equilibrium of (45) is locally exponentially stable.

Hence, we assume that for all actions, the nonlinear dynamic system is locally exponentially stable. We can relax the requirement that this assumption hold for every action, since we need only be concerned with the action sets of the players, and we do this in Section V for our numerical example. For notational convenience, we use the more restrictive case here.

The following assumptions are central to our Nash seeking scheme as they ensure that at least one stable Nash equilibrium exists at steady state.

Assumption 4.3: There exist one or more isolated stable Nash equilibria such that, for all players,

(48)

Assumption 4.4: The matrix

(49)

is strictly diagonally dominant and, hence, nonsingular.

By Assumptions 4.3 and 4.4, the matrix in (49) is Hurwitz. As with the games considered in Sections II and III, each player converges to a neighborhood of a stable Nash equilibrium by implementing the extremum seeking strategy (9)–(10) to evolve its action according to the measured value of its payoff. Unlike the previous games, however, we select the parameters

where the gains are small, positive constants related to the players' frequencies by (11). Intuitively, these parameters are small since the players' actions should evolve more slowly than the dynamic system, creating an overall system with two time scales. In contrast, our earlier analysis assumed only the perturbation parameter to be small, which can be seen as the limiting case where the dynamic system is infinitely fast and the adaptation gains may be large.
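To make the measurement-only character of the scheme concrete, the basic sinusoidal-perturbation update described in the abstract can be sketched for a static quadratic game (the limiting case of an infinitely fast plant). The payoffs, gains, frequencies, and amplitudes below are hypothetical; the exact form of (9)–(10) may include additional filters:

```python
import math

# Two-player extremum seeking sketch: each player probes its action with a
# sinusoid of its own frequency and updates using only its own measured
# payoff. Assumed quadratic payoffs (illustrative, not the paper's):
#   J1 = -u1^2 + u1*u2 + u1,  J2 = -u2^2 + u1*u2,
# with Nash equilibrium (2/3, 1/3). Frequencies are distinct, as required.

def es_quadratic(T=200.0, dt=0.005, k=10.0, a=0.1, w=(5.0, 7.0)):
    th = [0.0, 0.0]                 # players' action estimates
    tail = [0.0, 0.0]               # accumulators for a tail average
    steps = int(T / dt)
    for n in range(steps):
        t = n * dt
        u1 = th[0] + a * math.sin(w[0] * t)   # probed actions
        u2 = th[1] + a * math.sin(w[1] * t)
        J1 = -u1 * u1 + u1 * u2 + u1          # player 1 measures only J1
        J2 = -u2 * u2 + u1 * u2               # player 2 measures only J2
        th[0] += dt * k * a * math.sin(w[0] * t) * J1
        th[1] += dt * k * a * math.sin(w[1] * t) * J2
        if n >= steps - 4000:                 # average out residual ripple
            tail[0] += th[0]
            tail[1] += th[1]
    return tail[0] / 4000.0, tail[1] / 4000.0

u1_avg, u2_avg = es_quadratic()   # settles near the Nash equilibrium
```

On average, the demodulated update behaves like a scaled gradient step on each player's own payoff, which is why the estimates drift to a neighborhood of the Nash equilibrium without any model knowledge.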

Formulating the error system clarifies why these parameter selections are made. The error relative to the Nash equilibrium is denoted as in (19), which, in the fast time scale, leads to

(50)

(51)

The system (50)–(51) is in the standard singular perturbation form, with the time-scale separation parameter appearing as a small parameter. Since the adaptation gain is also small, we analyze (50)–(51) using averaging theory for the quasi-steady state of (50), followed by the use of singular perturbation theory for the full system.


A. Averaging Analysis

For the averaging analysis, we first "freeze" the state in (50) at its quasi-steady state, which we substitute into (51) to obtain the "reduced system"

(52)

This system is in a form suitable for the application of averaging theory [43] and leads to the following result.

Theorem 3: Consider the system (52) for an N-player game under Assumptions 4.3 and 4.4, where the probing frequencies are distinct, their pairwise sums differ from the remaining frequencies, and all frequency ratios are rational. There exist parameters such that, for all sufficiently small gains and amplitudes, if the initial error is sufficiently small, then for all time

(53)

where the bound's constants and the bias coefficients are given by

(54)

(55)

(56)

(57)

where (57) is defined piecewise, distinguishing the case of coinciding indices from the case of distinct ones.

Proof: As already noted, the form of (52) allows for the application of averaging theory, which yields the average error system

(58)

The equilibrium of (58) satisfies

(59)

for all players, and we postulate that the equilibrium has the form

(60)

By approximating the payoff gradient about the Nash equilibrium in (59) with a Taylor polynomial and substituting (60), the unknown coefficients can be determined.

To capture the effect of higher order derivatives on the average error system's equilibrium, we use the Taylor polynomial approximation [49], which requires the payoff functions to be sufficiently many times differentiable:

(61)

where the remainder is evaluated at a point on the line segment that connects the expansion point and the Nash equilibrium. In (61), we have used multi-index notation. The second term on the last line of (61) follows by substituting the postulated form (60).

For this analysis, we choose to capture the effect of the third order derivative on the system as a representative case. Higher order estimates of the bias can be pursued if the third order derivative is zero. Substituting (61) into (59) and computing the average of each term gives

(62)

where we have noted (48), utilized (60), and computed the integrals shown in Appendices A and B. Substituting (60) into (62) and matching first order powers of the perturbation amplitudes gives a linear system whose only solution sets the first-order coefficients to zero for all players, since the matrix (49) is nonsingular by Assumption 4.4. Similarly, matching second-order terms,


and substituting the first-order result to simplify the resulting expressions yields

a linear system in which the coefficient defined in (56) appears. Thus, the cross coefficients vanish, the remaining second-order coefficients are given by (55), and the equilibrium of the average system is then

(63)

By again utilizing a Taylor polynomial approximation, one can show that the Jacobian of (58) at the equilibrium has elements given by

(64)

By Assumptions 4.3 and 4.4, the Jacobian (64) is Hurwitz for sufficiently small amplitudes, which implies that the equilibrium (63) of the average error system (58) is locally exponentially stable, i.e., there exist positive constants bounding the exponential decay of the average error, which with [43, Th. 10.4] implies

(65)

provided the initial condition is sufficiently close to the equilibrium. Defining the constants as in Theorem 3 completes the proof.

From Theorem 3, we see that the error state of the reduced system (52) converges to a region that is biased away from the Nash equilibrium. This bias is proportional to the perturbation magnitudes and to the third derivatives of the payoff functions, which are captured by the coefficients in (55). Specifically, the reduced system converges to the biased equilibrium (63) as time grows.

Theorem 3 can be viewed as a generalization of Theorem 1, but with a focus on the error system to highlight the effect of the payoff functions' non-quadratic terms on the players' convergence. This emphasis is needed because the reduced system still converges to a neighborhood of the Nash equilibrium whose size is of the order of the perturbation amplitudes, as in the quadratic payoff case, the bias itself being of higher order.
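The third-derivative bias can be illustrated in the scalar case. For a pure sinusoidal probe, the standard Taylor computation gives an average demodulated signal of (a/2) f'(u) + (a^3/16) f'''(u) plus higher-order terms, so the effective gradient estimate carries a bias proportional to the squared amplitude and the third derivative. The following check is a scalar illustration of that mechanism, not the paper's exact multi-player constants:

```python
import math

# Scalar illustration of the averaging bias for a non-quadratic payoff:
# the period average of sin(s) * f(u + a*sin(s)) equals
#   (a/2) f'(u) + (a^3/16) f'''(u) + higher-order terms,
# so a cubic-or-higher payoff biases the extremum seeking "gradient".

def probed_average(f, u, a, n=100_000):
    """Riemann average of sin(s) * f(u + a*sin(s)) over one period."""
    total = 0.0
    for i in range(n):
        s = 2.0 * math.pi * i / n
        total += math.sin(s) * f(u + a * math.sin(s))
    return total / n

f = lambda u: u ** 4      # non-quadratic payoff; f'(1) = 4, f'''(1) = 24
a = 0.1
avg = probed_average(f, 1.0, a)
predicted = (a / 2.0) * 4.0 + (a ** 3 / 16.0) * 24.0
```

For this quartic example the two-term formula is exact, since the fifth derivative vanishes, and the numerical average matches the prediction.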

B. Singular Perturbation Analysis

We analyze the full system (50)–(51) in the slow time scale using singular perturbation theory [43]. In Section IV-A, we analyzed the reduced model (52); now, we must study the boundary layer model to state our convergence result. First, however, we translate the equilibrium of the reduced model to the origin by means of a shifted error variable, where, by [43, Th. 10.4], the shift is a unique, exponentially stable, periodic solution such that

(66)

In this new coordinate system, we have

(67)

(68)

which from Assumption 4.1 has a quasi-steady state, and consequently, the reduced model in the new coordinates is

(69)

which has an equilibrium at the origin that is exponentially stable for sufficiently small parameters.

To formulate the boundary layer model, we shift the state by its quasi-steady state; then, in the fast time scale, we have

(70)

where the slow variables should be viewed as parameters independent of the fast time variable. The origin is an equilibrium of (70) and is exponentially stable by Assumption 4.2.

With the time-scale separation parameter as a singular perturbation parameter, we apply Tikhonov's Theorem on the Infinite Interval [43, Th. 11.2] to (67)–(68), which requires the origin to be an exponentially stable equilibrium point of both the reduced model (69) and the boundary layer model (70), and leads to the following:

• the solution of (67) is close, to within the singular perturbation parameter, to the solution of the reduced model (69);
• the solution of (51) converges exponentially to a small neighborhood of the periodic solution; and
• the periodic solution is close to the biased equilibrium of the average system.

Hence, as time grows, each action converges to a small neighborhood of the Nash equilibrium, and since the steady-state payoff map is smooth, each payoff converges to a correspondingly small neighborhood of its equilibrium value.

Also from Tikhonov’s Theorem on the Infinite Interval, thesolution of (68), which is the same as the solution of (50),satisfies

(71)


where the first term is the solution of the reduced model (52) and the second is the solution of the boundary layer model (70). Rearranging terms and subtracting the equilibrium state from both sides yields

(72)

After noting that
• the reduced-model solution converges exponentially to the periodic solution, which is close to the biased equilibrium;
• the bias is of the order of the squared perturbation amplitudes; and
• the boundary layer solution is exponentially decaying,
we conclude that the state error exponentially converges to a small neighborhood of the origin. Thus, each output exponentially converges to a correspondingly small neighborhood of its equilibrium payoff value.

We summarize with the following theorem.

Theorem 4: Consider the system (45)–(46) with (9)–(10) for an N-player game under Assumptions 4.1–4.4, where the probing frequencies are distinct, their pairwise sums differ from the remaining frequencies, and all frequency ratios are rational. There exist sufficiently small parameters such that the solution converges exponentially to a small neighborhood of the point consisting of the stable Nash equilibrium together with its associated state and payoff values, provided the initial conditions are sufficiently close to this point.

V. NUMERICAL EXAMPLE WITH NON-QUADRATIC PAYOFF FUNCTIONS

For an example of a non-quadratic game with two players that employ the extremum seeking strategy (9)–(10), we consider the system

whose equilibrium state is determined by the players' actions. The Jacobian at the equilibrium is Hurwitz only for actions satisfying a sign condition. Thus, the equilibrium is locally exponentially stable, but not for all actions, violating Assumption 4.2. However, as noted earlier, the restrictive requirement of local exponential stability for all actions was made merely for notational convenience; we actually require this assumption to hold only on the players' action sets. Hence, in this example, we restrict the players' actions to a compact set of nonnegative actions.

At steady state, the payoff functions are

(73)

Fig. 5. Steady-state payoff function surfaces (73) with their associated reaction curves (black) (74)–(75), which lie in the action set, and the extremals (dashed yellow), which lie outside of the action set, superimposed.

which have the associated reaction curves, defined on the action set, as follows:

(74)

(75)

where (74) is defined piecewise over the action set.

Fig. 5 depicts each player's payoff surface with both the reaction curves and the extremals that lie outside of the action set superimposed. These reaction curves have two intersections in the interior of the action set [Fig. 6(a)], which correspond to two Nash equilibria. At the first, Assumptions 4.3 and 4.4 are satisfied, implying its stability, whereas at the second, these assumptions are violated, as they must be, since further analysis will show that it is an unstable Nash equilibrium. The reaction curves also intersect on the action set boundary, which means this game has a Nash equilibrium on the boundary that the players may seek depending on the game's initial conditions. To ensure the players remain in the action set, we employ a modified Nash seeking strategy that utilizes projection [51]. For this example, we first present simulation results for Nash seeking in the interior of the action set before considering the Nash equilibrium on the boundary.

A. Nash Seeking in the Interior

For the Nash equilibrium points in the interior of the action set, we want to determine their stability and to quantify the convergence bias due to the non-quadratic payoff function. First, we compute the average error system for the reduced model according to (58), obtaining

(76)

(77)

where a sign parameter selects one of the interior Nash equilibria. While the system (76)–(77) appears to have four equilibria, two for each sign choice, two of these correspond to the difference between the two


Fig. 6. (a) Reaction curves associated with the steady-state payoff functions (73), with the stable Nash equilibrium (green circle), the unstable Nash equilibrium (red square), and the Nash equilibrium on the boundary (blue star); and (b) the convergence bias relative to each interior Nash equilibrium, which is due to the non-quadratic payoff function of player 2 and lies along the reaction curve of player 1.

Fig. 7. Time history of the two-player game for two different initializations, (a) and (b).

Nash equilibrium points, meaning that, in actuality, only two equilibria exist and can be written as

(78)

where the bias coefficient is the smallest admissible value for each Nash equilibrium. In the sequel, we omit the dependency of these quantities on the particular equilibrium for conciseness. As seen in Fig. 6(b), the equilibrium is biased away from the reaction curve of player 2 but still lies on the reaction curve of player 1, since only the payoff function of player 2 is non-quadratic.

The Jacobian of the average error system for the reduced model is

when evaluated at an equilibrium, with shorthand constants denoting the coefficients of the resulting characteristic polynomial. Its characteristic equation is

Thus, the Jacobian is Hurwitz if and only if both characteristic polynomial coefficients are positive. For sufficiently small perturbation amplitudes, both coefficients are positive at one interior equilibrium and one is negative at the other, which implies that the former is a stable Nash equilibrium and the latter is unstable. Closer analysis shows that the Jacobian associated with the unstable equilibrium has two real eigenvalues, one negative and one positive.

For the simulations, we select small parameter values, in particular small perturbation frequencies, since the perturbation must occur at a time scale that is slower than the fast time scale of the nonlinear system. Fig. 7(a) and (b) depicts the evolution of the players' actions for two different initializations. The state is initialized at the origin in both cases. We show the action estimates rather than the full actions to better illustrate the convergence of the players' actions to a neighborhood near, but biased away from, the Nash strategies, since each action contains the additive perturbation signal. In both scenarios, the average actions of both player 1 and player 2 lie below their Nash values, which is consistent with the equilibrium of the reduced model's average error system (78).

The slow initial convergence in Fig. 7(b) can be explained by examining the phase portrait of the average of the reduced model system. The initial condition lies near the unstable equilibrium, causing the slow initial convergence seen in Fig. 7(b). In Fig. 8, the stable and unstable interior Nash equilibria (green circle and red square), the Nash equilibrium on the boundary (blue star), and the eigenvectors (magenta arrows) associated with each interior Nash equilibrium are denoted. The


Fig. 8. Phase portrait of the reduced model average system, with the players' trajectories superimposed for each initial condition. The stable and unstable Nash equilibrium points (green circle and red square), their eigenvectors (magenta arrows), and the Nash equilibrium on the boundary (blue star) are denoted. The shaded region is the set difference between the action set and the shrunk set in which the estimates must remain. The zoomed-in area shows the players' convergence to an almost-periodic orbit that is biased away from the stable Nash equilibrium and centered on the reaction curve of player 1. The average actions of the players are also marked.

eigenvectors are, in general, neither tangent nor perpendicular to the reaction curves at the Nash equilibrium points. The phase portrait and the orientation of the eigenvectors do depend on the values of the gains and amplitudes, but the stability of each Nash equilibrium is invariant to their values, provided these parameters are positive and small, as shown in Sections III and IV. Since Fig. 8 is a phase portrait of the reduced system in the estimate variables, we include the shaded region to show that the estimates must be restricted to a perturbation-shrunk subset of the action set to ensure that the actual actions lie in the action set. Hence, the shaded region's outer edge is the boundary of the action set, the inner edge is the boundary of the shrunk set, and the shaded interior is the set difference between them.

When the players’ initial conditions lie in the stable region ofthe phase portrait, the players’ actions remain in and convergeto a neighborhood of the stable Nash equilibrium in the interiorof . The zoomed-in area highlights the convergence of the tra-jectories to an almost-periodic orbit that is biased away from theNash equilibrium . The depicted orbit consists of the last 300s of each trajectory with the players’ average actions denotedby . These average actions lie on the reaction curve as pre-dicted by (78).

B. Nash Seeking With Projection

To ensure that a player’s action remains in the action set , weemploy a modified extremum seeking strategy that utilizes pro-jection. Namely, the players implement the following strategy:

(79)

(80)

where the nominal update law is defined as in (9)–(10), and the projection operator equals that update when the estimate lies in the projection set or the update points into it, and equals zero otherwise:

(81)

Lemma 1: The following properties hold for the projection operator (81).
1) The projected update never exceeds the nominal update in magnitude.
2) With an initial estimate in the projection set, the update law (79) with the projection operator (81) guarantees that the estimate is maintained in the projection set for all time.
3) For all admissible estimates and nominal updates, the stated one-sided inequality between the nominal and projected updates holds.

Proof: Points 1 and 2 are immediate. Point 3 follows from the piecewise calculation

(82)

since the discarded term has a favorable sign when the projection is active.

Several challenges remain in the development of convergence proofs for Nash seeking players with projection, such as computing the average error system and defining the precise notion of stability for equilibria that exist on the action space boundaries. To alleviate concerns about the existence and uniqueness of solutions that arise from the projection operator being discontinuous, a Lipschitz continuous version of projection [51] can be used.
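A minimal scalar sketch of this kind of projection, and of the invariance property in Lemma 1, might look as follows; the interval bounds and step sizes are illustrative, and this is the discontinuous version rather than the Lipschitz variant of [51]:

```python
# Scalar sketch of an interval projection in the spirit of (79)-(81): the
# nominal extremum seeking update v passes through unchanged in the
# interior of [lo, hi] and is zeroed when it would push the estimate
# across a boundary. Illustrative only; the paper's operator acts on the
# players' action set.

def proj(theta, v, lo=0.0, hi=1.0):
    if (theta <= lo and v < 0.0) or (theta >= hi and v > 0.0):
        return 0.0      # update points out of the set: block it
    return v            # otherwise pass the nominal update through

def projected_step(theta, v, dt, lo=0.0, hi=1.0):
    """One Euler step of a projected update law (cf. (79))."""
    theta = theta + dt * proj(theta, v, lo, hi)
    return min(max(theta, lo), hi)   # guard against discrete-time overshoot

# Invariance in discrete form: starting inside the set, we stay inside.
theta = projected_step(0.0, -5.0, 0.01)   # blocked at the lower boundary
```

Note that the product of the nominal and projected updates never exceeds the squared nominal update, since the projected update is either the nominal one or zero.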

Fig. 9 depicts the trajectories when the players employ (79)–(80) with the same parameters as in Section V-A and an initial condition on the boundary. For this initial condition, the players travel along the boundary, with one action held fixed while the other increases, until the trajectory reaches the eigenvector of the unstable equilibrium and is pushed away from the boundary. From this point, the players follow a trajectory that is similar to the one depicted in Fig. 7(b).

While the modified strategy (79)–(80) ensures that the players' actions lie in the action set, it effectively prevents the players from converging to a neighborhood of any equilibria that lie on its boundary. To address this limitation, we propose the modified projection strategy

(83)

(84)

where the projection operator is defined as in (81). The strategy (83)–(84) is equivalent to (79)–(80) in the interior of the action set, since the projection then leaves the update unchanged.

In Fig. 10, the players employ (83)–(84) for the same scenario shown in Fig. 9. With this strategy, the players approach the boundary, where, unlike in the previous case, the vector field causes the actions to decrease toward the Nash equilibrium on the boundary. This Nash equilibrium is attainable only by allowing the estimates to leave the interior, but the convergence rate slows near the boundary, both because the projected update vanishes as the trajectory approaches the boundary and because of the nearby unstable Nash equilibrium (Fig. 10). If the nonnegative action requirement were relaxed to allow the perturbation to leave the action set, that is, if only the estimates were restricted, then the boundary equilibrium would be attainable using (79)–(80). This strategy would exhibit a faster convergence rate along the boundary than if the players used (83)–(84), but convergence would still be slow due to the presence of the unstable Nash equilibrium.


Fig. 9. (a) Time history of the two-player game when implementing the modified Nash seeking strategy (79)–(80), and (b) planar trajectories superimposed on the reduced model average system phase portrait. The stable and unstable Nash equilibrium points (green circle and red square), their eigenvectors (magenta arrows), and the Nash equilibrium on the boundary (blue star) are denoted. The shaded region is the set difference between the action set and the shrunk set for the estimates.

Fig. 10. (a) Time history of the two-player game when implementing the modified Nash seeking strategy (83)–(84), and (b) planar trajectories superimposed on the reduced model average system phase portrait. The unstable Nash equilibrium (red square), its eigenvectors (magenta arrows), and the Nash equilibrium on the boundary (blue star) are denoted. The shaded region is the set difference between the action set and the shrunk set for the estimates.

VI. CONCLUSION

We have introduced a non-model based approach for convergence to the Nash equilibria of static, noncooperative games with N players, where the players measure only their own payoff values. Our convergence results include games with non-quadratic payoff functions and games where the players' actions serve as inputs to a general, stable nonlinear differential equation whose outputs are the players' payoffs. For non-quadratic payoff functions, the convergence is biased in proportion to both the payoff functions' higher derivatives and the perturbation signals' amplitudes. When the players' actions are restricted to a closed and bounded action set, extremum seeking with projection ensures that their actions remain in that set.

A player cannot determine a priori whether its initial action is sufficiently close to the Nash equilibrium to guarantee convergence, since the equilibrium is unknown and a quantitative estimate of its region of attraction is very difficult to obtain. However, for quadratic payoffs, convergence is semi-global. If we assume the players possess a good estimate of the equilibrium, based on either partial information about the game or historical data, this learning strategy remains attractive, since it allows players to improve their initial actions by measuring only their payoff values and does not require the estimation of potentially highly uncertain parameters, e.g., competitors' prices or consumer demand. Also, due to the dynamic nature of this algorithm, the players can track movements in the Nash equilibrium should the game's model change smoothly over time in a manner that is unknown to the players.

APPENDIX A

Several integrals over the common period of the probing signals, where LCM denotes the least common multiple of the individual periods, must be computed to obtain the average error system (22) and the equilibrium (62) of the average error system (58). First, we detail the necessary calculations to compute the average error system for games with quadratic payoffs, and then compute the additional terms that arise for games with non-quadratic payoffs in Appendix B. To evaluate the integrals, we use standard product-to-sum trigonometric identities to simplify the integrands.

To obtain (22), we compute the average of each term in the

error system (21) and assume that the probing frequencies are distinct and that no frequency equals the sum of two others, for all distinct players. The averages of the first and last terms of (21) are zero since

(85)


The average of the second term of (21) depends on whether or not the two player indices coincide. If they coincide,

(86)

which follows from the trigonometric identities above, and if the indices differ,

(87)

which with (86) implies

(88)

Similarly, the average of the fourth term of (21) is

(89)

The average of the third term requires computing the following three integrals:

APPENDIX B

For games with non-quadratic payoffs, we assume the players' frequencies satisfy the requirements for games with quadratic payoff functions along with additional non-resonance conditions for all distinct triples of players. Then, in addition to the integrals computed in Appendix A, the following integrals are computed to obtain (62):

and

REFERENCES

[1] A. Cournot, Recherches sur les Principes Mathématiques de la Théorie des Richesses. Paris, France: Hachette, 1838.

[2] G. W. Brown, "Iterative solutions of games by fictitious play," in Activity Analysis of Production and Allocation, T. C. Koopmans, Ed. New York: Wiley, 1951, pp. 374–376.

[3] J. B. Rosen, "Existence and uniqueness of equilibrium points for concave N-person games," Econometrica, vol. 33, no. 3, pp. 520–534, 1965.

[4] D. G. Luenberger, "Complete stability of noncooperative games," J. Optim. Theory Appl., vol. 25, no. 4, pp. 485–505, 1978.

[5] S. Li and T. Basar, "Distributed algorithms for the computation of noncooperative equilibria," Automatica, vol. 23, no. 4, pp. 523–533, 1987.

[6] J. S. Shamma and G. Arslan, "Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria," IEEE Trans. Autom. Control, vol. 53, no. 3, pp. 312–327, 2005.

[7] M. Zhu and S. Martínez, "Distributed coverage games for mobile visual sensor networks," SIAM J. Control Optim., Jan. 2010 [Online]. Available: http://arxiv.org/abs/1002.0367, submitted for publication.

[8] M. Krstic, P. Frihauf, J. Krieger, and T. Basar, "Nash equilibrium seeking with finitely- and infinitely-many players," in Proc. 8th IFAC Symp. Nonlinear Control Syst., Bologna, Italy, Sep. 2010, pp. 1086–1091.

[9] P. Frihauf, M. Krstic, and T. Basar, "Nash equilibrium seeking for games with non-quadratic payoffs," in Proc. IEEE Conf. Decision Control, Atlanta, GA, Dec. 2010, pp. 881–886.

[10] M. S. Stankovic, K. H. Johansson, and D. M. Stipanovic, "Distributed seeking of Nash equilibria in mobile sensor networks," in Proc. IEEE Conf. Decision Control, Atlanta, GA, Dec. 2010, pp. 5598–5603.

[11] A. Jafari, A. Greenwald, D. Gondek, and G. Ercal, "On no-regret learning, fictitious play, and Nash equilibrium," in Proc. 18th Int. Conf. Mach. Learn., 2001, pp. 226–233.

[12] D. P. Foster and H. P. Young, "Regret testing: Learning to play Nash equilibrium without knowing you have an opponent," Theoret. Econ., vol. 1, no. 3, pp. 341–367, 2006.

[13] R. Sharma and M. Gopal, "Synergizing reinforcement learning and game theory—A new direction for control," Applied Soft Comput., vol. 10, no. 3, pp. 675–688, 2010.

[14] D. Fudenberg and D. K. Levine, The Theory of Learning in Games. Cambridge, MA: MIT Press, 1998.

[15] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. New York: Cambridge Univ. Press, 2006.

[16] Q. Zhu, H. Tembine, and T. Basar, "Heterogeneous learning in zero-sum stochastic games with incomplete information," in Proc. IEEE Conf. Decision Control, Atlanta, GA, Dec. 2010, pp. 219–224.

[17] A. B. MacKenzie and S. B. Wicker, "Game theory and the design of self-configuring, adaptive wireless networks," IEEE Commun. Mag., vol. 39, no. 11, pp. 126–131, 2001.

[18] E. Altman, T. Basar, and R. Srikant, "Nash equilibria for combined flow control and routing in networks: Asymptotic behavior for a large number of users," IEEE Trans. Autom. Control, vol. 47, no. 6, pp. 917–930, 2002.

[19] T. Basar, "Control and game-theoretic tools for communication networks (overview)," Appl. Comput. Math., vol. 6, no. 2, pp. 104–125, 2007.

[20] G. Scutari, D. P. Palomar, and S. Barbarossa, "The MIMO iterative waterfilling algorithm," IEEE Trans. Signal Process., vol. 57, no. 5, pp. 1917–1935, May 2009.

[21] S. S. Rao, V. B. Venkayya, and N. S. Khot, "Game theory approach for the integrated design of structures and controls," AIAA J., vol. 26, no. 4, pp. 463–469, 1988.

[22] D. Bauso, L. Giarré, and R. Pesenti, "Consensus in noncooperative dynamic games: A multiretailer inventory application," IEEE Trans. Autom. Control, vol. 53, no. 4, pp. 998–1003, 2008.

[23] J. R. Marden, G. Arslan, and J. S. Shamma, "Cooperative control and potential games," IEEE Trans. Syst., Man, Cybern.—Part B: Cybern., vol. 39, no. 6, pp. 1393–1407, Jun. 2009.

[24] E. Semsar-Kazerooni and K. Khorasani, "Multi-agent team cooperation: A game theory approach," Automatica, vol. 45, no. 10, pp. 2205–2213, 2009.

[25] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd ed. Philadelphia, PA: SIAM, 1999.

[26] H.-H. Wang and M. Krstic, "Extremum seeking for limit cycle minimization," IEEE Trans. Autom. Control, vol. 45, no. 12, pp. 2432–2437, Dec. 2000.

[27] J.-Y. Choi, M. Krstic, K. B. Ariyur, and J. S. Lee, "Extremum seeking control for discrete-time systems," IEEE Trans. Autom. Control, vol. 47, no. 2, pp. 318–323, Feb. 2002.

[28] K. B. Ariyur and M. Krstic, Real-Time Optimization by Extremum-Seeking Control. Hoboken, NJ: Wiley-Interscience, 2003.

[29] Y. Tan, D. Nesic, and I. Mareels, "On non-local stability properties of extremum seeking control," Automatica, vol. 42, no. 6, pp. 889–903, 2006.

[30] L. Luo and E. Schuster, "Mixing enhancement in 2D magnetohydrodynamic channel flow by extremum seeking boundary control," in Proc. Amer. Control Conf., St. Louis, MO, Jun. 2009, pp. 1530–1535.


[31] W. H. Moase, C. Manzie, and M. J. Brear, “Newton-like extremum-seeking part I: Theory,” in Proc. IEEE Conf. Decision Control, Shanghai, China, Dec. 2009, pp. 3839–3844.

[32] M. S. Stankovic and D. M. Stipanovic, “Extremum seeking under stochastic noise and applications to mobile sensors,” Automatica, vol. 46, pp. 1243–1251, 2010.

[33] D. Nesic, Y. Tan, W. H. Moase, and C. Manzie, “A unifying approach to extremum seeking: Adaptive schemes based on estimation of derivatives,” in Proc. IEEE Conf. Decision Control, Atlanta, GA, Dec. 2010, pp. 4625–4630.

[34] C. Zhang, D. Arnold, N. Ghods, A. Siranosian, and M. Krstic, “Source seeking with nonholonomic unicycle without position measurement and with tuning of forward velocity,” Syst. Control Lett., vol. 56, pp. 245–252, 2007.

[35] J. Cochran and M. Krstic, “Nonholonomic source seeking with tuning of angular velocity,” IEEE Trans. Autom. Control, vol. 54, no. 4, pp. 717–731, Apr. 2009.

[36] J. Cochran, E. Kanso, S. D. Kelly, H. Xiong, and M. Krstic, “Source seeking for two nonholonomic models of fish locomotion,” IEEE Trans. Robot., vol. 25, no. 5, pp. 1166–1176, Oct. 2009.

[37] N. J. Killingsworth, S. M. Aceves, D. L. Flowers, F. Espinosa-Loza, and M. Krstic, “HCCI engine combustion-timing control: Optimizing gains and fuel consumption via extremum seeking,” IEEE Trans. Control Syst. Technol., vol. 17, no. 6, pp. 1350–1361, Nov. 2009.

[38] M. Guay, M. Perrier, and D. Dochain, “Adaptive extremum seeking control of nonisothermal continuous stirred reactors,” Chem. Eng. Sci., vol. 60, pp. 3671–3681, 2005.

[39] K. Peterson and A. Stefanopoulou, “Extremum seeking control for soft landing of an electromechanical valve actuator,” Automatica, vol. 40, pp. 1063–1069, 2004.

[40] R. Becker, R. King, R. Petz, and W. Nitsche, “Adaptive closed-loop separation control on a high-lift configuration using extremum seeking,” AIAA J., vol. 45, no. 6, pp. 1382–1392, 2007.

[41] D. Carnevale, A. Astolfi, C. Centioli, S. Podda, V. Vitale, and L. Zaccarian, “A new extremum seeking technique and its application to maximize RF heating on FTU,” Fusion Eng. Design, vol. 84, pp. 554–558, 2009.

[42] A. R. Teel, J. Peuteman, and D. Aeyels, “Semi-global practical asymptotic stability and averaging,” Syst. Control Lett., vol. 37, no. 5, pp. 329–334, 1999.

[43] H. K. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, NJ: Prentice Hall, 2002.

[44] O. Taussky, “A recurring theorem on determinants,” Amer. Math. Monthly, vol. 56, no. 10, pp. 672–676, 1949.

[45] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.

[46] D. Monderer and L. S. Shapley, “Potential games,” Games Econ. Behav., vol. 14, no. 1, pp. 124–143, 1996.

[47] E. Hopkins, “A note on best response dynamics,” Games Econ. Behav., vol. 29, no. 1–2, pp. 138–150, 1999.

[48] J. Hofbauer, “Best response dynamics for continuous zero-sum games,” Discrete Contin. Dynam. Syst.: Ser. B, vol. 6, no. 1, pp. 215–224, 2006.

[49] T. M. Apostol, Mathematical Analysis, 2nd ed. Reading, MA: Addison-Wesley, 1974.

[50] A. K. Naimzada and L. Sbragia, “Oligopoly games with nonlinear demand and cost functions: Two boundedly rational adjustment processes,” Chaos, Solitons, Fractals, vol. 29, pp. 707–722, 2006.

[51] M. Krstic, I. Kanellakopoulos, and P. Kokotovic, Nonlinear and Adaptive Control Design. New York: Wiley-Interscience, 1995.

Paul Frihauf (S’09) received the B.S. and M.S. degrees in systems science and engineering from Washington University, St. Louis, MO, in 2005. He is currently working toward the Ph.D. degree in dynamic systems and control in the Mechanical and Aerospace Engineering Department, University of California at San Diego, La Jolla.

He was an Industrial Engineering Intern at The Boeing Company in 2003, a Summer Research Assistant at MIT Lincoln Laboratory in 2004, Associate Professional Staff in the Guidance, Navigation, and Control Group, The Johns Hopkins University Applied Physics Laboratory, from 2005 to 2007, and an Algorithms Intern at Cymer in 2011. His research interests include noncooperative games, multi-agent control, and the control of distributed parameter systems.

Mr. Frihauf is a member of SIAM. He is a National Defense Science and Engineering Graduate (NDSEG) Fellow and an ARCS scholar.

Miroslav Krstic (S’92–M’95–SM’99–F’02) received the Ph.D. degree from the University of California (UC), Santa Barbara, in 1994.

He was an Assistant Professor at the University of Maryland, College Park, until 1997. He is the Daniel L. Alspach Professor and the founding Director of the Cymer Center for Control Systems and Dynamics (CCSD) at UC San Diego, La Jolla. He is a coauthor of eight books: Nonlinear and Adaptive Control Design (Wiley, 1995), Stabilization of Nonlinear Uncertain Systems (Springer, 1998), Flow Control by Feedback (Springer, 2002), Real-Time Optimization by Extremum Seeking Control (Wiley, 2003), Control of Turbulent and Magnetohydrodynamic Channel Flows (Birkhauser, 2007), Boundary Control of PDEs: A Course on Backstepping Designs (SIAM, 2008), Delay Compensation for Nonlinear, Adaptive, and PDE Systems (Birkhauser, 2009), and Adaptive Control of Parabolic PDEs (Princeton Univ. Press, 2009).

Dr. Krstic is a Fellow of IFAC and has received the Axelby and Schuck Paper Prizes, the NSF CAREER Award, the ONR Young Investigator Award, and the PECASE Award. He has held the appointment of Springer Distinguished Visiting Professor of Mechanical Engineering at UC Berkeley.

Tamer Basar (S’71–M’73–SM’79–F’83) received the B.S.E.E. degree from Robert College, Istanbul, Turkey, and the M.S., M.Phil., and Ph.D. degrees from Yale University, New Haven, CT.

He is with the University of Illinois at Urbana-Champaign (UIUC), where he holds the positions of Swanlund Endowed Chair; Center for Advanced Study Professor of Electrical and Computer Engineering; Research Professor at the Coordinated Science Laboratory; and Research Professor at the Information Trust Institute. He joined UIUC in 1981 after holding positions at Harvard University and Marmara Research Institute (Turkey). He has published extensively in systems, control, communications, and dynamic games, and has current research interests in modeling and control of communication networks; control over heterogeneous networks; estimation and control with limited sensing and transmission; resource allocation, management, and pricing in networks; cooperative and noncooperative game theory for networks; mobile and distributed computing; and security and trust in computer systems. He is currently the Editor-in-Chief of Automatica, the Editor of the Birkhäuser Series on Systems & Control, the Editor of the Birkhäuser Series on Static & Dynamic Game Theory, Managing Editor of the Annals of the International Society of Dynamic Games (ISDG), and a member of editorial and advisory boards of several international journals in control, wireless networks, and applied mathematics.

Dr. Basar has received several awards and recognitions over the years, among which are the Medal of Science of Turkey (1993); the Distinguished Member Award (1993), Axelby Outstanding Paper Award (1995), and Bode Lecture Prize (2004) of the IEEE Control Systems Society (CSS); the IEEE Millennium Medal (2000); the Tau Beta Pi Drucker Eminent Faculty Award of UIUC (2004); the Outstanding Service Award (2005) and the Giorgio Quazza Medal (2005) of the International Federation of Automatic Control (IFAC); the Richard Bellman Control Heritage Award (2006) of the American Automatic Control Council (AACC); an honorary doctorate (Doctor Honoris Causa) from Dogus University (Istanbul; 2007); an honorary professorship from Northeastern University (Shenyang; 2008); and the Isaacs Award of ISDG (2010). He is a member of the U.S. National Academy of Engineering, a member of the European Academy of Sciences, a Fellow of IFAC, a council member of IFAC, a past president of CSS, a past (founding) president of ISDG, and the current President of AACC.