
On the Dynamics of Machine Learning Algorithms and Behavioral Game Theory

Towards Effective Decision Making in Multi-Agent Environment

Graduate School of Systems and Information Engineering University of Tsukuba

Sep 17, 2016

Rikiya Takahashi, Ph.D. (SmartNews, Inc.)

About Myself

● Rikiya TAKAHASHI (高橋力矢)

● Engineer at SmartNews, Inc., from 2015 to present

● Research Staff Member at IBM Research – Tokyo, from 2004 to 2015

● Ph.D. in Engineering from University of Tsukuba, 2014

– Dissertation: "Stable Fitting of Nonparametric Models to Predict Complex Human Behaviors"

– Supervisor: Prof. Setsuya Kurahashi

● M.Sc. (2004) & B.Eng. (2002) from The University of Tokyo

● Research Interests: machine learning, reinforcement learning, cognitive science, behavioral economics, complex systems

● Descriptive models of real human behavior

● Prescriptive decision making by exploiting such descriptive models

References

Choice and Social Interaction

Why did you purchase Windows 10 XXX Edition?

Because the price and quality of that OS were good?

Or because your friends were using it?

Or both reasons?

Are you interested in quantifying each factor for better decision making?

SmartNews Connecting Machine Learning Algorithms with Behavioral Game Theory


Decision Making under Social Uncertainty

You can be either a player or a designer of the market.

Players: consumers, firms competing with other brands

Designers: politicians, operators of auction platforms or SNS

In both scenarios you must optimize your decisions under uncertainty over other players’ decisions.


What I was doing during PhD: Stable Fitting of Power Law Models

● Heavy-tail distributions / long-range dependence

● Bounded rationality: incomplete information, cognitive bias

● Positive feedback: rich-get-richer, increasing survival probability

[Figure: four example panels]

– Travel time in road network: heavy tail caused by huge traffic congestion (http://en.wikipedia.org/wiki/Traffic_congestion)

– Ebbinghaus forgetting curve (retention versus elapsed time in days): power-law decay caused by interaction among short-, mid-, & long-term memories

– Asset returns in finance: heavy tail caused by price crashes (simultaneous shorting) (http://finance.yahoo.com)

– Pageviews of a video on YouTube: power-law decay caused by cascading word-of-mouth (Crane & Sornette, 2008)

What I was doing during PhD: Power-Law Models = Multi-Scale Nonparametrics

● Global optimization in fitting nonparametric models

● Non-linear modeling by linearly mixing local or multi-scale basis functions

● Convex optimization of the mixing weights

● Domain-specific design of fixed basis functions

[Figure: probability density versus value] Heavy-tail distribution as scale-mixture of Gaussians

[Figure: retention versus elapsed time] Power-law decay as scale-mixture of exponential decays
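The scale-mixture idea for heavy tails can be illustrated with a small simulation. The sketch below is not taken from the talk: it assumes a gamma-distributed precision (which yields a Student-t distribution), and the function names, the degrees of freedom, and the threshold are hypothetical choices for illustration.

```python
import math
import random


def sample_scale_mixture(n, dof=5.0, seed=0):
    """Draw n samples from a Gaussian whose precision is gamma-distributed.

    Mixing N(0, 1/tau) over tau ~ Gamma(shape=dof/2, scale=2/dof) gives a
    Student-t distribution with `dof` degrees of freedom (heavy-tailed).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        tau = rng.gammavariate(dof / 2.0, 2.0 / dof)  # random precision
        samples.append(rng.gauss(0.0, 1.0) / math.sqrt(tau))
    return samples


def tail_fraction(samples, threshold=4.0):
    """Fraction of samples whose absolute value exceeds the threshold."""
    return sum(abs(x) > threshold for x in samples) / len(samples)


mixture = sample_scale_mixture(100_000)
gauss_rng = random.Random(1)
gaussian = [gauss_rng.gauss(0.0, 1.0) for _ in range(100_000)]

print("P(|x| > 4), scale mixture  :", tail_fraction(mixture))
print("P(|x| > 4), single Gaussian:", tail_fraction(gaussian))
```

Under these assumptions the mixture should put far more mass beyond four standard units than a single Gaussian, which is the sense in which mixing over scales produces a heavy tail.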

Agenda

● Irrationality and Disequilibrium: essential phenomena making social science challenging

● Failed Forecasts by Rational Economic Models: Irrational Disequilibrium or Multiple Equilibria?

● Frontiers in Mathematical Modeling of Irrationality: a transitional-state perspective

● For current PhD students: how to turn your research experience into jobs

Intertemporal Decision Making

● $100 today = $100 * (1 + interest rate) in the future

● Objective must be time-consistent.

● Exponential discounting (constant interest rate)

● Unchanged preference order over time

\[ V(t_0) = V(t)\,\exp\big(-\lambda (t - t_0)\big) \quad \text{where } t > t_0,\ \lambda > 0 \]

Real Human Discounts by Power Law

● Hyperbolic discounting (Ainslie, 1974)

● Time-inconsistent preference order

\[ V(t_0) = \frac{V(t)}{\big(1 + \lambda (t - t_0)\big)^{\alpha}} \quad \text{where } t > t_0,\ \lambda > 0,\ \alpha > 0 \]

Irrationality of Hyperbolic Discounting

● Discrepancy between thought and action

● Long-term-oriented when the decision time is distant.

● But suddenly becomes myopic as the time approaches.

"About 1 month ago, I was thinking I would study hard (=long-term large utility) in the last 1 week before the exam, but I did play video games (=short-term small utility) this week..."

Long-term Option B is prioritized over Short-term Option A at t=0, but the order is reversed at t=2.
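The preference reversal can be reproduced numerically. The sketch below uses made-up rewards and delivery times (Option A: 30 at time 3, Option B: 60 at time 6) and made-up discount parameters; these are not the figures behind the slide, only an illustration of the two discounting formulas.

```python
import math


def hyperbolic_value(reward, t_reward, t_now, lam=1.0, alpha=1.0):
    """Present value under hyperbolic discounting."""
    return reward / (1.0 + lam * (t_reward - t_now)) ** alpha


def exponential_value(reward, t_reward, t_now, lam=0.2):
    """Present value under exponential discounting."""
    return reward * math.exp(-lam * (t_reward - t_now))


# Hypothetical options: (reward, time at which the reward is delivered).
OPTION_A = (30.0, 3.0)   # short-term, small reward
OPTION_B = (60.0, 6.0)   # long-term, large reward


def preferred(value_fn, t_now):
    """Return the label of the option with the larger present value."""
    value_a = value_fn(*OPTION_A, t_now)
    value_b = value_fn(*OPTION_B, t_now)
    return "A" if value_a > value_b else "B"


for t_now in (0.0, 2.0):
    print(f"t={t_now}: hyperbolic prefers {preferred(hyperbolic_value, t_now)}, "
          f"exponential prefers {preferred(exponential_value, t_now)}")
```

With these numbers the hyperbolic discounter switches from B to A between t=0 and t=2, while the exponential discounter keeps the same order because the ratio of the two values does not depend on the decision time.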

Irrationality of Hyperbolic Discounting

● Money pumps (Cubitt and Sugden, 2001)

● We can take money from a hyperbolic discounter without risk, whereas we cannot do so from an exponential discounter.

At time t=0, we borrow Option B and exchange it for the target's Option A and $2.5 (=$15-$12.5).

Then we earn interest on this $2.5.

At time t=3, we exchange our Option A for the target's Option B and $10 (=$30-$20). Then we return this Option B.

We get ($2.5 * (1 + interest) + $10 – borrowing cost) without risk.

Known Counterarguments

● Hyperbolic discounting is rather rational when the interest rates in the future are uncertain.

● E.g., (Azfar, 1999; Farmer & Geanakoplos, 2009)

● Meaningful particularly in financial decision making

● Integral over a gamma prior distribution for the interest rate (multi-scale mixture of exponential discountings)
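The gamma-prior argument can be checked numerically: averaging exp(-rate * delay) over a gamma prior with shape α and scale s gives (1 + s * delay)^(-α), which is the hyperbolic form with λ = s. The sketch below is only an illustration; the parameter values, the integration grid, and the function names are assumptions.

```python
import math


def gamma_pdf(rate, shape, scale):
    """Density of a gamma distribution with the given shape and scale."""
    return (rate ** (shape - 1.0) * math.exp(-rate / scale)
            / (math.gamma(shape) * scale ** shape))


def mixed_discount(delay, shape=2.0, scale=0.5, upper=60.0, steps=60_000):
    """Midpoint-rule integral of exp(-rate * delay) over a gamma prior on rate."""
    width = upper / steps
    total = 0.0
    for k in range(steps):
        rate = (k + 0.5) * width
        total += gamma_pdf(rate, shape, scale) * math.exp(-rate * delay)
    return total * width


def hyperbolic_discount(delay, lam=0.5, alpha=2.0):
    """Closed-form hyperbolic discount factor (1 + lam * delay)^(-alpha)."""
    return (1.0 + lam * delay) ** (-alpha)


for delay in (0.0, 1.0, 5.0, 10.0):
    print(delay, round(mixed_discount(delay), 6), round(hyperbolic_discount(delay), 6))
```

The two columns should agree up to the integration error, showing that an exponential discounter who is uncertain about the rate behaves like a hyperbolic discounter on average.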

Power-Law and Disequilibrium

● Power-law or fat tails in asset-return distributions

● See (Cont, 2001) for stylized facts.

● Short-term momentum generates outlying returns

● Positive autocorrelation in rare events (Sornette, 2004).


http://www.proba.jussieu.fr/pageperso/ramacont/papers/empirical.pdf

What Causes Fat Tails?

● Hypothesis #1. Interplay among momentum traders

● E.g., Log-Periodic Power-Law (LPPL) model (Johansen+, 1999) as an extension of rational bubble model (Blanchard & Watson, 1982)

[Figure: neighboring investors saying "Sell?", "Sell!", "Sell More!!"; $$: market price]

\[ s_i = \mathrm{sign}\Big( K \sum_{j \in N(i)} s_j + \varepsilon_i \Big), \qquad s_i \in \{-1, +1\} \]

K: strength of interaction

N(i): set of neighbors for investor i

\( \varepsilon_i \): investor i's own idiosyncratic prediction

http://arxiv.org/pdf/1107.3171.pdf
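The imitation rule above can be simulated directly. The following sketch makes several assumptions that are not in the slide: every investor is a neighbor of every other investor, the idiosyncratic term is standard Gaussian noise redrawn at each update, and investors update one at a time. The function name and parameter values are hypothetical.

```python
import random


def simulate_herding(n_investors=200, coupling=0.1, sweeps=20, seed=0):
    """Asynchronously update s_i = sign(K * sum_j s_j + eps_i); return mean state."""
    rng = random.Random(seed)
    states = [rng.choice((-1, 1)) for _ in range(n_investors)]
    total = sum(states)
    for _ in range(sweeps):
        for i in range(n_investors):
            neighbor_sum = total - states[i]          # all other investors
            signal = coupling * neighbor_sum + rng.gauss(0.0, 1.0)
            new_state = 1 if signal >= 0 else -1
            total += new_state - states[i]
            states[i] = new_state
    return total / n_investors


print("mean state without interaction (K=0)  :", simulate_herding(coupling=0.0))
print("mean state with interaction    (K=0.1):", simulate_herding(coupling=0.1))
```

Without interaction the buy and sell decisions roughly cancel; with interaction a small initial imbalance is amplified until almost everyone takes the same side, which is the kind of positive feedback that can generate outlying returns.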

What Causes Fat Tails?

● Hypothesis #2. Over-confidence in stability

● Leverage in low-volatility period (Thurner+, 2012)

– Once a downward price fluctuation occurs, the resulting margin calls cause a rush of selling into an already falling market, amplifying the downward price movement.

[Figure: low-volatility period with leverage, followed by a sudden price drop with margin calls] (http://finance.yahoo.com/)

What Causes Fat Tails?

● Implications are obtained by explicitly modeling and simulating the dynamics in trading.

● Physical modeling using stochastic processes

● Transitional states and disequilibrium play crucial roles.

● Do not think that the system is always in equilibrium.

M. Buchanan, “Forecast: What Physics, Meteorology, and the Natural Sciences Can Teach Us About Economics,” A&C Black, 2013

https://www.amazon.co.jp/dp/B009IRP3GW

Regarding Irrationality as Disequilibrium

● Assume that a human plays a game in his or her mind.

● Then irrationality is regarded as an outcome of state dynamics in mental processes.

● Rationality = choosing the strategy in the stationary state

● Irrationality = choosing a strategy in a transitional state

● Possibility to formalize many social phenomena universally via explicit state dynamics

● For better understanding: play the p-beauty contest

Understand Dynamics by (2/3)-beauty Contest

What are the numbers chosen by these n players?

Each player i ∈ {1, ..., n} chooses an integer \( Y_i \in [0, 100] \).

Winner(s): player(s) whose \( Y_i \) is closest to \( \frac{2}{3} \Big( \frac{1}{n} \sum_{j=1}^{n} Y_j \Big) \).

Equilibrium of p-beauty Contest (Moulin, 1986)

Nash Equilibrium when 0 ≤ p < 1: \( \forall i\ Y_i = 0 \)

1. Let \( C_0 \) be a set of purely naïve players, who choose from 0 to 100 uniformly at random.

2. Since \( E\big[ \frac{1}{|C_0|} \sum_{i \in C_0} Y_i \big] = 50 \), a slightly more strategic player in class \( C_1 \) will choose round(50 × 2/3) = 33.

3. Further strategic players in class \( C_2 \) will choose round(33 × 2/3) = 22. Players in class \( C_3 \) will choose ...

At convergence, every player should choose zero.

However, do you believe such a prediction?
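The chain of reasoning from class C0 to C1, C2, ... can be written as a short loop. This is a minimal sketch: the function name `level_k_choices` is hypothetical, and rounding each level to an integer follows the round(...) steps quoted above.

```python
def level_k_choices(max_level, p=2.0 / 3.0, naive_mean=50.0):
    """Return the choices of classes C0, C1, ..., C_max_level.

    C0 is represented by the expected naive choice; each higher class
    best-responds to the class just below it by choosing round(p * choice).
    """
    choices = [round(naive_mean)]
    for _ in range(max_level):
        choices.append(round(choices[-1] * p))
    return choices


print(level_k_choices(3))   # prints [50, 33, 22, 15]
```

One caveat of this integer version: the rounded chain stops decreasing at 1, because round(1 × 2/3) = 1, whereas the unrounded sequence 50 × (2/3)^k keeps shrinking towards 0 as k grows.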



A Result of p-beauty Contest by Real Humans

The mean is far from 0 (Camerer et al., 2004; Ho et al., 2006).

Table: Average Choice in (2/3)-beauty Contests

Subject Pool            Group Size   Sample Size   Mean[Yi]
Caltech Board           73           73            49.4
80 year olds            33           33            37.0
High School Students    20-32        52            32.5
Economics PhDs          16           16            27.4
Portfolio Managers      26           26            24.3
Caltech Students        3            24            21.5
Game Theorists          27-54        136           19.1

Unreality of Nash Equilibrium

Every player is homogeneous: all of them adopt the same thinking process.

Every player has an infinite forecasting horizon. Can all real humans think so intelligently?

Such unrealistic assumptions lead to vulnerability to perturbation.

What if one player does not understand the rules of the game?

What if one player intends to punish “rational” others?


Analyzing More Complex Interactions

● Agent-based modeling

● Can be free from some assumptions: homogeneity, complete information, rationality, etc.

● True challenge: design of good agent models

● Often too many degrees of freedom in tuning

J. M. Epstein, “Generative Social Science: Studies in Agent-Based Computational Modeling,” Princeton University Press, 2012.

S. F. Railsback and V. Grimm, “Agent-Based and Individual-Based Modeling: A Practical Introduction,” Princeton University Press, 2011.

One Viewpoint for Good Agent Design

● Explicitly model humans' bounded rationality.

● Irrationality is not the outcome of human stupidity.

● Humans do try to optimize, but cannot reach the true optimum due to the lack of mental resources.

– Finite memory about past events

– Uncertainty over the future environment

– Uncertainty over other agents' decisions

● Refer to Behavioral Game Theory

● It offers gems for modeling boundedly rational agents

Short Summary

● Discussed irrationality in the real world.

● Observed that transitional states often give more realistic forecasts than equilibrium.

● Discussed a direction for good agent models: hints for accurately modeling dynamics.






What is Rational / Irrational?

● Rationality = optimizing a consistent objective

● Irrationality = any behavior that deviates from rationality

● Inconsistent optimization risks being manipulated by others.

● E.g., hyperbolic discounting: a time-inconsistent preference order causes vulnerability to money pumps.

● Other forms of irrational decision making

● Choice from options whose coverage is manipulated by others

Discrete Choice Modelling

Goal: predict the probability of choosing an option from a choice set.

Why solve this problem?

For business: brand positioning among competitors

For business: sales promotion (yet involving some abuse)

To deeply understand how humans make decisions

Random Utility Theory as Rational Model

Each human is a maximizer of a probabilistic utility.

\[ i\text{'s choice from } S_i = \arg\max_{j \in S_i} \Big[ \underbrace{f_i(v_j)}_{\text{mean utility}} + \underbrace{\varepsilon_{ij}}_{\text{random noise}} \Big] \]

\( S_i \): choice set for i, \( v_j \): vector of j's attributes, \( f_i \): i's mean utility function

Assuming independence among every option’s attractiveness

For both mean and noise: e.g., logit (McFadden, 1980)

For only mean: e.g., nested logit (Williams, 1977)

Context Effects: Complexity of Human’s Choice

An example of choosing a PC (Kivetz et al., 2004)

Each subject chooses 1 option from a choice set.

             A     B     C     D     E
CPU [MHz]    250   300   350   400   450
Mem. [MB]    192   160   128   96    64

Choice Set   #subjects
{A, B, C}    36:176:144
{B, C, D}    56:177:115
{C, D, E}    94:181:109

Can random utility theory still explain the preference reversals?

B ≻ C or C ≻ B?

Similarity Effect (Tversky, 1972)

Top-share choice can change due to correlated utilities.

E.g., one color from {Blue, Red} or {Violet, Blue, Red}?



Attraction Effect (Huber et al., 1982)

Introduction of an absolutely inferior option A− (= decoy) causes an irregular increase in option A’s attractiveness.

This contradicts the natural guess that a decoy never affects the choice.

If D ≻ A, then D ≻ A ≻ A−.

If A ≻ D, then A is superior to both A− and D.

Compromise Effect (Simonson, 1989)

Moderate options within each choice set are preferred.

This differs from a non-linear utility function involving diminishing returns (e.g., \( \sqrt{\text{inexpensiveness}} + \sqrt{\text{quality}} \)).


Multiple Equilibria also Spoil Forecasts

● Pivotal mechanism (Clarke, 1971) to decide whether to start a public project

● Every player discloses a utility of the project outcome.

● If and only if sum(utilities) > 0, the project is started.

● Player i must pay a tax of abs(sum of the other players' utilities) when the sign of player i's utility is opposite to that of the other players' utility sum.

● For every player, honestly disclosing his or her true utility is optimal regardless of the other players' utilities.

Example: sum(utilities) = 1, decision = start

disclosed utility   -1    2    5   -3   -2
tax                  2    0    0    4    3

Multiple Equilibria also Spoil Forecasts

● Failure of pivotal mechanism (Attiyeh+, 2000)

● Being rational is difficult because the rules are too complex.

● Even if rationality leading to an equilibrium exists, which equilibrium will actually be chosen?

● Each equilibrium has its own path from the initial state.

● Identifying both the path and the finite time is hard.

● One promising way: converting a transitional state in one game into an equilibrium of another game.

Short Summary

● Introduced more examples of irrational decision making by real humans.

● Irrationality spoils forecasting by standard economic models.

● Multiple equilibria further complicate forecasting, in addition to the irrational disequilibrium.


Game with Heterogeneous Pay-Offs

Which numbers will be chosen by these 3 players?

Each player i ∈ {1, ..., n} (here n = 3) chooses an integer \( Y_i \in [0, 10] \).

Player #1’s pay-off: \( 39 + 12 Y_1 - (Y_1 + Y_2)^2 \)

Player #2’s pay-off: \( 47 + 20 Y_2 - (Y_2 + Y_3)^2 \)

Player #3’s pay-off: \( 6 Y_3 - \big(Y_3 - \tfrac{1}{2}(Y_1 + Y_2)\big)^2 \)

An Idiot’s View of Game Theory

If other players’ decisions \( Y_{\setminus i} \triangleq (Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n) \) are known, the optimal decision \( Y_i^* \) for player i is given by

\[ \forall i \in \{1, \ldots, n\} \quad Y_i^* \mid Y_{\setminus i} = \arg\max_{Y} u_i(Y, Y_{\setminus i}). \tag{1} \]

\( u_i \): utility function of player i

Game theory is merely solving a system of n equations by assuming \( \forall i\ Y_i \equiv Y_i^* \) in Eq. (1).

Every player is assumed to be a utility maximizer.

Variety of games just comes from the variable type of Yi .

However, what if players are irrational or unpredictable?



Equilibrium of Linearly-Solvable Games

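Maximization of a concave quadratic function = linear equality

\[ 0 = 12 - 2 (Y_1^* + Y_2) \]

\[ 0 = 20 - 2 (Y_2^* + Y_3) \]

\[ 0 = 6 - 2 \big(Y_3^* - \tfrac{1}{2}(Y_1 + Y_2)\big) \]

\( \forall i\ Y_i \equiv Y_i^* \) leads to a matrix-vector relationship

\[ \begin{pmatrix} 2 & 2 & 0 \\ 0 & 2 & 2 \\ -1 & -1 & 2 \end{pmatrix} \begin{pmatrix} Y_1^* \\ Y_2^* \\ Y_3^* \end{pmatrix} = \begin{pmatrix} 12 \\ 20 \\ 6 \end{pmatrix}. \]

\( (Y_1^*, Y_2^*, Y_3^*) = (2, 4, 6) \) with pay-offs = (27, 27, 27)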
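The linear system above is small enough to solve exactly. The sketch below uses a generic Gauss-Jordan elimination with exact fractions (the helper names are hypothetical) and then plugs the solution back into the three pay-off functions of the game.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


def payoffs(y1, y2, y3):
    """Pay-offs of players #1, #2, #3 in the game defined earlier."""
    return (
        39 + 12 * y1 - (y1 + y2) ** 2,
        47 + 20 * y2 - (y2 + y3) ** 2,
        6 * y3 - (y3 - Fraction(1, 2) * (y1 + y2)) ** 2,
    )


A = [[2, 2, 0], [0, 2, 2], [-1, -1, 2]]
b = [12, 20, 6]
equilibrium = solve_linear_system(A, b)
print([int(v) for v in equilibrium], [int(u) for u in payoffs(*equilibrium)])
# prints [2, 4, 6] [27, 27, 27]
```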



Belief Learning: Iterative Solving of Game

Equilibrium is tractable only for limited classes of utility functions; in general it is computed iteratively as

t = 0: Initialize each player’s decision by some value.

t > 0: Compute the t-step optimum given the (t−1)-step decisions by others.

\[ \forall i \in \{1, \ldots, n\} \quad Y_i^{(t)} \mid Y_{\setminus i}^{(t-1)} = \arg\max_{Y} u_i\big(Y, Y_{\setminus i}^{(t-1)}\big) \]

Belief learning: classes of algorithms to iteratively compute the equilibrium. A (t + 1)-step look-ahead player beats the t-step-only players, a (t + 2)-step player beats...

How about using \( Y_i^{(t)} \) at finite t, instead of the one at t → ∞?
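For the three-player quadratic game introduced earlier, the iteration can be written down explicitly, because each best response is a linear function of the other players' previous decisions. The sketch below assumes simultaneous updates, an arbitrary starting point of (5, 5, 5), and real-valued decisions (the integer and range constraints are ignored); the function names are hypothetical.

```python
def best_response_step(decisions):
    """One simultaneous best-response update for the three-player game."""
    y1, y2, y3 = decisions
    return (
        6.0 - y2,                # from 0 = 12 - 2 (Y1 + Y2)
        10.0 - y3,               # from 0 = 20 - 2 (Y2 + Y3)
        3.0 + 0.5 * (y1 + y2),   # from 0 = 6 - 2 (Y3 - (Y1 + Y2) / 2)
    )


def belief_learning(initial, steps):
    """Return the list of decision profiles at t = 0, 1, ..., steps."""
    trajectory = [tuple(initial)]
    for _ in range(steps):
        trajectory.append(best_response_step(trajectory[-1]))
    return trajectory


trajectory = belief_learning((5.0, 5.0, 5.0), 300)
for t in (0, 1, 2, 10, 300):
    print(t, tuple(round(v, 3) for v in trajectory[t]))
```

In this particular game the iterates spiral in towards (2, 4, 6), so the profiles at small t are transitional states that still differ noticeably from the equilibrium.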


How to Formalize Context Effects?

● What dynamics causes context effects?

● Hypothesis: a dynamical process to estimate the utility function (Takahashi & Morimura, 2015).

● Irrational context effects are observed via regularized estimates of the utility function.

● Machine learning as a dynamical process

● Transitional state in maximum-likelihood estimation

● Stationary state in Bayesian shrinkage estimation

Gaussian Process Uncertainty Aversion (GPUA)

A dual-personality model regarding utilities as samples in statistics (Takahashi and Morimura, 2015)

Assumption 1: Utility function is partially disclosed to DMS.

1. UC computes the sample value of every option’s utility, and sends only these samples to DMS.

2. DMS statistically estimates the utility function.

GPUA: Mental Conflict as Bayesian Shrinkage

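Assumption 2: DMS does Bayesian shrinkage estimation.

i ∈ {1, ..., n}: context, \( y_i \in \{1, \ldots, m[i]\} \): final choice

\( X_i \triangleq (x_{i1} \in \mathbb{R}^{d_X}, \ldots, x_{im[i]})^\top \): features of m[i] options

Objective Data: values of random utilities

\[ v_i \triangleq (v_{i1}, \ldots, v_{im[i]})^\top \sim N\big(\mu_i, \sigma^2 I_{m[i]}\big), \qquad v_{ij} = b + w_\phi^\top \phi(x_{ij}) \]

\( \mu_i \in \mathbb{R}^{m[i]} \): vector of the true mean utility, \( \sigma^2 \): noise level, b: bias term, \( \phi : \mathbb{R}^{d_X} \to \mathbb{R}^{d_\phi} \): mapping function, \( w_\phi \): vector of coefficients

Subjective Prior: choice-set-dependent Gaussian process

\[ \mu_i \sim N\big(0_{m[i]}, \sigma^2 K(X_i)\big) \quad \text{s.t. } K(X_i) = \big(K(x_{ij}, x_{ij'})\big) \in \mathbb{R}^{m[i] \times m[i]} \]

\( \mu_i \in \mathbb{R}^{m[i]} \): vector of random utilities, K(·, ·): similarity between options

Final choice: based on (posterior mean \( u_i^* \) + i.i.d. noise) as

\[ u_i^* = K(X_i)\big(I_{m[i]} + K(X_i)\big)^{-1}\big(b 1_{m[i]} + \Phi_i w_\phi\big), \]

\[ y_i = \arg\max_j \big(u_{ij}^* + \varepsilon_{ij}\big) \quad \text{where } \forall j\ \varepsilon_{ij} \sim \text{Gumbel}. \]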

GPUA: Irrationality by Bayesian Shrinkage

Implication of (2): similarity-dependent discounting

\[ u_i^* = \underbrace{K(X_i)\big(I_{m[i]} + K(X_i)\big)^{-1}}_{\text{shrinkage factor}}\ \underbrace{\big(b 1_{m[i]} + \Phi_i w_\phi\big)}_{\text{vec. of utility samples}}. \tag{2} \]

Under the RBF kernel \( K(x, x') = \exp(-\gamma \|x - x'\|^2) \), an option dissimilar to others involves high uncertainty.

It is strongly shrunk towards the prior mean 0.

Context effects as Bayesian uncertainty aversion

[Figure: two panels of final evaluation versus X1 = (5 − X2); one panel shows options D, A−, A under choice sets {A, D} and {A, A−, D}, the other shows options D, C, B, A under choice sets {A, B, C} and {B, C, D}]
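The shrinkage in Eq. (2) can be tried on a toy choice set. The sketch below is only an illustration under strong simplifications: features are one-dimensional, every option has the same utility sample of 1.0, and γ = 1. The option positions and all function names are hypothetical and are not taken from the paper's experiments.

```python
import math


def rbf_kernel_matrix(points, gamma=1.0):
    """K[i][j] = exp(-gamma * (x_i - x_j)^2) for one-dimensional features."""
    return [[math.exp(-gamma * (p - q) ** 2) for q in points] for p in points]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


def posterior_mean(points, utility_samples, gamma=1.0):
    """Compute u* = K (I + K)^{-1} v for one choice set."""
    kernel = rbf_kernel_matrix(points, gamma)
    n = len(points)
    identity_plus_kernel = [
        [kernel[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    weights = solve(identity_plus_kernel, utility_samples)   # (I + K)^{-1} v
    return [sum(kernel[i][j] * weights[j] for j in range(n)) for i in range(n)]


# Two similar options (0.0 and 0.1) and one isolated option (2.0).
shrunk = posterior_mean([0.0, 0.1, 2.0], [1.0, 1.0, 1.0])
print([round(u, 3) for u in shrunk])
```

Although all three utility samples are equal, the isolated option ends up with the lowest posterior mean, which is the uncertainty-aversion mechanism used to explain context effects.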



GPUA: Convex Optimization using Posterior Mean

Global fitting of the parameters using data \( (X_i, y_i)_{i=1}^{n} \)

Fix the mapping and similarity functions during updates.

Shrinkage factor \( H_i \triangleq K(X_i)\big(I_{m[i]} + K(X_i)\big)^{-1} \) is constant!

Obtaining a MAP estimate is a convex optimization problem w.r.t. \( (b, w_\phi) \).

\[ \max_{b, w_\phi} \sum_{i=1}^{n} \ell\big( \underbrace{b H_i 1_{m[i]} + H_i \Phi_i w_\phi}_{\text{context-specific } H_i \text{ is multiplied}},\ y_i \big) - \frac{c}{2} \|w_\phi\|^2 \]

Exploiting the log-concavity of multinomial logit

\[ \ell(u_i^*, y_i) \triangleq \log \frac{\exp(u_{i y_i}^*)}{\sum_{j'=1}^{m[i]} \exp(u_{ij'}^*)} \]

GPUA: Experimental Settings

Evaluates accuracy & log-likelihood on real choice data.

Dataset #1: PC (n = 1,088, \( d_X = 2 \))

Dataset #2: SP (n = 972, \( d_X = 2 \))

Subjects are asked to choose a speaker.

                A     B     C     D     E
Power [Watt]    50    75    100   125   150
Price [USD]     100   130   160   190   220

Choice Set   #subjects
{A, B, C}    45:135:145
{B, C, D}    58:137:111
{C, D, E}    95:155:91

Dataset #3: SM (n = 10,719, \( d_X = 23 \))

SwissMetro dataset (Antonini et al., 2007)

Subjects are asked to choose one mode of transportation, either from {train, car, SwissMetro} or {train, SwissMetro}.

Attributes of an option: cost, travel time, headway, seat type, and type of transportation.

GPUA: Cross-Validation Performances

High predictability in addition to the interpretable mechanism.

For SP, successfully detected a combination of the compromise effect & prioritization of power.

1st best for PC & SP.

2nd best for higher-dimensional SM: slightly worse than the highly expressive nonparametric version of mixed multinomial logit (McFadden and Train, 2000).

[Figure: average log-likelihood and classification accuracy on datasets PC, SP, and SM for LinLogit, NpLogit, LinMix, NpMix, and GPUA]

[Figure: evaluation versus price [USD] for options A to E: objective evaluation and evaluations under choice sets {A,B,C}, {B,C,D}, {C,D,E}]


Linking ML with Game Theory (GT) via Shrinkage Principle

● Optimization without shrinkage

– ML: Maximum-likelihood estimation. Optimal for training data, but generalizes less well to test data.

– GT: Nash Equilibrium. Optimal for the given game, but less predictive of real-world decisions.

● Optimization with shrinkage

– ML: Bayesian estimation. Shrinkage towards the prior causes suboptimality for training data, but better generalization to test data.

– GT: Transitional State or Quantal Response Equilibrium. Shrinkage towards uniform probabilities causes suboptimality for the given game, but better prediction of real-world decisions.

Quantitative Handling of Irrationality

Iterative equilibrium computation suggests two natural approaches.

Early stopping at step k: Level-k thinking or Cognitive Hierarchy Theory (Camerer et al., 2004)

Humans cannot predict the infinite future.

Using a non-stationary transitional state

Randomization of utility via noise \( \varepsilon_{it} \): Quantal Response Equilibrium (McKelvey and Palfrey, 1995)

\[ \forall i \in \{1, \ldots, n\} \quad Y_i^{(t)} \mid Y_{\setminus i}^{(t-1)} = \arg\max_{Y} \Big[ f_i\big(Y, Y_{\setminus i}^{(t-1)}\big) + \varepsilon_{it} \Big] \]

Both methods essentially work as regularization of rationality.

Shrinkage into initial values or uniform choice probabilities

Affinity to Bayesian regularization in ML



Logit Quantal Response Equilibrium (LQRE)

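A special form of QRE associated with RUT.

If \( \varepsilon_{it} \) obeys the standard Gumbel distribution and

\[ Y_i^{(t)} \mid Y_{\setminus i}^{(t-1)} = \arg\max_{Y \in S} \Big[ f_i\big(Y, Y_{\setminus i}^{(t-1)}\big) + \varepsilon_{it} / \beta_i \Big], \]

then the choice probability has the closed form

\[ P\big(Y_i^{(t)} = y \mid Y_{\setminus i}^{(t-1)}\big) = \frac{\exp\big(\beta_i f_i(y, Y_{\setminus i}^{(t-1)})\big)}{\sum_{y' \in S} \exp\big(\beta_i f_i(y', Y_{\setminus i}^{(t-1)})\big)}. \]

\( \beta_i \) is called the degree of rationality of player i.

\( \beta_i \to 0 \): uniform choice probability (naïve)

\( \beta_i \to \infty \): Nash equilibrium (deterministic & rational)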
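The closed-form choice probability is a softmax over the pay-offs scaled by β_i. The sketch below evaluates it for a made-up pay-off vector at three values of β to show the two limits; the function name and the numbers are hypothetical.

```python
import math


def logit_response(payoffs, beta):
    """Choice probabilities proportional to exp(beta * payoff)."""
    peak = max(payoffs)
    # Subtracting the maximum leaves the probabilities unchanged
    # and avoids overflow for large beta.
    weights = [math.exp(beta * (f - peak)) for f in payoffs]
    total = sum(weights)
    return [w / total for w in weights]


example_payoffs = [1.0, 2.0, 4.0]   # hypothetical pay-offs of three actions
for beta in (0.0, 1.0, 50.0):
    probabilities = logit_response(example_payoffs, beta)
    print(beta, [round(p, 3) for p in probabilities])
```

At β = 0 every action is equally likely, and at a large β almost all probability moves to the pay-off-maximizing action, matching the two limits listed above.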


Early Stopping and Regularization

[Figure, ML panel: ML as a dynamical system to find the optimal parameters. Axes: Parameter #1 and Parameter #2. Labels: origin 0; iterates at t=10, t=20, t=30, t=50; an early-stopping estimate; the exact Bayesian estimate shrunk towards zero (e.g., Ridge regression); the exact maximum-likelihood estimate (e.g., OLS).]

[Figure, GT panel: GT as a dynamical system to find the equilibrium. Labels: t=0, t=1, t=2, ..., t→∞; mean = 50, mean = 34, mean = 15, mean = 0 (Nash equilibrium); Level-2 transitional state.]
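The ML side of this analogy can be reproduced with a one-parameter regression. The sketch below is an illustration on made-up data: gradient descent starts from zero, so stopping it early leaves the slope between zero and the OLS slope, much like the ridge estimate. The data, step size, penalty, and function names are all assumptions.

```python
def ols_slope(xs, ys):
    """Exact least-squares slope for the model y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def ridge_slope(xs, ys, penalty):
    """Exact ridge slope: the OLS slope shrunk towards zero."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def gradient_descent_slope(xs, ys, steps, learning_rate=0.01):
    """Run gradient descent on the squared error, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        gradient = -sum(x * (y - w * x) for x, y in zip(xs, ys))
        w -= learning_rate * gradient
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

print("OLS              :", ols_slope(xs, ys))
print("ridge (penalty 15):", ridge_slope(xs, ys, 15.0))
print("early stop (t=3)  :", gradient_descent_slope(xs, ys, 3))
print("converged (t=200) :", gradient_descent_slope(xs, ys, 200))
```

Both the early-stopped and the ridge estimates are shrunk versions of the OLS slope, which is the correspondence between a transitional state and Bayesian shrinkage drawn in the figure.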


Towards Useful Decision Making by using QRE

Economists discuss cases where the utility functions \( \{f_i\}_{i=1}^{n} \) are known.

QRE is analytically intractable but can be simulated.

E.g., ad auctions for irrational bidders (Rong et al., 2015)

ML scientists should estimate unknown utility functions!

Extension of statistical marketing research methods through rich functional approximation techniques in ML

Multi-Agent Extension of RUT in Marketing

RUT in marketing research has already been data-oriented.

Estimating utility functions from real data

DCM such as Logit model (McFadden, 1980)

Identical opt. objective to multinomial logistic regression

Conjoint analysis (Green and Srinivasan, 1978)

Special case of DCM by showing only 2 options

Related to the learning-to-rank problem: see (Chapelle and Harchaoui, 2005)

Adding other-player-dependent terms into existing marketing research models yields a simulation model to compute QRE.

Possible Formalisms & Algorithmic Studies

Multi-agent generalization of DCM or learning to rank

Simulation-based fitting (e.g., Approximate Bayesian Computation (Tavare et al., 1997))

Functional approximations (e.g., Gradient Boosting Decision Trees (Friedman, 2001), Deep Neural Network) with partially-observable other players’ decisions

A Future Forecast: Rise of Deep Belief Learning

Belief Learning (BL) vs Reinforcement Learning (RL)

BL: explicitly guessing other players’ thinking processes

RL: choosing optimal actions purely from experiences

Other players’ decision functions are implicitly parts of the environment

While predictive accuracies would be similar, BL is more of a white box than RL in terms of thinking processes

AlphaGo is a successful application of Deep RL (e.g., (Mnih et al., 2013; van Hasselt et al., 2016)).

What will be killer applications of Deep BL?


Other Approaches for Irrationality

● Use quantum theory instead of probability

● Quantum Cognition (Bruza+, 2009; Mogiliansky+, 2009; de Barros & Suppes, 2009; Busemeyer & Bruza, 2012)

● Key mechanism: double standard in quantum theory

– During (unobserved) thinking: integrated over complex state space

– In (observed) decision: classical probability by taking the absolute magnitude of state

https://www.amazon.co.jp/Quantum-Models-Cognition-Decision-Busemeyer/dp/1107419883/

Short Summary

● Introduced recent advances in mathematical modeling of human irrationality, for more accurate forecasts.

● Handled irrationality as transitional states in both Machine Learning and Game Theory.

● Importing mathematical techniques from both the ML and GT communities will serve better social decision making with more accurate forecasts.





WARNING

The following pages present the author's personal opinions on how to choose a good research direction and/or identify a good area of business.

The effectiveness of these ideas has not been scientifically proven. Read them at your own risk.

How PhD changed my life

● Before obtaining PhD

● Job: Research Staff Member in a large B2B enterprise

● Research Topic: Required to stick to one coherent research direction

● Seeking problems that are solvable via my Machine Learning (ML) disciplines

● After obtaining PhD

● Job: Engineer in a small B2C startup

● Research Topic: Freedom to target more ambitious topics in a broader area

● Integrating ML disciplines with the multi-agent perspective obtained during schooling

Hope and Actuality in PhD Course

● What I was intending

● Exploit ML for automatically designing agents.

● Or learn the essence of manually designing agents, through seminar discussions.

● What actually occurred

● Still difficult to know how to design agents!

– Why is this paper's agent model designed like this?

● Effective viewpoints on the design of agents came after finishing PhD.

Interplay between Research and Job

● A paid job requires real-world decision making.

● Skin in the game: you cannot use models or approaches that you do not rely on.

● In order to be confident in your approach, focus & apply Occam's razor strongly.

● Avoid using models #1 & #2 & #3 ... A combination makes root-cause analysis difficult when something fails.

● Define your unique optimization problem, which is directly solvable by one essential approach.

– Also, a paper based on one principle is easier to publish.

Interplay between Research and Job

● A case in job: how to create network externality?

● The key factor in a successful platform business (e.g., Operating Systems, Social Network Services)

● You must have a good mechanism to incentivize users to use your platform.

● Do the existing mechanisms really incentivize users?

● Are they quantitative enough to enable real operations?

● Freedom from unrealistic assumptions and practicality requirements are natural sources of research ideas.

Some Tactics under Competitions

● Development of the truly universal approach = Red Ocean fought by the World's Top Talents

● Identify the minimum requirement. Create an approach that is universal at least in your area.

● Make an approach that competitors dislike using.

● Such an approach often causes disruptive innovation.

● Do not confuse simplicity with naïvety

Necessity of Ample Surveys

● Avoid reinventing wheels. Most industrial problems have already been partially solved.

● Respect & steal other players' ideas by reading.

● Remember that some prior work is written over-confidently; prior authors do not know conditions that spoil their approaches in your new problem.

● Key for success: a good strategy to search for relevant papers and books

Encouraging Bottom-Up Learning

● Check the disciplines neighboring yours; stay at the Optimum Stimulation Level (Berlyne, 1960)

● Your brain is strongly stimulated by insights in areas slightly distant from your expertise.

● Deep understanding of the very slight difference between two areas often clarifies the white space in your area.

[Figure: neighboring disciplines: Machine Learning, Statistics, Biostatistics, Econometrics, Psychometrics, Cognitive Science, Neuroscience, Behavioral Economics, Behavioral Game Theory]

Uncertainty is Your Friend

● Most people hate uncertainty, but you must love it.

● One further tactic: beat the irrationality of your competitors!

● The more uncertain parts your research or business contains, the more competitors will be fooled by too much complexity.

● You: solve the entire problem by one critical solution.

● Competitors: solve each of the sub-problems by its specific method, and get trapped in poor sub-optima.

● Optimism in the face of uncertainty

Uncertainty is Your Friend

● Mind the difference between risks and uncertainty.

● Risks: volatility calibrated from existing data

● Uncertainty: cannot be quantified from data

● Donald Rumsfeld's unknown unknowns.

● You do not have to take high risks. But you should take on high uncertainty.

● In the big-data era, competitors rush into the areas with ample datasets, and become proficient at handling risks.

● By contrast, the human nature of hating uncertainty will remain, and it will be a source of your success.

References

References I

Ainslie, G. W. (1974). Impulse control in pigeons. Journal of theExperimental Analysis of Behavior, 21(3):485–489.

Antonini, G., Gioia, C., Frejinger, E., and Themans, M. (2007).Swissmetro: description of the data.http://biogeme.epfl.ch/swissmetro/examples.html.

Attiyeh, G., Franciosi, R., and Isaac, R. M. (2000). Experimentswith the pivot process for providing public goods. Public Choice,102:95–114.

Azfar, O. (1999). Rationalizing hyperbolic discounting. Journal ofEconomic Behavior and Organization, 38(2):245–252.

Blanchard, O. J. and Watson, M. W. (1982). Bubbles, rationalexpectations and financial markets. Crises in the Economic andFinancial Structure, pages 295–316.


References

References II

Bruza, P., Kitto, K., Nelson, D., and McEvoy, C. (2009). Is theresomething quantum-like about the human mental lexicon?Journal of Mathematical Psychology, 53(5):362–377.

Camerer, C. F., Ho, T. H., and Chong, J. (2004). A cognitivehierarchy model of games. Quarterly Journal of Economics,119:861–898.

Chapelle, O. and Harchaoui, Z. (2005). A machine learningapproach to conjoint analysis. In Advances in NeuralInformation Processing Systems 17, pages 257–264. MIT Press,Cambridge, MA, USA.

Clarke, E. H. (1971). Multipart pricing of public goods. PublicChoice, 2:19–33.

Cont, R. (2001). Empirical properties of asset returns: stylizedfacts and statistical issues. Quantitative Finance, 1(2):223–236.


References

References III

Cubitt, R. P. and Sugden, R. (2001). On money pumps. Gamesand Economic Behavior, 37(1):121–160.

de Barros, J. A. and Suppes, P. (2009). Quantum mechanics,interference, and the brain. Journal of MathematicalPsychology, 53(5):306–313.

Dotson, J. P., Lenk, P., Brazell, J., Otter, T., Maceachern, S. N.,and Allenby, G. M. (2009). A probit model with structuredcovariance for similarity effects and source of volumecalculations. http://ssrn.com/abstract=1396232.

Farmer, J. D. and Geanakoplos, J. (2009). Hyperbolic discountingis rational: Valuing the far future with uncertain discount rates.Cowles Foundation Discussion Paper No. 1719.

Friedman, J. H. (2001). Greedy function approximation: A gradientboosting machine. Annals of Statistics, 29(5):1189–1232.


References

References IV

Gonzalez-Vallejo, C. (2002). Making trade-offs: A probabilistic andcontext-sensitive model of choice behavior. PsychologicalReview, 109:137–154.

Green, P. and Srinivasan, V. (1978). Conjoint analysis in consumerresearch: Issues and outlook. Journal of Consumer Research,5:103–123.

Ho, T. H., Lim, N., and Camerer, C. F. (2006). Modeling thepsychology of consumer and firm behavior with behavioraleconomics. Journal of Marketing Research, 43(3):307–331.

Huber, J., Payne, J. W., and Puto, C. (1982). Addingasymmetrically dominated alternatives: Violations of regularityand the similarity hypothesis. Journal of Consumer Research,9:90–98.


References

References V

Johansen, A., Ledoit, O., and Sornette, D. (2000). Crashes ascritical points. International Journal of Theoretical and AppliedFinance, 3:219–255.

Johansen, A. and Sornette, D. (1999). Critical crashes. Risk,12(1):91–94.

Johansen, A., Sornette, D., and Ledoit, O. (1999). Predictingfinancial crashes using discrete scale invariance. Journal of Risk,1(4):5–32.

Kivetz, R., Netzer, O., and Srinivasan, V. S. (2004). Alternativemodels for capturing the compromise effect. Journal ofMarketing Research, 41(3):237–257.

McFadden, D. and Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15:447–470.



References VI

McFadden, D. L. (1980). Econometric models of probabilistic choice among products. Journal of Business, 53(3):13–29.

McKelvey, R. and Palfrey, T. (1995). Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6–38.

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. In NIPS Deep Learning Workshop.

Mogiliansky, A. L., Zamir, S., and Zwirn, H. (2009). Type indeterminacy: A model of the KT (Kahneman-Tversky)-man. Journal of Mathematical Psychology, 53(5):349–361.

Moulin, H. (1986). Game Theory for the Social Sciences. NYU Press, second edition.



References VII

Roe, R. M., Busemeyer, J. R., and Townsend, J. T. (2001). Multialternative decision field theory: A dynamic connectionist model of decision making. Psychological Review, 108:370–392.

Rong, J., Qin, T., and An, B. (2015). Computing quantal response equilibrium for sponsored search auctions. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), pages 1803–1804, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.

Shenoy, P. and Yu, A. J. (2013). A rational account of contextual effects in preference choice: What makes for a bargain? In Proceedings of the Cognitive Science Society Conference.

Simonson, I. (1989). Choice based on reasons: The case of attraction and compromise effects. Journal of Consumer Research, 16:158–174.



References VIII

Sornette, D. (2004). Why Stock Markets Crash: Critical Events in Complex Financial Systems. Princeton University Press.

Takahashi, R. and Morimura, T. (2015). Predicting preference reversals via Gaussian process uncertainty aversion. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015), pages 958–967.

Tavare, S., Balding, D. J., Griffiths, R. C., and Donnelly, P. (1997). Inferring coalescence times from DNA sequence data. Genetics, 145(2):505–518.

Thurner, S., Farmer, J. D., and Geanakoplos, J. (2012). Leverage causes fat tails and clustered volatility. Quantitative Finance, 12(5):695–707.



References IX

Trueblood, J. S. (2014). The multiattribute linear ballistic accumulator model of context effects in multialternative choice. Psychological Review, 121(2):179–205.

Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79:281–299.

Usher, M. and McClelland, J. L. (2004). Loss aversion and inhibition in dynamical models of multialternative choice. Psychological Review, 111:757–769.

van Hasselt, H., Guez, A., and Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16).

Wen, C.-H. and Koppelman, F. (2001). The generalized nested logit model. Transportation Research Part B, 35:627–641.



References X

Williams, H. (1977). On the formulation of travel demand models and economic evaluation measures of user benefit. Environment and Planning A, 9(3):285–344.

Yai, T. (1997). Multinomial probit with structured covariance for route choice behavior. Transportation Research Part B: Methodological, 31(3):195–207.

Yue, Y. and Guestrin, C. (2011). Linear submodular bandits and their application to diversified retrieval. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., and Weinberger, K., editors, Advances in Neural Information Processing Systems 24, pages 2483–2491.



References XI

Zhang, S. and Yu, A. J. (2013). Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting. In Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K., editors, Advances in Neural Information Processing Systems 26, pages 2607–2615. Curran Associates, Inc.

Ziegler, C.-N., McNee, S. M., Konstan, J. A., and Lausen, G. (2005). Improving recommendation lists through topic diversification. In Proceedings of the 14th international conference on World Wide Web (WWW 2005), pages 22–32. ACM.


THANK YOU FOR ATTENDING!
