
Reinforcement Learning China Summer School

RLChina 2021

Game Theory and Machine Learning in Multiagent Communication and Coordination

Prof. FANG Fei

Leonardo Assistant Professor

School of Computer Science

Carnegie Mellon University

August 20, 2021

Machine Learning + Game Theory for Societal Challenges

[Diagram: Artificial Intelligence = Machine Learning + Computational Game Theory, applied to societal challenges such as security & safety, environmental sustainability, transportation, and Zero Hunger]

Protect Ferry Line from Potential Attacks

[Chart: attacker's maximum expected utility, Max E[U], under the previous USCG strategy vs. the game-theoretic strategy]

Defender-attacker security game

Randomized patrol strategy

Minimize attacker's maximum expected utility

Reduce potential risk by 50%

Deployed by US Coast Guard

Optimal Patrol Strategy for Protecting Moving Targets with Multiple Mobile Resources. Fei Fang, Albert Xin Jiang, Milind Tambe. In AAMAS-13

In collaboration with US Coast Guard

Protect Wildlife from Poaching

Data from past patrols & satellite imagery → Predicted poaching threat

Machine Learning Methods: Ensemble Learning, Decision Trees, Neural Networks, Gaussian Process, Markov Random Field, …

Learn poacher behavior from data

Ranger-poacher game to plan patrols

Deployed in Uganda, China, Malaysia

Increased detection of poaching

Available to more than 600 sites worldwide

In collaboration with Uganda Wildlife Authority, Wildlife Conservation Society, World Wide Fund for Nature, Panthera, Rimba

IJCAI-15, IAAI-16, AAMAS-17, ECML-PKDD 2017, COMPASS 2019, IAAI 2021

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Discussion and Summary

Motivation: Community Engagement in Anti-Poaching

• Lack of patrol resources, e.g., 1 patroller per 167 km²
• Recruit informants to provide tips about poachers
• Other domains: community watchers for urban safety

Green Security Game with Community Engagement. Taoan Huang, Weiran Shen, David Zeng, Tianyu Gu, Rohit Singh, Fei Fang. In AAMAS-20

Motivation: Community Engagement in Anti-Poaching

• How should the rangers plan patrols with/without tips?
• The informant's goal may not always be aligned with the defender's
• Strategic informants can choose what to tell

[Diagram: Defender, Attacker, and Informant]

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

Defender-Attacker-Informant Game

• A set T of n targets
• Defender: choose a patrol strategy
  – A (randomized) allocation of r resources to the n targets
• Attacker: attack a target
• Informant: has type θ ∈ Θ with prior distribution p(θ); sends a message to the defender
• Defender: determine a defense plan

[Table: defender and attacker utilities at each target when it is covered vs. uncovered]

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

Defender-Attacker-Informant Game

[Diagram: the four steps of the game, Step 1 to Step 4]

• A defense plan d = (M, x, x0)
  – M: a set of possible messages
  – x0 ∈ [0,1]^n: a routine patrol strategy (when no message is received)
  – x: M → [0,1]^n: a mapping from messages to patrol strategies

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

Direct Defense Plan

Definition. In a direct defense plan d = (M, x, x0), M = T × Θ. A direct defense plan is truthful if reporting the actual target and his true type is the informant's best strategy.

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

Revelation Principle

Theorem (Revelation Principle). For any defense plan (M, x, x0), there exists a truthful direct defense plan (M', x', x0') such that all players obtain the same utility, for any target and any type.

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

How many messages are enough?

Theorem. There exists a defense plan (M, x, x0) with |M| = n + 1 that achieves the optimal defender utility.

Upper bound from the direct defense plan: |M| = n|Θ|

Interpretation
Messages 1 to n: pro-defender informants
Message n + 1: pro-attacker informants
For target t:
The informant reports message t if U_t^c(θ) > U_t^u(θ)
The informant reports message n + 1 if U_t^u(θ) > U_t^c(θ)

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

Computation

• The optimal defense plan can be computed in polynomial time
• Solve a linear program (LP) for each target
  – Each LP ensures
    • The attacker's best strategy is to attack target t
    • The informant's best strategy is to report m_t if the informant is defender-aligned on target t, i.e., U_t^c(θ) > U_t^u(θ)
    • The informant's best strategy is to report m_{n+1} if the informant is attacker-aligned on target t, i.e., U_t^c(θ) < U_t^u(θ)

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

Computation

[LP assuming target t is the best choice for the attacker]
• Objective: maximize the defender's expected utility
• Constraint: the attacker's best strategy is to attack target t
• Constraint: the informant's best strategy is to report m_{t'} if the informant is defender-aligned on target t', and to report m_{n+1} otherwise
• Constraint: the defender has r resources in total

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20
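To make the "one LP per target" idea concrete, here is a minimal sketch in Python for the simpler special case without an informant (a plain Stackelberg security game). The function name solve_lp_for_target and the utility arrays are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: one LP per target for a basic Stackelberg security game
# (no informant), assuming the attacker best-responds by attacking target t.
# Variable x_i = probability that target i is covered.
import numpy as np
from scipy.optimize import linprog

def solve_lp_for_target(t, r, Ud_cov, Ud_unc, Ua_cov, Ua_unc):
    n = len(Ud_cov)
    # Defender's expected utility at t is Ud_unc[t] + x_t*(Ud_cov[t]-Ud_unc[t]);
    # linprog minimizes, so we negate the x_t coefficient and add the constant back later.
    c = np.zeros(n)
    c[t] = -(Ud_cov[t] - Ud_unc[t])
    A_ub, b_ub = [], []
    # Attacker prefers t: EUa(t') - EUa(t) <= 0 for every other target t'.
    for tp in range(n):
        if tp == t:
            continue
        row = np.zeros(n)
        row[tp] = Ua_cov[tp] - Ua_unc[tp]
        row[t] = -(Ua_cov[t] - Ua_unc[t])
        A_ub.append(row)
        b_ub.append(Ua_unc[t] - Ua_unc[tp])
    # At most r resources in total.
    A_ub.append(np.ones(n))
    b_ub.append(r)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=[(0.0, 1.0)] * n)
    if not res.success:
        return None  # target t cannot be made the attacker's best response
    x = res.x
    return x, Ud_unc[t] + x[t] * (Ud_cov[t] - Ud_unc[t])

# The overall optimum is the best objective over the n LPs (skipping infeasible targets).
```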

Experiments

• Utility vs. informant type
  – Type 1: Fully defender-aligned: U_t^c(θ) > U_t^u(θ), ∀t
  – Type 2: Fully attacker-aligned: U_t^c(θ) < U_t^u(θ), ∀t
  – Type 3: Random
• The informant can significantly affect the game

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

Experiments

• If the informant is not fully aligned with the defender, more defender resources are needed to achieve the same expected utility
• Giving the informant an additional reward helps a lot

When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary

Motivation: UAV & Human Patrols in Anti-Poaching

SPOT Poachers in Action: Augmenting Conservation Drones with Automatic Detection in Near Real Time. Elizabeth Bondi, Fei Fang, Mark Hamilton, Debarun Kar, Donnabell Dmello, Jongmoo Choi, Robert Hannaford, Arvind Iyer, Lucas Joppa, Milind Tambe, Ram Nevatia. In IAAI-18

Motivation: UAV & Human Patrols in Anti-Poaching

Not enough rangers

Flash light to deter poachers

[Video: actual footage of a poacher running away]

Signaling

• The flashing light is a signal to indicate that a ranger is arriving
• The signal can be deceptive
• If Prob(ranger arrives | signal) = 0.1, the poacher may not be stopped
• Must be strategic about deceptive signaling; a worked example follows
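A minimal worked example of why the bluffing rate must be controlled, with assumed payoffs (attacking pays +1 if no ranger is present, -1 if caught, and running away pays 0) and an assumed 0.3 probability that a ranger is actually nearby:

```latex
% Assumed numbers for illustration only.
% A ranger is present with probability 0.3. The defender always signals when a
% ranger is present and bluffs (signals with no ranger) with probability p.
\Pr(\text{present} \mid \text{signal}) = \frac{0.3}{0.3 + 0.7p}
% The poacher is deterred (running away is a best response) iff
% (-1)\cdot\Pr(\text{present}\mid\text{signal}) + (+1)\cdot\Pr(\text{absent}\mid\text{signal}) \le 0,
% i.e. \Pr(\text{present}\mid\text{signal}) \ge 1/2, which gives
\frac{0.3}{0.3 + 0.7p} \ge \frac{1}{2} \;\Longleftrightarrow\; p \le \frac{3}{7}
```

Under these assumed payoffs, the defender can bluff up to 3/7 of the time when no ranger is nearby and still deter the poacher; signaling so often that Prob(ranger | signal) drops to 0.1 destroys the deterrent.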

Signaling with Perfect Detection

[Diagram: a signaling scheme assuming perfect detection, with branches for detection / no detection and signal / no signal (example probabilities 0.3, 0.7, 0.3, 0.4, 0.3)]

How to incorporate uncertainty?

Strategic Coordination of Human Patrollers and Mobile Sensors with Signaling for Security Games. Haifeng Xu, Kai Wang, Phebe Vayanos, Milind Tambe. In AAAI-18

Signaling with Detection Uncertainty

• Key insight: with detection uncertainty, the adversary is also uncertain about our uncertainty; exploit this information advantage

[Diagram: with uncertain detection, what should the defender do upon detection vs. no detection, signal or no signal?]

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability. Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe. In AAAI-20

Stackelberg Security Game Model

Defender utilities at each target:
• Positive if the attacked target is covered
• Negative if the attacked target is uncovered
• 0 if the attacker runs away

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability. Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe. In AAAI-20

Solution

• Enumerate all possible "states" of a target
• Defender's pure strategy: assign a state to each target
• Goal: find the optimal mixed strategy for the defender

[Table: the possible target states p, n+, n-, s+, s-, s, determined by whether the patroller is near or far and whether it is matched or unmatched]

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability. Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe. In AAAI-20

Solution

• Linear Programming + Branch and Price

Variables (joint probabilities, not conditional probabilities):
• x_iθ: probability of allocating resources to ensure state θ at target i
• ψ_iθ: probability of sending the signal in state θ with detection
• φ_iθ: probability of sending the signal in state θ without detection
• q_e: probability of the defender choosing pure strategy e

[LP: maximize the defender's expected utility subject to feasibility of q and x, marginal probability consistency, feasibility of ψ and φ, and the attacker attacking when the signal is 0 and running away when it is 1]

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability. Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe. In AAAI-20

Experimental Results

• The defender performs worse than expected if uncertainty is ignored

Case Study

To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability. Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe. In AAAI-20

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary

Information Advantage

• Consider a finitely repeated Bayesian security game
• T rounds
• In each round, the defender and the attacker choose actions a_t^1, a_t^2 (targets to protect/attack) simultaneously
• The defender has no commitment power
• The attacker's type (utility) is unknown to the defender
• The defender needs to infer the attacker's type λ ∈ Λ from his actions
• Prior type distribution p = {p_λ}
• The attacker balances playing myopically against maintaining his information advantage, to maximize accumulated payoff
• An attack can be viewed as a (deceptive) signal of the attacker's type
• Task: find the optimal defender strategy

Bayesian Equilibrium

• Rationality
• Belief consistency: beliefs are updated following Bayes' rule

Optimality from any decision point onward

[Figure: two copies of the same two-player game tree (P1 chooses L or R; P2 then chooses K or U; payoffs 3;1, 1;3, 2;1, 0;0), illustrating a strategy profile that is an NE versus one that is both an NE and a perfect NE]

Perfect Bayesian Equilibrium

• An equilibrium refinement of Bayesian equilibrium
• Sequential rationality starting from any information set
• Most existing work solves for it using mathematical programming-based methods (Nguyen et al. 2019 [1]; Guo et al. 2017 [2])
  – Very precise
  – Lacks scalability: long solve times and large memory

[1] Thanh H. Nguyen, Yongzhao Wang, Arunesh Sinha, and Michael P. Wellman. Deception in finitely repeated security games. In AAAI-19

Our Algorithm for Computing PBE

• Temporal Induced Self-Play (TISP)
  – A framework that can be combined with different learning algorithms
  – Four components: belief-based representation, backward induction, belief-space approximation, and policy learning

Belief-based representation

• Use the belief instead of the history: π(s, b) instead of π(h)
  – π(attacked target 1 in round l-1, target 2 in round l-2, ...) becomes π(0.2 prob. of being attacker type a)
• Helps when the history is long

Backward Induction

• Reverse the training process
  – From round L-1 to round L-2, ..., to round 0
  – Use the trained value network V and policy network π of round l+1 when training round l
• Do not sample whole trajectories from round 0 to round L-1; sample one-step trajectories from round l to round l+1, using a special reset function
• Different networks for different rounds
• Improves performance without adding training cost

Belief Space Approximation

• Sample K belief vectors b_1, ..., b_K, and train strategies conditioned specifically on the sampled belief and the round
• At query time, interpolate:

π(a | b, s) = Σ_{k=1}^{K} π_{θ_k}(a | s; b_k) · w(b, b_k) / Σ_{k=1}^{K} w(b, b_k)
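A minimal sketch of that query-time interpolation in Python, assuming the per-belief policies are given as callables and using an illustrative inverse-distance weight for w(b, b_k) (the actual weighting function is a design choice):

```python
import numpy as np

def interpolate_policy(policies, sampled_beliefs, b, s, eps=1e-6):
    """policies[k](s) returns the action distribution of the policy trained at
    sampled belief b_k; b is the current belief vector, s the current state."""
    weights = np.array([1.0 / (np.linalg.norm(b - bk) + eps) for bk in sampled_beliefs])
    dists = np.stack([pol(s) for pol in policies])   # K x |A| action distributions
    mixed = weights @ dists / weights.sum()           # weighted average over the K policies
    return mixed / mixed.sum()                        # renormalize for numerical safety

# Tiny usage example with two dummy policies over three actions:
p1 = lambda s: np.array([0.7, 0.2, 0.1])
p2 = lambda s: np.array([0.1, 0.3, 0.6])
print(interpolate_policy([p1, p2],
                         [np.array([0.9, 0.1]), np.array([0.2, 0.8])],
                         b=np.array([0.5, 0.5]), s=None))
```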

Policy Learning

• Policy gradient, with the update rule changed to account for the belief:

∇_θ V^λ(π, b, s) = Σ_{a∈A} ∇_θ π_θ(a | b, s) · Q^λ(π, b, s, a)
                 = E[ Q^λ(π, b, s, a) ∇_θ ln π_θ(a | b, s) + γ (∇_θ b') (∇_{b'} V^λ(π, b', s')) ]

• Regret matching:

π_{t+1}(a | s, b) = [R_{t+1}(s, b, a)]_+ / Σ_{a'} [R_{t+1}(s, b, a')]_+

where R_{t+1}(s, b, a) = Σ_{τ=1}^{t} ( Q_τ(π_τ, s, b, a) - V_{φ_τ}(π_τ, s, b) )
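A minimal tabular sketch of the regret-matching update above in Python; the advantage estimates Q - V are assumed to be given (here faked with random numbers), so this is illustrative rather than the TISP implementation:

```python
import numpy as np

def regret_matching_policy(cumulative_regret):
    """pi_{t+1}(a) is proportional to [R_{t+1}(a)]_+ ; uniform if all regrets are non-positive."""
    pos = np.maximum(cumulative_regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full_like(pos, 1.0 / len(pos))

n_actions, rng = 3, np.random.default_rng(0)
R = np.zeros(n_actions)                   # cumulative regret R_t(s, b, .) for one (s, b) pair
for t in range(100):
    q = rng.normal(size=n_actions)        # stand-in for Q_t(pi_t, s, b, a)
    v = regret_matching_policy(R) @ q      # stand-in for the value baseline V(pi_t, s, b)
    R += q - v                             # accumulate instantaneous regret
print(regret_matching_policy(R))
```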

Temporal Induced Self-play Training


Test-time Policy Transformation


Experiment: Security Game

• Better scalability than MP-based methods
• Much higher solution quality than other learning-based methods
• TISP can also be used for more complex stochastic Bayesian games

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary

Coordination in Games

• Correlated equilibrium (CE)
• Correlation device: sends private signals to the players
  – Signals are sampled from a joint probability distribution over the players' actions and represent recommended behavior
  – Equivalent to having a mediator that privately recommends behavior to the players but does not enforce it

Example:
Nash equilibrium: total utility = 7 + 2 = 2 + 7 = 9
Correlated equilibrium with signal distribution (0.25, 0.25, 0.5): total utility = 0.25·(7+2) + 0.25·(2+7) + 0.5·(6+6) = 10.5
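As a sanity check of the incentive constraints in this example, assume the underlying game is the standard Chicken game with payoffs (C,C) = (6,6), (C,D) = (2,7), (D,C) = (7,2), (D,D) = (0,0), and the device draws (C,C) with probability 0.5 and (C,D), (D,C) with probability 0.25 each; these payoffs are consistent with the utilities quoted above but are an assumption, not stated on the slide:

```latex
% Player 1 is told C: the opponent plays C w.p. 2/3 and D w.p. 1/3.
\mathbb{E}[u_1 \mid \text{rec } C, \text{play } C] = \tfrac{2}{3}\cdot 6 + \tfrac{1}{3}\cdot 2 = \tfrac{14}{3},
\qquad
\mathbb{E}[u_1 \mid \text{rec } C, \text{play } D] = \tfrac{2}{3}\cdot 7 + \tfrac{1}{3}\cdot 0 = \tfrac{14}{3}.
% Player 1 is told D: the opponent plays C with certainty.
\mathbb{E}[u_1 \mid \text{rec } D, \text{play } D] = 7 \;\ge\; \mathbb{E}[u_1 \mid \text{rec } D, \text{play } C] = 6.
```

Following the recommendation is a best response in both cases (with equality in the first), so the distribution is a correlated equilibrium, and its total utility 10.5 exceeds the 9 achievable in either pure Nash equilibrium.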

Understand and Compute EFCE

• Extensive-form correlated equilibrium (EFCE):
  – The correlation device selects private signals for the players before the game starts
  – Recommendations are revealed incrementally as the players progress down the game tree
• Computing an EFCE is computationally challenging

Theoretical Results

• Theorem (informal): finding an EFCE in a two-player game can be cast as a bilinear saddle-point problem

min_{x∈X} max_{y∈Y} x^T A y

• Conceptual implication: a zero-sum game between the mediator and the deviator
• Computational implication: the bilinear saddle-point formulation opens the way to the plethora of optimization algorithms developed specifically for saddle-point problems

Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks. Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm. In NeurIPS-19

Algorithms to Compute EFCE

• Algorithm 1: a simple subgradient descent method
  – Exploits the bilinear saddle-point formulation
  – Uses structural properties of EFCEs
  – Can give better scalability than the prior approach based on linear programming
• Algorithm 2: a regret minimization-based algorithm
  – Adapts self-play methods based on regret minimization
  – Much more scalable than Algorithm 1

Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks. Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm. In NeurIPS-19. Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium. Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm. In NeurIPS-19
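For intuition on the saddle-point route (not the paper's actual algorithm, which works over the EFCE polytopes), here is a minimal projected gradient descent-ascent sketch in Python for min_x max_y x^T A y with x and y restricted to probability simplexes:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def saddle_point_gda(A, steps=5000, eta=0.05):
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    for _ in range(steps):
        gx, gy = A @ y, A.T @ x            # gradients of x^T A y w.r.t. x and y
        x = project_simplex(x - eta * gx)   # descent step for the min player
        y = project_simplex(y + eta * gy)   # ascent step for the max player
        x_avg += x
        y_avg += y                          # averaged iterates approximate the saddle point
    return x_avg / steps, y_avg / steps

# Matching pennies: the saddle point is the uniform strategy for both players.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(saddle_point_gda(A))
```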

Copula Learning for Agent Coordination

[Diagram: a correlation device sends signals to the players in Team 1 and Team 2]

Our goal is to design the copula, from which we can derive the distribution of the signals, to achieve good enough coordination among the players.

Design the distribution of the signal represented by a copula (e.g., a neural network): parameterize the copula with a neural network and learn the parameters so that the players have an incentive to follow the recommended actions.

Deep Archimedean Copulas. Chun Kai Ling, Fei Fang, Zico Kolter. In NeurIPS-20

Archimedean Copulas

• Copulas: multivariate CDFs with marginals uniform on [0, 1]
  – C(x_1, ..., x_d) = P(X_1 ≤ x_1, ..., X_d ≤ x_d)
• Archimedean copulas: specified by a generator φ: [0, ∞) → [0, 1]
  – C(x_1, ..., x_d) = φ(φ^{-1}(x_1) + ... + φ^{-1}(x_d))
• Commonly used Archimedean copulas are parameterized by a single scalar θ, e.g.,
  – Frank: φ_θ(t) = -(1/θ) log(e^{-t}(e^{-θ} - 1) + 1)
  – Clayton: φ_θ(t) = (1 + t)^{-1/θ}

Image from Scherer, Matthias, and Jan-Frederik Mai. Simulating Copulas: Stochastic Models, Sampling Algorithms, and Applications.
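A small Python sketch of the Archimedean construction using the Clayton generator above; the closed-form bivariate Clayton copula is a standard cross-check used here only for illustration, not something specific to the talk:

```python
def phi_clayton(t, theta):
    """Clayton generator: phi_theta(t) = (1 + t)^(-1/theta)."""
    return (1.0 + t) ** (-1.0 / theta)

def phi_clayton_inv(x, theta):
    """Inverse generator: phi^{-1}(x) = x^(-theta) - 1."""
    return x ** (-theta) - 1.0

def archimedean_cdf(xs, theta):
    """C(x_1, ..., x_d) = phi(phi^{-1}(x_1) + ... + phi^{-1}(x_d))."""
    return phi_clayton(sum(phi_clayton_inv(x, theta) for x in xs), theta)

u, v, theta = 0.3, 0.6, 2.0
# Cross-check against the usual closed form of the bivariate Clayton copula.
closed_form = (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)
print(archimedean_cdf([u, v], theta), closed_form)   # the two numbers agree
```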

Our approach: ACNet

• ACNet learns a generator φ as a convex combination of negative exponentials
• Other probabilistic quantities are obtained by differentiation w.r.t. the inputs
  – Joint density: ∂^d C(x_1, ..., x_d) / ∂x_1 ... ∂x_d
  – Conditional densities, conditional distributions, etc.
• Evaluating these quantities requires computing φ^{-1}
  – A PyTorch wrapper computes inverses using Newton's method
  – Fully differentiable; derivatives w.r.t. the weights are computed by auto-differentiation
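A minimal numpy sketch of that idea, assuming a fixed generator of the form φ(t) = Σ_i w_i e^{-a_i t} with nonnegative weights summing to 1; in ACNet the weights would be learned and the inversion wrapped in PyTorch, so this is only an illustration:

```python
import numpy as np

w = np.array([0.3, 0.7])     # mixture weights, nonnegative and summing to 1 (assumed fixed here)
a = np.array([0.5, 2.0])     # positive decay rates

def phi(t):                   # generator: strictly decreasing, convex, phi(0) = 1
    return np.sum(w * np.exp(-a * t))

def phi_prime(t):
    return -np.sum(w * a * np.exp(-a * t))

def phi_inv(x, iters=50):
    """Solve phi(t) = x for t >= 0 by Newton's method (monotone convergence from t = 0)."""
    t = 0.0
    for _ in range(iters):
        t = t - (phi(t) - x) / phi_prime(t)
    return t

x = 0.4
t = phi_inv(x)
print(t, phi(t))             # phi(t) reproduces x up to numerical precision
```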

Fitting real world data

[Figure: fitted copulas on three real-world datasets (Boston, INTC-MSFT, GOOG-FB), comparing the ground truth, the best parametric copula, and ACNet]

We find that ACNet…

• Is able to fit synthetic data generated from common Archimedean copulas
• Outperforms common Archimedean copulas on real-world datasets
• Can give conditional densities/CDFs in a single model
• Can be sampled from efficiently even in high dimensions
• Can be the basis for computing correlated equilibria in complex games (e.g., with continuous action spaces)
• Future direction: leverage ACNet to compute correlated equilibria for complex games

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary

Motivation: Volunteer-Based Food Rescue Platform

Food waste and food insecurity coexist: up to 40% of food is wasted globally (>1.3 billion tons annually), while 1 in 8 people go hungry every day. Rescue good food!

[Diagram: post rescue requests → a volunteer claims the rescue → picks up from the donor → delivers to the recipient → success!]

In collaboration with 412 Food Rescue (412FR)

Motivation: Volunteer-Based Food Rescue Platform

• Challenges
  – Uncertainty about whether a rescue will be claimed and completed
  – Send notifications to volunteers: 1-to-many communication, but volunteers suffer notification fatigue
  – Human dispatcher intervenes: 1-to-1 communication, but dispatchers are overstretched

Motivation: Volunteer-Based Food Rescue Platform

• How can AI help?
  – Predictive model of rescue claim status
    • Determine which rescues need special attention from the human dispatcher
  – Data-driven optimization of intervention and notification
    • Avoid excessive notifications to help retain volunteers

Improving Efficiency of Volunteer-Based Food Rescue Operations. Zheyuan Ryan Shi*, Yiwen Yuan*, Kimberly Lo, Leah Lizarondo, Fei Fang. In IAAI-20

Predictive Model of Rescue Claim Status

Features used for the predictive model: timing, weather, location

[Map: percentage of unclaimed rescues by zip code]

Operational dataset of 412FR from March 2018 to May 2019

Predictive Model of Rescue Claim Status

A stacking model

Predict whether a rescue will be claimed by volunteers
• +: Claimed (3825), -: Not claimed (749)
• Training data: May 2018 to Dec 2018; test data: Jan 2019 to May 2019

[Diagram: the stacking architecture, which includes a neural network component]
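A minimal sketch of a stacking classifier for this kind of claim prediction in Python with scikit-learn; the particular base learners and the logistic-regression meta-learner here are illustrative assumptions, not necessarily the configuration used in the deployed system:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the rescue features (timing, weather, location, ...) and claim labels.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 6)), rng.integers(0, 2, size=500)

stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier()),
                ("rf", RandomForestClassifier(n_estimators=100))],
    final_estimator=LogisticRegression(),   # meta-learner trained on the base predictions
    cv=5,
)
stack.fit(X[:400], y[:400])                  # a chronological split would mimic the train/test dates
print(stack.predict_proba(X[400:])[:5, 1])    # predicted claim probabilities for held-out rescues
```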

Predictive Model of Rescue Claim Status

Predict whether a rescue will be claimed by volunteers

Model              Accuracy  Precision  Recall  F1    AUC
Gradient boosting  0.73      0.86       0.82    0.84  0.51
Random forest      0.71      0.87       0.78    0.82  0.54
Gaussian process   0.56      0.88       0.54    0.67  0.60
Stacking model     0.69      1.00*      0.64    0.78  0.81

Optimize Intervention and Notification Scheme

Current Practice (Default INS)

[Timeline: rescue published → 1st-wave notification to volunteers within 5 miles → after 15 minutes, notify all volunteers → after 60 minutes, the human dispatcher intervenes]

964 volunteers get the 1st-wave notification on average
44.6% of rescues are claimed by volunteers receiving the 1st-wave notification

Optimize Intervention and Notification Scheme

Improve the INS with minor changes:

[Timeline: rescue published → 1st-wave notification to volunteers within y miles → after x minutes, notify all volunteers (2nd wave) → after z minutes, the dispatcher intervenes]

Task: find the best values of x, y, z to reduce notifications and human intervention while ensuring at least the same claim rate

Optimize Intervention and Notification Scheme

• Optimization problem

min_{x,y,z}  v(y) + q(x, y, z) + λ · s(x, y, z)
s.t.  p(x, y, z) ≥ b        (claim rate ≥ threshold b)
      (x, y, z) ∈ S

where
v(y) = expected # of volunteers receiving the 1st-wave notification
q(x, y, z) = expected # of volunteers receiving the 2nd-wave notification
s(x, y, z) = expected # of rescues requiring human intervention

Counterfactual Estimation

• How to estimate v(y), q(x, y, z), s(x, y, z), p(x, y, z)?
  – Assume the rescue and volunteer distributions remain the same
  – Estimate the quantities from historical rescues
  – For each rescue, calculate the counterfactual claim time (CCT) under an INS, using a number of assumptions
    • Assumption 1: upon receiving the notification, a volunteer takes the same amount of time to respond
    • Assumption 2: the success of an intervention is not affected by the INS
    • …

Branch-and-Bound Algorithm

• Can we avoid enumerating all possible INSs?
• Estimate a lower bound on the objective value of an INS (x, y, z) when only a subset of its parameters is specified
• Use the lower bound to prioritize and prune INSs through branch-and-bound

Candidate schemes: (x, y, z) ∈ S
Example: x = unspecified, y = 5 miles, z = 45 min
Lower bound = v(y) + q(x_max, y, z) + λ · s(x_min, y, z)
(1st-wave term, 2nd-wave term, human-intervention term)

Branch-and-Bound Algorithm

[Figure: branch-and-bound search tree over (x, y, z). Starting from the fully unspecified root, branch on z (e.g., 45, 50, 55, 60), then on y (e.g., 4.5, 5, 5.5, 6), then on x (e.g., 14, 15, 16); compute lower bounds at partially specified nodes and the exact objective at leaves. A node with LB = 2600 can be pruned once a leaf with objective 2500 has been found.]
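A minimal sketch of this branch-and-bound search in Python; the candidate parameter grids and the lower_bound, objective, and feasible callables are placeholders standing in for the counterfactual estimates described above, not the deployed code:

```python
def branch_and_bound(xs, ys, zs, objective, lower_bound, feasible):
    """Minimize objective(x, y, z) over the grid, pruning with lower bounds computed
    from partially specified schemes (here: only z fixed, then (y, z) fixed)."""
    best_val, best_ins = float("inf"), None
    for z in sorted(zs, key=lambda z: lower_bound(None, None, z)):
        if lower_bound(None, None, z) >= best_val:
            continue                        # prune the whole z-subtree
        for y in sorted(ys, key=lambda y: lower_bound(None, y, z)):
            if lower_bound(None, y, z) >= best_val:
                continue                    # prune the (y, z)-subtree
            for x in xs:
                if feasible(x, y, z):       # claim-rate constraint p(x, y, z) >= b
                    val = objective(x, y, z)
                    if val < best_val:
                        best_val, best_ins = val, (x, y, z)
    return best_ins, best_val

# objective, lower_bound, and feasible would be built from the counterfactual claim-time estimates.
```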

Recommended INS

• Optimize on data from May 2018 to Dec 2018
• Test on data from Jan 2019 to May 2019
• Recommend an INS by checking the Pareto frontier

Deployed since Jan 2020

Rescue-Specific Notification Scheme

• Can we do better than simply changing the parameters of the default INS?
  – Send 1st-wave notifications to the volunteers who are more likely to claim the rescue
• Task: given a rescue, provide a list of k volunteers
• Similar to recommender systems
  – Users: rescue trips
  – Items: volunteers

A Recommender System for Crowdsourcing Food Rescue Platforms. Zheyuan Ryan Shi, Leah Lizarondo, Fei Fang. In WWW-21

Distribution of Donor and Recipient Organizations

Divide the Pittsburgh area into 16 regions

[Maps: distribution of donor organizations and recipient organizations across the 16 regions (indexed 0-15)]

Feature Extraction

Feature extraction for a (rescue, volunteer) pair:
• Volunteer's # completed rescues in the donor's region
• Volunteer's # completed rescues in the recipient's region
• Volunteer's total # completed rescues
• Time between the rescue and the volunteer's onboarding
• Distance between the donor and the volunteer

Predictive Model of Rescue-Volunteer Compatibility

Predict whether a rescue will be claimed by a specific volunteer
• +: Claimed, -: Not claimed
• Features → neural network with 4 hidden layers
• 6757 rescues, 9212 volunteers
• Training data: Mar 2018 to Oct 2019; test data: Nov 2019 to Mar 2020

Rescue-Specific Notification

[Diagram: the model scores each volunteer's probability of claiming the rescue (e.g., 0.341, 0.105, 0.422, ...), and volunteers are ranked by score (e.g., Volunteer 8346, Volunteer 333, Volunteer 1835, ...)]

Evaluation

• Metric: hit ratio at top k (HR@k), the % of rescues that are claimed by a volunteer ranked in the top k
• k = 964 to match the default INS

Caveat with Rescue-Specific Notification

• The ML model discovers some frequent volunteers and sends them notifications almost all the time

ML + Online Planning

• Rather than greedily taking the top k volunteers, enforce a constraint that each volunteer receives at most L notifications per day
• For the current rescue i, decide whom to notify by planning with a projected set of future rescues R (a sketch of such a planner follows)

Variables:
• x_ij ∈ {0, 1}: whether to send the notification for rescue i to volunteer j
• p_ij ∈ [0, 1]: output of the ML model, the probability that volunteer j will claim rescue i
• b_j ∈ {0, ..., L}: number of notifications volunteer j can still receive for the rest of the day
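A minimal sketch of one such planner in Python: a greedy assignment over the current rescue plus the projected future rescues that respects each volunteer's remaining budget b_j. The real system could instead solve this as an integer program; all names and numbers here are illustrative.

```python
import numpy as np

def plan_notifications(p, budgets, k):
    """p[i, j]: predicted prob. that volunteer j claims rescue i (rescue 0 = current rescue,
    the rest projected); budgets[j]: notifications volunteer j may still receive today;
    k: notifications to send per rescue. Greedy by predicted claim probability."""
    n_rescues, n_vols = p.shape
    b = budgets.copy()
    chosen = [[] for _ in range(n_rescues)]
    # Consider all (rescue, volunteer) pairs from the highest predicted probability down.
    for i, j in sorted(np.ndindex(n_rescues, n_vols), key=lambda ij: -p[ij]):
        if b[j] > 0 and len(chosen[i]) < k:
            chosen[i].append(j)
            b[j] -= 1
    return chosen[0], chosen          # notifications for the current rescue, and the full plan

p = np.array([[0.9, 0.8, 0.1],        # current rescue
              [0.85, 0.2, 0.3]])      # one projected future rescue
print(plan_notifications(p, budgets=np.array([1, 2, 2]), k=2))
```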

Online Planning-Based Rescue-Specific Notification

• Avoids over-concentration with L = 5
• HR@k = 0.645, much better than current practice

Outline

• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary

Summary

• Communication and Coordination in Multi-Agent Interaction
  – Informants, Signals, Notifications
  – Historical actions
• ML + GT for Communication and Coordination
  – Mathematical programming-based algorithms
  – Learn human behavior
  – Learn equilibrium / optimal strategy
• ML + GT for Societal Challenges
  – Security, Sustainability, Food security

Acknowledgment

• Advisors, postdocs, students, and all co-authors!
• Collaborators and partners
• Funding support
