Reinforcement Learning China Summer School
RLChina 2021
Game Theory and Machine Learning in Multiagent Communication and Coordination
Prof. FANG Fei
Leonardo Assistant Professor
School of Computer Science
Carnegie Mellon University
August 20, 2021
Machine Learning + Game Theory for Societal Challenges
[Diagram: artificial intelligence, combining machine learning and computational game theory, applied to societal challenges: security & safety, environmental sustainability, transportation, zero hunger]
Protect Ferry Line from Potential Attacks
[Chart: attacker's maximum expected utility, max E[U], under the previous USCG strategy vs. the game-theoretic strategy]
• Defender-attacker security game
• Randomized patrol strategy
• Minimize the attacker's maximum expected utility
• Reduced potential risk by 50%; deployed by the US Coast Guard
Optimal Patrol Strategy for Protecting Moving Targets with Multiple Mobile Resources. Fei Fang, Albert Xin Jiang, Milind Tambe. In AAMAS-13.
In collaboration with US Coast Guard
Protect Wildlife from Poaching
• Data from past patrols & satellite imagery → predicted poaching threat
• Machine learning methods: ensemble learning, decision trees, neural networks, Gaussian processes, Markov random fields, …
• Learn poacher behavior from data
• Ranger-poacher game to plan patrols
• Deployed in Uganda, China, Malaysia
  – Increased detection of poaching
  – Available to more than 600 sites worldwide
In collaboration with Uganda Wildlife Authority, Wildlife Conservation Society, World Wide Fund for Nature, Panthera, Rimba
IJCAI-15, IAAI-16, AAMAS-17, ECML-PKDD 2017, COMPASS 2019, IAAI 2021
Outline
• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Discussion and Summary
Motivation: Community Engagement in Anti-Poaching
• Lack of patrol resources, e.g., 1 patroller per 167 km²
• Recruit informants to provide tips about poachers
• Other domains: community watchers for urban safety
Green Security Game with Community Engagement. Taoan Huang, Weiran Shen, David Zeng, Tianyu Gu, Rohit Singh, Fei Fang. In AAMAS-20.
Motivation: Community Engagement in Anti-Poaching
• How should the rangers plan patrols with/without tips?
• The informant's goal may not always be aligned with the defender's
• Strategic informants can choose what to tell
[Diagram: interactions among defender, attacker, and informant]
When to Follow the Tip: Security Games with Strategic Informants. Weiran Shen, Weizhe Chen, Taoan Huang, Rohit Singh, Fei Fang. In IJCAI-PRICAI-20.
Defender-Attacker-Informant Game
• A set T of n targets
• Defender: choose a patrol strategy
  – A (randomized) allocation of k resources to the n targets
• Attacker: attack a target
• Informant: has type θ ∈ Θ with prior distribution μ(θ); sends a message to the defender
• Defender: determine a defense plan
[Table: defender and attacker utilities at each target when the attacked target is covered vs. uncovered]
Defender-Attacker-Informant Game
[Diagram: the four steps of game play]
• A defense plan P = (M, x, x₀)
  – M: a set of possible messages
  – x₀ ∈ [0,1]ⁿ: a routine patrol strategy (used when no message is received)
  – x: M → [0,1]ⁿ: a mapping from messages to patrol strategies
Direct Defense Plan
Definition. In a direct defense plan P = (M, x, x₀), M = T × Θ. A direct defense plan is truthful if reporting the actual target and the informant's true type is the informant's best strategy.
Revelation Principle
Theorem (Revelation Principle). For any defense plan (M, x, x₀), there exists a truthful direct defense plan (M′, x′, x₀′) such that all players obtain the same utility, for any target and any type.
How many messages are enough?
Theorem. There exists a defense plan (M, x, x₀) with |M| = n + 1 that achieves the optimal defender utility.
Upper bound from the direct defense plan: |M| = n|Θ|.
Interpretation:
• Messages 1 to n: pro-defender informants
• Message n + 1: pro-attacker informants
For target t, letting W_t^c(θ) and W_t^u(θ) denote the type-θ informant's utilities when the attacked target t is covered vs. uncovered:
• The informant reports message t if W_t^c(θ) > W_t^u(θ)
• The informant reports message n + 1 if W_t^u(θ) > W_t^c(θ)
Computation
• The optimal defense plan can be computed in polynomial time
• Solve a linear program (LP) for each target
  – Each LP ensures:
    • The attacker's best strategy is to attack target t
    • The informant's best strategy is to report m_t if the informant is defender-aligned on target t, i.e., W_t^c(θ) > W_t^u(θ)
    • The informant's best strategy is to report m_{n+1} if the informant is attacker-aligned on target t, i.e., W_t^c(θ) < W_t^u(θ)
Computation
LP assuming target t is the attacker's best choice (see the sketch below):
• Objective: maximize the defender's expected utility
• Constraints:
  – The attacker's best strategy is to attack target t
  – The informant's best strategy is to report m_{t′} if defender-aligned on target t′, and to report m_{n+1} otherwise
  – The defender has k resources in total
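A minimal sketch of this one-LP-per-target structure, in Python with scipy. To keep it self-contained it omits the informant incentive constraints, so it reduces to the standard security-game LP; the utilities and names (Udc, Uau, ...) are illustrative, not the paper's notation.

import numpy as np
from scipy.optimize import linprog

n, k = 4, 1.5                            # targets, patrol resources
Udc = np.array([ 2.,  1.,  3.,  1.])     # defender utility, covered
Udu = np.array([-4., -2., -5., -1.])     # defender utility, uncovered
Uac, Uau = -Udc, -Udu                    # attacker utilities (zero-sum
                                         # here only to keep it small)
best = (-np.inf, None)
for t in range(n):                       # assume target t is attacked
    # variables: coverage probabilities c_1..c_n
    obj = np.zeros(n)
    obj[t] = -(Udc[t] - Udu[t])          # maximize defender EU at t
    A_ub, b_ub = [], []
    for j in range(n):                   # attacker weakly prefers t
        if j == t:
            continue
        row = np.zeros(n)
        row[t] = -(Uac[t] - Uau[t])
        row[j] = Uac[j] - Uau[j]
        A_ub.append(row)
        b_ub.append(Uau[t] - Uau[j])
    A_ub.append(np.ones(n))              # at most k resources in total
    b_ub.append(k)
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n)
    if res.success:
        val = Udu[t] + res.x[t] * (Udc[t] - Udu[t])
        if val > best[0]:
            best = (val, t)
print("defender value %.3f when target %d is attacked" % best)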
Experiments
• Utility vs. informant type
  – Type 1: fully defender-aligned: W_t^c(θ) > W_t^u(θ), ∀t
  – Type 2: fully attacker-aligned: W_t^c(θ) < W_t^u(θ), ∀t
  – Type 3: random
• The informant can significantly affect the game
Experiments
• If the informant is not fully aligned with the defender, more defender resources are needed to achieve the same expected utility
• Giving the informant an additional reward helps a lot
Outline
• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary
Motivation: UAV & Human Patrols in Anti-Poaching
SPOT Poachers in Action: Augmenting Conservation Drones with Automatic Detection in Near Real Time. Elizabeth Bondi, Fei Fang, Mark Hamilton, Debarun Kar, Donnabell Dmello, Jongmoo Choi, Robert Hannaford, Arvind Iyer, Lucas Joppa, Milind Tambe, Ram Nevatia. In IAAI-18.
Motivation: UAV & Human Patrols in Anti-Poaching
• Not enough rangers
• Flash a light to deter poachers
• [Actual video of a poacher running away]

Signaling
• The flashing light is a signal indicating that a ranger is arriving
• The signal can be deceptive
• If Prob(ranger arrives | signal) = 0.1, the poacher may not be stopped
• Must be strategic about deceptive signaling
Signaling with Perfect Detection
[Diagram: a signaling scheme assuming perfect detection of the poacher; probabilities of signaling vs. showing no signal, conditioned on detection / no detection]
How to incorporate uncertainty?
Strategic Coordination of Human Patrollers and Mobile Sensors with Signaling for Security Games. Haifeng Xu, Kai Wang, Phebe Vayanos, Milind Tambe. In AAAI-18.
Signaling with Detection Uncertainty
• Key insight: with detection uncertainty, the adversary is also uncertain about our uncertainty; exploit this information advantage
[Diagram: the signaling scheme under detection uncertainty]
To Signal or Not To Signal: Exploiting Uncertain Real-Time Information in Signaling Games for Security and Sustainability. Elizabeth Bondi, Hoon Oh, Haifeng Xu, Fei Fang, Bistra Dilkina, Milind Tambe. In AAAI-20.
Stackelberg Security Game Model
Defender utilities at each target:
• Positive if covered
• Negative if attacked
• 0 if the attacker runs away
Solution
• Enumerate all possible "states" of a target
  [Diagram: states combine patroller near/far, signal/no signal, and detection matched/unmatched, e.g., p, n+, n−, s+, s−, s]
• Defender's pure strategy: assign a state to each target
• Goal: find the optimal mixed strategy for the defender
Solution
• Linear programming + branch and price
Variables (see the sketch below):
• x_{ts}: prob. of allocating resources to ensure state s at target t
• ψ_{ts}: prob. of sending a signal in state s when there is a detection
• φ_{ts}: prob. of sending a signal in state s when there is no detection
• q_e: prob. of the defender choosing pure strategy e
[LP annotations: the objective is the defender's expected utility; constraints ensure that x, ψ, φ are joint (not conditional) probabilities consistent with the marginals of q, that ψ and φ are feasible given x, and that the attacker attacks when no signal is shown and runs away when a signal is shown]
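A small sketch of how the defender's expected utility at a single target decomposes under such joint probabilities, assuming the attacker attacks on "no signal" and runs away (utility 0) on "signal". The state encoding and all numbers are illustrative, not the paper's exact formulation.

def defender_eu(states, U_cov, U_att):
    """states: list of dicts describing one target:
         p_state  - joint prob. the target is in this state
         covered  - True if a patroller protects the target in this state
         p_signal - prob. a (possibly deceptive) signal is shown
       U_cov / U_att: defender utility if a covered / uncovered target
       is attacked; utility is 0 when the attacker sees a signal."""
    eu = 0.0
    for s in states:
        p_no_sig = s["p_state"] * (1 - s["p_signal"])  # attacker attacks
        eu += p_no_sig * (U_cov if s["covered"] else U_att)
    return eu

# e.g., patroller near (covered) w.p. 0.3 and never signaling;
# patroller far w.p. 0.7, deceptively signaling 40% of the time
states = [dict(p_state=0.3, covered=True,  p_signal=0.0),
          dict(p_state=0.7, covered=False, p_signal=0.4)]
print(defender_eu(states, U_cov=2.0, U_att=-5.0))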
Experimental Results
• The defender performs worse than expected if uncertainty is ignored

Case Study
[Figures omitted]
Outline
• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary
Information Advantage
• Consider a finitely repeated Bayesian security game
  – L rounds
  – In each round, the defender and the attacker choose actions a_t¹, a_t² (targets to protect/attack) simultaneously
  – The defender has no commitment power
• The attacker's type (utility) is unknown to the defender
  – The defender needs to infer the attacker's type θ ∈ Θ from his actions
  – Prior type distribution P = {p_θ}
• The attacker balances playing myopically against maintaining his information advantage, so as to maximize accumulated payoff
  – An attack can be viewed as a (deceptive) signal of the attacker's type
• Task: find the optimal defender strategy
Bayesian Equilibrium
• Rationality
• Belief consistency: beliefs are updated following Bayes' rule
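A minimal sketch of the belief-consistency step in Python: updating the defender's belief over attacker types after observing an attack. The type names and likelihoods are made up for illustration.

def bayes_update(prior, likelihood, a):
    """likelihood[theta][a]: prob. that type theta attacks target a
    under the current strategy (illustrative numbers below)."""
    post = {th: prior[th] * likelihood[th][a] for th in prior}
    z = sum(post.values())
    return {th: p / z for th, p in post.items()} if z > 0 else prior

prior = {"aggressive": 0.5, "cautious": 0.5}
likelihood = {"aggressive": [0.8, 0.2], "cautious": [0.3, 0.7]}
print(bayes_update(prior, likelihood, a=0))  # belief shifts to "aggressive"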
Optimality from any decision point onward
[Example: two copies of the same two-player game tree with payoffs (3, 1), (1, 3), (2, 1), (0, 0); the strategy profile on the left is only an NE, while the one on the right is an NE and a perfect NE]
Perfect Bayesian Equilibrium
• An equilibrium refinement of Bayesian equilibrium
• Sequential rationality starting from any information set
• Most existing work solves for it using mathematical-programming-based methods (Nguyen et al. 2019; Guo et al. 2017)
  + Very precise
  − Lacks scalability: solving takes a long time and a large amount of memory
Thanh H. Nguyen, Yongzhao Wang, Arunesh Sinha, and Michael P. Wellman. Deception in Finitely Repeated Security Games. In AAAI-19.
Our Algorithm for Computing PBE
• Temporal-Induced Self-Play (TISP)
  – A framework that can be combined with different learning algorithms
[Diagram: TISP combines backward induction, policy learning, belief-space approximation, and a belief-based representation]
Belief-Based Representation
• Use beliefs instead of histories: π(s, b) instead of π(h)
  – π(attacked target 1 in round t−1, target 2 in round t−2, …) becomes π(0.2 prob. of being attacker type a)
• Helps when histories are long
Backward Induction
• Reverse the training process
  – Train from round L−1, to round L−2, …, down to round 0
  – Use the trained value network V and policy network π of round t+1 when training round t
• Do not sample whole trajectories from round 0 to round L−1; sample one-step trajectories from round t to round t+1
  – A special reset function is used to start episodes at round t
• Different networks for different rounds
• Improves performance without adding training cost
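A toy illustration of this backward-induction structure on a small finite-horizon problem: per-round tables stand in for TISP's per-round value/policy networks, and only one-step lookahead from round t to t+1 is needed. All sizes and rewards are made up.

import numpy as np

L, nS, nA = 4, 3, 2
rng = np.random.default_rng(0)
R = rng.uniform(-1, 1, (L, nS, nA))          # reward[t, s, a]
P = rng.dirichlet(np.ones(nS), (nS, nA))     # transition P[s, a, s']

V = np.zeros((L + 1, nS))                    # V[L] = 0 (terminal)
policy = np.zeros((L, nS), dtype=int)
for t in reversed(range(L)):                 # round L-1 down to 0
    Q = R[t] + P @ V[t + 1]                  # one-step lookahead only
    policy[t] = Q.argmax(axis=1)
    V[t] = Q.max(axis=1)
print(V[0], policy[0])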
Belief Space Approximation
• Sample K belief vectors b₁, …, b_K and train strategies conditioned specifically on the sampled belief and the round
• Query time: interpolate among the sampled beliefs,
  π_t(s, b) = Σ_{k=1}^{K} π_t(s; b_k) w(b, b_k) / Σ_{k=1}^{K} w(b, b_k)
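A minimal sketch of this query-time interpolation. The weighting kernel w (inverse distance here) is an assumption for illustration, as are the sampled beliefs and policies.

import numpy as np

def interpolate_policy(b, sampled_beliefs, sampled_policies, eps=1e-6):
    """sampled_beliefs: (K, |Theta|); sampled_policies: (K, nA)."""
    d = np.linalg.norm(sampled_beliefs - b, axis=1)
    w = 1.0 / (d + eps)                      # closer beliefs weigh more
    return (w[:, None] * sampled_policies).sum(0) / w.sum()

B = np.array([[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]])   # K=3 sampled beliefs
Pi = np.array([[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]])  # trained policies
print(interpolate_policy(np.array([0.4, 0.6]), B, Pi))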
Policy Learning
• Policy gradient, with the update rule adapted to the belief-based representation: the gradient of the round-t value V_t(b, σ, s) sums over types θ ∈ Θ and combines the usual term u_t(b, σ, s, a) ∇ log σ(a | b, s) with a term that differentiates through the round-(t+1) value
• Regret matching:
  σ^{t+1}(s, b, a) = [R^{t+1}(s, b, a)]₊ / Σ_{a′} [R^{t+1}(s, b, a′)]₊
  where R^{t+1}(s, b, a) = Σ_{i=1}^{t} ( u_i(a, s, b) − u_i(σ_i, s, b) ) is the cumulative regret of always deviating to action a
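A minimal regret-matching sketch in isolation, showing the positive-part normalization above; fixed toy payoffs stand in for game-play utilities.

import numpy as np

def regret_matching_policy(cum_regret):
    pos = np.maximum(cum_regret, 0.0)
    if pos.sum() == 0:
        return np.full(len(pos), 1.0 / len(pos))   # uniform fallback
    return pos / pos.sum()

cum_regret = np.zeros(3)
utilities = np.array([0.5, 0.2, -0.1])       # toy u(a) for each action
for _ in range(100):
    sigma = regret_matching_policy(cum_regret)
    expected = sigma @ utilities
    cum_regret += utilities - expected       # regret of deviating to each a
print(regret_matching_policy(cum_regret))    # concentrates on the best action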
Temporal-Induced Self-Play Training
[Figure omitted]

Test-Time Policy Transformation
[Figure omitted]
Experiment: Security Game
• Better scalability than the MP-based method
• Much higher solution quality than other learning-based methods
• TISP can also be used for more complex stochastic Bayesian games
Outline
• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary
Coordination in Games
• Correlated equilibrium (CE)
• Correlation device: sends private signals to the players
  – Signals are sampled from a joint probability distribution over the players' actions and represent recommended behavior
  – Equivalent to having a mediator that privately recommends behavior to the players but does not enforce it
[Example (a chicken-style game): each Nash equilibrium gives total utility 7 + 2 = 2 + 7 = 9; a correlated equilibrium that plays the two asymmetric outcomes with probability 0.25 each and the (6, 6) outcome with probability 0.5 gives total utility 0.25 · 2 · (7 + 2) + 0.5 · (6 + 6) = 10.5]
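A small check of the CE incentive constraints for this example. The full payoff matrix is an assumption, reconstructed to match the totals on the slide: the standard chicken payoffs (0,0), (7,2), (2,7), (6,6).

import numpy as np

u1 = np.array([[0.0, 7.0], [2.0, 6.0]])      # rows: P1's action, cols: P2's
u2 = u1.T                                    # symmetric game
mu = np.array([[0.0, 0.25], [0.25, 0.5]])    # signal distribution

def is_ce(mu, u1, u2, tol=1e-9):
    for i in range(2):                       # P1 recommended action i
        for dev in range(2):                 # candidate deviation
            if mu[i].sum() > 0 and (mu[i] * (u1[dev] - u1[i])).sum() > tol:
                return False
    for j in range(2):                       # P2 recommended action j
        for dev in range(2):
            if mu[:, j].sum() > 0 and \
               (mu[:, j] * (u2[:, dev] - u2[:, j])).sum() > tol:
                return False
    return True

print(is_ce(mu, u1, u2))                     # True: no profitable deviation
print((mu * (u1 + u2)).sum())                # total utility 10.5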
Understand and Compute EFCE
• Extensive-form correlated equilibrium (EFCE):
  – The correlation device selects private signals for the players before the game starts
  – Recommendations are revealed incrementally as the players progress through the game tree
• Computing an EFCE is computationally challenging
Theoretical Results
• Theorem (informal): finding an EFCE in a two-player game can be cast as a bilinear saddle-point problem
  min_{x∈X} max_{y∈Y} xᵀAy
• Conceptual implication: a zero-sum game between the mediator and the deviator
• Computational implication: the bilinear saddle-point formulation opens the way to the plethora of optimization algorithms developed specifically for saddle-point problems
Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks. Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm. In NeurIPS-19.
Algorithms to Compute EFCE
• Algorithm 1: a simple subgradient descent method (see the sketch below)
  – Exploits the bilinear saddle-point formulation
  – Uses structural properties of EFCEs
  – Can lead to better scalability than the prior approach based on linear programming
• Algorithm 2: a regret-minimization-based algorithm
  – Adapts self-play methods based on regret minimization
  – Much more scalable than Algorithm 1
Correlation in Extensive-Form Games: Saddle-Point Formulation and Benchmarks. Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm. In NeurIPS-19.
Efficient Regret Minimization Algorithm for Extensive-Form Correlated Equilibrium. Gabriele Farina, Chun Kai Ling, Fei Fang, Tuomas Sandholm. In NeurIPS-19.
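A hedged sketch of solving min_x max_y xᵀAy by projected gradient descent-ascent with iterate averaging. The EFCE sets X, Y are more complex polytopes; probability simplices are used here only to keep the example self-contained, and this is not the paper's exact algorithm.

import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1
    rho = np.nonzero(u - css / np.arange(1, len(v) + 1) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0)

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, (4, 4))
x, y = np.full(4, 0.25), np.full(4, 0.25)
xbar, ybar, eta, T = np.zeros(4), np.zeros(4), 0.1, 2000
for _ in range(T):
    gx, gy = A @ y, A.T @ x                  # (sub)gradients
    x = project_simplex(x - eta * gx)        # descent on x
    y = project_simplex(y + eta * gy)        # ascent on y
    xbar += x
    ybar += y
xbar, ybar = xbar / T, ybar / T              # averaged iterates converge
print(xbar @ A @ ybar)                       # approximates the saddle value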
Copula Learning for Agent Coordination
[Diagram: a correlation device sends signals to the players in Team 1 and the players in Team 2]
• Design the distribution of the signals, represented by a copula (e.g., a neural network)
• Goal: design the copula, from which we can derive the distribution of the signals, to achieve good enough coordination among the players
• Parameterize the copula with a neural network and learn the parameters so that the players have an incentive to follow the recommended actions
Deep Archimedean Copulas. Chun Kai Ling, Fei Fang, Zico Kolter. In NeurIPS-20.
Archimedean Copulas
• Copulas: multivariate CDFs with marginals uniform on [0, 1]
  C(x₁, …, x_d) = P(U₁ ≤ x₁, …, U_d ≤ x_d)
• Archimedean copulas: specified by a generator φ: [0, ∞) → [0, 1]
  C(x₁, …, x_d) = φ( φ⁻¹(x₁) + … + φ⁻¹(x_d) )
• Commonly used Archimedean copulas are parameterized by a single scalar θ, e.g.,
  Frank: φ_θ(t) = −(1/θ) log( e^(−t) (e^(−θ) − 1) + 1 )
  Clayton: φ_θ(t) = (1 + t)^(−1/θ)
Image from Scherer, Matthias, and Jan-Frederik Mai. Simulating Copulas: Stochastic Models, Sampling Algorithms, and Applications.
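A small sketch evaluating a bivariate Archimedean copula from its generator, C(x₁, x₂) = φ(φ⁻¹(x₁) + φ⁻¹(x₂)), for the Frank and Clayton families above; the inverses are derived by hand from the stated generators.

import numpy as np

def frank_phi(t, th):
    return -np.log(np.exp(-t) * (np.exp(-th) - 1) + 1) / th

def frank_phi_inv(x, th):
    return -np.log((np.exp(-th * x) - 1) / (np.exp(-th) - 1))

def clayton_phi(t, th):
    return (1 + t) ** (-1 / th)

def clayton_phi_inv(x, th):
    return x ** (-th) - 1

def copula(x1, x2, phi, phi_inv, th):
    return phi(phi_inv(x1, th) + phi_inv(x2, th), th)

print(copula(0.3, 0.7, frank_phi, frank_phi_inv, th=2.0))
print(copula(0.3, 0.7, clayton_phi, clayton_phi_inv, th=2.0))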
Our Approach: ACNet
• ACNet learns φ as a convex combination of negative exponentials
• Other probabilistic quantities are obtained by differentiation w.r.t. the inputs
  – Joint density: ∂^d C(x₁, …, x_d) / ∂x₁ ⋯ ∂x_d
  – Conditional densities, conditional distributions, etc.
• Evaluating these quantities requires computing φ⁻¹
  – We provide a wrapper in PyTorch to compute inverses using Newton's method
  – Fully differentiable: derivatives w.r.t. the weights are computed using auto-differentiation
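A sketch of an ACNet-style generator and its Newton inversion, in plain numpy rather than the paper's PyTorch wrapper. The specific form φ(t) = Σᵢ wᵢ e^(−aᵢ t) with wᵢ ≥ 0, Σ wᵢ = 1, aᵢ > 0 and fixed weights is an illustrative assumption; in ACNet the weights come from a neural network.

import numpy as np

w = np.array([0.2, 0.5, 0.3])                # convex-combination weights
a = np.array([0.5, 1.0, 3.0])                # decay rates

def phi(t):
    return np.sum(w * np.exp(-a * t))

def phi_prime(t):
    return np.sum(-a * w * np.exp(-a * t))

def phi_inv(x, iters=50):
    t = 1.0                                  # initial guess
    for _ in range(iters):                   # Newton's method
        t = t - (phi(t) - x) / phi_prime(t)
        t = max(t, 0.0)                      # phi is defined on [0, inf)
    return t

t = phi_inv(0.4)
print(t, phi(t))                             # phi(t) should be ~0.4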
Fitting Real-World Data
[Figure: fitted densities on three datasets (Boston, INTC-MSFT, GOOG-FB): ground truth vs. the best parametric copula vs. ACNet]
We find that ACNet…
• Is able to fit synthetic data generated from common Archimedean copulas
• Outperforms common Archimedean copulas on real-world datasets
• Can give conditional densities/CDFs within a single model
• Can be sampled from efficiently, even in high dimensions
• Can be the basis for computing correlated equilibria in complex games (e.g., with continuous action spaces)
• Future direction: leverage ACNet to compute correlated equilibria for complex games
Outline
• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary
Motivation: Volunteer-Based Food Rescue Platform
• Food waste and food insecurity coexist
  – Up to 40% of food is wasted globally (>1.3 billion tons annually)
  – 1 in 8 people go hungry every day
• Rescue good food!
[Pipeline: post rescue requests → volunteer claims rescue → pick up from donor → deliver to recipient → success!]
In collaboration with 412 Food Rescue (412FR)
Motivation: Volunteer-Based Food Rescue Platform
• Challenges
  – Uncertainty about whether a rescue will be claimed and completed
  – 1-to-many communication (send notifications to volunteers): notification fatigue for volunteers
  – 1-to-1 communication (a human dispatcher intervenes): dispatchers are overstretched
Motivation: Volunteer-Based Food Rescue Platform
• How can AI help?
  – Predictive model of rescue claim status
    • Determine which rescues need special attention from a human dispatcher
  – Data-driven optimization of intervention and notification
    • Avoid excessive notifications, to help retain volunteers
Improving Efficiency of Volunteer-Based Food Rescue Operations. Zheyuan Ryan Shi*, Yiwen Yuan*, Kimberly Lo, Leah Lizarondo, Fei Fang. In IAAI-20.
Predictive Model of Rescue Claim Status
• Features used for the predictive model: timing, weather, location
[Map: percentage of unclaimed rescues by zip code]
• Operational dataset of 412FR from March 2018 to May 2019
Predictive Model of Rescue Claim Status
• Predict whether a rescue will be claimed by volunteers
  – +: claimed (3825); −: not claimed (749)
• A stacking model (with a neural network as one component)
• Training data: May 2018 to Dec 2018; test data: Jan 2019 to May 2019
Predictive Model of Rescue Claim Status
Predict whether a rescue will be claimed by volunteers:

Model             | Accuracy | Precision | Recall | F1   | AUC
Gradient boosting | 0.73     | 0.86      | 0.82   | 0.84 | 0.51
Random forest     | 0.71     | 0.87      | 0.78   | 0.82 | 0.54
Gaussian process  | 0.56     | 0.88      | 0.54   | 0.67 | 0.60
Stacking model    | 0.69     | 1.00*     | 0.64   | 0.78 | 0.81
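A hedged sketch of a stacking classifier along these lines, using scikit-learn on synthetic, similarly imbalanced data. The paper's exact base learners, features, and meta-learner are not specified here; an MLP meta-learner stands in for the "NN" on the slide.

from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# synthetic data, ~16% negative class to mimic the claimed/unclaimed skew
X, y = make_classification(n_samples=600, weights=[0.16], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("gb", GradientBoostingClassifier()),
                ("rf", RandomForestClassifier()),
                ("gp", GaussianProcessClassifier())],
    final_estimator=MLPClassifier(max_iter=1000))
stack.fit(Xtr, ytr)
print(roc_auc_score(yte, stack.predict_proba(Xte)[:, 1]))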
Optimize Intervention and Notification Scheme
Current practice (default INS):
• When a rescue is published, send a 1st-wave notification to volunteers within 5 miles
• After 15 minutes, notify all volunteers
• After 60 minutes, the dispatcher intervenes
• 964 volunteers get the 1st-wave notification on average
• 44.6% of rescues are claimed by volunteers receiving the 1st-wave notification
Optimize Intervention and Notification Scheme
Improve the INS with minor changes:
• When a rescue is published, send a 1st-wave notification to volunteers within y miles
• After x minutes, notify all volunteers (2nd wave)
• After z minutes, the dispatcher intervenes
Task: find the best values of x, y, z that reduce notifications and human intervention while ensuring at least the same claim rate
Optimize Intervention and Notification Scheme
• Optimization problem (see the sketch below):
  min_{x,y,z} v(y) + s(x, y, z) + λ · d(x, y, z)
  s.t. c(x, y, z) ≥ τ (claim rate ≥ threshold)
       (x, y, z) ∈ D
  where
  v(y) = expected # of volunteers receiving the 1st-wave notification
  s(x, y, z) = expected # of volunteers receiving the 2nd-wave notification
  d(x, y, z) = expected # of rescues requiring human intervention
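A minimal sketch of this constrained search over candidate INS parameters. The estimators v_hat, s_hat, d_hat, c_hat stand in for the counterfactual estimates computed from historical rescues (next slide); here they are dummy functions so the sketch runs.

from itertools import product

def v_hat(y): return 200 * y                 # 1st-wave volume (dummy)
def s_hat(x, y, z): return 3000 / x - 50 * y # 2nd-wave volume (dummy)
def d_hat(x, y, z): return 100 - z           # interventions (dummy)
def c_hat(x, y, z): return 0.5 + 0.01 * y - 0.001 * x  # claim rate (dummy)

lam, tau = 10.0, 0.52
domain = product([14, 15, 16], [4.5, 5, 5.5, 6], [45, 50, 55, 60])
feasible = ((x, y, z) for x, y, z in domain if c_hat(x, y, z) >= tau)
best = min(feasible,
           key=lambda p: v_hat(p[1]) + s_hat(*p) + lam * d_hat(*p))
print(best)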
Counterfactual Estimation
• How do we estimate v(y), s(x, y, z), d(x, y, z), and c(x, y, z)?
  – Assume the rescue and volunteer distributions remain the same
  – Estimate the quantities based on historical rescues
  – For each rescue, calculate the counterfactual claim time (CCT) under an INS, using a number of assumptions:
    • Assumption 1: upon receiving the notification, a volunteer takes the same amount of time to respond
    • Assumption 2: the success of an intervention is not affected by the INS
    • …
Branch-and-Bound Algorithm
• Can we avoid enumerating all possible INSs?
• Estimate a lower bound on the objective value of an INS (x, y, z) when only a subset of the parameters is specified
  – Example: x = unspecified, y = 5 miles, z = 45 min
    Lower bound = v(y) + s(x_max, y, z) + λ · d(x_min, y, z)
    (1st-wave, 2nd-wave, and human-intervention terms, each bounded separately)
• Use the lower bound to prioritize and prune INSs through branch and bound (see the sketch below)
[Search tree: branch on z ∈ {45, 50, 55, 60}, then y ∈ {4.5, 5, 5.5, 6}, then x ∈ {14, 15, 16}; compute the LB at internal nodes and the objective value at leaves; prune a node when its LB (e.g., 2600) exceeds the incumbent objective (e.g., 2500)]
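A hedged branch-and-bound sketch over (z, y, x). The bound for a node with x unspecified takes each term at its most favorable x, mirroring the LB example above; the dummy estimators are the same as in the earlier sketch.

import math

def v_hat(y): return 200 * y
def s_hat(x, y, z): return 3000 / x - 50 * y
def d_hat(x, y, z): return 100 - z
def c_hat(x, y, z): return 0.5 + 0.01 * y - 0.001 * x
lam, tau = 10.0, 0.52

XS, YS, ZS = [14, 15, 16], [4.5, 5, 5.5, 6], [45, 50, 55, 60]

def objective(x, y, z):
    return v_hat(y) + s_hat(x, y, z) + lam * d_hat(x, y, z)

def lower_bound(y, z):                      # x still unspecified
    return (v_hat(y) + min(s_hat(x, y, z) for x in XS)
            + lam * min(d_hat(x, y, z) for x in XS))

incumbent, best = math.inf, None
for z in ZS:
    for y in YS:
        if lower_bound(y, z) >= incumbent:  # prune the whole subtree
            continue
        for x in XS:
            if c_hat(x, y, z) >= tau and objective(x, y, z) < incumbent:
                incumbent, best = objective(x, y, z), (x, y, z)
print(best, incumbent)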
Recommended INS
• Optimize on data from May 2018 to Dec 2018
• Test on data from Jan 2019 to May 2019
• Recommend an INS by checking the Pareto frontier
• Deployed since Jan 2020
Rescue-Specific Notification Scheme
• Can we do better than simply changing the parameters of the default INS?
  – Send 1st-wave notifications to the volunteers who are more likely to claim the rescue
• Task: given a rescue, provide a list of k volunteers
• Similar to recommender systems
  – Users ↔ rescue trips
  – Items ↔ volunteers
A Recommender System for Crowdsourcing Food Rescue Platforms. Zheyuan Ryan Shi, Leah Lizarondo, Fei Fang. In WWW-21.
Distribution of Donor and Recipient Organizations
[Maps: donor and recipient organizations across the Pittsburgh area, divided into 16 regions]
Feature Extraction
Features extracted for each rescue-volunteer pair:
• Volunteer's # of completed rescues in the donor's region
• Volunteer's # of completed rescues in the recipient's region
• Volunteer's total # of completed rescues
• Time between the rescue and the volunteer's onboarding
• Distance between the donor and the volunteer
Predictive Model of Rescue-Volunteer Compatibility
• Predict whether a rescue will be claimed by a specific volunteer
  – +: claimed; −: not claimed
• Model: a neural network with 4 hidden layers on the extracted features
• Data: 6757 rescues, 9212 volunteers
• Training data: Mar 2018 to Oct 2019; test data: Nov 2019 to Mar 2020
Rescue-Specific Notification
[Diagram: the model scores every volunteer for a given rescue (e.g., 0.341, 0.105, 0.422, …) and ranks them, yielding an ordered list such as volunteer 8346 (0.663), volunteer 333 (0.635), volunteer 1835 (0.422), …]
Evaluation
• Metric: hit ratio at top k (HR@k): the % of rescues that are claimed by one of the top-k volunteers
• k = 964, to match the default INS
Caveat with Rescue-Specific Notification
• The ML model discovers some frequent volunteers and sends them notifications almost all the time
ML + Online Planning
• Rather than greedily taking the top k volunteers, enforce a constraint that each volunteer receives at most L notifications per day
• For the current rescue r, determine whom to notify by planning with a projected set of future rescues R (see the sketch below)
• Variables:
  – x_{ur} ∈ {0, 1}: whether to send the notification for rescue r to volunteer u
  – p_{ur} ∈ [0, 1]: output of the ML model, the prob. that volunteer u will claim rescue r
  – c_u ∈ {0, …, L}: the number of notifications volunteer u can still receive for the rest of the day
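A hedged sketch of the planning step: pick k volunteers for the current rescue, respecting the per-volunteer daily cap L and discounting volunteers who look critical for projected future rescues. A simple greedy rule stands in for the paper's optimization; all scores and sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n_vol, k, L = 20, 5, 2
cap = np.full(n_vol, L)                  # c_u: remaining notifications
p_current = rng.uniform(0, 1, n_vol)     # p_ur for the current rescue
p_future = rng.uniform(0, 1, n_vol)      # mean p_ur over projected rescues

# prefer volunteers who are good for this rescue but less critical
# for the projected future ones
score = p_current - 0.5 * p_future
eligible = np.flatnonzero(cap > 0)
chosen = eligible[np.argsort(score[eligible])[::-1][:k]]
cap[chosen] -= 1                         # consume daily capacity
print(sorted(chosen.tolist()))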
Online Planning-Based Rescue-Specific Notification
• Avoids over-concentration with L = 5
• HR@k = 0.645, much better than current practice (44.6%)
Outline
• Game Theory and Machine Learning for Multiagent Communication and Coordination
  – Role of informants in security games
  – Strategic signaling in security games
  – Maintaining/breaking information advantage in security games
  – Coordination through correlated signals
  – Coordination through notification in platform-user settings
• Summary
Summary
• Communication and coordination in multi-agent interaction
  – Informants, signals, notifications
  – Historical actions
• ML + GT for communication and coordination
  – Mathematical-programming-based algorithms
  – Learn human behavior
  – Learn equilibrium / optimal strategy
• ML + GT for societal challenges
  – Security, sustainability, food security
Acknowledgment
• Advisors, postdocs, students, and all co-authors!
• Collaborators and partners
• Funding support