

Development of a people mass movement simulation framework based on reinforcement learning

Yanbo Pang^{1,3}, Kota Tsubouchi^{2}, Takahiro Yabe^{3}, Yoshihide Sekimoto^{3}

1. Department of Civil Engineering, the University of Tokyo; 2. Yahoo! Japan Research; 3. IIS, the University of Tokyo

Sekimoto Lab., IIS, the University of Tokyo.

Motivation


• Understanding individual and crowd dynamics in urban environments is critical for numerous fields, such as urban planning, transportation management, and commercial services.

• How to use emerging data sources, such as GPS data stripped of personal information, to model human daily behavioral decision making remains challenging and understudied.

• Our research aims to develop a novel agent-based modeling and simulation framework that replicates and simulates urban dynamics in various situations by learning individuals' decision-making processes from large-scale mobile phone GPS datasets.

Methods

Agent-based modeling and simulation

• Agent-based modeling and simulation (ABMS) is a useful approach for representing individual decision-making processes.

• However, traditional modeling techniques depend heavily on travel-related surveys, which cover limited scenarios and outdated data.

Key idea: let agents directly imitate real human behaviors from their trajectories, and then simulate urban dynamics more realistically.

Figure 1. Agent-based modeling based on Survey Data

Results

Figure 2. Visualization of Simulation Results (Blue: stay, Yellow: walk, Green: vehicle, Red: train)

1. Result Visualization

Visualization of trips generated by the proposed framework and the ground truth. Each dot represents a moving agent, and the dot color represents the agent's transportation mode.
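A minimal sketch of drawing one such frame with matplotlib; the (lon, lat, mode) tuple layout and the function name are assumptions, not the authors' code:

```python
import matplotlib.pyplot as plt

# Color scheme taken from the Fig. 2 caption.
MODE_COLORS = {"stay": "blue", "walk": "yellow", "vehicle": "green", "train": "red"}

def plot_frame(agents):
    """agents: list of (lon, lat, mode) tuples for one simulation time step."""
    for mode, color in MODE_COLORS.items():
        xs = [lon for lon, lat, m in agents if m == mode]
        ys = [lat for lon, lat, m in agents if m == mode]
        plt.scatter(xs, ys, s=2, c=color, label=mode)
    plt.xlabel("longitude")
    plt.ylabel("latitude")
    plt.legend(markerscale=4)
    plt.show()
```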

2. Performance

Fig 4. Traffic Estimation on Weekday
Fig 5. Traffic Estimation on Weekend

As can be seen in Figs. 4 and 5, the simulation results of our proposed method are very close to the ground truth in both the weekday and national holiday experiments.

Figure 3. Scatterplots of Population Distribution between synthetic dataset (y-axis) and test dataset (x-axis) from 6:00 a.m. to 11:00 p.m. in Tokyo

Fig. 3 shows the population distribution for the two study areas from 6:00 a.m. to 11:00 p.m., with the test dataset on the x-axis and the synthetic dataset on the y-axis. The correlation coefficients were higher than 0.8 over all time periods in Tokyo.
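The evaluation behind Fig. 3 amounts to a per-hour correlation between two population grids; a minimal sketch, assuming (n_hours, n_cells) count arrays and the hypothetical function name hourly_correlations:

```python
import numpy as np

def hourly_correlations(synthetic, test):
    """synthetic, test: arrays of shape (n_hours, n_cells) holding population
    counts per spatial cell for each hourly slot (here 6:00-23:00)."""
    # Pearson correlation between synthetic and observed counts, one value per hour.
    return [float(np.corrcoef(synthetic[h], test[h])[0, 1]) for h in range(len(synthetic))]
```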

Under the maximum entropy IRL model [2] used for reward recovery (part 3 below), the probability of a trajectory $\zeta$ given reward weights $\theta$ is

$$p(\zeta \mid \theta) = \frac{\exp\left(\sum_t \theta^{\top} \mathbf{f}(s_t, a_t)\right)}{Z(\theta)} \propto \exp\left(\sum_t \theta^{\top} \mathbf{f}(s_t, a_t)\right)$$
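A minimal sketch of this trajectory model, assuming per-step feature arrays; in [2] the partition function $Z(\theta)$ is computed exactly by dynamic programming, while here it is approximated by a sampled set of candidate trajectories:

```python
import numpy as np

def traj_log_prob(theta, traj, candidates):
    """traj: (T, d) array of per-step features f(s_t, a_t) for one trajectory;
    candidates: list of such arrays, used to approximate Z(theta) by sampling."""
    score = traj.sum(axis=0) @ theta                      # sum_t theta^T f(s_t, a_t)
    log_Z = np.logaddexp.reduce([c.sum(axis=0) @ theta for c in candidates])
    return float(score - log_Z)                           # log p(traj | theta)
```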

1. Data

In this study, we leverage the GPS data collected by Yahoo Japan Corporation through its disaster alert application. The GPS dataset consists of an ID, a timestamp, longitude, and latitude for 1 million users in Japan.
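A sketch of this record layout and the grouping into per-user trajectories; the field and function names are assumptions, not the published schema:

```python
from dataclasses import dataclass

@dataclass
class GPSRecord:
    user_id: str      # anonymized ID (no personal information)
    timestamp: int    # observation time, e.g. Unix seconds
    longitude: float
    latitude: float

def to_trajectories(records):
    """Group raw GPS points into per-user, time-ordered trajectories."""
    trajectories = {}
    for r in sorted(records, key=lambda r: r.timestamp):
        trajectories.setdefault(r.user_id, []).append(r)
    return trajectories
```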

[Figure 1 labels] Survey → Individual Attributes (Gender, Age, Occupation, Income, Car ownership, …) → Agent

2. Framework

As shown in Fig. 1, the end-to-end framework consists of three parts: data processing, agent modeling with parameter training, and agent-based travel micro-simulation.
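A structural sketch of the three stages; the stage bodies below are placeholders, not the authors' implementation:

```python
def process_data(raw_records):
    """Stage 1 - data processing: clean GPS points and segment them into trajectories."""
    return []  # placeholder: list of per-user trajectories

def train_agent(trajectories):
    """Stage 2 - agent modeling: recover a reward via IRL, then fit a policy with PPO."""
    return lambda state: ("home", "stay")  # placeholder trained policy

def micro_simulate(policy, n_agents=1000):
    """Stage 3 - travel micro-simulation: roll the learned policy forward
    for a synthetic population of agents."""
    return [policy(None) for _ in range(n_agents)]

trips = micro_simulate(train_agent(process_data([])))
```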

States: $\mathcal{S}$ is the state space that an agent can visit.

Actions: An action is denoted as $a \in \mathcal{A}$ and indicates the trip destination and transport mode: $a = [\text{destination}, \text{mode}]$.

Reward function: In this study, the reward is assumed to be a linear combination of the features $\mathbf{f}(s, a)$ with a weight vector $\theta$, i.e. $r(s, a) = \theta^{\top} \mathbf{f}(s, a)$.
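These pieces in code form, as a minimal sketch; the mode list comes from the Fig. 2 caption, and the feature vector contents are left unspecified, as in the poster:

```python
import numpy as np

MODES = ["stay", "walk", "vehicle", "train"]

def make_action(destination, mode):
    """An action a = [destination, mode], as defined above."""
    assert mode in MODES
    return (destination, mode)

def reward(theta, f_sa):
    """Linear reward r(s, a) = theta^T f(s, a) for a feature vector f_sa of shape (d,)."""
    return float(np.dot(theta, f_sa))
```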

3. Reinforcement Learning Agents for People Flow

The Markov decision process (MDP) is used to model the sequential decision process of a rational agent. We treat the people observed in the GPS dataset as demonstrations of an optimal policy and recover its reward function with maximum entropy inverse reinforcement learning [2]. The agent's behavior policy is then solved with Proximal Policy Optimization (PPO) [1].
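For reference, the clipped surrogate objective that PPO [1] maximizes, in a minimal NumPy sketch (not the authors' training code):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """ratio: pi_new(a_t|s_t) / pi_old(a_t|s_t) per sample; advantage: estimated A_t.
    Returns the batch-mean L^CLIP from [1], which gradient ascent maximizes."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))
```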

References

[1] Schulman, John, et al. "Proximal Policy Optimization Algorithms." arXiv preprint arXiv:1707.06347 (2017).

[2] Ziebart, Brian D., et al. "Maximum Entropy Inverse Reinforcement Learning." Proceedings of AAAI, Vol. 8, 2008.