jiayi weng - tuixue.online · jiayi weng cst department, tsinghua university 13 / 18. bio research...

18
Bio Research Experience (RL) Future Direction Jiayi Weng CST Department, Tsinghua University February 29, 2020 Jiayi Weng CST Department, Tsinghua University 1 / 18

Upload: others

Post on 11-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Jiayi Weng

CST Department, Tsinghua University

February 29, 2020

Jiayi Weng CST Department, Tsinghua University 1 / 18

Page 2: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

1 Bio

2 Research Experience (RL)

3 Future Direction

Jiayi Weng CST Department, Tsinghua University 2 / 18

Page 3: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Jiayi Weng• B.Eng. in Computer Science, Tsinghua University, 2016 -

2020.GPA 3.75/4.00 (Rank 17/158)

• Research Assistant to Prof. Yoshua Bengio, MilaJul. 2019 - now, topic: NLP, RL, Neural-Symbolic Reasoning

• Research Assistant to Prof. Jun Zhu, Tsinghua UniversityMar. 2018 - now, topic: RL

• Research Assistant to Dr. Hongwei Qin, Sensetime Inc.Mar. 2018 - Jun. 2018, topic: CV (HDR+, denoising)

• Research Assistant to Prof. Hailong Yao, Tsinghua UniversityMay 2017 - Oct. 2017, topic: VLSI (escape routing)

• GitHub 1000+ personal stars, 600+ followers

Jiayi Weng CST Department, Tsinghua University 3 / 18

Page 4: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

1 Bio

2 Research Experience (RL)VizDoom AI Competition 2018Relational Markov Decision ProcessRule-Transformer

3 Future Direction

Jiayi Weng CST Department, Tsinghua University 4 / 18

Page 5: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

VizDoom AI Competition 2018

1 Bio

2 Research Experience (RL)VizDoom AI Competition 2018Relational Markov Decision ProcessRule-Transformer

3 Future Direction

Jiayi Weng CST Department, Tsinghua University 5 / 18

Page 6: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

VizDoom AI Competition 2018

birth place navigation road exit door

(a) Navigation

(d) Final goal: open exit door

(b) Resources: weapon and health bonus

(c) Shoot enemies

Figure 1: Example of single-player scenarios in VizDoom.

Jiayi Weng CST Department, Tsinghua University 6 / 18

Page 7: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

VizDoom AI Competition 2018

Challenges & Solutions

1 Visual understanding from raw images⇒ Set up an object detection system (YOLO-v31) aiming to

efficiently identify the obstacles and enemies.2 Visual navigation in the 3D environment

⇒ Incorporated the depth signal into the navigation network,which facilitated the agent to plot its paths. (+SNAIL2)

3 Sparse and delayed reward for the agent⇒ Formulated the problem within a hierarchical reinforcement

learning framework by dividing the challenge into subtasks:navigation, attack, tool, and resources.

1Joseph Redmon et al. “YOLOv3: An Incremental Improvement”. In: CoRRabs/1804.02767 (2018). arXiv: 1804.02767.

2Nikhil Mishra et al. “A Simple Neural Attentive Meta-Learner”. In: 6thInternational Conference on Learning Representations, ICLR 2018, Vancouver,BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. 2018.

Jiayi Weng CST Department, Tsinghua University 7 / 18

Page 8: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

VizDoom AI Competition 2018

VizDoom Competition Result3

• All teams submitted an agent to finish the game on tenunseen and more difficult maps.

• Our method achieved first place in both public-rank andprivate-rank leaderboards.

Team Ours DoomNet VIPLAB ddangeloTotal Time (min) 25.34 29.86 31.54 37.33# of Best Record 8 4 3 4

Table 1: VDAIC 2018 Track(1) Competition Result (top 4 team).

3Detailed result can be found athttp://vizdoom.cs.put.edu.pl/competitions/vdaic-2018-cig/results.

Jiayi Weng CST Department, Tsinghua University 8 / 18

Page 9: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Relational Markov Decision Process

1 Bio

2 Research Experience (RL)VizDoom AI Competition 2018Relational Markov Decision ProcessRule-Transformer

3 Future Direction

Jiayi Weng CST Department, Tsinghua University 9 / 18

Page 10: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Relational Markov Decision Process

We aim to accelerate the deep reinforcement learning procedure bystate abstraction (using first-order logic) and reward shaping.

DRL

Our Framework

Abstraction

Planning

Raw Inputs  Entries  Predicates 

Optimal Value Function Agent 

Environemnt 

Original Rewards 

Potential­based Reward Shaping 

Figure 2: The working procedure of RMDP. Generate the credits forstate-action pairs of the deep reinforcement learning in three stages:abstraction, planning, and potential-based reward shaping.

Jiayi Weng CST Department, Tsinghua University 10 / 18

Page 11: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Relational Markov Decision Process

• Our method can accelerate the convergence of value-based RLmethod.

• RMDP is more robust under harder games.

0 20 40 60 80 100

500

1000

1500

2000

2500

3000 Our frameworkDRL

(a) Easy Map

0 20 40 60 80 100400

600

800

1000

1200

1400

1600 Our frameworkDRL

(b) Hard Map

Figure 3: Test RMDP on Health Gathering scenarios. Vertical axisindicates the accumulated reward per episode through training phase.

Jiayi Weng CST Department, Tsinghua University 11 / 18

Page 12: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Rule-Transformer

1 Bio

2 Research Experience (RL)VizDoom AI Competition 2018Relational Markov Decision ProcessRule-Transformer

3 Future Direction

Jiayi Weng CST Department, Tsinghua University 12 / 18

Page 13: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Rule-Transformer

• We aimed to develop a system with Consciousness Prior4,which integrated the advantage of both neural and symbolicmodels.

• To make Transformers more like reasoning with rules, insteadof a fixed rule per layer, allow to also pick a rule, at eachlevel, and share that set across levels.

Figure 4: The Consciousness Prior, credit5.

4Yoshua Bengio. “The Consciousness Prior”. In: CoRR abs/1709.08568(2017). arXiv: 1709.08568.

5Yoshua Bengio. From System 1 Deep Learning to System 2 Deep Learning.2019.

Jiayi Weng CST Department, Tsinghua University 13 / 18

Page 14: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Rule-Transformer

Figure 5: Architecture ofRule-Transformer.

• Allow picking a rule at each level• Sharing knowledge across the level• Update different variables with

different functions• The rule could potentially be

dynamically defined

Jiayi Weng CST Department, Tsinghua University 14 / 18

Page 15: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Rule-Transformer

What we have tried and are trying

x Machine Translationx Language Modeling

= GLUE Benchmark= Multitask Imitation Learningx Mathematical Datasetv LeetCode-style algorithmic problems? Brainfuck programming language explainer (ongoing)

Jiayi Weng CST Department, Tsinghua University 15 / 18

Page 16: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

1 Bio

2 Research Experience (RL)

3 Future Direction

Jiayi Weng CST Department, Tsinghua University 16 / 18

Page 17: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

• Generally, I would like to continue my research in the field ofreinforcement learning.

• I am interested in addressing the challenges of samplecomplexity (incorporating human-prior knowledge for betterdisentanglement) / robustness and generalization (combiningwith meta-learning, game-theory).

• I am also keen to work on the real-world application ofreinforcement learning: learning-based control system,self-driving cars, etc.

Jiayi Weng CST Department, Tsinghua University 17 / 18

Page 18: Jiayi Weng - tuixue.online · Jiayi Weng CST Department, Tsinghua University 13 / 18. Bio Research Experience (RL) Future Direction Rule-Transformer Figure 5: Architecture of Rule-Transformer

Bio Research Experience (RL) Future Direction

Thank you!

Jiayi Weng CST Department, Tsinghua University 18 / 18