Machine Learning Techniques for Mobile Computing
Tutorial at the 2017 KICS Winter Conference (January 19, 2017)
Professor Joongheon Kim (School of Computer Science and Engineering, Chung-Ang University)
https://sites.google.com/site/joongheon
Contacts: [email protected], [email protected]
Outline
• Stochastic Optimization and Machine Learning Algorithms
• Stochastic Network Optimization (Lyapunov Optimization Framework)
• Markov Decision Process (MDP) with the Example of Mobile Robot Platforms
• Artificial Neural Networks
Stochastic Optimization and Machine Learning Algorithms (Part 1)
Lyapunov Optimization Framework
Markov Decision Process
Artificial Neural Networks
Outline
• Stochastic Optimization and Machine Learning Algorithms
• Stochastic Network Optimization (Lyapunov Optimization Framework)
• Motivation
• Theory
• Applications
• Markov Decision Process (MDP) with the Example of Mobile Robot Platforms
• Artificial Neural Networks
Stochastic Network Optimization (Lyapunov Optimization Framework)
• Motivation
• Theory
• Introduction to Queues (Queue Dynamics)
• Basic Queueing Theory: A Quick Review
• Lyapunov Drift and Lyapunov Optimization
• Applications
• Introduction
• Core Scheduling with 3-Core CPU
• Buffer-Stable Adaptive Per-Module Power Allocation for Energy-Efficient 5G Platforms
• Quality-Aware Streaming and Scheduling for Device-to-Device Video Delivery
Optimization with Queueing
• Matching with Queues
[Figure: queues Q1[t], Q2[t], Q3[t] (left nodes a1, a2, a3) matched to right nodes b1, b2 with weights p11 = 3, p12 = 0, p21 = 0, p22 = 1, p31 = 2, p32 = 0.]
Queue Dynamics
• [Definition (Queue Dynamics)] A single-server discrete-time queueing system evolves as
Q[t + 1] = max( Q[t] − b[t], 0 ) + a[t], for all t = 0, 1, 2, ⋯
• Alternative Form:
Q[t + 1] = Q[t] − b̃[t] + a[t], for all t = 0, 1, 2, ⋯
where b̃[t] = min( Q[t], b[t] ) is the actual number of departures in slot t.
[Figure: a queue Q_i[t] fed by an arrival process a_i[t] and drained through the resources by a departure (service) process b_i[t].]
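The queue update above can be sketched in a few lines; the arrival and service sequences below are illustrative assumptions, not values from the tutorial.

```python
# A minimal simulation of the queue dynamics Q[t+1] = max(Q[t] - b[t], 0) + a[t].

def queue_step(Q, a, b):
    """Advance the queue backlog by one slot."""
    return max(Q - b, 0) + a

arrivals = [3, 0, 5, 2, 1]   # a[t]: tasks arriving in each slot (assumed)
services = [2, 2, 2, 2, 2]   # b[t]: service offered in each slot (assumed)

Q = 0
for a, b in zip(arrivals, services):
    Q = queue_step(Q, a, b)
print(Q)  # backlog after 5 slots: 4
```

Note that the actual departures in a slot are min(Q[t], b[t]), which is exactly what the alternative form above captures with b̃[t].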
Queue Dynamics
• [Definition (Time-Average Rate)] An arrival process {a[t]}_{t=0}^{∞} and a service process {b[t]}_{t=0}^{∞} have time-average rates ā and b̄, respectively, if
lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} a[τ] = ā and lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} b[τ] = b̄
• [Definition (Rate Stable)] A queue Q[t] is rate stable if
lim_{t→∞} Q[t] / t = 0
• [Definition (Mean Rate Stable)] A queue Q[t] is mean rate stable if
lim_{t→∞} E[Q[t]] / t = 0
(These are weaker notions than requiring lim_{t→∞} Q[t] < ∞ or lim_{t→∞} E[Q[t]] < ∞.)
Brief Introduction
• Basic Assumption for Stability: 𝜇 > 𝜆
• Queues
• B/B/1 Queue: Arrival with Bernoulli Distribution and Departure with Bernoulli Distribution
• B/G/1 Queue: Arrival with Bernoulli Distribution and Departure with General Distribution
• G/G/1 Queue: Arrival with General Distribution and Departure with General Distribution
[Figure: a queue Q[t] with arrival rate λ and departure/service rate μ.]
• Queue size and queueing delay (by Little's theorem) for the B/B/1 queue:
Q̄ = λ(1 − λ) / (μ − λ),  W = Q̄ / λ = (1 − λ) / (μ − λ)
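The two B/B/1 formulas above can be checked numerically; the rates λ = 0.3, μ = 0.5 below are illustrative values satisfying the stability assumption μ > λ.

```python
# B/B/1 time-average queue size and delay (valid when 0 < lam < mu <= 1).

def bb1_queue_size(lam, mu):
    # Q = lam * (1 - lam) / (mu - lam)
    return lam * (1 - lam) / (mu - lam)

def bb1_delay(lam, mu):
    # Little's theorem: W = Q / lam = (1 - lam) / (mu - lam)
    return bb1_queue_size(lam, mu) / lam

print(bb1_queue_size(0.3, 0.5))  # ~1.05
print(bb1_delay(0.3, 0.5))       # ~3.5
```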
Lyapunov Drift and Optimization
• Time-Average Optimization under Queue Stability
minimize: ȳ0
subject to:
• [C1] ȳl ≤ 0, ∀l ∈ {1, ⋯, L}
• [C2] α[t] ∈ A_ω[t], ∀t ∈ {0, 1, ⋯}
• [C3] queue stability
where
ȳ0 = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} y0[τ]
ȳl = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} yl[τ], ∀l ∈ {1, ⋯, L}
Notation:
• ω[t]: random event at slot t
• α[t]: control action, chosen after observing ω[t] at slot t
• A_ω[t]: action space associated with ω[t]
Lyapunov Drift and Optimization
• [Definition (Quadratic Lyapunov Function), or simply (Lyapunov Function)]
L(Q[t]) = (1/2) Σ_{k=1}^{K} Q_k²[t], where Q[t] = ( Q1[t], ⋯, QK[t] )
• [Definition (One-Slot Conditional Lyapunov Drift), or simply (Lyapunov Drift)]
Δ(Q[t]) = E[ L(Q[t + 1]) − L(Q[t]) | Q[t] ]
i.e., the expected change in the Lyapunov function over one slot (t → t + 1), given that the current state in slot t is Q[t].
Lyapunov Drift and Optimization
• [Theorem (Lyapunov Optimization)]
Suppose that E[L(Q[t])] < ∞ and that there exist constants B > 0, V ≥ 0, ϵ ≥ 0 and a value y* such that, for all slots τ ∈ {0, 1, ⋯} and all possible values of Q[τ]:
Δ(Q[τ]) + V · E[ y0[τ] | Q[τ] ] ≤ B + V · y* − ϵ Σ_{k=1}^{K} Q_k[τ]
Then all queues are mean rate stable and the time-average penalty satisfies ȳ0 ≤ y* + B/V.
• [Theorem (Optimization with Lyapunov Drift)]
Δ(Q[τ]) + V · E[ y0[τ] | Q[τ] ] ≤ B + V · E[ y0[τ] | Q[τ] ] + Σ_{k=1}^{K} Q_k[τ] · E[ a_k[τ] − b_k[τ] | Q[τ] ]
Lyapunov Drift and Optimization
• [Theorem (Lyapunov Drift)]
Consider the quadratic Lyapunov function, and assume E[L(Q[t])] < ∞. Suppose there exist constants B > 0, ϵ ≥ 0 such that the following drift condition holds for all τ ∈ {0, 1, ⋯} and all possible Q[τ]:
Δ(Q[τ]) ≤ B − ϵ Σ_{k=1}^{K} Q_k[τ]
• Then:
• If ϵ ≥ 0, all queues Q_k[τ] are mean rate stable.
Lyapunov Drift and Optimization
• [Theorem (Optimization with Lyapunov Drift)]
Δ(Q[τ]) + V · E[ y0[τ] | Q[τ] ] ≤ B + V · E[ y0[τ] | Q[τ] ] + Σ_{k=1}^{K} Q_k[τ] · E[ a_k[τ] − b_k[τ] | Q[τ] ]
Minimizing the right-hand side of this bound in every slot yields the per-slot rule:
minimize: V · y0( α[t], ω[t] ) + Σ_{k=1}^{K} Q_k[t] · ( a_k( α[t], ω[t] ) − b_k( α[t], ω[t] ) )
• Open questions at this point: Is the resulting system stable? Is the minimization separable? How should V be chosen?
Introduction (For better understanding…)
• Basic Form (Separable)
minimize: V · y0( α[t], ω[t] ) + Σ_{k=1}^{K} Q_k[t] · ( a_k( α[t], ω[t] ) − b_k( α[t], ω[t] ) )
For a single queue this reduces to:
minimize: V · y0( α[t] ) + Q[t] · ( a( α[t] ) − b( α[t] ) )
Here V · y0(·) is the objective-function term, α[t] is the control, and Q[t] is the observation.
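As a sketch, the separable rule is a one-line per-slot decision. The action set and the y0, a, b maps below are illustrative placeholders, not values from the tutorial.

```python
# Per-slot drift-plus-penalty decision for the separable single-queue form:
# pick alpha minimizing V*y0(alpha) + Q[t] * (a(alpha) - b(alpha)).

def dpp_action(Q, V, actions, y0, a, b):
    return min(actions, key=lambda alpha: V * y0(alpha) + Q * (a(alpha) - b(alpha)))

# Toy maps (assumed): larger alpha costs more but serves more; arrivals uncontrolled.
actions = [0, 1, 2]
y0 = lambda alpha: alpha          # per-slot objective (e.g., power)
a  = lambda alpha: 4              # arrivals do not depend on the action
b  = lambda alpha: 3 * alpha      # service grows with the action

print(dpp_action(0, 5, actions, y0, a, b))    # empty queue: cheapest action (0)
print(dpp_action(100, 5, actions, y0, a, b))  # big backlog: maximum service (2)
```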
Outline
• Motivation
• Theory
• Introduction to Queues (Queue Dynamics)
• Basic Queueing Theory: A Quick Review
• Lyapunov Drift and Lyapunov Optimization
• Applications
• Introduction
• Example #1: Core Scheduling with 3-Core CPU
• Example #2: Buffer-Stable Adaptive Per-Module Power Allocation
• Example #3: Quality-Aware Streaming and Scheduling for Device-to-Device Video Delivery
Example #1: Core Scheduling with 3-Core CPU
• Tradeoff
• More cores allocated → more power consumption (−); more queue departures (good for stability) (+)
• Fewer cores allocated → less power consumption (+); fewer queue departures (bad for stability) (−)
• Objective of Optimization with the Tradeoff
• We want to minimize time-average power consumption subject to queue stability
[Figure: a task queue Q[t] fed by an arrival process a[t] and drained by a departure process b[t] via processing with multiple cores.]
Example #1: Core Scheduling with 3-Core CPU
• Optimization with Lyapunov Drift
• Minimize: V · P( α[t] ) + Q[t] · ( a( α[t] ) − b( α[t] ) )
• V: tradeoff parameter
• α[t]: core-selection action at t (in this three-core case, α[t] ∈ {0, 1, 2, 3})
• Q[t]: queue backlog (size) at t
• a( α[t] ): arrival process under the chosen action at t; here the arrivals are random and do not depend on the action, so this term can be ignored
• b( α[t] ): departure process under the chosen action at t
• P( α[t] ): power consumption when the core selection is α[t]
• Final Form: Minimize V · P( α[t] ) − Q[t] · b( α[t] )
• Intuition
• If the queue is empty (Q[t] = 0), we only minimize V · P( α[t] ): no cores need to be allocated.
• If the queue is nearly infinite (Q[t] ≈ ∞), we effectively minimize −b( α[t] ) (i.e., maximize b( α[t] )): all cores should be allocated.
Example #1: Core Scheduling with 3-Core CPU
• Example-Based Understanding
• Minimize: F(t) = V · P( α[t] ) − Q[t] · b( α[t] )
• V = 10; // we weight the objective function ten times more than queue stability
• α[t] ∈ {0, 1, 2, 3}; // we can allocate 1, 2, or 3 cores, or turn the CPU off
• b( α[t] ):
• b( α[t] = 0 ) = 0; // if no cores are selected, there is no departure process
• b( α[t] = 1 ) = 6; // if 1 core is selected, 6 tasks are processed
• b( α[t] = 2 ) = 11; // if 2 cores are selected, 11 tasks are processed
• b( α[t] = 3 ) = 15; // if 3 cores are selected, 15 tasks are processed
• P( α[t] ):
• P( α[t] = 0 ) = 0; // if no cores are selected, there is no power consumption
• P( α[t] = 1 ) = 3; // if 1 core is selected, 3 units of power are consumed
• P( α[t] = 2 ) = 5; // if 2 cores are selected, 5 units of power are consumed
• P( α[t] = 3 ) = 6; // if 3 cores are selected, 6 units of power are consumed
Example #1: Core Scheduling with 3-Core CPU
• Tradeoff Table
• Actions: F(t) = V · P( α[t] ) − Q[t] · b( α[t] ), where V = 10

α[t]        b( α[t] )   P( α[t] )
α[t] = 0        0           0
α[t] = 1        6           3
α[t] = 2       11           5
α[t] = 3       15           6

                          t = 0               t = 1                t = 2               t = 3
Queue backlog, Q[t]         0                   7                    3                   5
Arrival (random), a[t]      7                   3                    2                   2
F(t), α[t] = 0       10·0 − 0·0 = 0      10·0 − 7·0 = 0       10·0 − 3·0 = 0      10·0 − 5·0 = 0
F(t), α[t] = 1       10·3 − 0·6 = 30     10·3 − 7·6 = −12     10·3 − 3·6 = 12     10·3 − 5·6 = 0
F(t), α[t] = 2       10·5 − 0·11 = 50    10·5 − 7·11 = −27    10·5 − 3·11 = 17    10·5 − 5·11 = −5
F(t), α[t] = 3       10·6 − 0·15 = 60    10·6 − 7·15 = −45    10·6 − 3·15 = 15    10·6 − 5·15 = −15
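The tradeoff table can be reproduced in a few lines, using exactly the service and power values from the example above.

```python
# F(t) = V*P(alpha) - Q(t)*b(alpha) with V = 10, from the 3-core example.

V = 10
b = {0: 0, 1: 6, 2: 11, 3: 15}   # tasks processed per slot
P = {0: 0, 1: 3, 2: 5, 3: 6}     # power units consumed

def F(alpha, Q):
    return V * P[alpha] - Q * b[alpha]

def best_action(Q):
    return min(b, key=lambda alpha: F(alpha, Q))

# Queue backlogs at t = 0..3; matches the table: actions 0, 3, 0, 3
for t, Q in enumerate([0, 7, 3, 5]):
    print(t, best_action(Q), F(best_action(Q), Q))
```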
Example #2: Buffer-Stable Adaptive Per-Module Power Allocation
• Tradeoff
• More MAAs powered on → more power consumption (−); more queue departures (good for stability) (+)
• Fewer MAAs powered on → less power consumption (+); fewer queue departures (bad for stability) (−)
• Objective of Optimization with the Tradeoff
• We want to minimize time-average power consumption subject to queue stability
Example #2: Buffer-Stable Adaptive Per-Module Power Allocation
• Optimization with Lyapunov Drift
• Minimize: V · P( α[t] ) + Q[t] · ( a( α[t] ) − b( α[t] ) )
• V: tradeoff parameter
• α[t]: MAA power-on/off action at t (α[t] ∈ {0, 1, ⋯, 8}), i.e., the number of powered-on MAAs
• Q[t]: queue backlog (size) at t
• a( α[t] ): arrival process under the chosen action at t; here the arrivals are random and do not depend on the action, so this term can be ignored
• b( α[t] ): departure process under the chosen action at t
• P( α[t] ): power consumption when the MAA power-on/off decision is α[t]
• Final Form: Minimize V · P( α[t] ) − Q[t] · b( α[t] )
• Intuition
• If the queue is empty (Q[t] = 0), we only minimize V · P( α[t] ): no MAAs need to be powered on.
• If the queue is nearly infinite (Q[t] ≈ ∞), we effectively minimize −b( α[t] ) (i.e., maximize b( α[t] )): all MAAs should be powered on.
Example #2: Buffer-Stable Adaptive Per-Module Power Allocation
• Plotting Result
[Figure: simulation results showing the V tradeoff: more queue stability as V decreases, more energy efficiency as V increases.]
Example #3: Quality-Aware Streaming and Scheduling
• Tradeoff
• High compression on chunks
• Low quality on chunks (-);
• More stabilization on queues (+)
• Less compression on chunks
• High quality on chunks (+);
• Less stabilization on queues (-)
• Objective of Optimization with the Tradeoff
• We want to maximize time-average video quality subject to queue stability
Example #3: Quality-Aware Streaming and Scheduling
• Optimization with Lyapunov Drift
• Maximize: V · Quality( α[t] ) − Q[t] · ( a( α[t] ) − b( α[t] ) )
• V: tradeoff parameter
• α[t]: compression action at t (α[t] ∈ {1, 2, 3}), i.e., three different levels of compression
• Q[t]: queue backlog (size) at t
• a( α[t] ): arrival process under the chosen α[t] at t: this is the size of the chunks (which depends on the chosen compression level)
• b( α[t] ): departure process at t; here the system transmits as many packets as the network allows, independent of the action, so this term can be ignored
• Final Form: Maximize V · Quality( α[t] ) − Q[t] · a( α[t] )
• Intuition
• If the queue is empty (Q[t] = 0), we only maximize V · Quality( α[t] ): compress less for better streaming quality.
• If the queue is nearly infinite (Q[t] ≈ ∞), we effectively minimize a( α[t] ): compress heavily for queue stability.
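A minimal sketch of this per-slot rule; the quality scores and chunk sizes per compression level below are illustrative assumptions, not values from the tutorial.

```python
# Per-slot rule for example #3: maximize V*Quality(alpha) - Q(t)*a(alpha).

V = 10
quality = {1: 10, 2: 6, 3: 3}   # level 1 = least compression, best quality (assumed)
size    = {1: 8, 2: 4, 3: 1}    # chunk size pushed into the queue (assumed)

def best_level(Q):
    return max(quality, key=lambda alpha: V * quality[alpha] - Q * size[alpha])

print(best_level(0))     # empty queue: least compression (level 1)
print(best_level(1000))  # huge backlog: heaviest compression (level 3)
```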
Stochastic Optimization and Machine Learning Algorithms (Part 2)
Lyapunov Optimization Framework
Markov Decision Process
Artificial Neural Networks
Markov Decision Process
• Markov Decision Process (MDP) Components: <𝑆, 𝐴, 𝑅, 𝑇, 𝛾>
• 𝑆: Set of states
• 𝐴: Set of actions
• 𝑅: Reward function
• 𝑇: Transition function
• 𝛾: Discount factor
How can we use an MDP to model an agent in a maze?
Markov Decision Process
• 𝑺: Set of states
• For the maze: the location (x, y) if the maze is a 2D grid
• s0: starting state; s: current state; s′: next state; s_t: state at time t
Markov Decision Process
• 𝑨: Set of actions
• For the maze: move up, down, left, or right (each action moves the agent s → s′)
Markov Decision Process
• 𝑹: Reward function
• How good was the chosen action? r = R(s, a, s′)
• e.g., −1 for moving (battery used); +1 for a jewel? +100 for the exit?
Markov Decision Process
• 𝑻: Transition function
• Where is the robot's new location? s′ ~ T(s′ | s, a)
• The transition may be stochastic
Markov Decision Process
• 𝜸: Discount factor
• How much is future reward worth? 0 ≤ γ ≤ 1
• γ ≈ 0: future reward is worth almost nothing (immediate reward is preferred)
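Putting the five components together, a toy instance of the tuple <S, A, R, T, γ> for a 2x2 maze might look like this; the grid size, reward values, and deterministic transitions are illustrative assumptions.

```python
# A concrete toy instance of <S, A, R, T, gamma> for a tiny 2x2 maze.

S = [(x, y) for x in range(2) for y in range(2)]   # states: grid locations
A = ["up", "down", "left", "right"]                # actions
gamma = 0.9                                        # discount factor

def R(s, a, s_next):
    """Reward: battery cost for moving, bonus for reaching the exit (1, 1)."""
    return 100.0 if s_next == (1, 1) else -1.0

def T(s, a):
    """Deterministic next state; moves off the grid are clamped at the walls."""
    dx, dy = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}[a]
    return (min(max(s[0] + dx, 0), 1), min(max(s[1] + dy, 0), 1))

print(T((0, 0), "right"), R((0, 0), "up", (0, 1)))  # (1, 0) -1.0
```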
Outline
• Markov Decision Process (MDP)
• Markov Property
• Policy and Return
• Value Functions (V, Q)
• Solving MDP
• Planning
• Reinforcement Learning (Value-based)
• Monte-Carlo Method
• TD Method (Q-Learning)
• Reinforcement Learning (Policy-based): advanced topic (out of scope)
Markov Property
• Does s_{t+1} depend on the whole history s0, s1, ⋯, s_{t−1}? No: given s_t (and a_t), it does not.
• Memoryless!
• The future depends only on the present
• The current state is a sufficient statistic of the agent's history
• No need to remember the agent's history
• s_{t+1} depends only on s_t and a_t
• r_t depends only on s_t and a_t
Policy and Return
• Policy
• π: S → A
• Maps states to actions
• Gives an action for every state
• Return
• Discounted sum of rewards
• Could be undiscounted over a finite horizon
R_t = Σ_{k=0}^{∞} γ^k · r_{t+k}
Our goal: find the policy π that maximizes the expected return!
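The return is easy to compute for a finite episode; the reward sequence below (battery costs followed by an exit bonus) is an illustrative assumption.

```python
# The return R_t as a discounted sum of rewards over a finite episode.

def discounted_return(rewards, gamma):
    # R_t = sum_k gamma**k * r_{t+k}
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(discounted_return([-1, -1, -1, 100], 0.9))  # -1 - 0.9 - 0.81 + 72.9 = ~70.19
```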
Value Functions (V, Q)
• State Value Function (𝑉)
• Expected return of starting at state 𝑠 and following policy 𝜋
• How much return do I expect starting from state 𝑠?
• Action Value Function (𝑄)
• Expected return of starting at state 𝑠, taking action 𝑎, and then following policy 𝜋
• How much return do I expect starting from state 𝑠 and taking action 𝑎?
V^π(s) = E_π[ R_t | s_t = s ] = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k} | s_t = s ]
Q^π(s, a) = E_π[ R_t | s_t = s, a_t = a ] = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k} | s_t = s, a_t = a ]
Planning
• Again, our goal is to find the optimal policy
• If 𝑇 𝑠′ 𝑠, 𝑎 and 𝑅 𝑠, 𝑎, 𝑠′ are known, this is a planning problem.
• We can use dynamic programming to find the optimal policy.
• Keywords: Bellman equation, value iteration, policy iteration
π*(s) = argmax_π R^π(s), i.e., the policy maximizing the expected return from every state
Planning
• Bellman Equation:
∀s ∈ S: V*(s) = max_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V*(s′) ]
• Value Iteration:
∀s ∈ S: V_{i+1}(s) ← max_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V_i(s′) ]
• Policy Iteration
• Policy Evaluation:
∀s ∈ S: V_{i+1}^{π_k}(s) ← Σ_{s′} T(s, π_k(s), s′) [ R(s, π_k(s), s′) + γ V_i^{π_k}(s′) ]
• Policy Improvement:
π_{k+1}(s) = argmax_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V^{π_k}(s′) ]
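A minimal value-iteration sketch on a toy two-state MDP; the transition table (probabilities, next states, rewards) is an illustrative assumption, not the maze above.

```python
# Value iteration: V_{i+1}(s) = max_a sum_s' T(s,a,s') [R(s,a,s') + gamma*V_i(s')].
# T[s][a] is a list of (prob, next_state, reward) triples (assumed toy values).

gamma = 0.9
A = [0, 1]
T = {
    0: {0: [(1.0, 0, 0.0)],                   # stay put, no reward
        1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},   # risky move toward state 1
    1: {0: [(1.0, 1, 1.0)],                   # collect a small reward
        1: [(1.0, 0, 0.0)]},                  # go back to state 0
}

V = {s: 0.0 for s in T}
for _ in range(200):  # repeated Bellman backups converge geometrically
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a]) for a in A)
         for s in T}

# Greedy policy extraction from the (near-)optimal values.
policy = {s: max(A, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a]))
          for s in T}
print(V, policy)
```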
Reinforcement Learning (Value-based)
• If 𝑇 𝑠′ 𝑠, 𝑎 and 𝑅 𝑠, 𝑎, 𝑠′ are unknown, this is a reinforcement learning problem.
• The agent needs to interact with the world and gather experience
• At each time step, from state s the agent:
• takes action a (a = π(s) for a deterministic policy)
• receives reward r
• ends in state s′
• Value-based: learn an optimal value function from these data
Reinforcement Learning (Value-based): Monte-Carlo Method
• One way to learn 𝑄(𝑠, 𝑎)
• Use empirical mean return instead of expected return
• Average sampled returns
• The policy chooses the action that maximizes 𝑄(𝑠, 𝑎)
• Using 𝑉(𝑠) requires the model:
Q(s, a) = ( R_1(s, a) + R_2(s, a) + ⋯ + R_n(s, a) ) / n
π(s) = argmax_a Q(s, a)
π(s) = argmax_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V(s′) ]
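The Monte-Carlo estimate is just an empirical average of sampled returns; the state names, actions, and return samples below are illustrative assumptions.

```python
# Monte-Carlo estimation of Q(s, a): average the returns sampled after
# taking action a in state s, then act greedily with respect to Q.

from collections import defaultdict

sampled = defaultdict(list)
episodes = [(("s1", "left"), 4.0), (("s1", "left"), 6.0), (("s1", "right"), 1.0)]
for sa, R in episodes:
    sampled[sa].append(R)

# Q(s, a) = (R_1 + R_2 + ... + R_n) / n
Q = {sa: sum(Rs) / len(Rs) for sa, Rs in sampled.items()}

def pi(s):
    """Greedy policy: the action with the highest estimated Q(s, a)."""
    return max((sa[1] for sa in Q if sa[0] == s), key=lambda a: Q[(s, a)])

print(Q[("s1", "left")], pi("s1"))  # 5.0 left
```

Note that acting from Q needs no model, while acting from V(s) requires the transition function, which is exactly the point of the last formula above.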
Stochastic Optimization and Machine Learning Algorithms (Part 3)
Lyapunov Optimization Framework
Markov Decision Process
Artificial Neural Networks
Artificial Neural Networks: Introduction
Artificial Neural Networks: Basic Example in Logic Design
AND gate
OR gate
NOT gate
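Each of these gates can be realized by a single artificial neuron with a step activation; the particular weights and thresholds below are one standard choice, not the only one.

```python
# Single neurons implementing AND, OR, NOT with a hard threshold activation.

def neuron(weights, bias, inputs):
    """Fire (output 1) iff the weighted input sum plus bias is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def AND(x1, x2): return neuron([1, 1], -1.5, [x1, x2])
def OR(x1, x2):  return neuron([1, 1], -0.5, [x1, x2])
def NOT(x):      return neuron([-1], 0.5, [x])

print([AND(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print([OR(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 1]
print(NOT(0), NOT(1))                               # 1 0
```

A single neuron cannot realize XOR, which is a classical motivation for the multilayer networks on the following slides.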
Artificial Neural Networks: A Network of Artificial Neurons
• A Network of Artificial Neurons
• Nonlinear I/O mapping
• Adaptivity
• Generalization ability
• Fault-tolerance (graceful degradation)
• Biological analogy
[Figure: a multilayer perceptron network.]
Artificial Neural Networks: A Network of Artificial Neurons
[Figure: a multilayer perceptron network with weight adaptation.]
Artificial Neural Networks: A Network of Artificial Neurons
[Figure: weight adaptation example: character recognition.]
Neural Networks: Multilayer Perceptron Network (MPN)
• Benefits of Multilayer Perceptron Network (MPN)
[Figure: cascaded 1st-, 2nd-, and 3rd-layer perceptron networks.]