Machine Learning Techniques for Mobile Computing
Tutorial at the 2017 KICS Winter Conference (January 19, 2017)
Professor Joongheon Kim (School of Computer Science and Engineering, Chung-Ang University)
https://sites.google.com/site/joongheon
Contacts: [email protected], [email protected]
Outline
• Stochastic Optimization and Machine Learning Algorithms
• Stochastic Network Optimization (Lyapunov Optimization Framework)
• Markov Decision Process (MDP) with the Example of Mobile Robot Platforms
• Artificial Neural Networks
Stochastic Optimization and Machine Learning Algorithms (Part 1)
Lyapunov Optimization Framework
Markov Decision Process
Artificial Neural Networks
Outline
• Stochastic Optimization and Machine Learning Algorithms
• Stochastic Network Optimization (Lyapunov Optimization Framework)
• Motivation
• Theory
• Applications
• Markov Decision Process (MDP) with the Example of Mobile Robot Platforms
• Artificial Neural Networks
Stochastic Network Optimization (Lyapunov Optimization Framework)
• Motivation
• Theory
• Introduction to Queues (Queue Dynamics)
• Basic Queueing Theory: A Quick Review
• Lyapunov Drift and Lyapunov Optimization
• Applications
• Introduction
• Core Scheduling with 3-Core CPU
• Buffer-Stable Adaptive Per-Module Power Allocation for Energy-Efficient 5G Platforms
• Quality-Aware Streaming and Scheduling for Device-to-Device Video Delivery
Optimization with Queueing
• Matching with Queues
[Figure: queues Q1[t], Q2[t], Q3[t] (left nodes a1, a2, a3) matched to right nodes b1, b2 with weights p11 = 3, p12 = 0, p21 = 0, p22 = 1, p31 = 2, p32 = 0.]
Queue Dynamics
• [Definition (Queue Dynamics)] A single-server discrete-time queueing system evolves as
Q[t + 1] = max( Q[t] − b[t], 0 ) + a[t], for all t = 0, 1, 2, ⋯
• Alternative Form:
Q[t + 1] = Q[t] − b̃[t] + a[t], for all t = 0, 1, 2, ⋯
where b̃[t] = min( Q[t], b[t] ) is the actual number of departures in slot t.
[Figure: a queue Q_i[t] fed by an arrival process a_i[t] and drained through the resources by a departure (service) process b_i[t].]
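The queue update above can be sketched in a few lines; the arrival and service sequences below are illustrative assumptions, not values from the tutorial.

```python
# A minimal simulation of the queue dynamics Q[t+1] = max(Q[t] - b[t], 0) + a[t].

def queue_step(Q, a, b):
    """Advance the queue backlog by one slot."""
    return max(Q - b, 0) + a

arrivals = [3, 0, 5, 2, 1]   # a[t]: tasks arriving in each slot (assumed)
services = [2, 2, 2, 2, 2]   # b[t]: service offered in each slot (assumed)

Q = 0
for a, b in zip(arrivals, services):
    Q = queue_step(Q, a, b)
print(Q)  # backlog after 5 slots: 4
```

Note that the actual departures in a slot are min(Q[t], b[t]), which is exactly what the alternative form above captures with b̃[t].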
Queue Dynamics
• [Definition (Time-Average Rate)] An arrival process {a[t]}_{t=0}^{∞} and a service process {b[t]}_{t=0}^{∞} have time-average rates ā and b̄, respectively, if
lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} a[τ] = ā and lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} b[τ] = b̄
• [Definition (Rate Stable)] A queue Q[t] is rate stable if
lim_{t→∞} Q[t] / t = 0
• [Definition (Mean Rate Stable)] A queue Q[t] is mean rate stable if
lim_{t→∞} E[Q[t]] / t = 0
(These are weaker notions than requiring lim_{t→∞} Q[t] < ∞ or lim_{t→∞} E[Q[t]] < ∞.)
Brief Introduction
• Basic Assumption for Stability: 𝜇 > 𝜆
• Queues
• B/B/1 Queue: Arrival with Bernoulli Distribution and Departure with Bernoulli Distribution
• B/G/1 Queue: Arrival with Bernoulli Distribution and Departure with General Distribution
• G/G/1 Queue: Arrival with General Distribution and Departure with General Distribution
[Figure: a queue Q[t] with arrival rate λ and departure/service rate μ.]
• Queue size and queueing delay (by Little's theorem) for the B/B/1 queue:
Q̄ = λ(1 − λ) / (μ − λ),  W = Q̄ / λ = (1 − λ) / (μ − λ)
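The two B/B/1 formulas above can be checked numerically; the rates λ = 0.3, μ = 0.5 below are illustrative values satisfying the stability assumption μ > λ.

```python
# B/B/1 time-average queue size and delay (valid when 0 < lam < mu <= 1).

def bb1_queue_size(lam, mu):
    # Q = lam * (1 - lam) / (mu - lam)
    return lam * (1 - lam) / (mu - lam)

def bb1_delay(lam, mu):
    # Little's theorem: W = Q / lam = (1 - lam) / (mu - lam)
    return bb1_queue_size(lam, mu) / lam

print(bb1_queue_size(0.3, 0.5))  # ~1.05
print(bb1_delay(0.3, 0.5))       # ~3.5
```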
Lyapunov Drift and Optimization
• Time-Average Optimization under Queue Stability
minimize: ȳ0
subject to:
• [C1] ȳl ≤ 0, ∀l ∈ {1, ⋯, L}
• [C2] α[t] ∈ A_ω[t], ∀t ∈ {0, 1, ⋯}
• [C3] queue stability
where
ȳ0 = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} y0[τ]
ȳl = lim_{t→∞} (1/t) Σ_{τ=0}^{t−1} yl[τ], ∀l ∈ {1, ⋯, L}
Notation:
• ω[t]: random event at slot t
• α[t]: control action, chosen after observing ω[t] at slot t
• A_ω[t]: action space associated with ω[t]
Lyapunov Drift and Optimization
• [Definition (Quadratic Lyapunov Function), or simply (Lyapunov Function)]
L(Q[t]) = (1/2) Σ_{k=1}^{K} Q_k²[t], where Q[t] = ( Q1[t], ⋯, QK[t] )
• [Definition (One-Slot Conditional Lyapunov Drift), or simply (Lyapunov Drift)]
Δ(Q[t]) = E[ L(Q[t + 1]) − L(Q[t]) | Q[t] ]
i.e., the expected change in the Lyapunov function over one slot (t → t + 1), given that the current state in slot t is Q[t].
Lyapunov Drift and Optimization
• [Theorem (Lyapunov Optimization)]
Suppose that E[L(Q[t])] < ∞ and that there exist constants B > 0, V ≥ 0, ϵ ≥ 0 and a value y* such that, for all slots τ ∈ {0, 1, ⋯} and all possible values of Q[τ]:
Δ(Q[τ]) + V · E[ y0[τ] | Q[τ] ] ≤ B + V · y* − ϵ Σ_{k=1}^{K} Q_k[τ]
Then all queues are mean rate stable and the time-average penalty satisfies ȳ0 ≤ y* + B/V.
• [Theorem (Optimization with Lyapunov Drift)]
Δ(Q[τ]) + V · E[ y0[τ] | Q[τ] ] ≤ B + V · E[ y0[τ] | Q[τ] ] + Σ_{k=1}^{K} Q_k[τ] · E[ a_k[τ] − b_k[τ] | Q[τ] ]
Lyapunov Drift and Optimization
• [Theorem (Lyapunov Drift)]
Consider the quadratic Lyapunov function, and assume E[L(Q[t])] < ∞. Suppose there exist constants B > 0, ϵ ≥ 0 such that the following drift condition holds for all τ ∈ {0, 1, ⋯} and all possible Q[τ]:
Δ(Q[τ]) ≤ B − ϵ Σ_{k=1}^{K} Q_k[τ]
• Then:
• If ϵ ≥ 0, all queues Q_k[τ] are mean rate stable.
Lyapunov Drift and Optimization
• [Theorem (Optimization with Lyapunov Drift)]
Δ(Q[τ]) + V · E[ y0[τ] | Q[τ] ] ≤ B + V · E[ y0[τ] | Q[τ] ] + Σ_{k=1}^{K} Q_k[τ] · E[ a_k[τ] − b_k[τ] | Q[τ] ]
Minimizing the right-hand side of this bound in every slot yields the per-slot rule:
minimize: V · y0( α[t], ω[t] ) + Σ_{k=1}^{K} Q_k[t] · ( a_k( α[t], ω[t] ) − b_k( α[t], ω[t] ) )
• Open questions at this point: Is the resulting system stable? Is the minimization separable? How should V be chosen?
Introduction (For better understanding…)
• Basic Form (Separable)
minimize: V · y0( α[t], ω[t] ) + Σ_{k=1}^{K} Q_k[t] · ( a_k( α[t], ω[t] ) − b_k( α[t], ω[t] ) )
For a single queue this reduces to:
minimize: V · y0( α[t] ) + Q[t] · ( a( α[t] ) − b( α[t] ) )
Here V · y0(·) is the objective-function term, α[t] is the control, and Q[t] is the observation.
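As a sketch, the separable rule is a one-line per-slot decision. The action set and the y0, a, b maps below are illustrative placeholders, not values from the tutorial.

```python
# Per-slot drift-plus-penalty decision for the separable single-queue form:
# pick alpha minimizing V*y0(alpha) + Q[t] * (a(alpha) - b(alpha)).

def dpp_action(Q, V, actions, y0, a, b):
    return min(actions, key=lambda alpha: V * y0(alpha) + Q * (a(alpha) - b(alpha)))

# Toy maps (assumed): larger alpha costs more but serves more; arrivals uncontrolled.
actions = [0, 1, 2]
y0 = lambda alpha: alpha          # per-slot objective (e.g., power)
a  = lambda alpha: 4              # arrivals do not depend on the action
b  = lambda alpha: 3 * alpha      # service grows with the action

print(dpp_action(0, 5, actions, y0, a, b))    # empty queue: cheapest action (0)
print(dpp_action(100, 5, actions, y0, a, b))  # big backlog: maximum service (2)
```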
Outline
• Motivation
• Theory
• Introduction to Queues (Queue Dynamics)
• Basic Queueing Theory: A Quick Review
• Lyapunov Drift and Lyapunov Optimization
• Applications
• Introduction
• Example #1: Core Scheduling with 3-Core CPU
• Example #2: Buffer-Stable Adaptive Per-Module Power Allocation
• Example #3: Quality-Aware Streaming and Scheduling for Device-to-Device Video Delivery
Example #1: Core Scheduling with 3-Core CPU
• Tradeoff
• More cores allocated → more power consumption (−); more queue departures (good for stability) (+)
• Fewer cores allocated → less power consumption (+); fewer queue departures (bad for stability) (−)
• Objective of Optimization with the Tradeoff
• We want to minimize time-average power consumption subject to queue stability
[Figure: a task queue Q[t] fed by an arrival process a[t] and drained by a departure process b[t] via processing with multiple cores.]
Example #1: Core Scheduling with 3-Core CPU
• Optimization with Lyapunov Drift
• Minimize: V · P( α[t] ) + Q[t] · ( a( α[t] ) − b( α[t] ) )
• V: tradeoff parameter
• α[t]: core-selection action at t (in this three-core case, α[t] ∈ {0, 1, 2, 3})
• Q[t]: queue backlog (size) at t
• a( α[t] ): arrival process under the chosen action at t; here the arrivals are random and do not depend on the action, so this term can be ignored
• b( α[t] ): departure process under the chosen action at t
• P( α[t] ): power consumption when the core selection is α[t]
• Final Form: Minimize V · P( α[t] ) − Q[t] · b( α[t] )
• Intuition
• If the queue is empty (Q[t] = 0), we only minimize V · P( α[t] ): no cores need to be allocated.
• If the queue is nearly infinite (Q[t] ≈ ∞), we effectively minimize −b( α[t] ) (i.e., maximize b( α[t] )): all cores should be allocated.
Example #1: Core Scheduling with 3-Core CPU
• Example-Based Understanding
• Minimize: F(t) = V · P( α[t] ) − Q[t] · b( α[t] )
• V = 10; // we weight the objective function ten times more than queue stability
• α[t] ∈ {0, 1, 2, 3}; // we can allocate 1, 2, or 3 cores, or turn the CPU off
• b( α[t] ):
• b( α[t] = 0 ) = 0; // if no cores are selected, there is no departure process
• b( α[t] = 1 ) = 6; // if 1 core is selected, 6 tasks are processed
• b( α[t] = 2 ) = 11; // if 2 cores are selected, 11 tasks are processed
• b( α[t] = 3 ) = 15; // if 3 cores are selected, 15 tasks are processed
• P( α[t] ):
• P( α[t] = 0 ) = 0; // if no cores are selected, there is no power consumption
• P( α[t] = 1 ) = 3; // if 1 core is selected, 3 units of power are consumed
• P( α[t] = 2 ) = 5; // if 2 cores are selected, 5 units of power are consumed
• P( α[t] = 3 ) = 6; // if 3 cores are selected, 6 units of power are consumed
Example #1: Core Scheduling with 3-Core CPU
• Tradeoff Table
• Actions: F(t) = V · P( α[t] ) − Q[t] · b( α[t] ), where V = 10

α[t]        b( α[t] )   P( α[t] )
α[t] = 0        0           0
α[t] = 1        6           3
α[t] = 2       11           5
α[t] = 3       15           6

                          t = 0               t = 1                t = 2               t = 3
Queue backlog, Q[t]         0                   7                    3                   5
Arrival (random), a[t]      7                   3                    2                   2
F(t), α[t] = 0       10·0 − 0·0 = 0      10·0 − 7·0 = 0       10·0 − 3·0 = 0      10·0 − 5·0 = 0
F(t), α[t] = 1       10·3 − 0·6 = 30     10·3 − 7·6 = −12     10·3 − 3·6 = 12     10·3 − 5·6 = 0
F(t), α[t] = 2       10·5 − 0·11 = 50    10·5 − 7·11 = −27    10·5 − 3·11 = 17    10·5 − 5·11 = −5
F(t), α[t] = 3       10·6 − 0·15 = 60    10·6 − 7·15 = −45    10·6 − 3·15 = 15    10·6 − 5·15 = −15
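The tradeoff table can be reproduced in a few lines, using exactly the service and power values from the example above.

```python
# F(t) = V*P(alpha) - Q(t)*b(alpha) with V = 10, from the 3-core example.

V = 10
b = {0: 0, 1: 6, 2: 11, 3: 15}   # tasks processed per slot
P = {0: 0, 1: 3, 2: 5, 3: 6}     # power units consumed

def F(alpha, Q):
    return V * P[alpha] - Q * b[alpha]

def best_action(Q):
    return min(b, key=lambda alpha: F(alpha, Q))

# Queue backlogs at t = 0..3; matches the table: actions 0, 3, 0, 3
for t, Q in enumerate([0, 7, 3, 5]):
    print(t, best_action(Q), F(best_action(Q), Q))
```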
Example #2: Buffer-Stable Adaptive Per-Module Power Allocation
• Tradeoff
• More MAAs powered on → more power consumption (−); more queue departures (good for stability) (+)
• Fewer MAAs powered on → less power consumption (+); fewer queue departures (bad for stability) (−)
• Objective of Optimization with the Tradeoff
• We want to minimize time-average power consumption subject to queue stability
Example #2: Buffer-Stable Adaptive Per-Module Power Allocation
• Optimization with Lyapunov Drift
• Minimize: V · P( α[t] ) + Q[t] · ( a( α[t] ) − b( α[t] ) )
• V: tradeoff parameter
• α[t]: MAA power-on/off action at t (α[t] ∈ {0, 1, ⋯, 8}), i.e., the number of powered-on MAAs
• Q[t]: queue backlog (size) at t
• a( α[t] ): arrival process under the chosen action at t; here the arrivals are random and do not depend on the action, so this term can be ignored
• b( α[t] ): departure process under the chosen action at t
• P( α[t] ): power consumption when the MAA power-on/off decision is α[t]
• Final Form: Minimize V · P( α[t] ) − Q[t] · b( α[t] )
• Intuition
• If the queue is empty (Q[t] = 0), we only minimize V · P( α[t] ): no MAAs need to be powered on.
• If the queue is nearly infinite (Q[t] ≈ ∞), we effectively minimize −b( α[t] ) (i.e., maximize b( α[t] )): all MAAs should be powered on.
Example #2: Buffer-Stable Adaptive Per-Module Power Allocation
• Plotting Result
[Figure: simulation results showing the V tradeoff: more queue stability as V decreases, more energy efficiency as V increases.]
Example #3: Quality-Aware Streaming and Scheduling
• Tradeoff
• High compression on chunks
• Low quality on chunks (-);
• More stabilization on queues (+)
• Less compression on chunks
• High quality on chunks (+);
• Less stabilization on queues (-)
• Objective of Optimization with the Tradeoff
• We want to maximize time-average video quality subject to queue stability
Example #3: Quality-Aware Streaming and Scheduling
• Optimization with Lyapunov Drift
• Maximize: V · Quality( α[t] ) − Q[t] · ( a( α[t] ) − b( α[t] ) )
• V: tradeoff parameter
• α[t]: compression action at t (α[t] ∈ {1, 2, 3}), i.e., three different levels of compression
• Q[t]: queue backlog (size) at t
• a( α[t] ): arrival process under the chosen α[t] at t: this is the size of the chunks (which depends on the chosen compression level)
• b( α[t] ): departure process at t; here the system transmits as many packets as the network allows, independent of the action, so this term can be ignored
• Final Form: Maximize V · Quality( α[t] ) − Q[t] · a( α[t] )
• Intuition
• If the queue is empty (Q[t] = 0), we only maximize V · Quality( α[t] ): compress less for better streaming quality.
• If the queue is nearly infinite (Q[t] ≈ ∞), we effectively minimize a( α[t] ): compress heavily for queue stability.
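A minimal sketch of this per-slot rule; the quality scores and chunk sizes per compression level below are illustrative assumptions, not values from the tutorial.

```python
# Per-slot rule for example #3: maximize V*Quality(alpha) - Q(t)*a(alpha).

V = 10
quality = {1: 10, 2: 6, 3: 3}   # level 1 = least compression, best quality (assumed)
size    = {1: 8, 2: 4, 3: 1}    # chunk size pushed into the queue (assumed)

def best_level(Q):
    return max(quality, key=lambda alpha: V * quality[alpha] - Q * size[alpha])

print(best_level(0))     # empty queue: least compression (level 1)
print(best_level(1000))  # huge backlog: heaviest compression (level 3)
```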
Stochastic Optimization and Machine Learning Algorithms (Part 2)
Lyapunov Optimization Framework
Markov Decision Process
Artificial Neural Networks
Markov Decision Process
• Markov Decision Process (MDP) Components: <𝑆, 𝐴, 𝑅, 𝑇, 𝛾>
• 𝑆: Set of states
• 𝐴: Set of actions
• 𝑅: Reward function
• 𝑇: Transition function
• 𝛾: Discount factor
How can we use an MDP to model an agent in a maze?
Markov Decision Process
• 𝑺: Set of states
• For the maze: the location (x, y) if the maze is a 2D grid
• s0: starting state; s: current state; s′: next state; s_t: state at time t
Markov Decision Process
• 𝑨: Set of actions
• For the maze: move up, down, left, or right (each action moves the agent s → s′)
Markov Decision Process
• 𝑹: Reward function
• How good was the chosen action? r = R(s, a, s′)
• e.g., −1 for moving (battery used); +1 for a jewel? +100 for the exit?
Markov Decision Process
• 𝑻: Transition function
• Where is the robot's new location? s′ ~ T(s′ | s, a)
• The transition may be stochastic
Markov Decision Process
• 𝜸: Discount factor
• How much is future reward worth? 0 ≤ γ ≤ 1
• γ ≈ 0: future reward is worth almost nothing (immediate reward is preferred)
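Putting the five components together, a toy instance of the tuple <S, A, R, T, γ> for a 2x2 maze might look like this; the grid size, reward values, and deterministic transitions are illustrative assumptions.

```python
# A concrete toy instance of <S, A, R, T, gamma> for a tiny 2x2 maze.

S = [(x, y) for x in range(2) for y in range(2)]   # states: grid locations
A = ["up", "down", "left", "right"]                # actions
gamma = 0.9                                        # discount factor

def R(s, a, s_next):
    """Reward: battery cost for moving, bonus for reaching the exit (1, 1)."""
    return 100.0 if s_next == (1, 1) else -1.0

def T(s, a):
    """Deterministic next state; moves off the grid are clamped at the walls."""
    dx, dy = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}[a]
    return (min(max(s[0] + dx, 0), 1), min(max(s[1] + dy, 0), 1))

print(T((0, 0), "right"), R((0, 0), "up", (0, 1)))  # (1, 0) -1.0
```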
Outline
• Markov Decision Process (MDP)
• Markov Property
• Policy and Return
• Value Functions (V, Q)
• Solving MDP
• Planning
• Reinforcement Learning (Value-based)
• Monte-Carlo Method
• TD Method (Q-Learning)
• Reinforcement Learning (Policy-based): advanced topic (out of scope)
Markov Property
• Does s_{t+1} depend on the whole history s0, s1, ⋯, s_{t−1}? No: given s_t (and a_t), it does not.
• Memoryless!
• The future depends only on the present
• The current state is a sufficient statistic of the agent's history
• No need to remember the agent's history
• s_{t+1} depends only on s_t and a_t
• r_t depends only on s_t and a_t
Policy and Return
• Policy
• π: S → A
• Maps states to actions
• Gives an action for every state
• Return
• Discounted sum of rewards
• Could be undiscounted over a finite horizon
R_t = Σ_{k=0}^{∞} γ^k · r_{t+k}
Our goal: find the policy π that maximizes the expected return!
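The return is easy to compute for a finite episode; the reward sequence below (battery costs followed by an exit bonus) is an illustrative assumption.

```python
# The return R_t as a discounted sum of rewards over a finite episode.

def discounted_return(rewards, gamma):
    # R_t = sum_k gamma**k * r_{t+k}
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(discounted_return([-1, -1, -1, 100], 0.9))  # -1 - 0.9 - 0.81 + 72.9 = ~70.19
```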
Value Functions (V, Q)
• State Value Function (𝑉)
• Expected return of starting at state 𝑠 and following policy 𝜋
• How much return do I expect starting from state 𝑠?
• Action Value Function (𝑄)
• Expected return of starting at state 𝑠, taking action 𝑎, and then following policy 𝜋
• How much return do I expect starting from state 𝑠 and taking action 𝑎?
V^π(s) = E_π[ R_t | s_t = s ] = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k} | s_t = s ]
Q^π(s, a) = E_π[ R_t | s_t = s, a_t = a ] = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k} | s_t = s, a_t = a ]
Planning
• Again, our goal is to find the optimal policy
• If 𝑇 𝑠′ 𝑠, 𝑎 and 𝑅 𝑠, 𝑎, 𝑠′ are known, this is a planning problem.
• We can use dynamic programming to find the optimal policy.
• Keywords: Bellman equation, value iteration, policy iteration
π*(s) = argmax_π R^π(s), i.e., the policy maximizing the expected return from every state
Planning
• Bellman Equation:
∀s ∈ S: V*(s) = max_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V*(s′) ]
• Value Iteration:
∀s ∈ S: V_{i+1}(s) ← max_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V_i(s′) ]
• Policy Iteration
• Policy Evaluation:
∀s ∈ S: V_{i+1}^{π_k}(s) ← Σ_{s′} T(s, π_k(s), s′) [ R(s, π_k(s), s′) + γ V_i^{π_k}(s′) ]
• Policy Improvement:
π_{k+1}(s) = argmax_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V^{π_k}(s′) ]
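A minimal value-iteration sketch on a toy two-state MDP; the transition table (probabilities, next states, rewards) is an illustrative assumption, not the maze above.

```python
# Value iteration: V_{i+1}(s) = max_a sum_s' T(s,a,s') [R(s,a,s') + gamma*V_i(s')].
# T[s][a] is a list of (prob, next_state, reward) triples (assumed toy values).

gamma = 0.9
A = [0, 1]
T = {
    0: {0: [(1.0, 0, 0.0)],                   # stay put, no reward
        1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},   # risky move toward state 1
    1: {0: [(1.0, 1, 1.0)],                   # collect a small reward
        1: [(1.0, 0, 0.0)]},                  # go back to state 0
}

V = {s: 0.0 for s in T}
for _ in range(200):  # repeated Bellman backups converge geometrically
    V = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a]) for a in A)
         for s in T}

# Greedy policy extraction from the (near-)optimal values.
policy = {s: max(A, key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a]))
          for s in T}
print(V, policy)
```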
Reinforcement Learning (Value-based)
• If 𝑇 𝑠′ 𝑠, 𝑎 and 𝑅 𝑠, 𝑎, 𝑠′ are unknown, this is a reinforcement learning problem.
• The agent needs to interact with the world and gather experience
• At each time step, from state s the agent:
• takes action a (a = π(s) for a deterministic policy)
• receives reward r
• ends in state s′
• Value-based: learn an optimal value function from these data
Reinforcement Learning (Value-based): Monte-Carlo Method
• One way to learn 𝑄(𝑠, 𝑎)
• Use empirical mean return instead of expected return
• Average sampled returns
• The policy chooses the action that maximizes 𝑄(𝑠, 𝑎)
• Using 𝑉(𝑠) requires the model:
Q(s, a) = ( R_1(s, a) + R_2(s, a) + ⋯ + R_n(s, a) ) / n
π(s) = argmax_a Q(s, a)
π(s) = argmax_a Σ_{s′} T(s, a, s′) [ R(s, a, s′) + γ V(s′) ]
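The Monte-Carlo estimate is just an empirical average of sampled returns; the state names, actions, and return samples below are illustrative assumptions.

```python
# Monte-Carlo estimation of Q(s, a): average the returns sampled after
# taking action a in state s, then act greedily with respect to Q.

from collections import defaultdict

sampled = defaultdict(list)
episodes = [(("s1", "left"), 4.0), (("s1", "left"), 6.0), (("s1", "right"), 1.0)]
for sa, R in episodes:
    sampled[sa].append(R)

# Q(s, a) = (R_1 + R_2 + ... + R_n) / n
Q = {sa: sum(Rs) / len(Rs) for sa, Rs in sampled.items()}

def pi(s):
    """Greedy policy: the action with the highest estimated Q(s, a)."""
    return max((sa[1] for sa in Q if sa[0] == s), key=lambda a: Q[(s, a)])

print(Q[("s1", "left")], pi("s1"))  # 5.0 left
```

Note that acting from Q needs no model, while acting from V(s) requires the transition function, which is exactly the point of the last formula above.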
Stochastic Optimization and Machine Learning Algorithms (Part 3)
Lyapunov Optimization Framework
Markov Decision Process
Artificial Neural Networks
Artificial Neural Networks: Introduction
Artificial Neural Networks: Basic Example in Logic Design
AND gate
OR gate
NOT gate
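Each of these gates can be realized by a single artificial neuron with a step activation; the particular weights and thresholds below are one standard choice, not the only one.

```python
# Single neurons implementing AND, OR, NOT with a hard threshold activation.

def neuron(weights, bias, inputs):
    """Fire (output 1) iff the weighted input sum plus bias is positive."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def AND(x1, x2): return neuron([1, 1], -1.5, [x1, x2])
def OR(x1, x2):  return neuron([1, 1], -0.5, [x1, x2])
def NOT(x):      return neuron([-1], 0.5, [x])

print([AND(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
print([OR(a, b) for a in (0, 1) for b in (0, 1)])   # [0, 1, 1, 1]
print(NOT(0), NOT(1))                               # 1 0
```

A single neuron cannot realize XOR, which is a classical motivation for the multilayer networks on the following slides.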
Artificial Neural Networks: A Network of Artificial Neurons
• A Network of Artificial Neurons
• Nonlinear I/O mapping
• Adaptivity
• Generalization ability
• Fault-tolerance (graceful degradation)
• Biological analogy
[Figure: a multilayer perceptron network.]
Artificial Neural Networks: A Network of Artificial Neurons
[Figure: a multilayer perceptron network with weight adaptation.]
Artificial Neural Networks: A Network of Artificial Neurons
[Figure: weight adaptation example: character recognition.]
Neural Networks: Multilayer Perceptron Network (MPN)
• Benefits of Multilayer Perceptron Network (MPN)
[Figure: cascaded 1st-, 2nd-, and 3rd-layer perceptron networks.]