Running Multiple Workloads on a GPU A UX Oriented Approach Yuval Sarna Graphics Software Expert @ GameFly Streaming

Post on 30-Sep-2020


Page 1: Running Multiple Workloads on a GPU A UX Oriented Approach · A UX Oriented Approach Yuval Sarna Graphics Software Expert @ GameFly Streaming. ... • UX is improved – frames interval

Running Multiple Workloads on a GPU
A UX Oriented Approach

Yuval Sarna
Graphics Software Expert @ GameFly Streaming

Agenda

• Sharing the GPU

• We all like to Play

• Introduction to GPU Scheduling

• Proposed GPU Scheduler

• Summary & Q&A

What does it mean to “share the GPU”?

• Most modern applications use the GPU

• They all share the same hardware resources: CPU, RAM, GPU, etc.

• The GPU executes tasks coming from different processes, satisfying their needs, be it:
  • Graphical HW acceleration
  • GPGPU
  • Etc.

The GPU Model

• Many physical cores but a single core computational model (no “SetAffinity”)

• Access model is FIFO, no fairness, no preemption

• Many processes use the GPU simultaneously, but it can process only one task at a time
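
As a rough illustration of the FIFO, non-preemptive access model described on this slide, here is a minimal Python sketch. The task names and durations are invented for the example, and `run_fifo` is a hypothetical helper, not part of any driver or API.

```python
from collections import deque

def run_fifo(tasks):
    """Run (name, duration_ms) tasks in arrival order, without preemption.

    Returns {name: completion_time_ms}. All names and numbers are made up.
    """
    queue = deque(tasks)
    clock_ms = 0.0
    finished = {}
    while queue:
        name, duration_ms = queue.popleft()
        clock_ms += duration_ms  # the task runs to completion; nothing interrupts it
        finished[name] = clock_ms
    return finished

# A short game frame queued behind a long compute task has to wait for it.
done = run_fifo([("heavy_compute", 40.0), ("game_frame", 5.0)])
print(done)
```

In this toy run the 5ms frame only completes at 45ms, which is the kind of latency problem a plain FIFO model cannot avoid.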

Why do we need to share the GPU?

• Cost Efficiency

• Cloud Environments

• Academic Super-Computers

Difficulties in Sharing the GPU Efficiently

• Running non-demanding applications in parallel is easy
  • Not real-time, i.e., they don’t require low latency

• When it comes to running multiple demanding workloads on the GPU, sharing becomes difficult
  • Which workload should execute now?
  • How do we handle greedy workloads?
  • What do we expect from a GPU sharing scheme?

Efficient GPU Sharing

• Utilizing the GPU

• Fairness of GPU between applications

• Smooth User Experience (UX)

Case Study – GameFly Streaming

• The game is running (and rendered) on a server

• Rendered frames are streamed as video in real-time to the client

• Gamepad commands are sent back to the server

The Technology

• The game is running (and rendered) on a server

• Rendered frames are streamed as video in real-time to the client

Definitions & Assumptions

[Figure: a GPU with Nodes 0 to 6 and the GPU Scheduler; several processes, each owning one or more contexts; command buffers queued for Node 2.]

Scheduling Efficiency

To measure the efficiency of a scheduling algorithm, we may look at two main factors:

• Maximum utilization of the GPU
  • The algorithm should allow the GPU to be 100% utilized.

• Number of frames that missed their deadline
  • Measured in relation to how far they exceeded their expected time.

• Ask your target audience

[Figure: timeline of one frame, from BeginFrame at 0ms to the deadline at 33ms, with command buffers (CB) queued along it; the last CB is marked with a question mark.]
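
The two efficiency factors on this slide can be computed from simple measurements. The sketch below is only an illustration in Python: the function names and the sample frame times are assumptions, and the 33ms deadline is the 30 FPS example used elsewhere in the talk.

```python
def gpu_utilization(busy_ms, window_ms):
    """Fraction of a measurement window during which the GPU was busy."""
    return busy_ms / window_ms

def missed_deadlines(frame_times_ms, deadline_ms=33.0):
    """Return (number of late frames, total time by which they were late)."""
    overshoots = [t - deadline_ms for t in frame_times_ms if t > deadline_ms]
    return len(overshoots), sum(overshoots)

# Hypothetical frame times for one game, in milliseconds.
count, overshoot_ms = missed_deadlines([30.0, 35.0, 33.0, 48.0])
print(count, overshoot_ms)
```

Counting the late frames and summing their overshoot keeps both views available: how often deadlines are missed, and by how much.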

Fairness

If life is unfair to everyone, isn’t life fair?

How is it done?

• Windows Display Driver Model

• Stall command buffers if they shouldn’t yet be submitted for GPU execution

[Figure: WDDM architecture. User mode: Application, Direct3D runtime, user-mode display driver, OpenGL runtime, OpenGL installable client driver (ICD), Win32 GDI, kernel-mode access (gdi32.dll). Kernel mode: DirectX graphics kernel subsystem (Dxgkrnl.sys), which includes the display port driver, video memory manager, and GPU scheduler; Win32K.sys; display miniport driver.]

Windows OS GPU Scheduler

• Round-Robin scheduling algorithm

• Let’s take a look at a video showing the issues
  • GPU utilization is ~105%
  • Six concurrent games:
    • 5 × Overlord II
    • 1 × Alan Wake’s American Nightmare
  • Running on NVIDIA GRID K520

A Look Behind the Scenes

[Figure: GPU timeline capture. Grey command buffers are new frames released by the game. Marked intervals: ~142ms, ~48ms, ~57ms, ~35ms, ~80ms.]

[Figure: second timeline capture, with a marked interval of ~24ms.]

Why can it be done better?

• We know what kind of workloads we want to schedule

• We can set a target performance

• Our scheduler doesn’t have to be generic

GPU Resources

• For example, say we set the target performance to a rate of 30 frames per second (FPS)

• Each frame shouldn’t take more than ~33ms

• These are the GPU’s resources we have to manage and schedule

• We don’t allow running more than “33 blocks” worth of workloads concurrently
  • But is it enough?

[Figure: the GPU’s frame budget drawn as a row of 33 blocks, numbered 1 to 33.]
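
The budget arithmetic on this slide can be written down directly. The following Python sketch is an illustration only: `frame_budget_ms` and `fits_budget` are hypothetical names, and treating each workload as a single per-frame cost in milliseconds is a simplifying assumption.

```python
def frame_budget_ms(target_fps):
    """Time available per frame at a given target frame rate."""
    return 1000.0 / target_fps

def fits_budget(frame_costs_ms, target_fps=30):
    """True if the summed per-frame GPU cost of all workloads fits in one frame."""
    return sum(frame_costs_ms) <= frame_budget_ms(target_fps)

print(round(frame_budget_ms(30), 2))  # about 33.33 ms per frame at 30 FPS
print(fits_budget([8, 12, 5]))        # 25 ms of work fits
print(fits_budget([10, 16, 10]))      # 36 ms of work does not
```

An admission check like this only says whether the total work fits; it does not decide which workload runs when, which is the question the slide raises with "But is it enough?".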

First Attempt – Earliest Deadline First

• Prioritize CBs with earlier deadlines using the following data:
  • The time it took the context to complete the previous frame
  • The time a context has used so far to create the current frame
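
A minimal Python sketch of an Earliest Deadline First pick is shown below. It is not the author's implementation: the `Context` fields, the fixed 33ms frame time, the remaining-work estimate, and the tie-break in favor of a context that has already started its frame are all assumptions made for illustration.

```python
from dataclasses import dataclass

FRAME_TIME_MS = 33.0  # assumed 30 FPS target

@dataclass
class Context:
    name: str
    frame_start_ms: float  # when the current frame began
    prev_frame_ms: float   # GPU time the previous frame took
    used_ms: float         # GPU time used so far on the current frame

def deadline_ms(ctx):
    """Deadline of the current frame (assumption: frame start plus frame time)."""
    return ctx.frame_start_ms + FRAME_TIME_MS

def estimated_remaining_ms(ctx):
    """Guess the remaining work from the previous frame's cost."""
    return max(ctx.prev_frame_ms - ctx.used_ms, 0.0)

def pick_next(contexts):
    """Earliest deadline wins; on a tie, prefer the context that has used more time."""
    return min(contexts, key=lambda c: (deadline_ms(c), -c.used_ms))

a = Context("game_a", frame_start_ms=0.0, prev_frame_ms=12.0, used_ms=0.0)
b = Context("game_b", frame_start_ms=0.0, prev_frame_ms=10.0, used_ms=4.0)
c = Context("game_c", frame_start_ms=5.0, prev_frame_ms=8.0, used_ms=0.0)
print(pick_next([a, b, c]).name)
```

Here `game_a` and `game_b` share the earliest deadline, and `game_b` is chosen because it has already started its frame.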

Round-Robin Scheduling

[Figure: animated timeline (0 to 40 ms) of command buffers from several games scheduled round-robin, with the frame deadline marked at 33 ms.]

Tomb Raider is the only game that managed to complete its frame before the deadline.
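
For comparison, a generic round-robin order can be sketched as follows. This is a textbook interleaving written in Python, with one command buffer taken per context per turn; that granularity is an assumption and is not claimed to match the actual Windows scheduler.

```python
from collections import deque

def round_robin_order(queues):
    """Interleave command buffers, taking one from each context in turn.

    queues: {context_name: [command_buffer, ...]}
    Returns the submission order as a list of (context_name, command_buffer).
    """
    pending = deque((name, deque(cbs)) for name, cbs in queues.items())
    order = []
    while pending:
        name, cbs = pending.popleft()
        if not cbs:
            continue  # context had nothing queued
        order.append((name, cbs.popleft()))
        if cbs:
            pending.append((name, cbs))  # go to the back of the line
    return order

order = round_robin_order({"A": ["a1", "a2", "a3"], "B": ["b1"], "C": ["c1", "c2"]})
print(order)
```

Because every context gets a turn regardless of how close its deadline is, a frame that needs only one more command buffer can still end up waiting behind everyone else.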

First Attempt – Earliest Deadline First

[Figure: animated timeline (0 to 40 ms) of command buffers from several games scheduled Earliest Deadline First, with the frame deadline marked at 33 ms. Annotation: “This game has already started, so its priority is higher.”]

Both Tomb Raider & MotoGP15 completed their frames before the deadline.

Results

• 10 games running concurrently

• UX is improved – frames interval variance is reduced significantly

[Figure: two histograms, titled “Windows GPU Scheduler” and “EDF GPU Scheduler”. x-axis: Frames Interval (ms), from 0 to about 96; y-axis ticks from 0 to 1400; series: Sum.]
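
Frame interval variance, the metric behind these histograms, can be computed from frame presentation timestamps. The Python sketch below uses invented timestamps purely for illustration; it does not reproduce the measured data.

```python
from statistics import mean, pvariance

def frame_intervals(present_times_ms):
    """Differences between consecutive frame presentation times."""
    return [b - a for a, b in zip(present_times_ms, present_times_ms[1:])]

# Two made-up traces with the same average interval (33 ms).
steady = frame_intervals([0, 33, 66, 99, 132])
jittery = frame_intervals([0, 20, 80, 99, 132])

print(mean(steady), pvariance(steady))
print(mean(jittery), pvariance(jittery))
```

Both traces average the same frame rate, but only the variance reveals the stutter in the second one, which is why the talk looks at the spread of intervals rather than at FPS alone.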

First Attempt – Earliest Deadline First

• Drawbacks:
  • Tries to schedule more than 100% capacity worth of work.
  • Greedy workloads get the highest priority.
  • The innocents suffer from low FPS and stuttering.

Proposed New Scheduling Algorithm

• The proposed new algorithm uses a combination of two principles:
  1. Each process gets a time quantum.
    • If the time quantum is depleted before finishing the frame, the process may not submit further tasks for execution.
    • The total time given to all processes will always be equal to the global frame time (for example, 33ms).
  2. Among those with available time quantum, assign priorities using:
    • Deadline.
    • Other schemes.

Definitions

• n – Number of running processes.

• i – The index of a process (counting from 1).

• $T_i$ – The time the previous frame took for process i.

• $EFT_i$ – The expected time a single frame will take for process i.

• $D_i$ – By how much process i exceeded its expected frame time in the previous frame. $D_i \ge 0$.

• Time(i) – The new time quantum process i receives.

• FT – The global Frame Time. This dictates the deadlines. For example, for a 30FPS target, the FT is ~33.33ms.

Calculating Time Quantum

1. If $\sum_{j=1}^{n} T_j = 0$ (utilization 0%):

$$\mathrm{Time}(i) = FT$$

2. Else if $0 < \sum_{j=1}^{n} T_j \le FT$ (utilization ≤ 100%):

$$\mathrm{Time}(i) = \frac{T_i}{\sum_{j=1}^{n} T_j} \cdot FT$$

3. Else (utilization > 100%):

$$\mathrm{Time}(i) = T_i - \frac{\sum_{j=1}^{n} T_j - FT}{\sum_{j=1}^{n} D_j} \cdot D_i$$

• If $\mathrm{Time}(i) \le 0$, then $\mathrm{Time}(i) = T_i$
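
The three rules on this slide can be sketched in Python as follows. The function name `time_quanta` and its list-based interface are assumptions for illustration. One case is not covered by the slides, namely over-utilization with no process exceeding its expected frame time (the sum of all D_j is zero); the proportional fallback used there is this sketch's own assumption.

```python
def time_quanta(prev_ms, expected_ms, frame_time_ms=33.0):
    """Compute Time(i) for every process.

    prev_ms[i]     is T_i, the time the previous frame took.
    expected_ms[i] is EFT_i, the expected frame time.
    """
    # D_i: how far each process exceeded its expected frame time, never negative.
    exceeded = [max(t - e, 0.0) for t, e in zip(prev_ms, expected_ms)]
    total = sum(prev_ms)

    if total == 0:                      # rule 1: utilization 0%
        return [frame_time_ms] * len(prev_ms)
    if total <= frame_time_ms:          # rule 2: utilization <= 100%
        return [t / total * frame_time_ms for t in prev_ms]

    total_exceeded = sum(exceeded)      # rule 3: utilization > 100%
    if total_exceeded == 0:
        # Not specified in the slides; assumed fallback: scale proportionally.
        return [t / total * frame_time_ms for t in prev_ms]
    overshoot = total - frame_time_ms
    quanta = []
    for t, d in zip(prev_ms, exceeded):
        q = t - overshoot / total_exceeded * d
        quanta.append(q if q > 0 else t)  # the slide's guard for Time(i) <= 0
    return quanta

under = time_quanta([8, 12, 5], [8, 13, 3])     # 25 ms of work in a 33 ms frame
over = time_quanta([10, 16, 10], [10, 10, 8])   # 36 ms of work in a 33 ms frame
print(under, over)
```

The two calls use the same inputs as the worked examples in the talk, so the results can be checked against the per-process values given there. In rule 3 only the processes that exceeded their expected time are cut back, in proportion to how much they exceeded it.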

Example 1 Utilization < 100%

n = 3, FT = 33ms

Process   T_i (ms)   EFT_i (ms)   D_i (ms)
P1        8          8            0
P2        12         13           0
P3        5          3            2
Total     25                      2

For example, $D_1 = T_1 - EFT_1 = 8 - 8 = 0$.

Utilization: $\frac{\sum_{i=1}^{3} T_i}{FT} = \frac{25}{33} \approx 76\%$

• Here’s the time quantum each process will get for the current frame, using rule 2 ($0 < \sum_{j=1}^{n} T_j \le FT$):

$$\mathrm{Time}(1) = \frac{T_1}{\sum_{j=1}^{3} T_j} \cdot FT = \frac{8}{25} \cdot 33 = 10.56\,\text{ms}$$

$$\mathrm{Time}(2) = \frac{T_2}{\sum_{j=1}^{3} T_j} \cdot FT = \frac{12}{25} \cdot 33 = 15.84\,\text{ms}$$

$$\mathrm{Time}(3) = \frac{T_3}{\sum_{j=1}^{3} T_j} \cdot FT = \frac{5}{25} \cdot 33 = 6.6\,\text{ms}$$

Utilization: $\frac{\sum_{i=1}^{3} \mathrm{Time}(i)}{FT} = \frac{33}{33} = 1 \rightarrow 100\%$

Example 2 Utilization > 100%

n = 3, FT = 33ms

Process   T_i (ms)   EFT_i (ms)   D_i (ms)
P1        10         10           0
P2        16         10           6
P3        10         8            2
Total     36                      8

For example, $D_1 = T_1 - EFT_1 = 10 - 10 = 0$.

Utilization: $\frac{\sum_{i=1}^{3} T_i}{FT} = \frac{36}{33} \approx 109\%$

• Here’s the time quantum each process will get for the current frame, using rule 3 ($\sum_{j=1}^{n} T_j > FT$):

$$\mathrm{Time}(1) = T_1 - \frac{\sum_{j=1}^{3} T_j - FT}{\sum_{j=1}^{3} D_j} \cdot D_1 = 10 - \frac{36 - 33}{8} \cdot 0 = 10\,\text{ms}$$

$$\mathrm{Time}(2) = T_2 - \frac{\sum_{j=1}^{3} T_j - FT}{\sum_{j=1}^{3} D_j} \cdot D_2 = 16 - \frac{36 - 33}{8} \cdot 6 = 13.75\,\text{ms}$$

$$\mathrm{Time}(3) = T_3 - \frac{\sum_{j=1}^{3} T_j - FT}{\sum_{j=1}^{3} D_j} \cdot D_3 = 10 - \frac{36 - 33}{8} \cdot 2 = 9.25\,\text{ms}$$

Utilization: $\frac{\sum_{i=1}^{3} \mathrm{Time}(i)}{FT} = \frac{33}{33} = 1 \rightarrow 100\%$

Calculating Priorities

• To address the case where we have several processes with enough time quantum, each process also gets a priority

• Priorities are given based on the deadline by using Earliest Deadline First

• Other schemes may be used
  • For example, we could take into account the amount of time by which the process exceeded its expected frame time
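
Combining the two principles, a selection step might look like the Python sketch below: only processes that still have time quantum are eligible, and among those the earliest deadline wins. The `Proc` fields and the `next_process` helper are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    quantum_left_ms: float  # unused part of Time(i) for the current frame
    deadline_ms: float      # when the current frame is due

def next_process(procs):
    """Pick the eligible process with the earliest deadline, or None."""
    eligible = [p for p in procs if p.quantum_left_ms > 0]
    if not eligible:
        return None  # everyone is out of quantum until the next frame
    return min(eligible, key=lambda p: p.deadline_ms)

procs = [
    Proc("greedy", quantum_left_ms=0.0, deadline_ms=10.0),  # earliest deadline, but depleted
    Proc("game_b", quantum_left_ms=5.0, deadline_ms=30.0),
    Proc("game_c", quantum_left_ms=2.0, deadline_ms=20.0),
]
print(next_process(procs).name)
```

The depleted process is skipped even though its deadline is the earliest, which is how the quantum keeps a greedy workload from taking priority over the others. A different priority scheme would only change the `key` function.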

Example 1 Context given enough QT

• Context received a 12ms time quantum

• Finished frame @ 10ms

• QT left – 2ms

[Figure: frame timeline. BeginFrame at 0ms, EndFrame at 10ms, Deadline and next BeginFrame at 33ms.]

Example 2 Context not given enough QT

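• Context received a 10ms time quantum, but needs 14ms

• FPS drops to 27 FPS

[Figure: frame timeline. BeginFrame at 0ms; at 10ms the context is out of time quantum and all future CBs must wait; at the 33ms deadline a new time quantum is given; EndFrame at 37ms, followed by the next BeginFrame.]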
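
The 27 FPS figure follows from the timeline: 10ms of work, a wait until the 33ms deadline, then the remaining 4ms, ending at 37ms. The Python sketch below reproduces that arithmetic under two simplifying assumptions that are not stated in the slides: the context runs as soon as it has quantum, and each refreshed quantum is the same size as the previous one.

```python
def frame_end_ms(quantum_ms, needed_ms, frame_time_ms=33.0):
    """Time at which a frame finishes, given its quantum and the GPU time it needs."""
    if quantum_ms <= 0:
        raise ValueError("quantum must be positive")
    elapsed = 0.0
    remaining = needed_ms
    while remaining > quantum_ms:
        remaining -= quantum_ms   # use up the whole quantum
        elapsed += frame_time_ms  # then wait for the next deadline to get a new one
    return elapsed + remaining

enough = frame_end_ms(quantum_ms=12.0, needed_ms=10.0)      # finishes within its quantum
not_enough = frame_end_ms(quantum_ms=10.0, needed_ms=14.0)  # spills into the next period
print(enough, not_enough, round(1000.0 / not_enough, 1))
```

With a 12ms quantum the 10ms frame ends at 10ms; with a 10ms quantum the 14ms frame ends at 37ms, which is roughly 27 frames per second.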

Results

• Let’s take a look at a video showing the scheduler’s result
  • GPU utilization is ~105%
  • Six concurrent games:
    • 5 × Overlord II
    • 1 × Alan Wake’s American Nightmare
  • Running on NVIDIA GRID K520

[Figure: GPU timeline capture. Purple command buffers are new frames released by the game.]

Thank You!

• You’re more than welcome to talk to me after the lecture or email me

Yuval [email protected]

• Please don’t forget to fill out the survey