May 8-11 2017 | Silicon Valley
EVALUATING WINDOWS 10: LEARN WHY YOUR USERS NEED GPU ACCELERATION
Jason Kyungho Lee, Sr. Performance Engineer, NVIDIA GRID @ NVIDIA
Hari Sivaraman, Staff Engineer @ VMware
AGENDA
• Introduction
• Latest Announcements
• Windows 10 vs. Windows 7
• Performance Testing
• Summary
TESLA LINEUP FOR GRID: The most powerful data center GPUs targeted at graphics virtualization
Spec | M10 | M6 | M60
GPU | Quad mid-level Maxwell | Single high-end Maxwell | Dual high-end Maxwell
CUDA cores | 2560 (640 per GPU) | 1536 | 4096 (2048 per GPU)
Memory size | 32 GB GDDR5 (8 GB per GPU) | 8 GB GDDR5 | 16 GB GDDR5 (8 GB per GPU)
H.264 1080p30 streams | 28 | 18 | 36
Max vGPU instances | 64 | 16 | 32
Form factor | PCIe 3.0 dual slot (rack servers) | MXM (blade servers) | PCIe 3.0 dual slot (rack servers)
Power | 225W | 100W (75W opt) | 240W / 300W (225W opt)
Thermal | Passive | Bare board | Active / passive
Optimized for | User density | Blade | Performance
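As a quick sanity check on the lineup table, dividing each board's frame buffer by its maximum vGPU instance count gives the smallest per-VM slice of frame buffer. This is a sketch under stated assumptions: frame buffer is split evenly, a vGPU profile cannot span physical GPUs, and the 1 GB size of the M10-1B profile (the profile used in the tests later in this talk) is assumed from its name rather than taken from the slides.

```python
# Figures copied from the lineup table above.
# name: (physical GPUs per board, GB of frame buffer per GPU, max vGPU instances)
BOARDS = {
    "M10": (4, 8, 64),
    "M6": (1, 8, 16),
    "M60": (2, 8, 32),
}


def mb_per_instance(gpus: int, gb_per_gpu: int, max_instances: int) -> float:
    """Frame buffer per VM when the board is filled to its maximum instance count."""
    return gpus * gb_per_gpu * 1024 / max_instances


def instances_for_profile(gpus: int, gb_per_gpu: int, profile_mb: int) -> int:
    """VMs of one profile size that fit on a board; a profile cannot span GPUs."""
    return gpus * ((gb_per_gpu * 1024) // profile_mb)


if __name__ == "__main__":
    for name, (gpus, gb, max_vms) in BOARDS.items():
        print(f"{name}: {max_vms} VMs x {mb_per_instance(gpus, gb, max_vms):.0f} MB")
    # Assumption: an M10-1B profile is 1 GB (1024 MB) of frame buffer.
    print("M10-1B VMs per M10 board:", instances_for_profile(4, 8, 1024))
```

All three boards work out to 512 MB per VM at their maximum density; a 1 GB profile halves the density of a single M10 board to 32 VMs.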
LATEST ANNOUNCEMENTS
• Instant Clone support (VMware Horizon 7.1)
• Allows ultra-fast provisioning of virtual machines
• NVIDIA is the only GPU vendor supported
• High Availability support (VMware vSphere 6.5)
• vSphere 6.5 supports HA for NVIDIA GRID vGPU-enabled virtual machines
• Multi-monitor support with Blast Extreme H.264 HW (VMware Horizon 7.1)
• Offloads H.264 encoding to the NVIDIA GPU for an improved, predictable UX
S7763 - DELIVER A TRANSFORMATIVE 3D GRAPHICS USER EXPERIENCE WITH VMWARE HORIZON, BLAST EXTREME ADAPTIVE TRANSPORT, AND NVIDIA GRID
S7429 - EXPERT AND CUSTOMER ROUNDTABLE: REAL-WORLD TALES OF GPU-ACCELERATED DESKTOPS AND APPS - IMPLEMENTERS SHARE BEST PRACTICES
WINDOWS 10
WINDOWS 10 NEW CHANGES
• Visually compelling Modern UI / Start menu with transparency
• Modern UI cannot be disabled; Windows 10 assumes a GPU is present
• GPU-accelerated virtual desktops / Task View / Alt-Tab preview
• GPU-accelerated video playback in the default media player
• GPU-accelerated font (DPI) and display scaling for ultra-high-definition resolutions
• Windows Display Driver Model (WDDM) 2.0 / DirectX 12 supported
• Microsoft Edge GPU acceleration
WINDOWS 10 REQUIRES MORE RESOURCES FOR AN IMPROVED USER EXPERIENCE
[Chart: Windows 10 requires more GPU frame buffer. Frame buffer usage (scale 0-400) for Windows 7 (single 1920x1080), Windows 10 (single 1920x1080), Windows 10 (single 2560x1600) and Windows 10 (dual 1920x1080)]
[Chart: Windows 10 requires more CPU cycles. Host CPU utilization (%) over time, Windows 7 vs. Windows 10: 15% more CPU utilization with Windows 10]
64 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload
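One reason the frame buffer chart climbs with resolution and monitor count is simple arithmetic: every full-screen color buffer grows with width x height x displays. The sketch below assumes 32-bit (4 bytes per pixel) surfaces and counts just one buffer per display; the desktop compositor keeps several buffers plus per-window textures, so real usage is far higher. It only illustrates how the configurations on the chart scale relative to each other.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit RGBA surfaces


def surface_mib(width: int, height: int, displays: int = 1) -> float:
    """Size of one full-screen color buffer per display, summed over displays, in MiB."""
    return width * height * BYTES_PER_PIXEL * displays / (1024 * 1024)


# Display configurations from the frame buffer chart above.
CONFIGS = [
    ("Windows 10, single 1920x1080", 1920, 1080, 1),
    ("Windows 10, single 2560x1600", 2560, 1600, 1),
    ("Windows 10, dual 1920x1080", 1920, 1080, 2),
]

if __name__ == "__main__":
    base = surface_mib(1920, 1080)
    for label, w, h, n in CONFIGS:
        size = surface_mib(w, h, n)
        print(f"{label}: {size:.2f} MiB per buffer ({size / base:.2f}x single 1080p)")
```

Both the 2560x1600 and the dual 1080p configurations need roughly twice the pixels of a single 1080p screen, which is consistent with them sitting above the single 1080p bar.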
WINDOWS START BUTTON EXPERIENCE: side-by-side comparison
PERFORMANCE TESTING
TEST SETUP - SUBJECTIVE USER TESTING
• Two identical servers run the LoginVSI Knowledge Worker workload to create a realistic customer environment
• CPU utilization of the hosts is around 60-80%
• Testers don’t know which session is GPU accelerated
• Testers perform the same tasks on both systems
• Access devices (thin client, monitor, mouse, keyboard) are identical, with a single screen at 1080p resolution
• Predefined scenarios, plus freestyle use at the end
• Scenarios include browsing, YouTube, creating a PowerPoint presentation, Google Maps and WebGL
CPU ONLY VS. NVIDIA GRID: GPU with NVENC provides an average positive increase to UX of 34%
[Chart: user experience score (0-5, higher is better), Horizon 7 with PCoIP - No GPU vs. Horizon 7 with Blast Extreme and H.264 HW]
Testing ran on two identical systems; the CPU system was loaded to 60-80% utilization, and the GPU system ran the same workload.
Improvements shown on the chart: +20%, +5%, +19%, +65%, +6%, +21%, +55%, +26%, +9%, +13%, +13%, +30%, +68%, +133%
User Experience Scale
1 Unacceptable, unusable - fire someone in IT!
2 Barely usable, borderline, but I’ll get tired of this soon
3 Tolerable, I guess I can make do
4 Pretty good for a virtual desktop
5 Outstanding - as good (or almost) as physical
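The 14 improvements printed on the UX chart average out to the headline number, which suggests (though the slides do not say so explicitly) that the 34% is a plain arithmetic mean of the per-comparison gains. The sketch below reproduces that mean and shows the relative-increase formula on a pair of hypothetical scores; the function name is illustrative only.

```python
from statistics import mean

# Improvements printed on the chart (GPU with NVENC vs. CPU only), in percent.
IMPROVEMENTS_PCT = [20, 5, 19, 65, 6, 21, 55, 26, 9, 13, 13, 30, 68, 133]


def relative_increase_pct(before: float, after: float) -> float:
    """Percentage change from a baseline score to a new score on the 1-5 scale."""
    return (after - before) / before * 100


if __name__ == "__main__":
    print(f"average improvement: {mean(IMPROVEMENTS_PCT):.1f}%")
    # Hypothetical scores: a test rated 3.0 without GPU and 4.5 with GPU.
    print(f"example: {relative_increase_pct(3.0, 4.5):.0f}%")
```

The mean of the listed values is 34.5%, matching the roughly 34% quoted on the slide.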
CLICK TO PHOTON: What it is and why it matters
• Click-to-Photon is more than network latency
• Click-to-Photon is a key contributor to the overall user experience
• Click-to-Photon defines how interactive and snappy the solution is
• Click-to-Photon measures the overall latency from the user’s perspective
• Click-to-Photon measures the time from the mouse click until the action is visible to the user
• This includes the latency of USB device processing, rendering the frame, displaying the frame, etc.
• In remote environments (VDI, etc.), Click-to-Photon additionally includes
• encode latency, network latency and decode latency
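Because Click-to-Photon is the end-to-end sum of every stage between the click and the updated pixels, it can be modeled as a simple latency budget. In this minimal sketch the stage names follow the slides, the two WAN hops use the slide's 50 ms example, and every other stage value is made up for illustration; none of them are measurements from this talk.

```python
from dataclasses import dataclass, fields


@dataclass
class ClickToPhotonBudget:
    """Per-stage latencies in milliseconds for one remote click (illustrative only)."""
    input_processing: float  # USB/mouse event handled on the access device and sent
    uplink_network: float    # access device -> server over the WAN
    render: float            # application renders the new frame
    capture: float           # frame captured on the server (e.g. via NVFBC)
    encode: float            # frame encoded (e.g. via NVENC)
    downlink_network: float  # server -> access device over the WAN
    decode: float            # access device decodes the frame
    display: float           # access device shows the frame

    def total_ms(self) -> float:
        """Click-to-Photon latency: the sum of every stage."""
        return sum(getattr(self, f.name) for f in fields(self))

    def network_ms(self) -> float:
        """The part of the budget spent on the WAN."""
        return self.uplink_network + self.downlink_network


if __name__ == "__main__":
    # Hypothetical stage values; only the 50 ms WAN hops come from the slide's example.
    budget = ClickToPhotonBudget(2, 50, 10, 3, 5, 50, 5, 8)
    print(budget.total_ms(), budget.network_ms())
```

With these numbers 100 ms of the 133 ms total is network, and the remaining 33 ms is the part that server-side choices such as GPU rendering, capture and hardware encoding can influence.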
CLICK TO PHOTON SIMPLIFIED
[Diagram: Click-to-Photon captures the overall latency]
Access device: Mouse button released → Mouse click processed → Packetized and encoded → Packet transmitted
Network latency on the WAN (e.g., 50 ms)
Server: Packet received → Mouse click processed → New frame rendered (application) → Frame captured via NVIDIA NVFBC → Frame encoded via NVIDIA NVENC → Frame transmitted
Network latency on the WAN (e.g., 50 ms)
Access device: Packet received → Packet decoded → Frame displayed
Click-to-Photon latency spans the whole chain, from mouse button release to frame displayed.
CLICK TO PHOTON LATENCY: Blast Extreme with NVENC decreases latency by up to 140 ms, at <1 ms network latency
[Chart: Click-to-Photon latency in ms, lower is better]
Configuration | Latency (ms)
Local PC with integrated GPU | 65
Blast Extreme, No GPU - JPEG/PNG | 185
Blast Extreme, M10-1B - JPEG/PNG | 155
Blast Extreme, No GPU - H.264 software | 165
Blast Extreme, M10-1B - H.264 software | 125
Blast Extreme, M10-1B - H.264 hardware | 107
CLICK TO PHOTON LATENCY: Comparing latency of a single VM and at scale, at <1 ms network latency
[Chart: Click-to-Photon latency in ms, lower is better]
Configuration | Idle, 1 VM | Scale, 64 VMs
Local PC with integrated GPU | 65 | -
Blast Extreme, No GPU - JPEG/PNG | 185 | 250
Blast Extreme, M10-1B - JPEG/PNG | 155 | 170
Blast Extreme, No GPU - H.264 software | 165 | 240
Blast Extreme, M10-1B - H.264 software | 125 | 160
Blast Extreme, M10-1B - H.264 hardware | 107 | 110
63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency
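The latency figures can be compared directly to see where the headline savings come from. This sketch assumes the at-scale values appear in the same order as the idle values (the local PC has no at-scale value); the dictionary keys and helper names are illustrative only.

```python
from __future__ import annotations

# Click-to-Photon latency in ms, as read from the two latency charts.
# Assumption: the at-scale bars appear in the same order as the idle bars.
NVENC = "M10-1B, H.264 hardware"

IDLE = {
    "Local PC, integrated GPU": 65,
    "No GPU, JPEG/PNG": 185,
    "M10-1B, JPEG/PNG": 155,
    "No GPU, H.264 software": 165,
    "M10-1B, H.264 software": 125,
    NVENC: 107,
}
SCALE = {  # 64 VMs on the host; the local PC has no at-scale value
    "No GPU, JPEG/PNG": 250,
    "M10-1B, JPEG/PNG": 170,
    "No GPU, H.264 software": 240,
    "M10-1B, H.264 software": 160,
    NVENC: 110,
}


def largest_saving(series: dict[str, int], target: str = NVENC) -> tuple[str, int]:
    """Remote configuration whose latency exceeds the target's by the most, and by how much."""
    gaps = {k: v - series[target] for k, v in series.items()
            if k != target and not k.startswith("Local PC")}
    worst = max(gaps, key=gaps.get)
    return worst, gaps[worst]


def scale_penalty(config: str) -> int:
    """Extra latency a configuration picks up when the host is fully loaded."""
    return SCALE[config] - IDLE[config]


if __name__ == "__main__":
    print("idle:", largest_saving(IDLE))
    print("scale:", largest_saving(SCALE))
    for config in SCALE:
        print(f"{config}: +{scale_penalty(config)} ms under load")
```

Under this reading, NVENC's largest saving is 78 ms with a single idle VM and 140 ms at scale, so the "up to 140 ms" figure appears to come from the loaded host. NVENC also adds only 3 ms when the host is loaded, versus 65 ms for the no-GPU JPEG/PNG path.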
HOST CPU OFFLOADING: Blast Extreme decreases CPU utilization on the host by up to 42%
[Chart: host CPU utilization (%) over time, lower is better. Series: NOGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NOGPU-Blast-H.264 CPU, GPU-BLAST-H.264 CPU, GPU-BLAST-NVENC]
63 x Tesla M10-1B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency
GUEST VM, REMOTING PROCESS CPU OFFLOADING: Blast Extreme decreases CPU utilization on the VM
[Chart: remoting process utilization (PCoIP_server.exe or BlastW.exe) in the guest VM, in percent of one CPU core over time; lower is better. Series: NOGPU-PCoIP, GPU-PCoIP, NoGPU-JPEG, GPU-JPEG, NOGPU-Blast-H.264 CPU, GPU-BLAST-H.264 CPU, GPU-BLAST-NVENC]
63 x Tesla M10-0B VMs on a host running LoginVSI knowledge worker workload and 1 additional VM measuring latency
VIDEO PLAYBACK: Up to 52% improved user experience due to GRID vGPU and H.264
FPS values are remoted FPS.
VIDEO PLAYBACK
[Chart: average FPS for a set of videos vs. number of VMs (0-40). Series: JPG + vGPU, HW-H264 + vGPU, JPG - no vGPU, SW-H264]
[Chart: total FPS for a set of videos vs. number of VMs (0-40). Series: JPG + vGPU, HW-H264 + vGPU, JPG - no vGPU, SW-H264]
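The two video charts are linked: total remoted FPS is the per-VM average multiplied by the number of VMs, so a flat average with a rising total means the host is still keeping up as VMs are added. This sketch computes both figures and finds the largest VM count that still meets a target frame rate; the helper names, the 20 FPS target and all sample numbers are hypothetical, not values from the charts.

```python
from __future__ import annotations

from statistics import mean


def fps_summary(per_vm_fps: list[float]) -> tuple[float, float]:
    """Average and total remoted FPS across the VMs playing video at the same time."""
    return mean(per_vm_fps), sum(per_vm_fps)


def max_vms_meeting(avg_fps_by_vms: dict[int, float], target_fps: float) -> int:
    """Largest measured VM count whose average remoted FPS still meets the target."""
    return max((n for n, fps in avg_fps_by_vms.items() if fps >= target_fps), default=0)


if __name__ == "__main__":
    # Hypothetical measurements, not values from the charts above.
    print(fps_summary([24, 22, 20, 18]))
    print(max_vms_meeting({8: 24.0, 16: 23.5, 24: 21.0, 32: 15.0}, target_fps=20))
```

Reading the average-FPS curve this way gives a per-codec density limit for a chosen quality bar, which is one way to turn the charts into a sizing decision.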
VIDEO PLAYBACK
[Chart: CPU utilization (%) for a set of videos vs. number of VMs (0-35). Series: JPG + vGPU, HW-H264 + vGPU, JPG - no vGPU, SW-H264]
VIDEOS
POWERPOINT ANIMATION: side-by-side comparison
VIDEO PLAYBACK AND OFFLOADING CPU: side-by-side comparison
SUMMARY
WINDOWS 10 IS DIFFERENT: Windows 10 is Microsoft’s most graphical operating system
• Windows 10 differs from Windows 7
• Requires more CPU resources
• Leverages the GPU more
• NVIDIA GRID vGPU
• Improves user experience (as Microsoft intended)
• Reduces Click-to-Photon latency (snappy user interaction)
• Delivers a predictable and consistent user experience
• Reduces CPU cycles to allow higher user density
6/9/2017
THANK YOU