Performance tips for Windows Store apps using DirectX and C++ Max McMullen Principal Development Lead – Direct3D Microsoft Corporation 4-102

Post on 25-Feb-2016

Page 1: Performance tips for Windows Store apps using DirectX and C++

Performance tips for Windows Store apps using DirectX and C++

Max McMullen

Principal Development Lead – Direct3D

Microsoft Corporation

4-102

Page 2: Performance tips for Windows Store apps using DirectX and C++

Agenda

Overview

Measuring rendering performance

Power efficient GPU characteristics

Optimizing for power efficient GPUs

Page 3: Performance tips for Windows Store apps using DirectX and C++

Overview

Page 4: Performance tips for Windows Store apps using DirectX and C++

Optimizing for the Windows 8/RT OS

New form factors and platforms require new optimizations

Windows uses DirectX to get every pixel on screen

Direct3D 11.1 provides new APIs to optimize rendering

Page 5: Performance tips for Windows Store apps using DirectX and C++

Use optimized Windows 8/RT platforms

All Windows Store apps use DirectX for rendering

WWA and XAML make optimized use of Direct2D and Direct3D 11.1

Direct2D and Direct2D Effects fully leverage Direct3D 11.1

But sometimes you really need to use Direct3D itself…

Page 6: Performance tips for Windows Store apps using DirectX and C++

What you should know

Basics of building a C++ Windows Store app

Direct3D fundamentals

Page 7: Performance tips for Windows Store apps using DirectX and C++

Measuring rendering performance

Page 8: Performance tips for Windows Store apps using DirectX and C++

How do you measure rendering performance?

Many useful tools for Windows performance optimization: Visual Studio Performance Profiler, Visual Studio Graphics Diagnostics, hardware partner tools…

Two primary tools used to optimize Direct3D usage in the Windows 8/RT OS:

Basic: FPS/time measurement in app/microbenchmarks

Advanced: GPUView

Page 9: Performance tips for Windows Store apps using DirectX and C++

Frames per second (FPS)

Quick but sometimes misleading

C++/DirectX Windows Store apps sync to the display refresh

Measure render time, not present: call ID3D11DeviceContext::Flush instead of IDXGISwapChain::Present

Infrequent output: file output

Frequent output: look at FPSCounter.cpp in the GeometryRealization sample
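A minimal sketch of the "measure render time, not present" idea, in portable C++. RenderTimer is a hypothetical helper, not part of the talk, the GeometryRealization sample, or Direct3D. It only accumulates wall-clock time around a callback; in a real app the callback would issue the frame's draw calls and end with ID3D11DeviceContext::Flush rather than Present, so the display-refresh wait is not included in the measurement.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: accumulates the wall-clock time spent in a render
// callback and reports the average per frame.
class RenderTimer {
public:
    template <typename RenderFn>
    void TimeFrame(RenderFn&& render) {
        const auto start = std::chrono::steady_clock::now();
        render();  // in a real app: issue draw calls, then call Flush (assumption)
        const auto stop = std::chrono::steady_clock::now();
        total_ += stop - start;
        ++frames_;
    }

    std::size_t FrameCount() const { return frames_; }

    // Average time per timed frame in milliseconds; 0 if nothing was timed yet.
    double AverageMilliseconds() const {
        if (frames_ == 0) return 0.0;
        return std::chrono::duration<double, std::milli>(total_).count() /
               static_cast<double>(frames_);
    }

private:
    std::chrono::steady_clock::duration total_{};
    std::size_t frames_ = 0;
};
```

Averaging over many frames and writing the result out only occasionally keeps the measurement itself from disturbing the frame time.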

Page 10: Performance tips for Windows Store apps using DirectX and C++

Demo: FPS measurement

Page 11: Performance tips for Windows Store apps using DirectX and C++

GPUView

Part of the Windows Performance Toolkit

ETW Logging of CPU and GPU work

Measures graphics performance: FPS, startup time, glitching, render time, latency

Enables detailed analysis of CPU and GPU workloads and interdependencies

Page 12: Performance tips for Windows Store apps using DirectX and C++

GPUView – Record and Analyze

Install

x86: Windows Performance Toolkit

ARM: Windows Kits\8.0\Windows Performance Toolkit\Redistributables\WPTarm-arm_en-us.msi

Record

Run log.cmd to start

Perform action

Run log.cmd to stop

Analyze

Data captured in merged.etl, load in GPUView

Page 13: Performance tips for Windows Store apps using DirectX and C++

GPUView - Interface

CPU Threads

Flip Queue

CPU Queues

GPU Hardware Queue

Page 14: Performance tips for Windows Store apps using DirectX and C++

GPUView Interface: GPU Hardware Queue

The GPU Hardware Queue shows command buffers rendering on the GPU.

CPU Queue command buffers move to the GPU Hardware Queue when the hardware is ready to receive more commands.

Page 15: Performance tips for Windows Store apps using DirectX and C++

Demo: GPUView

Page 16: Performance tips for Windows Store apps using DirectX and C++

Power efficient GPU characteristics

Page 17: Performance tips for Windows Store apps using DirectX and C++

What to expect with power efficient GPUs

Feature level 9_1 or 9_3

Limited available bandwidth

Both immediate render and tiled render GPUs

Limited shader instruction throughput

Page 18: Performance tips for Windows Store apps using DirectX and C++

Feature Level 9.x (FL9.1, FL9.3)

Feature Level               9.1                        9.3
Texture size                2048x2048                  4096x4096
Pixel shader instructions   64 arithmetic, 32 sample   512 total

Real-time render limitations generally occur before reaching these maximums
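As an illustration of how the texture-size limits above might be checked at load time, here is a small sketch. The FeatureLevel enum and both function names are hypothetical and only mirror the two columns of the table; a real app would query the device's actual feature level instead.

```cpp
#include <cstdint>

// Hypothetical mirror of the two feature levels listed in the table.
enum class FeatureLevel { Level9_1, Level9_3 };

// Maximum 2D texture width/height for the given feature level (from the table).
inline std::uint32_t MaxTextureDimension(FeatureLevel level) {
    return level == FeatureLevel::Level9_3 ? 4096u : 2048u;
}

// True if a width x height texture stays within the limit for that level.
inline bool TextureFits(FeatureLevel level, std::uint32_t width, std::uint32_t height) {
    const std::uint32_t limit = MaxTextureDimension(level);
    return width <= limit && height <= limit;
}
```

A texture that does not fit would need to be downscaled or split before upload. As the slide notes, real-time rendering usually slows down well before these maximums are reached.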

Page 19: Performance tips for Windows Store apps using DirectX and C++

GPU Memory Bandwidth

Baseline requirement: 1.9 GB/sec benchmarked

7.5 I/O operations per screen pixel, 1366x768x32bpp@60Hz

I/O Cost   Operation
1          Screen fill w/ solid color
2          Screen fill w/ texture
3          Screen fill w/ texture & alpha blend
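The 1.9 GB/sec baseline follows directly from the numbers on this slide, and the arithmetic can be sketched as below. The function name and parameter names are illustrative only, and GB is taken as 10^9 bytes.

```cpp
#include <cstdint>

// Bytes per second needed to touch every screen pixel `ioPerPixel` times
// per frame at the given resolution, pixel size, and refresh rate.
inline double RequiredBandwidthBytesPerSec(std::uint32_t width,
                                           std::uint32_t height,
                                           std::uint32_t bytesPerPixel,
                                           std::uint32_t refreshHz,
                                           double ioPerPixel) {
    return static_cast<double>(width) * height * bytesPerPixel * refreshHz * ioPerPixel;
}
```

For the slide's configuration, RequiredBandwidthBytesPerSec(1366, 768, 4, 60, 7.5) is 1,888,358,400 bytes per second, which rounds to 1.9 GB/sec. Using the I/O cost table, one textured, alpha-blended full-screen layer costs 3 of those 7.5 operations per pixel.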

Page 20: Performance tips for Windows Store apps using DirectX and C++

Immediate render

[Diagram labels: GPU, shader cores, memory bus, graphics memory]

Page 21: Performance tips for Windows Store apps using DirectX and C++

Tiled render

[Diagram labels: GPU, shader cores, memory bus, graphics memory]

Page 24: Performance tips for Windows Store apps using DirectX and C++

Shader instruction throughput

Fill rates on GPUs depend on a number of factors: memory bandwidth, blend mode, shader cores, shader complexity, etc.

Power efficient GPUs become shader throughput bound at approximately 4 pixel shader instructions

Page 25: Performance tips for Windows Store apps using DirectX and C++

Optimizing for low power GPUs

Page 26: Performance tips for Windows Store apps using DirectX and C++

Bandwidth optimization: basics

Render opaque objects front-to-back with z-buffering

Disable alpha blending for opaque objects

Use geometry to trim large transparent areas
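One way to apply the front-to-back rule is to sort the draw list before submission, so that nearer opaque objects fill the z-buffer first and hidden pixels are rejected before they cost bandwidth. The sketch below is an assumption about how an app might organize its draw list. DrawItem and SortOpaqueFrontToBack are hypothetical names, and viewDepth is assumed to be the distance from the camera.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw-list entry; viewDepth is the distance from the camera.
struct DrawItem {
    int id;
    float viewDepth;
    bool opaque;
};

// Moves opaque items to the front of the list and sorts them nearest-first.
// Non-opaque items keep their original relative order after the opaque ones.
// Returns the number of opaque items.
inline std::size_t SortOpaqueFrontToBack(std::vector<DrawItem>& items) {
    auto firstTransparent = std::stable_partition(
        items.begin(), items.end(),
        [](const DrawItem& item) { return item.opaque; });
    std::sort(items.begin(), firstTransparent,
              [](const DrawItem& a, const DrawItem& b) {
                  return a.viewDepth < b.viewDepth;
              });
    return static_cast<std::size_t>(firstTransparent - items.begin());
}
```

The returned count also marks where alpha blending can be switched on, since everything before it is opaque and can be drawn with blending disabled.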

Page 27: Performance tips for Windows Store apps using DirectX and C++

Bandwidth optimization: compress resources

Direct3D supports texture compression at all feature levels

BC1: 4 bits/pixel for RGB formats, 6x compression ratio

BC2, BC3: 8 bits/pixel for RGBA formats, 4x compression ratio

Smaller resources also mean faster downloads of your app
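The compression ratios above can be checked with a little arithmetic. BC formats store 4x4 pixel blocks, 8 bytes per block for BC1 and 16 bytes per block for BC2 and BC3. The sketch below computes the size of a single mip level; TextureEncoding and TextureBytes are hypothetical names, and Rgb24 is only the nominal 24 bits/pixel baseline behind the 6x figure, not a claim about a specific Direct3D format.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical labels for the encodings compared on the slide.
enum class TextureEncoding { Rgb24, Rgba32, Bc1, Bc2, Bc3 };

// Size in bytes of one mip level of a width x height texture.
inline std::size_t TextureBytes(std::uint32_t width, std::uint32_t height,
                                TextureEncoding encoding) {
    // BC formats encode 4x4 pixel blocks; partial blocks round up.
    const std::size_t blocksX = (static_cast<std::size_t>(width) + 3) / 4;
    const std::size_t blocksY = (static_cast<std::size_t>(height) + 3) / 4;
    switch (encoding) {
        case TextureEncoding::Rgb24:  return static_cast<std::size_t>(width) * height * 3;
        case TextureEncoding::Rgba32: return static_cast<std::size_t>(width) * height * 4;
        case TextureEncoding::Bc1:    return blocksX * blocksY * 8;   // 4 bits/pixel
        case TextureEncoding::Bc2:
        case TextureEncoding::Bc3:    return blocksX * blocksY * 16;  // 8 bits/pixel
    }
    return 0;
}
```

For a 1024x1024 texture this gives 3,145,728 bytes at 24 bits/pixel versus 524,288 bytes in BC1 (6x), and 4,194,304 bytes at 32 bits/pixel versus 1,048,576 bytes in BC3 (4x).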

Page 28: Performance tips for Windows Store apps using DirectX and C++

Bandwidth optimization: quantize resources

Use the 16 bit formats added to Direct3D 11.1:

DXGI_FORMAT_B5G6R5_UNORM

DXGI_FORMAT_B5G5R5A1_UNORM

DXGI_FORMAT_B4G4R4A4_UNORM
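To show what quantizing to a 16 bit format means for the data, the sketch below packs an 8-bit-per-channel RGB color into the bit layout of DXGI_FORMAT_B5G6R5_UNORM (blue in the low 5 bits, green in the next 6, red in the top 5). PackB5G6R5 is a hypothetical helper. It truncates the low bits rather than rounding, and a real content pipeline would more likely convert textures offline with a tool.

```cpp
#include <cstdint>

// Packs 8-bit RGB into 16 bits laid out as B5G6R5:
// bits 0-4 blue, bits 5-10 green, bits 11-15 red.
// The low bits of each channel are dropped (truncation, not rounding).
inline std::uint16_t PackB5G6R5(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}
```

Each texel shrinks from 32 bits to 16, halving the bandwidth needed to read the texture at the cost of color precision.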

Page 29: Performance tips for Windows Store apps using DirectX and C++

Bandwidth optimization: flip present

Must use DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL

OS automatically uses “fullscreen” flips when:

Swapchain buffer dimensions match the desktop resolution

Swapchain format is DXGI_FORMAT_B8G8R8A8_UNORM*

App is the only content onscreen

Buffer dimensions need to be converted correctly from device independent pixels (dips)

Just create the swapchain with zero width and height to get the right size

Page 30: Performance tips for Windows Store apps using DirectX and C++

using namespace Windows::Graphics::Display;

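// Converts a length in device independent pixels (dips) to physical pixels,
// rounding to the nearest whole pixel.
float ConvertDipsToPixels(float dips)
{
    static const float dipsPerInch = 96.0f;
    return floor(dips * DisplayProperties::LogicalDpi / dipsPerInch + 0.5f);
}

Platform::Agile<Windows::UI::Core::CoreWindow> m_window;

// The window bounds are reported in dips; the swapchain needs physical pixels.
float swapchainWidth = ConvertDipsToPixels(m_window->Bounds.Width);
float swapchainHeight = ConvertDipsToPixels(m_window->Bounds.Height);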
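To make the conversion concrete outside the Windows Runtime, the following sketch takes the DPI as a parameter instead of reading DisplayProperties::LogicalDpi. DipsToPixels is a hypothetical variant of the function above, and the 134.4 DPI value assumes a display running at a 140% scale factor (96 x 1.4).

```cpp
#include <cmath>

// Same rounding as the slide's ConvertDipsToPixels, with the logical DPI
// passed in so the arithmetic can be tried without a CoreWindow.
inline float DipsToPixels(float dips, float logicalDpi) {
    const float dipsPerInch = 96.0f;
    return std::floor(dips * logicalDpi / dipsPerInch + 0.5f);
}
```

At 96 DPI, 1366 dips is 1366 pixels. At 134.4 DPI, 1366 dips becomes 1912 pixels and 768 dips becomes 1075 pixels. A swapchain sized in dips on such a display would not match the desktop resolution and so would miss the fullscreen flip path.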

Page 31: Performance tips for Windows Store apps using DirectX and C++

Demo: Optimized flip presents

Page 32: Performance tips for Windows Store apps using DirectX and C++

Bandwidth optimization: tiled render GPUs

Minimize command buffer flushes: don’t map resources in use by the GPU, use DISCARD and NO_OVERWRITE

Minimize scene flushes: visit RenderTargets only once per frame; don’t update resources in use by the GPU from the CPU, use DISCARD and NO_OVERWRITE with ID3D11DeviceContext1::CopySubresourceRegion1

Use scissors when updating small portions of a RenderTarget
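A common way to follow the DISCARD / NO_OVERWRITE advice for a dynamic buffer is to append new data after the region the GPU may still be reading, and to ask for fresh memory only when the buffer is full. The sketch below models just that bookkeeping in portable C++. MapMode, Allocation, and DynamicBufferCursor are hypothetical names standing in for the real map flags and buffer, and no Direct3D calls are made.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the NO_OVERWRITE and DISCARD map modes.
enum class MapMode { NoOverwrite, Discard };

struct Allocation {
    std::size_t offset;  // byte offset at which to write the new data
    MapMode mode;        // mode to request when mapping the buffer
};

// Tracks the write position inside a fixed-size dynamic buffer.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity) : capacity_(capacity) {}

    // Reserves `bytes` for new data. Callers must keep bytes <= capacity.
    Allocation Reserve(std::size_t bytes) {
        if (bytes > capacity_ - next_) {
            // Out of room: restart at offset 0 and request fresh memory so
            // data the GPU may still be reading is left untouched.
            next_ = bytes;
            return {0, MapMode::Discard};
        }
        // Room left: append after earlier data without overwriting it.
        const std::size_t offset = next_;
        next_ += bytes;
        return {offset, MapMode::NoOverwrite};
    }

private:
    std::size_t capacity_;
    std::size_t next_ = 0;
};
```

Because no write ever touches a region the GPU might still be using, the driver has no reason to flush the command buffer or stall before the CPU writes.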

Page 33: Performance tips for Windows Store apps using DirectX and C++

Bandwidth optimization: tiled render GPUs

New Direct3D APIs provide hints to avoid unnecessary copies

Rendering artifacts if used incorrectly

Page 34: Performance tips for Windows Store apps using DirectX and C++

Bandwidth optimization: Discard* APIs

m_swapChain->Present(1, 0); // present the image on the display

ComPtr<ID3D11View> view;
m_renderTargetView.As(&view); // get the view on the RT

m_d3dContext->DiscardView(view.Get()); // discard the view

Use ID3D11DeviceContext1::DiscardView and ID3D11DeviceContext1::DiscardResource to prevent unnecessary tile copies

Artifacts if used incorrectly

Page 35: Performance tips for Windows Store apps using DirectX and C++

Tiled render

[Diagram labels: GPU, shader cores, memory bus, graphics memory]

Page 37: Performance tips for Windows Store apps using DirectX and C++

Shader instruction throughput

Power efficient GPUs have limited throughput for full precision

Minimum precision hints increase throughput when precision doesn’t matter

Specifies minimum rather than actual precision: min16float, min16int, min10float

Don’t change precision often

20-25% improvement in practice with min16float

Page 38: Performance tips for Windows Store apps using DirectX and C++

Minimum precision

static const float brightThreshold = 0.5f;

Texture2D sourceTexture : register(t0);

float4 DownScale3x3BrightPass(QuadVertexShaderOutput input) : SV_TARGET
{
    float3 brightColor = 0;

    // Gather 16 adjacent pixels (each bilinear sample reads a 2x2 region)
    brightColor  = sourceTexture.Sample(linearSampler, input.tex, int2(-1,-1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1,-1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2(-1, 1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1, 1)).rgb;
    brightColor /= 4.0f;

    // Brightness thresholding
    brightColor = max(0, brightColor - brightThreshold);

    return float4(brightColor, 1.0f);
}

Page 39: Performance tips for Windows Store apps using DirectX and C++

Minimum precision

static const min16float brightThreshold = (min16float)0.5;

Texture2D<min16float4> sourceTexture : register(t0);

float4 DownScale3x3BrightPass(QuadVertexShaderOutput input) : SV_TARGET
{
    min16float3 brightColor = 0;

    // Gather 16 adjacent pixels (each bilinear sample reads a 2x2 region)
    brightColor  = sourceTexture.Sample(linearSampler, input.tex, int2(-1,-1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1,-1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2(-1, 1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1, 1)).rgb;
    brightColor /= (min16float)4.0;

    // Brightness thresholding
    brightColor = max(0, brightColor - brightThreshold);

    return float4(brightColor, 1.0f);
}

Page 40: Performance tips for Windows Store apps using DirectX and C++

Minimum precision – bad usage

static const min16float brightThreshold = (min16float)0.5;

Texture2D<min16float4> sourceTexture : register(t0);

float4 DownScale3x3BrightPass(QuadVertexShaderOutput input) : SV_TARGET
{
    min16float3 brightColor = 0;

    // Gather 16 adjacent pixels (each bilinear sample reads a 2x2 region)
    brightColor  = sourceTexture.Sample(linearSampler, input.tex, int2(-1,-1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1,-1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2(-1, 1)).rgb;
    brightColor += sourceTexture.Sample(linearSampler, input.tex, int2( 1, 1)).rgb;

    // Bad: the divisor uses a different minimum precision type than the rest
    // of the computation, which changes precision mid-calculation
    brightColor /= (min10float)4.0;

    // Brightness thresholding
    brightColor = max(0, brightColor - brightThreshold);

    return float4(brightColor, 1.0f);
}

Page 41: Performance tips for Windows Store apps using DirectX and C++

Wrap-up

Optimize!

Use the right tools and techniques to measure performance

Tune for power efficient GPUs’ unique performance characteristics

Direct3D 11.1 and Windows 8 provide the APIs to fully leverage power efficient GPUs

Page 42: Performance tips for Windows Store apps using DirectX and C++

Resources

Page 43: Performance tips for Windows Store apps using DirectX and C++

Build 2012 Talk: 3-113 Graphics with the Direct3D11.1 API made easy

Build 2012 Talk: 3-109 Developing a Windows Store app using C++ and DirectX

Visual Studio 2012 Remote Debugging: http://blogs.msdn.com/b/dsvc/archive/2012/10/26/windows-rt-windows-store-app-debugging.aspx

FPS Counter in GeometryRealization sample: http://code.msdn.microsoft.com/windowsapps/Geometry-Realization-963be8b7#content

GPUView: http://msdn.microsoft.com/en-us/library/windows/desktop/jj585574(v=vs.85).aspx

Direct3D11.1: http://msdn.microsoft.com/en-us/library/windows/desktop/hh404562(v=vs.85).aspx

Page 45: Performance tips for Windows Store apps using DirectX and C++

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.