Post on 27-Aug-2018

“MAXIS-mizing” Darkspore game performance with Intel® GPA 4.0!

Omar A Rodriguez, Intel

David Lee Swenson, Maxis

Agenda

• Quick Intro to GPA 4.0

• Sneak Peek of Darkspore

• Live demo of GPA 4.0 using Darkspore and Scaleform GFx 4.0

• Q & A

Intel GPA helps you analyze and tune your game for performance

• In-game analysis with System Analyzer HUD using state overrides and real-time metrics graphs

• Deep frame analysis with Frame Analyzer, down to the draw call level, including shaders, textures, D3D states, and pixel history

• View a system-wide picture of CPU and GPU workload with Platform Analyzer

Here’s how to use Intel GPA 4.0

Target Game with HUD

If CPU bound, use Platform Analyzer or other CPU profiling tools like VTune

If GPU bound, use Frame Analyzer

Improved Gaming Experience on Mainstream Graphics with SandyBridge

• 1280 x 720

• 30 FPS

• Medium settings

System Analyzer HUD

• Frames per second

• DirectX* version

• Vsync on/off

• Resolution

• 4 metrics graphs configurable from GPA Monitor

• Fast rendering with low overhead

Configure HUD from the GPA Monitor

• Hot keys and the GPU, CPU, and DirectX* metrics shown on the HUD are fully configurable from the GPA Monitor preferences

Capture “Interesting” Frames

• Select metric

• Decide the condition based on the selected metric

• Select type of capture: frame, trace, or both

• Choose what to do with the application after the capture
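
The trigger setup above boils down to a metric, a condition on that metric, and a capture type. The following is a minimal C++ sketch of that logic. The names `CaptureTrigger` and `should_capture`, and the example threshold, are hypothetical illustrations and not part of GPA's own interface.

```cpp
#include <string>

// Hypothetical model of a capture trigger: which metric to watch,
// how to compare it, and what kind of capture to take.
enum class Comparison { Below, Above };
enum class CaptureType { Frame, Trace, Both };

struct CaptureTrigger {
    std::string metric;      // e.g. "Frames Per Second"
    Comparison comparison;   // condition applied to the sampled value
    double threshold;        // value the metric is compared against
    CaptureType type;        // frame, trace, or both
};

// Returns true when the sampled metric value satisfies the trigger condition.
bool should_capture(const CaptureTrigger& trigger, double value) {
    if (trigger.comparison == Comparison::Below) {
        return value < trigger.threshold;
    }
    return value > trigger.threshold;
}
```

For example, a trigger on "Frames Per Second" with `Comparison::Below` and a threshold of 30 would fire on a sample of 24 and stay quiet on a sample of 45.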

State Overrides Accessible from HUD

• Use state overrides to find high-level bottlenecks

• Example: Disable Draw Calls

– Shows whether the game is CPU or GPU bound

• Example: 1x1 Scissor Rect

– Helpful to determine if your game is pixel shader bound
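
One way to read the two overrides above is to compare the frame rate before and after each one. If disabling draw calls barely changes the frame rate, the remaining work is on the CPU. If a 1x1 scissor rect raises it noticeably, pixel processing is the likely limit. The sketch below assumes frame rates read off the HUD by hand. The function name, the enum, and the 10% gain threshold are hypothetical choices for illustration.

```cpp
#include <cassert>

enum class Bottleneck { Cpu, PixelProcessing, OtherGpu };

// Classifies the likely bottleneck from three frame-rate readings:
//   base_fps     - no overrides active
//   no_draw_fps  - "Disable Draw Calls" override active
//   scissor_fps  - "1x1 Scissor Rect" override active
// min_gain is the speed-up factor treated as significant (assumed 1.10).
Bottleneck classify(double base_fps, double no_draw_fps, double scissor_fps,
                    double min_gain = 1.10) {
    assert(base_fps > 0.0 && min_gain > 1.0);
    // Removing all draw calls did not help: the CPU side sets the frame rate.
    if (no_draw_fps < base_fps * min_gain) {
        return Bottleneck::Cpu;
    }
    // Shrinking the rendered area to one pixel helped: pixel work dominates.
    if (scissor_fps >= base_fps * min_gain) {
        return Bottleneck::PixelProcessing;
    }
    // GPU bound, but not in pixel processing (e.g. vertex or geometry work).
    return Bottleneck::OtherGpu;
}
```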

Darkspore and GPA Demo by David Lee Swenson

Darkspore (Marketing Pitch)

• Online Action RPG

• Play a Squad of 3 Heroes

• Spore Editor

• 100 Heroes to Unlock

• 4 player Co-op

• 2v2 PVP

Darkspore Game Demo

Darkspore Rendering

• Deferred Pass First Target

• Normal (RGB) + Gloss (A)

Darkspore Rendering

• Deferred Pass Second Target

• Depth (R*256+G) + SpecPow (B) + ToonId (A)
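
The "Depth (R*256+G)" layout suggests a 16-bit depth value split across two 8-bit channels. The following CPU-side C++ sketch shows that kind of packing. The [0, 1] depth range, the rounding, and the names `pack_depth` and `unpack_depth` are assumptions; the slide does not give the exact scaling Darkspore used.

```cpp
#include <cmath>
#include <cstdint>

// Two 8-bit channels holding the high and low bytes of a 16-bit depth value.
struct PackedDepth {
    std::uint8_t r;  // high byte
    std::uint8_t g;  // low byte
};

// Quantizes a depth in [0, 1] to 16 bits and splits it into two bytes.
PackedDepth pack_depth(float depth01) {
    float clamped = std::fmin(std::fmax(depth01, 0.0f), 1.0f);
    std::uint32_t d16 = static_cast<std::uint32_t>(std::lround(clamped * 65535.0f));
    return { static_cast<std::uint8_t>(d16 >> 8),
             static_cast<std::uint8_t>(d16 & 0xFFu) };
}

// Rebuilds the depth as R*256 + G, then rescales it to [0, 1].
float unpack_depth(PackedDepth packed) {
    std::uint32_t d16 = static_cast<std::uint32_t>(packed.r) * 256u + packed.g;
    return static_cast<float>(d16) / 65535.0f;
}
```

With this scheme the round-trip error stays below one 16-bit step, which is far finer than a single 8-bit channel could hold.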

Darkspore Rendering

• Lighting pass

• Diffuse (RGB) + Specular (A)

Darkspore Rendering

• Final Pass

• Color + Glow + Post FX + Particles + UI =

Darkspore Rendering (Final Frame)

Darkspore and GPA

Without GPA

Darkspore and GPA

Start

Darkspore and GPA

• Tall bars are draw calls taking the most time

• Wide bars spend more time in the pixel shader

• In Darkspore we saw a lot of big bars…

Darkspore and GPA

Blood Decals in the Deferred and Final pass

Darkspore and GPA

Look at the shader

Darkspore and GPA

Finished?

Darkspore and GPA

Nope! GPA shows a 30% and 24% improvement

Darkspore and GPA

Blood decals are volumes that write more pixels

Darkspore and GPA

Use the stencil to kill pixels in the final pass

Darkspore and GPA

Simple Frame Rate Check

Darkspore and GPA

Changing states in GPA

Darkspore and GPA

Ok, finished!

…maybe

Other Optimizations for Darkspore

Trees all had roots below the ground!

Other Optimizations for Darkspore

• Terrain mixes 4 textures together per pass, but large sections only really need one.
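
One possible way to act on this observation, offered as an assumption and not as what Darkspore shipped, is to count how many of the four blend weights matter for a terrain section and pick a cheaper single-texture path when only one does. The names `active_layers`, `choose_path`, `TerrainPath`, and the 0.01 cutoff are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Counts terrain layers whose blend weight is large enough to be visible.
// The default cutoff is an arbitrary illustrative value.
std::size_t active_layers(const std::array<float, 4>& weights, float cutoff = 0.01f) {
    std::size_t count = 0;
    for (float w : weights) {
        if (w > cutoff) {
            ++count;
        }
    }
    return count;
}

enum class TerrainPath { SingleTexture, FourTextureBlend };

// Picks the cheap path when at most one layer contributes to the section.
TerrainPath choose_path(const std::array<float, 4>& weights) {
    return active_layers(weights) <= 1 ? TerrainPath::SingleTexture
                                       : TerrainPath::FourTextureBlend;
}
```

A section with weights {1, 0, 0, 0} would take the single-texture path, while {0.5, 0.5, 0, 0} would keep the full blend.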

Other Optimizations for Darkspore

• Creatures were really dense and burning quads.

Other Optimizations for Darkspore

• View space normals took only two channels but weren’t worth the cost.
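
To illustrate where that cost comes from: a unit normal stored in two channels has to be rebuilt with a square root every time it is read. The C++ sketch below shows one common two-channel scheme that stores x and y and reconstructs z. It assumes z is non-negative (the normal faces the viewer), and the slide does not say which encoding Darkspore actually tried.

```cpp
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Keeps only x and y of a unit-length view-space normal.
Vec2 encode_normal_xy(Vec3 n) {
    return { n.x, n.y };
}

// Rebuilds z from x and y using x^2 + y^2 + z^2 = 1.
// The sign of z is assumed to be non-negative.
Vec3 decode_normal_xy(Vec2 e) {
    float zz = 1.0f - e.x * e.x - e.y * e.y;
    float z = std::sqrt(zz > 0.0f ? zz : 0.0f);  // guard against rounding below zero
    return { e.x, e.y, z };
}
```

Storing all three components, as in the Normal (RGB) target described earlier, trades the extra channel for a plain read with no reconstruction math.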

Thanks for listening…

Platform Analyzer

Working with leading middleware providers for complete compatibility

#1 Video Game UI Solution

The company’s latest release, GFx 4.0, which includes an all-new multi-threaded renderer and mobile compatibility, has been developed in conjunction with Intel to allow detailed GPA profiling

Havok added instrumentation to the latest version of Havok Physics

Scaleform was used to create Darkspore UI and tuned using Intel GPA

Instrumenting Code for Platform Analyzer

#include <ittnotify.h>

void System::DoWork( … )
{
    // Mark the start of the task so it appears in Platform Analyzer
    __itt_begin_task( "System::DoWork" );

    // do work

    // Mark the end of the task
    __itt_end_task();
}
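
A begin/end pair like the one above is easy to unbalance when a function has early returns. One common C++ pattern, shown here as a sketch and not as part of the ITT API, wraps the pair in a scope guard. `begin_task` and `end_task` below are hypothetical stand-ins that only record calls, so the example compiles without `ittnotify.h`. In real code they would forward to the profiler's task calls.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the profiler's begin/end task calls.
// They only log what was called so the sketch is self-contained.
static std::vector<std::string> g_events;

void begin_task(const char* name) {
    g_events.push_back(std::string("begin ") + name);
}

void end_task() {
    g_events.push_back("end");
}

// Scope guard: begins a task on construction and ends it on destruction,
// so every exit path from the enclosing scope closes the task.
class ScopedTask {
public:
    explicit ScopedTask(const char* name) { begin_task(name); }
    ~ScopedTask() { end_task(); }
    ScopedTask(const ScopedTask&) = delete;
    ScopedTask& operator=(const ScopedTask&) = delete;
};

// Hypothetical worker with an early return.
bool do_work(bool skip) {
    ScopedTask task("System::DoWork");
    if (skip) {
        return false;  // the task still ends here via the destructor
    }
    // do work
    return true;
}
```

Calling `do_work(true)` records a begin event followed by an end event even though the function returns early.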

• Per frame visualization of middleware activity

• Tasks associated with Task Groups

• Overview statistics for the current trace displayed

• Tabs display information for selected tasks

• Relations, such as dependencies, between tasks shown in hierarchy

• CPU/GPU and DirectX* metrics displayed per selected frame

• Tasks timeline displays instrumented code as it is executed over time

• CPU/GPU Frames and DirectX* calls displayed by default

Pre-instrumented components

• DirectX* interceptor used by Intel® GPA

• Threading Building Blocks

• Scaleform and Havok are the first two; more pre-instrumented middleware to come

Where do I get Intel® GPA 4.0?

http://www.intel.com/software/gpa

• Join the Visual Adrenaline program

• Come see us at the Intel Booth at Expo, North Hall #1212

www.intel.com/software/gdc

• Monetizing Games on Devices: Intel’s AppUp (Business, Room 302, Wed 4:30-5:30)

• This is your brain on game development (Business, Thu 9:00-10:00)

• Adaptive Order Independent Transparency (Programming, Thu 1:30-2:30)

• Dynamic resolution rendering (Programming, Room 110, Fri 9:30-10:30)

• Increase Your FPS with CPU Onload (Programming, Room 110, Fri 11:00-12:00)

• Hotspots, Flops and uOps (Programming, Room 123, Fri 2:00-3:00)

• PC Gaming’s Global Value Propositions (Business, Fri 2:00-3:00)

• Delivering Demand-Based Worlds with Intel® SSDs (Programming, Room 110, Fri 3:30-4:30)

Legal Disclaimers

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.

Intel may make changes to specifications, product descriptions, and plans at any time, without notice.

The Intel processor and/or chipset products referenced in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change.

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

* Other names and brands may be claimed as the property of others.

Copyright © 2010, Intel Corporation. All rights reserved.

Optimization Notice

Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options.” Many library routines that are part of Intel® compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel microprocessors.

Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.

Notice revision #20101101

Backup

New Intel® GPA 4.0 Features

• 64-bit games support

• SandyBridge GPU Metrics

• Platform View

• Easy game launch

• Configurable In-Game HUD

• Full DirectX 11 support

• New single system workflow

PC Volumes will Dwarf Console Volumes by 2014

• Total PC Gaming TAM for DT & NB (GPU+Proc/Gfx) represents the largest opportunity

– 550mu 2009 growing to 913mu by 2014

• Overall Console TAM growing slowly but being rapidly outpaced by PC TAM

– 236mu Consoles in 2009 growing to 318mu by 2014

[Chart: PC Gaming (DX9-11) vs 6th & 7th Gen Console. Vertical axis: WW Consumer PC TAM (Mu), 0 to 1000. Horizontal axis: 2009 to 2014. Series: PC Gaming, PS2, Wii/PS3/360, Potential for NextGen Console]

PCs in 2009 are 2.32x bigger than 6th/7th Gen Consoles combined. By 2014 it becomes 3.08x bigger

Source: PCGA 2010 Horizon’s Hardware Report (IDC)

New SandyBridge metrics cover all GPU stages

[Diagram: GPU pipeline blocks: Vertex Fetch, Vertex Shader, Geometry Shader, Clipper, Setup, Windower, Array of Unified Execution Units (EUs), Sampler, DataPort, PixelOps, Cache]

Two types:

• D3D Pipeline Statistics

• GPU Performance Statistics

Examples:

• GPU Active Cycles – EU Cores Array

• VS Invocations – Vertex Shader

• Post-Clip Primitives – Clipper

• Prim. Setup Active – Setup

• Pixel Rendered – PixelOps

• Post-Filter Texels – Sampler

Metrics Available in GPA 4.0

Graphics HW: GPU Duration, VS Duration, PS Duration, GS Duration

CPU/App: CPU n Load, Aggregated CPU Load, Target App CPU Load, Frame Time, Frames Per Second, Application Time, Frame Number, GPU Frame Number

DirectX*: Draw Calls, State Changes, VB Locks, VB Lock Time, IB Locks, IB Lock Time, Locks, Lock Time, State Block Applies, State Block Captures, RT Changes, Color Fills, Surface Updates, Stretch Rects, Surface Locks, Volume Locks, Surface Lock Time, Volume Lock Time, Texture1D Maps, Texture2D Maps, Texture3D Maps, Buffer Maps, Z/Stencil Clears, RT Clears, Resource Copy, Subresource Copy, Subresource Update, Texture Creations, Surface Creations, IB Creations, VB Creations, Buffer Creations, Resource Creations, RT Data Gets

Extended Metrics Available on Intel HW in GPA 4.0

CPU/App: CPU n Load, Aggregated CPU Load, Target App CPU Load, Frame Time, Frames Per Second, Application Time, Frame Number, GPU Frame Number

DirectX*: Draw Calls, State Changes, VB Locks, VB Lock Time, IB Locks, IB Lock Time, Locks, Lock Time, State Block Applies, State Block Captures, RT Changes, Color Fills, Surface Updates, Stretch Rects, Surface Locks, Volume Locks, Surface Lock Time, Volume Lock Time, Texture1D Maps, Texture2D Maps, Texture3D Maps, Buffer Maps, Z/Stencil Clears, RT Clears, Resource Copy, Subresource Copy, Subresource Update, Texture Creations, Surface Creations, IB Creations, VB Creations, Buffer Creations, Resource Creations, RT Data Gets

Graphics HW: GPU Duration, GPU EUs Active, GPU EUs Stalled, GPU Frequency, EUs Active in GS, EUs Active in PS, EUs Active in VS, EUs Stalled in GS, EUs Stalled in PS, EUs Stalled in VS, Primitive Count, Vertex Count, VS Duration, VS Invocations, GS Duration, GS Invocations, Post-GS Primitives, PS Duration, PS Invocations, PS Killed Pixels, Clipper Active, Clipper Invocations, Post-Clip Primitives, Primitive Setup Active, Blended Pixels, Pixels Rendered, GPU Texture Reads, Post-Filter Texels, Sampler Busy, Sampler Stalled
