Jeff Rous – Intel
Niklas Smedberg – Epic Games
In The Trenches: Optimizing UE4 For Intel
Intel Software – Developer Relations Division
Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Legal
Copyright © 2016 Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.
Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.Intel.com/performance
Iris™ graphics is available on select systems. Consult your system manufacturer.
Intel, Intel Inside, the Intel logo, Intel Core and Iris are trademarks of Intel Corporation in the United States and other countries.
Agenda
Rationale
How We Measured
Common Pain Points
Shader Optimizations
Optimizing for DX12
Android x86/x64 Support
Why Work Together?
Benefits all games that use the engine
UE4 runs on more hardware
Intel has an 18% GPU share as of the last Steam survey
Optimizations help everyone – high end to phone
Common goals
Leading edge APIs like DX12 are going to power tomorrow’s games
Android is a large market and key for Epic and Intel
Intel® HD Graphics: Roadmap
Sandy Bridge (2011)
Intel® 2nd Gen Core™ Processor
• 32nm
• Feature Level 10.1
• Up to 12 EUs
Ivy Bridge (2012)
Intel® 3rd Gen Core™ Processor
• 22nm
• Feature Level 11.0
• Up to 16 EUs
Haswell (2013)
Intel® 4th Gen Core™ Processor
• Feature Level 11.1
• DX Extensions
• GT3 (40 EUs)
• EDRAM
• Iris Pro™, Iris™ brands
Broadwell (2014)
Intel® 5th Gen Core™ Processor
• 14nm
• Feature Level 11.2
• Up to 48 EUs
Skylake (2015-16)
Intel® 6th Gen Core™ Processor
• Feature Level 12.0
• GT4 (72 EUs)
• GT3e 15/28W
• DX12 HW
Up to 30X faster graphics over last 5 years
Intel® HD Graphics: EDRAM
Basic facts
Located on the same package as the CPU
64-128MB
Bandwidth – 50 GB/sec each way (100 GB/sec total BW)
Acts as a 4th-level cache
Just works: no API is required to use it and take advantage of it
Bandwidth Saving
Increasing compute requires more bandwidth
EDRAM helps to reduce BW consumption and improve EU efficiency
Just works but efficiency can be improved by re-using frame data
[Diagram: CPU Package (Intel 6th Gen Core™ chip) with labels CPU Core ×4, Ring-bus, LL$, System Memory, Gfx Core, EDRAM]
How We Measured – Intel GPA
Use ToggleDrawEvents command
Frame debugging and live mode
Experiment!
How We Measured
ProfileGPU command
Stat commands
Windows Performance Analyzer
Intel Extreme Tuning Utility
Intel Pain Points – Memory Bandwidth
Memory bandwidth is at a premium with integrated graphics
G-buffers are memory hungry. UE4 is configurable: you can change the format of G-buffer channels, eliminate them, or even combine them. Scaling the resolution of the G-buffers is good up to a point.
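To make the bandwidth cost concrete, here is a rough back-of-the-envelope sketch. The struct, the function names, and the "written once, read once per frame" traffic model are illustrative assumptions, not UE4 code or measured data; real traffic depends on caching, compression, and how many passes touch each target.

```cpp
#include <cstdint>

// Hypothetical description of one G-buffer render target (not a UE4 type).
struct RenderTargetDesc {
    std::uint32_t width;
    std::uint32_t height;
    std::uint32_t bytesPerPixel;
};

// Bytes touched when every pixel of the target is written (or read) once.
constexpr std::uint64_t bytesPerPass(const RenderTargetDesc& rt) {
    return static_cast<std::uint64_t>(rt.width) * rt.height * rt.bytesPerPixel;
}

// Approximate bytes per second for `targetCount` identical targets.
// Assumption: each target is written once and read once per frame.
constexpr std::uint64_t gbufferBytesPerSecond(const RenderTargetDesc& rt,
                                              std::uint32_t targetCount,
                                              std::uint32_t framesPerSecond) {
    return bytesPerPass(rt) * 2u * targetCount * framesPerSecond;
}
```

Under these assumptions, four 32-bit targets at 1920x1080 and 60 fps come to roughly 4 GB/sec, while the same layout at 1280x720 comes to roughly 1.8 GB/sec, which is why format and resolution choices matter on integrated graphics.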
Intel Pain Points – Dense Geometry
With sub-pixel triangles or very dense meshes, vertex shader execution can’t be covered by pixel shader execution, which starves the hardware. Use LODs where possible.
The clipper can become a bottleneck in the worst cases. Use frustum culling on bounding boxes at the very least, and occlusion culling for hidden objects.
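As an illustration of the bounding-box frustum test, here is a minimal CPU-side sketch. All types and function names are hypothetical and are not UE4's culling code; the planes are assumed to have normals pointing into the frustum. The test is conservative: it never rejects a visible box, but it can keep some boxes that sit just outside a frustum corner.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane in the form dot(normal, p) + d = 0, normal pointing into the frustum.
struct Plane { Vec3 normal; float d; };

// Axis-aligned bounding box given by its minimum and maximum corners.
struct Aabb { Vec3 min, max; };

// Returns true when the box lies entirely outside at least one plane,
// so the object can be skipped before it reaches the GPU.
inline bool isOutsideFrustum(const Aabb& box, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        // Pick the box corner that lies farthest along the plane normal.
        const Vec3 v{p.normal.x >= 0.0f ? box.max.x : box.min.x,
                     p.normal.y >= 0.0f ? box.max.y : box.min.y,
                     p.normal.z >= 0.0f ? box.max.z : box.min.z};
        const float dist = p.normal.x * v.x + p.normal.y * v.y + p.normal.z * v.z + p.d;
        if (dist < 0.0f) return true;  // Even the most-inside corner is outside.
    }
    return false;
}

// Illustrative "frustum" for the usage example: the cube [-1, 1]^3.
inline std::array<Plane, 6> makeUnitCubeFrustum() {
    return {{{{1.0f, 0.0f, 0.0f}, 1.0f}, {{-1.0f, 0.0f, 0.0f}, 1.0f},
             {{0.0f, 1.0f, 0.0f}, 1.0f}, {{0.0f, -1.0f, 0.0f}, 1.0f},
             {{0.0f, 0.0f, 1.0f}, 1.0f}, {{0.0f, 0.0f, -1.0f}, 1.0f}}};
}
```

A box spanning (2,2,2) to (3,3,3) is rejected against this cube, while a box that straddles one of its faces is kept and left to the clipper.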
A Word About Power
Intel graphics are typically found in low-power systems.
Less CPU usage leaves more of the shared power budget for graphics.
Shaders – Local Memory
64-byte cache lines benefit a great deal from loop unrolling.
Avoid small loads in tight loops.
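The sketch below is a CPU-side C++ analogy, not shader code. It contrasts a loop that issues one 4-byte load per iteration with a loop whose body consumes a full 64-byte line per iteration. The function names are hypothetical; in HLSL the equivalent would be wider loads in a loop marked with the `[unroll]` attribute. Whether it helps on a given GPU should be confirmed by measurement.

```cpp
#include <cstddef>

constexpr std::size_t kCacheLineBytes = 64;
constexpr std::size_t kFloatsPerLine = kCacheLineBytes / sizeof(float);

// Baseline: one small (4-byte) load per loop iteration.
inline float sumSmallLoads(const float* data, std::size_t count) {
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) total += data[i];
    return total;
}

// Each outer iteration consumes one cache line worth of floats.
inline float sumPerCacheLine(const float* data, std::size_t count) {
    float lanes[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + kFloatsPerLine <= count; i += kFloatsPerLine) {
        // Fixed trip count, so a compiler can fully unroll this inner loop.
        for (std::size_t j = 0; j < kFloatsPerLine; j += 4) {
            lanes[0] += data[i + j + 0];
            lanes[1] += data[i + j + 1];
            lanes[2] += data[i + j + 2];
            lanes[3] += data[i + j + 3];
        }
    }
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < count; ++i) total += data[i];  // Leftover elements.
    return total;
}
```

Both functions return the same sum for inputs whose partial sums are exactly representable; with general floating-point data the different summation order can change the last bits of the result.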
Shaders – Branching and Sampling
Using lots of temporaries can starve the hardware.
Branching is expensive if the loads are inside the conditional blocks.
Group the loads as early in the shader as possible to help cover latency.
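The restructuring described above can be sketched in C++ as a stand-in for shader code. `Texture`, `sample`, and both shading functions are hypothetical names for illustration only. The second version issues both fetches before any dependent math and selects afterwards. It trades an extra fetch for better latency hiding, so it is worth profiling rather than applying blindly.

```cpp
#include <cstddef>

// Hypothetical stand-in for a texture: a flat array sampled by index.
// Assumes size > 0.
struct Texture {
    const float* texels;
    std::size_t size;
};

inline float sample(const Texture& t, std::size_t i) { return t.texels[i % t.size]; }

// Loads inside the branches: a fetch cannot start until the condition resolves.
inline float shadeBranchy(const Texture& a, const Texture& b, std::size_t i, bool useA) {
    if (useA) {
        return sample(a, i) * 2.0f;
    }
    return sample(b, i) + 1.0f;
}

// Loads grouped first: both fetches are issued up front, then the result is selected.
inline float shadeLoadsFirst(const Texture& a, const Texture& b, std::size_t i, bool useA) {
    const float texA = sample(a, i);
    const float texB = sample(b, i);
    return useA ? texA * 2.0f : texB + 1.0f;
}
```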
Demo - DX12 In Engine Metrics
DX12 Performance – Fast Clear
Specify the optional D3D12_CLEAR_VALUE when calling CreateCommittedResource
Intel hardware has a fast clear path for 1-bit-per-channel clear values (each channel either 0 or 1), e.g. (1,0,1,0)
When clearing, use the color specified up front at resource creation for maximum performance.
~9% performance gain on Elemental Demo on DX12!
In the engine today
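The two conditions above can be sketched as plain C++ checks that do not depend on the D3D12 headers. `ClearColor`, `isFastClearCandidate`, and `RenderTarget` are hypothetical names, not engine or driver API, and reading "1 bit" as "each channel is exactly 0 or 1" is an interpretation based on the (1,0,1,0) example. In real code the creation-time color is the `pOptimizedClearValue` argument of `ID3D12Device::CreateCommittedResource`, and the per-frame clear is `ID3D12GraphicsCommandList::ClearRenderTargetView`.

```cpp
#include <array>

using ClearColor = std::array<float, 4>;  // RGBA

// True when every channel is exactly 0 or 1, e.g. (1,0,1,0).
// Assumption: this is what qualifies for the fast clear path.
inline bool isFastClearCandidate(const ClearColor& color) {
    for (float channel : color) {
        if (channel != 0.0f && channel != 1.0f) return false;
    }
    return true;
}

// Hypothetical wrapper that remembers the clear color given at creation time.
struct RenderTarget {
    ClearColor optimizedClear;

    // True when a requested clear matches the color specified up front.
    bool clearMatchesOptimized(const ClearColor& requested) const {
        return requested == optimizedClear;
    }
};
```

A debug build could assert both checks whenever a render target is cleared, to catch clears that silently fall off the fast path.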
DX12 Performance – Root Signature
Blueprint of resources available
Root constants
Root descriptors
Descriptor tables
Constants that sit directly in root are copied to each invocation of the shader (pushed) rather than read from memory when used (pulled)
Can significantly speed up shader execution
Automatically handled by driver in DX11
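Root space is limited, so deciding what to push as root constants is a budgeting exercise. The sketch below uses the documented D3D12 costs: a root signature holds at most 64 DWORDs, each 32-bit root constant costs 1 DWORD, each root descriptor costs 2 DWORDs, and each descriptor table costs 1 DWORD. The `RootLayout` struct and function names are hypothetical helpers, not part of the D3D12 API.

```cpp
#include <cstdint>

// Hypothetical summary of what a root signature contains.
struct RootLayout {
    std::uint32_t constants32Bit;    // Number of 32-bit root constants.
    std::uint32_t rootDescriptors;   // Inline CBV/SRV/UAV descriptors.
    std::uint32_t descriptorTables;  // Descriptor table pointers.
};

constexpr std::uint32_t kMaxRootSignatureDwords = 64;

// Total root signature size in DWORDs under the documented per-entry costs.
constexpr std::uint32_t rootSignatureDwords(const RootLayout& layout) {
    return layout.constants32Bit * 1u
         + layout.rootDescriptors * 2u
         + layout.descriptorTables * 1u;
}

constexpr bool fitsInRootSignature(const RootLayout& layout) {
    return rootSignatureDwords(layout) <= kMaxRootSignatureDwords;
}
```

For example, a 4x4 float matrix pushed as 16 root constants plus two root descriptors and three tables uses 23 of the 64 DWORDs.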
Video - GPA Live Metrics on Android
Android x86/x64 Support
Native apps reduce CPU load, startup times and power consumption
Supported in UE4 today through editor menu
Requires source build
Package as fat or separated APKs
Popular toolchains support x86/x64
Intel INDE
Google NDK
Nvidia CodeWorks
OpenGL ES 3.1 + Android Extension Pack
Supported on latest Intel tablets (Acer Predator 8, Lenovo Yoga Tab 3)
Enabled in UE4 for highest end mobile visuals
Runs with deferred renderer
ASTC textures
PC features are now on mobile
Compute shaders
Indirect drawing
And Announcing… Fast ASTC Compression
Next gen format (OpenGL ES, Vulkan)
Very good compression on RGB/RGBA for variety of block sizes
UE4 is adding support for Intel’s fast texture compressor for ASTC
44x speed improvement
Quality comparable to ARM compressor
UE4 uses Intel’s BC6H/BC7 compressors already
Aiming for 4.12 release
ASTC Quality Comparison
Zoomed in portion of a 2048x2048 normal map
Original: 12 MB
ETC1: 2 MB
ASTC 6x6: 1.8 MB
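The sizes quoted for the 2048x2048 map can be reproduced from the block layouts of the formats. This sketch assumes a single mip level and an uncompressed RGB source at 8 bits per channel, which is consistent with the 12 MB figure. ETC1 stores 64 bits per 4x4 block, and ASTC stores 128 bits per block regardless of block footprint. The function names are illustrative.

```cpp
#include <cstdint>

// Number of blocks needed to cover `texels` along one axis (rounds up).
constexpr std::uint64_t blocksAlong(std::uint32_t texels, std::uint32_t blockDim) {
    return (static_cast<std::uint64_t>(texels) + blockDim - 1) / blockDim;
}

// Uncompressed RGB, 8 bits per channel (assumed source format).
constexpr std::uint64_t uncompressedRgb8Bytes(std::uint32_t w, std::uint32_t h) {
    return static_cast<std::uint64_t>(w) * h * 3u;
}

// ETC1: 8 bytes per 4x4 block.
constexpr std::uint64_t etc1Bytes(std::uint32_t w, std::uint32_t h) {
    return blocksAlong(w, 4) * blocksAlong(h, 4) * 8u;
}

// ASTC: 16 bytes per block for any block footprint.
constexpr std::uint64_t astcBytes(std::uint32_t w, std::uint32_t h,
                                  std::uint32_t blockW, std::uint32_t blockH) {
    return blocksAlong(w, blockW) * blocksAlong(h, blockH) * 16u;
}
```

For 2048x2048 this gives 12,582,912 bytes uncompressed, 2,097,152 bytes for ETC1, and 1,871,424 bytes (about 1.8 MB) for ASTC 6x6, matching the figures on the slide.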
Wrap up
Test on Intel hardware early. UE4 is powerful but you can easily bring down a high end discrete card. With proper optimization UE4 runs really well on Intel hardware.
Take advantage of scaling features in UE4 – Epic has done a lot of work to support lower end hardware.
UE4 is mobile ready – Take advantage of built-in Android x86/x64 and OpenGL ES 3.1 support in your games for better performance and visuals.
Links
Intel Developer Zone (software.intel.com)
Unreal Engine 4 (unrealengine.com)
Intel GPA (software.intel.com/en-us/gpa)
ISPC Texture Compressor sample (software.intel.com/en-us/articles/fast-ispc-texture-compressor-update)
Using Android x86 on UE4 (software.intel.com/en-us/articles/Unreal-Engine-4-with-x86-Support)
Intel Software Occlusion Culling sample (software.intel.com/en-us/articles/software-occlusion-culling)