eecs 487: interactive computer graphicssugih/courses/eecs487/... · 2015-10-30 · computer...

9
EECS 487: Interactive Computer Graphics Lecture 21: Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan Console Games Why do games look and perform so much better on consoles than on PCs with equivalent specs? consoles are closed platforms with long shelf live, programmers can program the hardware directly doing away with serialized call-preparation bottleneck allows for better utilization of multiple CPU cores Motivations for low-level graphics APIs: faster graphics from reduced API overhead “close-to-metal,” direct control of the GPU [Anandtech:Smith] Low-Overhead, Low-Level API Whence the high overhead of graphics API? hardware abstractions hide underlying platform diversity, providing programming convenience and flexibility: graphics vs. system programming “newby-friendly” safety nets of error checking and state validation Code gurus (the ones writing game engines and renderers) would rather have performance than hand-holding [Anandtech:Smith] Low-Overhead, Low-Level API Why is this an issue now? GPU performance is far outstripping CPU due to the massively parallel nature of graphics rendering: API overhead at the CPU is throttling GPU performance serialized command assembly prior to issuing draw calls restricts utilization of multi-core CPU instancing and batching objects into a smaller number of draw calls can only help so much Another advantage: easier porting of console games to PCs? [Anandtech:Smith]

Upload: others

Post on 13-Jul-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

EECS487:InteractiveComputerGraphics

Lecture21:•  OverviewofLow-levelGraphicsAPI

Metal,Direct3D12,Vulkan

ConsoleGames

WhydogameslookandperformsomuchbetteronconsolesthanonPCswithequivalentspecs?

•  consolesareclosedplatformswithlongshelflive,programmerscanprogramthehardwaredirectly

•  doingawaywithserializedcall-preparationbottleneckallowsforbetterutilizationofmultipleCPUcores

Motivationsforlow-levelgraphicsAPIs:

•  fastergraphicsfromreducedAPIoverhead

•  “close-to-metal,”directcontroloftheGPU

[Anandtech:Smith]

Low-Overhead,Low-LevelAPI

WhencethehighoverheadofgraphicsAPI?•  hardwareabstractionshideunderlyingplatformdiversity,providingprogrammingconvenienceandflexibility:

graphicsvs.systemprogramming

•  “newby-friendly”safetynetsoferrorcheckingandstatevalidation

Codegurus(theoneswritinggameenginesand

renderers)wouldratherhaveperformancethan

hand-holding

[Anandtech:Smith]

Low-Overhead,Low-LevelAPI

Whyisthisanissuenow?• GPUperformanceisfaroutstrippingCPUduetothemassivelyparallelnatureofgraphicsrendering:API

overheadattheCPUisthrottlingGPUperformance

•  serializedcommandassemblypriortoissuingdrawcallsrestrictsutilizationofmulti-coreCPU

•  instancingandbatchingobjectsintoasmallernumberofdrawcallscanonlyhelpsomuch

Anotheradvantage:easierportingofconsole

gamestoPCs?

[Anandtech:Smith]

Page 2: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

HowtoImprovePerformance?

1. Commandbuffer

a.  reduceddrawcalloverhead

b. bettercommandsubmissionmulti-threading

2. Baked-instates

a.  pipelinestateobjects

b.  resourcebinding

3. Pre-compiledshaders

BiggestSourceofCPUOverhead

Assemblyofcommandstreampriortoissuingadrawcall,e.g.,thegatheringtogetherof

•  linemode

•  polygonmode

•  flatorsmoothshading

•  textureobjectstouse• whichvertexarrayobjects

• whichvertexbufferobjects

•  settingvertexattributepointers•  argumentstodrawcalls

Donebydriver

[Anandtech:Smith]

Queue

Single-threadedJobAssembly

GPUFront-End

cmd

driver

cmd cmd

cmd

[nvidia:Foley]

Problem:single-threadedjobassemblybyCPUisoftennot

fastenoughtokeepGPUbusyCPUThread

CPUThread

Utilizesatmost

2cores

commandbuffer

CommandBuffer/List

Developersself-assemblecommandstreamintoacommandbuffer(Vulkan)orcommandlist(D3D12)

Eachcommandbufferisself-contained,somultiple

bufferscanbeassembledinparallel,eachonitsownthread/corewithoutextraconcurrencywork

Finalsubmissionofthecommandbuffersviathe

commandqueueisstillserial,butishighlyefficient

[Microsoft:Sandy]

Metal D3D12 Vulkan

MTLCommandBuffer() ID3D12CommandList() VkCmdBuffer()

MTLCommandQueue() ID3D12CommandQueue() VkCmdQueue()

Page 3: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

Queue

Multi-ThreadedJobAssembly

GPUFront-End

CPUThread cmd cmd

CPUThread cmd cmd

CPUThread

CPUThread cmd cmd

cmd cmd

cmd

cmd

cmd cmd cmd cmd cmd

[nvidia:Foley]

CPUThread

CommandBufferRe-Use

InVulkanacommandbuffercanbere-used•  a“top-level”commandbuffercan“call”

second-levelcommandbuffers

InD3D12acommandlist“recorded”asabundle

canbesubmittedoncetotheGPUbutexecutedmultipletimes,withdifferentresources,e.g.,

differenttextures(muchlikeOpenGL’sretainedmodedisplaylist)

Metalcurrentlydoesn’tsupportcommandbuffer

re-use

[nvidia:Foley;Microsoft:Sandy]

Direct

3D11

Direct

3D1

2

3DMark–Multi-threadScalingand50%BetterCPUUtilization

UserModedriverworkloaddistributedacrossmultiplethreads/cores

AppLogic

Single-threadedbottleneckreduced

Windowskerneltimereduced

Direct3Ddrivertimereduced

KernelModedrivertimetotallyremoved

[Microsoft:Sandy;Anandtech:Smith]

Imagination’sGnomeHordeNoinstancing

Re-usecommandbuffersforeachtile:

300tiles,13,500draws/frame,30fps,lightCPUusageOver400,000drawcalls/sec,eachwithadifferenttransformationwithmanydifferentmaterials,textures,

blendmodes,andshaders[Imagination:Smith]

Page 4: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

FastMovingCamera

appcpuusage

systemcpuusage

[Imagination:Smith]

Commandbuffersneedtoberegeneratedveryfrequently

single-threadedCPUbottleneckcannot

feedGPUfastenough�lowFPS

Commandbufferassemblydistributedto

multiplecores

couldrunCPUatlowerfrequency

PowerEfficiency

WhenCPUandGPUhavetosharepowerandthermalbudget

•  lowerCPUusageallowsmorepowerandthermalbudgettogotoGPU

•  spreadingworkloadacrossmoreCPUcoresalloweachtorunatalowerclockspeed,furtherreducingpowerusageascomparedtorunningasinglethreadatahighfrequency

(tofeedtheGPU)

[Intel;Lauritzen;Anandtech:Barrett&Smith]

D3D12:CPUuses1/3thepowerofGPU,allowmoreGPUprocessing,renderingspedupbyover70%

D3D12:Ormaintainthesamerenderingperformanceat50%powerusage

D3D11:CPUusesasmuchpowerasGPU

[Intel:Lauritzen]

50,000Asteroids(draws/frame)

PipelineStateObjects(PSOs)

Problem:draw-timevalidationofshaderstatesdelayshardwaresetupandreducesthenumberof

drawcallsperframe

Solution:bake(compileandvalidate)pipelinestates

intoPSOsthatarefinalizedoncreation,switching

PSOshaveloweroverheadthancomputinghardwarestateonthefly

[Microsoft:Sandy]

Page 5: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

PipelineStateObjectsContainsallstaticstateforentire3Dpipeline• shaders,vertexattributeformats,rasterization,colorblend,depthstencil,etc.

Createdoutsideoftheperformancecriticalpaths

PSOcanbecachedforre-use,evensavedtodisk/

cloudforre-useacrossappruns

[nvidia:Daniell;AMD]

PSO

WhatDoesn’tGointoaPSO?Resourcebindings•  theactualvertex,index,constantbuffers•  textures,samplers,etc.

Fixed-functionstatesthatdonotcauseshader

recompilation:viewport,colorblendconstants,polygonoffset,scissor,stencilmasksandrefs,etc.

[nvidia:Foley]

GPUStateVectorPipelineStateObject

Textures Buffers

Samplers

BindingTables

DescriptorTablesandPool/HeapProblem:tousedifferentresources,e.g.,texture,anappmustbindandrebindthemtofixedandlimited

bindslots(descriptors)andissuemultipledrawcalls

[Microsoft:Sandy;nvidia:Foley]

GPUStateVector

PipelineStateObject

TexturesBuffers

Samplers

BindingTables

DescriptorTableandHeap/PoolSolution:pre-writemultiplesetsofdescriptorstodescriptorheap;changingresourcessimplyswitches

descriptorsetsalreadyresidentinGPUmemory

[nvidia:Foley;Microsoft:Sandy]

GPUStateVector

PipelineStateObject

Textures

Buffers

DescriptorTables

Samplers

RootTable

Page 6: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

GPUMemoryManagement

Withhigh-levelAPI,topassdatafromapptoGPU,firstallocateadriver-managedbufferand

copythedatabeforepassingthedatatotheshader⇒CPUoverhead

Withlow-levelAPI,adevelopersimplymapsthe

GPUmemoryaddressandwritestothatmemorylocationdirectly,noCPUintervention

[Imagination:Smith]

Pre-CompiledShader

Vulkan:•  pre-compilesshadersintoacommonintermediaterepresentation

•  providesomeIPprotection,developerscandistributeshadersinacompiledintermediaterepresentationinsteadofinsource

•  pre-compiledshadersalsospeedupdrawcalls

Metalalsopre-compilesshaders

[Anandtech:Smith]

[Khronos]

andothers

opensource

GameEnginesotherlanguages,e.g.,C++ShadingLanguage

StandardPortableIntermediateRepresentation:coreinVulkan

VulkanShaderProgramming

otherIR,e.g.,LLVM

GLSLtoSPIR-Vcompiler

opensourcetranslator

MorePredictablePerformance

Previously:appsubmitsadrawcall,mapsabuffer,etc.

Drivermight(GPUdependent):

•  compileshaders

•  insertsynchronizationfencesintoGPUschedule•  flushcaches•  allocatememory

Withlow-levelAPIalltheabovemustbedonebythe

appitself,butdriverperformanceacrossvendorsbecomesmorepredictable

[nvidia:Foley]

Page 7: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

WhyVulkanisNotforBeginners

Musthandlemulti-threadingandconcurrency/synchronization

Mustmanagememoryallocationandusage

TheseareoptionalinOpenGL,butmandatory

inVulkan

SummaryofFeatures

[nvidia:Foley;Anandtech:Smith]

Tech Metal Direct3D12 Vulkan

commandbuffer ✔ ✔ ✔

pipelinestateobjects ✔ ✔✔

descriptortable ✗ ✔✔

tile-basedrenderpass ✔ ✗ ✔

multi-adapter ✗ ✔ ✔?

VulkanandD3D12:• bothsimilartoMantletostartwith• Mantlesupportsmulti-GPU

• notaslowlevel,tobecross-vendorandcross-platform

Tile-basedArchitectures

“MobileGPU”usuallymeans“tile-basedGPU”• mostAndroidandalliOSdevicesusetile-basedrendering

•  VulkanandMetalhavesupportfortile-basedarchitecture,butnotDirect3D 12

• tilingreducesuseofexpensiveoff-chipmemorybandwidth

[Google:Hall;Imagination:Sommefeldt]

Immediate-ModeRendering

Fragmentshading,includingtexturesampling,performedevenonfragmentsthatwilleventually

failthedepthtest• requiresaccessingoff-chipmemory

•  inefficientuseofoff-chipbandwidth

[Imagination:Sommefeldt;Merry]

Page 8: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

Tile-basedArchitectures

Tile-basedrenderingsplitsframebufferupintotiles(e.g.,16×16or32×32pixels)andsortalltrianglesontileusingon-chipstoragebeforefragmentshading

[Merry]

Multi-AdapterSupport

PCscancontainmultiplegraphicscards

Appscanenumerategraphicscards

•  cancreateadeviceabstractionforeach

SomegraphicscardshavemultipleGPUs

• eachwithitsownenginesandmemory

Appsshouldbeabletoassignworkto

anyGPUonanygraphicscard• createqueuesonanyengineandsubmitcommandbuffers

• allocateresourcesinmemoryassociatedwithanyGPU

[Microsoft:Boyd]

Multi-AdapterSupport

Options:• Alternate-framerendering(AFR):framepacingbecomesanissueiftheGPUsareofdifferentperformance

• Split-framerendering

• Worksharingofindividualframes

D3D12ExplicitMulti-Adapter(EMA)modeallows

exchangeofmultipledatatypesbetweenGPUs,beyondjustfinished,renderedimages

ButtransferringdataoverPCIebusisslowandwith

highlatency!

[Anandtech:Smith]

FeatureSets/Levels

Hardwarefeaturescoping•  canbedefinedfordifferentplatformsorversionsoftheAPI

•  allfeatureslistedinasetmustbesupported

•  developerscandevelopagainstFeatureSets•  featuresenabledatdevicecreationtime

[Microsoft:Sandy;Anandtech:Smith]

Apple Apple?Imagination?Khronos?Lockedout?

Page 9: EECS 487: Interactive Computer Graphicssugih/courses/eecs487/... · 2015-10-30 · Computer Graphics Lecture 21: • Overview of Low-level Graphics API Metal, Direct3D 12, Vulkan

ReferencesSmith,R.,“MicrosoftAnnouncesDirectX12,”Anandtech,Mar.24,2014Smith,R.,“UnderstandingAMDMantle,”Anandtech,Sep.26,2013Smith,R.,“SomeThoughtsonApple’sMetalAPI,”Anandtech,Jun.3,2014Smith,R.,“KhronosAnnouncesNextGenerationOpenGLInitiative,”Anandtech,Aug.11,2014Chester,B.,“ComparingOpenGLEStoMetaloniOS,”Anandtech,Jun.15,2015Sandy,M.,“DirectX12,”DirectXDeveloperBlog,Mar.20,2014Lauritzen,A.,“DirectX12onIntel,”Aug.11,2014Yeung,A.,“DirectX12–LookingbackatGDC2015,”Mar.9,2015Langley,B.,“Windows10 andDirectX12Released!,”DirectXDeveloperBlog,Jul.29,2015Smith,R.,“MicrosoftDetailsDirect3D11.3&12NewRenderingFeatures,”Anandtech,Sep.18,2014Smith,R.,“TheDirectX12PerformancePreview,”Anandtech,Feb.6,2015Smith,R.andCutress,I.,“ExploringDirectX12:3DMarkAPIOverheadFeatureTest,”Anandtech,Mar.27,2015Smith,R.,“NextGenerationOpenGLBecomesVulkan,”Anandtech,Mar.3,2015Smith,A.,“TryingoutthenewVulkangraphicsAPIonPowerVRGPUs,”ImaginationPowerVRGraphicsBlog,Mar.3,2015Smith,A.,“GnomespersecondinVulkanandOpenGLES,”ImaginationPowerVRGraphicsBlog,Aug.10,2015Foley,T.,“Next-GenerationGraphicsAPIs:SimilaritiesandDifferences,”ACMSIGGRAPH2015Sellers,G.,“AWhirlwindTourofVulkan,”ACMSIGGRAPH2015Hall,J.,“VulkanonAndroid,”ACMSIGGRAPH2015Hall,J.,“UsingNext-GenerationAPIsonMobileGPUs,”ACMSIGGRAPH2015Daniell,P.,“VulkanonNVIDIAGPUs,”ACMSIGGRAPH2015Boyd,C.,“Direct3D12,”ACMSIGGRAPH2015Yeung,A.,“DirectX12Multiadapter,”DirectXDeveloperBlog,Apr.30,2015Smith,R.,“GeForce+Radeon:PreviewingDirectX12 Multi-Adapter,”Anandtech,Oct.26,2015Merry,B.,“PerformanceTuningforTile-BasedArchitecture,”OpenGLInsights,eds.Cozzi,P.andRiccio,C.,2012