HC-4015, An Overview of the HSA System Architecture Requirements, by Paul Blinzer
DESCRIPTION
Presentation HC-4015 by Paul Blinzer at the AMD Developer Summit (APU13), November 11-13, 2013.
TRANSCRIPT
1 | THE HSA PLATFORM SYSTEM ARCHITECTURE SPECIFICATION – AN OVERVIEW | NOVEMBER 12, 2013 | APU13
THE HSA SYSTEM ARCHITECTURE REQUIREMENTS – AN OVERVIEW
PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD
SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION
AGENDA
• What is the HSA Foundation?
• The System Architecture Workgroup and its goals
• What defines HSA platforms and components?
• The Shared Virtual Memory requirements
• The HSA Memory Model requirements
• The HSA Queuing Architecture
• Some other requirements set by the System Architecture specification
• Where to find further information
• Q & A
WHAT IS THE HSA FOUNDATION?
• The HSA Foundation is a not-for-profit consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices
• It spans multiple host platform architectures and programmable data parallel components (e.g. CPU: x86, ARM, MIPS, …; device types: GPUs, DSPs, …) that work collaboratively within the same HSA system architecture
• It defines a set of specifications that define HW & SW platform requirements to enable applications to target the feature set from high-level languages and APIs
• It is not a replacement for e.g. OpenCL but complementary to it, defining the system-level properties “below the API”, leveraged by application and system software
• The System Architecture specification defines the required component and platform features for HSA compliant components
• This presentation is an overview of the current System Architecture definitions and does not represent a complete or “final” state
  ‒ That one is the specification itself when available ☺
• This is the short version…
[Figure: the HSA specification set. Labels: Platform (Software) System Architecture Specification, Programmer’s Reference Manual, System Runtime Specification, Conformance Tools]
THE SYSTEM ARCHITECTURE WORKGROUP OF THE HSA FOUNDATION
• Who participates and what are the goals?
• The workgroup membership spans a wide variety of IP and platform architecture owners
  ‒ Several host platform architectures are targeted
• The specifications define a common set of platform properties that provide a dependable hardware and system foundation for application software, libraries and runtimes
• The goal is to eliminate “weak points” in the system software and hardware architecture of traditional platforms that lead to unnecessary overhead in the operation of data parallel workloads
• The main deliverables are:
  ‒ A well-defined, consistent and dependable memory model that all HSA agents operate in
  ‒ Shared access to process virtual memory between HSA agents (“ptr-is-ptr”)
  ‒ Low-latency workload dispatch contained in user-mode queues
  ‒ Scalability across a wide range of platforms
  ‒ These properties are leveraged in the “HSA Programmer’s Reference”, HSAIL and HSA Runtime specifications
WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
• In short, an HSA compatible platform consists of “HSA agents” (hardware components that participate in the HSA memory model) adhering to the various system architecture requirements
• Each HSA agent adheres to the same queuing & dispatch mechanics, low-latency synchronization primitives, memory coherence and data visibility (memory model) requirements
  ‒ Defined mainly in the “(Software) System Architecture” specification
  ‒ The HSAIL and “Programmer’s Reference Manual” specifications define the software execution model
  ‒ Architected mechanisms to enqueue and dispatch workloads from one HSA agent queue to another eliminate the need to use the host CPU for these purposes in many scenarios
  ‒ Architected infrastructure allows exchanging data with non-HSA compliant components in a platform
  ‒ Fundamental data types are naturally aligned
WHAT DEFINES HSA PLATFORMS AND COMPONENTS?
Properties | Small Machine Model | Large Machine Model
Platform targets | Embedded or personal device space (controllers, smartphones, etc.) | PC, workstation, cloud server, etc. running more demanding workloads
Native pointer size | 32bit | 64bit (+ 32bit ptr if 32bit processes are supported)
Floating point size | Half (FP16*), Single (FP32) precision | Half (FP16*), Single (FP32), Double (FP64) precision
Atomic ops size | 32bit | 32bit, 64bit
*min. load and store on memory
‒ There are two different machine models (“small” and “large”) that target different functionality levels
‒ They take into account different feature requirements for different platform environments
‒ In all cases, the same HSA application programming model is used to target HSA agents and provides the same power-efficient and low-latency dispatch mechanisms, synchronization primitives and SW programming model
‒ Applications written to target HSA small model machines will generally work on large model machines, too
‒ If the large model platform and host Operating System provide a 32bit process environment
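As a rough illustration, the two machine models and the compatibility rule for small-model applications can be written down as data. This is only a sketch: `MachineModel`, `SMALL`, `LARGE` and `can_run` are hypothetical names and not part of any HSA specification or API, and the rule that a large-model application cannot run on a small-model platform is an assumption drawn from the pointer sizes, not a statement from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineModel:
    """Properties of an HSA machine model, as listed on the slide."""
    name: str
    pointer_bits: int      # native pointer size
    float_bits: tuple      # supported floating point sizes
    atomic_bits: tuple     # supported atomic operation sizes


# Values from the slide; FP16 only requires load and store on memory.
SMALL = MachineModel("small", 32, (16, 32), (32,))
LARGE = MachineModel("large", 64, (16, 32, 64), (32, 64))


def can_run(app_model, platform_model, platform_has_32bit_processes=False):
    """Hypothetical check: can an app built for app_model run on the platform?"""
    if app_model.pointer_bits == platform_model.pointer_bits:
        return True
    if app_model.pointer_bits == 32 and platform_model.pointer_bits == 64:
        # Small-model apps need a 32bit process environment on a large platform.
        return platform_has_32bit_processes
    # Assumption: 64bit pointers do not fit a small-model platform.
    return False
```

For example, `can_run(SMALL, LARGE, platform_has_32bit_processes=True)` is true, while `can_run(SMALL, LARGE)` is false under the default.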
THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (1)
• Each HSA agent adheres to the same user process address space view as the host CPU
  ‒ HSA operates in a “flat” virtual address space, using 64bit & 32bit ptrs depending on application/machine model
  ‒ A pointer value references the same memory for every HSA agent
  ‒ An HSA agent can “walk” or update linked data structures directly without any assistance from a host CPU
• The process address view is established by the hardware’s page table mappings
  ‒ HSA agent virtual address range matches the host platform (e.g. 48bit, 32bit, …)
  ‒ HSA agents always operate at “user privilege” of the host CPU, policy enforced by system
  ‒ HSA agents observe the same memory page table attributes (cache, read, write, …) and page sizes as the host CPU, policy enforced by system
• HSA agents support page faults, allowing them to operate directly on pageable memory as provided by the Operating System environment
  ‒ For allocated pageable memory, System Software takes page faults, commits memory, loads contents from backing store and restarts execution like it does for any access from host CPU threads
  ‒ There is no tedious device buffer copy, explicit page lock or similar needed for an HSA agent to access data in allocated memory directly!
  ‒ The basis of “ptr-is-ptr”
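The “ptr-is-ptr” idea can be sketched with a toy model in which the host and an agent share one flat address space. Everything here is illustrative: `AddressSpace`, `host_build_list` and `agent_sum_list` are hypothetical names, and a Python dictionary merely stands in for process virtual memory.

```python
class AddressSpace:
    """Toy model of one process's flat virtual address space."""

    def __init__(self, base=0x1000, stride=0x10):
        self._memory = {}          # address -> stored object
        self._next_free = base
        self._stride = stride

    def alloc(self, obj):
        """Store obj and return its 'pointer' (a plain integer address)."""
        addr = self._next_free
        self._memory[addr] = obj
        self._next_free += self._stride
        return addr

    def load(self, addr):
        return self._memory[addr]


NULL = 0


def host_build_list(space, values):
    """Host CPU side: build a singly linked list and return the head pointer."""
    head = NULL
    for value in reversed(values):
        head = space.alloc({"value": value, "next": head})
    return head


def agent_sum_list(space, head):
    """Agent side: walk the list using the very same pointer values."""
    total = 0
    ptr = head
    while ptr != NULL:
        node = space.load(ptr)
        total += node["value"]
        ptr = node["next"]
    return total


space = AddressSpace()
head = host_build_list(space, [1, 2, 3, 4])
print(agent_sum_list(space, head))  # prints 10
```

No pointer is translated or copied between the two functions, which is the property the slide describes: the agent dereferences exactly what the host wrote.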
THE SHARED PROCESS VIRTUAL ADDRESS SPACE REQUIREMENTS (2)
• On AMD processor-based platforms, the IOMMUv2 device provides the HSA MMU translation services via standard PCI Express™ ATS/PRI protocols to HSA compliant hardware when accessing memory from the HSA agent
  ‒ IOMMUv2 integration into the OS memory manager provides the low-level infrastructure (e.g. in the Linux® kernel)
  ‒ Different host platform architectures may use different detail mechanisms here
• The HSA MMU functionality is provided in addition to the IOMMU functionality used in device virtualization
  ‒ Separate translation levels are used (see block diagram)
• Implementation of the shared virtual address space by other vendors on other host platforms may be different
  ‒ As long as it follows the HSA Sysarch requirements, it is ok
  ‒ The implementation detail is not relevant to the application and is dealt with in the system software (e.g. OS)
• The basis of “ptr-is-ptr”
[Block diagram: HSA MMU data structures. Labels: HSA MMU (IOMMUv2 device); Device Table base register; Command Buffer base register; Event Log base register; Page Req Log base register; Event Counter registers; Perf Counters & RAS Info (opt.); System memory; Device Table; Command Buffer; Event Log; Interrupt Remapping Table; I/O page tables (guest & host translation, host translation); Page Service Request Log; Peripheral Page Requests (PPR) Service; HSA MMU Translation Tables (per process, PASID)]
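The page fault flow for pageable memory (translation miss, page request, system software commits the page, access restarts) can be sketched as a toy simulation. This is not the IOMMUv2 programming interface and it does not model the ATS/PRI wire protocols; `ProcessMemory`, `service_page_request` and `agent_translate` are hypothetical names, and the 4 KiB page size is an assumption.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class ProcessMemory:
    """Toy model of OS-managed pageable memory for one process."""

    def __init__(self):
        self.allocated = set()     # virtual pages the process has allocated
        self.resident = {}         # virtual page -> physical frame number
        self.faults_serviced = 0
        self._next_frame = 0

    def allocate(self, vaddr, length):
        """Reserve virtual pages without committing physical memory."""
        first = vaddr // PAGE_SIZE
        last = (vaddr + length - 1) // PAGE_SIZE
        self.allocated.update(range(first, last + 1))

    def service_page_request(self, page):
        """System software side: commit a frame for a faulting page."""
        if page not in self.allocated:
            raise MemoryError(f"access to unallocated page {page:#x}")
        self.resident[page] = self._next_frame
        self._next_frame += 1
        self.faults_serviced += 1


def agent_translate(mem, vaddr):
    """Agent side: translate an address, requesting the page if it is missing."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page not in mem.resident:           # translation miss
        mem.service_page_request(page)     # page request handled by the OS
    # The access restarts here with a valid translation.
    return mem.resident[page] * PAGE_SIZE + offset
```

In this model the first access to an allocated page costs one serviced fault, later accesses to the same page cost none, and an access outside allocated memory is rejected, just as it would be for a host CPU thread.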
THE HSA MEMORY MODEL REQUIREMENTS
• A memory model defines how writes by one work item or agent become visible to other work items and agents, rules that need to be adhered to by compilers and application threads
  ‒ It defines visibility and ordering rules of write and read events across work items, HSA agents and interactions with non-HSA components in the system
  ‒ Important to define scope for performance optimizations in the compiler, to allow reordering of code in the Finalizer
• At its core, the HSA memory model is based on a “relaxed” load acquire/store release model
  ‒ Inherently maps to many CPU and device architectures very easily
  ‒ Efficient sequential consistency mechanisms supported to fit high-level language programming models
• A consistent, full set of atomic operations is available
  ‒ Naturally aligned on size, small machine model supports 32bit, large machine model supports 32bit and 64bit
• Cache coherency between HSA agents (& host CPU) is maintained by default
  ‒ Key feature of the HSA system & platform environment
• What are its key properties?
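The atomic size and natural alignment rule from the memory model slide can be expressed as a small check. This is a sketch only; `ATOMIC_SIZES`, `is_naturally_aligned` and `atomic_access_ok` are hypothetical helper names, not HSA runtime functions.

```python
# Atomic operation sizes in bytes per machine model, as stated on the slide.
ATOMIC_SIZES = {"small": (4,), "large": (4, 8)}


def is_naturally_aligned(address, size_bytes):
    """True if the address is a multiple of the access size."""
    return address % size_bytes == 0


def atomic_access_ok(model, address, size_bytes):
    """True if the machine model supports this atomic size at this address."""
    return size_bytes in ATOMIC_SIZES[model] and is_naturally_aligned(address, size_bytes)
```

Under these rules a 64bit atomic at address 0x1008 is acceptable on the large model, while the same access at 0x1004 is misaligned and a 64bit atomic on the small model is unsupported at any address.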
THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (1)
• The queue dispatch occurs through architected queue packets (“Architected Queuing Language”, AQL) that reference the work items & parameters
  ‒ Dispatch to HW occurs directly in user mode, eliminating a notable source of latency overhead in traditional architectures!
  ‒ Two architected packet types exist at the moment, dispatch and barrier packets
  ‒ Each queue is defined by several architected parameters (type, base address, size, read index, write index, …) that allow targeting the queue from other HSA agents and the host CPU
  ‒ The design allows an HSA agent on the platform to build & dispatch jobs to a queue using HSA architected interfaces
• Applications and runtime can build different queuing models on top of the infrastructure
  ‒ Single-producer, multi-producer queuing models, lock-free dispatch, … are all options SW can implement on top of the system architecture’s queue definition to fit the use model
• The basis of the workload dispatch on HSA
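A queue described by a size, a read index and a write index can be sketched as a single-producer ring buffer. The sketch makes assumptions the slides do not state: the size is a power of two, both indices only ever increase, and the slot is the index modulo the size. `Packet` and `UserModeQueue` are hypothetical names, and the real AQL packet layout is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str              # "dispatch" or "barrier", the two packet types named above
    payload: object = None


class UserModeQueue:
    """Toy single-producer ring buffer with increasing read/write indices."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two (assumption of this sketch)")
        self.size = size
        self.read_index = 0
        self.write_index = 0
        self._ring = [None] * size

    def enqueue(self, packet):
        """Producer: fill the slot first, then publish by bumping write_index."""
        if self.write_index - self.read_index == self.size:
            return False                         # queue is full
        self._ring[self.write_index & (self.size - 1)] = packet
        self.write_index += 1
        return True

    def dequeue(self):
        """Consumer (packet processor): take packets in submission order."""
        if self.read_index == self.write_index:
            return None                          # queue is empty
        packet = self._ring[self.read_index & (self.size - 1)]
        self.read_index += 1
        return packet
```

A multi-producer variant would reserve slots with an atomic add on the write index before filling them; that is left out here to keep the example short.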
THE HSA QUEUEING ARCHITECTURE REQUIREMENTS (2)
• The HSA System Architecture defines a user mode queue based dispatch mechanism
  ‒ Each queue is only valid within its process context and represents a virtual entity that is scheduled to hardware
  ‒ The job execution occurs at “user privilege” like the rest of the application code, enforced by system architecture
• Each HSA agent allows for multiple queues per application process
  ‒ HSA defines in-order dispatch semantics of work items within queues for efficient HW implementation
  ‒ HW may execute dispatch packets “out-of-order”, if no dependencies exist and in-order semantics are followed externally
  ‒ “Out of order” execution applies between queues, with explicit, memory based synchronization mechanisms between them as needed
• It is “cheap” to create queues in HSA, so applications can have one queue per HSA agent for each application thread, or leverage multiple HSA user queues per thread if needed
  ‒ This gives applications a lot of flexibility to structure the queue layout to match the problem instead of trying to fit the problem to work with one or a few queues only
• The basis of the workload dispatch on HSA
OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
• HSA memory based signaling and synchronization primitives
  ‒ Defines memory based semantics to synchronize with work items processed by HSA agents
  ‒ e.g. 32bit or 64bit value, content update, wait on value by HSA agents and AQL packets
  ‒ Hardware-assisted, power-efficient & low-latency way to synchronize execution of work items between threads
  ‒ Allows one-to-one and one-to-many signaling
  ‒ The signaling semantics follow atomicity requirements defined in the memory model
  ‒ Runtime & application SW can use the infrastructure to build mutexes, semaphores and other synchronization primitives
• HSA cache coherency domains
  ‒ Defines the scope of HSA cache coherency and how it relates to other non-HSA system resource operations
  ‒ Associated with the memory model requirements
  ‒ Architected way to interact with non-HSA platform infrastructure (e.g. graphics)
• Miscellaneous mention, but nevertheless important to make it work well…
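The “value, content update, wait on value” signaling semantics can be approximated in software. In this sketch a Python condition variable stands in for the hardware-assisted mechanism; `Signal`, `wait_eq` and `run_workers` are hypothetical names, and the completion-counter usage is one assumed pattern rather than something the slides prescribe.

```python
import threading


class Signal:
    """Toy stand-in for a memory based signal: a value plus wait-on-value."""

    def __init__(self, initial=0):
        self._value = initial
        self._cond = threading.Condition()

    def load(self):
        with self._cond:
            return self._value

    def add(self, delta):
        """Atomically update the value and wake all waiters (one-to-many)."""
        with self._cond:
            self._value += delta
            self._cond.notify_all()
            return self._value

    def wait_eq(self, expected, timeout=None):
        """Block (without spinning) until the value equals expected."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value == expected, timeout)


def run_workers(count):
    """Completion pattern: the signal starts at count, each worker decrements it."""
    done = Signal(count)
    results = []
    results_lock = threading.Lock()

    def worker(i):
        with results_lock:
            results.append(i * i)
        done.add(-1)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    ok = done.wait_eq(0, timeout=5.0)   # waiter sleeps until the value reaches 0
    for t in threads:
        t.join()
    return ok, sorted(results)
```

The same value-plus-wait building block is enough to construct a mutex or a counting semaphore, which is the kind of layering the slide has in mind for runtime and application software.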
OTHER REQUIREMENTS SET BY THE HSA SYSTEM ARCHITECTURE
• HSA system timestamp requirements
  ‒ Defines a low-overhead mechanism to “determine the passing of time” on an HSA platform
  ‒ Represented by a 64bit timestamp value that does not roll over and is incremented at a constant rate in HW
  ‒ The value can be queried by HSAIL or the HSA runtime
  ‒ Applications and tools are able to build a consistent timeline across all HSA agents
• HSA topology requirements
  ‒ Defines system topology and properties of HSA agents discoverable on an HSA platform by an application to take advantage of platform properties
  ‒ Examples are # of compute units, max. work item dimensions, work group size, work item size, queue properties, …
  ‒ APIs like OpenCL™ and others can leverage HSA system topology data to discover memory layout, compute unit properties and other properties and consistently report the system topology for applications to leverage
• Miscellaneous mention, but nevertheless important
HSA Platform - Simple
[Block diagrams. First: System Memory, HSA APU (CPU cores, GPU with H-CUs, Mem, HSA MMU). Second, “HSA Platform”: System Memory, HSA APU (CPU cores, GPU with H-CUs, Mem, HSA MMU), IOBUS, optional add-in HSA GPU (H-CUs, Device Local Memory, Mem, Firmware), System Firmware]
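The timestamp requirement (a 64bit counter incremented at a constant rate) lends itself to a short worked example. The tick frequencies used below are assumptions chosen for illustration, since the slides do not name a rate, and `ticks_to_seconds` and `years_until_rollover` are hypothetical helpers.

```python
def ticks_to_seconds(start_ticks, end_ticks, frequency_hz):
    """Elapsed time between two timestamp reads at a constant tick rate."""
    return (end_ticks - start_ticks) / frequency_hz


def years_until_rollover(frequency_hz, bits=64):
    """How long a counter of the given width runs before wrapping."""
    seconds = (2 ** bits) / frequency_hz
    return seconds / (365.25 * 24 * 3600)


# 100,000 ticks at an assumed 100 MHz rate correspond to one millisecond.
elapsed = ticks_to_seconds(1_000, 101_000, 100_000_000)
```

At an assumed rate of 1 GHz, `years_until_rollover(10**9)` comes out at roughly 584 years, which is why a 64bit value can be treated as one that does not roll over in practice.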
WHERE TO FIND FURTHER INFORMATION ON SYSTEM ARCHITECTURE?
• HSA Foundation Website: http://www.hsafoundation.com
  ‒ The main location for specs, developer info, tools, publications and much more
  ‒ HSA Programmer’s Reference Manual v0.95 has been published
  ‒ HSA Platform Software Systems Architecture Specification is quickly nearing the 0.95 state
  ‒ Will be published after ratification by the HSA Foundation Board of Directors
  ‒ Stay tuned
ANY QUESTIONS?
• Of course there are, so go ahead ☺