HSA-4138, HSAemu – A Full System Emulator for HSA Platform, by Yeh Ching Chung and Jiun-Hung Ding
HSAemu – A Full System Emulator for HSA Platform
Prof. Yeh‐Ching Chung
System Software Laboratory, Department of Computer Science, National Tsing Hua University
National Tsing Hua University ® copyright OIA
Outline
– Introduction to HSA
– Design of HSAemu
– Performance Evaluation
– Conclusions and Future Work
Introduction to HSA
HSA is an industry standard that defines a next‐generation hardware/software architecture for heterogeneous computing
Hardware Platform of HSA
Simplified HSA Software Stack
Figure: simplified HSA software stack, from Application and Domain Specific Libs (Bolt, OpenCV™, … many others), through the Renderscript, OpenGL‐ES, OpenCL, and other runtimes, legacy drivers, HSA Runtime, HSA Finalizer (HSAIL to GPU ISA), and kernel driver, down to the differentiated HW: CPU(s), GPU(s), and other accelerators.
Specification of Simple HSA Platform
Hardware
– Memory
  • Shared Virtual Memory (hUMA)
  • Cache Coherency Domains
  • Memory‐Based Signaling and Synchronization for CPU and GPU
– Task Control
  • Architected Queuing Language (AQL)
  • Efficient Syscall Infrastructure
  • Preemptive Context Switching
– Debugging Infrastructure
  • Allow system software to set instruction, memory, conditional, and other breakpoints
– Exception Handling
  • GPU trap handler to trigger a GPU interrupt on a GPU exception
Software
– HSA Runtime APIs
  • Initialization of HSA components
  • Topology discovery
  • Manage AQL packets: dispatch application tasks, signal HW and wait for result, recycle available resources
– User Mode Queue
  • Store AQL packets
– Virtual ISA ‐ HSAIL
  • A low‐level instruction set designed for parallel computing
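To make "memory‐based signaling and synchronization" concrete, here is a minimal Python model. It assumes a signal is nothing more than a shared integer that a producer decrements and a consumer waits on. The class `Signal` and its methods are hypothetical illustrations, not the HSA runtime API.

```python
import threading
from typing import Optional


class Signal:
    """Hypothetical memory-based signal: a shared integer value that a
    producer updates and a consumer waits on (a model, not the HSA API)."""

    def __init__(self, initial: int) -> None:
        self._value = initial
        self._cond = threading.Condition()

    def load(self) -> int:
        with self._cond:
            return self._value

    def subtract(self, amount: int = 1) -> None:
        # Producer side, e.g. a simulated GPU finishing a kernel.
        with self._cond:
            self._value -= amount
            self._cond.notify_all()  # wake every waiter so it re-checks its condition

    def wait_eq(self, expected: int, timeout: Optional[float] = None) -> bool:
        # Consumer side: block until the value equals `expected`.
        # Returns False if the timeout expires first.
        with self._cond:
            return self._cond.wait_for(lambda: self._value == expected, timeout)
```

In a typical dispatch, the completion signal would start at 1, the GPU side would subtract 1 when the kernel finishes, and the CPU side would wait for the value to reach 0.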
What Is HSAemu
HSAemu is a full system emulator that supports the following HSA features:
– Shared virtual memory between CPU and GPU
– Memory‐based signaling and synchronization
– Multiple user‐level command queues
– Preemptive GPU context switching
– Concurrent execution of CPU threads and GPU threads
– HSA runtime
– Finalizer
A project sponsored by MediaTek (MTK)
Currently, it supports simple HSA platform simulation:
– Functional‐accurate simulation
– Cycle‐accurate simulation
Architecture of HSAemu
HSAemu consists of 6 components:
– HSA Runtime
– CPU Simulation Module
– GPU Task Dispatcher
– Functional‐Accurate GPU Simulator (Fast‐GPU Simulator)
– Cycle‐Accurate GPU Simulator (Multi2Sim)
– GPU Helper Functions
HSAemu Runtime
User Mode Queue
– Store AQL packets
AQL Queue Manager
– Manage AQL packets in the User Mode Queue
AQL Command Dispatcher
– Launch the execution of kernel jobs on HSAemu
Support OpenCL runtime
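The User Mode Queue can be pictured as a ring buffer of fixed‐size AQL packets, shared by the application and the packet processor. The Python sketch below assumes a power‐of‐two capacity and read/write indices that only ever increase. It also assumes a single producer and a single consumer, so it omits the atomic operations a real multi‐producer queue needs. `UserModeQueue` and its methods are hypothetical names.

```python
from typing import List, Optional

AQL_PACKET_SIZE = 64  # AQL packets are fixed-size 64-byte records


class UserModeQueue:
    """Sketch of a user-mode AQL queue as a ring buffer. The producer
    (application/runtime) advances write_index; the consumer (packet
    processor) advances read_index. Both indices only ever increase and
    are reduced modulo the capacity to find a slot."""

    def __init__(self, size: int) -> None:
        if size <= 0 or size & (size - 1):
            raise ValueError("queue size must be a power of two")
        self.size = size
        self.slots: List[Optional[bytes]] = [None] * size
        self.write_index = 0
        self.read_index = 0

    def enqueue(self, packet: bytes) -> int:
        if len(packet) != AQL_PACKET_SIZE:
            raise ValueError("AQL packets are 64 bytes")
        if self.write_index - self.read_index >= self.size:
            raise BufferError("queue full")
        self.slots[self.write_index & (self.size - 1)] = packet
        self.write_index += 1
        return self.write_index - 1  # packet id, e.g. the value a doorbell would carry

    def dequeue(self) -> Optional[bytes]:
        if self.read_index == self.write_index:
            return None  # queue empty
        slot = self.read_index & (self.size - 1)
        packet, self.slots[slot] = self.slots[slot], None
        self.read_index += 1
        return packet
```

Because the indices are never wrapped, "full" and "empty" are distinguished by their difference alone. This avoids the ambiguity a wrapped pair of indices would have.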
CPU Simulation Module (1)
PQEMU – Perform multicore CPU simulation
HSA Signal Handler – Receive AQL commands from HSA Runtime and launch GPU simulation
CPU Simulation Module (2)
PQEMU
– A parallel system emulator based on QEMU
– Two efficient synchronization models (UCC/SCC)
– Dynamic binary translation (DBT) technique
– A project sponsored by MTK
Agent code, the HSA runtime, and the operating system run on PQEMU
Figure: Unified Code Cache (UCC) model, in which the DBT engines of all emulated CPUs share one code cache
“PQEMU: A Parallel System Emulator Based on QEMU” (ICPADS 2011)
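With a unified code cache, a block translated by one emulated CPU can be reused by all the others. The sketch below assumes translated blocks are keyed by guest program counter and protected by a single lock. `UnifiedCodeCache` and the `translate` callback are hypothetical stand‐ins, not PQEMU's code.

```python
import threading
from typing import Callable, Dict


class UnifiedCodeCache:
    """Sketch of a UCC-style shared translation cache: every emulated CPU
    thread looks up translated blocks in one cache guarded by one lock.
    `translate` stands in for a DBT engine and is hypothetical."""

    def __init__(self, translate: Callable[[int], str]) -> None:
        self._translate = translate
        self._blocks: Dict[int, str] = {}
        self._lock = threading.Lock()
        self.translations = 0  # number of blocks actually translated

    def lookup(self, guest_pc: int) -> str:
        with self._lock:
            block = self._blocks.get(guest_pc)
            if block is None:
                # Translate while holding the lock so each block is translated once.
                block = self._translate(guest_pc)
                self._blocks[guest_pc] = block
                self.translations += 1
            return block
```

Holding the lock during translation guarantees that each block is translated exactly once, at the cost of serializing translations. A per‐CPU cache avoids that contention but may translate the same block several times.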
GPU Task Dispatcher (1)
AQL Command Monitor
– Receive signal from HSA Signal Handler
– Copy AQL packets from User Mode Queue to HW AQL Queue
– Launch AQL Packet Worker
AQL Packet Worker
– Dequeue AQL packets from HW AQL Queue
– Parse AQL packet
– Dispatch kernel jobs to Fast‐GPU Simulator or M2S‐GPU Simulator according to the kernel information
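The monitor/worker split above can be sketched in Python as follows. Packets are modeled as small Python objects rather than binary AQL records. The "kernel information" that selects a simulator is modeled as a hypothetical boolean `cycle_accurate` field. `GpuTaskDispatcher`, `KernelJob`, and the simulator callbacks are illustrative names, not HSAemu's.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class KernelJob:
    name: str
    cycle_accurate: bool  # hypothetical stand-in for the kernel information


class GpuTaskDispatcher:
    def __init__(self, simulators: Dict[str, Callable[[KernelJob], None]]) -> None:
        self.hw_queue: "queue.Queue[KernelJob]" = queue.Queue()
        self.simulators = simulators  # keys: "fast" and "m2s"

    def on_signal(self, user_mode_queue: List[KernelJob]) -> threading.Thread:
        # AQL Command Monitor: copy pending packets into the HW AQL queue...
        while user_mode_queue:
            self.hw_queue.put(user_mode_queue.pop(0))
        # ...then launch an AQL Packet Worker to drain it.
        worker = threading.Thread(target=self._packet_worker)
        worker.start()
        return worker

    def _packet_worker(self) -> None:
        # AQL Packet Worker: dequeue, inspect, and route each job.
        while True:
            try:
                job = self.hw_queue.get_nowait()
            except queue.Empty:
                return
            target = "m2s" if job.cycle_accurate else "fast"
            self.simulators[target](job)
```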
GPU Task Dispatcher (2)
Execution Flow
GPU Task Dispatcher (3)
Signal from HSA Signal Handler
GPU Task Dispatcher (4)
Copy AQL packets from User Mode Queue
GPU Task Dispatcher (5)
Ask AQL Packet Worker to parse AQL packet
GPU Task Dispatcher (6)
Launch Fast‐GPU Simulator
GPU Task Dispatcher (7)
Launch M2S‐GPU Simulation
Fast‐GPU Simulator
A functional‐accurate simulator for generic GPU model simulation
– HSAIL Translator
  • Act as a Finalizer
  • Use a static binary translation technique based on LLVM to translate a BRIG file to a host executable binary file (x86)
  • Host SSE instruction optimization
– GPU Thread Scheduler
  • Simulate a generic GPU model
HSAIL Translator (1)
Architecture
HSAIL Translator (2)
Launch LLVM HSAIL Translator
HSAIL Translator (3)
Construct Control Flow Graph of HSAIL
HSAIL Translator (4)
Translate HSAIL to LLVM IR
HSAIL Translator (5)
Translate LLVM IR to Host Executable Object File
HSAIL Translator (6)
Load Host Executable Object File into memory
HSAIL Translator (7)
Link to GPU Helper Functions
HSAIL Translator (8)
Store the translation result to GPU Code Cache
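The translation steps above only pay off if their result is reused. The sketch below runs a list of stages once per distinct BRIG module and caches the final result, keyed by a hash of the module bytes. The stage callables stand in for the LLVM‐based steps on these slides (CFG construction, LLVM IR, object file, loading, linking) and are hypothetical. Keying by content hash is likewise an assumption, not necessarily HSAemu's scheme.

```python
import hashlib
from typing import Callable, Dict, List

Stage = Callable[[object], object]


class GpuCodeCache:
    """Sketch: run the translation stages once per distinct BRIG module and
    reuse the loaded, linked result afterwards. The stage functions are
    hypothetical stand-ins for the real translator steps."""

    def __init__(self, stages: List[Stage]) -> None:
        self._stages = stages
        self._cache: Dict[str, object] = {}
        self.misses = 0

    def get(self, brig: bytes) -> object:
        key = hashlib.sha256(brig).hexdigest()
        if key not in self._cache:
            self.misses += 1
            result: object = brig
            for stage in self._stages:  # e.g. CFG -> LLVM IR -> object -> load -> link
                result = stage(result)
            self._cache[key] = result
        return self._cache[key]
```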
HSAIL Translator (2)
Host SSE instruction optimization
– Reconstruct the control flow graph of the kernel function
– Use bitmap masking and packing/unpacking algorithms to generate host SSE instructions
HSAIL Translator (3)
Example: the control flow graph for kernel function $foo
HSAIL Translator (4)
Reconstruct the control flow graph by depth‐first traversal
Perform bitmap masking and packing & unpacking algorithms
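The bitmap‐masking idea can be illustrated without real SSE intrinsics. Pack four consecutive work items into one four‐lane "register", evaluate both sides of a divergent branch on all lanes, and let a per‐lane mask pick each result. In this sketch, plain Python lists stand in for 128‐bit registers, and the kernel body `x * 2 if x > 0 else -x` is a made‐up example. The actual translator emits host SSE instructions through LLVM rather than anything like this code.

```python
from typing import List

LANES = 4  # one 128-bit SSE register holds four 32-bit values


def pack(values: List[float], start: int) -> List[float]:
    """Gather LANES consecutive work-item values into one 'vector'."""
    return values[start:start + LANES]


def unpack(vector: List[float], out: List[float], start: int) -> None:
    """Scatter a 'vector' back to per-work-item storage."""
    out[start:start + LANES] = vector


def select(mask: List[bool], a: List[float], b: List[float]) -> List[float]:
    """Blend: lane i takes a[i] where mask[i] is set, else b[i]."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def run_divergent_kernel(xs: List[float]) -> List[float]:
    # Per-work-item kernel body (hypothetical):  y = x * 2 if x > 0 else -x
    if len(xs) % LANES:
        raise ValueError("work-item count must be a multiple of LANES")
    out = [0.0] * len(xs)
    for base in range(0, len(xs), LANES):
        v = pack(xs, base)
        mask = [x > 0 for x in v]  # bitmap of lanes taking the "then" path
        # Run each path on all lanes, skipping a path no lane takes.
        then_v = [x * 2 for x in v] if any(mask) else v
        else_v = [-x for x in v] if not all(mask) else v
        unpack(select(mask, then_v, else_v), out, base)
    return out
```

Skipping a path that no lane takes is what keeps uniform branches cheap. Only genuinely divergent groups of four pay for both sides.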
GPU Thread Scheduler
Simulate a generic GPU model
– GPU Thread Scheduler assigns work groups to free CU threads in the GPU Thread Pool
– Each CU thread executes all work items in a work group
– The maximum number of CU threads is limited by the host operating system
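The scheduling scheme above maps naturally onto a fixed‐size thread pool in which each task is one whole work group. In this Python sketch, `run_kernel` and the `kernel(global_id, group_id, local_id)` signature are hypothetical. Note also that CPython threads share one interpreter lock, so this models the scheduling policy rather than the parallel speedup. HSAemu's CU threads run translated native host code.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_kernel(kernel: Callable[[int, int, int], None],
               grid_size: int, workgroup_size: int,
               cu_threads: int = os.cpu_count() or 1) -> None:
    """Sketch of a generic GPU model: work groups are handed to a fixed pool
    of CU threads, and each CU thread runs every work item of its group."""
    num_groups = (grid_size + workgroup_size - 1) // workgroup_size

    def run_work_group(group_id: int) -> None:
        for local_id in range(workgroup_size):
            global_id = group_id * workgroup_size + local_id
            if global_id < grid_size:  # the last work group may be partial
                kernel(global_id, group_id, local_id)

    with ThreadPoolExecutor(max_workers=cu_threads) as pool:
        # list() waits for every work group and re-raises any kernel exception.
        list(pool.map(run_work_group, range(num_groups)))
```

Capping `cu_threads` at the host core count mirrors the observation that the number of CU threads is bounded by the host operating system.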
M2S‐GPU Simulator (1)
A cycle‐accurate simulator for AMD Southern Islands GPU model simulation
– HSAIL Translator
  • Translate BRIG file to GPU binary
– M2S Bridge
  • Bridge Multi2Sim GPU model to HSAemu
– M2S GPU Module
  • Simulate a cycle‐accurate GPU model
M2S‐GPU Simulator (2)
HSAIL Translator
– Act as a Finalizer
– Translate HSAIL to AMD Southern Islands GPU binary
– Use static binary translation technique based on LLVM
M2S‐GPU Simulator (3)
M2S Bridge: an interface to launch the M2S GPU Module
– Initialize the data structures used by the AMD Southern Islands GPU, including a memory register that lets the AMD Southern Islands GPU access the shared system memory in HSAemu
– Invoke the M2S GPU Module (the AMD Southern Islands GPU module in Multi2Sim)
M2S‐GPU Simulator (4)
M2S GPU Module
– A cycle‐accurate AMD Southern Islands GPU simulator in Multi2Sim
– Memory accesses are performed by the HSAemu memory helper function to comply with the hUMA model
GPU Helper Functions (1)
Memory Helper Function
– A GPU soft‐MMU with a page table walker and a TLB to enable the hUMA model
– Support redirecting accesses to the local segment to non‐shared private memory in the GPU
Kernel Information Helper Function
– Collect and return information about the GPU simulation and the current execution state
– Retrieve kernel information, such as work‐item ID and work‐group size, from the AQL packet
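A soft‐MMU of this kind can be sketched as a small TLB in front of a page‐table lookup, with hit and miss counters, since TLB events are traced. The sketch assumes 4 KiB pages, a single‐level page table held in a dict, and an LRU TLB. Real page tables are multi‐level, and `SoftMMU` and `PageFault` are hypothetical names.

```python
from collections import OrderedDict
from typing import Dict

PAGE_SHIFT = 12  # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    pass


class SoftMMU:
    """Hypothetical GPU soft-MMU: an LRU TLB in front of a lookup in the
    page table shared with the CPU (hUMA)."""

    def __init__(self, page_table: Dict[int, int], tlb_entries: int = 64) -> None:
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb: "OrderedDict[int, int]" = OrderedDict()
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
        if vpn in self.tlb:
            self.hits += 1
            self.tlb.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            if vpn not in self.page_table:
                raise PageFault(f"no mapping for {vaddr:#x}")
            self.tlb[vpn] = self.page_table[vpn]  # page-table walk, then TLB refill
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict the least recently used entry
        return (self.tlb[vpn] << PAGE_SHIFT) | offset
```

Because the GPU side reads the same page table as the CPU, a page allocated by the CPU becomes visible to the GPU without any copy. That is the property hUMA requires.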
GPU Helper Functions (2)
Mathematic Helper Function
– Simulate special mathematical instructions, such as trigonometric instructions, by calling the corresponding functions in the standard math library
Synchronization Helper Function
– Implement barrier synchronization for generic GPU model simulation
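When one CU thread runs every work item of a work group, a barrier cannot simply block, because the other work items would never get to run. One way to model it is to run each work item as a coroutine that suspends at the barrier, and to resume all of them phase by phase. The sketch below uses Python generators for this, with `yield` marking a barrier. This is an assumed stand‐in, not necessarily how HSAemu's helper is built, and `run_work_group` and `reverse_kernel` are hypothetical.

```python
from typing import Callable, Generator, List

WorkItem = Generator[None, None, None]


def run_work_group(kernel: Callable[[int, List[int]], WorkItem],
                   workgroup_size: int, group_mem: List[int]) -> None:
    """Run all work items of one work group on the current thread.
    Every item runs up to the barrier before any of them continues."""
    items = [kernel(lid, group_mem) for lid in range(workgroup_size)]
    while items:
        still_running = []
        for item in items:
            try:
                next(item)  # run until the next barrier...
                still_running.append(item)
            except StopIteration:  # ...or until the work item finishes
                pass
        if still_running and len(still_running) != len(items):
            raise RuntimeError("barrier not reached by every work item")
        items = still_running


def reverse_kernel(lid: int, mem: List[int]) -> WorkItem:
    # Reverse group memory in place; without the barrier, later work items
    # would read values already overwritten by earlier ones.
    value = mem[lid]
    yield  # barrier: every work item has read its value
    mem[len(mem) - 1 - lid] = value
```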
hUMA Model in HSAemu
Unified coherent address space
– GPU can access a virtual memory page allocated by CPU
Soft‐MMU is simulated for GPU
– TLB hit/miss events can be traced
Memory segment access
– Global memory segment access is handled by the memory helper function
– Group memory segment access is handled by host ld/st instructions
Recall: Hardware Simulation of HSAemu
HSA hardware components simulated:
– Multicore CPU: a parallel multicore CPU model simulation
– Functional‐Accurate GPU: a generic GPU model simulation
– Cycle‐Accurate GPU: AMD Southern Islands GPU model simulation
– hUMA: simulation of a unified address space between CPU and GPU
– Synchronization Primitive: barrier instruction simulation
– Hardware AQL Queue: simulation of a HW dispatch queue for GPU
Recall: Software Utilities of HSAemu
HSA software utilities designed:
– HSA Runtime: HSA runtime library (OpenCL runtime)
– Topology Discovery: discover the current platform topology
– User Mode Queue: a queue for each user application
– Signal Event: notify GPU to work
– HSAIL Generator: a PTX‐to‐HSAIL source‐level translator
– BRIG Generator: generate the binary format (BRIG) from a kernel file
– HSAIL Translator: translate HSAIL to host executable binary
– GPU Code Cache: store translated host binaries
Performance Evaluation
Experimental Environment
Benchmarks:
– Nearest Neighbor (NN), K‐Means, FFT, FWT, N‐Body
– Binary Search, Bitonic Sort, Reduction, FWT
Scalability of Fast‐GPU Simulator
Comparison of NN, K‐means, and FWT benchmarks on 32 physical cores
The speedup scales as long as the number of CU threads is less than the number of host physical cores
SSE Optimization of Fast‐GPU Simulator
Performance comparison of FFT with SSE optimization turned on and off
N‐Body Simulation by Fast‐GPU Simulator
N‐Body Simulation
All of the host's physical CPUs are in use
Comparison of HSAemu and Multi2Sim
Benchmark speedup, normalized to multi2sim = 1
Fast‐GPU Sim > M2S‐GPU Sim > Multi2Sim

| Benchmark | multi2sim | HSAemu | Hybrid |
|---|---|---|---|
| BinarySearch | 1 | 2.931317 | 2.873768 |
| BitonicSort | 1 | 18.88827 | 0.921835 |
| FastWalshTransform | 1 | 8.645516 | 2.407809 |
| Reduction | 1 | 6.294213 | 2.105663 |
Conclusions
An HSA‐compliant full system emulator has been implemented
– A functional‐accurate simulator for the generic GPU model
– A cycle‐accurate simulator for the AMD Southern Islands GPU model (from Multi2Sim)
The HSAIL Translator acts as a finalizer that enables the integration of HSAemu with existing simulators, for example, Multi2Sim
Open source (Nov. 12, 2013): http://hsaemu.org/
Future work
Enhance HSAemu by implementing more HSA features
Integrate HSAemu with existing cycle‐accurate GPU simulators
Design a cycle‐accurate simulator based on PQEMU for the generic CPU model
Design a cycle‐accurate simulator based on PQEMU for the big.LITTLE CPU model
Q & A