![Page 1: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/1.jpg)
Vinay Gangadhar, Raghu Balasubramanian, Mario Drumond, Ziliang Guo,
Jai Menon, Cherin Joseph, Robin Prakash, Sharath Prasad, Pradip Vallathol,
Karu Sankaralingam
Vertical Research Group
University of Wisconsin - Madison
MIAOW: An Open Source GPGPU
www.miaowgpu.org
![Page 2: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/2.jpg)
Executive Summary
• MIAOW is a credible GPGPU implementation
– Compatible with AMD Southern Islands ISA
– Runs OpenCL programs and prototyped on FPGA
– Similar design to industry state-of-art
– Similar performance to industry state-of-art*
– Flexible and Extendable
• MIAOW’s hardware design is Open Source
• Contributes to changing hardware landscape
* Frequency, Physical Design, Area goals relaxed
![Page 3: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/3.jpg)
![Page 4: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/4.jpg)
Applications that drive computing are changing
Need innovative new hardware
![Page 5: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/5.jpg)
131 Maker Faires, 750K people
Open Source Hardware is gaining momentum
![Page 6: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/6.jpg)
Some Open Source Hardware Microprocessors
![Page 7: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/7.jpg)
ZERO Open Source GPUs
![Page 8: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/8.jpg)
Lessons from Open Source S/W
PHP, Linux, Ruby, MySQL, SQLite, Apache, GCC (late 80s, early 90s; $0) → Facebook, Twitter, WhatsApp, Instagram (Web 2.0; >$10 billion)
MIAOW, OpenCores, RISC-V, etc. ($0) → ?
![Page 9: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/9.jpg)
MIAOW Technical Overview
Demonstrate MIAOW is a credible GPGPU
Implications and Possibility of Open Source Hardware
![Page 10: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/10.jpg)
ISA Summary
• 95 instructions
• Single-precision support only
• No graphics support (yet)
![Page 11: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/11.jpg)
MIAOW Overview
MIAOW has 32 Compute Units (CUs)
![Page 12: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/12.jpg)
Hardware Organization
• Single Issue
• 40 Wavefronts
• 16-wide vector ALUs
• LSU – Memory operations
CU Microarchitecture
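A back-of-envelope sketch of what a 16-wide vector ALU implies for one vector instruction. It assumes the Southern Islands wavefront size of 64 work-items, which the slide does not state; the function name is hypothetical.

```python
import math

WAVEFRONT_SIZE = 64  # assumption: Southern Islands wavefront width (not on the slide)
VALU_WIDTH = 16      # from the slide: 16-wide vector ALUs


def passes_per_vector_instruction(wavefront_size: int = WAVEFRONT_SIZE,
                                  valu_width: int = VALU_WIDTH) -> int:
    """Passes through one vector ALU needed to cover every work-item in a wavefront."""
    return math.ceil(wavefront_size / valu_width)


print(passes_per_vector_instruction())
```

Under these assumptions one vector instruction occupies the ALU for four passes, which is one reason a CU keeps many wavefronts resident to hide latency.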
![Page 13: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/13.jpg)
MIAOW Implementations
(a) Full ASIC Design (b) Mapped to FPGA (c) Hybrid Design
Area, Power from Floorplan and simulation
Long running apps S/W development
Prototyping
Academic research
Virtex7 FPGA 1 CU @ 50 MHz
133K LUTs, 100K Registers
![Page 14: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/14.jpg)
Design Team
• Small initial design team (12 mo)
– 5-person HDL team
– 1-person software team
– 1-person physical design team
• Added FPGA expert
• 3 undergrads extended the design
• Total duration: 36 months
• Area, Frequency, Performance, Power – NON GOALs
![Page 15: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/15.jpg)
Software Compatibility
• Runs unmodified OpenCL programs
• All OpenCL benchmarks
• Many Rodinia benchmarks
• Easily extensible with additional instructions
![Page 16: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/16.jpg)
MIAOW Technical Overview
Demonstrate MIAOW is a credible GPGPU
Implications and Possibility of Open Source Hardware
![Page 17: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/17.jpg)
MIAOW vs. AMD Tahiti
AMD’s latest GPU CU architectures build upon Tahiti, which was released in 2011.
![Page 18: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/18.jpg)
CU Design: MIAOW vs. GCN CU*
*HOTCHIPS 2014
![Page 19: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/19.jpg)
Performance Comparison
[Figure: Relative Performance (Cycles), y-axis 0 to 1; series: MIAOW, AMD Tahiti]
Tahiti Speedup over AMD A10-7850K CPU: 5X, 107X, 121X, 46X, 30X, 40X
![Page 20: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/20.jpg)
Area Comparison
Tahiti CU area*: 5.02 mm² @ 28 nm
MIAOW CU area: 9.1 mm² @ 32 nm
*Estimate from die-photo analysis and block diagrams from wccftech.com
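The two areas are quoted at different process nodes, so a first-order comparison needs a normalization step. The sketch below assumes ideal area scaling with the square of the feature size, which real processes do not follow exactly; the helper name is hypothetical.

```python
def scale_area(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Scale an area between nodes assuming ideal (feature size)^2 scaling."""
    return area_mm2 * (to_nm / from_nm) ** 2


tahiti_cu_mm2 = 5.02  # slide estimate, 28 nm
miaow_cu_mm2 = 9.1    # slide figure, 32 nm

# Assumption: ideal scaling from 32 nm down to 28 nm.
miaow_at_28nm = scale_area(miaow_cu_mm2, from_nm=32, to_nm=28)

print(f"raw ratio: {miaow_cu_mm2 / tahiti_cu_mm2:.2f}x")
print(f"MIAOW CU scaled to 28 nm: {miaow_at_28nm:.2f} mm^2")
print(f"normalized ratio: {miaow_at_28nm / tahiti_cu_mm2:.2f}x")
```

With these inputs the gap shrinks from roughly 1.8x to roughly 1.4x, though the Tahiti figure is itself an estimate.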
![Page 22: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/22.jpg)
Power
Tahiti CU power*: 0.52 W
MIAOW CU power: 1.1 W
* Ballpark estimate from TDP and occupancy
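A rough way to read these numbers together with the CU areas from the Area Comparison slide is to compute power density. This is only a sketch: the Tahiti values are estimates, the two designs sit at different process nodes, and no node normalization is applied here. The dictionary layout and function name are hypothetical.

```python
tahiti_cu = {"power_w": 0.52, "area_mm2": 5.02}  # slide estimates (28 nm)
miaow_cu = {"power_w": 1.1, "area_mm2": 9.1}     # slide figures (32 nm)


def power_density(cu: dict) -> float:
    """Watts per square millimetre for one compute unit."""
    return cu["power_w"] / cu["area_mm2"]


power_ratio = miaow_cu["power_w"] / tahiti_cu["power_w"]

print(f"power ratio (MIAOW / Tahiti): {power_ratio:.2f}x")
print(f"Tahiti CU density: {power_density(tahiti_cu):.3f} W/mm^2")
print(f"MIAOW CU density:  {power_density(miaow_cu):.3f} W/mm^2")
```

The per-CU power differs by about 2x, while the power per unit area comes out much closer.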
![Page 24: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/24.jpg)
MIAOW is comparable to industry designs
![Page 25: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/25.jpg)
MIAOW Technical Overview
Demonstrate MIAOW is a credible GPGPU
Implications and Possibility of Open Source Hardware
![Page 26: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/26.jpg)
Lessons Learned
• It was surprising this was doable!
– Microarchitecture design, HDL implementation, and verification were not tedious
• Software toolchain being available was great
• We punted on physical design
• FPGA tools are still quite tedious to use
![Page 27: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/27.jpg)
Implications for Industry
• Open Source Hardware GPU
– Relevance to OpenCompute & Maker movement
• How can a HW startup benefit from MIAOW?
– Start with MIAOW and focus on innovative pieces from day one
• IP and Compiler
– Licensed under BSD, ISA is OK, compiler usable
– How to avoid IP infringement?
![Page 29: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/29.jpg)
What drives Open Source Software?
• It’s fun!
– “Enjoyment-based intrinsic motivation, namely how creative a person feels when working on the project, is the strongest and most pervasive driver.”
• It’s valuable
– “user need, intellectual stimulation derived from writing code, and improving programming skills are top motivators”
Why Hackers Do What They Do: Understanding Motivation and Effort in Free/Open Source Software Projects, Lakhani K and Wolf R. In Perspectives on Free and Open Source Software
![Page 30: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/30.jpg)
Conclusion
• MIAOW is transformative for GPU research
• Its role in open source hardware movement?
• Are open source hardware chips feasible?
• More community support → First Open Source Silicon GPU Chip
![Page 31: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/31.jpg)
www.miaowgpu.org
Journal article (ACM TACO 2015): Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open-Source RTL Implementation of a GPGPU
3.9 FPS on FPGA @ 50 MHz
23 FPS in simulation @ 222 MHz
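The two frame rates can be converted into cycles per frame to check whether the difference is explained by clock frequency alone. The sketch assumes both figures refer to the same workload, which the slide text does not confirm; the function name is hypothetical.

```python
def cycles_per_frame(clock_hz: float, fps: float) -> float:
    """Clock cycles spent per rendered frame at a given clock and frame rate."""
    return clock_hz / fps


fpga_cycles = cycles_per_frame(50e6, 3.9)   # FPGA prototype figures from the slide
sim_cycles = cycles_per_frame(222e6, 23)    # simulation figures from the slide

# What the FPGA frame rate would become if only the clock changed.
fps_if_only_clock_scaled = 3.9 * (222e6 / 50e6)

print(f"FPGA: {fpga_cycles / 1e6:.2f} M cycles/frame")
print(f"Simulation: {sim_cycles / 1e6:.2f} M cycles/frame")
print(f"Clock-scaled FPGA estimate: {fps_if_only_clock_scaled:.1f} FPS")
```

The cycles-per-frame values are not equal, so the two setups likely differ in more than clock speed (for example in the memory system), but the slide does not say how.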
![Page 32: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/32.jpg)
Back Up Slides
![Page 33: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/33.jpg)
Fetch & Wavepool
![Page 34: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/34.jpg)
Vector ALU/FPU
![Page 35: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/35.jpg)
Issue
![Page 36: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/36.jpg)
Load/Store Unit
![Page 37: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/37.jpg)
HOTCHIPS 2015
Many technical details are in this publication (TACO 2015): Enabling GPGPU Low-Level Hardware Explorations with MIAOW: An Open-Source RTL Implementation of a GPGPU
www.miaowgpu.org
![Page 38: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/38.jpg)
Virtex7-based FPGA – Neko
• 1 CU design
• 50 MHz
• Utilization:
– LUTs: 133K
– Registers: 100K
![Page 39: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/39.jpg)
![Page 40: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/40.jpg)
Flexibility
![Page 41: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/41.jpg)
Verification
![Page 42: MIAOW: An Open Source GPGPU vinay/talks/HC27-MIAOW-Talk.pdf · •We punted on physical design •FPGA tools are still quite tedious to use 26 . Implications for Industry •Open](https://reader035.vdocuments.site/reader035/viewer/2022071106/5fdfedd6477b61793d73b99d/html5/thumbnails/42.jpg)
As a Research Tool
| Direction | Research Idea | MIAOW-enabled findings |
| --- | --- | --- |
| Traditional µarch | Thread-block compaction (TBC) | Implemented TBC in RTL; significant design complexity; increase in critical path length |
| Validation of simulator studies | Transient Fault Injection | RTL-level fault injection; more gray area than CPUs; silent data corruption seen |
| New Directions | Circuit-Failure Prediction (Aged SDMR) | Implemented entirely in µarch; works elegantly in GPUs; small area, power overheads |
| New Directions | Timing Speculation (TS) | Quantifies error-rate on GPU; TS framework for future studies |