Jaguar Microarchitecture
Source: meseec.ce.rit.edu/551-projects/fall2015/3-3.pdf
Alex Avery, Cody Smith
Agenda
● AMD Processors
● Jaguar Overview
● Example Hardware
● Core Pipeline
● Instruction Fetch and Cache
● Instruction Decoding
● Scheduling
● Integer & FP Execution
● Memory
● Cache
What is a Microarchitecture?
Microarchitecture is the Computer Organization
Microarchitecture + Instruction Set Architecture = Computer Architecture
A microarchitecture describes the hardware organization of the device; it is how the ISA is implemented.
AMD Processors
● Bobcat (2011)
● Piledriver (2012)
● Jaguar (2013)
● Steamroller (2014)
● Puma (2014)
● Excavator (2015)
Jaguar Overview
● Targets 2-25 W devices
● Low cost
● 28 nm technology
● Up to 4 cores
● Split L1 cache: 32 KiB instruction and 32 KiB data per core
● Unified L2 cache: 1-2 MiB, 16-way
● Out-of-order and speculative execution
● Integrated memory controller
● Two-way integer execution
● Two-way 128-bit floating-point execution
Example Hardware
● Gaming Consoles
  ○ Xbox One
  ○ PS4
● Desktop Processors
  ○ Athlon 5350
  ○ Sempron 3850
● Laptops/Mini PCs
  ○ A6-5200
  ○ E2-3000
● Tablets
  ○ A6-1450
● Embedded Processors
  ○ GX-420CA
Jaguar Core Pipeline
Instruction Fetch and Cache
● 6 stages
● 32 KB, 2-way set-associative L1 cache
● Pseudo-least-recently-used (pseudo-LRU) replacement algorithm
● 32 B instruction fetch window
● Branch predictors exploit characteristics of both direct and indirect branches as well as branch density
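The cache parameters on this slide determine how a fetch address splits into tag, set index, and byte offset. The following is a minimal sketch of that arithmetic. It assumes a 64-byte cache line, which the slides do not state, and the function name `cache_geometry` is made up for illustration.

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    line_bytes=64 is an assumption; the slides do not give the line size.
    All sizes are expected to be powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 for a power of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


# 32 KB, 2-way instruction cache from the slide
print(cache_geometry(32 * 1024, ways=2))  # (256, 6, 8)
```

Under that assumption the instruction cache has 256 sets, so 6 address bits select the byte within a line and the next 8 bits select the set.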
Instruction Decoding
● Can decode two x86 instructions per cycle
● Variable-length x86 instructions are decoded into complex micro-operations (COPs)
● Can handle 128-bit vector units as well as x86 Advanced Vector Extensions (AVX)
Scheduling
● Out-of-order execution
● After instructions are decoded into COPs, they are dispatched
● Each COP allocates a Retire Control Unit (RCU) entry
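Allocating a retire entry per COP at dispatch is what lets COPs finish out of order while still retiring in program order. The toy model below sketches that idea only. The class name, its methods, and the capacity of 4 are hypothetical; the slides do not describe the RCU's size or internals.

```python
from collections import deque


class RetireControlUnit:
    """Toy model: entries are allocated in program order at dispatch,
    may complete in any order, and retire strictly in program order."""

    def __init__(self, capacity=4):  # capacity is a made-up value
        self.capacity = capacity
        self.entries = deque()  # each entry: [cop_name, done_flag]

    def dispatch(self, cop):
        """Allocate an entry; return False if the unit is full (stall)."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append([cop, False])
        return True

    def complete(self, cop):
        """Mark a dispatched COP as finished executing."""
        for entry in self.entries:
            if entry[0] == cop:
                entry[1] = True
                return
        raise KeyError(cop)

    def retire(self):
        """Retire finished COPs from the head, stopping at the first
        COP that has not finished executing."""
        retired = []
        while self.entries and self.entries[0][1]:
            retired.append(self.entries.popleft()[0])
        return retired


rcu = RetireControlUnit()
for cop in ["load", "add", "mul"]:
    rcu.dispatch(cop)
rcu.complete("add")    # finishes early, out of order
print(rcu.retire())    # [] because "load" is still pending
rcu.complete("load")
print(rcu.retire())    # ['load', 'add']
```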
Integer Execution
● Separate integer and floating-point units
● 2 symmetrical integer pipelines
● Integer addition/subtraction takes 3 cycles
  ○ Read operands
  ○ Execute
  ○ Write back
● 6-cycle multiplication
● Separate hardware divider
Floating Point Execution
● Designed for 128-bit wide execution
● Targets SSE and AVX vector extensions
● 2 asymmetrical FP pipelines
● 4-7 cycles per addition/subtraction
  ○ Read operands (2 cycles)
  ○ Execute (1-4 cycles)
  ○ Write back (1 cycle)
● Co-processor architecture
  ○ Dedicated decode, rename, out-of-order scheduler and retire queue
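The 4-7 cycle range on this slide is just the sum of the three stage counts it lists. A short sketch of that arithmetic follows; the constant and function names are illustrative only.

```python
READ_OPERANDS = 2  # cycles, from the slide
WRITE_BACK = 1     # cycles, from the slide


def fp_latency(execute_cycles):
    """Total cycles for an FP add/subtract given its execute-stage length."""
    if not 1 <= execute_cycles <= 4:
        raise ValueError("the slide gives 1-4 execute cycles")
    return READ_OPERANDS + execute_cycles + WRITE_BACK


print([fp_latency(e) for e in range(1, 5)])  # [4, 5, 6, 7]
```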
Memory
● Separate load and store pipelines
● Aggressive re-ordering
  ○ Loads can occur out-of-order
  ○ Loads can be moved ahead of stores before the target address is resolved
● Memory Ordering Queue and Store Queue handle memory ordering
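Moving a load ahead of a store whose address is still unknown is a speculation: once the store address resolves, any younger load that already read the same address saw stale data and has to be redone. The sketch below illustrates that check in a generic form. It is not a description of Jaguar's actual queues; the function name and the (sequence number, address) representation are assumptions.

```python
def find_loads_to_replay(store_seq, store_addr, executed_loads):
    """Return the loads that ran ahead of a store and turn out to alias it.

    executed_loads: list of (seq, addr) for loads that already executed.
    A load is mis-speculated if it is younger than the store (higher
    sequence number) and read the address the store writes.
    """
    return [
        (seq, addr)
        for seq, addr in executed_loads
        if seq > store_seq and addr == store_addr
    ]


# Store #1's address was unknown when loads #2 and #3 executed early.
executed = [(2, 0x1000), (3, 0x2000)]
print(find_loads_to_replay(1, 0x2000, executed))  # [(3, 8192)]
```

When the list comes back empty, the speculation was safe and the early loads keep their results.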
L1 Data Cache
● 32 KB
● 8-way associative
● Parity-protected write-back cache
● Pseudo-LRU replacement algorithm
● Can handle a 128-bit read and a 128-bit write each cycle
● Average latency of 3 cycles for an L1 hit
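The slide names pseudo-LRU but not the variant. One common scheme is tree pseudo-LRU, which tracks an 8-way set with only 7 bits instead of a full ordering. The sketch below shows that scheme as an illustration; whether Jaguar uses this exact variant is an assumption, and the class is hypothetical.

```python
class TreePLRU:
    """Tree pseudo-LRU for one cache set with a power-of-two way count.

    bits[i] == 0 means the next victim is in the left subtree of node i,
    1 means the right subtree. Nodes are stored heap-style: the children
    of node i are 2*i + 1 and 2*i + 2.
    """

    def __init__(self, ways=8):
        self.ways = ways
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """Record an access: point every node on the path away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1  # accessed left, so victim is right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0  # accessed right, so victim is left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root down to the way to evict."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(ways=8)
plru.touch(0)
print(plru.victim())  # 4
```

After way 0 is accessed, the root points at the other half of the set, so the victim moves to way 4. The scheme approximates LRU: it never evicts the most recently used way, but it does not always pick the true least recently used one.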
L2 Cache
● 1-2 MB (depending on application)
● 16-way set associative
● Unified, shared by 2 to 4 cores
● ECC (error-correcting code) protection for tag and data arrays
● Forms an EDC/ECC cache structure
● Minimum of 25 cycles per hit
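The 3-cycle L1 hit and the 25-cycle minimum L2 hit can be combined into a rough expected load latency. The sketch below does so under stated assumptions: each latency is treated as the total cost of a load served at that level, and the miss rates and the 200-cycle memory latency in the example are hypothetical values, not figures from the slides.

```python
L1_HIT_CYCLES = 3   # from the L1 data cache slide
L2_HIT_CYCLES = 25  # minimum, from the L2 cache slide


def average_access_cycles(l1_miss_rate, l2_miss_rate, memory_cycles):
    """Expected load latency in cycles.

    l2_miss_rate is the fraction of L1 misses that also miss in L2.
    """
    beyond_l1 = (1 - l2_miss_rate) * L2_HIT_CYCLES + l2_miss_rate * memory_cycles
    return (1 - l1_miss_rate) * L1_HIT_CYCLES + l1_miss_rate * beyond_l1


# Hypothetical workload: 5% L1 misses, 20% of those miss L2, 200-cycle memory.
print(average_access_cycles(0.05, 0.20, 200))
```

With those made-up rates the average comes to roughly 5.85 cycles, which shows how a small L1 miss rate can nearly double the effective latency over a pure L1 hit.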
Jaguar Benchmarks
● Athlon 5350
● Athlon 5150
● Sempron 3850
Athlon 5350 vs. Intel Core i3 3220 vs. Celeron J1900
Athlon 5350 vs. Intel Core i7 5930K
The Athlon 5350 has much lower performance; however, it offers:
● Much better efficiency
● Much lower cost
● Better performance per watt
● Better performance per dollar
Zen
● Entirely new core design
● New design family ‘Summit Ridge’
● Simultaneous multithreading
● New cache system
● FinFET manufacturing process
Resources
http://www.anandtech.com/show/6976/amds-jaguar-architecture-the-cpu-powering-xbox-one-playstation-4-kabini-temash
http://www.realworldtech.com/jaguar/
http://www.tomshardware.com/reviews/microsoft-xbox-one-console-review,3681-3.html
https://nathanlamont91.wordpress.com/2015/03/22/my-report-on-the-amd-jaguar-quad-core-cpu/
https://www.deepdyve.com/lp/institute-of-electrical-and-electronics-engineers/the-floating-point-unit-of-the-jaguar-x86-core-1TVYueOORA
http://www.xbitlabs.com/news/cpu/display/20120904201534_AMD_Discloses_Peculiarities_of_Next_Generation_Jaguar_Micro_Architecture.html