Experiences Implementing Tinuso in gem5
Maxwell Walter, Pascal Schleuniger, Andreas Erik Hindborg, Carl Christian Kjærgaard, Nicklas Bo Jensen, Sven Karlsson
Technical University of Denmark
14/06/2015, Maxwell Walter, DTU Compute, Technical University of Denmark
Motivation
• We have developed the Tinuso architecture
  – For multi-core research
  – Targeted for FPGAs
• Application dependent accelerators are important for multi-core research
• Software/hardware co-design is difficult!
• So we would like to do it automatically
[Figure: loop with the labels app, parameters, toolchain, evaluate, feedback]
Contributions
• Implementation of the Tinuso processor architecture in gem5
• Discussion of gem5 and designing application-specific accelerators
Outline:
• Motivation
• Contributions
• Tinuso Architecture
• gem5 Implementation
• Design Space Exploration
• Conclusions
Tinuso
• Philosophy: move complexity to software
  – Predicated execution to lower branch costs
  – Very fast 8 stage pipeline
  – No pipeline interlocking; compiler must produce a valid schedule
• GCC 4.9 toolchain
• Designed for FPGA synthesis
• Will be released as open source
• Small and fast

                  Tinuso      MicroBlaze
Clock frequency   376 MHz     194 MHz
Area              1322 LUTs   2024 LUTs
gem5 Implementation
• Instruction predication
  – Easily handled in the instruction decoder
• Configurable branch delay slots
  – New PCState with counter and NNPC
• Instruction delay slots for compiler validation
  – Tracked by the ISA/Decoder
  – Validated at instruction decode
• gem5 implementation was easy and painless
  – A good fit into our workflow
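The slides do not show the PCState itself, so the following is only a minimal Python model of how a program counter with a delay-slot counter and an NNPC (the pending branch target) could behave. It is not gem5 code: the class name `DelaySlotPC`, its fields, the fixed 4-byte instruction width and the `trace` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DelaySlotPC:
    """Toy program counter with a configurable number of branch delay slots."""
    pc: int                      # address of the instruction being executed
    delay_slots: int             # assumed configuration parameter
    inst_bytes: int = 4          # assumed fixed instruction width
    nnpc: Optional[int] = None   # pending branch target while a branch is in flight
    counter: int = 0             # delay-slot instructions still to execute

    def advance(self, taken_target: Optional[int] = None) -> None:
        """Move to the next instruction; taken_target marks a taken branch."""
        if taken_target is not None:
            if self.nnpc is not None:
                raise ValueError("branch inside a delay slot is not modelled")
            self.nnpc = taken_target
            self.counter = self.delay_slots
        elif self.nnpc is not None:
            self.counter -= 1        # one more delay-slot instruction retired
        if self.nnpc is not None and self.counter == 0:
            self.pc, self.nnpc = self.nnpc, None   # redirect to the target
        else:
            self.pc += self.inst_bytes             # fall through sequentially


def trace(state: DelaySlotPC, steps: int, branch_at: int, target: int) -> List[int]:
    """Return the PCs visited, taking a branch whenever pc == branch_at."""
    visited = []
    for _ in range(steps):
        visited.append(state.pc)
        state.advance(target if state.pc == branch_at else None)
    return visited


print([hex(a) for a in trace(DelaySlotPC(pc=0, delay_slots=2), 5, 0x0, 0x100)])
```

With two delay slots, the two instructions after the branch still execute before control reaches the target. Setting `delay_slots` to 0 makes the redirect immediate, which is the kind of variation a configurable delay-slot count allows.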
gem5 in Our Workflow
• RTL simulator validation
  – Simulator built directly from VHDL sources
• Toolchain validation

Test            RTL time   gem5 time
memcpy-chk.x1   6.47 s     3.5 s
memmove.x4      21.78 s    3.7 s
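The ratio between the two columns can be worked out directly. The short sketch below uses only the four timings from the table; the `speedup` helper is a name introduced here for illustration.

```python
# Wall-clock times in seconds, copied from the table: (RTL simulator, gem5).
timings = {
    "memcpy-chk.x1": (6.47, 3.5),
    "memmove.x4": (21.78, 3.7),
}


def speedup(rtl_seconds: float, gem5_seconds: float) -> float:
    """How many times faster the gem5 run is than the RTL simulation."""
    return rtl_seconds / gem5_seconds


speedups = {name: speedup(rtl, gem5) for name, (rtl, gem5) in timings.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

For these two tests gem5 is roughly 1.8 and 5.9 times faster than the RTL simulator. Two data points are too few to generalise from.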
Design Space Exploration
• Tinuso is intended for multi-core accelerator systems
  – Easily configured for specific applications
• Many configuration parameters
  – ISA, cache sizes, pipeline depth, number of cores
Tinuso multicore systems
[Figure: grid of tiles, each labeled PE, NI and R]
• Barrel shifter
• Multiplier
• FPU instructions
• Profiling infrastructure
• Cache sizes
• Pipeline depth
• Data link width
• Arbitration scheme
• Up to 480 processor cores on Xilinx Virtex-7 device
• Synthesizable processor cores
• Packet switched 2D mesh interconnect
Design Space Exploration
• Changing parameters manually is tedious and can be error prone
• Effective searching requires fast simulation
• Use gem5 for quick performance estimation
  – Can help direct the performance optimization
• Use more accurate tools, like Vivado, for power estimation and resource usage
[Figure: loop with the labels app, parameters, toolchain, evaluate, feedback]
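The slides name the parameters and the evaluate/feedback loop but not the search strategy, so the following is only a sketch of the simplest option, an exhaustive sweep. The parameter values in `SPACE` are hypothetical, and `estimate_cycles` is a made-up cost model standing in for a gem5 run, not measured data.

```python
import itertools

# Hypothetical parameter ranges: the slides list the knobs, not their values.
SPACE = {
    "cache_kib": [2, 4, 8],
    "pipeline_depth": [6, 8],
    "cores": [4, 16, 64],
}


def estimate_cycles(config):
    """Stand-in for a fast simulation run (invented cost model)."""
    work = 1_000_000 / config["cores"]
    miss_penalty = 50_000 / config["cache_kib"]
    return work + miss_penalty + 100 * config["pipeline_depth"]


def explore(space, evaluate):
    """Evaluate every configuration and return the cheapest one with its cost."""
    names = list(space)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        cost = evaluate(config)
        if cost < best_cost:
            best_config, best_cost = config, cost
    return best_config, best_cost


best, cost = explore(SPACE, estimate_cycles)
print(best, cost)
```

Even this small space has 18 points, and each one costs a full simulation, which is why a fast simulator matters here. A real flow would replace `estimate_cycles` with a simulator invocation and would probably prune the space rather than enumerate it.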
Conclusions
• We have implemented the Tinuso architecture in gem5
  – It was an easy and painless process
• The Tinuso gem5 implementation is useful in our workflow, for RTL simulator validation and toolchain validation
• We leverage gem5 for design space exploration of custom multi-core accelerators