![Page 1: On-Chip Interconnects Alexander Grubb Jennifer Tam Jiri Simsa Harsha Simhadri Martha Mercaldi Kim, John D. Davis, Mark Oskin, and Todd Austin. “Polymorphic](https://reader030.vdocuments.site/reader030/viewer/2022032517/56649caf5503460f9497398d/html5/thumbnails/1.jpg)
On-Chip Interconnects
Alexander Grubb
Jennifer Tam
Jiri Simsa
Harsha Simhadri
• Martha Mercaldi Kim, John D. Davis, Mark Oskin, and Todd Austin. “Polymorphic On-Chip Networks,” in Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008.
• Dongkook Park, Soumya Eachempati, Reetuparna Das, Asit K. Mishra, Yuan Xie, N. Vijaykrishnan, and Chita R. Das. “MIRA: A Multi-layered On-Chip Interconnect Router Architecture,” in Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA), June 2008.
Motivation
• As more processors are placed on a single chip, it is important that the communication infrastructure (the interconnect) stays on chip too.
Motivation
• No single network topology is optimal for all workloads.
• Are these good workloads to be examining?
Definition
• Polymorphic on-chip network: a network that can be configured to act like any other network
• Since a polymorphic network can imitate any network by configuring its building blocks, we only need to decide before runtime which network to imitate.
Polymorphic design
• Two network components: buffers and connections
• A router is just buffers with some logic
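The idea of configuring one pool of buffers and connections into different topologies can be sketched in a few lines of Python. This is only an illustrative model, not the design from the paper: the `Fabric` class and the `configure_ring` and `configure_mesh` helpers are hypothetical names, and real hardware would also have to configure routing logic and buffer placement.

```python
from dataclasses import dataclass, field


@dataclass
class Fabric:
    """Hypothetical pool of building blocks: buffers plus configurable connections."""
    num_buffers: int
    connections: set = field(default_factory=set)  # undirected (a, b) pairs with a < b

    def connect(self, a: int, b: int) -> None:
        """Enable a connection between two distinct buffers."""
        if a == b or not (0 <= a < self.num_buffers and 0 <= b < self.num_buffers):
            raise ValueError("invalid buffer index")
        self.connections.add((min(a, b), max(a, b)))


def configure_ring(n: int) -> Fabric:
    """Configure the fabric so that it imitates an n-node ring."""
    fabric = Fabric(n)
    for i in range(n):
        fabric.connect(i, (i + 1) % n)
    return fabric


def configure_mesh(rows: int, cols: int) -> Fabric:
    """Configure the same kind of fabric so that it imitates a rows x cols mesh."""
    fabric = Fabric(rows * cols)
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                fabric.connect(node, node + 1)      # link to the east neighbour
            if r + 1 < rows:
                fabric.connect(node, node + cols)   # link to the south neighbour
    return fabric


ring = configure_ring(16)
mesh = configure_mesh(4, 4)
print(len(ring.connections), len(mesh.connections))
```

The same 16 buffers end up wired as a ring or as a mesh depending only on which configuration function is chosen before runtime.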
Polymorphic Overhead
• ~40% space overhead used for network
• How much latency does this extra space introduce?
• Could a fixed network be made faster/wider?
Issues
• Bad traffic models
• Time overheads: would the simulation outputs remain the same?
MIRA: Multi-layered On-Chip Interconnect Router
• Newer fab processes allow multi-layer interconnects
• You can stack a 128-bit line into 4 x 32-bit layers for lower power, lower latency, and higher speed
• Can use this technology to design networks for chips.
• Main issue: power dissipation
[Figure: multi-layer interconnect]
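As a rough illustration of stacking a 128-bit line into 4 x 32-bit layers, the sketch below slices a 128-bit value into four 32-bit pieces, one per layer, and reassembles it. The function names and the choice of putting the least significant bits in slice 0 are assumptions made for the example, not details taken from the paper.

```python
from typing import List

WIDTH = 128               # total line width in bits (from the slide)
LAYERS = 4                # number of stacked layers (from the slide)
SLICE = WIDTH // LAYERS   # 32 bits carried per layer
MASK = (1 << SLICE) - 1


def split_word(word: int) -> List[int]:
    """Split a 128-bit word into four 32-bit slices; slice 0 holds the LSBs (assumed)."""
    if not 0 <= word < (1 << WIDTH):
        raise ValueError("word does not fit in 128 bits")
    return [(word >> (SLICE * layer)) & MASK for layer in range(LAYERS)]


def join_slices(slices: List[int]) -> int:
    """Reassemble the per-layer slices into the original 128-bit word."""
    word = 0
    for layer, piece in enumerate(slices):
        word |= piece << (SLICE * layer)
    return word


word = (1 << 100) | 0xDEADBEEF
slices = split_word(word)
print([hex(piece) for piece in slices])
print(join_slices(slices) == word)
```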
MIRA
• Interconnect design elements:
– Interconnect buffer
– Crossbar
– Inter-router link
– Routing logic
– Switch logic
MIRA
• Interconnect buffer
– Design consideration: LSBs (and other control signals) change faster than MSBs (which are usually mostly 0s)
– Put the fast-changing bits on top to dissipate power faster
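The claim that LSBs change faster than MSBs can be checked on a toy data stream by counting how often each bit position toggles between consecutive words. The sketch below uses a simple counter sequence as hypothetical traffic; it is an assumption chosen for illustration, not data from the paper, and real traffic would show a less regular pattern.

```python
def toggles_per_bit(values, width):
    """Count, for each bit position, how often it flips between consecutive words."""
    counts = [0] * width
    for prev, cur in zip(values, values[1:]):
        changed = prev ^ cur               # set bits mark positions that flipped
        for bit in range(width):
            counts[bit] += (changed >> bit) & 1
    return counts


# Hypothetical traffic: small, counter-like values carried on a 32-bit word.
values = list(range(256))
counts = toggles_per_bit(values, 32)

low_activity = sum(counts[:8])    # toggles in the 8 least significant bits
high_activity = sum(counts[8:])   # toggles in the 24 most significant bits
print(counts[:8], low_activity, high_activity)
```

For this stream all switching activity falls in the low-order bits, which is the kind of imbalance that motivates placing those bits where power dissipates fastest.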
MIRA
• Crossbar: 5x5 ports. Logic controls data flow.
– Overall area is proportional to width²
– The multi-layer design saves overall area too
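A quick back-of-the-envelope calculation shows why a quadratic area law favours splitting the width across layers. The sketch assumes the 128-bit, 4-layer split mentioned earlier in the slides and assumes each layer holds an independent 32-bit slice of the crossbar; constant factors and the cost of inter-layer vias are ignored, so the numbers are relative only.

```python
def relative_crossbar_area(width_bits: int) -> int:
    """Relative crossbar area under the slide's 'area proportional to width^2' model."""
    return width_bits ** 2


WIDTH = 128   # flat (single-layer) crossbar width in bits
LAYERS = 4    # assumed number of stacked layers

flat = relative_crossbar_area(WIDTH)
per_layer = relative_crossbar_area(WIDTH // LAYERS)   # footprint of one 32-bit slice
stacked_total = LAYERS * per_layer                    # silicon summed over all layers

print(per_layer / flat)       # footprint relative to the flat design
print(stacked_total / flat)   # total area relative to the flat design
```

Under these assumptions the footprint shrinks to 1/16 of the flat crossbar and the total silicon area to 1/4.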
MIRA
• Typical Design
– 36 nodes. Interconnect is a torus
– 8 nodes are cores, 28 are L2 caches
• Used the saved footprint to build multi-hop links
– Reduces latency
– Also saves power as less switching is involved
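To get a feel for how multi-hop links reduce latency, the sketch below compares the average hop count on a 6 x 6 torus (36 nodes) with and without extra links. The extra links that skip one node (span 2) are a hypothetical choice, since the slides do not say how long the multi-hop links are, and hop count is used only as a crude proxy for latency.

```python
from collections import deque


def torus_neighbors(node, side, spans):
    """Yield the neighbours of a node on a side x side torus for the given link spans."""
    r, c = divmod(node, side)
    for span in spans:
        for step in (span, -span):
            yield ((r + step) % side) * side + c   # vertical link
            yield r * side + (c + step) % side     # horizontal link


def average_hops(side, spans=(1,)):
    """Average shortest-path hop count over all ordered pairs of distinct nodes (BFS)."""
    n = side * side
    total = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in torus_neighbors(cur, side, spans):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    return total / (n * (n - 1))


base = average_hops(6)                     # plain torus, single-hop links only
express = average_hops(6, spans=(1, 2))    # plus hypothetical span-2 links
print(round(base, 3), round(express, 3))
```

Fewer hops per packet also means fewer router traversals, which is consistent with the power saving noted above.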
MIRA
• Questions
– How many layers?
• Tradeoffs: heat dissipation, cost vs. latency, total power consumption
– What if you could put the nodes themselves in multiple layers?
• Where would you place the cores and L2 caches?
– What topology would you implement?