dean tullsen ucsd. the parallelism crisis has the feel of a relatively new problem ◦ results from...
TRANSCRIPT
![Page 1: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/1.jpg)
The High Hanging Fruit
Dean TullsenUCSD
![Page 2: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/2.jpg)
The parallelism crisis has the feel of a relatively new problem◦ Results from a huge technology shift◦ Has suddenly become pervasive◦ Carries extreme urgency – our ability to continue to
scale performance is now completely tied to our ability to find parallelism.
◦ Many researchers rushing in to work on the problem.
But it is a very old problem◦ Smart people have been thinking about and
building parallel machines for about 6 decades.
The Dilemma
![Page 3: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/3.jpg)
We are faced with a “new”, critically urgent problem, but with all of the low hanging fruit stripped clean.
Few easy solutions remain.
As a result
![Page 4: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/4.jpg)
Parallel speedup of sequential code
Small pockets of parallelism
Unpowered transistors
Some deep reservoirs of untapped parallelism
![Page 5: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/5.jpg)
Many important algorithms inherently sequential.
Amdahl’s Law tells us that eventually, the sequential code always dominates.
Sequential Code Will Always Be Critically Important
![Page 6: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/6.jpg)
We’ve been referring to this as “non-traditional parallelism”
Simply stated – how do you use multiple hardware contexts to run sequential code faster than a single context?
Can we run sequential code faster on a machine optimized for parallel execution than on a machine optimized for sequential execution?
Parallel Speedup of Sequential Code
![Page 7: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/7.jpg)
Traditional ParallelismCore 1 Core 2 Core 3 Core 4
![Page 8: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/8.jpg)
Traditional ParallelismCore 1 Core 2 Core 3 Core 4
![Page 9: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/9.jpg)
Non-Traditional Parallelism (one model)
Core 1 Core 2 Core 3 Core 4
![Page 10: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/10.jpg)
nearly any code, no matter how inherently serial, can benefit from parallelization.
Much more dynamic than traditional parallelism – threads can be added or subtracted without significant disruption.
Not bound by traditional (e.g., linear) speedup limits. We often see 10X speedup with 2 or 4 cores.
Things we like about non-traditional parallelism
![Page 11: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/11.jpg)
Helper thread prefetching on multithreaded machines
Event-driven compilation (helper threads improve code and specialize for runtime conditions)
Software data spreading Inter-core prefetching Speculative multithreading/thread level
speculation??
Some examples of non-traditional parallelism
![Page 12: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/12.jpg)
Parallel speedup of sequential code
Small pockets of parallelism
Unpowered transistors
Some deep reservoirs of untapped parallelism
![Page 13: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/13.jpg)
Optimized for:
Traditional CPUs
Billio
ns o
f instru
ctions
![Page 14: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/14.jpg)
When parallel code looks like:
•Our traditional CPUs work great.
![Page 15: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/15.jpg)
When parallel code looks like:
![Page 16: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/16.jpg)
We find that we have the wrong◦ CPUs◦ Interconnect◦ Memory Hierarchy◦ Branch Predictors◦ Etc.
They’re all optimized for running billions of instructions without interruption. They perform very poorly when running 100s of instructions.
Then
![Page 17: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/17.jpg)
Parallel speedup of sequential code
Small pockets of parallelism
Unpowered transistors
Some deep reservoirs of untapped parallelism
![Page 18: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/18.jpg)
The big point – it’s easy to add transistors (cores), difficult to add more powered-up cores.
Assuming expected scaling trends, larger and larger portions of the processor must remain unpowered (idle).
Parallelism and Dark Silicon
![Page 19: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/19.jpg)
General: How do we add transistors/logic to the processor that add value even when they are not turned on?
Specific to today’s topic: How do we get higher parallel speedup from n cores (out of P*n total) than we can get from n cores (out of n total)?
The Dark Silicon Question
![Page 20: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/20.jpg)
Heterogeneous cores. Customization, specialization…
Answers we know today…
![Page 21: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/21.jpg)
If your models can’t explain the parallel speedups we’re achieving, then they are of limited usefulness. ◦ Parallel speedup of sequential code◦ Accurately accounting for overheads of spawning
computation, including sw overheads, communication, cold start, etc.
◦ Handling highly irregular opportunities for parallelism◦ Accounting for power and energy bounds on
computation.◦ Smoothly handling heterogeneous computing
elements.
Algorithms and Theory?
![Page 22: Dean Tullsen UCSD. The parallelism crisis has the feel of a relatively new problem ◦ Results from a huge technology shift ◦ Has suddenly become pervasive](https://reader036.vdocuments.site/reader036/viewer/2022070410/56649f275503460f94c3ebf8/html5/thumbnails/22.jpg)
Thank you