Research Directions for 21st Century Computer Systems
ASPLOS 2013 Panel

0. Mark Hill: Introduction
1. Kathryn McKinley on NAS Report: The Future of Computing Performance: Game Over or Next Level?
2. Josep Torrellas on CCC Workshops: Advancing Computer Architecture Research (ACAR)
3. Mark Hill on ISAT Workshop: Advancing Computer Systems without Technology Progress
4. Sarita Adve on CCC White Paper: 21st Century Computer Architecture
5. Emmett Witchel Unbounded

Impact? $15M NSF XPS (Exploiting Parallelism & Scalability) cites 1 & 4.
Q: What do we do to facilitate, transcend, or refute these partially overlapping visions?
The Future of Computing Performance: Game Over or Next Level?
Thanks to Sam Fuller & Mark Hill
Samuel H. Fuller, Chair
March 22, 2011
Computer Science and Telecommunications Board (CSTB)
National Research Council (NRC)
Committee on Sustaining Growth in Computing Performance

Experts Addressed the Problem
• SAMUEL H. FULLER, Analog Devices Inc., Chair
• LUIZ ANDRÉ BARROSO, Google, Inc.
• ROBERT P. COLWELL, Independent Consultant
• WILLIAM J. DALLY, NVIDIA Corporation and Stanford University
• DAN DOBBERPUHL, PA Semi/Apple
• PRADEEP DUBEY, Intel Corporation
• MARK D. HILL, University of Wisconsin–Madison
• MARK HOROWITZ, Stanford University
• DAVID KIRK, NVIDIA Corporation
• MONICA LAM, Stanford University
• KATHRYN S. McKINLEY, University of Texas at Austin
• CHARLES MOORE, Advanced Micro Devices
• KATHERINE YELICK, University of California, Berkeley

Staff
• LYNETTE I. MILLETT, Study Director
• SHENAE BRADLEY, Senior Program Assistant
Executive Summary
1. Computer hardware has transitioned to multicore
2. Dennard scaling of CMOS has broken down
3. Parallelism and locality must be exploited by software
4. Chip power will soon limit multicore scaling
Virtuous Cycle
[Figure: cycle linking "doubling of transistors", "Devices 2x more capable, efficient, cheaper, smaller, …", "Hardware Complexity / Sequential Interface", "Software Innovation", and "Software Complexity / Sequential Interface".]

Breaks in Virtuous Cycle
[Figure: the same cycle, with breaks labeled "end of Dennard Scaling" and "Sequential Interface".]
Next Steps
Innovate within and across layers
• Algorithms
• Programming “systems”
• Architecture
• Technology
• Education
Community
No news here? But…
Are we all acting on this knowledge, or are we carrying on with business as usual?
Are we thinking beyond the next paper to where we can create future value?
Denial … Acceptance … Act?
2. Advancing Computer Architecture Research (ACAR)
• Two workshops sponsored by CCC
  o 25 + 19 attendees
• Organizers: J. Torrellas (U Illinois) & M. Oskin (U Wash.)
• Issued a community-wide call for white papers
• Selection committee picked most relevant papers
• Included industry folks
• Also invited DARPA, DOE, NSF program managers
http://www.cra.org/ccc/docs/ACAR_Report_Popular-Parallel-Programming.pdf
http://www.cra.org/ccc/docs/ACAR2-Report.pdf
What We Found
• Data centers and extreme scale computing
• Specialized architectures and heterogeneity
  o Ultimate goal: fully automated generation of app-specific HW for programs
• Architectures for programmability
  o Performance scaling:
    Past: no SW changes
    Now: extensive SW+HW changes
• Energy and power consumption are the key limiters
• End of road for conventional ISA
  o Modern systems are skyscrapers built on the ISA of a bungalow
• Secure, reliable and predictable from the HW up
  o Foundation of computing is breaking apart; malicious parties are exploiting it

What We Found
• Exploiting emerging technologies
  o Architecture research enables new technologies to enter the market quickly
Discussion Points
• Many directions of research are relevant:
  o Computer systems research is broadening
• Focus on increasing funding pie, not re-distributing it
• Need to create coalitions with other communities:
  o Big data
  o New computing materials and devices
  o Healthcare
  o …
• Need to move away from incrementalism
Advancing Computer Systems without Technology Progress
[Figure: System Capability (log) versus decade, 80s through 50s, with regions labeled "CMOS", "Fallow Period", "New Technology", and "Our Focus".]
The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
Approved for Public Release, Distribution Unlimited
Seek ~1000x = two decades of Moore's Law via four thrusts
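The equivalence between ~1000x and two decades of Moore's Law can be checked with a few lines of arithmetic. The sketch below assumes a doubling period of two years, which is what the slide's equation implies but does not state; the function names are illustrative only.

```python
import math

# Assumption (not stated on the slide): capability doubles every two years.
YEARS_PER_DOUBLING = 2.0


def growth_after(years, years_per_doubling=YEARS_PER_DOUBLING):
    """Capability multiplier accumulated after `years` of steady doubling."""
    return 2.0 ** (years / years_per_doubling)


def years_for_factor(factor, years_per_doubling=YEARS_PER_DOUBLING):
    """Years of steady doubling needed to reach a given multiplier."""
    return math.log2(factor) * years_per_doubling


if __name__ == "__main__":
    # Ten doublings in twenty years give 2**10 = 1024, i.e. roughly 1000x.
    print(growth_after(20))
    # Conversely, 1000x takes just under twenty years at this rate.
    print(years_for_factor(1000))
```

Under a different assumed doubling period (for example 18 months) the same 1000x would take fewer years, so the "two decades" figure depends on that choice.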
A. Spectrum of Hardware Specialization
(all metrics normalized to General-Purpose)

Metric                                                       Ops/mm2  Ops/Watt  Time to Soln                     NRE
General-Purpose                                              1        1         1 (programming GPP)              1
Specialized ISA (domain specific)                            1.5      3-5       2-3 (designing & programming)    1.5
Progr. Accelerator (domain specific)                         3        5-10      2-3 (designing & programming)    2-3
Fixed Accelerator (app specific)                             5-10     10        10 (SoC design)                  3-5
Specialized Mem & Interconnect (monolithic die)              10       10        10 (SoC design)                  10
Package level integration (multi die: logic, mem, analog)    10+      10+       5 (silicon interposer)           5
B. Reduce Software Bloat (e.g., matrix multiply)
• Can we achieve PHP productivity at BLAS efficiency?

Implementation   Time (ms)    Relative to BLAS Parallel
PHP              9,298,440    51,090x
Python           6,145,070    33,764x
Java               348,749    1816x
C                   19,564    107x
Tiled C             12,887    71x
Vectorized           6,607    36x
BLAS Parallel          182    1
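The step from plain C to tiled C in the matrix-multiply numbers above comes from reordering the same arithmetic so that small blocks of the operands are reused while they are still in cache. The sketch below shows that loop restructuring in pure Python. It is not the benchmark code behind the slide, the function names and the default tile size are assumptions, and pure Python will not reproduce the measured speedups; it only illustrates the transformation and checks that both versions agree.

```python
def matmul_naive(a, b):
    """Textbook triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]
            c[i][j] = total
    return c


def matmul_tiled(a, b, tile=32):
    """Same arithmetic, visited block by block (tile size is an assumption)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Work on one tile of a, b, and c at a time.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_naive(a, b))
    print(matmul_tiled(a, b, tile=2))
```

In a compiled language the tiled version touches each block of memory many times before moving on, which is where the locality benefit comes from; the vectorized and parallel BLAS rows go further with techniques this sketch does not attempt.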
D. Locality-aware Parallelism
• Now: Seek (vast) parallelism
  o e.g., simple, energy efficient cores
• But remote communication >100x cost of compute
[Figure: energy comparison; only the label "= 1200 pJ (24x)" survives.]
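To see why a >100x communication cost pushes toward locality-aware parallelism, a back-of-the-envelope energy model helps. The sketch below is an illustration only: it assumes a remote transfer costs exactly 100 times one compute operation (the slide gives only a lower bound), measures energy in units of one compute operation, and uses hypothetical function names.

```python
def total_energy(compute_ops, remote_transfers, op_cost=1.0, remote_ratio=100.0):
    """Total energy, in units of one compute operation.

    remote_ratio is the assumed cost of one remote transfer relative to
    one compute operation (the slide says only that it exceeds 100).
    """
    return compute_ops * op_cost + remote_transfers * op_cost * remote_ratio


def remote_share(compute_ops, remote_transfers, op_cost=1.0, remote_ratio=100.0):
    """Fraction of total energy spent on remote communication."""
    total = total_energy(compute_ops, remote_transfers, op_cost, remote_ratio)
    if total == 0:
        return 0.0
    return remote_transfers * op_cost * remote_ratio / total


if __name__ == "__main__":
    # One remote transfer per 100 operations already consumes half the energy.
    print(total_energy(1_000_000, 10_000))
    print(remote_share(1_000_000, 10_000))
```

Under these assumptions, adding more simple cores helps only if the extra parallel work does not also add remote transfers, which is the point of making parallelism locality-aware.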
C. Approximate Computing Example
SECOND ORDER DIFFERENTIAL EQUATION ON ANALOG ACCELERATOR WITH DIGITAL ACCELERATOR.
Workshop Takeaway
• Can Harvest in the “Fallow” Period!
  A. HW/SW Specialization/Co-design
  B. Reduce SW Bloat
  C. Approximate Computing
  ---------------------------------------------------
  ~1000x = 2 decades of Moore’s Law!
• D. Systems must exploit LOCALITY-AWARE parallelism
• HILL’s TWO CENTS: Move beyond General-Purpose
  o Systems that do new things, e.g., Kinect
  o Optimizations that help some, e.g., big memory workloads
21st Century Computer Architecture
A Community White Paper, April-May 2012

Mark D. Hill, U Wisconsin (coordinator)
Sarita Adve, U Illinois
David H. Albonesi, Cornell U
David Brooks, Harvard U
Luis Ceze, U Washington
Sandhya Dwarkadas, U Rochester
Joel Emer, Intel/MIT
Babak Falsafi, EPFL
Antonio Gonzalez, Intel/UPC
Mary Jane Irwin, Penn State U
David Kaeli, Northeastern U
Stephen W. Keckler, NVIDIA/U Texas
Christos Kozyrakis, Stanford U
Alvin Lebeck, Duke U
Milo Martin, U Pennsylvania
José F. Martínez, Cornell U
Margaret Martonosi, Princeton U
Kunle Olukotun, Stanford U
Mark Oskin, U Washington
Li-Shiuan Peh, M.I.T.
Milos Prvulovic, Georgia Tech
Steven K. Reinhardt, AMD
Michael Schulte, AMD/U Wisconsin
Simha Sethumadhavan, Columbia U
Guri Sohi, U Wisconsin
Daniel Sorin, Duke U
Josep Torrellas, U Illinois
Thomas F. Wenisch, U Michigan
David Wood, U Wisconsin
Katherine Yelick, UC Berkeley/LBNL

+ Jim Larus & Jeannette Wing gave feedback
+ CCC, Erwin Gianchandani, Ed Lazowska guided process
Technology’s Challenges

Late 20th Century                            The New Reality
Moore’s Law: 2× transistors/chip             Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip        Gone. Can’t repeatedly double power/chip
Modest (hidden) transistor unreliability     Increasing transistor unreliability can’t be hidden
Focus on computation over communication      Communication (energy) more expensive than computation
1-time costs amortized via mass market       One-time cost much worse & want specialized platforms

How should architects step up as technology falters?
21st Century Computer Architecture

20th Century: Single-chip in stand-alone computer
21st Century: Architecture as Infrastructure: Spanning sensors to clouds
              Performance plus security, privacy, availability, programmability, …

20th Century: Performance via invisible instruction-level parallelism
21st Century: Energy First
              ● Parallelism
              ● Specialization
              ● Cross-layer design

20th Century: Predictable technologies: CMOS, DRAM, & disks
21st Century: New technologies (non-volatile memory, near-threshold, 3D, photonics, …)
              Rethink: memory & storage, reliability, communication

Cross-Cutting: Break current layers with new interfaces
Some Thoughts
Need to step up for agency positions
[Figure: "Architecture", "PL", and "OS" around "ASPLOS", with the label "ASPLOS 2014 ??????".]
NSF CCF Division Director Search
5. Emmett Witchel Unbounded
THE 90S SUCKED
JERRY GARCIA DEAD 1995
THE VERVE
THE VERVE PIPE
ARCHITECTURE WAS BORING
Intel
Date    µArch        Clock   Int95
05/96   Pentium      133     04.2
10/97   Pentium II   266     10.8
09/98   Pentium II   450     17.3

DEC Alpha
Date    µArch   Clock   Int95
03/96   21064   266     04.3
04/97   21164   500     14.4
09/98   21164   533     16.8

Architecture
Microarchitecture or Clock rate

1. Buy machine
2. Wait 18 months
3. Buy next one

MICROARCHITECTURE PROVIDES PERFORMANCE
LIFE IS BETTER NOW
ARCHITECTURE CHANGES PROVIDE VALUE

Intel
Date    µArch          Arch
01/10   Westmere       AES-NI
01/11   Sandy Bridge   Instruction for SHA-1
09/11   Ivy Bridge     RdRand

• VT-x (11/05)
• Extended Page Tables (11/08)
• VT-d (11/08)
• VPID (11/08) (tagged TLB!)

1. Consider app
2. Buy machine
3. Goto 1

HARDWARE + SOFTWARE COOPERATION NECESSARY
• Security
• Mobile
• Data centers
• Concurrency
• GPU/Accelerator

The ‘10s belong to ASPLOS
Kathryn S. McKinley
Kathryn S. McKinley is a Principal Researcher at Microsoft and an Endowed Professor of Computer Science at The University of Texas at Austin. She and her collaborators have produced widely used tools: the DaCapo Java Benchmarks, TRIPS Compiler, Hoard memory manager, MMTk garbage collector toolkit, and Immix garbage collector. Her awards include: NSF Career, ASPLOS 2009 Best Paper, 2012 IEEE Top Picks, CACM Research Highlights (2006, 2012), Most Influential OOPSLA Paper from 2002 (awarded 2012), the 2011 ACM SIGPLAN Distinguished Service Award, and the 2012 ACM SIGPLAN Programming Languages Software Award. She has graduated 17 PhD students. She is an IEEE Fellow and ACM Fellow.
Josep Torrellas
Josep Torrellas is a Professor of Computer Science at the University of Illinois Urbana-Champaign. He is the Director of the Center for Programmable Extreme Scale Computing, and the Director of the Illinois-Intel Parallelism Center (I2PC). He has also been a Willett Faculty Scholar and led the OpenSPARC Center of Excellence. He is the past Chair of the IEEE Technical Committee on Computer Architecture, and currently serves as a Council Member of CRA's Computing Community Consortium. He is a Fellow of IEEE and ACM. He has made many technical contributions in the areas of shared-memory parallel computer architecture, low-power design, hardware reliability, and software dependability. He has graduated 30 Ph.D. students, who are now leaders in academia and industry. He is currently working on the Bulk Multicore Architecture, and on the DARPA-funded Runnemede Extreme Scale Architecture, both in collaboration with Intel.
Mark Hill
Mark D. Hill (www.cs.wisc.edu/~markhill) is a professor in both the computer sciences department and the electrical and computer engineering department at the University of Wisconsin–Madison, where he also co-leads the Wisconsin Multifacet (www.cs.wisc.edu/multifacet/) project with David Wood. His research interests include parallel computer system design, memory system design, computer simulation, deterministic replay, and transactional memory. He earned a PhD from the University of California, Berkeley. He is an ACM Fellow and a Fellow of the IEEE.
Sarita Adve
Sarita Adve is Professor of Computer Science at the University of Illinois at Urbana-Champaign. Her research interests are in computer architecture and systems, parallel computing, and power and reliability-aware systems. Her honors include the Anita Borg Institute Women of Vision award in innovation, the ACM SIGARCH Maurice Wilkes award, the University Scholar recognition by the University of Illinois, and an Alfred P. Sloan Research Fellowship. She is a fellow of the ACM and the IEEE. She serves on the boards of the Computing Research Association and ACM SIGARCH. She received her Ph.D. in Computer Science from the University of Wisconsin-Madison in 1993.
Emmett Witchel
Emmett Witchel is an associate professor in computer science at The University of Texas at Austin. He and his group are interested in operating systems, security, and architecture. Most of his current research is about secure systems, GPU systems, and concurrent systems. He received his doctorate from MIT in 2004.