Joint UIUC/UMD Parallel Algorithms/Programming Course David Padua, University of Illinois at Urbana-Champaign Uzi Vishkin, University of Maryland, speaker Jeffrey C. Carver, University of Alabama

Post on 20-Dec-2015


Page 1

Joint UIUC/UMD Parallel Algorithms/Programming Course

David Padua, University of Illinois at Urbana-Champaign

Uzi Vishkin, University of Maryland, speaker

Jeffrey C. Carver, University of Alabama

Page 2

Motivation 1/4

Programmers of today’s parallel machines must overcome 3 productivity busters, beyond just identifying operations that can be executed in parallel:

(i) impose the often difficult 4-step programming-for-locality recipe: decomposition, assignment, orchestration, and mapping [CS99]

(ii) reason about concurrency in threads; e.g., race conditions

(iii) for machines such as GPUs, which fall behind on serial (or low-parallelism) code, whole programs must be highly parallel

Page 3

Motivation 2/4: Commodity computer systems

If you want your program to run significantly faster … you’re going to have to parallelize it

Parallelism: only game in town

But, where are the players?

“The Trouble with Multicore: Chipmakers are busy designing microprocessors that most programmers can't handle”—D. Patterson, IEEE Spectrum 7/2010

• Only heroic programmers can exploit the vast parallelism in current machines – Report by CSTB, U.S. National Academies 2011

• An education agenda must: (i) recognize this reality, (ii) adapt to it, and (iii) identify broad impact opportunities for education

Page 4

Motivation 3/4: Technical Objectives

• Parallel computing exists for providing speedups over serial computing

• Its emerging democratization means the general body of CS students & graduates must be capable of achieving good speedups

What is at stake?

A general-purpose computer that can be programmed effectively by too few programmers, or that requires excessive learning, makes application SW development cost more, weakening the market potential of not only the computer:

Traditionally, economists look to the manufacturing sector for bettering the recovery prospects of the economy. Software production is the quintessential 21st century mode of manufacturing. These prospects are at peril if most programmers are unable to design effective software for mainstream computers

Page 5

Motivation 4/4: Possible Roles for Education

• Facilitator. Prepare & train students and the workforce for a future dominated by parallelism.

• Testbed. Experiment with vertical approaches and refine them to identify the most cost-effective ways for achieving speedups.

• Benchmark. Given a vertical approach, identify the developmental stage at which it can be taught. Rationale: Ease of learning/teaching is a necessary (though not sufficient) condition for ease-of-programming

Page 6

The joint inter-university course

• UIUC: Parallel Programming for Science and Engineering, Prof: DP

• UMD: Parallel Algorithms, Prof: UV

• Student population: upper-division undergrads and graduate students. Diverse majors and backgrounds

• ~1/2 of the fall 2010 sessions, joint by videoconferencing.

Objectives

1. Demonstrate logistical and educational feasibility of a real-time co-taught course.

Outcome: Overall success. Minimal glitches. Helped to alert students that success on material taught by the other prof is as important.

2. Compare OpenMP using 8-processor SMP against PRAM/XMTC using 64-processor XMT (<1/4 of silicon area for 2 SMP processors)

Page 7

Joint sessions

• DP taught OpenMP programming. Provided parallel architecture knowledge

• UV taught parallel (PRAM) algorithms. ~20 minutes of XMTC programming

• 3 joint programming assignments

Non-shared sessions

• UIUC: mostly MPI. Submitted more OpenMP programming assignments

• UMD: More parallel algorithms. Dry homework on design & analysis of parallel algorithms. Submitted a more demanding XMTC programming assignment

JC: Anonymous questionnaire filled by the students. Accessed by DP and UV only after all grades were posted, per IRB guidelines

Page 8

Rank approaches for achieving (hard) speedups

Breadth-first-search (BFS) example

• 42 students in fall 2010 joint UIUC/UMD course

- <1X speedups using OpenMP on 8-processor SMP

- 7X-25X speedups on 64-processor XMT FPGA prototype
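To make the BFS example concrete, here is a minimal serial sketch in Python of level-synchronous BFS. It is not the course's OpenMP or XMTC assignment code; the function name `bfs_levels` and the sample graph are hypothetical. It only illustrates where the parallelism lies: all vertices of one frontier can, in principle, be expanded in the same round.

```python
from typing import Dict, List


def bfs_levels(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Return the BFS level (edge distance from source) of each reachable vertex."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # On a parallel machine, every vertex in `frontier` (and every edge
        # leaving it) could be handled concurrently in this round.
        # This sketch simply runs the round serially.
        for u in frontier:
            for v in adj.get(u, []):
                # In a truly parallel version, this check-and-set would have
                # to be atomic so that each vertex joins the frontier once.
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Hypothetical 5-vertex undirected graph given as adjacency lists.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
levels = bfs_levels(graph, 0)
print(levels)
```

The work available in each round is bounded by the size of the frontier and its outgoing edges, so narrow frontiers leave little to run in parallel.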

Questionnaire: All students but one ranked XMTC ahead of OpenMP for achieving speedups

In view of this evidence: are we really ready for standards?

Page 9

Parallel Random-Access Machine/Model

PRAM:

n synchronous processors all having unit time access to a shared memory.

Reactions

You’ve got to be kidding, this is way:

- Too easy

- Too difficult: Why even mention processors? What to do with n processors? How to allocate processors to instructions?

Page 10

Immediate Concurrent Execution (ICE)

‘Work-Depth framework’ [SV82]. Adopted in Par Alg texts [J92, KKT01].

Example: Pairwise parallel summation. 1st round for 8 elements: in parallel, 1st+2nd, 3rd+4th, 5th+6th, 7th+8th

ICE basis for architecture specs: V, Using simple abstraction to reinvent computing for parallelism, CACM 1/2011

Similar to role of stored-program & program-counter in arch specs for serial comp
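The pairwise summation example can be sketched in a few lines of Python. This is an illustrative serial simulation under the Work-Depth view, not XMTC code; the name `pairwise_sum` and the work/depth counters are assumptions introduced here. Each loop iteration stands for one round in which all listed additions are independent and could execute concurrently.

```python
def pairwise_sum(values):
    """Sum by rounds of pairwise additions; return (total, work, depth).

    work  = total number of additions performed
    depth = number of rounds (additions within a round are independent)
    """
    current = list(values)
    work = 0
    depth = 0
    while len(current) > 1:
        # One round: add neighbours (1st+2nd, 3rd+4th, ...).
        nxt = [current[i] + current[i + 1]
               for i in range(0, len(current) - 1, 2)]
        work += len(nxt)
        # An unpaired last element is carried over to the next round.
        if len(current) % 2 == 1:
            nxt.append(current[-1])
        current = nxt
        depth += 1
    total = current[0] if current else 0
    return total, work, depth


total, work, depth = pairwise_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(total, work, depth)
```

For 8 elements the first round performs the four additions listed above, and the whole sum takes 7 additions (work) in 3 rounds (depth). The description never mentions how many processors exist; it only states what can be done in each round.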

Page 11

Feasible for many-cores

Algorithms

Programming

Programmer’s workflow

Rudimentary yet stable compiler

PRAM-On-Chip HW Prototypes

• 64-core, 75 MHz FPGA of XMT [SPAA98..CF08]

• Toolchain: compiler + simulator [HIPS’11]

• 128-core interconnection network, IBM 90nm: 9mm x 5mm, 400 MHz [HotI07]

• FPGA design → ASIC, IBM 90nm: 10mm x 10mm, 150 MHz

Architecture scales to 1000+ cores on-chip

XMT homepage: www.umiacs.umd.edu/users/vishkin/XMT/index.shtml or search: ‘XMT’

Page 12

Has the study of PRAM algorithms helped XMT programming?

• Majority of UIUC students: No

• UMD students: Strong Yes, enforced by written explanation

Discussion

Exposure of UIUC students to PRAM algorithms and XMT programming was much more limited. Their understanding of this material was not challenged by analytic homework or exams.

For same programming challenges, performance of UIUC and UMD students was similar.

Must students be exposed to a minimal amount of parallel algorithms and their programming, and be properly challenged on analytic understanding, to internalize their merit? If yes: tension with pressure on parallel computing courses to cover a hodge-podge of programming paradigms & architecture backgrounds

Page 13

More Issues/lessons

• Recall the titles of the courses at UIUC/UMD: Should we use class time only for algorithms or also for programming? Algorithms: high level of abstraction. Allows covering more advanced problems. Note: Understanding tested only for UMD students.

• Made do with already assigned courses. Next time: more homogeneous population; e.g., CS grad class. If interested in taking part, please let us know

• General lesson: IRB requires pre-submission of all questionnaires. Must complete planning by then.

Page 14

Conclusion

For parallelism to succeed serial computing in the mainstream, the first experience of students has got to:

- demonstrate solid hard speedups

- be trauma-free

Beyond education: Objective rankings of approaches for achieving hard speedups provide a clue for curing the ills of the field.

Page 15

Course homepages

agora.cs.illinois.edu/display/cs420fa10/Home and www.umiacs.umd.edu/users/vishkin/TEACHING/enee459p-f10.html

For summary of the PRAM/XMT education approach: www.umiacs.umd.edu/users/vishkin/XMT/PPOPPCPATH2011.pdf

Includes teaching experience extending from middle school to graduate courses, course material [class notes, programming assignments, video presentations of a full-day tutorial and a full-semester graduate course], a software toolchain (compiler and cycle-accurate simulator, HIPS 5/20) available for free download, and the XMT hardware

Page 16

How I teach parallel algorithms at different developmental stages

• Graduate: In class, same PRAM algorithms course as in prior decades and complexity-style dry HW. <20 minutes of XMTC programming. 6 programming assignments with target hard-speedup objectives. Include: parallel graph connectivity and XMT performance tuning

• Upper division undergraduate: Less dry HW. Less programming. Still demand hard speedups

• Freshmen/HS [SIGCSE’10]: Minimal/no dry HW. Same problems as in freshmen serial programming course

Understanding of par algorithms needs to be enforced & validated by programming, or otherwise most students will get very little from it

Page 17

What about architecture education?

• Badly needed: parallel architectures that make parallel thinking easier

• In the happy days of serial computing, stored-program + program counter put a wall between arch and alg, so algs were low priority. Not now!

• A trigger for XMT: brilliant incompetence of CSE@UMD. ECE faculty never teach undergrad alg courses. Can be alg researcher and teach arch courses … XMT

Reality: Few regularly teach arch and (grad) alg courses, not to say par algs

But, why rely on accidents?! Teach next-generation arch students to master both, so that they can be better architects

• Very different thought styles are used for one and the same problem more often than are very closely related ones (1935, Ludwik Fleck, ‘the Turing’ of Sociology of Science)
