Parallel and Distributed Computing
Chapter 1: Introduction to Parallel Computing

Jun Zhang

Laboratory for High Performance Computing & Computer Simulation
Department of Computer Science

University of Kentucky
Lexington, KY 40506

1.1a: von Neumann Architecture

Common machine model for over 40 years
Stored-program concept
CPU executes a stored program
A sequence of read and write operations on the memory
Order of operations is sequential
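
To make this sequential fetch-and-execute cycle concrete, below is a minimal toy stored-program machine in C. The four opcodes, the accumulator design, and the memory layout are invented for illustration; this is a sketch of the model, not any particular machine.

    #include <stdio.h>

    /* Toy stored-program machine: the CPU fetches one instruction at a
     * time and executes it, in order, reading and writing memory.
     * The opcodes below are hypothetical, chosen only for illustration. */
    enum { LOAD, ADD, STORE, HALT };

    typedef struct { int op, addr; } Instr;

    int main(void) {
        int mem[8] = {5, 7, 0};                 /* data memory          */
        Instr prog[] = {                        /* the stored program   */
            {LOAD, 0}, {ADD, 1}, {STORE, 2}, {HALT, 0}
        };
        int acc = 0;                            /* accumulator register */
        for (int pc = 0; ; pc++) {              /* strictly sequential  */
            Instr in = prog[pc];                /* fetch                */
            switch (in.op) {                    /* decode and execute   */
            case LOAD:  acc = mem[in.addr];  break;   /* memory read   */
            case ADD:   acc += mem[in.addr]; break;
            case STORE: mem[in.addr] = acc;  break;   /* memory write  */
            case HALT:  printf("mem[2] = %d\n", mem[2]); return 0;
            }
        }
    }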

1.1b: A More Detailed Architecture based on von Neumann Model

1.1c: Old von Neumann Computer

1.1d: CISC von Neumann Computer

CISC stands for Complex Instruction Set Computer, with a single bus system
The Harvard (RISC) architecture utilizes two buses: a separate instruction bus and a separate data bus
RISC stands for Reduced Instruction Set Computer
Both are SISD machines – Single Instruction Stream on Single Data Stream

1.1e: Personal Computer

1.1f: John von Neumann

December 28, 1903 – February 8, 1957
Hungarian mathematician
Mastered calculus at 8
Graduate-level math at 12
Got his Ph.D. at 23
His proposal to his 1st wife: “You and I might be able to have some fun together, seeing as how we both like to drink.”

1.2a: Motivations for Parallel Computing

Fundamental limits on single-processor speed
Disparity between CPU & memory speeds
Distributed data communications
Need for very large scale computing platforms

1.2b: Fundamental Limits – Cycle Speed

Cray 1: 12 ns (1975)
Cray 2: 6 ns (1986)
Cray T-90: 2 ns (1997)
Intel PC: 1 ns (2000)
Today’s PC: 0.3 ns (2006, P4)

Speed of light: 30 cm in 1 ns
Signals travel about 10 times slower
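
A back-of-the-envelope sketch in C of what this limit implies; the 1/10-of-light signal speed comes from the slide above, and the program is just illustrative arithmetic:

    #include <stdio.h>

    /* How far a signal can travel in one clock cycle, assuming it moves
     * at about 1/10 the speed of light (30 cm/ns), per the slide. */
    int main(void) {
        double signal_cm_per_ns = 30.0 / 10.0;      /* ~3 cm per ns */
        double cycle_ns[] = {12.0, 6.0, 2.0, 1.0, 0.3};
        for (int i = 0; i < 5; i++)
            printf("cycle %4.1f ns -> signal reach %4.1f cm\n",
                   cycle_ns[i], signal_cm_per_ns * cycle_ns[i]);
        return 0;
    }

At 0.3 ns per cycle a signal covers only about 0.9 cm, so a faster single processor must also be physically smaller. This is one of the fundamental limits that motivates parallelism.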

1.2c: High-End CPU is Expensive

Price for high-end CPU rises sharply

Intel processor price/performance

1.2d: Moore’s Law

Moore’s observation in 1965: the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented
Moore’s revised observation in 1975: the pace slowed down a bit, but data density had doubled approximately every 18 months
How about the future? (Does the price of computing power fall by half every 18 months?)
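
Stated as a formula (a sketch using the revised 18-month doubling period, with t in years):

    N(t) = N(t_0) \cdot 2^{(t - t_0)/1.5}

Over 15 years, for example, this predicts a density increase of 2^{10}, roughly a factor of 1000.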

1.2e: Moore’s Law – Held for Now

1.3a: CPU and Memory Speeds

In 20 years, CPU speed (clock rate) has increased by a factor of 1000
DRAM speed has increased only by a factor of less than 4
How to feed data fast enough to keep the CPU busy?
CPU speed: 1-2 ns
DRAM speed: 50-60 ns
Cache: 10 ns
(At these speeds, a single DRAM access costs tens of CPU cycles.)

1.3b: Memory Access and CPU Speed

DDR – double data rate

1.3b: CPU, Memory, and Disk Speed

1.3c: Possible Solutions

A hierarchy of successively faster memory devices (multilevel caches)
Locality of data reference (illustrated by the sketch after this list)
Efficient programming can be an issue
Parallel systems may provide:
1.) larger aggregate cache
2.) higher aggregate bandwidth to the memory system
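
The sketch below illustrates locality of reference in C; the matrix size and the use of clock() are arbitrary choices for the example. Walking a matrix row by row touches consecutive addresses and reuses cached lines; walking it column by column strides across memory and misses the cache far more often, even though both loops do the same arithmetic.

    #include <stdio.h>
    #include <time.h>

    #define N 2048
    static double a[N][N];                   /* ~32 MB, row-major in C */

    int main(void) {
        double sum = 0.0;
        clock_t t0 = clock();
        for (int i = 0; i < N; i++)          /* row-major walk:        */
            for (int j = 0; j < N; j++)      /* consecutive addresses  */
                sum += a[i][j];
        clock_t t1 = clock();
        for (int j = 0; j < N; j++)          /* column-major walk:     */
            for (int i = 0; i < N; i++)      /* strided, cache-hostile */
                sum += a[i][j];
        clock_t t2 = clock();
        printf("row-major:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
        printf("column-major: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
        return sum != 0.0;                   /* keep sum live          */
    }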

1.4a: Distributed Data Communications

Data may be collected and stored at different locations
It is expensive to bring them to a central location for processing
Many computing assignments may be inherently parallel
Privacy issues in data mining and other large-scale commercial database manipulations

1.4b: Distributed Data Communications

1.5a: Why Use Parallel Computing

Save time – wall clock time – many processors work together
Solve larger problems – larger than one processor’s CPU and memory can handle
Provide concurrency – do multiple things at the same time: online access to databases, search engines
Google’s 4,000 PC servers are one of the largest clusters in the world

1.5b: Other Reasons for Parallel Computing

Taking advantage of non-local resources – using computing resources on a wide area network, or even the Internet (grid computing)
Cost savings – using multiple “cheap” computing resources instead of a high-end CPU
Overcoming memory constraints – for large problems, using the memories of multiple computers may overcome the memory constraint obstacle

1.6a: Need for Large Scale Modeling

Weather forecasting
Ocean modeling
Oil reservoir simulations
Car and airplane manufacture
Semiconductor simulation
Pollution tracking
Large commercial databases
Aerospace (NASA microgravity modeling)

1.6b: Semiconductor Simulation

Before 1975, an engineer had to make several runs through the fabrication line until a successful device was fabricated
Device dimensions shrink below 0.1 micrometer
A fabrication line costs 1.0 billion dollars to build
A design must be thoroughly verified before it is committed to silicon
A realistic simulation of one diffusion process may take days or months to run on a workstation
Chip price drops quickly after entering the market

1.6c: Drug Design

Most drugs work by binding to a specific site, called a receptor, on a protein
A central problem is to find molecules (ligands) with high binding affinity
Need to accurately and efficiently estimate electrostatic forces in molecular and atomic interactions
Calculate drug-protein binding energies from quantum mechanics, statistical mechanics, and simulation techniques
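
As a flavor of the kind of kernel such estimates involve, here is a minimal pairwise Coulomb energy sum in C. The toy atom data, the units, and the plain O(n^2) double loop are illustrative assumptions, not the methods named above:

    #include <stdio.h>
    #include <math.h>

    /* Pairwise Coulomb energy: E = k * sum over i<j of q_i q_j / r_ij.
     * Toy data and a plain O(n^2) loop, for illustration only. */
    typedef struct { double x, y, z, q; } Atom;

    double coulomb_energy(const Atom *a, int n) {
        const double k = 332.06;   /* approx. Coulomb constant,
                                      kcal*Angstrom/(mol*e^2) */
        double e = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {  /* each pair once */
                double dx = a[i].x - a[j].x;
                double dy = a[i].y - a[j].y;
                double dz = a[i].z - a[j].z;
                double r = sqrt(dx*dx + dy*dy + dz*dz);
                e += k * a[i].q * a[j].q / r;
            }
        return e;
    }

    int main(void) {
        Atom mol[] = { {0,0,0, -0.8}, {1,0,0, 0.4}, {0,1,0, 0.4} };
        printf("E = %.3f kcal/mol\n", coulomb_energy(mol, 3));
        return 0;
    }

Because every pair term is independent, loops like this are natural candidates for distribution across many processors.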

1.6d: Computing Protein Binding

1.7: Issues in Parallel Computing

Design of parallel computers
Design of efficient parallel algorithms
Methods for evaluating parallel algorithms
Parallel computer languages
Parallel programming tools
Portable parallel programs
Automatic programming of parallel computers
Education of parallel computing philosophy

1.8 Eclipse Parallel Tools Platform

A standard, portable parallel integrated development environment that supports a wide range of parallel architectures and runtime systems (IBM)
A scalable parallel debugger
Support for the integration of a wide range of parallel tools
An environment that simplifies the end-user interaction with parallel systems

1.9 Message Passing Interface (MPI)

We will use MPI on UK supercomputers
Implementations of the Message Passing Interface can also be downloaded online
Parallel computing can be simulated on your own computer
UK can provide distributed computing services for research purposes, for free
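
As a first taste, here is the canonical MPI hello-world in C (a minimal sketch; on a typical installation it is compiled with mpicc and launched with something like mpirun -np 4 ./hello, though exact commands vary by site):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start the MPI runtime  */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id      */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes    */
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();                         /* shut down cleanly      */
        return 0;
    }

Each process prints its own rank, so running with -np 4 produces four lines, in no guaranteed order.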

1.10 Cloud Computing

Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided over the Internet
Users need not have knowledge of, expertise in, or control over the technology infrastructure in the “cloud” that supports them
Compare with Grid Computing (a cluster of networked, loosely coupled computers), Utility Computing (packaging of computing resources as a metered service), and Autonomic Computing (computer systems capable of self-management)

1.11 Cloud Computing

By Sam Johnston, via Wikimedia Commons

1.12 Cloud Computing

A means to increase computing capacity or add computing capabilities at any time, without investing in new infrastructure, training new personnel, or licensing new software

1.13 Cloud Computing
