Parallel Processing: Sharing the Load
Inside a Processor
Chip in Package
Circuits
• Primarily Crystalline Silicon
• 1 mm – 25 mm on a side
• 100 million to billions of transistors
– current “feature size” (process): ~22 nanometers
• Package provides:
– communication with motherboard
– heat dissipation
Moore's Law
• Number of transistors in same area doubles every 2 years
• Net effect: processing power doubles approximately every 18 months
Adapted from UC Berkeley "The Beauty and Joy of Computing"
Exponential Growth
• Doubling is exponential growth
Year   0   1.5   3   4.5   6    7.5   9    10.5   12
Speed  1   2     4   8     16   32    64   128    256
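The table follows from a single rule: speed doubles every 1.5 years. A minimal Python sketch that regenerates it (the name `relative_speed` is illustrative, not from the slides):

```python
def relative_speed(years, doubling_period=1.5):
    """Relative speed after `years`, assuming one doubling per period."""
    return 2 ** (years / doubling_period)

# One row per doubling period, covering years 0 through 12.
for step in range(9):
    year = step * 1.5
    print(f"{year:>5} {relative_speed(year):>6.0f}")
```

Eight doublings in 12 years give a factor of 256, which is why small differences in the doubling period matter so much over a decade.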
Moore's Law
Gordon Moore, Intel cofounder
Moore's Law
• If Moore's Law were applicable to the airline industry, a flight from New York to Paris that cost $900 and took seven hours in 1978 would now cost about $0.01 and take less than one second.
Power Density Prediction circa 2000
[Figure: power density (W/cm², log scale from 1 to 10000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® and P6 processors, with Core 2 also marked. Reference levels: hot plate, nuclear reactor, rocket nozzle, Sun’s surface. Source: S. Borkar (Intel)]
Multicore
• Multicore: multiple processing cores on one chip
– Each core can run a different program
Going Multi-core Helps Energy Efficiency
• Speed takes power, power = heat
– Can run at 80% speed with 50% power
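One way to read the 80% / 50% figure: if the work splits perfectly across cores (an idealization that the later slides qualify), two slowed-down cores fit in the power budget of one full-speed core and still do more work. The two constants come from the slide. The function name and the perfect-split assumption are illustrative.

```python
# Figures from the slide: a core slowed to 80% speed draws 50% power.
SLOW_SPEED = 0.8
SLOW_POWER = 0.5

def multicore_totals(cores):
    """Total relative throughput and power for `cores` slowed-down cores.

    Assumes the workload divides perfectly across cores (an idealization).
    """
    return cores * SLOW_SPEED, cores * SLOW_POWER

throughput, power = multicore_totals(2)
print(f"2 cores: {throughput:.1f}x throughput at {power:.1f}x power")
```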
Moore's Law Related Curves
Issues
• Not every part of a problem scales well
– Parallel: can run at the same time
– Serial: must run one at a time, in order
Speedup Issues
• 5 workers can do the parallel portion in 1/5th the time
• Can't affect the serial part
[Figure: time versus number of cores (1 and 5), with each bar split into a parallel portion and a serial portion]
Speedup Issues
• Adding workers provides diminishing returns
[Figure: time versus number of cores (1 to 5), with each bar split into a parallel portion and a serial portion]
Amdahl’s Law
• Amdahl’s law: predicts how many times faster N workers can do a task in which a portion P is parallel:
Speedup = 1 / ((1 - P) + P / N)
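A minimal Python sketch of the law as it is usually stated: the serial fraction (1 - P) takes the same time no matter how many workers there are, while the parallel fraction P is divided among N workers. The function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(p, n):
    """Speedup with n workers when fraction p of the job is parallel."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = 1 - p               # unaffected by extra workers
    return 1 / (serial + p / n)  # parallel part shrinks by a factor of n

# A job that is 60% parallel, with a growing number of workers.
for n in (1, 2, 3, 5, 100):
    print(n, round(amdahl_speedup(0.6, n), 2))
```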
Amdahl’s Law
• 60% of a job can be made parallel. We use 2 processors:
Speedup = 1 / (0.4 + 0.6 / 2) = 1 / 0.7 ≈ 1.43
• 1.43x faster with 2 workers than with 1
Amdahl’s Law
• 60% of a job can be made parallel. We use 3 processors:
Speedup = 1 / (0.4 + 0.6 / 3) = 1 / 0.6 ≈ 1.67
• 1.67x faster than with 1 worker
Amdahl’s Law
• Always have to do 40% of the work in serial
• With infinite workers:
Speedup = 1 / 0.4 = 2.5
Only 2.5x faster!
Limits
• Max speedup is limited by the parallel portion of the code: with unlimited workers, the speedup approaches 1 / (1 - P)
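The bound can be tabulated for a few parallel fractions. This is a sketch assuming the standard limit of Amdahl's law as the number of workers grows without bound; `max_speedup` and the sample fractions are illustrative choices.

```python
def max_speedup(p):
    """Upper bound on speedup when fraction p of the job is parallel."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return 1 / (1 - p)  # only the serial fraction remains

for p in (0.5, 0.6, 0.9, 0.95, 0.99):
    print(f"{p:.0%} parallel -> at most {max_speedup(p):.1f}x")
```

Even a job that is 95% parallel tops out at about 20x, however many cores are available.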
Speedup Issues: Overhead
• Even assuming no sequential portion, there’s…
– Time to think how to divide the problem up
– Time to hand out small “work units” to workers
– All workers may not work equally fast
– Some workers may fail
– There may be contention for shared resources
– Workers could overwrite each other’s answers
– You may have to wait until the last worker returns to proceed (the slowest / weakest link problem)
– There’s time to put the data back together in a way that looks as if it were done by one
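Several of these costs can be put into a toy timing model: the job finishes only when the slowest worker does, and dividing the work and reassembling the results add time on top. The function name and all of the numbers below are hypothetical, chosen only to illustrate the effect.

```python
def wall_clock_time(worker_times, split_cost=0.0, merge_cost=0.0):
    """Elapsed time for a fully parallel job.

    The job ends when the slowest worker finishes, plus the time spent
    dividing the work and putting the results back together.
    """
    return split_cost + max(worker_times) + merge_cost

SERIAL_TIME = 10.0  # hypothetical time for one worker to do everything

# Five equally fast workers and no overhead.
ideal = wall_clock_time([2.0, 2.0, 2.0, 2.0, 2.0])

# Uneven workers (one straggler) plus split and merge overhead.
realistic = wall_clock_time([2.0, 2.1, 1.9, 2.0, 3.5],
                            split_cost=0.3, merge_cost=0.4)

print(f"ideal speedup:     {SERIAL_TIME / ideal:.2f}x")
print(f"realistic speedup: {SERIAL_TIME / realistic:.2f}x")
```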
Concurrency
• Concurrency: two things happening at the same time
• Many things don't work well concurrently
– Printers
– Shared memory
No Synchronization
• Race condition: an unpredictable result that depends on the timing of concurrent operations
No Synchronization
• x starts as 5; A adds 10 and B adds 1. Four possible cases:
Case 1 (final x = 16):
– A runs, x = 15
– B runs, x = 16
Case 2 (final x = 16):
– B runs, x = 6
– A runs, x = 16
Case 3 (final x = 6):
– A gets x (5)
– B gets x (5)
– A adds 10 (has 15)
– B adds 1 (has 6)
– A stores x = 15
– B stores x = 6
Case 4 (final x = 15):
– A gets x (5)
– B gets x (5)
– A adds 10 (has 15)
– B adds 1 (has 6)
– B stores x = 6
– A stores x = 15
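The cases can be checked by brute force. The sketch below models each worker as three steps (get x, add, store x), as in the cases listed, and runs every possible interleaving of A's and B's steps. The helper `run` and the schedule encoding are illustrative.

```python
from itertools import permutations

def run(schedule, x=5):
    """Execute one interleaving; `schedule` is a sequence of worker names."""
    local = {}                  # each worker's private copy of x
    step = {"A": 0, "B": 0}     # next step each worker will perform
    add = {"A": 10, "B": 1}     # A adds 10, B adds 1
    for worker in schedule:
        if step[worker] == 0:   # get x
            local[worker] = x
        elif step[worker] == 1: # add to the private copy
            local[worker] += add[worker]
        else:                   # store x
            x = local[worker]
        step[worker] += 1
    return x

# Every ordering of three A steps and three B steps (20 in total).
schedules = set(permutations("AAABBB"))
results = {run(s) for s in schedules}
print(sorted(results))  # [6, 15, 16]
```

Only the orderings where one worker finishes before the other reads x give 16; whenever both read the original 5, the last store wins.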
Locks
• Can prevent concurrency problems with locks:
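A minimal Python sketch of the idea, using `threading.Lock` from the standard library so that each get/add/store sequence on the shared value runs as one uninterrupted unit. The variable and function names are illustrative.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(amount, times):
    """Add `amount` to the shared counter `times` times."""
    global counter
    for _ in range(times):
        # Only one thread at a time may hold the lock, so the
        # read-modify-write below cannot be interleaved.
        with counter_lock:
            counter += amount

a = threading.Thread(target=add_many, args=(10, 10_000))
b = threading.Thread(target=add_many, args=(1, 10_000))
a.start()
b.start()
a.join()
b.join()
print(counter)  # 110000
```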
Deadlock
• But if we have…
– Mutual exclusion: can't share resources
– Hold and wait: you can reserve one resource while waiting on another
– No preemption: can't remove a resource from a process's control
• Can have deadlock…
Deadlock
• Workers A and B both want to use locked resources X and Y:
Breaking Deadlock
• Must remove one condition
– Mutual exclusion: find a way to share
– Hold and wait: if you wait, you must give up other resources
– No preemption: take back a resource someone has claimed
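As one illustration, the hold-and-wait condition can be removed in Python by never blocking on a second lock while holding the first. In this sketch, workers A and B want locks X and Y in opposite orders, which would risk deadlock with plain blocking acquires. Instead, each worker tries the second lock without blocking and, if it is busy, gives up the first and retries after a short random pause. The names, round count, and pause length are illustrative assumptions.

```python
import random
import threading
import time

lock_x = threading.Lock()
lock_y = threading.Lock()
completed = []

def use_both(name, first, second, rounds=50):
    """Repeatedly use two resources without holding one while waiting."""
    done = 0
    while done < rounds:
        with first:
            # Try the second resource, but do not wait for it.
            if second.acquire(blocking=False):
                try:
                    done += 1  # both resources held: do the work
                finally:
                    second.release()
                continue
        # `second` was busy and `first` has been released on leaving
        # the `with` block; pause briefly so the other worker can finish.
        time.sleep(random.uniform(0, 0.005))
    completed.append(name)

a = threading.Thread(target=use_both, args=("A", lock_x, lock_y))
b = threading.Thread(target=use_both, args=("B", lock_y, lock_x))
a.start()
b.start()
a.join()
b.join()
print(sorted(completed))  # ['A', 'B']
```

The random pause matters: without it the two workers could keep colliding and retrying in lockstep, which avoids deadlock but still makes no progress.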
Why Parallelism?
• We have no choice!
– Multicore processors are a plan B, not a triumph for parallelism
• Parallel processing takes new
– Architectures
– Algorithms
– Structures
– Languages