MULTICORE, PARALLELISM, AND MULTITHREADING By: Eric Boren, Charles Noneman, and Kristen Janick


TRANSCRIPT

Page 1: Multicore, parallelism, and multithreading

MULTICORE, PARALLELISM, AND MULTITHREADING
By: Eric Boren, Charles Noneman, and Kristen Janick

Page 2: Multicore, parallelism, and multithreading

MULTICORE PROCESSING
Why we care

Page 3: Multicore, parallelism, and multithreading

What is it?

A processor with more than one core on a single chip

Core: An independent system capable of processing instructions and modifying registers and memory

Page 4: Multicore, parallelism, and multithreading

Motivation

Advancements in component technology and optimization now contribute little to processor speed
Many CPU applications attempt to do multiple things at once:
  Video editing
  Multi-agent simulation
So, use multiple cores to get it done faster

Page 5: Multicore, parallelism, and multithreading

Hurdles

Instruction assignment (who does what?)
  Mostly delegated to the operating system
  Can be done to a small degree through dependency analysis on the chip
Cores must still communicate at times – how?
  Shared memory
  Message passing

Page 6: Multicore, parallelism, and multithreading

Advantages

Multiple programs:
  Can be separated between cores
  Other programs don't suffer when one hogs the CPU
Multi-threaded applications:
  Independent threads don't have to wait as long for each other – results in faster overall execution
Vs. multiple processors:
  Less distance between chips – faster communication results in a higher maximum clock rate
  Less expensive due to smaller overall chip area and shared components (caches, etc.)

Page 7: Multicore, parallelism, and multithreading

Disadvantages

The OS and programs must be optimized for multiple cores, or no gain will be seen
A singly-threaded application sees little to no improvement
There is overhead in assigning tasks to cores
The real bottleneck is typically memory and disk access time, which is independent of the number of cores

Page 8: Multicore, parallelism, and multithreading

Amdahl’s Law

The potential performance increase on a parallel computing platform is given by Amdahl's law. Large problems are made up of several parallelizable parts and non-parallelizable parts.

S = 1 / (1 - P)

S = maximum speed-up of the program (with arbitrarily many cores)
P = fraction of the program that is parallelizable
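As a quick illustration, a few lines of C can evaluate this bound (the sample values of P are chosen for illustration only):

#include <stdio.h>

/* Maximum speed-up predicted by Amdahl's law: S = 1 / (1 - P) */
double max_speedup(double p) {
    return 1.0 / (1.0 - p);
}

int main(void) {
    double fractions[] = { 0.50, 0.90, 0.95, 0.99 };  /* illustrative values of P */
    for (int i = 0; i < 4; i++)
        printf("P = %.2f -> at most %.0fx speed-up\n",
               fractions[i], max_speedup(fractions[i]));
    return 0;
}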

Page 9: Multicore, parallelism, and multithreading

Current State of the Art

Commercial processors:
  Most have at least 2 cores
  Quad-core processors are highly popular for desktop applications
  6-core processors have recently appeared on the market (Intel's i7 980X)
  8-core processors exist but are less common
Academic and research:
  MIT: RAW, 16 cores
  Intel: Polaris, 80 cores
  UC Davis: AsAP, 36 and 167 cores, individually clocked

Page 10: Multicore, parallelism, and multithreading

PARALLELISM

Page 11: Multicore, parallelism, and multithreading

What is Parallel Computing?

A form of computation in which many calculations are carried out simultaneously.

It operates on the principle that large problems can often be divided into smaller ones, which are then solved concurrently.

Page 12: Multicore, parallelism, and multithreading

Types of Parallelism

Bit-level parallelism: increase the processor word size
Instruction-level parallelism: combine instructions into groups
Data parallelism: distribute data over different computing environments
Task parallelism: distribute threads across different computing environments

Page 13: Multicore, parallelism, and multithreading

Flynn’s Taxonomy
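The taxonomy classifies architectures by the number of concurrent instruction streams and data streams; the four classes, covered on the next slides, are:

                        Single data    Multiple data
Single instruction      SISD           SIMD
Multiple instruction    MISD           MIMD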

Page 14: Multicore, parallelism, and multithreading

Single Instruction, Single Data (SISD)

Provides no parallelism in hardware

1 data stream processed by the CPU in 1 clock cycle

Instructions executed in serial fashion

Page 15: Multicore, parallelism, and multithreading

Multiple Instruction, Single Data (MISD)

Process single data stream using multiple instruction streams simultaneously

More a theoretical model than a practical one

Page 16: Multicore, parallelism, and multithreading

Single Instruction, Multiple Data (SIMD)

A single instruction stream has the ability to process multiple data streams in 1 clock cycle

Takes the operation specified in one instruction and applies it to more than one set of data elements at a time

Suitable for graphics and image processing
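For a concrete picture, here is a minimal C sketch using x86 SSE intrinsics, where a single instruction adds four floats at once (an illustration; it assumes n is a multiple of 4):

#include <xmmintrin.h>  /* x86 SSE intrinsics */

/* Add two float arrays element-wise, four elements per instruction. */
void add_arrays(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);             /* load 4 floats from a */
        __m128 vb = _mm_loadu_ps(&b[i]);             /* load 4 floats from b */
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));  /* 4 additions in 1 instruction */
    }
}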

Page 17: Multicore, parallelism, and multithreading

Multiple Instruction, Multiple Data (MIMD)

Different processors can execute different instructions on different pieces of data

Each processor can run an independent task

Page 18: Multicore, parallelism, and multithreading

Automatic parallelization

The goal is to relieve programmers of the tedious and error-prone manual parallelization process.

A parallelizing compiler tries to split up a loop so that its iterations can be executed concurrently on separate processors.

It identifies dependences between references – independent actions can operate in parallel.
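A sketch of the distinction the compiler must make (the functions are illustrative):

/* Independent iterations: b[i] depends only on a[i], so a
   parallelizing compiler can run the iterations concurrently. */
void parallelizable(const int *a, int *b, int n) {
    for (int i = 0; i < n; i++)
        b[i] = 2 * a[i];
}

/* Loop-carried dependence: iteration i reads a[i - 1], written by
   the previous iteration, so the loop must run serially. */
void not_parallelizable(int *a, int n) {
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + 1;
}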

Page 19: Multicore, parallelism, and multithreading

Parallel Programming languages

Concurrent programming languages, libraries, APIs, and parallel programming models have been created for programming parallel computers.

Parallel languages make it easier to write parallel algorithms
The resulting code can run more efficiently because the compiler has more information to work with
Data dependencies are easier to identify, so the runtime system can implicitly schedule independent work

Page 20: Multicore, parallelism, and multithreading

MULTITHREADING TECHNIQUES

Page 21: Multicore, parallelism, and multithreading

fork()

Make a (nearly) exact duplicate of the process

Good when there is little or no need for the processes to communicate

Often used for servers

Page 22: Multicore, parallelism, and multithreading

fork()

[Diagram: the parent process (globals, heap, stack) and four forked children, each with its own independent copy of the globals, heap, and stack.]

Page 23: Multicore, parallelism, and multithreading

fork()

pid_t pID = fork();

if (pID == 0) {
    // child
} else {
    // parent
}
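A complete, runnable version of the same pattern, with the parent waiting for the child (the printed messages are illustrative):

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pID = fork();
    if (pID == 0) {
        printf("child: pid %d\n", (int)getpid());
        _exit(0);               /* child exits without returning to main */
    } else if (pID > 0) {
        waitpid(pID, NULL, 0);  /* parent blocks until the child exits */
        printf("parent: child %d finished\n", (int)pID);
    } else {
        perror("fork");         /* fork failed */
    }
    return 0;
}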

Page 24: Multicore, parallelism, and multithreading

POSIX Threads

C library for threading

Available in Linux, OS X

Shared Memory

Threads are created and destroyed manually

Has mechanisms for locking memory

Page 25: Multicore, parallelism, and multithreading

POSIX Threads

[Diagram: a single process with shared globals and heap; each of its four threads has its own stack.]

Page 26: Multicore, parallelism, and multithreading

POSIX Threads

pthread_t thread;
pthread_create(&thread, NULL, function_to_call, (void*) data);

//Do stuff

pthread_join(thread, NULL);
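A self-contained sketch of the same create/join pattern (function_to_call and data here are illustrative names):

#include <pthread.h>
#include <stdio.h>

/* Thread entry point: takes and returns void*. */
void *function_to_call(void *arg) {
    printf("worker received %d\n", *(int *)arg);
    return NULL;
}

int main(void) {
    pthread_t thread;
    int data = 42;
    pthread_create(&thread, NULL, function_to_call, (void *)&data);
    // Do stuff on the main thread while the worker runs
    pthread_join(thread, NULL);  /* wait for the worker to finish */
    return 0;
}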

Page 27: Multicore, parallelism, and multithreading

POSIX Threads

int total = 0;

void do_work() {
    // Do stuff to create "result"
    total = total + result;   // unsynchronized update
}

A possible interleaving (each thread's result is 1):
Thread 1 reads total (0)
Thread 2 reads total (0)
Thread 1 does its add and saves total (1)
Thread 2 does its add and saves total (1) – one update is lost

Page 28: Multicore, parallelism, and multithreading

POSIX Threads

int total = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

void do_work() {
    // Do stuff to create "result"
    pthread_mutex_lock(&mutex);
    total = total + result;   // only one thread can update at a time
    pthread_mutex_unlock(&mutex);
}

Page 29: Multicore, parallelism, and multithreading

OpenMP

Library and compiler directives for multi-threading
Supported in Visual C++ and gcc
Code compiles even if the compiler doesn't support OpenMP
Popular in high-performance computing communities
Easy to add parallelism to existing code

Page 30: Multicore, parallelism, and multithreading

OpenMP
Initialize an Array

const int array_size = 100000;
int i, a[array_size];

#pragma omp parallel for
for (i = 0; i < array_size; i++) {
    a[i] = 2 * i;
}

Page 31: Multicore, parallelism, and multithreading

OpenMP
Reduction

#pragma omp parallel for reduction(+:total)
for (i = 0; i < array_size; i++) {
    total = total + a[i];
}

The reduction clause gives each thread a private copy of total and combines the copies when the loop finishes, avoiding the race shown in the pthreads example.
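Combining the two slides, a complete program might look like this (a sketch; with gcc, compile with -fopenmp):

#include <stdio.h>

#define ARRAY_SIZE 100000

int main(void) {
    static int a[ARRAY_SIZE];
    long total = 0;
    int i;

    #pragma omp parallel for   /* initialize the array in parallel */
    for (i = 0; i < ARRAY_SIZE; i++)
        a[i] = 2 * i;

    #pragma omp parallel for reduction(+:total)   /* sum without a race */
    for (i = 0; i < ARRAY_SIZE; i++)
        total = total + a[i];

    printf("total = %ld\n", total);
    return 0;
}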

Page 32: Multicore, parallelism, and multithreading

Grand Central Dispatch

Apple technology for multi-threading
The programmer puts work into queues
A central system process determines the number of threads to give to each queue
Code is added to queues using a closure
Right now Mac only, but open source
Easy to add parallelism to existing code

Page 33: Multicore, parallelism, and multithreading

Grand Central Dispatch
Initialize an Array

dispatch_apply(array_size, dispatch_get_global_queue(0, 0), ^(size_t i) {
    a[i] = 2 * i;
});

Page 34: Multicore, parallelism, and multithreading

Grand Central Dispatch
GUI Example

void analyzeDocument(doc) {
    do_analysis(doc);   // May take a very long time
    update_display();
}

Page 35: Multicore, parallelism, and multithreading

Grand Central Dispatch
GUI Example

void analyzeDocument(doc) {
    dispatch_async(dispatch_get_global_queue(0, 0), ^{
        do_analysis(doc);
        update_display();   // in a real app, dispatch this back to the main queue
    });
}

Page 36: Multicore, parallelism, and multithreading

Other Technologies

Threading in Java, Python, etc.

MPI – for clusters

Page 37: Multicore, parallelism, and multithreading

QUESTIONS?