Visual Studio 2010: Using the Parallel Computing Platform, Phil Pennington


Posted on 10 May 2015


DESCRIPTION

Slides from Phil Pennington's talk on Using Parallel Computing with Visual Studio 2010 and .NET 4.0, originally presented at the North Houston .NET Users Group (facebook.com/nhdnug).

TRANSCRIPT

Page 1: Using Parallel Computing Platform - NHDNUG

Visual Studio 2010

Using the Parallel Computing Platform

Phil Pennington

Page 2

Agenda

• What’s new with Windows?
• Parallel Computing Tools in Visual Studio
• Using .NET Parallel Extensions

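Page 3

First, an Example: Monte Carlo Approximation of Pi

A circle of radius r is inscribed in a square of side 2r:

S = 4*r*r (area of the square)
C = Pi*r*r (area of the circle)

Pi = 4*(C/S)

Points P(x, y) are scattered uniformly over the square, with the origin at the circle's center; the fraction of points that land inside the circle approximates C/S. For each point P:

d(P) = SQRT((x * x) + (y * y))

if (d < r) then P(x,y) is in C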
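As a sketch only (the class and method names are hypothetical, not from the talk), the sampling procedure can be written in C# with the .NET 4 Parallel.For overload that takes localInit and localFinally delegates. It fixes r = 1, gives each worker its own Random because Random is not thread-safe, and merges the per-worker counts once at the end instead of synchronizing on every hit.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not from the talk): estimates Pi with r = 1, so the
// square is [-1, 1) x [-1, 1) with S = 4 and the inscribed circle has C = Pi.
public static class MonteCarloPi
{
    public static double Estimate(int samples, int seed)
    {
        if (samples <= 0) throw new ArgumentOutOfRangeException("samples");
        long inside = 0;
        int nextSeed = seed;

        Parallel.For(0, samples,
            // localInit: runs once per worker; each gets its own Random and counter.
            () => new WorkerState(new Random(Interlocked.Increment(ref nextSeed))),
            (i, loopState, state) =>
            {
                // Uniform point P(x, y) in the square.
                double x = state.Rng.NextDouble() * 2.0 - 1.0;
                double y = state.Rng.NextDouble() * 2.0 - 1.0;
                double d = Math.Sqrt(x * x + y * y);
                if (d < 1.0) state.Inside++; // P is in C
                return state;
            },
            // localFinally: fold each worker's count into the shared total once.
            state => Interlocked.Add(ref inside, state.Inside));

        // Pi = 4 * (C / S), with C / S estimated by inside / samples.
        return 4.0 * inside / samples;
    }

    private sealed class WorkerState
    {
        public readonly Random Rng;
        public long Inside;
        public WorkerState(Random rng) { Rng = rng; }
    }
}
```

Because workers pick up seeds in scheduling order, the estimate varies slightly from run to run; its error shrinks roughly in proportion to 1/sqrt(samples).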

Page 4

Windows and Maximum Processors

• Before Windows 7 / Windows Server 2008 R2, the maximum number of logical processors (LPs) was dictated by the processor's integral word size
  – LP state (e.g. idle, affinity) is represented in a word-sized bitmask
  – 32-bit Windows: 32 LPs
  – 64-bit Windows: 64 LPs

[Figure: 32-bit idle processor mask, bits 0 through 31, each bit marking one LP as idle or busy]
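To make the word-size ceiling concrete, here is a hypothetical C# model of an idle-processor mask (the type is illustrative, not a Windows or .NET API). One ulong holds one bit per LP, so a 65th processor has no bit to occupy. The range check matters because C# masks the shift count of a ulong to 6 bits, so 1UL << 64 would silently wrap back to bit 0.

```csharp
using System;

// Hypothetical model (not a Windows or .NET API): one bit per logical
// processor in a single 64-bit word, 1 = idle, 0 = busy.
public sealed class IdleProcessorMask
{
    public const int Capacity = 64; // bits in a ulong: the word-size ceiling

    private ulong _mask;

    public void SetIdle(int lp)
    {
        CheckRange(lp);
        _mask |= 1UL << lp;
    }

    public void SetBusy(int lp)
    {
        CheckRange(lp);
        _mask &= ~(1UL << lp);
    }

    public bool IsIdle(int lp)
    {
        CheckRange(lp);
        return (_mask & (1UL << lp)) != 0;
    }

    // Counts set bits by repeatedly clearing the lowest one.
    public int IdleCount()
    {
        int count = 0;
        for (ulong m = _mask; m != 0; m &= m - 1) count++;
        return count;
    }

    // Without this check, LP 64 would alias LP 0 because of shift-count masking.
    private static void CheckRange(int lp)
    {
        if (lp < 0 || lp >= Capacity)
            throw new ArgumentOutOfRangeException("lp", "One 64-bit word can track at most 64 LPs.");
    }
}
```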

Page 5

Processor Groups: New with Windows 7 and Windows Server 2008 R2

[Figure: topology hierarchy, Group > NUMA Node > Socket > Core > Logical Processor (LP)]

Page 6

Processor Groups Example: 2 Groups, 4 NUMA Nodes, 8 Sockets, 32 Cores, 128 LPs

[Figure: 2 groups, each containing 2 NUMA nodes; each node has 2 sockets, each socket 4 cores, and each core 4 LPs]
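The arithmetic behind this example (each group holds 2 × 2 × 4 × 4 = 64 LPs, exactly one 64-bit mask's worth) can be checked with a small C# model. The class names and the numbering order are assumptions for illustration only; Windows does not promise that LP numbers follow this nesting, and real code would query the topology APIs instead.

```csharp
using System;

// Hypothetical model of the slide's example machine:
// 2 groups x 2 NUMA nodes x 2 sockets x 4 cores x 4 LPs = 128 LPs.
// The nesting order of LP numbers is an assumption for illustration.
public static class ExampleTopology
{
    public const int Groups = 2;
    public const int NodesPerGroup = 2;
    public const int SocketsPerNode = 2;
    public const int CoresPerSocket = 4;
    public const int LpsPerCore = 4;

    public const int LpsPerSocket = CoresPerSocket * LpsPerCore; // 16
    public const int LpsPerNode = SocketsPerNode * LpsPerSocket; // 32
    public const int LpsPerGroup = NodesPerGroup * LpsPerNode;   // 64: one 64-bit mask
    public const int TotalLps = Groups * LpsPerGroup;            // 128

    // Splits a machine-wide LP index (0..127) into its place in the hierarchy.
    // LpInGroup is the bit position the LP would occupy in its group's mask.
    public static LpLocation Locate(int globalLp)
    {
        if (globalLp < 0 || globalLp >= TotalLps)
            throw new ArgumentOutOfRangeException("globalLp");

        int group = globalLp / LpsPerGroup;
        int lpInGroup = globalLp % LpsPerGroup;
        int node = lpInGroup / LpsPerNode;
        int socket = (lpInGroup % LpsPerNode) / LpsPerSocket;
        int core = (lpInGroup % LpsPerSocket) / LpsPerCore;
        int lpInCore = lpInGroup % LpsPerCore;
        return new LpLocation(group, node, socket, core, lpInCore, lpInGroup);
    }
}

// Position of one LP; Node, Socket and Core are relative to their parent.
public sealed class LpLocation
{
    public readonly int Group, Node, Socket, Core, LpInCore, LpInGroup;

    public LpLocation(int group, int node, int socket, int core, int lpInCore, int lpInGroup)
    {
        Group = group; Node = node; Socket = socket;
        Core = core; LpInCore = lpInCore; LpInGroup = lpInGroup;
    }
}
```

For instance, under this assumed numbering LP 70 lands in group 1, node 0, socket 0, core 1, and is bit 6 of group 1's mask.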

Page 7

Many-Core Topology APIs: Discovery

Page 8

Many-Core Topology APIs: Resource Localization

Page 9

Many-Core Topology APIs: Memory Management

Page 10

User Mode Scheduling: Architectural Perspective

[Figure: in the application, scheduler threads S1 and S2 run your scheduler logic on CPU 1 and CPU 2 and take worker threads (W1 to W4) from the UMS scheduler's ready list; worker threads hand control back to the scheduler with a reason (Created, Yield, Blocked); worker threads blocked in the kernel return through the UMS completion list]

Page 11

Task Scheduling with a UMS Scheduler: Maximize Quantum, Minimize Blocking Effects

• Tasks are run by worker threads, which the scheduler controls

[Figure: timelines for worker threads WT0 to WT3, comparing scheduling without UMS (signal-and-wait, showing a "dead zone") and with UMS (UMS yield)]

Page 12

Load-Balancing, Work-Stealing Scheduler

[Figure: the same work spread over CPU0 to CPU3 with static scheduling and with dynamic scheduling]

Dynamic scheduling improves performance by assigning work at runtime: when work items differ in cost, a CPU that finishes early picks up more work instead of sitting idle, as it would under a fixed up-front split.
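The contrast between the two strategies can be sketched in C# with plain tasks. The helper names are hypothetical, and the code exists only to show the difference, since Parallel.For and PLINQ already do their own load balancing. The static version fixes each worker's block up front; the dynamic version lets workers claim the next index from a shared counter.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helpers contrasting the two strategies; each calls body(i)
// exactly once for every i in [0, count).
public static class SchedulingSketch
{
    // Static scheduling: the range is cut up front into one contiguous block
    // per worker. A worker whose block holds the expensive items finishes
    // last while the others sit idle.
    public static void RunStatic(int count, int workers, Action<int> body)
    {
        if (workers < 1) throw new ArgumentOutOfRangeException("workers");
        int blockSize = (count + workers - 1) / workers; // ceiling division
        var tasks = new Task[workers];
        for (int w = 0; w < workers; w++)
        {
            int start = w * blockSize;
            int end = Math.Min(start + blockSize, count);
            tasks[w] = Task.Factory.StartNew(() =>
            {
                for (int i = start; i < end; i++) body(i);
            });
        }
        Task.WaitAll(tasks);
    }

    // Dynamic scheduling: workers claim the next unprocessed index from a
    // shared counter, so a worker that finishes early simply claims more.
    public static void RunDynamic(int count, int workers, Action<int> body)
    {
        if (workers < 1) throw new ArgumentOutOfRangeException("workers");
        int next = -1;
        var tasks = new Task[workers];
        for (int w = 0; w < workers; w++)
        {
            tasks[w] = Task.Factory.StartNew(() =>
            {
                int i;
                while ((i = Interlocked.Increment(ref next)) < count) body(i);
            });
        }
        Task.WaitAll(tasks);
    }
}
```

Claiming one index at a time maximizes balance but pays an interlocked operation per item; real schedulers hand out small ranges to amortize that cost.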

Page 13

Demos

The Platform
- Topology
- Schedulers

Page 14

Agenda

• What’s new with Windows?
• Parallel Computing Tools in Visual Studio
• Using .NET Parallel Extensions

Page 15

Tools, Programming Models – Structured Parallelism

[Figure: Visual Studio 2010 .NET developer tools, programming models, and runtimes]

• Tools
  – Debugger
  – Profiler
• .NET Parallel Extensions (managed library)
  – Parallel LINQ (PLINQ)
  – Task Parallel Library (TPL)
  – Data Structures
• .NET Runtime
  – Task Scheduler
  – Resource Manager
  – Thread Pool

Page 16

Thread-Pool Scheduler in .NET 4.0

• The global queue is shared by the legacy ThreadPool API and TPL
• Local work queues and a work-stealing scheduler (TPL only)

[Figure: work items are enqueued to a global queue (FIFO); each worker thread (Thread 1 to Thread N) runs a dispatch loop that dequeues from its own local queue (LIFO) and from the global queue, and steals from other threads' local queues; tasks T1 to T8 are spread across the queues]
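The ordering rules of a local queue can be modeled in a few lines of C#. This is a simplified, hypothetical sketch that protects the queue with a single lock; it is not the thread pool's internal implementation, which is considerably more elaborate. The owning thread pushes and pops at the tail (LIFO), while other threads steal from the head (FIFO).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, lock-based model of one worker's local queue. It shows only
// the ordering: the owner works LIFO at the tail, thieves take FIFO from the head.
public sealed class LocalWorkQueue<T>
{
    private readonly LinkedList<T> _items = new LinkedList<T>();
    private readonly object _gate = new object();

    // Owner thread: new work goes on the tail.
    public void LocalPush(T item)
    {
        lock (_gate) _items.AddLast(item);
    }

    // Owner thread: take the most recently pushed item (LIFO).
    public bool TryLocalPop(out T item)
    {
        lock (_gate)
        {
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items.Last.Value;
            _items.RemoveLast();
            return true;
        }
    }

    // Other threads: take the oldest item (FIFO), away from the owner's end.
    public bool TrySteal(out T item)
    {
        lock (_gate)
        {
            if (_items.Count == 0) { item = default(T); return false; }
            item = _items.First.Value;
            _items.RemoveFirst();
            return true;
        }
    }
}
```

Popping the newest item keeps the owner on data that is likely still in its cache, while thieves take the oldest item, which in divide-and-conquer workloads tends to be the largest remaining piece of work.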

Page 17

Task Parallel Library (TPL): Task Concepts

• Task: an asynchronous operation
• Task<TResult>: a Task that returns a result
• Continuation: a Task that starts when another completes
• FromAsync: a Task that wraps an existing APM (Asynchronous Programming Model, Begin/End) implementation
• TaskCompletionSource: a Task that represents another operation
• TaskScheduler: an extensible scheduler that executes Tasks

Common functionality: waiting, cancellation, continuations, parent/child relationships
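Several of these concepts fit in one hypothetical C# helper class (the names are illustrative): Task.Factory.StartNew produces a Task<TResult>, ContinueWith attaches a continuation, and TaskCompletionSource<TResult> exposes a Task for an operation that signals completion through a callback instead of running a delegate. The callback-based source is a stand-in assumption, not a real API.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper class illustrating three of the TPL concepts above.
public static class TaskConceptsSketch
{
    // Task<TResult>: an asynchronous operation that produces a value.
    public static Task<int> SumAsync(int[] values)
    {
        return Task.Factory.StartNew(() =>
        {
            int sum = 0;
            foreach (int v in values) sum += v;
            return sum;
        });
    }

    // Continuation: runs only after its antecedent task has completed.
    public static Task<string> DescribeSumAsync(int[] values)
    {
        return SumAsync(values).ContinueWith(antecedent => "sum = " + antecedent.Result);
    }

    // TaskCompletionSource<TResult>: a Task for an operation that is not a
    // delegate. startOperation stands in for any callback-based API
    // (an assumption for illustration); the callback completes the Task.
    public static Task<int> FromCallback(Action<Action<int>> startOperation)
    {
        var tcs = new TaskCompletionSource<int>();
        startOperation(result => tcs.SetResult(result));
        return tcs.Task;
    }
}
```

Reading Result blocks until the task finishes, which is acceptable in a small demo but best avoided on UI threads.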

Page 18

Primitives and Structures

• Thread-safe, scalable collections
  – IProducerConsumerCollection<T>
    • ConcurrentQueue<T>
    • ConcurrentStack<T>
    • ConcurrentBag<T>
  – ConcurrentDictionary<TKey,TValue>
• Phases and work exchange
  – Barrier
  – BlockingCollection<T>
  – CountdownEvent
• Partitioning
  – {Orderable}Partitioner<T>
    • Partitioner.Create
• Exception handling
  – AggregateException
• Initialization
  – Lazy<T>
    • LazyInitializer.EnsureInitialized<T>
  – ThreadLocal<T>
• Locks
  – ManualResetEventSlim
  – SemaphoreSlim
  – SpinLock
  – SpinWait
• Cancellation
  – CancellationToken{Source}
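As one hedged example of how these pieces fit together, the hypothetical C# helper below builds a producer/consumer pipeline from BlockingCollection<T> and ConcurrentDictionary<TKey,TValue>. The bounded capacity of 16 and the word-count workload are arbitrary choices for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: one producer feeds words through a bounded
// BlockingCollection<T>; several consumers tally them in a ConcurrentDictionary.
public static class WordCountPipeline
{
    public static ConcurrentDictionary<string, int> CountWords(string[] words, int consumers)
    {
        if (consumers < 1) throw new ArgumentOutOfRangeException("consumers");
        var counts = new ConcurrentDictionary<string, int>();

        // Bounded to 16 items: Add blocks while full, throttling the producer.
        using (var queue = new BlockingCollection<string>(16))
        {
            // LongRunning hints that these tasks block for a while.
            var producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var w in words) queue.Add(w);
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumers' loops end
                }
            }, TaskCreationOptions.LongRunning);

            var consumerTasks = new Task[consumers];
            for (int c = 0; c < consumers; c++)
            {
                consumerTasks[c] = Task.Factory.StartNew(() =>
                {
                    // Blocks until an item arrives; ends once adding is
                    // complete and the collection is empty.
                    foreach (var w in queue.GetConsumingEnumerable())
                        counts.AddOrUpdate(w, 1, (key, old) => old + 1);
                }, TaskCreationOptions.LongRunning);
            }

            Task.WaitAll(consumerTasks);
            producer.Wait(); // surfaces any producer failure as an AggregateException
        }
        return counts;
    }
}
```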

Page 19

Parallel Debugging

• Two new debugger tool windows, supporting both native and managed code:
  – “Parallel Tasks”
  – “Parallel Stacks”

Page 20

Parallel Tasks

− What threads are executing my tasks?
− Where are my tasks running (location, call stack)?
− Which tasks are blocked?
− How many tasks are waiting to run?

Page 21

Parallel Stacks

[Figure callouts: zoom control, bird’s-eye view]

− Multiple call stacks in a single view
− Task-specific view (task status)
− Easy navigation to any executing method
− Rich UI (zooming, panning, bird’s-eye view, flagging, tooltips)

Page 22

Parallel Profiling

Page 23

CPU Utilization

[Figure: CPU utilization over time, scaled to the number of cores and split into your process, other processes, and idle time]

Page 24

Threads

− Detailed thread analysis (one channel per thread)
− Active legend
− Usage hints
− Hide uninteresting threads
− Measure time for interesting segments
− Zoom in and out
− Call stacks

Page 25

Cores

− Each logical core in a swim lane
− One color per thread
− Cross-core migration details
− Migration visualization

Page 26

Demo

- Libraries
- Languages
- Debuggers
- Profilers

Page 27

Agenda

• What’s new with Windows?
• Parallel Computing Tools in Visual Studio
• Using .NET Parallel Extensions

Page 28

Thinking Parallel - “Task” vs. “Data” Parallelism

Task Parallelism

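    using System;
    using System.Threading.Tasks;

    // Runs the three actions, potentially in parallel, and returns once all
    // of them have completed; the order of the output lines is not fixed.
    Parallel.Invoke(
        () => { Console.WriteLine("Begin first task..."); },
        () => { Console.WriteLine("Begin second task..."); },
        () => { Console.WriteLine("Begin third task..."); });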

Data Parallelism

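    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Candidates 2 through 98; n is prime when no i in 2..sqrt(n) divides it.
    IEnumerable<int> numbers = Enumerable.Range(2, 100 - 3);
    var myQuery =
        from n in numbers.AsParallel().AsOrdered() // keep results in ascending order
        where Enumerable.Range(2, (int)Math.Sqrt(n) - 1).All(i => n % i > 0)
        select n;

    int[] primes = myQuery.ToArray();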

Page 29

Thinking Parallel – How to Partition Work?

• Several partitioning schemes are built in
  – Chunk
    • Works with any IEnumerable<T>
    • Single enumerator shared; chunks handed out on demand
  – Range
    • Works only with IList<T>
    • Input divided into contiguous regions, one per partition
  – Stripe
    • Works only with IList<T>
    • Elements handed out round-robin to each partition
  – Hash
    • Works with any IEnumerable<T>
    • Elements assigned to partitions based on hash code
• Custom partitioning available through Partitioner<T>
  – Partitioner.Create available for tighter control over built-in partitioning schemes
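Partitioner.Create(fromInclusive, toExclusive, rangeSize) is one way to get range partitioning over an index space. The C# sketch below (hypothetical helper name and workload) hands each worker whole ranges, so the loop delegate runs once per range rather than once per element, which helps when the per-element work is tiny.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: sums i * i for i in [0, count) using range partitioning.
public static class PartitioningSketch
{
    public static long SumOfSquares(int count, int rangeSize)
    {
        if (count <= 0) return 0; // Partitioner.Create needs a non-empty range
        long total = 0;

        // Cuts [0, count) into contiguous Tuple<int, int> ranges of at most
        // rangeSize elements (rangeSize must be positive).
        var ranges = Partitioner.Create(0, count, rangeSize);

        Parallel.ForEach(ranges, range =>
        {
            // One delegate call per range: a tight sequential inner loop...
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                local += (long)i * i;

            // ...and one synchronized update per range instead of per element.
            Interlocked.Add(ref total, local);
        });

        return total;
    }
}
```

For example, SumOfSquares(10, 3) covers the ranges [0, 3), [3, 6), [6, 9) and [9, 10) and returns 285.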

Page 30

Thinking Parallel – How to Execute Tasks?

Page 31

Thinking Parallel – How to Collate Results?

Page 32

Demos

- Partition
- Execute
- Collate

Page 33

Resources

• Native APIs/runtimes (Visual C++ 10)
  – Tasks, loops, collections, and Agents
  – http://msdn.microsoft.com/en-us/library/dd504870(VS.100).aspx
• Tools (in the VS2010 IDE)
  – Debugger and profiler
  – http://msdn.microsoft.com/en-us/library/dd460685(VS.100).aspx
• Managed APIs/runtimes (.NET 4)
  – Tasks, loops, collections, and PLINQ
  – http://msdn.microsoft.com/en-us/library/dd460693(VS.100).aspx

General: VS2010 Parallel Computing Developer Center
http://msdn.microsoft.com/en-us/concurrency/default.aspx