Parallel Computing Models & Techniques
About Me
• Microsoft MVP
• Intel Blogger
• TechEd Israel, TechEd Europe
• Expert C++ Book
• http://AsyncOp.com
• http://Asaf.Shelly.co.il
Parallel Computing
• Multi-Core
• Distributed Systems
• SOA & WebServices
• Transaction, Session, Queue, Event, Interrupt
• User Experience over User Interface
• Maximize performance: No Free Work Unit
• Best performance: No I/O Wait
Advantages of Multi-Core
• Low Power Consumption
• Extended battery life
• Less heating
• Smaller and lighter devices
• Software replaces custom hardware!
Signaling In Hardware
Error Detection
CPU <-> RAM
Request: Read Address
Wait: Preparing Data
Response: Data
Signaling In Hardware
Error Detection
CPU <-> RAM
Interrupt: Data Pending
Interrupt: Processing Complete
Software Locks?
• Locks Are BAD!
• By design, a lock forces serial work
• Using a resource on a single core
• Use a lock only when you want to use 1 core at a time and eliminate parallel work
• Locks can be used on single steps, for example the entrance to a queue
Locks Are BAD!
• Can you find the bug??
Lock( MUTEX_A )
Buffer_A [ 12 ] = 23;
// here Buffer_A [ 12 ] is 57 !!!!
Unlock( MUTEX_A )
• Would you find it with a code review?
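The slide does not say what the bug is. One possible reading, and it is only an assumption, is that some other code writes to the buffer without ever taking the lock, so the lock protects nothing. The Python sketch below reproduces that reading; `careful_writer`, `rogue_writer` and the two events are hypothetical names used only to make the interleaving repeatable.

```python
import threading

buffer_a = [0] * 16
mutex_a = threading.Lock()
inside = threading.Event()      # set once the careful writer holds the lock
rogue_done = threading.Event()  # set once the rogue writer has written
observed = []

def careful_writer():
    # Follows the rules: every access happens under mutex_a.
    with mutex_a:
        buffer_a[12] = 23
        inside.set()
        rogue_done.wait()               # forces the bad interleaving for the demo
        observed.append(buffer_a[12])   # still inside the lock

def rogue_writer():
    # Assumed bug: this code never takes mutex_a at all.
    inside.wait()
    buffer_a[12] = 57
    rogue_done.set()

t1 = threading.Thread(target=careful_writer)
t2 = threading.Thread(target=rogue_writer)
t1.start()
t2.start()
t1.join()
t2.join()
print(observed)
```

Reviewing `careful_writer` alone shows nothing wrong, which is the point of the question on the slide: the defect sits in code that does not mention the lock.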
Lock = Stop
Need Lock-Free Solutions!
Protecting A Resource
• Lock as a way to share ownership
• Using a single owner
  – Owner Thread
  – Owner Task
  – TPL Agent
  – Device Driver
  – Owner Service
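A minimal sketch of the "Owner Thread" option, written in Python as an assumption since the slides name no language. Only the owner thread touches the buffer; everyone else sends it a request. The names `owner`, `write`, `read` and the message layout are hypothetical. Note that `queue.Queue` synchronizes internally, which is in line with the earlier remark that a lock is acceptable at the entrance to a queue.

```python
import queue
import threading

def owner(requests):
    """Owner thread: the only code that ever touches `buffer`."""
    buffer = [0] * 16
    while True:
        msg = requests.get()
        if msg is None:          # shutdown request
            break
        op, index, value, reply = msg
        if op == "write":
            buffer[index] = value
        reply.put(buffer[index])  # answer both reads and writes

requests = queue.Queue()
owner_thread = threading.Thread(target=owner, args=(requests,), daemon=True)
owner_thread.start()

def write(index, value):
    reply = queue.Queue(maxsize=1)
    requests.put(("write", index, value, reply))
    return reply.get()

def read(index):
    reply = queue.Queue(maxsize=1)
    requests.put(("read", index, None, reply))
    return reply.get()

results = [write(12, 23), read(12)]
requests.put(None)
owner_thread.join()
print(results)
```

Because requests are handled one at a time by a single owner, no caller can observe a half-finished update, and no caller ever holds a lock on the buffer itself.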
Asynchronous Work Without Locks
• Phone as Synchronous System
• Phone as Asynchronous System
• Mail and Email System
• Order Pizza
Unprotected Parallel Access To Data
• Two Writers or Writer and Reader
[Diagram labels: Writer; buffer "A A A A A A A A A A A A A A A A A"; Writer, Reader]
Race Condition - Location
• Two Writers or Writer and Reader
[Diagram labels: Writer; buffer "A A A A A A A A A A A A A A A A A"; Writer, Reader]
Race Condition - Timeframe
• Collision over the same communication line
[Diagram labels: Writer, Writer, Reader; data "A"]
Race Condition - Sequence
• Bugs in Parallel Pipeline
Clear Buffer
Add ABCDE ABCDE
Add X ABCDEX
Add ‘1’ To ASCII BCDEFY
123
Add ABCDE 123ABCDE
Clear Buffer
Add ‘1’ To ASCII
Add X X
123
TCP, CJP, PF
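The two listings above can be replayed in a few lines. This is a sketch under one assumption: that the stray "123" is the buffer content before the pipeline starts. The function names are hypothetical. The same four operations are applied once in the intended order and once in the reordered sequence shown on the slide.

```python
def clear(buf):
    return ""

def add(text):
    # Returns an operation that appends `text` to the buffer.
    return lambda buf: buf + text

def add_one_to_ascii(buf):
    # Shift every character to the next code point: 'A' -> 'B', 'X' -> 'Y'.
    return "".join(chr(ord(c) + 1) for c in buf)

def run(steps, buf="123"):   # "123" as the initial content is an assumption
    for step in steps:
        buf = step(buf)
    return buf

intended = [clear, add("ABCDE"), add("X"), add_one_to_ascii]
reordered = [add("ABCDE"), clear, add_one_to_ascii, add("X")]

print(run(intended))    # BCDEFY
print(run(reordered))   # X
```

Each operation is correct on its own; only the order differs, which is why this kind of race does not show up when the stages are tested one at a time.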
Race Condition Solutions
• Wave-In Signal – Manager (ex. USB BUS)
• Pass Ownership (Token Ring, MUTEX)
• TDM
• Burst Write, Retry Read (ex. SeqLock, Reader-Writer Lock, Network Layer 2)
• Write and Verify
• Queue
• Transaction – A Sequence
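As an illustration of "Burst Write, Retry Read", here is a sketch of the SeqLock protocol in Python. It assumes a single writer and shows the protocol only: a real implementation on hardware also needs memory barriers, which this sketch does not model. The class and method names are hypothetical.

```python
class SeqLockBuffer:
    """Single-writer sequence lock: readers never block, they retry."""

    def __init__(self, size):
        self._seq = 0             # even: stable, odd: write in progress
        self._data = [0] * size

    def begin_write(self):
        self._seq += 1            # becomes odd

    def end_write(self):
        self._seq += 1            # becomes even again

    def write(self, values):
        self.begin_write()
        self._data[:] = values    # the "burst write"
        self.end_write()

    def try_read(self):
        start = self._seq
        if start % 2:             # writer is in the middle of a burst
            return None
        snapshot = list(self._data)
        if self._seq != start:    # a write happened while we copied
            return None
        return snapshot

    def read(self):
        while True:               # the "retry read"
            snapshot = self.try_read()
            if snapshot is not None:
                return snapshot

buf = SeqLockBuffer(3)
buf.write([1, 2, 3])
print(buf.read())
```

The writer never waits for readers, and a reader that overlaps a write simply discards its copy and tries again.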
Serial Problem with Communication
• Transaction based Ping-Pong
Computer <-> USB Device
Packet Request
Packet Data
Acknowledge
Packet Request
Packet Data
Acknowledge
Parallel Solution for Ping-Pong
• Collected Transaction
Request List
Packet Data A
Packet Data B
Packet Data C
Packet Data D
Retransmit B
Ack List
Computer <-> USB Device
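The collected transaction can be sketched as one batch of packets answered by one acknowledge list, with only the missing packets sent again. This is a simplified model, not USB itself; `send_collected`, `make_channel` and the rule that packet B is lost once are hypothetical.

```python
def send_collected(packets, channel, max_rounds=5):
    """Send every pending packet, read one ack list, retransmit the rest."""
    pending = dict(packets)
    for round_no in range(1, max_rounds + 1):
        acked = channel(pending)          # one ack list per round
        for packet_id in acked:
            pending.pop(packet_id, None)
        if not pending:
            return round_no
    raise RuntimeError("unacknowledged packets: %s" % sorted(pending))

def make_channel(drop_once):
    """Hypothetical lossy link: loses each id in `drop_once` exactly once."""
    dropped = set()
    received = {}

    def channel(batch):
        acks = set()
        for packet_id, data in batch.items():
            if packet_id in drop_once and packet_id not in dropped:
                dropped.add(packet_id)    # lost in transit, no ack
                continue
            received[packet_id] = data
            acks.add(packet_id)
        return acks

    return channel, received

channel, received = make_channel({"B"})
rounds = send_collected({"A": b"a", "B": b"b", "C": b"c", "D": b"d"}, channel)
print(rounds, sorted(received))
```

Four packets cost two round trips here instead of the four request/data/acknowledge exchanges of the ping-pong version.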
Cancel Operation
• Search For File
Request List
Packet Data A
Data found in B
Packet Data B
Packet Data C
Acknowledge A
Packet Data D
Abort
Packet Data E
Acknowledge
Computer <-> USB Device
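A small sketch of cooperative cancellation for the file search, assuming the device checks an abort flag before sending each block. The names `device` and `find` are hypothetical. Unlike the exchange listed above, this model has no transmission delay, so no extra packets arrive after the abort.

```python
import threading

def device(blocks, cancel):
    """Hypothetical device: streams blocks until asked to abort."""
    for name, data in blocks:
        if cancel.is_set():
            return               # Abort received: stop sending
        yield name, data

def find(blocks, target):
    cancel = threading.Event()   # the Abort signal
    seen = []
    found = None
    for name, data in device(blocks, cancel):
        seen.append(name)
        if found is None and target in data:
            found = name
            cancel.set()         # send Abort, remaining blocks are not needed
    return found, seen

blocks = [("A", "xx"), ("B", "needle"), ("C", "yy"), ("D", "zz"), ("E", "ww")]
found, seen = find(blocks, "needle")
print(found, seen)
```

On a real link, blocks already in flight when the abort is issued would still arrive and have to be discarded by the receiver.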
Object Oriented Design
• Definition Of Objects
• Object Relations
• Object Reusability
• Object Management
• Object Oriented Block Diagram
• Object Oriented System Design
• Avoid “Spaghetti Code”
Procedural Design
• Definition Of State
• Procedure Relations
• Procedure Reusability
• Flow Control Management
• Poor Block Diagram
• Limited System Design
• Avoid “Spaghetti Flow”
Good Application Design
Queue
• Pass Data Without Using Lock
• Full Asynchronous Operation
• Event With Data
• Event With Priority
• Event With Destination
• Structured Event vs. Stream
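A minimal sketch of a structured event that carries data, a priority and a destination, using Python's standard `queue.PriorityQueue`. The `Event` fields, the `dispatch` function and the handler names are hypothetical; a lower priority number is treated as more urgent, which is this sketch's convention and not something the slide states.

```python
import itertools
import queue
from dataclasses import dataclass, field
from typing import Any

_order = itertools.count()   # tie-breaker: keeps equal priorities in FIFO order

@dataclass(order=True)
class Event:
    priority: int
    seq: int = field(default_factory=lambda: next(_order))
    destination: str = field(default="", compare=False)
    data: Any = field(default=None, compare=False)

def dispatch(events, handlers):
    """Deliver each queued event to the handler named by its destination."""
    while not events.empty():
        event = events.get()
        handlers[event.destination](event.data)

delivered = []
handlers = {
    "ui": lambda data: delivered.append(("ui", data)),
    "log": lambda data: delivered.append(("log", data)),
}

events = queue.PriorityQueue()
events.put(Event(priority=5, destination="log", data="low"))
events.put(Event(priority=1, destination="ui", data="urgent"))
events.put(Event(priority=5, destination="ui", data="later"))
dispatch(events, handlers)
print(delivered)
```

A structured event like this tells the receiver where one message ends and what to do with it, which a raw byte stream does not.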
Flow Control
• Keep Internal State
• Object State
• Execution Phase
• Collection of State as System State
• System State for Debug
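One way to read these bullets, offered as an assumption: each object records its own execution phase, and the system state is simply the collection of those records, which can be dumped for debugging. `Phase`, `Component` and `system_state` are hypothetical names.

```python
from enum import Enum

class Phase(Enum):
    IDLE = 1
    RUNNING = 2
    DONE = 3

# Allowed transitions: IDLE -> RUNNING -> DONE.
_NEXT = {Phase.IDLE: Phase.RUNNING, Phase.RUNNING: Phase.DONE}

class Component:
    def __init__(self, name):
        self.name = name
        self.phase = Phase.IDLE      # object state kept internally

    def advance(self):
        if self.phase not in _NEXT:
            raise RuntimeError(f"{self.name} is already {self.phase.name}")
        self.phase = _NEXT[self.phase]

def system_state(components):
    """System state as the collection of every object's state."""
    return {c.name: c.phase.name for c in components}

reader = Component("reader")
writer = Component("writer")
reader.advance()
reader.advance()
writer.advance()
snapshot = system_state([reader, writer])
print(snapshot)
```

Logging such a snapshot at the moment of a failure shows which phase every part of the system was in, without attaching a debugger to each thread.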
Task Management
• Stack – Hardware Accelerated Management
• Fork
• Software Stack Management
• Session
• Task Groups
Software Dispatcher
• .Net Parallel Extensions
Network Dispatcher
• Load Balancing
[Diagram labels: Firewall, Load Balance Front End; Firewall, Firewall, Load Balance Front End]
Hardware Dispatcher
• 10 Gbps Network Switch
10 GHz Network Dispatcher / MUX
2.67 GHz CORE
2.67 GHz CORE
2.67 GHz CORE
2.67 GHz CORE
Cloud Dispatcher
• Microsoft Windows HPC Server 2008
Parallel Memory?
Task Oriented Design
Operation: Setting up a Tent
Task: locate items in storage
Task: carry items to build site
Task: use items to build tent
Execution Timeline
[Chart: Output over Time; stages Locate, Carry, Use; items Pole, Fabric, Wires]
Horizontal Division
Time
Time
Vertical Division
Resource Partitioning
Force Duplication
• Entire Process
• Sharing Resources
• Flow Barriers
• Simple to implement
• Simple Affinity
• Simple Priority
• No Optimization
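A sketch of duplication using the tent example: every worker runs the entire locate, carry, use process for one item. It uses Python's standard `ThreadPoolExecutor`; the three task functions are hypothetical stand-ins that only tag a string.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three tasks of the tent example.
def locate(item):
    return item + ":located"

def carry(item):
    return item + ":carried"

def use(item):
    return item + ":used"

def whole_process(item):
    # Each worker performs every task for its own item.
    return use(carry(locate(item)))

items = ["pole", "fabric", "wires"]
with ThreadPoolExecutor(max_workers=len(items)) as pool:
    built = list(pool.map(whole_process, items))   # map keeps input order
print(built)
```

This is simple to write, but every worker needs access to every resource (storage, path, build site), which is where the sharing and the flow barriers listed above come from.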
Pipeline
• Functional
• Resources Ownership
• Communication Barriers
• Requires Design
• Affinity Planning
• Priority Planning
• Optimization
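A sketch of the pipeline alternative for the same tent example: one thread per task, each owning its own step, connected by queues. The names `stage`, `run_pipeline` and the `DONE` marker are hypothetical, and the task functions are stand-ins that only tag a string.

```python
import queue
import threading

DONE = object()   # end-of-stream marker passed down the pipeline

def locate(item):
    return item + ":located"

def carry(item):
    return item + ":carried"

def use(item):
    return item + ":used"

def stage(func, inbox, outbox):
    """One pipeline stage: owns `func`, talks to neighbours only via queues."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)      # pass the marker on and stop
            return
        outbox.put(func(item))

def run_pipeline(items, funcs):
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while True:
        result = queues[-1].get()
        if result is DONE:
            break
        results.append(result)
    for t in threads:
        t.join()
    return results

built = run_pipeline(["pole", "fabric", "wires"], [locate, carry, use])
print(built)
```

Each stage can be given its own core and priority, and the queues are the only points where stages meet, which matches the "Affinity Planning" and "Communication Barriers" bullets.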
Super Networks and Grids
• Multiple Reads
• Multiple Writes
• Replication Time
• Replication Overhead
• Network Consistency
• Data Snapshot
• Real Time
Super Networks
Thank You