multi-core development
![Page 1: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/1.jpg)
Multi-Core Development
Kyle Anderson
![Page 2: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/2.jpg)
Overview
• History
• Pollack’s Law
• Moore’s Law
• CPU
• GPU
• OpenCL
• CUDA
• Parallelism
![Page 3: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/3.jpg)
History
• First 4-bit microprocessor (Intel 4004) – 1971
  • 60,000 instructions per second
  • 2,300 transistors
• Intel 8080, an early 8-bit microprocessor – 1974
  • 290,000 instructions per second
  • 4,500 transistors
  • Used in the Altair 8800
• Intel's first 32-bit microprocessor (80386) – 1985
  • 275,000 transistors
![Page 4: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/4.jpg)
History
• First Pentium processor released – 1993
  • 66 MHz
• Pentium 4 released – 2000
  • 1.5 GHz
  • 42,000,000 transistors
• Clock speeds approach 4 GHz – 2000 to 2005
• Core 2 Duo released – 2006
  • 291,000,000 transistors
![Page 5: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/5.jpg)
History
![Page 6: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/6.jpg)
Pollack’s Law
• Processor performance grows roughly with the square root of its area (complexity)
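As a rough worked example of this rule, the sketch below compares spending extra area on one larger core with spending it on several baseline-sized cores. The function names are hypothetical, and the multi-core figure is an idealized upper bound that assumes perfectly parallel work.

```cpp
#include <cassert>
#include <cmath>

// Pollack's rule: single-core performance scales roughly with the
// square root of the area (complexity) spent on that core.
// area_ratio = new core area / baseline core area.
double pollack_speedup(double area_ratio) {
    assert(area_ratio > 0.0);
    return std::sqrt(area_ratio);
}

// Idealized alternative: spend the same area on `cores` baseline-sized
// cores. Assumes perfectly parallel work, so this is an upper bound.
double ideal_multicore_speedup(int cores) {
    assert(cores > 0);
    return static_cast<double>(cores);
}
```

With four times the area, `pollack_speedup(4.0)` gives about 2x for one large core, while `ideal_multicore_speedup(4)` gives at most 4x for four small cores. This gap is the usual argument for multi-core designs.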
![Page 7: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/7.jpg)
Pollack’s Law
![Page 8: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/8.jpg)
Moore’s Law
• “The Number of transistors incorporated in a chip will approximately double every 24 months.”
– Gordon Moore, Intel co-founder
• Smaller and smaller transistors
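The doubling rule in the quote can be written as a small projection function. This is only a sketch of the arithmetic: `projected_transistors` and its parameters are hypothetical names, and the result is an extrapolation, not measured data.

```cpp
#include <cassert>
#include <cmath>

// Projects a transistor count forward, assuming the count doubles
// every 24 months as stated in the quote.
double projected_transistors(double start_count, double years) {
    assert(start_count > 0.0 && years >= 0.0);
    const double doubling_period_years = 2.0;
    return start_count * std::pow(2.0, years / doubling_period_years);
}
```

For example, starting from the 2,300 transistors listed for 1971, ten years means five doublings, so `projected_transistors(2300.0, 10.0)` returns 2,300 × 32 = 73,600.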
![Page 9: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/9.jpg)
Moore’s Law
![Page 10: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/10.jpg)
CPU
• Optimized for sequential execution
• Fully functioning cores
• Up to 16 cores at the time of this presentation
• Hyperthreading
• Low latency
![Page 11: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/11.jpg)
GPU
• Higher latency
• Thousands of cores
• Simple calculations
• Used for research
![Page 12: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/12.jpg)
OpenCL
• Runs on a multitude of devices
• Run-time compilation lets a kernel use the most up-to-date features of the device it runs on
• Lock-step execution
![Page 13: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/13.jpg)
OpenCL Data Structures
• Host
• Device
  • Compute Units
• Work-Group
  • Work-Item
• Command Queue
• Kernel
• Context
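To make the work-group and work-item relationship concrete, here is a plain C++ sketch (not OpenCL code) of how a one-dimensional index space is split. It mirrors the relationship between OpenCL's `get_global_id`, `get_group_id`, `get_local_id` and `get_local_size` when the global offset is zero. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Position of one work-item inside a 1-D index space.
struct WorkItemId {
    std::size_t group;  // which work-group the item belongs to
    std::size_t local;  // index of the item inside its work-group
};

// Splits a global index into (work-group, local index).
WorkItemId decompose(std::size_t global_id, std::size_t local_size) {
    assert(local_size > 0);
    return WorkItemId{global_id / local_size, global_id % local_size};
}

// Rebuilds the global index: group * local_size + local.
std::size_t global_id(const WorkItemId& id, std::size_t local_size) {
    return id.group * local_size + id.local;
}
```

With a work-group size of 64, global index 70 falls in work-group 1 at local index 6.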
![Page 14: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/14.jpg)
OpenCL Types of Memory
• Global
• Constant
• Local
• Private
![Page 15: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/15.jpg)
OpenCL
![Page 16: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/16.jpg)
OpenCL Example
![Page 17: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/17.jpg)
OpenCL Example
![Page 18: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/18.jpg)
OpenCL Example
![Page 19: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/19.jpg)
CUDA
• NVIDIA's proprietary API for its GPUs
• Stands for “Compute Unified Device Architecture”
• Compiles directly for NVIDIA hardware
• Used by Adobe, Autodesk, National Instruments, Microsoft and Wolfram Mathematica
• Often faster than OpenCL because it is compiled directly for the hardware and focuses on a single architecture
![Page 20: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/20.jpg)
CUDA Indexing
![Page 21: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/21.jpg)
CUDA Example
![Page 22: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/22.jpg)
CUDA Example
![Page 23: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/23.jpg)
CUDA Example
![Page 24: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/24.jpg)
CUDA Function Call
cudaMemcpy( dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice );
cudaMemcpy( dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice );
add<<<N,1>>>( dev_a, dev_b, dev_c );
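The launch `add<<<N,1>>>` starts N blocks of one thread each, so each block can use its block index to pick the element it handles. The kernel body is not shown in this transcript. The sketch below is a plain C++ emulation (not CUDA) that assumes an element-wise addition kernel and runs the blocks one after another. `add_block` and `launch_add` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the kernel body: the work done by the single thread of
// block `block_idx` (the role blockIdx.x plays in CUDA).
void add_block(std::size_t block_idx, const int* a, const int* b, int* c,
               std::size_t n) {
    if (block_idx < n) {  // guard against out-of-range blocks
        c[block_idx] = a[block_idx] + b[block_idx];
    }
}

// Emulates a launch of `num_blocks` blocks with one thread each.
// On a GPU the blocks may run concurrently; here they run in order.
std::vector<int> launch_add(const std::vector<int>& a,
                            const std::vector<int>& b,
                            std::size_t num_blocks) {
    assert(a.size() == b.size());
    std::vector<int> c(a.size(), 0);
    for (std::size_t block = 0; block < num_blocks; ++block) {
        add_block(block, a.data(), b.data(), c.data(), c.size());
    }
    return c;
}
```

Calling `launch_add({1, 2, 3}, {10, 20, 30}, 3)` yields `{11, 22, 33}`, one element per emulated block.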
![Page 25: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/25.jpg)
Types of Parallelism
• SIMD
• MISD
• MIMD
• Instruction parallelism
• Task parallelism
• Data parallelism
![Page 26: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/26.jpg)
SISD
• Stands for “Single Instruction, Single Data”
• Does not use multiple cores
![Page 27: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/27.jpg)
SIMD
• Stands for “Single Instruction, Multiple Data Streams”
• Can process multiple data streams concurrently
![Page 28: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/28.jpg)
MISD
• Stands for “Multiple Instruction, Single Data”
• Risky because several instructions are processing the same data
![Page 29: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/29.jpg)
MIMD
• Stands for “Multiple Instruction, Multiple Data”
• Each core processes its own instruction stream sequentially, while the cores run concurrently on different data
![Page 30: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/30.jpg)
Instruction Parallelism
• Instructions must be mutually independent to run together
• MIMD and MISD often use this
• Allows multiple instructions to be run at once
• Instructions considered operations
• Not done by the programmer; handled by:
  • Hardware
  • Compiler
![Page 31: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/31.jpg)
Task Parallelism
• Divides a program into separate tasks or control flows
• Runs multiple threads concurrently
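A minimal sketch of task parallelism, assuming two unrelated tasks (a sum and a maximum) over the same read-only input. `Stats` and `sum_and_max` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

struct Stats {
    long long sum;
    int max;
};

// Task parallelism: two different tasks (summing and finding the
// maximum) run on separate threads over the same read-only input.
Stats sum_and_max(const std::vector<int>& values) {
    assert(!values.empty());
    Stats result{0, 0};
    // Each thread writes to its own field, so no locking is needed.
    std::thread sum_task([&] {
        result.sum = std::accumulate(values.begin(), values.end(), 0LL);
    });
    std::thread max_task([&] {
        result.max = *std::max_element(values.begin(), values.end());
    });
    sum_task.join();
    max_task.join();
    return result;
}
```

The two threads do different work, which is what separates this from data parallelism, where every thread runs the same code on a different slice of the data.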
![Page 32: Multi-Core Development](https://reader035.vdocuments.site/reader035/viewer/2022062315/56814fe6550346895dbdb0f4/html5/thumbnails/32.jpg)
Data Parallelism
• Used by SIMD and MIMD
• The same list of instructions runs concurrently on several data sets
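A minimal sketch of data parallelism, assuming a simple squaring operation and a caller-chosen thread count. `parallel_square` is a hypothetical name. Every thread runs the same instructions on its own slice of the input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Data parallelism: every thread runs the same instructions (here,
// squaring) on its own slice of the data.
std::vector<int> parallel_square(const std::vector<int>& input,
                                 std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<int> output(input.size());
    // Slice length, rounded up so the whole input is covered.
    const std::size_t chunk = (input.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, input.size());
        const std::size_t end = std::min(begin + chunk, input.size());
        // Slices do not overlap, so threads never write the same element.
        workers.emplace_back([&input, &output, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                output[i] = input[i] * input[i];
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return output;
}
```

For instance, `parallel_square({1, 2, 3, 4, 5}, 2)` gives one thread the first three elements and the other the last two, and returns `{1, 4, 9, 16, 25}`.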