![Page 1: Scalable Data Clustering with GPUs Andrew D. Pangborn Thesis Defense Rochester Institute of Technology Computer Engineering Department Friday, May 14 th](https://reader030.vdocuments.site/reader030/viewer/2022032702/56649cb85503460f9497ec14/html5/thumbnails/1.jpg)
Scalable Data Clustering with GPUs
Andrew D. Pangborn
Thesis Defense
Rochester Institute of Technology
Computer Engineering Department
Friday, May 14th, 2010
Intro
• Overview of the application domain
• Trends in computing architecture
• GPU Architecture, CUDA
• Parallel Implementation
Data Clustering
• A form of unsupervised learning that groups similar objects into relatively homogeneous sets called clusters
• How do we define similarity between objects?
– Depends on the application domain and implementation
• Not to be confused with data classification, which assigns objects to predefined classes
Data Clustering Algorithms
Clustering Taxonomy from “Data Clustering: A Review”, by Jain et al. [1]
Example: Iris Flower Data
Flow Cytometry
• Technology used by biologists and immunologists to study the physical and chemical characteristics of cells
• Example: Measure T lymphocyte counts to monitor HIV infection [2]
Flow Cytometry
• Cells in a fluid pass through a laser
• Measure physical characteristics with scatter data
• Add fluorescently labeled antibodies to measure other aspects of the cells
Flow Cytometer
Flow Cytometry Data Sets
• Multiple measurements (dimensions) for each event
– Upwards of 6 scatter dimensions and 18 colors per experiment
• On the order of 10^5 to 10^6 events
• ~24 million values that must be clustered
• Lots of potential clusters
• Clustering can take many hours on a CPU
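The figure of roughly 24 million values follows from the upper ends of the ranges on this slide. The short sketch below reproduces that arithmetic; the 4 bytes per value is an assumption (single-precision storage) that the slides do not state.

```python
# Upper-end figures taken from the slide.
scatter_dims = 6
color_dims = 18
events = 10**6

dims = scatter_dims + color_dims   # measurements per event
values = events * dims             # total values to cluster

# Assumption (not from the slides): each value is stored as a 4-byte float.
bytes_per_value = 4
megabytes = values * bytes_per_value / 1e6

print(f"{dims} dimensions, {values:,} values, about {megabytes:.0f} MB")
```

Under that assumption a single large data set fits comfortably in the memory of one GPU, so the cost lies in the repeated passes over the data rather than in storing it.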
Parallel Computing
• Fortunately many data clustering algorithms lend themselves naturally to parallel processing
• Typically with clusters of commodity CPUs
• Common APIs:
– MPI: Message Passing Interface
– OpenMP: Open Multi-Processing
Multi-core
• Current trends:
– Adding more cores
– Application specific extensions
  • SSE3/AVX, VT-x, AES-NI
– Point-to-Point interconnects, higher memory bandwidths
GPU Architecture Trends
[Figure: CPU and GPU trends plotted by throughput performance and programmability. Labels: Fixed Function, Partially Programmable, Fully Programmable; Multi-threaded, Multi-core, Many-core; Intel Larrabee, NVIDIA CUDA. Based on the Intel Larrabee presentation at SuperComputing 2009.]
Tesla GPU Architecture
Tesla Cores
GPGPU
• General Purpose computing on Graphics Processing Units
• Past:
– Programmable shader languages: Cg, GLSL, HLSL
– Use textures to store data
• Present:
– Multiple frameworks using traditional general purpose systems and high-level languages
CUDA: Software Stack
CUDA: Streaming Multiprocessors
CUDA: Thread Model
• Kernel
– A device function invoked by the host computer
– Launches a grid with multiple blocks, and multiple threads per block
• Blocks
– Independent tasks composed of multiple threads
– No synchronization between blocks
• SIMT: Single-Instruction Multiple-Thread
– Multiple threads executing the same instruction on different data (SIMD); can diverge if necessary
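The grid, block, and thread hierarchy can be illustrated with a small sequential emulation. This is only a sketch in Python, not CUDA code: `launch` and `scale_kernel` are hypothetical names, the loops run one "thread" at a time, and only a one-dimensional grid is modelled. What it does show is the usual index calculation, in which each thread derives a global element index from its block index, the block size, and its thread index.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Sequentially emulate a 1-D launch: grid_dim blocks of block_dim threads."""
    for block_idx in range(grid_dim):          # blocks are independent of each other
        for thread_idx in range(block_dim):    # threads within one block
            kernel(block_idx, thread_idx, block_dim, *args)


def scale_kernel(block_idx, thread_idx, block_dim, data, out, factor):
    """Every thread runs the same code on a different element."""
    i = block_idx * block_dim + thread_idx     # global element index
    if i < len(data):                          # the last block may have spare threads
        out[i] = data[i] * factor


data = [1.0, 2.0, 3.0, 4.0, 5.0]
out = [0.0] * len(data)
block_dim = 4
grid_dim = (len(data) + block_dim - 1) // block_dim   # round up to cover all elements
launch(scale_kernel, grid_dim, block_dim, data, out, 2.0)
print(grid_dim, out)
```

The bounds check in the kernel is a simple case of divergence: threads past the end of the data take a different path from the rest.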
CUDA: Memory Model
CUDA: Program Flow
Application Start
Search for CUDA Devices
Load data on host
Allocate device memory
Copy data to device
Launch device kernels to process data
Copy results from device to host memory
[Diagram labels: CPU, Main Memory, Device Memory, GPU Cores, PCI-Express]
When is CUDA worthwhile?
• High computational density
– Worthwhile to transfer data to separate device
• Both coarse-grained and fine-grained SIMD parallelism
– Lots of independent tasks (blocks) that don’t require frequent synchronization map to different multiprocessors on the GPU
– Within each block, lots of individual SIMD threads
• Contiguous memory access patterns
• Frequently/repeatedly used data small enough to fit in shared memory
C-means
• Minimizes the squared error between data points and cluster centers using Euclidean distance
• Alternates between computing membership values and updating cluster centers
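The alternation described above can be sketched in a few lines of plain Python. The slides do not give the update formulas, so this sketch assumes the standard fuzzy c-means rules with a fuzziness exponent `m = 2`; the function names and the toy data are illustrative and are not taken from the thesis code.

```python
import math


def memberships(points, centers, m=2.0):
    """Membership of every point in every cluster (one row per point)."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # The point sits exactly on a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((di / dj) ** power for dj in dists)
                         for di in dists])
    return rows


def update_centers(points, u, m=2.0):
    """Each center becomes the mean of all points weighted by membership**m."""
    dims = len(points[0])
    centers = []
    for i in range(len(u[0])):
        weights = [row[i] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dims)))
    return centers


def fuzzy_c_means(points, centers, iterations=20, m=2.0):
    """Alternate membership and center updates for a fixed number of passes."""
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(points, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

Both steps loop over every point independently of the others, which is the property that makes the algorithm a natural fit for data-parallel hardware.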
C-means Parallel Implementation
C-means Parallel Implementation
EM with a Gaussian mixture model
• Data described by a mixture of M Gaussian distributions
• Each Gaussian has 3 parameters
E-step
• Compute likelihoods based on current model parameters
• Convert likelihoods into membership values
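A minimal sketch of these two operations, assuming one-dimensional Gaussians for brevity (flow cytometry data is multidimensional, which requires a covariance matrix instead of a single variance). Each component is represented here as a hypothetical `(weight, mean, variance)` tuple; the slides do not name the three parameters, so that reading is an assumption.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step(data, components):
    """components: list of (weight, mean, variance) tuples."""
    memberships = []
    for x in data:
        # Weighted likelihood of x under each component of the current model.
        likelihoods = [w * gaussian_pdf(x, mu, var) for w, mu, var in components]
        total = sum(likelihoods)
        # Normalize so that the memberships of each point sum to 1.
        memberships.append([lik / total for lik in likelihoods])
    return memberships


components = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
memberships = e_step([0.0, 5.0, 10.0], components)
print(memberships)
```

Practical implementations usually work with log-likelihoods to avoid underflow in high dimensions; this sketch skips that for clarity.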
M-step
• Update model parameters
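As a rough illustration, the update can be written as membership-weighted averages. The sketch assumes one-dimensional data and the standard maximum-likelihood updates for a Gaussian mixture (weight, mean, variance); the function name and the toy membership values are illustrative and not taken from the thesis.

```python
def m_step(data, memberships):
    """Re-estimate (weight, mean, variance) for each component."""
    params = []
    for m in range(len(memberships[0])):
        resp = [row[m] for row in memberships]
        n_m = sum(resp)                                  # effective point count
        weight = n_m / len(data)
        mean = sum(r * x for r, x in zip(resp, data)) / n_m
        var = sum(r * (x - mean) ** 2 for r, x in zip(resp, data)) / n_m
        params.append((weight, mean, var))
    return params


data = [0.0, 2.0, 10.0, 12.0]
# Toy memberships: the first two points belong to component 0, the rest to 1.
memberships = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
params = m_step(data, memberships)
print(params)
```

Every sum runs over all data points, so on parallel hardware each one becomes a reduction.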
EM Parallel Implementation
EM Parallel Implementation
Performance Tuning
• Global Memory Coalescing
– Compute capability 1.0/1.1 vs 1.2/1.3 devices
Performance Tuning
• Partition Camping
Performance Tuning
• CUBLAS
Multi-GPU Strategy
• 3 Tier Parallel hierarchy
Multi-GPU Strategy
Multi-GPU Implementation
• Very little impact on GPU kernel implementations, just their inputs / grid dimensions
• Discuss host-code changes
Data Distribution
• Asynchronous MPI sends from host instead of each node reading input file from data store
Results - Kernels
• Speedup figures
Results - Kernels
• Speedup figures
Results – Overhead
• Time breakdown for I/O, GPU memcpy, etc.
Multi-GPU Results
• Amdahl’s Law vs. Gustafson’s Law
– i.e. Strong vs. Weak Scaling
– i.e. Fixed Problem Size vs. Fixed-Time
– i.e. True Speedup vs. Scaled Speedup
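The two laws give different speedup predictions for the same parallel fraction, which a short calculation makes concrete. The sketch uses the textbook forms of both laws; the 95% parallel fraction is an arbitrary illustrative value, not a measurement from this work.

```python
def amdahl_speedup(parallel_fraction, n):
    """Fixed problem size (strong scaling): the serial part limits the speedup."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def gustafson_speedup(parallel_fraction, n):
    """Fixed time (weak scaling): the parallel part grows with the processor count."""
    return (1.0 - parallel_fraction) + parallel_fraction * n


for n in (1, 2, 4, 8, 16):
    print(n,
          round(amdahl_speedup(0.95, n), 2),
          round(gustafson_speedup(0.95, n), 2))
```

Under Amdahl's Law the speedup can never exceed 1 / (1 - parallel_fraction), while the scaled speedup under Gustafson's Law keeps growing linearly with the number of processors.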
Fixed Problem Size Analysis
Time-Constrained Analysis
Conclusions
Future Work
Questions?
References
1. A. K. Jain, M. N. Murty, and P. J. Flynn, “Data clustering: a review,” ACM Comput. Surv., vol. 31, no. 3, pp. 264–323, 1999.
2. H. M. Shapiro, Practical Flow Cytometry. New York: Wiley-Liss, 2003.