SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY
Ocelot and the SST-MacSim Simulator
Genie Hsieh§, Andrew Kerr, Hyesoon Kim, Jaekyu Lee, Nagesh Lakshminarayana, Arun Rodrigues§, Sudhakar Yalamanchili
School of Computer Science and School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, GA 30332
§Scalable Computer Architecture Department
Sandia National Laboratories, Albuquerque, NM 87185
System Diversity
• Keeneland System
• Tianhe-1A
• Amazon EC2 GPU Instances
• Mobile Platforms
Heterogeneity is Mainstream
Heterogeneity On-Chip
Sandy Bridge
• Vector Extensions
• AES Instructions
• Programmable Pipeline (GEN6)
PowerEN
• Programmable Accelerator
• 16 PowerPC cores
• Accelerators: Crypto Engine, RegEx Engine, XML Engine
Denver
• ARM Style
• Memory
Multiple models of computation, Multi-ISA
Heterogeneous Systems: Keeneland
201 TFLOPS in 7 racks (90 sq ft incl service area)
677 MFLOPS per watt on HPL (#9 on Green500, Nov 2010)
Final delivery system planned for early 2012
Integrated with NICS Datacenter GPFS and TG
Full PCIe x16 bandwidth to all GPUs
System hierarchy (GFLOPS):
• Xeon 5660: 67
• M2070: 515
• ProLiant SL390s G7 (2 CPUs, 3 GPUs): 1679, 24/18 GB
• S6500 Chassis (4 Nodes): 6718
• Rack (6 Chassis): 40306
• Keeneland System (7 Racks): 201528
• 12000-Series Director Switch
Courtesy J. Vetter (GT/ORNL)
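As a rough check on the Keeneland figures above, the 1679 GFLOPS node value matches two CPUs plus three GPUs, assuming the 67 and 515 GFLOPS values belong to one Xeon 5660 and one M2070 respectively. That pairing is inferred from the slide layout rather than stated. A minimal Python sketch, with a hypothetical helper name:

```python
# Per-device figures as read from the slide, in GFLOPS.
# Assumption: 67 is one Xeon 5660 and 515 is one M2070.
XEON_5660_GFLOPS = 67
M2070_GFLOPS = 515


def node_peak_gflops(num_cpus: int, num_gpus: int) -> int:
    """Sum the per-device figures for one node (hypothetical helper)."""
    return num_cpus * XEON_5660_GFLOPS + num_gpus * M2070_GFLOPS


# ProLiant SL390s G7 node: 2 CPUs, 3 GPUs
node_peak = node_peak_gflops(num_cpus=2, num_gpus=3)
gpu_share = 3 * M2070_GFLOPS / node_peak

print(node_peak)            # 1679
print(round(gpu_share, 2))  # 0.92
```

Under this assumption the GPUs supply roughly 92% of a node's throughput, which is the sense in which the system is heterogeneous. The chassis, rack, and system figures on the slide do not follow from simple multiplication of these rounded values, so the sketch stops at the node level.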
Heterogeneous Architecture & Systems Research
• Lexical Analyzer
• Parser
• Semantic analysis
• Optimization
• Code generation
• Post pass optimization
Many-tier hybrid memory system: Substrate, Read-out ckt, NVM, DRAM, NVRAM
VLIW (Cayman), SIMT (Fermi), New Designs
• Microarchitecture
• Memory systems
• Network on Chip
• Power Management
• + Many more
• Memory Optimizations
• Program Transformations
• Control Flow Optimizations
• + Many more
Common Research Themes
Instruction set architecture
Focus on explicitly data parallel languages: bulk synchronous models
Research Infrastructure Challenges
Microarch Simulator
Power & Thermal Models
Compiler
• Open source compiler infrastructures for GPU computing
• Microarchitecture cycle-level timing simulators for heterogeneous architectures
• Integration between compiler, simulators, and models
• Scalable simulation infrastructures
• Simulation wall!
• Ability to integrate point tools
Tutorial Overview
Low-level Compiler Infrastructure for GPU Computing
• Ocelot Dynamic Execution Infrastructure (Andrew Kerr, Sudhakar Yalamanchili)
Heterogeneous Cycle-level Architecture Models
• MacSim Heterogeneous Architecture Simulator (J. Lee, N. Lakshminarayana, H. Kim)
Parallel Simulation Infrastructure
• SST: Structural Simulation Toolkit (G. Hsieh, A. Rodrigues)
Tutorial Schedule
Topical Description
Part I (90 min.)
• Ocelot Overview: Architecture
• Ocelot: Supported Devices
Part II (90 min.)
• Structural Simulation Toolkit
• MacSim: Overview
Lunch
Part III (90 min.)
• MacSim: Simulator Architecture
• MacSim: Configuration
Part IV (90 min.)
• Case Studies using Ocelot and SST-MacSim