adapteva epiphany iii - rochester institute of...
Adapteva Epiphany III
Bryan T. Meyers
May 7, 2015
Bryan T. Meyers Adapteva Epiphany III May 7, 2015 1 / 24
Introduction
Problem
- Current CMP architectures suffer from:
  - Power Inefficiency
  - Dark Silicon
  - Bandwidth Contention
- Increasing need for ASPs to build efficient devices
- New paradigms show promise, but do not address general-purpose compute
Solution
Develop a new CMP architecture that is designed for high-efficiency compute and low thermal output, with a high-speed network-on-chip.
Adapteva
- Founded 2008 in Lexington, Massachusetts
- Goal
  - Build the most-efficient CMP in the world
  - Lead the way to next-gen compute scales
- Used Kickstarter to raise roughly $900,000 to build the Parallella
Parallella
Platform
Figure: Adapteva Parallella
Specifications
- Xilinx Zynq 7010/7020
  - 800 MHz Dual-core ARM A9
  - FPGA
- Adapteva Epiphany III Coprocessor
- 1GB DDR3 SDRAM
- MicroSD Card
- Ethernet
- HDMI
- USB
- GPIO
Architecture
Figure: Parallella High-Level Architecture
Adapteva Epiphany III
Introduction
Specifications
- 16-core CMP
- Cores
  - 32-bit
  - 1 GHz Superscalar RISC
  - 32KB Multibank SRAM
  - DMA
- 2D Mesh Network on Chip
- 2W TDP
Adapteva Epiphany III Architecture
Overview
Tilera?
Core
Core - Memory
Networking
Limitations
Host
- Slow Host to Coprocessor Link
- No Send/Receive
Core
- Only 32-bit FP
- Interrupted by Remote Read/Write
Memory
- Only 32KB per core
- No Memory Management
Network
- Reads are Requests to Write
- Load/Store, not Send/Receive
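The "Reads are Requests to Write" point can be illustrated with a toy cost model. This is only a sketch: it assumes the 16 cores form a 4 x 4 mesh, that a message takes the shortest row-and-column path, and that cost is counted in link traversals only (router latency and contention are ignored). The function names are hypothetical and are not part of any Adapteva library.

```python
# Toy model: cost of remote writes vs. remote reads on a 2D mesh.
MESH_COLS = 4  # assumption: 16 cores arranged as a 4 x 4 mesh


def core_coords(core_id, cols=MESH_COLS):
    """Map a linear core id to (row, col) on the mesh."""
    return divmod(core_id, cols)


def hops(src, dst):
    """Link traversals on a shortest row-and-column path between two cores."""
    (r0, c0), (r1, c1) = core_coords(src), core_coords(dst)
    return abs(r0 - r1) + abs(c0 - c1)


def write_cost(src, dst):
    """A remote write travels one way: source to destination."""
    return hops(src, dst)


def read_cost(src, dst):
    """A remote read is a request out plus the data coming back as a write."""
    return 2 * hops(src, dst)


print(write_cost(0, 15), read_cost(0, 15))
```

Under these assumptions a corner-to-corner read costs twice as many link traversals as the equivalent write, which is one reason to have the producer push data rather than have the consumer pull it.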
Programming
OpenCL
STDCL
- Open-source OpenCL implementation
- Supports
  - x86
  - ARM
  - CUDA
  - Epiphany
- Can access networked devices (cluster)
Epiphany SDK
- Host Library (eHAL)
  - Platform Management
  - Program Management
  - Data Movement
- Device Library (eLib)
  - Platform Info
  - Data Movement
  - Mutex / Semaphore
Caveats
OpenCL
- Data
  - Assume Shared Memory Architecture
  - Loads and Stores
  - Host to Coprocessor Communication is Slow
- Computation
  - Must have high Computation to Communication ratio
- Does not leverage local memory bandwidth
Epiphany SDK
- Data
  - No Memory Management
  - Loads and Stores
  - Host to Coprocessor Communication is Slow
- Steep Learning Curve
- May need Assembly to realize performance goals
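The "high Computation to Communication ratio" caveat can be made concrete with a roofline-style bound. The sketch below assumes the 32 GFLOPS peak and 8 GB/s off-chip bandwidth claimed later in the deck, and assumes every operand crosses the off-chip link; the real host link may well be slower. Function names are hypothetical.

```python
# Roofline-style bound: throughput is capped by compute or by the link.
PEAK_GFLOPS = 32.0    # claimed peak for 16 cores
OFF_CHIP_GBPS = 8.0   # claimed off-chip bandwidth, GB/s


def attainable_gflops(flops_per_byte, peak=PEAK_GFLOPS, bandwidth=OFF_CHIP_GBPS):
    """Upper bound on GFLOPS when every byte arrives over the off-chip link."""
    return min(peak, flops_per_byte * bandwidth)


def break_even_intensity(peak=PEAK_GFLOPS, bandwidth=OFF_CHIP_GBPS):
    """FLOPs per byte needed before compute, not the link, is the limit."""
    return peak / bandwidth


print(break_even_intensity())   # FLOPs per byte needed to reach peak
print(attainable_gflops(1.0))   # a kernel doing 1 FLOP per byte moved
```

With these figures a kernel needs about 4 FLOPs per transferred byte before the cores, rather than the link, become the limit; anything below that is bandwidth bound.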
Performance
Claims
- 2 Watt Max TDP for 16 cores
- 32 GFLOPs
  - 1 FMAC / cycle / core
  - 1 GHz
- 512 GB/s On-chip BW
- 8 GB/s Off-chip BW
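The 32 GFLOPs claim follows from the other figures on this slide. The short check below reproduces it, assuming (as is conventional, though not stated on the slide) that one FMAC counts as two floating-point operations.

```python
# Reproduce the peak-throughput claim from the per-core figures.
CORES = 16
CLOCK_HZ = 1_000_000_000   # 1 GHz
FMAC_PER_CYCLE = 1         # one fused multiply-accumulate per cycle per core
FLOPS_PER_FMAC = 2         # assumption: an FMAC is one multiply plus one add
TDP_WATTS = 2


def peak_gflops(cores, clock_hz, fmac_per_cycle, flops_per_fmac=FLOPS_PER_FMAC):
    """Peak GFLOPS if every core retires its FMACs on every cycle."""
    return cores * clock_hz * fmac_per_cycle * flops_per_fmac / 1e9


peak = peak_gflops(CORES, CLOCK_HZ, FMAC_PER_CYCLE)
efficiency = peak / TDP_WATTS
print(f"peak: {peak:.0f} GFLOPS, efficiency: {efficiency:.0f} GFLOPS/W")
```

This is a best-case number: it holds only while every core issues an FMAC on every cycle.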
Examination
Design
- Resembles a DSP closely
  - Multi-bank memory to allow faster access
  - Memory Management is Manual
    - Default heap allocation is Off-chip
    - Must write linker file to set up memory spaces
Performance
- Calculation
  - Peak Performance Measured by FMAC
  - Integer MIPS = FP MIPS
  - Byte operations are 4 times faster
- Communication
  - Major Bottleneck to Host
  - Reads are delayed Writes
  - Peak On-chip BW is for Broadcast
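Because memory management is manual, it can help to sanity-check a planned layout before writing the linker file. The sketch below assumes each core's 32KB is organised as four banks of 8KB (the slides say only "multibank"), and the section names and sizes are made up for illustration. It checks capacity only and does not generate a linker script.

```python
# Check that a hand-written section-to-bank plan fits in local memory.
BANK_COUNT = 4         # assumption: 32KB local memory split into four banks
BANK_SIZE = 8 * 1024   # assumption: 8KB per bank


def bank_usage(sections, bank_count=BANK_COUNT, bank_size=BANK_SIZE):
    """sections maps a name to (bank index, size in bytes).

    Returns the free bytes left in each bank; raises ValueError if a
    section names a bank that does not exist or overflows its bank.
    """
    used = [0] * bank_count
    for name, (bank, size) in sections.items():
        if not 0 <= bank < bank_count:
            raise ValueError(f"{name}: no such bank {bank}")
        used[bank] += size
        if used[bank] > bank_size:
            over = used[bank] - bank_size
            raise ValueError(f"{name}: bank {bank} overflows by {over} bytes")
    return [bank_size - u for u in used]


# Hypothetical layout: code and data buffers kept in separate banks.
layout = {"code": (0, 6000), "input": (1, 4096), "output": (2, 4096), "stack": (3, 2048)}
free = bank_usage(layout)
print(free)
```

Keeping code and the input and output buffers in different banks is the kind of placement the multi-bank design is meant to reward, since accesses to separate banks need not contend with each other.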
Possible Improvements
Network
- Sends and Receives
  - Native MPI
    - Scatter
    - Gather
    - Broadcast
    - Reduce
- Faster Link to Host
Cores
- Memory Management
  - Bank Allocation
  - Malloc/Free
- True Byte Addressing
- Larger Local Memory (64KB+)
- Caching(?)
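To show what a collective such as Reduce buys, here is a host-side simulation of a pairwise tree reduction over one value per core. The deck does not say how a native primitive would be implemented; the tree scheme and the function name are assumptions chosen only to illustrate the idea.

```python
# Simulated tree reduction: combine one value per core in pairwise rounds.
import operator


def tree_reduce(values, op):
    """Return (result, rounds) after combining values pairwise.

    In each round, the value at position i absorbs the value `stride`
    positions to its right, and the stride doubles between rounds.
    """
    vals = list(values)
    if not vals:
        raise ValueError("need at least one value")
    stride, rounds = 1, 0
    while stride < len(vals):
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] = op(vals[i], vals[i + stride])
        stride *= 2
        rounds += 1
    return vals[0], rounds


total, rounds = tree_reduce(range(16), operator.add)
print(total, rounds)
```

For 16 cores the tree finishes in 4 rounds, compared with 15 sequential combines if a single core gathered every value itself.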
Questions?