risc-v i/o scale-out architecture for distributed data analytics...summary • scale-out analytics...

27
Integrated Device Technology RISC-V I/O Scale-out Architecture for Distributed Data Analytics Mohammad Akhter IntegratedDevice Technology, Inc. [email protected] July 12, 2016 © Copyright 2016, IDT Inc.

Upload: others

Post on 17-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Integrated Device Technology

The Analog and Digital Company™

RISC-VI/OScale-outArchitectureforDistributedData

Analytics

MohammadAkhterIntegratedDeviceTechnology,Inc.

[email protected]

July12,2016

©Copyright2016, IDTInc.

Page 2: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Contributors/Acknowledgement

• HenryCook,SiFive,Inc.• HowardMao,SiFive,Inc.• KrsteAsanovic,SiFive,Inc.• MohammadAkhter,IDT,Inc.• StephenDurr,IDT,Inc.

2

Integrated Device Technology

The Analog and Digital Company™

Page 3: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

SocialmediaisdrivingMobileInternetAccess

Massive Data Growth Driving Internet

3

3.6BMoreVideouploadedtoYouTubethanthetraditionalTV

networks3.6BPhotosuploaded in2014

(whatsapp, facebook,instagram,snapchat …)

807M 807MpeoplewatchedYouTubevideo"Charliebitmyfingeragain“

(642MtoDisneylandsince1955)

91%

UnstructuredData….MachineDatanotincluded….

Page 4: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

The Real-Time Usage Rising

Live Data from Wireless to the Cloud

Networksand

Data Centersmust

THINK FAST

Social Media, Finance, Health, Video, Audio, Cloud Compute, Transportation, Military C3I

• Millions of users• More data per user• Low Latency• Data Analytics

4

Page 5: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Focus on the Use Cases …

5

HPCandAnalyticsà HeterogeneousComputingà PerformanceCriticalDataAnalyticsà LowestLatencyProcessing

WirelessAccessNetworkà Edgehighperformancecomputingà DeterministicLowestLatencyà Fault-TolerantSystemà NetworkandSignalProcessing

Robotics/SensorFusion/Autonmousà Real-timeVideo/ImageRecognitionà TimesynchronizedProcessingà DeterministicLatencyà Fault-TolerantSystem

SpaceandMilitaryà HighPerformanceEmbedComputingà HeterogeneousArchitectureà VideoandImageAnalyticsà LowestLatencyProcessing

Page 6: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Underlying Structure for Analytics…

6

Demands balanced low latency computing, I/O, memory, and storage processing

Page 7: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Wireless Network EvolutionDriven by Real-time Data with better QoS

7

150xReduction

LowerLatencytoachievesuperiorQoE

~10x

~10-50x

HigherBandwidthdrivenbyVideo/SocialMedia

Page 8: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Edge-Core Network ConvergenceRace to msec!

88

UserRadioBase-Station

GatewayInternet/C

ore Network

Servers/Storage

AccessEdgeCore/Aggregation

Page 9: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Traditional Computing Model

9

I/O, Memory and Storage

Bottleneck

Suffers from inherent mismatch between I/O, Memory, and Storage and Processing (CPU/Accelearators) Performance Evolution

Page 10: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Distributed Edge Network Computing Model

10

Power

X86/GPU

EdgeServers

CloudRAN

4GBTSwithintegratedcomputing

à Reducesbandwidthdemand fromAccesstoCore

à Provides Real-timepredictivedecision

à UserCentricoptimizationofContentdelivery

Page 11: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

ComputingandNetworkConvergenceUsercentricNetwork

Real-timeUserCentricNetwork

11

Page 12: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Deep Learning Video Example(Hacking a new Computing)

12

Page 13: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Hacking a new Computer• BalancingGPU/CPUperformancewithMemoryandI/O

Page 14: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Hacking a new Computer

• Scale-outComputingandstorage

4 GPU Eval Cards8G I/O NIC

4 GPU (NVIDIA Tegra K1)56G I/O (4x14G NIC) 200G Switching

38U System144 TFLops608 GPU Nodes (4x4x36 )38 Port Switching Fabric

Page 15: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Hacking a new Computer

Page 16: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Deep Learning Micro-ClusterReal-time Video/Social Analytics

RapidIO Switch Appliance

Low Power ARM+GPU Cluster with RapidIO

~29 frames/sec/node~100 ns RapidIO switching latency

> TFlops Processing/node

~5W-11 W per GPU node

Page 17: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Example Analytics Deep Learning Model for Image/Video

17

Page 18: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Analytics Computing Model Challenges

18

+SharedmemorybetweenCPUandGPU+Supports Scale-outSystem- SharedMemoryandStoragechallengingwithouthardwaresupport+Switchlatencyexcellent(~100nsec)- Memorytomemorylatencycanbeimproved

Page 19: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Scale-out Analytics Clustering Model

19

+ LargeScale-outAnalyticsModel+ Lowlatency<200- 300nsec+ LowPower+ SharedMemory/StoragePool+ OpenISAModel+ StandardFabric

Page 20: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

RISC-V Scale-out Analytics Clustering Model

20

Page 21: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

FabricTileLink and AMBA

• Off-chipFabric– Scale-outandRemoteMemoryAccelerationwithRapidIO

• On-chipFabric– TileLink

• UCBerkley• Hierarchical/DeadlockFree/Built-inMessagetypes(supportsAccelerators)

• ExtensiblewithCustomMessageforCoherency(5-stateandothers)

– AMBA• ARMInc.• AMBAAXIandACEProtocolSpecification

– IssueE.22Feb,2013• CC-NUMAArchitecture,Up-to5StateModel

21

Page 22: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

RISC-V Architecture with RapidIO

22

TileLink-RapidIOI/OandMemoryFabric

Page 23: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

RISC-V Multi-chip Generator

23

InitialVersionofMulti-chipHardwareGeneratoravailable

Page 24: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Remote Memory/IO Scale-out Packets

24

DestinationID SourceID FlowControl Payload ProtectionAddressing TYPE

SoC1 SoC2 SoC4

SoC3

SoC5

Request-ResponsePeer-to-PeerModel

Page 25: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

TileLink Basic Protocol and RapidIO Packet Structure

25

client

manager

Acquire FinishGrantProbe Release

+0 +1 +2 +3

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Byte 0 ackID vc CRF prio tt=10 Ftype destinationID[7:0] sourceID[7:0] W 0Byte 4 TType (transaction) rQoS rd-/wr-size srcTID rSizeBurst W 1

Byte 8 udrB rsvd rOffset rsvd rUnion W 2

Byte 12 Address Address W 3Byte 16 Address Address W 4Byte 20 Payload(Word0) W 5Byte 24 Payload(Word1) W 6

…Payload(Word15) W 20

CRC Padding W 21

Page 26: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Hardware Simulation Model

26

RapidIOEgress

RapidIOIngress

TargetLatencyCPUtoCPU~100– 200nsec

Page 27: RISC-V I/O Scale-out Architecture for Distributed Data Analytics...Summary • Scale-out Analytics with balancing Fabric and Computing – Scale-out through distributed Edge Computing

Summary• Scale-outAnalyticswithbalancingFabricandComputing

– Scale-outthroughdistributedEdgeComputingModel• Reducesround-triplatency• Reducesaccesstocorenetworkbandwidth

– RISC-VwithintegratedFabricforI/OandMemory• FabricforAnyTopology(Mesh/Hypercube)• LowLatencydirectlyfromSoC on-chipFabric• SupportsTileLink scale-outformulti-nodeclustering• EnablesBalancedCoherentandscale-outarchitecture

27

RISC-VCPUGeneratormodelwithPortforRapidIOavailable!