
© 2019 Arm Limited

Ramana Radhakrishnan u99127/@ramana-arm

5th December 2019

TVM at Arm

TVM Summit – Seattle


Agenda

• AI / ML in End Points

• Brief overview of Arm’s ML Platform

• Using TVM in Arm
  – Current Areas of Interest
  – Future Areas of Interest

• Challenges / Observations


What is AI Being Used for in Endpoints?

AI performs well with ‘patterns’ of data

Vision: Images and video
• Object detection, face unlock, defocus (bokeh), beautification, scaling, etc.

Voice: Recognition and creation
• Keyword spotting, speech recognition, natural language processing, speech synthesis, etc.

Vibration: Any ‘signal’
• Accelerometer, pressure, lidar/radar, speed, shock, vibration, pollution, density, viscosity, etc.


Diversity of AI Requirements in the Market

Premium
• Best user experience and responsiveness
• Highest performance in power-efficient design

Balanced
• Superior user experience in mid-range designs
• Balance performance with area and power

Cost Sensitive
• Delivering advanced user experiences for the most cost-sensitive designs
• Optimized for performance in the smallest area


Introducing Ethos NPUs for Every Market Segment

• Performance-critical AI applications delivering premium experiences
• Supporting AI applications in the most cost-sensitive endpoint devices
• Enabling AI applications in mid-range devices balancing performance with cost and battery life constraints


Ethos NPU Software Stack

[Diagram; recoverable labels: Host Cortex-A CPU; ML Application; ML Framework; Arm NN; TVM; NPU Compiler/Driver; Arm Ethos NPU (NCU, MCE, PLE, DMA, SRAM); DRAM]


Comprehensive ML Platform Makes Developing AI Easy

Armv8 SVE


Ethos Integration into TVM – Compile Time

DL Framework Frontend

Relay Lowering

NPU Graph Partitioning

Rest of the TVM stack

Annotated Output
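The graph partitioning step on this slide can be illustrated with a small, self-contained sketch. It is not TVM's or Arm's implementation: the operator names, the `NPU_OPS` set and the `partition` helper are all hypothetical, and the graph is simplified to a linear chain of operators. The idea shown is only the general one: consecutive operators that the NPU supports are grouped into one region for offload, and everything else stays with the rest of the TVM stack.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Region:
    """A run of consecutive operators assigned to the same target."""
    target: str       # "npu" or "host"
    ops: List[str]


def partition(ops: List[str], npu_supported: Set[str]) -> List[Region]:
    """Group a linear operator chain into alternating NPU / host regions."""
    regions: List[Region] = []
    for op in ops:
        target = "npu" if op in npu_supported else "host"
        if regions and regions[-1].target == target:
            # Same target as the previous operator: extend the current region.
            regions[-1].ops.append(op)
        else:
            # Target changed: start a new region.
            regions.append(Region(target, [op]))
    return regions


# Hypothetical set of operators that an NPU backend claims to support.
NPU_OPS = {"conv2d", "relu", "max_pool2d"}

# Hypothetical operator chain, as it might look after lowering to Relay.
graph = ["conv2d", "relu", "softmax", "conv2d", "max_pool2d", "argmax"]

for region in partition(graph, NPU_OPS):
    print(region.target, region.ops)
```

A real partitioner works on a dataflow graph rather than a list and has to respect data dependencies between regions; the linear chain here only keeps the sketch short.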


Current Areas of Work with TVM

Arm CPU and GPU
• Support for Mali Bifrost schedules.
  – Improvements by about 20-70%
• Interested in Arm CPU architecture support
• Investigating Arm Compute Library integration

General Areas
• Pre-quantized TensorFlow-Lite
  – Some operator support
• Framework versioning.
• Reviewing various bits of Arm architecture support.
• LLDB Pretty Printers
• Investigating μTVM

Ethos NPU
• Graph Partitioning for NPU
• Integrating support for Ethos-N77, Ethos-N57 and Ethos-N37


Future Areas of Interest

General Framework
• Graph partitioning and Relay optimizations.
• Improvements to code generation and generic optimizations.
• Auto-tuning
• Common command line utilities for using TVM.
• Improvements to Continuous Integration / Testing
• μTVM

Arm Architecture Support

Armv8-A architecture
• Scalable Vector Extensions (SVE/SVE2)
• Matrix Multiplication Support
• BFloat16
• Improved Advanced SIMD Support

GPU Support
• Mali Bifrost
• Mali Valhall

Cortex-M Architecture
• DSP Instructions support.
• Support for Helium / MVE.


Challenges / Opportunities

Deployment

• conda

• pypi packaging

• Integration with native packaging

Getting ready for packaging

Release process

• Execution tests and performance monitoring.

• Managing version updates in frameworks

Continuous Integration

Scalability

• Features vs Bug fixes.

• Isolation of changes.

Developmental Practice

• Better explanation with changes

• Debug helpers.

• Understanding the test infrastructure.

Developer efficiency

• Make it easier!

Getting Started


And finally, we are hiring!


The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks


tvm-driver

• tvm-driver --help
• compile
  – --debug-relay-all
  – --debug-tvm-all
  – --debug-all
  – --print-llvm
  – --print-assembler
• execute
  – --native
  – --remote
• auto-tune

Motivation

• Ease of use of TVM stack

• Common way of getting hold of outputs.
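The command layout listed on this slide could be prototyped with Python's standard `argparse` module. The following is a minimal sketch, not the actual tool: the subcommand and flag names are taken from the slide, while everything else is an assumption made for illustration (the `build_parser` helper, the positional `model` argument, treating every flag as a boolean switch, and making `--native` and `--remote` mutually exclusive). It only parses arguments and does not call TVM.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the subcommands named on the slide."""
    parser = argparse.ArgumentParser(prog="tvm-driver")
    sub = parser.add_subparsers(dest="command", required=True)

    # "compile" with the debug/print switches listed on the slide.
    compile_p = sub.add_parser("compile", help="compile a model")
    compile_p.add_argument("model", help="path to the model (hypothetical argument)")
    for flag in ("--debug-relay-all", "--debug-tvm-all", "--debug-all",
                 "--print-llvm", "--print-assembler"):
        compile_p.add_argument(flag, action="store_true")

    # "execute" either natively or remotely; exclusivity is an assumption.
    execute_p = sub.add_parser("execute", help="run a compiled model")
    where = execute_p.add_mutually_exclusive_group()
    where.add_argument("--native", action="store_true")
    where.add_argument("--remote", action="store_true")

    # "auto-tune" has no options on the slide, so none are defined here.
    sub.add_parser("auto-tune", help="auto-tune a model")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is runnable as-is.
args = build_parser().parse_args(["compile", "--print-llvm", "model.tflite"])
print(args.command, args.model, args.print_llvm)
```

Keeping each stage as a subcommand gives one entry point for compiling, running and tuning, which matches the slide's stated motivation of ease of use and a common way of getting hold of outputs.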