TRANSCRIPT
“Speech Recognition on low power devices”
Vikrant Tomar and Sam Myer – Fluent.ai Inc.
September 15, 2020
tinyML Talks Sponsors
Additional Sponsorships available – contact [email protected] for info
Confidential Presentation ©2020 Deeplite, All Rights Reserved
VISIT bit.ly/Deeplite FOR MORE INFO
WE USE AI TO MAKE OTHER AI FASTER, SMALLER AND MORE POWER EFFICIENT
Automatically compress SOTA models like MobileNet to <200KB with little to no drop in accuracy for inference on resource-limited MCUs
Reduce model optimization trial & error from weeks to days using Deeplite's design space exploration
Deploy more models to your device without sacrificing performance or battery life with our easy-to-use software
bit.ly/Deeplite
Copyright © Edge Impulse Inc.
TinyML for all developers
Get your free account at http://edgeimpulse.com
[Diagram: the Edge Impulse workflow – acquire valuable training data securely from real sensors in real time, enrich data and train ML algorithms, test the impulse with real-time device data flows, then choose embedded and edge compute deployment options, all built on an open source SDK.]
Maxim Integrated: Enabling Edge Intelligence
Sensors and Signal Conditioning
Health sensors measure PPG and ECG signals critical to understanding vital signs. Signal chain products enable measuring even the most sensitive signals.
Low Power Cortex M4 Micros
The biggest (3MB flash and 1MB SRAM) and the smallest (256KB flash and 96KB SRAM) Cortex M4 microcontrollers enable algorithms and neural networks to run at wearable power levels
Advanced AI Acceleration
AI inferences at a cost and power point that makes sense for the edge. Computation capability to give vision to the IoT, without the power cables. Coming soon!
Qeexo AutoML for Embedded AI
Automated Machine Learning Platform that builds tinyML solutions for the Edge using sensor data

QEEXO AUTOML: END-TO-END MACHINE LEARNING PLATFORM

Key Features
• Wide range of ML methods: GBM, XGBoost, Random Forest, Logistic Regression, Decision Tree, SVM, CNN, RNN, CRNN, ANN, Local Outlier Factor, and Isolation Forest
• Easy-to-use interface for labeling, recording, validating, and visualizing time-series sensor data
• On-device inference optimized for low latency, low power consumption, and a small memory footprint
• Supports Arm® Cortex™-M0 to M4 class MCUs
• Automates complex and labor-intensive processes of a typical ML workflow – no coding or ML expertise required!
• Extensive, highly-optimized feature spaces
• Super-compact code for MCUs and Gateways
• Sensor selection and placement analysis
• AI-driven component specs
• Automated data quality checks
• Data collection, augmentation & labeling services
• No open source – clean licensing

Target Markets/Applications
• Industrial Predictive Maintenance
• Smart Home
• Wearables
• Automotive
• Mobile
• IoT

For a limited time, sign up to use Qeexo AutoML at automl.qeexo.com for FREE to bring intelligence to your devices!
Next-Generation AI Tools for
Product Development
Get started w/ a special tinyML Talks offer for corporate customers: https://reality.ai/get-started
SynSense (formerly known as aiCTX) builds ultra-low-power (sub-mW) sensing and inference hardware for embedded, mobile and edge devices. We design systems for real-time always-on smart sensing, for audio, vision, bio-signals and more.
https://SynSense.ai
Next tinyML Talks
Date: Tuesday, September 29
• Michael Gielda, VP Business Development and co-founder, Antmicro – "Running TF Lite on Microcontrollers without hardware in Renode"
• Stuart Feffer, Co-founder and CEO, Reality AI – "Building Products using Edge AI / TinyML on MCUs"
Webcast start time is 8 am Pacific time. Each presentation is approximately 30 minutes in length.
Please contact [email protected] if you are interested in presenting
Vikrant Tomar
Vikrant is Founder and CTO of Fluent.ai Inc. He is a scientist and executive with nearly 10 years of experience in speech recognition and machine/deep learning. He obtained his PhD in automatic speech recognition at McGill University, Canada, where he worked on manifold learning and deep learning approaches for acoustic modeling. In the past, he has also worked at Nuance Communications Inc. and Vestec Inc. as a Research Scientist.
Sam Myer
Sam is the lead developer at Fluent.ai Inc., where his responsibilities include Fluent's embedded speech recognition engine. He has an M.Sc. in signal processing from Queen Mary University of London and a B.Sc. in computer science from McGill University. Sam has extensive software development experience encompassing nearly 15 years and multiple cities including New York, Berlin and Montreal.
Overview
• About Fluent.ai
• Model Transformation
• Model Compression
• Fluent.ai µCore
• Demos
About Fluent.ai
• Founded in 2015 after over 7 years of ground-breaking machine
learning/AI research by international thought-leaders
• Research partnerships with many leading research labs and
institutions
• Strong and experienced team of leading scientists, engineers, sales
staff and managers/executives (~25)
• Working with customers in North America, Europe and Asia on robust, multilingual and offline deployments
Strong Institutional Backers
Fluent.ai Technology
[Diagram: a conventional pipeline passes the utterance “Please turn on the lights” through Speech to Text and then NLP to reach the intent, while end-to-end Spoken Language Understanding maps the audio directly to the “Lights On” intent.]
Advantages of end-to-end SLU over conventional approaches:
• Smaller training data needs
• Higher accuracy and robustness against noise
• Offline and personalizable
• Any language, multi-language
Our Right to Win
“Fluent’s unique models can be trained quickly to deliver the required accuracy in many dialects, languages and noise conditions and be embedded on the world’s devices.”
– William Tunstall-Pedoe, Founder of Evi (acquired by Amazon Alexa), Advisor/Investor in Fluent.ai
Large Global Opportunity & Demand
• Wake up your voice-enabled device with one of our low-power keyword spotting solutions that beat state-of-the-art systems.
• Less than 5% false rejection rate (FRR) at 3 false accepts per 24 hours
Target applications: single, multiple, or user-trainable wake words for smart-home devices, smart toys, wearables, car infotainment, robotics & industrial IoT, and voice remotes.
System requirements (Arm Cortex M4): 25 KB RAM, 200 KB storage, 100 ms latency, 48 MHz minimum frequency.
Voice AI for on-device speech understanding
World’s first end-to-end spoken language understanding system => faster, more flexible and more accurate voice user interfaces than conventional technologies.
• Offline/on-device for guaranteed privacy and security
• Any language and accent, multiple languages concurrently
• Personalizable by the end-user
• Lower development cost, faster time-to-market
Target applications: smart speakers, smart home hubs, wearables, car infotainment, and voice remotes.
System requirements (Arm Cortex M4): 100 KB RAM, 550 KB storage, < 200 ms latency, 48 MHz minimum frequency.
Demo: Multilingual voice control on ESP32 – https://bit.ly/fluent-m5-demo-public
Community contributions
• Fluent Speech Commands Dataset
• Speech to intent dataset
• ~28,000 utterances from ~100 speakers
• 31 intents, 254 commands
• Link: bit.ly/fluent-speech-commands
• Downloaded over 500 times and used in many research papers
• SpeechBrain project at Mila
• A PyTorch-based speech toolkit
Fluent.ai µCore
Fluent.ai µCore
• Proprietary low-resource spoken language understanding
library
• Detects wake phrase(s) or keywords + commands
Challenges
• Taking neural networks from GPUs to MCUs:
  • Low footprint (memory and CPU usage)
  • Real-time processing for low-latency recognition
Model compression & Fluent.ai µCore
Fluent.ai Transformer
1 Model compression
• Compress the size of the model by removing unimportant weights:
  • Filter pruning, kernel pruning, or layer pruning
  • One-shot or iterative pruning
  • Fine-grained or coarse-grained pruning
• Some popular pruning methods (a magnitude-pruning sketch follows below):
  • Level pruner
  • Slim pruner
  • NetAdapt
  • AGP
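As a rough illustration of the simplest of these, here is a minimal sketch of level (magnitude-based) pruning in PyTorch. The tiny model, the 50% sparsity target, and the use of torch.nn.utils.prune are illustrative assumptions, not Fluent.ai's actual compression pipeline.

```python
# Minimal sketch of one-shot, fine-grained "level" (magnitude) pruning.
# The model and the 50% sparsity target are placeholders for illustration.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Conv1d(40, 64, kernel_size=3),  # e.g. 40 acoustic-feature channels in
    nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=3),
    nn.ReLU(),
)

for module in model.modules():
    if isinstance(module, nn.Conv1d):
        # Zero out the 50% smallest-magnitude weights in each layer.
        prune.l1_unstructured(module, name="weight", amount=0.5)
        # For coarse-grained filter pruning one would instead use, e.g.:
        # prune.ln_structured(module, name="weight", amount=0.5, n=2, dim=0)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

total = sum(p.numel() for p in model.parameters())
zeros = sum(int((p == 0).sum()) for p in model.parameters())
print(f"overall sparsity: {zeros / total:.1%}")
```

An iterative variant would fine-tune between pruning rounds; the AGP pruner mentioned above instead grows the sparsity target gradually over training.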
1 Automated model compression (AMC)
• Uses a reinforcement learning algorithm (DDPG) to automatically learn the pruning ratio for each layer
• Reward is a function of accuracy and FLOPS (an illustrative sketch follows below)
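The talk only states that the reward combines accuracy and FLOPS; the exact function is not given. As a hedged illustration, the sketch below uses the -error * log(FLOPs) form from the original AMC paper.

```python
import math

def amc_style_reward(accuracy: float, flops: float, flops_budget: float) -> float:
    """Illustrative reward for the DDPG agent: higher accuracy and lower FLOPS
    are better. The exact formula used by Fluent.ai is not specified in the
    talk; this mirrors the -error * log(FLOPs) form from the AMC paper and
    simply rejects candidate policies that exceed the FLOPS budget."""
    if flops > flops_budget:
        return -1.0
    error = 1.0 - accuracy
    return -error * math.log(flops)

# Example with the Fluent_MN numbers from the next slide (FLOPS in the slide's units):
print(amc_style_reward(accuracy=0.99270, flops=9.06, flops_budget=10.0))
```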
1 AMC on Fluent_MN
• Compressing the Fluent_MN architecture by up to 50% with less than 1 percent accuracy loss

              Accuracy   FLOPS
Original      99.634     14.61
Compressed    99.270     9.06
2 Transforming models
• Trained model on GPU using PyTorch
• Perform post-processing
• Generate C++ code describing the model
• Compile model C++ code with the library

Generated C++ code features:
• Conditional compilation
• 8-bit quantization (an illustrative sketch follows below)
• Weight reordering
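As a hedged sketch of what 8-bit quantization typically involves, the snippet below quantizes a weight tensor symmetrically to int8 with a single per-tensor scale. The actual scheme used by the Fluent.ai transformer (per-channel scales, weight reordering, the generated C++ itself) is not detailed in the talk.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor 8-bit quantization (illustrative only).
    Returns int8 values (which could be stored in flash) plus the float scale
    needed to dequantize or to fold into the layer's output scaling."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32) * 0.1
q, scale = quantize_int8(w)
print("worst-case absolute error:", float(np.max(np.abs(dequantize(q, scale) - w))))
```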
Fluent.ai µCore
3 Real-time processing
Convolution in existing libraries is designed for images, not time-series.

Training (batched utterances) vs. inference (streaming audio):
• Entire utterance is available vs. audio streamed one frame at a time
• Decoding latency not considered vs. latency minimized for a good user experience
• Finite utterance length vs. continuous listening, with the NN applied in overlapping windows
• Activations for the entire network stored in memory vs. memory usage must be minimized
Layer types
• Streaming layer types
• Unidirectional recurrent layers (GRU, LSTM)
• Convolution / depthwise-separable
• Windowed functions (e.g. MaxPool)
• Streaming cumulative functions (e.g. GlobalMaxPool; see the sketch after this list)
• Skip connections
• Activation functions (ReLU, sigmoid, tanh)
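To make the "streaming cumulative function" idea concrete, here is a minimal sketch of a global max-pool updated one frame at a time; the class name and API are assumptions for illustration, not the µCore implementation.

```python
from typing import Optional
import numpy as np

class StreamingGlobalMaxPool:
    """Streaming cumulative function sketch: instead of waiting for the whole
    utterance, a running element-wise maximum is updated as each frame arrives,
    so memory stays at one frame regardless of utterance length."""
    def __init__(self):
        self.running_max: Optional[np.ndarray] = None

    def process(self, frame: np.ndarray) -> np.ndarray:
        if self.running_max is None:
            self.running_max = frame.copy()
        else:
            np.maximum(self.running_max, frame, out=self.running_max)
        return self.running_max  # current pooled value after each frame
```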
Fluent µCore -- NN Layer structure
• All NN weights are stored in Flash
• Arm MCU platform allows network
weights to be fetched layer by layer
• Only activation buffer is stored in
RAM
• Process function
• Uses CMSIS
• Calculates activations as data is
received and updates buffer
• 1 frame input/output
Sequence of layers
• Layers are joined in sequence
• An input frame propagates through the layers
• Layers are independent (a minimal chaining sketch follows below)
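A minimal sketch of this chaining, assuming a per-layer process() that takes one frame and returns either an output frame or None while the layer is still buffering; the class names and API are illustrative, not the actual C++/CMSIS µCore code.

```python
from typing import List, Optional
import numpy as np

class StreamingLayer:
    """Each layer owns its own small state/buffer and exposes process(),
    which consumes one input frame and returns one output frame, or None
    when no output is due yet (e.g. while a convolution buffer fills)."""
    def process(self, frame: np.ndarray) -> Optional[np.ndarray]:
        raise NotImplementedError

class ReLU(StreamingLayer):
    def process(self, frame):
        return np.maximum(frame, 0.0)  # stateless: always produces an output

def step(layers: List[StreamingLayer], frame: np.ndarray) -> Optional[np.ndarray]:
    """Propagate one frame through the chain; stop as soon as a layer is still
    buffering (returns None), since later layers have nothing to consume yet."""
    out: Optional[np.ndarray] = frame
    for layer in layers:
        out = layer.process(out)
        if out is None:
            return None
    return out
```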
Convolution example
Streaming convolution, kernel size = 3, stride = 2
[Animated diagram over a sequence of input frames: each incoming frame is pushed into a small buffer; while the buffer holds fewer than three frames the process step emits NULL, and once the buffer is full one output frame is produced; with a stride of 2, subsequent outputs are emitted only on every second input frame, and frames that can no longer be used are discarded.]
A minimal code sketch of this behaviour follows below.
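Here is a minimal Python sketch of the streaming convolution shown above (kernel size 3, stride 2). It keeps only kernel-size frames in its buffer and emits NULL (None) until a full window is available, then one output every stride frames. This is an illustrative re-implementation, not the µCore code, which is generated C++ using CMSIS.

```python
from typing import List, Optional
import numpy as np

class StreamingConv1d:
    """Streaming 1-D convolution sketch: each call consumes one input frame and
    returns either one output frame or None. Only kernel_size frames are kept
    in the buffer, so memory stays constant regardless of utterance length."""

    def __init__(self, weights: np.ndarray, bias: np.ndarray,
                 kernel_size: int = 3, stride: int = 2):
        self.weights = weights          # shape: (out_channels, in_channels, kernel_size)
        self.bias = bias                # shape: (out_channels,)
        self.kernel_size = kernel_size
        self.stride = stride
        self.buffer: List[np.ndarray] = []   # most recent input frames
        self.frames_seen = 0

    def process(self, frame: np.ndarray) -> Optional[np.ndarray]:
        self.frames_seen += 1
        self.buffer.append(frame)
        self.buffer = self.buffer[-self.kernel_size:]   # drop frames no window can use
        if self.frames_seen < self.kernel_size:
            return None                                  # buffer not yet full (NULL output)
        if (self.frames_seen - self.kernel_size) % self.stride != 0:
            return None                                  # stride skips this position
        window = np.stack(self.buffer, axis=-1)          # (in_channels, kernel_size)
        return np.einsum("oik,ik->o", self.weights, window) + self.bias

# Feed frames one at a time, as during always-on listening.
rng = np.random.default_rng(0)
conv = StreamingConv1d(weights=rng.standard_normal((8, 4, 3)), bias=np.zeros(8))
for t in range(6):
    out = conv.process(rng.standard_normal(4))
    print(f"frame {t + 1}: {'output' if out is not None else 'NULL'}")
```

Feeding six frames prints NULL, NULL, output, NULL, output, NULL, matching the animation above.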
Advantages of streaming NN
• No need to keep features/activations for the entire utterance => lower memory requirements
• Live processing while the user is speaking => lower latency
• Redundant calculations eliminated by not using overlapping windows => lower CPU usage
µCore vs tflite-micro
• Same Fluent wakeword model running on µCore and tflite-micro on a Linux machine
[Bar chart comparing tensor RAM (kB), decoding time (ms), MIPS (MHz/s) and output interval (ms) for Fluent µCore vs. tflite-micro with 1.16 s and 1.48 s windows.]
µCore Summary
• Low footprint
  • CPU-efficient code, reduced model size & memory load operations
  • Small code size, to fit into limited flash memory
  • Small memory footprint (RAM)
• Streaming NN: optimized for low latency / real-time operations
• Cross platform (e.g., Arm Cortex M4, M33 @ 48 MHz, DSPG, XMOS)
Cheaper yet effective device designs!
More demos
Demo: Multiple wake words and multilingual intent on Arm Cortex M4
• Arm Cortex-M4 microcontroller running at up to 100 MHz
bit.ly/fluent_ww_air_cortexm4
Demo: Smart-home voice control on Cortex M7 – https://bit.ly/fluent-m7-demo
Copyright Notice
This presentation in this publication was presented as a tinyML® Talks webcast. The content reflects the opinion of the author(s) and their respective companies. The inclusion of presentations in this publication does not constitute an endorsement by tinyML Foundation or the sponsors.
There is no copyright protection claimed by this publication. However, each presentation is the work of the authors and their respective companies and may contain copyrighted material. As such, it is strongly encouraged that any use reflect proper acknowledgement to the appropriate source. Any questions regarding the use of any materials presented should be directed to the author(s) or their companies.
tinyML is a registered trademark of the tinyML Foundation.
www.tinyML.org