![Page 1: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/1.jpg)
Developing Your Own Wake Word Engine
Just Like “Alexa” and “OK Google”
Xuchen Yao, CEO, KITT.AI
Guoguo Chen, CTO, KITT.AI
![Page 2: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/2.jpg)
What’s a “wake word”?
• Wake word
• Hot word
• Offline
• Code runs on CPU/DSP/MCU
• 24/7
• Always listening
• One-shot understanding
• Online
• Code runs on cloud
• On demand
• Explicit permission
Alexa
OK Google
Hey Siri
what’s the weather today?
![Page 3: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/3.jpg)
Conversational UI Pipeline
wake up device → speech-to-text → text understanding → dialogue management → text-to-speech
(diagram labels: text, voice)
![Page 4: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/4.jpg)
Snowboy: a customizable hotword detection engine
a.k.a. a deep neural network in 2 MB of RAM
hotword.io · video · blog
![Page 5: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/5.jpg)
![Page 6: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/6.jpg)
10,000+ developers, 7,000+ unique hotwords
Who’s using it (released 5/2016)
Dominating developer community for hotword detection
![Page 7: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/7.jpg)
Use Cases
![Page 8: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/8.jpg)
#1 Hotword: Smart Mirror
https://github.com/evancohen/smart-mirror (credits to Evan Cohen)
video link
![Page 11: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/11.jpg)
Conversational UI Pipeline
wake up device → speech-to-text → text understanding → dialogue management → text-to-speech
(diagram labels: text, voice)
Speech Pipeline
![Page 12: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/12.jpg)
Speech Pipeline
Voice → Microphone Array → Wake Word Detection → Speech Recognition
Voice
• Telephone (8 kHz sampling)
• Others (16 kHz)
• Noises: TV, radio, street, café, car, music
• Pitch: children, adults, seniors
• Accent: US/UK/Europe/Asian…
Microphone Array
• Close talking
• Far field (3-9 feet)
• 2, 4, or 6 microphones
• Linear/circular
• Auto Gain Control
• Adaptive Echo Cancellation
• Beam forming
Wake Word Detection (local)
• Voice Activity Detection
• Fast response (0.1 second)
• High accuracy
Speech Recognition (cloud/local)
• IBM/Microsoft/Nuance/Google
• Alexa Voice Service
• Kaldi
• PocketSphinx
• HTK
• Command & Control
• Language Understanding
![Page 13: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/13.jpg)
Supported Platforms and Wrappers
• Raspberry Pi
• Mac OS X
• iPhone/iPad/iPod
• x86/64bit Ubuntu
• Android
• Pine 64
• Intel Edison
• Samsung Artik
• Allwinner R-series
• Ingenic X1000
• Rockchip
![Page 14: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/14.jpg)
Personal vs. Universal models

| | Personal | Universal |
|---|---|---|
| Voice samples needed | 3 | At least 1,500 |
| Speaker-independent | No | Yes |
| Speaker-specific | Sort of | No |
| Robust against noise | No | Yes |
| Free | Yes | No |
| Time needed | Immediately | 2 weeks |
![Page 15: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/15.jpg)
Customizing a universal model
define hotword → collect voice → train a model → deliver & evaluate → deploy to beta users → ship & success
Iterate & Improve: collect voice from device; hotword web API
Desired performance:
• >90% detection rate
• <= 3 false alarms in 24 hours
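The two performance targets can be checked with simple arithmetic once an evaluation run has been counted up. The sketch below is only an illustration: the names `EvalResult`, `evaluate`, and `meets_target` are hypothetical and not part of any KITT.AI tooling, and it assumes false alarms are counted on audio that contains no hotword and then normalized to a 24-hour period.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    detection_rate: float        # fraction of spoken hotwords that were detected
    false_alarms_per_24h: float  # false triggers normalized to a 24-hour period


def evaluate(num_positive_clips: int, num_detected: int,
             num_false_alarms: int, negative_audio_hours: float) -> EvalResult:
    """Turn raw counts from a test run into the two metrics on the slide."""
    if num_positive_clips <= 0 or negative_audio_hours <= 0:
        raise ValueError("need at least one positive clip and some negative audio")
    return EvalResult(
        detection_rate=num_detected / num_positive_clips,
        false_alarms_per_24h=num_false_alarms * 24.0 / negative_audio_hours,
    )


def meets_target(result: EvalResult,
                 min_detection_rate: float = 0.90,
                 max_false_alarms_per_24h: float = 3.0) -> bool:
    """Targets from the slide: >90% detection, <= 3 false alarms in 24 hours."""
    return (result.detection_rate > min_detection_rate
            and result.false_alarms_per_24h <= max_false_alarms_per_24h)
```

For instance, 463 detections out of 500 positive clips with 5 false alarms over 72 hours of negative audio gives a 92.6% detection rate and about 1.67 false alarms per 24 hours, which meets the target.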
![Page 16: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/16.jpg)
Science behind wake word
![Page 17: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/17.jpg)
Challenges
• High detection rate
• Low false alarm
• Efficient: detect every 0.1 second
• Small RAM: <2MB
• Too much ambiguity, not much context
Is this “Alexa”?
short window | longer window
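The "detect every 0.1 second" requirement amounts to scoring a sliding window of recent audio that advances in 0.1-second hops. The sketch below shows only that framing step. The 16 kHz sample rate and the 1-second window are assumptions chosen for illustration (the slides do not state the window length), and `sliding_windows` is a hypothetical helper, not a Snowboy function.

```python
import numpy as np

SAMPLE_RATE = 16000    # Hz; assumed
HOP_SECONDS = 0.1      # one detection decision every 0.1 second
WINDOW_SECONDS = 1.0   # hypothetical context length per decision


def sliding_windows(audio: np.ndarray,
                    sample_rate: int = SAMPLE_RATE,
                    window_seconds: float = WINDOW_SECONDS,
                    hop_seconds: float = HOP_SECONDS):
    """Yield (end_time_seconds, window) pairs, one per detection step."""
    window = int(round(window_seconds * sample_rate))
    hop = int(round(hop_seconds * sample_rate))
    for end in range(window, len(audio) + 1, hop):
        yield end / sample_rate, audio[end - window:end]
```

A longer window gives the detector more context to resolve ambiguity but costs more memory and compute per step. Under these assumptions one window of 16-bit audio is 32,000 bytes, a small part of a 2 MB budget.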
![Page 18: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/18.jpg)
Existing Algorithm
![Page 19: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/19.jpg)
Existing Algorithm
![Page 20: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/20.jpg)
Existing Algorithm
• Advantage:
– Simplified pipeline
– Simplified decoder
• Disadvantage:
– Massive hotword-specific training data
![Page 21: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/21.jpg)
Possible Ways to Improve
• Data augmentation
– Adding noise
– Adding reverberation
– And so on…
Figure panels: original | add noise | add noise and reverberation
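The two augmentations named above can be sketched with NumPy alone. This is a minimal illustration, not the authors' training pipeline: the function names are hypothetical, noise is mixed at a chosen signal-to-noise ratio, and reverberation is approximated by convolving with a synthetic, exponentially decaying impulse response rather than a measured room response.

```python
import numpy as np


def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into a clean clip at the requested signal-to-noise ratio (dB)."""
    noise = np.resize(noise, clean.shape)  # repeat or trim noise to the clip length
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def synthetic_impulse_response(sample_rate: int, rt60_seconds: float,
                               rng: np.random.Generator) -> np.ndarray:
    """Decaying noise as a stand-in for a measured room impulse response."""
    n = int(rt60_seconds * sample_rate)
    t = np.arange(n) / sample_rate
    decay = 10 ** (-3 * t / rt60_seconds)  # amplitude drops by 60 dB at rt60
    ir = rng.standard_normal(n) * decay
    ir[0] = 1.0                            # direct path
    return ir / np.sqrt(np.sum(ir ** 2))   # normalize to unit energy


def add_reverb(signal: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Convolve with an impulse response and keep the original clip length."""
    return np.convolve(signal, impulse_response)[: len(signal)]
```

Applying `add_noise` over several noise types and SNR values, optionally followed by `add_reverb`, turns each recorded hotword sample into many training variants.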
![Page 22: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/22.jpg)
Possible Ways to Improve
• Network models
– Model selection
• Feedforward models? Recurrent models?
– Model compression
• 32-bit float → 16-bit float → 8-bit integer
• Parameters with small absolute value
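Both compression ideas can be illustrated on a single weight matrix. The slides do not say which schemes were used, so the sketch below assumes symmetric per-tensor 8-bit quantization and reads the bullet about parameters with small absolute value as magnitude pruning, one common interpretation. All function names are hypothetical.

```python
import numpy as np


def quantize_int8(weights: np.ndarray):
    """Symmetric linear quantization of float weights to int8 plus one scale."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale


def prune_small(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Zero out the given fraction of weights with the smallest absolute value."""
    k = int(fraction * weights.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned
```

Storing weights as int8 uses a quarter of the memory of float32, and casting with `weights.astype(np.float16)` would halve it. Pruned weights save memory only when the zeros are stored in a sparse format.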
![Page 23: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/23.jpg)
Possible Ways to Improve
• Decoder redesigning
– Modeling smaller units
• Syllables, phones, etc.
– False alarm suppression
• Additional classifier?
![Page 24: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/24.jpg)
Training with Tesla K20/K80
• Positive data
– 1,500 hotword samples
• Negative data
– Thousands of hours of speech
• Training time
– Half a day with 4 K80 GPUs
![Page 25: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/25.jpg)
Software Architecture
Frontend | Backend
![Page 26: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/26.jpg)
KITT.AI Scientific Computing
Diagram labels: Deep Learning Cloud; Devices; Production Cloud; Traffic; ELB; Content; Websocket (audio, msg); HTTPs; Message Queue
Data → Training → Model → Deploy
![Page 27: Developing Your Own Wake Word Engine Just Like Alexa and ...on-demand.gputechconf.com/gtc/2017/presentation/s... · Developing Your Own Wake Word Engine Just Like “Alexa” and](https://reader035.vdocuments.site/reader035/viewer/2022071005/5fc2796594e2e566ff223897/html5/thumbnails/27.jpg)
Running Your First Snowboy Demo