aws re:invent 2016: tips and tricks on bringing alexa to your products (alx304)
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Matt Tavis, Principal Solutions Architect
Alexa Voice Service (@alexadevs)
December 2, 2016
Tips and Tricks on Bringing Alexa to Your Products
ALX304
What to Expect from the Session
• Key concepts for using the Alexa Voice Service
• Tips and Tricks for implementing an AVS client
• Considerations for evolving your solution
• Key components of a hands-free solution
Amazon Alexa Enabled
Open and extensible solution to add Alexa to any connected product, for free
Alexa Skills Kit (ASK)
Works With Alexa
Open APIs and tools that make it fast & easy to build skills for Alexa products.
Lives In The Cloud
Automated Speech Recognition (ASR)
Natural Language Understanding (NLU)
Always Getting Smarter (AI)
Alexa Voice Service (AVS)
The Alexa Ecosystem
Supported by two powerful frameworks that leverage open APIs
Devices
Intelligent Cloud Service
Optimized suite of on-device + cloud-based technologies and services that power a wide array of connected devices
[Diagram layer labels: ON-DEVICE COMPONENTS, DEVICE TYPES, AMAZON SPEECH OS, 3P CONTENT]
HW: Mic Arrays, Speaker, Notification LEDs, Mute Button, SoC/DSP
SW: Audio Player, AEC, Beamforming, State Machine, HTTP Manager, LWA Auth
Speech Primitives, Product Platform, Platform Services
ASR TTS NLU
State Mgr
Knowledge (Evi)
Model Training Analytics
Data Ingestion Auth Tools
Personalization
GUI Cards
Domains Services
VUI UX
Speech orchestrator
3P Skills: Uber, Dominos + 3000 more
Smart Home: Smart Things, Wink, Insteon, SmartHome API
Dialog Mgr
Skills
ASR NLU TTS
Learning
Alexa Voice Service – how it works
Your Product
Recent AVS Announcements
“Omate Rise 3G smartwatch slaps Amazon Alexa on your wrist” – Engadget 9/1/16
“Adding Alexa to the already-intriguing Pebble Core takes it from ‘Huh, that’s interesting’ to ‘Did we just catch up to Star Trek?’” – Forbes 6/3/16
“This smart watch puts Alexa on your wrist” – The Verge 4/20/16
“Amazon Alexa is now available on first device not made by Amazon” – TechCrunch 4/28/16
“Nucleus debuts first Alexa-enabled touchscreen video device” – Mashable 8/4/16
“Amazon Alexa support coming to LG's SmartThinQ hub” – Engadget 9/2/16
“Sonos Bringing Voice Control To Its Speakers With Amazon Partnership” – Forbes 8/30/16
“Beam me up, Alexa! Onyx communicator gets voice assistant integration” – CNET 9/14/16
Tip #1: Follow the Sample brick road
• Get started working almost anywhere
• PC, Linux, Mac, Raspberry Pi, CHIP, …
• NEW! – Includes hands-free implementation
• Application.log shows the proper message flow
• 3 Sample companion apps for linking and tokens
• Android, iOS, Web app
• First stop for all debugging!
https://github.com/alexa/alexa-avs-sample-app
Example AVS Client Architecture
[Diagram labels]
AVS Client: Connection Management, Messaging Layer, Controller, Audio Input (Mic), Audio Player, Alert Management, Wake Word Engine
Companion Apps: Web App, iOS, Android
Native Media Player, Native Timers and Alarms, Wake Word Process, Alexa model, GUI / Attention System
State Mgmt, Directive Queues, Event Dispatch, Audio Output, HTTP/2, AVS Control Logic
Legend: 3rd Party / Built-in, Custom dev / Sample
Interacting with AVS Cloud Service
• AVS is Amazon’s intelligent cloud service that allows you as a developer to voice-enable any connected product with a microphone and speaker
• API endpoint (https://avs-alexa-na.amazon.com)
• /events – for all speech, playback and alert events
• /directives – the source path of AVS directives (read-only)
• /ping – to keep connection open
• Message bus for all Events and Directives
• Response messages and a down channel
• State machine to determine how to handle messages
• Pause playback? Duck audio? Alert versus music?
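The "pause, duck, or keep playing" question can be sketched as a small arbitration function. This is a minimal sketch and not the sample app's implementation: the channel order follows the Dialog, Alerts, Content priority listed on the Audio Player State Machine slide, while `decide_action` and the pause-versus-duck policy are hypothetical choices made for illustration.

```python
from enum import IntEnum


class Channel(IntEnum):
    """Audio channels; a lower value means a higher priority (order from the deck)."""
    DIALOG = 1
    ALERTS = 2
    CONTENT = 3


def decide_action(active: Channel, incoming: Channel) -> str:
    """Say what happens to the active channel when another channel wants audio.

    Hypothetical policy: Content is paused while Dialog is active, anything
    else that loses priority is ducked, and an equal or lower priority
    request leaves the active channel untouched.
    """
    if incoming >= active:
        return "keep"
    if active is Channel.CONTENT and incoming is Channel.DIALOG:
        return "pause"
    return "duck"


print(decide_action(Channel.CONTENT, Channel.DIALOG))  # pause
print(decide_action(Channel.CONTENT, Channel.ALERTS))  # duck
print(decide_action(Channel.DIALOG, Channel.ALERTS))   # keep
```

Whether a given product pauses or ducks is a design decision; the functional design guidelines are the place to check the expected behavior.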
Tip #2: Take a Phased Approach
Port sample
• Re-platform (e.g., Java to C++)
• Swap 3rd party components (e.g., Jetty to OkHttp)
• Integrate with native components (e.g., Android MediaPlayer, local buttons)
Harden tap-to-talk solution
• Implement AVS functional design guidelines
• Define device monitoring and management
• Design an update and deployment process
• Perform functional validation of core features and music
Integrate hands-free
• Integrate hands-free components
• Test and tune hands-free performance
• Responsiveness
• Distance testing
• Testing with audio output
• Testing with ambient noise
AVS functional design guidelines:
https://developer.amazon.com/public/solutions/alexa/alexa-voice-service/content/alexa-voice-service-functional-design-guide
Tip #3: Love the logs – Events
21:22:55.064 [AWT-EventQueue-0] INFO com.amazon.alexa.avs.http.AVSClient - Request metadata:
{
  "event" : {
    "header" : {
      "namespace" : "SpeechRecognizer",
      "name" : "Recognize",
      "messageId" : "b15376c6-6265-451c-acee-bc5b9168af8e",
      "dialogRequestId" : "919336ea-25d5-43d9-8af8-61d1344fbcb5"
    },
    "payload" : {
      "profile" : "CLOSE_TALK",
      "format" : "AUDIO_L16_RATE_16000_CHANNELS_1"
    }
    …
}
Callouts: Thread Id, Event Name, Message Id
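The event metadata in the log above can be assembled with a few lines of code. This is a minimal sketch that reproduces only the fields visible in the excerpt (the elided part of the message is not reconstructed); `build_recognize_event` is a hypothetical helper name, and generating the identifiers with UUIDs is an assumption based on their format in the log.

```python
import json
import uuid


def build_recognize_event(dialog_request_id=None,
                          profile="CLOSE_TALK",
                          audio_format="AUDIO_L16_RATE_16000_CHANNELS_1"):
    """Build the JSON metadata for a SpeechRecognizer.Recognize event."""
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                # Unique per message; assumed to be a random UUID.
                "messageId": str(uuid.uuid4()),
                # Shared by every message that belongs to the same dialog turn.
                "dialogRequestId": dialog_request_id or str(uuid.uuid4()),
            },
            "payload": {"profile": profile, "format": audio_format},
        }
    }


event = build_recognize_event()
print(json.dumps(event, indent=2))
```

Logging the message exactly as it is sent, as the sample does, makes it easy to compare your client's output against the sample's application log.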
Tip #3: Love the logs – Directives
21:23:00.827 [RequestThread] INFO com.amazon.alexa.avs.http.AVSClient - x-amzn-requestid: 0e8aaffffee24de5-000017a1-0008f272-94b39f8f1fc8f82d-50d324c7-5-
21:23:00.926 [RequestThread] INFO com.amazon.alexa.avs.http.MessageParser - Response metadata:
{
  "directive" : {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "Speak",
      "messageId" : "65106c28-f005-4f5a-87d5-f38ccaa58e0a",
      "dialogRequestId" : "919336ea-25d5-43d9-8af8-61d1344fbcb5"
    },
    …
}
Callouts: Request Id, Event Name, Message Id
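When debugging, it helps to pull the timestamp, thread, and logger out of each log line and to match a directive to the event that triggered it. In the two excerpts, the Speak directive carries the same dialogRequestId as the Recognize event. The sketch below assumes the log layout shown in the excerpts; `parse_log_line` and `same_dialog` are hypothetical helper names.

```python
import re

# Layout assumed from the excerpts: time [thread] LEVEL logger - message
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<thread>[^\]]+)\] "
    r"(?P<level>[A-Z]+) +(?P<logger>[\w.]+) ?- ?(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def same_dialog(event, directive):
    """True when a directive answers the given event (same dialogRequestId)."""
    sent = event["event"]["header"].get("dialogRequestId")
    received = directive["directive"]["header"].get("dialogRequestId")
    return sent is not None and sent == received


line = ("21:23:00.926 [RequestThread] INFO "
        "com.amazon.alexa.avs.http.MessageParser - Response metadata:")
print(parse_log_line(line)["thread"])  # RequestThread
```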
Complex Sequences - Multi-turn
Components: Microphone, AVS Controller, AudioPlayer (audio is passed as PCM)
User: “Alexa, set a timer.”
Recognize event
Speak directive
Alexa: “For how long?”
ExpectSpeech directive
SpeechStarted event
SpeechFinished event
User: “10 minutes.”
Recognize event
Alexa: “10 minutes starting now.”
…
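The multi-turn exchange above can be traced with a toy controller that records the events it would send. This is a sketch of the message order only, with no audio or networking; the `DialogController` class and its method names are hypothetical.

```python
from collections import deque


class DialogController:
    """Toy controller that records the events it would send to AVS."""

    def __init__(self):
        self.sent_events = []
        self.mic_open = False

    def send_recognize(self):
        # In a real client this event carries the captured PCM audio.
        self.sent_events.append("Recognize")
        self.mic_open = False

    def handle_directives(self, directive_names):
        queue = deque(directive_names)
        while queue:
            name = queue.popleft()
            if name == "Speak":
                self.sent_events.append("SpeechStarted")
                # ... play the synthesized audio here ...
                self.sent_events.append("SpeechFinished")
            elif name == "ExpectSpeech":
                # Re-open the microphone without a new wake word or button press.
                self.mic_open = True
            # Any other directive is ignored in this sketch.


controller = DialogController()
controller.send_recognize()                              # "Alexa, set a timer."
controller.handle_directives(["Speak", "ExpectSpeech"])  # "For how long?"
if controller.mic_open:
    controller.send_recognize()                          # "10 minutes."
print(controller.sent_events)
# ['Recognize', 'SpeechStarted', 'SpeechFinished', 'Recognize']
```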
Complex Sequences - Setting an Alarm
Components: AVS Controller, AudioPlayer, Alert Manager, Alert Store (local management)
User: “Alexa, set a timer for 10 minutes.”
Recognize event
Speak directive
Alexa: “10 minutes starting now.” (PCM)
SpeechStarted event
SpeechFinished event
SetAlert directive
SetAlertSucceeded event
Time passes…
AlertStarted event
AlertEnteredForeground event
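Because the alert is stored and fired locally, the alert manager can be sketched without any cloud interaction beyond the events it reports. This is a simplified sketch: the `AlertManager` class, the `token` identifier, the integer clock, and the assumption that the alert always takes the foreground are illustrative, not the sample app's code.

```python
class AlertManager:
    """Stores alerts locally and reports the events named in the sequence above."""

    def __init__(self, send_event):
        self._send_event = send_event
        self._store = {}  # token -> scheduled time, in seconds on a local clock

    def on_set_alert(self, token, scheduled_time):
        """Handle a SetAlert directive: persist the alert, then confirm it."""
        self._store[token] = scheduled_time
        self._send_event("SetAlertSucceeded", token)

    def tick(self, now):
        """Fire every stored alert whose time has come."""
        due = sorted(t for t, when in self._store.items() if when <= now)
        for token in due:
            del self._store[token]
            self._send_event("AlertStarted", token)
            # Assumes nothing with higher priority is playing.
            self._send_event("AlertEnteredForeground", token)


events = []
manager = AlertManager(lambda name, token: events.append((name, token)))
manager.on_set_alert("timer-1", scheduled_time=600)
manager.tick(now=599)  # nothing is due yet
manager.tick(now=600)  # the timer fires
print(events)
```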
Complex Sequences – Music Playback
Components: AVS Controller, AudioPlayer
User: “Alexa, play classical music.”
Alexa: “Playing classical music from Amazon Music.”
Play directive
PlaybackStarted event
ProgressReportDelayElapsed event
ProgressReportIntervalElapsed event
PlaybackNearlyFinished event
ProgressReportIntervalElapsed event
…
PlaybackFinished event
Play directive…
Tip #4: Music comes in many formats
- Common formats
- Need support for all current codecs
- Need to handle playlists as well
Format            Services
AAC/MP4           Amazon Music, iHeartRadio, TuneIn
MP3               Amazon Music, TuneIn
HLS               Amazon Music, iHeartRadio, TuneIn, Audible
PLS               iHeartRadio, TuneIn
M3U               TuneIn, Amazon Music
Shoutcast / ICY   iHeartRadio, TuneIn
ID3 Tags          iHeartRadio, TuneIn
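Handling playlists means resolving a PLS or M3U body to the stream URLs inside it before handing them to the player. The sketch below covers only the simple case of absolute http(s) entries; it does not handle HLS segment playlists or relative paths, and `parse_playlist` is a hypothetical helper name.

```python
def parse_playlist(text):
    """Return the absolute stream URLs found in an M3U or PLS playlist body."""
    urls = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, M3U comments/tags ("#...") and the PLS "[playlist]" header.
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep and key.lower().startswith("file") and key[4:].isdigit():
            urls.append(value.strip())  # PLS entry, e.g. File1=http://...
        elif line.lower().startswith(("http://", "https://")):
            urls.append(line)           # M3U entry: the line is the URL
    return urls


m3u = "#EXTM3U\n#EXTINF:-1,Example\nhttp://example.com/stream?id=3\n"
pls = ("[playlist]\nNumberOfEntries=1\nFile1=http://example.com/live.mp3\n"
       "Title1=Example\nLength1=-1\nVersion=2\n")
print(parse_playlist(m3u))  # ['http://example.com/stream?id=3']
print(parse_playlist(pls))  # ['http://example.com/live.mp3']
```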
Audio Player State Machine
States: Idle, Playing, Paused, Buffer Underrun, Stopped, Finished
Audio Player State Machine – Playing
Playback initiated via voice or companion app.
- Directive: Play
- Events: PlaybackStarted, Progress events
Superseded by other channels:
1. Dialog
2. Alerts
3. Content
Next Play directive comes after PlaybackNearlyFinished event.
Audio Player State Machine – Paused
Playback paused by user action or other channels.
- Directive: none
- Events: PlaybackPaused, PlaybackResumed (back to Playing)
Audio Player State Machine – Stopped
Playback stopped via voice command or companion app.
- Directive: Stop or ClearQueue.CLEAR_ALL
- Events: PlaybackStopped
Playback continues with a Play directive.
Audio Player State Machine – Finished
Playback reaches end of content.
- Directive: none
- Events: PlaybackFinished
Playback ends when no Play directives follow PlaybackNearlyFinished/PlaybackFinished events.
Playback continues with a new Play directive.
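The states and events on these slides fit naturally into a transition table. This is a minimal sketch rather than the sample app's player: the state and event names come from the slides, except for the two buffer triggers (`BufferUnderrun`, `BufferRefilled`), which are placeholder names because the slides do not name them.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    PLAYING = "Playing"
    PAUSED = "Paused"
    BUFFER_UNDERRUN = "Buffer Underrun"
    STOPPED = "Stopped"
    FINISHED = "Finished"


# (current state, trigger) -> next state
TRANSITIONS = {
    (State.IDLE, "PlaybackStarted"): State.PLAYING,
    (State.STOPPED, "PlaybackStarted"): State.PLAYING,
    (State.FINISHED, "PlaybackStarted"): State.PLAYING,
    (State.PLAYING, "PlaybackPaused"): State.PAUSED,
    (State.PAUSED, "PlaybackResumed"): State.PLAYING,
    (State.PLAYING, "PlaybackStopped"): State.STOPPED,
    (State.PAUSED, "PlaybackStopped"): State.STOPPED,
    (State.PLAYING, "PlaybackFinished"): State.FINISHED,
    # Placeholder trigger names; the slides show the state but not its events.
    (State.PLAYING, "BufferUnderrun"): State.BUFFER_UNDERRUN,
    (State.BUFFER_UNDERRUN, "BufferRefilled"): State.PLAYING,
}


def step(state, trigger):
    """Return the next state; triggers that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, trigger), state)


state = State.IDLE
for trigger in ["PlaybackStarted", "PlaybackPaused", "PlaybackResumed",
                "PlaybackFinished"]:
    state = step(state, trigger)
print(state.value)  # Finished
```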
Tip #5: Design for the Future
• Events and Directives
• Directives can come in at any time – don’t assume order
• New directives and events can be added at any time – drop unknown directives on the floor
• Message Formats
• New elements should be able to be added to JSON formats at any time
• Software Updating
• All AVS devices should have an OTA update mechanism
• Updates should not “brick” the device and should support fallback
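A forward-compatible dispatcher follows directly from these points: look handlers up by namespace and name, silently drop anything unrecognized, and read only the JSON fields you need so that new elements do no harm. The sketch below is an assumption-laden illustration; `HANDLERS`, `handles`, and `dispatch` are hypothetical names, and the `Dance` directive in the usage lines is invented to stand in for a future, unknown directive.

```python
import json

HANDLERS = {}  # (namespace, name) -> handler function


def handles(namespace, name):
    """Decorator that registers a handler for one directive type."""
    def register(func):
        HANDLERS[(namespace, name)] = func
        return func
    return register


@handles("SpeechSynthesizer", "Speak")
def on_speak(payload):
    return "speak"


def dispatch(raw_message):
    """Route one directive; unknown directives are dropped on the floor."""
    message = json.loads(raw_message)
    directive = message.get("directive", {})
    header = directive.get("header", {})
    handler = HANDLERS.get((header.get("namespace"), header.get("name")))
    if handler is None:
        return None  # unknown directive: ignore it instead of failing
    # Extra, unexpected fields in the message are simply never read.
    return handler(directive.get("payload", {}))


known = ('{"directive": {"header": {"namespace": "SpeechSynthesizer", '
         '"name": "Speak", "someNewField": 1}, "payload": {}}}')
unknown = '{"directive": {"header": {"namespace": "Future", "name": "Dance"}}}'
print(dispatch(known))    # speak
print(dispatch(unknown))  # None
```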
Hands-free Requires Hands-on
• Building a hands-free experience requires sourcing multiple components and libraries
• Plan months (>3) in advance for tuning of a hands-free solution
• No all-in-one offerings today but multiple solutions to consider
• Wake word spotter:
• Front-end hardware:
• Audio libraries:
Hands-free Front End Architecture
Mic Array: One or more input microphones (SNR >= 65dB, Sensitivity: -38dB ±1dB @ 94dB SPL)
Echo Cancellation: Hardware (DSP) or software solution to subtract device audio output from mic input
Wake Word Spotter: Software process and library trained to “spot” the Alexa wake word from an audio buffer
Beamforming (only for multiple mics): Decision making library to pick the best quality mic for capturing user utterance
Noise Reduction: Optional component to further reduce ambient noise and tune audio for an ASR
All of these components need to be sourced for your solution from 3rd party offerings or developed by hand.
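To make the "pick the best quality mic" step concrete, here is a deliberately crude stand-in that selects the channel with the highest RMS energy. It is not a beamformer and not what a production front end does (real libraries estimate direction and signal quality rather than loudness); `rms` and `pick_best_mic` are hypothetical names used only for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of one channel's samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_best_mic(channels):
    """Return the index of the channel with the highest RMS energy.

    Loudness is only a rough proxy for quality: a mic next to a noise
    source would win here, which is why real solutions need tuning.
    """
    if not channels:
        raise ValueError("at least one microphone channel is required")
    return max(range(len(channels)), key=lambda i: rms(channels[i]))


mics = [
    [0.01, -0.02, 0.01, -0.01],  # far from the speaker
    [0.30, -0.25, 0.28, -0.31],  # closest to the speaker
    [0.10, -0.12, 0.09, -0.11],
]
print(pick_best_mic(mics))  # 1
```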
Miles Kingston, General Manager
Smart Home Group
Intel and Amazon Partner To Enable Natural Voice Interaction For Consumers
Amazon + Intel
CLOUD & DATA CENTER: Amazon EC2, Amazon S3
THINGS & DEVICES: AWS IoT, Alexa Voice Service
• 10+ year partnership
• Joint development
• Shared customer passion
• High performance + low costs
• World class supply chain
Did You Know?
• Collateral & SW/HW Dev Kits
• Standards Influence
• Form Factor Reference Design
• Innovation Excellence Program
• ODM Reference System
• Ethnographic Research
Enabling Personal Assistance
Design
Speech
Context
Voice
Audio
Intel’s Solid Voice & Speech Expertise
• Support for multiple designs and form factors
• Broad set of voice processing components
• Low power, highly optimized noise reduction
• High quality tuning & configuration tools
• Audio labs fully synchronized with leading partners
Enriching Daily Life with the Personal Experience and Simple, Natural Interaction of Voice
Intel and Amazon are Collaborating to Extend Natural Voice Interaction For Consumers
Call to Action
• Download the Sample from GitHub – build out a Raspberry Pi! (~2 hours)
• Start your new product today…
https://github.com/alexa/alexa-avs-sample-app
Port sample
Harden tap-to-talk solution
Integrate hands-free
Other Alexa Sessions
Wednesday
11am  ALX203: Workshop: Creating Voice Experiences with Alexa Skills: From Idea to Testing in Two Hours (Mirage, Jamaica B)
1pm  ALX306: State of the Union: Amazon Alexa and Recent Advances in Conversational AI (Venetian, Level 2, Sands Showroom)
11:30am and 2:30pm  ALX204: Workshop: Build an Alexa-Enabled Product with Raspberry Pi (Mirage, Antigua B)
5pm  ALX301: Alexa in the Enterprise: How JPL Leverages Alexa to Further Space Exploration with Internet of Things (Venetian, Level 2, Venetian B)
Thursday
11:30am  ALX202: How Amazon is enabling the future of Automotive (Venetian, Level 3, Lido 3003)
1pm  ALX303: Building a Smarter Home with Alexa (Venetian, Level 3, Murano 3203)
3:30pm  ALX307: Voice-enabling Your Home and Devices with Amazon Alexa and AWS IoT (Venetian, Level 2, Opaline Theatre)
5pm  ALX302: Build a Serverless Back End for Your Alexa-Based Voice Interactions (Venetian, Level 2, Opaline Theatre)
Friday
9:30am  ALX304: Tips and Tricks on Bringing Alexa to Your Products (Venetian, Level 1, Marco Polo 806)
11am  ALX305: From VUI to QA: Building a Voice-Based Adventure Game for Alexa (Venetian, Level 1, Marco Polo 806)
Thank you!
Remember to complete your evaluations!