aws re:invent 2016: tips and tricks on bringing alexa to your products (alx304)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Matt Tavis, Principal Solutions Architect

Alexa Voice Service (@alexadevs)

December 2, 2016

Tips and Tricks on Bringing Alexa to Your Products

ALX304

What to Expect from the Session

• Key concepts for using the Alexa Voice Service

• Tips and Tricks for implementing an AVS client

• Considerations for evolving your solution

• Key components of a hands-free solution

The Alexa Ecosystem

Supported by two powerful frameworks that leverage open APIs

• Alexa Voice Service (AVS): Amazon Alexa Enabled

• Open and extensible solution to add Alexa to any connected product, for free

• Alexa Skills Kit (ASK): Works With Alexa

• Open APIs and tools that make it fast & easy to build skills for Alexa products

• Lives In The Cloud

• Automated Speech Recognition (ASR)

• Natural Language Understanding (NLU)

• Always Getting Smarter (AI)

Devices

Intelligent Cloud Service

Optimized suite of on-device + cloud-based technologies and services that power a wide array of connected devices

• On-device components (HW): Mic Arrays, Speaker, Notification LEDs, Mute Button, SoC/DSP

• On-device components (SW): Audio Player, AEC, Beamforming, State Machine, HTTP Manager, LWA Auth

• Device types

• Amazon Speech OS: Speech Primitives, Product Platform, Platform Services

• ASR, TTS, NLU, State Mgr, Knowledge (Evi), Model Training, Analytics, Data Ingestion, Auth, Tools, Personalization, GUI Cards, Domains, Services, VUI UX, Speech orchestrator, Dialog Mgr, Skills, Learning

• 3P content: 3P Skills (Uber, Dominos + 3000 more), Smart Home (SmartThings, Wink, Insteon, SmartHome API)

Alexa Voice Service – how it works

Your Product

Recent AVS Announcements

“Omate Rise 3G smartwatch slaps Amazon Alexa on your wrist” – Engadget 9/1/16

“Adding Alexa to the already-intriguing Pebble Core takes it from ‘Huh, that’s interesting’ to ‘Did we just catch up to Star Trek?’” – Forbes 6/3/16

“This smart watch puts Alexa on your wrist” – The Verge 4/20/16

“Amazon Alexa is now available on first device not made by Amazon” – TechCrunch 4/28/16

“Nucleus debuts first Alexa-enabled touchscreen video device” – Mashable 8/4/16

“Amazon Alexa support coming to LG's SmartThinQ hub” – Engadget 9/2/16

“Sonos Bringing Voice Control To Its Speakers With Amazon Partnership” – Forbes 8/30/16

“Beam me up, Alexa! Onyx communicator gets voice assistant integration” – CNET 9/14/16

Tip #1: Follow the Sample brick road

• Get started working almost anywhere

• PC, Linux, Mac, Raspberry Pi, CHIP, …

• NEW! – Includes hands-free implementation

• Application.log shows the proper message flow

• 3 Sample companion apps for linking and tokens

• Android, iOS, Web app

• First stop for all debugging!

https://github.com/alexa/alexa-avs-sample-app


Example AVS Client Architecture

• AVS Client: Connection Management, Messaging Layer, Controller, Audio Input (Mic), Audio Player, Alert Management, Wake Word Engine, GUI / Attention System

• Companion Apps: Web App, iOS, Android

• Other blocks in the diagram: HTTP/2, Directive Queues, Event Dispatch, State Mgmt, AVS Control Logic, Audio Output, Native Media Player, Native Timers and Alarms, Wake Word Process, Alexa model

• Legend: 3rd Party / Built-in; Custom dev / Sample

Interacting with AVS Cloud Service

• AVS is Amazon’s intelligent cloud service that allows you as a developer to voice-enable any connected product with a microphone and speaker

• API endpoint (https://avs-alexa-na.amazon.com)

• /events – for all speech, playback and alert events

• /directives – the source path of AVS directives (read-only)

• /ping – to keep connection open

• Message bus for all Events and Directives

• Response messages and a down channel

• State machine to determine how to handle messages

• Pause playback? Duck audio? Alert versus music?
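
The three paths listed above can be written down as a small route table. This is a minimal sketch, not the sample app's code: the HTTP methods are assumptions (the slide lists only the paths), and `Route`, `ROUTES`, `url_for` and the `prefix` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

BASE_URL = "https://avs-alexa-na.amazon.com"


@dataclass(frozen=True)
class Route:
    method: str   # assumed HTTP method; the slide gives only the path
    path: str
    purpose: str


ROUTES = {
    "events": Route("POST", "/events", "speech, playback and alert events"),
    "directives": Route("GET", "/directives", "read-only source of directives"),
    "ping": Route("GET", "/ping", "keep the connection open"),
}


def url_for(name: str, prefix: str = "") -> str:
    """Build the full URL for a route.

    `prefix` is a hypothetical hook in case the service expects
    something in front of the path; it defaults to nothing.
    """
    return f"{BASE_URL}{prefix}{ROUTES[name].path}"
```

Keeping the routes in one table means connection management can open the `/directives` down channel, post to `/events` and schedule `/ping` from a single place.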


Tip #2: Take a Phased Approach

Port sample

• Re-platform (e.g., Java to C++)

• Swap 3rd party components (e.g., Jetty to OkHttp)

• Integrate with native components (e.g., Android MediaPlayer, local buttons)

Harden tap-to-talk solution

• Implement AVS functional design guidelines

• Define device monitoring and management

• Design an update and deployment process

• Perform functional validation of core features and music

Integrate hands-free

• Integrate hands-free components

• Test and tune hands-free performance

• Responsiveness

• Distance testing

• Testing with audio output

• Testing with ambient noise

AVS functional design guidelines:

https://developer.amazon.com/public/solutions/alexa/alexa-voice-service/content/alexa-voice-service-functional-design-guide


Tip #3: Love the logs – Events

21:22:55.064 [AWT-EventQueue-0] INFO com.amazon.alexa.avs.http.AVSClient - Request metadata:
{
  "event" : {
    "header" : {
      "namespace" : "SpeechRecognizer",
      "name" : "Recognize",
      "messageId" : "b15376c6-6265-451c-acee-bc5b9168af8e",
      "dialogRequestId" : "919336ea-25d5-43d9-8af8-61d1344fbcb5"
    },
    "payload" : {
      "profile" : "CLOSE_TALK",
      "format" : "AUDIO_L16_RATE_16000_CHANNELS_1"
    }
    …
}

Thread Id

Event Name

Message Id
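
An envelope shaped like the logged Recognize event could be built as follows. This is a sketch under stated assumptions: `build_recognize_event` is a hypothetical helper, the IDs are assumed to be random UUIDs (as they appear to be in the log), and whatever the log elides with "…" is left out here as well.

```python
import json
import uuid


def build_recognize_event(dialog_request_id=None):
    """Return a dict shaped like the Recognize event in the log above."""
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                # Assumption: a fresh random UUID per message.
                "messageId": str(uuid.uuid4()),
                # Reuse the caller's ID so later messages can be tied
                # to the same dialog; otherwise start a new one.
                "dialogRequestId": dialog_request_id or str(uuid.uuid4()),
            },
            "payload": {
                "profile": "CLOSE_TALK",
                "format": "AUDIO_L16_RATE_16000_CHANNELS_1",
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_recognize_event(), indent=2))
```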

Tip #3: Love the logs - Directives

21:23:00.827 [RequestThread] INFO com.amazon.alexa.avs.http.AVSClient - x-amzn-requestid: 0e8aaffffee24de5-000017a1-0008f272-94b39f8f1fc8f82d-50d324c7-5-

21:23:00.926 [RequestThread] INFO com.amazon.alexa.avs.http.MessageParser - Response metadata:
{
  "directive" : {
    "header" : {
      "namespace" : "SpeechSynthesizer",
      "name" : "Speak",
      "messageId" : "65106c28-f005-4f5a-87d5-f38ccaa58e0a",
      "dialogRequestId" : "919336ea-25d5-43d9-8af8-61d1344fbcb5"
    },
    …
}

Request Id

Event Name

Message Id
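
The Recognize event and the Speak directive in the two log excerpts carry the same dialogRequestId, which is what makes the logs easy to follow. A debugging helper that groups parsed messages by that ID might look like this; `header_of` and `group_by_dialog` are hypothetical names, and the sample data is copied from the logs above with elided fields omitted.

```python
from collections import defaultdict


def header_of(message):
    """Return (kind, header) for an event or directive envelope."""
    for kind in ("event", "directive"):
        if kind in message:
            return kind, message[kind].get("header", {})
    return None, {}


def group_by_dialog(messages):
    """Map each dialogRequestId to the messages that carry it, in order."""
    groups = defaultdict(list)
    for message in messages:
        kind, header = header_of(message)
        dialog_id = header.get("dialogRequestId")
        if kind and dialog_id:
            label = f"{kind}:{header.get('namespace')}.{header.get('name')}"
            groups[dialog_id].append(label)
    return dict(groups)


LOGGED = [
    {"event": {"header": {
        "namespace": "SpeechRecognizer", "name": "Recognize",
        "messageId": "b15376c6-6265-451c-acee-bc5b9168af8e",
        "dialogRequestId": "919336ea-25d5-43d9-8af8-61d1344fbcb5"}}},
    {"directive": {"header": {
        "namespace": "SpeechSynthesizer", "name": "Speak",
        "messageId": "65106c28-f005-4f5a-87d5-f38ccaa58e0a",
        "dialogRequestId": "919336ea-25d5-43d9-8af8-61d1344fbcb5"}}},
]
```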

Complex Sequences - Multi-turn

Participants: AVS Controller, AudioPlayer, Microphone

Audio: PCM

User: “Alexa, set a timer.”

• Recognize event

• Speak directive: “For how long?”

• ExpectSpeech directive

• SpeechStarted event

• SpeechFinished event

User: “10 minutes.”

• Recognize event

Alexa: “10 minutes starting now.”

…
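
The multi-turn flow above can be sketched as a loop over the directives of one response. This is an illustrative simulation only: `run_turn`, `play_speech` and `capture_audio` are hypothetical names, directives are reduced to their names, and the ordering (finish speaking, then reopen the microphone) is an assumption drawn from the sequence shown.

```python
def run_turn(directives, play_speech, capture_audio):
    """Handle one response's directives in order; return the events sent."""
    sent = []
    for name in directives:
        if name == "Speak":
            sent.append("SpeechStarted")
            play_speech()            # blocks until the prompt has played
            sent.append("SpeechFinished")
        elif name == "ExpectSpeech":
            capture_audio()          # reopen the mic without a wake word
            sent.append("Recognize")
        # Any other directive name is ignored in this sketch.
    return sent
```

For the timer example, `run_turn(["Speak", "ExpectSpeech"], ...)` plays “For how long?” and then captures the follow-up utterance before sending the second Recognize event.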

Complex Sequences - Setting an Alarm

Participants: AVS Controller, AudioPlayer, Alert Manager (Alert Store, local management)

Audio: PCM

User: “Alexa, set a timer for 10 minutes.”

• Recognize event

• Speak directive: “10 minutes starting now.”

• SpeechStarted event

• SpeechFinished event

• SetAlert directive

• SetAlertSucceeded event

Time passes…

• AlertStarted event

• AlertEnteredForeground event
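
The local alert handling in this sequence can be sketched as a small manager around an alert store. The code below is an assumption-laden illustration: `AlertManager`, `handle_set_alert` and `tick` are hypothetical names, times are plain seconds, and the store lives in memory (a real device would presumably persist it so alerts survive a restart).

```python
import heapq


class AlertManager:
    """Keeps alerts locally and reports their lifecycle as events."""

    def __init__(self, send_event):
        self._send_event = send_event   # callable(event_name, token)
        self._store = []                # min-heap of (fire_time, token)

    def handle_set_alert(self, token, fire_time):
        """Store the alert from a SetAlert directive and confirm it."""
        heapq.heappush(self._store, (fire_time, token))
        self._send_event("SetAlertSucceeded", token)

    def tick(self, now):
        """Fire every stored alert whose time has come."""
        while self._store and self._store[0][0] <= now:
            _, token = heapq.heappop(self._store)
            self._send_event("AlertStarted", token)
            self._send_event("AlertEnteredForeground", token)
```

Because the alert is stored and fired on the device, it can still go off when the connection to the service is down.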

Complex Sequences – Music Playback

Participants: AVS Controller, AudioPlayer

User: “Alexa, play classical music.”

Alexa: “Playing classical music from Amazon Music.”

• Play directive

• PlaybackStarted event

• ProgressReportDelayElapsed event

• ProgressReportIntervalElapsed event

• PlaybackNearlyFinished event

• ProgressReportIntervalElapsed event

• …

• PlaybackFinished event

• Play directive…
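
The two progress report events in this sequence can be scheduled from a delay and an interval. The sketch below assumes both values arrive with the Play directive in milliseconds; `progress_report_offsets` and its parameter names are hypothetical.

```python
def progress_report_offsets(track_ms, delay_ms=None, interval_ms=None):
    """Return sorted (offset_ms, event_name) pairs due within one track."""
    due = []
    # One-shot report after the requested delay, if it falls inside the track.
    if delay_ms is not None and 0 <= delay_ms <= track_ms:
        due.append((delay_ms, "ProgressReportDelayElapsed"))
    # Repeating report at every multiple of the interval.
    if interval_ms is not None and interval_ms > 0:
        offset = interval_ms
        while offset <= track_ms:
            due.append((offset, "ProgressReportIntervalElapsed"))
            offset += interval_ms
    return sorted(due)
```

For a 10-second track with a 2-second delay and a 4-second interval, this yields one delay report at 2000 ms and interval reports at 4000 ms and 8000 ms.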

Tip #4: Music comes in many formats

- Common formats

- Need support for all current codecs

- Need to handle playlists as well

Format: Providers

AAC/MP4: Amazon Music, iHeartRadio, TuneIn

MP3: Amazon Music, TuneIn

HLS: Amazon Music, iHeartRadio, TuneIn, Audible

PLS: iHeartRadio, TuneIn

M3U: TuneIn, Amazon Music

Shoutcast / ICY: iHeartRadio, TuneIn

ID3 Tags: iHeartRadio, TuneIn
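
Handling playlists means resolving a PLS or M3U document to stream URLs before handing anything to the media player. A minimal sketch using only the standard library follows; the function names are hypothetical, the example URLs are placeholders, and real-world playlists (redirects, nested playlists, odd encodings) need more care than this.

```python
import configparser


def parse_m3u(text):
    """Return the entries of an M3U playlist, skipping '#' lines."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def parse_pls(text):
    """Return the FileN entries of a PLS playlist in numeric order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    entries = []
    # configparser lowercases option names, so File1 arrives as file1.
    for key, value in parser["playlist"].items():
        if key.startswith("file") and key[4:].isdigit():
            entries.append((int(key[4:]), value))
    return [url for _, url in sorted(entries)]


def stream_urls(text):
    """Pick a parser based on how the playlist text starts."""
    if text.lstrip().lower().startswith("[playlist]"):
        return parse_pls(text)
    return parse_m3u(text)
```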

Audio Player State Machine

States: Idle, Playing, Paused, Buffer Underrun, Stopped, Finished


Playing

Playback initiated via voice or companion app.

- Directive: Play

- Events: PlaybackStarted, Progress events

Superseded by other channels:

1. Dialog

2. Alerts

3. Content

Next Play directive comes after PlaybackNearlyFinished event.
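
The channel ordering listed above (Dialog, then Alerts, then Content) can be sketched as a simple priority lookup. The function names are hypothetical, and whether a superseded channel pauses or ducks is left to the device's own policy.

```python
# Highest priority first, in the order listed above.
PRIORITY = ("Dialog", "Alerts", "Content")


def foreground_channel(active):
    """Return the highest-priority active channel, or None if all are idle."""
    for name in PRIORITY:
        if name in active:
            return name
    return None


def content_in_foreground(active):
    """Music plays at full attention only when nothing outranks Content."""
    return foreground_channel(active) == "Content"
```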


Paused

Playback paused by user action or other channels.

- Directive: none

- Events: PlaybackPaused, PlaybackResumed (back to Playing)


Stopped

Playback stopped via voice command or companion app.

- Directive: Stop or ClearQueue.CLEAR_ALL

- Events: PlaybackStopped

Playback continues with a Play directive.


Finished

Playback reaches end of content.

- Directive: none

- Events: PlaybackFinished

Playback ends when no Play directives follow PlaybackNearlyFinished/PlaybackFinished events.

Playback continues with a new Play directive.
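
The player states and the events that move between them can be captured in a transition table. This sketch is one reading of the slides, not the sample app's implementation: `State`, `TRANSITIONS` and `next_state` are hypothetical names, and Buffer Underrun transitions are omitted because the slides do not describe them.

```python
from enum import Enum


class State(Enum):
    IDLE = "Idle"
    PLAYING = "Playing"
    PAUSED = "Paused"
    BUFFER_UNDERRUN = "Buffer Underrun"
    STOPPED = "Stopped"
    FINISHED = "Finished"


# (current state, event sent) -> next state
TRANSITIONS = {
    (State.IDLE, "PlaybackStarted"): State.PLAYING,
    (State.STOPPED, "PlaybackStarted"): State.PLAYING,
    (State.FINISHED, "PlaybackStarted"): State.PLAYING,
    (State.PLAYING, "PlaybackPaused"): State.PAUSED,
    (State.PAUSED, "PlaybackResumed"): State.PLAYING,
    (State.PLAYING, "PlaybackStopped"): State.STOPPED,
    (State.PLAYING, "PlaybackFinished"): State.FINISHED,
}


def next_state(state, event):
    """Return the state after `event`, or raise if the move is not listed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event} is not expected in state {state.value}") from None
```

Rejecting unlisted transitions loudly makes ordering bugs show up in the logs rather than as silent playback glitches.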

Tip #5: Design for the Future

• Events and Directives

• Directives can come in at any time – don’t assume order

• New directives and events can be added at any time – drop unknown directives on the floor

• Message Formats

• New elements should be able to be added to JSON formats at any time

• Software Updating

• All AVS devices should have an OTA update mechanism

• Updates should not “brick” the device and should support fallback
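
Both message-handling rules above (drop unknown directives, tolerate new JSON elements) can be sketched as a dispatcher that looks up only the fields it needs. The handler functions and their return values here are hypothetical placeholders.

```python
import json


def handle_speak(payload):
    return "speak"


def handle_play(payload):
    return "play"


HANDLERS = {
    ("SpeechSynthesizer", "Speak"): handle_speak,
    ("AudioPlayer", "Play"): handle_play,
}


def dispatch(raw_message, handlers=HANDLERS):
    """Route one directive; return None when it is not recognized."""
    message = json.loads(raw_message)
    directive = message.get("directive", {})
    header = directive.get("header", {})
    # Read only the fields we need, so extra elements are simply ignored.
    handler = handlers.get((header.get("namespace"), header.get("name")))
    if handler is None:
        return None     # unknown directive: drop it on the floor
    return handler(directive.get("payload", {}))
```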

Hands-free Requires Hands-on

• Building a hands-free experience requires sourcing multiple components and libraries

• Plan months (>3) in advance for tuning of a hands-free solution

• No all-in-one offerings today but multiple solutions to consider

• Wake word spotter:

• Front-end hardware:

• Audio libraries:

Hands-free Front End Architecture

• Mic Array: One or more input microphones (SNR >= 65dB, Sensitivity: -38dB ±1dB @ 94dB SPL)

• Echo Cancellation: Hardware (DSP) or software solution to subtract device audio output from mic input

• Wake Word Spotter: Software process and library trained to “spot” the Alexa wake word from an audio buffer

• Beamforming (only for multiple mics): Decision making library to pick the best quality mic for capturing user utterance

• Noise Reduction: Optional component to further reduce ambient noise and tune audio for an ASR

All of these components need to be sourced or developed for your solution from 3rd party offerings or by hand.
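
The "pick the best quality mic" step can be illustrated with a toy selector. This is a deliberately crude stand-in, not a beamforming algorithm: it assumes each microphone delivers a frame of PCM samples as a list of integers and uses RMS level as a rough proxy for quality. `rms` and `pick_mic` are hypothetical names.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def pick_mic(frames):
    """Return the index of the frame with the highest RMS level.

    `frames` holds one frame per microphone and must not be empty.
    """
    return max(range(len(frames)), key=lambda index: rms(frames[index]))
```

A production front end would weigh direction, echo and noise estimates rather than raw level, which is part of why tuning takes months.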

Amazon + Intel

• Cloud & data center: Amazon EC2, Amazon S3

• Things & devices: AWS IoT, Alexa Voice Service

• 10+ year partnership

• Joint development

• Shared customer passion

• High performance + low costs

• World class supply chain

Did You Know?

• Collateral & SW/HW Dev Kits

• Standards Influence

• Form Factor Reference Design

• Innovation Excellence Program

• ODM Reference System

• Ethnographic Research

Enabling Personal Assistance

Design, Speech, Context, Voice, Audio

Intel’s Solid Voice & Speech Expertise

• Support for multiple designs and form factors

• Broad set of voice processing components

• Low power, highly optimized noise reduction

• High quality tuning & configuration tools

• Audio labs fully synchronized with leading partners

Enriching Daily Life with the Personal Experience and Simple, Natural Interaction of Voice

Intel and Amazon are Collaborating to Extend Natural Voice Interaction For Consumers

Call to Action

• Download the sample from GitHub and build it out on a Raspberry Pi (about 2 hours)

• Start your new product today…


Port sample

Harden tap-to-talk solution

Integrate hands-free


Other Alexa Sessions

Wednesday

• 11am, ALX203: Workshop: Creating Voice Experiences with Alexa Skills: From Idea to Testing in Two Hours (Mirage, Jamaica B)

• 11:30am and 2:30pm, ALX204: Workshop: Build an Alexa-Enabled Product with Raspberry Pi (Mirage, Antigua B)

• 1pm, ALX306: State of the Union: Amazon Alexa and Recent Advances in Conversational AI (Venetian, Level 2, Sands Showroom)

• 5pm, ALX301: Alexa in the Enterprise: How JPL Leverages Alexa to Further Space Exploration with Internet of Things (Venetian, Level 2, Venetian B)

Thursday

• 11:30am, ALX202: How Amazon is enabling the future of Automotive (Venetian, Level 3, Lido 3003)

• 1pm, ALX303: Building a Smarter Home with Alexa (Venetian, Level 3, Murano 3203)

• 3:30pm, ALX307: Voice-enabling Your Home and Devices with Amazon Alexa and AWS IoT (Venetian, Level 2, Opaline Theatre)

• 5pm, ALX302: Build a Serverless Back End for Your Alexa-Based Voice Interactions (Venetian, Level 2, Opaline Theatre)

Friday

• 9:30am, ALX304: Tips and Tricks on Bringing Alexa to Your Products (Venetian, Level 1, Marco Polo 806)

• 11am, ALX305: From VUI to QA: Building a Voice-Based Adventure Game for Alexa (Venetian, Level 1, Marco Polo 806)

Thank you!

Remember to complete your evaluations!
