The Rise of Voice Platforms - Comparing Voice-Related APIs

Comparing Voice-Related APIs, Christian Rebernik @crebernik7791


Post on 14-Feb-2017



Voice First Footprint

In 2017 there will be 33 million voice-first devices

● The Voice 2017 Report - VoiceLabs analysis combined with research from CIRP, KPCB and InfoScout

Voice adoption

The ‘Voice First’ era has already started

● Alexa in 4% of US households (end 2016)

● Siri handles over 2bn commands a week

● 20% of Google searches on Android handsets are input by voice

Alexa

Google Home

Ding Dong

Voice Devices

Creating an open ecosystem

Amazon Echo: Skills and Alexa Voice Service

Google Home: Google Assistant Actions

Speech Recognition API

Developing for Amazon Alexa

● Limited understanding: Amazon Echo is built for predefined options (e.g. no custom notes). The session ends after 8 sec.

● Predefined wake word defines the customer experience: Only 4 wake words are available, and one of them must be used in any conversation.

● No notifications and no presence: You can’t alert the user of an event. You cannot react to events such as the user arriving home (e.g. "welcome home").

● No audio / no identification: Anybody can use Alexa (guests, etc.) and access all information.
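The "predefined options" limitation can be illustrated with a minimal sketch of a custom-skill handler. It assumes the JSON request and response shape used by Alexa custom skills (`request.type`, `request.intent.name`, `outputSpeech`, `shouldEndSession`); the intent names and replies are hypothetical and only serve as an illustration.

```python
# Hypothetical intents for illustration; a real skill declares its own
# fixed set of intents up front, which is the "predefined options" limit.
PREDEFINED_REPLIES = {
    "LightsOnIntent": "Turning the lights on.",
    "LightsOffIntent": "Turning the lights off.",
}


def build_response(text, end_session):
    """Build a response dict in the Alexa custom-skill JSON shape."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event):
    """Answer a known intent, or ask again when the intent is not predefined."""
    request = event["request"]
    if request["type"] == "IntentRequest":
        intent_name = request["intent"]["name"]
        if intent_name in PREDEFINED_REPLIES:
            return build_response(PREDEFINED_REPLIES[intent_name], True)
    # Free-form input (e.g. a custom note) has no matching intent,
    # so the skill can only re-prompt and keep the session open.
    return build_response("Sorry, I can only switch the lights on or off.", False)
```

Anything the user says that does not map to one of the declared intents falls through to the re-prompt, which is why free-form features such as custom notes are hard to build this way.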

Technology Stack

Components enabling Voice User Interfaces

Applications: Implemented use cases leveraging the hardware and AI software.

AI Software: Software that interprets speech, enables conversations and provides a natural voice.

Hardware: Devices the consumer interacts with, like Amazon Echo or Google Home.

AI overview

120 companies in Speech Recognition

Ventures Scanner, Contact [email protected]

Speech Recognition API

Real-time speech-to-text APIs

| | Google (4) | IBM (3) | Microsoft (2) |
| Status | Beta | Beta/Production | Preview |
| Language support (1) | 43 (89) | 8 (14) | 6 (7) |
| Cost/min | 0,024 € (0,006 / 15 sec) | 0,02 € | 0,06 € (1000 calls of 15 sec each for $4) |
| Speaker detection | No | English (8 kHz) | No |
| Audio formats | FLAC, Linear16, MULAW, AMR, AMR_WB | FLAC, PCM, WAV, OGG, MULAW | PCM single channel, Siren, SirenSR |
| Noise friendly | Yes | Unknown | Unknown |
| Word hints | Yes | No | No |

(1) Languages supported (languages supported including dialects)
(2) Microsoft: https://www.microsoft.com/cognitive-services/en-us/speech-api
(3) IBM: http://www.ibm.com/watson/developercloud/speech-to-text.html
(4) Google: https://cloud.google.com/speech/
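The relation between a per-15-second price and the per-minute figure in the cost row can be checked with a short calculation. This is a minimal sketch: the function name is made up for illustration, and rounding every request up to a full 15-second increment is an assumption about how such pricing is billed, not something stated on the slide.

```python
import math
from decimal import Decimal


def cost_for_duration(duration_sec, price_per_increment, increment_sec=15):
    """Price of one request, assuming billing in whole increments (rounded up)."""
    increments = math.ceil(duration_sec / increment_sec)
    return increments * price_per_increment


# Price per 15 seconds taken from the Google column of the table.
google_price = Decimal("0.006")

per_minute = cost_for_duration(60, google_price)      # 4 increments
short_request = cost_for_duration(20, google_price)   # rounds up to 2 increments
```

Under this assumption a full minute costs four increments, which matches the 0,024 per minute in the table, while a 20-second request is billed as 30 seconds.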

Speech Recognition API

Best practices

● High audio capture quality: Use lossless encoding. Capture audio at 16,000 Hz or higher. Use the native sample rate.

● No additional noise processing: The APIs include noise reduction. Applying noise reduction twice can reduce quality. Echo and noise have a huge impact on speech recognition quality.

● User education: Educate users to stay close to the microphone.

● One speaker per stream: In multi-speaker settings, try to separate the audio streams, as the current APIs are built for dictation.

● Provide context: Context matters a lot. Provide word hints to help the system detect words correctly.
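Two of these best practices, checking the capture quality and providing word hints, can be sketched as follows. The first function uses Python's standard `wave` module to flag recordings below 16,000 Hz or with more than one channel. The second builds a request body with phrase hints; its field names (`sampleRateHertz`, `languageCode`, `speechContexts`) follow the Google Cloud Speech v1 REST format as an assumption, and the beta version available at the time of the talk may have used different names. Function names and the example phrases are hypothetical.

```python
import base64
import io
import wave

MIN_SAMPLE_RATE_HZ = 16000  # threshold taken from the best-practice list


def check_wav(wav_bytes):
    """Return a list of problems found in a WAV recording (empty if none)."""
    problems = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getframerate() < MIN_SAMPLE_RATE_HZ:
            problems.append("sample rate below 16000 Hz")
        if wav.getnchannels() != 1:
            problems.append("more than one channel; use one speaker per stream")
    return problems


def build_recognize_request(audio_bytes, sample_rate_hz, language_code, phrases):
    """Build a request body with word hints (field names are an assumption)."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
            # Word hints: domain terms the recognizer should prefer.
            "speechContexts": [{"phrases": list(phrases)}],
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }
```

A caller would run `check_wav` before upload and only send recordings that return an empty list, passing product names or commands expected in the conversation as `phrases`.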

Problem

Real life - Voice is in the early days

Speech-to-text quality

Speaker recognition

Language mixing

Punctuation

Demo

Voice interaction in IoT

We are building a voice first company and are looking for support

- Technical Research
- Deep Learning & NLP Scientist
- Software Engineers

Christian Rebernik Contact: [email protected]