audrey echoes 2

Echoes From Audrey Voice Assistants Cross the Chasm ©2017 Fivesight Research LLC, www.fivesightresearch.com 1 Joe Buzzanga Founder/Chief Analyst Fivesight Research LLC [email protected] @joebuzzanga www.fivesightresearch.com 6/15/2017

Upload: joe-buzzanga

Post on 29-Jan-2018

TRANSCRIPT

Page 1: Audrey echoes 2

Echoes From Audrey

Voice Assistants Cross the Chasm

©2017 Fivesight Research LLC, www.fivesightresearch.com 1

Joe Buzzanga

Founder/Chief Analyst

Fivesight Research LLC

[email protected]

@joebuzzanga

www.fivesightresearch.com

6/15/2017

Page 2: Audrey echoes 2

Topics

• Voice Assistants: Definition and Motivation

• Voice Assistant Landscape

• AI and Speech Recognition

• Market Adoption

• What’s Next?

Page 3: Audrey echoes 2

What is a Voice Assistant?

What It Is

AI-powered general-purpose software application that simulates intelligence through conversational (vocal) interaction, factual knowledge, predictive abilities and personalization

What It Is Not

• ChatBot

• Dictation/transcription

• Smart Speaker

• Voice Search

Page 4: Audrey echoes 2

Voice Assistant Motivation

• Solving Human/Machine Interaction

• Solving Search

Search

• Natural language queries

• Q/A

• Query intent

Human/Machine

• Miniaturization

• More compute

• Mobile

Artificial Intelligence

Voice Assistant

Page 5: Audrey echoes 2

Human/Machine Struggle

Tyranny of the Typewriter

• The history of input devices is a history of human adaptation to the machine

• Interfaces & input have co-evolved with devices

• But the keyboard, real or virtual, remains the de facto standard for text input

Remington, U.S.

Commercial production 1873

Rasmus Malling-Hansen Writing Ball, Denmark

Commercial production 1870

Page 6: Audrey echoes 2

Escaping the Keyboard

Some Mass Market Milestones

Timeline, 1970 to 2020:

• Keyboard: Apple II, 1977

• Mouse: Apple Lisa, 1983

• Knowledge Navigator, 1987

• Trackball: Apple Powerbook, 1991

• Touchpad: Apple Powerbook, 1994

• Stylus: Apple Newton, 1993

• Stylus: Palm, 1996

• Clickwheel: iPod, 2001

• Gesture: Nintendo Wii, 2005

• Multitouch: iPhone, 2007

• Digital Crown: Apple Watch, 2015

• Haptic: iPhone, 2015

• Voice Assistant (VA): Siri, 2011

• Echo: Amazon, 2015

Page 7: Audrey echoes 2

Solving “Search”

Plain Old Search Box

• Keyboard

• Keywords

• Links

• Passive

• Stationary

• “Private”

• Insentient

• Web/Database

Voice Assistant (VA)

• Voice

• Conversational

• Answers

• Active/Predictive

• Mobile

• Public

• Sentient

• Web/Personal/Sensor

Search is an AI problem; it is “solved” in the VA

Page 8: Audrey echoes 2

Topics

• Voice Assistants: Definition and Motivation

• Voice Assistant Landscape

• AI and Speech Recognition

• Market Adoption

• What’s Next?

Page 9: Audrey echoes 2

Siri circa 2010

(Source: Adam Cheyer Presentation, December 18, 2014, https://wit.ai/blog/2014/12/18/adam-keynote)

Siri was conceived as a “do” engine

• ~15 structured domains

• Speech recognition & NLP

• Integrates with Web & Apps

Page 10: Audrey echoes 2

Voice Assistant Landscape

Fields per product: Vendor, Product, Initial Release, Devices, Vision, Skills, Notes

• Apple, Siri. Initial release: October 2011. Devices: iPhone, Apple Watch, tvOS, CarPlay, macOS, HomePod, HomeKit/IoT. Vision: No. Skills: Limited. Notes: Acquisition.

• Google, Google Now. Initial release: July 2012. Devices: Android, iOS. Vision: No. Skills: No. Notes: Phasing out?

• Google, Google Assistant. Initial release: May 2016. Devices: iOS, Android, Android Wear, Android TV, Android Auto, Google Home/IoT. Vision: Yes, Google Lens. Skills: Yes, 230.

• Microsoft, Cortana. Initial release: April 2014. Devices: iOS, Android, Windows, Harman Kardon smart speaker. Vision: No. Skills: Yes, 55.

• Amazon, Alexa. Initial release: Nov. 2014. Devices: Echo devices, Moto Z, TV, auto, smart home/IoT. Vision: Yes. Skills: Yes, 10,000. Notes: Alexa inside Amazon app on iOS and Android; vision/image recognition is part of the Amazon app.

• Facebook, M. Initial release: Aug. 2015 (GA April 2017). Devices: inside Messenger on Android, iOS. Vision: No. Skills: No. Notes: Not voice activated; contextual suggestions.

• Samsung, S Voice. Initial release: May 2012. Notes: Phasing out?

• Samsung, Bixby. Initial release: Delayed. Devices: Samsung S8, wearables, TV, auto, smart home/IoT. Vision: Yes, Pinterest Lens. Skills: Planned. Notes: March 2017 launch delayed; acquisition of Viv, but may be based on S Voice.

• Baidu, Duer. Initial release: Sept. 2015. Devices: Android, iOS, others in development. Vision: Yes, DuSee AR. Skills: Yes. Notes: Phasing out? Acquisition of RavenTech, Feb. 2017.

Page 11: Audrey echoes 2

High Level Functions

Voice Assistant

Search Q/A Commands

Book a ride

Send an email

Turn on lights

Device Control

Battery management

App suggestions

Personal management

Page 12: Audrey echoes 2

Voice Assistants Everywhere

Voice Assistants can animate any microprocessor-controlled device

Page 13: Audrey echoes 2

Ecosystem Metrics

http://voicebot.ai/amazon-echo-alexa-stats/

Page 14: Audrey echoes 2

Topics

• Voice Assistants: Definition and Motivation

• Voice Assistant Landscape

• AI and Speech Recognition

• Market Adoption

• What’s Next?

Page 15: Audrey echoes 2

Speech Recognition: State of the Art ~2010

In most speech recognition tasks, human subjects produce one to two orders of magnitude less errors than machines. There is now increasing interest in finding ways to bridge such a performance gap.

Anusuya, M. A., and Shriniwas K. Katti. "Speech recognition by machine, a review." arXiv preprint arXiv:1001.2267 (2010). https://pdfs.semanticscholar.org/5cc6/ba30820e7c3b36d314f4deee990dc5655afb.pdf

Page 16: Audrey echoes 2

The Rise of Deep Learning

https://blogs.nvidia.com/

http://www.asimovinstitute.org/neural-network-zoo/

Page 17: Audrey echoes 2

Deep Learning Breakthrough

The neural networks behind Google Voice transcription

Tuesday, August 11, 2015

Posted by Françoise Beaufays, Research Scientist

“Since it launched in 2009, Google Voice transcription had used Gaussian Mixture Model (GMM) acoustic models, the state of the art in speech recognition for 30+ years. Sophisticated techniques like adapting the models to the speaker's voice augmented this relatively simple modeling method.

Then around 2012, Deep Neural Networks (DNNs) revolutionized the field of speech recognition. These multi-layer networks distinguish sounds better than GMMs by using “discriminative training,” differentiating phonetic units instead of modeling each one independently.

But things really improved rapidly with Recurrent Neural Networks (RNNs), and especially LSTM RNNs, first launched in Android’s speech recognizer in May 2012. Compared to DNNs, LSTM RNNs have additional recurrent connections and memory cells that allow them to “remember” the data they’ve seen so far—much as you interpret the words you hear based on previous words in a sentence.”

https://research.googleblog.com/2015/08/the-neural-networks-behind-google-voice.html
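The “memory cell” idea in the quote above can be sketched as a single scalar LSTM step. This is an illustrative toy, not Google's production model: the function names (`lstm_step`, `run`) and the hand-picked weights in `ILLUSTRATIVE_PARAMS` are assumptions, chosen so that the cell stores one input and then holds on to it.

```python
import math


def sigmoid(z):
    """Logistic function used by the three LSTM gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each gate name to a (w_x, w_h, b) triple
    (hypothetical layout, for illustration only).
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much old memory to keep
    o = sigmoid(pre("output"))       # how much memory to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # memory cell update
    h = o * math.tanh(c)             # recurrent output fed to the next step
    return h, c


def run(sequence, params):
    """Feed a sequence through the cell, starting from empty state."""
    h, c = 0.0, 0.0
    states = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        states.append((h, c))
    return states


# Hand-picked, illustrative weights: write when x is large, forget slowly.
ILLUSTRATIVE_PARAMS = {
    "input": (4.0, 0.0, -2.0),
    "forget": (0.0, 0.0, 4.0),
    "output": (0.0, 0.0, 2.0),
    "candidate": (2.0, 0.0, 0.0),
}

for step, (h, c) in enumerate(run([1.0, 0.0, 0.0], ILLUSTRATIVE_PARAMS), start=1):
    print(f"step {step}: h={h:.3f} c={c:.3f}")
```

With these weights the cell state written at the first step decays only slightly over the following zero inputs, which is the “remembering” behaviour the quote describes.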

Page 18: Audrey echoes 2

Speech Recognition Using Neural Nets

In most speech recognition tasks, human subjects produce one to two orders of magnitude less errors than machines. There is now increasing interest in finding ways to bridge such a performance gap.

Anusuya, M. A., and Shriniwas K. Katti. "Speech recognition by machine, a review." arXiv preprint arXiv:1001.2267 (2010). https://pdfs.semanticscholar.org/5cc6/ba30820e7c3b36d314f4deee990dc5655afb.pdf

In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity. …This marks the first time that human parity has been reported for conversational speech.

Xiong, Wayne, et al. "Achieving human parity in conversational speech recognition." arXiv preprint arXiv:1610.05256 (2016). https://arxiv.org/pdf/1610.05256.pdf

Page 19: Audrey echoes 2

Speech Recognition: Microsoft “Historic Achievement”

Posted October 18, 2016 by Allison Linn

Microsoft has made a major breakthrough in speech recognition, creating a technology that recognizes the words in a conversation as well as a person does.

In a paper published Monday, a team of researchers and engineers in Microsoft Artificial Intelligence and Research reported a speech recognition system that makes the same or fewer errors than professional transcriptionists. The researchers reported a word error rate (WER) of 5.9 percent, down from the 6.3 percent WER the team reported just last month.

The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation, and it’s the lowest ever recorded against the industry standard Switchboard speech recognition task.

https://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition/#8oWG4Sj1FzHzqZX4.99

“Historic Achievement: Microsoft researchers reach human parity in conversational speech recognition”
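The word error rate (WER) figures quoted above are conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system output into the reference transcript, divided by the number of words in the reference. A minimal sketch of that computation, assuming whitespace tokenization and exact string matching (the function name `word_error_rate` and the sample sentences are illustrative, not taken from the Microsoft paper):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("turn on the lights", "turn on lights"))  # prints 0.25
```

Production scoring pipelines typically also normalize case, punctuation and spelling variants before alignment, which this sketch leaves out.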

Page 20: Audrey echoes 2

Speech Recognition: Google

Source: Google I/O Keynote, May 17, 2017; https://youtu.be/Y2VF8tmLFHw?list=PLOU2XLYxmsIKC8eODk_RNCWv3fBcLvMMy&t=542

Page 21: Audrey echoes 2

Speech Recognition: IBM Disputes Microsoft “Historic Achievement”

“The second and perhaps more important point made in this paper is that, unlike what was claimed in [1], we do not believe that human parity has been reached on this task.”

Saon, George, et al. "English conversational telephone speech recognition by humans and machines." arXiv preprint arXiv:1703.02136 (2017). https://arxiv.org/abs/1703.02136

[1] Xiong, Wayne, et al. "Achieving human parity in conversational speech recognition." arXiv preprint arXiv:1610.05256 (2016). https://arxiv.org/pdf/1610.05256.pdf

Page 22: Audrey echoes 2

Speech Recognition vs Cognition

Figure axis: Intelligence

Recognition/Dictation

Command & Control, Limited NLP

Cognition, Conversation/Dialog

Page 23: Audrey echoes 2

Topics

• Voice Assistants: Definition and Motivation

• Voice Assistant Landscape

• AI and Speech Recognition

• Market Adoption

• What’s Next?

Page 24: Audrey echoes 2

Some Market Data Points

• In the U.S., 20% of mobile queries on the Google Android app are voice queries, and the trend is increasing

(Source: Google I/O 2016 Keynote, May 18, 2016, https://youtu.be/862r3XS2YB0?t=324)

• “Almost 70 percent of requests to the (Google) Assistant are expressed in natural language, not the typical keywords people type in a web search. And many requests are follow-ups that continue an ongoing conversation.”

(Source: Google Blog, May 17, 2017 https://blog.google/products/assistant/your-assistant-getting-better-on-google-home-and-your-phone/)

• “Bing profitability continues to grow with greater than 40 percent of the search revenue in June from Windows 10 devices. Bing PC query share in the United States approached 22 percent this quarter, not including volume from AOL and Yahoo!. The Cortana search box has over 100 million monthly active users, with 8 billion questions asked to date.”

(Source: Microsoft FY16Q4 Earnings Call, July 19, 2016, MSFT Earnings Call)

Page 25: Audrey echoes 2

Google Study: 2014

https://googleblog.blogspot.com/2014/10/omg-mobile-voice-survey-reveals-teens.html

Page 26: Audrey echoes 2

Fivesight Market Research

• March 2017 survey of 800 U.S. smartphone users, split evenly between iOS and Android

https://www.fivesightresearch.com/report

Page 27: Audrey echoes 2

Siri is Number Two “Search Engine”

• 6% of smartphone users selected Siri as their primary search engine

https://www.fivesightresearch.com/report

What is your primary search engine on your smartphone?

• Google: 84%

• Siri: 6%

• Yahoo: 2%

• Browser: 2%

• Bing: 2%

• Other: 4%

Source: Fivesight Research Search Engine Survey, Q1 2017

Base: Combined iOS and Android respondents, n=800

Note: Percentages rounded

Page 28: Audrey echoes 2

Results by OS

• 13% of iOS smartphone users selected Siri as their primary search engine

https://www.fivesightresearch.com/report

What is your primary search engine on your smartphone?

• Google: iOS 78%, Android 90%

• Siri: iOS 13%, Android 1%

• Yahoo: iOS 4%, Android 1%

• Browser: iOS 2%, Android 2%

• Bing: iOS 1%, Android 2%

• Other: iOS 3%, Android 4%

Source: Fivesight Research Search Engine Survey, Q1 2017

Base: iOS, n=400; Android, n=400

Note: Percentages rounded

Page 29: Audrey echoes 2

Usage as Secondary Search Engine

• 72% use a VA to augment primary search engine

Respondents Using a Voice Personal Assistant to Supplement Their Primary Smartphone Search Engine (percent of respondents)

• All users: Yes 72%, No 28%

• iOS: Yes 84%, No 16%

• Android: Yes 61%, No 39%

Source: Fivesight Research Search Engine Survey, Q1 2017

Base: All users, n=740; iOS, n=350; Android, n=390

Note: Percentages rounded

https://www.fivesightresearch.com/report
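The all-users figure on this slide is consistent with a respondent-weighted average of the two OS subgroups. A small sketch of that arithmetic, assuming the rounded “Yes” shares and the base sizes stated in the source line (the function name `combined_share` is illustrative):

```python
def combined_share(groups):
    """Respondent-weighted average of subgroup percentages.

    groups is a list of (respondents, percent) pairs.
    """
    total = sum(n for n, _ in groups)
    return sum(n * pct for n, pct in groups) / total


ios = (350, 84)      # iOS base and "Yes" share from the slide
android = (390, 61)  # Android base and "Yes" share from the slide

print(round(combined_share([ios, android])))  # prints 72
```

Because the inputs are already rounded, the weighted result only needs to agree with the reported 72% after rounding, which it does.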

Page 30: Audrey echoes 2

Topics

• Voice Assistants: Definition and Motivation

• Voice Assistant Landscape

• AI and Speech Recognition

• Market Adoption

• What’s Next?

Page 31: Audrey echoes 2

Perceptual/Sensory Projection (Camera, Microphone, Speakers, Sensors)

Pixel Sensors

1. Proximity / ALS

2. Accelerometer / Gyrometer

3. Magnetometer

4. Pixel Imprint – Back-mounted fingerprint sensor

5. Barometer

6. Hall effect sensor

7. Android Sensor Hub

8. Advanced x-axis haptics for sharper / defined response

Location/Networking

1. GPS

2. Wi-Fi

3. Cellular

4. Bluetooth 4.2

5. NFC

iPhone Sensors

1. Touch ID fingerprint sensor

2. Barometer

3. Three-axis gyro

4. Accelerometer

5. Proximity sensor

6. Ambient light sensor

Location/Networking

1. Assisted GPS and GLONASS

2. Digital compass

3. Wi-Fi

4. Cellular

5. iBeacon microlocation

6. Bluetooth 4.2

7. NFC

Page 32: Audrey echoes 2

Voice to Vision to Augmented Reality

• Google Lens: physical world as corpus; objects as keywords

• VPS AR

• TensorFlow object recognition open API

• Snap: camera the new platform

• Amazon: Echo Look

• Facebook: AR

https://developers.facebook.com/FacebookforDevelopers/videos/10154614408993553/

Page 33: Audrey echoes 2

Tangible Bits, Virtual Buttons and Beyond

https://youtu.be/wm5WCScGKxs?t=367

https://www.ultrahaptics.com/wp-content/themes/ultrahaptics/videos/4.webm

Ultrahaptics has developed a unique technology that enables users to receive tactile feedback without needing to wear or touch anything. The technology uses ultrasound to project sensations through the air and directly onto the user. Users can ‘feel’ touch-less buttons, get feedback for mid-air gestures or interact with virtual objects.

Page 34: Audrey echoes 2

The Ultimate Synthesis

General AI & Human/Computer Interface Fusion

Voice Assistant, 2017

General AI

HCI Fusion

?

Page 35: Audrey echoes 2

Joe Buzzanga

Fivesight Research LLC

[email protected]

@joebuzzanga

www.fivesightresearch.com