TRANSCRIPT
Echoes From Audrey: Voice Assistants Cross the Chasm
©2017 Fivesight Research LLC, www.fivesightresearch.com
Joe Buzzanga
Founder/Chief Analyst, Fivesight Research
[email protected]
@joebuzzanga
www.fivesightresearch.com
6/15/2017
Topics
• Voice Assistants: Definition and Motivation
• Voice Assistant Landscape
• AI and Speech Recognition
• Market Adoption
• What’s Next?
What is a Voice Assistant?
What It Is
AI-powered, general-purpose software application that simulates intelligence through conversational (vocal) interaction, factual knowledge, predictive abilities and personalization
What It Is Not
• ChatBot
• Dictation/transcription
• Smart Speaker
• Voice Search
Voice Assistant Motivation
• Solving Human/Machine Interaction
• Solving Search
Search
• Natural language queries
• Q/A
• Query intent
Human/Machine
• Miniaturization
• More compute
• Mobile
Artificial Intelligence
Voice Assistant
Human/Machine Struggle: Tyranny of the Typewriter
• The history of input devices is a history of human adaptation to the machine
• Interfaces & input have co-evolved with devices
• But the keyboard, real or virtual, remains de facto standard for text input
Remington, U.S.: commercial production 1873
Rasmus Malling-Hansen Writing Ball, Denmark: commercial production 1870
Escaping the Keyboard: Some Mass-Market Milestones
• Keyboard: Apple II, 1977
• Mouse: Apple Lisa, 1983
• Knowledge Navigator, 1987
• Trackball: Apple PowerBook, 1991
• Stylus: Apple Newton, 1993
• Touchpad: Apple PowerBook, 1994
• Stylus: Palm, 1996
• Click wheel: iPod, 2001
• Gesture: Nintendo Wii, 2005
• Multitouch: iPhone, 2007
• VA: Siri, 2011
• Digital Crown: Apple Watch, 2015
• Haptic: iPhone, 2015
• Echo: Amazon, 2015
Solving “Search”
Plain Old Search Box → Voice Assistant
• Keyboard → Voice
• Keywords → Conversational
• Links → Answers
• Passive → Active/Predictive
• Stationary → Mobile
• “Private” → Public
• Insentient → Sentient
• Web/Database → Web/Personal/Sensor
Search is an AI problem; it is “solved” in the VA
Topics
• Voice Assistants: Definition and Motivation
• Voice Assistant Landscape
• AI and Speech Recognition
• Market Adoption
• What’s Next?
Siri circa 2010
(Source: Adam Cheyer Presentation, December 18, 2014, https://wit.ai/blog/2014/12/18/adam-keynote)
Siri was conceived as a “do” engine
• ~15 structured domains
• Speech recognition & NLP
• Integrates with Web & Apps
Voice Assistant Landscape
Vendor | Product | Initial Release | Devices | Vision | Skills | Notes
Apple | Siri | October 2011 | iPhone, Apple Watch, tvOS, CarPlay, macOS, HomePod, HomeKit/IoT | No | Limited | Acquisition
Google | Google Now | July 2012 | Android, iOS | No | No | Phasing out?
Google | Google Assistant | May 2016 | iOS, Android, Android Wear, Android TV, Android Auto, Google Home/IoT | Yes, Google Lens | Yes, 230 |
Microsoft | Cortana | April 2014 | iOS, Android, Windows, Harman Kardon smart speaker | No | Yes, 55 |
Amazon | Alexa | Nov. 2014 | Echo devices, Moto Z, TV, Auto, Smart Home/IoT | Yes | Yes, 10,000 | Alexa inside Amazon App on iOS and Android; vision/image recognition is part of Amazon App
Facebook | M | Aug. 2015 (GA April 2017) | Inside Messenger on Android, iOS | No | No | Not voice activated; contextual suggestions
Samsung | S Voice | May 2012 | | | | Phasing out?
Samsung | Bixby | Delayed | Samsung S8, wearables, TV, Auto, Smart Home/IoT | Yes, Pinterest Lens | Planned | March 2017 launch delayed; acquisition of Viv, but may be based on S Voice
Baidu | Duer | Sept. 2015 | Android, iOS, others in development | Yes, DuSee AR | Yes | Phasing out?; acquisition of RavenTech, Feb. 2017
High-Level Functions
Voice Assistant
• Search
• Q/A
• Commands (e.g., book a ride, send an email, turn on lights)
• Device control
• Battery management
• App suggestions
• Personal management
Voice Assistants Everywhere
Voice Assistants can animate any microprocessor controlled device
Ecosystem Metrics
http://voicebot.ai/amazon-echo-alexa-stats/
Topics
• Voice Assistants: Definition and Motivation
• Voice Assistant Landscape
• AI and Speech Recognition
• Market Adoption
• What’s Next?
Speech Recognition: State of the Art ~2010
In most speech recognition tasks, human subjects produce one to two orders of magnitude less errors than machines. There is now increasing interest in finding ways to bridge such a performance gap.
Anusuya, M. A., and Shriniwas K. Katti. "Speech recognition by machine, a review." arXiv preprint arXiv:1001.2267 (2010). https://pdfs.semanticscholar.org/5cc6/ba30820e7c3b36d314f4deee990dc5655afb.pdf
The Rise of Deep Learning
https://blogs.nvidia.com/
http://www.asimovinstitute.org/neural-network-zoo/
Deep Learning Breakthrough
The neural networks behind Google Voice transcription
Tuesday, August 11, 2015
Posted by Françoise Beaufays, Research Scientist
“Since it launched in 2009, Google Voice transcription had used Gaussian Mixture Model (GMM) acoustic models, the state of the art in speech recognition for 30+ years. Sophisticated techniques like adapting the models to the speaker's voice augmented this relatively simple modeling method.
Then around 2012, Deep Neural Networks (DNNs) revolutionized the field of speech recognition. These multi-layer networks distinguish sounds better than GMMs by using “discriminative training,” differentiating phonetic units instead of modeling each one independently.
But things really improved rapidly with Recurrent Neural Networks (RNNs), and especially LSTM RNNs, first launched in Android’s speech recognizer in May 2012. Compared to DNNs, LSTM RNNs have additional recurrent connections and memory cells that allow them to “remember” the data they’ve seen so far—much as you interpret the words you hear based on previous words in a sentence.”
https://research.googleblog.com/2015/08/the-neural-networks-behind-google-voice.html
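The “memory cells” in the quote can be illustrated with a single LSTM time step. The Python sketch below is a minimal, scalar version of the standard LSTM cell (input, forget and output gates plus a memory cell). The gate names, the weight-dictionary layout, the helper names `lstm_step` and `run_lstm`, and the example weights are all illustrative assumptions; Google's recognizer uses large, vector-valued networks trained on speech, not anything this small.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float, w: dict) -> tuple:
    """One time step of a single-unit (scalar) LSTM cell.

    w maps each gate name to (input_weight, recurrent_weight, bias);
    this layout is a hypothetical choice made for the sketch.
    """
    def pre(gate: str) -> float:
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("input"))        # how much new content to write
    f = sigmoid(pre("forget"))       # how much old memory to keep
    o = sigmoid(pre("output"))       # how much memory to expose
    g = math.tanh(pre("candidate"))  # candidate content for the memory cell
    c = f * c_prev + i * g           # memory cell carries context forward
    h = o * math.tanh(c)             # hidden state fed back at the next step
    return h, c


def run_lstm(xs: list, w: dict) -> tuple:
    """Feed a sequence through the cell, starting from a zero state."""
    h, c = 0.0, 0.0
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        outputs.append(h)
    return outputs, c


# Hypothetical hand-picked weights, not trained values.
weights = {
    "input": (1.0, 0.5, 0.0),
    "forget": (0.0, 0.0, 2.0),  # positive bias: lean toward remembering
    "output": (1.0, 0.0, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}
outputs, final_memory = run_lstm([1.0, 0.0, 0.0], weights)
```

A large forget-gate bias keeps the forget gate near 1, so the memory cell holds on to the first input even after later inputs drop to zero; that is the sense in which the network "remembers" earlier words.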
Speech Recognition Using Neural Nets
In most speech recognition tasks, human subjects produce one to two orders of magnitude less errors than machines. There is now increasing interest in finding ways to bridge such a performance gap.
Anusuya, M. A., and Shriniwas K. Katti. "Speech recognition by machine, a review." arXiv preprint arXiv:1001.2267 (2010). https://pdfs.semanticscholar.org/5cc6/ba30820e7c3b36d314f4deee990dc5655afb.pdf
In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity. …This marks the first time that human parity has been reported for conversational speech.
Xiong, Wayne, et al. "Achieving human parity in conversational speech recognition." arXiv preprint arXiv:1610.05256 (2016). https://arxiv.org/pdf/1610.05256.pdf
Speech Recognition: Microsoft “Historic Achievement”
• Posted October 18, 2016, by Allison Linn
Microsoft has made a major breakthrough in speech recognition, creating a technology that recognizes the words in a conversation as well as a person does.
In a paper published Monday, a team of researchers and engineers in Microsoft Artificial Intelligence and Research reported a speech recognition system that makes the same or fewer errors than professional transcriptionists. The researchers reported a word error rate (WER) of 5.9 percent, down from the 6.3 percent WER the team reported just last month.
The 5.9 percent error rate is about equal to that of people who were asked to transcribe the same conversation, and it’s the lowest ever recorded against the industry standard Switchboard speech recognition task.
https://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition/
“Historic Achievement: Microsoft researchers reach human parity in conversational speech recognition”
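Word error rate, the metric behind the 5.9 and 6.3 percent figures, is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words: WER = (S + D + I) / N for substitutions, deletions and insertions. A minimal Python sketch follows, assuming whitespace-tokenized text; the function name `word_error_rate` is hypothetical, and official Switchboard/NIST scoring applies text-normalization rules that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as a word-level Levenshtein (edit) distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds the previous DP row: prev[j] is the edit distance
    # between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 2/6: one substitution and one deletion over six reference words. By the same arithmetic, the drop from 6.3 to 5.9 percent WER is a relative reduction of about 6 percent (0.4 / 6.3).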
Speech Recognition: Google
Source: Google I/O Keynote, May 17, 2017; https://youtu.be/Y2VF8tmLFHw?list=PLOU2XLYxmsIKC8eODk_RNCWv3fBcLvMMy&t=542
Speech Recognition: IBM Disputes Microsoft “Historic Achievement”
“The second and perhaps more important point made in this paper is that, unlike what was claimed in [1], we do not believe that human parity has been reached on this task.”
Saon, George, et al. "English conversational telephone speech recognition by humans and machines." arXiv preprint arXiv:1703.02136 (2017). https://arxiv.org/abs/1703.02136
[1] Xiong, Wayne, et al. "Achieving human parity in conversational speech recognition." arXiv preprint arXiv:1610.05256 (2016). https://arxiv.org/pdf/1610.05256.pdf
Speech Recognition vs Cognition
Axis: Intelligence
• Recognition/Dictation
• Command & Control, Limited NLP
• Cognition, Conversation/Dialog
Topics
• Voice Assistants: Definition and Motivation
• Voice Assistant Landscape
• AI and Speech Recognition
• Market Adoption
• What’s Next?
Some Market Data Points
• In the U.S., 20% of mobile queries on the Google Android app are voice queries, and the trend is increasing
(Source: Google I/O 2016 Keynote, May 18, 2016, https://youtu.be/862r3XS2YB0?t=324)
• “Almost 70 percent of requests to the (Google) Assistant are expressed in natural language, not the typical keywords people type in a web search. And many requests are follow-ups that continue an ongoing conversation.”
(Source: Google Blog, May 17, 2017 https://blog.google/products/assistant/your-assistant-getting-better-on-google-home-and-your-phone/)
• “Bing profitability continues to grow with greater than 40 percent of the search revenue in June from Windows 10 devices. Bing PC query share in the United States approached 22 percent this quarter, not including volume from AOL and Yahoo!. The Cortana search box has over 100 million monthly active users, with 8 billion questions asked to date.”
(Source: Microsoft FY16 Q4 Earnings Call, July 19, 2016)
Google Study: 2014
https://googleblog.blogspot.com/2014/10/omg-mobile-voice-survey-reveals-teens.html
Fivesight Market Research
• March 2017 survey of 800 U.S. smartphone users, split evenly between iOS and Android
https://www.fivesightresearch.com/report
Siri is Number Two “Search Engine”
• 6% of smartphone users selected Siri as their primary search engine
https://www.fivesightresearch.com/report
What is your primary search engine on your smartphone?
• Google: 84%
• Siri: 6%
• Yahoo: 2%
• Browser: 2%
• Bing: 2%
• Other: 4%
Source: Fivesight Research Search Engine Survey, Q1 2017. Base: Combined iOS and Android respondents, n=800. Note: Percentages rounded.
Results by OS
• 13% of iOS smartphone users selected Siri as their primary search engine
https://www.fivesightresearch.com/report
What is your primary search engine on your smartphone?
Search engine | iOS | Android
Google | 78% | 90%
Siri | 13% | 1%
Yahoo | 4% | 1%
Browser | 2% | 2%
Bing | 1% | 2%
Other | 3% | 4%
Source: Fivesight Research Search Engine Survey, Q1 2017. Base: iOS, n=400; Android, n=400. Note: Percentages rounded.
Usage as Secondary Search Engine
• 72% of respondents use a VA to augment their primary search engine
Respondents Using a Voice Personal Assistant to Supplement Their Primary Smartphone Search Engine (percent of respondents)
Group | Yes | No
All users | 72% | 28%
iOS | 84% | 16%
Android | 61% | 39%
Source: Fivesight Research Search Engine Survey, Q1 2017. Base: All users, n=740; iOS, n=350; Android, n=390. Note: Percentages rounded.
https://www.fivesightresearch.com/report
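Because the iOS and Android bases differ (n=350 vs. n=390), the all-users figure is a respondent-weighted average rather than a simple mean of 84% and 61% (which would give 72.5%). A minimal Python check follows, assuming each platform's rounded percentage applies to its whole base; the helper name `combined_share` is hypothetical, and because the inputs are rounded, agreement with the published 72% is only expected to within rounding.

```python
def combined_share(groups):
    """Respondent-weighted overall percentage.

    groups: iterable of (respondents, percent) pairs, one per subgroup.
    """
    groups = list(groups)
    total = sum(n for n, _ in groups)
    return sum(n * pct for n, pct in groups) / total


# (respondents, % answering "Yes"), as reported for each platform
ios = (350, 84)
android = (390, 61)

overall = combined_share([ios, android])
print(f"{overall:.1f}%")  # 71.9%
```

The weighted result, 71.9%, rounds to the reported 72%. With equal bases, as in the 400/400 primary-search-engine split, the same function reduces to a plain average: 78% and 90% combine to 84%.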
Topics
• Voice Assistants: Definition and Motivation
• Voice Assistant Landscape
• AI and Speech Recognition
• Market Adoption
• What’s Next?
Perceptual/Sensory Projection (Camera, Microphone, Speakers, Sensors)
Pixel Sensors
1. Proximity / ALS
2. Accelerometer / Gyrometer
3. Magnetometer
4. Pixel Imprint – Back-mounted fingerprint sensor
5. Barometer
6. Hall effect sensor
7. Android Sensor Hub
8. Advanced x-axis haptics for sharper / defined response
Location/Networking
1. GPS
2. Wi-Fi
3. Cellular
4. Bluetooth 4.2
5. NFC
iPhone Sensors
1. Touch ID fingerprint sensor
2. Barometer
3. Three-axis gyro
4. Accelerometer
5. Proximity sensor
6. Ambient light sensor
Location/Networking
1. Assisted GPS and GLONASS
2. Digital compass
3. Wi-Fi
4. Cellular
5. iBeacon microlocation
6. Bluetooth 4.2
7. NFC
Voice to Vision to Augmented Reality
• Google Lens: physical world as corpus; objects as keywords
• VPS AR
• TensorFlow object recognition open API
• Snap: the camera is the new platform
• Amazon: Echo Look
• Facebook: AR
https://developers.facebook.com/FacebookforDevelopers/videos/10154614408993553/
Tangible Bits, Virtual Buttons and Beyond
https://youtu.be/wm5WCScGKxs?t=367
https://www.ultrahaptics.com/wp-content/themes/ultrahaptics/videos/4.webm
Ultrahaptics has developed a unique technology that enables users to receive tactile feedback without needing to wear or touch anything. The technology uses ultrasound to project sensations through the air and directly onto the user. Users can ‘feel’ touchless buttons, get feedback for mid-air gestures or interact with virtual objects.
The Ultimate Synthesis: General AI & Human/Computer Interface Fusion
• Voice Assistant: 2017
• General AI + HCI Fusion: ?
Joe Buzzanga
Fivesight Research
[email protected]
@joebuzzanga
www.fivesightresearch.com