Wired for Speech:
How Voice Activates and Advances the Human-Computer Relationship
Clifford Nass
Stanford University
Speaking is Fundamental
Fundamental means of human communication Everyone speaks
IQs as low as 50 Brains as small as 400 grams
Humans are built for words Learn new word every two hours for 11 years
Listening to Speech is Fundamental
Womb: Mother’s voice differentiation One day old: Differentiate speech vs. other sounds
Responses Brain hemispheres
Four day olds: Differentiate native language vs. other languages
Adults: Phoneme differentiation at 40-50 phonemes per second Cope with cocktail parties
Listening Beyond Speech is Fundamental
Humans are acutely aware of para-linguistic cues Gender Personality Accent Emotion Identity
Humans are Wired for Speech
Special parts of the brain devoted to Speech recognition Speech production Para-linguistic processing Voice recognition and discrimination
Therefore …
Voice interface should be the most
Enjoyable,
Efficient, &
Memorable
method for providing and acquiring information
Are They? No! Why Not?
Machines are different than humans Technology is insufficient
But are these good reasons?
Critical Insights
Voice = Human
Technology Voice = Human Voice
Human-Technology Interaction =
Human-Human Interaction
Where’s the Leverage?
Social sciences can give us What’s important What’s unimportant Understanding Methods Unanswered questions
Male or Female Voice?
Is gender important? Can technology have gender?
The Case of BMW
Brains are Built to Detect Voice Gender
First human category Infants at six months Self-identification by 2-3 years old Within seconds for adults
Multiple ways to recognize gender in voice Pitch Pitch range Variety of other spectral characteristics
Once Person Identifies Gender by Voice
Guides every interaction Same-gender favoritism
Trust Comfort
Gender stereotyping
Gender and Products
Gender should match product More appropriate More credible
Mutual influence of voice and product gender Female voices feminize products (and conversely) Female products feminize voices (and conversely) “Match principle”
Research Context
“Gender” of voice (synthetic) Gender of user “Gender” of product E-Commerce website
Examples of Advertisements
“Female” voice; female product
“Male” voice; female product
“Male” voice; male product
Appropriateness of the Voice
[Chart: appropriateness of the voice for the female product and the male product, comparing the female voice with the male voice]
Voice/Product Gender Influences
Female voices feminize products; male voices masculinize products Strongest for opposite gender products
Female products feminize voices; male products masculinize voices
Strong preference when voice matches product
Results for User Gender
People trust voices that match themselves Females conform more with “female” voices Males conform more with “male” voices
People like voices that match themselves Females like the “female” voice more Males like the “male” voice more
Other Results
Participants denied stereotyping technology Participants denied harboring stereotypes!
People stereotype voices by gender
Voice “gender” should match content “gender” Product descriptions Teaching Praise Jokes
Gender is Marked by Word Choice Female speech More “I,” “you,” “she,” “her,” “their,” “myself” Less “the,” “that,” “these,” “one,” “two,” “some more”
More compliments More apologies More relationships between things Less description of particular things “They” for living things only
Voices should speak consistently with their “gender”
Selecting Voices
Voices manifest many traits Gender Personality Age Ethnicity
Voice traits should match content traits Content Language style Appearance (e.g., accent and race) Context
Voice traits should match user traits
If Only One Voice
Consider stereotypes Masculine vs. feminine (same voice)
Boost high frequencies (feminine) Boost low frequencies (masculine)
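The frequency-boost idea above can be illustrated with a very crude spectral tilt. This is only a sketch under stated assumptions: the function name `tilt`, its `amount` parameter, and the first-order filter itself are illustrative choices, not the method used in the research, and a production system would more likely use a proper shelving equalizer.

```python
def tilt(samples, amount):
    """First-order spectral tilt: y[n] = x[n] - amount * x[n-1].

    amount > 0 raises high frequencies relative to low ones;
    amount < 0 raises low frequencies relative to high ones.
    Keep abs(amount) below 1. (Hypothetical helper for illustration.)
    """
    out = []
    previous = 0.0
    for x in samples:
        out.append(x - amount * previous)
        previous = x
    return out


# Two extreme test signals: the lowest and the highest representable frequency.
steady = [1.0, 1.0, 1.0, 1.0]          # constant signal (0 Hz)
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every sample (Nyquist)

brighter_low = tilt(steady, 0.5)        # lows are attenuated
brighter_high = tilt(alternating, 0.5)  # highs are amplified
darker_low = tilt(steady, -0.5)         # lows are amplified
darker_high = tilt(alternating, -0.5)   # highs are attenuated
```

With a positive `amount` the same voice is tilted toward its high frequencies, and with a negative `amount` toward its low frequencies, which is the direction of change the slide describes.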
Emotions
Emotion and Voice
Voice is the first indicator of emotion Voice emotion has many markers
Pitch Value Range Change rate
Amplitude Value Range Change rate
Words per minute
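The markers listed above (value, range, and change rate for pitch and amplitude, plus words per minute) can be computed from simple measurements. The sketch below assumes a pitch or amplitude track already sampled at a fixed frame interval; the function names and the definition of change rate as mean absolute change per second are assumptions made for illustration, not definitions taken from the talk.

```python
def summarize_track(values, frame_seconds):
    """Return (value, range, change rate) for a fixed-step track.

    value is the mean, range is max minus min, and change rate is the
    mean absolute change per second between neighbouring frames.
    (Hypothetical helper; the definitions are illustrative assumptions.)
    """
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    rate = (sum(steps) / len(steps)) / frame_seconds if steps else 0.0
    return mean, spread, rate


def words_per_minute(word_count, duration_seconds):
    """Speaking rate for an utterance of known length."""
    return word_count * 60.0 / duration_seconds


# Made-up pitch track in Hz, one frame every half second.
pitch_hz = [200.0, 220.0, 210.0, 240.0]
pitch_summary = summarize_track(pitch_hz, 0.5)

# Made-up utterance: 30 words spoken in 12 seconds.
speaking_rate = words_per_minute(30, 12.0)
```

The same `summarize_track` call works for an amplitude track, so one utterance yields seven numbers that can be compared between, say, an upbeat and a subdued rendering of the same sentence.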
Emotion is always relevant
User has initial emotion Interactions create emotions
Voice is particularly powerful Frustration is particularly powerful
Emotion and Technology
Could technology-based voices exhibit emotion?
Could technology-based voice emotion influence people?
Research Context
Create upset or happy drivers Have them “drive” for 15 minutes Female voice gives information and makes suggestions
Upbeat
Subdued
Number of Accidents
[Chart: number of accidents for happy drivers and upset drivers, comparing the upbeat voice with the subdued voice]
Results
People speak to car much more when emotion is consistent
People like car much more when emotion is consistent
Implications
User emotion is a critical part of any interaction
Emotion must match content Perception of voice
Trust Intelligence
User Performance Comfort Enjoyment
One Voice Emotion: Select for Goal
Overall liking Slightly happy voice
Attention-getting Anger Sadness
Trust and vulnerability Sadness (mild)
If You Can’t Manipulate Voice Emotion
Manipulate content Manipulate music
Using the First Person: Should IT say “I”
Should Voice Interfaces say “I”?
When should a voice interface say “I”? Does synthetic vs. recorded speech affect the answer to the previous question?
The Importance of “I”
“I” is the most basic claim to humanity “I think, therefore I am” “I, Robot” Dobby and monsters don’t say “I”
“I” is the marker of responsibility “I made a mistake” vs.
“Mistakes were made”
Research Context Auction site Telephone interface with speech recognition Recorded bidding behavior Online questionnaire
Average Bidding Price
[Chart: average bidding price with “I” and with no “I”, comparing the recorded voice with the synthetic voice]
Results
When “I”+Recorded or “No I”+Synthetic System is higher quality Users were much more relaxed
“No I” is more objective “I” is more “present”
Results
“I” is right for embodiments Robots Characters Autonomous intelligence (“KITT”)
“I” is wrong when voice is second fiddle to technology Traditional car Heavily-branded products
Design
Text-to-Speech is a machine voice Recorded speech is a human voice Design questions are
Not philosophical questions Not judgment questions Experimentally verifiable
Mistakes are Tough to Talk About
Who is Responsible for Errors?
Recognition is not perfect When system fails, who should be assigned responsibility? System User No one
Responding to Errors
Modesty Likable Unintelligent (people believe modesty!)
Criticism Isn’t really constructive Unpleasant Intelligent
Scapegoating Effective Safe
System Responses to Errors
System blame (most common)
No blame
User blame
Research context
Amazon-by-phone Numerous planned interaction errors
Likelihood of Purchase
[Chart: likelihood of purchase in the book-buying task under no blame, system blame, and user blame]
Results
Neutral and system blame Sell much better than user blame
Neutral blame Easier to use than system blame Nicer than system blame
User blame is most intelligent! System blame is least intelligent
Results for Errors
Take responsibility when unavoidable Increases trust Increases liking Weak negative effect on intelligence
Ignore errors whenever possible Duck responsibility to third party if needed
Blame the phone line Blame the road
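The ordering of these recommendations can be read as a small decision rule: ignore the error if possible, otherwise deflect to a third party, and take responsibility only when neither option is available. The sketch below is one possible reading, not the system used in the study; the function name, its parameters, and the prompt wording are all hypothetical.

```python
def respond_to_error(can_ignore, third_party=None):
    """Pick an error response following the priority on the slide.

    1. Ignore the error whenever possible (return None: say nothing).
    2. Otherwise blame a third party, if one is plausible.
    3. Otherwise the system takes responsibility itself.
    The prompt texts are made-up examples.
    """
    if can_ignore:
        return None
    if third_party is not None:
        return f"There seems to be trouble with {third_party}. Could you repeat that?"
    return "Sorry, that was not understood correctly. Could you repeat that?"


silent = respond_to_error(can_ignore=True)
deflect = respond_to_error(can_ignore=False, third_party="the phone line")
own_up = respond_to_error(can_ignore=False)
```

Note that no branch blames the user, in line with the finding that user blame sells worst even though it makes the system seem most intelligent.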
Results for Errors
Show commitment to the interaction Make guesses Show concern Gricean maxims
Quantity Relevance Clarity
Design
Error recovery is critically important Negative experiences are more memorable Adaptation is crucially important
Flattery is effective Note times when interaction is successful
Design to avoid errors Alignment (good repetition) Air quotes
Scripting is important at all stages of the interaction
Other Key Findings
Personality Accents Multiple voices and mixing voices Input vs. output modality Microphone type
Tying it All Together
Voice interfaces can be the most enjoyable, efficient, and memorable method for acquiring and providing information
Voice interfaces turn up the volume knob in user responses
The key is leveraging social aspects of speech
Summary – Part 1
Humans are wired for speech Interactions with voice interfaces are
fundamentally social Same social rules Same social expectations
Summary – Part 2
Social aspects of voice interfaces can be beneficial Users perform better Users feel better Users understand better
Social aspects of voice interfaces cannot be ignored Social audit is critical Social design is critical
Design psychology can be leveraged Less expensive than technology More effective than technology Broader impact than technology