Wired for Speech:

How Voice Activates and Advances the Human-Computer Relationship

Clifford Nass

Stanford University

Speaking is Fundamental

Fundamental means of human communication
Everyone speaks
  IQs as low as 50
  Brains as small as 400 grams
Humans are built for words
  Learn new word every two hours for 11 years

Listening to Speech is Fundamental

Womb: Mother’s voice differentiation
One day old: Differentiate speech vs. other sounds
  Responses
  Brain hemispheres
Four day olds: Differentiate native language vs. other languages
Adults:
  Phoneme differentiation at 40-50 phonemes per second
  Cope with cocktail parties

Listening Beyond Speech is Fundamental

Humans are acutely aware of para-linguistic cues
  Gender
  Personality
  Accent
  Emotion
  Identity

Humans are Wired for Speech

Special parts of the brain devoted to
  Speech recognition
  Speech production
  Para-linguistic processing
  Voice recognition and discrimination

Therefore …

Voice interface should be the most Enjoyable, Efficient, & Memorable method for providing and acquiring information

Are They? No! Why Not?

Machines are different than humans
Technology is insufficient

But are these good reasons?

Critical Insights

Voice = Human

Technology Voice = Human Voice

Human-Technology Interaction = Human-Human Interaction

Where’s the Leverage?

Social sciences can give us
  What’s important
  What’s unimportant
  Understanding
  Methods
  Unanswered questions

Male or Female Voice?

Is gender important?
Can technology have gender?

The Case of BMW

Brains are Built to Detect Voice Gender

First human category
  Infants at six months
  Self-identification by 2-3 years old
  Within seconds for adults
Multiple ways to recognize gender in voice
  Pitch
  Pitch range
  Variety of other spectral characteristics

Once Person Identifies Gender by Voice

Guides every interaction
Same-gender favoritism
  Trust
  Comfort
Gender stereotyping

Gender and Products

Gender should match product
  More appropriate
  More credible
Mutual influence of voice and product gender
  Female voices feminize products (and conversely)
  Female products feminize voices (and conversely)
  “Match principle”

Research Context

“Gender” of voice (synthetic)
Gender of user
“Gender” of product
E-Commerce website

Examples of Advertisements

“Female” voice; female product

“Male” voice; female product

“Male” voice; male product

Appropriateness of the Voice

[Chart: appropriateness of the voice (scale ticks 2 to 4), Female Product vs. Male Product; series: Female Voice, Male Voice]

Voice/Product Gender Influences

Female voices feminize products; male voices masculinize products
  Strongest for opposite-gender products
Female products feminize voices; male products masculinize voices
Strong preference when voice matches product

Results for User Gender

People trust voices that match themselves
  Females conform more with “female” voices
  Males conform more with “male” voices
People like voices that match themselves
  Females like the “female” voice more
  Males like the “male” voice more

Other Results

Participants denied stereotyping technology
Participants denied harboring stereotypes!

People stereotype voices by gender

Voice “gender” should match content “gender”
  Product descriptions
  Teaching
  Praise
  Jokes

Gender is Marked by Word Choice

Female speech
  More “I,” “you,” “she,” “her,” “their,” “myself”
  Less “the,” “that,” “these,” “one,” “two,” “some more”
  More compliments
  More apologies
  More relationships between things
  Less description of particular things
  “They” for living things only

Voices should speak consistently with their “gender”

Selecting Voices

Voices manifest many traits
  Gender
  Personality
  Age
  Ethnicity
Voice traits should match content traits
  Content
  Language style
  Appearance (e.g., accent and race)
  Context
Voice traits should match user traits
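
One way to picture the matching advice on this slide is as a simple overlap count between a candidate voice's traits and the traits of the content and the user. The sketch below is only an illustration: `match_score`, `select_voice`, the trait keys, and the equal weighting of content and user traits are all assumptions, since the talk gives no scoring rule.

```python
from typing import Dict

Traits = Dict[str, str]  # e.g. {"gender": "female", "age": "young"} (hypothetical keys)


def match_score(voice: Traits, target: Traits) -> int:
    """Count the target traits that the voice shares exactly."""
    return sum(1 for key, value in target.items() if voice.get(key) == value)


def select_voice(voices: Dict[str, Traits], content: Traits, user: Traits) -> str:
    """Return the name of the voice with the most trait matches.

    Content and user matches are weighted equally (an assumption).
    On a tie, the voice listed first wins.
    """
    return max(
        voices,
        key=lambda name: match_score(voices[name], content)
        + match_score(voices[name], user),
    )
```

With two candidate voices that differ only in gender, a "male" product and a male user would select the male voice under this scoring.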

If Only One Voice

Consider stereotypes
Masculine vs. feminine (same voice)
  Boost high frequencies (feminine)
  Boost low frequencies (masculine)

Emotions

Emotion and Voice

Voice is the first indicator of emotion
Voice emotion has many markers
  Pitch
    Value
    Range
    Change rate
  Amplitude
    Value
    Range
    Change rate
  Words per minute

Emotion is always relevant

User has initial emotion
Interactions create emotions
Voice is particularly powerful
Frustration is particularly powerful

Emotion and Technology

Could technology-based voices exhibit emotion?

Could technology-based voice emotion influence people?

Research Context

Create upset or happy drivers
Have them “drive” for 15 minutes
Female voice gives information and makes suggestions
  Upbeat
  Subdued

Number of Accidents

[Chart: number of accidents (scale ticks 1 to 9), Happy Driver vs. Upset Driver; series: Upbeat Voice, Subdued Voice]

Results

People speak to car much more when emotion is consistent

People like car much more when emotion is consistent

Implications

User emotion is a critical part of any interaction

Emotion must match content
  Perception of voice
    Trust
    Intelligence
  User
    Performance
    Comfort
    Enjoyment

One Voice Emotion: Select for Goal

Overall liking
  Slightly happy voice
Attention-getting
  Anger
  Sadness
Trust and vulnerability
  Sadness (mild)
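
The goal-to-emotion guidance on this slide can be written down as a small lookup table. This is a minimal sketch, not something from the talk: the names `EMOTION_FOR_GOAL` and `pick_voice_emotion` are hypothetical, and the keys are simply the slide's three goals turned into identifiers.

```python
from typing import Dict, List

# Hypothetical table built from the three goals on the slide.
EMOTION_FOR_GOAL: Dict[str, List[str]] = {
    "overall_liking": ["slightly happy"],
    "attention_getting": ["anger", "sadness"],
    "trust_and_vulnerability": ["mild sadness"],
}


def pick_voice_emotion(goal: str) -> List[str]:
    """Return the voice emotions the slide lists for a design goal."""
    if goal not in EMOTION_FOR_GOAL:
        # The slide covers only three goals; anything else is out of scope.
        raise ValueError(f"no guidance for goal {goal!r}")
    return EMOTION_FOR_GOAL[goal]
```

A designer with a single fixed voice emotion would call `pick_voice_emotion("overall_liking")` and get back the slightly happy voice suggested for that goal.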

If You Can’t Manipulate Voice Emotion

Manipulate content
Manipulate music

Using the First Person: Should IT say “I”?

Should Voice Interfaces say “I”?

When should a voice interface say “I”?
Does synthetic vs. recorded speech affect the answer to the previous question?

The Importance of “I”

“I” is the most basic claim to humanity
  “I think, therefore I am”
  “I, Robot”
  Dobby and monsters don’t say “I”
“I” is the marker of responsibility
  “I made a mistake” vs. “Mistakes were made”

Research Context

Auction site
Telephone interface with speech recognition
Recorded bidding behavior
Online questionnaire

Average Bidding Price

[Chart: average bidding price (scale ticks 20 to 26), “I” vs. No “I”; series: Recorded Voice, Synthetic Voice]

Results

When “I”+Recorded or “No I”+Synthetic
  System is higher quality
  Users were much more relaxed
“No I” is more objective
“I” is more “present”

Results

“I” is right for embodiments
  Robots
  Characters
  Autonomous intelligence (“KITT”)
“I” is wrong when voice is second fiddle to technology
  Traditional car
  Heavily-branded products

Design

Text-to-Speech is a machine voice
Recorded speech is a human voice
Design questions are
  Not philosophical questions
  Not judgment questions
  Experimentally verifiable

Mistakes are Tough to Talk About

Who is Responsible for Errors?

Recognition is not perfect
When system fails, who should be assigned responsibility?
  System
  User
  No one

Responding to Errors

Modesty
  Likable
  Unintelligent (people believe modesty!)
Criticism
  Isn’t really constructive
  Unpleasant
  Intelligent
Scapegoating
  Effective
  Safe

System Responses to Errors

System blame (most common)

No blame

User blame

Research context

Amazon-by-phone
Numerous planned interaction errors

Likelihood of Purchase

[Chart: likelihood of purchase (scale ticks 2 to 4) for book buying; series: No blame, System blame, User blame]

Results

Neutral and system blame
  Sell much better than user blame
Neutral blame
  Easier to use than system blame
  Nicer than system blame
User blame is most intelligent!
System blame is least intelligent

Results for Errors

Take responsibility when unavoidable
  Increases trust
  Increases liking
  Weak negative effect on intelligence
Ignore errors whenever possible
Duck responsibility to third party if needed
  Blame the phone line
  Blame the road
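
The error-handling advice on this slide reads naturally as a small decision rule. The sketch below is one possible reading, not the speaker's implementation: `ErrorResponse` and `choose_error_response` are hypothetical names, and the priority order (ignore first, then a third party, then system responsibility) is an assumption drawn from the wording "whenever possible", "if needed", and "when unavoidable".

```python
from enum import Enum


class ErrorResponse(Enum):
    IGNORE = "ignore the error and carry on"
    BLAME_THIRD_PARTY = "attribute the error to a third party"
    TAKE_RESPONSIBILITY = "system takes responsibility"


def choose_error_response(can_ignore: bool, third_party_plausible: bool) -> ErrorResponse:
    """Pick a response strategy for a recognition error.

    Assumed priority order: ignore when possible, otherwise point to a
    plausible third party (phone line, road), otherwise take responsibility.
    """
    if can_ignore:
        return ErrorResponse.IGNORE
    if third_party_plausible:
        return ErrorResponse.BLAME_THIRD_PARTY
    return ErrorResponse.TAKE_RESPONSIBILITY
```

User blame is deliberately absent from the enum, since the book-buying results above show it selling much worse than the other two conditions.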

Results for Errors

Show commitment to the interaction
  Make guesses
  Show concern
Gricean maxims
  Quantity
  Relevance
  Clarity

Design

Error recovery is critically important
  Negative experiences are more memorable
  Adaptation is crucially important
Flattery is effective
  Note times when interaction is successful
Design to avoid errors
  Alignment (good repetition)
  Air quotes
Scripting is important at all stages of the interaction

Other Key Findings

Personality
Accents
Multiple voices and mixing voices
Input vs. output modality
Microphone type

Tying it All Together

Voice interfaces can be the most enjoyable, efficient, and memorable method for acquiring and providing information

Voice interfaces turn up the volume knob in user responses

The key is leveraging social aspects of speech

Summary – Part 1

Humans are wired for speech
Interactions with voice interfaces are fundamentally social
  Same social rules
  Same social expectations

Summary – Part 2

Social aspects of voice interfaces can be beneficial
  Users perform better
  Users feel better
  Users understand better
Social aspects of voice interfaces cannot be ignored
  Social audit is critical
  Social design is critical
Design psychology can be leveraged
  Less expensive than technology
  More effective than technology
  Broader impact than technology
