All Your Voices Are Belong to Us: Stealing Voices to Fool Humans and Machines Dibya Mukhopadhyay, Maliheh Shirvanian, Nitesh Saxena University of Alabama at Birmingham, USA

Post on 18-Jan-2016



Premise

• We leave voice traces behind
• How difficult is it to make a machine talk like you?
• What are the consequences?
  • Voice is used as a biometric -> attacking voice-based user authentication systems
  • Voice makes us known to people -> attacking arbitrary speech contexts


Voice Morphing

• TTS Voice Synthesis (e.g., [AT&T voice synthesizer])

• Voice Conversion (e.g., Festvox)

[Figure: voice conversion pipeline]
• Training: source (attacker) speaker samples and target (victim) speaker samples are used to train a voice conversion system that maps the source voice to the target voice
• Testing: the input is samples in the attacker's voice; the output is the same samples spoken in the victim's voice


Speaker Verification

• Machine-based Speaker Verification (e.g., [Douglas et al., DSP, 2006])
  • A 2-class problem: accept or reject the claimant
  • The system creates a model of a speaker in the training phase, and the claimant is verified against it in the testing phase
• Human-based Speaker Verification
  • A human user serves as the verifier
  • Implicit in arbitrary communication


Our Contributions

• We study voice impersonation attacks
  • We evaluate attack feasibility against state-of-the-art automated speaker verification algorithms as well as manual verification
• Our attacks represent realistic settings and are practical
  • We use an off-the-shelf voice morphing engine
  • We use only a small amount of training data for voice conversion: approx. 6-8 minutes of training speech
  • Most of the training samples are recorded using low-end devices such as smartphones and laptops


System and Threat Model

[Figure: three-phase attack, with Bob as the target (victim) T]

Phase I: Collecting Audio Samples
• Target's (T) audio samples O_T = (t_1, …, t_n), obtained through audio recording, wiretapping, or social media

Phase II: Building Voice Morphing Model
• Training: the attacker (source S) speaks the same utterances as O_T, giving O_S = (s_1, …, s_n); the model is M = µ(O_S, O_T)
• Conversion: any utterance A = (a_1, …, a_m) in the attacker's voice is converted to f_T = M(A) = (f_1, …, f_m)

Phase III: Attacking Applications with Morphed Voices
• The fake utterance A in Bob's voice (e.g., "I am Bob") is presented to machine-based and human-based speaker verification
• Question: is access granted?
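The training and conversion steps of Phase II can be sketched as a toy example. This is only an illustration under strong assumptions: the feature frames are plain lists of numbers, and the mapping is simple per-dimension mean/variance matching, which is a stand-in chosen for brevity and not the method used by Festvox or by the authors. The names `train_mapping` and `convert` are hypothetical.

```python
from statistics import mean, pstdev


def train_mapping(source_frames, target_frames):
    """Toy stand-in for M = mu(O_S, O_T).

    Learns, for every feature dimension, an affine map that moves the
    source (attacker) mean and spread onto the target (victim) ones.
    This is NOT Festvox's algorithm; it only mirrors the train/convert flow.
    """
    dims = len(source_frames[0])
    params = []
    for d in range(dims):
        s = [frame[d] for frame in source_frames]
        t = [frame[d] for frame in target_frames]
        s_std = pstdev(s) or 1.0  # guard against a constant dimension
        scale = pstdev(t) / s_std
        offset = mean(t) - scale * mean(s)
        params.append((scale, offset))

    def convert(frames):
        """Apply the learned map to any utterance A in the attacker's voice."""
        return [
            [scale * x + offset for x, (scale, offset) in zip(frame, params)]
            for frame in frames
        ]

    return convert


# Hypothetical two-dimensional feature frames for the same utterance.
O_S = [[0.0, 1.0], [2.0, 3.0]]    # attacker (source)
O_T = [[10.0, 1.0], [14.0, 5.0]]  # victim (target)

M = train_mapping(O_S, O_T)
f_T = M([[1.0, 2.0]])  # a new utterance A, converted toward the victim
```

The design point the sketch tries to capture is that training needs samples from both speakers, while conversion needs only new speech from the attacker.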


Experiments and Measures

• Benign Setting: test samples spoken by the original speaker
• Attack Setting
  • Different Speaker Attack
  • Conversion Attack
• Metrics Used:
  • False Rejection Rate (FRR): fraction of genuine samples rejected in the benign setting
  • False Acceptance Rate (FAR): fraction of attack samples accepted in the attack setting
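As a minimal sketch of the two metrics, assume each verification attempt is recorded as a boolean (True = accepted). The function names and the sample decisions below are made up for illustration and are not data from the study.

```python
def false_rejection_rate(genuine_decisions):
    """FRR: fraction of genuine samples rejected in the benign setting."""
    rejected = sum(1 for accepted in genuine_decisions if not accepted)
    return rejected / len(genuine_decisions)


def false_acceptance_rate(attack_decisions):
    """FAR: fraction of attack samples accepted in the attack setting."""
    accepted_count = sum(1 for accepted in attack_decisions if accepted)
    return accepted_count / len(attack_decisions)


# Hypothetical decisions, one boolean per verification attempt.
benign = [True, True, False, True]         # genuine speaker
attack = [True, False, True, True, False]  # different-speaker or conversion attack

frr = false_rejection_rate(benign)   # 1 of 4 genuine samples rejected
far = false_acceptance_rate(attack)  # 3 of 5 attack samples accepted
```

A low FRR together with a high FAR under the conversion attack is the pattern that would indicate the verifier works for genuine users but is fooled by morphed samples.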


Attacking Machine-based Speaker Verification


Tools and Algorithms

• Festvox Voice Conversion System
• Bob Spear Speaker Verification System [E. Khoury; ICASSP, 2014]
  • UBM-GMM: a modeling technique that uses spectral features; it computes log-likelihoods under Gaussian Mixture Models for background modeling and speaker verification
  • ISV: an improvement to UBM-GMM that compensates for variability in a speaker's voice due to age, surroundings, etc., giving better performance for the same user in different scenarios
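The UBM-GMM idea can be sketched as a log-likelihood-ratio score: test frames are scored under a speaker model and under a background model (UBM), and the claimant is accepted if the average difference exceeds a threshold. The code below is a simplified illustration with diagonal-covariance GMMs written from scratch; it does not use the Bob Spear API, and the model parameters, function names, and threshold are all assumptions.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of frame x under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def llr_score(frames, speaker_gmm, ubm):
    """Average log-likelihood ratio between the speaker model and the UBM."""
    total = sum(
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)


def verify(frames, speaker_gmm, ubm, threshold=0.0):
    """Accept the claimant when the score reaches the (assumed) threshold."""
    return llr_score(frames, speaker_gmm, ubm) >= threshold


# Hypothetical one-dimensional models: the UBM covers two "voices",
# the speaker model only the one centred at 4.0.
ubm = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
speaker = [(1.0, [4.0], [1.0])]

accept_genuine = verify([[4.0], [3.9]], speaker, ubm)   # frames close to the speaker
accept_impostor = verify([[0.0], [0.2]], speaker, ubm)  # frames far from the speaker
```

In a real system the speaker model is typically adapted from the UBM using enrollment speech; here both models are fixed by hand to keep the example short.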


Datasets

• Voxforge
  • Recorded using standard recording devices; sample length: 5 secs
  • 28 speakers chosen (all male)
• MOBIO
  • Recorded using laptop microphones; sample length: 7-30 secs
  • 152 speakers (99 male, 53 female)


Conversion Attack Setup

• Voxforge:
  • Attacker: 1 male speaker (CMU Arctic)
  • Victims: 8 speakers
  • Training: 100 samples of 5 secs each (i.e., ≈ 8 mins of speech)
• MOBIO:
  • Attackers: 6 male and 3 female speakers
  • Victims: 32 male and 17 female speakers
  • Training: 12 samples of 30 secs each (i.e., ≈ 6 mins of speech)

CMU Arctic Databases: http://festvox.org/cmu_arctic/index.html
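The training-speech totals quoted for the two datasets follow directly from the sample counts and lengths. A small check, with a helper name that is only illustrative:

```python
def training_minutes(num_samples, seconds_per_sample):
    """Total training speech in minutes."""
    return num_samples * seconds_per_sample / 60


voxforge_minutes = training_minutes(100, 5)  # 500 secs, a little over 8 mins
mobio_minutes = training_minutes(12, 30)     # 360 secs, exactly 6 mins
```

Both totals fall in the 6-8 minute range of training speech per victim.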


Different Speaker Attack Setup

• Testing Voxforge: original samples were replaced with samples spoken by each of the chosen CMU Arctic speakers

• Testing MOBIO: original samples were replaced with other speakers’ samples


Results

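[Table: results of the attacks against machine-based speaker verification; only Yes/No cell labels survive in the transcript]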


Attacking Human-based Speaker Verification


User Studies

• Famous Speaker Study: attackers mimic celebrities; users have to identify the celebrities’ samples
• Briefly Familiar Speaker Study: attackers mimic speakers; users have to identify the speakers’ samples
• Study Platform: Amazon Mechanical Turk (M-Turk)
• # of Participants: 65 and 32 M-Turk online users (for the two studies)
• Related work: prior work [Shirvanian-Saxena; CCS’14] studied “Short Authenticated Strings”; we look at arbitrary speech


Famous Speaker Study Setup

• Samples collected using an application published on M-Turk
  • 5 female speakers mimicked Oprah Winfrey (100 samples)
  • 5 male speakers mimicked Morgan Freeman (100 samples)
• Users listen to a 2-min speech of Oprah and Morgan, followed by several benign and attack challenges
  • Speaker Verification: identify the original speaker
  • Voice Similarity Test: rate the similarity of the voice to the original speaker


Attack Setup

• Different Speaker Attack
  • Female M-Turk speakers for Oprah
  • Male M-Turk speakers for Morgan
• Conversion Attack:
  • # of training samples: 100 sentences of 4 secs each
  • Source: male/female M-Turk speakers
  • Target: Oprah/Morgan


Tests

• Speaker Verification Test:
  • Question: Is the speaker Oprah/Morgan?
  • Answer options: Yes, No, Not Sure
• Voice Similarity Test:
  • Question: How similar is each sample to Oprah/Morgan?
  • Answer options: exactly similar, very similar, somehow similar, not very similar, different
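Responses to the Voice Similarity Test can be summarized as the percentage of answers that fall into a chosen group of options (for example, "exactly similar" or "very similar"). The sketch below assumes responses are stored as plain strings matching the five answer options; the function name and the sample responses are hypothetical.

```python
from collections import Counter

# The five answer options of the Voice Similarity Test.
SCALE = [
    "exactly similar",
    "very similar",
    "somehow similar",
    "not very similar",
    "different",
]


def percent_in(responses, categories):
    """Percentage of responses whose answer is in the given categories."""
    counts = Counter(responses)
    unknown = set(counts) - set(SCALE)
    if unknown:
        raise ValueError(f"unexpected answer options: {sorted(unknown)}")
    return 100.0 * sum(counts[c] for c in categories) / len(responses)


# Hypothetical responses for one group of samples.
responses = ["very similar", "somehow similar", "different", "exactly similar"]
high_similarity = percent_in(responses, {"exactly similar", "very similar"})
```

The same helper can report any grouping of options, such as "different" or "not very similar" for the different speaker attack.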


Briefly Familiar Study Setup

• Male and female M-Turk speakers as victims
  • from the previous dataset
• A 90-sec sample of the victim’s voice is played for familiarization
• Speaker Verification Test (as before)
• Voice Similarity Test (as before)


Attack Setup

• Different Speaker Attack
  • Female M-Turk speakers for female speaker
  • Male M-Turk speakers for male speaker
• Conversion Attack:
  • Source: female/male M-Turk speakers
  • Target: female/male M-Turk speakers


Results: Speaker Verification Test


Results: Voice Similarity Test

Oprah

Morgan

• Original Speaker: 88.08% found “exactly similar” or “very similar”

• Different Speaker: 86.81% found “different” or “not very similar”

• Conversion Attack: 74.10% rated “somehow similar” or “very similar”

• Original Speaker: 95.77% found “exactly similar” or “very similar”

• Different Speaker: 94.36% found “different” or “not very similar”

• Conversion Attack: 59.74% rated “somehow similar” or “very similar”


Results: Voice Similarity Test

Briefly Familiar Speaker Study

• Original Speaker: 88.08% found “exactly similar” or “very similar”

• Different Speaker: 86.81% found “different” or “not very similar”

• Conversion Attack: 74.10% rated “somehow similar” or “very similar”


Conclusions

• The conversion attack succeeds about 80-90% of the time against state-of-the-art speaker verification algorithms

• In about 50% of the cases, human verifiers were fooled by morphed samples

• Attacks against human verifiers are likely to improve as voice conversion/synthesis techniques continue to improve


Limitations and Future Work

• We used only a known state-of-the-art biometric speaker verification system and an off-the-shelf voice conversion tool.

• The chance of an attack sample being accepted may be higher in real life, as people may not pay due attention.

• Attacks might be more successful when the human subjects have a hearing disability.

• The current study does not tell us how the attacks might work in other scenarios, such as faking real-time communication or faking court evidence.


Thank You!