speaker veri cation using i-vectors - da/sec

30
Speaker Verification using i-Vectors CAST-F¨ orderpreis IT-Sicherheit 2014 Andreas Nautsch Hochschule Darmstadt, atip GmbH, CASED, da/sec Security Research Group Darmstadt, 20.11.2014 Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 1/22

Upload: others

Post on 07-Dec-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Speaker Veri cation using i-Vectors - da/sec

Speaker Verification using i-VectorsCAST-Forderpreis IT-Sicherheit 2014

Andreas Nautsch

Hochschule Darmstadt, atip GmbH, CASED, da/sec Security Research Group

Darmstadt, 20.11.2014

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 1/22

Page 2: Speaker Veri cation using i-Vectors - da/sec

Outline

I Motivation with research questions

I Speaker verification and i-vectors

I Research and development

I Conclusion and future perspectives

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 2/22

Page 3: Speaker Veri cation using i-Vectors - da/sec

Motivation

Biometric IT-security & forensic applications

I Authentication and recognition by voice

I Advantages to knowledge-/token-based approaches:I Cannot be forgotten

I Cannot be incorporated

I Application fields and scenarios e.g.:I Mobile device authentication: random PINs, short duration

I Call-center user validation: free speech, variant duration

I Suspect tracking: various contents & signal qualities

Voice reference Feature extraction

Voice probe Feature extraction

Comparison Score

Accept

Reject

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 3/22

Page 4: Speaker Veri cation using i-Vectors - da/sec

Motivation

Assessment of Speaker Recognition

I Known technology: modeling acoustic features

! Very accurate due to detailed modeling

! Fast processing on short duration scenarios

% High computational effort on text-independent scenarios

I State-of-the-Art: identity vector (i-vector) features, 2011

! Fully text- & language-independent

! Fast computation & scoring independent of duration

% Unknown behavior in commercial voice biometric scenarios

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 4/22

Page 5: Speaker Veri cation using i-Vectors - da/sec

Motivation

Voice biometrics: speech duration & sample completeness

I Sound unit (phoneme) distribution by duration

From [T. Hasan et al., 2013]

I Text-independent case: content varies from sample to sampleI Long duration ⇒ stable distribution

I Short duration ⇒ insufficient data

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 5/22

Page 6: Speaker Veri cation using i-Vectors - da/sec

Motivation

Baseline speaker recognition approach

I Hidden-Markov-Models (HMMs)I State-based model

I States can represent articulation phases, phonemes, . . .

0 1 2

p00

p10

p11

p12

p22

I Detailed, but extensive computation of optimal path

2

1

0

2

1

0

2

1

0

2

1

0

2

1

0

2

1

0

2

1

0

2

1

0

2

1

0

Speaker modelm

Impostor model

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 6/22

Page 7: Speaker Veri cation using i-Vectors - da/sec

Motivation

Research questions

1. Is the i-vector approach extensible on short duration scenarioswith applicable performances?

2. Do i-vector systems deliver new information to HMM systems?

3. Are duration-depending performance mismatchescompensable?

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 7/22

Page 8: Speaker Veri cation using i-Vectors - da/sec

Speaker Verification

Speech processing

1. Raw speech signal as air pressure changes

2. Frequency analysis: spectral representation

3. Short-time acoustic features, e.g. Mel-Frequency CepstralCoefficients (MFCCs)

Time

Airpressure

Time

Frequency

Time

MFCCs

−50

0

50

100

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 8/22

Page 9: Speaker Veri cation using i-Vectors - da/sec

Speaker Verification

Statistic model: i-vector extraction

4. Gaussian modeling

5. Speaker sub-spaces from Universal Background Model (UBM)

6. identity vectors (i-vectors) as characteristic offset

Mapping by total variability matrix*

MFCC-1

MFCC-2

Speaker A

Speaker B

UBM

MFCC-1MFCC-2

* iteratively in order to optimize the model fit: UBM offset 7→ i-vectors on development data

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 9/22

Page 10: Speaker Veri cation using i-Vectors - da/sec

Speaker Verification

How to imagine i-vectors?

Speaker separation

I Relevant parametersI UBM size: detail of the acoustic spaceI # iterations: adaptation depth of total variabilityI # characteristic factors: i-vector dimension

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 10/22

Page 11: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Extending the baseline, exploring new technologies

I Research on new technologiesI Experimental evaluations on i-vectors

I Commercial and academic scenarios

I Participation in international research evaluation

I Developing more robust approachesI Extending state-of-the-art i-vector score normalization

I Implementation of speaker verification framework in Matlabaccording to ISO/IEC IS 19795-1Information technology – Biometric performance testing and reporting –

Part 1: Principles and framework ⇒ reproducible research

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 11/22

Page 12: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Commercial scenario: experimental set-up

I Aiming at research questions

1. i-vectors on short duration scenarios?

2. New information to HMMs by i-vectors?

I Short but fix duration scenarioI In-house database: 3 – 5 German digits

I Text-independent: random sequences

I 362 / 56 / 300 subjects (development, calibration, evaluation)

I 30 – 34 reference / 2 probe samples

≈ 200,000 comparisons

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 12/22

Page 13: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Examining i-vector parameters

UBM size: 128 UBM size: 256

12

510

200300

400600

2

5

IterationsFactors

EE

R(i

n%

)

12

510

200300

400600

2

5

IterationsFactors

EE

R(i

n%

)

UBM size: 512 UBM size: 1024

12

510

200300

400600

2

5

IterationsFactors

EE

R(i

n%

)

12

510

200300

400600

2

5

IterationsFactors

EE

R(i

n%

)

Equal Error Rate (EER): % impostor match = % genuine non-match → lower error = better

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 13/22

Page 14: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Performance analysis

UBM size: 128 UBM size: 256

12510200

300400

600

25

IterationsFactors

EE

R(i

n%

)

12510200

300400

600

25

IterationsFactors

EE

R(i

n%

)

UBM size: 512 UBM size: 1024

12510200

300400

600

25

IterationsFactors

EE

R(i

n%

)

12510200

300400

600

25

IterationsFactors

EE

R(i

n%

)

.1.2 .5 1 2 5 10 20 40

.1

.2

.512

5

10

20

40

False Match Rate (FMR in %)

Fa

lse

No

n-M

atc

hR

ate

(FN

MR

in%

) Detection Error Tradeoff diagram

i-vector-128 i-vector-128

i-vector-256 i-vector-256

i-vector-512 i-vector-512

30 FNMs

Equal Error Rate (EER): % impostor match = % genuine non-match → lower error = better

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 14/22

Page 15: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Information analysis: cross-entropy of system fusion

−10 −5 0 5 100

0.5

1

Bayesian thresholds η

Normalized

entrop

yHMM

−10 −5 0 5 100

0.5

1

Bayesian thresholds η

Normalized

entrop

y

i-vector-256

−10 −5 0 5 100

0.5

1

Bayesian thresholds η

Normalized

entrop

y

HMM+i-vector-256

Optimizing

minEntropy(η ≈ 4.6 = odds 1:100)

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 15/22

Page 16: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Performance gains by fusion

−10 −5 0 5 100

0.5

1

Bayesian thresholds η

Normalized

entrop

y

HMM

−10 −5 0 5 100

0.5

1

Bayesian thresholds η

Normalized

entrop

y

i-vector-256

−10 −5 0 5 100

0.5

1

Bayesian thresholds η

Normalized

entrop

y

HMM+i-vector-256

Optimizing

minEntropy(η ≈ 4.6 = odds 1:100)

.1.2 .5 1 2 5 10 20 40

.1

.2

.512

5

10

20

40

FMR (in %)

FN

MR

(in

%)

HMM

i-vector-256

HMM+i-vector-256

30 FNMs

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 16/22

Page 17: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Academic scenario: experimental set-up

I Aiming at research question

3. Compensation of duration mismatches on i-vectors?

I Variable duration scenarioI Data of 2013 – 2014 NIST i-vector Machine Learning challenge

I Text-independent, multi-lingual scenario

I 4,781 / 1,306 subjects (development, evaluation)

I 5 reference samples / 9,634 probe i-vectors

≈ 12,000,000 comparisons

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 17/22

Page 18: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Analysis of a variant duration scenario

I 10 offline 5-fold cross-validations

I NIST baseline system2048-component UBM, 60 MFCCs, 600-dim i-vectors

I Adaptive Symmetric (AS) score-normalization

S′ = 12

(S−µreferenceσreference

+S−µprobe

σprobe

), each from top-100 scores of comparisons to dev-set

5s 10s 20s 40s full

1

2

5

10

EE

R(i

n%

)

Baseline AS-norm

Baseline, all AS-norm, all

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 18/22

Page 19: Speaker Veri cation using i-Vectors - da/sec

Research & Development

Proposing duration-invariant AS score-normalization

I Observation: independent i-vector factors⇒ poor comparisons

dev-set full 40s 20s 10s 5s

5s 141 91 66 44 010s 230 140 70 020s 246 118 040s 180 0full 0

(Student t-test)

I Idea:I On probes: dev-set duration > 60s

I On references: dev-set duration ≈ probe duration

System EER (in %) all 5s 10s 20s 40s full

Baseline 2.56 0.89 0.95 0.93 0.92 0.89 0.86AS-norm 2.49 0.17 1.18 0.41 0.18 0.08 0.05

dAS-norm 2.06 0.10 0.35 0.20 0.11 0.07 0.07

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 19/22

Page 20: Speaker Veri cation using i-Vectors - da/sec

Conclusion and perspectives

Conclusion and future perspectives

I Research questions positively confirmed

1. i-vectors are applicable to short duration scenarios

2. Relative cross-entropy gain: 48% to HMM baseline

3. Performance mismatch compensation: 89% to baseline

I Examining probabilistic i-vector comparators e.g.,Probabilistic Linear Discriminant Analysis (PLDA)

I Analyse effects on other speech signal features

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 20/22

Page 21: Speaker Veri cation using i-Vectors - da/sec

Conclusion and perspectives

Community impact

I XI IAPR/IEEE Int.l Summer school for advanced studies onbiometrics: Biometrics for Forensics, Security and beyond,Alghero, Italy, June 9 – 13, 2014Presentation to well-established biometric researchers

I ISCA Odyssey 2014: The Speaker and Language RecognitionWorkshop, Joensuu, Finland, June 16 – 19, 2014A. Nautsch, C. Rathgeb, C. Busch, H. Reininger, and K. Kasper:

Towards Duration Invariance of i-Vector-based Adaptive Score

Normalization, ISCA Odyssey, 2014.

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014 21/22

Page 22: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Andreas Nautsch, M.Sc.Doctoral Researcher | Research Area: Secure Services

CASEDMornewegstr. 3264293 Darmstadt/[email protected]

+49 6151 16-75182+49 6151 16-4321

TelefonFaxwww.cased.de

Acknowledgment: atip GmbH, associated company duringB.Sc. and M.Sc. studies, for founding and education.

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014

Page 23: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

System parameters & metricsI Variable parameters

I UBM sizeI # i-vector factorsI # training interations of

total variability matrix

I Performance metricsI Equal Error Rate (EER)I False Match Rate (FMR)I False Non-Match Rate (FNMR)I Score-cross entropy

.1.2 .5 1 2 5 10 20 40

.1

.2

.512

510

20

40

FMR (in %)

FN

MR

(in

%)

Detection Error Tradeoff diagram

System

30 FMs

30 FNMs

−10 −5 0 5 100

0.5

1

Bayesian thresholds η

Nor

m.

entr

op

y

Hdefaultnorm

Hminnorm

Htotnorm

30 FMs

30 FNMs

NISTη

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014

Page 24: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Real-time evaluation

Table: Enroll/VerifyHMM ⇔ i-vector

System Enroll Verify

HMM 206.2% 5.5%

i-vector-128 6.1% 3.2%i-vector-256 9.9% 3.1%i-vector-512 16.2% 3.3%

Baum-Welch statistics T matrix estimation

128 256 512 1024

2003004006003

3.23.4

UBM sizeFactors×

RT

in%

128 256 512 1024

2003004006000

1020

UBM sizeFactors

×R

Tin

%

Enrollments Verifications

128 256 512 1024

200300400600

2040

UBM sizeFactors

×R

Tin

%

128 256 512 1024

2003004006003

4

UBM sizeFactors×

RT

in%

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014

Page 25: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Illustration of total variability matrix

0 500 1,000 1,500 2,000

0

100

200

300

UBM offset vector

i-ve

ctor

elem

ents

−5

0

5

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014

Page 26: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Analyzing i-vector processing steps

I State-of-the-Art: spherical space projection

Raw i-vectors 7→ Spherical i-vectors

From [N. Dehak et al., 2011]

I On short-duration scenario (calibration set)

64 128 256 512 1024 20481

2

5

10

20

UBM size

Eq

ua

lE

rror

Ra

teE

ER

(in

%) Raw i-vectors

Spherical i-vectors

5% EER cut-off

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014

Page 27: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Duration-invariant Adaptive Symmetric score normalization

Development set

i-vectors

Subset

z-norm

Subset

t-norm

Probe i-vector

with duration

Λsubset

dprobe == dΛ

Λ>60

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014

Page 28: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Probabilistic Linear Discriminant Analysis (PLDA)

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014

Page 29: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Evaluation framework in Matlab

Biometric

characteristics

Presentation

Sample

Data Capture

Sensor

Voice Activity Detection VAD samples

Features

Signal Processing

Enrolment database

Comparator

Identity

claim

Threshold

Verification

outcome

Decision

Score

normalisation

Simlarity

score

Enrolment

Verification

Comparison

scores

Normalised

scoresReference

Reference

Probe

Feature extraction

Fused (calibrated)

scores

Score

fusion/calibration

Evaluation

Metrics & plots

UBM

Features

GMMs

Template

creation

Figure: Framework design according to ISO/IEC 19795-1

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014

Page 30: Speaker Veri cation using i-Vectors - da/sec

.

.

.

. .

.

.. .

.. .

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

. ....... ...........

.. .. ..... ........... .

. .......... ....... . .

......... .........

.. ..

.. ..

.. ..

.. ..

.. ..

.. ..

.. .

.. ..

. .

. .

.

Evaluation framework in Matlab

Voice Activity Detection

Signal Processing

Comparator

Threshold

Verification

outcome

Decision

Score

normalisation

Simlarity

score

Feature extraction

Template

creation

UBM

Features

Score

fusion/calibration

Evaluation

Metrics & plotsatip VAD

HTK

Matlab &

rastamat

HTK

Matlab &

Joint Factor Analysis Matlab Demo

Matlab &

Joint Factor Analysis Matlab Demo

Matlab

BOSARIS toolkit

BOSARIS toolkit

Matlab

Matlab

Matlab &

HDF5

Matlab &

HDF5

Matlab &

HDF5

Matlab &

HDF5

Raw-PCM

files

Matlab &

HDF5

Matlab &

HDF5

Figure: Framework & Toolboxes

Andreas Nautsch Speaker Verification using i-Vectors / Darmstadt, 20.11.2014