Dictionary learning for atrial fibrillation modelling
B. Mailhé, M. Lemay, R. Gribonval, J.-M. Vesin, P. Vandergheynst, F. Bimbot
IRISA - Université de Rennes 1, INRIA, CNRS / ITS - EPFL
September 12, 2009
Electrocardiography
Electrodes on the limbs and thorax measure skin potential
Record the potential difference between the feet and the other electrodes
Multichannel signal S, sum of atrial and ventricular activities:
S = (S1 ... S8) = A + V
Healthy patient heartbeat
Systole:
- atrial contraction to pump the blood into the ventricles (P wave)
- ventricular contraction to pump the blood out of the heart (QRS complex)
Diastole: ventricular relaxation (T wave)
Atrial fibrillation
The atria fibrillate instead of contracting.
The ventricles have to perform the systole on their own.
AF observation in the ECG
Observed mixture S = A + V
Much lower atrial energy: ‖A‖2 ≪ ‖V‖2 (between -10 dB and -20 dB)
The observation of A could help the diagnosis

Problem
Ventricular cancellation: given S, find an estimate Â of A.

Model of V:
- succession of QRST complexes
- strong inter-patient variability, but regular for a given patient
Model of AF:
- irregular oscillations
- less a priori knowledge than for V or healthy A, because of the difficulty of observing it
Sparse models
Signal s of length N
Dictionary Φ with D > N atoms

Definition: s is sparse on Φ iff
∃ (x, r) ∈ R^D × R^N, s = Φx + r, with ‖r‖2 ≪ ‖s‖2 and ‖x‖0 ≪ N

V is sparse on a wavelet dictionary
A is sparse on a time-frequency dictionary
Joint sparsity for multichannel signals
S = (S1 ... S8)
Each channel Sc is sparse on a dictionary Φc:
∀ 1 ≤ c ≤ 8, Sc = Φc Xc + Rc, with X = (X1 ... X8) and R = (R1 ... R8)

One looks for a decomposition X with the same non-zero coefficients on all channels:
‖R‖F ≪ ‖S‖F
‖X‖2,0 = #{d : ‖X_{d,:}‖2 ≠ 0} ≪ N (number of non-zero rows of X; see the sketch below)
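For concreteness, the row-support count ‖X‖2,0 takes a couple of lines of numpy; this is a minimal sketch (the tolerance is an assumption, not from the slides):

```python
import numpy as np

def l20_norm(X, tol=1e-12):
    """The l_{2,0} mixed norm: number of rows of X with non-zero l2 norm."""
    return int(np.sum(np.linalg.norm(X, axis=1) > tol))
```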
Morphological source separation [Starck 2004]
S = A + V
If A is sparse on ΦA and V is sparse on ΦV, then S is sparse on the concatenated dictionary [ΦA ΦV]:
S = [ΦA ΦV] X + R
If A is not sparse on ΦV and V is not sparse on ΦA, one can estimate the sources from X:
X = [XA; XV] (coefficients stacked), Â = ΦA XA, V̂ = ΦV XV

Application to ventricular cancellation [Divorra 2006]:
- Gabor / Gaussian spike dictionaries
- off-the-shelf dictionaries are bad at discriminating the other source
- how about learnt dictionaries?
(a sketch of this separation step follows below)
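As an illustration of the separation step, here is a hedged sketch: it decomposes one channel on a concatenated dictionary with orthogonal matching pursuit and splits the coefficients by block. The solver choice (scikit-learn's OMP) and the sparsity level n_nonzero are assumptions, not specified by the slides.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def separate(s, Phi_A, Phi_V, n_nonzero=30):
    """Decompose s on [Phi_A Phi_V], then read A_hat and V_hat off the
    two coefficient blocks (valid when each source is sparse only on
    its own dictionary)."""
    Phi = np.hstack([Phi_A, Phi_V])                 # concatenated dictionary
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero)
    omp.fit(Phi, s)
    x = omp.coef_
    x_A, x_V = x[:Phi_A.shape[1]], x[Phi_A.shape[1]:]
    return Phi_A @ x_A, Phi_V @ x_V                 # (A_hat, V_hat)
```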
Dictionary learning
Problem
Given a set of training signals, find a pair (Φ, X) such that every training signal S is sparse on Φ.

Iterative algorithm (a minimal sketch follows below):
- decompose every training signal S over Φ: S = Φ XS + RS
- optimize Φ given the signals and the XS, to minimize the quadratic error Σ_S ‖S − Φ XS‖2²

Application to ventricular cancellation:
- learn ΦA and ΦV
- no separate training data: how to learn 2 dictionaries from 1 mixture?
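The alternation above can be made concrete with a MOD-style update (least-squares dictionary refit). This is a sketch under assumptions the slides do not fix: OMP as the sparse decomposition step, a fixed sparsity k per signal, random initialization.

```python
import numpy as np

def omp(Phi, s, k):
    """Greedy orthogonal matching pursuit: pick k atoms for signal s."""
    r, support = s.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ r))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], s, rcond=None)
        r = s - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

def learn_dictionary(S, D, k, n_iter=20, seed=0):
    """Alternate sparse coding and a least-squares dictionary update
    (MOD-style), renormalising atoms to unit norm. S holds one training
    signal per column."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((S.shape[0], D))
    Phi /= np.linalg.norm(Phi, axis=0)
    for _ in range(n_iter):
        X = np.column_stack([omp(Phi, s, k) for s in S.T])   # decompose
        Phi = S @ np.linalg.pinv(X)                          # refit Phi
        Phi /= np.linalg.norm(Phi, axis=0) + 1e-12           # unit atoms
    return Phi
```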
Alternating ΦV and ΦA learning

Learn ΦV on S − Â, learn ΦA on S − V̂
Start with ΦV, since ‖V‖2 ≫ ‖A‖2
Initially Â = V̂ = 0, R = S

(Figure: the two dictionary learnings alternate, each one fed by the other's residual.)

The number of learnt patterns increases at each iteration
ΦA post-processing:
- the residual is concentrated on the QRS complexes
- remove spikes from the AF patterns
(a sketch of the alternation follows below)
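Putting the pieces together, the alternation might look as follows, reusing learn_dictionary and omp from the sketch above; dictionary sizes, sparsity and iteration counts are illustrative assumptions.

```python
import numpy as np

def alternate_learning(S, D_V=40, D_A=40, k=3, n_outer=5):
    """Alternately learn Phi_V on S - A_hat and Phi_A on S - V_hat,
    starting with Phi_V since the ventricular energy dominates."""
    A_hat, V_hat = np.zeros_like(S), np.zeros_like(S)   # initially 0, R = S
    for _ in range(n_outer):
        Phi_V = learn_dictionary(S - A_hat, D_V, k)
        V_hat = np.column_stack(
            [Phi_V @ omp(Phi_V, s, k) for s in (S - A_hat).T])
        Phi_A = learn_dictionary(S - V_hat, D_A, k)
        A_hat = np.column_stack(
            [Phi_A @ omp(Phi_A, s, k) for s in (S - V_hat).T])
    return Phi_V, Phi_A, A_hat, V_hat
```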
Evaluation on synthetic data
Data synthesis:
- A: numerical simulation of a physical heart model
- V: manual removal of the P wave from a healthy patient's ECG
4 patients, 21 simulated AF

(Figure: error during the QRS.)
Evaluation on synthetic data
Criteria:
- SIR: ratio of the other original sources in the estimated one
- SAR: ratio of computation artefacts
- SDR: ratio of all kinds of errors

Comparison with Average Beat Subtraction (ABS) [Lemay 2007]; all values in dB, the Input column giving the atrial SDR of the raw mixture:

Lead | Source | Input | Dictionaries: SDR SIR SAR | ABS: SDR SIR SAR
VR   | V      |       | 15.6  24.1  16.7          | 15.1  24.3  16.1
VR   | A      | -12.3 |  1.2  23.0   1.4          | -0.5  19.2   0.5
V1   | V      |       | 16.4  23.3  17.7          | 16.8  24.6  17.9
V1   | A      | -11.7 |  3.0  28.4   3.1          |  1.5  27.9   2.5
V4   | V      |       | 20.3  28.9  21.3          | 19.8  31.5  20.2
V4   | A      | -17.9 | -1.4  22.2  -1.3          | -1.9  21.1  -0.7
Atrial SDR is the main performance measure: average 1 dB gain over ABS.
Loss in ventricular SIR: the ventricular dictionary is still not discriminating enough.
Learnt dictionary on one lead
(Figure: learnt atrial (A) and ventricular (V) waveforms on one lead.)
What’s next?
Application to real data:
- how to evaluate the algorithm without the original sources?
Dictionary-based diagnosis:
- dictionary ≈ signal summary, without temporal information
Generalization:
- discriminative learning instead of post-processing? [Mairal 2008]
Average Beat Subtraction (ABS)
Hypotheses:
- except for ectopic beats, the VA is quite regular for a given patient
- AF and VA are uncorrelated

Algorithm (a sketch follows below):
- detect the QRS complexes
- compute a typical beat (or template) through PCA
- subtract it from each occurrence

AF and VA are uncorrelated → AF is averaged out of the template
VA energy is much higher than AF → slight errors lead to significant perturbation of the estimated AF
When several templates are learnt, they get corrupted with AF interference
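For reference, a minimal ABS sketch on one lead. The QRS detector here is a crude peak picker and the template is a plain beat average rather than the PCA template of the slides, so treat every threshold as an assumption.

```python
import numpy as np
from scipy.signal import find_peaks

def abs_cancel(ecg, fs, half_width=0.3):
    """Average Beat Subtraction: detect QRS peaks, average the aligned
    beats into a template, subtract the template at each occurrence."""
    w = int(half_width * fs)                       # samples around each QRS
    peaks, _ = find_peaks(np.abs(ecg), distance=int(0.4 * fs),
                          height=3.0 * np.std(ecg))
    peaks = [p for p in peaks if w <= p < len(ecg) - w]
    beats = np.array([ecg[p - w:p + w] for p in peaks])
    template = beats.mean(axis=0)                  # plain mean; slides use PCA
    residual = ecg.astype(float).copy()
    for p in peaks:
        residual[p - w:p + w] -= template
    return residual, template                      # residual ~ atrial activity
```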
1 - ECG analysis
2 - L1 minimization for dictionary learning

Rémi Gribonval, METISS team (audio signal processing, indexing, source separation), INRIA, Rennes, France
Karin Schnass, LTS2, EPFL, Switzerland

Workshop "Atomic decompositions in brain imaging: new avenues in signal processing"
Université de Montréal, September 14-19, 2009
Outline
• Preliminaries: blind source separation, dictionary learning & related problems
• Objectives of theoretical dictionary learning
• L1 minimization for dictionary learning
• Main results:
  - geometric "local" identifiability condition
  - random model and finite sample size analysis
• Discussion, conclusion & challenges
Sparse signal models
• An image / a signal = a sum of few atoms: b = Σ_k x_k a_k = Ax
  - dictionary A = collection of atoms a_k
  - representation = coefficient vector x
• Sparsity of x? Only if the dictionary is "well chosen"
  - pre-chosen atoms: wavelets, Gabor, etc.
  - learned dictionary: from a collection of signals / images, b_n = A x_n, 1 ≤ n ≤ N
Dictionary learning for sparse representations

• Sparse modeling: choose a dictionary
• Training image database → patch extraction → training patches
• Model: b_n = A x_n, 1 ≤ n ≤ N, with unknown sparse coefficients x_n and an unknown dictionary A
• Sparse learning: A = edge-like atoms [Olshausen & Field 96] = shifted edge-like motifs [Jost, Vandergheynst, Lesage & Gribonval 2005]
Dictionary learning?

• Problem: estimate a matrix A given observed samples b_n = A x_n, 1 ≤ n ≤ N, i.e. B = AX
  - A: unknown mixing matrix (blind source separation), unknown dictionary (sparse signal approximation), unknown channel filter (blind channel estimation), ...
  - X: unknown sources / signal representations / ...
• Fundamentally ill-posed factorization problem: need a (weak) model on the unknown coefficients X and / or the matrix A
Theoretical dictionary learning

• Problem: estimate a matrix A given samples b_n = A x_n, 1 ≤ n ≤ N, i.e. B = AX

ICA (Independent Component Analysis) vs SCA (Sparse Component Analysis):
- Model of ...: ICA: the probability density function p(X); SCA: the sample matrix X
- Assumption: ICA: independence, p(X) = Π_{n,k} p(x_n(k)); SCA: sparsity / geometry, many zeroes in X and the b_n concentrate around a union of low-dimensional subspaces
- Identifiability: ICA: Darmois theorem; SCA: [Georgiev, Theis & Cichocki 05], [Aharon, Elad & Bruckstein 06]
- Identification: ICA: contrast functions, W := argmin_W E_X(f(WAX)), A ∼ W⁻¹; SCA: combinatorial algorithms
- Issues: ICA: in practice, finite training sets (expectation → sample average); SCA: identifiability assumes highly sparse coefficients and (combinatorially?) many training examples
Holy grail: provably good + efficient sparse learning

• Sparse representations:
  - known matrix A
  - data model: b = A x0
  - recovery: x0 = argmin_x ‖x‖1 s.t. Ax = b
  - identifiability theorems: recovery holds whenever ‖x0‖0 ≤ k1(A)
  - much literature since 2001 (Donoho & Huo, Elad & Bruckstein, Gribonval & Nielsen, Candès & Romberg & Tao, Tropp, Donoho & Tanner, ... and many others)
• Dictionary learning:
  - unknown matrix A0
  - data model: B = A0 X0
  - recovery: (A0, X0) ∈ argmin_{A,X} ‖X‖1 s.t. AX = B ?
  - identifiability theorem? A0, X0 ∈ ?
  - most literature on Independent Component Analysis (ICA): density model rather than finite sample size geometric model
(a basis pursuit sketch for the known-dictionary case follows below)
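In the known-dictionary case, the x0 = argmin ‖x‖1 s.t. Ax = b step is a linear program. Here is a self-contained sketch using the standard split x = u − v with u, v ≥ 0; the toy dimensions are arbitrary.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """min ||x||_1 s.t. Ax = b, as an LP in x = u - v with u, v >= 0."""
    d = A.shape[1]
    c = np.ones(2 * d)                             # sum(u) + sum(v) = ||x||_1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

# toy check: a 1-sparse x0 is recovered when A has no duplicated atoms
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 16)); A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(16); x0[3] = 1.0
print(np.round(basis_pursuit(A, A @ x0), 3))       # ~x0
```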
Numerical example

• Cloud of 2500 training samples in R²:
  - ~1000 sparse (= on the axes)
  - ~1500 non-sparse
• Orthonormal basis with angle θ: A_θ = [a1(θ), a2(θ)]
• L1 criterion: θ ↦ ‖A_θ⁻¹ A0 X‖1
• Result: the global optimum is the original basis, and there is no other local minimum
(a numerical sketch of this experiment follows below)
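This 2-D experiment is easy to reproduce; a sketch follows (the point counts match the slide, the amplitudes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# ~1000 samples on the axes (1-sparse) and ~1500 dense ones, A0 = identity
sparse = np.zeros((2, 1000))
sparse[rng.integers(0, 2, 1000), np.arange(1000)] = rng.standard_normal(1000)
X = np.hstack([sparse, rng.standard_normal((2, 1500))])
B = X                                              # B = A0 X with A0 = I

def l1_criterion(theta):
    """||A_theta^{-1} B||_1 for the rotation basis A_theta."""
    c, s = np.cos(theta), np.sin(theta)
    A_inv = np.array([[c, s], [-s, c]])            # inverse rotation
    return np.abs(A_inv @ B).sum()

thetas = np.linspace(-np.pi / 4, np.pi / 4, 181)
vals = [l1_criterion(t) for t in thetas]
print("minimiser:", thetas[int(np.argmin(vals))])  # expected: ~0 (= A0)
```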
Numerical example: non-orthogonal bases

• Basis A_{θ1,θ2} parametrized by two angles; criterion (θ1, θ2) ↦ ‖A_{θ1,θ2}⁻¹ A0 X‖1
• Empirical observations:
  a) global minima match the original basis
  b) there is no other local minimum
Theoretical results

• "Local identifiability" for (non-overcomplete) L1 dictionary learning:
  - algebraic / geometric characterization of local minima
• Probability of identifiability:
  - model on X: random, weakly-sparse
  - analysis of identifiability for (small) finite sample size
Local identifiability result

• Assumptions:
  - X: for each row k, writing s_k = sign(X_k)ᵀ, up to column permutation the correlation vector has the decomposition X s_k = ‖X_k‖1 (e_k + d_k), and there exists such a d_k with d_k(k) = 0 and ‖d_k‖∞ < 1
  - A0 = basis of sufficiently incoherent unit atoms: ∀k ‖a_k‖2 = 1 and max_{k≠l} |⟨a_k, a_l⟩| ≪ 1
• Conclusion:
  - A0 is a local minimum of the L1 criterion among (not necessarily orthonormal) bases: for (A′, X′) ≈ (A0, X) with A′X′ = A0X, one has ‖X′‖1 ≥ ‖X‖1
(a numerical check of this condition is sketched below)
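Assuming the reconstruction of the condition above is right, it is cheap to test numerically: the quantity below is < 1 exactly when, for every row k, the off-diagonal correlations |⟨X_j, sign(X_k)⟩| stay below ‖X_k‖1.

```python
import numpy as np

def local_margin(X):
    """max over k of max_{j != k} |<X_j, sign(X_k)>| / ||X_k||_1;
    a value < 1 means a valid d_k with ||d_k||_inf < 1 exists for all k."""
    M = X @ np.sign(X).T               # M[j, k] = <X_j, sign(X_k)>
    diag = np.diag(M).copy()           # diag[k] = ||X_k||_1
    np.fill_diagonal(M, 0.0)
    return float(np.max(np.abs(M) / diag))
```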
Trivial example

• Same assumptions: X admits the decomposition above, A0 is a basis of sufficiently incoherent unit atoms
• If X has at most one nonzero entry per column (at unknown positions), the supports of distinct rows are disjoint, so X s_k = ‖X_k‖1 e_k: simply choose d_k = 0
• How robust is the condition to weakly-sparse outliers?
• How many samples N does it then typically require?
How many training samples?

• Dimension of the problem: signal dimension d, K atoms, N training samples
• General dictionary: K ≥ d; basis: K = d
• Required number of training samples:
  - with X = Id (1 atom = 1 training sample), maximum sparsity is achieved for N = K: then B = AX = A and each training sample b_n is an atom a_k
  - identifiability from N ≤ C K log K samples, for all "nice" A?
  - identifiability with weakly-sparse X?
Second result: probability of identifiability

• Random model for X = (x_kn) ∈ R^{K×N}: i.i.d. (sub)Gaussian entries (density p(x)), with a fraction of the entries set to zero at random (sparsity parameter p)
• Using concentration of measure:
  probability of failure ≤ C exp(aK log K − bN)
• Conclusion: local identifiability is guaranteed with high probability from only "few" training samples,
  N ≥ C(p) · K log K
  (almost linear in the dimension K, even for small p)
(a Monte Carlo sketch follows below)
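A quick Monte Carlo experiment under this random model, reusing local_margin from the sketch above; reading p as the probability that an entry stays nonzero is our assumption about the slide's parameter.

```python
import numpy as np

def random_X(K, N, p, rng):
    """i.i.d. Gaussian entries, each kept nonzero with probability p."""
    return rng.standard_normal((K, N)) * (rng.random((K, N)) < p)

rng = np.random.default_rng(0)
K, p = 16, 0.3
for N in [64, 256, 1024]:
    held = sum(local_margin(random_X(K, N, p, rng)) < 1 for _ in range(200))
    print(f"N = {N:4d}: condition held in {held}/200 draws")
```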
Summary

• L1-minimization for dictionary learning:
  - sufficient condition for local identifiability of bases
  - condition typically valid even with only weakly-sparse training samples, and even with relatively few training samples (non-combinatorial training set size)
• Consequence:
  - ideal convergence of descent algorithms, conditionally on a good initialization
  - conjecture: with high probability, no spurious local minima
Perspectives & challenges

• Main open questions:
  - probability of spurious local minima
  - optimization algorithm (the L1 criterion is nonconvex ...)
  - stability / robustness to noise / compressible X?
• Extensions:
  - other learning paradigms: efficiency? equivalence?
    greedy approaches ("deflation", ongoing work); alternate optimization (MOD, K-SVD, ...)
  - blind sparse deconvolution
  - learning general subspace arrangements / manifolds [cf. Yi Ma]