Dictionary learning for atrial fibrillation modelling
B. Mailhé, M. Lemay, R. Gribonval, J.-M. Vesin, P. Vandergheynst, F. Bimbot
IRISA - Université de Rennes 1, INRIA, CNRS / ITS - EPFL
September 12, 2009
Electrocardiography
Electrodes on the limbs and thorax measure skin potential
Record the potential difference between the feet and the other electrodes
Multichannel signal S, sum of atrial and ventricular activities:
S = (S1 ... S8) = A + V
Healthy patient heartbeat
Systole:
- atrial contraction to pump the blood into the ventricles (P wave)
- ventricular contraction to pump the blood out of the heart (QRS complex)
Diastole: ventricular relaxation (T wave)
Atrial fibrillation
The atria fibrillate instead of contracting.
The ventricles have to perform the systole on their own.
AF observation in the ECG
Observed mixture S = A + V
Much lower atrial energy: ‖A‖2 ≪ ‖V‖2 (between -10 dB and -20 dB)
The observation of A could help the diagnosis

Problem
Ventricular cancellation: given S, find an estimate Â of A.

Model of V:
- succession of QRST complexes
- strong inter-patient variability, but regular for a given patient
Model of AF:
- irregular oscillations
- less a priori knowledge than for V or healthy A, because of the difficulty of observing it
Sparse models
Signal s of length N
Dictionary Φ with D > N atoms

Definition: s is sparse on Φ iff
∃ (x, r) ∈ R^D × R^N, s = Φx + r, with ‖r‖2 ≪ ‖s‖2 and ‖x‖0 ≪ N

V is sparse on a wavelet dictionary
A is sparse on a time-frequency dictionary
Joint sparsity for multichannel signals
S = (S1 ... S8)
Each channel Sc is sparse on a dictionary Φc:
∀ 1 ≤ c ≤ 8, Sc = Φc Xc + Rc, with X = (X1 ... X8) and R = (R1 ... R8)

One looks for a decomposition X with the same non-zero coefficients on all channels:
‖R‖F ≪ ‖S‖F
‖X‖2,0 = #{d : ‖X_{d,:}‖2 ≠ 0} ≪ N (number of non-zero rows of X; see the sketch below)
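For concreteness, the row-support count ‖X‖2,0 takes a couple of lines of numpy; this is a minimal sketch (the tolerance is an assumption, not from the slides):

```python
import numpy as np

def l20_norm(X, tol=1e-12):
    """The l_{2,0} mixed norm: number of rows of X with non-zero l2 norm."""
    return int(np.sum(np.linalg.norm(X, axis=1) > tol))
```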
Morphological source separation [Starck 2004]
S = A + V
If A is sparse on ΦA and V is sparse on ΦV, then S is sparse on the concatenated dictionary [ΦA ΦV]:
S = [ΦA ΦV] X + R
If A is not sparse on ΦV and V is not sparse on ΦA, one can estimate the sources from X:
X = [XA; XV] (coefficients stacked), Â = ΦA XA, V̂ = ΦV XV

Application to ventricular cancellation [Divorra 2006]:
- Gabor / Gaussian spike dictionaries
- off-the-shelf dictionaries are bad at discriminating the other source
- how about learnt dictionaries?
(a sketch of this separation step follows below)
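As an illustration of the separation step, here is a hedged sketch: it decomposes one channel on a concatenated dictionary with orthogonal matching pursuit and splits the coefficients by block. The solver choice (scikit-learn's OMP) and the sparsity level n_nonzero are assumptions, not specified by the slides.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def separate(s, Phi_A, Phi_V, n_nonzero=30):
    """Decompose s on [Phi_A Phi_V], then read A_hat and V_hat off the
    two coefficient blocks (valid when each source is sparse only on
    its own dictionary)."""
    Phi = np.hstack([Phi_A, Phi_V])                 # concatenated dictionary
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero)
    omp.fit(Phi, s)
    x = omp.coef_
    x_A, x_V = x[:Phi_A.shape[1]], x[Phi_A.shape[1]:]
    return Phi_A @ x_A, Phi_V @ x_V                 # (A_hat, V_hat)
```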
Dictionary learning
Problem
Given a set of training signals, find a pair (Φ, X) such that every training signal S is sparse on Φ.

Iterative algorithm (a minimal sketch follows below):
- decompose every training signal S over Φ: S = Φ XS + RS
- optimize Φ given the signals and the XS, to minimize the quadratic error Σ_S ‖S − Φ XS‖2²

Application to ventricular cancellation:
- learn ΦA and ΦV
- no separate training data: how to learn 2 dictionaries from 1 mixture?
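The alternation above can be made concrete with a MOD-style update (least-squares dictionary refit). This is a sketch under assumptions the slides do not fix: OMP as the sparse decomposition step, a fixed sparsity k per signal, random initialization.

```python
import numpy as np

def omp(Phi, s, k):
    """Greedy orthogonal matching pursuit: pick k atoms for signal s."""
    r, support = s.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ r))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], s, rcond=None)
        r = s - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

def learn_dictionary(S, D, k, n_iter=20, seed=0):
    """Alternate sparse coding and a least-squares dictionary update
    (MOD-style), renormalising atoms to unit norm. S holds one training
    signal per column."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((S.shape[0], D))
    Phi /= np.linalg.norm(Phi, axis=0)
    for _ in range(n_iter):
        X = np.column_stack([omp(Phi, s, k) for s in S.T])   # decompose
        Phi = S @ np.linalg.pinv(X)                          # refit Phi
        Phi /= np.linalg.norm(Phi, axis=0) + 1e-12           # unit atoms
    return Phi
```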
Alternating ΦV and ΦA learning

Learn ΦV on S − Â, learn ΦA on S − V̂
Start with ΦV, since ‖V‖2 ≫ ‖A‖2
Initially Â = V̂ = 0, R = S

(Figure: the two dictionary learnings alternate, each one fed by the other's residual.)

The number of learnt patterns increases at each iteration
ΦA post-processing:
- the residual is concentrated on the QRS complexes
- remove spikes from the AF patterns
(a sketch of the alternation follows below)
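Putting the pieces together, the alternation might look as follows, reusing learn_dictionary and omp from the sketch above; dictionary sizes, sparsity and iteration counts are illustrative assumptions.

```python
import numpy as np

def alternate_learning(S, D_V=40, D_A=40, k=3, n_outer=5):
    """Alternately learn Phi_V on S - A_hat and Phi_A on S - V_hat,
    starting with Phi_V since the ventricular energy dominates."""
    A_hat, V_hat = np.zeros_like(S), np.zeros_like(S)   # initially 0, R = S
    for _ in range(n_outer):
        Phi_V = learn_dictionary(S - A_hat, D_V, k)
        V_hat = np.column_stack(
            [Phi_V @ omp(Phi_V, s, k) for s in (S - A_hat).T])
        Phi_A = learn_dictionary(S - V_hat, D_A, k)
        A_hat = np.column_stack(
            [Phi_A @ omp(Phi_A, s, k) for s in (S - V_hat).T])
    return Phi_V, Phi_A, A_hat, V_hat
```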
Evaluation on synthetic data
Data synthesis:
- A: numerical simulation of a physical heart model
- V: manual removal of the P wave from a healthy patient's ECG
4 patients, 21 simulated AF

(Figure: error during the QRS.)
Evaluation on synthetic data
Criteria:
- SIR: ratio of the other original sources in the estimated one
- SAR: ratio of computation artefacts
- SDR: ratio of all kinds of errors

Comparison with Average Beat Subtraction (ABS) [Lemay 2007]; all values in dB, the Input column giving the atrial SDR of the raw mixture:

Lead | Source | Input | Dictionaries: SDR SIR SAR | ABS: SDR SIR SAR
VR   | V      |       | 15.6  24.1  16.7          | 15.1  24.3  16.1
VR   | A      | -12.3 |  1.2  23.0   1.4          | -0.5  19.2   0.5
V1   | V      |       | 16.4  23.3  17.7          | 16.8  24.6  17.9
V1   | A      | -11.7 |  3.0  28.4   3.1          |  1.5  27.9   2.5
V4   | V      |       | 20.3  28.9  21.3          | 19.8  31.5  20.2
V4   | A      | -17.9 | -1.4  22.2  -1.3          | -1.9  21.1  -0.7
Atrial SDR is the main performance measure: average 1 dB gain over ABS.
Loss in ventricular SIR: the ventricular dictionary is still not discriminating enough.
Learnt dictionary on one lead
(Figure: learnt atrial (A) and ventricular (V) waveforms on one lead.)
What’s next?
Application to real data:
- how to evaluate the algorithm without the original sources?
Dictionary-based diagnosis:
- dictionary ≈ signal summary, without temporal information
Generalization:
- discriminative learning instead of post-processing? [Mairal 2008]
Average Beat Subtraction (ABS)
Hypotheses:
- except for ectopic beats, the VA is quite regular for a given patient
- AF and VA are uncorrelated

Algorithm (a sketch follows below):
- detect the QRS complexes
- compute a typical beat (or template) through PCA
- subtract it from each occurrence

AF and VA are uncorrelated → AF is averaged out of the template
VA energy is much higher than AF → slight errors lead to significant perturbation of the estimated AF
When several templates are learnt, they get corrupted with AF interference
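For reference, a minimal ABS sketch on one lead. The QRS detector here is a crude peak picker and the template is a plain beat average rather than the PCA template of the slides, so treat every threshold as an assumption.

```python
import numpy as np
from scipy.signal import find_peaks

def abs_cancel(ecg, fs, half_width=0.3):
    """Average Beat Subtraction: detect QRS peaks, average the aligned
    beats into a template, subtract the template at each occurrence."""
    w = int(half_width * fs)                       # samples around each QRS
    peaks, _ = find_peaks(np.abs(ecg), distance=int(0.4 * fs),
                          height=3.0 * np.std(ecg))
    peaks = [p for p in peaks if w <= p < len(ecg) - w]
    beats = np.array([ecg[p - w:p + w] for p in peaks])
    template = beats.mean(axis=0)                  # plain mean; slides use PCA
    residual = ecg.astype(float).copy()
    for p in peaks:
        residual[p - w:p + w] -= template
    return residual, template                      # residual ~ atrial activity
```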
1 - ECG analysis
2 - L1 minimization for dictionary learning

Rémi Gribonval, METISS team (audio signal processing, indexing, source separation), INRIA, Rennes, France
Karin Schnass, LTS2, EPFL, Switzerland

Workshop "Atomic decompositions in brain imaging: new avenues in signal processing"
Université de Montréal, September 14-19, 2009
Outline
• Preliminaries: blind source separation, dictionary learning & related problems
• Objectives of theoretical dictionary learning
• L1 minimization for dictionary learning
• Main results:
  - geometric "local" identifiability condition
  - random model and finite sample size analysis
• Discussion, conclusion & challenges
Sparse signal models
• An image / a signal = a sum of few atoms: b = Σ_k x_k a_k = Ax
  - dictionary A = collection of atoms a_k
  - representation = coefficient vector x
• Sparsity of x? Only if the dictionary is "well chosen"
  - pre-chosen atoms: wavelets, Gabor, etc.
  - learned dictionary: from a collection of signals / images, b_n = A x_n, 1 ≤ n ≤ N
Dictionary learning for sparse representations

• Sparse modeling: choose a dictionary
• Training image database → patch extraction → training patches
• Model: b_n = A x_n, 1 ≤ n ≤ N, with unknown sparse coefficients x_n and an unknown dictionary A
• Sparse learning: A = edge-like atoms [Olshausen & Field 96] = shifted edge-like motifs [Jost, Vandergheynst, Lesage & Gribonval 2005]
Dictionary learning?

• Problem: estimate a matrix A given observed samples b_n = A x_n, 1 ≤ n ≤ N, i.e. B = AX
  - A: unknown mixing matrix (blind source separation), unknown dictionary (sparse signal approximation), unknown channel filter (blind channel estimation), ...
  - X: unknown sources / signal representations / ...
• Fundamentally ill-posed factorization problem: need a (weak) model on the unknown coefficients X and / or the matrix A
Theoretical dictionary learning

• Problem: estimate a matrix A given samples b_n = A x_n, 1 ≤ n ≤ N, i.e. B = AX

ICA (Independent Component Analysis) vs SCA (Sparse Component Analysis):
- Model of ...: ICA: the probability density function p(X); SCA: the sample matrix X
- Assumption: ICA: independence, p(X) = Π_{n,k} p(x_n(k)); SCA: sparsity / geometry, many zeroes in X and the b_n concentrate around a union of low-dimensional subspaces
- Identifiability: ICA: Darmois theorem; SCA: [Georgiev, Theis & Cichocki 05], [Aharon, Elad & Bruckstein 06]
- Identification: ICA: contrast functions, W := argmin_W E_X(f(WAX)), A ∼ W⁻¹; SCA: combinatorial algorithms
- Issues: ICA: in practice, finite training sets (expectation → sample average); SCA: identifiability assumes highly sparse coefficients and (combinatorially?) many training examples
Holy grail: provably good + efficient sparse learning

• Sparse representations:
  - known matrix A
  - data model: b = A x0
  - recovery: x0 = argmin_x ‖x‖1 s.t. Ax = b
  - identifiability theorems: recovery holds whenever ‖x0‖0 ≤ k1(A)
  - much literature since 2001 (Donoho & Huo, Elad & Bruckstein, Gribonval & Nielsen, Candès & Romberg & Tao, Tropp, Donoho & Tanner, ... and many others)
• Dictionary learning:
  - unknown matrix A0
  - data model: B = A0 X0
  - recovery: (A0, X0) ∈ argmin_{A,X} ‖X‖1 s.t. AX = B ?
  - identifiability theorem? A0, X0 ∈ ?
  - most literature on Independent Component Analysis (ICA): density model rather than finite sample size geometric model
(a basis pursuit sketch for the known-dictionary case follows below)
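In the known-dictionary case, the x0 = argmin ‖x‖1 s.t. Ax = b step is a linear program. Here is a self-contained sketch using the standard split x = u − v with u, v ≥ 0; the toy dimensions are arbitrary.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """min ||x||_1 s.t. Ax = b, as an LP in x = u - v with u, v >= 0."""
    d = A.shape[1]
    c = np.ones(2 * d)                             # sum(u) + sum(v) = ||x||_1
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

# toy check: a 1-sparse x0 is recovered when A has no duplicated atoms
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 16)); A /= np.linalg.norm(A, axis=0)
x0 = np.zeros(16); x0[3] = 1.0
print(np.round(basis_pursuit(A, A @ x0), 3))       # ~x0
```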
Numerical example

• Cloud of 2500 training samples in R²:
  - ~1000 sparse (= on the axes)
  - ~1500 non-sparse
• Orthonormal basis with angle θ: A_θ = [a1(θ), a2(θ)]
• L1 criterion: θ ↦ ‖A_θ⁻¹ A0 X‖1
• Result: the global optimum is the original basis, and there is no other local minimum
(a numerical sketch of this experiment follows below)
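This 2-D experiment is easy to reproduce; a sketch follows (the point counts match the slide, the amplitudes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# ~1000 samples on the axes (1-sparse) and ~1500 dense ones, A0 = identity
sparse = np.zeros((2, 1000))
sparse[rng.integers(0, 2, 1000), np.arange(1000)] = rng.standard_normal(1000)
X = np.hstack([sparse, rng.standard_normal((2, 1500))])
B = X                                              # B = A0 X with A0 = I

def l1_criterion(theta):
    """||A_theta^{-1} B||_1 for the rotation basis A_theta."""
    c, s = np.cos(theta), np.sin(theta)
    A_inv = np.array([[c, s], [-s, c]])            # inverse rotation
    return np.abs(A_inv @ B).sum()

thetas = np.linspace(-np.pi / 4, np.pi / 4, 181)
vals = [l1_criterion(t) for t in thetas]
print("minimiser:", thetas[int(np.argmin(vals))])  # expected: ~0 (= A0)
```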
Numerical example: non-orthogonal bases

• Basis A_{θ1,θ2} parametrized by two angles; criterion (θ1, θ2) ↦ ‖A_{θ1,θ2}⁻¹ A0 X‖1
• Empirical observations:
  a) global minima match the original basis
  b) there is no other local minimum
Theoretical results

• "Local identifiability" for (non-overcomplete) L1 dictionary learning:
  - algebraic / geometric characterization of local minima
• Probability of identifiability:
  - model on X: random, weakly-sparse
  - analysis of identifiability for (small) finite sample size
Local identifiability result

• Assumptions:
  - X: for each row k, writing s_k = sign(X_k)ᵀ, up to column permutation the correlation vector has the decomposition X s_k = ‖X_k‖1 (e_k + d_k), and there exists such a d_k with d_k(k) = 0 and ‖d_k‖∞ < 1
  - A0 = basis of sufficiently incoherent unit atoms: ∀k ‖a_k‖2 = 1 and max_{k≠l} |⟨a_k, a_l⟩| ≪ 1
• Conclusion:
  - A0 is a local minimum of the L1 criterion among (not necessarily orthonormal) bases: for (A′, X′) ≈ (A0, X) with A′X′ = A0X, one has ‖X′‖1 ≥ ‖X‖1
(a numerical check of this condition is sketched below)
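Assuming the reconstruction of the condition above is right, it is cheap to test numerically: the quantity below is < 1 exactly when, for every row k, the off-diagonal correlations |⟨X_j, sign(X_k)⟩| stay below ‖X_k‖1.

```python
import numpy as np

def local_margin(X):
    """max over k of max_{j != k} |<X_j, sign(X_k)>| / ||X_k||_1;
    a value < 1 means a valid d_k with ||d_k||_inf < 1 exists for all k."""
    M = X @ np.sign(X).T               # M[j, k] = <X_j, sign(X_k)>
    diag = np.diag(M).copy()           # diag[k] = ||X_k||_1
    np.fill_diagonal(M, 0.0)
    return float(np.max(np.abs(M) / diag))
```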
Trivial example

• Same assumptions: X admits the decomposition above, A0 is a basis of sufficiently incoherent unit atoms
• If X has at most one nonzero entry per column (at unknown positions), the supports of distinct rows are disjoint, so X s_k = ‖X_k‖1 e_k: simply choose d_k = 0
• How robust is the condition to weakly-sparse outliers?
• How many samples N does it then typically require?
How many training samples?

• Dimension of the problem: signal dimension d, K atoms, N training samples
• General dictionary: K ≥ d; basis: K = d
• Required number of training samples:
  - with X = Id (1 atom = 1 training sample), maximum sparsity is achieved for N = K: then B = AX = A and each training sample b_n is an atom a_k
  - identifiability from N ≤ C K log K samples, for all "nice" A?
  - identifiability with weakly-sparse X?
Second result: probability of identifiability

• Random model for X = (x_kn) ∈ R^{K×N}: i.i.d. (sub)Gaussian entries (density p(x)), with a fraction of the entries set to zero at random (sparsity parameter p)
• Using concentration of measure:
  probability of failure ≤ C exp(aK log K − bN)
• Conclusion: local identifiability is guaranteed with high probability from only "few" training samples,
  N ≥ C(p) · K log K
  (almost linear in the dimension K, even for small p)
(a Monte Carlo sketch follows below)
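A quick Monte Carlo experiment under this random model, reusing local_margin from the sketch above; reading p as the probability that an entry stays nonzero is our assumption about the slide's parameter.

```python
import numpy as np

def random_X(K, N, p, rng):
    """i.i.d. Gaussian entries, each kept nonzero with probability p."""
    return rng.standard_normal((K, N)) * (rng.random((K, N)) < p)

rng = np.random.default_rng(0)
K, p = 16, 0.3
for N in [64, 256, 1024]:
    held = sum(local_margin(random_X(K, N, p, rng)) < 1 for _ in range(200))
    print(f"N = {N:4d}: condition held in {held}/200 draws")
```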
Summary

• L1-minimization for dictionary learning:
  - sufficient condition for local identifiability of bases
  - condition typically valid even with only weakly-sparse training samples, and even with relatively few training samples (non-combinatorial training set size)
• Consequence:
  - ideal convergence of descent algorithms, conditionally on a good initialization
  - conjecture: with high probability, no spurious local minima
Perspectives & challenges

• Main open questions:
  - probability of spurious local minima
  - optimization algorithm (the L1 criterion is nonconvex ...)
  - stability / robustness to noise / compressible X?
• Extensions:
  - other learning paradigms: efficiency? equivalence?
    greedy approaches ("deflation", ongoing work); alternate optimization (MOD, K-SVD, ...)
  - blind sparse deconvolution
  - learning general subspace arrangements / manifolds [cf. Yi Ma]