seong jae hwang zirui tao won hwa kim vikas singhpitt.edu/~sjh95/publication/iccv2019_poster.pdf ·...

1
Conditional Recurrent Flow: Conditional Generation of Longitudinal Samples with Applications to Neuroimaging Seong Jae Hwang Zirui Tao Won Hwa Kim Vikas Singh M OTIVATION :L ONGITUDINAL N EUROIMAGING A NALYSIS Goal: Understand the progression of longitudinal neuroimaging measures (e.g., PET) of people with various covariate progressions (e.g., “Abnormal”: HighLow cognition vs. “Normal”: HighHigh cognition) Challenge: Difficult to perform strong statistical analysis with small sample size of longitudinal neuroimaging datasets Figure: Longitudinal neuroimaging analysis of Alzheimer’s disease (AD) pathology. Understanding how amyloid (AD pathology) accumulates differently in brain regions between people of varying conditions may help us to better understand the underlying disease mechanism. O BJECTIVE :G ENERATION OF N EUROIMAGING M EASURES Solution: Conditional generation of longitudinal neuroimaging measures via Conditional Recurrent Flow (CRow) Example of Figure below with a trained CRow model: 1) Given: A sequential condition of decreasing cognition (i.e., a memory test score sequence y 1 i y 2 i y 3 i indicating HighMediumLow Cognition performance). 2) Model: Conditional Recurrent Flow (CRow). 3) Generate: A sequence of brain image progression x 1 i x 2 i x 3 i corresponding to the given cognition progression (i.e., brain regions with high (red) and low (blue) disease pathology). Conditional Recurrent Flow High Cognition Med Cognition Low Cognition Sequential Condition Generated Sequence Real Data Sequence Our Model Figure: Conditional sequence generation illustration. The Generated Sequence follows the trend of the Real Data Sequence (i.e., similar () to the real brain image progression) from the subjects with similarly decreasing cognition scores. P RELIMINARY 1: N ORMALIZING F LOW AND C OUPLING LAYER Normalizing Flow: Maps a sample x to a latent variable z = f (x) where z is from a standard normal distribution Z: p X (x)= p Z (z)/|J X |, |J X | = [x = f -1 (z)] z (1) where |J X | is a Jacobian determinant. Coupling Layer [Dinh et al., 2016]: Allows an exactly invertible mapping u v with subnetworks r and s . 1. Forward map (Fig. (a)): v 1 = u 1 , v 2 = u 2 exp(s (u 1 )) + r (u 1 ) (2) 2. Inverse map (Fig. (b)): u 1 = v 1 , u 2 =(v 2 - r (v 1 )) exp(s (v 1 )) (3) (a) Forward map (b) Inverse map Figure: Coupling layer in normalizing flow. Advantage: Easy Jacobian determinant computation via triangular Jacobian matrix J v : |J v | = v u = v 1 u 1 v 1 u 2 v 2 u 1 v 2 u 2 = I 0 v 2 u 1 diag (exp s (u 1 )) = exp( X i (s (u 1 )) i ) (4) Coupling Block: Apply the transformation on both partitions. 1. Forward map: v 1 = u 1 exp(s 2 (u 2 )) + r 2 (u 2 ), v 2 = u 2 exp(s 1 (v 1 )) + r 1 (v 1 ) (5) 2. Inverse map: u 2 =(v 2 - r 1 (v 1 )) exp(s 1 (v 1 )), u 1 =(v 1 - r 2 (u 2 )) exp(s 2 (u 2 )) (6) P RELIMINARY 2: C ONDITIONAL S AMPLE G ENERATION Conditional Invertible Neural Network [Ardizonne et al., 2019]: A conditional invertible mapping between x [y, z] using Coupling Layer. Figure: Conditional Sample Generation. (1) Sample z Z, (2) Assign label y (i.e., color), (3) Inverse map to generate x = f -1 ([y, z]) in the appropriate location. M ETHOD :C ONDITIONAL R ECURRENT F LOW (CR OW ) Conditional Recurrent Flow (CRow) [Hwang et al., 2019]: Given: Sequential data x t X and label/covariate y t Y. At each t with given [u t 1 , u t 2 ]= u t x t and [v t 1 , v t 2 ]= v t [y t , z t ], 1. Forward Map: v t 1 = u t 1 exp(q s 2 (u t 2 , h t -1 2 )) + q r 2 (u t 2 , h t -1 2 ), v t 2 = u t 2 exp(q s 1 (v t 1 , h t -1 1 )) + q r 1 (v t 1 , h t -1 1 ) 2. Inverse Map: u t 2 =(v t 2 - q r 1 (v t 1 , h t -1 1 )) exp(q s 1 (v t 1 , h t -1 1 )), u t 1 =(v t 1 - q r 2 (u t 2 , h t -1 2 )) exp(q s 2 (u t 2 , h t -1 2 )) where q (·) is a recurrent subnetwork (e.g., GRU). Advantage: Longitudinal and conditional data generation along with density estimation. Figure: The CRow model. Temporal Context Gating (TCG): Context-based (i.e., history) transformation of the “non-transformed partition” (e.g., u 1 in Eq. (2)). f TCG (α t , h t -1 )= α t cgate (h t -1 ) (forward), f -1 TCG (α t , h t -1 )= α t cgate (h t -1 ) (inverse) Advantage: Additional recurrent power while preserving the triangular Jacobian matrix for fast Jacobian determinant computation: |J v | = v 1 u 1 v 1 u 2 v 2 u 1 v 2 u 2 = diag (cgate (h t -1 )) 0 v 2 u 1 diag (exp s (u 1 )) =[ Y j cgate (h t -1 ) j ] * [exp( X i (s (u 1 )) i )] E XPERIMENT 1: G ENERATE M OVING MNIST S EQUENCES Q: Can we generate new data sequences given new sequential conditions? (1) Train with label sequences y : 5 5 5 5 5 5 and y : 9 9 9 9 9 9 (2) Generate data sequences given a newly seen label sequences y which changes label midway: 5 5 9 9 9 9 59: Figure: Generated sequences using CRow. (top of each frame, [digit label]: density) Ankle BootSneaker: Figure: Examples of generated Moving Fashion MNIST sequences using CRow. E XPERIMENT 2: N EUROIMAGING FOR A LZHEIMER S D ISEASE Q: Can we detect the differences in the Alzheimer’s disease (AD) pathology progression between people with “Normal” and “Abnormal” covariate progressions? (Idea illustrated in Fig. of Motivation Sec.) Dataset: N = 276 Amyloid PET images (AV45) of T = 3 time points from Alzheimer’s Disease Neuroimaging Initiative (ADNI) in 82 Desikan Atlas regions 1. Longitudinal region-wise amyloid measures: x t R 82 for t = 1, 2, 3 2. Longitudinal covariates: y t R for t = 1, 2, 3 (e.g., cognition) Analysis Setup: For each covariate type, (1) Training : Train with all N = 276 subjects (2) Generation : Generate (i) Group A: 100 samples of x t given “Abnormal” y t and (ii) Group B: 100 samples of x t given “Normal” y t (3) Statistical Analysis : At t = 3, perform a group difference test between Group A and Group B in each region. Count significantly differences regions. Figure: Generated sequences vs. real data sequences comparison for CN (top)MCI (middle)AD (bottom). Left (blue frames): The average of the 100 generated sequences conditioned on CNMCIAD. Right (pink frames): The average of the real samples with CNMCIAD in the dataset. Red/blue indicate high/low AV45. ROIs are expected to turn more red as CNMCIAD. The generated samples show magnitudes and sequential patterns similar () to those of the real samples from the training data. How well did CRow improve the statistical analysis? # of Statistically Significant ROIs (# of ROIs after type-I error correction) Covariates Diagnosis ADAS13 MMSE RAVLT-I CDR-SB Control CNCNCN 101010 303030 707070 000 Progression CNMCIAD 102030 302622 705030 0510 cINN (N 1 =100 / N 2 = 100) 11 (4) 5 (2) 5 (0) 3 (0) 7 (0) Ours (N 1 =100 / N 2 = 100) 25 (11) 24 (12) 19 (2) 15 (2) 18 (7) Ours + TCG (N 1 =100 / N 2 = 100) 28 (12) 32 (14) 31 (2) 19 (2) 25 (9) Control CNCNCN 101010 303030 707070 000 Early-progression CNMCIMCI 101316 302826 706050 024 cINN (N 1 =150 / N 2 = 150) 2 (0) 2 (2) 2 (0) 0 (0) 1 (0) Ours (N 1 =150 / N 2 = 150) 6 (2) 6 (4) 11 (4) 5 (1) 2 (0) Ours + TCG (N 1 =150 / N 2 = 150) 6 (4) 8 (5) 12 (4) 5 (1) 5 (1) Table: Number of ROIs identified by statistical group analysis using the generated measures with respect to various covariates associated with AD at significance level α = 0.01 (type-I error controlled result shown in parenthesis). Each column denotes sequences of disease progression represented by diagnosis/test scores. How similar are the generated sequences to the real sequences? Cohen’s d of Gen. vs. Real of Progressions Cohen’s d of Gen. vs. Real of Early-progressions Covariates Diagnosis ADAS13 MMSE RAVLT-I CDR-SB Diagnosis ADAS13 MMSE RAVLT-I CDR-SB cINN 1.255 1.596 1.149 1.894 1.551 1.065 1.498 0.948 1.843 1.454 Ours 0.419 0.556 0.348 0.711 0.645 0.359 0.561 0.295 0.613 0.625 Ours+TCG 0.282 0.391 0.167 0.588 0.377 0.234 0.5248 0.090 0.544 0.499 Table: Difference between the generated sequences and the real sequences at t = 3. Lower the effect size (Cohen’s d), smaller the difference between the comparing distributions. International Conference on Computer Vision (ICCV) 2019 Research supported by NIH (R01AG040396, R01EB022883, R01AG062336, R01AG059312), UW CPCP (U54AI117924), UW CIBM (T15LM007359) NSF CAREER Award (1252725), USDOT Research and Innovative Technology Administration (69A3551747134), and UTA Research Enhancement Program (REP).

Upload: others

Post on 18-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Seong Jae Hwang Zirui Tao Won Hwa Kim Vikas Singhpitt.edu/~sjh95/publication/iccv2019_poster.pdf · Seong Jae Hwang Zirui Tao Won Hwa Kim Vikas Singh MOTIVATION: LONGITUDINAL NEUROIMAGING

Conditional Recurrent Flow: Conditional Generation of Longitudinal Sampleswith Applications to Neuroimaging

Seong Jae Hwang Zirui Tao Won Hwa Kim Vikas SinghMOTIVATION: LONGITUDINAL NEUROIMAGING ANALYSIS

Goal: Understand the progression of longitudinal neuroimaging measures(e.g., PET) of people with various covariate progressions (e.g., “Abnormal”:High→Low cognition vs. “Normal”: High→High cognition)Challenge: Difficult to perform strong statistical analysis with small samplesize of longitudinal neuroimaging datasets

Figure: Longitudinal neuroimaging analysis of Alzheimer’s disease (AD) pathology.Understanding how amyloid (AD pathology) accumulates differently in brain regions betweenpeople of varying conditions may help us to better understand the underlying diseasemechanism.

OBJECTIVE: GENERATION OF NEUROIMAGING MEASURESSolution: Conditional generation of longitudinal neuroimaging measures viaConditional Recurrent Flow (CRow)Example of Figure below with a trained CRow model:1) Given: A sequential condition of decreasing cognition (i.e., a memory testscore sequence y1

i → y2i → y3

i indicating High→Medium→Low Cognitionperformance).2) Model: Conditional Recurrent Flow (CRow).3) Generate: A sequence of brain image progression x1

i → x2i → x3

icorresponding to the given cognition progression (i.e., brain regions with high(red) and low (blue) disease pathology).

Conditional Recurrent Flow

High Cognition Med Cognition Low Cognition

≈ ≈ ≈

Sequential

Condition

Generated

Sequence

Real Data

Sequence

Our Model

Figure: Conditional sequence generation illustration. The Generated Sequence follows thetrend of the Real Data Sequence (i.e., similar (≈) to the real brain image progression) from thesubjects with similarly decreasing cognition scores.

PRELIMINARY 1: NORMALIZING FLOW AND COUPLING LAYERNormalizing Flow: Maps a sample x to a latent variable z = f (x) where z isfrom a standard normal distribution Z:

pX(x) = pZ(z)/|JX|, |JX| =

∣∣∣∣∣∂[x = f−1(z)]∂z

∣∣∣∣∣ (1)

where |JX| is a Jacobian determinant.

Coupling Layer [Dinh et al., 2016]: Allows an exactly invertible mapping u↔ vwith subnetworks r and s.1. Forward map (Fig. (a)):

v1 = u1, v2 = u2 ⊗ exp(s(u1)) + r (u1) (2)2. Inverse map (Fig. (b)):

u1 = v1, u2 = (v2 − r (v1))� exp(s(v1)) (3)

(a) Forward map (b) Inverse map

Figure: Coupling layer in normalizing flow.

Advantage: Easy Jacobian determinant computation via triangular Jacobianmatrix Jv:

|Jv| =∣∣∣∣∂v∂u

∣∣∣∣ =∣∣∣∣∣∂v1∂u1

∂v1∂u2

∂v2∂u1

∂v2∂u2

∣∣∣∣∣ =∣∣∣∣∣ I 0∂v2∂u1

diag(exp s(u1))

∣∣∣∣∣ = exp(∑

i

(s(u1))i) (4)

Coupling Block: Apply the transformation on both partitions.1. Forward map:

v1 = u1 ⊗ exp(s2(u2)) + r2(u2), v2 = u2 ⊗ exp(s1(v1)) + r1(v1) (5)2. Inverse map:

u2 = (v2 − r1(v1))� exp(s1(v1)), u1 = (v1 − r2(u2))� exp(s2(u2)) (6)

PRELIMINARY 2: CONDITIONAL SAMPLE GENERATIONConditional Invertible Neural Network [Ardizonne et al., 2019]: A conditionalinvertible mapping between x↔ [y, z] using Coupling Layer.

Figure: Conditional Sample Generation. (1) Sample z ∼ Z, (2) Assign label y (i.e., color), (3)Inverse map to generate x = f−1([y, z]) in the appropriate location.

METHOD: CONDITIONAL RECURRENT FLOW (CROW)Conditional Recurrent Flow (CRow) [Hwang et al., 2019]:Given: Sequential data xt ∈ X and label/covariate yt ∈ Y.At each t with given [ut

1,ut2] = ut ← xt and [vt

1,vt2] = vt ← [yt, zt],

1. Forward Map:vt

1 = ut1 ⊗ exp(qs2(u

t2,h

t−12 )) + qr2(u

t2,h

t−12 ), vt

2 = ut2 ⊗ exp(qs1(v

t1,h

t−11 )) + qr1(v

t1,h

t−11 )

2. Inverse Map:ut

2 = (vt2 − qr1(v

t1,h

t−11 ))� exp(qs1(v

t1,h

t−11 )), ut

1 = (vt1 − qr2(u

t2,h

t−12 ))� exp(qs2(u

t2,h

t−12 ))

where q(·) is a recurrent subnetwork (e.g., GRU).Advantage: Longitudinal and conditional data generation along with densityestimation.

Figure: The CRow model.

Temporal Context Gating (TCG): Context-based (i.e., history) transformationof the “non-transformed partition” (e.g., u1 in Eq. (2)).

fTCG(αt,ht−1) = αt ⊗ cgate(ht−1) (forward), f−1

TCG(αt,ht−1) = αt � cgate(ht−1) (inverse)

Advantage: Additional recurrent power while preserving the triangularJacobian matrix for fast Jacobian determinant computation:

|Jv| =

∣∣∣∣∣∂v1∂u1

∂v1∂u2

∂v2∂u1

∂v2∂u2

∣∣∣∣∣ =∣∣∣∣∣diag(cgate(ht−1)) 0

∂v2∂u1

diag(exp s(u1))

∣∣∣∣∣ = [∏

j

cgate(ht−1)j] ∗ [exp(∑

i

(s(u1))i)]

EXPERIMENT 1: GENERATE MOVING MNIST SEQUENCES

Q: Can we generate new data sequences given new sequential conditions?(1) Train with label sequences y : 5→5→5→5→5→5 and y : 9→9→9→9→9→9(2) Generate data sequences given a newly seen label sequences y whichchanges label midway: 5→5→9→9→9→9

5→9:

Figure: Generated sequences using CRow. (top of each frame, [digit label]: density)

Ankle Boot→Sneaker:

Figure: Examples of generated Moving Fashion MNIST sequences using CRow.

EXPERIMENT 2: NEUROIMAGING FOR ALZHEIMER’S DISEASEQ: Can we detect the differences in the Alzheimer’s disease (AD) pathologyprogression between people with “Normal” and “Abnormal” covariateprogressions? (Idea illustrated in Fig. of Motivation Sec.)

Dataset: N = 276 Amyloid PET images (AV45) of T = 3 time points fromAlzheimer’s Disease Neuroimaging Initiative (ADNI) in 82 Desikan Atlas regions⇒ 1. Longitudinal region-wise amyloid measures: xt ∈ R82 for t = 1,2,3

2. Longitudinal covariates: yt ∈ R for t = 1,2,3 (e.g., cognition)

Analysis Setup: For each covariate type,(1) Training: Train with all N = 276 subjects(2) Generation: Generate (i) Group A: 100 samples of xt given “Abnormal” yt

and (ii) Group B: 100 samples of xt given “Normal” yt

(3) Statistical Analysis: At t = 3, perform a group difference test betweenGroup A and Group B in each region. Count significantly differences regions.

Figure: Generated sequences vs. real data sequences comparison for CN (top)→MCI(middle)→AD (bottom). Left (blue frames): The average of the 100 generated sequencesconditioned on CN→MCI→AD. Right (pink frames): The average of the real samples withCN→MCI→AD in the dataset. Red/blue indicate high/low AV45. ROIs are expected to turn morered as CN→MCI→AD. The generated samples show magnitudes and sequential patterns similar(≈) to those of the real samples from the training data.

How well did CRow improve the statistical analysis?# of Statistically Significant ROIs (# of ROIs after type-I error correction)

Covariates Diagnosis ADAS13 MMSE RAVLT-I CDR-SBControl CN→CN→CN 10→10→10 30→30→30 70→70→70 0→0→0Progression CN→MCI→AD 10→20→30 30→26→22 70→50→30 0→5→10cINN (N1=100 / N2= 100) 11 (4) 5 (2) 5 (0) 3 (0) 7 (0)Ours (N1=100 / N2= 100) 25 (11) 24 (12) 19 (2) 15 (2) 18 (7)Ours + TCG (N1=100 / N2= 100) 28 (12) 32 (14) 31 (2) 19 (2) 25 (9)Control CN→CN→CN 10→10→10 30→30→30 70→70→70 0→0→0Early-progression CN→MCI→MCI 10→13→16 30→28→26 70→60→50 0→2→4cINN (N1=150 / N2= 150) 2 (0) 2 (2) 2 (0) 0 (0) 1 (0)Ours (N1=150 / N2= 150) 6 (2) 6 (4) 11 (4) 5 (1) 2 (0)Ours + TCG (N1=150 / N2= 150) 6 (4) 8 (5) 12 (4) 5 (1) 5 (1)

Table: Number of ROIs identified by statistical group analysis using the generated measures withrespect to various covariates associated with AD at significance level α = 0.01 (type-I errorcontrolled result shown in parenthesis). Each column denotes sequences of diseaseprogression represented by diagnosis/test scores.

How similar are the generated sequences to the real sequences?Cohen’s d of Gen. vs. Real of Progressions Cohen’s d of Gen. vs. Real of Early-progressions

Covariates Diagnosis ADAS13 MMSE RAVLT-I CDR-SB Diagnosis ADAS13 MMSE RAVLT-I CDR-SBcINN 1.255 1.596 1.149 1.894 1.551 1.065 1.498 0.948 1.843 1.454Ours 0.419 0.556 0.348 0.711 0.645 0.359 0.561 0.295 0.613 0.625Ours+TCG 0.282 0.391 0.167 0.588 0.377 0.234 0.5248 0.090 0.544 0.499

Table: Difference between the generated sequences and the real sequences at t = 3. Lower theeffect size (Cohen’s d), smaller the difference between the comparing distributions.

International Conference on Computer Vision (ICCV) 2019 Research supported by NIH (R01AG040396, R01EB022883, R01AG062336, R01AG059312), UW CPCP (U54AI117924), UW CIBM (T15LM007359) NSF CAREER Award (1252725), USDOT Research and Innovative Technology Administration (69A3551747134), and UTA Research Enhancement Program (REP).