JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2018, VOL. 113, NO. 523, 1003-1015: Applications and Case Studies. https://doi.org/10.1080/01621459.2017.1379403

Modeling Motor Learning Using Heteroscedastic Functional Principal Components Analysis

Daniel Backenroth (a), Jeff Goldsmith (a), Michelle D. Harran (b,c), Juan C. Cortes (b,d), John W. Krakauer (b,c,d), and Tomoko Kitago (b,e)

(a) Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY; (b) Department of Neurology, Columbia University Medical Center, New York, NY; (c) Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD; (d) Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD; (e) Burke Neurological Institute, White Plains, NY

ARTICLE HISTORY: Received February; Revised August

KEYWORDS: Functional data; Kinematic data; Motor control; Probabilistic PCA; Variance modeling; Variational Bayes

ABSTRACT: We propose a novel method for estimating population-level and subject-specific effects of covariates on the variability of functional data. We extend the functional principal components analysis framework by modeling the variance of principal component scores as a function of covariates and subject-specific random effects. In a setting where principal components are largely invariant across subjects and covariate values, modeling the variance of these scores provides a flexible and interpretable way to explore factors that affect the variability of functional data. Our work is motivated by a novel dataset from an experiment assessing upper extremity motor control, and quantifies the reduction in movement variability associated with skill learning. The proposed methods can be applied broadly to understand movement variability, in settings that include motor learning, impairment due to injury or disease, and recovery. Supplementary materials for this article are available online.

1. Scientific Motivation

1.1. Motor Learning

Recent work in motor learning has suggested that change in movement variability is an important component of improvement in motor skill. It has been suggested that when a motor task is learned, variance is reduced along dimensions relevant to the successful accomplishment of the task, although it may increase in other dimensions (Scholz and Schöner 1999; Yarrow, Brown, and Krakauer 2009). Experimental work, moreover, has shown that learning-induced improvement of movement execution, measured through the trade-off between speed and accuracy, is accompanied by significant reductions in movement variability. In fact, these reductions in movement variability may be a more important feature of learning than changes in the average movement (Shmuelof, Krakauer, and Mazzoni 2012). These results have typically been based on assessments of variability at a few time points, for example, at the end of the movement, although high-frequency laboratory recordings of complete movements are often available.

In this article we develop a modeling framework that can be used to quantify movement variability based on dense recordings of fingertip position throughout movement execution. This framework can be used to explore many aspects of motor skill and learning: differences in baseline skill among healthy subjects, effects of repetition and training to modulate variability over time, or the effect of baseline stroke severity on movement variance and recovery (Krakauer 2006). By taking full advantage of high-frequency laboratory recordings, we shift focus from particular time points to complete curves. Our approach allows us to model the variability of these curves as it depends on covariates, like the hand used or the repetition number, and to estimate random effects reflecting differences in baseline variability and learning rates among subjects.

CONTACT: Daniel Backenroth, Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY. Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/JASA.

Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA.

Section 1.2 describes our motivating data in more detail, and Section 2 introduces our modeling framework. A review of relevant statistical work appears in Section 3. Details of our estimation approach are in Section 4. Simulations and the application to our motivating data appear in Sections 5 and 6, respectively, and we close with a discussion in Section 7.

1.2. Dataset

Our motivating data were gathered as part of a study of motor learning among healthy subjects. Kinematic data were acquired in a standard task used to measure control of reaching movements. In this task, subjects rest their forearm on an air-sled system to reduce effects of friction and gravity. The subjects are presented with a screen showing eight targets arranged in a circle around a starting point, and they reach with their arm to a target and back when it is illuminated on the screen. Subjects' movements are displayed on the screen, and they are rewarded with 10 points if they turn their hand around within the target, and 3 or 1 otherwise, depending on how far their hand is from the target at the point of return. Subjects are not rewarded for movements outside prespecified velocity thresholds.

Our dataset consists of 9481 movements by 26 right-handed subjects. After becoming familiarized with the experimental apparatus, each subject made 24 or 25 reaching movements to each of the 8 targets, in a semirandom order, with both the left and right hand. Movements that did not reach at least 30% of the distance to the target and movements with a direction more than 90 degrees away from the target direction at the point of peak velocity were excluded from the dataset, because of the likelihood that they were made to the wrong target or not attempted due to distraction. Movements made at speeds outside the range of interest, with peak velocity less than 0.04 or greater than 2.0 m/s, were also excluded. These exclusion rules and other similar rules have been used previously in similar kinematic experiments, and are designed to increase the specificity of these experiments for probing motor control mechanisms (Huang et al. 2012; Tanaka, Sejnowski, and Krakauer 2009; Kitago et al. 2015). A small number of additional movements were removed from the dataset due to instrumentation and recording errors. The data we consider have not been previously reported.

For each movement, the X and Y position of the hand is recorded as a function of time from movement onset to the initiation of return to the starting point, resulting in bivariate functional observations denoted $[P^X_{ij}(t), P^Y_{ij}(t)]$ for subject i and movement j. In practice, observations are recorded not as functions but as discrete vectors. There is some variability in movement duration, which we remove for computational convenience by linearly registering each movement onto a common grid of length D = 50. The structure of the registered kinematic data is illustrated in Figure 1. The top and bottom rows show, respectively, the first and last right-hand movement made to each target by each subject. The reduction in movement variability after practice is clear.

Prior to our analyses, we rotate curves so that all movements extend to the target at 0°. This rotation preserves shape and scale, but improves interpretation. After rotation, movement along the X coordinate represents movement parallel to the line between origin and target, and movement along the Y coordinate represents movement perpendicular to this line. We build models for X and Y coordinate curves separately in our primary analysis. An alternative bivariate analysis appears in Appendix C.
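As a concrete sketch of this preprocessing, the registration and rotation can be carried out as follows in R. The object names (x, y, target_angle) and the function register_and_rotate are hypothetical, chosen for illustration; this is a sketch of the steps described above, not the authors' code.

## Register one movement onto a common grid of length D and rotate it so the
## target lies at 0 degrees (illustrative sketch; not the authors' code).
## x, y: recorded fingertip positions; target_angle: target angle in radians.
register_and_rotate <- function(x, y, target_angle, D = 50) {
  s  <- seq(0, 1, length.out = length(x))   # movement's own normalized time
  sD <- seq(0, 1, length.out = D)           # common grid of length D
  xr <- approx(s, x, xout = sD)$y           # linear registration of X
  yr <- approx(s, y, xout = sD)$y           # linear registration of Y
  # Rotation by -target_angle: afterwards X measures extent along the
  # origin-target line and Y the perpendicular (directional) deviation.
  rot <- matrix(c(cos(-target_angle), -sin(-target_angle),
                  sin(-target_angle),  cos(-target_angle)),
                nrow = 2, byrow = TRUE)
  xy <- rot %*% rbind(xr, yr)
  list(PX = xy[1, ], PY = xy[2, ])
}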

2. Model for Trajectory Variability

We adopt a functional data approach to model position curves $P_{ij}(t)$. Here we omit the X and Y superscripts for notational simplicity. Our starting point is the functional principal component analysis (FPCA) model of Yao, Müller, and Wang (2005) with subject-specific means. In this model, it is assumed that each curve $P_{ij}(t)$ can be modeled as

\[
P_{ij}(t) = \mu_{ij}(t) + \delta_{ij}(t) = \mu_{ij}(t) + \sum_{k=1}^{\infty} \xi_{ijk} \phi_k(t) + \varepsilon_{ij}(t). \tag{1}
\]

Here $\mu_{ij}(t)$ is the mean function for curve $P_{ij}(t)$, the deviation $\delta_{ij}(t)$ is modeled as a linear combination of eigenfunctions $\phi_k(t)$, the $\xi_{ijk}$ are uncorrelated random variables with mean 0 and variances $\lambda_k$, where $\sum_k \lambda_k < \infty$ and $\lambda_1 \geq \lambda_2 \geq \cdots$, and $\varepsilon_{ij}(t)$ is white noise. Here all the deviations $\delta_{ij}(t)$ are assumed to have the same distribution, that of a single underlying random process $\delta(t)$.

Figure 1. Observed kinematic data. The top row shows the first right-hand movement to each target for each subject; the bottom row shows the last movement. The left panel of each row shows observed reaching data in the X and Y plane. Targets are indicated with circles. The middle and right panels of each row show the $P^X_{ij}(t)$ and $P^Y_{ij}(t)$ curves, respectively.


Model (1) is based on a truncation of the Karhunen-Loève representation of the random process $\delta(t)$. The Karhunen-Loève representation, in turn, arises from the spectral decomposition of the covariance of the random process $\delta(t)$ via Mercer's Theorem, from which one can obtain the eigenfunctions $\phi_k(t)$ and eigenvalues $\lambda_k$.

The assumption of constant score variances $\lambda_k$ in model (1) is inconsistent with our motivating data because it implies that the variability of the position curves $P_{ij}(t)$ is not covariate- or subject-dependent. However, movement variability can depend on the subject's baseline motor control and may change in response to training. Indeed, these changes in movement variance are precisely our interest.

In contrast to the preceding, we therefore assume that each random process $\delta_{ij}(t)$ has a potentially unique distribution, with a covariance operator that can be decomposed as

\[
\mathrm{cov}[\delta_{ij}(s), \delta_{ij}(t)] = \sum_{k=1}^{\infty} \lambda_{ijk} \phi_k(s) \phi_k(t),
\]

so that the eigenvalues $\lambda_{ijk}$, but not the eigenfunctions, vary among the curves. We assume that deviations $\delta_{ij}(t)$ are uncorrelated across both i and j.

The model we pose for the $P_{ij}(t)$ is therefore

\[
P_{ij}(t) = \mu_{ij}(t) + \sum_{k=1}^{K} \xi_{ijk} \phi_k(t) + \varepsilon_{ij}(t), \tag{2}
\]

where we have truncated the expansion in model (1) to K eigenfunctions, and into which we incorporate covariate- and subject-dependent heteroscedasticity with the score variance model

\[
\lambda_{ijk} = \lambda_{k \mid x^*_{ijk}, z^*_{ijk}, g_{ik}} = \mathrm{Var}(\xi_{ijk} \mid x^*_{ijk}, z^*_{ijk}, g_{ik}) = \exp\!\left( \gamma_{0k} + \sum_{l=1}^{L^*} \gamma_{lk} x^*_{ijlk} + \sum_{m=1}^{M^*} g_{imk} z^*_{ijmk} \right), \tag{3}
\]

where, as before, $\xi_{ijk}$ is the kth score for the jth curve of the ith subject. In model (3), $\gamma_{0k}$ is an intercept for the variance of the scores, the $\gamma_{lk}$ are fixed effects coefficients for covariates $x^*_{ijlk}$, $l = 1, \ldots, L^*$, and the $g_{imk}$ are random effects coefficients for covariates $z^*_{ijmk}$, $m = 1, \ldots, M^*$. The vector $g_{ik}$ consists of the concatenation of the coefficients $g_{imk}$, and likewise for the vectors $x^*_{ijk}$ and $z^*_{ijk}$. Throughout, the subscript k indicates that models are used to describe the variance of scores associated with each basis function $\phi_k(t)$ separately. The covariates $x^*_{ijlk}$ and $z^*_{ijmk}$ in model (3) need not be the same across principal components. This model allows exploration of the dependence of movement variability on covariates, like progress through a training regimen, as well as of idiosyncratic subject-specific effects on variance through the incorporation of random intercepts and slopes.
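As a concrete illustration of this multiplicative interpretation (the coefficient value below is made up for illustration, not an estimate from our data): suppose the only covariate in model (3) for component k is the repetition number, with fixed effect $\gamma_{1k}$. Then each additional repetition multiplies the score variance by $\exp(\gamma_{1k})$,

\[
\frac{\mathrm{Var}(\xi_{ijk} \mid \mathrm{rep} = r + 1)}{\mathrm{Var}(\xi_{ijk} \mid \mathrm{rep} = r)} = \exp(\gamma_{1k}),
\qquad \text{e.g., } \gamma_{1k} = -0.02 \;\Rightarrow\; \exp(23 \times (-0.02)) \approx 0.63,
\]

so that after 23 additional repetitions the variance attributable to $\phi_k(t)$ would be roughly 63% of its starting value.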

Together, models (2) and (3) induce a subject- and covariate-dependent covariance structure for $\delta_{ij}(t)$:

\[
\mathrm{cov}[\delta_{ij}(s), \delta_{ij}(t) \mid x^*_{ijk}, z^*_{ijk}, \phi_k, g_{ik}] = \sum_{k=1}^{K} \lambda_{k \mid x^*_{ijk}, z^*_{ijk}, g_{ik}} \phi_k(s) \phi_k(t).
\]

In particular, the $\phi_k(t)$ are assumed to be eigenfunctions of a conditional covariance operator. Our proposal can be related to standard FPCA by considering covariate values random and marginalizing across the distribution of random effects and covariates using the law of total covariance:

\[
\mathrm{cov}[\delta_{ij}(s), \delta_{ij}(t)] = E\big\{\mathrm{cov}[\delta_{ij}(s), \delta_{ij}(t) \mid x^*, z^*, g]\big\} + \mathrm{cov}\big\{E[\delta_{ij}(s) \mid x^*, z^*, g],\, E[\delta_{ij}(t) \mid x^*, z^*, g]\big\} = \sum_{k=1}^{K} E\big[\lambda_{k \mid x^*_{ijk}, z^*_{ijk}, g_{ik}}\big] \phi_k(s) \phi_k(t).
\]

We assume that the basis functions $\phi_k(t)$ do not depend on covariate or subject effects, and are therefore unchanged by this marginalization. Scores $\xi_{ijk}$ are marginally uncorrelated over k; this follows from the assumption that scores are uncorrelated in our conditional specification, and holds even if the random effects $g_{ik}$ are correlated over k. Finally, the order of the marginal variances $E[\lambda_{k \mid x^*_{ijk}, z^*_{ijk}, g_{ik}}]$ may not correspond to the order of the conditional variances $\lambda_{k \mid x^*_{ijk}, z^*_{ijk}, g_{ik}}$ for some or even all values of the covariates and random effects coefficients.

In our approach, we assume that the scores $\xi_{ijk}$ have mean zero. For this assumption to be valid, the mean $\mu_{ij}(t)$ in model (2) should be carefully modeled. To this end we use the well-studied multilevel function-on-scalar regression model (Guo 2002; Di et al. 2009; Morris and Carroll 2006; Scheipl, Staicu, and Greven 2015),

\[
\mu_{ij}(t) = \beta_0(t) + \sum_{l=1}^{L} x_{ijl} \beta_l(t) + \sum_{m=1}^{M} z_{ijm} b_{im}(t). \tag{4}
\]

Here $\beta_0(t)$ is the functional intercept; the $x_{ijl}$ for $l \in 1, \ldots, L$ are scalar covariates associated with functional fixed effects with respect to the curve $P_{ij}(t)$; $\beta_l(t)$ is the functional fixed effect associated with the lth such covariate; the $z_{ijm}$ for $m \in 1, \ldots, M$ are scalar covariates associated with functional random effects with respect to the curve $P_{ij}(t)$; and the $b_{im}(t)$ for $m \in 1, \ldots, M$ are functional random effects associated with the ith subject.

Keeping the basis functions constant across all subjects and movements, as in conventional FPCA, maintains the interpretability of the basis functions as the major patterns of variation across curves. Moreover, the covariate- and subject-dependent score variances reflect the proportion of variation attributable to those patterns. To examine the appropriateness of this assumption for our data, we estimated basis functions for various subsets of movements using a traditional FPCA approach, after rotating observed data so that all movements extend to the target at 0°. As illustrated in Figure 2, the basis functions for movements made by both hands and at different stages of training are similar.

3. Prior Work

FPCA has a long history in functional data analysis. It is commonly performed using a spectral decomposition of the sample covariance matrix of the observed functional data (Ramsay and Silverman 2005; Yao, Müller, and Wang 2005). Most relevant to our current work are probabilistic and Bayesian approaches based on nonfunctional PCA methods (Tipping and Bishop 1999; Bishop 1999; Peng and Paul 2009). Rather than proceeding in stages, first by estimating basis functions and then, given these, estimating scores, such approaches estimate all parameters in model (1) jointly. James, Hastie, and Sugar (2000) focused on sparsely observed functional data and estimated parameters using an EM algorithm; van der Linde (2008) took a variational Bayes approach to estimation of a similar model. Goldsmith, Zipunnikov, and Schrack (2015) considered both exponential-family functional data and multilevel curves, and estimated parameters using Hamiltonian Monte Carlo.

Figure 2. FPC basis functions estimated for various data subsets after rotating curves onto the positive X axis. The left panel shows the first and second FPC basis functions estimated for the X coordinate of movements to each target, for the left and right hand separately, and separately for four consecutive ranges of movement numbers. The right panel shows the same for the Y coordinate.

Some previous work has allowed for heteroskedasticity in FPCA. Chiou, Müller, and Wang (2003) developed a model which uses covariate-dependent scores to capture the covariate dependence of the mean of curves. In a manner that is constrained by the conditional mean structure of the curves, some covariate dependence of the variance of curves is also induced; the development of models for score variance was, however, not pursued. Here, by contrast, our interest is to use FPCA to model the effects of covariates on curve variance, independently of the mean structure. We are not using FPCA to model the mean; rather, the mean is modeled by the function-on-scalar regression model (4). Jiang and Wang (2010) introduce heteroscedasticity by allowing both the basis functions and the scores in an FPCA decomposition to depend on covariates. Briefly, rather than considering a bivariate covariance as the object to be decomposed, the authors pose a covariance surface that depends smoothly on a covariate. Aside from the challenge of incorporating more than a few covariates or subject-specific effects, it is difficult to use this model to explore the effects of covariates on heteroscedasticity: covariates affect both the basis functions and the scores, making the interpretation of scores and score variances at different covariate levels unclear. Although it does not allow for covariate-dependent heteroscedasticity, the model of Huang, Li, and Guan (2014) allows curves to belong to one of a few different clusters, each with its own FPCs and score variances.

In contrast to the existing literature, our model provides a general framework for understanding covariate- and subject-dependent heteroskedasticity in FPCA. This allows the estimation of rich models with multiple covariates and random effects, while maintaining the familiar interpretation of basis functions, scores, and score variances.

Variational Bayes methods, which we use here to approximate Bayesian estimates of the parameters in models (2) and (3), are computationally efficient and typically yield accurate point estimates for model parameters, although they provide only an approximation to the complete posterior distribution and inference may suffer as a result (Ormerod and Wand 2012; Jordan 2004; Jordan et al. 1999; Titterington 2004). These tools have previously been used in functional data analysis (van der Linde 2008; Goldsmith, Wand, and Crainiceanu 2011; McLean et al. 2013); in particular, Goldsmith and Kitago (2016) used variational Bayes methods in the estimation of model (4).

4. Methods

The main contribution of this article is the introduction of subject and covariate effects on score variances in model (3). Several estimation strategies can be used within this framework. Here we describe three possible approaches. Later, these will be compared in simulations.

4.1. Sequential Estimation

Models (2) and (3) can be fit sequentially in the following way. First, the mean $\mu_{ij}(t)$ in model (2) is estimated through function-on-scalar regression under a working independence assumption on the errors; we use the function pffr in the refund package (Goldsmith et al. 2016) in R. Next, the residuals from the function-on-scalar regression are modeled using standard FPCA approaches to obtain estimates of principal components and marginal score variances; given these quantities, scores themselves can be estimated (Yao, Müller, and Wang 2005). For this step we use the function fpca.sc, also in the refund package, which is among the available implementations. Next, we reestimate the mean $\mu_{ij}(t)$ in model (2) with function-on-scalar regression using pffr, although now, instead of assuming independence, we decompose the residuals using the principal components and score variances estimated in the previous step. We then reestimate principal components and scores using fpca.sc. The final step is to model the score variances given these score estimates. Assuming that the scores are normally distributed conditional on random effects and covariates, model (3) induces a gamma generalized linear mixed model for $\xi^2_{ijk}$, the square of the scores, with log link, coefficients $\gamma_{lk}$ and $g_{imk}$, and shape parameter equal to 1/2. We fit this model with the lme4 package, separately with respect to the scores for each principal component, in order to obtain estimates of our parameters of interest in the score variance model (Bates et al. 2015).
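The sketch below illustrates the FPCA and score variance steps of this sequential approach in R, using fpca.sc from refund and glmer from lme4. It is a minimal sketch rather than the authors' code: the objects resid_curves (residual curves from the pffr fit), group, and subject are hypothetical placeholders, and glmer estimates the gamma shape parameter rather than fixing it at 1/2 as described above.

## Minimal sketch of the FPCA and score variance steps of the sequential
## approach (hypothetical object names; not the authors' implementation).
## resid_curves: n x D matrix of residual curves after the function-on-scalar
## regression for the mean (fit with refund::pffr as described in the text);
## group, subject: curve-level covariate and subject identifier.
library(refund)  # fpca.sc()
library(lme4)    # glmer()

tgrid <- seq(0, 1, length.out = ncol(resid_curves))

## FPCA of the residual curves: principal components, marginal score
## variances, and estimated scores.
fpca_fit <- fpca.sc(Y = resid_curves, argvals = tgrid, npc = 2)

## If xi ~ N(0, sigma^2), then xi^2 has a gamma distribution with mean sigma^2
## and shape 1/2, so model (3) induces a gamma GLMM with log link for the
## squared scores. glmer estimates the shape rather than fixing it at 1/2,
## which mainly affects interval widths rather than point estimates.
score_df <- data.frame(score_sq = fpca_fit$scores[, 1]^2,
                       group    = group,
                       subject  = subject)
fit_var <- glmer(score_sq ~ group + (1 | subject),
                 family = Gamma(link = "log"), data = score_df)
fixef(fit_var)  # exponentiate for multiplicative effects on score variance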

The first two steps of this approach are consistent with the common strategy for FPCA, and we account for non-constant score variance through an additional modeling step. We anticipate that this sequential approach will work reasonably well in many cases, but note that it arises as a sequence of models that treat estimated quantities as fixed. First, one estimates the mean; then one treats the mean as fixed to estimate the principal components and the scores; finally, one treats the scores as fixed to estimate the score variance model. Overall performance may deteriorate by failing to incorporate uncertainty in estimates in each step, particularly in cases of sparsely observed curves or high measurement error variances (Goldsmith, Greven, and Crainiceanu 2013). Additionally, because scores are typically estimated in a mixed model framework, the use of marginal score variances in the FPCA step can negatively impact score estimation and the subsequent modeling of conditional score variances.

4.2. Bayesian Approach

4.2.1. Bayesian Model

Jointly estimating all parameters in models (2) and (3) in a Bayesian framework is an appealing alternative to the sequential estimation approach. We expect this to be less familiar to readers than the sequential approach, and therefore provide a more detailed description.

Our Bayesian specification of these models is formulated in matrix form to reflect the discrete nature of the observed data. In the following, $\Theta$ is a known $D \times K_\theta$ matrix of $K_\theta$ spline basis functions evaluated on the shared grid of length D on which the curves are observed. We assume a normal distribution of the scores $\xi_{ijk}$ conditional on random effects and covariates:

\[
\begin{aligned}
p_{ij} &= \sum_{l=0}^{L} x_{ijl} \Theta \beta_l + \sum_{m=1}^{M} z_{ijm} \Theta b_{im} + \sum_{k=1}^{K} \xi_{ijk} \Theta \phi_k + \varepsilon_{ij} \\
\beta_l &\sim \mathrm{MVN}\left[0, \sigma^2_{\beta_l} Q^{-1}\right]; \qquad \sigma^2_{\beta_l} \sim \mathrm{IG}[\alpha, \beta] \\
b_i &\sim \mathrm{MVN}\left[0, \sigma^2_b \left((1 - \pi) Q + \pi I\right)^{-1}\right]; \qquad \sigma^2_b \sim \mathrm{IG}[\alpha, \beta] \\
\phi_k &\sim \mathrm{MVN}\left[0, \sigma^2_{\phi_k} Q^{-1}\right]; \qquad \sigma^2_{\phi_k} \sim \mathrm{IG}[\alpha, \beta] \\
\xi_{ijk} &\sim \mathrm{N}\!\left[0, \exp\!\left( \sum_{l=0}^{L^*} \gamma_{lk} x^*_{ijlk} + \sum_{m=1}^{M^*} g_{imk} z^*_{ijmk} \right)\right] \\
\gamma_{lk} &\sim \mathrm{N}\left[0, \sigma^2_{\gamma_{lk}}\right] \\
g_{ik} &\sim \mathrm{MVN}\left[0, \Sigma_{g_k}\right]; \qquad \Sigma_{g_k} \sim \mathrm{IW}[\Lambda_k, \nu] \\
\varepsilon_{ij} &\sim \mathrm{MVN}\left[0, \sigma^2 I\right]; \qquad \sigma^2 \sim \mathrm{IG}[\alpha, \beta].
\end{aligned} \tag{5}
\]

In model (5), $i = 1, \ldots, I$ refers to subjects, $j = 1, \ldots, J_i$ refers to movements within subjects, and $k = 1, \ldots, K$ refers to principal components. We define the total number of functional observations $n = \sum_{i=1}^{I} J_i$. The column vectors $p_{ij}$ and $\varepsilon_{ij}$ are the $D \times 1$ observed functional outcome and independent error term, respectively, on the finite grid shared across subjects for the jth curve of the ith subject. The vectors $\beta_l$, for $l = 0, \ldots, L$, are functional effect spline coefficient vectors, the $b_{im}$, for $i = 1, \ldots, I$ and $m = 1, \ldots, M$, are random effect spline coefficient vectors, and the $\phi_k$, for $k = 1, \ldots, K$, are principal component spline coefficient vectors, all of length $K_\theta$. $Q$ is a penalty matrix of the form $\Theta^T M^T M \Theta$, where $M$ is a matrix that penalizes the second derivative of the estimated functions. $I$ is the identity matrix. MVN refers to the multivariate normal distribution, N to the normal distribution, IG to the inverse-gamma distribution, and IW to the inverse-Wishart distribution. Models (3) and (4) can be written in the form of model (5) above by introducing into those models covariates $x^*_{ij0k}$ (in model (3), multiplying $\gamma_{0k}$) and $x_{ij0}$ (in model (4), multiplying $\beta_0(t)$), identically equal to 1. Some of the models used here, as in our real data analysis, do not have a global functional intercept $\beta_0$ or global score variance intercepts $\gamma_{0k}$; in these models there are no such covariates identically equal to 1.
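For concreteness, a basis and penalty of this form can be constructed in R as follows. This is a sketch under the definitions stated above (cubic B-splines and second differences on the evaluation grid standing in for the second derivative), with object names chosen for this illustration; it is not the authors' code.

## Sketch of the spline basis and penalty described above (not the authors'
## code). Theta is the D x K_theta cubic B-spline evaluation matrix on the
## common grid, and Q = Theta' M' M Theta with M a second-difference matrix,
## so coefficient vectors are penalized through the squared second differences
## of the evaluated function Theta %*% beta.
library(splines)

D       <- 50
K_theta <- 10
tgrid   <- seq(0, 1, length.out = D)

Theta <- bs(tgrid, df = K_theta, degree = 3, intercept = TRUE)  # D x K_theta
M     <- diff(diag(D), differences = 2)                         # (D-2) x D
Q     <- t(Theta) %*% t(M) %*% M %*% Theta                      # K_theta x K_theta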

As discussed further in Section 4.2.3, for purposes of identifiability and to obtain FPCs that represent non-overlapping directions of variation, when fitting this model we introduce the additional constraint that the FPCs should be orthonormal and that each FPC should explain the largest possible amount of variance in the data, conditionally on the previously estimated FPCs, if any.

In keeping with standard practice, we set the prior variances $\sigma^2_{\gamma_{lk}}$ for the fixed-effect coefficients in the score variance model to a large constant, so that their prior is close to uniform. We set $\nu$, the degrees of freedom parameter for the inverse-Wishart prior for the covariance matrices $\Sigma_{g_k}$, to the dimension of $g_{ik}$. We use an empirical Bayes approach, discussed further in Section 4.2.4, to specify $\Lambda_k$, the scale matrix parameters of these inverse-Wishart priors. When the random effects $g_{ik}$ are one-dimensional, this prior reduces to an inverse-gamma prior. Sensitivity to prior specifications of this model should be explored, and we do so with respect to our real data analysis in Appendix D.

Variance components $\{\sigma^2_{\beta_l}\}_{l=0}^{L}$ and $\{\sigma^2_{\phi_k}\}_{k=1}^{K}$ act as tuning parameters controlling the smoothness of the coefficient functions $\beta_l(t)$ and FPC functions $\phi_k(t)$, and our prior specification for them is related to standard techniques in semiparametric regression. $\sigma^2_b$, meanwhile, is a tuning parameter that controls the amount of penalization of the random effects, and is shared across the $b_{im}(t)$, so that all random effects for all subjects share a common distribution. Whereas fixed effects and functional principal components are penalized only through their squared second derivative, the magnitude of the random effects is also penalized through the full-rank penalty matrix $I$ to ensure identifiability (Scheipl, Staicu, and Greven 2015; Djeundje and Currie 2010). The parameter $\pi$, $0 < \pi < 1$, determines the balance of smoothness and shrinkage penalties in the estimation of the random effects $b_{im}(t)$. We discuss how to set the value of this parameter in Section 4.2.4. We set $\alpha$ and $\beta$, the parameters of the inverse-gamma prior distributions for the variance components, to 1.

Our framework can accommodate more complicated random effect structures. In our application in Section 6, for example, each subject has 8 random effect vectors $g_{ilk}$, one for each target, indexed by $l = 1, \ldots, 8$; the index l is used here since in Section 6 l is used to index targets. We model the correlations between these random effect vectors through a nested random effect structure:

\[
g_{ilk} \sim \mathrm{MVN}\left[g_{ik}, \Sigma_{g_{ik}}\right]; \qquad g_{ik} \sim \mathrm{MVN}\left[0, \Sigma_{g_k}\right]. \tag{6}
\]

Here the random effect vectors $g_{ilk}$ for subject i and FPC k, $l = 1, \ldots, 8$, are centered around a subject-specific random effect vector $g_{ik}$. We estimate two separate random effect covariance matrices, $\Sigma_{g_{ik}}$ and $\Sigma_{g_k}$, for each FPC k, one at the subject-target level and one at the subject level. These matrices are given inverse-Wishart priors, and are discussed further in Section 4.2.4.

4.2.2. Estimation Strategies

Sampling-based approaches to Bayesian inference for model (5) are challenging due to the constraints we impose on the $\phi_k(t)$ for purposes of interpretability of the score variance models, which are our primary interest. We present two methods for Bayesian estimation and inference for model (5): first, an iterative variational Bayes method, and second, a Hamiltonian Monte Carlo (HMC) sampler, implemented with the Stan Bayesian programming language (Stan Development Team 2013). Our iterative variational Bayes method, which estimates each parameter in turn conditional on currently estimated values of the other parameters, is described in detail in Appendix E. This appendix also includes a brief overview of variational Bayes methods. Our HMC sampler, also described in Appendix E, conditions on estimates of the FPCs and fixed and random functional effects from the variational Bayes method, and estimates the other quantities in model (5).

4.2.3. Orthonormalization

A well-known challenge for Bayesian and probabilistic approaches to FPCA is that the basis functions $\phi_k(t)$ are not constrained to be orthogonal. In addition, when the scores $\xi_{ijk}$ do not have unit variance, the basis functions will also be indeterminate up to magnitude, since any increase in their norm can be accommodated by decreased variance of the scores. Where interest lies in the variance of scores with respect to particular basis functions, it is important for the basis functions to be well-identified and orthogonal, so that they represent distinct and non-overlapping modes of variation. We therefore constrain estimated FPCs to be orthonormal and require each FPC to explain the largest possible amount of variance in the data, conditionally on the previously estimated FPCs, if any.

Let $\Xi$ be the $n \times K$ matrix of principal component scores and $\phi$ the $K \times K_\theta$ matrix of principal component spline coefficient vectors. In each step of our iterative variational Bayes algorithm, we apply the singular value decomposition to the matrix product $\Xi \phi \Theta^T$; the orthonormalized principal component basis vectors which satisfy these constraints are then the right singular vectors of this decomposition. A similar approach was used to induce orthogonality of the principal components in the Monte Carlo Expectation Maximization algorithm of Huang, Li, and Guan (2014) and as a post-processing step in Goldsmith, Zipunnikov, and Schrack (2015). Although explicit orthonormality constraints may be possible in this setting (Šmídl and Quinn 2007), our simple approach, while not exact, provides for accurate estimation. Our HMC sampler conditions on the variational Bayes estimates of the FPCs, and therefore also satisfies the desired constraints.
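In code this step is small; the sketch below follows the dimension conventions stated above (an illustration, not the authors' implementation) and returns orthonormal FPCs evaluated on the grid, ordered by the amount of variance they explain.

## Orthonormalize FPCs via the SVD of the fitted FPC deviations (sketch only).
## Xi: n x K score matrix; phi: K x K_theta FPC spline coefficients;
## Theta: D x K_theta spline basis evaluation matrix.
orthonormalize_fpcs <- function(Xi, phi, Theta) {
  fitted_dev <- Xi %*% phi %*% t(Theta)      # n x D matrix of fitted FPC deviations
  sv <- svd(fitted_dev)
  sv$v[, seq_len(ncol(Xi)), drop = FALSE]    # D x K orthonormal FPC evaluations
}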

4.2.4. Hyperparameter Selection

The parameter $\pi$ in model (5) controls the balance of smoothness and shrinkage penalization in the estimation of the random effects $b_{im}$. In our variational Bayes approach we choose $\pi$ to minimize the Bayesian information criterion (Schwarz 1978), following the approach of Djeundje and Currie (2010).

To set the hyperparameter $\Lambda_k$ in model (5) (or the hyperparameters in the inverse-Wishart priors for the variance parameters in model (6)), we use an empirical Bayes method. First, we estimate scores $\xi_{ijk}$ using our variational Bayes method, with a constant score variance for each FPC. We then estimate the random effects $g_{ik}$ (or $g_{ilk}$) using a gamma generalized linear mixed model, as described in Section 4.1. Finally, we compute the empirical covariance matrix corresponding to $\Sigma_{g_k}$ (or $\Sigma_{g_{ik}}$ and $\Sigma_{g_k}$), and set the hyperparameter so that the mode of the prior distribution matches this empirical covariance matrix.
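This matching can be written explicitly using a standard property of the inverse-Wishart distribution, stated here for concreteness rather than quoted from the paper: under the usual parameterization, the mode of $\mathrm{IW}[\Lambda, \nu]$ for a $p \times p$ covariance matrix is $\Lambda / (\nu + p + 1)$. With $\nu$ set to $p = \dim(g_{ik})$ as above and $\widehat{\Sigma}_{g_k}$ the empirical covariance matrix from the preliminary fit, setting

\[
\Lambda_k = (\nu + p + 1)\, \widehat{\Sigma}_{g_k}
\]

makes the prior mode equal to $\widehat{\Sigma}_{g_k}$; when $g_{ik}$ is one-dimensional this reduces to matching the mode of an inverse-gamma prior to the empirical variance.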

5. Simulations

We demonstrate the performance of our method using simulated data. Here we present a simulation that includes functional random effects as well as scalar score variance random effects. Appendix F includes additional simulations in a cross-sectional context which demonstrate the effect of varying the number of estimated FPCs, the number of spline basis functions, and the measurement error.

In our simulation design, the jth curve for the ith subject is generated from the model

\[
P_{ij}(t) = 0 + b_i(t) + \sum_{k=1}^{4} \xi_{ijk} \phi_k(t) + \varepsilon_{ij}(t). \tag{7}
\]

We observe the curves at D = 50 equally spaced points on the domain $[0, 2\pi]$. FPCs $\phi_1$ and $\phi_2$ correspond to the functions $\sin(x)$ and $\cos(x)$, and FPCs $\phi_3$ and $\phi_4$ correspond to the functions $\sin(2x)$ and $\cos(2x)$. We divide the curves equally into two groups $m = 1, 2$. We define $x^*_{ij1}$ to be equal to 1 if the ith subject is assigned to group 1, and 0 otherwise, and we define $x^*_{ij2}$ to be equal to 1 if the ith subject is assigned to group 2, and 0 otherwise. We generate scores $\xi_{ijk}$ from zero-mean normal distributions with variances equal to

\[
\mathrm{var}(\xi_{ijk} \mid x^*_{ij}, g_{ik}) = \exp\!\left( \sum_{l=1}^{2} \gamma_{lk} x^*_{ijl} + g_{ik} \right). \tag{8}
\]

We set $\gamma_{1k}$ for $k = 1, \ldots, 4$ to the natural logarithms of 36, 12, 6, and 4, respectively, and $\gamma_{2k}$ for $k = 1, \ldots, 4$ to the natural logarithms of 18, 24, 12, and 6, respectively. The order of $\gamma_{1k}$ and $\gamma_{2k}$ for FPCs (represented by k) 1 and 2 is purposely reversed between groups 1 and 2 so that the dominant mode of variation is not the same in the two groups. We generate the random effects $g_{ik}$ in the score variance model from a normal distribution with mean zero and variance $\sigma^2_{g_k}$, setting $\sigma^2_{g_k}$ to 3.0, 1.0, 0.3, and 0.1 across FPCs. We simulate functional random effects $b_i(t)$ for each subject by generating 10 elements of a random effect spline coefficient vector from the distribution $\mathrm{MVN}[0, \sigma^2_b ((1 - \pi) Q + \pi I)^{-1}]$, and then multiplying this vector by a B-spline basis function evaluation matrix. We set $\pi = \sigma^2_b = 1/2000$, resulting in smooth random effects approximately one-third the magnitude of the FPC deviations. The $\varepsilon_{ij}(t)$ are independent errors generated at all t from a normal distribution with mean zero and variance $\sigma^2 = 0.25$.
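The data-generating mechanism above can be reproduced directly; the R sketch below simulates one replicate under these settings. It is written for this illustration rather than taken from the authors' simulation code, and the seed, group assignment pattern, and object names are choices made here.

## Simulate one replicate from models (7)-(8) with the settings described above
## (illustrative sketch; not the authors' simulation code).
library(splines)

set.seed(1)
I <- 24; Ji <- 24; D <- 50
tg <- seq(0, 2 * pi, length.out = D)

Phi    <- cbind(sin(tg), cos(tg), sin(2 * tg), cos(2 * tg))  # FPCs phi_1..phi_4
gamma1 <- log(c(36, 12, 6, 4))    # group 1 score variance fixed effects
gamma2 <- log(c(18, 24, 12, 6))   # group 2 score variance fixed effects
sig2_g <- c(3.0, 1.0, 0.3, 0.1)   # variances of random effects g_ik

## Functional random effects b_i(t): 10 spline coefficients drawn from
## MVN[0, sigma2_b ((1 - pi) Q + pi I)^{-1}] with pi = sigma2_b = 1/2000.
Theta  <- bs(tg, df = 10, degree = 3, intercept = TRUE)
M      <- diff(diag(D), differences = 2)
Q      <- t(Theta) %*% t(M) %*% M %*% Theta
Prec   <- ((1 - 1/2000) * Q + (1/2000) * diag(10)) / (1/2000)
b_coef <- t(chol(solve(Prec))) %*% matrix(rnorm(10 * I), 10, I)
b_fun  <- Theta %*% b_coef                                   # D x I: b_i(t) curves

group <- rep(1:2, length.out = I)                            # subject-level groups
g     <- sapply(sig2_g, function(v) rnorm(I, 0, sqrt(v)))    # I x 4 random effects

curves <- vector("list", I)
for (i in 1:I) {
  gam <- if (group[i] == 1) gamma1 else gamma2
  sd_scores <- sqrt(exp(gam + g[i, ]))                       # model (8) score SDs
  xi <- matrix(rnorm(Ji * 4), Ji, 4) %*% diag(sd_scores)
  curves[[i]] <- matrix(b_fun[, i], Ji, D, byrow = TRUE) +   # subject mean b_i(t)
    xi %*% t(Phi) +                                          # FPC deviations
    matrix(rnorm(Ji * D, 0, 0.5), Ji, D)                     # noise, sigma^2 = 0.25
}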

We fix the sample size I at 24, and set the number of curves per subject $J_i$ to 4, 12, 24, and 48. Two hundred replicate datasets were generated for each of the four scenarios. The simulation scenario with $I = J_i = 24$ is closest to the sample size in our real data application, where for each of 8 targets we have I = 26 and $J_i \approx 24$.

Figure 3. Selected results for the VB method for one simulation replicate with $I = J_i = 24$. This simulation replicate was selected because the estimation quality of the group-level score variances, shown in the bottom row, is close to the median with respect to all simulations. Panels in the top row show simulated curves for two subjects in light black, the simulated functional random effect for that subject as a dashed line, and the estimated functional random effect for that subject as a dark solid line. The subjects were selected to show one subject with a poorly estimated functional random effect (left) and one with a well estimated functional random effect (right). Panels in the bottom row show, for each FPC, estimates and simulated values of the group-level and subject-specific score variances. Large colored dots are the group-level score variances, and small colored dots are the estimated score variances for each subject, that is, they combine the fixed effect and the random effect.

We fit the following model to each simulated dataset using each of the three approaches described in Section 4:

\[
p_{ij} = \Theta \beta_0 + \Theta b_i + \sum_{k=1}^{4} \xi_{ijk} \Theta \phi_k + \varepsilon_{ij}, \qquad
\xi_{ijk} \sim \mathrm{N}\!\left[0, \exp\!\left( \sum_{l=1}^{2} \gamma_{lk} x^*_{ijl} + g_{ik} \right)\right].
\]

Here $p_{ij}$ is the vectorized observation of $P_{ij}(t)$ from model (7). We use 10 spline basis functions for estimation, so that $\Theta$ is a $50 \times 10$ B-spline basis function evaluation matrix. For the Bayesian approaches, we use the priors specified in model (5), including N[0, 100] priors for the coefficients $\gamma_{lk}$ in the score variance model (that is, prior variances $\sigma^2_{\gamma_{lk}} = 100$). We use the empirical Bayes approach discussed in Section 4.2.4 to set the scale parameters for the inverse-gamma priors for the variances $\sigma^2_{g_k}$ of the random effects $g_{ik}$.

Figures 3, 4, and 5 illustrate the quality of variational Bayes (VB) estimation of functional random effects, FPCs, and fixed and random effect score variance parameters. The top row of Figure 3 shows the collection of simulated curves for two subjects and includes the true and estimated subject-specific mean. The bottom row of this figure shows the true and estimated score variances across FPCs for a single simulated dataset, and suggests that fixed and random effects in the score variance model can be well-estimated.

Figure 4. Estimation of FPCs using the VB method. Panels in the top row show a true FPC in dark black, and the VB estimates of that FPC for all simulation replicates with $J_i = 24$ in light black. Panels in the bottom row show, for each FPC and $J_i$, boxplots of integrated square errors (ISEs) for VB estimates $\hat{\phi}_k(t)$ of each FPC $\phi_k(t)$, defined as $\mathrm{ISE} = \int_0^{2\pi} [\phi_k(t) - \hat{\phi}_k(t)]^2 \, dt$. The estimates in the top row therefore correspond to the ISEs for $J_i = 24$ shown in the bottom row. Figure A. in Appendix F shows examples of estimates of FPCs with a range of different ISEs.

Figure 5. Estimation of score variance fixed and random effects using VB. Panels in the top row show, for each FPC, group, and $J_i$, boxplots of signed relative errors (SREs) for VB estimates $\hat{\gamma}_{lk}$ of the fixed effect score variance parameters $\gamma_{lk}$, defined as $\mathrm{SRE} = (\hat{\gamma}_{lk} - \gamma_{lk}) / \gamma_{lk}$. Panels in the bottom row show, for each FPC and $J_i$, the correlation between random effect score variance parameters $g_{ik}$ and their VB estimates. Intercepts and slopes for linear regressions of estimated on simulated random effect score variances are centered around 0 and 1, respectively (not shown).

The top row of Figure 4 shows estimated FPCs across all simulated datasets with $J_i = 24$; the FPCs are well-estimated and have no obvious systematic biases. The bottom row shows integrated squared errors (ISEs) for the FPCs across each possible $J_i$. As expected, the ISEs are smaller for the FPCs with larger score variances, and decrease as $J_i$ increases. For 12 and especially for 4 curves per subject, estimates of the FPCs correspond to linear combinations of the simulated FPCs, leading to high ISEs and to inaccurate estimates of parameters in our score variance model (examples of poorly estimated FPCs can be seen in Appendix F).

Panels in the top row of Figure 5 show that estimates of fixed effect score variance parameters are shrunk towards zero, especially for lower numbers of curves per subject and FPCs 3 and 4. We attribute this to overfitting of the random effects in the mean model, which incorporates some of the variability attributable to the FPCs into the estimated random effects and reduces estimated score variances. Score variance random effects, shown in the bottom row of Figure 5, are more accurately estimated with more curves per subject.

Figure 6 and Table 1 show results from a comparison of the VB estimation procedure to the sequential estimation (SE) and Hamiltonian Monte Carlo (HMC) methods described in Section 4. We ran 4 HMC chains for 800 iterations each, and discarded the first 400 iterations from each chain. We assessed convergence of the chains by examining the convergence criterion of Gelman and Rubin (1992). Values of this criterion near 1 indicate convergence. For each of our simulation runs the criterion for every sampled variable was less than 1.1, and usually much closer to 1, suggesting convergence of the chains. In general, performance for the VB and HMC methods is comparable, and both methods are in some respects superior to the performance of the SE method. Figure 6 compares the three methods' estimation of the score variance parameters. Especially for FPC 4, the SE method occasionally estimates random effect variances at 0; these are represented in the lower-right panel of Figure 6 as points where the correlation between simulated and estimated score variance random effects is 0. Table 1 shows, based on the simulation scenario with $J_i = 24$, the frequentist coverage of 95% credible intervals for the VB and HMC methods, and of 95% confidence intervals for the SE method, in each case for the fixed effect score variance parameters $\gamma_{lk}$. For FPCs 3 and 4 especially, the SE procedure confidence intervals are too narrow. The median ISE for the functional random effects is about 30% higher with the VB method than with the SE method. This results from the relative tendency of the VB method to shrink FPC score estimates to zero; when the mean of the scores is in fact non-zero, this shifts estimated functional random effects away from zero. Other comparisons of these methods are broadly similar.

The HMC method is more computationally expensive than the other two methods. Running 4 chains for 800 iterations in parallel took approximately 90 minutes for $J_i = 24$. On one processor, by comparison, the SE method took about 20 minutes, almost entirely to run the function-on-scalar regression using pffr. The VB method took approximately six minutes, including the grid search to set the value of the parameter $\pi$, which controls the balance between zeroth- and second-derivative penalties in the estimation of functional random effects.

Table 1. Coverage of 95% credible/confidence intervals for the score variance parameters $\gamma_{lk}$ using the VB, SE, and HMC procedures, for $J_i = 24$. Columns: FPC, group, VB, SE, HMC.

Figure 6. Comparison of estimation of score variance fixed and random effects using three methods. Panels in the top row show, for each FPC, group, and estimation method, boxplots of signed relative errors (SREs) for estimates of the fixed effect score variance parameters $\gamma_{lk}$ for $J_i = 24$. Panels in the bottom row show, for each FPC and estimation method, the correlation between random effect score variance parameters $g_{ik}$ and their estimates for $J_i = 24$. Intercepts and slopes for linear regressions of estimated on simulated random effect score variances are centered around 0 and 1, respectively (not shown).

6. Analysis of Kinematic Data

We now apply the methods described above to our motivating dataset. To reiterate, our goal is to quantify the process of motor learning in healthy subjects, with a focus on the reduction of movement variability through repetition. Our dataset consists of 26 healthy, right-handed subjects making repeated movements to each of 8 targets. We focus on estimation, interpretation, and inference for the parameters in a covariate- and subject-dependent heteroscedastic FPCA model, with primary interest in the effect of repetition number in the model for score variance. We hypothesize that variance will be lower for later repetitions due to skill learning.

Prior to fitting the model, we rotate all movements to be in the direction of the target at 0° so that the X axis is the major axis of movement. For this reason, variation along the X axis is interpretable as variation in movement extent and variation along the Y axis is interpretable as variation in movement direction. We present results for univariate analyses of the $P^X_{ij}(t)$ and $P^Y_{ij}(t)$ position curves in the right hand and describe a bivariate approach to modeling the same data.

We present models with 2 FPCs, since 2 FPCs are sufficient to explain roughly 95% of the movement variability (and usually more) remaining after accounting for fixed and random effects in the mean structure. Most of the variability of movements around the mean is explained by the first FPC, so we emphasize the score variance of the first FPC as a convenient summary of movement variability, and briefly present some results for the second FPC.

6.1. Model

We examine the effect of practice on the variability of movements while accounting for target and individual-specific idiosyncrasies. To do this, we use a model for score variance that includes a fixed intercept and slope parameter for each target and one random intercept and slope parameter for each subject-target combination. Correlation between score variance random effects for different targets for the same subject is induced via a nested random effects structure. The mean structure for observed curves consists of functional intercepts $\beta_l$ for each target $l \in \{1, \ldots, 8\}$ and random effects $b_{il}$ for each subject-target combination, to account for heterogeneity in the average movement across subjects and targets. Our heteroscedastic FPCA model is therefore:

\[
p_{ij} = \sum_{l=1}^{8} I(\mathrm{tar}_{ij} = l)\,(\Theta \beta_l + \Theta b_{il}) + \sum_{k=1}^{K} \xi_{ijk} \Theta \phi_k + \varepsilon_{ij} \tag{9}
\]

\[
\xi_{ijk} \sim \mathrm{N}\!\left[0, \; \sigma^2_{\xi_{ijk}} = \exp\!\left( \sum_{l=1}^{8} I(\mathrm{tar}_{ij} = l) \left( \gamma_{lk,\mathrm{int}} + g_{ilk,\mathrm{int}} + (\mathrm{rep}_{ij} - 1)(\gamma_{lk,\mathrm{slope}} + g_{ilk,\mathrm{slope}}) \right) \right) \right], \quad
g_{ilk} \sim \mathrm{MVN}\left[g_{ik}, \Sigma_{g_{ik}}\right]; \quad g_{ik} \sim \mathrm{MVN}\left[0, \Sigma_{g_k}\right] \tag{10}
\]

The covariate $\mathrm{tar}_{ij}$ indicates the target to which movement j by subject i is directed. The covariate $\mathrm{rep}_{ij}$ indicates the repetition number of movement j, starting at 1, among all movements by subject i to the target to which movement j is directed, and $I(\cdot)$ is the indicator function. To accommodate differences in baseline variance across targets, this model includes separate population-level intercepts $\gamma_{lk,\mathrm{int}}$ for each target l. The slopes $\gamma_{lk,\mathrm{slope}}$ on repetition number indicate the change in variance due to practice for target l; negative values indicate a reduction in movement variance. To accommodate subject- and target-specific effects, each subject-target combination has a random intercept $g_{ilk,\mathrm{int}}$ and a random slope $g_{ilk,\mathrm{slope}}$, and each subject has an overall random intercept $g_{ik,\mathrm{int}}$ and overall random slope $g_{ik,\mathrm{slope}}$, in the score variance model for each functional principal component. This model parameterization allows different baseline variances and changes in variance for each target and subject, but shares FPC basis functions across targets. The model also assumes independence of functional random effects $b_{il}$, $l = 1, \ldots, 8$, by the same subject to different targets, as well as independence of functional random effects $b_{il}$ and score variance random effects $g_{ilk}$ for the same subject. The validity of these assumptions for our data is discussed in Appendix D.

Throughout, fixed effects $\gamma_{lk,\mathrm{int}}$ and $\gamma_{lk,\mathrm{slope}}$ are given N[0, 100] priors. Random effects $g_{ilk,\mathrm{int}}$ and $g_{ilk,\mathrm{slope}}$ are modeled using a bivariate normal distribution to allow for correlation between the random intercept and slope parameters in each FPC score variance model, and with nesting to allow for correlations between the random effects for the same subject and different targets. We use the empirical Bayes method described in Section 4.2.4 to set the scale matrix parameters of the inverse-Wishart priors for $g_{ilk}$ and $g_{ik}$. Appendix D includes an analysis which examines the sensitivity of our results to various choices of prior hyperparameters.

We fit (9) and (10) using our VB method, with K = 2 principal components and a cubic B-spline evaluation matrix $\Theta$ with $K_\theta = 10$ basis functions.

6.2. Results

Figure 7 shows estimated score variances as a function of repetition number for X and Y coordinate right-hand movements to all targets. There is a decreasing trend in score variance for the first principal component scores for all targets and for both the X and Y coordinates, which agrees with our hypotheses regarding learning. Figure 7 also shows that nearly all of the variance of movement is attributable to the first FPC. Baseline variance is generally higher in the X direction than the Y direction, indicating that movement extent is generally more variable than movement direction.

To examine the adequacy of modeling score variance as a function of repetition number with a linear model, we compared the results of model (10) with a model for the score variances saturated in repetition number, i.e., where each repetition number m has its own set of parameters $\gamma_{lkm}$ in the model for the score variances:

\[
\xi_{ijk} \sim \mathrm{N}\!\left[0, \; \sigma^2_{\xi_{ijk}} = \exp\!\left( \sum_{l=1}^{8} \sum_{m=1}^{24} I(\mathrm{tar}_{ij} = l, \mathrm{rep}_{ij} = m)\, \gamma_{lkm} \right) \right]. \tag{11}
\]

The results for these two models are included in Figure 7. The general agreement between the linear and saturated models suggests that the slope-intercept model is reasonable. For some targets score variance is especially high for the first movement, which may reflect a familiarization with the experimental apparatus.

We now consider inference for the decreasing trend in variance for the first principal component scores. We are interested in the parameters $\gamma_{l1,\mathrm{slope}}$, which estimate the population-level target-specific changes in score variance for the first principal component with each additional movement. Figure 8 shows VB estimates and 95% credible intervals for the $\gamma_{l1,\mathrm{slope}}$ parameters for movements by the right hand to each target. All the point estimates $\gamma_{l1,\mathrm{slope}}$ are lower than 0, indicating decreasing first principal component score variance with additional repetition. For some targets and coordinates there is substantial evidence that $\gamma_{l1,\mathrm{slope}} < 0$; these results are consistent with our understanding of motor learning, although they do not adjust for multiple comparisons.

Appendix C includes results of a bivariate approach to modeling movement kinematics, which accounts for the 2-dimensional nature of the movement. In this approach, the X and Y coordinates of curves are concatenated, and each principal component reflects variation in both X and Y coordinates. For curves rotated to extend in the same direction, the results of this approach suggest that variation in movement extent (represented by the X coordinate) and movement direction (represented by the Y coordinate) are largely uncorrelated: the estimate of the first bivariate FPC represents variation primarily in the X coordinate, and is similar to the estimate of the first FPC in the X coordinate model, and vice versa for the second bivariate FPC. Analyses of score variance, then, closely follow the preceding univariate analyses.

Appendix B includes an analysis of data for one target using the VB, HMC, and SE methods. The three methods yield similar results.

7. Discussion

This article develops a framework for the analysis of covariate- and subject-dependent patterns of movement variability in kinematic data. Our methods allow for flexible modeling of the covariate dependence of the variance of functional data with easily interpretable results. Our approach allows for the estimation of subject-specific effects on variance, as well as the consideration of multiple covariates. We are motivated by studies of motor control, and anticipate these methods can be broadly applied in this context; for example, ongoing work uses the methods developed here to study the initial post-stroke degree of impairment and the recovery process over time.

By applying these methods to our motivating dataset, we have demonstrated that movement variability is reduced with repetition. Results in Appendix A additionally show that the baseline level of skill of subjects is correlated across targets and hands, and that baseline movement variability is considerably greater in the non-dominant than the dominant hand. Further applications of these methods in scientifically important contexts could focus, for example, on whether movement variability is reduced faster with training in the dominant hand, or on whether training with one hand transfers skill to the other hand. Further research could also investigate target-specific differences in improvement of variance with training. Movements to some of the targets require coordination between the shoulder and elbow, whereas others are primarily single-joint movements; the effectiveness of training may depend on the complexity of the movement.

Figure 7. VB estimates of score variances for right hand movements to each target (in columns), separately for each direction (X or Y, in rows). Panels show the VB estimates of the score variance as a function of repetition number using the slope-intercept model (10) in red and orange (first and second FPC, respectively), and using the saturated one-parameter-per-repetition-number model (11), in black and gray (first and second FPC, respectively).

We have provided three different estimation approaches for fitting heteroscedastic functional principal components models. Given its computational efficiency and comparable accuracy to the HMC and SE methods, we recommend use of the VB approach for exploratory analyses and model building. However, because of its approximate nature, we advise that any conclusions derived from the VB approach be confirmed with one of the other two methods, perhaps with a subset of the data if required for computational feasibility.

An alternative approach to the analysis of this dataset could treat the target effects $\gamma_{lk,\mathrm{int}}$ and $\gamma_{lk,\mathrm{slope}}$ in model (10) for the score variances as random effects centered around parameters $\mu_{k,\mathrm{int}}$ and $\mu_{k,\mathrm{slope}}$, representing the average across-target baseline score variance and change in score variance with repetition. Some advantages of this approach would be the estimation of parameters that summarize the global effect of repetition on movement variability and shrinkage of the target-specific score variance parameters. However, with only 8 random target effects, the model would be sensitive to the specification of priors. Moreover, as discussed above, movements to different targets impose different demands on coordination and skill, which may reduce the interpretability of the parameters $\mu_{k,\mathrm{int}}$ and $\mu_{k,\mathrm{slope}}$.

Our analysis here is of curves linearly registered onto a common time domain, although our method could be applied to curves with different time domains, or to sparsely observed functional data. Our research group is currently developing an improved approach to registration in kinematic experiments that will take account of the repeated observations at the subject level by seeking to estimate subject- and curve-specific warping functions. This approach, combined with the methods we present in the current article, will eventually allow a more complete model for movement variability that takes into account both variability in movement duration and variability in movement trajectories.

Figure: VB estimates of γ_{l1,slope}. This figure shows VB estimates and credible intervals for the target-specific score variance slope parameters γ_{l1,slope} for movements by the right hand to each target, for the X and Y coordinates.
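To indicate what such a registration model could look like, one schematic form is sketched below; the notation (observed curve Y_{ij}, registered curve X_{ij}, warping function h_{ij}) is illustrative and is not taken from the article or its supplement.

\[
Y_{ij}(t) \;=\; X_{ij}\!\left\{ h_{ij}(t) \right\}, \qquad t \in [0,1], \quad h_{ij}(0)=0,\; h_{ij}(1)=1,\; h_{ij} \text{ monotone increasing},
\]

where i indexes subjects and j indexes repeated curves within subject. Modeling the h_{ij} hierarchically, for example as curve-level deviations from a subject-level warping function, is one way to exploit the repeated observations at the subject level; the registered curves X_{ij} could then be analyzed with the heteroscedastic FPCA model described here.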

There are several directions for further development. A full Bayesian treatment could estimate all quantities in model (5) jointly, or could condition on only the FPCs and jointly estimate all other quantities; given the very flexible nature of this model, additional constraints might be required in such a Bayesian treatment to improve identifiability. More complex models could allow for correlations between functional random effects and score variance random effects. Considering our data from the perspective of shape analysis may provide better understanding of interpretable movement features like location, scale, and orientation (Kurtek et al. 2012; Gu, Pati, and Dunson 2012). Finally, an alternative approach to that presented here would be to model covariate-dependent score distributions through quantile regression. This may produce valuable insights into the complete distribution of movements, especially when this distribution is not symmetric, but some work is needed to understand the connection of this technique to traditional FPCA.
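As one illustration of the quantile regression idea, and only as a sketch not specified in the article, the τth conditional quantile of an FPC score could be modeled linearly in covariates:

\[
Q_{\tau}\!\left(c_{ik} \mid x_{i}\right) \;=\; \beta_{0k}(\tau) + x_{i}^{\top} \beta_{k}(\tau), \qquad \tau \in (0,1),
\]

where c_{ik} denotes the kth FPC score for curve i and x_i the associated covariates. Fitting this model over a grid of τ values would characterize the full covariate-dependent score distribution, including any asymmetry, rather than only its variance.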

8. Supplementary Materials

The supplementary materials contain the following appendices: A, containing additional results from our real data analysis; B, containing results comparing the VB, HMC, and SE methods as applied to the kinematic data; C, containing a description of and results from a bivariate approach to the analysis of our kinematic data; D, containing a sensitivity analysis for our choice of hyperparameters and examining alternative mean specifications; E, containing a derivation of our variational Bayes algorithm and more detail on our HMC implementation; and F, containing additional simulation results.

The supplementary materials also include code implementing our methods and all the simulation scenarios presented here. This code can fit models in the form of model (5), provided that, as in our real data application and simulations, only one functional random effect is associated with each functional observation.

Funding

The work of D. Backenroth and J. Goldsmith was supported in part by Award R21EB018917 from the National Institute of Biomedical Imaging and Bioengineering. The work of J. Goldsmith was also supported by Award R01NS097423-01 from the National Institute of Neurological Disorders and Stroke and by Award R01HL123407 from the National Heart, Lung, and Blood Institute.

References

Bates, D., Mächler, M., Bolker, B., and Walker, S. (2015), "Fitting Linear Mixed-Effects Models Using lme4," Journal of Statistical Software, 67, 1–48.
Bishop, C. M. (1999), "Bayesian PCA," Advances in Neural Information Processing Systems, 382–388.
Chiou, J.-M., Müller, H.-G., and Wang, J.-L. (2003), "Functional Quasi-Likelihood Regression Models with Smooth Random Effects," Journal of the Royal Statistical Society, Series B, 65, 405–423.
Di, C.-Z., Crainiceanu, C. M., Caffo, B. S., and Punjabi, N. M. (2009), "Multilevel Functional Principal Component Analysis," Annals of Applied Statistics, 4, 458–488.
Djeundje, V. A. B., and Currie, I. D. (2010), "Appropriate Covariance-Specification via Penalties for Penalized Splines in Mixed Models for Longitudinal Data," Electronic Journal of Statistics, 4, 1202–1224.
Gelman, A., and Rubin, D. B. (1992), "Inference from Iterative Simulation Using Multiple Sequences," Statistical Science, 7, 457–472.
Goldsmith, J., Greven, S., and Crainiceanu, C. M. (2013), "Corrected Confidence Bands for Functional Data Using Principal Components," Biometrics, 69, 41–51.
Goldsmith, J., and Kitago, T. (2016), "Assessing Systematic Effects of Stroke on Motor Control Using Hierarchical Function-on-Scalar Regression," Journal of the Royal Statistical Society, Series C, 65, 215–236.
Goldsmith, J., Scheipl, F., Huang, L., Wrobel, J., Gellar, J., Harezlak, J., McLean, M. W., Swihart, B., Xiao, L., Crainiceanu, C., and Reiss, P. T. (2016), refund: Regression with Functional Data, R package version 0.1-17, available at http://CRAN.R-project.org/package=refund
Goldsmith, J., Wand, M. P., and Crainiceanu, C. M. (2011), "Functional Regression via Variational Bayes," Electronic Journal of Statistics, 5, 572–602.
Goldsmith, J., Zipunnikov, V., and Schrack, J. (2015), "Generalized Multilevel Function-on-Scalar Regression and Principal Component Analysis," Biometrics, 71, 344–353.
Gu, K., Pati, D., and Dunson, D. B. (2012), "Bayesian Hierarchical Modeling of Simply Connected 2D Shapes," arXiv preprint arXiv:1201.1658.
Guo, W. (2002), "Functional Mixed Effects Models," Biometrics, 58, 121–128.
Huang, H., Li, Y., and Guan, Y. (2014), "Joint Modeling and Clustering Paired Generalized Longitudinal Trajectories with Application to Cocaine Abuse Treatment Data," Journal of the American Statistical Association, 83, 210–223.
Huang, V., Ryan, S., Kane, L., Huang, S., Berard, J., Kitago, T., Mazzoni, P., and Krakauer, J. (2012), "3D Robotic Training in Chronic Stroke Improves Motor Control but not Motor Function," Society for Neuroscience, October 2012, New Orleans.
James, G. M., Hastie, T. J., and Sugar, C. A. (2000), "Principal Component Models for Sparse Functional Data," Biometrika, 87, 587–602.
Jiang, C.-R., and Wang, J.-L. (2010), "Covariate Adjusted Functional Principal Components Analysis for Longitudinal Data," The Annals of Statistics, 38, 1194–1226.
Jordan, M. I. (2004), "Graphical Models," Statistical Science, 19, 140–155.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999), "An Introduction to Variational Methods for Graphical Models," Machine Learning, 37, 183–233.
Kitago, T., Goldsmith, J., Harran, M., Kane, L., Berard, J., Huang, S., Ryan, S. L., Mazzoni, P., Krakauer, J. W., and Huang, V. S. (2015), "Robotic Therapy for Chronic Stroke: General Recovery of Impairment or Improved Task-Specific Skill?" Journal of Neurophysiology, 114, 1885–1894.
Krakauer, J. W. (2006), "Motor Learning: Its Relevance to Stroke Recovery and Neurorehabilitation," Current Opinion in Neurology, 19, 84–90.
Kurtek, S., Srivastava, A., Klassen, E., and Ding, Z. (2012), "Statistical Modeling of Curves Using Shapes and Related Features," Journal of the American Statistical Association, 107, 1152–1165.
McLean, M. W., Scheipl, F., Hooker, G., Greven, S., and Ruppert, D. (2013), "Bayesian Functional Generalized Additive Models for Sparsely Observed Covariates," under review.
Morris, J. S., and Carroll, R. J. (2006), "Wavelet-Based Functional Mixed Models," Journal of the Royal Statistical Society, Series B, 68, 179–199.
Ormerod, J., and Wand, M. P. (2012), "Gaussian Variational Approximation Inference for Generalized Linear Mixed Models," The American Statistician, 21, 2–17.
Peng, J., and Paul, D. (2009), "A Geometric Approach to Maximum Likelihood Estimation of the Functional Principal Components from Sparse Longitudinal Data," Journal of Computational and Graphical Statistics, 18, 995–1015.
Ramsay, J. O., and Silverman, B. W. (2005), Functional Data Analysis, New York: Springer.
Scheipl, F., Staicu, A.-M., and Greven, S. (2015), "Functional Additive Mixed Models," Journal of Computational and Graphical Statistics, 24, 477–501.
Scholz, J.-P., and Schöner, G. (1999), "The Uncontrolled Manifold Concept: Identifying Control Variables for a Functional Task," Experimental Brain Research, 126, 289–306.
Schwarz, G. (1978), "Estimating the Dimension of a Model," The Annals of Statistics, 6, 461–464.
Shmuelof, L., Krakauer, J. W., and Mazzoni, P. (2012), "How is a Motor Skill Learned? Change and Invariance at the Levels of Task Success and Trajectory Control," Journal of Neurophysiology, 108, 578–594.
Stan Development Team (2013), Stan Modeling Language User's Guide and Reference Manual, Version 1.3, available at http://mc-stan.org/
Tanaka, H., Sejnowski, T. J., and Krakauer, J. W. (2009), "Adaptation to Visuomotor Rotation Through Interaction between Posterior Parietal and Motor Cortical Areas," Journal of Neurophysiology, 102, 2921–2932.
Tipping, M. E., and Bishop, C. (1999), "Probabilistic Principal Component Analysis," Journal of the Royal Statistical Society, Series B, 61, 611–622.
Titterington, D. M. (2004), "Bayesian Methods for Neural Networks and Related Models," Statistical Science, 19, 128–139.
van der Linde, A. (2008), "Variational Bayesian Functional PCA," Computational Statistics and Data Analysis, 53, 517–533.
Šmídl, V., and Quinn, A. (2007), "On Bayesian Principal Component Analysis," Computational Statistics & Data Analysis, 51, 4101–4123.
Yao, F., Müller, H., and Wang, J. (2005), "Functional Data Analysis for Sparse Longitudinal Data," Journal of the American Statistical Association, 100, 577–590.
Yarrow, L., Brown, P., and Krakauer, J. W. (2009), "Inside the Brain of an Elite Athlete: The Neural Processes that Support High Achievement in Sports," Nature Reviews Neuroscience, 10, 585–596.