# Individualized Fusion Learning (iFusion) for Individualized Inference


Individualized Fusion Learning (iFusion) for Individualized Inference

Min-ge Xie Department of Statistics, Rutgers University

Joint work with Jieli Shen and Regina Liu

IMA Workshop on Precision Medicine, Minneapolis/St. Paul, Minnesota, USA; September 14-16, 2017

Research supported in part by grants from NSF and Dun & Bradstreet

Big data, heterogeneity & fusion learning

Today, the integration of computer technology into science and daily life has enabled the collection of big data across all fields

Big data imposes difficulties/challenges (and also creates opportunities):

– Memory/storage issues: too large to fit into a single computer or a single site.

– Computing issues: too expensive to perform any computationally intensive analysis

– Statistical issues: heterogeneity (in designs, information, & more), sparsity, non-conventional (e.g.,image/voice/text/network) data, missing data, random/stochastic components, etc.

▶ My task today – introduce the iFusion approach, an example of how to conduct fusion learning and process information in big data using a new statistical inference tool known as confidence distributions (CDs).

Illustrative Example: iFusion with big data

Red = Interest; Blue = Clique (similar ones)

Personalized (precision) medicine/individualized inference

◦ Bias-variance tradeoff

iFusion: summarize individual information in CDs; form a ‘clique’ (ties/near ties); combine information within the clique – theoretically sound; division of labor

◦ Computationally feasible for big data (vs. conv. hierarchical/mixture/non-parametric Bayesian methods)
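As a concrete, hypothetical sketch of these steps, suppose each individual's information is summarized by a normal CD N(θ̂_k, se_k²). The function below forms a clique by a simple distance rule and combines the clique's CDs by precision weighting; the function name, the distance rule, and the weighting choice are our illustrative assumptions, not the talk's actual procedure.

```python
# Hypothetical sketch of the three iFusion steps above (names and the simple
# distance/weighting choices are ours, not from the talk):
# 1) summarize each individual k by a normal CD N(theta_hat_k, se_k^2);
# 2) form the target's "clique": individuals with nearby estimates (ties/near ties);
# 3) combine the clique's CDs -- here by precision weighting, one simple rule.
import numpy as np

def ifusion_normal(theta_hat, se, target, radius):
    clique = np.abs(theta_hat - theta_hat[target]) <= radius  # step 2
    w = 1.0 / se[clique] ** 2                                 # precision weights
    center = np.sum(w * theta_hat[clique]) / np.sum(w)        # combined CD is
    sd = np.sqrt(1.0 / np.sum(w))                             # N(center, sd^2)
    return center, sd
```

Only individuals inside the clique contribute, so the target borrows strength without being biased by dissimilar individuals.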

Illustrative Example: iFusion with big data

Simulation study: iFusion of big data with no subgroup/clustering structure

Setting: Model Yik = αk + βk xik + N(0, 1), for i = 1, . . . , nk, k = 1, . . . , K

θk = (αk, βk) = ( R cos([(k−1)/5] · 2π/1200) + U(−1, 1)/nk , R sin([(k−1)/5] · 2π/1200) + U(−1, 1)/nk )

Size/Target: K × nk = 6000 × 40 = 240,000 (≈ 0.24 million); Target: Individual-1500 (say)

Chain structure: No subgroups

Figure : Parameter values (αk , βk ) (left) and simulated samples (xik , yik ) (right), i = 1, . . . , 50, k = 1, . . . , 6000, with target individual in blue and its clique in yellow.

Question: Can we borrow information from individuals whose θk equals, or is close to, θ1500?
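The setting above can be simulated as follows (a minimal sketch; R = 1 and the seed are our choices, and U(−1, 1) denotes a uniform draw on (−1, 1)):

```python
# Simulating the chain-structured setting above (a sketch; R = 1 and the
# random seed are our choices, not specified in the slides).
import numpy as np

rng = np.random.default_rng(0)
K, n_k, R = 6000, 40, 1.0
k = np.arange(1, K + 1)
angle = np.floor((k - 1) / 5) * 2 * np.pi / 1200          # chain around a circle
alpha = R * np.cos(angle) + rng.uniform(-1, 1, K) / n_k   # alpha_k
beta = R * np.sin(angle) + rng.uniform(-1, 1, K) / n_k    # beta_k

x = rng.normal(size=(K, n_k))
y = alpha[:, None] + beta[:, None] * x + rng.normal(size=(K, n_k))
# y[1499] holds the n_k observations of the target, Individual-1500
```

The (αk, βk) march slowly around a circle of radius R, so there are no subgroups, only a chain of near ties.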

Fusion learning

Fusion learning refers to learning from different studies/sources (or different parts of a single study) that leads to more effective inference and prediction than any individual study/part alone.

– Such learning methodology is of vital importance, especially in light of the trove of data collected routinely from various sources in almost all aspects of the real world, at all times!

Fusion learning concerns four V’s –

– In data science era, we have information explosion with big data of three Vs (Volume, Velocity, Variety!) from different databases, different data sources, different labs, ...

– What do we gain from combining inferences (fusion learning)? Validity (the 4th V) + enhanced/strengthened overall inference

◦ Meta-analysis is one type of fusion learning, though fusion learning is much broader.

▶ Confidence distributions are a useful tool for fusion learning/meta-analysis.

Introduction to confidence distribution (CD)

Statistical inference (Parameter estimation):

Point estimate

Interval estimate

Distribution estimate (e.g., confidence distribution)

Example: X1, . . . , Xn i.i.d. from N(µ, 1)

Point estimate: x̄n = (1/n) ∑ xi

Interval estimate: ( x̄n − 1.96/√n , x̄n + 1.96/√n )

Distribution estimate: N(x̄n, 1/n)

The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.

(Xie and Singh 2013)
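For the N(µ, 1) example, the three forms of estimate can be computed side by side (a sketch; the sample and seed are our choices):

```python
# Point, interval, and distribution estimates for mu in the N(mu, 1) example
# (a sketch; the sample is simulated with a seed of our choosing).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 100
x = rng.normal(0.3, 1.0, n)          # data: X_1, ..., X_n i.i.d. N(mu, 1)

xbar = x.mean()                                            # point estimate
ci = (xbar - 1.96 / np.sqrt(n), xbar + 1.96 / np.sqrt(n))  # 95% interval estimate
cd = norm(loc=xbar, scale=1 / np.sqrt(n))                  # distribution estimate N(xbar, 1/n)
# the CD's 2.5% and 97.5% quantiles recover the 95% interval
# (up to 1.96 vs the exact normal quantile 1.9600)
```

The distribution estimate carries the point estimate (its median) and every confidence interval at once, which is why it is "very informative".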

CD is very informative – Point estimators, confidence intervals, p-values & more

CD can provide meaningful answers for all questions in statistical inference.


(cf., Xie & Singh 2013; Singh et al. 2007)

Definition: Confidence Distribution

Definition:

A confidence distribution (CD) is a sample-dependent distribution function on the parameter space that can represent confidence intervals (regions) of all levels for a parameter of interest.

– Cox (2013, Int. Stat. Rev. ): The CD approach is “to provide simple and interpretable summaries of what can reasonably be learned from data (and an assumed model).”

– Efron (2013, Int. Stat. Rev. ): The CD development is “a grounding process” to help solve “perhaps the most important unresolved problem in statistical inference” on “the use of Bayes theorem in the absence of prior information.”

◦ Wide range of examples: bootstrap distributions, (normalized) likelihood functions, empirical likelihood, p-value functions, fiducial distributions, some informative priors and Bayesian posteriors, among others

More CD examples

Under regularity conditions, we can prove that a normalized likelihood function (with respect to the parameter θ)

L(θ | data) / ∫ L(θ | data) dθ

is a confidence density function.

Example: X1, . . . ,Xn i.i.d. follows N(µ, 1)

Likelihood function

L(µ | data) = ∏ f(xi | µ) = C e^{−(1/2) ∑(xi−µ)²} = C e^{−(n/2)(x̄n−µ)² − (1/2) ∑(xi−x̄n)²}

Normalized with respect to µ:

L(µ | data) / ∫ L(µ | data) dµ = · · · = (1/√(2π/n)) e^{−(n/2)(µ−x̄n)²}

It is the density of N(x̄n, 1/n)!
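The derivation can be checked numerically: normalizing the likelihood over µ on a fine grid reproduces the N(x̄n, 1/n) density (a sketch; the sample, grid, and seed are our choices):

```python
# Numerical check: the N(mu, 1) likelihood, normalized over mu, matches the
# N(xbar, 1/n) density (a sketch; sample, grid range, and seed are our choices).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 25
x = rng.normal(0.5, 1.0, n)
xbar = x.mean()

mu = np.linspace(xbar - 2.0, xbar + 2.0, 4001)             # fine grid over mu
loglik = np.array([norm.logpdf(x, m, 1.0).sum() for m in mu])
lik = np.exp(loglik - loglik.max())                        # stabilized L(mu | data)
density = lik / (lik.sum() * (mu[1] - mu[0]))              # normalize over mu
target = norm.pdf(mu, loc=xbar, scale=1 / np.sqrt(n))      # N(xbar, 1/n) density
```

Up to grid error, `density` and `target` coincide, as the algebra above predicts.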

More CD examples

Example: (Bivariate normal correlation) Let ρ denote the correlation coefficient of a bivariate normal population, and let ρ̂ be its sample version.

Fisher’s z, z = (1/2) log((1 + ρ̂)/(1 − ρ̂)), has the limiting distribution N( (1/2) log((1 + ρ)/(1 − ρ)), 1/(n − 3) ) =⇒

Hn(ρ) = 1 − Φ( √(n − 3) { (1/2) log((1 + ρ̂)/(1 − ρ̂)) − (1/2) log((1 + ρ)/(1 − ρ)) } )

is an asymptotic CD for ρ, as the sample size n → ∞.

– Hn(ρ) is a cumulative distribution function on Θ = (−1, 1), the parameter space of ρ

– The quantiles of Hn(ρ) can provide confidence intervals of all levels for ρ.
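A sketch of this CD in code: Hn(ρ) evaluated from simulated bivariate normal data, with its quantiles giving the familiar Fisher-z confidence limits (the true ρ = 0.6, n, and the seed are our choices):

```python
# The asymptotic CD H_n(rho) built from Fisher's z, with confidence limits read
# off its quantiles (a sketch; true rho = 0.6, n, and the seed are our choices).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, rho_true = 200, 0.6
xy = rng.multivariate_normal([0.0, 0.0], [[1.0, rho_true], [rho_true, 1.0]], size=n)
rho_hat = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]            # sample correlation

z = np.arctanh   # Fisher's z: z(r) = (1/2) log((1 + r) / (1 - r))

def H_n(rho):
    """Asymptotic CD for rho: 1 - Phi(sqrt(n - 3) * (z(rho_hat) - z(rho)))."""
    return 1.0 - norm.cdf(np.sqrt(n - 3) * (z(rho_hat) - z(rho)))

def cd_quantile(alpha):
    """The rho solving H_n(rho) = alpha, i.e. the CD's alpha-quantile."""
    return np.tanh(z(rho_hat) + norm.ppf(alpha) / np.sqrt(n - 3))

lo, hi = cd_quantile(0.025), cd_quantile(0.975)            # 95% interval for rho
```

Note that the quantile pair (0.025, 0.975) reproduces the standard Fisher-z interval, and any other pair of quantiles gives an interval of the corresponding level.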

Three forms of CD presentations

[Figure: three panels over µ showing the confidence density, the confidence distribution (CD), and the confidence curve (CV).]

Confidence density: in the form of a density function hn(θ); e.g., N(x̄n, 1/n) as hn(θ) = (1/√(2π/n)) e^{−(n/2)(θ−x̄n)²}

Confidence distribution: in the form of a cumulative distribution function Hn(θ); e.g., N(x̄n, 1/n) as Hn(θ) = Φ( √n (θ − x̄n) )

Confidence curve: CVn(θ) = 2 min{ Hn(θ), 1 − Hn(θ) }; e.g., N(x̄n, 1/n) as CVn(θ) = 2 min{ Φ( √n (θ − x̄n) ), 1 − Φ( √n (θ − x̄n) ) }
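The three presentations of the same CD N(x̄n, 1/n) can be evaluated directly on a grid (a sketch; n and x̄n are arbitrary illustrative values):

```python
# The three presentations of the same CD N(xbar, 1/n), evaluated on a grid
# (a sketch; n and xbar are arbitrary illustrative values).
import numpy as np
from scipy.stats import norm

n, xbar = 50, 0.3
theta = np.linspace(-0.2, 0.8, 501)

h = norm.pdf(theta, loc=xbar, scale=1 / np.sqrt(n))   # confidence density h_n
H = norm.cdf(np.sqrt(n) * (theta - xbar))             # confidence distribution H_n
cv = 2 * np.minimum(H, 1 - H)                         # confidence curve CV_n
# CV_n peaks at 1 at theta = xbar; it crosses level alpha at the endpoints
# of the (1 - alpha) confidence interval
```

All three carry the same information; the confidence curve is simply the most convenient form for reading off intervals of every level at a glance.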

CD — a unifying concept for distributional inference

Our understanding/interpretation: any approach, regardless of being frequentist, fiducial, or Bayesian, can potentially be unified under the concept of confidence distributions, as long as it can be used to build confidence intervals of all levels, exactly or asymptotically.

▶ May provide a union for Bayesian, frequentist & fiducial (BFF) inferences

▶ Supports new methodology developments – providing inference tools for problems whose solutions were previously unavailable or unknown

◦ From our Rutgers group, for instance -

– New prediction approaches

– New testing methods

– New simulation schemes (⇒Application to precision medicine??)∗

– Combining information from diverse sources through combining CDs (fusion learning/meta analysis, split & conquer, etc.)

Fusion learning by CDs

Key idea (steps)

Summarize relevant data information using a CD in each study

Synthesize information from diverse sources/studies via combination of the CDs from these studies

General (& unifying) framework on combining CDs has been developed

◦ A simple illustrative example (Stouffer method):

H(c)(θ) = Φ( { Φ−1(H1(θ)) + · · · + Φ−1(HK(θ)) } / √K ),

where Hi(θ) is the CD from the i-th study/source

◦ For more approaches and in-depth discussion, see Singh et al. (2005), Xie et al. (2011), and Schweder and Hjort (2016).
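The Stouffer-type combination above can be sketched for K normal per-study CDs Hk = N(mk, sk²); the per-study means and scales here are hypothetical numbers, not from any real study:

```python
# Stouffer-type CD combination H^(c)(theta) = Phi(sum_k Phi^{-1}(H_k(theta)) / sqrt(K))
# for K normal per-study CDs H_k = N(m_k, s_k^2) (a sketch; the m_k, s_k are
# hypothetical numbers, not from any real study).
import numpy as np
from scipy.stats import norm

def stouffer_cd(theta, means, sds):
    """Combine the K CDs H_k(theta) = Phi((theta - m_k) / s_k) a la Stouffer."""
    H_k = norm.cdf((theta - means[:, None]) / sds[:, None])   # K x len(theta)
    return norm.cdf(norm.ppf(H_k).sum(axis=0) / np.sqrt(len(means)))

means = np.array([0.9, 1.1, 1.0])     # per-study CD centers (hypothetical)
sds = np.array([0.2, 0.3, 0.25])      # per-study CD scales (hypothetical)
theta = np.linspace(0.0, 2.0, 2001)
H_c = stouffer_cd(theta, means, sds)  # combined CD, again a valid CDF in theta
```

Because Φ−1 is monotone, the combination is again a distribution function on the parameter space, so intervals of every level can be read off H(c) just as for a single-study CD.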

Fusion learning by CDs

Why combine CDs in fusion learning?

CD is informative (much more than a single point or an interval)

CD concept is broad (covering a wide range of examples across BFF paradigms)

CD combination is supported by statistical theory (e.g., ensuring frequentist coverage, etc.)

It’s computationally feasible for big data (inherently a “divide-and-conquer” approach)

. . . , flexible, effective, versatile, etc.

Individualized fusion learning