
  • Individualized Fusion Learning (iFusion) for Individualized Inference

    Min-ge Xie Department of Statistics, Rutgers University

    Joint work with Jieli Shen and Regina Liu

    IMA Workshop on Precision Medicine, Minneapolis/St. Paul, Minnesota, USA; September 14-16, 2017

    Research supported in part by grants from NSF and Dun & Bradstreet

  • Big data, heterogeneity & fusion learning

    Today, the integration of computer technology into science and daily life has enabled the collection of big data across all fields

    These impose difficulties/challenges (and also create opportunities):

    – Memory/storage issues: too large to fit into a single computer or a single site.

    – Computing issues: too expensive to perform any computationally intensive analysis

    – Statistical issues: heterogeneity (in designs, information, & more), sparsity, non-conventional (e.g., image/voice/text/network) data, missing data, random/stochastic components, etc.

    ▶ My task today – introduce the iFusion approach, an example of how to conduct fusion learning and process information in big data using a new statistical inference tool known as confidence distributions (CDs).

  • Illustrative Example: iFusion with big data

    [Figure legend: red = individual of interest; blue = clique (similar individuals)]

    Personalized (precision) medicine/individualized inference

    ◦ Bias-variance tradeoff

    iFusion: summarize individual information in CDs; form a ‘clique’ (ties/near ties); combine information within the clique – theoretically sound; division of labor (see the sketch below)

    ◦ Computationally feasible for big data (vs. conventional hierarchical/mixture/non-parametric Bayesian methods)
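    A minimal Python sketch of that pipeline, under simplifying assumptions: approximate normal CDs from per-individual OLS fits, a naive distance-based clique rule, and precision-weighted combining. The helper names are hypothetical; the paper's actual screening and combining recipes are more refined.

```python
# Hypothetical sketch of the iFusion pipeline: (1) summarize each individual
# by an approximate normal CD, (2) screen for a clique near the target,
# (3) combine the CDs within the clique. Not the paper's exact rules.
import numpy as np

def individual_cd(y, x):
    """OLS fit for one individual: an approximate normal CD N(theta_hat, cov)
    for theta_k = (alpha_k, beta_k), summarized by its mean and covariance."""
    X = np.column_stack([np.ones_like(x), x])
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta_hat
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return theta_hat, sigma2 * np.linalg.inv(X.T @ X)

def form_clique(estimates, target, radius):
    """Naive screening: keep individuals whose estimates fall within `radius`
    of the target's estimate (a crude stand-in for 'ties/near ties')."""
    d = np.linalg.norm(estimates - estimates[target], axis=1)
    return np.where(d <= radius)[0]

def combine_normal_cds(theta_hats, covs):
    """Precision-weighted combination of approximate normal CDs;
    the combined CD is N(m, P^{-1})."""
    precisions = [np.linalg.inv(c) for c in covs]
    P = sum(precisions)
    m = np.linalg.solve(P, sum(p @ t for p, t in zip(precisions, theta_hats)))
    return m, np.linalg.inv(P)
```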

  • Illustrative Example: iFusion with big data

    Simulation study: iFusion of big data with no subgroup/clustering structures

    Setting: model $Y_{ik} = \alpha_k + \beta_k x_{ik} + N(0, 1)$, for $i = 1, \dots, n_k$, $k = 1, \dots, K$

    $\theta_k = (\alpha_k, \beta_k) = \left( R\cos\!\left( \left[\tfrac{k-1}{5}\right] \tfrac{2\pi}{1200} \right) + \tfrac{U(-1,1)}{n_k},\; R\sin\!\left( \left[\tfrac{k-1}{5}\right] \tfrac{2\pi}{1200} \right) + \tfrac{U(-1,1)}{n_k} \right)$

    Size: $K \times n_k = 6000 \times 40 = 240{,}000$ observations ($\approx$ 0.24 million); target: individual $k = 1500$ (say)

    Chain structure: no subgroups

    Figure: parameter values $(\alpha_k, \beta_k)$ (left) and simulated samples $(x_{ik}, y_{ik})$ (right), $i = 1, \dots, 50$, $k = 1, \dots, 6000$, with the target individual in blue and its clique in yellow.

    Question: can we borrow information from individuals whose $\theta_k$ is the same as, or similar to, $\theta_{1500}$?
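    A small sketch of this data-generating process; the radius $R$, the covariate distribution, and reading $[\,\cdot\,]$ as the floor function are assumptions not fixed by the slide.

```python
# Sketch of the simulation setting: K = 6000 individuals, n_k = 40 each,
# parameters on a slowly rotating circle (no subgroup structure).
import numpy as np

rng = np.random.default_rng(0)
K, n_k, R = 6000, 40, 10.0            # R is a placeholder; not given on the slide

angles = (np.arange(1, K + 1) - 1) // 5 * 2 * np.pi / 1200   # [(k-1)/5] * 2*pi/1200
jitter = rng.uniform(-1, 1, size=(K, 2)) / n_k               # U(-1,1)/n_k perturbations
thetas = np.column_stack([R * np.cos(angles), R * np.sin(angles)]) + jitter

x = rng.normal(size=(K, n_k))         # covariates; their distribution is unspecified
y = thetas[:, [0]] + thetas[:, [1]] * x + rng.normal(size=(K, n_k))
```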

  • Fusion learning

    Fusion learning refers to learning from different studies/sources (or different parts of a single study) in a way that leads to more effective inference and prediction than any individual study/part alone.

    – Such learning methodology is of vital importance, especially in light of the trove of data collected routinely from various sources in almost all aspects of the real world and at all times!

    Fusion learning concerns four V's –

    – In the data-science era, we have an information explosion, with big data of three V's (Volume, Velocity, Variety!) from different databases, different data sources, different labs, ...

    – What do we gain from combining inferences (fusion learning)? Validity (the 4th V), plus enhanced/strengthened overall inference

    ◦ Meta-analysis is one type of fusion learning, although fusion learning is much broader.

    ▶ Confidence distributions are a useful tool for fusion learning/meta-analysis.

  • Introduction to confidence distribution (CD)

    Statistical inference (Parameter estimation):

    Point estimate

    Interval estimate

    Distribution estimate (e.g., confidence distribution)

    Example: $X_1, \dots, X_n$ i.i.d. $\sim N(\mu, 1)$

    Point estimate: $\bar{x}_n = \frac{1}{n}\sum_{i=1}^n x_i$

    Interval estimate: $\left(\bar{x}_n - 1.96/\sqrt{n},\; \bar{x}_n + 1.96/\sqrt{n}\right)$

    Distribution estimate: $N(\bar{x}_n, \tfrac{1}{n})$

    The idea of the CD approach is to use a sample-dependent distribution (or density) function to estimate the parameter of interest.

    (Xie and Singh 2013)
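    The three estimates are easy to compute; a minimal sketch (simulated data, so the numbers are illustrative only):

```python
# The three estimate types from the N(mu, 1) example.
import numpy as np
from scipy import stats

x = np.random.default_rng(1).normal(loc=0.3, scale=1.0, size=50)
n, xbar = len(x), x.mean()

point = xbar                                                      # point estimate
interval = (xbar - 1.96 / np.sqrt(n), xbar + 1.96 / np.sqrt(n))   # 95% interval
cd = stats.norm(loc=xbar, scale=1 / np.sqrt(n))                   # distribution estimate N(xbar, 1/n)

# Confidence intervals of any level can be read off the CD's quantiles:
assert np.allclose(cd.interval(0.95), interval, atol=1e-3)
```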

  • CD is very informative – Point estimators, confidence intervals, p-values & more

    CD can provide meaningful answers for all questions in statistical inference –


    (cf., Xie & Singh 2013; Singh et al. 2007)

  • Definition: Confidence Distribution

    Definition:

    A confidence distribution (CD) is a sample-dependent distribution function on the parameter space that can represent confidence intervals (regions) of all levels for a parameter of interest.

    – Cox (2013, Int. Stat. Rev.): The CD approach is “to provide simple and interpretable summaries of what can reasonably be learned from data (and an assumed model).”

    – Efron (2013, Int. Stat. Rev.): The CD development is “a grounding process” to help solve “perhaps the most important unresolved problem in statistical inference” on “the use of Bayes theorem in the absence of prior information.”

    ◦ Wide range of examples: bootstrap distribution, (normalized) likelihood function, empirical likelihood, p-value functions, fiducial distributions, some informative priors and Bayesian posteriors, among others

  • More CD examples

    Under regularity conditions, we can prove that a normalized likelihood function (with respect to the parameter $\theta$)

    $\frac{L(\theta \mid \text{data})}{\int L(\theta \mid \text{data})\, d\theta}$

    is a confidence density function.

    Example: $X_1, \dots, X_n$ i.i.d. $\sim N(\mu, 1)$

    Likelihood function:

    $L(\mu \mid \text{data}) = \prod f(x_i \mid \mu) = C e^{-\frac{1}{2}\sum (x_i - \mu)^2} = C e^{-\frac{n}{2}(\bar{x}_n - \mu)^2 - \frac{1}{2}\sum (x_i - \bar{x}_n)^2}$

    Normalized with respect to $\mu$:

    $\frac{L(\mu \mid \text{data})}{\int L(\mu \mid \text{data})\, d\mu} = \cdots = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n}{2}(\mu - \bar{x}_n)^2}$

    It is the density of $N(\bar{x}_n, \tfrac{1}{n})$!

  • More CD examples

    Example: (Bivariate normal correlation) Let $\rho$ denote the correlation coefficient of a bivariate normal population, and let $\hat\rho$ be its sample version.

    Fisher's z, $z = \frac{1}{2}\log\frac{1+\hat\rho}{1-\hat\rho}$, has the limiting distribution $N\!\left(\frac{1}{2}\log\frac{1+\rho}{1-\rho},\, \frac{1}{n-3}\right)$ $\Longrightarrow$

    $H_n(\rho) = 1 - \Phi\!\left(\sqrt{n-3}\,\left(\frac{1}{2}\log\frac{1+\hat\rho}{1-\hat\rho} - \frac{1}{2}\log\frac{1+\rho}{1-\rho}\right)\right)$

    is an asymptotic CD for $\rho$ as the sample size $n \to \infty$.

    – $H_n(\rho)$ is a cumulative distribution function on $\Theta = (-1, 1)$, the parameter space of $\rho$

    – The quantiles of $H_n(\rho)$ can provide confidence intervals of all levels for $\rho$.
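    As a sketch, $H_n(\rho)$ and its quantile-based intervals can be coded directly (the function names here are hypothetical):

```python
# The asymptotic CD H_n(rho) built from Fisher's z, and the interval read
# off its quantiles (which is the familiar Fisher-z interval for rho).
import numpy as np
from scipy import stats

def corr_cd(rho_hat, n):
    z_hat = np.arctanh(rho_hat)            # (1/2) log((1 + rho_hat)/(1 - rho_hat))
    def H(rho):
        return 1 - stats.norm.cdf(np.sqrt(n - 3) * (z_hat - np.arctanh(rho)))
    def interval(level=0.95):
        a = (1 - level) / 2
        zq = z_hat + stats.norm.ppf([a, 1 - a]) / np.sqrt(n - 3)
        return np.tanh(zq)                 # map quantiles back to the rho scale
    return H, interval

H, interval = corr_cd(rho_hat=0.6, n=50)
print(interval(0.95))                      # 95% CI for rho, roughly (0.39, 0.75)
```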

  • Three forms of CD presentations

    [Figure: three panels over $\mu$ – the confidence density, the confidence distribution (CD), and the confidence curve (CV).]

    Confidence density: in the form of a density function $h_n(\theta)$; e.g., $N(\bar{x}_n, \tfrac{1}{n})$ as $h_n(\theta) = \frac{1}{\sqrt{2\pi/n}}\, e^{-\frac{n}{2}(\theta - \bar{x}_n)^2}$.

    Confidence distribution: in the form of a cumulative distribution function $H_n(\theta)$; e.g., $N(\bar{x}_n, \tfrac{1}{n})$ as $H_n(\theta) = \Phi\!\left(\sqrt{n}\,(\theta - \bar{x}_n)\right)$.

    Confidence curve: $CV_n(\theta) = 2\min\{H_n(\theta),\, 1 - H_n(\theta)\}$; e.g., $N(\bar{x}_n, \tfrac{1}{n})$ as $CV_n(\theta) = 2\min\!\left\{\Phi\!\left(\sqrt{n}\,(\theta - \bar{x}_n)\right),\, 1 - \Phi\!\left(\sqrt{n}\,(\theta - \bar{x}_n)\right)\right\}$.
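    A sketch computing all three forms for the same CD $N(\bar{x}_n, \tfrac{1}{n})$ on a grid, and reading a 95% interval off the confidence curve (the sample values are illustrative):

```python
# Three presentations of the CD N(xbar, 1/n), and the 95% interval
# as the set {theta : CV_n(theta) >= 0.05}.
import numpy as np
from scipy import stats

n, xbar = 25, 0.3                      # illustrative values
theta = np.linspace(xbar - 1, xbar + 1, 4001)

h = stats.norm.pdf(theta, loc=xbar, scale=1/np.sqrt(n))   # confidence density h_n
H = stats.norm.cdf(np.sqrt(n) * (theta - xbar))           # confidence distribution H_n
CV = 2 * np.minimum(H, 1 - H)                             # confidence curve CV_n

inside = theta[CV >= 0.05]
print(inside.min(), inside.max())      # approx xbar -/+ 1.96/sqrt(n)
```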

  • CD — a unifying concept for distributional inference

    Our understanding/interpretation: any approach, regardless of being frequentist, fiducial, or Bayesian, can potentially be unified under the concept of confidence distributions, as long as it can be used to build confidence intervals of all levels, exactly or asymptotically.

    ▶ May provide a union for Bayesian, frequentist & fiducial (BFF) inferences

    ▶ Supports new methodology developments – providing inference tools for problems whose solutions were previously unavailable or unknown

    ◦ From our Rutgers group, for instance -

    – New prediction approaches

    – New testing methods

    – New simulation schemes (⇒Application to precision medicine??)∗

    – Combining information from diverse sources through combining CDs (fusion learning/meta analysis, split & conquer, etc.)

  • Fusion learning by CDs

    Key idea (steps)

    Summarize relevant data information using a CD in each study

    Synthesize information from diverse sources/studies via combination of the CDs from these studies

    A general (& unifying) framework for combining CDs has been developed

    ◦ A simple illustrative example (the Stouffer method; see the sketch below):

    $H^{(c)}(\theta) = \Phi\!\left( \left\{ \Phi^{-1}\!\left(H_1(\theta)\right) + \dots + \Phi^{-1}\!\left(H_K(\theta)\right) \right\} \Big/ \sqrt{K} \right),$

    where $H_i(\theta)$ is the CD from the $i$-th study/source.

    ◦ For more approaches and in-depth discussions, see Singh et al. (2005), Xie et al. (2011), and Schweder and Hjort (2016).
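    A minimal sketch of the Stouffer-type combination above, taking each $H_k$ to be the normal CD $N(\bar{x}_k, \tfrac{1}{n_k})$ from study $k$; the study values are made up for illustration.

```python
# Stouffer-type CD combination: H^(c)(theta) =
#   Phi( { Phi^{-1}(H_1(theta)) + ... + Phi^{-1}(H_K(theta)) } / sqrt(K) ).
import numpy as np
from scipy import stats

def combine_cds(theta, cds):
    z = np.array([stats.norm.ppf(H(theta)) for H in cds])   # CD -> normal scores
    return stats.norm.cdf(z.sum(axis=0) / np.sqrt(len(cds)))

# Example: three studies of the same mean, each summarized by N(xbar_k, 1/n_k)
studies = [(0.9, 30), (1.1, 50), (1.0, 40)]                 # (xbar_k, n_k), made up
cds = [lambda t, m=m, n=n: stats.norm.cdf(np.sqrt(n) * (t - m))
       for m, n in studies]

theta = np.linspace(0.5, 1.5, 201)
Hc = combine_cds(theta, cds)          # combined CD: tighter than any single H_k
```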

  • Fusion learning by CDs

    Why combine CDs in fusion learning?

    CD is informative (much more than a single point or an interval)

    CD concept is broad (covering a wide range of examples across the BFF paradigms)

    CD combination is supported by statistical theory (e.g., ensuring frequentist coverage, etc.)

    It is computationally feasible for big data (inherently a “divide-and-conquer” approach)

    . . . , flexible, effective, versatile, etc.

  • Individualized fusion learning
