Answering Neuroscience Questions from Connectomics Data using Statistical Tools

Joshua T. Vogelstein

Dept of Statistical Science & Mathematics, Duke University

Institute for Data Intensive Engineering and Sciences, Johns Hopkins University

Endeavor Scientist Fellowship, Child Mind Institute

I’ve tried to avoid text being down here so everybody can see everything


Take Home Messages

• Graphs are mathematical objects too!

• Standard (“Euclidean”) statistical tools are inappropriate for graph-valued data

• Nonetheless, we can write down statistical distributions over graphs

• We can formally state many neurobiological questions via statistical graph theory (SGT)

• We can map graphs to Euclidean space; we want those mappings to have desirable statistical properties, such as consistency and robustness

• Sometimes SGT may be useful

Outline

• Motivation

• Some theory stuff

• (an application)

• Celebrations!

A Concrete Motivating Example

• We estimate graphs from two populations of brains (e.g., different psychiatric conditions, sex, personalities, etc.)

• We want to know: are the two populations different?

• This is like a two-sample t-test for graph-valued observations

What I Do & Don’t Care About (for the purposes of this talk)

• Don’t: how to estimate graphs

• Don’t: where the graphs came from, e.g., MRI, EM, calcium, ephys, etc.

• Do: I assume somebody gave me graphs estimated from neural data somehow, using some experimental technique, with some neurons and synapses wrong or missing, from some species, at some scale, and I don’t care how (for the purposes of this talk)

Formal Statement of Problem

• G_1, ..., G_n ~ F_0 and G_{n+1}, ..., G_{n+m} ~ F_1

• H_0: F_0 = F_1

• H_A: F_0 ≠ F_1

• NB: all graphs have the same vertex set (for here, for now)
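One way to make this testing problem concrete is a permutation test: under H_0 the class labels are exchangeable, so shuffling them yields draws from the null distribution of any statistic. The Python sketch below assumes a deliberately simple statistic (the Frobenius distance between the two class-mean adjacency matrices) and ER-sampled toy graphs; the helper names are hypothetical, and this statistic is only an illustration, not the approach developed later in the talk (it can only detect differences in mean edge probabilities).

```python
import numpy as np

def mean_adjacency_stat(graphs, labels):
    """Frobenius norm of the difference between the two class-mean adjacency matrices."""
    mean0 = graphs[labels == 0].mean(axis=0)
    mean1 = graphs[labels == 1].mean(axis=0)
    return np.linalg.norm(mean0 - mean1)

def permutation_test(graphs, labels, stat=mean_adjacency_stat, n_perm=500, seed=0):
    """Permutation p-value for H0: F0 = F1.

    Under H0 the class labels are exchangeable, so shuffling them gives draws
    from the null distribution of the statistic.
    """
    rng = np.random.default_rng(seed)
    graphs = np.asarray(graphs, dtype=float)
    labels = np.asarray(labels)
    observed = stat(graphs, labels)
    exceed = sum(stat(graphs, rng.permutation(labels)) >= observed for _ in range(n_perm))
    # The +1 terms count the observed labeling itself, which keeps the test valid.
    return observed, (exceed + 1) / (n_perm + 1)

def sample_er(n_vertices, p, rng):
    """Hypothetical helper: one ER(n, p) simple graph as a symmetric 0/1 matrix."""
    upper = np.triu(rng.random((n_vertices, n_vertices)) < p, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(1)
graphs = [sample_er(20, 0.1, rng) for _ in range(15)] + [sample_er(20, 0.3, rng) for _ in range(15)]
labels = [0] * 15 + [1] * 15
stat, p_value = permutation_test(graphs, labels)
print(f"statistic = {stat:.3f}, p-value = {p_value:.4f}")
```

Any graph-valued statistic can be passed as `stat`; the permutation logic itself does not depend on it.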

Graphs are Mathematical Objects Too

• G=(V,E)

• V is a set of vertices (nodes) (perhaps a vertex is a neuron)

• E is a set of edges (arcs/links) (perhaps an edge is a synapse)

• Graphs are simple, meaning edges are binary and undirected, with no self-loops (for here, for now)

• I am not analyzing functions of graphs (e.g., the degree distribution) in this talk; that is an interesting and complementary topic
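As a concrete sketch, assuming vertices labeled 0, ..., n-1 and edges given as vertex pairs, a simple graph can be stored as a symmetric 0/1 adjacency matrix with a zero diagonal. The helpers below are hypothetical illustrations in Python/NumPy.

```python
import numpy as np

def adjacency_from_edges(n_vertices, edges):
    """Build the adjacency matrix of a simple graph: binary, undirected, no self-loops."""
    A = np.zeros((n_vertices, n_vertices), dtype=int)
    for u, v in edges:
        if u == v:
            raise ValueError(f"self-loop at vertex {u} is not allowed in a simple graph")
        A[u, v] = 1
        A[v, u] = 1  # undirected: store both directions
    return A

def is_simple(A):
    """Check the three 'simple graph' conditions on an adjacency matrix."""
    A = np.asarray(A)
    binary = np.isin(A, (0, 1)).all()
    undirected = (A == A.T).all()
    no_loops = (np.diag(A) == 0).all()
    return bool(binary and undirected and no_loops)

A = adjacency_from_edges(4, [(0, 1), (1, 2), (2, 3), (3, 0)])  # a 4-cycle
print(A)
print(is_simple(A))
```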

Why Not Just Use Lasso?

• A is an adjacency matrix, where A(u,v)=1 iff u~v

• Let A & A’ be adjacency matrices of two graphs

• We could vectorize and then use standard techniques, but we might lose some structure from the data

• For example, row u and column u of A correspond to the same vertex; once we vectorize, standard analysis techniques do not use that information
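A small NumPy sketch of what vectorizing does, assuming we keep the n(n-1)/2 upper-triangular entries of a simple graph's adjacency matrix (helper names are hypothetical). Relabeling the vertices applies one permutation to the rows and columns together; in the vectorized form this shows up only as a reshuffling of coordinates, which a generic vector method has no reason to treat specially.

```python
import numpy as np

def vectorize_upper(A):
    """Flatten the n(n-1)/2 upper-triangular entries of a symmetric adjacency matrix."""
    rows, cols = np.triu_indices(A.shape[0], k=1)
    return A[rows, cols]

def relabel(A, perm):
    """Relabel vertices: the same permutation acts on rows AND columns (P A P^T)."""
    return A[np.ix_(perm, perm)]

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
perm = np.array([3, 2, 1, 0])
a = vectorize_upper(A)                # entries ordered (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
b = vectorize_upper(relabel(A, perm))  # same graph, different vertex labels
print(a, b)
```

For this 4-vertex example both vectors have 6 entries and the same number of edges, but the edges land in different coordinates.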

Recall: All of Statistics

• The statistical properties of a hypothesis test (e.g., its power) depend on a statistical model

• For example, a t-test is optimal under certain assumptions about the data

• But when data are corrupted, robust methods, such as the rank-sum test, have higher power

Conjecture: SGT might be useful to cast and address connectomics questions
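The t-test versus rank-sum point can be checked with a small Monte Carlo sketch. It assumes Gaussian samples in which roughly 10% of observations are corrupted by large-variance noise (an arbitrary illustrative choice) and uses SciPy's `ttest_ind` and `mannwhitneyu` (the Mann-Whitney U test, equivalent to the Wilcoxon rank-sum test).

```python
import numpy as np
from scipy import stats

def simulated_power(test, shift=1.0, n=30, contamination=0.1, n_sims=500, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which `test` rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        x = rng.normal(0.0, 1.0, n)
        y = rng.normal(shift, 1.0, n)
        # Corrupt roughly `contamination` of each sample with large-variance noise.
        for sample in (x, y):
            corrupt = rng.random(n) < contamination
            sample[corrupt] += rng.normal(0.0, 20.0, corrupt.sum())
        if test(x, y).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

def t_test(x, y):
    return stats.ttest_ind(x, y)

def rank_sum(x, y):
    # The Mann-Whitney U test is equivalent to the Wilcoxon rank-sum test.
    return stats.mannwhitneyu(x, y, alternative="two-sided")

power_t = simulated_power(t_test)
power_rank = simulated_power(rank_sum)
print(f"t-test power: {power_t:.2f}, rank-sum power: {power_rank:.2f}")
```

Setting `contamination=0.0` recovers the clean Gaussian setting, where the t-test is the optimal choice.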

Distributions over graphs

• G ~ P, P is some distribution over graphs

• P is discrete, so P(G) is the likelihood of graph G

• Two extreme examples: (i) ER(n,p), (ii) Categorical(theta)

• Number of possible graphs with n vertices?

(draw it; booyah Pillow!)
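For simple graphs on n labeled vertices, each of the n(n-1)/2 vertex pairs is either an edge or not, so there are 2^{n(n-1)/2} possible graphs. ER(n, p) assigns them probabilities with a single parameter p, while Categorical(theta) needs one probability per graph (2^{n(n-1)/2} - 1 free parameters). A short Python sketch of both facts (function names are hypothetical):

```python
import numpy as np
from math import comb

def n_possible_graphs(n_vertices):
    """Number of simple graphs on n labeled vertices: each of C(n, 2) pairs is an edge or not."""
    return 2 ** comb(n_vertices, 2)

def er_log_likelihood(A, p):
    """log P(G) under ER(n, p): every possible edge is an independent Bernoulli(p) draw."""
    rows, cols = np.triu_indices(A.shape[0], k=1)
    edges = A[rows, cols]
    m = edges.sum()          # number of edges present
    n_pairs = edges.size     # number of possible edges, n(n-1)/2
    return m * np.log(p) + (n_pairs - m) * np.log1p(-p)

triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
print(n_possible_graphs(3), n_possible_graphs(10))
print(er_log_likelihood(triangle, 0.5))
```

Even n = 10 already gives 2^45 (about 3.5 × 10^13) possible graphs, which is why the fully categorical extreme is impractical.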

Latent Position Random Graphs

• P[A(u,v) = 1] = f(u,v) in (0,1)

• Posit the existence of a latent vector for each vertex

• Conditioned on the latent vectors of u & v, the connection between u & v is independent of everything else

• Intuition from: (i) social network analysis, (ii) neuroscience

• We can also include observed attributes for each vertex
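A minimal sampler for a latent position random graph, assuming the latent vectors are given as rows of X and f(u,v) is computed from X_u and X_v by a user-supplied link function. The Gaussian link below is a hypothetical choice made only for illustration (it returns values in (0, 1], reaching 1 only when two latent vectors coincide).

```python
import numpy as np

def sample_latent_position_graph(X, link, rng):
    """Sample a simple graph given latent vectors X (one row per vertex).

    Conditional on the latent vectors, each edge A(u, v) is an independent
    Bernoulli(link(X[u], X[v])) draw.
    """
    n = X.shape[0]
    A = np.zeros((n, n), dtype=int)
    for u in range(n):
        for v in range(u + 1, n):
            prob = link(X[u], X[v])
            A[u, v] = A[v, u] = int(rng.random() < prob)
    return A

def gaussian_link(x, y, scale=1.0):
    """Hypothetical example link: nearby latent vectors connect more often."""
    return np.exp(-np.sum((x - y) ** 2) / scale)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))  # 30 vertices with 2-dimensional latent vectors
A = sample_latent_position_graph(X, gaussian_link, rng)
print(A.sum() // 2, "edges")
```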

Random Dot Product Graphs

• Let X_u in R^d for each vertex u

• f(u,v) = <X_u, X_v>

• X = (X_1, ..., X_n) can be estimated consistently up to a rotation via eig

• X can be estimated quickly via eig

• For sparse graphs, X can be estimated even with n=10^6 or more

• The stochastic block model is a special case
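A minimal NumPy sketch of estimating X via eig, assuming the standard adjacency spectral embedding (top-d eigenvectors of A scaled by the square roots of the absolute eigenvalues) and latent positions drawn from a Dirichlet so that every inner product is a valid probability. The Procrustes alignment is only there to undo the rotation before comparing with the truth. This uses a dense eigendecomposition; the sparse, very large n regime mentioned above would need a sparse partial eigensolver (for example `scipy.sparse.linalg.eigsh`), not shown here.

```python
import numpy as np

def sample_rdpg(X, rng):
    """Random dot product graph: A(u, v) ~ Bernoulli(<X_u, X_v>) independently for u < v."""
    P = X @ X.T
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper | upper.T).astype(float)

def adjacency_spectral_embedding(A, d):
    """Xhat = U_d |Lambda_d|^{1/2}, using the d eigenvalues of A with largest magnitude."""
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(eigvals))[::-1][:d]
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))

def align(Xhat, X):
    """Resolve the rotation ambiguity with orthogonal Procrustes (for checking only)."""
    U, _, Vt = np.linalg.svd(Xhat.T @ X)
    return Xhat @ (U @ Vt)

rng = np.random.default_rng(0)
n, d = 1000, 2
# Rows are nonnegative and sum to at most 1, so each <X_u, X_v> lies in [0, 1].
X = rng.dirichlet([1.0, 1.0, 1.0], size=n)[:, :d]
A = sample_rdpg(X, rng)
Xhat = align(adjacency_spectral_embedding(A, d), X)
print("mean per-vertex error:", np.linalg.norm(Xhat - X, axis=1).mean())
```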

Parametric rainbow

[Figure: a spectrum of models plotted by number of parameters, from parametric to massively parametric; massively parametric = (practically) nonparametric. Models shown include independent Bernoulli, histogram, Ising, RBM, 3rd-order MaxEnt, and cascaded logistic.]

• Slide from Il Memming Park

• Models of spikes vs. graphs

Our Generative Model

• for each graph i: y_i ~ Bernoulli(p) # class

• for each graph i, for each vertex u:

• X_u^i | y_i ~ Dirichlet(theta_{y_i}) # latent positions

• for each graph i, for each pair of vertices u < v:

• A(u,v)^i ~ Bernoulli(<X_u^i, X_v^i>) # edges

• (you can specify a prior on p and the thetas if you want)
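A Python sketch of this generative model, assuming 3-dimensional Dirichlet latent positions (so the inner product of any two positions lies in [0, 1] and is a valid edge probability) and hypothetical values for p and the two thetas; the slide does not specify these.

```python
import numpy as np

def sample_graph_population(n_graphs, n_vertices, p, thetas, seed=0):
    """Sample (y_i, A^i) pairs from the two-class Dirichlet latent position model.

    y_i ~ Bernoulli(p); X_u^i | y_i ~ Dirichlet(thetas[y_i]);
    A(u, v)^i ~ Bernoulli(<X_u^i, X_v^i>) independently for each pair u < v.
    """
    rng = np.random.default_rng(seed)
    labels, graphs = [], []
    for _ in range(n_graphs):
        y = int(rng.random() < p)                        # class label
        X = rng.dirichlet(thetas[y], size=n_vertices)    # latent positions, one row per vertex
        P = X @ X.T                                      # edge probabilities in [0, 1]
        upper = np.triu(rng.random(P.shape) < P, k=1)    # independent edges for u < v
        graphs.append((upper | upper.T).astype(int))     # symmetrize, no self-loops
        labels.append(y)
    return np.array(labels), np.array(graphs)

# Hypothetical parameters: Dirichlet concentrations that differ between the two classes.
thetas = {0: [1.0, 1.0, 1.0], 1: [4.0, 1.0, 1.0]}
labels, graphs = sample_graph_population(n_graphs=40, n_vertices=50, p=0.5, thetas=thetas)
print(labels[:10], graphs.shape)
```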

A Simulated Example

[Figure panels: latent position distributions F_{X|0} and F_{X|1} (dimension 1 vs. dimension 2); sampled latent positions X_i, i ∈ N_0, and X_j, j ∈ N_1; example adjacency matrices A_i, i ∈ N_0, and A_j, j ∈ N_1 (vertex # vs. vertex #).]

Schematic of Our Approach

• Estimate the latent position matrix for each graph

• Compute all pairwise distances between those estimates

• Embed those distances into a low-dimensional subspace (via MDS)

• Use standard statistical tests on the embedded graphs

• Gretton & others have developed elegant theory for this style of approach

• The art is in choosing the kernel
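The last two steps of this schematic can be sketched as follows, assuming classical (Torgerson) MDS for the embedding and, as one kernel-based choice in the spirit of Gretton and colleagues, a Gaussian-kernel maximum mean discrepancy (MMD) statistic with a permutation p-value. The median-heuristic bandwidth, k = 2, and the stand-in point cloud used to build D are illustrative assumptions; any symmetric distance matrix between graphs could be passed in instead.

```python
import numpy as np

def classical_mds(D, k):
    """Classical (Torgerson) MDS: embed an m x m distance matrix into R^k."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                    # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:k]            # k largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

def gaussian_kernel(Z, bandwidth):
    sq_dists = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2(K, labels):
    """Biased estimate of the squared maximum mean discrepancy between the two groups."""
    a, b = labels == 0, labels == 1
    return K[np.ix_(a, a)].mean() + K[np.ix_(b, b)].mean() - 2.0 * K[np.ix_(a, b)].mean()

def embedded_two_sample_test(D, labels, k=2, n_perm=500, seed=0):
    """Embed graphs via MDS of their distance matrix, then permutation-test H0: F0 = F1."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    Z = classical_mds(D, k)
    bandwidth = np.median(D[D > 0])                # median heuristic (an assumption)
    K = gaussian_kernel(Z, bandwidth)
    observed = mmd2(K, labels)
    exceed = sum(mmd2(K, rng.permutation(labels)) >= observed for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)

# Stand-in data: distances between points from two shifted Gaussians.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(1.5, 1.0, (20, 3))])
D = np.sqrt(np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1))
labels = np.array([0] * 20 + [1] * 20)
stat, p_value = embedded_two_sample_test(D, labels)
print(f"MMD^2 = {stat:.4f}, p-value = {p_value:.4f}")
```

Swapping `gaussian_kernel` for another kernel changes only one line, which is where the "art" of the approach lives.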

Our Distance Between Graphs

• d(G,G’) = min_W ||Xhat - W*Xhat’||, where the minimum is over orthogonal (rotation) matrices W

• Can be solved via SVD: efficient, scalable, exact, awesome
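A minimal sketch of this distance, assuming the minimum is over orthogonal d × d matrices W (the rotation ambiguity of the latent position estimates) and the Frobenius norm. With vertices stored as rows (n × d), W multiplies on the right. This is the classical orthogonal Procrustes problem, whose closed-form solution needs only the SVD of a d × d matrix; the helper names are hypothetical.

```python
import numpy as np

def procrustes_distance(Xhat, Xhat_prime):
    """min over orthogonal W of ||Xhat - Xhat_prime @ W||_F (orthogonal Procrustes)."""
    # The optimal W is U @ Vt, where U, S, Vt is the SVD of the d x d matrix Xhat'^T Xhat.
    U, _, Vt = np.linalg.svd(Xhat_prime.T @ Xhat)
    W = U @ Vt
    return np.linalg.norm(Xhat - Xhat_prime @ W)

def pairwise_distances(embeddings):
    """Distance matrix over a list of n x d latent-position estimates (one per graph)."""
    m = len(embeddings)
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            D[i, j] = D[j, i] = procrustes_distance(embeddings[i], embeddings[j])
    return D

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
Y = rng.normal(size=(50, 2))
print(procrustes_distance(X, X @ R))  # approximately 0: a rotated copy is at distance zero
print(pairwise_distances([X, X @ R, Y]))
```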

Estimating the Illustration

[Figure panels: true latent positions (latent dim. 1 vs. latent dim. 2); estimated latent positions (est. latent dim. 1 vs. est. latent dim. 2); distance matrix (sample # vs. sample #); class 0 and class 1 density estimates and the embedded graphs (coordinate 1 vs. coordinate 2).]

Power of Our Approach

[Figure panels: power vs. sample size; power vs. # vertices.]

NB: I am not claiming this is the best possible method; rather, I’m saying that we have a consistent statistical test

Application to Sex

[Figure panels: example adjacency matrices A_i, i ∈ N_0, and A_j, j ∈ N_1 (vertex # vs. vertex #); distance matrix (sample # vs. sample #); class 0 and class 1 density estimates and the embedded graphs (coordinate 1 vs. coordinate 2).]

Acknowledgements

• Carey Priebe

• R. Jacob Vogelstein

• Daniel Sussman

• Vince Lyzinski

• Youngser Park

• Minh Tang

• Yummy

• DARPA (XDATA)

• Child Mind Institute

• CRCNS

• You (please interrupt!)

Final Slide!

• Graphs are awesome, and we can treat them as mathematical objects and develop statistical tools specifically for graph-valued data

• We’ve only just begun; we don’t yet have code to conduct most of the analyses that we want

• But we have obtained sufficient theory and empirical intuition to develop such tools as appropriate

• Call me: 443.858.9911, http://jovo.me

• Questions?