Dynamic molecular nano-machines: The study of structurally heterogeneous specimen images populations (it is all about interdisciplinarity)

José María Carazo [email protected]

Biocomputing Unit, National Center of Biotechnology


Post on 13-Dec-2015


Page 1

Dynamic molecular nano-machines: The study of structurally heterogeneous specimen images populations

(it is all about interdisciplinarity)

José María Carazo [email protected]

Biocomputing Unit, National Center of Biotechnology

Page 2

Talk organization

• 1.- Goal statement

• 2.- Means: Exploratory Data Analysis in EM

• 3.- Example cases in 2D and 3D

• 4.- New developments: The case of “quantitative” SOM clustering in 2D and 3D, and Oblique Analysis

• 5.- A challenging case: “The cell at atomic resolution”

• 6.- Conclusions

Page 3

1.- Goal statement

• We wish to analyze, quantitatively and as automatically as possible, large data sets formed by projection images of macromolecular nanomachines.

• Most likely, there will be some form of structural heterogeneity that is not known a priori.

• In 3DEM the general problem is simplified to that of sorting projection images into classes, normally extracting the images from only one orientation, and then reconstructing each “class” separately as a structurally homogeneous set of projection data. That is, decouple the 2D/3D problem!

Page 4

2.- Means: Exploratory data analysis

Mapping/projection:

• PCA (Principal Component Analysis)
• CA (Correspondence Analysis)
• Projection pursuit
• MDS (Multidimensional scaling)
• Sammon mapping
• CCA (Curvilinear Component Analysis)
• ICA (Independent Component Analysis)
• Principal curves
• ISOMAP
• LLE (Locally Linear Embedding)
• SOM (Self-Organizing Maps)
• KerDenSOM
• NNMF

Clustering:

• Hierarchical: Agglomerative (HAC), Divisive
• Partitional clustering: Fuzzy, Hard
• Model-based
• Density-based
• Grid-based
• Graph-based

Visualization

Page 5

3.- Example cases in 2D and 3D

Let us start with a (noisy) micrograph

DnaB.DnaC in vitreous ice

• pH 7.6

• 0.1 mM ATP

Page 6

Step 1: Particle selection (in single molecules)

Particles belonging to the molecule under study are selected from the micrograph.

Every particle is a projection of the volume. But… from where? (And from which structure?)

Page 7

Step 2: Exploratory Data Analysis of the 2D images

• Use some form of information mapping / projection / dimensionality reduction

• Apply some form of visualization of the derived information

• Perform a clustering/classification process
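As a rough illustration of these three steps, here is a minimal sketch that projects images with PCA and then clusters the projected coordinates with plain k-means. It is not the pipeline used in the talk: the synthetic "images", the function names `pca_project` and `kmeans`, and all parameter values are assumptions made only for this example.

```python
import numpy as np

def pca_project(images, n_components):
    """Project flattened images onto their leading principal components."""
    centered = images - images.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(points, k, n_iter=50, seed=0):
    """Plain (hard) k-means; returns one integer label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels

# Synthetic stand-in for particle images: two noisy 64-pixel patterns (assumed data).
rng = np.random.default_rng(1)
pattern_a = np.zeros(64)
pattern_a[:32] = 1.0
pattern_b = np.zeros(64)
pattern_b[32:] = 1.0
images = np.vstack([
    pattern_a + 0.3 * rng.normal(size=(100, 64)),
    pattern_b + 0.3 * rng.normal(size=(100, 64)),
])

coords = pca_project(images, n_components=2)  # mapping / dimensionality reduction
labels = kmeans(coords, k=2)                  # clustering / classification
```

The two columns of `coords` are what one would scatter-plot in the visualization step; any of the mapping or clustering methods listed earlier could be swapped in for PCA or k-means.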

Page 8

Step 3: Three-dimensional reconstruction from projections

Diagram labels: Project, Backproject, Iterate (n times)

Conceptual schema
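One common way to realize a project / compare / backproject loop is a Landweber-type iteration, sketched below. This is an assumption about what the schema depicts, not the reconstruction code behind the talk: the projection operator is a small random matrix `A`, and `iterative_reconstruction`, `relax` and the problem sizes are hypothetical names and values.

```python
import numpy as np

def iterative_reconstruction(A, b, n_iter=500, relax=None):
    """Landweber-type loop: x <- x + relax * A^T (b - A x)."""
    x = np.zeros(A.shape[1])
    if relax is None:
        # Step size below 2 / sigma_max^2 keeps the iteration stable.
        relax = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iter):
        residual = b - A @ x               # project the current estimate and compare
        x = x + relax * (A.T @ residual)   # backproject the difference as a correction
    return x

# Toy problem: 40 "projection measurements" of a 10-voxel "volume" (assumed sizes).
rng = np.random.default_rng(0)
A = rng.normal(size=(40, 10))
x_true = rng.normal(size=10)
b = A @ x_true
x_rec = iterative_reconstruction(A, b)
```

In a real 3DEM setting `A` would be the projection operator for the assigned orientations, and it would be applied implicitly rather than stored as a dense matrix.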

Page 9

Talk organization

• 1.- Goal statement

• 2.- Means: Exploratory Data Analysis in EM

• 3.- Example cases in 2D and 3D

• 4.- New developments: The case of “quantitative” SOM clustering in 2D and 3D, and the new Oblique Analysis

• 5.- A practical case: “Fitting (detailed) pieces into a (larger) puzzle”

• 6.- Conclusions

Page 10

Dimensionality Reduction: Two new approaches

• New “quantitative” Self-Organizing Maps (SOMs)

• New “Oblique Projections” (Non-Negative Matrix Factorization)

SIMPLE

THOUSANDS OF PAPERS

HUNDREDS OF LICENSED PATENTS

Page 11

Self-Organizing Maps (SOM) (“Kohonen’s Maps”)

• It is a SIMPLE neural network model that simulates the hypothetical self-organization of the neurons in the brain cortex when some stimulus is presented.

Teuvo Kohonen

Dr. Eng., Emeritus Professor of the Academy of Finland; Academician

Page 12

Structure and functionality of SOM:

Steps:

1. Input data is presented to the SOM;

2. Neurons in the output layer compete with each other;

3. The winner (most similar neuron) is updated;

4. The neighbors of the winner are also updated, but on a much smaller scale.

$v_{i,t} = v_{i,t-1} + \alpha_t \, h_{r,t} \, (x_k - v_{i,t-1})$

Diagram labels: Input data x0, x1, x2, ..., xn; SOM “neurons”

Page 13

SOM’s interesting properties:

1- Fidelity to the input data

2- Smoothness

Updating rule:

$v_{i,t} = v_{i,t-1} + \alpha_t \, h_{r,t} \, (x_k - v_{i,t-1})$
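The updating rule can be sketched as a short training loop. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the Gaussian neighborhood for h, the linear decay of the learning rate and neighborhood width, the 4x4 grid, and the name `train_som` are all choices made for this example.

```python
import numpy as np

def train_som(data, grid_shape=(4, 4), n_steps=2000, alpha0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM with v_i <- v_i + alpha_t * h * (x_k - v_i)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Fixed positions of the neurons on the low-dimensional output grid.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize code vectors from randomly chosen input items.
    codes = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    for t in range(n_steps):
        decay = 1.0 - t / n_steps
        alpha_t = alpha0 * decay           # learning rate shrinks over time
        sigma_t = 0.3 + sigma0 * decay     # neighborhood width shrinks over time
        x_k = data[rng.integers(0, len(data))]                    # 1. present an input
        winner = np.argmin(np.linalg.norm(codes - x_k, axis=1))   # 2. neurons compete
        grid_dist2 = np.sum((grid - grid[winner]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma_t ** 2))            # 1 at the winner, smaller for neighbors
        codes += alpha_t * h[:, None] * (x_k - codes)             # 3-4. update winner and neighbors
    return codes

rng = np.random.default_rng(42)
data = rng.random((500, 2))   # toy 2D input data (assumed)
codes = train_som(data)
```

The factor `(x_k - codes)` pulls code vectors toward the data (fidelity), while `h` couples neighboring neurons on the grid (smoothness).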

Page 14

Functionality of SOM

Original data:

Page 15

Final map:

Page 16

SOM at work: Two-dimensional (2D) structural analysis of double LAT hexamers on the SV40 origin of replication

Figure panels: B, C. Dimension labels: 24 nm, 8 nm, 12 nm.

Page 17

Page 18

HOWEVER… Problems with Kohonen’s SOM

Page 19

Heuristic method…

Normal way: Mathematical problem → Algorithm

Kohonen SOM: Algorithm → What is the mathematical problem?

Page 20

… not quantitative

MOTIVATION:

“The SOM algorithm is very astonishing. On the one hand, it is very simple to write down and to simulate, its practical properties are clear and easy to observe. However, on the other hand, its theoretical properties still remain without proof in the general case, despite the tremendous efforts of several authors…

… The Kohonen algorithm is surprisingly resistant to a complete mathematical study. As far as we know, the only case where a complete analysis has been achieved is the one-dimensional case…”

M. Cottrell et al.

Neurocomputing 21 (1998) 119-138

Page 21

MOTIVATION: Develop a “new” SOM that inherits whatever is good from SOM and whose results are quantitative

• Focus:

– The work has focused on solving the following problem: “Find a functional whose optimization generates self-organization properties similar to those typical of the Kohonen algorithm”

– Concepts to keep from Kohonen:

» 1. “Representatives” of the input data distributed over a low-dimensional grid

» 2. Interaction among representatives via a neighborhood function

Page 22

Note: does interdisciplinarity indeed work? YES!

New idea (joint work Madrid - Key Inst. Zurich):

Given a set of original data items, the problem is to find a set of surrogate data items (code vectors) such that their estimated probability density resembles as closely as possible the density of the given data.

Page 23

RESULT: Smoothly Distributed Kernel Probability Density Estimator

• In the context of “Exploratory Data Analysis”, it would be interesting to work with a new SOM optimized to preserve the estimation of the pdf of the input in the mapped (output) space

• Results:

Calculation of Uij

Iterative calculation of Vj

Maximum Likelihood
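To make the density-matching idea concrete, the sketch below scores a set of code vectors by the mean log-likelihood of the data under a kernel density estimate centred on them. It is only an illustration: it assumes an isotropic Gaussian kernel of variance `alpha`, it omits the smoothness term and the Uij / Vj update formulas (which are not reproduced in this transcript), and `kde_log_likelihood` is a hypothetical name.

```python
import numpy as np

def kde_log_likelihood(data, codes, alpha):
    """Mean log-likelihood of `data` under a Gaussian kernel density
    estimate whose kernels are centred on the code vectors."""
    n, p = data.shape
    sq_dist = np.sum((data[:, None, :] - codes[None, :, :]) ** 2, axis=2)
    log_kernel = -0.5 * sq_dist / alpha - 0.5 * p * np.log(2.0 * np.pi * alpha)
    # Log of the average kernel value per data item, computed stably.
    m = log_kernel.max(axis=1, keepdims=True)
    log_density = m[:, 0] + np.log(np.mean(np.exp(log_kernel - m), axis=1))
    return log_density.mean()

# Toy data: two well-separated 2D clusters (assumed).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(200, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(200, 2)),
])
codes_good = np.array([[0.0, 0.0], [5.0, 5.0]])       # code vectors on the clusters
codes_bad = np.array([[10.0, -10.0], [-10.0, 10.0]])  # code vectors far from the data

ll_good = kde_log_likelihood(data, codes_good, alpha=0.25)
ll_bad = kde_log_likelihood(data, codes_bad, alpha=0.25)
```

Code vectors whose estimated density resembles the data density score higher, which is the quantity a maximum-likelihood formulation would try to increase.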

Page 24

Application in 2D analysis:

Original T-Antigen double hexamers cryo-electron single particle images.

Page 25

Short & straight (1576)

Long & straight (1411)

Middle curvature (2048)

High curvature (2579)

Average

Page 26

Application in 3D Electron Tomography: Study of muscle contraction mechanisms

Page 27

Insect Flight Muscle:

Original motifs

Average motifs

3D Reconstruction

Page 28

Raw 3D motifs:

Page 29

Unsupervised classification using KerDenSOM

Figure panels: A, B, C, D, E, F

Page 30

Clusters:

Figure panels A to F, with cluster labels: single chevrons (one panel), incomplete double chevrons (two panels), double chevrons (three panels)

Page 31

Dimensionality Reduction: Two new approaches

• New, “quantitative” Self-Organizing Maps

• New “Oblique Projections” (Non-Negative Matrix Factorization)

THOUSANDS OF PAPERS

HUNDREDS OF LICENSED PATENTS

Page 32

Dimensionality reduction: Non-negative Matrix Factorization versus PCA

Figure panels: Raw data, PCA axes

Page 33

Example on PCA

Original data

Page 34

… However…

MOTIVATION:

We would like to attach as closely as possible a “biological meaning” to the factors…

Of course, we have not defined what a “biological meaning” is…

Let us state that we would like to decompose the original data into factors that represent “compact parts of the original data”, so that the original data can be obtained by a linear combination of these “compact parts” (FACE = MOUTH + EYES + …)

Page 35

Factorization Model

• We require W and H to be non-negative (V is assumed to be non-negative)
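A minimal sketch of such a factorization, assuming the standard NMF form V ≈ WH and using the well-known multiplicative updates of Lee and Seung for the squared-error objective. This is plain NMF, not the sparse variant discussed in the talk; the function name `nmf`, the rank, the iteration count and the toy data are assumptions made for this example.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative V as W @ H with W, H >= 0
    using multiplicative updates (squared-error objective)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never change.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative data built from 3 non-negative "parts" (assumed).
rng = np.random.default_rng(1)
V = rng.random((20, 3)) @ rng.random((3, 30))

W, H = nmf(V, rank=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because every entry of W and H stays non-negative, each data item is rebuilt by adding parts together and never by cancelling them, which is what motivates the "FACE = MOUTH + EYES" reading.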

Page 36

The nice idea:

• By requiring W and H to be non-negative, and additionally enforcing sparsity, we can move in the direction of “parts-based” basis images

• As with Kohonen: many papers, patents, and even DB developers expressing “support” for NMF functionality in future releases!! (Oracle version 10)

• As with Kohonen, how to enforce sparsity in H and W is a matter of research (our approach: “SNMF”).

Page 37

Example on (S)NMF

Page 38

G40P: 4 factors

W:

Clusters of H:

Average images:

Page 39

Talk organization

• 1.- Goal statement

• 2.- Means: Exploratory Data Analysis in EM

• 3.- Example cases in 2D and 3D

• 4.- New developments: The case of “quantitative” SOM clustering in 2D and 3D, and Oblique Analysis

• 5.- A challenging case: “Fitting (small) pieces into a (large) puzzle”

• 6.- Conclusions

Page 40

Examples: Multiresolution modeling

• The DnaB case (in the six-fold form)

Figure labels: ATP binding pocket, P-loop, Arg loop

Page 41

Fitting of the Modelled Domains on the Density Maps

Figure panels: C3, C6, C3-C6 (domains labelled N and C)

Page 42

Conclusions

• The increased yield of data in all fields, together with their complexity, requires considering advanced Pattern Recognition tools coupled to the reconstruction process

• Electron Microscopy, already at the frontier between different disciplines, has still to incorporate new computer science, maths and AI skills into its research teams

Page 43

Note: does interdisciplinarity indeed work?

• YES!

• As an example, we can now start tackling directly in 3D the a posteriori angular assignment of projection directions in a heterogeneous mixture of 3D structures by a new multireference 3D ML approach!

(Joint work MADRID-CUNY)

Page 44

Acknowledgements

• The CNB Biocomputing Unit:

• L.E. Donate
• Mikel Valle
• Carmen San Martin
• Yolanda Robledo
• Rafael Núñez
• Yacob
• Monica Chagoyen
• Roberto Marabini
• Alberto Pascual
• Carlos-Oscar Sanchez
• Sjors Scheres
• Javier A. Velázquez-Muriel
• Pedro Carmona
• David Elguero
• Jesus Cuenca

• Integromics

• Pedro A. De Alarcón

• Extra mural:

• CUNY: Dr. Herman’s Lab

• NAU (Canberra): Dr. N. Dixon

• Zurich Key Inst.: Dr. R. Pascual-Marquis

• (and MANY other interactions)