
requiring two weeks on 300 cores to process a dataset with 200,000 images.

We introduce a framework for Cryo-EM density estimation, formulating the problem as one of stochastic optimization to perform maximum-a-posteriori (MAP) estimation in a probabilistic model. The approach is remarkably efficient, providing useful low resolution density estimates in an hour. We also show that our stochastic optimization technique is insensitive to initialization, allowing the use of random initializations. We further introduce a novel importance sampling scheme that dramatically reduces the computational costs associated with high resolution reconstruction. This leads to speedups of 100,000-fold or more, allowing structures to be determined in a day on a modern workstation. In addition, the proposed framework is flexible, allowing parts of the model to be changed and improved without impacting the estimation; e.g., we compare the use of three different priors. To demonstrate our method, we perform reconstructions on two real datasets and one synthetic dataset.

    2. Background and Related Work

In Cryo-EM, a purified solution of the target molecule is cryogenically frozen into a thin (single molecule thick) film, and imaged with a transmission electron microscope. A large number of such samples are obtained, each of which provides a micrograph containing hundreds of visible, individual molecules. In a process known as particle picking, individual molecules are selected, resulting in a stack of cropped particle images. Particle picking is often done manually; however, there have been recent moves to partially or fully automate the process [17, 40]. Each particle image provides a noisy view of the molecule, but with unknown 3D pose; see Fig. 2 (right). The reconstruction task is to estimate the 3D electron density of the target molecule from the potentially large set of particle images.

Common approaches to Cryo-EM density estimation, e.g., [7, 11, 37], use a form of iterative refinement. Based on an initial estimate of the 3D density, they determine the best matching pose for each particle image. A new density estimate is then constructed using the Fourier Slice Theorem (FST); i.e., the 2D Fourier transform of an integral projection of the density corresponds to a slice through the origin of the 3D Fourier transform of that density, in a plane perpendicular to the projection direction [13]. Using the 3D pose for each particle image, the new density is found through interpolation and averaging of the observed particle images.
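The Fourier Slice Theorem is easy to verify numerically in the simplest, axis-aligned case. The sketch below is our illustration (not code from the paper): it checks that the 2D FFT of an integral projection of a random toy density equals the central slice of its 3D FFT.

```python
import numpy as np

# Toy check of the Fourier Slice Theorem: project a random 3D density
# along one axis, and compare the 2D FFT of the projection against the
# central slice of the 3D FFT perpendicular to that axis.
rng = np.random.default_rng(0)
D = 16
V = rng.standard_normal((D, D, D))  # toy 3D density, not a real molecule

# Integral projection along axis 0, then its 2D Fourier transform.
proj = V.sum(axis=0)
F_proj = np.fft.fftn(proj)

# Central slice of the 3D Fourier transform: zero frequency along the
# projection axis selects the plane perpendicular to that direction.
F_vol = np.fft.fftn(V)
central_slice = F_vol[0, :, :]

assert np.allclose(F_proj, central_slice)
```

For arbitrary projection directions the slice is an oblique plane through the Fourier-space origin and requires interpolation, which is exactly why reconstruction methods based on the FST interpolate and average Fourier slices.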

This approach is fundamentally limited in several ways. Even if one begins with the correct 3D density, the low SNR of particle images makes accurately identifying the correct pose for each particle nearly impossible. This problem is exacerbated when the initial density is inaccurate. Poor initializations result in estimated structures that are either clearly wrong (see Fig. 9) or, worse, appear plausible but are misleading in reality, resulting in incorrectly estimated 3D structures [12]. Finally, and crucially for the case of density estimation with many particle images, all data are used at each refinement iteration, causing these methods to be extremely slow. Mallick et al. [25] proposed an approach which attempted to establish weak constraints on the relative 3D poses between different particle images. This was used to initialize an iterative refinement algorithm to produce a final reconstruction. In contrast, our refinement approach does not require an accurate initialization.

To avoid the need to estimate a single 3D pose for each particle image, Bayesian approaches have been proposed in which the 3D poses for the particle images are treated as latent variables, and then marginalized out numerically. This approach was originally proposed by Sigworth [35] for 2D image alignment and later by Scheres et al. [33] for 3D estimation and classification. It has since been used by Jaitly et al. [14], where batch, gradient-based optimization was performed. Nevertheless, due to the computational cost of marginalization, the method was only applied to small numbers of class-average images, which are produced by clustering, aligning and averaging individual particle images according to their appearance, to reduce noise and the number of particle images used during the optimization. More recently, pose marginalization was applied directly to particle images, using a batch Expectation-Maximization algorithm in the RELION package [34]. However, this approach is extremely computationally expensive. Here, the proposed approach uses a similar marginalized likelihood; however, unlike previous methods, stochastic rather than batch optimization is used. We show that this allows for efficient optimization, and for robustness with respect to initialization. We further introduce a novel importance sampling technique that dramatically reduces the computational cost of the marginalization when working at high resolutions.

    3. A Framework for 3D Density Estimation

Here we present our framework for density estimation, which includes a probabilistic generative model of image formation, stochastic optimization to cope with large-scale datasets, and importance sampling to efficiently marginalize over the unknown 3D pose of the particle in each image.

    3.1. Image Formation Model

In Cryo-EM, particle images are formed as orthographic, integral projections of the electron density of a molecule, $V \in \mathbb{R}^{D^3}$. In each image, the density is oriented in an unknown pose, $R \in SO(3)$, relative to the direction of the microscope beam. The projection along this unknown direction is a linear operator, which is represented by the matrix $P_R \in \mathbb{R}^{D^2 \times D^3}$. Along with pose, the in-plane translation $t \in \mathbb{R}^2$ of each particle image is unknown, the effect of

(likelihood) then becomes

$$p(\tilde{I} \mid \theta, \tilde{V}) \approx \sum_{j=1}^{M_R} w^R_j \sum_{\ell=1}^{M_t} w^t_\ell \, p(\tilde{I} \mid \theta, R_j, t_\ell, \tilde{V}) \, p(R_j) \, p(t_\ell) \quad (4)$$

where $\{(R_j, w^R_j)\}_{j=1}^{M_R}$ are weighted quadrature points over $SO(3)$ and $\{(t_\ell, w^t_\ell)\}_{\ell=1}^{M_t}$ are weighted quadrature points over $\mathbb{R}^2$. The accuracy of the quadrature scheme, and consequently the values of $M_R$ and $M_t$, are set automatically based on $\omega$, the specified maximum frequency, such that higher values of $\omega$ result in more quadrature points.
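Once the quadrature points are fixed, the marginalization in Eq. (4) is just a doubly weighted sum. The sketch below is our own illustration (the function name and the idea of passing per-pose likelihoods as a precomputed array are assumptions; in the paper these likelihoods come from projecting and shifting the density):

```python
import numpy as np

def marginal_likelihood(lik, w_R, w_t, p_R, p_t):
    """Approximate p(I|theta,V) as in Eq. (4) via weighted quadrature.

    lik: (M_R, M_t) array standing in for p(I | theta, R_j, t_l, V).
    w_R, p_R: (M_R,) quadrature weights and prior values over rotations.
    w_t, p_t: (M_t,) quadrature weights and prior values over shifts.
    """
    # Double weighted sum over rotation and shift quadrature points.
    return np.einsum("j,l,jl->", w_R * p_R, w_t * p_t, lik)

# Tiny usage example: uniform weights, flat priors, constant likelihood.
M_R, M_t = 5, 3
lik = np.ones((M_R, M_t))
w_R = np.full(M_R, 1.0 / M_R)
w_t = np.full(M_t, 1.0 / M_t)
p_R, p_t = np.ones(M_R), np.ones(M_t)
print(marginal_likelihood(lik, w_R, w_t, p_R, p_t))  # ~1.0 in this uniform case
```

The cost of this sum, $O(M_R M_t)$ per image, is what grows quickly with $\omega$ and motivates the importance sampling scheme of Section 3.3.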

Given a set of $K$ images with CTF parameters, $\mathcal{D} = \{(I_i, \theta_i)\}_{i=1}^{K}$, and assuming conditional independence of the images, the posterior probability of a density $V$ is

$$p(V \mid \mathcal{D}) \propto p(V) \prod_{i=1}^{K} p(\tilde{I}_i \mid \theta_i, \tilde{V}) \quad (5)$$

where $p(V)$ is a prior over 3D molecular electron densities. Several choices of prior are explored below, but we found that a simple independent exponential prior worked well. Specifically, $p(V) = \prod_{i=1}^{D^3} \lambda e^{-\lambda V_i}$, where $V_i$ is the value of the $i$th voxel and $\lambda$ is the inverse scale parameter. Other choices of prior are possible and are a promising direction for future research.

Estimating the density now corresponds to finding $V$ which maximizes Equation (5). Taking the negative log and dropping constant factors, the optimization problem becomes $\arg\min_{V \in \mathbb{R}^{D^3}_{+}} f(V)$, where

$$f(V) = -\log p(V) - \sum_{i=1}^{K} \log p(\tilde{I}_i \mid \theta_i, \tilde{V}) \quad (6)$$

and $V$ is restricted to be positive (negative density is physically unrealistic). Optimizing Eq. (6) directly is costly due to the marginalization in Eq. (4) as well as the large number ($K$) of particle images in a typical dataset. To deal with these challenges, the following sections propose the use of two techniques, namely, stochastic optimization and importance sampling.
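The structure of the objective in Eq. (6) can be made concrete with a small sketch. This is a hypothetical illustration, not the paper's implementation: the per-image log-likelihood terms are abstracted behind a stand-in function, and only the exponential-prior term is spelled out.

```python
import numpy as np

def objective(V, lam, log_likelihoods):
    """Negative log posterior of Eq. (6) with the exponential prior.

    With p(V) = prod_i lam * exp(-lam * V_i), the term -log p(V) is
    lam * sum(V) up to an additive constant, which we drop.
    log_likelihoods(V) stands in for the per-image log p(I_i|theta_i,V)
    terms, which in the paper come from the marginalized model, Eq. (4).
    """
    neg_log_prior = lam * V.sum()
    return neg_log_prior - np.sum(log_likelihoods(V))

# Toy usage: a pseudo log-likelihood that prefers V near a fixed target.
target = np.ones(8)
f = lambda V: -0.5 * (V - target) ** 2  # per-voxel toy terms, not real data
print(objective(np.ones(8), lam=0.1, log_likelihoods=f))  # prior term only: 0.8
```

Because $f(V)$ decomposes into a sum over images, it is exactly the finite-sum form that the stochastic optimizers of the next section exploit.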

    3.2. Stochastic Optimization

In order to efficiently cope with the large number of particle images in a typical dataset, we propose the use of stochastic optimization methods. Stochastic optimization methods exploit the large amount of redundancy in most datasets by only considering subsets of data (i.e., images) at each iteration, rewriting the objective as $f(V) = \sum_k f_k(V)$, where each $f_k(V)$ evaluates a subset of data. This allows for fast progress to be made before a batch optimization algorithm would be able to take a single step.

There is a wide range of such methods, from simple stochastic gradient descent with momentum [28, 29, 36] to more complex methods such as natural gradient methods [2, 3, 19, 20] and Hessian-free optimization [26]. Here we propose the use of Stochastic Average Gradient Descent (SAGD) [21], which has several important advantages. First, it is effectively self-tuning, using a line-search to determine and adapt the learning rate. This is particularly important, as many methods require significant manual tuning for new objective functions and, potentially, each new dataset. Further, it is specifically designed for the finite dataset case, allowing for faster convergence.

At each iteration $\tau$, SAGD [21] considers only a single subset of data, $k_\tau$, which defines part of the objective function, $f_{k_\tau}(V)$, and its gradient $g_{k_\tau}(V)$. The density $V$ is then updated as

$$V^{\tau+1} = V^{\tau} - \frac{\epsilon}{KL} \sum_{j=1}^{K} dV^{\tau}_{j} \quad (7)$$

where $\epsilon$ is a base learning rate, $L$ is a Lipschitz constant of $g_k(V)$, and

$$dV^{\tau}_{k} = \begin{cases} g_k(V^{\tau}) & k = k_\tau \\ dV^{\tau-1}_{k} & \text{otherwise} \end{cases} \quad (8)$$

is the most recent gradient evaluation of subset $k$ at iteration $\tau$. This step can be computed efficiently by storing the gradient of each observation and updating a running sum each time a new gradient is seen. The Lipschitz constant $L$ is not generally known but can be estimated using a line-search technique. Theoretically, convergence occurs for values of $\epsilon \le \frac{1}{16}$ [21]; however, in practice larger values at early iterations can be beneficial, thus we use $\epsilon = \max(\frac{1}{16}, 2^{1 - \lfloor \tau/150 \rfloor})$. To allow parallelization and reduce the memory requirements of SAGD, the data is divided into minibatches of 200 particle images. Finally, to enforce the positivity of density, negative values of $V$ are truncated to zero after each iteration. More details of the stochastic optimization can be found in the Supplemental Material.
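The stored-gradient update of Eqs. (7)-(8) can be sketched in a few lines. This is a minimal illustration on a toy least-squares problem, not the paper's cryo-EM objective: the line search for $L$, the $\epsilon$ schedule, minibatching, and the real gradients are all omitted, and the per-subset gradient function is an assumption of the sketch.

```python
import numpy as np

def sag(grad_k, K, V0, L, eps=1.0 / 16.0, iters=500, seed=0):
    """SAG-style optimization: update with the sum of stored gradients."""
    rng = np.random.default_rng(seed)
    V = V0.copy()
    dV = np.zeros((K,) + V0.shape)   # stored gradient per subset, Eq. (8)
    running_sum = np.zeros_like(V0)  # sum_j dV_j, maintained incrementally
    for _ in range(iters):
        k = rng.integers(K)          # pick one subset k_tau
        g = grad_k(k, V)             # fresh gradient for that subset only
        running_sum += g - dV[k]     # O(1) update of the running sum
        dV[k] = g
        V = V - (eps / (K * L)) * running_sum  # Eq. (7)
        V = np.maximum(V, 0.0)       # truncate negatives to enforce positivity
    return V

# Toy problem: f(V) = sum_k 0.5 * ||V - y_k||^2, minimized at mean(y_k).
y = np.array([[1.0, 2.0], [3.0, 4.0], [2.0, 0.0]])
V = sag(lambda k, V: V - y[k], K=3, V0=np.zeros(2), L=1.0)
print(V)  # converges toward the mean of the y_k, i.e. near [2.0, 2.0]
```

The key point, visible in the running-sum update, is that each iteration evaluates only one subset's gradient while still moving along an average over all $K$ stored gradients.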

    3.3. Importance Sampling

While stochastic optimization allows us to scale to large datasets, the cost of computing the required gradient for each image remains high due to the marginalization over orientations and shifts. Intuitively, one could consider randomly selecting a subset of the terms in Eq. (4) and using this as an approximation. This idea is formalized by importance sampling (IS), which allows for an efficient and accurate approximation of the discrete sums in Eq. (4).¹ A full review of importance sampling is beyond the scope of this paper, but we refer readers to [38].

¹ One can also apply importance sampling directly to the continuous integrals in Eq. (3), but it can be computationally advantageous to precompute a fixed set of projection and shift matrices, $\tilde{P}_R$ and $\tilde{S}_t$, which can be reused across particle images.
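The idea of approximating a large discrete sum by sampling a few of its terms can be sketched generically. The example below is ours, not the paper's scheme: the proposal distribution and the toy integrand are assumptions, chosen only to show that a proposal concentrated on the large terms gives an accurate estimate from few samples.

```python
import numpy as np

def is_estimate(w, f, q, n_samples, seed=0):
    """Estimate S = sum_j w_j * f_j by importance sampling.

    Draw indices from proposal q and average w_j * f_j / q_j; this is an
    unbiased estimator of the full sum whenever q_j > 0 for w_j*f_j != 0.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(w), size=n_samples, p=q)
    return np.mean(w[idx] * f[idx] / q[idx])

M = 1000
w = np.full(M, 1.0 / M)  # toy quadrature weights
# Sharply peaked terms, mimicking a likelihood concentrated on a few poses.
f = np.exp(-0.5 * ((np.arange(M) - 500.0) / 20.0) ** 2)
# Proposal concentrated on the large terms (with a tiny floor for safety).
q = (f + 1e-12) / np.sum(f + 1e-12)

exact = np.sum(w * f)
approx = is_estimate(w, f, q, n_samples=50)
print(exact, approx)  # 50 samples recover the 1000-term sum accurately
```

This mirrors the situation at high resolution: the likelihood over poses is sharply peaked, so a well-chosen proposal lets a small number of sampled quadrature points stand in for the full sums of Eq. (4).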

References

[1] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski, "Building Rome in a day," in ICCV, 2009.
[2] S.-I. Amari, H. Park, and K. Fukumizu, "Adaptive method of realizing natural gradient learning for multilayer perceptrons," Neural Computation, vol. 12, no. 6, pp. 1399–1409, 2000.
[3] S. Amari, "Natural gradient works efficiently in learning," Neural Computation, vol. 10, no. 2, pp. 251–276, 1998.
[4] W. T. Baxter, R. A. Grassucci, H. Gao, and J. Frank, "Determination of signal-to-noise ratios and spectral SNRs in cryo-EM low-dose imaging of molecules," J Struct Biol, vol. 166, no. 2, pp. 126–132, May 2009.
[5] J. Besag, "Statistical analysis of non-lattice data," Journal of the Royal Statistical Society, Series D (The Statistician), vol. 24, no. 3, pp. 179–195, 1975.
[6] R. Bhotika, D. Fleet, and K. Kutulakos, "A probabilistic theory of occupancy and emptiness," in ECCV, 2002.
[7] J. de la Rosa-Trevín, J. Otón, R. Marabini, A. Zaldívar, J. Vargas, J. Carazo, and C. Sorzano, "Xmipp 3.0: An improved software suite for image processing in electron microscopy," Journal of Structural Biology, vol. 184, no. 2, pp. 321–328, 2013.
[8] A. Doucet, S. Godsill, and C. Andrieu, "On sequential Monte Carlo sampling methods for Bayesian filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[9] M. Gräf and D. Potts, "Sampling sets and quadrature formulae on the rotation group," Numerical Functional Analysis and Optimization, vol. 30, no. 7–8, pp. 665–688, 2009.
[10] J. Gregson, M. Krimerman, M. B. Hullin, and W. Heidrich, "Stochastic tomography and its applications in 3D imaging of mixing fluids," ACM Trans. Graph. (Proc. SIGGRAPH 2012), vol. 31, no. 4, pp. 52:1–52:10, 2012.
[11] N. Grigorieff, "FREALIGN: high-resolution refinement of single particle structures," J Struct Biol, vol. 157, no. 1, pp. 117–125, Jan 2007.
[12] R. Henderson, A. Sali, M. L. Baker, B. Carragher, B. Devkota, K. H. Downing, E. H. Egelman, Z. Feng, J. Frank, N. Grigorieff, W. Jiang, S. J. Ludtke, O. Medalia, P. A. Penczek, P. B. Rosenthal, M. G. Rossmann, M. F. Schmid, G. F. Schröder, A. C. Steven, D. L. Stokes, J. D. Westbrook, W. Wriggers, H. Yang, J. Young, H. M. Berman, W. Chiu, G. J. Kleywegt, and C. L. Lawson, "Outcome of the first electron microscopy validation task force meeting," Structure, vol. 20, no. 2, pp. 205–214, 2012.
[13] J. Hsieh, Computed Tomography: Principles, Design, Artifacts, and Recent Advances. SPIE, 2003.
[14] N. Jaitly, M. A. Brubaker, J. Rubinstein, and R. H. Lilien, "A Bayesian method for 3-D macromolecular structure inference using class average images from single particle electron microscopy," Bioinformatics, vol. 26, pp. 2406–2415, 2010.
[15] J. Keeler, Understanding NMR Spectroscopy. Wiley, 2010.
[16] K. N. Kutulakos and S. M. Seitz, "A theory of shape by space carving," IJCV, vol. 38, no. 3, pp. 199–218, 2000.
[17] R. Langlois, J. Pallesen, J. T. Ash, D. N. Ho, J. L. Rubinstein, and J. Frank, "Automated particle picking for low-contrast macromolecules in cryo-electron microscopy," Journal of Structural Biology, vol. 186, no. 1, pp. 1–7, 2014.
[18] W. C. Y. Lau and J. L. Rubinstein, "Subnanometre-resolution structure of the intact Thermus thermophilus H+-driven ATP synthase," Nature, vol. 481, pp. 214–218, 2012.
[19] N. Le Roux and A. Fitzgibbon, "A fast natural Newton method," in ICML, 2010.
[20] N. Le Roux, P.-A. Manzagol, and Y. Bengio, "Topmoumoute online natural gradient algorithm," in NIPS, 2008, pp. 849–856.
[21] N. Le Roux, M. Schmidt, and F. Bach, "A stochastic gradient method with an exponential convergence rate for strongly convex optimization with finite training sets," in NIPS, 2012.
[22] V. I. Lebedev and D. N. Laikov, "A quadrature formula for the sphere of the 131st algebraic order of accuracy," Doklady Mathematics, vol. 59, no. 3, pp. 477–481, 1999.
[23] X. Li, P. Mooney, S. Zheng, C. R. Booth, M. B. Braunfeld, S. Gubbens, D. A. Agard, and Y. Cheng, "Electron counting and beam-induced motion correction enable near-atomic-resolution single-particle cryo-EM," Nature Methods, vol. 10, no. 6, pp. 584–590, 2013.
[24] S. P. Mallick, B. Carragher, C. S. Potter, and D. J. Kriegman, "ACE: Automated CTF estimation," Ultramicroscopy, vol. 104, no. 1, pp. 8–29, 2005.
[25] S. P. Mallick, S. Agarwal, D. J. Kriegman, S. J. Belongie, B. Carragher, and C. S. Potter, "Structure and view estimation for tomographic reconstruction: A Bayesian approach," in CVPR, 2006.
[26] J. Martens, "Deep learning via Hessian-free optimization," in ICML, 2010.
[27] J. A. Mindell and N. Grigorieff, "Accurate determination of local defocus and specimen tilt in electron microscopy," Journal of Structural Biology, vol. 142, no. 3, pp. 334–347, 2003.
[28] Y. Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k²)," Soviet Mathematics Doklady, vol. 27, no. 2, pp. 372–376, 1983.
[29] B. Polyak, "Some methods of speeding up the convergence of iteration methods," USSR Computational Mathematics and Mathematical Physics, vol. 4, no. 5, pp. 1–17, 1964.
[30] L. Reimer and H. Kohl, Transmission Electron Microscopy: Physics of Image Formation. Springer, 2008.
[31] J. L. Rubinstein, J. E. Walker, and R. Henderson, "Structure of the mitochondrial ATP synthase by electron cryomicroscopy," The EMBO Journal, vol. 22, no. 23, pp. 6182–6192, 2003.
[32] B. Rupp, Biomolecular Crystallography: Principles, Practice and Application to Structural Biology. Garland Science, 2009.
[33] S. H. W. Scheres, H. Gao, M. Valle, G. T. Herman, P. P. B. Eggermont, J. Frank, and J.-M. Carazo, "Disentangling conformational states of macromolecules in 3D-EM through likelihood optimization," Nature Methods, vol. 4, pp. 27–29, 2007.
[34] S. H. Scheres, "RELION: Implementation of a Bayesian approach to cryo-EM structure determination," Journal of Structural Biology, vol. 180, no. 3, pp. 519–530, 2012.
[35] F. Sigworth, "A maximum-likelihood approach to single-particle image refinement," Journal of Structural Biology, vol. 122, no. 3, pp. 328–339, 1998.
[36] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, "On the importance of initialization and momentum in deep learning," in ICML, 2013.
[37] G. Tang, L. Peng, P. Baldwin, D. Mann, W. Jiang, I. Rees, and S. Ludtke, "EMAN2: an extensible image processing suite for electron microscopy," Journal of Structural Biology, vol. 157, no. 1, pp. 38–46, 2007.
[38] S. T. Tokdar and R. E. Kass, "Importance sampling: a review," Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 1, pp. 54–60, 2010.
[39] Z. Xu, A. L. Horwich, and P. B. Sigler, "The crystal structure of the asymmetric GroEL-GroES-(ADP)7 chaperonin complex," Nature, vol. 388, pp. 741–750, 1997.
[40] J. Zhao, M. A. Brubaker, and J. L. Rubinstein, "TMaCS: A hybrid template matching and classification system for partially-automated particle selection," Journal of Structural Biology, vol. 181, no. 3, pp. 234–242, 2013.

Acknowledgements This work was supported in part by NSERC Canada and the CIFAR NCAP Program. MAB was funded in part by an NSERC Postdoctoral Fellowship. The authors would like to thank John L. Rubinstein for providing data and invaluable feedback.