
CS273: Algorithms for Structure and Motion in Biology
Handout #12
Stanford University
Thursday, 6 May 2004

Lecture #12: 6 May 2004
Topics: Geometric Models of Molecules II
Scribe: Itamar Rosenn

1 Introduction

The size and shape of molecules such as proteins strongly influence their function. In this lecture, we elaborate on the methods for modeling molecules geometrically that were discussed in the previous lecture, and explore how those methods can be used to accurately quantify the size and shape of a molecule.

2 Review: Modeling Molecules with Spherical Balls

Space-Filling Diagrams

In biology, space-filling diagrams are used to model the space occupied by a molecule. Each atom is modeled as a spherical ball; since a molecule is a collection of atoms, it is modeled as a union of balls. Given such a model, we consider three ways of defining the surface of the molecule:

• The van der Waals surface is the boundary of the union of the atom balls, where each ball has the van der Waals radius of its atom.

• The solvent-accessible surface is generated by rolling a spherical probe around the van der Waals surface; the surface traced by the center of the probe reflects the accessibility of the molecule to a solvent.

• The molecular surface is obtained by offsetting the solvent-accessible surface inwards by the probe radius, so that it follows the inward-facing surface of the rolling probe. The result is a smoother model than the original van der Waals surface, with less extreme crevices.
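As a rough illustration of the first two surfaces, the sketch below tests whether a point lies inside the van der Waals union or inside the region bounded by the solvent-accessible surface. It relies on the observation that the probe center can never enter a ball whose radius has been enlarged by the probe radius, so the solvent-accessible surface bounds the union of these enlarged balls. The atom coordinates, the radii, the function names, and the 1.4 Å probe radius (a common choice for water) are illustrative assumptions and do not come from the lecture.

```python
import math

# Hypothetical toy molecule: (center, van der Waals radius), in angstroms.
ATOMS = [((0.0, 0.0, 0.0), 1.7), ((1.5, 0.0, 0.0), 1.2)]
PROBE_RADIUS = 1.4  # assumed probe size, roughly a water molecule


def in_union(point, atoms, inflate=0.0):
    """True if point lies in some ball after adding `inflate` to each radius."""
    return any(math.dist(point, center) <= radius + inflate
               for center, radius in atoms)


def inside_vdw(point, atoms=ATOMS):
    """True if point is inside the van der Waals union of balls."""
    return in_union(point, atoms)


def inside_solvent_accessible(point, atoms=ATOMS, probe=PROBE_RADIUS):
    """True if a probe centered at point would overlap or touch some atom."""
    return in_union(point, atoms, inflate=probe)
```

For example, the point (3.0, 0, 0) lies outside both atoms but within 3.1 Å of the first center, so it is outside the van der Waals surface and inside the solvent-accessible one.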

Power Diagrams and Delaunay Triangulations

A power diagram divides the space of our molecular model into regions, one for each atom of the molecule. The region of a given atom consists of the points that are closer in power distance to that atom than to any other atom. We can use the power diagram to decompose our model into per-atom information.


Figure 1: (A) van der Waals surface, (B) solvent-accessible surface, and (C) molecular surface

More formally:

In three-dimensional space, let $|x - y|$ be the Euclidean distance between two points $x$ and $y$, and let $b_i = b(z_i, r_i)$ be a spherical ball with center $z_i$ and radius $r_i$. That is, $b_i = \{x \in \mathbb{R}^3 : |x - z_i| \le r_i\}$, the set of points whose distance from $z_i$ is at most $r_i$.

The union of a set $B$ of balls is $\bigcup B = \{x \in \mathbb{R}^3 : x \in b \text{ for some } b \in B\}$. The complement of this union, $\mathbb{R}^3 - \bigcup B$, consists of an unbounded region that corresponds to the space outside the molecule, and zero or more bounded regions, the cavities of $\bigcup B$, which correspond to bounded empty spaces inside the molecule. The complement of the space occupied by the molecule is addressed further in our discussion of pockets and voids in section 6.

Define the power distance of a point $x \in \mathbb{R}^3$ from a ball $b_i = b(z_i, r_i)$ as

$$\Pi_i(x) = |z_i - x|^2 - r_i^2.$$

Note that the power distance between balls $b_i = b(z_i, r_i)$ and $b_j = b(z_j, r_j)$ is

$$\Pi_{ij} = |z_i - z_j|^2 - r_i^2 - r_j^2.$$

The power diagram of a molecule is the collection of power cells of its spherical balls, where the power cell of a ball $b_i \in B$ is the set of points at least as close, in power distance, to $b_i$ as to any other ball in $B$:

$$V_i = \{x \in \mathbb{R}^3 : \Pi_i(x) \le \Pi_j(x) \text{ for all } b_j \in B\}$$
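The definitions above translate almost directly into code. The following is a minimal sketch, assuming balls are given as (center, radius) pairs; the function names and the two example balls are hypothetical. It evaluates the power distance of a point and finds a power cell containing it by brute force over all balls.

```python
import math


def power_distance(x, center, radius):
    """Pi_i(x) = |z_i - x|^2 - r_i^2: negative inside the ball, zero on its sphere."""
    return math.dist(x, center) ** 2 - radius ** 2


def power_cell_index(x, balls):
    """Index i of a ball whose power cell V_i contains x (ties go to the lowest index)."""
    return min(range(len(balls)), key=lambda i: power_distance(x, *balls[i]))


# Hypothetical example: a large ball and a small ball on the x-axis.
balls = [((0.0, 0.0, 0.0), 2.0), ((3.0, 0.0, 0.0), 1.0)]
```

With these two balls, the point (1.6, 0, 0) is closer in Euclidean distance to the second center, yet it falls in the power cell of the first ball, because the larger radius shifts the boundary between the cells toward the smaller ball.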

The dual of the power diagram of a molecule is its Delaunay Triangulation, which expresses the connectivity of our union of balls. This structure is formed by connecting the centers of each pair of atom-balls whose cells are adjacent in the power diagram. The outer boundary of the Delaunay Triangulation is the convex hull of the atom centers.


Figure 2: (a) a molecular model segmented into a power diagram, (b) the corresponding Delaunay Triangulation, and (c) the alpha shape and corresponding dual complex (see section 4).

3 Simplicial Complexes

To understand how we can use the notions of a power diagram and a Delaunay Triangulation to extract useful information about the actual molecule, we first need to understand a handful of basic topological concepts.

In topology, a vertex is referred to as a 0-simplex, an edge as a 1-simplex, a triangle as a 2-simplex, and a tetrahedron as a 3-simplex, where the integer in each term signifies the dimension of the element. The boundary of a simplex consists of other, lower-dimensional simplices that we call the faces or subsimplices of the simplex.

Simplicial complexes are collections of simplices. Their construction adheres to the following two rules:

(i) For every simplex in the construction, its faces are also part of the construction.
(ii) The intersection of any two simplices is either a face of both simplices, or empty.
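Rule (i) can be checked combinatorially. The sketch below assumes each simplex is represented by its set of vertex labels; this representation and the function names are illustrative choices. It does not check rule (ii), which depends on the geometric positions of the vertices.

```python
from itertools import combinations


def faces(simplex):
    """All proper, non-empty faces of a simplex given as a set of vertex labels."""
    verts = sorted(simplex)
    return {frozenset(c) for k in range(1, len(verts))
            for c in combinations(verts, k)}


def is_closed_under_faces(simplices):
    """Rule (i): every face of every simplex is itself in the collection."""
    complex_ = {frozenset(s) for s in simplices}
    return all(f in complex_ for s in complex_ for f in faces(s))


# A triangle together with all its edges and vertices satisfies rule (i) ...
triangle = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
# ... but dropping the edge {b, c} violates it.
broken = [s for s in triangle if s != {"b", "c"}]
```

Here `is_closed_under_faces(triangle)` holds, while `is_closed_under_faces(broken)` does not, since the 2-simplex {a, b, c} is missing one of its faces.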

Figure 3: (A) 0-simplex, 1-simplex, 2-simplex, and 3-simplex; (B) a legal complex of simplices; (C) intersection patterns of simplices that are not allowed in a complex.

4 Alpha Shapes

We have already seen an example of a simplicial complex: the Delaunay Triangulation of our molecular model. This complex consists of many vertices, edges, triangles, and tetrahedra. However, we need an organized way of building up these components. The natural way to arrange these subsimplices of the Delaunay Triangulation is in a sequence using a ball growth model, in which the atom centers are expanded into balls of increasing radii. When two, three, or four atom balls have grown large enough that they collide, an edge, a triangle, or a tetrahedron spanning their centers is added to the complex, respectively. When an element is introduced into the complex in this manner, it is marked with the time point at which it was introduced.
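For a single pair of balls, the collision time in the ball growth model has a closed form. The sketch below assumes the growth rule used in this lecture's filtration, in which a ball of radius r has squared radius r^2 + t at time t, and it considers the two balls in isolation. In the full complex, other balls may cover the touching point and delay the appearance of the edge, so this is only a lower bound there. The function name is hypothetical, and the formula assumes that the touching point lies on the segment between the two centers.

```python
import math


def edge_birth_time(center_i, r_i, center_j, r_j):
    """Time t at which two balls with squared radii r^2 + t first touch.

    The touching point lies on the segment between the centers, at distance s
    from center_i, where both power distances agree; then t = s^2 - r_i^2.
    """
    d = math.dist(center_i, center_j)
    s = (d * d + r_i * r_i - r_j * r_j) / (2.0 * d)
    return s * s - r_i * r_i
```

Two point-like atoms (radius zero) at distance 2 meet at t = 1, when each has grown to radius 1. Two balls of radius 1.5 at distance 2 already overlap at t = 0, and the function returns the negative time t = -1.25 at which they first touched.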

This process of growing the atom balls and building the Delaunay complex in chronological order is known as filtration. We let the time $t$ range from $-\infty$ to $\infty$, and grow the weight (i.e. squared radius) of each ball $b_i$ to $r_i^2 + t$ at time $t$. Adding the same $t$ to every weight leaves each difference $\Pi_i(x) - \Pi_j(x)$ unchanged, so the power diagram of our model remains the same at all times, and the dual complexes that arise during filtration must be subcomplexes of the full Delaunay complex. Also, since each ball covers an increasingly large portion of its region in the power diagram as it grows, the dual complexes that result from filtration can only grow over time.

Instead of explicitly using a time parameter $t$, we use $\alpha = \sqrt{t}$. The motivation behind this convention is that if we set the initial radius of a ball to zero, the radius of the ball at time $t$ is $\alpha$. Let $B_\alpha$ be our collection of balls and $K_\alpha$ the corresponding complex at time $t = \alpha^2$. $K_\alpha$ is referred to as the α-complex, and the region covered by its simplices, which represents the current shape of our molecule, is referred to as the α-shape of $B$. For sufficiently negative $t = \alpha^2$, all radii are imaginary, so our union of balls and its corresponding dual complex are both empty. When α grows large enough, meaning enough time has passed for the balls to grow sufficiently large, the corresponding α-complex is equal to the Delaunay complex of our model. Thus, the process of filtration yields a sequence of complexes that contains the empty complex at one endpoint, the full Delaunay complex $D$ at the other endpoint, and along the way reflects the shape of our model at a range of resolutions:

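$$\emptyset \subseteq K_\alpha \subseteq K_\beta \subseteq D \quad \text{for every } -\infty < \alpha^2 \le \beta^2 < \infty,$$

where our filtration sequence is the ordered set $\{\emptyset = K_0, K_1, \ldots, K_m = D\}$.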

Note that $D$ contains only finitely many simplices; thus there are only finitely many subcomplexes of $D$ that appear during the filtration.
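Because only finitely many subcomplexes appear, a filtration can be stored as a list of simplices sorted by the time at which each one enters. The sketch below assumes the birth times are already known; the values in the example are made up for illustration and are not derived from real geometry, and the function names are hypothetical.

```python
def filtration(births):
    """Order simplices by birth time t; ties are broken by dimension so faces come first."""
    return sorted(births, key=lambda s: (births[s], len(s)))


def alpha_complex(births, alpha):
    """Simplices present at time t = alpha**2, for real alpha >= 0."""
    t = alpha * alpha
    return [s for s in filtration(births) if births[s] <= t]


# Hypothetical birth times for a complex on three point-like atoms.
births = {
    ("a",): 0.0, ("b",): 0.0, ("c",): 0.0,
    ("a", "b"): 1.0, ("b", "c"): 2.25, ("a", "c"): 4.0,
    ("a", "b", "c"): 4.5,
}
```

Calling `alpha_complex(births, 1.0)` returns the three vertices and the edge ("a", "b"), and raising alpha to 1.5 adds the edge ("b", "c"). Each complex contains the previous one, as in the nested sequence above.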

Figure 4: (A) The alpha complex for small alpha consists mostly of low-dimension simplices. (B) The alpha complex for medium alpha features more extensive simplicial complexes. (C) The alpha complex for large alpha is connected, and contains previously built complexes along with new simplices in its complex.

5 Computing Metric Properties of a Molecule

The problem of computing metric properties of a molecule such as surface area and volume is difficult because the spherical balls representing the atoms overlap due to chemical bonds, van der Waals contacts, and solvent contacts. As a result of these intersections, the area or volume of the molecule cannot be computed simply as a sum of the area or volume of the individual atoms.

To account for intersections in computing metric properties, we can use the principle of inclusion-exclusion: when two atoms overlap, we subtract the metric value of the overlap from the sum of the metric values of the individual atoms. When three atoms overlap, we first subtract the pairwise overlaps, and then add back the triple overlap. This process continues when there are four, five, or more atoms that intersect.

To compute our desired metric property using inclusion-exclusion, it is sufficient to use only the intersections represented as elements of our dual complex, which we can construct using the process of filtration explained above. Restricting our formula to these terms yields a polynomial-size computation. Note that without this restriction afforded by the dual complex, all possible intersections would have to be considered, resulting in an exponential-size computation.

Figure 5: By inclusion-exclusion, area of union = A + B + C + D − AB − AC − AD − BC − BD − CD + ABC + ABD + ACD + BCD − ABCD. Using only the simplices in the Delaunay Triangulation, this simplifies to A + B + C + D − AB − AC − AD − BC − CD + ABC + ACD. Note that there is an equivalence among the cancelled terms: BD = BCD + ABD − ABCD.
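The cancellation in Figure 5 can be reproduced on a toy example. In the sketch below, finite sets stand in for the four balls and set size stands in for area; this substitution, the particular sets, and the function name are assumptions made for illustration. The sets are chosen by hand so that the overlap of B and D is contained in the union of A and C, which is the condition under which BD = BCD + ABD − ABCD.

```python
from itertools import combinations


def inclusion_exclusion(sets, groups):
    """Alternating sum of intersection sizes over the given groups of set names."""
    total = 0
    for group in groups:
        common = set.intersection(*(sets[name] for name in group))
        total += (-1) ** (len(group) + 1) * len(common)
    return total


# Finite sets standing in for the four balls; B & D = {4} lies inside A.
sets = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6, 7}, "D": {1, 4, 7, 8}}

# All 15 non-empty groups, versus the 11 terms kept in Figure 5
# (every group containing both B and D is dropped: BD, ABD, BCD, ABCD).
all_groups = [g for k in range(1, 5) for g in combinations("ABCD", k)]
dual_groups = [g for g in all_groups if not {"B", "D"} <= set(g)]
```

Both `inclusion_exclusion(sets, all_groups)` and `inclusion_exclusion(sets, dual_groups)` equal the size of the union of the four sets. With n sets the full sum has 2^n − 1 terms, which is why restricting to the terms of the dual complex matters for large molecules.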

6 Pockets, Voids, and Topological Persistence

As mentioned earlier, our motivation in quantifying the size and shape of a molecule is that these properties affect how the molecule interacts with other molecules, which in turn helps to specify its function. Sometimes, the site of this interaction is a concavity of the molecule, known as a pocket, which is surrounded on most sides by atoms of the molecule. For example, if the molecule is a protein, a solvent can bind to the molecule at a pocket by gaining access to the molecule through the mouth, which is the part of the pocket not enclosed by atoms. A void is a cavity that is bounded on all sides by atoms of the molecule, and is therefore inaccessible to other molecules.

During filtration, as the atom-balls swell, more and more balls intersect each other. Some of these intersections close the mouths of pockets in the model, first turning these pockets into voids and then filling all the empty space inside the voids, until no pockets or voids are left in the model. Indeed, the Delaunay Triangulation, which is the final complex generated by our filtration, fills the convex hull of the atom centers, so the only empty space left is the unbounded region outside that hull. Thus, we can identify and analyze pockets through the process of filtration by noting which concavities of one alpha shape become voids in some later alpha shape and are eventually filled entirely.

More precisely, pockets of a given alpha shape are identified and measured using a discrete flow relation defined on the corresponding alpha complex. Discrete flow is defined on simplices of the Delaunay complex that have not yet appeared in the alpha complex; for example, in the two-dimensional case, we consider triangles of the Delaunay Triangulation that are empty in our current alpha complex, meaning that they have not yet appeared (compare figure 2b with 2c). According to the flow relation, an obtuse empty triangle flows to the triangle that borders it on the edge opposite the obtuse vertex, whereas an acute empty triangle is a sink that collects flow from its neighboring empty triangles. A series of empty triangles that flows to a sink defines a pocket, while a series of empty triangles that flows to infinity belongs to the exterior of the molecule. The size of a pocket is computed by summing the sizes of the empty triangles that define the pocket and then subtracting the portion of those triangles that is covered by atoms.

Figure 6: (a) A pocket formed by five empty Delaunay triangles. Obtuse triangles flow into the sink. The top of triangle 1 forms the mouth of the pocket. (b) This structure is not identified as a pocket because the empty triangles flow to infinity rather than to a sink; this concavity does not get closed into a void at any point of the filtration.
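The flow rule for a single empty triangle in the two-dimensional case can be sketched as follows. The sketch assumes unweighted points, so that the obtuse-or-acute test alone decides the flow direction; weighted atoms would need a more careful test. The function name is hypothetical, and a right triangle is treated as a sink here, which is an arbitrary choice for that degenerate case.

```python
def flow_target_edge(p0, p1, p2):
    """Return the edge (a pair of vertices) that a 2D triangle flows across,
    or None if the triangle has no obtuse angle and therefore acts as a sink."""
    pts = (p0, p1, p2)
    for k in range(3):
        a, b, c = pts[k], pts[(k + 1) % 3], pts[(k + 2) % 3]
        # The angle at a is obtuse exactly when (b - a) . (c - a) < 0.
        dot = (b[0] - a[0]) * (c[0] - a[0]) + (b[1] - a[1]) * (c[1] - a[1])
        if dot < 0:
            return (b, c)  # the edge opposite the obtuse vertex
    return None
```

For the flat triangle (0, 0), (4, 0), (2, 0.5), the obtuse angle is at the third vertex, so the flow crosses the edge from (0, 0) to (4, 0). For the triangle (0, 0), (2, 0), (1, 2), the function returns None, marking a sink.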

We may regard filtration as an evolutionary growth process in which topological features such as pockets and voids are created and later destroyed (see figure 7). The lifetime, or persistence, of each feature is an interval whose boundaries are the alpha value corresponding to the complex at which the feature first appears and the alpha value corresponding to the complex at which it disappears. Very short-lived topological attributes may be considered data noise and removed, while attributes that persist longer are kept as meaningful topological features.
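Filtering features by persistence amounts to thresholding the length of each lifetime interval. A minimal sketch, assuming each feature is recorded as a (name, birth alpha, death alpha) triple; the feature names, alpha values, and threshold below are made up for illustration.

```python
def persistent_features(intervals, min_persistence):
    """Keep features whose lifetime (death alpha minus birth alpha) meets the threshold."""
    return [(name, birth, death) for name, birth, death in intervals
            if death - birth >= min_persistence]


# Hypothetical features with made-up birth and death alpha values.
features = [("pocket 1", 1.0, 3.5), ("void 1", 2.0, 2.1), ("pocket 2", 0.5, 0.6)]
```

With a threshold of 0.5, only "pocket 1" survives; the other two features live for about 0.1 and would be treated as noise. The choice of threshold is a modeling decision that the lecture does not fix.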

Figure 7: Emergence and disappearance of topological features of the alpha shape through filtration.
