A topological measurement of protein compressibility
Tran Quoc Hoan
@k09ht haduonght.wordpress.com/
28 March 2016, Paper Alert, Hasegawa lab., Tokyo
The University of Tokyo
Marcio Gameiro et al. (Japan J. Indust. Appl. Math (2015) 32:1-17)
Abstract
…we partially clarify the relation between the compressibility of a protein and its molecular geometric structure. To identify and understand the relevant topological features within a given protein, we model its molecule as an alpha filtration and hence obtain multi-scale insight into the structure of its tunnels and cavities. The persistence diagrams of this alpha filtration capture the sizes and robustness of such tunnels and cavities in a compact and meaningful manner…
Our main result establishes a clear linear correlation between the topological measure and the experimentally-determined compressibility of most proteins for which both PDB information and experimental compressibility data are available…
Tutorial of Topological Data Analysis
Part I - Basic Concepts
Outline
TDA - Basic Concepts
1. Topology and holes
2. Simplicial complexes
3. Definition of holes
4. Persistent homology
5. Some of applications
Topology
I - Topology and Holes
Topology studies the properties of a space that are preserved under continuous deformations, such as stretching and bending, but not tearing or gluing.
[Figure: examples of topologically equivalent shapes (≅)]
Invariant
Question: what are the invariants in topology?
[Figure: shapes grouped by topological equivalence (≅)]
Number of:
Connected components | Rings | Cavities
1 | 0 | 0
2 | 0 | 0
1 | 1 | 0
1 | 1 | 0
Holes and dimension
In topology, shapes are compared under continuous deformations, which preserve the number of holes of each dimension.
✤ Holes describe how a shape is formed: connected components, rings, cavities
• 0-dimensional “hole” = connected component
• 1-dimensional “hole” = ring
• 2-dimensional “hole” = cavity
How do we define a “hole”?
Use the algebraic notion of a homology group
Homology group
✤ For a geometric object X, the dimension of the q-th homology group $H_q(X)$ is the q-th Betti number $b_q = \dim H_q(X)$:
b0: number of connected components
b1: number of rings
b2: number of cavities
bq: number of q-dimensional holes
Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Simplicial complexes
Simplicial complex: a set of vertices, edges, triangles, tetrahedra, … that is closed under taking faces and has no improper intersections
[Figure: k-simplices: vertex (0-dimensional), edge (1-dimensional), triangle (2-dimensional), tetrahedron (3-dimensional); an example of a simplicial complex and an example that is not a simplicial complex]
2 - Simplicial complexes
Simplex
n-simplex: the convex hull of n+1 affinely independent points $v_0, v_1, \ldots, v_n$
[Figure: vertex (0-dimensional), edge (1-dimensional), triangle (2-dimensional), tetrahedron (3-dimensional), n-simplex]
$\sigma = |v_0 v_1 \cdots v_n| = \{\lambda_0 v_0 + \lambda_1 v_1 + \cdots + \lambda_n v_n \mid \lambda_0 + \cdots + \lambda_n = 1,\ \lambda_i \ge 0\}$
An m-face of σ is the convex hull $\tau = |v_{i_0} \cdots v_{i_m}|$ of a non-empty subset of $\{v_0, v_1, \ldots, v_n\}$; it is a proper face if the subset is not the entire set. We write $\tau \le \sigma$.
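The definition only makes sense for affinely independent points. A minimal Python sketch of that check, assuming points are given as coordinate tuples; the helper names `rank` and `affinely_independent` are hypothetical. It uses the fact that $v_0, \ldots, v_n$ are affinely independent exactly when $v_1 - v_0, \ldots, v_n - v_0$ are linearly independent, and computes the rank with exact `Fraction` arithmetic to avoid floating-point tolerances.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        # Find a row at or below r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def affinely_independent(points):
    """True if the points v0, ..., vn span an n-simplex."""
    v0 = points[0]
    diffs = [[a - b for a, b in zip(p, v0)] for p in points[1:]]
    return not diffs or rank(diffs) == len(diffs)
```

For example, three vertices of a triangle in $\mathbb{R}^2$ pass the check, while three collinear points or any four points in $\mathbb{R}^2$ fail it.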
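Orientation of a simplex
An oriented simplex $\langle v_0 v_1 \cdots v_n \rangle$ fixes an ordering of its vertices. Reordering the vertices by a permutation $\langle i_0 i_1 \cdots i_n \rangle$ gives the same orientation if the permutation is even and the opposite orientation if it is odd: $\langle v_{i_0} v_{i_1} \cdots v_{i_n} \rangle = \operatorname{sgn}(i_0 i_1 \cdots i_n)\, \langle v_0 v_1 \cdots v_n \rangle$
[Figure: oriented 1-simplex, 2-simplex, 3-simplex]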
Simplicial complex
Definition: A simplicial complex is a finite collection K of simplices such that
(1) if $\sigma \in K$ and $\tau \le \sigma$, then $\tau \in K$;
(2) if $\sigma, \tau \in K$ and $\sigma \cap \tau \neq \emptyset$, then $\sigma \cap \tau \le \sigma$ and $\sigma \cap \tau \le \tau$.
The maximum dimension of a simplex in K is the dimension of K.
$K_2 = \{|v_0v_1v_2|, |v_0v_1|, |v_0v_2|, |v_1v_2|, |v_0|, |v_1|, |v_2|\}$
$K = K_2 \cup \{|v_3v_4|, |v_3|, |v_4|\}$
[Figure: examples labelled NOT (not a simplicial complex) and YES (a simplicial complex)]
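Condition (1) can be checked mechanically when simplices are stored by their vertex labels. A minimal Python sketch, assuming each simplex is a tuple of vertex labels; the function names are hypothetical. It tests only condition (1): condition (2) concerns how the simplices sit in space and is not checked here.

```python
from itertools import combinations


def faces(simplex):
    """All non-empty faces of a simplex (tuple of vertex labels), itself included."""
    s = tuple(sorted(simplex))
    return {c for k in range(1, len(s) + 1) for c in combinations(s, k)}


def is_closed_under_faces(K):
    """Condition (1): every face of every simplex of K also belongs to K."""
    K = {tuple(sorted(s)) for s in K}
    return all(f in K for s in K for f in faces(s))


def dimension(K):
    """Maximum dimension of a simplex in K (a k-simplex has k+1 vertices)."""
    return max(len(s) for s in K) - 1
```

With vertex labels 0 to 4 standing for $v_0, \ldots, v_4$, the complex $K = K_2 \cup \{|v_3v_4|, |v_3|, |v_4|\}$ from the slide passes the check and has dimension 2; deleting the edge $|v_0v_2|$ while keeping the triangle breaks it.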
Simplicial complexes
Hemoglobin simplicial complex
Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
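Nerve
✤ Let $\Phi = \{B_i \mid i = 1, \ldots, m\}$ be a covering of $X = \bigcup_{i=1}^{m} B_i$
✤ The nerve of Φ is the simplicial complex $N(\Phi) = (V, \Sigma)$ with vertex set $V = \{1, \ldots, m\}$, where $\{i_0, \ldots, i_k\} \in \Sigma$ if and only if $B_{i_0} \cap \cdots \cap B_{i_k} \neq \emptyset$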
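For a cover by finite sets, the nerve can be enumerated directly from the intersection rule. A minimal brute-force Python sketch, assuming the cover is a list of Python sets; `nerve` and its `max_dim` cut-off are hypothetical names. It checks every subset of indices, so it is meant only for small covers.

```python
from itertools import combinations


def nerve(cover, max_dim=None):
    """Nerve of a cover given as a list of finite sets.

    A tuple of indices (i0, ..., ik) is a k-simplex iff the sets
    cover[i0], ..., cover[ik] have a common element.
    """
    m = len(cover)
    top = m if max_dim is None else min(m, max_dim + 1)
    simplices = []
    for k in range(1, top + 1):
        for idx in combinations(range(m), k):
            if set.intersection(*(set(cover[i]) for i in idx)):
                simplices.append(idx)
    return simplices
```

Three sets that intersect pairwise but have no common element, such as {1, 2}, {2, 3}, {3, 1}, give a hollow triangle: three vertices and three edges, but no 2-simplex, so the nerve has one ring.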
Nerve theorem
✤ If $X \subset \mathbb{R}^N$ is covered by a collection of convex closed sets $\Phi = \{B_i \mid i = 1, \ldots, m\}$, then X and $N(\Phi)$ are homotopy equivalent
Čech complex
✤ $P = \{x_i \in \mathbb{R}^N \mid i = 1, \ldots, m\}$
✤ $B_r(x_i) = \{x \in \mathbb{R}^N \mid \|x - x_i\| \le r\}$: ball with radius r
✤ The Čech complex C(P, r) is the nerve of $\Phi = \{B_r(x_i) \mid x_i \in P\}$
✤ From the nerve theorem: $X_r = \bigcup_{i=1}^{m} B_r(x_i) \simeq C(P, r)$
✤ Filtration: $C(P, r) \subset C(P, s)$ for $r \le s$
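In the plane, the Čech condition can be tested exactly up to dimension 2: two balls of radius r meet iff their centres are at distance at most 2r, and three balls meet iff the smallest disk enclosing their centres has radius at most r. A minimal Python sketch under those assumptions (points in $\mathbb{R}^2$, simplices up to triangles only); the function names are hypothetical, and higher dimensions would need a general smallest-enclosing-ball routine.

```python
import math
from itertools import combinations


def min_enclosing_radius(a, b, c):
    """Radius of the smallest disk containing the planar points a, b, c."""
    ab2 = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    bc2 = (b[0] - c[0]) ** 2 + (b[1] - c[1]) ** 2
    ca2 = (c[0] - a[0]) ** 2 + (c[1] - a[1]) ** 2
    s = sorted([ab2, bc2, ca2])
    # Right, obtuse or degenerate triangle: the longest side is a diameter.
    if s[2] >= s[0] + s[1]:
        return math.sqrt(s[2]) / 2
    # Acute triangle: circumradius R = (|ab| |bc| |ca|) / (4 * area).
    twice_area = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
    return math.sqrt(ab2 * bc2 * ca2) / (2 * twice_area)


def cech_complex_2d(points, r):
    """Simplices (up to dimension 2) of the Čech complex C(P, r) of planar points."""
    n = len(points)
    simplices = [(i,) for i in range(n)]
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= 2 * r:
            simplices.append((i, j))
    for i, j, k in combinations(range(n), 3):
        if min_enclosing_radius(points[i], points[j], points[k]) <= r:
            simplices.append((i, j, k))
    return simplices
```

For an equilateral triangle with side 2, all three edges appear at r = 1.05, but the 2-simplex only appears once r reaches the circumradius $2/\sqrt{3} \approx 1.155$. Between these values C(P, r) is a hollow triangle, matching the ring in the union of balls.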
Čech complex
✤ The weighted Čech complex C(P, R) is the nerve of $\Phi = \{B_{r_i}(x_i) \mid x_i \in P\}$, where the balls have different radii $R = (r_1, \ldots, r_m)$
✤ Checking the intersections of balls is computationally expensive, which motivates the alpha complex
Voronoi diagrams and Delaunay complex
✤ $P = \{x_i \in \mathbb{R}^N \mid i = 1, \ldots, m\}$
✤ Voronoi cell: $V_i = \{x \in \mathbb{R}^N \mid \|x - x_i\| \le \|x - x_j\|,\ j \neq i\}$
✤ Voronoi decomposition: $\mathbb{R}^N = \bigcup_{i=1}^{m} V_i$, $\Phi = \{V_i \mid i = 1, \ldots, m\}$
✤ Delaunay complex: $D(P) = N(\Phi)$
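In the plane, three Voronoi cells share a point exactly when the circumcentre of the three sites is at least as close to them as to any other site, which gives the classical empty-circumcircle test. A minimal brute-force Python sketch, assuming planar points in general position; the function names are hypothetical and the O(m⁴) loop is only for illustration.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Circumcentre and squared circumradius of a planar triangle, or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_triangles(points):
    """2-simplices of D(P): triangles whose circumcircle contains no other site."""
    result = []
    for i, j, k in combinations(range(len(points)), 3):
        cc = circumcircle(points[i], points[j], points[k])
        if cc is None:
            continue
        (ux, uy), r2 = cc
        if all((px - ux) ** 2 + (py - uy) ** 2 > r2
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            result.append((i, j, k))
    return result
```

For the sites (0, 0), (4, 0), (0, 4), (5, 5), only the triangles with vertex indices (0, 1, 2) and (1, 2, 3) survive; the other two candidate triangles have a site strictly inside their circumcircle.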
General position
✤ $x_1, \ldots, x_{N+2} \in \mathbb{R}^N$ are in a general position if there is no $x \in \mathbb{R}^N$ such that $\|x - x_1\| = \cdots = \|x - x_{N+2}\|$
✤ If every combination of N+2 points in P is in a general position, then P is in a general position
✤ If P is in a general position, then
• the dimensions of Delaunay simplices are at most N
• the geometric representation of D(P) can be embedded in $\mathbb{R}^N$
Alpha complex
✤ $W_i(r) = B_r(x_i) \cap V_i$
✤ $\Phi = \{W_i(r) \mid i = 1, \ldots, m\}$
✤ The alpha complex is the nerve of Φ: $\alpha(P, r) = N(\Phi)$
✤ From the nerve theorem: $X_r \simeq \alpha(P, r)$
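Alpha complex
✤ $\alpha(P, r) \subset D(P)$, which can be embedded in $\mathbb{R}^N$ if P is in a general position
✤ $\alpha(P, r) \subset \alpha(P, s)$ for $r \le s$: a filtration of alpha complexes
✤ The weighted alpha complex is defined in the same way with balls of different radii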
Alpha complex
✤ Computations are much easier than for Čech complexes
✤ Software: CGAL
• Constructs alpha complexes of point cloud data in $\mathbb{R}^N$ with N ≤ 3
[Figure: filtration of an alpha complex]
Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Definition of holes
Geometrical object: simplicial complex → algebraic objects: chain complex → homology group → algebraic holes
3 - Definition of Holes
What is a hole?
✤ 1-dimensional hole: ring
[Figure: a graph with boundary and without a ring (not a ring); a graph without boundary that has a ring]
Is a ring simply a 1-dimensional graph without boundary?
No: a 1-dimensional graph can have no boundary and still be the boundary of a 2-dimensional graph (e.g., the three edges of a filled triangle).
Ring = 1-dimensional graph that has no boundary and is not the boundary of a 2-dimensional graph
What is a hole?
✤ 2-dimensional hole: cavity
[Figure: a surface with boundary and without a cavity (not a cavity); a surface without boundary that has a cavity]
Is a cavity simply a 2-dimensional graph without boundary?
No: a 2-dimensional graph can have no boundary and still be the boundary of a 3-dimensional graph (e.g., the four triangles of a solid tetrahedron).
Cavity = 2-dimensional graph that has no boundary and is not the boundary of a 3-dimensional graph
Hole and boundary
q-dimensional hole = q-dimensional graph that has no boundary and is not the boundary of a (q+1)-dimensional graph
We make this precise in “algebraic” language.
Chain complexes
Definition: Let K be a simplicial complex of dimension n. The group of q-chains is
$C_q(K) := \{\sum_i \alpha_i \langle v_{i_0} \cdots v_{i_q} \rangle \mid \alpha_i \in \mathbb{R},\ \langle v_{i_0} \cdots v_{i_q} \rangle \text{ an oriented } q\text{-simplex in } K\}$ for $0 \le q \le n$, and $C_q(K) := 0$ if $q < 0$ or $q > n$.
An element of $C_q(K)$ is called a q-chain.
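Boundary
The boundary of a q-simplex is the alternating sum of its (q−1)-dimensional faces.
Definition: $\partial_q \langle v_{i_0} v_{i_1} \cdots v_{i_q} \rangle := \sum_{l=0}^{q} (-1)^l \langle v_{i_0} \cdots \hat{v}_{i_l} \cdots v_{i_q} \rangle$, where $\hat{v}_{i_l}$ means that $v_{i_l}$ is omitted; this extends linearly to $\partial_q: C_q(K) \to C_{q-1}(K)$.
Example: $\partial_2 \langle v_0 v_1 v_2 \rangle = \langle v_1 v_2 \rangle - \langle v_0 v_2 \rangle + \langle v_0 v_1 \rangle$. With $\mathbb{Z}_2$ coefficients the signs disappear: $\partial |v_0 v_1 v_2| = |v_0 v_1| + |v_1 v_2| + |v_0 v_2|$.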
Boundary
Fundamental lemma: $\partial_{q-1} \circ \partial_q = 0$
For q = 2: $\partial_1 \circ \partial_2 = 0$
In general:
• For a q-simplex τ, the boundary $\partial_q \tau$ consists of all (q−1)-faces of τ.
• Every (q−2)-face of τ belongs to exactly two (q−1)-faces of τ and appears in their boundaries with opposite signs, so the terms cancel:
$\partial_{q-1} \partial_q \tau = 0$
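The boundary formula and the fundamental lemma can be checked directly on chains. A minimal Python sketch, assuming a chain is a dictionary from oriented simplices (tuples of vertex labels) to integer coefficients; the function name `boundary` is hypothetical. Applying it twice to any chain should return the empty (zero) chain.

```python
from collections import defaultdict


def boundary(chain):
    """Boundary of a chain {oriented simplex: coefficient}.

    Implements ∂⟨v0...vq⟩ = Σ_l (-1)^l ⟨v0...v̂l...vq⟩, extended linearly;
    vertices (0-simplices) have zero boundary.
    """
    out = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) == 1:
            continue
        for l in range(len(simplex)):
            face = simplex[:l] + simplex[l + 1:]   # omit the l-th vertex
            out[face] += (-1) ** l * coeff
    # Drop faces whose contributions cancelled.
    return {f: c for f, c in out.items() if c != 0}
```

For example, `boundary({(0, 1, 2): 1})` gives `{(1, 2): 1, (0, 2): -1, (0, 1): 1}`, matching the example for $\partial_2 \langle v_0 v_1 v_2 \rangle$. Applying `boundary` twice to a tetrahedron returns `{}`, since each edge appears twice with opposite signs.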
Hole and boundary
q-dimensional hole = q-dimensional graph that
(1) has no boundary, and
(2) is not the boundary of a (q+1)-dimensional graph
(1) $Z_q(K) := \ker \partial_q$ (cycle group)
(2) $B_q(K) := \operatorname{im} \partial_{q+1}$ (boundary group)
Since $\partial_q \circ \partial_{q+1} = 0$: $B_q(K) \subset Z_q(K) \subset C_q(K)$
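Hole and boundary
q-dimensional hole = q-dimensional graph that
(1) has no boundary: an element of $Z_q(K) := \ker \partial_q$
(2) is not the boundary of a (q+1)-dimensional graph: $B_q(K) := \operatorname{im} \partial_{q+1}$
Keep the elements of $Z_q(K)$ but regard every element of $B_q(K)$ as zero. This is done by the quotient map $Q: \ker \partial_q \to \ker \partial_q / \operatorname{im} \partial_{q+1}$:
for $z' = z + b$ with $z, z' \in \ker \partial_q$ and $b \in \operatorname{im} \partial_{q+1}$, $Q(z') = Q(z) + Q(b) = Q(z)$
(z and z' are equivalent in $\ker \partial_q$ with respect to $\operatorname{im} \partial_{q+1}$)
q-dimensional hole = an equivalence class of vectors in $\ker \partial_q / \operatorname{im} \partial_{q+1}$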
Homology group
Homology groups: The q-th homology group is
$H_q = \ker \partial_q / \operatorname{im} \partial_{q+1} = \{z + \operatorname{im} \partial_{q+1} \mid z \in \ker \partial_q\} = \{[z] \mid z \in \ker \partial_q\}$
It is a group under the operation $[z] + [z'] = [z + z']$.
Betti numbers: The q-th Betti number is the dimension of $H_q$: $b_q = \dim H_q$
$H_0(K)$: connected components, $H_1(K)$: rings, $H_2(K)$: cavities
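Computing Homology
Example: K is the cycle with vertices v0, v1, v2, v3, edges ⟨v0v1⟩, ⟨v1v2⟩, ⟨v2v3⟩, ⟨v0v3⟩, and no triangles.
• Since $\partial_0 = 0$, $\ker \partial_0 = C_0(K)$, and every vertex is equivalent to $v_0$ with respect to $\operatorname{im} \partial_1$ (e.g., $v_1 - v_0 = \partial_1 \langle v_0 v_1 \rangle$), so $b_0 = \dim H_0 = 1$.
• $\operatorname{im} \partial_2$ has only the zero vector, so $H_1 \cong \ker \partial_1 = \{\lambda(\langle v_0v_1 \rangle + \langle v_1v_2 \rangle + \langle v_2v_3 \rangle - \langle v_0v_3 \rangle) \mid \lambda \in \mathbb{R}\}$ and $b_1 = \dim H_1 = 1$.
• With unoriented edges the same cycle reads $|v_0v_1| + |v_1v_2| + |v_2v_3| + |v_3v_0|$, since $\langle v_3 v_0 \rangle = -\langle v_0 v_3 \rangle$.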
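The same computation can be automated with the rank formula $b_q = \dim C_q - \operatorname{rank} \partial_q - \operatorname{rank} \partial_{q+1}$. A minimal Python sketch that works over $\mathbb{Z}_2$ (the coefficients used in the persistent homology part) rather than $\mathbb{R}$; for the small examples here both give the same Betti numbers. Simplices are tuples of vertex labels, matrix rows are stored as integer bit masks, and all function names are hypothetical.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over Z2 of a matrix whose rows are ints used as bit vectors."""
    rank, rows = 0, list(rows)
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit marks the pivot column
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def boundary_rows(q_simplices, faces_list):
    """∂_q over Z2: one bit-vector row per q-simplex, bits indexed by (q-1)-simplices."""
    index = {s: i for i, s in enumerate(faces_list)}
    rows = []
    for s in q_simplices:
        row = 0
        for face in combinations(s, len(s) - 1):
            row |= 1 << index[face]
        rows.append(row)
    return rows


def betti_numbers(K):
    """[b_0, ..., b_top] of a simplicial complex K (closed under faces), over Z2."""
    K = {tuple(sorted(s)) for s in K}
    top = max(len(s) for s in K) - 1
    by_dim = [sorted(s for s in K if len(s) == q + 1) for q in range(top + 1)]
    ranks = [0] * (top + 2)  # ranks[q] = rank ∂_q; ∂_0 = 0 and ∂_{top+1} = 0
    for q in range(1, top + 1):
        ranks[q] = rank_gf2(boundary_rows(by_dim[q], by_dim[q - 1]))
    return [len(by_dim[q]) - ranks[q] - ranks[q + 1] for q in range(top + 1)]
```

For the cycle above (vertices 0 to 3, four edges) this returns `[1, 1]`; filling in a triangle gives `[1, 0, 0]`, and the hollow tetrahedron (a combinatorial sphere) gives `[1, 0, 1]`, one cavity.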
Persistent Homology
✤ Consider a filtration of finite type
$\mathbb{K}: K_0 \subset K_1 \subset \cdots \subset K_t \subset \cdots$, where $\exists\, \Theta$ s.t. $K_j = K_\Theta$ for all $j \ge \Theta$
✤ $K = \bigcup_{t \ge 0} K_t$: total simplicial complex
✤ $K_k$: all k-simplices in K
✤ $K_k^t$: all k-simplices in $K_t$ (present at time t)
✤ $T(\sigma) = t \iff \sigma \in K_t \setminus K_{t-1}$: birth time of the simplex σ
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
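Persistent Homology
✤ $C_k(K_t)$: $\mathbb{Z}_2$-vector space of k-chains at time t
✤ $C_k(\mathbb{K}) = \bigoplus_{t \ge 0} C_k(K_t)$: $\mathbb{Z}_2[x]$-graded module
✤ Inclusion map $C_k(K_t) \hookrightarrow C_k(K_{t+1})$: the action of x
✤ $C_k(\mathbb{K})$ is a free $\mathbb{Z}_2[x]$-module with the basis $K_k$ (each k-simplex σ placed in degree T(σ))
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf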
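Persistent Homology
✤ Boundary map $\partial_k: C_k(\mathbb{K}) \to C_{k-1}(\mathbb{K})$, $\partial_k \sigma = \sum_{\tau} x^{T(\sigma) - T(\tau)} \tau$, where τ runs over the (k−1)-faces of σ (a graded homomorphism)
✤ From the graded structure, $Z_k(\mathbb{K}) = \ker \partial_k$ and $B_k(\mathbb{K}) = \operatorname{im} \partial_{k+1}$ are graded submodules
✤ Persistent homology: $H_k(\mathbb{K}) = Z_k(\mathbb{K}) / B_k(\mathbb{K})$
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf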
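Persistent Homology
✤ From the structure theorem for finitely generated modules over the PID $\mathbb{Z}_2[x]$:
$H_k(\mathbb{K}) \cong \bigoplus_{i=1}^{p} \left( x^{b_i} \mathbb{Z}_2[x] / x^{d_i} \mathbb{Z}_2[x] \right) \oplus \bigoplus_{j=p+1}^{q} x^{b_j} \mathbb{Z}_2[x]$
✤ Persistence intervals: $I_i = [b_i, d_i)$ for $i \le p$ and $I_j = [b_j, \infty)$ for $j > p$
✤ Persistence diagram: $D_k = \{(b_i, d_i)\}$, with $b_i = \inf I_i$ (birth) and $d_i = \sup I_i$ (death)
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf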
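In practice the intervals are usually computed with the standard boundary-matrix reduction over $\mathbb{Z}_2$, which yields the same intervals as the module decomposition above. This is not the derivation on the slides, and not necessarily what the software mentioned later does internally. A minimal Python sketch, assuming the filtration is a list of (simplex, birth time) pairs in which every face is born no later than its cofaces; `persistence_pairs` is a hypothetical name.

```python
def persistence_pairs(filtration):
    """Persistence intervals {dim: [(birth, death), ...]} of a filtered complex over Z2."""
    # Faces must precede cofaces: sort by birth time, then by dimension.
    order = sorted(filtration, key=lambda st: (st[1], len(st[0])))
    index = {tuple(sorted(s)): i for i, (s, _) in enumerate(order)}
    # Column j of the Z2 boundary matrix = set of indices of the facets of simplex j.
    columns = []
    for s, _ in order:
        s = tuple(sorted(s))
        facets = {index[s[:l] + s[l + 1:]] for l in range(len(s))} if len(s) > 1 else set()
        columns.append(facets)
    owner = {}  # lowest entry ("low") -> index of the reduced column that has it
    for j, col in enumerate(columns):
        # Add earlier reduced columns (mod 2) until the low entry is unique or col is zero.
        while col and max(col) in owner:
            col ^= columns[owner[max(col)]]
        if col:
            owner[max(col)] = j  # simplex max(col) creates a class that simplex j kills
    destroyers = set(owner.values())
    diagram = {}
    for i, (s, birth) in enumerate(order):
        if i in destroyers:
            continue  # this simplex kills a class rather than creating one
        death = order[owner[i]][1] if i in owner else float("inf")
        if birth < death:  # drop zero-length intervals
            diagram.setdefault(len(s) - 1, []).append((birth, death))
    return diagram
```

For three vertices born at time 0, edges ⟨v0v1⟩ and ⟨v1v2⟩ at time 1, edge ⟨v0v2⟩ at time 2 and the triangle at time 3, the sketch returns the 0-dimensional intervals [0, 1), [0, 1), [0, ∞) and a single ring [2, 3).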
Persistent Homology
[Figure: persistence diagram; axes: birth time vs. death time]
✤ A “hole” that appears close to the diagonal is short-lived and may be “noise”
✤ A “hole” that appears far from the diagonal is robust
✤ This detects the “structural holes”
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
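Separating noise from structural holes amounts to thresholding each point's persistence, death − birth (its distance from the diagonal up to a constant factor). A minimal Python sketch, assuming a diagram is a list of (birth, death) pairs; the band width `eps` is a hypothetical parameter, since the slides do not give the value used in the paper.

```python
def persistence(point):
    """Lifetime death - birth of a point in a persistence diagram."""
    birth, death = point
    return death - birth


def split_by_noise_band(diagram, eps):
    """Split a diagram into (robust, noise) using a band of hypothetical width eps.

    Points with death - birth < eps lie in the band next to the diagonal and are
    treated as noise; the robust points are returned from most to least persistent.
    Essential classes (death = inf) are always robust.
    """
    robust = sorted((p for p in diagram if persistence(p) >= eps),
                    key=persistence, reverse=True)
    noise = [p for p in diagram if persistence(p) < eps]
    return robust, noise
```

With `eps = 0.5`, the diagram [(0.0, 0.1), (0.5, 3.0), (1.0, 1.2), (0.2, ∞)] splits into robust points (0.2, ∞), (0.5, 3.0) and noise (0.0, 0.1), (1.0, 1.2).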
See more in Part II of the tutorial.
Applications
5 - Some of applications
• Persistence to protein compressibility: Marcio Gameiro et al. (Japan J. Indust. Appl. Math (2015) 32:1-17)
Protein Structure
Persistence to protein compressibility
[Figure: amino acids (amino acid 1, amino acid 2) joined by a peptide bond form the 1-dim structure of a protein, which folds into a 3-dim structure such as hemoglobin]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Protein Structure
✤ Van der Waals radius of an atom (Å):
H: 1.2, C: 1.7, N: 1.55, O: 1.52, S: 1.8, P: 1.8
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Van der Waals ball model of hemoglobin
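Alpha Complex for Protein Modeling
✤ $P = \{x_i \in \mathbb{R}^3 \mid i = 1, \ldots, m\}$: positions of atoms; $r_i$: radius of the i-th atom; $B_{r_i}(x_i)$: ball with radius $r_i$
✤ Power distance: $\rho_i(x) = \|x - x_i\|^2 - r_i^2$
✤ Weighted Voronoi decomposition: $V_i = \{x \in \mathbb{R}^3 \mid \rho_i(x) \le \rho_j(x),\ j \neq i\}$, $\mathbb{R}^3 = \bigcup_{i=1}^{m} V_i$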
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
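The weighted Voronoi cells are defined by the power distance rather than the Euclidean one, so a larger atom claims more space. A minimal Python sketch, assuming atoms are given as (element symbol, position) pairs; the radii table is copied from the van der Waals slide, while the function names and the example positions are hypothetical.

```python
# Van der Waals radii in Å, as listed on the slide above.
VDW_RADIUS = {"H": 1.2, "C": 1.7, "N": 1.55, "O": 1.52, "S": 1.8, "P": 1.8}


def power_distance(x, center, radius):
    """||x - center||^2 - radius^2: negative inside the ball, zero on its sphere."""
    return sum((a - b) ** 2 for a, b in zip(x, center)) - radius ** 2


def weighted_voronoi_cell(x, atoms):
    """Index i of the weighted Voronoi cell V_i containing x (smallest power distance).

    atoms: list of (element symbol, (x, y, z) position) pairs.
    """
    return min(range(len(atoms)),
               key=lambda i: power_distance(x, atoms[i][1], VDW_RADIUS[atoms[i][0]]))
```

For a carbon atom at the origin and a hydrogen atom at (3, 0, 0), the point (1.6, 0, 0) is Euclidean-closer to the hydrogen, but it still lies in the carbon's weighted cell because the carbon's radius is larger.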
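Alpha Complex for Protein Modeling
✤ Alpha complex: the nerve of $\{B_{r_i}(x_i) \cap V_i \mid i = 1, \ldots, m\}$; a k-simplex $|x_{i_0} \cdots x_{i_k}|$ belongs to it if and only if $\bigcap_{l=0}^{k} (B_{r_{i_l}}(x_{i_l}) \cap V_{i_l}) \neq \emptyset$
✤ Nerve lemma: $\bigcup_{i=1}^{m} B_{r_i}(x_i)$ is homotopy equivalent to the alpha complex
✤ Changing the radii forms a filtration (parametrized by w)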
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Topology of Ovalbumin
[Figure: persistence diagrams PD1 (1st Betti number) and PD2 (2nd Betti number) of ovalbumin; axes: birth time vs. death time]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Compressibility
✤ 3-dim structure → functionality → softness → compressibility
✤ Compressibility can be measured by experiments (difficult) or quantified from persistence diagrams (holes)
✤ Select generators and fit parameters with the experimental compressibility
Denoising
[Figure: persistence diagram; axes: birth time vs. death time]
✤ Topological noise
✤ Non-robust topological features depend on the state of fluctuations
✤ The quantification should not depend on the state of fluctuations
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Holes with Sparse or Dense Boundary
✤ A hole with a sparse boundary can be deformed to a much larger extent than a hole with a dense boundary → greater compressibility
✤ Effective sparse holes
[Figure: van der Waals balls vs. enlarged balls; persistence diagram, birth time vs. death time]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
# of generators vs. compressibility
[Figure: scatter plot; x-axis: topological measurement Cp, y-axis: compressibility]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
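The main result is a linear correlation between the topological measurement Cp and the measured compressibility. A minimal Python sketch of an ordinary least-squares line with its Pearson correlation coefficient, assuming the Cp values and compressibilities are already available as two equal-length lists. How Cp is computed from the persistence diagrams is defined in the original paper and is not reproduced here, and `linear_fit` is a hypothetical name.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = a*x + b and Pearson correlation r of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx          # slope
    b = my - a * mx        # intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r
```

On synthetic points lying exactly on y = 2x + 1, the sketch recovers a = 2, b = 1 and r = 1; values of |r| close to 1 on real (Cp, compressibility) pairs would indicate the kind of linear relation the paper reports.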
Applications
• Persistence to Phylogenetic Trees
Protein Phylogenetic Tree
Persistence to Phylogenetic Trees
✤ A phylogenetic tree is defined by a distance matrix for a set of species (human, dog, frog, fish, …)
✤ The distance matrix is calculated by a score function based on the similarity of amino acid sequences
[Figure: amino acid sequences of fish, frog and human hemoglobin; distance matrix of hemoglobin for fish, frog, human, dog]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
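The slides do not say which method turns the distance matrix into a tree. As one common choice, here is a minimal Python sketch of UPGMA (average-linkage clustering), assuming a symmetric distance matrix given as nested lists; this method is an assumption, not necessarily what the authors used, and `upgma` is a hypothetical name.

```python
def upgma(labels, dist):
    """Rooted tree (nested tuples of labels) built by UPGMA from a distance matrix."""
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:  # average-linkage distance to the merged cluster
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

For three species where A and B are at distance 2 and both are at distance 6 from C, the sketch first joins A and B and then attaches C, returning `("C", ("A", "B"))`. The same function accepts a matrix of persistence-diagram distances instead of sequence scores.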
Persistence Distance and Classification of Proteins
✤ The score function based on amino acid sequences does not contain information about the 3-dim structure of proteins
✤ The Wasserstein distance (of degree p) on persistence diagrams (Cohen-Steiner, Edelsbrunner, Harer, and Mileyko, FCM, 2010),
$W_p(D, E) = \left( \inf_{\gamma: D \to E} \sum_{x \in D} \|x - \gamma(x)\|_\infty^p \right)^{1/p}$, where γ ranges over bijections and points may be matched to the diagonal,
reflects the similarity of the persistence diagrams (3-dim structures) of proteins
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
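For small diagrams the infimum over bijections can be evaluated by brute force. A minimal Python sketch, assuming finite diagrams given as lists of (birth, death) pairs with finite deaths; each point may be matched either to a point of the other diagram or to the diagonal, with the L∞ ground distance. `wasserstein` is a hypothetical name, and the factorial search is only practical for a handful of points; a real implementation would use an assignment solver (e.g. the Hungarian algorithm) instead.

```python
from itertools import permutations


def wasserstein(D1, D2, p=2):
    """Degree-p Wasserstein distance between two small finite persistence diagrams."""
    n, m = len(D1), len(D2)

    def to_diag(pt):
        return (pt[1] - pt[0]) / 2  # L-infinity distance to the diagonal

    def cost(i, j):
        # Rows: D1 points, then n.. for diagonal slots; columns: D2 points, then m...
        if i < n and j < m:
            return max(abs(D1[i][0] - D2[j][0]), abs(D1[i][1] - D2[j][1]))
        if i < n:
            return to_diag(D1[i])  # D1 point matched to the diagonal
        if j < m:
            return to_diag(D2[j])  # D2 point matched to the diagonal
        return 0.0  # diagonal matched to diagonal

    size = n + m
    best = min(sum(cost(i, perm[i]) ** p for i in range(size))
               for perm in permutations(range(size)))
    return best ** (1 / p)
```

Matching (0, 2) with (0, 3) directly costs 1, which is cheaper than sending both points to the diagonal (costs 1 and 1.5), so the distance between these one-point diagrams is 1.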
Persistence Distance and Classification of Proteins
[Figure: two persistence diagrams (birth time vs. death time) matched by a bijection; the Wasserstein distance is the cost of the optimal matching]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Distance between persistence diagrams
✤ Persistence of sublevel sets
✤ Stability Theorem (Cohen-Steiner et al., 2010)
[Figure: persistence diagram; axes: birth time vs. death time]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Phylogenetic Tree by Persistence
✤ Apply the distance on persistence diagrams to classify proteins
The persistence diagrams use the same noise band as in the computations of compressibility
[Figure: phylogenetic tree of the proteins 3DHT, 3D1A, 1QPW, 3LQD, 1FAW, 1C40, 2FZB]
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
Future work
✤ A principled way to de-noise fluctuations in persistence diagrams (NMR experiments)
✤ Finding minimum generators to identify specific regions in a protein (e.g., a region inducing high compressibility, hereditarily important regions)
✤ Zigzag persistence for robust topological features among a specific group of proteins (quiver representation)
✤ Multi-dimensional persistence (PID → Gröbner basis)
Slide source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
More applications in part … of the tutorial
✤ Robotics
✤ Computer vision
✤ Sensor networks
✤ Concurrency & databases
✤ Visualization
Prof. Robert Ghrist, Department of Mathematics, University of Pennsylvania: one of the pioneers in applications
Michael Farber, Edelsbrunner, Mischaikow, Gaucher, Bubenik, Zomorodian, Carlsson
Software
• Alpha complexes by CGAL: http://www.cgal.org/
• Persistence diagrams by Perseus (coded by Vidit Nanda): http://www.sas.upenn.edu/~vnanda/perseus/index.html
• CHomP project: http://chomp.rutgers.edu/Project.html
Reference link
✤ Original paper: http://www.sas.upenn.edu/~vnanda/source/compressibility-final.pdf
✤ Author slides: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
✤ Books (very good)
- (Japanese) タンパク質構造とトポロジー パーシステントホモロジー群入門, 平岡 裕章 (Protein Structure and Topology: An Introduction to Persistent Homology Groups, Yasuaki Hiraoka)
- (English) Computational Topology: An Introduction, Herbert Edelsbrunner, John L. Harer