Tutorial of Topological Data Analysis, Part 3 (Mapper Algorithm)

Tutorial of Topological Data Analysis, Part III - Mapper Algorithm. Tran Quoc Hoan (@k09ht, haduonght.wordpress.com/). Paper Alert 2016-04-15, Hasegawa lab., The University of Tokyo, Tokyo.

Upload: ha-phuong

Post on 16-Feb-2017

Page 1: Tutorial of topological data analysis part 3(Mapper algorithm)

Tutorial of Topological Data Analysis

Tran Quoc Hoan

@k09ht haduonght.wordpress.com/

Paper Alert 2016-04-15, Hasegawa lab., Tokyo

The University of Tokyo

Part III - Mapper Algorithm

Page 2: Tutorial of topological data analysis part 3(Mapper algorithm)

My TDA (Topological Data Analysis) road

Part I - Basic concepts & applications

Part II - Advanced TDA computation

Part III - Mapper Algorithm

Part IV - Software Roadmap

Part V - Applications in…

Part VI - Applications in…

He is following me

Page 3: Tutorial of topological data analysis part 3(Mapper algorithm)

Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/

Mapper Algorithm

Page 4: Tutorial of topological data analysis part 3(Mapper algorithm)

Basic motivation

Basic idea: Perform clustering at different “scales” and track how the clusters change as the scale varies

Motivation
• Coarser than manifold learning, but still works in nonlinear situations
• Extracts meaningful geometric information about the dataset
• Efficiently computable (for large datasets)

Reference: Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. G. Singh, F. Mémoli, G. E. Carlsson - SPBG, 2007

Page 5: Tutorial of topological data analysis part 3(Mapper algorithm)

Morse theory

Basic idea: Describe the topology of a smooth manifold M using level sets of a suitable function h: M -> R

• Recover M by looking at the sublevel sets $h^{-1}((-\infty, t])$ as t scans over the range of h

• The topology of the sublevel set changes only when t passes a critical value of h

Page 6: Tutorial of topological data analysis part 3(Mapper algorithm)

Reeb graphs

• For each t in R, contract each connected component of $f^{-1}(t)$ to a point

• The resulting structure is a graph
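
For reference, the contraction described on this slide is usually written as a quotient. The symbols $\sim$ and $R_f$ are the common textbook notation and are not taken from the slides, and X stands for the domain of f:

```latex
x \sim y \iff f(x) = f(y) \ \text{and $x$, $y$ lie in the same connected component of } f^{-1}(f(x)),
\qquad R_f = X / \sim
```

Each point of $R_f$ is one connected component of one level set, which is why the result is a graph when f is well behaved.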

Page 7: Tutorial of topological data analysis part 3(Mapper algorithm)

Mapper

The Mapper algorithm is a generalization of this procedure (Singh-Mémoli-Carlsson)

Input
✤ Filter (continuous) function f: X -> R
✤ Cover L of im(f) by open intervals $L_\alpha$

Method
✤ Cluster each inverse image $f^{-1}(L_\alpha)$ into its connected components -> connected cover V
✤ The Mapper is the nerve of V
• Clusters are vertices
• 1 k-simplex per (k+1)-fold intersection: $\bigcap_{i=0}^{k} V_i \neq \emptyset$, $V_0, \dots, V_k \in V$
✤ Color vertices according to the average value of f in the cluster
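
The nerve step can be sketched in a few lines. This is only an illustration under stated assumptions: the clusters are given as Python sets of point indices, and the function name `nerve` and its `max_dim` parameter are hypothetical, not part of any Mapper library.

```python
from itertools import combinations

def nerve(clusters, max_dim=2):
    """Return the simplices of the nerve of a cover, up to dimension max_dim.

    clusters: list of sets of point indices (the cover elements V_0, V_1, ...).
    A k-simplex (i_0, ..., i_k) is kept when the k+1 clusters share a point.
    """
    simplices = []
    for k in range(max_dim + 1):
        for idx in combinations(range(len(clusters)), k + 1):
            common = set.intersection(*(clusters[i] for i in idx))
            if common:
                simplices.append(idx)
    return simplices

# Toy cover (made-up data): four clusters that overlap pairwise in a cycle.
clusters = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}, {6, 7, 0}]
simplices = nerve(clusters)
vertices = [s for s in simplices if len(s) == 1]
edges = [s for s in simplices if len(s) == 2]
```

With this toy cover the nerve has four vertices and four edges and no triangle, so it is a loop. Enumerating all index combinations is exponential in `max_dim`; real implementations usually stop at edges.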

Page 8: Tutorial of topological data analysis part 3(Mapper algorithm)

Workflow - Illustration

Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/

f could be n-dimensional

Page 11: Tutorial of topological data analysis part 3(Mapper algorithm)

Mapper in practice

Input
- Point cloud P with metric dP
- Compute neighborhood graph G = (P, E)
✤ Filter (continuous) function f: P -> R
✤ Cover L of im(f) by open intervals $L_\alpha$

Method
✤ Cluster each inverse image $f^{-1}(L_\alpha)$ into its connected components in G -> connected cover V
✤ The Mapper is the nerve of V
• Clusters are vertices
• 1 k-simplex per (k+1)-fold intersection: $\bigcap_{i=0}^{k} V_i \neq \emptyset$, $V_0, \dots, V_k \in V$ (intersections materialized by data points)
✤ Color vertices according to the average value of f in the cluster
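
A minimal sketch of this point-cloud version, assuming Euclidean distance and a neighborhood graph that joins two points when they are at most `delta` apart. The function names, the toy circle, the intervals, and the value of `delta` are all illustrative choices, not taken from the slides.

```python
import math

def connected_components(points, indices, delta):
    """Split `indices` into connected components of the delta-neighborhood
    graph restricted to those points (edge when distance <= delta)."""
    remaining = set(indices)
    components = []
    while remaining:
        seed = remaining.pop()
        component, stack = {seed}, [seed]
        while stack:
            u = stack.pop()
            near = {v for v in remaining
                    if math.dist(points[u], points[v]) <= delta}
            remaining -= near
            component |= near
            stack.extend(near)
        components.append(component)
    return components

def mapper_graph(points, filter_values, intervals, delta):
    """Return (clusters, edges): clusters are sets of point indices, and an
    edge joins two clusters that share at least one data point."""
    clusters = []
    for lo, hi in intervals:
        preimage = [i for i, v in enumerate(filter_values) if lo < v < hi]
        clusters.extend(connected_components(points, preimage, delta))
    edges = [(a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))
             if clusters[a] & clusters[b]]
    return clusters, edges

# Toy data: 12 points on the unit circle, filtered by their x-coordinate.
points = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]
filter_values = [p[0] for p in points]
intervals = [(-1.1, -0.3), (-0.6, 0.6), (0.3, 1.1)]  # overlapping open intervals
clusters, edges = mapper_graph(points, filter_values, intervals, delta=0.6)
```

On this toy circle the middle interval splits into an upper and a lower arc, so the sketch yields four clusters joined in a cycle, which reflects the loop in the data. Only vertices and edges are built here; higher simplices are left out for brevity.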

Page 12: Tutorial of topological data analysis part 3(Mapper algorithm)

Mapper in practice

Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/

Page 15: Tutorial of topological data analysis part 3(Mapper algorithm)

Mapper in practice

Parameters
✤ Filter (continuous) function f: P -> R
✤ Cover L of im(f) by open intervals $L_\alpha$ (range scale)
✤ Neighborhood size δ (geometric scale)

Example: uniform cover L
• Resolution / granularity: r (diameter of intervals)
• Gain: g (percentage of overlap)
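
One way to build a uniform cover from r and g is sketched below. It assumes that r is the absolute length of each interval and that g is the fraction of that length shared by consecutive intervals. Other conventions exist (for example r as a fraction of the range, or as a number of intervals), and the function name `uniform_cover` is hypothetical.

```python
def uniform_cover(lo, hi, r, g):
    """Cover [lo, hi] with open intervals of length r whose consecutive
    members overlap on a fraction g of their length (0 < g < 1)."""
    if not (r > 0 and 0 < g < 1):
        raise ValueError("need r > 0 and 0 < g < 1")
    step = r * (1 - g)   # shift between consecutive left endpoints
    eps = 1e-9 * r       # small margin so lo itself lies inside an open interval
    intervals = []
    left = lo - eps
    while True:
        intervals.append((left, left + r))
        if left + r > hi:
            break
        left += step
    return intervals

# Example values chosen for illustration only.
cover = uniform_cover(0.0, 1.0, r=0.25, g=0.2)
```

With g = 0 the open intervals would only touch at their endpoints and miss those values, which is why the sketch requires g > 0.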

Page 16: Tutorial of topological data analysis part 3(Mapper algorithm)

Filter functions

Choice of filter function is essential

• Some kind of density measure
• A score measuring difference (distance) from some baseline
• An eccentricity measure

Statistics: Mean/Max/Min, Variance, n-Moment, Density, …

Machine Learning: PCA/SVD, Autoencoders, Isomap/MDS/t-SNE, SVM Distance, Error/Debugging Info, …

Geometry: Centrality, Curvature, Harmonic Cycles, …

Page 17: Tutorial of topological data analysis part 3(Mapper algorithm)

Filter functions

Eccentricity
- How close the point lies to the “center” of the point cloud.

Density
- How close the point is to the surrounding points.
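
Both filters can be computed directly from pairwise distances. The sketch below assumes Euclidean distance, uses the mean-distance form of eccentricity, and uses an unnormalized Gaussian kernel for density. The exponent `p`, the bandwidth `eps`, and the toy points are illustrative assumptions.

```python
import math

def eccentricity(points, p=1):
    """E_p(x) = (mean over y of d(x, y)^p)^(1/p); small values mark central points."""
    n = len(points)
    return [(sum(math.dist(x, y) ** p for y in points) / n) ** (1 / p)
            for x in points]

def gaussian_density(points, eps=1.0):
    """Unnormalized Gaussian kernel density: sum over y of exp(-d(x, y)^2 / eps)."""
    return [sum(math.exp(-math.dist(x, y) ** 2 / eps) for y in points)
            for x in points]

# Toy cloud (made-up data): a small cross around the origin plus one far outlier.
points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (5.0, 0.0)]
ecc = eccentricity(points)
dens = gaussian_density(points)
```

In this toy cloud the origin has the lowest eccentricity and the highest density, while the outlier at (5, 0) has the highest eccentricity and the lowest density. Both functions are quadratic in the number of points, so larger datasets would need a neighbor search structure or subsampling.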

Page 18: Tutorial of topological data analysis part 3(Mapper algorithm)

Mapper in applications

Extracting insights from the shape of complex data using topology, Lum et al., Scientific Reports, 2013

Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury, Nielson et al., Nature Communications, 2015

Using Topological Data Analysis for Diagnosis Pulmonary Embolism, Rucco et al., arXiv preprint, 2014

Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways, Yao et al., J. Chemical Physics, 2009

CD8 T-cell reactivity to islet antigens is unique to type 1 while CD4 T-cell reactivity exists in both type 1 and type 2 diabetes, Sarikonda et al., J. Autoimmunity, 2013

Innate and adaptive T cells in asthmatic patients: Relationship to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015

Page 19: Tutorial of topological data analysis part 3(Mapper algorithm)

Mapper in practice

1. Clustering

2. Feature selection

Page 20: Tutorial of topological data analysis part 3(Mapper algorithm)

Mapper in clustering

(1) Compute the Mapper (select parameters)

(2) Detect interesting topological substructures (“loops”, “flares”) (not easy; see Tutorial parts 1 + 2)

(3) Use the substructures to cluster the data

Page 21: Tutorial of topological data analysis part 3(Mapper algorithm)

Extracting insights from the shape of complex data using topology, Lum et al., Scientific Reports, 2013

House party representative grouping

Point: member of the House

f: 1st and 2nd SVD components, r = 120, g = 22%

PCA can show the Republican/Democrat clusters, but TDA gives more information

Page 22: Tutorial of topological data analysis part 3(Mapper algorithm)

Extracting insights from the shape of complex data using topology, Lum et al., Scientific Reports, 2013

Detect new clusters for NBA players

Page 23: Tutorial of topological data analysis part 3(Mapper algorithm)

Innate and adaptive T cells in asthmatic patients: Relationship to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015

The TDA used the 62 subjects with the most complete data.

f: 1st and 2nd SVD components, r = 120, g = 14%, equalized

Page 24: Tutorial of topological data analysis part 3(Mapper algorithm)

Mapper in feature selection

(1) Compute the Mapper (select parameters)

(2) Detect interesting topological substructures (“loops”, “flares”)

(3) Select features that best discriminate the data in the substructure: run a Kolmogorov-Smirnov test on (substructure) feature vs. (whole dataset) feature, and select the features with low p-value
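
Step (3) can be sketched with a two-sample Kolmogorov-Smirnov test written from scratch. The p-value below uses the standard asymptotic series with a small-sample correction, so it is only an approximation. In practice a statistics library routine (for example SciPy's `ks_2samp`) would normally be used instead. The function names and the toy table are hypothetical.

```python
import math
from bisect import bisect_right

def ks_two_sample(a, b):
    """Two-sample KS statistic D and an approximate (asymptotic) p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D = largest gap between the two empirical CDFs, checked at every data value.
    d = max(abs(bisect_right(a, v) / n - bisect_right(b, v) / m) for v in a + b)
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The series converges slowly here and its value is within 1e-4 of 1.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * (k * lam) ** 2)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

def rank_features(substructure_rows, all_rows, names):
    """Sort features by the p-value of (substructure) vs (whole dataset) values."""
    scores = []
    for j, name in enumerate(names):
        sub = [row[j] for row in substructure_rows]
        full = [row[j] for row in all_rows]
        scores.append((ks_two_sample(sub, full)[1], name))
    return sorted(scores)

# Toy table (made-up data): feature "x" is shifted inside the substructure,
# feature "y" has the same distribution inside and outside it.
all_rows = [(i, i % 5) for i in range(40)]
substructure_rows = all_rows[30:]
ranked = rank_features(substructure_rows, all_rows, ["x", "y"])
```

Features at the front of `ranked` are the ones whose distribution inside the substructure differs most from the whole dataset. When many features are tested, a multiple-testing correction would be a sensible addition.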

Page 25: Tutorial of topological data analysis part 3(Mapper algorithm)

Extracting insights from the shape of complex data using topology, Lum et al., Scientific Reports, 2013

Goal: detect factors that influence survival after therapy in breast cancer patients

Points: breast cancer patients who went through a specific therapy

PCA/single-linkage clustering cannot see this

f: eccentricity, r = 1/30, g = 33%

Page 26: Tutorial of topological data analysis part 3(Mapper algorithm)

Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury, Nielson et al., Nature Communications, 2015

Page 27: Tutorial of topological data analysis part 3(Mapper algorithm)

Select Parameters

Parameter r
• Small r -> fine cover (close to Reeb), sensitive to δ
• Large r -> rough cover, less sensitive to δ

Parameter g
• g ≈ 1 -> more points inside intersections, less sensitive to δ but far from Reeb
• g ≈ 0 -> controlled Mapper dimension, close to Reeb

Parameter δ
• Large δ -> fewer nodes, clean Mapper but far from Reeb (more straight lines)
• Small δ -> distinct topological structure but lots of nodes (noisy)

Parameter f
• Depends mostly on the dataset
• Typical choices: coordinate, density estimation, eccentricity, eigenvector

Page 28: Tutorial of topological data analysis part 3(Mapper algorithm)

Select Parameters

Example: P in $\mathbb{R}^2$ sampled from a known distribution

f = density estimator, r = 1/30, g = 20%

δ = percentage of the diameter of X

Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/

Page 29: Tutorial of topological data analysis part 3(Mapper algorithm)

Reference links

• INF563 Topological Data Analysis Course http://www.enseignement.polytechnique.fr/informatique/INF563/

• AYASDI http://www.ayasdi.com/

• …