Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

The Shape of Data. Allison Gilmore, Principal Data Scientist. November 13, 2015.

TRANSCRIPT

Page 1: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

The Shape of Data

Allison Gilmore

Principal Data Scientist

November 13, 2015

Page 2: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Company Confidential & Proprietary

Data has shape. Shape has meaning.

You already know this.

Page 3: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Shape as Organizing Principle

Page 4: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Geometry or Topology?

Geometry : Metric

Topology : Locality

[Figure: two pairs of shapes, each pair marked as equivalent (≅)]

Page 5: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Topological Summaries Capture Shape

[Figure: diagram with the label "Lens"]

Page 6: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Topological Summaries Capture Shape

Page 7: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Topological Summaries Capture Shape

Page 8: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Enhancing Traditional Methods

Page 9: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Topological Summaries Capture Shape

Nodes are groups of similar data points.

Edges connect similar nodes.

Node position on the screen does not matter.
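The three statements above can be made concrete with a small sketch. The slides do not spell out how the summary is built, so this assumes a Mapper-style construction: cover the range of a lens function with overlapping intervals, cluster the points that fall in each interval, and connect clusters that share points. The names `mapper_graph` and `single_linkage`, and the parameters `n_intervals`, `overlap`, and `eps`, are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def single_linkage(points, members, eps):
    """Split `members` (point indices) into groups chained by distances <= eps."""
    clusters = []
    unvisited = set(members)
    while unvisited:
        stack = [unvisited.pop()]
        cluster = set(stack)
        while stack:
            p = stack.pop()
            near = {q for q in unvisited if math.dist(points[p], points[q]) <= eps}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.5, eps=0.5):
    """Return (nodes, edges): nodes are sets of point indices, edges join overlapping nodes."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # fall back to 1.0 if the lens is constant
    pad = width * overlap / 2               # how far each interval reaches into its neighbours

    nodes = []
    for i in range(n_intervals):
        a = lo + i * width - pad
        b = lo + (i + 1) * width + pad
        members = [j for j, v in enumerate(values) if a <= v <= b]
        # Nodes are groups of similar data points within one slice of the lens.
        nodes.extend(single_linkage(points, members, eps))

    # Edges connect nodes that have at least one data point in common.
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


# 40 evenly spaced points on a unit circle, with the x-coordinate as the lens.
circle = [
    (math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
    for k in range(40)
]
nodes, edges = mapper_graph(circle, lens=lambda p: p[0], eps=0.4)
print(len(nodes), "nodes,", len(edges), "edges")
```

On this sample the graph is a single loop of six nodes, which mirrors the circle the points were drawn from. The output is only a list of nodes and edges with no coordinates attached, which is why node position on the screen carries no meaning.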

Page 10: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Enhancing Traditional Methods

PCA sees 3 clusters.

Using PCA coordinates as lenses, we can see more.
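As a rough illustration of what "PCA coordinates as lenses" could mean in practice, the sketch below computes each row's coordinate along the first principal component, so that the resulting numbers can serve as lens values. The slides do not say how the coordinates were computed; the function name `first_pc_scores` and the use of power iteration are assumptions made to keep the example free of third-party libraries.

```python
import math


def first_pc_scores(rows, iterations=200):
    """Coordinate of each row along the first principal component.

    Uses power iteration on the sample covariance matrix. Assumes the data is
    not constant and that the all-ones start vector is not orthogonal to the
    leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]

    # Power iteration: repeatedly apply cov and renormalise.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Project each centered row onto the leading direction.
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


# Four points on the line y = 2x; the first PC runs along that line.
rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
scores = first_pc_scores(rows)
print([round(s, 3) for s in scores])
```

Each score is a single number per data point, which is the form a lens takes: a function from data points to real values that can then be sliced into overlapping ranges.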

Page 11: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Topological Summary Shows 4 Clusters

Page 12: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Disease State & Model Choice

David Schneider, Stanford Microbiology and Immunology

Page 13: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Topological Model for Total Knee Replacement

Low length of stay

Low to moderate length of stay

Long length of stay

Page 14: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Carepaths for Total Knee Replacement

Page 15: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Beating* the Curse of Dimensionality

* I mean, there are always conditions.

Niyogi, Smale, and Weinberger, A Topological View of Unsupervised Learning from Noisy Data, SIAM J. on Computing 40 (2011) 646-663. http://math.uchicago.edu/~shmuel/noise.pdf

If a dataset is supported near a manifold, its key topological features can be detected from a sample whose size is independent of the dimension of the ambient space.

Doesn’t matter!

Dimension d < N
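A toy experiment can illustrate the claim (it is not a proof, and it does not reproduce the paper's conditions). The sketch places the same 120-point sample of two circles in ambient spaces of dimension 3 and 50, adds a little noise to every coordinate, and counts connected components, the simplest topological feature. The helper names `embed`, `count_components`, and `two_circles`, along with the noise level and `eps`, are assumptions picked for this example.

```python
import math
import random


def two_circles(n_per_circle=60):
    """Evenly spaced points on two unit circles centered at (0, 0) and (5, 0)."""
    pts = []
    for cx in (0.0, 5.0):
        for k in range(n_per_circle):
            t = 2 * math.pi * k / n_per_circle
            pts.append((cx + math.cos(t), math.sin(t)))
    return pts


def embed(points2d, ambient_dim, noise=0.01, seed=0):
    """Place 2-D points in R^ambient_dim and add Gaussian noise to every coordinate."""
    rng = random.Random(seed)
    out = []
    for x, y in points2d:
        base = [x, y] + [0.0] * (ambient_dim - 2)
        out.append([c + rng.gauss(0.0, noise) for c in base])
    return out


def count_components(points, eps):
    """Connected components of the graph linking points at distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})


sample = two_circles()
counts = {n: count_components(embed(sample, n), eps=0.4) for n in (3, 50)}
print(counts)  # {3: 2, 50: 2}
```

The sample size stays fixed at 120 points while the ambient dimension N grows, and the two components are still recovered. What governs the required sample size is the geometry of the underlying manifold, which is where the footnoted conditions come in.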

Page 16: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

[email protected]

www.ayasdi.com

Page 17: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Understanding Shape Improves Models

[Figure: two panels, "Ground Truth Fraud" and "Model Predicted Fraud", each with a scale running from Low to High]

Page 18: Allison Gilmore, Data Scientist, Ayasdi at MLconf SF - 11/13/15

Topology Guides Model Creation