![Slide 1](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/1.jpg)
The Shape of Data: Machine Learning and Topology
Minnesota Developers Conference 2018
Kaisa Taipale
![Slide 2](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/2.jpg)
Disclosures
• Grew up in pure math
• I hack through code like someone lost in the Amazon with only a machete (it’s not pretty, but it gets me someplace!)
• Reproducibility matters to me because, when I work with students, non-reproducible code causes me personal pain
• Finance is interesting to me as a dynamical system and as an exploration of sociology
![Slide 3](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/3.jpg)
The plan:
• First, a high-level look at the context of applications of topology
• Next, geek out on pure math
• Last, put it into practice: packages and tools
![Slide 4](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/4.jpg)
Math finance, statistics…
(Figure: Amazon, "I choose daily data and fit the distributions of it." 1. Cauchy, 2. Norm)
Statistics: linear regression, p-values, distributions
Financial math: Black-Scholes, binomial trees, time series, stochastic modeling
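The slide mentions fitting distributions to daily Amazon data, with Cauchy and normal as candidates. A minimal sketch of that comparison follows, assuming synthetic heavy-tailed "returns" in place of real price data (none of the talk's data is reproduced). The normal is fit by sample mean and standard deviation; the Cauchy uses quantile-based estimates (median for location, half the interquartile range for scale), which are consistent but are not the maximum-likelihood fit. The two are compared by average log-likelihood.

```python
import math
import random
import statistics

def normal_loglik(xs, mu, sigma):
    """Average log-density of xs under Normal(mu, sigma)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
               for x in xs) / len(xs)

def cauchy_loglik(xs, x0, gamma):
    """Average log-density of xs under Cauchy(x0, gamma)."""
    return sum(-math.log(math.pi * gamma * (1 + ((x - x0) / gamma) ** 2)) for x in xs) / len(xs)

def fit_normal(xs):
    # Maximum-likelihood estimates: sample mean and population standard deviation.
    return statistics.fmean(xs), statistics.pstdev(xs)

def fit_cauchy_quantiles(xs):
    # Quantile-based estimates (NOT maximum likelihood): median and half the IQR.
    q1, q2, q3 = statistics.quantiles(xs, n=4)
    return q2, (q3 - q1) / 2

# Synthetic heavy-tailed "daily returns": Cauchy(0, 0.01) drawn via the inverse CDF.
rng = random.Random(42)
returns = [0.01 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(2000)]

mu, sigma = fit_normal(returns)
x0, gamma = fit_cauchy_quantiles(returns)
ll_norm = normal_loglik(returns, mu, sigma)
ll_cauchy = cauchy_loglik(returns, x0, gamma)
print(f"normal: {ll_norm:.3f}  cauchy: {ll_cauchy:.3f}")
```

On heavy-tailed data the handful of extreme days inflates the normal's standard deviation, so the Cauchy fit typically scores far higher; on real returns the comparison should be repeated with proper maximum-likelihood fits.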
![Slide 5](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/5.jpg)
….machine learning… topology?
Machine learning: Neural networks, clustering, manifold learning
Topology?! The shape of data, used for feature discovery, for interpolation between clusters, and for manifold learning
![Slide 6](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/6.jpg)
Finding use in diabetes research…
Identification of type 2 diabetes subgroups through topological analysis of patient similarity
Li Li, Wei-Yi Cheng, Benjamin S. Glicksberg, Omri Gottesman, Ronald Tamler, Rong Chen, Erwin P. Bottinger, and Joel T. Dudley
![Slide 7](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/7.jpg)
What’s topology?
![Slide 8](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/8.jpg)
Simplicial complexes
Image by cflm (talk), own work, derived from File:Simplicial complex example.png by Trevorgoodchild (en.wp), released under PD-self, coloured using Inkscape. Public Domain, https://commons.wikimedia.org/w/index.php?curid=7937755
![Slide 9](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/9.jpg)
What are simplices?
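A k-simplex is spanned by k+1 vertices (a point, an edge, a filled triangle, a tetrahedron, …), and a simplicial complex is a collection of simplices closed under taking faces: every nonempty subset of a simplex in the complex is also in it. A minimal sketch of that data structure; the function names are illustrative, not from any library.

```python
from itertools import combinations

def faces(simplex):
    """All nonempty faces of a simplex, including the simplex itself."""
    verts = sorted(simplex)
    return {frozenset(c) for k in range(1, len(verts) + 1) for c in combinations(verts, k)}

def closure(maximal_simplices):
    """Smallest simplicial complex containing the given simplices."""
    complex_ = set()
    for s in maximal_simplices:
        complex_ |= faces(s)
    return complex_

def f_vector(complex_):
    """Number of simplices in each dimension (dimension = number of vertices - 1)."""
    dim = max(len(s) for s in complex_) - 1
    counts = [0] * (dim + 1)
    for s in complex_:
        counts[len(s) - 1] += 1
    return counts

# A filled triangle {a, b, c} plus a dangling edge {c, d}.
K = closure([{"a", "b", "c"}, {"c", "d"}])
print(f_vector(K))  # [4, 4, 1]
```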
![Slide 10](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/10.jpg)
Persistent homology
Persistent homology: what topological features persist as we vary the cutoff parameter?
(Image from BARCODES: THE PERSISTENT TOPOLOGY OF DATA by Robert Ghrist)
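In dimension 0 (connected components) persistence can be computed with a union-find structure: sort all pairwise edges by length, and each time an edge merges two components, one component "dies" at that length. The sketch below assumes a Vietoris-Rips-style filtration on a finite point cloud with Euclidean distance; every point is born at 0, and the one component that never dies is reported with death `math.inf`. Higher-dimensional bars (loops, voids) need a full boundary-matrix reduction, which is what packages such as R's TDA or Julia's Eirene compute.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """0-dimensional persistence pairs (birth, death) for a point cloud."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Every pairwise edge, sorted by length: the 1-skeleton of the filtration.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))  # one component dies when two merge
    bars.append((0.0, math.inf))  # the last surviving component never dies
    return bars

# Two tight clusters far apart: short bars inside clusters, one long bar between them.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(h0_barcode(pts))  # [(0.0, 1.0), (0.0, 1.0), (0.0, 10.0), (0.0, inf)]
```

The long bar (0, 10) is the "persistent" feature: the two clusters stay separate over a wide range of cutoffs, while the short bars are noise-scale merges.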
![Slide 11](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/11.jpg)
Betti numbers: counting holes in each dimension
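The k-th Betti number counts independent k-dimensional holes, not simplices: β0 counts connected components, β1 counts loops, β2 counts enclosed voids. Working over the two-element field GF(2), β_k = (number of k-simplices) − rank ∂_k − rank ∂_{k+1}, where ∂_k is the boundary matrix sending each k-simplex to its (k−1)-faces. A minimal sketch; a hollow triangle shows the difference from simplex counting, since it has 3 vertices and 3 edges yet β0 = 1 and β1 = 1.

```python
from itertools import combinations

def boundary_rank_gf2(k_simplices, km1_simplices):
    """Rank over GF(2) of the boundary map from k-simplices to (k-1)-simplices."""
    index = {s: i for i, s in enumerate(km1_simplices)}
    rows = []
    for s in k_simplices:
        mask = 0  # bitmask of the (k-1)-faces of this k-simplex
        for face in combinations(s, len(s) - 1):
            mask |= 1 << index[frozenset(face)]
        rows.append(mask)
    # Gaussian elimination over GF(2): XOR rows keyed by their leading bit.
    rank, pivots = 0, {}
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead in pivots:
                r ^= pivots[lead]
            else:
                pivots[lead] = r
                rank += 1
                break
    return rank

def betti_numbers(simplices):
    """Betti numbers over GF(2) of a face-closed set of (distinct) simplices."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(frozenset(s))
    top = max(by_dim)
    ranks = {k: boundary_rank_gf2(by_dim[k], by_dim[k - 1]) for k in range(1, top + 1)}
    return [len(by_dim[k]) - ranks.get(k, 0) - ranks.get(k + 1, 0) for k in range(top + 1)]

# Hollow triangle: three vertices, three edges, no 2-simplex.
hollow = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
print(betti_numbers(hollow))                       # [1, 1]
print(betti_numbers(hollow + [{"a", "b", "c"}]))   # [1, 0, 0]: filling it kills the loop
```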
![Slide 13](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/13.jpg)
Big topology ideas
I’ll talk about four major ideas from the TDA toolbox and then apply them to finance examples.
• Superlevel sets (look at the graph restricted to δ > c)
• Persistent homology (look at shapes that persist as δ varies)
• The Mapper algorithm for visualization (slice and cluster, then build a simpler graph)
• Betti numbers (counting independent holes in each dimension: components, loops, voids)
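The Mapper recipe (slice and cluster, then build a simpler graph) can be sketched in a few steps: choose a lens (filter) function, cover its range with overlapping intervals, cluster the points whose lens values fall in each interval, and connect two clusters whenever they share points. This is a simplified illustration, not the TDAmapper or Kepler-mapper implementation; the lens, the number of intervals, the overlap, and single-linkage clustering with a fixed threshold `eps` are all assumptions chosen for the example.

```python
import math
from itertools import combinations

def single_linkage_clusters(idx, points, eps):
    """Split point indices into groups linked by chains of distance <= eps."""
    clusters, unvisited = [], set(idx)
    while unvisited:
        stack = [unvisited.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            comp |= near
            stack.extend(near)
        clusters.append(frozenset(comp))
    return clusters

def mapper(points, lens, n_intervals=4, overlap=0.25, eps=1.5):
    """Return (nodes, edges): nodes are clusters, edges join clusters that share points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Interval k, widened on both sides by the overlap fraction.
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        idx = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage_clusters(idx, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges

# 40 points on a circle of radius 5, with the x-coordinate as the lens.
circle = [(5 * math.cos(2 * math.pi * t / 40), 5 * math.sin(2 * math.pi * t / 40))
          for t in range(40)]
nodes, edges = mapper(circle, lens=lambda p: p[0])
print(len(nodes), len(edges))  # 6 6
```

The middle intervals each contain two separate arcs (top and bottom of the circle), so the Mapper graph is a 6-cycle: the simplified graph keeps the circle's one loop.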
![Slide 14](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/14.jpg)
Tools
• Python: Kepler-mapper, moguTDA, …
• R: TDA, TDAmapper, igraph, networkD3…
• Docker, Git, etc.
• Gigantum?
• Julia: Eirene (fast!)
![Slide 15](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/15.jpg)
How do we use topology on data?
• Data generally comes as a point cloud (a set of points), ideally in a CSV file.
• Load the data
• Specify a metric (a way to quantify “nearness”) – this is a choice
• Build simplices on the data set
• Analyze the topology of the simplicial complexes
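The workflow above (load a point cloud, choose a metric, build simplices) can be sketched with the standard library alone. This example assumes a CSV with one point per row, numeric columns only, and no header; the Euclidean metric and the cutoff `r` are illustrative choices. It builds a Vietoris-Rips complex: a set of points spans a simplex whenever all of its pairwise distances are at most `r`, truncated here at dimension 2 (triangles).

```python
import csv
import io
import math
from itertools import combinations

def load_points(f):
    """Read a point cloud: one point per CSV row, all columns numeric, no header."""
    return [tuple(float(x) for x in row) for row in csv.reader(f) if row]

def rips_complex(points, r, metric=math.dist, max_dim=2):
    """Vietoris-Rips complex: simplices whose vertices are pairwise within r."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2) if metric(points[i], points[j]) <= r}
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):  # k vertices -> a (k-1)-simplex
        for s in combinations(range(n), k):
            if all(pair in close for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices

# In practice f would be open("points.csv") (a hypothetical file); an in-memory
# file keeps this example self-contained.
data = io.StringIO("0,0\n1,0\n0,1\n5,5\n")
pts = load_points(data)
K = rips_complex(pts, r=1.5)
print(K)  # [(0,), (1,), (2,), (3,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
```

Changing `metric` is where the "this is a choice" step lives: any function of two points returning a nonnegative number can be passed in.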
![Slide 16](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/16.jpg)
Dow Jones stocks: correlation network in R
Transform correlations and build correlation networks. First, with all edges:
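The slide does not say which transform of correlation is used. A common choice, assumed here, is the distance d = sqrt(2(1 − ρ)), which is 0 for perfectly correlated series and 2 for perfectly anti-correlated ones. A minimal sketch that computes pairwise Pearson correlations of return series and stores the complete weighted network (all edges); the tickers and returns are made up for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_network(returns):
    """Complete graph: edge (u, v) -> (correlation, distance sqrt(2 * (1 - rho)))."""
    edges = {}
    for u, v in combinations(sorted(returns), 2):
        rho = pearson(returns[u], returns[v])
        # max() guards against rho creeping slightly above 1 from rounding.
        edges[(u, v)] = (rho, math.sqrt(max(0.0, 2 * (1 - rho))))
    return edges

# Hypothetical daily returns for three made-up tickers.
returns = {
    "AAA": [0.010, -0.020, 0.015, 0.003, -0.007],
    "BBB": [0.012, -0.018, 0.014, 0.001, -0.006],
    "CCC": [-0.010, 0.020, -0.015, -0.003, 0.007],
}
net = correlation_network(returns)
for edge, (rho, dist) in net.items():
    print(edge, round(rho, 3), round(dist, 3))
```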
![Slide 17](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/17.jpg)
Dow Jones stocks: superlevel sets in R
![Slide 18](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/18.jpg)
Dow Jones stocks: superlevel sets in R
![Slide 19](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/19.jpg)
Dow Jones stocks: superlevel sets in R
![Slide 20](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/20.jpg)
Dow Jones stocks: superlevel sets in R
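A superlevel set of a correlation network keeps only the edges whose correlation exceeds a cutoff c. Raising c prunes weak links, and the network falls apart into groups of strongly co-moving stocks. A minimal sketch that, for a few cutoffs, keeps edges with ρ > c and counts connected components; the stock names and correlation values are invented for illustration.

```python
def superlevel_graph(corr, c):
    """Edges whose correlation exceeds the cutoff c."""
    return {e for e, rho in corr.items() if rho > c}

def count_components(nodes, edges):
    """Number of connected components, via union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in nodes})

# Hypothetical pairwise correlations between four stocks.
corr = {("A", "B"): 0.9, ("C", "D"): 0.8, ("B", "C"): 0.4, ("A", "D"): 0.1}
nodes = {"A", "B", "C", "D"}
for c in (0.0, 0.5, 0.85, 0.95):
    print(c, count_components(nodes, superlevel_graph(corr, c)))
```

Tracking how the component count changes as c sweeps from low to high is exactly the information that persistent homology records in dimension 0.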
![Slide 21](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/21.jpg)
Dow Jones stocks: persistent homology
![Slide 22](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/22.jpg)
Dow Jones stocks: persistent homology
![Slide 23](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/23.jpg)
Dow Jones stocks: persistent homology
![Slide 24](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/24.jpg)
Dow Jones stocks: persistent homology
![Slide 25](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/25.jpg)
Dow Jones: Betti numbers (Python)
![Slide 26](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/26.jpg)
Dow Jones Betti numbers today (Python)
The bump in the Betti numbers is in January 2018.
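The slides do not show how the Betti-number time series was produced. One possible approach, assumed here, is to slide a window over the returns, build the thresholded correlation graph in each window, and record its Betti numbers. Treating the graph as a 1-dimensional complex gives a closed form: β0 is the number of connected components and β1 = E − V + β0, the number of independent cycles (so a triangle of edges counts as a loop; in a Rips complex it would be filled in). The window length, the cutoff, and all helper names are illustrative.

```python
import math
from itertools import combinations

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def graph_betti(nodes, edges):
    """(beta0, beta1) of a graph: connected components and independent cycles."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    b0 = len({find(v) for v in nodes})
    return b0, len(edges) - len(nodes) + b0

def rolling_betti(returns, window, cutoff):
    """Betti numbers of the graph with edges rho > cutoff, one pair per window."""
    names = sorted(returns)
    n_days = len(returns[names[0]])
    series = []
    for start in range(n_days - window + 1):
        win = {k: returns[k][start:start + window] for k in names}
        edges = [(u, v) for u, v in combinations(names, 2) if corr(win[u], win[v]) > cutoff]
        series.append(graph_betti(names, edges))
    return series

# Made-up series: C moves with A and B early on, then against them.
rets = {"A": [1, 2, 3, 4, 5, 6], "B": [2, 4, 6, 8, 10, 12], "C": [1, 2, 3, 3, 2, 1]}
print(rolling_betti(rets, window=3, cutoff=0.5))  # [(1, 1), (1, 1), (2, 0), (2, 0)]
```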
![Slide 27](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/27.jpg)
S&P 500: data issues
• Stocks enter and leave the S&P 500 based on company characteristics
• By definition, the S&P represents only a particular view of the equities market!
• Problems with data continuity: mergers, bankruptcies, etc.
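One simple (and lossy) way around these continuity problems is to restrict each analysis window to tickers that have a price on every date, and only then compute returns. A minimal sketch, assuming prices arrive as a hypothetical mapping `ticker -> {date: price}` with made-up tickers and values. Note that it silently drops stocks that entered, left, merged, or went bankrupt mid-window, which itself biases the sample toward survivors.

```python
import math

def aligned_log_returns(prices, dates):
    """Log returns for tickers that have a price on every date in `dates` (sorted)."""
    out = {}
    for ticker, series in prices.items():
        if all(d in series for d in dates):
            p = [series[d] for d in dates]
            out[ticker] = [math.log(b / a) for a, b in zip(p, p[1:])]
    return out

# Hypothetical prices: "NEW" joined the index partway through the window.
dates = ["2018-01-02", "2018-01-03", "2018-01-04"]
prices = {
    "OLD": {"2018-01-02": 100.0, "2018-01-03": 101.0, "2018-01-04": 99.0},
    "NEW": {"2018-01-03": 50.0, "2018-01-04": 51.0},
}
rets = aligned_log_returns(prices, dates)
print(sorted(rets))  # ['OLD']
```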
![Slide 28](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/28.jpg)
S&P 500: superlevel sets
Just a mess!!! Very hard to interpret.
![Slide 29](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/29.jpg)
S&P 500: singular value decomposition
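A singular value decomposition of the (days × stocks) return matrix offers a compact summary when the full network is too tangled to read: after centering each column, the squared singular values give the share of variance carried by each component, and the right singular vectors give each stock's loadings. A sketch with NumPy, using simulated returns driven by one common factor; the data and the two-component projection are assumptions for illustration, since the slide does not show its data or code.

```python
import numpy as np

def svd_summary(returns):
    """SVD of a centered (days x stocks) return matrix.

    Returns the fraction of variance per component and each stock's
    loadings on the first two right singular vectors.
    """
    X = returns - returns.mean(axis=0)  # center each stock's returns
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return explained, Vt[:2].T  # loadings have shape (n_stocks, 2)

# Simulated returns: one common "market" factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
market = rng.normal(0, 0.01, size=(250, 1))
betas = rng.uniform(0.5, 1.5, size=(1, 20))
returns = market @ betas + rng.normal(0, 0.002, size=(250, 20))

explained, loadings = svd_summary(returns)
print(explained[:3].round(3), loadings.shape)
```

With one dominant factor the first component carries most of the variance; the loadings matrix can then be plotted or fed to Mapper as a low-dimensional lens.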
![Slide 30](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/30.jpg)
S&P 500: Mapper (igraph in R)
![Slide 31](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/31.jpg)
S&P 500: Mapper (networkD3 in R)
![Slide 32](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/32.jpg)
Thank you!
• Thank you, Minnesota Developers Conference audience and organizers!
• Thank you to the MCFAM summer seminar participants: Jialing Cai, Yunpeng Liu, Ayman Ahmed, John Burbidge, Fiona Jiang, Ziqi Dong, John Nguyen, Zhongwu Wang, Ayush Bansal, Ameya Phadke, Ziran Xu, Heng, Qinzheng Xu, Yin Xu, Doreen Vescelius, Yifan Xu, Bo Zhu, Jianfeng Liu….
![Slide 33](https://reader033.vdocuments.site/reader033/viewer/2022050406/5f838a6b5b0c2d57dc5bee10/html5/thumbnails/33.jpg)
References
News/pop science:
• https://www.technologyreview.com/s/602234/how-the-mathematics-of-algebraic-topology-is-revolutionizing-brain-science/
• https://www.wired.com/story/the-mind-boggling-math-that-maybe-mapped-the-brain-in-11-dimensions/
Academic:
• A talk with some different applications (mode discovery, image analysis): http://www.sci.utah.edu/~beiwang/acmbcbworkshop2016/slides/ChaoChen.pdf
• Robert Ghrist has written great mathy notes! https://www.math.upenn.edu/~ghrist/
• Kathryn Hess is a mathematician working on neuroscience problems – mentioned in Wired article above – see technical talk at https://www.youtube.com/watch?v=vD27zKxoio0&index=6&list=PL4kY-dS_mSmJ4DU2OmOUWB8QIN5nG0CMv
• Marian Gidea, some of the first public applications to finance: https://arxiv.org/abs/1701.06081
• Some of the founders of the field are Gunnar Carlsson, Gurjeet Singh, Afra Zomorodian – look for papers with their names.
Follow me if you want to see upcoming applications to finance!
• http://www.kaistataipale.net/blog or http://www-users.math.umn.edu/~taipale/