“What’s topology got to do with data analysis?”
MATH 4910 & 5910 Topological Data Analysis
Instructor: Mehmet Aktas
January 9, 2018
1 / 26 Instructor: Mehmet Aktas MATH 4910 & 5910 Topological Data Analysis
Topology is “Pure” math!
http://xkcd.com/435/
Credit
This presentation is inspired by the following:
Robert Ghrist, Barcodes: The Persistent Topology of Data, 2008
Lecture Notes of Sara Kalisnik Verovsek
Outline
1 What is Topology?
2 What is data?
3 Topological Data Analysis
What is Topology?
Topology
A pure branch of mathematics that dates back to the 1700s.
Euler in Konigsberg
Konigsberg was a city in Prussia situated on the Pregel river (modern-day Kaliningrad, a major industrial center of western Russia). Seven bridges spanned the various branches of the river, as depicted in the picture.
Is it possible to cross all seven bridges exactly once and return to the starting point in a single stroll?
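The question can be checked mechanically. The sketch below assumes the usual modeling of the problem: the four land masses become vertices and the seven bridges become edges of a multigraph. The labels A to D and the function names are illustrative and do not come from the slides. It relies on Euler's criterion: a connected multigraph admits a closed walk that uses every edge exactly once only if every vertex has even degree.

```python
from collections import Counter

# Hypothetical labels: A is the central island, B and C are the two river
# banks, D is the eastern land mass. Each pair stands for one bridge.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def has_closed_tour(edges):
    """Euler's criterion for a connected multigraph: every degree is even."""
    return all(d % 2 == 0 for d in degrees(edges).values())

print(degrees(BRIDGES))
print(has_closed_tour(BRIDGES))
```

Every land mass touches an odd number of bridges, so the stroll asked for above does not exist.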
Why Topology?
Three key ideas:
1 Invariance under deformation
2 Coordinate freeness
3 Compressed representations
What is data?
Data is Big!
Big Data
Data is everywhere.
Data Growth
An article by Forbes states that
Data is growing faster than ever before. By the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.
What do data scientists do?
Make discoveries while swimming in data
The statistics reflect the significant and growing demand for data scientists.
Data mining tops LinkedIn's list of the “hottest skills of 2014”
Best Job in USA for 2016
3,433: Number of Job Openings in 2016
#16 Highest Paying Job in Demand in 2016
Average Base Salary: $105,395
What is Data science/mining?
Non-trivial extraction of implicit, previously unknown, and potentially useful information from data
Storing, organizing and integrating huge amounts of unstructured data
a.k.a. KDD (knowledge discovery in databases)
Types of Data
Time-series data,
Sequence data,
Graphs, social networks,
Multimedia, WWW data,
Text data.
Applications of Data Science
Internet search
Recommender systems
Biological Classification
...
Data Mining Tasks
1 Prediction Methods (Supervised Learning): Predict unknown or future values of the data using other known data
Classification: Is this A or B?
Anomaly detection: Is this weird?
Regression: How much? How many?
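The three questions above can each be illustrated with a few lines of code. This is only a sketch on made-up numbers. The specific techniques (nearest neighbor, a least-squares line, a standard-deviation threshold) are simple stand-ins chosen for brevity, and all function names are hypothetical.

```python
from statistics import mean, pstdev

def classify_1nn(train, x):
    """Classification: return the label of the closest known value."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(xs, ys):
    """Regression: least-squares slope a and intercept b for y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return a, my - a * mx

def is_anomaly(values, x, k=3.0):
    """Anomaly detection: flag x if it is more than k standard deviations from the mean."""
    return abs(x - mean(values)) > k * pstdev(values)

# Made-up toy data, for illustration only.
labeled = [(1.0, "A"), (1.2, "A"), (3.9, "B"), (4.1, "B")]
print(classify_1nn(labeled, 1.1))            # is this A or B?
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # how much?
print(is_anomaly([10, 11, 9, 10, 10], 25))   # is this weird?
```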
Data Mining Tasks - Continued
2 Description Methods (Unsupervised Learning): Find human-interpretable (previously unknown) patterns that describe the (unlabeled) data
Clustering: How is data organized?
Association rule mining: Are these related?
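As a small illustration of the "Are these related?" question, the sketch below computes two standard measures used in association rule mining, support and confidence. The measures, the function names, and the shopping baskets are not from the slides and are assumptions introduced here for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up shopping baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
```

Here bread and milk appear together in half of the baskets, and two of the three baskets that contain bread also contain milk.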
What is data?
A collection of data objects and their attributes
Simple case: an n × d matrix
n objects with d dimensions each; the d columns are called variables, features, or attributes of the objects
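A minimal sketch of the n × d layout, using plain Python lists. The attribute names and the numbers are made up for illustration.

```python
# Made-up data: n = 3 objects (rows), d = 2 attributes (columns).
attributes = ["height_cm", "weight_kg"]  # hypothetical attribute names
data = [
    [170.0, 65.0],
    [182.0, 80.0],
    [158.0, 52.0],
]

n = len(data)      # number of objects
d = len(data[0])   # number of attributes per object

def column(matrix, j):
    """Return the values of attribute j across all objects."""
    return [row[j] for row in matrix]

print(n, d)
print(column(data, attributes.index("weight_kg")))
```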
Topological Data Analysis
Data
In recent years, data has become complex:
It is “Big Data”.
It also has very rich features.
Usually both!
The problem in both cases is that there is not a single story happening in your data.
Topological Data Analysis
Topological Data Analysis (TDA) is a tool that filters out the irrelevant stories to get at something interesting.
TDA has applications in
Biology
Medical Sciences
Science of Voting
Music
Sports
...
Basic Idea in TDA
Data has “shape”,
shape has “meaning”,
meaning drives “values”.
Data has “shape”
Convert the data into a graph (more generally, a simplicial complex)
Use the data points as the vertices of the graph
To locate edges:
Choose a radius r and draw circles centered at the data points
Place an edge between each pair of points whose circles intersect
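The construction above can be sketched in a few lines. Two circles of radius r intersect exactly when their centers are at most 2r apart, so the edge test reduces to a distance comparison. The function name `radius_graph` and the sample points are illustrative assumptions.

```python
from itertools import combinations
from math import dist

def radius_graph(points, r):
    """Connect points i and j when the circles of radius r around them
    intersect, that is, when their distance is at most 2 * r."""
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist(points[i], points[j]) <= 2 * r
    ]

# Made-up points: two tight pairs that are far from each other.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
print(radius_graph(points, 0.5))  # small radius: two separate pairs
print(radius_graph(points, 2.0))  # larger radius: one connected chain
```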
Which radius?
How many groups are there in this data?
Shape has “meaning” - Persistent Homology
Persistent homology tracks the evolution of the topological features of data across scales.
Betti numbers describe these topological features quantitatively: β0 is the number of connected components and β1 is the number of holes.
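For a graph, both Betti numbers can be computed directly: β0 is the number of connected components, and β1 follows from the Euler characteristic, V - E = β0 - β1. The sketch below makes a simplifying assumption: it treats the radius graph as a 1-dimensional complex and does not fill in triangles, which a full simplicial complex would do. The function name and the sample points are illustrative.

```python
from itertools import combinations
from math import cos, dist, pi, sin

def betti_numbers_of_graph(points, r):
    """Return (beta0, beta1) of the radius-r graph, viewed as a 1-dimensional complex."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = 0
    components = n
    for i, j in combinations(range(n), 2):
        if dist(points[i], points[j]) <= 2 * r:
            edges += 1
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
    # Euler characteristic of a graph: V - E = beta0 - beta1.
    return components, edges - n + components

# Made-up data: 8 points evenly spaced on the unit circle.
circle = [(cos(2 * pi * k / 8), sin(2 * pi * k / 8)) for k in range(8)]
for r in (0.1, 0.4):
    print(r, betti_numbers_of_graph(circle, r))
```

At the small radius the points are isolated (eight components, no hole). At the larger radius neighboring points connect into a single loop (one component, one hole).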
Persistence Barcodes
Barcodes offer a good balance between encoding rich shape information and not being computationally intensive.
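To make the idea of a barcode concrete, the sketch below computes only the bars for connected components (the β0 part), using the radius convention of the earlier slides. Every point starts as its own component at radius 0, and a bar ends at the radius where its component merges into another. Bars for holes need a full persistent homology computation and are not covered. The function name `h0_barcode` and the sample points are assumptions.

```python
from itertools import combinations
from math import dist, inf

def h0_barcode(points):
    """Return the (birth, death) bars of connected components as the radius grows."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two circles of radius r first touch at r = distance / 2.
    edges = sorted(
        (dist(points[i], points[j]) / 2, i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, r))  # one component dies at this merge
    bars.append((0.0, inf))        # the last component never dies
    return bars

# Made-up points: two tight pairs that are far from each other.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
print(h0_barcode(points))
```

For these points the two short bars end at radius 0.5, when each pair joins. The longer finite bar ends at radius 2.0, so the data reads as two groups over the whole range of radii from 0.5 to 2.0.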
Meaning drives “values” - Bottleneck Distance
We have a persistence barcode for each data set.
How can we compare two persistence barcodes?
A robust metric between two barcodes: the bottleneck distance
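The bottleneck distance matches the bars of one barcode to the bars of the other, where any bar may instead be matched to the diagonal (a bar of zero length), and takes the matching whose worst pair is as cheap as possible. The sketch below is a brute-force version that tries every matching, so it is only suitable for very small barcodes with finite bars. It is not how practical libraries compute the distance, and the function name is hypothetical.

```python
from itertools import permutations

def bottleneck_distance(bars_a, bars_b):
    """Brute-force bottleneck distance between two small barcodes with finite bars."""
    def cost(p, q):
        if p is None and q is None:
            return 0.0                    # diagonal matched to diagonal
        if p is None:
            return (q[1] - q[0]) / 2      # bar q matched to the diagonal
        if q is None:
            return (p[1] - p[0]) / 2      # bar p matched to the diagonal
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    # Pad each side with diagonal slots (None) so every bar may stay unmatched.
    left = list(bars_a) + [None] * len(bars_b)
    right = list(bars_b) + [None] * len(bars_a)
    if not left:
        return 0.0
    return min(
        max(cost(p, q) for p, q in zip(left, perm))
        for perm in permutations(right)
    )

# Made-up barcodes, for illustration only.
a = [(0.0, 2.0), (0.0, 0.5)]
b = [(0.0, 2.2)]
print(bottleneck_distance(a, b))
```

In this example the two long bars are matched to each other, and the short bar in `a` is matched to the diagonal, which is the most expensive pair in the best matching.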