The Art of Reading Between the Lines
or the R Language and Data Mining in Action
Katarzyna Mrowca
<me>
</me>
The deal
Agenda
• Quick glance at theory - Data mining
• Exercises on… paper
• Quick glance at the tool - R console
• Exercises - become friends with R
• …
Exercise
Theory
Agenda
• Quick glance at theory - Data preparation
• Exercises
• Decision trees
• Cluster analysis
• Text mining
• …
Agile is everywhere!
• Retro after second break
Quick glance at theory!
What is data mining?
What does "Google" say?
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics.
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Source: Wikipedia
Data mining – what is "inside"
• Predictive:
  • Regression
  • Classification
  • Collaborative filtering
• Descriptive:
  • Clustering / similarity matching
  • Association rules and variants
  • Deviation detection
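To make the predictive / descriptive split concrete, here is a minimal sketch in base R. The choice of the built-in `cars` and `iris` data sets, the variable names, and the use of hierarchical clustering with three groups are assumptions made for illustration, not part of the slides.

```r
# Predictive task (regression): learn how stopping distance depends on speed,
# then predict the distance for speeds we choose ourselves.
fit <- lm(dist ~ speed, data = cars)
new_speeds <- data.frame(speed = c(10, 20))
predicted <- predict(fit, newdata = new_speeds)
print(predicted)

# Descriptive task (clustering): look for groups in the iris measurements
# without using the species labels at all.
measurements <- scale(iris[, 1:4])          # put all columns on one scale
tree <- hclust(dist(measurements), method = "complete")
groups <- cutree(tree, k = 3)               # k = 3 is an illustrative choice
print(table(groups))
```

The regression has a target column to predict, while the clustering only describes structure that is already in the data.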
What is data mining not?
Why is data mining so popular?
What is the difference between statistics and data mining?
Exercise
Data preparation
Variables
Qualitative & Quantitative
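As a small illustration of the two variable types, the sketch below inspects the built-in `iris` data set, where R stores quantitative variables as numeric vectors and the qualitative one as a factor. The data set and the variable names are assumptions chosen for the example.

```r
# Which columns are quantitative (numeric) and which are qualitative (factor)?
is_quantitative <- sapply(iris, is.numeric)
is_qualitative <- sapply(iris, is.factor)
print(is_quantitative)

# Numeric summaries make sense for a quantitative column ...
print(summary(iris$Sepal.Length))

# ... while a qualitative column is summarised by counting its levels.
species_counts <- table(iris$Species)
print(species_counts)
```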
Tame the R console!
Take a break
Regression
Time series
Decision trees
Regression trees
Classification trees
K-means
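A minimal sketch of k-means in base R, assuming the built-in `iris` data set, two petal columns as features, and three clusters. None of these choices come from the slides. The seed is fixed only so that repeated runs give the same assignment.

```r
set.seed(42)  # k-means starts from random centers

# Scale the features so that both contribute equally to the distance.
features <- scale(iris[, c("Petal.Length", "Petal.Width")])

# nstart = 20 reruns the algorithm from 20 random starts and keeps the best fit.
km <- kmeans(features, centers = 3, nstart = 20)
print(km$centers)

# Compare the discovered clusters with the known species (used only for checking).
print(table(cluster = km$cluster, species = iris$Species))
```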
Text mining
Thank you!