Introduction to Data Mining
Group Members:
Karim C. El-Khazen
Pascal Suria
Lin Gui
Philsou Lee
Xiaoting Niu
- Definition
- General Concept
  - Foundations
  - Evolution
  - Applications
  - Challenges
- Algorithms
  - Classical
  - Next Generations
What is Data Mining?
Data mining is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data stored in repositories, using pattern recognition technologies as well as statistical and mathematical methods.
Foundations
- Massive data collection
- Powerful multiprocessor computers
- Data mining algorithms
Evolution

| Evolutionary Step | Business Question | Enabling Technologies | Product Providers | Characteristics |
|---|---|---|---|---|
| Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | IBM, CDC | Retrospective, static data delivery |
| Data Access (1980s) | "What were unit sales in New England last March?" | Relational databases (RDBMS), SQL, ODBC | Oracle, Sybase, Informix, IBM, Microsoft | Retrospective, dynamic data delivery at record level |
| Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | OLAP, multidimensional databases, data warehouses | Pilot, Comshare, Arbor, Cognos, Microstrategy | Retrospective, dynamic data delivery at multiple levels |
| Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Pilot, Lockheed, IBM, SGI, numerous startups | Prospective, proactive information delivery |
Applications
- Industry
  - Retail
  - Health maintenance groups
  - Telecommunications
  - Credit card
- Web mining
- Sports and entertainment solutions
Challenges
- Ability to handle different types of data
- Graceful degradation of data mining algorithms
- Valuable data mining results
- Representation of data mining requests and results
- Mining at different abstraction levels
- Mining information from different sources of data
- Protection of privacy and data security
Hierarchy of Choices and Decisions
- Business goal
- Collecting, cleaning and preparing data
- Prediction
- Model type and algorithms
Data Description
- Descriptions of data characteristics in elementary and aggregated form
- Summarization
- Visualization
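As a rough illustration of summarization in elementary and aggregated form, the sketch below computes a few descriptive statistics. The `sales` figures, region names, and the `summarize` helper are hypothetical and are not taken from the slides.

```python
from statistics import mean, median

# Hypothetical unit sales per region (illustrative data only).
sales = {
    "Boston": [120, 135, 128, 150],
    "Hartford": [80, 95, 90, 85],
}

def summarize(values):
    """Return basic descriptive statistics for one series of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

# Elementary form: one summary per region.
per_region = {region: summarize(values) for region, values in sales.items()}

# Aggregated form: one summary over all regions combined.
overall = summarize([x for values in sales.values() for x in values])
```

Comparing `per_region` with `overall` shows how the same data can be described at two levels of detail.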
Predictive Data Mining
Predictive modeling is a term used to describe the process of mathematically or mentally representing a phenomenon or occurrence with a series of equations or relationships.
Prediction: Classification
- Classification predicts class membership
  - Pre-classify (using classification algorithms)
  - Test to determine the quality of the model
  - Predict (using effective classifier)
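The three steps above can be sketched with a very simple classifier. This example uses a nearest-centroid rule, chosen here only for brevity; the slides do not name a specific algorithm. The records, feature meanings, and labels are hypothetical.

```python
# Hypothetical pre-classified records: (income, debt) in $1000s -> label.
train = [((60, 5), "good"), ((75, 10), "good"), ((80, 8), "good"),
         ((25, 20), "bad"), ((30, 30), "bad"), ((20, 25), "bad")]
# Held-out records used only to check the quality of the model.
test = [((70, 7), "good"), ((28, 22), "bad")]

def fit(records):
    """Step 1 (pre-classify): learn one centroid per class."""
    sums, counts = {}, {}
    for features, label in records:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Step 3 (predict): pick the class with the closest centroid."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist2(model[label]))

def accuracy(model, records):
    """Step 2 (test): fraction of held-out records classified correctly."""
    hits = sum(predict(model, features) == label for features, label in records)
    return hits / len(records)

model = fit(train)
test_accuracy = accuracy(model, test)
new_label = predict(model, (65, 12))  # an unseen applicant
```

Keeping `test` separate from `train` is what makes the accuracy figure a meaningful check of model quality.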
Prediction: Regression
Regression takes a numerical dataset and develops a mathematical formula that fits the data.
When you're ready to use the results to predict future behavior, you simply take your new data, plug it into the developed formula and you get a prediction!
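A minimal sketch of this idea, assuming a single predictor and ordinary least squares for a straight line (the slides do not specify the form of the formula). The `months` and `unit_sales` values are made up and deliberately lie on an exact line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: month number -> unit sales.
months = [1, 2, 3, 4, 5]
unit_sales = [110, 120, 130, 140, 150]

slope, intercept = fit_line(months, unit_sales)

# Plug new data (month 6) into the developed formula to get a prediction.
prediction = slope * 6 + intercept
```

With real data the points will not fall exactly on the line, and the fitted formula gives only an estimate.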
Algorithms
- Classical Techniques
  - Statistics
  - Neighborhoods
  - Clustering
- Next Generations
  - Decision Tree
  - Neural Network
  - Rule Induction
Statistics
- Classical Statistics:
  - Related to the collection and description of data
  - Assumption: an underlying pattern exists in the data distribution
  - Objective: find the best guess
- Data Mining:
  - Employs statistical methods
  - Needs to analyze huge amounts of data
  - Goes beyond traditional statistics
Neighborhoods
- Basic idea: for a new problem, look for similar problems (neighbors) that have already been solved
- Key point: find the neighborhood
  - Calculate the distance: how close must a case be to count as a neighbor?
  - Which class does the new problem belong to?
- Large computational load: a new calculation is needed for each new case
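One common way to realize this idea is k-nearest-neighbor classification, sketched below. The stored cases, the choice of Euclidean distance, and `k=3` are assumptions made for illustration.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical solved cases: (feature vector, known class).
solved_cases = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
                ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B")]

def knn_classify(cases, query, k=3):
    """Classify `query` by majority vote among its k nearest solved cases."""
    # Every stored case is compared against the query, which is why the
    # computational load grows with each new case to classify.
    nearest = sorted(cases, key=lambda case: dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

label_near_a = knn_classify(solved_cases, (1.2, 1.4))
label_near_b = knn_classify(solved_cases, (6.2, 6.1))
```

The value of `k` answers the question of how many cases count as neighbors; a distance cutoff would be an alternative.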
Clustering
- Elements are grouped together according to different characteristics
- Each cluster is homogeneous: its members share similar values
- Problem: controlling the number of clusters
  - Hierarchical clustering: flexible number of clusters
  - Non-hierarchical clustering: number of clusters given by the user
- Used most frequently for:
  - Consolidating data into a high-level view
  - Grouping records into likely behaviors
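As a sketch of the non-hierarchical case, where the user supplies the number of clusters, the example below uses k-means. The slides do not name an algorithm, so k-means is an assumption, as are the sample points and the naive initialization from the first `k` points.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters; k is chosen by the user."""
    # Naive initialization (an assumption): use the first k points.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[i])))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters

# Hypothetical records with two numeric characteristics each.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

Each resulting centroid gives a high-level view of the records assigned to it.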
Decision Tree
- A way of representing a series of rules that lead to a class or value
- Structure: decision nodes, branches, leaves
- Example: a loan officer wants to determine the creditworthiness of applicants
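The structure can be sketched as nested dictionaries, with each root-to-leaf path read off as one rule. The attributes (`income`, `years_employed`), thresholds, and class labels below are hypothetical and only stand in for the loan example.

```python
# Hypothetical loan tree: decision nodes test an attribute against a
# threshold, branches are "below" / "at_or_above", leaves carry a class.
tree = {
    "attribute": "income",
    "threshold": 40000,
    "below": {"leaf": "bad risk"},
    "at_or_above": {
        "attribute": "years_employed",
        "threshold": 2,
        "below": {"leaf": "bad risk"},
        "at_or_above": {"leaf": "good risk"},
    },
}

def classify(node, applicant):
    """Follow branches from the root until a leaf is reached."""
    while "leaf" not in node:
        value = applicant[node["attribute"]]
        branch = "below" if value < node["threshold"] else "at_or_above"
        node = node[branch]
    return node["leaf"]

def extract_rules(node, conditions=()):
    """Return one (conditions, class) pair per root-to-leaf path."""
    if "leaf" in node:
        return [(list(conditions), node["leaf"])]
    name, t = node["attribute"], node["threshold"]
    return (extract_rules(node["below"], conditions + (f"{name} < {t}",))
            + extract_rules(node["at_or_above"], conditions + (f"{name} >= {t}",)))

decision = classify(tree, {"income": 55000, "years_employed": 5})
rules = extract_rules(tree)
```

Here `rules` makes explicit that the tree is just a compact encoding of a series of if-then rules.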
Decision Tree (continued)
Help to induce the tree and its rules to make predictions
Neural Networks
- Efficiently modeling large and complex problems with hundreds of predictor variables
- Structure:
  - Input layer, hidden layer, output layer
  - Activation function between nodes
- Requires training and testing of relations
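The layered structure can be sketched as a forward pass through one hidden layer. The weights below are hand-picked so the network computes XOR; they are an assumption for illustration, and the training step that a real network requires is omitted. The sigmoid is one possible activation function, not necessarily the one the authors had in mind.

```python
from math import exp

def sigmoid(z):
    """Activation function applied at each node."""
    return 1.0 / (1.0 + exp(-z))

def layer(inputs, weights, biases):
    """Compute one layer: weighted sum of inputs plus bias, then activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked (not trained) weights: two hidden nodes, one output node.
HIDDEN_W = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_B = [-10.0, 30.0]
OUTPUT_W = [[20.0, 20.0]]
OUTPUT_B = [-30.0]

def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

# Round each output to 0 or 1 for the four possible binary inputs.
outputs = {pair: round(forward(pair))
           for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

In practice the weights would be learned from training data and then checked against separate test data.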
Neural Networks (continued)
Example:
Rule Induction
- A method to derive a set of rules to classify cases
- For example, rule induction can be used to discover patterns relating to decisions (e.g., credit card applications)
- Rules may not cover all possible situations
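A deliberately simple sketch of the idea: derive one rule per observed value of a single attribute by majority vote. This is not any specific published algorithm, and the `history` of credit card decisions is hypothetical. It also shows how a case can fall outside every induced rule.

```python
from collections import Counter, defaultdict

# Hypothetical past credit card applications: (employment status, decision).
history = [("employed", "approve"), ("employed", "approve"), ("employed", "reject"),
           ("unemployed", "reject"), ("unemployed", "reject"),
           ("self-employed", "approve")]

def induce_rules(cases):
    """Derive one rule per attribute value: predict its most frequent decision."""
    by_value = defaultdict(Counter)
    for value, decision in cases:
        by_value[value][decision] += 1
    return {value: counts.most_common(1)[0][0]
            for value, counts in by_value.items()}

def apply_rules(rules, value):
    """Return the decision for `value`, or None when no rule covers it."""
    return rules.get(value)

rules = induce_rules(history)
covered = apply_rules(rules, "employed")
uncovered = apply_rules(rules, "student")  # never seen in the history
```

The `None` result for `"student"` is the uncovered situation: the rule set has nothing to say about it.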