CSCI 347 / CS 4206: Data Mining
Module 01: Introduction
Topic 03: Stages in Data Mining

Upload: gordon-barker

Post on 24-Dec-2015

TRANSCRIPT

  • Module 01: Introduction - Objectives
    • Understand the definitions of basic data mining terms
    • Understand, at a general level, structural descriptions in data mining
    • Understand, at a general level, the main steps/stages in data mining
    • Be aware of the biases of different basic approaches to data mining
    • Be aware of fielded applications of data mining
    • Understand and identify technical and ethical issues in data mining
  • Stages in Data Mining
    Your textbook describes data mining in terms of what goes into the system (input), what happens to it (processing, i.e., the algorithms), and what comes out (output):
    • Input
      • Data acquisition
      • Cleansing / transformation
    • Processing (algorithms)
    • Output
      • Representation
      • Evaluation
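To make the three stages concrete, here is a minimal Python sketch under stated assumptions: the toy records, the function names (`acquire`, `cleanse`, `process`, `represent`, `evaluate`), and the deliberately trivial threshold "algorithm" are all hypothetical stand-ins for whatever a real system would use. Each function corresponds to one of the stages named above.

```python
# Minimal sketch of the input -> processing -> output view of data mining.
# Records, function names, and the toy "algorithm" are hypothetical.
from statistics import mean


def acquire() -> list[dict]:
    """Input, step 1: data acquisition (hard-coded toy records here)."""
    return [
        {"temp": "21.5", "label": "ok"},
        {"temp": "", "label": "ok"},  # a missing value to clean out
        {"temp": "35.0", "label": "alarm"},
        {"temp": "19.0", "label": "ok"},
        {"temp": "40.2", "label": "alarm"},
    ]


def cleanse(records: list[dict]) -> list[dict]:
    """Input, step 2: cleansing/transformation (drop blanks, parse numbers)."""
    return [
        {"temp": float(r["temp"]), "label": r["label"]}
        for r in records
        if r["temp"].strip()
    ]


def process(records: list[dict]) -> float:
    """Processing: place a threshold halfway between the two class means."""
    ok = mean(r["temp"] for r in records if r["label"] == "ok")
    alarm = mean(r["temp"] for r in records if r["label"] == "alarm")
    return (ok + alarm) / 2


def represent(threshold: float) -> str:
    """Output, step 1: represent the learned model as a readable rule."""
    return f"if temp > {threshold:.2f} then alarm else ok"


def evaluate(threshold: float, records: list[dict]) -> float:
    """Output, step 2: accuracy of the rule on the given records."""
    correct = sum(
        ("alarm" if r["temp"] > threshold else "ok") == r["label"]
        for r in records
    )
    return correct / len(records)


clean = cleanse(acquire())
model = process(clean)
print(represent(model))
print(f"accuracy: {evaluate(model, clean):.2f}")
```

Evaluating on the same records used to build the model, as done here for brevity, overstates how well the rule would do on new data.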
  • Input
    As the textbook's authors put it, "We are overwhelmed with data." We collect an enormous amount of data, and that data may contain useful patterns, but its sheer volume makes uncovering those patterns by hand impossible.
    Input data can be characterized not only by its source or industry but also by its type:
    • Is the data numeric or symbolic?
    • Is it relatively error-free, or does it contain many errors?
    • Is it consistent?
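The questions above can be asked of a dataset mechanically. The sketch below is an illustration under assumptions, not a standard tool: `profile` is a hypothetical helper that treats an attribute as numeric if every non-blank value parses as a number, counts blank values as missing, and flags symbolic values that differ only in letter case as a possible inconsistency. The sample rows are made up.

```python
# Sketch: profile raw input along the questions on the Input slide.
# The sample rows and the case-variant heuristic are assumptions.


def is_number(text: str) -> bool:
    """True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile(rows: list[dict]) -> dict:
    """Report type, missing count, and case-variant spellings per column.

    Assumes every row has the same keys as the first row.
    """
    report = {}
    for column in rows[0]:
        values = [row[column].strip() for row in rows]
        present = [v for v in values if v]
        numeric = bool(present) and all(is_number(v) for v in present)
        # Group symbolic values by lowercase form; more than one spelling
        # of the same word hints at an inconsistency.
        variants: dict[str, set[str]] = {}
        if not numeric:
            for v in present:
                variants.setdefault(v.lower(), set()).add(v)
        report[column] = {
            "type": "numeric" if numeric else "symbolic",
            "missing": len(values) - len(present),
            "inconsistent": sorted(
                k for k, forms in variants.items() if len(forms) > 1
            ),
        }
    return report


rows = [
    {"age": "34", "outlook": "sunny"},
    {"age": "", "outlook": "Sunny"},
    {"age": "51", "outlook": "rainy"},
    {"age": "29", "outlook": "overcast"},
]

for column, stats in profile(rows).items():
    print(column, stats)
```

Real data needs richer checks (units, ranges, duplicate records), but even this crude pass answers the slide's three questions for each attribute.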
  • Processing
    Some authors divide data mining tasks into two categories, predictive and descriptive (Tan, Steinbach, and Kumar, 2004):
    • Predictive systems use some variables to predict unknown or future values of other variables. Some predictive systems are:
      • Classification
      • Regression
      • Deviation detection
    • Descriptive systems find human-interpretable patterns in the data. Some descriptive systems are:
      • Clustering
      • Association rule discovery
      • Sequential pattern discovery
    What are some examples of these?
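Here is one small instance of each category, sketched in Python on hypothetical toy data. 1-nearest-neighbour classification (one of many possible classification methods) stands in for a predictive task; counting how often pairs of items occur together, the kind of support counting that association rule miners such as Apriori perform, stands in for a descriptive one. The names `classify` and `pair_support` are illustrative only.

```python
# Sketch contrasting predictive and descriptive tasks (all data hypothetical).
import math
from collections import Counter
from itertools import combinations

# Predictive: label a new point with the label of its nearest training point.
training = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((5.0, 5.5), "large"),
    ((6.0, 5.0), "large"),
]


def classify(point):
    """1-nearest-neighbour classification using Euclidean distance."""
    _, label = min(training, key=lambda item: math.dist(item[0], point))
    return label


# Descriptive: find item pairs that co-occur in at least min_count baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]


def pair_support(transactions, min_count):
    """Count support for every item pair and keep the frequent ones."""
    counts = Counter()
    for basket in transactions:
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(classify((1.1, 0.9)))
print(classify((5.5, 5.2)))
print(pair_support(baskets, min_count=2))
```

The predictive function answers a question about a new, unlabeled case; the descriptive one answers no question at all, but reports patterns (here, that bread and milk, and butter and milk, often appear together) for a person to interpret.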
  • Output
    The format of the system's output is also important. Sometimes the correct answer is all that matters; sometimes it is important that the discovered patterns make sense to human users. For example:
    • If I'm classifying sea ice, I may not be terribly concerned about the patterns the classifier used to reach its decisions, as long as I trust that the decisions are correct.
    • If I'm a physician deciding whether to intervene in a pregnancy, I'm far more concerned about having traceable, human-readable decision logic.
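To illustrate output that a person can read and check, here is a sketch of a decision stump (a one-level decision tree), chosen only for illustration. It tries each observed value as a threshold, keeps the one with the fewest training errors, and reports the model as an if-then rule. The attribute name `score` and the data are hypothetical.

```python
# Sketch: a decision stump whose output is a human-readable rule.
# Attribute name and data are hypothetical; assumes at least two classes.


def learn_stump(values, labels):
    """Return (errors, threshold, label_if_low, label_if_high)."""
    classes = sorted(set(labels))
    best = None
    for t in sorted(set(values)):
        # Try every ordered pair of distinct classes on each side of t.
        for low, high in ((a, b) for a in classes for b in classes if a != b):
            errors = sum(
                (low if v <= t else high) != y for v, y in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    return best


def as_rule(stump, attribute):
    """Render the stump as an if-then rule a person can inspect."""
    errors, t, low, high = stump
    return f"if {attribute} <= {t} then {low} else {high}  ({errors} training errors)"


scores = [2, 3, 4, 7, 8, 9]
labels = ["no", "no", "no", "yes", "yes", "yes"]
stump = learn_stump(scores, labels)
print(as_rule(stump, "score"))
```

Many accurate models do not reduce to a single readable rule like this one; whether that matters depends on the application, as the two examples above suggest.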
  • The Mystery Sound
    And what is the mystery sound for this section?