Introduction to Data Mining & Warehousing

Post on 29-May-2020

Objectives

After finishing this class, the students will:

Understand the basic terms in Data Mining and Warehousing

Understand their necessity in business and IS

Understand the basic concepts of Data Mining and Warehousing

Understand the implementation processes of those concepts

Motivation

Lots of data is being collected and warehoused:

Web data and e-commerce

Purchases at department and grocery stores

Bank and credit card transactions

Computers have become cheaper and more powerful

Competitive pressure is strong

Businesses need better, customized services for an edge (e.g. in Customer Relationship Management)

Data Warehousing

A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site
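The definition above can be illustrated with a minimal sketch, assuming two hypothetical sources (`web_orders` and `store_sales`) whose field names and units differ. An in-memory SQLite database stands in for the warehouse; the table name `sales` and its columns are illustrative choices, not part of the original slides.

```python
import sqlite3

# Two hypothetical operational sources with different field names and units.
web_orders = [
    {"order_id": "W-1", "customer": "alice", "total_cents": 1250},
    {"order_id": "W-2", "customer": "bob", "total_cents": 4000},
]
store_sales = [
    {"receipt": "S-9", "buyer_name": "Alice", "amount": 7.5},
]

# The warehouse: a single store with one unified schema for all sources.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (source TEXT, sale_id TEXT, customer TEXT, amount REAL)"
)

def load(rows):
    # Insert rows that already follow the unified schema.
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)

# Map each source onto the unified schema before loading.
load([("web", o["order_id"], o["customer"].lower(), o["total_cents"] / 100)
      for o in web_orders])
load([("store", s["receipt"], s["buyer_name"].lower(), s["amount"])
      for s in store_sales])

# Once unified, a single query answers a cross-source question.
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

The point of the sketch is the mapping step: each source is converted to the same columns and units before it is stored, so later analysis does not need to know where a row came from.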

A data warehouse alone is only half of the solution for mining huge volumes of data

(Figure: Typical data warehousing architecture)

Data Mining

Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns

Non-trivial extraction of implicit, previously unknown, and potentially useful information from data

Data mining is the process of discovering actionable information from large sets of data. Data mining uses mathematical analysis to derive patterns and trends that exist in data. Typically, these patterns cannot be discovered by traditional data exploration because the relationships are too complex or because there is too much data.

Data Mining

Data mining is often treated as a synonym for Knowledge Discovery in Databases (KDD)

Discovering the knowledge

Data cleaning

Remove noise and irrelevant data

Data integration

Combine multiple data sources

Data selection

Retrieve the data relevant to the analysis task

Data transformation

Transform and consolidate data into a form that is appropriate for mining

Data mining

Apply methods that extract patterns from the data

Pattern evaluation

Identify the interesting patterns that represent knowledge

Knowledge presentation

Visualize and present the mined knowledge to the user
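The knowledge discovery steps can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the records in `source_a` and `source_b`, the age bands, and the `min_spend` threshold are all hypothetical, and the "mining" step is deliberately trivial (an average per group) so that the flow between steps stays visible.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"id": 1, "age": 25, "spend": 120.0},
            {"id": 2, "age": None, "spend": 80.0}]
source_b = [{"id": 3, "age": 41, "spend": 300.0},
            {"id": 4, "age": 38, "spend": 260.0}]

def clean(rows):
    # Data cleaning: drop records with missing values.
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):
    # Data integration: combine the sources into one collection.
    return [row for src in sources for row in src]

def select(rows, fields):
    # Data selection: keep only the fields relevant to the analysis.
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    # Data transformation: consolidate age into a coarse band for mining.
    return [{"band": "under_35" if r["age"] < 35 else "35_plus",
             "spend": r["spend"]} for r in rows]

def mine(rows):
    # Data mining: a deliberately trivial "pattern", average spend per band.
    bands = {}
    for r in rows:
        bands.setdefault(r["band"], []).append(r["spend"])
    return {band: mean(values) for band, values in bands.items()}

def evaluate(patterns, min_spend):
    # Pattern evaluation: keep only patterns above an interest threshold.
    return {b: s for b, s in patterns.items() if s >= min_spend}

def present(patterns):
    # Knowledge presentation: render the result for the user.
    return [f"{band}: average spend {spend:.2f}"
            for band, spend in sorted(patterns.items())]

rows = integrate(clean(source_a), clean(source_b))
patterns = mine(transform(select(rows, ["age", "spend"])))
report = present(evaluate(patterns, min_spend=200.0))
print(report)
```

In practice each step is far more involved, and the steps are usually iterated rather than run once in a straight line.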

(Figure: Typical data mining architecture)

Data mining tasks

Prediction methods

Use some variables to predict unknown or future values of other variables.

Description methods

Find human-interpretable patterns that describe the data.

Classification [Predictive]

Clustering [Descriptive]

Association Rule Discovery [Descriptive]

Sequential Pattern Discovery [Descriptive]

Regression [Predictive]

Deviation Detection [Predictive]

Data mining Algorithms

Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.

Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.
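The difference between the two kinds of prediction can be shown with a minimal sketch. It assumes two simple stand-in techniques that the slides do not name: ordinary least squares on one attribute for regression, and a nearest-centroid rule for classification. The data, the labels `occasional` and `frequent`, and the function names are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    # Regression: ordinary least squares for y = slope * x + intercept.
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def fit_centroids(points, labels):
    # Classification: nearest-centroid model, one mean point per class.
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: tuple(mean(dim) for dim in zip(*pts))
            for label, pts in groups.items()}

def classify(centroids, point):
    # Predict the class whose centroid is closest (squared Euclidean distance).
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(centroids[label], point)))

# Hypothetical data: advertising spend vs. profit (continuous target).
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Hypothetical data: (visits, purchases) pairs labelled with a discrete class.
centroids = fit_centroids([(1, 1), (2, 1), (8, 9), (9, 8)],
                          ["occasional", "occasional", "frequent", "frequent"])
predicted = classify(centroids, (7, 7))
print(slope, intercept, predicted)
```

The regression output is a number on a continuous scale, while the classifier can only return one of the labels it saw during fitting.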

Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.
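As one possible illustration of segmentation, the sketch below runs k-means on a single numeric attribute. k-means is an assumption here, since the slides do not name a specific algorithm, and the spend values and starting centers are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, centers, iterations=10):
    # k-means on one numeric attribute, starting from the given centers.
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assignment step: attach each value to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical yearly spend per customer; initial centers are arbitrary guesses.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers, clusters)
```

No labels are supplied: the groups emerge from the similarity of the values alone, which is what separates segmentation from classification.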

Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.
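A market basket analysis can be sketched with the two usual rule measures, support and confidence. The baskets and thresholds below are hypothetical, and the search is a brute-force pass over item pairs rather than an optimized algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market baskets; each set is one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset):
    # Fraction of baskets that contain every item in the itemset.
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def rules(min_support, min_confidence):
    # Enumerate rules {a} -> {b} over item pairs (brute force, not Apriori).
    items = sorted(set().union(*baskets))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: how often rhs appears in baskets that contain lhs.
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found

found_rules = rules(min_support=0.5, min_confidence=0.6)
for lhs, rhs, s, c in found_rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

A rule such as butter -> bread reads as "baskets that contain butter also tend to contain bread"; support says how common the pair is overall, and confidence says how reliable the implication is.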

Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow.
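For the Web path example, a minimal sketch is to count contiguous page sequences across visitor sessions and keep the ones that occur often enough. The sessions, the helper name `frequent_steps`, and the threshold are hypothetical; real sequence mining algorithms also handle gaps and longer patterns.

```python
from collections import Counter

# Hypothetical click paths, one list of pages per visitor session.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["home", "search", "product"],
]

def frequent_steps(paths, length, min_count):
    # Count each contiguous page sequence of the given length once per session.
    counts = Counter()
    for path in paths:
        seen = {tuple(path[i:i + length]) for i in range(len(path) - length + 1)}
        counts.update(seen)
    return {seq: n for seq, n in counts.items() if n >= min_count}

frequent = frequent_steps(sessions, length=2, min_count=2)
print(frequent)
```

Unlike association rules, the order of the items matters here: `("home", "search")` and `("search", "home")` are different sequences.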

Data mining Models

The patterns and trends that are collected are defined as a data mining model. A mining model can be applied to scenarios such as:

Forecasting

Estimating sales, predicting server loads or server downtime

Risk and probability

Choosing the best customers for targeted mailings, determining the probable break-even point for risk scenarios, assigning probabilities to diagnoses or other outcomes

Recommendations

Determining which products are likely to be sold together, generating recommendations

Finding sequences

Analyzing customer selections in a shopping cart, predicting next likely events

Grouping

Separating customers or events into clusters of related items, analyzing and predicting affinities

References

J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2001

Dr. Ir. Muhammad Ikhwan Jambak, MEng