data warehousing and mining original) (2)
-
8/2/2019 Data Warehousing and Mining Original) (2)
1/25
Data Warehousing And Data Mining
Presented By
E. David Joshua
& Pankaj Jain
CSE department
Tirumala Engineering College
Keesara, Bogaram.
-
Topics To Be discussed:
Introduction
History Of Data Warehousing
Data Warehousing
Data Warehouse Architecture
Data Mining KDD process
Classification of data mining systems
Data mining Architecture
Conclusion
-
Introduction: Data Warehousing, OLAP (Online Analytical Processing) and Data Mining: what and why?
Relation to OLTP (Online Transaction Processing)
A producer wants to know.
-
Data, Data everywhere
yet ...
I can't find the data I need:
* data is scattered over the network
* many versions, subtle differences
I can't get the data I need:
* need an expert to get the data
I can't understand the data I found:
* available data is poorly documented
I can't use the data I found:
* results are unexpected
* data needs to be transformed from one form to another
-
What are the users saying...
* Data should be integrated across the enterprise
* Summary data has real value to the organization
* Historical data holds the key to understanding data over time
* What-if capabilities are required
We need a special database!
A single, complete, and consistent store of data, obtained from a variety of different sources and made available to end users in a form they can understand and use in a business context, is called a DATA WAREHOUSE.
-
History Of Data Warehousing:
60s: Batch reports
* hard to find and analyze information
* inflexible and expensive, reprogram for every new request
70s: Terminal-based DSS and EIS (executive information systems)
* still inflexible, not integrated with desktop tools
80s: Desktop data access and analysis tools
* query tools, spreadsheets, GUIs
* easier to use, but only access operational databases
90s: Data warehousing with integrated OLAP engines and tools
-
What is Data Warehousing?
Data -> Information
A process of transforming data into information and making it available to users in a timely enough manner to make a difference.
Simply put, it is a collection of various databases under a single roof.
A data warehouse is a
* subject-oriented
* integrated
* time-variant
* non-volatile
collection of data that is used primarily in organizational decision making.
-
Data Warehouse Architecture
-
Three-Tier Architecture of Data Warehouse
Client:-
* Query specification
* Data analysis
* Data access
Application/Data Mart Server:-
* Summarizing
* Filtering
* Meta data
DW Server:-
* Data logic
* Data services
* Meta data
* File services
-
Constructing and Maintaining a Warehouse
A good database schema must be designed to hold an integrated collection of data copied from various sources.
Data is extracted from operational databases and external sources.
The data is cleaned to minimize errors, and missing information is filled in when possible.
The cleaned and transformed data is finally loaded into the warehouse.
The transformation of data is typically accomplished by defining a relational view over the tables in the data sources.
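The extract, clean, transform, and load steps above can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module. The table and view names (src_orders, v_clean_orders, dw_orders), the columns, the sample rows, and the choice to fill a missing amount with 0 are all assumptions made for the example, not part of the slides.

```python
import sqlite3

# One in-memory database stands in for both the source system and the
# warehouse (an assumption that keeps the sketch self-contained).
conn = sqlite3.connect(":memory:")

# Extract: a hypothetical operational table containing messy data.
conn.execute("CREATE TABLE src_orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, " north ", 120.0), (2, "SOUTH", None), (3, "north", 80.0)],
)

# Clean and transform: a relational view over the source table.
# Region names are trimmed and upper-cased; a missing amount is filled with 0.
conn.execute(
    """
    CREATE VIEW v_clean_orders AS
    SELECT order_id,
           UPPER(TRIM(region)) AS region,
           COALESCE(amount, 0.0) AS amount
    FROM src_orders
    """
)

# Load: copy the cleaned rows into the warehouse table.
conn.execute("CREATE TABLE dw_orders (order_id INTEGER, region TEXT, amount REAL)")
conn.execute("INSERT INTO dw_orders SELECT * FROM v_clean_orders")

loaded = conn.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the transformation in a view means the cleaning rules live in one place and can be re-run on every load.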
-
Data Warehouse vs. Data Marts
[Figure: three levels of structure: Organizationally Structured (Data Warehouse), Departmentally Structured, Individually Structured; scales run from More to Less (History, Normalized, Detailed) and from Data to Information]
A data mart is the access layer of the data warehouse environment that is used to get data out to the users.
-
Problems with Data Mart Centric Solution
If you end up creating multiple warehouses, integrating them is a problem.
-
True Warehouse
[Figure: Data Sources -> Data Warehouse -> Data Marts]
-
Data Warehouse Pitfalls
You are going to spend much time extracting, cleaning, and loading data.
You are going to find problems with systems feeding the data warehouse.
Your warehouse users will develop conflicting business rules.
You will need to validate data not being validated by transaction processing systems.
You are building a HIGH maintenance system.
-
For a Successful Warehouse
From day one establish that
warehousing is a joint user/builder
project.
Look closely at the data extracting, cleaning, and loading tools.
Determine a plan to test the
integrity of the data in the
warehouse.
From the start get warehouse users in the habit of 'testing' complex queries.
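A plan to test warehouse integrity could begin with a few automated checks like the ones below. This is a sketch under stated assumptions: the function name integrity_report, the three checks chosen, and the sample rows are hypothetical, and a real plan would cover far more.

```python
def integrity_report(source_rows, warehouse_rows, key, required):
    """Return a list of human-readable integrity problems (empty if none)."""
    problems = []

    # Check 1: every source key should have arrived in the warehouse.
    missing = {r[key] for r in source_rows} - {r[key] for r in warehouse_rows}
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")

    # Check 2: keys should be unique in the warehouse.
    keys = [r[key] for r in warehouse_rows]
    if len(keys) != len(set(keys)):
        problems.append("duplicate keys in warehouse")

    # Check 3: required fields must not be empty.
    for r in warehouse_rows:
        for field in required:
            if r.get(field) is None:
                problems.append(f"row {r[key]}: {field} is missing")

    return problems


# Hypothetical sample data: row 3 was never loaded, row 2 lost its amount.
source = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.0}, {"id": 3, "amount": 7.5}]
warehouse = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]

report = integrity_report(source, warehouse, key="id", required=["amount"])
print(report)
```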
-
Data Mining
-
What is data mining?
Extracting or "mining" knowledge from large amounts of data.
On what kinds of data can mining be done?
Relational databases
Data warehouses
Transactional databases
Advanced data and information systems
-
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process.
Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge
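The stages of the KDD process can be sketched as a chain of small functions. This is an illustrative toy, not a real mining system: the two source lists, the field names, and the use of simple frequency counting as the "mining" step are assumptions made for the example.

```python
from collections import Counter

# Two hypothetical source databases holding purchase records.
db_a = [{"customer": "ann", "item": "milk"}, {"customer": "bob", "item": None}]
db_b = [
    {"customer": "ann", "item": "bread"},
    {"customer": "cy", "item": "milk"},
    {"customer": "bob", "item": "milk"},
]


def clean(rows):
    """Data cleaning: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine the cleaned sources into one store."""
    return [r for src in sources for r in src]


def select(rows, field):
    """Selection: keep only the task-relevant attribute."""
    return [r[field] for r in rows]


def mine(values):
    """Data mining: here, simply count how often each value occurs."""
    return Counter(values)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {p: c for p, c in patterns.items() if c >= min_count}


warehouse = integrate(clean(db_a), clean(db_b))
knowledge = evaluate(mine(select(warehouse, "item")), min_count=2)
print(knowledge)
```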
-
Classification of data mining systems:
[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]
-
What kinds of patterns can be mined?
Concept/Class description
Mining frequent patterns, associations, and correlations
Classification and prediction
Cluster analysis
Outlier analysis
Evolution analysis
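As one concrete case from the list above, frequent pattern mining can be illustrated by counting how often pairs of items appear together. The sketch below is a brute-force pair count, not a full algorithm such as Apriori. The transactions, the function name frequent_pairs, and the 0.5 support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions
    containing both items) is at least min_support."""
    counts = Counter()
    for t in transactions:
        # Sorting gives each pair one canonical order.
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


pairs = frequent_pairs(transactions, min_support=0.5)
print(pairs)
```

Here bread and milk occur together in three of the four transactions, so they are the only pair that clears the threshold.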
Query Processing
Indexing: Exploiting indexes to reduce scanning of data is of crucial importance.
-
Bitmap Indexes
Query: select * from customer where gender = 'F' and vote = 'Y'

Customer
Row  Gender  Vote  gender=F  vote=Y  AND
1    M       Y     0         1       0
2    F       Y     1         1       1
3    F       Y     1         1       1
4    F       N     1         0       0
5    F       N     1         0       0
6    M       N     0         0       0
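The bitmap lookup for this query can be sketched in a few lines. The gender and vote values are the ones shown on the slide; pairing them by row position, the helper names build_bitmap and matching_rows, and the use of a Python integer as the bitmap are assumptions made for the illustration.

```python
# Column values as they appear on the slide, one entry per customer row.
genders = ["M", "F", "F", "F", "F", "M"]
votes = ["Y", "Y", "Y", "N", "N", "N"]


def build_bitmap(column, value):
    """Return an int whose bit i is 1 when column[i] == value."""
    bits = 0
    for i, v in enumerate(column):
        if v == value:
            bits |= 1 << i
    return bits


def matching_rows(bitmap, n_rows):
    """List the 0-based row positions whose bit is set."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]


gender_f = build_bitmap(genders, "F")
vote_y = build_bitmap(votes, "Y")

# The AND of the two bitmaps answers: gender = 'F' and vote = 'Y'.
result = gender_f & vote_y
rows = matching_rows(result, len(genders))
print(rows)  # 0-based positions of the matching customers
```

The query is answered with a single bitwise AND, without scanning the customer table itself.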
Join Indexes
A join index between a fact table and a dimension table correlates a
dimension tuple with the fact tuples that have the same value on the
common dimensional attribute.
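A join index can be pictured as a precomputed map from each dimension key to the positions of the matching fact rows. The sketch below assumes a hypothetical store dimension and sales fact table; the names dim_store, fact_sales, and build_join_index, and all sample values, are invented for the example.

```python
from collections import defaultdict

# Hypothetical dimension table: store key -> city.
dim_store = {10: "Pune", 20: "Delhi"}

# Hypothetical fact table rows: (fact_row_id, store_key, amount).
fact_sales = [(0, 10, 5.0), (1, 20, 7.0), (2, 10, 3.0)]


def build_join_index(dim, facts):
    """Map each dimension key to the ids of fact rows sharing that key."""
    index = defaultdict(list)
    for row_id, key, _amount in facts:
        if key in dim:
            index[key].append(row_id)
    return dict(index)


join_index = build_join_index(dim_store, fact_sales)

# Total sales for store 10, found through the index instead of a full join.
store_10_total = sum(fact_sales[r][2] for r in join_index[10])
print(join_index, store_10_total)
```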
-
Data Mining and Data Warehousing
The goal of a data warehouse is to support decision making with data.
Data mining can be used in conjunction with a data warehouse to help
with certain types of decisions.
Data mining can be applied to operational databases with individual
transactions.
To make data mining more efficient, the data warehouse should have an aggregated or summarized collection of data.
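One way to picture such a summarized collection is a table precomputed with GROUP BY. The sketch below uses Python's built-in sqlite3 module; the sales and sales_summary tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detailed transaction data.
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "2019-01", 100.0),
        ("north", "2019-01", 50.0),
        ("south", "2019-01", 70.0),
        ("north", "2019-02", 30.0),
    ],
)

# Precompute one summary row per region and month.
conn.execute(
    """
    CREATE TABLE sales_summary AS
    SELECT region, month, SUM(amount) AS total, COUNT(*) AS n
    FROM sales
    GROUP BY region, month
    """
)

summary = conn.execute(
    "SELECT region, month, total, n FROM sales_summary ORDER BY region, month"
).fetchall()
print(summary)
```

Mining tools can then read the small summary table instead of rescanning every individual transaction.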
Goals of Data Mining and Knowledge Discovery
Prediction:
Identification:
Classification:
Optimization:
-
CONCLUSION
A data warehouse takes the organization's operational data, historical data, and external data, consolidates it into a separately designed database, and manages it in a format that is optimized for end users to access and analyze.
Data warehouse technology, together with online transaction processing and data mining, allows management to provide better customer service.
Last but not least, the Internet has emerged as the largest data warehouse of unstructured and free-form data. New technologies are geared towards mining this great data warehouse.
-
Thank You..!!
Any Queries..??