CS 157B: Database Management Systems II
March 20 Class Meeting
Department of Computer Science
San Jose State University
Spring 2013
Instructor: Ron Mak
www.cs.sjsu.edu/~mak
Department of Computer Science, Spring 2013: March 20
CS 157B: Database Management Systems II © R. Mak
Unofficial Field Trip
Computer History Museum in Mt. View http://www.computerhistory.org/
Experience a fully restored IBM 1401 mainframe computer from the early 1960s in operation.
General info: http://en.wikipedia.org/wiki/IBM_1401
My summer seminar: http://www.cs.sjsu.edu/~mak/1401/
Restoration: http://ed-thelen.org/1401Project/1401RestorationPage.html
Private demos at 11:45 and at 2:00.
See a life-size working model of Charles Babbage’s Difference Engine in operation, a hand-cranked mechanical computer designed in the early 1800s.
Public demo at 1:00.
Saturday, March 23. Meet in the museum lobby at 11:15 AM.
Extra Credit!
There will be extra credit if you participate in the unofficial field trip to the Computer History Museum.
Up to 10 points will be added to your midterm score.
To be decided: a quiz (via Desire2Learn) or an essay.
Extract, Transform, and Load (ETL)
You want only high-quality data in your data warehouse.
What is high-quality data? It is correct, unambiguous, consistent, and complete.
The transform phase of ETL produces high-quality data by cleaning the data and by conforming data from multiple sources.
Extract, Transform, and Load (ETL)
In the real world, data is often dirty. Therefore, the ETL process must clean the source data as it is copied into the data warehouse.
Cleaning operations: Remove or correct corrupted data. Remove or correct invalid or inconsistent data.
Problems to look for include unexpected null values, missing data, values out of range, misspellings, referential integrity violations, and business rule violations.
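The cleaning operations above can be sketched in a few lines of Python. This is only an illustration, not the method used in class: the field names (store_id, city, units_sold), the set of valid store IDs, and the spelling table are all hypothetical.

```python
# Hypothetical source rows; field names and rules are illustrative only.
VALID_STORE_IDS = {1, 2, 3}  # stands in for the keys of a store dimension
SPELLING_FIXES = {"Vancuver": "Vancouver", "Tornto": "Toronto"}

def clean_row(row):
    """Return a cleaned copy of row, or None if the row must be rejected."""
    # Unexpected null or missing values: reject rows lacking required fields.
    for field in ("store_id", "city", "units_sold"):
        if row.get(field) is None:
            return None
    # Referential integrity: the store must exist in the store dimension.
    if row["store_id"] not in VALID_STORE_IDS:
        return None
    # Values out of range (also a simple business rule): no negative sales.
    if row["units_sold"] < 0:
        return None
    cleaned = dict(row)
    # Misspellings: correct known variants instead of rejecting the row.
    cleaned["city"] = SPELLING_FIXES.get(row["city"], row["city"])
    return cleaned

def clean(rows):
    """Clean every row and drop the ones that were rejected."""
    cleaned = [clean_row(r) for r in rows]
    return [r for r in cleaned if r is not None]

source_rows = [
    {"store_id": 1, "city": "Vancuver", "units_sold": 5},    # misspelled city
    {"store_id": 2, "city": "Toronto", "units_sold": None},  # null measure
    {"store_id": 9, "city": "Toronto", "units_sold": 3},     # unknown store
    {"store_id": 3, "city": "Toronto", "units_sold": -4},    # out of range
]
clean_rows = clean(source_rows)
```

Only the first row survives, with its city corrected. A real ETL process would usually log or quarantine rejected rows rather than drop them silently.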
Extract, Transform, and Load (ETL)
Data from multiple sources may need to be conformed to be usable together in the data warehouse.
Type conversion. Example: Convert a user ID in a data source from a string to a long integer to match the user ID in other data sources.
Format conversion. Examples: dates and times, names.
Align field and attribute names. Examples: customer_name vs. name_of_client, store vs. retail_outlet.
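A minimal Python sketch of the three kinds of conforming listed above. The source record, the alias table, and the choice of ISO 8601 as the warehouse date format are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical mapping from one source's field names to the warehouse's names.
FIELD_ALIASES = {"name_of_client": "customer_name", "retail_outlet": "store"}

def conform(record):
    """Conform one source record to the (assumed) warehouse conventions."""
    # Align field names.
    out = {FIELD_ALIASES.get(key, key): value for key, value in record.items()}
    # Type conversion: user ID from string to integer.
    out["user_id"] = int(out["user_id"])
    # Format conversion: MM/DD/YYYY date string to ISO 8601 (YYYY-MM-DD).
    parsed = datetime.strptime(out["order_date"], "%m/%d/%Y")
    out["order_date"] = parsed.date().isoformat()
    return out

source_record = {
    "user_id": "001042",
    "name_of_client": "Ada Lovelace",
    "retail_outlet": "San Jose",
    "order_date": "03/20/2013",
}
conformed = conform(source_record)
```

After conforming, the record uses the names customer_name and store, an integer user_id, and an ISO-formatted date, so it can be matched against records from other sources.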
ETL: Semantic Mappings
Unit conversions. Example: feet vs. yards, miles vs. kilometers.
Structural mappings. Example: federal, state, city, district vs. kingdom, region, parish.
Temporal mappings. Example: One data source has a measure taken once an hour; another data source has the same measure taken daily.
Spatial mappings. Example: street addresses vs. GIS coordinates (latitude + longitude) vs. political boundaries (cities, districts, counties, etc.).
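Two of these mappings are easy to sketch in Python: a unit conversion and a temporal mapping from hourly to daily values. The readings are made up, and summing is an assumption. Whether an hourly measure should be summed, averaged, or sampled to obtain a daily value depends on what the measure means.

```python
from collections import defaultdict
from datetime import datetime

KM_PER_MILE = 1.609344  # exact, by the international definition of the mile

def miles_to_km(miles):
    """Unit conversion: miles to kilometers."""
    return miles * KM_PER_MILE

def hourly_to_daily_totals(readings):
    """Temporal mapping: sum (ISO timestamp, value) readings into daily totals."""
    totals = defaultdict(float)
    for timestamp, value in readings:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        totals[day] += value
    return dict(totals)

hourly = [
    ("2013-03-20T09:00:00", 2.0),
    ("2013-03-20T10:00:00", 3.5),
    ("2013-03-21T09:00:00", 1.0),
]
daily = hourly_to_daily_totals(hourly)
```

The hourly source can be rolled up to match the daily source, but not the other way around: daily values cannot be split back into hourly ones without extra information.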
ETL: Semantic Mappings
Spatio-temporal mappings: locations in space-time.
And even more complex mappings, which may require the use of ontologies: shared vocabularies, knowledge structures, models of reality, etc.
Dimensional Modeling
Fact tables contain values that are measures, usually numeric. Example: the number of sales.
Dimension tables contain the context for the measures. Examples: time, location, product.
Dimensions are usually grouped and hierarchical. Example: western locations, eastern locations. Example: yearly, quarterly, monthly, weekly, daily, hourly.
Often denormalized for query performance: many queries, few updates.
Dimensional Modeling
Design criteria
What are the facts? What are we measuring? Example: number of sales.
What is the grain, or granularity, of the facts? It is determined by the dimensions. All measurements in a fact table must be at the same grain. Example: sales figures collected at the point of sale.
What are the dimensions? What context do we need to provide for the measures in the fact table? Examples: stores, dates, products.
Dimensional Modeling
Implementation: star schema.
Measures: number of units sold.
Dimensions: date, store, product.
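A star schema with this measure and these dimensions might look like the following sketch, which uses Python's built-in sqlite3 module and an in-memory database. The table names, column names, and rows are hypothetical, chosen only to show a fact table surrounded by its dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Fact table: one row per date, store, and product (the grain).
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER
);
INSERT INTO dim_date VALUES (1, 2013, 'Q1', 'Jan'), (2, 2013, 'Q2', 'Apr');
INSERT INTO dim_store VALUES (1, 'Toronto', 'Canada'), (2, 'Chicago', 'USA');
INSERT INTO dim_product VALUES (1, 'Laptop', 'computer'), (2, 'TV', 'home entertainment');
INSERT INTO fact_sales VALUES (1, 1, 1, 10), (1, 2, 2, 4), (2, 1, 2, 7), (2, 2, 1, 3);
""")

# A typical star-join query: total units sold per country.
units_by_country = conn.execute("""
    SELECT s.country, SUM(f.units_sold)
    FROM fact_sales AS f JOIN dim_store AS s ON f.store_key = s.store_key
    GROUP BY s.country ORDER BY s.country
""").fetchall()
```

Note that dim_store keeps city and country in one denormalized table instead of splitting them into separate tables. Queries then need only one join per dimension.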
Online Analytical Processing (OLAP)
A common type of business analysis. Also used to analyze scientific data.
Visualize data in a multidimensional manner.
Analytical processes that involve manipulating data along different dimensions.
The OLAP cube.
“What happened recently, and why?”
http://gerardnico.com/wiki/database/oracle/oracle_olap
Online Analytical Processing (OLAP)
OLAP operations: slice and dice; drill up and drill down; drill across and drill through; pivot.
Online Analytical Processing (OLAP)
Slice: View or manipulate the data along a subset of the dimensions.
Example: Consider only data from the first quarter.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
Online Analytical Processing (OLAP)
Dice: View or manipulate the data within subsets of the ranges of the dimensions.
Example: Consider only data from Q1 and Q2, from only Toronto and Vancouver, for only computers and home entertainment.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
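Slice and dice can be sketched on a toy cube in plain Python. The cube below is an assumption for illustration: a dictionary keyed by (quarter, city, item) with made-up unit counts. A real OLAP tool would run these operations against the warehouse.

```python
# A tiny cube stored as {(quarter, city, item): units_sold}; values are made up.
cube = {
    ("Q1", "Toronto", "computer"): 10,
    ("Q1", "Vancouver", "phone"): 4,
    ("Q2", "Toronto", "home entertainment"): 7,
    ("Q2", "Chicago", "computer"): 3,
    ("Q3", "Vancouver", "computer"): 6,
}

def slice_cube(cube, quarter):
    """Slice: fix the time dimension at one value, leaving a city-by-item table."""
    return {(city, item): units
            for (q, city, item), units in cube.items() if q == quarter}

def dice_cube(cube, quarters, cities, items):
    """Dice: keep only the cells whose coordinates fall in the given subsets."""
    return {key: units for key, units in cube.items()
            if key[0] in quarters and key[1] in cities and key[2] in items}

# Slice: only data from the first quarter.
first_quarter = slice_cube(cube, "Q1")

# Dice: Q1 and Q2, Toronto and Vancouver, computers and home entertainment.
diced = dice_cube(cube,
                  quarters={"Q1", "Q2"},
                  cities={"Toronto", "Vancouver"},
                  items={"computer", "home entertainment"})
```

The slice has one dimension fewer than the cube, while the dice keeps all three dimensions but with smaller ranges.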
Online Analytical Processing (OLAP)
Drill down: View or manipulate a dimension at a lower level of detail.
Example: Drill down on the time dimension from quarters to months.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
Online Analytical Processing (OLAP)
Drill up: “Roll up” (aggregate) data to a higher level along a dimension.
Example: Sum up the cities by country.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
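Drill up and drill down are the same aggregation performed at different levels of a dimension hierarchy. A minimal sketch, assuming made-up facts stored at the month and city grain and hypothetical month-to-quarter and city-to-country hierarchies:

```python
from collections import defaultdict

# Finest-grain facts: (month, city, units_sold). Values are made up.
facts = [
    ("Jan", "Toronto", 5), ("Feb", "Toronto", 3), ("Apr", "Toronto", 7),
    ("Jan", "Chicago", 2), ("Mar", "Vancouver", 4),
]
# Dimension hierarchies: month -> quarter, city -> country.
QUARTER_OF = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
COUNTRY_OF = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

def aggregate(facts, time_level, place_level):
    """Sum units_sold after mapping each dimension value to the chosen level."""
    totals = defaultdict(int)
    for month, city, units in facts:
        totals[(time_level(month), place_level(city))] += units
    return dict(totals)

def same(value):
    """Keep a dimension at its finest level."""
    return value

# Starting view: quarterly figures per city.
by_quarter_city = aggregate(facts, QUARTER_OF.get, same)
# Drill up along location: sum the cities by country.
by_quarter_country = aggregate(facts, QUARTER_OF.get, COUNTRY_OF.get)
# Drill down along time: from quarters to months.
by_month_city = aggregate(facts, same, same)
```

Drilling down is only possible because the facts are stored at the finer grain. Figures that were loaded already summed by quarter could not be broken back into months.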
Online Analytical Processing (OLAP)
Drill across: Integrate data from more than one fact table.
Drill through: Access the database tables that underlie the OLAP cube.
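One common way to drill across is to aggregate each fact table separately along a dimension they share, then merge the results. The sketch below assumes two hypothetical fact tables, sales and returns, that share a product dimension. The data is made up.

```python
from collections import defaultdict

# Two hypothetical fact tables that share a product dimension.
sales_facts = [("laptop", 10), ("tv", 4), ("laptop", 5)]  # (product, units_sold)
returns_facts = [("laptop", 2), ("phone", 1)]             # (product, units_returned)

def total_by_product(facts):
    """Aggregate one fact table along the product dimension."""
    totals = defaultdict(int)
    for product, units in facts:
        totals[product] += units
    return totals

def drill_across(sales, returns):
    """Aggregate each fact table separately, then merge on the shared dimension."""
    sold = total_by_product(sales)
    returned = total_by_product(returns)
    products = sorted(set(sold) | set(returned))
    return {p: {"units_sold": sold.get(p, 0),
                "units_returned": returned.get(p, 0)}
            for p in products}

report = drill_across(sales_facts, returns_facts)
```

Aggregating before merging avoids double counting, which can happen if the two fact tables are joined row by row.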
Online Analytical Processing (OLAP)
Pivot: Rotate the axes (dimensions) to present a different view.
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
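For a two-dimensional view, a pivot amounts to swapping rows and columns. A minimal sketch on a made-up city-by-quarter table, stored as nested dictionaries:

```python
# A two-dimensional view: rows are cities, columns are quarters (made-up values).
by_city = {
    "Toronto": {"Q1": 8, "Q2": 7},
    "Vancouver": {"Q1": 4, "Q2": 6},
}

def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

# The same numbers, now with quarters as rows and cities as columns.
by_quarter = pivot(by_city)
```

The data itself does not change, only its presentation: pivoting twice returns the original table.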
OLAP Summary
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
DW Summary
http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
Plus: dashboards and scorecards
Cognos
Business intelligence (BI) tool from IBM.
Features: queries and reports, dashboards and scorecards, OLAP, and data mining (predictive analysis).
Cognos Business Intelligence 10 is available in the IBM Academic Cloud along with a sample data warehouse.
I will create student accounts.
Online tutorials.
Cognos demo.