Data Warehouses 1.  What is a data warehouse?  A multi-dimensional data model  Data warehouse architecture  From data warehousing to data mining 2

Download Data Warehouses 1.  What is a data warehouse?  A multi-dimensional data model  Data warehouse architecture  From data warehousing to data mining 2

Post on 18-Jan-2016




0 download

Embed Size (px)


<p>No Slide Title</p> <p>Organized around major subjects, such as customer, product, salesFocusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processingProvide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process4Data WarehouseSubject-OrientedConstructed by integrating multiple, heterogeneous data sourcesrelational databases, flat files, on-line transaction recordsData cleaning and data integration techniques are applied.Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sourcesE.g., Hotel price: currency, tax, breakfast covered, etc.When data is moved to the warehouse, it is converted. 5Data WarehouseIntegratedThe time horizon for the data warehouse is significantly longer than that of operational systemsOperational database: current value dataData warehouse data: provide information from a historical perspective (e.g., past 5-10 years)Every key structure in the data warehouseContains an element of time, explicitly or implicitlyBut the key of operational data may or may not contain time element</p> <p>6Data WarehouseTime VariantA physically separate store of data transformed from the operational environmentOperational update of data does not occur in the data warehouse environmentDoes not require transaction processing, recovery, and concurrency control mechanismsRequires only two operations in data accessing: initial loading of data and access of data7Data WarehouseNonvolatileOLTP (on-line transaction processing)Major task of traditional relational DBMSDay-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.OLAP (on-line analytical processing)Major task of data warehouse systemData analysis and decision makingDistinct features (OLTP vs. OLAP):User and system orientation: customer vs. marketData contents: current, detailed vs. historical, consolidatedDatabase design: ER + application vs. star + subjectView: current, local vs. evolutionary, integratedAccess patterns: update vs. read-only but complex queries8Data Warehouse vs. Operational DBMSOLTP vs. OLAP</p> <p>9What is a data warehouse? A multi-dimensional data modelData warehouse architectureFrom data warehousing to data mining10Data Warehousing and OLAP Technology: An OverviewA data warehouse is based on a multidimensional data model which views data in the form of a data cubeA data cube, such as sales, allows data to be modeled and viewed in multiple dimensionsDimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year) Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables11From Tables and Spreadsheets to Data Cubes11Modeling data warehouses: dimensions &amp; measuresStar schema: A fact table in the middle connected to a set of dimension tables Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake12Conceptual Modeling of Data Warehouses12 13Example of Star Schematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcitystate_or_provincecountrylocationSales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranch14Example of Star Schema</p> <p>15Example of Snowflake Schematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcity_keylocationSales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_salesMeasuresitem_keyitem_namebrandtypesupplier_keyitembranch_keybranch_namebranch_typebranchsupplier_keysupplier_typesuppliercity_keycitystate_or_provincecountrycity16Example of Snowflake Schema</p> <p>Allows data to be modeled and viewed in multiple dimensionsAnother representation like star schemaDimensions are entities which you want to keep recordsTime, item, branch, locationEach dimension has a table called dimension table17Data Cubes</p> <p>183-D Data Cube 4-D cubes can be visualized as a series of 3-D cubes</p> <p>Supplier 1Supplier 2Supplier 3A data cube is often referred as a cuboidCan generate a cubiod for all possible subsets of dimensionsProvide different level of summarizationN-D cube base cuboidLowest level of summary0-D cube apex cuboidHighest level of summarySummary over all dimensions 19Cuboid20Cube: A Lattice of Cuboidstime,itemtime,item,locationtime, item, location, supplieralltimeitemlocationsuppliertime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,suppliertime,location,supplieritem,location,supplier0-D(apex) cuboid1-D cuboids2-D cuboids3-D cuboids4-D(base) cuboid20Start with time-product (2-D) table that shows sale amounts</p> <p>21Ex. Data Cube GenTVPCVCR1st Qtr10008503502nd Qtr13529402983rd Qtr14506583144th Qtr1500965365USATVPCVCRTVPCVCRTVPCVCR1st Q1000850350260075042513008503502nd Q13529402981752860236120010004003rd Q1450658314105545852011505555104th Q15009653651350106539090075042522Ex. Data Cube Gen (3D)USACanadaMexicoA Sample Data Cube23Total annual salesof TV in U.S.A.DateProductCountryAll, All, Allsumsum TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum24Cuboids Corresponding to the Cubeallproductdatecountryproduct,dateproduct,countrydate, countryproduct, date, country0-D(apex) cuboid1-D cuboids2-D cuboids3-D(base) cuboidSales volume as a function of product, month, and region25Multidimensional DataProductRegionMonthDimensions: Product, Location, TimeHierarchical summarization pathsIndustry Region Year</p> <p>Category Country Quarter</p> <p>Product City Month Week</p> <p> Office DayA Concept Hierarchy: Dimension (location)26allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM. WindL. Chan..................allregionofficecountryTorontoFrankfurtcityRoll up (drill-up): summarize databy climbing up hierarchy or by dimension reductionDrill down (roll down): reverse of roll-upfrom higher level summary to lower level summary or detailed data, or introducing new dimensionsSlice and dice: project and select Pivot (rotate): reorient the cube, visualization, 3D to series of 2D planes27Typical OLAP Operations28</p> <p>What is a data warehouse? A multi-dimensional data modelData warehouse architectureFrom data warehousing to data mining29Chapter 3: Data Warehousing and OLAP Technology: An Overview30Data Warehouse: A Multi-Tiered ArchitectureDataWarehouseExtractTransformLoadRefreshOLAP EngineAnalysisQueryReportsData miningMonitor&amp;IntegratorMetadataData SourcesFront-End ToolsServeData MartsOperational DBsOthersourcesData StorageOLAP ServerData Marta subset of corporate-wide data that is of value to a specific groups of users. Its scope is confined to specific, selected groups, such as marketing data martMeta data is the data defining warehouse objects. It stores:Description of the structure of the data warehouseschema, view, dimensions, hierarchies, data mart locations and contentsMonitoring information warehouse usage statistics, error reportsAlgorithms used for summarization</p> <p>31Warehouse ArchitectureWhat is a data warehouse? A multi-dimensional data modelData warehouse architectureFrom data warehousing to data mining32Chapter 3: Data Warehousing and OLAP Technology: An OverviewThree kinds of data warehouse applicationsInformation processingsupports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphsAnalytical processingmultidimensional analysis of data warehouse datasupports basic OLAP operations, slice-dice, drilling, pivotingData miningknowledge discovery from hidden patterns supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools33Data Warehouse UsageWhat is a data warehouse? A multi-dimensional data modelData warehouse architectureFrom data warehousing to data miningSummary34Chapter 3: Data Warehousing and OLAP Technology: An OverviewWhy data warehousing?A multi-dimensional model of a data warehouseStar schema, snowflake schemaA data cube consists of dimensions &amp; measuresOLAP operations: drilling, rolling, slicing, dicing and pivotingData warehouse architecture35Summary: Data Warehouse and OLAP TechnologyOLTPOLAP</p> <p>functionday to day operationsdecision support</p> <p>DB designapplication-orientedsubject-oriented</p> <p>datacurrent, up-to-date</p> <p>detailed, flat relational</p> <p>isolatedhistorical, </p> <p>summarized, multidimensional</p> <p>integrated, consolidated</p> <p>accessread/write</p> <p>index/hash on prim. keylots of scans</p> <p>unit of workshort, simple transactioncomplex query</p> <p># records accessedtensmillions</p> <p>#usersthousandshundreds</p> <p>DB size100MB-GB100GB-TB</p>


View more >