What is a Data Warehouse? - University of smiertsc/4397cis/What_is_a_Data_

Post on 01-May-2018

TRANSCRIPT

  • What is a Data Warehouse?

    By Susan L. Miertschin

  • A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.

    https://www.business.auc.dk/oekostyr/file/What_is_a_Data_Warehouse.pdf

  • What is a Data Warehouse?

    A copy of transaction data specifically structured for query and analysis.

  • Data warehousing is the coordinated, architected, and periodic copying of data from various sources, both inside and outside the enterprise, into an environment optimized for analytical and informational processing.

    Alan Simon, Data Warehousing for Dummies

  • Business Intelligence (BI)

    BI implies thinking abstractly about the organization, reasoning about the business, and organizing large quantities of information about the business environment (p. 6 in Giovinazzo textbook).

    The purpose of BI is to define and execute a strategy.

  • Strategic Thinking

    Business strategist: always looking forward to see how the company can meet the objectives reflected in the mission statement.

    Successful companies do more than just react to the day-to-day environment. They understand the past and are able to predict and adapt to the future.

  • Business Intelligence Loop

    Business Intelligence encompasses the entire loop shown (Figure 1-1, p. 2, Giovinazzo).

    Data Storage + ETC = Data Warehouse

    Data Warehouse + Tools (yellow) = Decision Support System

    [Figure labels: Business Strategist; OLAP, Data Mining, Reports; Data Storage; Extraction, Transformation, Cleaning; CRM, Accounting, Finance, HR]

  • The Data Warehouse

    [Figure 1-2, p. 9, Giovinazzo. Labels: Decision Support Systems; Central Repository; Data; Metadata; Data Administration; Extraction Log; Dependent Data Mart; Cleansing/Transformation; Extraction; Extraction Store; External Source; Independent Data Mart; Operational Environment]

  • Data Path

    The path to get data from the operational environment to the business strategist is complex.

    There is much more to a data warehouse than the Central Repository.

    [Diagram labels: Decision Support Systems; Central Repository; Data; Metadata; Data Administration; Extraction Log; Dependent Data Mart; Cleansing/Transformation; Extraction; Extraction Store; External Source; Independent Data Mart; Operational Environment]

  • Operational Environment

    The operational environment runs the day-to-day activities of the organization.

    Its systems contain raw transactional data.

    The data describes the current state of the organization.

    [Diagram labels: Cleansing/Transformation; Extraction; Extraction Store; Independent Data Mart; Operational Environment]

  • Independent Data Mart

    A data mart focuses on one subject area within the organization.

    A data warehouse focuses on the entire organization.

    [Diagram labels: Cleansing/Transformation; Extraction; Extraction Store; Independent Data Mart; Operational Environment]

  • Extraction

    The extraction engine retrieves/receives data from the operational environment.

    Data from other external sources may also be collected during extraction.

    [Diagram labels: Cleansing/Transformation; Extraction; Extraction Store; External Source]

  • Extraction Store

    Holding area for the collected data until it can be cleaned and transformed into the correct format.

    [Diagram labels: Cleansing/Transformation; Extraction; Extraction Store; External Source]

  • Transformation/Cleansing

    Scrubbing = data transformation + cleansing

    Transformation = converting data to a common format

    Cleansing = removing errors from data

    [Diagram labels: Cleansing/Transformation; Extraction; Extraction Store; External Source]

  • The Extraction Log

    The extraction log records the success/failure of the extraction process steps (+ more).

    The log is part of the Metadata.

    It is used to verify the quality of data placed in the warehouse.

    [Diagram labels: Central Repository; Data Administration; Data; Metadata; Extraction Log; Dependent Data Mart]
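As an illustration of what such a log might record, here is a minimal Python sketch. The class `ExtractionLog`, the helper `run_step`, and the step names are hypothetical and not taken from the slides; a real warehouse would keep the log with its metadata rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExtractionLog:
    """Hypothetical in-memory extraction log (assumption, not from the slides)."""
    entries: list = field(default_factory=list)

    def record(self, step, ok, rows=0, detail=""):
        # One entry per extraction step: what ran, whether it worked, and when.
        self.entries.append({
            "step": step,
            "ok": ok,
            "rows": rows,
            "detail": detail,
            "at": datetime.now(timezone.utc),
        })

    def failures(self):
        return [e for e in self.entries if not e["ok"]]


def run_step(log, step, func):
    """Run one extraction step and log its success or failure."""
    try:
        rows = func()
    except Exception as exc:
        log.record(step, ok=False, detail=str(exc))
        return []
    log.record(step, ok=True, rows=len(rows))
    return rows


def failing_source():
    # Stand-in for an external source that cannot be reached.
    raise ConnectionError("source unavailable")


log = ExtractionLog()
orders = run_step(log, "extract_orders", lambda: [{"id": 1}, {"id": 2}])
crm = run_step(log, "extract_crm", failing_source)
failed_steps = [e["step"] for e in log.failures()]
```

Checking `log.failures()` before loading is one simple way to use the log to verify the quality of what reaches the warehouse.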

  • Central Repository

    Cornerstone of the data warehouse architecture.

    Stores all the data and metadata for the data warehouse.

    [Diagram labels: Central Repository; Data Administration; Data; Metadata; Extraction Log; Dependent Data Mart]

  • Dependent Data Mart

    Different from an independent data mart.

    A dependent data mart relies on the data warehouse as the source of its data.

    [Diagram labels: Central Repository; Data Administration; Data; Metadata; Extraction Log; Dependent Data Mart]

  • Business Intelligence Infrastructure

    [Diagram labels: Decision Support Systems; Central Repository; Data; Metadata; Data Administration; Extraction Log; Dependent Data Mart; Cleansing/Transformation; Extraction; Extraction Store; External Source; Independent Data Mart; Operational Environment]

  • Subject Orientation of DW

    Data Warehouse: subject oriented. Focuses on what things drive operational transactions.

    Operational Database: focuses on day-to-day transactions. Normalized and optimized for this purpose.

  • Data Warehouse vs. Operational Db

    Data Warehouse: gathers distributed data together into one place. Facilitates analysis processes.

    Operational Database: data is distributed across multiple tables within an application and across multiple applications.

  • Integrating Transactional Data into DW

    Most time-consuming and problematic process.

    Two steps: data transformation and data cleansing.

    [Diagram labels: Cleansing/Transformation; Extraction; Extraction Store; External Source]

  • Data Cleansing

    Remove errors from data extracted from the operational environment.

    Critical.

    What should be done with data that contains errors?

    Send the data back to be fixed at the operational level and resubmitted.

    Fix the data and inform the operational system of the errors.
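The two options for handling bad data can be sketched side by side in Python. This is a minimal illustration under stated assumptions: the `quantity` field, the validity rule (a non-negative integer), and the `fixers` mapping are all hypothetical and not from the slides.

```python
def cleanse(rows, fixers=None):
    """Split extracted rows into clean rows, rejected rows, and fix notices.

    fixers maps a field name to a function that repairs a bad value,
    or returns None when it cannot (hypothetical convention).
    """
    fixers = fixers or {}
    clean, rejected, notices = [], [], []
    for row in rows:
        qty = row.get("quantity")
        if isinstance(qty, int) and qty >= 0:
            clean.append(row)
            continue
        fix = fixers.get("quantity")
        repaired = fix(qty) if fix else None
        if repaired is None:
            # Option 1: send the row back to the operational level.
            rejected.append(row)
        else:
            # Option 2: fix the value and tell the source system about it.
            clean.append({**row, "quantity": repaired})
            notices.append((row["id"], "quantity", qty, repaired))
    return clean, rejected, notices


def parse_digits(value):
    # Repairs numeric strings such as " 4 "; gives up on anything else.
    if isinstance(value, str) and value.strip().isdigit():
        return int(value)
    return None


rows = [
    {"id": 1, "quantity": 3},
    {"id": 2, "quantity": " 4 "},
    {"id": 3, "quantity": -1},
]
clean, rejected, notices = cleanse(rows, {"quantity": parse_digits})
```

Here row 2 is repaired and reported, while row 3 cannot be repaired and goes back to the source.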

  • Data Transformation

    The operational environment consists of numerous applications and databases.

    Data definitions will not be consistent.

    The format must be consistent for the DW.

    Four issues to address: Description, Encoding, Units of Measure, Format.

  • Description

    The same things may be described differently across systems.

    Map each different description into a single description.

    Example: customer, client, user
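A minimal Python sketch of such a mapping follows. The field names (`customer_id`, `client_id`, `user_id`) and the choice of `customer_id` as the single warehouse name are assumptions made for illustration only.

```python
# Hypothetical source field names that all describe the same thing.
DESCRIPTION_MAP = {
    "customer_id": "customer_id",
    "client_id": "customer_id",
    "user_id": "customer_id",
}


def unify_description(record):
    """Return a copy of the record using the warehouse's single field name."""
    unified = {}
    for key, value in record.items():
        # Unknown fields pass through unchanged.
        unified[DESCRIPTION_MAP.get(key.lower(), key)] = value
    return unified


from_billing = unify_description({"Client_ID": 7, "total": 10})
from_web = unify_description({"user_id": 7, "page": "home"})
```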

  • Encoding

    Nominal scale: a number or letter assigned as a label, a category name. The ordering is arbitrary.

    Example: R = Red, Red = Red, 36 = Red; B = Blue, Blue = Blue, 45 = Blue
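The example codes above can be reconciled with one lookup table per source system. In this Python sketch the system names (`system_a`, `system_b`, `system_c`) are hypothetical; only the codes themselves come from the slide.

```python
# One code table per (hypothetical) source system, all mapping to the
# warehouse's single encoding.
ENCODINGS = {
    "system_a": {"R": "Red", "B": "Blue"},
    "system_b": {"Red": "Red", "Blue": "Blue"},
    "system_c": {36: "Red", 45: "Blue"},
}


def decode_color(system, code):
    """Translate a source-specific code into the warehouse value."""
    try:
        return ENCODINGS[system][code]
    except KeyError:
        raise ValueError(f"unknown code {code!r} for {system}") from None


decoded = [
    decode_color("system_a", "R"),
    decode_color("system_b", "Red"),
    decode_color("system_c", 36),
]
```

Raising on an unknown code, rather than passing it through, keeps unmapped labels from slipping into the warehouse unnoticed.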

  • Units of Measure

    The measurement system must be common.

    Example: values in metric vs. English units.

    Precision can cause problems.

    Example: 1/3 = .333 = .3333333333333
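Both points can be illustrated with a short Python sketch. The function `to_centimeters` and the choice of centimeters as the common unit are assumptions for illustration; the conversion factor of 2.54 cm per inch is exact by definition.

```python
from decimal import Decimal
from fractions import Fraction

CM_PER_INCH = Decimal("2.54")  # exact by international definition


def to_centimeters(value, unit):
    """Convert a length to the (assumed) common warehouse unit, centimeters."""
    amount = Decimal(str(value))
    if unit == "cm":
        return amount
    if unit == "in":
        return amount * CM_PER_INCH
    raise ValueError(f"unsupported unit: {unit!r}")


length_cm = to_centimeters(10, "in")

# Two systems storing "one third" at different precisions do not agree,
# and neither matches the exact value.
third_short = Decimal("0.333")
third_long = Decimal("0.3333333333333")
third_exact = Fraction(1, 3)
same_value = third_short == third_long
```

Comparing `third_short` with `third_long` shows why values that look equivalent on paper can fail to match when sources are merged.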

  • Format

    Different operational systems store data in different formats.

    Example: SS#: 999999999 stored as Char(9) in one system and Int in another.
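One practical consequence is that an integer column drops leading zeros that a Char(9) column keeps. The sketch below assumes the warehouse standardizes on a nine-character digit string; `normalize_ssn` is a hypothetical helper, not something named in the slides.

```python
def normalize_ssn(value):
    """Return a 9-character digit string from a Char(9) or Int source value."""
    if isinstance(value, int):
        # Restore any leading zeros lost by the integer representation.
        text = f"{value:09d}"
    else:
        text = str(value).strip()
    if len(text) != 9 or not text.isdigit():
        raise ValueError(f"not a 9-digit value: {value!r}")
    return text


from_char = normalize_ssn("012345678")
from_int = normalize_ssn(12345678)
```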

  • Is Data a Political Issue?

    It can be.

    Operational-level designers work hard to make the data match their needs.

    Arguments can arise over whether customer name should be 30 characters or 32 characters long.

  • DW Contains a Snapshot

    Once data is placed in the central repository it becomes read-only.

    Time becomes an important dimension for the data.

    DW: a series of organizational snapshots over time.
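The snapshot idea can be sketched as an append-only store in which every row carries the date it was loaded. The class `SnapshotStore`, its methods, and the sample account data are hypothetical illustrations, not the slides' design.

```python
from datetime import date
from types import MappingProxyType


class SnapshotStore:
    """Append-only store: rows are added with a date and never updated."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, records):
        # Stamp each record with the snapshot date and expose it read-only.
        for record in records:
            row = dict(record, snapshot_date=snapshot_date)
            self._rows.append(MappingProxyType(row))

    def as_of(self, snapshot_date):
        return [r for r in self._rows if r["snapshot_date"] == snapshot_date]

    def history(self, key, value):
        # All snapshots of one entity, ordered along the time dimension.
        matches = [r for r in self._rows if r.get(key) == value]
        return sorted(matches, key=lambda r: r["snapshot_date"])


store = SnapshotStore()
store.load(date(2018, 4, 30), [{"account": "A-1", "balance": 100}])
store.load(date(2018, 3, 31), [{"account": "A-1", "balance": 75}])
balances = [r["balance"] for r in store.history("account", "A-1")]
```

Because each load adds rows instead of overwriting them, the same account can be compared across dates, which an operational system holding only the current state cannot offer.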

  • Decision Support Systems (DSS)

    Data placed in the data warehouse must be easy to access for business strategists. It must be timely and support their mission.

    Three common DSS tools: Reports, OLAP, Data Mining.

  • Reports

    One of the most basic DSS tools. Reports present summary information.

    A reporting tool should support: rapid development, easy maintenance, easy distribution, Internet enabled.

  • On-Line Analytical Processing (OLAP)

    The OLAP environment allows the business strategist to interact directly with data.

    An OLAP tool should support: multiple dimensional presentation of data, rotation {Data Cube}, drill down / roll up, what-if analysis.
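Roll up and drill down can be approximated in plain Python by aggregating the same facts over fewer or more dimensions. This is only a rough sketch: the `FACTS` rows, the dimension names, and the `aggregate` helper are invented for illustration, and reordering the dimensions is a loose stand-in for rotating a data cube.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions and one measure.
FACTS = [
    {"year": 2017, "region": "East", "product": "A", "sales": 100},
    {"year": 2017, "region": "West", "product": "A", "sales": 150},
    {"year": 2018, "region": "East", "product": "A", "sales": 120},
    {"year": 2018, "region": "East", "product": "B", "sales": 80},
]


def aggregate(facts, dims, measure="sales"):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[d] for d in dims)] += fact[measure]
    return dict(totals)


by_year = aggregate(FACTS, ["year"])                   # roll up
by_year_region = aggregate(FACTS, ["year", "region"])  # drill down
by_region_year = aggregate(FACTS, ["region", "year"])  # rotated view
```

Dedicated OLAP tools precompute and index such aggregates; the sketch only shows the relationship between the views.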

  • Data Mining

    "Data Mining is defined as a process of identifying hidden patterns and relationships within data." (Robert Groth)

    Business strategists use OLAP to help them find answers to their questions.

    Data mining supplies answers without knowing the questions (sometimes).

  • In Summary

    DW is the heart of BI.

    DW is more than just an archive of operational data.

    Data must be formed according to business needs for strategic information.

    Time becomes a dimension of the data.

    DSS tools are used to analyze the data.
