What is a Data Warehouse?

Post on 01-May-2018

<ul>
<li><p>What is a Data Warehouse?</p><p>By Susan L. Miertschin</p></li>
<li><p>A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.</p><p>https://www.business.auc.dk/oekostyr/file/What_is_a_Data_Warehouse.pdf</p></li>
<li><p>What is a Data Warehouse?</p><p>“A copy of transaction data specifically structured for query and analysis”</p></li>
<li><p>Data warehousing is the coordinated, architected, and periodic copying of data from various sources, both inside and outside the enterprise, into an environment optimized for analytical and informational processing.</p><p>Alan Simon, Data Warehousing for Dummies</p></li>
<li><p>Business Intelligence (BI)</p><p>Implies thinking abstractly about the organization, reasoning about the business, and organizing large quantities of information about the business environment (p. 6 in Giovinazzo textbook)</p><p>Purpose of BI is to define and execute a strategy</p></li>
<li><p>Strategic Thinking</p><p>Business strategist: always looking forward to see how the company can meet the objectives reflected in the mission statement</p><p>Successful companies:</p><p>Do more than just react to the day-to-day environment</p><p>Understand the past</p><p>Are able to predict and adapt to the future</p></li>
<li><p>Business Intelligence Loop</p><p>Business Intelligence encompasses the entire loop shown</p><p>Data Storage + ETC (Extraction, Transformation, Cleaning) = Data Warehouse</p><p>Data Warehouse + Tools (yellow) = Decision Support System</p><p>Figure 1-1, p. 2, Giovinazzo. Diagram labels: Business Strategist; OLAP, Data Mining, Reports; Data Storage; Extraction, Transformation, Cleaning; CRM, Accounting, Finance, HR</p></li>
<li><p>The Data Warehouse</p><p>Figure 1-2, p. 9, Giovinazzo. Diagram labels: Decision Support Systems; Central Repository (Data, Metadata, Data Administration, Extraction Log); Dependent Data Mart; Cleansing/Transformation; Extraction; Extraction Store; External Source; Independent Data Mart; Operational Environment</p></li>
<li><p>Data Path</p><p>The path to get data from the operational environment to the business strategist is complex</p><p>There is much more to a data warehouse than the Central Repository</p></li>
<li><p>Operational Environment</p><p>The operational environment runs the day-to-day activities of the organization</p><p>Systems contain raw data: transactional data</p><p>Data describes the current state of the organization</p></li>
<li><p>Independent Data Mart</p><p>A data mart focuses on one subject area within the organization</p><p>A data warehouse focuses on the entire organization</p></li>
<li><p>Extraction</p><p>The extraction engine retrieves/receives data from the operational environment</p><p>Data from other external sources may also be collected during extraction</p></li>
<li><p>Extraction Store</p><p>Holding area for the collected data until it can be cleaned and transformed into the correct format</p></li>
<li><p>Transformation/Cleansing</p><p>Scrubbing = data transformation + cleansing</p><p>Transformation = converting data to a common format</p><p>Cleansing = removing errors from data</p></li>
<li><p>The Extraction Log</p><p>The extraction log records success/failure of extraction process steps (+ more)</p><p>The log is part of the metadata</p><p>Used to verify the quality of data placed in the warehouse</p></li>
<li><p>Central Repository</p><p>Cornerstone of the data warehouse architecture</p><p>Stores all the data and metadata for the data warehouse</p></li>
<li><p>Dependent Data Mart</p><p>Different from an independent data mart</p><p>A dependent data mart relies on the data warehouse as the source of its data</p></li>
<li><p>Business Intelligence Infrastructure</p><p>Diagram labels: Decision Support Systems; Central Repository (Data, Metadata, Data Administration, Extraction Log); Dependent Data Mart; Cleansing/Transformation; Extraction; Extraction Store; External Source; Independent Data Mart; Operational Environment</p></li>
<li><p>Subject Orientation of DW</p><p>Data Warehouse: subject oriented; focuses on the things that drive operational transactions</p><p>Operational Database: focuses on day-to-day transactions; normalized and optimized for this purpose</p></li>
<li><p>Data Warehouse vs. Operational Db</p><p>Data Warehouse: gathers distributed data together into one place; facilitates analysis processes</p><p>Operational Database: data is distributed across multiple tables within an application and across multiple applications</p></li>
<li><p>Integrating Transactional Data into DW</p><p>Most time-consuming and problematic process</p><p>Two steps:</p><p>Data transformation</p><p>Data cleansing</p></li>
<li><p>Data Cleansing</p><p>Remove errors from data extracted from the operational environment</p><p>Critical</p><p>What should be done with data that contains errors?</p><p>Send the data back to be fixed at the operational level and resubmitted</p><p>Fix the data and inform the operational system of the errors</p></li>
<li><p>Data Transformation</p><p>The operational environment consists of numerous applications and databases</p><p>Data definitions will not be consistent</p><p>The format must be consistent for the DW</p><p>Four issues to address:</p><p>Description</p><p>Encoding</p><p>Units of Measure</p><p>Format</p></li>
<li><p>Description</p><p>The same things may be described differently across systems</p><p>Map each different description into a single description</p><p>Example: customer, client, user</p></li>
<li><p>Encoding</p><p>Nominal scale: a number or letter assigned as a label, a category name; ordering is arbitrary</p><p>Example: R = Red, Red = Red, 36 = Red</p><p>Example: B = Blue, Blue = Blue, 45 = Blue</p></li>
<li><p>Units of Measure</p><p>The measurement system must be common</p><p>Example: values in metric vs. English units</p><p>Precision can cause problems</p><p>Example: 1/3 = .333 = .3333333333333</p></li>
<li><p>Format</p><p>Different operational systems store data in different formats</p><p>Example: SS# 999999999 stored as Char(9) in one system and as Int in another</p></li>
<li><p>Is Data a Political Issue?</p><p>Can be</p><p>Operational-level designers work hard to make the data match their needs</p><p>Arguments can arise over whether customer name should be 30 characters or 32 characters long</p></li>
<li><p>DW Contains a Snapshot</p><p>Once data is placed in the central repository it becomes read-only</p><p>Time becomes an important dimension for the data</p><p>DW: a series of organizational snapshots over time</p></li>
<li><p>Decision Support Systems (DSS)</p><p>Data placed in the data warehouse must be easy to access for business strategists</p><p>Timely</p><p>Support their mission</p><p>Three common DSS tools:</p><p>Reports</p><p>OLAP</p><p>Data Mining</p></li>
<li><p>Reports</p><p>One of the most basic DSS tools</p><p>Present summary information</p><p>A reporting tool should support:</p><p>Rapid development</p><p>Easy maintenance</p><p>Easy distribution</p><p>Internet enabled</p></li>
<li><p>On-Line Analytical Processing (OLAP)</p><p>The OLAP environment allows the business strategist to interact directly with data</p><p>An OLAP tool should support:</p><p>Multiple dimensional presentation of data</p><p>Rotation {Data Cube}</p><p>Drill down / Roll up</p><p>What-if analysis</p></li>
<li><p>Data Mining</p><p>“Data Mining is defined as a process of identifying hidden patterns and relationships within data.” Robert Groth</p><p>Business strategists use OLAP to help them find answers to their questions</p><p>Data mining supplies answers without knowing the questions (sometimes)</p></li>
<li><p>In Summary</p><p>DW is the heart of BI</p><p>DW is more than just an archive of operational data</p><p>Data must be formed according to business needs for strategic information</p><p>Time becomes a dimension of the data</p><p>DSS tools are used to analyze the data</p></li>
</ul>
