
Post on 30-Dec-2015


GIS Data Quality

Producing better data quality through robust business processes

BrightStar Training

Kim Ollivier

Schedule Day 2

Suggested breaks for the following times:

Start: 9:00
Session 1 (90 min)
Morning tea: 10:30 to 10:45
Session 2 (105 min)
Lunch: 12:30 to 1:30
Session 3 (90 min)
Afternoon tea: 3:00 to 3:15
Session 4 (105 min)
Finish: 5:00

Each session will have an exercise or interactive discussion

Topics

•Metadata
•Designing rules
•Data warehouse and ETL
•Feature maintenance

Metadata

•Data model
•Business rules, relations, state
•Subclasses (lookup tables)
•GIS Metadata NZGLS and ISO XML
•Readme.txt or readme.html

Metadata

•Which standard?
•ISO 19115, NZGMS
•Australia: asdd.ga.gov.au

Examine Metadata

•Geospatial metadata
•Benefit to users or producer?
•How do we collect it?
•Standardisation or not?
•metadata\topo250k_metadata.html
•metadata\DCW_DQ_Project.htm
•metadata\meta.html

Morning Tea

Data Quality Rules

•Attribute domain constraints
•Relational integrity rules
•Rules for historical data
•Rules for state-dependent objects
•General dependency rules
•Spatial feature rules

A GIS Data Quality System

[Diagram. Stages: Assess, Improve, Prevent, Recognise, Monitor. Components: Data Quality Assessment; Data Profiling; Data Cleaning; Monitoring Data Integration Interfaces; Ensuring Quality of Data Conversion and Consolidation; Building Data Quality Metadata Warehouse; Recurrent Data Quality Assessment.]

Assessing Quality

•Project steps
•Required roles
•Defining the objectives
•Designing rules
•Scorecard and Metadata
•Frequency of assessment

Building Rules

•Data profiling
•Interview users
•Examine data model
•Data Gazing
•Application v data matrix

Attribute Domain Constraints

•Lookup tables
•Numeric ranges
•Null values
•Blank values
•Format constraints
•Precision
•Complex domain constraints
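As a rough illustration of how several of these constraints might be scripted, here is a minimal Python sketch. The field names (`owner`, `land_use`, `area_ha`, `parcel_id`), the lookup values, the range limit and the identifier pattern are all assumptions made up for the example; they are not taken from the course data.

```python
import re

# Hypothetical lookup table and format; invented for this example only.
LAND_USE_CODES = {"RES", "COM", "IND", "RUR"}
PARCEL_ID_FORMAT = re.compile(r"^\d{5}-\d{3}$")


def check_record(record):
    """Return a list of domain-constraint violations for one record (a dict)."""
    problems = []

    # Null versus blank: a missing value and an empty string are reported separately.
    owner = record.get("owner")
    if owner is None:
        problems.append("owner is null")
    elif owner.strip() == "":
        problems.append("owner is blank")

    # Lookup table: the code must be one of the permitted values.
    if record.get("land_use") not in LAND_USE_CODES:
        problems.append("land_use not in lookup table")

    # Numeric range: area must be positive and below an assumed upper bound.
    area = record.get("area_ha")
    if area is None or not (0 < area <= 100000):
        problems.append("area_ha out of range")

    # Format constraint: the identifier must match the assumed pattern.
    if not PARCEL_ID_FORMAT.match(record.get("parcel_id") or ""):
        problems.append("parcel_id badly formatted")

    return problems
```

Running `check_record` over every row and counting the messages gives a simple per-rule tally that could feed a scorecard.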

Relational Integrity Rules

•Identity rule
•Reference rules
•Cardinal rules
•Inheritance rules
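The first three rule types can be tested with very little code once the tables are loaded as lists of dictionaries. The sketch below is one possible approach, not a prescribed method; the function names and the parcel/title tables in the usage note are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Identity rule: return key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphan_rows(child_rows, fk, parent_rows, pk):
    """Reference rule: return child rows whose foreign key has no parent."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


def cardinality_violations(child_rows, fk, parent_rows, pk, minimum=1, maximum=None):
    """Cardinal rule: return parent keys with too few or too many children."""
    counts = Counter(row[fk] for row in child_rows)
    bad = []
    for row in parent_rows:
        n = counts.get(row[pk], 0)
        if n < minimum or (maximum is not None and n > maximum):
            bad.append(row[pk])
    return bad
```

For example, with a hypothetical `parcels` table keyed on `id` and a `titles` table carrying `parcel_id`, `orphan_rows(titles, "parcel_id", parcels, "id")` lists titles that point at a parcel that does not exist.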

Historical Data

•Time dependent attribute
•Value constraints
•Rates of change
•Volatility
•Continuity
•Granularity
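Continuity and rate-of-change rules lend themselves to a simple pass over consecutive records. The following sketch assumes a yearly series of `(year, value)` pairs already sorted by year, and an arbitrary 50% limit on year-to-year change; both the granularity and the threshold are assumptions for illustration.

```python
def history_problems(series, max_change=0.5):
    """Check a sorted list of (year, value) pairs for gaps and sudden jumps."""
    problems = []
    for (y1, v1), (y2, v2) in zip(series, series[1:]):
        if y2 - y1 != 1:
            # Continuity: every year should be present exactly once.
            problems.append(f"gap between {y1} and {y2}")
        elif v1 and abs(v2 - v1) / abs(v1) > max_change:
            # Rate of change: relative change larger than the assumed limit.
            problems.append(f"change from {y1} to {y2} exceeds {max_change:.0%}")
    return problems
```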

State-dependent Objects

•State-transition models
•States, terminators
•Actions

[State-transition diagram. States: start, Active (A), On Leave (L), Terminated (T), Retired (R), Deceased (D).]
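A state-transition model can be turned into a data quality rule by listing the permitted moves and checking each object's history against them. The state codes below follow the diagram, but the arrows did not survive in this transcript, so the `ALLOWED` table is an assumption chosen only to make the sketch runnable.

```python
# Assumed transitions; the real model's arrows are not recoverable from the slide.
ALLOWED = {
    "start": {"A"},
    "A": {"L", "T", "R", "D"},
    "L": {"A", "T", "R", "D"},
    "T": {"A"},   # assumption: a terminated employee may be rehired
    "R": {"D"},
    "D": set(),   # terminator: no further transitions
}


def invalid_transitions(history):
    """Return (position, from_state, to_state) for each move not in ALLOWED."""
    bad = []
    previous = "start"
    for position, state in enumerate(history):
        if state not in ALLOWED.get(previous, set()):
            bad.append((position, previous, state))
        previous = state
    return bad
```

A history such as `["A", "D", "A"]` would be flagged at its last step, because nothing is allowed to follow the terminator state.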

Event Histories

•An object may have many events
•Event Overlaps
•Event Frequencies
•Event Conditions
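An event-overlap rule might be checked as follows. This is a sketch that assumes each event for one object is a `(start, end)` pair of dates with the end date exclusive, so an event may begin on the day the previous one ends.

```python
from datetime import date


def overlapping_events(events):
    """Report each event that starts before an earlier event has ended.

    Each offending event is paired with the earlier event that ends last.
    """
    overlaps = []
    latest = None  # the event with the latest end date seen so far
    for event in sorted(events):
        if latest is not None and event[0] < latest[1]:
            overlaps.append((latest, event))
        if latest is None or event[1] > latest[1]:
            latest = event
    return overlaps
```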

Spatial Rules

•Projection, units
•Dimensions 2D, 3D, M, Z
•Point, line, polygon
•Precision
•Topology
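Some spatial rules can be screened without a GIS library. The sketch below checks a single polygon ring, given as a list of coordinate tuples, for consistent dimension, closure and an expected extent; an extent test is a cheap way to catch features stored in the wrong projection or units. The function name and the `bounds` tuple layout `(xmin, ymin, xmax, ymax)` are assumptions for this example, and full topology checks would still need proper GIS tooling.

```python
def ring_problems(ring, bounds, dimensions=2):
    """Return a list of rule violations for one polygon ring."""
    problems = []

    # Dimension: every vertex should carry the same number of ordinates.
    if any(len(pt) != dimensions for pt in ring):
        problems.append("mixed or wrong coordinate dimension")

    # Geometry type: a closed ring needs at least 4 vertices (first == last).
    if len(ring) < 4:
        problems.append("fewer than 4 vertices")
    if ring and ring[0] != ring[-1]:
        problems.append("ring not closed")

    # Projection and units: coordinates should fall inside the expected extent.
    xmin, ymin, xmax, ymax = bounds
    if any(len(pt) >= 2 and not (xmin <= pt[0] <= xmax and ymin <= pt[1] <= ymax)
           for pt in ring):
        problems.append("vertex outside expected extent")

    return problems
```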

Valuation Roll

•Legacy structure, 50 years old
•Variable maintenance standard
•Valuer General audit (DQ spec)

Rules Exercise

•Split into pairs
•Examine sample DVR dataset
•Devise some rules for each category
•Verbal discussion with class

Lunch

Data Warehouse & ETL

•Why not direct access to online DB?
•Staging Area
•Scripting tools
•Trade-offs
•KPI for project
   •better quality than source
   •better quality than target

ETL Extract

•Extract

ETL Transform

•The importance of primary keys
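One reason primary keys matter in the transform step is that they let a script match each staged record to its counterpart in the target and classify the change. The sketch below shows that idea under simple assumptions: both tables fit in memory as lists of dictionaries and share a single-column key. The function name is hypothetical.

```python
def diff_by_key(source_rows, target_rows, key):
    """Classify changes between source and target by primary key.

    Returns three sorted lists of key values: inserts, updates, deletes.
    """
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    inserts = sorted(source.keys() - target.keys())   # only in source
    deletes = sorted(target.keys() - source.keys())   # only in target
    updates = sorted(k for k in source.keys() & target.keys()
                     if source[k] != target[k])       # in both, but changed
    return inserts, updates, deletes
```

Without a stable key the same comparison falls back on matching by attribute values or geometry, which is slower and far less reliable.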

ETL Load

•Batch offline most common
•Daily status usually enough

Safe Software FME

•Examples

Afternoon Tea

Data Quality Team

[Diagram: IT, DQ Team, Users]

Maintenance of features

•Time series important
•Line/polygon features are not atomic
•Splitting loses inheritance
•Calculating depreciation
•Direct editing bypasses business rules

Maintenance of the Quality

•Gardening, not mountain climbing
•Discussion of course topics

References

•Data Quality: The Accuracy Dimension, Jack E. Olson
•The Data Warehouse ETL Toolkit, Ralph Kimball

Please fill in evaluation forms

Finish
