Data Quality Class 4: Goals, Questions, Review of SQL Select, Data Quality Rules
Post on 21-Dec-2015
Data Quality
Class 4
Goals
- Questions
- Review of SQL select
- Data Quality Rules
SQL
- Structured Query Language
- Used to extract data from databases
- Used to insert data into a database
The Select Statement
select [all | distinct] <select_list>
from [<table_name> | <view_name>] [, [<table_name> | <view_name>] . . .]
[where <search_condition>]
[group by <column_name> [, <column_name>] . . .]
[having <search_conditions>]
[order by {<column_name> | <select_list_number>} [asc | desc]
[, {<column_name> | <select_list_number>} [asc | desc]] . . .]
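The clauses in the syntax above can be tried out with a small sketch. It uses Python's built-in `sqlite3` module and an in-memory database; the `orders` table, its columns, and its rows are hypothetical and exist only to exercise `where`, `group by`, `having`, and `order by`.

```python
import sqlite3

# Hypothetical table and rows, used only to exercise the clauses.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (customer text, item_class text, total real)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("ann", "CLOTHING", 20.0),
        ("ann", "BOOKS", 15.0),
        ("bob", "BOOKS", 5.0),
        ("cat", "CLOTHING", 40.0),
    ],
)

# where filters rows, group by forms one group per customer,
# having filters whole groups, and order by sorts the result.
query = """
    select customer, sum(total) as spent
    from orders
    where total > 0
    group by customer
    having sum(total) >= 20
    order by spent desc
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Here `bob` is dropped by the `having` clause because his group total is below 20, while the `where` clause removes nothing since every row has a positive total.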
Data Quality Rules
- Definitions
- Proscriptive Assertions
- Prescriptive Assertions
- Conditional Assertions
- Operational Assertions
Definitions
- Nulls
- Domains
- Mappings
Proscriptive Assertions
- Describe what is not allowed
- Used to figure out what is wrong with data
- Used for validation
Prescriptive Assertions
- Describe what is supposed to happen with data
- Can be used for data population, extraction, transformation
- Can also be used for validation
Conditional Assertions
Define an assertion that must be true if a condition is true
Operational Assertions
Define an action that must be taken if a condition is true
9 Classes of Rules
1. Null value rules
2. Value rules
3. Domain membership rules
4. Domain mappings
5. Relation rules
6. Table, cross-table, and cross-message assertions
7. In-process directives
8. Operational directives
9. Other rules
Null Value Rules
- Null value specification
  - Define GETDATE for unavailable as “fill in date”
- Null values allowed
  - Attribute A allowed nulls {GETDATE, U, X}
- Null values not allowed
  - Attribute B nulls not allowed
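One possible reading of these three rule types, sketched as a record check. The function names, the dict-based record, and the treatment of `None` and the empty string as "missing" are assumptions made for illustration; only the set `{GETDATE, U, X}` and the attribute names A and B come from the slide.

```python
# Null representations from the slide's example for attribute A.
ALLOWED_NULLS_A = {"GETDATE", "U", "X"}


def is_system_null(value):
    """True for a missing value (assumed here: None or empty string)."""
    return value is None or value == ""


def check_null_rules(record):
    """Return a list of null-rule violations for one record (a dict)."""
    violations = []
    # Attribute A: a missing value is acceptable only when it is recorded
    # with one of the declared null representations.
    if is_system_null(record.get("A")):
        violations.append("A: null not recorded with a declared representation")
    # Attribute B: neither missing values nor null representations allowed.
    b = record.get("B")
    if is_system_null(b) or b in ALLOWED_NULLS_A:
        violations.append("B: nulls not allowed")
    return violations


print(check_null_rules({"A": "U", "B": "x1"}))
print(check_null_rules({"A": "", "B": "X"}))
```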
Value Rules
- Value restriction rule
  - Restrict GRADE: value >= ‘A’ AND value <= ‘F’ AND value != ‘E’
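The GRADE restriction can be written as a small predicate. This is a sketch only: the function name is hypothetical, and the single-character check is an added assumption, since a plain string comparison would also accept values such as "AB".

```python
def grade_is_valid(value):
    """Apply the slide's restriction: 'A' <= value <= 'F' and value != 'E'."""
    # Assumption: a grade is exactly one character long.
    if not isinstance(value, str) or len(value) != 1:
        return False
    return "A" <= value <= "F" and value != "E"


grades = ["A", "C", "E", "F", "G", "AB"]
invalid = [g for g in grades if not grade_is_valid(g)]
print(invalid)
```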
Domain Rules
- Domain definition
- Domain membership
- Domain nonmembership
- Domain assignment
Mapping Rules
- Mapping definition
- Mapping membership
- Mapping nonmembership
- Mapping assignment
Relation Rules
- Completeness
- Exemption
- Consistency
- Derivation
Completeness
- Defines when a record is complete (i.e., which fields must be present)
- IF (Orders.Total > 0.0), Complete With {Orders.Billing_Street, Orders.Billing_City, Orders.Billing_State, Orders.Billing_ZIP}
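A minimal sketch of checking this completeness rule against one order, represented as a dict keyed by the column names from the slide. The function name and the choice to treat `None` and the empty string as "not present" are assumptions.

```python
BILLING_FIELDS = ("Billing_Street", "Billing_City", "Billing_State", "Billing_ZIP")


def missing_for_completeness(order):
    """Return the billing fields that are absent when the rule applies."""
    # The rule only fires when Orders.Total > 0.0.
    if (order.get("Total") or 0.0) <= 0.0:
        return []
    return [f for f in BILLING_FIELDS if order.get(f) in (None, "")]


order = {"Total": 12.5, "Billing_Street": "1 Main St", "Billing_City": "Troy"}
print(missing_for_completeness(order))
```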
Exemption
- Defines which fields may be missing
- IF (Orders.Item_Class != “CLOTHING”) Exempt {Orders.Color, Orders.Size}
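The exemption rule can be sketched the same way. This example assumes that Color and Size are otherwise required, so the exemption simply switches the check off for non-clothing items; that reading, the function name, and the dict-based record are illustrative assumptions.

```python
EXEMPTIBLE_FIELDS = ("Color", "Size")


def missing_unless_exempt(order):
    """Return missing Color/Size fields, unless the order is exempt."""
    # Exemption condition from the slide: Item_Class != "CLOTHING".
    if order.get("Item_Class") != "CLOTHING":
        return []
    return [f for f in EXEMPTIBLE_FIELDS if order.get(f) in (None, "")]


print(missing_unless_exempt({"Item_Class": "BOOKS"}))
print(missing_unless_exempt({"Item_Class": "CLOTHING", "Color": "red"}))
```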
Consistency
- Define a relationship between attributes based on field content
  - IF (Employees.title == “Staff Member”) Then (Employees.Salary >= 20000 AND Employees.Salary < 30000)
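As a sketch, the consistency rule is an implication: records that do not meet the condition pass automatically. The function name and the dict-based employee record are hypothetical.

```python
def salary_is_consistent(employee):
    """Check the slide's rule: staff members earn 20000 <= salary < 30000."""
    if employee.get("title") == "Staff Member":
        return 20000 <= employee["Salary"] < 30000
    # The rule says nothing about other titles, so they pass.
    return True


employees = [
    {"title": "Staff Member", "Salary": 25000},
    {"title": "Staff Member", "Salary": 30000},
    {"title": "Manager", "Salary": 90000},
]
flags = [salary_is_consistent(e) for e in employees]
print(flags)
```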
Derivation
- Prescriptive form of consistency rule
- Details how one attribute’s value is determined based on other attributes
  - IF (Orders.NumberOrdered > 0) Then { Orders.Total = (Orders.NumberOrdered * Orders.Price) * 1.05 }
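A derivation rule can be used both ways: to populate a value and to validate a stored one. The sketch below assumes dict-based orders; the function names and the comparison tolerance are hypothetical, and the slide does not say what the 1.05 factor represents.

```python
MULTIPLIER = 1.05  # constant taken from the slide's rule


def derive_total(order):
    """Return a copy of the order with Total derived when the rule applies."""
    derived = dict(order)
    if derived["NumberOrdered"] > 0:
        derived["Total"] = derived["NumberOrdered"] * derived["Price"] * MULTIPLIER
    return derived


def total_is_consistent(order, tolerance=0.005):
    """Validation use: compare the stored Total with the derived one."""
    if order["NumberOrdered"] <= 0:
        return True
    return abs(order["Total"] - derive_total(order)["Total"]) <= tolerance


print(derive_total({"NumberOrdered": 2, "Price": 10.0}))
print(total_is_consistent({"NumberOrdered": 2, "Price": 10.0, "Total": 20.0}))
```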
Table and Cross-Table Rules
- Functional Dependence
- Primary Key Assertion
- Foreign Key Assertion (= referential integrity)
Functional Dependence
- Functional dependence between columns X and Y:
  - For any two records R1 and R2 in a table, if field X of R1 and field X of R2 contain the same value x, and field Y of R1 contains the value y, then field Y of R2 must also contain the value y.
- In other words, attribute Y is said to be determined by attribute X.
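The definition above translates into a single pass over the records: remember the first Y seen for each X and flag any X that later appears with a different Y. The function name and the zip/city sample data are hypothetical.

```python
def fd_violations(records, x, y):
    """Return the sorted X values that map to more than one Y value."""
    first_seen = {}
    violations = set()
    for record in records:
        x_value, y_value = record[x], record[y]
        if x_value in first_seen and first_seen[x_value] != y_value:
            violations.add(x_value)
        first_seen.setdefault(x_value, y_value)
    return sorted(violations)


records = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
    {"zip": "60601", "city": "Evanston"},
]
bad_zips = fd_violations(records, "zip", "city")
print(bad_zips)
```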
Primary Key Assertion
A set of attributes defined as a primary key must uniquely identify a record
Enforcement = testing for duplicates across defined key set
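Testing for duplicates across a key set is a natural use of `group by` and `having`. The sketch below runs the query through Python's built-in `sqlite3` module; the `enrollment` table and its rows are hypothetical, and the table deliberately declares no key so that duplicates can be loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no declared key, so duplicates can be loaded.
conn.execute("create table enrollment (student_id integer, course text, grade text)")
conn.executemany(
    "insert into enrollment values (?, ?, ?)",
    [(1, "DQ101", "A"), (2, "DQ101", "B"), (1, "DQ101", "C"), (1, "SQL200", "A")],
)

# Group on the asserted key (student_id, course); any group with more
# than one row violates the primary key assertion.
duplicates = conn.execute(
    """
    select student_id, course, count(*)
    from enrollment
    group by student_id, course
    having count(*) > 1
    """
).fetchall()
print(duplicates)
```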
Foreign Key Assertion
- When the values in field f of table T are chosen from the key values in field g of table S, field T.f is said to be a foreign key referencing field S.g
- If f is a foreign key, each of its values must exist in table S, column g (= referential integrity)
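Referential integrity can be checked by looking for values of T.f that have no match in S.g. The sketch below uses a left join through Python's built-in `sqlite3` module; the extra `id` column and the sample rows are hypothetical, and null values of f are skipped here as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table S (g integer)")
conn.execute("create table T (id integer, f integer)")
conn.executemany("insert into S values (?)", [(10,), (20,)])
conn.executemany("insert into T values (?, ?)", [(1, 10), (2, 30), (3, 20)])

# Rows of T whose f value has no matching key in S.g come back from the
# left join with S.g set to null.
orphans = conn.execute(
    """
    select T.id, T.f
    from T left join S on T.f = S.g
    where S.g is null and T.f is not null
    order by T.id
    """
).fetchall()
print(orphans)
```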
In-process Directives
- Definition directives (labeling information chain members)
- Measurement directives
- Trigger directives
Operational Directives
- Transformation
- Update
Other Rules
- Approximate searching rules
- Approximate matching rules