Uncertainty in Data Integration, Ai Jing, 2007-11-10
![Page 1: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/1.jpg)
Uncertainty in Data Integration
Ai Jing
2007-11-10
![Page 2: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/2.jpg)
Outline
Data Integration with Uncertainty
Overview of Workshop on Management of Uncertain Data
Uncertainty in Deep Web
![Page 4: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/4.jpg)
Data Integration with Uncertainty
Motivation and overview
Definition of probabilistic mappings
Query answering w.r.t. p-mappings
Complexity of query answering
Contributions
![Page 6: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/6.jpg)
Traditional Data Integration Systems
Q: SELECT P.title AS title, P.year AS year, A.name AS author
FROM Author A, Paper P, AuthoredBy B
WHERE A.aid = B.aid AND P.pid = B.pid
[Figure labels: Q1, Q2, Q3, Q4, Q5]
![Page 7: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/7.jpg)
Uncertainty Can Occur at Three Levels in Data Integration Applications
III. Query Level
II. Mapping Level
I. Data Level
Focus of the paper: probabilistic schema mappings
![Page 8: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/8.jpg)
Example Probabilistic Mappings
S(pname, email-addr, current-addr, permanent-addr)
T(name, email, mailing-addr, home-addr, office-addr)
m1: probability 0.5
m2: probability 0.4
m3: probability 0.1
[Figure: m1, m2 and m3 each connect the attributes of S and T in a different way]
![Page 9: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/9.jpg)
Top-k Query Answering w.r.t. Probabilistic Mappings
Mediated Schema
Q: SELECT mailing-addr FROM T
Q1: SELECT current-addr FROM S (probability 0.5)
Q2: SELECT permanent-addr FROM S (probability 0.4)
Q3: SELECT email-addr FROM S (probability 0.1)
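The rewriting above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the two source rows are hypothetical, and `by_table_answers` and `top_k` are names made up for this sketch. Each rewritten query is run against the source, and every distinct answer collects the probability of each mapping that produces it.

```python
from collections import defaultdict

# Hypothetical instance of S(pname, email-addr, current-addr, permanent-addr).
SOURCE_ROWS = [
    {"pname": "Alice", "email-addr": "alice@example.org",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@example.org",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# (source attribute standing in for T.mailing-addr, mapping probability),
# one entry per rewritten query Q1..Q3 on the slide.
MAILING_ADDR_REWRITINGS = [
    ("current-addr", 0.5),    # Q1
    ("permanent-addr", 0.4),  # Q2
    ("email-addr", 0.1),      # Q3
]


def by_table_answers(rows, rewritings):
    """Run every rewritten query and add up probabilities per distinct answer."""
    scores = defaultdict(float)
    for source_attr, prob in rewritings:
        # Set semantics: a value counts once per mapping, however many rows return it.
        for value in {row[source_attr] for row in rows}:
            scores[value] += prob
    return dict(scores)


def top_k(scores, k):
    """Return the k answers with the highest probability (ties broken by value)."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


ANSWERS = by_table_answers(SOURCE_ROWS, MAILING_ADDR_REWRITINGS)
BEST = top_k(ANSWERS, 1)
```

With this toy data, "Sunnyvale" is returned by both Q1 and Q2, so it accumulates the probability of both mappings and ranks first.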
![Page 10: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/10.jpg)
Data Integration with Uncertainty
Motivation and overview
Definition of probabilistic mappings
Query answering w.r.t. p-mappings
Complexity of query answering
Contributions
![Page 11: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/11.jpg)
Definition of probabilistic mappings
Schema Mapping
S=(pname, email-addr, home-addr, office-addr)
T=(name, mailing-addr)
One-to-one schema matching; we have exact knowledge of the mapping
Probabilistic Mapping
S=(pname, email-addr, home-addr, office-addr)
T=(name, mailing-addr)
Probabilities attached to the attribute correspondences: 1.0, 0.1, 0.5, 0.4
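As a minimal sketch of what such a definition has to guarantee, the check below treats a p-mapping as a list of alternative one-to-one mappings whose probabilities sum to 1. The function names are hypothetical, and reading the slide's numbers as "pname to name with 1.0, and email-addr, home-addr, office-addr to mailing-addr with 0.1, 0.5, 0.4" is an assumption, since the arrows of the figure are not preserved.

```python
import math


def is_one_to_one(correspondences):
    """True if no source or target attribute appears in more than one pair."""
    sources = [s for s, _ in correspondences]
    targets = [t for _, t in correspondences]
    return len(set(sources)) == len(sources) and len(set(targets)) == len(targets)


def is_valid_p_mapping(alternatives):
    """alternatives: list of (set of (source_attr, target_attr) pairs, probability)."""
    probs = [p for _, p in alternatives]
    distinct = len({frozenset(c) for c, _ in alternatives}) == len(alternatives)
    return (
        distinct
        and all(is_one_to_one(c) for c, _ in alternatives)
        and all(0.0 <= p <= 1.0 for p in probs)
        and math.isclose(sum(probs), 1.0)
    )


# Assumed reading of the slide: three alternative mappings into T=(name, mailing-addr).
P_MAPPING = [
    ({("pname", "name"), ("email-addr", "mailing-addr")}, 0.1),
    ({("pname", "name"), ("home-addr", "mailing-addr")}, 0.5),
    ({("pname", "name"), ("office-addr", "mailing-addr")}, 0.4),
]

# Not one-to-one: two source attributes compete for mailing-addr in one mapping.
BROKEN = [
    ({("home-addr", "mailing-addr"), ("office-addr", "mailing-addr")}, 1.0),
]
```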
![Page 12: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/12.jpg)
By-Table Semantics
[Figure: a target instance DT obtained under mapping m, shown with probability 0.5]
![Page 13: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/13.jpg)
By-Tuple Semantics
DT=
Pr(<m1,m3>)=0.05
…
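The value Pr(<m1,m3>)=0.05 is the product 0.5 × 0.1: under by-tuple semantics each source tuple picks its own mapping, and a sequence of choices gets the product of the individual probabilities. A small sketch, assuming two source tuples and the probabilities of m1, m2, m3 from the earlier example (the function name is made up here):

```python
from itertools import product
from math import prod

MAPPING_PROBS = {"m1": 0.5, "m2": 0.4, "m3": 0.1}


def sequence_probabilities(mapping_probs, n_tuples):
    """Map each length-n sequence of mapping names to the product of its probabilities."""
    names = sorted(mapping_probs)
    return {
        seq: prod(mapping_probs[name] for name in seq)
        for seq in product(names, repeat=n_tuples)
    }


# Two source tuples: 3^2 = 9 sequences whose probabilities sum to 1.
SEQS = sequence_probabilities(MAPPING_PROBS, 2)
```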
![Page 14: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/14.jpg)
Data Integration with Uncertainty
Motivation and overview
Definition of probabilistic mappings
Query answering w.r.t. p-mappings
Complexity of query answering
Contributions
![Page 15: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/15.jpg)
By-Table Query Answering
![Page 16: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/16.jpg)
By-Tuple Query Answering
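Since the content of this slide did not survive extraction, here is a hedged brute-force sketch of by-tuple answering for the query SELECT mailing-addr FROM T. It enumerates every mapping sequence, so it only works for tiny inputs. The source rows are hypothetical and `by_tuple_answers` is a name invented for this sketch, not taken from the talk.

```python
from collections import defaultdict
from itertools import product

# Hypothetical instance of S(pname, email-addr, current-addr, permanent-addr).
SOURCE_ROWS = [
    {"pname": "Alice", "email-addr": "alice@example.org",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@example.org",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# (source attribute standing in for T.mailing-addr, mapping probability).
REWRITINGS = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]


def by_tuple_answers(rows, rewritings):
    """Sum, for each answer value, the probability of every mapping sequence producing it."""
    scores = defaultdict(float)
    # One mapping choice per source row: len(rewritings) ** len(rows) sequences.
    for seq in product(rewritings, repeat=len(rows)):
        seq_prob = 1.0
        values = set()
        for row, (source_attr, prob) in zip(rows, seq):
            seq_prob *= prob
            values.add(row[source_attr])
        for value in values:
            scores[value] += seq_prob
    return dict(scores)


BY_TUPLE = by_tuple_answers(SOURCE_ROWS, REWRITINGS)
```

On this toy data "Sunnyvale" gets 0.94 rather than the 0.9 it would get when a single mapping is applied to the whole table, because the two rows may now follow different mappings.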
![Page 17: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/17.jpg)
Data Integration with Uncertainty
Motivation and overview
Definition of probabilistic mappings
Query answering w.r.t. p-mappings
Complexity of query answering
Contributions
![Page 18: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/18.jpg)
Complexity of query answering
![Page 19: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/19.jpg)
More on By-Tuple Query Answering
The high complexity comes from computing probabilities: the number of mapping sequences is exponential in the size of the input data
n tuples, m mappings: m^n mapping sequences
There are two subsets of queries that can be answered in PTIME by query rewriting
SELECT mailing-addr FROM T
SELECT mailing-addr FROM T, V WHERE T.mailing-addr = V.hightech
In general, query answering cannot be done by query rewriting
One of Dt
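To make the m^n growth concrete, and to show why a simple projection query can avoid it, the sketch below counts mapping sequences and computes one answer probability in a single pass over the rows. The closed form 1 − Π(1 − hit) is an illustration that follows from each tuple choosing its mapping independently; it is an assumption of this sketch and not necessarily the rewriting used in the paper. The rows and function names are hypothetical.

```python
# Hypothetical instance of S(pname, email-addr, current-addr, permanent-addr).
SOURCE_ROWS = [
    {"pname": "Alice", "email-addr": "alice@example.org",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@example.org",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# (source attribute standing in for T.mailing-addr, mapping probability).
REWRITINGS = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]


def count_mapping_sequences(n_tuples, n_mappings):
    """Number of ways to assign one of m mappings to each of n tuples."""
    return n_mappings ** n_tuples


def projection_answer_probability(rows, rewritings, value):
    """Pr(value is in SELECT mailing-addr FROM T) under by-tuple semantics.

    Linear in the number of rows: the value is missing only if every row
    independently picks a mapping that does not produce it.
    """
    miss = 1.0
    for row in rows:
        hit = sum(prob for attr, prob in rewritings if row[attr] == value)
        miss *= 1.0 - hit
    return 1.0 - miss


SEQUENCES_FOR_100_ROWS = count_mapping_sequences(100, 3)
P_SUNNYVALE = projection_answer_probability(SOURCE_ROWS, REWRITINGS, "Sunnyvale")
```

With only 100 source tuples and 3 mappings there are already 3^100 sequences, while the per-row computation touches each row once.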
![Page 20: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/20.jpg)
Extensions to More Expressive Mappings
The complexity results for query answering carry over to three extensions to more expressive mappings
Complex mappings
GLAV mappings
Conditional mappings
![Page 21: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/21.jpg)
Data Integration with Uncertainty
Motivation and overview
Definition of probabilistic mappings
Query answering w.r.t. p-mappings
Complexity of query answering
Contributions
![Page 22: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/22.jpg)
Contributions
Definition of probabilistic mappings
Semantics: by-table vs. by-tuple
Complexity of query answering
![Page 23: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/23.jpg)
Outline
Data Integration with Uncertainty
Overview of Workshop on Management of Uncertain Data
Uncertainty in Deep Web
![Page 24: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/24.jpg)
Overview of MUD 2007
Theory
A New Language and Architecture to Obtain Fuzzy Global Dependencies
About the Processing of Division Queries Addressed to Possibilistic Databases
Making Aggregation Work in Uncertain and Probabilistic Databases
Application
Materialized Views in Probabilistic Databases
Application
Flexible matching of Ear Biometrics
Consistent Joins Under Primary Key Constraints
![Page 25: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/25.jpg)
A New Language and Architecture to Obtain Fuzzy Global Dependencies
SQL does not satisfy the minimum requirements to be a true data mining (DM) language
A New Language: dmFSQL (data mining Fuzzy Structured Query Language)
Fuzzy Database Data mining
![Page 26: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/26.jpg)
About the Processing of Division Queries Addressed to Possibilistic Databases
They devised a data model which is a strong representation system for operations in possibilistic databases
A possibilistic database D can be interpreted as a weighted disjunctive set of regular databases
Division Queries
![Page 27: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/27.jpg)
Making Aggregation Work in Uncertain and Probabilistic Databases
Trio is a prototype database management system for storing and querying data with uncertainty and lineage
Trio's query language: TriQL
Trio data model and query semantics
Aggregation function in the Trio system for uncertain and probabilistic data
![Page 28: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/28.jpg)
Materialized Views in Probabilistic Databases
Materialized views for probabilistic databases may not define a unique probability distribution
View representation
Answer queries on large probabilistic data sets more efficiently with materialized views
![Page 29: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/29.jpg)
Flexible matching of Ear Biometrics
Research area: image recognition (or identification)
Scenario: identifying found bodies in a large-scale disaster
Challenge: fast and cheap identification; no DNA databases or fingerprint databases are at hand
![Page 30: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/30.jpg)
Consistent Joins Under Primary Key Constraints
Inconsistent database: primary key constraints are violated
Will the natural join of the repaired relations always be nonempty, no matter which tuples are selected?
game theory, winning strategy
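The question on this slide can be made concrete with a brute-force check. This is a sketch for tiny inputs only, not the game-theoretic method the slide refers to: it enumerates every repair (one tuple kept per primary-key value) of two relations and tests whether their natural join is ever empty. The relations and all function names are hypothetical.

```python
from itertools import product


def repairs(relation, key):
    """Yield every repair: keep exactly one tuple per primary-key value."""
    groups = {}
    for row in relation:
        groups.setdefault(tuple(row[a] for a in key), []).append(row)
    for choice in product(*groups.values()):
        yield list(choice)


def natural_join(left, right):
    """Join rows that agree on all attributes the two rows share."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                joined.append({**l_row, **r_row})
    return joined


def join_always_nonempty(r1, key1, r2, key2):
    """True if the natural join is nonempty for every pair of repairs."""
    return all(
        natural_join(a, b)
        for a in repairs(r1, key1)
        for b in repairs(r2, key2)
    )


# Hypothetical data: 'emp' is the key of WORKS, but ann appears twice.
WORKS = [{"emp": "ann", "dept": "db"}, {"emp": "ann", "dept": "ai"}]
FLOORS_FULL = [{"dept": "db", "floor": 1}, {"dept": "ai", "floor": 2}]
FLOORS_PARTIAL = [{"dept": "db", "floor": 1}]
```

With `FLOORS_FULL` either choice for ann finds a matching department, so the join is nonempty in every repair; with `FLOORS_PARTIAL` the repair that keeps (ann, ai) joins with nothing.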
![Page 31: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/31.jpg)
Outline
Data Integration with Uncertainty
Overview of Workshop on Management of Uncertain Data
Uncertainty in Deep Web
![Page 32: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/32.jpg)
Uncertainty in Deep Web
No “perfect” data: noise, dirty data, redundancy, ...
No “perfect” solution: web data extraction, interface integration, ...
![Page 33: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/33.jpg)
Uncertainty in Deep Web Data Integration(1)
[Figure: architecture of deep web data integration]
Modules: Interface Integration Module, Query Process Module, Result Process Module
Components: WDB Discovery, Interface Schema Extraction, WDB Clustering, Interface Integration, WDB Selection, Query Translation, Query Submission, Results Extraction, Results Annotation, Data Merging
Other labels: Integrated Interface, Deep Web, Web DB, RDB
• Robust
• Evaluable
![Page 34: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/34.jpg)
Uncertainty in Deep Web Data Integration(2)
[Figure: architecture of deep web data integration]
Modules: Interface Integration Module, Query Process Module, Result Process Module
Components: WDB Discovery, Interface Schema Extraction, WDB Clustering, Interface Integration, WDB Selection, Query Translation, Query Submission, Results Extraction, Results Annotation, Data Merging
Other labels: Integrated Interface, Deep Web, Web DB, RDB
• Tuning
• Feedback
• Evaluable
![Page 35: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/35.jpg)
Uncertainty in Jobtong(1)
Data level
![Page 36: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/36.jpg)
Uncertainty in Jobtong(2)
Query level
How can we give every result a probability to show its importance?
![Page 37: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/37.jpg)
Uncertainty in Jobtong(3)
The automatic maintenance of configuration files
<record>
  <xpath>/html/body//table/tr[@class='nob']</xpath>
  <combination>2</combination>
  <items>
    <item>
      <name>title</name>
      <xpath>td[2]/a/span</xpath>
    </item>
    <item>
      <name>company</name>
      <xpath>td[3]/a/span</xpath>
    </item>
  </items>
</record>
<record>
  <xpath>/html/body//table/tr[@class='list2' or @class='list3']</xpath>
  <combination>2</combination>
  <items>
    <item>
      <name>title</name>
      <xpath>td[2]/a</xpath>
    </item>
    <item>
      <name>company</name>
      <xpath>td[3]/a</xpath>
    </item>
  </items>
</record>
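A tool that maintains such files first has to read them. The sketch below loads one `<record>` into a small Python structure with the standard library; `RecordRule` and `load_record_rule` are hypothetical names, and the meaning of `<combination>` is not explained on the slide, so it is only carried along as an integer. The XPath expressions are stored as strings and not evaluated: the second record uses `or` inside a predicate, which the limited XPath subset of `xml.etree.ElementTree` does not handle, so evaluating them would need a full XPath 1.0 engine.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# First record from the slide, copied as a string.
CONFIG = """<record>
  <xpath>/html/body//table/tr[@class='nob']</xpath>
  <combination>2</combination>
  <items>
    <item><name>title</name><xpath>td[2]/a/span</xpath></item>
    <item><name>company</name><xpath>td[3]/a/span</xpath></item>
  </items>
</record>"""


@dataclass
class RecordRule:
    row_xpath: str    # XPath selecting one result row
    combination: int  # copied verbatim; semantics not given on the slide
    fields: dict      # field name -> XPath relative to the row


def load_record_rule(xml_text):
    """Parse one <record> element into a RecordRule without evaluating any XPath."""
    record = ET.fromstring(xml_text)
    fields = {
        item.findtext("name"): item.findtext("xpath")
        for item in record.find("items")
    }
    return RecordRule(
        row_xpath=record.findtext("xpath"),
        combination=int(record.findtext("combination")),
        fields=fields,
    )


RULE = load_record_rule(CONFIG)
```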
![Page 38: Uncertainty in Data Integration Ai Jing 2007-11-10](https://reader035.vdocuments.site/reader035/viewer/2022062618/5513ceb655034674748b4bca/html5/thumbnails/38.jpg)
Q&A
Thank you!