Lecture 13: Database Heterogeneity, Debriefing Project Phase 2
Post on 21-Dec-2015
Lecture 13: Database Heterogeneity
Debriefing Project Phase 2
Outline
• Database Integration
• Wrappers
• Mediators
• Schema Integration
Book Section
Database Integration
• How to build applications using multiple DBs?
[Figure: example sources Ebay, DVDorders, IMDB, and amazon; DBMS labels Oracle, PointBase, MySQL, and IBM DB2; data labels: movie DB, order movie, order status]
Problem Dimensions
• Distribution
• Autonomy
• Heterogeneity
How to Deal with Distribution?
• Problems
• Solutions
How to Deal with Autonomy?
• Problems
• Solutions
How to Deal with Heterogeneity?
• Problems
• Solutions
Solution Variants
• General issues
  – Bottom-up vs. top-down engineering
  – Virtual vs. materialized integration
  – Read-only vs. read-write access
  – Transparency: language, schema, location
• What did you do?
A Generic System Architecture
• Wrapper-Mediator architecture
[Figure: application 1, application 2, and application 3 access a mediator; the mediator sits on top of four wrappers, one for each of DB1 to DB4, labeled Oracle, PointBase, MySQL, IBM DB2]
• Mediators integrate the data from the DBs
• Wrappers convert to a common representation
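The division of labor between wrappers and mediator can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, the two source layouts (tuples and comma-separated text), and the choice of a dict per row as the common representation are all hypothetical and not taken from the project.

```python
class TupleSourceWrapper:
    """Wraps a (hypothetical) source that returns rows as positional tuples."""

    def __init__(self, rows):
        self.rows = rows

    def fetch(self):
        # Convert each tuple into the common representation: a dict per row.
        return [{"title": title, "year": year} for title, year in self.rows]


class CsvTextWrapper:
    """Wraps a (hypothetical) source that only offers 'title,year' text lines."""

    def __init__(self, lines):
        self.lines = lines

    def fetch(self):
        result = []
        for line in self.lines:
            title, year = line.split(",")
            # Text sources carry no types, so the wrapper restores them.
            result.append({"title": title.strip(), "year": int(year)})
        return result


class Mediator:
    """Integrates the data delivered by all wrappers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def all_movies(self):
        movies = []
        for wrapper in self.wrappers:
            movies.extend(wrapper.fetch())
        return movies


mediator = Mediator([
    TupleSourceWrapper([("Alien", 1979)]),
    CsvTextWrapper(["Heat, 1995"]),
])
integrated = mediator.all_movies()
```

Applications only talk to the mediator; adding a fifth source would mean writing one more wrapper with a `fetch` method and leaving the applications untouched.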
A Closer Look at Data Models
• Data model used by sources
  – relational? HTML? XML? Text?
• Data model used by integrated DB
  – canonical data model (e.g. relational, XML)
• Query models
  – Structured queries, retrieval queries, data mining (statistics)
A Generic Wrapper Architecture
[Figure: wrapper layers between the incoming request/query and the returned result/data]
• Compensation for missing processing capabilities
• Transformation of data model
• Communication interface
• Source data, metadata, integrity constraints
Wrapper Tasks
• Data Model consists of
  – Data types
  – Integrity constraints
  – Operations (e.g. query language)
• Translate among different data models
• Overcome other "syntactic" heterogeneity
Which was the task?
How was it implemented?
Example: Wrapping Relational Data in XML/HTML
• Data types
  – trivial
• Integrity Constraints (e.g. primary keys)
  – requires XML Schema
• Operations
  – none in HTML
Where did this play a role?
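A minimal sketch of wrapping a relation in XML, using Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The `movie` table, its columns, and the element naming scheme are assumptions made for illustration. Note how the data types become plain text trivially, while the primary key survives only as an ordinary element: without an XML Schema nothing enforces it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO movie VALUES (?, ?, ?)",
                 [(1, "Alien", 1979), (2, "Heat", 1995)])


def table_to_xml(conn, table):
    """Export every row of `table` as one XML element with a child per column.

    The table name is interpolated directly, so it must come from trusted code.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY 1")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table + "s")
    for row in cursor:
        element = ET.SubElement(root, table)
        for column, value in zip(columns, row):
            # Every relational value is serialized as text.
            ET.SubElement(element, column).text = str(value)
    return root


root = table_to_xml(conn, "movie")
xml_text = ET.tostring(root, encoding="unicode")
```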
Example: Wrapping XML/HTML into Relational
• Data Types
  – which difficulties?
• Integrity Constraints
  – none in HTML
• Operations
  – generally requires XQuery
  – form fields can be considered as hard-coded queries
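The opposite direction can be sketched with Python's standard `html.parser`: the wrapper pulls table cells out of an HTML page and loads them into a relation. The page content, the `movie` table, and the parser class are hypothetical. The sketch shows one of the data type difficulties: HTML delivers only strings, so the wrapper has to decide that the second column is an integer.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical HTML source page.
HTML = ("<table><tr><th>title</th><th>year</th></tr>"
        "<tr><td>Alien</td><td>1979</td></tr>"
        "<tr><td>Heat</td><td> 1995 </td></tr></table>")


class TableRowParser(HTMLParser):
    """Collects the text of all <td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows contain only <th> and stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


parser = TableRowParser()
parser.feed(HTML)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movie (title TEXT, year INTEGER)")
# The wrapper supplies the types that HTML lacks.
conn.executemany("INSERT INTO movie VALUES (?, ?)",
                 [(title, int(year)) for title, year in parser.rows])
recent = conn.execute("SELECT title FROM movie WHERE year > 1990").fetchall()
```

Once the data sits in a relation, the operations missing in HTML (selection, joins) come from the relational engine.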
A Closer Look at Schemas
• Tight vs. loose integration
  – Is there a global schema?
• Support for semantic integration
  – collection, fusion, abstraction
Schema Architecture for Federated DBMS
[Figure: schema architecture, from top to bottom: View 1, View 2, View 3; Integrated Schema; Import Schemas; Export Schemas; sources (relational DBMS, object-oriented DBMS, file system, Web server, ...)]
• accepted model for integrated database systems with integrated schema
• 5-level architecture
• data independence
Export Schema
• provided by data source
• source DB can change w/o changing export schema
which was the export schema?
Import Schema
• provided by wrapper
• export schema can change w/o changing import schema
which was the import schema?
Integrated Schema
• provided by mediator
• import schemas can change w/o changing integrated schema
which was the integrated schema?
Application View
• provided by application
• integrated DB can change w/o changing application (code)
which were application views?
Mediator Tasks
• Integrate data with same "real-world meaning", but different representation
  – integration mapping, schema integration
  – can be implemented, e.g., as database view
• Decompose queries against the integrated schema into queries against source DBs
  – only for virtual integration
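As a rough illustration of an integration mapping implemented as a database view, the following sketch uses SQLite through Python's `sqlite3` module. Both source relations live in one in-memory database here, which is a simplification; the table names, column names, and the text-encoded year are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two imported source relations with different representations (hypothetical).
    CREATE TABLE src1_movie (title TEXT, year INTEGER);
    CREATE TABLE src2_film  (name TEXT, released TEXT);  -- year stored as text
    INSERT INTO src1_movie VALUES ('Alien', 1979);
    INSERT INTO src2_film  VALUES ('Heat', '1995');

    -- Integration mapping: one view over both sources.
    CREATE VIEW movie AS
        SELECT title, year, 'src1' AS source FROM src1_movie
        UNION ALL
        SELECT name, CAST(released AS INTEGER), 'src2' FROM src2_film;
""")

# A query against the integrated schema; the engine rewrites it into
# accesses to both source relations.
result = conn.execute(
    "SELECT title, year FROM movie WHERE year >= 1990 ORDER BY title"
).fetchall()
all_titles = conn.execute("SELECT title FROM movie ORDER BY title").fetchall()
```

Because the view is virtual, no data is copied: each query is answered from the sources at query time, which is the case where query decomposition is needed.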
Schema Integration
• Standard Methodology
  – Schema translation (wrapper)
  – Correspondence investigation
  – Conflict resolution and schema integration
Identifying Schema Correspondences
• Sources of information
  – source schema
  – source database
  – source application
  – database administrator, developer, user
Which were your information sources?
Identifying Schema Correspondences
• Semantic correspondences
  – e.g. related names
• Structural correspondences
  – reachability by paths
• Data analysis
  – distribution of values
Can you give examples?
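One possible example, sketched in Python: related names can be scored with a string similarity, and value distributions can be compared by the overlap of the values two attributes actually contain. Both functions are simple stand-ins chosen for illustration (character-based similarity from `difflib`, Jaccard overlap of value sets); real matchers would also use structure, such as reachability by paths, which this sketch does not cover.

```python
from difflib import SequenceMatcher


def name_similarity(name_a, name_b):
    """Score between 0 and 1 for how similar two attribute names look."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def value_overlap(values_a, values_b):
    """Jaccard overlap of the distinct values stored under two attributes."""
    set_a, set_b = set(values_a), set(values_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical attribute names and column contents.
score_title = name_similarity("movie_title", "title")
score_year = name_similarity("movie_title", "year")
overlap = value_overlap([1979, 1995, 2001], [1995, 2001, 2010])
```

High scores are only candidates for a correspondence; a person who knows the sources still has to confirm them.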
Conflicts
• What types of problems did you encounter integrating corresponding data?
Types of Conflicts
• Schema level
  – Naming conflicts
  – Structural conflicts
  – Classification conflicts
  – Constraint and behavioral conflicts
• Data level
  – Identification conflicts
  – Representational conflicts
  – Data errors
Conflict Resolution
• Depends on type of conflict
• Requires construction of mappings
• Mappings might be complex, e.g. not expressible as SQL views
Naming Conflicts
• Homonyms (give example)
  – same name used for different concepts
  – Resolution: introduce prefixes to distinguish the names
• Synonyms (give example)
  – different names for the same concepts
  – Resolution: introduce a mapping to a common name
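Both resolutions are small enough to sketch in Python. The schemas and the synonym table below are invented for illustration: `status` stands for an order status in one source and a release status in the other (a homonym), while `film`/`movie` and `name`/`title` are treated as synonyms.

```python
def resolve_homonyms(schemas):
    """Prefix every attribute with its source name so equal names stay distinct."""
    return {
        source: [f"{source}_{attribute}" for attribute in attributes]
        for source, attributes in schemas.items()
    }


def resolve_synonyms(attributes, mapping):
    """Rename attributes to a common name where the mapping knows one."""
    return [mapping.get(attribute, attribute) for attribute in attributes]


# Hypothetical source schemas and synonym table.
schemas = {"shop": ["title", "status"], "imdb": ["title", "status"]}
SYNONYMS = {"film": "movie", "name": "title", "released": "year"}

prefixed = resolve_homonyms(schemas)
common = resolve_synonyms(["name", "released", "price"], SYNONYMS)
```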
Structural Conflicts
• Different, non-corresponding attributes
  – Resolution: create a relation with the union of the attributes
• Different datatypes
  – Resolution: build a mapping function
• Different data model constructs
  – e.g. attribute vs. relation
  – Resolution: requires higher order mappings
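The first two resolutions can be sketched in Python; the third (higher order mappings between constructs) is not covered here. The row layout as dicts, the attribute names, and the year conversion are assumptions for illustration.

```python
def union_schema(rows_a, rows_b):
    """Build one relation over the union of all attributes.

    Attributes a source does not have are filled with None (NULL).
    """
    all_rows = rows_a + rows_b
    columns = sorted({key for row in all_rows for key in row})
    return [{column: row.get(column) for column in columns} for row in all_rows]


def to_year(value):
    """Mapping function between datatypes: accept an int or a numeric string."""
    return int(str(value).strip())


# Hypothetical rows from two sources with non-corresponding attributes.
merged = union_schema(
    [{"title": "Alien", "year": 1979}],
    [{"title": "Heat", "price": 9.99}],
)
years = [to_year(" 1995 "), to_year(1979)]
```

The union relation trades conflicts for NULL values, so applications must be prepared for attributes that are missing for some sources.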
Classification Conflicts
• Relations can have different coverage (inclusion, non-empty intersection)
  – Resolution: build generalization hierarchies
• Additional problem
  – Identification of corresponding data instances
  – "real world" correspondence is application dependent
Data Correspondences
• Corresponding data instances
  – similar to naming conflicts at schema level
  – Resolution: mapping tables and functions
  – Similarity functions
• Corresponding data values, data conflicts
  – of corresponding data instances
  – Resolution: mapping tables and functions
  – Prefer data from more trusted data source
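A small Python sketch of both steps: a similarity function decides whether two records describe the same instance, and a fusion step resolves conflicting values by preferring the more trusted source. The threshold of 0.85, the trust ranking, and the record layout are arbitrary assumptions, not values from the project.

```python
from difflib import SequenceMatcher

# Hypothetical trust ranking: a higher number means a more trusted source.
TRUST = {"imdb": 2, "shop": 1}


def same_instance(title_a, title_b, threshold=0.85):
    """Similarity function: treat two titles as the same movie if close enough."""
    ratio = SequenceMatcher(
        None, title_a.lower().strip(), title_b.lower().strip()
    ).ratio()
    return ratio >= threshold


def fuse(records):
    """Fuse (source, record) pairs that describe the same instance.

    Records are applied from least to most trusted, so a value from a more
    trusted source overwrites the others; None never overwrites a value.
    """
    fused = {}
    for source, record in sorted(records, key=lambda pair: TRUST[pair[0]]):
        for key, value in record.items():
            if value is not None:
                fused[key] = value
    return fused


movie = fuse([
    ("imdb", {"title": "Alien", "year": 1979, "price": None}),
    ("shop", {"title": "ALIEN (DVD)", "year": 1980, "price": 9.99}),
])
```

Here the conflicting year is taken from the more trusted source, while the price, which only the less trusted source knows, is kept.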
Constraint and Behavioral Conflicts
• Cardinality conflicts
  – different types of cardinality constraints on relationships
  – Resolution: use the more general constraint
• Behavioral conflicts for relation update
  – E.g. cascading delete vs. non-cascading
  – Resolution: add missing behavior at global level
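Both resolutions can be sketched in Python under some assumptions: cardinalities are modeled as (min, max) pairs with `None` for "unbounded", and the source is a SQLite database whose `orders` table has no cascading delete, so the global level supplies it. Table and function names are hypothetical.

```python
import sqlite3


def more_general(card_a, card_b):
    """Return the weaker of two (min, max) cardinalities; max=None is unbounded."""
    low = min(card_a[0], card_b[0])
    if card_a[1] is None or card_b[1] is None:
        high = None
    else:
        high = max(card_a[1], card_b[1])
    return (low, high)


def delete_movie_cascading(conn, movie_id):
    """Add the missing cascade at the global level.

    Dependent orders are removed first, then the movie, in one transaction.
    """
    with conn:
        conn.execute("DELETE FROM orders WHERE movie_id = ?", (movie_id,))
        conn.execute("DELETE FROM movie WHERE id = ?", (movie_id,))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, movie_id INTEGER);
    INSERT INTO movie VALUES (1, 'Alien'), (2, 'Heat');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")
delete_movie_cascading(conn, 1)
remaining_orders = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
remaining_movies = conn.execute("SELECT id, title FROM movie").fetchall()
```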
More?
• Security
  – protecting data
• Data Quality
  – actively managing data quality
• Integration as Agreement Process
  – "emergent semantics"