Querying Heterogeneous Information Sources Using Source Descriptions

Authors: Alon Y. Levy, Anand Rajaraman, Joann J. Ordille

Presenter: Yihong Ding

Posted on 21-Dec-2015

Challenges for Information Integration

• Interrelated data over multiple information sources

• Large number of sources

• Many sources hold only a limited amount of data

• Details of interacting with each source vary greatly

IM Architecture

[Figure: architecture of the Information Manifold (IM); diagram labels 1, 2, 3 and "Bucket algorithm"]

IM World View

Virtual relations:

• Product(Model)

• Automobile(Model, Year, Category)

• Motorcycle(Model, Year)

• Car(Model, Year, Category)

• NewCar(Model, Year, Category)

• UsedCar(Model, Year, Category)

• CarForSale(Model, Year, Category, Price, SellerContact)

Classes:

[Figure: class hierarchy over Product, Automobile, Car, Motorcycle, NewCar, UsedCar, and CarForSale]
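The relation list above is simply a schema that user queries are written against. A minimal Python sketch, assuming a plain dictionary from relation name to attribute tuple; `check_atom` and `relations_with` are hypothetical helpers for illustration, not part of the Information Manifold.

```python
# The world view: virtual relations and their attributes, as listed on the slide.
# Virtual relations hold no data themselves; sources are described in terms of them.
WORLD_VIEW = {
    "Product": ("Model",),
    "Automobile": ("Model", "Year", "Category"),
    "Motorcycle": ("Model", "Year"),
    "Car": ("Model", "Year", "Category"),
    "NewCar": ("Model", "Year", "Category"),
    "UsedCar": ("Model", "Year", "Category"),
    "CarForSale": ("Model", "Year", "Category", "Price", "SellerContact"),
}


def check_atom(relation: str, args: tuple) -> bool:
    """Hypothetical helper: does a query atom name a known virtual relation with the right arity?"""
    return relation in WORLD_VIEW and len(args) == len(WORLD_VIEW[relation])


def relations_with(attribute: str) -> list:
    """Hypothetical helper: every virtual relation that carries the given attribute."""
    return [name for name, attrs in WORLD_VIEW.items() if attribute in attrs]
```

For example, `relations_with("Price")` returns only `["CarForSale"]`, so a query asking for prices can only be answered through sources described in terms of CarForSale.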

Source Descriptions

For each source:

• Content Record

• Capability Record

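Web Sources for the Automobile Application

Content Records of Auto Sources

Capability Records of Auto Sources

Fields of each capability record:

• Input set: the inputs the source desires

• Output set: the outputs the source can possibly return

• Selection set: the selections the source is capable of applying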
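A source description pairs the two records. The sketch below is a simplified Python model, assuming a content record is just a set of world-view relations plus equality constraints, and that every attribute in the input set must be supplied; the class and field names are hypothetical, and the real capability records may allow more flexible input requirements.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ContentRecord:
    """What a source contains (simplified): world-view relations and constraints on them."""
    relations: Set[str]
    constraints: Dict[str, str] = field(default_factory=dict)


@dataclass
class CapabilityRecord:
    """How a source can be queried (simplified; assumes every input must be supplied)."""
    inputs: Set[str]                                    # desired input set
    outputs: Set[str]                                   # possible output set
    selections: Set[str] = field(default_factory=set)   # capable selection set

    def can_answer(self, bound: Set[str], wanted: Set[str]) -> bool:
        """Callable with these bindings, and returns every wanted attribute?"""
        return self.inputs <= bound and wanted <= self.outputs


@dataclass
class SourceDescription:
    name: str
    content: ContentRecord
    capability: CapabilityRecord


# A review source in the spirit of V5 from the example: given Model and Year, it returns a review.
v5 = SourceDescription(
    name="V5",
    content=ContentRecord(relations={"ProductReview"}),
    capability=CapabilityRecord(inputs={"Model", "Year"}, outputs={"Review"}),
)
```

With only `Model` bound, `v5.capability.can_answer({"Model"}, {"Review"})` is false, which is exactly the kind of restriction the plan generator must respect when ordering subgoals.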

Query Reformulation

• Seek plans that are contained in the query rather than equivalent to it, because:
  – Sources are incomplete
  – A useful subset of the answers is still worth returning

• Uses a plan generator to:
  – Prune irrelevant sources
  – Split the query into subgoals
  – Generate conjunctive query plans
  – Find an executable ordering of the subgoals

The Bucket Algorithm

Given: user query q, source descriptions {Vi}

1. Find relevant sources (fill the buckets): for each relation g in query q

• Find each Vj that contains relation g

• Check that the constraints in Vj are compatible with q

2. Combine source relations {Vj}, one from each bucket, into a conjunctive query q’ and check for containment (q’ ⊆ q)
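The two steps can be sketched in Python under simplifying assumptions: a source description is reduced to the set of world-view relations it contains plus equality constraints on attribute values, "compatible" means no attribute is fixed to two different values, and the containment test of step 2 is supplied by the caller. The names (`fill_buckets`, `candidate_plans`, the dictionary keys) are hypothetical, and range constraints are not modeled.

```python
from itertools import product
from typing import Any, Callable, Dict, List

# Hypothetical, simplified source description: the world-view relations a source
# contains and equality constraints on its contents, e.g. {"Category": "sportscar"}.
Source = Dict[str, Any]


def compatible(src: Dict[str, str], query: Dict[str, str]) -> bool:
    """Equality constraints clash only if both fix the same attribute to different values."""
    return all(query[attr] == val for attr, val in src.items() if attr in query)


def fill_buckets(query_relations: List[str], query_constraints: Dict[str, str],
                 sources: Dict[str, Source]) -> Dict[str, List[str]]:
    """Step 1: for each relation g in q, collect each source Vj that contains g
    and whose constraints are compatible with q."""
    return {
        g: [name for name, s in sources.items()
            if g in s["relations"] and compatible(s["constraints"], query_constraints)]
        for g in query_relations
    }


def candidate_plans(buckets: Dict[str, List[str]],
                    is_contained: Callable[[Dict[str, str]], bool]) -> List[Dict[str, str]]:
    """Step 2: take one source from each bucket and keep the combinations q'
    that the supplied containment test accepts (q' contained in q)."""
    goals = list(buckets)
    plans = []
    for combo in product(*(buckets[g] for g in goals)):
        plan = dict(zip(goals, combo))
        if is_contained(plan):
            plans.append(plan)
    return plans
```

Passing the containment test in as a function keeps step 2 independent of how containment is decided; the enumeration itself is a plain Cartesian product over the buckets.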

The Bucket Algorithm: Example

q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

1. Filling the Buckets

q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

Buckets, one per query subgoal (constraints: y ≥ 1992, t = sportscar):

CarForSale(c): V1(c1), V2(c2), V3(c3)
Category(c,t): V1(c1,t1), V2(c2,t2), V3(c3,t3)
Year(c,y): V1(c1,y1), V2(c2,y2), V3(c3,y3)
Model(c,m): V1(c1,m1), V2(c2,m2), V3(c3,m3)
Price(c,p): V1(c1,p1), V2(c2,p2), V3(c3,p3)
ProductReview(m,y,r): V5(m5,y5,r5)
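Step 2 draws one entry from every bucket, so the candidate plans for this example are the Cartesian product of the six buckets. The short Python sketch below just counts them; the bucket contents are copied from the example, and the string encoding of atoms is an illustrative assumption.

```python
from itertools import product
from math import prod

# Bucket contents from the example: one bucket per query subgoal.
buckets = {
    "CarForSale(c)": ["V1(c1)", "V2(c2)", "V3(c3)"],
    "Category(c,t)": ["V1(c1,t1)", "V2(c2,t2)", "V3(c3,t3)"],
    "Year(c,y)": ["V1(c1,y1)", "V2(c2,y2)", "V3(c3,y3)"],
    "Model(c,m)": ["V1(c1,m1)", "V2(c2,m2)", "V3(c3,m3)"],
    "Price(c,p)": ["V1(c1,p1)", "V2(c2,p2)", "V3(c3,p3)"],
    "ProductReview(m,y,r)": ["V5(m5,y5,r5)"],
}

# One entry per bucket: every combination is a candidate conjunctive plan q'.
candidates = list(product(*buckets.values()))
n_candidates = prod(len(entries) for entries in buckets.values())

print(n_candidates)  # 243 = 3**5 * 1
```

Each of the 243 candidates still has to pass the containment check. The first one in this ordering uses V1 for every car subgoal together with V5, which matches the sources in the result query q’.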

2. Checking Containment

User query:

q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

Result query:

q’(m,p,r) :- V1(c)({Category(c): sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c) = sportscar}), V5(m,y,r)({m: Model(c), y: Year(c)}, {r}, {}).

Is q’ ⊆ q?

Expanded query:

q’(m,p,r) :- CarForSale(c), UsedCar(c), Category(c,t), t = sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992

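The containment check can be sketched with the classic containment-mapping test for conjunctive queries: q’ ⊆ q holds if some mapping sends q's head onto q’'s head and every subgoal of q onto a subgoal of q’. This Python sketch assumes t = sportscar has already been substituted into Category(c,t), and treats y ≥ 1992 as an ordinary atom `Geq(y, 1992)` (a hypothetical predicate name). For plain conjunctive queries the test is exact; with comparisons, finding a mapping is sufficient but not necessary, so this is a simplification of the check the slide performs.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class Var(NamedTuple):
    """A query variable; any other value in an atom is a constant."""
    name: str


Atom = Tuple[str, Tuple[object, ...]]
Query = Tuple[Tuple[object, ...], List[Atom]]  # (head terms, body atoms)


def extend(mapping: Dict[Var, object], src: Sequence[object],
           dst: Sequence[object]) -> Optional[Dict[Var, object]]:
    """Extend `mapping` so that src maps onto dst term by term, or return None."""
    new = dict(mapping)
    for s, d in zip(src, dst):
        if isinstance(s, Var):
            if new.setdefault(s, d) != d:  # variable already mapped elsewhere
                return None
        elif s != d:                       # constants must match exactly
            return None
    return new


def find_mapping(atoms: List[Atom], target: List[Atom],
                 mapping: Dict[Var, object]) -> Optional[Dict[Var, object]]:
    """Backtracking search that maps every atom in `atoms` onto some atom of `target`."""
    if not atoms:
        return mapping
    (pred, args), rest = atoms[0], atoms[1:]
    for t_pred, t_args in target:
        if t_pred == pred and len(t_args) == len(args):
            m = extend(mapping, args, t_args)
            if m is not None:
                found = find_mapping(rest, target, m)
                if found is not None:
                    return found
    return None


def contained_in(q1: Query, q2: Query) -> bool:
    """q1 is contained in q2 if q2's head and body map onto q1's head and into q1's body."""
    (head1, body1), (head2, body2) = q1, q2
    if len(head1) != len(head2):
        return False
    start = extend({}, head2, head1)
    return start is not None and find_mapping(body2, body1, start) is not None


c, y, m, p, r = (Var(v) for v in "cympr")
q: Query = ((m, p, r), [
    ("CarForSale", (c,)), ("Category", (c, "sportscar")), ("Year", (c, y)),
    ("Geq", (y, 1992)), ("Model", (c, m)), ("Price", (c, p)), ("ProductReview", (m, y, r))])
# Expanded q' with t = sportscar already substituted into Category(c, t).
q_exp: Query = ((m, p, r), [
    ("CarForSale", (c,)), ("UsedCar", (c,)), ("Category", (c, "sportscar")),
    ("Model", (c, m)), ("Year", (c, y)), ("Price", (c, p)),
    ("ProductReview", (m, y, r)), ("Geq", (y, 1992))])

print(contained_in(q_exp, q))  # True: q' is contained in q
print(contained_in(q, q_exp))  # False: q has no UsedCar(c) subgoal
```

Here the identity mapping works because the expanded q’ has every subgoal of q plus the extra UsedCar(c); the reverse direction fails, so q’ returns a useful subset of q's answers rather than all of them.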

Finding an Executable Ordering

Subgoals and the source relation chosen for each (constraints: y ≥ 1992, t = sportscar):

CarForSale(c): V1(c)
Category(c,t): V1(c,t)
Year(c,y): V1(c,y)
Model(c,m): V1(c,m)
Price(c,p): V1(c,p)
ProductReview(m,y,r): V5(m,y,r)

BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}

BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}

BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}

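The growing BindAvail sets suggest a greedy procedure: repeatedly schedule any step whose required bindings are already available, then add what it binds. The Python sketch below assumes each step is summarized by the variables it needs bound and the variables it binds; this encoding, and our reading that V1 needs only the constant sportscar while V5 needs m and y, is an illustrative assumption rather than the paper's exact procedure.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical encoding: step name -> (variables that must be bound, variables it binds)
Step = Tuple[FrozenSet[str], FrozenSet[str]]


def executable_order(steps: Dict[str, Step]) -> Optional[List[str]]:
    """Greedily pick the first step whose inputs are bound; None if the plan gets stuck."""
    bound: set = set()       # plays the role of the BindAvail set
    remaining = dict(steps)  # dicts preserve insertion order (Python 3.7+)
    order: List[str] = []
    while remaining:
        ready = next((n for n, (ins, _) in remaining.items() if ins <= bound), None)
        if ready is None:
            return None      # no remaining step has all of its inputs bound
        _, outs = remaining.pop(ready)
        bound |= outs
        order.append(ready)
    return order


plan = {
    "V5(m,y,r)": (frozenset({"m", "y"}), frozenset({"r"})),
    "y>=1992": (frozenset({"y"}), frozenset()),
    "V1(c)": (frozenset(), frozenset({"c", "m", "y", "p", "s"})),
}
print(executable_order(plan))  # ['V1(c)', 'V5(m,y,r)', 'y>=1992']
```

Even though the steps are listed in an unexecutable order, the greedy pass recovers the order shown above: V1 first, then the review source V5, then the comparison on y. A plan containing only V5 would return None, since nothing could ever bind m and y.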

Experimental Results

• Query 1: Find titles and years of movies featuring Tom Hanks

• Query 2: Find titles and reviews of movies featuring Tom Hanks

• Query 3: Find telephone number(s) for Alaska Airlines

Conclusions

• Source descriptions consisting of a content record and a capability record

• Bucket algorithm for query reformulation