Planning and Optimizing Semantic Information Requests Using Domain Modeling and Resource Characteristics
by Shuchi Patel
Outline
Motivation
InfoQuilt Background
Planning and Optimization
IScape Execution
Execution Monitoring
Related Work
Conclusions and Future Work
Motivation
Explosion of data
Heterogeneities between sources
Limitations of web search techniques
Limitations of database search techniques
Need to manually integrate data
Related Work
SIMS (USC)
TSIMMIS (Stanford, IBM Almaden)
Information Manifold (AT&T)
OBSERVER
Infomaster (Stanford)
InfoQuilt - Goal
To provide an environment that allows users to query and analyze the data available from a multitude of diverse autonomous sources (including web-based sources), gain a better understanding of the domains and their interactions, and study hypothetical relationships to establish or disprove them.
Important Building Blocks
Semantic domain modeling
Semantic inter-domain relationship modeling
Resource characteristics modeling
Complex operations
Semantic information request modeling
Learning paradigm
Ontologies
[Figure: example ontology hierarchy. Disaster specializes into Natural Disaster (Volcano, Earthquake) and Man-made Disaster (NuclearTest). Attributes shown include eventDate, description, region, site, latitude, longitude, damage, numberOfDeaths, damagePhoto, magnitude, bodyWaveMagnitude, conductedBy and explosiveYield. FD: region => latitude, longitude. Domain rules: magnitude > 0, magnitude < 10, bodyWaveMagnitude > 0, bodyWaveMagnitude < 10. Legend: terms/concepts (attributes), functional dependencies (FDs), domain rules, hierarchies.]
Ontologies..
NuclearTest ( testSite, explosiveYield, bodyWaveMagnitude,
testType, eventDate, conductedBy,
latitude, longitude,
bodyWaveMagnitude > 0,
bodyWaveMagnitude < 10,
testSite -> latitude longitude );
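The NuclearTest definition above combines attributes, domain rules and an FD. As a rough illustration only, the Python sketch below models an ontology as plain data and checks a set of rows against its domain rules and FDs. The class name, the method name and the sample rows are hypothetical and are not part of InfoQuilt.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Ontology:
    """Hypothetical in-memory model: attributes, domain rules and FDs."""
    name: str
    attributes: list[str]
    rules: list[Callable[[dict], bool]] = field(default_factory=list)
    fds: list[tuple[tuple[str, ...], tuple[str, ...]]] = field(default_factory=list)

    def violations(self, rows: list[dict]) -> list[str]:
        """Describe every domain-rule and FD violation found in `rows`."""
        problems = []
        for i, row in enumerate(rows):
            for j, rule in enumerate(self.rules):
                if not rule(row):
                    problems.append(f"row {i} violates domain rule {j}")
        for lhs, rhs in self.fds:
            seen = {}  # LHS values -> RHS values first seen with them
            for i, row in enumerate(rows):
                key = tuple(row[a] for a in lhs)
                value = tuple(row[a] for a in rhs)
                if seen.setdefault(key, value) != value:
                    problems.append(f"row {i} violates FD {lhs} -> {rhs}")
        return problems


nuclear_test = Ontology(
    name="NuclearTest",
    attributes=["testSite", "explosiveYield", "bodyWaveMagnitude", "testType",
                "eventDate", "conductedBy", "latitude", "longitude"],
    rules=[lambda r: r["bodyWaveMagnitude"] > 0,
           lambda r: r["bodyWaveMagnitude"] < 10],
    fds=[(("testSite",), ("latitude", "longitude"))],
)

# Made-up rows for illustration only; not real test data.
rows = [
    {"testSite": "SiteA", "bodyWaveMagnitude": 5.0, "latitude": 10.0, "longitude": 20.0},
    {"testSite": "SiteA", "bodyWaveMagnitude": 12.0, "latitude": 11.0, "longitude": 20.0},
]
problems = nuclear_test.violations(rows)
```

The second row breaks the rule bodyWaveMagnitude < 10. It also maps the same testSite to different coordinates, so the check reports two violations.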
Operations
Complex operators
Post-processing data
Simulations
Clarke Urban Growth Model: modeling urban growth and land use change
Clarke Urban Growth Model (UGM)
Source: http://edcdgs9.cr.usgs.gov/urban/factsht.pdf
Inter-Ontological Relationships
A nuclear test could have caused an earthquake if the earthquake occurred some time after the nuclear test was conducted and in a nearby region.
NuclearTest Causes Earthquake
<= dateDifference( NuclearTest.eventDate,
Earthquake.eventDate ) < 30
AND distance( NuclearTest.latitude,
NuclearTest.longitude,
Earthquake.latitude,
Earthquake.longitude ) < 10000
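The Causes rule can be read as a predicate over a pair of NuclearTest and Earthquake rows. The sketch below is one possible reading, and it relies on assumptions the slide does not state:

- dateDifference(a, b) is taken as the number of days from a to b. This matches the later filter dateDifference( “January 1, 1995”, NuclearTest.eventDate ) > 0, which selects tests after that date.
- distance is the great-circle distance in miles, computed with the haversine formula.
- A lower bound of 0 days is added to reflect "some time after".

The function names are hypothetical.

```python
import math
from datetime import date

# Assumption: the slide's distance threshold (10000) is in miles.
EARTH_RADIUS_MILES = 3958.8


def date_difference(first: date, second: date) -> int:
    """Days from `first` to `second`; positive when `second` is later."""
    return (second - first).days


def distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def causes(test: dict, quake: dict) -> bool:
    """NuclearTest Causes Earthquake, with an added lower bound of 0 days."""
    days = date_difference(test["eventDate"], quake["eventDate"])
    far = distance(test["latitude"], test["longitude"],
                   quake["latitude"], quake["longitude"])
    return 0 <= days < 30 and far < 10000


# Hypothetical rows for illustration.
test = {"eventDate": date(2000, 1, 1), "latitude": 0.0, "longitude": 0.0}
quake_soon = {"eventDate": date(2000, 1, 10), "latitude": 0.0, "longitude": 0.0}
quake_late = {"eventDate": date(2000, 3, 1), "latitude": 0.0, "longitude": 0.0}
```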
Resource modeling
Attributes available
Data Characteristic (DC) rules
Example: a web site on earthquakes that contains only earthquakes after January 1, 1990
DC rule: eventDate > “January 1, 1990”
Resource Modeling...
Local Completeness
Example: Hartsfield International Airport (http://atlanta-airport.com/) is locally complete for flights to and from Atlanta airport
LC rules:
toCity = “Atlanta”
fromCity = “Atlanta”
Resource Modeling...
Binding Pattern
Example: AirTran Airways (www.airtran.com)
[toCity, toState, fromCity, fromState, departureMonth, departureDay]
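Two of these characteristics, local completeness and binding patterns, can be captured as data attached to each resource. The sketch below is a hypothetical representation, not InfoQuilt's actual resource model. It restricts LC rules to conjunctions of equalities and treats a query as covered when its equality constraints imply one of those rules. Attribute sets beyond those on the slides are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """Hypothetical resource description: attributes, LC rules and binding patterns."""
    name: str
    attributes: set
    # Each LC rule is a conjunction of attribute = value equalities for which
    # the resource is assumed to hold every matching record.
    lc_rules: list = field(default_factory=list)
    # Each binding pattern lists attributes that must be bound before querying.
    binding_patterns: list = field(default_factory=list)

    def locally_complete_for(self, query_eq: dict) -> bool:
        """True if the query's equality constraints imply one of the LC rules."""
        return any(all(query_eq.get(a) == v for a, v in rule.items())
                   for rule in self.lc_rules)

    def usable_binding_pattern(self, bound: set) -> Optional[list]:
        """Return a binding pattern whose attributes are all bound, if any."""
        if not self.binding_patterns:
            return []  # no access restriction
        for bp in self.binding_patterns:
            if set(bp) <= bound:
                return bp
        return None


airport = Resource(
    name="HartsfieldAirport",
    attributes={"airlineCompany", "flightNo", "fromCity", "toCity"},
    lc_rules=[{"toCity": "Atlanta"}, {"fromCity": "Atlanta"}],
)
airtran = Resource(
    name="AirTranAirways",
    attributes={"flightNo", "toCity", "toState", "fromCity", "fromState",
                "departureMonth", "departureDay"},
    binding_patterns=[["toCity", "toState", "fromCity", "fromState",
                       "departureMonth", "departureDay"]],
)
```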
Information Scape (IScape)
Specified in terms of components of the knowledge base
“Understands” the user’s information request
Example: “Find all earthquakes with epicenter less than 5000 miles from the location at latitude 60.790 North and longitude 97.570 East and find all tsunamis that they might have caused”
Learning Paradigm
Understand domains and relationships between them
Query and analyze data from multiple autonomous and heterogeneous sources
Explore potential relationships
Analyze data to support or disprove potential relationships
Where we are...
Motivation
ADEPT Background
Planning and Optimization
IScape Execution
Execution Monitoring
Related Work
Conclusions and Future Work
Planning and Optimization
IScapes are specified in terms of ontologies
Source selection
Execution plans that are executable
Plans that retrieve more complete information
Integrate data from sources
Optimization using domain and resource characteristics
IScape 1
Resources:
NuclearTestsDB( testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy, [dc] waveMagnitude > 3, [dc] eventDate > “January 1, 1985” );
NuclearTestSites( testSite, latitude, longitude );
SignificantEarthquakesDB( eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, [dc] eventDate > “January 1, 1970” );
Ontologies:
NuclearTest( testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy, latitude, longitude, waveMagnitude > 0, waveMagnitude < 10, testSite -> latitude longitude );
Earthquake( eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, magnitude > 0 );
IScape:
“Find all nuclear tests conducted by India or Pakistan after January 1, 1995 with seismic body wave magnitude > 4.5 and find all earthquakes that could have been caused by these tests.”
Relationship:
NuclearTest Causes Earthquake <= dateDifference( NuclearTest.eventDate, Earthquake.eventDate ) < 30 AND distance( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000
Semantic Check
Apply domain rules
Check if IScape is semantically correct
Constraint reduction
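One way to picture the semantic check and constraint reduction is with per-attribute intervals. Each IScape constraint is intersected with the matching domain rules. If any intersection is empty, the IScape is rejected; otherwise the tightest bound is kept. This simplified sketch assumes only strict < and > comparisons on numeric attributes, and the names are hypothetical. The actual check may handle richer constraints.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Open interval (lo, hi) built from strict < and > constraints."""
    lo: float = -math.inf
    hi: float = math.inf

    def intersect(self, other: "Interval") -> "Interval":
        return Interval(max(self.lo, other.lo), min(self.hi, other.hi))

    def is_empty(self) -> bool:
        return self.lo >= self.hi


def semantic_check(domain_rules: dict, constraint: dict) -> Optional[dict]:
    """Combine IScape constraints with domain rules, one interval per attribute.

    Returns None when some combined interval is empty (the IScape can never
    be satisfied); otherwise returns the reduced constraint.
    """
    reduced = {}
    for attr, wanted in constraint.items():
        combined = wanted.intersect(domain_rules.get(attr, Interval()))
        if combined.is_empty():
            return None
        reduced[attr] = combined
    return reduced


domain = {"waveMagnitude": Interval(0, 10)}  # NuclearTest domain rules
ok = semantic_check(domain, {"waveMagnitude": Interval(lo=4.5)})
bad = semantic_check(domain, {"waveMagnitude": Interval(lo=12)})
```

For IScape 1 the constraint waveMagnitude > 4.5 and the rules 0 < waveMagnitude < 10 reduce to the single interval (4.5, 10).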
Source Selection
One ontology at a time
First check locally complete sources
  The DC rules of the resource should not falsify the IScape’s constraint
If none found, select all sources
  The DC rules of a resource should not falsify the IScape’s constraint
  Binding patterns on resources should be respected
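A minimal sketch of these source-selection rules for one ontology follows. It rests on several assumptions:

- DC and LC rules are open numeric intervals per attribute, with dates simplified to years.
- A DC rule falsifies the constraint when its interval does not overlap the query's interval.
- One applicable locally complete source is enough.

OldTestsArchive and RecentTestsComplete are hypothetical sources added to exercise the rules.

```python
import math
from dataclasses import dataclass, field
from typing import Optional

FULL = (-math.inf, math.inf)  # no restriction on an attribute


def overlaps(a, b) -> bool:
    """Open intervals a and b share at least one point."""
    return max(a[0], b[0]) < min(a[1], b[1])


def contains(outer, inner) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


@dataclass
class Source:
    name: str
    dc: dict = field(default_factory=dict)      # attr -> interval the data lies in
    lc: Optional[dict] = None                   # attr -> interval it is complete for
    binding_patterns: list = field(default_factory=list)  # sets of attributes


def dc_compatible(src: Source, query: dict) -> bool:
    """The DC rules must not falsify the query constraint."""
    return all(overlaps(iv, query.get(attr, FULL)) for attr, iv in src.dc.items())


def bp_respected(src: Source, bound) -> bool:
    return not src.binding_patterns or any(bp <= bound for bp in src.binding_patterns)


def select_sources(sources, query, bound=frozenset()):
    usable = [s for s in sources if dc_compatible(s, query) and bp_respected(s, bound)]
    complete = [s for s in usable
                if s.lc is not None
                and all(contains(iv, query.get(attr, FULL)) for attr, iv in s.lc.items())]
    return complete[:1] if complete else usable


query = {"waveMagnitude": (4.5, math.inf), "eventDate": (1995, math.inf)}
sources = [
    Source("NuclearTestsDB", dc={"waveMagnitude": (3, math.inf), "eventDate": (1985, math.inf)}),
    Source("OldTestsArchive", dc={"eventDate": (-math.inf, 1990)}),  # hypothetical
]
chosen = [s.name for s in select_sources(sources, query)]
complete_src = Source("RecentTestsComplete", lc={"waveMagnitude": (4, math.inf)})  # hypothetical
chosen_with_lc = [s.name for s in select_sources(sources + [complete_src], query)]
```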
Missing Attributes
Use functional dependencies (FDs): <attribute>+ -> <missing attributes> <attribute>*
Couple with an associate resource
[Figure: the primary resource supplies all available attributes (A, B, C, D); the associate resource supplies the LHS attributes plus the missing attributes (B, C, E, F) for the FD BC -> DEF. The two are joined on the LHS attributes: Primary.B = Associate.B AND Primary.C = Associate.C.]
Missing Attributes…
Criteria for FD:
All the missing attributes should be in the RHS of the FD
All attributes in the LHS of the FD should be available from the primary resource
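The two FD criteria translate directly into set-inclusion tests. The helper below is hypothetical. It returns the first FD that qualifies, which is only one possible policy when several FDs qualify. The example also uses the FD testSite -> latitude longitude from the NuclearTest ontology, because NuclearTestsDB lacks latitude and longitude.

```python
from typing import Optional


def choose_fd(fds, available: set, missing: set) -> Optional[tuple]:
    """Return the first FD whose RHS covers every missing attribute and whose
    LHS attributes are all available from the primary resource."""
    for lhs, rhs in fds:
        if missing <= set(rhs) and set(lhs) <= available:
            return lhs, rhs
    return None


# Abstract example from the slide: FD BC -> DEF, primary provides A, B, C, D.
abstract = choose_fd([(("B", "C"), ("D", "E", "F"))], {"A", "B", "C", "D"}, {"E", "F"})

# IScape 1: NuclearTestsDB is missing latitude and longitude.
nuclear_tests_db = {"testSite", "explosiveYield", "waveMagnitude",
                    "testType", "eventDate", "conductedBy"}
site_fd = choose_fd([(("testSite",), ("latitude", "longitude"))],
                    nuclear_tests_db, {"latitude", "longitude"})
```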
Missing Attributes…
Criteria for associate resource:
Provides the missing attributes plus the attributes in the LHS of the FD
Its resource rules should not falsify the query constraint
Its resource rules should not falsify the resource rules on the primary resource
Its BP can be supplied by the primary resource
[Figure: the same primary/associate join as on the first Missing Attributes slide; here the join node acts as both BPSupplier and Join.]
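After an FD and an associate resource are chosen, the primary and associate results are joined on the FD's LHS attributes. The sketch below uses an in-memory hash join over dictionaries. The function name and the rows (site names, coordinates) are made up for illustration. It is an inner join, so primary rows with no match are dropped.

```python
from collections import defaultdict


def join_on_lhs(primary, associate, lhs, missing):
    """Hash join: index associate rows by their LHS values, then extend each
    matching primary row with the missing attributes."""
    index = defaultdict(list)
    for row in associate:
        index[tuple(row[a] for a in lhs)].append(row)
    joined = []
    for row in primary:
        for match in index.get(tuple(row[a] for a in lhs), []):
            joined.append({**row, **{m: match[m] for m in missing}})
    return joined


# Hypothetical rows standing in for NuclearTestsDB and NuclearTestSites.
primary = [{"testSite": "SiteA", "waveMagnitude": 5.0},
           {"testSite": "SiteB", "waveMagnitude": 4.8}]
associate = [{"testSite": "SiteA", "latitude": 10.0, "longitude": 20.0}]
result = join_on_lhs(primary, associate, ["testSite"], ["latitude", "longitude"])
```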
Sources Selected
[Plan fragment]
Resource Access: NuclearTestsDB (testSite, explosiveYield, waveMagnitude, testType, eventDate, conductedBy)
Resource Access: NuclearTestSites (testSite, latitude, longitude)
Join: testSiteEquals( NuclearTestsDB.testSite, NuclearTestSites.testSite )
Select: NuclearTest.waveMagnitude > 4.5 AND ( NuclearTest.conductedBy = “India” OR NuclearTest.conductedBy = “Pakistan” )
Resource Access: SignificantEarthquakesDB (eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto)
Use of an FD (testSite -> latitude longitude) to retrieve missing attributes
Use of a function (testSiteEquals) to resolve syntactic heterogeneity
Data Integration
Integrate data from all sources:
Union
Function evaluations
Constraint checking
Relationship evaluation
Aggregations
Projection
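Several of these integration operators are simple row transformations. The sketch below shows union with duplicate elimination, function evaluation, constraint checking and projection over lists of dictionaries. It wires them up the way the IScape 1 plan keeps only tests after January 1, 1995. The function names and rows are hypothetical, and relationship evaluation and aggregation are omitted.

```python
from datetime import date


def union(*inputs):
    """Union of row lists from several resources, dropping exact duplicates."""
    seen, out = set(), []
    for rows in inputs:
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                out.append(row)
    return out


def evaluate(rows, name, fn):
    """Function evaluation: add a computed attribute to every row."""
    return [{**row, name: fn(row)} for row in rows]


def select(rows, predicate):
    """Constraint checking: keep rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Projection: keep only the requested attributes."""
    return [{a: row[a] for a in attrs} for row in rows]


# Hypothetical NuclearTest rows from two sources (one duplicate).
source_1 = [{"testSite": "SiteA", "eventDate": date(1990, 5, 1)},
            {"testSite": "SiteB", "eventDate": date(1998, 6, 1)}]
source_2 = [{"testSite": "SiteB", "eventDate": date(1998, 6, 1)}]

rows = union(source_1, source_2)
rows = evaluate(rows, "daysAfter", lambda r: (r["eventDate"] - date(1995, 1, 1)).days)
rows = select(rows, lambda r: r["daysAfter"] > 0)
result = project(rows, ["testSite"])
```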
Data Integration
[Plan, continued: the joined and filtered NuclearTestsDB/NuclearTestSites branch and the SignificantEarthquakesDB access each feed a Union * node, one per ontology.]
Data Integration
NuclearTest branch: Union *, then Function Evaluator dateDifference( “January 1, 1995”, NuclearTest.eventDate ), then Select dateDifference( “January 1, 1995”, NuclearTest.eventDate ) > 0
Earthquake branch: Union *
Relationship Evaluator
NuclearTest Causes Earthquake
dateDifference ( NuclearTest.eventDate, Earthquake.eventDate ) < 30 AND distance ( NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude ) < 10000
Data Integration…
Relationship Evaluator: NuclearTest Causes Earthquake
Project: N.testSite, N.eventDate, N.testType, N.explosiveYield, N.waveMagnitude, N.conductedBy, E.eventDate, E.region, E.description, E.magnitude, E.numberOfDeaths, E.damagePhoto, dateDifference( N.eventDate, E.eventDate ), distance( N.latitude, N.longitude, E.latitude, E.longitude )
N = NuclearTest, E = Earthquake
Results
IScape 2
Resources:
YahooTravel ( airlineCompany, flightNo, aircraft, fromCity, fromState, toCity, toState, departureDate, meals, departureTime, arrivalTime, [fromCity, fromState, toCity, toState, departureDate] );
AirlineLogos ( airlineCompany, airlineLogo );
WeatherChannel ( date, city, state, description, icon, hiTemp, loTemp, [city, state] );
AirTranAirways( airlineCompany, flightNo, fromCity, fromState, toCity, toState, departureDate, fare, departureTime, arrivalTime, [dc] airlineCompany = “AirTran Airways”, [lc] airlineCompany = “AirTran Airways”, [fromCity, fromState, toCity, toState, departureDate]);
Resource URLs: http://travel.yahoo.com, http://www.airtran.com, http://www.weather.com
Ontologies:
DirectFlight ( airlineCompany, airlineLogo, flightNo, aircraft, fromCity, fromState, toCity, toState, departureDate, fare, meals, departureTime, arrivalTime, airlineCompany -> airlineLogo );
DailyWeather ( date, city, state, description, icon, hiTemp, loTemp );
IScape:
“Find all direct flights from Atlanta, GA to Boston, MA for March 23, 2001 and show the weather in the destination city on that day.”
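Plan Generation
IScape is semantically correct
Locally complete sources are available but not applicable: AirTranAirways is locally complete only for airlineCompany = “AirTran Airways”, and the IScape does not restrict the airline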
Binding Patterns
The execution plan should specify:
  Which BP is to be used
  How values for the BP attributes should be supplied
Values can be supplied in the following ways:
  Query constraint
  Attributes in other ontologies
  Associate resource
Binding Pattern Using Associated Resource
Criteria for associate resource:
Should supply all attributes needed for the BP
Values for its own BP, if any, should be supplied from the IScape’s constraint only
Resource rules involving only the BP attributes being retrieved should not falsify the IScape’s constraint
[Figure: the primary resource (attributes A, B, C, D; BP [A, B]) receives its BP attributes (A, B) from an associate resource (A, B, C, E, F) acting as BP Supplier.]
Binding Patterns…
The BPs of AirTranAirways and YahooTravel can be supplied from the query constraint
DirectFlight.fromCity = “Atlanta” AND DirectFlight.fromState = “GA” AND DirectFlight.toCity = “Boston” AND DirectFlight.toState = “MA” AND DirectFlight.departureDate = “March 23, 2001” AND DailyWeather.city = DirectFlight.toCity AND DailyWeather.state = DirectFlight.toState AND DailyWeather.date = DirectFlight.departureDate
[ fromCity: (“Atlanta”), fromState: (“GA”), toCity: (“Boston”), toState: (“MA”), departureDate: (“March 23, 2001”) ]
Binding Patterns…
The BP of WeatherChannel is to be supplied using attributes from DirectFlight
DirectFlight.fromCity = “Atlanta” AND DirectFlight.fromState = “GA” AND DirectFlight.toCity = “Boston” AND DirectFlight.toState = “MA” AND DirectFlight.departureDate = “March 23, 2001” AND DailyWeather.city = DirectFlight.toCity AND DailyWeather.state = DirectFlight.toState AND DailyWeather.date = DirectFlight.departureDate
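Deciding how each BP gets its values amounts to inspecting the IScape constraint one conjunct at a time. The sketch below assumes the constraint is already parsed into (left, right) pairs and that each resource's BP attributes share names with its ontology's attributes. Under those assumptions, a BP is fed from the query constraint when every attribute is bound to a constant. It needs a BPSupplier when some values come from another ontology. Const, Attr and bp_source are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Const:
    value: str


@dataclass(frozen=True)
class Attr:
    name: str  # "Ontology.attribute"


def bp_source(conjuncts, ontology: str, bp: list) -> Optional[tuple]:
    """Classify how the BP attributes of `ontology` can be supplied."""
    bindings = {}
    for left, right in conjuncts:
        ont, attr = left.split(".")
        if ont == ontology:
            bindings[attr] = right
    if not all(a in bindings for a in bp):
        return None  # some BP attribute cannot be supplied
    values = {a: bindings[a] for a in bp}
    if all(isinstance(v, Const) for v in values.values()):
        return "query constraint", values
    return "BPSupplier", values


constraint = [
    ("DirectFlight.fromCity", Const("Atlanta")),
    ("DirectFlight.fromState", Const("GA")),
    ("DirectFlight.toCity", Const("Boston")),
    ("DirectFlight.toState", Const("MA")),
    ("DirectFlight.departureDate", Const("March 23, 2001")),
    ("DailyWeather.city", Attr("DirectFlight.toCity")),
    ("DailyWeather.state", Attr("DirectFlight.toState")),
    ("DailyWeather.date", Attr("DirectFlight.departureDate")),
]
flight_plan = bp_source(constraint, "DirectFlight",
                        ["fromCity", "fromState", "toCity", "toState", "departureDate"])
weather_plan = bp_source(constraint, "DailyWeather", ["city", "state"])
```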
Sources Selected
[Plan fragment]
Resource Access: YahooTravel (airlineCompany, flightNo, aircraft, departureDate, departureTime, …) with fromCity = “Atlanta”, fromState = “GA”, toCity = “Boston”, toState = “MA”, departureDate = “March 23, 2001”
Resource Access: AirTranAirways (airlineCompany, flightNo, aircraft, departureDate, departureTime, …) with the same BP values
Resource Access: AirlineLogos (airlineCompany, airlineLogo)
Join: YahooTravel.airlineCompany = AirlineLogos.airlineCompany
Join: AirTranAirways.airlineCompany = AirlineLogos.airlineCompany
BPSupplier: W.city = F.toCity, W.state = F.toState, W.date = F.departureDate
Resource Access: WeatherChannel (description, icon, hiTemp, loTemp, city, state, date)
W = DailyWeather, F = DirectFlight
Use of FD to retrieve missing attributes
BP supplied using IScape constraint
Only one Resource Access node per resource, if possible
Use a BPSupplier node when BP values are retrieved from another ontology
Data Integration
[Plan, continued: the YahooTravel/AirlineLogos and AirTranAirways/AirlineLogos join branches feed an intermediate Union * node for DirectFlight.]
Data Integration…
The BPSupplier retrieves the WeatherChannel BP values from the intermediate union.
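The BPSupplier can be sketched as a dependent join. For each distinct combination of BP values found in the intermediate DirectFlight union, the weather resource is called once, and the results are then joined on the full condition. fake_weather_channel is a hypothetical stub standing in for the real WeatherChannel access, and all rows are placeholders.

```python
from datetime import date


def bp_supplier(flights, fetch, mapping):
    """Call `fetch` once per distinct BP value tuple taken from the flight rows.

    mapping: BP attribute of the weather resource -> DirectFlight attribute.
    """
    seen, weather = set(), []
    for flight in flights:
        key = tuple(flight[src] for src in mapping.values())
        if key not in seen:
            seen.add(key)
            weather.extend(fetch(**dict(zip(mapping, key))))
    return weather


def join_weather(flights, weather):
    """Join W.city = F.toCity AND W.state = F.toState AND W.date = F.departureDate."""
    return [{**f, "weather": w} for f in flights for w in weather
            if (w["city"], w["state"], w["date"])
            == (f["toCity"], f["toState"], f["departureDate"])]


calls = []


def fake_weather_channel(city, state):
    """Hypothetical stand-in for a WeatherChannel access; returns placeholder rows."""
    calls.append((city, state))
    return [{"city": city, "state": state, "date": date(2001, 3, 23), "description": "placeholder"},
            {"city": city, "state": state, "date": date(2001, 3, 24), "description": "placeholder"}]


flights = [  # hypothetical rows from the intermediate DirectFlight union
    {"flightNo": "F1", "toCity": "Boston", "toState": "MA", "departureDate": date(2001, 3, 23)},
    {"flightNo": "F2", "toCity": "Boston", "toState": "MA", "departureDate": date(2001, 3, 23)},
]
weather = bp_supplier(flights, fake_weather_channel, {"city": "toCity", "state": "toState"})
result = join_weather(flights, weather)
```

Deduplicating BP values means two flights to Boston trigger only one call to the weather source.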
Data Integration…
Join: W.city = F.toCity AND W.state = F.toState AND W.date = F.departureDate
Project: F.airlineCompany, F.airlineLogo, F.flightNo, F.aircraft, …, W.description, W.icon, W.date, W.loTemp, W.hiTemp, …
W = DailyWeather, F = DirectFlight
Results
IScape Execution
[Figure: the IScape and the knowledge base are used to generate a plan; executing the plan sends queries to the selected resources, and the data retrieved is integrated into the final results.]
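The Contributions slide lists multi-threaded IScape execution. A minimal sketch of that idea assumes each independent Resource Access node can be wrapped as a zero-argument callable and runs the accesses on a thread pool from Python's concurrent.futures module. Accesses that depend on a BPSupplier would have to wait for their inputs and are not modeled here. The resource callables below are hypothetical stubs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_resource_accesses(accesses):
    """Run independent resource accesses concurrently.

    accesses: mapping of resource name -> zero-argument callable returning rows.
    future.result() re-raises any exception raised inside an access.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(accesses))) as pool:
        futures = {name: pool.submit(access) for name, access in accesses.items()}
        return {name: future.result() for name, future in futures.items()}


results = run_resource_accesses({
    "YahooTravel": lambda: [{"flightNo": "F1"}],
    "AirTranAirways": lambda: [{"flightNo": "F2"}],
    "AirlineLogos": lambda: [],
})
```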
Where we are...
Motivation
ADEPT Background
Planning and Optimization
IScape Execution
Execution Monitoring
Related Work
Conclusions and Future Work
Execution Monitoring
IScape Processing Monitor (IPM)
GUI
High-level debugger
Allows monitoring how much time each phase of IScape processing takes
Allows localizing errors
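The IPM's per-phase timing and error localization could be approximated with a context manager. It records the elapsed time of each phase and the phase in which an exception escaped. PhaseMonitor and the phase names are hypothetical, and the real IPM is a GUI.

```python
import time
from contextlib import contextmanager


class PhaseMonitor:
    """Records wall-clock time per processing phase and where errors occurred."""

    def __init__(self):
        self.timings = {}
        self.errors = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        except Exception as exc:
            self.errors[name] = repr(exc)  # localize the error to this phase
            raise
        finally:
            self.timings[name] = time.perf_counter() - start


monitor = PhaseMonitor()
with monitor.phase("semantic check"):
    pass
with monitor.phase("planning"):
    pass
try:
    with monitor.phase("execution"):
        raise RuntimeError("resource unavailable")
except RuntimeError:
    pass
```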
IScape Processing Monitor (IScape 1)
IScape Processing Monitor (IScape 2)
Related Work
Features of InfoQuilt not supported by any other system:
Ability to assist in learning about domains and complex inter-domain relationships
Support for use of functions and simulations to post-process data
Support for complex relationships and constraints that can use functions as special operators
Powerful semantic query interface (IScapes)
Related Work…
SIMS
  Mediator specialized to one domain
  Cannot use local completeness information about sources
  One BP per resource
OBSERVER
  Limited to basic relationships
  Resource models are not as rich
Related Work…
TSIMMIS
  Mediators defined using MSL
  Adding or removing sources is difficult
  Query-centric (uses predefined query templates)
  Can answer a restricted set of queries
Information Manifold
  No domain rules or FDs
  Local completeness cannot be modeled precisely
  Capability records cannot model query capability limitations precisely
Contributions
Planning and Optimization Algorithm
  Efficient source selection
  Ability to use sources in conjunction to retrieve more complete information
  Generation of executable plans
  Integration of information retrieved from the selected sources
Multi-threaded IScape execution
IScape Processing Monitor
Framework for functions and simulations
Future Work
The Planning Agent could create backup plans that the Correlation Agent can switch to on failure
More precise specification of the query capabilities of resources
Better framework for simulations
Thank You!