Realizing Interoperability of Heterogeneous Repositories
Daniel Olmedilla, L3S Research Center / Hannover University
Programa de Postgrado en Ingeniería Informática y de Telecomunicación (Máster y Doctorado)
Universidad Autónoma de Madrid, 10th April, 2008
Apr. 10th, 2008, Universidad Autónoma de Madrid, Daniel Olmedilla
Outline
Introduction and Motivation
Interoperability: what is it and why is it needed?
Common Query Interface
Common Metadata Schema
Ranking
Successful Interoperability Demonstrations
Conclusions & Open Issues
Introduction: Simple Motivation Scenario (I)
Simple Scenario:
Alice is interested in learning about Windows and would like to attend a lecture about it this year
Introduction: Simple Motivation Scenario (& II)
Introduction: Search Engine Limitations
Unstructured information and lack of semantics
Size and coverage of the Web
Hidden Web (also Deep Web)
Personalized Ranking
Introduction: Other Approaches: Coalitions
Repositories interconnected
  Lack of standards, ad-hoc solutions
  Individual agreement required to join
Approaches
  Replication
    Loss of control over data, sometimes undesirable
  Federated Search
    Lack of standards: costly
Introduction: Other Approaches: P2P Networks
Advantages
  Scalability
  No single point of failure
  Control remains with owners
  Dynamicity
Disadvantages
  Decrease in performance
  Ad-hoc interfaces: lack of interoperability
Introduction: A Bit More Complex Motivation Scenario
Alice is a consultant and she has been asked to lead a project starting in two months. Now she needs to retrieve courses in order to
  refresh and improve her previous knowledge of project management
  get some basic knowledge about accounting and auditing
  practice her advanced level of English
Introduction: Problem Statement
The lack of standards and of appropriate integration solutions prevents users from easily and effectively finding resources relevant to their needs
Outline
Introduction and Motivation
Interoperability: what is it and why is it needed?
  Definition
  Why Interoperability?
  Challenges to achieve it
Common Query Interface
Common Metadata Schema
Ranking
Successful Interoperability Demonstrations
Conclusions & Open Issues
Interoperability: What and Why? Exercise 1: simple questions
What is interoperability?
What does it mean for two systems to interoperate?
And at the information level?
Interoperability: What and Why? What is it?
Summary from existing definitions:
Ability to work together to accomplish a common task
  Work in conjunction
  Exchange information and USE it
  Provided at different levels
  Without increasing the effort of the user
[Concise Oxford Dictionary, NISO, IEEE: Standard Computer Dictionary, DMReview, Whatis.com]
Interoperability: What and Why? Interoperability encompasses …
Technical Interoperability
Semantic Interoperability
Political Interoperability
Inter-community Interoperability
Legal Interoperability
International Interoperability
Interoperability: What and Why? Investment in Technology
ICT globally
  $1.45 trillion annually
Technology in Europe
  €6.4 billion in 2004
  Increasing (10% more than the previous year)
[Money for Growth, The European Technology Investment Report 2005. PricewaterhouseCoopers Report, Jun. 2005]
Interoperability: What and Why? Key Technological Issues (I)
38 industry associations in 27 different countries
The most significant technology issues … included
  Integration (21%)
  Standards (20%)
[International Survey of E-Commerce. World Information Technology and Services Alliance (WITSA), 2000]
Interoperability: What and Why? Key Technological Issues (& II)
[International Survey of E-Commerce. World Information Technology and Services Alliance (WITSA), 2000]
Interoperability: What and Why? Interoperability Inhibited by Cost
“Although interoperability is a significant strategic direction, it is often inhibited by cost”
[Survey: Integration costs still hamper agility. Computerworld Today, February 2006]
Interoperability: What and Why? User Effectiveness: Some Facts
User Effectiveness
  Knowledge workers spend from 15% to 35% of their time searching for information
  Searchers are successful in finding what they seek 50% of the time or less
Total loss from not finding the right information:
  estimated at between $2.5 and $3.5 million per year for an enterprise with 1000 knowledge workers
  opportunity cost: potential additional revenue of $15 million annually
[Feldman. The high cost of not finding information. IDC White Paper & KMWorld Magazine, 2004]
Interoperability: What and Why? Challenges to achieve it
Common Communication Interface
Common Query Language
Common Metadata Schema
Ranking
Interoperability: What and Why? E-Learning Study Analysis: Technical Requirements
Training life cycle in companies across Europe
  Retrieving learning services from a wide variety of providers
  Search heuristics
  Metadata queries
  Matching skill gaps with learning service selections
  Matching personal development gaps with learning services
[Gunnarsdottir. User Trials – Evaluation Report. EU IST ELENA Deliverable, May 2005]
Outline
Introduction and Motivation
Interoperability: what is it and why is it needed?
Common Query Interface
  Simple Query Interface
  Opening P2P to the rest of the World
Common Metadata Schema
Ranking
Successful Interoperability Demonstrations
Conclusions & Open Issues
Common Communication Interface Simple Query Interface (SQI)
Simple but highly flexible: targets different interoperability scenarios
Official CEN/ISSS Workshop Agreement since October 2006
Listed by IMS on Query Services
Widely adopted in E-Learning community
Common Communication Interface Simple Query Interface: Design Issues
Independent of query language, result format and vocabularies
Complex information sources may be queried (e.g., P2P networks)
Synchronous and asynchronous
Support for lightweight implementations
Stateful and stateless
Access-control and search separation
Easy extensibility
Common Communication Interface Simple Query Interface: Session Management
Authentication/authorization are requirements
Independent of the search interface
Separation is managed via sessions
  session createAnonymousSession()
  session createSession(user, passwd)
  destroySession(sessionId)
Other methods are allowed (e.g., based on credentials or trust negotiations)
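The separation between access control and search can be sketched as follows. This is a minimal in-memory illustration, not the normative SQI binding: the class `SessionManager`, the helper `search`, the `accounts` table and the plain-text password check are all assumptions made for the example. Only the three operations (create an anonymous session, create an authenticated session, destroy a session) come from the slide.

```python
import secrets


class SessionManager:
    """Toy session manager (hypothetical); it only mirrors the three slide operations."""

    def __init__(self, accounts):
        # accounts: illustrative {user: password} table, not a real credential store
        self._accounts = dict(accounts)
        self._sessions = {}  # session id -> user name (None means anonymous)

    def create_anonymous_session(self):
        return self._open(None)

    def create_session(self, user, passwd):
        if user not in self._accounts or self._accounts[user] != passwd:
            raise PermissionError("invalid credentials")
        return self._open(user)

    def destroy_session(self, session_id):
        # Unknown ids are ignored, so the call is safe to repeat.
        self._sessions.pop(session_id, None)

    def is_valid(self, session_id):
        return session_id in self._sessions

    def _open(self, user):
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = user
        return session_id


def search(manager, session_id, query):
    """The search side only checks the session id; it never sees credentials."""
    if not manager.is_valid(session_id):
        raise PermissionError("unknown or expired session")
    return f"results for {query!r}"


manager = SessionManager({"alice": "s3cret"})
anon = manager.create_anonymous_session()
answer = search(manager, anon, "project management")
manager.destroy_session(anon)
```

Because the query path depends only on a session id, the way that id was obtained (password, credentials, trust negotiation) can change without touching the search interface.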
Common Communication Interface Traditional Access Control in Decentralized Systems
Assumption: I already know you---you have a local account!
Not a member?
Common Communication Interface Trust Negotiation: Features
Trust is based on parties’ properties
Every party can define access control policies to control outsiders’ access to their sensitive resources
Establish trust iteratively and bilaterally by the disclosure of certificates and by requests for certificates
Common Communication Interface Trust Negotiation: Example
Participants: Alice (requester) and Bob (service provider)
Step 1: Alice requests a service from Bob
Step 2: Bob discloses his policy for the service
Step 3: Alice discloses her policy for VISA
Step 4: Bob discloses his BBB credential
Step 5: Alice discloses her VISA card credential
Step 6: Bob grants access to the service
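The iterative, bilateral disclosure in this example can be sketched as a small fixed-point loop. This is a simplified illustration under stated assumptions: policies are modelled as plain sets of required credentials, the function `negotiate` is hypothetical, and the policy exchange of steps 2 and 3 is folded into the shared policy tables rather than modelled as messages.

```python
def negotiate(service, server_policies, client_policies):
    """Return the disclosure sequence that unlocks `service`, or None.

    Each policy maps an item (the service or a credential) to the set of
    credentials the *other* party must have disclosed first.
    """
    disclosed = {"client": set(), "server": set()}
    policies = {"client": client_policies, "server": server_policies}
    other = {"client": "server", "server": "client"}
    trace = []
    progress = True
    while progress:
        progress = False
        for party in ("server", "client"):
            for item, required in policies[party].items():
                if item in disclosed[party]:
                    continue
                # Release an item once the other side has shown what it requires.
                if required <= disclosed[other[party]]:
                    disclosed[party].add(item)
                    trace.append((party, item))
                    progress = True
                    if item == service:
                        return trace
    return None


# Bob (server) grants the service only after seeing a VISA credential;
# Alice (client) shows her VISA credential only to holders of a BBB credential.
bob = {"service": {"VISA"}, "BBB": set()}
alice = {"VISA": {"BBB"}}
trace = negotiate("service", bob, alice)
```

With these policies the trace is: Bob discloses BBB, Alice discloses VISA, Bob grants the service. If Bob had no BBB credential, no progress would be possible and the negotiation would fail.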
Common Communication Interface Simple Query Interface: Query (I)
Common Communication Interface Simple Query Interface: Query (& II)
Common Communication Interface P2P Proxying Architecture
[Diagram: the EDUTELLA NETWORK of Providers, with an SQI Consumer Proxy and SQI Provider Proxies; Users and External Providers are connected through Web Service SQI]
[Brunkhorst, Olmedilla. Interoperability for peer-to-peer networks: Opening P2P to the rest of the World. EC-TEL, Oct 2006]
Outline
Introduction and Motivation
Interoperability: what is it and why is it needed?
Common Query Interface
Common Metadata Schema
  Learning Resource Schema
  Competence Modeling
Ranking
Successful Interoperability Demonstrations
Conclusions & Open Issues
Common Metadata Schema: Data Integration
Global as View: the unified schema is defined as a view over the sources
S1(attr1, attr2, attr3, attr4)   S2(attr1, attr2)   S3(attr1, attr3, attr4)
Unified Schema (attr1, attr2, attr3, attr4):
SELECT * FROM S1
UNION
SELECT S2.attr1, S2.attr2, S3.attr3, S3.attr4 FROM S2, S3 WHERE S2.attr1 = S3.attr1
Local as View: each source is described by its own view
Unified Schema (attr1, attr2, attr3, attr4)
S1(attr1, attr2, attr3, attr4): SELECT attr1, attr2, attr3, attr4 FROM S1
S2(attr1, attr2): SELECT attr1, attr2 FROM S2
S3(attr1, attr3, attr4): SELECT attr1, attr3, attr4 FROM S3
Common Metadata Schema: Data Integration
Given a query, reformulating it in terms of the sources
  Is easier in GAV (just needs unfolding of the query)
  Is harder in LAV
Adding a new source
  Supposedly easier in LAV (just need to express the new source as a view of the global schema)
  Harder in GAV (as the global schema needs to be revised)
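Why GAV reformulation is "just unfolding" can be seen in a small sketch using Python's standard `sqlite3` module. The table contents and the view name `unified` are made up for illustration; the view definition follows the S1, S2, S3 example from the previous slide. A query against the global view is answered by the database engine expanding the view definition over the sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S1(attr1, attr2, attr3, attr4);
CREATE TABLE S2(attr1, attr2);
CREATE TABLE S3(attr1, attr3, attr4);
INSERT INTO S1 VALUES (1, 'a', 'b', 'c');
INSERT INTO S2 VALUES (2, 'd');
INSERT INTO S3 VALUES (2, 'e', 'f');
-- GAV: the global relation is defined as a view over the sources
CREATE VIEW unified AS
  SELECT attr1, attr2, attr3, attr4 FROM S1
  UNION
  SELECT S2.attr1, S2.attr2, S3.attr3, S3.attr4
  FROM S2 JOIN S3 ON S2.attr1 = S3.attr1;
""")

# A query over the global schema; the engine unfolds the view into source queries.
rows = conn.execute(
    "SELECT attr1, attr4 FROM unified ORDER BY attr1"
).fetchall()
```

Adding a fourth source in this setting means editing the `unified` view itself, which is the GAV drawback named above.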
Common Metadata Schema: Simple Learning Resource Schema
Common Metadata Schema: Complex Learning Resource Schema
Common Metadata Schema: Competence Requirements
Excerpt extracted from a newspaper
  Complete Master’s Degree (any faculty)
  Expert knowledge in Java (J2EE, Servlets, JSP)
  Very good IT English and / or Spanish
Drawbacks
  Does not indicate what is mandatory or optional
  It is not machine-understandable
Common Metadata Schema: Competence Definition
“an effective performance within a domain / context at different levels of proficiency”
Example: Competency “English Language”, Level “Advanced”, Context “Computer Science”
[Diagram: a Competence combines a Competency, a Proficiency Level and a Context]
Common Metadata Schema: Competency
We use IEEE RCD to represent a Competency
Uniquely identifies an isolated competency
Enriched with human-readable titles and descriptions
[Diagram: UML model of an RCD. Classes: RCD, Definition, Statement, Metadata. Attributes shown: Id, Title, Description, Model Source, Statements (1..*), Name, Text, Token, RCD Schema, RCD Schema Version, Additional Metadata]
Common Metadata Schema: Proficiency Level
Reusable scales of totally ordered proficiency levels
Each level is identified by an ID, a human-readable label and an optional mapping to a numerical domain
[Diagram: a Proficiency Scale holds an ordered list of levels (1..*); each Proficiency Level has a Label and an optional Universal Scale Mapping [0..1]]
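A totally ordered, reusable scale can be sketched with two small data classes. The class and field names below are illustrative assumptions rather than the schema's actual identifiers, and the numeric mappings are invented for the example. The point is that the position of a level in the ordered list is enough to compare two levels.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProficiencyLevel:
    level_id: str
    label: str                                  # human-readable label
    universal_mapping: Optional[float] = None   # optional numeric mapping


@dataclass(frozen=True)
class ProficiencyScale:
    levels: Tuple[ProficiencyLevel, ...]        # ordered from lowest to highest

    def rank(self, level_id):
        for position, level in enumerate(self.levels):
            if level.level_id == level_id:
                return position
        raise KeyError(level_id)

    def meets(self, held_id, required_id):
        """True if the held level is at least the required level."""
        return self.rank(held_id) >= self.rank(required_id)


# Hypothetical three-level scale for a language competency.
english = ProficiencyScale((
    ProficiencyLevel("basic", "Basic", 0.25),
    ProficiencyLevel("intermediate", "Intermediate", 0.5),
    ProficiencyLevel("advanced", "Advanced", 1.0),
))
```

For instance, `english.meets("advanced", "intermediate")` holds, while `english.meets("basic", "advanced")` does not.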
Common Metadata Schema: Context
“... the interlaced conditions in which something exists or occurs”
Competences might be interpreted differently in a different context
Contexts are defined in tree-like hierarchies
  Easier to model and to handle
  Simpler algorithms, no cycle detection necessary
  May optionally link to additional ontologies
[Diagram: Context (Label) with an optional subClassOf link (0..1) to a parent Context]
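Because each context has at most one parent, checking whether one context falls under another is a simple upward walk. The sketch below assumes the hierarchy is stored as a child-to-parent dictionary; the function name `is_within` and the example labels are hypothetical.

```python
def is_within(context, ancestor, parent_of):
    """Follow subClassOf links upward; in a tree this always terminates."""
    while context is not None:
        if context == ancestor:
            return True
        context = parent_of.get(context)  # None once the root is passed
    return False


# Illustrative hierarchy: child -> parent (the root has no entry).
parent_of = {
    "Computer Science": "Science",
    "Software Engineering": "Computer Science",
}

specific_under_general = is_within("Software Engineering", "Science", parent_of)
general_under_specific = is_within("Science", "Computer Science", parent_of)
```

With a general graph instead of a tree, the same walk would need a visited set to guard against cycles, which is the simplification the slide refers to.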
Common Metadata Schema: Competence
Links to the dimension objects
  High degree of reusability
  Better support for gap analysis
Competences can be simple or composed of other (arbitrarily nested) competences
  Aggregation
  Set Selection
[Diagram: UML model. A Competence (Name; identifier: Global Identifier with Catalogue and Entry) is either a Simple Competence (RCD Ref, Prof Level Ref, Context Ref, pointing to RCD, Proficiency Level and Context) or a Composite Competence. A Composite Competence is an Aggregate Competence (parts 2..*, Sequenced : boolean = false) or an Alternative Competence (alternatives 2..*, minNumber : Integer = 1, maxNumber : Integer)]
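The nesting of simple, aggregate and alternative competences lends itself to a recursive check, which is the core of a gap analysis. The sketch below is an assumption-laden illustration: the class and field names are adapted from the diagram labels, `satisfied` is a hypothetical helper, levels are matched exactly (no "higher level covers lower"), and `max_number` is carried but not enforced.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class SimpleCompetence:
    rcd_ref: str          # reference to a competency definition (RCD)
    prof_level_ref: str   # reference to a proficiency level
    context_ref: str      # reference to a context


@dataclass(frozen=True)
class AggregateCompetence:
    parts: Tuple["Competence", ...]          # 2..* parts, all required
    sequenced: bool = False


@dataclass(frozen=True)
class AlternativeCompetence:
    alternatives: Tuple["Competence", ...]   # 2..* options
    min_number: int = 1
    max_number: Optional[int] = None         # not enforced in this sketch


Competence = Union[SimpleCompetence, AggregateCompetence, AlternativeCompetence]


def satisfied(required, acquired):
    """True if the set `acquired` of simple competences covers `required`."""
    if isinstance(required, SimpleCompetence):
        return required in acquired
    if isinstance(required, AggregateCompetence):
        return all(satisfied(part, acquired) for part in required.parts)
    hits = sum(satisfied(option, acquired) for option in required.alternatives)
    return hits >= required.min_number


# The newspaper excerpt, made explicit: Java AND (English OR Spanish).
java = SimpleCompetence("Java", "Expert", "IT")
english = SimpleCompetence("English", "Very good", "IT")
spanish = SimpleCompetence("Spanish", "Very good", "IT")
job = AggregateCompetence((java, AlternativeCompetence((english, spanish))))

fits = satisfied(job, {java, spanish})
gap = not satisfied(job, {english})
```

Encoding the requirement this way states which parts are mandatory and which are alternatives, and makes it machine-checkable, addressing the two drawbacks listed for the free-text excerpt.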
Common Metadata Schema: A Bit More Complex Motivation Scenario (Revisited)
Alice is a consultant and she has been asked to lead a project starting in two months. Now she needs to retrieve courses in order to
  refresh and improve her previous knowledge of project management
  get some basic knowledge about accounting and auditing
  practice her advanced level of English
Outline
Introduction and Motivation
Interoperability: what is it and why is it needed?
Common Query Interface
Common Metadata Schema
Ranking
  Link-based Personalized Ranking Platform
Successful Interoperability Demonstrations
Conclusions & Open Issues
Ranking: PageRank
Page score based on the link structure of the web
It measures page popularity
  page i pointing to page j means a vote from i to j
  The more backlinks a page has, the more important it is
  Sum of the ranks of the backlinks
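The "sum of the ranks of the backlinks" idea can be sketched as a power iteration. This is a minimal version under common textbook assumptions that the slide does not state: a damping factor of 0.85, uniform teleportation, and dangling pages spreading their rank evenly. The function name and the three-page graph are illustrative.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it points to."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts with the teleportation share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its rank evenly among the pages it votes for.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page: spread its rank over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


# a and b both vote for c; c votes for a; nobody votes for b.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
```

In this graph `c` has two backlinks and ends up with the highest score, `a` inherits most of `c`'s rank through its single backlink, and `b` keeps only the teleportation share.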
Ranking: PageRank Example
Ranking: PageRank Personalization
It has a personalization vector
Computationally expensive: not possible to make the whole computation for each user
Ranking: Personalized PageRank
Hubs: pages pointing to many important pages
Compute one Personalized PageRank Vector (PPV) for each user
Challenges:
  Reduce storage required
  Reduce time for computation
Each PPV corresponding to a Preference Set P can be expressed as a linear combination of Basis Hub Vectors
Each Basis Hub Vector is decomposed into two parts:
  Hub skeleton vector (common interrelationships, precomputed)
  Partial vector (unique values, computed at construction time)
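The linearity property is what makes per-user vectors affordable: a PPV for a mixed preference set is the same weighted mix of the per-hub basis vectors. The sketch below checks this numerically on a toy graph. It is not the hub skeleton / partial vector decomposition itself; the function name, the damping value of 0.85 and the graph are assumptions, and every page is assumed to have at least one out-link.

```python
def personalized_pagerank(links, preference, damping=0.85, iterations=100):
    """preference maps pages to teleportation weights that sum to 1."""
    pages = sorted(links)
    rank = {p: preference.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        # Teleportation returns to the preferred pages instead of to all pages.
        new = {p: (1.0 - damping) * preference.get(p, 0.0) for p in pages}
        for p in pages:
            share = damping * rank[p] / len(links[p])
            for q in links[p]:
                new[q] += share
        rank = new
    return rank


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}

# Basis vectors: the preference set contains a single hub each.
basis_a = personalized_pagerank(links, {"a": 1.0})
basis_d = personalized_pagerank(links, {"d": 1.0})

# A user who prefers hubs a and d equally.
combined = personalized_pagerank(links, {"a": 0.5, "d": 0.5})
max_error = max(
    abs(combined[p] - 0.5 * (basis_a[p] + basis_d[p])) for p in links
)
```

`max_error` stays at floating-point noise, so the combined vector never has to be computed from scratch once the basis vectors are available.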
Ranking: Personalized PageRank Limitations
Personalization relies on the user’s ability to choose a good Preference Set
  High-quality hubs which match his preferences
This process can be automated:
  Information collected from the user can be used to derive his Preference Set
  The user does not even need to know what a hub is
Ranking: A Personalized Ranking Platform (I)
Personalization relies on the user’s ability to choose a good Preference Set
  High-quality hubs which match his preferences
This process can be automated:
  Information collected from the user can be used to derive his Preference Set
  The user does not even need to know what a hub is
Ranking: A Personalized Ranking Platform (II)
User’s interests are determined by
  Most surfed pages
  User’s bookmarks
We get a set of pages from the user, but
  They are not highly ranked hubs
HubFinder is an algorithm to find related web pages
  It allows pluggable filtering mechanisms
We use HubRank to find highly rated hubs related to a given initial set of pages
User web pages → set of related highly rated hubs
Ranking: A Personalized Ranking Platform (& III)
[Diagram: a Crawler crawls the Web and computes HubRank scores for the Search Engine index. In the User's Browser, bookmarks are extracted and a Proxy tracks surfed pages while the user surfs the Internet. HubFinder maps the user's bookmarks and most surfed pages to related hubs, which form the user's hub set (the preference set) used when the user queries the search engine]
Ranking: Selected Example (I)
Crawl with 3,000,000 web pages
30 bookmarks
  15 on architecture
  7 on traveling
  6 on software
  2 on sports
78 selected surfed pages
Computed 1300 pages as hub set
Ranking: Selected Example (II)
Query         PageRank               PPR                    PROS
Keywords      Rel.  P.Rel.  Irrel.   Rel.  P.Rel.  Irrel.   Rel.  P.Rel.  Irrel.
architecture  5     3       2        3     7       0        8     2       0
building      3     2       5        2     3       5        4     1       5
Paris         6     0       4        2     3       5        6     2       2
park          6     0       4        8     0       2        10    0       0
surf          3     0       7        4     2       4        7     2       1
Total         23    5       22       19    15      16       35    7       8
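To make the totals easier to compare, the share of results judged relevant can be computed per algorithm. The snippet below only does arithmetic on the "Total" row; reading "P.Rel." as partially relevant is an assumption, and the variable names are illustrative.

```python
# (Rel., P.Rel., Irrel.) totals over the five queries, copied from the table.
totals = {
    "PageRank": (23, 5, 22),
    "PPR": (19, 15, 16),
    "PROS": (35, 7, 8),
}

# Fraction of all judged results that were marked relevant.
relevant_share = {
    name: rel / (rel + partial + irrel)
    for name, (rel, partial, irrel) in totals.items()
}
```

Each algorithm has 50 judged results in total (ten per query), giving relevant shares of 0.46 for PageRank, 0.38 for PPR and 0.70 for PROS.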
Ranking: Selected Example (& III)
Outline
Introduction and Motivation
Interoperability: what is it and why is it needed?
Common Query Interface
Common Metadata Schema
Ranking
Successful Interoperability Demonstrations
  HCD-Online: Advanced Network Search
  Bringing Learning Repositories to a Global Network
  Knowledge Resource Sharing for a Life Long Learning Infrastructure
Conclusions & Open Issues
Successful Interoperability Demonstrations: HCD-Online: Advanced Network Search
Successful Interoperability Demonstrations: PROLEARN & GLOBE
Successful Interoperability Demonstrations: TENCompetence, MACE, MELT, …
[Diagram: three-layer architecture]
INFORMATION SOURCE LAYER: Lionshare, Ariadne, YouTube, Flickr, Lobster, ..., accessed through Wrappers
SERVICE LAYER: Federated Search Engine (Search, Publishing), Information Source Registry (Information Source Mgmt.), User Database (User Mgmt., Rating)
CLIENT LAYER: Clients, Lionshare Peer at Desktop GUI, Taste
Conclusions & Further Work: Conclusions
Interoperability is a key technological issue
The lack of standards and of reusable integration solutions prevents users from finding the information they need
Conclusions & Further Work: Main contributions
1. Identification of Requirements for system interoperability
2. Specification and Standardization of Simple Query Interface
3. SQI-based open-source components for easy adoption by information providers
4. Proxying architecture for distributed environments such as P2P networks
5. Data models and ontologies for semantic representation of learning objects and competences
6. Semantic integration based on query rewriting mechanisms
7. New personalized ranking algorithms for linked and unlinked corpora
8. Proof-of-concept integrated prototypes
9. Demonstration of interoperability achievement through several networks and projects worldwide
Conclusions & Further Work: Further Work
Interfaces for other services than search (e.g., publishing)
More research on flexible query languages (e.g., PLQL)
Development and Evolution of schemas
Adaptation, optimization and improvement of ranking algorithms
Questions?
[email protected] - http://www.olmedilla.info/
Thanks!