

Slide 1 Reproduction prohibited without authorization by Computas AS ©

Implementation of Topic Centered Portals

David Norheim, Computas AS, Norway

Robert Engels, ESIS AS, Norway

Slide 2

Motivation

The system

Challenges and lessons learned

Future work

Slide 3

Computas

23 years of experience in knowledge management, expert systems, and process modeling

Special focus on government and the oil and gas sector

The major semantic web company in Norway

Slide 4

Computas’ semantic Web activities

Sectors

• Oil and gas industry

• Government

Type of applications

• Knowledge management

• Semantic search support

• Research and commercial projects

Slide 5

Background: a clear shift towards open source and open standards

• Linux for Schools, Open Document formats in the public sector

• National semantic registry, Large governmental information portals based on semantic standards

• The government, through the Norwegian Archive, Library and Museum Authority (ABM-utvikling): development of open-standards-based, open-source software for creating and maintaining topic-driven portals.

“there is a need for a targeted effort to create a framework based on Semantic Web to enable professional users to organize information and to make libraries build and maintain metadata-driven search solutions.”

A digital culture and knowledge policy?

EFN.no

Slide 6

A topic driven portal

For a library it is as natural to evaluate, describe and enable retrieval of any resource on the web as of printed material

A quality-evaluated collection of information resources, organized according to a topic structure and published online.

Retrieval through search and navigation in topics

Source: Ellen Aarbakken, Oslo Public Library (Deichmanske Bibliotek)

Slide 7

Why a topic centric portal tool and not search?

Yahoo! provided the first subject-driven portal, but it focused on the most popular aspects and was replaced by search (e.g. Google)

However, the words in the long tail are context dependent, and generic web search will frequently pollute results due to ambiguity

Examples of long-tail portals

• Medical information for laymen

• Primary school educational resources

• Public information for immigrants

• Legal information for laymen

• Norwegian architecture portal

Slide 8

Why not Web 2.0?

Folksonomies

• Collaborative “categorization”

• Freely chosen keywords

• Manual “tagging”, practically no existing metadata

• Mostly acting as a popularity measure

Topic tools

• Conceptual level with navigation

• Quality evaluated with metadata

• Manual “tagging”, but support for more automation

Slide 9

SUBject oriented tool for LIbraries, Museums and Archives

Several roads to the same destination

Key requirements in developing the tool

• Handle metadata of various sources and vocabularies (e.g. Dublin Core)

• Interoperability among portals based on the same tool and the same protocols (SPARQL, SRU)

• Open source and open (semantic web) standards

• Combining free text search and navigation through models

• Handling both informal and formal models (e.g. SKOS and OWL DL) (future work)

Slide 10

Two initial portals

Scandinavian Medical Information for Laymen (SMIL) is a Scandinavian cooperation that offers quality-controlled metadata with references to pages related to health, illnesses and treatments. Contributing partners to the portal are librarians and nurses from the Nordic countries. The current SMIL base consists of 8,500 records, yielding around 250,000 triples.

Detektor targets public schools. Its resources are annotated by public libraries. It consists of about 1,850 topics and 4,600 resources, which results in about 100,000 triples.

Slide 11

Portal technical characteristics (grounding technologies)

| Technology | Name, release | Comment |
|---|---|---|
| Operating system | Linux Ubuntu | Also tested on Red Hat, Windows and OS X |
| Database | PostgreSQL (under Jena), indexing with Lucene | Should work with any storage that supports SPARQL and SPARUL |
| Document repository | The Web, any URLs | |
| Web server | Apache Tomcat v5.5 and 6.0 | Also tested on Resin |
| Applied ontology | Domain ontology and portal ontology (object types) | |
| Ontology language | SKOS, RDF/S | Currently implementing OWL support |
| Export/import | RDF/XML, Turtle | |
| Reuse and interoperability | Vocabularies: DC, FOAF, SIOC, Powder, Lingvoj. Query languages: SPARQL, CQL | Also using SPARUL |
| Inference engine | None | Will implement an inference engine with OWL DL support in Q4 2008 |
| Ontology editor | Internal web-based, Protégé (external) | Export the ontology and continue to work in any RDF/OWL compliant ontology editor |
| User interface | HTML, Apache Cocoon | |
| License | Open source, CDDL license | |

Evaluation criteria inspired by the Esperonto project

Slide 12

Architecture

[Architecture diagram. Labels, in extraction order: Web client; Search and navigation; SPARQL dispatcher; SPARQL queries; Local endpoint; Indexing; Topic ontology; Metadata store; Ontology maintenance; External clients; SRU client; External servers; SRU server; SPARQL endpoint; Crawler; Portal configuration; SPARQL update; Web resources; Open search; SPARQL client]

The client consists of a search interface that lets users combine free-text and metadata search. The search string is translated into a structured SPARQL query.

Interoperability at the query layer

The system accepts queries over both SPARQL and SRU/CQL

The back end consists of an RDF store with SPARQL interfaces. Free-text indexing uses Lucene/LARQ.

The system can query external SPARQL and SRU/CQL services
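
As an illustration of how a free-text search string might be turned into a structured SPARQL query, here is a minimal Python sketch. It is not Sublima's actual code: the function name `build_describe_query`, the word sanitizing, and the trailing-wildcard convention are assumptions made for this example, and the query shape simply follows a LARQ `pf:textMatch` pattern over `dc:title`.

```python
# Namespaces commonly used with Jena LARQ and Dublin Core (assumed here).
PREFIXES = (
    "PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>\n"
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
)


def build_describe_query(search_string: str) -> str:
    """Turn a free-text search string into a DESCRIBE query (illustrative only)."""
    # Keep only letters and digits so user input cannot break out of the literal.
    words = ["".join(ch for ch in word if ch.isalnum()) for word in search_string.split()]
    words = [word.lower() for word in words if word]
    if not words:
        raise ValueError("search string contains no searchable words")
    # Assumed convention: prefix-match every word, as in "cancer*".
    text_pattern = " ".join(f"{word}*" for word in words)
    return (
        PREFIXES
        + "DESCRIBE ?x\n"
        + "WHERE {\n"
        + f'  ?lit pf:textMatch "{text_pattern}" .\n'
        + "  ?x dc:title ?lit .\n"
        + "}\n"
    )


if __name__ == "__main__":
    print(build_describe_query("Cancer treatment"))
```

Stripping everything except letters and digits is a deliberately blunt way to keep the generated query well formed. A real portal would probably need a more careful treatment of Lucene query syntax.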

Slide 13

Sublima

Ontologies generally provide the structure for navigating the results and support browsing and classification.

Ontologies allow for term disambiguation, query rewriting and semantic distance measures

In Sublima we use informal SKOS for

• Navigating through subjects, showing the subject relations (“fish eye”)

• Search expansion: synonyms, common misspellings

• Faceted filtering: topics as well as other metadata

Future version will also support OWL DL
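
The search-expansion idea can be sketched in a few lines of Python. This is a hypothetical illustration, not Sublima's implementation: it assumes synonyms are stored as SKOS alternative labels and common misspellings as SKOS hidden labels, and the dictionary layout and the name `expand_term` are invented for the example.

```python
def expand_term(term, concepts):
    """Return the search term plus every label of any concept it matches.

    `concepts` is a list of dicts with a "prefLabel" and optional
    "altLabels" (synonyms) and "hiddenLabels" (e.g. misspellings).
    """
    needle = term.lower()
    expanded = [needle]
    for concept in concepts:
        labels = [
            concept["prefLabel"],
            *concept.get("altLabels", []),
            *concept.get("hiddenLabels", []),
        ]
        lowered = [label.lower() for label in labels]
        if needle in lowered:
            # The term names this concept, so search for all of its labels.
            for label in lowered:
                if label not in expanded:
                    expanded.append(label)
    return expanded


if __name__ == "__main__":
    topics = [{"prefLabel": "Influenza", "altLabels": ["flu"], "hiddenLabels": ["influensa"]}]
    print(expand_term("Flu", topics))
```

A query for any one label then also retrieves resources indexed under the concept's other labels.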

Slide 14

Good and bad choices, lessons learned the hard way

• Keeping the semantics

• Living with free-text indexing and structured queries

• Tool maturity

• Scalability

Keep in mind this is NOT a research project: there are real and demanding customers who expect everything to work

Slide 15

Preserving the semantics

We needed flexibility for users to add any metadata without touching code

SPARQL SELECT loses the meaning because it returns only variable bindings, so clients become static. We therefore used SPARQL DESCRIBE extensively

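PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
# Describe every resource whose title matches the free-text pattern
DESCRIBE ?x
WHERE {
  ?lit pf:textMatch "cancer*"@en .
  ?x dc:title ?lit .
}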
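
To make the point about static clients concrete, the following Python sketch shows a client that renders whatever triples a DESCRIBE result contains, so a newly added metadata property shows up without code changes. It is only an assumption-laden illustration: triples are modeled as plain string tuples, and `group_description`, `render` and the `ex:readingLevel` property are hypothetical names, not part of Sublima.

```python
from collections import defaultdict


def group_description(triples):
    """Group (subject, predicate, object) triples by subject, then by predicate."""
    grouped = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        grouped[subject][predicate].append(obj)
    return {subject: dict(props) for subject, props in grouped.items()}


def render(triples):
    """Render every property found in the description, whatever it is called."""
    lines = []
    for subject, props in sorted(group_description(triples).items()):
        lines.append(subject)
        for predicate in sorted(props):
            for value in props[predicate]:
                lines.append(f"  {predicate}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    described = [
        ("http://example.org/r1", "dc:title", "Cancer treatment"),
        # A property added later by a portal editor; no client change needed.
        ("http://example.org/r1", "ex:readingLevel", "beginner"),
    ]
    print(render(described))
```

A client built around SELECT would instead have to name each variable in advance, which is why new metadata fields would require touching the code.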

Slide 16

Living with free-text indexing and structured queries

Indexing with respect to structure

Our breastfeeding twin-problem

• Not sufficient to index all literals as users expect hits on the combination of dc:title and dc:description

• And even worse: the combination of dc:title and dc:subject/skos:prefLabel

Scoring/ranking

• Easy with SELECT, but not with DESCRIBE

• How do you rank results from a structured query?

No universal way to handle structured and unstructured information
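
One way to read the indexing problem above is that a two-word query should hit a resource even when each word sits in a different property. The Python sketch below illustrates that reading with a combined per-resource index. The field names, the `subject_label` key and both function names are assumptions made for the example, and this is not how Lucene/LARQ indexes literals.

```python
# Properties whose literals are merged into one searchable bag of words
# per resource (an assumed selection, for illustration only).
INDEXED_FIELDS = ("dc:title", "dc:description", "subject_label")


def build_index(resources):
    """Map each resource URI to the lowercase words found in any indexed field."""
    index = {}
    for uri, fields in resources.items():
        words = set()
        for name in INDEXED_FIELDS:
            for value in fields.get(name, []):
                words.update(value.lower().split())
        index[uri] = words
    return index


def search(index, query):
    """Return the URIs whose combined fields contain every word of the query."""
    wanted = set(query.lower().split())
    return sorted(uri for uri, words in index.items() if wanted <= words)


if __name__ == "__main__":
    resources = {
        "http://example.org/r1": {
            "dc:title": ["Breastfeeding"],
            "dc:description": ["Advice for parents of twins"],
        },
        "http://example.org/r2": {"dc:title": ["Breastfeeding basics"]},
    }
    print(search(build_index(resources), "breastfeeding twins"))
```

Indexing each literal on its own would miss the first resource for this query, because no single literal contains both words. The sketch does not address ranking, which the slide raises as a separate open question.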

Slide 17

Consistent tool maturity and missing links

Some “small” issues

• Lack of Turtle support in Protégé, so we needed to convert to RDF/XML

• Resources identified with URLs in Protégé

• Tools mostly geared towards one dialect of RDF/OWL

• Non-deterministic RDF/XML serialization for XSLT processing

• Lacking a binding from OWL classes to OO languages

The simple things sometimes turn out to be the hardest…

Slide 18

Scalability

Response time varies with store size and query complexity

• Too much complexity in queries

Moving from 500k triples to tens of millions

• Need to refactor into smaller, faster queries

• Federation of queries

Slide 19

Some good lessons

New standards (e.g. SPARQL), proposals for standardization (e.g. SPARUL), new tools (e.g. Jena), open source (e.g. Tomcat, Apache) and a lack of good documentation all signal high risk!

However, the support and maintenance from the W3C community and open source developers (e.g. the Jena team) have been impressive. The support through IRC channels, mailing lists, etc. has been invaluable for the project.

Slide 20

Some good lessons

Good experiences with reusing metadata schemas

• FOAF, Dublin Core, Powder, SKOS, SIOC, Lingvoj

Extensive dereferencing of URIs: any topic or resource URI pasted into the browser results in a DESCRIBE query for that URI.
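
The dereferencing behavior can be sketched as a small Python helper that maps a requested URI to a DESCRIBE query. The function name `describe_query_for` and the validation rule are assumptions for illustration. The check follows the characters the SPARQL grammar disallows inside `<...>` IRI references and is not taken from the Sublima code base.

```python
# Characters that may not appear inside a SPARQL IRI reference.
_FORBIDDEN = set('<>"{}|^`\\')


def describe_query_for(uri: str) -> str:
    """Build the DESCRIBE query for a dereferenced topic or resource URI."""
    # Reject empty input, control characters, spaces and forbidden characters
    # so the URI cannot terminate the IRI reference and inject query text.
    if not uri or any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in uri):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return f"DESCRIBE <{uri}>"


if __name__ == "__main__":
    print(describe_query_for("http://example.org/topic/cancer"))
```

Because the same query form serves every topic and resource, the server side of dereferencing needs no per-type logic in this sketch.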

Slide 21

Living with informal and formal ontologies

Current ontologies are modeled informally with the W3C Simple Knowledge Organization System (SKOS)

• No distinction between part-of, contains, is-a

• No reasoning support

• Possible with small datasets

Sublima will also support models using formal ontologies

• Formal IS-A

• DL reasoning

• Required for large datasets

[Figure labels: Expressivity; Reasoning; Large data sets; Smaller data sets]

Slide 22

Future work

• Integration with other SPARQL-based portals.

• Interoperability with ISO Topic Maps models

• Graphical visualization with touch screen, clever UIs

• Hi-quality multimedia resources

The code base is now in use in more projects

Slide 23

Conclusion

We clearly found that the currently available technology is starting to reach a certain maturity when it comes to functionality. BUT THERE ARE STILL RISKS!

Careful evaluation of tools and scalability is needed as content increases.

Query interoperability

Do not eat the whole menu at once!

[Figure labels: Recording companies; Broadcasters; High quality metadata; Open metadata (e.g. Wikipedia)]

Slide 24

Thank you for your attention! [email protected]

We welcome the chance to share experiences with you! Welcome to upcoming conferences in Norway next year

• Mid-February in Oslo - hands-on tutorials

• May in Stavanger - Semantic Days focusing on the oil and gas industry

• September 2008 - initiating Scandinavian Semantic Web Conference