Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software. Sean Boisen ([email protected]). Slides at: http://semanticbible.org/other/presentations/2007-SemTech/

Upload: sboisen

Post on 12-May-2015


DESCRIPTION

Presented May 24, 2007 at the Semantic Technology Conference. This talk describes an effort at Logos Research Systems to build a semantic knowledgebase encompassing general background information about entities and relationships from the Bible (one of the world's most popular collections of information). The scope includes people, places, belief systems, ethnic attributes, and social roles, as well as family and other interpersonal relationships, places visited, etc. This Bible Knowledgebase (BK) will be used to support knowledge discovery and visualization in both desktop and web-server configurations for Logos' products. It will also provide an integration framework for Logos' substantial digital library (more than 7000 titles from over 100 different publishers). The project is a good example of what it takes to move a real-world, knowledge-intensive application into a Semantic Web framework.

TRANSCRIPT

Page 1: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Deploying Semantic Technologies for Digital Publishing

A Case Study from Logos Bible Software

Sean Boisen ([email protected])

Slides at: http://semanticbible.org/other/presentations/2007-SemTech/

Page 2: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Outline

• Background: application and motivation
• Scope and Overview
• Technical Challenges:
  – Reification for provenance data
  – Converting legacy data
  – Tools for knowledge extension
• Future directions

Page 3: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Who Am I?

• 19 years with BBN Technologies
  – Information extraction, human language technology
  – Scientist, technology manager
• Semantic Web hobbyist
• Senior Information Architect at Logos
• One-man semantic band

Page 4: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

The Importance of the Bible as a Semantic Domain

• The most widely distributed book
  – 35M Bibles and Testaments in 2005
• The most widely translated work
  – > 2000 languages
  – 41 languages at www.biblegateway.com
• Spans 1000s of years of ancient history

Page 5: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Logos Bible Software

• High-end desktop digital library
  – > 7000 titles
  – Resources in a dozen languages
  – Users in 180 countries
  – Extensive cross-indexing and hyperlinking
• Leading publisher and developer of digital resources for Bible study

Page 6: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Logos Value

• Digital library with hyperlinked references and citations

• Information integration for navigation, search

• Support for original languages

• Search

• New content to enrich Bible study

Page 7: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

The Bible Knowledgebase (BK)

• A machine-readable knowledgebase of semantically-organized Bible data
  – In OWL
  – Linked to Biblical texts
  – Search, navigation, visualization
• Relationships support discovery and exploration
• Reusable content (unlike prose)
• Integration framework for library resources (future)
• Today: named people and places, and their relationships
• Tomorrow: chronology, events, concepts, non-named things, key terms, topics, …

Page 8: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Approach

• Build on Semantic Web standards
• Model the domain rather than annotate texts
• Layer knowledge: first entities, then relationships
• Be conservative in what we assert and provide references as evidence
• Try to avoid philosophy and focus on end-user value

Page 9: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

The Semantic Value Proposition

• Identify and disambiguate entities (beyond names)
  – 30 people named Zechariah
  – Jesus’ disciple: Peter, Simon, Simeon, Cephas …
  – Judah: person, tribe, territory
• Link reference information to passages for background
• Provide a rich set of relationships to encourage exploration and discovery
• Provide consistent cross-resource indexing
• Leverage third-party tools
• Provide scalability
• Avoid reinventing the wheel

Page 10: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

User Benefits

• Disambiguation makes search work better
• Passage guide displays relevant entities to provide background information
• Relationships encourage browsing and exploration
• Visualization makes complex information easier to grasp

Page 11: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Development Tools

• Ontology development and instance creation with Protégé
• Legacy data conversion and data merging through XSLT
• Storage in Sesame
• Some integration code in Python for loading and querying RDF
• TBD

Page 12: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Most Important BK Classes

• > 60 classes in all (not counting reified relationships)

• Many upper classes are not instantiated

• General coordination of class names with SUMO
  – But not true re-use

Page 13: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

BK Classes for Places

Page 14: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

BK Abstract Classes

Page 15: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

BK Instances

• ~100k triples
• ~3000 people instances
  – Aaron to Zurishaddai
  – Names (various languages)
• ~20k passage references for assertions
• 90 cities, other places
• Ethnicities, belief systems, languages, social roles, organizations

Page 16: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Major BK Relationships

Domain | Property | Range
Human | Family relationships | Human
Human | Knows, collaborates, antagonist, enemy | Human
Human | Member of | Group
Human | Native, resident, visited place | Region
Human | (attributes) | Social role, Ethnicity, Belief
Region | Subregion | Region
Region | Geolocation data | Latitude, longitude, etc.

And inverse relationships …

Page 17: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Challenge: Assertions about Properties

• Provenance is important to the domain and application
• Problem: how to make assertions about properties
  – <#John.3, isFatherOf, #Peter>: says who?

[Diagram: #John.3 connected to #Peter and to #Andrew.1 by isFatherOf / hasFather property pairs]

Page 18: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Reification

• Merriam-Webster:
  – “to regard (something abstract) as a material or concrete thing”
• Model the relationship between instances as an instance itself

Page 19: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Reified Relationships

• Solution: make the relationship an object about which we can make assertions
  – All “simple” properties get more complex

[Diagram: #John.3 is linked to the relationship instance #John.3_parent_Andrew.1 by isFatherOf / hasFather, and #Andrew.1 is linked to the same instance by isSonOf / hasSon; the instance carries the reference “bible.64.1.42”. A second instance, #John.3_parent_Peter, links #John.3 and #Peter in the same way and carries the references “bible.64.21.15”, “bible.64.21.16”, “bible.64.21.17”.]
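The expansion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are plain tuples rather than entries in an RDF store, the direction of each property (person to relationship node, relationship node back to person) is one reading of the diagram, and the class name `bk:ParentRelation` and the helper `reify_parent` are hypothetical.

```python
def reify_parent(father, son, references):
    """Expand one father/son fact into a relationship node plus its triples."""
    # Node name follows the slide's pattern, e.g. #John.3_parent_Peter.
    rel = f"{father}_parent_{son.lstrip('#')}"
    triples = {
        (father, "isFatherOf", rel),
        (rel, "hasFather", father),
        (son, "isSonOf", rel),
        (rel, "hasSon", son),
        (rel, "rdf:type", "bk:ParentRelation"),  # hypothetical class name
    }
    # Provenance hangs off the relationship node, one triple per passage.
    triples |= {(rel, "reference", ref) for ref in references}
    return rel, triples


rel, triples = reify_parent(
    "#John.3",
    "#Peter",
    ["bible.64.21.15", "bible.64.21.16", "bible.64.21.17"],
)
for triple in sorted(triples):
    print(triple)
```

Because the passage references attach to the relationship node rather than to either person, the "says who?" question from the previous slide has somewhere to be answered.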

Page 20: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Some Consequences of Reification

• Class and property instance overhead
  – 2 simple inverse properties become 4 properties and 1 class
  – Abstract hierarchy of classes of reified relationships
  – Also adds overhead to ontology development, query construction, etc.
• Symmetric and transitive properties
• Challenges for reasoning
  – Restrictions come from a combination of properties and reified classes
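The query-construction overhead mentioned above can be made concrete with a small comparison. The sketch below is not the BK implementation: it assumes triples held in a Python set, and the property directions are an assumed reading of the slides. The helper names `objects`, `sons_simple`, and `sons_reified` are hypothetical.

```python
# Non-reified: one triple answers the question, but evidence has nowhere to go.
simple_kb = {("#John.3", "isFatherOf", "#Peter")}

# Reified: the same fact routed through a relationship node (assumed layout).
reified_kb = {
    ("#John.3", "isFatherOf", "#John.3_parent_Peter"),
    ("#John.3_parent_Peter", "hasFather", "#John.3"),
    ("#Peter", "isSonOf", "#John.3_parent_Peter"),
    ("#John.3_parent_Peter", "hasSon", "#Peter"),
    ("#John.3_parent_Peter", "reference", "bible.64.21.15"),
}


def objects(kb, subject, prop):
    """All objects of (subject, prop, ?) in sorted order."""
    return sorted(o for s, p, o in kb if s == subject and p == prop)


def sons_simple(kb, father):
    # One lookup is enough in the non-reified model.
    return objects(kb, father, "isFatherOf")


def sons_reified(kb, father):
    """Return {son: [references]} by hopping through relationship nodes."""
    result = {}
    for rel in objects(kb, father, "isFatherOf"):
        for son in objects(kb, rel, "hasSon"):
            result[son] = objects(kb, rel, "reference")
    return result


print(sons_simple(simple_kb, "#John.3"))
print(sons_reified(reified_kb, "#John.3"))
```

The reified query needs two hops instead of one, but it is the only version that can return the supporting passages along with the answer.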

Page 21: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Reified Classes (Family)

• All binary relationships with appropriate restrictions on their arguments (max 2, range restrictions, etc.)

Page 22: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Other Reified Classes

Page 23: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Properties Between Reified Properties

• Beyond OWL
• Defined with respect to particular reified classes
• Automatically derivable from the ontology

[Diagram: the four properties reif:isFatherOf, reif:isSonOf, reif:hasSon and reif:hasFather, connected by two owl:inverseOf links and by reif:inverseOf, reif:pairedProperty, reif:onsetOf and reif:codaOf links]

Page 24: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Reified Relationships: Names

• Appellations (names) are class instances
  – An Appellation instance has string representations (in various languages)
  – Keeps all the facts about a name (different language versions, pronunciation, literal meaning, etc.) in one place
• An individual has a (reified) NamingRelation to an Appellation instance
  – Mentions of the individual are properties of the NamingRelation

Page 25: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Reified Names Example

[Diagram: individuals #Barnabas and #Joseph2.1 (rdf:type bk:Man); bk:Appellation instances #NameOfBarnabas and #NameOfJoseph; bk:NameRel instances #BarnabasnamedByBarnabas, #BarnabasnamedByJoseph and #Joseph2.1namedByJoseph connecting individuals to appellations via isNamedBy / hasAppellation. #NameOfBarnabas has hasName values “Barnabas”@en, "Βαρναβᾶς"@el, "Bernabé"@es; hasPhoneticRepresentation bär'nə-bəs; hasPronunciation www.libronix.com/bkaudio/barnabas.wav. Naming relations carry reference values such as “bible.61.1.16”, “bible.61.1.18”, etc.]

And all the right-to-left equivalents …
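The naming model above can be approximated with two small record types. This is a sketch under assumptions, not the BK schema: the class and field names are hypothetical, the English string for #NameOfJoseph does not appear on the slide, and attaching the two passage references to the #Joseph2.1 naming relation is a guess at what the diagram shows.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Appellation:
    uri: str
    strings: Dict[str, str]  # language tag -> written form
    phonetic: str = ""


@dataclass
class NamingRelation:
    individual: str
    appellation: Appellation
    # Passages that mention the individual by this name.
    references: List[str] = field(default_factory=list)


barnabas_name = Appellation(
    "#NameOfBarnabas",
    {"en": "Barnabas", "el": "Βαρναβᾶς", "es": "Bernabé"},
    phonetic="bär'nə-bəs",
)
# The "Joseph" string is assumed; only the URI appears on the slide.
joseph_name = Appellation("#NameOfJoseph", {"en": "Joseph"})

relations = [
    NamingRelation("#Barnabas", barnabas_name),
    NamingRelation("#Barnabas", joseph_name),
    # Which relation the references belong to is an assumption.
    NamingRelation("#Joseph2.1", joseph_name, ["bible.61.1.16", "bible.61.1.18"]),
]


def names_of(individual, lang):
    """Written forms, in one language, of every name an individual bears."""
    return [
        r.appellation.strings[lang]
        for r in relations
        if r.individual == individual and lang in r.appellation.strings
    ]


def bearers_of(appellation):
    """Every individual linked to the given Appellation instance."""
    return [r.individual for r in relations if r.appellation is appellation]


print(names_of("#Barnabas", "en"))
print(bearers_of(joseph_name))
```

Keeping the name as its own object means one Appellation can be shared by several individuals, while each mention stays tied to the specific individual-name pairing.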

Page 26: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Challenge: Converting Legacy Data

• Strategy: use XSL to generate RDF matching the ontology
  – Legacy XML data organized by name and by person
  – Generate reified relations from simple ones
    • Lookup table for reified inverse properties (but kb query would be cleaner)
  – Both sides of family relationships are defined independently
• URI Naming
  – Map different XML names to a single URI
  – Generate shared URIs for reified relations like #<personURI>_<relation>_<personURI>
  – RDF merging connects them in the kb
• Why not owl:sameAs?
  – Additional complexity but no practical benefit for internal-only data
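The URI strategy above can be illustrated in Python, although the talk's actual conversion was done in XSL. The sketch assumes tuple triples and set union as a stand-in for RDF merging. The contents of both lookup tables and the helper names are hypothetical.

```python
# Hypothetical lookup tables; real ones would come from the legacy XML
# and from the BK ontology.
NAME_TO_URI = {
    "Peter": "#Peter",
    "Simon Peter": "#Peter",
    "Cephas": "#Peter",
    "John": "#John.3",
}
REIFIED_INVERSE = {"isFatherOf": "hasFather", "isSonOf": "hasSon"}


def relation_uri(parent_uri, relation, child_uri):
    """Build #<personURI>_<relation>_<personURI> deterministically."""
    return f"{parent_uri}_{relation}_{child_uri.lstrip('#')}"


def convert_side(subject_name, prop, parent_name, child_name):
    """Convert one side of a family relationship from one legacy record."""
    subject = NAME_TO_URI[subject_name]
    rel = relation_uri(
        NAME_TO_URI[parent_name], "parent", NAME_TO_URI[child_name]
    )
    return {(subject, prop, rel), (rel, REIFIED_INVERSE[prop], subject)}


# Each record is converted on its own, using whatever name it happens to use.
father_side = convert_side("John", "isFatherOf", "John", "Cephas")
son_side = convert_side("Simon Peter", "isSonOf", "John", "Simon Peter")

# Set union stands in for the RDF merge: both sides meet at the same node.
merged = father_side | son_side
for triple in sorted(merged):
    print(triple)
```

Because both conversions derive the same relationship URI, no owl:sameAs link is needed to connect them, which matches the slide's point about internal-only data.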

Page 27: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Converting Legacy Data (2)

• Other OWL data with different URIs and non-reified relations
  – Map entities to common URIs (shared across both legacy datasets)
  – Adopt same URI construction principles
  – Expand out reified relations
  – RDF merge in the kb

Page 28: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Legacy Data Pipeline

[Diagram: Biblical People XML (Aaron.xml, …) → XSL → the same data in RDF (Aaron.rdf, …) → Loader. NTNames (OWL) → XSL with a BK-NTNames merge map → Loader. The BK ontology and other data (OWL) are also inputs. Outputs of the loaded knowledgebase:]

• Query (SeRQL, SPARQL)

• Extract

• API

• Web service

Page 29: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Challenge: Maintenance and Extension

• How to lower the skill threshold for extending the data?

• Approach:
  – Distinguish different operations
    1. Adding new instances of relationships (easy)
    2. Adding non-relational attributes (easy)
    3. Adding new instances of basic entities (a little harder)
    4. Fixing bad data, extending the ontology (hard)
  – Get the core entities right first (enables #1)
  – Develop specialized tools that
    • Are constrained in scope
    • Provide simple choices
    • Hide complications (like reification)

Page 30: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

How Do We Deliver Semantics?

• Part of a consumer software application: not on the open web

• Not practical to ship an RDF store
• Likely: combination of
  – Some static results shipped with product
  – Some web service support for dynamic information
  – A web portal with richer search capabilities

Page 31: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Open Architecture Issues

• Visualization
  – Likely: custom MFC
• End-user query
  – Likely: at most, templated queries
• Reasoning
  – Necessary, but …
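As one way to picture the templated-query option above, the sketch below fills a fixed SPARQL pattern from a closed menu of entities. Everything specific in it is an assumption: the `http://example.org/bk#` namespace, the property names, and the `KNOWN_PEOPLE` menu are placeholders, not the BK vocabulary.

```python
from string import Template

# Hypothetical namespace and property names; the real BK vocabulary may differ.
SONS_OF = Template("""
PREFIX bk: <http://example.org/bk#>
SELECT ?son ?ref WHERE {
  <$person> bk:isFatherOf ?rel .
  ?rel bk:hasSon ?son .
  OPTIONAL { ?rel bk:reference ?ref }
}
""")

# The end user picks from a fixed menu instead of writing a query.
KNOWN_PEOPLE = {"John.3": "http://example.org/bk#John.3"}


def build_query(person_key):
    """Fill the template using only values from the fixed menu."""
    return SONS_OF.substitute(person=KNOWN_PEOPLE[person_key])


print(build_query("John.3"))
```

The template also hides the extra hop through the reified relationship node, so the end user only chooses a person.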

Page 32: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

Future Extensions to BK

• Place names and related properties
• Brief descriptions for entities
• Place people in Biblical eras
• Narrative role (greetings in epistles, scene participants, background)
• Key events from narratives
• Concepts
• Unnamed things (descriptions, pronouns)
• Headwords and lexical relationships

Page 33: Deploying Semantic Technologies for Digital Publishing: A Case Study from Logos Bible Software

References

• Weaving the New Testament into the Semantic Web, http://semanticbible.org/other/presentations/2006-sbl/Weaving.xhtml

• Suggested Upper Merged Ontology (SUMO), http://www.ontologyportal.org/

• Defining N-ary Relations on the Semantic Web, http://www.w3.org/TR/swbp-n-aryRelations