Towards a Digital Library Theory: A Formal Digital Library Ontology
Marcos André Gonçalves, Layne T. Watson, and Edward A. Fox
Virginia Tech, Blacksburg, VA 24061 USA, [email protected]
(For ACM SIGIR Mathematical/Formal Methods in Information
Retrieval, MF/IR 2004, Sheffield, UK, Aug. 29, 2004)
Outline
  Background: The 5S Model
  Motivation for this Work
  Digital Library Formal Ontology
  Taxonomy of DL Services
  Applications of the Theory
  Conclusions and Future Work
Background: The 5S Model
Why 5S?
  DLs are not benefiting from formal theories as other CS fields have: DB, IR, PL, etc.
  DL construction is difficult and ad hoc, with little support for tailoring/customization.
  Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
  Specific DL models, formalisms, and languages are lacking.
Background: The 5S Model
Informally, DLs can be defined as complex information systems that:
  help satisfy info needs of users (societies)
  provide info services (scenarios)
  organize info in usable ways (structures)
  (re)present info in usable ways (spaces)
  communicate info with users (streams)
Background: The 5S Model
[Figure: the five Ss (Streams, Scenarios, Societies, Structures, Spaces), with the labels Static/Passive and Dynamic/Active.]
Background: 5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: map of the formal definitions and how they compose; definition numbers in parentheses.]
  relation (d.1), function (d.2), sequence (d.3), tuple (d.4)*, language (d.5), graph (d.6), grammar (d.7)
  measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16) spaces
  5S: streams (d.9), structures (d.10), spaces (d.18), scenarios (d.21), societies (d.24)
  event (d.10), state (d.18), services (d.22), transmission (d.23)
  structural metadata specification (d.25), descriptive metadata specification (d.26), structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33)
  indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37)
  digital library (minimal) (d.38)
Background: The 5S Model
Summary of TOIS 2004 Formal Definitions:
A digital library is a 10-tuple (Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc) in which
  Streams is a set of streams, which are sequences of arbitrary types (e.g., bits, characters, pixels, frames);
  Structs is a set of structures, which are tuples $(G, \mathcal{F})$, where $G = (V, E)$ is a directed graph and $\mathcal{F}: (V \cup E) \rightarrow L$ is a labeling function;
  Sps is a set of spaces, each of which can be a measurable, measure, probability, topological, metric, or vector space.
Background: The 5S Model
  $Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios where each $sc_k = \langle e_{1k}(\{p_{1k}\}), e_{2k}(\{p_{2k}\}), \ldots, e_{d_k k}(\{p_{d_k k}\}) \rangle$ is a sequence of events that also can have a number of parameters $\{p_{ik}\}$. Events represent changes in computational states; parameters represent specific locations in a state and respective values.
  St2 is a set of functions $\Psi: V \times Streams \rightarrow (\mathbb{N} \times \mathbb{N})$ that associate nodes of a structure with a pair of natural numbers (a, b) corresponding to a portion (span/segment) of a stream.
  $Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections where each DL collection $C_k = \{do_{1k}, do_{2k}, \ldots, do_{f_k k}\}$ is a set of digital objects.
  Each digital object $do_k = (h_k, Stm1_k, Stt2_k, \Psi_k)$ is a tuple where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$, $\Psi_k \subseteq St2$, and $h_k$ is a handle which represents a unique identifier for the object.
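The digital object definition above can be read as a small data structure. The following is only a sketch under stated assumptions: the class names, the handle value "hdl:example/1", the sample stream, and the reading of a span (a, b) as a half-open Python slice are all illustrative choices, not part of the formal definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    """A labeled directed graph: nodes V, edges E, and labels for both."""
    nodes: set
    edges: set    # set of (source, target) node pairs
    labels: dict  # node or edge -> label


@dataclass
class DigitalObject:
    """A digital object: handle, streams, structures, and span functions."""
    handle: str       # unique identifier h
    streams: dict     # stream name -> stream content (a sequence)
    structures: dict  # structure name -> Structure
    # (structure name, node, stream name) -> (a, b) portion of the stream
    spans: dict = field(default_factory=dict)

    def segment(self, structure_name, node, stream_name):
        """Return the portion of a stream associated with a structure node.

        Assumption: (a, b) is interpreted as the half-open slice [a, b).
        """
        a, b = self.spans[(structure_name, node, stream_name)]
        return self.streams[stream_name][a:b]


# Hypothetical object: one text stream organized by a two-node outline.
doc = DigitalObject(
    handle="hdl:example/1",
    streams={"text": "Title. Body text."},
    structures={
        "outline": Structure(
            nodes={"title", "body"},
            edges={("title", "body")},
            labels={"title": "heading", "body": "section"},
        )
    },
    spans={
        ("outline", "title", "text"): (0, 6),
        ("outline", "body", "text"): (7, 17),
    },
)

# A collection is a set of digital objects; here it is keyed by handle.
collection = {doc.handle: doc}

title_text = doc.segment("outline", "title", "text")
body_text = doc.segment("outline", "body", "text")
```

Keying the collection by handle is a convenience of this sketch; it relies on the handle being a unique identifier, as the definition states.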
Background: The 5S Model
  $Cat = \{DM_{C_1}, DM_{C_2}, \ldots, DM_{C_f}\}$ is a set of metadata catalogs for Coll where each metadata catalog $DM_{C_k} = \{(h, mss_{hk})\}$, and $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hk n_{hk}}\}$ is a set of descriptive metadata specifications. Each descriptive metadata specification $ms_{hki}$ is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes.
  A repository $Rep = \{(C_i, DM_{C_i})\}$ (i = 1 to f) is a set of pairs (collection, metadata catalog); it is assumed there exist operations to manipulate the family of pairs (e.g., get, store, delete).
  $Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services where each service $Se_k = \{sc_{1k}, \ldots, sc_{s_k k}\}$ is described by a set of related scenarios.
  $Soc = (C, R)$ where C is a set of communities and R is a set of relationships among communities. $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services. Being basically an electronic entity, a member $sm_k$ of SM distinguishes itself from actors by defining or implementing a set of operations $\{op_{1k}, op_{2k}, \ldots, op_{nk}\} \subset sm_k$.
Background
[Figure: 5S concept map. Area labels: Streams, Structures, Spaces, Scenarios, Societies. Node labels: text, audio, image, video, do, mss, ms, R, C, DMc, Ic, Se, Sc, e, SM, Ac, op, Top, Pr, Metric, Measurable, Measure, Vec.]
Motivation
  Previous definitions emphasize syntactic aspects, i.e., how digital library concepts are composed or built from previously defined concepts.
  Complete a formal DL theory by:
    Making explicit the implicit relationships that exist among the DL formal concepts defined in [Gonc04]
    Providing a set of axiomatic rules that precisely define and constrain the semantics of the relationships
    Categorizing and classifying DL services on the basis of the ontology
  Research questions:
    How should DL services be built from the other DL components?
    Which are the fundamental and elementary DL services?
    How can services be built/composed from other DL services?
  We will explore semantic relations and rules of the DL domain by using ontologies.
Digital Library Formal Ontology
An ontology is a tuple (Ontol_Concepts, Ontol_Rels) where:
  Ontol_Concepts is a family of ontological concepts,
  Ontol_Rels is a family of relations.
Relations in Ontol_Rels are operationally realized by one or more rules (e.g., first-order logic axioms) which intentionally specify or constrain which elements of a concept can participate in a relation.
Ontol_Rules is a family of rules of a particular ontology.
Relationships
  Intra-Model
    Video contains Audio (MM)
    Metadata Catalog describes Collection (LIS)
    Probabilistic Space is_a Measure Space
    Service extends Service (reuse)
    Service Manager inherits_from Service Manager (OO)
  Inter-Model
    Event executes Operation
    Actor participates_in Scenario
    Service Manager runs Service
    Service employs/produces Streams, Structures, Spaces
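Digital Library Formal Ontology
Concepts: {Se, Sc, e}; Key: Se = service; Sc = scenario; e = event.
Relations:
  contains $\subseteq Sc \times e$
    Symbolic Rule. $\forall x, y\ (x \text{ contains } y \Rightarrow Sc(x) \wedge e(y) \wedge \exists j: (j \in x.Dom \wedge y = x(j)))$
  precedes $\subseteq e \times e \times Sc$; happens_before $\subseteq e \times e \times Sc$
    Symbolic Rule 1. $\forall x, y, z\ (x \text{ precedes}_z\ y \Rightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j: (z \text{ contains } x \wedge z \text{ contains } y \wedge x = z(i) \wedge y = z(j) \wedge i + 1 = j))$
    Symbolic Rule 2. $\forall x, y, z\ (x \text{ happens\_before}_z\ y \Rightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j: (z \text{ contains } x \wedge z \text{ contains } y \wedge x = z(i) \wedge y = z(j) \wedge i < j))$
  includes $\subseteq (Se \times Se) \cup (Sc \times Sc)$; extends $\subseteq (Se \times Se) \cup (Sc \times Sc)$
    Symbolic Rule 1. $\forall x, y\ (x \text{ includes } y \Rightarrow Sc(x) \wedge Sc(y) \wedge (\forall z: e(z) \wedge y \text{ contains } z \Rightarrow x \text{ contains } z) \wedge (\forall p, q: e(p) \wedge e(q) \wedge p \text{ precedes}_y\ q \Rightarrow p \text{ precedes}_x\ q))$
    Symbolic Rule 2. $\forall x, y\ (x \text{ extends } y \Rightarrow Sc(x) \wedge Sc(y) \wedge (\forall z: e(z) \wedge y \text{ contains } z \Rightarrow x \text{ contains } z) \wedge (\forall p, q: e(p) \wedge e(q) \wedge p \text{ happens\_before}_y\ q \Rightarrow p \text{ happens\_before}_x\ q))$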
Symbolic Rule 3. x, y (x extends y Se(x) Se(y) y x (x y p, q: Sc(p) Sc(q) p x q y p extends q))
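The scenario-level rules for contains, precedes, happens_before, includes, and extends can be tried out on small examples. The sketch below treats a scenario as a Python list of event names and uses the conditions of each rule as a test; this is one possible reading, and the event names and the two sample scenarios are hypothetical.

```python
from itertools import combinations


def contains(scenario, event):
    """The event occurs at some position j of the scenario."""
    return event in scenario


def precedes(x, y, scenario):
    """x occurs at some position i and y at position i + 1."""
    return any(a == x and b == y for a, b in zip(scenario, scenario[1:]))


def happens_before(x, y, scenario):
    """x occurs at some position i and y at some position j with i < j."""
    xs = [i for i, e in enumerate(scenario) if e == x]
    ys = [j for j, e in enumerate(scenario) if e == y]
    return any(i < j for i in xs for j in ys)


def includes(x, y):
    """x has every event of y and keeps every 'precedes' pair of y."""
    events_kept = all(contains(x, e) for e in y)
    order_kept = all(precedes(p, q, x) for p, q in zip(y, y[1:]))
    return events_kept and order_kept


def extends(x, y):
    """x has every event of y and keeps every 'happens_before' pair of y."""
    events_kept = all(contains(x, e) for e in y)
    # combinations(y, 2) yields (y[i], y[j]) for every i < j.
    order_kept = all(happens_before(p, q, x) for p, q in combinations(y, 2))
    return events_kept and order_kept


# Hypothetical scenarios: a basic search and a variant with a ranking step.
basic_search = ["submit_query", "match", "return_results"]
ranked_search = ["submit_query", "match", "rank", "return_results"]

ranked_extends_basic = extends(ranked_search, basic_search)
ranked_includes_basic = includes(ranked_search, basic_search)
ranked_includes_prefix = includes(ranked_search, ["submit_query", "match"])
```

In this example the ranked scenario extends the basic one, because the relative order of all shared events is kept, but it does not include it: inserting "rank" breaks the immediate succession of "match" and "return_results".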
Digital Library Formal Ontology
[Figure: the 5S concept map annotated with ontology relationships. Area labels: Streams, Structures, Spaces, Scenarios, Societies. Node labels: text, audio, image, video, do, mss, ms, R, C, DMc, Ic, Se, Sc, e, SM, Ac, op, Top, Pr, Metric, Measurable, Measure, Vec. Relationship labels: describes, stores, is_version_of, extends, reuses, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, employs, produces, belongs_to, contains, is_a, precedes, happens_before, redefines, invokes.]
Digital Library Formal Ontology
Consistency Rules
  Catalog-Collection
    A complete catalog has at least one set of metadata specifications for each digital object in the collection it describes (surjective partial function).
    In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function).
  Scenarios-Society
    A scenario x is consistent with regard to a set of service managers Y if each operation executed by each event in the scenario is defined in some service manager $y \in Y$.
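The consistency rules above lend themselves to simple checks. The following is a minimal sketch, assuming a catalog is a list of (handle, metadata specifications) pairs, a collection is a set of handles, a scenario is a list of (event, operation) pairs, and service managers are given as a mapping from manager name to the operations it defines. All names and sample values are hypothetical.

```python
def is_complete(catalog, collection):
    """Every object in the collection has at least one non-empty set of specs."""
    described = {handle for handle, specs in catalog if specs}
    return set(collection) <= described


def is_consistent(catalog, collection):
    """Every catalog entry describes an object that is in the collection."""
    return all(handle in collection for handle, _specs in catalog)


def scenario_is_consistent(scenario, service_managers):
    """Every operation executed by an event is defined by some service manager."""
    defined = set()
    for operations in service_managers.values():
        defined |= set(operations)
    return all(operation in defined for _event, operation in scenario)


# Hypothetical collection of two digital objects, identified by handle.
collection = {"h1", "h2"}
catalog = [("h1", [{"title": "A"}]), ("h2", [{"title": "B"}])]
partial_catalog = [("h1", [{"title": "A"}])]
dangling_catalog = catalog + [("h3", [{"title": "C"}])]

# Hypothetical scenario: each event executes one operation.
scenario = [("e1", "parse_query"), ("e2", "rank")]
managers = {"search_manager": {"parse_query", "rank"}}
weak_managers = {"search_manager": {"parse_query"}}

checks = {
    "complete": is_complete(catalog, collection),
    "partial_complete": is_complete(partial_catalog, collection),
    "consistent": is_consistent(catalog, collection),
    "dangling_consistent": is_consistent(dangling_catalog, collection),
    "scenario_ok": scenario_is_consistent(scenario, managers),
    "scenario_weak": scenario_is_consistent(scenario, weak_managers),
}
```

Completeness and consistency are independent here: the catalog with the extra entry for "h3" is still complete for the collection but is no longer consistent with it.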
Digital Library Formal Ontology
Characterizing employs/produces relationships
  In the table, each service is characterized by the parameters (input, output) of the initial and final events of the scenarios that compose it.
  All previous definitions and keys apply here, complemented by the following definitions.
Services Related Definitions
  A query q is the representation of a user interest or information need.
  Hyptxt is a hypertext, in which an anchor is a node.
  A log_entry is a descriptive metadata specification about an event of a scenario.
  Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{Ct}: \{do_i\} \rightarrow 2^{Ct}$ is a function that maps a digital object to a set of categories.
  A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
Service           | User input   | Other Service Input | Output
------------------+--------------+---------------------+------------------
Acquiring         | {doi}        | Ci                  | Cj
Browsing          | anchor       | Hyptxtk             | {doi}
Cataloging        | doi, msi_k   | (hi, mssi_m)        | (hi, mssi_(m+k))
Classifying       | doi          | classCt             | (doi, {ck_i})
Clustering        | {doi}        | X                   | {cluk_i}
Expanding (query) | {doi}        | IC_i, qi            | qj
Indexing          | Ci           | none                | IC_i
Linking           | doi          | Hyptxtk             | Hyptxtik
Logging           | none         | ei({pi})            | log_entryi
Rating            | doi, acj     | none                | {(doi,acj,rk)}
Searching         | q, Ci        | IC_i                | {dok}
Visualizing       | {doi}        | tfrk                | spik
Infrastructure Services: dealing with basic concepts such as collections and catalogs
  Repository-Building: create collections (digital objects) and/or catalogs (metadata specifications).
  Preservational: generate instances by copying collections (digital objects) or transforming (converting/translating) objects into different formats for preservation purposes.
  Add_Value: either aggregate value/information to collections (digital objects) or connect objects together.
Information Satisfaction: dealing with higher-level societal requirements
KEY in next slide:
  Fundamental: minimal set of services, essential to the existence of a DL
  Composite DL service: takes input from some other service; otherwise the service is called elementary.
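The composite/elementary distinction can be read directly off the input/output characterization of services. The sketch below encodes the Indexing, Rating, Searching, and Browsing rows of the services table as a dictionary; the dictionary layout, the key names, and the helper functions are assumptions made for illustration.

```python
# A subset of the services table: user input, other-service input, output.
SERVICE_IO = {
    "Indexing": {
        "user_input": ["C_i"],
        "other_service_input": [],
        "output": ["IC_i"],
    },
    "Rating": {
        "user_input": ["do_i", "ac_j"],
        "other_service_input": [],
        "output": ["{(do_i, ac_j, r_k)}"],
    },
    "Searching": {
        "user_input": ["q", "C_i"],
        "other_service_input": ["IC_i"],
        "output": ["{do_k}"],
    },
    "Browsing": {
        "user_input": ["anchor"],
        "other_service_input": ["Hyptxt_k"],
        "output": ["{do_i}"],
    },
}


def kind(service):
    """A service is composite if it takes input from some other service."""
    if SERVICE_IO[service]["other_service_input"]:
        return "composite"
    return "elementary"


def producers(service):
    """Services in the table whose output feeds the given service."""
    needed = set(SERVICE_IO[service]["other_service_input"])
    return sorted(
        name
        for name, io in SERVICE_IO.items()
        if name != service and needed & set(io["output"])
    )


kinds = {name: kind(name) for name in SERVICE_IO}
searching_depends_on = producers("Searching")
```

With only these four rows, Searching is found to depend on Indexing through the index IC_i, while the hypertext consumed by Browsing has no producer in the subset (in the full table it is the output of Linking).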
Applications: A Taxonomy of DL Services
  Infrastructure Services
    Repository-Building
      Creational: Acquiring, Authoring, Cataloging, Crawling (focused), Describing, Digitizing, Harvesting, Submitting
      Preservational: Conserving, Converting, Copying/Replicating, Emulating, Translating (format)
    Add_Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Linking, Logging, Measuring, Rating, Reviewing (peer), Surveying, Training (classifier), Translating (language/format)
  Information Satisfaction Services: Binding, Browsing, Customizing, Disseminating, Expanding (query), Filtering, Recommending, Requesting, Searching, Visualizing
[Figure: inputs and outputs of fundamental services. Infrastructure Services (fundamental): Acquiring, Authoring, Submitting, Describing, Cataloguing, Indexing, Linking. Information Satisfaction Services (fundamental): Searching, Browsing. Other labels: Society, actor, user interests/needs, query, anchor, criteria, sortOrder, Universal Collection, Ci, DMCi, Ic, Hypertext, {doi}, dok, mskj.]
Application: A Taxonomy of DL Services
DL Services I/O Behavior
  The prior figure shows:
    Instantiations of the “Services Definition” model
    Inputs and outputs of examples of infrastructure and information satisfaction DL services
  Key:
    $C_{DL}$ = collection
    $I_{C_{DL}}$ = index for collection $C_{DL}$
    $\{do_i\}$ = set of digital objects
    Soc = society
Applications: A Taxonomy of DL Services
[Figure: composition of information satisfaction services. Fundamental: Searching, Browsing. Composite: Recommending, Filtering, Binding, Visualizing, Expanding query. Infrastructure Services (Add_Value): Rating/Reviewing (peer), Training. Other labels: Society, actor, query, anchor, criteria, sortOrder, Ck, {doi}, user model/expr, Classifier/expr, {doj}, {doR}, {doF}, bi, spV, query’.]
Application: Defining Quality in Digital Libraries
Formal theory can help to define “what’s a good digital library” by:
  Formally defining metrics of quality for each formal concept (and relationship)
  Helping to define and apply numerical measures to these metrics
Consider this in the Information Life Cycle
[Figure: information life cycle annotated with quality dimensions. Labels: Creation, Authoring, Modifying, Organizing, Indexing, Storing, Archiving, Networking, Accessing, Filtering, Distribution, Utilization, Discovery, Searching, Browsing, Recommending, Retention, Mining, Discard, Active, Semi-Active, Inactive, Reputation, Similarity, Desirability, Accuracy, Completeness, Conformance, Relevance, Timeliness, Accessibility, Usage, Preservability.]
Defining Quality in Digital Libraries
DL Concept             | Dimensions of Quality
-----------------------+---------------------------------------------------------------------------------------
Digital object         | Accessibility, Pertinence, Preservability, Relevance, Similarity, Significance, Timeliness
Metadata specification | Accuracy, Completeness, Conformance
Collection             | Completeness, Impact Factor
Catalog                | Completeness, Consistency
Repository             | Completeness, Consistency
Services               | Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability
Defining Quality in Digital Libraries
Metadata specifications and metadata format - completeness
  Completeness of metadata specifications refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not.
Metric
  $Completeness(ms_x) = 1 - \frac{\text{no. of missing attributes in } ms_x}{\text{total no. of attributes of the schema to which } ms_x \text{ conforms}}$
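The completeness metric is straightforward to compute for a single metadata specification. Below is a minimal sketch, assuming a record is a dictionary of attribute values and that an attribute counts as missing when it is absent or empty; the four-attribute schema and the sample record are hypothetical.

```python
def completeness(record, schema_attributes):
    """1 - (missing attributes / total attributes of the schema)."""
    missing = [
        attribute
        for attribute in schema_attributes
        # Assumption: absent, None, empty string, or empty list is "missing".
        if record.get(attribute) in (None, "", [])
    ]
    return 1 - len(missing) / len(schema_attributes)


# Hypothetical schema and record; "subject" has no value assigned.
schema = ["title", "creator", "date", "subject"]
record = {"title": "A Thesis", "creator": "A. Author", "date": "2004", "subject": ""}

score = completeness(record, schema)              # 1 - 1/4
empty_score = completeness({}, schema)            # 1 - 4/4
full_score = completeness({**record, "subject": "DL"}, schema)
```

Averaging this score over all records harvested from one site would give a per-site figure; that aggregation step is an assumption of this note and is not spelled out on the slide.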
Defining Quality in Digital Libraries
Metadata specifications and metadata format - completeness
[Figure: chart of metadata completeness (axis from 0 to 1) for sites in the OCLC NDLTD Union Catalog: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv.]
Defining Quality in Digital Libraries
Services - Extensibility and Reusability
  A service Y reuses a service X if the behavior of Y incorporates the behavior of X.
  A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.
Metrics
  $\text{Macro-Reusability}(Serv) = \frac{\sum_{se_i \in Serv} reused(se_i)}{|Serv|}$, where $reused(se_i)$ is 1 if $\exists\, se_j$ such that $se_j$ reuses $se_i$, and 0 otherwise.
  $\text{Micro-Reusability}(Serv) = \frac{\sum LOC(sm_x) \cdot reused(se_i)}{\sum_{sm \in SM} LOC(sm)}$, with the numerator summed over $sm_x \in SM$, $se_i \in Serv$ such that $sm_x$ runs $se_i$, where LOC corresponds to the number of lines of code of a service manager.
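The two reusability metrics can be computed from three pieces of information: which services reuse which, which service manager runs each service, and the lines of code of each manager. The sketch below uses a made-up four-service library; every name and LOC figure in it is hypothetical and unrelated to the measurements reported in this talk.

```python
def reused(service, reuses):
    """1 if some other service reuses this one, 0 otherwise."""
    return 1 if any(target == service for _user, target in reuses) else 0


def macro_reusability(services, reuses):
    """Fraction of services that are reused by at least one service."""
    return sum(reused(s, reuses) for s in services) / len(services)


def micro_reusability(services, reuses, runs, loc):
    """LOC of managers running reused services over LOC of all managers."""
    reused_loc = sum(
        loc[manager] * reused(service, reuses)
        for manager, service in runs.items()
        if service in services
    )
    return reused_loc / sum(loc.values())


# Hypothetical library: a search wrapper reuses a search back-end.
services = ["search_backend", "search_wrapper", "browse", "rate"]
reuses = {("search_wrapper", "search_backend")}  # (reusing, reused)
runs = {  # service manager -> service it runs
    "sm_search": "search_backend",
    "sm_wrapper": "search_wrapper",
    "sm_browse": "browse",
    "sm_rate": "rate",
}
loc = {"sm_search": 1650, "sm_wrapper": 100, "sm_browse": 1390, "sm_rate": 860}

macro = macro_reusability(services, reuses)        # 1 of 4 services reused
micro = micro_reusability(services, reuses, runs, loc)  # 1650 of 4000 LOC
```

The example shows why the two figures can differ: only one service in four is reused, but it accounts for a larger share of the code.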
Defining Quality in Digital Libraries
Services - Extensibility and Reusability
Service                     | Component based | LOC for implementing service | LOC reused from component | Total LOC
----------------------------+-----------------+------------------------------+---------------------------+----------
Searching – Back-end        | Yes             | -                            | 1650                      | 1650
Search Wrapping             | No              | 100                          | -                         | 100
Recommending                | Yes             | -                            | 700                       | 700
Recommend Wrapping          | No              | 200                          | -                         | 200
Annotating – Back-end       | Yes             | 50                           | 600                       | 600
Annotate Wrapping           | No              | 50                           | -                         | 50
Union Catalog               | Yes             | -                            | 680                       | 680
User Interface Service      | No              | 1800                         | -                         | 1600
Browsing                    | No              | 1390                         | -                         | 1390
Comparing (objects)         | No              | 650                          | -                         | 650
Marking Items               | No              | 550                          | -                         | 550
Items of Interest           | No              | 480                          | -                         | 480
Recent Searches/Discussions | No              | 230                          | -                         | 230
Collections Description     | No              | 250                          | -                         | 250
User Management             | No              | 600                          | -                         | 600
Framework Code              | No              | 2000                         | -                         | 2000
Total                       |                 | 8280                         | 3630                      | 11910

Macro-Reusability = 3/16 = 0.1875
Micro-Reusability = 3630 / 11910 ≈ 0.305
Application: Re-engineering a DL Specification Language
5SL: Specification Language Reengineering
Using the relationships to redefine/reorganize the semantics and organization of the XML elements within the several sections of the DL specification
Re-engineering a DL Specification Language
5SLGen: Automatic DL Generation
[Figure: 5SLGen generation workflow. Phases: Requirements (1), Analysis (2), Design (3), Implementation (4). Labels: 5S Meta Model, 5SLGraph, DL Expert, DL Designer, 5SL DL Model, 5SLGen, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), Tailored DL Services, Practitioner, Researcher, Teacher.]
Conclusions and Future Work
  Presented a DL formal ontology which specifies the semantics of the relationships among the DL concepts, thereby completing a theory for DLs
  Applied the resulting ontology to:
    Define a taxonomy of DL services
    Create a Quality Model for DLs
    Re-engineer a DL specification language
Conclusions and Future Work
  Future work includes:
    Including pre- and post-conditions in the service behavior analysis
    New applications of the model/theory
      New design and generation tools
      Quality tools
      Modeling complex heterogeneous/integrated systems
        Archaeology (ETANA)
    Developing theorems and proofs
    Writing books…