

Ontology-based Knowledge Organization Systems in Digital Libraries: A Comparison of Experiments in OWL and KAON Ontologies

Shalini R. Urs
Executive Director, International School of Information Management, University of Mysore
&
Angrosh M A
Research Fellow, University of Mysore

Keywords: Ontology; Knowledge Representation Formalisms; Description Logics; Web Ontology Language (OWL); Karlsruhe Ontology (KAON); Digital Libraries; Semantic Web; Facet Analysis; Ontology Markup Languages; Resource Description Framework Schema (RDFS).

1 Introduction

Grounded in a strong belief that ontologies enhance the performance of information retrieval systems, interest in ontologies has grown rapidly. Their importance has been recognized in diverse research fields such as knowledge engineering, knowledge representation, qualitative modeling, language engineering, database design, information integration, object-oriented analysis, information retrieval and extraction, knowledge management and agent-based systems design (Guarino, 1998). While the role played by ontologies lends these tools legitimacy, research in this area gains further significance in light of the challenges posed by the contemporary digital environment. To overcome the shortcomings of current search mechanisms, ontologies are increasingly used to build more efficient information retrieval systems. One indicator of research interest in ontologies is Swoogle, a search engine for Semantic Web documents, terms and data found on the Web (Ding, Li et al, 2004). Given the complex nature of the digital content archived in digital libraries, ontologies can be employed to design more efficient forms of information retrieval in these libraries. Knowledge representation assumes particular significance because of its crucial role in ontology development. Knowledge representation systems aid in developing intelligent information systems, where intelligence means the ability of a system to find implicit consequences of its explicitly represented knowledge (Baader and Nutt, 2003). Knowledge representation formalisms such as ‘Description Logics’ are used to obtain an explicit representation of knowledge about the subject domain. These representations are then developed into ontologies, which in turn are used to build intelligent information systems. Against this backdrop, the paper examines the use of Description Logics for conceptually modeling a chosen domain, with the resulting model used to develop domain ontologies. The knowledge representation languages identified for this purpose are the Web Ontology Language (OWL) and the KArlsruhe ONtology (KAON) language. Drawing upon the technical constructs involved in developing ontology-based information systems, the paper explains the working of the two prototypes and presents a comparative study of them.

2 Past Research Efforts

Within a short span of time, ontologies have attracted considerable research interest, as evidenced in the literature. For instance, Shum et al (2000) utilize ontology principles to develop ScholOnto, an ontology-based digital library server to support scholarly interpretation and discourse, while Lai and Yang (2001) have experimented with ontology principles for deriving ontology-based metadata for Chinese information services in Chinese digital libraries. Yeh (2002) designed an ontology-based portal for digital archive services. Uszkoreit et al (2003) successfully employed ontologies to create a knowledge portal for the field of Language Technology. Kalfoglou et al (2004) developed an ontology-driven web-based system for personalized news services. Over the last few decades, research in knowledge representation and reasoning has focused on methods for obtaining high-level descriptions of the world that can be used effectively to build intelligent information systems. These approaches are roughly divided into two major categories, viz. logic-based formalisms and non-logic-based representations. Logic-based formalisms evolved from the intuition that predicate calculus could be used to capture facts about the world unambiguously, whereas the latter category comprises representations, such as semantic networks and frames, that are not grounded in logic. ‘Description Logics’ is the name given to a family of logic-based knowledge representation formalisms that capture the knowledge of a particular domain. These formalisms can be employed to obtain a high-level description of the subject domain, which can form the basis for developing ontologies. In our research, we experiment with faceted principles in conjunction with Description Logics to derive the required knowledge representation. Historically, the faceted approach to analysis developed by Ranganathan brought about a paradigm shift in traditional library classification (Ranganathan, 1957). More recently, the faceted approach has also attracted attention and research effort in the context of Internet technologies.
Studies have observed considerable potential for applying these principles to different aspects of digital information management (Godert, 1991; Ingwersen and Wormell, 1992). Facet relations provide a multi-dimensional schematic representation of the available information, making it possible to create new and efficient access points for retrieving more precise information. Ellis and Vasconcelos (2000) note that applying facet relations in conjunction with subject directories and search engines can greatly alleviate problems currently faced in searching the World Wide Web. Against this background of the importance of Description Logics, facet relations and ontologies in the current digital information scenario, the present paper examines the use of Description Logics and facet relations for developing ontology-based knowledge management systems for digital libraries. Agriculture has been selected as the domain of interpretation. The following section details the use of Description Logics and facet relations for conceptually modeling the Agriculture domain.

3 Knowledge Representation of Agriculture Domain using Description Logics and Facet Relations

With the objective of developing a pilot ontology-based information system, the field of “Agricultural Documents” was chosen as the domain of interpretation. To model this domain conceptually, we separated the domain of Agricultural Documents into the individual domains of ‘Agriculture’ and ‘Documents’. Employing Description Logics, the conceptual modeling of the domain involves the definition of:

• Atomic Concepts (unary predicates) of the domain, each of which denotes a set of individuals
• Atomic Roles (binary predicates), which are used to express relationships between individuals

The simplified model for facet analysis (Spiteri, 1998) was followed to identify the major facets of the Agriculture domain. For instance, the field of Agriculture comprises various facets, including Agricultural Crop, Pest, Disease, Pesticide and Geographical Area. Further, the semantics of concept descriptions is defined by the notion of interpretations. An interpretation $\mathcal{I}$ consists of a non-empty set $\Delta^{\mathcal{I}}$ (the domain of interpretation) and an interpretation function $\cdot^{\mathcal{I}}$, which assigns to every atomic concept $A$ a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ and to every atomic role $R$ a binary relation $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$:

$$\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$$

Description Logics divide a knowledge base into intensional knowledge, i.e., general knowledge about the domain in question, and extensional knowledge, which is specific to a particular problem. Accordingly, a DL knowledge base typically comprises two components, a “TBox” and an “ABox”. The TBox contains intensional knowledge in the form of a terminology, built through declarations that describe general properties of concepts, while the ABox contains extensional knowledge, also referred to as assertional knowledge: knowledge that is specific to the individuals of the domain of discourse.

3.1 Knowledge Terminology in a TBox

The TBox conceptualizes the vocabulary of the knowledge base in terms of concepts and roles. The standard assumptions about DL terminologies are maintained. These include:

• only one definition is allowed for a concept;
• definitions are acyclic, in the sense that concepts are defined neither in terms of themselves nor in terms of other concepts that indirectly refer to them.

An information retrieval system in a digital library should take into account not only the subject matter held in the library but also the characteristics of the documents that carry that subject content. As the study used Electronic Theses and Dissertations (ETDs) as the document genre for its pilot system, the characteristics of ETDs modeled in the TBox were: (a) Title, (b) Creator, (c) Contributor / Supervisor, (d) Degree Grantor and (e) Year of publication. The concepts developed for the domain of Agriculture are organized as a taxonomy. Figure 1 portrays the conceptual structure of the hierarchy developed in the TBox.
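The two terminology assumptions listed above (a single definition per concept, acyclic definitions) can be checked mechanically before a TBox is handed to a reasoner. The Python sketch below is our own illustration, not part of the authors' system: it assumes each definition has been reduced to a pair of (defined name, names used on the right-hand side), and it uses a depth-first search to report the first definitional cycle it meets. Short names such as `CropDocument` are hypothetical abbreviations of the paper's TBox concepts.

```python
from collections import defaultdict


def check_tbox(definitions):
    """Check two DL terminology assumptions.

    definitions: list of (defined_name, names_used_in_its_definition) pairs.
    Returns (duplicates, cycle): the names defined more than once, and one
    definitional cycle as a list of names (first name repeated at the end),
    or None if the terminology is acyclic.
    """
    uses, duplicates = {}, []
    for name, used in definitions:
        if name in uses:
            duplicates.append(name)          # violates "one definition per concept"
        uses[name] = set(used)

    WHITE, GREY, BLACK = 0, 1, 2             # unvisited, on current path, finished
    colour = defaultdict(int)                # every name starts WHITE

    def visit(name, path):
        colour[name] = GREY
        path.append(name)
        for dep in sorted(uses.get(name, ())):   # primitive concepts have no definition
            if colour[dep] == GREY:              # back edge: dep is on the current path
                return path[path.index(dep):] + [dep]
            if colour[dep] == WHITE:
                cycle = visit(dep, path)
                if cycle:
                    return cycle
        path.pop()
        colour[name] = BLACK
        return None

    for name in list(uses):
        if colour[name] == WHITE:
            cycle = visit(name, [])
            if cycle:
                return duplicates, cycle
    return duplicates, None


# Hypothetical, abbreviated encoding of part of the Agriculture TBox
tbox = [
    ("Document", ["Title", "Author", "Edition", "Description"]),
    ("AgriculturalDocument", ["Document", "Agriculture"]),
    ("CropDocument", ["Title", "Crop"]),
]
print(check_tbox(tbox))                           # ([], None)
print(check_tbox([("A", ["B"]), ("B", ["A"])]))   # ([], ['A', 'B', 'A'])
```

A terminology that passes both checks can be "unfolded", i.e. every defined name replaced by its definition, which is what makes the acyclicity assumption useful for reasoning.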

Figure 1: The TBox Hierarchy of the Agriculture Domain (node labels: Agricultural Document; ETDs with Title, Creator, Contributor / Supervisor, Degree Grantor and Year; Agriculture with Agricultural Crops (Field Crops, Plantation Crops, Horticultural Crops), Agricultural Pests, Agricultural Diseases and Soil Science)

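The definitions used to describe complex concepts in our TBox are as follows:

$$
\begin{aligned}
\text{Document} &\equiv \forall \text{hasTitle}.\text{Title} \sqcap \forall \text{hasAuthor}.\text{Author} \sqcap \forall \text{hasEdition}.\text{Edition} \sqcap \forall \text{hasDescription}.\text{Description}\\
\text{Agricultural Document} &\equiv \text{Document} \sqcup \text{Agriculture}\\
\text{Document of a Special Class in Agriculture} &\equiv \forall \text{hasTitle}.\text{Title} \sqcap \forall \text{hasSpecialClass}.\text{SpecialClass}\\
\text{Document of a Specific Crop in Agriculture} &\equiv \forall \text{hasTitle}.\text{Title} \sqcap \forall \text{hasCropTerm}.\text{Crop}\\
\text{Document of a Specific Disease in Agriculture} &\equiv \forall \text{hasTitle}.\text{Title} \sqcap \forall \text{hasDiseaseTerm}.\text{Disease}\\
\text{Document of a Specific Soil in Agriculture} &\equiv \forall \text{hasTitle}.\text{Title} \sqcap \forall \text{hasSoilTerm}.\text{Soil}\\
\text{Document of a Specific Fertilizer in Agriculture} &\equiv \forall \text{hasTitle}.\text{Title} \sqcap \forall \text{hasFertilizerTerm}.\text{Fertilizer}\\
\text{Document of a Specific Pest in Agriculture} &\equiv \forall \text{hasTitle}.\text{Title} \sqcap \forall \text{hasPestTerm}.\text{Pest}\\
\text{Document of a Specific Development State in Agriculture} &\equiv \forall \text{hasTitle}.\text{Title} \sqcap \forall \text{hasDevelopmentStateTerm}.\text{State}
\end{aligned}
$$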
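To make the set semantics of Section 3 concrete, the Python sketch below evaluates concept expressions against a small finite interpretation, using the definition of "Document of a Specific Crop in Agriculture" as the example. It is an illustrative assumption, not the authors' implementation: concepts are encoded as nested tuples, the individuals (`etd1`, `sorghum`, ...) are hypothetical, and an existential constructor `some` (∃R.C), which the paper's TBox does not use, is included only for comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """A finite interpretation I = (Delta^I, .^I)."""
    domain: set                                    # Delta^I
    concepts: dict = field(default_factory=dict)   # atomic concept A -> A^I
    roles: dict = field(default_factory=dict)      # atomic role R -> R^I (set of pairs)

    def ext(self, c):
        """Return the extension C^I of a concept expression.

        Expressions are atomic names (str) or tuples:
        ("and", C, D), ("or", C, D), ("all", R, C), ("some", R, C).
        """
        if isinstance(c, str):
            return self.concepts.get(c, set()) & self.domain
        op = c[0]
        if op == "and":                            # C ⊓ D
            return self.ext(c[1]) & self.ext(c[2])
        if op == "or":                             # C ⊔ D
            return self.ext(c[1]) | self.ext(c[2])
        role, filler = self.roles.get(c[1], set()), self.ext(c[2])
        if op == "all":                            # ∀R.C: every R-successor is in C
            return {x for x in self.domain
                    if all(y in filler for (s, y) in role if s == x)}
        if op == "some":                           # ∃R.C: at least one R-successor in C
            return {x for (x, y) in role if y in filler and x in self.domain}
        raise ValueError(f"unknown constructor: {op!r}")


# Hypothetical individuals for illustration
I = Interpretation(
    domain={"etd1", "etd2", "t1", "sorghum", "midge"},
    concepts={"Title": {"t1"}, "Crop": {"sorghum"}, "Pest": {"midge"}},
    roles={"hasTitle": {("etd1", "t1")},
           "hasCropTerm": {("etd1", "sorghum"), ("etd2", "midge")}},
)
# Document of a Specific Crop in Agriculture ≡ ∀hasTitle.Title ⊓ ∀hasCropTerm.Crop
crop_doc = ("and", ("all", "hasTitle", "Title"), ("all", "hasCropTerm", "Crop"))
print(sorted(I.ext(crop_doc)))    # ['etd1', 'midge', 'sorghum', 't1']
print(sorted(I.ext(("and", crop_doc, ("some", "hasCropTerm", "Crop")))))  # ['etd1']
```

Because ∀R.C is vacuously satisfied by an individual with no R-successor, the title, crop and pest individuals also fall under the ∀-only definition, while `etd2` is excluded because its crop term is not a Crop. Adding an existential restriction, as in the last line, narrows the result to `etd1`; whether the original TBox intended that stronger reading is not stated in the paper.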

The assertional knowledge primarily consists of the document-specific metadata and domain-specific metadata of the Agricultural documents identified for the study. The DL knowledge base described above, comprising the TBox and the ABox, is implemented in DL-based ontology representation languages. The languages chosen for the study are OWL DL, a Description Logics-based language with computational properties suited to reasoning systems, and the KAON language, which is built on RDFS primitives. OWL and KAON were selected because a Description Logics representation implemented in these languages can be used for inference and reasoning, which facilitates the development of Description Logics-based intelligent information systems. The following section defines the concept of ontology, explains the rationale for choosing these languages and details the process of developing ontologies from the DL knowledge base.

4 Ontology

While philosophical ontology is primarily concerned with establishing truth by answering questions such as ‘what exists?’, the field of Artificial Intelligence uses the term ‘ontology’ to mean the specification of existing concepts. According to Tom Gruber's well-known definition, an ontology is “an explicit specification of a conceptualization”, where ‘conceptualization’ refers to framing the knowledge of a particular domain (or of the world in general) in terms of entities (things, relations and constraints), and ‘specification’ means representing the conceptualization in concrete form (Gruber, 1993). The field of information science has accepted the notion of ‘ontology’ as a formal theory comprising definitions and a supporting framework of axioms that themselves provide implicit definitions of the terms involved (Smith, 2003).
In the context of information systems, an ontology is characterized as ‘a software artifact or formal language designed with a specific set of uses and computational environments in mind’ (Viinikkala, 2004). Extending this notion, Zuniga (2001) notes that an information systems ontology is ‘a formal language designed to represent a particular domain of knowledge, which depicts the structure of the domain objects in question and accounts for the intended meaning of a formal vocabulary or protocols’. This paper adopts the notion of ontology as a ‘representation of knowledge about a particular domain, which would be utilized for generating rich semantic metadata structures, to be acted upon as “search points” for providing value added information services in digital libraries’.

4.1 Ontology Markup Languages

The surge of interest in the Semantic Web and other knowledge organization systems has resulted in the development of various ontology languages for exploiting the power of the Web. Commonly referred to as web-based ontology languages or ontology markup languages, these languages have a syntax based on existing markup languages such as HTML and XML. Simple HTML Ontology Extensions (SHOE), built on HTML and combining frames and rules, was the first ontology language to emerge. Developed as an extension of HTML, it used tags that differ from those of the HTML specification, allowing ontologies to be inserted in HTML documents. The ontology languages developed later were based on XML. The XML-based Ontology Exchange Language (XOL) was developed as an XMLization of a small subset of primitives from the Open Knowledge Base Connectivity (OKBC) protocol, called OKBC-Lite. RDF (Resource Description Framework) was developed by the W3C as a semantic-network-based language to describe web resources. RDF Schema was also developed by the W3C, as an extension to RDF with frame-based primitives. The combination of RDF and RDF Schema is normally known as RDF(S). While these languages established the foundations of the Semantic Web, three further languages have been developed as extensions to RDF(S): OIL, DAML+OIL and OWL. The Web-Ontology (WebOnt) Working Group, formed by the W3C in 2001, produced OWL as a new ontology markup language for the Semantic Web.
In addition to these W3C standards, various research projects have launched initiatives to promote the development of the Semantic Web. KAON (KArlsruhe ONtology) is one such initiative of Forschungszentrum Informatik (FZI) and the University of Karlsruhe (TH). It is an open-source ontology management infrastructure targeted at business applications. This paper is an outcome of our exploration of the usefulness of ontology languages, considering OWL and KAON specifically, both in terms of the features of these languages and by developing applications with them.

4.2 Choice of OWL and KAON Ontology Languages: The Rationale

Because many ontology languages are readily available, identifying and selecting an appropriate one is a crucial step. The selection demands special attention because ontology languages differ in expressiveness and reasoning capabilities. Following Gomez-Perez (2003), the parameters considered for selecting the ontology language included:

• Expressiveness
• Inference mechanisms
• Suitability of the language for exchanging ontologies between applications
• Integration of the ontology language with other web standards

4.2.1 Expressiveness of the Ontology Language

OWL provides the following three increasingly expressive sub-languages, designed for use by specific communities of implementers and users (W3C, 2004):

• OWL Lite supports users who primarily need a classification hierarchy and simple constraints.
• OWL DL supports users who want the maximum expressiveness while retaining computational completeness and decidability. OWL DL includes all OWL language constructs, but they can be used only under certain restrictions.
• OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees.

Reynolds et al (2005) note that OWL offers representational flexibility (especially the ability to model graph structures) and allows instance and class information to be represented, and thereby combined, in the same formalism. With reference to expressiveness, KAON provides the following:

• Clear separation of the model and the meta-model (for instance, subClassOf, domain etc. are modeling primitives and are not accessible as entities in the model)
• Support for RDFS modeling constructs
• Support for symmetric, transitive and inverse properties, with properties having domains and ranges, which are sets of concepts
• Explicit presence of lexical information in the model
• Support for meta-class modeling
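Support for symmetric, transitive and inverse properties means that some facts need not be stored explicitly because they follow from the property axioms. The Python sketch below is a naive forward-chaining illustration of what these three property characteristics entail; it is not how KAON or an OWL reasoner is implemented, and the property names (`partOf`, `infects`/`infectedBy`, `coOccursWith`) and individuals are hypothetical.

```python
def saturate(triples, symmetric=(), transitive=(), inverse=()):
    """Apply symmetric, transitive and inverse property axioms until no new triple appears.

    triples: iterable of (subject, property, object)
    inverse: iterable of (p, q) pairs declaring q to be the inverse of p
    """
    facts = set(triples)
    inverse_of = {}
    for p, q in inverse:                       # inverseOf holds in both directions
        inverse_of.setdefault(p, set()).add(q)
        inverse_of.setdefault(q, set()).add(p)

    while True:
        derived = set()
        for s, p, o in facts:
            if p in symmetric:                 # p(s, o) -> p(o, s)
                derived.add((o, p, s))
            for q in inverse_of.get(p, ()):    # p(s, o) -> q(o, s)
                derived.add((o, q, s))
            if p in transitive:                # p(s, o), p(o, o2) -> p(s, o2)
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        derived.add((s, p, o2))
        derived -= facts
        if not derived:                        # fixpoint reached
            return facts
        facts |= derived


# Hypothetical property names and individuals, for illustration only
facts = saturate(
    [("Karnataka", "partOf", "India"),
     ("India", "partOf", "South Asia"),
     ("blast", "infects", "finger_millet"),
     ("midge", "coOccursWith", "aphid")],
    symmetric={"coOccursWith"},
    transitive={"partOf"},
    inverse=[("infects", "infectedBy")],
)
for triple in sorted(facts):
    print(triple)
```

The four asserted triples yield three derived ones: Karnataka partOf South Asia, finger_millet infectedBy blast, and aphid coOccursWith midge. The quadratic transitive step keeps the sketch short; a production system would index triples by property and subject.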

4.2.2 Inference Mechanism of the Ontology Language

OWL is equipped with formal semantics described in the OWL Web Ontology Language Semantics and Abstract Syntax (OWL S&AS) (W3C, 2003). These semantics support drawing inferences about ontologies and individuals. OWL supports the following types of inference mechanism: (a) class inferences, (b) instance inferences, (c) distribution rules and (d) closed and open worlds. KAON supports the notion of unambiguous, clear formal semantics represented in ontologies, and it uses symmetric, transitive and inverse properties in inference problems. Further, studies observe that the deductive databases employed in KAON outperform tableau-based OWL reasoners by several orders of magnitude, and that the DLP (Description Logic Programs) fragment is sufficient to express most OWL ontologies (> 80 %) completely (FZI and AIFB, 2001).

4.2.3 Suitability of Ontology Languages for Exchanging Ontologies between Applications

Re-use of ontologies can substantially reduce the intellectual effort involved in developing them. OWL facilitates ontology sharing and re-use through the following constructs (W3C, 2004):

• Equivalence between classes and properties (equivalentClass, equivalentProperty)
• Identity between individuals (sameAs)
• Different individuals (differentFrom, AllDifferent)
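When ontologies from different sources are combined, sameAs statements say that two names denote the same individual, and differentFrom statements say that they do not. A common way to realize the first is to merge names with a union-find structure and rewrite facts onto one representative name; a differentFrom pair that ends up merged signals an inconsistency. The Python sketch below illustrates that idea under our own assumptions: the `agri:` and `ext:` names and the facts are hypothetical, and the choice of the alphabetically smallest name as representative is arbitrary.

```python
class UnionFind:
    """Disjoint sets of individual names; each set stands for one individual."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # keep the alphabetically smallest name as the representative
            self.parent[max(ra, rb)] = min(ra, rb)


def merge_individuals(triples, same_as, different_from=()):
    """Rewrite triples so that sameAs-linked names share one representative.

    Returns (merged_triples, clashes); a clash is a differentFrom pair whose
    members were nevertheless declared the same, i.e. an inconsistency.
    """
    uf = UnionFind()
    for a, b in same_as:
        uf.union(a, b)
    clashes = [(a, b) for a, b in different_from if uf.find(a) == uf.find(b)]
    merged = {(uf.find(s), p, uf.find(o)) for s, p, o in triples}
    return merged, clashes


# Hypothetical facts about one pest recorded under two names
triples = [
    ("agri:SorghumMidge", "createsDamages", "Larvae destroy seeds"),
    ("ext:ContariniaSorghicola", "infests", "Sorghum"),
]
merged, clashes = merge_individuals(
    triples,
    same_as=[("agri:SorghumMidge", "ext:ContariniaSorghicola")],
    different_from=[("agri:SorghumMidge", "agri:SorghumAphid")],
)
print(sorted(merged))
print(clashes)   # []
```

After merging, both facts are attached to `agri:SorghumMidge`, so a query by either name would see them. equivalentClass and equivalentProperty could be treated the same way over class and property names.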

With reference to KAON, its power lies in its middleware support for web application development. Oberle et al (2006) employ the KAON server to leverage ontologies in supporting developers and administrators in their complex tasks during development, deployment and runtime. Presenting a conceptual architecture for facilitating the exchange of ontologies, Bozsak et al (2002) propose mapping several logic languages, extended with further axioms, in order to develop a KAON-based ontology model as the fourth layer of the Semantic Web.

4.2.4 Integration of Ontologies with other Web standards

The Semantic Web is expected to build on XML's ability to define customized tagging schemes and RDF's flexible approach to representing data. With this intent, OWL is designed as part of the growing stack of W3C recommendations related to the Semantic Web, which includes XML, XML Schema, RDF and RDF Schema.

KAON was established as a platform for applying Semantic Web technologies to e-commerce and B2B scenarios. The KAON language supports RDFS (RDF Schema) modeling constructs. The KAON API, the focal point of the middleware layer in the KAON architecture, provides ontology-compliant access to data stored in existing systems, such as relational or XML databases, through special mapping implementations. The KAON Server, which provides a persistent, transactional and secure RDF repository, is realized within the J2EE framework, which conforms to current web standards. Thus, the features identified for OWL and KAON indicate that both languages offer standard facilities for developing ontologies. Based on these observations, the present study focused on OWL and KAON. The following sections briefly describe the experiments that were carried out; a comparative description of the experiments sums up the research findings.

5 Experiments with OWL and Jena

The TBox representation of the DL knowledge base for the Agriculture domain identifies the key concepts of the domain and the relations binding the facets. To derive a rich knowledge base, the experiment focused on developing relational databases for these concepts. For instance, the databases for the concepts of Agricultural Crop, Agricultural Pest and Agricultural Disease are shown in Tables 1, 2 and 3 respectively. The idea was to relate these databases to the concepts defined in the ontology.

Table 1: Snippet of the Agricultural Crop Database in Agri-Pest Domain

| CropID | CropTypeID | CropName | LocalName | CropScientificName1 | CropScientificName2 |
|---|---|---|---|---|---|
| 1 | 1 | Finger Millet | Ragi | Eleusine coracana L. Gaertn | - |
| 2 | 2 | Pearl Millet | Bajra | Pennisetum glaucum L. | - |
| 3 | 3 | Rice | Paddy | Oryza sativa | O. glaberrima |

Table 2: Snippet of the Agricultural Pest Database in Agri-Pest Domain

| PestID | PestTypeID | PestCrop | PestCommonName | PestScientificName | PestFamily | PestFoundInPlaces | DamageMade |
|---|---|---|---|---|---|---|---|
| 1 | 1 | Sorghum | African Armyworm | Spodoptera exempta (Wlk.) | Noctuidae | South East Asia | Larvae defoliate |
| 2 | 2 | Rice | African Rice Bug | Stenocoris southwoodi Ahmad | Coreidae | Africa | Sap-sucker |
| 3 | 3 | Sorghum | American Bollworm | Heliothis armigera (Hub.) | Noctuidae | South America | Larvae feed on panicle |

Table 3: Snippet of the Agricultural Crop Disease Database in Agri-Pest Domain

| DiseaseID | DiseaseTypeID | CropName | DiseaseName | DiseaseCausalOrganism | DiseaseControl |
|---|---|---|---|---|---|
| 1 | 1 | Rice | Bacterial Blight | Xanthomonas campestris pv. oryzae (Ishiyama) Dye | Agrimycin and wettable Ceresan |
| 2 | 2 | Wheat | Black Rust | Puccinia graminis | Resistant Varieties |
| 3 | 3 | Finger Millet | Blast - Finger Millet | Pyricularia oryzae Cavara | Bordeaux mixture; fungicide |
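The paper gives only the column headings and a few rows of Tables 1 to 3, not the database schema or the queries used to relate the tables. As a hedged illustration, the Python sketch below loads a reduced subset of those columns and rows into an in-memory SQLite database (SQLite is our choice for the example, not necessarily the authors') and relates a crop to its pests and diseases by joining on crop name, which is how the PestCrop and CropName columns appear to link the tables.

```python
import sqlite3


def build_db():
    """Load reduced versions of Tables 1-3 into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE crop    (CropID INTEGER PRIMARY KEY, CropName TEXT,
                              LocalName TEXT, CropScientificName1 TEXT);
        CREATE TABLE pest    (PestID INTEGER PRIMARY KEY, PestCrop TEXT,
                              PestCommonName TEXT, PestFamily TEXT, DamageMade TEXT);
        CREATE TABLE disease (DiseaseID INTEGER PRIMARY KEY, CropName TEXT,
                              DiseaseName TEXT);
    """)
    conn.executemany("INSERT INTO crop VALUES (?, ?, ?, ?)", [
        (1, "Finger Millet", "Ragi", "Eleusine coracana L. Gaertn"),
        (2, "Pearl Millet", "Bajra", "Pennisetum glaucum L."),
        (3, "Rice", "Paddy", "Oryza sativa"),
    ])
    conn.executemany("INSERT INTO pest VALUES (?, ?, ?, ?, ?)", [
        (1, "Sorghum", "African Armyworm", "Noctuidae", "Larvae defoliate"),
        (2, "Rice", "African Rice Bug", "Coreidae", "Sap-sucker"),
        (3, "Sorghum", "American Bollworm", "Noctuidae", "Larvae feed on panicle"),
    ])
    conn.executemany("INSERT INTO disease VALUES (?, ?, ?)", [
        (1, "Rice", "Bacterial Blight"),
        (2, "Wheat", "Black Rust"),
        (3, "Finger Millet", "Blast - Finger Millet"),
    ])
    return conn


def crop_profile(conn, crop_name):
    """Relate one crop to its pests and diseases by joining on the crop name.

    LEFT JOINs keep the crop even when no pest or disease is recorded; with
    several pests and diseases per crop this returns their cross product.
    """
    return conn.execute("""
        SELECT c.CropName, c.CropScientificName1, p.PestCommonName, d.DiseaseName
        FROM crop c
        LEFT JOIN pest p    ON p.PestCrop = c.CropName
        LEFT JOIN disease d ON d.CropName = c.CropName
        WHERE c.CropName = ?
        ORDER BY p.PestID, d.DiseaseID
    """, (crop_name,)).fetchall()


conn = build_db()
print(crop_profile(conn, "Rice"))
print(crop_profile(conn, "Finger Millet"))
```

For Rice the join returns one row combining Oryza sativa, the African Rice Bug and Bacterial Blight; for Finger Millet the pest column is `None` because the snippet records no pest for that crop. Joining on free-text names is fragile; a fuller schema would more likely use a shared crop identifier.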

5.1 Ontology Modeling of Agri-Pest-Disease Domain

The TBox hierarchy obtained for the Agri-Pest-Disease domain was developed as an OWL DL ontology. The Protégé-OWL Editor, an extension of Protégé that supports OWL, was used to develop the Agri-Pest ontology (Protégé, 2006). An OWL ontology may include descriptions of classes, properties and their instances. A schematic representation of the OWL ontology developed in the Protégé-OWL Editor is shown in Figure 2.

Figure 2: Schematic Representation of OWL ontology facets (legend: sub-class relationships, property relationships; classes: Agri-Pest Domain Ontology, Agricultural Crop, Agricultural Pest, Agricultural Crop Disease, Agricultural Pesticides; properties: has Crop Name, has Crop Scientific Name, has Pest Common Name, has Pest Scientific Name, Creates Damages, pest Controlled By, has Disease Name, has Disease Causal Organism, disease Controlled By, disease Affected Crop, infects Agricultural Crop)

5.2 Ontology-based Knowledge Management System using OWL DL Ontology

The primary objective of creating the ontology is to facilitate the derivation of semantic metadata that captures contextual information pertinent to the domain in question. The domain databases described in the previous section are accessed through the domain ontology in order to create contextually rich semantic metadata. The framework of the pilot ontology-based knowledge management system is shown in Figure 3. As seen in Figure 3, ontology building involves a series of activities aimed primarily at developing knowledge organization systems. Following Beck and Pinto (2002), the process of ontology development involved specification, conceptualization, formalization, implementation and maintenance. ‘Specification’ means identifying the purpose and scope of the ontology; ‘conceptualization’ deals with describing the conceptual model according to the stipulations laid down in the earlier stage; ‘formalization’ transforms the conceptual description into a formal form; and ‘implementation’ involves implementing the formalized ontology in a formal knowledge representation language. As Figure 3 also shows, an Information Manager is responsible for organizing and managing digital information and provides valuable access points to digital resources through the creation of syntactic metadata. Information search in the present digital environment is currently limited to this level. However, with a domain ontology, domain-specific databases and syntactic metadata available, a process can be initiated that reads the syntactic metadata through the domain ontology, in relation to the available domain-specific databases, to produce semantic metadata. Semantic metadata are metadata that provide contextually relevant information about content based on a domain-specific ontology (Fisher, 2005). The process thus results in a knowledge base whose knowledge representation combines syntactic metadata and domain-specific knowledge. The architecture of the ontology-based knowledge organization system is shown in Figure 4. As seen in Figure 4, the architecture comprises three layers, viz. the Presentation Layer, the Business Logic Layer and the Data Layer.
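The process described above (syntactic metadata read through the domain ontology against domain databases to yield semantic metadata) can be sketched as a function that emits subject-predicate-object triples. This is a minimal illustration under stated assumptions, not the authors' Jena-based implementation: the column-to-property mapping uses camelCase readings of the Figure 2 labels, the linking properties `agri:aboutCrop`, `agri:aboutPest` and `agri:infestsCrop` and the `agri:` prefix are invented, and the record identifier `etd:1` is hypothetical. The sample values come from the Sorghum example and Table 2.

```python
# Hypothetical mapping from domain-database columns to ontology properties.
# Property names are camelCase readings of the Figure 2 labels and may differ
# from those in the actual Agri-Pest ontology.
CROP_PROPERTIES = {
    "CropName": "agri:hasCropName",
    "CropScientificName1": "agri:hasCropScientificName",
}
PEST_PROPERTIES = {
    "PestCommonName": "agri:hasPestCommonName",
    "PestScientificName": "agri:hasPestScientificName",
    "DamageMade": "agri:createsDamages",
}


def semantic_metadata(record_id, dublin_core, crop_row, pest_row):
    """Combine syntactic (Dublin Core) metadata with domain-database rows.

    Returns (subject, predicate, object) triples. The linking properties
    agri:aboutCrop, agri:aboutPest and agri:infestsCrop are invented here.
    """
    triples = [(record_id, f"dc:{element}", value)
               for element, value in dublin_core.items()]
    crop_node, pest_node = f"{record_id}/crop", f"{record_id}/pest"
    triples += [(record_id, "agri:aboutCrop", crop_node),
                (record_id, "agri:aboutPest", pest_node),
                (pest_node, "agri:infestsCrop", crop_node)]
    triples += [(crop_node, prop, crop_row[col])
                for col, prop in CROP_PROPERTIES.items() if col in crop_row]
    triples += [(pest_node, prop, pest_row[col])
                for col, prop in PEST_PROPERTIES.items() if col in pest_row]
    return triples


record = {"title": "Sorghum Insect Pests and their Management",
          "creator": "Buntin, David G."}
crop = {"CropName": "Sorghum", "CropScientificName1": "Sorghum bicolor (L.) Moench."}
pest = {"PestCommonName": "African Armyworm",
        "PestScientificName": "Spodoptera exempta (Wlk.)",
        "DamageMade": "Larvae defoliate"}
for triple in semantic_metadata("etd:1", record, crop, pest):
    print(triple)
```

The two Dublin Core elements alone would support only title and creator searches; the ten resulting triples also expose the crop, the pest and the damage it causes as searchable fields.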

Figure 3: Framework of the Ontology-based Knowledge Organization System (roles: Domain Expert, Knowledge Worker, Ontology Worker, Information Manager, End User; development stages: Specification (purpose, scope and requirements identification), Conceptualization (conceptual model and transformation into formal form), Implementation, Maintenance; other labelled elements: subject knowledge embedded in documents, Facet Relations, Facet Ontology, Domain Ontology, Syntactic Metadata, Domain Knowledge Databases, Knowledge Base (Semantic Metadata), Query Expansion Mechanisms, Knowledge Sharing)

Figure 4: Architecture of the Ontology-based Knowledge Organization System (Presentation Layer, Business Logic Layer and Data Layer; labelled components include JSP pages, Application Server, Document Data Manager, Domain Data Manager, DO Manager, Ontology Manager, Jena Model, KO Search, Knowledge Search Engine, Document Database, Domain Knowledge Database, Domain Ontology and Knowledge Base)

The presentation layer comprises several JSP and HTML pages. This layer is responsible for the following functionalities:

• Update document databases and domain-specific databases
• View the ontology
• Derive updated semantic metadata through the ontology
• Execute information searches

The business logic layer consists of three main components, viz. the Document Object (DO) Manager, the Ontology Manager-driven Jena model and a Knowledge Object (KO) Search entity. The DO Manager is responsible for updating the various databases, while the Ontology Manager reads the Agri-Pest domain ontology into a Jena model to produce contextually rich semantic metadata. The resulting semantic metadata is queried through MK Search Engine, an open-source tool that was configured to include properties defined in the ontology; this configuration made it possible to create contextually relevant search points. The data layer consists of the data sources required by the prototype.

5.3 Working of OWL DL Ontology-based System for Agri-Pest-Disease Domain

As seen in Figure 4, the Database Manager is primarily responsible for adding, modifying and deleting records in the digital library. A record primarily consists of digital content, which can be a full-text article, web page, e-book or an Electronic Thesis and Dissertation (ETD), as the case may be. In the first stage, the Database Manager provides syntactic (descriptive) metadata for the record in accordance with the Dublin Core metadata standard. At a higher level, the Database Manager identifies the agricultural crop, pest and disease dealt with in the record. The data entry interface of the knowledge management system is designed to prompt the manager to input the principal agricultural crop, pest and disease identified. Suitable provisions restrict records to either a crop and pest combination or a crop and disease combination, as the case demands. As an illustrative example, consider entering a record titled “Sorghum Insect Pests and their Management” into the knowledge system. The Ontology Manager is primarily responsible for transforming existing descriptive metadata into contextually rich semantic metadata.
The creation of semantic metadata primarily involves mapping the currently available metadata record, together with its associated data fields in the domain databases, through the developed ontology. The process produces an RDF metadata knowledge base that represents the records along with the pertinent data held in the domain databases. For the example above, the RDF semantic metadata capturing the relevant contextual information is shown in Table 4.

    Table 4: Semantic Metadata Capturing Contextual Information for the sample

    [RDF listing; recoverable values: title “Sorghum Insect Pests and their Management”; creator Buntin, David G.; crop Sorghum (Sorghum bicolor (L.) Moench.); pest scientific names Contarinia sorghicola, Rhopalosiphum sacchari, Spodoptera exempta; pest common names Sorghum Midges, Sorghum Aphid, African Armyworm; damage descriptions “Larvae destroy seeds”, “Larvae defoliate”, “Infest foliage” (property shown: j.0:createsDamages); geographic places Thailand, Africa; description “The paper describes various pests of Sorghum crop in detail”; disease NONE.]
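To make the mapping step concrete, the sketch below serializes one record and its domain-database rows as N-Triples. It is an illustration under stated assumptions, not the Jena-based Ontology Manager: the namespace `http://example.org/agri-pest-disease#`, the properties `dealsWithCrop` and `dealsWithPest`, the underscore naming scheme and the function names are all hypothetical. Only the property name `createsDamages` and the sample values come from Table 4; `http://purl.org/dc/elements/1.1/` is the standard Dublin Core element namespace.

```python
from typing import Dict, List

DC = "http://purl.org/dc/elements/1.1/"         # Dublin Core element set
AGRI = "http://example.org/agri-pest-disease#"  # hypothetical ontology namespace


def iri(value: str) -> str:
    return f"<{value}>"


def local_iri(name: str) -> str:
    # Hypothetical naming scheme: spaces become underscores
    return iri(AGRI + name.replace(" ", "_"))


def literal(value: str) -> str:
    # Escape characters that N-Triples does not allow raw inside a literal
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_ntriples(record_id: str, descriptive: Dict[str, str], crop: str,
                       pest_rows: Dict[str, Dict[str, str]]) -> List[str]:
    """Map one record plus its domain-database rows to N-Triples lines."""
    subject = local_iri(record_id)
    # Descriptive (Dublin Core) metadata becomes literal-valued triples
    triples = [f"{subject} {iri(DC + element)} {literal(value)} ."
               for element, value in descriptive.items()]
    # Contextual links from the record to domain entities
    triples.append(f"{subject} {local_iri('dealsWithCrop')} {local_iri(crop)} .")
    for pest, attributes in pest_rows.items():
        pest_node = local_iri(pest)
        triples.append(f"{subject} {local_iri('dealsWithPest')} {pest_node} .")
        # Each column of the pest's database row becomes a property of the pest
        for prop, value in attributes.items():
            triples.append(f"{pest_node} {local_iri(prop)} {literal(value)} .")
    return triples


for line in record_to_ntriples(
        "record1",
        {"title": "Sorghum Insect Pests and their Management",
         "creator": "Buntin, David G."},
        "Sorghum",
        {"Contarinia sorghicola": {"createsDamages": "Larvae destroy seeds"}}):
    print(line)
```

N-Triples is used here only because it can be produced with plain string formatting; a Jena model would normally be serialized as RDF/XML, as in Table 4.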


    The present prototype ontology-based knowledge management system uses a meta search engine to exploit the semantic metadata created by the Ontology Manager. MK Search Engine, an open source meta search engine, is used for this purpose. The search engine allows the relations defined in the agri-pest-disease domain ontology to be included in the search process through the creation of search points. Regular search strategies based on syntactic metadata limit the search options to key search terms. An ontology-based knowledge organization system built on semantic metadata offers more options for finding the desired information. Such systems support 'search building mechanisms', wherein a user can combine various fields to retrieve the required information. Screenshots of the developed system and of the search interface employing MK Search Engine are shown in Figures 5 and 6 respectively.

    Figures 5 and 6: Screenshots of the ontology-based KOS developed in OWL
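A 'search building mechanism' of this kind can be pictured as combining one constraint per search point with a logical AND. The sketch below assumes that each ontology property is exposed as a field holding a list of string values, and that terms match case-insensitively as substrings. The field names and the second record are hypothetical, and MK Search Engine's own query syntax is not reproduced here.

```python
from typing import Callable, Dict, List

# Hypothetical search points: one field per ontology property, each holding values
Record = Dict[str, List[str]]


def matches(record: Record, search_point: str, term: str) -> bool:
    """Case-insensitive substring match against any value of one search point."""
    return any(term.lower() in value.lower() for value in record.get(search_point, []))


def build_query(**constraints: str) -> Callable[[Record], bool]:
    """Combine one constraint per search point with a logical AND."""
    def query(record: Record) -> bool:
        return all(matches(record, sp, term) for sp, term in constraints.items())
    return query


def search(records: List[Record], **constraints: str) -> List[Record]:
    query = build_query(**constraints)
    return [r for r in records if query(r)]


records: List[Record] = [
    {"title": ["Sorghum Insect Pests and their Management"],
     "crop": ["Sorghum"],
     "pestCommonName": ["Sorghum Midge", "Sorghum Aphid", "African Armyworm"],
     "geographicPlace": ["Thailand", "Africa"]},
    {"title": ["Hypothetical record B"],
     "crop": ["Sorghum"],
     "pestCommonName": ["Sorghum Aphid"],
     "geographicPlace": ["Thailand"]},
]

# A single field matches both records; adding a second field narrows the result
print(len(search(records, crop="sorghum")))  # prints 2
print([r["title"][0] for r in search(records, crop="sorghum", pestCommonName="armyworm")])
```

Each added search point can only remove records from the result, which is why combining fields tends to raise precision.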

    6 Experiments with KAON

    KAON is targeted at semantics-driven business applications (FZI and AIFB, 2001). The suite includes a comprehensive tool set that enables easy ontology management and application. The OI-Modeler available in KAON, which supports the creation and modification of KAON ontologies, was used to develop the required ontology. The KAON Portal, a simple tool for generating ontology-based web portals, was configured to use the developed ontology. The knowledge model underlying the KAON suite is based on an extension of RDF(S), the Resource Description Framework Schema. RDF(S) is widely used as a representation format in many tools and projects, and many resources exist for handling RDF(S), such as browsing, editing, validating, querying and storing (Gomez-Perez et al, 2003). RDF(S) provides the most basic primitives for ontology modeling, achieving a balance between expressiveness and reasoning. It has been developed as a stable core of primitives that can be easily extended; ontology languages that extend and reuse RDF(S) include OIL, DAML+OIL and OWL (op. cit.). In our earlier OWL approach, a schematic view of the domain was represented in the ontology. That ontology was then targeted at different databases to derive RDF data, which was queried using a search engine. In contrast, the KAON approach involved developing the ontology for the required domain and then populating it with document instances in the appropriate classes. For this purpose, the DL TBox defined for the Agriculture domain is taken to contain different classes for documents relating to specific crops. For instance, the schematic representation of the classes and relationships defined for the Sorghum class is shown in Figure 7.
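Populating the ontology with document instances relies on the RDF(S) primitives rdf:type and rdfs:subClassOf. The code below is a minimal sketch built on a tiny hypothetical TBox, with invented class names such as `agri:SorghumDocument`. It applies two RDFS entailment rules until no new triples appear: rdfs11, the transitivity of rdfs:subClassOf, and rdfs9, the propagation of rdf:type to superclasses. As a result, a document typed only as a Sorghum document is also retrieved as an instance of the broader document classes. KAON's own inference machinery is not modeled here.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rules rdfs11 and rdfs9 until no new triples appear."""
    inferred = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        typed = [(s, o) for s, p, o in inferred if p == TYPE]
        new: Set[Triple] = set()
        for c, d in subclass:
            # rdfs11: (c subClassOf d), (d subClassOf e) => (c subClassOf e)
            for d2, e in subclass:
                if d == d2:
                    new.add((c, SUBCLASS, e))
            # rdfs9: (x type c), (c subClassOf d) => (x type d)
            for x, c2 in typed:
                if c == c2:
                    new.add((x, TYPE, d))
        if new <= inferred:  # fixpoint reached
            return inferred
        inferred |= new


def instances_of(triples: Set[Triple], cls: str) -> List[str]:
    return sorted(s for s, p, o in rdfs_closure(triples) if p == TYPE and o == cls)


# Hypothetical TBox (class axioms) and ABox (one document instance)
ontology: Set[Triple] = {
    ("agri:SorghumDocument", SUBCLASS, "agri:CropDocument"),
    ("agri:CropDocument", SUBCLASS, "agri:Document"),
    ("agri:SorghumInsectPestsAndTheirManagement", TYPE, "agri:SorghumDocument"),
}
print(instances_of(ontology, "agri:Document"))
```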


    Figure 7: Schematic View of Sorghum related classes defined in KAON ontology

    [Figure: main classes Documents related to Sorghum, Crop Scientific Name, Sorghum – Pests Class and Sorghum – Diseases Class; subclasses Creator, Year, Degree Grantor, Sorghum – Pest Common Name, Sorghum – Pest Scientific Name, Sorghum – Pest Family, Sorghum – Pests – Geographic Place, Sorghum – Disease Name, Sorghum – Disease Causal Organism and Sorghum – Disease Control; properties hasAuthorName, hasCropScientificName, hasPublisher, hasEdition, hasPestCommonName, hasPestScientificName, hasPestFamily, hasPestGeographicPlace, hasDiseaseName, hasDiseaseCausalOrganism and hasDiseaseControl.]

    Thus, to develop an ontology capturing the subject content of the sample document instances, the following approach was adopted:

    6.1 Definition of Required Classes

    • Classify documents according to the Crop (e.g. Sorghum – Documents).
    • Define the individual Crop classes (e.g. Sorghum – Documents Class).
    • Define classes containing descriptive metadata (e.g. Creator Class, Degree Grantor Class etc.).
    • Define the Pest class and its associated classes (e.g. Pest Common Name Class etc.).
    • Define the Disease class and its associated classes (e.g. Disease Causal Organism Class etc.).

    6.2 Definition of Required Properties and Mapping of Instances

    • Properties characterizing the instances of the Descriptive Metadata classes, of the Pest class and its associated classes, and of the Disease class and its associated classes are defined for the Crop – Document class.

    • The document instance of the Crop – Document class is mapped to the instances of the related classes using the properties defined above.
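Steps 6.1 and 6.2 can be sketched with a small in-memory structure. This is not KAON's API: the `KaonStyleOntology` class is hypothetical, and only the class and property names (hasPestCommonName, hasPestScientificName, hasPestGeographicPlace) follow Figure 7. The `relate` method refuses links between undeclared instances. The `relations_of` method returns the relations a portal could display when the user selects an instance.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class KaonStyleOntology:
    """Hypothetical in-memory sketch: classes hold instances, properties link them."""

    def __init__(self) -> None:
        self.instances: DefaultDict[str, Set[str]] = defaultdict(set)
        self.relations: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_instance(self, cls: str, instance: str) -> None:
        """Step 6.1: place an instance in its class."""
        self.instances[cls].add(instance)

    def _known(self, instance: str) -> bool:
        return any(instance in members for members in self.instances.values())

    def relate(self, subject: str, prop: str, target: str) -> None:
        """Step 6.2: map a document instance to an instance of a related class."""
        if not (self._known(subject) and self._known(target)):
            raise KeyError(f"undeclared instance in {subject!r} {prop} {target!r}")
        self.relations[subject].append((prop, target))

    def relations_of(self, instance: str) -> List[Tuple[str, str]]:
        """Relations a portal could show when the user selects this instance."""
        return sorted(self.relations.get(instance, []))


onto = KaonStyleOntology()
doc = "Sorghum Insect Pests and their Management"
onto.add_instance("Documents related to Sorghum", doc)
onto.add_instance("Sorghum - Pest Common Name", "Sorghum Midge")
onto.add_instance("Sorghum - Pest Scientific Name", "Contarinia sorghicola")
onto.add_instance("Sorghum - Pests - Geographic Place", "Africa")
onto.relate(doc, "hasPestCommonName", "Sorghum Midge")
onto.relate(doc, "hasPestScientificName", "Contarinia sorghicola")
onto.relate(doc, "hasPestGeographicPlace", "Africa")
print(onto.relations_of(doc))
```

Checking that both ends of a relation are declared instances mirrors the order of the approach: classes and instances first, property mappings second.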

    A similar approach is followed for developing the classes and relations for the other crops considered in the study.

    6.3 Working of the KAON ontology in KAON Portal

    Once the ontology was developed with the classes and relationships identified above, the KAON Portal was configured to browse it. For our sample agri-pest domain, focusing on the five crops identified in Table 1, we had about 15 classes (three per crop), representing the documents related to each crop, the pests associated with it and the diseases observed in it. Further, the Author class, the Publisher class and the Crop Scientific Name class were defined as Main Classes because of their general relations with the documents and the crops. This gives around 18 top concepts, which the user can browse.
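The count of top concepts follows from the class design above, assuming exactly three crop-specific classes (documents, pests, diseases) for each of the five crops:

```latex
\underbrace{5}_{\text{crops}} \times \underbrace{3}_{\text{classes per crop}}
\;+\; \underbrace{3}_{\text{Author, Publisher, Crop Scientific Name}}
\;=\; 15 + 3 \;=\; 18 \text{ top concepts}
```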


    6.4 Information Search in KAON Portal

    There are two options for the user to search for information in the ontology:

    a. Simple search
    b. Browse the ontology and search for relevant information

    Figure 8 shows a screenshot of the KAON Portal developed to browse the ontology.

    Figure 8: Screenshot of the Top Page of KAON ontology in KAON Portal

    The simple search involves keying a related term into the form provided in the user interface (see Figure 8). The search returns the fields containing the searched term. On viewing a particular term, the system returns the relations available to the respective document, so that the user can obtain the desired record. The second way of searching for information in the KAON ontology involves browsing the ontology itself. As mentioned before, the first stage provides a view of the available top concepts in the ontology. The user can then go to the required class, view its subclasses and find the required instance. Clicking on an instance pops up the relations defined for it, connecting it to other instances. Thus the user can obtain the required information by browsing the ontology.

    7 Comparison of Experiments with OWL and KAON

    The two approaches adopted in the study, viz., the use of an OWL ontology and of a KAON ontology, offer valuable insights into the role of ontologies in digital libraries, particularly with reference to information retrieval. The following sections present the different perspectives of these approaches.

    7.1 Ontology Development Environment

    Ontology building tools help to acquire the domain ontology, organize it, flesh it out, and check and commit it (Denny, 2002). The OWL ontology was developed using the Protégé-OWL Ontology Editor (Protégé, 2006), which provided a good environment for developing the ontology. As developing the OWL ontology involved a schematic representation of the entire domain, the number of classes and relations was relatively small. The editor supported the definition of the various classes and relations. However, to use the ontology developed with Protégé in the Eclipse environment, the present study used SWeDe (Semantic Web Development Environment), an extensible framework developed by BBN Technologies (SWeDe, 2005) for integrating new and existing tools for the Semantic Web. The toolkit provided an OWL editor and an OWL validator in the Eclipse environment. Java interfaces for the classes and properties in the OWL ontology were generated using the ‘Schemagen Vocabulary’ generator available with SWeDe. The developed prototype used these interfaces to map the instances stored in a MySQL database. The OWL language made the classes and relations defined in the ontology easy to understand. The KAON ontology was developed using OI-Modeler (FZI and AIFB, 2001). While OI-Modeler provided a good graphical user interface for developing ontologies, the task became tedious as the number of classes and relations increased. The KAON language representation was complex, particularly for understanding the different classes and relations defined in the ontology.

    7.2 Search Experience

    The experiments conducted with OWL and KAON provided various options for searching information. In both experiments, the search process operated on information stored in ontology form. In the case of OWL, the developed domain ontology was targeted at backend databases to obtain an ontology representation capturing the data values in RDF form, and this representation was then queried. In the case of KAON, however, the ontology was developed to include both the domain and the data values. Thus, in both cases, the ontology was used for the search process.
    The developed systems provided the following search features:

    7.2.1 Simple Search: Both the OWL and KAON approaches support simple search, which can be executed for single words across fields such as Author, Title, Keywords and Publisher.

    7.2.2 View Property Relations: Both prototypes let the user view the property relations defined in the ontology. Such property relations make the search process more effective: they allow users to narrow their search down to precise information, and they also inform users about the relations available in the domain. In the OWL prototype, however, the relations took the form of input fields into which the user keyed the related data, whereas in KAON the property relations supported navigation to the desired context.

    7.2.3 Query Building Mechanism: The prototype developed with the OWL approach supported ‘query building mechanisms’ for searching information. Such strategies bring together different fields during the search process, which considerably narrows the search for the required information. These mechanisms can therefore be used to increase the precision of searches.

    7.2.4 Taxonomic View: The KAON experiments provided a taxonomic view of the domain and of the related documents represented in the ontology. A taxonomic view of the domain and of the available information is highly useful in information retrieval. McGuinness (2003) observes that a taxonomic view (a) provides a controlled vocabulary, (b) supports site organization and navigation, (c) supports expectation setting, (d) supports browsing and (e) helps in creating ‘umbrella’ structures from which content can be extended. These benefits of a taxonomic view were realized with the KAON approach.
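A taxonomic view of this kind can be sketched as a tree of concepts whose nodes hold instances. The `Concept` class and the `render` and `path_to` helpers below are hypothetical illustrations, not KAON Portal code. The class names follow Figure 7 and the instances come from the sample record. `render` lists the tree as a user would see it when expanding every class. `path_to` returns the browse path from a top concept down to the class that holds a given instance.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Concept:
    """Hypothetical node of a taxonomic view: a class with subclasses and instances."""
    name: str
    subclasses: List["Concept"] = field(default_factory=list)
    instances: List[str] = field(default_factory=list)


def render(concept: Concept, depth: int = 0) -> List[str]:
    """Indented, depth-first listing of the taxonomy below a concept."""
    lines = ["  " * depth + concept.name]
    lines += ["  " * (depth + 1) + "- " + inst for inst in concept.instances]
    for sub in concept.subclasses:
        lines += render(sub, depth + 1)
    return lines


def path_to(concept: Concept, instance: str,
            trail: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Browse path from this concept down to the class holding the instance."""
    trail = trail + (concept.name,)
    if instance in concept.instances:
        return list(trail)
    for sub in concept.subclasses:
        found = path_to(sub, instance, trail)
        if found is not None:
            return found
    return None


pests = Concept("Sorghum - Pests Class", subclasses=[
    Concept("Sorghum - Pest Common Name",
            instances=["Sorghum Midge", "Sorghum Aphid", "African Armyworm"]),
    Concept("Sorghum - Pest Scientific Name",
            instances=["Contarinia sorghicola"]),
])
print("\n".join(render(pests)))
print(path_to(pests, "Sorghum Aphid"))
```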


    The features of the OWL and KAON approaches are compared in Table 5.

    Table 5: Comparison of Experiments with OWL and KAON

    Sl. No. | Task | OWL | KAON

    Ontology Development
    1. | Ontology Editors | Protégé-OWL Editor; SWeDe plugin for Eclipse | OI-Modeler
    2. | Mapping of Instance Data | |
    3. | Linguistics (Language Usage) | Easy | Complex

    Search Experience
    4. | Simple Search | |
    5. | Query Building Mechanism | |
    6. | Taxonomic View | |
    7. | View Property Relations | |

    8 Conclusions and Future Work

    The results of our initial research into the use of Description Logics and facet relations for developing OWL and KAON ontology-based information management systems appear very promising. Ontologies, particularly when based on knowledge representation formalisms, add value and enhance the search experience. Building on these initial findings, we intend to develop a full-fledged ontology for the domain of agriculture, identifying the relationships that exist among the different facets of the domain. The study will also be expanded by adding more document instances. Our future research will be directed towards implementing KAON using relational databases and working in the engineering server mode.


    9 References

    1. Baader, F. and Nutt W. 2003. Basic Description Logics. In The Description Logic Handbook: Theory, implementation and applications, pp. 47-100, edited by Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi and Peter Patel-Schneider. Cambridge: Cambridge University Press.

    2. Beck, H. and Pinto, H.S. 2002. Overview of Approach, Methodologies, Standards and Tools for Ontologies. UN: The Agricultural Ontology Service, FAO.

    3. Bozsak et al. 2002. KAON: Towards a Large Scale Semantic Web. In Proceedings of the Third International Conference on E-Commerce and Web Technologies, pp. 304-313. http://portal.acm.org

    4. Ding, Li et al. 2004. Swoogle: A Search and Metadata Engine for the Semantic Web. In Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management. New York: ACM Press, pp. 652-659. http://ebiquity.umbc.edu/paper/html/id/183/Swoogle-A-Search-and-Metadata-Engine-for-the-Semantic-Web.

    5. Denny, M. 2002. Ontology Building: A survey of editing tools. xml.com. http://www.xml.com/pub/a/2002/1/06/ontologies.htm.

    6. Ellis, D. and A. Vasconcelos. 2000. The Relevance of Facet Analysis for World Wide Web Subject Organizations and Searching. Journal of Internet Cataloguing. 2(3/4), pp. 97-114.

    7. Fisher, M. 2005. Semantic Enterprise Content Management. Semagix. www.semagix.com

    8. FZI and AIFB. 2001. KAON: Karlsruhe Ontology Management Suite. FZI and AIFB.

    9. Godert, W. 1991. Facet Classification in Online Retrieval. International Classification. 18(2), pp. 98-109.

    10. Gomez-Perez, A.; Mariano Fernandez-Lopez and Oscar Corcho. 2003. Ontological Engineering. London: Springer-Verlag.

    11. Guarino, N. 1998. Formal Ontology and Information Systems. In Proceedings of the 1st International Conference on Formal Ontologies in Information Systems, FOIS’98, edited by N. Guarino, Italy: IOS Press. pp. 3-15. http://citeseer.ist.psu.edu/guarino98formal.html

    12. Gruber, T. 1993. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition. 5, pp. 199-220.

    13. Ingwersen, P. and Wormell, I. 1992. Ranganathan in the Perspective of Advanced Information Retrieval. Libri. 42, pp. 184-201.

    14. Kalfoglou, Y.; John Domingue; Enrico Motta; Maria Vargas-Vera and Simon Buckingham Shum. 2004. myPlanet: An Ontology-driven Web-based Personalized News Service. Knowledge Media Institute (KMi), The Open University, Milton Keynes, U.K. http://www.tzi.de/buster/IJCAIwp/Finals/kalfoglou.pdf

    15. Lai, Mao-sheng and Yang, Xiu-dan. 2001. Ontology-based Metadata Schema for Chinese Digital Libraries. Department of Information Management, Peking University, Beijing. http://www.cs.vu.nl/franksh/postscript/K-CAP01.pdf

    16. McGuinness D.L. 2003. Ontologies Come of Age. In Spinning the Semantic Web: Bringing the World Wide Web to its full potential, edited by Dieter Fensel, James A. Hendler, Henry Lieberman and Wolfgang Wahlster. England: The MIT Press, pp. 171-196.

    17. Oberle D.; Steffen Staab and Andreas Eberhart. 2006. Towards Semantic Middleware for Web Application Development. IEEE Distributed Systems Online. ISSN: 1541-4922. http://dsonline.computer.org/portal/site/dsonline

    18. Protégé. 2006. Protégé-OWL Editor, Stanford Medical Informatics. http://protege.stanford.edu/overview/protege-owl.htm.

    19. Ranganathan, S.R. 1957. Prolegomena to Library Classification. Bombay: Asian Publishing House.


    20. Reynolds, D.; Carol Thompson; Jishnu Mukerji and Derek Coleman. 2005. An Assessment of RDF/OWL Modeling. HPL-2005-189. October. http://www.hpl.hp.com/techreports/2005/HPL-2005-189.pdf.

    21. Shum, S.B.; Enrico Motta and John Domingue. 2000. ScholOnto: An Ontology-based Digital Library Server for Research Documents and Discourse. International Journal of Digital Libraries. 3(3), pp. 237-248.

    22. Smith, B. 2003. Ontology. In Blackwell Guide to the Philosophy of Computing and Information, edited by Luciano Floridi, Oxford: Blackwell Publishers, pp. 155-166.

    23. Spiteri, L. 1998. A Simplified Model for Facet Analysis. Canadian Journal of Information and Library Science. April-July, pp. 1-30.

    24. SWeDe. 2005. SWeDe Plugin. BBN Technologies. http://owl-eclipse.projects.semwebcentral.org/

    25. Uszkoreit, H.; Brigitte Jorg and Gregor Erbach. 2003. An Ontology-based Knowledge Portal for Language Technology. German Research Center for Artificial Intelligence and Saarland University, Germany. http://www.mcgreg.net/pub/COLLATE-EnablerElsnet03.pdf

    26. Viinikkala, M. 2004. Ontology in Information Systems. 8108103. http://www.cs.tut.fi/kk/webstuff/Ontology.pdf

    27. W3C. 2004. Ontology Web Language (OWL) Guide. http://www.w3.org/TR/owl-guide/

    28. W3C. 2003. OWL Reasoning Examples. http://owl.man.ac.uk/2003/why/20031203

    29. Yeh, Ching-Long. 2002. Development of an Ontology-Based Portal for Digital Archive Services. Department of Computer Science and Engineering, Tatung University. http://www.iis.sinica.edu.tw/APE02/Program/chingyeh.pdf

    30. Zuniga, G.L. 2001. Ontology: Its transformation from philosophy to information systems. In Proceedings of the International Conference on Formal Ontology in Information Systems - (FOIS’01). October 17-19, Ogunquit, Maine, New York. http://portal.acm.org/citation.cfm?id=505187&coll=portal&dl=ACM
