

Page 1: THE EFFECT OF DATA STRUCTURES MODIFICATIONS ON …hdp/PDF/dissertation.pdf · Structures Tool Interoperability Workshop.Research Press International, 2007. Field of Study Major field:

THE EFFECT OF DATA STRUCTURES MODIFICATIONS ON ALGORITHMS

FOR REASONING OPERATIONS USING A CONCEPTUAL GRAPHS

KNOWLEDGE BASE

BY

HEATHER DAY PFEIFFER, B.S., M.S.

A dissertation submitted to the Graduate School

in partial fulfillment of the requirements

for the degree

Doctor of Philosophy

Subject: Computer Science

New Mexico State University

Las Cruces, New Mexico

December 2007

Copyright © 2007 by Heather Day Pfeiffer, B.S., M.S.


“The Effect of Data Structures Modifications On Algorithms for Reasoning Operations

Using a Conceptual Graphs Knowledge Base,” a dissertation prepared by Heather Day

Pfeiffer, B.S., M.S. in partial fulfillment of the requirements for the degree, Doctor of

Philosophy, has been approved and accepted by the following:

Linda Lacey
Dean of the Graduate School

Roger T. Hartley
Chair of the Examining Committee

Date

Committee in charge:

Dr. Roger T. Hartley, Chair

Dr. Desh Ranjan

Dr. Clinton Jeffery

Dr. Jeanine Cook


DEDICATION

This dissertation is dedicated to my husband, Dr. Joseph J. Pfeiffer, Jr., who has supported me through “thick and thin”; my children, Joseph “Joel” III and Rebecca “Becca”, who have seen “Mom” work on a degree all their lives; my parents, Lloyd and Barbara Day, who have always believed in education and instilled that belief in their children; and my in-laws (may they rest in peace), Joe and Mary Elizabeth “Betty” Pfeiffer.


ACKNOWLEDGMENTS

David J. Benn, from the University of South Australia at Adelaide, for working to help integrate his ‘pCG’ system with the CPE “Operations” module and for help in testing and debugging comparison tests with CPE and pCG.

Dr. John F. Sowa, who gave me some very lively discussions on growing ideas of Conceptual Structures and especially Conceptual Graphs. Also, for allowing me to work with and expand on his original CGIF format.

Dr. Jean-François Baget and Dr. Madalina Croitoru, who have taught me much about Simple Conceptual Graphs (SCGs) and how relation hierarchies make great Supports for SCGs. Also for evaluating and discussing some of the theoretical findings of this dissertation.

All the past and current AI graduate students at New Mexico State University, in particular, Dr. Melanie Martin, Nemecio “Chito” Chavez, Jr., Dr. Dan Tappan and Dr. Tom O’Hara.

The hard work of my committee, in particular, Dr. Clinton Jeffery, who carefully looked at both content and formatting of all the chapters and traveled all the way back from Idaho, and Dr. Jeanine Cook, who kept me “on track” and over the bumps in the road.


VITA

February 11, 1955 Born in Dallas, Texas, USA

June 1977 B.S. in Microbiology/Biology from University of Washington

1980-1984 Systems Analyst at The Boeing Company in Seattle, Washington

May 1988 M.S. in Computer Science from New Mexico State University

1987-2007 Computer Consultant based in Las Cruces, New Mexico

2005-2006 Senior Computer Scientist at Horton Technical Associates, Inc. in Las Cruces, New Mexico

Professional Societies

Association for Computing Machinery (ACM)

IEEE Computer Society

The American Society for Information Systems and Technology (ASIS&T)

New Mexico Network for Women in Science and Engineering (NMNWSE)

Publications

H.D. Pfeiffer and R.T. Hartley. Semantic additions to conceptual programming. In Proc. of the Fourth Annual Workshop on Conceptual Structures, Detroit, MI, 1989.


M.J. Coombs, R.T. Hartley, H.D. Pfeiffer, and B. Kilgore. How to become immune to facts. In Proc. Rocky Mountain Conference on Artificial Intelligence, Las Cruces, NM, June 1990.

H.D. Pfeiffer and R.T. Hartley. Additions for set representation and processing to conceptual programming. In Proc. of the Fifth Annual Workshop on Conceptual Structures, pages 131–140, Boston & Stockholm, 1990.

H.D. Pfeiffer and R.T. Hartley. The Conceptual Programming Environment, CP: Reasoning representation using graph structures and operations. In Proc. of IEEE Workshop on Visual Languages, Kobe, Japan, 1991.

M.J. Coombs, H.D. Pfeiffer, and R.T. Hartley. e-MGR: an Architecture for Symbolic Plasticity. In the special issue of International Journal of Man-Machine Studies on Symbolic Problem Solving in Noisy, Novel, and Uncertain Task Environments, 36:1–17, 1992.

C.A. Fields, H.D. Pfeiffer, and T.C. Eskridge. Knowledge representation and control in gm1, an automated DNA sequence analysis system based on the MGR architecture. In International Journal of Man-Machine Studies, 34:549–573, 1992.

R.T. Hartley, H.D. Pfeiffer, and D. Qui. Representation for Viewgen: Structures and Reasoning. In Workshop on Propositional Knowledge Representation, Stanford, CA, 1992.

H.D. Pfeiffer and R.T. Hartley. The Conceptual Programming Environment, CP. In T.E. Nagle, J.A. Nagle, L.L. Gerholz, and P.W. Ekland, editors, Conceptual Structures: Current Research and Practice, Ellis Horwood Workshops. Ellis Horwood, 1992.

H.D. Pfeiffer and R.T. Hartley. Temporal, spatial, and constraint handling in the Conceptual Programming Environment, CP. Journal of Experimental and Theoretical AI, 4(2):167–182, 1992.

H.D. Pfeiffer and T.E. Nagle, editors. Conceptual Structures: Theory and Implementation, volume 754 of LNAI. Springer-Verlag, Heidelberg, W. Germany, 1993.

H.D. Pfeiffer and B.J. Waltar. Automated message analysis using the Conceptual Programming Environment, CP. In G. Ellis and P. Ekland, editors, Supp. Proc. of the 3rd International Conference On Conceptual Structures, Santa Cruz, CA, 1995.


H.D. Pfeiffer and R.T. Hartley. Visual CP representation of knowledge. In G. Stumme, editor, Working with Conceptual Structures - Contributions to ICCS2000, Shaker-Verlag, pages 175–188, 2000.

H.D. Pfeiffer and R.T. Hartley. ARCEdit - CG editor. In CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL: http://www.cs.nmsu.edu/ hdp/CGTOOLS/proceedings/index.html.

H.D. Pfeiffer and R.T. Hartley, editors. CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL: http://www.cs.nmsu.edu/ hdp/CGTOOLS/proceedings/index.html.

R.T. Hartley and H.D. Pfeiffer. Data models for Conceptual Structures. In Foundations and Applications of Conceptual Structures, Contributions to ICCS 2002. ICCS2002, 2002.

K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors. Conceptual Structures at Work, volume 3127 of LNAI. ICCS2004, Springer, July 2004.

H.D. Pfeiffer, K.E. Wolff, and H.S. Delugach, editors. Conceptual Structures at Work, Contributions to ICCS 2004, Aachen, July 2004. ICCS2004, Shaker Verlag.

H.D. Pfeiffer. An exportable CGIF module from the CP environment: A pragmatic approach. In K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors, Conceptual Structures at Work, volume 3127 of LNAI, pages 319–332. ICCS2004, Springer, July 2004.

M.A. Keeler and H.D. Pfeiffer. Collaboratory testbed partnerships as a knowledge capture challenge. In P. Clark and G. Schreiber, editors, Proceedings of the Third International Conference on Knowledge Capture, pages 203–204. KCAP’05, ACM Press, October 2005.

M.A. Keeler and H.D. Pfeiffer. Games of inquiry for collaborative concept structuring. In F. Dau, M-L Mugnier, and G. Stumme, editors, Conceptual Structures: Common Semantics for Sharing Knowledge, ICCS2005, pages 396–410, Berlin, Springer-Verlag, LNAI 3596, July 2005.

H.D. Pfeiffer. Games for co-evolution of digital resources and knowledge tools. In Information Realities: Shaping the Digital Future for All, ASIS&T 2006, Austin, TX, November 2006.


M.A. Keeler and H.D. Pfeiffer. Building a pragmatic methodology for KR tool research and development. In H. Scharfe, P. Hitzler, and P. Ohrstrom, editors, Conceptual Structures: Inspiration and Application, ICCS2006, pages 314–330, Berlin, Springer-Verlag, LNAI 4068, July 2006.

H.D. Pfeiffer and R.T. Hartley. A comparison of different conceptual structures projection algorithms. In U. Priss, S. Polovina, and R. Hill, editors, Conceptual Structures: Knowledge Architectures for Smart Applications, ICCS’07, pages 165–178, Berlin Heidelberg, Springer-Verlag, LNAI 4604, July 2007.

H.D. Pfeiffer and J.J. Pfeiffer, Jr. Representation levels within knowledge representation. In U. Priss, S. Polovina, and R. Hill, editors, Conceptual Structures: Knowledge Architectures for Smart Applications, ICCS’07, pages 484–487, Berlin Heidelberg, Springer-Verlag, LNAI 4604, July 2007.

H.D. Pfeiffer, N.R. Chavez, Jr., and J.J. Pfeiffer, Jr. CPE design considering interoperability. In H.D. Pfeiffer, A. Kabbaj, and D.J. Benn, editors, CS-TIW 2007 Second Conceptual Structures Tool Interoperability Workshop, pages 71–75, 2007.

H.D. Pfeiffer, A. Kabbaj, and D.J. Benn, editors. CS-TIW 2007 Second Conceptual Structures Tool Interoperability Workshop. Research Press International, 2007.

Field of Study

Major field: Artificial Intelligence

Conceptual Structures


ABSTRACT

THE EFFECT OF DATA STRUCTURES MODIFICATIONS ON ALGORITHMS

FOR REASONING OPERATIONS USING A CONCEPTUAL GRAPHS

KNOWLEDGE BASE

BY

HEATHER DAY PFEIFFER, B.S., M.S.

Doctor of Philosophy

New Mexico State University

Las Cruces, New Mexico, 2007

Dr. Roger T. Hartley, Chair

Knowledge representation (KR) is used to store and retrieve meaningful information. Meaning cannot be directly stored in the computer; therefore, a series of levels of representation transforms knowledge to a format that a computer can process. This transformed knowledge is saved using dynamic data structures that are suitable for the style of KR being implemented, and through the KR the system manipulates the knowledge in the data using reasoning operations. The data structure, together with the contents of the transformed knowledge, is called the knowledge base (KB). An algorithm and the associated data structures make up the reasoning operation, and the performance of this operation is dependent on the KB it uses.


In this work, the basic reasoning operations for knowledge management will be explored using a particular style of KR called Conceptual Graphs (CGs). These operations, projection and maximal join, are the foundation for query/answer and hypothesis generation (abduction) systems, respectively. It is believed that changing the reasoning operations’ algorithms and providing adequate data structures for them can improve the implementation of the operations for use in intelligent systems, thereby making them faster and more efficient. The execution times of different algorithms and data structures are analyzed over the most general form of CG knowledge base, showing that flexible, fast and efficient operations can improve higher-level systems.


TABLE OF CONTENTS

LIST OF ALGORITHMS
LIST OF TABLES
LIST OF FIGURES

1 INTRODUCTION
1.1 Knowledge and Knowledge Representation
1.1.1 Representation Levels
1.1.2 Speed and Efficiency in Processing
1.2 Foundational Information
1.2.1 Basis of Subgraph Isomorphism
1.2.2 Overview of Unification/Matching
1.2.3 Database vs Knowledge Base
1.3 Organization of Dissertation

2 ONTOLOGY, KNOWLEDGE AND REPRESENTATION
2.1 Ontology
2.1.1 Abstract Hierarchies
2.1.2 Relationships
2.1.2.1 Compositional
2.1.2.2 Quantification
2.1.2.3 Qualitative
2.2 Knowledge
2.2.1 Types
2.2.1.1 Declarative Knowledge
2.2.1.2 Procedural Knowledge
2.2.2 Operations
2.2.2.1 Terminological
2.2.2.2 Assertional
2.2.2.3 Generalization
2.2.2.4 Specialization
2.3 Representation
2.3.1 Knowledge
2.3.1.1 Logic
2.3.1.2 Rule-Bases
2.3.1.3 Semantic Network
2.3.2 Internal Representation
2.3.2.1 Predicate Calculus
2.3.2.2 IF..THEN
2.3.2.3 Conceptual Structures

3 DEFINITIONS
3.1 Graph Theory
3.1.1 Digraph and Bigraph
3.1.2 Walk, Path and Connected
3.2 Types and Hierarchies
3.2.1 Concept Type Hierarchy
3.2.2 Support
3.3 FOL
3.4 Conceptual Graphs
3.4.1 Graph Theory Relationships
3.4.2 Formation Rules
3.4.3 Simple Conceptual Graphs (SCGs)
3.4.4 Conceptual Graphs Interchange Format (CGIF)
3.5 Data Structures
3.5.1 Arrays
3.5.2 Hash Tables
3.5.2.1 Perfect Hashing
3.5.2.2 Hash Table/Hash Tables

4 REASONING OPERATIONS
4.1 Operators
4.1.1 Project
4.1.2 Join
4.2 Graph and Subgraph Isomorphism
4.2.1 Graph Isomorphism
4.2.2 Subgraph Isomorphism
4.2.2.1 Non-labeled nodes and undirected edges
4.2.2.2 Labeled nodes and undirected edges
4.2.3 Subtree Isomorphism
4.2.3.1 Hamiltonian Path
4.2.3.2 Subforest Isomorphism
4.2.4 Subbipartite Isomorphism
4.2.5 Projection
4.2.5.1 Historical Algorithms
4.2.5.2 Proposed Algorithm
4.2.6 Maximal Join
4.2.6.1 Historical Algorithms
4.2.6.2 Proposed Algorithm
4.3 Operations
4.3.1 Projection
4.3.2 Maximal Join
4.3.3 Over Knowledge bases

5 ALGORITHMS AND ANALYSIS
5.1 Foundational Algorithms
5.1.1 SCG Projection
5.1.2 SCG Relation Projection
5.1.3 Polyprojection
5.1.4 Notio Projection
5.2 New Algorithms
5.2.1 Supporting Information
5.2.1.1 Variables and Given values
5.2.1.2 Actual Supporting Routines
5.2.1.3 Worst Case Analysis for Support Routines
5.2.2 New Projection
5.2.2.1 Actual Algorithm
5.2.2.2 Execution Time
5.2.2.3 Worst Case Analysis for Projection
5.2.3 New Maximal Join
5.3 Typical Scenario Analysis for Projection Algorithms
5.3.1 Projection Algorithms using SCG
5.3.1.1 SCG Projection
5.3.1.2 SCG Relation Projection
5.3.2 Notio Projection
5.3.3 New Projection
5.3.3.1 Typical Case for Support Routines
5.3.3.2 Typical Case for New Projection Algorithm

6 SYSTEMS/ENVIRONMENTS AND IMPLEMENTATIONS
6.1 Semantic Network Systems
6.1.1 KL-ONE
6.1.2 SNePS
6.1.3 SNAP
6.1.4 CS Initial Project - PEIRCE
6.2 Conceptual Graphs Environments
6.2.1 CoGITaNT
6.2.2 Amine
6.2.3 pCG
6.2.4 CPE
6.2.4.1 Basic Architecture for the Environment
6.2.4.2 Data Flow within the Environment
6.2.4.3 Data Structures used by the Environment
6.3 ADT Implementations
6.3.1 Logical
6.3.2 Basic Data Structures
6.3.3 Object
6.4 Experiment Systems Implementation
6.4.1 pCG - Original Notio
6.4.2 CP Environment (CPE)
6.4.2.1 Array (Vectors)
6.4.2.2 Hash Tables

7 PROJECTION EXPERIMENTS, RESULTS AND ANALYSIS
7.1 Domain Problem - ‘Blocks World’
7.2 Tests
7.2.1 Single Appearance of Relation within Graph
7.2.1.1 Increase # of Graphs in KB
7.2.1.2 Increase # of Nodes in Graphs in KB
7.2.1.3 Increase # of Nodes in Query Graph
7.2.2 Multiple Appearance of Relation with a Graph
7.2.2.1 Increase # of Nodes in Graphs in KB
7.2.2.2 Increase # of Nodes in Query Graph
7.3 Results of Each Experiment Systems
7.3.1 pCG - Original Notio
7.3.2 CP Environment
7.3.2.1 Array (Vector)
7.3.2.2 Hash Tables
7.4 Results of Each # of Nodes in KB
7.4.1 5 nodes in KB graphs
7.4.2 11 nodes in KB graphs
7.4.3 21 nodes in KB graphs
7.4.4 31 nodes in KB graphs
7.4.5 53 nodes in KB graphs
7.4.6 73 nodes in KB graphs
7.5 Analysis of Results
7.5.1 Change # of Graphs in KB
7.5.2 Change # of Nodes in KB Graphs
7.5.3 Change # of Nodes in Query Graph
7.5.4 Change # of Identical Relations in Graph

8 CONCLUSIONS AND FUTURE WORK
8.1 Evaluation of Four Projection Algorithms
8.1.1 Strengths
8.1.2 Weaknesses
8.2 Data Structures and Algorithms Effectiveness Comparison for Implemented Algorithms
8.2.1 Strengths
8.2.2 Weaknesses
8.3 Significance of Work
8.3.1 Full Conceptual Graphs
8.3.2 Finds All Valid Projections
8.3.3 Data Structure Integration in Algorithm over Large KB and Graphs
8.4 Future Work
8.4.1 Experiments and Analysis of Maximal Join Algorithm
8.4.2 KB Stored From and To Standard Relational DB
8.4.3 Time and Space Constraints
8.4.3.1 Heuristics
8.4.3.2 Time
8.4.3.3 Space
8.4.4 Different Domain Problems and Interoperability

APPENDICES

A PROGRAMMING LANGUAGE CRITERIA
A.1 Language Evaluation
A.1.1 Visual Basic .Net
A.1.2 Java™
A.1.3 C
A.1.4 C++
A.2 Language Comparison
A.2.1 C++ to C
A.2.2 C++ to Java™
A.2.3 C++ to Prolog
A.2.4 C++ to Visual Basic 6.0

B DOCUMENTATION OF CGIF - VERSION 2001
B.1 Added Definitions For CGIF Categories
B.2 Lexical Categories
B.3 Syntactic Categories

C DOCUMENTATION OF SYSTEMS
C.1 pCG (CGP Programs)
C.2 CP Environment, CPE
C.2.1 CPE Module Documentation
C.2.1.1 CP_Graph Reasoning Operations
C.2.1.2 CP_Graph Reasoning Internal Operations
C.2.1.3 CGHash_Graph and CG_Graph Public Functions
C.2.2 CPE Class Documentation
C.2.2.1 cp_graph Class Reference
C.2.2.2 cghash_graph Class Reference
C.2.2.3 cg_graph Class Reference

D DATA COLLECTED FROM SAMPLE TESTS
D.1 Data Collected for Computing Each Experimental Results Test Set - 53 nodes in KB Graphs
D.2 Error Bar Data - 53 nodes in KB Graphs
D.3 Validation of Correct Projection
D.3.1 11 nodes in KB graphs - Unique Relation Results
D.3.2 13 nodes in KB graphs - Multi-Instances Relation Results

REFERENCES


LIST OF ALGORITHMS

5.1 Π is a General Projection from T to G
5.2 Π Modified as an Injective Projection from T to G
5.3 Notio Projection
5.4 Supporting Projection Routines
5.5 Supporting Projection Routines (Cont1)
5.6 Supporting Projection Routines (Cont2)
5.7 New Projection
5.8 New Maximal Join


LIST OF TABLES

1.1 Brachman and Guarino Classification Levels and Main Features (Adapted from [[45], Figure 6]). . . . . . . . . 9

3.1 Execution Times For Single Element with Set of Size n. . . . . . . . . 78

4.1 Related Problem Classes. . . . . . . . . 86

7.1 KB Single Relation Graph Files. . . . . . . . . 172

7.2 Single Relation: Query Graph Size Run vs Number of Nodes in KB Graphs. . . . . . . . . 178

7.3 Multi-Relation: Query Graph Size Run vs Number of Nodes in KB Graphs. . . . . . . . . 180

7.4 Number of Projections Found: Query Graph Size vs KB Graph Size. . . . . . . . . 202

8.1 Comparison of Four Algorithms. . . . . . . . . 203

C.1 CGP Program Files. . . . . . . . . 245

D.1 Average Data Values for 53 nodes KB with 1000 Graphs. . . . . . . . . 257

D.2 Average Data Values for 53 nodes KB with 2500 Graphs. . . . . . . . . 258

D.3 Average Data Values for 53 nodes KB with 5000 Graphs. . . . . . . . . 259

D.4 Fast/Slow Values for 53 nodes KB with 1000 Graphs. . . . . . . . . 260

D.5 Fast/Slow Values for 53 nodes KB with 2500 Graphs. . . . . . . . . 261


D.6 Fast/Slow Values for 53 nodes KB with 5000 Graphs. . . . . . . . . 261

D.7 Error Bar Data Values for 53 nodes KB with 1000 Graphs. . . . . . . . . 262

D.8 Error Bar Data Values for 53 nodes KB with 2500 Graphs. . . . . . . . . 263

D.9 Error Bar Data Values for 53 nodes KB with 5000 Graphs. . . . . . . . . 263


LIST OF FIGURES

1.1 Levels of Representations. . . . . . . . . 13

1.2 Abstract Data Type (ADT). . . . . . . . . 16

1.3 Unifier U, Projs U −→ G1 and U −→ G2, Unification G is Found (Adapted from [[136], Figure 5]). . . . . . . . . 22

2.1 Time Chart. . . . . . . . . 38

2.2 Logic Example. . . . . . . . . 42

2.3 Meaning Triangle for Symbols, Concepts, and Referents (Based on [[129], Figure 1]). . . . . . . . . 52

2.4 Peirce’s Triadic Relation. . . . . . . . . 52

3.1 A Graph to Illustrate Graph Theory Concepts (Adapted from [[46], Figure 2.9]). . . . . . . . . 56

3.2 A Digraph that is a Bipartite Graph. . . . . . . . . 57

3.3 A Type Hierarchy. . . . . . . . . 59

3.4 An Animal Concept Hierarchy. . . . . . . . . 61

3.5 Support Using a Relation Hierarchy (Based on [[5], Figure 1]). . . . . . . . . 63

3.6 Basic Abstract Conceptual Graph. . . . . . . . . 66

3.7 Basic Abstract Conceptual Graph in Digraph Format that is Bipartite. . . . . . . . . 67

3.8 Basic Conceptual Graph with Actor. . . . . . . . . 69

3.9 Action Function For Basic Actor Graph. . . . . . . . . 69


3.10 Basic Detached Conceptual Graph. . . . . . . . . 72

3.11 Simple Basic Conceptual Graph. . . . . . . . . 73

3.12 Second Concept Type Hierarchy. . . . . . . . . 73

3.13 Simple Restricted Basic Conceptual Graph. . . . . . . . . 74

3.14 Simple Conceptual Graph (SCG). . . . . . . . . 75

4.1 Project (Mp (Q, H) = P) (Adapted from [[92], Figure 3]). . . . . . . . . 82

4.2 Join (MJ (Q, H) = J) (Adapted from [[92], Figure 2]). . . . . . . . . 84

4.3 Query Graph. . . . . . . . . 94

4.4 KB Graph with Type Hierarchy. . . . . . . . . 95

4.5 Projection Results. . . . . . . . . 95

4.6 Join of P1 and P2 Graphs. . . . . . . . . 97

4.7 Common Graph of Basic Graphs. . . . . . . . . 98

4.8 Join of Detached Basic and Simple Basic Graphs. . . . . . . . . 98

6.1 A KL-ONE Diagram of a Simple ‘Blocks-World’ Arch (Based on [[141], Figure 1]). . . . . . . . . 136

6.2 A SNePS Representation of “A on B on a Table” (Based on [[110], Figure 12]). . . . . . . . . 140

6.3 SNAP Semantic Network of “USC in LA, CA” (Based on [[72], Figure 2]). . . . . . . . . 142

6.4 PEIRCE Schema for Age (Based on [[119], Figure 6.5]). . . . . . . . . 144


6.5 Current CP Environment (From [[87], Figure 1, page 322]). . . . . . . . . 152

7.1 Part 1: Example of Blocks World Benchmark File. . . . . . . . . 166

7.2 Part 2: Example of Blocks World Benchmark File. . . . . . . . . 167

7.3 Part 3: Example of Blocks World Benchmark File. . . . . . . . . 169

7.4 Part 4: Example of Blocks World Benchmark File. . . . . . . . . 170

7.5 A Picture of the Benchmark File. . . . . . . . . 171

7.6 5 nodes in KB of 1000 Graphs. . . . . . . . . . . . . . . . . . . . . . . 185

7.7 5 nodes in KB of 2500 Graphs. . . . . . . . . . . . . . . . . . . . . . . 186

7.8 5 nodes in KB of 5000 Graphs. . . . . . . . . . . . . . . . . . . . . . . 186

7.9 11 nodes in KB of 1000 Graphs. . . . . . . . . . . . . . . . . . . . . . 187

7.10 11 nodes in KB of 2500 Graphs. . . . . . . . . . . . . . . . . . . . . . 188

7.11 11 nodes in KB of 5000 Graphs. . . . . . . . . . . . . . . . . . . . . . 189

7.12 21 nodes in KB of 1000 Graphs. . . . . . . . . . . . . . . . . . . . . . 190

7.13 21 nodes in KB of 2500 Graphs. . . . . . . . . . . . . . . . . . . . . . 191

7.14 21 nodes in KB of 5000 Graphs. . . . . . . . . . . . . . . . . . . . . . 191

7.15 31 nodes in KB of 1000 Graphs. . . . . . . . . . . . . . . . . . . . . . 192

7.16 31 nodes in KB of 2500 Graphs. . . . . . . . . . . . . . . . . . . . . . 193

7.17 31 nodes in KB of 5000 Graphs. . . . . . . . . . . . . . . . . . . . . . 194

7.18 53 nodes in KB of 1000 Graphs. . . . . . . . . . . . . . . . . . . . . . 195


7.19 53 nodes in KB of 2500 Graphs. . . . . . . . . . . . . . . . . . . . . . 196

7.20 53 nodes in KB of 5000 Graphs. . . . . . . . . . . . . . . . . . . . . . 196

7.21 73 nodes in KB of 1000 Graphs. . . . . . . . . . . . . . . . . . . . . . 197

7.22 73 nodes in KB of 2500 Graphs. . . . . . . . . . . . . . . . . . . . . . 198

7.23 73 nodes in KB of 5000 Graphs. . . . . . . . . . . . . . . . . . . . . . 199

8.1 Interval Time Relationships. . . . . . . . . 212

8.2 A Simple Time Map. . . . . . . . . 214

8.3 Time Chart for a Bouncing Ball. . . . . . . . . 215

8.4 Conceptual Space Diagram for a Bouncing Ball. . . . . . . . . 216

B.1 The Display Format for ‘A person is between a rock and a hard place.’. . . . . . . . . 244

C.1 Part 1: Example of CGP Program from pCG. . . . . . . . . 246

C.2 Part 2: Example of CGP Program from pCG. . . . . . . . . 247

C.3 Part 3: Example of CGP Program from pCG. . . . . . . . . 248

C.4 Part 4: Example of CGP Program from pCG. . . . . . . . . 249

C.5 Inheritance Diagram for Class ‘cp_graph’. . . . . . . . . 254

D.1 KB for Verifying 3 nodes Query onto 11 nodes KB. . . . . . . . . 265

D.2 Query Graph for Verifying 3 nodes Query onto 11 nodes KB. . . . . . . . . 266

D.3 Projection Verifying 3 nodes Query onto 11 nodes KB. . . . . . . . . 266


D.4 Query Graph for Verifying 5 nodes Query onto 13 nodes KB. . . . . . . . . 267

D.5 KB for Verifying 5 nodes Query onto 13 nodes KB. . . . . . . . . 268

D.6 Projections Verifying 5 nodes Query onto 13 nodes KB. . . . . . . . . 269


CHAPTER 1

INTRODUCTION

Knowledge representation (KR) is used to store and retrieve meaningful information that cannot be stored directly in a computer. To address this, this work develops a series of levels of representation that transform knowledge into a format that a computer can use to process this information. This transformed knowledge is saved using dynamic data structures that are suitable for the style of KR being implemented, and through the KR the system manipulates the knowledge in the data using reasoning operations.

The data structure used, together with the contents of the transformed knowledge, is called the knowledge base (KB). An algorithm and its associated data structure make up a reasoning operation, and the performance of this operation is dependent on the associated KB. In this work, the basic reasoning operations for knowledge management will be explored using a particular style of KR called Conceptual Graphs (CGs). These operations, projection and maximal join, are the foundation for query/answer and hypothesis generation (abduction) systems, respectively. It will be shown that changing the reasoning operation’s algorithm and providing adequate data structures for it can improve the implementation of the operation for use in an intelligent system, making it faster and more efficient. The execution times of different algorithms and data structures are analyzed over the most general form of CG knowledge base, showing that flexible, fast and efficient operations can improve a higher level system.
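To make the relationship between a KB and a reasoning operation more concrete, the following minimal Python sketch stores each graph as a dictionary of typed concepts plus a list of binary relations, and performs a naive projection by backtracking. It is an illustration only and is not one of the algorithms studied in this work: the names `SUPERTYPE`, `is_subtype` and `projections`, the sample type hierarchy, and the restriction to binary relations with no relation hierarchy are all assumptions made for the example.

```python
# Hypothetical concept type hierarchy: child type -> parent type.
SUPERTYPE = {"Cat": "Animal", "Dog": "Animal", "Animal": "Entity", "Mat": "Entity"}


def is_subtype(t, ancestor):
    """Return True if type t equals ancestor or lies below it in SUPERTYPE."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE.get(t)
    return False


def projections(query, graph):
    """Yield every mapping of query concept ids onto graph concept ids.

    A mapping is kept when each query relation lands on a graph relation
    with the same name and each mapped concept has the same type as, or a
    subtype of, the query concept. Query concepts that take part in no
    relation are ignored in this sketch.
    """
    q_rel, q_con, g_con = query["relations"], query["concepts"], graph["concepts"]

    def extend(i, binding):
        if i == len(q_rel):
            yield dict(binding)
            return
        name, qs, qd = q_rel[i]
        for g_name, gs, gd in graph["relations"]:
            if g_name != name:
                continue
            if qs == qd and gs != gd:
                continue  # a query self-loop must map onto a self-loop
            if not (is_subtype(g_con[gs], q_con[qs]) and is_subtype(g_con[gd], q_con[qd])):
                continue
            # Reject candidates that contradict concepts already bound.
            if binding.get(qs, gs) != gs or binding.get(qd, gd) != gd:
                continue
            yield from extend(i + 1, {**binding, qs: gs, qd: gd})

    yield from extend(0, {})


# A one-graph KB ("a cat on a mat") and a more general query.
kb_graph = {"concepts": {1: "Cat", 2: "Mat"}, "relations": [("on", 1, 2)]}
query = {"concepts": {"x": "Animal", "y": "Entity"}, "relations": [("on", "x", "y")]}
print(list(projections(query, kb_graph)))  # prints [{'x': 1, 'y': 2}]
```

Even in this toy form, the cost of the operation is visibly tied to the data structure: every query relation scans the whole relation list of the graph, so a different organization of the stored relations would change the running time without changing the result.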

1.1 Knowledge and Knowledge Representation

Artificial Intelligence (AI) emerged in the 1960’s, and can be characterized as the process of describing a problem in such a way that a machine could find a solution. AI uses general reasoning techniques that develop along the lines believed to be used by an intelligent human [12, 65, 106]. AI systems, therefore, need to represent knowledge in the computer so that these reasoning techniques can be applied to the problem. First, consider what knowledge is and then how to represent it to the computer. According to the on-line dictionaries, knowledge is “the range of one’s information or understanding; the circumstance or condition of apprehending truth or fact through reasoning; the fact or state of knowing; the perception of fact or truth; clear and certain mental apprehension” [60, 59]. However, there are two types of knowledge that human beings deal with every day: 1) knowledge that defines ideas or concepts and their relationships [120], and 2) knowledge that gives understanding to time, space, or constraints in connection to these definitions [3, 26, 4]. So knowledge allows us to have a definition or understanding of the events and acts around us; knowledge allows us to describe our world. Second, for the computer the description of the problem that it is to solve has become known as knowledge representation. The representation consists of a set of syntactic and semantic rules to describe a problem domain [1]. Given that syntax studies the grammar rules for expressing the arrangement of symbols [119], and semantics “is the scientific study of the relations between signs or symbols and what they denote or mean” [[139] page 41], knowledge representation, when abstractly described, may appear very informal and without concrete structure. This seems informal because the syntactic rules perform symbol manipulation, while the semantic rules define a mapping that gives an interpretation of the representation in terms of another representation.

The term “semantics” (meaning) has come to be associated with many different types of processing of relationships. Two key relationship types (discussed as links in [139]) are 1) structural links, which set up parts of propositions and are definitional relationships within a network of concepts, and 2) assertion links, which assert something about the world and are basic relations that hold between concepts (e.g. part-of, a-kind-of, etc.). Structural links give the definition of knowledge, whereas assertion links define facts. However, the processing of each of these links does not imply semantic meaning. Meaning can be defined in terms of axioms of basic propositions, or truth maintenance with correctness of assertion [82, 71, 123]. Semantic interpretations and procedural semantics are used in determining these meanings [139]. One misuse of the term semantics is in the area of semantic inferences [139]. Semantic inferences refer to inferences that cross the boundary between symbol and referent; however, not all steps of the process are semantic. If a step of the process involves parsing or processing a structural link, then that step is a syntactic operation.


Many knowledge representations used by computers have been developed, including semantic networks, logic, frames, and rule-based representations. Within AI, knowledge representations have been built into different working applications, some of which are referred to as software information systems [134].

Knowledge representation systems are built to help find solutions to problems. Many times the knowledge representation, KR, is broken into both a processing language and a knowledge base, KB (see Section 1.2.3 for discussion), that has special data structures and operations that process the data. Some systems address only particular problem domains, e.g. neural networks for pattern matching, while other systems attempt to process large amounts of diverse data, e.g. the CYC KB from Cycorp [62]. Also, historically, many KRs and KBs work as standalone systems, while newer systems are being constructed as a group of modules, each handling a specific aspect of the problem solving process [124]. Sometimes these are actually different modules within a single system [87]; others are designed as agents in a multi-agent environment [31].

1.1.1 Representation Levels

For Newell, intelligent systems (AI systems) need both a symbol and a knowledge level to perform reasoning [80]. The symbol level is where representations of knowledge are processed. This is the level where the data structures are defined and acted upon. The knowledge level has no physical structure, only a general functional equation for knowledge. The symbol or program level is where the physical structure or environment is defined for the knowledge level. Within the symbol level, computational mechanisms are defined for the environment of knowledge.

Some of the confusion in the field of knowledge representations, and in particular semantic networks, concerns which rules, syntactic or semantic, are defined at each of these levels of representation. In many readings, it is not made clear what knowledge can be processed directly by the computer as machine code representation, and what must be transformed (mapped) into another representation level. It should be noted that, in general, abstract representations are too informal for machine processing. Therefore, most knowledge representations must be translated to a more concrete representation in order to be coded for the machine, and for execution and analysis to be performed by the computer.

Back in 1971, Shapiro [109] attempted to divide all representations defined by

semantic networks into the following two levels:

• item - conceptual level of a semantic network.

• system - structural level of interconnection that ties the structured assertions of

facts represented in the network to items participating in those facts.

This levelization only looks at the actual semantic network represented on the page. It does not consider the semantics defined by the network or how this knowledge representation would be coded for machine processing. The item level is concerned with the nodes that appear in the network. These nodes are both concepts and relations, and have some definition represented within the semantic network. The system level, according to Shapiro, attempts to define the links that are present between the nodes in the network.

In 1979, Brachman [11] tried to address the confusion about representations of knowledge by defining levels for different types of semantic network representations. In this way, Brachman was describing one representation in terms of another. When levels are defined by other levels and representations are defined by other representations, confusion [12] is produced in the field of knowledge representations. When knowledge representations have this interpretation, one can see it as a “levelization” of representations. Historically, this levelization of representations has been looked at mainly when discussing the specific knowledge representation scheme known as semantic networks [123], but it could be applied to most representation schemes. In his paper, Brachman defines a “level” as a distinctive type of node or link. These are conceptual levels, and a network’s notation can be analyzed in terms of any of these levels. The levels are the following:

1. implementation level - a network is only a data structure.

2. logical level - in a network, links represent logical relationships such as:

• ∀ (for all)

• ∃ (there exists)

• ¬ (not)

• ∧ (and)

• ∨ (or)

• → (implication)

• ≡ (if and only if)

3. epistemological level (Brachman’s missing level) - in a network, links give formal structure to conceptual units and create a set of their interrelationships as conceptual units.

4. conceptual level - in a network, links represent semantic or conceptual relationships.

5. linguistic level - in a network, primitive elements are language-specific and links stand for arbitrary relationships that exist in the world.

Brachman’s levels are defined types of network nodes and links. He states:

“It should be clear, then, that one of the main problems with many of

the older formalisms was their lack of a clear notion of what level they

were designed for” [[11] page 32].


For the five levels given above, Brachman saw the implementation level as the lowest level; that is, the most basic type of network. This level only has data structures associated with it; there are really no semantics related to the network. The logical level is seen as needing the semantics of the basic logical operators. The conceptual level is similar to Shapiro’s item level discussed above. However, this level defines the semantics of the concepts being included within the level. The linguistic level is very abstract and is used to define, for the network, a level that has an open concept.

The epistemological level is seen by Brachman as a missing level, located between the logical level and the conceptual level. Brachman then uses all these levels to define semantic networks in terms of cases (or roles) with slots (or sets of fillers), by looking at the types of links needed when processing the network. Currently, this type of representation is known as “frames” and in some circles is a knowledge representation in its own right (see Section 6.1.1 for a discussion of a Frame system).

On evaluation of the main feature of the epistemological level, this author would place it between the conceptual and linguistic levels. The reasoning behind the move is that it is similar to the system level discussed by Shapiro, and is very much concerned with the interrelationships of concepts and conceptual units. Guarino, like Brachman, also saw missing information in the levels, but instead of moving a level he argues for adding an ontological level to Brachman’s classification levels. The ontological level gives a foundation for the knowledge engineering process and depicts a set of features for the computational properties of each level (see Table 1.1) [45]. The ontological level, in Guarino’s eyes, should be introduced between the epistemological and conceptual levels; it is neutral with respect to the epistemological level, although not every epistemological formalism is necessarily adequate. For Brachman and Guarino, all the levels are processed as part of the knowledge representation.

Table 1.1: Brachman and Guarino Classification Levels and Main Features (Adapted from [[45], Figure 6]).

Level            Primitive concepts           Main feature        Interpretation
Implementation   are pointers                 Concrete            Objective
Logical          are predicates               Formalization       Arbitrary
Epistemological  are structuring primitives   Structure           Arbitrary
Ontological      satisfy meaning postulates   Meaning             Constrained
Conceptual       are cognitive primitives     Conceptualization   Subjective
Linguistic       are linguistic primitives    Language            Subjective

Brachman did not try to look at processing representations from a computer processing point of view. Then in 1982, Newell [80] began the redefinition of a “level” from this new point of view. He defined a level in the following way:

“a level consists of a medium that is to be processed, components that provide primitive processing, laws of composition that permit components to be assembled into systems, and laws of behavior that determine how system behavior depends on the component behavior and the structure of the system” [[80] page 92].

Newell referred to computer systems levels as going through the following bottom (highest) to top (lowest) sequence:

• device level

• circuit level

• logic level (sub-levels - combinatorial and sequential circuits)

• register-transfer level and symbol (program) level

• configuration level

• knowledge level (new level)

As a third sibling just below the configuration level, Newell added a new level known as the knowledge level. For each of the levels, the following aspects need to be defined: the medium, the components, the assembly of the components into a system, the composition laws and the behavior laws. In looking at knowledge and representation, Newell’s symbol level and knowledge level are the most important. Each of these levels has been defined according to the above aspects. The medium for the symbol level is symbols and/or expressions. The components include memories and operations. The components are assembled into systems known as computer systems. In the aspect of laws, composition is built on designation and/or association, while behavior is sequential interpretation. When looking at the knowledge level, the medium is knowledge. The components are goals, actions and bodies (physical code). The composition laws are a set of actions, a set of goals and a body, code, for the system that is referred to as the agent. Lastly, the behavior law is the principle of rationality: “Actions are selected to attain the agent’s goals”. This principle provides a general functional equation for the knowledge medium to act on. However, the agent is very abstract and has no real physical structure. The medium definition shows that knowledge is very open and has a potential for generating an action. The knowledge level is an approximation and there are no guarantees on the system’s behavior.
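The knowledge-level description above can be sketched in a few lines of Python. This is only an illustrative reading of the principle of rationality, not Newell’s formalism: the class name `Agent`, the encoding of goals as a set of desired states, and the encoding of knowledge as a table from action names to the states each action is believed to achieve are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """A knowledge-level agent: goals plus actions, with no commitment to a body."""

    goals: set      # states the agent wants to bring about
    actions: dict   # action name -> set of states the action is believed to achieve

    def select(self):
        """Principle of rationality: keep the actions believed to attain a goal.

        The result is a list of candidates rather than one action, which
        reflects that the knowledge level only approximates behavior.
        """
        return sorted(name for name, effects in self.actions.items()
                      if effects & self.goals)


agent = Agent(goals={"door_open"},
              actions={"push": {"door_open"}, "wait": set()})
print(agent.select())  # prints ['push']
```

How the selected action is actually carried out is left open here, which is the role the symbol level plays in the discussion above.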

In 2002, the Object Management Group (OMG), dealing with relating legacy systems to business modeling, continued to define a Model Driven Architecture (MDA) [114, 112]. Within this architecture, the modeling space transforms into the code space, where the representation of the business process/rules transforms all the way to the code to be deployed. MDA-enabled tools do this transformation using a set of levels to move from the business model, through an intermediate level that represents the aspects of the model that need to be coded, on to the actual generation of the code. Even though this is discussed in terms of an architecture instead of a knowledge representation, the transitioning of representation from an abstract level to a concrete code level still applies.


This work expands on Newell’s computer processing level idea, in particular investigating what could be the possible computational mechanisms or physical structures of the symbol level (representations), while seeing level relationships more from the point of view of Brachman’s definition [11]. This work defines level as:

“There is a level of processing of representations that sees the lowest level to be a very abstract representation and then, as levels increase, the representation becomes more concrete or machine like.”

The highest level of representation would then be processed directly by a computer (see Figure 1.1) because it is the actual implementation that is compiled or interpreted as machine code.

When one discusses semantic networks, it is not clear what rules, syntactic or semantic, are defined at each of these processing levels of representation. In many readings, it is not indicated clearly what rules can or should be processed directly by the computer at each knowledge representation level. In general, abstract representations are too informal for machine processing and need to be translated to another, more concrete representation. Therefore, for all forms of knowledge representation, translation to a more concrete representation allows coding, and later execution and analysis, to be performed with the computer.

Therefore, now consider representation in an AI system to be a series of these processing levels. Encapsulating the knowledge representation (KR) is the level of ontological information [81], level 0. This level would be considered the knowledge level under Newell’s levels, part of the linguistic level for Brachman, and would be a relocation of Guarino’s ontological level. The information represented is not actually part of the structure of the domain knowledge and is the most abstract of all the levels of representation and implementation. In fact, it is more of a hierarchy of conceptual information than knowledge, so it will be called “ontology” (see Section 2.1).

[Figure 1.1: Levels of Representations. The diagram shows nested levels: Level 0 (Ontology), Level 1 (Knowledge Representation), Level 2 (Internal Representation, Declaring ADT), Level 3 (Defining Representation, Defining ADT), Level 4 (Storing Representation, Implementing ADT).]

This ontology level contains more general information than what is found within the KR [61]; it might also contain any meta data that needs to be stored for the knowledge representation. Within the ontology level any particular system may use an abstract hierarchy. These hierarchies define relationships between the conceptual units within the knowledge representation and information outside the KR, such as group membership. Therefore, defined hierarchies are to be considered part of level 0 in our representation levels.

KR will start processing at level 1. It should be noted that the semantics at this level are declarative and/or procedural in terms of their interpretation to a second representation, and therefore are not concrete. For Newell this level would be part of the symbol level, very close to the knowledge level. This is where the representation of the knowledge medium would begin. In Brachman’s levels this would encompass part of the conceptual level and all of the epistemological level. The epistemological level as defined clearly should be placed between the linguistic level and conceptual level, as opposed to where Brachman placed it in his work [11].

The second level of representation, level 2, is an internal representation that could be viewed as a virtual machine. When comparing this level to the MDA architecture, this would be the platform-independent modeling level. Within the representation of KR, this is where the declaration of an abstract data type (ADT) is performed (see Section 1.1.2). This syntactic representation is more formal and can be used in the definition and implementation of the declared ADT. The syntactic rules are concrete and define a mapping of symbols to operators. However, in order to implement the ADT declared by this level of representation, there must be a third level of definitions giving more structure to the representation.

This level, level 3, consists of the actual semantic definition of the ADT declared

in level 2. The semantic rules are also concrete, and define a mapping of operations to

functions. This representation level can be used to implement code for the computer to

store and retrieve knowledge. It defines the algorithms to be performed, and theoretical

time/space analysis can be performed on these algorithms. There is a strong connection

between level 2 and level 3 because the concrete rules of the representation in level 2

will work over the algorithms of level 3 during the implementation of the data structures

at the next level.

The innermost level of representation, level 4, is the actual implementation of

the ADT definition and the implementation within the MDA architecture. This level is

where all the data structures come together. It is at this level that a computer programming

language (see Appendix A), such as C, Prolog, Lisp, or a newly defined language,

is chosen [134]. This is also the level at which the coding of data structures and algorithms

is performed, and any time/space analysis is done. This representation is the most

concrete. Level 4 is the representation where the domain knowledge being worked on

will tie into the computer language used for the implementation.

1.1.2 Speed and Efficiency in Processing

An abstract data type (ADT) (see Figure 1.2) can be broken down into two

parts: 1) specification and 2) implementation. The specification, which is abstract,

includes the definition of data types, including their structure and values, and supporting

operations for those data types; this half of an ADT will be referred to as a data model.

[Figure: an ADT divides into a SPECIFICATION (abstract: set of values, set of operations) and an IMPLEMENTATION (concrete: data representation, algorithm code bodies).]

Figure 1.2: Abstract Data Type (ADT).

The data model provides a mapping from general knowledge to the abstract

element of the ADT. The implementation, which is concrete, contains the data repre-

sentation used by the algorithms and algorithmic code bodies of the operations. This is

the second half of the ADT, and connects the knowledge to the algorithms being used

for implementation.
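The two halves of an ADT can be sketched in code. The following is a minimal illustration only, not the dissertation's implementation: the class names `ConceptStore`, `ListConceptStore` and `HashConceptStore` are hypothetical, and Python is assumed purely for demonstration. The abstract class plays the role of the specification (data model), and each concrete class supplies a data representation and code bodies.

```python
from abc import ABC, abstractmethod


class ConceptStore(ABC):
    """Specification (abstract half): the operations the ADT supports."""

    @abstractmethod
    def add(self, concept):
        """Insert a concept label into the store."""

    @abstractmethod
    def contains(self, concept):
        """Report whether a concept label is in the store."""


class ListConceptStore(ConceptStore):
    """Implementation (concrete half) using an unordered list."""

    def __init__(self):
        self._items = []  # data representation

    def add(self, concept):
        if concept not in self._items:
            self._items.append(concept)

    def contains(self, concept):
        return concept in self._items  # code body: linear scan


class HashConceptStore(ConceptStore):
    """A second implementation of the same specification using a hash set."""

    def __init__(self):
        self._items = set()  # data representation

    def add(self, concept):
        self._items.add(concept)

    def contains(self, concept):
        return concept in self._items  # code body: hashed lookup


for store in (ListConceptStore(), HashConceptStore()):
    store.add("Person")
    print(type(store).__name__, store.contains("Person"), store.contains("Girl"))
```

Both implementations satisfy the same specification, so code written against `ConceptStore` does not change when the data representation is swapped.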

Many issues come into play when defining an ADT for a processor. Probably

one of the most important is the efficiency of data structures when implementing the operations.

When examining the data in terms of their time and space requirements, some

data structures are better than others. Well-designed and well-defined data structures

can certainly help in these respects, whereas poorly defined structures lead to inefficiencies.

The data structures directly affect the efficiency of both aspects of the ADT

because the data model deals with the data types, while the data representation is part

of the implementation. Modification of the data structures to give faster access times

to the data types and representations can help in the efficiency of the algorithms being

implemented for the operations to be performed. In this way the algorithms are being

implemented to work towards their best possible execution time, whereas, if the data

structures are not optimized there will be a higher probability that the worst possible

execution time is seen.
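As a small, hedged illustration of how the choice of data structure changes the work an operation performs, the sketch below counts comparisons for the same membership query over an unordered list and over a sorted list. The function names and the sample labels are assumptions made for this example only; they do not come from the systems discussed in this work.

```python
def linear_lookup(items, key):
    """Search an unordered list; return (found, comparisons made)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == key:
            return True, comparisons
    return False, comparisons


def binary_lookup(sorted_items, key):
    """Search a sorted list by halving the range; return (found, comparisons)."""
    lo, hi = 0, len(sorted_items) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_items[mid] == key:
            return True, comparisons
        if sorted_items[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, comparisons


# 1024 zero-padded labels, already in sorted order.
labels = [f"concept{i:04d}" for i in range(1024)]
missing = "concept9999"

print(linear_lookup(labels, missing))  # scans every element before failing
print(binary_lookup(labels, missing))  # needs only about log2(1024) steps
```

The query and the answer are identical in both cases; only the organization of the stored data differs, which is the kind of modification this work examines.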

One important aspect in processing the underlying knowledge represented is

how to communicate that knowledge to other systems and applications. This may affect

the speed and efficiency of the implemented data storage.

1.2 Foundational Information

The following sections (subgraph isomorphism, unification, and data/knowledge

base) give some foundational information as building blocks for working with the reasoning

operations that are the basis of this work.

1.2.1 Basis of Subgraph Isomorphism

When dealing with basic graph operations (see Section 3.1 for definitions), the

efficiency of some of the algorithms has been investigated by many researchers. In

looking at these algorithms and their efficiency, it is important to understand relation-

ships between the different complexity classes that they may fall into:

$P \Longrightarrow NP \Longrightarrow \text{NP-Complete} \Longrightarrow \text{NP-Hard}$

P: problems that can be solved in polynomial time; NP: as used in this discussion, problems that are in NP but not

known to be either in P or NP-Complete; NP-Complete: decision problems in NP to which

every problem in NP can be reduced in polynomial time; and NP-Hard: problems that are at

least as hard as any NP-Complete problem, but that need not be decision questions and so need not

themselves be in NP.

At the core of graph isomorphism (see Section 4.2.1 for full definition and example),

the problem is to find a mapping, f, of graph G to graph H, such that G and

H are identical. Discovering if two graphs are isomorphic is not known to be an NP-Complete

or P problem [42]. It is defined to be in the complexity class between P and

NP given that P ≠ NP. For this discussion, it will be called the class ‘NP’.

However, in most cases involving reasoning operations, given graphs G and H,

a more important question than whether they are identical is whether a small pattern

graph in G, a subgraph, is isomorphic to H. This is known as subgraph isomorphism.

Because this question can be restricted to the well-known NP-complete problem

CLIQUE (see page 64 of Garey and Johnson [42]) by allowing only instances for which

H is a complete graph, it is known to be an NP-complete problem [42].
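The restriction to CLIQUE can be made concrete with a brute-force sketch. The code below is an illustrative exhaustive backtracking search, assumed here for explanation only; it is not one of the algorithms analyzed in this dissertation. Graphs are undirected and given as dictionaries from a vertex to its set of neighbours; the names `find_subgraph_match` and `complete_graph` are hypothetical. Asking whether a complete graph on k vertices occurs in the target is exactly the CLIQUE question.

```python
def find_subgraph_match(pattern, target):
    """Return a dict mapping pattern vertices to target vertices, or None.

    Every pattern edge must map onto a target edge. The search is
    exhaustive, so its worst case grows exponentially with graph size.
    """
    p_vertices = list(pattern)

    def extend(mapping):
        if len(mapping) == len(p_vertices):
            return dict(mapping)
        v = p_vertices[len(mapping)]
        for candidate in target:
            if candidate in mapping.values():
                continue  # keep the mapping one-to-one
            # every already-mapped neighbour of v must be adjacent to candidate
            if all(mapping[u] in target[candidate]
                   for u in pattern[v] if u in mapping):
                mapping[v] = candidate
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[v]  # backtrack
        return None

    return extend({})


def complete_graph(k):
    """Complete graph on vertices 0..k-1."""
    return {i: {j for j in range(k) if j != i} for i in range(k)}


# Target: a triangle a-b-c with a pendant vertex d attached to c.
target = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

print(find_subgraph_match(complete_graph(3), target))  # a 3-clique exists
print(find_subgraph_match(complete_graph(4), target))  # no 4-clique
```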

When sub-problems (“special cases”) of the subgraph isomorphic question are

analyzed, some are found to be solvable in polynomial time. One of these sub-problems

is subtree isomorphism [42]; this is when both G and H are trees (a graph, G1, is a tree

if and only if every two distinct vertices of G1 are connected by a unique path of G1

[Theorem 3.4 on page 69 of [13]]). A polynomial time algorithm for this sub-problem

was shown by Reyner [103].
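Whether an input graph qualifies for the subtree special case can be checked cheaply. The sketch below is an assumption-laden illustration rather than part of Reyner's algorithm: it relies on the standard equivalent characterization that an undirected graph is a tree exactly when it is connected and has one fewer edge than it has vertices, which is the same as every pair of distinct vertices being joined by a unique path. The function name `is_tree` is hypothetical.

```python
def is_tree(graph):
    """graph: {vertex: set_of_neighbours}, undirected, no self-loops."""
    if not graph:
        return False
    # each undirected edge appears in two neighbour sets
    edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
    if edge_count != len(graph) - 1:
        return False
    # depth-first search to confirm that the graph is connected
    start = next(iter(graph))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in graph[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(graph)


path = {1: {2}, 2: {1, 3}, 3: {2}}
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(is_tree(path), is_tree(triangle))
```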

When labels are added to graphs, such as in bipartite graphs, these can be factored

into an isomorphism algorithm. The two label types can significantly speed up

subgraph matching by allowing a pruning of some possibilities through separating the

vertices into two groups.
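The pruning effect of the two vertex groups can be sketched as follows. This is an illustrative assumption, not the matching procedure of any system discussed here: each vertex carries a kind, either "concept" or "relation", and a pattern vertex is only ever compared with target vertices of the same kind. The vertex names are borrowed loosely from conceptual graph examples and the function name `candidate_targets` is hypothetical.

```python
def candidate_targets(pattern_kinds, target_kinds):
    """Map each pattern vertex to the target vertices sharing its kind."""
    return {
        p: [t for t, kind in target_kinds.items() if kind == p_kind]
        for p, p_kind in pattern_kinds.items()
    }


pattern_kinds = {"Person": "concept", "Name": "relation", "Word": "concept"}
target_kinds = {
    "Man": "concept",
    "Girl": "concept",
    "Word": "concept",
    "Name": "relation",
    "Child": "relation",
}

candidates = candidate_targets(pattern_kinds, target_kinds)
unpruned = len(pattern_kinds) * len(target_kinds)       # every pair considered
pruned = sum(len(c) for c in candidates.values())       # same-kind pairs only
print(unpruned, pruned)
```

Only same-kind pairs survive, so a backtracking matcher starts from a smaller set of possibilities before any edges are examined.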

Another sub-problem that produces a polynomial time algorithm, besides just

labeling the vertices, is to require that a class or group of the vertices may only have a

specific number of edges [2], such as in feature term graphs (see Section 3.4.3). This

process constrains the problem to bring it into polynomial space.

It should be mentioned that all of these sub-problems concern two graphs,

and that their running times are considered in terms of the number of vertices in the graphs,

n [68].

1.2.2 Overview of Unification/Matching

As was discussed in Martelli [67], “unification was first introduced by Robinson

[104] as the central step of the inference rule called resolution.” Resolution became a

single rule that could replace all the axioms and inference rules of first-order predicate

calculus and be used in designing mechanical theorem provers. Unification can be

expressed in the following way: given two terms containing some variables, find, if one

exists, the simplest substitution (assignment of some term to every variable)

which makes the two terms equal. This substitution becomes a matching of the terms

based on variable binding assignments and therefore is a unifier. There may be many

ways to unify a pair of terms, but there will be at most one most general unifier, MGU;

the other unifiers add extra bindings for sub-terms which are variables in the original

terms. If a unifier, U, is the MGU of a set of expressions, then any other unifier, V, can

be expressed as V = UW, where W is another substitution.
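A compact sketch of first-order unification may help fix these definitions. It is a simple recursive formulation assumed for illustration, not the linear-time algorithms cited in this section. The term encoding is an assumption of this example: a variable is a string beginning with "?", a constant is any other string, and a compound term is a tuple whose first element is the function symbol. The helper names `walk`, `occurs`, `unify` and `substitute` are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def walk(t, subst):
    """Follow variable bindings until reaching an unbound variable or a term."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(var, t, subst):
    """True if var appears inside t once current bindings are followed."""
    t = walk(t, subst)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, arg, subst) for arg in t[1:])
    return False


def unify(a, b, subst=None):
    """Return a most general unifier as a dict of bindings, or None."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # clash of constants or function symbols


def substitute(t, subst):
    """Apply the bindings in subst throughout term t."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(arg, subst) for arg in t[1:])
    return t


left = ("name", "?p", ("word", "?x"))
right = ("name", "man", ("word", "smith"))
mgu = unify(left, right)
print(mgu)
print(substitute(left, mgu) == substitute(right, mgu))
```

The unifier returned binds only the variables that must be bound to make the two terms equal, which is what makes it most general.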

As discussed in Myaeng and Lopez-Lopez [77], graph matching has been recognized

as a central problem across many application areas. Many researchers have

attempted to reduce the computational complexity by developing application-specific

matches [78, 79]. As discussed above, while the general subgraph isomorphism prob-

lem is known to be NP-complete, matching graphs containing conceptual information

appears to be computationally tractable [77]. This is because conceptual graphs are

connected (acyclic or cyclic), bipartite (can be separated into two distinct groups) and

directed (finite), and, for the reasons given in Section 1.2.1, these properties ease the general subgraph

isomorphism problem. However, adding labels to the conceptual information [77] is

essential in extending the plain graphs to be tractable.

If graph matching is reduced to a unification problem, then one should be able

to check a set $U$ of finite terms over a set of function symbols and a countable set of

variables, where there is defined a finite set of pairs of terms, $\{\langle u_i, v_i \rangle \mid i \in I\}$. The

question is now to determine if there exists a substitution $\sigma = \{(x_j \rightarrow t_j) \mid t_j \in U,\ j > 0\}$

such that $\sigma(u_i) = \sigma(v_i)$ for $i \in I$ [84]. However, most unification algorithms that can

be done in linear time require that the graph is acyclic [84]. The reason the graphs cannot

contain cycles is the occurs check. This is a feature of implementations

of unification which causes substitution to fail if the structure S being unified against

contains the variable, V, being substituted [133]. If the occurs check is not evaluated, then

unsound inference could occur. Some implementations could go into an infinite loop

if a cycle appears in the structure; therefore, it is disallowed [84].
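The role of the occurs check can be shown with a small sketch. The encoding is an assumption of this example (variables are strings beginning with "?", compound terms are tuples headed by a function symbol), and the names `occurs`, `bind` and `expand` are hypothetical. With the check enabled, binding a variable to a structure that contains it is refused; with the check disabled, the binding is cyclic, and expanding it would never finish without an artificial depth limit.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def occurs(var, term):
    """True if variable var appears anywhere inside term."""
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, arg) for arg in term[1:])
    return False


def bind(var, term, occurs_check=True):
    """Bind var to term, refusing a cyclic binding when the check is on."""
    if occurs_check and occurs(var, term):
        return None
    return {var: term}


def expand(term, subst, max_depth):
    """Apply subst repeatedly, giving up after max_depth variable expansions."""
    if max_depth == 0:
        return "..."  # placeholder where expansion was cut off
    if is_var(term) and term in subst:
        return expand(subst[term], subst, max_depth - 1)
    if isinstance(term, tuple):
        return (term[0],) + tuple(expand(a, subst, max_depth) for a in term[1:])
    return term


print(bind("?x", ("f", "?x")))                    # refused by the occurs check
cyclic = bind("?x", ("f", "?x"), occurs_check=False)
print(expand("?x", cyclic, 3))                    # expansion must be cut off
```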

However, if the relationship is functional as in feature term graphs (see Section

3.4.3), then the unification can be performed even with cycles. Figure 1.3, from Willems’

paper on projection and unification with conceptual graphs [136], shows the unification

of two projections into a single graph even when one graph contains a cycle. Figure

1.3 will be explained more fully in Section 4.3.2.

U:  [Person:*y] -> (Name) -> [Word:*x].

G1: [Man] ->
      (Name) -> [Word:*x].
      (Child) -> [Girl:*y] -> (Name) -> [Word:*x].

G2: [Person:*y] -> (Name) -> [Word:*x ‘Smith’].

G:  [Man] ->
      (Name) -> [Word:*x ‘Smith’].
      (Child) -> [Girl:*y] -> (Name) -> [Word:*x ‘Smith’].

Figure 1.3: Unifier U, Projections U −→ G1 and U −→ G2, Unification G is Found (Adapted from [[136], Figure 5]).

1.2.3 Database vs Knowledge Base

As defined by Wikipedia: “a database is a collection of records stored in a

computer.” These records are fields of data that contain information that is queried

to answer questions and make decisions. This data is stored in files by records. A

“database management system” is used to access and query the fields and records of

the data. However, databases can only retrieve data that is explicitly stored

in their structures. No information that is not factual can be retrieved.

A knowledge base is like a database, but it contains more than fields and records

of data. Most knowledge bases also contain some kind of inference engine that uses

reasoning operations over the structure of the data stored in the records to infer more

information. The knowledge base, as described by Tappan, “operates over a framework

of objects, properties, and relations towards the goal of supporting reasoning” [128].

The framework will be different depending on the “goal” of the domain of knowledge.

Knowledge bases, like databases, use a management system, but this system adds structural

information to the data (many times called meta-data) to help in the discovery

of additional information. This meta-data gives organization to the data and allows

the deduction of contextual knowledge from implied semantics of the inference engine

[74, 128]. The new contextual knowledge deduced by implicit semantics may or may

not be factual, but can be presented to the user of the knowledge base to see if it should

be added to the stored data information. Then this new data can be used to answer

queries and help in making decisions.
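The contrast between retrieval and inference can be sketched in a few lines. The example is a deliberately tiny assumption: the facts, the relation names and the single hand-written rule are all hypothetical, and the rule stands in for what a real inference engine would do over far richer structures.

```python
# Explicitly stored records, as a database would hold them.
facts = {("parent", "Ann", "Bob"), ("parent", "Bob", "Cy")}


def database_query(relation, stored):
    """Retrieve only the records that are explicitly stored."""
    return {f for f in stored if f[0] == relation}


def infer_grandparents(stored):
    """One inference rule: a parent of a parent is a grandparent."""
    parents = [(a, b) for rel, a, b in stored if rel == "parent"]
    return {
        ("grandparent", a, c)
        for a, b in parents
        for b2, c in parents
        if b == b2
    }


print(database_query("grandparent", facts))  # nothing stored, nothing retrieved
print(infer_grandparents(facts))             # knowledge derived from structure
```

The derived tuple could then be offered to the user and, if accepted, added to the stored data so that later queries can retrieve it directly.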

In the future, it is hoped that more and more users will use knowledge bases

over databases; however, the speed of retrieval from a knowledge base is slower than

a database because of the added structural information and because of the built-in inference

engine. In this work, some of the advances that have been found in database

algorithms and data structures will be applied to knowledge bases, in hope of improving

some of those retrieval speed problems.

1.3 Organization of Dissertation

This work will begin by looking at ontology, knowledge and representation in

Chapter 2. This will include looking at how ontology can be processed, knowledge

types and operation, and moving knowledge through different types of representations.

Above the outermost level of representation as seen in Figure 1.1 (one could describe

this level as being the top) would be a zero level of macro information. The informa-

tion represented is not actually part of the domain knowledge and is even more abstract

than the first level of representation. In fact, it is more of a hierarchy of conceptual

information than knowledge, so it will be referred to as abstract hierarchies; this zero level

is discussed in Section 2.1.1. Different ADT representations of knowledge are used

for implementing a semantic networks KR. These internal representations (see Section

2.3.2) use different formal approaches for syntactic processing, such as: 1) proposi-

tional logic, 2) predicate calculus, and 3) graph grammar (with set theory). Some of

the representations of knowledge used by semantic networks will be discussed in more

detail in Section 2.3.1.3. However, within a propositional logic approach, propositions

and logical operators use arbitrary conceptual units and expression links to define nodes

and arcs with semantic descriptions and context. Predicate calculus is built on top of

a propositional approach and also incorporates the use of predicates with quantification

over variables. A graph grammar or set theoretic approach not only is built

above the predicate calculus and quantified variables approach, but also uses primitive

objects and actions with procedural operators to help define the semantics of the network.

Graph grammars built on top of graph theory instead of set theory also give a

visual representation which can be more expressive (see Section 3.1). Several different

types of knowledge representations were originally investigated, but a detailed example

(see Section 2.3.1.3) will be given for semantic networks.

Chapter 3 gives definitions for several elements that will be used throughout

the thesis, so that the reader has a frame of reference. Data structures that are rele-

vant to the implementation of the problem are defined and their basic running times are

examined. Reasoning operations, projection and maximal join, are then explained in

Chapter 4. Chapter 5 explains and analyzes the foundational projection algorithms and then

presents a new projection algorithm; it continues on to analyze the new algorithm theoretically

and show how it compares in a “typical case” with the other algorithms.

Some example environments/systems will be discussed in Chapter 6. KL-ONE,

SNePS, SNAP, PEIRCE, CoGITaNT, Amine, pCG and CPE, are all semantic network

knowledge representation systems. In Chapter 6 each of these systems will be discussed,

evaluating the different ADT representations that are used in each case. Chapter

6 also gives an evaluation of possible data structures to be used in the implementation of the

new algorithm given in Chapter 5. While implementing different ADTs, different data

structures will be explored, and their efficiency with storage and speed of algorithmic

execution will be analyzed. This leads into the practical element of this dissertation.

An important aspect of the dissertation (the practical element) is presented in

Chapter 7, where changes in data structures and algorithms are shown to affect speed,

efficiency, flexibility and space needs. One can see how the change in the data structures

can affect the system speed and efficiency. The aim is a system fast enough to

retrieve and process thousands of graphs in a reasonable amount of time for simple

query processing, therefore making it a usable system. As well as looking at tying the

algorithms to the data structures to improve the system’s functionality, the new system

was designed with flexibility in mind, such that even a sub-part of the system,

a module, can connect and be used by another standalone application. Also, the new

algorithm can find results that the baseline system was not able to process. These last

two features are important contributions of this dissertation. Chapter 8 draws conclusions

and describes future work. Different implementation languages were examined

to find the fastest system; they are discussed in Appendix A. Next, Appendix B gives

the actual CGIF format for the 2001 version. Appendix C gives documentation for how

pCG programs work and for the implementation of the CPE systems. Appendix D gives

sample data of the averages and error spreads that are shown in the experimental results.

It also shows verification that the CPE algorithm for projection gives correct results for

both single and multiple projections.

CHAPTER 2

ONTOLOGY, KNOWLEDGE AND REPRESENTATION

For use later in declaring an ADT for an internal representation, this chapter begins

by evaluating the interplay between hierarchies, relationships and operations. These

elements have impact on how the higher levels of representation will be designed, de-

fined and implemented, and an understanding of each of these elements is necessary to

clarify the different representation level interactions.

2.1 Ontology

Unlike the definition of a knowledge base given in the Tappan thesis [128] (discussed

in Section 1.2.3), the knowledge base here is more than just its ontology; it also includes the

higher levels of representation. Hierarchies and relationships are the abstract elements

of ontology, and are more informal and open in their presentation of the representation.

One can see them as the building blocks of the ontology; therefore, they will be

discussed in this section. Operations are more processing oriented and give a more

concrete representation. These depict how information blocks are put together, so will

be discussed later (see Section 2.2.2). Below are defined some of the elements of the

knowledge that are needed to process the representation levels. As discussed in [80],

the knowledge level gives the general functional expression that builds the notation at

the symbol or programming level. Evaluating the function of an ontology will reveal

some basic elements of this expression.

In order to look at the ontological elements of knowledge representation, let us

define some basic entities: object and act. An object is a thing, for example a subject of

a sentence, and is commonly considered to be a physical object such as a ball, a book,

a person, etc. An object has size, shape, mass, color, temperature, speed, etc. Basically,

an object exists as a physical thing. An act is something performed, i.e. a verb in a sentence. An

act has properties such as rate, acceleration, direction, orientation, etc. Each of these

entities is important in understanding overall concepts about representations, and both

not only have related term information, but also exist in time and space.

As defined in [120, 123], ontology comes from the Greek words onto, being,

and logos, meaning the study of being or the basic categories for existence. An ontology

is a synonym for the arrangement of a generalization hierarchy that classifies the

categories or concept types of the hierarchy. The ontology also looks at the relationships,

operations and constraints that are essential to help define the nature (knowledge)

of our world or reality [106]. This general knowledge defines an informal list of concepts

that are part of the domain. These concepts will be seen as terms (see Section

2.1.2 for more of a discussion on terms) within the ontology, and they may be defined

by categories [106] in which they are members. The next section will begin by looking

at different abstract hierarchies, and later tie the hierarchies to the categories of objects

within a domain.

2.1.1 Abstract Hierarchies

Abstract hierarchies can be used to define membership in various categories, or

give macro definitions about the categories. The actual structure of hierarchies will be

discussed in Section 3.2 within Chapter 3 containing definitions.

2.1.2 Relationships

Relationships can be divided into different categories: compositional, quantitative

and/or qualitative [43]. Each of these relationships is involved in the construction

and propagation of information within sentences or expressions. Quantification deals

with fuzzy quantifiers like most, as they relate to the classical universal and existential

quantifiers; qualification looks at fuzzy probabilities [43]. Sentences from a logic point

of view may be simple predicates with an arity of n term arguments that return either

a true or false value. Sentences also may be more complex and return more complex

information within structures. First, we look at simple sentences and how they are built

using compositional operations.

2.1.2.1 Compositional

Within simple sentences there are term arguments [65]. Terms may be of three

different types: constant symbols, variable symbols and function expressions. The con-

stant symbols are symbols that do not change; two known constant symbols are the

truth symbols, true and false. These symbols may also be things such as numbers, 1, 2,

etc. Each of these symbols has a known interpretation as specific objects or acts in the

world. They are also members of a specific category within the defined world. Variable

symbols are used to designate general classes of objects or properties in the world [65].

Variables are not constant, and as seen later, they may be substituted. Function sym-

bols have an attached arity indicating the number of elements of the domain mapped

onto each element of the range. A function expression consists of a function symbol

followed by the number of terms indicated in the function symbol’s arity.
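The three kinds of term can be written down as small data types. This is only a sketch under assumed names (`Constant`, `Variable`, `FunctionSymbol`, `FunctionExpression`); the one rule it enforces is the one stated above, namely that a function expression must supply exactly as many terms as the function symbol's arity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constant:
    name: str  # a symbol that does not change, e.g. "true" or "1"


@dataclass(frozen=True)
class Variable:
    name: str  # stands for a general class; may later be substituted


@dataclass(frozen=True)
class FunctionSymbol:
    name: str
    arity: int  # number of terms the symbol expects


@dataclass(frozen=True)
class FunctionExpression:
    symbol: FunctionSymbol
    args: tuple

    def __post_init__(self):
        # a function expression carries exactly `arity` terms
        if len(self.args) != self.symbol.arity:
            raise ValueError(
                f"{self.symbol.name} expects {self.symbol.arity} terms, "
                f"got {len(self.args)}"
            )


child_of = FunctionSymbol("child_of", 2)
expr = FunctionExpression(child_of, (Constant("Ann"), Variable("x")))
print(expr)
```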

The terms are built into sentences using connectives. There are different types of

Boolean connectives that are used when mathematically working with sets or equations,

for example: conjunction, disjunction, negation, implication and equivalence. These

Boolean connectives can be used to create sentences or composite sentences by treating

the connectives as compositional relationships (or functions). Each of these connectives

operates as follows: the Conjunction (“and”) operator forms a ‘collective’ (footnote 1) set, where

each member of the set is “anded” with the other members of the set; the Disjunction

(“or”) operator forms a ‘distributive’ (footnote 2) set, where each member is “ored” with the other

members of the set; the Negation (“not”) operator forms an ‘opposite’ set, where each

member is the opposite of what it is in the set; the Implication (“if A then B”) operator

Footnote 1: Refers to a generic assemblage of items.

Footnote 2: Refers to a generic bag of items.

is used in an equation, where, if A is true, the truth value of the implication is

taken from the truth value of B; otherwise, the implication is

always true; and the Equivalence (“equals”) operator forms an ‘identity’ set, where all

members within the set satisfy the following properties: 1) Reflexivity: a ≡ a; 2)

Symmetry: if a ≡ b then b ≡ a; and 3) Transitivity: if a ≡ b and b ≡ c then a ≡ c.

These simple and complex sentences or expressions can be expanded to use

variable symbols and function expressions. This introduces more complex relationships

where properties are applied to a whole set of terms or a collection. These relationships

may be either quantitative or qualitative in nature. A quantitative relationship

propagates information by performing quantification of values and variables to provide

an interpretation or meaning for a symbol or expression [65]; a qualitative relationship

is based on qualitative physics and propagates through both moments in time when acts

occur, and locations of objects in space [47]. Each of these types of relationship will be

used in the next section when discussing constraints. Next these relationships will be

examined more closely.

2.1.2.2 Quantification

Quantification allows the substitution of variables with numeric values, so that

numbers and arithmetic operations can be used within a reasoning

process [92]. When there is a fixed number of constant symbols with only a

finite number of substitution possibilities, a truth value assignment can be determined

as either true or false, for each substitution of the quantification. These truth value assignments

are then collected into a truth table giving an interpretation for expressions

or sentences over a domain. A truth table can be used to exhaustively test all possible

assignments of member values [12].
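Exhaustive testing with a truth table can be sketched as follows. The helper name `truth_table` and the sample sentence are assumptions made for this illustration; the point is only that, with finitely many truth values per variable, every assignment can be enumerated and evaluated.

```python
from itertools import product


def truth_table(variables, sentence):
    """List (assignment, value) rows for every assignment of the variables."""
    rows = []
    for values in product((True, False), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        rows.append((assignment, sentence(**assignment)))
    return rows


# Hypothetical sentence: (p and q) implies p, written with "not" and "or".
table = truth_table(["p", "q"], lambda p, q: (not (p and q)) or p)

for assignment, value in table:
    print(assignment, value)
```

A table with n variables has 2^n rows, which is why this approach is only practical when the number of substitution possibilities is small and finite.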

However, a more common use of a quantitative relationship is with variables.

Variables may be quantified in two ways: 1) universally or 2) existentially. A variable

is universally quantified when in a sentence it is true that all constants intended in the

interpretation can be substituted for the variable. The symbol indicating this universal

quantifier is ∀. Universal quantification introduces problems in computing truth value

assignments for a complete sentence. There may now be an infinite number of possible

substitutions, which makes the creation of a complete truth table impossible.

This exhaustive testing of all substitutions computationally is an undecidable problem

[65]. At the same time, the quantitative relationships (or functions) allow a larger map-

ping of information in a knowledge-base which can be more powerful as seen later.

The second way a variable may be quantified is existentially. In this case at

least one substitution is true for the variable across the interpretation of the domain.

The symbol for an existential quantifier is ∃. Existential quantification is no easier to

compute than universal quantification; this is because of the infinite number of possi-

bilities.
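Over a finite domain both quantifiers can be evaluated by enumeration, which the following sketch illustrates. The function names `for_all` and `exists` and the sample domain are assumptions of this example. The same enumeration cannot be completed over an infinite domain, which is the difficulty described above.

```python
def for_all(domain, predicate):
    """Universal quantification: predicate holds for every substitution."""
    return all(predicate(x) for x in domain)


def exists(domain, predicate):
    """Existential quantification: predicate holds for at least one substitution."""
    return any(predicate(x) for x in domain)


domain = range(1, 11)  # a finite domain: the integers 1 through 10

print(for_all(domain, lambda x: x > 0))       # every element is positive
print(exists(domain, lambda x: x % 7 == 0))   # some element is a multiple of 7
print(for_all(domain, lambda x: x % 2 == 0))  # not every element is even
```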

For quantification of variables, the scope of the quantified variable is indicated

by enclosing the quantified occurrences of the variable in parentheses. Quantification

allows one to look at infinite possibilities within one instance in time and space; when

a time or space continuum is introduced then qualitative relationships (or functional

mappings) are needed.

2.1.2.3 Qualitative

Qualitative relationships are used in qualitative physics. This area of knowledge

representation is concerned with constructing a logical, non-numeric theory of objects

and acts [106]. This theory defines relationships that process operators of time and

space. In order to define these relationships, one must define what entities and relationships

are relevant to time and space. An entity in the domain of time is called a moment

or instant [47], while an entity in a spatial domain is a location. A relationship over

time is an interval and over space is a region. When these aspects of entities are related

to objects and acts within the world and to each other interesting things start to occur.

When one looks at the properties of an object for the current point in time and

space, the object is said to have a state [47]. If that object is in relationship with other

objects, a partial ordering of states can be produced. When the properties of an act

for the current point in time and space are examined there is a process; that act in

relationship with other acts gives a partial ordering of processes.

When one now starts to look at the interconnection between objects and acts, if

a set of objects are in a spatial relationship for a single moment in time they are said to

have a schematic. As these relationships are looked at over a set of moments in time,

one gets a partial ordering of schematics. Also, when there is a set of acts in a temporal

relationship for a single region in space, it has a chronicle. Extending this to a set of

regions, one gets a partial ordering of chronicles.

When qualitative relationships are executed as functions the above defined par-

tial orderings are processed. For example, a ball in one state may be at the top of a

bounce, and in the next state at the bottom. However, the interesting part is that when

moving between the two states the ball also moved from point A to point B which

was a forward direction. Here is seen a time and space progression being performed

within an operation. There are many more temporal and spatial qualitative functions

(see Hartley’s work [47] for a much more complete list).

Now, if one looks at a set of objects that are participating in a single act, this

is said to be an event. If one looks at a set of acts with a single object, this can be

described as an experience. Both an event and an experience are atomic units to the

single act or object, respectively. It should be noted that events are time-independent

and experiences are space-independent [47]. Unlike the functions defined above, these

do not produce a partial ordering across entities. This is because there is only one act

or object present. However, these events and experiences can be linked together in a

set to form related knowledge structures. These structures are similar to standard case

relations already available within knowledge representations [47].

Temporal and spatial operators allow one to discuss time and space relationships.

However, the drawback is how to represent the knowledge and the time and

space it takes to process the partial orderings.

2.2 Knowledge

First, consider more closely the actual definition of knowledge and learning from

Piaget [98],

“in each act of understanding, some degree of invention is involved; in

development, the passage from one stage to the next is always characterized

by the formation of new structures which did not exist before,

either in the external world or in the subject’s mind” [98, page 70]

and the types of knowledge that make up this definition as discussed previously in

Section 1.1. If one learns information and then keeps it in their mind so that they can

understand it, obviously that information, or knowledge, must be stored. However, what

representation it is actually stored in is still a mystery. Whatever the representation, the

mind is able to recall the information at will.

Second, ontological information can add additional structure to the representa-

tion by placing a macro level of knowledge for the defined world, outside of the types

of conceptual knowledge that have been defined. This additional structure can work

through knowledge operations to use the representation levels previously defined (see

Section 1.1.1) to store information in a knowledge base.

2.2.1 Types

Therefore, to examine more closely what representation the mind might use to

store knowledge, the types of knowledge will be discussed. Knowledge can be thought

of as Declarative or Procedural; the following sections will define what is entailed in

each type.

2.2.1.1 Declarative Knowledge

The first type of knowledge is known as declarative knowledge, describing a

collection of definitions about the world. Throughout history, language has been used

to describe knowledge and conceptual relationships. In many instances it is easier to

describe in words definitions of concepts and their relationships, for example: a cat is an

animal with four legs and a long tail. In this example, a definition is being performed

to give attributes and characteristics to a cat; that is, an attribute of four legs and a

characteristic of a long tail. There could also be given domain information, that is, a

cat is an animal, but this will be discussed further in ontologies. It can be noted that in

the definition of a cat, it has been declared that this animal has four legs and a tail. In

some other world of declarative knowledge, a cat may have only three legs and no tail.

2.2.1.2 Procedural Knowledge

The second type of knowledge is procedural knowledge, describing the tempo-

ral, spatial, and constraint aspects for the above definitions. It is believed that there is a

duality between these two types of knowledge [107], but one type is inadequate without

the other. If the simple example given above is expanded to include a location for the

cat, it can now be defined that: a cat is an animal with four legs and a tail and the cat

is located on a mat. Within this expanded example both types of knowledge are being

used: 1) cats have attributes and characteristics, and 2) spatially, the cat is located on a

mat. If this statement is then changed slightly to say that a cat with four legs and a tail

sat on a mat, it declares not only definitional information about the cat (four legs and

a tail), but also spatial information about the location (a mat) and temporal information (sat, at this

moment in time). Sometimes, written language is not an easy tool to use to describe all

knowledge information. If the example is changed to add one more temporal wrinkle:

A rat sat on the mat before a cat sat on the mat. Assuming that the cat is the one already

defined in our world knowledge, there can now be two interpretations of this idea: 1)

the rat is sitting in front of the cat on the mat at the same time, or 2) the rat sat on

the mat prior to the time the cat sat on the mat. Here a picture or a time diagram (see

Figure 2.1 from [95] on page 176), can help display the correct interpretation. Again,

both types of knowledge are being used, 1) cats located on mats; rats located on mats;

2) spatially, the cat on the mat and the rat on the mat; and 3) temporally, the rat on the

mat before the cat on the mat or the rat and cat on the mat at the same time.

Figure 2.1: Time Chart.
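The two readings of "a rat sat on the mat before a cat sat on the mat" can also be told apart once each sitting carries explicit temporal and spatial information. The following is only a minimal sketch under stated assumptions: the `Sitting` record, its one-dimensional `position`, and the integer time stamps are hypothetical and are not taken from the time chart in Figure 2.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sitting:
    """One 'sat on the mat' fact: who, where on the mat, and when."""
    who: str
    position: float  # hypothetical 1-D coordinate along the mat
    start: int       # hypothetical time stamps
    end: int


def precedes(a, b):
    """Temporal reading: a's interval ends before b's begins."""
    return a.end <= b.start


def overlaps(a, b):
    """The two intervals share at least one moment."""
    return a.start < b.end and b.start < a.end


def in_front_of(a, b):
    """Spatial reading: both sit at the same time and a is ahead of b."""
    return overlaps(a, b) and a.position > b.position


# Interpretation 2: the rat sat first, then the cat.
rat_then = Sitting("rat", 0.5, start=0, end=3)
cat_then = Sitting("cat", 0.5, start=4, end=8)

# Interpretation 1: both on the mat at once, the rat in front.
rat_now = Sitting("rat", 0.8, start=0, end=5)
cat_now = Sitting("cat", 0.2, start=0, end=5)
```

With the data made explicit, only one of `precedes` and `in_front_of` holds for each pair, which is the disambiguation that the written sentence alone cannot provide.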

2.2.2 Operations

Besides using the relationships just defined above, an example system uses different

types of operations to process the internal knowledge being stored in the knowledge

base and the hierarchies of ontology information being applied to the data structures.

2.2.2.1 Terminological

Terminological operations work over terms or concepts and are designed to facilitate

the expression of definitions [66]. Some common operations are: subsumption,

inheritance, completion and coherence. Let us look at each operation briefly [141].

Subsumption, as defined in Section 3.2, is when a term is subsumed by another term.

When all appropriate subsumption relations are identified for a given set of terms, then

the terms are said to be classified. Inheritance is the operation of identifying the appropriate

subsumption relations, and completion is the process of identifying and recording

all conditions that should be applied to a term so it can be classified. Lastly, the coher-

ence operation is finding a model in which the term’s denotation is not empty. These

terminological operations work above the knowledge base when trying to actually pro-

cess rules or predicates.
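As an illustration of subsumption and classification, the sketch below records direct supertype links and derives every subsumption pair for a set of terms. It is a minimal sketch only; the `PARENTS` table and the term names are hypothetical, and the recursive walk is one simple way to test subsumption, not the method of any particular system cited here.

```python
# Hypothetical type hierarchy: each term maps to its direct supertypes.
PARENTS = {
    "Entity": [],
    "Animal": ["Entity"],
    "Cat": ["Animal"],
    "Rat": ["Animal"],
    "Mat": ["Entity"],
}


def subsumes(general, specific):
    """True if `general` is `specific` or one of its ancestors."""
    if general == specific:
        return True
    return any(subsumes(general, p) for p in PARENTS[specific])


def classify(terms):
    """Record every subsumption pair (general, specific) among `terms`."""
    return {(g, s) for g in terms for s in terms
            if g != s and subsumes(g, s)}
```

Once `classify` has produced all pairs for a set of terms, those terms are classified in the sense used above.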

2.2.2.2 Assertional

Assertional operations try to state constraints or facts that apply to a particular

domain or world [66]. The most common assertional operation is realization. Realization

is the process of identifying all concepts that have been instantiated [141]. Once

a concept has been instantiated, it can be entered into the domain as a fact. One very

important aspect of this operation is whether or not the closed world assumption, under

which only definitions or facts defined within the world can be operated on, is being

made [141]. Most systems no longer make this assumption.
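The practical difference made by the closed world assumption can be shown with a small fact store. This is a hedged sketch: the `FACTS` set and both query functions are hypothetical names chosen for illustration.

```python
# Hypothetical fact store for a small world.
FACTS = {("Cat", "felix"), ("On", "felix", "mat-1")}


def holds_closed(fact):
    """Closed world: anything not recorded is taken to be false."""
    return fact in FACTS


def holds_open(fact):
    """Open world: anything not recorded is unknown (None), not false."""
    return True if fact in FACTS else None
```

For an unrecorded fact such as `("Cat", "tom")`, the closed world query answers `False` while the open world query answers `None`, leaving the question undecided.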

2.2.2.3 Generalization

The simplification operator generalizes an entity by taking it to a more general

form [119]. This generalization sometimes removes part of a conceptual idea that has

more information to take the idea to a more general form. When generalization is

performed on hierarchies the concepts are moved upward in the hierarchy from the

bottom to the top. The top (⊤) is the most generalized.

2.2.2.4 Specialization

The join operator (see Section 4.1.2) allows the specialization of entities by

performing unification (see Section 1.2.2) between two entities. When unification is

performed, a substitution is made in one entity by another [106]. If it is a concept that

is being unified, then the concept may go from a general form to a more specific one.

Specialization on hierarchies moves the concepts from the top toward the bottom. The

bottom (⊥) is the most specialized.
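Movement up and down a hierarchy can be sketched on type labels alone. The example below is an assumption-laden illustration: the `SUPERS` lattice is hypothetical, `generalize` takes one step toward the top, and `specialize` only stands in for the effect a join has on two type labels; it is not the join algorithm of Section 4.1.2. It also assumes a lattice, so that the most general common subtype is unique.

```python
# Hypothetical lattice: each type maps to its direct supertypes.
SUPERS = {
    "TOP": set(),
    "Animal": {"TOP"},
    "Pet": {"TOP"},
    "Cat": {"Animal"},
    "PetCat": {"Cat", "Pet"},
    "BOTTOM": {"PetCat"},
}


def ancestors(t):
    """All types at or above `t`, including `t` itself."""
    seen = {t}
    for s in SUPERS[t]:
        seen |= ancestors(s)
    return seen


def generalize(t):
    """One step toward TOP: the direct supertypes of `t`."""
    return SUPERS[t]


def specialize(a, b):
    """Most general type lying at or below both `a` and `b`."""
    common = [t for t in SUPERS if {a, b} <= ancestors(t)]
    # In a lattice, the most general common subtype has the fewest ancestors.
    return min(common, key=lambda t: len(ancestors(t)))
```

Here `specialize("Animal", "Pet")` moves downward to `PetCat`, while repeated calls to `generalize` eventually reach `TOP`.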

2.3 Representation

As was discussed in Section 1.1.1 of the introduction, conceptual ideas can be

transformed through representation levels to a form that can be processed by a com-

puter. In this section, examples will be discussed from level 1 and level 2 of those

representation levels.

2.3.1 Knowledge

Level 1 from Figure 1.1 discussed the concept of knowledge representation

(KR) at a beginning syntactic level. This section will present three KRs: Logic, Rule-

Base, and Semantic Network, and their basic representation of knowledge.

2.3.1.1 Logic

Logic as a knowledge representation looks at representation of knowledge in

two parts: implicit and explicit [63]. The implicit part allows knowledge to be repre-

sented within a closed world assumption; that is, it contains a set of sentences of the

form $(s \neq t)$ for any two terms in the universe that have not already been explicitly

defined. This allows the user of the world to know what is “not true” for the universe.

This part of the knowledge, when speaking about the processing level of knowledge

representation, relates to the ontology (as discussed in Section 2.1) of the Logic KR.

The explicit part is a collection of first-order sentences (a subset of which are called Horn

clauses) of the form:

$\forall x_1 \cdots x_n \, [P_1 \wedge \cdots \wedge P_m \supset P_{m+1}]$ where $m \geq 0$ and each $P_i$ is atomic.

If $m = 0$ and the arguments to the predicates, $P$, are all constants then there

is nothing more than a relational database of facts. However, this may be a first-order

logic (FOL; see Section 3.3) sentence. These first-order sentences define what

is “known” about the universe, and give the syntax of the Logic KR. Logic must be

mapped to the next machine processing level using some ADT (see Section 2.3.2).

The computational part of Logic KR is the execution, or inference, of the logic

system. This can be seen as a form of semantics for logic. The inference engine also

uses an ADT declaration to interface to a machine representation. See Figure 2.2 for

a simple example of a knowledge representation (KR) and internal representation (IR)

that uses logic. Within this example, at the KR level one can see a FOL (see Section

3.3 for definition) sentence where there is a red block on top of a yellow block which

is on the table. When translating this to the IR level, the KR single sentence translates

into 13 triples of relationship information.

KR level

( Table( table-1 ) ∧(( Block( block-1 ) ∧ Color( Yellow ) ) ∧

Supported-by( block-1, table-1 )) ∧(( Block( block-2 ) ∧ Color( Red ) ) ∧

Supported-by( block-2, block-1 )) )

IR level

(inst table-1 table)
(inst block-1 block)
(color yellow block-1)
(and and1 block-1 yellow)
(supported-by sup1 block-1 table-1)
(and and2 and1 sup1)
(inst block-2 block)
(color red block-2)
(and and3 block-2 red)
(supported-by sup2 block-2 block-1)
(and and4 and3 sup2)
(and and5 and2 and4)
(and and6 and5 table-1)

Figure 2.2: Logic Example.
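To suggest how the IR level of Figure 2.2 might be held and queried, the sketch below stores the thirteen expressions as tuples and matches patterns against them. The tuple contents are copied from the figure; the `match` and `supports_of` helpers are hypothetical and only illustrate one possible ADT, not the one developed in this work.

```python
# The IR-level expressions of Figure 2.2, one tuple each.
IR = [
    ("inst", "table-1", "table"),
    ("inst", "block-1", "block"),
    ("color", "yellow", "block-1"),
    ("and", "and1", "block-1", "yellow"),
    ("supported-by", "sup1", "block-1", "table-1"),
    ("and", "and2", "and1", "sup1"),
    ("inst", "block-2", "block"),
    ("color", "red", "block-2"),
    ("and", "and3", "block-2", "red"),
    ("supported-by", "sup2", "block-2", "block-1"),
    ("and", "and4", "and3", "sup2"),
    ("and", "and5", "and2", "and4"),
    ("and", "and6", "and5", "table-1"),
]


def match(pattern):
    """Return tuples matching `pattern`, where None is a wildcard."""
    return [t for t in IR
            if len(t) == len(pattern)
            and all(p is None or p == v for p, v in zip(pattern, t))]


def supports_of(obj):
    """Follow supported-by links downward from `obj`."""
    chain = []
    while True:
        hits = match(("supported-by", None, obj, None))
        if not hits:
            return chain
        obj = hits[0][3]
        chain.append(obj)
```

A query such as `supports_of("block-2")` walks from the red block through the yellow block to the table, showing that the relationships of the single KR sentence remain recoverable from the flat list.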

2.3.1.2 Rule-Bases

Rule-Base knowledge representations are procedural schemes that represent

knowledge as a set of instructions for solving a problem. The instructions are in the

form of if ... then ... rules and may be interpreted as a procedure for solving a goal

in a problem domain. At the heart of the system is a knowledge base that holds the

instructions. An inference engine takes the rules (knowledge) from the knowledge base

and applies them in the correct order to produce a solution (goal) to an actual problem.

This is a recognize-act control cycle, and procedures that implement the control cycle

are separate from the rules in the knowledge base. The procedures can be seen as the

semantics of the system and they produce a very simple ADT for operation by the inference

engine for processing the rules. Rule-Base systems are the basis of expert systems

and an expert provides the rules for the system. These systems focus on a narrow set of

problems in which knowledge is extracted from a specialist in this area.
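The recognize-act control cycle can be sketched in a few lines. This is a minimal illustration assuming the simplest possible control strategy (fire every applicable rule, in list order, until nothing changes); real expert system shells use more elaborate conflict resolution. The rule contents and the `run` function are hypothetical.

```python
def run(rules, facts):
    """Fire rules until no rule adds a new fact (forward chaining)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Recognize: every condition is already a known fact.
            if set(conditions) <= facts and conclusion not in facts:
                # Act: assert the conclusion.
                facts.add(conclusion)
                changed = True
    return facts


# Hypothetical rules, each an (IF conditions, THEN conclusion) pair.
RULES = [
    (("has four legs", "has tail"), "is animal"),
    (("is animal", "purrs"), "is cat"),
]
```

Note that the control loop in `run` is kept separate from the contents of `RULES`, mirroring the separation between the control cycle and the knowledge base described above.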

2.3.1.3 Semantic Network

A semantic network is an example of a knowledge representation that is dis-

played as a discrete graphical structure of vertices and arcs [61]. Within the graphical

structure, the vertices are called nodes and may be displayed as circles or boxes. The

arcs are called links and are displayed as lines with arrows between the nodes. The

nodes are related to each other through their links, where the links are assigned a one-

to-one correspondence with a conceptual meaning defining the relationship [108].

The nodes are sometimes called conceptual units and may be seen as objects

within the network. These objects may be of many different types including entities,

attributes, events or even states. Syntactically, each object is just a symbol (normally

text within a box or circle), in the graphical structure. On top of the semantic network,

abstract hierarchies are organized according to levels of generalization for the concep-

tual units. These hierarchies were discussed in Section 2.1.1. The links of the network

form relational connections between the conceptual units, such that the valence (or parity)

of the relational connection is the number of units that are connected to a particular

unit with a link. In a semantic network links are usually dyadic (binary) connecting two

conceptual units together.

The syntax of the semantic network is a set of the grammatical rules that express

how the symbols of the network can be combined within the graphical structure. In

this way, the syntax of the network is very abstract. The semantics of the network

is the abstract meaning of the links and their nodes. Because the semantic network’s

representation, in the abstract, appears as informal, its semantics is an interpretation

of the objects displayed within the graphical structure. This creates a transformation

from one representation level to the next. Therefore, the interpretation of the network

defines a modeling of the relational connections between conceptual units using an

abstract and generative form of semantics, and has the characteristic notion of a set of

links which connect individual conceptual units, referred to as facts, into a total basic

network structure. In this way, the representation of knowledge or implementation of

the knowledge representation is at a different level of representation than the semantic

network.

Elements of semantic networks appeared as early as the late nineteenth cen-

tury in works by Alfred Kempe in 1886 and Charles Peirce in 1897 [86, 61, 37, 121].

Both gentlemen used a graphical structure of conceptual units to diagram meaning [86].

However, semantic networks were not introduced for use with computers until 1956 by

R.H. Richens in a system called ’NUDE’ [53]. This system was used for machine

translation of Russian to English by going through a neutral conceptual language. This

procedure actually operates over the innermost level of representation produced by the

translation of the semantic network to the storage representation of knowledge. The

actual natural language, Russian, is mapped onto a semantic network knowledge representation

for natural language processing. This KR is then mapped onto an internal representation,

which is really a new language declaration for a new conceptual language.

It uses the nodes and arcs within the semantic network to map to the new language.

The virtual machine internal representation, at level 2 (as seen in Figure 1.1), is not the

semantics of the network, but the representation produced by applying the semantics of

the network through a mapping to the new representation language. After the semantic

network has been translated to an internal representation, the new language is mapped

through the definition level with the new data structures onto the implementation of

the algorithm for processing that data structure, so the innermost (highest) level can be

executed and perform reasoning operations and analysis.

Examples of applications where semantic networks have been used are natural

language understanding, planning, machine translation, deductive databases, and ex-

pert systems [61]. However, in order for a semantic network to be a good knowledge

representation for an application, the network must be interpreted in terms of a repre-

sentation that algorithmically or procedurally can process the network’s meaning and

perform reasoning. Interpretation requires that the representation be translated from

the abstract to a more concrete representation. For any semantic network, different

representations of knowledge, levels 2 - 4 (as seen in Figure 1.1), may be used for

implementing the storage representation. These representations use different formal

approaches for syntactic processing, such as: 1) propositional logic, 2) predicate cal-

culus, and 3) graph grammar (set theory). Some of the representations of knowledge

used by semantic networks will be discussed in more detail in Section 2.3.2. However,

within a propositional logic approach, propositions, logical operators, and abstract hi-

erarchies use arbitrary conceptual units and expression links to define nodes and arcs

with semantic descriptions and context. Predicate calculus is built on top of a propo-

sitional approach and also incorporates the use of predicates. A graph grammar or set

theoretic approach not only is built above the predicate calculus approach, but also uses

primitive entities and actions with procedural operators to help define the semantics of

the network.

A specific type of semantic network, or a knowledge representation in its own

right, is a frame representation. The frame is a named data object with an unbounded

collection of named slots (attributes or fields) which can have values [61]. The value

of a slot in a frame can be a pointer to another frame, thereby producing a network

of frames (hence the representation’s name). A frame is an object represented by a

node with a set of slots; a slot is information about the object and may be represented

by a pointer to another node, restrictions on attribute values, by a pointer to an attached

procedure for calculating a value, an actual simple value, or a set of values [63]. Frames

collect explicit information about an individual object at a node level.
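A frame with the kinds of slots just listed can be sketched as follows. The `Frame` class and the example frames are hypothetical and cover only three of the slot kinds: a simple value, a pointer to another frame, and an attached procedure that calculates a value.

```python
class Frame:
    """A named object with an open-ended set of named slots."""

    def __init__(self, name, **slots):
        self.name = name
        self.slots = dict(slots)

    def get(self, slot):
        """Return a slot value, running an attached procedure if present."""
        value = self.slots[slot]
        return value(self) if callable(value) else value


mat = Frame("mat-1", colour="red")
cat = Frame(
    "cat-1",
    legs=4,          # simple value
    located_on=mat,  # pointer to another frame
    # attached procedure for calculating a value
    description=lambda f: f"{f.name} on {f.get('located_on').name}",
)
```

Because `located_on` holds another frame rather than a plain value, following such slots from frame to frame yields the network of frames described above.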

2.3.2 Internal Representation

Each of these internal representations is at the next higher level than the KR

used (see Level 2 from Figure 1.1). Even though each internal representation will be

discussed as being the ADT for a “best fit” with a particular knowledge representa-

tion, any of these ADTs could be used with any knowledge representation, as stated in

Brachman’s work [11] when he was discussing semantic networks.

2.3.2.1 Predicate Calculus

Elements within the syntactic representation of the semantic network knowl-

edge representation can be grouped into structures. These structures have predefined

reductions (meaning) to an ADT. The structures are propositions, predicates, logical operators

and procedural operators. Let us look at each of these structures and how they

affect building an ADT.

Propositions will be discussed in Section 2.3.2.2. From that discussion it can

be seen that predicates not only generalize propositions, but also define relationships.

They can be separated into their intensional and extensional characteristics [139]. The

extension of the predicate refers to the set of things that this concept denotes, while

the intension of the predicate defines the meaning for the concept. Both characteristics

define the semantics of the concept. However, the intension of a predicate gives an

abstract function which can be assigned to the extension of the predicate, the concept

itself.

Logical operators use model-theoretic semantics with the basic operators being

conjunction, disjunction, negation, and existential and universal quantifiers. How each

of the operators is used in relationships was looked at more closely in Section 2.1.2.

So, let us consider the question: “what is model-theoretic semantics?” The word model

has multiple meanings, three of them being [119]:

• Simulation - simplified system that simulates some significant characteristics of

some other system.

• Realization - a set of axioms as a data structure in which these axioms are true.

• Prototype - an ideal or standard for a system.

Theory, on the other hand, is a proved hypothesis. Therefore, logical operators are modeling

a proved hypothesis by the conjunction of true propositions containing existing

objects and conjoined predicates and relations [61]. Systems that exclusively use only

logical operators answer reasoning questions by using theorem proving in FOL (see

Section 3.3).

Procedural operators define procedures that actively interpret the semantic net-

work and operate over it [119]. Use of these operators in defining the semantics of the

network sets up a controversy, procedural vs declarative.

The procedural semantics assume that knowledge of the world, or meaning, can

be represented by knowing how a concept operates; declarative semantics assumes that

knowledge can be represented by knowing that a concept is defined by a collection of

facts [119]. This controversy will appear throughout the discussion of the systems in

Section 6.1.

Each of these structures will need to be examined in building an ADT for the internal

representation. Figure 2.2 shows an example of a logic knowledge representation

that is mapped to an internal predicate calculus representation. This internal representation

can then be used to help define an ADT for processing the structures. As one can

see the connectives get turned into predicates and are called to instantiate the objects

from the knowledge representation level.

2.3.2.2 IF..THEN

For some knowledge representations, in particular rule-base representation, the

data structure that just consists of the propositional query can be used. A proposition

comes from mathematical logic and is a simple statement which may have a truth value,

TRUE or FALSE, associated with it. These simple statements can generate, manipulate,

and/or relate concepts through logical functions. Propositions are always intensional,

define concepts, and do not consider relationships or dependencies between concepts.

They also only use quantitative relationships which allow the application of heuristics

to reduce the search space, but do not function in the areas of time or space.

The IF..THEN rule construct can be defined directly in most programming languages

and makes for a very simple ADT for defining the inference engine. However,

because of the simplicity of the data structure only simple questions can be answered.

It is for this reason that this internal representation is not used for knowledge represen-

tations such as logic or semantic networks.

2.3.2.3 Conceptual Structures

Even though there are multiple semantic network representations available, the

representation that has flexibility in its use of the above approaches is conceptual structures.

Conceptual Structures, CS, are a logic based representation of C.S. Peirce’s existential

graphs [86] developed by John Sowa [119]. Graphical diagrams that are built out

of the logic building blocks of conceptual structures are conceptual graphs, CG (see

Section 3.4).

Semantic networks play a very important role in the use of conceptual graphs.

Sowa claims that “a conceptual graph has no meaning in isolation. Only through the

semantic network are its concepts and relations linked to context, language, emotion,

and perception”. Such concepts as TOMATO or DOG are easier to understand and

define than abstract concepts such as PEACE or JUSTICE. In order to capture the

meaning of abstract concepts, these concepts must be hooked up through a vast network

of relationships which will eventually link them to concrete concepts. The philosopher

A. R. White [135] defined the meaning of a concept as follows:

“To discover the logical relations of a concept is to discover the nature

of that concept. For concepts are, in this respect, like points; they have

no quality except position. Just as the identity of a point is given by its

coordinates, that is, its position relative to other points and ultimately to

a set of axes, so the identity of a concept is given by its position relative

to other concepts and ultimately to the kind of material to which it is

intensively applicable. A concept is that which is logically related to

others just as a point is that which is spatially related to others” [135].

In Tepfenhart’s paper [129], he stated that the conceptual grounding for conceptual

structures is based on the triangle meaning for the relationships between symbols, con-

cepts, and referents (see Figure 2.3).

Peirce [86] actually had a different relationship triangle (see Figure 2.4); it

aligns its sign relation with Tepfenhart’s symbol, while the concept stayed the same.

For Tepfenhart, a referent was the instantiation of the concept in the triangle meaning,

Figure 2.3: Meaning Triangle for Symbols, Concepts, and Referents (Based on [129, Figure 1]).

while Peirce saw the object as the instantiation of the concept. This makes Peirce’s

triangle more general to all conceptual logics; not just conceptual structures. Concep-

tual Structures (CS) are the development of human “concepts” in such a way that they

can be processed by machines. The structures give meaning in the computer for the

conceptual ideas [119].

Figure 2.4: Peirce’s Triadic Relation.

Going back to language as a mechanism for communicating human concepts;

over time, the foundation of conceptual structures within knowledge representation has

changed. Chomsky maintained that traditional grammars, which are syntactic, carried

the structure needed to process sentences in computers, and that each sentence was a

single structure [16]. However, he clarified in 1965 that these structures were an ab-

stract theory of competence, which is an idealized knowledge of language, as opposed

to a performance structure, which is the actual use of natural language [17]. Jackendoff

maintains that the meaning of a sentence, which is semantic, in natural human language

actually has separate semantic structures for each element of the sentence [52].

John Sowa took both of these ideas and blended them together to develop a

graph diagrammatic representation for the structure called Conceptual Graphs, CG

[119, 121]. Section 3.4 defines conceptual graphs. Later, Bernhard Ganter and Rudolf

Wille realized that they had developed a similar, but simpler lattice representation for

conceptual structures that had a mathematical foundation called Formal Concept Analysis,

FCA [41]. FCA is a mathematical formalism [41] that handles concepts with

attributes in a lattice format. These mathematical structures can be traversed as in a

type hierarchy to discover super and sub-type relationships between concepts. They

can also be easily stored in a relational database [6, 7]. Their latest research has been in

the area of adding temporal attributes to the lattices to handle time relationships [138].
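The way FCA pairs sets of objects with sets of shared attributes can be sketched directly. The example is a minimal illustration under stated assumptions: the `CONTEXT` table is hypothetical, and the brute-force enumeration over all object subsets is exponential, so it is suitable only for tiny contexts and is not the algorithm of any FCA tool.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes.
CONTEXT = {
    "cat": {"four legs", "tail"},
    "rat": {"four legs", "tail"},
    "bird": {"tail"},
}


def common_attributes(objects):
    """Attributes shared by every object in `objects`."""
    all_attrs = set().union(*CONTEXT.values())
    return set.intersection(all_attrs, *(CONTEXT[o] for o in objects))


def objects_having(attrs):
    """Objects that carry every attribute in `attrs`."""
    return {o for o, a in CONTEXT.items() if attrs <= a}


def concepts():
    """Enumerate all (extent, intent) pairs by closing each object subset."""
    found = set()
    names = list(CONTEXT)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            found.add((frozenset(extent), frozenset(intent)))
    return found
```

Ordering the resulting concepts by inclusion of their extents gives the lattice, in which a concept with a smaller extent is a sub-type of any concept whose extent contains it.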

CHAPTER 3

DEFINITIONS

This chapter gives definitions for several concepts that will be developed throughout

this work. These definitions are complete for the knowledge needed for this work, but

are not a complete coverage of all of these areas of study.

3.1 Graph Theory

Graph theory, unlike logic, is not built on sentences of predicates that evaluate to TRUE and FALSE, but is based on the visual elements of drawings. A graph is G = {V,E}, where V is a finite nonempty set of points (or vertices) and E is the set of all the links (or edges) between adjacent points [46]. An edge x = {u,v} is said to join vertices u and v [46]. The example in Figure 3.1 is a graph G where V = {v1,v2,v3,v4,v5} and E = {{v1,v2},{v1,v3},{v2,v3},{v2,v4},{v3,v4},{v3,v5},{v4,v5}}. Even though a graph must have at least one vertex, it does not have to have any edges. Graphs are very useful for discovering whether a finite number of objects (vertices) are in relationship (edges) with each other. The next sections give graph theory definitions that are important to the details discussed later in this work.
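As an illustration only, the graph G of Figure 3.1 can be written down directly from its definition. The sketch below is in Python, which is not otherwise used in this work; `is_graph` and `adjacent` are hypothetical helper names introduced for the example.

```python
# Vertices and undirected edges of graph G from Figure 3.1.
# frozenset is used so that {u, v} and {v, u} denote the same edge.
V = {"v1", "v2", "v3", "v4", "v5"}
E = {
    frozenset(pair)
    for pair in [
        ("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v2", "v4"),
        ("v3", "v4"), ("v3", "v5"), ("v4", "v5"),
    ]
}

def is_graph(vertices, edges):
    """Nonempty vertex set, and every edge joins two distinct known vertices."""
    return bool(vertices) and all(len(e) == 2 and e <= vertices for e in edges)

def adjacent(u, v, edges):
    """Two vertices are adjacent when an edge joins them."""
    return frozenset((u, v)) in edges
```

A single vertex with an empty edge set also passes `is_graph`, which matches the remark that a graph needs at least one vertex but no edges.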

Figure 3.1: A Graph to Illustrate Graph Theory Concepts (Adapted from [[46], Figure 2.9]).

3.1.1 Digraph and Bigraph

A directed graph (or digraph) is H = {V,E}, where V is a finite nonempty set of vertices and E is a set of ordered pairs of all directed edges between adjacent vertices [13, 46]. For these edges, a directed pair x = (u,v) joins u and v in an irreflexive binary relation, and the direction is from u to v. When drawing the edges in a graph, an arrow indicates the direction [13, 46]. As can be seen in Figure 3.2, graph H contains V = {v1,v2,v3,u1,u2} and E = {(v1,u1),(u1,v2),(v2,u2),(u2,v3)}. For each of the pairs in E, the arrow points from the first vertex to the second vertex, i.e., from v1 to u1, where the arrow touches u1.

A bipartite graph, B = {V,E}, is a graph with the distinction that all vertices in V can be divided into two subsets V_1 and V_2, or colors, such that every edge in E of graph B connects an element of V_1 to an element of V_2 and there are no edges between the vertices within a subset (that is, with the same color) [46]. If Figure 3.2 is again examined, it is seen that besides being a digraph this is also a bigraph (bipartite graph) where V_1 = {v1,v2,v3} and V_2 = {u1,u2}.

Figure 3.2: A Digraph that is a Bipartite Graph.
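The bipartite property of H can be checked mechanically. The following Python sketch assumes the two vertex subsets given for Figure 3.2; `is_bipartition` is a hypothetical helper written for this illustration, not a library routine.

```python
# Directed edges of graph H from Figure 3.2, as ordered pairs (tail, head).
V1 = {"v1", "v2", "v3"}
V2 = {"u1", "u2"}
E = [("v1", "u1"), ("u1", "v2"), ("v2", "u2"), ("u2", "v3")]

def is_bipartition(part_a, part_b, edges):
    """True when the parts are disjoint and every edge crosses between them."""
    if part_a & part_b:
        return False
    return all(
        (u in part_a and v in part_b) or (u in part_b and v in part_a)
        for u, v in edges
    )
```

Adding an edge such as (v1, v2), which joins two vertices of the same color, makes the check fail.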

3.1.2 Walk, Path and Connected

A walk of a graph is an alternating sequence of vertices and edges where the beginning and ending of the walk is at a vertex, and the edges are incident on two vertices [46]. In the previous example, Figure 3.1, a simple walk has vertices v1 v2 v3 v4 v5 with the edges in the following order: {v1,v2}{v2,v3}{v3,v4}{v4,v5}. This walk does not include all the edges, but does include all the vertices. In the example just stated, since all edges are distinct, it is called a trail. There are other kinds of walks; for instance, when the first and last nodes are the same, the walk is a cycle [46].


Using Figure 3.1 again, a cycle would be vertices v1 v2 v4 v5 v3 v1 with edge ordering {v1,v2}{v2,v4}{v4,v5}{v5,v3}{v3,v1}. This second example is also a nontrivial trail that is closed. This kind of trail is referred to as a circuit [13]. Another closed walk would be v2 v4 v2 with edges {v2,v4}{v4,v2} (possible because the edges are not directed). However, this is not a circuit, because it traverses the same edge twice and is therefore not a trail.

A path for a graph, in graph theory terms [46], is a walk in which all the vertices on the walk are distinct, except in one special case: if the path creates a cycle, then the path comes back to the starting vertex. Recall that the first example encountered in this section, from Figure 3.1, was a path, but not a cycle.

For a graph to be connected, every pair of vertices must be joined by a path [46]. Both Figures 3.1 and 3.2 are connected graphs.
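The distinctions among walk, trail, path, circuit, and connectedness can be made concrete on the graph of Figure 3.1. The Python sketch below is one possible reading of the definitions; all function names are hypothetical, and a walk that reuses an undirected edge (such as v2 v4 v2) is treated as a walk but not as a trail.

```python
from collections import deque

# Undirected edges of graph G from Figure 3.1.
E = {frozenset(p) for p in [
    ("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v2", "v4"),
    ("v3", "v4"), ("v3", "v5"), ("v4", "v5"),
]}
V = set().union(*E)

def steps(seq):
    """Edges traversed by a vertex sequence, in order."""
    return [frozenset(p) for p in zip(seq, seq[1:])]

def is_walk(seq, edges=E):
    return len(seq) >= 1 and all(e in edges for e in steps(seq))

def is_trail(seq, edges=E):
    """A walk in which all edges are distinct."""
    s = steps(seq)
    return is_walk(seq, edges) and len(set(s)) == len(s)

def is_path(seq, edges=E):
    """A walk in which all vertices are distinct (open paths only)."""
    return is_walk(seq, edges) and len(set(seq)) == len(seq)

def is_circuit(seq, edges=E):
    """A closed trail with at least one edge."""
    return len(seq) > 1 and seq[0] == seq[-1] and is_trail(seq, edges)

def is_connected(vertices=V, edges=E):
    """Breadth-first search from an arbitrary vertex must reach every vertex."""
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for e in edges:
            if u in e:
                (w,) = e - {u}
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return seen == set(vertices)
```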

3.2 Types and Hierarchies

A type is a label that represents an idea with an underlying perceived object or entity; these are type labels. The entities within the world are in relationship with each other. They can be axiomatic, that is, primitive and not made up of any other defining entities, or they can be defined, meaning that they are built of more than one axiom [125, 90]. The relationships can be seen as a hierarchy and can be broken down into two functions:


1. The function ctype maps a finite set of vertices called concept nodes onto a set, T_C, of type labels. Each type label in T_C is specified as axiomatic or defined. Examples of type labels from Figure 3.4 are Bird, Cat, Dog, etc.

2. The function rtype maps a finite set of vertices called conceptual relation nodes onto a set, T_R, of type labels. Each type label in T_R is specified as axiomatic or defined. Examples of these type labels from Figure 3.5 are member, works-with, etc.

A type hierarchy is a partially ordered set of type labels, T_H. Type hierarchies can be used to define membership in various categories of entities. An example of a four-level hierarchy can be seen in Figure 3.3.

Figure 3.3: A Type Hierarchy.

The levels are counted from the TOP of the hierarchy to the BOTTOM. In each level of the hierarchy, there are entities that are members of the hierarchy. These members are organized into a partial ordering, with the symbol ≥ used to designate the ordering from top to bottom of the hierarchy. An example of a partial ordering from Figure 3.3 would be TOP ≥ C ≥ D ≥ A ≥ L ≥ BOTTOM.

Members at the top of the hierarchy are considered to be more general; the members at the bottom are more specific. More general members in the partial ordering are said to subsume the more specific members, and the more specific members inherit information from the more general ones. As stated in MacGregor:

“a concept C subsumes a concept D if any individual satisfying the definition for D necessarily satisfies the definition of C” [[66] page 388].

Through this process of moving down the hierarchy to gain more specific information, concepts are classified based on a relationship known as subsumption [140].

As seen in these hierarchies, there is a partial ordering between the members. However, when this membership is extended such that for any two elements x and y of the ordered set L, the set {x,y} has both a least upper bound and a greatest lower bound, then a lattice exists [44]. When the elements are types, these are referred to as type lattices.


3.2.1 Concept Type Hierarchy

With these type lattices, the concept type hierarchies are organized into partially ordered hierarchies according to the level of generality of the types. Using a more concrete example, Figure 3.4 gives a set of labels {Animal, Mammal, Bird, Cat, Dog, Human}. If Mammal ≤ Animal then Mammal is called a subtype of Animal and Animal is called a supertype of Mammal, written Animal ≥ Mammal. If Cat ≤ Animal and Cat ≤ Mammal then Cat is called a common subtype, ∩, of Mammal and Animal. If Animal ≥ Mammal and Animal ≥ Cat then Animal is called a common supertype, ∪, of Mammal and Cat.

Figure 3.4: An Animal Concept Hierarchy.

Extending the type lattice definition as a type hierarchy plus the operators ∪ and ∩, it can be seen that the minimal common supertype of a and b, written a ∪ b, has the property that for any type t, if t ≥ a and t ≥ b, then t ≥ a ∪ b. The maximal common subtype of a and b, written a ∩ b, has the property that for any type t, if t ≤ a and t ≤ b, then t ≤ a ∩ b. In order to make the lattice complete, the labels ⊥ and ⊤ are introduced such that for any type t, ⊥ ≤ t ≤ ⊤. The levels from ⊤ to ⊥ in the hierarchy go from general to specialized for the types (e.g., Animal to Cat). Relationships that hold for all objects of a given type are inherited through the hierarchy by all subtypes.
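Subsumption and the two lattice operators can be sketched for the animal labels. The Python below is a minimal illustration under stated assumptions: the parent links are an assumed reading of Figure 3.4 (Cat, Dog, and Human under Mammal; Mammal and Bird under Animal), TOP and BOTTOM stand for ⊤ and ⊥, and every function name is hypothetical.

```python
# Immediate supertypes (parents) for each type label.
# The links are an assumed reading of Figure 3.4, with TOP and BOTTOM added.
PARENTS = {
    "TOP": [],
    "Animal": ["TOP"],
    "Mammal": ["Animal"],
    "Bird": ["Animal"],
    "Cat": ["Mammal"],
    "Dog": ["Mammal"],
    "Human": ["Mammal"],
    "BOTTOM": ["Cat", "Dog", "Human", "Bird"],
}

def supertypes(t):
    """All types s with s >= t, including t itself."""
    seen, stack = set(), [t]
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(PARENTS[cur])
    return seen

def subsumes(general, specific):
    """general >= specific in the partial order."""
    return general in supertypes(specific)

def subtypes(t):
    """All types s with s <= t, including t itself."""
    return {s for s in PARENTS if subsumes(t, s)}

def minimal_common_supertype(a, b):
    """a ∪ b: the common supertype that every other common supertype subsumes."""
    common = supertypes(a) & supertypes(b)
    return next(c for c in common if all(subsumes(d, c) for d in common))

def maximal_common_subtype(a, b):
    """a ∩ b: the common subtype that subsumes every other common subtype."""
    common = subtypes(a) & subtypes(b)
    return next(c for c in common if all(subsumes(c, d) for d in common))
```

Under these assumed links, Cat ∪ Dog is Mammal, Cat ∪ Bird is Animal, and Cat ∩ Dog is BOTTOM, since the two types share no subtype other than ⊥.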

3.2.2 Support

Per the definition given within Baget and Mugnier [5], a support is defined as a 4-tuple S = (T_C, T_R, I, τ) (see Figure 3.5 for an example). T_C and T_R are two partially ordered finite sets of concept types and relation types, respectively. T_R is partitioned into subsets of hierarchies, T_R^1 ... T_R^k, of relation types of arity 1 ... k where k ≥ 1, respectively. Both orders on T_C and T_R are denoted by x ≤ y, which means that x is a subtype (or specialization (see Section 2.2.2.4)) of y. I is the set of individual markers (or referents), and τ is a mapping from I to T_C.

As can be seen in Figure 3.5, T_C and T_R are written in more of a shorthand for type hierarchies; they do not include the ⊥ and ⊤ labels, even though they are implied. Also, as discussed above, the relation hierarchy is broken down into a set of hierarchies using the rtype function (as defined earlier in this section), noted by the arity of the relations within the hierarchy. The individual markers, J. and K., are mapped to the concept Researcher through the mapping function τ. The ‘...’ in each membership list indicates that there are more elements in each of these lists.

[Figure 3.5 shows the concept type hierarchy T_C (Office, Person, Project, Researcher, Manager, HeadOfProject) and the binary relation hierarchy T_R^2 (member, works-with, geographical-relation, in, near, adjoin), together with I = {J., K., ...} and τ = {(J., Researcher), (K., Researcher), ...}.]

Figure 3.5: Support Using a Relation Hierarchy (Based on [[5], Figure 1]).
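A support can be held as a small record with one field per component of the 4-tuple. The Python sketch below is illustrative only: the type labels are taken from Figure 3.5, but the parent links between them are assumptions, and the `Support` class with its methods is hypothetical rather than the representation used in this work.

```python
from dataclasses import dataclass

@dataclass
class Support:
    """S = (T_C, T_R, I, tau); each order is stored as child -> set of parents."""
    concept_types: dict      # T_C
    relation_types: dict     # T_R, keyed by arity: {k: {child: set_of_parents}}
    markers: set             # I
    tau: dict                # individual marker -> concept type

    def is_valid(self):
        """tau must map every individual marker to a known concept type."""
        return all(m in self.tau and self.tau[m] in self.concept_types
                   for m in self.markers)

    def conforms(self, marker, concept_type):
        """True when tau(marker) is concept_type or one of its subtypes."""
        seen, stack = set(), [self.tau[marker]]
        while stack:
            cur = stack.pop()
            if cur == concept_type:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(self.concept_types.get(cur, ()))
        return False

# Labels come from Figure 3.5; the parent links between them are assumptions.
support = Support(
    concept_types={
        "Office": set(), "Person": set(), "Project": set(),
        "Researcher": {"Person"}, "Manager": {"Person"},
        "HeadOfProject": {"Manager"},
    },
    relation_types={2: {
        "member": set(), "works-with": set(), "geographical-relation": set(),
        "in": {"geographical-relation"}, "near": {"geographical-relation"},
        "adjoin": {"near"},
    }},
    markers={"J.", "K."},
    tau={"J.": "Researcher", "K.": "Researcher"},
)
```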

3.3 FOL

First Order Logic, FOL, is a well understood form of symbolic reasoning pioneered by Boole, Frege, and C.S. Peirce [51]. Each sentence that appears in FOL contains a predicate and a subject in variable form. The predicate can either define or modify the subject, but the resolution of the predicate is defined only for the logical truth values, TRUE and FALSE. When these sentences are combined, they must adhere to the rules of Boolean algebra. These sentences “only have variables for first-order objects (and these expressions such as “∀x” and “∃x” apply only to the elements of a structure), so will be call a first-order language” [[35] page 9].


FOL is considered part of first order predicate calculus, FOPC, but for FOPC there is also a finite number of axioms. As restrictions are relaxed, and one looks at the full area of logic considered FOPC, predicates can extend beyond just TRUE and FALSE [12], where there are predicates that can be proven neither TRUE nor FALSE [50]. With FOPC, λ-expressions using the predicate axioms with an infinite sequence can be expressed. This allows the use of predicates such as exists, forall, iff, etc. Building up these axioms to represent sentence descriptions leads to set theory.

3.4 Conceptual Graphs

In his book [119], John Sowa states: “Conceptual graphs form a knowledge representation language based on linguistics, psychology, and philosophy” [[119] page 69]. The representation contains a graph, as defined in Section 3.1, and operates according to graph theory rules using graph diagrams that are built out of the logic building blocks of conceptual structures (see Section 2.3.2.3). The definitions for some of the blocks are presented beginning with the type block:

Definition 3.4.1 A type is a labeling for an abstract idea which is either a conceptual unit or a relationship. These types are members of a set, T, that may form several structures including hierarchy trees, lattices, and other related structures. When the structure is a type hierarchy lattice, the set is labeled T_C, and the function ctype maps a conceptual unit to the type label in the structure. When the structure is a relation hierarchy tree, the set is labeled T_R, and the function rtype maps a relationship to the type label in the structure.

A referent block would have the following definition:

Definition 3.4.2 A referent is an abstract conceptual unit that has been instantiated with a factual value.

Therefore, a conceptual graph, CG, applies the following definition:

Definition 3.4.3 A conceptual graph is a bipartite, connected, directed graph G = (V,E), such that V, all vertices in G, is partitioned into two disjoint sets V_C and V_R. The vertices are labeled, and the set V_C is called the concept nodes and the set V_R is called the conceptual relation nodes. Thus, e ∈ E is an ordered pair that connects an element of V_C to an element of V_R using a directed edge which will be called an arc.

The label of a concept node is a pair, c = <type, referent>. The type is an element of the set T_C, that may be defined in a type lattice (see Section 2.1.1). The referent (if present) contains the individual instantiation for the type field; however, if it is not present then c = <type, empty>, or just written c = <type>.


The label of a conceptual relation node is a pair, cr = <type, signature>, where type is an element of the set T_R, and the signature is a pair, s = <I, O>, where I is the set of arcs that are directed into the conceptual relation and O is the set of arcs that are directed out from the conceptual relation. The signature is further defined by its subset category of either relation or actor. The relation is a tuple, r = <type, c1, c2, ..., cn>, where type is defined above and in the signature I ⊆ V_C and O ∈ V_C. The number of concepts in the tuple is the valence of the relation. A conceptual relation of valence n is said to be n-adic, and all signatures must be at least 1-adic. The actor is a slightly different tuple, a = <type, c1, c2, ..., {..., cn−1, cn}>, where type is defined above and in the signature I ⊆ V_C and O ⊆ V_C.

Figure 3.6 shows a basic conceptual graph in traditional format with nine nodes.

Figure 3.6: Basic Abstract Conceptual Graph.

Figure 3.7 shows a conceptual structure with nine nodes in the mathematical digraph and bigraph format. Within the CS community, it is felt that the typical display format of Figure 3.6 makes the conceptual relationships easier to read and follow.

Figure 3.7: Basic Abstract Conceptual Graph in Digraph Format that is Bipartite.

In Figure 3.6, four nodes are concepts (seen in display mode as rectangles) and five nodes are relations (seen in display mode as ovals). In this example, V_C = {c1,c2,c3,c4} and V_R = {r1,r2,r3,r4,r5}. There is not a type hierarchy, but the four concepts are c1 = <C1>, c2 = <C2>, c3 = <C3>, c4 = <C4>, and the relations are r1 = <R1, <c1,c2>>, r2 = <R2, <c1,c3>>, r3 = <R3, <c1,c4>>, r4 = <R4, <c2,c4>>, r5 = <R5, <c3,c4>>. As can be seen, the “R1” relation has the signature <c1,c2>, which indicates that r1 is a 2-adic (or binary) relationship where c1 is the input concept, or the argument to the relationship, and c2 is the output concept, or the output for the relationship.
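Definition 3.4.3 and the example of Figure 3.6 can be put side by side in a small data structure. The Python sketch below is a hedged illustration: the `Concept` and `Relation` classes and the helper functions are hypothetical, only relations (not actors) are modeled, and the last three signatures are written with c4 on the assumption that the concept set holds exactly c1 through c4.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Concept:
    type: str                       # element of T_C
    referent: Optional[str] = None  # None stands for the empty referent

@dataclass(frozen=True)
class Relation:
    type: str        # element of T_R
    inputs: tuple    # names of concepts whose arcs enter the relation (I)
    output: str      # name of the concept the outgoing arc reaches (O)

    @property
    def valence(self):
        return len(self.inputs) + 1

# Concepts and relations as listed for Figure 3.6; the last three signatures
# use c4, an assumption made because V_C holds only c1..c4.
concepts = {f"c{i}": Concept(f"C{i}") for i in range(1, 5)}
relations = {
    "r1": Relation("R1", ("c1",), "c2"),
    "r2": Relation("R2", ("c1",), "c3"),
    "r3": Relation("R3", ("c1",), "c4"),
    "r4": Relation("R4", ("c2",), "c4"),
    "r5": Relation("R5", ("c3",), "c4"),
}

def arcs(rels):
    """Directed arcs; each joins one concept node and one relation node."""
    out = []
    for name, r in rels.items():
        out += [(c, name) for c in r.inputs]   # concept -> relation
        out.append((name, r.output))           # relation -> concept
    return out

def is_bipartite_cg(cons, rels):
    """Every arc must have exactly one endpoint in V_C and one in V_R."""
    return all((a in cons) != (b in cons) and (a in rels) != (b in rels)
               for a, b in arcs(rels))
```

A signature that names a concept outside V_C makes `is_bipartite_cg` fail, because the arc then has no endpoint among the concept nodes.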

Sowa has shown how unknown objects (nodes with no referent field) can be computed by an actor node [119]. Actor nodes (displayed as diamond-shaped boxes) are connected to concept nodes with dashed lines because an actor can best be thought of as a “functional relation”, where there is a semantics (performed by the procedure) being represented graphically between two objects. Figure 3.8 is a functional relationship, displayed with the diamond shape, between CATCHING and PERSON, CATCH, and BALL.

A functional relationship has directionality from one node to another, but there are both optional inputs and multiple possible outputs of conceptual information. In order to know how to handle the data being processed through this kind of relationship, an action function (see Figure 3.9 for an example) is attached to the relation. Therefore each functional relationship is called an actor in the conceptual graph representation, because it can perform actions on its data.

[Figure 3.8 node labels: PERSON, CATCH, BALL, AGT, PTNT, POSS, and the actor CATCHING.]

Figure 3.8: Basic Conceptual Graph with Actor.

void Catching(String person, String catchEvent, String &ball){
    // process in knowledge base so this person now has
    // possession of a specific ball
}

Figure 3.9: Action Function For Basic Actor Graph.

If this internal representation were to be used at a higher level for a logic knowledge representation, one would need to map the intension of the predicate to a type, and the extension of the predicate to a referent, in the internal conceptual structure representation.


3.4.1 Graph Theory Relationships

This conceptual structure encodes knowledge using concepts and conceptual relations. Concepts are “blocks” of typed information and conceptual relations are the “linkages” between the blocks. This knowledge is then transformed into a graphical structure such that a conceptual graph contains two kinds of nodes: concepts and relations. The lines are the arcs between these two kinds of nodes. It is this duality that makes the graph a bigraph, or bipartite graph. It should be noted that conceptual relations can also be of two kinds: direct relationships and functional relationships.

Also, unlike a general graph, the pairs of nodes defining the arcs are ordered, or directed. The arrows on the arcs show the directionality of information movement from one node to another. As can be seen in Figure 3.6, relation R2 is in a direct relationship from concept C1 to concept C3. R2 receives conceptual input data from concept C1, produces output data, and sends it to concept C3 (instantiates C3). Therefore, these nodes are connected in a triple relationship.

The walk for a conceptual graph must not only alternate between nodes and arcs, but the kind of the nodes must alternate between concepts and relations [136]. When a walk is just a trail, such as in Figure 3.6, an example trail would be C1 → R2 → C3 → R5 → C4. In a walk, since the arcs are incident to the nodes, each concept node has a relationship number to each relation node (i.e., C1 has the relationship number 1 to R2, and C3 has the relationship number 2 to R2). From now on, this relationship number will be referred to as the ith edge of the relation in respect to each linked concept.

For there to be a path in a graph, all the vertices must be unique. However, for a conceptual graph only the concept nodes must be unique [136]. If the path is closed, that is, a cycle, then the first concept, c1, would be equal to cn. Since a conceptual graph is directed, one can follow the arcs through the graph to create a path. A conceptual graph without a cycle, that does not contain a functional relation, is called a tree [136], and the path followed will lead to a leaf node. Examining Figure 3.6, a path reaching all the concept nodes (but not all the relations) would be C1 → R2 → C3 → R5 → C4 → R4 → C2 → R1 → C1. Note, R3 is not reached and C1 is repeated; therefore, this is a path, but not a tree.
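The alternation requirement and the relaxed uniqueness rule can be checked on the example sequences. The Python sketch below is a simplified reading, with hypothetical function names: it uses the relation signatures listed for Figure 3.6 (with C4 assumed in the last three) and ignores arc direction, since the closed path quoted above passes through some relations from their output side.

```python
# Relation -> linked concepts (position i is the "ith edge" of the relation),
# following the signatures listed for Figure 3.6, with C4 assumed in the last three.
LINKS = {
    "R1": ("C1", "C2"),
    "R2": ("C1", "C3"),
    "R3": ("C1", "C4"),
    "R4": ("C2", "C4"),
    "R5": ("C3", "C4"),
}

def is_cg_walk(seq):
    """Concepts and relations alternate, and each relation links its neighbours."""
    if len(seq) % 2 == 0:
        return False                   # must start and end on a concept
    for i, node in enumerate(seq):
        if i % 2 == 0:
            if node in LINKS:          # even positions must be concepts
                return False
        else:
            if node not in LINKS:      # odd positions must be relations
                return False
            if {seq[i - 1], seq[i + 1]} - set(LINKS[node]):
                return False           # relation does not link its neighbours
    return True

def is_cg_path(seq):
    """A walk whose concept nodes are unique; a closed path may return to its start."""
    if not is_cg_walk(seq):
        return False
    cons = list(seq[0::2])
    if len(cons) > 1 and cons[0] == cons[-1]:
        cons = cons[:-1]
    return len(set(cons)) == len(cons)

def relation_number(concept, relation):
    """1-based position of the concept among the relation's linked concepts."""
    return LINKS[relation].index(concept) + 1
```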

3.4.2 Formation Rules

Not every combination of concepts and conceptual relations makes sense in a meaningful way; therefore, conceptual graphs that do represent meaning will be considered well-formed, and other combinations with no meaning will be called ill-formed [118]. When working with well-formed CGs, three formation rules can be applied repetitively [118]:

1. Copy - An exact copy of a well-formed CG is well-formed.

2. Detach - All CGs that remain when any conceptual relation is removed from a well-formed CG are also well-formed.

3. Restrict - If a is a concept in a well-formed CG G, then for any concept c ≤ a from the concept type hierarchy (see Section 3.2) of G, the graph obtained by substituting c for a is well-formed.

Examples can be presented to show how each of these formation rules can be applied. The ‘copy’ formation rule is fairly straightforward: the graph G in Figure 3.7 can be copied to graph H in Figure 3.6, where both graphs are equivalent and well-formed, just displayed in a different way.

If one starts with the graph H in Figure 3.6 and performs two ‘detach’ formation rules, first removing conceptual relation R3 and second removing conceptual relation R5, then graph H′ shown in Figure 3.10 will be produced. The graph H′ is still well-formed even though two of the conceptual relations have been detached.

Figure 3.10: Basic Detached Conceptual Graph.

In order to explain the restrict rule clearly, graph H′, created with the detach formation rule applications in the paragraph above, and graph G shown in Figure 3.11 will be used in connection with a new concept type hierarchy shown in Figure 3.12.

Figure 3.11: Simple Basic Conceptual Graph.

Figure 3.12: Second Concept Type Hierarchy.

Graph G contains the nodes C5, R1, C2, R6, and C6, in which the C5 node can be restricted to C1 using the second concept type hierarchy (see Figure 3.12), because C1 is a subtype of node C5 (note: other restrictions could also be performed). This restriction will produce the well-formed graph in Figure 3.13.

Figure 3.13: Simple Restricted Basic Conceptual Graph.
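The three formation rules can be sketched as operations on a small graph record. The Python below is an illustration under several assumptions: a CG is held as a set of concept labels plus a map from relation labels to the concepts they link, which concepts R1 and R6 link in graph G is assumed, only the subtype fact stated in the text (C1 ≤ C5) is encoded, and detach keeps the remaining nodes in one record rather than splitting disconnected pieces. All names are hypothetical.

```python
import copy

# A CG is held as {"concepts": set of labels, "relations": {label: linked concepts}}.
# Graph H follows Figure 3.6 (C4 assumed in the last three signatures).
H = {
    "concepts": {"C1", "C2", "C3", "C4"},
    "relations": {
        "R1": ("C1", "C2"), "R2": ("C1", "C3"), "R3": ("C1", "C4"),
        "R4": ("C2", "C4"), "R5": ("C3", "C4"),
    },
}
# Graph G of Figure 3.11; which concepts R1 and R6 link is an assumption.
G = {
    "concepts": {"C5", "C2", "C6"},
    "relations": {"R1": ("C5", "C2"), "R6": ("C5", "C6")},
}
# Subtype -> supertype link stated in the text for Figure 3.12.
SUBTYPE_OF = {"C1": "C5"}

def copy_rule(g):
    """Copy: an exact, independent copy of the graph."""
    return copy.deepcopy(g)

def detach(g, relation):
    """Detach: remove one conceptual relation and its arcs; concepts remain."""
    out = copy.deepcopy(g)
    del out["relations"][relation]
    return out

def is_subtype(c, a):
    """c <= a, found by following the supertype links upward."""
    while c != a and c in SUBTYPE_OF:
        c = SUBTYPE_OF[c]
    return c == a

def restrict(g, a, c):
    """Restrict: substitute concept c for concept a, allowed only when c <= a."""
    if a not in g["concepts"] or not is_subtype(c, a):
        raise ValueError(f"cannot restrict {a} to {c}")
    swap = lambda x: c if x == a else x
    return {
        "concepts": {swap(x) for x in g["concepts"]},
        "relations": {r: tuple(swap(x) for x in cs)
                      for r, cs in g["relations"].items()},
    }
```

Detaching R3 and then R5 from H leaves R1, R2, and R4 with all four concepts, and restricting C5 to C1 in G yields the concept set {C1, C2, C6}.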

3.4.3 Simple Conceptual Graphs (SCGs)

Researchers M. Chein and M.-L. Mugnier [15] from the LIRMM group at the Université Montpellier and others [5, 22] have done research on a subset of conceptual graphs known as simple conceptual graphs, SCGs (see Sowa 3.1.2 [119]). As explained in Baget and Mugnier [5], these SCGs are connected, bipartite graphs where the arcs are labeled and finite but not directed, SG = ((V_c, V_r), U, λ). Figure 3.14 is an example of an SCG. V_c and V_r are the concept and relation nodes, respectively. U is the set of edges, where edges incident on a relation node are totally ordered (that is, they are numbered from 1 to the degree of the node). λ is a labeling function of the nodes and edges [75].

Figure 3.14: Simple Conceptual Graph (SCG).

Examining U further, an edge numbered i between a relation node r and a concept node c can be labeled by (r, i, c) and is unique in U; all edges within U will be stored in this triplet format. As an example, from Figure 3.14, (r2, 2, c1) would be an element of U.

Every node also has a label defined by the mapping of λ. A relation node's label is its (type(r), arity(r)) (defined in Section 3.2.2), and a concept node's label is its (type(c), marker(c)) (defined in Section 3.2.2). The directionality is removed to simplify the reasoning processing of the graphs. Because there is no directionality, there are no conceptual relations that are functional (this excludes actors).
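The triplet format for the edge set can be shown with a few lines of Python. In this sketch, only the edge (r2, 2, c1) is taken from the text; the other two edge numberings for Figure 3.14 are assumptions, and the helper names are hypothetical.

```python
# Edge set U of a simple conceptual graph as (relation, i, concept) triples.
# (r2, 2, c1) is quoted in the text; the other two numberings are assumed.
U = {("r2", 1, "c3"), ("r2", 2, "c1"), ("r2", 3, "c4")}

def arity(r, edges):
    """Degree of relation node r, i.e. the number of edges incident on it."""
    return sum(1 for rel, _, _ in edges if rel == r)

def neighbours(r, edges):
    """Concepts linked to r, listed in the total order given by the edge numbers."""
    return [c for _, _, c in sorted(e for e in edges if e[0] == r)]

def is_totally_ordered(edges):
    """For every relation, the edge numbers must be exactly 1..degree."""
    rels = {rel for rel, _, _ in edges}
    return all(
        sorted(i for rel, i, _ in edges if rel == r)
        == list(range(1, arity(r, edges) + 1))
        for r in rels
    )
```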

However, an extension of SCGs that does allow directionality and cycles (discussed further with the actual algorithms later) is feature term graphs, ω-terms, introduced by Ait-Kaci [2]. A conceptual graph G is a feature term graph if it obeys the following conditions [136]:

• the relations are all binary: for any relation r, only arg1(r) and arg2(r) are defined,

• the relations are functional: for any relations r and r′ ∈ A, arg1(r) = arg1(r′) and type(r) = type(r′) implies that r = r′, and

• there is a head concept h ∈ C such that for all c ∈ C there is a path (c1, r1, ..., rn−1, cn) with arg1(ri) = ci and arg2(ri) = ci+1 such that c1 = h and cn = c. Note that when n = 1 this includes the case c = h.
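The three conditions above translate into three small checks. The Python sketch below is one possible reading; the example relations and concept names are illustrative only and do not come from a figure in this work, and all function names are hypothetical.

```python
# Each relation is (type, arg1, arg2); names are illustrative only.
RELATIONS = [
    ("agent", "catch", "person"),
    ("patient", "catch", "ball"),
    ("owner", "ball", "person"),
]
CONCEPTS = {"catch", "person", "ball"}

def all_binary(relations):
    """Every relation carries exactly a type, arg1, and arg2."""
    return all(len(r) == 3 for r in relations)

def is_functional(relations):
    """No two distinct relations share both their type and their first argument."""
    keys = [(t, a1) for t, a1, _ in relations]
    return len(set(keys)) == len(keys)

def reachable_from(head, relations):
    """Concepts reachable from head by following arg1 -> arg2."""
    seen, stack = {head}, [head]
    while stack:
        cur = stack.pop()
        for _, a1, a2 in relations:
            if a1 == cur and a2 not in seen:
                seen.add(a2)
                stack.append(a2)
    return seen

def find_head(concepts, relations):
    """Return some concept from which every concept is reachable, or None."""
    for h in sorted(concepts):
        if reachable_from(h, relations) >= concepts:
            return h
    return None

def is_feature_term_graph(concepts, relations):
    return (all_binary(relations) and is_functional(relations)
            and find_head(concepts, relations) is not None)
```

In the illustrative data, `catch` serves as the head concept, and a second `agent` relation leaving `catch` would violate the functional condition.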

3.4.4 Conceptual Graphs Interchange Format (CGIF)

The conceptual graph interchange format (CGIF)¹ is a representation for conceptual graphs intended for transmitting CGs across networks and between IT systems that use different internal representations. The CGIF syntax ensures that all necessary syntactic and semantic information about a symbol is available before the symbol is used; therefore, all translations can be performed during a single pass through the input stream. Part of this information is reproduced in the appendices (see Appendix B) to give a concrete definition of a conceptual graph and to indicate how CGs were transmitted between the systems during the testing discussed later in this work.

¹ The current archived copy of CGIF from the ICCS 2001 workshop is located at: http://www.cs.nmsu.edu/~hdp/CGTools/cgstand/cgstandnmsu.html#Header_44


The CGIF format was originally developed by John Sowa for a possible International Standard [126]. It was then modified to the format seen at the CGTools Workshop that was held at the International Conference on Conceptual Structures in 2001 [97]. Since that time, it has been totally changed and incorporated, as Annex B, into a larger standardization effort known as the “International Standard for Common Logic” [33].

3.5 Data Structures

In order to evaluate the array and hash table data structures over the graph structure, one looks at how long it takes to store and retrieve a single relationship (see Definition 3.4.3 for a CG) within the graph given the specified data structure. Note that this does not examine or account for any support (see Section 3.2.2) or hierarchy (see Section 3.2) processing. The evaluation proceeds from the expectation that more retrievals than stores will be done on the knowledge base, so it is more important to optimize the retrieval of relationship elements of the graph than the time and space needed to store that information. Table 3.1 indicates the time to store and retrieve a relationship from a set of n relationships within a graph for certain data structures. The following sections define how these values were reached and any related constraints or constants.

Table 3.1: Execution Times For Single Element with Set of Size n.

Data Structure          Storage    Retrieve
Array (sorted)          O(n)       O(log(n))
Array (unsorted)        O(1)       O(n)
Hash Table              O(1)       O(1+α)
Perfect Hash (single)   O(n)       O(1)
Perfect Hash (double)   O(n^2)     O(1)

3.5.1 Arrays

When arrays are used for data structures, the time an array takes to store elements depends on whether the array of values is sorted. When the data is not sorted, but just appended to the end of the array, then the storage of data is very quick, O(1), but retrieving the data back can take as long as O(n) because one has to look through the whole array. When the array is kept sorted on storage, it can take O(n) time to place the data, but with a binary search over the sorted array it takes O(log(n)) time to retrieve it back.

However, if the sorted data is from a directed cyclic graph stored into a knowledge base structure, the execution time for storage is equal to the time needed to retrieve it back. This can be shown as follows: for building the array, the execution time for a single graph is O(n), where C * n = #vertices + #edges and #edges = 1/2 #vertices = 1/2 n, so C = 3/2 (see Cormen et al. [20]). For retrieving the element back from the graph (for example, doing a direct match), the whole array may again need to be checked, giving the execution time of O(n).
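The array rows of Table 3.1 can be illustrated with Python lists and the standard `bisect` module. This is a sketch of the general technique, not the implementation evaluated in this work; the relationship tuples and function names are illustrative.

```python
import bisect

def store_unsorted(arr, item):
    """O(1) amortized: append to the end."""
    arr.append(item)

def retrieve_unsorted(arr, item):
    """O(n): linear scan; returns the index or -1."""
    for i, x in enumerate(arr):
        if x == item:
            return i
    return -1

def store_sorted(arr, item):
    """O(n): the position is found in O(log n), but shifting elements is linear."""
    bisect.insort(arr, item)

def retrieve_sorted(arr, item):
    """O(log n): binary search over the sorted array; returns the index or -1."""
    i = bisect.bisect_left(arr, item)
    return i if i < len(arr) and arr[i] == item else -1

# Relationships stored as (relation, input concept, output concept) tuples.
rels = [("R2", "c1", "c3"), ("R1", "c1", "c2"), ("R5", "c3", "c4")]
unsorted_arr, sorted_arr = [], []
for r in rels:
    store_unsorted(unsorted_arr, r)
    store_sorted(sorted_arr, r)
```

The unsorted array preserves insertion order, while the sorted array pays the shifting cost on every store so that retrieval can use binary search.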

3.5.2 Hash Tables

Storage of an element in a hash table data structure has the expected storage time of O(1), plus the time it takes to compute the hash value for the key, h(k), and, in the case of a collision, to store the element in the secondary data structure. If the secondary structure is an unsorted linked list (the most common data structure [20]), then the element can just be placed at the head of the list. On retrieval, even when there are collisions between key hash values, the number of colliding elements is still far fewer than n, where n is the number of nodes in the graph. The hashing function will produce more than one value, so not all hash keys will collide. The expected time for retrieval with a hash table is O(1 + α), where α is the time to retrieve the element if there was a collision at storage, plus the time to compute the hash value for the key, h(k).
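A hash table with collision chains can be sketched as follows. This Python class is a hypothetical illustration of chaining in general, using the built-in `hash` as h(k) and Python lists in place of linked lists; the `load_factor` method reports the average chain length, which is the usual reading of α in analyses of chaining.

```python
class ChainedHashTable:
    """Hash table with collisions resolved by unsorted chains (Python lists here)."""

    def __init__(self, slots=8):
        self.slots = [[] for _ in range(slots)]
        self.count = 0

    def _h(self, key):
        return hash(key) % len(self.slots)

    def store(self, key, value):
        """Hash the key, replace an existing entry if present, else add at the chain head."""
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.insert(0, (key, value))
        self.count += 1

    def retrieve(self, key):
        """Expected O(1 + alpha): hash the key, then scan one chain."""
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def load_factor(self):
        """alpha = stored entries / slots, the average chain length."""
        return self.count / len(self.slots)
```

With ten entries in four slots the average chain holds 2.5 entries, still far fewer than n for any realistic table size.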

3.5.2.1 Perfect Hashing

In true perfect hashing, there are no collisions on key values, so retrieval time now becomes O(1) (note: there is a constant because of the execution of the hashing function to find the key) [24]. However, creation of the perfect hashing function given a set of dynamic input data can be costly on storage. There has been research on finding the perfect hashing function: a hash function description (“program”) for a set of size n occupies O(n) words and can be constructed in expected O(n) time [83]. Work has also been done on finding a universal hash function [131] or a quasi-perfect hash function [23] (as opposed to a perfect hash function) that can be constructed in time O(1 + α), where α is again not close to n.

3.5.2.2 Hash Table/Hash Tables

When a hash table is the value element of a hash table data structure, extra storage space is needed to hold the overhead of the second hash table. Because this hash table is embedded in another hash table with its own overhead, double the amount of overhead space is being used. However, if both hash tables are perfect hash tables, then the retrieval time for finding the sub-value becomes O(1) * O(1), or O(1) (constant). After the constant overhead retrieval time is accounted for, the retrieval time is O(1).

Besides the extra overhead space required, the time to store the double hash tables would be at most O(n^2). This assessment is reached by looking at two hash tables in which one table holds n elements and the other table holds m elements. Since m ≤ n, the evaluation can be performed using the size n. For Pagh's algorithm [83], discussed above, it was shown that storing perfect hash tables for one set of hashes takes O(n) time; therefore, storing two perfect hash tables would take O(n) time at each element in the first hash table to store the second hash table, or O(n^2) for both tables.
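A hash table whose values are themselves hash tables can be sketched with nested dictionaries. This is an illustrative assumption only: Python's dict is not a perfect hash table, and the function names store and retrieve and the keys used here are hypothetical.

```python
def store(kb, outer_key, inner_key, value):
    """Store value in the inner table held under outer_key."""
    # setdefault creates the inner table the first time outer_key is seen.
    kb.setdefault(outer_key, {})[inner_key] = value


def retrieve(kb, outer_key, inner_key):
    """Two expected-constant-time lookups, one per table."""
    return kb[outer_key][inner_key]
```

For example, after store(kb, "g1", "Ball", 7) on an empty dictionary kb, the call retrieve(kb, "g1", "Ball") performs one lookup in the outer table and one in the inner table.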


CHAPTER 4

REASONING OPERATIONS

This chapter first describes the operator ‘project’ and how it relates to the operator ‘join’. The second section describes graph isomorphism relationships, and the last part of the chapter describes how all these elements are connected within reasoning operations.

4.1 Operators

Using the knowledge representation described in Section 3.4, two operators, project and join, manipulate conceptual graphs using the rules that incorporate type hierarchy subsumption [48]. These operators are duals (i.e., intersection and union); therefore, the description of project is, in some sense, the dual of the description of join.

The following set of correspondences is sufficient to indicate how project and join compare:

Project ←→ Join

Min. Supertype ←→ Max. Subtype

Intersection ←→ Union


4.1.1 Project

The project operator is defined through a mapping π : u → v, where πu is a sub-element of v. When u and v are defined to be conceptual graphs, for graph u to be a subgraph of graph v, all of the nodes and arcs of u are in v [46], and the project operator π holds to the following rules [119, 136]:

• Type preserving: For each concept c in u, πc is a concept in πu where type(πc) ≤ type(c), and ≤ is the subtype relation. If c is an individual, that is, an actual instance of an object, then referent(c) = referent(πc).

• Structure preserving: For each conceptual relation r in u, πr is a conceptual relation in πu where type(πr) = type(r). If the ith edge of r is linked to a concept c in u, the ith edge of πr must be linked to πc in πu.
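The two rules can be read as a check on a candidate mapping. The sketch below is a hedged illustration and not the dissertation's algorithm: the graph encoding (dictionaries of concepts and relations), the parents dictionary for the type hierarchy, and the names is_subtype and is_valid_project are all assumptions made for this example.

```python
def is_subtype(t, s, parents):
    """Return True if t <= s: t is s, or s is an ancestor of t in the hierarchy."""
    stack, seen = [t], set()
    while stack:
        cur = stack.pop()
        if cur == s:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(parents.get(cur, ()))
    return False


def is_valid_project(pi, u, v, parents):
    """Check the type- and structure-preserving rules for a mapping pi from u into v.

    A graph is {"concepts": {id: (type, referent_or_None)},
                "relations": {id: (type, [concept ids in edge order])}}.
    """
    # Type preserving: type(pi c) <= type(c); individuals keep their referent.
    for c, (ctype, referent) in u["concepts"].items():
        if pi.get(c) not in v["concepts"]:
            return False
        image_type, image_referent = v["concepts"][pi[c]]
        if not is_subtype(image_type, ctype, parents):
            return False
        if referent is not None and referent != image_referent:
            return False
    # Structure preserving: same relation type, and the ith edge of pi r
    # is linked to the image of the concept on the ith edge of r.
    for r, (rtype, args) in u["relations"].items():
        if pi.get(r) not in v["relations"]:
            return False
        image_type, image_args = v["relations"][pi[r]]
        if image_type != rtype:
            return False
        if [pi[c] for c in args] != list(image_args):
            return False
    return True
```

A mapping passes only if every concept and relation of u has an image in v satisfying both rules; finding such a mapping, rather than checking one, is the hard part discussed in Section 4.2.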

The example in Figure 4.1 shows project with the general forms of graphs.

Figure 4.1: Project (Mp (Q, H) = P) (Adapted from [[92], Figure 3]). [Figure not reproduced: graph Q with nodes A, F, G is projected onto graph H with nodes A, B, I, J, giving graph P with nodes A, I, J.]

This example uses the hierarchy example (see Figure 3.3) from Section 3.2. One can see that the node A from the first graph, which will be called Q, is projected onto the second graph, which will be called H, with a match at its node A. This is the only exact match in the project. Then, using the hierarchy, F is the supertype of I, so when Q is projected onto graph H, I, the common subtype of I and F, forms a new node in the projection graph P, and this node is linked to A. Lastly, nodes G from graph Q and J from graph H have a common subtype of J, so that is formed as a new node in the projection graph P, giving the resulting project graph P = {V, E} where V = {A, I, J} and E = {{A, I}, {A, J}}. Note that, using this hierarchy, more than just this one project is possible.

If join (see Section 4.1.2) is likened to set union, in that all nodes not joined are just left alone and come along for the ride, then project is like set intersection. All nodes that are not projected are simply dropped from the resultant graph, and their associated relation nodes are detached.

4.1.2 Join

An elementary join is defined between two graphs, U1 and U2, that are not necessarily distinct. Let c1 and c2 be two concept vertices belonging respectively to U1 and U2 and having the same type or subtype. The result of the join of U1 and U2 is then U3, with the restriction (see Section 3.4.2) of concept c1 with c2 and with all the edges that had been linked to c1 now linked to c2 in U3 [15, 119].

In join MJ (see Figure 4.2), the labels may be restricted by replacement with a label of any subtype, and graphs will be merged on the maximum number of nodes. Figure 4.2 again uses the hierarchy example (see Figure 3.3) from Section 3.2. One can see that the node A from the first graph, which will be called Q, is joined with the second graph, which will be called H, with a match at its node A. This is again the only exact match in the join. Then, using the hierarchy, I is the subtype of D, so when Q is joined with graph H, D is restricted to I and I forms a new node in the join graph J, and this node is linked to A. Lastly, node K from graph Q is linked into the new join graph J, giving the resulting join graph J = {V, E} where V = {A, I, K, B, F} and E = {{A, I}, {A, K}, {A, B}, {A, F}, {I, F}}.
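The join example can be traced as a restriction followed by a set union. The sketch below is illustrative only. The edge sets of Q and H are not stated in the text, so the edges used in the usage note are assumptions chosen to reproduce the stated result, and the function names restrict and join are hypothetical.

```python
def restrict(edges, old, new):
    """Replace label old by label new in every edge (edges are frozensets of labels)."""
    return {frozenset(new if n == old else n for n in e) for e in edges}


def join(q_edges, h_edges, restrictions):
    """Apply the restrictions to Q, then merge Q and H on their shared labels."""
    for old, new in restrictions.items():
        q_edges = restrict(q_edges, old, new)
    edges = q_edges | h_edges          # unjoined nodes simply come along
    vertices = set().union(*edges)     # every label that appears in some edge
    return vertices, edges
```

Assuming Q has the edges {A, D} and {A, K}, and H has the edges {A, B}, {A, F}, and {I, F}, calling join with the restriction D → I yields V = {A, I, K, B, F} and the five edges listed above.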

Figure 4.2: Join (MJ (Q, H) = J) (Adapted from [[92], Figure 2]). [Figure not reproduced: graph Q with nodes A, D, K is joined with graph H with nodes A, B, I, F, giving graph J with nodes A, B, F, I, K.]

4.2 Graph and Subgraph Isomorphism

Table 4.1 shows how each of these problems and sub-problems falls within the problem classes discussed in basic subgraph isomorphism reasoning (see Section 1.2.1).

4.2.1 Graph Isomorphism

For two graphs to be identical, the vertices in G must map onto the vertices in H through a mapping f such that (x, y) is an edge of G iff (f(x), f(y)) is an edge in H, therefore giving isomorphic graphs. However, if the graphs are labeled, that is, the vertices have actual labels as opposed to variables, then given graph G = (Vg, Eg) and graph H = (Vh, Eh) such that they are identical, that is, (x, y) ∈ Eg iff (x, y) ∈ Eh, they can be defined to be isomorphic. As already stated in Section 1.2.1, graph isomorphism is in the problem class NP (see the first row of Table 4.1), even though there are known algorithms with a polynomial time solution when the graphs are labeled.
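For the labeled case, the condition (x, y) ∈ Eg iff (x, y) ∈ Eh can be checked directly, because the labels fix which vertex corresponds to which. The following is a minimal sketch assuming unique vertex labels and undirected edges; the function name labeled_graphs_identical is hypothetical.

```python
def labeled_graphs_identical(g_vertices, g_edges, h_vertices, h_edges):
    """Compare two uniquely labeled, undirected graphs without any search."""
    def normalise(edges):
        # An undirected edge (x, y) is the same edge as (y, x).
        return {frozenset(e) for e in edges}

    return (set(g_vertices) == set(h_vertices)
            and normalise(g_edges) == normalise(h_edges))
```

The check touches each vertex and edge a constant number of times on average, which is why no backtracking over candidate mappings is needed once the labels are unique.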

4.2.2 Subgraph Isomorphism

As discussed in Ullman’s paper of 1976 [132] and used in basic subgraph isomorphism reasoning (see Section 1.2.1), looking for all the isomorphisms between a given graph G = (Vg, Eg) and subgraphs of a further graph H = (Vh, Eh) allows the detection of related objects within the two graphs. This subgraph isomorphism helps to find if two structural patterns within the graphs are related.

Table 4.1: Related Problem Classes.

Problem Class | Graph Description | Graph to Graph (worst case time) | References
Graph Isomorphism | nodes - non-labeled; edges - undirected | NP | [42]
Subgraph Isomorphism | nodes - non-labeled; edges - undirected | NP-Complete | [42, 132, 77, 68]
Subgraph Isomorphism | nodes - labeled; edges - undirected, labeled | P (n^2) | [132, 77, 68]
Isomorphism | nodes - non-labeled; edges - undirected; graphs are both trees | P (n^2.5) | [103, 42, 111]
Subtree Isomorphism | nodes - non-labeled; edges - undirected; query graph is a tree | NP-Complete | [42]
Subforest Isomorphism | nodes - non-labeled; edges - undirected; query graph is a forest, search graph is a tree | NP-Complete | [42]
Subbipartite Isomorphism | nodes - bipartite, non-labeled except type; edges - undirected | NP-Complete | [38, 39, 40]
Projection | nodes - bipartite, non-labeled except type; edges - labeled, undirected | NP-Hard | [119, 48, 74, 22]
Proposed Projection | nodes - bipartite, labeled; edges - non-labeled, directed | NP-Hard | dissertation defined algorithm
Maximal Join | nodes - bipartite, non-labeled except type; edges - labeled, undirected | NP-Hard | [84, 119, 48, 74, 77]
Proposed Maximal Join | nodes - bipartite, labeled; edges - non-labeled, directed | NP-Hard | dissertation defined algorithm

4.2.2.1 Non-labeled nodes and undirected edges

This is where one wishes to discover if G1 contains a subgraph isomorphic to G2. For this class of problem, the vertices are general non-labeled nodes and the edges are non-labeled, undirected links. According to Garey and Johnson [42], and other references, this problem can be restricted to the known NP-Complete problem CLIQUE and therefore is NP-Complete. Per the reasoning given above for graph isomorphism, the complexity of matching graph G2 against all the graphs in a knowledge base is also NP-Complete.

4.2.2.2 Labeled nodes and undirected edges

This sub-problem of the subgraph isomorphism problem discussed above is shown by Ullman [132], and others, to be solvable in P (polynomial time). In their paper of 2000 [68], Messmer and Bunke show that dividing the subgraph question into two parts, 1) decomposing the graph and 2) querying the subgraph isomorphism question on the smaller graph, can improve the time complexity. In fact, producing unique labels for the decomposed graph parts (down to the single nodes) allows both parts to run in polynomial time.

4.2.3 Subtree Isomorphism

One of the sub-problems of subgraph isomorphism is subtree isomorphism. This is when both G and H are trees (“a tree is a connected acyclic graph” [[46] page 32]). A polynomial time algorithm for this sub-problem was shown by Reyner [103], with a running time of O(n1 * n2^1.5), where n1 is the number of vertices in the input graph and n2 is the number of vertices in the knowledge base graph. This polynomial time algorithm extends to m * O(n^2.5), where m is the number of graphs in the knowledge base and O(n^2.5) is the polynomial running time for the input graph times the knowledge base graph. In the P algorithm, n is the maximum number of nodes in the largest graph. It should be noted that Reyner’s algorithm used maximal matching in a bipartite graph and therefore considered the trees to be bipartite [103].

4.2.3.1 Hamiltonian Path

When the query graph, H, is a tree and the knowledge base graph, G, is unknown, then H contains a HAMILTONIAN PATH as a sub-problem and hence is NP-Complete [according to Garey and Johnson [42] page 104].

4.2.3.2 Subforest Isomorphism

When the knowledge base graph, G, is a tree, then the query graph, H, must be acyclic. If it is not a tree, then it may be a forest. However, Garey and Johnson [[42] page 105] show that this sub-problem is also NP-Complete.

4.2.4 Subbipartite Isomorphism

This sub-problem can be defined as a subgraph isomorphism search using bipartite graphs. This would be the class of problem most closely related to the reasoning operations projection and maximal join as defined in Sowa’s 1984 book [119]. This sub-problem of subgraph isomorphism answers the decision question: is there a subgraph of the knowledge base graph, G, that is isomorphic to the query graph, H, where G and H are bipartite graphs? According to the Eppstein 1994 work [38], this subgraph isomorphism question on bipartite graphs can be answered in the best case in polynomial time. This comes about because the number of edges is reduced through the relationship between the nodes and a natural set of labels that are added because of the types on the nodes. However, it should be stated that these labels are not totally unique, and therefore Ettinger [40] clearly states that, for the worst case running time over a whole knowledge base where the labels turn out to be duplicated across nodes, the problem is still NP-Complete. The labels may not be totally unique even though they are separated into two groups, because the labels in the nodes must only be of two different types; within a type, the label on all nodes may be the same. Therefore, this sub-problem can be reduced to the Maximal CLIQUE problem, which is known to be NP-Complete.

4.2.5 Projection

Projection involves both the subclass of problems defined above as subbipartite

isomorphism and a new subclass of problem that looks at defining rules for type lattice

(many times called ‘trees’) subsumption.

4.2.5.1 Historical Algorithms

The projection sub-problem can be considered constructive as well as isomorphic because of the way the rules are applied. The construction comes from the generalization that can be applied when a node, through the application of subsumption with the type lattice, is built into a new node of the output graph as part of the projection of an isomorphic subgraph. Therefore, the output of projection is not simply a logical true or false, but a newly constructed graph containing the subgraph structure from the knowledge base graph, with possible newly constructed nodes through the application of the subsumption rules. The nodes in the graph are only labeled to the same extent as the bipartite graphs described in the above class, so, when evaluating the running time of the algorithm, if no rules are applied from a type lattice, the running times are just the same as for a subgraph isomorphism using bipartite graphs. The output graph in this case is the subgraph from the knowledge base graph that was being projected onto.

However, the worst case running time when rules for the type lattice are applied must take into account that projection is a problem known to be in NP (Sowa [119], Hartley and Coombs [48], Mugnier and Chein [74], and Croitoru and Compatangelo [22]), and is constructive, so NP-hard.

4.2.5.2 Proposed Algorithm

This sub-problem of the projection given above is newly defined in this dissertation. It makes the following two modifications to the maximal projection problem: 1) all nodes are uniquely labeled, and 2) the edges are non-uniquely labeled, but do have some implicit labeling because they are directed. Tests were also performed using different data structures at implementation time. It is believed that, through the use of different data structures, the execution time will reflect the running time of the subgraph ‘labeled’ isomorphism problem as opposed to the subgraph isomorphism on bipartite graphs. The change in data structures also allows a change in how concepts versus conceptual relations are searched for within the graph structure through the use of the ‘labels’. Through this shift in sub-problem of subgraph isomorphism, there is an improvement in the running time for the first part of the projection problem (not having the application of rules from the type lattice).

However, because the overall problem is still constructive as opposed to a decision problem, and the application of type rules has a worst case running time in NP (Mugnier and Chein [74], and Croitoru and Compatangelo [22]), the worst case running time for this sub-problem is still NP-hard.

4.2.6 Maximal Join

Maximal join is a sub-problem of projection, in that, the maximal join algorithm

is a join on compatible projections. These projections are maximally extended from the

common generalization of two graphs which are bipartite graphs [119].

4.2.6.1 Historical Algorithms

In performing a join, the time complexity includes the time to find the subbipartite isomorphism(s) of the two graphs, and then the matching (or joining) of these projections, again in a constructive manner, to produce the largest extended constructed graph from the subgraph of the knowledge base graph with the query graph.

Graph matching can be reduced to a unification problem and, by doing so, in many cases where the graphs are acyclic, can be performed in linear time (Myaeng and Lopez-Lopez [77] and Paterson and Wegman [84]). Therefore, the overall complexity of a maximal join in the best case (when no type rules are applied in the projection) is still polynomial, O(n^4); however, in the worst case (when the projection does apply type rules) it is an NP-Hard problem.

4.2.6.2 Proposed Algorithm

Like the proposed new projection algorithm, this new algorithm is a sub-problem of maximal join with modifications. The modifications are unique labels on all the nodes of the graphs and non-labeled directed edges in the graph. Here, it is seen that the data structures being modified at implementation time again drive the projection to have the time complexity of a labeled subgraph isomorphism, without type lattice rules. These changes also drive the matching in the join to be linear even when the graphs are cyclic. Therefore, the best case time complexity is O(n^3). However, the worst case, with type rules being applied, is still NP-hard. During experimentation, it is hoped that it can be shown that the worst case is not reached very often.

4.3 Operations

There are two basic operations necessary for CG reasoning processes: 1) projection and 2) maximal join. These operations use the project and join operators, respectively, and apply the CG KB algorithms over them. These algorithms are based on the subgraph isomorphism class of problems defined in the section above.

4.3.1 Projection

A projection operation uses the project operator, which is a matching on a graph morphism; graph data structures with either the support information for SCGs or hierarchies when full CGs; and the actual projection algorithm. As stated in Baget and Mugnier, “the elementary reasoning operation, projection, is a kind of graph homomorphism that preserves the partial order defined on labels” [[5] page 428]. Projection uses not only a project operator (see its definition in Section 4.1.1), but also either the support S of the graph (when a SCG) or the defined type hierarchy (when CG), and produces a generalization subgraph. During the projection of the query graph onto the match graph, the match graph is generalized, and structure is removed by conceptual relations being detached [37].

For the rest of this work, the projection operation evaluation and comparison will be restricted to injective projection. The projection mapping is not necessarily one-to-one; that is, a concept or relation in u may have more than one concept or relation in v for which πu is a valid mapping. In this respect, there is more than one valid projection from u to v.

When the projection operation is performed using the query graph from Figure 4.3 (see footnote 1) onto the KB graph and hierarchy of Figure 4.4, the two projections discovered, P1 and P2, are displayed in Figure 4.5. Using the type hierarchy, both object and ball are matches; note that if no hierarchy were present, there would be only one projection. This is a simple injective projection because of the small graphs; however, it can become complex very quickly.

Figure 4.3: Query Graph. [Figure not reproduced: concepts Object and Color: blue linked by the relation prop.]

1 The figures in this section were generated by CharGer [32].

Figure 4.4: KB Graph with Type Hierarchy. [Figure not reproduced: the graph CubeBetweenBalls contains the concepts Object, Ball, Ball, Color: blue, and Cube: A and the relations prop, prop, between, and ontop; the type hierarchy is rooted at T and contains Object, Cube, and Ball.]

Figure 4.5: Projection Results. [Figure not reproduced: P1 links Object to Color: blue through prop; P2 links Ball to Color: blue through prop.]

4.3.2 Maximal Join

With conceptual graphs, the join operation is always maximal. Maximal join is therefore defined as: “a join on compatible (common) projections that are maximally extended from the common generalization, L, of two conceptual graphs, Q and G” (see Sowa 3.5.8 [119] page 102). The join is locally maximal because there may be more than one group of compatible projections from two graphs that are maximally extended (see Figure 4.2). In this way, structure is added or concepts are made more specific [37]. Since restrictions are allowed, it is clear that two nodes are join-able as part of a maximal join operation if they contain types that have a maximal common subtype using the support S (in the case of SCGs) or type hierarchy (for CG).
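The join-ability test on two types can be sketched as a search for maximal common subtypes in the type hierarchy. This is a minimal illustration under assumptions: the hierarchy is encoded as a hypothetical children dictionary, and the function names subtypes and maximal_common_subtypes are not taken from the dissertation.

```python
def subtypes(t, children):
    """All types <= t in the hierarchy, including t itself."""
    found, stack = set(), [t]
    while stack:
        cur = stack.pop()
        if cur not in found:
            found.add(cur)
            stack.extend(children.get(cur, ()))
    return found


def maximal_common_subtypes(a, b, children):
    """Common subtypes of a and b that are not strictly below another common subtype."""
    common = subtypes(a, children) & subtypes(b, children)
    return {t for t in common
            if not any(t != s and t in subtypes(s, children) for s in common)}
```

Assuming a hierarchy in which Object is a generalization of Ball and Cube, Object and Ball have the maximal common subtype Ball and are therefore join-able, whereas Ball and Cube share no subtype in this encoding and the result is empty.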

Papers ([15, 92, 91, 48]) contain many examples, but to clarify the maximal join operation three examples will be shown here. First, the two projections found in the previous example for the projection operation could be joined into a single graph because object is a generalization of ball and cube (see Figure 4.6). Basically, the Object concept from graph P1 would be restricted to concept Ball, and then relation R1 would be detached; this produces a graph that is just a copy of the graph P2. Because these two graphs could be fully joined into a completely compatible (common) graph, where there are no nodes that were not join-able, these are considered compatible projections. When graphs are specialized, they are maximally joined on compatible projections of a more general graph [119]; therefore, the joined graph from Figure 4.6 could then be joined back to the original graph seen in Figure 4.4 to produce parcel models. Within these models, the consideration that the second ball that is part of the ‘between’ relation is also colored blue will be shown.

Figure 4.6: Join of P1 and P2 Graphs. [Figure not reproduced: concepts Ball and Color: blue linked by the relation prop.]

The second example relates back to Figure 1.3 given in Section 1.2.2 of the Introduction Chapter (see Chapter 1). From that example it can be seen that the graph U is the common projection graph between graphs G1 and G2. When graphs G1 and G2 are maximally joined, this common graph becomes the merged nodes within the resulting graph G. In order for graph U to be the merging ‘piece’ between graphs G1 and G2, it is assumed that a hierarchy indicating that Girl ≤ Person is available information. Using this subtype is what allows the restrict rule to produce the available join.

The last example discussed to clarify the maximal join operation comes about when the graphs in Figures 3.10 and 3.11 (see Chapter 3 Section 3.4.2) are maximally joined. It has already been seen, within that section, that the graph in Figure 3.11 can be restricted and detached to produce the graph in Figure 3.13. Using the common graph seen in Figure 4.7, the graph J in Figure 4.8 is produced with just one step after restriction.

Figure 4.7: Common Graph of Basic Graphs. [Figure not reproduced: nodes C1, R1, C2.]

Figure 4.8: Join of Detached Basic and Simple Basic Graphs. [Figure not reproduced: nodes C1, C2, C3, C4, C6, R1, R2, R4, R6.]

4.3.3 Over Knowledge bases

As discussed in Section 1.2.1, all the subgraph isomorphism problems discussed so far are from a two graph perspective. However, for knowledge bases there may be more than one graph within the KB that will match the input (query) graph [68].

Looking at the operations above, when they are performed over a knowledge base of graphs G, even though the two graph operation in the typical situation can be solved in P, the functionality of the operation over the whole database gives the following results.

Projection’s functionality over a set of graphs G is:

projection: G × G → 2^G

As described above, there can be more than one valid projection between two graphs, hence the powerset notation on the set of all graphs G.

The functionality of maximal join over a set of graphs G is:

maximal join: G × G → 2^G

There can be more than one maximal join, hence the powerset notation on the set of all graphs G. Join is a binary operation, but multiple graphs can be joined by composing it with itself. Unfortunately, there is good reason to believe that join is not commutative when semantic considerations come into play [91], but for now it will be assumed that there is no problem.


CHAPTER 5

ALGORITHMS AND ANALYSIS

As discussed in Section 4.3.2, the maximal join operation is an algorithm that involves the joining of compatible projections that are maximally extended; however, not much analysis and implementation has been performed on the join operation. Therefore, in the first section on foundational algorithms, only projection algorithms will be explored. Later, when the newly developed algorithms are discussed, any variations on maximal extension of graphs and joining will be addressed.

5.1 Foundational Algorithms

In general, the matching part of both the projection and join algorithms is unification (discussed previously in section 1.2.2) [19], and there are known linear unification algorithms for acyclic (tree) graphs [84]. Also, SCGs have been evaluated as both graph homomorphism and graph isomorphism. In their original paper from 1992 [74], Mugnier and Chein looked at general projection running times and injective projection. However, CGs and SCGs are not necessarily trees, and only part of the algorithms presented next apply to injective projection, so these linear algorithms give guidance, but do not always directly apply.

As discussed in the Messmer and Bunke paper [68], a naive strategy with forward-checking for establishing a subgraph isomorphism is Ullman’s backtracking in search tree algorithm [132]. Since Messmer and Bunke feel that it is a common technique with a good baseline subgraph isomorphism algorithm, the Ullman algorithm and its known complexity (from [132, 68]) will be reiterated here to define a basis for investigating projection algorithms. The basic idea of Ullman’s algorithm is to take one vertex of the input vertices (query graph) at a time and map it onto a model (a graph from the KB) such that the resulting mapping represents a subgraph isomorphism for a subgraph of the model (KB graph) projected from the input graph (query graph) (see pages 307 and 322 of Messmer and Bunke [68]). If at some point the mapping being built does not represent a subgraph isomorphism, then the algorithm backtracks and tries a different mapping. This process is continued until all vertices v1, ..., vM in VI of the input graph are successfully mapped onto V of the model. This either produces a subgraph isomorphism from G to GI or stops when a vertex in VI cannot be mapped to at least one vertex in V. In the second case, the algorithm backtracks to a new v1 in V or vn−1 in V and tries to remap the subgraph isomorphism.

Even though this basic algorithm works well for small model and input graphs, it performs poorly as the graphs become larger. This is because all checks are being done locally. Ullman added a forward-checking procedure to detect when it is no longer possible for $v_n$ to be mapped onto an available vertex in $V_I$ (see page 322 in Messmer and Bunke [68]), so that the algorithm can backtrack immediately and save computational steps. In the best case Ullman’s algorithm is bounded by $O(N I M)$, where $N$ = # of model graphs, $I$ = # of labeled vertices in the input graph, which come from the $M$ set of labels, and $M$ = # of labeled vertices in the model graph that are unique. In the worst case the algorithm is bounded by $O(N I^{M} M^{2})$, where $N$ = # of model graphs, $I$ = # of vertices in the input graph that are not labeled, and $M$ = # of vertices in the model graph that are not labeled. With this general algorithm, labeling of vertices greatly improves the efficiency of the algorithm. However, it should be noted that this algorithm does not take into account any support or hierarchy knowledge information.
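The backtracking scheme with a forward check described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Ullman's matrix-based formulation: graphs are undirected adjacency dictionaries, labels must match exactly, only query edges are required to be preserved, and the names `subgraph_isomorphisms`, `compatible` and `forward_ok` are hypothetical.

```python
def subgraph_isomorphisms(q_adj, q_label, m_adj, m_label):
    """Yield injective maps (query vertex -> model vertex) that keep labels and query edges."""
    order = list(q_adj)  # fixed order in which query vertices are mapped

    def compatible(qv, mv, mapping):
        if q_label[qv] != m_label[mv]:
            return False
        # every already-mapped neighbour of qv must land on a neighbour of mv
        return all(mapping[n] in m_adj[mv] for n in q_adj[qv] if n in mapping)

    def forward_ok(mapping, k):
        # forward check: each still-unmapped query vertex keeps at least one free candidate
        used = set(mapping.values())
        return all(
            any(mv not in used and compatible(qv, mv, mapping) for mv in m_adj)
            for qv in order[k:]
        )

    def extend(mapping, k):
        if k == len(order):
            yield dict(mapping)
            return
        qv = order[k]
        for mv in m_adj:
            if mv in mapping.values() or not compatible(qv, mv, mapping):
                continue
            mapping[qv] = mv
            if forward_ok(mapping, k + 1):
                yield from extend(mapping, k + 1)
            del mapping[qv]  # backtrack and try the next model vertex

    yield from extend({}, 0)
```

Because labels prune candidates before any structural test, a query whose labels are rare in the model is rejected after very few steps, which is the effect of labeling noted above.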

5.1.1 SCG Projection

This section explains the projection algorithm found in Marie-Laure Mugnier and Michel Chein’s 1992 work [74]. Note that the base-level polynomial algorithm discussed is for SCGs in which the graph being projected has no loops (cycles), that is, trees; this is the foundation for improving projection between two SCGs with a support (see sections 3.4.3 and 3.2.2).

Before discussing the general and injective projection algorithms, some basic

definitions are given which will help the reader understand each algorithm. 1) Using

the projection operation provided in section 4.3.1, the following additional rules on

labels will be added to the graph morphism (from [74] page 240):

Definition 5.1.1 Given two simple conceptual graphs $G$ and $G'$, a projection $\Pi$ from $G$ to $G'$ is an ordered pair of mappings from $(R_G, C_G)$ to $(R_{G'}, C_{G'})$, such that:

(i) For all edges $rc$ of $G$ with label $i$, $\Pi(r)\Pi(c)$ is an edge of $G'$ with label $i$.

(ii) $\forall r \in R_G$, $type(\Pi(r)) = type(r)$; $\forall c \in C_G$, $type(\Pi(c)) = type(c)$.

There is a general projection from $G$ to $G'$ if and only if $G'$ can be derived from $G$ by the elementary specialization rules [119, 15].

2) The set of the numbers on edges between $r$ and $c$ (refer to section 3.4.3 on SCGs) holds the following definition:

Definition 5.1.2 For $c$ a neighbour of $r$, let $P_r[c]$ be the class of the partition of $P_r$ which corresponds to $c$.

3) Injective projection definition:

Definition 5.1.3 Injective projection is a restricted form of projection where the image of $G$ in $G'$ is a subgraph of $G'$ isomorphic to $G$.

The projection from a tree to a graph in the general case, as defined on pages 245-246 of Mugnier and Chein’s work [74], where there is a concept vertex $a$ in $T$ and a concept vertex $c$ in $G$, is given in Algorithm 5.1.


Algorithm 5.1 Π is a General Projection from T to G

 1: function PROJ-ROOT(a, E)                      ⊲ a ∈ C_T
 2:     E ← {c ∈ E | label(a) ≥ label(c)}
 3:     If E = ∅ or a is a leaf, return E
 4:     for all r successors of a do              ⊲ Move through the neighbours
 5:         for all c ∈ E do
 6:             W_{c,r} ← {r′ neighbour of c | type(r) = type(r′) and P_r[a] ⊆ P_{r′}[c]}
 7:         end for
 8:         E_r ← ∪ {W_{c,r}}_{c∈E}
 9:         E_r ← PROJ-r(r, E_r)
10:         for all c ∈ E do
11:             V_{c,r} ← W_{c,r} ∩ E_r
12:         end for
13:         E_r ← {c ∈ E | V_{c,r} ≠ ∅}
14:     end for
15:     return E                                  ⊲ Project of graph
16: end function

17: function PROJ-R(r, E)                         ⊲ r ∈ R
18:     E ← {r′ ∈ E | P_r is thinner than P_{r′}}
19:     If E = ∅ or |P| = 1, return E
20:     for all a_i successors of r do            ⊲ Move through the hierarchy
21:         E_i ← ∪ {c_{r′} | P_r[a_i] ⊆ P_{r′}[c_{r′}]}_{r′∈E}
22:         E_i ← PROJ-ROOT(a_i, E_i)
23:         E ← {r′ ∈ E | c_{r′} ∈ E_i}
24:     end for
25:     return E                                  ⊲ Projection up relation hierarchy
26: end function

For this general algorithm to compute the projection from $T$ to $G$, it is broken into two parts. The first function is used to determine the PROJ-ROOT part of the definition. As seen in line 4, the function looks for the projection from $T$ to $G$ by comparing the relation vertices connected to concept vertex $a$ in $T$ to the relation vertices connected to concept $c$ in $G$.

The second function at line 17 is used to determine the PROJ-r part of the definition. This function looks for possible mappings at each concept vertex by examining sub-trees. The complexity of this general algorithm, as proved on page 247 of Mugnier and Chein [74], is $O(m_T \times m_G)$, where $m$ denotes the number of edges. The problem class related to this algorithm is in the NP class of problems.
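The bottom-up idea behind the two mutually recursive functions can be illustrated with a much reduced Python sketch. It is not the Mugnier and Chein algorithm itself: concept and relation vertices are treated uniformly as typed nodes, the edge-number partitions $P_r$ are ignored, and the names `tree_projection_roots` and `is_subtype` are assumptions made only for this example.

```python
def tree_projection_roots(tree_children, tree_type, root, g_adj, g_type, is_subtype):
    """Return the vertices of G onto which the root of the tree T can be projected."""

    def candidates(a):
        # vertices of G whose type is a specialization of the type of a
        cands = {c for c in g_adj if is_subtype(g_type[c], tree_type[a])}
        for child in tree_children.get(a, ()):
            child_cands = candidates(child)
            # keep c only if one of its successors can host the child's subtree
            cands = {c for c in cands if g_adj[c] & child_cands}
        return cands

    return candidates(root)
```

Each tree edge is compared against the successor sets of the surviving candidates once, which is where a bound of the shape edges of $T$ times edges of $G$ comes from in this simplified setting.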

This should be recognized as a single graph-to-graph projection operator; with a projection operation (see section 4.3.1), an injective projection is necessary in order to produce the projection graph. If each graph is a tree, then one has a tree-to-tree projection, which is known to have a polynomial time algorithm [42], but conceptual graphs are not necessarily trees.

Therefore, Mugnier and Chein [74] modify their algorithm to the given Algorithm 5.2 to actually return the image of the new projected graph. Within this algorithm they use the function PROJ-r to continue to look for possible mappings at each concept vertex, but they modify the PROJ-ROOT routine to return the projection image. Even though this is a locally injective projection, on page 249 they prove that if $T$ is a conceptual tree and $G$ is a cyclic conceptual graph, then the decision problem being solved by this algorithm is still an NP-complete problem.


Algorithm 5.2 Π Modified as an Injective Projection from T to G

 1: function PROJ-ROOT(a, E)                      ⊲ a ∈ C_T
 2:     E ← {c ∈ E | label(a) ≥ label(c)}
 3:     If E = ∅ or a is a leaf, return E
 4:     for all r successors of a do              ⊲ Move through the neighbours
 5:         for all c ∈ E do
 6:             W_{c,r} ← {r′ neighbour of c | type(r) = type(r′) and P_r[a] = P_{r′}[c]}
 7:         end for
 8:         E_r ← ∪ {W_{c,r}}_{c∈E}
 9:         E_r ← PROJ-r(r, E_r)
10:         for all c ∈ E do
11:             V_{c,r} ← W_{c,r} ∩ E_r
12:         end for
13:         E_r ← {c ∈ E | V_{c,r} ≠ ∅}
14:     end for
15:     for all c ∈ E do
16:         Build the bipartite graph (A, B, U) such that:
17:             A = {sons of a}, B = {neighbors of c}
18:             (B can also be defined as ∪ {V_{c,a_i}, a_i ∈ A})
19:             U = {a_i v | v ∈ V_{c,a_i}}
20:         If this graph admits a matching with cardinality |A|,
21:             c is a solution
22:     end for
23:     return all c-vertices which are solutions of 22    ⊲ Projection of the subgraph
24: end function
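The test in lines 16 to 21 of Algorithm 5.2, whether the bipartite graph $(A, B, U)$ admits a matching of cardinality $|A|$, can be checked with a standard augmenting-path search. The sketch below is one common way to do this and is not taken from [74]; the function name `has_full_matching` and the dictionary encoding of $U$ are assumptions for illustration.

```python
def has_full_matching(candidates):
    """candidates maps each son a_i to the set V_{c,a_i} of images it may take.

    Returns True when every son can be given a distinct image.
    """
    match = {}  # image -> son currently assigned to it

    def try_assign(son, seen):
        for v in candidates[son]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current owner can be moved elsewhere
            if v not in match or try_assign(match[v], seen):
                match[v] = son
                return True
        return False

    return all(try_assign(son, set()) for son in candidates)
```

Distinct images for all sons of $a$ is exactly what makes the projection locally injective at $c$.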

5.1.2 SCG Relation Projection

Madalina Croitoru’s new projection algorithm is based on SCGs as described in her two 2004 papers [22, 21]. This algorithm starts from the foundational algorithm by Mugnier and Chein [74] given in section 5.1.1. The decision question associated with this new algorithm is the same as was stated in the Mugnier and Chein 1992 work [74] and is in the class of problems that are NP-complete. The significant change applied to Algorithm 5.2 is to split the algorithm into two parts and to add a preprocessing algorithm that, for each graph pair, looks for a matching graph as defined by Definition 4.1 (in [22], page 8). Before defining the matching graph, some added definitions are needed:

Definition 5.1.4 1) λ is a labeling of the nodes of a SCG graph G with

elements from the support S (see Section 3.2.2).

2) d is the degree (or arity) of each node in the SCG graph G.

3) N denotes the neighbour sets for the relation node (see Section 5.1.1).

Now for the actual definition:

Definition 5.1.5 Let $SG = (G, \lambda_G)$ and $SH = (H, \lambda_H)$ be two SCGs without isolated concept vertices defined on the same support $S$.

The matching graph of $SG$ and $SH$ is the graph $M_{G \to H} = (V, E)$ where:

- $V \subseteq V_R(G) \times V_R(H)$ is the set of all pairs $(r, s)$ such that $r \in V_R(G)$, $s \in V_R(H)$, $\lambda_G(r) \geq \lambda_H(s)$ and for each $i \in \{1, \ldots, d_G(r)\}$, $\lambda_G(N^i_G(r)) \geq \lambda_H(N^i_H(s))$.

- $E$ is the set of all 2-sets $\{(r, s), (r', s')\}$, where $r \neq r'$, $(r, s), (r', s') \in V$ and for each $i \in \{1, \ldots, d_G(r)\}$ and $j \in \{1, \ldots, d_G(r')\}$ such that $N^i_G(r) = N^j_G(r')$ we have $N^i_H(s) = N^j_H(s')$.


These matching graphs indicate which relation vertices should be used as potential candidates for projection, thereby reducing the search space for the related search problem. By using this preprocessing with the matching graphs, the projection of $G \to H$ in its reduced form belongs to a class of problems in which finding the maximum clique can be solved in polynomial time [22]. Therefore, the execution of the algorithm gives a polynomial time algorithm for the NP-hard search problem.
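To make the construction of the matching graph concrete, the following Python sketch builds $V$ and $E$ directly from the two conditions of Definition 5.1.5. It is an illustration only: the encoding of a relation vertex as `name -> (type, ordered neighbours)`, the requirement that paired relations have equal arity, and the names `matching_graph` and `geq` are assumptions, not part of [22].

```python
from itertools import combinations

def matching_graph(g_rels, h_rels, g_label, h_label, geq):
    """Build (V, E) for relation dictionaries of the form name -> (type, neighbours)."""
    V = [
        (r, s)
        for r, (r_type, r_nbrs) in g_rels.items()
        for s, (s_type, s_nbrs) in h_rels.items()
        if len(r_nbrs) == len(s_nbrs)
        and geq(r_type, s_type)
        and all(geq(g_label[a], h_label[b]) for a, b in zip(r_nbrs, s_nbrs))
    ]

    def consistent(p, q):
        (r, s), (r2, s2) = p, q
        if r == r2:
            return False
        r_nbrs, s_nbrs = g_rels[r][1], h_rels[s][1]
        r2_nbrs, s2_nbrs = g_rels[r2][1], h_rels[s2][1]
        # a concept shared by r and r2 in G must map to a concept shared by s and s2 in H
        return all(
            s_nbrs[i] == s2_nbrs[j]
            for i, a in enumerate(r_nbrs)
            for j, b in enumerate(r2_nbrs)
            if a == b
        )

    E = {frozenset((p, q)) for p, q in combinations(V, 2) if consistent(p, q)}
    return V, E
```

In this encoding, a set of pairwise adjacent vertices containing one pair $(r, s)$ for every relation $r$ of $G$ is a consistent assignment of the relation vertices, which is why the clique view applies.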

5.1.3 Polyprojection

This is Mark Willems’s algorithm explaining polyprojection and how it relates

to a CG projection algorithm from his 1995 paper [136]. A polyprojection (from Defi-

nition 5 in [136], page 282) is:

Definition 5.1.6 Consider two (conceptual) graphs $G = (C, R, type, referent, arg_1, \ldots, arg_m)$ and $G' = (C', R', type', referent', arg'_1, \ldots, arg'_m)$. A polyprojection $\mu$ from $G$ to $G'$ is a pair of Cartesian product subsets $\mu_C \subseteq C \times C'$ and $\mu_R \subseteq R \times R'$ that are:

1. Type preserving: for all concepts $c \in C$ and $c' \in C'$, $c \, \mu_C \, c'$ only if $type(c) \geq type'(c')$, and $referent(c) = *$ or $referent(c) \geq referent'(c')$,

2. Type preserving: for all relations $r \in R$ and $r' \in R'$, $r \, \mu_R \, r'$ only if $type(r) \geq type'(r')$,

3. Structure preserving: $\mu_R \circ arg'_i = arg_i \circ \mu_C$.

4. Non Empty: for all concepts $c \in C$ there is a concept $c' \in C'$ such that $c \, \mu_C \, c'$.

It is said that $G'$ is structurally similar to $G$ if there is a polyprojection $\mu$ between $G'$ and $G$, and this will be written $G' \mu G$.

Given this definition, Willems goes on to show that a polyprojection can be found by a polynomial algorithm. The algorithm is divided into two parts: the first part computes steps 1 and 2 from Definition 5 and finds the structure to be Type-preserving(G, G′) (see Algorithm 1 in [136], page 283); the second part computes step 3 from Definition 5 and finds a polyprojection through the use of Structure-preserving(M) (see Algorithm 2 in [136], page 284), where $M_0 \subseteq$ Type-preserving$(G, G')$, and determines a pair of sets $M \subseteq M_0$ that is structure-preserving; that is, $M = (\{(c_1, c'_1), \ldots, (c_n, c'_m)\}, \{(r_1, r'_1), \ldots, (r_o, r'_p)\})$, where $n$ = # of concept vertices in $G$, $m$ = # of concept vertices in $G'$, $o$ = # of relation vertices in $G$, and $p$ = # of relation vertices in $G'$. The actual execution time of the algorithm is not given; the only statement is that it is a polynomial result.
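Since the text of Willems's two algorithms is not reproduced here, the following Python sketch shows only one plausible reading of the two-part scheme: first collect all type-preserving pairs, then repeatedly delete pairs that violate the structure condition until nothing changes. Referents are ignored, relations are encoded as `name -> (type, argument tuple)`, and the names `polyprojection`, `supported` and `geq` are hypothetical.

```python
def polyprojection(g_concepts, g_rels, h_concepts, h_rels, geq):
    """Return (mu_c, mu_r) or None; concepts are given as name -> type."""
    # Part 1 (steps 1 and 2): type-preserving candidate pairs.
    mu_c = {(c, d) for c in g_concepts for d in h_concepts
            if geq(g_concepts[c], h_concepts[d])}
    mu_r = {(r, s) for r, (r_type, r_args) in g_rels.items()
            for s, (s_type, s_args) in h_rels.items()
            if len(r_args) == len(s_args) and geq(r_type, s_type)}

    def supported(c, d):
        # each use of c as i-th argument of r needs a paired s whose i-th argument is d
        for r, (_, r_args) in g_rels.items():
            for i, a in enumerate(r_args):
                if a == c and not any(
                    (r, s) in mu_r and s_args[i] == d
                    for s, (_, s_args) in h_rels.items()
                ):
                    return False
        return True

    # Part 2 (step 3): prune to a fixed point.
    changed = True
    while changed:
        changed = False
        for r, s in list(mu_r):
            if any((a, b) not in mu_c for a, b in zip(g_rels[r][1], h_rels[s][1])):
                mu_r.discard((r, s))
                changed = True
        for c, d in list(mu_c):
            if not supported(c, d):
                mu_c.discard((c, d))
                changed = True

    # Step 4: every concept of G must keep at least one partner.
    if all(any(c == c2 for c2, _ in mu_c) for c in g_concepts):
        return mu_c, mu_r
    return None
```

Each pass removes at least one pair or stops, and there are at most $|C||C'| + |R||R'|$ pairs, so a pruning loop of this kind is polynomial, consistent with the statement above.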

The algorithm described above is reminiscent of the one given in Reyner’s work [103] (see page 284 in [136]). Therefore, if both $G$ and $G'$ are trees, the polyprojection $G \mu G'$ is a projection of $G$ onto $G'$ by Corollary 8 (see page 283 in [136]). Willems goes on to state in Theorem 10 (see page 285 in [136]) that if there is a polyprojection $T \mu G'$ where $T$ is a tree, then there is a projection $T \to G'$. This is significant because Garey and Johnson on pages 104-106 of [42] indicate that the sub-problem of sub-tree isomorphism called sub-forest isomorphism is NP-complete. The sub-forest isomorphism problem is: given two graphs $G$ and $H$, determine if $H$ is isomorphic to a subgraph in $G$, where $G$ is required to be a tree but $H$ is a forest. However, in this case $H$ may be a cyclic graph, and given that $G$ is a tree, a polynomial time algorithm can be determined. Willems’s result that a polynomial time algorithm can be found for detecting the structure of a projection graph helped in the design of the new algorithm seen in section 5.2.2.

5.1.4 Notio Projection

The Notio project is a conceptual graph implementation with a well defined API [117]. It is currently being used by several projects [30, 10, 99] for working with basic reasoning operations with a CG KB. Algorithm 5.3 is this author’s theoretical algorithm, derived from the Notio implementation code [117, 115] for Southey’s injective projection algorithm (note: Southey never wrote any analysis papers or documentation on the actual implemented algorithm).

It should be noted that for Algorithm 5.3 the vertices are all labeled, but the edges are directed. Also, for the analysis of the execution times that follows, the following variable definitions hold:

Definition 5.1.7 Variable definitions:

$|M_c|$ = # of concepts in the KB graph

$|M_r|$ = # of relations in the KB graph

$|Q_c|$ = # of concepts in the query graph

$|Q_r|$ = # of relations in the query graph

$|Q_e|$ = # of edges in the query graph

$|N|$ = # of graphs in the KB

$|KB_c|$ = # of concepts in the whole KB

As can be seen in the stated algorithm, in step 1 Notio collects all the concept and relation vertices from both the KB graph and query graph. This takes $O(|M_c| + |M_r| + |Q_c| + |Q_r|)$. In step 2 Notio attempts to see if any of the concept vertices from the KB graph maps to a concept vertex in the query graph, in this way attempting to see if there is any possible subgraph isomorphism of the KB graph onto the query graph. In the best case this step is bounded by $O(|M_c||Q_c|)$; for the worst case by $O(|M_c||Q_c||KB_c|)$; and expected by $O(|M_c||Q_c||\log(KB_c)|)$. In step 13 Notio (if a possible mapping was indicated from step 2) will attempt to match all the relation vertices from the KB graph (along with their neighboring concepts along their edges) onto query graph vertices with the same edge relationships. As a match is found for relation vertices in the query graph, only those relation vertices are now examined. At the end of this step, it is checked that all relation vertices for the query graph were mapped. In the best case this step is bounded by $O(|M_r||Q_r||M_c||Q_c| + |Q_e|)$, with the arity being binary (so it is just a constant).


Algorithm 5.3 Notio Projection

 1: Get all concept and relation vertices from the KB and Query graphs
 2: for i ← 0, numfirstconcepts do                ⊲ all concepts in KB graph
 3:     for j ← 0, numsecondconcepts do           ⊲ all concepts in Query graph
 4:         foundmatch ← false
 5:         if (type(c_i) == type(c_j)) || (supertype(c_i) == type(c_j)) then
 6:             if (individ(c_i) == individ(c_j)) || (individ(c_j) == ∅) then
 7:                 foundmatch ← true             ⊲ match all concepts in query graph
 8:             end if
 9:         end if
10:
11:     end for
12: end for
13: if foundmatch == true then
14:     for i ← 0, numfirstrelations do           ⊲ all relations in KB graph
15:         for j ← 0, numsecondrelations do      ⊲ all relations in Query graph
16:             if (!relation[j].mapped) && (type(r_i) == type(r_j)) then
17:                 if match from r_j to match to each of its concepts then
18:                     relation[j].mapped = true ⊲ repeat line 2 for all
19:                 end if
20:             end if
21:
22:         end for
23:     end for
24:     foundmatch ← true
25:     for j ← 0, numsecondrelations do
26:         if !relation[j].mapped then
27:             foundmatch ← false
28:         end if
29:     end for
30: end if
31: if foundmatch == true then
32:     P ← build new subgraph projection
33:     return P                                  ⊲ return new projection
34: else
35:     return ∅                                  ⊲ no projection returned
36: end if


In the worst case step 13 is bounded by $O(|M_r||Q_r||M_c||Q_c||KB_c| + |Q_e||Q_r|)$ when the arity of the query graph is fully connected; and expected by $O(|M_r||Q_r||M_c||Q_c||\log(KB_c)| + |Q_e|)$, again with the arity being binary and only having to traverse the hierarchy a number of times equal to its height. In step 31, if a projection is found, it is returned.

Therefore the leading step in the overall running time is step 13, so the best case for finding a projection for all $|N|$ graphs in the KB would have a lower bound of $O((|M_r||Q_r||M_c||Q_c| + |Q_e|)\,|N|)$. Therefore, when the number of graphs in the KB is small, the number of vertices in the KB graphs is small, and the number of vertices in the query graph is small, the execution time would move towards $O(n^3)$, where $n$ = avg # of nodes in the KB graphs. As the KB grows in size and as the number of vertices in the KB graph and query graph increases, the expected run-time becomes explosive even though it is not out of P. However, the worst case bound for the whole KB is very close to the worst case bound given for Ullman’s algorithm above: $O((|M_r||Q_r||M_c||Q_c||KB_c| + |Q_e||Q_r|)\,|N|)$.

5.2 New Algorithms

After examining the above algorithms it was discovered that even though the running times were acceptable with small graphs and small numbers of graphs, the actual algorithms were either not truly general, as with SCG, or had very poor execution times with large data sets. With an SCG set of graphs, the user was confined by what parts of a valid conceptual graph could be present in the data. The desire to allow the user to give a directed, connected, bipartite conceptual graph (see Definition 3.4.3) that was cyclic and contained actors prompted new projection and maximal join algorithms to be designed.

5.2.1 Supporting Information

In order to produce new algorithms, new data structures and supporting routines

were needed. Because the author believes that the connection between the algorithm

and data structures in the KB is critical, the new data structures and variables need to

be designed around the actual supporting routines.

5.2.1.1 Variables and Given values

Evaluating all the past projection algorithms, and looking at the data structures used for each knowledge base, the author has discovered that handling conceptual graphs as triples, as opposed to vectors or linked lists, makes the operation of projection much easier and cleaner to process. This author is not the first researcher to think about using triples. Kabbaj and Moulin in 2001 [58] looked at CG operations using a bootstrapping step. It was at this time that they also looked at defining the join operation using triples as part of the matching data structure. Even as recently as 2006, Skipper and Delugach [113] looked at using triples again in the data structure for the storage of graphs. However, in both cases, they did not look at exploiting the triples in the actual algorithm of the operation.

All conceptual graphs in the KB and the query graph are stored not only with the general conceptual graph information, but also with a C-R-C list and C-A-C list in a cs-triple format. Their definitions are given below:

• cs-triple is a 3-tuple, $T = \langle c_i, b, c_j \rangle$, where $c_i, c_j$ are concept nodes, and $i$ and $j$ are not equal. $b$ is a conceptual relation (either a relation or actor node), $(c_i, b) \in E$ and $(b, c_j) \in E$, and $c_i$ and $c_j$ are members in the signature of $b$.

• defining labels are all elements in a data structure that hold a unique label; that includes concepts, relations, actors, and cs-triples

• c-r-c list is a concept-relation-concept list that holds cs-triple information in which the ‘b’ in the 3-tuple is a relation node

• c-a-c list is a concept-actor-concept list that holds cs-triple information in which the ‘b’ in the 3-tuple is an actor node
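As an illustration of the definitions above, the following Python sketch derives a c-r-c list and a c-a-c list from the directed edges of a bipartite conceptual graph. The representation is an assumption made for this example (it is not the storage format of the experimental systems): nodes are strings, `node_kind` maps each node to `"concept"`, `"relation"` or `"actor"`, and the names `CSTriple` and `build_triple_lists` are hypothetical.

```python
from collections import defaultdict, namedtuple

CSTriple = namedtuple("CSTriple", ["source", "link", "target"])

def build_triple_lists(edges, node_kind):
    """Split directed edges (u, v) of a bipartite CG into c-r-c and c-a-c triples."""
    into_link = defaultdict(list)    # relation/actor node -> concepts pointing at it
    out_of_link = defaultdict(list)  # relation/actor node -> concepts it points to
    for u, v in edges:
        if node_kind[u] == "concept":
            into_link[v].append(u)
        else:
            out_of_link[u].append(v)

    crc_list, cac_list = [], []
    for link, sources in into_link.items():
        target_list = crc_list if node_kind[link] == "relation" else cac_list
        for ci in sources:
            for cj in out_of_link.get(link, []):
                if ci != cj:  # the two concepts of a cs-triple must differ
                    target_list.append(CSTriple(ci, link, cj))
    return crc_list, cac_list
```

Keeping the direction inside the triple (source, link, target) is what lets later matching steps compare directionality without consulting the edge set again.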

During the performance of the projection operation, two added data structures are used. One data structure, called the match list, holds the matching possibilities of the query concepts with the KB graph concepts, and the second structure, called the anchor list, holds the matching triples from the KB graph for each concept in the query graph. These data structures improve performance by making preprocessed information available at the time of creating and building the actual projection graphs. The implementation of these data structures will be defined when describing the experimentation systems (see Section 6.4).

5.2.1.2 Actual Supporting Routines

Because the conceptual information is the structural foundation of a conceptual graph and because the relationships between the concepts define the meaning of the graph, the new supporting routines defined in Algorithms 5.4, 5.5 and 5.6 have been designed around the cs-triple relationship of C-R-C. The main supporting routines are: MatchHierarchy, MatchConcept, MatchConcepts, MatchTriple, and Projection. They are the foundation behind the projection operation, and these routines will help in determining the projection operation’s worst case and typical case execution time.

5.2.1.3 Worst Case Analysis for Support Routines

Using the support routines defined in Algorithms 5.4, 5.5 and 5.6, the worst case execution time will be evaluated.

MatchHierarchy:

The type hierarchy is depicted as a tree of relationships such that the maximum depth of the tree is just all concepts from the top, ⊤, to the bottom, ⊥. Therefore, in the worst case the time to match to the given input concept type is the time to traverse the whole tree, which is linear, or $O(n)$.
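The linear bound can be seen in a small Python sketch of the supertype check. It assumes, for illustration only, that the hierarchy is stored as a child-to-parent dictionary with a single parent per type; the name `is_supertype` is hypothetical, and a type is treated as a supertype of itself.

```python
def is_supertype(parent_of, candidate, concept_type):
    """Walk from concept_type toward the top of the hierarchy looking for candidate."""
    current = concept_type
    while current is not None:
        if current == candidate:
            return True
        current = parent_of.get(current)  # None once the top has been passed
    return False
```

When the hierarchy degenerates into a single chain from ⊤ to ⊥, the loop visits every type once, which is the $O(n)$ worst case.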


Algorithm 5.4 Supporting Projection Routines

 1: function MATCHHIERARCHY(q_i, n_j)             ⊲ q ∈ Q and n ∈ G
 2:     foundmatch = false
 3:     if check flag for supertype then
 4:         check to see if q_i is a supertype of n_j    ⊲ check up hierarchy
 5:         if q_i is supertype of n_j then
 6:             foundmatch = true
 7:         end if
 8:     else
 9:         check to see if q_i is a subtype of n_j      ⊲ check down hierarchy
10:         if q_i is subtype of n_j then
11:             foundmatch = true
12:         end if
13:     end if
14:     if foundmatch = true then
15:         add to match list
16:         return n_j                            ⊲ return n_j as a match
17:     else
18:         return NULL                           ⊲ return NULL as no match
19:     end if
20: end function                                  ⊲ Check if concept match in hierarchy

21: function MATCHCONCEPT(q_i, n_j)               ⊲ q ∈ Q and n ∈ G
22:     if check match list for q, n match then
23:         return n_j                            ⊲ return n_j as a match
24:     else
25:         if type(q_i) == type(n_j) then
26:             M ← {q_i, n_j} as match
27:             return n_j                        ⊲ return n_j as a match
28:         else
29:             return MatchHierarchy(q_i, n_j)   ⊲ Check if match in hierarchy
30:         end if
31:     end if
32: end function                                  ⊲ Check if concepts match

33: function MATCHCONCEPTS(q_i, G)                ⊲ q ∈ Q and G ∈ KB
34:     for each n_j ∈ L, where j = 1 to c(G) do  ⊲ L is a list in G
35:         C ← MatchConcept(q_i, n_j)
36:     end for
37:     return C        ⊲ All matching concepts from KB graph to Query graph concept
38: end function


Algorithm 5.5 Supporting Projection Routines (Cont1)

 1: function MATCHTRIPLE(t_a, s_b, directionp)    ⊲ t ∈ Q, s ∈ G and
 2:                                               ⊲ directionp is a BOOLEAN
 3:     if (directionp == true) && ((direction from t_a) == -1) then
 4:         match ← false
 5:     end if
 6:     match ← Compare relation type of t_a to relation type of s_b
 7:     if match == true then
 8:         match ← Compare MatchConcept(t_a,c_b, s_b,c_b) != NULL
 9:     else
10:         match ← false
11:     end if
12:     if ((match == true) && (directionp == false)) then
13:         match ← Compare (direction from t_a == direction from s_b)
14:     else
15:         match ← false
16:     end if
17:     if match == true then
18:         return true                           ⊲ Indicate two triples are a match
19:     else
20:         return false                          ⊲ No triple match
21:     end if
22: end function

MatchConcept:

This routine must first check to see if the query concept, $q_i$, is found in the match list of the match concept, $n_j$, and in the worst case this takes time $O(c * m)$, where $c$ is the number of concepts in the query graph, $Q$, and $m$ is the number of concepts in the match graph, $G$. If this check fails, the next step is to compare $q_i$ and $n_j$ for a match in both concept type and referent. This is a constant time operation. If this succeeds, then adding to the match list is in the worst case $O(c * m)$; if not, then the worst case running time will be $O(n)$, which is the height of the type hierarchy tree. Overall, the total worst case running time for this routine would be $O(c * m + n)$.

Algorithm 5.6 Supporting Projection Routines (Cont2)

 1: function PROJECTION(i, W, G, Pset)            ⊲ i, W ∈ Q
 2:     t ← Number of elements in the q_i list ∈ W
 3:     z ← Size of Pset
 4:     if (i == 1) then
 5:         for each s_a ∈ q_i, where a = 1 to t do
 6:             Pset ← AddNewProjection(s_a, G, Pset)            ⊲ Starts Projection Graph
 7:         end for
 8:     else if (t == 1) then
 9:         s_1 ← only element of q_i list
10:         AddToExistingProjection(s_1, G, Pset)   ⊲ Add to existing Projection Graph
11:     else
12:         Pset′ ← ∅
13:         for each s_a ∈ q_i, where a = 1 to t do
14:             Pset′ ← ProcessProjection(s_a, G, Pset, Pset′)   ⊲ Process Proj Triple
15:         end for
16:         Pset ← Pset ∪ Pset′
17:     end if
18:     return Pset               ⊲ Return created and modified Projection Graphs
19: end function

MatchConcepts:

This routine will process all the concepts in the match graph, $G$, where $m$ is the number of concepts in $G$. Since the routine MatchConcept is called to process each concept and its worst case running time is known to be $O(c * m + n)$, the total worst case time for this routine would be $O(m * (c * m + n))$.


MatchTriple:

Within this routine, the driving step would be step 11. This step of the algorithm would call the routine MatchConcept, whose worst case running time is known to be $O(c * m + n)$. Therefore, in the worst case this routine would also be $O(c * m + n)$.

Projection:

This is the routine for creating and building the new projection graphs where there is a structural match after finding the matching cs-triples between the two graphs. Within this routine are three major steps that depend on the processing of the anchor list: 1) when the concept is the first in the anchor list; 2) when there is only one related matching triple for the concept in the anchor list; and 3) when neither of the first two conditions holds. The driving section of the algorithm in this routine is this third type of processing. As can be seen at step 11 of the algorithm, this step calls the routine ProcessProjection. ProcessProjection checks whether a new projection graph has to be started by copying an existing projection, or whether the current cs-triple being processed in the for loop can simply be added to an existing projection graph. The easier of the two functions is to add to an existing projection, but time must be taken to find which projection graph to add to; from the algorithm it can be seen that this takes time $z$, which is the size of Pset or the # of projections.

The more complex modification would be to copy an existing projection graph in order to add the new cs-triple being processed. It was just seen that adding a cs-triple takes time $z$, but at each time step within this processing there is also the time needed to copy a projection graph, which will be called $d$, times the number of projection graphs that must be copied, which is $t$. Therefore the worst case time for this step would be $O(z * t * d)$. It should be noted that the size of Pset, which is $z$, would be growing much faster than the time needed for copying, $d$; therefore, $d$ can be dropped from the running time, leaving $O(z * t)$.

There is a relationship between $t$, the number of triple matches for this concept in the query graph, and $z$, the size of Pset; that is, in the worst case $z = t^{i-1}$. During the processing of this routine, if all triple matches lead to a new projection graph, then the number of projection graphs currently in Pset will be the number of all triple matches currently processed from the anchor list, or $t^{i-1}$. On replacement of $z$, one gets a new worst case running time of $O(t^{i-1} * t)$, or just $O(t^{i})$.
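The substitution can be written out as a single chain, together with a small numeric instance. The values $t = 3$ and $i = 4$ are chosen only for illustration and do not come from the experiments.

```latex
\[
  z = t^{\,i-1}
  \quad\Longrightarrow\quad
  O(z * t) \;=\; O\!\left(t^{\,i-1} * t\right) \;=\; O\!\left(t^{\,i}\right)
\]
\[
  t = 3,\; i = 4:\qquad z = 3^{3} = 27, \qquad z * t = 27 * 3 = 81 = 3^{4}
\]
```

So if every one of the $t$ triple matches for each of the first $i - 1$ anchor concepts spawns a new projection graph, processing the $i$-th concept touches $t^{i}$ graph and triple combinations in the worst case.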

5.2.2 New Projection

As seen in Algorithm 5.7, the new projection of the query graph onto the KB is based on looking at all triples that are in the query graph and checking for a complete subgraph match of the query graph onto the KB graph. Because each triple in the query graph is unique, even if the node type is not, all projections can be found in the KB graph.


Algorithm 5.7 New Projection

 1: function NEWPROJECTION(Q, KB)                 ⊲ Query and KB graphs
 2:     P = ∅
 3:     for each G ∈ KB do                        ⊲ All graphs in KB
 4:         W ← A list from Q                     ⊲ Preprocessing
 5:         for each q_i ∈ W, where i = 1 to c(W) do
 6:             if ((M ← MatchConcepts(q_i, G)) > ∅) then
 7:                 for each n_j ∈ M, where j = 1 to M do
 8:                     match = false
 9:                     for each t_a ∈ Q do
10:                         ⊲ where a = 1 to the # of cs-triples in crc list for q_i
11:                         for each s_b ∈ G do
12:                             ⊲ where b = 1 to the # of cs-triples in crc list for n_j
13:                             if MatchTriple(t_a, s_b, true) == true then
14:                                 add (n_j, (s_b, t_a)) to q_i ∈ W
15:                                 match = true
16:                             end if
17:                         end for
18:                     end for
19:                     if match == false then
20:                         break out of loop and start next graph in KB
21:                     end if
22:                 end for
23:             else
24:                 break out of loop and start next graph in KB
25:             end if
26:         end for
27:         Pset = ∅                              ⊲ Projection processing
28:         for each q_i ∈ W, where i = 1 to c(W) do
29:             Pset = Projection(i, W, G, Pset)
30:         end for
31:         P ← P ∪ Pset
32:     end for
33:     return P                                  ⊲ Set of projections from query onto KB
34: end function


5.2.2.1 Actual Algorithm

The overall algorithm (see Algorithm 5.7) for the projection of the query graph onto the KB checks for a complete subgraph match of the query graph onto the KB graph during preprocessing. Because each triple in the query graph is unique, even if the node type is not, all projections can be found in the KB graph. Then, after all matches of conceptual units and triples are found, the actual projection graphs are built. However, because the temporary data structures are saved from the preprocessing, matching does not have to happen again at build time. The actual projection just uses the match list and anchor list already created to build up or create the new projection graphs. Because the anchor list contains all available projections, both injective and non-injective (homomorphism) projections are found.

5.2.2.2 Execution Time

Now that the algorithm is split into two sections, there are two running times: one for answering the decision question of whether or not there is a projection, which will be called the matching algorithm, and one for the actual projection. For the new algorithms, three modifications have been made that affect the execution time of the projection operation: 1) all nodes and triples are uniquely labeled, 2) the edges are not labeled, but do have implied labeling through their directionality within the triples, and 3) the triples are not only part of the data structure of the KB, but also directly affect the actual projection algorithm. When doing an injective projection, the labeling drives the execution time of the matching algorithm toward the running time of a subgraph ‘labeled’ isomorphism problem, which can be solved in polynomial time, as opposed to a straight subgraph isomorphism problem, which is known to be NP-complete. The triples allow the matching algorithm to stop sooner when no projection is possible.

For the actual projection creation, the number of triples in the query graph drives the amount of time needed for the actual projection. The size of the graphs in the KB affects the base of the execution time, but the number of times the Projection function is executed is based on the number of triples in the query graph.

5.2.2.3 Worst Case Analysis for Projection

The actual projection operation algorithm is broken down into two steps: Preprocessing (mapping of concepts) and Projection (structural build of new projection graphs). Within the preprocessing step, the ‘forward’ concepts from the query graph, H, that are in the anchor list, W, are unified (or matched) to concepts in the match graph, G (see 9 and 11). Because in the worst case the number of ‘forward’ concepts in H is equal to the total number of concepts in C minus 1, from now on in this analysis the number of elements in W will be seen as the number of concepts in H. Since in the worst case the number of concepts in H is equal to the number of concepts in G, the number of concepts in H will be called m. For the rest of the preprocessing step, there are four nested For Loops, each connected to the value of m. Two of the four loops are executed m times with constant time internal processing. The second For Loop, at step 26, involves a call to MatchConcepts, which has already been seen to have the worst case running time of O(m ∗ (c ∗ m + n)). Assuming in the worst case that c = m, expansion gives O(m^3 + mn), or just O(m^3) because n can never be greater than m. The fourth For Loop calls the routine MatchTriple, which has the worst case running time of O(c ∗ m + n), or O(m^2) by the previous reasoning. This would give a worst case running time for the matching processing of O(m^8).
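The simplification of the two support-routine bounds quoted in this analysis can be written out step by step. The block below only restates the substitution c = m and the bound n ≤ m given in the text; it does not attempt to rederive the overall bound for the nested loops.

```latex
\begin{align*}
O\bigl(m\,(c\,m + n)\bigr)
  &= O\bigl(m\,(m \cdot m + n)\bigr) && \text{assuming } c = m \\
  &= O(m^{3} + m\,n) \\
  &= O(m^{3}) && \text{since } n \le m \text{ implies } m\,n \le m^{2} \le m^{3}, \\[4pt]
O(c\,m + n)
  &= O(m^{2} + n) && \text{assuming } c = m \\
  &= O(m^{2}) && \text{since } n \le m \le m^{2}.
\end{align*}
```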

The actual projection part loops around the support routine Projection. This routine was discussed as having the worst case running time of O(t^i), where i = m when called from NewProjection. Given that the actual projection will loop through all m concepts, in the worst case the actual projection is O(m ∗ t^m). Therefore, for the overall NewProjection algorithm, the worst case is driven by the building of the actual projection, with its exponential factor in the number of concepts in the query graph.

5.2.3 New Maximal Join

As described in the Maximal Join operation section (see Section 4.2.6), more than one node (or group of nodes) can be joined between two graphs. When these joins happen, the two graphs are composed into a new graph with possibly more information than the original input graph. However, joining the input graph across the KB to produce maximal join graphs is not commutative [90] when semantic considerations come into play. As with the projection algorithm, the overall algorithm (see Algorithm 5.8) is split into two parts.

Algorithm 5.8 New Maximal Join

1: function NEWMAXIMALJOIN(I, KB) ⊲ Input and KB graphs
2:   J = ∅
3:   for each G ∈ KB do ⊲ All graphs in KB
4:     foundmatch = false
5:     W ← A list from I ⊲ Preprocessing
6:     for each q_i ∈ W, where i = 1 to c(W) do
7:       if ((q_i != null) && ((X ← MatchConcepts(q_i, G)) > ∅)) then
8:         foundmatch = true
9:         for each n_j ∈ X, where j = 1 to X do
10:          for each t_a ∈ I do
11:            ⊲ where a = 1 to the # of cs-triples in crc list for q_i
12:            for each s_b ∈ G do
13:              ⊲ where b = 1 to the # of cs-triples in crc list for n_j
14:              if MatchTriple(t_a, s_b, false) == true then
15:                add (n_j, (s_b, t_a)) to q_i ∈ W
16:              end if
17:            end for
18:          end for
19:        end for
20:      end if
21:    end for
22:    Jset = ∅ ⊲ Join processing
23:    if foundmatch == true then
24:      for each n_j ∈ M, where j = 1 to |M| do
25:        Jset = MaximalJoin(j, W, I, G, Jset)
26:      end for
27:    end if
28:    J ← J ∪ Jset
29:  end for
30:  return J ⊲ Set of joins from input onto KB
31: end function


In this new algorithm the matching algorithm (checking for possible joins) happens first, and the actual joining of the two graphs to build the new maximal join graph is performed second. This work will proceed as future work, using this algorithm as the starting point.

5.3 Typical Scenario Analysis for Projection Algorithms

Unlike the worst case just analyzed for the projection algorithms, in a typical query sent to a query-answer system the query graph is much smaller than the graphs in the knowledge base [100]. This comes about because the user is trying to find a specific piece of data. Looking at the “blocks world” domain area (later to be tested on implemented systems, as seen in Chapter 7), one has a knowledge base of graphs that represent blocks on a table. The user wishes to know information like “Is there a red block in the graph?” or “Is there a blue block above a red block?”. These are very small graphs compared to the graphs in the knowledge base, which describe all the blocks on a table, their characteristics, and their relationships to each other. Blocks world is a well known planning problem [100].

5.3.1 Projection Algorithms using SCG

Both SCG injective projection algorithms, that of Mugnier and Chein and that of Croitoru, have a direct tie between the matching part of the algorithm and the building part of the algorithm. Also, both algorithms are built from the relation perspective; relation nodes are typically fewer than concept nodes.

5.3.1.1 SCG Projection

On evaluation of the injective projection algorithm by Mugnier and Chein, given a typical scenario of a much smaller query graph on few graphs in the KB, the execution time is still bound by the fact that the matching of relations and their related concepts and the building of the image structure are not separated. Therefore, searching for and building the projection in this typical scenario has to match every relation from the query graph onto all relations in the match graph at every iteration. However, if there is no match, the structure of the subgraph does not have to be checked any further from that root evaluation. When the typical scenario is very small and the support depth is shallow, this algorithm performs well, but it quickly degrades as the number of valid projections and the support depth increase, because the match is re-evaluated each time.

5.3.1.2 SCG Relation Projection

Croitoru's algorithm has a preprocessing phase that looks for matches, and then executes the build phase separately based on the number of relations in the query graph. By doing the matching through the search space in the preprocessing phase, the number of relations from G that are candidates for projection is pruned. Therefore the execution time for the building of the projection graph in this typical scenario is O(q_r × g′_r), where q_r = # of relations in Q and g′_r = the # of relations from G that were viable candidates.

5.3.2 Notio Projection

This typical case would match up to the analysis of the lower bound, O(n^3), for Notio as discussed in Section 5.1.4. Notio does the matching and building of the projection in the same step without pruning the tree. However, Notio only finds a single projection because all relations within a graph must be unique. Even though Notio can work over full CGs, this constraint does reduce the search space during the Notio algorithm execution.

5.3.3 New Projection

With this typical case, the new projection algorithm moves toward the best case results possible from the algorithm. To evaluate the typical case, first the support routines will be evaluated and then the new projection algorithm will be examined.

5.3.3.1 Typical Case for Support Routines

Using the support routines defined in Algorithms 5.4, 5.5 and 5.6, the typical case can be given a foundation by first examining these routines:


MatchHierarchy:

The type hierarchy is depicted as a tree of relationships such that the maximum depth of the tree is just all concepts from the top, ⊤, to the bottom, ⊥; in a typical case, however, the tree is broad and its depth is normally log(n), where n is the number of concepts in the type hierarchy. Therefore, in the typical case the time to match to the given input concept type is O(log(n)).
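The depth argument can be made concrete with a small Python sketch. It is not the dissertation's MatchHierarchy routine (Algorithm 5.4 is not reproduced here): the parent-pointer dictionary, the function name `match_hierarchy`, and the example type names are hypothetical. The number of loop iterations is bounded by the depth of the candidate type, which is the quantity the typical-case argument takes to be log(n) for a broad tree.

```python
def match_hierarchy(parent, query_type, candidate_type):
    """Walk from candidate_type toward the top of the hierarchy.

    Returns (matched, steps): matched is True when query_type is reached,
    that is, when candidate_type is query_type itself or one of its subtypes;
    steps counts the parent links followed.
    """
    current, steps = candidate_type, 0
    while current is not None:
        if current == query_type:
            return True, steps
        current = parent.get(current)  # None once the top has been passed
        steps += 1
    return False, steps

# Hypothetical tree-shaped type hierarchy stored as child -> parent links.
parent = {
    "Entity": None,
    "Block": "Entity",
    "Table": "Entity",
    "RedBlock": "Block",
    "BlueBlock": "Block",
}

print(match_hierarchy(parent, "Block", "RedBlock"))
print(match_hierarchy(parent, "Block", "Table"))
```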

MatchConcept:

In the typical case the only step that would not be constant time would be matching to the hierarchy. Since it was just shown that this running time is O(log(n)), the running time for this routine would be the same.

MatchConcepts:

Since the routine MatchConcept is called to process the concepts, and its typical case running time is known to be O(log(n)), the total time for this routine would be O(m ∗ log(n)).

MatchTriple:

Again within this routine, the driving step would be step 11. This step of the algorithm calls the routine MatchConcept, whose running time was shown to be O(log(n)), which would also be this routine's typical running time.


Projection:

Since it was seen in the worst case analysis that this routine's running time is connected to the number of triples in each element of the anchor list, if only one match is available for a query graph concept then only one projection would be produced, and the running time for this routine becomes linear in the number of concepts in the query graph.

5.3.3.2 Typical Case for New Projection Algorithm

In a typical query-answer scenario, the query graph would normally contain one to four triples, compared to possibly a thousand in the KB graph, and this algorithm takes into account that the query graph is small. Because of that, the time to process thousands of graphs in a KB is only multiplied by a constant based on the maximum number of triples in a KB graph that the small query graph is projected onto.

The preprocessing part is again based on the number of concepts in the query graph. However, for a typical scenario these would be few; probably not more than eight concepts. Now if the four For Loops are evaluated, two of the loops become constant time. The second For Loop, at step 26, involves a call to MatchConcepts, which has a running time of O(m ∗ log(n)). The fourth For Loop calls the routine MatchTriple, with a running time of O(log(n)). Since, as stated before, n would never be greater than m, this would give a typical case running time for the matching processing of O(m^2 ∗ log^2(m)).
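To get a feel for the gap between the two matching bounds quoted in this chapter, the short Python sketch below evaluates them for the small query sizes the typical scenario assumes. It is only an arithmetic illustration: constant factors are ignored, the typical-case bound is read as m² · (log m)², and base 2 is assumed for the logarithm; none of these choices come from the text.

```python
import math

def worst_case_matching(m: int) -> int:
    """Worst-case matching bound from the text, m^8, with constants ignored."""
    return m ** 8

def typical_case_matching(m: int) -> float:
    """Typical-case matching bound, read here as m^2 * (log2 m)^2."""
    return (m ** 2) * (math.log2(m) ** 2)

# Query sizes up to eight concepts, the upper end of the typical scenario.
for m in (2, 4, 8):
    print(m, worst_case_matching(m), typical_case_matching(m))
```

Even at eight concepts the two expressions differ by several orders of magnitude, which is the shift toward polynomial behaviour that the typical-case discussion relies on.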


The actual projection part of the algorithm is multiplicative in the number of projections available with this query graph. Since in the most common case there is only one projection, the actual projection creation algorithm becomes polynomial (in fact linear, as seen in the Projection routine analysis).

The preprocessing part now becomes the driving step in the algorithm and shifts the execution of the problem to one that is polynomial. Through this shift in search problem performance, the running time for the projection operation for a typical scenario within a query-answer application shows improvement.


CHAPTER 6

SYSTEMS/ENVIRONMENTS AND IMPLEMENTATIONS

This chapter discusses each example system's basic features, as well as how it is used in one or more of the previously defined knowledge representations, ontology elements and ADTs.

6.1 Semantic Network Systems

For each semantic network system, the good points/features will be brought forward along with the drawbacks of each system. An attempt will be made to define each of these good and bad features in a factual way.

6.1.1 KL-ONE

The KL-ONE language was originally formulated in Ron Brachman's Ph.D. dissertation at Harvard [66]. It was built into a system at Bolt Beranek and Newman (BBN) by Woods and Schmolze [141].

The KL-ONE system was designed originally around the classic frame system. As stated earlier, “frames” could be defined as a knowledge representation type all to itself, but for this work they are classified as a sub-type of semantic networks. Typically a frame will include an “isa” or “ako” pointer to a more general frame from which additional slots can be inherited [141]. KL-ONE forms a taxonomy hierarchy out of multiple links of this type, therefore forming a partial ordering of concepts for inheritance. Taxonomies were discussed in Section 3.2.

KL-ONE is made up of concepts, roles, and fillers. Structured concepts are elements standing in specific relationships to each other [141]; roles are the entity names for the relationships; and fillers are the structural conditions of the roles. Concepts are represented in the semantic network by ovals, roles by circled squares, and structural conditions by double ovals attached to a diamond shaped lozenge (see Figure 6.1).

Figure 6.1: A KL-ONE Diagram of a Simple ‘Blocks-World’ Arch (Based on [141], Figure 1). [Diagram not reproduced; labels in the diagram include Block, Arch, Noncontact, Support, Supporters, Supported Objects, Lintel (# = 1), Upright (# = 2), and V/R.]


Concepts can be generalized from other concepts. These super-concepts specify a class of which the defined concept is a subclass. In this way, KL-ONE structures depict a mapping of inheritance. An example from Woods' work [141] is the concept [appreciable debt obligation], which has super-concept links to two parents, [debt obligation] and [appreciable asset]. This example illustrates the utility of multiple parents; it is also directly represented within the semantic network.
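A minimal Python sketch of super-concept links with multiple parents follows. It is not KL-ONE's classifier: the dictionary encoding, the function name `subsumes`, and the extra [asset] parent are assumptions added for illustration; only the [appreciable debt obligation] example and its two parents come from the text.

```python
def subsumes(super_concepts, general, specific):
    """Return True if `general` is reachable from `specific` via super-concept links."""
    stack, seen = [specific], set()
    while stack:
        concept = stack.pop()
        if concept == general:
            return True
        if concept in seen:
            continue
        seen.add(concept)
        # A concept may list several parents, so every link is followed.
        stack.extend(super_concepts.get(concept, ()))
    return False

# Super-concept links; the last entry is a hypothetical extra level.
super_concepts = {
    "appreciable debt obligation": ["debt obligation", "appreciable asset"],
    "appreciable asset": ["asset"],
}

print(subsumes(super_concepts, "debt obligation", "appreciable debt obligation"))
print(subsumes(super_concepts, "asset", "appreciable debt obligation"))
print(subsumes(super_concepts, "appreciable asset", "debt obligation"))
```

Because each concept can list more than one parent, the links form a partial order rather than a tree, which is why the search keeps a `seen` set.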

Concepts may also be primitive in definition. That means that the collection of super-concepts, roles and structural conditions is necessary, but not sufficient, to define the concept. These concepts are indicated in the semantic network representation by putting an asterisk by the oval [141]. Concepts may also be individual; that is, they are a member of a set and not the set itself. Many times they are the instantiation of a generic concept and are represented in the semantic network by diagonal shading inside the concept.

Roles also have different forms of structure. Value restrictions on roles are concepts that characterize constraints on possible role fillers [141]; they are shown by an arrow from a role to the concept that applies the constraint. Number restrictions may also be applied for the maximum and minimum number of allowed fillers. These are seen in Figure 6.1 by the use of “# = <value>”; they may also be a range such as “# <lower number or variable>, <upper number or variable>”. Roles may also be “chained” together to produce an access path from the concept being defined to the intended filler [141]; this would be depicted by using a small triangle between the structural condition diamond and the role. The chain is necessary because it constrains the filler of the specific role. If roles are just linked together, a square is used for the intermediate roles.
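Value and number restrictions can be sketched as a small Python data structure, using the arch of Figure 6.1 as the example. This is an illustration rather than KL-ONE's actual representation: the class names `Role` and `Concept`, the function `satisfies`, and the choice of `Block` as the value restriction for both roles are assumptions; only the role names Lintel (# = 1) and Upright (# = 2) are taken from the figure labels.

```python
from dataclasses import dataclass, field

@dataclass
class Role:
    name: str
    value_restriction: str  # concept every filler must belong to (assumed: Block)
    min_fillers: int        # number restriction, lower bound
    max_fillers: int        # number restriction, upper bound

@dataclass
class Concept:
    name: str
    roles: list = field(default_factory=list)

def satisfies(concept, fillers, type_of):
    """Check a candidate individual's role fillers against every restriction."""
    for role in concept.roles:
        values = fillers.get(role.name, [])
        if not (role.min_fillers <= len(values) <= role.max_fillers):
            return False  # number restriction violated
        if any(type_of[v] != role.value_restriction for v in values):
            return False  # value restriction violated
    return True

arch = Concept("Arch", [Role("Lintel", "Block", 1, 1),
                        Role("Upright", "Block", 2, 2)])
type_of = {"a": "Block", "b": "Block", "c": "Block", "t": "Table"}

print(satisfies(arch, {"Lintel": ["a"], "Upright": ["b", "c"]}, type_of))
print(satisfies(arch, {"Lintel": ["a"], "Upright": ["b"]}, type_of))
print(satisfies(arch, {"Lintel": ["t"], "Upright": ["b", "c"]}, type_of))
```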

As discussed previously in this section, taxonomic structures are built into the semantic structure of KL-ONE. This means that at the internal representation level, subsumption and other terminological operations must be considered, and at the ADT level these operations must be available. Putting the taxonomy hierarchy inside the semantic network was a deliberate act [141], but as discussed earlier, the taxonomy is part of level 0 and is not supposed to be part of the semantics of the actual network. Therefore, the classification operation is used to place new descriptions into the taxonomy at their correct position [141], and the internal representation must be able to interact with the semantic network representation when editing the network.

Within the internal representation level, KL-ONE makes a distinction between terminological components and assertional components. The terminological components are called “T-Boxes” and the assertional components are called “A-Boxes”. The T-Box is responsible for specialized types of reasoning that follow from the structure of the terms, that is, definitional information, whereas the A-Box is responsible for general reasoning and provides factual information to the system. Later systems based on KL-ONE allowed “hybrids” between these components [141]. Given the three types of ADT defined in Section 6.3, the logical ADT would best represent this system.

An important goal of KL-ONE was to make useful KR services available, and as the system was developed its expressive power increased [141]. With the development of the roles and fillers, the quantitative relationships were fully implemented; however, this system did not provide for qualitative relationships. In fact, frame-based systems are severely limited when dealing with procedural (qualitative relationship) knowledge [137].

6.1.2 SNePS

SNePS is a system designed for representing the beliefs of a natural-language-using intelligent system [110]. At the semantic network knowledge representation level, it consists of nodes and labeled, directed arcs. The nodes are the terms or concepts of the network and the arcs are like grammatical punctuation. All entities in all the versions of SNePS are nodes [110]; the nodes are of four basic types: base nodes, variable nodes, molecular nodes and pattern nodes. Base nodes represent some particular entity within the network, while variable nodes represent arbitrary individuals, propositions, etc. that are distinct from the rest of the network. Neither base nor variable nodes have output arcs. Molecular nodes represent propositions, rules and “structured individuals”, while pattern nodes are like open sentences or functional terms with free variables. Both molecular and pattern nodes have input and output arcs and are structurally defined by the arcs. Every node has an identifier, and base nodes may be identified by the user (all others are system generated identifications).

The arcs were defined differently for different versions of SNePS. Within the current version, there are two types of arcs: descending and ascending. The arcs represent relationships. The current system also has a belief revision system as a standard feature. As part of this system, assertion tags ‘!’ are appended onto asserted nodes. For an example of how the semantic network representation looks, see Figure 6.2.

Figure 6.2: A SNePS Representation of “A on B on a Table” (Based on [110], Figure 12). [Diagram not reproduced; it shows nodes M10 through M19! connected by lex, plan, action, act, object1 and object2 arcs, with the labels snsequence, put, stack, table, A and B.]


Internal to the SNePS system are incorporated some theoretical decisions [110]:

• the system will not build a new node where there is already a node in the structure.

• two variables in one rule can not be instantiated to the same term.

• the universal quantifier is only supported on a proposition whose main connective

is one of the following: and, or, min/max, or thresh.

Given these restrictions, SNePS is not much more than an intensional propositional representation; however, the inference package, SNIP, is a direct part of SNePS and adds to the capabilities of the system.
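The first of the decisions listed above, that no new node is built where an equivalent node already exists, can be sketched in Python as a lookup keyed on a node's outgoing arcs. This is an illustration only and not the SNePS implementation or its API: the class `Network`, the method `build`, and the identifier scheme are hypothetical, and the arc names are borrowed from the labels of Figure 6.2.

```python
class Network:
    """Toy store in which a molecular node is identified by its set of arcs."""

    def __init__(self):
        self._molecular = {}  # frozenset of (arc, target) pairs -> identifier
        self._count = 0

    def build(self, **arcs):
        """Return the existing node with exactly these arcs, or create one."""
        key = frozenset(arcs.items())
        if key not in self._molecular:
            self._count += 1
            self._molecular[key] = f"M{self._count}"  # system generated identifier
        return self._molecular[key]

net = Network()
first = net.build(object1="A", object2="B")
again = net.build(object2="B", object1="A")   # same arcs, different order
other = net.build(object1="B", object2="table")

print(first, again, other)
```

Keying on a frozen set of arcs makes the check independent of the order in which the arcs are given, so the second call returns the node created by the first.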

SNIP must be able to interpret rules properly because it is a separate system and because operator-based formulations may be added on top of SNIP. The belief revision system is also built above SNIP. Therefore, when looking at a possible internal representation for SNePS, one would need the functionality of predicate calculus. This would also mean that the logical ADT would need to be chosen.

SNePS is a very straightforward representation. It has only nodes and arcs, and puts everything into the semantic network structure. There is no hierarchy being applied to the network or even structurally incorporated into the network, thereby keeping it very simple. Belief processing is available through assertion tags, and operator-based formulations may be added on top of the system through procedures. However, only universal quantification is available, therefore limiting the knowledge that can be represented. Also, no qualitative relationships are possible.

6.1.3 SNAP

SNAP stands for Semantic Network Array Processor and was implemented at the University of California. It is a parallel computer architecture in which the permanent knowledge is stored as a semantic network representation [72]. The actual model is one of marker-passing, and the knowledge-base does not do much more than general production rule processing (see Figure 6.3 for an example).

Figure 6.3: SNAP Semantic Network of “USC in LA, CA” (Based on [72], Figure 2). [Diagram not reproduced; nodes: California, city, Los Angeles, university, USC; arcs: is-in, is-a.]

The permanent knowledge for the knowledge-base is stored at start up time. Nodes are terms or concepts and the arcs are the labeled relations between the nodes. For each new relationship within the knowledge-base an instruction is created by the controller of the machine, transformations are performed and node assignments are done, and then commands are broadcast to specific array processors for storage of the knowledge [72].

The temporary knowledge is where the markers are processed. Markers are flags that travel around a distributed intelligent network. Marking nodes indicates that they are relevant to the current action. Markers may also have attributes associated with them.

The inference engine controls the two knowledge areas, but the job of the inference engine is controlled by the controller on the machine and the intelligent network. The markers are controlled by the inference engine and spread the searches and queries.
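Marker passing can be sketched sequentially in Python over the small network of Figure 6.3. This is not SNAP's instruction set or its parallel hardware model: the adjacency-list encoding, the function name `propagate`, and the way the figure's arcs are attached to nodes (USC is-a university, USC is-in Los Angeles, Los Angeles is-a city, Los Angeles is-in California) are assumptions made for illustration.

```python
from collections import deque

def propagate(arcs, start, relation):
    """Spread a marker from `start` along arcs labeled `relation`.

    Returns the set of marked nodes; a real array processor would do this
    spreading in parallel rather than with a queue.
    """
    marked = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for label, target in arcs.get(node, ()):
            if label == relation and target not in marked:
                marked.add(target)
                queue.append(target)
    return marked

# Assumed reading of Figure 6.3 as node -> [(arc label, target), ...].
arcs = {
    "USC": [("is-a", "university"), ("is-in", "Los Angeles")],
    "Los Angeles": [("is-a", "city"), ("is-in", "California")],
}

print(sorted(propagate(arcs, "USC", "is-in")))
print(sorted(propagate(arcs, "USC", "is-a")))
```

A query such as “Is USC in California?” then reduces to checking whether California carries the marker once spreading stops.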

Because of the simplicity of the actual semantic network knowledge representation, the internal representation can just be basic data structures, and the basic ADT for the IF .. THEN structure can be used. This does not give much expressive power to the semantic network, but it does allow parallel processing across an intelligent network, which provides much potential for the future.

6.1.4 CS Initial Project - PEIRCE

The PEIRCE project is named after the American philosopher and logician Charles Sanders Peirce [37]. In 1883, Peirce developed the first linear notation for first-order logic [86]; however, he felt that the predicate notation for logic was unduly complex [121]. Then in 1897, Peirce invented existential graphs [86, 25], with the simple mechanism of graphs within a context that were parts of larger graphical notations [121, 37]. John Sowa then used these existential graphs as the foundation for his Conceptual Graph theory [119].

The PEIRCE project is designed to be built out of conceptual graphs [37]. It originated as a joint effort to make different systems built out of conceptual graphs across the world work together [37]. Over time, it became a project at the PEIRCE Foundation, built by its director Gerard Ellis in Australia. An example of a graph within the PEIRCE system is given in Figure 6.4.

Figure 6.4: PEIRCE Schema for Age (Based on [119], Figure 6.5). [Diagram not reproduced; it is headed “schema for Age(x) is” and its labels include Person: ∀, Age, Date, Date: ∀, *x, Birth, Chrc, Ptim, DT-Birth and Diff-DT.]


These graphs are made out of conceptual structures which were discussed in

Section 2.3.2.3. The PEIRCE system is divided up into the following modules [37]:

• Programming standards

• Database storage and retrieval

• Linear notation input and output

• Massively parallel hardware

• Graphical editor and display

• Conceptual catalogs (ontologies)

• Programming in conceptual graphs with constraints

• Inference/theorem-proving mechanism

• Learning mechanism

• Natural language parsers and generators

• Information systems engineering

• Vision system

The following modules are the only ones within the scope of this work: Database storage and retrieval; Graphical editor and display; and Programming in conceptual graphs with constraints. Because of the difficulty of collaboration among many people, none of these original modules made it past the design phase, but this work was very important, as these original designs were used within other tools that have been developed for the conceptual structures community.

The database storage and retrieval module was responsible for storing conceptual graphs. It was to use a C++ ADT for graph operations and generalization hierarchy operations. These were to incorporate the fundamental operations of graph matching and unification (maximal join [119]), and to perform generalization and specialization operations on the hierarchy. As stated in the Ellis work [37], large knowledge bases were being created for processing, but it has taken some time to deliver these to the community.

A graphical editor and display, constructed in X-Windows and executed on all versions of Unix available (including Linux for PCs), was one of the foundational modules. This same module runs under Windows. Growing out of this effort is the very complete graphical editor CharGer, developed by Harry Delugach [29, 30]. To go along with the editor would be a compiled language that would allow programming in conceptual graphs with constraints.

This actual system would be available once the ADT has been coded and bootstrapped into a compiler for conceptual graphs. Two systems, Amine and FMF, have grown out of this effort and provide Prolog compilers that include CGs as part of the language [55, 56, 124]. The system is mainly a set of concepts and tools; however, it will address all quantitative and qualitative relationships and generalization and specialization operations when functioning.

6.2 Conceptual Graphs Environments

6.2.1 CoGITaNT

CoGITaNT has several useful utilities: a set of library routines in C++ for conceptual modeling, some knowledge bases in conceptual graphs, and an XML specification for CGXML [64]. All documentation is in French and none is available in English (including the installation instructions). In the future, documentation should be available in English, which will allow this author to test and evaluate this very complete system.

6.2.2 Amine

Amine is actually a “platform” as opposed to an environment [55]. Its main processing is a multilingual system for ontologies [54]. It was originally built on a conceptual structures internal representation, with a storage representation compiled through Prolog [57]. Now that it has been converted to a platform, it is written in Java. At the present time, only the ontological hierarchies have been converted, but all of the storage representation will soon be made available. Amine is using CGs as an internal representation for machine translation from French to English.


6.2.3 pCG

pCG is “a process operating upon a CG”. It was developed at the University of

South Australia by David Benn under the direction of Dan Corbett [10, 9, 8]. This is

based on the work of Guy Mineau at the Université Laval [69, 70]. This system implements
its process mechanism by using the Java library routines of Notio developed by
Finnegan Southey [117].

pCG had several design goals: 1) making concepts, graphs, actors and processes
first class citizens within the pCG language; 2) easy extensibility; 3) rapid development;
4) portability; and 5) minimality [10]. Of these goals, the first, fourth and fifth were
the most interesting to this author. By making all value types first class types in this
language, every type can be passed as a parameter to functions for execution. Portability
was achieved by using Java as the language and the ANTLR¹ construction tool for
parsing. The system was designed and constructed with as few constraints and built-in
keywords as possible. Therefore, many functions that are already available within the
Notio system can be used directly from pCG.

“The pCG language is multi-paradigm, since apart from its object-based characteristic,
pCG supports imperative (variables, assignment, operators, selection, iteration),
functional (higher order functions, value, recursion), and declarative styles of
programming” [10]. This created the opportunity for interoperability between pCG
and other systems.

¹ http://www.antlr.org

6.2.4 CPE

The Conceptual Programming system (CP) was originally developed as a single,
standalone application [92, 93] that handles temporal, spatial and constraint
information [47, 94] using a knowledge base of Conceptual Graphs (CGs) [119]. CP was
a knowledge representation development environment with a graphical visualization
framework and a set of tools that used graph structures and operations over those
structures to do knowledge reasoning.

All knowledge within the system is stored and operated on as a graph. These
graphs are implementations of Sowa’s Conceptual Graphs [119], but also retain many
of the features of graph theory [46]. Although there exists a mapping from CGs to
formulae in first-order predicate calculus (FOPC), the operations used in the CP system
take advantage of the graphical representation; therefore, the data structures and
operations over the graphs use graph theory [46] instead of FOPC.

The original system was a single application written in Lisp and ran only on a
Symbolics machine. The data structures were CGs defined using linked lists of structure
elements, where the structures held the node information and the links were the edges
of the graphs. All graphs had to be entered directly into the environment’s editor, and
each graph was stored into the environment’s knowledge base. The CP inference engine
would then operate over these data structures, sometimes creating new graphs or partial
models of conceptual graphs and storing them into the environment’s knowledge base.

In the old environment, there was no way to import or export any of the graphs

or models. This prompted investigation into alternative data structures and models to

allow other applications and systems to communicate with the CP application [95, 96,

49]. Harry Delugach’s invited talk at ICCS2003 [31] outlined a framework for building

active knowledge systems. By 2004 the Conceptual Programming Environment (CPE)

had been introduced with its new modular, multi-component design to increase the flex-

ibility of the environment and to allow modules to be used outside of the environment

by other systems [87]. The viewpoint of the redesign was to make the CP Environment
be the “heaven” displayed in Delugach’s framework. At that time, the main form

of interoperability was by using the CGIF interchange format (see Section 3.4.4). John

Sowa, in a paper published in 2002 as part of a “Special Issue on Artificial Intelligence”
of the IBM Systems Journal [124], proposed a modular framework as an architecture

for intelligent systems because of the flexibility in communication and interoperabil-

ity it provides. This flexible modular framework (FMF) allows different applications

in different memory spaces to communicate using a blackboard architecture of message
passing between applications. FMF would be very useful in implementing the

reference framework discussed in Aldo de Moor’s RENISYS specification methodol-

ogy [34] because FMF handles interprocess communication across computers as well

as processes, and it would also be useful in developing the intelligent agent operations

from Delugach’s framework [31]. However, the modularization of CP is at a module
component level, rather than the FMF process communication level, so that the module

can be directly “tied-in” to another application. The modular design at the component

level also allows modules to be interchanged as units, as in modular furniture, to get

the most flexibility from the environment.

The modularization of the CP Environment allows parts of the environment, the
actual modules, to be both interfaced and interacted with by outside systems or applications.
It also has a specific module, CGIF, that creates a mechanism to import and

export CGs created from execution of the environment’s inference engine modules and

storage in the environment’s knowledge base. CPE included simple wrapper modules

to allow other languages, besides C and C++, to use the CGIF module.

6.2.4.1 Basic Architecture for the Environment

Figure 6.5 depicts the new directionality of the CP Environment. The very light
gray background area indicates what is actually part of the environment. The light gray
oval depicts applications, e.g. the pCG reasoning and language system. The medium
gray rounded-corner square represents editors that are available for CGs, e.g. ARCEdit;
these editors should be able to import/export CGIF formatted files. The light gray
trapezoid and drum shapes indicate data that is not necessarily graphical in nature, but
may be part of a domain of information that a user wishes to process (note: the data in
the database need not necessarily be textual and may be graphical or any visual form).
The very dark gray shapes are modules that are part of the CP Environment and use
the environment’s internal data structures. All solid arrowed lines in the figure indicate
data or processing that is currently available; dashed arrowed lines indicate where an
interface, connection, interaction, and/or translation should be available between these
elements, but is not currently present.

Figure 6.5: Current CP Environment (From [87], Figure 1, page 322).

6.2.4.2 Data Flow within the Environment

Because the architecture is organized as a set of modules, each module is built as a
DLL (under Windows) or a library (under Unix or Linux), depending on the operating
system. It also has a specific module, CGIF, that creates a mechanism to import and

export CGs created from execution of the environment’s inference engine modules and

storage in the environment’s knowledge base. This mechanism can be “plugged-in” to

other applications by using the CGIF module’s API specification to call the module’s

implementation code level [117]. All the modules have available APIs to allow their

library routines to be called by other applications. Also, because all data structures can

be stored to a CGIF formatted file, graphs can be transferred to other applications through

the graphs in the CGIF file.

6.2.4.3 Data Structures used by the Environment

When originally conceived, CP was just an implementation of conceptual graphs
algorithms, without consideration of how the data structures affected the implementation.
In 2000, the system began to change so that it could serve as a foundational environment
underpinning multiple reasoning systems. When the environment was first conceived it
used a doubly linked list data structure. On redesign, new data structures were
investigated; these have been and will be discussed in other chapters.

6.3 ADT Implementations

This section discusses three implementations of the internal representation
ADT definitions given in Section 2.3.2. These are just basic ideas of how each of
the ADTs might be implemented. Each of the definitions has been given in pseudo-code
that looks like C++, but that is independent of the programming language in which it
might be implemented.

6.3.1 Logical

This ADT could best be implemented in either Prolog or Lisp. The basic structure
of the ADT is one of predicates. If the predicate and tree structure of Prolog is
used, then implementation is straightforward. The syntax and semantics as seen in the
example in Figure 2.2 could be directly mapped onto this ADT. Within the implementation
of the ‘query’ procedure, unification and resolution would be performed over the
knowledge-base records, using the ‘SupportClauses’ that are saved during processing.
If there is a network present, then the routines that are needed to perform
terminological operations would also be executed. The theorem prover would need to use
not only the ‘SupportClauses’, but also the stored knowledge-base from the Calculus
class. Note: the ‘Logical’ class is where reasoning is performed by use of its function
(this is the inference engine), and the ‘Calculus’ class is for storing the knowledge-base.

6.3.2 Basic Data Structures

When implementing the basic data structures that are often needed for
simple rule-based systems, languages such as Lisp or C come to mind. This implementation
needs to store records of information (knowledge-base) that can be separated out
as “conditional” information and “rule” information. The rule would be used for
processing, or ‘fired’, when the conditional is found to be true. If implemented in Lisp,
then a list representation could be used where the “car” and the “cdr” can give the
conditional or rule back from the record. If one used C, then a structure holding the
elements of the IF .. THEN record would be used, and functions would need to be defined
to retrieve the conditional and rule parts of the structure.

The inference engine would be implemented in the ‘query’ function. It would

apply the actual knowledge that had been stored in the knowledge-base in order to inter-

pret the conditional [65]. This function is also where the actual reasoning is performed.

If there is any network or hierarchy processing to be performed, it would be imple-

mented in the inference engine, i.e. marker-passing operations are implemented in this

module.
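
The record and ‘query’ function described in this subsection can be sketched as follows. This is only an illustration written in Python rather than Lisp or C; the names `Rule` and `query` and the sample facts are hypothetical and are not taken from any of the systems discussed in this chapter.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Rule:
    """One IF .. THEN record: a conditional part and a rule part."""
    condition: Callable[[Set[str]], bool]  # the "conditional" information
    conclusion: str                        # the "rule" information, asserted when fired


def query(knowledge_base: List[Rule], facts: Set[str]) -> Set[str]:
    """Hypothetical inference engine: fire rules until nothing new is derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in knowledge_base:
            # A rule is 'fired' when its conditional is found to be true.
            if rule.condition(derived) and rule.conclusion not in derived:
                derived.add(rule.conclusion)
                changed = True
    return derived


# Illustrative knowledge base (assumed facts, not from the benchmark).
kb = [
    Rule(lambda f: "block(b1)" in f and "on_table(b1)" in f, "at_location(b1)"),
    Rule(lambda f: "at_location(b1)" in f, "reachable(b1)"),
]
result = query(kb, {"block(b1)", "on_table(b1)"})
```

In this sketch the conditional and rule parts are retrieved as fields of the record, which plays the role of the accessor functions that would be needed in C.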

6.3.3 Object

For object manipulation, and in particular graph manipulation, more information
is needed. To work with graphs there are not only record types of information
about the objects, but the structure of the graph has bearing on both the syntax and
the semantics, or meaning, of the graph and must be stored as part of the representation
ADT. As can be seen by the ADT definition, more basic information needs to
be stored. Because Java and C++ are object-oriented, these languages work well for
the implementation of Conceptual Structures. The implementation must not only know
which conceptual units are linked to which relationship, but must also know the
directionality of each link. By the use of the ‘3WayTable’ data structure, knowing which
is the starting conceptual unit and which is the ending concept of the relationship is
possible. Also, by evaluation of the fetched links, the structure of the physical graph
can be known. Through this knowledge, syntax and semantics of the internal
representation can be mapped and stored.
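
As an illustration of why directionality must be stored with each link, the following minimal Python sketch keeps every link as a (starting concept, relation, ending concept) triple. The class `ThreeWayTable` and its methods are hypothetical stand-ins for the ‘3WayTable’ of the ADT definition, whose actual interface is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (starting concept, relation, ending concept); the order encodes direction.
Link = Tuple[str, str, str]


class ThreeWayTable:
    """Hypothetical table indexing each link under both of its concepts."""

    def __init__(self) -> None:
        self._by_concept: Dict[str, List[Link]] = defaultdict(list)

    def add(self, start: str, relation: str, end: str) -> None:
        link = (start, relation, end)
        self._by_concept[start].append(link)
        if end != start:  # avoid storing a self-loop twice
            self._by_concept[end].append(link)

    def fetch_links(self, concept: str) -> List[Link]:
        """All links touching the concept, in either direction."""
        return list(self._by_concept.get(concept, []))

    def outgoing(self, concept: str) -> List[Link]:
        return [l for l in self.fetch_links(concept) if l[0] == concept]

    def incoming(self, concept: str) -> List[Link]:
        return [l for l in self.fetch_links(concept) if l[2] == concept]


table = ThreeWayTable()
table.add("Block:#1", "Above", "Block:#3")
table.add("Block:#1", "ATTR", "Color:Red")
```

Evaluating the fetched links for a concept is enough to recover which concept starts and which concept ends each relationship.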

When this basic ADT is built upon, qualitative functions can be performed by

using the ‘query’ procedure and adding information to the knowledge base about time

and space. The ‘Graph’ class would also work with any hierarchy that is used with the

knowledge base. In order to do reasoning, several other graph manipulation procedures

and functions have been added. Given a specific system, it is possible that this is not a

complete ADT and more functions will need to be defined.

6.4 Experiment Systems Implementation

The experimental systems were chosen because they were able to handle full
Conceptual Graphs and did not have the restrictions of SCGs described in Section 3.4.3.
Even though the SCG algorithm by Mugnier and Chein has been implemented in the
CoGITaNT system, there is no English documentation with which to work with the system,
and the Croitoru algorithm has not been implemented. Lastly, the author of the pCG
system, David J. Benn, addressed any errors or problems that arose within the pCG
code.

6.4.1 pCG - Original Notio

The pCG system, as discussed previously, is built on top of the Notio library. It, like
Notio, was written in Java for portability and used the ANTLR parsing system to read and
process the CGIF format. It was mainly designed to use Notio for the actual matching,
projection and join algorithms, while developing a language for inputting and outputting
simple programs to do analysis. After examining several currently available
systems (see above), pCG was chosen as the most general of them. After working with
this system, it was discovered that it, like several of the other systems, had the
following limitations:

• Only a single copy of a relation could be present in a graph. For example, if a person
had the two characteristics of brown hair and blue eyes, creating a graph with both
characteristics was not valid.

[Person]->(CHRC)->[Hair:brown]->(CHRC)->[Eyes:blue]

• It also only found a single projection of a query into a graph even if others were

present.

However, it was possible to work with the pCG programs (see Section C.1) to directly

use many of the same test sets of CG graphs that would test CPE’s data structure varia-

tions.
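
To make the two limitations concrete, the following Python sketch (which is not pCG or Notio code; all names are hypothetical) stores a graph as a list of triples, so that two CHRC relations can coexist, and returns every match of a single-triple query rather than only the first. Type hierarchies and multi-triple queries are deliberately ignored in this simplification.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]                     # (source concept, relation, target concept)
Pattern = Tuple[Optional[str], str, Optional[str]]  # None acts as a wildcard

# A person with two characteristics, each carried by its own CHRC relation.
graph: List[Triple] = [
    ("Person", "CHRC", "Hair:brown"),
    ("Person", "CHRC", "Eyes:blue"),
]


def all_matches(pattern: Pattern, kb_graph: List[Triple]) -> List[Triple]:
    """Return every triple that matches the pattern, not just the first one."""
    src, rel, tgt = pattern
    return [
        t for t in kb_graph
        if t[1] == rel and src in (None, t[0]) and tgt in (None, t[2])
    ]


matches = all_matches(("Person", "CHRC", None), graph)
```

A system with the limitations listed above would either reject `graph` or report only one of the two matches.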


The data structures used within pCG/Notio are vector arrays for processing
within an operation; more specifically, for matching nodes, structural analysis
of graphs, and evaluating the search space during the projection operation. Throughout
all of these processes array data structures are used. However, at the very end of the
projection operation, the graphs in the KB are translated to be stored in a hash
table, even though, if the projection operation is performed on the KB again, they will
again be loaded into an array data structure for processing. The hash table is only used
when doing a direct retrieval of a graph from the KB, not during operations.

An interesting feature design of pCG can be related to the ‘typical case’ analysis

from Chapter 5 (see Section 5.3). Within the pCG implementation, it computes the pro-

jection by using a two part algorithm; however, these parts are not the same as the SCG

Relation algorithm of Croitoru. The first part is actually more a part of the storage of

the graphs. In the preparation for the projection operation, this algorithm performs an

Assertion phase which commutes the structure of the graph and re-aligns the labels on

the elements of the graphs to improve the matching later during the projection. There-

fore, as the number of graphs in the KB increases, this Assertion takes a proportional

amount of time to the size of the KB. Since in the typical case the size of the query

graph is small in the number of nodes and the KB size is small, the Assertion does not

have a significant effect on the results, but as the size of the graphs in the KB increases and
the number of graphs in the KB also increases, this Assertion part should have more of
an effect.


6.4.2 CP Environment (CPE)

This system has been developed since 1988 at New Mexico State University

[93, 94] and has never had the two limitations listed for the pCG system. However,

besides this difference the two systems are very comparable.

As discussed in Section 6.2.4, CP was originally developed to use doubly linked
lists. When a linked list is sorted but only singly linked, the execution time for
retrieval is the same as that of an array data structure. In the process of redesigning
CP to use new algorithms for the projection and maximal join operations,
investigation was done on what would be good data structures to use with these new
algorithms. An array data structure was originally chosen to test with the
projection algorithm because, when the array is non-sorted, storage is just an append
at the end of the list and one does not have to use a sorted or doubly linked
list. After carefully looking at other data structures, hash tables were also chosen to be
investigated.

There are four variables that hold a direct link between the algorithms and data
structures for the system. By changing their underlying data structure, it is believed
that the projection operation execution time will be altered. These variables are c-r-c
and c-a-c, which are part of the CG graph data structure, and match list and anchor list,
which hold internal data that is used to move information from the matching
pre-processing part of the algorithm to the actual projection building of the query
graph onto the KB graph. These variables were defined in Section 5.2.1; their actual
implementations will be defined here for use with each test representation.

6.4.2.1 Array (Vectors)

The array implementation for these critical variables is discussed first (c-a-c
will not be shown in this implementation or the next because the blocks world
benchmark did not use actors, but its implementation is very close to the c-r-c data
structure). In the following descriptive examples, ‘[]’ indicates indices and ‘()’ indicates
structures.

c-r-c

[1] -> (GC1, ([1] -> GT1, (GT1, GC1, R1, GC2, 1)
              [2] -> GT2, (GT2, GC1, R2, GC3, -1)))
[2] -> (GC2, ([1] -> GT1, (GT1, GC2, R1, GC1, -1)))
[3] -> (GC3, ([1] -> GT2, (GT2, GC3, R2, GC1, 1)))

This data structure would be an array that is part of the cg graph class, in which the first
part of the structure is the unique concept identifier; for example, at index 3 the key
would be “GC3”. Also at every index in the array, there is an array of cstriple unique
identifiers (for example, at index 1 under “GC3” the key would be “GT2”) that will
retrieve a node structure. This node structure contains the cstriple, forward concept,
relation, backward concept and direction. The direction is either ‘1’ or ‘-1’, indicating
whether the cstriple, in display format, proceeds from forward concept to backward
concept with the directed arrow or vice versa. The node structure within the example
just built would be cstriple - “GT2”, forward concept - “GC3”, relation - “R2”,
backward concept - “GC1”, direction - ‘1’.
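
The array layout of c-r-c can be sketched in Python as nested lists that are searched linearly. The type `Node` and the function `find_node` are hypothetical names used only for this illustration; the data values are those of the example above.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    cstriple: str
    forward: str    # forward concept
    relation: str
    backward: str   # backward concept
    direction: int  # 1 or -1


# Outer array: (concept id, inner array of (cstriple id, node structure)).
CRC = List[Tuple[str, List[Tuple[str, Node]]]]

c_r_c: CRC = [
    ("GC1", [("GT1", Node("GT1", "GC1", "R1", "GC2", 1)),
             ("GT2", Node("GT2", "GC1", "R2", "GC3", -1))]),
    ("GC2", [("GT1", Node("GT1", "GC2", "R1", "GC1", -1))]),
    ("GC3", [("GT2", Node("GT2", "GC3", "R2", "GC1", 1))]),
]


def find_node(table: CRC, concept: str, cstriple: str) -> Optional[Node]:
    """Scan both arrays in order; the cost grows with the number of entries."""
    for concept_id, triples in table:
        if concept_id == concept:
            for triple_id, node in triples:
                if triple_id == cstriple:
                    return node
    return None


node = find_node(c_r_c, "GC3", "GT2")
```

Appending a new concept or cstriple is a constant-time operation in this layout, while every retrieval requires a scan.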

match-list

[1] -> (GC1, ([1] -> QC1
              [2] -> QC2))
[2] -> (GC2, ([1] -> QC2
              [2] -> QC1))
[3] -> (GC3, ([1] -> QC3))
[4] -> (GC4, NULL)

This list holds the matching concepts between the KB graph and the query graph. In

this example structure, an array would hold all the concepts found in the KB graph with

a link to an array of all the matching concepts in the query graph. Until a matching

concept is found, the second array is NULL.
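
A minimal Python sketch of how such a match list might be filled is given below. It assumes, purely for illustration, that a KB concept matches a query concept when their type labels are equal; the actual matching criteria of the projection algorithm are those defined in Section 5.2.1. The function `build_match_list` and the sample type labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

MatchList = List[Tuple[str, Optional[List[str]]]]


def build_match_list(kb_concepts: Dict[str, str],
                     query_concepts: Dict[str, str]) -> MatchList:
    """One entry per KB concept; the second array stays None until a match is found."""
    match_list: MatchList = []
    for kb_id, kb_type in kb_concepts.items():
        # Assumed matching rule: identical type labels.
        matches = [q_id for q_id, q_type in query_concepts.items()
                   if q_type == kb_type]
        match_list.append((kb_id, matches or None))
    return match_list


# Hypothetical type labels for the concept identifiers used in the example.
kb = {"GC1": "Block", "GC2": "Block", "GC3": "Location", "GC4": "Hand"}
qy = {"QC1": "Block", "QC2": "Block", "QC3": "Location"}
match_list = build_match_list(kb, qy)
```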

anchor-list

[1] -> (QC1 -> ([1] -> (GC1, ([1] -> GT1,QT1))
                [2] -> (GC2, ([1] -> GT2,QT2))))
[2] -> (QC2 -> ([1] -> (GC3, ([1] -> GT1,QT1))))
[3] -> (QC3 -> ([1] -> (GC4, ([1] -> GT2,QT2))))

The anchor list holds the matching KB concepts that also structurally have the cstriple
relationships found in the query graph. By holding both the matching concepts to
each query graph concept and the related triples, at build time the anchor list can just
be traversed to create the new projection graphs. This example finds two projections:
one projection includes concepts GC1 and GC3 using the GT1 cstriple node, and the
second projection includes concepts GC2 and GC4 using the GT2 cstriple node. All data
structures here are arrays.
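
One way to read the traversal in this example is that the KB concepts are grouped by the KB cstriple they anchor. The Python sketch below implements only that reading of the example data; the function `group_by_kb_triple` is hypothetical and is not the build step of the CP Environment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (query concept, [(KB concept, [(KB cstriple, query cstriple)])])
AnchorList = List[Tuple[str, List[Tuple[str, List[Tuple[str, str]]]]]]

anchor_list: AnchorList = [
    ("QC1", [("GC1", [("GT1", "QT1")]), ("GC2", [("GT2", "QT2")])]),
    ("QC2", [("GC3", [("GT1", "QT1")])]),
    ("QC3", [("GC4", [("GT2", "QT2")])]),
]


def group_by_kb_triple(anchors: AnchorList) -> Dict[str, List[str]]:
    """Traverse the anchor list once, collecting KB concepts per KB cstriple."""
    projections: Dict[str, List[str]] = defaultdict(list)
    for _query_concept, kb_entries in anchors:
        for kb_concept, triple_pairs in kb_entries:
            for kb_triple, _query_triple in triple_pairs:
                projections[kb_triple].append(kb_concept)
    return dict(projections)


projections = group_by_kb_triple(anchor_list)
```

For the example data, the grouping yields one candidate projection per KB cstriple, which agrees with the two projections described in the text.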

6.4.2.2 Hash Tables

The important change in the data structures for the critical variables given above

came when it was seen that perfect hash tables (as discussed in Section 3.5.2.1) can

improve the overall projection time by greatly improving the actual projection step, or

building of the projection graph, in the second step of processing. In the following

descriptive examples, ‘<>’ indicate hash tables and ‘()’ indicate structures.

c-r-c

<GC1, (<GT1, (<GT1direction, (GT1, GC1, R1, GC2, 1)>)
        GT2, (<GT2direction, (GT2, GC1, R2, GC3, -1)>)>)
 GC2, (<GT1, (<GT1direction, (GT1, GC2, R1, GC1, -1)>)>)
 GC3, (<GT2, (<GT2direction, (GT2, GC3, R2, GC1, 1)>)>)>

This data structure would be a perfect hash table of <key,value> pairs that is part of the
cghash graph class. The first part of the structure is the unique concept identifier; for
example, at key GC3 would be a perfect hash value for “GC3”. Also at every key in the
hash table, there is another perfect hash table of cstriple unique identifiers; for example,
at key GT2 would be a perfect hash value for “GT2”. This second hash table has a value
that is a node structure. The node structure is also stored in a perfect hash table using
the cstriple unique identifier and direction as the key, in which the two values create a
perfect indexing value for the hash table. The value of the last hash table is the same as
the value in the above array implementation.
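
The three nested tables can be sketched with Python dictionaries. This is only an approximation under a stated assumption: a Python `dict` is a general-purpose hash table with average constant-time lookup, not a perfect hash table, so it merely stands in for the structure described here. The function `find_node` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

NodeTuple = Tuple[str, str, str, str, int]  # (cstriple, forward, relation, backward, direction)
Key = Tuple[str, int]                       # (cstriple id, direction)

# concept id -> cstriple id -> (cstriple id, direction) -> node structure
c_r_c: Dict[str, Dict[str, Dict[Key, NodeTuple]]] = {
    "GC1": {"GT1": {("GT1", 1): ("GT1", "GC1", "R1", "GC2", 1)},
            "GT2": {("GT2", -1): ("GT2", "GC1", "R2", "GC3", -1)}},
    "GC2": {"GT1": {("GT1", -1): ("GT1", "GC2", "R1", "GC1", -1)}},
    "GC3": {"GT2": {("GT2", 1): ("GT2", "GC3", "R2", "GC1", 1)}},
}


def find_node(table: Dict[str, Dict[str, Dict[Key, NodeTuple]]],
              concept: str, cstriple: str, direction: int) -> Optional[NodeTuple]:
    """Three keyed lookups replace the linear scans of the array version."""
    return table.get(concept, {}).get(cstriple, {}).get((cstriple, direction))


node = find_node(c_r_c, "GC3", "GT2", 1)
```

Each retrieval is a fixed number of keyed lookups regardless of how many concepts or cstriples the graph holds.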

match-list

<GC1, (<QC1, QC2>)
 GC2, (<QC1, QC2>)
 GC3, (<QC1>)
 GC4, NULL>

This hash table holds the matching concepts between the KB graph and the query graph.

In this example structure, a perfect hash table would hold all the concepts found in the

KB graph with a link to a perfect hash table of all the matching concepts in the query

graph. As before, the KB graph concept that did not have a match would have a NULL

in its value parameter.

anchor-list

<QC1, (<GC1, (<GT1,(GT1,QT1)>)
        GC2, (<GT2,(GT2,QT2)>)>)
 QC2, (<GC3, (<GT1,(GT1,QT1)>)>)
 QC3, (<GC4, (<GT2,(GT2,QT2)>)>)>

This example finds the same two projections discovered from the array implementation;

however, all data structures here are perfect hash tables and each unique label would

produce its own unique index in order to have constant time retrieval.


CHAPTER 7

PROJECTION EXPERIMENTS, RESULTS AND ANALYSIS

This chapter includes a discussion of the “blocks world” domain problem and
the actual experiments tested with it. Each of these experiments uses a set of reasoning
graphs in the KB for projecting queries against a solution to the “blocks world” problem.
These are extended graphs from the benchmark set of conceptual graphs from the
CGTools workshop of ICCS2001.

It will also give all the timing results for the cross matrix of test runs discussed
in Section 7.2.1, including a simple analysis of the overall execution times of each test
set. Each test set is compared and contrasted by analyzing the amount of execution time,
overhead time, and space requirements.

7.1 Domain Problem - ‘Blocks World’

Back in 2001, a group of tool developers began the process of truly making
conceptual graph systems interoperable. A set of benchmarked files of conceptual graphs
that can be used by reasoning systems to work with the blocks world domain were
developed in CGIF format (see Section 3.4.4). During the 2001 Conceptual Graphs Tools
Workshop¹ a set of benchmark graphs was defined and placed in files of increasing
difficulty for processing the CGIF format [96]. Figures 7.1, 7.2, 7.3, and 7.4 are the
contents of the file ‘final_graphs_level2.cgf’, which was able to be processed by all
tools submitted to the workshop.

¹ The web location http://www.cs.nmsu.edu/~hdp/CGTools/ holds the resources for the workshop and the proceedings.

(GT [TypeLabel: "Entity"] [TypeLabel: "Block"];A block is an entity; ).
(GT [TypeLabel: "Entity"] [TypeLabel: "Hand"];A hand is an entity; ).
(GT [TypeLabel: "Entity"] [TypeLabel: "Location"];A location is an entity; ).
(GT [TypeLabel: "Act"] [TypeLabel: "Pickup"];Pickup is an action; ).
(GT [TypeLabel: "Act"] [TypeLabel: "Putdown"];Putdown is an action; ).
(GT [TypeLabel: "Act"] [TypeLabel: "MoveHand"];MoveHand is an action; ).
(GT [TypeLabel: "Act"] [TypeLabel: "MoveBlock"];MoveBlock is an action; )

Figure 7.1: Part 1: Example of Blocks World Benchmark File.

The Part 1 example (see Figure 7.1) is the type hierarchy definition for the
concepts used in the benchmark. Entity and Act are directly below the top, ⊤, concept
of the hierarchy, and the other seven concepts, Block, Hand, Location, Pickup, Putdown,
MoveHand and MoveBlock, are directly above the bottom, ⊥, concept. This is not a very
deep hierarchy.
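
The shallow hierarchy of Part 1 can be held as a simple child-to-parent map. The following Python sketch is an illustration only; the names `supertype_of` and `is_subtype` are hypothetical, and the top and bottom concepts are left implicit.

```python
from typing import Dict, Optional

# Each (GT [TypeLabel: parent] [TypeLabel: child]) pair of Figure 7.1
# becomes one child -> parent entry.
supertype_of: Dict[str, str] = {
    "Block": "Entity",
    "Hand": "Entity",
    "Location": "Entity",
    "Pickup": "Act",
    "Putdown": "Act",
    "MoveHand": "Act",
    "MoveBlock": "Act",
}


def is_subtype(type_label: str, ancestor: str) -> bool:
    """Walk upward from type_label; a type counts as a subtype of itself."""
    current: Optional[str] = type_label
    while current is not None:
        if current == ancestor:
            return True
        current = supertype_of.get(current)
    return False


block_is_entity = is_subtype("Block", "Entity")
pickup_is_entity = is_subtype("Pickup", "Entity")
```

Because the hierarchy has only two levels below the top concept, any such walk ends after at most one step.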

In Part 2 (see Figure 7.2), one can find the definition graphs for the Entity
individualized as a Block, and for the Acts individualized as Pickup, Putdown, MoveHand
and MoveBlock. The concepts Entity and Act are dominant concepts (see Subsection
B.1 for definition) with internal structure. The <type, referent> pair is the externally
scoped concept definition for the instantiation of the dominant concept. During model
processing these individualized definitions can be joined with a reference to the subtype
from the hierarchy. Concepts Hand and Location have a concept type and a location in
the type hierarchy; however, they do not have any internal structure to be considered.

[Entity:'Block'
  (ATTR [Block*b] [Color])
  (CHRC ?b [Shape])
  ;Each block has a color and shape; ].
[Act:'Pickup'
  (PTNT [Pickup*p] [Block*b])
  (INST ?p [Hand*h])
  (RSLT ?p [Situation: (GRASP ?h ?b)])
  ;Each block is picked up using a hand; ].
[Act:'Putdown'
  (PTNT [Putdown*p] [Block*b])
  (DEST ?p [Location*l])
  (INST ?p [Hand])
  (RSLT ?p [Situation: (Top ?b ?l)])
  ;Each block is put down at a location from the hand; ].
[Act:'MoveHand'
  (DEST [MoveHand*m] [Location*l])
  (PTNT ?m [Hand*h])
  (RSLT ?m [Situation: (At ?h ?l)])
  ;This action moves the hand to a location; ].
[Act:'MoveBlock'
  (DEST [MoveBlock*m] [Location*l])
  (PTNT ?m [Block*b])
  (INST ?m [Hand])
  (RSLT ?m [Situation: (At ?b ?l)])
  ;This action moves the block to a location; ]

Figure 7.2: Part 2: Example of Blocks World Benchmark File.

Part 3 (see Figure 7.3) contains the relation hierarchy definition for the
relations At, Above, OnTable, Top and EmptyHand used in the benchmark. It also
gives the internal structure of the dominant concept Relation for each relationship,
because these relations are not axioms to CGs. When these referenced relations appear
in other CGs, the definitional graphs can be joined to them.

The last part of the file, Part 4 (see Figure 7.4), gives the factual graphs
contained in the knowledge base. From this section of the file one can see three cubical
blocks with colors ‘Red’, ‘Blue’ and ‘Green’ that are on a table at two locations. Block
#1 is above Block #3, which is located directly on the table. Both these blocks are located
at Location #5 and Block #2 is at Location #6. The hand is empty and is holding no
blocks. Either Block #1 or Block #2 must be Blue in color, but both can be. The file
contents can be seen in picture form in Figure 7.5.
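
As a small illustration of how these facts might be queried, the Python sketch below transcribes the simple relational facts of Part 4 as tuples (the disjunction and the universally quantified shape statement are omitted). The list `facts` and the helper functions are hypothetical and are not part of any benchmark tool.

```python
from typing import List, Optional, Tuple

Fact = Tuple[str, str, str]  # (relation, first argument, second argument)

facts: List[Fact] = [
    ("ATTR", "Block:#1", "Color:Red"),
    ("ATTR", "Block:#2", "Color:Blue"),
    ("ATTR", "Block:#3", "Color:Green"),
    ("OnTable", "Block:#1", "Location:#5"),
    ("OnTable", "Block:#2", "Location:#6"),
    ("OnTable", "Block:#3", "Location:#5"),
    ("Above", "Block:#1", "Block:#3"),
]


def blocks_at(location: str) -> List[str]:
    """Blocks related to the given location by OnTable, in sorted order."""
    return sorted(a for rel, a, b in facts if rel == "OnTable" and b == location)


def color_of(block: str) -> Optional[str]:
    """First color attributed to the block, or None if no ATTR fact exists."""
    for rel, a, b in facts:
        if rel == "ATTR" and a == block:
            return b
    return None


stack = blocks_at("Location:#5")
```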


(GT [RelationLabel: "Relation"] [RelationLabel: "At"]; Relation At ;).
(GT [RelationLabel: "Relation"] [RelationLabel: "Above"]; Relation Above ;).
(GT [RelationLabel: "Relation"] [RelationLabel: "OnTable"]; Relation OnTable ;).
(GT [RelationLabel: "Relation"] [RelationLabel: "Top"]; Relation Top ;).
(GT [RelationLabel: "Relation"] [RelationLabel: "EmptyHand"]; Relation EmptyHand ;).
[Relation:'At'
  (POS [Entity] [Location])
  ;An entity is positioned at a location; ].
[Relation:'Top'
  (OnTable [Block*b1] [Location])
  ~[(Above [Block*b2] ?b1)]
  ;A block on top is at a location and has no blocks above it; ].
[Relation:'EmptyHand'
  ~[(GRASP [Hand] [Block])]
  ;A hand is empty when no blocks are in it; ].
[Relation:'OnTable'
  (At [Block*b] [Location])
  ~[(GRASP [Hand] ?b)]
  ;A block on the table is at a location and not in the hand; ].
[Relation:'Above'
  (OnTable [Block*b1] [Location*l])
  (OnTable [Block*b2] ?l)
  ;The first block is above the second block at the same location; ]

Figure 7.3: Part 3: Example of Blocks World Benchmark File.

[Block:#1].
[Block:#2].
[Block:#3].
[Hand:#4].
[Location:#5].
[Location:#6].
[Block:@3].
;Block #1 is red;
(ATTR [Block:#1] [Color:’Red’]).
;Block #2 is blue;
(ATTR [Block:#2] [Color:’Blue’]).
;Block #3 is green;
(ATTR [Block:#3] [Color:’Green’]).
(OnTable [Block:#1] [Location:#5]).
(OnTable [Block:#2] [Location:#6]).
(OnTable [Block:#3] [Location:#5]).
;Block #1 is above block #3, and block #2 is at a different location;
(Above [Block:#1] [Block:#3]).
;All the blocks are on the table and not in the hand;
(Emptyhand [Hand:#4]).
[Either: [Or: (ATTR [Block:#1] [Color:’Blue’])]
         [Or: (ATTR [Block:#2] [Color:’Blue’])]].
;All blocks are cubical;
(CHRC [Block:@every] [Shape:’Cubical’])

Figure 7.4: Part 4: Example of Blocks World Benchmark File.

[Picture: three blocks labeled Green, Blue and Red.]

Figure 7.5: A Picture of the Benchmark File.

7.2 Tests

The tests were performed not only to validate that the new projection

algorithm produced the correct projection of the query onto the knowledge base graphs,

but also to evaluate how different parameters affect the running of that algorithm given the

data structures used. The benchmarked data file described in Section 7.1 was

modified to create larger knowledge bases and larger graphs in terms of the

number of nodes (concepts and relations) in the graphs.

Each of the tests was run on a single computer running the Windows XP operating

system. The machine had 2 gigabytes of memory, and all systems

were set up to use all virtual memory. No other applications were executed while the

tests were being performed. There were also 80 gigabytes of disk space, so no

space limitations were imposed.

7.2.1 Single Appearance of Relation within Graph

Because it turned out that pCG was not able to process more than one instance

of a relation type within a graph, two sets of tests were performed. Table 7.1 lists

all the files that were tested by all three systems. Knowledge bases with 1, 1000, 2500,

and 5000 graphs were each stored in a file; those numbers are across the top of the table.

Each of these KB sizes was built from graphs of size 5, 11, 21, 31, 53, and 73 nodes,

one graph size per file; those numbers are down the first column. Within each graph in these knowledge

bases, all relation types were unique.

Table 7.1: KB Single Relation Graph Files.

Nodes   1                 1000                 2500                 5000
5       graphs_5_1.cgf    graphs_5_1000.cgf    graphs_5_2500.cgf    graph_5_5000.cgf
11      graphs_11_1.cgf   graphs_11_1000.cgf   graphs_11_2500.cgf   graphs_11_5000.cgf
21      graphs_21_1.cgf   graphs_21_1000.cgf   graphs_21_2500.cgf   graphs_21_5000.cgf
31      graphs_31_1.cgf   graphs_31_1000.cgf   graphs_31_2500.cgf   graphs_31_5000.cgf
53      graphs_53_1.cgf   graphs_53_1000.cgf   graphs_53_2500.cgf   graphs_53_5000.cgf
73      graphs_73_1.cgf   graphs_73_1000.cgf   graphs_73_2500.cgf   graphs_73_5000.cgf

7.2.1.1 Increase # of Graphs in KB

One parameter being examined was how the number of graphs within the knowledge

base affected the running time. Therefore, 1, 5, 100, 1000, 2500 and 5000 graphs

were stored in the knowledge base for each graph size. However, as will be seen in

the Results section below (see Section 7.4), the times for 1, 5 and 100 graphs in a

knowledge base were so low that there was no significant difference between the

systems for evaluation. Therefore, only the 1000, 2500 and 5000 graph KBs will be

analyzed.

7.2.1.2 Increase # of Nodes in Graphs in KB

Another parameter that was believed to affect the actual execution time of the

projection of the query into the knowledge base was how many nodes were present

in each graph of the knowledge base. This was somewhat arbitrary because a real-world

knowledge base would not have a fixed number of nodes in every graph. In fact, the

sizes of the graphs would be small for factual data, medium for definitional data, but

larger for particle and complete model data. As seen in Table 7.1 above, the number of

nodes in the graphs of the KBs was increased in the following way: 5, 11, 21, 31, 53,

and 73.
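
The node counts quoted for the graphs in this chapter can be checked mechanically. The following is a minimal sketch, assuming flat CGIF graphs with no nested contexts or negations: every opening parenthesis starts a relation node and every opening bracket starts a concept node, while bound labels such as ?b refer back to a concept that has already been counted. The function name count_nodes is hypothetical and is not part of pCG or CPE.

```python
def count_nodes(cgif: str) -> int:
    """Count concept and relation nodes in a flat CGIF graph.

    Assumption: the graph has no nested contexts, so each "(" opens
    exactly one relation and each "[" opens exactly one concept.
    Bound labels such as ?b introduce no new node.
    """
    relations = cgif.count("(")
    concepts = cgif.count("[")
    return relations + concepts


# Two graphs in the style used for the single-relation knowledge bases.
five = "(OnTable [Block*b] [Table])(NAME ?b [Number])"
eleven = ("(ATTR [Block*b] [Color])(NAME ?b [Number])(CHRC ?b [Shape])"
          "(LOC ?b [Place])(OnTable ?b [Table])")

print(count_nodes(five), count_nodes(eleven))
```

For these two graphs the sketch reports 5 and 11 nodes: two relations plus three concepts, and five relations plus six concepts.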

A sample graph from the single graph KB for each node size is:

5-nodes:

(OnTable [Block*b] [Table])(NAME ?b [Number])

11-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(CHRC ?b [Shape])(LOC ?b [Place])(OnTable ?b [Table])

21-nodes:

(Above [Block*b2] [Block*b1])(OnTable ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])

31-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])

53-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1)(Above3 [Block*b5] ?b4)(NAMET ?t1 [Number])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number])(CHRC4 ?b4 [Shape])(LOC4 ?b4 [Place])(ATTR5 ?b5 [Color])(NAME5 ?b5 [Number])(CHRC5 ?b5 [Shape])(LOC5 ?b5 [Place])

73-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1)(Above3 [Block*b5] ?b4)(Above4 [Block*b6] ?b5)(Above5 [Block*b7] ?b6)(NAMET ?t1 [Number])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number])(CHRC4 ?b4 [Shape])(LOC4 ?b4 [Place])(ATTR5 ?b5 [Color])(NAME5 ?b5 [Number])(CHRC5 ?b5 [Shape])(LOC5 ?b5 [Place])(ATTR6 ?b6 [Color])(NAME6 ?b6 [Number])(CHRC6 ?b6 [Shape])(LOC6 ?b6 [Place])(ATTR7 ?b7 [Color])(NAME7 ?b7 [Number])(CHRC7 ?b7 [Shape])(LOC7 ?b7 [Place])

It should be noted that starting with the 21-node KB graph, a

relation type needed to be repeated. Therefore, a number was added to the relation type

in order to make it unique.
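
The numbering of repeated relation types can be pictured with a short sketch. It is only an illustration under stated assumptions: graphs are held as a list of (relation, argument, argument) tuples, a relation type that occurs once keeps its name, and repeated types are numbered in order of appearance. The helper make_relation_types_unique and the concept labels are hypothetical; the source does not say how the test files were actually produced.

```python
from collections import Counter


def make_relation_types_unique(triples):
    """Append 1, 2, ... to every relation type that occurs more than once.

    Assumption: `triples` is a list of (relation, arg, arg) tuples and
    numbering follows the order in which the relations appear.
    """
    totals = Counter(rel for rel, *_ in triples)
    seen = Counter()
    renamed = []
    for rel, *args in triples:
        if totals[rel] > 1:
            seen[rel] += 1
            rel = f"{rel}{seen[rel]}"
        renamed.append((rel, *args))
    return renamed


# Two stacked blocks, each with a color and a name (hypothetical labels).
graph = [
    ("Above", "b2", "b1"),
    ("OnTable", "b1", "t"),
    ("ATTR", "b1", "c1"),
    ("NAME", "b1", "n1"),
    ("ATTR", "b2", "c2"),
    ("NAME", "b2", "n2"),
]
unique_graph = make_relation_types_unique(graph)
print([rel for rel, *_ in unique_graph])
```

In this example Above and OnTable occur once and keep their names, while the two ATTR and two NAME relations become ATTR1, NAME1, ATTR2 and NAME2.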

7.2.1.3 Increase # of Nodes in Query Graph

Returning to the ‘typical case’ discussed in Section 5.3, it was proposed that

smaller query graphs would take less time to project onto a KB with larger graphs (graphs with more nodes).

Therefore, query graphs ranging in size from 3 nodes all the way to 73 nodes

were tested, given the constraint that no query graph was larger in size than the KB graph

size. This is so the projection was always an injective projection as explained in Section

5.1.1.

Examples of several of the query graphs for the projection are given below in

CGIF:

3-nodes:

(ATTR [Block] [Color])

5-nodes:

(ATTR1 [Block*b] [Color])(NAME1 ?b [Number])

7-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(OnTable ?b [Table])

9-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(CHRC ?b [Shape])(LOC ?b [Place])

15-nodes:

(Above [Block*b2] [Block*b1])(OnTable ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])

27-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])

43-nodes:

(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1)(Above3 [Block*b5] ?b4)(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number])(ATTR5 ?b5 [Color])(NAME5 ?b5 [Number])

Not all of the query graphs were given as examples, especially if the graph structure

is defined in the subsection above for the KB. It should be noted that some of the

3-node query graphs were not exactly the one given. This is because several of the query

graphs had to be slightly modified to make the relations match. This happened not

only with 3-node queries, but with several of the others also. At this time, pCG did not have

a relation hierarchy to account for relation types that were actually specializations of

other relations, so CPE was not given one either.

The set of queries evaluated with each KB was intended to test ‘typical’

queries that would possibly be asked by a user when using a query-answer system. This

is why not all queries that met the injective projection requirement were tested against

all KBs. This is most obvious when examining the data in Table

7.2. There it is seen that the query graph with 7 nodes is only used when testing the 11

node KB graph. Looking at the structure of the 11 node KB graph, one can see that the

“OnTable” relation without an “Above” relation is only used in this graph. Therefore

looking for a block with a name, color and directly on the table without a block above

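Table 7.2: Single Relation: Query Graph Size Run vs Number of Nodes in KB Graphs.

Query graph sizes: 3 5 7 9 11 15 21 27 31 43 53 63 73

KB graph size 5:  X X
KB graph size 11: X X X X X
KB graph size 21: X X X X X X
KB graph size 31: X X X X X X X
KB graph size 53: X X X X X X X X X X
KB graph size 73: X X X X X X X X X X X X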

it would only appear in these graphs. That is why this query graph was tested only with

this KB graph structure.

7.2.2 Multiple Appearance of Relation within a Graph

Because some systems are not able to process multiple relations of

the same type within a single graph, and this ability is perceived to be necessary for

any system working as a general query-answer system, tests were performed on only

the two data structure versions of CPE to validate that this projection algorithm is in fact

able to handle this type of data. As in the above section, multiple sizes of graphs within

the knowledge base were tested as well as multiple sizes of query graphs. Because

these tests were for validation and not for execution time purposes, multiple size KBs

were not tested.

7.2.2.1 Increase # of Nodes in Graphs in KB

Each of these graphs in the KB is designed to test that a query graph that is

contained in more than one subgraph will produce all valid projections. In order to test

several different query graphs with several different node sizes, KB graphs with 13, 23,

33 and 55 nodes were tested.

A sample graph from the single graph KB for each node size is:

13-nodes:

[Block*b1][Block*b2](Above ?b1 ?b2)(OnTable ?b2 [Table])(ATTR ?b1 [Color])(NAME ?b1 [Number])(ATTR ?b2 [Color])(NAME ?b2 [Number])

23-nodes:

(Above [Block*b1] [Block*b2])(OnTable ?b2 [Table*t1])(NAME ?t1 [Number])(ATTR ?b1 [Color])(NAME ?b1 [Number])(CHRC ?b1 [Shape])(LOC ?b1 [Place])(ATTR ?b2 [Color])(NAME ?b2 [Number])(CHRC ?b2 [Shape])(LOC ?b2 [Place])

33-nodes:

(Above [Block*b2] [Block*b1])(Above [Block*b3] ?b2)(OnTable ?b1 [Table*t1])(NAME ?t1 [Number])(ATTR ?b1 [Color])(NAME ?b1 [Number])(CHRC ?b1 [Shape])(LOC ?b1 [Place])(ATTR ?b2 [Color])(NAME ?b2 [Number])(CHRC ?b2 [Shape])(LOC ?b2 [Place])(ATTR ?b3 [Color])(NAME ?b3 [Number])(CHRC ?b3 [Shape])(LOC ?b3 [Place])

55-nodes:

(Above [Block*b2] [Block*b1])(Above [Block*b3] ?b2)(OnTable ?b1 [Table*t1])(OnTable [Block*b4] ?t1)(Above [Block*b5] ?b4)(NAME ?t1 [Number])(ATTR ?t1 [Legs])(ATTR ?b1 [Color])(NAME ?b1 [Number])(CHRC ?b1 [Shape])(LOC ?b1 [Place])(ATTR ?b2 [Color])(NAME ?b2 [Number])(CHRC ?b2 [Shape])(LOC ?b2 [Place])(ATTR ?b3 [Color])(NAME ?b3 [Number])(CHRC ?b3 [Shape])(LOC ?b3 [Place])(ATTR ?b4 [Color])(NAME ?b4 [Number])(CHRC ?b4 [Shape])(LOC ?b4 [Place])(ATTR ?b5 [Color])(NAME ?b5 [Number])(CHRC ?b5 [Shape])(LOC ?b5 [Place])

7.2.2.2 Increase # of Nodes in Query Graph

When examining Table 7.3, it is seen that not as many variations of query graphs

were examined. This is due to the fact that the interest here was in validating that multiple

projection graphs could be found within the KB graphs. There was only a limited

number of nodes that did in fact appear in some form of multiple projection; after that,

as the number of nodes in the query graph grew, the projection operation could only

find a single subgraph projection from the query graph onto the KB graph.

Table 7.3: Multi-Relation: Query Graph Size Run vs Number of Nodes in KB Graphs.

Query graph sizes: 3 5 9 11

KB graph size 13: X X
KB graph size 23: X X X
KB graph size 33: X X X X
KB graph size 55: X X X X

The query graph node structure was designed to produce multiple projections

given the KB graph. The actual query graphs for the projection are given below in CGIF:

3-nodes:

(ATTR [Block] [Color])

5-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])

9-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(CHRC ?b [Shape])(LOC ?b [Place])

11-nodes:

(ATTR [Block*b] [Color])(NAME ?b [Number])(CHRC ?b [Shape])(LOC ?b [Place])(OnTable ?b [Table])

Each of these query graphs will produce multiple projection graphs when using the

KBs discussed in Section 7.2.2.1. As an example of how this will give multiple

projection graphs, if the 5 node query graph is projected onto the 13 node KB graph

it would result in the following two projections:

1. (ATTR [Block*b1] [Color])(NAME ?b1 [Number])

2. (ATTR [Block*b2] [Color])(NAME ?b2 [Number])
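
The two projections above can be reproduced with a small brute-force search. This is a naive backtracking sketch for illustration only, not the projection algorithm of Chapter 5: graphs are assumed to be lists of (relation, concept, concept) tuples with a separate table of concept types, types must match exactly (no type hierarchy), and distinct query concepts must map to distinct KB concepts. All names in the code are hypothetical.

```python
def projections(query, query_types, kb, kb_types):
    """Return every injective mapping of query concepts onto KB concepts."""
    results = []

    def extend(index, binding):
        if index == len(query):
            results.append(dict(binding))
            return
        rel, *q_args = query[index]
        for kb_rel, *kb_args in kb:
            if kb_rel != rel or len(kb_args) != len(q_args):
                continue
            trial = dict(binding)
            ok = True
            for q, k in zip(q_args, kb_args):
                if query_types[q] != kb_types[k]:
                    ok = False      # concept types differ
                elif q in trial:
                    ok = trial[q] == k   # must agree with earlier binding
                elif k in trial.values():
                    ok = False      # keep the mapping injective
                else:
                    trial[q] = k
                if not ok:
                    break
            if ok:
                extend(index + 1, trial)

    extend(0, {})
    return results


# The 13 node KB graph of Section 7.2.2.1, with hypothetical concept labels.
kb = [("Above", "b1", "b2"), ("OnTable", "b2", "t"),
      ("ATTR", "b1", "c1"), ("NAME", "b1", "n1"),
      ("ATTR", "b2", "c2"), ("NAME", "b2", "n2")]
kb_types = {"b1": "Block", "b2": "Block", "t": "Table",
            "c1": "Color", "n1": "Number", "c2": "Color", "n2": "Number"}

# The 5 node query graph: (ATTR [Block*b] [Color])(NAME ?b [Number])
query = [("ATTR", "b", "c"), ("NAME", "b", "n")]
query_types = {"b": "Block", "c": "Color", "n": "Number"}

found = projections(query, query_types, kb, kb_types)
print(sorted(m["b"] for m in found))
```

The search binds the query block once to b1 and once to b2, which corresponds to the two projections listed above.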

7.3 Results of Each Experiment System

This section begins with an overview of the results seen for each of the

experiment systems.

7.3.1 pCG - Original Notio

As will be seen and discussed in the sections below, the pCG system is very

stable when the query graphs are small and when there are few graphs in

the knowledge base. As the size of the query graphs grows towards the size of the graphs

within the knowledge base and as the number of graphs within the knowledge base grows

larger, the error span increases (see error bar data in Section D.2 of Appendix D)

and the system becomes very unstable.

7.3.2 CP Environment

The new projection algorithm presented in Chapter 5 and Section 5.2.2 gave

interesting results in both forms of the tested data structures. The array implementation

did very well on the typical case (as it was designed to do); the hash table implementation

did not come on strong until the size of the graphs within the KBs was increased.

Below some more information is given on why it is believed that these results were

seen.

7.3.2.1 Array (Vector)

As laid out in Chapter 6, the data structures here were all arrays. These

data structures were stored unsorted, except that each concept,

relation, and cstriple had a unique label and was stored according to its appearance

within the CGIF formatted graphs in the file. That did cause the most basic concept

node in the graph to often be stored first in the list, therefore causing it to be

quickly retrieved during the projection operation. However, as the arrays became longer

and some of the concept nodes had an equal number of links, it can be seen that the time

needed to check the structure and build the projection increased. But the increase and

the shape of the resulting polynomial never went outside of the predicted analysis of

the algorithm given in Chapter 5 and Section 5.3.3.

7.3.2.2 Hash Tables

This data structure implementation behaved as expected. In using a perfect

hash, 1) extra time was needed for storage; 2) more space was needed in order for the

KB to be resident in memory; and 3) there was extra overhead in processing the hash

tables. However, even though the projection of small query graphs onto small KB

graphs did not give excellent results, as the size of the graphs within the KBs increased

and the number of graphs within the KB increased, the simple linear regression [73]

showed that the execution time was linear as the size of the query graphs increased. It

is believed that the reason that these results were not seen with the more ‘typical case’

is that the hash tables were designed and implemented so that no collisions would

happen within the tables. This added to both the amount of space needed for the KB

to be resident in memory and the overhead during processing. However, when

the execution time needed for the actual projection in the other implementation reached

the overhead for this implementation, this implementation became more efficient.
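
The trade-off between the two storage schemes can be illustrated with a toy comparison. This is a sketch under stated assumptions rather than the CPE code: ArrayStore and HashStore are hypothetical classes, labels are plain strings, and Python's built-in dict stands in for the perfect hash (it is a general hash table, not a collision-free one). The array version counts label comparisons to show how lookup cost depends on where an entry was stored.

```python
class ArrayStore:
    """Unsorted list of (label, node) pairs, searched front to back."""

    def __init__(self):
        self.items = []
        self.comparisons = 0  # label comparisons made by find()

    def add(self, label, node):
        self.items.append((label, node))

    def find(self, label):
        for stored_label, node in self.items:
            self.comparisons += 1
            if stored_label == label:
                return node
        return None


class HashStore:
    """Label -> node table; a dict stands in for the perfect hash."""

    def __init__(self):
        self.items = {}

    def add(self, label, node):
        self.items[label] = node

    def find(self, label):
        return self.items.get(label)


array_store, hash_store = ArrayStore(), HashStore()
for i in range(1000):
    label = f"concept{i}"
    array_store.add(label, i)
    hash_store.add(label, i)

first = array_store.find("concept0")      # stored first in the array
cost_first = array_store.comparisons
array_store.comparisons = 0
last = array_store.find("concept999")     # stored last in the array
cost_last = array_store.comparisons
print(cost_first, cost_last, hash_store.find("concept999"))
```

An entry stored first is found with a single comparison, while the entry stored last needs one comparison per stored item. The hash lookup does not depend on position, but it pays for building and keeping the table.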

7.4 Results of Each # of Nodes in KB

As discussed previously in this chapter, each of the graph sizes was placed in a

knowledge base of size 1, 5, 100, 1000, 2500 and 5000 graphs. Timings were collected

on runs against part of the graph files, but all of the relevant query graphs. However,

until there were at least 1000 graphs in the KB, there was really no separation in the

acquired execution times; therefore, for each graph size below only the timings for 1000,

2500 and 5000 graphs will be given and discussed.

7.4.1 5 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases

containing graphs with 5 nodes in them. First is Figure 7.6 containing the results for

the projection of query graphs against a KB containing 1000 graphs all with 5 nodes.

As discussed in Section 7.2.1.2, the 5 node KB graph holds the

name of the block that is on the table in the ‘blocks world’ domain. Second is Figure

7.7 containing the results for the projection of query graphs against a KB containing

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.6: 5 nodes in KB of 1000 Graphs.

2500 graphs all with 5 nodes.

Third is Figure 7.8 containing the results for the projection of query graphs

against a KB containing 5000 graphs all with 5 nodes. Looking at this set of 3 charts,

there really is not enough information to indicate what the real growth curve is for the

projection of the query graphs onto the KB graphs. Therefore, the tests were expanded

to include more nodes and more cstriples.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.7: 5 nodes in KB of 2500 Graphs.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.8: 5 nodes in KB of 5000 Graphs.

7.4.2 11 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases

containing graphs with 11 nodes in them. First is Figure 7.9 containing the results for

the projection of query graphs against a KB containing 1000 graphs all with 11 nodes.

These 11 node graphs from the KB, as seen in Section 7.2.1.2, contain the block on the

table as well as the name, color, shape and location of the block. Second is Figure 7.10

containing the results for the projection of query graphs against a KB containing 2500

graphs all with 11 nodes.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.9: 11 nodes in KB of 1000 Graphs.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.10: 11 nodes in KB of 2500 Graphs.

Third is Figure 7.11 containing the results for the projection of query graphs

against a KB containing 5000 graphs all with 11 nodes. Because there are more nodes

and cstriples in the KB graphs, more query graphs can be projected onto these graphs

to see more of a gradation in the set of 3 charts. The slope and shape of the curves

become more distinct as the number of nodes in the query graphs increases. Even with this

smaller number of query graphs being tested, CPEHash is showing a linear slope and

CPE (array format) performs faster than the other two systems.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.11: 11 nodes in KB of 5000 Graphs.

7.4.3 21 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases

containing graphs with 21 nodes in them. First is Figure 7.12 containing the results

for the projection of query graphs against a KB containing 1000 graphs all with 21

nodes. These 21 node graphs not only have the information for the block on the table

including the name, color, shape and location of the block (see Section 7.2.1.2), but

also the same information for the block located above the first block. Second

is Figure 7.13 containing the results for the projection of query graphs against a KB

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.12: 21 nodes in KB of 1000 Graphs.

containing 2500 graphs all with 21 nodes.

Third is Figure 7.14 containing the results for the projection of query graphs

against a KB containing 5000 graphs all with 21 nodes. In all three of these charts, the

slopes of the execution time for projecting the query graph onto the KB graphs for each

system are the same. However, the execution time for projecting the query graph onto

the KB is definitely a function of the number of graphs in the KB. For the actual graph

isomorphism, that is, when the query graph is the same size as the KB graph, the

execution times of the systems are actually coming together.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.13: 21 nodes in KB of 2500 Graphs.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.14: 21 nodes in KB of 5000 Graphs.

7.4.4 31 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases

containing graphs with 31 nodes in them. First is Figure 7.15 containing the results

for the projection of query graphs against a KB containing 1000 graphs all with 31

nodes. These 31 node graphs include the two blocks with their information including

the name, color, shape and location of the block (see Section 7.2.1.2). They also indicate

that the first block is on the table, and that a third block with all of its information is on

top of the second block.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.15: 31 nodes in KB of 1000 Graphs.

Second is Figure 7.16 containing the results for the projection of query graphs

against a KB containing 2500 graphs all with 31 nodes, and third is Figure 7.17 containing

the results for the projection of query graphs against a KB containing 5000 graphs

all with 31 nodes. These charts now show quite clearly that as the number of

nodes in both the query and KB graphs increases, the shape of the curves becomes clearer.

These curves are coming very close to crossing, indicating that with larger graphs some

algorithms perform better than with small graphs. In fact the curves are very close

together when looking at a large size KB.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.16: 31 nodes in KB of 2500 Graphs.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.17: 31 nodes in KB of 5000 Graphs.

7.4.5 53 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases

containing graphs with 53 nodes in them. First is Figure 7.18 containing the results for

the projection of query graphs against a KB containing 1000 graphs all with 53 nodes.

These 53 node graphs (see Section 7.2.1.2) include all three of the blocks in one stack

on the table with their information including the name, color, shape and location of the

block. Also, there is a second stack on the same table with two more blocks including

their information.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.18: 53 nodes in KB of 1000 Graphs.

Second is Figure 7.19 containing the results for the projection of query graphs

against a KB containing 2500 graphs all with 53 nodes. Third is Figure

7.20 containing the results for the projection of query graphs against a KB containing

5000 graphs all with 53 nodes. Because the number of nodes in the KB graphs has

gotten large enough, the curves have now crossed, indicating that the overhead from

the hash tables is no longer having as much effect on the overall execution time. The

CPEHash system is continuing to show a linear curve with the cross-over points being

the same in all three charts.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.19: 53 nodes in KB of 2500 Graphs.

[Chart: time in milliseconds versus number of nodes in the query graph for pCG, CPE and CPEHash, with trend lines Poly. (pCG), Poly. (CPE) and Linear (CPEHash).]

Figure 7.20: 53 nodes in KB of 5000 Graphs.

7.4.6 73 nodes in KB graphs

Here are the result charts for the test runs performed using the knowledge bases

containing graphs with 73 nodes in them. First is Figure 7.21 containing the results for

the projection of query graphs against a KB containing 1000 graphs all with 73 nodes.

These 73 node graphs (see Section 7.2.1.2) include seven blocks in two stacks on the table

with the information for each block including the name, color, shape and location of

the block. The name for the table is also part of each graph in the KB.

[Chart: x-axis "#nodes in Query graph"; y-axis "time in milliseconds"; series pCG, CPE, CPEHash; trend lines Poly. (pCG), Poly. (CPE), Linear (CPEHash).]

Figure 7.21: 73 nodes in KB of 1000 Graphs.

Second is Figure 7.22, containing the results for the projection of query graphs against a KB containing 2500 graphs, all with 73 nodes. Third is Figure 7.23, containing the results for the projection of query graphs against a KB containing 5000 graphs, all with 73 nodes. These three system tests show that the 73-node KBs give the clearest differences in results between pCG, CPE and CPEHash. As before, CPE does best with the smallest query graphs (fewest nodes), but when graph isomorphism is reached, that is, complete coverage of the full graph, the array vector implementation causes a real slowdown. As with the 53-node charts, the cross-over of the curves happens at the same query graph projection in each chart.

[Chart: x-axis "#nodes in Query graph"; y-axis "time in milliseconds"; series pCG, CPE, CPEHash; trend lines Poly. (pCG), Poly. (CPE), Linear (CPEHash).]

Figure 7.22: 73 nodes in KB of 2500 Graphs.

[Chart: x-axis "#nodes in Query graph"; y-axis "time in milliseconds"; series pCG, CPE, CPEHash; trend lines Poly. (pCG), Poly. (CPE), Linear (CPEHash).]

Figure 7.23: 73 nodes in KB of 5000 Graphs.

7.5 Analysis of Results

Each of the results given in the section above is laid out by the number of nodes in the KB graphs. This appears to be the most direct way of evaluating the results received from the tests.

7.5.1 Change # of Graphs in KB

Looking at the results above for the 1000, 2500 and 5000 graphs in each KB, the curves in each chart have the same shape and simply increase in milliseconds in proportion to the number of graphs added to the KB. Adding large numbers of graphs to the KB puts stress on the amount of memory space needed for processing, because the KB needs to stay resident in memory. However, on evaluation of each of the KBs by node size, the shape of the curves in relation to the three system implementations shows no change.

7.5.2 Change # of Nodes in KB Graphs

As the number of nodes in the graphs found in the KB increased, the shape of the curves in the result graphs became more pronounced. That is, as the graph sizes increased and the problems moved closer to “real life”, the effects of the algorithm changes and data structures were more prominent. When looking at the results from the 5-node and 11-node KBs, nearly all the solutions looked the same, except that the hash tables, because of their added overhead, took longer than both of the other solutions. However, as the size of the graphs increased, the curves generated by the results took on either a polynomial, exponential or linear shape. By 53 nodes in the KB graphs, the solutions had started crossing and taking on shape. In the 73-node KB results, the same crossings seen at 53 nodes were present, and the CPE hash table implementation was definitely a linear result when tested with a simple linear regression [73].
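The kind of linearity check mentioned above can be illustrated with an ordinary least-squares fit. This is only a minimal sketch: the helper name `fit_line` is hypothetical, and the timing values are made-up placeholders, not the measurements from these tests. An R^2 value close to 1 would suggest that a straight line describes the series well.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Residual and total sums of squares give the coefficient of determination.
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Hypothetical timings (ms) per query-graph size; illustrative values only.
query_nodes = [5, 15, 25, 35, 45, 55, 65, 73]
times_ms = [120.0, 150.0, 181.0, 209.0, 240.0, 272.0, 299.0, 325.0]

slope, intercept, r2 = fit_line(query_nodes, times_ms)
print(f"slope={slope:.2f} ms/node, intercept={intercept:.1f} ms, R^2={r2:.4f}")
```

The same function could be applied to each implementation's series in turn; a noticeably lower R^2 for one series would hint that a polynomial fit is more appropriate there.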

7.5.3 Change # of Nodes in Query Graph

The query graphs projected onto the KB started small and then increased in number of nodes until the subgraph isomorphism was in fact a graph isomorphism of the KB graph. Because it was desired to only do an injective projection operation, the number of nodes in the query graph never exceeded the number of nodes in the KB graph. As the number of nodes in the query graph increased, the number of concepts in the anchor list also increased. This created more matches of query concepts to match graph concepts and therefore more processing of cstriples during the building of projections phase. Therefore, when the number of nodes in the KB graphs increased, the execution time of the implementation that was faster with small graphs started to increase. By the time the KB graph size reached 53 nodes, even the smaller query graphs were showing similar execution times. That is, for larger graphs in the KB, the execution times for the smaller query graphs were much closer together across all implementations than when the KB graphs were small. Also, as the KB graphs increased in size, more variations in query graph sizes could be tested, which allowed the cross-over for the CPE hash implementation to be visualized. This implementation showed better results than the pCG system at about 27 nodes in the query graphs and better results than the CPE array implementation at about 41 nodes in the query graphs. These results were seen for both the 53-node and 73-node KB graphs.
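A cross-over point of the kind described above can be located mechanically from two timing series. The following is a minimal sketch under stated assumptions: `first_crossover` is a hypothetical helper, and both series are invented (one quadratic-looking, one linear-looking) purely to show the idea; they are not the measured results.

```python
def first_crossover(sizes, baseline_ms, candidate_ms):
    """Return the first size at which the candidate becomes faster than the
    baseline after having been slower or equal, or None if that never happens."""
    was_slower = False
    for size, base, cand in zip(sizes, baseline_ms, candidate_ms):
        if cand >= base:
            was_slower = True
        elif was_slower:
            return size
    return None


# Hypothetical series: the baseline grows quickly, while the candidate carries
# a fixed overhead but grows slowly. Values are illustrative only.
sizes = [5, 15, 25, 35, 45]
quadratic_ms = [10, 60, 160, 310, 510]
linear_ms = [100, 150, 200, 250, 300]

print(first_crossover(sizes, quadratic_ms, linear_ms))
```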

7.5.4 Change # of Identical Relations in Graph

As discussed previously in this chapter, the pCG system cannot process identical relations within a single graph. Therefore, the tests run with multiple instances of the same relation were confined to validating that the CPE system, with both of its data structures, was correctly finding all projections (see Section D.3.2 in Appendix D for actual output). Table 7.4 shows how many projections were found when running the validation tests. These tests gave the same results for both data structures implemented in the CPE system.

Table 7.4: Number of Projections Found: Query Graph Size vs KB Graph Size.

3 5 9 11

13 2 223 2 2 233 3 3 3 155 5 5 5 2

CHAPTER 8

CONCLUSIONS AND FUTURE WORK

8.1 Evaluation of Four Projection Algorithms

Four different, yet related, projection algorithms that use either full Conceptual Graphs (CGs) or Simple Conceptual Graphs (SCGs) have been described (see Chapter 5). Table 8.1 compares them by basic unit, type of graphs, number of possible projections found, projection question analysis, overall problem analysis, and the execution time of the projection operation algorithm and of the actual projection creation.

Table 8.1: Comparison of Four Algorithms.

                M&C           Croitoru      Notio         New Proj
basic unit      relations     relations     relations     concepts
works over      SCGs          SCGs          CGs           CGs
projs found     all           # relations   1             all
proj question   NP-Complete   NP-Complete   NP-Complete   NP-Complete
problem         NP-Hard       NP-Hard       NP-Hard       NP-Hard
proj alg        non-impl      non-impl      n^3           n^3/n

The Mugnier and Chein and Croitoru algorithms use SCGs, while Notio and the new algorithm work over full CGs. Looking back at the example shown when discussing the projection operation, Notio would only find one projection because it was only designed to look for a single projection graph. Croitoru’s algorithm includes a stop mechanism such that the total number of relations in the query graph equals the number of possible projections; therefore, at times it may not find all projections even though the actual algorithm should find all projections.

It is not clear from the Mugnier and Chein 1992 work [74] whether they can handle two concept pairs with the same relationship between them in a projection operation. However, later work [75] indicates that the same relationship between different concepts can be found and that multiple projections are possible between two CGs, but the algorithm is based on SCGs, which do not use actors and are not directed graphs. The Mugnier and Chein algorithm is also based on the relations found within the graph and must traverse all of their signatures to discover if there is a subgraph morphism. The new algorithm is based on the conceptual units, or concepts, within the graph and can stop searching as soon as there is no match in the KB graph for a concept or concept triple of one of the query graph’s conceptual units.

Mugnier and Chein’s algorithm does the whole projection operation as a single injective projection algorithm, whereas Croitoru, Notio and the new algorithm all use some form of preprocessing. Notio and the new algorithm have a complete separation between the preprocessing algorithm and projection, whereas Croitoru uses the preprocessing algorithm inside of the actual projection, therefore giving the same running time for both the overall algorithm and the actual projection. Notio does preprocessing at storage time that helps in constructing the projection. However, the actual projection search problem after the preprocessing is still NP-Hard.

The new algorithm splits the overall projection algorithm into two parts, matching and projection construction. Data structures are then used between these two algorithms to exploit the structure of the graphs in the projection process. In the most common case the matching algorithm is the longest running part of the overall algorithm because the actual projection execution is polynomial.
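The two-part split described above (matching, then projection construction) can be sketched as follows. This is not the dissertation's implementation: the graph representation (a dictionary of concept ids to type labels plus a list of `(source, relation, destination)` triples), the function names, and the exact-label matching without a type hierarchy are all assumptions made for illustration, and the construction phase here is a plain backtracking search rather than the data-structure-driven construction analyzed in this work.

```python
def match_concepts(query, kb):
    """Phase 1 (matching): for each query concept, list the KB concepts with the
    same type label. Returns None as soon as a query concept has no candidate."""
    candidates = {}
    for q_id, q_type in query["concepts"].items():
        found = [k_id for k_id, k_type in kb["concepts"].items() if k_type == q_type]
        if not found:
            return None  # early stop: no projection can exist
        candidates[q_id] = found
    return candidates


def build_projections(query, kb, candidates):
    """Phase 2 (construction): enumerate injective mappings of query concepts to
    KB concepts that preserve every query triple."""
    kb_triples = set(kb["triples"])
    order = list(candidates)
    results = []

    def consistent(mapping):
        # Check only the triples whose two ends are both already mapped.
        return all(
            (mapping[s], rel, mapping[d]) in kb_triples
            for s, rel, d in query["triples"]
            if s in mapping and d in mapping
        )

    def extend(index, mapping):
        if index == len(order):
            results.append(dict(mapping))
            return
        q_id = order[index]
        for k_id in candidates[q_id]:
            if k_id in mapping.values():
                continue  # injective: each KB concept is used at most once
            mapping[q_id] = k_id
            if consistent(mapping):
                extend(index + 1, mapping)
            del mapping[q_id]

    extend(0, {})
    return results


def project(query, kb):
    candidates = match_concepts(query, kb)
    return [] if candidates is None else build_projections(query, kb, candidates)


# Hypothetical blocks-world graphs: two blocks on one table, so the KB graph
# contains the same relation ("on") twice.
kb_graph = {
    "concepts": {"b1": "Block", "b2": "Block", "t": "Table"},
    "triples": [("b1", "on", "t"), ("b2", "on", "t")],
}
query_graph = {
    "concepts": {"x": "Block", "y": "Table"},
    "triples": [("x", "on", "y")],
}
print(project(query_graph, kb_graph))
```

Keeping the candidate lists from the first phase as an explicit structure is what lets the second phase avoid rescanning the KB graph; in this sketch it also makes it easy to return every valid projection rather than only the first one.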

8.1.1 Strengths

All of these algorithms address decision problems that are in the class NP-Complete and have search problems that are in the class NP-Hard; therefore, the strength of the algorithms comes into play in how they handle the ‘typical case’ situations where they would be used.

Since the amount of information available in database records and semantic web pages is increasing yearly, algorithms that can work with knowledge bases containing large amounts of data will be at the forefront.

8.1.2 Weaknesses

Notio has a definite weakness in that it is actually designed to only find one

possible projection graph even when others are available. Also, the SCG Relation algo-

rithm of Croitoru has a weakness in the stop mechanism of the algorithm. This should

be modified to not exclude any possibilities.

The CPE algorithm potentially could take a large amount of time to process the

matching or preprocessing algorithm. However, in most cases discussed within this

work it does not operate on this end of the spectrum.

8.2 Data Structures and Algorithms Effectiveness Comparison for Implemented Algorithms

Since the Mugnier and Chein algorithm and the Croitoru algorithm were not implemented, comparison will only be made between the Notio and CPE algorithms. The pCG system that implemented the Notio algorithm used an added phase between storage and projection to impose internal structure on the stored graphs. Even though this added structural information helped the projection process when the graphs were small in size and the KBs had few graphs, as graph sizes and KBs increased this added phase became very costly in time.

The CPE algorithm with the data structure change also showed some differences between when the graphs and KBs were small and when these elements increased.

8.2.1 Strengths

The pCG algorithm was efficient in its size and memory usage. The graph was stored in a very tight array and hash table structure. Many graphs could be processed before the available memory had to be increased in order to process the projections.

The CPE array data structure was also efficient in its size and memory usage. In fact, in all tests it never ran out of memory, even when the 5000-graph KB was resident. The CPE hash table data structure had the advantage that it executed in linear time as the number of nodes in the query graph increased.

8.2.2 Weaknesses

With the pCG algorithm, the Assertion phase became a real ‘bottleneck’ as the number of graphs in the KB increased. Because it compared all the structures of all the graphs within the KB when asserting them, it took over an hour of actual execution time to assert the 73-node KB with both 2500 and 5000 graphs.

The CPE hash table implementation requires a lot of resident memory for processing the large KBs. This was because it used 10000-element hash table indices to be sure that the labels for all elements (concepts, relations and triples) were unique. The implementation could have been changed to add an extra processing step that redoes the unique identifiers after the graphs are stored, but it was not known how much execution time this would add to the process.

8.3 Significance of Work

Both in a typical scenario where the query graph is small in size (number of relationships between concepts) compared to the graphs in the knowledge base, and in actual execution tests, the new injective projection algorithm: 1) performs projections on full conceptual graphs, 2) finds all projections even when the conceptual relations’ rtypes are not unique, 3) performs the projections faster over a complete KB than the comparison system, and 4) gives good results when executing against a large KB (5000 graphs).

Data structure modifications, when directly integrated into the projection algorithm, produced significant improvement when executed over larger KBs with larger graphs within the KB.

8.3.1 Full Conceptual Graphs

Even though much work has been done with SCGs, full conceptual graphs with all their functionality are desirable. This new algorithm does not have the added restrictions of SCGs and can even process functional relations. Because there are cases of queries over time and space that require full CGs, this new algorithm is significant.

8.3.2 Finds All Valid Projections

This new algorithm finds all valid projections given a query. Because it is not known which projection from the KB graph may provide the needed information, it is necessary to produce all valid projections. Because of the data structure implementation, finding all valid projections does not really cost any more time than finding one.

8.3.3 Data Structure Integration in Algorithm over Large KB and Graphs

The perfect hash table implementation is more efficient with large KBs and large graphs within a KB, even though it required much more storage space and memory allocation. Because the record information within many standard databases is increasing, and because one may wish to store semantic information from the Semantic Web, being able to handle large amounts of data and knowledge is critical. Being able to retrieve a projection onto this large KB is a significant improvement.

8.4 Future Work

As an extension to this dissertation’s work, some lines of research can be continued. The work on the maximal join algorithm can be improved by using the information found in this work. Collaboration with other researchers on the analysis of this algorithm is also possible.

By evaluating the information about the use of the data structures, this work

can help to develop new ideas for storing knowledge base meta-structures in relational

databases for creating the ability to move factual information to a knowledge base and

then return more information back to the original database.

As new benchmark graphs become available within the research community, more domains can be tested with this new algorithm and data structures. Time and space constraints can also be tested with the new algorithm, while adding heuristics to improve the constraint processing.

8.4.1 Experiments and Analysis of Maximal Join Algorithm

In Section 5.2.3, an algorithm was presented to describe the maximal join operation in the same terms as the new projection operation algorithm. Now that the author of the Amine Platform, Adil Kabbaj, wishes to make his system interoperate with other systems [56] and has implemented the full CG Maximal Join operation, the same modifications made to the Projection support routine (see Section 5.2.1.3) will be implemented, tested and analyzed. Amine can be used for comparison and to help in the validation of the new algorithm.

8.4.2 KB Stored From and To Standard Relational DB

Investigation into possibly storing SCGs from relational database records has begun [130]. Given that the new data structure used in storing the CG in this work is in a hash table format, this structure could be translated into a relational database record structure. This meta-data could then be used to store the full structure of the CG. Once the CG is stored in the database, retrieving it back into a knowledge base would be easy to implement.
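One way to picture the translation from hash-table-style graph storage to relational records is sketched below with Python's standard `sqlite3` module. The table names (`concept`, `triple`), column names, and the dictionary/triple representation of a graph are assumptions chosen for illustration; they are not the schema proposed in this work or in [130].

```python
import sqlite3


def store_graph(conn, graph_id, concepts, triples):
    """Write one graph as rows: one row per concept and one row per triple."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concept ("
        "graph_id TEXT, concept_id TEXT, type_label TEXT, "
        "PRIMARY KEY (graph_id, concept_id))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triple ("
        "graph_id TEXT, src TEXT, relation TEXT, dst TEXT)"
    )
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?)",
        [(graph_id, cid, label) for cid, label in concepts.items()],
    )
    conn.executemany(
        "INSERT INTO triple VALUES (?, ?, ?, ?)",
        [(graph_id, s, r, d) for s, r, d in triples],
    )
    conn.commit()


def load_graph(conn, graph_id):
    """Rebuild the in-memory (hash table style) structures from the rows."""
    concepts = dict(
        conn.execute(
            "SELECT concept_id, type_label FROM concept WHERE graph_id = ?",
            (graph_id,),
        )
    )
    triples = [
        tuple(row)
        for row in conn.execute(
            "SELECT src, relation, dst FROM triple WHERE graph_id = ? ORDER BY rowid",
            (graph_id,),
        )
    ]
    return concepts, triples


# Round trip on a hypothetical two-block graph, using an in-memory database.
conn = sqlite3.connect(":memory:")
concepts_in = {"b1": "Block", "b2": "Block", "t": "Table"}
triples_in = [("b1", "on", "t"), ("b2", "on", "t")]
store_graph(conn, "g1", concepts_in, triples_in)
concepts_out, triples_out = load_graph(conn, "g1")
print(concepts_out, triples_out)
```

Because each concept and triple keeps its identifier as a key column, the rows map back onto dictionary-like structures without any further parsing, which is the property the paragraph above relies on.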

8.4.3 Time and Space Constraints

Constraints are divided into several groups. Some constraints work by modifying and/or evaluating the elements of the domain that are actually processed. Some constraints apply heuristics to decide what information seen during processing should continue to be considered. These heuristics may be very simple, such as whether the current domain element is TRUE or FALSE (maintaining basic truth), or they may be very complex. In constraint-satisfaction problems, quantification operations are used in a Prolog-type fashion to assign values and variables subject to a set of constraints [28]. Constraint specifications give a convenient form for expressing known knowledge while allowing the system designer to focus on local relationships among entities within the domain. The next sub-section will discuss heuristic constraints.

Other constraints are concerned with time and space relationships between the domain elements and between the actual conceptual units within the semantic network. These constraints use qualitative relationships to propagate over time and space. As discussed in the Qualitative Section (see 2.1.2.3), these are interval relationships that are set up “point to point”. In Figure 8.1, adapted from Allen’s 1991 paper (p. 346) [4], seven of the basic interval relationships originally discussed in the 1983 Allen paper [3] are shown. The six other relationships, which are the inverses of some of those shown, are not depicted. In the sections below, these relationships will be discussed as they relate to time and space.
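The interval relationships can be made concrete with a small classifier over `(start, end)` pairs. This is a sketch under assumptions: the relation names follow common usage for Allen's interval relations (before, meets, overlaps, starts, during, finishes, equals) and may differ from the labels used in Figure 8.1, the `-inverse` suffix is an illustrative naming choice for the six inverse relations, and numeric end points are assumed.

```python
def allen_relation(a, b):
    """Classify how interval a relates to interval b.

    Each interval is a (start, end) pair with start < end. One of seven basic
    names is returned, or the basic name of the reversed pair with the suffix
    "-inverse" for the six inverse relations.
    """
    a_start, a_end = a
    b_start, b_end = b
    if not (a_start < a_end and b_start < b_end):
        raise ValueError("intervals must satisfy start < end")
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start and a_end < b_end:
        return "starts"
    if a_end == b_end and a_start > b_start:
        return "finishes"
    if a_start > b_start and a_end < b_end:
        return "during"
    if a_start < b_start < a_end < b_end:
        return "overlaps"
    # None of the seven basic relations holds, so the reversed pair must
    # satisfy one of the six basic relations other than "equals".
    return allen_relation(b, a) + "-inverse"


print(allen_relation((1, 3), (3, 6)))
print(allen_relation((4, 9), (1, 5)))
```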

Figure 8.1: Interval Time Relationships.

8.4.3.1 Heuristics

Heuristics are criteria, methods, or principles for deciding which among several alternative courses of action promises to be the most effective in order to achieve some goal [85]. The idea here is to define a simple criterion that discriminates between good and bad selections. One may choose a heuristic that is just a rule of thumb that guides a selection, or one may look to see if the outcome of applying a heuristic appears to lead to a “stronger” position. Good heuristics provide a simple means of indicating which course of action will lead to the preferred goal with a quick path, even if it is not the most effective [85]. In general, most reasoning problems are very complex and have large numbers of cases to evaluate to find an answer. Heuristics allow the number of cases to be reduced and a shorter, even though maybe not the most direct, solution to be discovered within reasonable time constraints.

Heuristics use quantification operations to prune and shape the evaluation of information. The information may be checked with heuristics either for its feasibility in leading to a valuable solution or for its correctness in the world that the system knows as reality.

8.4.3.2 Time

Time intervals over moments in time are processed using qualitative relationships. Temporal reasoning requires that the knowledge representation be able to define and process a snapshot of time. A snapshot is a constraint in the time interval where only one moment in time, of zero duration, is processed. Dean and McDermott [27] saw time in terms of duration constraints. Figure 8.2 gives an example from the Dean and McDermott 1987 paper (p. 41) [27] of how a time map can be designed, seeing snapshots as point to point with the duration constraints encoded between the snapshots. It is like a time slice across the current states and schematics, which will be called a situation, of the objects being considered. The situations may be processed in a forward or backward direction of snapshots.

Figure 8.2: A Simple Time Map.

As can be seen in Figure 8.3, an interval can be set up with a start and end time and can be assigned as a property of an act. This property is the time duration of the interval and can be viewed on either a fixed time scale or a relative time scale. In Figure 8.3 relative time is used because the time line does not have actual time values. Time intervals, when used with a relative scale, require a way of knowing where to start to investigate for information. Following the time map given, the ball goes through suspended, drop, then falling; a choice is now made between continuing in this time stretch with bounce, rising, and stop, or with roll and rolling. If the choice is for the bounce, then as the stop interval finishes, the falling event will return. If the roll event occurs, there will be no circling back to the falling event. In many temporal reasoning systems, this is done by time indexing of time tokens at insertion of events; however, as discussed in the Mukerjee review [76], sometimes the “neat” durations are not available.

Now, as one looks at the full time interval, each time slice for an object can be seen as a constraint. If each of these constraints is viewed as its own act property, then the full time interval picture will be modified depending on which time slices are current and/or which constraints are satisfied.
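The bouncing ball time map described in this section can be written down as a table of which event may follow which, using relative ordering only. This is a minimal sketch: the event names come from the prose description, while the dictionary layout and the helper `is_valid_sequence` are hypothetical and are not taken from Figure 8.3 itself.

```python
# Which events may directly follow each event in the relative time map.
NEXT_EVENTS = {
    "suspended": ["drop"],
    "drop": ["falling"],
    "falling": ["bounce", "roll"],  # the choice point
    "bounce": ["rising"],
    "rising": ["stop"],
    "stop": ["falling"],            # the bounce branch circles back
    "roll": ["rolling"],
    "rolling": [],                  # the roll branch never returns to falling
}


def is_valid_sequence(events):
    """True if every consecutive pair of events is allowed by the time map."""
    return all(nxt in NEXT_EVENTS[cur] for cur, nxt in zip(events, events[1:]))


one_bounce_then_roll = [
    "suspended", "drop", "falling", "bounce", "rising", "stop",
    "falling", "roll", "rolling",
]
print(is_valid_sequence(one_bounce_then_roll))
print(is_valid_sequence(["falling", "roll", "rolling", "falling"]))
```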

Figure 8.3: Time Chart for a Bouncing Ball.

8.4.3.3 Space

Regions within space at a location are also processed using qualitative functionality. Unlike temporal reasoning with a starting and ending time, space does not have a time line flow, but can be a multi-dimensional patchwork [76]. However, one can still look at regions that are space sliced according to locations across processes and chronicles, but one does not get a concept of input and output to the spatial relationships [47].

In Figure 8.4, it can be seen that over time a ball that is bouncing appears in different locations (bounce heights) in space (left axes). When working with spatial constraints [12], the key is to find objects that fall into some spatially organized category such as a region and then sort according to the category. Because there is only one object in this example it is harder to see, but if two balls were bouncing, having been dropped at different times, the constraints could be categorized by whether there are zero, one or two balls in the same space slice.
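The two-ball categorization suggested above can be sketched by cutting height into equal space slices and counting the balls in each slice at one moment. The slice height, the ball heights, and the helper names are all hypothetical values chosen for illustration.

```python
from collections import Counter


def slice_index(height, slice_height):
    """Index of the space slice that contains the given height."""
    return int(height // slice_height)


def balls_per_slice(heights, slice_height, num_slices):
    """Count how many balls fall into each space slice at one moment in time."""
    counts = Counter(slice_index(h, slice_height) for h in heights)
    return [counts.get(i, 0) for i in range(num_slices)]


# Two hypothetical balls dropped at different times, observed at two moments.
print(balls_per_slice([0.4, 1.7], slice_height=1.0, num_slices=3))
print(balls_per_slice([1.2, 1.9], slice_height=1.0, num_slices=3))
```

Each entry of the returned list is then zero, one or two, which is exactly the category by which the constraints would be sorted.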

Figure 8.4: Conceptual Space Diagram for a Bouncing Ball.

8.4.4 Different Domain Problems and Interoperability

The architecture of the CPE system was designed specifically to address the need to communicate and interoperate with other systems [87, 89]. The next step is to work with multiple domains of information and start to connect the modules to as many applications as possible. Work has already been proceeding on using the CPE knowledge base as the “back-end” for a Story Understanding System [14, 88, 89].

APPENDICES

APPENDIX A

PROGRAMMING LANGUAGE CRITERIA

A.1 Language Evaluation

Each of the applications/systems defined in Chapter 6 may or may not share the same implementation language. So an added complication, besides different internal data structures, is that an application may not be able to communicate at a function call level with another application because they are not written in the same implementation language.

When looking to design an application with flexible modules, the question arises of what implementation language should be used. Since it was desired that the system work with conceptual structures as the internal representation, existing CS systems were examined. First, the CS editors that were currently available were evaluated. These editors, for editing CGs and FCAs, turned out to have different implementation languages. CharGer [29, 30] is based on the API/Implementation code of Notio, which is written in Java [117, 115]. ARCEdit is a plug-in to PowerPoint [96] and is written in Visual Basic 6.0. ToscanaJ [6, 7] has an editor as part of its suite of tools written in Java™. While Docco is actually based on a Conceptual Email Manager [18] written in C++/QT, the commercial version of the manager [36] is a plug-in for Microsoft Outlook.

Since Notio, written in Java, is already defined at an API/Implementation code level and is available with extensible class definitions, the author considered using the Notio interface for the CGIF (see Section 3.4.4) module. However, there are some drawbacks in communications to Java (see Section A.2.2), and Notio is in hiatus and is not currently being enhanced or developed [116].

All of the applications in the conceptual structures community employ different implementation languages, such as Prolog, XML, Schema, RDF, etc., which made it difficult to pass even simple syntactic representation data by linking languages in modules. Files, streams, pipes, blackboards, etc. can be used to pass data information without passing the actual data structures, but these mechanisms can be slow if there are a large number of graphs or the graphs are extremely complex. Every time one application process needs to talk to another, these mechanisms require multiple file descriptors to be opened. If the applications or systems execute on different machines, the Flexible Module Framework (FMF) architecture designed by John Sowa [124] is a flexible way of passing the syntactic data representation; but if the applications and systems can be executed on the same machine configuration, a good API/Implementation design would be more advisable because the module can be linked directly into the existing application. Communication by files and other stream devices may require a locking mechanism to be set up, so that one application can know when it is safe to read the input graphs from another application. The locking of records can cause a problem when two applications communicate by way of shared databases or message passing systems, such as MPI. If, on the other hand, an application/system can call another application/system directly (or can link to it), processing can go more quickly.

However, connecting systems when the implementation languages are not the same is more difficult, because a straightforward “call” to the other system’s functions is not always possible. Each language implementation has its own calling specifications.

In order for data to be transferred between working tools, either all the tools must be implemented within the same environment as a single application or there must be an interchange format. When tools are developed through a single system, the same data structure (or model) can be shared among all the tools so that data can be stored and retrieved. However, when tools are not part of the same system, they do not necessarily share the same internal data structure (or design model). To support interoperability for applications [101], an interchange format must be defined. This interchange format must be agreed upon by the whole working community. When this standard format is used to move data between applications, standard benchmark tests can be developed.

One module concentrated on, with respect to data structures for the processors,

is the use of the Conceptual Graph Interchange Format, CGIF, for communication [122,


126]. This is not to say that a processor is constrained in its internal structures or

implementation by CGIF, but it makes sense to examine the correspondence between CGIF

and the appropriate data types with a view to minimizing the difficulties of parsing and

generating CGIF syntax [87]. A definition of the actual CGIF syntax and semantic

interpretation used for Conceptual Graphs data structure can be seen in section 3.4.4.

A.1.1 Visual Basic .Net

If it were desirable to have the component modules available only for execution

under the Microsoft Windows OS, Visual Basic .Net would not have any of the

connection or interface problems discussed above. However, this would not allow the

components’ implementation to be moved off the Microsoft Windows OS; if the

modules are not implemented in Visual Basic .Net, then they can be made more widely

available under Linux operating systems and eventually under other operating systems

such as Unix.

A.1.2 JavaTM

Java has a very nice visual interface and is able to be transferred to many dif-

ferent operating systems. However, it is an interpreted language and takes more time to

execute than a language that is resident to the machine.


A.1.3 C

C was originally designed to be an operating system base language. Like Visual

Basic .Net, it is tied to the OS that it is running under. Because of this, it is much faster

than interpreted languages like Java, but is not as portable. It is also designed to be

coded “bottom up”, such that routines are built into libraries and then called from an

overall application.

A.1.4 C++

C++ is an Object-Oriented Programming (OOP) language, developed by Bjarne

Stroustrup [127], that is an evolutionary extension of the language C. Even

though it accepts the C syntax, it improves on many features of the language. In

particular, programs written in C++ can be coded “top down” by designing what objects

are needed within the program and then how they relate. The actual code comes

directly from the design and specification of the program instead of linking existing

routines together.

A.2 Language Comparison

In order to know which language would be best for implementation of the new

environment’s modular components, so that they could possibly be used directly by

other applications, an evaluation is performed of how the C++ language interacts,

interfaces and communicates with other languages.


A.2.1 C++ to C

Interfacing implementation code between languages that are somewhat similar,

for example: C++ and C, is not as difficult as other communications between languages.

However, this connection may not be bidirectional. The calling sequence for the C

language is simpler than for the C++ language, because C++ does name mangling with

the name of the function, the types of the arguments and the return type of the function.

C does not do the same name mangling and uses only a modified form of the actual

name of the function.

Therefore when designing an API in C++, the interface routines should be ex-

ported as “C” functions as opposed to being methods for a class in C++; this will

prevent the routines from being name mangled by the C++ compiler. Wrapping C++

with standard C routines allows the internal implementation of the module to remain

C++ and use the classes and methods functionality from C++, while at the same time

using the simpler formulation of the name of the calling routine provided by C.

A.2.2 C++ to JavaTM

Connecting C++ to Java is also possible, but is more difficult than communicating

with C. This connection is also not bidirectional, but for different reasons. Java is

a simpler language than C++ [102], but it is an interpreted language. This means that

Java can be byte-compiled, creating a smaller file to be moved across the web, but it is


not compiled to machine code. This allows Java to be platform independent; C++ is a

compiled language and is both platform dependent and operating system (OS) depen-

dent. However, because Java is interpreted instead of compiled, it can not be linked and

called directly by a system (application) that is not written in Java. Java must start the

process, and then can call compiled code in some of the other compiled languages (for

example: C/C++). Therefore, Java can call interface functions written in C or C++, but

C++ can not call Java directly.

A.2.3 C++ to Prolog

Connecting C++ to Prolog is very similar to connecting C++ to Java. Prolog

has foreign function routines for calling C++/C functions. Also, when using a particular

operating system and version of Prolog, communication may be provided by the Prolog

system (for example: Amzi! Prolog Logic Server and Microsoft C++) for integrating

C++ and Prolog routines. However, in general, like Java and Lisp (not discussed in this

paper), Prolog must start the process of executing the system and then call to the C++

routines, but C++ cannot call directly to Prolog.

A.2.4 C++ to Visual Basic 6.0

The connection or interface from C++ to Visual Basic 6.0 is the most difficult

connection among the four languages discussed in this paper. One reason is that Visual

Basic 6.0 is a two part language; an event driven module part and a class module part.


The class module part is very similar to C++ and holds the characteristics that

are available in object-oriented languages. Class modules can also be compiled just

like C++ to native code for the machine. However, the event driven part executes Basic

code in response to an event. These event driven procedures (routines) are triggered by

a form or control which is hooked into the visual part of the language. The triggering

of a routine by an event is similar to the interpretation of a function call in languages

like Java. Because of the event driven part of the language, Visual Basic 6.0 can call

C++ or C API/Interface routines, but C++/C cannot trigger an event within the Visual

Basic code, so the event procedures (routines) are not executed outside of Visual Basic

code.

A second reason Visual Basic 6.0 is difficult to connect is that it has a different

encoding of some of its data types than C, C++, or Java [105]. Character data is stored in

more bits by Visual Basic than by C. Therefore, to pass a character string as a parameter

from Visual Basic to C or vice versa, the character string must first be converted to Unicode,

then passed as a parameter, and then decoded from Unicode at the other end.

This makes passing character data much more cumbersome. Also, Visual Basic defines

different Boolean values than C; the “false” value is 0 in both languages, but the “true”

value for Visual Basic is -1 (negative), whereas in C it is 1 (positive). Therefore, in passing

Boolean values, the user must be careful when working with conditionals.
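The two conversions described above can be sketched in a few lines. The following Python fragment is only an illustration of the idea, not code from any of the systems discussed: all function names are hypothetical, and the choice of UTF-16 (little endian) as the Unicode encoding is an assumption, since the text only states that the string is converted to Unicode.

```python
# Hypothetical helpers; none of these names come from Visual Basic or C.
VB_TRUE = -1   # "true" value in Visual Basic, as stated in the text
C_TRUE = 1     # "true" value in C, as stated in the text
FALSE = 0      # "false" value shared by both languages


def vb_bool_to_c(value: int) -> int:
    """Map a Visual Basic Boolean (0 or -1) to a C Boolean (0 or 1)."""
    return FALSE if value == FALSE else C_TRUE


def c_bool_to_vb(value: int) -> int:
    """Map a C truth value (0 or any nonzero value) to a Visual Basic Boolean."""
    return FALSE if value == FALSE else VB_TRUE


def to_wire(text: str) -> bytes:
    """Encode a string as Unicode bytes (UTF-16LE is an assumption here)."""
    return text.encode("utf-16-le")


def from_wire(data: bytes) -> str:
    """Decode bytes produced by to_wire back into a string."""
    return data.decode("utf-16-le")
```

Testing a received value against zero, rather than comparing it with a specific "true" constant, is what keeps a conditional correct on both sides of the interface.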


APPENDIX B

DOCUMENTATION OF CGIF - VERSION 2001

B.1 Added Definitions For CGIF Categories

Context.

A context C is a concept whose designator is a nonblank conceptual graph g.

• The graph g is said to be immediately nested in C, and any concept c of g is said

to be immediately nested in C.

• A concept c is said to be nested in C, if either c is immediately nested in C or c is

immediately nested in some context D that is nested in C.

• Two concepts c and d are said to be co-nested if either c=d or there is some

context C in which c and d are immediately nested.

• If a concept x is co-nested with a context C, then any concept nested in C is said

to be more deeply nested than x.

• A concept d is said to be within the scope of a concept c if either d is co-nested

with c or d is more deeply nested than c.
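The nesting relations defined above can be made concrete with a small sketch. The Python fragment below is one possible reading of the definitions, not an implementation taken from any CG tool; the class and function names are hypothetical. It also assumes that the outermost graph acts as an implicit context, so that two top-level concepts (whose parent is None) count as co-nested.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Concept:
    """A concept; parent is the context it is immediately nested in."""
    name: str
    parent: Optional[Concept] = None  # None means the outermost graph (assumption)


def nested_in(c: Concept, ctx: Concept) -> bool:
    """True if c is immediately nested in ctx or in a context nested in ctx."""
    p = c.parent
    while p is not None:
        if p is ctx:
            return True
        p = p.parent
    return False


def co_nested(c: Concept, d: Concept) -> bool:
    """True if c is d, or both are immediately nested in the same context."""
    return c is d or c.parent is d.parent


def more_deeply_nested(d: Concept, x: Concept) -> bool:
    """True if d is nested in some context that is co-nested with x."""
    p = d.parent
    while p is not None:
        if co_nested(x, p):
            return True
        p = p.parent
    return False


def within_scope(d: Concept, c: Concept) -> bool:
    """True if d is co-nested with c or more deeply nested than c."""
    return co_nested(d, c) or more_deeply_nested(d, c)
```

With an outer concept a, an outer context ctx, and a concept e nested two levels down inside ctx, within_scope(e, a) holds while within_scope(a, e) does not, which reflects the asymmetry of scope in the definitions.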


Coreference Set.

A coreference set C in a conceptual graph g is a set of one or more concepts selected

from g or from graphs nested in contexts of g.

• For any coreference set C, there must be one or more concepts in C, called the

dominant concepts of C, which include all concepts of C within their scope. All

dominant concepts of C must be co-nested.

• If a concept c is a dominant concept of a coreference set C, it may not be a

member of any other coreference set.

• A concept c may be a member of more than one coreference set C1, C2, ... provided

that c is not a dominant concept of any Ci.

• A coreference set C may consist of a single concept c, which is then the dominant

concept of C.

Referent.

Adding to the definition already seen in Definition 3.4.2, a referent of a concept is

specified by a quantifier, a designator, and a descriptor.

• Quantifier. A quantifier is one of two kinds: existential or defined.

• Designator. A designator is one of three kinds:


1. literal, which may be a number, a string, or an encoded literal;

2. locator, which may be an individual marker, an indexical, or a name;

3. undetermined.

• Descriptor. A descriptor is a conceptual graph, possibly blank, which is said to

describe the referent.

B.2 Lexical Categories

The CGIF lexical categories can be recognized by a finite-state tokenizer or

preprocessor. No characters of white space (blanks or other nonprinting characters)

are permitted inside any lexical item other than delimited strings (names, comments,

or quoted strings). Zero or more characters of white space may be inserted or deleted

between any lexical categories without causing an ambiguity or changing the syntactic

structure of CGIF. The only white space that should not be deleted is inside delimited

strings.

Comment.

A comment is a delimited string with a semicolon ";" as the delimiter.

Comment ::= DelimitedStr(";")


DelimitedStr(D).

A delimited string is a sequence of two or more characters that begins and ends with a

single character D called the delimiter. Any occurrence of D other than the first or last

character must be doubled.

DelimitedStr(D) ::= D (AnyCharacterExcept(D) | D D)* D
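The doubling rule for delimited strings can be illustrated with a pair of helper functions. The Python sketch below is an illustration only; the function names are hypothetical and do not belong to any CGIF library. The encoder wraps a text in the delimiter and doubles interior occurrences, and the decoder reverses this while rejecting tokens in which an interior delimiter is not doubled.

```python
def encode_delimited(text: str, delim: str) -> str:
    """Wrap text in delim, doubling every interior occurrence of delim."""
    return delim + text.replace(delim, delim * 2) + delim


def decode_delimited(token: str, delim: str) -> str:
    """Recover the text of a delimited string, validating the doubling rule."""
    if len(token) < 2 or token[0] != delim or token[-1] != delim:
        raise ValueError("token is not wrapped in the delimiter")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == delim:
            # An interior delimiter must be immediately followed by a second copy.
            if i + 1 >= len(body) or body[i + 1] != delim:
                raise ValueError("interior delimiter is not doubled")
            i += 1  # skip the second copy
        out.append(ch)
        i += 1
    return "".join(out)
```

The same pair serves comments, names, and quoted strings, since these categories differ only in the delimiter character passed in.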

Exponent.

An exponent is the letter E in upper or lower case, an optional sign ("+" or "-"), and an

unsigned integer.

Exponent ::= ("e" | "E") ("+" | "-")? UnsignedInt

Floating.

A floating-point number is a sign ("+" or "-") followed by one of three options: (1)

a decimal point ".", an unsigned integer, and an optional exponent; (2) an unsigned

integer, a decimal point ".", an optional unsigned integer, and an optional exponent; or

(3) an unsigned integer and an exponent.

Floating ::= ("+" | "-") ("." UnsignedInt Exponent? | UnsignedInt ("." UnsignedInt? Exponent? | Exponent))


Identifier.

An identifier is a string beginning with a letter or underscore "_" and continuing with

zero or more letters, digits, or underscores.

Identifier ::= (Letter | "_") (Letter | Digit | "_")*

Integer.

An integer is a sign ("+" or "-") followed by an unsigned integer.

Integer ::= ("+" | "-") UnsignedInt

Name.

A name is a delimited string with a single quote "'" as the delimiter.

Name ::= DelimitedStr("'")

Number.

A number is an integer or a floating-point number.

Number ::= Floating | Integer

QuotedStr.

A quoted string is a delimited string with a double quote '"' as the delimiter.

QuotedStr ::= DelimitedStr('"')


UnsignedInt.

An unsigned integer is a string of one or more digits.

UnsignedInt ::= Digit+
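Because the numeric and identifier categories above are regular, each rule can be transcribed into a regular expression. The Python sketch below is such a transcription, offered as an illustration rather than as a reference tokenizer; the names are hypothetical, and restricting Letter to ASCII letters is an assumption, since the text does not define Letter.

```python
import re

# Each pattern mirrors one grammar rule from this section.
UNSIGNED_INT = r"[0-9]+"
EXPONENT = rf"[eE][+-]?{UNSIGNED_INT}"
FLOATING = (
    rf"[+-](?:\.{UNSIGNED_INT}(?:{EXPONENT})?"
    rf"|{UNSIGNED_INT}(?:\.(?:{UNSIGNED_INT})?(?:{EXPONENT})?|{EXPONENT}))"
)
INTEGER = rf"[+-]{UNSIGNED_INT}"
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"  # ASCII letters only: an assumption


def classify(token: str) -> str:
    """Return the first lexical category whose pattern matches the whole token."""
    for name, pattern in (
        ("Floating", FLOATING),
        ("Integer", INTEGER),
        ("UnsignedInt", UNSIGNED_INT),
        ("Identifier", IDENTIFIER),
    ):
        if re.fullmatch(pattern, token):
            return name
    return "Unknown"
```

Note that, under the rules as given here, both Floating and Integer require a leading sign, so an unsigned token such as 3.5 matches neither of them.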

B.3 Syntactic Categories

The CGIF syntactic categories are defined by a context-free grammar that can be

processed by a recursive-descent parser. Zero or more characters of white space (blanks

or other nonprinting characters) are permitted between any two successive constituents

of any grammar rule that defines a syntactic category.

Actor.

An actor begins with "<" followed by a type. It continues with zero or more input arcs,

a separator "|", zero or more output arcs, and an optional comment. It ends with ">".

Actor ::= "<" Type(N) Arc* "|" Arc* Comment? ">"

The arcs that precede the vertical bar are called input arcs, and the arcs that follow the

vertical bar are called output arcs. The valence N of the actor type must be equal to the

sum of the number of input arcs and the number of output arcs.


Arc.

An arc is a concept or a bound label.

Arc ::= Concept | BoundLabel

BoundLabel.

A bound label is a question mark “?” followed by an identifier.

BoundLabel ::= "?" Identifier

CG.

A conceptual graph is a list of zero or more concepts, relations, actors, special contexts,

or comments.

CG ::= (Concept | Relation | Actor | SpecialContext | Comment)*

The alternatives may occur in any order provided that any bound coreference label must

occur later in the CGIF stream and must be within the scope of the defining label that

has an identical identifier. The definition permits an empty CG, which contains nothing.

An empty CG, which says nothing, is always true.

CGStream.

A conceptual graph stream is defined as a sequence of one or more CGs, each separated

by a period.


CGStream ::= CG ("." CG)*

Since a CG may itself be empty, the string "....." would also qualify as a CG Stream; as

well as an empty file.

Concept.

A concept begins with a left bracket "[" and an optional monadic type followed by

optional coreference links and an optional referent in either order. It ends with an

optional comment and a required "]".

Concept ::= "[" Type(1)? {CorefLinks?, Referent?} Comment? "]"

If the type is omitted, the default type is Entity. This rule permits the coreference labels

to come before or after the referent. If the referent is a CG that contains bound labels

that match a defining label on the current concept, the defining label must precede the

referent.

Conjuncts.

A conjunction list consists of one or more type terms separated by "&".

Conjuncts(N) ::= TypeTerm(N) ("&" TypeTerm(N))*

The conjunction list must have the same valence N as every type term.


CorefLinks.

Coreference links are either a single defining coreference label or a sequence of zero or

more bound labels.

CorefLinks ::= DefLabel | BoundLabel*

If a dominant concept node (as defined in subsection B.1) has any coreference label, it

must be either a defining label or a single bound label that has the same identifier as the

defining label of some co-nested concept.

DefLabel.

A defining label is an asterisk “*” followed by an identifier.

DefLabel ::= "*" Identifier

The concept in which a defining label appears is called the defining concept for that

label; a defining concept may contain at most one defining label and no bound corefer-

ence labels. Any defining concept must be a dominant concept as defined in subsection

B.1.

Every bound label must be resolvable to a unique defining coreference label

within the same context or some containing context. When conceptual graphs are im-

ported from one context into another, however, three kinds of conflicts may arise:


1. A defining concept is being imported into a context that is within the scope of

another defining concept with the same identifier.

2. A defining concept is being imported into a context that contains some nested

context that has a defining concept with the same identifier.

3. Somewhere in the same module there exists a defining concept whose identifier

is the same as the identifier of the defining concept that is being imported, but

neither concept is within the scope of the other.

In cases (1) and (2), any possible conflict can be detected by scanning no further than

the right bracket "]" that encloses the context into which the graph is being imported.

Therefore, in those two cases, the newly imported defining coreference label and all its

bound labels must be replaced with an identifier that is guaranteed to be distinct. In

case (3), there is no conflict that could affect the semantics of the conceptual graphs

or any correctly designed CG tool; but since a human reader might be confused by the

similar labels, a CG tool may replace the identifier of one of the defining coreference

labels and all its bound labels.

Descriptor.

A descriptor is a structure or a nonempty CG.

Descriptor ::= Structure | CG


A context-free rule, such as this, cannot express the condition that a CG is only called

a descriptor when it is nested inside some concept.

Designator.

A designator is a literal, a locator, or a quantifier.

Designator ::= Literal | Locator | Quantifier

Disjuncts.

A disjunction list consists of one or more conjunction lists separated by "|".

Disjuncts(N) ::= Conjuncts(N) ("|" Conjuncts(N))*

The disjunction list must have the same valence N as every conjunction list.

FormalParameter.

A formal parameter is a monadic type followed by an optional defining label.

FormalParameter ::= Type(1) DefLabel?

The defining label is required if the body of the lambda expression contains any match-

ing bound labels.


Indexical.

An indexical is the character “#” followed by an optional identifier.

Indexical ::= "#" Identifier?

The identifier specifies some implementation-dependent method that may be used to

replace the indexical with a bound label.

IndividualMarker.

An individual marker is the character “#” followed by an integer.

IndividualMarker ::= "#" UnsignedInt

The integer specifies an index to some entry in a catalog of individuals.

LambdaExpression(N).

A lambda expression begins with "(" and the keyword "lambda", it continues with a signature

and a conceptual graph, and it ends with ")".

LambdaExpression(N) ::= "(" "lambda" Signature(N) CG ")"

A lambda expression with N formal parameters is called an N-adic lambda expression.

The simplest example, represented "(lambda ())", is a 0-adic lambda expression with a

blank CG.


Literal.

A literal is a number or a quoted string.

Literal ::= Number | QuotedStr

Locator.

A locator is a name, an individual marker, or an indexical.

Locator ::= Name | IndividualMarker | Indexical

Negation.

A negation begins with a tilde “~” and a left bracket “[” followed by a conceptual graph

and a right bracket “]”.

Negation ::= "~[" CG "]"

A negation is an abbreviation for a concept of type Proposition with an attached relation

of type Neg. It has a simpler syntax, which does not permit coreference labels or

attached conceptual relations. If such options are required, the negation can be expressed

by the unabbreviated form with an explicit Neg relation.

Quantifier.

A quantifier consists of an at sign “@” followed by an unsigned integer or an identifier

and an optional list of zero or more quoted strings enclosed in braces.


Quantifier ::= "@" (UnsignedInt | Identifier ("{" (QuotedStr ("," QuotedStr)*)? "}")?)

The symbol @some is called the existential quantifier, and the symbol @every is called

the universal quantifier. If the quantifier is omitted, the default is @some.

Referent.

A referent (see subsection B.1 for added definitions) consists of a colon “:” followed

by an optional designator and an optional descriptor in either order.

Referent ::= ":" {Designator?, Descriptor?}

Relation.

A conceptual relation begins with a left parenthesis “(” followed by an N-adic type, N

arcs, and an optional comment. It ends with a right parenthesis “)”.

Relation ::= "(" Type(N) Arc* Comment? ")"

The valence N of the relation type must be equal to the number of arcs.

Signature.

A signature is a parenthesized list of zero or more formal parameters separated by

commas.


Signature ::= "(" (FormalParameter ("," FormalParameter)*)? ")"

SpecialConLabel.

A special context label is one of five identifiers: “if”, “then”, “either”, “or”, and “sc”,

in either upper or lower case.

SpecialConLabel ::= "if" | "then" | "either" | "or" | "sc"

The five special context labels and the two identifiers "else" and "lambda" are reserved

words that may not be used as type labels.

SpecialContext.

A special context is either a negation or a left bracket, a special context label, a colon,

a CG, and a right bracket.

SpecialContext ::= Negation | "[" SpecialConLabel ":" CG "]"

Structure.

A structure consists of an optional percent sign “%” and identifier followed by a list of

zero or more arcs enclosed in braces.

Structure ::= ("%" Identifier)? "{" Arc* "}"


Type.

A type is a type expression or an identifier other than the reserved labels: "if", "then",

"either", "or", "sc", "else", "lambda".

Type(N) ::= TypeLabel(N) | TypeExpression(N)

A concept type must have valence N=1. A relation type must have valence N equal

to the number of arcs of any relation or actor of that type. The type label or the type

expression must have the same valence as the type.

TypeExpression.

A type expression is either a lambda expression or a disjunction list enclosed in paren-

theses.

TypeExpression(N) ::= LambdaExpression(N) | "(" Disjuncts(N) ")"

The type expression must have the same valence N as the lambda expression or the

disjunction list.

TypeLabel(N).

A type label is an identifier.

TypeLabel(N) ::= Identifier


The type label must have an associated valence N.

TypeTerm.

A type term is an optional tilde “~” followed by a type.

TypeTerm(N) ::= "~"? Type(N)

The type term must have the same valence N as the type.

Example.

When transforming the English phrase: A person is between a rock and a hard place,

the display format, DF, of the phrase in CG format can be seen in Figure B.1. Following

is a translation of Figure B.1 from DF to CGIF:

(Betw [Rock] [Place *x1] [Person]) (Attr ?x1 [Hard])

For more compact storage and transmission, all white space not contained in comments

or enclosed in quotes may be eliminated:

(Betw[Rock][Place*x1][Person])(Attr?x1[Hard])
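The white space elimination described above can be sketched as a single pass over the characters of a CGIF string. The Python fragment below is an illustration under stated assumptions, not the method of any particular CG tool: the function name is hypothetical, white space is identified with Python's str.isspace, and the text inside comments, names, and quoted strings is copied unchanged, including doubled delimiters.

```python
DELIMITERS = (";", "'", '"')  # comments, names, quoted strings


def compact_cgif(text: str) -> str:
    """Remove white space that lies outside delimited strings."""
    out = []
    active = None  # delimiter of the string currently being copied, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if active is not None:
            out.append(ch)
            if ch == active:
                if i + 1 < len(text) and text[i + 1] == active:
                    out.append(active)  # doubled delimiter stays inside the string
                    i += 1
                else:
                    active = None  # closing delimiter
        elif ch in DELIMITERS:
            active = ch
            out.append(ch)
        elif not ch.isspace():
            out.append(ch)
        i += 1
    return "".join(out)
```

Applied to the spaced form of the example, compact_cgif produces the compact form shown above.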


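[Figure: display form diagram in which the relation Between links the concepts Person, Rock, and Place, and the relation Attr links Place to Hard.]

Figure B.1: The Display Format for ‘A person is between a rock and a hard place.’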

This translation takes the option of nesting all concept nodes inside the concep-

tual relation nodes. A logically equivalent translation, which uses more coreference

labels, moves the concepts outside the relations:

[Rock *x1] [Place *x2] [Person *x3] (Betw ?x1 ?x2 ?x3)

[Hard *x4] (Attr ?x2 ?x4)

The concept and relation nodes may be listed in any order provided that every bound

label follows the defining node for that label.
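The ordering condition on labels can be checked mechanically. The Python sketch below is a simplified illustration with a hypothetical function name: it scans a CGIF string from left to right and reports every bound label that appears before any defining label with the same identifier. It deliberately ignores context scoping and assumes the input contains no comments, names, or quoted strings that might hold "*" or "?" characters.

```python
from __future__ import annotations

import re

# A defining label is "*" + identifier; a bound label is "?" + identifier.
LABEL = re.compile(r"([*?])([A-Za-z_][A-Za-z0-9_]*)")


def unresolved_bound_labels(cgif: str) -> list[str]:
    """Return identifiers of bound labels that precede their defining label."""
    defined = set()
    unresolved = []
    for match in LABEL.finditer(cgif):
        kind, ident = match.groups()
        if kind == "*":
            defined.add(ident)
        elif ident not in defined:
            unresolved.append(ident)
    return unresolved
```

An empty result means that every bound label follows a defining node for that label; a full check would also have to confirm that the bound label lies within the scope of the defining concept.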


APPENDIX C

DOCUMENTATION OF SYSTEMS

C.1 pCG (CGP Programs)

In order to use pCG to test the projection operation that was found at the Notio

level, a ‘cgp’ program file had to be generated. Table C.1 shows the cgp programs that

match to each of the tests that are given in section 7.2.1 of Chapter 7.

Table C.1: CGP Program Files.

CGP Program   Filename        # of KB files   # of Queries

5             graphs_5.cgp    6               2
11            graphs_11.cgp   6               5
21            graphs_21.cgp   6               6
31            graphs_31.cgp   6               8
53            graphs_53.cgp   6               10
73            graphs_73.cgp   6               12

The program file contains instructions to the pCG processor on what functions

are to be executed and what information is to be retrieved. Figures C.1, C.2, C.3, and

C.4 contain an example of one of the cgp program files.


# A test of the final graphs. Reads a final graph file, asserts
# each non type hierarchy related graph into pCG’s top-level
# knowledge base, and projects a filter over all these graphs,
# returning matches and sending them to standard output.

# Use a June 2001 CG Standard conformant CGIF parser and
# generator. The current (0.2.2) Notio defaults are based upon
# an older version of the standard.
option cgifparser = "cgp.translators.CGIFParser";
option cgifgen = "cgp.translators.CGIFGenerator";

# Get the file path separator for the current operating system.
sep = (_ENV.member("file.separator"))[2];

# Final file names.
graphFileNames = {"graphs_53_1.cgf", "graphs_53_5.cgf",
                  "graphs_53_100.cgf", "graphs_53_1000.cgf",
                  "graphs_53_2500.cgf", "graphs_53_5000.cgf"};

Figure C.1: Part 1: Example of CGP Program from pCG.

The cgp program is broken into several parts so that it can be displayed in sev-

eral figures. The first part indicates the parser and translator format being used by the

CGP program; this may be either the Notio original format or the CGIF format from

2001 [126]. The next section in this piece of the program indicates the KBs that should

be tested. Part 2 indicates the query graphs that will be tested against the KB.

The third part examines the parameters that the CGP program will be using to

select the correct KB, the query graph to be tested, and the number of times to run the test.

It then reads in the KB from the indicated file and runs the “Assertion” phase of the CGP

program file to set up the knowledge base for the pCG system. The fourth part is the

actual running of the projection algorithm and the printing of the time results.


# filter graphsgraphFilters = {‘(ATTR1 [Block] [Color])‘,‘(ATTR1 [Block*b] [Color])(NAME1 ?b [Number])‘,‘(Above1 [Block*b2] [Block*b1])(OnTable1 ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])‘,

‘(Above1 [Block*b2] [Block*b1])(OnTable1 ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(ATTR2 ?b2 [Color])‘,

‘(Above1 [Block*b2] [Block*b1])(OnTable1 ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])‘,

‘(Above1 [Block*b2] [Block*b1])(OnTable1 ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])‘,

‘(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable1 ?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])‘,

‘(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2) (OnTable1?b1 [Table])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1[Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2[Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3[Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])‘,

‘(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable1 ?b1 [Table*t1])(OnTable2 [Block*b4] ?t1)(Above3 [Block*b5] ?b4)(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2 ?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3 [Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3 [Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number])(ATTR5 ?b5 [Color])(NAME5 ?b5 [Number])‘,

‘(Above1 [Block*b2] [Block*b1])(Above2 [Block*b3] ?b2)(OnTable1?b1 [Table*t1])(OnTable2 [Block*b4] ?t1)(Above3 [Block*b5] ?b4)(NAMET ?t1 [Number])(ATTR1 ?b1 [Color])(NAME1 ?b1 [Number])(CHRC1 ?b1 [Shape])(LOC1 ?b1 [Place])(ATTR2 ?b2 [Color])(NAME2?b2 [Number])(CHRC2 ?b2 [Shape])(LOC2 ?b2 [Place])(ATTR3 ?b3[Color])(NAME3 ?b3 [Number])(CHRC3 ?b3 [Shape])(LOC3 ?b3[Place])(ATTR4 ?b4 [Color])(NAME4 ?b4 [Number])(CHRC4 ?b4[Shape])(LOC4 ?b4 [Place])(ATTR5 ?b5 [Color])(NAME5 ?b5[Number])(CHRC5 ?b5 [Shape])(LOC5 ?b5 [Place])‘};

Figure C.2: Part 2: Example of CGP Program from pCG.


# Get optional graph file number from command-line.
# Defaults to 1.
gFileNum = 1;
if _ARGS.length > 0 and _ARGS.length <= 3 then
  gFileNum = (_ARGS[1]).toNumber();
end
if gFileNum > 6 or gFileNum < 1 then
  exit "Invalid file number.";
end
graphFileName = graphFileNames[gFileNum];
gFNum = 1;
if _ARGS.length > 1 and _ARGS.length <= 3 then
  gFNum = (_ARGS[2]).toNumber();
end
if gFNum > 10 or gFNum < 1 then
  exit "Invalid filter number.";
end
tNum = 1;
if _ARGS.length > 2 and _ARGS.length <= 3 then
  tNum = (_ARGS[3]).toNumber();
end
# open the CGF file
newF = file ("examples" + sep + "projection" + sep + graphFileName);
# open to get timings
u = new Util;
# Read and assert the graphs.
println "*** Asserting graphs into KB...";
println "";
startfull = u.getCurrentTimeInMillis();
graphs = newF.readGraphStream();
newF.close();
endtime1 = u.getCurrentTimeInMillis();
println "Storage time is " + (endtime1 - startfull) + " ";
startassert = u.getCurrentTimeInMillis();
t = 0;
foreach g in graphs do
  rels = g.relations;
  t.inc();
  if rels.member("GT") is undefined then
    assert g;
  end
end
endtime2 = u.getCurrentTimeInMillis();
print "Assert time is " + (endtime2 - startassert);
println " for " + t + " graphs.";

Figure C.3: Part 3: Example of CGP Program from pCG.


n = 0;
while n < tNum do
  projections = {};
  # Retrieve all graphs in the outer context containing an
  # OnTable relation.
  filter = graphFilters[gFNum];
  # print "Result of projecting " + filter
  # println " onto asserted graphs...";
  # println "";
  t = 0;
  startpart = u.getCurrentTimeInMillis();
  foreach g in _KB.graphs do
    h = g.project(filter);
    if not (h is undefined) then
      if (tNum <= 1) then
        endtimefull = u.getCurrentTimeInMillis();
        println h;
      end
      t.inc();
      projections.append(h);
    end
    if h is undefined then
      t.inc();
      println "Not found graph is number " + t + ".";
    end
  end
  if (tNum > 1) then
    endtimefull = u.getCurrentTimeInMillis();
  end
  endpart = (endtimefull - startpart);
  print "Actual Projection time is " + endpart + " for ";
  println t + " graphs";
  n.inc();
end

Figure C.4: Part 4: Example of CGP Program from pCG.


C.2 CP Environment, CPE

The CP Environment has both module documentation, showing how the DDLs are designed, and class documentation for some of the data structures and functions found in the classes of the system.

C.2.1 CPE Module Documentation

The module documentation gives the top-level system API functions and the internal general functions for both the CPE and CG modules. The CPE module holds the most general routines available for the CPE system, and the CG module holds the basic data structure for the conceptual graphs that store the knowledge base.

C.2.1.1 CP_Graph Reasoning Operations

These functions perform the basic reasoning operations from the API:

• CPE_API CPLPGraphs STDCALL CPE_projectionUnique (void)

– Note: returns only one projection even if more than one is available.

• CPE_API CPLPGraphs STDCALL CPE_projection (void)

– Projects the current query graph onto the current.


C.2.1.2 CP_Graph Reasoning Internal Operations

These functions perform the actual internal reasoning operations:

• CPLPGraphs cp_ops::CProjection(CPLPGraph, CPLPGraph)

– Actual graph to graph projection.

• CPLPGraphs cp_ops::CProject(CPLPKB, CPLPGraph)

– Knowledge base to query graph projection which processes all the graphs in the KB.

• BOOLEAN cp_ops::get_onlyOne(void)

– Check to see if only one projection needs to be found per KB graph.

• void cp_ops::set_onlyOne(BOOLEAN)

– Set if only one projection needs to be found per KB graph.

• CPLPGraph cp_ops::add_toornewprojections(CGLPChar, CGLPChar, CGLPChar, CGLPChar, CGLPChar, CGLPNElement, CPLPGraph, CPLPGraph, CPLPGraphs)

– Check new matching concept or new projection line from current matching concept.

• BOOLEAN cp_ops::add_toexistprojections(CGLPChar, CGLPChar, CGLPNElement, CPLPGraph, CPLPGraphs)

– Add the new query triple match to all related projection graphs.


• BOOLEAN cp_ops::add_copyprojections(CGLPChar, CGLPChar, CGLPNElement, CPLPGraph, CPLPGraph, CPLPGraphs)

– Make a copy of a projection graph and add in the new triple for next round processing.

• BOOLEAN cp_ops::process_querytriple(CGLPChar, CGLPChar, CGLPChar,

CGLPNElement, CGLPCStr, CPLPGraph, CPLPGraphs, CPLPGraphs)

– Process multiple elements to anchorlist when not on first concept in query graph.

C.2.1.3 CGHash_Graph and CG_Graph Public Functions

These functions perform the basic graph operations from the API:

• bool addChild (CGLPChar)

– Adds a new child graph (nested graph) to the CG graph.

• bool addConcept(CGLPChar)

– Adds a new concept to the CG graph concepts list.

• bool addRelation (CGLPChar)

– Adds a new relation to the CG graph relations list.

• bool addTriple (CGLPChar)

– Adds a new triple name to the CG graph triples list.


• bool isChild (CGLPChar)

– Is a children list

• bool isConcept(CGLPChar)

– Is a concepts list

• bool isRelation (CGLPChar)

– Is a relations list

• bool isTriple (CGLPChar)

– Is a triples list

• CGLPNodes getNodes(short)

– Returns the node list for the type of node being searched for.

C.2.2 CPE Class Documentation

The class documentation indicates how the hierarchy of class references is set up in C++ for the modules.

C.2.2.1 cp_graph Class Reference

• Inheritance diagram for cp_graph is seen in Figure C.5.


[Inheritance diagram: cp_graph shown with cghash_graph (left) and cg_graph (right).]

Figure C.5: Inheritance Diagram for Class ‘cp_graph’.

C.2.2.2 cghash_graph Class Reference

This class is the perfect hash implementation of a CG graph. The inheritance diagram for cghash_graph is the left side of Figure C.5. As a base graph, cghash_graph is the data structure that changes between graph implementations: when CGHASH is defined, the graph is implemented as two hash tables, and all lists are hash tables whose keys are unique numbers. The class-specific functions are:

• cghash_graph(void)

– Constructor function that makes sure most internal lists are built.

• cghash_graph(UINT)

– Constructor function that makes sure the internal lists and the triples lists are built.

• ~cghash_graph(void)

– The destructor, for cleaning up at the end.


C.2.2.3 cg_graph Class Reference

This class is the array implementation of a CG graph. The inheritance diagram for cg_graph is the right side of Figure C.5. As a base graph, cg_graph holds the Conceptual Graph elements of the graph data structure that changes between graph implementations: when CG2DARR is defined, the graph is implemented as two 2-dimensional arrays, and all lists are lists of strings. The class-specific functions are:

• cg_graph(void)

– Constructor function that makes sure all internal lists are built.

• cg_graph(int)

– Constructor function that makes sure the internal lists and the triples lists are built.

• ~cg_graph(void)

– The destructor, for cleaning up at the end.
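The contrast between the two storage choices can be sketched in a few lines. This is only an illustration in Python, not the C++ classes themselves: the class names, the method names and the rule that turns a label such as "C2062" into the number 2062 are all assumptions, and a Python dict is an ordinary hash table rather than a perfect hash.

```python
class ArrayGraph:
    """List-of-strings storage: membership is a linear scan."""

    def __init__(self):
        self.concepts = []  # concept names, in insertion order

    def add_concept(self, name):
        if name in self.concepts:
            return False
        self.concepts.append(name)
        return True

    def is_concept(self, name):
        return name in self.concepts


class HashGraph:
    """Hash-table storage keyed by a unique number per node."""

    def __init__(self):
        self.concepts = {}  # unique number -> concept name

    @staticmethod
    def key_of(name):
        # Hypothetical keying rule: "C2062" -> 2062.
        return int(name[1:])

    def add_concept(self, name):
        key = self.key_of(name)
        if key in self.concepts:
            return False
        self.concepts[key] = name
        return True

    def is_concept(self, name):
        return self.key_of(name) in self.concepts
```

Both classes answer the same questions; they differ only in how a lookup is carried out, which is the kind of difference the timings in Appendix D measure.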


APPENDIX D

DATA COLLECTED FROM SAMPLE TESTS

This appendix gives an example of some of the data collected to produce the results found in Chapter 7. It also gives output from the tested systems to verify that the correct results were produced by the projections, both in the case of unique relations and in the case of multiple instances of a relation within a graph.

D.1 Data Collected for Computing Each Experimental Results Test Set - 53 nodes in KB Graphs

Tables D.1, D.2 and D.3 give the average data values (timings in milliseconds) used to produce the graphs found in subsection 7.4.5 of Chapter 7.

Table D.1: Average Data Values for 53 nodes KB with 1000 Graphs.

# of nodes in Query   pCG      CPE (array)   CPE (hash table)
3                     82.1     25.85         174.15
5                     105.45   49.05         181.35
9                     143.8    82.95         214.1
11                    161.75   90.85         228.8
15                    206.3    140.5         261.5
21                    279      177.15        305.45
27                    346.15   280.25        362
31                    424.25   320.4         392.9
43                    575.8    506.9         503.15
53                    700.85   662.15        595.35


These averages came from computing the average value over 48 runs for each test case. A test case consisted of selecting the number of nodes in the KB graphs file, and then selecting the query graph to be projected onto that KB of graphs. Before computing the average value, the four lowest (fastest) times and the four highest (slowest) times were dropped. The average values seen in the tables are for runs with the pCG system, CPE with the array data structures and CPE with the hash table data structures.
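The averaging rule just described can be sketched as follows. This is a minimal Python illustration, not the script actually used to process the data; the function name and the sample timings are made up for the example.

```python
def trimmed_average(times_ms, drop=4):
    """Average after dropping the `drop` fastest and `drop` slowest times."""
    if len(times_ms) <= 2 * drop:
        raise ValueError("not enough runs to trim")
    kept = sorted(times_ms)[drop:len(times_ms) - drop]
    return sum(kept) / len(kept)


# Hypothetical timings for one test case (48 runs, in milliseconds).
runs = [100 + (i % 12) for i in range(48)]
print(trimmed_average(runs))  # 105.5
```

With 48 runs and four values dropped at each end, 40 timings contribute to each average.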

Table D.2: Average Data Values for 53 nodes KB with 2500 Graphs.

# of nodes in Query   pCG       CPE (array)   CPE (hash table)
3                     225.1     55.3          457.35
5                     291.35    122.4         466.1
9                     396.9     190.1         552.9
11                    465.8     251.05        600.8
15                    575.8     344.2         685.25
21                    742.25    510.75        786.85
27                    945.4     713.5         939.8
31                    1037.55   857.15        1052.45
43                    1494.6    1351.55       1303.65
53                    1814.95   1875          1514.8

Some of the collected timings were not used in computing the averages because timings on the machine used for all testing were only accurate to within 16 milliseconds, so some “spreading” of the timings was seen. How much spreading is given in the error bar data below. It should be explained that the 16 millisecond accuracy arose because the tests were being executed on a 64 bit processing architecture,


but the clock values could only be retrieved with 32 bit accuracy. Therefore, the clock timing values jumped by 16 milliseconds on each time change.
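The effect of such a coarse clock on a measured duration can be sketched as below. The sketch assumes, for simplicity, a clock that advances in steps of exactly 16 milliseconds; the function names are illustrative and do not come from pCG or CPE.

```python
TICK_MS = 16  # clock granularity reported in the text


def clock_reading(true_ms, tick=TICK_MS):
    """Value a coarse clock would report at true time `true_ms`."""
    return (true_ms // tick) * tick


def measured_duration(start_ms, end_ms, tick=TICK_MS):
    """Elapsed time as seen through the coarse clock."""
    return clock_reading(end_ms, tick) - clock_reading(start_ms, tick)


# The same true 20 ms interval, started at two different moments.
print(measured_duration(0, 20))   # 16
print(measured_duration(15, 35))  # 32
```

Under this assumption, identical work can be reported with readings that differ by a whole tick, which is one way the spreading of the timings can arise.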

Table D.3: Average Data Values for 53 nodes KB with 5000 Graphs.

# of nodes in Query   pCG       CPE (array)   CPE (hash table)
3                     559.35    131.05        838.6
5                     660.95    226.9         944.9
9                     862.55    431.25        1144.65
11                    1010.15   536.8         1174.5
15                    1217.2    804.9         1340.4
21                    1578.1    1142.6        1589.7
27                    1976.6    1529.75       1774.1
31                    2220.3    1881.95       2005.35
43                    2991.45   2933.55       2429.5
53                    3786.7    3924.75       3039.35

The justification for dropping part of the collected data was that the dropping was done consistently over all test runs for all systems being tested. For every 12 runs, the highest and lowest values collected were always dropped.

D.2 Error Bar Data - 53 nodes in KB Graphs

As discussed above, there was some “spreading” of data values; that is, not all the data fell cleanly in a small range of time values.

Tables D.4, D.5 and D.6 indicate the actual fastest and slowest values collected

for the 53 nodes in KB graph test set.


Table D.4: Fast/Slow Values for 53 nodes KB with 1000 Graphs.

# nodes in Q   pCG (f)   pCG (s)   CPE (af)   CPE (as)   CPE (hf)   CPE (hs)
3              78        94        0          48         125        217
5              94        110       16         78         156        218
9              140       172       31         127        141        298
11             156       188       31         142        217        279
15             203       219       63         219        202        341
21             265       359       140        234        250        362
27             328       407       188        407        312        422
31             406       641       251        438        343        452
43             532       797       424        607        403        598
53             656       906       576        751        468        736

The columns are laid out by each of the three systems, giving the best (fastest) time for each set of runs followed by the worst (slowest) time for that set. Therefore, first seen is the fastest time for the pCG system followed by the slowest time for that same set of runs. Second is the CPE system using the array data structures, with its fastest execution time followed by its slowest, and last is the CPE system using the hash table data structures, with its fastest time followed by its slowest.

The rows in each table are the number of nodes within the query graph being projected. The query graph is smaller than or equal in size to the graphs found in the KB. In fact, the query graphs are built from the abstract (most general) version of the graphs in the KB.

Tables D.7, D.8 and D.9 then display the ranges of data (or how far away from the average value), which will be referred to as Error Bar Data, for all of the 53 nodes


Table D.5: Fast/Slow Values for 53 nodes KB with 2500 Graphs.

# nodes in Q   pCG (f)   pCG (s)   CPE (af)   CPE (as)   CPE (hf)   CPE (hs)
3              218       235       16         79         391        517
5              281       312       78         188        359        531
9              390       407       141        232        468        627
11             453       484       203        298        515        673
15             562       656       249        438        548        763
21             719       843       390        583        667        849
27             937       1032      639        808        858        1052
31             985       1125      720        969        970        1124
43             1422      1969      1199       1475       1146       1441
53             1703      2344      1725       2008       1314       1673

Table D.6: Fast/Slow Values for 53 nodes KB with 5000 Graphs.

# nodes in Q   pCG (f)   pCG (s)   CPE (af)   CPE (as)   CPE (hf)   CPE (hs)
3              546       578       48         171        732        940
5              656       672       155        314        782        1077
9              844       875       328        533        1017       1296
11             1000      1032      433        641        1033       1281
15             1203      1235      655        908        1221       1441
21             1562      1656      980        1345       1437       1709
27             1953      2031      1418       1712       1607       2018
31             2141      3172      1682       2103       1836       2158
43             2859      4015      2714       3121       2368       2934
53             3593      4813      3699       4064       2839       3248


in KB test set. Again this is laid out in the columns by system, with the distance away from the lowest (fastest) value followed by the distance away from the highest (slowest) value for each system. The rows again are just the number of nodes in the query graph being projected. It can be seen that as the number of graphs in the KB is increased, the systems (especially pCG) become unstable when projecting a query graph that is close to or actually the size of the KB graphs (see rows 43 and 53 in Table D.9).
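The error bar values are the distances from the average to the fastest and slowest times, so one row can be recomputed as in the sketch below. The function name is illustrative; the numbers are the pCG entries for a 3 nodes query from Tables D.1 and D.4.

```python
def error_bar(average, fastest, slowest):
    """Distances from the average down to the fastest and up to the slowest."""
    low = round(average - fastest, 2)
    high = round(slowest - average, 2)
    return low, high


# pCG, 3 nodes in query, 1000 graphs: average from Table D.1,
# fastest and slowest from Table D.4.
print(error_bar(82.1, 78, 94))  # (4.1, 11.9)
```

The result matches the pCG (l) and pCG (h) entries in the first row of Table D.7.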

Table D.7: Error Bar Data Values for 53 nodes KB with 1000 Graphs.

# nodes in Q   pCG (l)   pCG (h)   CPE (al)   CPE (ah)   CPE (hl)   CPE (hh)
3              4.1       11.9      25.85      22.15      49.15      42.85
5              11.45     4.55      33.05      28.95      25.35      36.65
9              3.8       28.2      51.95      44.05      73.1       83.9
11             5.75      26.25     59.85      51.15      11.8       50.2
15             3.3       12.7      77.5       78.5       59.5       79.5
21             14        80        37.15      56.85      55.45      56.55
27             18.15     60.85     92.25      126.75     50         60
31             18.25     216.75    69.4       117.6      49.9       59.1
43             43.8      221.2     82.9       100.1      100.15     94.85
53             44.85     205.15    86.15      88.85      127.35     140.65

D.3 Validation of Correct Projection

Shown within the next two subsections are the actual output data verifying that the projections were correct. The output data is given as three figures, where the first figure is the KB graph, the second is the query graph that was projected and the third is the projection results found. Each of the graphs output gives several parts:


Table D.8: Error Bar Data Values for 53 nodes KB with 2500 Graphs.

# nodes in Q   pCG (l)   pCG (h)   CPE (al)   CPE (ah)   CPE (hl)   CPE (hh)
3              7.1       9.9       39.3       23.7       66.35      59.65
5              10.35     20.65     44.4       65.6       107.1      64.9
9              6.9       10.1      49.1       41.9       84.9       74.1
11             12.8      18.2      48.05      46.95      85.8       72.2
15             13.8      80.2      95.2       93.8       137.25     77.75
21             23.25     100.75    120.75     72.25      119.85     62.15
27             8.4       86.6      74.5       94.5       81.8       112.2
31             52.55     87.45     137.15     111.85     82.45      71.55
43             72.6      474.4     152.55     123.45     157.65     137.35
53             111.95    529.05    150        133        200.8      158.2

Table D.9: Error Bar Data Values for 53 nodes KB with 5000 Graphs.

# nodes in Q   pCG (l)   pCG (h)   CPE (al)   CPE (ah)   CPE (hl)   CPE (hh)
3              13.35     18.65     83.05      39.95      106.6      101.4
5              4.95      11.05     71.9       87.1       162.9      132.1
9              18.55     12.45     103.25     101.75     127.65     151.35
11             10.15     21.85     103.8      104.2      141.5      106.5
15             14.2      17.8      149.9      103.1      119.4      100.6
21             16.1      77.9      162.6      202.4      152.7      119.3
27             23.6      54.4      111.75     182.25     167.1      243.9
31             79.3      951.7     199.95     221.05     169.35     152.65
43             132.45    1023.55   219.55     187.45     61.5       504.5
53             193.7     1026.3    225.75     139.25     200.35     208.65


1. Basenode - this is the unique identifier for the single concept node that is considered the basic node of the graph.

2. Concept nodes - these are the concepts in the graph, giving the unique identifier as well as the type, referent (if any) and co-reference links (if any).

3. Relation nodes - these are the relations in the graph, giving the unique identifier and the type value.

4. CRC list - displays the concept-relation-concept list by giving the unique identifier for the node followed by the direction of the linkage into that node. If the direction is indented on the next line, then that linkage is scoped within the unique identifier displayed above.
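The CRC layout in item 4 can be sketched as a small routine that turns concept-relation-concept triples into lines of this shape. It is a Python illustration only: the function name and the ordering of entries are assumptions, so the listing it produces resembles, but is not guaranteed to match, the output of the tested systems.

```python
from collections import defaultdict


def crc_lines(triples):
    """Build CRC-style lines from (concept, relation, concept) triples.

    Outgoing links are listed under the source concept and incoming
    links under the target concept, mirroring the two arrow directions.
    """
    links = defaultdict(list)
    for src, rel, dst in triples:
        links[src].append(f"-> {rel} -> {dst}")
        links[dst].append(f"<- {rel} <- {src}")
    lines = []
    for node, entries in links.items():
        lines.append(f"{node} {entries[0]}")
        # Further links of the same node are indented beneath it.
        lines.extend(f"      {entry}" for entry in entries[1:])
    return lines


# Two triples from a sample query graph: a Block with a Color and a Number.
triples = [("C4124", "R7152", "C1918"), ("C4124", "R2455", "C5682")]
print("\n".join(crc_lines(triples)))
```

Each triple appears twice, once under its source with "->" arrows and once under its target with "<-" arrows, which is why the CRC lists in the figures are roughly twice as long as the number of relations.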

D.3.1 11 nodes in KB graphs - Unique Relation Results

Figures D.1, D.2 and D.3 show the three elements of the test in which a 3 nodes query graph is projected onto the 11 nodes graph in the KB. It should be noted that the projection graph (see Figure D.3) has the unique identifying nodes from the KB graph, but the structure of the query graph. Therefore, a subgraph was, in fact, found within the KB that was isomorphic to the query graph.


Graph Graph*G1 built:
Basenode - C2062
CG Base graph
- Concepts in this graph:
  Concept unique label is C2062
    with co-ref link *b and as Block
  Concept unique label is C5665 as Color
  Concept unique label is C0493 as Number
  Concept unique label is C8990 as Place
  Concept unique label is C3008 as Shape
  Concept unique label is C6346 as Table
- Relations in this graph:
  Relation unique label is R9474 as ATTR
  Relation unique label is R0897 as NAME
  Relation unique label is R9634 as LOC
  Relation unique label is R1126 as CHRC
  Relation unique label is R4954 as OnTable
- crc:
  C2062 -> R9474 -> C5665
        -> R0897 -> C0493
        -> R9634 -> C8990
        -> R1126 -> C3008
        -> R4954 -> C6346
  C5665 <- R9474 <- C2062
  C0493 <- R0897 <- C2062
  C8990 <- R9634 <- C2062
  C3008 <- R1126 <- C2062
  C6346 <- R4954 <- C2062

Figure D.1: KB for Verifying 3 nodes Query onto 11 nodes KB.

D.3.2 13 nodes in KB graphs - Multi-Instances Relation Results

Figures D.5, D.4 and D.6 show the three elements of the test in which a 5 nodes query graph is projected onto the 13 nodes graph in the KB. It should be noted that the projection graphs (see Figure D.6) show that two subgraphs are


query graphs - 1 graph/s read
Graph Graph*G2 built:
Basenode - C6067
CG Base graph
- Concepts in this graph:
  Concept unique label is C6067 as Block
  Concept unique label is C1389 as Color
- Relations in this graph:
  Relation unique label is R9447 as ATTR
- crc:
  C6067 -> R9447 -> C1389
  C1389 <- R9447 <- C6067

Figure D.2: Query Graph for Verifying 3 nodes Query onto 11 nodes KB.

projection graphs
Graph P30001 built:
Basenode - C2062
CG Base graph
- Concepts in this graph:
  Concept unique label is C2062
    with co-ref link *b and as Block
  Concept unique label is C5665 as Color
- Relations in this graph:
  Relation unique label is R9474 as ATTR
- crc:
  C2062 -> R9474 -> C5665
  C5665 <- R9474 <- C2062

Figure D.3: Projection Verifying 3 nodes Query onto 11 nodes KB.


found within the KB that are isomorphic to the query graph. Again the image of the query graph is projected onto the KB graph, but the projection graphs have the nodes from inside of the KB graph.
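The way one query can yield two projection graphs carrying KB identifiers can be sketched with a deliberately simplified matcher. This is not the pCG or CPE projection algorithm: it handles only a query made of one base concept with outgoing links, compares types by equality (no type hierarchy), and takes the first matching link; the function name is hypothetical. The node labels are transcribed from the 13 nodes KB graph of Figure D.5.

```python
def project_star(kb_types, kb_triples, base_type, wanted):
    """Find every KB concept of `base_type` having all `wanted` links.

    `wanted` is a list of (relation type, target concept type) pairs.
    Returns one projection per matching base concept, as a list of
    (concept id, relation id, concept id) triples using KB identifiers.
    """
    projections = []
    for base, ctype in kb_types.items():
        if ctype != base_type:
            continue
        match = []
        for rel_type, target_type in wanted:
            hit = next(
                (
                    (src, rel, dst)
                    for src, rel, dst in kb_triples
                    if src == base
                    and kb_types[rel] == rel_type
                    and kb_types[dst] == target_type
                ),
                None,
            )
            if hit is None:
                break  # this base concept lacks a required link
            match.append(hit)
        else:
            projections.append(match)
    return projections


# Node types and triples transcribed from the 13 nodes KB graph.
kb_types = {
    "C2062": "Block", "C9474": "Block", "C0493": "Table",
    "C8990": "Color", "C3008": "Number", "C6346": "Color", "C3285": "Number",
    "R5665": "Above", "R0897": "OnTable", "R9634": "ATTR",
    "R1126": "NAME", "R4954": "ATTR", "R5963": "NAME",
}
kb_triples = [
    ("C2062", "R5665", "C9474"), ("C2062", "R9634", "C8990"),
    ("C2062", "R1126", "C3008"), ("C9474", "R0897", "C0493"),
    ("C9474", "R4954", "C6346"), ("C9474", "R5963", "C3285"),
]
# Query: a Block with an ATTR link to a Color and a NAME link to a Number.
found = project_star(kb_types, kb_triples, "Block",
                     [("ATTR", "Color"), ("NAME", "Number")])
print(len(found))  # 2
```

The two results use the identifiers C2062 and C9474 from the KB, one per Block, while keeping the shape of the query.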

query graphs - 1 graph/s read
Graph Graph*G3 built:
Basenode - C4124
CG Base graph
- Concepts in this graph:
  Concept unique label is C4124
    with co-ref link *b and as Block
  Concept unique label is C1918 as Color
  Concept unique label is C5682 as Number
- Relations in this graph:
  Relation unique label is R7152 as ATTR
  Relation unique label is R2455 as NAME
- crc:
  C4124 -> R7152 -> C1918
        -> R2455 -> C5682
  C1918 <- R7152 <- C4124
  C5682 <- R2455 <- C4124

Figure D.4: Query Graph for Verifying 5 nodes Query onto 13 nodes KB.


Graph Graph*G1 built:
Basenode - C9474
CG Base graph
- Concepts in this graph:
  Concept unique label is C2062
    with co-ref link *b1 and as Block
  Concept unique label is C9474
    with co-ref link *b2 and as Block
  Concept unique label is C0493 as Table
  Concept unique label is C8990 as Color
  Concept unique label is C3008 as Number
  Concept unique label is C6346 as Color
  Concept unique label is C3285 as Number
- Relations in this graph:
  Relation unique label is R5665 as Above
  Relation unique label is R0897 as OnTable
  Relation unique label is R9634 as ATTR
  Relation unique label is R1126 as NAME
  Relation unique label is R4954 as ATTR
  Relation unique label is R5963 as NAME
- crc:
  C2062 -> R5665 -> C9474
        -> R9634 -> C8990
        -> R1126 -> C3008
  C9474 -> R0897 -> C0493
        -> R4954 -> C6346
        -> R5963 -> C3285
        <- R5665 <- C2062
  C0493 <- R0897 <- C9474
  C8990 <- R9634 <- C2062
  C3008 <- R1126 <- C2062
  C6346 <- R4954 <- C9474
  C3285 <- R5963 <- C9474

Figure D.5: KB for Verifying 5 nodes Query onto 13 nodes KB.


projection graphs
Graph P30001 built:
Basenode - C2062
CG Base graph
- Concepts in this graph:
  Concept unique label is C2062
    with co-ref link *b1 and as Block
  Concept unique label is C8990 as Color
  Concept unique label is C3008 as Number
- Relations in this graph:
  Relation unique label is R9634 as ATTR
  Relation unique label is R1126 as NAME
- crc:
  C2062 -> R9634 -> C8990
        -> R1126 -> C3008
  C8990 <- R9634 <- C2062
  C3008 <- R1126 <- C2062
Graph P30002 built:
Basenode - C9474
CG Base graph
- Concepts in this graph:
  Concept unique label is C9474
    with co-ref link *b2 and as Block
  Concept unique label is C6346 as Color
  Concept unique label is C3285 as Number
- Relations in this graph:
  Relation unique label is R4954 as ATTR
  Relation unique label is R5963 as NAME
- crc:
  C9474 -> R4954 -> C6346
        -> R5963 -> C3285
  C6346 <- R4954 <- C9474
  C3285 <- R5963 <- C9474

Figure D.6: Projections Verifying 5 nodes Query onto 13 nodes KB.


REFERENCES

[1] H. Aidinejad. Semantic networks as a unified model of knowledge representa-tion. MCCS-88-117, 1988.

[2] H. Ait-Kaci. An algebraic semantics approach to the effective resolution of typeequations.Theor. Comp. Sc., 45:293–351, 1986.

[3] J.F. Allen. Maintaining knowledge about temporal intervals. Communicationsof the ACM, 26(11):pp. 832–843, 1983.

[4] J.F. Allen. Time and time again: The many ways to represent time. InternationalJournal of Intelligent Systems, 6(4):pp. 341–355, July 1991.

[5] J.-F. Baget and M.-L. Mugnier. Extensions of simple conceptual graphs: thecomplexity of rules and constraints.Journal of Artificial Intelligence Research(JAIR), 16:425–465, 2002.

[6] P. Becker. ToscanaJ. Technical University of Darmstadt, Germany, 2004.http://toscanaj.sourceforge.net/.

[7] P. Becker and J.H. Correia. The ToscanaJ suite for implementing ConceptualInformation Systems. In G. Stumme, editor,Formal Concept Analysis – State ofthe Art, Berlin – Heidelberg – New York, 2004. Springer. To appear.

[8] D.J. Benn. Implementing conceptual graph processes. Master’s thesis, Uni-versity of South Australia, School of Computer and Information Science, April2001. http://members.ozemail.com.au/ djbenn/Masters/thesis/Thesis.pdf.

[9] D.J. Benn and D. Corbett. An application of the process mechanism to a roomallocation problem using the pCG language. In H.S. Delugachand G. Stumme,editors,Conceptual Structures: Broadening the Base, Springer-Verlag LectureNotes in Computer Science 2120, pages 360–374, 2001.

[10] D.J. Benn and D. Corbett. pCG: An implementation of the process mechanismand an extensible CG programming language. InCGTools Workshop Proceed-ings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July2001] URL:http://www.cs.nmsu.edu/ hdp/CGTOOLS/proceedings/index.html.

[11] R.J. Brachman. On the epistemological status of semantic networks. In N.V.Findler, editor,Associative Networks: Representation and Use of KnowledgebyComputers, pages 3–50. Academic Press, New York, 1979.

[12] E. Charniak and D. McDermott.Introduction To Artifical Intelligence. Addison-Wesley, Reading, MA, 1985.

271

Page 302: THE EFFECT OF DATA STRUCTURES MODIFICATIONS ON …hdp/PDF/dissertation.pdf · Structures Tool Interoperability Workshop.Research Press International, 2007. Field of Study Major field:

[13] G. Chartrand and L. Lesniak.Graphs & Digraphs. Mathematics Series.Wadsworth & Brooks/Cole, Pacific Grove, CA, second edition edition, 1986.

[14] N.R. Chavez, Jr. and R.T. Hartley. The Role of Object-Oriented Techniques andMulti-Agents in Story Understanding. InProceedings of the International Con-ference on Integration of Knowledge Intensie Multi-Agent Systems, Waltham,Mass, 2005. KIMAS 2005.

[15] M. Chein and M.-L. Mugnier. Conceptual graphs: Fundamental notions.Revued’Intelligence Artificielle, 6-4:365–406, 1992.

[16] N. Chomsky.Syntactic Structures. The Hague, Mouton, 1957.

[17] N. Chomsky. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA,1965.

[18] R.J. Cole, P. Eklund, and G. Stumme. Document retrievalfor email search anddiscovery using Formal Concept Analysis.Applied Artificial Intelligence, 17(3),2003.

[19] D. Corbett. Reasoning and Unification over Conceptual Graphs. Kluwer Aca-demic/Plenum Plublishers, New York, 2003.

[20] T.H. Cormen, C.E. Leiserson, and R.L. Rivest.Introduction to Algorithms. TheMIT Press, 1990.

[21] M. Croitoru and E. Compatangelo. A combinatorial approach to conceptualgraph projection checking. InProc. of the 24th Int’l Conf. of the British Com-puter Society’s Specialist Group on Art’l Intell.AI’2004, Springer-Verlag, 2004.

[22] M. Croitoru and E. Compatangelo. On conceptual graph projection. TechnicalReport AUCS/TR0403, University of Aberdeen, UK, Department of ComputingScience, 2004.

[23] Z.J. Czech. Quasi-perfect hashing.The Computer Journal, 41(6):416–421, 1998.

[24] Z.J. Czech, G. Havas, and B.S. Majewski. Perfect hashing. Theoretical Com-puter Science, 182(1-2):1–143, 15 August 1997. Fundamental Study.

[25] F. Dau. Types and tokens for logic with diagrams. In K.E.Wolff, H.D. Pfeiffer,and H.S. Delugach, editors,Conceptual Structures at Work, 12th InternationalConference on Conceptual Structures, volume LNAI of3127, pages 62–93, Hei-delberg, July 2004. ICCS 2004, Springer.

[26] E. Davis. Representations of Commonsense Knowledge. Morgan Kaufmann,San Mateo, CA, 1990.

272

Page 303: THE EFFECT OF DATA STRUCTURES MODIFICATIONS ON …hdp/PDF/dissertation.pdf · Structures Tool Interoperability Workshop.Research Press International, 2007. Field of Study Major field:

[27] T. Dean and D. McDermott. Temporal data base management. Artificial Intelli-gence, 32:pp. 1–55, 1987.

[28] R. Dechter and J. Pearl. Network-based heuristics for constraint-satisfactionproblems.Artificial Intelligence, 34:1–38, 1988.

[29] H.S. Delugach. CharGer: Some lessons learned and new directions. InG. Stumme, editor,Working with Conceptual Structures - Contributions to ICCS2000, pages 306–309, 2000. Shaker-Verlag.

[30] H.S. Delugach. CharGer: A graphical Conceptual Graph ed-itor. In CGTools Workshop Proceedings in connection withICCS 2001, Stanford, CA, 2001. [Online Access: July 2001]URL:http://www.cs.nmsu.edu/ hdp/CGTOOLS/proceedings/index.html.

[31] H.S. Delugach. Towards building active knowledge systems with conceptualgraphs. In A. de Moor, Wilfried Lex, and Bernhard Ganter, editors,ConceptualStructures for Knowledge Creation and Communications, volume 2745 ofLNAI,pages 296–308, Heidelberg, 2003. Springer-Verlag.

[32] H.S. Delugach. CharGer 3.3 - A Conceptual Graph Editor. University ofAlabama in Huntsville, Alabama, USA, 2004. http://www.cs.uah.edu/ delu-gach/CharGer.

[33] H.S. Delugach. Common logic standard. Located at http://cl.tamu.edu, Novem-ber 2006.

[34] A. deMoor. Applying conceptual graph theory to the user-driven specificationof network information systems. In D. Lukose, H.S. Delugach, M. Keeler,L. Searle, and J.F. Sowa, editors,Conceptual Structures: Fulfilling Peirce’sDream, Springer-Verlag Lecture Notes in Artificial Intelligence1257, pages536–550. ICCS, Springer, August 1997.

[35] H.-D. Ebbinghaus, J. Elum, and W. Thomas.Mathematical Logic. Springer-Verlag, Berlin, second edition, 1994.

[36] P. Ekland. Mail-Sleuth. Email Analysis Pty Ltd, Australia, 2004.http://www.mail-sleuth.com/.

[37] G. Ellis and R. Levinson. The birth of peirce: A conceptual graphs workbench.In G. Ellis and R. Levinson, editors,Proccedings of the 1st International Work-shop on PEIRCE, 1992.

273

Page 304: THE EFFECT OF DATA STRUCTURES MODIFICATIONS ON …hdp/PDF/dissertation.pdf · Structures Tool Interoperability Workshop.Research Press International, 2007. Field of Study Major field:

[38] D. Eppstein. Arboricity and bipartite subgraph listing algorithms. Tech. Report94-11, University of California, Irvine, CA 92717, February 24 1994. Depart-ment of Information and Computer Science.

[39] D. Eppstein. Subgraph isomorphism in planar graphs andrelated problems.Jour-nal of Graph Algorithms and Applications, 3(3):1–27, 1999.

[40] J.M. Ettinger. The complexity of comparing reaction systems. Technical re-port, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA,November 2001.

[41] B. Ganter and R.Wille.Formal Concept Analysis: Mathematical Foundations.Springer-Verlag, Berlin Heildelberg New York, 1999.

[42] M.R. Garey and D.S. Johnson.Computers and Intractability A Guide to theTheory of NP-Completeness. W.H. Freeman and Company, New York, 1979.

[43] J.C. Giarratano and G. Riley. Expert Systems: Principles and Programming. PWS-KENT Publishing Company, Boston, 1989.

[44] G. Gratzer. Lattice Theory: First concepts and distributive lattices. W.H. Freeman, 1971.

[45] N. Guarino. Philosophy and the Cognitive Science, chapter The Ontological Level, pages 443–456. Holder-Pichler-Tempsky, Vienna, 1994.

[46] F. Harary. Graph Theory. Addison-Wesley, Reading, MA, 1969.

[47] R.T. Hartley. A uniform representation for time and space and their mutual constraints. In F. Lehmann, editor, Semantic Networks, Oxford, ENGLAND, 1992.

[48] R.T. Hartley and M. Coombs. Reasoning with graph operations. In J.F. Sowa, editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge, San Mateo, CA, 1991. Morgan Kaufmann.

[49] R.T. Hartley and H.D. Pfeiffer. Data models for Conceptual Structures. In Foundations and Applications of Conceptual Structures, Contributions to ICCS 2002. ICCS2002, 2002.

[50] L. Henkin. The Completeness of the First-Order Functional Calculus. The Journal of Symbolic Logic, 14, 1949.

[51] W. Hodges. The Blackwell Guide to Philosophical Logic, chapter 1, pages 9–32. Blackwell Publishing, 2001.

[52] R. Jackendoff. Semantic Structures. MIT Press, Cambridge, MA, 1990.

[53] K.S. Jones. Early years in machine translation: memoirs and biographies of pioneers. John Benjamins, Amsterdam, 2000.

[54] A. Kabbaj. Un systeme multi-paradigme pour la manipulation des connaissances utilisant la theorie des Graphes Conceptuels. PhD thesis, Universite de Montreal, Departement d’Informatique et de Recherche Operationnelle, Canada, 1996.

[55] A. Kabbaj. The Amine Platform, 2004. http://amine-platform.sourceforge.net.

[56] A. Kabbaj. CS-TIW 2007 Second Conceptual Structures Tool Interoperability Workshop, chapter Interoperability: The next steps for Amine Platform, pages 65–70. Research Press International, 2007.

[57] A. Kabbaj and M. Janta-Polczynski. From Prolog++ to Prolog+CG: A CG object-oriented logic programming language. In B. Ganter and G. Mineau, editors, Conceptual Structures: Logical, Linguistic, and Computational Issues, pages 540–554, Berlin, 2000. Lecture Notes in Artificial Intelligence, vol. 1867, Springer-Verlag.

[58] A. Kabbaj and B. Moulin. An algorithmic definition of CG operations based on a bootstrap step. In Proceedings of ICCS’01, 2001.

[59] knowledge. Dictionary.com unabridged (v 1.0.1). Available at Dictionary.com website: http://dictionary.reference.com/browse/knowledge, November 2006.

[60] knowledge. Merriam-Webster online dictionary. Available at website: http://www.m-w.com/dictionary/knowledge, November 2006.

[61] F. Lehmann, editor. Semantic Networks. Pergamon Press, Oxford, ENGLAND, 1992.

[62] D. Lenat and R. Guha. Building Large Knowledge-Based Systems - Representation and Inference in the Cyc Project. Addison-Wesley, Reading, MA, 1990.

[63] H. Levesque. A fundamental tradeoff in knowledge representation and reasoning. In Proceedings of CSCSI-84, pages 141–152, London, 1984.

[64] LIRMM. CoGITaNT. Montpellier, France, 2004. http://cogitant.sourceforge.net/index.html.

[65] G. Luger and W. Stubblefield. Artificial Intelligence - Structures and Strategies for Complex Problem Solving. The Benjamin/Cummings Publishing Company, Inc., Redwood City, CA, 1993.

[66] R. MacGregor. The evolving technology of classification-based knowledge representation systems. In J.F. Sowa, editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge, San Mateo, CA, 1991. Morgan Kaufmann.

[67] A. Martelli and U. Montanari. An efficient unification algorithm. ACM Transactions on Programming Languages and Systems, 4(2):258–282, April 1982.

[68] B.T. Messmer and H. Bunke. Efficient subgraph isomorphism detection: A decomposition approach. IEEE Transactions on Knowledge and Data Engineering, 12(2):307–323, March/April 2000.

[69] G.W. Mineau. From actors to process: The representation of dynamic knowledge using conceptual graphs. In Marie-Laure Mugnier and Michel Chein, editors, Conceptual Structures: Theory, Tools, and Applications, volume 1453 of Springer-Verlag Lecture Notes in Artificial Intelligence, pages 65–79, Heidelberg, August 1998. ICCS 1998, Springer.

[70] G.W. Mineau. Constraints on processes: Essential elements for the validation and execution of processes. In William Tepfenhart and Walling Cyre, editors, Conceptual Structures: Standards and Practices, volume 1640 of LNAI, pages 66–82, Heidelberg, July 1999. ICCS 1999, Springer.

[71] G.W. Mineau and O. Gerbe. Contexts: A formal definition of worlds of assertions. In D. Lukose, H.S. Delugach, M. Keeler, L. Searle, and J.F. Sowa, editors, Conceptual Structures: Fulfilling Peirce’s Dream, volume 1257 of LNAI, pages 80–94. ICCS 1997, Springer, August 1997.

[72] D. Moldovan, W. Lee, C. Lin, and M. Chung. SNAP parallel processing applied to AI. Computer, 25(5):39–49, May 1992.

[73] H. Motulsky. Intuitive Biostatistics. Oxford University Press, New York, 1995.

[74] M.-L. Mugnier and M. Chein. Polynomial algorithms for projection and matching. In H.D. Pfeiffer and T.E. Nagle, editors, Conceptual Structures: Theory and Implementation, volume 754 of LNAI, pages 239–251. ICCS, Springer-Verlag, July 1992.

[75] M.-L. Mugnier and M. Leclere. On querying simple conceptual graphs with negation. In Data and Knowledge Engineering. DKE, Elsevier, 2006. Revised version of R.R. LIRMM 05-051.

[76] A. Mukerjee. Computational Representation and Processing of Spatial Expressions, chapter Neat vs Scruffy: A review of Computational Models for Spatial Expressions, pages 1–37. Lawrence Erlbaum Associates, Mahwah, NJ, 1998.

[77] S.H. Myaeng and A. Lopez-Lopez. Conceptual graph matching: A flexible algorithm and experiments. Journal for Experimental and Theoretical AI, 4(2):107–126, 1992.

[78] T.E. Nagle, J.W. Esch, and G. Mineau. A notation for conceptual structures graph matchers. In Proceedings of the 5th Conceptual Structures Workshop, Boston, MA, 1990. Held in conjunction with AAAI-90.

[79] T.E. Nagle, J.A. Nagle, L.L. Gerholz, and P.W. Eklund, editors. Conceptual Structures: Current Research and Practice. Ellis Horwood Workshops. Ellis Horwood, 1992.

[80] A. Newell. The knowledge level. Artificial Intelligence, 18(1):87–127, 1982.

[81] P. Oehrstoem, J. Andersen, and H. Scharfe. What has happened to ontology. In F. Dau, M-L Mugnier, and G. Stumme, editors, Conceptual Structures: Common Semantics for Sharing Knowledge, volume 3596 of LNAI, pages 425–438. ICCS2005, Springer, July 2005.

[82] C.K. Ogden and I.A. Richards. The Meaning Of Meaning. Harcourt, Brace, and World, New York, NY, 1946.

[83] R. Pagh. Hash and displace: Efficient evaluation of minimal perfect hash functions. In Algorithms and Data Structures: 6th International Workshop. WADS’99, LNCS, May 1999.

[84] M.S. Paterson and M.N. Wegman. Linear unification. J. Comput. Syst. Sci., 16(2):158–167, April 1978.

[85] J. Pearl. Heuristics. Addison-Wesley, Reading, MA, 1984.

[86] C.S. Peirce. Manuscripts on existential graphs. Peirce, 4:320–410, 1960.

[87] H.D. Pfeiffer. An exportable CGIF module from the CP environment: A pragmatic approach. In K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors, Conceptual Structures at Work, volume 3127 of LNAI, pages 319–332. ICCS2004, Springer, July 2004.

[88] H.D. Pfeiffer, N.R. Chavez, Jr., and R.T. Hartley. A generic interface for communication between story understanding systems and knowledge bases. In Richard Tapia Celebration of Diversity in Computing Conference, Albuquerque, NM, 2005.

[89] H.D. Pfeiffer, N.R. Chavez Jr., and J.J. Pfeiffer Jr. CPE design considering interoperability. In H.D. Pfeiffer, A. Kabbaj, and D.J. Benn, editors, CS-TIW 2007 Second Conceptual Structures Tool Interoperability Workshop, pages 71–75. Research Press International, 2007.

[90] H.D. Pfeiffer and R.T. Hartley. Semantic additions to conceptual programming. In Proc. of the Fourth Annual Workshop on Conceptual Structures, Detroit, MI, 1989.

[91] H.D. Pfeiffer and R.T. Hartley. Additions for set representation and processing to conceptual programming. In Proc. of the Fifth Annual Workshop on Conceptual Structures, pages 131–140, Boston & Stockholm, 1990.

[92] H.D. Pfeiffer and R.T. Hartley. The Conceptual Programming Environment, CP: Reasoning representation using graph structures and operations. In Proc. of IEEE Workshop on Visual Languages, Kobe, Japan, 1991.

[93] H.D. Pfeiffer and R.T. Hartley. The Conceptual Programming Environment, CP. In T.E. Nagle, J.A. Nagle, L.L. Gerholz, and P.W. Eklund, editors, Conceptual Structures: Current Research and Practice, Ellis Horwood Workshops. Ellis Horwood, 1992.

[94] H.D. Pfeiffer and R.T. Hartley. Temporal, spatial, and constraint handling in the Conceptual Programming Environment, CP. Journal of Experimental and Theoretical AI, 4(2):167–182, 1992.

[95] H.D. Pfeiffer and R.T. Hartley. Visual CP representation of knowledge. In G. Stumme, editor, Working with Conceptual Structures - Contributions to ICCS 2000, pages 175–188, 2000. Shaker-Verlag.

[96] H.D. Pfeiffer and R.T. Hartley. ARCEdit - CG editor. In CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL: http://www.cs.nmsu.edu/~hdp/CGTOOLS/proceedings/index.html.

[97] H.D. Pfeiffer and R.T. Hartley, editors. CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL: http://www.cs.nmsu.edu/~hdp/CGTOOLS/proceedings/index.html.

[98] J. Piaget. Genetic epistemology. Columbia University Press, New York, 1970. Trans. E. Duckworth.

[99] S. Polovina and R. Hill. Enhancing the initial requirements capture of multi-agent systems through conceptual graphs. In F. Dau, M-L Mugnier, and G. Stumme, editors, Conceptual Structures: Common Semantics for Sharing Knowledge, volume 3596 of LNAI, pages 439–452. ICCS2005, Springer, July 2005.

[100] B. Prasad. A planning system for blocks-world domain. In AICCSA ’01: Proceedings of the ACS/IEEE International Conference on Computer Systems and Applications, page 59, Washington, DC, USA, 2001. IEEE Computer Society.

[101] A. Puder. Mapping of CGIF to operational interfaces. In Marie-Laure Mugnier and Michel Chein, editors, Conceptual Structures: Theory, Tools, and Applications, Springer-Verlag Lecture Notes in Computer Science 1453, pages 119–126, 1998.

[102] K. Radeck. C# and Java: Comparing Programming Languages. MSDN, October 2003. http://www.windowsfordevices.com/articles/AT2128742838.html.

[103] S.W. Reyner. An analysis of a good algorithm for the subtree problem. SIAM J. Comput., 6:730–732, 1977.

[104] J.A. Robinson. Machine Intelligence, volume 6, chapter Computational logic: The unification computation, pages 63–72. Edinburgh University Press, Edinburgh, Scotland, 1971.

[105] S. Roman. Win32 API Programming with Visual Basic. O’Reilly, first edition, 1999.

[106] S. Russell and P. Norvig. Artificial Intelligence - A Modern Approach. Prentice Hall, Upper Saddle River, NJ, 1995.

[107] G. Ryle. The Concept of Mind. Penguin Books, Harmondsworth, UK, 1949.

[108] L. Schubert. Extending the expressive power of semantic networks. Artificial Intelligence, 7:163–198, 1976.

[109] S.C. Shapiro. A net structure for semantic information storage, deduction, and retrieval. In Proceedings of the 2nd International Conference on Artificial Intelligence, pages 512–523, 1971.

[110] S.C. Shapiro and W.J. Rapaport. The SNePS family. Computers Math. Applic., 23(2-5):243–275, 1992.

[111] A. Shokoufandeh and S. Dickinson. Graph-Theoretical Methods in Computer Vision. Number 2292 in LNCS. Springer-Verlag, Berlin Heidelberg, 2002.

[112] J. Siegel. Making the case: OMG’s Model Driven Architecture (MDA). SanDiego Times, 2002.

[113] D. Skipper and H. Delugach. OpenCG: An open source graph representation. In A. de Moor, S. Polovina, and H. Delugach, editors, First Conceptual Structures Tool Interoperability Workshop, pages 48–57. CS-TIW 2006, Aalborg Universitetsforlag, 2006.

[114] R. Soley. Model Driven Architecture. OMG, 11-05 2000. document.

[115] F. Southey. Notio and Ossa. In CGTools Workshop Proceedings in connection with ICCS 2001, Stanford, CA, 2001. [Online Access: July 2001] URL: http://www.cs.nmsu.edu/~hdp/CGTOOLS/proceedings/index.html.

[116] F. Southey. NOTIO, 2003. http://notio.lucubratio.org/index.html.

[117] F. Southey and J.G. Linders. NOTIO - a Java API for developing CG tools. In W. Tepfenhart and W. Cyre, editors, Conceptual Structures: Standards and Practices, pages 262–271, Berlin, 1999. Springer-Verlag. Lecture Notes in Artificial Intelligence, LNAI 1640.

[118] J.F. Sowa. Conceptual graphs for a data base interface. IBM Journal of Research and Development, 20(4):336–357, 1976.

[119] J.F. Sowa. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA, 1984.

[120] J.F. Sowa, editor. Principles of Semantic Networks: Explorations in the Representation of Knowledge. Morgan Kaufmann, San Mateo, CA, 1991.

[121] J.F. Sowa. Conceptual graphs as a universal knowledge representation. In F. Lehmann, editor, Semantic Networks, Oxford, ENGLAND, 1992.

[122] J.F. Sowa. Conceptual graphs: Draft proposed American National Standard. In W. Tepfenhart and W. Cyre, editors, Conceptual Structures: Standards and Practices, pages 1–65, Berlin, 1999. Springer-Verlag. Lecture Notes in Artificial Intelligence, LNAI 1640.

[123] J.F. Sowa. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, 2000.

[124] J.F. Sowa. Architectures for intelligent systems. IBM Systems Journal, 41(3):331–349, 2002.

[125] J.F. Sowa, N.Y. Foo, and A. Rao, editors. Conceptual Graphs for Knowledge Systems. Addison Wesley, New York, NY, 1989.

[126] J.F. Sowa et al. Conceptual Graph Standard, American National Standard NCITS. et all, T2/ISO/JTC1/SC32 WG2 M 00 edition, 2001. [Access Online: April 2001], URL: http://www.bestweb.net/~sowa/cg/cgstand.htm.

[127] B. Stroustrup. The C++ Programming Language. Addison-Wesley, 3rd edition, 2000.

[128] D. A. Tappan. Knowledge-Based Spatial Reasoning For Automated Scene Generation From Text Descriptions. PhD thesis, New Mexico State University, May 2004.

[129] W.M. Tepfenhart. Ontologies and conceptual structures. In Marie-Laure Mugnier and Michel Chein, editors, Conceptual Structures: Theory, Tools, and Applications, volume 1453 of LNAI, pages 334–348, Heidelberg, August 1998. ICCS 1998, Springer.

[130] R. Thomopoulos, J.-F. Baget, and O. Haemmerle. Conceptual graphs as cooperative formalism to build and validate a domain expertise. In U. Priss, S. Polovina, and R. Hill, editors, Conceptual Structures: Knowledge Architectures for Smart Applications, pages 112–125. ICCS2007, Springer, 2007.

[131] M. Thorup. Even strongly universal hashing is pretty fast. In SODA ’00: Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms, pages 496–497, Philadelphia, PA, USA, 2000. Society for Industrial and Applied Mathematics.

[132] J.R. Ullmann. An algorithm for subgraph isomorphism. J. of the Assoc. for Computing Machinery, 23(1):31–42, 1976.

[133] W.P. Weijland. Semantics for logic programs without occur check. Theoretical Computer Science, 71:155–174, 1990.

[134] C.A. Welty. An Integrated Representation for Software Development and Discovery. PhD thesis, Vassar College, 1995.

[135] A.R. White. Conceptual analysis. In C.J. Bontempo and S.J. Odell, editors, The Owl of Minerva, pages 103–117. McGraw-Hill, 1975.

[136] M. Willems. Projection and unification for conceptual graphs. In G. Ellis, R. Levinson, W. Rich, and J. Sowa, editors, Conceptual Structures: Applications, Implementation and Theory, volume 954 of LNAI, pages 278–282. ICCS 1995, Springer, August 1995.

[137] T. Winograd. Frame representations and the declarative/procedural controversy. In Readings in Knowledge Representation, pages 185–210. Morgan Kaufmann, 1975.

[138] K.E. Wolff. ’Particles’ and ’waves’ as understood by temporal concept analysis. In K.E. Wolff, H.D. Pfeiffer, and H.S. Delugach, editors, Conceptual Structures at Work, pages 126–141. ICCS2004, LNAI 3127, Springer, July 2004.

[139] W.A. Woods. What’s in a link: Foundations for semantic networks. In D.G. Bobrow and A.M. Collins, editors, Representation and Understanding: Studies in Cognitive Science, pages 35–82. Academic Press, 1975.

[140] W.A. Woods. Understanding subsumption and taxonomy. In J.F. Sowa, editor, Principles of Semantic Networks: Explorations in the Representation of Knowledge. Morgan Kaufmann, 1991.

[141] W.A. Woods and J.G. Schmolze. The KL-ONE family. Computers Math. Applic., 23(2-5):133–177, 1992.
