Post on 19-Dec-2015
Graph Databases: Efficient Storage and Rapid Retrieval
Robert Levinson
Machine Intelligence Laboratory
University of California
Santa Cruz
THE CG MARS LANDER
High level architecture
[Architecture diagram; labels: ADB; ADB Processor; CG Parser & Processor; Query Processor & Matcher; Answer: more specific CGs in DB; CG Creator/Translator with Type Hierarchy; English Translator, Source reference, & GUI; English Discourse; English Queries; English-CG-English Translation]
Santa Cruz: The CG Mars Lander
SUBGRAPH-ISOMORPHISM
NP-COMPLETE
2 Main Methods:
A. Backtracking Search
B. Refinement, O(n^2) on avg.
(both exploit candidate binding lists, modulo type hierarchy)
Key Idea: Amortize Cost Over
» Millions of Operations
» Mega-graph storage
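Method A can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the `PARENT` hierarchy, the node and relation names, and the graph encoding as a node-to-type map plus a set of `(relation, source, target)` edges. It builds a candidate binding list for each query node (target nodes whose type is the same or more specific) and then backtracks over those lists. It is not the CG Mars Lander's matcher.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical type hierarchy (child -> parent); not taken from the slides.
PARENT = {"cat": "animal", "dog": "animal", "animal": "entity", "mat": "entity"}

Graph = Tuple[Dict[str, str], Set[Tuple[str, str, str]]]  # (node -> type, edges)

def is_subtype(specific: str, general: str) -> bool:
    """True if `specific` equals `general` or lies below it in PARENT."""
    node: Optional[str] = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False

def project(query: Graph, target: Graph) -> Optional[Dict[str, str]]:
    """Return one injective mapping of query nodes onto target nodes, or None."""
    q_types, q_edges = query
    t_types, t_edges = target
    # Candidate binding lists, modulo the type hierarchy.
    cands = {q: [t for t, tt in t_types.items() if is_subtype(tt, qt)]
             for q, qt in q_types.items()}
    order = sorted(q_types, key=lambda q: len(cands[q]))  # most constrained first

    def consistent(binding: Dict[str, str]) -> bool:
        # Every query edge whose endpoints are both bound must exist in the target.
        return all((rel, binding[a], binding[b]) in t_edges
                   for rel, a, b in q_edges if a in binding and b in binding)

    def search(i: int, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(order):
            return dict(binding)
        q = order[i]
        for t in cands[q]:
            if t in binding.values():  # keep the mapping injective
                continue
            binding[q] = t
            if consistent(binding):
                found = search(i + 1, binding)
                if found is not None:
                    return found
            del binding[q]  # backtrack
        return None

    return search(0, {})

QUERY: Graph = ({"x": "animal", "y": "mat"}, {("on", "x", "y")})
TARGET: Graph = ({"a": "cat", "b": "mat", "c": "dog"},
                 {("on", "a", "b"), ("chases", "c", "a")})
print(project(QUERY, TARGET))
```

Sorting the query nodes by the length of their candidate lists is one common heuristic for failing early; the slides do not say which ordering the system uses.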
Exploit Symmetry !!
“Invariant with respect to transformation.”
“Shared information between objects or systems or their representations.”
AB+AC = A(B+C).
Symmetry Synonyms
similarity
commonality
structure
mutual information
relationship
redundancy
Total Information = Diversity + Symmetry
Diversity corresponds to Comp Sci “Complexity” = resources required.
Diversity can often only be resolved with Combinatorial Search.
Conceptual Graph Processing
Concept Types: “a cat is an animal”
Relation Types or Graph Type: “mother-of” is “parent-of”
Transitivity of Projection (subgraph-isomorphism)
Redundant Substructures
Redundant Literals
Redundant Pointers
6 Retrieval Methods:
Method I: Flat Ordering
Method II: 2-Levels: Indexes, Graphs
Method III: Full Partial Order Hierarchy
Method IV: Multi-Level Hierarchical Retrieval
Method V: Remember Node Bindings
Method VI: UDS: The Universal Data Structure
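To make the contrast between a flat ordering and a partial-order hierarchy concrete, here is a minimal sketch of a two-phase search over a stored partial order. It rests on loud assumptions: a graph is modeled as a `frozenset` of strings, and the subset test stands in for the far more expensive subgraph-isomorphism test. The class and method names are hypothetical, and this is not claimed to be the author's algorithm. Phase 1 descends from the top through stored generalizations of the query only; phase 2 tests only the items below all of the predecessors found in phase 1.

```python
from typing import Dict, FrozenSet, Set

Graph = FrozenSet[str]  # stand-in for a conceptual graph

class PartialOrder:
    def __init__(self) -> None:
        self.top: Graph = frozenset()  # the empty graph generalizes everything
        self.children: Dict[Graph, Set[Graph]] = {self.top: set()}
        self.tests = 0                 # number of (simulated) matching tests

    def _generalizes(self, general: Graph, specific: Graph) -> bool:
        self.tests += 1
        return general <= specific     # subset test replaces subgraph isomorphism

    def _descendants(self, node: Graph) -> Set[Graph]:
        seen, stack = {node}, [node]
        while stack:
            for child in self.children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def _immediate_predecessors(self, g: Graph) -> Set[Graph]:
        # Phase 1: walk down only through stored generalizations of g.
        verdict = {self.top: True}
        preds: Set[Graph] = set()
        stack, visited = [self.top], set()
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            deeper = False
            for child in self.children[node]:
                if child not in verdict:
                    verdict[child] = self._generalizes(child, g)
                if verdict[child]:
                    deeper = True
                    stack.append(child)
            if not deeper:
                preds.add(node)
        return preds

    def specializations(self, g: Graph) -> Set[Graph]:
        # Phase 2: only items below every immediate predecessor can qualify.
        cand = None
        for p in self._immediate_predecessors(g):
            below = self._descendants(p)
            cand = below if cand is None else cand & below
        found: Set[Graph] = set()
        for c in cand or set():
            if c not in found and self._generalizes(g, c):
                found |= self._descendants(c)  # transitivity: no retesting below c
        return found

    def insert(self, g: Graph) -> None:
        if g in self.children:
            return
        preds = self._immediate_predecessors(g)
        specs = self.specializations(g)
        succs = {s for s in specs if not any(o < s for o in specs)}  # minimal ones
        self.children[g] = succs
        for p in preds:
            self.children[p] -= succs  # g now sits between p and these successors
            self.children[p].add(g)

A, B, D = frozenset("1"), frozenset("2"), frozenset("4")
AB, ABC = frozenset("12"), frozenset("123")
db = PartialOrder()
for graph in (A, ABC, AB, B, D):
    db.insert(graph)
print(db.specializations(AB))
```

The `tests` counter can be compared against the one-test-per-stored-graph cost of a flat scan to see how much of the database the hierarchy lets a query skip.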
Exploit Tuple-Based Linear CGs!
(a conceptual graph syntax that supports rapid retrieval and question-answering).
@CG000: {
    AGNT (government, BE)
}.
@CG001: {
    AGNT (Hungarian_American_Enterprise_Fund, invest),
    OBJ (invest, Dollars | 1000000),
    IN (Dollars | 1000000, first_business)
}.
@CG002: {
    AGNT (@CG000, manage),
    OBJ (manage, @CG001)
}.
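A short sketch of how this tuple-based syntax could be read into memory follows. The regular expressions and the function name `parse_linear_cgs` are assumptions made for illustration; they cover only the forms shown in the example above (a named graph, binary relation tuples, arguments without nested parentheses) and are not the system's actual parser.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (relation, first argument, second argument)

# "@NAME : { ... }." with the body captured lazily up to the closing "}."
GRAPH_RE = re.compile(r"@(\w+)\s*:\s*\{(.*?)\}\s*\.", re.S)
# "REL ( arg1 , arg2 )" where the arguments contain no commas or parentheses
TUPLE_RE = re.compile(r"(\w+)\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")

def parse_linear_cgs(text: str) -> Dict[str, List[Triple]]:
    """Map each graph name to its list of relation tuples."""
    graphs: Dict[str, List[Triple]] = {}
    for name, body in GRAPH_RE.findall(text):
        graphs[name] = [(rel, a, b) for rel, a, b in TUPLE_RE.findall(body)]
    return graphs

SAMPLE = """
@CG000: { AGNT (government, BE) }.
@CG001: {
    AGNT (Hungarian_American_Enterprise_Fund, invest),
    OBJ (invest, Dollars | 1000000),
    IN (Dollars | 1000000, first_business)
}.
@CG002: { AGNT (@CG000, manage), OBJ (manage, @CG001) }.
"""

graphs = parse_linear_cgs(SAMPLE)
for name, triples in graphs.items():
    print(name, triples)
```

Note that `@CG002` refers to the other two graphs by name, so nested contexts come out as ordinary arguments that a later step can resolve by lookup.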
A query:
/* Q2: Does anybody own the rag newspaper New York Post ? */
Query::@bob_202 : {
    ISA ( New_York_Post , newspaper [ n34861 ] ) ,
    CHRC ( newspaper [ n34861 ] , rag [ n9 ] ) ,
    AGNT ( own [ v9125 ] , ????? ) ,
}.
Answer:
/* A2: Rupert Murdoch once owned the troubled tabloid newspaper New York Post. */
@CG1684_3 : {
    ISA ( New_York_Post , newspaper [ n34861 ] ) ,
    CHRC ( newspaper [ n34861 ] , tabloid [ n27111 ] ) ,
    CHRC ( newspaper [ n34861 ] , trouble [ n25320 ] ) ,
    AGNT ( own [ v9125 ] , Rupert Murdoch) ,
    CHRC ( own [ v9125 ] , once )
}.
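The way the stored graph answers the query can be sketched tuple by tuple. The sketch assumes, without confirmation from the slides, that the type hierarchy places `tabloid` at or below `rag`, which would explain why the stored graph counts as a specialization of the query. The bracketed identifiers are dropped for brevity, and `answer`, `specializes` and `arg_matches` are hypothetical names. A full projection would also have to keep node bindings consistent across tuples, which this simplified check does not do.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
WILDCARD = "?????"

# Assumed fragment of the type hierarchy (child -> parent); not given on the slides.
PARENT = {"tabloid": "rag"}

def specializes(specific: str, general: str) -> bool:
    """True if `specific` equals `general` or lies below it in PARENT."""
    node: Optional[str] = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False

def arg_matches(query_arg: str, fact_arg: str) -> bool:
    return query_arg == WILDCARD or specializes(fact_arg, query_arg)

def answer(query: List[Triple], fact: List[Triple]) -> Optional[List[str]]:
    """Return the wildcard fillers if every query tuple is matched, else None."""
    fillers: List[str] = []
    for rel, a, b in query:
        for frel, fa, fb in fact:
            if frel == rel and arg_matches(a, fa) and arg_matches(b, fb):
                fillers += [f for q, f in ((a, fa), (b, fb)) if q == WILDCARD]
                break
        else:
            return None  # some query tuple has no counterpart in the stored graph
    return fillers

QUERY = [("ISA", "New_York_Post", "newspaper"),
         ("CHRC", "newspaper", "rag"),
         ("AGNT", "own", WILDCARD)]
FACT = [("ISA", "New_York_Post", "newspaper"),
        ("CHRC", "newspaper", "tabloid"),
        ("CHRC", "newspaper", "trouble"),
        ("AGNT", "own", "Rupert Murdoch"),
        ("CHRC", "own", "once")]

print(answer(QUERY, FACT))
```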
Capabilities & timings:
Inputs:
– CGs (tens of thousands)
– pre-processed parts of speech
– Type Hierarchy (150,000 WORDNET augmented English words)
– natural language queries
Outputs:
– CG (save & restore) DB
– replies to queries
– specializations and maximal specializations
Benchmark machine:
– Sun Ultra Enterprise 4000 (4 UltraSPARC 167 MHz CPUs with 512 KB external cache, and 256 MB of main memory)
Timings:
– Reading, processing, and storing an 18,000 CG input file takes 1 hour and 46 minutes.
– Reloading the above DB takes on the order of seconds.
– A 150,000 word ontology is processed in 16 seconds.
– Each query is handled in at most 5.5 seconds.
– For smaller databases (hundreds of CGs only), the time to handle a single query can be as low as 0.2 seconds.
Cost/benefit analysis: assume N CGs and Q queries
Method I Cost: $N Q$
Method III Cost: $N \log_{10}^2 (N/2) + Q \log_{10}^2 N$
• N insertions: $N \log_{10}^2 (N/2)$
• Q queries: $Q \log_{10}^2 N$
Cost/benefit table
N        Q        Method I Cost    Method III Cost
10       1        10               5.0
10       10       100              14.9
10       100      1,000            104.8
100      1        100              296.6
100      10       1,000            328.6
100      100      10,000           688.6
1,000    1        1,000            7,293.4
1,000    10       10,000           7,374.4
1,000    100      100,000          8,184.4
1,000    1,000    1,000,000        16,284.4
10,000   1,000    10,000,000       152,823.8
10,000   10,000   100,000,000      296,823.8
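The comparison can be reproduced numerically. The sketch below takes the Method I cost as N·Q and reads the Method III cost as N·log10²(N/2) for the insertions plus Q·log10²(N) for the queries. That reading is an assumption about the slide's formula, adopted because it reproduces the N = 1,000 and N = 10,000 rows of the table to the printed precision. The function names are hypothetical.

```python
import math

def method_i_cost(n: int, q: int) -> float:
    """Flat ordering: every query is compared against every stored graph."""
    return float(n * q)

def method_iii_cost(n: int, q: int) -> float:
    """Partial-order hierarchy under the assumed reading of the slide:
    n insertions at log10(n/2)^2 each plus q queries at log10(n)^2 each."""
    insertions = n * math.log10(n / 2) ** 2
    queries = q * math.log10(n) ** 2
    return insertions + queries

for n, q in [(1_000, 1), (1_000, 1_000), (10_000, 1_000), (10_000, 10_000)]:
    flat, hier = method_i_cost(n, q), method_iii_cost(n, q)
    print(f"N={n:>6} Q={q:>6}  Method I={flat:>13,.1f}  Method III={hier:>11,.1f}")
```

Under this reading the hierarchy pays a one-time insertion cost that a single query cannot recover, and it pulls ahead once the number of queries is large enough for the cheaper per-query term to dominate.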
6 UDS DESIGN PRINCIPLES:
1. Every primitive data object, label or symbol should be stored only once with pointers used to denote the actual uses of the object.
2. Every compound object should be stored with the minimum information required to represent the combination of its parts.
3. Given no loss of accuracy, objects should be processed at the highest level of abstraction possible.
4. If one were to implement a conceptual graph based on the diagrammatic representation, the costs associated with storage and matching would be much higher than they need to be.
5. The same abstraction mechanism that goes from labels to graphs can be taken one step further to facilitate the storage and retrieval of nested context graphs.
6. A graph is itself the best descriptor of its nodes.
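Principles 1 and 2 can be illustrated with a small interning sketch. It is an assumption-laden illustration rather than the UDS itself: the `Interner` class, the `store_graph` helper and the choice of integer ids as "pointers" are all hypothetical. Each label is stored once, each tuple is stored once as three label ids, and a graph is just the set of ids of its tuples.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple

class Interner:
    """Stores each distinct value once and hands out a small integer id for it."""

    def __init__(self) -> None:
        self._ids: Dict[Hashable, int] = {}
        self._values: List[Hashable] = []

    def intern(self, value: Hashable) -> int:
        if value not in self._ids:
            self._ids[value] = len(self._values)
            self._values.append(value)
        return self._ids[value]

    def value(self, ident: int) -> Hashable:
        return self._values[ident]

    def __len__(self) -> int:
        return len(self._values)

labels = Interner()  # primitive objects: relation and concept labels
tuples = Interner()  # compound objects: (relation id, arg id, arg id)

def store_graph(triples: Iterable[Tuple[str, str, str]]) -> FrozenSet[int]:
    """Represent a graph as the set of ids of its (shared) tuples."""
    ids = set()
    for rel, a, b in triples:
        key = (labels.intern(rel), labels.intern(a), labels.intern(b))
        ids.add(tuples.intern(key))
    return frozenset(ids)

g1 = store_graph([("ISA", "New_York_Post", "newspaper"),
                  ("CHRC", "newspaper", "tabloid")])
g2 = store_graph([("ISA", "New_York_Post", "newspaper"),
                  ("CHRC", "newspaper", "trouble")])

print(len(labels), "labels and", len(tuples), "tuples stored for 2 graphs")
print("shared tuple ids:", g1 & g2)
```

Because shared substructure becomes shared ids, finding what two graphs have in common reduces to a set intersection over small integers.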
CONCLUDING THOUGHTS
The key to efficient implementation of CGs is the exploitation of symmetry or structure.
CG operations can be executed efficiently in real-time applications.
At the implementation or machine level, knowledge representation formalisms are often nearly the same.
References
[1] C. Colin and R. Levinson, "Partial order maintenance," Special Interest Group on Information Retrieval Forum, vol. 23, no. 3,4, pp. 34-59, 1988.
[2] G. Ellis, R. A. Levinson, and P. Robinson, "Managing complex objects in PEIRCE," Special Issue on Object-Oriented Approaches in Artificial Intelligence and Human-Computer Interaction (IJMMS), vol. 41, pp. 109-148, 1994.
[3] R. Hughey, R. Levinson, and J. D. Roberts, eds., Issues in Parallel Hardware for Graph Retrieval, 1993.
[4] R. Levinson, "A self-organizing retrieval system for graphs," in AAAI-84, pp. 203-206, Morgan Kaufmann, 1984.
[5] R. Levinson, "Pattern associativity and the retrieval of semantic networks," Computers and Mathematics with Applications, vol. 23, no. 6-9, pp. 573-600, 1992. Part 2 of Special Issue on Semantic Networks in Artificial Intelligence, Fritz Lehmann, editor. Also reprinted on pages 573-600 of the book, Semantic Networks in Artificial Intelligence, Fritz Lehmann, editor, Pergamon Press, 1992.
[6] R. Levinson and G. Ellis, "Multilevel hierarchical retrieval," Knowledge-Based Systems, vol. 5, pp. 233-244, September 1992. Special Issue on Conceptual Graphs.
[7] R. Levinson and G. Fuchs, "A pattern-weight formulation of search knowledge," Tech. Rep. UCSC-CRL-91-15, University of California Santa Cruz, 2001. Revision to appear in Computational Intelligence.
[8] R. A. Levinson, "UDS: A universal data structure," in Proc. 2nd International Conference on Conceptual Structures, (College Park, Maryland USA), pp. 230-250, 1991.