Exemplar Queries: Knowledge Exploration Using Information Graphs
Davide Mottin, University of Trento
August 20, 2015 @ RMIT University, Melbourne
Department of Information Engineering and Computer Science
2 Exemplar Queries: Knowledge Exploration Using Information Graphs – Davide Mottin @ RMIT
Short Bio
Education
• April 2015: PhD in computer science from University of Trento (now in the job market!)
• Thesis title: “Advanced Query Paradigms for the Novice User”
• Advisors: Prof. Themis Palpanas, Prof. Yannis Velegrakis
• 2010/08: MSc/BSc in computer science
Working Experience
• 2012: Yahoo! Labs, Barcelona, under Dr. Francesco Bonchi
• 2011: Microsoft Research, Beijing, under Dr. Haixun Wang
Traditional Query Answering
Query: owns = Search Engine, based = California, produces = Mobiles
Database
Hardly Expressible Queries
Query???
The user does not know how to describe the other companies
Database
The Exemplar Queries perspective
“I think the greatest way to learn is to learn by someone's example.”
Tobey Maguire
A different need
Existing Search Engines
Query: acquisitions like Google-YouTube
Yahoo!-Tumblr or Microsoft-Skype are not returned as interesting acquisitions.
This need cannot be solved by Related Queries [Boldi11, Bordino13] or Query Relaxation [Mottin13, Mishra09].
A new perspective
Exemplar Queries
Input: Qe, an example element of interest
Output: set of elements in the desired result set
Exemplar Query Evaluation
• evaluate Qe in a database D, finding a sample s
• find the set of elements a similar to s, given a similarity relation
[PVLDB 2014, SIGMOD 2014 (Demo)]
Challenges
• Define the similarity between sample and answers
• Determine the best data model for the problem
• Find answers efficiently
Our Approach
Exemplar Queries
• The user query is an indication of the structure of the answers
Problem
Solution Overview [SIGMOD Record 2014]
General Solution
Input: User Query Q, an example of the expected results
Output: Set of expected results
Procedure:
- Detect the sample for the query Q
- Find the structures similar to the sample
- Rank the results
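The three-step procedure can be sketched as a generic skeleton in which each step is a pluggable function. This is a minimal illustration under assumptions, not the XQ implementation: the graph is assumed to be a set of (source, edge label, target) triples, and `answer_exemplar_query`, `toy_sample`, `toy_similar` and `toy_rank` are hypothetical names whose toy bodies only stand in for real sample detection, similarity search and ranking.

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

# Hypothetical representation: a graph is a set of (source, edge_label, target) triples.
Edge = Tuple[Hashable, str, Hashable]
Graph = Set[Edge]


def answer_exemplar_query(
    query: str,
    graph: Graph,
    detect_sample: Callable[[str, Graph], Graph],
    find_similar: Callable[[Graph, Graph], Iterable[Graph]],
    rank: Callable[[Graph, Graph], float],
) -> List[Graph]:
    """Generic three-step exemplar query pipeline with pluggable steps."""
    sample = detect_sample(query, graph)          # 1. detect the sample for Q
    answers = list(find_similar(sample, graph))   # 2. structures similar to the sample
    # 3. rank the results, highest score first
    return sorted(answers, key=lambda a: rank(sample, a), reverse=True)


# Toy plug-ins (hypothetical, for illustration only).
def toy_sample(query: str, graph: Graph) -> Graph:
    """Sample = edges whose two endpoints are both named in the query."""
    words = set(query.split())
    return {e for e in graph if e[0] in words and e[2] in words}


def toy_similar(sample: Graph, graph: Graph) -> List[Graph]:
    """Answers = single edges carrying a sample edge label, excluding the sample."""
    labels = {label for _, label, _ in sample}
    return [{e} for e in graph if e[1] in labels and e not in sample]


def toy_rank(sample: Graph, answer: Graph) -> float:
    return 1.0  # every answer ties in this toy example


G = {
    ("Google", "acquired", "YouTube"),
    ("Yahoo!", "acquired", "Tumblr"),
    ("Google", "based_in", "Mountain View"),
}
result = answer_exemplar_query("Google YouTube", G, toy_sample, toy_similar, toy_rank)
print(result)  # [{('Yahoo!', 'acquired', 'Tumblr')}]
```

Keeping the three steps as parameters mirrors the slide's separation between sample detection, similarity and ranking, so the similarity notion (edge isomorphism or simulation) can be swapped without touching the rest.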
Data Model: Knowledge graph
Strict equality: Edge Isomorphism
Figure: sample S and answers A1, A2
Similarity: Edge Isomorphism
D. Mottin et al. Exemplar queries: Give me an example of what you need. PVLDB, 7(5), 2014.
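Edge isomorphism can be made concrete with a brute-force checker, assuming that a match is an injective mapping of the sample's nodes onto graph nodes that carries every sample edge onto a graph edge with the same label and direction, while node names themselves are ignored. The function name and the toy graph are hypothetical. Because the search enumerates node permutations it is exponential, consistent with the NP-completeness of subgraph isomorphism [Cook71]; the sample itself is returned as a trivial match.

```python
from itertools import permutations
from typing import FrozenSet, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, edge_label, target)


def edge_isomorphic_matches(sample: Set[Edge], graph: Set[Edge]) -> List[FrozenSet[Edge]]:
    """Brute-force search for subgraphs of `graph` matching `sample` edge for edge.

    Assumption: a match is an injective mapping m of the sample's nodes onto graph
    nodes such that every sample edge (u, label, v) maps to a graph edge
    (m[u], label, m[v]). Only edge labels and directions must agree.
    """
    s_nodes = sorted({n for u, _, v in sample for n in (u, v)}, key=str)
    g_nodes = sorted({n for u, _, v in graph for n in (u, v)}, key=str)
    found = set()
    for image in permutations(g_nodes, len(s_nodes)):   # exponential enumeration
        m = dict(zip(s_nodes, image))
        mapped = frozenset((m[u], label, m[v]) for u, label, v in sample)
        if mapped <= graph:                              # every mapped edge exists
            found.add(mapped)
    return sorted(found, key=lambda es: sorted(map(str, es)))


sample = {("Google", "acquired", "YouTube"), ("Google", "based_in", "Mountain View")}
graph = sample | {
    ("Yahoo!", "acquired", "Tumblr"),
    ("Yahoo!", "based_in", "Sunnyvale"),
    ("Microsoft", "acquired", "Skype"),   # no based_in edge, so not a match
}
for match in edge_isomorphic_matches(sample, graph):
    print(sorted(match))
```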
Solution
Input: User Query Q, an example of the expected results.
Output: Set of expected results
Procedure:
- Detect the sample for the query Q
- Prune the non-matching nodes
- Find the structures edge isomorphic to the sample
- Rank the results
Subgraph isomorphism is NP-complete [Cook71]
Solution
1. IterativePruning: fast, rejects non-matching nodes
d-neighborhood
                   distance 1    distance 2
                   a  b  c       a  b  c
Query node q1      1  0  0       0  1  1
Graph node 1       2  0  0       1  2  1
Difference         1  0  0       1  1  0
(Difference = graph node 1 minus query node q1)
Theorem
d-neighborhood
                   distance 1    distance 2
                   a  b  c       a  b  c
Query node q1      1  0  0       0  1  1
Graph node 2       1  1  1       2  1  0
Difference         0  1  1       2  0 -1
(Difference = graph node 2 minus query node q1)
Theorem
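The pruning test on the two d-neighborhood slides can be checked mechanically. Assuming the theorem means that a graph node stays a candidate for a query node only when the difference (graph node minus query node) has no negative entry, the sketch below reproduces the vectors from the slides: graph node 1 is kept, and graph node 2 is rejected because of its -1 entry. The function names are hypothetical, and computing the vectors from an actual graph is not shown.

```python
from typing import List, Sequence


def neighborhood_difference(query_vec: Sequence[int], node_vec: Sequence[int]) -> List[int]:
    """Entry-wise difference (graph node minus query node) of two d-neighborhood vectors."""
    if len(query_vec) != len(node_vec):
        raise ValueError("vectors must cover the same (distance, label) entries")
    return [n - q for q, n in zip(query_vec, node_vec)]


def may_match(query_vec: Sequence[int], node_vec: Sequence[int]) -> bool:
    """Keep the graph node as a candidate only if no entry of the difference is negative."""
    return all(d >= 0 for d in neighborhood_difference(query_vec, node_vec))


# Entries ordered as (distance 1: a, b, c), (distance 2: a, b, c); values from the slides.
q1 = [1, 0, 0, 0, 1, 1]
node1 = [2, 0, 0, 1, 2, 1]
node2 = [1, 1, 1, 2, 1, 0]

print(neighborhood_difference(q1, node1), may_match(q1, node1))  # [1, 0, 0, 1, 1, 0] True
print(neighborhood_difference(q1, node2), may_match(q1, node2))  # [0, 1, 1, 2, 0, -1] False
```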
The IterativePruning Algorithm
1. Start from a query node q
2. Match q with the graph nodes
3. For each adjacent node of q:
4. Find, among the neighbors of the nodes in the candidate map of q, the graph nodes matching the edge
5. Repeat from step 2 with an adjacent node of q until all nodes have been visited
Theorem (Pruning Completeness): No subgraph-isomorphic solution is discarded by the IterativePruning algorithm.
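A minimal sketch of IterativePruning follows, assuming directed, edge-labeled triples. It is not the authors' code: the initial match compares incident edge labels, and instead of visiting query nodes one at a time it loops over the query edges until no candidate set changes. Every removal discards a graph node that cannot be the image of that query node under any label-preserving mapping, which is the intuition behind the Pruning Completeness theorem. `label_profile` and `iterative_pruning` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, edge_label, target)


def label_profile(edges: Set[Edge], node: Hashable) -> Counter:
    """Multiset of (direction, label) pairs of the edges incident to `node`."""
    profile: Counter = Counter()
    for u, label, v in edges:
        if u == node:
            profile[("out", label)] += 1
        if v == node:
            profile[("in", label)] += 1
    return profile


def iterative_pruning(query: Set[Edge], graph: Set[Edge]) -> Dict[Hashable, Set[Hashable]]:
    """Candidate map: query node -> graph nodes that may still be part of a match."""
    q_nodes = {n for u, _, v in query for n in (u, v)}
    g_nodes = {n for u, _, v in graph for n in (u, v)}
    g_profiles = {g: label_profile(graph, g) for g in g_nodes}
    # Initial match: a graph node needs at least the incident labels of the query node.
    cand = {}
    for q in q_nodes:
        q_prof = label_profile(query, q)
        cand[q] = {g for g in g_nodes if not (q_prof - g_profiles[g])}
    # Propagate along query edges until no candidate set shrinks any more.
    changed = True
    while changed:
        changed = False
        for u, label, v in query:
            hits = [(a, b) for a, lab, b in graph
                    if lab == label and a in cand[u] and b in cand[v]
                    and (u != v or a == b)]          # self-loops must map to self-loops
            new_u = {a for a, _ in hits}
            new_v = {b for _, b in hits}
            if new_u != cand[u] or new_v != cand[v]:
                cand[u], cand[v] = new_u, new_v
                changed = True
    return cand


query = {("x", "acquired", "y"), ("x", "based_in", "z")}
graph = {
    ("Google", "acquired", "YouTube"), ("Google", "based_in", "Mountain View"),
    ("Yahoo!", "acquired", "Tumblr"), ("Yahoo!", "based_in", "Sunnyvale"),
    ("Microsoft", "acquired", "Skype"),
}
print(iterative_pruning(query, graph))
```

Skype passes the initial label check for y but is removed once Microsoft drops out of the candidates for x, since Microsoft has no `based_in` edge.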
Solution
Input: User Query Q, an example of the expected results.
Output: Set of expected results
Procedure:
- Detect the sample for the query Q
- Restrict the search space
- Prune the non-matching nodes
- Find the structures edge isomorphic to the sample
- Rank the results
Subgraph isomorphism is NP-complete [Cook71]
Solution
1. IterativePruning: fast, rejects non-matching nodes
2. RelevantNeighborhood: restricts the search space to “near” nodes
Restricting the search space
Figure: user query (sample S) and answers A1, A2
Idea
1. Not all the nodes are equally relevant
2. Nodes “far” from the query are less related
The Relevant Neighborhood Algorithm
Prune the search space by identifying the valuable portions:
• Based on an approximation of Personalized PageRank
• Transition matrix A with non-uniform edge weights based on inverse frequency
Procedure
1. Assign each node in the sample a fixed number of particles
2. Distribute the particles to neighbor nodes, favoring sample edge-labels
3. Repeat step 2 until the number of particles is less than a threshold
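The particle procedure can be sketched as follows. Several parameters are assumptions, not values from the talk: each sample node starts with `n_particles` particles, a `restart` share of the particles arriving at a node is kept as that node's score (as in personalized PageRank), edge weights are the inverse of the label frequency multiplied by a hypothetical `sample_boost` for labels that occur in the sample, and groups of particles below `threshold` are dropped. Nodes with a non-zero score form the relevant neighborhood.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, edge_label, target)


def relevant_neighborhood(graph: Set[Edge], sample: Set[Edge], n_particles: float = 1000.0,
                          restart: float = 0.15, threshold: float = 1.0,
                          sample_boost: float = 2.0) -> Dict[Hashable, float]:
    """Particle-style approximation of personalized PageRank seeded at the sample.

    Edge weight = 1 / frequency of its label, multiplied by `sample_boost` when the
    label occurs in the sample. Edges are followed in both directions.
    Requires restart > 0 so that the particle mass shrinks every round.
    """
    freq = Counter(label for _, label, _ in graph)
    sample_labels = {label for _, label, _ in sample}
    adj = defaultdict(list)  # node -> [(neighbor, weight)]
    for u, label, v in graph:
        w = (sample_boost if label in sample_labels else 1.0) / freq[label]
        adj[u].append((v, w))
        adj[v].append((u, w))
    particles = {n: n_particles for u, _, v in sample for n in (u, v)}
    score: Dict[Hashable, float] = defaultdict(float)
    while particles:
        nxt: Dict[Hashable, float] = defaultdict(float)
        for node, p in particles.items():
            score[node] += restart * p          # this share of the mass stops here
            total = sum(w for _, w in adj[node])
            for nb, w in adj[node]:             # the rest moves to the neighbors
                nxt[nb] += (1 - restart) * p * w / total
        # Drop groups smaller than the threshold; since the total mass shrinks
        # by a factor (1 - restart) per round, the loop always terminates.
        particles = {n: p for n, p in nxt.items() if p >= threshold}
    return dict(score)


g = {("a", "x", "b"), ("b", "y", "c"), ("c", "y", "d"), ("e", "x", "f")}
scores = relevant_neighborhood(g, {("a", "x", "b")})
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In the toy graph the disconnected nodes `e` and `f` never receive particles, so they fall outside the relevant neighborhood.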
Similarity: Simulation
D. Mottin et al. Exemplar queries: a New Way of Searching. Submitted for publication.
Strict equality: Edge Isomorphism
Figure: sample S and answers A1, A2
Why is Yahoo!-Tumblr not among the answers?
More freedom: Simulation
Figure: sample S and answers A1, A2
Tumblr matches both an acquisition and a website
Match edge-label sequences instead of structures
Algorithms for Simulation
• Use Strong Simulation [Ma14], with:
  • bounded matchings
  • node-topology preserving
Issue: Strong Simulation preserves node labels.
Idea: Apply the Strong Simulation algorithm on a graph where edges become nodes labeled with the original edge label.
Pruning:
• the d-neighborhood becomes a boolean vector
• a node matches a query node if the boolean AND between the two vectors is positive
Theorem
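The edge-to-node idea can be sketched in a few lines: every labeled edge becomes a node carrying the edge label, and all original nodes share one wildcard label (`NODE`, a hypothetical choice), so a node-label-preserving simulation effectively compares edge labels. For brevity the sketch computes a plain dual simulation; Strong Simulation [Ma14] additionally restricts matches to balls whose radius depends on the query diameter, which is omitted here. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, edge_label, target)
NODE = "*node*"  # hypothetical label shared by every original node


def edges_to_nodes(edges: Set[Edge]):
    """Each edge (u, label, v) becomes a node labeled `label`, linked u -> e -> v."""
    labels: Dict[Hashable, str] = {}
    succ: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, label, v in edges:
        e = ("edge", u, label, v)
        labels[u] = labels[v] = NODE
        labels[e] = label
        succ[u].add(e)
        succ[e].add(v)
    return labels, succ


def predecessors(succ):
    pred = defaultdict(set)
    for a, targets in list(succ.items()):
        for b in targets:
            pred[b].add(a)
    return pred


def dual_simulation(q_labels, q_succ, g_labels, g_succ):
    """Largest relation in which each match has the same label and matching children and parents."""
    q_pred, g_pred = predecessors(q_succ), predecessors(g_succ)
    sim = {q: {g for g, gl in g_labels.items() if gl == ql} for q, ql in q_labels.items()}
    changed = True
    while changed:
        changed = False
        for q in q_labels:
            keep = {g for g in sim[q]
                    if all(g_succ[g] & sim[c] for c in q_succ[q])
                    and all(g_pred[g] & sim[p] for p in q_pred[q])}
            if keep != sim[q]:
                sim[q], changed = keep, True
    return sim


sample = {("Google", "acquired", "YouTube"), ("Google", "based_in", "Mountain View")}
graph = sample | {("Yahoo!", "acquired", "Tumblr"), ("Yahoo!", "based_in", "Sunnyvale"),
                  ("Microsoft", "acquired", "Skype")}
sim = dual_simulation(*edges_to_nodes(sample), *edges_to_nodes(graph))
print(sorted(sim["Google"]), sorted(sim["YouTube"]))
```

Unlike isomorphism, simulation lets one graph node match several query nodes, which is how Tumblr can match both an acquisition and a website.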
Ranking results
Figure: user query (sample S) and answers A1, A2 (nodes shown: Google, Yahoo!, CBS)
Combination of two factors:
1. Structural: similarity of two nodes in terms of neighbor relationships
2. Distance-based: the PageRank scores already computed
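The ranking combination can be illustrated under explicit assumptions: structural similarity is taken to be the Jaccard similarity of the (direction, edge label) pairs around two nodes, PageRank scores are max-normalized, and the two factors are mixed with a hypothetical weight `alpha`. The talk does not state the exact formula, and the scores in the example are made up for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, edge_label, target)


def relations(graph: Set[Edge], node: Hashable) -> Set[Tuple[str, str]]:
    """(direction, label) pairs describing how `node` is connected to its neighbors."""
    out = {("out", label) for u, label, _ in graph if u == node}
    inc = {("in", label) for _, label, v in graph if v == node}
    return out | inc


def structural_similarity(graph: Set[Edge], a: Hashable, b: Hashable) -> float:
    """Jaccard similarity of the neighbor relationships of two nodes."""
    ra, rb = relations(graph, a), relations(graph, b)
    union = ra | rb
    return len(ra & rb) / len(union) if union else 0.0


def rank_answers(graph: Set[Edge], sample_node: Hashable, answers: List[Hashable],
                 ppr: Dict[Hashable, float], alpha: float = 0.5) -> List[Hashable]:
    """Sort answers by alpha * structural + (1 - alpha) * normalized PageRank score."""
    top = max(ppr.values(), default=0.0) or 1.0

    def score(n: Hashable) -> float:
        return (alpha * structural_similarity(graph, sample_node, n)
                + (1 - alpha) * ppr.get(n, 0.0) / top)

    return sorted(answers, key=score, reverse=True)


G = {("Google", "acquired", "YouTube"), ("Google", "based_in", "Mountain View"),
     ("Yahoo!", "acquired", "Tumblr"), ("Yahoo!", "based_in", "Sunnyvale"),
     ("Microsoft", "acquired", "Skype")}
ppr = {"Yahoo!": 0.2, "Microsoft": 0.3}  # hypothetical PageRank scores
print(rank_answers(G, "Google", ["Microsoft", "Yahoo!"], ppr))             # ['Yahoo!', 'Microsoft']
print(rank_answers(G, "Google", ["Microsoft", "Yahoo!"], ppr, alpha=0.0))  # ['Microsoft', 'Yahoo!']
```

With `alpha=0.0` only the distance-based factor counts, which shows how the weight shifts the order between structurally similar and merely nearby answers.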
Experimental Setup
Datasets
• Freebase: 76M nodes, 314M edges (the entire graph!)
• Freebase Internet Domain: 2M nodes, 6M edges
• Synthetic datasets
• Test set: 100 queries manually mapped from AOL query logs
• Baseline: NeMa [Khan13]: approximate answers on graphs
Measures
• Total running time of the algorithms
• User study evaluating the usefulness of our approach
Scalability results (10M nodes)
Time
• RelevantNeighborhood is stable with respect to the number of answers
• <150 ms to get the answers
Usefulness
Quality
• 92% of people say that Exemplar Queries are useful
• 62% already had the need for such a service
Comparison: which method is preferred?
• 64% Exemplar Queries
• 30% Other approaches
Simulation vs Isomorphism
Analysis
• Simulation finds more answers (up to 48% more) but aggregates results
• Isomorphism runs faster than simulation (fewer operations on simple queries)
Qualitative Evaluation
Query: Google – YouTube – Menlo Park
Approximate Graph Query Answering [Khan13]
Edge Isomorphism
Simulation
Answers are collapsed
More interesting answers
Size increment for Simulation
25% to 46% more edges than isomorphism: Answers are collapsed
Dealing with too many results
“One of the effects of living with electric information is that we live habitually in a state of information
overload. There's always more than you can cope with.”
Marshall McLuhan
Result Refinement
Information overload
I want to know about IT company acquisitions
Too many results to visualize
Dealing with Information Overload
• Faceted Search
  • present aspects of the results [Roy08]
• Query reformulation
  • Modify some of the query conditions
  • In structured databases [Mishra09]
  • In web search [Dang10]
First study of the problem on GRAPHS
Graph Search
Graph Query Reformulation
Results
Query
Reformulations: query supergraphs
…
Exponential number of reformulations
Challenges
• The number of reformulations is exponential
• Quantify the interestingness of a reformulation
• Finding query reformulations is NP-complete
A Naïve Approach: k-most frequent super-graphs
Query: 480 matches
Supergraphs: 450, 100, 30 and 420 matches
Until k reformulations are found:
- Retrieve the most frequent super-pattern
Frequent ≠ Interesting!
Our Approach
Graph Query Reformulation with Diversity
• Finds k meaningful reformulations efficiently
D. Mottin, F. Bonchi, F. Gullo. Graph Query Reformulation with Diversity, KDD 2015.
Finding meaningful Reformulations
Results
Query
Find k meaningful reformulations:
1. Span all the results (coverage)
2. Present different aspects of the results (diversity)
Diversity Matters
Results
Query
Objective function f(Q), with λ = 1:
• Non-optimal: f({Q1', Q2'}) = 7
• Optimal: f({Q3', Q4'}) = 8
Problem
Graph Query Reformulation with Diversity
Theorem (NP-hardness): The MAX-SUM Diversification problem reduces to this problem, so it is NP-hard
[KDD 2015]
Solution: Greedy Algorithm
Greedy
While k reformulations have not been found:
1. Find the reformulation leading to the maximum increment of the objective function (marginal gain)
2. Add the reformulation to the results
Theorem: The algorithm is a ½-approximation.
Finding the maximum gain is #P-complete [Valiant79].
Solution: Fast_MMPG, a branch-and-bound algorithm with quality guarantees
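The greedy loop can be sketched with a stand-in objective. The exact f(Q) of the KDD 2015 paper is not reproduced in these slides, so the sketch assumes f = (number of results covered) + λ · (sum of pairwise symmetric differences between answer sets), and it assumes the answer set of each candidate reformulation is already known. Computing those answer sets is the expensive part (the #P-complete step), and the ½-approximation guarantee refers to the paper's objective, not necessarily to this toy one. Names and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set


def objective(selected: List[str], answers: Dict[str, Set[Hashable]], lam: float) -> float:
    """Hypothetical f(Q): coverage (results spanned) + lam * pairwise diversity."""
    covered = set().union(*(answers[q] for q in selected))
    diversity = sum(len(answers[a] ^ answers[b]) for a, b in combinations(selected, 2))
    return len(covered) + lam * diversity


def greedy_reformulations(answers: Dict[str, Set[Hashable]], k: int, lam: float = 1.0) -> List[str]:
    """Repeatedly add the reformulation with the largest marginal gain until k are chosen."""
    chosen: List[str] = []
    while len(chosen) < k:
        rest = [q for q in answers if q not in chosen]
        if not rest:
            break
        base = objective(chosen, answers, lam)
        best = max(rest, key=lambda q: objective(chosen + [q], answers, lam) - base)
        chosen.append(best)
    return chosen


# Hypothetical answer sets of three candidate reformulations.
answers = {"Q1": {1, 2, 3, 4}, "Q2": {1, 2, 3}, "Q3": {5, 6}}
print(greedy_reformulations(answers, k=2))  # ['Q1', 'Q3']
```

Q2 is the second most frequent candidate, but it overlaps almost completely with Q1, so its marginal gain (1) loses to Q3 (8): the same effect that makes frequency alone a poor criterion.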
The multiplicity vector
Results
Multiplicity vector (successive frames):
0 0 0 0 0
1 1 0 0 0
2 2 1 1 0
2 2 2 2 0
2 3 3 3 1
Output set of reformulations
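The multiplicity vector simply counts, for each result, how many of the selected reformulations cover it. In the sketch below the answer sets are hypothetical, chosen only so that the successive vectors match the frames shown on the slide; the function name is also an assumption.

```python
from typing import Iterable, List, Sequence, Set


def multiplicity_frames(results: Sequence[str], picked: Iterable[Set[str]]) -> List[List[int]]:
    """Multiplicity vector after each added reformulation.

    Entry i counts how many of the reformulations chosen so far cover results[i].
    """
    mult = [0] * len(results)
    frames = [list(mult)]                     # empty output set
    for answer_set in picked:
        for i, r in enumerate(results):
            if r in answer_set:
                mult[i] += 1
        frames.append(list(mult))
    return frames


results = ["r1", "r2", "r3", "r4", "r5"]
# Hypothetical answer sets, chosen so the frames match the vectors on the slide.
picked = [{"r1", "r2"}, {"r1", "r2", "r3", "r4"}, {"r3", "r4"}, {"r2", "r3", "r4", "r5"}]
for frame in multiplicity_frames(results, picked):
    print(*frame)
```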
Upper bound on the Marginal gain
Lemma: The marginal gain increases if the multiplicity of the considered item is [...], where |Q| is the number of reformulations in the reformulated set constructed so far.
Upper bound: [...] is the value of the objective function considering only results with multiplicity [...]
Theorem
Upper bound
Results
0 0 0 0 0
1 2 1 1 1
Output set of reformulations
1 2 1 1 1
The Fast_MMPG Algorithm
Until the reformulation with both the maximum upper bound and the maximum marginal gain is found:
1. Expand the reformulation with the maximum upper bound
2. Prune the reformulations whose upper bound is smaller than the best marginal gain found so far
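The expand-and-prune loop is an instance of best-first branch and bound. The sketch below is a generic version, not the paper's Fast_MMPG: the candidate lattice is given by a hypothetical `children` function, and it assumes that `upper_bound(n)` is at least the marginal gain of `n` and of all its descendants (the role played by the multiplicity-based upper bound). The search stops as soon as the largest remaining upper bound cannot beat the best exact gain found so far.

```python
import heapq
from typing import Callable, Hashable, Iterable, Optional, Tuple


def best_first_max_gain(
    roots: Iterable[Hashable],
    children: Callable[[Hashable], Iterable[Hashable]],
    upper_bound: Callable[[Hashable], float],
    gain: Callable[[Hashable], float],
) -> Tuple[Optional[Hashable], float]:
    """Best-first branch and bound for the candidate with the largest marginal gain.

    Assumes upper_bound(n) >= gain(m) for n and for every descendant m of n.
    """
    heap: list = []
    seen: set = set()
    tie = 0

    def push(n: Hashable) -> None:
        nonlocal tie
        if n not in seen:
            seen.add(n)
            heapq.heappush(heap, (-upper_bound(n), tie, n))
            tie += 1  # tie-breaker so that nodes themselves are never compared

    for r in roots:
        push(r)
    best, best_gain = None, float("-inf")
    while heap:
        neg_ub, _, node = heapq.heappop(heap)
        if -neg_ub <= best_gain:
            break                 # no remaining candidate (or descendant) can win
        g = gain(node)            # exact, expensive marginal gain
        if g > best_gain:
            best, best_gain = node, g
        for c in children(node):  # expand the candidate with the largest bound
            if upper_bound(c) > best_gain:
                push(c)
    return best, best_gain


# Hypothetical lattice of reformulations with bounds and exact gains.
kids = {"q": ["q+a", "q+b"], "q+a": ["q+a+c"], "q+b": [], "q+a+c": []}
ub = {"q": 10, "q+a": 8, "q+b": 4, "q+a+c": 7}
gains = {"q": 0, "q+a": 5, "q+b": 3, "q+a+c": 7}
evaluated = []


def gain_fn(n):
    evaluated.append(n)
    return gains[n]


print(best_first_max_gain(["q"], kids.__getitem__, ub.__getitem__, gain_fn))  # ('q+a+c', 7)
```

The candidate `q+b` is never evaluated: its bound (4) is below the gain already found (7).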
Experimental Setup
• Datasets:
  • AIDS: 10k chemical compounds
  • Financial: 17k transaction workflows
  • Web: 13k interactions with a recommender system
• Baseline algorithms:
  • k-freq: returns the top-k frequent supergraphs of a query
  • LIndex: informative patterns index
• Experiments:
  • Time and objective function value varying k, query size, λ
  • Anecdotal
  • Scalability
Time Comparison
Number of reformulations (k)
1. k-freq runs only slightly faster
2. Time increases linearly in k
3. Fast_MMPG has real-time performance
Query size
1. Fast_MMPG is comparable to k-freq
2. Time decreases with query size (fewer reformulations)
Objective function gain
Analysis
1. λ correctly moves the objective function towards diversity
2. k-freq only captures coverage
Qualitative evaluation
Figure: a chemical-compound query and the reformulations returned by k-freq and by Fast_MMPG
Analysis
• k-freq finds reformulations of the same superquery
• Fast_MMPG returns reformulations with more diverse structures
Conclusions
Hardly Expressible Queries
• Exemplar Queries: the user query is an example of the desired results
• Efficient algorithmic solution scaling to real knowledge graphs
• Study of 2 similarity measures for query answering
Information Overload
• Study of the problem in graph databases
• Principled objective function optimizing coverage and diversity
• Algorithmic solutions with quality guarantees
Other Studied Problems
“There are no right answers to wrong questions.”
Ursula K. Le Guin
Empty-Answer Problem
COMPANYDB
Company   Based       Revenue   Mobile   Search   Hardware   Cloud
Apple     Cupertino   $62B      0        0        0          1
Google    M.View      $80B      0        1        1          0
HP        Palo Alto   $30B      0        0        1          0
Yahoo!    Sunnyvale   $16B      0        1        0          0
query = Mobile, Search, Hardware
Answer: {} (no answer)
Dealing with the Empty Answer Problem
• Ranking results based on user preferences
  • IR [Baeza11] and database solutions [Chaudhuri04]
• Query relaxation
  • Modify some of the query conditions [Mishra09]
  • (-) Suggests all the modifications together
  • (-) Does not take user feedback into account
Our Solution: Interactive Query Relaxation
• Suggests one relaxation at a time
• Takes user feedback into account
• Models user preferences
• Optimization-centric relaxation suggestions:
  • User-centric (effort, relevance)
  • System-centric (profit)
[PVLDB 2013, SIGMOD 2014 (Demo)]
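Using the COMPANYDB rows from the Empty-Answer Problem slide, the sketch below evaluates the conjunctive query, confirms the empty answer, and then proposes single-condition relaxations one at a time. It is a simplification under stated assumptions: the actual interactive framework chooses which relaxation to suggest next with a probabilistic model of user preferences, effort and profit, whereas this hypothetical `one_at_a_time_relaxations` generator simply drops the conditions in query order; a caller would pause between suggestions to collect feedback.

```python
from typing import Dict, Iterator, List, Tuple, Union

Row = Dict[str, Union[str, int]]

# COMPANYDB as shown on the Empty-Answer Problem slide.
COMPANY_DB: List[Row] = [
    {"Company": "Apple", "Based": "Cupertino", "Revenue": "$62B",
     "Mobile": 0, "Search": 0, "Hardware": 0, "Cloud": 1},
    {"Company": "Google", "Based": "M.View", "Revenue": "$80B",
     "Mobile": 0, "Search": 1, "Hardware": 1, "Cloud": 0},
    {"Company": "HP", "Based": "Palo Alto", "Revenue": "$30B",
     "Mobile": 0, "Search": 0, "Hardware": 1, "Cloud": 0},
    {"Company": "Yahoo!", "Based": "Sunnyvale", "Revenue": "$16B",
     "Mobile": 0, "Search": 1, "Hardware": 0, "Cloud": 0},
]


def evaluate(db: List[Row], conditions: List[str]) -> List[str]:
    """Conjunctive query: companies whose flag is 1 for every requested attribute."""
    return [str(row["Company"]) for row in db if all(row[c] == 1 for c in conditions)]


def one_at_a_time_relaxations(db: List[Row], conditions: List[str]
                              ) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (dropped condition, relaxed query, answers), one suggestion at a time."""
    for dropped in conditions:
        relaxed = [c for c in conditions if c != dropped]
        yield dropped, relaxed, evaluate(db, relaxed)


query = ["Mobile", "Search", "Hardware"]
print(evaluate(COMPANY_DB, query))  # []
for dropped, relaxed, answers in one_at_a_time_relaxations(COMPANY_DB, query):
    print(f"drop {dropped}: {relaxed} -> {answers}")
```

Only dropping Mobile produces an answer (Google), which illustrates why suggesting relaxations one at a time, and learning from the user's reaction, can beat presenting every modification at once.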
Conclusions
We propose
• Exemplar Query Framework on Information Graphs: user query is an example of the desired results
We study
• Exemplar Query Answering: efficient answering and ranking of exemplar queries
• Graph Query Reformulation: provides insights into the exemplar query answers
We show
• Solutions scaling to real-size information graphs
• Principled approaches with quality guarantees
• Practical applicability of the problem
Future Directions
Query reformulation in connected graphs
• Current: set of small graphs (simulated in big graphs)
Include user preferences
• In exemplar queries
• In graph query reformulation
Multiple exemplar queries
• Current: single exemplar queries
• With multiple exemplar queries, the semantics changes
Questions?
Thank you!
Publications
Hardly Expressible Queries
• D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Exemplar queries: Give me an example of what you need. PVLDB, 7(5), 2014.
• D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Searching with XQ: the eXemplar Query Search Engine. SIGMOD, 2014.
• M. Lissandrini, D. Mottin, D. Papadimitriou, T. Palpanas, Y. Velegrakis. Unleashing the power of information graphs. SIGMOD Record, 43(4), 2014.
• D. Mottin, M. Lissandrini, Y. Velegrakis, T. Palpanas. Exemplar queries: A new way of searching. (under submission)
Information Overload
• D. Mottin, F. Bonchi, F. Gullo. Graph Query Reformulation with Diversity. KDD, 2015.
Empty-Answer
• D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. A probabilistic optimization framework for the empty-answer problem. PVLDB, 6(14), 2013.
• D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. IQR: An interactive query relaxation system for the empty-answer problem. SIGMOD, 2014.
• D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, Y. Velegrakis. A holistic and principled approach for the empty-answer problem. (under submission)
Bibliography
[Mishra09] C. Mishra and N. Koudas. Interactive query refinement. In EDBT, 2009.
[Roy08] S. Basu Roy, H. Wang, G. Das, U. Nambiar, and M. Mohania. Minimum-effort driven dynamic faceted search in structured databases. In CIKM, 2008.
[Chaudhuri04] S. Chaudhuri, G. Das, V. Hristidis, and G. Weikum. Probabilistic ranking of database query results. In VLDB, 2004.
[Baeza11] R. A. Baeza-Yates and B. A. Ribeiro-Neto. Modern Information Retrieval. 2011.
[Haveliwala02] T. H. Haveliwala. Topic-sensitive pagerank. In WWW, 2002.
[Cook71] S. A. Cook. The complexity of theorem-proving procedures. In Symposium on Theory of Computing, 1971.
[Ma14] S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo. Strong simulation: Capturing topology in graph pattern matching. TODS, 2014.
[Valiant79] L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 1979.
[Dang10] V. Dang and B. W. Croft. Query reformulation using anchor text. In WSDM, 2010.
[Bordino13] I. Bordino, G. De F. Morales, I. Weber, and F. Bonchi. From machu picchu to rafting the urubamba river: anticipating information needs via the entity-query graph. In WSDM, 2013.
[Boldi11] P. Boldi, F., C. Castillo, and S. Vigna. Query reformulation mining: models, patterns, and applications. Information retrieval, 2011.
[Khan13] A. Khan, Y. Wu, C. C. Aggarwal, and X. Yan. NeMa: Fast graph search with label similarity. PVLDB, 6(3), 2013.
Research Topics
Probabilistic databases
• Consider probabilistic knowledge bases to capture noise and uncertainty
• Propose solutions that cope with many world semantics
• Propose novel similarity measures for exemplar queries
• Define reformulations in a probabilistic fashion
Exemplar Query Answering Framework
• Study the problem of identifying exemplar query needs
• Propose solutions for translating keyword queries into graph samples
• Extend the current solution to incomplete queries or multiple queries
• Include reformulation capabilities
• Study exemplar queries in other contexts (research papers, newspapers, …)
Back-up slides