Solr Graph Query: Presented by Kevin Watters, KMW Technology

OCTOBER 11-14, 2016 BOSTON, MA

Upload: lucidworks

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Solr Graph Query: Presented by Kevin Watters, KMW Technology

OCTOBER 11-14, 2016 • BOSTON, MA

Page 2: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Solr Graph Query

Kevin Watters

Founder, KMW Technology

Page 3: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Solr 6.0 Graph Query Overview

Kevin Watters
KMW Technology
[email protected]
www.kmwllc.com

October 14, 2016

Page 4: Solr Graph Query: Presented by Kevin Watters, KMW Technology

KMW Technology Overview

• Boston-based software consulting and professional services organization.
• Founded in 2010.
• Developers & consultants with deep industry experience.
• Boutique firm specializing in Open Source, Search, Big Data, Machine Learning, and AI.
• Custom Connectors, Pipelines, Classifiers, Search, UI/UX development.
• Data and Information Architecture.

Page 5: Solr Graph Query: Presented by Kevin Watters, KMW Technology

What is a Graph?

“One data model to rule them all!”

A generic representation of all linked data models.

G = <V,E> ?!?!

A graph is made up of nodes and edges…
• Nodes/Vertices (node_id) have metadata and links to other nodes.
• Edges/Links (edge_ids) are associated with a node and point to other nodes.

Nodes can be modeled as documents in the index with a multi-valued field containing the edges. For other use cases, edges can also be modeled as documents.

Page 6: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Graph Traversal

There are many graph traversal / exploration algorithms: DFS, BFS, A*, Alpha–beta, etc.

Solr Graph Query implements “BFS” (Breadth-First Search). Each hop expands the “Frontier” of the graph: it explores all current edges in a single step/query!
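The frontier idea can be sketched in a few lines of Python. This is only an illustration of breadth-first expansion with a visited set, not Solr's implementation; the tiny in-memory INDEX (node id mapped to the ids in that node's edge field) and the function name traverse are made up for the example.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical index: node id -> ids stored in that node's multi-valued edge field.
INDEX: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["A"],  # cycle back to A
    "E": [],     # not reachable from A
}


def traverse(index: Dict[str, List[str]], roots: Iterable[str], max_depth: int = -1) -> Set[str]:
    """Breadth-first traversal; max_depth=-1 means keep going until nothing new is found."""
    visited = {r for r in roots if r in index}
    frontier = set(visited)
    depth = 0
    while frontier and (max_depth < 0 or depth < max_depth):
        # One hop: gather every outgoing edge of the whole frontier at once.
        edges = {e for node in frontier for e in index[node]}
        # Nodes already visited are dropped, which is what stops cycles.
        frontier = {e for e in edges if e in index} - visited
        visited |= frontier
        depth += 1
    return visited


print(sorted(traverse(INDEX, ["A"])))               # ['A', 'B', 'C', 'D']
print(sorted(traverse(INDEX, ["A"], max_depth=1)))  # ['A', 'B', 'C']
```

The visited set plays the role that the result bit set plays in the slides: it is both the answer and the "have I been here before?" check.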

Page 7: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Graph Query Parser Syntax

Parameter        Default  Description
from                      Field containing the node id.
to                        Field containing the edge id(s).
maxDepth         -1       The number of hops to traverse from the root of the graph. -1 means traverse until all edges and documents have been collected. maxDepth=1 behaves similarly to a JOIN.
traversalFilter  null     Arbitrary query string to apply at each hop of the traversal.
returnRoot       true     true|false. Whether the documents matching the root query should be returned.
returnOnlyLeaf   false    true|false. Return only documents in the result set that do not have a value in the “to” field.
useAutn          false    true|false. Whether to use an Automaton query (true) or a TermsQuery (false) for edge traversal.

Uses Solr’s query parser plugin and “local params” syntax: {!graph from="node_id" to="edge_ids"}query
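As a rough illustration of the local params syntax, the helper below assembles a graph query string from keyword arguments. The function name graph_query is hypothetical, the parameter names are the ones used in this deck, and values containing spaces are simply wrapped in double quotes (values that themselves contain quotes are not handled).

```python
def graph_query(root_query: str, from_field: str = "node_id",
                to_field: str = "edge_ids", **options) -> str:
    """Build a {!graph ...} query string; a sketch, not a full escaper."""
    params = {"from": from_field, "to": to_field, **options}
    parts = []
    for name, value in params.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        if " " in text:
            text = '"' + text + '"'  # quote values that contain spaces
        parts.append(f"{name}={text}")
    return "{!graph " + " ".join(parts) + "}" + root_query


print(graph_query("id:A", maxDepth=2, returnRoot=False))
# {!graph from=node_id to=edge_ids maxDepth=2 returnRoot=false}id:A
```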

Page 8: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Key Features and Design Goals

“Graph is a Filter on top of your data” -someone

• Designed for large scale, a large number of edges, and very deep traversals.
• Limited memory usage for traversal.
• Cycle detection for “free” (based on current bit set!).
• Highly cacheable via the FilterCache!
• Supports multiValued fields for nodes and/or edges.
• Supports arbitrary query filters during the exploration with the “Traversal Filter”.
• Follow Every Edge! No edge left behind! Traversal is complete!
• Works with Facets, Facet Queries, and other search components seamlessly.

Page 9: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Memory Usage

• One bit set to rule them all (for the result set).
• BitSet provides cycle detection for free. (Have I been here before?)
• BitSet equal to size of index (one bit per document)!
• A 100 million doc index only uses about 12 MB RAM per query! (Same size as 1 filter cache entry!)
• Root nodes BitSet only if returnRoot = false.
• Leaf nodes same for all graph queries.
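The 12 MB figure follows from one bit per document. A quick back-of-the-envelope check (the function name is just for illustration, and it ignores any small per-object overhead):

```python
def bitset_megabytes(num_docs: int) -> float:
    """Size of a one-bit-per-document bit set, in decimal megabytes."""
    return num_docs / 8 / 1_000_000


print(bitset_megabytes(100_000_000))  # 12.5
```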

Page 10: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Performance Considerations

• Use DocValues, they’re SO MUCH FASTER!
• Don’t tokenize your node/edge ids! (unless that’s what you want)
• Performance is a function of the number of unique edges that are traversed, not the number of nodes.
• Limit depth if you know how far to go in the traversal.

Page 11: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Graph Query For Security

• Graph queries are elegant and simple to use for traversing security hierarchies such as LDAP and AD.
• Custom security models that are hierarchical or folder based in nature.
• Supports Users being members of Groups that can be members of other Groups.
• Adding or removing a user/group means updating just 1 document, not re-indexing large portions of your index!

Page 12: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Example Company with Security Model

Page 13: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Document Security Model within the Solr Index

Page 14: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Graph Traversal for User 1

Page 15: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Graph Traversal for User 2

Page 16: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Graph Based Security Query

• Single security query to traverse the graph:
{!graph from=node_id to=edge_ids returnOnlyLeaf=true}id:user_1
• Security query is applied as a filter to the query request to ensure the security filter is cached!
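A minimal sketch of sending the security query as a filter, assuming a plain HTTP request where the user's search goes in q and the graph query goes in fq. The function name secured_params and the sample search title:report are invented for the example, and the user id is assumed to need no escaping.

```python
from urllib.parse import parse_qs, urlencode


def secured_params(user_query: str, user_id: str) -> list:
    """Pair the user's search with the graph-based security filter."""
    security_filter = (
        "{!graph from=node_id to=edge_ids returnOnlyLeaf=true}id:" + user_id
    )
    # q carries the search itself; fq carries the security traversal,
    # so it can be cached independently of whatever the user searches for.
    return [("q", user_query), ("fq", security_filter)]


params = secured_params("title:report", "user_1")
query_string = urlencode(params)  # ready to append to a select URL
print(parse_qs(query_string)["fq"][0])
# {!graph from=node_id to=edge_ids returnOnlyLeaf=true}id:user_1
```

Because the filter depends only on the user id, repeated searches by the same user can reuse the same cached filter entry.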

Page 17: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Distributed & Solr Cloud

• You can distribute the user/group records to all shards in the index with smart routing!
• Distribute the documents only across the shards.
• A fixed number of permissions on each shard, plus distributed documents, keeps graph traversals local for the best performance!

Page 18: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Users, Actions and Items

• Model your browsing/purchase history as:
– Users (have an ID)
– Items (have an ID, metadata, category, etc.)
– Actions (link between user and items, such as rating, purchase, like/dislike)

Page 19: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Find similar users

• Graph traversal from a user (or set of users) through their actions to items they like, to find similar users, and out to items they like.
• Now, exclude the original starting set: “returnRoot=false”

Page 20: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Users who buy X also buy Y

[Diagram: Item 1 (root) -> Action/Buy (depth=1) -> User 1, User 2 (depth=2) -> Action/Buy (depth=3) -> Item 2, Item 3, Item 4 (depth=4)]

4 hops in the graph from an Item gets you to related items. Omit the starting point and only return records that are “items”:

{!graph from=node_id to=edge_id maxDepth=4 returnRoot=false}id:Item_1 AND type:item
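The four-hop walk can be simulated on a toy data set. Everything below is hypothetical: the document ids, the "type" field values, and the assumption that every record lists its neighbours in both directions in its edge field. It is meant only to show how depth 4 with the root removed lands on related items.

```python
# Hypothetical records: id -> type plus the ids stored in the edge field.
DOCS = {
    "item_1": {"type": "item", "edge_ids": ["buy_1", "buy_2"]},
    "buy_1": {"type": "action", "edge_ids": ["item_1", "user_1"]},
    "buy_2": {"type": "action", "edge_ids": ["item_1", "user_2"]},
    "user_1": {"type": "user", "edge_ids": ["buy_1", "buy_3"]},
    "user_2": {"type": "user", "edge_ids": ["buy_2", "buy_4", "buy_5"]},
    "buy_3": {"type": "action", "edge_ids": ["user_1", "item_2"]},
    "buy_4": {"type": "action", "edge_ids": ["user_2", "item_3"]},
    "buy_5": {"type": "action", "edge_ids": ["user_2", "item_4"]},
    "item_2": {"type": "item", "edge_ids": ["buy_3"]},
    "item_3": {"type": "item", "edge_ids": ["buy_4"]},
    "item_4": {"type": "item", "edge_ids": ["buy_5"]},
    "item_5": {"type": "item", "edge_ids": []},  # bought by nobody in this graph
}


def related_items(docs: dict, root_id: str, max_depth: int = 4) -> list:
    """Walk up to max_depth hops from root_id and keep only item records."""
    visited = {root_id}
    frontier = {root_id}
    for _ in range(max_depth):
        edges = {e for doc_id in frontier for e in docs[doc_id]["edge_ids"]}
        frontier = {e for e in edges if e in docs} - visited
        if not frontier:
            break
        visited |= frontier
    visited.discard(root_id)  # the returnRoot=false step
    return sorted(d for d in visited if docs[d]["type"] == "item")


print(related_items(DOCS, "item_1"))  # ['item_2', 'item_3', 'item_4']
```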

Page 21: Solr Graph Query: Presented by Kevin Watters, KMW Technology

WordNet as a Knowledge Graph

WordNet, maintained by Princeton University, provides a hierarchical model of the English language. Words have relationships to each other such as:
• Hypernym – a more general case of another word
• Hyponym – a more specific case of another word
• Jaguar is a type of Cat
• Cat is a type of Animal

Cat is a hypernym of Jaguar. Jaguar is a hyponym of Cat.

Index WordNet entries with fields containing the links to the hypernyms and hyponyms!

Page 22: Solr Graph Query: Presented by Kevin Watters, KMW Technology

WordNet Hypernym Traversal

+{!graph from="synset_id" to="hypernym_id" maxDepth=8}sense_lemma:jaguar

Page 23: Solr Graph Query: Presented by Kevin Watters, KMW Technology

WordNet Graph Intersections

Is a jaguar a type of animal? If a graph intersection exists, the answer is yes! Intersection of knowledge graph traversals can be used to answer questions!
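A small sketch of the intersection idea, using a made-up hypernym table rather than real WordNet data. The traversal collects everything reachable by following hypernym links upward, then intersects that set with the category being asked about.

```python
# Hypothetical hypernym links: word -> its more general terms.
HYPERNYMS = {
    "jaguar": ["cat"],
    "cat": ["animal"],
    "animal": [],
    "sedan": ["vehicle"],
    "vehicle": [],
}


def hypernym_closure(word: str, max_depth: int = 8) -> set:
    """All terms reachable from word by following hypernym links upward."""
    visited = {word}
    frontier = {word}
    for _ in range(max_depth):
        frontier = {h for w in frontier for h in HYPERNYMS.get(w, [])} - visited
        if not frontier:
            break
        visited |= frontier
    return visited


def is_a(word: str, category: str) -> bool:
    """True when the upward traversal from word intersects the category."""
    return bool(hypernym_closure(word) & {category})


print(is_a("jaguar", "animal"))  # True
print(is_a("sedan", "animal"))   # False
```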

Page 24: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Wikipedia

• Pages have links! Lots of Links…
• Pages have Infoboxes that contain great metadata.
• Infobox types like: person, scientist, writer, artist, etc.
• What if you’re looking for all Wikipedia pages about people?

Page 25: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Infobox facets

• The infobox tags are more specific than the user’s search/request.
• Searching for People should include Scientists, Authors, and Artists!
• Wikipedia doesn’t know a Scientist is a person, but WordNet does!

Page 26: Solr Graph Query: Presented by Kevin Watters, KMW Technology

WordNet knows a scientist is a person!

Page 27: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Wikipedia pages linked to Graph Theory

Information Overload! It’s difficult to see the people in this sea of information!

Page 28: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Combine WordNet and Wikipedia With Graph Queries to find people!

Using WordNet we’re able to disambiguate that the entity_types of “scientist”, “person” and “philosopher” are all types of people! Normal Faceting is not enough!

Page 29: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Nested and Filtered Graph Queries!

The Graph query can be nested. This allows you to traverse one set of fields, then change the fields you are traversing. This example first traverses all WordNet documents that are a type of person, then, based on that result set, does a 1 hop traversal to Wikipedia data on the entity_type field to restrict the results.

{!graph from="entity_type" to="sense_lemma" maxDepth=1}{!graph from="sense_lemma" to="sense_hyponym_lemma" maxDepth=2}sense_lemma:person

Intersect that with pages that are related/linked to from the Wikipedia query of node_id:"Graph theory":

{!graph from=node_id to=edge_ids maxDepth=1}node_id:"Graph theory"

Additionally use returnRoot=false if you want to omit the WordNet docs from the result set!
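One way to express the intersection, sketched here as an assumption rather than the talk's exact request, is to send each graph query as its own filter, since multiple fq parameters on a Solr request are intersected. The Wikipedia field is assumed to be named entity_type, and q=*:* is just a placeholder main query.

```python
from urllib.parse import parse_qs, urlencode

# WordNet "types of person", then one hop over to Wikipedia's entity_type field.
people_filter = (
    '{!graph from="entity_type" to="sense_lemma" maxDepth=1}'
    '{!graph from="sense_lemma" to="sense_hyponym_lemma" maxDepth=2}'
    "sense_lemma:person"
)

# Pages linked from the "Graph theory" page.
linked_filter = '{!graph from=node_id to=edge_ids maxDepth=1}node_id:"Graph theory"'

params = [("q", "*:*"), ("fq", people_filter), ("fq", linked_filter)]
query_string = urlencode(params)

# Round-trip check: both filters survive URL encoding unchanged.
print(parse_qs(query_string)["fq"] == [people_filter, linked_filter])  # True
```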

Page 30: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Gather Nodes?

• If you’re interested in doing some distributed Graph traversal in Solr there are a few options.
• You can use the Gather Nodes functionality in Streaming Aggregations. Not super fast, but it gets the job done!

Page 31: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Distributed Graph Traversal

• Do you think you need to scale up? We have an implementation based on Kafka & Solr Cloud that uses Kafka to distribute the frontier query.

Page 32: Solr Graph Query: Presented by Kevin Watters, KMW Technology

What next?

• Edge weights, Relevancy, and Scoring
– Based on tf/idf or bm25
– Based on numerical field values (min/max/sum/avg weight application)?
– Skip high frequency edges?
• Min distance computation
• Driving directions?
• Better support for visualization libraries like D3.js!
• Distributed Traversal via Kafka frontier query broker

Page 33: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Additional Detail

Related Solr Tickets
https://issues.apache.org/jira/browse/SOLR-7543
https://issues.apache.org/jira/browse/SOLR-8632
https://issues.apache.org/jira/browse/SOLR-8176

Questions?

Kevin Watters, KMW Technology
[email protected]

Page 34: Solr Graph Query: Presented by Kevin Watters, KMW Technology

Actions occur over time

• These events can’t easily be aggregated or flattened onto a record.
• Model this as a “person” record, with a set of “action” records.
• Each action record has the id of the “previous” action.
• Search for an action, graph traverse based on person id to another action, then finally to the person record.

Page 35: Solr Graph Query: Presented by Kevin Watters, KMW Technology

OpenCV, Video Recognition

• Imagine indexing each frame of video from security cameras. Pass each frame of video through OpenCV for object recognition & face recognition.
• Each frame record has its own frame number and the frame number of the previous frame.
• Search for object/face “A” detected, followed by object/face “B” detected, across all of your video streams.
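The "A followed by B" search can be pictured as walking previous-frame links backward from every frame where B was detected. The sketch below uses an invented in-memory frame table with hypothetical field names (prev, labels); it illustrates the linking idea only and says nothing about how the detections are produced.

```python
# Hypothetical frame records: frame number -> previous frame and detected labels.
FRAMES = {
    1: {"prev": None, "labels": {"A"}},
    2: {"prev": 1, "labels": set()},
    3: {"prev": 2, "labels": {"B"}},
    4: {"prev": 3, "labels": {"C"}},
}


def followed_by(frames: dict, first: str, second: str) -> list:
    """Return (first_frame, second_frame) pairs where `first` was seen earlier."""
    pairs = []
    for frame_no, frame in frames.items():
        if second not in frame["labels"]:
            continue
        # Follow the chain of previous-frame links back toward the start.
        prev = frame["prev"]
        while prev is not None:
            if first in frames[prev]["labels"]:
                pairs.append((prev, frame_no))
                break
            prev = frames[prev]["prev"]
    return pairs


print(followed_by(FRAMES, "A", "B"))  # [(1, 3)]
print(followed_by(FRAMES, "B", "A"))  # []
```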