OrientDB & Lucene: The Dynamic Duo

Enrico Risa

Uploaded by wolf4ood on 05-Dec-2014



Outline

❖ Apache Lucene in a nutshell

❖ OrientDB indexing

❖ OrientDB-Lucene
  - Full-text index
  - Spatial index

❖ Roadmap 2.0

What Is Lucene?

❖ Free-text indexing library

❖ Implements standard IR/search functionality: query models, ranking, indexing

❖ Written in Java

❖ Simple API

❖ Fast, mature and constantly evolving

❖ Many extension points

Who Uses Lucene?

❖ Twitter

❖ LinkedIn

❖ Apple

❖ Solr

❖ Elasticsearch

❖ Neo4j

❖ And now OrientDB


Base Lucene workflow

Documents

❖ Basic unit for indexing and searching

❖ Contains a list of fields

❖ Schema-less

Fields

❖ Basic component of a Document

❖ Each field has:
  - a name
  - a value
  - a store option (stored or not)
  - an analyze option (analyzed or not)

Field Types & Options

❖ Types
  - Field
  - StringField
  - TextField
  - StoredField
  - IntField
  - … more

❖ Options
  - Stored or not
  - Indexed or not
  - Analyzed or not
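To make the document/field model concrete, here is a minimal Java sketch. It does not use Lucene's API: `ToyDocument`, `ToyField` and their three boolean flags are hypothetical stand-ins that mirror the stored/indexed/analyzed options above (roughly StringField, TextField and StoredField behaviour). It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model of a Lucene-style document; NOT the Lucene API.
public class ToyDocumentDemo {

    // A field: name, value and the three options from the slide.
    record ToyField(String name, String value, boolean stored, boolean indexed, boolean analyzed) {

        // Terms this field contributes to the index.
        List<String> terms() {
            if (!indexed) {
                return List.of();                 // not indexed: not searchable at all
            }
            if (!analyzed) {
                return List.of(value);            // one exact token (StringField-like)
            }
            List<String> out = new ArrayList<>(); // lowercased and split into words (TextField-like)
            for (String t : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) out.add(t);
            }
            return out;
        }
    }

    // A document is just a list of fields; no schema forces documents to share fields.
    static final class ToyDocument {
        private final List<ToyField> fields = new ArrayList<>();

        ToyDocument add(ToyField f) { fields.add(f); return this; }

        // Only stored fields can be handed back with a search hit.
        List<String> storedValues() {
            List<String> out = new ArrayList<>();
            for (ToyField f : fields) if (f.stored()) out.add(f.name() + "=" + f.value());
            return out;
        }

        // What the index sees: field-qualified terms.
        List<String> indexedTerms() {
            List<String> out = new ArrayList<>();
            for (ToyField f : fields) for (String t : f.terms()) out.add(f.name() + ":" + t);
            return out;
        }
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument()
            .add(new ToyField("id", "city-42", true, true, false))             // StringField-like
            .add(new ToyField("name", "Rome Eternal City", false, true, true))  // TextField-like, not stored
            .add(new ToyField("notes", "internal", true, false, false));       // StoredField-like
        System.out.println(doc.indexedTerms());
        System.out.println(doc.storedValues());
    }
}
```

The unanalyzed `id` keeps `city-42` as one token, while the analyzed `name` is split into lowercase words; the `name` value itself cannot be returned because it is not stored.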

Directory

❖ RAMDirectory: RAM-based index

❖ FSDirectory: file-based index

❖ NIOFSDirectory: same as FSDirectory, but using the NIO API


Indexing Documents


Searching Index


Inverted Index
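The inverted index behind Lucene's indexing and searching workflow can be sketched in a few lines of plain Java: each term maps to the set of document ids containing it, and an AND query intersects those sets. This is a conceptual toy with a crude lowercase-and-split "analyzer"; Lucene's real index adds a term dictionary, compressed postings, positions and scoring.

```java
import java.util.*;

// Toy inverted index: term -> sorted set of ids of the documents that contain it.
public class ToyInvertedIndex {

    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Very small "analyzer": lowercase and split on non-word characters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Indexing: add docId to the postings list of every term in the text.
    public void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching (AND semantics): intersect the postings lists of all query terms.
    public SortedSet<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : analyze(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "OrientDB is a multi-model database");
        index.add(2, "Lucene is a search library");
        index.add(3, "OrientDB uses Lucene for full text search");
        System.out.println(index.searchAll("lucene search"));   // [2, 3]
        System.out.println(index.searchAll("orientdb lucene")); // [3]
    }
}
```

Because lookups go term-first, a query never scans document text; it only reads and intersects the lists for its own terms.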

Luke: a graphical user interface

❖ Open a Lucene index

❖ Browse documents

❖ Run queries

❖ …

OrientDB Indexing

❖ SBTree (unique, not unique, full text, dictionary)

❖ HashIndex (unique, not unique, full text, dictionary)

❖ MVRB-Tree (deprecated since 1.6)

❖ Lucene (OrientDB-Lucene)

❖ … https://github.com/orientechnologies/orientdb/wiki/Custom-Index-Engine

OrientDB Lucene

❖ Open source at https://github.com/orientechnologies/orientdb-lucene

❖ This project aims to bring the power of Lucene indexing into OrientDB.

❖ Supports only spatial and full-text indexes

Installing OrientDB Lucene

❖ Embedded mode

❖ Server mode: grab a jar build and copy it into $ORIENTDB_HOME/plugins

Spatial Index

❖ No native implementation

❖ Built on top of the Lucene Spatial module

❖ Currently only points are supported

❖ NEAR and WITHIN queries

Lucene Spatial

❖ Spatial4j
  - Handles shapes (point, circle, rectangle, polygon)
  - Distance and area math utilities
  - Reads WKT format

❖ Provides indexing strategies
  - RecursivePrefixTree

❖ Spatial queries using shapes
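A prefix-tree strategy such as RecursivePrefixTree indexes a shape as cells of a hierarchical grid, where each cell's id is a prefix of its children's ids. One grid Lucene Spatial can use for this is the geohash grid. The standalone Java encoder below only illustrates that prefix property for a point; it is not Lucene's code.

```java
// Standalone geohash encoder illustrating the prefix-tree idea (not Lucene's implementation).
public class GeohashSketch {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode (lat, lon) into a geohash of the given length.
    // Bits alternate longitude/latitude, starting with longitude; each bit halves the current range.
    public static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // true: this bit refines longitude
        int bitCount = 0, charIndex = 0;
        while (hash.length() < length) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { charIndex = charIndex * 2 + 1; lonMin = mid; }
                else            { charIndex = charIndex * 2;     lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { charIndex = charIndex * 2 + 1; latMin = mid; }
                else            { charIndex = charIndex * 2;     latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bitCount == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(charIndex));
                bitCount = 0;
                charIndex = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Each longer hash names a smaller cell nested inside the previous one (shared prefix).
        for (int len = 1; len <= 6; len++) {
            System.out.println(encode(41.893056, 12.482778, len));
        }
    }
}
```

A query shape can then be matched by looking up coarse cells first and descending only into cells that overlap it, which is the "recursive" part of the strategy's name.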

Creating a Spatial Index

❖ SQL

❖ Java

Spatial Operators

❖ NEAR: find all points near a given location (latitude, longitude)

❖ WITHIN: find all points within a given bounding box

Near Operator

❖ Custom operator that relies on the Lucene index

❖ Special syntax to support spatial arguments ($spatial)

❖ Context variable $distance

❖ Result set sorted from nearest to farthest
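The NEAR semantics (distance filter, a per-result distance, nearest-first order) can be sketched in plain Java. The slides do not say how the plugin computes distances, so the haversine formula on a sphere of mean radius 6371 km is an assumption here, and `Place`, `Hit` and `maxDistanceKm` are hypothetical names; `Hit.distanceKm` only loosely plays the role of `$distance`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of NEAR semantics: keep points within maxDistanceKm of a location, nearest first.
// Haversine on a spherical Earth is an assumption, not necessarily what OrientDB-Lucene uses.
public class NearSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    record Place(String name, double lat, double lon) {}

    // A result paired with its computed distance (similar in spirit to $distance).
    record Hit(Place place, double distanceKm) {}

    // Great-circle distance between two (lat, lon) points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    static List<Hit> near(List<Place> places, double lat, double lon, double maxDistanceKm) {
        List<Hit> hits = new ArrayList<>();
        for (Place p : places) {
            double d = haversineKm(lat, lon, p.lat(), p.lon());
            if (d <= maxDistanceKm) hits.add(new Hit(p, d));
        }
        hits.sort(Comparator.comparingDouble(Hit::distanceKm)); // nearest to farthest
        return hits;
    }

    public static void main(String[] args) {
        List<Place> places = List.of(
            new Place("Vatican", 41.9029, 12.4534),
            new Place("Milan Cathedral", 45.4641, 9.1919),
            new Place("Colosseum", 41.8902, 12.4922));
        for (Hit h : near(places, 41.893056, 12.482778, 10.0)) {
            System.out.printf("%s %.2f km%n", h.place().name(), h.distanceKm());
        }
    }
}
```

A real index avoids computing the distance to every point: it first narrows candidates through the spatial index and only then ranks them.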

Within Operator

❖ Bounding-box search

❖ Currently only points within a box

❖ Result set not sorted
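For points, WITHIN reduces to a range check on latitude and longitude. The Java sketch below is a hypothetical illustration of that semantics (the `Point` record and corner-pair arguments are assumptions, and boxes crossing the 180° meridian are not handled); like the operator, it returns results unsorted, in input order.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of WITHIN semantics for points: keep points inside an axis-aligned lat/lon box.
public class WithinSketch {

    record Point(String name, double lat, double lon) {}

    // Corners may be given in any order; boxes crossing the 180° meridian are not handled.
    static boolean inBox(Point p, double lat1, double lon1, double lat2, double lon2) {
        double minLat = Math.min(lat1, lat2), maxLat = Math.max(lat1, lat2);
        double minLon = Math.min(lon1, lon2), maxLon = Math.max(lon1, lon2);
        return p.lat() >= minLat && p.lat() <= maxLat
            && p.lon() >= minLon && p.lon() <= maxLon;
    }

    // No distance and no ordering: matches come back in input order.
    static List<Point> within(List<Point> points, double lat1, double lon1, double lat2, double lon2) {
        List<Point> out = new ArrayList<>();
        for (Point p : points) {
            if (inBox(p, lat1, lon1, lat2, lon2)) out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Point> points = List.of(
            new Point("Milan", 45.4641, 9.1919),
            new Point("Paris", 48.8566, 2.3522),
            new Point("Rome", 41.9028, 12.4964));
        // Rough box around Italy: corners (36.0, 6.0) and (47.5, 19.0).
        for (Point p : within(points, 36.0, 6.0, 47.5, 19.0)) {
            System.out.println(p.name());
        }
    }
}
```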

Full Text Index

❖ Native full-text implementation

❖ Supports multiple fields

❖ Supports Lucene query syntax

❖ Lucene analyzers
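The analyzer decides which terms end up in the index, and therefore what a query can match. The Java sketch below uses three hypothetical analyzers loosely modeled on Lucene's KeywordAnalyzer, WhitespaceAnalyzer and StandardAnalyzer; the regex tokenizer and the short stop-word list are simplifications, not Lucene's actual rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical analyzers showing how the choice of analyzer changes the indexed terms.
public class AnalyzerSketch {

    interface ToyAnalyzer {
        List<String> analyze(String text);
    }

    // Whole input becomes a single token.
    static final ToyAnalyzer KEYWORD = text -> List.of(text);

    // Split on whitespace only; case and punctuation are kept.
    static final ToyAnalyzer WHITESPACE = text ->
        Arrays.stream(text.trim().split("\\s+")).filter(t -> !t.isEmpty()).toList();

    // Small illustrative stop-word list (not Lucene's list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "in");

    // Split on non-alphanumerics, lowercase, then drop stop words.
    static final ToyAnalyzer STANDARD_LIKE = text -> {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    };

    public static void main(String[] args) {
        String text = "The Colosseum, in Rome";
        System.out.println(KEYWORD.analyze(text));
        System.out.println(WHITESPACE.analyze(text));
        System.out.println(STANDARD_LIKE.analyze(text));
    }
}
```

With the keyword-style analyzer only the exact full string matches; with the whitespace one, `Colosseum,` keeps its comma; the standard-like one lets a lowercase query for `rome` match.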

Creating a Full Text Index

❖ SQL

❖ Java

Full Text Operators

❖ LUCENE
  [<fields>] LUCENE <exp>
  - Query your index using Query Parser syntax
  - Supports multiple fields
  - Target all fields (MultiFieldQueryParser)
  - Target a specific field (QueryParser)

Lucene Operator

❖ MultiFieldQueryParser: targets all fields

❖ QueryParser: targets a specific field
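The difference between searching every field and searching one field can be illustrated with a toy matcher. This Java sketch is hypothetical and far simpler than Lucene's parsers: it accepts only a single `term` (searched in all fields, as with MultiFieldQueryParser over the indexed fields) or `field:term` (searched in that field only, in the spirit of QueryParser's field syntax).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: "term" searches every field, "field:term" searches one field.
public class FieldTargetingSketch {

    // True if the lowercased, word-split field value contains the (lowercased) term.
    static boolean fieldContains(String value, String term) {
        for (String t : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (t.equals(term)) return true;
        }
        return false;
    }

    // Returns the ids of matching documents, one hit per document.
    static List<String> search(Map<String, Map<String, String>> docs, String query) {
        int colon = query.indexOf(':');
        String field = colon >= 0 ? query.substring(0, colon) : null; // null means "all fields"
        String term = (colon >= 0 ? query.substring(colon + 1) : query).toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> doc : docs.entrySet()) {
            for (Map.Entry<String, String> f : doc.getValue().entrySet()) {
                boolean fieldSelected = field == null || f.getKey().equals(field);
                if (fieldSelected && fieldContains(f.getValue(), term)) {
                    hits.add(doc.getKey());
                    break; // stop at the first matching field of this document
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> docs = new TreeMap<>(); // sorted ids for stable output
        docs.put("c1", Map.of("name", "Rome", "description", "Capital of Italy"));
        docs.put("c2", Map.of("name", "Milan", "description", "Three hours by train from Rome"));
        System.out.println(search(docs, "rome"));      // [c1, c2]: every field is searched
        System.out.println(search(docs, "name:rome")); // [c1]: only the name field is searched
    }
}
```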

Indexing Performance

❖ Full text: 9M records in ~300 s with StandardAnalyzer and one field

❖ Spatial: 9M records in ~500 s with two fields (point)
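Expressed as throughput (plain division of the figures above; the hardware and settings behind them are not given on the slide):

```latex
\text{full text: } \frac{9\,000\,000\ \text{records}}{300\ \text{s}} = 30\,000\ \text{records/s}
\qquad
\text{spatial: } \frac{9\,000\,000\ \text{records}}{500\ \text{s}} = 18\,000\ \text{records/s}
```

Since both timings are approximate, the rates are too; spatial indexing took roughly 500/300 ≈ 1.7 times as long per record.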

Roadmap 2.0

❖ Production ready

❖ Monitoring of Lucene indexes

❖ More configuration

❖ GUI tool integrated in Studio

Roadmap 2.0 (Spatial Index)

❖ Index more shapes

❖ More operators (e.g. Intersect)

❖ Not only bounding boxes

❖ Support for GeoJSON http://geojson.org

Roadmap 2.0 (Full Text)

❖ Document & field boosting

❖ Score in result set

❖ Custom analyzers & filters

❖ Search engine

Thank You! Questions?

❖ Contact me
  - Enrico Risa, [email protected]
  - Twitter: https://twitter.com/wolf4ood