apache solr/lucene internals by anatoliy sokolenko

Post on 09-May-2015


DESCRIPTION

Anatoliy Sokolenko, Software Engineer at Grid Dynamics

TRANSCRIPT

Apache Lucene/Solr

Internals

About me

Java and all around

Principal Software Engineer at Grid Dynamics

Kharkiv

asokolenko@griddynamics.com

4 nodes ✕ 12GB disk space

June 2013 database 14.630.209 records

Indexing took 5 hours in 100 threads

1000 batch

Lucene.net

VM 16 CPU cores

16 GB memory

lightweight, performant search library

Data Model

• document oriented

• flat

• store

• index

Document (boost = 1.1, docID = 23)

score:1

tag:java

type:answer

Showcase

Basic Flow

[Diagram: Index Writer, Analyzer, Lucene Index, Index Reader, Index Searcher. A Document (boost = 1.1; score:1, tag:java, type:answer) enters through the Index Writer via addDocument. A query (tag:java) goes to the Index Searcher, and search returns the matching Document.]
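The flow above can be sketched with a toy in-memory index. This is only an illustration in Python, not Lucene's API: `ToyIndex`, `add_document` and `search` are hypothetical names, and analysis is reduced to lowercasing the field value.

```python
from collections import defaultdict


class ToyIndex:
    """In-memory stand-in for a Lucene index: term -> list of docIDs."""

    def __init__(self):
        self.postings = defaultdict(list)  # "field:value" -> [docID, ...]
        self.stored = []                   # docID -> stored fields

    def add_document(self, fields):
        """Index writer role: assign the next docID and index every field."""
        doc_id = len(self.stored)
        self.stored.append(dict(fields))
        for name, value in fields.items():
            # Assumption: "analysis" is just lowercasing in this sketch.
            term = f"{name}:{str(value).lower()}"
            self.postings[term].append(doc_id)
        return doc_id

    def search(self, term):
        """Index searcher role: look the term up, return stored documents."""
        return [self.stored[doc_id] for doc_id in self.postings.get(term, [])]


index = ToyIndex()
index.add_document({"score": 1, "tag": "java", "type": "answer"})
index.add_document({"score": 5, "tag": "css", "type": "question"})
hits = index.search("tag:java")
print(hits)
```

Because docIDs are handed out in increasing order, every posting list stays sorted without extra work.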

Lucene Index Structure

[Diagram: an Index is made up of segments: Segment A, Segment B, Segment C, Segment D.]

[Diagram: inside a segment. Term Infos lists the terms in order (score:0, score:1, score:5, ..., tag:java, tag:mysql, tag:css, ..., type:answer, type:question), each with the number of documents that contain it. Term Frequencies stores, for each term, the matching docIDs as a first ID followed by gaps (for example tag:java 5 +2, tag:mysql 6 +52 +1, tag:css 1 +30 +27 +2) together with a frequency per document, mostly 1. Term Info Index holds a sample of the terms (score:0, ..., tag:mysql, ...) used to locate entries in Term Infos.]
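The "+n" notation in the posting lists is gap (delta) encoding: only the first docID is stored as-is, and every later entry is the difference from the previous one. A minimal Python sketch, with `encode_gaps` and `decode_gaps` as hypothetical helper names:

```python
from itertools import accumulate


def encode_gaps(doc_ids):
    """Turn sorted docIDs into the first ID followed by gaps."""
    if not doc_ids:
        return []
    gaps = [later - earlier for earlier, later in zip(doc_ids, doc_ids[1:])]
    return [doc_ids[0]] + gaps


def decode_gaps(gaps):
    """Rebuild absolute docIDs with a running sum over the gaps."""
    return list(accumulate(gaps))


print(decode_gaps([6, 52, 1]))      # tag:mysql
print(decode_gaps([1, 30, 27, 2]))  # tag:css
```

Small gaps need fewer bytes than full docIDs once they are written with a variable-length integer encoding, which is the point of storing them this way.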

Showcase

scalable enterprise search server

RequestHandlers

Data Import Handler

SolrCloud

solrconfig.xml

schema.xml

SolrCloud

[Diagram: Join Cluster, with Shard 1, Shard 2 and Shard 3.]

[Diagram: Indexing across Shard 1, Shard 2 and Shard 3.]

[Diagram: Query (tag:java) across Shard 1, Shard 2 and Shard 3.]

[Diagram: Failure among Shard 1, Shard 2 and Shard 3.]
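The indexing and query diagrams can be approximated with a toy three-shard cluster: each document is routed to exactly one shard, and a query is sent to every shard and the partial results are merged. This Python sketch is an assumption-laden illustration only. `ToyCluster` is a hypothetical class, and the modulo over `zlib.crc32` is a stand-in: SolrCloud's own routing assigns hash ranges to shards.

```python
import zlib


class ToyCluster:
    """Hypothetical three-shard cluster: route on write, fan out on read."""

    def __init__(self, shard_count=3):
        self.shards = [dict() for _ in range(shard_count)]  # doc id -> fields

    def shard_for(self, doc_id):
        # Stand-in routing rule: hash of the document id modulo shard count.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, fields):
        self.shards[self.shard_for(doc_id)][doc_id] = fields

    def query(self, field, value):
        hits = []
        for shard in self.shards:  # scatter: ask every shard
            hits.extend(doc_id for doc_id, fields in shard.items()
                        if fields.get(field) == value)
        return sorted(hits)        # gather: merge the partial results


cluster = ToyCluster()
cluster.index("q1", {"tag": "java", "type": "question"})
cluster.index("a1", {"tag": "java", "type": "answer"})
cluster.index("q2", {"tag": "css", "type": "question"})
print(cluster.query("tag", "java"))
```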

CAP Model

[Diagram: CAP triangle (C, A, P) with SolrCloud and Solr placed on it.]

Showcase

Faceted Navigation

Showcase

Algorithm

Query Result: 7 31 58 59

Index term   Postings (gaps)   Doc IDs       Facet
tag:java     5 +2              5 7           1
tag:mysql    6 +52 +1          6 58 59       2
tag:css      1 +30 +27 +2      1 31 58 60    2
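The facet numbers above are the sizes of the overlap between each term's doc IDs and the query result. A minimal Python sketch using the slide's own numbers; `facet_counts` is a hypothetical helper, and it checks set membership rather than walking the lists the way a real engine would.

```python
from itertools import accumulate

# Gap-encoded posting lists and the query result, as shown on the slide.
postings = {
    "tag:java": [5, 2],
    "tag:mysql": [6, 52, 1],
    "tag:css": [1, 30, 27, 2],
}
query_result = {7, 31, 58, 59}


def facet_counts(gap_postings, result_ids):
    """Count, per term, how many of its documents are in the query result."""
    counts = {}
    for term, gaps in gap_postings.items():
        doc_ids = accumulate(gaps)  # decode gaps back into absolute docIDs
        counts[term] = sum(1 for doc_id in doc_ids if doc_id in result_ids)
    return counts


counts = facet_counts(postings, query_result)
print(counts)
```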

Showcase

Text Analysis

Analyzer: Char filter, then Tokenizer, then Filter

Stage         Index time                                         Query time
Input         <strong>There are no pointers in Java!</strong>    pointers in Java
Char filter   There are no pointers in Java!                     pointers in Java
Tokenizer     There | are | no | pointers | in | Java            pointers | in | Java
Filter        ? | ? | ? | pointer | ? | java                     pointer | ? | java

(? marks the position of a token removed by the filter.)
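The three analyzer stages can be imitated in a few lines of Python. This is a sketch under stated assumptions rather than Lucene's analysis chain: the stop-word set, the "drop a trailing s" stemming rule and all function names are invented for the example, and `None` plays the role of the ? placeholder.

```python
import re

# Assumption: a tiny stop-word set chosen to match the slide's example.
STOP_WORDS = {"there", "are", "no", "in"}


def char_filter(text):
    """Strip markup before tokenizing."""
    return re.sub(r"<[^>]+>", "", text)


def tokenize(text):
    """Split into runs of letters and digits, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filter(tokens):
    """Lowercase, blank out stop words, and apply a naive plural stemmer."""
    filtered = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            filtered.append(None)  # keep the position, drop the token
            continue
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]     # crude stemming: pointers -> pointer
        filtered.append(token)
    return filtered


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


print(analyze("<strong>There are no pointers in Java!</strong>"))
print(analyze("pointers in Java"))
```

Running the same chain at index time and at query time is what lets the query "pointers in Java" match the indexed terms pointer and java.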

Showcase

Spell Suggestions

Levenshtein Distance

html / htmm: Levenshtein distance = 1

hlmz / html: Levenshtein distance = 2

tag:php

tag:jquery

tag:json

tag:java

tag:c#

tag:apache

tag:osx

tag:html

Levenshtein Automaton

[Diagram: automaton for html with Levenshtein distance = 1; states are connected by transitions labelled h, t, m and l.]
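For comparison with the automaton, the distances quoted above can be checked with the textbook dynamic-programming algorithm, then used to scan the tag list for suggestions. This Python sketch is a brute-force illustration, not the automaton technique: it compares the input against every term, which is exactly the work an automaton avoids. `levenshtein` and `suggest` are hypothetical names, and the tag values are taken from the slide without the "tag:" prefix.

```python
def levenshtein(a, b):
    """Edit distance with insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


TAGS = ["php", "jquery", "json", "java", "c#", "apache", "osx", "html"]


def suggest(word, terms, max_distance):
    """Return every term within max_distance edits of word."""
    return [term for term in terms if levenshtein(word, term) <= max_distance]


print(levenshtein("htmm", "html"))
print(levenshtein("hlmz", "html"))
print(suggest("htmm", TAGS, 1))
```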

Showcase

Solr is...

• enterprise level search engine

• vertically scalable

• horizontally scalable, but...

• tunable

• poorly documented

• with active community

The End
