Introduction to Lucene and Solr - 1

Day 1 - Introduction to Lucene/Solr Core Tech @Trend Micro 吳奕慶 YI-CHING WU 1

Upload: yi-ching-wu

Post on 14-Jul-2015

Category:

Software


TRANSCRIPT

Page 1: Introduction to Lucene and Solr - 1

Day 1 - Introduction to Lucene/Solr

Core Tech @Trend Micro

吳奕慶 YI-CHING WU

1

Page 2: Introduction to Lucene and Solr - 1

Agenda

What is a search engine?

Introduction to Lucene and Solr

Advantages of Solr

Solr Architecture

Query Syntax

Set up Solr configuration files

Working with Solr: feed data, query data

2

Page 4: Introduction to Lucene and Solr - 1

Why do I need a search engine?

4

Page 5: Introduction to Lucene and Solr - 1

Why do I need a search engine?

5

Page 6: Introduction to Lucene and Solr - 1

Let’s start with Indexing

Raw information is like garbage

No structure

Comes in all kinds of shapes, sizes, and formats

6

Page 7: Introduction to Lucene and Solr - 1

Let’s start with Indexing

This is what an index does

Makes data available in a structured format, easily accessible through search

7

Page 8: Introduction to Lucene and Solr - 1

Which ones can be indexed and searched?

Various file formats

HTML

Text Files

Word

PDF

PPT

8

Page 9: Introduction to Lucene and Solr - 1

9

Page 10: Introduction to Lucene and Solr - 1

10

Page 11: Introduction to Lucene and Solr - 1

And now the search component

11

Page 12: Introduction to Lucene and Solr - 1

12

Page 13: Introduction to Lucene and Solr - 1

What is a search engine?

Indexing Component

Search Component

Index Files

Data: is indexed

Users: send search query, receive search results

13

Page 14: Introduction to Lucene and Solr - 1

Introducing Lucene

Created by Doug Cutting

Not an application but a full-text search library (written in Java)

Open source project (since March 2000)

Mature

Easy to learn API

Stores its index as files on disk

No web crawler

http://lucene.sourceforge.net/talks/pisa/

14

Page 15: Introduction to Lucene and Solr - 1

Typical search application

15

Page 16: Introduction to Lucene and Solr - 1

Search?

If you want to find a word in a book : how do you do it?

Naïve approach : linear-search

O(n) : slow

Inverted index

16
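
The contrast between linear search and an inverted index can be sketched in a few lines of Python. This is a minimal illustration, not Lucene code: the three documents, their IDs, and the function names are made up for the example, and terms are split on whitespace only.

```python
from collections import defaultdict

# Hypothetical mini-corpus; document IDs and texts are made up for illustration.
DOCS = {
    1: "lucene is a full text search library",
    2: "solr is an http application built around lucene",
    3: "a search engine has an indexing component and a search component",
}


def linear_search(word, docs):
    """Naive approach: scan every document for the word, O(n) in corpus size."""
    return sorted(doc_id for doc_id, text in docs.items() if word in text.split())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def index_search(word, index):
    """Look the term up directly instead of scanning all documents."""
    return sorted(index.get(word, set()))


INDEX = build_inverted_index(DOCS)
print(linear_search("lucene", DOCS))   # [1, 2]
print(index_search("lucene", INDEX))   # [1, 2]
print(index_search("search", INDEX))   # [1, 3]
```

The index is built once, at indexing time; each query then costs a single dictionary lookup, however many documents there are.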

Page 17: Introduction to Lucene and Solr - 1

Inverted index

17

Page 18: Introduction to Lucene and Solr - 1

Indexing with Lucene

18

Page 19: Introduction to Lucene and Solr - 1

Fields of Lucene

Indexed

Put the content in the inverted index

Analyzed

Split the content into normalized terms to be added to the inverted index

Stored

Keep the original content on disk

Multivalued

Repeat the same field multiple times in the same document with different values

OmitNorms

Skip storing norms, which hold length normalization and the index-time field boost

TermVector

WITH_POSITIONS_OFFSETS (store term vectors with token positions and offsets)

19
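
As a rough illustration of the indexed, analyzed, stored, and multivalued options, here is a toy index in Python. It is a sketch under stated assumptions, not the Lucene API: the `ToyIndex` class, the `analyze` function, and the stop-word list are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed stop-word list; real analyzers ship their own lists.
STOP_WORDS = {"a", "an", "the", "is", "of"}


def analyze(text):
    """Toy analyzer: tokenize on letters/digits, lowercase, drop stop words."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]


class ToyIndex:
    """Illustrative only; not the Lucene API."""

    def __init__(self):
        self.postings = defaultdict(set)   # (field, term) -> doc IDs
        self.stored = defaultdict(dict)    # doc ID -> {field: original values}

    def add_field(self, doc_id, name, value, indexed=True, analyzed=True, stored=False):
        if indexed:
            # Analyzed fields are split into terms; others are indexed as one exact term.
            terms = analyze(value) if analyzed else [value]
            for term in terms:
                self.postings[(name, term)].add(doc_id)
        if stored:
            # Appending allows a multivalued field: same name, several values.
            self.stored[doc_id].setdefault(name, []).append(value)

    def search(self, name, term):
        return sorted(self.postings.get((name, term), set()))


index = ToyIndex()
index.add_field(1, "title", "The Basics of Lucene", stored=True)
index.add_field(1, "tag", "search", analyzed=False, stored=True)
index.add_field(1, "tag", "Java", analyzed=False, stored=True)

print(analyze("The Basics of Lucene"))   # ['basics', 'lucene']
print(index.search("title", "lucene"))   # [1]
print(index.search("tag", "java"))       # [] (not analyzed, so case matters)
print(index.stored[1]["tag"])            # ['search', 'Java']
```

The same `analyze` function would also have to be applied to query words; otherwise a query for "Lucene" would miss the lowercased term in the index.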

Page 20: Introduction to Lucene and Solr - 1

Analyzer

PerFieldAnalyzerWrapper

20

Page 21: Introduction to Lucene and Solr - 1

Analyzer

21

Page 22: Introduction to Lucene and Solr - 1

Analyzer

22

Page 23: Introduction to Lucene and Solr - 1

Custom Analyzers

23

Page 24: Introduction to Lucene and Solr - 1

Query with Lucene

Ask Lucene “Which documents contain these words?”

Lucene applies an Analyzer to each word in the query.

Queries can be built programmatically or written in a powerful query syntax.

24

Page 25: Introduction to Lucene and Solr - 1

Query Code

25

Query syntax:

http://www.lucenetutorial.com/lucene-query-syntax.html

http://lucene.apache.org/core/3_5_0/queryparsersyntax.html

Page 26: Introduction to Lucene and Solr - 1

Luke for Lucene Index

26

Page 27: Introduction to Lucene and Solr - 1

Relevancy scoring

N-dimensional vectors for documents and queries

Score represents how close the vectors are

TF-IDF (term frequency-inverse document frequency)

Documents with many of the search terms are scored higher

Shorter documents are scored higher

27
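
The two ranking effects listed above can be reproduced with a small TF-IDF sketch. It is loosely modeled on the classic TF-IDF similarity in Lucene (square-root term frequency, logarithmic inverse document frequency, a length norm), but it leaves out the coordination factor, query normalization, and boosts, so treat it as an illustration rather than the actual scoring code. The corpus and names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical corpus, already analyzed into lowercase terms.
DOCS = {
    "short": "solr search".split(),
    "long": "solr search server with many other unrelated words in it".split(),
    "partial": "solr admin console".split(),
}


def idf(term, docs):
    """Rarer terms weigh more: 1 + ln(N / (docFreq + 1))."""
    doc_freq = sum(1 for terms in docs.values() if term in terms)
    return 1.0 + math.log(len(docs) / (doc_freq + 1))


def score(query_terms, doc_terms, docs):
    """Sum of tf * idf^2 over matching query terms, scaled by a length norm."""
    counts = Counter(doc_terms)
    length_norm = 1.0 / math.sqrt(len(doc_terms))   # shorter documents score higher
    total = 0.0
    for term in query_terms:
        tf = math.sqrt(counts[term])                 # 0 when the term is absent
        total += tf * idf(term, docs) ** 2 * length_norm
    return total


query = ["solr", "search"]
ranking = sorted(DOCS, key=lambda name: score(query, DOCS[name], DOCS), reverse=True)
print(ranking)   # ['short', 'long', 'partial']
```

Both "short" and "long" contain every query term, but the shorter document wins on the length norm; "partial" matches only one term and ranks last.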

Page 28: Introduction to Lucene and Solr - 1

Default Similarity Scoring Algorithm

28

Page 29: Introduction to Lucene and Solr - 1

Introducing Solr

Created by Yonik Seeley (since 2004)

Open source (released in 2006)

HTTP application built around Lucene

Makes it easy to develop search solutions

Most programming tasks in Lucene are configuration tasks in Solr

Advanced features developed on top of Lucene

Data importer, faceting, filters, similarity, replication and distributed search support, dynamic fields, etc.

Since 2010, Lucene and Solr have shared a merged development codebase

29

Page 30: Introduction to Lucene and Solr - 1

Solr Architecture

30

Page 31: Introduction to Lucene and Solr - 1

Solr Archived Folders and Files

31

Page 32: Introduction to Lucene and Solr - 1

Understanding Solr Home

32

Page 33: Introduction to Lucene and Solr - 1

Solr Features

Dismax

Edismax

Text Highlight

Spell Checking

More Like This

Cache

Replication

Database connector

Spatial (Geo-location)

33

Page 34: Introduction to Lucene and Solr - 1

Solr Administration Console

34

Page 35: Introduction to Lucene and Solr - 1

Solr.xml

35

Page 36: Introduction to Lucene and Solr - 1

Diagram of the main components of Solr 4.x

36

Page 37: Introduction to Lucene and Solr - 1

Solr Schema

Solr lets you administer one or more Lucene indexes

Each index has its own schema

Lists all fields allowed for an index

Defines the analyzers for each field

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

37
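
To make the role of the schema concrete, the sketch below parses a small schema.xml fragment with the Python standard library and lists the fields and the analysis chain per field type. The fragment is hypothetical: the field names are invented, and the layout (a `<fields>` and a `<types>` section) follows the Solr 4.x style assumed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment; field and type names are made up.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="tags" type="string" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <types>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text_general" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
"""

root = ET.fromstring(SCHEMA_XML)

# List every field the index accepts, with its type.
fields = {f.get("name"): f.get("type") for f in root.iter("field")}

# For each field type, collect the analysis chain (tokenizer, then filters).
analyzers = {}
for field_type in root.iter("fieldType"):
    chain = [el.get("class") for el in field_type.iter() if el.tag in ("tokenizer", "filter")]
    analyzers[field_type.get("name")] = chain

print(fields)
print(analyzers["text_general"])
```

Here the `title` field uses an analyzed text type, while `id` and `tags` use a plain string type with no analysis chain, which mirrors the analyzed versus not-analyzed distinction in Lucene.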

Page 38: Introduction to Lucene and Solr - 1

Three main steps to index a document

38

Page 39: Introduction to Lucene and Solr - 1

Solr Schema

-Conf\schema.xml

39

Page 40: Introduction to Lucene and Solr - 1

Solr Schema

-Conf\schema.xml

40

Page 41: Introduction to Lucene and Solr - 1

Solr - solrconfig.xml

41

Page 42: Introduction to Lucene and Solr - 1

Solr Request Handler

42

Page 43: Introduction to Lucene and Solr - 1

How do request handlers process queries?

43

Page 44: Introduction to Lucene and Solr - 1

Solr Indexing

HTTP POST

XML by default, but also JSON and CSV

Multithreaded

44
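
A minimal sketch of feeding a document over HTTP POST, using only the Python standard library. The `<add><doc><field name="...">` message is the Solr XML update format; the host, port, core name (`collection1`), and the document itself are assumptions for illustration. The request is only prepared, not sent, so the code runs without a Solr server.

```python
import xml.etree.ElementTree as ET
import urllib.request

# Assumed endpoint: host, port, and core name depend on the installation.
UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def build_add_xml(docs):
    """Build a Solr XML update message: <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_request(xml_body):
    """Prepare (but do not send) the HTTP POST carrying the XML body."""
    return urllib.request.Request(
        UPDATE_URL,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


body = build_add_xml([{"id": "1", "title": "Introduction to Solr"}])
request = build_request(body)
print(body)
# To send it to a running Solr instance: urllib.request.urlopen(request)
```

The field names in the document must match fields declared in the schema of the target index.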

Page 45: Introduction to Lucene and Solr - 1

Solr Query

HTTP GET or HTTP POST

Query Parameters

Response in XML by default, but other formats are supported (JSON, PHP, Ruby, etc.)

45
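
A query is just a URL with parameters, which the following sketch builds with the Python standard library. `q`, `fl`, `start`, `rows`, and `wt` are common Solr query parameters; the host, port, core name, and the `build_query_url` helper are assumptions for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port, and core name depend on the installation.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_query_url(query, fields=None, rows=10, start=0, fmt="json"):
    """Build a select URL: q is the query, fl the returned fields, wt the response format."""
    params = {"q": query, "start": start, "rows": rows, "wt": fmt}
    if fields:
        params["fl"] = ",".join(fields)
    return SELECT_URL + "?" + urlencode(params)


url = build_query_url("title:solr", fields=["id", "title"], rows=5)
print(url)
```

Opening the resulting URL in a browser or with an HTTP client against a running Solr instance would return the first five matches as JSON; dropping `wt` gives the default response format.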

Page 46: Introduction to Lucene and Solr - 1

Solr Query using Administration Console

46

Page 47: Introduction to Lucene and Solr - 1

Solr Query Parameters

47

Page 48: Introduction to Lucene and Solr - 1

Solr Response in XML

48

Page 49: Introduction to Lucene and Solr - 1

Solr simple example

49

Page 50: Introduction to Lucene and Solr - 1

Q&A

50

Page 51: Introduction to Lucene and Solr - 1

Solr Demo

Using the Trend Micro Support knowledge base

Indexed using Solr DataImporter

51

Page 52: Introduction to Lucene and Solr - 1

Thank You!

52