Searching does not mean finding Stuff - Apache Solr for TYPO3

http://www.dkd.de Friday, 10 June 2011

Post on 22-Oct-2014



dkd development kommunikation design


Welcome

Olivier Dobberkau

CEO

dkd Internet Service GmbH

Frankfurt am Main, Germany


Agenda

What is search?

Search in TYPO3

Search expectations today

Apache Solr

Why and how?

Watch out!

About me

Olivier Dobberkau

Founder of dkd Internet Service GmbH

aka „the reverend never-end“

Met TYPO3 with Version 3.2 beta 3

Member of T3A BCC

43 years old

[email protected]

Twitter: @T3RevNeverEnd

What is Search?

Definition of Information Retrieval

Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web.

Wikipedia: http://en.wikipedia.org/wiki/Information_retrieval


Factors in Information Retrieval

Recall

Precision

Fall-out

Scalability

Performance

Simplicity

Flexibility


Recall

Percentage of relevant documents that are returned

400 relevant documents in the index

100 of them returned

25% recall


Precision

Percentage of returned documents that are relevant

500 returned, 100 relevant

20% precision


Best would be:

100% Recall with 100% Precision
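
The two percentages from the Recall and Precision slides can be checked with a short calculation. This is only a sketch: it assumes the recall slide means that 400 relevant documents exist and 100 of them are returned, which the slide does not spell out. The function names are made up for illustration.

```python
def recall(relevant_returned, relevant_total):
    """Share of all relevant documents that the search actually returned."""
    return relevant_returned / relevant_total


def precision(relevant_returned, returned_total):
    """Share of the returned documents that are actually relevant."""
    return relevant_returned / returned_total


# Recall slide (assumed reading): 400 relevant documents exist, 100 are returned.
print(f"recall: {recall(100, 400):.0%}")
# Precision slide: 500 documents returned, 100 of them relevant.
print(f"precision: {precision(100, 500):.0%}")
```

Returning every document in the index pushes recall to 100% while precision collapses, which is why the two numbers are usually read together.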


Index

The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query.


Index

Document 1: My cat is cool

Document 2: Baseball is my Sport

Document 3: San Francisco Fort Mason

Document 4: TYPO3 T3CON rocks

Document 5: Extbase is a Ghetto

Posting File

Word      Documents
My        1, 2
cat       1
is        1, 2, 5
cool      1
Baseball  2
Sport     2
San       3
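
A posting file like this one can be built in a few lines. The sketch below is a minimal illustration, not how Lucene stores its index. The five document texts are read off the index diagram and their exact wording is an assumption; lowercasing is added so that "My" and "my" end up in the same entry, as they do in the table.

```python
from collections import defaultdict

# Example documents; wording reconstructed from the index slide (assumption).
documents = {
    1: "My cat is cool",
    2: "Baseball is my Sport",
    3: "San Francisco Fort Mason",
    4: "TYPO3 T3CON rocks",
    5: "Extbase is a Ghetto",
}


def build_postings(docs):
    """Map every lowercased word to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


postings = build_postings(documents)
for word in ("my", "cat", "is", "cool", "baseball", "sport", "san"):
    print(word, postings[word])
```

Answering a query then means looking up each query word in the posting file instead of scanning every document.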

Search in TYPO3

Indexed Search

Indexed Search since TYPO3 Version 3.5

Indexing happens through the frontend

Searches pages and some file types

Works with languages and access rights

Indexed Search

Index in Database

Problems with large websites

Slow

no sorting

no Templating

OK for small websites

Search Expectations

Expectation vs. Experience

Users expect a „Google-like“ interface and behaviour in search

No one navigates through an online shop

Up to 30% of users use the search instead of going through text or navigation

Search is mediocre on a lot of websites

Slow and incomplete

Lots of improvement possible

Apache Solr

Enterprise Search Server

Apache Solr

Apache Software Foundation

Enterprise Search Server

Uses the Lucene index

Lots of great Features

CNet, Netflix, Zappos.com and many more...


Solr Key-Features

Synonyms

Stopwords

Boosting / Weighting

Faceting

Paid Content / Elevation

Spellchecking / Did you mean?

Speed

How does it work?

REST-like interface

Indexing with POST

Search with GET

Results in XML, JSON, PHP and many more

Libraries for many programming languages

SolrPhpClient
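
The POST-for-indexing and GET-for-search split can be sketched with plain HTTP requests. This is an illustration under assumptions, not the TYPO3 extension's code: the host, port, and core name `core_en`, the `title` field, and the use of the JSON format on the `/update` handler are all placeholders that depend on the Solr version and schema in use. The requests are only built, not sent, so the example runs without a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Solr location; adjust host, port, and core to your installation.
SOLR_BASE = "http://localhost:8983/solr/core_en"


def build_index_request(documents):
    """Indexing: POST a JSON list of documents to the update handler."""
    body = json.dumps(documents).encode("utf-8")
    return Request(
        SOLR_BASE + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_request(query, response_format="json"):
    """Search: GET the select handler; the wt parameter picks the result format."""
    params = urlencode({"q": query, "wt": response_format})
    return Request(SOLR_BASE + "/select?" + params, method="GET")


# The "title" field is an assumption about the schema.
index_request = build_index_request([{"id": "page-1", "title": "My cat is cool"}])
search_request = build_search_request("title:cat")

print(index_request.get_method(), index_request.full_url)
print(search_request.get_method(), search_request.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running Solr server; a client library such as SolrPhpClient wraps the same kind of HTTP calls for PHP.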

Why and how?

Scratching our Itch

Why?

Indexed Search was too slow

Misses a lot of present-day requirements


History

Prototype in summer 2008

Kick-off February 2009

„Acts like Indexed Search“

Early Access Program

T3CON, September 2009: Version 1.0


Components

Indexing

Search

Flexible Templating

Analysis and Statistics

Administration


Challenges

Page Rendering in TYPO3

Access Rights

File Indexing

Easy Setup for Non Java People

Integrating Solr in general


Solutions

Record Monitor and Indexing Queue

Solr Query Parser Plugin

Integration of Apache Tika

Fully Automated bash Install Script

SolrPhpClient


Features

Faceted Search

File Indexing

Multi-language Support

Did you mean


Features

Search Word Highlighting

Autocomplete / Suggestions

Access Rights Support

More to come

Watch out!


„I do not have any solution. I admire the problem.“

Ashleigh Brilliant, cartoonist and author

Common Problems

Relevancy Perception Trap

Assumption: Search should display a certain result like an employee name

Query: Mike Miller

Results: Mill 100% relevancy

Miller 75% relevancy

Possible issue: Stemming on proper names

Solution: Don‘t stem fields with names
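
Why stemming hurts proper names can be shown with a toy example. The suffix stripper below is a deliberately naive stand-in and not the algorithm Solr uses; the suffix list and function names are invented for illustration. It shows how "Miller" collapses to "mill" in a stemmed field, so a document about a mill matches the query just as well as the employee record, while an unstemmed name field keeps the token intact.

```python
def naive_stem(token):
    """Toy suffix stripper; a stand-in for a real stemmer, not Solr's algorithm."""
    token = token.lower()
    for suffix in ("ing", "er", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, stem):
    """Split on whitespace and lowercase; optionally apply the toy stemmer."""
    return [naive_stem(t) if stem else t.lower() for t in text.split()]


print(analyze("Mike Miller", stem=True))   # tokens in a stemmed text field
print(analyze("Mike Miller", stem=False))  # tokens in an unstemmed name field
```

In a Solr schema the equivalent choice is made per field type, by leaving the stemming filter out of the analyzer chain used for name fields.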

Common Problems

Finding Corpses in your Corpus

While searching you find „interesting“ results

You have forgotten to hide content

You have not set the „no search“ Flag

You have made copies of records and forgotten them

Common Problems

Data updates without using TCEmain

You wonder: Why do my new records of table XY not show up?

You have updated the tables with e.g. phpMyAdmin

You might have forgotten to add the language ID in the records

Common Problems

Can‘t access the Solr Server

You cannot access the Solr Server on another machine

Possible Solution

Common Problems

Help, my index gets deleted

Symptom: Your index is empty

Possible Cause: Your Solr Server is not secured

Common Problems

My news is not being indexed

News records that you have in a Sysfolder are not showing up in your results

The folder is not in the rootline of the website

Configure the PID of the Sysfolder correctly

Questions?


Thank you.
