Searching does not mean finding stuff - Apache Solr for TYPO3
Post on 22-Oct-2014
http://www.dkd.de
Friday, 10 June 2011
dkd development kommunikation design
Welcome
Olivier Dobberkau
CEO
dkd Internet Service GmbH
Frankfurt am Main, Germany
Agenda
What is search?
Search in TYPO3
Search expectations today
Apache Solr
Why and how?
Watch out!
About me
Olivier Dobberkau
Founder of dkd Internet Service GmbH
aka "the reverend never-end"
Met TYPO3 with Version 3.2 beta 3
Member of T3A BCC
43 years old
Twitter: @T3RevNeverEnd
What is Search?
Definition of Information Retrieval
Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching relational databases and the World Wide Web.
Wikipedia: http://en.wikipedia.org/wiki/Information_retrieval
Factors in Information Retrieval
Recall
Precision
Fall-out
Scalability
Performance
Simplicity
Flexibility
Recall
Percentage of the relevant documents that are returned
400 relevant documents in the corpus
100 of them returned
25% recall
Precision
Percentage of returned documents that are relevant
500 returned, 100 relevant
20% precision
Best would be:
100% Recall with 100% Precision
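As a worked example, both measures can be computed from the numbers on the two slides above. This is a minimal sketch; it assumes the recall slide means 400 relevant documents of which 100 are returned, and the function and parameter names are made up for illustration.

```python
def recall(relevant_returned: int, relevant_total: int) -> float:
    """Share of all relevant documents that the search actually returned."""
    return relevant_returned / relevant_total


def precision(relevant_returned: int, returned_total: int) -> float:
    """Share of the returned documents that are relevant."""
    return relevant_returned / returned_total


# Recall slide (assumed reading): 400 relevant documents exist, 100 are returned.
recall_example = recall(relevant_returned=100, relevant_total=400)

# Precision slide: 500 documents returned, 100 of them relevant.
precision_example = precision(relevant_returned=100, returned_total=500)

print(f"recall    = {recall_example:.0%}")
print(f"precision = {precision_example:.0%}")
```

Returning every document in the corpus would push recall to 100% while precision collapses, which is why the two are always read together.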
Index
The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query.
Index
[Figure: an index built from five documents (Document 1 to Document 5). Words shown in the documents: Extbase, TYPO3, San, Baseball, My, is, Francisco, is, cat, T3CON, my, is, a, rocks, Fort, cool, Ghetto, Mason, Sport]
Posting File
Word      Document
My        1, 2
cat       1
is        1, 2, 5
cool      1
Baseball  2
Sport     2
San       3
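A posting file like the one above can be sketched in a few lines. The five document texts below are guesses chosen only so that the rows shown on the slide come out the same; the real slide documents are not fully recoverable, and this is not how Lucene stores its index internally.

```python
from collections import defaultdict

# Hypothetical document texts (assumption), keyed by document number.
documents = {
    1: "My cat is cool",
    2: "Baseball is my sport",
    3: "San Francisco",
    4: "Extbase rocks",
    5: "TYPO3 is a CMS",
}


def build_postings(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each lowercased word to the sorted list of documents containing it."""
    postings: dict[str, list[int]] = defaultdict(list)
    for doc_id in sorted(docs):
        # dict.fromkeys removes repeated words within one document.
        for word in dict.fromkeys(docs[doc_id].lower().split()):
            postings[word].append(doc_id)
    return dict(postings)


postings = build_postings(documents)


def search(term: str) -> list[int]:
    """Look a single word up in the posting file instead of scanning every document."""
    return postings.get(term.lower(), [])


for word in ("my", "cat", "is", "san"):
    print(word, search(word))
```

The lookup cost depends on the number of distinct words rather than on the total amount of text, which is the speed-up the index slide refers to.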
Search in TYPO3
Indexed Search
Indexed Search has been available since TYPO3 version 3.5
Pages are indexed as they are rendered in the frontend
Searches pages and some file types
Works with languages and access rights
Indexed Search
Index is stored in the database
Problems with large websites
Slow
No sorting
No templating
OK for small websites
Search Expectations
Expectation vs. Experience
Users expect a "Google-like" interface and behaviour in search
No one navigates through an online shop
Up to 30% of users search instead of going through text or navigation
Search is mediocre on a lot of websites
Slow and incomplete
Lots of improvement possible
Apache Solr
Enterprise Search Server
Apache Solr
Apache Software Foundation
Enterprise Search Server
uses the Lucene Index
Lots of great Features
CNet, Netflix, Zappos.com and many more...
Solr Key Features
Synonyms
Stopwords
Boosting / Weighting
Faceting
Paid Content / Elevation
Spellchecking / Did you mean?
Speed
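Several of these features are switched on per request through query parameters, while synonyms, stopwords and elevation are set up in the Solr configuration files. The sketch below only builds a query string; the field names title, content and type are hypothetical, and the spellcheck parameter only has an effect if the request handler has a spellcheck component configured.

```python
from urllib.parse import urlencode

# Standard Solr request parameters; the field names are assumptions.
params = {
    "q": "typo3",
    "defType": "dismax",         # query parser that supports per-field boosts
    "qf": "title^2.0 content",   # boosting: a hit in title counts double
    "facet": "true",             # faceting: count results per field value
    "facet.field": "type",
    "spellcheck": "true",        # "Did you mean?" suggestions
    "wt": "json",                # response format
}

query_string = urlencode(params)
print(query_string)
```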
How does it work?
REST-like interface
Indexing with POST
Search with GET
Results in XML, JSON, PHP and many more
Libraries for many programming languages
SolrPhpClient
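The two HTTP calls can be sketched as follows. The example only builds the requests and does not send them; the base URL is the default of a local Solr install and is an assumption, as are the field names id and title and both helper functions.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local install


def build_index_request(doc_id: str, title: str) -> Request:
    """Indexing: POST an XML <add> message to the update handler."""
    body = (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'
        f'<field name="title">{escape(title)}</field>'
        "</doc></add>"
    )
    return Request(
        f"{SOLR_URL}/update?commit=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def build_search_request(query: str, fmt: str = "json") -> Request:
    """Searching: GET the select handler; wt chooses the response format."""
    return Request(f"{SOLR_URL}/select?" + urlencode({"q": query, "wt": fmt}), method="GET")


index_request = build_index_request("page-42", "Solr & TYPO3")
search_request = build_search_request("typo3")
print(index_request.get_method(), index_request.full_url)
print(search_request.get_method(), search_request.full_url)
```

Passing either object to urllib.request.urlopen would send it to a running server; client libraries such as SolrPhpClient wrap the same two calls.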
Why and how?
Scratching our Itch
Why?
Indexed Search was too slow
It misses many of today's requirements
History
Prototype in summer 2008
Kick-off February 2009
"Acts like Indexed Search"
Early Access Program
Version 1.0 at T3CON, September 2009
Components
Indexing
Search
Flexible Templating
Analysis and Statistics
Administration
Challenges
Page Rendering in TYPO3
Access Rights
File Indexing
Easy Setup for Non-Java People
Integrating Solr in general
Solutions
Record Monitor and Indexing Queue
Solr Query Parser Plugin
Integration of Apache Tika
Fully Automated bash Install Script
SolrPhpClient
Features
Faceted Search
File Indexing
Multi-language Support
Did you mean
Features
Search Word Highlighting
Autocomplete / Suggestions
Access Rights Support
More to come
Watch out!
"I do not have any solution. I admire the problem."
Ashleigh Brilliant, cartoonist and author
Common Problems
Relevancy Perception Trap
Assumption: Search should display a certain result, such as an employee name
Query: Mike Miller
Results: Mill 100% relevancy
Miller 75% relevancy
Possible issue: Stemming applied to proper names
Solution: Don't stem fields that contain names
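The trap can be reproduced with a toy example. The suffix stripper below is a deliberately crude stand-in for a real stemmer (it is not the algorithm Solr uses), the two names are made up, and relevancy scores are not modelled; the sketch only shows why a stemmed name field lets "Mill" match a query for "Miller".

```python
def toy_stem(word: str) -> str:
    """Crude illustrative stemmer: strip one common suffix if enough is left."""
    word = word.lower()
    for suffix in ("er", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


names = ["Mike Mill", "Mike Miller"]  # hypothetical employee names


def match_stemmed(query: str, values: list[str]) -> list[str]:
    """Match on a stemmed field: query and field terms are both stemmed."""
    wanted = {toy_stem(term) for term in query.split()}
    return [v for v in values if wanted <= {toy_stem(term) for term in v.split()}]


def match_exact(query: str, values: list[str]) -> list[str]:
    """Match on an unstemmed field: terms are only lowercased."""
    wanted = {term.lower() for term in query.split()}
    return [v for v in values if wanted <= {term.lower() for term in v.split()}]


stemmed_hits = match_stemmed("Mike Miller", names)
exact_hits = match_exact("Mike Miller", names)
print("stemmed field:", stemmed_hits)
print("exact field:  ", exact_hits)
```

With the stemmed field both employees match, because "Miller" and "Mill" collapse to the same term; the unstemmed field returns only the person who was asked for.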
Common Problems
Finding Corpses in your Corpus
While searching you find "interesting" results
You have forgotten to hide content
You have not set the "no search" flag
You have made copies of records and forgotten them
Common Problems
Data updates without using TCEmain
You wonder: Why do my new records of table XY not show up?
You have updated the tables directly, e.g. with phpMyAdmin
You might have forgotten to add the language ID in the records
Common Problems
Can't access the Solr server
You cannot access the Solr server on another machine
Possible Solution
Common Problems
Help, my index gets deleted
Symptom: Your index is empty
Possible Cause: Your Solr Server is not secured
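To see why an unsecured server can lose its index, note that the update handler accepts a delete-by-query message from any client that can reach it. The sketch below only builds such a request and does not send it; the base URL is an assumed default and the helper name is made up.

```python
from urllib.request import Request

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local install


def build_delete_all_request() -> Request:
    """Build (but do not send) the request that would empty an index."""
    body = b"<delete><query>*:*</query></delete>"  # *:* matches every document
    return Request(
        f"{SOLR_URL}/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


delete_request = build_delete_all_request()
print(delete_request.get_method(), delete_request.full_url)
```

Nothing in this request carries credentials, so if the Solr port is open to the outside, anyone can send it. Restricting access to the port, for example with a firewall rule that only admits the web server, is one common way to close the gap.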
Common Problems
My news records are not being indexed
News records that you keep in a sysfolder are not showing up in your results
The folder is not in the rootline of the website
Configure the PID of the sysfolder correctly
Questions?
Thank you.