Lucene Boot Camp
Grant Ingersoll, Lucid Imagination
Nov. 4, 2008, New Orleans, LA
Schedule
• In-depth Indexing/Searching
  – Performance, Internals
  – Filters, Sorting
• Terms and Term Vectors
• Class Project
• Q & A
Day I Recap
• Indexing
  – IndexWriter
  – Document/Field
  – Analyzer
• Searching
  – IndexSearcher
  – IndexReader
  – QueryParser
• Analysis
• Contrib
Indexing In-Depth
• Deletions and Updates
• Optimize
• Important Internals
  – File Formats
  – Segments, Commits, Merging
  – Compound File System
• Performance
Lucene File Formats and Structures
• http://lucene.apache.org/java/2_4_0/fileformats.html
• A Lucene index is made up of one or more Segments
• Lucene tracks Documents internally by an int “id”
• This id may change across index operations
  – You should not rely on it unless you know your index isn’t changing
• You can ask for a Document by this id on the IndexReader
Segments
• Each Segment is an independent index containing:
  – Field Names
  – Stored Field values
  – Term Dictionary, proximity info and normalization factors
  – Term Vectors (optional)
  – Deleted Docs
• Compound File System (CFS) stores all of these logical pieces in a single file
How Lucene Indexes
• Lucene indexes Documents into memory
  – At certain trigger points, the in-memory segments are flushed/committed to the Directory
    • Can be forced by calling commit()
  – Segments are periodically merged (more in a moment)
Segments and Merging
• New Segments may be created when documents are added
• Segments are merged from time to time, based on segment size, under the control of:
  – MergePolicy
  – MergeScheduler
  – Optimization
Merge Policy
• Identifies Segments to be merged
• Two Current Implementations
  – LogDocMergePolicy
  – LogByteSizeMergePolicy
• mergeFactor: maximum number of segments allowed before merging
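
To make the role of mergeFactor concrete, here is a minimal sketch of a logarithmic merge policy. It is a toy model written for this transcript, not Lucene's LogDocMergePolicy: the class ToyLogMerge and its methods are hypothetical, segments are represented only by their document counts, and every flush is assumed to produce a segment of the same size.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Lucene code): each list entry is the doc count of one segment.
class ToyLogMerge {
    // Add a freshly flushed segment, then cascade merges: whenever the last
    // mergeFactor segments all have the same size, replace them with a single
    // segment holding their combined doc count.
    static void addSegment(List<Integer> segments, int docs, int mergeFactor) {
        segments.add(docs);
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            boolean sameLevel = true;
            for (int i = n - mergeFactor; i < n; i++) {
                if (segments.get(i) != size) { sameLevel = false; break; }
            }
            if (!sameLevel) break;
            for (int i = 0; i < mergeFactor; i++) segments.remove(segments.size() - 1);
            segments.add(size * mergeFactor);
        }
    }

    // Run a number of equally sized flushes and return the resulting segment sizes.
    static List<Integer> simulate(int flushes, int docsPerFlush, int mergeFactor) {
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < flushes; i++) addSegment(segments, docsPerFlush, mergeFactor);
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(simulate(25, 10, 3));   // prints [90, 90, 30, 30, 10]
    }
}
```

In this model the segment list behaves like a counter in base mergeFactor, which is one way to see why a low mergeFactor keeps the segment count small at the price of merging more often.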
MergeScheduler
• Responsible for performing the merge
• Two Implementations:
  – Serial: blocking
  – Concurrent: new, background
Optimize
• Optimize is the process of merging segments down into a single segment
• This process can yield significant speedups in search
• Can be slow
• Can also do partial optimizes
Final Thoughts On Merging
• You usually don’t have to think about merging, except for deciding when to optimize
• In high-update, performance-critical environments, you may need to dig into it more, as merges can sometimes cause long pauses
• Optimize when you can; otherwise, keep mergeFactor low
Deletion
• A deletion only marks the Document as deleted
  – Doesn’t get physically removed until a merge
• Deletions can be a bit confusing
  – Both IndexReader and IndexWriter have delete methods
    • By: id, term(s), Query(s)
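
The "mark now, remove at merge" behaviour can be sketched with a bit set of deleted ids. This is an illustrative toy, not Lucene's implementation: ToySegment is a hypothetical class, and its maxDoc()/numDocs() methods are only named after the corresponding IndexReader methods. It also shows why the internal int id should not be relied on across index operations.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy segment (not Lucene code): a document's id is its position in the list.
class ToySegment {
    final List<String> docs = new ArrayList<>();
    final BitSet deleted = new BitSet();

    int add(String doc) { docs.add(doc); return docs.size() - 1; }

    // Deleting only sets a bit; the stored document is still there.
    void delete(int id) { deleted.set(id); }

    int maxDoc() { return docs.size(); }                           // counts deleted docs too
    int numDocs() { return docs.size() - deleted.cardinality(); }  // live docs only

    // Merging copies the live documents into a new segment. This is the point
    // where deleted documents are physically dropped and ids are reassigned.
    ToySegment merge() {
        ToySegment merged = new ToySegment();
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id)) merged.add(docs.get(id));
        }
        return merged;
    }

    public static void main(String[] args) {
        ToySegment seg = new ToySegment();
        seg.add("a"); seg.add("b"); seg.add("c");
        seg.delete(1);
        System.out.println(seg.maxDoc() + " " + seg.numDocs());   // prints 3 2
        ToySegment merged = seg.merge();
        System.out.println(merged.maxDoc() + " " + merged.docs);  // prints 2 [a, c]
    }
}
```

Note that document "c" had id 2 before the merge and id 1 afterwards.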
Task
– Build your index from yesterday and then try some deletes
  • Id, term, Query
– Also try out an optimize on an FSDirectory against the full Reuters sample
– 15-20 minutes
Updates
• Updates are always a delete and an add
• Updates are always a delete and an add
  – Yes, that is a repeat!
  – Nature of data structures used in search
• See IndexWriter.updateDocument()
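
As a rough illustration of "delete and add", the sketch below models an update keyed on a unique field. ToyUpdatableIndex is a hypothetical class invented for this example; unlike a real index, it removes documents immediately instead of marking them as deleted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy index (not Lucene code): each document is a {key, body} pair.
class ToyUpdatableIndex {
    final List<String[]> docs = new ArrayList<>();

    void add(String key, String body) { docs.add(new String[] {key, body}); }

    // Remove every document whose key matches; returns how many were removed.
    int delete(String key) {
        int before = docs.size();
        docs.removeIf(d -> d[0].equals(key));
        return before - docs.size();
    }

    // An update is nothing more than a delete by key followed by an add.
    void update(String key, String body) {
        delete(key);
        add(key, body);
    }

    public static void main(String[] args) {
        ToyUpdatableIndex index = new ToyUpdatableIndex();
        index.add("k1", "v1");
        index.add("k2", "v2");
        index.update("k1", "v1b");
        for (String[] d : index.docs) System.out.println(d[0] + " -> " + d[1]);
    }
}
```

The updated document ends up at the end of the list, which mirrors the fact that an updated Document does not keep its old internal id.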
Performance Factors
• setRAMBufferSizeMB
  – New model for automagically controlling indexing factors based on the amount of memory in use
  – Obsoletes setMaxBufferedDocs
• maxBufferedDocs
  – Minimum number of docs buffered before a flush occurs and a new segment is created
  – Usually, larger == faster, but more RAM
More Factors
• mergeFactor
  – How often segments are merged
  – Smaller == less RAM, better for incremental updates
  – Larger == faster, better for batch indexing
• maxFieldLength
  – Limit the number of terms indexed per Field in a Document
• Analysis
• Reuse
  – Document, TokenStream, Token
Index Threading
• IndexWriter and IndexReader are thread-safe and can be shared between threads without external synchronization
• One open IndexWriter per Directory
• Parallel Indexing
  – Index to separate Directory instances
  – Merge using IndexWriter.addIndexes
  – Could also distribute and collect
Benchmarking Indexing
• contrib/benchmark
• Try out different algorithms between Lucene 2.2 and 2.3
  – contrib/benchmark/conf:
    • indexing.alg
    • indexing-multithreaded.alg
• Info:
  – Mac Pro 2 x 2GHz Dual-Core Xeon
  – 4 GB RAM
  – ant run-task -Dtask.alg=./conf/indexing.alg -Dtask.mem=1024M
Benchmarking Results
               Records/Sec   Avg. T Mem
2.2                    421          39M
Trunk                2,122          52M
Trunk-mt (4)         3,680          57M
Your results will depend on analysis, etc.
Searching
• Earlier we touched on basics of search using the QueryParser
• Now look at:
  – Searcher/IndexReader Lifecycle
  – Query classes
  – More details on the QueryParser
  – Filters
  – Sorting
Lifecycle
• Recall that the IndexReader loads a snapshot of the index into memory
  – This means updates made since loading the index will not be seen
• Business rules are needed to define how often to reload the index, if at all
  – IndexReader.isCurrent() can help
• Loading an index is an expensive operation
  – Do not open a Searcher/IndexReader for every search
Reopen
• It is possible to have an IndexReader reopen only new or changed segments
  – Saves some of the cost of loading a new index
• Does not close the old reader, so the application must
• See DeletionsUpdatesTest.testReopen()
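
The snapshot and reopen behaviour can be modelled in a few lines. This is a sketch under stated assumptions, not the code from DeletionsUpdatesTest: SnapshotIndex and SnapshotReader are hypothetical stand-ins for the index and IndexReader, and a simple version counter stands in for whatever Lucene uses to detect changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model (not Lucene code) of snapshot readers over a changing index.
class SnapshotIndex {
    private final List<String> docs = new ArrayList<>();
    private long version = 0;

    void add(String doc) { docs.add(doc); version++; }
    long version() { return version; }

    // Opening a reader copies the current state: a point-in-time snapshot.
    SnapshotReader open() { return new SnapshotReader(this, new ArrayList<>(docs), version); }
}

class SnapshotReader {
    private final SnapshotIndex index;
    private final List<String> snapshot;
    private final long version;
    private boolean closed = false;

    SnapshotReader(SnapshotIndex index, List<String> snapshot, long version) {
        this.index = index;
        this.snapshot = Collections.unmodifiableList(snapshot);
        this.version = version;
    }

    int numDocs() { return snapshot.size(); }
    boolean isCurrent() { return version == index.version(); }
    boolean isClosed() { return closed; }
    void close() { closed = true; }

    // Returns this reader if nothing changed, otherwise a fresh snapshot.
    // The old reader is NOT closed here; the caller has to do that.
    SnapshotReader reopen() { return isCurrent() ? this : index.open(); }
}

class ReopenDemo {
    public static void main(String[] args) {
        SnapshotIndex index = new SnapshotIndex();
        index.add("doc1");
        SnapshotReader reader = index.open();
        index.add("doc2");                        // not visible to the open reader
        SnapshotReader newer = reader.reopen();
        if (newer != reader) { reader.close(); reader = newer; }
        System.out.println(reader.numDocs());     // prints 2
    }
}
```

The pattern in main (reopen, compare references, close the old reader only if a new one came back) is the part an application would be expected to supply itself.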
Query Classes
• TermQuery is the basis for all non-span queries
• BooleanQuery combines multiple Query instances as clauses
  – should
  – required
• PhraseQuery finds terms occurring near each other, position-wise
  – “slop” is the edit distance, counted in term-position moves, between the query phrase and the matching text
• Take 2-3 minutes to explore Query implementations
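
One way to build intuition for slop is the simplified check below. It is a rough model, not Lucene's phrase scorer: ToySlop is a hypothetical class, it assumes each query term occurs exactly once in the document, and it ignores repeated terms and multiple candidate matches.

```java
// Simplified sloppy-phrase check (not Lucene's scorer): one position per query term.
class ToySlop {
    // positions[i] is where query term i occurs in the document.
    // Shifting each position back by the term's offset in the phrase makes an
    // exact match collapse to a single value; the spread of the shifted values
    // measures how far the terms have moved relative to the phrase.
    static int requiredSlop(int[] positions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < positions.length; i++) {
            int shifted = positions[i] - i;
            min = Math.min(min, shifted);
            max = Math.max(max, shifted);
        }
        return max - min;
    }

    static boolean matches(int[] positions, int slop) {
        return requiredSlop(positions) <= slop;
    }

    public static void main(String[] args) {
        // Query phrase: "quick fox"
        System.out.println(requiredSlop(new int[] {1, 2}));  // "the quick fox": prints 0
        System.out.println(requiredSlop(new int[] {1, 3}));  // "the quick brown fox": prints 1
        System.out.println(requiredSlop(new int[] {1, 0}));  // "fox quick": prints 2
    }
}
```

In this model, swapping two adjacent terms costs two moves, which is why a slop of 1 is not enough to match a reversed two-word phrase.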
Spans
• Spans provide information about where matches took place
• Not supported by the QueryParser
• Can be used in BooleanQuery clauses
• Take 2-3 minutes to explore SpanQuery classes
  – SpanNearQuery useful for doing phrase matching
QueryParser
• MultiFieldQueryParser
• Boolean operators cause confusion
  – Better to think in terms of required (+ operator) and not allowed (- operator)
• Check JIRA for QueryParser issues
• http://www.gossamer-threads.com/lists/lucene/java-user/40945
• Most applications either modify QP, create their own, or restrict to a subset of the syntax
• Your users may not need all the “flexibility” of the QP
Sorting
• Lucene default sort is by score
• Searcher has several methods that take in a Sort object
• Sorting should be addressed during indexing
• Sorting is done on Fields containing a single term that can be used for comparison
• The SortField defines the different sort types available
  – AUTO, STRING, INT, FLOAT, CUSTOM, SCORE, DOC
Sorting II
• Look at Searcher, Sort and SortField
• Custom sorting is done with a SortComparatorSource
• Sorting can be very expensive
  – Terms are cached in the FieldCache
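
A sketch of why sorting leans on a per-document cache of field values: the comparator needs a value for every hit, so the values are held in an array indexed by doc id. ToyFieldSort and its fieldCache parameter are hypothetical names for this illustration; the real FieldCache API is not shown here.

```java
import java.util.Arrays;

// Toy field sort (not Lucene code).
class ToyFieldSort {
    // fieldCache[doc] holds the single term of the sort field for that document.
    // Building such an array for a whole index is the expensive, memory-hungry
    // part; once it exists, sorting hits is just array lookups.
    static Integer[] sortByField(int[] hits, String[] fieldCache) {
        Integer[] sorted = new Integer[hits.length];
        for (int i = 0; i < hits.length; i++) sorted[i] = hits[i];
        Arrays.sort(sorted, (a, b) -> {
            int c = fieldCache[a].compareTo(fieldCache[b]);
            return c != 0 ? c : a.compareTo(b);   // tie-break on doc id
        });
        return sorted;
    }

    public static void main(String[] args) {
        String[] authors = {"delta", "alpha", "charlie", "alpha"};
        int[] hits = {0, 1, 2, 3};
        System.out.println(Arrays.toString(sortByField(hits, authors)));  // prints [1, 3, 2, 0]
    }
}
```

This also hints at why the sort field should hold a single term per document: with several terms there would be no one value to put in the array slot.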
Filters
• Filters restrict the search space to a subset of Documents
• Use Cases
  – Search within a Search
  – Restrict by date
  – Rating
  – Security
  – Author
Filter Classes
• QueryWrapperFilter (QueryFilter)
  – Restrict to subset of Documents that match a Query
• RangeFilter
  – Restrict to Documents that fall within a range
  – Better alternative to RangeQuery
• CachingWrapperFilter
  – Wrap another Filter and provide caching
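
The idea behind a range filter and a caching wrapper can be sketched with bit sets. These are toy classes written for this transcript, not Lucene's RangeFilter or CachingWrapperFilter: ToyFilter, ToyRangeFilter and ToyCachingFilter are hypothetical, and a plain String array stands in for one field of an index.

```java
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.Map;

// Toy filters (not Lucene code): a filter yields one bit per document it lets through.
interface ToyFilter {
    BitSet bits(String[] index);   // index[doc] is the field value of that document
}

// Lets through documents whose value lies in [lower, upper], inclusive.
class ToyRangeFilter implements ToyFilter {
    final String lower, upper;
    int evaluations = 0;           // counts how often the range is actually computed

    ToyRangeFilter(String lower, String upper) { this.lower = lower; this.upper = upper; }

    public BitSet bits(String[] index) {
        evaluations++;
        BitSet result = new BitSet(index.length);
        for (int doc = 0; doc < index.length; doc++) {
            if (index[doc].compareTo(lower) >= 0 && index[doc].compareTo(upper) <= 0) {
                result.set(doc);
            }
        }
        return result;
    }
}

// Wraps another filter and remembers its result per index instance.
class ToyCachingFilter implements ToyFilter {
    final ToyFilter inner;
    final Map<String[], BitSet> cache = new IdentityHashMap<>();

    ToyCachingFilter(ToyFilter inner) { this.inner = inner; }

    public BitSet bits(String[] index) {
        BitSet cached = cache.get(index);
        if (cached == null) {
            cached = inner.bits(index);
            cache.put(index, cached);
        }
        return (BitSet) cached.clone();   // hand out a copy so callers cannot corrupt the cache
    }

    public static void main(String[] args) {
        String[] dates = {"20080101", "20081104", "20090301", "20081231"};
        ToyRangeFilter range = new ToyRangeFilter("20081001", "20081231");
        ToyFilter cached = new ToyCachingFilter(range);
        System.out.println(cached.bits(dates));   // prints {1, 3}
        System.out.println(cached.bits(dates));   // prints {1, 3}, served from the cache
        System.out.println(range.evaluations);    // prints 1
    }
}
```

Restricting a search then amounts to intersecting the set of hits with the filter's bits, which is why a cached filter can be reused cheaply across many queries against the same reader.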
Task
• Modify your program to sort by a field and to filter by a query or some other criteria
  – ~15 minutes
Searchers
• MultiSearcher
  – Search over multiple Searchables, including remote
• MultiReader
  – Not a Searcher, but can be used with IndexSearcher to achieve same results for local indexes
• ParallelMultiSearcher
  – Like MultiSearcher, but threaded
• RemoteSearchable
  – RMI based remote searching
• Look at MultiSearcherTest in example code
Expert Results
• Searcher has several “expert” methods
• HitCollector allows low-level access to all Documents as they are scored
Search Performance
• Search speed is based on a number of factors:
  – Query Type(s)
  – Query Size
  – Analysis
  – Occurrences of Query Terms
  – Optimize
  – Index Size
  – Index type (RAMDirectory, other)
  – Usual Suspects
    • CPU
    • Memory
    • I/O
    • Business Needs
Query Types
• Be careful with WildcardQuery as it rewrites to a BooleanQuery containing all the terms that match the wildcards
• Avoid starting a WildcardQuery with a wildcard
• Use ConstantScoreRangeQuery instead of RangeQuery
• Be careful with range queries and dates
  – User mailing list and Wiki have useful tips for optimizing date handling
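
The cost difference between a trailing and a leading wildcard can be seen on a sorted term dictionary. The sketch below is an illustration only: ToyWildcard is a hypothetical class and a TreeSet stands in for the term dictionary. Each expanded term would become one clause of the rewritten BooleanQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy wildcard expansion over a sorted term dictionary (not Lucene code).
class ToyWildcard {
    // Trailing wildcard ("luc*"): seek to the prefix in the sorted dictionary
    // and stop as soon as terms no longer start with it.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break;
            out.add(t);
        }
        return out;
    }

    // Leading wildcard ("*ing"): there is no prefix to seek to, so every term
    // in the dictionary has to be examined.
    static List<String> expandSuffix(SortedSet<String> terms, String suffix) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (t.endsWith(suffix)) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "index", "indexing", "lucene", "lucid", "merge", "searching"));
        System.out.println(expandPrefix(terms, "luc"));   // prints [lucene, lucid]
        System.out.println(expandSuffix(terms, "ing"));   // prints [indexing, searching]
    }
}
```

On a real index the expansion can produce a very large number of clauses, which is the reason for the caution above.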
Query Size
• Stopword removal
• Search an “all” field instead of many fields with the same terms
• Disambiguation
  – May be useful when doing synonym expansion
  – Difficult to automate and may be slower
  – Some applications may allow the user to disambiguate
• Relevance Feedback/More Like This
  – Use most important words
  – “Important” can be defined in a number of ways
Usual Suspects
• CPU
  – Profile your application
• Memory
  – Examine your heap size, garbage collection approach
• I/O
  – Cache your Searcher
    • Define business logic for refreshing based on indexing needs
  – Warm your Searcher before going live (see Solr)
• Business Needs
  – Do you really need to support Wildcards?
  – What about date range queries down to the millisecond?
FieldSelector
• Prior to version 2.1, Lucene always loaded all Fields in a Document
• FieldSelector API addition allows Lucene to skip large Fields
  – Options: Load, Lazy Load, No Load, Load and Break, Load for Merge, Size, Size and Break
• Makes storage of original content more viable without large cost of loading it when not used
• FieldSelectorTest in example code
Relevance
• At some point along your journey, you will get results that you think are “bad”
• Is it a big deal?
  – Content, Content, Content!
  – Relevance Judgments
  – Don’t break other queries just to “fix” one
• Hardcode it!
  – A query doesn’t always have to result in a “search”
Scoring and Similarity
• Lucene has a sophisticated scoring mechanism designed to meet most needs
• Has hooks for modifying scores
• Scoring is handled by the Query, Weight and Scorer classes
Explanations
• explain(Query, int) method is useful for understanding why a Document scored the way it did
• Shows all the pieces that went into scoring the result:
  – TF, DF, boosts, etc.
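
To show what kind of pieces an explanation breaks a score into, here is a small TF-IDF sketch. ToyExplain is a hypothetical class; the tf, idf and length-norm formulas are modelled on the classic shape used by Lucene's default Similarity and should be treated as assumptions, and the score deliberately leaves out query normalization, coord and the other factors of the real formula.

```java
import java.util.Locale;

// Toy score breakdown (not Lucene's explain output).
class ToyExplain {
    static double tf(int freq) { return Math.sqrt(freq); }

    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    static double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }

    // Simplified single-term score: each factor appears exactly once.
    static double score(int freq, int docFreq, int numDocs, int fieldLength, double boost) {
        return tf(freq) * idf(docFreq, numDocs) * boost * lengthNorm(fieldLength);
    }

    // Render the score and the factors that produced it, one per line.
    static String explain(int freq, int docFreq, int numDocs, int fieldLength, double boost) {
        return String.format(Locale.ROOT,
                "%.4f = score, product of:%n"
                + "  %.4f = tf(freq=%d)%n"
                + "  %.4f = idf(docFreq=%d, numDocs=%d)%n"
                + "  %.4f = boost%n"
                + "  %.4f = lengthNorm(numTerms=%d)",
                score(freq, docFreq, numDocs, fieldLength, boost),
                tf(freq), freq,
                idf(docFreq, numDocs), docFreq, numDocs,
                boost,
                lengthNorm(fieldLength), fieldLength);
    }

    public static void main(String[] args) {
        System.out.println(explain(4, 9, 10, 4, 2.0));
    }
}
```

Reading such a breakdown for a "bad" result usually points at one dominant factor (a rare term, a very short field, a boost), which is a better starting point for tuning than guessing.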
Tuning Relevance
• FunctionQuery from Solr (variation in Lucene)
• Override Similarity
• Implement own Query and related classes
• Payloads
• Boosts
Task
• Open Luke and try some queries and then use the “explain” button
• Or, write some code to do explains on a query and some documents
• See how Query type, boosting, other factors play a role in the score
Terms and Term Vectors
• Sometimes you need access to the Term Dictionary:
  – Auto suggest
  – Frequency information
• Sometimes you need a Document-centric view of terms, frequencies, positions and offsets
  – Term Vectors
Term Information
• TermEnum gives access to terms and how many Documents they occur in
  – IndexReader.terms()
• TermDocs gives access to the frequency of a term in a Document
  – IndexReader.termDocs()
• TermPositions extends TermDocs and provides access to position and payload info
  – IndexReader.termPositions()
Term Vectors
• Term Vectors give access to term frequency information in a given Document
  – IndexReader.getTermFreqVector
• TermVectorMapper provides callbacks for working with Term Vectors
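
The difference between the index-centric view (term dictionary, document frequencies) and the document-centric view (term vectors) can be sketched as follows. ToyTermVector is a hypothetical class, and analysis is assumed to be nothing more than lowercasing and splitting on whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy term statistics (not Lucene code).
class ToyTermVector {
    // Document-centric view: each term of one document mapped to its positions.
    // The term's frequency in the document is the size of its position list.
    static Map<String, List<Integer>> termVector(String text) {
        Map<String, List<Integer>> vector = new TreeMap<>();
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) continue;   // only happens for empty input
            vector.computeIfAbsent(tokens[pos], k -> new ArrayList<>()).add(pos);
        }
        return vector;
    }

    // Index-centric view: in how many documents does each term occur?
    static Map<String, Integer> docFreqs(String[] docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (String doc : docs) {
            for (String term : termVector(doc).keySet()) df.merge(term, 1, Integer::sum);
        }
        return df;
    }

    public static void main(String[] args) {
        System.out.println(termVector("to be or not to be"));
        // prints {be=[1, 5], not=[3], or=[2], to=[0, 4]}
        System.out.println(docFreqs(new String[] {"to be", "be quick"}));
        // prints {be=2, quick=1, to=1}
    }
}
```

Because the maps are sorted, walking docFreqs from a given prefix is also roughly how an auto-suggest over the term dictionary could be put together.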
TermsTest
• Provides samples of working with terms and term vectors
Lunch ?
1-2:30
Recap
• Indexing
• Searching
• Performance
• Odds and Ends
  – Explains
  – FieldSelector
  – Relevance
  – Terms and Term Vectors
Class Project
• Your chance to really dig in and get your hands dirty
• Ask Questions
• Options…
Option I
• Start building out your Lucene Application!
  – Index your Data (or any data)
    • Threading/Updates/Deletions
    • Analysis
  – Search
    • Caching/Warming
    • Dealing with Updates
    • Multi-threaded
  – Display
Option II
• Dig deeper into an area of interest
  – Performance
    • How fast can you index?
    • Search? Queries per Second?
  – Analysis
  – Query Parsing
  – Scoring
  – Contrib
Option III
• Dig into JIRA issues and find something to fix in Lucene
• https://issues.apache.org/jira/secure/Dashboard.jspa
• http://wiki.apache.org/lucene-java/HowToContribute
Option V
• Other?
  – Architecture Review/Discussion
  – Use Case Discussion
Project Post-Mortem
• Volunteers to share?
Open Discussion
• Multilingual Best Practices
  – UNICODE
  – One Index versus many
• Advanced Analysis
• Distributed Lucene
• Crawling
• Hadoop
• Nutch
• Solr
Finally…
• Please take the time to fill out a survey to help me improve this training
  – Located in base directory of source
  – Email it to me at [email protected]
• There are several Lucene related talks on Wednesday