Solr & Lucene @ Etsy by Gregg Donovan

Posted on 20-May-2015

DESCRIPTION

Slides from my talk "Solr & Lucene @ Etsy" at the Lucene Revolution conference on May 26th, 2011, in San Francisco.

TRANSCRIPT

1. Solr & Lucene at Etsy
    Gregg Donovan, Technical Lead, Search
    gregg@etsy.com
2. 1.5 years Solr & Lucene at Etsy.com
    3 years Solr & Lucene at TheLadders.com
3. 8+ million members
4. 9.3 million items
5. 800k+ active sellers
6. 1+ billion pageviews / month
7. Maximize Solr out-of-the-box
8. Hack at a low level
9. Know when to do each
10. Or
11. Don't fear trunk
12. builds.apache.org/job/Solr-trunk/changes
13. http://localhost:8393/solr/placesuggest/select?q={!lucene}s*&sfield=latlong&pt=37.595804,-122.364521&sort=div(geodist(),sqrt(sum(population,50)))%20asc
14. {!lucene} {!field} {!term} {!boost} {!func} {!dismax} {!edismax}
15. Cheap ranking awesomeness
16. ExternalFileField ftw!
17. schema.xml
    /search/data/treasury/external_hotness.1306390802088:
      1=2.3
      2=1.7
      3=1.1
    Solr query: sort={!func}hotness+desc
18. ExternalFileField caveats
19. More relevance: boost query
20. http://localhost:8983/solr/listings/select?q={!boost b=$rel v=$qq}&rel=category:furniture^10+OR+((-material:acrylic)^5)&qq=desk
21. Impression tracking
22. etsy.com/search?q=desk&explain=1
23. Side-by-side testing
24. Cheap performance wins
25. Put off sharding till you must
26. cat ${indexDir}/* > /dev/null
27. Return IDs, minimize stored fields
28. RAM: $10-20 / GB
29. SSD: 0.1ms vs 10ms seek
30. Custom?
31. solr-user
32. Tools for low-level hacking
33. Continuous deployment
34. One button. So easy a dog could do it.
35. MTTR > MTBF
36. github.com/etsy/logster
37. Tracking GC
38. export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"
39. Alerting
40. Testing
41. SaveAsFixture
42. Profiling
43. Java primitive collection libraries: fastutil, trove4j
44. Know the hooks: SolrRequestHandler, SearchComponent, QParserPlugin, SolrEventListener, SolrCache, ValueSourceParser
45. SolrIndexSearcher gotchas: reference counting; using it as a cache key: WeakHashMap myCache...
46. Example: personalized collections
47. fq={!term f=id}123 OR {!term f=id}456
48. Need a map of PK to docId
49. Use custom SolrCache plus SolrEventListener to fill it
50. github.com/giokincade/FastTermFilter
51. i18n currency sorting and filtering
52. currency.xml
53. price:[$10.00 to $50.00]
    price:[10.00USD to 50.00USD]
    price:20.00EUR
54. MoneyFieldType.java:
    @Override
    public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2,
                               final boolean minInclusive, final boolean maxInclusive) {
      final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
      final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);
      // Both bounds must be in the same currency; mixed-currency ranges are rejected.
      if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            new ParseException("Cannot parse range query " + part1 + " to " + part2 +
                ": range queries only supported when upper and lower bound have same currency."));
      }
      String currencyCode = p1.getCurrencyCode();
      // Range-filter the per-document values from MoneyValueSource (built for currencyCode),
      // wrapped so that every match gets a constant score.
      final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);
      return new SolrConstantScoreQuery(new ValueSourceRangeFilter(vs,
          p1.getAmount() + "", p2.getAmount() + "", minInclusive, maxInclusive));
    }
55. Replication gotcha
56. SOLR-2202
57. Related Searches
58. Autosuggest!
59. bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely
60. The TermDictionary is not a whitelist
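Slide 13 ranks place suggestions by `div(geodist(),sqrt(sum(population,50)))` in ascending order: distance from `pt` divided by the square root of population plus 50, so a large city somewhat farther away can sort ahead of a small town next door. The Java sketch below recomputes that key outside Solr to make the trade-off concrete. It assumes geodist() behaves like a haversine great-circle distance in kilometers on a 6371 km sphere; the `PlaceRanking` class, the `Place` record, and the sample places are hypothetical, and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the slide-13 sort key, div(geodist(), sqrt(sum(population, 50))), computed outside Solr.
final class PlaceRanking {
    // Assumption: spherical Earth with a 6371 km mean radius; Solr's own constant may differ slightly.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Hypothetical place-suggestion document: name, coordinates, population.
    record Place(String name, double lat, double lon, long population) {}

    // Great-circle distance in kilometers via the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Distance damped by sqrt(population + 50); lower sorts first, as with "asc" on the slide.
    static double sortKey(Place p, double ptLat, double ptLon) {
        return haversineKm(ptLat, ptLon, p.lat(), p.lon()) / Math.sqrt(p.population() + 50.0);
    }

    static List<Place> rank(List<Place> places, double ptLat, double ptLon) {
        List<Place> sorted = new ArrayList<>(places);
        sorted.sort(Comparator.comparingDouble(p -> sortKey(p, ptLat, ptLon)));
        return sorted;
    }
}
```

With `pt=37.595804,-122.364521` from the slide, a town of 50 people about 1.1 km north gets a key near 0.11, while a city of 1,000,000 about 22 km north gets roughly 0.022 and therefore sorts first.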
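Slides 16 and 17 keep a per-listing "hotness" score outside the index in an ExternalFileField: a plain file of key=value lines (here `1=2.3`, `2=1.7`, `3=1.1`) that Solr loads as a field usable in function queries such as `sort={!func}hotness desc`. As far as I understand Solr's behavior, it reads files named `external_<field>` or `external_<field>.*` and uses the last one in name order, which would explain the millisecond timestamp suffix on the slide; treat that as an assumption to verify against your Solr version. The sketch below is a hypothetical publisher job, not Etsy's code: `HotnessFile`, its methods, and the write-then-rename step are illustrative choices.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical publisher for an ExternalFileField named "hotness".
final class HotnessFile {
    // e.g. external_hotness.1306390802088, matching the file name on slide 17.
    static String fileName(String field, long timestampMillis) {
        return "external_" + field + "." + timestampMillis;
    }

    // One "key=value" line per document; keys sorted so successive files diff cleanly.
    static List<String> render(Map<String, Float> scores) {
        return new TreeMap<>(scores).entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    // Reads key=value lines back; lines without '=' (or with an empty key) are skipped.
    static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> scores = new LinkedHashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                scores.put(line.substring(0, eq).trim(), Float.parseFloat(line.substring(eq + 1).trim()));
            }
        }
        return scores;
    }

    // Writes a temp file (whose name does not start with "external_"), then renames it,
    // so a reader never sees a half-written score file.
    static Path publish(Path dataDir, String field, Map<String, Float> scores, long timestampMillis) {
        try {
            Path tmp = Files.createTempFile(dataDir, "tmp_" + field, ".part");
            Files.write(tmp, render(scores), StandardCharsets.UTF_8);
            return Files.move(tmp, dataDir.resolve(fileName(field, timestampMillis)),
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing under a temporary name and renaming keeps a partially written file from ever matching the `external_hotness*` pattern.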
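Slide 20 wraps the user's query in `{!boost}` and pulls both arguments from other request parameters: `v=$qq` dereferences the `qq` parameter (the user's text, `desk`) and `b=$rel` dereferences `rel` (the category and material preferences). Keeping user input in its own parameter means it never has to be escaped into local-params syntax. The sketch below builds the same request with proper URL encoding; `BoostQueryUrl` is a hypothetical helper, and the added `fl=id` reflects slide 27's advice to return IDs rather than being part of the slide 20 URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that rebuilds the slide-20 request with proper URL encoding.
final class BoostQueryUrl {
    static String build(String selectUrl, String userQuery, String relevanceQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        // The boost parser takes its boost (b) and wrapped query (v) from other parameters.
        params.put("q", "{!boost b=$rel v=$qq}");
        params.put("rel", relevanceQuery);
        params.put("qq", userQuery); // raw user text, never spliced into local-params syntax
        params.put("fl", "id");      // slide 27: return IDs, minimize stored fields
        return selectUrl + "?" + params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

Changing `rel` per request (for example per category page) changes the boost without touching the user's query.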
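Slide 26 warms the index by reading every file once: `cat ${indexDir}/* > /dev/null` discards the bytes but leaves them in the operating system's page cache, so early queries after a restart avoid disk seeks (compare slide 29's 10 ms vs 0.1 ms). A Java equivalent, e.g. for a startup hook, is sketched below; `IndexWarmer` is a hypothetical name, and like the shell glob it only reads top-level regular files. Whether the pages stay cached depends on free RAM and the OS.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical Java equivalent of `cat ${indexDir}/* > /dev/null`.
final class IndexWarmer {
    // Reads every top-level regular file once and discards the bytes, which leaves them
    // in the OS page cache if memory allows. Returns the number of bytes read.
    static long warm(Path indexDir) {
        byte[] buf = new byte[1 << 20];
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) continue; // `cat` would only complain about directories
                try (InputStream in = Files.newInputStream(f)) {
                    int n;
                    while ((n = in.read(buf)) != -1) total += n;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```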
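Slides 36 to 38 pair verbose GC logging with logster, Etsy's tool for turning log lines into Graphite or Ganglia metrics. With `-XX:+PrintGCApplicationStoppedTime`, HotSpot writes a line like `Total time for which application threads were stopped: 0.0500000 seconds` for each safepoint pause. Logster itself is written in Python with its own parser classes; the Java sketch below only illustrates the extraction step: counting pauses and taking their sum and maximum over a batch of lines, which could then be reported as metrics. The `GcPauseStats` record and the JDK 6 to 8 line format it matches are assumptions.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: summarize stop-the-world pauses from a HotSpot GC log written with
// -XX:+PrintGCApplicationStoppedTime (JDK 6-8 era line format assumed).
record GcPauseStats(int pauses, double totalSeconds, double maxSeconds) {
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: (\\d+\\.\\d+) seconds");

    static GcPauseStats parse(List<String> lines) {
        int count = 0;
        double total = 0;
        double max = 0;
        for (String line : lines) {
            Matcher m = STOPPED.matcher(line); // find() tolerates date-stamp prefixes
            if (m.find()) {
                double seconds = Double.parseDouble(m.group(1));
                count++;
                total += seconds;
                max = Math.max(max, seconds);
            }
        }
        return new GcPauseStats(count, total, max);
    }
}
```

Alerting on the maximum pause per interval, rather than the average, surfaces the long full collections that users actually notice.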
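Slides 46 to 50 describe personalized collections: rather than a filter such as `fq={!term f=id}123 OR {!term f=id}456` that looks up every ID term per request, the deck suggests a primary-key-to-docId map kept in a custom SolrCache and filled by a SolrEventListener. Because docIds are only valid for one SolrIndexSearcher, such a map presumably has to be rebuilt for each new searcher, which ties in with slide 45's warning about using the searcher as a cache key. The sketch shows only the data structure: a primitive long-to-int open-addressing table of the kind fastutil or trove4j provide (slide 43), plus a helper that turns a collection's IDs into a `BitSet` of docIds. `PkToDocId` is hypothetical, ignores deleted documents, and assumes no primary key equals `Long.MIN_VALUE`.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical primary-key-to-docId map for one index snapshot (one SolrIndexSearcher).
// Open addressing with linear probing over primitive arrays, so no boxing.
final class PkToDocId {
    private static final long EMPTY = Long.MIN_VALUE; // assumption: no real PK has this value
    private final long[] keys;
    private final int[] docIds;
    private final int mask;

    // pkByDocId[docId] holds the primary key stored for that document.
    PkToDocId(long[] pkByDocId) {
        int capacity = 2;
        while (capacity < pkByDocId.length * 2) capacity <<= 1; // keep load factor <= 0.5
        keys = new long[capacity];
        docIds = new int[capacity];
        mask = capacity - 1;
        Arrays.fill(keys, EMPTY);
        for (int docId = 0; docId < pkByDocId.length; docId++) {
            long pk = pkByDocId[docId];
            int slot = slot(pk);
            while (keys[slot] != EMPTY && keys[slot] != pk) slot = (slot + 1) & mask;
            keys[slot] = pk;
            docIds[slot] = docId; // a repeated PK keeps its highest docId
        }
    }

    // Multiplicative (Fibonacci) hashing spreads sequential IDs across the table.
    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32)) & mask;
    }

    // Returns the docId for pk, or -1 if pk is not in this snapshot.
    int docId(long pk) {
        for (int slot = slot(pk); keys[slot] != EMPTY; slot = (slot + 1) & mask) {
            if (keys[slot] == pk) return docIds[slot];
        }
        return -1;
    }

    // Selects the same documents as fq={!term f=id}pk1 OR {!term f=id}pk2 ... (ignoring deletions).
    BitSet filter(long... pks) {
        BitSet bits = new BitSet();
        for (long pk : pks) {
            int doc = docId(pk);
            if (doc >= 0) bits.set(doc);
        }
        return bits;
    }
}
```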
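Slides 51 to 54 index prices with a currency and let queries name one (`$10.00`, `10.00USD`, or `20.00EUR`), with `MoneyFieldType.getRangeQuery` rejecting ranges whose bounds use different currencies. The deck does not show the `MoneyValue` class, so the `Money` record below is a hypothetical stand-in that mirrors the parsing and the same-currency check, plus a conversion helper for sorting in one currency. Treating `$` as USD, and passing rates as units of each currency per US dollar, are assumptions for illustration, not the format of Etsy's currency.xml.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the MoneyValue class used on slide 54 (the deck does not show it).
record Money(BigDecimal amount, String currencyCode) {
    // "$10.00", "10.00USD", or a bare "10.00" that falls back to the default currency.
    private static final Pattern FORMAT = Pattern.compile("(\\$)?(\\d+(?:\\.\\d+)?)([A-Z]{3})?");

    static Money parse(String text, String defaultCurrency) {
        Matcher m = FORMAT.matcher(text.trim());
        if (!m.matches() || (m.group(1) != null && m.group(3) != null)) {
            throw new IllegalArgumentException("Cannot parse money value: " + text);
        }
        // Assumption: "$" means US dollars.
        String code = m.group(1) != null ? "USD" : (m.group(3) != null ? m.group(3) : defaultCurrency);
        return new Money(new BigDecimal(m.group(2)), code);
    }

    // Same rule as getRangeQuery on slide 54: both bounds must use one currency.
    static void checkRange(Money lower, Money upper) {
        if (!lower.currencyCode().equals(upper.currencyCode())) {
            throw new IllegalArgumentException(
                    "range queries only supported when upper and lower bound have same currency");
        }
    }

    // Converts via US dollars; perUsd maps each code to units of that currency per 1 USD
    // and must contain both this currency and the target.
    Money convertTo(String target, Map<String, BigDecimal> perUsd) {
        BigDecimal usd = amount.divide(perUsd.get(currencyCode), 10, RoundingMode.HALF_EVEN);
        BigDecimal converted = usd.multiply(perUsd.get(target)).setScale(2, RoundingMode.HALF_EVEN);
        return new Money(converted, target);
    }
}
```

For example, with a made-up rate table of USD=1 and EUR=0.5, `Money.parse("20.00EUR", "USD").convertTo("USD", rates)` yields 40.00 USD.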
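Slides 58 to 60 make the point that not every indexed term is a good autosuggest candidate: the index holds dozens of misspellings of "jewelry" (slide 59), so the term dictionary cannot serve as a whitelist. One simple heuristic, sketched below and not necessarily Etsy's approach, drops a term when a far more frequent term lies within a small optimal-string-alignment distance (Levenshtein edits plus adjacent transpositions). `SuggestionFilter`, the edit limit, and the frequency ratio are hypothetical; the all-pairs loop is quadratic, so a large term list would need an indexed approach such as Lucene's fuzzy matching instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical heuristic: treat a term as a likely misspelling when a far more frequent
// term is only a few edits away, and keep it out of autosuggest.
final class SuggestionFilter {
    // Optimal string alignment distance: Levenshtein edits plus adjacent transpositions.
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // docFreq: indexed term -> number of documents containing it.
    // Keeps a term unless another term with at least `ratio` times its frequency
    // is within maxEdits of it.
    static List<String> suggestable(Map<String, Integer> docFreq, int maxEdits, int ratio) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> term : docFreq.entrySet()) {
            long freq = term.getValue();
            boolean likelyMisspelling = false;
            for (Map.Entry<String, Integer> other : docFreq.entrySet()) {
                if (!other.getKey().equals(term.getKey())
                        && other.getValue() >= freq * ratio
                        && osaDistance(term.getKey(), other.getKey()) <= maxEdits) {
                    likelyMisspelling = true;
                    break;
                }
            }
            if (!likelyMisspelling) kept.add(term.getKey());
        }
        return kept;
    }
}
```

For example, with document frequencies jewelry=50000, jewlery=40, jewerly=35, and desk=9000, an edit limit of 2 and a ratio of 100 keep `jewelry` and `desk` and drop the two misspellings, each one transposition away from `jewelry`.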