Solr & Lucene @ Etsy by Gregg Donovan

Solr & Lucene at Etsy Gregg Donovan [email protected] Technical Lead, Search

Upload: gregg-donovan

Post on 20-May-2015


DESCRIPTION

Slides from my talk on "Solr & Lucene @ Etsy" from the LuceneRevolution conference on May 26th, 2011 in San Francisco.

TRANSCRIPT

Page 1: Solr & Lucene @ Etsy by Gregg Donovan

Solr & Lucene at Etsy
Gregg Donovan

[email protected]
Technical Lead, Search


1.5 years Solr & Lucene at Etsy.com

3 years Solr & Lucene at TheLadders.com


8+ million members


9.3 million items


800k+ active sellers


1+ billion pageviews / month


Maximize Solr out-of-the-box


Hack at a low-level


Know when to do each


Or


Don’t fear trunk


builds.apache.org/job/Solr-trunk/changes


http://localhost:8393/solr/placesuggest/select?q={!lucene}s*&sfield=latlong&pt=37.595804,-122.364521&sort=div(geodist(),sqrt(sum(population,50)))%20asc
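
The sort in the query above divides distance by the square root of population plus 50 and orders ascending, so a large place farther away can outrank a small place nearby. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, the distances and populations are made-up illustration values, and it assumes geodist() reports kilometres.

```java
// Sketch of the sort key used in the query above:
// div(geodist(), sqrt(sum(population, 50))), sorted ascending.
final class PlaceSuggestScore {

    // distanceKm stands in for geodist(); smaller keys sort first.
    // Adding 50 keeps a zero-population place from dividing by zero.
    static double sortKey(double distanceKm, double population) {
        return distanceKm / Math.sqrt(population + 50.0);
    }

    public static void main(String[] args) {
        // 5 / sqrt(100) = 0.5
        double smallTownNearby = sortKey(5.0, 50.0);
        // 20 / sqrt(810000) = 20 / 900
        double bigCityFarther = sortKey(20.0, 809_950.0);
        System.out.println("small town nearby: " + smallTownNearby);
        System.out.println("big city farther:  " + bigCityFarther);
    }
}
```

With these sample numbers the big city gets the smaller key and is suggested first, even though it is four times farther away.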


{!lucene}

{!field}

{!func}

{!dismax}

{!edismax}

{!boost}

{!term}


Cheap ranking awesomeness


ExternalFileField ftw!


schema.xml:
<fieldType name="file" keyField="treasury_id" defVal="0" stored="false" indexed="true" class="solr.ExternalFileField" valType="float"/>
<field name="hotness" type="file"/>

/search/data/treasury/external_hotness.1306390802088:
1=2.3
2=1.7
3=1.1

Solr query:
sort={!func}hotness+desc
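
The external file is plain text with one key=value line per document, keyed by the field named in keyField. As a rough sketch, a job that recomputes hotness could render the file body like this. The class and method names are hypothetical, and writing the result to the index data directory and triggering a reload are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders the body of an external_hotness file.
final class ExternalHotnessFile {

    // One "treasury_id=hotness" line per entry, in map iteration order.
    static String render(Map<String, Float> hotnessByTreasuryId) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Float> entry : hotnessByTreasuryId.entrySet()) {
            out.append(entry.getKey())
               .append('=')
               .append(entry.getValue())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> hotness = new LinkedHashMap<>();
        hotness.put("1", 2.3f);
        hotness.put("2", 1.7f);
        hotness.put("3", 1.1f);
        System.out.print(render(hotness));
    }
}
```

Because the values live outside the index, they can be refreshed without reindexing the documents themselves.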


ExternalFileField caveats


More relevance: boost query


http://localhost:8983/solr/listings/select?q={!boost b=$rel v=$qq}&rel=category:furniture^10+OR+((-material:acrylic)^5)&qq=desk
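
In the request above, $rel and $qq are parameter dereferences: the user's text travels in qq and the boost in rel, so neither has to be spliced into the other. A small sketch of assembling such a URL with proper encoding follows. The class and method names are hypothetical; only the parameter names come from the slide.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the boost request shown above.
final class BoostQueryUrl {

    static String build(String selectUrl, String userQuery, String boostQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!boost b=$rel v=$qq}"); // wrapper that references the two params
        params.put("rel", boostQuery);              // relevance boost, controlled by the app
        params.put("qq", userQuery);                // raw user input, kept separate

        StringBuilder url = new StringBuilder(selectUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            first = false;
            url.append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
            "http://localhost:8983/solr/listings/select",
            "desk",
            "category:furniture^10 OR ((-material:acrylic)^5)"));
    }
}
```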


Impression tracking


etsy.com/search?q=desk&explain=1


Side-by-Side testing


Cheap performance wins


Put off sharding till you must


cat ${indexDir}/* > /dev/null


Return IDs, minimize stored fields


RAM: $10-20 / GB


SSD: 0.1ms vs 10ms seek


Custom?


solr-user


Tools for low-level hacking


Continuous deployment


One button. So easy a dog could do it.


MTTR > MTBF


github.com/etsy/logster


Tracking GC


export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"


Alerting


Testing


SaveAsFixture


Profiling


Java Primitive Library

fastutil

trove4j


Know the hooks

QParserPlugin

SolrEventListener

SolrRequestHandler

SearchComponent

SolrCache

ValueSourceParser


SolrIndexSearcher gotchas

reference counting

using it as a cache key:

WeakHashMap<SolrIndexSearcher,MyValue> myCache...
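
A rough sketch of the weak-keyed cache idea above. So that it compiles without Solr on the classpath, the key type K stands in for SolrIndexSearcher, and the class name and loader function are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical per-searcher cache; K stands in for SolrIndexSearcher.
final class PerSearcherCache<K, V> {

    // Weak keys let an entry disappear once nothing else references the searcher.
    // WeakHashMap is not thread-safe, hence the synchronized wrapper.
    private final Map<K, V> cache = Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<K, V> loader;

    PerSearcherCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Computes the value the first time a searcher is seen, then reuses it.
    V get(K searcher) {
        return cache.computeIfAbsent(searcher, loader);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        PerSearcherCache<Object, String> cache =
            new PerSearcherCache<>(searcher -> "data for " + searcher.hashCode());
        Object searcher = new Object();
        System.out.println(cache.get(searcher).equals(cache.get(searcher)));
    }
}
```

One caveat with this pattern: the cached value must not hold a strong reference back to its key, or the entry can never be collected.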


Example: personalized collections


fq={!term f=id}123 OR {!term f=id}456


Need a map of PK to docId


Use custom SolrCache plus SolrEventListener to fill it
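
To illustrate why a PK to docId map helps, here is a minimal standalone sketch that turns a list of primary keys straight into a bit set of document ids, with no term lookups per key. It uses a plain HashMap and java.util.BitSet rather than Solr's own cache and doc set classes, and every name in it is hypothetical. In a real deployment the map would have to be rebuilt for each new searcher, since docIds are only valid for one view of the index.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-searcher PK -> docId lookup.
final class PkDocIdFilter {

    private final Map<Long, Integer> docIdByPk = new HashMap<>();

    // Called while warming, once per document in the current index view.
    void put(long primaryKey, int docId) {
        docIdByPk.put(primaryKey, docId);
    }

    // Sets one bit per known primary key; unknown keys are skipped.
    BitSet filterFor(List<Long> primaryKeys) {
        BitSet bits = new BitSet();
        for (Long pk : primaryKeys) {
            Integer docId = docIdByPk.get(pk);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        PkDocIdFilter filter = new PkDocIdFilter();
        filter.put(123L, 7);
        filter.put(456L, 2);
        System.out.println(filter.filterFor(List.of(123L, 456L, 999L)));
    }
}
```

A primitive-keyed map from one of the libraries named earlier in the deck (fastutil, trove4j) would avoid the boxing this sketch does.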


github.com/giokincade/FastTermFilter


i18n currency sorting and filtering


currency.xml:

<currencyConfig version="1.0">
  <currencies>
    <currency name="United States Dollar" symbol="$" code="USD"/>
    <currency name="Australian Dollar" symbol="$" code="AUD"/>
    <currency name="Canadian Dollar" symbol="$" code="CAD"/>
    <currency name="Czech Koruna" symbol="Kč" code="CZK"/>
    ...
  </currencies>
  <rates>
    <rate from="USD" to="AUD" rate="1.168750"/>
    <rate from="USD" to="CAD" rate="1.085000"/>
    <rate from="USD" to="CZK" rate="20.107500"/>
    <rate from="USD" to="DKK" rate="5.323750"/>
    ...
  </rates>
</currencyConfig>
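
As a rough illustration of how a rate table like this gets used, the sketch below converts a USD amount with the four rates listed in currency.xml. The class and method names are hypothetical, the rates are hard-coded instead of being read from the XML, and BigDecimal is a choice made here to avoid floating-point drift rather than something the slides specify.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical rate table holding the USD-based rates from currency.xml.
final class UsdRates {

    private final Map<String, BigDecimal> rateFromUsd = new HashMap<>();

    UsdRates() {
        rateFromUsd.put("USD", BigDecimal.ONE);
        rateFromUsd.put("AUD", new BigDecimal("1.168750"));
        rateFromUsd.put("CAD", new BigDecimal("1.085000"));
        rateFromUsd.put("CZK", new BigDecimal("20.107500"));
        rateFromUsd.put("DKK", new BigDecimal("5.323750"));
    }

    // Multiplies a USD amount by the USD -> target rate.
    BigDecimal fromUsd(BigDecimal amountUsd, String targetCode) {
        BigDecimal rate = rateFromUsd.get(targetCode);
        if (rate == null) {
            throw new IllegalArgumentException("No rate for " + targetCode);
        }
        return amountUsd.multiply(rate);
    }

    public static void main(String[] args) {
        UsdRates rates = new UsdRates();
        System.out.println(rates.fromUsd(new BigDecimal("10.00"), "AUD"));
    }
}
```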


price:[10.00USD TO 50.00USD]

price:20.00EUR

price:[$10.00 TO $50.00]
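
The three query forms above mix amounts with a trailing currency code and amounts with a leading symbol. The deck does not show how these strings are parsed, so the following is only a guess at the shape of such a parser. The class name, the regular expression, and the fallback rule are all assumptions. Since "$" is shared by USD, AUD, and CAD in currency.xml, the sketch treats a bare symbol as "use the default currency".

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for strings such as "10.00USD", "20.00EUR" and "$10.00".
final class MoneyText {

    // Optional "$", a decimal amount, then an optional three-letter code.
    private static final Pattern MONEY =
        Pattern.compile("^\\$?(\\d+(?:\\.\\d+)?)([A-Z]{3})?$");

    final BigDecimal amount;
    final String currencyCode;

    private MoneyText(BigDecimal amount, String currencyCode) {
        this.amount = amount;
        this.currencyCode = currencyCode;
    }

    static MoneyText parse(String text, String defaultCurrency) {
        Matcher m = MONEY.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse money value: " + text);
        }
        // No explicit code: fall back to the caller's default currency.
        String code = m.group(2) != null ? m.group(2) : defaultCurrency;
        return new MoneyText(new BigDecimal(m.group(1)), code);
    }

    public static void main(String[] args) {
        MoneyText value = parse("20.00EUR", "USD");
        System.out.println(value.amount + " " + value.currencyCode);
    }
}
```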


MoneyFieldType.java:

@Override
public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2,
                           final boolean minInclusive, final boolean maxInclusive) {
    // Parse both bounds, falling back to the default currency when none is given.
    final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
    final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);

    // A range only makes sense when both bounds are in the same currency.
    if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            new ParseException("Cannot parse range query " + part1 + " to " + part2
                + ": range queries only supported when upper and lower bound have same currency."));
    }

    String currencyCode = p1.getCurrencyCode();
    final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);

    // Filter on the value source for the requested currency.
    return new SolrConstantScoreQuery(
        new ValueSourceRangeFilter(vs, p1.getAmount() + "", p2.getAmount() + "",
            minInclusive, maxInclusive));
}


Replication gotcha


SOLR-2202


Related Searches


Autosuggest!


bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely


The TermDictionary is not a whitelist
