October 13-16, 2016 • Austin, TX
Nice Docs Finish First: Designing Search Ranking for Fairness at Etsy
Fiona Condon, Senior Software Engineer, Etsy
Etsy by the Numbers
• 1.5 million active sellers
• 21.7 million active buyers
• 32 million listings for sale
• Solr for listing search
Search at Etsy
Our challenge
To return the best listings
To return the most informative, honest, high-quality listings
To return a diverse, fresh mix of the most informative, honest, high-quality listings
Etsy Values
We are a mindful, transparent, and humane business.
We plan and build for the long term.
We value craftsmanship in all we make.
We believe fun should be part of everything we do.
We keep it real, always.
Don’t hate the player, change the game
We keep it real, always.
TF (term frequency): the count of the term in the document
IDF (inverse document frequency): the inverse of the count of the term across all documents
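To make the two factors concrete, here is a minimal sketch of a classic TF-IDF term weight. The square-root TF and logarithmic IDF follow the shape of Lucene's classic similarity; the class name, method names, and sample numbers are illustrative assumptions, not Etsy's code.

```java
public class TfIdfSketch {
    // Classic-style TF: grows with the square root of the term count in the document.
    public static double tf(int termCountInDoc) {
        return Math.sqrt(termCountInDoc);
    }

    // Classic-style IDF: terms that appear in fewer documents get a higher weight.
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Weight of one term in one document.
    public static double weight(int termCountInDoc, int docFreq, int numDocs) {
        return tf(termCountInDoc) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000;   // hypothetical index size
        int docFreq = 50_000;      // hypothetical number of documents containing the term
        System.out.println("term appears once:       " + weight(1, docFreq, numDocs));
        System.out.println("term appears nine times: " + weight(9, docFreq, numDocs));
    }
}
```

Under this weighting, repeating a term nine times triples its contribution, which is the kind of incentive that rewards keyword repetition.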
@Override
public float tf(float freq) {
    return (freq > 0) ? 1.0f : 0.0f;
}
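The override above flattens term frequency to 0 or 1. A small standalone sketch (no Lucene dependency; the class name and sample titles are hypothetical, and the square-root TF is assumed here only as a point of comparison) shows the effect: a title that repeats a term scores the same on TF as one that mentions it once.

```java
import java.util.Arrays;

public class BinaryTfSketch {
    // Count how often a lowercase term appears in a whitespace-tokenized, lowercased text.
    public static long termCount(String text, String term) {
        return Arrays.stream(text.toLowerCase().split("\\s+"))
                .filter(term::equals)
                .count();
    }

    // Same rule as the override: any number of occurrences counts once.
    public static float binaryTf(float freq) {
        return (freq > 0) ? 1.0f : 0.0f;
    }

    // Square-root TF, for comparison only.
    public static float sqrtTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        String honest = "handmade silver ring";
        String stuffed = "ring ring ring ring silver ring";
        for (String title : new String[] {honest, stuffed}) {
            long freq = termCount(title, "ring");
            System.out.println(title + " -> freq=" + freq
                    + " sqrtTf=" + sqrtTf(freq)
                    + " binaryTf=" + binaryTf(freq));
        }
    }
}
```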
• Set TF to 1
• Choose the right fields to index
Crafting a quality signal
We value craftsmanship in all we make
• Avoid presentation bias
• Store outside the index
<fieldType name="listing_quality_file"
           keyField="listing_id"
           defVal="0.5"
           stored="true"
           indexed="true"
           class="solr.ExternalFileField"
           valType="float" />
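Solr's ExternalFileField reads its values from a plain-text file of `key=value` lines kept alongside the index, and keys missing from the file fall back to `defVal` (0.5 here). As a rough sketch of how an offline job might render such a file, with hypothetical listing IDs, scores, and class name (the talk does not describe Etsy's actual pipeline):

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class QualityFileSketch {
    // Render listing_id -> quality score as key=value lines, one listing per line.
    // Entries are ordered by listing ID here only to keep the output stable.
    public static String render(Map<Long, Float> scores) {
        return new TreeMap<>(scores).entrySet().stream()
                .map(e -> String.format(Locale.ROOT, "%d=%.3f", e.getKey(), e.getValue()))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Hypothetical scores; listings absent from the file would get defVal.
        System.out.println(render(Map.of(1002L, 0.25f, 1001L, 0.91f)));
    }
}
```

Keeping the signal in an external file means scores can be refreshed without reindexing every listing.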
• Avoid presentation bias
• Store outside the index
• Bootstrap
Freshness is fun
We believe fun should be part of everything we do.
• Diversify by seller
public SearchResults diversify(SearchResults results) {
    SearchResults diversifiedResults;
    int nextWindow = diversityOptions.getWindow();
    do {
        diversityOptions.window = nextWindow;
        diversifiedResults = shopDiversifier.diversify(results);
        DiversityStats shopDiversity = diversifiedResults.docs.stream()
            .collect(DiversityStatsCalculator.collector(ListingDoc::getShopId))
            .getStats();
        // if the results are sufficiently diverse, we're done
        if (shopDiversity.getDiversityIndex()
                <= diversityOptions.progressive.getTargetDiversityIndex()) {
            break;
        }
        // otherwise, broaden the window and re-try
        nextWindow = Math.min(
            diversifiedResults.totalCount,
            Math.min(diversityOptions.getMaxWindow(), diversityOptions.window * 2)
        );
    } while (diversityOptions.window < nextWindow);
    return diversifiedResults;
}
. . .
    // if the list is sufficiently diverse, we're done
    if (shopDiversity.getDiversityIndex()
            <= diversityOptions.progressive.getTargetDiversityIndex()) {
        break;
    }
    // otherwise, broaden the window and re-try
    nextWindow = Math.min(
        diversifiedResults.totalCount,
        Math.min(diversityOptions.getMaxWindow(), diversityOptions.window * 2)
    );
. . .
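The loop compares a diversity index against a target and treats lower values as more diverse. The slides do not say how `getDiversityIndex()` is computed; one measure consistent with that comparison is Simpson's index, the probability that two results drawn at random (with replacement) come from the same shop. The sketch below is an assumption for illustration, not Etsy's `DiversityStatsCalculator`.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SimpsonIndexSketch {
    // Sum over shops of (share of results from that shop) squared.
    // 1.0 means every result is from one shop; 1/n means n results from n distinct shops.
    public static double simpsonIndex(List<Long> shopIds) {
        if (shopIds.isEmpty()) {
            return 0.0;
        }
        Map<Long, Long> countsByShop = shopIds.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        double total = shopIds.size();
        return countsByShop.values().stream()
                .mapToDouble(count -> (count / total) * (count / total))
                .sum();
    }

    public static void main(String[] args) {
        // Hypothetical shop IDs for the first four results of a page.
        System.out.println(simpsonIndex(List.of(7L, 7L, 7L, 7L)));
        System.out.println(simpsonIndex(List.of(7L, 8L, 9L, 10L)));
    }
}
```

With a measure like this, a page dominated by one shop scores near 1.0 and would trigger another pass with a wider window.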
• Diversify by seller
• Recency boost
The inverse of the time elapsed between now and listing creation
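An inverse-of-age boost can be sketched as a reciprocal function of the form a / (m*x + b), the same shape as Solr's `recip` function query. The constant and class name below are illustrative assumptions, not values from the talk.

```java
import java.time.Duration;
import java.time.Instant;

public class RecencyBoostSketch {
    // Hypothetical tuning constant: the listing age (in days) at which the boost falls to 0.5.
    static final double HALF_BOOST_AGE_DAYS = 30.0;

    // Reciprocal decay: 1.0 for a brand-new listing, falling toward 0 as the listing ages.
    public static double boost(Instant created, Instant now) {
        long ageMillis = Math.max(0L, Duration.between(created, now).toMillis());
        double ageDays = ageMillis / 86_400_000.0;
        return 1.0 / (ageDays / HALF_BOOST_AGE_DAYS + 1.0);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-10-13T00:00:00Z");
        System.out.println(boost(now, now));
        System.out.println(boost(now.minus(Duration.ofDays(30)), now));
        System.out.println(boost(now.minus(Duration.ofDays(365)), now));
    }
}
```

A reciprocal never reaches zero, so older listings are demoted gradually rather than cut off.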
Evaluating for stability
We plan and build for the long term.
• Replayer
  • Command-line tool
  • Uses sampled request logs to “replay” real traffic
  • Accepts target hosts, duration, query rate
  • Programmatically filters or alters requests
  • Provides realistic stats on average/worst-case impact
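The core of a replay tool like the one described above can be sketched as a planning step: filter the logged queries, rewrite them, and pace them at a fixed rate for a fixed duration. Everything here (class, method names, query strings) is hypothetical, and the part that actually sends requests to the target hosts and records latencies is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class ReplayerSketch {
    // One planned request: which query to send, and when (milliseconds after start).
    public static final class Planned {
        public final long offsetMillis;
        public final String query;

        Planned(long offsetMillis, String query) {
            this.offsetMillis = offsetMillis;
            this.query = query;
        }
    }

    // Keep logged queries that pass the filter, apply the rewrite, and space them
    // evenly at queriesPerSecond until the duration is used up.
    public static List<Planned> plan(List<String> loggedQueries,
                                     Predicate<String> filter,
                                     UnaryOperator<String> rewrite,
                                     double queriesPerSecond,
                                     long durationMillis) {
        List<Planned> plan = new ArrayList<>();
        double gapMillis = 1000.0 / queriesPerSecond;
        int scheduled = 0;
        for (String query : loggedQueries) {
            if (!filter.test(query)) {
                continue; // dropped by the programmatic filter
            }
            long offset = Math.round(scheduled * gapMillis);
            if (offset >= durationMillis) {
                break; // replay window exhausted
            }
            plan.add(new Planned(offset, rewrite.apply(query)));
            scheduled++;
        }
        return plan;
    }

    public static void main(String[] args) {
        List<String> log = List.of("q=ring", "q=internal-test", "q=scarf", "q=mug");
        List<Planned> plan = plan(log, q -> !q.contains("internal"), q -> q + "&rows=48", 2.0, 1000);
        for (Planned p : plan) {
            System.out.println(p.offsetMillis + "ms " + p.query);
        }
    }
}
```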
• RankDelta
  • Web UI
  • Allows user to specify query set, hosts and thrift params in PHP code
  • Provides high-level statistics about the results
  • Plus full result set deep-dive
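The slides do not list which high-level statistics RankDelta reports. As one plausible illustration, two simple measures for comparing a baseline ranking with an experimental one are the overlap of their top-k results and the number of top-k positions that changed. The class and method names below are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RankDeltaSketch {
    // Fraction of the top-k slots whose listings appear in both rankings' top-k.
    public static double overlapAtK(List<Long> before, List<Long> after, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<Long> beforeTop = new HashSet<>(before.subList(0, Math.min(k, before.size())));
        long shared = after.subList(0, Math.min(k, after.size())).stream()
                .filter(beforeTop::contains)
                .count();
        return (double) shared / k;
    }

    // Number of top-k positions where the two rankings show a different listing.
    public static int changedPositions(List<Long> before, List<Long> after, int k) {
        int limit = Math.min(k, Math.min(before.size(), after.size()));
        int changed = 0;
        for (int i = 0; i < limit; i++) {
            if (!before.get(i).equals(after.get(i))) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        // Hypothetical listing IDs returned for one query by two ranking configurations.
        List<Long> before = List.of(1L, 2L, 3L, 4L);
        List<Long> after = List.of(2L, 1L, 3L, 5L);
        System.out.println("overlap@4 = " + overlapAtK(before, after, 4));
        System.out.println("changed positions = " + changedPositions(before, after, 4));
    }
}
```

Averaging such numbers over a query set gives a quick read on how disruptive a ranking change is before looking at individual result sets.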
• In an ideal world…
• Making trade-offs
Communicating clearly
We are a mindful, transparent, and humane business.
• Focus on the constants
• Provide a feedback loop
Takeaways
• Minor changes to the default scoring can be powerful
• Handle quality contextually
• Conscious diversity serves both searcher & searchee
• Invest in a feedback loop on ranking changes
• Be honest but keep it consistent
@fioroco fiona.io [email protected]
codeascraft.com etsy.com/careers