Berlin Buzzwords 2013 - Faceting analyzed fields with some sprinkles of probability theory
TRANSCRIPT
Faceting analyzed fields with some sprinkles of
probability theory conjures trending topic analysis and other
interesting insights
Boaz Leskes
Elasticsearch
@bleskes
Work done for Buzzcapture
Trending?
© Buzzcapture
reference reference topic
topic ≠ reference
topic reference
$$P(w \mid T) = \frac{\| D_t \mid w \in D_t \|}{\| D_t \|}$$
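Read as a document-frequency estimate, the formula says: the probability of word w in topic T is the number of topic documents that contain w, divided by the total number of topic documents. A minimal sketch of that reading, assuming each document is represented as the set of its analyzed terms; the function name `term_probability` and the sample data are hypothetical and not from the talk:

```python
def term_probability(term, docs):
    """Fraction of the documents in `docs` that contain `term` at least once."""
    if not docs:
        return 0.0
    containing = sum(1 for doc in docs if term in doc)
    return containing / len(docs)


# Hypothetical topic set D_t: each document is the set of its analyzed terms.
topic_docs = [
    {"quick", "brown", "fox"},
    {"brown", "dog"},
    {"quick", "dog"},
    {"fox"},
]

# "brown" appears in 2 of the 4 topic documents.
p_brown = term_probability("brown", topic_docs)
```

The same function can be applied to the reference set, which gives the two numbers needed to compare a word's prominence in the topic against the reference.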
[Figure: the terms brown, dog, fox and quick linked to the document IDs 2, 5, 6, 10, 12 and 13, and the same document IDs linked back to the terms]
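A term-to-document index and its document-to-term counterpart can be sketched as two dictionaries, where building the second from the first is what makes per-document term lookups cheap at faceting time. This is only an illustration: the posting lists below are made up, the function name `uninvert` is hypothetical, and the real in-memory field data structures are more compact than Python dictionaries.

```python
from collections import defaultdict


def uninvert(postings):
    """Turn a term -> [doc ids] mapping into a doc id -> [terms] mapping."""
    doc_terms = defaultdict(list)
    # Visit terms in sorted order so each document's term list is sorted too.
    for term in sorted(postings):
        for doc_id in postings[term]:
            doc_terms[doc_id].append(term)
    return dict(doc_terms)


# Hypothetical posting lists (not the ones on the slide).
postings = {
    "brown": [2, 5],
    "fox": [5, 12],
}

doc_to_terms = uninvert(postings)
```

Every (term, document) pair becomes one link in the uninverted structure, so its size grows with the total number of such pairs rather than with the number of distinct terms.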
In our index:
• Terms = 12GB
• “Arrows” = 41GB
{
  tweet: {
    type: "string",
    analyzer: "whitespace",
    fielddata: {
      filter: {
        regex: "^#.*",
        frequency: { min: 10 }
      }
    }
  }
}
Drop terms which occur too little
Drop docs with too many terms
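The effect of these two pruning steps, together with a regex filter that keeps only hashtag-like terms, can be sketched outside Elasticsearch as follows. This is not how Elasticsearch implements field data filtering; the function name `filter_field_data`, the parameter names, and the choice to count term frequencies after dropping oversized documents are all assumptions made for illustration.

```python
import re
from collections import Counter


def filter_field_data(docs, pattern=r"^#.*", min_doc_freq=10, max_terms_per_doc=None):
    """Prune a doc id -> [terms] mapping before loading it into memory.

    docs: dict mapping a document id to its list of analyzed terms.
    """
    regex = re.compile(pattern)

    # Drop documents with too many terms (optional, hypothetical threshold).
    if max_terms_per_doc is not None:
        docs = {d: terms for d, terms in docs.items() if len(terms) <= max_terms_per_doc}

    # Count in how many of the remaining documents each term occurs.
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))

    # Keep only terms that match the pattern and occur often enough.
    kept = {t for t, n in doc_freq.items() if n >= min_doc_freq and regex.match(t)}

    return {d: [t for t in terms if t in kept] for d, terms in docs.items()}


# Hypothetical sample: three tiny "tweets" after whitespace analysis.
sample = {
    1: ["#a", "b"],
    2: ["#a", "#c"],
    3: ["#a"],
}
pruned = filter_field_data(sample, min_doc_freq=2)
```

Here `min_doc_freq` is treated as an absolute document count, mirroring the `min: 10` value in the mapping above.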
reference reference topic
iculture   10,122
floor       8,998
cover       6,874
toy         4,402
ground      3,841

4.0         7,878
4.1         4,292
rtacties    4,078
jelly       2,905
bean        2,857